UNIVERSITY OF STIRLING DEPARTMENT OF COMPUTING SCIENCE 31V4 : Programming Systems Assignment 2 : Spring 1989 (25%) Due by: 4.00 pm Thursday 27 April 1989 This assignment requires you to write a large program in Modula-2, and your code should make good use of its features where appropriate. The documentation provided with your program should follow the guidelines given on the 3133 course; an updated copy of these is attached as an Appendix. Note that the documentation of your program carries a high weighting in the marking scheme. The Problem You are required to write a two-pass assembler for a subset of the instruction set and addressing modes of the PDP11 computer. The set of instructions that your assembler should accept are: mov, movb, cmp, add, jsr, clr, tst, tstb, dec, rts, bgt, bne and the addressing modes to be recognised are: 0, 2, 3, 4 and 6 You program should also be able to recognise the following two pseudoinstructions: .start .end It should be able to recognise the following form of declaration for symbolic constants: = as in errmsg = 2042 and such constants can only be declared before any labels or executable intructions. Only one such constant may be declared on a line. Labels used in an assembler program may be up to six characters long, should terminate with a ':' (colon) character and must begin at the first character position of a line. All numeric fields used in a program are considered to be in octal. Your assembler should be able to accept the two demonstration programs given in Appendix C, and should produce listings of the form given in Appendix D. No forms of binary of '.OBJ' files need to be produced. Copies of the output generated from these programs should be included with your documentation. The two demonstration programs are available to you as: [IIUSCV]test1.mac [IIUSCV]test2.mac These two programs are quite comprehensive, and we recommend that you create some more simple forms in order to help with the development of your program. Where appropriate, these too may be included with your documentation. You are not required to perform any error checking for this exercise (you may assume that the demonstration programs are syntactically correct). However, you may find it helpful to provide some very basic syntax checking while developing the program. If you do so, take care not to let this become too elaborate! Appendix A describes the modular structure that you should adopt for the program - note that this is a requirement and not a suggestion! We have also provided some more general hints about the program. Appendix B describes the details of the input forms that the assembler should be able to recognise. Appendix C lists the two demonstration programs supplied, and Appendix D shows the form of output that your program should produce for these. (Small deviations in terms of the tabulation and pagination will be accepted). A copy of the documentation guide is provided as Appendix E. Some General Hints 1. The more complex modules will require the use of the systematic development and testing techniques described in 3133. Where modules are tested in isolation, any test programs, test data and results should be presented as part of your documentation. (This is particularly relevant if the complete program does not work fully, since it helps us to assess how much you have achieved). 2. Try to use sensible naming conventions in your code, eg. get_symbol, decode_src_mode to aid readability. (Underscore characters are acceptable.) 3. Avoid '.def' modules which IMPORT from one another. 4. Remember that PDP11 words are always aligned on an even byte boundary. 5. Do NOT try to be too clever in using the file-handling facilities of the VAX - therein lies a veritable quagmire for the unwary! The routines provided in MODULE InOut should be quite sufficient for your needs. 6. A significant sub-task that needs to be performed correctly in the exercise is that of sizing instructions. In order to create a symbol table on the first pass, you need to identify the amount of storeage needed by each instruction. Obviously some forms are fixed (such as branch instructions), while others need 2, 4 or 6 bytes according to the instruction and the addressing modes. The need to size on the first pass means that you need to do some decoding which will then be repeated on the second pass: so try to organise the way that this is done so that it can be used on both passes. (As a sub-point, do not attempt any optimisation for 'efficiency' or economy of storage. Concentrate on developing a clear structure - if this means doing a job twice, then this is quite acceptable.) 7. The question of the case of input characters is another issue to consider. You may find it worthwhile to construct a simple function procedure that checks a character, and if it is alphbetical, converts it to either upper or lower case - according to your preference. If this is used during input, then it simplifies any later checking or matching operation. 8. One problem that needs to be organised fairly carefully in this exercise is that of reading through the source file twice. The procedure OpenInput in MODULE InOut allows you ro select input from a file, and avoids getting into the depths of FileSystem. However, to reset the file, you need to IMPORT the procedure Reset from FileSystem and use it with the File variable 'in' from InOut (this is accidently missing from the listing in your manual). To clarify this, a very simple example of a file copy program is given over. (* Simple test program to try to overcome the problems of OpenInput when needing to read a file twice. *) MODULE Readtwice FROM FileSystem IMPORT Reset; FROM InOut IMPORT OpenInput, in, Read, Write, Done, CloseInput; VAR ch : CHAR; (* buffer for each character copied *) BEGIN OpenInput('test'); Read(ch); WHILE Done DO White(ch); Read(ch) END; (* while *) (* now rewind the file and copy it again *) Reset(in); Read(ch); WHILE Done DO Write(ch); Read(ch) END; (* while *) CloseInput; END Readtwice 9. You can obtain further ideas about the PDP11 instruction set and the structuring of the Assemblers from 'Tanenbaum', and of couse from other texts in the library. This problem is quite tractable as long as you think it through before you begin to write any code and the develop it one step at a time - eg an initial form might be a prototype for the Pass_1 module which prints out the location and size of each instruction, and the value assigned to each label as they are encountered. 31V4 : Assignment 2 Modular Structure Appendix A Your program must be constructed using at least the following five (5) MODULES: (a) Pass_1 The task of the initial pass is to read through the source file, sizing instructions and inserting label values in the symbol table. This MODULE need only EXPORT the 'outer' procedure 'pass1' - any parameters used by this are to be chosen by you. (Hint: making this a BOOLEAN Function Procedure helps it signal errors back to the caller.) (b) Pass_2 This module EXPORTS only the procedure 'pass2', and again the parameters of this are left to you to chose. Its task is to re-read the source file and construct the assembler listing with the aid of the symbol table created by 'pass1'. As such it will produce an output listing of the form shown in Appendix D, with the address (octal) for each instruction, the op-codes and the original source lines, plus a summary of the symbol table at the end of this. (The formats need not be reproduced in minute detail as regards tabulation and pagination. If necessary, you can print the address field out on every line.) Note that the symbol table should be printed in alphabetical order. (c) Parser This MODULE will provide any of the text and symbol-handling routines needed by the previous two modules. Any handling of the input file should be confined to this module. When parsing the lines of an assembler program, some form of 'get_symbol' procedure or function is needed to return the type and value of the next symbol along a line - you are advised to create an enumerated type 'symbol_type' to aid this. The type elements are up to you, obviously a minimum is: Symbol_type = (comment, label, constant, op_code, pseudoinstruction, src_address, dst_address) but you may find it useful to include elements for all op-codes and address modes. For the PDP11 assembler, symbols are seperated by spaces, tab characters (CHR(9)) and on occasions, commas. Spaces and tabs are also used to improve layout and may be ignored in this context. As an example, 'oggin' reads each line of a source program into a line buffer and then scans along, discarding spaces and tabs using the (Pascal) 'skip' procedure described below: PROCEDURE skip(VAR currentline:line; VAR position:INTEGER); BEGIN WHILE ((currentline[position] IN [CHR(32),CHR(9)]) AND (position < linelength) ) DO position:= position + 1; END; (* skip *) As a general hint, parsing is a significant problem, do not attempt the whole task in one go! (d) Symbol_table This MODULE is used to hold the symbol table created by pass1, and to provide values as requested by pass2. Storage of labels and their values may be performed in any manner that you prefer and can justify. This module sould EXPORT the type: Label = ARRAY [0..6] OF CHAR; (In the example, constants are listed slightly differently in the symbol table. This may be omitted if you wish, or you can modify the above declaration to incorporate this in some way.) (Note that strings in Modula-2 are conventionally terminated by the null characters 0C). The MODULE should also EXPORT the procedures: PROCEDURE enter_label(l : Label; v : INTEGER); (* for use by pass1 *) PROCEDURE find_label(l : Label; VAR v : INTEGER): BOOLEAN; (* for use by pass2 wen obtaining label values. The Boolean return indicates whether the label was found in the symbol table or not *) PROCEDURE init_list(VAR n : INTEGER); (* used by pass2 when printing out the symbol table. The number of entries in the symbol table is returned in n. It also sets up an internal pointer to the first alphabetical entry and is followed by n calls to the next procedure *) PROCEDURE get_label(VAR l : label; VAR V : INTEGER): BOOLEAN; (* returns next label in the table and its value, a return status of FALSE indicates the end of the table *) (e) Master This MODULE is simply used to sequence the other modules and does not EXPORT anything at all. IMPORTANT NOTE The word 'symbol' is ambiguous in this Appendix. As used in 'symbol table' it refers simply to label names (in this case), whereas when used in parsing, it means any string of characters, for example: first: last: inout: err10: are all symbols as are: mov add clr and also: #20 r1 -(sp) 31V4 : Assignment 2 Source File Formats Appendix B This appendix describes the formats used in an assembler source program, and hence forms an input specification for your assembler program. (a) Comments A comment begins with a semi-colon character ';' and is terminated at the end of the line. It may begin at any point along the line. (b) Labels These must begin in the first character position of a line, and may consist of up to six alphanumeric characters (a - z, 0 - 9), terminated by a colon character ':' and followed by a space or tab character. (c) Pseudo-instructions These begin with a period character '.' and may be placed anywhere on a line. They may be followed by parameters and comments, but not by instructions. Only the following two need to be recognised by your assembler: .start (where is an octal constant that defines the initial value to be assigned to the progtam's program counter), and .end which tells the assembler to end its operation. Any lines or characters following this are ignored. (d) Instructions For the PDP11, the format of an instruction will depend on its group, and you will need to parse each group in a seperate way. The operand fields are seperated from the opcode by spaces and/or tab characters, and for two-operand formats, the source and destination operands are seperated by a comma, together with optional spaces or tabs. For example, the following forms are all equivalent and equal: mov r0, r1 mov r0 ,r1 mov r0, r1 mov r0 , r1 The following instructions are to be recognised by your program: mov src, dst movb src, dst cmp src, dst add src, dst jsr r, dst clr dst tst dst tstb dst dec dst rts r bgt lbl bne lbl src, dst can be any register and any of the five modes described in the next section. r must be a register and lbl must be a label. Note that for branching instructions, the relative address field is constructed as a signed 8-bit offset, which is a number of words. The offset is calculated relative to the address of the next instruction following the branch instruction. (e) Addressing Modes Five of the eight PDP11 addressing modes are to be recognised. For three of these, a different syntax is used with register 7 (pc). The modes and their formats are: (i) mode 0 format (all registers): rn, sp, pc n=0..5 eg mov r0,sp (ii) mode 2 format (r0 - r5, sp): (rn)+ (sp)+ n=0..5 eg clr (r1)+ format (pc): #val eg add #2,r0 (Where mode 2 is used with the pc, the characters following the # must all be numeric, forming an octal constant.) (iii) mode 3 format (pc): @#val eg mov r0,@#776 mov r1,@#limit (val may be an octal constant or a symbolic constant) (iv) mode 4 format (r0 - r5, sp): -(rn) -(sp) n=0..5 eg mov #2,-(sp) (mode 4 should not be used with the pc) (v) mode 6 format (r0 - r5, sp): x(rn) x(sp) n=0..5 eg clr 4(r5) format (pc): label eg jsr r5,nextch DB/0389/4555 Appendix C: Files TEST1.MAC, TEST2.MAC Appendix D: Files TEST1.TXT, TEST2.TXT Appendix E: 31V4 Assignment Documentation Guide D Budgen March 10, 1989 This guide gives an outline of the form of presentation that is required for assignments in 31V4, and to a large extent, in later courses too. Two particular points should be kept in mind. One is that the martker is familiar with the compiler(s) and operating system - so don't describe them; the second is that the documentation of an assignment not only carries a high weighting of marks (30-50%), but also provides the means of showing to the marker exactly what you have achieved - an especally important point if your program does not work fully. 1 Requirement Specification The assignment instructions and details given to you should no be repeated in your documentaion. You will gain no marked and only waste time by doing so. Under this heading you should try to summarise and formalise the problem that you have been given, using your own words, and to indicate what asumptions you have made. This then forms a basis for reading the rest of the documentation. 2 Design details Documentation under this heading must include the following: * A brief overview of the solution that you have adopted (not more than a few paragrapgs), written in English prose and describing how the problem was decomposed. * A structure diagram showing all the (sub) programs in boxes, with arrows showing their interdependence. Below the subprogram name in each box, there should be a list of the formal parameters, together with an indication as to whether they are used to pass information in to the subprogram or out from it. If necessary, these formal parameters can be tabulated at the side of the diagram. When a (sub) program calls another subprogram more than once, only one arrow is shown. If two (sub) programs call one subprogram, both arrows are given. * The following information should be provided for each sub-program, particular headings may be omitted where there is nothing to report. (For Modula-2 programs, these should be grouped according to the MODULE in which they appear. Where such a MODULE has a significant main body, then the algorithm for this should also be declared.) 1. Name of the sub-program, preceded by the word procedure or function. 2. in: followed by the names and types of parameters used to pass information into the sub-program. 3. out: followed by the names and types of parameters passing information out of the sub-program. A function's name will be repeated here. Variable parameters may appear in both in and out lists. 4. global: followed by the names and types of any global variables accessed in the sub-program, together with the name of the module in which each was declared. 5. (Modula-2 only) local: followed by the names and types of any variables accessed within the sub-program, and which are declared within the outer block of the MODULE. 6. effect: followed by a brief statement of the overall effect of calling the sub-program with it being viewed as a 'black box'. This will include a description of the purpose of each parameter. 7. algorithm: followed by a top-level algorithm for the sub- program. This should only be provided for long or complicated sub-programs. 3 Data Structures Used This section should contain the details of any structured types and variables used, their purpose and the name of the (sub) program or MODULE in which they are declared. In particular, where these are other than very simple and static, diagrams should be included in this section. 4 Test documentation This should consist of the following: * A specification for each test used, explaining what it is trying to examine for a particular module. If appropriate, an outline algorithm of the actual test routine should be given. It is usually most convenient to present this as a table. For example, the headings for such a table might include: - Test number: - Purpose of test: - Test data used: * Any printout generated from such tests, or if appropriate, a section of printout (enough only to demonstrate what was done). This should be annotated to give a suitable link with the information presented in the previous sub-section. You should also provide details of any test data that you may have used with this. [1] * A sample printout of the test data run on the final program. Where some form of modular basis has been used for your design and implementation then the documentation for this section should be ordered in the same manner as that for sub-programs, as described in section 2, so that each module is fully documented. [1] NB: Where the overall program may have failed to work correctly this documentation is especially valuable in that it allows the marker to see how much has been achieved so that marks can be allocated for partial success. 5 The User Guide This is an important (though not necessarily very lengthy) section that should be used to describe the following points: * what the program can do * how to actually run the program (the user interface and how the program interfaces with the Operating System), supported by an annotated exaple. * details of any error messages, and how they should be interpreted. This sections should be written from the standpoint of treating the program as a 'black box' - i.e. so that a user with no programming knowledge can use it. An example of use should always be included, with any user input clearly distinguished (eg by using a different colour). 6 Code You should provide compiler listings for each module (to show that it does compile). The layout of your code is important and should include: * appropriate use of structuring and indentation * a 'header' comment block describing what the module does, who wrote the code, when, where it is stored, features of importance etc. * a comment by the declaration of each variable, explaining its use * comment lines wherever appropriate, to indicate the task performed by each procedure, use made of variables, particular decision points, etc. You should try to develop the habit of decumenting code as you write it, rather than adding comments as afterthoughts, since the comments can be a useful aid to debugging. 7 Conclusion This overall scheme does not require vast quantities of writing, but it does demand an orderly approach to your work and the ability to describe what you have done in a clear manner. Try to make good use of headings and of underlining (rather than excessive use) to show the structure of your documentation. In particular, make use of diagrams and tables where these are appropriate so that the marker has immediate and convenient access to the required information. Bear in mind that we award marks for quality and clarity, not for quantity!