COMP 230 Assembler Programming Project

Programming Project: Assembler


You may work individually or in a group of two to finish this project. I expect the programming to be your own (or your group's) work, not the work of anyone else.

Assembler

Write a program that will read in MIPS assembly language and write out the corresponding "pseudo-binary" machine language instructions. ("Pseudo-binary" because each line will actually contain 32 '0' or '1' ASCII characters rather than true binary.)

Your assembler should be a two-pass assembler. In the first pass (already implemented in pass1.c), the program parses each instruction to see whether it begins with a label. If it does, the program adds the label and the instruction's address to a table. For example, given the following assembly language code fragment, if the instruction labelled main is at address 0, then the first pass of the assembler would build the internal label table shown below.

Sample Assembler Input

main:   lw $a0, 0($t0)
begin:  addi $t0, $zero, 0   # beginning
        addi $t1, $zero, 1
loop:   slt $t2, $a0, $t1    # top of loop
        bne $t2, $zero, finish
        add $t0, $t0, $t1
        addi $t1, $t1, 2
        j loop               # bottom of loop
finish: add $v0, $t0, $zero

Label Table after Pass 1

Label     Address
main      0
begin     4
loop      12
finish    32
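pass1.c already builds this table, but the idea fits in a few lines. The sketch below is a simplified stand-in, not the provided pass1.c: it assumes each line holds exactly one instruction (as the project allows), treats a leading identifier followed by ':' as a label, and records the address 4 × line index. The names `LabelEntry` and `buildLabelTable` are hypothetical and do not match the Label Table project's functions.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_LABELS 100
#define MAX_LABEL_LEN 32

/* Hypothetical label-table entry; the Label Table project defines its own type. */
typedef struct
{
    char name[MAX_LABEL_LEN];
    int address;
} LabelEntry;

/* Scans each line for a leading "label:" and records label -> address.
 * Assumes one instruction per line, so line i is at address 4 * i.
 * Returns the number of labels found. */
int buildLabelTable(const char *lines[], int numLines, LabelEntry table[])
{
    assert(lines != NULL && table != NULL);
    int count = 0;
    for (int i = 0; i < numLines && count < MAX_LABELS; i++)
    {
        const char *p = lines[i];
        while (isspace((unsigned char)*p))      /* skip leading whitespace */
        {
            p++;
        }
        const char *start = p;
        while (isalnum((unsigned char)*p) || *p == '_')
        {
            p++;
        }
        size_t len = (size_t)(p - start);
        if (len > 0 && len < MAX_LABEL_LEN && *p == ':')
        {
            memcpy(table[count].name, start, len);
            table[count].name[len] = '\0';
            table[count].address = 4 * i;       /* each instruction is 4 bytes */
            count++;
        }
    }
    return count;
}
```

Run on the nine sample lines above, it should produce the same four entries as the table: main 0, begin 4, loop 12, finish 32.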

The second pass, which you will implement, does the actual translation from assembly to machine language. The machine language output should be in the same format as the input for your disassembler program, i.e., each line should contain 32 characters representing the 32 bits of a single machine instruction.


Sample Assembler Input

main:   lw $a0, 0($t0)
begin:  addi $t0, $zero, 0   # beginning
        addi $t1, $zero, 1
loop:   slt $t2, $a0, $t1    # top of loop
        bne $t2, $zero, finish
        add $t0, $t0, $t1
        addi $t1, $t1, 2
        j loop               # bottom of loop
finish: add $v0, $t0, $zero

Sample Assembler Output

10001101000001000000000000000000
00100000000010000000000000000000
00100000000010010000000000000001
00000000100010010101000000101010
00010101010000000000000000000011
00000001000010010100000000100000
00100001001010010000000000000010
00001000000000000000000000000011
00000001000000000001000000100000
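The only fields in this output that depend on the label table are the branch offset and the jump target. A beq/bne offset counts instructions from the instruction after the branch, so bne at address 16 targeting finish (32) gets (32 - (16 + 4)) / 4 = 3; a j target is the word address, so j loop becomes 12 / 4 = 3. Both appear as the final ...0011 of the fifth and eighth output lines. The helpers below are a hypothetical sketch of that arithmetic, not the provided printBranchOffset/printJumpTarget; the jump version assumes, as in these samples, that the program sits in low memory so the upper 4 bits of PC + 4 are zero and can simply be dropped.

```c
#include <assert.h>
#include <stdint.h>

/* Branch offset for beq/bne: the number of instructions from the
 * instruction after the branch (PC + 4) to the target.
 * Negative for backward branches. Both addresses must be multiples of 4. */
int32_t branchOffset(uint32_t branchAddr, uint32_t targetAddr)
{
    assert(branchAddr % 4 == 0 && targetAddr % 4 == 0);
    return ((int32_t)targetAddr - (int32_t)(branchAddr + 4)) / 4;
}

/* 26-bit J-format target field: the word address of the target.
 * Assumes the target shares the upper 4 bits of PC + 4 (true for the
 * low addresses used here); those bits are not stored in the instruction. */
uint32_t jumpTargetField(uint32_t targetAddr)
{
    assert(targetAddr % 4 == 0);
    return (targetAddr >> 2) & 0x03FFFFFFu;
}
```

A backward branch gives a negative offset: a hypothetical beq at address 28 that targets loop (12) yields (12 - 32) / 4 = -5, which must then be written as a 16-bit two's complement value.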

Behavioral Requirements:

Broadly speaking, your assembler should provide the reverse functionality of your disassembler from the Disassembler Project. You might expect, then, that the MIPS assembly language produced by your disassembler could be used directly as input to the assembler. This is almost true. The exceptions are the beq, bne, j, and jal instructions, which should use labels rather than addresses in the assembler's input. (These sample instructions come from the test file described in the Coding Tips below.)

Disassembler Output        Assembler Input with Labels instead of Addresses
bne $t2, $zero, 32         bne $t2, $zero, FINISH
j 12                       j LOOP

The output of your assembler should be a text file containing strings representing MIPS instructions, one per line. Actual MIPS instructions would be stored as 32-bit integers; instead, each line of your file will contain 32 characters ('0' or '1') representing one machine language MIPS instruction. This is the same format as the input to your disassembler. In fact, you should be able to use the output of your assembler as the input to your disassembler and get back the MIPS assembly language program you started with, except that the labels and comments will have disappeared and the labels used as operands in beq/bne/j/jal instructions will have been replaced by their corresponding addresses.
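Because the assembler's output must be readable by your disassembler, it can help to check each output line mechanically while testing. The helper below is a hypothetical sketch (not part of the provided files): it accepts a line only if it has exactly 32 '0'/'1' characters, ignoring a trailing "\n" or "\r\n", and converts it to the 32-bit word it represents, most significant bit first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Converts one line of assembler output ("pseudo-binary") into the 32-bit
 * word it represents. Returns false if the line is not exactly 32 '0'/'1'
 * characters; a trailing "\n" or "\r\n" is ignored. */
bool parsePseudoBinary(const char *line, uint32_t *word)
{
    assert(line != NULL && word != NULL);
    size_t len = strcspn(line, "\r\n");     /* length without the line ending */
    if (len != 32)
    {
        return false;
    }
    uint32_t value = 0;
    for (size_t i = 0; i < 32; i++)
    {
        if (line[i] != '0' && line[i] != '1')
        {
            return false;
        }
        value = (value << 1) | (uint32_t)(line[i] - '0');   /* MSB first */
    }
    *word = value;
    return true;
}
```

For instance, the first line of the sample output, 10001101000001000000000000000000, parses to 0x8D040000.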


Sample Assembler Input

main:   lw $a0, 0($t0)
begin:  addi $t0, $zero, 0   # beginning
        addi $t1, $zero, 1
loop:   slt $t2, $a0, $t1    # top of loop
        bne $t2, $zero, finish
        add $t0, $t0, $t1
        addi $t1, $t1, 2
        j loop               # bottom of loop
finish: add $v0, $t0, $zero

Sample Assembler Output

10001101000001000000000000000000
00100000000010000000000000000000
00100000000010010000000000000001
00000000100010010101000000101010
00010101010000000000000000000011
00000001000010010100000000100000
00100001001010010000000000000010
00001000000000000000000000000011
00000001000000000001000000100000

Sample Disassembler Output

lw $a0, 0($t0)
addi $t0, $zero, 0
addi $t1, $zero, 1
slt $t2, $a0, $t1
bne $t2, $zero, 32
add $t0, $t0, $t1
addi $t1, $t1, 2
j 12
add $v0, $t0, $zero

Your program should handle all of the instructions and registers in the MIPS Instructions Table and MIPS Registers Table I have provided online. You should be able to handle all 32 registers and all three instruction formats (R, I, and J), including all forms of addressing (for example, lw addresses, beq addresses, and j addresses). You may find Figures 2.1 (p. 78), 2.6 (p. 100), and 2.14 (p. 121) helpful in addition to the table I have provided. [Optional: You may wish to extend your program to handle the additional load and store instructions listed as part of the "Core Instruction Set" on the green MIPS Reference Data card that comes with the book.]
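The three formats differ only in how the 32 bits are divided into fields. As an illustration, the hypothetical helpers below pack the fields into a uint32_t. This is not how the provided processR/processIorJ are structured (they print fields one at a time through printAsBinary.c); the opcode and funct values used in the example (lw = 35, addi = 8, slt funct = 42) are the standard MIPS codes.

```c
#include <assert.h>
#include <stdint.h>

/* Field layouts, most significant bits first:
 *   R: op(6)=0 rs(5) rt(5) rd(5) shamt(5) funct(6)
 *   I: op(6) rs(5) rt(5) immediate(16)
 *   J: op(6) target(26)
 * Opcode, register, shamt, and funct values are range-checked with assert;
 * the immediate and target are masked to their field widths, so a negative
 * immediate is stored as 16-bit two's complement. */
uint32_t encodeR(unsigned rs, unsigned rt, unsigned rd, unsigned shamt, unsigned funct)
{
    assert(rs < 32 && rt < 32 && rd < 32 && shamt < 32 && funct < 64);
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct;
}

uint32_t encodeI(unsigned op, unsigned rs, unsigned rt, int32_t imm)
{
    assert(op < 64 && rs < 32 && rt < 32);
    return (op << 26) | (rs << 21) | (rt << 16) | ((uint32_t)imm & 0xFFFFu);
}

uint32_t encodeJ(unsigned op, uint32_t target)
{
    assert(op < 64);
    return (op << 26) | (target & 0x03FFFFFFu);
}
```

For example, encodeR(4, 9, 10, 0, 42) encodes slt $t2, $a0, $t1 as 0x0089502A, and encodeI(35, 8, 4, 0) encodes lw $a0, 0($t0) as 0x8D040000. Both are the corresponding sample-output lines written in hex.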

You may assume that every line contains an instruction, i.e., you can increment the program counter for every line. (This is helpful in determining the address associated with labels.)

You should handle all error conditions gracefully. In other words, an error condition should not cause your program to terminate unless it is an error that cannot be recovered from. Otherwise, your program should print a message indicating the type and location of the error and then continue as best it can (at the very least, with the next instruction).
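One way to meet this requirement is to have each check report the problem with its line number and then skip to the next instruction instead of calling exit. In the sketch below, lookupRegNbr is a hypothetical stand-in that mirrors the documented contract of getRegNbr (register number, or -1 if invalid) but recognizes only the standard symbolic names; your Registers Table may accept other forms. For brevity it checks just one register operand per line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Standard MIPS register names, indexed by register number. */
static const char *const REG_NAMES[32] =
{
    "$zero", "$at", "$v0", "$v1", "$a0", "$a1", "$a2", "$a3",
    "$t0", "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
    "$s0", "$s1", "$s2", "$s3", "$s4", "$s5", "$s6", "$s7",
    "$t8", "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra"
};

/* Hypothetical stand-in for getRegNbr: register number, or -1 if invalid. */
int lookupRegNbr(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < 32; i++)
    {
        if (strcmp(name, REG_NAMES[i]) == 0)
        {
            return i;
        }
    }
    return -1;
}

/* Checks one register operand per line. An invalid name is reported with its
 * (1-based) line number, and the loop continues with the next line instead
 * of terminating. Returns the number of errors found. */
int checkRegisterOperands(const char *regs[], int numLines, FILE *err)
{
    int errors = 0;
    for (int i = 0; i < numLines; i++)
    {
        if (lookupRegNbr(regs[i]) < 0)
        {
            fprintf(err, "Error on line %d: invalid register name '%s'\n", i + 1, regs[i]);
            errors++;
            continue;   /* recoverable: skip this instruction, keep going */
        }
        /* ... encode and print the instruction here ... */
    }
    return errors;
}
```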

You should develop at least one test file of your own for testing pass 2. You may also want to run your assembler's output through your own disassembler as an additional check.

Provided Files:

You should be able to use the Label Table functions you wrote for the Label Table Programming Project, as well as the Makefile, header files, print functions, and process_arguments function used in that program. In the Makefile, adjust the all: target to include the assembler program. If you create new files beyond the ones already listed in the assembler section of the Makefile, don't forget to add them to the list of dependencies and to the GCC compilation action.

I have provided additional code that you may use:

Main Function and Pass 1:
- pass1.c: Implements pass 1.
- testPass1.c: Tests pass1.c; can be used as a model for the assembler if you copy it and then uncomment the statements that rewind the file back to the beginning and call pass2. Note, though, that the documentation at the top of the file describes a test driver for pass1, not an assembler program, so that needs to be updated as well.
Pass 2: The following functions are used by Pass 2 to generate "pseudo-binary" machine language code corresponding to the assembly language input.
- pass2: (in pass2.c) This function is complete but calls the functions below, so you will need to edit it if you decide to replace them with functions with different names or parameters.
- getOpCode: (in instructionNames.c, implemented in the Encoding/Decoding Names project) Returns the I- or J-format opcode for the given instruction name, or 0 if the instruction is not a valid I- or J-format instruction.
- getFunctCode: (in instructionNames.c, implemented in the Encoding/Decoding Names project) Returns the R-format funct code for the given instruction name, or -1 if the instruction is not a valid R-format instruction.
- getRegNbr: (in registerNames.c, implemented in the Encoding/Decoding Names project) Returns the register number for the given register name, or -1 if the register name is not a valid register.
- processR: (stub in pass2.c) Designed to print R-format instructions in their machine code format using functions in printAsBinary.c.
- processIorJ: (stub in pass2.c) Designed to print I- and J-format instructions in their machine code format using functions in printAsBinary.c.
Parsing Input: Fully-implemented functions used by both pass1 and pass2 for parsing an instruction into individual tokens (syntactic units):
- getToken.c: Used by pass1, pass2, and getNTokens to read tokens — labels, instruction names, register names, integer constants, etc.
- getNTokens.c: Reads multiple tokens at a time; useful for pass2.
- getInstName.c: Finds the instruction name within the instruction; useful for pass2.
- testGetNTokens.c: Tests getNTokens and provides an illustration of how to use it.
Printing Output: The following functions or stub functions are in printAsBinary.c:
- printInt: (stub) Designed to print the binary version of a value. (The stub version prints the decimal equivalent, which is useful while developing and debugging the program, so you may want to keep that stub behavior until everything else is finished. If you do, you can compare your actual results against the expected results in `smallSampleTestfile.mips.decimal` as you code.)
- printReg: (stub) Designed to find the register number for the given register and print its binary value (or decimal value during debugging) using printInt.
- printSignedIntInString: Prints the value of the integer in a string (e.g., "23" or "-4") in binary format (or decimal format during debugging). Useful for printing the immediate value in many I-format instructions. Fully implemented.
- printUnsignedIntInString: Prints the value of an unsigned integer in a string (e.g., "23") in binary format (or decimal format during debugging). Useful for processing sll, srl, and lui instructions. Fully implemented.
- printJumpTarget: (stub) Designed to print an address for J-format instructions.
- printBranchOffset: (stub) Designed to print a branch offset for beq and bne instructions.
Test Files:
- smallSampleTestfile.mips: A sample input file for partially testing LabelTable, Pass 1, or a full Assembler.
- smallSampleTestfile.mips.out: The output that the final Assembler should produce if given smallSampleTestfile.mips.
- smallSampleTestfile.mips.decimal: The output that the Assembler should produce if given smallSampleTestfile.mips but in decimal form rather than binary; useful for development testing if you keep printInt in its original form until the last step (see more about printInt below).
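printInt and the print...IntInString helpers all come down to writing a value's low N bits, most significant bit first. Since the provided functions' exact signatures are not reproduced in this handout, the sketch below uses hypothetical names (formatBinary, formatSignedString) that fill a caller-supplied buffer instead of printing. Casting a negative value to uint32_t and keeping the low 16 bits yields the 16-bit two's complement form an I-format immediate needs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Writes the low numBits bits of value into buf as '0'/'1' characters,
 * most significant bit first, and NUL-terminates it.
 * buf must have room for numBits + 1 characters. */
char *formatBinary(char *buf, uint32_t value, int numBits)
{
    assert(buf != NULL && numBits >= 1 && numBits <= 32);
    for (int i = 0; i < numBits; i++)
    {
        int bit = numBits - 1 - i;              /* bit position for this character */
        buf[i] = ((value >> bit) & 1u) ? '1' : '0';
    }
    buf[numBits] = '\0';
    return buf;
}

/* Converts a decimal string such as "23" or "-4" and formats it in numBits
 * bits (two's complement for negatives). Returns NULL if the text is not a
 * whole integer, so the caller can report the error. This sketch does not
 * check that the value actually fits in numBits. */
char *formatSignedString(char *buf, const char *text, int numBits)
{
    char *endPtr;
    long value = strtol(text, &endPtr, 10);
    if (endPtr == text || *endPtr != '\0')
    {
        return NULL;
    }
    return formatBinary(buf, (uint32_t)value, numBits);
}
```

For example, formatSignedString(buf, "-4", 16) produces 1111111111111100, and formatBinary(buf, 35, 6) produces 100011, the lw opcode.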

Coding Tips:

If you set up a directory with the provided files and the appropriate files from the Label Table project, and if you make a copy of testPass1.c and call it assembler.c, you should be able to compile and run a starter version of the Assembler program. The initial output, though, will mostly be error messages and some starter information for add and j instructions since the most important functions are just stubs. As you implement each function you should be able to compile and run the program again, seeing incremental progress as you go (agile software development).

One approach to completing this project would be:

- Set up your directory with the provided files and the files from Label Table, including copying testPass1.c to assembler.c.
- Update the new assembler.c to rewind the file and call pass2. (And update the top-of-file comments!)
- Compile and run with the sample test file, with and without debugging:
```
   make
   ./assembler smallSampleTestfile.mips
   cat smallSampleTestfile.mips.decimal
   ./assembler smallSampleTestfile.mips 1
```
  Notice that the only instruction it recognizes is 'add', and it only partially implements that.
- Implement printReg in printAsBinary.c using functions you implemented in the Encoding/Decoding Names project. Compile and run to see what output you get.
- Find the appropriate code in processR and complete the processing of "normal" R-format instructions like add. Compile, run, and compare your output to smallSampleTestfile.mips.decimal.
- Add test cases to smallSampleTestfile.mips (or to your own copy of it) to test additional R-format instructions and registers.
- Add code to handle the various R-format special cases, testing your results as you go. (This may require new test cases.) Then add code to handle I-format and J-format instructions. You will need to implement printBranchOffset and printJumpTarget in printAsBinary.c when you get to the branch and jump instructions.
- When you are ready to complete printInt so that it prints integers in binary, you can follow the hints I have provided or reuse the code in printIntAsBinary.c from the PowersOfTwo assignment. I strongly recommend keeping the decimal output as a way of verifying the binary output until the very end. (You can keep it permanently if you convert it from printf to printDebug.)
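The R-format special cases differ mainly in which written operand lands in which field. Assuming your instruction table includes sll, srl, and jr (standard MIPS funct codes 0, 2, and 8), the hypothetical helpers below sketch the three operand layouts: normal arithmetic (rd, rs, rt), shifts (rd, rt, shift amount, with rs unused), and jr (rs only). They are an illustration, not a replacement for processR.

```c
#include <assert.h>
#include <stdint.h>

/* Packs R-format fields: op(6)=0 rs(5) rt(5) rd(5) shamt(5) funct(6). */
static uint32_t packR(unsigned rs, unsigned rt, unsigned rd, unsigned shamt, unsigned funct)
{
    assert(rs < 32 && rt < 32 && rd < 32 && shamt < 32 && funct < 64);
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct;
}

/* Normal R-format, e.g. add rd, rs, rt: shamt is 0. */
uint32_t encodeArith(unsigned funct, unsigned rd, unsigned rs, unsigned rt)
{
    return packR(rs, rt, rd, 0, funct);
}

/* Shifts, e.g. sll rd, rt, shamt: rs is unused (0) and the constant goes in shamt. */
uint32_t encodeShift(unsigned funct, unsigned rd, unsigned rt, unsigned shamt)
{
    return packR(0, rt, rd, shamt, funct);
}

/* jr rs: only rs is used; rt, rd, and shamt are all 0. */
uint32_t encodeJr(unsigned rs)
{
    return packR(rs, 0, 0, 0, 8);   /* funct 8 = jr */
}
```

With these, add $t0, $t0, $t1 (encodeArith(32, 8, 8, 9)) gives 0x01094020, matching the sixth sample-output line, and jr $ra (encodeJr(31)) gives 0x03E00008.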

Ensuring Quality

As specified in the syllabus, your program should adhere to the Kalamazoo College CS Program Style Guide and Documentation Standards, including use of the Braces Line Up style pattern. You may also use the associated template files: the function template file and the header template file.

To ensure that all function calls are syntactically correct (match the function definitions), you should include function declarations for all of your functions in one or more header files, and include the header file(s) in all appropriate C source files (*.c files).

The Makefile I have provided specifies a set of compiler options that will help you catch many errors at compile time. These options generate warnings about questionable constructions that often indicate programmer confusion or actual logic errors. You may have to make adjustments to the Makefile, though, if the specific options or option names for your compiler are somewhat different.

When your program is fully implemented, the smallSampleTestfile.mips input file should produce output equivalent to smallSampleTestfile.mips.out (which is a copy of the original smallSampleTestfile.input file provided as part of the Disassembler project). Note, though, that these two files only test some cases; they do not provide a thorough test suite.

Since comparing long strings of binary is difficult, you may want to use the Unix/Linux diff command to compare your output against smallSampleTestfile.mips.out. To do this, you would run your program and save your output to a file rather than print it to the screen:
    ./assembler smallSampleTestfile.mips > myOutput
    diff myOutput smallSampleTestfile.mips.out
If you are running Windows, the two files might have different line endings (carriage return and line feed vs. just line feed), so you may want to run them through stripCR first. For example,
    make stripCR
    ./stripCR smallSampleTestfile.mips.out > strippedSmallSample.out
    ./stripCR myOutput > strippedMyOutput.out
    diff strippedMyOutput.out strippedSmallSample.out

Submission Requirements:

Your submission should contain:

- All the source code for your assembler (including header file(s)).
- Your test input file(s) in MIPS assembly language.
- External documentation (e.g., a README.md file, man page, or other help file) that a new user could use to learn how (and why) to use your program. It should include a description of your program, some sample input and sample output (which need not be the same as your test file(s), since the point of the sample input/output is to support your description), and instructions on how to run the program. The Program Style Guide has a little more information on what should be included in external documentation.
- The output produced by your test input file(s).

Your program should also work with other input files that may be developed for consistent grading. (Note: It is a good idea to run make clean in the directory before submitting; this will remove the machine-specific executable and intermediate "object code" files, since your code will have to be re-compiled on my machine anyway.)

The old rubric for grading the Assembler Programming Project was based on the following general categories. The current rubric is similar, but adjusted to fewer points to match the general grading scheme this quarter.
  Compiles and runs                                            10 pts
  Correctness (satisfies requirements)                         70 pts
  Internal documentation and coding style                      10 pts
  External Documentation                                       10 pts
  Test Cases                                                   10 pts

  Total:                                                       110 pts