Programming Project: Assembler





You may work individually or in groups of 2 people to finish this project. I expect that the programming will be your group's effort and not the effort of other persons.


Assembler

Write a program that will read in MIPS assembly language and write out the corresponding "pseudo-binary" machine language instructions. ("Pseudo-binary" because each line will actually contain 32 '0' or '1' ASCII characters rather than true binary.)
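The "pseudo-binary" format can be illustrated with a small sketch. The helper name word_to_pseudo_binary is hypothetical (it is not one of the provided files), and the sketch assumes the instruction has already been assembled into a 32-bit unsigned integer.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the provided code.
 * Writes the 32 bits of `word` into `out` as '0'/'1' characters,
 * most significant bit first.  `out` must have room for 33 chars
 * (32 digits plus the terminating '\0').
 */
void word_to_pseudo_binary(uint32_t word, char out[33])
{
    for (int i = 0; i < 32; i++)
    {
        /* Bit 31 goes in out[0]; bit 0 goes in out[31]. */
        out[i] = ((word >> (31 - i)) & 1u) ? '1' : '0';
    }
    out[32] = '\0';
}
```

For example, the word 0x8D040000 (the encoding of lw $a0, 0($t0)) would be written as 10001101000001000000000000000000.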

Your assembler should be a two-pass assembler. In the first pass (already implemented in pass1.c), the program parses each instruction to see if it contains a label at the beginning of the instruction. If it does, the program adds the label and the instruction address to a table. For example, for the following assembly language code fragment, if the instruction labelled main were at address 0 then the first pass of the assembler would generate the following internal label table data structure.

Sample Assembler Input:

main:   lw $a0, 0($t0)
begin:  addi $t0, $zero, 0   # beginning
        addi $t1, $zero, 1
loop:   slt $t2, $a0, $t1    # top of loop
        bne $t2, $zero, finish
        add $t0, $t0, $t1
        addi $t1, $t1, 2
        j loop               # bottom of loop
finish: add $v0, $t0, $zero

Label Table after Pass 1:

Label        Address
main         0
begin        4
loop         12
finish       32

The second pass, which you will implement, does the actual translation from assembly to machine language. The machine language output should be in the same format as the input for your disassembler program, i.e., each line should contain 32 characters representing the 32 bits of a single machine instruction.

Sample Assembler Input:

main:   lw $a0, 0($t0)
begin:  addi $t0, $zero, 0   # beginning
        addi $t1, $zero, 1
loop:   slt $t2, $a0, $t1    # top of loop
        bne $t2, $zero, finish
        add $t0, $t0, $t1
        addi $t1, $t1, 2
        j loop               # bottom of loop
finish: add $v0, $t0, $zero

Sample Assembler Output:

10001101000001000000000000000000
00100000000010000000000000000000
00100000000010010000000000000001
00000000100010010101000000101010
00010101010000000000000000000011
00000001000010010100000000100000
00100001001010010000000000000010
00001000000000000000000000000011
00000001000000000001000000100000

Behavioral Requirements:

In general, your assembler should provide the reverse functionality of your disassembler from the Disassembler Project. It might seem that you could use the MIPS assembly language instructions produced by your disassembler as input to the assembler, and this is almost true. The exceptions are the beq, bne, j, and jal instructions, which should have labels instead of addresses in the assembly language input. (The following sample instructions come from the testfile described in the Coding Tips.)

Disassembler Output        Assembler Input with Labels instead of Addresses
bne $t2, $zero, 32         bne $t2, $zero, FINISH
j 12                       j LOOP
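
Turning a label's address into the field stored in the instruction can be sketched as follows. This is a minimal sketch, not the required implementation: the function names are hypothetical, and it assumes the standard MIPS conventions that label addresses are byte addresses, that a branch immediate is a word offset relative to the instruction after the branch, and that a jump target is the label's word address. These assumptions agree with the sample output in this document (bne at address 16 branching to finish at 32 stores 3; j loop with loop at 12 stores 3).

```c
#include <stdint.h>

/* Hypothetical helper: 16-bit immediate for beq/bne.
 * The offset is counted in words from the instruction that follows
 * the branch (branch_addr + 4), so it may be negative for backward
 * branches; it is returned as a 16-bit two's complement pattern.
 */
uint16_t branch_immediate(uint32_t branch_addr, uint32_t label_addr)
{
    int32_t offset = ((int32_t)label_addr - (int32_t)(branch_addr + 4)) / 4;
    return (uint16_t)(offset & 0xFFFF);
}

/* Hypothetical helper: 26-bit target field for j/jal.
 * The byte address is divided by 4 and truncated to 26 bits.
 */
uint32_t jump_target(uint32_t label_addr)
{
    return (label_addr >> 2) & 0x03FFFFFFu;
}
```

A backward branch, such as one at address 28 branching to address 12, would give an offset of -5 words, stored as the bit pattern 0xFFFB.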

The output of your assembler should be a text file containing strings representing MIPS instructions, one per line. Actual MIPS instructions would be stored in 32-bit integers; instead, your file will contain lines of 32 characters ('0' or '1'), where each line represents a machine language MIPS instruction. This is the same format as the input for your disassembler. In fact, you should be able to use the output of your assembler as the input to your disassembler and get back the MIPS assembly language program that you started with, except that the labels and comments in the original assembly language program will have disappeared and labels in instructions will have been replaced by their corresponding addresses in beq/bne/j/jal instructions.

Sample Assembler Input:

main:   lw $a0, 0($t0)
begin:  addi $t0, $zero, 0   # beginning
        addi $t1, $zero, 1
loop:   slt $t2, $a0, $t1    # top of loop
        bne $t2, $zero, finish
        add $t0, $t0, $t1
        addi $t1, $t1, 2
        j loop               # bottom of loop
finish: add $v0, $t0, $zero

Sample Assembler Output:

10001101000001000000000000000000
00100000000010000000000000000000
00100000000010010000000000000001
00000000100010010101000000101010
00010101010000000000000000000011
00000001000010010100000000100000
00100001001010010000000000000010
00001000000000000000000000000011
00000001000000000001000000100000

Sample Disassembler Output:

lw $a0, 0($t0)
addi $t0, $zero, 0
addi $t1, $zero, 1
slt $t2, $a0, $t1
bne $t2, $zero, 32
add $t0, $t0, $t1
addi $t1, $t1, 2
j 12
add $v0, $t0, $zero

Your program should handle all of the instructions and registers in the MIPS Instructions Table and MIPS Registers Table I have provided online. You should be able to handle all 32 registers and all three instruction formats (R, I, and J), including all forms of addressing (for example, lw addresses, beq addresses, and j addresses). You may find Figures 2.1 (p. 78), 2.6 (p. 100), and 2.14 (p. 121) helpful in addition to the table I have provided. [Optional: You may wish to extend your program to handle the additional load and store instructions listed as part of the "Core Instruction Set" on the green MIPS Reference Data card that comes with the book.]
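
As an illustration of the three instruction formats, the field packing can be sketched with shifts and masks. The function names below are hypothetical and this is only one possible approach, assuming the standard MIPS field layout (R: opcode 0, rs, rt, rd, shamt, funct; I: opcode, rs, rt, 16-bit immediate; J: opcode, 26-bit target). The opcode and funct values themselves should come from the MIPS Instructions Table.

```c
#include <stdint.h>

/* Hypothetical helper: R-format word.  The opcode field is 0. */
uint32_t encode_r(unsigned rs, unsigned rt, unsigned rd,
                  unsigned shamt, unsigned funct)
{
    return ((uint32_t)(rs & 0x1Fu) << 21)
         | ((uint32_t)(rt & 0x1Fu) << 16)
         | ((uint32_t)(rd & 0x1Fu) << 11)
         | ((uint32_t)(shamt & 0x1Fu) << 6)
         | (uint32_t)(funct & 0x3Fu);
}

/* Hypothetical helper: I-format word.  A negative immediate is kept
 * as its low 16 bits (two's complement). */
uint32_t encode_i(unsigned opcode, unsigned rs, unsigned rt, int immediate)
{
    return ((uint32_t)(opcode & 0x3Fu) << 26)
         | ((uint32_t)(rs & 0x1Fu) << 21)
         | ((uint32_t)(rt & 0x1Fu) << 16)
         | ((uint32_t)immediate & 0xFFFFu);
}

/* Hypothetical helper: J-format word with a 26-bit target field. */
uint32_t encode_j(unsigned opcode, uint32_t target)
{
    return ((uint32_t)(opcode & 0x3Fu) << 26) | (target & 0x03FFFFFFu);
}
```

Checked against the sample output: encode_r(4, 9, 10, 0, 0x2A) gives 0x0089502A (slt $t2, $a0, $t1), encode_i(0x23, 8, 4, 0) gives 0x8D040000 (lw $a0, 0($t0)), and encode_j(0x02, 3) gives 0x08000003 (j loop).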

You may assume that every line contains an instruction, i.e., you can increment the program counter for every line. (This is helpful in determining the address associated with labels.)

You should handle all error conditions gracefully. In other words, an error condition should not cause your program to terminate unless it is an error that cannot be recovered from. Otherwise, your program should print a message indicating the type and location of the error and then continue as best it can (at the very least, with the next instruction).
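
One way to report an error and keep going is sketched below for the case of an unrecognized register name. The function names and the message wording are hypothetical, and the provided print functions may be a better fit for actual error output. The register names follow the usual MIPS conventions; the MIPS Registers Table remains the authoritative list.

```c
#include <stdio.h>
#include <string.h>

/* Conventional MIPS register names, indexed by register number. */
static const char *REG_NAMES[32] =
{
    "$zero", "$at", "$v0", "$v1", "$a0", "$a1", "$a2", "$a3",
    "$t0", "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
    "$s0", "$s1", "$s2", "$s3", "$s4", "$s5", "$s6", "$s7",
    "$t8", "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra"
};

/* Hypothetical helper: returns the register number (0..31) for
 * `name`, or -1 if the name is not recognized. */
int register_number(const char *name)
{
    for (int i = 0; i < 32; i++)
    {
        if (strcmp(name, REG_NAMES[i]) == 0)
        {
            return i;
        }
    }
    return -1;
}

/* Hypothetical helper: looks up `name` and, on failure, prints the
 * type and location of the error to stderr.  It returns -1 rather
 * than exiting, so the caller can skip this instruction and
 * continue with the next line. */
int check_register(const char *name, int line_num)
{
    int reg = register_number(name);
    if (reg < 0)
    {
        fprintf(stderr, "Error on line %d: unknown register \"%s\"\n",
                line_num, name);
    }
    return reg;
}
```

A caller that receives -1 would abandon the current instruction and move on to the next line instead of terminating the program.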

You should develop a test file of your own (or more than one) for testing pass 2. You may also want to test it using your own disassembler.

Provided Files:

You should be able to use the Label Table functions you wrote for the Label Table Programming Project, as well as the Makefile, header files, print functions, and process_arguments function used in that program. In the Makefile, adjust the all: target to include the assembler program. If you create new files beyond the ones already listed in the assembler section of the Makefile, don't forget to add them to the list of dependencies and to the GCC compilation action.

I have provided additional code that you may use:

Coding Tips:

If you set up a directory with the provided files and the appropriate files from the Label Table project, and if you make a copy of testPass1.c and call it assembler.c, you should be able to compile and run a starter version of the Assembler program. The initial output, though, will mostly be error messages and some starter information for add and j instructions since the most important functions are just stubs. As you implement each function you should be able to compile and run the program again, seeing incremental progress as you go (agile software development).

One approach to completing this project would be:

Ensuring Quality

As specified in the syllabus, your program should adhere to the Kalamazoo College CS Program Style Guide and Documentation Standards, including use of the Braces Line Up style pattern. You may also use the associated template files: the function template file and the header template file.

To ensure that all function calls are syntactically correct (match the function definitions), you should include function declarations for all of your functions in one or more header files, and include the header file(s) in all appropriate C source files (*.c files).

The Makefile I have provided specifies a set of compiler options that will help you catch many errors at compile time. These options generate warnings about questionable constructions that often indicate programmer confusion or actual logic errors. You may have to make adjustments to the Makefile, though, if the specific options or option names for your compiler are somewhat different.

When your program is fully implemented, the smallSampleTestfile.mips input file should produce output equivalent to smallSampleTestfile.mips.out (which is a copy of the original smallSampleTestfile.input file provided as part of the Disassembler project). Note, though, that these two files only test some cases; they do not provide a thorough test suite.

Since comparing long strings of binary is difficult, you may want to use the Unix/Linux diff command to compare your output against smallSampleTestfile.mips.out. To do this, you would run your program and save your output to a file rather than print it to the screen:
    ./assembler smallSampleTestfile.mips > myOutput
    diff myOutput smallSampleTestfile.mips.out
If you are running Windows, the two files might have different line endings (carriage return and line feed vs. just line feed), so you may want to run them through stripCR first. For example,
    make stripCR
    ./stripCR smallSampleTestfile.mips.out > strippedSmallSample.out
    ./stripCR myOutput > strippedMyOutput.out
    diff strippedMyOutput.out strippedSmallSample.out

Submission Requirements:

Your submission should contain:

Your program should also work with other input files that may be developed for consistent grading. (Note: It is a good idea to run make clean in the directory before submitting; this will remove the machine-specific executable and intermediate "object code" files, since your code will have to be re-compiled on my machine anyway.)

The old rubric for grading the Assembler Programming Project was based on the following general categories. The current rubric is similar, but adjusted to fewer points to match the general grading scheme this quarter.

  Compiles and runs                                            10 pts
  Correctness (satisfies requirements)                         70 pts
  Internal documentation and coding style                      10 pts
  External Documentation                                       10 pts
  Test Cases                                                   10 pts

  Total:                                                       110 pts