Programming Project: Assembler

You may work individually or in groups of two on this project. I expect the programming to be your own group's effort and not the effort of anyone else.


Write a program that will read in MIPS assembly language and write out the corresponding machine language instructions.

Behavioral Requirements:

In general, your assembler should reverse what your disassembler from Programming Project #2 does. It might seem that you could use the MIPS assembly language instructions produced by your disassembler as input to the assembler, and this is almost true. The exceptions are the beq, bne, j, and jal instructions, which should have labels instead of addresses in the assembly language input. (The sample instructions below come from the test file described in the Coding Tips.)

  Disassembler Output          Assembler Input
  bne $t2, $zero, 32           bne $t2, $zero, FINISH
  j 12                         j LOOP

The output of your assembler should be a text file containing strings representing MIPS instructions, one per line. Actual MIPS instructions would be stored in 32-bit integers; instead, your file will contain lines of 32 characters ('0' or '1'), where each line represents a machine language MIPS instruction. This is the same format as the input for your disassembler. In fact, you should be able to use the output of your assembler as the input to your disassembler and get back the MIPS assembly language program that you started with, except that the labels in the original assembly language program will have disappeared and been replaced by their corresponding addresses in beq/bne/j/jal instructions.

Your program should handle all of the instructions and registers in the MIPS Instructions Table and MIPS Registers Table I have provided online. You should be able to handle all 32 registers and all three instruction formats (R, I, and J), including all forms of addressing (for example, lw addresses, beq addresses, and j addresses). You may find Figures 2.1 (p. 78), 2.6 (p. 100), and 2.14 (p. 121) helpful in addition to the table I have provided. [Optional: You may wish to extend your program to handle the additional load and store instructions listed as part of the "Core Instruction Set" on the green MIPS Reference Data card that comes with the book.]
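As a rough illustration of handling all 32 registers, the sketch below maps a register name to its number with a simple table lookup. The function name regNumber and the array REG_NAMES are hypothetical, and the numbering shown is the conventional MIPS one; the MIPS Registers Table provided for the course remains the authority.

```c
#include <string.h>

/* Conventional MIPS register names, indexed by register number.
 * (Assumption: this matches the course's MIPS Registers Table.) */
static const char *REG_NAMES[32] =
{
    "$zero", "$at", "$v0", "$v1", "$a0", "$a1", "$a2", "$a3",
    "$t0",   "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
    "$s0",   "$s1", "$s2", "$s3", "$s4", "$s5", "$s6", "$s7",
    "$t8",   "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra"
};

/* Hypothetical helper: return the register number for name,
 * or -1 if the name is not a known register. */
int regNumber(const char *name)
{
    int i;

    for (i = 0; i < 32; i++)
    {
        if (strcmp(name, REG_NAMES[i]) == 0)
        {
            return i;
        }
    }
    return -1;
}
```

Returning -1 for an unknown name lets the caller report the error and move on to the next instruction rather than aborting.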

Handling Labels:

Your assembler should be a two-pass assembler. In the first pass, the program parses each instruction to see if it contains a label at the beginning of the instruction. If it does, the program adds the label and the instruction address to a table.
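The first pass might look roughly like the sketch below. It does not use the Label Table functions from the earlier project; SketchTable, recordLabel, findLabel, and pass1 are hypothetical stand-ins. It also assumes that a label is written as a name followed immediately by a colon at the start of the line, and that addresses start at 0 and grow by 4 per line.

```c
#include <string.h>

#define MAX_LABELS    64
#define MAX_LABEL_LEN 32

typedef struct
{
    char name[MAX_LABEL_LEN];
    int  address;
} SketchLabel;

typedef struct
{
    SketchLabel entries[MAX_LABELS];
    int         count;
} SketchTable;

/* If line begins with "NAME:", store NAME and address in the table.
 * Returns 1 if a label was stored, 0 if the line has no leading label,
 * and -1 if the label is too long or the table is full. */
int recordLabel(SketchTable *table, const char *line, int address)
{
    const char *start = line + strspn(line, " \t");
    size_t      len   = strcspn(start, " \t:");

    if (len == 0 || start[len] != ':')
    {
        return 0;
    }
    if (len >= MAX_LABEL_LEN || table->count >= MAX_LABELS)
    {
        return -1;
    }
    memcpy(table->entries[table->count].name, start, len);
    table->entries[table->count].name[len] = '\0';
    table->entries[table->count].address = address;
    table->count++;
    return 1;
}

/* Return the address stored for name, or -1 if it is not in the table. */
int findLabel(const SketchTable *table, const char *name)
{
    int i;

    for (i = 0; i < table->count; i++)
    {
        if (strcmp(table->entries[i].name, name) == 0)
        {
            return table->entries[i].address;
        }
    }
    return -1;
}

/* Pass 1: every line holds one instruction, so the address advances
 * by 4 for each line whether or not it carries a label. */
void pass1(SketchTable *table, const char *lines[], int numLines)
{
    int pc = 0;
    int i;

    table->count = 0;
    for (i = 0; i < numLines; i++)
    {
        recordLabel(table, lines[i], pc);
        pc += 4;
    }
}
```

With this layout, a label on the fourth line of the input is recorded at address 12.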

The second pass does the actual translation from assembly to machine language. The machine language output should be in the same format as the input for your disassembler program, i.e., each line should contain 32 characters representing the 32 bits of a single machine instruction.
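The translation step in pass 2 comes down to packing fields into a 32-bit word. The sketch below shows one way to do that for the three formats, using the standard MIPS field widths. The function names are hypothetical. The branch and jump conventions shown (branch offset counted in words from the instruction after the branch, jump target stored as the address divided by 4) are the standard MIPS ones and are an assumption here; check them against the MIPS Instructions Table and your disassembler before relying on them.

```c
/* R format: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6) */
unsigned int encodeR(unsigned int opcode, unsigned int rs, unsigned int rt,
                     unsigned int rd, unsigned int shamt, unsigned int funct)
{
    return ((opcode & 0x3Fu) << 26) | ((rs & 0x1Fu) << 21)
         | ((rt & 0x1Fu) << 16)     | ((rd & 0x1Fu) << 11)
         | ((shamt & 0x1Fu) << 6)   | (funct & 0x3Fu);
}

/* I format: opcode(6) rs(5) rt(5) immediate(16).
 * A negative immediate is kept as its low 16 two's-complement bits. */
unsigned int encodeI(unsigned int opcode, unsigned int rs, unsigned int rt,
                     int immediate)
{
    return ((opcode & 0x3Fu) << 26) | ((rs & 0x1Fu) << 21)
         | ((rt & 0x1Fu) << 16)     | ((unsigned int) immediate & 0xFFFFu);
}

/* J format: opcode(6) target(26).
 * Assumption: the target field holds the byte address divided by 4. */
unsigned int encodeJ(unsigned int opcode, unsigned int address)
{
    return ((opcode & 0x3Fu) << 26) | ((address >> 2) & 0x3FFFFFFu);
}

/* Assumption: standard MIPS branch offset, counted in instructions
 * relative to the instruction that follows the branch. */
int branchOffset(int pc, int target)
{
    return (target - (pc + 4)) / 4;
}
```

For example, add $t0, $t1, $t2 corresponds to encodeR(0, 9, 10, 8, 0, 0x20), which yields 0x012A4020. Written out as 32 characters, that is 00000001001010100100000000100000.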

Coding Tips:

You should be able to use the Label Table functions you wrote for the Label Table Programming Project, as well as the Makefile, header files, print functions, and process_arguments function used in that program. In the Makefile, add the test drivers and assembler program for this project as additional targets for "all:" and add dependencies and compilation actions for any new files you create.

I have provided some additional code for the first pass that you may use:

If you wish, you may use the structure chart on this page to guide you in your design of pass 2, or you may develop your own design. You may wish to use getNTokens in your code-generating functions (e.g., assembleR, etc.) rather than calling getToken repeatedly, as the structure chart indicates. The testPass1.c test driver and smallSampleTestfile.mips input file were used to test pass 1 and might be useful for testing pass 2.

I have also provided some hints on writing a printBin(int number, int numBits) function (to print a value in binary format).
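One possible shape for printBin is sketched below; it is not the provided hint, only an illustration. The helper toBinString is hypothetical and exists so that the conversion can be checked separately from the printing. The value is converted to unsigned before shifting so that negative numbers come out in two's-complement form.

```c
#include <stdio.h>

/* Hypothetical helper: write the low numBits bits of number into buf
 * as '0'/'1' characters, most significant bit first.
 * buf must have room for numBits + 1 characters; numBits must be 0..32. */
void toBinString(int number, int numBits, char *buf)
{
    unsigned int value = (unsigned int) number;
    int i;

    for (i = 0; i < numBits; i++)
    {
        int bitPos = numBits - 1 - i;
        buf[i] = ((value >> bitPos) & 1u) ? '1' : '0';
    }
    buf[numBits] = '\0';
}

/* Print the low numBits bits of number in binary, with no newline,
 * so that several fields can be printed side by side on one line. */
void printBin(int number, int numBits)
{
    char buf[33];

    if (numBits < 0 || numBits > 32)
    {
        return;
    }
    toBinString(number, numBits, buf);
    printf("%s", buf);
}
```

Printing each field with its own width (6 bits for the opcode, 5 for each register, and so on) and ending with a newline produces one 32-character line per instruction.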

Note: Parsing loosely formatted input to separate it into meaningful syntactic units ("tokens") is a non-trivial task. The standard C string library includes the function strtok to help with this process, but it is not an easy function to understand or use. The getToken and getNTokens functions provide a somewhat simpler interface to strtok for this project.
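For reference, this is roughly what using strtok directly looks like. The sketch does not show getToken or getNTokens; splitTokens is a hypothetical function, and the delimiter set (spaces, tabs, commas, parentheses, newlines) is an assumption about what separates tokens in the input.

```c
#include <string.h>

/* Hypothetical helper: split line into at most maxTokens tokens.
 * strtok overwrites delimiters in line with '\0', so line must be a
 * writable array, not a string literal. Returns the number of tokens. */
int splitTokens(char *line, char *tokens[], int maxTokens)
{
    const char *delims = " \t,()\n";
    int   count = 0;
    char *tok   = strtok(line, delims);   /* first call passes the string */

    while (tok != NULL && count < maxTokens)
    {
        tokens[count] = tok;
        count++;
        tok = strtok(NULL, delims);       /* later calls pass NULL */
    }
    return count;
}
```

Treating parentheses as delimiters means that lw $t3, 4($t0) splits into four tokens: lw, $t3, 4, and $t0.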

You may assume that every line contains an instruction, i.e., you can increment the program counter for every line. (This is helpful in determining the address associated with labels.)

You should handle all error conditions gracefully. In other words, an error condition should not cause your program to terminate unless it is an error that cannot be recovered from. Otherwise, your program should print a message indicating the type and location of the error and then continue as best it can (at the very least, with the next instruction).
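The report-and-continue pattern described above could be sketched as follows. Both function names are hypothetical, and translateLine is only a stand-in that recognizes two mnemonics so the example stays short; a real translator would return an error code for each kind of problem it detects.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the real per-line translator: it accepts
 * only lines that begin with "add " or "sub ". Returns 0 on success. */
static int translateLine(const char *line)
{
    if (strncmp(line, "add ", 4) == 0 || strncmp(line, "sub ", 4) == 0)
    {
        return 0;
    }
    return -1;
}

/* Translate every line; on an error, report the line number and text
 * on stderr and carry on with the next line. Returns the error count. */
int assembleAll(const char *lines[], int numLines)
{
    int errors = 0;
    int i;

    for (i = 0; i < numLines; i++)
    {
        if (translateLine(lines[i]) != 0)
        {
            fprintf(stderr, "Error, line %d: cannot translate \"%s\"\n",
                    i + 1, lines[i]);
            errors++;
            continue;
        }
    }
    return errors;
}
```

Sending messages to stderr keeps them out of the machine language output, which stays one instruction per line.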

You should develop a test file of your own (or more than one) for testing pass 2. You may also want to test it using your own disassembler.

Ensuring Quality:

As specified in the syllabus, your program should adhere to the Kalamazoo College CS Program Style Guide and Documentation Standards, including use of the Braces Line Up style pattern. You may also use the associated template files: the function template file and the header template file.

To ensure that all function calls are syntactically correct (match the function definitions), you should include function declarations for all of your functions in one or more header files, and include the header file(s) in all appropriate C source files (*.c files).

The Makefile I have provided specifies a set of compiler options that will help you catch many errors at compile time. These options generate warnings about questionable constructions that often indicate programmer confusion or actual logic errors. You may have to make adjustments to the Makefile, though, if the specific options or option names for your compiler are somewhat different.

Submission Requirements:

Your submission should contain:

Your program should also work with other input files that may be developed for consistent grading. (Note: It is a good idea to run make clean in the directory before submitting; this will remove the machine-specific executable and intermediate "object code" files, since your code will have to be re-compiled on my machine anyway.)

The rubric for grading the Assembler Programming Project will be based roughly on the following.

  Compiles and runs                                            10 pts
  Correctness (satisfies requirements)                         70 pts
  Internal documentation and coding style                      10 pts
  External Documentation                                       10 pts
  Test Cases                                                   10 pts

  Total:                                                       110 pts