601.229 (S20): Assignment 2: Postfix calculator

Due: Wednesday, February 26th Tuesday, March 3rd by 11pm

Update 2/10: corrected a typo in the example cPostfixCalc invocations

Update 2/13: added explanatory comments for suggested functions

Update 2/16: Corrected assignment skeleton to make sure -g compiler option is used to enable debugging

Update 2/20: Added link to gentest.rb script; see Task 3

Update 2/21: Added link to idiv.zip, a program demonstrating how to do 64-bit signed division

Overview

In this assignment, you will implement a postfix calculator program in x86-64 assembly language.

The grading breakdown is:

This is a challenging assignment. Don’t wait until the last minute to start it! As usual, ask questions using Piazza, come to office hours, etc.

Important: When you write assembly code, you must actually write assembly code. Using the compiler to generate assembly code is not allowed. In addition, we expect highly detailed code comments; in assembly lanuage programs, it’s not unusual for every line of code to be commented. See Task 4 for more details.

When you are done with this assignment you will have proved yourself capable of writing nontrivial x86-64 assembly code. This is a foundational skill for hacking on operating systems and compilers, understanding security vulnerabilities such as buffer overflows, and generally becoming one with the machine.

Getting started

Get started by downloading csf_assign02.zip and extracting it using the unzip command.

You can download this file from a Linux command prompt using the curl command:

curl -O https://jhucsf.github.io/spring2020/assign/csf_assign02.zip

Note that in the -O option, it is the letter “O”, not the numeral “0”.

Update 2/16: The Makefile in the original version of the assignment skeleton did not enable debugging support. If you are using the original assignment skeleton, please edit your Makefile and change the CFLAGS, ASMFLAGS, and LDFLAGS definitions as follows:

CFLAGS = -Og -g -Wall -no-pie
ASMFLAGS = -g -no-pie
LDFLAGS = -g -no-pie

Postfix arithmetic

Normally when we express arithmetic we use infix notation, where binary operators (such as +, -, etc.) are placed between their operands. For example, to expression the addition of the operands 2 and 3, we would write

2 + 3

In postfix notation the operator comes after the operands, so adding 2 and 3 would be written

2 3 +

Postfix notation has some advantages over infix notation: for example, parentheses are not necessary, since the order of operations is never ambiguous. Postfix notation is also called Reverse Polish notation. It is famously used in HP calculators and programming languages such as Forth and PostScript.

Evaluating postfix expressions using a program is very simple. The program maintains a stack of values (initially empty). The items (operands and operators) in the expression are processed in order. Operands (e.g., literal values) are pushed onto the stack. When an operator is encountered, its operand values are popped from the stack, the operator is applied to the operand values, and the result value is pushed onto the stack. For example, consider the postfix expression

3 4 5 + *

This expression would be evaluated as follows:

Item Action
3 Push 3 onto stack
4 Push 4 onto stack
5 Push 5 onto stack
+ Pop operands 5 and 4, add them, push sum 9
* Pop operands 9 and 3, multiply them, push product 27

When any valid postfix expression is evaluated, the stack will have a single result value when the end of the expression is reached. Also, valid postfix expressions will guarantee that operand values are always available on the stack when an operator is processed. Here are some examples of invalid postfix expressions:

Expression Why invalid
10 2 - * Only one operand value is on stack when operator * is processed
2 3 + 4 Two operand values are on stack when end of expression is reached

Tasks

This section explains the specific tasks that you will need to complete.

Note: these tasks aren’t meant to be done strictly sequentially. For example, Tasks 1 and 2 can and should be done at the same time. Tasks 1–3 should probably be done before Tasks 4 and 5, though.

Task 1: Postfix calculator in C

The first task is to implement a C version of the postfix calculator. The C version should take a single command line argument specifying a postfix expression, and (assuming the postfix expression is valid) print a single line of output of the form

Result is: N

where N is the result of evaluating the expression.

Requirements and specifications

If the expression is invalid, or if the maximum stack depth is exceeded, the program must print a single line of the form

Error: msg

where msg is a message describing the error. The program should also exit immediately with an exit code of 1 if an error occurs (after printing the error message).

Compiling and running the program

The source code for the C version of the program is in two files, cPostfixCalcMain.c and cPostfixCalcFuncs.. There is also a header file cPostfixCalc.h, which you should use for definitions and function prototypes. The main function should be in cPostfixCalcMain.c; all other functions should be in cPostfixCalcFuncs.c.

Running the command make cPostfixCalc will compile the two source files and link them into an executable cPostfixCalc. Here are some example invocations you can try from the command line:

Invocation Expected output
./cPostfixCalc '1 1 +' Result is: 2
./cPostfixCalc '7 2 -' Result is: 5
./cPostfixCalc '3 4 5 + *' Result is: 27
./cPostfixCalc '17 3 /' Result is: 5
./cPostfixCalc '3 10 /' Result is: 0
./cPostfixCalc '2 3 4 5 +-*' Result is: -12
./cPostfixCalc '10 2 - *' Error: ...
./cPostfixCalc '2 3 + 4' Error: ...

Note that in the cases where the postfix expression is invalid, the specific output text following Error: is not mandated — just have the program print something descriptive of the error. For example, in the case of the invocation

./cPostfixCalc '2 3 + 4'

the full error message might be something like Error: multiple values left on stack.

Hints for Task 1

Think about the C version as a template for the assembly version. The purpose of the C version of the program is to help you think about the problem and discover a good way to decompose it into functions. When you write the assembly language version of the program, you can hand-translate each C function into an equivalent assembly language function.

Make simplifying assumptions. Since you’re using the C version of the program as a template for the assembly version, you should try to write your C functions in a way that will be easy to duplicate in assembly language. For example, use the long data type (64 bit signed integer) for all data values except for the characters in the expression string. Use pointers as necessary to allow a called function to modify a variable whose address is passed as an argument.

Here is a suggested set of functions to write, which will be reasonably straightforward to translate to assembly language:

// Determine whether character code c is a space or tab
long isSpace(long c);
// Determine whether character code c is a digit ('0'-'9')
long isDigit(long c);
// Skip any leading space characters in the string s, returning a
// pointer to the first non-whitespace character (or the NUL terminator
// if there are no remaining non-whitespace characters)
const char *skipws(const char *s);
// Return the type (integer literal, +, -, *, or /) of the next
// token in string s
long tokenType(const char *s);
// Parse an integer literal, storing the integer value in the
// variable pointed to by pVal; returns pointer to first character
// in the string past the integer
const char *consumeInt(const char *s, long *pVal);
// Push integer value onto the stack (variable pCount points to
// is the number of values on the stack)
void stackPush(long stack[], long *pCount, long val);
// Pop integer value from the stack (variable pCount points to
// is the number of values on the stack)
long stackPop(long stack[], long *pCount);
// Evaluate given postfix expression, returning the result of
// the evaluation
long eval(const char *expr);

Note that you are not required to implement these exact functions, but you may if you wish.

Task 2: Unit tests for C postfix calculator

As you develop each function in the C postfix calculator program, write unit tests for it. The source file cTests.c is a starting point for writing unit tests. The unit test code will use the same unit testing framework as Assignment 1.

To compile the C version of the unit tests, run the command make cTests. To run the tests, run the command

./cTests

Hints for Task 2

Test thoroughly! Try to exercise the corner cases in your code.

In addition to testing valid inputs, you should also test invalid inputs. One challenge for testing invalid inputs is that the correct program behavior is exiting with an exit code of 1: however, you don’t want the test program to actually exit when an invalid input is tested, since that would cause the test program to exit! The file cTests.c has support for redirecting calls to the exit function so that they return control to your test function, rather than exiting the program. Let’s say, for example, that you want to test that a function called eval properly calls exit when given an invalid postfix expression. The test function should set the exitExpected variable to a nonzero value, and then use sigsetjmp and siglongjmp as follows:

expectedExit = 1; /* about to test code that is supposed to call exit */

if (sigsetjmp(exitBuf, 1) == 0) {
  eval("2 3 + 4");    /* invalid postfix expression */
  FAIL("eval function failed to exit for invalid expression");
} else {
  printf("Good, eval properly called exit for invalid expression...");
}

This approach works because cTests.c has its own version of the exit function which uses siglongjmp to transfer control back to the call to sigsetjmp but return with a nonzero return code. The sigsetjmp and siglongjmp functions are essentially primitive forms of try/catch and throw (respectively). You can read more about these functions in Chapter 8 of the textbook. (Note that the FAIL macro, which causes a unit test to immediately fail, also takes advantage of sigsetjmp/siglongjmp.)

Task 3: System-level tests

In addition to writing unit tests to test the individual functions in the C version of the postfix calculator, program, you should also write “system”-level tests to test the overall program on various inputs. Add these tests to sysTests.sh. This script supports two kinds of tests:

A few example tests are provided:

expect 5 '2 3 +'
expect 42 '6 7 *'
expect 42 '6 6 6 6 6 6 6 + + + + + +'
expect_error '2 2'
expect_error '1 *'

You should add your own tests. As with the unit tests, try to think of corner cases in your code, and add tests to exercise them.

To run the system tests on your C postfix calculator implementation, run the command

./sysTests.sh ./cPostfixCalc

Update 2/20: To help you generate some interesting tests, you may use the following Ruby script: gentest.rb. Example invocation:

$ ruby gentest.rb
expect 109 '4 15 7 * +'

Please note that these automatically-generated tests are not a substitute for writing your own tests. For example, the script only generates valid postfix expressions, but you should also write tests for various kinds of invalid expressions.

Task 4: Postfix calculator in assembly

In this task, you will write an x86-64 assembly language version of the postfix calculator. Use the source files asmPostfixCalcMain.S and asmPostfixCalcFuncs.S for the main function and implementation functions, respectively. Follow the design of the C implementation you wrote in Task 1.

Note that the .S file extension means “preprocessed assembly”, so you can use C-style comments in your assembly code. You can also use #define to define named constants, just as you would in a C program.

To assemble and link the assembly language program, run the command make asmPostfixCalc. Run it exactly the same way as the C version, e.g.

./asmPostfixCalc '1 2 +'

Writing assembly code

The purpose of this assignment is for you to learn how to write assembly code by hand. So, using the compiler to compile C to assembly code, and then pretending you wrote the assembly code, is not a legitimate way to complete this task.

You may inspect compiler-generated assembly code as a learning tool, but you should do so sparingly, and you should never directly copy any compiler-generated code into your assembly source files.

We expect your assembly code to have very detailed code comments. The code comments should explain exactly what your assembly code is doing. You should not expect to receive substantial credit for inadequately-commented code.

Hints for Task 4

This task is challenging! Here are some hints to help you make progress.

Start with the simplest functions. If you are implementing the functions suggested in Task 1, you might start by implementing the isSpace function, which should return 1 if its argument is space (' ') or tab ('\t'), and return 0 otherwise. The isDigit function is also a good place to start.

Write the unit tests as you implement your functions. Your job will be vastly easier if you develop each function and its unit test(s) at the same time. You will find that unit tests allow you to test and debug each function in isolation, and free you from having to worry unnecessarily about interactions with other functions.

Once you get all of the individual functions to work correctly, you can tie them together into a complete program. See Task 5 below for more details on how to write unit tests for your assembly functions.

Follow the calling conventions. Make sure your assembly language functions correctly adhere to the x86-64 calling conventions. By doing so,

Make sure that when your assembly code calls a function, the stack pointer (%rsp) contains an address that is a multiple of 16. Because a call instruction pushes the current program counter value (%rip) onto the stack, on entry to a function, the stack pointer is misaligned by 8 bytes (the size of a code address), so each of your functions will need to adjust %rsp by N bytes such that N mod 16 = 8. I.e., subtract 8 bytes, or 24 bytes, or 40 bytes, etc.

Use registers for local variables. For most local variables in your code, you can use a register as the storage location. The callee-saved registers (%rbx, %rbp, %r12%r16) are the most straightforward to use, but you will need to save their previous contents and then restore them before returning from the function. (The pushq and popq instructions make saving and restoring register contents easy.) The caller-saved registers (%r10 and %r11) don’t need to be saved and restored, but their values aren’t preserved across function calls, so they’re tricker to use correctly.

Use the frame pointer to keep track of local variables in memory. Some variables will need to be allocated in memory. You can allocate such variables in the stack frame of the called function. The frame pointer register (%rbp) can help you easily access these variables. A typical setup for a function which allocates variables in the stack frame would be something like

myFunc:
    pushq %rbp                      /* save previous frame pointer */
    subq $N, %rsp                   /* reserve space for local variable(s) */
    movq %rsp, %rbp                 /* set up frame pointer */

    /*
     * implementation of function: local variables can be accessed
     * relative to %rbp, e.g. 0(%rbp), 8(%rbp), etc.
     */

    addq $N, %rsp                   /* deallocate space for local variable(s) */
    popq %rbp                       /* restore previous frame pointer */
    ret

The code above allocates N bytes of memory in the stack frame for local variables. Note that N needs to be a multiple of 16 to ensure correct stack pointer alignment. (Think about it!)

Use leaq to compute addresses of local variables. It is likely that one or more of your functions takes a pointer to a variable as a parameter. When calling such a function, the leaq instruction provides a very convenient way to compute the address of a variable. For example, let’s say we want to pass the address of a local variable 8 bytes offset from the frame pointer (%rbp) as the first argument to a function. We could load the address of this variable into the %rdi register (used for the first function argument) using the instruction

leaq 8(%rbp), %rdi

Use local labels starting with .L for flow control. As you implement flow control (such as loops and decisions) in your program, you will need to define labels for branch targets. You should use names starting with .L (period followed by capital L) for these labels. This will ensure that the assembler does not enter them into the symbol table as function entry points, which will make debugging with gdb difficult. Here is an example assembly language function with local labels:

/*
 * Determine the length of specified character string.
 *
 * Parameters:
 *   s - pointer to a NUL-terminated character string
 *
 * Returns:
 *    number of characters in the string
 */
	.globl strLen
strLen:
	subq $8, %rsp                 /* adjust stack pointer */
	movq $0, %r10                 /* initial count is 0 */

.LstrLenLoop:
	cmpb $0, (%rdi)               /* found NUL terminator? */
	jz .LstrLenDone               /* if so, done */
	inc %r10                      /* increment count */
	inc %rdi                      /* advance to next character */
	jmp .LstrLenLoop              /* continue loop */

.LstrLenDone:
	movq %r10, %rax               /* return count */
	addq $8, %rsp                 /* restore stack pointer */
	ret

Note that the function above also illustrates what we consider to be an appropriate amount of detail for code comments.

Use cqto before idivq. Before using the idivq instruction to do integer division, use the cqto instruction to sign extend the dividend (which should be stored in the %rax register) into the %rdx register. Download the idiv.zip example program for a full example.

Task 5: Unit tests for assembly postfix calculator

In the source file asmTests.c, implement unit tests for the functions of the assembly language version of the postfix calculator.

You will need to add function prototypes for your assembly functions. If your assembly language calculator implementation follows the design of the C version exactly, you can use the same function prototypes (i.e., you can copy them from cPostfixCalc.h).

Even better, if your assembly functions are equivalent to your C functions, you can copy the actual tests. So, the unit tests for your assembly functions will likely be similar (or even identical) to the unit tests for your C functions.

Note that the technique for testing functions on invalid input explained in Task 2 can also be used in the unit tests for your assembly language functions.

Submitting

To submit your work: