601.229 (F19): HW5: x86-64 hexdump
- Out on: Monday, Oct 28th, 2019
- Due by:
Friday, Nov 8th, 2019 by 10pmMonday, Nov 11th, 2019 by 10pm - Collaboration: None
- Grading: Packaging 10%, Style 10%, Design 10% Functionality 70%
Acknowledgment: this assignment is based on the Fall 2018 HW5 by Peter Froehlich.
Update 10/31: added Example assembly language programs
Update 11/5: added hint about how to determine which bytes are printable characters
Update 11/5: clarified that writing unit tests is strongly encouraged but not required
Update 11/7: added Example assembly language functions, also slightly corrected the advice about stack pointer alignment
Overview
This assignment is all about hacking native x86_64 assembly code. For obvious reasons, you’ll need a 64-bit Lubuntu 18.04 LTS reference system; you cannot do this assignment on a 32-bit install. (Note that the ugrad machines should work, but testing on an Ubuntu 18.04-derived system or virtual machine is recommended since it matches what the autograder will be using.)
You’ll use the standard gcc/gas toolchain and you must use AT\&T syntax, not Intel syntax.
You should use the following starter code: hw5.zip
Note that for all problems the full x86_64 conventions regarding register usage (arguments, results, caller-saved vs. callee-saved, etc.) are in effect! (Of course regular calls differ from system calls in this regard.)
Lost?
If you find yourself wondering how to use a system call,
you can use the man
command to look up information. For
example, to find out how the read
system call works, use
man 2 read
. Sadly the man
pages don’t describe the details
required to call system calls from assembly language, but this
post
has everything you’ll need regarding register conventions and system
call numbers.
Warning!
Undocumented assembly code will lose points. You don’t have to comment every line, but at least every “coherent chunk” of assembly should have a comment or two. In particular you must describe where you get what data from, especially when it comes to functions and their parameters/results. You have been warned!
Hexdump’s Revenge
Remember the hex
program from
Assignment 1?
You’re about to write that program again, but this time you’ll do it
in x86_64 assembly using only system calls and no standard
library!
The specification for hex
is exactly the same as given on
Homework 1 which means you can also reuse all your old test cases
(assuming you had some). Of course you’ll now have to approach the
problem a little differently, for example you’ll need to use the read
and write
system calls (with suitable buffers!) instead of getchar
or printf
or whatnot.
The starter code has two files related to this problem. The files hex.S
and hexFuncs.S
are the assembly language modules used to implement
the hex
program. The hex.S
module should contain only the main
function for the hex
program. The hexFuncs.S
module should contain
useful functions that you can call from main
.
Unit testing
Whenever you write functions to incorporate into a program, is extremely important to have confidence that they behave correctly. Unit tests are a very effective way to test the behavior of functions to make sure they meet their specfications.
In this assignment, you will use a simple unit testing framework called TCTest. You can read the README and demo program for specific information about how it works, but if you’ve used unit testing frameworks such as JUnit, it should be fairly straightforward.
You should use unit tests to test the functions in hexFuncs.S
.
[Note that you are not required to write unit tests, but
we strongly encourage you to write unit tests.]
Add your unit tests to the hexTest.c
source file.
For each function that you want to test, you will need to write
a C language function declaration for the function. You can use
the addLongs
example function and its associated tests as an example.
To compile and run the unit tests, run the following commands:
make hexTest
./hexTest
Important advice: Writing complete programs in
assembly language is hard. Using unit tests, you can adopt a
test-driven approach where you implement one assembly language
function at a time, and test them to ensure correct operation.
Using this approach will make developing the hex
program vastly
easier.
Here is a concrete example. A useful function for the hexdump program is one that converts a byte value to a two-digit hex number. In assembly language, we could define this function like this:
/*
* Convert a byte value to a two-digit hex string.
*
* Parameters:
* val - a byte value
* s - a pointer to a char buffer with enough room for a
* string of length 2
*/
.globl byteToHex
byteToHex:
/* code would go here... */
To unit test this function, we make several changes to hexTest.c
.
First, we add a function prototype (right below the one for the
example addLongs
function):
void byteToHex(long val, char *s);
We also add a function prototype for a new test function called
testByteToHex
, just below the prototype for testAddLongs
:
void testByteToHex(TestObjs *objs);
We add testByteToHex
to the test functions to be executed from main
:
TEST(testByteToHex);
Finally, we add a definition of testByteToHex
:
void testByteToHex(TestObjs *objs) {
char buf[10];
byteToHex(0x29, buf);
ASSERT(0 == strcmp(buf, "29"));
byteToHex(0xC, buf);
ASSERT(0 == strcmp(buf, "0c"));
}
Assuming that the byteToHex
function was implemented correctly, when
we compile and run hexTest
, we should see the following output:
testAddLongs...passed!
testByteToHex...passed!
All tests passed!
If you’d like to see the entire hexTest.c
with the test for byteToHex
,
here it is: hexTest.c
Program-level testing
In addition to unit testing individual functions, you should test the program as a whole the same way you tested the program you wrote for Homework 1. In general, for any input file (text, binary, etc.), the command
./hex < inputfile
should produce exactly the same output as
xxd -g 1 < inputfile
We encourage you to test your program with a variety of inputs, including (but not limited to):
- empty file
- small files
- large files
- files with sizes that are a multiple of 16
- files with sizes that aren’t a multiple of 16
- text files
- binary files
x86-64 tips and tricks
Here are some tips and tricks in no particular order.
Don’t forget that you need to prefix constant values with $
. For example,
if you want to set register %r10
to 16, the instruction is
movq $16, %r10
and not
movq 16, %r10
If you want to use a label as a pointer (address), prefix it with
$
. For example,
movq $sHexDigits, %r10
would put the address that sHexDigits
refers to in %r10
.
If you want to load or store the data in a variable named by a
label, then do not prefix it with $
. For example, if you want
to load the value of the (64 bit) variable bCount
into %rdi
,
use the instruction
movq bCount, %rdi
When calling a function, the stack pointer (%rsp
) must contain an address
which is a multiple of 16. However, because the callq
instruction
pushes an 8 byte return address on the stack, on entry to a function,
the stack pointer will be “off” by 8 bytes. You can subtract 8 from
%rsp
when a function begins and add 8 bytes to %rsp
before returning
to compensate. (See the example addLongs
function.) Pushing an
odd number of callee-saved registers also works, and has the benefit
that you can then use the callee-saved registers freely in your function.
If you want to define read-only string constants, the .rodata
section
is the right place for them. For example:
.section .rodata
sHexDigits: .string "0123456789abcdef"
The .equ
assembler directive is useful for defining constant values,
for example:
.equ BUFSIZE, 16
You might find the following source code comment useful for reminding yourself about calling conventions:
/*
* Notes:
* Callee-saved registers: rbx, rbp, r12-r15
* Subroutine arguments: rdi, rsi, rdx, rcx, r8, r9
*/
In Unix and Linux, standard input is file descriptor 0.
Linux system calls do not preserve %rcx
or %r11
, so make sure you save them on the stack if their contents need to be preserved across a system call.
The GNU assembler allows you to define “local” labels, which start with the prefix .L
. You should use these for control flow targets within a function. For example (from the echoInput.S example program):
cmpq $0, %rax /* see if read failed */
jl .LreadError /* handle read failure */
...
.LreadError:
/* error handling goes here */
Hint about determining which characters are printable: the range of
printable ASCII characters is 32 through 126, inclusive. Any byte value
that is not in this range should be printed as “.
” (period). Note
that “.
” has ASCII value 46.
Example assembly language programs
For reference, here are links to a couple of example assembly language programs which use the read
and write
system calls.
- hello.S: prints a
Hello, world
message - echoInput.S: reads up to 128 bytes of data from standard input and echoes it to standard output
Example assembly language functions
This section shows implementations of a couple of assembly language functions you might find useful.
Here is an assembly language function called strLen
which returns the number
of characters in a NUL-terminated character string:
/*
* Determine the length of specified character string.
*
* Parameters:
* s - pointer to a NUL-terminated character string
*
* Returns:
* number of characters in the string
*/
.globl strLen
strLen:
subq $8, %rsp /* adjust stack pointer */
movq $0, %r10 /* initial count is 0 */
.LstrLenLoop:
cmpb $0, (%rdi) /* found NUL terminator? */
jz .LstrLenDone /* if so, done */
inc %r10 /* increment count */
inc %rdi /* advance to next character */
jmp .LstrLenLoop /* continue loop */
.LstrLenDone:
movq %r10, %rax /* return count */
addq $8, %rsp /* restore stack pointer */
ret
In C, the declaration of this function could look like this:
long strLen(const char *s);
Unit testing this function might involve the following assertions:
ASSERT(13L == strLen("Hello, world!"));
ASSERT(0L == strLen(""));
ASSERT(8L == strLen("00000010"));
Here is a function that writes a NUL-terminated character string to standard output:
/*
* Print a C character string to stdout.
*
* Parameters:
* s - the string to print
*/
.globl printStr
printStr:
pushq %r12 /* preserve contents of %r12 */
/* determine length of string */
movq %rdi, %r12 /* save s (strLen will modify %rdi) */
callq strLen /* determine length of s */
/* use write system call to print string */
movq $1, %rdi /* first write arg is fd (1=stdout) */
movq %r12, %rsi /* second write arg is buffer */
movq %rax, %rdx /* third write arg is count */
movq $1, %rax /* write is system call 1 */
syscall /* call write */
popq %r12 /* restore contents of %r12 */
ret
Note that this function uses the strLen
function.
Deliverables
Submit a zipfile containing your complete project. The recommended
way to do this is to run the command make solution.zip
. This
will create a file called solution.zip
with all of the required
files. Important: all of the files in the zipfile must be
at the top level, not a subdirectory. For example, if your
zipfile is called solution.zip
and you run the command unzip -l solution.zip
to list its contents, you should see something like the following output:
TODO: expected output of unzip -l solution.zip
Upload your zipfile to Gradescope as HW5. Make sure to include your name and email address in every file you turn in (well, in every file for which it makes sense to do so anyway!)
Grading
For reference, here is a short explanation of the grading criteria; some of the criteria don’t apply to all problems, and not all of the criteria are used on all assignments.
Packaging refers to the proper organization of the stuff you hand in, following both the guidelines for Deliverables above as well as the general submission instructions for assignments on Piazza.
Style refers to C/C++/assembly programming style, including things like consistent indentation, appropriate identifier names, useful comments, suitable documentation, etc. Simple, clean, readable code is what you should be aiming for. Make sure you follow the style guide posted on Piazza!
Design refers to proper modularization (functions, modules, classes, etc.) and an appropriate choice of algorithms and data structures.
Performance refers to how fast/with how little memory your programs can produce the required results compared to other submissions.
Functionality refers to your programs being able to do what they should according to the specification given above; if the specification is ambiguous, ask for clarification! (It also refers to you simply doing the required work, which may not be programming alone.)
If your programs cannot be built you will get no points whatsoever. If
your programs cannot be built without warnings using the required
compiler options given on
Piazza we will take off 10%
(except if you document a very good reason). If your programs cannot
be built using make
we will take off 10%. If valgrind
detects memory
errors in your programs, we will take off 10%. If your programs fail
miserably even once, i.e. terminate with an exception of any kind or
dump core, we will take off 10% (for each such case).