601.229 (S23): Assignment 1: Big integers

Milestone 1: due Monday Jan 30th by 11pm

Milestone 2: due Monday Feb 6th by 11pm

Assignment type: Pair, you may work with one partner

Overview

In this assignment, you will implement a simple C library implementation operations on a 256-bit unsigned integer data type.

This is a substantial assignment! We strongly recommend that you start working on it as early as possible, and plan to make steady progress rather than waiting until the last minute to complete it.

To complete the assignment, you will implement a number of C functions, and you will also write unit tests to test these functions.

Milestones, grading criteria

The grading breakdown is as follows:

Milestone 1 (15% of the assignment grade):

Implementation of functions (15%)
- uint256_create_from_u64
- uint256_create
- uint256_get_bits

Milestone 2 (85% of the assignment grade):

Implementation of functions (65%)
- uint256_create_from_hex
- uint256_format_as_hex
- uint256_add
- uint256_sub
- uint256_mul (note that this function is worth at most 8% of the overall assignment grade)
Comprehensiveness and quality of your unit tests (10%)
Design and coding style (10%)

Important! Milestone 1 is intended as a warm-up, since you might not have written C code in a while. For that reason, it is a very lightweight milestone. Milestone 2 will require significantly more work.

Getting started

To get started, download csf_assign01.zip, which contains the skeleton code for the assignment, and then unzip it.

Note that you can download the zipfile from the command line using the curl program:

curl -O https://jhucsf.github.io/spring2023/assign/csf_assign01.zip

Note that in the -O option, it is the letter “O”, not the numeral “0”.

Unsigned integers

You should be familiar with unsigned integer data types from your previous experience programming in C and C++. (And, of course, we are covering the properties of machine-level integer data types in the course.)

In C, the uint64_t data type is typically the largest unsigned integer data type available, and because its values consist of 64 bits, it can represent integer values in the range \(0\) to \(2^{64}-1\), inclusive.

In this assignment, you will implement functions implementing operations on a data type called UInt256, which is a 256-bit unsigned integer data type. Its internal representation contains an array of 4 uint64_t values:

typedef struct {
  uint64_t data[4];
} UInt256;

The elements of the data array are in order from least significant to most significant. So, element 0 contains bits 0–63, element 1 contains bits 64–127, etc.

The functions you must implement are declared in the header file uint256.h and defined in the source file uint256.c.

Implementing the functions, testing

Each function has a detailed comment describing the expected behavior.

A unit test program is provided. Its code is in the source file uint256_tests.c. You can compile the test program by running the make command. So, the following commands will build the test program and execute it:

make
./uint256_tests

Note that while the unit test program contains some basic tests that you should find useful, they are by no means comprehensive. So, you should understand that you are expected to add your own tests. For Milestone 2, part of the grading criteria includes the comprehensiveness of the tests you write.

Important restriction: In your implementation of the functions, you must not use any data types (or operations on data types) where the size is greater than 64 bits. For example, using the __int128 or uint128_t data types is not permitted.

Hints and tips

Addition

To add two UInt256 values, you should use the “grade school” algorithm for addition. Treat the elements of the data arrays of the values being added as the “columns” of the addition, starting with the elements at index 0 (the least significant “column”.) Each column of the sum is the sum of the corresponding column values of the values being added. However, you will need to handle the possibility that the sum of column values could exceed the range of values that can be represented by uint64_t, in which case you will need to carry a 1 into the next (more significant) column.

You can determine if the addition of two uint64_t values overflows as follows:

uint64_t leftval = /* some value */,
         rightval = /* some value */,
         sum;

sum = leftval + rightval;
if (sum < leftval) {
  // the addition overflowed
}

Note that it is possible that the sum of two UInt256 values could overflow the range of values which can be represented. This will happen if a 1 is carried out of the most significant column. You do not need to do anything special to handle this possibility. So, for example, if you add 1 to the maximum UInt256 value that can be represented, the result should be 0:

UInt256 max;
for (int i = 0; i < 4; ++i) { max.data[i] = ~(0UL); }

UInt256 one = uint256_create_from_u64(1UL);

UInt256 sum = uint256_add(max, one);

// these assertions should succeed
ASSERT(sum.data[3] == 0UL);
ASSERT(sum.data[2] == 0UL);
ASSERT(sum.data[1] == 0UL);
ASSERT(sum.data[0] == 0UL);

Subtraction

You should implement subtraction as follows. Let’s say that uint256_sub needs to compute the difference \(a-b\). It should implement this subtraction as \(a + (-b)\), where \(-b\) is the two’s complement negation of \(b\). Recall that the two’s complement negation of a bitstring is found by inverting all of its bits, and then adding 1.

As with addition, you do not need to do anything special to handle the case where a difference would produce a value that can’t be represented (because it would be negative.) So, for example, subtracting 1 from 0 should yield a difference that is equal to the maximum value that can be represented.

Multiplication

Note that multiplication is the most complicated operation you will be implementing, so you should probably implement uint256_mul last, after getting all of the other functions to work.

To think about how multiplication should work, let’s start by thinking about how unsigned integers are represented.

For example, the integer 37 is represented in binary as 100101, because

\[37 = 1 \times 2^{5} + 0 \times 2^{4} + 0 \times 2^{3} + 1 \times 2^{2} + 0 \times 2^{1} + 1 \times 2^{0}\]

Let’s say we want to compute the product \(37 \times m\), where \(m\) is some other integer value. We do this by computing the sum of multiplying \(m\) by the powers of 2 that make up 37:

\[37 \times m = (2^{5} \times m) + (2^{2} \times m) + (2^{0} \times m)\]

As we know, left shifting an integer’s binary representation by \(n\) bits is equivalent to multiplying it by \(2^{n}\). This suggests an algorithm for multiplying UInt256 values \(a\) and \(b\):

Start with the result set to 0
For each bit in the binary representation of \(a\) that is set to 1:
- Compute a term by left shifting \(b\) by the bit position of the bit in \(a\)
- Update the result by adding the term to its current value

When the algorithm finishes, the result should be the product \(a \times b\).

As with addition and subtraction, the product could be too large to represent, but you do not need to do anything special to handle this case. You can just allow the left-shift operations and additions to overflow. The autograder tests will only test multiplications where the product can be represented exactly.

You will probably want to implement helper functions for checking whether a particular bit is set to 1, and for left-shifting a value by a specified number of positions:

int uint256_bit_is_set(UInt256 val, unsigned index);
UInt256 uint256_leftshift(UInt256 val, unsigned shift);

You can add declarations for these helper functions to uint256.h, and add unit tests for them to uint256_tests.c.

You will need to think carefully about how to implement the left shift operation. Drawing a diagram on paper will probably be helpful. Note that when left-shifting a uint64_t value, shifting by 64 or more positions is undefined behavior. Also note that literal integer values are, by default, 32 bit values which can’t be shifted by more than 31 positions. Adding the suffix “UL” on an integer literal will force its type to be unsigned long, which is a 64 bit type in the version of gcc we are using. So, for example,

// this shift is undefined behavior
uint64_t val1 = (1 << 33);

// this shift is well defined
uint64_t val2 = (1UL << 33);

Conversion from hex

When implementing the uint256_create_from_hex function, you may assume that the character string passed as its parameter consists entirely of hex digits 0–9 and a-f. You do not need to check the string for invalid characters. However, you should not assume that the string’s length is 64 characters or less. If the string has a length greater than 64, only the rightmost 64 hex digits should be used in the conversion.

We strongly recommend using the strtoul function to convert chunks of hex digits (up to 16 at a time) to uint64_t values. A possible algorithm is to start with the rightmost 16 hex digits, convert them (and assign the resulting value to data[0]), and then continue working left (towards the beginning of the string) until up to 64 hex digits have been converted.

Conversion to hex

The uint256_format_as_hex function should dynamically allocate (using malloc) a sufficiently large array of char elements to store a string of hex digits representing the value of the UInt256 value being converted. We recommend using the sprintf function to convert a uint64_t value to a sequence of hex digits. For example:

char    *buf = /* buffer w/ room for at least 16 chars plus NUL terminator */;
uint64_t val = /* some value */;

sprintf(buf, "%lx", val);    // format without leading 0s

sprintf(buf, "%016lx", val); // format with leading 0s

You will need to write a loop to make sure that the value of each element of the data array is represented in the resulting hex string. Note that the resulting hex string should not have unnecessary leading 0 digits. In other words, the hex string should have the minimum number of hex digits needed to represent the of the UInt256 value.

Writing tests

You can use the unit tests provided with the assignment skeleton as a guide for adding your own tests. The basic idea is that each test function should call functions to perform computations on UInt256 values, and then use ASSERT to check the resulting values.

Your unit tests should include as many “corner” cases as possible. For example:

adding 0 to a value
subtracting 0 from a value
causing an addition to overflow
causing a subtraction to (negatively) overflow

You can add new tests and assertions to the existing test functions, but it is probably a good idea to add some completely new test functions of your own. To add a test function,

add a function prototype for it (towards the top of the test program),
add a call to the TEST macro in the test program’s main function,
implement the test program (somewhere towards the bottom of the test program)

Note that if you omit step 2, your test function will never be executed.

A Ruby script called genfact.rb is provided with the starter code. Running this script will generate an arbitrary arithmetic fact that you could turn into a unit test. When run without a command line argument, the script generates an addition, subtraction, or multiplication fact (chosen randomly.) You can provide the argument add. sub, or mul to force the generation of an addition, subtraction, or multiplication fact.

For example, let’s say you run genfact.rb and it produces the fact

c0b02ba840f40caf406fd64866ad2fb39a280bf0772e5f0eee7d8c9a5f9e82c - 3f69093a9cdcececda8f5ed979956361e9f16ff7730cfd46e548f7b625a4748 = 8147226da4171fc265e0776eed17cc51b0369bf9042161c8093494e439fa0e4

This fact could be implemented as a unit test as follows:

UInt256 left, right, result;

left.data[0] = 0xeee7d8c9a5f9e82cUL;
left.data[1] = 0x39a280bf0772e5f0UL;
left.data[2] = 0xf406fd64866ad2fbUL;
left.data[3] = 0xc0b02ba840f40caUL;
right.data[0] = 0x6e548f7b625a4748UL;
right.data[1] = 0x1e9f16ff7730cfd4UL;
right.data[2] = 0xcda8f5ed97995636UL;
right.data[3] = 0x3f69093a9cdceceUL;
result = uint256_sub(left, right);
ASSERT(0x8093494e439fa0e4UL == result.data[0]);
ASSERT(0x1b0369bf9042161cUL == result.data[1]);
ASSERT(0x265e0776eed17cc5UL == result.data[2]);
ASSERT(0x8147226da4171fcUL == result.data[3]);

Running and debugging tests

By default running the test program with the invocation ./uint256_tests will run each test function in order. However, you can run just one test function by naming it on the command line. For example, the invocation

./uint256_create_from_u64 test_add_2

will execute only the test_add_2 test function. This is useful when you are focusing on getting a specific test to pass. Also, because C is a memory-unsafe language, it’s possible for an earlier test to corrupt the state of the program in a way that could affect the execution of later tests, so in general running only one test function will produce a more trustworthy result than running all of the test functions.

If a unit test function fails, you can use gdb to debug the test function. For example, let’s say that the unit test test_add_2 is failing. Start by invoking gdb on the test program:

gdb ./uint256_tests

At the gdb prompt, set a breakpoint at the beginning of test_add_2, then run the program:

(gdb) break test_add_2
Breakpoint 1 at 0x3d44: file uint256_tests.c, line 203.
(gdb) run test_add_2
Starting program: /home/daveho/git/csf-spring2023-private/src/csf_assign01_solution/uint256_tests test_add_2
test_add_2...
Breakpoint 1, test_add_2 (objs=0x7ffff7fac6a0 <_IO_2_1_stdout_>) at uint256_tests.c:203
203	void test_add_2(TestObjs *objs) {
(gdb) n
211	  left.data[0] = 0x4cb6c96c4810cb5eUL;
(gdb)

At this point, you can use the next and step commands to execute the code. By stepping into the function call associated with the assertion failure, you can trace the execution and inspect data in order to pinpoint the cause of the issue.

Memory correctness

In all C and C++ code you write, we expect that there are no memory errors, including

invalid reads
invalid writes
uses of uninitialized values
memory leaks

We expect you to use the valgrind memory trace tool to check program execution for occurrences of memory errors. For this assignment, run

valgrind --leak-check=full ./uint256_tests

There should be no dynamic memory errors, and assuming that all of the unit tests pass, there should be no memory leaks.

Submitting

Before you submit, prepare a README.txt file so that it contains your names, and briefly summarizes each of your contributions to the submission (i.e., who worked on what functionality.) This may be very brief if you did not work with a partner.

To submit your work:

Run the following commands to create a solution.zip file:

rm -f solution.zip
zip -9r solution.zip Makefile *.h *.c README.txt

Upload solution.zip to Gradescope as Assignment 1 MS1 or Assignment 1 MS2, depending on which milestone you are submitting.

Please check the files you uploaded to make sure they are the ones you intended to submit.

Autograder

When you upload your submission to Gradescope, it will be tested by the autograder, which executes unit tests for each required function. Please note the following:

If your code does not compile successfully, all of the tests will fail
The autograder runs valgrind on your code, but it does not report any information about the result of running valgrind: points will be deducted if your code has memory errors or memory leaks!