601.229 (S20): Assignment 1: Arbitrary-precision arithmetic

Due: ~~Friday, Feb 7~~ Tuesday, Feb 11 by 11pm

Update 1/29: Posted fact-generation script to help with testing

Update 1/30: Important Makefile fix posted (see Submitting); information about autograder posted

Update 1/31: Makefile updated to use -lm option to link test executable (apintTests) with the math library.

Overview

In this assignment you will implement a simple C library for arbitrary-precision integer arithmetic.

The grading breakdown is as follows:

Impementation of data type and functions: 60%
Unit tests: 30%
Design and coding style: 10% (see the style guidelines)

Getting started

Download csf_assign01.zip, which contains the skeleton code for the assignment.

You can download this file from a Linux command prompt using the curl command:

curl -O https://jhucsf.github.io/spring2020/assign/csf_assign01.zip

Note that in the -O option, it is the letter “O”, not the numeral “0”.

Arbitrary-precision integer arithmetic

As you know from the material we have been covering, including Chapter 2 of the textbook, the hardware-supported numeric data types are finite, meaning that only a finite set of values can be directly represented using those data types. Most programming languages (such as C, C++, and Java) use hardware numeric data types to implement their primitive numeric types (int, long, double, etc.) As you have seen, the finite nature of machine data types leads to some odd situations, such as the existence of values a and b such that

a ≥ 0

b ≥ 0

a + b < a

The above inequalities could be true if, for example, the sum a + b is too large to represent using the data type to which a and b belong.

Some applications, such as cryptographic applications, need to represent values which behave like mathematical integers, in which case the machine integer data types can’t be used directly. Some programming languages, notably languges in the LISP family, have first-class support for arbitrary-precision integers (often referred to as “bignums”.) For programming languages without such support, having an arbitrary-precision integer data type can be very useful.

Bit strings

Arbitrarily large integer values can be represented as bit strings. For example, the decimal value

159114774885074890108

can be represented by the bit string

10001010000000101001001111001011101101010101001000100110110101111100

Note that 68 bits are required to represent this particular integer value, and that the highest bit set to 1 is the one representing 2⁶⁷.

Hexadecimal (base 16) notation is convenient for working with bit strings, because each hex “digit” represents exactly 4 bits of information. The value shown above, represented in hexadecimal, is

8A0293CBB55226D7C

The `ApInt` data type

An instance of the ApInt data type (Arbitrary precision Integer) represents an arbitrary nonnegative integer value.

The ApInt data type is defined as follows (in the header file apint.h):

typedef struct {
    /* TODO: representation */
} ApInt;

So, ApInt is a typedef (alias) for an unnamed struct type. You will need to add fields to the unnamed struct type so that an allocated instance of ApInt can represent any nonnegative integer value.

Because ApInt instances can represent any arbitrarily-large integer value, they can require an arbitrary amount of memory, and so are accessed via pointers (so that their representations can be allocated dynamically.)

A set of functions is defined to allow a program to create, use, and destroy instance of ApInt. All of these functions have named beginning with apint. (It is a good practice when designing APIs for C programs to use a common naming prefix for functions associated with a particular data type.) These functions are:

ApInt *apint_create_from_u64(uint64_t val);
ApInt *apint_create_from_hex(const char *hex);
void apint_destroy(ApInt *ap);
uint64_t apint_get_bits(ApInt *ap, unsigned n);
int apint_highest_bit_set(ApInt *ap);
ApInt *apint_lshift(ApInt *ap);
ApInt *apint_lshift_n(ApInt *ap, unsigned n);
char *apint_format_as_hex(ApInt *ap);
ApInt *apint_add(const ApInt *a, const ApInt *b);
ApInt *apint_sub(const ApInt *a, const ApInt *b);
int apint_compare(const ApInt *left, const ApInt *right);

Note that all of the functions declared as returning ApInt * must return a pointer to a dynamically allocated instance of ApInt that will be deallocated by a subsequent call to apint_destroy.

Here are brief descriptions of the expected behavior of these functions.

apint_create_from_u64: Returns a pointer to an ApInt instance whose value is specified by the val parameter, which is a 64-bit unsigned value.

apint_create_from_hex: Returns a pointer to an ApInt instance whose value is specified by the hex parameter, which is an arbitrary sequence of hexadecimal (base 16) digits. This function should accept both the lower-case letters a through f and the upper-case letters A through F as the hex digits with values 10 through 15.

apint_destroy: Deallocates the memory used by the ApInt instance pointed-to by the ap parameter.

apint_get_bits: Returns a uint64_t value containing 64 bits of the binary representation of the ApInt instance pointed to by ap. The parameter n indicates which bits to return. If n is 0, bits 0..63 are returned, if n is 1 bits 64..127 are returned, etc. The function should be prepared to handle arbitrarily large values of n.

apint_highest_bit_set: Returns the position of the most significant bit set to 1 in representation of the ApInt pointed to by ap. As a special case, returns -1 if the ApInt instance pointed to by ap represents the value 0.

apint_lshift: Returns a pointer to an ApInt instance formed by shifting each bit of the ApInt instance pointed to by ap one bit position to the left.

apint_lshift_n: Returns a pointer to an ApInt instance formed by shifting each bit of the ApInt instance pointed to by ap n bit positions to the left.

apint_format_as_hex: Returns a pointer to a dynamically-allocated C character string containing the hexadecimal (base 16) digits of the representation of the ApInt instance pointed to by ap. Note that the hex digits representing the values 10 through 15 should be lower-case a through f. The string returned should not have any leading zeroes, except in the special case of the ApInt instance representing the value 0, in which case the returned string should consist of a single 0 digit.

apint_add: Computes the sum a plus b, and returns a pointer to an ApInt instance representing the sum.

apint_sub: Computes the difference a minus b, and returns a pointer to an ApInt instance representing the difference. As a special case, if b is greater than a, returns NULL (since the ApInt data type cannot represent a negative value.)

apint_compare: Compares the values represented by the ApInt instances pointed-to by the parameters left and right. Returns a negative value if left is less that right, a positive value if left is greater than right, and 0 if the values are equal.

Tasks

This section explains the tasks you are responsible for completing. Note that they aren’t meant to be strictly sequential, although you will need to handle Task 1 before proceeding to Task 2 and Task 3.

Task 1: Determine a data representation

You will need to determine a suitable data representation for ApInt values by specifying fields within the ApInt data type.

The basic idea is that the bit string required to represent the numeric value of an ApInt instance is stored in a variable-length array of fixed-precision integer values.

The representation should be reasonably space efficient. For example, an array of uint32_t or uint64_t values, such that the bits of the overall integer value are packed into the array elements, would be a space-efficient representation, because at most some small number (63 or fewer) of bits per instance would be “wasted” (by being leading 0 bits in the array element representing the most significant chunk of the bit string.) In contrast, representing the numeric value using hexadecimal digits stored n an array of char values would not be a space-efficient representation, since each char element would store one of only 16 possible values, wasting half of the bits in the array.

Note that you could represent the value using an array of small (8 or 16 bit) integer elements. However, the apint_add and apint_sub functions can be implemented more efficiently if they can operate on larger chunks of data.

The ApInt representation should have a way of keeping track of the length of the array storing the bit string. The idea is that ApInt values requiring more bits to represent will (in general) require more storage. The functions which operate on ApInt values will need to know the array length of each ApInt instance.

Task 2: Function implementation

Your main task is to implement all of the required functions as described above.

Note that as you work on the functions, you should also work on the unit tests! So, this task and Task 3 should be done in parallel.

Here are a few hints.

Use the assert macro to verify preconditions, postconditions, and invariants: assertions are a great way of catching logic errors in your code before they cause a crash or incorrect results

Start with the simpler functions (such as apint_create_from_u64 and apint_destroy), verify that they are working completely by writing unit tests, then move on to more complex functions (such as apint_lshift_n and apint_add)

Getting the apint_create_from_hex and apint_format_as_hex functions to work is an important milestone because they allow your test code to easily create and verify arbitrarily large integer values

Task 3: Unit testing

Whenever you develop data types and their associated operations, it is extremely important to have confidence that they behave correctly. Unit tests are a very effective way to test the behavior of functions to make sure they meet their specfications.

In this assignment, you will use a simple unit testing framework for C code called TCTest. You can read the README and demo program for specific information about how it works, but if you’ve used unit testing frameworks such as JUnit, it should be fairly straightforward.

The basic idea is to create instances of ApInt that can be used to test the various functions: these objects form the test fixture. Then, test methods carry out function calls on the test fixture objects (potentially creating new instances of ApInt as intermediate results), and use assertions to check that the observed behavior matches the expected behavior.

The apintTests.c source file contains the test program that will contain your unit tests. Some minimal test code is provided; you should improve the tests by adding new test fixture objects and test methods. To compile and run the test program, use the following commands:

# compile the tests
make depend
make apintTests

# run the tests
./apintTests

Note that the command ./apintTests runs all of the test functions. You can run a specific test function by specifying its name as a command line argument: for example,

./apintTests testCreateFromU64

When you run the test program, you will see output indicating which tests passed and failed. A test will fail if

a test assertion (using the ASSERT macro) fails
a C language assertion (using the assert macro) fails
a runtime exception such as a segmentation fault or floating point exception occurs

Here are some guidelines and hints to help you develop effective unit tests.

Test the simpler functions first. A good place to start is the apint_create_from_u64 and apint_get_bits functions.

Test every function. Your unit tests should test each of the ApInt functions thoroughly.

Test corner cases. Test for situations such as:

Operations on ApInt values with varying bit string lengths
Left shifts of varying lengths
Adding ApInt instances with different-length bit strings

Generate test cases programmatically. Write a script or program to generate large random integer values in hexadecimal format, generate test code to test operations on these values, and add them to your unit test program. For example, here is a script-generated test for the apint_add function:

a = apint_create_from_hex("7e5ff912c8ede6ccff0d56ae5a9b5459804f9");
b = apint_create_from_hex("6057bccd860546f03fd51bf5488d50cca96");
sum = apint_add(a, b);
ASSERT(0 == strcmp("7ec050cf9673ec13ef4d2bca4fe3e1aa4cf8f",
       (s = apint_format_as_hex(sum))));
apint_destroy(sum);
apint_destroy(b);
apint_destroy(a);
free(s);

As part of generating tests, it will be helpful to have a language or tool that can do arbitrary-precision arithmetic. Options include Python and the Unix bc program.

Update 1/29 genfact.rb is a Ruby program which generates arithmetic facts that you can use for testing. Run it as follows:

ruby genfact.rb

It will output an addition fact, subtraction fact, or inequality. You can take the generated fact and turn it into a test case. For example, the inequality

962d7e839ed2d377 < 8e11539c5ea7a510656b3fcc5403c83b91229139ab28bce

could be converted into the following test code:

a = apint_create_from_hex("962d7e839ed2d377");
b = apint_create_from_hex("8e11539c5ea7a510656b3fcc5403c83b91229139ab28bce");
ASSERT(apint_compare(a, b) < 0);
apint_destroy(b);
apint_destroy(a);

The script-generated addition fact

d4fa6f0b63ad80a34b93b74d + 3935dcebf95bdf = d4fa6f0b63e6b680378d132c

could be converted into the following test code:

a = apint_create_from_hex("d4fa6f0b63ad80a34b93b74d");
b = apint_create_from_hex("3935dcebf95bdf");
sum = apint_add(a, b);
ASSERT(0 == strcmp("d4fa6f0b63e6b680378d132c", (s = apint_format_as_hex(sum))));
apint_destroy(sum);
apint_destroy(b);
apint_destroy(a);
free(s);

Feel free to use genfact.rb as a basis for generating unit tests.

Use gdb to investigate bugs. When a test fails, use gdb to help determine the reason for the failure. Set breakpoints at the program location just prior to the point where the program state becomes corrupted. Single step and inspect variables to understand what the code is doing.

Use valgrind to check memory use. The valgrind tool is enormously useful for making sure your code does not contain memory errors such as

use of uninitialized values
out of bounds memory accesses
memory leaks
double free errors

To run valgrind, on your test program, the command is valgrind ./apintTests.

Submitting

To submit your work:

Run the command make solution.zip
Upload solution.zip to Gradescope as Assignment1
Please check the files you uploaded to make sure they are the ones you intended to submit

Update 1/30 There is an issue with the Makefile posted in the original version of the assignment skeleton: the make solution.zip command would not rebuild the zipfile even if files had changed. You can fix the solution.zip target by changing the original target

solution.zip :
	rm -f solution.zip
	zip -9r $@ Makefile *.h *.c

to the following:

.PHONY: solution.zip
solution.zip :
	rm -f solution.zip
	zip -9r $@ Makefile *.h *.c

You can also download a fixed version of the makefile: Makefile

Autograder

When you upload your submission to Gradescope, it will be tested by the autograder, which executes unit tests for each required function. Please note the following:

Only the “Impementation of data type and functions” rubric item is autograder
If your code does not compile successfully, all of the tests will fail
The autograder runs valgrind on your code, but it does not report any information about the result of running valgrind: points will be deducted if your code has memory errors or memory leaks!