Assignment 5: Garbage Collection

(graded, group assignment)

In the fifth graded assignment, you will implement a copying garbage collector using Cheney’s algorithm for the L3 virtual machine.

Introduction

ASM Lowering and Virtual Machine

The initial commit for the 5th assignment contains two new elements:

  • a lowered CPS tree to assembly language transformer in l3/compiler/src/l3/CPSToASMTranslator.scala and the tools around this transformation;
  • a virtual machine for executing the assmbly code, in the l3/vm directory.

The compiler code now contains a jar file with the reference transformations for assignments 2,3 and 4. The transformations are located in package l3.reference instead of just l3, so they do not conflict with your implementations. They are used by default in Main.scala, so you can compile any examples you come up with using the reference compiler (this may be an asset or a liability, depending on how well you implemented your compiler, as there are a few known bugs in the reference optimizer).

The virtual machine contains two main components:

  • the interpreter, located in l3/vm/src/engine.c (and with the header file l3/vm/src/engine.h);
  • the memory system, located in l3/vm/src/memory.c (and with the header file l3/vm/src/memory.h).

It also contains basic infrastructure like a makefile, an eclipse project and 4 tests that you can use to check your GC implementation.

Getting Started

As always, the steps for getting started are:

 git fetch origin git checkout origin/assignment5 -b assignment5 git merge assignment4 

And resolve any merge conflicts.

Now, you will have the vm directory in l3/. Go there and run make:

 $ cd l3/vm/ $ make Use the following targets:  - 'make vm' to use your copying GC  - 'make test' to test the VM  - 'make clean' to clean the VM $ make test rm -rf bin mkdir -p bin cc -std=c99 -g -Wall -O3  src/engine.c src/fail.c src/loader.c src/main.c src/memory.c -o bin/vm Queens test failed! Bignums test failed! Pascal test failed! Maze test failed! 

You can also run the tests individually, since they’re included in l3/examples/asm (source code in l3/examples):

 $ bin/vm ../examples/asm/queens.asm  enter size (0 to exit)> 10 Error: no memory left (block of size 16 requested) 

Your task is simple: make the tests pass by implementing a copying garbage collector instead of the no-GC memory system currently implemented in l3/vm/src/memory.c.

Eclipse Projects

For the existing Eclipse project, you will have to close Eclipse and run sbt eclipse again to refresh the project definition and include the jar containing the reference implementations for assignments 2, 3 and 4.

An Eclipse project is also available for the virtual machine. You can import the project by going to File > Import > General > Existing Projects into Workspace and point Eclipse to l3/vm where it will pick up the l3-vm project.

In case Eclipse can’t find the C library headers, such as stdio.h or stdlib.h you will need to add the specific directories for your operating system and compiler. To do this, check out this question on stackoverflow. For the virtual machine, the directory that needs to be added is /usr/lib/gcc/i686-linux-gnu/4.7/include/.

Okay, that’s it with the intro. Now for real work.

Implementaion

When starting the virtual machine, the first call is to memory_setup. This allows the memory manager to allocate the requested memory (the amount can be changed by passing the -m <size> option on the command line). The method memory_get_start returns a pointer to the beginning of this allocated memory.

The virtual machine then loads the program code into the memory. Afterwards, the method memory_set_heap_start is called, indicating to the memory manager the address which directly follows the program code. This is the address where the memory manager can start allocating memory in the function memory_allocate.

Tips and Tricks

Valid Pointer Bitmap

Even though we use tagged values, it is possible that a register contains a non-tagged value which was used as the intermediate of an arithmetic operation. Since registers are also stored in the heap, these values might end up left over in there. Such a value might look like a pointer, and it might point anywhere in the heap, not only to the beginning of a block. In order to prevent the GC from starting to copy at random parts of the heap, it has to know whether a certain address is the start of a block or not.

One solution to store this information is to allocate a bitmap at the beginning of both the from and the to space which contains one bit for every possible address. This bit is set to 1 whenever a block is allocated at the corresponding address.

To complete the above picture, the heap will look like this:

 

Pointer Formats
  • The from and to spaces can be seen as arrays of 32-bit values, which is represented by the type value_t in the VM. Every entry in the array can contain arbitrary 32 bit information: tagged values, virtual pointers, block headers or registers.
  • There are two kinds of pointers in the VM: physical addresses, represented as values of type value_t*, and virtual addresses, represented as values of type value_t. For efficiency reasons, the pointers stored in the base registers (result of engine_get_base_register) are physical pointers. On the other hand, all pointers stored in the heap are virtual addresses. You can look at the functions addr_v_to_p and addr_p_to_v in the file engine.c to see how to convert between the two address types.
  • After copying a block to the to-space, a forwarding pointer needs to be stored in the block’s location in the from-space. In order to do that you can use just overwrite the old header of the block in the from space. This implies that the headers need to be tagged (just like integers) in order to be able to differentiate them from a forwarding pointer.

Evaluation

You can either use the make test command to execute the tests automatically, or you can run each individual test by hand (good for debugging):

 $ bin/vm ../examples/asm/queens.asm  enter size (0 to exit)> 10 Error: no memory left (block of size 16 requested) 
The expected sizes your program should run on are:
  • for queens.asm: without GC – breaks at 6, GC works with 14
  • for bignums.asm: without GC – breaks at 108, GC works with 300
  • for pascal.asm: without GC – breaks at 28, GC works with 50
  • for maze.asm: without GC – breaks at 5, GC works with 14
 

Debugging

When debugging you may want to trigger a garbage collection early on. To do so, print the code size and adjust the memory with bin/vm -m <bytes> such that the size is just above the code size + twice the bitmap size. This will let you trigger the garbage collection early, when the program hasn’t yet allocated too much space, allowing easier debugging.