Clozure CL Documentation

17.5. The ephemeral GC

In the Clozure CL memory management scheme, the relative age of two objects in the dynamic heap can be determined by their addresses: if addresses X and Y are both addresses in the dynamic heap, X is younger than Y (X was created more recently than Y) if it is nearer to the free pointer (and farther from the base of the heap) than Y.

Ephemeral (or generational) garbage collectors attempt to exploit the following assumptions:

most newly created objects become garbage soon after they'recreated.
most objects that have already survived several GCs are unlikely to ever become garbage.
old objects can only point to newer objects as the result of a destructive modification (e.g., via SETF.)

By concentrating its efforts on (frequently and quickly) reclaiming newly created garbage, an ephemeral collector hopes to postpone the more costly full GC as long as possible. It's important to note that most programs create some long-lived garbage, so an EGC can't typically eliminate the need for full GC.

An EGC views each object in the heap as belonging to exactly one generation; generations are sets of objects that are related to each other by age: some generation is the youngest, some the oldest, and there's an age relationship between any intervening generations. Objects are typically assigned to the youngest generation when first allocated; any object that has survived some number of GCs in its current generation is promoted (or tenured) into an older generation.

When a generation is GCed, the roots consist of the stacks, registers, and global variables as always and also of any pointers to objects in that generation from other generations. To avoid the need to scan those (often large) other generations looking for such intergenerational references, the runtime system must note all such intergenerational references at the point where they're created (via Setf).^[2] The set of pointers that may contain intergenerational references is sometimes called the remembered set.

In Clozure CL's EGC, the heap is organized exactly the same as otherwise; "generations" are merely structures which contain pointers to regions of the heap (which is already ordered by age.) When a generation needs to be GCed, any younger generation is incorporated into it; all objects which survive a GC of a given generation are promoted into the next older generation. The only intergenerational references that can exist are therefore those where an old object is modified to contain a pointer to a new object.

The EGC uses exactly the same code as the full GC. When a given GC is "ephemeral",

the "base of the heap" used to determine an object's markbit address is the base of the generation being collected;
the markbits vector is actually a pointer into the middle of the global markbits table; preceding entries in this table are used to note doubleword addresses in older generations that (may) contain intergenerational references;
some steps (notably GCTWA and the handling of weak objects) are not performed;
the intergenerational references table is used to find additional roots for the mark and forward phases. If a bit is set in the intergenerational references table, that means that the corresponding doubleword (in some "old" generation, in some "earlier" part of the heap) may have had a pointer to an object in a younger generation stored into it.

With one exception (the implicit setfs that occur on entry to and exit from the binding of a special variable), all setfs that might introduce an intergenerational reference must be memoized. ^[3] It's always safe to push any cons cell or gvector locative onto the memo stack; it's never safe to push anything else.

Typically, the intergenerational references bitvector is sparse: a relatively small number of old locations are stored into, although some of them may have been stored into many times. The routine that scans the memoization buffer does a lot of work and usually does it fairly often; it uses a simple, brute-force method but might run faster if it was smarter about recognizing addresses that it'd already seen.

When the EGC mark and forward phases scan the intergenerational reference bits, they can clear any bits that denote doublewords that definitely do not contain intergenerational references.

^[2]This is sometimes called "The Write Barrier": all assignments which might result in intergenerational references must be noted, as if the other generations were write-protected.

^[3]Note that the implicit setfs that occur when initializing an object - as in the case of a call to cons or vector - can't introduce intergenerational references, since the newly created object is always younger than the objects used to initialize it.