Previous Section Next Section Table of Contents Glossary Index

Chapter 17. Implementation Details of Clozure CL

17.3. Heap Allocation

When the Clozure CL kernel first starts up, a large contiguous chunk of the process's address space is mapped as "anonymous, no access" memory. ("Large" means different things in different contexts; on LinuxPPC32, it means "about 1 gigabyte", on DarwinPPC32, it means "about 2 gigabytes", and on current 64-bit platforms it ranges from 128 to 512 gigabytes, depending on OS. These values are both defaults and upper limits; the --heap-reserve argument can be used to try to reserve less than the default.)

Reserving address space that can't (yet) be read or written to doesn't cost much; in particular, it doesn't require that corresponding swap space or physical memory be available. Marking the address range as being "mapped" helps to ensure that other things (results from random calls to malloc(), dynamically loaded shared libraries) won't be allocated in this region that lisp has reserved for its own heap growth.

A small portion (around 1/32 on 32-bit platforms and 1/64 on 64-bit platforms) of that large chunk of address space is reserved for GC data structures. Memory pages reserved for these data structures are mapped read-write as pages are made writable in the main portion of the heap.

The initial heap image is mapped into this reserved address space and an additional (LISP-HEAP-GC-THRESHOLD) bytes are mapped read-write. GC data structures grow to match the amount of GC-able memory in the initial image plus the gc threshold, and control is transferred to lisp code. Inevitably, that code spoils everything and starts consing; there are basically three layers of memory allocation that can go on.

17.3.1. Per-thread object allocation

Each lisp thread has a private "reserved memory segment"; when a thread starts up, its reserved memory segment is empty. PPC ports maintain the highest unallocated address and the lowest allocatable address in the current segment in registers when running lisp code; on x86-664, these values are maintained in the current threads's TCR. (An "empty" heap segment is one whose high pointer and low pointer are equal.) When a thread is not in the middle of allocating something, the low 3 or 4 bits of the high and low pointers are clear (the pointers are doublenode-aligned.)

A thread tries to allocate an object whose physical size in bytes is X and whose tag is Y by:

  1. decrementing the "high" pointer by (- X Y)

  2. trapping if the high pointer is less than the low pointer

  3. using the (tagged) high pointer to initialize the object, if necessary

  4. clearing the low bits of the high pointer

On PPC32, where the size of a CONS cell is 8 bytes and the tag of a CONS cell is 1, machine code which sets the arg_z register to the result of doing (CONS arg_y arg_z) looks like:

  (SUBI ALLOCPTR ALLOCPTR 7)    ; decrement the high pointer by (- 8 1)
  (TWLLT ALLOCPTR ALLOCBASE)    ; trap if the high pointer is below the base
  (STW ARG_Z -1 ALLOCPTR)       ; set the CDR of the tagged high pointer
  (STW ARG_Y 3 ALLOCPTR)        ; set the CAR
  (MR ARG_Z ALLOCPTR)           ; arg_z is the new CONS cell
  (RLWINM ALLOCPTR ALLOCPTR 0 0 28)     ; clear tag bits
	    

On x86-64, the idea's similar but the implementation is different. The high and low pointers to the current thread's reserved segment are kept in the TCR, which is addressed by the gs segment register. An x86-64 CONS cell is 16 bytes wide and has a tag of 3; we canonically use the temp0 register to initialize the object

  (subq ($ 13) ((% gs) 216))    ; decrement allocptr
  (movq ((% gs) 216) (% temp0)) ; load allocptr into temp0
  (cmpq ((% gs) 224) (% temp0)) ; compare to allocabase
  (jg L1)                       ; skip trap
  (uuo-alloc)                   ; uh, don't skip trap
L1
  (andb ($ 240) ((% gs) 216))   ; untag allocptr in the tcr
  (movq (% arg_y) (5 (% temp0))) ; set the car
  (movq (% arg_z) (-3 (% temp0))); set the cdr
  (movq (% temp0) (% arg_z))    ; return the cons
	    

If we don't take the trap (if allocating 8-16 bytes doesn't exhaust the thread's reserved memory segment), that's a fairly short and simple instruction sequence. If we do take the trap, we'll have to do some additional work in order to get a new segment for the current thread.

17.3.2. Allocation of reserved heap segments

After the lisp image is first mapped into memory - and after each full GC - the lisp kernel ensures that (LISP-HEAP-GC-TRESHOLD) additional bytes beyond the current end of the heap are mapped read-write.

If a thread traps while trying to allocate memory, the thread goes through the usual exception-handling protocol (to ensure that any other thread that GCs "sees" the state of the trapping thread and to serialize exception handling.) When the exception handler runs, it determines the nature and size of the failed allocation and tries to complete the allocation on the thread's behalf (and leave it with a reasonably large thread-specific memory segment so that the next small allocation is unlikely to trap.

Depending on the size of the requested segment allocation, the number of segment allocations that have occurred since the last GC, and the EGC and GC thresholds, the segment allocation trap handler may invoke a full or ephemeral GC before returning a new segment. It's worth noting that the [E]GC is triggered based on the number of and size of these segments that have been allocated since the last GC; it doesn't have much to do with how "full" each of those per-thread segments are. It's possible for a large number of threads to do fairly incidental memory allocation and trigger the GC as a result; avoiding this involves tuning the per-thread allocation quantum and the GC/EGC thresholds appropriately.

17.3.3. Heap growth

All OSes on which Clozure CL currently runs use an "overcommit" memory allocation strategy by default (though some of them provide ways of overriding that default.) What this means in general is that the OS doesn't necessarily ensure that backing store is available when asked to map pages as read-write; it'll often return a success indicator from the mapping attempt (mapping the pages as "zero-fill, copy-on-write"), and only try to allocate the backing store (swap space and/or physical memory) when non-zero contents are written to the pages.

It -sounds- like it'd be better to have the mmap() call fail immediately, but it's actually a complicated issue. (It's possible that other applications will stop using some backing store before lisp code actually touches the pages that need it, for instance.) It's also not guaranteed that lisp code would be able to "cleanly" signal an out-of-memory condition if lisp is ... out of memory

I don't know that I've ever seen an abrupt out-of-memory failure that wasn't preceded by several minutes of excessive paging activity. The most expedient course in cases like this is to either (a) use less memory or (b) get more memory; it's generally hard to use memory that you don't have.


Previous Section Next Section Table of Contents Glossary Index