
Chapter 17. Implementation Details of Clozure CL

17.1. Threads and exceptions

Clozure CL's threads are "native" (meaning that they're scheduled and controlled by the operating system.) Most of the implications of this are discussed elsewhere; this section tries to describe how threads look from the lisp kernel's perspective (and especially from the GC's point of view.)

Clozure CL's runtime system tries to use machine-level exception mechanisms (conditional traps when available, illegal instructions, memory access protection in some cases) to detect and handle exceptional situations. These situations include some TYPE-ERRORs and PROGRAM-ERRORs (notably wrong-number-of-args errors), and also include cases like "not being able to allocate memory without GCing or obtaining more memory from the OS." The general idea is that it's usually faster to pay (very occasional) exception-processing overhead and figure out what's going on in an exception handler than it is to maintain enough state and context to handle an exceptional case via a lighter-weight mechanism when that exceptional case (by definition) rarely occurs.

Some emulated execution environments (the Rosetta PPC emulator on x86 versions of Mac OS X) don't provide accurate exception information to exception handling functions. Clozure CL can't run in such environments.

17.1.1. The Thread Context Record

When a lisp thread is first created (or when a thread created by foreign code first calls back to lisp), a data structure called a Thread Context Record (or TCR) is allocated and initialized. On modern versions of Linux and FreeBSD, the allocation actually happens via a set of thread-local-storage ABI extensions, so a thread's TCR is created when the thread is created and dies when the thread dies. (The World's Most Advanced Operating System—as Apple's marketing literature refers to Darwin—is not very advanced in this regard, and I know of no reason to assume that advances will be made in this area anytime soon.)

A TCR contains a few dozen fields (and is therefore a few hundred bytes in size.) The fields are mostly thread-specific information about the thread's stacks' locations and sizes, information about the underlying (POSIX) thread, and information about the thread's dynamic binding history and pending CATCH/UNWIND-PROTECTs. Some of this information could be kept in individual machine registers while the thread is running (and the PPC - which has more registers available - keeps a few things in registers that the X86-64 has to access via the TCR), but it's important to remember that the information is thread-specific and can't (for instance) be kept in a fixed global memory location.

When lisp code is running, the current thread's TCR is kept in a register. On PPC platforms, a general-purpose register is used; on x86-64, an (otherwise nearly useless) segment register works well (and avoids tying up a more generally useful general-purpose register for this purpose.)

The address of a TCR is aligned in memory in such a way that a FIXNUM can be used to represent it. The lisp function CCL::%CURRENT-TCR returns the calling thread's TCR as a fixnum; the actual value of the TCR's address is 4 or 8 times the value of this fixnum.
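The arithmetic behind this representation can be sketched as follows. This is an illustrative Python model, not Clozure CL code: the helper names are hypothetical, and the `word_bytes` parameter simply selects which of the two factors (4 or 8) applies.

```python
def tcr_address_to_fixnum(address: int, word_bytes: int) -> int:
    """Encode an aligned TCR address as the integer value a fixnum would hold."""
    if word_bytes not in (4, 8):
        raise ValueError("word_bytes must be 4 or 8")
    if address % word_bytes != 0:
        # An unaligned address has no exact fixnum representation.
        raise ValueError("address is not suitably aligned")
    return address // word_bytes


def fixnum_to_tcr_address(fixnum: int, word_bytes: int) -> int:
    """Recover the address: 4 or 8 times the value of the fixnum."""
    if word_bytes not in (4, 8):
        raise ValueError("word_bytes must be 4 or 8")
    return fixnum * word_bytes
```

The alignment check is the point of the sketch: the encoding only works because the low bits of a TCR's address are always zero.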

When the lisp kernel initializes a new TCR, it's added to a global list maintained by the kernel; when a thread exits, its TCR is removed from this list.

When a thread calls foreign code, lisp stack pointers are saved in its TCR, lisp registers (at least those whose value should be preserved across the call) are saved on the thread's value stack, and (on x86-64) RSP is switched to the control stack. A field in the TCR (tcr.valence) is then set to indicate that the thread is running foreign code, foreign argument registers are loaded from a frame on the foreign stack, and the foreign function is called. (That's a little oversimplified and possibly inaccurate, but the important things to note are that the thread "stops following lisp stack and register usage conventions" and that it advertises the fact that it's done so.) Similar transitions in a thread's state ("valence") occur when it enters or exits an exception handler (which is sort of an OS/hardware-mandated foreign function call where the OS thoughtfully saves the thread's register state for it beforehand.)
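The valence transition around a foreign call can be modeled roughly as below. This is a sketch in Python, not the kernel's code: the `TCR` class, the valence constants, and the `foreign_call` helper are all hypothetical names, and register saving and stack switching are reduced to copying a dictionary.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field

# Hypothetical valence values; the real kernel's constants may differ.
VALENCE_LISP = "lisp"
VALENCE_FOREIGN = "foreign"


@dataclass
class TCR:
    valence: str = VALENCE_LISP
    saved_lisp_stack_pointers: dict = field(default_factory=dict)


@contextmanager
def foreign_call(tcr, lisp_stack_pointers):
    """Mark the thread as running foreign code for the duration of a call."""
    # Save the lisp stack pointers in the TCR before leaving lisp code.
    tcr.saved_lisp_stack_pointers = dict(lisp_stack_pointers)
    previous = tcr.valence
    tcr.valence = VALENCE_FOREIGN      # advertise: lisp conventions are suspended
    try:
        yield tcr
    finally:
        tcr.valence = previous         # lisp conventions apply again


# Usage: observe the valence while the "foreign function" is running.
example_tcr = TCR()
with foreign_call(example_tcr, {"value_stack_pointer": 0x7000}):
    valence_during_call = example_tcr.valence
valence_after_call = example_tcr.valence
```

Anything that inspects another thread (the GC in particular) can consult the advertised valence to decide whether that thread's registers and stack pointers can be interpreted by lisp conventions.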

17.1.2. Exception contexts, and exception-handling in general

Unix-like OSes tend to refer to exceptions as "signals"; the same general mechanism ("signal handling") is used to process both asynchronous OS-level events (such as the result of the keyboard driver noticing that ^C or ^Z has been pressed) and synchronous hardware-level events (like trying to execute an illegal instruction or access protected memory.) It makes some sense to defer ("block") handling of asynchronous signals so that some critical code sequences complete without interruption; since it's generally not possible for a thread to proceed after a synchronous exception unless and until its state is modified by an exception handler, it makes no sense to talk about blocking synchronous signals (though some OSes will let you do so and doing so can have mysterious effects.)

On OSX/Darwin, the POSIX signal handling facilities coexist with lower-level Mach-based exception handling facilities. Unfortunately, the way that this is implemented interacts poorly with debugging tools: GDB will generally stop whenever the target program encounters a Mach-level exception and offers no way to proceed from that point (and let the program's POSIX signal handler try to handle the exception); Apple's CrashReporter program has had a similar issue and, depending on how it's configured, may bombard the user with alert dialogs which falsely claim that an application has crashed (when in fact the application in question has routinely handled a routine exception.) On Darwin/OSX, Clozure CL uses Mach thread-level exception handling facilities which run before GDB or CrashReporter get a chance to confuse themselves; Clozure CL's Mach exception handling tries to force the thread which received a synchronous exception to invoke a signal handling function ("as if" signal handling worked more usefully under Darwin.) Mach exception handlers run in a dedicated thread (which basically does nothing but wait for exception messages from the lisp kernel, obtain and modify information about the state of threads in which exceptions have occurred, and reply to the exception messages with an indication that the exception has been handled.) The reply from a thread-level exception handler keeps the exception from being reported to GDB or CrashReporter and avoids the problems related to those programs. Since Clozure CL's Mach exception handler doesn't claim to handle debugging-related exceptions (from breakpoints or single-step operations), it's possible to use GDB to debug Clozure CL.

On platforms where signal handling and debugging don't get in each other's way, a signal handler is entered with all signals blocked. (This behavior is specified in the call to the sigaction() function which established the signal handler.) The signal handler receives three arguments from the OS kernel; the first is an integer that identifies the signal, the second is a pointer to an object of type "siginfo_t", which may or may not contain a few fields that would help to identify the cause of the exception, and the third argument is a pointer to a data structure (called a "ucontext" or something similar), which contains machine-dependent information about the state of the thread at the time that the exception/signal occurred. While asynchronous signals are blocked, the signal handler stores the pointer to its third argument (the "signal context") in a field in the current thread's TCR, sets some bits in another TCR field to indicate that the thread is now waiting to handle an exception, unblocks asynchronous signals, and waits for a global exception lock that serializes exception processing.
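The entry sequence described above can be summarized in a small model. This is a Python sketch under stated assumptions, not the kernel's handler: the `TCR` fields, the flag bit, and the `handle` callback are hypothetical, and blocking or unblocking of asynchronous signals is represented by a boolean rather than by real signal masks.

```python
import threading
from dataclasses import dataclass
from typing import Optional

# Hypothetical flag bit; the text says only that "some bits" are set.
WAITING_TO_HANDLE_EXCEPTION = 0x1

# Stands in for the global exception lock that serializes exception processing.
exception_lock = threading.Lock()


@dataclass
class TCR:
    pending_context: Optional[dict] = None   # the saved "signal context"
    flags: int = 0
    async_signals_blocked: bool = False


def signal_handler_entry(tcr, signum, siginfo, context, handle):
    """Model the steps taken before exception processing proper begins."""
    tcr.async_signals_blocked = True            # the handler is entered this way
    tcr.pending_context = context               # 1. remember the third argument
    tcr.flags |= WAITING_TO_HANDLE_EXCEPTION    # 2. advertise the thread's state
    tcr.async_signals_blocked = False           # 3. unblock asynchronous signals
    with exception_lock:                        # 4. wait for the global lock
        return handle(signum, siginfo, context)
```

The ordering matters in this model: the context pointer and state bits are stored while asynchronous signals are still blocked, so another thread that interrupts this one never sees a half-recorded state.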

On Darwin, the Mach exception thread creates a signal context (and maybe a siginfo_t structure), stores the signal context in the thread's TCR, sets the TCR field which describes the thread's state, and arranges that the thread resume execution at its signal handling function (with a signal number, possibly NULL siginfo_t, and signal context as arguments.) When the thread resumes, it waits for the global exception lock.

On x86-64 platforms where signal handling can be used to handle synchronous exceptions, there's an additional complication: the OS kernel ordinarily allocates the signal context and siginfo structures on the stack of the thread that received the signal; in practice, that means "wherever RSP is pointing." Clozure CL's register and stack usage conventions (see Section 17.2.3, “Register and stack usage conventions”) require that the thread's value stack (where RSP is usually pointing while lisp code is running) contain only "nodes" (properly tagged lisp objects), and scribbling a signal context all over the value stack would violate this requirement. To maintain consistency, the sigaltstack() mechanism is used to cause the signal to be delivered on (and the signal context and siginfo to be allocated on) a special stack area (the last few pages of the thread's control stack, in practice). When the signal handler runs, it (carefully) copies the signal context and siginfo to the thread's control stack and makes RSP point into that stack before invoking the "real" signal handler. The effect of this hack is that the "real" signal handler always runs on the thread's control stack.

Once the exception handler has obtained the global exception lock, it uses the values of the signal number, siginfo_t, and signal context arguments to determine the (logical) cause of the exception. Some exceptions may be caused by factors that should generate lisp errors or other serious conditions (stack overflow); if this is the case, the kernel code may release the global exception lock and call out to lisp code. (The lisp code in question may need to repeat some of the exception decoding process; in particular, it needs to be able to interpret register values in the signal context that it receives as an argument.)

In some cases, the lisp kernel exception handler may not be able to recover from the exception (this is currently true of some types of memory-access fault and is also true of traps or illegal instructions that occur during foreign code execution.) In such cases, the kernel exception handler reports the exception as "unhandled", and the kernel debugger is invoked.

If the kernel exception handler identifies the exception's cause as being a transient out-of-memory condition (indicating that the current thread needs more memory to cons in), it tries to make that memory available. In some cases, doing so involves invoking the GC.
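The three outcomes described in the preceding paragraphs can be laid out as a simple dispatch. This Python sketch is only a model: the `Cause` categories and the action strings are hypothetical labels for the cases named in the text, and real decoding works from the signal number, siginfo_t, and signal context rather than from a ready-made enumeration.

```python
from enum import Enum, auto


class Cause(Enum):
    """Hypothetical classification of a decoded exception."""
    LISP_CONDITION = auto()      # should become a lisp error or serious condition
    OUT_OF_MEMORY = auto()       # the thread needs more memory to cons in
    UNRECOVERABLE = auto()       # e.g. a trap during foreign code execution


def dispatch(cause, gc_needed=False):
    """Return, in order, the actions the kernel handler would take."""
    if cause is Cause.LISP_CONDITION:
        # The lisp code may have to repeat part of the decoding itself.
        return ["release global exception lock", "call out to lisp code"]
    if cause is Cause.OUT_OF_MEMORY:
        actions = ["invoke GC"] if gc_needed else []
        return actions + ["make memory available"]
    return ["report exception as unhandled", "invoke kernel debugger"]
```

For example, `dispatch(Cause.OUT_OF_MEMORY, gc_needed=True)` yields the GC step followed by the allocation step, which is the case that leads into the next section.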

17.1.3. Threads, exceptions, and the GC

Clozure CL's GC is not concurrent: when the GC is invoked in response to an exception in a particular thread, all other lisp threads must stop until the GC's work is done. The thread that triggered the GC iterates over the global TCR list, sending each other thread a distinguished "suspend" signal, then iterates over the list again, waiting for a per-thread semaphore that indicates that the thread has received the "suspend" signal and responded appropriately. Once all other threads have acknowledged the request to suspend themselves, the GC thread can run the GC proper (after doing any necessary PC-lusering; see Section 17.1.4, “PC-lusering”.) Once the GC has completed its work, the thread that invoked the GC iterates over the global TCR list, raising a per-thread "resume" semaphore for each other thread.

The signal handler for the asynchronous "suspend" signal is entered with all asynchronous signals blocked. It saves its signal-context argument in a TCR slot, raises the TCR's "suspend" semaphore, then waits on the TCR's "resume" semaphore.

The GC thread has access to the signal contexts of all TCRs (including its own) at the time when the thread received an exception or acknowledged a request to suspend itself. This information (and information about stack areas in the TCR itself) allows the GC to identify the "stack locations and register contents" that are elements of the GC's root set.
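The suspend/resume handshake can be sketched with ordinary threads and semaphores. This is a Python model, not the kernel's implementation, and it makes one loud simplification: Python cannot deliver a signal to a specific thread, so the "suspend" signal is replaced by a per-thread event that each worker polls. All class, function, and field names are hypothetical.

```python
import threading


class TCR:
    def __init__(self, name):
        self.name = name
        self.suspend_requested = threading.Event()  # stands in for the "suspend" signal
        self.suspend = threading.Semaphore(0)       # raised by the suspended thread
        self.resume = threading.Semaphore(0)        # raised by the GC thread
        self.saved_context = None                   # the "signal context" slot
        self.done = threading.Event()


all_tcrs = []   # stands in for the kernel's global TCR list


def worker(tcr):
    """A 'lisp thread' that notices suspend requests between units of work."""
    while not tcr.done.wait(0.001):
        if tcr.suspend_requested.is_set():
            tcr.suspend_requested.clear()
            tcr.saved_context = {"thread": tcr.name}  # save the context in the TCR
            tcr.suspend.release()                     # acknowledge the request
            tcr.resume.acquire()                      # wait until the GC is done


def stop_the_world_and_collect(gc_tcr, collect):
    others = [t for t in all_tcrs if t is not gc_tcr]
    for t in others:            # first pass: "signal" every other thread
        t.suspend_requested.set()
    for t in others:            # second pass: wait for every acknowledgement
        t.suspend.acquire()
    result = collect([t.saved_context for t in others])   # the GC proper
    for t in others:            # finally: raise each "resume" semaphore
        t.resume.release()
    return result


def demo(n_workers=3):
    gc_tcr = TCR("gc")
    workers = [TCR(f"worker-{i}") for i in range(n_workers)]
    all_tcrs[:] = [gc_tcr] + workers
    threads = [threading.Thread(target=worker, args=(t,)) for t in workers]
    for th in threads:
        th.start()
    # The "GC" here just counts the contexts it was able to see.
    seen = stop_the_world_and_collect(
        gc_tcr, lambda contexts: sum(c is not None for c in contexts))
    for t in workers:
        t.done.set()
    for th in threads:
        th.join()
    return seen
```

Separating the signalling pass from the waiting pass, as the text describes, lets all threads begin suspending in parallel instead of one at a time.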

17.1.4. PC-lusering

It's not quite accurate to say that Clozure CL's compiler and runtime follow precise stack and register usage conventions at all times; there are a few exceptions:

  • On both PPC and x86-64 platforms, consing isn't fully atomic. It takes at least a few instructions to allocate an object in memory (and slap a header on it if necessary); if a thread is interrupted in the middle of that instruction sequence, the new object may or may not have been created or fully initialized at the point in time that the interrupt occurred. (There are actually a few different states of partial initialization.)

  • On the PPC, the common act of building a lisp control stack frame involves allocating a four-word frame and storing three register values into that frame. (The fourth word - the back pointer to the previous frame - is automatically set when the frame is allocated.) The previous contents of those three words are unknown (there might have been a foreign stack frame at the same address a few instructions earlier), so interrupting a thread that's in the process of initializing a PPC control stack frame isn't GC-safe.

  • There are similar problems with the initialization of temp stack frames on the PPC. (Allocation and initialization don't happen atomically, and the newly allocated stack memory may have undefined contents.)

  • The write barrier of the ephemeral GC (see Section 17.5, “The ephemeral GC”) has to be implemented atomically (i.e., both an intergenerational store and the update of a corresponding reference bit have to happen without interruption, or neither of these events can happen.)

  • There are a few more similar cases.

Fortunately, the number of these non-atomic instruction sequences is small, and fortunately it's fairly easy for the interrupting thread to recognize when the interrupted thread is in the middle of such a sequence. When this is detected, the interrupting thread modifies the state of the interrupted thread (modifying its PC and other registers) so that it is no longer in the middle of such a sequence (it's either backed out of it or the remaining instructions are emulated.)

This works because (a) many of the troublesome instruction sequences are PPC-specific and it's relatively easy to partially disassemble the instructions surrounding the interrupted thread's PC on the PPC and (b) those instruction sequences are heavily stylized and intended to be easily recognized.
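The core of the technique (recognize that an interrupted PC lies inside a known sequence, then move it out) can be modeled in a few lines. This Python sketch is an assumption-laden illustration: the address ranges are supplied as a table rather than found by disassembly, the names are hypothetical, and "emulating the remaining instructions" is reduced to advancing the PC, whereas a real implementation would also update other registers and memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NonAtomicSequence:
    start: int          # address of the sequence's first instruction
    end: int            # address just past its last instruction
    restartable: bool   # True: back out to start; False: finish the sequence


@dataclass
class ThreadContext:
    pc: int             # the interrupted thread's saved program counter


def pc_luser(context, sequences):
    """Move an interrupted thread's PC out of any non-atomic sequence."""
    for seq in sequences:
        # A PC equal to seq.start means the sequence hasn't begun yet,
        # so only the strict interior needs adjusting.
        if seq.start < context.pc < seq.end:
            context.pc = seq.start if seq.restartable else seq.end
            return seq
    return None   # the thread was at a safe point; nothing to do
```

After the adjustment, the interrupted thread's saved state describes a point where stack and register contents follow the usual conventions, which is what the GC needs.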

