Clozure Common Lisp

Debugging Signal Handlers on Darwin

On Darwin (i.e., on macOS, formerly OS X, formerly Mac OS X), Unix signals are implemented on top of the Mach exception handling mechanism. The problem is that the debugger (at least as of some years ago) would stop at the underlying Mach exception, and there was no apparent way to continue to the signal handler.

I was doing some experiments on an arm64 Mac, and it turns out that current lldb now has a way: the platform.plugin.darwin.ignored-exceptions setting.

The lldb help text describes the setting this way: “List the mach exceptions to ignore, separated by ‘|’ (e.g. ‘EXC_BAD_ACCESS|EXC_BAD_INSTRUCTION’). lldb will instead stop on the BSD signal the exception was converted into, if there is one.”

Thus, the following lldb command does the trick:

settings set platform.plugin.darwin.ignored-exceptions EXC_BAD_ACCESS|EXC_BAD_INSTRUCTION

This works for both arm64 and x86-64.

Read on for details, if you happen to be interested in this obscure topic.

Here is some C code that shows the problem.

#include <stdlib.h>
#include <unistd.h>
#include <signal.h>

void handler(int signum, siginfo_t *info, ucontext_t *context)
{
    write(STDOUT_FILENO, "yow\n", 4);
    _exit(1);
}

void crash()
{
#if defined(__x86_64__)
    asm("int $0xc0");           /* will cause SIGSEGV */
#elif defined(__arm64__)
    asm("udf #0");              /* will cause SIGILL  */
#else
    #error "Did Apple change processor architectures again?"
#endif
}

int main()
{
    struct sigaction sa = { 0 };

    sa.sa_sigaction = (void *)handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;

    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGILL, &sa, NULL);

    crash();
}

Save this code into a file named handler.c, and compile it.

$ cc -g handler.c

Now, suppose we want to debug the signal handler. Fire up the debugger, put a breakpoint on the handler function, and run the program.

$ lldb a.out
(lldb) target create "a.out"
Current executable set to '/Volumes/ccl/a.out' (arm64).
(lldb) b handler
Breakpoint 1: 2 locations.
(lldb) run
Process 76160 launched: '/Volumes/ccl/a.out' (arm64)
Process 76160 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x0)
    frame #0: 0x0000000100000498 a.out`crash at handler.c:16:5
   13  	#if defined(__x86_64__)
   14  	    asm("int $0xc0");           /* will cause SIGSEGV */
   15  	#elif defined(__arm64__)
-> 16  	    asm("udf #0");              /* will cause SIGILL  */
   17  	#else
   18  	    #error "Did Apple change processor architectures again?"
   19  	#endif
Target 0: (a.out) stopped.
(lldb) 

Note that the “stop reason” is the EXC_BAD_INSTRUCTION Mach exception, which the debugger intercepted before it was converted into a signal. Trying to continue from here (via continue) is apparently futile: we just stop again and again at the same place.

Now, let’s use the platform.plugin.darwin.ignored-exceptions setting, and re-run the program:

(lldb) settings set platform.plugin.darwin.ignored-exceptions EXC_BAD_INSTRUCTION
(lldb) run
There is a running process, kill it and restart?: [Y/n] y
Process 76160 exited with status = 9 (0x00000009) killed
Process 76166 launched: '/Volumes/ccl/a.out' (arm64)
Process 76166 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGILL
    frame #0: 0x0000000100000498 a.out`crash at handler.c:16:5
   13  	#if defined(__x86_64__)
   14  	    asm("int $0xc0");           /* will cause SIGSEGV */
   15  	#elif defined(__arm64__)
-> 16  	    asm("udf #0");              /* will cause SIGILL  */
   17  	#else
   18  	    #error "Did Apple change processor architectures again?"
   19  	#endif
Target 0: (a.out) stopped.
(lldb)

We now observe that the “stop reason” is SIGILL (on arm64 anyway). This time, if we continue, we hit the breakpoint on the handler function, and we can actually see what is going on!

(lldb) cont
Process 76166 resuming
Process 76166 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100000480 a.out`handler(signum=4, info=0x000000016fdfeab0, context=0x000000016fdfeb18) at handler.c:7:5
   4   	
   5   	void handler(int signum, siginfo_t *info, ucontext_t *context)
   6   	{
-> 7   	    write(STDOUT_FILENO, "yow\n", 4);
   8   	    _exit(1);
   9   	}
   10  	
Target 0: (a.out) stopped.
(lldb) 

If we know that we will want to handle SIGILL frequently as part of normal operation (as an arm64 port of CCL will probably want to do), we can tell the debugger what to do when a signal arrives.

(lldb) process handle SIGILL --pass true --stop false --notify false
NAME         PASS   STOP   NOTIFY
===========  =====  =====  ======
SIGILL       true   false  false

And now, when we restart the program, we stop right in the signal handler.

(lldb) run
There is a running process, kill it and restart?: [Y/n] y
Process 76166 exited with status = 9 (0x00000009) killed
Process 76186 launched: '/Volumes/ccl/a.out' (arm64)
Process 76186 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100000480 a.out`handler(signum=4, info=0x000000016fdfeab0, context=0x000000016fdfeb18) at handler.c:7:5
   4   	
   5   	void handler(int signum, siginfo_t *info, ucontext_t *context)
   6   	{
-> 7   	    write(STDOUT_FILENO, "yow\n", 4);
   8   	    _exit(1);
   9   	}
   10  	
Target 0: (a.out) stopped.
(lldb) 

CCL on the Mac previously used the Mach exception handling interface directly, in large part because it was all but impossible to debug signal handlers after the debugger stopped on EXC_BAD_ACCESS.

The GitHub issue https://github.com/llvm/llvm-project/issues/60438 reports the same long-standing problem that we observed with CCL, and one of the issue's comments notes that the platform.plugin.darwin.ignored-exceptions setting appeared in https://github.com/llvm/llvm-project/commit/bff4673b41781ec5bff6b96b52cf321d2271726c (which was first present in the lldb 15.x releases).