Clozure CL Documentation

12.11. Tutorial: Allocating Foreign Data on the Lisp Heap

Not every foreign function is so marvelously easy to use as the ones we saw in the last section. Some functions require you to allocate a C struct, fill it with your own information, and pass in a pointer to that struct. Some of them require you to allocate an empty struct that they will fill in so that you can read the information out of it.

There are generally two ways to allocate foreign data. The first way is to allocate it on the stack; the RLET macro is one way to do this. This is analogous to using automatic variables in C. In the jargon of Common Lisp, data allocated this way is said to have dynamic extent.

The other way to heap-allocate the foreign data. This is analogous to calling malloc in C. Again in the jargon of Common Lisp, heap-allocated data is said to have indefinite extent. If a function heap-allocates some data, that data remains valid even after the function itself exits. This is useful for data which may need to be passed between multiple C calls or multiple threads. Also, some data may be too large to copy multiple times or may be too large to allocate on the stack.

The big disadvantage to allocating data on the heap is that it must be explicitly deallocated—you need to "free" it when you're done with it. Normal Lisp objects, even those with indefinite extent, are deallocated by the garbage collector when it can prove that they're no longer referenced. Foreign data, though, is outside the GC's ken: it has no way to know whether a blob of foreign data is still referenced by foreign code or not. It is thus up to the programmer to manage it manually, just as one does in C with malloc and free.

What that means is that, if you allocate something and then lose track of the pointer to it, there's no way to ever free that memory. That's what's called a memory leak, and if your program leaks enough memory it will eventually use up all of it! So, you need to be careful to not lose your pointers.

That disadvantage, though, is also an advantage for using foreign functions. Since the garbage collector doesn't know about this memory, it will never move it around. External C code needs this, because it doesn't know how to follow it to where it moved, the way that Lisp code does. If you allocate data manually, you can pass it to foreign code and know that no matter what that code needs to do with it, it will be able to, until you deallocate it. Of course, you'd better be sure it's done before you do. Otherwise, your program will be unstable and might crash sometime in the future, and you'll have trouble figuring out what caused the trouble, because there won't be anything pointing back and saying "you deallocated this too soon."

And, so, on to the code...

As in the last tutorial, our first step is to create a local dynamic library in order to help show what is actually going on between CCL and C. So, create the file ptrtest.c, with the following code:

#include <stdio.h>

void reverse_int_array(int * data, unsigned int dataobjs)
{
    int i, t;
    
    for(i=0; i<dataobjs/2; i++)
        {
            t = *(data+i);
            *(data+i) = *(data+dataobjs-1-i);
            *(data+dataobjs-1-i) = t;
        }
}

void reverse_int_ptr_array(int **ptrs, unsigned int ptrobjs)
{
    int *t;
    int i;
    
    for(i=0; i<ptrobjs/2; i++)
        {
            t = *(ptrs+i);
            *(ptrs+i) = *(ptrs+ptrobjs-1-i);
            *(ptrs+ptrobjs-1-i) = t;
        }
}

void
reverse_int_ptr_ptrtest(int **ptrs)
{
    reverse_int_ptr_array(ptrs, 2);
    
    reverse_int_array(*(ptrs+0), 4);
    reverse_int_array(*(ptrs+1), 4);
}

This defines three functions. reverse_int_array takes a pointer to an array of ints, and a count telling how many items are in the array, and loops through it putting the elements in reverse. reverse_int_ptr_array does the same thing, but with an array of pointers to ints. It only reverses the order the pointers are in; each pointer still points to the same thing. reverse_int_ptr_ptrtest takes an array of pointers to arrays of ints. (With me?) It doesn't need to be told their sizes; it just assumes that the array of pointers has two items, and that both of those are arrays which have four items. It reverses the array of pointers, then it reverses each of the two arrays of ints.

Now, compile ptrtest.c into a dynamic library using the command:

      gcc -dynamiclib -Wall -o libptrtest.dylib ptrtest.c -install_name ./libptrtest.dylib

The function make-heap-ivector is the primary tool for allocating objects in heap memory. It allocates a fixed-size CCL object in heap memory. It returns both an array reference, which can be used directly from CCL, and a macptr, which can be used to access the underlying memory directly. For example:

      ? ;; Create an array of 3 4-byte-long integers
      (multiple-value-bind (la lap)
          (make-heap-ivector 3 '(unsigned-byte 32))
        (setq a la)
        (setq ap lap))
      ;Compiler warnings :
      ;   Undeclared free variable A, in an anonymous lambda form.
      ;   Undeclared free variable AP, in an anonymous lambda form.
      #<A Mac Pointer #x10217C>

      ? a
      #(1396 2578 97862649)

      ? ap
      #<A Mac Pointer #x10217C>

It's important to realize that the contents of the ivector we've just created haven't been initialized, so their values are unpredictable, and you should be sure not to read from them before you set them, to avoid confusing results.

At this point, a references an object which works just like a normal array. You can refer to any item of it with the standard aref function, and set them by combining that with setf. As noted above, the ivector's contents haven't been initialized, so that's the next order of business:

      ? a
      #(1396 2578 97862649)

      ? (aref a 2)
      97862649

      ? (setf (aref a 0) 3)
      3

      ? (setf (aref a 1) 4)
      4

      ? (setf (aref a 2) 5)
      5

      ? a
      #(3 4 5)

In addition, the macptr allows direct access to the same memory:

      ? (setq *byte-length-of-long* 4)
      4

      ? (%get-signed-long ap (* 2 *byte-length-of-long*))
      5

      ? (%get-signed-long ap (* 0 *byte-length-of-long*))
      3

      ? (setf (%get-signed-long ap (* 0 *byte-length-of-long*)) 6)
      6

      ? (setf (%get-signed-long ap (* 2 *byte-length-of-long*)) 7)
      7

      ? ;; Show that a actually got changed through ap
      a
      #(6 4 7)

So far, there is nothing about this object that could not be done much better with standard Lisp. However, the macptr can be used to pass this chunk of memory off to a C function. Let's use the C code to reverse the elements in the array:

      ? ;; Insert the full path to your copy of libptrtest.dylib
      (open-shared-library "/Users/andrewl/openmcl/openmcl/gtk/libptrtest.dylib")
      #<SHLIB /Users/andrewl/openmcl/openmcl/gtk/libptrtest.dylib #x639D1E6>

      ? a
      #(6 4 7)

      ? ap
      #<A Mac Pointer #x10217C>

      ? (external-call "_reverse_int_array" :address ap :unsigned-int (length a) :address)
      #<A Mac Pointer #x10217C>

      ? a
      #(7 4 6)

      ? ap
      #<A Mac Pointer #x10217C>

The array gets passed correctly to the C function, reverse_int_array. The C function reverses the contents of the array in-place; that is, it doesn't make a new array, just keeps the same one and reverses what's in it. Finally, the C function passes control back to CCL. Since the allocated array memory has been directly modified, CCL reflects those changes directly in the array as well.

There is one final bit of housekeeping to deal with. Before moving on, the memory needs to be deallocated:

      ? (dispose-heap-ivector a ap)
      NIL

The dispose-heap-ivector macro actually deallocates the ivector, releasing its memory into the heap for something else to use. Both a and ap now have undefined values.

When do you call dispose-heap-ivector? Anytime after you know the ivector will never be used again, but no sooner. If you have a lot of ivectors, say, in a hash table, you need to make sure that when whatever you were doing with the hash table is done, those ivectors all get freed. Unless there's still something somewhere else which refers to them, of course! Exactly what strategy to take depends on the situation, so just try to keep things simple unless you know better.

The simplest situation is when you have things set up so that a Lisp object "encapsulates" a pointer to foreign data, taking care of all the details of using it. In this case, you don't want those two things to have different lifetimes: You want to make sure your Lisp object exists as long as the foreign data does, and no longer; and you want to make sure the foreign data doesn't get deallocated while your Lisp object still refers to it.

If you're willing to accept a few limitations, you can make this easy. First, you can't let foreign code keep a permanent pointer to the memory; it has to always finish what it's doing, then return, and not refer to that memory again. Second, you can't let any Lisp code that isn't part of your encapsulating "wrapper" refer to the pointer directly. Third, nothing, either foreign code or Lisp code, should explicitly deallocate the memory.

If you can make sure all of these are true, you can at least ensure that the foreign pointer is deallocated when the encapsulating object is about to become garbage, by using CCL's nonstandard "termination" mechanism, which is essentially the same as what Java and other languages call "finalization".

Termination is a way of asking the garbage collector to let you know when it's about to destroy an object which isn't used anymore. Before destroying the object, it calls a function which you write, called a terminator.

So, you can use termination to find out when a particular macptr is about to become garbage. That's not quite as helpful as it might seem: It's not exactly the same thing as knowing that the block of memory it points to is unreferenced. For example, there could be another macptr somewhere to the same block; or, if it's a struct, there could be a macptr to one of its fields. Most problematically, if the address of that memory has been passed to foreign code, it's sometimes hard to know whether that code has kept the pointer. Most foreign functions don't, but it's not hard to think of exceptions.

You can use code such as this to make all this happen:

      (defclass wrapper (whatever)
        ((element-type :initarg :element-type)
         (element-count :initarg :element-count)
         (ivector)
         (macptr)))

      (defmethod initialize-instance ((wrapper wrapper) &rest initargs)
        (declare (ignore initargs))
        (call-next-method)
        (ccl:terminate-when-unreachable wrapper)
        (with-slots (ivector macptr element-type element-count) wrapper
          (multiple-value-bind (new-ivector new-macptr)
              (make-heap-ivector element-count element-type)
            (setq ivector new-ivector
                  macptr new-macptr))))

      (defmethod ccl:terminate ((wrapper wrapper))
        (with-slots (ivector macptr) wrapper
          (when ivector
            (dispose-heap-ivector ivector macptr)
            (setq ivector nil
                  macptr nil))))

The ccl:terminate method will be called on some arbitrary thread sometime (hopefully soon) after the GC has decided that there are no strong references to an object which has been the argument of a ccl:terminate-when-unreachable call.

If it makes sense to say that the foreign object should live as long as there's Lisp code that references it (through the encapsulating object) and no longer, this is one way of doing that.

Now we've covered passing basic types back and forth with C, and we've done the same with pointers. You may think this is all... but we've only done pointers to basic types. Join us next time for pointers... to pointers.

12.11.1. Acknowledgement

Much of this chapter was generously contributed by Andrew P. Lentvorski Jr.