
Chapter 12. The Foreign-Function Interface

12.11. Tutorial: Allocating Foreign Data on the Lisp Heap

Not every foreign function is as marvelously easy to use as the ones we saw in the last section. Some of them require you to allocate a C struct, fill it in with your own information, and pass them a pointer to the struct. Others require you to allocate an empty struct for them to fill in, after which you read the information out of it.

Also, some of them have their own structs and return a pointer to that same struct every time you call them, but those are easier to deal with, so they won't be covered in this section.

You might know that Lisp (like most programming languages) uses two separate regions of memory. There's the stack, which is where variable bindings are kept. Memory on the stack is allocated every time a function is called and deallocated when it returns, so it's useful for anything that doesn't need to outlive one function call and isn't shared with another thread. If that's all you need, stack allocation will do.

Then there's the heap, which holds everything else and is our subject here. Putting things on the heap rather than the stack has two advantages and one big disadvantage. The first advantage is that data allocated on the heap can be passed outside the scope in which it was created. This is useful for data which needs to be shared between multiple C calls or multiple threads. Also, some data may be too large to copy repeatedly, or too large to allocate on the stack at all.
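To make the scope point concrete in C, the language of our test library, here is a minimal sketch. The name make_squares is hypothetical and not part of ptrtest.c. The function returns a heap block to its caller. A local array could not do this, because the array would disappear when the function returned.

```c
#include <stdlib.h>

/* Hypothetical helper. Returns a heap-allocated array holding the
   first n squares (0, 1, 4, ...), or NULL if n is 0 or malloc fails.
   The block outlives this call; the caller owns it and must free() it.
   Returning the address of a local array instead would hand the
   caller a dangling pointer. */
int *make_squares(size_t n)
{
    if (n == 0)
        return NULL;
    int *squares = malloc(n * sizeof *squares);
    if (squares == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        squares[i] = (int)(i * i);
    return squares;
}
```

A caller might write int *sq = make_squares(4);, use sq[0] through sq[3], and then call free(sq) once it's finished with the block.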

The second advantage is security. If incoming data is copied directly into a fixed-size buffer on the stack, input longer than the buffer overflows it. It then overwrites whatever lies next to it on the stack, including the function's return address. Lisp users don't generally worry about this, since the garbage collector handles memory management for them. However, this kind of "stack smashing" is one of the classic exploits in C, and malicious hackers use it to gain control of a machine. Not checking external data is always a bad idea; however, allocating it on the heap at least offers more protection than direct stack allocation.
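Here is a minimal C sketch of the safer pattern, assuming the input length is known. It sizes the buffer to the actual input instead of copying into a fixed-size local array. The name copy_input is hypothetical. A heap buffer can still overflow if its size is computed wrongly, so this complements checking the input rather than replacing it.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper. Copies len bytes of untrusted input into a heap
   buffer sized to fit them plus a terminating NUL. A fixed-size local
   array such as char buf[64] would instead be overrun, stack frame and
   all, by any input that doesn't fit. Returns NULL if len is too large
   or malloc fails; the caller must free() the result. */
char *copy_input(const char *input, size_t len)
{
    if (len == SIZE_MAX)
        return NULL;
    char *buf = malloc(len + 1);
    if (buf == NULL)
        return NULL;
    memcpy(buf, input, len);
    buf[len] = '\0';
    return buf;
}
```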

The big disadvantage of allocating data on the heap is that it must be explicitly deallocated: you need to "free" it when you're done with it. Ordinarily, in Lisp, you don't allocate memory yourself. The garbage collector keeps track of it, so you never have to think about it again. Here, memory management becomes a manual process, just like in C and C++.

What that means is that if you allocate something and then lose track of the pointer to it, there's no way to ever free that memory. That's called a memory leak, and if your program leaks enough memory, it will eventually use up all available memory! So be careful not to lose your pointers.
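To make a leak visible, this C sketch wraps malloc and free in counting helpers. The names tracked_malloc, tracked_free and leak_demo are hypothetical and illustrative, not library functions. Overwriting the only pointer to a block leaves that block allocated, with no way to free it.

```c
#include <stdlib.h>

/* Hypothetical bookkeeping: the number of blocks handed out by
   tracked_malloc() and not yet released by tracked_free(). */
static int live_blocks = 0;

void *tracked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        live_blocks++;
    return p;
}

void tracked_free(void *p)
{
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

/* Allocates two blocks through one variable. The second assignment
   overwrites the only pointer to the first block, so only the second
   can be freed. Returns how many of its blocks are still live. */
int leak_demo(void)
{
    int before = live_blocks;
    int *p = tracked_malloc(sizeof *p);
    p = tracked_malloc(sizeof *p);   /* first block is now unreachable */
    tracked_free(p);                 /* frees only the second block */
    return live_blocks - before;
}
```

When both allocations succeed, leak_demo() returns 1. One block is still allocated, and nothing in the program knows its address any more.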

That disadvantage, though, is also an advantage when using foreign functions. Since the garbage collector doesn't know about this memory, it will never move it around. External C code depends on this, because unlike Lisp code it can't follow an object to its new location after the garbage collector moves it. If you allocate data manually, you can pass it to foreign code and know that the code can do whatever it needs with that data until you deallocate it. Of course, you'd better be sure the foreign code is done with it before you deallocate it. Otherwise, your program will be unstable and might crash some time later. You'll also have trouble figuring out the cause, because nothing will point back and say "you deallocated this too soon."

And, so, on to the code...

As in the last tutorial, our first step is to create a local dynamic library in order to help show what is actually going on between CCL and C. So, create the file ptrtest.c, with the following code:

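#include <stdio.h>

/* Reverse, in place, an array of dataobjs ints. The loop index is
   unsigned so that it compares cleanly with the unsigned count. */
void reverse_int_array(int * data, unsigned int dataobjs)
{
    unsigned int i;
    int t;
    
    for(i=0; i<dataobjs/2; i++)
        {
            t = *(data+i);
            *(data+i) = *(data+dataobjs-1-i);
            *(data+dataobjs-1-i) = t;
        }
}

/* Reverse, in place, an array of ptrobjs pointers to int.
   Only the pointers move; the ints they point to are untouched. */
void reverse_int_ptr_array(int **ptrs, unsigned int ptrobjs)
{
    int *t;
    unsigned int i;
    
    for(i=0; i<ptrobjs/2; i++)
        {
            t = *(ptrs+i);
            *(ptrs+i) = *(ptrs+ptrobjs-1-i);
            *(ptrs+ptrobjs-1-i) = t;
        }
}

/* Expects an array of exactly two pointers, each to an array of
   exactly four ints. Swaps the two pointers, then reverses each
   four-int array. */
void
reverse_int_ptr_ptrtest(int **ptrs)
{
    reverse_int_ptr_array(ptrs, 2);
    
    reverse_int_array(*(ptrs+0), 4);
    reverse_int_array(*(ptrs+1), 4);
}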

This defines three functions. reverse_int_array takes a pointer to an array of ints, and a count telling how many items are in the array, and loops through it putting the elements in reverse. reverse_int_ptr_array does the same thing, but with an array of pointers to ints. It only reverses the order the pointers are in; each pointer still points to the same thing. reverse_int_ptr_ptrtest takes an array of pointers to arrays of ints. (With me?) It doesn't need to be told their sizes; it just assumes that the array of pointers has two items, and that both of those are arrays which have four items. It reverses the array of pointers, then it reverses each of the two arrays of ints.
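The layout that reverse_int_ptr_ptrtest expects is easier to see when you build it by hand. This C sketch assembles it on the heap. The functions make_ptrtest_input and free_ptrtest_input are hypothetical helpers, not part of ptrtest.c.

```c
#include <stdlib.h>

/* Hypothetical helper: builds an array of two pointers, each pointing
   to its own array of four ints, all on the C heap. Row 0 holds 1..4
   and row 1 holds 5..8. Returns NULL if any allocation fails. */
int **make_ptrtest_input(void)
{
    int **ptrs = malloc(2 * sizeof *ptrs);
    if (ptrs == NULL)
        return NULL;
    for (int row = 0; row < 2; row++) {
        ptrs[row] = malloc(4 * sizeof *ptrs[row]);
        if (ptrs[row] == NULL) {
            /* Undo the partial allocation before giving up. */
            while (row-- > 0)
                free(ptrs[row]);
            free(ptrs);
            return NULL;
        }
        for (int col = 0; col < 4; col++)
            ptrs[row][col] = row * 4 + col + 1;
    }
    return ptrs;
}

/* Hypothetical helper: frees the two int arrays first, then the
   array of pointers that referred to them. */
void free_ptrtest_input(int **ptrs)
{
    if (ptrs == NULL)
        return;
    free(ptrs[0]);
    free(ptrs[1]);
    free(ptrs);
}
```

Passing the result of make_ptrtest_input() to reverse_int_ptr_ptrtest() would leave ptrs[0] pointing at {8, 7, 6, 5} and ptrs[1] pointing at {4, 3, 2, 1]. The pointers trade places, and then each four-int array is reversed.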

Now, compile ptrtest.c into a dynamic library using the command:

      gcc -dynamiclib -Wall -o libptrtest.dylib ptrtest.c -install_name ./libptrtest.dylib
    

If that command doesn't make sense to you, feel free to go back and read about it in the previous tutorial.

Now, start CCL and enter:

      ? ;; make-heap-ivector courtesy of Gary Byers
      (defun make-heap-ivector (element-count element-type)
        (let* ((subtag (ccl::element-type-subtype element-type)))
          (unless (= (logand subtag target::fulltagmask)
                     target::fulltag-immheader)
            (error "~s is not an ivector subtype." element-type))
          (let* ((size-in-bytes (ccl::subtag-bytes subtag element-count)))
            (ccl::%make-heap-ivector subtag size-in-bytes element-count))))
      MAKE-HEAP-IVECTOR

      ? ;; dispose-heap-ivector created for symmetry
      (defmacro dispose-heap-ivector (a mp)
        `(progn
           (ccl::%dispose-heap-ivector ,a)
           ;; Demolish the arguments for safety
           (setf ,a nil)
           (setf ,mp nil)))
      DISPOSE-HEAP-IVECTOR
    

If you don't understand how those functions do what they do, that's okay. The details are very fine-grained and really don't matter here, because you don't need to change them.

The function make-heap-ivector is the primary tool for allocating objects in heap memory. It allocates a fixed-size CCL object (an ivector) in heap memory and returns two values. The first is an array reference, which can be used directly from CCL. The second is a macptr, which can be used to access the underlying memory directly. Both refer to the same block of memory. For example:

      ? ;; Create an array of 3 4-byte-long integers
      (multiple-value-bind (la lap)
          (make-heap-ivector 3 '(unsigned-byte 32))
        (setq a la)
        (setq ap lap))
      ;Compiler warnings :
      ;   Undeclared free variable A, in an anonymous lambda form.
      ;   Undeclared free variable AP, in an anonymous lambda form.
      #<A Mac Pointer #x10217C>

      ? a
      #(1396 2578 97862649)

      ? ap
      #<A Mac Pointer #x10217C>
    

It's important to realize that the contents of the ivector we've just created haven't been initialized. Their values are unpredictable; the numbers above are just whatever happened to be in that memory. Be sure not to read from them before you set them, to avoid confusing results.

At this point, a references an object which works just like a normal array. You can read any item of it with the standard aref function, and set items by combining aref with setf. As noted above, the ivector's contents haven't been initialized, so that's the next order of business:

      ? a
      #(1396 2578 97862649)

      ? (aref a 2)
      97862649

      ? (setf (aref a 0) 3)
      3

      ? (setf (aref a 1) 4)
      4

      ? (setf (aref a 2) 5)
      5

      ? a
      #(3 4 5)
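C's malloc() behaves the same way as make-heap-ivector here: the block it returns is uninitialized. The minimal sketch below mirrors the setf calls above by writing every slot before anyone can read it. The name make_filled_u32 is hypothetical. If zero-filled memory is good enough, calloc() is the standard alternative.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates count 32-bit unsigned slots and fills
   every one from values before returning, so no caller can read
   uninitialized memory. Returns NULL if count is 0 or malloc fails;
   the caller must free() the result. */
uint32_t *make_filled_u32(size_t count, const uint32_t *values)
{
    if (count == 0)
        return NULL;
    uint32_t *v = malloc(count * sizeof *v);
    if (v == NULL)
        return NULL;
    memcpy(v, values, count * sizeof *v);
    return v;
}
```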
    

In addition, the macptr allows direct access to the same memory:

      ? (setq *byte-length-of-long* 4)
      4

      ? (%get-signed-long ap (* 2 *byte-length-of-long*))
      5

      ? (%get-signed-long ap (* 0 *byte-length-of-long*))
      3

      ? (setf (%get-signed-long ap (* 0 *byte-length-of-long*)) 6)
      6

      ? (setf (%get-signed-long ap (* 2 *byte-length-of-long*)) 7)
      7

      ? ;; Show that a actually got changed through ap
      a
      #(6 4 7)
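The byte offsets passed to %get-signed-long are plain pointer arithmetic. Element i of a vector of 4-byte integers starts i * 4 bytes past the base address. Here is a C sketch of the same access. The names get_signed_long and set_signed_long are hypothetical and only mirror the Lisp accessor; they are not a real C API.

```c
#include <stdint.h>
#include <string.h>

/* Reads a signed 32-bit value at a byte offset from a raw pointer, as
   (%get-signed-long ap offset) does. memcpy() sidesteps alignment and
   aliasing problems. The caller must keep offset + 4 inside the block. */
int32_t get_signed_long(const void *base, size_t byte_offset)
{
    int32_t value;
    memcpy(&value, (const unsigned char *)base + byte_offset, sizeof value);
    return value;
}

/* Writes a signed 32-bit value at a byte offset, like
   (setf (%get-signed-long ap offset) value). */
void set_signed_long(void *base, size_t byte_offset, int32_t value)
{
    memcpy((unsigned char *)base + byte_offset, &value, sizeof value);
}
```

Given uint32_t a[3] = {3, 4, 5}, get_signed_long(a, 2 * 4) reads 5. Writing 6 at offset 0 and 7 at offset 8 then leaves a holding {6, 4, 7}, just as in the transcript.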
    

So far, there is nothing about this object that could not be done much better with standard Lisp. However, the macptr can be used to pass this chunk of memory off to a C function. Let's use the C code to reverse the elements in the array:

      ? ;; Insert the full path to your copy of libptrtest.dylib
      (open-shared-library "/Users/andrewl/openmcl/openmcl/gtk/libptrtest.dylib")
      #<SHLIB /Users/andrewl/openmcl/openmcl/gtk/libptrtest.dylib #x639D1E6>

      ? a
      #(6 4 7)

      ? ap
      #<A Mac Pointer #x10217C>

      ? (external-call "_reverse_int_array" :address ap :unsigned-int (length a) :void)
      NIL

      ? a
      #(7 4 6)

      ? ap
      #<A Mac Pointer #x10217C>
    

The array gets passed correctly to the C function, reverse_int_array. The C function reverses the contents of the array in place; that is, it doesn't make a new array, it just keeps the same one and reverses what's in it. Finally, the C function returns control to CCL. Since a and ap refer to the same memory, which the C code modified directly, the change shows up in a as well.

There is one final bit of housekeeping to deal with. Before moving on, the memory needs to be deallocated:

      ? ;; dispose-heap-ivector created for symmetry
      ;; Macro repeated here for pedagogy
      (defmacro dispose-heap-ivector (a mp)
        `(progn
           (ccl::%dispose-heap-ivector ,a)
           ;; Demolish the arguments for safety
           (setf ,a nil)
           (setf ,mp nil)))
      DISPOSE-HEAP-IVECTOR

      ? (dispose-heap-ivector a ap)
      NIL

      ? a
      NIL

      ? ap
      NIL
    

The dispose-heap-ivector macro actually deallocates the ivector, releasing its memory back to the heap for something else to use. It also sets the variables it was called with to nil. Otherwise they would still reference the ivector's memory, which is no longer allocated, and using them would be a bug. Making sure no other variables still refer to that memory is up to you.
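In C, the usual counterpart to dispose-heap-ivector is a small macro that frees a block and then clears the variable. FREE_AND_NULL is a common idiom rather than a standard library macro, and the name here is illustrative.

```c
#include <stdlib.h>

/* Illustrative macro, not part of any standard library: free the block,
   then clear the variable so it can't be used to reach the freed memory
   by accident. The do/while (0) wrapper makes the macro behave like a
   single statement, even inside an unbraced if/else. */
#define FREE_AND_NULL(p) \
    do {                 \
        free(p);         \
        (p) = NULL;      \
    } while (0)
```

As with dispose-heap-ivector, only the variable you pass in is cleared. Any other pointer to the same block still holds the stale address, so keeping track of those remains your job.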

When do you call dispose-heap-ivector? Any time after you know the ivector will never be used again, but no sooner. If you have a lot of ivectors, say, in a hash table, you need to make sure they all get freed once you're done with the hash table. Unless something somewhere else still refers to them, of course! The right strategy depends on the situation, so try to keep things simple unless you know better.
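One simple strategy for the many-ivectors case is to record every block in a single container and free them all together once the work is done. Here is a C sketch of the idea, with a fixed-size array standing in for the hash table. The names block_table, table_alloc and table_free_all are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical container of heap blocks, standing in for a hash table
   of ivectors. */
struct block_table {
    void  *blocks[16];
    size_t count;
};

/* Allocates a block of the given size and records it. Returns NULL if
   the table is full or malloc fails. */
void *table_alloc(struct block_table *t, size_t size)
{
    if (t->count == sizeof t->blocks / sizeof t->blocks[0])
        return NULL;
    void *p = malloc(size);
    if (p != NULL)
        t->blocks[t->count++] = p;
    return p;
}

/* Frees every recorded block exactly once and clears the slots. Only
   safe once nothing else still refers to any of the blocks. */
void table_free_all(struct block_table *t)
{
    for (size_t i = 0; i < t->count; i++) {
        free(t->blocks[i]);
        t->blocks[i] = NULL;
    }
    t->count = 0;
}
```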

The simplest situation is when you have things set up so that a Lisp object "encapsulates" a pointer to foreign data, taking care of all the details of using it. In this case, you don't want those two things to have different lifetimes: You want to make sure your Lisp object exists as long as the foreign data does, and no longer; and you want to make sure the foreign data doesn't get deallocated while your Lisp object still refers to it.

If you're willing to accept a few limitations, you can make this easy. First, you can't let foreign code keep a permanent pointer to the memory; it has to always finish what it's doing, then return, and not refer to that memory again. Second, you can't let any Lisp code that isn't part of your encapsulating "wrapper" refer to the pointer directly. Third, nothing, either foreign code or Lisp code, should explicitly deallocate the memory.

If you can make sure all of these are true, you can at least ensure that the foreign pointer is deallocated when the encapsulating object is about to become garbage, by using CCL's nonstandard "termination" mechanism, which is essentially the same as what Java and other languages call "finalization".

Termination is a way of asking the garbage collector to let you know when it's about to destroy an object which isn't used anymore. Before destroying the object, it calls a function which you write, called a terminator.

So, you can use termination to find out when a particular macptr is about to become garbage. That's not quite as helpful as it might seem: It's not exactly the same thing as knowing that the block of memory it points to is unreferenced. For example, there could be another macptr somewhere to the same block; or, if it's a struct, there could be a macptr to one of its fields. Most problematically, if the address of that memory has been passed to foreign code, it's sometimes hard to know whether that code has kept the pointer. Most foreign functions don't, but it's not hard to think of exceptions.

You can use code such as this to make all this happen:

      (defclass wrapper ()    ; add whatever superclasses you need
        ((element-type :initarg :element-type)
         (element-count :initarg :element-count)
         (ivector :initform nil)
         (macptr :initform nil)))

      (defmethod initialize-instance :after ((wrapper wrapper) &key)
        (with-slots (ivector macptr element-type element-count) wrapper
          (multiple-value-bind (new-ivector new-macptr)
              (make-heap-ivector element-count element-type)
            (setq ivector new-ivector
                  macptr new-macptr)))
        ;; Register for termination only once there is memory to free,
        ;; so ccl:terminate never sees a half-initialized wrapper.
        (ccl:terminate-when-unreachable wrapper))

      (defmethod ccl:terminate ((wrapper wrapper))
        (with-slots (ivector macptr) wrapper
          (when ivector
            ;; dispose-heap-ivector also sets both slots to NIL,
            ;; so a second call does nothing.
            (dispose-heap-ivector ivector macptr))))
    

The ccl:terminate method will be called on some arbitrary thread sometime (hopefully soon) after the GC has decided that there are no strong references to an object which has been the argument of a ccl:terminate-when-unreachable call.

If it makes sense to say that the foreign object should live as long as there's Lisp code that references it (through the encapsulating object) and no longer, this is one way of doing that.
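For comparison, here is the same encapsulation idea sketched in C. The int_wrapper struct and its functions are hypothetical names. The wrapper owns its block, only the wrapper's functions touch the data, and wrapper_destroy is the single place the block is freed. C has no counterpart to ccl:terminate, so the caller still has to call wrapper_destroy explicitly.

```c
#include <stdlib.h>

/* Hypothetical wrapper that owns a heap block of ints. */
struct int_wrapper {
    int   *data;
    size_t count;
};

/* Allocates the wrapper and a zero-filled block together. */
struct int_wrapper *wrapper_create(size_t count)
{
    struct int_wrapper *w = malloc(sizeof *w);
    if (w == NULL)
        return NULL;
    w->data = calloc(count, sizeof *w->data);
    if (w->data == NULL && count != 0) {
        free(w);
        return NULL;
    }
    w->count = count;
    return w;
}

/* Bounds-checked accessors: the only code that touches w->data.
   wrapper_get returns 0 for an out-of-range index. */
int wrapper_get(const struct int_wrapper *w, size_t i)
{
    return i < w->count ? w->data[i] : 0;
}

void wrapper_set(struct int_wrapper *w, size_t i, int value)
{
    if (i < w->count)
        w->data[i] = value;
}

/* Frees the block and the wrapper together, so their lifetimes match. */
void wrapper_destroy(struct int_wrapper *w)
{
    if (w != NULL) {
        free(w->data);
        free(w);
    }
}
```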

Now we've covered passing basic types back and forth with C, and we've done the same with pointers. You may think this is all... but we've only done pointers to basic types. Join us next time for pointers... to pointers.

12.11.1. Acknowledgement

Much of this chapter was generously contributed by Andrew P. Lentvorski Jr.

