00:54:34 -!- alms_ [~alms_@209-6-130-32.c3-0.bkl-ubr1.sbo-bkl.ma.cable.rcn.com] has quit [Quit: alms_]
00:55:27 alms_ [~alms_@209-6-130-32.c3-0.bkl-ubr1.sbo-bkl.ma.cable.rcn.com] has joined #ccl
01:26:51 -!- pjb [~user@90.24.195.13] has quit [Ping timeout: 240 seconds]
02:15:28 pjb [~t@AMontsouris-651-1-93-50.w82-123.abo.wanadoo.fr] has joined #ccl
04:11:39 -!- PuffTheMagic [uid3325@gateway/web/irccloud.com/x-cjilitlszzyqvgsw] has quit [Ping timeout: 260 seconds]
04:21:35 -!- Fare [fare@nat/google/x-jxzoropvsaghyqhn] has quit [Ping timeout: 256 seconds]
06:09:58 PuffTheMagic [uid3325@gateway/web/irccloud.com/x-pqszzucqvdjualou] has joined #ccl
06:49:59 DataLinkDroid [~DataLinkD@CPE-121-217-85-18.lnse1.cht.bigpond.net.au] has joined #ccl
09:13:22 segv- [~mb@dslb-188-102-168-176.pools.arcor-ip.net] has joined #ccl
10:07:17 -!- DataLinkDroid [~DataLinkD@CPE-121-217-85-18.lnse1.cht.bigpond.net.au] has quit [Quit: Bye]
10:19:46 -!- dented42 [~dented42@opengroove.org] has quit [Ping timeout: 255 seconds]
10:19:46 -!- dmiles_afk [~dmiles@c-71-237-234-93.hsd1.or.comcast.net] has quit [Ping timeout: 255 seconds]
10:19:47 -!- faheem [~faheem@bulldog.duhs.duke.edu] has quit [Ping timeout: 255 seconds]
10:19:48 -!- |3b| [foobar@cpe-72-177-66-41.austin.res.rr.com] has quit [Ping timeout: 260 seconds]
10:20:43 dmiles_afk [~dmiles@c-71-237-234-93.hsd1.or.comcast.net] has joined #ccl
10:20:58 |3b|` [foobar@cpe-72-177-66-41.austin.res.rr.com] has joined #ccl
10:21:28 fourOfTwelve [~dented42@opengroove.org] has joined #ccl
10:23:48 faheem [~faheem@bulldog.duhs.duke.edu] has joined #ccl
10:46:58 -!- alms_ [~alms_@209-6-130-32.c3-0.bkl-ubr1.sbo-bkl.ma.cable.rcn.com] has quit [Quit: alms_]
12:52:29 -!- segv- [~mb@dslb-188-102-168-176.pools.arcor-ip.net] has quit [Remote host closed the connection]
12:53:10 Fare [fare@nat/google/x-kaqaumqizrmypfst] has joined #ccl
13:48:50 alms_ [~alms_@209-6-130-32.c3-0.bkl-ubr1.sbo-bkl.ma.cable.rcn.com] has joined #ccl
15:15:02 df_ [~df@aldur.bowerham.net] has joined #ccl
15:21:40 -!- PuffTheMagic [uid3325@gateway/web/irccloud.com/x-pqszzucqvdjualou] has quit [*.net *.split]
15:21:40 -!- df___ [~df@aldur.bowerham.net] has quit [*.net *.split]
16:05:29 -!- clop [~jared@moat3.centtech.com] has quit [Quit: Leaving]
16:36:32 -!- sellout [~Adium@c-98-245-92-119.hsd1.co.comcast.net] has quit [Quit: Leaving.]
16:36:55 -!- gz- [~gz@setf.clozure.com] has quit [Quit: Movin' on]
16:37:06 gz [~gz@setf.clozure.com] has joined #ccl
16:37:59 sellout- [~Adium@c-98-245-92-119.hsd1.co.comcast.net] has joined #ccl
16:42:28 -!- fe[nl]ix [~quassel@pdpc/supporter/professional/fenlix] has quit [*.net *.split]
16:43:06 fe[nl]ix [~quassel@pdpc/supporter/professional/fenlix] has joined #ccl
16:46:21 -!- alms_ [~alms_@209-6-130-32.c3-0.bkl-ubr1.sbo-bkl.ma.cable.rcn.com] has quit [Quit: alms_]
16:56:26 alms_ [~alms_@209-6-130-32.c3-0.bkl-ubr1.sbo-bkl.ma.cable.rcn.com] has joined #ccl
17:17:22 -!- Fare [fare@nat/google/x-kaqaumqizrmypfst] has quit [Ping timeout: 256 seconds]
17:21:47 PuffTheMagic [uid3325@gateway/web/irccloud.com/x-amtifxrjqrsmhfka] has joined #ccl
17:24:33 -!- alms_ [~alms_@209-6-130-32.c3-0.bkl-ubr1.sbo-bkl.ma.cable.rcn.com] has quit [Quit: alms_]
17:40:20 Fare [fare@nat/google/x-bbwoqnmivbtmsdty] has joined #ccl
17:41:09 alms_ [~alms_@209-6-130-32.c3-0.bkl-ubr1.sbo-bkl.ma.cable.rcn.com] has joined #ccl
18:01:43 -!- Fare [fare@nat/google/x-bbwoqnmivbtmsdty] has quit [Ping timeout: 264 seconds]
18:50:31 segv- [~mb@dslb-188-102-168-176.pools.arcor-ip.net] has joined #ccl
18:56:53 -!- Vivitron [~Vivitron@pool-98-110-213-33.bstnma.fios.verizon.net] has quit [Quit: trivial-irc-0.0.4]
19:13:03 -!- fourOfTwelve [~dented42@opengroove.org] has quit [Ping timeout: 245 seconds]
19:18:22 dented42 [~dented42@opengroove.org] has joined #ccl
20:16:40 Fare [fare@nat/google/x-jspzlbgpuuuqslvb] has joined #ccl
20:59:27 ahem.
21:01:17 -!- billstclair [~billstcla@unaffiliated/billstclair] has quit [Read error: Connection reset by peer]
21:02:27 My low-level error looks like somehow the class descriptor object for some class segment-key was corrupted, and whenever I try to initialize it (in shared-initialize, called by make-instance), the typecheck for the number slot (which is of type flight-number, which is a deftype to (or (eql blank) (integer 1 9999))) causes a # is not of the expected type (or (or ccl::numeric-ctype ccl::named-ctype ...) ccl::class-ctype ...)
21:09:19 is there anything about CLOS optimizations that could cause this weird behavior?
21:09:47 (and what could I have changed that causes this behavior?)
21:22:25 DataLinkDroid [~DataLinkD@1.146.119.134] has joined #ccl
22:01:59 Fare: does the bug occur if CLOS optimizations are never enabled ?
22:02:27 That's what I'm looking into right now...
22:12:53 Vivitron [~Vivitron@pool-98-110-213-33.bstnma.fios.verizon.net] has joined #ccl
22:20:18 looks the same :-(
22:29:36 OK.
22:33:30 well, at least I have a simple MAKE-INSTANCE form that triggers the bug
22:34:06 I don't know where to go from it. -- How do I disassemble a specific shared-initialize method?
22:34:31 can you disassemble its METHOD-FUNCTION ?
22:35:53 What seems to be getting bashed is a predicate function associated with a slot; the predicate function is stored in a ... slot ... in the slot's SLOT-DEFINITION object.
22:36:55 it's the standard fallback method that's in my stack, btw:
22:38:23 The predicate function calls a low-level form of TYPEP with a canonicalized version of the slot's type (an object of type CCL::CTYPE) as an operand, and that CTYPE object is getting clobbered.
22:39:19 is that the type-predicate slot?
22:39:28 I see it bound to a closure
22:40:43 yes; it contains a function (ok, closure) and the CTYPE is probably a closed-over value referenced by that closure. Let me see if there's an easy way to get your hands on that.
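A minimal, portable sketch of the setup described above, written in standard Common Lisp rather than CCL internals. The `flight-number` deftype is the one quoted in the log; `segment-key-sketch`, its `flight` slot, and `make-slot-type-predicate` are hypothetical stand-ins. The closure mirrors the idea that the slot's type-predicate closes over its type; CCL's real predicate closes over a canonicalized CTYPE object and calls a lower-level check, whereas this sketch just captures the type specifier and calls `typep`.

```lisp
;; The type from the log: either the symbol BLANK or an integer in [1, 9999].
(deftype flight-number ()
  '(or (eql blank) (integer 1 9999)))

;; Hypothetical stand-in for the real SEGMENT-KEY class.
(defclass segment-key-sketch ()
  ((flight :initarg :flight :type flight-number :reader segment-key-flight)))

(defun make-slot-type-predicate (type)
  "Hypothetical helper: return a closure that checks VALUE against TYPE.
TYPE is captured by the closure. (In CCL the captured value is a CTYPE
object; here it is just the type specifier.)"
  (lambda (value) (typep value type)))

;; Kept in a special variable so the predicate can be re-checked later.
(defparameter *flight-number-p*
  (make-slot-type-predicate 'flight-number))
```

In standard CL, whether `make-instance` enforces a slot's `:type` is implementation- and safety-dependent; in the log, CCL's `shared-initialize` does perform the check, which is where the clobbered CTYPE surfaced.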
22:40:45 I can save it in a defparameter thanks to slime
22:41:28 the disassembly does show the bogus object as a constant being passed to check-ctypep
22:42:01 Is it a valid object of type CCL::CTYPE ? (or it's been bashed already.)
22:42:35 OK, so I know *where* something was corrupted. Question being WHAT corrupted it, and how the hell did my libraries affect that.
22:43:42 Conceptually, we'd want to WATCH the CTYPE. It's probably the case that it's getting clobbered simply because it's near some other object in memory and something is writing outside of the bounds of that other object.
22:44:25 I brought in ASDF3 and am using its hooks instead of the previous qres pre-image-dump-hooks, and have moved some of the final optimizations we did in there. But I don't see how that could do it. Also cl-unicode or drakma or hunchentoot or bordeaux-threads updated, but that doesn't look like it should matter.
22:45:24 alternatively, it's at the same memory as a pointer kept to the stack while something else is going on.
22:46:39 or something not-gc-friendly happened, and this was the casualty
22:46:56 The general approach to debugging this is binary search: if the CTYPE is OK at time A and clobbered at time B, look at it again at time (A + B)/2. Repeat.
22:50:26 You can couple that with enabling "GC heap integrity checking", which will force the GC to run some fairly exhaustive tests before and after it runs. Those tests are slow, but would probably find something like this; it'd be interesting to know if this is the only problem or one of several.
22:52:29 If you do (setq ccl::*gc-event-status-bits* 4), those integrity checks will run; if they find inconsistencies, they'll drop into the kernel debugger with a possibly cryptic error message. You can exit (x) from that; the checks may keep reporting the same problem or may report several problems.
22:55:02 thanks
22:55:29 and to check whether the ctype is OK, I run this make-instance and catch any trouble?
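The binary-search strategy described above, sketched as a small function. It assumes the check is monotonic (once the CTYPE is clobbered it stays clobbered) and that the check can be evaluated at an ordered sequence of checkpoints; `first-bad-checkpoint` and the checkpoint vector are hypothetical names, not CCL APIs. In practice each probe may mean re-running the build or startup up to a given point; the function only shows the halving logic.

```lisp
(defun first-bad-checkpoint (checkpoints ok-p)
  "Return the index of the first element of the vector CHECKPOINTS for which
OK-P returns false, assuming OK-P is true on some prefix and false after it.
Return NIL if every checkpoint passes."
  (let ((lo 0)
        (hi (length checkpoints)))
    ;; Invariant: every index below LO passed; every index at or above HI failed.
    (loop while (< lo hi)
          do (let ((mid (floor (+ lo hi) 2)))
               (if (funcall ok-p (aref checkpoints mid))
                   (setf lo (1+ mid))    ; still OK at MID: culprit is later
                   (setf hi mid))))      ; already bad at MID: culprit is at or before MID
    (if (< lo (length checkpoints)) lo nil)))
```

Each probe halves the remaining interval, so n checkpoints need only about log2(n) probes, which matters when each probe is a slow rebuild.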
22:55:54 Your help is much appreciated
22:56:05 I'll put checks around each file compilation
22:57:15 If you can capture it in a special variable ASAP after the class is defined, you can just check it periodically to see if it's been bashed.
22:58:24 IIRC, it's likely to be a UNION-CTYPE (which represents types specified via (OR ...))
23:05:03 aha, it's not getting corrupted during compilation, but during initialization of the application
23:05:36 oh -- it could be addresses saved in a closure before the image was dumped, and then used after it was restored?
23:06:10 Heap addresses should be updated correctly. Stack addresses wouldn't be.
23:09:50 "Heap addresses" above means "references to heap-allocated lisp objects". Raw numeric addresses of those objects change all the time.
23:13:14 now playing the game of dichotomy...
23:16:48 Things that're allocated in the same thread at (roughly) the same time are often going to be near each other in memory. This kind of corruption is often caused by something storing beyond the bounds of a nearby object, and "nearby" often implies temporal proximity.
23:17:45 is there a magic macro / variable to extract at macro-expansion time the position of the current form in the file ?
23:18:01 so I could define a macro and pepper my code with it...
23:18:56 I'm not sure that I understand the question. What file ? Current in what sense ?
23:19:11 brb
23:19:56 *load-pathname*
23:19:58 for the file
23:20:25 is there the equivalent for the line number / character position of the reader, etc.
23:21:43 or function being defined, etc.
23:21:57 If you can get your hands on the stream, FILE-POSITION might be helpful. I'd have to check to see if there are other ways.
23:24:08 *fcomp-loading-toplevel-location* maybe ?
23:25:56 -!- DataLinkDroid [~DataLinkD@1.146.119.134] has quit [Quit: Bye]
23:43:46 Or just *LOADING-TOPLEVEL-LOCATION*.
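A portable sketch of the "pepper the code with a macro" idea discussed above. It swaps in the standard `*load-truename*` for the file name and a running counter for the position, because the exact contents of CCL's `*LOADING-TOPLEVEL-LOCATION*` aren't shown in the log; `*checkpoint-counter*`, `sane-p`, and `checkpoint` are hypothetical names. Each checkpoint runs a probe (for example, the `make-instance` form that triggers the bug) and reports whether it signalled an error.

```lisp
(defvar *checkpoint-counter* 0
  "Hypothetical running count of checkpoints evaluated so far.")

(defun sane-p (thunk)
  "Return T if calling THUNK returns normally, NIL if it signals an ERROR."
  (handler-case (progn (funcall thunk) t)
    (error () nil)))

(defmacro checkpoint (&body probe)
  "Hypothetical helper: evaluate PROBE, print whether it succeeded, tagged with
the file currently being loaded and a running count, and return the count."
  (let ((n (gensym "N")))
    `(let ((,n (incf *checkpoint-counter*)))
       ;; ~:[ prints the first clause for NIL and the second for non-NIL.
       (format *trace-output* "~&(~A ~D): ~:[probe FAILED~;ok~]~%"
               *load-truename* ,n (sane-p (lambda () ,@probe)))
       ,n)))
```

Dropped between top-level forms, the last `ok` line before the first `probe FAILED` line brackets the form that corrupts the object, which can then be bisected further.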
23:45:17 DataLinkDroid [~DataLinkD@1.146.119.134] has joined #ccl
23:45:57 yup, I ended up using that one
23:46:13 and did a little peppering
23:46:47 (/usr/local/google/ita/x1/qres/lisp/core/core-application.lisp 4368 11909 28): class QRES-SCHED:SEGMENT-KEY exists and is working properly
23:47:15 need ... food ... I'll be around, but may take a little time to respond.
23:47:18 (filename start-pos end-pos (count within that toplevel form))
23:47:22 thanks a lot
23:48:59 between #29 and #30 it is... let's see what that is...
23:50:31 if I count correctly, a form called (qres-build:tweak-GC-parameters)
23:56:37 or maybe it's the form before, which has some of the clos optimization
23:58:23 well, what the tweak-gc-parameters do is configure gc params and threshold then run the damn gc.
23:58:40 somehow I feel cheated... that will reveal the bug but probably not create it.