00:46:12 -!- lisppaste5 [n=lisppast@common-lisp.net] has quit [simmons.freenode.net irc.freenode.net]
00:48:56 -!- mdc_mobile [n=mdc_mobi@ds9.entity.com] has quit [simmons.freenode.net irc.freenode.net]
00:49:37 lisppaste5 [n=lisppast@common-lisp.net] has joined #ccl
00:49:46 mdc_mobile [n=mdc_mobi@ds9.entity.com] has joined #ccl
00:54:16 -!- lisppaste5 [n=lisppast@common-lisp.net] has quit ["Want lisppaste5 in your channel? Email lisppaste-requests AT common-lisp.net."]
00:54:48 lisppaste5 [n=lisppast@common-lisp.net] has joined #ccl
01:01:17 -!- mdc_mobile [n=mdc_mobi@ds9.entity.com] has quit [simmons.freenode.net irc.freenode.net]
01:02:41 mdc_mobile [n=mdc_mobi@ds9.entity.com] has joined #ccl
01:34:50 -!- rme [n=rme@pool-70-104-96-75.chi.dsl-w.verizon.net] has quit []
01:37:58 rme [n=rme@pool-70-104-96-75.chi.dsl-w.verizon.net] has joined #ccl
01:57:41 -!- billstclair [n=billstcl@unaffiliated/billstclair] has quit [simmons.freenode.net irc.freenode.net]
01:57:41 -!- ilitirit [n=john@watchdog.msi.co.jp] has quit [simmons.freenode.net irc.freenode.net]
02:00:31 billstclair [n=billstcl@unaffiliated/billstclair] has joined #ccl
02:00:31 ilitirit [n=john@watchdog.msi.co.jp] has joined #ccl
03:02:23 bfulgham_ [n=brent@adsl-69-234-124-120.dsl.irvnca.pacbell.net] has joined #ccl
03:06:51 -!- sellout [n=greg@c-24-128-50-176.hsd1.ma.comcast.net] has quit [Read error: 104 (Connection reset by peer)]
03:19:40 -!- mdc_mobile [n=mdc_mobi@ds9.entity.com] has quit []
05:50:07 -!- rme [n=rme@pool-70-104-96-75.chi.dsl-w.verizon.net] has quit []
06:52:48 -!- bfulgham_ [n=brent@adsl-69-234-124-120.dsl.irvnca.pacbell.net] has quit []
10:14:53 sellout [n=greg@c-24-128-50-176.hsd1.ma.comcast.net] has joined #ccl
12:34:49 -!- alms [n=alms@146-115-42-237.c3-0.bkl-ubr1.sbo-bkl.ma.cable.rcn.com] has quit []
14:09:01 alms [n=alms@146-115-42-237.c3-0.bkl-ubr1.sbo-bkl.ma.cable.rcn.com] has joined #ccl
14:13:50 anRch [n=markmill@nmd.sbx00833.peaboma.wayport.net] has joined #ccl
14:34:10 mdc_mobile [n=mdc_mobi@ds9.entity.com] has joined #ccl
14:39:59 -!- anRch [n=markmill@nmd.sbx00833.peaboma.wayport.net] has quit []
14:40:53 milanj [n=milan@93.87.181.86] has joined #ccl
14:44:29 anRch [n=markmill@nmd.sbx00833.peaboma.wayport.net] has joined #ccl
14:46:54 -!- mdc_mobile [n=mdc_mobi@ds9.entity.com] has quit []
15:31:06 rme [n=rme@pool-70-104-96-75.chi.dsl-w.verizon.net] has joined #ccl
15:34:11 mdc_mobile [n=mdc_mobi@64.61.60.146] has joined #ccl
15:36:50 jajcloz [n=jaj@pool-98-110-225-173.bstnma.fios.verizon.net] has joined #ccl
15:50:32 -!- anRch [n=markmill@nmd.sbx00833.peaboma.wayport.net] has quit []
16:16:16 -!- mdc_mobile [n=mdc_mobi@64.61.60.146] has quit [Read error: 104 (Connection reset by peer)]
16:16:17 mdc_ [n=mdc_mobi@64.61.60.146] has joined #ccl
16:22:50 -!- rme [n=rme@pool-70-104-96-75.chi.dsl-w.verizon.net] has quit []
16:33:21 rme [n=rme@pool-70-104-96-75.chi.dsl-w.verizon.net] has joined #ccl
16:37:14 -!- mdc_ [n=mdc_mobi@64.61.60.146] has quit [Read error: 113 (No route to host)]
16:56:58 mdc_mobile [n=mdc_mobi@ds9.entity.com] has joined #ccl
18:11:42 anRch [n=markmill@nmd.sbx07283.medfoma.wayport.net] has joined #ccl
18:25:42 Fare [n=Fare@ita4fw1.itasoftware.com] has joined #ccl
18:25:50 hi
18:26:31 I have an "Exception occurred while executing foreign code" (malloc), and would like to understand WHAT exception that was.
18:26:53 and if it could possibly be a misdirected signal
18:28:19 also the %rip printed by the exception handler is invalid and vastly different from the one printed as the top of the backtrace
18:31:54 http://trac.clozure.com/openmcl/wiki/CclUnderGdb might help you track it down
18:32:04 thanks
18:33:43 I thought that the kernel debugger printed out the signal number that it choked on, but I must be wrong about that.
18:47:06 rme pasted "kernel debugger and signal number" at http://paste.lisp.org/display/85276
18:47:22 No, it does print out the signal number. Please see paste.
18:49:41 -!- anRch [n=markmill@nmd.sbx07283.medfoma.wayport.net] has quit [Read error: 104 (Connection reset by peer)]
18:49:48 anRch [n=markmill@nmd.sbx07283.medfoma.wayport.net] has joined #ccl
18:52:47 -!- anRch [n=markmill@nmd.sbx07283.medfoma.wayport.net] has quit [Read error: 104 (Connection reset by peer)]
18:52:52 anRch [n=markmill@nmd.sbx07283.medfoma.wayport.net] has joined #ccl
18:56:43 -!- billstclair [n=billstcl@unaffiliated/billstclair] has quit []
18:59:56 -!- anRch [n=markmill@nmd.sbx07283.medfoma.wayport.net] has quit [Read error: 54 (Connection reset by peer)]
18:59:56 anRch_ [n=markmill@nmd.sbx07283.medfoma.wayport.net] has joined #ccl
19:06:29 anRch [n=markmill@nmd.sbx07283.medfoma.wayport.net] has joined #ccl
19:06:29 -!- anRch_ [n=markmill@nmd.sbx07283.medfoma.wayport.net] has quit [Read error: 104 (Connection reset by peer)]
19:09:09 billstclair [n=billstcl@unaffiliated/billstclair] has joined #ccl
19:11:23 I don't have "Unhandled exception FOO at BAR..." but instead "exception in foreign context"
19:12:09 also, interestingly, $rbp looks like it was OR'ed with 0xb000000000000001
19:12:31 at least, when I clear those bits things make more sense.
19:12:47 (in the register dump)
19:13:22 the faulty instruction being a mov 8(%rbp),%r13
19:14:30 oh, interestingly, the previous instruction is a lea (%rbx,%r12,1),%rbp -- which explains the bad %rbp
19:16:36 except even by that token, rbp is off by one from what I'd expect.
19:17:23 C code doesn't necessarily use %rbp as a frame pointer.
19:17:27 *Fare* suspects the previous instruction may not have been the one just before
19:17:32 gbyers, indeed.
19:17:38 Those instructions looks like lisp code to me.
19:18:08 if I believe the error message, that must be internals of malloc
19:18:15 We rarely mov to %r13 (= %fn).
19:19:59 is there a quick way in gdb to determine the nearest function prologue?
19:20:05 or defined symbol
19:20:55 (though I expect libc to not have debug symbols)
19:21:10 -!- billstclair [n=billstcl@unaffiliated/billstclair] has quit []
19:21:51 'info symbol' is supposed to do that, but if the disassembly isn't showing anything it's not clear that that command would.
19:24:12 The thing that prints the error message uses dlsym().
19:24:28 hum. I don't see any obvious jump to that instruction in that C routine, but the value of the register is off-by-one as compared to what the previous instruction would yield
19:24:40 which would still be a bug
19:25:10 since %r12 has this high-nibble to 0xb that sends the whole heap to la-la land
19:26:48 oh but the previous instruction clearly masks all but the low bits of %r12.
19:27:19 So I *must* have missed a jump, or the error is weirder than I can fathom
19:28:09 -!- anRch [n=markmill@nmd.sbx07283.medfoma.wayport.net] has quit [Read error: 104 (Connection reset by peer)]
19:28:15 anRch [n=markmill@nmd.sbx07283.medfoma.wayport.net] has joined #ccl
19:33:06 billstclair [n=billstcl@unaffiliated/billstclair] has joined #ccl
19:34:39 -!- anRch [n=markmill@nmd.sbx07283.medfoma.wayport.net] has quit []
19:34:50 or is CCL accurate in its reported register values?
19:35:44 The register values come from the signal context; if the exception happened in foreign code, we haven't touched them.
19:41:40 I give up on this one. I suspect anything from kernel bug to memory fault to CCL bug to library or application corrupting the heap.
19:44:15 Were you able to get a lisp backtrace ?
19:57:15 yes
19:58:10 it says MALLOC on top, which is coherent with the %rip in the exception log
19:58:22 (but the other register values are not coherent at all)
19:58:30 gz pasted "untitled" at http://paste.lisp.org/display/85286
20:00:27 (That's the backtrace)
20:00:51 Thanks. Unremarkable, isn't it ?
20:06:37 my annotation is more remarkable.
20:06:44 fare annotated #85286 "untitled" at http://paste.lisp.org/display/85286#1
20:07:58 This is apparently happening on and off, and the top is always the same (make-file-stream and up), so it doesn't seem totally random.
20:08:29 What else in CCL calls malloc very often ?
20:12:34 Fare: is this happening only when safety=3?
20:18:34 gbyers annotated #85286 "untitled" at http://paste.lisp.org/display/85286#2
20:34:00 -!- jajcloz [n=jaj@pool-98-110-225-173.bstnma.fios.verizon.net] has quit []
20:34:22 mdc_ [n=mdc_mobi@c-76-119-233-23.hsd1.ma.comcast.net] has joined #ccl
20:51:08 -!- mdc_mobile [n=mdc_mobi@ds9.entity.com] has quit [Read error: 113 (No route to host)]
20:58:19 -!- mdc_ is now known as mdc_mobile
21:34:07 gz: possibly.
21:35:59 gz: I am pretty sure we had SAFETY 3 on all the images where this happened
21:36:50 so more stuff might call malloc in that case.
21:37:24 could the misaligned address be due to some structure being clobbered by the GC with stuff that has type-encoded low-bits?
21:37:47 "more stuff might call malloc" ?
22:06:49 jajcloz [n=jaj@pool-98-110-225-173.bstnma.fios.verizon.net] has joined #ccl
22:08:13 The GC doesn't have anything to do with malloc's heap (aside from the fact that it'll free some foreign pointers that aren't reachable from lisp.) With safety 3, some things that might ordinarily stack-allocate blocks of foreign memory may call malloc instead.
22:16:25 hum. Any suggestions on tracing the origin of that corruption?
22:18:02 Linux has mtrace and there may be some malloc libraries intended for debugging this sort of thing/
22:18:03 .
22:50:22 -!- milanj [n=milan@93.87.181.86] has quit ["Leaving"]
23:30:12 -!- Fare [n=Fare@ita4fw1.itasoftware.com] has quit ["Leaving"]
23:47:55 I'm the gc sources, is there a kind of style rule as to when it's appropriate for something to be a LispObj or a pointer to a LispObj?
23:48:06 Make that "In the gc sources".
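
[Editor's sketch, not part of the log] The register arithmetic discussed between 19:12 and 19:26 can be replayed with a few lines of Python. The register values below are hypothetical, chosen only so that the bit pattern 0xb000000000000001 quoted in the log shows up in the result. The helper names are made up for illustration, and the canonical-address check assumes 48-bit virtual addressing on x86-64.

```python
MASK64 = (1 << 64) - 1
# Bit pattern that, per the log, appeared to have been OR'ed into %rbp.
SUSPECT_BITS = 0xB000000000000001


def lea_base_index(base, index, scale=1):
    """Model `lea (base,index,scale),dst`: dst = base + index*scale, mod 2**64."""
    return (base + index * scale) & MASK64


def clear_suspect_bits(value):
    """Clear the suspect bits, as done by hand on the register dump."""
    return value & ~SUSPECT_BITS & MASK64


def is_canonical(addr):
    """True if bits 63..47 are all equal (assumes 48-bit virtual addresses)."""
    top = addr >> 47
    return top == 0 or top == (1 << 17) - 1


# Hypothetical register contents, NOT taken from the actual crash dump.
rbx = 0x00007F3A1C000010  # a plausible user-space heap address
r12 = 0xB000000000000001  # an index carrying the stray high nibble and low bit

rbp = lea_base_index(rbx, r12)  # lea (%rbx,%r12,1),%rbp
print(hex(rbp))
print(hex(clear_suspect_bits(rbp)))
print(is_canonical(rbp), is_canonical(clear_suspect_bits(rbp)))
```

With these made-up inputs, the `lea` result lands outside the canonical address range, so a following `mov 8(%rbp),%r13` could not load from it, while clearing the suspect bits recovers the original base address. This only illustrates why an unmasked %r12 would explain a bad %rbp; it says nothing about where the bad value actually came from.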