2014-09-27T00:05:28Z davazp joined #sbcl
2014-09-27T00:05:48Z Bicyclidine quit (Ping timeout: 272 seconds)
2014-09-27T00:56:03Z davazp quit (Remote host closed the connection)
2014-09-27T01:00:29Z psilord joined #sbcl
2014-09-27T01:01:54Z drmeist__ joined #sbcl
2014-09-27T01:06:04Z edgar-rft joined #sbcl
2014-09-27T01:12:06Z drmeist__ quit (Remote host closed the connection)
2014-09-27T01:26:06Z davazp joined #sbcl
2014-09-27T02:14:30Z faheem__1 quit (Ping timeout: 258 seconds)
2014-09-27T02:16:45Z akkad quit (Ping timeout: 260 seconds)
2014-09-27T02:21:22Z scymtym_ quit (Ping timeout: 240 seconds)
2014-09-27T02:22:58Z akkad joined #sbcl
2014-09-27T02:33:26Z faheem joined #sbcl
2014-09-27T02:35:39Z psilord quit (Quit: Leaving.)
2014-09-27T02:38:57Z christoph_debian quit (Ping timeout: 245 seconds)
2014-09-27T02:52:19Z christoph_debian joined #sbcl
2014-09-27T03:17:51Z davazp quit (Remote host closed the connection)
2014-09-27T03:36:17Z drmeiste_ is now known as drmeister__
2014-09-27T04:42:01Z drmeister__ quit (Remote host closed the connection)
2014-09-27T04:49:24Z gingerale joined #sbcl
2014-09-27T05:28:03Z drmeiste_ joined #sbcl
2014-09-27T05:32:18Z drmeiste_ quit (Ping timeout: 258 seconds)
2014-09-27T05:41:44Z DGASAU quit (Remote host closed the connection)
2014-09-27T05:46:42Z nyef quit (Quit: G'night all.)
2014-09-27T06:02:18Z DGASAU joined #sbcl
2014-09-27T06:27:16Z psilord joined #sbcl
2014-09-27T07:00:09Z sdemarre joined #sbcl
2014-09-27T07:16:05Z drmeiste_ joined #sbcl
2014-09-27T07:20:17Z sdemarre quit (Ping timeout: 245 seconds)
2014-09-27T07:20:30Z drmeiste_ quit (Ping timeout: 246 seconds)
2014-09-27T07:35:49Z angavrilov joined #sbcl
2014-09-27T07:37:32Z sdemarre joined #sbcl
2014-09-27T07:59:14Z stassats joined #sbcl
2014-09-27T08:21:58Z oleo is now known as Guest24335
2014-09-27T08:22:50Z oleo__ joined #sbcl
2014-09-27T08:25:02Z Guest24335 quit (Ping timeout: 244 seconds)
2014-09-27T08:34:41Z stassats quit (Ping timeout: 260 seconds)
2014-09-27T08:38:36Z DGASAU quit (Ping timeout: 258 seconds)
2014-09-27T08:56:43Z yacks quit (Ping timeout: 272 seconds)
2014-09-27T10:03:52Z yacks joined #sbcl
2014-09-27T10:46:53Z edgar-rft quit (Quit: memory access interrupted by nuclear burnout)
2014-09-27T10:52:22Z drmeiste_ joined #sbcl
2014-09-27T10:57:17Z drmeiste_ quit (Ping timeout: 272 seconds)
2014-09-27T11:13:39Z alchemis7 left #sbcl
2014-09-27T11:13:40Z alchemis7 joined #sbcl
2014-09-27T11:16:14Z ebrasca joined #sbcl
2014-09-27T11:38:15Z DGASAU joined #sbcl
2014-09-27T12:05:29Z stassats joined #sbcl
2014-09-27T12:10:48Z ebrasca quit (Quit: ChatZilla 0.9.90.1 [Firefox 24.0/20131118140013])
2014-09-27T12:40:31Z drmeiste_ joined #sbcl
2014-09-27T12:45:17Z drmeiste_ quit (Ping timeout: 260 seconds)
2014-09-27T13:09:25Z drmeiste_ joined #sbcl
2014-09-27T13:12:12Z drmeiste_ is now known as drmeister_
2014-09-27T13:30:50Z kanru` joined #sbcl
2014-09-27T14:31:55Z loke_ joined #sbcl
2014-09-27T14:49:47Z eudoxia joined #sbcl
2014-09-27T14:59:18Z LiamH joined #sbcl
2014-09-27T15:15:59Z eudoxia quit (Quit: Lost terminal)
2014-09-27T15:40:37Z scymtym_ joined #sbcl
2014-09-27T15:57:45Z oleo__ is now known as oleo
2014-09-27T15:58:17Z krzysz00 joined #sbcl
2014-09-27T16:02:58Z stassats quit (Remote host closed the connection)
2014-09-27T16:14:33Z kanru` quit (Remote host closed the connection)
2014-09-27T16:30:26Z sdemarre quit (Ping timeout: 272 seconds)
2014-09-27T16:35:30Z krzysz00 quit (Ping timeout: 272 seconds)
2014-09-27T16:37:03Z krzysz00 joined #sbcl
2014-09-27T16:37:49Z stassats joined #sbcl
2014-09-27T16:52:22Z kanru` joined #sbcl
2014-09-27T17:20:45Z kanru` quit (Remote host closed the connection)
2014-09-27T17:21:30Z kanru` joined #sbcl
2014-09-27T17:30:42Z jsnell quit (Ping timeout: 245 seconds)
2014-09-27T17:30:50Z jsnell joined #sbcl
2014-09-27T17:31:07Z yauz_2 quit (Ping timeout: 245 seconds)
2014-09-27T17:37:40Z heddwch quit (*.net *.split)
2014-09-27T17:37:40Z karswell quit (*.net *.split)
2014-09-27T17:37:41Z asedeno quit (*.net *.split)
2014-09-27T17:37:41Z les quit (*.net *.split)
2014-09-27T17:37:41Z leoc quit (*.net *.split)
2014-09-27T17:37:42Z sobel quit (*.net *.split)
2014-09-27T17:46:49Z yauz joined #sbcl
2014-09-27T17:46:49Z heddwch joined #sbcl
2014-09-27T17:46:49Z karswell joined #sbcl
2014-09-27T17:46:49Z asedeno joined #sbcl
2014-09-27T17:46:49Z les joined #sbcl
2014-09-27T17:46:49Z sobel joined #sbcl
2014-09-27T17:46:49Z leoc joined #sbcl
2014-09-27T17:48:33Z slyrus joined #sbcl
2014-09-27T17:49:48Z kanru` quit (Ping timeout: 246 seconds)
2014-09-27T17:57:01Z scymtym_: since the release is done, i would like to push the IPv6 changes sometime today. is that ok?
2014-09-27T18:24:36Z stassats: sure
2014-09-27T18:24:59Z stassats: the sooner something is pushed, the more time it has to get tested
2014-09-27T18:32:18Z scymtym_: pushed
2014-09-27T18:37:45Z attila_lendvai joined #sbcl
2014-09-27T18:43:08Z nyef joined #sbcl
2014-09-27T18:52:48Z angavrilov quit (Remote host closed the connection)
2014-09-27T19:19:05Z pkhuong: scymtym_: awesome (:
2014-09-27T19:19:18Z stassats: now i need to figure out how to get ipv6
2014-09-27T19:22:12Z scymtym_: pkhuong: well, we'll see what breaks
2014-09-27T19:38:45Z Krystof: seeing what breaks is part of the fun
2014-09-27T19:38:52Z Krystof: twitter abuse is next
2014-09-27T19:58:25Z stassats: so, if you can address only 48 bits on x86-64, the word size of a vector could be recorded in the header
2014-09-27T20:01:43Z stassats: that header can be used for a lot of things
2014-09-27T20:12:30Z stassats: another idea, let's make everything word-sized, no bit-vectors, everything becomes so much simpler
2014-09-27T20:12:53Z Krystof: it would offset the looming complexity of having four different implementations of EVAL
2014-09-27T20:13:17Z stassats: although, no more specialized COMPLEX DOUBLE-FLOATs
2014-09-27T20:13:38Z nyef: Don't forget that we still need to run on 32-bit systems.
2014-09-27T20:14:00Z stassats: right, like that's the only flaw in the idea
2014-09-27T20:14:02Z nyef: And wasn't there some bias towards keeping x86 and x86-64 roughly in sync?
2014-09-27T20:14:40Z stassats: so, we already double-word align everything, double-word sizes everywhere then!
2014-09-27T20:15:14Z nyef: Why don't you work on 31-bit fixnums on 32-bit hosts instead?
2014-09-27T20:17:16Z stassats: well, i don't want to work on any optimizations as long as there is this nasty gc bug
2014-09-27T20:18:19Z stassats: being wrong fast doesn't sound appealing
2014-09-27T20:18:20Z nyef: Mmm.
2014-09-27T20:18:39Z nyef: How many nasty gc bugs was it now?
2014-09-27T20:18:55Z stassats: one known, the pa_alloc one, one elusive
2014-09-27T20:19:54Z stassats: plus unknown unknowns
2014-09-27T20:19:56Z nyef: An elusive one?
2014-09-27T20:20:06Z stassats: nyef: it pops, can't reproduce it
2014-09-27T20:20:11Z stassats: pops up
2014-09-27T20:20:27Z nyef: Lovely.
2014-09-27T20:20:57Z stassats: last time i got it, i restarted and forgot to at least run a backtrace or poke around
2014-09-27T20:21:20Z nyef: I might-or-might-not have time to do an audit of the allocation paths for PPC next week.
2014-09-27T20:22:07Z stassats: forwarding pointer in scavenge or something
2014-09-27T20:22:48Z nyef: And then there's the reduced-conservatism bit that cracauer_ was putting together, isn't there?
2014-09-27T20:24:07Z stassats: how can a forwarding pointer appear during scavenge?
2014-09-27T20:24:25Z stassats: was it moved but the pointer later wasn't updated? how can that happen?
2014-09-27T20:24:57Z nyef: Oh, that could be fun to track down, if it's a time bomb set by a previous GC.
2014-09-27T20:24:59Z stassats: something that one part thought should be pinned and another thought it should be moved, so that one part didn't update the reference?
2014-09-27T20:25:46Z stassats: but the pinning should happen first, shouldn't it?
2014-09-27T20:26:37Z nyef: Yeah... And that reminds me, there's a badly mis-named variable in there that should be honored more widely than it is... "conservative_stack" or something like that.
2014-09-27T20:27:07Z stassats: another possibility is memory corruption, but that's unlikely
2014-09-27T20:27:47Z stassats: or maybe some root that wasn't scanned
2014-09-27T20:29:03Z nyef: Or something is being scavenged when it should be treated as a conservative root.
2014-09-27T20:29:10Z stassats: i think this one needs some "think hard" approach
2014-09-27T20:29:38Z nyef: Even just having a good method for triggering it would be a good start.
2014-09-27T20:29:46Z stassats: for some reason, when the pa_alloc happened, the defun-cached array got corrupted
2014-09-27T20:30:12Z stassats: i think this forwarding-in-scavenge has defun-cached corruption too
2014-09-27T20:31:00Z stassats: nyef: it needs "think hard" just to get a test-case
2014-09-27T20:31:13Z sdemarre joined #sbcl
2014-09-27T20:32:11Z stassats: defun-cached allocates a vector and puts it into a vector
2014-09-27T20:32:23Z scymtym quit (Remote host closed the connection)
2014-09-27T20:32:25Z Krystof quit (Write error: Connection reset by peer)
2014-09-27T20:34:40Z stassats: i hope it's not memory ordering related
2014-09-27T20:34:57Z nyef: Wouldn't it be GOOD if it were memory ordering related?
2014-09-27T20:35:11Z nyef: Slap a barrier in there, and done.
2014-09-27T20:35:25Z stassats: i don't want to think about when you do need memory ordering on x86
2014-09-27T20:35:36Z Krystof joined #sbcl
2014-09-27T20:35:36Z ChanServ has set mode +o Krystof
2014-09-27T20:35:44Z nyef points to the non-x86 platforms that would presumably also be affected.
2014-09-27T20:37:27Z stassats: i added a barrier on allocation on ppc
2014-09-27T20:37:51Z nyef: Are we going to need the same for SPARC, MIPS, ALPHA, or HPPA?
2014-09-27T20:38:11Z stassats: do they have threads?
2014-09-27T20:38:19Z nyef: Not yet.
2014-09-27T20:38:41Z nyef: SPARC is the most likely to get threads next.
2014-09-27T20:38:45Z njmurphy quit (Ping timeout: 260 seconds)
2014-09-27T20:38:47Z LiamH quit (Ping timeout: 260 seconds)
2014-09-27T20:39:22Z nyef: Finding MIPS hardware that will run SBCL at a decent clip is a bit of a pain, and the ALPHA and HPPA backends are still cheneygc-only.
2014-09-27T20:40:17Z stassats: so, the defun-cached vector had something like a SAP instead of a vector
2014-09-27T20:40:23Z LiamH joined #sbcl
2014-09-27T20:40:27Z nyef: Uh-oh.
2014-09-27T20:40:43Z nyef: Yeah, that does sound like potential trouble in several different directions.
2014-09-27T20:40:50Z stassats: i think that went away after the pa-alloc fix, but why did that happen?
2014-09-27T20:41:40Z stassats: see this http://paste.lisp.org/display/142817
2014-09-27T20:42:07Z nyef: Hrm.
2014-09-27T20:42:10Z njmurphy joined #sbcl
2014-09-27T20:43:24Z nyef: Annotation 1, I presume?
2014-09-27T20:43:30Z stassats: yes
2014-09-27T20:43:37Z pkhuong_ joined #sbcl
2014-09-27T20:43:47Z stassats: one of the original manifestations had SAPs in the cache vector, now this test case just memfaults
2014-09-27T20:44:07Z thoto_ quit (Remote host closed the connection)
2014-09-27T20:44:30Z pkhuong quit (Write error: Connection reset by peer)
2014-09-27T20:44:31Z carvite quit (Write error: Broken pipe)
2014-09-27T20:44:31Z irsol quit (Write error: Connection reset by peer)
2014-09-27T20:44:38Z irsol_ joined #sbcl
2014-09-27T20:44:42Z thoto joined #sbcl
2014-09-27T20:46:27Z jackdaniel quit (Ping timeout: 245 seconds)
2014-09-27T20:47:05Z irsol_ quit (Changing host)
2014-09-27T20:47:05Z irsol_ joined #sbcl
2014-09-27T20:48:03Z irsol_ is now known as irsol
2014-09-27T20:49:55Z nyef: Doing a build of head on x86-64/linux to see if I can trigger this.
2014-09-27T20:50:00Z carvite joined #sbcl
2014-09-27T20:51:16Z stassats: the patch does fix it, but i'm trying to figure out how SAPs could appear
2014-09-27T20:52:02Z pkhuong_: defun cached needs two barriers, iirc.
2014-09-27T20:52:35Z pkhuong_: one after allocation and another after the hand-rolled initialisation (just before storing the entry's simple array to the cache vector)
2014-09-27T20:52:55Z stassats: what about on x86?
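pkhuong's description of the second barrier (initialise the entry by hand, then store it into the cache vector) is the classic safe-publication pattern. A minimal C11 sketch, assuming a toy fixed-size cache; `cache_line`, `cache_insert` and `cache_lookup` are hypothetical names, not SBCL's `alloc-hash-cache` code. The release store keeps the initialising writes from becoming visible after the pointer; on x86, stores are not reordered with earlier stores, so a release store there compiles to an ordinary store with no fence instruction.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size cache; not SBCL's defun-cached layout. */
#define CACHE_SIZE 8

struct cache_line {
    long key;
    long value;
};

/* Atomics with static storage start out as null pointers. */
static _Atomic(struct cache_line *) cache[CACHE_SIZE];

/* Writer: allocate, initialise by hand, then publish.  The release store
   acts as the write/write barrier: the key/value stores are visible to an
   acquiring reader no later than the pointer itself.  Replaced lines are
   deliberately not freed, since a concurrent reader may still hold one;
   in Lisp the GC would reclaim them. */
static int cache_insert(size_t index, long key, long value) {
    struct cache_line *line = malloc(sizeof *line);
    if (line == NULL)
        return 0;
    line->key = key;
    line->value = value;
    atomic_store_explicit(&cache[index % CACHE_SIZE], line,
                          memory_order_release);
    return 1;
}

/* Reader: the acquire load pairs with the release store, so a non-null
   line is always observed fully initialised. */
static const struct cache_line *cache_lookup(size_t index, long key) {
    struct cache_line *line = atomic_load_explicit(&cache[index % CACHE_SIZE],
                                                   memory_order_acquire);
    return (line != NULL && line->key == key) ? line : NULL;
}
```

Without the release ordering, a weakly ordered CPU such as PPC or ARM may make the pointer visible before the key and value, which is the kind of half-initialised entry the cache-vector corruption in this log resembles.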
2014-09-27T20:53:47Z pkhuong_: none on x86
2014-09-27T20:53:49Z jackdaniel joined #sbcl
2014-09-27T20:54:05Z pkhuong_: sparc would also be fine
2014-09-27T20:54:24Z stassats: i think whenever bad things happened it was around subtypep, which uses defun-cached
2014-09-27T20:54:42Z stassats: of course i couldn't replicate anything
2014-09-27T20:57:45Z pkhuong_: that was the SAP thing, right?
2014-09-27T20:57:54Z stassats: right
2014-09-27T20:58:53Z pkhuong_: on !x86 platforms, defun-cached wants a write/write barrier in alloc-hash-cache, and in alloc-hash-cache-line
2014-09-27T20:58:58Z stassats: and i think occupying more memory made it easier to trigger
2014-09-27T20:59:53Z nyef: I'm starting to appreciate drmeister_'s comment about wanting a lisp-level refactoring and analytics tool.
2014-09-27T21:00:18Z pkhuong_: stassats: higher chances of triggering a GC during the SAP allocation?
2014-09-27T21:00:36Z nyef: It really would be nice to have some automated tool that can go through and sanity-check a bunch of functions to make sure that they work properly with respect to memory barriers and whatnot.
2014-09-27T21:00:39Z sdemarre quit (Read error: Connection reset by peer)
2014-09-27T21:00:43Z pkhuong_: erh.
2014-09-27T21:00:47Z pkhuong_: that's *hard*
2014-09-27T21:00:56Z stassats: pkhuong_: but how would it end up at the wrong place? bad forwarding pointer replacement?
2014-09-27T21:01:15Z oleo quit (Ping timeout: 272 seconds)
2014-09-27T21:01:26Z pkhuong_: stassats: the SAP? C code writes to where the SAP used to be, resulting in random mangling.
2014-09-27T21:01:51Z oleo joined #sbcl
2014-09-27T21:04:24Z stassats: does C code write there? i thought it would just be passed to the error handler
2014-09-27T21:04:34Z pkhuong_: C code to initialise the SAP
2014-09-27T21:05:02Z pkhuong_: C-side allocation does: 1. enter PA; 2. grab chunk of heap; 3. leave PA, check for GC; 4. initialise said chunk of heap
2014-09-27T21:05:04Z stassats: but isn't the stuff like widetag writing protected by PA?
2014-09-27T21:05:08Z pkhuong_: not in C
2014-09-27T21:05:39Z stassats: so, your patch fixes that, but i still had problems
2014-09-27T21:05:45Z stassats: maybe something else does that as well?
2014-09-27T21:06:00Z pkhuong_: all other (non-GC) C-side allocations do that
2014-09-27T21:06:05Z pkhuong_: my patch only fixed saps
2014-09-27T21:06:21Z stassats: i think x86oids only allocate saps
2014-09-27T21:06:25Z nyef: On non-x86oids, GC doesn't check for roots being manipulated by the runtime...
2014-09-27T21:12:51Z stassats: there's alloc_number in funcall2(StaticSymbolFunction(HEAP_EXHAUSTED_ERROR), alloc_number(available), alloc_number(requested));
2014-09-27T21:13:18Z stassats: since it's heap exhausted, alloc_number is likely to trigger a GC and end up in the failure mode?
2014-09-27T21:13:36Z stassats: would that mean that's why the larger the heap the likelier the bug is?
2014-09-27T21:14:35Z pkhuong_: sounds plausible
2014-09-27T21:14:46Z pkhuong_: but alloc_number should return a fixnum
2014-09-27T21:14:53Z pkhuong_: at least on x86-64
2014-09-27T21:14:59Z stassats: and your patch does fix alloc_number
2014-09-27T21:16:38Z stassats: and it prints the heap exhaustion table beforehand, so, that's not it
2014-09-27T21:18:44Z pkhuong_: is that heap exhaustion during GC?
2014-09-27T21:18:50Z pkhuong_: no, that's straight to ldb
2014-09-27T21:18:59Z pkhuong_ is now known as pkhuong
2014-09-27T21:23:25Z stassats: gathered some stats: http://paste.lisp.org/display/143871
2014-09-27T21:24:06Z stassats: doesn't say much
2014-09-27T21:25:33Z stassats: apparently it scavenges something largish
2014-09-27T21:27:16Z nyef: That's a bunch of pointers, and it looks to be scavenging spaces...
2014-09-27T21:27:57Z stassats: it's often a0 or 180 away from the start
2014-09-27T21:28:27Z Xach joined #sbcl
2014-09-27T21:28:42Z nyef: Mmm. And your given test case involves NLX on the one side and allocation on the other...
2014-09-27T21:28:48Z Xach: hi friends. i am here to plug https://bugs.launchpad.net/sbcl/+bug/1364413
2014-09-27T21:28:57Z Xach: (patch to fix apropos to match the spec)
2014-09-27T21:29:31Z stassats: nyef: nlx is just to trigger alloc_number
2014-09-27T21:29:36Z stassats: or rather, alloc_sap
2014-09-27T21:29:44Z stassats: for the internal error handler
2014-09-27T21:29:51Z nyef: Ah, okay.
2014-09-27T21:29:53Z nyef: Hrm.
2014-09-27T21:30:12Z stassats: there's a similar problem on ARM
2014-09-27T21:30:19Z nyef: My basic question there is actually quite simple: Why aren't we stack-allocating that SAP?
2014-09-27T21:30:37Z stassats: i think we've been through that before
2014-09-27T21:31:10Z stassats: a) there are no stack allocation functions, b) somebody might want to access it outside of DX (that one is dubious to me)
2014-09-27T21:31:12Z nyef: I think that we might have, but I don't remember what the answer was.
2014-09-27T21:31:37Z nyef: Yeah, it's a SAP for an interrupt context. It almost has to be DX anyway.
2014-09-27T21:32:05Z nyef: There not being any stack allocation functions is bogus, we would make it work if we had to.
2014-09-27T21:32:49Z stassats: another option, assuming proper alignment, is to use a fixnum
2014-09-27T21:33:58Z stassats: this kills ARM http://paste.lisp.org/display/143133#1
2014-09-27T21:35:28Z nyef: ... that's neat.
2014-09-27T21:36:11Z stassats: basically, it causes allocation only to happen at alloc_sap, one of them eventually triggering a GC => bad
2014-09-27T21:36:35Z stassats: it also kills ppc, but with a different error
2014-09-27T21:36:35Z nyef: Let me guess, a GC occurs in the signal handler, and by the time sigprof-handler gets called the allocated SAP is invalid?
2014-09-27T21:37:30Z stassats: since nothing is pinned, probably
2014-09-27T21:38:23Z stassats: the error is "unexpect forwarding pointer in scavenge: 0xbebdce54, start=0xbebdce54, n=1"
2014-09-27T21:38:31Z stassats: oddly familiar
2014-09-27T21:38:54Z stassats: from scavenge_interrupt_context
2014-09-27T21:39:01Z nyef: The n=1 bit implies either a lisp register set (in the interrupt context) or... yeah, okay.
2014-09-27T21:39:16Z nyef: The other case would have been control stack, but IIRC that goes through a different code path.
2014-09-27T21:39:51Z stassats: (my other error messages have n = % because of the bad printf, now it's fixed)
2014-09-27T21:43:10Z stassats: well, i think the arm case is clear, alloc_sap is split by the gc
2014-09-27T21:43:16Z stassats: and what exactly is an interrupt context?
2014-09-27T21:43:45Z nyef: Unix people would recognize it as a "signal context".
2014-09-27T21:44:15Z stassats: ok, so, a register set
2014-09-27T21:45:05Z nyef: Basically, yes, a saved register set.
2014-09-27T21:47:42Z stassats: and x86-64 doesn't seem to fall for this test case, probably because there's no interrupt context scanning
2014-09-27T21:48:23Z stassats: so, the bad value is deposited somewhere and only then hits the GC
2014-09-27T21:48:25Z nyef: Because the interrupt contexts (and stack generally) are considered to be conservative roots.
2014-09-27T21:49:45Z stassats: that's how it may get into the defun-cached cache
2014-09-27T21:51:33Z nyef: The only reason to use alloc_number instead of make_fixnum is if you expect overflow.
2014-09-27T21:52:22Z stassats: that can happen easily on 32-bits?
2014-09-27T21:52:28Z nyef: I have no idea.
2014-09-27T21:53:16Z nyef: Okay, next, a quick review of the runtime suggests that only alloc_sap() is vulnerable, and all of the uses of alloc_sap() are in scenarios where dynamic-extent allocation is a perfectly reasonable thing to do.
2014-09-27T21:53:47Z nyef: Well, alloc_sap() or alloc_number().
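nyef's point about `alloc_number` versus `make_fixnum` comes down to a range check. A sketch, assuming a 32-bit target with two fixnum tag bits (30-bit fixnums, consistent with the 31-bit-fixnum remark earlier in the log); the macro and function names are illustrative and not copied from SBCL's headers. Under that assumption the largest fixnum is 2^29 - 1 = 536,870,911, just under 512 MiB as a byte count, so a heap-exhaustion report on a large 32-bit heap could plausibly need a boxed number.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: 32-bit words with two fixnum tag bits (30-bit
   fixnums).  Illustrative constants, not copied from SBCL's headers. */
#define N_WORD_BITS 32
#define N_FIXNUM_TAG_BITS 2
#define N_FIXNUM_BITS (N_WORD_BITS - N_FIXNUM_TAG_BITS)
#define MOST_POSITIVE_FIXNUM ((INT64_C(1) << (N_FIXNUM_BITS - 1)) - 1)
#define MOST_NEGATIVE_FIXNUM (-(INT64_C(1) << (N_FIXNUM_BITS - 1)))

/* True when n can be returned as an immediate fixnum; otherwise an
   alloc_number()-style path would have to box it on the heap. */
static bool fits_in_fixnum(int64_t n) {
    return n >= MOST_NEGATIVE_FIXNUM && n <= MOST_POSITIVE_FIXNUM;
}

/* Tag a value already known to fit.  Shifting the unsigned
   representation avoids undefined behaviour for negative n. */
static uint32_t make_fixnum32(int64_t n) {
    return (uint32_t)n << N_FIXNUM_TAG_BITS;
}
```

When `fits_in_fixnum` holds, no heap allocation (and so no GC) can happen on that path, which is why `make_fixnum` sidesteps the whole problem whenever overflow is impossible.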
2014-09-27T21:54:26Z stassats: i still want a test case
2014-09-27T21:55:01Z nyef tries to page in his C-programming skills.
2014-09-27T21:56:10Z stassats: maybe pkhuong's fix just isn't enough?
2014-09-27T21:57:19Z nyef: It might be enough for x86oids, but it can't be enough for any other platform.
2014-09-27T21:57:37Z stassats: right, but something isn't enough for x86oids either
2014-09-27T21:58:58Z stassats: i'm not clear: how does the forwarding_pointer get into the interrupt context?
2014-09-27T21:59:58Z nyef: If it's in memory and gets loaded into a register?
2014-09-27T22:00:09Z edgar-rft joined #sbcl
2014-09-27T22:00:33Z stassats: i assume it's all happening during one GC cycle
2014-09-27T22:00:55Z nyef: I'm thinking two here.
2014-09-27T22:01:23Z stassats: well, that's where my understanding stops, i think it needs two, but then, it looks like one
2014-09-27T22:02:31Z stassats: because by the time the second GC hits, there will be a new interrupt context
2014-09-27T22:03:16Z nyef: That's my point: If there's a poisoned value in the register set at that point...
2014-09-27T22:04:17Z stassats: but it should be poisoned after the gc hits and then discarded? or something
2014-09-27T22:04:50Z stassats: Xach: sorry, gc issues are more pressing, i'll bookmark your apropos thing
2014-09-27T22:08:58Z nyef: Okay, trying again to build with some changes to interrupt_internal_error().
2014-09-27T22:10:40Z stassats: according to the debug log, there's only one gc, that's strange
2014-09-27T22:10:54Z nyef: Okay, that is odd.
2014-09-27T22:11:15Z stassats: sbcl usually runs a gc just at start up
2014-09-27T22:12:38Z nyef: It does?
2014-09-27T22:12:55Z stassats: last time i checked
2014-09-27T22:13:26Z stassats: that would be on x86-64, though, not arm
2014-09-27T22:14:11Z davazp joined #sbcl
2014-09-27T22:16:27Z stassats: ok, no, there are two gcs
2014-09-27T22:16:48Z stassats: the log is 285 MB and i hear the fan spin up when i try to search it
2014-09-27T22:17:19Z stassats: and the first GC is right after the start up indeed, before any of the error handlers are called
2014-09-27T22:17:34Z nyef: That can't be it, then.
2014-09-27T22:17:48Z nyef: Just running genesis-2 now.
2014-09-27T22:18:05Z nyef: I could get called to dinner at any time.
2014-09-27T22:18:09Z stassats: so, http://paste.lisp.org/display/143871#1
2014-09-27T22:19:44Z nyef: Hrm.
2014-09-27T22:19:59Z nyef: Something definitely not right there.
2014-09-27T22:20:12Z nyef: Why the WP violation that late in the game? Is that normal?
2014-09-27T22:20:33Z stassats: it tries to allocate a new page?
2014-09-27T22:20:52Z nyef: Shouldn't be a WP violation for that on gencgc...
2014-09-27T22:20:58Z nyef: Is this a cheneygc target?
2014-09-27T22:21:08Z stassats: nope, gencgc
2014-09-27T22:21:41Z stassats: so, how on earth does the bad value get into the interrupt context if the bad value is created during the interrupt?
2014-09-27T22:21:43Z stassats: or is it?
2014-09-27T22:22:24Z nyef: Umm... How long is 142817 annotation 1 supposed to run?
2014-09-27T22:22:47Z stassats: some time
2014-09-27T22:22:54Z stassats: i guess it is longer on rpi
2014-09-27T22:23:02Z nyef: Okay, because it's been more than a minute now on my x86-64, I think.
2014-09-27T22:23:15Z stassats: x86-64 isn't susceptible
2014-09-27T22:23:23Z nyef: It WAS.
2014-09-27T22:23:33Z stassats: oh, you're talking about a different paste
2014-09-27T22:23:42Z stassats: about a second
2014-09-27T22:23:53Z nyef: Seriously, I'm about to try and kill it, it's been running so long.
2014-09-27T22:24:05Z stassats: you may try running it anew
2014-09-27T22:24:08Z nyef: And this is git HEAD + a simple local hack.
2014-09-27T22:24:16Z nyef: (Well, semi-simple.)
2014-09-27T22:24:38Z nyef: Oh! do-cons never returns.
2014-09-27T22:25:03Z nyef: 324,632,687,552 processor cycles
2014-09-27T22:25:03Z nyef: 229,342,763,424 bytes consed
2014-09-27T22:25:37Z stassats: do-cons should return, i hope
2014-09-27T22:26:20Z nyef: Right, sorry, CONSING never returns.
2014-09-27T22:26:31Z stassats: nothing returns, only crashes
2014-09-27T22:27:26Z nyef: http://paste.lisp.org/display/143872
2014-09-27T22:27:57Z nyef: There's a GCC-ism in there, but it shouldn't be hard to wrap much of it up in a macro.
2014-09-27T22:28:11Z stassats: that might explain why it doesn't crash
2014-09-27T22:29:03Z stassats: will it work on platforms without c-stack-is-control-stack?
2014-09-27T22:29:16Z nyef: I don't see why it wouldn't.
2014-09-27T22:29:37Z nyef: It's unboxed data, after all.
2014-09-27T22:30:11Z nyef: There are maybe three other uses of alloc_sap() in that file that could do with being addressed, but if this fixes things I say run with it.
2014-09-27T22:31:52Z stassats: i'll test it on arm tomorrow, don't want to wait for a rebuild
2014-09-27T22:33:46Z stassats: what happens when the gc meets an object without a widetag?
2014-09-27T22:33:59Z |3b|: nyef: MIPS may be getting more accessible, seen a few PI-like mips things recently
2014-09-27T22:35:14Z nyef: stassats: Either said object has a lowtag (meaning that it's a cons) or we hit a lose function.
2014-09-27T22:35:39Z nyef: |3b|: That's good to hear. I have an Origin 350 sitting under my desk, but it runs IRIX and not Linux.
2014-09-27T22:36:16Z nyef: (minimum specs to consider: At least 600 MHz, at least 32 megs of RAM, at least two CPUs.)
2014-09-27T22:38:18Z stassats: so, currently, the pa in pa_alloc is hit before the widetag is written, but wouldn't it be 0, not 1?
2014-09-27T22:39:12Z nyef: Is this in unpatched HEAD, or with pkhuong's patch?
2014-09-27T22:39:19Z stassats: unpatched
2014-09-27T22:39:42Z stassats: i'm trying to fully understand what's going on
2014-09-27T22:39:45Z nyef: Mmm... That's looking odd.
2014-09-27T22:39:56Z nyef: What I see is it allocating a (cons 0 0).
2014-09-27T22:40:33Z stassats: ok, and it's moved, but the pointer is left?
2014-09-27T22:40:50Z nyef: Hrm.
2014-09-27T22:41:19Z stassats: because it's in the C land
2014-09-27T22:41:38Z stassats: so, how does it get into the interrupt context in the scope of a single GC?
2014-09-27T22:41:55Z nyef: Oh, I got _context_sap.header wrong, it's supposed to be (1 << 8), not (2 << 8).
2014-09-27T22:42:16Z nyef: And I'm out of time for now, back in a while.
2014-09-27T22:42:45Z gingerale quit (Ping timeout: 246 seconds)
2014-09-27T22:47:15Z |3b| looks again and sees the 1 MIPS board i was thinking of that is actually out is 512M/bit/ not 512MB, making it a bit less useful for sbcl dev :(
2014-09-27T22:47:27Z |3b|: http://wrtnode.com/
2014-09-27T22:47:56Z stassats: that's a bullshit way to market things
2014-09-27T22:48:09Z |3b|: 600MHz is probably on the slow end as well
2014-09-27T22:48:41Z stassats: "64 MB is too small, i'll multiply it by 8!"
2014-09-27T22:49:39Z |3b|: other was some promo thing where they were giving them away free to 'interesting' projects then got overwhelmed by requests almost instantly... who knows if they will actually make a product out of it or for how much
2014-09-27T23:01:16Z nyef: Okay, I'm back.
2014-09-27T23:01:26Z stassats: so, the R0 has the value 1
2014-09-27T23:01:39Z nyef: ... Actually in the context?
2014-09-27T23:01:44Z stassats: yes
2014-09-27T23:02:00Z nyef: Hrm.
2014-09-27T23:02:08Z nyef: Program counter?
2014-09-27T23:02:28Z stassats: 0x00029ee4
2014-09-27T23:02:41Z nyef: That looks like runtime to me.
2014-09-27T23:03:02Z stassats: do_pending_interrupt
2014-09-27T23:03:09Z nyef: Hrm.
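nyef's correction to `_context_sap.header` fits the usual header-word layout, in which the size field above the widetag counts the words after the header; a SAP has exactly one such word, the raw pointer. A sketch of building that header for a SAP living in a C stack frame and tagging its address, assuming an 8-bit widetag field, a 4-bit lowtag with OTHER_POINTER_LOWTAG = 0xF as on x86-64, and a made-up widetag value `HYPOTHETICAL_SAP_WIDETAG`; `struct stack_sap` and the helper functions are hypothetical, and the real constants live in the runtime's generated headers.

```c
#include <stdint.h>

/* Assumptions for illustration only: an 8-bit widetag field, a 4-bit
   lowtag with OTHER_POINTER_LOWTAG = 0xF (x86-64 style) and a made-up
   widetag value.  Real values live in the runtime's generated headers. */
#define N_WIDETAG_BITS 8
#define WIDETAG_MASK 0xFFu
#define LOWTAG_MASK 0xFu
#define OTHER_POINTER_LOWTAG 0xFu
#define HYPOTHETICAL_SAP_WIDETAG 0x55u

/* A SAP object: one header word followed by one raw pointer word.
   16-byte alignment keeps the low 4 bits free for the lowtag. */
struct stack_sap {
    _Alignas(16) uintptr_t header;
    void *pointer;
};

/* The size field counts the words after the header: one for a SAP,
   hence (1 << 8) rather than (2 << 8). */
static uintptr_t sap_header(void) {
    return ((uintptr_t)1 << N_WIDETAG_BITS) | HYPOTHETICAL_SAP_WIDETAG;
}

static uintptr_t header_words(uintptr_t header) {
    return header >> N_WIDETAG_BITS;
}

static unsigned header_widetag(uintptr_t header) {
    return (unsigned)(header & WIDETAG_MASK);
}

/* Fill in a SAP that lives in the caller's frame and return it as a
   tagged pointer.  It is only valid while that frame is live. */
static uintptr_t init_stack_sap(struct stack_sap *sap, void *target) {
    sap->header = sap_header();
    sap->pointer = target;
    return (uintptr_t)sap | OTHER_POINTER_LOWTAG;
}

static struct stack_sap *untag_sap(uintptr_t tagged) {
    return (struct stack_sap *)(tagged & ~(uintptr_t)LOWTAG_MASK);
}

/* Round trip through a SAP held in this function's own frame. */
static int stack_sap_round_trip(void) {
    int payload = 0;
    struct stack_sap sap;
    uintptr_t tagged = init_stack_sap(&sap, &payload);
    return (tagged & LOWTAG_MASK) == OTHER_POINTER_LOWTAG
        && untag_sap(tagged) == &sap
        && untag_sap(tagged)->pointer == (void *)&payload;
}
```

Because the object is in the C frame rather than the heap, no GC can run between "allocation" and initialisation, which is the property the stack-allocation proposal in this log relies on.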
2014-09-27T23:03:25Z stassats: what
2014-09-27T23:03:40Z stassats: it's called from C, and it thinks that it's lisp
2014-09-27T23:03:48Z nyef: Yeah, that's where I was heading.
2014-09-27T23:04:06Z nyef: ... because it's a trap.
2014-09-27T23:04:32Z stassats: the whole alloc.c is broken
2014-09-27T23:04:39Z nyef: Well, we knew that.
2014-09-27T23:06:02Z stassats: it makes clear what happens on ARM, but not on x86-64
2014-09-27T23:06:58Z stassats: shouldn't x86-64 deal with a C interrupt context fine? i know it doesn't scavenge, just pins things down
2014-09-27T23:07:19Z nyef: In theory, yes.
2014-09-27T23:08:21Z nyef: Umm... alloc_number() is stupidly broken.
2014-09-27T23:11:27Z stassats: so, on x86 the problem is that the sap isn't pinned down, gets moved, and the widetag is written into the wrong place
2014-09-27T23:12:02Z stassats: but do_pending_interrupt could make it pinned, couldn't it?
2014-09-27T23:13:22Z nyef: So, what you start off with is a (CONS 0 0), that gets smashed up into a #.(INT-SAP 0), and then smashed up into the actual SAP... All before constructing the legitimate object pointer for pinning.
2014-09-27T23:13:48Z stassats: there's actually no pinning
2014-09-27T23:14:08Z nyef: As soon as a legit object pointer is constructed, it's considered to be pinned on x86oids.
2014-09-27T23:14:14Z stassats: unless the C environment is pinned
2014-09-27T23:14:24Z nyef: Here's worse: The result from pa_alloc() is pushed to the control stack... as a FIXNUM!
2014-09-27T23:14:33Z nyef: Yes, the C environment is pinned on x86oids.
2014-09-27T23:14:49Z stassats: when in C? both stack and the register set?
2014-09-27T23:14:58Z nyef: Yes.
2014-09-27T23:15:12Z stassats: why isn't it pinned then?
2014-09-27T23:15:22Z nyef: Because we only pin things that look like valid pointers.
2014-09-27T23:15:43Z nyef: And until the call to make_lispobj(), we have what looks like a fixnum.
2014-09-27T23:15:44Z stassats: without a lowtag it isn't a valid pointer?
2014-09-27T23:16:02Z nyef: Not just a lowtag, it has to be a lowtag pointing to something that "looks legitimate".
2014-09-27T23:16:14Z stassats: sure, but we got that part satisfied
2014-09-27T23:16:57Z nyef: Barely. We'd have to have LIST_POINTER_LOWTAG until the widetag is written in alloc_unboxed(), and OTHER_POINTER_LOWTAG afterwards.
2014-09-27T23:17:33Z nyef: And, honestly, I'm having a hard time thinking that ANY of this stuff is necessary, apart from alloc_code_object().
2014-09-27T23:18:10Z nyef: And guess what's run in a WITHOUT-GCING?
2014-09-27T23:18:51Z stassats: i modified alloc_code_object recently, no need to guess
2014-09-27T23:19:28Z nyef: Right, so, I think that moving to stack-allocated SAPs in the interrupt routines should clear the worst of this.
2014-09-27T23:20:46Z nyef: On the other hand, I'm not really set up to TEST said theory on non-x86-64 right now.
2014-09-27T23:21:10Z stassats: i can check on arm and ppc
2014-09-27T23:21:21Z stassats: not right now, but in the immediate future
2014-09-27T23:22:11Z psilord quit (Quit: Leaving.)
2014-09-27T23:22:26Z nyef: I'm currently planning on having a G4 with me next week.
2014-09-27T23:22:35Z nyef: Single core, but should suffice for a few things.
2014-09-27T23:23:27Z stassats stares at 64 ppc threads in htop
2014-09-27T23:25:09Z nyef: Do you need me to get my G5 XServe up and running?
2014-09-27T23:26:14Z nyef: Why is alloc_code_object() in the runtime on gencgc, anyway?
2014-09-27T23:26:24Z stassats: who knows
2014-09-27T23:26:59Z nyef: The first thing that comes to mind is CODE_PAGE_FLAG, which IIRC is broken, and always has been.
2014-09-27T23:27:14Z stassats: probably nobody wanted to write N vops for it
2014-09-27T23:27:23Z stassats: although it can be done in lisp
2014-09-27T23:27:38Z nyef points out that cheneygc alloc-code-object IS a VOP.
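The pinning rule nyef describes can be sketched as the first filter a conservative scan applies to each word of the C stack and register set: only words whose low bits carry a pointer lowtag are candidates, so the aligned, untagged address that `pa_alloc()` returns reads as a fixnum and is ignored. A sketch assuming x86-64-style tagging (16-byte alignment, a 4-bit lowtag, LIST_POINTER_LOWTAG = 0x7, OTHER_POINTER_LOWTAG = 0xF, fixnums with a clear low bit); the function names are hypothetical, and the real scan's "looks legitimate" check on the target object is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64-style tagging, for illustration: 16-byte aligned
   objects, a 4-bit lowtag, and fixnums identified by a clear low bit.
   Only the two pointer lowtags discussed in the log are modelled. */
#define LOWTAG_MASK 0xFu
#define LIST_POINTER_LOWTAG 0x7u
#define OTHER_POINTER_LOWTAG 0xFu

static bool looks_like_fixnum(uintptr_t word) {
    return (word & 1u) == 0;
}

/* First filter of a conservative root scan: only words carrying a
   pointer lowtag are considered for pinning.  The real scan then checks
   that the target "looks legitimate" (a valid header or cons), which is
   not modelled here. */
static bool pin_candidate(uintptr_t word) {
    uintptr_t lowtag = word & LOWTAG_MASK;
    return lowtag == LIST_POINTER_LOWTAG || lowtag == OTHER_POINTER_LOWTAG;
}
```

Both lowtags pass this filter, which is why nyef notes that the pointer would have to carry LIST_POINTER_LOWTAG while the memory still looks like a cons and OTHER_POINTER_LOWTAG only once the widetag is written: the omitted legitimacy check would reject a mismatch.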
2014-09-27T23:27:50Z stassats: right
2014-09-27T23:28:06Z nyef: So we're down to the N=2 case, and the N=1 case back when the decision would have been made.
2014-09-27T23:29:04Z stassats: so, the pkhuong patch seems to have the tagged pointer saved
2014-09-27T23:29:15Z stassats: could it suffer from a gcc optimization?
2014-09-27T23:29:17Z Xach left #sbcl
2014-09-27T23:29:19Z nyef: Which should cover the x86oids?
2014-09-27T23:29:43Z stassats: except that it doesn't always, or that is a different issue, but i don't have a test case
2014-09-27T23:30:01Z stassats: the symptoms are similar
2014-09-27T23:30:14Z nyef: Hrm.
2014-09-27T23:30:19Z stassats: so, that's the only way to come up with a test case: understand the original issue, figure out why it might not cover things
2014-09-27T23:30:51Z nyef: Okay, alloc_number is only used on MIPS, HPPA, SPARC, and possibly ARM, and in gencgc for the heap exhaustion thing.
2014-09-27T23:31:28Z nyef: And the ARM bit is in a #if 0 block.
2014-09-27T23:32:32Z stassats: so, for scavenge to stumble upon 1, it needs to be in the boxed heap
2014-09-27T23:33:06Z stassats: how can it get there without being able to disappear?
2014-09-27T23:33:34Z pkhuong: stassats: my patch only works for x86oids, and we do need a compiler barrier (which I believe I inserted with an asm volatile ("" ::: "memory");)
2014-09-27T23:33:52Z stassats: pkhuong: yeah, i'm investigating x86oids now
2014-09-27T23:33:57Z stassats: the arm thing is more clear to me
2014-09-27T23:34:25Z pkhuong: nyef: code page flag it is, iirc.
2014-09-27T23:34:32Z pkhuong: it used to be a VOP.
2014-09-27T23:35:21Z nyef: Okay, lovely, so if we move it back to being a VOP (no real loss), then we're down to alloc_number as the only real use-case for alloc.c, right?
2014-09-27T23:36:16Z attila_lendvai quit (Quit: Leaving.)
2014-09-27T23:37:04Z nyef: pkhuong: What do you think about my patch for stack-allocating SAPs in the runtime? It's more a proof-of-concept than anything else at this point, but it looks workable to me (and clears up the known failure case on x86oids, at least).
2014-09-27T23:38:13Z pkhuong: if we can make sure lifetimes are safe, sure.
2014-09-27T23:38:44Z nyef: Every single case is for an interrupt context or related item, they're all required to be used in a d-x compliant fashion.
2014-09-27T23:38:48Z pkhuong: I'm not super comfortable with DX SAPs though, because there's no easy way to force a copy (?)
2014-09-27T23:39:03Z stassats: sap-int-int-sap?
2014-09-27T23:39:13Z pkhuong: stassats: all right.
2014-09-27T23:39:19Z nyef: Heh. What I wanted back in the day was an easy way to force a LACK of a copy.
2014-09-27T23:40:02Z pkhuong: stassats: I wouldn't put it beyond us to optimise that away ;)
2014-09-27T23:40:13Z phf left #sbcl
2014-09-27T23:40:28Z stassats: yeah, that was my next thought
2014-09-27T23:40:38Z stassats: but can always notinline it
2014-09-27T23:43:16Z pkhuong: mm.. the compiler barrier wouldn't force tagging
2014-09-27T23:44:02Z stassats: and 1 always being in R0 is from get_pseudo_atomic_interrupted(th) returning true, that's why it predictably failed on ARM
2014-09-27T23:44:06Z pkhuong: we probably have to pass the address of the tagged SAP to the compiler barrier.
2014-09-27T23:45:33Z stassats: disassemble shows that it's tagged, at least now
2014-09-27T23:52:43Z psilord joined #sbcl
2014-09-27T23:59:53Z stassats: so, in alloc(), the gc will happen only after it returns and traps on pa?
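pkhuong's last suggestion, passing the address of the tagged SAP to the compiler barrier, can be expressed with GCC/Clang extended asm. An empty `asm volatile` with only a `"memory"` clobber orders memory accesses but does not force the compiler to compute a value that lives only in a register or has not been derived yet, so the tagged word may never exist at that point. Passing `&tagged` as an input makes the variable addressable and, together with the clobber, tells the compiler the asm may read it, so the tagged value must be stored before the barrier. This is a sketch, not pkhuong's actual patch; `FORCE_IN_MEMORY` and `make_tagged` are hypothetical names, and OTHER_POINTER_LOWTAG = 0xF is an x86-64-style assumption.

```c
#include <stdint.h>

/* Assumed x86-64-style lowtag; illustrative only. */
#define OTHER_POINTER_LOWTAG 0xFu

/* GCC/Clang extended asm.  The "r" input passes the variable's address,
   which makes it addressable, and the "memory" clobber tells the compiler
   the asm may read any reachable memory, so the current value of var
   must be stored before this point.  No instructions are emitted. */
#define FORCE_IN_MEMORY(var) __asm__ __volatile__("" : : "r"(&(var)) : "memory")

/* Tag a raw, suitably aligned address and force the tagged word (not just
   the untagged address) to be materialised in the frame at the barrier,
   where a conservative scan of the C stack could see it. */
static uintptr_t make_tagged(uintptr_t raw) {
    uintptr_t tagged = raw | OTHER_POINTER_LOWTAG;
    FORCE_IN_MEMORY(tagged);
    /* ... code that may trap into the GC would follow here ... */
    return tagged;
}
```

The guarantee only holds at the barrier itself; code after it that can trigger a GC still relies on the tagged word remaining in the frame or a register, which is why stassats checks the disassembly.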