00:18:46 -!- rpg [~rpg@216.243.156.16.real-time.com] has quit [Quit: rpg] 00:51:25 -!- LiamH [~none@pdp8.nrl.navy.mil] has quit [Ping timeout: 240 seconds] 00:51:40 drdo [~drdo@85.207.54.77.rev.vodafone.pt] has joined #sbcl 02:37:23 drdo` [~drdo@85.207.54.77.rev.vodafone.pt] has joined #sbcl 02:38:54 -!- drdo [~drdo@85.207.54.77.rev.vodafone.pt] has quit [Ping timeout: 244 seconds] 03:16:54 attila_lendvai [~attila_le@87.247.13.189] has joined #sbcl 03:16:54 -!- attila_lendvai [~attila_le@87.247.13.189] has quit [Changing host] 03:16:54 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl 03:42:16 -!- tsuru`` [~charlie@adsl-74-179-25-191.bna.bellsouth.net] has quit [Ping timeout: 240 seconds] 03:59:37 Ober [jaimef@dns.mauthesis.com] has joined #sbcl 04:21:30 tcr [~tcr@95-88-46-7-dynip.superkabel.de] has joined #sbcl 04:40:51 -!- drdo` is now known as drdo 04:46:17 akovalen` [~anton@95.72.168.38] has joined #sbcl 04:47:33 -!- akovalenko [~anton@95.72.173.229] has quit [Ping timeout: 256 seconds] 05:06:15 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Ping timeout: 256 seconds] 05:09:31 attila_lendvai [~attila_le@87.247.13.189] has joined #sbcl 05:09:31 -!- attila_lendvai [~attila_le@87.247.13.189] has quit [Changing host] 05:09:31 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl 05:30:16 -!- tcr [~tcr@95-88-46-7-dynip.superkabel.de] has quit [Quit: Leaving.] 
05:37:39 -!- akovalen` is now known as akovalenko 05:42:55 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Ping timeout: 252 seconds] 05:57:50 attila_lendvai [~attila_le@87.247.39.4] has joined #sbcl 05:57:50 -!- attila_lendvai [~attila_le@87.247.39.4] has quit [Changing host] 05:57:50 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl 06:07:26 -!- Ober [jaimef@dns.mauthesis.com] has quit [Ping timeout: 276 seconds] 06:28:20 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Read error: Operation timed out] 06:59:38 attila_lendvai [~attila_le@87.247.62.1] has joined #sbcl 06:59:38 -!- attila_lendvai [~attila_le@87.247.62.1] has quit [Changing host] 06:59:38 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl 07:22:57 -!- pchrist_ [~spirit@gentoo/developer/pchrist] has quit [Quit: leaving] 07:23:29 pchrist [~spirit@gentoo/developer/pchrist] has joined #sbcl 07:28:17 jaimef [jaimef@dns.mauthesis.com] has joined #sbcl 07:37:08 -!- Phoodus [~foo@68.107.217.139] has quit [Ping timeout: 276 seconds] 08:00:13 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Ping timeout: 240 seconds] 08:17:12 attila_lendvai [~attila_le@87.247.50.30] has joined #sbcl 08:17:12 -!- attila_lendvai [~attila_le@87.247.50.30] has quit [Changing host] 08:17:12 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl 08:30:24 Blkt [~user@89-96-199-46.ip13.fastwebnet.it] has joined #sbcl 08:36:30 good morning everyone 08:43:04 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Ping timeout: 240 seconds] 08:45:43 zyg [57e37c83@gateway/web/freenode/ip.87.227.124.131] has joined #sbcl 08:47:54 hi, does anyone know why many math ops in irrat.lisp coerce their return type to single-float? 08:48:05 even the input is bignum 08:48:21 Godmorning. 
I have a function at http://paste.lisp.org/+2P4J which I believe should be tail-optimized, but it seems not to be. 08:49:49 try optimizing for space? 08:49:55 why do you believe that it should be tail-optimized? 08:50:24 kryztof: I always assume that whenever it is possible and "clean" 08:50:38 ok, you need to fix that assumption 08:50:51 Common Lisp does not mandate tail call merging, ever 08:51:42 and SBCL only does it when the expressed desires of the user favour space or speed over debug 08:52:16 kanru: because that's what the Common Lisp standard says that Lisp implementations must do 08:52:30 if the inputs to irrational functions are all rational, then the answer is returned as a single float 08:54:14 kryztof: I thought modern CL compilers had that convention. For example in sbcl (defun foo () (foo)) seems to survive. 08:55:47 Kryztof: what I read from sqrt description is "If NUMBER is a positive rational, it is implementation-dependent 08:55:49 whether ROOT is a rational or a float." 08:56:14 so a "float" implies single-float? 08:57:12 a quick question: can this form (proclaim '(optimize (safety 3) (debug 3) (speed 0))) affect other files than it is put in? 08:57:14 kanru: clhs 12.1.3.3 08:57:21 zyg: yes 08:57:52 kryztof: thanks! 09:00:14 It is that form which is causing the no-tail-call-optimization (not sure I'm using the right word here). At least (defun foo () (foo)) will now explode and I'm left in ldb. 09:01:26 progress! 
:-) 09:01:52 Kryztof: thanks 09:25:59 attila_lendvai [~attila_le@87.247.3.176] has joined #sbcl 09:25:59 -!- attila_lendvai [~attila_le@87.247.3.176] has quit [Changing host] 09:25:59 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl 09:32:11 Phoodus [~foo@68.107.217.139] has joined #sbcl 09:43:17 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Ping timeout: 252 seconds] 10:32:29 attila_lendvai [~attila_le@87.247.35.97] has joined #sbcl 10:32:29 -!- attila_lendvai [~attila_le@87.247.35.97] has quit [Changing host] 10:32:29 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl 10:42:04 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Ping timeout: 258 seconds] 10:49:41 drl [~lat@110.139.229.172] has joined #sbcl 11:28:05 attila_lendvai [~attila_le@87.247.61.117] has joined #sbcl 11:28:05 -!- attila_lendvai [~attila_le@87.247.61.117] has quit [Changing host] 11:28:05 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl 11:32:57 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Ping timeout: 245 seconds] 12:04:11 nikodemus [~nikodemus@cs181063174.pp.htv.fi] has joined #sbcl 12:04:11 -!- ChanServ has set mode +o nikodemus 12:21:23 -!- nikodemus [~nikodemus@cs181063174.pp.htv.fi] has quit [*.net *.split] 12:21:23 -!- drl [~lat@110.139.229.172] has quit [*.net *.split] 12:23:59 nikodemus [~nikodemus@cs181063174.pp.htv.fi] has joined #sbcl 12:23:59 drl [~lat@110.139.229.172] has joined #sbcl 12:23:59 -!- niven.freenode.net has set mode +o nikodemus 12:27:30 -!- drl [~lat@110.139.229.172] has quit [Quit: Leaving] 12:38:28 -!- nikodemus [~nikodemus@cs181063174.pp.htv.fi] has quit [Quit: This computer has gone to sleep] 12:47:11 -!- Phoodus [~foo@68.107.217.139] has quit [Ping timeout: 276 seconds] 13:02:39 nikodemus_ [~nikodemus@dsl-hkibrasgw4-fe5bdf00-15.dhcp.inet.fi] has joined #sbcl 
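To make the earlier CLHS 12.1.3.3 answer concrete, here is a small sketch (not from the log) of the rule being cited: an irrational function given only rational arguments returns a single-float, even when the argument is a bignum, and it returns a double-float only when it is handed one. The helper name `result-float-type` is made up for illustration.

```lisp
;; Hypothetical helper: report which numeric format a result landed in.
(defun result-float-type (x)
  (etypecase x
    (single-float 'single-float)
    (double-float 'double-float)
    (rational 'rational)))

;; Rational argument, single-float result.
(result-float-type (sin 1))

;; Bignum argument, still a single-float result.
(result-float-type (log (expt 10 30)))

;; To get double precision, pass a double-float in the first place.
(result-float-type (sin 1d0))
(result-float-type (log (coerce (expt 10 30) 'double-float)))
```

So the coercion in irrat.lisp is the standard's contagion rule at work rather than a precision bug; callers who want doubles should coerce their inputs before the call.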
13:19:13 LiamH [~none@pdp8.nrl.navy.mil] has joined #sbcl 13:44:14 -!- jaimef [jaimef@dns.mauthesis.com] has quit [*.net *.split] 13:44:16 -!- sbryant [~freenode@ghanima.slavasaur.com] has quit [*.net *.split] 13:45:04 jaimef [jaimef@dns.mauthesis.com] has joined #sbcl 13:48:24 sbryant [~freenode@ghanima.slavasaur.com] has joined #sbcl 13:57:54 -!- zyg [57e37c83@gateway/web/freenode/ip.87.227.124.131] has quit [Ping timeout: 265 seconds] 14:05:44 nyef [~nyef@c-174-63-105-188.hsd1.ma.comcast.net] has joined #sbcl 14:05:54 G'morning all. 14:06:49 yo 14:07:36 o/ 14:07:55 Hello nikodemus_. 14:07:56 nyef: what's you verdict re. symbol-value-in-thread.3? 14:08:03 your, even 14:08:19 Several missing read barriers, one missing CAS lock, and it STILL doesn't work. 14:08:28 ouch 14:08:39 where's the missing CAS lock? 14:09:04 And MAKE-LISP-OBJ isn't even remotely thread-safe (managed to get a GC fault once), and I've had a lockup in deadlock-detection 1. 14:09:16 Umm... Around one of the waitqueue functions. 14:09:37 It's only protected by the mutex, and later in the same function is the only hit to a waitqueue function NOT protected by the same mutex. 14:10:17 DOH. i see it 14:11:08 It helps, but doesn't fix entirely. 14:11:49 I don't think this test case is really stressing the SVIT function, it's stressing GC, MAKE-THREAD, JOIN-THREAD, and SEMAPHOREs. 14:13:18 yeah 14:13:42 Is THREAD-YIELD supposed to act as a read barrier? 14:13:53 (guts of with-cas-lock.) 14:14:22 CAS in %%wait-for-mutex should have a read barrier. 14:14:42 WAKEUP in condition-wait should have a read barrier. 14:14:47 what? isn't CAS an implicit barrier? 14:15:03 You're not CASing each time through the loop. 14:15:17 You're doing some funky thing where you only CAS if you have reason to expect it to succeed. 14:15:22 no, but we're never leaving the loop without CAS succeeding 14:15:41 And without a read barrier, you have to wait for a random interrupt to force the read barrier. 
14:16:35 huh 14:16:36 On the whole, I'm not convinced that the extra logic helps, given that it needs a read barrier or is oblivious to state change until it takes an interrupt. 14:17:06 nyef: since a successful CAS is a barrier, is a read barrier needed? 14:17:31 compare-and-compare-and-swap is a classic way to do this, at least on x86oid. 14:17:34 pkhuong: Yes, because it does a read to decide if it wants to CAS. 14:17:50 nyef: do you mean it could spin indefinitely? 14:18:27 nikodemus_: No, because kernel interrupt handling can reasonably be expected to supply a barrier at unpredictable times. 14:19:45 ok. i'll simplify it to use just CAS without the volatile pre-read for now. (the volatile read is really a leftover/reflex from when i had it spin around spin-loop-hint without yielding) 14:20:17 Still, all that, and things still don't quite work. 14:20:29 nikodemus_: with ll/cc-based CAS, I don't know that it's useful anyway. 14:20:33 how about with futexes? 14:20:39 *LL/SC 14:20:54 With futexes it runs through no problem. 14:20:58 ok 14:21:28 we go for a compare first to avoid taking exclusive access on cache lines/bus when it's probably not going to work. I don't think LL has that effect? 14:21:51 pkhuong: I don't believe it does, no. 14:22:27 right -- and i don't think the pre-read helps measurably when failure is going to be followed by a context-switch anyways 14:22:35 k. 14:22:41 It's a per-core flag saying "if this cache line is seen written on the bus, don't overwrite it." 14:23:54 So it'd probably have to hit the bus for the read, and only for the write if it is uncontested. 14:24:51 What bugs me is that I've been unable to find any other plausible causes for lockups. 14:26:15 Ah, good. SB!KERNEL is a private package. We can declare make-lisp-obj to be horribly unsafe on pointer objects if the GC is enabled and not have to worry about outside uses. 
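The simplification agreed on above (CAS on every pass, no unfenced pre-read, yield on failure) can be sketched roughly as follows. This is a toy illustration and not the actual SBCL `with-cas-lock`: every `toy-` name is hypothetical, the lock is just a one-element list, and interrupt safety is ignored. It assumes an SBCL built with thread support.

```lisp
;; Toy spin lock built on SB-EXT:COMPARE-AND-SWAP.
;; The lock is a one-element list whose CAR is NIL (free) or T (held).
(defun make-toy-lock ()
  (list nil))

(defun toy-acquire (lock)
  ;; CAS on every iteration, so there is no separate plain read whose
  ;; staleness has to be reasoned about. COMPARE-AND-SWAP returns the old
  ;; value, so a NIL result means this thread took the lock.
  (loop until (null (sb-ext:compare-and-swap (car lock) nil t))
        do (sb-thread:thread-yield)))

(defun toy-release (lock)
  ;; Release with CAS too, so the sketch does not depend on any
  ;; per-architecture barrier reasoning.
  (sb-ext:compare-and-swap (car lock) t nil))

(defmacro with-toy-lock ((lock) &body body)
  (let ((l (gensym "LOCK")))
    `(let ((,l ,lock))
       (toy-acquire ,l)
       (unwind-protect (progn ,@body)
         (toy-release ,l)))))

(defun toy-lock-demo (&key (threads 4) (iterations 1000))
  ;; Several threads bump one counter under the lock; returns the total.
  (let ((lock (make-toy-lock))
        (counter 0))
    (mapc #'sb-thread:join-thread
          (loop repeat threads
                collect (sb-thread:make-thread
                         (lambda ()
                           (dotimes (i iterations)
                             (with-toy-lock (lock)
                               (incf counter)))))))
    counter))
```

Calling `(toy-lock-demo)` should return threads times iterations if the lock really excludes. Yielding after a failed attempt matches the point made above that the pre-read buys little when failure is followed by a context switch anyway.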
14:27:22 sbryant- [~freenode@ghanima.slavasaur.com] has joined #sbcl 14:33:41 -!- sbryant [~freenode@ghanima.slavasaur.com] has quit [*.net *.split] 14:37:38 nyef: given threads on PPC, we should have write-barriers in structure constructors and clos object initialization, no? 14:39:00 ... why? 14:39:36 If you're going to marshal them to another thread, you have to have a barrier anyway, surely? 14:41:40 i think "initialize object when seen by another thread is completely initialized" is something of a bare minimum of memory model 14:41:54 make that "an intialized object..." 14:43:26 nikodemus_: is it GC and type safe? 14:43:34 I think that any inter-thread marshalling of objects should already involve a barrier. 14:43:36 I don't care about the rest. 14:43:59 And since the GC does a full barrier for every thread, that much works. 14:44:21 i think this is perfectly idiomatic, and we even do it ourselves: (unless (cached-bar foo) (setf (cached-bar foo) (make-bar ...)) ; assume thread safe but racy -- we might waste an already created BAR, but other threads will always see a completely initialized one 14:45:01 nyef: and type checks? Passing an array to another thread and managing to treat it as a CONS at regular safety levels sounds lossy to me. 14:46:49 I guess the least we could do is add a write barrier to the end of p-a. 14:47:04 At that point, the header is guaranteed to be written. 14:49:19 -!- sdemarre [~serge@91.176.187.200] has quit [Ping timeout: 248 seconds] 14:49:21 nikodemus_: I think the correct answer there is to (prog1 (make-bar ...) (barrier (:write))). 14:50:27 nyef: i seriously think that's too onerous -- and there is already plenty of code that assumes that other threads cannot see partially initialized structures or similar 14:51:34 how expensive is a write barrier on non-x86oids? 14:52:15 slyrus [~chatzilla@adsl-99-35-53-209.dsl.pltn13.sbcglobal.net] has joined #sbcl 14:52:21 Not sure, TBH. 
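For readers following the publication argument, here is a minimal sketch of the "barrier before publish" variant suggested above, written against the lazy-cache idiom quoted in the log. The `foo` and `bar` structures and the function name `ensure-cached-bar` are stand-ins invented for the example, and `sb-thread:barrier` makes it SBCL-specific.

```lisp
;; Hypothetical structures standing in for the FOO/BAR of the discussion.
(defstruct bar (x 0) (y 0))
(defstruct foo (cached-bar nil))

(defun ensure-cached-bar (foo)
  ;; Racy on purpose: two threads may both build a BAR and one is wasted,
  ;; but nobody should observe a half-initialized one.
  (or (foo-cached-bar foo)
      (setf (foo-cached-bar foo)
            ;; Finish the BAR's initializing stores, fence, then publish
            ;; the pointer by storing it into the shared FOO.
            (prog1 (make-bar :x 1 :y 2)
              (sb-thread:barrier (:write))))))
```

Whether every caller should have to write this by hand, or whether constructors should carry the barrier themselves, is exactly the disagreement in the log; the sketch only shows what the explicit form looks like. On x86 the write barrier costs nothing at run time, so the difference only shows up on weaker memory models such as PPC.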
14:52:32 pretty sure I've seen barrier before publication on my end. 14:52:41 On PPC it's a SYNC instruction. 14:52:51 And our CAS implementation has to do both SYNC and ISYNC. 14:53:03 there isn't really any difference between initialisation and mutation of structs. 14:53:56 -!- jaimef [jaimef@dns.mauthesis.com] has quit [Ping timeout: 276 seconds] 14:54:44 now that we're not just x86oids threads, we could really use an explicit memory model 14:54:52 jaimef [jaimef@dns.mauthesis.com] has joined #sbcl 14:55:10 can we lift the java one? i haven't read it, but apparently it isn't terrible 14:55:15 nooo. 14:55:25 it's basically impossible to implement outside x86 14:55:30 much too stringent. 14:56:04 ok 14:57:09 does C or C++[0x] have one? 14:57:19 0x has something. 14:57:53 -!- whoops [u549@gateway/web/irccloud.com/x-imcngkgbzbtdsblr] has quit [Remote host closed the connection] 14:58:31 gcc.gnu.org/wiki/MemoryModel 15:01:20 this seems to be about type-based alias analysis. 15:01:33 yeah, not what we're afternyef: on PPC, does CAS imply a w 15:01:36 aagh 15:01:52 nyef: on PPC, does CAS imply a write-barrier? 15:02:03 Yes. The documentation even mentions this! 15:02:19 oh, good :) 15:04:26 Actually, ISTR writing the documentation to say "these operations are all write barriers", and listed off a goodly number of thread functions. 15:05:13 Heh. That lazy-cache trick? Use CAS to set the cache. 15:05:54 barrier on publish! 15:06:02 (and barrier on privatize) 15:07:20 nyef: the thing with the lazy cache trick is that portable code wants to do that 15:07:27 whoops [u549@gateway/web/irccloud.com/x-pcuyuxfhpkfegtig] has joined #sbcl 15:07:35 sdemarre [~serge@91.176.142.225] has joined #sbcl 15:07:38 nikodemus_: portable code with threads? 15:08:03 assuming it works, it is pretty much the only way to write portable code that caches anything in an object that can be seen from multiple threads 15:08:24 if it doesn't work... well, then portable code is SOL 15:09:18 So... 
expose CAS from B-T? 15:09:20 If they want to do it portably, they need to lock. 15:09:23 we for example use it to update INFOdb without locking 15:09:30 If that's too slow, get BX to expose atomic primitives. 15:11:14 ... I thought the globaldb was one of the structures that was so badly thread-unsafe that we had to protect it with the big compiler lock? 15:13:21 oh, sorry. that rewrite-cache trick was used in hash-caches. i misrememeber 15:18:25 Quadrescence_ [~quad@unaffiliated/quadrescence] has joined #sbcl 15:19:10 So, how bad would it be if we used a CAS-lock in the allocation sequence? Because that sort of thing plus moving the allocation pointer to a shared variable would allow us to use cheneygc for threads... 15:20:22 -!- Quadrescence [~quad@unaffiliated/quadrescence] has quit [Ping timeout: 258 seconds] 15:20:22 -!- Quadrescence_ is now known as Quadrescence 15:21:21 it'd scale horribly, but would probably be good enough for a lot of people. 15:21:34 Not Worse Than The Other Python 15:21:43 Heh. 15:22:15 what's the CAS lock for? 15:22:42 Plausibly cheaper than a full lock, and easy enough to write inline in a VOP. 15:23:08 What's the critical section? 15:23:32 nyef: I assume cheneygc+threads would be a compile-time option if it was done? 15:23:49 milanj [~milanj_@79-101-181-128.dynamic.isp.telekom.rs] has joined #sbcl 15:23:50 pkhuong: Access to the allocation pointer. 15:23:59 redline6561: Yeah, compile-time option. 15:24:23 I need caffeine in my bloodstream for this. 15:25:42 nyef: and we already have signal handlers, so we already handle GCing at random times. 15:25:53 so we only need to protect against GC in the middle of allocation. 15:26:01 p-a, remember? 15:26:16 right. 15:27:17 is that active on cheney platforms? What would happen if I had a simultaneous call to GC and allocation? 15:28:11 Same as on gencgc, the allocation is in a p-a block, so the stop-for-gc signal gets deferred. 
15:28:32 hi, can someone take a quick look at http://paste.lisp.org/display/125882 15:28:45 i'm getting this on amazon ec2 x86 instance 15:28:56 is there any known issue on amazon ec2 machines 15:29:26 btw. this is threaded program using zs3 library 15:29:37 milanj: have you tried upgrading? 15:29:47 how about s/p-a/PCLSR/ for allocations (at least in the fastpath)? 15:29:48 it's sbcl 1.0.52 15:29:52 if you meant that 15:29:56 homie [~levgue@xdsl-78-35-130-40.netcologne.de] has joined #sbcl 15:30:07 ISTR an issue with ec2. We even had a tiny C test case, but I don't know that they did anything with it. 15:30:34 I haven't run into any issues with my ec2 instance yet. 15:30:39 At least, no sbcl issues. 15:31:54 hmm, anyone familiar with ways to overcome this ? 15:32:09 i mean, from amazon side 15:33:34 Well, you started off with a connection-reset-by-peer. then caught an "unexpected errno 12", then things blew up, right? 15:33:36 no clue. And I can't find any note on the issue on my end. 15:34:05 nyef, looks like 15:34:27 pkhuong, I tried google for it, no success 15:35:50 errno 12 is out of memory... 15:36:17 ... do you get that if you run out of FDs? 15:37:18 nyef: so, if that hack works, it actually wouldn't be *that* hard to have tiny thread-local allocation pools? 15:37:38 no. errno 24 is too many open files. 15:37:42 i've run fine on ec2, both large and small instances -- but it's been a few months 15:39:01 nikodemus_, multi-threaded ? 15:39:04 It looks to me like you actually just ran out of memory inside some bit of sbcl's sockets wrapper (getaddrinfo maybe), and 15:39:18 milanj: excessively so 15:39:19 if this is connected with threads in a first place 15:39:20 sbcl didn't catch that properly, and then had a null pointer deref due to the failed allocation 15:39:21 pkhuong: Umm... I have no idea. 15:39:27 antgreen [user@nat/redhat/x-yzjqcujqegcmlvcx] has joined #sbcl 15:39:45 milanj: the most common causes for memory faults are bugs in foreign code (ie. 
trying to write to a lisp vector and scribbling past the end, etc), type-errors in unsafe code, and lying to the compiler 15:40:14 "Memory fault at 0" -> null pointer deref. 15:40:36 nyef: just treat unused pre-allocated pools as (dead) conses. 15:41:14 pkhuong: Right, but there are bound to be further gotchas. 15:41:22 It's plausible, at least. 15:41:43 beslyrus [~Brucio-12@adsl-99-35-53-209.dsl.pltn13.sbcglobal.net] has joined #sbcl 15:41:49 milanj: threads can easily make such things cause trouble that might be hidden when running single-threaded, for example because a GC just doesn't happen while you have references to corrupted objects 15:42:29 nyef: and that's how we get to gencgc on x86 (: 15:42:59 but yeah, in this case figuring out the first error would be a productive first step 15:45:15 I guess it starts with "Error couldn't read from #: Connection reset by peer in thread # " 15:46:04 I'm not sure if "zs3" is using some foreign code down the way 15:46:11 that / the "Unexpected errno" 15:46:12 -!- antgreen [user@nat/redhat/x-yzjqcujqegcmlvcx] has quit [Read error: Connection reset by peer] 15:46:42 milanj: can you reproduce this? 15:47:10 I'm sure i can, since I got this on 4 machines 15:47:34 (that is one of the most important steps towards nailing down an issue like this) 15:47:40 neither errors are from SBCL itself, right? 15:47:42 btw. I'm using core dumped with save-lisp-and-die, but I guess that doesn't makes any difference 15:48:09 antgreen [user@nat/redhat/x-wizwfeeqgsuvebxq] has joined #sbcl 15:49:06 the unexpeted errno /could/ be from sbcl's get-protocol-by-name 15:50:22 milanj: please try to reproduce it. if it turns out you can repeat this in <10 minutes, the debugging approach is going to be pretty different than if it takes hours of running to reproduce 15:51:00 it happened after 20-30 minutes, let me try 15:51:08 do I need to make some code change to catch it better ? 15:51:25 Reproduce first, code change later? 15:52:33 yeah 15:52:56 mm... 
where is the buffer freed in the getprotobyname_r path? 15:52:57 There's a chance that it's the sort of bug that goes away if you try to look at it too closely. 15:55:12 I can see the out of memory resulting in a null pointer that's written to by getprotobyname_r 16:00:38 yeah, it looks like it leaks 16:01:18 http://paste.lisp.org/+2P4Q/1 for my annotations 16:01:45 easy fix for milanj: don't pass :protocol 16:05:21 buflen should be an in/out pointer, like fortran library do. 16:05:24 oh well. 16:10:08 sbcl.core? 16:10:11 seriously? 16:10:46 Seriously. 16:10:57 *nyef* blames the previous administration. 16:11:09 thought it was odd a core file was in the bin dir as I had not run it as root 16:11:30 We call our sbcl cores ".dxl", by happy accident. :) 16:11:45 that reminds me of the story of the person who was carefully constructing the core of their business case in the appropriately-named file 16:12:05 ok get to watch it spend another day compiling on this slow hardware 16:12:20 ... day? 16:13:02 recompiling *sbcl*? 16:13:09 it's hotpatchable. 16:13:27 oh wait, another person 16:13:52 That's a point. If you have a build directory, you could plausibly pick up from make-genesis-2, which is the tail-end of the last host phase. 16:14:30 or slam.sh which is already much faster 16:15:04 Right, slam wouldn't be much slower than picking up from make-genesis-2 if you have an after-xc core. 16:15:12 although all that is too late if jaimef has restarted 16:15:16 If you don't have an after-xc core, you'll need to go the genesis-2 route. 16:15:51 but the last time it took a day to compile sbcl for me was in 2003 on a then-ancient HPPA 1.x machine 16:15:57 churib [~churib@95.156.194.105] has joined #sbcl 16:16:37 so, I look forward to hearing just what exotic hardware is being used 16:17:23 Mmm. My slowest build environment is a dual-core 800MHz G4. Takes a couple hours, maybe. 
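The `:protocol` workaround mentioned above might look like the sketch below. It assumes, going by the advice in the log, that leaving `:protocol` out of the socket's initargs means the leaking `get-protocol-by-name` path is never entered, and that the operating system then picks the default protocol for a stream socket. The function name `make-plain-tcp-socket` is made up for the example.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sb-bsd-sockets))

;; Hypothetical helper: create an IPv4 stream socket without naming a
;; protocol, so no protocol-name lookup is needed (assumption, see above).
(defun make-plain-tcp-socket ()
  (make-instance 'sb-bsd-sockets:inet-socket :type :stream))
```

This only sidesteps the leak in application code; the real fix is still the patch to the lookup itself.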
16:19:41 whether it is at the root of milanj's issue or not, i have a fix for the memory leak going in asap 16:22:25 nikodemus_, I can patch on one of servers and let it work a bit 16:23:20 no issues in last 15 minutes on instances i got it previously 16:25:18 http://paste.lisp.org/display/125882#2 # hotpatch, but i'd really like to see it reproduced before you patch if possible 16:27:45 nyef: i think thread-waiting-for might be needing barriers as well 16:28:00 i'll push a patch to github for you to test soonish 16:29:17 s/test/review and maybe test/ 16:31:36 Cool, thanks. 16:32:08 nikodemus_: the *test* should be barriered up 16:32:26 (in wait-for) 16:32:46 pkhuong: and a :write barrier on the other side 16:34:22 at least my understanding is that a standalone :read barrier doesn't do much good unless there's a corresponding :write barrier? 16:34:46 CAS is an implicit barrier 16:35:32 is a loop around non-interruption-safe code making it uninterruptible? 16:37:16 attila_lendvai [~attila_le@87.247.10.114] has joined #sbcl 16:37:16 -!- attila_lendvai [~attila_le@87.247.10.114] has quit [Changing host] 16:37:16 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl 16:38:49 homie: i don't understand the question 16:39:54 well i seem to be unable to state the question better ..... 16:42:43 to be specific, i have no idea what you mean by non-interruption-safe code 16:43:07 asynch-interrupt-unsafe code? 16:43:17 asynch-interrupt-safe code? 16:43:22 code inside WITHOUT-INTERRUPTS? 16:43:31 code inside WITH-INTERRUPTS? 16:43:36 something different? 16:45:47 code inside without-interrupts, but the code itself is non interrupt-safe 16:46:17 like cond-wait for example 16:47:54 You mean calling a synchronization function with interrupts locked out? 
16:49:34 tsuru` [~charlie@adsl-74-179-25-191.bna.bellsouth.net] has joined #sbcl 16:52:07 ok (loop (without-interrupts ...)) will be interruptible on each exit/entry to the WITHOUT-INTERRUPTS, but not inside it. if an interrupt arrives while inside the WITHOUT-INTERRUPTS, it will be handled when WITHOUT-INTERRUPTS is exited 16:52:38 whereas (without-interrupts (loop ...)) will not be 16:52:59 assuming there's no WITH-LOCAL-INTERRUPTS, ALLOW-WITH-INTERRUPTS/WITH-INTERRUPTS involved 16:53:22 does that answer your question? 16:55:18 yes, thank you 16:55:39 waoh 16:56:24 ok so (without-interrupts (loop....(cond-wait....))) will not be interruptible....ok 16:58:57 Was it that without-gcing implied without-interrupts, or without-interrupts implied without-gcing? 17:00:08 without-gcing implies without-interrupts 17:00:11 iirc 17:00:21 the other, definitely not 17:18:39 nikodemus_, http://paste.lisp.org/display/125882#3 17:20:25 milanj: is the process still up? 17:20:46 yes 17:24:58 can you compare the RSS (via eg top) to a similar process in a healthy state? 17:25:57 borkman [~user@S0106001111de1fc8.cg.shawcable.net] has joined #sbcl 17:29:41 looks like process ate all of memory 17:34:49 ok. then it very likely current git HEAD -- or the hotpatch i pasted -- will fix the issue 17:35:07 s/it/it is/ 17:37:07 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Ping timeout: 245 seconds] 17:38:22 ok, I will try with it, thanks for your time 17:41:10 -!- Blkt [~user@89-96-199-46.ip13.fastwebnet.it] has quit [Remote host closed the connection] 17:58:45 drl [~lat@110.139.229.172] has joined #sbcl 18:07:24 -!- antgreen [user@nat/redhat/x-wizwfeeqgsuvebxq] has quit [Remote host closed the connection] 18:10:11 does the sbcl compiler have a name, like python of cmucl ? 18:10:48 -!- drl [~lat@110.139.229.172] has quit [Quit: Leaving] 18:11:32 homie: python is still used, as far as i know. 18:11:45 oh ok 18:12:55 Yeah, we haven't renamed the compiler. 
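The two shapes contrasted in the WITHOUT-INTERRUPTS answer above can be written out as a pair of small functions. This is only a sketch: the function names are invented for the example, `step` is any function of no arguments, and the behaviour described in the comments is the one stated in the log, assuming no WITH-LOCAL-INTERRUPTS or ALLOW-WITH-INTERRUPTS/WITH-INTERRUPTS is involved.

```lisp
;; Shape 1: the loop is outside. Each STEP runs with interrupts deferred,
;; and a pending interrupt is handled on exit from WITHOUT-INTERRUPTS,
;; that is, between iterations.
(defun run-steps-interruptible-between (n step)
  (loop repeat n
        do (sb-sys:without-interrupts (funcall step))))

;; Shape 2: the loop is inside. Nothing is handled until all N steps are
;; done, so a blocking STEP here can hold off interrupts indefinitely.
(defun run-steps-uninterruptible (n step)
  (sb-sys:without-interrupts
    (loop repeat n do (funcall step))))
```

Both run the same work; the only difference is where a deferred interrupt gets a chance to run, which is why wrapping a waiting loop in the second shape is the risky one.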
18:13:17 It would be interesting to know the major improvements to python (i.e. not ports to architectures, threading, extensions to the standard) such as overhauls to type propagation or various anaylsis passes. 18:13:19 ... if I ever do a re-implement of the compiler, though, it'll probably be called "anaconda". 18:13:36 hmm, i'd have proposed alligator! 18:13:37 10 years results in a lot of churn, I would think. 18:13:38 lol 18:15:14 there hasn't been anything massive. the biggest thing is alexey's split of continuation into ctran and lvar components way back when 18:17:02 ... Which reminds me, I think we may want to merge ctran and lvar together... d-: 18:17:40 Hahaha. 18:18:23 (Or, more accurately, if we have an LVAR for "control dependency", we should be able to dispense with intra-block CTRANs, yielding a DAG representation of code within a block.) 18:19:20 who does use a body go without tagbody ? 18:19:56 ? 18:20:38 i got warnings in the sbcl compile, somewhere, it was unable to find the tag associated to the body, and it seems there's none.... 18:21:01 A few things have implicit tagbody, but if you can narrow things down a little more...? 18:21:37 oh, it's a do loop around a body, and the compiler complains, there's no loop tag 18:21:53 it has a go in the body... 18:25:47 -!- hlavaty [~user@91-65-217-112-dynip.superkabel.de] has quit [Read error: Operation timed out] 18:32:57 nyef: https://github.com/nikodemus/SBCL/tree/nyef-review 18:34:24 Qworkescence [~quad@unaffiliated/quadrescence] has joined #sbcl 18:34:53 debug-impure.lisp has a infinit error error, so an infinite loop..... 18:35:02 is it expected to see bunch of test failures on "threads.pure.lisp" (git head) ? 18:35:39 milanj: not as such. what platform and what build options? 
18:36:54 32bit centos 5.4, just, just :sb-thread in customize-target-features.lisp 18:37:34 (amazon ec2 machine) 18:37:57 milanj: threads are enabled by default on linux these days 18:38:03 but that shouldn't really matter 18:38:25 can you lisppaste the failures? 18:38:31 nikodemus_: That's clearly going to take a full build. s/aught/ought/ in target-thread? 18:38:58 wtf, test infinit-error-protection throws me into debugger in tests, not recoverable... 18:39:19 i typed continue and it opened vim for me..... 18:39:21 http://paste.lisp.org/display/125886 18:39:22 lol 18:39:40 Is there a rough thought as to when threads will be enabled by default in darwin and other POSIX SBCLs? "After the bugs have been shaken out", I guess? 18:39:46 nyef: my grammar isn't sufficient to tell me which is right, so i'll take yous on faith 18:40:05 redline6561: after known issues with stability have been fixed 18:40:22 I'd probably have gone with "should" instead of "ought" or "aught", really. 18:40:54 Got it. Also, a curiosity from reading the nyef-review commit message: What's the perf. hit like with the addition of barriers for PPC? 18:41:04 milanj: wow 18:41:37 can you run "sh run-tests.sh threads.pure.lisp --break-on-failure" and paste from the part where it first breaks 18:42:12 redline6561: Perf hit on which platforms? 18:42:16 for x86oids, zero. read and write barriers are nops on x86oids 18:42:25 Really? Interesting. 18:42:29 UNTRUE. 18:42:46 At least, I'm fairly sure that's untrue. 18:42:55 nikodemus_, http://paste.lisp.org/display/125886#1 18:43:25 nyef: I'm on x86 but I figured there would be hits on the same order of magnitude for both archs. 18:43:29 nyef: in x86-64/system.lisp the vops have no bodies 18:43:30 And was curious about both. 18:43:38 Ah, okay, read and write are NOPs. 18:43:49 It's the full memory barrier that isn't. 18:43:54 only :memory does something 18:44:06 I stand (well, sit) corrected. 18:45:08 ok up until make-genesis-2 there seems no bugs.... 
18:45:30 And on PPC, the memory, read, and write barriers all emit a SYNC instruction. 18:45:52 Because the PPC architecture is a bad fit to the barrier semantics we use. 18:46:27 (We specifically assume the alpha memory model, as it's the nastiest one, and is what the linux kernel does.) 18:46:48 Ah yes. I remember reading a little about that somewhere. LWN probably. 18:47:37 ok some inlinings were not possible.... 18:48:28 milanj: http://paste.lisp.org/display/125886#2 # put this into threads.pure.lisp and rerun with --break-on-failures 18:51:42 milanj: if that's a small amazon instance, some failures are probably to be expected because it will likely croak before conceeding to spawn the ungodly amounts of threads some of the tests spawn... but those are really odd ones to break, so something else is going on 18:53:16 http://paste.lisp.org/display/125886#3 18:53:18 yes, it's small 18:53:48 i can test it on large if it makes any change .. 18:53:50 can you add *features* there? 18:54:09 just to make sure... 18:54:46 (you can run the sbcl in the toplevel dir with "sh run-sbcl.sh") 18:56:01 http://paste.lisp.org/display/125886#4 19:00:14 that looks quite sane 19:02:55 http://paste.lisp.org/display/125886#5 19:03:40 arg, no, wrong testcase 19:06:38 http://paste.lisp.org/display/125886#6 19:07:00 if that returns :UNWIND, something is badly wrong with your sbcl 19:07:01 is deleting unreachable code notes ok in build ? 19:07:07 yes 19:07:18 ok 19:09:09 milanj: i need to head home, but a build where you get :UNWIND there has serious problems -- possibly due to a broken libc or kernel 19:09:20 it return :slept 19:09:29 ok, that's good 19:09:50 anyway, I will try to build it on proper box 19:10:25 so it's /plausible/ that it's just futexes that are more likely than typical to give a bogus wakeup, which is strange but less serious than signals going through when they're masked... 19:13:42 How likely is it that we depend on getting an occasional spurious wakeup on a futex? 
19:21:46 i don't think we do, but who knows? 19:22:15 easy enough to test, i guess 19:22:41 but now i /really/ need to run... 19:22:56 Fair enough. Enjoy your commute. 19:30:47 btw. nikodemus_, if you are still there, judging by top (lookin at old and patched sbcl), that get-protocol-by-name patch does fix things 19:31:47 -!- nikodemus_ [~nikodemus@dsl-hkibrasgw4-fe5bdf00-15.dhcp.inet.fi] has quit [Ping timeout: 252 seconds] 19:34:58 nikodemus [~Nikodemus@GGZYYYMMCCXV.gprs.sl-laajakaista.fi] has joined #sbcl 19:34:58 -!- ChanServ has set mode +o nikodemus 19:35:46 for which parts ? 19:35:55 network ? 19:36:00 sockets ? 19:37:30 this one: http://paste.lisp.org/display/125882 19:38:15 loading restas results in sbcl hanging in select 19:40:15 jaimef: uniterruptible? 19:40:38 yeah 19:40:39 uninterruptible, even 19:40:46 SIGINFO is all that replies 19:42:41 jaimef: someone has been naughty and has frobbed *on-dangerous-i-forget-the-exact-name* 19:42:59 set it to :error and you'll get a backtrace right before that fatal select 19:44:09 hmm 1.0.51. let me upgrade to 53 and see if it helps 20:11:07 gabnet [~gabnet@245.23.67.86.rev.sfr.net] has joined #sbcl 20:29:06 -!- nikodemus [~Nikodemus@GGZYYYMMCCXV.gprs.sl-laajakaista.fi] has quit [Quit: Leaving] 20:29:23 nikodemus [~Nikodemus@cs181063174.pp.htv.fi] has joined #sbcl 20:29:23 -!- ChanServ has set mode +o nikodemus 20:53:01 prxq [~mommer@mnhm-5f75da7a.pool.mediaWays.net] has joined #sbcl 21:09:55 -!- nyef [~nyef@c-174-63-105-188.hsd1.ma.comcast.net] has quit [Quit: G'night all.] 
21:29:57 -!- gabnet [~gabnet@245.23.67.86.rev.sfr.net] has quit [Quit: Quitte] 21:31:31 udzinari [~user@ip-89-102-12-6.net.upcbroadband.cz] has joined #sbcl 21:32:23 -!- nikodemus [~Nikodemus@cs181063174.pp.htv.fi] has quit [Quit: Leaving] 21:42:12 *akovalenko* would like to have "during-xc-core" (so when xc host runs with disabled debugger, it would dump before exit on errors, and set up a toplevel function to go on from the last compiled file.. maybe reenabling debugger in that case) 22:10:38 -!- udzinari [~user@ip-89-102-12-6.net.upcbroadband.cz] has quit [Remote host closed the connection] 22:45:23 -!- LiamH [~none@pdp8.nrl.navy.mil] has quit [Quit: Leaving.] 23:02:39 -!- dsp_ [~tt@lebesgue.cowpig.ca] has quit [Read error: Operation timed out] 23:03:54 dsp_ [~tt@lebesgue.cowpig.ca] has joined #sbcl 23:20:46 -!- prxq [~mommer@mnhm-5f75da7a.pool.mediaWays.net] has quit [Quit: Leaving] 23:24:10 -!- Kryztof [~user@81.174.155.115] has quit [Ping timeout: 260 seconds] 23:41:49 -!- Qworkescence [~quad@unaffiliated/quadrescence] has quit [Quit: Leaving] 23:42:04 -!- milanj [~milanj_@79-101-181-128.dynamic.isp.telekom.rs] has quit [Quit: Leaving]