00:18:46 -!- rpg [~rpg@216.243.156.16.real-time.com] has quit [Quit: rpg]
00:51:25 -!- LiamH [~none@pdp8.nrl.navy.mil] has quit [Ping timeout: 240 seconds]
00:51:40 drdo [~drdo@85.207.54.77.rev.vodafone.pt] has joined #sbcl
02:37:23 drdo` [~drdo@85.207.54.77.rev.vodafone.pt] has joined #sbcl
02:38:54 -!- drdo [~drdo@85.207.54.77.rev.vodafone.pt] has quit [Ping timeout: 244 seconds]
03:16:54 attila_lendvai [~attila_le@87.247.13.189] has joined #sbcl
03:16:54 -!- attila_lendvai [~attila_le@87.247.13.189] has quit [Changing host]
03:16:54 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl
03:42:16 -!- tsuru`` [~charlie@adsl-74-179-25-191.bna.bellsouth.net] has quit [Ping timeout: 240 seconds]
03:59:37 Ober [jaimef@dns.mauthesis.com] has joined #sbcl
04:21:30 tcr [~tcr@95-88-46-7-dynip.superkabel.de] has joined #sbcl
04:40:51 -!- drdo` is now known as drdo
04:46:17 akovalen` [~anton@95.72.168.38] has joined #sbcl
04:47:33 -!- akovalenko [~anton@95.72.173.229] has quit [Ping timeout: 256 seconds]
05:06:15 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Ping timeout: 256 seconds]
05:09:31 attila_lendvai [~attila_le@87.247.13.189] has joined #sbcl
05:09:31 -!- attila_lendvai [~attila_le@87.247.13.189] has quit [Changing host]
05:09:31 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl
05:30:16 -!- tcr [~tcr@95-88-46-7-dynip.superkabel.de] has quit [Quit: Leaving.]
05:37:39 -!- akovalen` is now known as akovalenko
05:42:55 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Ping timeout: 252 seconds]
05:57:50 attila_lendvai [~attila_le@87.247.39.4] has joined #sbcl
05:57:50 -!- attila_lendvai [~attila_le@87.247.39.4] has quit [Changing host]
05:57:50 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl
06:07:26 -!- Ober [jaimef@dns.mauthesis.com] has quit [Ping timeout: 276 seconds]
06:28:20 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Read error: Operation timed out]
06:59:38 attila_lendvai [~attila_le@87.247.62.1] has joined #sbcl
06:59:38 -!- attila_lendvai [~attila_le@87.247.62.1] has quit [Changing host]
06:59:38 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl
07:22:57 -!- pchrist_ [~spirit@gentoo/developer/pchrist] has quit [Quit: leaving]
07:23:29 pchrist [~spirit@gentoo/developer/pchrist] has joined #sbcl
07:28:17 jaimef [jaimef@dns.mauthesis.com] has joined #sbcl
07:37:08 -!- Phoodus [~foo@68.107.217.139] has quit [Ping timeout: 276 seconds]
08:00:13 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Ping timeout: 240 seconds]
08:17:12 attila_lendvai [~attila_le@87.247.50.30] has joined #sbcl
08:17:12 -!- attila_lendvai [~attila_le@87.247.50.30] has quit [Changing host]
08:17:12 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl
08:30:24 Blkt [~user@89-96-199-46.ip13.fastwebnet.it] has joined #sbcl
08:36:30 <Blkt> good morning everyone
08:43:04 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Ping timeout: 240 seconds]
08:45:43 zyg [57e37c83@gateway/web/freenode/ip.87.227.124.131] has joined #sbcl
08:47:54 <kanru> hi, does anyone know why many math ops in irrat.lisp coerce their return type to single-float?
08:48:05 <kanru> even the input is bignum
08:48:21 <zyg> Good morning. I have a function at http://paste.lisp.org/+2P4J which I believe should be tail-optimized, but it seems not to be.
08:49:49 <Kryztof> try optimizing for space?
08:49:55 <Kryztof> why do you believe that it should be tail-optimized?
08:50:24 <zyg> kryztof: I always assume that whenever it is possible and "clean"
08:50:38 <Kryztof> ok, you need to fix that assumption
08:50:51 <Kryztof> Common Lisp does not mandate tail call merging, ever
08:51:42 <Kryztof> and SBCL only does it when the expressed desires of the user favour space or speed over debug
08:52:16 <Kryztof> kanru: because that's what the Common Lisp standard says that Lisp implementations must do
08:52:30 <Kryztof> if the inputs to irrational functions are all rational, then the answer is returned as a single float
08:54:14 <zyg> kryztof: I thought modern CL compilers had that convention. For example in sbcl (defun foo () (foo)) seems to survive.
08:55:47 <kanru> Kryztof: what I read from sqrt description is "If NUMBER is a positive rational, it is implementation-dependent
08:55:49 <kanru> whether ROOT is a rational or a float."
08:56:14 <kanru> so a "float" implies single-float?
08:57:12 <zyg> a quick question: can this form (proclaim '(optimize (safety 3) (debug 3) (speed 0))) affect files other than the one it is put in?
08:57:14 <Kryztof> kanru: clhs 12.1.3.3
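A minimal illustration of the rule Kryztof points to (CLHS 12.1.3.3, the rule of float substitutability), assuming SBCL with default settings: rational arguments, bignums included, to irrational functions give SINGLE-FLOAT results, so a caller who wants double precision converts the argument first. PRECISE-LOG is a hypothetical helper, not part of SBCL.

```lisp
;;; Rational arguments (fixnum or bignum) to irrational functions yield
;;; SINGLE-FLOATs; a DOUBLE-FLOAT argument yields a DOUBLE-FLOAT.
(sqrt 2)      ; => 1.4142135
(sqrt 2d0)    ; => 1.4142135623730951d0

(defun precise-log (n)
  "Natural log of the integer N, computed in double precision.
Hypothetical helper; N must not exceed MOST-POSITIVE-DOUBLE-FLOAT."
  ;; Converting first means float contagion picks DOUBLE-FLOAT.
  (log (coerce n 'double-float)))

(type-of (log (expt 10 50)))          ; SINGLE-FLOAT, even for a bignum
(type-of (precise-log (expt 10 50)))  ; DOUBLE-FLOAT
```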
08:57:21 <Kryztof> zyg: yes
08:57:52 <zyg> kryztof: thanks!
09:00:14 <zyg> That form is what disables the tail-call optimization (not sure I'm using the right term here). At least (defun foo () (foo)) will now explode and I'm left in ldb.
09:01:26 <Kryztof> progress! :-)
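A sketch of how zyg could keep a global debugging policy and still get tail-call merging in one function, assuming SBCL: per Kryztof, SBCL merges tail calls only when the policy does not favour DEBUG over SPEED and SPACE, and a local OPTIMIZE declaration overrides the global proclamation for that function only. COUNT-DOWN is a hypothetical example function.

```lisp
;;; Global policy like zyg's: DEBUG 3 keeps every tail frame.
(proclaim '(optimize (safety 3) (debug 3) (speed 0)))

(defun count-down (n)
  ;; Local override: DEBUG no longer outweighs SPEED/SPACE here, so the
  ;; self call in tail position below can be merged.
  (declare (optimize (debug 1) (speed 1))
           (type unsigned-byte n))
  (if (zerop n)
      :done
      (count-down (1- n))))   ; tail call

(count-down 1000000)          ; => :DONE, without exhausting the control stack
```

Without the local declaration, the same call would be expected to exhaust the control stack, as zyg saw with (defun foo () (foo)).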
09:01:52 <kanru> Kryztof: thanks
09:25:59 attila_lendvai [~attila_le@87.247.3.176] has joined #sbcl
09:25:59 -!- attila_lendvai [~attila_le@87.247.3.176] has quit [Changing host]
09:25:59 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl
09:32:11 Phoodus [~foo@68.107.217.139] has joined #sbcl
09:43:17 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Ping timeout: 252 seconds]
10:32:29 attila_lendvai [~attila_le@87.247.35.97] has joined #sbcl
10:32:29 -!- attila_lendvai [~attila_le@87.247.35.97] has quit [Changing host]
10:32:29 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl
10:42:04 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Ping timeout: 258 seconds]
10:49:41 drl [~lat@110.139.229.172] has joined #sbcl
11:28:05 attila_lendvai [~attila_le@87.247.61.117] has joined #sbcl
11:28:05 -!- attila_lendvai [~attila_le@87.247.61.117] has quit [Changing host]
11:28:05 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl
11:32:57 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Ping timeout: 245 seconds]
12:04:11 nikodemus [~nikodemus@cs181063174.pp.htv.fi] has joined #sbcl
12:04:11 -!- ChanServ has set mode +o nikodemus
12:21:23 -!- nikodemus [~nikodemus@cs181063174.pp.htv.fi] has quit [*.net *.split]
12:21:23 -!- drl [~lat@110.139.229.172] has quit [*.net *.split]
12:23:59 nikodemus [~nikodemus@cs181063174.pp.htv.fi] has joined #sbcl
12:23:59 drl [~lat@110.139.229.172] has joined #sbcl
12:23:59 -!- niven.freenode.net has set mode +o nikodemus
12:27:30 -!- drl [~lat@110.139.229.172] has quit [Quit: Leaving]
12:38:28 -!- nikodemus [~nikodemus@cs181063174.pp.htv.fi] has quit [Quit: This computer has gone to sleep]
12:47:11 -!- Phoodus [~foo@68.107.217.139] has quit [Ping timeout: 276 seconds]
13:02:39 nikodemus_ [~nikodemus@dsl-hkibrasgw4-fe5bdf00-15.dhcp.inet.fi] has joined #sbcl
13:19:13 LiamH [~none@pdp8.nrl.navy.mil] has joined #sbcl
13:44:14 -!- jaimef [jaimef@dns.mauthesis.com] has quit [*.net *.split]
13:44:16 -!- sbryant [~freenode@ghanima.slavasaur.com] has quit [*.net *.split]
13:45:04 jaimef [jaimef@dns.mauthesis.com] has joined #sbcl
13:48:24 sbryant [~freenode@ghanima.slavasaur.com] has joined #sbcl
13:57:54 -!- zyg [57e37c83@gateway/web/freenode/ip.87.227.124.131] has quit [Ping timeout: 265 seconds]
14:05:44 nyef [~nyef@c-174-63-105-188.hsd1.ma.comcast.net] has joined #sbcl
14:05:54 <nyef> G'morning all.
14:06:49 <Kryztof> yo
14:07:36 <nikodemus_> o/
14:07:55 <nyef> Hello nikodemus_.
14:07:56 <nikodemus_> nyef: what's you verdict re. symbol-value-in-thread.3?
14:08:03 <nikodemus_> your, even
14:08:19 <nyef> Several missing read barriers, one missing CAS lock, and it STILL doesn't work.
14:08:28 <nikodemus_> ouch
14:08:39 <nikodemus_> where's the missing CAS lock?
14:09:04 <nyef> And MAKE-LISP-OBJ isn't even remotely thread-safe (managed to get a GC fault once), and I've had a lockup in deadlock-detection 1.
14:09:16 <nyef> Umm... Around one of the waitqueue functions.
14:09:37 <nyef> It's only protected by the mutex, and later in the same function is the only hit to a waitqueue function NOT protected by the same mutex.
14:10:17 <nikodemus_> DOH. i see it
14:11:08 <nyef> It helps, but doesn't fix entirely.
14:11:49 <nyef> I don't think this test case is really stressing the SVIT function, it's stressing GC, MAKE-THREAD, JOIN-THREAD, and SEMAPHOREs.
14:13:18 <nikodemus_> yeah
14:13:42 <nyef> Is THREAD-YIELD supposed to act as a read barrier?
14:13:53 <nyef> (guts of with-cas-lock.)
14:14:22 <nyef> CAS in %%wait-for-mutex should have a read barrier.
14:14:42 <nyef> WAKEUP in condition-wait should have a read barrier.
14:14:47 <nikodemus_> what? isn't CAS an implicit barrier?
14:15:03 <nyef> You're not CASing each time through the loop.
14:15:17 <nyef> You're doing some funky thing where you only CAS if you have reason to expect it to succeed.
14:15:22 <nikodemus_> no, but we're never leaving the loop without CAS succeeding
14:15:41 <nyef> And without a read barrier, you have to wait for a random interrupt to force the read barrier.
14:16:35 <nikodemus_> huh
14:16:36 <nyef> On the whole, I'm not convinced that the extra logic helps, given that it needs a read barrier or is oblivious to state change until it takes an interrupt.
14:17:06 <pkhuong> nyef: since a successful CAS is a barrier, is a read barrier needed?
14:17:31 <pkhuong> compare-and-compare-and-swap is a classic way to do this, at least on x86oid.
14:17:34 <nyef> pkhuong: Yes, because it does a read to decide if it wants to CAS.
14:17:50 <nikodemus_> nyef: do you mean it could spin indefinitely?
14:18:27 <nyef> nikodemus_: No, because kernel interrupt handling can reasonably be expected to supply a barrier at unpredictable times.
14:19:45 <nikodemus_> ok. i'll simplify it to use just CAS without the volatile pre-read for now. (the volatile read is really a leftover/reflex from when i had it spin around spin-loop-hint without yielding)
14:20:17 <nyef> Still, all that, and things still don't quite work.
14:20:29 <pkhuong> nikodemus_: with ll/cc-based CAS, I don't know that it's useful anyway.
14:20:33 <nikodemus_> how about with futexes?
14:20:39 <pkhuong> *LL/SC
14:20:54 <nyef> With futexes it runs through no problem.
14:20:58 <nikodemus_> ok
14:21:28 <pkhuong> we go for a compare first to avoid taking exclusive access on cache lines/bus when it's probably not going to work. I don't think LL has that effect?
14:21:51 <nyef> pkhuong: I don't believe it does, no.
14:22:27 <nikodemus_> right -- and i don't think the pre-read helps measurably when failure is going to be followed by a context-switch anyways
14:22:35 <pkhuong> k.
14:22:41 <nyef> It's a per-core flag saying "if this cache line is seen written on the bus, don't overwrite it."
14:23:54 <nyef> So it'd probably have to hit the bus for the read, and only for the write if it is uncontested.
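A minimal sketch of the simplification nikodemus settles on, assuming an SBCL built with :SB-THREAD: a spin lock that does a CAS on every pass through the loop, with no plain pre-read whose value could go stale without a read barrier. This is not SBCL's WITH-CAS-LOCK; the lock representation (the CAR of a cons) and all function names are hypothetical.

```lisp
(defun make-spin-lock ()
  (list nil))            ; CAR is NIL when free, else the owning thread

(defun acquire-spin-lock (lock)
  ;; CAS every iteration: it always operates on the current value and is
  ;; itself a barrier, so a release by another thread cannot be missed.
  (loop until (null (sb-ext:compare-and-swap
                     (car lock) nil sb-thread:*current-thread*))
        do (sb-thread:thread-yield)))

(defun release-spin-lock (lock)
  ;; Releasing with CAS too, so writes made while holding the lock are
  ;; visible before the lock reads as free.
  (sb-ext:compare-and-swap (car lock) sb-thread:*current-thread* nil))

(defun spin-lock-demo (&key (threads 4) (iterations 10000))
  "Increment a shared counter from several threads under the spin lock."
  (let ((lock (make-spin-lock))
        (counter (list 0)))
    (mapc #'sb-thread:join-thread
          (loop repeat threads
                collect (sb-thread:make-thread
                         (lambda ()
                           (dotimes (i iterations)
                             (acquire-spin-lock lock)
                             (incf (car counter))
                             (release-spin-lock lock))))))
    (car counter)))

(spin-lock-demo)   ; => 40000
```

pkhuong's compare-then-CAS variant would read (car lock) first and only attempt the CAS when it looks free; as nyef notes, that pre-read then needs its own read barrier on weakly ordered machines.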
14:24:51 <nyef> What bugs me is that I've been unable to find any other plausible causes for lockups.
14:26:15 <nyef> Ah, good. SB!KERNEL is a private package. We can declare make-lisp-obj to be horribly unsafe on pointer objects if the GC is enabled and not have to worry about outside uses.
14:27:22 sbryant- [~freenode@ghanima.slavasaur.com] has joined #sbcl
14:33:41 -!- sbryant [~freenode@ghanima.slavasaur.com] has quit [*.net *.split]
14:37:38 <nikodemus_> nyef: given threads on PPC, we should have write-barriers in structure constructors and clos object initialization, no?
14:39:00 <nyef> ... why?
14:39:36 <nyef> If you're going to marshal them to another thread, you have to have a barrier anyway, surely?
14:41:40 <nikodemus_> i think "initialize object when seen by another thread is completely initialized" is something of a bare minimum of memory model
14:41:54 <nikodemus_> make that "an initialized object..."
14:43:26 <pkhuong> nikodemus_: is it GC and type safe?
14:43:34 <nyef> I think that any inter-thread marshalling of objects should already involve a barrier.
14:43:36 <pkhuong> I don't care about the rest.
14:43:59 <nyef> And since the GC does a full barrier for every thread, that much works.
14:44:21 <nikodemus_> i think this is perfectly idiomatic, and we even do it ourselves: (unless (cached-bar foo) (setf (cached-bar foo) (make-bar ...)) ; assume thread safe but racy -- we might waste an already created BAR, but other threads will always see a completely initialized one
14:45:01 <pkhuong> nyef: and type checks? Passing an array to another thread and managing to treat it as a CONS at regular safety levels sounds lossy to me.
14:46:49 <nyef> I guess the least we could do is add a write barrier to the end of p-a.
14:47:04 <nyef> At that point, the header is guaranteed to be written.
14:49:19 -!- sdemarre [~serge@91.176.187.200] has quit [Ping timeout: 248 seconds]
14:49:21 <nyef> nikodemus_: I think the correct answer there is to (prog1 (make-bar ...) (barrier (:write))).
14:50:27 <nikodemus_> nyef: i seriously think that's too onerous -- and there is already plenty of code that assumes that other threads cannot see partially initialized structures or similar
14:51:34 <nikodemus_> how expensive is a write barrier on non-x86oids?
14:52:15 slyrus [~chatzilla@adsl-99-35-53-209.dsl.pltn13.sbcglobal.net] has joined #sbcl
14:52:21 <nyef> Not sure, TBH.
14:52:32 <pkhuong> pretty sure I've seen barrier before publication on my end.
14:52:41 <nyef> On PPC it's a SYNC instruction.
14:52:51 <nyef> And our CAS implementation has to do both SYNC and ISYNC.
14:53:03 <pkhuong> there isn't really any difference between initialisation and mutation of structs.
14:53:56 -!- jaimef [jaimef@dns.mauthesis.com] has quit [Ping timeout: 276 seconds]
14:54:44 <nikodemus_> now that we're not just x86oids threads, we could really use an explicit memory model
14:54:52 jaimef [jaimef@dns.mauthesis.com] has joined #sbcl
14:55:10 <nikodemus_> can we lift the java one? i haven't read it, but apparently it isn't terrible
14:55:15 <pkhuong> nooo.
14:55:25 <pkhuong> it's basically impossible to implement outside x86
14:55:30 <pkhuong> much too stringent.
14:56:04 <nikodemus_> ok
14:57:09 <nikodemus_> does C or C++[0x] have one?
14:57:19 <pkhuong> 0x has something.
14:57:53 -!- whoops [u549@gateway/web/irccloud.com/x-imcngkgbzbtdsblr] has quit [Remote host closed the connection]
14:58:31 <nikodemus_> gcc.gnu.org/wiki/MemoryModel
15:01:20 <pkhuong> this seems to be about type-based alias analysis.
15:01:33 <nikodemus_> yeah, not what we're afternyef: on PPC, does CAS imply a w
15:01:36 <nikodemus_> aagh
15:01:52 <nikodemus_> nyef: on PPC, does CAS imply a write-barrier?
15:02:03 <nyef> Yes. The documentation even mentions this!
15:02:19 <nikodemus_> oh, good :)
15:04:26 <nyef> Actually, ISTR writing the documentation to say "these operations are all write barriers", and listed off a goodly number of thread functions.
15:05:13 <nyef> Heh. That lazy-cache trick? Use CAS to set the cache.
15:05:54 <pkhuong> barrier on publish!
15:06:02 <pkhuong> (and barrier on privatize)
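A sketch of nyef's suggestion to fill the lazy cache with CAS, assuming SBCL: SB-EXT:COMPARE-AND-SWAP works on DEFSTRUCT slot accessors whose declared type is T (the default) or FIXNUM, publishes the new object through a barrier, and makes racing threads agree on one winner. The structures, the FOO-BAR function and the constant 42 are hypothetical stand-ins for nikodemus's CACHED-BAR example; the fast-path read is left without a read barrier here.

```lisp
(defstruct bar
  (value 0))

(defstruct foo
  (cached-bar nil))

(defun foo-bar (foo)
  (or (foo-cached-bar foo)
      (let ((new (make-bar :value 42)))   ; stand-in for an expensive computation
        ;; CAS returns the old value: NIL means NEW was installed; anything
        ;; else is another thread's BAR, which we return instead of ours.
        (or (sb-ext:compare-and-swap (foo-cached-bar foo) nil new)
            new))))
```

Usage: (let ((f (make-foo))) (eq (foo-bar f) (foo-bar f))) returns T, since the second call finds the cached BAR.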
15:07:20 <nikodemus_> nyef: the thing with the lazy cache trick is that portable code wants to do that
15:07:27 whoops [u549@gateway/web/irccloud.com/x-pcuyuxfhpkfegtig] has joined #sbcl
15:07:35 sdemarre [~serge@91.176.142.225] has joined #sbcl
15:07:38 <pkhuong> nikodemus_: portable code with threads?
15:08:03 <nikodemus_> assuming it works, it is pretty much the only way to write portable code that caches anything in an object that can be seen from multiple threads
15:08:24 <nikodemus_> if it doesn't work... well, then portable code is SOL
15:09:18 <nyef> So... expose CAS from B-T?
15:09:20 <pkhuong> If they want to do it portably, they need to lock.
15:09:23 <nikodemus_> we for example use it to update INFOdb without locking
15:09:30 <pkhuong> If that's too slow, get BX to expose atomic primitives.
15:11:14 <nyef> ... I thought the globaldb was one of the structures that was so badly thread-unsafe that we had to protect it with the big compiler lock?
15:13:21 <nikodemus_> oh, sorry. that rewrite-cache trick was used in hash-caches. i misremembered
15:18:25 Quadrescence_ [~quad@unaffiliated/quadrescence] has joined #sbcl
15:19:10 <nyef> So, how bad would it be if we used a CAS-lock in the allocation sequence? Because that sort of thing plus moving the allocation pointer to a shared variable would allow us to use cheneygc for threads...
15:20:22 -!- Quadrescence [~quad@unaffiliated/quadrescence] has quit [Ping timeout: 258 seconds]
15:20:22 -!- Quadrescence_ is now known as Quadrescence
15:21:21 <pkhuong> it'd scale horribly, but would probably be good enough for a lot of people.
15:21:34 <pkhuong> Not Worse Than The Other Python
15:21:43 <nyef> Heh.
15:22:15 <pkhuong> what's the CAS lock for?
15:22:42 <nyef> Plausibly cheaper than a full lock, and easy enough to write inline in a VOP.
15:23:08 <pkhuong> What's the critical section?
15:23:32 <redline6561> nyef: I assume cheneygc+threads would be a compile-time option if it was done?
15:23:49 milanj [~milanj_@79-101-181-128.dynamic.isp.telekom.rs] has joined #sbcl
15:23:50 <nyef> pkhuong: Access to the allocation pointer.
15:23:59 <nyef> redline6561: Yeah, compile-time option.
15:24:23 <pkhuong> I need caffeine in my bloodstream for this.
15:25:42 <pkhuong> nyef: and we already have signal handlers, so we already handle GCing at random times.
15:25:53 <pkhuong> so we only need to protect against GC in the middle of allocation.
15:26:01 <nyef> p-a, remember?
15:26:16 <pkhuong> right.
15:27:17 <pkhuong> is that active on cheney platforms? What would happen if I had a simultaneous call to GC and allocation?
15:28:11 <nyef> Same as on gencgc, the allocation is in a p-a block, so the stop-for-gc signal gets deferred.
15:28:32 <milanj> hi, can someone take a quick look at http://paste.lisp.org/display/125882
15:28:45 <milanj> i'm getting this on amazon ec2 x86 instance
15:28:56 <milanj> is there any known issue on amazon ec2 machines
15:29:26 <milanj> btw. this is threaded program using zs3 library
15:29:37 <pkhuong> milanj: have you tried upgrading?
15:29:47 <foom> how about s/p-a/PCLSR/ for allocations (at least in the fastpath)?
15:29:48 <milanj> it's sbcl 1.0.52
15:29:52 <milanj> if you meant that
15:29:56 homie [~levgue@xdsl-78-35-130-40.netcologne.de] has joined #sbcl
15:30:07 <pkhuong> ISTR an issue with ec2. We even had a tiny C test case, but I don't know that they did anything with it.
15:30:34 <nyef> I haven't run into any issues with my ec2 instance yet.
15:30:39 <nyef> At least, no sbcl issues.
15:31:54 <milanj> hmm, anyone familiar with ways to overcome this ?
15:32:09 <milanj> i mean, from amazon side
15:33:34 <nyef> Well, you started off with a connection-reset-by-peer. then caught an "unexpected errno 12", then things blew up, right?
15:33:36 <pkhuong> no clue. And I can't find any note on the issue on my end.
15:34:05 <milanj> nyef, looks like
15:34:27 <milanj> pkhuong, I tried google for it, no success
15:35:50 <foom> errno 12 is out of memory...
15:36:17 <nyef> ... do you get that if you run out of FDs?
15:37:18 <pkhuong> nyef: so, if that hack works, it actually wouldn't be *that* hard to have tiny thread-local allocation pools?
15:37:38 <foom> no. errno 24 is too many open files.
15:37:42 <nikodemus_> i've run fine on ec2, both large and small instances -- but it's been a few months
15:39:01 <milanj> nikodemus_, multi-threaded ?
15:39:04 <foom> It looks to me like you actually just ran out of memory inside some bit of sbcl's sockets wrapper (getaddrinfo maybe), and
15:39:18 <nikodemus_> milanj: excessively so
15:39:19 <milanj> if this is connected with threads in the first place
15:39:20 <foom> sbcl didn't catch that properly, and then had a null pointer deref due to the failed allocation
15:39:21 <nyef> pkhuong: Umm... I have no idea.
15:39:27 antgreen [user@nat/redhat/x-yzjqcujqegcmlvcx] has joined #sbcl
15:39:45 <nikodemus_> milanj: the most common causes for memory faults are bugs in foreign code (ie. trying to write to a lisp vector and scribbling past the end, etc), type-errors in unsafe code, and lying to the compiler
15:40:14 <foom> "Memory fault at 0" -> null pointer deref.
15:40:36 <pkhuong> nyef: just treat unused pre-allocated pools as (dead) conses.
15:41:14 <nyef> pkhuong: Right, but there are bound to be further gotchas.
15:41:22 <nyef> It's plausible, at least.
15:41:43 beslyrus [~Brucio-12@adsl-99-35-53-209.dsl.pltn13.sbcglobal.net] has joined #sbcl
15:41:49 <nikodemus_> milanj: threads can easily make such things cause trouble that might be hidden when running single-threaded, for example because a GC just doesn't happen while you have references to corrupted objects
15:42:29 <pkhuong> nyef: and that's how we get to gencgc on x86 (:
15:42:59 <nikodemus_> but yeah, in this case figuring out the first error would be a productive first step
15:45:15 <milanj> I guess it starts with "Error couldn't read from #<SB-SYS:FD-STREAM for "socket 10.6.125.77:47348, peer: 72.21.211.130:80" {C827061}>: Connection reset by peer in thread #<THREAD RUNNING "
15:45:15 <milanj> "
15:46:04 <milanj> I'm not sure if "zs3" is using some foreign code down the way
15:46:11 <nikodemus_> that / the "Unexpected errno"
15:46:12 -!- antgreen [user@nat/redhat/x-yzjqcujqegcmlvcx] has quit [Read error: Connection reset by peer]
15:46:42 <nikodemus_> milanj: can you reproduce this?
15:47:10 <milanj> I'm sure i can, since I got this on 4 machines
15:47:34 <nikodemus_> (that is one of the most important steps towards nailing down an issue like this)
15:47:40 <pkhuong> neither errors are from SBCL itself, right?
15:47:42 <milanj> btw. I'm using a core dumped with save-lisp-and-die, but I guess that doesn't make any difference
15:48:09 antgreen [user@nat/redhat/x-wizwfeeqgsuvebxq] has joined #sbcl
15:49:06 <nikodemus_> the unexpected errno /could/ be from sbcl's get-protocol-by-name
15:50:22 <nikodemus_> milanj: please try to reproduce it. if it turns out you can repeat this in <10 minutes, the debugging approach is going to be pretty different than if it takes hours of running to reproduce
15:51:00 <milanj> it happened after 20-30 minutes, let me try
15:51:08 <milanj> do I need to make some code change to catch it better ?
15:51:25 <nyef> Reproduce first, code change later?
15:52:33 <nikodemus_> yeah
15:52:56 <pkhuong> mm... where is the buffer freed in the getprotobyname_r path?
15:52:57 <nyef> There's a chance that it's the sort of bug that goes away if you try to look at it too closely.
15:55:12 <pkhuong> I can see the out of memory resulting in a null pointer that's written to by getprotobyname_r
16:00:38 <nikodemus_> yeah, it looks like it leaks
16:01:18 <pkhuong>  http://paste.lisp.org/+2P4Q/1 for my annotations
16:01:45 <foom> easy fix for milanj: don't pass :protocol
16:05:21 <pkhuong> buflen should be an in/out pointer, like Fortran libraries do.
16:05:24 <pkhuong> oh well.
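A sketch of foom's workaround for code that creates sockets directly with SB-BSD-SOCKETS. This is an assumption about where the change would go: in milanj's case the socket is created by libraries underneath zs3, so the :PROTOCOL argument would have to be dropped there. Passing :PROTOCOL :TCP makes SB-BSD-SOCKETS look the name up (the getprotobyname_r path pkhuong annotated); omitting it should leave the protocol number at 0, letting the kernel choose TCP for a stream socket. MAKE-TCP-SOCKET is a hypothetical helper.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sb-bsd-sockets))

(defun make-tcp-socket ()
  "Hypothetical helper: an INET stream socket created without :PROTOCOL."
  ;; No :PROTOCOL means no protocol-name lookup; protocol 0 lets the
  ;; kernel pick the default protocol (TCP) for :STREAM sockets.
  (make-instance 'sb-bsd-sockets:inet-socket :type :stream))

(let ((socket (make-tcp-socket)))
  (unwind-protect
       (sb-bsd-sockets:socket-file-descriptor socket)   ; an open fd
    (sb-bsd-sockets:socket-close socket)))
```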
16:10:08 <jaimef> sbcl.core?
16:10:11 <jaimef> seriously?
16:10:46 <nyef> Seriously.
16:10:57 *nyef* blames the previous administration.
16:11:09 <jaimef> thought it was odd a core file was in the bin dir as I had not run it as root
16:11:30 <foom> We call our sbcl cores ".dxl", by happy accident. :)
16:11:45 <Kryztof> that reminds me of the story of the person who was carefully constructing the core of their business case in the appropriately-named file
16:12:05 <jaimef> ok get to watch it spend another day compiling on this slow hardware
16:12:20 <nyef> ... day?
16:13:02 <pkhuong> recompiling *sbcl*?
16:13:09 <pkhuong> it's hotpatchable.
16:13:27 <pkhuong> oh wait, another person
16:13:52 <nyef> That's a point. If you have a build directory, you could plausibly pick up from make-genesis-2, which is the tail-end of the last host phase.
16:14:30 <pkhuong> or slam.sh which is already much faster
16:15:04 <nyef> Right, slam wouldn't be much slower than picking up from make-genesis-2 if you have an after-xc core.
16:15:12 <Kryztof> although all that is too late if jaimef has restarted
16:15:16 <nyef> If you don't have an after-xc core, you'll need to go the genesis-2 route.
16:15:51 <Kryztof> but the last time it took a day to compile sbcl for me was in 2003 on a then-ancient HPPA 1.x machine
16:15:57 churib [~churib@95.156.194.105] has joined #sbcl
16:16:37 <Kryztof> so, I look forward to hearing just what exotic hardware is being used
16:17:23 <nyef> Mmm. My slowest build environment is a dual-core 800MHz G4. Takes a couple hours, maybe.
16:19:41 <nikodemus_> whether it is at the root of milanj's issue or not, i have a fix for the memory leak going in asap
16:22:25 <milanj> nikodemus_, I can patch one of the servers and let it run a bit
16:23:20 <milanj> no issues in last 15 minutes on instances i got it previously
16:25:18 <nikodemus_> http://paste.lisp.org/display/125882#2 # hotpatch, but i'd really like to see it reproduced before you patch if possible
16:27:45 <nikodemus_> nyef: i think thread-waiting-for might be needing barriers as well
16:28:00 <nikodemus_> i'll push a patch to github for you to test soonish
16:29:17 <nikodemus_> s/test/review and maybe test/
16:31:36 <nyef> Cool, thanks.
16:32:08 <pkhuong> nikodemus_: the *test* should be barriered up
16:32:26 <pkhuong> (in wait-for)
16:32:46 <nikodemus_> pkhuong: and a :write barrier on the other side
16:34:22 <nikodemus_> at least my understanding is that a standalone :read barrier doesn't do much good unless there's a corresponding :write barrier?
16:34:46 <pkhuong> CAS is an implicit barrier
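A minimal sketch of the pairing nikodemus describes, assuming SBCL's SB-THREAD:BARRIER macro and a threaded build: the publisher puts a :WRITE barrier between storing the data and setting the flag, and the waiter puts a :READ barrier between seeing the flag and reading the data. The variables and functions are hypothetical; on x86oids :READ and :WRITE barriers compile to nothing, while weakly ordered machines such as PPC need real instructions.

```lisp
(defvar *payload* nil)
(defvar *ready* nil)

(defun publish (value)
  (setf *payload* value)
  ;; The store to *PAYLOAD* must become visible before the store to *READY*.
  (sb-thread:barrier (:write))
  (setf *ready* t))

(defun consume ()
  (loop until *ready* do (sb-thread:thread-yield))
  ;; The read of *PAYLOAD* must not be satisfied before *READY* was seen.
  (sb-thread:barrier (:read))
  *payload*)

(let ((consumer (sb-thread:make-thread #'consume)))
  (publish :hello)
  (sb-thread:join-thread consumer))   ; => :HELLO
```

Where the flag is set with CAS instead, the explicit :WRITE barrier is redundant, which is pkhuong's point that CAS is an implicit barrier.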
16:35:32 <homie> does a loop around non-interrupt-safe code make it uninterruptible?
16:37:16 attila_lendvai [~attila_le@87.247.10.114] has joined #sbcl
16:37:16 -!- attila_lendvai [~attila_le@87.247.10.114] has quit [Changing host]
16:37:16 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl
16:38:49 <nikodemus_> homie: i don't understand the question
16:39:54 <homie> well i seem to be unable to state the question better .....
16:42:43 <nikodemus_> to be specific, i have no idea what you mean by non-interruption-safe code
16:43:07 <nikodemus_> asynch-interrupt-unsafe code?
16:43:17 <nikodemus_> asynch-interrupt-safe code?
16:43:22 <nikodemus_> code inside WITHOUT-INTERRUPTS?
16:43:31 <nikodemus_> code inside WITH-INTERRUPTS?
16:43:36 <nikodemus_> something different?
16:45:47 <homie> code inside without-interrupts, but the code itself is non interrupt-safe
16:46:17 <homie> like cond-wait for example
16:47:54 <nyef> You mean calling a synchronization function with interrupts locked out?
16:49:34 tsuru` [~charlie@adsl-74-179-25-191.bna.bellsouth.net] has joined #sbcl
16:52:07 <nikodemus_> ok (loop (without-interrupts ...)) will be interruptible on each exit/entry to the WITHOUT-INTERRUPTS, but not inside it. if an interrupt arrives while inside the WITHOUT-INTERRUPTS, it will be handled when WITHOUT-INTERRUPTS is exited
16:52:38 <nikodemus_> whereas (without-interrupts (loop ...)) will not be
16:52:59 <nikodemus_> assuming there's no WITH-LOCAL-INTERRUPTS, ALLOW-WITH-INTERRUPTS/WITH-INTERRUPTS involved
16:53:22 <nikodemus_> does that answer your question?
16:55:18 <homie> yes, thank you
16:55:39 <homie> waoh
16:56:24 <homie> ok so (without-interrupts (loop....(cond-wait....))) will not be interruptible....ok
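A small demonstration of nikodemus's point, assuming a threaded SBCL: an interrupt sent with SB-THREAD:INTERRUPT-THREAD while the target is inside SB-SYS:WITHOUT-INTERRUPTS is held until that form exits, so the body always finishes first. DEFERRED-INTERRUPT-ORDER is a hypothetical function; the SLEEP only gives the signal time to arrive while interrupts are disabled, and the result does not depend on it.

```lisp
(defun deferred-interrupt-order ()
  "Return the order in which a WITHOUT-INTERRUPTS body and an interrupt ran."
  (let* ((events '())                       ; only modified in WORKER
         (release nil)                      ; set by the main thread
         (inside (sb-thread:make-semaphore))
         (worker
           (sb-thread:make-thread
            (lambda ()
              (sb-sys:without-interrupts
                (sb-thread:signal-semaphore inside)
                ;; Spin with interrupts disabled until released.
                (loop until release do (sb-thread:thread-yield))
                (push :body-done events))
              ;; A deferred interrupt runs as WITHOUT-INTERRUPTS exits; wait
              ;; here as well in case the signal is still in flight.
              (loop until (member :interrupt events)
                    do (sb-thread:thread-yield))
              (reverse events)))))
    (sb-thread:wait-on-semaphore inside)    ; WORKER is inside the form now
    (sb-thread:interrupt-thread worker (lambda () (push :interrupt events)))
    (sleep 0.1)
    (setf release t)
    (sb-thread:join-thread worker)))

(deferred-interrupt-order)   ; => (:BODY-DONE :INTERRUPT)
```

Moving the WITHOUT-INTERRUPTS inside a loop, as in nikodemus's first form, would instead let a pending interrupt run at the end of each iteration.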
16:58:57 <nyef> Was it that without-gcing implied without-interrupts, or without-interrupts implied without-gcing?
17:00:08 <nikodemus_> without-gcing implies without-interrupts
17:00:11 <nikodemus_> iirc
17:00:21 <nikodemus_> the other, definitely not
17:18:39 <milanj> nikodemus_, http://paste.lisp.org/display/125882#3
17:20:25 <nikodemus_> milanj: is the process still up?
17:20:46 <milanj> yes
17:24:58 <nikodemus_> can you compare the RSS (via eg top) to a similar process in a healthy state?
17:25:57 borkman [~user@S0106001111de1fc8.cg.shawcable.net] has joined #sbcl
17:29:41 <milanj> looks like process ate all of memory
17:34:49 <nikodemus_> ok. then it very likely current git HEAD -- or the hotpatch i pasted -- will fix the issue
17:35:07 <nikodemus_> s/it/it is/
17:37:07 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Ping timeout: 245 seconds]
17:38:22 <milanj> ok, I will try with it, thanks for your time
17:41:10 -!- Blkt [~user@89-96-199-46.ip13.fastwebnet.it] has quit [Remote host closed the connection]
17:58:45 drl [~lat@110.139.229.172] has joined #sbcl
18:07:24 -!- antgreen [user@nat/redhat/x-wizwfeeqgsuvebxq] has quit [Remote host closed the connection]
18:10:11 <homie> does the sbcl compiler have a name, like python of cmucl ?
18:10:48 -!- drl [~lat@110.139.229.172] has quit [Quit: Leaving]
18:11:32 <redline6561> homie: python is still used, as far as i know.
18:11:45 <homie> oh ok
18:12:55 <nyef> Yeah, we haven't renamed the compiler.
18:13:17 <redline6561> It would be interesting to know the major improvements to python (i.e. not ports to architectures, threading, extensions to the standard) such as overhauls to type propagation or various analysis passes.
18:13:19 <nyef> ... if I ever do a re-implement of the compiler, though, it'll probably be called "anaconda".
18:13:36 <homie> hmm, i'd have proposed alligator!
18:13:37 <redline6561> 10 years results in a lot of churn, I would think.
18:13:38 <homie> lol
18:15:14 <nikodemus_> there hasn't been anything massive. the biggest thing is alexey's split of continuation into ctran and lvar components way back when
18:17:02 <nyef> ... Which reminds me, I think we may want to merge ctran and lvar together... d-:
18:17:40 <redline6561> Hahaha.
18:18:23 <nyef> (Or, more accurately, if we have an LVAR for "control dependency", we should be able to dispense with intra-block CTRANs, yielding a DAG representation of code within a block.)
18:19:20 <homie> what uses GO in a body without a TAGBODY?
18:19:56 <nikodemus_> ?
18:20:38 <homie> i got warnings in the sbcl compile, somewhere, it was unable to find the tag associated to the body, and it seems there's none....
18:21:01 <nyef> A few things have implicit tagbody, but if you can narrow things down a little more...?
18:21:37 <homie> oh, it's a do loop around a body, and the compiler complains, there's no loop tag
18:21:53 <homie> it has a go in the body...
18:25:47 -!- hlavaty [~user@91-65-217-112-dynip.superkabel.de] has quit [Read error: Operation timed out]
18:32:57 <nikodemus_> nyef: https://github.com/nikodemus/SBCL/tree/nyef-review
18:34:24 Qworkescence [~quad@unaffiliated/quadrescence] has joined #sbcl
18:34:53 <homie> debug.impure.lisp has an infinite-error error, so an infinite loop.....
18:35:02 <milanj> is it expected to see bunch of test failures on "threads.pure.lisp" (git head) ?
18:35:39 <nikodemus_> milanj: not as such. what platform and what build options?
18:36:54 <milanj> 32-bit CentOS 5.4, just :sb-thread in customize-target-features.lisp
18:37:34 <milanj> (amazon ec2 machine)
18:37:57 <nikodemus_> milanj: threads are enabled by default on linux these days
18:38:03 <nikodemus_> but that shouldn't really matter
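For reference, a minimal customize-target-features.lisp of the kind milanj describes, modelled on the template in SBCL's INSTALL notes (check the INSTALL file of your own checkout). The file contains a single function that receives the default feature list and returns the adjusted one; per nikodemus, :SB-THREAD is already enabled by default on Linux, so this is mainly useful on other platforms.

```lisp
;;; customize-target-features.lisp, in the top-level SBCL source
;;; directory, read when make.sh runs.
(lambda (features)
  (flet ((enable (x) (pushnew x features))
         (disable (x) (setf features (remove x features))))
    (declare (ignorable #'disable))
    ;; PUSHNEW returns the updated list, which becomes the return value.
    (enable :sb-thread)))
```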
18:38:25 <nikodemus_> can you lisppaste the failures?
18:38:31 <nyef> nikodemus_: That's clearly going to take a full build. s/aught/ought/ in target-thread?
18:38:58 <homie> wtf, test infinite-error-protection throws me into the debugger in tests, not recoverable...
18:39:19 <homie> i typed continue and it opened vim for me.....
18:39:21 <milanj> http://paste.lisp.org/display/125886
18:39:22 <homie> lol
18:39:40 <redline6561> Is there a rough thought as to when threads will be enabled by default in darwin and other POSIX SBCLs? "After the bugs have been shaken out", I guess?
18:39:46 <nikodemus_> nyef: my grammar isn't sufficient to tell me which is right, so i'll take yours on faith
18:40:05 <nikodemus_> redline6561: after known issues with stability have been fixed
18:40:22 <nyef> I'd probably have gone with "should" instead of "ought" or "aught", really.
18:40:54 <redline6561> Got it. Also, a curiosity from reading the nyef-review commit message: What's the perf. hit like with the addition of barriers for PPC?
18:41:04 <nikodemus_> milanj: wow
18:41:37 <nikodemus_> can you run "sh run-tests.sh threads.pure.lisp --break-on-failure" and paste from the part where it first breaks
18:42:12 <nyef> redline6561: Perf hit on which platforms?
18:42:16 <nikodemus_> for x86oids, zero. read and write barriers are nops on x86oids
18:42:25 <redline6561> Really? Interesting.
18:42:29 <nyef> UNTRUE.
18:42:46 <nyef> At least, I'm fairly sure that's untrue.
18:42:55 <milanj> nikodemus_, http://paste.lisp.org/display/125886#1
18:43:25 <redline6561> nyef: I'm on x86 but I figured there would be hits on the same order of magnitude for both archs.
18:43:29 <nikodemus_> nyef: in x86-64/system.lisp the vops have no bodies
18:43:30 <redline6561> And was curious about both.
18:43:38 <nyef> Ah, okay, read and write are NOPs.
18:43:49 <nyef> It's the full memory barrier that isn't.
18:43:54 <nikodemus_> only :memory does something
18:44:06 <nyef> I stand (well, sit) corrected.
18:45:08 <homie> ok up until make-genesis-2 there seem to be no bugs....
18:45:30 <nyef> And on PPC, the memory, read, and write barriers all emit a SYNC instruction.
18:45:52 <nyef> Because the PPC architecture is a bad fit to the barrier semantics we use.
18:46:27 <nyef> (We specifically assume the alpha memory model, as it's the nastiest one, and is what the linux kernel does.)
18:46:48 <redline6561> Ah yes. I remember reading a little about that somewhere. LWN probably.
18:47:37 <homie> ok some inlinings were not possible....
18:48:28 <nikodemus_> milanj: http://paste.lisp.org/display/125886#2 # put this into threads.pure.lisp and rerun with --break-on-failure
18:51:42 <nikodemus_> milanj: if that's a small amazon instance, some failures are probably to be expected because it will likely croak before conceding to spawn the ungodly amounts of threads some of the tests spawn... but those are really odd ones to break, so something else is going on
18:53:16 <milanj> http://paste.lisp.org/display/125886#3
18:53:18 <milanj> yes, it's small
18:53:48 <milanj> i can test it on large if it makes any change ..
18:53:50 <nikodemus_> can you add *features* there?
18:54:09 <nikodemus_> just to make sure...
18:54:46 <nikodemus_> (you can run the sbcl in the toplevel dir with "sh run-sbcl.sh")
18:56:01 <milanj> http://paste.lisp.org/display/125886#4
19:00:14 <nikodemus_> that looks quite sane
19:02:55 <nikodemus_> http://paste.lisp.org/display/125886#5
19:03:40 <nikodemus_> arg, no, wrong testcase
19:06:38 <nikodemus_> http://paste.lisp.org/display/125886#6
19:07:00 <nikodemus_> if that returns :UNWIND, something is badly wrong with your sbcl
19:07:01 <homie> is deleting unreachable code notes ok in build ?
19:07:07 <nikodemus_> yes
19:07:18 <homie> ok
19:09:09 <nikodemus_> milanj: i need to head home, but a build where you get :UNWIND there has serious problems -- possibly due to a broken libc or kernel
19:09:20 <milanj> it return :slept
19:09:29 <nikodemus_> ok, that's good
19:09:50 <milanj> anyway, I will try to build it on proper box
19:10:25 <nikodemus_> so it's /plausible/ that it's just futexes that are more likely than typical to give a bogus wakeup, which is strange but less serious than signals going through when they're masked...
19:13:42 <nyef> How likely is it that we depend on getting an occasional spurious wakeup on a futex?
19:21:46 <nikodemus_> i don't think we do, but who knows?
19:22:15 <nikodemus_> easy enough to test, i guess
19:22:41 <nikodemus_> but now i /really/ need to run...
19:22:56 <nyef> Fair enough. Enjoy your commute.
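The spurious-wakeup question above comes down to whether every waiter re-checks its condition after waking up. A minimal sketch of that discipline follows, using Python's threading.Condition as a stand-in for SBCL's futex-based waits. This is not SBCL's code; the Flag class, its poke method and demo are hypothetical names for illustration. The waiter loops on its predicate, so a wakeup that arrives without any state change simply sends it back to sleep.

```python
import threading
import time


class Flag:
    """Hypothetical one-shot flag: waiters treat every wakeup as a hint, never as proof."""

    def __init__(self):
        self._cond = threading.Condition()
        self._set = False

    def set(self):
        with self._cond:
            self._set = True
            self._cond.notify_all()

    def poke(self):
        # Simulate a bogus wakeup: notify waiters without changing the state.
        with self._cond:
            self._cond.notify_all()

    def wait(self, timeout=None):
        """Return True once the flag is set, or False if the timeout expires."""
        deadline = None if timeout is None else time.monotonic() + timeout
        with self._cond:
            # A loop, not an 'if': re-check the predicate after every wakeup.
            while not self._set:
                if deadline is None:
                    self._cond.wait()
                else:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0:
                        return False
                    self._cond.wait(remaining)
            return True


def demo():
    flag = Flag()
    results = []
    waiter = threading.Thread(target=lambda: results.append(flag.wait(timeout=5.0)))
    waiter.start()
    time.sleep(0.05)
    flag.poke()  # spurious wakeup: the waiter must go back to sleep
    time.sleep(0.05)
    still_waiting = waiter.is_alive()
    flag.set()  # the real wakeup
    waiter.join()
    return still_waiting, results[0]
```

With `if` instead of `while`, the poke would let wait() return True before set() is ever called; a loop like this is what makes an occasional bogus wakeup harmless. Python's own Condition.wait_for(predicate, timeout) implements the same loop.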
19:30:47 <milanj> btw. nikodemus_, if you are still there, judging by top (looking at old and patched sbcl), that get-protocol-by-name patch does fix things
19:31:47 -!- nikodemus_ [~nikodemus@dsl-hkibrasgw4-fe5bdf00-15.dhcp.inet.fi] has quit [Ping timeout: 252 seconds]
19:34:58 nikodemus [~Nikodemus@GGZYYYMMCCXV.gprs.sl-laajakaista.fi] has joined #sbcl
19:34:58 -!- ChanServ has set mode +o nikodemus
19:35:46 <homie> for which parts ?
19:35:55 <homie> network ?
19:36:00 <homie> sockets ?
19:37:30 <milanj> this one: http://paste.lisp.org/display/125882
19:38:15 <jaimef> loading restas results in sbcl hanging in select
19:40:15 <nikodemus> jaimef: uniterruptible?
19:40:38 <jaimef> yeah
19:40:39 <nikodemus> uninterruptible, even
19:40:46 <jaimef> SIGINFO is all that replies
19:42:41 <nikodemus> jaimef: someone has been naughty and has frobbed *on-dangerous-i-forget-the-exact-name*
19:42:59 <nikodemus> set it to :error and you'll get a backtrace right before that fatal select
19:44:09 <jaimef> hmm 1.0.51. let me upgrade to 53 and see if it helps
20:11:07 gabnet [~gabnet@245.23.67.86.rev.sfr.net] has joined #sbcl
20:29:06 -!- nikodemus [~Nikodemus@GGZYYYMMCCXV.gprs.sl-laajakaista.fi] has quit [Quit: Leaving]
20:29:23 nikodemus [~Nikodemus@cs181063174.pp.htv.fi] has joined #sbcl
20:29:23 -!- ChanServ has set mode +o nikodemus
20:53:01 prxq [~mommer@mnhm-5f75da7a.pool.mediaWays.net] has joined #sbcl
21:09:55 -!- nyef [~nyef@c-174-63-105-188.hsd1.ma.comcast.net] has quit [Quit: G'night all.]
21:29:57 -!- gabnet [~gabnet@245.23.67.86.rev.sfr.net] has quit [Quit: Quitte]
21:31:31 udzinari [~user@ip-89-102-12-6.net.upcbroadband.cz] has joined #sbcl
21:32:23 -!- nikodemus [~Nikodemus@cs181063174.pp.htv.fi] has quit [Quit: Leaving]
21:42:12 *akovalenko* would like to have "during-xc-core" (so when xc host runs with disabled debugger, it would dump before exit on errors, and set up a toplevel function to go on from the last compiled file.. maybe reenabling debugger in that case)
22:10:38 -!- udzinari [~user@ip-89-102-12-6.net.upcbroadband.cz] has quit [Remote host closed the connection]
22:45:23 -!- LiamH [~none@pdp8.nrl.navy.mil] has quit [Quit: Leaving.]
23:02:39 -!- dsp_ [~tt@lebesgue.cowpig.ca] has quit [Read error: Operation timed out]
23:03:54 dsp_ [~tt@lebesgue.cowpig.ca] has joined #sbcl
23:20:46 -!- prxq [~mommer@mnhm-5f75da7a.pool.mediaWays.net] has quit [Quit: Leaving]
23:24:10 -!- Kryztof [~user@81.174.155.115] has quit [Ping timeout: 260 seconds]
23:41:49 -!- Qworkescence [~quad@unaffiliated/quadrescence] has quit [Quit: Leaving]
23:42:04 -!- milanj [~milanj_@79-101-181-128.dynamic.isp.telekom.rs] has quit [Quit: Leaving]