00:01:23 <stassats> works from C
00:07:26 <stassats> if i induce a memory fault on x86, it just enters the debugger, ppc drops to ldb
00:07:32 <danlentz> woohoo -- 1.1.10 with all tests //apparent-success on darwin 12.4.0
00:07:51 <stassats> (sb-sys:sap-ref-8 (sb-sys:int-sap 0) 0) => gc signals blocked
00:09:54 <danlentz> (is it in good taste to say a general "thank-you" to the sbcl team?)
00:11:10 <stassats> if you want to
00:11:17 <danlentz> :)
00:13:37 <danlentz> Are concurrency matters on darwin generally considered to be as reliable as on linux?  I recall a few rumblings on the devel list a while back, although iirc it was specifically wrt signals
00:14:10 <pkhuong> danlentz: it's much closer now that we basically don't use the OS for anything but spawning threads.
00:14:32 <stassats> the fact that it faults consistently at 10f suggests that it's not garbage related
00:15:01 <pkhuong> stassats: seriously, find the PC and disassemble.
00:15:59 <stassats> that doesn't always make it clear, especially since it's in C
00:16:29 <pkhuong> the segfault is in C?
00:16:36 <stassats> yes, in getaddrinfo
00:17:02 <stassats> getaddrinfo => __check_pf
00:17:29 <danlentz> if I had some concurrent code that I'm positive passed extensive unit tests ~1.25 years ago, but now does not, would it be likely that some underlying sbcl internal has changed?
00:17:58 <stassats> osx may have changed and sbcl didn't
00:18:12 <stassats> they are known for such things
00:18:23 <danlentz> vague question I know but basically just a sanity check before I get too worked up trying to fix it
00:19:36 <danlentz> good point -- the environment I was using was snow-leopard when last it passed
00:24:05 <danlentz> is there a place that collects statistics on the approx number of users per platform?  ( to get some idea how many others might also run sbcl/mountain-lion? ) i.e. If there are a large number (for some value of large) then the assumption to start with would most likely be that the error is in my code
00:25:10 <stassats> there are a lot of users on x86-64/linux and i manage regularly to find bugs here
00:25:13 <pkhuong> even if there are many such users, there's no reason to believe they particularly exercise concurrency features.
00:25:45 <stassats> such a metric wouldn't prove anything
00:27:28 -!- Bike [~Glossina@174-25-37-243.ptld.qwest.net] has quit [Read error: Connection reset by peer]
00:27:58 <danlentz> no, I guess not.  It's just hard to come to the conclusion that the problem is not with me, since 99.999999% of the time it is...
00:28:26 -!- davazp [~user@92.251.185.254.threembb.ie] has quit [Remote host closed the connection]
00:29:11 <pkhuong> if you describe the problem and have some code, someone might be able to help
00:29:58 Bike [~Glossina@174-25-37-243.ptld.qwest.net] has joined #sbcl
00:32:04 <danlentz> specifically the problems I'm seeing are with dr mcclain's dstm code (written for lispworks) but requiring only very light porting to sbcl.  So probably he is the first one I should bother.  But I appreciate the offer b/c his answer is likely to be that it still works fine on lispworks so I've been hesitant to raise the issue.
00:33:29 <danlentz> pkhuong: is your mini-stm package still operational in recent sbcl configurations?
00:33:45 <pkhuong> danlentz: should be, for what's in there.
00:34:00 <pkhuong> simple problem: the transaction hash table isn't synchronised.
00:34:37 <danlentz> you mean with mcclain's dstm?
00:34:45 <pkhuong> the one on your github
00:35:01 <danlentz> no crap -- thanks a million
00:35:54 <danlentz> embarrassing, but very much appreciated -- I'll test and let you know
00:37:25 <pkhuong> I'd use special bindings instead of a dictionary to map from thread to transactions
00:39:39 <pkhuong> I also don't see where you avoid concurrent commit.
00:40:42 <danlentz> which repo are you looking at? dstm-collections?
00:40:50 <pkhuong> cl-ctrie
00:43:54 *stassats* sees how get-address-info can be made to cons less in the meantime
00:44:14 -!- ASau [~user@p4FF96FA0.dip0.t-ipconnect.de] has quit [Read error: Connection reset by peer]
00:45:31 <danlentz> is symbol-value-in-thread acceptable to use?
00:47:27 <pkhuong> danlentz: compare-and-swap doesn't seem to be used correctly. cas returns the previous value.
00:48:41 <danlentz> non nil
00:48:58 <danlentz> let me check tho
00:49:54 <pkhuong> looks like LW returns a boolean success value.
00:51:28 -!- psilord [~psilord@c-69-180-173-249.hsd1.mn.comcast.net] has quit [Ping timeout: 264 seconds]
00:51:37 <pkhuong> the rest looks a bit hairy but correct. I'll trust Herlihy with the details.
00:52:30 Quadrescence [~quad@unaffiliated/quadrescence] has joined #sbcl
01:03:27 prxq_ [~mommer@mnhm-5f75c881.pool.mediaWays.net] has joined #sbcl
01:04:48 -!- prxq [~mommer@mnhm-590c373f.pool.mediaWays.net] has quit [Read error: Operation timed out]
01:06:44 <pkhuong> danlentz: you'll want to check for (eq (cas ...) old) instead, in SBCL. The info is useful, when CAS can't fail spuriously.
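The difference pkhuong describes can be sketched as follows. This is a minimal illustration assuming SBCL; the `cell` structure and `cas-succeeded-p` are hypothetical names, not code from cl-ctrie. `sb-ext:compare-and-swap` returns the value the place held before the operation, so treating the result as a boolean would report a failed swap as success whenever the place held a non-NIL value.

```lisp
;; Hypothetical one-slot structure; the slot type defaults to T,
;; which SBCL accepts as a compare-and-swap place.
(defstruct (cell (:constructor make-cell (value)))
  (value nil))

(defun cas-succeeded-p (cell old new)
  "Atomically replace OLD with NEW in CELL; return true only if the swap happened."
  ;; COMPARE-AND-SWAP returns the previous contents of the place, not a
  ;; boolean, so the result is compared against OLD with EQ.
  (eq (sb-ext:compare-and-swap (cell-value cell) old new) old))
```

With a cell created by `(make-cell :a)`, calling `(cas-succeeded-p cell :a :b)` performs the swap, and a second call with `:a` as the expected value fails and leaves `:b` in place.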
01:06:50 <danlentz> yes absolutely correct
01:07:11 psilord [~psilord@c-69-180-173-249.hsd1.mn.comcast.net] has joined #sbcl
01:07:34 <danlentz> lispworks cas is apparently more like compare-and-set
01:09:10 <danlentz> just out of curiosity what makes hash table a less preferred mechanism for holding the thread local var?
01:09:18 <pkhuong> more complicated for no reason.
01:09:24 <pkhuong> specials are thread local vars, in SBCL.
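A small sketch of the special-binding approach, assuming a thread-enabled SBCL. The names `*current-transaction*`, `call-with-transaction` and `transaction-seen-in-new-thread` are hypothetical and chosen only for illustration; the point is that a `let` binding of a special variable is private to the thread that established it, so no table keyed by thread has to be maintained or synchronised.

```lisp
;; Hypothetical variable name; its global value NIL means "no transaction".
(defvar *current-transaction* nil)

(defun call-with-transaction (transaction thunk)
  "Call THUNK with TRANSACTION bound as the calling thread's current transaction."
  ;; The binding is visible only in the thread that established it.
  (let ((*current-transaction* transaction))
    (funcall thunk)))

(defun transaction-seen-in-new-thread (transaction)
  "Run a fresh thread that binds TRANSACTION and return what that thread observes."
  (sb-thread:join-thread
   (sb-thread:make-thread
    (lambda ()
      (call-with-transaction transaction
                             (lambda () *current-transaction*))))))
```

The binding made inside the new thread does not leak: after `(transaction-seen-in-new-thread :tx-1)` returns, `*current-transaction*` in the calling thread is still NIL.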
01:09:43 drmeister [~drmeister@pool-71-185-168-200.phlapa.fios.verizon.net] has joined #sbcl
01:10:05 -!- Quadrescence [~quad@unaffiliated/quadrescence] has quit [Quit: This computer has gone to sleep]
01:16:10 <danlentz> pkhuong: pushed updated/working version to github with also an attribution to thank you for the assistance.   It is very much appreciated.
01:16:43 <danlentz> (still using hash table for the moment tho)
01:19:41 <pkhuong> pretty sure you need the barriers in theory
01:21:36 LiamH [~none@pool-74-96-2-44.washdc.east.verizon.net] has joined #sbcl
01:22:36 <danlentz> even under x86-64?
01:24:13 <pkhuong> TSO is strong. Maybe not.
01:26:59 <danlentz> I was a bit unsure about the matter when I looked into it, but what the consensus seemed to be was that for x86-64 it wasn't needed.  Alpha was IIRC the arch that seemed to have consensus that it was needed.  But the whole debate, to be honest, was a little bit over my head.
01:29:20 <pkhuong> on alpha, you'd need barriers everywhere.
01:29:41 <pkhuong> On x86, the implicit barrier in CAS might suffice.
01:30:32 <pkhuong> but I'm not sure: the write to new happens after acquisition, and transactions don't really release locations.
01:30:39 echo-area [~user@182.92.247.2] has joined #sbcl
01:33:01 <stassats> well, that doesn't make sense, the place where it crashes doesn't seem to use any memory passed from lisp
01:33:12 <stassats> everything is allocated on the C stack
01:33:31 <stassats> maybe there's some 64-bit/32-bit issue with stack location
01:33:32 <pkhuong> stack too small?
01:34:08 <stassats> unlikely, it's just a handful of variables
01:36:50 <pkhuong> danlentz: ah, on x86, I'm pretty sure if the write to transaction-state (to record success/failure) is visible, then so are all the previous writes, so you're good.
01:40:17 <stassats> th->alien_stack_pointer is initialized differently on :STACK-GROWS-DOWNWARD-NOT-UPWARD and not
01:40:34 <danlentz> great. I'm only interested in sbcl/x86 -- portability is something I haven't done much work to achieve.  A battle for another day.  But that does set my mind at ease wrt barriers.
01:41:49 <stassats> but it seems to be correct
01:44:36 <stassats> and that stack is probably only used for allocating aliens, and on x86oids only
01:52:10 <pkhuong> stassats: do you still have issues on frlock?
01:52:28 <stassats> let me see
01:52:52 <stassats> yep
02:00:11 <stassats> could it be an alignment issue?
02:01:34 <stassats> time to write my own c foreign functions and test those hypotheses
02:02:54 <pkhuong> alignment is usually sigbus
02:37:20 -!- christoph_debian [~christoph@ppp-188-174-122-198.dynamic.mnet-online.de] has quit [Read error: Operation timed out]
02:40:38 -!- LiamH [~none@pool-74-96-2-44.washdc.east.verizon.net] has quit [Quit: Leaving.]
02:44:58 pranavrc [~pranavrc@unaffiliated/pranavrc] has joined #sbcl
02:52:33 christoph_debian [~christoph@ppp-188-174-122-152.dynamic.mnet-online.de] has joined #sbcl
02:53:16 -!- Bike [~Glossina@174-25-37-243.ptld.qwest.net] has quit [Ping timeout: 264 seconds]
03:05:36 Bike [~Glossina@174-25-37-243.ptld.qwest.net] has joined #sbcl
03:43:33 Bike_ [~Glossina@67-5-211-203.ptld.qwest.net] has joined #sbcl
03:43:59 -!- Bike [~Glossina@174-25-37-243.ptld.qwest.net] has quit [Disconnected by services]
03:44:02 -!- Bike_ is now known as Bike
03:51:55 Bike_ [~Glossina@67-5-223-248.ptld.qwest.net] has joined #sbcl
03:53:50 -!- Bike [~Glossina@67-5-211-203.ptld.qwest.net] has quit [Ping timeout: 240 seconds]
03:56:21 -!- Bike_ is now known as Bike
04:00:27 Bike_ [~Glossina@67-5-231-61.ptld.qwest.net] has joined #sbcl
04:03:22 -!- Bike [~Glossina@67-5-223-248.ptld.qwest.net] has quit [Ping timeout: 246 seconds]
04:03:32 -!- Bike_ is now known as Bike
04:48:16 -!- stassats [~stassats@wikipedia/stassats] has quit [Ping timeout: 240 seconds]
05:17:27 Bike_ [~Glossina@67-5-231-61.ptld.qwest.net] has joined #sbcl
05:18:45 -!- Bike [~Glossina@67-5-231-61.ptld.qwest.net] has quit [Ping timeout: 264 seconds]
05:22:29 -!- Bike_ is now known as Bike
05:40:58 benkard [~benkard@tmo-107-85.customers.d1-online.com] has joined #sbcl
05:46:14 -!- Bike [~Glossina@67-5-231-61.ptld.qwest.net] has quit [Ping timeout: 240 seconds]
05:46:50 Bike [~Glossina@67-5-231-61.ptld.qwest.net] has joined #sbcl
05:46:57 -!- benkard [~benkard@tmo-107-85.customers.d1-online.com] has quit [Quit: Textual IRC Client: www.textualapp.com]
05:53:46 Quadrescence [~quad@unaffiliated/quadrescence] has joined #sbcl
06:05:39 -!- Quadrescence [~quad@unaffiliated/quadrescence] has quit [Quit: This computer has gone to sleep]
06:18:45 -!- Bike [~Glossina@67-5-231-61.ptld.qwest.net] has quit [Ping timeout: 245 seconds]
06:30:24 ASau [~user@p4FF96FA0.dip0.t-ipconnect.de] has joined #sbcl
06:30:41 Bike [~Glossina@67-5-231-61.ptld.qwest.net] has joined #sbcl
06:36:48 -!- kanru [~kanru@118-163-10-190.HINET-IP.hinet.net] has quit [Remote host closed the connection]
07:08:30 Quadrescence [~quad@c-24-4-5-176.hsd1.ca.comcast.net] has joined #sbcl
07:08:31 -!- Quadrescence [~quad@c-24-4-5-176.hsd1.ca.comcast.net] has quit [Changing host]
07:08:31 Quadrescence [~quad@unaffiliated/quadrescence] has joined #sbcl
07:19:46 kanru [~kanru@118-163-10-190.HINET-IP.hinet.net] has joined #sbcl
07:21:50 -!- Bike [~Glossina@67-5-231-61.ptld.qwest.net] has quit [Ping timeout: 240 seconds]
07:22:08 -!- drmeister [~drmeister@pool-71-185-168-200.phlapa.fios.verizon.net] has quit [Read error: Connection reset by peer]
07:22:46 drmeister [~drmeister@pool-71-185-168-200.phlapa.fios.verizon.net] has joined #sbcl
07:25:14 -!- Quadrescence [~quad@unaffiliated/quadrescence] has quit [Quit: This computer has gone to sleep]
07:28:26 -!- drmeister [~drmeister@pool-71-185-168-200.phlapa.fios.verizon.net] has quit [Remote host closed the connection]
07:29:03 drmeister [~drmeister@pool-71-185-168-200.phlapa.fios.verizon.net] has joined #sbcl
07:33:20 -!- drmeister [~drmeister@pool-71-185-168-200.phlapa.fios.verizon.net] has quit [Ping timeout: 245 seconds]
07:51:08 loke [~loke@203.127.16.194] has joined #sbcl
07:51:15 <loke> Shouldn't the following give an error?
07:51:16 <loke> (make-pathname :directory '(:absolute "a" "b") :name "c/d")
07:53:08 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl
08:13:28 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Quit: Leaving.]
08:36:06 daimrod [daimrod@sbrk.org] has joined #sbcl
08:37:03 Quadrescence [~quad@unaffiliated/quadrescence] has joined #sbcl
08:41:42 -!- Quadrescence [~quad@unaffiliated/quadrescence] has quit [Client Quit]
09:14:31 attila_lendvai [~attila_le@84-236-118-220.pool.digikabel.hu] has joined #sbcl
09:14:31 -!- attila_lendvai [~attila_le@84-236-118-220.pool.digikabel.hu] has quit [Changing host]
09:14:31 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl
09:16:21 -!- ehaliewicz [~user@50-0-51-11.dsl.static.sonic.net] has quit [Remote host closed the connection]
09:19:17 davazp [~user@31.200.149.34] has joined #sbcl
09:49:36 -!- pranavrc [~pranavrc@unaffiliated/pranavrc] has quit [Quit: Ping timeout: ]
10:09:13 -!- echo-area [~user@182.92.247.2] has quit [Remote host closed the connection]
10:13:08 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Quit: Leaving.]
10:16:04 -!- davazp [~user@31.200.149.34] has quit [Ping timeout: 264 seconds]
10:21:35 attila_lendvai [~attila_le@84-236-118-220.pool.digikabel.hu] has joined #sbcl
10:21:35 -!- attila_lendvai [~attila_le@84-236-118-220.pool.digikabel.hu] has quit [Changing host]
10:21:35 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl
10:27:34 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Quit: Leaving.]
11:05:35 LiamH [~none@pool-74-96-2-44.washdc.east.verizon.net] has joined #sbcl
11:06:12 -!- LiamH [~none@pool-74-96-2-44.washdc.east.verizon.net] has quit [Client Quit]
11:25:27 <Krystof> loke: at some point, if you try to use it, yes
11:25:32 <Krystof> but not necessarily just by creating it
11:26:08 <loke> Krystof: problem is that the namestring will become /a/b/c/d, which is wrong
11:26:18 <loke> getting the namestring should (IMHO) raise an exception
11:26:30 <loke> since it's not actually a valid namestring for that filesystem
11:27:31 <Krystof> I agree that creating a namestring of /a/b/c/d is wrong
11:27:47 <Krystof> what I think should happen is the same as when creating something with :name "c.d"
11:43:54 ASau` [~user@p5797EFD9.dip0.t-ipconnect.de] has joined #sbcl
11:45:23 -!- ASau [~user@p4FF96FA0.dip0.t-ipconnect.de] has quit [Ping timeout: 246 seconds]
12:15:07 pranavrc [~pranavrc@122.164.104.2] has joined #sbcl
12:15:07 -!- pranavrc [~pranavrc@122.164.104.2] has quit [Changing host]
12:15:07 pranavrc [~pranavrc@unaffiliated/pranavrc] has joined #sbcl
12:18:45 <Blkt> loke: http://www.lispworks.com/documentation/HyperSpec/Body/f_mk_pn.htm says it has no exceptional situations, so it can't raise an error
12:18:58 -!- pranavrc [~pranavrc@unaffiliated/pranavrc] has quit [Read error: Connection reset by peer]
12:30:51 pranavrc [~pranavrc@unaffiliated/pranavrc] has joined #sbcl
13:12:58 stassats [~stassats@wikipedia/stassats] has joined #sbcl
13:27:10 segv- [~mb@95-91-241-60-dynip.superkabel.de] has joined #sbcl
13:31:54 <Krystof> Blkt: that's not actually true
13:32:02 <Blkt> why?
13:32:57 <Krystof> because exceptional situations are the minimum situations in which conditions must be signalled
13:33:01 <Krystof> not the maximum
13:35:27 <Blkt> mmm
13:35:29 <Blkt> I'm not following
13:37:07 <fe[nl]ix> the exceptional situations in the standard are those in which an error MUST be signaled, but the implementation MAY signal an error in other situations
13:37:10 <Krystof> the exceptional situations section says "under these conditions, these exceptions must/may/should happen"
13:37:29 <Krystof> it does not say "under all other conditions, no exceptions at all must happen"
13:37:35 nyef [~nyef@c-50-157-244-41.hsd1.ma.comcast.net] has joined #sbcl
13:37:45 <fe[nl]ix> hi nyef
13:37:50 <nyef> Hello.
13:39:12 <Blkt> but doesn't that imply that some code that runs fine on an implementation may break on another for non-bug reasons?
13:40:17 <fe[nl]ix> yes, FSVO "bug"
13:41:19 <Blkt> well, I think that's a normal situation trying to write portable code
13:42:17 <Krystof> for example: (make-string 10 :element-type 'base-char :initial-element #\é)
13:45:21 <Blkt> but that's not make-string itself raising an error
13:45:32 <Blkt> it is the underlying functions, aren't they?
13:47:07 <nyef> By that logic, ERROR doesn't raise an error, the underlying functions do.
13:48:29 <pkhuong> FLOAT: Exceptional Situations: None.
13:49:07 <Blkt> what I mean is that make-string or make-pathname bodies should have (error 'foo) somewhere to be raising an error themselves
13:51:05 <danlentz> pkhuong: changing transaction objects from classes to structs increased my transaction/sec 700-900% with 2 threads and heavy contention
13:51:24 <danlentz> things looking much better!
13:52:03 <fe[nl]ix> impressive
13:52:28 <danlentz> 1000000 txn down to 3.2 sec (was almost 22)
13:55:16 <danlentz> y the idea is that the ctrie nature is especially suited for herlihy's DSTM (which is a very lightweight mechanism that otherwise is mostly only useful for functional tree root-node type data structures)
13:56:05 <danlentz> ctrie is a mutable data structure however it can clone in O(1) like a functional one
13:57:02 Bike [~Glossina@67-5-231-61.ptld.qwest.net] has joined #sbcl
13:57:07 <danlentz> so putting the two together is in a way the best of both worlds
14:00:00 <danlentz> there are still a few hurdles; for one, at least with my MOP/persistent object classes there is the issue of garbage collecting instances that were created and then rolled back
14:00:51 <danlentz> I think I need to attack that at a more fundamental level by going to a "slab allocation" type of strategy
14:01:19 nicdev [user@2600:3c03::f03c:91ff:fedf:4986] has joined #sbcl
14:01:36 davazp [~user@92.251.188.143.threembb.ie] has joined #sbcl
14:02:32 <danlentz> supposedly scala-stm is based on ctrie also, but I have yet to find the code, let alone understand it
14:03:33 <danlentz> scala code, to my eyes, is hideously ugly and unpleasant to read compared to lisp.
14:05:16 <pkhuong> danlentz: nice.
14:06:16 <pkhuong> danlentz: if you're read mostly, the seqlock thing will probably be a lot better.
14:06:54 <pkhuong> especially if the ctrie's root node is already a choke point.
14:07:32 <danlentz> the root seldom changes
14:07:53 <danlentz> the branching factor is 32, so it fans out very quickly
14:08:31 <danlentz> i only need the stm to coordinate functionality that involves multiple ctries or such
14:08:41 <danlentz> since all ops are already atomic
14:10:00 <danlentz> plus I recently implemented "compound ops" like update-if, ensure-get, remove-if, etc  all atomic as well
14:11:07 <danlentz> so that allows a variety of kinds of read/modify/write functionality on things in the ctrie while maintaining a lock-free atomic guarantee
14:12:53 LiamH [~none@pdp8.nrl.navy.mil] has joined #sbcl
14:13:59 <danlentz> tbh dstm right now is just in use for ctrie-objects (like hashtable-class) and ordered collections
14:14:59 <danlentz> until I fully work out the issues with building ordered collections on ctrie, I'm using a weight-balanced binary tree for those
14:15:51 <danlentz> also, I kind of missed binary trees, so I had to find an excuse to incorporate my wb-tree code :)
14:17:28 <danlentz> I think ultimately ctries would be faster for sets, since they are actually very similar in nature to patricia tries in a lot of ways
14:17:45 <danlentz> in effect, not really algorithmically
14:18:53 <danlentz> so, for example, union / intersection, etc could be done very efficiently in comparison to adams-type binary tree
14:21:39 <danlentz> before i jump into much new functionality though my more immediate priority is to put some considerable effort into the test suite, which has lagged behind.
14:24:10 <danlentz> pkhuong: i will reread seqlock though; i'm sure I learn something new each time I look through it
14:25:20 <danlentz> i'm also pretty amazed by common-cold
14:42:35 <stassats> now i can't even build a proper glibc, keep getting Illegal instruction
14:42:37 <danlentz> pkhuong: in mcclain's dstm, reads occur unprotected, so there is no transaction needed for single var. for atomic reads of multiple vars, this still happens lock-free.  I will look more closely at seqlock; i might be misremembering how it operates in comparison
14:46:53 <pkhuong> danlentz: the seqlock STM is basically a single global fast read/write lock. reads are optimistic, and there's only one writer at a time.
14:50:25 drmeister [~drmeister@pool-71-185-168-200.phlapa.fios.verizon.net] has joined #sbcl
14:52:05 -!- Bike [~Glossina@67-5-231-61.ptld.qwest.net] has quit [Ping timeout: 245 seconds]
14:52:53 -!- xymox [lechuck@unaffiliated/contempt] has quit [Ping timeout: 248 seconds]
14:54:06 xymox [lechuck@unaffiliated/contempt] has joined #sbcl
14:54:43 -!- drmeister [~drmeister@pool-71-185-168-200.phlapa.fios.verizon.net] has quit [Ping timeout: 246 seconds]
15:16:26 Bike [~Glossina@67-5-231-61.ptld.qwest.net] has joined #sbcl
15:16:37 <stassats> well, i can't seem to build glibc which works no matter what i try
15:16:49 <stassats> need to resolve this some other way then, sigh
15:42:32 <stassats> and now frlock.1 seems to not fail
15:42:41 <stassats> maybe it's non-deterministic
15:43:50 <stassats> indeed
15:46:12 <danlentz> pkhuong: then this dstm should perform better in general, since it still has optimistic reads and also allows many writers (and also in that capability allows for distinguished "root transaction" with rollback granularity for subtrans below it).  Am I still missing something?
15:46:51 <pkhuong> danlentz: low overhead. research shows that the seqlock thing is a good baseline, at least until you hit dozens of concurrent writers.
15:48:21 <pkhuong> it's really nice that a solution will scale better to hundreds of cores, but most machines still only have <= 8 (also, the easiest way to make things scale to many cores is often to slow the serial case down)
15:49:04 <danlentz> in any event, seqlock seems like it could be a useful "contrib" to sbcl.  Is there strong opposition to such a thing?
15:50:04 <pkhuong> I don't trust it yet.
15:50:10 <Krystof> we object to anything that might be useful!
15:50:12 <pkhuong> and we already have frlock in sb-concurrency.
15:50:23 <pkhuong> not that I trust frlock either ;)
15:50:34 <stassats> whose test  fails on ppc...
15:50:55 <danlentz> i've not had success with nicodemus's brlock/frlock but i haven't tried in a while
15:53:54 <danlentz> also, I guess I've gone through so much effort with the whole lock-free approach in cl-ctrie it would seem like defeat to start with locking strategies at this point :/
15:54:28 <pkhuong> the lock is an implementation detail. dstm isn't lock-free either.
15:55:08 *stassats* is out of ideas what to test for the getaddrinfo problem
15:55:32 -!- ASau` is now known as ASau
15:55:44 <danlentz> although these hw lock elision techniques seem like they will offer stiff competition
15:56:44 <danlentz> pkhuong: ?? There's no locking anywhere in this dstm -- its all completely optimistic
15:56:46 <pkhuong> stassats: our string deportation logic always was iffy
15:56:55 <pkhuong> danlentz: an :active transaction can lock everyone out.
15:57:45 <danlentz> To the extent that within transactions it does not even assure that the invariant is satisfied.  Only at close of tx
15:58:42 <danlentz> there can be multiple :active.  The first to commit wins, and the other will rollback when it attempts to do the same
15:58:57 <pkhuong> danlentz: in write-var-with-transaction, what happens when all the transactions attempt to write to the same location, and the one transaction that's acquired the location for writing never makes forward progress?
15:59:51 <stassats> if i call it with NULL, instead of a string, it returns -2 without crashing
16:00:05 <pkhuong> stassats: you can't call with both string arguments NULL.
16:00:34 <pkhuong> stassats: but you could construct a c-string on the foreign heap, and pass that to getaddrinfo.
16:00:48 <stassats> well, it doesn't crash, whether i can or can't call it that way
16:01:21 <pkhuong> stassats: it returns with an error telling you you can't call it that way. I'm not surprised it doesn't crash.
16:02:11 <stassats> it's clear that this happens, but the fact that it doesn't crash is different from the fact that it crashes in all other cases
16:02:58 <pkhuong> danlentz: the lack of guaranteed forward-progress for any one of the concurrent threads of executions makes this a lock-ful design. If there was a guarantee that at least one of the concurrent threads always made progress, that'd be lock-free (cas-based multiword compare and swap implementations have that).
16:04:02 <danlentz> pkhuong: It loops, but another thread can issue a write at any time which will call write-var-with-transaction with a different root transaction.  That will break the loop for the first thread if it is not making progress
16:04:25 <stassats> but the string allocation appears to be correct
16:04:42 <danlentz> one thread is guaranteed to succeed
16:04:48 <pkhuong> danlentz: what loop of the first thread? the first thread could just never execute, as a scheduling artefact.
16:09:06 -!- Bike [~Glossina@67-5-231-61.ptld.qwest.net] has quit [Ping timeout: 264 seconds]
16:09:32 Bike [~Glossina@75-164-172-180.ptld.qwest.net] has joined #sbcl
16:13:51 <danlentz> when thread2 commits, it will set the state of the var to committed.  that will cause the loop in thread1's  write-var-with-transaction to terminate
16:14:48 <danlentz> each thread has its own  write-var-with-transaction called with guarantee of a unique root-transaction per-thread (thread local value)
16:15:59 <pkhuong> danlentz: right, but if thread2 never commits?
16:16:24 <danlentz> you mean neither thread1 or thread2 ever commit?
16:16:34 <danlentz> then there is no conflict
16:17:00 <pkhuong> the paper itself says it's obstruction-free, not lock-free.
16:17:39 <danlentz> yes I abuse the term lock-free you're right
16:17:48 <stassats> wonder if ccl works
16:19:33 <pkhuong> but I'm not even sure I see how the implementation on github is obstruction-free.
16:21:34 <danlentz> the original hdstm code he started with was more traditional -- pluggable conflict manager that forcibly terminated a competing thread
16:22:17 <stassats> well, i can't even run ccl here
16:23:06 <danlentz> in his later versions, he eliminated it
16:23:46 <pkhuong> danlentz: without the option of changing the blocking transaction's state from :active to :aborted in write-var-with-transaction, the STM is not even obstruction-free.
16:25:42 <danlentz> well thats distressing I will have to definitely look more deeply into this then
16:27:16 <danlentz> but you've definitely narrowed the focus considerably to w-v-w-t, so that is enormously helpful
16:30:40 <danlentz> I also have a number of the correspondences dr mcclain posted to lispworks-hug as he worked on the various versions of this, so perhaps re-reading those might also help
16:32:12 <pkhuong> danlentz: you'll have to change the rest of the commit protocol too, then. CASing from :active to :committed was useful to detect cancellations... also as an implicit barrier.
16:43:36 <danlentz> pkhuong: I just pushed sbcl friendly "hdstm.lisp" which implementation is much closer to that described in the paper including, hopefully, at least obstruction-free  write-var / commit protocol.
16:45:25 <danlentz> i also added the barriers --- hopefully I got them correct.
16:49:43 -!- ASau [~user@p5797EFD9.dip0.t-ipconnect.de] has quit [Ping timeout: 264 seconds]
16:51:06 -!- Bike [~Glossina@75-164-172-180.ptld.qwest.net] has quit [Ping timeout: 264 seconds]
16:51:45 Bike [~Glossina@67-5-199-215.ptld.qwest.net] has joined #sbcl
16:53:30 <danlentz> although it's not clear what his comment re: hash tables in DO-ORELSE refers to; i'm assuming it's an artifact left over from earlier work.
16:53:44 ASau [~user@p5797EFD9.dip0.t-ipconnect.de] has joined #sbcl
16:55:09 <pkhuong> danlentz: I don't think sb-ext:barrier does what you think it does.
16:57:00 <pkhuong> if you're micro-optimising, you might want to switch to defglobal on SBCL, and to add padding to the roll/trans/fails counters. The way they currently are, they might end up being allocated in the same cache line, making their bottleneck even worse.
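One way the two suggestions might be combined, as a sketch only. It assumes a recent 64-bit SBCL where `sb-ext:word` is 8 bytes, and a 64-byte cache line; the names `**counters**`, `+counter-stride+`, `bump-counter` and `counter-value` are hypothetical and do not come from the dstm code. The counters live in one word vector, spaced 8 words apart so that no two of them can fall in the same 64-byte line, and the vector is held in a `defglobal` variable to avoid special-variable lookup.

```lisp
;; Hypothetical layout: three counters, each 8 words (64 bytes on a
;; 64-bit build) apart, so no two share a 64-byte cache line.
(defconstant +counter-stride+ 8)

(sb-ext:defglobal **counters**
    (make-array (* 3 +counter-stride+)
                :element-type 'sb-ext:word
                :initial-element 0))

(defun bump-counter (index)
  "Atomically increment counter INDEX (0, 1 or 2); return its previous value."
  ;; ATOMIC-INCF accepts an AREF of a (SIMPLE-ARRAY SB-EXT:WORD (*))
  ;; and returns the value the element held before the increment.
  (sb-ext:atomic-incf (aref **counters** (* index +counter-stride+))))

(defun counter-value (index)
  "Read counter INDEX without modifying it."
  (aref **counters** (* index +counter-stride+)))
```

The padding addresses cache-line sharing between counters; `defglobal` only removes the cost of a thread-local lookup and does nothing for sharing by itself.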
16:57:03 -!- edgar-rft [~GOD@HSI-KBW-149-172-63-75.hsi13.kabel-badenwuerttemberg.de] has quit [Quit: bleeding]
16:59:54 <pkhuong> and there's a bug in set-state: you probably don't want to loop forever when trying to abort a transaction that's already committed (also, the CAS is an implicit barrier [i think we wish to guarantee that across all platforms, as well]).
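A hedged sketch of an abort that cannot spin on an already-committed transaction, assuming SBCL. The `txn` structure and `try-abort` are hypothetical stand-ins, not the actual set-state from the repository. A single compare-and-swap from `:active` to `:aborted` is attempted; if it fails, the transaction has already left the `:active` state, so the observed state is returned instead of retrying.

```lisp
;; Hypothetical transaction record with a single state slot.
(defstruct (txn (:constructor make-txn ()))
  (state :active))

(defun try-abort (txn)
  "Attempt :active -> :aborted once; return the state TXN ends up in."
  (let ((previous (sb-ext:compare-and-swap (txn-state txn) :active :aborted)))
    ;; PREVIOUS is the state seen by the CAS. Only :active can be aborted;
    ;; any other state (:committed or :aborted) is final and is reported
    ;; back to the caller rather than retried.
    (if (eq previous :active)
        :aborted
        previous)))
```

The caller can then branch on the result: `:aborted` means the other transaction is out of the way, `:committed` means its writes stand.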
17:00:55 sdemarre [~serge@207.95-64-87.adsl-dyn.isp.belgacom.be] has joined #sbcl
17:19:59 <danlentz> pkhuong: by "I don't think sb-ext:barrier does what you think it does." did you mean that there are problems other than being unnecessary around CAS op?
17:20:21 <danlentz> set-state now succeeds unconditionally on :committed txn; unnecessary write-barriers around CAS eliminated; atomic counters declared GLOBAL to reduce chance of being allocated in same cache line (bottleneck)
17:22:19 <pkhuong> danlentz: padding would help with aliasing issues. global is just a way to eliminate the special access overhead.
17:22:48 <pkhuong> sb-ext:barrier puts the barrier after the body, not before.
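To illustrate the ordering, here is a minimal publish/consume sketch. It assumes the macro as documented in the SBCL manual, where it is exported as `sb-thread:barrier`, takes the barrier kind in a list, evaluates its body first, emits the barrier afterwards, and returns the body's values. The `handoff` structure and both functions are hypothetical; whether this pattern is what the dstm code needs is not settled by the discussion.

```lisp
;; Hypothetical single-value handoff between a writer and a reader.
(defstruct (handoff (:constructor make-handoff ()))
  (payload nil)
  (ready nil))

(defun publish (handoff value)
  "Store VALUE, then set the READY flag, with a write barrier in between."
  ;; The body runs first; the :WRITE barrier is emitted after it, so the
  ;; payload store is ordered before the flag store below.
  (sb-thread:barrier (:write)
    (setf (handoff-payload handoff) value))
  (setf (handoff-ready handoff) t)
  handoff)

(defun try-consume (handoff)
  "Return the payload and T if HANDOFF has been published, else NIL and NIL."
  ;; BARRIER returns the values of its body, so the flag is read and the
  ;; :READ barrier issued in one form, before the payload is loaded.
  (if (sb-thread:barrier (:read)
        (handoff-ready handoff))
      (values (handoff-payload handoff) t)
      (values nil nil)))
```

Writing `(sb-thread:barrier (:write))` with an empty body between the two stores would be equivalent in `publish`; the body form mainly saves a temporary when the value computed before the barrier is also the value to return.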
17:25:22 <stassats> if i call getaddrinfo from another c function, it fails in the same way
17:25:37 <stassats> so, something wrong with the environment surely
17:30:06 -!- Bike [~Glossina@67-5-199-215.ptld.qwest.net] has quit [Ping timeout: 264 seconds]
17:30:11 <danlentz> so essentially my volatile and flush-volatile are backwards
17:31:37 Bike [~Glossina@71-214-80-127.ptld.qwest.net] has joined #sbcl
17:47:49 bege [~bege@S0106001d7e5132b0.ed.shawcable.net] has joined #sbcl
18:04:35 <nyef> You might prefer to not use the body parameter for barrier.
18:05:12 <nyef> I think that I might have had a good reason for adding the body parameter, but whatever that reason might have been escapes me at this point.
18:06:11 -!- pranavrc [~pranavrc@unaffiliated/pranavrc] has quit [Quit: Ping timeout: ]
18:08:53 <pkhuong> I guess the implicit progn helps make at least one common pattern easy to write.
18:09:43 <nyef> Yeah, that might have been it.
18:44:03 <danlentz> nyef: i.e., just use   (sb-thread:barrier (:data-dependency)) before or after
18:46:51 <danlentz> btw, just to be certain, :data-dependency is the correct barrier-type?
18:48:31 <pkhuong> I doubt it.
18:54:52 <danlentz> hmm. barriers are not an easy subject for the newbie it seems.  I was originally just using :memory but after reviewing the way they were used in kraison's cl-skip-list it seemed to me that :data-dependency might have been more appropriate
18:57:03 <danlentz> I haven't found too much other code that uses them in order to study examples
19:06:14 <pkhuong> our barriers are inspired by linux's model. iirc, only alpha needs data dependency barriers.
19:08:58 -!- davazp [~user@92.251.188.143.threembb.ie] has quit [Remote host closed the connection]
19:24:33 ehaliewicz [~user@50-0-51-11.dsl.static.sonic.net] has joined #sbcl
20:12:42 <nyef> So, yeah, I remember what the use-case was. It saves an explicit temporary when you need to have a barrier after computing a value to return.
20:16:58 <stassats> nyef: could you have any idea why a call to a C function on PPC would fail when called not from the main thread?
20:17:05 *stassats* almost tried everything already
20:17:17 <stassats> s/almost tried/tried almost/
20:17:38 <stassats> i'm thinking something wrong with the way the stack is set up
20:20:46 <stassats> fail as in a memory fault
20:23:02 <nyef> Nothing springs immediately to mind, I'm afraid.
20:23:53 <stassats> it's getaddrinfo from glibc, and i don't have debug symbols (can't install since this is gcc compile farm)
20:24:17 <stassats> and i failed to build a working glibc locally, so i'm out of ways to resolve this
20:25:07 <nyef> ... Maybe a mis-aligned number stack, or some problem with setting up the thread stack location?
20:25:36 <nyef> Was there ever a version of SBCL that this worked on? That is, would bisection be an option?
20:25:43 -!- sdemarre [~serge@207.95-64-87.adsl-dyn.isp.belgacom.be] has quit [Ping timeout: 264 seconds]
20:25:52 <stassats> i tried building some older sbcl, but it failed to build the runtime
20:26:01 <stassats> the ones on sbcl.org are without threads
20:27:39 <nyef> ... Lovely.
20:28:04 <nyef> And, yeah, threaded SBCL/PPC was only added a couple of years ago.
20:28:21 <stassats> i tried a C program with pthreads, it worked as expected
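A standalone check of the kind described here (calling getaddrinfo from a thread other than the main one) might look roughly like the following. This is only a minimal sketch, not the program stassats actually ran: the helper names `resolve` and `resolve_in_thread` are invented for illustration, and a POSIX system with pthreads is assumed.

```c
#include <assert.h>
#include <netdb.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: look up `host` and return getaddrinfo's status
 * (0 on success, an EAI_* code otherwise). */
int resolve(const char *host)
{
    struct addrinfo hints;
    struct addrinfo *result = NULL;

    assert(host != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;

    int status = getaddrinfo(host, NULL, &hints, &result);
    if (status == 0)
        freeaddrinfo(result);
    return status;
}

/* Argument/result record handed to the worker thread. */
struct lookup {
    const char *host;
    int status;
};

static void *lookup_thread(void *arg)
{
    struct lookup *l = arg;
    l->status = resolve(l->host);
    return NULL;
}

/* Hypothetical helper: run resolve() on a newly created (non-main) thread.
 * Returns 0 if the thread was created and joined, otherwise the pthread
 * error code; getaddrinfo's own status is stored in *status. */
int resolve_in_thread(const char *host, int *status)
{
    struct lookup l = { host, 0 };
    pthread_t tid;

    int err = pthread_create(&tid, NULL, lookup_thread, &l);
    if (err != 0)
        return err;
    err = pthread_join(tid, NULL);
    if (err != 0)
        return err;
    *status = l.status;
    return 0;
}
```

Calling `resolve("127.0.0.1")` from main and then `resolve_in_thread("127.0.0.1", &status)` compares the two paths. In a plain C program both would be expected to succeed, which matches what is reported in the log; the failure only showed up when the call came from an SBCL-created thread.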
20:28:46 <nyef> Of course it did, being able to track it down that way would have been too easy.
20:28:50 <stassats> i also tried calling from my own void C function, it failed the same as if i called getaddrinfo myself
20:29:58 <nyef> What about calling something from sb-unix, or simply using PROBE-FILE?
20:30:07 <stassats> i should try again at compiling glibc, seems like the only resort right
20:30:22 <stassats> haven't noticed any problems with anything else, even with other socket functions
20:30:35 <stassats> s/right/right now/
20:31:15 <nyef> So, getaddrinfo() fails, your own function fails, but everything else seems to work?
20:31:30 <stassats> my own function which calls getaddrinfo
20:31:34 <stassats> and everything else works
20:32:37 <nyef> Where are you getting the addresses to pass to getaddrinfo?
20:33:34 <stassats> it actually fails in __check_pf which is called by getaddrinfo with pointers to stack allocated variables
20:34:28 <nyef> They're definitely in the C stack space?
20:34:54 <stassats> here, i have no idea, gdb is useless without debug symbols
20:35:11 <nyef> Mmm.
20:35:26 <stassats> i tried to write my own functions which called another function in a similar way, but it works fine
20:35:34 <nyef> What does __check_pf do?
20:36:03 <stassats> and it works fine on the initial thread, so nothing should be wrong with calling conventions
20:36:50 <stassats> it checks what interfaces are available, it seems to use a cache
20:37:03 <stassats> which is locked with __libc_lock_lock
20:37:38 <stassats> or maybe not, there's a lot of conditionals
20:37:51 <nyef> Lovely.
20:38:25 <stassats> basically it calls some functions, and then sets the pointers it was passed with information
20:38:28 <nyef> Can you get the debug symbols for the version of glibc installed on that machine, even if you can't install them?
20:38:50 <stassats> i tried, but i can't manage to find them
20:39:08 <stassats> it's fedora, i'm not really familiar where it might be, but google failed me
20:39:15 <nyef> Lovely.
20:39:32 <nyef> Because, of course, if you had the debug version of the library, a quick LD_PRELOAD and you can be using it. /-:
20:39:46 <stassats> wanted to C if CCL works, and apparently CCL can't work on POWER7 or something
20:39:48 <pkhuong> stassats: what happens if you call getaddrinfo from the main thread once, and then from another thread?
20:39:53 <foom> it's super-easy  in fedora
20:40:01 <foom> debuginfo-install $package
20:40:10 <pkhuong> we might be missing a linker flag when building the runtime
20:40:17 <foom> debuginfo-install is in yum-utils package
20:40:32 <stassats> nyef: that's why i wanted to build my own glibc, but i kept getting illegal instructions, even though it seemed to be a ppc32 one
20:40:39 <nyef> foom: Even without root access?
20:40:45 <stassats> pkhuong: fails the same way
20:40:47 <foom> oh.
20:40:53 <foom> um, why would you have a machine you don't have root on? :)
20:41:02 <nyef> GCC compile farm.
20:41:12 <stassats> because the one where i have the root on would cost 10K$
20:44:05 <foom> http://dl.fedoraproject.org/pub/fedora-secondary/releases/19/Everything/ppc64/debug/
20:44:28 <foom> you can grab an rpm and extract it to a dir and point gdb at that
20:45:35 <stassats> not sure whether   glibc-debuginfo-2.16-24.fc18.ppc64.rpm  would have 32-bit debug symbols
20:45:37 <stassats> let's see
20:46:46 <foom> oh, you want ppc dir not ppc64, then.
20:47:04 <stassats> and this is the kind of bugs in which the culprit will eventually be found to be really trivial
20:47:24 <stassats> ok, i tried ppc32 and got 404, ppc works
20:48:38 <foom> might check the version of the package matches, while you're at it. :)
20:50:19 <stassats> it doesn't, but i can get the binary from there too
20:54:49 <stassats> success
20:55:01 <stassats> foom: thanks, that helped
20:55:17 <stassats> (success, as in i got debug symbols to work, not in that i fixed the bug)
20:55:43 <Krystof> http://xkcd.com/349/
20:59:21 <stassats> seen_ipv4=0x0 looks suspect, it's a pointer
21:02:31 <stassats> hm, looks like it goes somewhere further than __check_pf, but i only got #18 <signal handler called> #19 0x4f0d95f0 in ?? ()
21:10:00 -!- ehaliewicz [~user@50-0-51-11.dsl.static.sonic.net] has quit [Remote host closed the connection]
21:15:02 ehaliewicz [~user@50-0-51-11.dsl.static.sonic.net] has joined #sbcl
21:15:44 <nyef> Okay, I need to sign off now, but I'll wish you luck hunting this one down...  And I might be able to scare up enough project bandwidth to try and reproduce it tomorrow if you're still having trouble.
21:15:48 -!- nyef [~nyef@c-50-157-244-41.hsd1.ma.comcast.net] has quit [Quit: G'night all.]
21:42:56 attila_lendvai [~attila_le@apn-89-223-228-29.vodafone.hu] has joined #sbcl
21:42:56 -!- attila_lendvai [~attila_le@apn-89-223-228-29.vodafone.hu] has quit [Changing host]
21:42:56 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl
21:46:33 milosn_ [~milosn@user-5af50bb3.broadband.tesco.net] has joined #sbcl
21:48:31 -!- milosn [~milosn@user-5af5015d.broadband.tesco.net] has quit [Ping timeout: 264 seconds]
21:51:41 -!- LiamH [~none@pdp8.nrl.navy.mil] has quit [Quit: Leaving.]
22:13:04 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Quit: Leaving.]
22:21:31 <stassats> got sbcl to use a locally built glibc, now gdb refuses to debug threads
22:26:44 <foom> you need libthread_db that matches
22:26:57 <stassats> i just built it, i don't get how it doesn't match
22:27:25 <foom> did you tell gdb to use it? LD_LIBRARY_PATH around gdb?
22:27:58 <stassats> probably my gdb is 64-bit, and i built a 32-bit glibc
22:30:48 <stassats> how many more hurdles do i have yet to jump
22:31:05 <danlentz> Krystof: real hackers would know to use floating-point
22:31:21 <foom> i missed the part about why you're compiling your own glibc
22:31:47 <stassats> foom: because i have nothing else left to do
22:32:13 <foom> in order to do what?
22:32:17 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl
22:32:21 <stassats> the error is in getaddrinfo, debug symbols i got didn't shine any light
22:32:30 <stassats> in order to insert print statements and whatnot
22:34:29 <stassats> and to cook up a test case
22:35:29 <foom> If you have a similarish version of glibc that your distro came with, I think you could just use their 64-bit libthread.
22:37:04 <stassats> i'm building a 64-bit glibc
22:41:10 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Quit: Leaving.]
22:49:01 -!- segv- [~mb@95-91-241-60-dynip.superkabel.de] has quit [Remote host closed the connection]
22:50:03 <stassats> that didn't work that well
22:50:16 <stassats> "i know, i'll build a 32-bit gdb"
22:52:11 <foom> maybe it can't find the lib? Have you tried LD_DEBUG on gdb to see where it's looking?
22:57:17 <stassats> ok, got 32-bit gdb to work
23:21:36 <stassats> print statements for the win, it fails at alloca(65536)
23:23:02 <stassats> which causes INFO: Control stack guard page unprotected and all other bad things
23:24:13 <stassats> and 64K it gets from the page size
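The pattern identified here, an alloca sized by the page size and executed on a non-main thread, can be exercised outside glibc with something like the sketch below. It is not the reduced test case mentioned later in the log. The names `touch_alloca` and `alloca_on_thread` are hypothetical, and the caller-supplied stack size is an arbitrary illustration parameter.

```c
#include <alloca.h>
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Thread body: allocate `size` bytes with alloca() and touch both ends. */
static void *touch_alloca(void *arg)
{
    size_t size = *(size_t *)arg;

    /* alloca() takes the space from the current thread's stack. */
    volatile unsigned char *buf = alloca(size);

    /* Write the first and last byte so the memory is really accessed. */
    buf[0] = 1;
    buf[size - 1] = 1;
    return NULL;
}

/* Hypothetical helper: create a thread with `stack_size` bytes of stack and
 * have it alloca() one page (64K on the machine discussed in the log).
 * Returns 0 on success, otherwise a pthread error code. */
int alloca_on_thread(size_t stack_size)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t size = (size_t)page;

    pthread_attr_t attr;
    pthread_t tid;

    int err = pthread_attr_init(&attr);
    if (err != 0)
        return err;
    err = pthread_attr_setstacksize(&attr, stack_size);
    if (err == 0)
        err = pthread_create(&tid, &attr, touch_alloca, &size);
    pthread_attr_destroy(&attr);
    if (err != 0)
        return err;
    return pthread_join(tid, NULL);
}
```

With a correctly set-up stack, `alloca_on_thread(1024 * 1024)` should simply return 0. If the runtime's idea of where the thread stack lies is wrong, the same access would be expected to land on a guard page instead.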
23:44:22 <stassats> looks like a mix up with stack grows downward and upward
23:44:46 <stassats> or something like that
23:45:16 <stassats> at least i got a reduced test-case and don't need to recompile half the os anymore