00:01:23 works from C
00:07:26 if i induce a memory fault on x86, it just enters the debugger, ppc drops to ldb
00:07:32 woohoo -- 1.1.10 with all tests //apparent-success on darwin 12.4.0
00:07:51 (sb-sys:sap-ref-8 (sb-sys:int-sap 0) 0) => gc signals blocked
00:09:54 (is it in good taste to say a general "thank-you" to the sbcl team?)
00:11:10 if you want to
00:11:17 :)
00:13:37 Are concurrency matters generally considered to be as reliable on darwin as on linux? I recall a few rumblings on the devel list a while back, although iirc it was specifically wrt signals
00:14:10 danlentz: it's much closer now that we basically don't use the OS for anything but spawning threads.
00:14:32 the fact that it faults consistently at 10f suggests that it's not garbage related
00:15:01 stassats: seriously, find the PC and disassemble.
00:15:59 that doesn't always make it clear, especially since it's in C
00:16:29 the segfault is in C?
00:16:36 yes, in getaddrinfo
00:17:02 getaddrinfo => __check_pf
00:17:29 if I had some concurrent code that I'm positive passed extensive unit tests ~1.25 years ago, but now does not, would it be likely that some underlying sbcl internal has changed?
00:17:58 osx may have changed and sbcl didn't
00:18:12 they are known for such things
00:18:23 vague question I know but basically just a sanity check before I get too worked up trying to fix it
00:19:36 good point -- the environment I was using was snow-leopard when it last passed
00:24:05 is there a place that collects statistics on the approx number of users per platform? (to get some idea how many others might also run sbcl/mountain-lion?) i.e. if there are a large number (for some value of large) then the assumption to start with would most likely be that the error is in my code
00:25:10 there are a lot of users on x86-64/linux and i manage regularly to find bugs here
00:25:13 even if there are many such users, there's no reason to believe they particularly exercise concurrency features.
00:25:45 such a metric wouldn't prove anything
00:27:58 no, I guess not. It's just hard to come to the conclusion that the problem is not with me, since 99.999999% of the time it is...
00:29:11 if you describe the problem and have some code, someone might be able to help
00:32:04 specifically the problems I'm seeing are with dr mcclain's dstm code (written for lispworks) but requiring only very light porting to sbcl. So probably he is the first one I should bother. But I appreciate the offer b/c his answer is likely to be that it still works fine on lispworks so I've been hesitant to raise the issue.
00:33:29 pkhuong: is your mini-stm package still operational in recent sbcl configurations?
00:33:45 danlentz: should be, for what's in there.
00:34:00 simple problem: the transaction hash table isn't synchronised.
00:34:37 you mean with mcclain's dstm?
00:34:45 the one on your github
00:35:01 no crap -- thanks a million
00:35:54 embarrassing, but very much appreciated -- I'll test and let you know
00:37:25 I'd use special bindings instead of a dictionary to map from thread to transactions
00:39:39 I also don't see where you avoid concurrent commit.
00:40:42 which repo are you looking at? dstm-collections?
00:40:50 cl-ctrie
00:43:54 *stassats* sees how get-address-info can be made to cons less in the meantime
00:45:31 is symbol-value-in-thread acceptable to use?
00:47:27 danlentz: compare-and-swap doesn't seem to be used correctly. cas returns the previous value.
00:48:41 non nil
00:48:58 let me check tho
00:49:54 looks like LW returns a boolean success value.
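A minimal sketch of the compare-and-swap point made above, assuming SBCL: `sb-ext:compare-and-swap` returns the value that was in the place before the operation, so success is detected by comparing that return value with the expected old value. The struct `box` and the helper `box-cas` are hypothetical names for illustration and are not taken from cl-ctrie or dstm.

```lisp
;; Hypothetical one-slot container. The slot type defaults to T, which
;; SBCL's COMPARE-AND-SWAP accepts for DEFSTRUCT accessors.
(defstruct box
  (value nil))

(defun box-cas (box old new)
  "Try to replace OLD with NEW in BOX. Return true only if the swap happened.
SB-EXT:COMPARE-AND-SWAP returns the previous value, not a success flag,
so the result is compared against OLD with EQ."
  (eq (sb-ext:compare-and-swap (box-value box) old new) old))
```

Testing the raw return value for non-NIL, as one would with a boolean-returning CAS, would treat any non-NIL previous value as success, which is the porting hazard described in the conversation.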
00:51:37 the rest looks a bit hairy but correct. I'll trust Herlihy with the details.
01:06:44 danlentz: you'll want to check for (eq (cas ...) old) instead, in SBCL. The info is useful, when CAS can't fail spuriously.
01:06:50 yes absolutely correct
01:07:34 lispworks cas is apparently more like compare-and-set
01:09:10 just out of curiosity what makes hash table a less preferred mechanism for holding the thread local var?
01:09:18 more complicated for no reason.
01:09:24 specials are thread local vars, in SBCL.
01:16:10 pkhuong: pushed updated/working version to github with also an attribution to thank you for the assistance. It is very much appreciated.
01:16:43 (still using hash table for the moment tho)
01:19:41 pretty sure you need the barriers in theory
01:22:36 even under x86-64?
01:24:13 TSO is strong. Maybe not.
01:26:59 I was a bit unsure about the matter when I looked into it, but what the consensus seemed to be was that for x86-64 it wasn't needed. Alpha was IIRC the arch that seemed to have consensus that it was needed. But the whole debate, to be honest, was a little bit over my head.
01:29:20 on alpha, you'd need barriers everywhere.
01:29:41 On x86, the implicit barrier in CAS might suffice.
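A small sketch of the "specials are thread local vars" remark above, assuming a thread-enabled SBCL build. Each thread binds a special variable to its own transaction, so no shared hash table from thread to transaction is needed. The names `*current-transaction*`, `call-with-transaction` and `run-two-threads` are hypothetical and not from the dstm code.

```lisp
;; Hypothetical special variable holding the transaction of the current thread.
(defvar *current-transaction* nil)

(defun call-with-transaction (transaction thunk)
  "Call THUNK with *CURRENT-TRANSACTION* dynamically bound to TRANSACTION.
In SBCL a dynamic binding is visible only in the thread that made it."
  (let ((*current-transaction* transaction))
    (funcall thunk)))

(defun run-two-threads ()
  "Start two threads that each bind their own transaction and report it back."
  (let ((threads
          (mapcar (lambda (name)
                    (sb-thread:make-thread
                     (lambda ()
                       (call-with-transaction
                        name
                        (lambda () *current-transaction*)))))
                  '(:txn-a :txn-b))))
    ;; JOIN-THREAD returns what the thread function returned.
    (mapcar #'sb-thread:join-thread threads)))
```

Each thread sees only its own binding, and the global value of the variable stays NIL, so there is no shared structure to synchronise.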
01:30:32 but I'm not sure: the write to new happens after acquisition, and transactions don't really release locations.
01:33:01 well, that doesn't make sense, the place where it crashes doesn't seem to use any memory passed from lisp
01:33:12 everything is allocated on the C stack
01:33:31 maybe there's some 64-bit/32-bit issue with stack location
01:33:32 stack too small?
01:34:08 unlikely, it's just a handful of variables
01:36:50 danlentz: ah, on x86, I'm pretty sure if the write to transaction-state (to record success/failure) is visible, then so are all the previous writes, so you're good.
01:40:17 th->alien_stack_pointer is initialized differently on :STACK-GROWS-DOWNWARD-NOT-UPWARD and not
01:40:34 great. I'm only interested in sbcl/x86 -- portability is something I haven't done much work to achieve. A battle for another day. But that does set my mind at ease wrt barriers.
01:41:49 but it seems to be correct
01:44:36 and that stack is probably only used for allocating aliens, and on x86oids only
01:52:10 stassats: do you still have issues on frlock?
01:52:28 let me see
01:52:52 yep
02:00:11 could it be an alignment issue?
02:01:34 time to write my own c foreign functions and test those hypotheses
02:02:54 alignment is usually sigbus
07:51:15 Shouldn't the following give an error?
07:51:16 (make-pathname :directory '(:absolute "a" "b") :name "c/d")
11:25:27 loke: at some point, if you try to use it, yes
11:25:32 but not necessarily just by creating it
11:26:08 Krystof: problem is that the namestring will become /a/b/c/d, which is wrong
11:26:18 getting the namestring should (IMHO) raise an exception
11:26:30 since it's not actually a valid namestring for that filesystem
11:27:31 I agree that creating a namestring of /a/b/c/d is wrong
11:27:47 what I think should happen is the same as when creating something with :name "c.d"
12:18:45 loke: http://www.lispworks.com/documentation/HyperSpec/Body/f_mk_pn.htm says it has no exceptional situations, so it can't raise an error
13:31:54 Blkt: that's not actually true
13:32:02 why?
13:32:57 because exceptional situations are the minimum situations in which conditions must be signalled
13:33:01 not the maximum
13:35:27 mmm
13:35:29 I'm not following
13:37:07 the exceptional situations in the standard are those in which an error MUST be signaled, but the implementation MAY signal an error in other situations
13:37:10 the exceptional situations section says "under these conditions, these exceptions must/may/should happen"
13:37:29 it does not say "under all other conditions, no exceptions at all must happen"
13:37:45 hi nyef
13:37:50 Hello.
13:39:12 but doesn't that imply that some code that runs fine on an implementation may break on another for non-bug reasons?
13:40:17 yes, FSVO "bug"
13:41:19 well, I think that's a normal situation trying to write portable code
13:42:17 for example: (make-string 10 :element-type 'base-char :initial-element #\é)
13:45:21 but that's not make-string itself raising an error
13:45:32 it is the underlying functions, isn't it?
13:47:07 By that logic, ERROR doesn't raise an error, the underlying functions do.
13:48:29 FLOAT: Exceptional Situations: None.
13:49:07 what I mean is that make-string or make-pathname bodies should have (error 'foo) somewhere to be raising an error themselves
13:51:05 pkhuong: changing transaction objects from classes to structs increased my transaction/sec 700-900% with 2 threads and heavy contention
13:51:24 things looking much better!
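A sketch related to the make-pathname discussion above: since an implementation may signal errors beyond the listed "Exceptional Situations", portable code that builds a pathname from an untrusted name can guard both the construction and the namestring conversion. The helper name `try-namestring` is hypothetical, and whether any error is actually signalled for a name like "c/d" depends on the implementation and version, so no particular result is claimed here.

```lisp
(defun try-namestring (name)
  "Build a pathname under /a/b/ with NAME and ask for its namestring.
Return two values: the namestring (or NIL) and the signalled error (or NIL)."
  (handler-case
      (values (namestring
               (make-pathname :directory '(:absolute "a" "b") :name name))
              nil)
    ;; Catch an error from MAKE-PATHNAME or from NAMESTRING, whichever
    ;; the implementation chooses to signal, if it signals at all.
    (error (condition)
      (values nil condition))))
```

Calling `(try-namestring "c/d")` then yields either a string or a condition object that the caller can inspect, rather than an unhandled error at an unpredictable point.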
13:52:03 impressive
13:52:28 1000000 txn down to 3.2 sec (was almost 22)
13:55:16 y the idea is that the ctrie nature is especially suited for herlihy's DSTM (which is a very lightweight mechanism that otherwise is mostly only useful for functional tree root-node type data structures)
13:56:05 ctrie is a mutable data structure however it can clone in O(1) like a functional one
13:57:07 so putting the two together is in a way the best of both worlds
14:00:00 there are still a few hurdles; for one, at least with my MOP/persistent object classes there is the issue of garbage collecting instances that were created and then rolled back
14:00:51 I think I need to attack that at a more fundamental level by going to a "slab allocation" type of strategy
14:02:32 supposedly scala-stm is based on ctrie also, but I have yet to find the code, let alone understand it
14:03:33 scala code, to my eyes, is hideously ugly and unpleasant to read compared to lisp.
14:05:16 danlentz: nice.
14:06:16 danlentz: if you're read mostly, the seqlock thing will probably be a lot better.
14:06:54 especially if the ctrie's root node is already a choke point.
14:07:32 the root seldom changes
14:07:53 the branching factor is 32, so it fans out very quickly
14:08:31 i only need the stm to coordinate functionality that involves multiple ctries or such
14:08:41 since all ops are already atomic
14:10:00 plus I recently implemented "compound ops" like update-if, ensure-get, remove-if, etc all atomic as well
14:11:07 so that allows a variety of kinds of read/modify/write functionality on things in the ctrie while maintaining a lock-free atomic guarantee
14:13:59 tbh dstm right now is just in use for ctrie-objects (like hashtable-class) and ordered collections
14:14:59 until I fully work out the issues with building ordered collections on ctrie, I'm using a weight-balanced binary tree for those
14:15:51 also, I kind of missed binary trees, so I had to find an excuse to incorporate my wb-tree code :)
14:17:28 I think ultimately ctries would be faster for sets, since they are actually very similar in nature to patricia tries in a lot of ways
14:17:45 in effect, not really algorithmically
14:18:53 so, for example, union / intersection, etc could be done very efficiently in comparison to adams-type binary tree
14:21:39 before i jump into much new functionality though my more immediate priority is to put some considerable effort into the test suite, which has lagged behind.
14:24:10 pkhuong: i will reread seqlock though; i'm sure I learn something new each time I look through it
14:25:20 i'm also pretty amazed by common-cold
14:42:35 now i can't even build a proper glibc, keep getting Illegal instruction
14:42:37 pkhuong: in mcclain's dstm, reads occur unprotected, so there is no transaction needed for single var. for atomic reads of multiple vars, this still happens lock-free. I will look more closely at seqlock i might be misremembering how it operates in comparison
14:46:53 danlentz: the seqlock STM is basically a single global fast read/write lock. reads are optimistic, and there's only one writer at a time.
15:16:37 well, i can't seem to build glibc which works no matter what i try
15:16:49 need to resolve this some other way then, sigh
15:42:32 and now frlock.1 seems to not fail
15:42:41 maybe it's non-deterministic
15:43:50 indeed
15:46:12 pkhuong: then this dstm should perform better in general, since it still has optimistic reads and also allows many writers (and also in that capability allows for distinguished "root transaction" with rollback granularity for subtrans below it). Am I still missing something?
15:46:51 danlentz: low overhead. research shows that the seqlock thing is a good baseline, at least until you hit dozens of concurrent writers.
15:48:21 it's really nice that a solution will scale better to hundreds of cores, but most machines still only have <= 8 (also, the easiest way to make things scale to many cores is often to slow the serial case down)
15:49:04 in any event, seqlock seems like it could be a useful "contrib" to sbcl. Is there strong opposition to such a thing?
15:50:04 I don't trust it yet.
15:50:10 we object to anything that might be useful!
15:50:12 and we already have frlock in sb-concurrency.
15:50:23 not that I trust frlock either ;)
15:50:34 whose test fails on ppc...
15:50:55 i've not had success with nikodemus's brlock/frlock but i haven't tried in a while
15:53:54 also, I guess I've gone through so much effort with the whole lock-free approach in cl-ctrie it would seem like defeat to start with locking strategies at this point :/
15:54:28 the lock is an implementation detail. dstm isn't lock-free either.
15:55:08 *stassats* is out of ideas what to test for the getaddrinfo problem
15:55:44 although these hw lock elision techniques seem like they will offer stiff competition
15:56:44 pkhuong: ?? There's no locking anywhere in this dstm -- it's all completely optimistic
15:56:46 stassats: our string deportation logic always was iffy
15:56:55 danlentz: an :active transaction can lock everyone out.
15:57:45 To the extent that within transactions it does not even assure that the invariant is satisfied. Only at close of tx
15:58:42 there can be multiple :active. The first to commit wins, and the other will rollback when it attempts to do the same
15:58:57 danlentz: in write-var-with-transaction, what happens when all the transactions attempt to write to the same location, and the one transaction that's acquired the location for writing never makes forward progress?
15:59:51 if i call it with NULL, instead of a string, it returns -2 without crashing
16:00:05 stassats: you can't call with both string arguments NULL.
16:00:34 stassats: but you could construct a c-string on the foreign heap, and pass that to getaddrinfo.
16:00:48 well, it doesn't crash, whether i can or can't call it that way
16:01:21 stassats: it returns with an error telling you you can't call it that way. I'm not surprised it doesn't crash.
16:02:11 it's clear that this happens, but the fact that it doesn't crash is different from the fact that it crashes in all other cases
16:02:58 danlentz: the lack of guaranteed forward-progress for any one of the concurrent threads of executions makes this a lock-ful design. If there was a guarantee that at least one of the concurrent threads always made progress, that'd be lock-free (cas-based multiword compare and swap implementations have that).
16:04:02 pkhuong: It loops. but another thread can issue a write at any time which will call write-var-with-transaction with a different root transaction. That will break the loop for the first thread if it is not making progress
16:04:25 but the string allocation appears to be correct
16:04:42 one thread is guaranteed to succeed
16:04:48 danlentz: what loop of the first thread? the first thread could just never execute, as a scheduling artefact.
16:13:51 when thread2 commits, it will set the state of the var to committed. that will cause the loop in thread1's write-var-with-transaction to terminate
16:14:48 each thread has its own write-var-with-transaction called with guarantee of a unique root-transaction per-thread (thread local value)
16:15:59 danlentz: right, but if thread2 never commits?
16:16:24 you mean neither thread1 or thread2 ever commit?
16:16:34 then there is no conflict
16:17:00 the paper itself says it's obstruction-free, not lock-free.
16:17:39 yes I abuse the term lock-free you're right
16:17:48 wonder if ccl works
16:19:33 but I'm not even sure I see how the implementation on github is obstruction-free.
16:21:34 the original hdstm code he started with was more traditional -- pluggable conflict manager that forcibly terminated a competing thread
16:22:17 well, i can't even run ccl here
16:23:06 in his later versions, he eliminated it
16:23:46 danlentz: without the option of changing the blocking transaction's state from :active to :aborted in write-var-with-transaction, the STM is not even obstruction-free.
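A rough sketch of the option described above, in which a writer that finds a location owned by another :active transaction may abort that owner with a CAS instead of waiting for it. This is only an illustration of the state transition under assumed names (`txn`, `txn-try-transition`, `resolve-owner` are hypothetical); it is not the dstm or hdstm code and leaves out everything else a commit protocol needs.

```lisp
;; Hypothetical transaction record. STATE is :ACTIVE, :COMMITTED or :ABORTED.
(defstruct txn
  (state :active))

(defun txn-try-transition (txn new-state)
  "Try to move TXN from :ACTIVE to NEW-STATE with a single CAS.
Return the state TXN is in afterwards. :COMMITTED and :ABORTED are final,
so a failed CAS simply reports the final state that was already there."
  (let ((previous (sb-ext:compare-and-swap (txn-state txn) :active new-state)))
    (if (eq previous :active)
        new-state
        previous)))

(defun resolve-owner (owner)
  "Called by a writer that finds OWNER holding a location.
Rather than spinning until OWNER finishes, try to abort it and report
how OWNER ended up, so the caller can proceed either way."
  (txn-try-transition owner :aborted))
```

Because the call never loops, a stalled owner cannot block other writers indefinitely: they either abort it or observe that it already finished.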
16:25:42 well that's distressing I will have to definitely look more deeply into this then
16:27:16 but you've definitely narrowed the focus considerably to w-v-w-t, so that is enormously helpful
16:30:40 I also have a number of the correspondences dr mcclain posted to lispworks-hug as he worked on the various versions of this, so perhaps re-reading those might also help
16:32:12 danlentz: you'll have to change the rest of the commit protocol too, then. CASing from :active to :committed was useful to detect cancellations... also as an implicit barrier.
16:43:36 pkhuong: I just pushed sbcl friendly "hdstm.lisp" whose implementation is much closer to that described in the paper including, hopefully, at least obstruction-free write-var / commit protocol.
16:45:25 i also added the barriers --- hopefully I got them correct.
16:53:30 although it's not clear what his comment re: hash tables in DO-ORELSE refers to; i'm assuming it's an artifact left over from earlier work.
16:55:09 danlentz: I don't think sb-ext:barrier does what you think it does.
16:57:00 if you're micro-optimising, you might want to switch to defglobal on SBCL, and to add padding to the roll/trans/fails counters. The way they currently are, they might end up being allocated in the same cache line, making their bottleneck even worse.
16:59:54 and there's a bug in set-state: you probably don't want to loop forever when trying to abort a transaction that's already committed (also, the CAS is an implicit barrier [i think we wish to guarantee that across all platforms, as well]).
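One possible reading of the defglobal and padding advice above, as a sketch for a 64-bit SBCL: the counters live in a single global word array and are spaced 8 words apart, which keeps any two of them at least 64 bytes apart. The 64-byte cache line size is an assumption, and the names `**counters**`, `counter-index`, `bump-counter` and `counter-value` are hypothetical, not from the dstm code.

```lisp
;; 3 counters * 8 words each. Assumes 8-byte words and 64-byte cache lines.
(sb-ext:defglobal **counters**
    (make-array 24 :element-type 'sb-ext:word :initial-element 0)
  "Rollback, transaction and failure counters, one per assumed cache line.")

(defun counter-index (n)
  "Index of counter N in **COUNTERS**, leaving 8 words between counters."
  (* n 8))

(defun bump-counter (n)
  "Atomically increment counter N and return its previous value."
  (sb-ext:atomic-incf (aref **counters** (counter-index n))))

(defun counter-value (n)
  "Current value of counter N."
  (aref **counters** (counter-index n)))
```

`defglobal` avoids the thread-local lookup that a special variable needs, and the spacing addresses the false-sharing concern. The array itself is not guaranteed to start on a cache line boundary, so this reduces sharing between the counters but does not isolate them from neighbouring objects.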
17:19:59 pkhuong: by "I don't think sb-ext:barrier does what you think it does." did you mean that there are problems other than being unnecessary around CAS op?
17:20:21 set-state now succeeds unconditionally on :committed txn; unnecessary write-barriers around CAS eliminated; atomic counters declared GLOBAL to reduce chance of being allocated in same cache line (bottleneck)
17:22:19 danlentz: padding would help with aliasing issues. global is just a way to eliminate the special access overhead.
17:22:48 sb-ext:barrier puts the barrier after the body, not before.
17:25:22 if i call getaddrinfo from another c function, it fails in the same way
17:25:37 so, something wrong with the environment surely
17:30:11 so essentially my volatile and flush-volitile are backwards
18:04:35 You might prefer to not use the body parameter for barrier.
18:05:12 I think that I might have had a good reason for adding the body parameter, but whatever that reason might have been escapes me at this point.
18:08:53 I guess the implicit progn helps make at least one common pattern easy to write.
18:09:43 Yeah, that might have been it.
18:44:03 nyef: i.e., just use (sb-thread:barrier (:data-dependency)) before or after
18:46:51 btw, just to be certain, :data-dependency is the correct barrier-type?
18:48:31 I doubt it.
18:54:52 hmm. barriers are not an easy subject for the newbie it seems. I was originally just using :memory but after reviewing the way they were used in kraison's cl-skip-list it seemed to me that :data-dependency might have been more appropriate
18:57:03 I haven't found too much other code that uses them in order to study examples
19:06:14 our barriers are inspired by linux's model. iirc, only alpha needs data dependency barriers.
20:12:42 So, yeah, I remember what the use-case was. It saves an explicit temporary when you need to have a barrier after computing a value to return.
20:16:58 nyef: could you have any idea why a call to a C function on PPC would fail when called not from the main thread?
20:17:05 *stassats* almost tried everything already
20:17:17 s/almost tried/tried almost/
20:17:38 i'm thinking something wrong with the way the stack is set up
20:20:46 fail as in a memory fault
20:23:02 Nothing springs immediately to mind, I'm afraid.
20:23:53 it's getaddrinfo from glibc, and i don't have debug symbols (can't install since this is gcc compile farm)
20:24:17 and i failed to build a working glibc locally, so i'm out of ways to resolve this
20:25:07 ... Maybe a mis-aligned number stack, or some problem with setting up the thread stack location?
20:25:36 Was there ever a version of SBCL that this worked on? That is, would bisection be an option?
20:25:52 i tried building some older sbcl, but it failed to build the runtime
20:26:01 the ones on sbcl.org are without threads
20:27:39 ... Lovely.
20:28:04 And, yeah, threaded SBCL/PPC was only added a couple of years ago.
20:28:21 i tried a C program with pthreads, it worked as expected
20:28:46 Of course it did, being able to track it down that way would have been too easy.
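A small sketch of the barrier placement discussed above, assuming SBCL's `sb-thread:barrier`, whose body is evaluated before the barrier and whose value is returned. The struct `handoff` and the functions `publish` and `consume` are hypothetical names, and the choice of :write and :read barrier kinds is an assumption made for this example rather than a recommendation from the conversation.

```lisp
;; Hypothetical single-value handoff between a writer and a reader.
(defstruct handoff
  (payload nil)
  (ready nil))

(defun publish (handoff value)
  "Store VALUE, then set the ready flag.
The barrier comes after its body, so the payload store is ordered
before the flag store that follows the BARRIER form."
  (sb-thread:barrier (:write)
    (setf (handoff-payload handoff) value))
  (setf (handoff-ready handoff) t))

(defun consume (handoff)
  "Return the payload if the ready flag is set, otherwise NIL.
BARRIER returns the value of its body, so the flag can be read, fenced
and tested without an explicit temporary."
  (when (sb-thread:barrier (:read)
          (handoff-ready handoff))
    (handoff-payload handoff)))
```

The `consume` function shows the use case mentioned for the body parameter: a value is computed, the barrier is issued, and the value is then returned to the surrounding form.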
20:28:50 i also tried calling from my own void C function, it failed the same as if i called getaddrinfo myself
20:29:58 What about calling something from sb-unix, or simply using PROBE-FILE?
20:30:07 i should try again at compiling glibc, seems like the only resort right
20:30:22 haven't noticed any problems with anything else, even with other socket functions
20:30:35 s/right/right now/
20:31:15 So, getaddrinfo() fails, your own function fails, but everything else seems to work?
20:31:30 my own function which calls getaddrinfo
20:31:34 and everything else works
20:32:37 Where are you getting the addresses to pass to getaddrinfo?
20:33:34 it actually fails in __check_pf which is called by getaddrinfo with pointers to stack allocated variables
20:34:28 They're definitely in the C stack space?
20:34:54 here, i have no idea, gdb is useless without debug symbols
20:35:11 Mmm.
20:35:26 i tried to write my own functions which called another function in a similar way, but it works fine
20:35:34 What does __check_pf do?
20:36:03 and it works fine on the initial thread, so nothing should be wrong with calling conventions
20:36:50 it checks what interfaces are available, it seems to use a cache
20:37:03 which is locked with __libc_lock_lock
20:37:38 or maybe not, there's a lot of conditionals
20:37:51 Lovely.
20:38:25 basically it calls some functions, and then sets the pointers it was passed with information
20:38:28 Can you get the debug symbols for the version of glibc installed on that machine, even if you can't install them?
20:38:50 i tried, but i can't manage to find them
20:39:08 it's fedora, i'm not really familiar where it might be, but google failed me
20:39:15 Lovely.
20:39:32 Because, of course, if you had the debug version of the library, a quick LD_PRELOAD and you can be using it. /-:
20:39:46 wanted to C if CCL works, and apparently CCL can't work on POWER7 or something
20:39:48 stassats: what happens if you call getaddrinfo from the main thread once, and then from another thread?
20:39:53 it's super-easy in fedora
20:40:01 debuginfo-install $package
20:40:10 we might be missing a linker flag when building the runtime
20:40:17 debuginfo-install is in yum-utils package
20:40:32 nyef: that's why i wanted to build my own glibc, but i kept getting illegal instructions, even though it seemed to be a ppc32 one
20:40:39 foom: Even without root access?
20:40:45 pkhuong: fails the same way
20:40:47 oh.
20:40:53 um, why would you have a machine you don't have root on? :)
20:41:02 GCC compile farm.
20:41:12 because the one where i have the root on would cost 10K$
20:44:05 http://dl.fedoraproject.org/pub/fedora-secondary/releases/19/Everything/ppc64/debug/
20:44:28 you can grab an rpm and extract it to a dir and point gdb at that
20:45:35 not sure whether glibc-debuginfo-2.16-24.fc18.ppc64.rpm would have 32-bit debug symbols
20:45:37 let's see
20:46:46 oh, you want ppc dir not ppc64, then.
20:47:04 and this is the kind of bugs in which the culprit will eventually be found to be really trivial
20:47:24 ok, i tried ppc32 and got 404, ppc works
20:48:38 might check the version of the package matches, while you're at it. :)
20:50:19 it doesn't, but i can get the binary from there too
20:54:49 success
20:55:01 foom: thanks, that helped
20:55:17 (success, as in i got debug symbols to work, not in that i fixed the bug)
20:55:43 http://xkcd.com/349/
20:59:21 seen_ipv4=0x0 looks suspect, it's a pointer
21:02:31 hm, looks like it goes somewhere further than __check_pf, but i only got #18 #19 0x4f0d95f0 in ?? ()
21:15:44 Okay, I need to sign off now, but I'll wish you luck hunting this one down... And I might be able to scare up enough project bandwidth to try and reproduce it tomorrow if you're still having trouble.
22:21:31 got sbcl to use a locally built glibc, now gdb refuses to debug threads
22:26:44 you need libthread_db that matches
22:26:57 i just built it, i don't get how it doesn't match
22:27:25 did you tell gdb to use it? LD_LIBRARY_PATH around gdb?
22:27:58 probably my gdb is 64-bit, and i built a 32-bit glibc
22:30:48 how many more hurdles do i have yet to jump
22:31:05 Krystof: real hackers would know to use floating-point
22:31:21 i missed the part about why you're compiling your own glibc
22:31:47 foom: because i have nothing else left to do
22:32:13 in order to do what?
22:32:17 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl 22:32:21 the error is in getaddrinfo, debug symbols i got didn't shine any light 22:32:30 in order to insert print statements and whatnot 22:34:29 and to cook up a test case 22:35:29 If you have a similarish version of glibc that your distro came with, I think you could just use their 64-bit libthread. 22:37:04 i'm building a 64-bit glibc 22:41:10 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Quit: Leaving.] 22:49:01 -!- segv- [~mb@95-91-241-60-dynip.superkabel.de] has quit [Remote host closed the connection] 22:50:03 that didn't work that well 22:50:16 "i know, i'll build a 32-bit gdb" 22:52:11 maybe it can't find the lib? Have you tried LD_DEBUG on gdb to see where it's looking? 22:57:17 ok, got 32-bit gdb to work 23:21:36 print statements for the win, it fails at alloca(65536) 23:23:02 which causes INFO: Control stack guard page unprotected and all other bad things 23:24:13 and 64K it gets from the page size 23:44:22 looks like a mix up with stack grows downward and upward 23:44:46 or something like that 23:45:16 at least i got a reduced test-case and don't need to recompile half the os anymore
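For readers following the diagnosis, a reduced test case along these lines might look roughly like the C sketch below. This is not the test case from the log (which was not posted); it only reproduces the pattern described there: a function that does a page-sized alloca (64K on that machine) and is called from a thread other than the main one. The names touch_stack_page, worker and run_in_thread are hypothetical. As noted in the log, a plain pthreads program like this works as expected; the fault only showed up when the call came from a non-main SBCL thread.

```c
#include <alloca.h>
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: allocate one page on the C stack with alloca and
 * touch every byte, mimicking the page-sized alloca reported in the log. */
static long touch_stack_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    volatile char *buf = alloca((size_t)page);
    memset((void *)buf, 0xAB, (size_t)page);

    /* Read back both ends so the writes cannot be optimized away. */
    return (buf[0] == buf[page - 1]) ? page : -1;
}

/* Thread body: run the helper off the main thread and store its result. */
static void *worker(void *arg)
{
    long *out = arg;
    *out = touch_stack_page();
    return NULL;
}

/* Hypothetical entry point: returns 0 if the thread ran and was joined. */
int run_in_thread(long *page_out)
{
    pthread_t t;
    if (pthread_create(&t, NULL, worker, page_out) != 0)
        return -1;
    return pthread_join(t, NULL) == 0 ? 0 : -1;
}
```

To exercise the failing path one would presumably build something like touch_stack_page into a shared library and call it as an alien routine from a non-main SBCL thread, rather than via pthread_create as here.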