00:33:58 fisxoj [~fisxoj@192-0-131-151.cpe.teksavvy.com] has joined #sbcl
00:46:52 -!- fisxoj [~fisxoj@192-0-131-151.cpe.teksavvy.com] has quit [Ping timeout: 268 seconds]
01:12:19 -!- davazp [~user@80.31.10.120] has quit [Remote host closed the connection]
01:14:15 -!- yacks [~py@103.6.159.103] has quit [Ping timeout: 260 seconds]
01:22:15 LiamH [~none@96.231.217.193] has joined #sbcl
01:25:51 -!- milosn [~milosn@cable-178-149-0-183.dynamic.sbb.rs] has quit [Read error: Connection reset by peer]
01:26:22 milosn [~milosn@cable-178-149-0-183.dynamic.sbb.rs] has joined #sbcl
01:51:35 yacks [~py@103.6.159.103] has joined #sbcl
02:00:58 echo-area [~user@182.92.247.2] has joined #sbcl
02:02:33 -!- drmeister [~drmeister@pool-173-59-25-58.phlapa.fios.verizon.net] has quit [Remote host closed the connection]
02:03:44 drmeister [~drmeister@pool-173-59-25-58.phlapa.fios.verizon.net] has joined #sbcl
02:04:08 fisxoj [~fisxoj@192-0-131-151.cpe.teksavvy.com] has joined #sbcl
02:08:33 -!- yacks [~py@103.6.159.103] has quit [Read error: Operation timed out]
02:38:55 -!- christoph_debian [~christoph@ppp-88-217-90-168.dynamic.mnet-online.de] has quit [Ping timeout: 264 seconds]
02:43:19 yacks [~py@103.6.159.103] has joined #sbcl
02:45:10 edgar-rft [~GOD@HSI-KBW-078-043-120-047.hsi4.kabel-badenwuerttemberg.de] has joined #sbcl
02:52:14 christoph_debian [~christoph@ppp-188-174-124-23.dynamic.mnet-online.de] has joined #sbcl
03:06:40 ehaliewicz [~user@50-0-51-11.dsl.static.sonic.net] has joined #sbcl
03:16:39 -!- drmeister [~drmeister@pool-173-59-25-58.phlapa.fios.verizon.net] has quit [Remote host closed the connection]
03:40:17 drmeister [~drmeister@pool-173-59-25-58.phlapa.fios.verizon.net] has joined #sbcl
03:43:12 pranavrc [~pranavrc@122.164.46.65] has joined #sbcl
03:43:12 -!- pranavrc [~pranavrc@122.164.46.65] has quit [Changing host]
03:43:12 pranavrc [~pranavrc@unaffiliated/pranavrc] has joined #sbcl
03:49:51 -!- drmeister [~drmeister@pool-173-59-25-58.phlapa.fios.verizon.net] has quit [Remote host closed the connection]
03:58:31 -!- LiamH [~none@96.231.217.193] has quit [Quit: Leaving.]
03:59:33 -!- fisxoj [~fisxoj@192-0-131-151.cpe.teksavvy.com] has quit [Ping timeout: 248 seconds]
04:25:03 -!- scymtym_ [~user@ip-5-147-120-181.unitymediagroup.de] has quit [Ping timeout: 260 seconds]
06:21:26 prxq [~mommer@mnhm-4d01134a.pool.mediaWays.net] has joined #sbcl
06:26:45 sdemarre [~serge@194.65-64-87.adsl-dyn.isp.belgacom.be] has joined #sbcl
06:39:57 -!- easye [~user@213.33.70.157] has quit [Ping timeout: 276 seconds]
06:42:42 Krystof [~user@81.174.155.115] has joined #sbcl
06:42:42 -!- ChanServ has set mode +o Krystof
06:47:38 -!- ehaliewicz [~user@50-0-51-11.dsl.static.sonic.net] has quit [Ping timeout: 256 seconds]
06:48:50 benkard [~benkard@2a01:198:6d5:0:5147:31b:69ad:6b78] has joined #sbcl
07:09:12 -!- jdz [~jdz@85.254.212.34] has quit [Quit: Byebye.]
07:10:39 jdz [~jdz@85.254.212.34] has joined #sbcl
07:18:18 -!- sdemarre [~serge@194.65-64-87.adsl-dyn.isp.belgacom.be] has quit [Ping timeout: 276 seconds]
07:30:18 -!- benkard [~benkard@2a01:198:6d5:0:5147:31b:69ad:6b78] has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz]
07:48:01 kanru` [~kanru@193.214.41.96] has joined #sbcl
07:48:16 -!- kanru` [~kanru@193.214.41.96] has quit [Remote host closed the connection]
07:51:27 kanru` [~kanru@193.214.41.96] has joined #sbcl
08:10:27 attila_lendvai [~attila_le@87.247.13.95] has joined #sbcl
08:10:27 -!- attila_lendvai [~attila_le@87.247.13.95] has quit [Changing host]
08:10:27 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl
08:25:23 -!- milosn [~milosn@cable-178-149-0-183.dynamic.sbb.rs] has quit [Read error: Connection reset by peer]
08:26:22 milosn [~milosn@cable-178-149-0-183.dynamic.sbb.rs] has joined #sbcl
08:31:26 -!- Vivitron [~Vivitron@c-50-172-44-193.hsd1.il.comcast.net] has quit [Ping timeout: 264 seconds]
09:17:26 sdemarre [~serge@194.65-64-87.adsl-dyn.isp.belgacom.be] has joined #sbcl
09:21:10 -!- kanru` [~kanru@193.214.41.96] has quit [Read error: Operation timed out]
09:30:47 kanru` [~kanru@193.214.41.96] has joined #sbcl
09:34:14 -!- kludge` [~comet@unaffiliated/espiral] has quit [Ping timeout: 256 seconds]
09:36:30 kludge` [~comet@unaffiliated/espiral] has joined #sbcl
10:05:38 -!- sdemarre [~serge@194.65-64-87.adsl-dyn.isp.belgacom.be] has quit [Ping timeout: 264 seconds]
10:10:16 stassats [~stassats@wikipedia/stassats] has joined #sbcl
10:10:33 -!- edgar-rft [~GOD@HSI-KBW-078-043-120-047.hsi4.kabel-badenwuerttemberg.de] has quit [Ping timeout: 245 seconds]
10:11:28 so, why do foreign functions use such an elaborate scheme for safepoints?
10:11:43 instead of doing the same things as for lisp code
10:19:53 elaborate as in, it writes RSP to a special page around the call to C, then upon stop_the_world, it write protects this page
10:20:43 i don't see why it can't just TEST AL, [safepoint]
10:20:57 -!- echo-area [~user@182.92.247.2] has quit [Remote host closed the connection]
10:21:00 even if [safepoint] is different from the lisp code one
10:41:48 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Ping timeout: 245 seconds]
10:42:54 attila_lendvai [~attila_le@87.247.13.212] has joined #sbcl
10:42:54 -!- attila_lendvai [~attila_le@87.247.13.212] has quit [Changing host]
10:42:54 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl
11:07:41 "keyword argument not a symbol: #"
11:07:53 go me
11:08:40 segv- [~mb@95.91.242.119] has joined #sbcl
11:14:29 sdemarre [~serge@194.65-64-87.adsl-dyn.isp.belgacom.be] has joined #sbcl
11:21:00 angavrilov [~angavrilo@217.71.227.190] has joined #sbcl
11:27:22 easye [~user@213.33.70.157] has joined #sbcl
11:46:52 ASau` [~user@p4FF97651.dip0.t-ipconnect.de] has joined #sbcl
11:48:17 -!- ASau [~user@p4FF96DC2.dip0.t-ipconnect.de] has quit [Ping timeout: 240 seconds]
11:49:18 -!- sdemarre [~serge@194.65-64-87.adsl-dyn.isp.belgacom.be] has quit [Ping timeout: 245 seconds]
12:23:12 benkard [~benkard@2001:4ca0:0:f230:4827:b4a5:c3b6:35d0] has joined #sbcl
12:37:52 Krystof: does "more correctly" mean it's still not entirely correct?
12:42:12 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Quit: Leaving.]
12:53:16 stassats: how would foreign code test for safe points?
12:53:31 not foreign code, just after the exit
12:54:49 currently each thread has a protected page, where it puts RSP, i don't see why not just put RSP into a thread slot, and use TEST AL,[global-safepoint-page+some-offset]
12:54:53 RSP is needed for GC to happen without interrupting foreign code (which safepoint builds can't)
12:55:17 stassats: what would some-offset be?
12:55:44 n-word-bytes, just to tell it apart from the lisp-global-safepoint
12:57:04 why waste a page per thread?
12:57:31 to be able to individually signal threads.
12:58:12 That's the design I went with, initially. We're already wasting a lot more than a single page per thread.
12:59:12 david or anton probably have a good idea of all the requirements on safe points and foreign call.
12:59:25 ok, now that makes more sense
12:59:57 the thread structure is quite large, can't we put it inside that page then?
13:00:08 that's what I did.
13:00:10 there are no writes normally to it?
13:00:50 -!- kanru` [~kanru@193.214.41.96] has quit [Ping timeout: 264 seconds]
13:01:19 the fork went for a single shared page, I think. But I can see the outline of a race condition on foreign calls, if the gc-visible SP isn't tested atomically with reentering lisp land.
13:03:35 yeah, the question is reversed then, why lisp code uses a global safepoint then
13:04:08 TEST AL,[constant] is certainly cheaper
13:08:36 ok, it looks like the thread is allocated in that page, but CSP is stored at the bottom
13:09:08 hence (storew rsp-tn thread-base-tn thread-saved-csp-offset) where thread-saved-csp-offset = (- (/ *backend-page-bytes* n-word-bytes))
13:10:38 i can understand 25-year-old code written by students not being documented, but recently written code having zilch clues on what's going on is no good
13:12:12 oh, no, it's not actually at the bottom, just in the page preceding the struct
13:17:20 sdemarre [~serge@194.65-64-87.adsl-dyn.isp.belgacom.be] has joined #sbcl
13:19:39 with a single page with a single address dedicated for the safepoint it's a bit easier to test for it during SIGSEGV, just a comparison, instead of a range, but that shouldn't really be that much more expensive
13:25:52 -!- pranavrc [~pranavrc@unaffiliated/pranavrc] has quit [Quit: Ping timeout: ]
13:35:43 stassats: I'm always nervous of claiming bug-freeness
13:35:58 if you like, "more correctly" in commit messages now saves me from writing "really correctly this time" later
13:37:05 teggi [~teggi@113.173.4.29] has joined #sbcl
13:47:07 kanru` [~kanru@193.160.199.1] has joined #sbcl
14:21:58 -!- teggi [~teggi@113.173.4.29] has quit [Remote host closed the connection]
14:22:37 teggi [~teggi@113.173.4.29] has joined #sbcl
14:23:26 -!- kanru` [~kanru@193.160.199.1] has quit [Ping timeout: 240 seconds]
14:26:37 drmeister [~drmeister@farnsworth.chem.temple.edu] has joined #sbcl
14:30:22 Has anyone used sb-impl::add-fd-handler? Is its purpose to add a handler for some file descriptor to the main SBCL loop?
14:32:20 -!- nicdev` is now known as nicdev
14:32:57 attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has joined #sbcl
14:34:51 drmeister: slime does it
14:35:27 I have some example sockets code that uses sb-impl::add-fd-handler as well.
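The 14:30 question asks whether add-fd-handler attaches a handler for a file descriptor to an event loop. As a rough illustration of that general idea only, and not of SBCL's own implementation, the C sketch below keeps a small table of descriptors with callbacks and uses select() to run the callbacks whose descriptors have input waiting. Every name in it (register_fd_handler, serve_once, read_one_byte) is hypothetical and chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical names throughout; this is not SBCL's API. */
typedef void (*fd_handler_fn)(int fd, void *data);

struct fd_handler {
    int fd;
    fd_handler_fn fn;
    void *data;
};

#define MAX_HANDLERS 16
static struct fd_handler handlers[MAX_HANDLERS];
static size_t handler_count = 0;

/* Register a read handler. Returns 0 on success, -1 if the table is
   full or the descriptor cannot be used with select(). */
int register_fd_handler(int fd, fd_handler_fn fn, void *data)
{
    if (handler_count == MAX_HANDLERS || fd < 0 || fd >= FD_SETSIZE)
        return -1;
    handlers[handler_count++] = (struct fd_handler){ fd, fn, data };
    return 0;
}

/* Wait up to timeout_ms for input on any registered descriptor and run
   the handlers of the readable ones. Returns the number of handlers
   run, 0 on timeout, or -1 if select() failed. */
int serve_once(int timeout_ms)
{
    fd_set readable;
    int maxfd = -1;
    FD_ZERO(&readable);
    for (size_t i = 0; i < handler_count; i++) {
        FD_SET(handlers[i].fd, &readable);
        if (handlers[i].fd > maxfd)
            maxfd = handlers[i].fd;
    }
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };
    int ready = select(maxfd + 1, &readable, NULL, NULL, &tv);
    if (ready <= 0)
        return ready;
    int ran = 0;
    for (size_t i = 0; i < handler_count; i++) {
        if (FD_ISSET(handlers[i].fd, &readable)) {
            handlers[i].fn(handlers[i].fd, handlers[i].data);
            ran++;
        }
    }
    return ran;
}

/* Example handler: read one byte and store it through data. */
void read_one_byte(int fd, void *data)
{
    char c;
    if (read(fd, &c, 1) == 1)
        *(char *)data = c;
}
```

A single-threaded program would call serve_once() in a loop, registering the terminal and each socket as separate descriptors, so one thread can watch all of them.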
14:35:39 kanru` [~kanru@201.42.214.193.static.cust.telenor.com] has joined #sbcl
14:36:36 I've implemented a sb-bsd-sockets package in my Common Lisp environment - I'm trying out some example code but I find that it requires a little more environment support to run a server.
14:41:48 in what sense?
14:46:10 Maybe I'm confused. What is the purpose of sb-impl::add-fd-handler? Is it used by SWANK to listen for traffic from a socket while also listening to the terminal for REPL input? I'm guessing it does this with the unix function "select" which can watch for input on multiple file-descriptors.
14:46:36 i just noticed that non-unicode builds seem to be broken since around 2013-09-05: https://ci.cor-lab.org/job/sbcl-master/121/ (click "next build" to see how configuration 3, which is "without sb-unicode", is consistently "red")
14:46:41 I'm thinking in terms of a non-multithreading situation.
14:50:11 it does indeed use select() (or poll/kpoll or similar)
14:51:02 scymtym: hm. How do I see the build log for those builds?
14:51:45 Krystof: https://ci.cor-lab.org/job/sbcl-master/122/featureset=3,label=ubuntu_quantal_64bit/consoleText
14:51:47 Krystof: click one of the red "balls", then "console output" in the left sidebar
14:52:11 if necessary, click "Full log" (or similar) at the top of the console output
14:52:17 character pure hangs
14:52:18 i can reproduce
14:53:06 on (assert (char-equal (code-char 201) (code-char 233)))
14:53:08 stassats: ok, leaving it to you for an hour or so then
14:53:33 ha ha
14:53:35 both constants
14:56:03 (compile nil `(lambda () (char-equal #\b #\a))) hangs too, but not #\a #\a
14:57:41 ok, calling (sb-int:two-arg-char-equal #\b #\a) fails too, so, it's this base-char-equal thing, i actually haven't checked the logic, just copied it from the transform
14:58:49 there's no sb-kernel:base-char-p on #-sb-unicode
14:59:50 but there's sb-impl::base-char-p, which conveniently hangs on (sb-impl::base-char-p #\a)
15:00:05 fisxoj [~fisxoj@192-0-131-151.cpe.teksavvy.com] has joined #sbcl
15:00:35 base-char-p is not exported from sb-kernel on #-sb-unicode for some reason
15:03:58 ok, i see it, base-char-p is (defun base-char-p (char) (base-char-p char)), but #!+sb-unicode (define-source-transform base-char-p (x) `(typep ,x 'base-char))
15:04:08 hence the loop
15:05:05 so, i'm not the one to blame! Krystof is!
15:07:00 i'm thinking, a) prevent base-char-p from appearing in a non-sb-unicode build at all and guard the cases where it's used
15:07:20 b) remove #!+sb-unicode from the transform and from the :export clause
15:11:15 -!- benkard [~benkard@2001:4ca0:0:f230:4827:b4a5:c3b6:35d0] has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz]
15:15:09 i'm going for a
15:16:30 why does it even need to exist even in sb-unicode build?
15:16:42 -!- sdemarre [~serge@194.65-64-87.adsl-dyn.isp.belgacom.be] has quit [Ping timeout: 276 seconds]
15:16:44 just replace it with (typep ,x 'base-char) everywhere?
15:17:00 (base-char-p x) is easier to type!
15:17:21 "yay"
15:17:57 that's a convention for all other types, there's no reason to do that just for base-char
15:18:59 What's the convention; defining an internal foo-p?
15:19:10 right
15:19:27 Then why should it be #+sb-unicode? base-char exists in #-sb-unicode too.
15:20:30 testing for base-char is usually redundant on -sb-unicode, not having it present will catch inadvertent uses
15:20:57 good point.
15:28:28 *stassats* puts that in a comment, lest somebody like me will complain about it 20 years later
15:32:16 benkard [~benkard@2001:4ca0:0:f230:1130:d63d:9e01:1f93] has joined #sbcl
15:40:41 sdemarre [~serge@194.65-64-87.adsl-dyn.isp.belgacom.be] has joined #sbcl
15:40:56 -!- Quadrescence [~quad@unaffiliated/quadrescence] has quit [Quit: Leaving]
15:41:22 Quadrescence [~quad@unaffiliated/quadrescence] has joined #sbcl
15:54:36 -!- kanru` [~kanru@201.42.214.193.static.cust.telenor.com] has quit [Ping timeout: 245 seconds]
16:30:12 -!- benkard [~benkard@2001:4ca0:0:f230:1130:d63d:9e01:1f93] has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz]
16:35:31 edgar-rft [~GOD@HSI-KBW-109-193-013-113.hsi7.kabel-badenwuerttemberg.de] has joined #sbcl
16:44:56 -!- sdemarre [~serge@194.65-64-87.adsl-dyn.isp.belgacom.be] has quit [Ping timeout: 256 seconds]
16:54:07 -!- Bike [~Glossina@69.166.47.103] has quit [Ping timeout: 240 seconds]
16:55:32 -!- attila_lendvai [~attila_le@unaffiliated/attila-lendvai/x-3126965] has quit [Quit: Leaving.]
16:57:19 -!- ASau` is now known as ASau
17:08:06 Bike [~Glossina@69.166.47.109] has joined #sbcl
17:10:30 l_ [~ln@84.233.246.170] has joined #sbcl
17:11:08 -!- l_ [~ln@84.233.246.170] has left #sbcl
17:13:55 ehaliewicz [~user@50-0-51-11.dsl.static.sonic.net] has joined #sbcl
17:24:12 eeezkil [~eeezkil@unaffiliated/eeezkil] has joined #sbcl
17:26:29 l_ [~l_@84.233.246.170] has joined #sbcl
17:27:59 -!- l_ [~l_@84.233.246.170] has left #sbcl
17:48:22 -!- teggi [~teggi@113.173.4.29] has quit [Remote host closed the connection]
17:58:40 -!- edgar-rft [~GOD@HSI-KBW-109-193-013-113.hsi7.kabel-badenwuerttemberg.de] has quit [Quit: lifetime stopped because of unnecessary operation]
18:05:25 -!- Bike [~Glossina@69.166.47.109] has quit [Ping timeout: 248 seconds]
18:10:05 Hydan [~hydan@ip-89-103-110-5.net.upcbroadband.cz] has joined #sbcl
18:10:35 Bike [~Glossina@69.166.47.109] has joined #sbcl
18:28:24 Vivitron [~Vivitron@c-50-172-44-193.hsd1.il.comcast.net] has joined #sbcl
18:29:52 rpg [~rpg@216.243.156.16.real-time.com] has joined #sbcl
18:36:25 sdemarre [~serge@194.65-64-87.adsl-dyn.isp.belgacom.be] has joined #sbcl
18:51:43 -!- ehaliewicz [~user@50-0-51-11.dsl.static.sonic.net] has quit [Remote host closed the connection]
19:01:48 -!- Bike [~Glossina@69.166.47.109] has quit [Ping timeout: 245 seconds]
19:06:33 c07b621 seems to have introduced a hang in safepoint builds (at least with featureset sb-{safepoint,thruption,wtimer}), see bottommost row of https://ci.cor-lab.org/job/sbcl-master/125/ (build 126 seems to hang at the same point)
19:06:42 i have to run now, sorry
19:07:51 (unless the test harness needs adaptation again)
19:08:57 that would be a bummer and quite strange
19:11:59 i've seen today a hang in some run-program tests, attributed it to a fluke
19:12:06 will investigate
19:15:22 run-program uses signals like sigchld
19:16:06 that would at least hint at why run-program
19:27:35 -!- fisxoj [~fisxoj@192-0-131-151.cpe.teksavvy.com] has quit [Ping timeout: 260 seconds]
19:29:25 ehaliewicz [~user@50-0-51-11.dsl.static.sonic.net] has joined #sbcl
19:30:41 tylergoza [~tylergoza@72.29.34.246] has joined #sbcl
19:39:28 -!- Quadrescence [~quad@unaffiliated/quadrescence] has quit [Ping timeout: 256 seconds]
19:41:09 Bike [~Glossina@duncandunn-wless-gw.resnet.wsu.edu] has joined #sbcl
19:45:12 -!- ehaliewicz [~user@50-0-51-11.dsl.static.sonic.net] has quit [Read error: Connection reset by peer]
19:45:36 Quadrescence [~quad@unaffiliated/quadrescence] has joined #sbcl
19:47:13 ehaliewicz [~user@50-0-51-11.dsl.static.sonic.net] has joined #sbcl
19:50:06 can't trigger it
19:52:46 i can see things going haywire when trying to initiate safepoints during a fork
19:53:38 but there's without-gcing around it
19:57:55 -!- Bike [~Glossina@duncandunn-wless-gw.resnet.wsu.edu] has quit [Ping timeout: 260 seconds]
19:58:25 scymtym_ [~user@ip-5-147-120-181.unitymediagroup.de] has joined #sbcl
19:59:25 i really don't see how c07b621 could have caused this
20:01:16 -!- sdemarre [~serge@194.65-64-87.adsl-dyn.isp.belgacom.be] has quit [Ping timeout: 245 seconds]
20:01:25 stassats: sorry, i didn't want to waste your time
20:01:48 it just seemed strange to me that all those started to fail since a particular commit
20:01:55 no, i've seen the hang too
20:02:00 maybe additional builds will clarify the situation
20:02:29 but had about 10 full test runs without running into it again
20:03:51 this is just a subjective impression but the vms running our builds seem to be much more susceptible to race conditions than "normal" machines
20:04:24 i think i got it to hang when i was building/testing another sbcl
20:04:37 so, i'll try that too add some load
20:04:40 Bike [~Glossina@69.166.47.105] has joined #sbcl
20:04:42 s/too/to/
20:06:51 that did it
20:09:43 -!- ASau [~user@p4FF97651.dip0.t-ipconnect.de] has quit [Ping timeout: 245 seconds]
20:11:27 running three yeses does the job as well
20:14:06 "CPU0: Core temperature above threshold, cpu clock throttled" that's nice
20:14:21 the stock cooler is probably no good
20:15:13 ASau [~user@p4FF97651.dip0.t-ipconnect.de] has joined #sbcl
20:17:15 " ;; Tests that reading from a FIFO is interruptible." may be the culprit
20:33:09 huh, the joined thread seems to be still up
20:37:20 and i can reproduce it without c07b621
20:40:24 maybe that commit only increased the likelihood?
20:40:46 unlikely, i can do it pretty reliably now
20:41:03 by running (loop repeat n do (sb-thread:make-thread (lambda () (loop (incf *x*))))) in another sbcl
20:41:11 where n is 6
20:41:52 and there are some zombies present
20:44:49 and it deadlocks when trying to deliver a safepoint for stopping the world
20:46:39 -!- angavrilov [~angavrilo@217.71.227.190] has quit [Remote host closed the connection]
20:47:01 -!- reb [user@nat/google/x-toyfeqpwofkffymb] has quit [Remote host closed the connection]
20:51:57 fisxoj [~fisxoj@192-0-131-151.cpe.teksavvy.com] has joined #sbcl
21:01:01 -!- fisxoj [~fisxoj@192-0-131-151.cpe.teksavvy.com] has quit [Ping timeout: 246 seconds]
21:06:48 -!- Bike [~Glossina@69.166.47.105] has quit [Ping timeout: 245 seconds]
21:12:40 Bike [~Glossina@wl-nat100.it.wsu.edu] has joined #sbcl
21:16:11 reduced: http://paste.lisp.org/display/138880
21:17:18 this also stresses sb-wtimer, not just safepoints
21:17:34 since that's how with-timeout is supposed to be implemented
21:25:21 -!- drmeister [~drmeister@farnsworth.chem.temple.edu] has quit [Remote host closed the connection]
21:27:52 so, apparently, there's a dead thread hanging from the delivery of the timer
21:27:59 which wreaks havoc
21:30:48 and sigchld gets delivered to it too, wreaking some more havoc
21:31:04 and i am to untangle that mess, sigh
21:32:25 sigchld should get delivered to one thread, randomly.
21:32:35 from the set of threads that don't have sigchld blocked in it.
21:33:06 maybe the loading of the system causes the thread to persist enough time for sigchld to hit it
21:33:19 right.
21:33:35 and my room is getting warmer from all this load
21:33:49 why is it a problem for the sigchld to hit that thread instead of another?
21:34:11 because sb-safepoint is broken in that place?
21:34:24 sb-safepoint + wtimer
21:34:59 should block all signals in places where safepoints become broken, I guess?
21:35:16 i need to know first what's really going on
21:39:17 drmeister [~drmeister@166.137.87.200] has joined #sbcl
21:40:19 i can make it so that the with-timeout is not delivered and (read-line) hangs
21:40:34 nice error message "STOP_FOR_GC_PENDING, but why? "
21:41:15 http://paste.lisp.org/display/138880#1
21:41:32 will eventually stop for read-line, pressing enter several times may cause ldb
21:46:22 the best test case i can come up with: http://paste.lisp.org/display/138880#2
21:46:53 -!- drmeister [~drmeister@166.137.87.200] has quit [Remote host closed the connection]
21:48:32 drmeister [~drmeister@166.137.87.200] has joined #sbcl
21:54:17 -!- drmeister [~drmeister@166.137.87.200] has quit [Remote host closed the connection]
21:58:27 ok, with-timeout creates a thread #
21:58:37 that's the one which falls victim to sigchld, apparently
22:02:21 blocking sigchld inside it makes sense, but doesn't guarantee that other threads won't be affected too
22:04:27 sure, what is it about it that makes it unsafe for a safepoint? Although, really, it seems pretty nasty to interrupt a random thread and run some code on it to handle sigchld.
22:04:45 that just doesn't seem like a safe thing to ever do. :)
22:04:51 Bike_ [~Glossina@69.166.47.101] has joined #sbcl
22:05:50 a logical error somewhere, clearly
22:06:00 -!- Bike [~Glossina@wl-nat100.it.wsu.edu] has quit [Disconnected by services]
22:06:02 -!- Bike_ is now known as Bike
22:06:22 i haven't written sb-safepoint, nor is there any documentation, i'm in the dark here
22:06:24 It's best practice to do as little as possible in an interrupt handler, e.g., not calling lisp code would be best.
22:06:55 I actually don't see that invoking a safepoint in an interrupt handler can possibly be congruent with the idea of a safepoint, really.
22:07:24 safepoints are used to interrupt lisp threads
22:08:49 it may be causing something else, and the safepoints upon stop_the_world for gc just break on this inconsistency
22:08:58 anything can be happening
22:09:03 drmeister [~drmeister@166.137.87.200] has joined #sbcl
22:09:23 i mean, if you're in the middle of who-knows-what instruction sequence, NOT sitting at a safepoint.
22:09:35 and along comes an asynchronous signal, BLAM, oh wait, here's a safepoint?
22:09:46 that seems...unsafe, doesn't it?
22:10:31 signals are blocked when inside a safepoint
22:11:13 oh, when not sitting at a safepoint, you mean?
22:11:35 at least stop_the_world causes them to be blocked
22:11:44 that's why i can't C-c C-c on a deadlock
22:13:00 then there's a problem when the signal arrives when thread is in a C call, while the rest of the world is stopped, upon doing so, it also checks the safepoint
22:13:10 and traps when it's set
22:13:37 *stassats* is distracted by us open to look for details in the code
22:14:09 I'm using safepoint to mean "the (presumably safe) points at which a flag is checked"
22:14:35 so "inside a safepoint" is confusing to me.
22:15:46 the state of execution after the safepoint is triggered and something wants to be done
22:16:11 for stop_the_world, waiting for the gc state change, for interrupts, something else
22:16:47 right, okay.
22:17:16 the last sb-qshow trace showed that the problem is when gc wants to stop the world and sets a safepoint, that thread doesn't respond to it => deadlock
22:17:29 -!- Bike [~Glossina@69.166.47.101] has quit [Ping timeout: 254 seconds]
22:18:00 so, sigchld leads to some state where it's a) neither in an ffi call b) nor in a sequence of code which checks for a safepoint
22:22:34 huh, odd; that thread should be blocked in os_wait_for_wtimer, I'd think.
22:22:43 which seems like a FFI call. :)
22:24:31 i need a yet better test case, this one doesn't fail with debug output enabled
22:24:39 brown [user@nat/google/x-lvcotyijafqrznps] has joined #sbcl
22:25:03 -!- brown is now known as Guest10695
22:27:25 Bike [~Glossina@69.166.47.101] has joined #sbcl
22:32:20 -!- drmeister [~drmeister@166.137.87.200] has quit [Remote host closed the connection]
22:38:55 -!- eeezkil [~eeezkil@unaffiliated/eeezkil] has quit [Ping timeout: 260 seconds]
22:38:55 -!- Guest10695 is now known as reb`
22:40:12 -!- segv- [~mb@95.91.242.119] has quit [Remote host closed the connection]
22:43:27 eeezkil [~eeezkil@unaffiliated/eeezkil] has joined #sbcl
22:45:23 drmeister [~drmeister@pool-173-59-25-58.phlapa.fios.verizon.net] has joined #sbcl
22:50:16 block signals in "System timer watchdog thread" would still make sense, for more precise timers
22:50:22 blocking
22:53:07 its body is (loop while (or (zerop (os-wait-for-wtimer *waitable-timer-handle*)) .... but if (errno == EINTR) return -1;
22:53:52 so, it exits the loop on an interrupt
22:57:13 on solaris, it returns 1
22:58:33 that isn't really a problem right now
23:00:09 -!- eeezkil [~eeezkil@unaffiliated/eeezkil] has quit [Ping timeout: 276 seconds]
23:02:04 -!- milosn [~milosn@cable-178-149-0-183.dynamic.sbb.rs] has quit [Read error: Connection reset by peer]
23:02:58 milosn [~milosn@cable-178-149-0-183.dynamic.sbb.rs] has joined #sbcl
23:03:09 -!- Hydan [~hydan@ip-89-103-110-5.net.upcbroadband.cz] has quit [Read error: Operation timed out]
23:04:28 eeezkil [~eeezkil@unaffiliated/eeezkil] has joined #sbcl
23:04:54 -!- milosn [~milosn@cable-178-149-0-183.dynamic.sbb.rs] has quit [Read error: Connection reset by peer]
23:07:26 -!- Bike [~Glossina@69.166.47.101] has quit [Ping timeout: 264 seconds]
23:08:25 milosn [~milosn@cable-178-149-0-183.dynamic.sbb.rs] has joined #sbcl
23:15:28 Bike [~Glossina@69.166.47.103] has joined #sbcl
23:22:07 -!- ehaliewicz [~user@50-0-51-11.dsl.static.sonic.net] has quit [Remote host closed the connection]
23:29:30 so, it looks like while that thread is in os_wait_for_wtimer, the gc thinks it's actually running lisp and tries to interrupt it with safepoint, to no avail, naturally
23:40:35 Hydan [~hydan@ip-89-103-110-5.net.upcbroadband.cz] has joined #sbcl
23:44:32 -!- Bike [~Glossina@69.166.47.103] has quit [Quit: Reconnecting]
23:46:11 Bike [~Glossina@69.166.47.103] has joined #sbcl
23:51:48 -!- milosn [~milosn@cable-178-149-0-183.dynamic.sbb.rs] has quit [Ping timeout: 245 seconds]
23:53:40 milosn [~milosn@cable-178-149-0-183.dynamic.sbb.rs] has joined #sbcl
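The 22:53 messages observe that the wait primitive returns -1 when it is interrupted (errno == EINTR), which ends the watchdog loop. One common way to stop an interrupted blocking call from ending a loop is to retry the call on EINTR. The C sketch below shows that idiom with read(). It illustrates the pattern only: it is not the body of os_wait_for_wtimer, and read_retrying_on_eintr is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Like read(), but a call interrupted by a signal handler before any
   data was transferred is retried instead of reported to the caller.
   Every other result, including other errors, is returned unchanged. */
ssize_t read_retrying_on_eintr(int fd, void *buf, size_t len)
{
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0 || errno != EINTR)
            return n;
        /* Interrupted with nothing read: go around again. */
    }
}
```

Whether to retry or to surface the interruption is a design choice. Retrying hides signals from the caller, which suits a thread that should just keep waiting, while returning lets the caller re-check its own state first.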