00:11:42 -!- sesuncedu1 [~Adium@cpe-076-182-016-061.nc.res.rr.com] has quit [Quit: Leaving.]
00:23:54 erikc [~erikc@CPE00222d53fe78-CM00222d53fe75.cpe.net.cable.rogers.com] has joined #ccl
00:25:11 DataLinkDroid [~DataLinkD@1.130.56.238] has joined #ccl
00:29:56 -!- DataLinkDroid [~DataLinkD@1.130.56.238] has quit [Ping timeout: 256 seconds]
00:32:44 -!- dmiles_afk [~dmiles@c-71-237-234-93.hsd1.or.comcast.net] has quit [Quit: Read error: 110 (Connection timed out)]
00:33:08 were purify and freeze used in relation with write-elf-symbols-to-file so that functions would be at a fixed address for the sake of OPROFILE ?
00:46:25 sesuncedu [~Adium@cpe-076-182-016-061.nc.res.rr.com] has joined #ccl
00:47:17 DataLinkDroid [~DataLinkD@1.150.18.96] has joined #ccl
00:50:00 That may have been it. FREEZE has effectively been deprecated for the last few years. It should probably be removed.
00:52:02 -!- DataLinkDroid [~DataLinkD@1.150.18.96] has quit [Ping timeout: 256 seconds]
00:57:20 does write-elf-symbols-to-file still do something useful for OPROFILE ?
00:57:44 does freeze and/or oprofile still work?
01:00:14 PURIFY. (It'd be pretty stupid if it didn't). As I said a minute ago, FREEZE has been deprecated and should be removed. oprofile still works , though a newer profiler (perf) is used on newer Linux kernels. Using perf with CCL also depends on WRITE-ELF-SYMBOLS-TO-FILE.
01:02:42 -!- sesuncedu [~Adium@cpe-076-182-016-061.nc.res.rr.com] has quit [Quit: Leaving.]
01:05:01 sesuncedu [~Adium@cpe-076-182-016-061.nc.res.rr.com] has joined #ccl
01:08:12 DataLinkDroid [~DataLinkD@1.145.251.252] has joined #ccl
01:14:58 ok, so remove both purify and freeze with no regret.
01:15:00 thanks.
01:16:50 And I would guess that (save-application ... :purify t) - where T is the default - is perfectly safe. Changing the order of calls to FREEZE and PURIFY may have caused the problem, but I don't think that either call is necessary.
01:18:17 ok
01:18:20 thanks
01:23:47 -!- sesuncedu [~Adium@cpe-076-182-016-061.nc.res.rr.com] has quit [Quit: Leaving.]
01:24:00 -!- rme [rme@6D10F4DD.4CC8819B.699BA7A6.IP] has quit [Quit: rme]
01:24:01 -!- rme [~rme@50.43.190.179] has quit [Quit: rme]
01:25:18 rme [~rme@50.43.190.179] has joined #ccl
01:26:39 pjb` [~t@AMontsouris-651-1-198-45.w83-202.abo.wanadoo.fr] has joined #ccl
01:27:35 -!- pjb [~t@AMontsouris-651-1-93-50.w82-123.abo.wanadoo.fr] has quit [Read error: Operation timed out]
01:47:34 -!- DataLinkDroid [~DataLinkD@1.145.251.252] has quit [Ping timeout: 256 seconds]
01:56:04 just tried with 1.9 r15784, and it entered the debugger during the save-application :-/
01:57:58 you're using 1.9 now ?
01:58:01 Unhandled exception 11 at 0x41cbc9, context->regs at #x7fff8b9d4058
01:58:01 Exception occurred while executing foreign code
01:58:01 at check_range + 105
01:58:01 received signal 11; faulting address: 0x300000cfc000
01:58:01 invalid permissions for mapped object
01:58:01 ? for help
01:58:03 [9423] Clozure CL kernel debugger:
01:58:05 just tried it
01:58:13 going back to 1.8
02:02:59 check_range is part of the integrity checking code.
02:03:28 should I try w/o check-gc-integrity ?
02:04:37 It wouldn't fault there if you did. I don't know why it would fault there.
02:08:16 DataLinkDroid [~DataLinkD@1.145.128.72] has joined #ccl
02:09:32 it didn't use to fault with r15782
02:10:02 Did you update the kernel a couple of days ago ?
02:10:15 I don't think I did on this machine
02:10:37 up 54 days. kernel 3.2.5-gg987
02:11:06 Um, the CCL kernel ?
02:15:24 oh, well, I just tried 15784.
02:15:52 Previously, I was trying 1.9 15782; and my base version is still 1.8 15490.
02:23:48 indeed on r15784, it manages to dump an image w/o check gc integrity enabled
02:27:24 So, we don't look for problems, and don't find any. Not exactly reassuring.
02:27:24 -!- ipmonger [~IPmonger@c-68-81-244-69.hsd1.pa.comcast.net] has quit [Quit: ipmonger]
02:27:36 Of course it isn't.
02:28:33 didn't claim it was reassuring.
02:28:37 The address that it faulted on is in the readonly area; I just added code to check that area the other day. The function in question is called 'check_readonly_area', not 'check_area', so that doesn't make any sense.
02:37:27 -!- DataLinkDroid [~DataLinkD@1.145.128.72] has quit [Ping timeout: 256 seconds]
02:39:31 consolers [fork@59.92.56.123] has joined #ccl
02:40:29 any *bsd users know how to get alt send meta to emacs on the console ?
02:52:20 I guess not.
02:53:28 -!- consolers [fork@59.92.56.123] has quit []
02:54:03 Fare: can you tell if the image saved with integrity checks off exhibits the same problems with CTYPEs etc ?
02:57:08 DataLinkDroid [~DataLinkD@101.171.200.98] has joined #ccl
02:57:12 -!- alms_ [~alms_@209-6-130-32.c3-0.bkl-ubr1.sbo-bkl.ma.cable.rcn.com] has quit [Quit: alms_]
02:59:03 alms_ [~alms_@209-6-130-32.c3-0.bkl-ubr1.sbo-bkl.ma.cable.rcn.com] has joined #ccl
03:01:48 -!- DataLinkDroid [~DataLinkD@101.171.200.98] has quit [Ping timeout: 256 seconds]
03:10:51 probably not.
03:11:06 Since I removed the purify and freeze, I haven't seen the ctype thing again
03:11:25 and my checks say A-OK
03:12:09 OK. So at this point we aren't sure why the GC check faulted and would like to fix that, but other problems seem to be gone ?
03:12:43 yes, it looks like
03:13:12 Good.
03:13:32 I'm seeing some instability, but I don't think it's related to CCL, more like plenty of timeouts connecting to Oracle, and things like that.
03:14:35 Always glad when it's someone else's problem, if indeed it is.
03:14:45 but I need to check each failed test one by one to assess why the hell it failed.
03:14:58 OK.
03:16:12 DataLinkDroid [~DataLinkD@1.147.71.91] has joined #ccl
03:16:30 I'll go eat then. Good night.
03:20:14 The value # is not of the expected type (MEMBER :WITH-INFANT-IN-LAP :INFANT-IN-SEAT ...)
03:20:19 looks like a memory corruption
03:20:34 using 1.9 r15784
03:20:48 Well then I won't go eat then.
03:20:55 so... much rarer, but still something fishy :-(
03:21:17 I'm running another test suite w/ 1.8 15490, but haven't looked at all the failures.
03:21:46 re-running that test on a clean image (also 1.9 r15784) works
03:22:38 42...42 is that B B in UCS4?
03:23:04 Yes. Could be part of a string.
03:23:34 the substring BB appears in one of the strings manipulated by the test.
03:25:02 another failure with fixnum 0 instead of the above in the same type check failure
03:26:14 (this happens while parsing messages)
03:27:12 and various other objects, still with the same typecheck while parsing.
03:27:20 in other similar tests
03:30:21 I'll stop that run. It's discouraging.
04:33:40 on 1.8 15490, with check gc integrity, I get:
04:33:43 Missing memoization in doublenode at 0x302012ff1740
04:33:43 ? for help
04:33:43 [7061] Clozure CL kernel debugger:
04:33:48 in one of the runs.
04:42:29 A good reason to get to 1.9 ASAP.
05:00:08 well, 1.9 is much more unstable, as seen above.
05:00:59 So we need to find out why and fix that. Can you send me the image that gets the test failures and explain how to generate them ?
05:01:01 and 1.8 w/o integrity check doesn't seem to have issues (at least, it's been running rather stably for us for a year)
05:01:39 "run our big bulking test suite against an oracle database" is currently how I reproduce the test -- not very transmissible.
05:02:22 That involves the FFI as well as databases and hulking test suites ?
05:04:18 (:SSR-CHLD 0 #)
05:04:41 yes -- presumably the error could be narrowed down, but at the moment, the field is wide open.
05:05:16 these failures don't happen with sbcl, didn't happen with 1.8 (except that 1.8 seems to fail gc integrity checks once in a while when enabled)
05:06:14 looks like these errors are always around the same part of the code -- and in multiple processes, so it's not just one image being corrupted one way, but multiple being corrupted in similar ways
05:07:08 Well then, they're either CCL bugs or something else. I agree that it'd be necessary to narrow that down further.
05:08:20 Multiple OS-level processes ? Or multiple lisp threads ?
05:08:37 OS-level processes
05:09:02 we are still running one worker thread per process, multiple processes. Sucks.
05:09:45 well, the symptom is pretty low-level and didn't happen with 1.8, so that suggests a CCL bug.
05:10:22 that particular function that appears on a lot of the backtraces uses that evil nconc primitive, in case that matters
05:10:50 oh, and calls a method with a dynamic-extent declaration
05:11:10 Yes. If rme or I came back there later this week and just got locked in a room until this worked, would that work for you ?
05:11:36 probably
05:12:00 meanwhile, I'll try removing the dynamic-extent declarations and see if that works better.
05:12:29 I think that the possibility was discussed. If we wanted to do it this week, we'd need to decide soon.
05:13:49 yes, I'll talk to Allan tomorrow.
05:14:40 OK.
05:15:17 that 1.8 image seems to not have had any ORA-00000 failure. Dunno if it's because running later at night there's less contention, or because it is less buggy.
05:15:59 a lot of process attrition in this 1.9 test
05:16:41 Dunno. I think of 1.9 as being a lot less buggy than 1.8 was, and I also think that a lot of the changes weren't THAT extensive.
05:21:58 yes, all the slaves are dead or stalled
05:22:59 did dynamic-extent treatment change much?
05:23:30 Nope.
05:24:54 At some point in the last year or two, we allowed larger objects to be stack-allocated but IIRC that was before 1.8
05:27:04 Does your test suite allow running a single test ? Multiple times ?
05:28:54 yes, it does
05:29:04 or, well, we can start a REPL and do it.
05:29:12 Good.
05:30:35 which is not a 100% reproduction of the test suite, in that logging doesn't go to the same place, and the cpu pressure is less, and well, slime is running.
05:32:07 And if that behaves differently, that might tell us something. I wouldn't think that the #x4b0000004b and similar things are timing-sensitive, but don't know for sure.
05:32:43 I'm perfectly willing to blame slime for everything, on general principles.
05:41:29 well, it's sensitive to something, because if I just run the test once at the repl, it works.
05:43:52 -!- DataLinkDroid [~DataLinkD@1.147.71.91] has quit [Ping timeout: 256 seconds]
05:46:04 If you turn on gc integrity checks in this image, does it get a memory fault ?
05:51:05 on the 1.9 image or 1.8 image?
05:51:21 on the 1.8, I'll try when the current run is over -- I bet not
05:51:54 1.9, if that's where you were getting errors and if that's where the image couldn't be saved with checks enabled.
05:52:46 on the 1.9 I'm trying again w/ a few suspect dynamic-extent removed, right now -- and I think it didn't have the gc integrity check, anyway, because I can't even dump an image w/ the integrity check
05:53:04 good question...
05:53:20 Yes. Was wondering if it would find anything now, or if it'd fault again.
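A side note on why values such as #x4b0000004b (and the earlier "42...42") read as string fragments: if a 64-bit word is laid out in little-endian memory and viewed as two 32-bit character codes, it decodes to two repeated letters. The sketch below is only an illustration in Python, not anything from CCL itself; the function name `decode_word_as_utf32` is made up for the example, and the value 0x4200000042 is an assumed reconstruction of the "42...42" pattern, which the log does not show in full.

```python
import struct


def decode_word_as_utf32(word: int) -> str:
    """Interpret a 64-bit word as two little-endian 32-bit character codes.

    Hypothetical helper for illustration only; it assumes a little-endian
    machine and 32-bit (UCS-4 / UTF-32) characters.
    """
    raw = struct.pack("<Q", word)   # the 8 bytes as they would sit in memory
    return raw.decode("utf-32-le")  # two 4-byte code points


if __name__ == "__main__":
    # Value quoted in the log.
    print(decode_word_as_utf32(0x4B0000004B))
    # Assumed full form of the "42...42" pattern (not shown in the log).
    print(decode_word_as_utf32(0x4200000042))
```

Seeing character data where a typed object was expected is consistent with the "could be part of a string" remark, though it does not by itself say how the data got there.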
05:56:39 (recompiling it)
06:08:27 DataLinkDroid [~DataLinkD@123.208.122.198] has joined #ccl
06:08:54 pjb`` [~t@AMontsouris-651-1-122-197.w83-202.abo.wanadoo.fr] has joined #ccl
06:10:52 -!- pjb` [~t@AMontsouris-651-1-198-45.w83-202.abo.wanadoo.fr] has quit [Ping timeout: 256 seconds]
06:22:24 -!- DataLinkDroid [~DataLinkD@123.208.122.198] has quit [Ping timeout: 256 seconds]
06:23:31 yes, the image fails the integrity check after the dump
06:24:19 http://paste.lisp.org/+2X7C
06:25:49 however, it passes the tests so far.
06:26:03 a few thousand more tests to go
06:26:07 same way as it did at image-save time.
06:26:26 yes, same way, modulo slightly different numbers
06:27:12 (and a slower gc)
06:27:17 check_range checks a range of addresses between X and Y. The address that it's faulting on is less than any value of X that the function is called on.
06:35:06 is it a bug in the check because it's failing to consider purified memory areas, or is it a bug in the purify because it's doing something wrong?
06:35:13 I could retry with :purify nil
06:35:20 (now that I don't have a manual purify)
06:36:42 I don't understand yet how what happens happens. It's as if it found some object whose size was negative and backed up instead of moving forward, but I don't yet see how that could happen.
06:38:48 if I dump an image w/o purify, maybe after purification, it can reliably display the bug?
06:39:40 Maybe. It's walking another region of memory when it gets confused.
06:41:58 Are tests stilll running ?
06:42:52 However many #\l s in 'still
06:42:56 '
06:43:34 well, removing those particular dynamic-extent declarations didn't help
06:43:48 yes, tests are still running
06:44:05 And you got some of the same kinds of failures.
06:45:47 I stopped the 1.9 run -- it has those same corruptions as before
06:46:01 the 1.8 run is still on. It has a few failures that I have to investigate.
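The "object whose size was negative" hypothesis above can be pictured with a toy model. The following Python sketch is not CCL's check_range; it is a hypothetical walker over a list in which each object starts with a header word giving its size, written only to show how one bad size can send a linear walk to an address below the start of the range being checked. All names (`RangeFault`, `walk_range`) are invented for the example.

```python
class RangeFault(Exception):
    """Raised when the walk leaves the range it was asked to check."""


def walk_range(memory, start, end):
    """Visit each object in memory[start:end], hopping by the size in its header.

    Toy model: memory[pos] holds the size (in words) of the object at pos.
    Returns the list of object offsets visited.
    """
    visited = []
    pos = start
    while pos < end:
        if pos < start:
            # A real walker would touch unmapped or read-only memory here.
            raise RangeFault(f"walked to {pos}, below start {start}")
        size = memory[pos]
        if size == 0:
            raise RangeFault(f"zero-sized object at {pos}")
        visited.append(pos)
        pos += size
    return visited


if __name__ == "__main__":
    # Words 0..3 lie outside the checked range; objects start at 4, 6 and 9.
    memory = [0, 0, 0, 0, 2, 0, 3, 0, 0, 1]
    print(walk_range(memory, 4, 10))

    memory[6] = -5  # simulate a corrupted size header
    try:
        walk_range(memory, 4, 10)
    except RangeFault as fault:
        print("fault:", fault)
```

In this model the healthy walk visits offsets 4, 6 and 9, while the corrupted header makes the walker step back to offset 1, outside the range it was given, which mirrors the symptom of a faulting address lower than any X passed to the checker.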
06:46:22 (possibly the oracle connection failure, or maybe something else)
06:46:50 now trying 1.9 w/o purify
06:51:15 Your paste involved a 1.9 image saved without integrity checks, but when you enabled them it faulted for reasons that I don't understand. If we don
06:51:58 your previous message was cut at "If we don"
06:52:10 't make progress in other ways and if that image is still around, it might make sense for me to look at it if possible.
06:52:45 from almost identical source, I just clobbered the image w/o purify
06:52:57 Sorry.
06:53:23 Don't understand that.
06:53:34 even w/o purify, same bug
06:53:45 I can send the image
06:54:03 can you enable incoming like rme did before?
06:54:15 Let me try to recreate a place to put it. Just a sec, yes.
06:56:41 Hope I did that right. Should exist with correct permissions now.
06:57:19 same w/o purify http://paste.lisp.org/+2X7C/1
06:58:20 (bzip2'ing)
06:58:34 (stdin): 6.619:1, 1.209 bits/byte, 84.89% saved, 398971120 in, 60277600 out.
06:59:03 cd: Access failed: 550 Failed to change directory. (/pub/incoming)
06:59:32 try again ?
06:59:50 same
07:01:15 sorry. Our IT guy keeps more normal hours ... try again ?
07:03:08 can cd, can't put
07:04:49 one last try, please.
07:06:26 put: Access failed: 550 Permission denied. (borks2.bz2)
07:08:16 just a sec
07:10:37 once again ?
07:12:54 nope
07:13:07 which ftp server do you use?
07:13:20 vsftpd. I'm not familiar with it.
07:16:40 me neither
07:17:23 Sorry. Can beg rme tomorrow, I guess.
07:17:48 Guy has way too much job security ...
07:17:51 I tried vsftpd loooong ago and didn't like it. muddleftpd is what I ended up using in the end.
07:20:15 on the 1.8 + check, I got two slaves dead with Missing memoization in doublenode at 0x302012ff1740
07:20:42 make that 3
07:20:58 That may be spurious, or may have been fixed. Don't remember.
07:27:23 I could also try to rebuild w/ 1.9 and check the gc after each file.
07:27:25 sigh.
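For reference, the figures in the bzip2 summary line above all follow from the two byte counts it reports. The short Python sketch below simply redoes that arithmetic; the variable names are illustrative, and the MiB conversion is an added convenience rather than something bzip2 prints.

```python
# Byte counts taken from the bzip2 summary line in the log.
bytes_in = 398_971_120
bytes_out = 60_277_600

ratio = bytes_in / bytes_out                   # compression ratio, N:1
bits_per_byte = 8 * bytes_out / bytes_in       # output bits per input byte
saved_pct = 100 * (1 - bytes_out / bytes_in)   # percentage of input removed
size_mib = bytes_out / 2**20                   # compressed size in MiB

print(f"{ratio:.3f}:1, {bits_per_byte:.3f} bits/byte, {saved_pct:.2f}% saved")
print(f"{size_mib:.1f} MiB compressed")
```

The compressed size works out to a little under 58 MiB, which is the file that the FTP upload attempts were trying to transfer.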
07:28:04 but while I'm sleeping, I'll do a plain 1.8 with purify but no gc check, like I'd like to commit.
07:28:29 If I could guess how to set permissions right, we might be able to make some progress.
07:30:32 not tonight anymore, I fear
07:32:02 Well, I might make progress. As it is, I just have to wimp out and ask rme to do it.
07:33:01 I'll do so, and will try to be up early (relatively) tomorrow.
07:39:19 any other proposed way to ship 58MB around?
07:39:54 Not that I can think of; I don't think about this sort of stuff much anymore.
07:41:22 dropbox
07:42:05 I suppose I could encrypt and dropbox
07:42:08 If I had clue one of how to use dropbox, that'd be a great idea. Thanks, but I am sans clue one.
07:42:18 or encrypt and google doc or something
07:43:38 I greatly enjoy not knowing a damned thing about that sort of thing. I think that the best option may be "wait for rme and try to get up early tomorrow."
07:44:27 creating a google drive document
07:44:31 I hope I can share it
07:45:26 And I hope I can understand how to access it. I'll try ...
07:47:02 message sent
07:48:54 I think that's the unpurified image
07:50:00 downloading it.
07:51:51 thanks
07:52:15 got it. thanks. If the bug
07:52:47 sorry. If the bug is in the integrity checking code I'll try to commit a fix; otherwise, I'll try to identify the bug.
07:53:29 if you need the source, I fear my paranoid overlords will require that you come here -- although I suppose some kind of screen sharing via google hangout could be possible.
07:55:35 I'm tentatively planning on being back there thursday and friday, assuming that others agree.
07:56:38 ok
07:56:41 Krystof [~user@81.174.155.115] has joined #ccl
07:56:43 will talk to Allan tomorrow.
07:56:48 bye!
07:57:20 Bye.
07:58:08 -!- Fare [fare@nat/google/x-gbgnmakmjqxqjkcm] has quit [Quit: Leaving]
08:03:07 DataLinkDroid [~DataLinkD@1.144.70.106] has joined #ccl
09:06:24 -!- PuffTheMagic [uid3325@gateway/web/irccloud.com/x-hrrdccslqgajyzqt] has quit [*.net *.split]
09:06:24 -!- peccu1 [~peccu@KD106179020073.ppp-bb.dion.ne.jp] has quit [*.net *.split]
09:15:12 peccu1 [~peccu@KD106179020073.ppp-bb.dion.ne.jp] has joined #ccl
09:52:13 DataLinkD2 [~DataLinkD@101.175.65.1] has joined #ccl
09:53:46 -!- DataLinkDroid [~DataLinkD@1.144.70.106] has quit [Ping timeout: 256 seconds]
09:55:25 DataLinkDroid [~DataLinkD@101.175.65.1] has joined #ccl
09:55:33 -!- DataLinkD2 [~DataLinkD@101.175.65.1] has quit [Read error: Connection reset by peer]
10:00:00 -!- DataLinkDroid [~DataLinkD@101.175.65.1] has quit [Ping timeout: 256 seconds]
11:07:39 PuffTheMagic [uid3325@gateway/web/irccloud.com/x-guimktxcdemgamps] has joined #ccl
11:33:32 sesuncedu [~Adium@cpe-076-182-016-061.nc.res.rr.com] has joined #ccl
11:56:51 rme_ [~rme@50.43.190.179] has joined #ccl
12:04:43 -!- rme [~rme@50.43.190.179] has quit [Read error: Connection reset by peer]
12:04:44 -!- Krystof [~user@81.174.155.115] has quit [Ping timeout: 246 seconds]
12:04:44 -!- rme_ is now known as rme
12:40:54 Krystof [~user@81.174.155.115] has joined #ccl
13:41:47 beyeran [~beyeran@p54A90D93.dip0.t-ipconnect.de] has joined #ccl
13:43:15 -!- beyeran [~beyeran@p54A90D93.dip0.t-ipconnect.de] has quit [Client Quit]
13:43:55 -!- sellout- [~Adium@c-98-245-92-119.hsd1.co.comcast.net] has quit [Quit: Leaving.]
13:54:28 -!- alms_ [~alms_@209-6-130-32.c3-0.bkl-ubr1.sbo-bkl.ma.cable.rcn.com] has quit [Quit: alms_]
14:13:29 alms_ [~alms_@173-162-137-153-NewEngland.hfc.comcastbusiness.net] has joined #ccl
14:22:14 sellout- [~Adium@c-50-134-130-65.hsd1.co.comcast.net] has joined #ccl
16:53:29 -!- sesuncedu [~Adium@cpe-076-182-016-061.nc.res.rr.com] has quit [Quit: Leaving.]
17:15:52 sesuncedu [~Adium@cpe-076-182-016-061.nc.res.rr.com] has joined #ccl
19:45:01 -!- sellout- [~Adium@c-50-134-130-65.hsd1.co.comcast.net] has quit [Quit: Leaving.]
19:54:07 -!- sesuncedu [~Adium@cpe-076-182-016-061.nc.res.rr.com] has quit [Quit: Leaving.]
20:06:31 -!- pjb`` is now known as pjb
20:20:44 sellout- [~Adium@c-98-245-92-119.hsd1.co.comcast.net] has joined #ccl
20:31:38 Fare [fare@nat/google/x-myimhsmchviquuwz] has joined #ccl
21:17:07 dioxirane [~lqcd@unaffiliated/dioxirane] has joined #ccl
21:32:36 -!- dioxirane [~lqcd@unaffiliated/dioxirane] has quit [Quit: leaving]
21:39:28 DataLinkDroid [~DataLinkD@1.149.61.229] has joined #ccl
21:39:49 -!- Fare [fare@nat/google/x-myimhsmchviquuwz] has quit [Quit: Leaving]
21:54:00 -!- DataLinkDroid [~DataLinkD@1.149.61.229] has quit [Ping timeout: 256 seconds]
21:54:42 DataLinkDroid [~DataLinkD@120.154.131.2] has joined #ccl
21:59:50 -!- alms_ [~alms_@173-162-137-153-NewEngland.hfc.comcastbusiness.net] has quit [Quit: alms_]
22:00:48 -!- DataLinkDroid [~DataLinkD@120.154.131.2] has quit [Ping timeout: 256 seconds]
22:14:54 DataLinkDroid [~DataLinkD@1.149.237.54] has joined #ccl
22:29:08 -!- DataLinkDroid [~DataLinkD@1.149.237.54] has quit [Ping timeout: 256 seconds]
22:43:38 DataLinkDroid [~DataLinkD@123.208.33.105] has joined #ccl
22:50:15 patrickwonders [~Patrick@user-38q42ns.cable.mindspring.com] has joined #ccl
23:00:52 -!- DataLinkDroid [~DataLinkD@123.208.33.105] has quit [Ping timeout: 256 seconds]
23:06:44 -!- patrickwonders [~Patrick@user-38q42ns.cable.mindspring.com] has quit [Quit: Leaving]
23:15:13 DataLinkDroid [~DataLinkD@123.208.83.245] has joined #ccl
23:26:56 -!- DataLinkDroid [~DataLinkD@123.208.83.245] has quit [Ping timeout: 256 seconds]
23:40:44 DataLinkDroid [~DataLinkD@1.148.231.236] has joined #ccl
23:46:46 -!- DataLinkDroid [~DataLinkD@1.148.231.236] has quit [Ping timeout: 256 seconds]