[05:20] *** Koshi has joined #arpnetworks
[06:57] *** Koshja has joined #arpnetworks
[06:57] *** Koshi has quit IRC (Ping timeout: 250 seconds)
[07:31] *** Koshja has quit IRC (Quit: Leaving)
[07:55] *** ziyourenxiang has quit IRC (Quit: Leaving)
[09:20] *** _iwc has quit IRC (K-Lined)
[11:36] *** dne has quit IRC (Remote host closed the connection)
[11:39] *** dne has joined #arpnetworks
[12:23] *** BryceBot has quit IRC (Ping timeout: 256 seconds)
[12:24] *** brycec has quit IRC (Ping timeout: 260 seconds)
[13:33] *** Lucifer333 has joined #arpnetworks
[13:36] *** BryceBot has joined #arpnetworks
[13:40] *** brycec has joined #arpnetworks
[13:50] *** BryceBot has quit IRC (Ping timeout: 260 seconds)
[13:50] *** brycec has quit IRC (Ping timeout: 268 seconds)
[14:28] ^ grrrr
[14:34] bad times in bryce land?
[14:34] Yes.
[14:35] VPS seems to be having some issues with its underlying storage. Or that's my hypothesis.
[14:35] Load average suddenly spiked (over 100) and eventually it went "dead"
[14:36] Now it's sort of up (haven't rebooted yet, would rather not), kind of responding on the VNC terminal
[14:36] Actually I think it's very slowly rebooting (from when I accidentally clicked the "Send CtrlAltDel" button thinking it was the "Send key sequence..." menu I'm familiar with in other VM stuff)
[14:37] Yep, exactly what I did. Well damn. Was hoping I could "recover" it.
[14:40] *** BryceBot has joined #arpnetworks
[14:41] usually if load average spikes heaps it's due to lots of swapping
[14:45] *** brycec has joined #arpnetworks
[15:02] mercutio: Indeed there appears to have been a bunch of swapping that began to happen around the time, from the spotty information I have.
[15:02] And if swapping isn't keeping up or worse is blocking then it all goes to shit.
[15:02] on desktops i would normally be led to believe it was chrome's fault
[15:03] well swap by definition blocks :)
[15:04] (Right but I meant blocks and never returns...)
[15:04] oh right
[15:04] did you get a process list?
[15:04] you might be able to figure out what was causing it from that
[15:04] Remarkably yes, as things sorted themselves out on the [accidental] shutdown, my top refreshed
[15:04] haha
[15:05] Unfortunately, it says *everybody* was swapping.
[15:05] Because they were.
[15:05] Because it was 2+ hours since it started.
[15:05] was anything using lots of ram?
[15:05] Just the daily backup job which has $never caused this before
[15:05] but other variables may be at play.
[15:06] it may be lots of apache processes or such
[15:06] none using that much on their own
[15:06] but just with lots of them adding up...
[15:06] LOL Apache...
[15:06] although i suspect you may not use apache :)
[15:06] well apache seems to be one of the common examples for that behaviour.
[15:06] Actually I do, but it's a single small instance for handling DAV traffic to a single private vhost.
[15:07] Anyhow, from iotop: Actual DISK READ: 3.85 M/s | Actual DISK WRITE: 80.04 M/s
[15:07] that's decent speed for swap.
[15:07] It's one of the newer nodes no less :)
[15:07] kct03
[15:07] i gathered that
[15:08] the old nodes wouldn't swap that quick..
[15:08] haha, exactly
[15:08] but yeah it doesn't sound like it's a disk performance issue so much as some kind of extra ram utilisation.
[15:09] Indeed.
[15:09] It swapped itself to death
[15:09] Some sort of race condition or the like that caused it to swap harder and harder and harder, the OOM-killer never did a thing it seems :(
[15:11] (oh this explains part of the load - it was time for the full, not incremental, backup)
[15:11] i have never had much luck with the oom killer
[15:11] it often kills the wrong thing
[15:11] and like apache is spawning way too many processes that are bloating up
[15:11] so it kills mysql
[15:13] lol, whatever it takes to keep people from browsing your LAMP website right? :P
[15:13] I wouldn't say I rely on OOM-killer, I was just surprised it didn't kick in.
[15:13] tbh, i haven't had any exposure to the OOM killer in a long time.
[15:15] * brycec restarts the backup... let's see if it blows up again :)
[16:52] *** BryceBot has quit IRC (Ping timeout: 240 seconds)
[16:54] *** brycec has quit IRC (Ping timeout: 248 seconds)
[17:03] There we go. Broke it again.
[17:03] [ 8160.104100] INFO: task jbd2/dm-4-8:404 blocked for more than 120 seconds.
[17:03] That's why I wonder if there are issues with the underlying storage... Or Linux. Or the host. Or some driver.
[17:04] (Yes I slammed it with some relatively high load activity - copying from one large database to another, so big chunk of RAM getting used I imagine)
[17:05] At present, it appears the kernel's waiting for data to get written to disk / filesystem writes to commit, and it's just blocking.
[17:06] http://sprunge.us/PBgJ
[17:09] anyone here use mutt?
[17:10] A bit in the past. I liked it, but I was never "great" at it.
[17:11] trying to run down an annoying bug. was hoping someone was using the latest with the integrated sidebar and running a mac client.
[17:12] integrated sidebar? (the fact I have no idea what that is should give you an idea how minimal my experience has been)
[17:13] yeah, in 1.7 they merged in the popular sidebar patch
[17:13] makes for a nice addition to the interface
[17:14] Damn... Ended up having to hard power-off/on my VPS :(
[17:16] *** BryceBot has joined #arpnetworks
[17:22] *** brycec has joined #arpnetworks
[18:39] *** ziyourenxiang has joined #arpnetworks
[18:42] brycec: do you think something is up with the disks presented to kct0* hosts via ceph?
[20:03] *** dne has quit IRC (Ping timeout: 260 seconds)
[20:04] *** staticsafe has quit IRC (Ping timeout: 250 seconds)
[20:05] *** staticsafe has joined #arpnetworks
[23:40] *** dne has joined #arpnetworks
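
Note on the swapping discussion above: the log only has a single iotop snapshot and a post-mortem top, so it is hard to tell when the swap storm started. One way to catch it earlier might be to sample the kernel's swap counters at intervals. The following is only a rough sketch, not something anyone in the channel ran: `pswpin` and `pswpout` are the page-swap counters Linux exposes in /proc/vmstat, while the function names (`parse_vmstat`, `swap_rates`, `sample`) and the 10-second interval are made up for illustration.

```python
import time


def parse_vmstat(text):
    """Parse 'name value' lines (the /proc/vmstat layout) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, interval_s):
    """Pages swapped in and out per second between two parsed samples."""
    return {
        key: (after[key] - before[key]) / interval_s
        for key in ("pswpin", "pswpout")
    }


def sample(interval_s=10, path="/proc/vmstat"):
    """Read the counters twice, interval_s apart, and return the rates.

    Linux-only: assumes `path` exists and contains pswpin/pswpout.
    """
    with open(path) as f:
        before = parse_vmstat(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_vmstat(f.read())
    return swap_rates(before, after, interval_s)
```

Calling `sample()` from a cron job or a small loop and logging the result would leave a trail showing whether swap-out climbed steadily before the box stopped responding. The rates are in pages per second, not bytes, so they are not directly comparable to the iotop figures quoted above.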