***: Koshja has joined #arpnetworks
Koshi has quit IRC (Ping timeout: 250 seconds)
Koshja has quit IRC (Quit: Leaving)
ziyourenxiang has quit IRC (Quit: Leaving)
_iwc has quit IRC (K-Lined)
dne has quit IRC (Remote host closed the connection)
dne has joined #arpnetworks
BryceBot has quit IRC (Ping timeout: 256 seconds)
brycec has quit IRC (Ping timeout: 260 seconds)
Lucifer333 has joined #arpnetworks
BryceBot has joined #arpnetworks
brycec has joined #arpnetworks
BryceBot has quit IRC (Ping timeout: 260 seconds)
brycec has quit IRC (Ping timeout: 268 seconds)
KILLALLHUMANS01: ^ grrrr
sjackso: bad times in bryce land?
KILLALLHUMANS01: Yes.
VPS seems to be having some issues with its underlying storage. Or that's my hypothesis.
Load average suddenly spiked (over 100) and eventually it went "dead"
Now it's sort of up (haven't rebooted yet, would rather not), kind of responding on the VNC terminal
Actually I think it's very slowly rebooting (from when I accidentally clicked the "Send CtrlAltDel" button thinking it was the "Send key sequence..." menu I'm familiar with in other VM stuff)
Yep, exactly what I did. Well damn. Was hoping I could "recover" it.
***: BryceBot has joined #arpnetworks
mercutio: usually if load average spikes heaps it's due to lots of swapping
***: brycec has joined #arpnetworks
KILLALLHUMANS01: mercutio: Indeed there appears to have been a bunch of swapping that began to happen around the time, from the spotty information I have.
And if swapping isn't keeping up or worse is blocking then it all goes to shit.
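A minimal sketch of one way to check whether a load spike coincides with heavy swapping, assuming a Linux guest where /proc/vmstat exposes the `pswpin` and `pswpout` page counters. The function names and the 5-second sampling interval are illustrative choices, not anything used in the channel.

```python
import time


def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rate(before, after, interval_s):
    """Return pages swapped in/out per second between two vmstat samples."""
    b, a = parse_vmstat(before), parse_vmstat(after)
    return {
        key: (a[key] - b[key]) / interval_s
        for key in ("pswpin", "pswpout")
    }


def read_vmstat():
    """Read the live counters (Linux only)."""
    with open("/proc/vmstat") as fh:
        return fh.read()


if __name__ == "__main__":
    interval = 5  # seconds between samples; an arbitrary choice
    first = read_vmstat()
    time.sleep(interval)
    second = read_vmstat()
    print(swap_rate(first, second, interval))
```

Sustained non-zero rates during a load spike would be consistent with the swapping explanation, though they would not by themselves say which process is responsible.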
mercutio: on desktops i would normally be led to believe it was chrome's fault
well swap by definition blocks :)
brycec: (Right but I meant blocks and never returns...)
mercutio: oh right
did you get a process list?
you might be able to figure out what was causing it from that
brycec: Remarkably yes, as things sorted themselves out on the [accidental] shutdown, my top refreshed
mercutio: haha
brycec: Unfortunately, it says *everybody* was swapping.
Because they were.
Because it was 2+ hours since it started.
mercutio: was anything using lots of ram?
brycec: Just the daily backup job which has $never caused this before
but other variables may be at play.
mercutio: it may be lots of apache processes or such
none using that much on their own
but just with lots of them adding up...
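A rough sketch of how the "many small processes adding up" case could be spotted: total resident memory per command name rather than per process. It assumes a procps-style `ps` that accepts `-eo rss=,comm=` (RSS in KiB, headers suppressed); `rss_by_command` is a hypothetical helper name.

```python
import subprocess
from collections import Counter


def rss_by_command(ps_output):
    """Sum RSS (KiB) per command name from 'ps -eo rss=,comm=' output.

    Returns (command, total_kib) pairs, largest total first.
    """
    totals = Counter()
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        totals[parts[1].strip()] += int(parts[0])
    return totals.most_common()


if __name__ == "__main__":
    out = subprocess.run(
        ["ps", "-eo", "rss=,comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    for command, kib in rss_by_command(out)[:10]:
        print(f"{kib / 1024:10.1f} MiB  {command}")
```

Note that summed RSS overcounts pages shared between forked workers, so the totals are an upper bound rather than an exact figure.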
brycec: LOL Apache...
mercutio: although i suspect you may not use apache :)
well apache seems to be one of the common examples for that behaviour.
brycec: Actually I do, but it's a single small instance for handling DAV traffic to a single private vhost.
Anyhow, from iotop: Actual DISK READ: 3.85 M/s | Actual DISK WRITE: 80.04 M/s
mercutio: that's decent speed for swap.
brycec: It's one of the newer nodes no less :)
kct03
mercutio: i gathered that
the old nodes wouldn't swap that quick..
brycec: haha, exactly
mercutio: but yeah it doesn't sound like it's a disk performance issue so much as some kind of extra ram utilisation.
brycec: Indeed.
It swapped itself to death
Some sort of race condition or the like that caused it to swap harder and harder and harder; the OOM-killer never did a thing, it seems :(
(oh this explains part of the load - it was time for the full, not incremental, backup)
mercutio: i have never had much luck with the oom killer
it often kills the wrong thing
and like apache is spawning way too many processes that are bloating up
so it kills mysql
brycec: lol, whatever it takes to keep people from browsing your LAMP website right? :P
I wouldn't say I rely on OOM-killer, I was just surprised it didn't kick in.
mercutio: tbh, i haven't had any exposure to the OOM killer in a long time.
-: brycec restarts the backup... let's see if it blows up again :)
***: BryceBot has quit IRC (Ping timeout: 240 seconds)
brycec has quit IRC (Ping timeout: 248 seconds)
KILLALLHUMANS01: There we go. Broke it again.
[ 8160.104100] INFO: task jbd2/dm-4-8:404 blocked for more than 120 seconds.
That's why I wonder if there are issues with the underlying storage... Or Linux. Or the host. Or some driver.
(Yes I slammed it with some relatively high load activity - copying from one large database to another, so big chunk of RAM getting used I imagine)
At present, it appears the kernel's waiting for data to get written to disk / filesystem writes to commit, and it's just blocking.
http://sprunge.us/PBgJ
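A small sketch for pulling hung-task warnings like the one quoted above out of saved kernel log text, so it is easier to see which tasks were blocked and when. The regular expression is modelled only on the single message shown here; other kernels may format the line differently, and `find_hung_tasks` is a hypothetical helper name.

```python
import re

# Matches lines such as:
# [ 8160.104100] INFO: task jbd2/dm-4-8:404 blocked for more than 120 seconds.
HUNG_TASK = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(log_text):
    """Return (uptime_seconds, task_name, pid, blocked_seconds) tuples."""
    hits = []
    for m in HUNG_TASK.finditer(log_text):
        hits.append(
            (float(m["ts"]), m["comm"], int(m["pid"]), int(m["secs"]))
        )
    return hits


if __name__ == "__main__":
    import sys

    # Usage (assumed): dmesg | python3 this_script.py
    for ts, comm, pid, secs in find_hung_tasks(sys.stdin.read()):
        print(f"{ts:12.3f}s  {comm} (pid {pid}) blocked > {secs}s")
```

In the quoted line the blocked task is jbd2, the filesystem journaling thread, which fits the description of writes waiting to commit; the parser itself does not distinguish between slow storage and memory pressure as the cause.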
JC_Denton: anyone here use mutt?
KILLALLHUMANS01: A bit in the past. I liked it, but I was never "great" at it.
JC_Denton: trying to run down an annoying bug. was hoping someone was using the latest with the integrated sidebar and running a mac client.
KILLALLHUMANS01: integrated sidebar? (the fact I have no idea what that is should give you an idea how minimal my experience has been)
JC_Denton: yeah, in 1.7 they merged in the popular sidebar patch
makes for a nice addition to the interface
KILLALLHUMANS01: Damn... Ended up having to hard power-off/on my VPS :(
***: BryceBot has joined #arpnetworks
brycec has joined #arpnetworks
ziyourenxiang has joined #arpnetworks
nathani: brycec: do you think something is up with the disks presented to kct0* hosts via ceph?
***: dne has quit IRC (Ping timeout: 260 seconds)
staticsafe has quit IRC (Ping timeout: 250 seconds)
staticsafe has joined #arpnetworks
dne has joined #arpnetworks