***: nathani has joined #arpnetworks
mhoran: I'm having some funky IO issues on my VPS, anyone else?
mike-burns: Update: I wasn't before, but now I am.
Took me a minute to SSH in.
mhoran: Maybe something going on with the ceph cluster?
brycec: My loadavg is unusually high, seems to stem from time spent waiting on disk IO.
aka "me too"
Oct 04 08:13:47 vps3 kernel: sd 0:0:0:1: [sdb] tag#9 abort
:X
That's my disk from the spinning-rust Ceph pool...
-: brycec fires the up_the_irons and mercutio flare
***: up_the_irons2 has joined #arpnetworks
up_the_irons2 is now known as up_the_irons
up_the_irons: why do I keep getting booted from the chan
***: up_the_irons is now known as Guest54746
brycec: Because your client isn't identifying to Nickserv before joining the channel, Guest54746 ?
Guest54746: but I do have it set to identify
It keeps renaming me
brycec: heh, likely because nickserv thinks your nick's in use somewhere on the network, you may have to ghost it to get it back
***: Guest54746 has quit IRC (Quit: WeeChat 1.2)
brycec: Anyway, IRC aside, let's get ceph back :)
***: up_the_irons2 has joined #arpnetworks
brycec: Not seeing any more kernel disk errors at least, and services are slowly beginning to respond. Hooray
up_the_irons2: I'm seeing the same on my end
A failing disk was finally kicked out of the cluster
and the cluster is rebalancing...
the failed disk is in the SATA pool, so the SSD pool _shouldn't_ have been affected
brycec: (I only noticed it in the SATA pool, fwiw)
up_the_irons2: Roger that
How's things looking for others?
Cluster is almost completely rebalanced
brycec: (Rebalancing is magic)
(Also, don't let your datacenter hands/monkeys *not* label drives... I pulled 4 drives before I found the right ones. Thank goodness for rebalancing.)
mike-burns: Things seem to be back to normal-ish.
up_the_irons2: Of course, everything is labeled
***: ziyourenxiang_ has quit IRC (Ping timeout: 268 seconds)
ziyourenxiang_ has joined #arpnetworks