My disk (on kvr05) is acting horribly slow. is there a system status update that might be relevant? up_the_irons?
up_the_irons - any idea about my slow disk? been crappy all morning
do I have a noisy neighbor again?
DAMN... this needs fixing.
1048576 bytes transferred in 22.494234 secs (46615 bytes/sec)
it took 22 seconds to write A MEGABYTE
up_the_irons ding ding ding
RandalSchwartz: yeah, i've been having intermittent problems with the host kvr05
can you migrate me to a different box then? it's interfering with business and tasks.
mine's fairly slow too. 52428800 bytes (52 MB) copied, 71.7589 s, 731 kB/s :/
RandalSchwartz: the first priority is to fix the issue with kvr05; i think it is a noisy neighbor, as you said
jdoe: are you able to reach your vps at all? (i assume perhaps so, b/c you could get those test results)
up_the_irons: yeah.
jdoe: and it is on kvr05?
not sure offhand. how would I tell?
jdoe: it's the same as your vnc host (listed in portal under vm details)
you can tell how often I log in ;)
sec.
:)
mm i'm on KVR05 and is sloww, my monitoring software is detecting the VM as down/up/down/up
i can't ssh into kvr05. serial console is responsive, however, it also won't let me in past me typing in my login name. i think the disk is completely locked up
kvr10 is doing fine. :D
ariel: although it is interesting that you can still get in, even though i can't get into the host.
up_the_irons: lol... I have at least two portal accounts, neither showing my vps.
nope, I'm on kvr06
jdoe: roger
seems better now, 6-ish M/s.
jdoe: i'm not touching kvr06 ;)
but 6M/s is still pretty slow
I know, I'm just saying whatever it was, it's improved slightly. And no argument ;)
kvr05, i'm in!
jdoe: roger :)
now to find out who is hogging the disk
prolly that schwartz guy.
randal actually _does_ use quite a bit of I/O, but not enough to take down the box.. ;)
I'm surprised any guest can make it *that* unresponsive. maybe someone is swapping?
up_the_irons: sorry for the delay on centos. I had to board up house and horses for the storm. I'll be unboarding and such tonight and tomorrow.. then can get back to work on it.
jdoe: well, kvm/qemu doesn't really have too much in the way of I/O isolation. Like, each VM is just a Linux process and if it wants to really load up the disk, the scheduler isn't gonna stop it
jpalmer: no problem at all, take your time. I'm actually going to make the 5.8 version :)
I saw a presentation from Verio a few years back where they had talked about all the custom work they had done to give a level of control over disk I/O, memory and CPU usage to their VPS product
they claimed they were going to release that back to the FreeBSD project, but never did :/
up_the_irons: linux does kinda suck for that.
I have come up with a new way of dealing with disk provisioning for Linux guests that will afford me a *much* easier time supporting more distros; therefore, I'm pumped to make CentOS 5.8 templates, then Ubuntu 12.04 after that, then who knows... Fedora, Gentoo, Arch, ...? you name it
LFS PLZ.
twobithacker: jdoe: yeah, i've seen several proprietary solutions talked about for that stuff, but nothing out in the open (but i admit, i haven't looked into it in a while)
as the list grows longer so does the amount of effort to maintain such a list. I know it won't affect our bottom line but still... ;-)
doesn't ionice work?
up_the_irons: yeah I dunno, it doesn't seem like a problem anyone is anxious to solve, despite the work that's gone into cfq and deadline. I dunno why.
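For reference, both throughput figures quoted above are standard dd write tests; a minimal sketch (the test file path and the conv=fdatasync flag are assumptions, not the exact commands used in channel):

    # 1 MiB in 1 KiB blocks; BSD dd reports "1048576 bytes transferred in ... secs"
    dd if=/dev/zero of=./ddtest bs=1024 count=1024

    # 50 MiB in 1 MiB blocks; GNU dd reports "52428800 bytes (52 MB) copied, ..."
    # conv=fdatasync (GNU dd) flushes to disk before timing ends, for an honest figure
    dd if=/dev/zero of=./ddtest bs=1M count=50 conv=fdatasync
    rm ./ddtest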
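On the "doesn't ionice work?" question: ionice does exist for exactly this, but it only takes effect under the cfq scheduler mentioned above; a sketch of deprioritizing one guest's qemu process from the host (the PID 12345 is hypothetical):

    # Idle class: the process gets disk time only when the disk is otherwise idle
    ionice -c 3 -p 12345

    # Or best-effort class at the lowest priority (-n 7; 0 is highest)
    ionice -c 2 -n 7 -p 12345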
I can load the shit out of the disk on fbsd, and the machine is still responsive. try that on linux and it craters.
toddf: well, that's the thing, up until now the effort to maintain such a list has been prohibitive; but, with my new disk provisioning strategy, no longer :)
I'm not in on all the details, but with a new disk provisioning strategy, doesn't that still mean someone has to manually test install a distribution for it to work?
jdoe: i think it is a problem that people have accepted; kinda like DoS attacks, they suck but what can ya do? ;)
toddf: yes, but only once, then the template is copied for every VM down the line. new distro versions don't come out _that_ often; openbsd is actually the most frequent publisher, where I find myself having to update my templates every 6 months ;)
up_the_irons: request multiple vps's across different kvr* hosts like randalschwartz, and make it a cluster that will lessen the effect of one kvr* with temporary io starvation ;-)
eh, ubuntu is on a 6 month cycle too isn't it?
toddf: wait wut? ;)
jdoe: is it? i haven't even noticed... for ubuntu i tend to do only the LTS versions
if you want non-LTS, a ubuntu fan can just get an LTS and dist-upgrade, very simple process with like 2 commands
up_the_irons: my 'request multiple vps' thing was in response to your 'what can ya do?' and my response involves more business for you *grin*
toddf: :)
up_the_irons: if you're tracking every version, then yeah it's every 6 months, most of the time. XX.04 and XX.10
roger
time for kvr05 to admit failure and reboot
up 968 days, 13:33, 5 users, load average: 11.82, 12.28, 20.87
of *course* i can't reach 1000, ever...
the load was getting better, but now it is back to worse
up_the_irons: re your comments earlier about protecting one guest from another on KVM, cgroups might be what you're after
didn't know those worked for io... wonder how well they work.
jdoe: Well, if we all start doing really heavy i/o usage, perhaps up_the_irons will find out for us? :-)
plett: i'll check it out, tnx
up_the_irons - there's no PST until november. :(
"at approximately 08/27/2012 14:30 PST" - not possible
but at least my system is working again. :)
Do we know the cause? Also, my VPS seems to be working fine.
it was just kvr05 acting up
900+ uptime days
RandalSchwartz: I'm on kvr05
arenlor: note past tense
jdoe: So the answer is no, we don't know what the cause was.
RandalSchwartz: bah, i always end up screwing up the date in some way
RandalSchwartz: so, no further issues now? (i don't see any)
arenlor: cause is unknown, but i simply suspect after 960+ days of uptime, something just got kinked. I've analyzed enough logs and i'm writing the resolution as "Power cycling the server fixed the high I/O wait issue" and leaving it at that. I'll take 960+ days of uptime and move on... :)
up_the_irons: quitters never win and winners never quit...
i usually win more sleep if i quit early :o
yeah - seems fine now
RandalSchwartz: cool
sako: so looks like you didn't make it to la dev ops? ;)
stop spying on me lol
up_the_irons, turns out we have a few friends in common, sako being one and Lars
pjs: you know sako and lars? hah, nice
pjs: do u go to LADevOps at all? perhaps pjs is there now...
we had 75 nicks in here earlier, a record!
up_the_irons: still do =)
Webhostbudd: still do wut? :)
oh, 75 nicks! Hmm. How many VPSes?
CaZe: I have two, what about you?
Infinity.
Nice trick, gave an integer overflow on the billing platform I'm sure.
holy crap, 76 nicks!
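The "very simple process with like 2 commands" mentioned above is presumably Ubuntu's standard release upgrader; a sketch, assuming a stock LTS install:

    # Make sure the upgrader is present, then run it
    sudo apt-get install update-manager-core
    sudo do-release-upgrade

    # An LTS offers only the next LTS by default; to step through non-LTS
    # releases, set Prompt=normal in /etc/update-manager/release-upgrades first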
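On plett's cgroups pointer above: the blkio controller can throttle per-device bandwidth for a group of processes, which is one way to cap a noisy KVM guest; a rough cgroup-v1 sketch (the mount point, device 8:0, the 10 MB/s cap, and the PID are all assumptions):

    # Mount the blkio controller and create a group for one guest
    mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
    mkdir /sys/fs/cgroup/blkio/vm1

    # Cap reads and writes on device 8:0 (sda) at 10 MB/s
    echo "8:0 10485760" > /sys/fs/cgroup/blkio/vm1/blkio.throttle.read_bps_device
    echo "8:0 10485760" > /sys/fs/cgroup/blkio/vm1/blkio.throttle.write_bps_device

    # Put the guest's qemu process (hypothetical PID) into the group
    echo 12345 > /sys/fs/cgroup/blkio/vm1/tasks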
up_the_irons: Yeah, so I decided to check out ARP Networks.
Aerosonic: ah, your nick looked new :) although it sounds oddly familiar...
I used to be on Linode? Maybe you know me from their old support chan
Aerosonic: hmm.. i was never on their old support chan; i used to be in the slicehost one... oh well, doesn't matter :)