#arpnetworks 2019-05-07,Tue


Who	What	When
brycec	Yeah I've been happy with DarkSky as a user for awhile and, when it came time to replace wunderground in BryceBot (they shut-down their API), DarkSky was my first choice.	[07:59]
	So ARP Networkians, I'm looking for some advice/info on virtualization and storage. What are thoughts on building a Ceph cluster vs buying something like a TrueNAS that has a fancy multi-year warranty (next-day replacement parts etc), support, a nice GUI etc? The appeal of a TrueNAS is that I'm already familiar with FreeNAS, I like ZFS, and someone else can maintain it if I've moved on from the company (or otherwise, not around), and it has a single company providing Enterprise(tm) support (warranty, support line for whoever has to take over, etc). Oh and it's built and burned-in to high hell before it's shipped to us. Cons: Cost. Ceph is fairly unknown to me and I'm still reading-up to understand things and I wanted to ask ARP (up_the_irons et al) who did their own migration to Ceph and rely on it now. * What's management like? Is it making API calls directly (from curl), some other CLI commands, is there a GUI that could be used by "tier 1 helpdesk"?	[08:06]
m0unds	no experience with ceph, but truenas stuff is really nice - had ~100TB across two chassis w/extensions at work for archival storage	[08:17]
brycec	(Lol I got interrupted by a conference call)	[08:18]
m0unds	not sure if it's worth the cost premium over rolling your own cluster unless you expect hardware failures or something (never had to use the warranty, but the folks @ ixsystems were great during sales and whatnot)	[08:18]
brycec	m0unds: Something to keep in mind is that I work 100% remote so I'm not the one building and installing anything. Having something delivered "ready to go" is a plus.	[08:19]
m0unds	oh, okay, in that case, completely worth it haha	[08:19]
brycec	The price is pretty steep for our budget so I'm looking for alternatives (con call at 1h20 now...)	[08:22]
	* What about "enterprise support"? Is there some company we can say "hey, help us and replace anything that breaks for the next 3 years"? Or would that just be ourselves/whatever hardware vendor we buy from (if we buy something from HP/Dell/Lenovo)	[08:28]
	* How is capacity and expansion? How is capacity planning "handled"? If I want 20TB with some modicum of redundancy, am I building two+ servers with 5x4TB drives (for instance)? Or maybe 4 servers at 10TB+ each? And if I need more storage, just... build another and add it?	[08:41]
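The capacity question above comes down to simple arithmetic for a replicated pool: usable space is roughly raw space divided by the number of copies. A minimal sketch follows; the function names and the `fill_target` headroom parameter are hypothetical, and the 0.85 figure is only an assumed safety margin, not something stated in the discussion.

```python
import math

def usable_tb(hosts, disks_per_host, disk_tb, copies, fill_target=1.0):
    """Approximate usable capacity of a replicated pool, in TB."""
    raw = hosts * disks_per_host * disk_tb
    # Every object is stored `copies` times; fill_target leaves headroom.
    return raw / copies * fill_target

def drives_needed(usable_target_tb, disk_tb, copies, fill_target=1.0):
    """Number of drives required to reach a usable capacity target."""
    raw_needed = usable_target_tb * copies / fill_target
    return math.ceil(raw_needed / disk_tb)

# Two servers with 5x4TB drives and two copies of everything.
print(usable_tb(2, 5, 4, copies=2))
# Drives for 20TB usable at three copies, with assumed 15% headroom.
print(drives_needed(20, 4, copies=3, fill_target=0.85))
```

The same functions can be re-run with a different host count or copy count to compare layouts before buying hardware.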
	* How's resource usage/planning? Is Ceph particularly memory-hungry (as ZFS is known to be)? Is IO CPU-bound, disk- or network-bound?	[08:48]
	(To clarify: Those last couple of questions can be found online but I'm interested in ARP's experience specifically.)	[08:53]
	btw m0unds just curious if you recall and are willing to share - What did that cost you, rounded to the nearest $10k? What sort of high-availability did that support? Was it being used for VM disks (requiring super-low latency and relatively high IOPS) or something less performance-sensitive? Right now my biggest concern is that I'll spend a 5-digit chunk of money on something that doesn't perform up to snuff, effectively repeating the mistakes of past admins here. And I'll be doubly embarrassed if there was a cheaper, better alternative like Ceph that I had just skipped over. (And naturally, there's a time pressure to all this - the sooner it's deployed, the better, yesterday would be ideal) * What sort of monitoring exists, either implemented in Ceph, or on top of Ceph, or beneath Ceph, for things like node failures or, more concerning to me, drive failures?	[09:03]
........................... (idle for 2h11mn)
m0unds	brycec: I want to say it was $14-18k per device, and this was back in 2013 or so; use was purely archival, so very low moment-to-moment demand. just dump stuff, then replicate one unit to the other via network (one onsite, one offsite)	[11:17]
brycec	m0unds: Ah cool, thank you!	[11:25]
m0unds	looking to see whether i might still have the old outlook pst in a backup	[11:26]
....... (idle for 33mn)
	here we go, found it brycec: $15k nearest round number per chassis + $2k per unit for extended software support, 3yr adv replacement, external sas kit that was w/4TB 7200rpm SAS disks circa 12/2013	[11:59]
brycec	Their prices haven't changed terribly much, which is a little disappointing. 44TB raw is about $11k (I don't remember what drives). The more I'm reading on Ceph, the more excited I am for it as a technology, and wishing I had hardware laying around to build up a testbed. I'm real curious about getting a few chassis from 45Drives (has "fancy warranty"), or maybe just some SuperMicro stuff and building from scratch. Price isn't a whole lot better (45Drives would be about $8k/chassis w/o drives), but... tempting all the same.	[12:05]
......... (idle for 43mn)
	More Ceph questions for ARP: What does ARP's Ceph architecture look like? Are the VM hosts also running monitors and/or OSD, or are those on isolated/dedicated machines? What is ARP's expansion plan, replace existing OSD/disks with larger ones, add a new server full of disks when you reach some watermark, add a new server with a few disks and just steadily add disks as capacity requirements grow (or something else)? (Oh and insert "MDS" wherever applicable in my question, such as "running monitors and/or MDS and/or OSD". Forgot about that.)	[12:50]
m0unds	wow, prices are still that high? yeesh lol, "the storinator" @ 45drives	[12:58]
mercutio	brycec: storage hosts and vm machines are separate. we did the migration ourselves. main thing about ceph is you want at least 4 servers as a starting point really, as you want 3 way replication and you want to be able to have one host go down. reads normally come from one of the 3 copies rather than being split, meaning the read performance of your drives has quite a big correlation with your read speeds for data. that said, with readahead that can change a bit, and if you do parallel requests to different non-4MB areas etc.	[13:03]
	i did a test bed myself first, and lots of reading..	[13:11]
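The "at least 4 servers for 3 way replication" point can be checked with a small sketch. It assumes one copy per host (a host-level failure domain); the function name is hypothetical and this is an illustration of the reasoning, not Ceph's own logic.

```python
def can_self_heal(hosts, copies, failed=1):
    """True if the surviving hosts can still hold `copies` distinct replicas.

    Assumes each replica must live on a different host.
    """
    surviving = hosts - failed
    return surviving >= copies

for hosts in (3, 4, 10):
    state = "re-replicates to 3 copies" if can_self_heal(hosts, 3) else "stays degraded"
    print(f"{hosts} hosts, one down: {state}")
```

With 3 hosts and 3 copies, losing a host leaves only two places to put data, so the cluster runs degraded until the host returns; with 4 or more it can restore the third copy elsewhere.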
m0unds	how's the cluster connected to vms? gbe?	[13:12]
mercutio	infiniband	[13:12]
m0unds	cool are the spinning disks like 15krpm sas or something?	[13:12]
mercutio	we've got a mix of storage. that is one of the hardest things to decide when starting out: what are your performance needs, and how wide do you want to go with what disks to achieve that. one cool thing about ceph vs local storage is that it's much harder for single clients to impact other users, where on local storage some nasty neighbours can really impact you.	[13:14]
m0unds	right, contending for local disk i/o	[13:18]
mercutio	also you can move/reboot single servers so like when we moved our cluster we moved it one server at a time..	[13:18]
m0unds	io perf is really good on both ssd and spinning disk w/the "thunder" vms - that's why i was curious	[13:24]
mercutio	it's good, not great. i mean it parallelises well, and linux does some readahead at least. the spinning disk pool is also quite wide. like i think that's probably one of the things that people underestimate the need of for ceph when coming from traditional storage... you don't really want to have 4 storage servers, you want to have 10. so it's better to have 10x10 disk hosts than 4x25 disk hosts. my test setup was 4 nodes	[13:28]
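One way to see why 10x10 beats 4x25 is to look at what a single host failure costs. The sketch below assumes data is spread evenly across identical disks and hosts; the function name is hypothetical.

```python
def host_failure_impact(hosts, disks_per_host):
    """Return (share of data on the failed host, extra disks-worth per survivor)."""
    share_lost = 1 / hosts
    # The failed host's data must be re-created on the remaining hosts.
    extra_per_survivor = disks_per_host / (hosts - 1)
    return share_lost, extra_per_survivor

for hosts, disks in ((10, 10), (4, 25)):
    share, extra = host_failure_impact(hosts, disks)
    print(f"{hosts}x{disks}: {share:.0%} of data affected, "
          f"{extra:.2f} disks-worth absorbed per surviving host")
```

Both layouts have 100 disks, but the wider one loses a smaller share of data per failure and spreads the recovery work across more machines.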
......... (idle for 41mn)
brycec	"also you can move/reboot single servers" hence having 4+ (with copies=3). If you only had 3 servers and copies=3, it would need to rebalance once the third came back, yeah? mercutio: What's management like for having many (almost?) identically-configured OSD hosts? Like, what kind of maintenance is there to do beyond applying OS updates, anything? Or does Ceph just keep on ticking for the most part?	[14:15]
mercutio	well you can set minimum copies to 2 if you want to test with a couple of hosts. you can set minimum copies to 1 even, and then it can still write. if you don't have enough to guarantee then it'll block all writes. umm maintenance isn't that bad. if you want to go in depth probably should do a consult.	[14:19]
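The "minimum copies" behaviour described above can be modelled as a simple threshold. In Ceph the pool settings are called `size` and `min_size`; the function below is only a hypothetical illustration of the rule, not Ceph code.

```python
def pool_state(live_copies, size, min_size):
    """Classify a placement group by how many replicas are currently reachable."""
    if live_copies >= size:
        return "healthy"
    if live_copies >= min_size:
        return "degraded, still writable"
    return "writes blocked"

# A pool with 3 copies and a minimum of 2, as replicas drop out one by one.
for live in (3, 2, 1):
    print(live, "->", pool_state(live, size=3, min_size=2))
```

Lowering `min_size` to 1 keeps writes flowing with a single surviving copy, at the cost of having no redundancy for data written in that window.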
brycec	Oh right, ARP has a side-hustle lol. Definitely don't mean to impinge on that, and not looking to get too in-depth. It (maintenance) was just something I hadn't seen much mention of in my reading so far. And of course as a customer I'm always interested in the underlying infrastructure.	[14:33]
mercutio	well i suppose the biggest advantage to maintenance is that you can fail disks and then replace them when you like. so usually on local storage the disk will fail at some inopportune time and then you need to go swap it out. (i mean when is an opportune time for a disk to fail?) so as long as you have extra disks in the cluster (which you should) then you can replace them whenever	[14:35]
brycec	What happens when a disk fails, does the OSD process just crash or do you have to stop it manually?	[14:37]
mercutio	it can fail it out or not. occasionally a disk partially fails and needs to be forced out. but at least no dc visit :)	[14:38]
brycec	Ceph certainly seems much more flexible than a RAID or ZFS solution in that regard. Lose X disks with those and you're just hoping you don't lose anymore (barring hot-spares which are basically wasted space). Ceph, you can lose as many disks as you like so long as you still have the required $capacity * $copies	[14:40]
mercutio	yeah if using zfs you can always have hot spares. that's what we have in germany. if you're using servers that can take a lot of disks then hot spares isn't as bad too	[14:40]
brycec	But those hot spares are literally wasting space in the chassis. I'd rather put it to work (in a still-resilient manner), not to mention I'd know if it were a dud much sooner than if I have to wait until the spare's needed and it craps out then.	[14:41]
mercutio	yeh accounting for 3x makes it a bit harder to justify. yeah well in germany it's good because it's harder to replace the disks. zfs isn't a crazy way to go for smaller setup	[14:41]
brycec	Also, I'm in love with the expandability of Ceph. ZFS has hoops to jump through (adding additional vdevs of matching or greater capacity etc) which means detailed planning AND expense right now. Ceph seems like you can just throw disks/servers at it whenever you have spare capital.	[14:43]
mercutio	so like if the idea of having 4 servers even seems extreme then zfs may be better :) yeah you don't have to match sizes	[14:43]
brycec	Ooh that too Nah 4 servers doesn't sound extreme, though I'm strongly considering just 2 or 3 to start in order to ease the budget blow.	[14:44]
mercutio	i'd say it'd be better to go second hand if budget is tight 2 or 3 isn't really enough to go with ceph	[14:44]
brycec	Yeah I'm writing an email now to get a feel from the people actually driving the decision/holding the purse strings on that.	[14:45]
mercutio	https://www.instagram.com/p/BmuOhtcF9HE/	[14:46]
BryceBot	Instagram: "Some of our gear in Frankfurt, Germany #datacenter #datacentre #datacenters #server #servers #cloudcomp #serverroom #serverrack #sysadmin #tech #technology #techgeek #wiring #cat5 #cabling #cables #wires #behindthescenes #networking #networkengineer #cableporn #datacenterporn #switches #arpnetworks #cisco" by arpnetworks	[14:46]
brycec	Ceph definitely seems better suited to the cheap off-the-shelf (and eBay) approach / I can get away with stuff that doesn't have a "fancy warranty" since there's more hardware with redundancy from the get-go. (If 1 of 10 cheap Dell eBay servers dies, so what, buy a replacement and install it when there's time)	[14:46]
mercutio	so kzt01/kzt02 each take 25 disks	[14:47]
brycec	Are they VM hosts? They look like they're also VM hosts (based on the naming scheme)	[14:47]
mercutio	yeah they're our zfs ones which are local storage	[14:47]
brycec	Ahhh so Frankfurt != Ceph	[14:48]
mercutio	it's harder to justify in germany building a ceph cluster yeah	[14:48]
brycec	Makes sense	[14:48]
mercutio	it's got ssd zil though. we haven't had any disk performance issues in germany either. having lots of disks helps that hah. but yeah, so like for instance you could do 8x25 disk hosts and then find you have 200 disks. those are 2.5" though. VM is different from archival storage though. 4tb disks is too big for VM, you're going to crash and burn..... you get a lot more read/write load with virtual machine hosting compared to backup/archival space or such	[14:48]
brycec	Meanwhile in the states https://www.instagram.com/p/BwoAGuLAvDe/ I'm gathering SCT* are VM hosts, and the couple of chassis full of 2.5" disks are Ceph storage?	[14:51]
BryceBot	Instagram: "More migrated gear #datacenter #datacentre #datacenters #server #servers #cloudcomp #serverroom #serverrack #sysadmin #tech #technology #techgeek #wiring #cat5 #cabling #cables #wires #behindthescenes #networking #networkengineer #cableporn #datacenterporn #switches #arpnetworks #hp #supermicro" by arpnetworks	[14:51]
mercutio	yeah sounds about right there's no 2.5" disks in that picture	[14:51]
brycec	What are the HP machines below, unlabeled, do you know?	[14:52]
mercutio	think they're hp g8s	[14:52]
brycec	I meant what are they used for lol	[14:53]
mercutio	oh VM	[14:53]
brycec	brycec wishes Instagram's photos were bigger/higher res	[14:53]
mercutio	you can half read on the side zoom in and you can read kct13, kct12, kct11, kct10 from top to bottom	[14:53]
brycec	Well at least I'm pretty sure this is all Ceph given the tags, lol https://www.instagram.com/p/BwhyfBHAJUd/	[14:54]
mercutio	oh they are 2.5"	[14:54]
BryceBot	Instagram: "More in progress Ceph migration, then finally complete! #datacenter #datacentre #datacenters #server #servers #cloudcomp #serverroom #serverrack #sysadmin #tech #technology #techgeek #wiring #cat5 #cabling #cables #wires #behindthescenes #networking #networkengineer #cableporn #datacenterporn #switches #arpnetworks #hp #supermicro" by arpnetworks	[14:54]
mercutio	yeah that's hard to read too :)	[14:54]
brycec	I really like the full view in https://www.instagram.com/p/BwDSI68ASEX/ (thanks up_the_irons )	[14:54]
BryceBot	Instagram: "Some of our gear from outside the cage #datacenter #datacentre #datacenters #server #servers #cloudcomp #serverroom #serverrack #sysadmin #tech #technology #techgeek #wiring #cat5 #cabling #cables #wires #behindthescenes #networking #networkengineer #cableporn #datacenterporn #switches #arpnetworks #supermicro #hp #apc" by arpnetworks	[14:54]
brycec	(I realize that's pre-move, yes) It gives an idea of the scale of VM hosts and storage hosts	[14:55]
mercutio	yeah we're more dense for vm hosts than storage because vm hosts don't need as much space for the disks :) i think things have been pretty smooth with ceph/kvm but people only really mention performance or such if things are terrible	[14:55]
brycec	And I gather you're gradually moving to those 1U HP G8s, replacing the older hosts that still did local storage.	[14:57]
mercutio	no kvm hosts in los angeles use local storage for customers so yeah that's actually done :) which made the move easier	[14:57]
brycec	I can only imagine. Thanks for letting me bend your ear, mercutio !	[15:00]
mercutio	i've heard of a few other people doing ceph clusters now it is growing in popularity	[15:01]
brycec	Random idea - Is there such a thing as "resellable" Ceph / Ceph as-a-service? Some way that a provider such as ARP could grant access to the overall cluster and let a customer create pools for themselves? Or does that just defy reason?	[15:02]
mercutio	i don't think security model would be good for that in current ceph you could do iscsi rbd or such	[15:02]
brycec	I framed that as an "ARP offering" to keep it mildly on-topic. I'm also thinking about for $work in a year or two's time -- I maintain the storage and grant my users (departments etc) access to a subset so they can manage their own VM crap. (Less devops, more devs and ops)	[15:04]
mercutio	the closest i can think that could work ok is iscsi rbd but you'd still have to create iscsi devices per volume you want to export you can actually potentially boot off iscsi rbd now i want to try that ;) http://docs.ceph.com/docs/mimic/rbd/iscsi-overview/	[15:04]
brycec	K yeah, that doesn't quite fill the hole, but not the end of the world. So long as I continue maintaining the Proxmox servers (and the permissions in Proxmox) they can create/destroy as many VMs as they need without directly managing/handling RBD creation (Proxmox's GUI is pretty much "How much space do you need? Okay, done." and it creates the requisite RBD in the pool etc etc)	[15:07]
mercutio	well i suppose you don't need to have an iscsi volume per vm	[15:08]
brycec	That's pretty cool though!	[15:08]
mercutio	you could just have a fs on top and then that'd work with proxmox	[15:08]
brycec	Proxmox will let you build a whole zpool on top of iSCSI even (I'm sure there must be a scenario where that makes sense... I just know it's doable)	[15:08]
mercutio	yeah that sounds like fun the main thing is getting more than gbe to hosts... i suppose you could always use ib for it	[15:09]
brycec	Yeah we're planning to standardize on 10GbE RJ45 (why RJ45? I don't know)	[15:09]
mercutio	i wouldn't go for rj45... sfp+ or ib. ib is qsfp, so is 40gbe	[15:10]
brycec	That decision wasn't mine :p Though I do remember why they did... because they don't have a 10GbE switch yet and wanted to be able to use it on a GbE switch. sigh	[15:10]
mercutio	some people have had issues with 10gbe and performance btw	[15:10]
brycec	Ugh now you tell me :p	[15:11]
mercutio	heh now is when you mentioned it :)	[15:11]
brycec	brycec is just responsible for the storage end of this project	[15:11]
mercutio	10gbe to clients isn't so bad	[15:11]
brycec	What sort of performance issues? What should I be googling?	[15:11]
mercutio	but if you're sharing that 10gbe to clients with 10gbe between nodes then it can congest. you could just get arista 40gbe switch? and 40gbe cards?	[15:11]
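The congestion concern can be roughed out by counting how often each written byte crosses the network. In a replicated pool the client sends data once to the primary OSD, which then forwards it to the other replicas. The sketch below only counts bytes on the wire; the function name is hypothetical and real throughput depends on topology, duplex, and how traffic is spread across links.

```python
def write_traffic_gb(client_write_gb, copies):
    """Return (client-to-storage GB, storage-to-storage replication GB)."""
    client_side = client_write_gb
    # The primary forwards the data to each of the other (copies - 1) replicas.
    replication = client_write_gb * (copies - 1)
    return client_side, replication

client, repl = write_traffic_gb(100, copies=3)
print(f"client traffic: {client} GB, replication traffic: {repl} GB")
```

If both kinds of traffic share the same 10GbE network, a write-heavy workload puts roughly three times the written volume on it at three copies, which is why a separate or faster network between storage nodes is attractive.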
brycec	We'll see. Money's already been spent...	[15:12]
mercutio	ah	[15:12]
brycec	This project hasn't been terribly well coordinated from the get-go. I found out I was responsible for anything a week after they'd ordered VM hosts and asked me where my part was.	[15:13]
mercutio	hah so you have to come up with storage solution? and they have vm hosts with 10gbe?	[15:13]
brycec	Very embarrassing to be legitimately saying "Oh, I didn't know I was supposed to do anything." (unlike in school when I'd use it as a bullshit lie)	[15:14]
mercutio	yeh it sucks. it's not that uncommon though, because someone says that person will deal with it, then someone else says that person, and then everyone assumes that person knows this :)	[15:14]
brycec	Yep that's me apparently, responsible for managing all of it, but only responsible for purchasing storage. (And not the one installing any of it, physically) And yeah, the VM hosts were purchased with Intel 2x10GbE RJ45 NICs No idea where the switch situation stands... brycec was hired as a developer but gained a sysadmin hat along the way, unwillingly.	[15:15]
mercutio	haha	[15:17]
brycec	mercutio: Quick question -- When a pool is configured with copies=3, does that mean a block is replicated on 3 different machines, or 3 different OSD (so if a machine had 3 disks and the block was only on those disks and the machine died, would the block be lost?) (I haven't come across an answer to that yet -- or if I did, I didn't realize it, so I figured I'd ask)	[15:27]
mercutio	depends what your policy is. normally people set to 3 different machines, but you can set it to 3 different cages or such too, or 3 disks on one machine. http://docs.ceph.com/docs/mimic/rados/operations/crush-map/ brycec:	[15:34]
brycec	Ah it's configured in the CRUSH map, got it. Thanks again mercutio	[15:44]
	(In my defense, I've glossed-over the CRUSH stuff as being a level of configuration I don't need to know/care about yet)	[15:50]
mercutio	well the default crush map should be fine for small setup anyway	[15:59]
brycec	define "small" :P	[15:59]
.... (idle for 15mn)
mercutio	small means all servers in one rack basically. like if you have 3 rooms with servers in them, each with power feeds to them, there is the potential for a whole room to lose power, so you may want to do your redundancy across rooms so that one whole room can go down.	[16:14]
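The idea of placing copies across machines or rooms can be illustrated with a toy placement function. This is not the CRUSH algorithm, only a simplified hash-based stand-in showing what a failure domain does; all names (`place`, the `osds` layout, the `room`/`host` keys) are hypothetical.

```python
import hashlib

def _score(key):
    """Deterministic pseudo-random score for a string key."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)

def place(obj, osds, domain, copies=3):
    """Pick one OSD in each of `copies` distinct failure domains for `obj`."""
    buckets = {}
    for osd in osds:
        buckets.setdefault(osd[domain], []).append(osd)
    if len(buckets) < copies:
        raise ValueError(f"need {copies} distinct {domain}s, have {len(buckets)}")
    # Rank the domains per object, then rank the OSDs inside each chosen domain.
    chosen = sorted(buckets, key=lambda b: _score(f"{obj}/{b}"))[:copies]
    return [min(buckets[b], key=lambda o: _score(f"{obj}/{o['id']}")) for b in chosen]

# Three rooms, two hosts per room, two disks per host.
osds = [
    {"id": f"osd.{r}{h}{d}", "host": f"host{r}{h}", "room": f"room{r}"}
    for r in range(3) for h in range(2) for d in range(2)
]
by_room = place("vm-disk-42", osds, "room")
by_host = place("vm-disk-42", osds, "host")
print([o["room"] for o in by_room])
print([o["host"] for o in by_host])
```

With `domain="room"` every copy lands in a different room, so a whole room can lose power without losing data; with `domain="host"` the copies are only guaranteed to sit on different machines.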
.......................................................................... (idle for 6h6mn)
m0unds	@w kaeg	[22:21]
BryceBot	ဧရာဝတီမြစ်, မကွေးခရိုင်, မကွေးတိုင်းဒေသကြီး, 10261, မြန်မာ: Partly Cloudy ☁ 38.2°C (100.8°F), Humidity: 30%, Wind: From the E at 0m/s Gusting to 1m/s. Visibility: 16km -- For more details including the forecast, see https://darksky.net/forecast/20.3142558,94.9191074	[22:21]
m0unds	wat it partially matched the name of a city instead of icao, lol @w 87114	[22:21]
BryceBot	ABQ, New Mexico, 87114, USA: Overcast ☁ 55.0°F (12.8°C), Humidity: 51%, Wind: From the ENE at 2mph Gusting to 7mph. Visibility: 8mi -- For more details including the forecast, see https://darksky.net/forecast/35.191585586202,-106.67789964486	[22:24]
............. (idle for 1h4mn)
***	ziyourenxiang has quit IRC (Ping timeout: 250 seconds)	[23:28]
