Who | What | When |
---|---|---|
up_the_irons | brycec: 4TB is still the best value right now; I did a calculation a few weeks ago
something like this, for example: https://www.newegg.com/Product/Product.aspx?Item=N82E16822236350 as for us personally, we are still just re-using the myriad of drives we have left over from the old kvr machines. so, 1TB each | [00:59] |
mercutio | depends on the workload if it's enough iops going with 4tb though
archiving data is way less iops than lots of virtual servers reach writing to log files etc. s/reach/each/ | [01:06] |
BryceBot | <mercutio> archiving data is way less iops than lots of virtual servers each writing to log files etc. | [01:07] |
............................................ (idle for 3h38mn) | ||
*** | mhoran has quit IRC (Quit: mhoran) | [04:45] |
mhoran has joined #arpnetworks
ChanServ sets mode: +o mhoran | [04:54] | |
......... (idle for 44mn) | ||
hive-mind has quit IRC (Ping timeout: 258 seconds)
hive-mind has joined #arpnetworks | [05:38] | |
..................................... (idle for 3h0mn) | ||
m0unds | brycec: did you have an ideal pricepoint per disk for enterprise drives? | [08:39] |
............ (idle for 55mn) | ||
*** | ziyourenxiang has quit IRC (Ping timeout: 268 seconds) | [09:34] |
........................................................................................ (idle for 7h18mn) | ||
ziyourenxiang has joined #arpnetworks | [16:52] | |
............ (idle for 55mn) | ||
JC_Denton has quit IRC (Quit: The whole problem with the world is that fools and fanatics are always so certain of themselves, but wiser people so full of doubts.) | [17:47] | |
JC_Denton has joined #arpnetworks | [18:01] | |
........................................ (idle for 3h19mn) | ||
brycec | Thank you mercutio and up_the_irons for the input.
Alas I'm but a mere dev who's managed to do sysadmin work (or vice-versa), but doing the hardware thing? Ooof, I am out of my depth. up_the_irons: 4TB certainly seems like a sweet spot, but even that WD drive (which I have reliability concerns about, thanks to Backblaze's data) suffers the exact same fault as the HGST drive I had my eye on. Finding actually-new inventory is hit-and-miss at best. And buying a hard drive with who-knows-how-many "miles" on it already feels like it's seriously asking for trouble. No? (^ Referring to "refreshed"/"renewed" drives) m0unds: Ideal price point? Ideally, $75 for 4TB, I guess. But only because that's a price I saw when I first started searching, and because $1200-$1500 (16-20 drives) is a relatively easy pill to swallow. But then even $150 for a 4TB feels "reasonable" for an "enterprise/DC" drive. | [21:20] |
mercutio | just assume disks will fail
you need to figure out how many iops you need though | [21:29] |
brycec | But how the hell do I do that? :/ | [21:30] |
mercutio | 4x4tb hard-disks will only give iops slightly higher than 4/3 of one hard-disk
by looking at statistics from your current work load | [21:30] |
brycec | (I know you described figuring out what I've got now, but that's not quite the same as "what do you need") | [21:30] |
mercutio | yeah well i reckon it's always best to overspec slightly in the beginning
as storage requirements can both increase over time suddenly, can start failing worse and worse past a certain point.. | [21:31] |
brycec | "4x4tb hard-disks" reminder that I'm talking 16x4TB disks (spread across 4 servers) | [21:31] |
mercutio | and can perform much more poorly in partial failure situations
well i used 4 as an easier example. if you do 3-way replication it usually reads from a single drive, but it needs to write to all 3, so you'll get slightly more than 16/3 of one disk's iops. if each hard-disk does 100 iops, then that is ~533 iops | [21:31] |
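mercutio's back-of-the-envelope above (16 spindles, 3-way replication, ~100 iops per disk) can be sketched as a one-liner; the function name and default values are mine, not from the chat:

```python
def replicated_write_iops(n_disks, per_disk_iops=100, replicas=3):
    """Every client write must land on all `replicas` copies, so the
    pool's aggregate write IOPS is roughly n_disks * per_disk_iops / replicas."""
    return n_disks * per_disk_iops // replicas

print(replicated_write_iops(16))   # ~533 iops, matching the estimate above
```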
brycec | Ohhh that's what you meant, gotcha. (I misunderstood what you were saying) | [21:33] |
mercutio | it can read from different hard-disks as each block lives on one primary
but if you have significant write loads then you can still run into performance deficits | [21:33] |
brycec | (Very RAID5-esque in that way) | [21:33] |
mercutio | yeah
like heaps of people used to do raid 5 for vm work loads and it often failed pretty badly. even with raid10 like we had, neighbours could still disrupt each other. | [21:34] |
brycec | mercutio: My predecessor did :(
(RAID5, that is) | [21:34] |
mercutio | yeh it's quite common
it used to not be so bad for things like mail work loads | [21:34] |
brycec | We tried evaluating Logzilla on that array... it went *horrifically* | [21:35] |
mercutio | back when we had 9gb scsi disks etc
thing is hard-disk sizes went way up, and iops didn't... like even u320 scsi etc had 15000 rpm hard-disks. ssds have helped a little. one of the cool things about ssds is that performance tends to be pretty good in mixed work loads, and with random accesses | [21:35] |
brycec | <3 SSD (in general) | [21:36] |
mercutio | whereas with legacy hard-disk systems things like reading 10,000 files etc can bog down other things
but yeah, i went through all the vm hosts to look at iostat output btw. but it's still hard to estimate things like peak load | [21:36] |
brycec | I've got munin doings its thing, but it's rubbish for precise values (naturally)
*doing its | [21:38] |
mercutio | anything within a vm is not going to be as good as the host | [21:38] |
brycec | (Also, apparently I disabled munin on the VM hosts at some point so I don't have current data *facepalm*) | [21:38] |
mercutio | what system are you running atm?
well you can still set it up again and get current data. better than nothing | [21:38] |
brycec | Yeah I will be
Proxmox | [21:39] |
mercutio | one thing that i noticed about our work load is that we're higher on writes than reads btw
ah so you can get an estimate using iostat, both now and since reboot. do iostat -x for the since-reboot averages. you want to look at w/s and r/s | [21:39] |
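the workflow mercutio describes (run `iostat -x`, read off r/s and w/s) can be sketched as a tiny parser over one captured device line. the sample values and column positions here are assumptions, not output from the chat: sysstat's `-x` layout varies between versions, so check the header line of your own output first.

```python
# One captured device line from `iostat -x` (values made up for illustration).
# In sysstat ~11.x the extended columns begin: Device rrqm/s wrqm/s r/s w/s ...
sample = "sda 0.00 1.20 45.30 30.10 512.00 380.00 23.40 0.05 1.20 0.90 1.60 11.31 8.53"
fields = sample.split()
r_per_s, w_per_s = float(fields[3]), float(fields[4])
print(f"read IOPS: {r_per_s}, write IOPS: {w_per_s}, "
      f"total: {round(r_per_s + w_per_s, 2)}")
```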
brycec | My ARP Metal host (also running Proxmox, and _does_ have working munin data) seems to peak at 400 IOPS reads on both of its spinning rust so that's probably the drives' limit there. And 300 on writes. So that gives me a number I can understand at least. | [21:40] |
mercutio | also you can get an idea how bogged down the current system is with r_await, and w_await | [21:40] |
brycec | (Seems that Debian no longer packages iostat) | [21:41] |
mercutio | how many hard-disks is that with?
sysstat package | [21:41] |
brycec | Oh nevermind, yeah sysstat
mercutio: I'm on an STL host, so 2. | [21:41] |
mercutio | iops on hard-disks can still go up with sequential data too | [21:42] |
brycec | (A pair of WDC WD1003FBYX-01Y7B1 in my ARP Metal system, configured as a ZFS mirror vdev) | [21:42] |
brycec watches a RAID10 array do what RAID10 arrays do... via iostat.
(It amuses me to see it in action - reads from 2 disks, but writes to all 4, etc.) Thanks again mercutio for helping me through my minor meltdown. I'm still over my head, but at least I have some IOPS numbers in front of me that I can understand. Now if only I could buy brand-new drives of the capacity and spec that I want. | [21:51] | |
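the asymmetry brycec is watching can be modelled the same way as the replication math earlier: in RAID10, random reads can be served by any spindle, while every write is mirrored to both halves of a pair. a rough ceiling, with the per-disk iops figure assumed rather than measured:

```python
def raid10_iops(n_disks, per_disk_iops=100):
    """Rough RAID10 ceilings: reads spread across every disk,
    while each write hits both members of a mirror pair (2x amplification)."""
    return {"read": n_disks * per_disk_iops,
            "write": n_disks * per_disk_iops // 2}

print(raid10_iops(4))   # a 4-disk array: 400 read iops, 200 write iops
```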
mercutio | i suspect you can buy new hard-disks fine
you were saying amazon has a buy limit? there's probably someone else you can deal with too | [22:01] |
m0unds | gah, buffer was too small
mercutio: any idea why ssd storage on a "thunder" vm would be slower than spinning disk? more contention on the host or something? | [22:05] |
mercutio | it shouldn't be slower | [22:06] |
m0unds | it's like twice as slow, iops vs spinning using ioping -R . | [22:07] |
mercutio | oh latency higher
umm probably because it's going through network | [22:07] |
m0unds | well, latency is also higher | [22:07] |
mercutio | ioping is single access at once | [22:07] |
m0unds | ah, gotcha
spinning disks are physically attached? | [22:07] |
mercutio | so if you run two ioping at once it shouldn't degrade as much
yeh it's one of the negative things about ceph. not a lot that can be done about it | [22:07] |
m0unds | gotcha, not a big deal, was just curious | [22:08] |
mercutio | that said, if rdma support was stable it'd be a little better | [22:08] |
m0unds | figured that was the case though, attached vs network - better parallelization or whatever? | [22:08] |
mercutio | well there's multiple memory copys
and layers of overhead on the ceph side. whenever you receive network traffic and have to act on it and respond, it's slower than staying within the cache of a local system | [22:08] |
m0unds | brycec: i missed it - did you say whether you had a working price range for disks? i was just having to do some shopping for a client, so i had some numbers for a bunch of vendors, haha | [22:09] |
mercutio | even if you had very very low network latency the unpredictableness of incoming traffic means that cpu caches etc aren't ready | [22:09] |
m0unds | mercutio: gotcha | [22:09] |
mercutio | so you're more likely to have cold cpu cache etc
plus the memory copies from not doing rdma. if you look at rdma latency tests they're pretty good if both sides are ready/waiting. just checking latency, this intel optane memory is lower latency than nvme ssd too. apparently it's noticeably faster | [22:09] |
m0unds | interesting | [22:11] |
brycec | m0unds: $75-$150 for a 4TB drive but for no better reason than that's where prices seemed to be when I started shopping
21:27 <brycec> m0unds: Ideal price point? Ideally, $75 for 4TB I guess. But only because that's a price I saw when I first started searching, and because $1200-$1500 (16-20 drives) is a relatively easy pill to swallow. 21:28 <brycec> But then, even, $150 for a 4TB feels "reasonable" for an "enterprise/DC" drive. | [22:11] |
m0unds | cool, thanks (i need to resize my znc buffer, it's too damned short) | [22:13] |
brycec | m0unds: [FBI] should have a log too | [22:13] |
mercutio | rbi is back | [22:13] |
m0unds | oh, right, forgot about that
yeah | [22:13] |
mercutio | fbi even and bryce said
https://ceph.com/community/part-3-rhcs-bluestore-performance-scalability-3-vs-5-nodes/ | [22:13] |
m0unds | yeah, looks like that's about the range i saw, even with some qty discounts | [22:16] |
brycec | tl;dr "More nodes == way higher IOPS achieved" | [22:18] |
mercutio | but similar performance for bulk data
well it also looked at around the number of nodes you may be looking at | [22:18] |
m0unds | hahaha | [22:19] |
mercutio | the blog is pertty cool | [22:19] |
brycec | mercutio: thanks (4) | [22:19] |
mercutio | https://ceph.com/community/ceph-block-storage-performance-on-all-flash-cluster-with-bluestore-backend/
of course all flash is going to give high iops :) | [22:19] |
brycec | I really appreciate their "Executive Summaries" | [22:20] |
mercutio | they're so well written
i'm used to things like anandtech which are pretty average | [22:20] |
brycec | I wish I could afford 5 nodes with 7 $1200 SSDs each :/
(Their all-flash cluster uses 7 Intel 4TB SSDs) | [22:23] |
m0unds | fancy fancy fancy | [22:25] |
mercutio | yeah i think intel are sponsoring it or something
maybe overheads aren't as bad as i thought they were. it's improved a bit i think. i haven't even heard of this cpu.. xeon platinum 8180, 38.5mb cache! 28 core. oh there's two of them, ok. that is weird, there is 196gb of ram. usually it'd be 192gb? >>> 192*1024/1000 = 196.608 hmm.. also as well as those 4tb disks they've got those stupid fast intel optane memory ssd | [22:25] |
m0unds | some of intel's 3d nvm chips are mfg here | [22:29] |
mercutio | https://www.newegg.com/Product/Product.aspx?Item=9SIAH9B99N0963&Description=intel%20optane%20375gb&cm_re=intel_optane_375gb-_-9SIAH9B99N0963-_-Product
only $1500 each :) | [22:30] |
m0unds | hahahaha | [22:30] |
mercutio | it'll trickle down i'm hoping | [22:30] |
m0unds | haha, those are the chips they build here | [22:31] |
mercutio | where is here? | [22:31] |
m0unds | the ones in optane and micron quantx
well, 5 mins north of here in rio rancho, nm, usa | [22:31] |
mercutio | ah
Intel Optane SSDs have roughly 10X lower latency and 5-8X better throughput at low queue depths compared to the fastest PCIe NVMe NAND-based SSDs. kind of crazy... ah they talk about how they're benchmarking etc too, and they are using fio like i suspected. but there's actually a librbd module so it's easier than setting up virtual environments i suppose | [22:31] |
m0unds | cool | [22:35] |
brycec | So, opinion poll: "Renewed" drives, yea/nay? | [22:49] |
mercutio | i'm guessing that you shoudl probably just get cheap 2tb seagate and shit loads of them
and then redo it with ssd later :) or add another ssd storage pool | [22:49] |
brycec | On the one hand, drives are gonna fail, plan/expect that. On the other, I just don't trust others' used stuff. | [22:50] |
mercutio | the only problem with cheap drives is that they don't "fail" quickly and they don't have vibration protection
yeah i'd buy new if not getting a mass discount. i'll bbl. but it's really what you can do for bulk servers, like if you can take 20x hard-disks it's better than 12. you might be able to use nvme and have an internal journal, but you want to be careful about TBW | [22:50] |
brycec | TBW = terabytes written
(for those like me who didn't know the term) | [22:54] |
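the TBW concern mercutio raises boils down to simple division: the drive's rated terabytes-written against your daily journal write volume. a quick sanity-check sketch, where the 300 TBW rating and 200 GB/day figures are hypothetical, purely for illustration:

```python
def years_until_tbw(tbw_rating_tb, writes_gb_per_day):
    """Years until a drive's rated terabytes-written are exhausted
    (1 TB = 1000 GB here, as endurance ratings are decimal)."""
    days = tbw_rating_tb * 1000 / writes_gb_per_day
    return days / 365

print(round(years_until_tbw(300, 200), 1))   # ~4.1 years at that write rate
```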
..... (idle for 20mn) | ||
*** | mnathani has quit IRC () | [23:14] |
..... (idle for 20mn) | ||
hive-mind has quit IRC (Ping timeout: 268 seconds) | [23:34] | |
hive-mind has joined #arpnetworks | [23:41] | |
.... (idle for 15mn) | ||
mercutio | ah sorry, i was on my way out the door basically
yeah some SSDs aren't so good with TBW compared to others. it's mostly significant for journals.. because journals don't need a lot of storage, but reuse that storage a lot | [23:56] |