I've had some good luck with Teamgroup MP44s in ZFS RAID 1: plenty fast, they run fairly cool, and I've been running one through a stress test and have plopped a few hundred TB on it without too many issues.
All of Z+'s ZFS stuff runs on either 6 or 7 wide RaidZ2, or 12 wide RaidZ3 for the "idc how long it takes to access it" datasets. Customer-facing is almost always 2-4 vdevs of 6-wide raidz2, which gives us 250-500 raw IOPS, and then it's cached by large enterprise NVMe drives.
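For anyone trying to picture where the 250-500 IOPS figure comes from, here's a rough sketch of the capacity and random-IOPS math for striped raidz2 vdevs. The 6TB drive size and ~125 IOPS per 7200rpm disk are assumptions for illustration, along with the common rule of thumb that each raidz vdev delivers roughly the random IOPS of a single member disk.

```python
# Rough capacity / IOPS estimate for a pool of striped raidz2 vdevs.
# Assumptions (illustrative, not from the post): 6 TB drives,
# ~125 random IOPS per 7200 rpm disk, and the rule of thumb that
# a raidz vdev delivers about the random IOPS of one member disk.

def raidz2_pool(vdevs: int, disks_per_vdev: int = 6,
                disk_tb: float = 6.0, disk_iops: int = 125):
    usable_tb = vdevs * (disks_per_vdev - 2) * disk_tb  # 2 parity disks per vdev
    random_iops = vdevs * disk_iops                     # ~1 disk's IOPS per vdev
    return usable_tb, random_iops

for vdevs in (2, 3, 4):
    tb, iops = raidz2_pool(vdevs)
    print(f"{vdevs} x 6-wide raidz2: ~{tb:.0f} TB usable, ~{iops} random IOPS")
```

With 2-4 vdevs that lands right in the 250-500 IOPS range mentioned above, which is why the NVMe cache layer is doing the heavy lifting for anything latency-sensitive.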
Admittedly I put a lot of effort into making sure the large data arrays are what's going to be left if anything were to happen. But at the end of the day, I'm not Iron Mountain, and I really hope any client who chooses to back up data with my company and service is doing so as a secondary backup.
I'm getting to a volume of data where the power usage of the HDDs is actually a concern - because I'm using a bunch of 3 and 6TB drives for my own personal arrays, I'm burning 400-500W of power for only 180TB of usable volume. That's pretty bad.
I'm switching the consumer side from enterprise 6TBs to enterprise 18TBs in July/Aug, which will let me store data 3x more efficiently. Unfortunately it won't save me money because it's the same number of disks, but my usable volumes will 3x. There's a small chance I take the leap and go for 24TB disks.
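Back-of-the-envelope on that, using only the numbers above (400-500W for 180TB usable now, same disk count after the swap so roughly the same power draw but 3x the usable space); per-drive wattage isn't stated, so this just divides the totals:

```python
# Watts per usable TB, before and after consolidating onto 18 TB drives.
# Uses only the totals from the post: same disk count, so roughly the same
# total power, but ~3x the usable capacity.

power_w = (400, 500)                  # reported draw of the current 3/6 TB arrays
usable_tb_now = 180
usable_tb_after = usable_tb_now * 3   # same number of disks, 18 TB each

for w in power_w:
    print(f"now:   {w} W / {usable_tb_now} TB  = {w / usable_tb_now:.2f} W/TB")
    print(f"after: {w} W / {usable_tb_after} TB = {w / usable_tb_after:.2f} W/TB")
```

So roughly 2.2-2.8 W per usable TB today, dropping under 1 W/TB after the consolidation.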
Since I work with such large datasets, I've actually started investing in tape, and now have the ability to store data on tape. Still working out how I'd like to offer this as a service, but there's already competition out there for ultra slow access data on the cheap.
@archivist said: SSD can be even cheaper considering wear down, electricity and no hardware failures.
This is wrong on so many levels that I can't even begin.
A high-performance SSD will consume more power than a magnetic SAS drive.
The failure rate on consumer drives used 24/7/365 is much higher than on enterprise drives used the same way, especially after the first year.
While consumer SSD is cheap-ish, enterprise is not, and enterprise drives with a capacity of 6.4 TB or greater cost a fortune even on eBay.
Consumer SSD used in enterprise/hosting is like shooting yourself in the balls, reliability-wise. This is a no-go.
A setup of 24 enterprise drives in RAID60 is astronomical cost-wise; RAID10 on those would max out the 12Gbps SAS link and give you enormous IOPS, but it will cost a few thousand USD to set up, and I doubt it would be feasible on the low-end market for anyone.
So spinning rust is still the go-to for large storage. RAID10 can speed things up, but that will end up at ~$5/TB at today's prices.
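To put some rough numbers on that 24-drive comparison, here's a sketch. The layout (two 12-drive RAID6 spans for the RAID60), the ~900 MB/s per SAS3 SSD, and the ~1.2 GB/s per 12 Gbps lane are my assumptions for illustration, not figures from the post; the 6.4 TB size is the one mentioned above.

```python
# Rough comparison of 24 x 6.4 TB SAS SSDs in RAID60 vs RAID10.
# Assumptions (illustrative): 2 x 12-drive RAID6 spans for the RAID60,
# ~900 MB/s sequential per SAS3 SSD, ~1.2 GB/s usable per 12 Gbps SAS lane.

drives, drive_tb, drive_mbps = 24, 6.4, 900
lane_mbps = 1200                          # one 12 Gbps SAS lane, roughly

raid60_usable = (12 - 2) * 2 * drive_tb   # two RAID6 spans, 2 parity drives each
raid10_usable = drives / 2 * drive_tb     # mirrors: 50% of raw

raid10_read_mbps = drives * drive_mbps    # ideal striped-mirror read
print(f"RAID60 usable: {raid60_usable:.1f} TB, RAID10 usable: {raid10_usable:.1f} TB")
print(f"RAID10 ideal read: ~{raid10_read_mbps / 1000:.1f} GB/s "
      f"vs ~{lane_mbps / 1000:.1f} GB/s per 12 Gbps lane -> link-bound")
```

Under those assumptions the array is limited by the SAS link long before the drives, which is the point about RAID10 maxing out the 12Gbps link.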
EDIT: SSD vs HDD power usage: 6.4 TB SAS SSD vs 18 TB SAS HDD.
To be fair, I found that enterprise NVMe disks were cheaper than consumer NVMe disks at large storage capacities, specifically 3.84TB and above. At the moment, 7.68TB Enterprise NVMe is ~$450-550, but it depends on how the market is doing. I'm buying used 3.84TB Gen4 Enterprise NVMe drives for ~$230 each. I agree though, generally they are more expensive than HDD.
Storage VMs have always interested us as a product to potentially offer, but we ultimately decided that using HDDs for such a product would not be worth the headache of dealing with their generally slow performance. SSDs, while relatively cheap right now, still carry a much higher upfront cost than selling HDD VMs. I looked into it, and to make SSD VMs make sense, I would either need to sell at ~$15/TB+ or use some sort of setup with deduplication and slight overcommitment of space, which is both complex and really expensive to do correctly.
A 24 x 7.68TB NVMe server would cost at least $10,800 for the drives alone, plus maybe an extra $3k for the server hardware itself (assume an AMD EPYC system), and the margins just aren't really there. The bandwidth and I/O usage from clients would likely be absurd too.
Best scenario I could come up with: buy 12 x 15.36TB used NVMe for $900 each and run it in RAID5. You get 168.96 TB of capacity. At $10/TB, you could make ~$1000/month. Then once someone starts using the write speed, you will have 1000 "Exposing XXX host for using hard drives!!!" posts because their yabs.sh write speed is not 21,939,123 MB/s.
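Working that scenario out explicitly, a sketch using only the figures above (12 used 15.36TB drives at $900 each, RAID5, $10/TB/month); the ~$1000/month estimate presumably allows for the node not being fully sold, so the full-occupancy gross is shown alongside it:

```python
# Economics of the 12 x 15.36 TB used-NVMe RAID5 scenario above.
# $10/TB/month pricing; the ~$1,000/month figure presumably assumes the
# node is not fully sold, so full-occupancy gross is shown for comparison.

drives, drive_tb, drive_cost = 12, 15.36, 900
price_per_tb = 10

usable_tb = (drives - 1) * drive_tb     # RAID5: one drive's worth of parity
drive_capex = drives * drive_cost

full_gross = usable_tb * price_per_tb
print(f"usable: {usable_tb:.2f} TB, drive capex: ${drive_capex:,}")
print(f"gross at $10/TB fully sold: ~${full_gross:,.0f}/month")
print(f"payback on drives alone at ~$1,000/month: ~{drive_capex / 1000:.0f} months")
```

Even in the optimistic case, that's roughly a year just to pay back the drives, before the chassis, power, bandwidth, or any redundancy beyond single parity.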
I'll likely look into it again in a few months to see if it's something we can offer. I'm hoping we can come up with something we could rent out at $10/TB/month, or that NAND flash pricing drops to that point. $5/TB/month would be the dream, but I know that's basically impossible without a heavy ROI period, even with no redundancy at all.
Also regarding Contabo and data loss: I’ve seen data loss incidents at pretty much every provider. You should be taking your own backups no matter what, regardless of the provider that you’re using. Even providers with billions, like Google, have lost data before. RAID does not always protect you and is not a backup, and many VPS hosts either don’t do backups or charge extra since it’s a huge logistical challenge. We do backups, but it’s only once a day, and not something we would do on a storage-focused product.
Realistically, it would not be a bad idea to take a 1.8TB SSD Contabo VPS and then a 2TB HDD VPS elsewhere and sync them together, if you are happy with Contabo’s SSD performance.
@ZizzyDizzyMC said: Switching consumer side from enterprise 6TBs to enterprise 18TB's in July / Aug which will let me store data 3x more efficiently. Unfortunately it won't save me money because it's the same number of disks, but my usable volumes will 3x. Small chance I take the leap and go for 24TB disks.
If you plan to do 12 x 18TB in Z2 (ZFS): respect for burning ~100 GB of RAM on that.
When fragmentation goes above 20%, performance will be..... I can't say the word, as @AuroraZero would ban my ass, so I'll go with "low".
RAID 6/60/Z2 or striped Z2:
I know I'm going to get my ass whooped by some/most of you for trashing software RAID (it's not that it doesn't work - it works, but not like HW RAID in large setups; and by "doesn't work" I mean hitting it with a ton of IOPS, even for backup). I can say this: at 360 TB in striped Z2, with 512 GB of DDR4 dedicated to ZFS, after about a year and a half performance was barely saturating a 10Gbps link in sequential read/write. HW RAID will also struggle with that (fragmentation), but at least you are not wasting CPU cycles on parity math and RAM on cache, or relying on the kernel not to freeze or fuck up.
In small numbers, up to 8 drives, we also do ZFS sometimes (well, the only case is when I cannot fit in a HW RAID controller).
Yes, resilver on ZFS is much faster than on HW RAID with large-capacity drives (anything above 10TB); honestly, on 18 TB drives it finished ~5 days before the HW RAID did, while performance was rubbish on both during the resilver.
And yes, we have not lost a single bit with ZFS in more than 10 years.
In our case, with RAID6/60 or any double-parity RAID (HW or SW), the bottleneck will be the math. I would rather trust an ASIC to do this than introduce latency by making the CPU do it.
In RAID10, whatever works - either HW or SW.
The above is for spinning rust.
On NVMe, or NVMe RAID 1/10/5/6/60/Z1/Z2..., caching and compression actually help, as data blocks are "evenly/fully" distributed/filled, and that little optimization helps the life of the NAND a LOT. And if you're using enterprise NVMe, you saw my post: it can take PBs of reads/writes and just go on and on and on - you get the idea.
From my point of view, if you want true performance from HDD: RAID10 and a minimum of 12 drives (the more, the better), but you will lose 50% of total capacity, and the outcome is $$$$$$$$$$$ (ran out of $$$).
Budget/backup/storage-optimized: RAID6/Z2 up to 12 drives, and RAID60 for anything above that. (RAID60 = striped RAID6.)
There is no magic formula to get speed, redundancy, and max capacity at the same time other than RAID10, and that's quite a fiasco if the underlying hardware is not "strong" quality-wise.
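A quick sketch of the capacity trade-off described above, for hypothetical 12- and 24-drive shelves (the 18TB drive size is just a placeholder; the ratios are what matter): RAID10 gives back 50% of raw, RAID6/Z2 gives (n-2)/n per group, and RAID60 is that per 12-drive span.

```python
# Usable capacity of the layouts discussed above. 18 TB drives are a
# placeholder; only the ratios matter.

drive_tb = 18

def raid10(n):  return n // 2 * drive_tb              # mirrors: 50% of raw
def raid6(n):   return (n - 2) * drive_tb             # single group, 2 parity
def raid60(n, span=12):                               # striped RAID6 spans
    return (n // span) * (span - 2) * drive_tb

for n in (12, 24):
    raw = n * drive_tb
    # post's advice: RAID6/Z2 up to 12 drives, RAID60 above that
    parity = raid6(n) if n <= 12 else raid60(n)
    label = "RAID6/Z2" if n <= 12 else "RAID60"
    print(f"{n} drives ({raw} TB raw): RAID10 {raid10(n)} TB, {label} {parity} TB")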
Waiting for others to share more info / insight on this.
My host nodes for disks currently have 1TB of RAM; I have one in use and another with identical specs as a backup that's powered off. The next host node setup will have 2TB of RAM per node. Finding a 2U 2-node system with 32 DIMM slots per node is hard on a budget though (Xeon Scalable gen 2 or EPYC gen 2/3).
But yes, ZFS will just use the whole amount of RAM it's allowed by default on TrueNAS SCALE. The reason I go for multiple vdevs of ZFS raidz2 is that I'm paranoid about disk loss. It's not about the capacity or the speed; it's about not risking client data if I can at all help it.
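If anyone wants to see how much of that RAM the ARC has actually claimed, here's a minimal sketch that reads the OpenZFS kstats on a Linux-based system such as TrueNAS SCALE (assumes /proc/spl/kstat/zfs/arcstats is available, which it is on OpenZFS on Linux):

```python
# Report current ZFS ARC size and its configured ceiling on Linux/OpenZFS.
# Reads /proc/spl/kstat/zfs/arcstats, which OpenZFS exposes on Linux
# (including TrueNAS SCALE). Values are in bytes.

ARCSTATS = "/proc/spl/kstat/zfs/arcstats"

def read_arcstats(path: str = ARCSTATS) -> dict:
    stats = {}
    with open(path) as f:
        for line in f.readlines()[2:]:      # skip the two kstat header lines
            name, _type, value = line.split()
            stats[name] = int(value)
    return stats

if __name__ == "__main__":
    s = read_arcstats()
    gib = 1024 ** 3
    print(f"ARC size:  {s['size'] / gib:.1f} GiB")
    print(f"ARC c_max: {s['c_max'] / gib:.1f} GiB (current ceiling)")
```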
Comments
It's been kind of nice seeing a drive conversation around here. I don't have much to add, but I expect that, like me, others are following the conversation.
@host_c above 20% frag and the disks will be ass!!
https://www.backblaze.com/blog/backblaze-drive-stats-for-2024/
https://www.backblaze.com/blog/ssd-edition-2023-mid-year-drive-stats-review/
Those are low-capacity SSDs; I'm assuming they're being used as boot disks instead of storing actual data, hence the low reads/writes.
yes