Need to choose: RAID 1 Mirror (Software or Hardware)
I know this question has been asked a thousand times, and I have read 999 of them before posting this; I just want some insight into what others prefer.
Scenario:
Two SATA SSDs which I want in a RAID 1 mirror, as the boot array, so that if one fails the other takes over and boots the OS. I now have two choices:
Hardware-based Mirror RAID
Going to use a Dell R630 with an H730P RAID controller. Simple and straightforward, with the only risk being that if the controller dies I need to replace it with the same controller running the same firmware.
Software-based Mirror RAID
Actually, I was about to go with hardware RAID when the datacenter told me to go for OS-based RAID and pointed me to https://help.ubuntu.com/community/Installation/SoftwareRAID, which uses mdadm. I also looked at ZFS, but there is no easy way to set it up on the server version of Ubuntu; you can simply download the desktop version and format the drive as ZFS during setup, which I think creates the boot pool.
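For reference, the mdadm route in that guide boils down to roughly the following (my own rough sketch, not tested; /dev/sda2 and /dev/sdb2 are placeholder partitions, and the installer normally handles the partitioning and GRUB parts for you):
# create a two-disk RAID 1 array from matching partitions
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
# record the array so the initramfs assembles it at boot
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u
# watch the initial sync
cat /proc/mdstat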
Now I want to know which route I should adopt, and what the industry practice is. The hardware one seems easy to set up and manage but carries the inherent risk of controller failure, whereas the software one seems a little complex and things can go wrong, since my coding skills aren't the best.
Host Mayo Ltd: Dedicated Server Hosting & UK VPS Server
Comments
I reckon that you've answered your own question; hardware, with the proviso that you purchase the identical controller at the same time, to circumvent supply/firmware revision issues should the need arise.
It wisnae me! A big boy done it and ran away.
NVMe2G for life! until death (the end is nigh)
I had Linux RAID 1 with ext3 for several years; performance was no worse than bare drives, and by default multiple threads can read concurrently, spreading the load across the drives. Easy to set up and resilient to crashes. Then I moved to a ZFS mirror two months ago; performance is significantly worse, but so far no other problems. Disk caches must be disabled in both cases.
I will underline industry practice here.
2 drive setup, if SATA, go for ZFS!
For SW RAID:
MDADM on a mirror - sincerely, ZFS beats anything in SW RAID on the market, even low-spec HW RAID controllers; do ZFS!
For HW RAID:
Use only SAS; SATA + HW RAID makes no sense. I will not dive into specifics, but just no. If you have SATA drives, just do ZFS and account for 1 GB of RAM for each 1 TB of storage.
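To make the ZFS suggestion concrete, a two-disk mirror is basically one command ( a rough sketch only; the by-id paths and pool name are placeholders, and a bootable root pool is a separate exercise ):
# plain two-way ZFS mirror for data ( not a boot pool )
zpool create -o ashift=12 tank mirror /dev/disk/by-id/ata-SSD_A /dev/disk/by-id/ata-SSD_B
zpool status tank
# the 1 GB RAM per 1 TB rule of thumb: a mirror of two 4 TB SSDs = ~4 TB usable,
# so plan on roughly 4 GB of RAM earmarked for ARC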
For the Dell H730P ( P stands for performance, and they have 2 GB cache ) - awesome cards; out of two dozen servers with those, none failed in the last 3 years. Even when we upgraded an H730 ( 1 GB version ) to an H730P, the virtual array was imported without problems.
There is one thing that you cannot do on the H730/730P/740P: ZFS. I mean you can, but it will work like shit - pardon my language; the HBA mode on it is a joke. Drives will have IO issues, especially SATA ( I think they only negotiate a 3 Gbps link, as I recall ).
If you wish to do SW RAID, you will need an H330 card ( as that is an HBA ).
If you do HW raid, please use SAS
If you do HW RAID over consumer-grade SATA, it will work OK, but the controller might flag your drives as pre-fail after some time, even if they are fine.
Also note, using non-Dell drives in a Dell server might get your coolers to 98% even at idle; you will have to disable the 3rd-party PCIe fan ramp. ( 100% on the coolers is ~150 W of extra draw, hour after hour of electricity, plus it will kill your coolers fast. )
https://www.dell.com/community/en/conversations/poweredge-hardware-general/how-to-quiet-r730xd-fans-lower-idle-fan-speed/647fa279f4ccf8a8de7fd4ad
Disable the third party/non-Dell cooling profile:
.\ipmitool -I lanplus -H ipaddress -U root -P password raw 0x30 0xce 0x00 0x16 0x05 0x00 0x00 0x00 0x05 0x00 0x01 0x00 0x00
Enable the third party/non-Dell cooling profile:
.\ipmitool -I lanplus -H ipaddress -U root -P password raw 0x30 0xce 0x00 0x16 0x05 0x00 0x00 0x00 0x05 0x00 0x00 0x00 0x00
Cheers!
Host-C - VPS Services Provider - AS211462
"If there is no struggle there is no progress"
That's a good insight... what I was actually looking for is real-world experience. I will study @host_c's reply in detail.
Host Mayo Ltd: Dedicated Server Hosting & UK VPS Server
Guys,
2-drive mirrors will suck, especially on SW RAID. Data has to be duplicated and copied to each drive, plus the write ACK. It will suck just the same on HW RAID, if that makes you feel all fuzzy inside.
Performance with RAID comes from striped mirrors ( AKA RAID 10 ). The higher the number of drives, the better the performance.
Know that modern CPUs ( the 4000+ ones in GB6 single-core ) are amazing, but I would still rather let an ASIC do the job.
@HostMayo
As for the boot drive, just do the mirror over HW RAID; it will never fail, and the load on the OS is nonexistent compared to a 100 TB RAID array for VMs.
Host-C - VPS Services Provider - AS211462
"If there is no struggle there is no progress"
@davide
ZFS vs the ext3 you had - I will just say this: you will not lose a bit of data, ever, with ZFS. Good choice for SW RAID.
Host-C - VPS Services Provider - AS211462
"If there is no struggle there is no progress"
SAS drives get so expensive that I could almost build an NVMe server instead, lolz... so I see that everything will suck in a RAID mirror unless I go RAID 5.
I also want to go for ZFS, but there are two problems:
1. As mentioned above, will performance really suck that badly?
2. There is no comprehensive guide for a ZFS boot drive setup.
Host Mayo Ltd: Dedicated Server Hosting & UK VPS Server
Whatever you do, OK, just don't do RAID 5; it has not been recommended in PROD for almost 3 decades now ( since larger drives showed up ).
For boot, as I said before, do a mirror on what you have ( the 2 drives ).
NVMe - if consumer, via PCIe, aaaaaa, just don't, or do backups. Consumer NVMe will wear out fast as hell; we have some 980 PROs that are at 55% wear and above in ~4 months.
You could go with a 6.4 TB or larger enterprise PCIe drive; not as fast as consumer, but the TBW on those is enormous compared to consumer ones.
@NathanCore4 here can fix you up with some premium stuff at good prices.
We have some NVMe drives from them; all are between 98-100% life ( 2% wear or less ).
Host-C - VPS Services Provider - AS211462
"If there is no struggle there is no progress"
If ZFS is so much better for RAID 1 purposes, why do 99% of dedicated server installations from providers come with mdadm?
I mean, let's take Hetzner - if we choose a dedicated server + AlmaLinux / Ubuntu / Debian as the OS and pick the "RAID1" selector, then we get a server with mdadm.
A lot of Google articles tell me "yes bro, you can install a ZFS pool as an add-on on your RHEL/DEB system", but not as the "/" partition during install.
I am not arguing against ZFS, but if it's FreeBSD and you get ZFS by default - then okay, great solution. But if you are a typical RHEL/DEB/whatever user, shouldn't reputable hosters offer ZFS by default for those OSes too?
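And to be clear, the "add-on" route those articles mean is something like this ( sketch only; the package names are as I understand them and the by-id paths are placeholders - the root filesystem stays on mdadm/ext4 ):
# Debian pulls the module from contrib as DKMS; on Ubuntu zfsutils-linux alone is enough
apt install zfs-dkms zfsutils-linux
# mirror on two spare data disks, nothing to do with /
zpool create datapool mirror /dev/disk/by-id/DISK_A /dev/disk/by-id/DISK_B
zfs create datapool/backups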
True, the Fiberstate datacenter also advised me to use mdadm on Ubuntu.
Host Mayo Ltd: Dedicated Server Hosting & UK VPS Server
Because no one wants to BURN RAM for Storage
Plus the fact that at high IO it kinda sucks ( above 50-60K IO/sec ).
It is excellent for data integrity on large arrays if you hit it with a ton of RAM.
Its purpose was:
EDIT:
I also have the feeling that ZFS is widely misused, sincerely.
It is like having 10x /24 on the same broadcast domain and not splitting them into separate vlans - and we see this all day.
Host-C - VPS Services Provider - AS211462
"If there is no struggle there is no progress"
auto expand on drive upgrade - to ease upgrades on large arrays in datacenters ( see the sketch below )
cache data in RAM - as at the time HW RAID controllers barely had 256 MB of cache, and on large arrays, even in RAID 10, that was a huge issue.
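( For the first point, the knob I mean is the pool's autoexpand property - a quick sketch, pool name is hypothetical: )
# let the pool grow automatically once every disk in a vdev has been replaced by a larger one
zpool set autoexpand=on tank
zpool get autoexpand tank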
I think that for that purpose 1x SSD + 100x HDD would be fine, and we could just install the OS on the SSD and create a super-mega ZFS pool on the HDDs (as a block storage add-on).
But here we are discussing a 2x SSD/HDD installation.
Are you sure that RAIDZ1 on 2x 4TB SSDs, with the OS on them, would be a good idea too?
Just googled that (I'm not a Debian user, so maybe this is the wrong way to install ZFS).
So to install Debian this way we would need to ( rough sketch after the list ):
1. Rent a server with 2x something and boot from a live CD.
2. Install the ZFS add-on packages into the live CD environment.
3. Create the partitions.
4. Then debootstrap our Debian onto those partitions (maybe I'm wrong there).
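Condensed, it looks roughly like this ( lifted loosely from the OpenZFS root-on-ZFS howto; the pool name rpool and the DISK placeholders are mine, and the real guide adds a separate boot pool plus a lot of GRUB/initramfs work ):
# from the live environment, after installing the ZFS packages
zpool create -o ashift=12 -O mountpoint=/ -R /mnt rpool mirror DISK_A-part3 DISK_B-part3
# install a minimal Debian into the pool, then chroot in and sort out the bootloader
debootstrap bookworm /mnt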
Of course it's possible to do, but if we just have debian-tnx-12.iso, in the installer we can choose raid1 (mdadm), raid0 (mdadm), etc...
I mean, even the OS developers use (only) mdadm in their installation media!
Okay, ZFS is good, but if Hetzner is too lazy to make templates for a ZFS-installed Debian, then why can't I use RAIDZ1 "out of the box" from the Debian/RHEL installation ISO, like mdadm?
Do that, and send me the data when you have 30 customers with add-on drives on a 100 TB pool that basically shits itself IO-wise, barely hitting a few MB/s. We did this, and boy did it go to shit, and we have tried it a bunch of times over the past years.
Plus the fact that you will burn the extra XXX GB of RAM for that, uselessly; rather give it to customers for free.
Of the two, only NetApp got it right ( WAFL ), as there was a debate on the technology back then ( Sun Microsystems vs NetApp ).
But RAID 5 is not a good idea, on anything, not even a personal porn collection.
Host-C - VPS Services Provider - AS211462
"If there is no struggle there is no progress"
I think I have stopped understanding you.
You said that ZFS RAID is a great thing.
I said that we could install the OS on 1x SSD and create a ZFS pool on 100x HDD.
And then you tell me that this is a bad plan because the setup will eat almost all the RAM.
So is ZFS RAID good or bad?
Maybe this is a language miscommunication on my side, but I have really lost the point about the pluses and minuses of ZFS.
Let's take this situation as an example:
We have 44 TB of very important data that should have reasonably fast access.
We buy an SX65 from Hetzner with 2x 1TB NVMe and 4x 22TB HDD.
My offer:
mdadm RAID 1 for the 2x NVMe (as it is the "out-of-box" setup)
ZFS RAID 10 for the 4x 22TB HDDs ( sketch below ).
If we had the same server without the NVMe drives, then my solution would be mdadm RAID 10 on the 4x 22TB only.
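The sketch I mean for the HDD part ( the disk IDs are placeholders ):
# striped mirrors = the ZFS equivalent of RAID 10 on the four 22 TB drives
zpool create -o ashift=12 hddpool mirror HDD_1 HDD_2 mirror HDD_3 HDD_4
zpool status hddpool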
What is your plan for that task / server?
MDADM is as old as the Linux kernel; it is mature and stable, hence it is trusted.
ZFS is like the new kid on the block, pretty young, so not that trustworthy for the moment.
If you do a storage server for serving shares over NFS/CIFS/iSCSI ( like FreeNAS/TrueNAS ), then yes, it makes sense to use ZFS on the XXX TB share, as the software allocates all RAM to storage - it has only one task to do.
On a node where you run VMs with dynamic memory allocation and have to dynamically adjust the RAM for storage as well - naaa, not such a good idea. Too much stuff happening in RAM that ultimately translates into latencies, and in storage every sub-ms counts.
BAD idea, really bad, on a PROD node hosting VMs; on dedicated SHARED storage it is OK-ish, depending on the amount of IO requests and the RAID type.
It is, for some cases: either a small 2-4 drive setup, or dedicated storage, where the whole server does nothing but storage, like I described above.
It really depends on what you do, like choosing the right tool for the right job.
ZFS stands out for Data Integrity, nothing more.
Writing from my phone does not help my case
Host-C - VPS Services Provider - AS211462
"If there is no struggle there is no progress"
For boot, I would use one NVMe, as the OS does not do heavy read/write to it, so wear level will not be an issue.
Definitely RAID 10 on the 4 drives - ZFS if you have the RAM for it ( 88 GB ); if not, MDADM ( sketch below ).
Use the additional NVMe as fast boot storage for the VMs or as plain fast storage.
I would not mirror the NVMe for the boot OS.
You can bend the 1 GB / TB rule in ZFS; it will work, but the outcome might not be what you expect, especially since we use OpenZFS.
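The MDADM fallback for the same four drives would be something like this ( device names are placeholders ):
# RAID 10 across the four HDDs, then format it
mdadm --create /dev/md1 --level=10 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
mkfs.ext4 /dev/md1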
Host-C - VPS Services Provider - AS211462
"If there is no struggle there is no progress"
lolz @host_c you have made the choice even more difficult hahaha... seems like there is no one solution for all; it all comes down to your configuration and then to personal choice, I assume.
Host Mayo Ltd: Dedicated Server Hosting & UK VPS Server
Yeah, that's what I wanted to hear:
ZFS should be used only if ZFS (storage) is the ONLY task for that server.
Let's get back to the OP's question, for example:
Task of the server: shared hosting (cPanel/DA).
We have a 7950X with 128 GB RAM and 2x 1.92 TB NVMe.
Would you prefer to "have a night with the installation of a ZFS RAID 1 pool and CloudLinux (RHEL-based)", or just choose mdadm RAID 1 during setup? We have enough RAM for ZFS, obviously.
Very, very interesting conversation, thank you!
@shakib is using a config like this with ZFS to sell VPS off it, so he must have first-hand information.
Host Mayo Ltd: Dedicated Server Hosting & UK VPS Server
Just google for FreeBSD ZFS on Root and it seems that even for FreeBSD we should prepare partitions first... I mean there is no "ZFS raid1, please" tick during setup. Then again, the whole of FreeBSD is not about ticks...
Can't go the FreeBSD route... only the ones supported by cPanel/DirectAdmin... looks like if I want to see ZFS performance I have to get into the water myself.
The plan I have is to set up Ubuntu 22 Desktop and enable the ZFS option during setup, which also makes it a ZFS boot pool, and then add the second drive to the pool. By that logic it should work fine.
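The "add the second drive" part should then just be a zpool attach against whatever pool the installer created - roughly ( the pool name rpool and the device paths are my assumption, and as I understand it the Ubuntu installer also creates a separate bpool and an EFI partition that this does not mirror ):
# turn the single-disk root pool into a two-way mirror
zpool attach rpool /dev/disk/by-id/EXISTING_SSD /dev/disk/by-id/NEW_SSD
zpool status rpool   # wait for the resilver to finish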
Host Mayo Ltd: Dedicated Server Hosting & UK VPS Server
As far as I understand that article about Rocky, on RHEL-based systems you should remove kernel-core, pre-install kmod and the other ZFS drivers, and rebuild the initrd.
Not just prepare partitions...
But still, it's too complicated. Let's wait for @Shakib and other advice.
It cannot be true that ZFS on a RHEL root is SO difficult.
I talked to Shakib on Skype; he uses Proxmox with the advanced installation for ZFS. Our situation is a bit different.
Host Mayo Ltd: Dedicated Server Hosting & UK VPS Server
If Proxmox has ZFS out of the box, then just spin up one VM for the whole dedi and install the shared-hosting software on top ( + use the "for VPS" licenses as a bonus ), imo.
My professional opinion: go with HW RAID as much as possible - but who am I, it is not like we mostly sell storage services
This is why ( RAID 60 of 24 drives on a node that has running customers and live migrations to it ):
## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ##
Yet-Another-Bench-Script
v2024-06-09
https://github.com/masonr/yet-another-bench-script
## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ##
Tue Oct 1 09:30:59 PM EEST 2024
Basic System Information:
Uptime : 10 days, 6 hours, 22 minutes
Processor : Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
CPU cores : 80 @ 3100.019 MHz
AES-NI : ✔ Enabled
VM-x/AMD-V : ✔ Enabled
RAM : 502.5 GiB
Swap : 256.0 GiB
Disk : 260.6 TiB
Distro : Debian GNU/Linux 12 (bookworm)
Kernel : 6.8.4-2-pve
VM Type : NONE
IPv4/IPv6 : ✔ Online / ❌ Offline
IPv4 Network Information:
ISP : Classified
ASN : Classified
Host : Classified
Location : Oradea, Bihor County (BH)
Country : Romania
fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/sdb1):
That is almost kinda RAID 10 territory, and RAID 60 has a ton of overhead behind it. We tried with ZFS, we failed; we tried again, we failed again.
YES - Bongo, I like you
We have a few ZFS NAS boxes that feed storage via iSCSI to VMware ( VMFS6 file system ) for 3 nodes in a small cluster. The customer runs mainly databases that do a constant 3-5 GBPS of reads and writes; it works like a charm.
Depends,
If the NVMe drives are consumer grade, yes, MDADM mirror ( no ZFS, sorry, I will explain why in a bit ).
If the NVMe drives are enterprise - I would just install the OS on one of them and add the second as a data drive, maybe for SQL only or heavy read/write stuff ( that is how much I trust enterprise NVMe, yes ).
Either way you must have at least one backup ( if you do not have one, we can help here with a few TB ).
Why no ZFS? Because your kernel will try to juggle RAM ( ARC ) between what the OS needs and the cache for ZFS.
ZFS has a pretty complicated cache system - AKA ARC. What it definitely hates by design is having the available usable size ( the RAM ) modified on the go / dynamically.
It was written to consume as much as possible.
RAM is a fixed amount in a server, but if you use ZFS and have PROD apps running at the same time, that balance goes to shit; ARC will always try to adjust to the newly available value, rendering its purpose useless - or let's say poor in performance ( as it wants to cache the physical blocks that are used frequently, while you keep modifying the space available to it ).
Yes, you can set a fixed amount for ARC at boot, but then you are pissing on the whole point of ARC; in that case, go MDADM ( unless you have something like 40 TB of drives and 128 GB of RAM and allocate a fixed 40 GB of ARC for ZFS, but again, ARC performs best when it has as much as possible ).
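( The fixed-ARC knob I am talking about is the zfs_arc_max module parameter; for the 40 GB example above it would look like this on Linux/OpenZFS: )
# /etc/modprobe.d/zfs.conf - cap ARC at 40 GiB ( value in bytes ), then rebuild the initramfs
options zfs zfs_arc_max=42949672960
# or change it on a running system
echo 42949672960 > /sys/module/zfs/parameters/zfs_arc_max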
As a note:
We did a test: 160 TB in drives and 512 GB for ARC, storage served via iSCSI to VMware.
Specs - Xeon v4 3.0 GHz CPU, H330 HBA and Chelsio T520-CR Ethernet cards; outcome = Jesus almighty!
We put the "SAN" against an EMC and a NetApp ( spinning disks ) belonging to some customers; ZFS ate them by a high margin in whatever IO/BW test you can imagine.
Unfortunately Prox sucks at iSCSI and we could not port the project to it; also, NFS is not an option - maybe with RDMA, but that is pretty weak for now.
Back to our topic:
In principle all caches work like this: the controller sees frequently accessed data ( blocks ) and puts it into the cache - end of story for READ; writes behave a little differently and are more complex, but more on that another time.
I would not waste my time with it, sincerely.
I need some more details on this, or maybe I have misunderstood something?
Host-C - VPS Services Provider - AS211462
"If there is no struggle there is no progress"
Nah, in that part I meant that if Shakib successfully installed Proxmox with ZFS ( I have never tried Proxmox ), then it seems the Proxmox installer offers ZFS RAID during setup. So my advice was:
1. Install Proxmox on bare metal.
2. Spin up a VM with maxed resources minus the ZFS RAM ( for example, if our server has 128 GB and 2x 4TB NVMe, then leave 8-10 GB of RAM for the host and the rest, ~119 GB, for the VM ).
3. And inside that VM install cPanel/DA and sell shared hosting.
So we have successfully completed the task of a "shared hosting server on ZFS RAID".
Regarding HW RAID: I have never used any of those devices, but is it true that we should buy two of the same model, and if one of them dies we just swap it?
I mean, if our server with mdadm RAID dies, we replace the motherboard, and 99% of the time we just "power on our PC" and the server comes back online.
And if we only have the one HW RAID card, with no spare in stock, and it dies, will it be impossible to restore the data?
For example - buying a single HW RAID card of any kind for home use would not be a great and smart idea, then?
true
Yes, you would need to replace it with the same RAID controller, possibly with the same firmware, otherwise no data. But they are hard to kill.
yup go with mdadm then.
Till now, from all the discussion, I have concluded this:
zfs -> use only when data integrity is of utmost importance, not performance: mostly for data storage.
mdadm -> simple, straightforward, easy for normal RAID.
hardware raid -> even simpler, marginal performance gain ( can be ignored though ), risk of controller failure.
Host Mayo Ltd: Dedicated Server Hosting & UK VPS Server
Yes, that is true to some extent; this you will have to google around - I do not know if MDADM maps the drives by UUID or by /dev/sdXXX, as on a new motherboard drives can change naming from sda to sdb or similar; in that case you have to import and re-sync the array.
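( What I would check, at least: the array UUID is stored in every member's superblock, so on the existing box: )
# list arrays with the UUIDs recorded in the member superblocks
mdadm --detail --scan
# after a board swap this re-assembles whatever it finds by that UUID, regardless of sdX naming
mdadm --assemble --scan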
With ZFS this will never happen; you can basically put the drives in a desktop spread across 3 different controllers and it makes no difference to ZFS, as it maps the drives by UUID.
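( i.e. after moving the disks to another box it is just: )
# with no arguments this scans attached disks and lists any importable pools
zpool import
# then import by name, optionally pointing it at the stable by-id paths ( pool name is an example )
zpool import -d /dev/disk/by-id tank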
With HW RAID it will mostly depend on your HW RAID card generation; anything newer than 2015-ish will import an array created on the same model card 99% of the time. Across brands and models, not really - low probability.
On the H730 and later generations the import works 99% of the time, even from an H730 to an H740P, but as you pointed out, there is a risk.
Firmware was a problem a few years back, especially with HPE and LSI/FSC controllers.
YES!
YES, and I will add that this one has a very low impact on CPU time, but that depends on the RAID type used.
There is one thing I did not like in MDADM: in large RAID setups ( 12+ disks ), if you have some disks that are slower than the rest, it will keep de-synchronizing the array.
MDADM + NVMe for cache = just don't, and this is true for ZFS also.
ZFS, on the other hand, has no problems with this, but that is mostly because it uses RAM for cache ( ARC ).
You can mix SATA with SAS ( 5,400 to 15,000 RPM, mixed models and brands ) with PCIe NVMe, all in the same pool/raid if you so wish, and it will work - 100% tested, and not just once, imagine that.
I cannot give you numbers on failed HW controllers, I can only tell you what I personally saw in the last 20+ years.
The controllers that failed were mostly HPE, and those were the P4xx, P2xx and P8xx series ( up to Gen8 HP servers ), as the heatsink was tiny and the ASIC ran hot as hell. HPE also took the "wise" decision to use plastic pins to fix the heatsink to the PCB - not wise when you have an 80 degree Celsius chip... ( LSI, FSC and Dell use metal pins to secure the heatsink ).
If the ambient temperature is above 24 Celsius even an HBA will fail, so in this calculation we might want to exclude home/basement-hosted servers.
I like HW RAID cards because:
I wish to mention that my view is that of a hosting provider, where I have to put the data integrity of my customers above all other metrics ( speed, IO ).
Also, I have to be able to fix/change a failed part fast, so fucking around in tickets or video calls with remote hands is a waste of time for me; likewise, debugging WTF happened after the last update of the ZFS utils or MDADM is a waste of time.
Time is the only resource we cannot buy; spare parts - yes, the stores and eBay are full of them.
If you need anything, give me a PING via IPV4
Host-C - VPS Services Provider - AS211462
"If there is no struggle there is no progress"