The provider of this very nice VPS asked me to retest with RAM doubled to 4 GB. The short answer seems to be that increasing the RAM doesn't help. Even with 4 GB RAM, there still is a lot of I/O wait.
"very nice VPS" is not sarcasm. It really is a great VPS! And the provider's customer service is great as well!
I hope everyone gets the servers they want!
You are wondering why IO wait is higher on a VPS with other people sharing it?
Free Hosting at YetiNode | Cryptid Security | URL Shortener | LaunchVPS | ExtraVM | Host-C | In the Node, or Out of the Loop?
I trust the provider 100%. The provider says it's his least busy Node. There might be nobody else on it, although the provider didn't say so explicitly.
Compared with their Nodes, Qemu VPSes often show reduced disk performance in my experience. Interestingly, network performance on Qemu VPSes seems pretty good compared with network performance on the Nodes. Disk I/O is different, though. Even using Qemu's virtio devices instead of full hardware emulation doesn't seem to bring Qemu VPS disk I/O up to parity with the Node. For disk I/O there often seems to be a penalty of at least approximately 15 or 20%.
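A "15 or 20%" penalty can be read two ways: extra elapsed time, or lost throughput. For the same measurement the two numbers differ. The small Python sketch below computes both. The function names and the 100 s / 120 s timings are hypothetical and are not measurements from this thread.

```python
def time_penalty(t_node, t_vps):
    """Extra elapsed time on the VPS, as a fraction of the Node's time."""
    return (t_vps - t_node) / t_node


def throughput_penalty(t_node, t_vps):
    """Throughput lost on the VPS for the same amount of data.

    For a fixed amount of data, throughput is proportional to 1 / elapsed time.
    """
    return 1.0 - t_node / t_vps


# Hypothetical timings: the same task takes 100 s on the Node and 120 s in the VPS.
print(f"time penalty:       {time_penalty(100, 120):.1%}")        # 20.0%
print(f"throughput penalty: {throughput_penalty(100, 120):.1%}")  # 16.7%
```

So "20% slower" in wall-clock time corresponds to about 17% less MB/s. It is worth saying which of the two is meant when comparing results.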
Previously I was only looking at overall time for disk I/O tasks. Now that @cmeerw kindly introduced me to vmstat, I am able to see the I/O wait numbers. Thus now I can imagine that the disk I/O performance penalty is due to I/O wait.
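vmstat's wa column comes from the kernel's cumulative CPU time counters in /proc/stat. Below is a minimal sketch of that calculation. It assumes the usual field order of the aggregate cpu line (user, nice, system, idle, iowait, irq, softirq, steal). vmstat's own arithmetic may differ in details, and the sample snapshots are made up.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counters.

    Field order: user nice system idle iowait irq softirq steal [guest guest_nice].
    Guest time is already included in user/nice, so only the first 8 fields are kept.
    """
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(x) for x in fields[1:9]]


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two /proc/stat snapshots."""
    deltas = [a - b for a, b in zip(parse_cpu_line(after), parse_cpu_line(before))]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return f.readline()


# Made-up snapshots: 100 of the 200 elapsed ticks were spent in iowait.
before = "cpu  100 0 50 800 50 0 0 0 0 0"
after = "cpu  110 0 60 880 150 0 0 0 0 0"
print(iowait_percent(before, after))  # 50.0
```

On a live system, take read_cpu_line() twice, a second or so apart, while the test is running. Then pass both lines to iowait_percent().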
@cmeerw said: it really is just waiting for the OS to do that I/O.
Of course you are right. What I want to understand, though, is why the VPS OS, even using virtio Qemu, takes longer to do the same file I/O task than the Node takes to do it directly without Qemu. Also, why is there no similar penalty for network I/O? The file I/O penalty shown here seems to exist even on empty Nodes with only one test VM running.
Thanks (yet again!) to your kindness, @cmeerw, I can now see a little more deeply. Maybe I now need to find a kind person on the Qemu team who could give me a little tour of the code and a hint about why Qemu, even with virtio, seems to take longer for file I/O.
Do we have someone here who knows the Qemu code?
What I did this morning was create a 2-core, 4 GB VPS on my E3 machine. Then I ran a test on the E3 VPS. Here are the results. Almost no I/O wait!
Why is there so much I/O wait in the Ryzen VPS results shown above and practically none on the E3 VPS?
Actually, looking at the Ryzen results again, why is it only doing around 150 MB/s (even with a 1M block size)?
Could you maybe just run:
dd if=chronos-20240904.tgz.cpt of=/dev/null bs=1M
Surely, that should be closer to a few GB/s (similar to the fio results you had earlier).
One other thing that might be an advantage for the E3 VPS could be that it still benefits from the host system's cache (depending on how the storage is set up).
The kind provider and I finally got around to testing the Ryzen VPS with increased RAM, as @zakkuuno suggested on OGF.
Bingo! No I/O wait!
Why does the Ryzen VPS need more RAM than the E3 VPS needs?
bashvm@E3-VPS:~$ date; dd if=chronos-20240904.tgz.cpt bs=1M | /usr/bin/time -v sha256sum -c 6
This line is a weird cut-and-paste error. It should have been:
bashvm@E3-VPS:~$ date; dd if=chronos-20240904.tgz.cpt bs=1M | /usr/bin/time -v sha256sum -c chronos-20240904.tgz.cpt.SHA256; date
Right, there is no I/O wait because there is no I/O now (the "bi" column in vmstat is all 0 because the file is cached now).
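To check this from a script, you can pick the bi (blocks read in) and wa (I/O wait) columns out of vmstat's output by name. The small sketch below assumes the usual procps vmstat layout, which has a header line of column names. The sample output is invented. vmstat's first data row is an average since boot, so it says little about the current test.

```python
def vmstat_columns(output, names=("bi", "wa")):
    """Return {column_name: [value per sample]} for the requested vmstat columns."""
    header = None
    result = {name: [] for name in names}
    for line in output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if "bi" in tokens and "wa" in tokens:
            header = tokens  # the column-name line (vmstat may repeat it)
            continue
        if header is None or not tokens[0].isdigit():
            continue  # skip the "procs ---memory---" banner line
        row = dict(zip(header, tokens))
        for name in names:
            result[name].append(int(row[name]))
    return result


# Invented sample: the first row reads from disk, the second is fully cached.
sample = """procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 100000  2000 300000    0    0   500    10  100  200  5  2 80 13  0
 1  0      0 100000  2000 300000    0    0     0     0  100  200 12  1 87  0  0
"""
print(vmstat_columns(sample))  # {'bi': [500, 0], 'wa': [13, 0]}
```

With real data you could use the output of `subprocess.run(["vmstat", "1", "5"], capture_output=True, text=True).stdout`. If bi is all 0 after the first row, the file was read from the page cache.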
Possible answer: The Ryzen needs more RAM because it is using NVMe instead of the SSD that is in the E3? So it's not the single-threaded sha256sum that makes the difference here? Instead it's the difference in the disk setup?
The time discrepancies you're seeing when running sha256sum -c between your E3 dedicated server and Ryzen 7950X VPS are puzzling, especially given that the Ryzen 7950X is a far more powerful processor and is equipped with NVMe storage. However, there are multiple factors that could explain why the E3 dedicated server is outperforming the Ryzen VPS for this specific task, despite the hardware advantages of the VPS.
Let's break this down:
Single-threaded Nature of sha256sum
The sha256sum utility is single-threaded, as you're observing: it computes the checksum on one CPU core. This explains why even a more powerful CPU like the Ryzen 7950X may not show its full potential here. It isn't using multiple cores, so most of the Ryzen CPU's additional power goes unused.
In your case, the E3 has fewer but dedicated cores with no virtualization overhead, and that may be enough to perform similarly or even better for single-threaded workloads like sha256sum.
VPS Virtualization Overhead
Even though the Ryzen VPS has powerful hardware, virtualization adds a layer of overhead. This includes managing the CPU, memory, and I/O operations. The performance of vCores in a VPS may not match the physical cores of a dedicated server, even if the VPS hardware is theoretically faster.
The dedicated server on the E3 is running without any virtualization overhead, so it can access the hardware directly without hypervisor intervention. This direct access can make a noticeable difference in I/O-heavy operations like checksum verification.
I/O Performance (SSD RAID 10 vs NVMe RAID 10)
You mentioned that the E3 dedicated server has SSD RAID 10, while the Ryzen VPS has NVMe RAID 10. Although NVMe drives are generally faster than SATA SSDs, performance in a VPS can suffer due to I/O throttling, contention with other VPSes, or shared underlying infrastructure that limits the maximum throughput of the storage system.
RAID 10 on SSDs in your dedicated E3 might be very well-optimized and not subject to any sharing. Meanwhile, the NVMe RAID 10 in the VPS could be shared across multiple tenants. Even if it's fast, contention for resources might slow down the actual I/O performance you're seeing on the VPS.
Impact on sha256sum -c: Since this command reads through the entire file, the I/O performance will heavily affect the overall execution time. If the VPS has any bottlenecks or restrictions on its NVMe performance, this could lead to slower I/O compared to the dedicated SSD RAID 10 setup.
Different Operating Systems
The E3 is running Ubuntu 22.04.4 LTS, while the Ryzen VPS is running Debian 12.7. While both are Linux distributions, there might be differences in file system tuning, kernel versions, or I/O scheduling mechanisms that could impact how efficiently each system handles large file reads and checksumming operations.
The Ubuntu installation on the E3 could have a more optimized I/O scheduler for this specific workload, or it might handle file caching differently than Debian on the VPS.
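Whether the scheduler actually differs is easy to check. On Linux, /sys/block/<device>/queue/scheduler lists the available schedulers with the active one in square brackets. The small sketch below reads that file for every block device. Inside a VPS it shows only the guest's choice, not the host's.

```python
import re
from pathlib import Path


def active_scheduler(text):
    """Return the bracketed (active) scheduler, e.g. 'mq-deadline' from
    '[mq-deadline] kyber bfq none'; None if nothing is bracketed."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def schedulers_by_device(sys_block="/sys/block"):
    """Map each block device name (e.g. 'vda', 'nvme0n1') to its active I/O scheduler."""
    base = Path(sys_block)
    if not base.is_dir():
        return {}
    result = {}
    for path in sorted(base.glob("*/queue/scheduler")):
        result[path.parent.parent.name] = active_scheduler(path.read_text())
    return result


if __name__ == "__main__":
    for device, scheduler in schedulers_by_device().items():
        print(device, scheduler)
```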
I/O Caching and File System Differences
The file system in use (ext4, XFS, etc.) could impact performance as well. Different file systems have varying levels of efficiency for reading large files. In addition, Linux systems often cache recently accessed files in memory, so depending on when and how the sha256sum -c was run, the E3 might have benefited from better caching of the file in memory compared to the Ryzen VPS.
RAID on the VPS host may also add latency. Even though NVMe is fast, the RAID layer (a hardware controller or software RAID), especially underneath a virtualization stack, can add latency to each request.
Disk I/O Saturation and Contention
On a VPS, you're often sharing resources like the disk subsystem with other virtual machines. Even if your VPS has NVMe storage, if other users on the same physical machine are performing heavy disk operations at the same time, your VPS might experience lower-than-expected I/O performance.
This could explain why real-world performance on the VPS is lower than expected, even though benchmarks like Geekbench and fio show the Ryzen VPS is generally faster in isolated tests.
real, user, and sys Time Differences
The "real" time is the actual elapsed time from start to finish of the command.
The "user" time refers to the amount of time the CPU spent executing user-space (application-level) code.
The "sys" time is the CPU time spent executing kernel code on behalf of the command, such as system calls and copying file data into the process. Time spent blocked waiting for the disk counts as neither user nor sys time; it only shows up in "real".
If "real" is much larger than "user" + "sys" on the VPS, the command spent that extra time waiting, most likely for I/O. That would make sense if the I/O system on the VPS is slower due to contention, throttling, or other factors.
Graphics Processors (GPU)
The graphics processors (iGPUs) on either system are not involved in the sha256sum operation. SHA-256 hashing is a CPU-bound operation, and the typical sha256sum utility does not leverage GPU acceleration unless specifically configured to do so (which is not common with basic file checksum verification tasks).
CPU Frequency Differences
Although the Ryzen 7950X is much newer and faster per core, the E3 might sustain its boost frequency more consistently under load, particularly since it's dedicated hardware. A vCPU on the Ryzen VPS, on the other hand, is scheduled by the hypervisor alongside other tenants' vCPUs and may not always run on a core at full boost, which would impact single-threaded tasks like sha256sum.
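One way to look at clock speed is the "cpu MHz" field in /proc/cpuinfo, sampled while the test runs. The sketch below handles the x86 format. Treat the numbers with caution inside a VPS, where the value may be whatever the hypervisor reports rather than the host core's live frequency. Sampling on the host as well gives a better picture.

```python
def cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of each logical CPU in /proc/cpuinfo text (x86 layout)."""
    values = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            values.append(float(value))
    return values


def read_cpu_mhz(path="/proc/cpuinfo"):
    """Read and parse /proc/cpuinfo (Linux only)."""
    with open(path) as f:
        return cpu_mhz(f.read())
```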
Conclusion:
The E3 dedicated server is outperforming the Ryzen 7950X VPS in your sha256sum -c tests due to a combination of factors:
Single-threaded nature of sha256sum – the additional cores and raw power of the Ryzen CPU are not fully utilized.
Virtualization overhead – the VPS introduces some overhead that affects I/O performance and CPU efficiency.
I/O performance bottlenecks – while the VPS uses NVMe storage, it may be throttled or subject to resource contention, whereas the E3 has direct and optimized access to its SSD RAID 10 array.
File system and kernel differences – slight differences in how the operating systems handle I/O could also contribute to the disparity.
Your results are not unusual given these factors, and the key issue seems to be the VPS’s I/O performance and the single-threaded nature of sha256sum -c.
@hostmaze shut up AI!