Is nested virtualisation as terrible as professionals say?
I've always heard that nested virtualisation is neat for testing, but should never be used in production because of performance and other unintelligible, mumbled "reasons".
What are people's actual experiences?
Is anyone running nested virt in production?
Is it performant?
poll
- Do you use nested virtualisation? (36 votes)
- Never tried it: 30.56%
- Tried it, worked poorly: 22.22%
- I use it for testing: 30.56%
- I use it for production: 11.11%
- Other (post in thread): 5.56%
Comments
Why would you want to use it? Constrained resources? Sandboxed sandboxing?
Does running containers in a VM count? I do, to keep docker-supplied functionality separate from full OS LXC containers. It's all quite low volume.
There's probably a better way to reach the same result :-)
I had in mind specifically KVM-in-KVM, for example running a hypervisor like Proxmox or XCP-ng in a virtual machine.
Searching for "nested" here nets many people asking vendors for the feature - I figured there's some experience floating around
For the "worked poorly" votes, it would be nice to have some insight into which combinations worked poorly, and especially, for those running it in production, which combination that is and whether it runs satisfactorily.
I think most people ask this because they need the AMD-V or Intel VT-x CPU flags enabled for containerization, not necessarily for KVM-in-KVM. Nested KVM-in-KVM has very poor disk performance inside the nested box and a slightly unstable network; that's what I experienced myself when running a nested test server under high load. Another issue for production setups is that you add another layer, which increases the trusted computing base.
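If anyone wants to check this on their own boxes, here's a minimal Python sketch (assuming a Linux guest with KVM; the procfs/sysfs paths are the standard ones but may not exist on every distro or kernel) that reports whether VT-x/AMD-V is exposed inside the VM and whether the local KVM module has nesting turned on:

```python
# Minimal sketch, assuming a Linux guest with KVM: check whether the
# VT-x/AMD-V CPU flag is exposed to this OS, and whether the loaded KVM
# module reports nested support. Paths are standard procfs/sysfs locations
# but may be absent on some systems.
import re
from pathlib import Path

def virt_flags_exposed() -> bool:
    """True if vmx (Intel VT-x) or svm (AMD-V) appears in /proc/cpuinfo."""
    cpuinfo = Path("/proc/cpuinfo").read_text()
    return re.search(r"\b(vmx|svm)\b", cpuinfo) is not None

def nested_kvm_enabled() -> bool:
    """True if the kvm_intel/kvm_amd module reports nested=Y (or 1)."""
    for mod in ("kvm_intel", "kvm_amd"):
        param = Path(f"/sys/module/{mod}/parameters/nested")
        if param.exists():
            return param.read_text().strip() in ("Y", "1")
    return False

if __name__ == "__main__":
    print("VT-x/AMD-V exposed to this OS:", virt_flags_exposed())
    print("nested KVM enabled here:", nested_kvm_enabled())
```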
Gotcha, thank you!
Hmm, I guess it's just an unoptimized use case then. I can imagine that all kinds of queues and timings in networking/disk IO/CPU scheduling break down when you recurse them, like how TCP-in-TCP tunnels sound like they should work but just don't. Thanks!
Now I'm curious if there are any workarounds, like having a network passthrough driver, or a non-journalling filesystem on one of the levels. It would need an active community to figure out a setup like that.
Edit: Thinking some more, I guess a performant system would either need massive tuning and tweaks on the scale it took to implement containers on Linux, or the bare-metal host would need to be aware of all recursive guests. And at that point you might as well simplify the system down to a single hypervisor with multiple tenants. Huh.
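Along the "passthrough driver" line of thought, here's a small hedged sketch (Linux only; the paths are standard sysfs, and which driver names count as "good" is up to you) that lists what drivers the guest's NICs and disks are actually bound to, so you can at least confirm you're on virtio rather than fully emulated hardware:

```python
# Sketch (Linux guest, standard sysfs paths): list the kernel drivers bound to
# the guest's network and block devices, e.g. to confirm virtio_net / virtio_blk
# are in use instead of fully emulated hardware.
from pathlib import Path

def bound_driver(dev: Path) -> str:
    """Resolve the device's driver symlink, or '?' if there is none (e.g. lo)."""
    drv = dev / "device" / "driver"
    return drv.resolve().name if drv.exists() else "?"

for net in sorted(Path("/sys/class/net").iterdir()):
    print(f"net {net.name:12s} -> {bound_driver(net)}")

for blk in sorted(Path("/sys/class/block").iterdir()):
    print(f"blk {blk.name:12s} -> {bound_driver(blk)}")
```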
Used it for testing:
terrible performance compared to what? if it's just some bullshit MBA talk, ignore them.
this argument is missing a lot of information. for example:
since this is mostly about resources, you should weigh the cost efficiency: does the decision to use nested virtualization cause more trouble than it's worth (like increasing customer support costs because your users notice something "weird" about the performance)?
sure it might be slow, but if you're able to make sure it has better cost efficiency overall in production, support and maintenance terms, then just run it, why not? (looking at ye, weird legacy systems).
from my experience, I only have this level of nested virtualization:
1. own hardware
2. install XCP-ng on it
3. from XCP-ng, create VMs
4. inside the VMs, deploy the software using docker / kubernetes
there is no significant performance hit, and this setup runs just fine in a production environment.
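for anyone who wants to sanity-check which layer a workload actually landed on in a stack like that, here's a rough Python sketch; the DMI paths are standard Linux sysfs, but the hypervisor vendor strings it looks for are assumptions and differ per platform:

```python
# Rough sketch: report which layer this process seems to be running in.
# The hypervisor vendor strings below are assumptions and vary by platform
# (XCP-ng/Xen guests typically show "Xen" in DMI, KVM guests show "QEMU").
from pathlib import Path

HYPERVISOR_HINTS = ("QEMU", "KVM", "Xen", "VMware", "VirtualBox", "innotek")

def dmi(field: str) -> str:
    """Read a DMI field from sysfs; empty string if unavailable."""
    p = Path(f"/sys/class/dmi/id/{field}")
    return p.read_text().strip() if p.exists() else ""

def describe_layer() -> str:
    if Path("/.dockerenv").exists():
        return "inside a Docker container"
    vendor = f"{dmi('sys_vendor')} {dmi('product_name')}".strip()
    if any(hint.lower() in vendor.lower() for hint in HYPERVISOR_HINTS):
        return f"inside a VM ({vendor})"
    return f"no hypervisor hint in DMI ({vendor or 'n/a'}); possibly bare metal"

if __name__ == "__main__":
    print(describe_layer())
```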
I'm using nested virtualization in production (KVM-in-KVM).
I don't see any performance issues.
I haven't tested it for applications that are very, very sensitive to the slightest latency.
But it seems to work well for many uses.
Nearly all NAT services either start on or run on nested virt. Hell, even BuyVM ran regular VPSes on it for a while.
As long as the passthrough is set correctly, you will not even notice a difference performance-wise.
I think there can be cases where you will see an impact, but most likely that is due to missing awareness between the layers.
For IO, think of stripe and block sizes that don't match, or cache settings that compete with or even conflict with each other (barriers etc.). For the network, think of overhead in MTUs and similar stuff.
In the end I have to side with @remy here. If your layers match on the critical settings, there should not be much of an issue or a big performance hit after all, even for the often-blamed KVM-in-KVM.
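To make the MTU point concrete, here's a toy calculation (assuming a VXLAN-style overlay between the layers, which adds roughly 50 bytes of encapsulation; other tunnels have different overheads):

```python
# Toy example of the MTU point: with a 1500-byte outer MTU and a VXLAN-style
# overlay (assumed here), the encapsulation adds ~50 bytes of headers, so the
# inner interfaces should be set to 1450 or smaller, otherwise packets get
# fragmented or dropped at the boundary between layers.
OUTER_MTU = 1500
VXLAN_OVERHEAD = 14 + 20 + 8 + 8  # outer Ethernet + IPv4 + UDP + VXLAN header

inner_mtu = OUTER_MTU - VXLAN_OVERHEAD
print(f"inner MTU should be <= {inner_mtu}")  # prints 1450
```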