@AlwaysSkint said:
This ain't good: been down & up, then down again..
From Client Area, when trying to see why ATLZ is down.
Could Not Resolve Host: Solusvm.virmach.com
Some weird DNS issue that just went away. I don't remember which DNS provider we set up, but it's possible it briefly had a burst of connection issues on that route. Unrelated to SolusVM: it's WHMCS, which couldn't resolve Google either.
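(Not from the original post, just a hedged illustration of how one might tell a local resolver problem on the WHMCS side apart from the SolusVM master actually being down. Only the hostname comes from the thread; everything else is generic.)

```
# Hedged diagnostic sketch (assumes a Linux host with dig installed).
# If the system resolver fails but a public resolver answers, the problem
# is local DNS on the WHMCS side, not the SolusVM master being down.

HOST=solusvm.virmach.com

# 1. Ask the system resolver (what WHMCS/cURL would use).
getent hosts "$HOST" || echo "system resolver failed for $HOST"

# 2. Ask a public resolver directly, bypassing local DNS.
dig +short "$HOST" @1.1.1.1

# 3. Sanity check with an unrelated name; if this also fails, it's clearly
#    a local resolver issue ("couldn't resolve Google either").
getent hosts google.com || echo "system resolver also failing for google.com"
```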
ATLZ still not opening though - too late for me now. Will check again in the morning.
Hope that this isn't the 3rd IP change - though looks as if the node has issues; sorry can't recall which one.
It wisnae me! A big boy done it and ran away.
NVMe2G for life! until death (the end is nigh)
I noticed a new option in "Switch IP" when I logged into my friend's account. I think it will be useful for them, but I fear that many tickets will be opened...
@tototo said:
I noticed a new option in "Switch IP" when I logged into my friend's account. I think it will be useful for them, but I fear that many tickets will be opened...
Only if people give their friends their account password.
@Virmach That's both my ATLZ (a nameserver, 149.57.205.xxx) and DLSZ VPS (149.57.208.xxx) inaccessible in SolusVM. [Edit] Both are completely down. Open a Priority ticket for ATL, so that you can trace the node?
It wisnae me! A big boy done it and ran away.
NVMe2G for life! until death (the end is nigh)
Seattle finally got the switch configuration changes we requested. SEAZ004 and SEAZ008 are having some issues with it (indirectly), so we're waiting on a hands request for that; otherwise, Seattle networking is finally decent.
ATLZ007 had a disk issue that's been fixed, as did SEAZ010.
@AlwaysSkint said: Open a Priority ticket for ATL, so that you can trace the node?
Tickets are unfortunately pretty much useless at this point, and we're handling everything independently of them for the time being. I still try to keep an eye out for anyone who mentions anything unique where our intervention is required, such as data not having been moved over during migration, but those are a needle in a haystack.
@Emmet said:
My VMs keep being rebooted... Severe packet loss happens most days.
Will there be compensation account credits? Or should I just be thankful if my BF deals don't end up deadpooling?
That's very vague, so I'm unable to even figure out whether you're in a location facing problems, such as Seattle, or whether it's unique to your service. We'll allow people to request credits as long as they're very clear, and basically concise, when making the request. So, for example, if you said:
My VMs keep being rebooted... Severe packet loss happens most days.
Will there be compensation account credits?
We'd probably just have to close the ticket if we get a ton of requests like that.
A request probably needs to include:
The name of the node/service
What issue you faced
The duration
References to any tickets reporting it, or to the network status page, monitoring, or anything else showing the issue
I'd love to make it easy on everyone and provide SLA credit for everyone affected at every level, but at this point that would be basically all of our revenue for the whole month, at a time when we're already spending a lot on new hardware and new colocation services, paying lots of emergency hands fees, working extra-long hours, and not charging anyone extra for the upgrade, while we've actively stopped selling most new services to focus on the transition.
In most cases, if you're already not paying full price for a plan on our website and are on a special, we immediately lose more money just from the request being put in, but of course it's your right.
We just kindly ask that people keep the reality of the situation in mind and consider showing some mercy where possible.
Anyway, that's not to say I haven't been thinking about it and trying to figure out the best way to make everyone happy and make it up to people, within the realm of possibility and without them having to open a ticket for SLA credit or service extensions.
@Mastodon said:
I'm supposed to be in NL. Still no luck getting the service online. I access the VPS through VNC.
Node: AMSD027
Main IP pings: false
Node Online: false
Service online: online
Operating System: linux-ubuntu-20.04-x86_64-minimal-latest-v2
Service Status: Active
GW / IP in VPS matches SolusVM.
(Sent a ticket on this, didn't get a response however.)
Any info and/or remedy?
Cheers!
All I can immediately tell you is that it's not a node-wide issue: only one single person's service shows as "offline", and many others are using networking and racking up bandwidth usage, with their services pinging. The easiest fix, if possible, might be to install a new OS; otherwise, make manual modifications and/or try the reconfigure networking button maybe 3-4 times every 30 minutes, just in case it's not going through properly.
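(A rough sketch of the "manual modifications" route, not an official procedure: re-applying the SolusVM-assigned address and gateway by hand from the VNC console on a guest like the Ubuntu 20.04 one above. The interface name and addresses are placeholders.)

```
# Hedged sketch: re-apply the address/gateway shown in SolusVM by hand.
# IFACE, ADDR and GW are placeholders; use the exact values listed for
# your service. These commands are not persistent across reboots (on
# Ubuntu 20.04 you would also update /etc/netplan/ afterwards).

IFACE=eth0                 # check with: ip -br link
ADDR=149.57.x.x/24         # main IP + netmask from SolusVM
GW=149.57.x.1              # gateway from SolusVM

ip addr flush dev "$IFACE"
ip addr add "$ADDR" dev "$IFACE"
ip link set "$IFACE" up
ip route replace default via "$GW" dev "$IFACE"

# Quick checks: can we reach the gateway and the outside world?
ping -c 3 "$GW"
ping -c 3 8.8.8.8
```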
There is technically a small chance of an IP conflict. We have scripts for that, but they're not perfect; we run several rounds of them, over and over, going through services that don't ping and reconfiguring them.
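(As an illustration only, not VirMach's actual script: a minimal version of that kind of pass could simply take a list of assigned IPs and flag the ones that don't answer, so they can be reconfigured or checked for conflicts. The ips.txt file is hypothetical.)

```
# Minimal sketch of a "find services that don't ping" pass.
# ips.txt is a hypothetical file with one assigned IP per line.

while read -r ip; do
  if ping -c 2 -W 2 "$ip" > /dev/null 2>&1; then
    echo "OK      $ip"
  else
    # Doesn't answer: candidate for reconfiguration or an IP-conflict check
    # (e.g. compare the MAC seen in 'ip neigh' or the switch table against
    # the MAC assigned to that VM).
    echo "NO-PING $ip"
  fi
done < ips.txt
```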
I will be idling 2 or 3 more services next year, but the exchange rate will increase the price by about 20-25%. I should have topped up account credit in advance.
That said, they are far cheaper and I am happy (even considering the downtime due to migration).
Tokyo is finally getting its long-overdue network overhaul. We had a good discussion about it and I want to kind of share what we believe is occurring right now and why.
It turns out that pretty much every network engineer I've spoken with agrees that large VLANs were not necessarily the issue. I know that when this was previously discussed, everyone pretty much immediately pointed to that, and since we were initially skeptical of doing larger VLANs and are now going back to smaller VLANs, it might appear that it was the only issue all along, so I just wanted to clarify what's actually going on.
Large VLANs are fine and have been independently tested at this point as not causing issues. No collisions or ARP issues generated by the large VLANs, confirmed. The switches are also more than capable of handling it.
The NICs are also not necessarily the issue. We've had multiple hosts confirm they've used the same integrated NIC with large VLANs without any issues.
The motherboard and everything else are also fine, even at higher density per server, but the caveat here is that the original host that told us they use larger VLANs didn't mention that they had dedicated NICs.
All of our settings/configuration are also correct, and nothing negative resulted from any custom changes on our end. In fact, that's probably what has allowed us, for the most part, to have a relatively functional network.
So, basically:
Large VLAN, fine.
Motherboard/Ryzen and high VM quantity, fine.
NIC, fine.
Where the problem gets created is the combination of the NIC with the high VM quantities with the larger VLANs. Then everything goes out the door and the problem appears. We still have not found the proper solution, and I'm sure it exists, but the easy cop-out of treating the symptoms is splitting the VLANs. So we're not actually solving the problem, we're just avoiding it. There are multiple ways we could avoid it, and the route we've selected is splitting up the VLANs. We could also have mixed up VMs per node so that there's a lower quantity per node (but larger VMs, so not necessarily lower usage). We could also have gone with a different dedicated NIC, and we could probably also have utilized the other 1-3 ports per server to balance the traffic. All of these would just prevent the situation that leads to the problem from arising, but we decided to split VLANs.
The good news is that we've pretty much confirmed it's not a VLAN size issue, so some locations can still use larger VLANs, which allows for greater flexibility when it comes to migrating within the same location, keeping your IP address, and all the other benefits that come with that. For higher-quantity locations, such as Los Angeles, Seattle, Tokyo, and probably San Jose, it makes more sense to split, at least until we find the correct solution.
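(Purely as a hedged aside on how one could sanity-check the "large VLAN means ARP blow-up" theory on a Linux hypervisor; none of this comes from the post, and the bridge name is a placeholder.)

```
# Count entries in the kernel neighbor (ARP) table and compare them against
# the garbage-collection thresholds. If the entry count sits near gc_thresh3,
# large-VLAN ARP pressure is plausible; if it's nowhere close, it probably isn't.

ip -4 neigh show | wc -l

# Kernel thresholds (defaults are often 128 / 512 / 1024):
sysctl net.ipv4.neigh.default.gc_thresh1 \
       net.ipv4.neigh.default.gc_thresh2 \
       net.ipv4.neigh.default.gc_thresh3

# Per-interface counts can also show whether one bridge dominates:
ip -4 neigh show dev vmbr0 | wc -l   # 'vmbr0' is a placeholder bridge name
```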
It so happens that my nameserver is on that node. I got impatient after a few days, so reinstalled a due to be cancelled VM on ATLZ005. Sod's Law: just as I fully commissioned it, ATLZ007 came back online. Och well, I had nothing better to do.
[Note to senile self: must remember to reset nameserver DNS entries.]
It wisnae me! A big boy done it and ran away.
NVMe2G for life! until death (the end is nigh)
@Virmach; a minor aside..
My titchy 256MB RAM Dallas VM is being a 'mare to reinstall. Primarily down to ISO/template availability, though I suspect 'modern' distros are just too bloated for it - previously ran Debian 8.
DALZ007 has just one (TinyCore) ISO available in Solus (i.e. no netboot.xyz) and I'm unable to get any Client Area (legacy) nor Solus (Ryzen) templates to boot. Either the VM doesn't start at all or I get the commonly mentioned, no CD or HDD boot problem.
Additionally, with Rescue Mode:
Failed to connect to server (code: 1011, reason: Failed to connect to downstream server)
[I was looking to transfer this VM out to another member, but not in this broken state.]
It wisnae me! A big boy done it and ran away.
NVMe2G for life! until death (the end is nigh)
@AlwaysSkint said: @Virmach; a minor aside..
My titchy 256MB RAM Dallas VM is being a 'mare to reinstall. Primarily down to ISO/template availability, though I suspect 'modern' distros are just too bloated for it - previously ran Debian 8.
DALZ007 has just one (TinyCore) ISO available in Solus (i.e. no netboot.xyz) and I'm unable to get any Client Area (legacy) nor Solus (Ryzen) templates to boot. Either the VM doesn't start at all or I get the commonly mentioned, no CD or HDD boot problem.
Additionally, with Rescue Mode:
Failed to connect to server (code: 1011, reason: Failed to connect to downstream server)
[I was looking to transfer this VM out to another member, but not in this broken state.]
DALZ007, as well as a few others, has been waiting probably nearly two weeks for networking to set up the VLAN properly. The way the teams function and how they do networking is not ideal in my opinion, and we only have issues with this one partner. Others set it all up at the beginning, but for some reason this one seems to have a set of what I presume to be independent contractors who just do their own thing. I'm not going to name-drop the provider, but obviously, based on previous information, we know who Dallas is with right now. Nor am I trying to be negative towards them or complain.
After all, at least for locations where we have our own switch, it's our fault for not just managing it ourselves; it's just something we don't have time to do right now. Dallas, though, is one of the locations where we don't have our own switch.
It's just weird to me that the networking guy seems not to have access to the port maps and has to request them from the non-networking guy, who then only does what he's asked. That's an OK way to operate, but it creates situations where servers get racked and the switch configuration doesn't get done, and it's happened multiple times. I just didn't catch this one early on because, for some reason, this location was set up differently from the others, in that the public IP VLAN appears to be separate from the... other public IP VLAN? Which also doesn't make sense, because these aren't set up to support tagged traffic, so what's more likely is that these blocks never got added to the VLAN in the first place. I don't know; waiting to hear back.
As for template syncs, I've probably run 20 syncs so far, and each one fails spectacularly for one reason or another. The way it's set up, if even one single node has a problem, the sync freezes up permanently; there's no timeout set. That's SolusVM's coding, nothing I can do other than start it over and over. Plus it doesn't move on to the next sync, so I can't even segment it. So one node with temporary network problems: stuck forever. One node nullrouted due to an incorrect DDoS setting at a provider: stuck forever.
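(There's no public hook into SolusVM's sync job that I know of, but just to illustrate the missing-timeout point in general terms, the usual shell-level workaround for any hang-prone long-running command is coreutils timeout plus a retry loop. The sync-templates-to-node command below is purely hypothetical.)

```
# Illustration of the "wrap a hang-prone job in a timeout and retry" pattern.
# 'sync-templates-to-node' is a made-up placeholder, NOT a real SolusVM command.

for node in node1 node2 node3; do
  for attempt in 1 2 3; do
    # Kill the job if it runs longer than 30 minutes instead of letting
    # one dead node block every other sync forever.
    if timeout 30m sync-templates-to-node "$node"; then
      echo "sync OK: $node"
      break
    fi
    echo "sync attempt $attempt failed/hung for $node, retrying..."
  done
done
```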
Large VLAN without ARP overhead:
In the switch, set static address binding: MAC - IP - switchport.
In every VM, set the CIDR to /32 and force the gateway to be on-link.
In every VM, use the ip neigh command to set a static MAC address for the gateway (see the sketch below).
VM-to-VM traffic goes through the gateway (hairpin).
If the switch is ever replaced, set it to the old MAC address so that VMs don't need to be reconfigured.
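(A minimal sketch of the VM-side half of that suggestion, assuming a single interface; the switch-side static binding isn't shown, and all names and addresses are placeholders.)

```
# Sketch of the /32 + on-link gateway + static ARP setup suggested above.
# IFACE, VM_IP, GW_IP and GW_MAC are placeholders.

IFACE=eth0
VM_IP=203.0.113.10
GW_IP=203.0.113.1
GW_MAC=aa:bb:cc:dd:ee:ff

# /32 address: the VM treats nothing as on-link by default.
ip addr add "$VM_IP/32" dev "$IFACE"
ip link set "$IFACE" up

# Force the gateway to be reachable on-link despite the /32...
ip route add "$GW_IP/32" dev "$IFACE"
ip route add default via "$GW_IP" dev "$IFACE"

# ...and pin its MAC so the VM never has to ARP for it at all.
ip neigh replace "$GW_IP" lladdr "$GW_MAC" dev "$IFACE" nud permanent
```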
This is knowledge from a long time ago, based on hearsay from someone who used to work for us, but in the past we couldn't do static routing with SolusVM. I don't remember the specifics, nor do I want to dive into it right now to verify. Per-VM configuration, I'm not even going to begin diving into with SolusVM. Otherwise, there are a dozen other configuration improvements I have in mind. I believe that was a feature request someone put in half a decade ago that they're still working on. I know we can do it outside of SolusVM, but I don't want to get into that.
As for switch-related configuration changes, we already have enough problems trying to get the DC to even provide us with functional VLANs mapped to the right ports. It's a realistic possibility in the future, when I take over management for locations where we own the switch, but that requires a few hours of reading the manual.
@VirMach I hope you have a staff of writers, or use speech-to-text.
Otherwise your hands will be a wreck in no time
Unless you have a goal of typing a million words a year.
But then.. wrong forum
Cheers
I can usually type faster than I can think. Wait, I should've thought that through before typing it; now I just sound like I can't think.
Actually, now that I think about it, I've been typing slower and slower every day. I used to be able to type 140 wpm; now it's closer to 100, and I think it's because I have like 5 open cuts on my fingers at any given time and my hands are probably already wrecked from all the typing and building/packing servers.
Speech-to-text would definitely be slower; I'm definitely below average when it comes to speaking. I guess those are the perks of spending 20 years of my life on the computer instead of socializing.
Comments
Just focus on turning the TYOC028/TYOC030 nodes back on, please; being down for many days/weeks is unacceptable.
09:29:04 up 29 days, 20:51
Resellers have those anyway.
It wisnae me! A big boy done it and ran away.
NVMe2G for life! until death (the end is nigh)
Be My Friend….
Godfather
blog | exploring visually |
Duplicate
blog | exploring visually |
It's nice to see their billing system is still up and running, while my VM still isn't.
Renewals, yes. But they stopped accepting new orders a few weeks ago until the issues are resolved.
@VirMach You are a service provider, but first of all you are a human, so I hope you can get some rest...
In the end, if you need a perfect setup, you can't use SolusVM... That's why we see BuyVM and HostHatch with self-made panels.
Be careful, man, you'll end up like me! Look out for the RSI issues.
It wisnae me! A big boy done it and ran away.
NVMe2G for life! until death (the end is nigh)
Ngl, you should get some more employees if the budget allows, and also take a mental health break (but not the way Nexusbytes did it).
Crunchbits Technical Support, Technical Writer, and Sales
Contact me at: +1 (509) 606-3569 or [email protected]