[2022] ★ VirMach ★ RYZEN ★ NVMe ★★ The Epic Sales Offer Thread ★★

flips · July 2022

@Daevien said:

@flips said:
IIRC my FFME0002 node was online for a while after hitting the migrate button. But then it's been dead ever since ...

Yep, same. I poke mine with a stick occasionally but it hasn't really done anything of note since the first day.

Mine is up now!

Sat 09 Jul 2022 11:13:20 AM CEST

Basic System Information:
---------------------------------
Uptime     : 0 days, 0 hours, 0 minutes
Processor  : AMD Ryzen 9 5900X 12-Core Processor
CPU cores  : 1 @ 3693.062 MHz
AES-NI     : ✔ Enabled
VM-x/AMD-V : ✔ Enabled
RAM        : 1.4 GiB
Swap       : 1.5 GiB
Disk       : 8.3 GiB
Distro     : Debian GNU/Linux 11 (bullseye)
Kernel     : 5.18.0-0.bpo.1-amd64

fio Disk Speed Tests (Mixed R/W 50/50):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 356.12 MB/s  (89.0k) | 874.85 MB/s  (13.6k)
Write      | 356.10 MB/s  (89.0k) | 868.41 MB/s  (13.5k)
Total      | 712.22 MB/s (178.0k) | 1.74 GB/s    (27.2k)
           |                      |
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 1.35 GB/s     (2.6k) | 1.73 GB/s     (1.6k)
Write      | 1.35 GB/s     (2.6k) | 1.73 GB/s     (1.6k)
Total      | 2.70 GB/s     (5.2k) | 3.46 GB/s     (3.3k)

iperf3 Network Speed Tests (IPv4):
---------------------------------
Provider        | Location (Link)           | Send Speed      | Recv Speed
                |                           |                 |
Clouvider       | London, UK (10G)          | 934 Mbits/sec   | 937 Mbits/sec
Online.net      | Paris, FR (10G)           | 933 Mbits/sec   | 937 Mbits/sec
Hybula          | The Netherlands (40G)     | 936 Mbits/sec   | 937 Mbits/sec
Uztelecom       | Tashkent, UZ (10G)        | 868 Mbits/sec   | 721 Mbits/sec
Clouvider       | NYC, NY, US (10G)         | 856 Mbits/sec   | 765 Mbits/sec
Clouvider       | Dallas, TX, US (10G)      | 434 Mbits/sec   | 372 Mbits/sec
Clouvider       | Los Angeles, CA, US (10G) | 779 Mbits/sec   | 623 Mbits/sec

Geekbench 5 Benchmark Test:
---------------------------------
Test            | Value
                |
Single Core     | 1294
Multi Core      | 1311
Full Test       | https://browser.geekbench.com/v5/cpu/15909009

Daevien · July 2022

Yep, I reinstalled mine from scratch (that one is a tiny bf so not much on it) and it's still up now hours later. crosses fingers

yoursunny · July 2022

My vps1 on SJCZ006 had an automatic IP change from 213.59.116.x to 45.88.178.x.
The strange thing is, this happened when I typed sudo reboot in the OS.
Is VirBot giving out dynamic IP now, like residential network?

skorupion · July 2022

@VirMach said:

@yoursunny said:

@VirMach said:
We're ramping up the abuse script temporarily. If you're at around full CPU usage on your VPS for more than 2 hours, it'll be powered down. We have too many VMs getting stuck on OS boot and negatively affecting others at this time, due to the operating systems getting stuck on boot for some after the Ryzen change. I apologize in advance for any false positives but I do want to note that it's technically within our terms for 2 hours of that type of usage to be considered abuse, we just usually try to be much more lenient.

Boot loop after migrating to a different CPU or changing to a different IP is not abuse.
Customer purchased service on a specific CPU and a specific IP that are not expected to change.
The kernel and userland could have been compiled with -march=native so that it would not start on any other CPU.
The services could have been configured to bind to a specific IP, which would cause service restart loop if the IP disappeared.

The safest way is not automatically powering on the service after the migration.
The customer needs to press Power On button themselves and then fixes the machine right away.

Running -march=native code on an unsupported CPU triggers undefined behavior.
Undefined behavior means anything could happen, such as pink unicorn appearing in VirMach offices, @deank stopping to believe in the end, or @FrankZ receiving 1000 free servers.
The simple act of automatic powering on a migrated server could cause these severe consequences and you don't want that.

We're ramping up the abuse script. It's what it is called. I didn't say boot loop after migrating is abuse.

Abuse script will just power it down, not suspend. I don't see the harm in powering down something stuck in a boot loop. I was just providing this as a PSA for anyone reading who might be doing something else not related that's also using a lot of CPU and for general transparency, we're making the abuse script more strict to try to power down the ones stuck in the boot loop automatically more quickly.

The safest way is not automatically powering on the service after the migration.
The customer needs to press Power On button themselves and then fixes the machine right away.

Not possible, we have to power up all of them to fix other issues. Otherwise we won't know the difference between one that's stuck and won't boot and others. Plus many customers immediately make tickets instead of trying to power up the VPS after it goes offline so in any case having them powered on has more benefits than keeping them offline.

Please look ticket #618039

tetech · July 2022

Is FFME004 still down for everyone else? I notice it isn't mentioned in the latest status update. I get "The host is currently unavailable" in SolusVM, so I'm assuming it is more than me and haven't opened a ticket.

kheng86 · July 2022

@tetech said:
Is FFME004 still down for everyone else? I notice it isn't mentioned in the latest status update. I get "The host is currently unavailable" in SolusVM, so I'm assuming it is more than me and haven't opened a ticket.

My VM in FFME004 is fine, but VMs in FFME005 & FFME006 are still getting the "no bootable device" error

tetech · July 2022

@kheng86 said:

@tetech said:
Is FFME004 still down for everyone else? I notice it isn't mentioned in the latest status update. I get "The host is currently unavailable" in SolusVM, so I'm assuming it is more than me and haven't opened a ticket.

My VM in FFME004 is fine, but VMs in FFME005 & FFME006 are still getting the "no bootable device" error

Oh, interesting. Maybe it is time for a ticket.

VirMach · July 2022

Networking Update - Just essentially waiting on DC hands at this point for NYC, Dallas, Atlanta, San Jose, Seattle, Phoenix, and Denver. Once these switch over networking should improve drastically and it should also have a positive impact on the CPU steal. We were supposed to have it done today but doesn't look likely they'll get to it, we'll see.

For Tokyo, we'll try to get it scheduled and completed by Tuesday. That one wasn't planned initially but we already cleaned up the IP blocks so we might as well move forward quickly.

Frankfurt AFAIK is already done that way. Frankfurt is having issues connecting to our servers and even connecting to me here in Los Angeles but networking looks superb on every other check so I'm guessing some common carrier pooched something up. This means it'll get a lot of panel errors unfortunately for the time being.

Disk Update - Frankfurt looks a lot better, I got the configurations to be mostly stable. Some Dallas, Los Angeles, and others also got Gen4 NVMe so as they have problems we're already on it and fixing them but feel free to report any issues for the disks. I/O related errors only please, not being offline or never going online, we already have that part fully figured out and in the works as well, on a higher level.

Tokyo Storage Update - Haven't been able to get to it.

NYC Storage Update - Working on fast-tracking this as it's been heavily delayed and we really need it. We've already moved backups away from the same company the servers are with just in case. They're definitely not taking it well that we're actually leaving, like a crazy ex. It's potentially turning into a Psychz nightmare scenario but that's pretty much all I can say on that. So rest assured if they do pull anything we've got it covered. Of course it's not perfect. They're also being very nosey.

Amsterdam Storage Update - Same as above but secondary level of urgency meaning we want to get out NYC Metro first. I'll make it up to you guys for waiting so long, just remind me if I don't.

VirMach · July 2022

Template Syncs - Ongoing, many more re-synced. OS installs should work better. QuadraNet's "DDoS Protection" is essentially just hefty false positives though so for Los Angeles I might have to literally drive down a hard drive and load them on at this point since all the tweaks they do still don't allow for it. This also got in the way of a lot of backups and huge headaches with migrations.

Windows template is fully broken at this point, I think I synced a bad version. Looking into that later today hopefully.

AlwaysSkint · July 2022

In amongst all this chaos, no further updates on rDNS nor multi-IP, @Virmach ?
Just askin'.

AlwaysSkint · July 2022

Re: Template Syncs, I'll give my 256MB fubar'ed Dallas another attempt at a reinstall. (Not that this VPS is important to me, now.)

VirMach · July 2022

@AlwaysSkint said:
In amongst all this chaos, no further updates on rDNS nor multi-IP, @Virmach ?
Just askin'.

Bottom of the list at this point to be frank. I understand it's very important to some, we just have to make sure people have a functional service first, a functional IP second, that the networking actually functions well, more builds, coordinating shipments, backups, migrations, and getting through the literal thousands of tickets and damage control.

It'll probably go all of the above, then IPv6, then rDNS, then multi-IP. Likely sometime in August and I'm trying to be conservative but you know how that goes. Honestly though if we can't get it done by the end of August and you need multiple IP or rDNS... I'd be totally on your side if you were furious.

VirMach · July 2022

Side note on multi-IP, at this point it's going to require a lot of work to sort through all of it and that's after we set up more nodes. Multi-IP will most likely for most people require another migration as well. Originally this wasn't going to be a problem but originally we were also counting on IP subnet overlaps. Now that we're locking everything down to a subnet per node, it becomes tremendously more difficult.

AlwaysSkint · July 2022

@VirMach said: ..if you were furious.

Heart wouldn't like that, so more likely very frustrated. [Finally managed to walk 2 miles today - yipee!]
Bounced server cron/system emails are a nuisance due to lack of rDNS. The lack of multi-IP would seal the fate of that one particular VPS of mine, as it would no longer be cost effective, so to speak.

AlwaysSkint · July 2022

..Still get boot failure from CD on DALZ007, when trying a Ryzen Debian 10 (and likely others).

yoursunny · July 2022

Miami beach club update - DC hands are being distracted by sexy FrankZ dancing on the bus.

skorous · July 2022

@VirMach said:
Template Syncs - Ongoing, many more re-synced. OS installs should work better. QuadraNet's "DDoS Protection" is essentially just hefty false positives though so for Los Angeles I might have to literally drive down a hard drive and load them on at this point since all the tweaks they do still don't allow for it.

I'm sure this got dropped in all the billion other things going on but since you're talking about template syncs I figured I'd bring it back up. BF-SPECIAL-2020 only shows the stock four ( C7, C8, Deb8, Deb9 ) for mountable ISOs. Absolutely not a show-stopper or anything just wanted to make sure it was on a list somewhere.

willie · July 2022

I have an older SJC VM that rebooted this morning, stayed up for a while, but has been down most of the day with billing panel saying "the node is currently locked". This VPS is not on a Ryzen node unless it has been migrated (I haven't kept track but I didn't request a migration, figuring I'd wait til the smoke clears). "Server information" only says that the node is locked, and doesn't give the other info such as the node name, so I don't know what node it is on.
It also looks like my Ryzen vps has gotten rebuilt or something like that. I don't mean the rebuild from April but something more recent. Did that happen too? Again I haven't followed discussion that closely. This one is on SJCZ005.

Is the non-Ryzen stuff already known to be having issues? It has been working ok up til today.

VirMach · July 2022

@willie said:
1. I have an older SJC VM that rebooted this morning, stayed up for a while, but has been down most of the day with billing panel saying "the node is currently locked". This VPS is not on a Ryzen node unless it has been migrated (I haven't kept track but I didn't request a migration, figuring I'd wait til the smoke clears). "Server information" only says that the node is locked, and doesn't give the other info such as the node name, so I don't know what node it is on.

It also looks like my Ryzen vps has gotten rebuilt or something like that. I don't mean the rebuild from April but something more recent. Did that happen too? Again I haven't followed discussion that closely. This one is on SJCZ005.

Is the non-Ryzen stuff already known to be having issues? It has been working ok up til today.

This was supposed to be a quick migration but QuadraNet has been nullrouting our IP addresses all day for doing the transfers. I guess their DDoS protection is some script they set up to deny you service, the 14 year old developer must have misheard the objective.

I've been going through these more closely and they nullrouted someone for 5 days for sending 60MB/s traffic to their backup VPS for 10 minutes. Every single migration today has been completely botched by that, I have to go through them again and make sure it didn't also corrupt the data it transferred.

It's also possible you're part of the other migration which is going slow for other reasons. CC San Jose has always had weird routing problems and on top of that the Ryzens are on a quarter cabinet that are especially not doing well with the bigger VLANs right now. I assume they're using some old half dead switch (for quarter cabs they provide ports.) Maybe I'm just upset as a result of sleep deprivation, but if I had any energy left this year I'd say it looks like we need to do another round of migrations again already. But hopefully they'll have it resolved soon or else we'll have to start doing migrations via UPS overnighted USB drives to improve efficiency.

FrankZ · July 2022

@willie said: I have an older SJC VM that rebooted this morning, stayed up for a while, but has been down most of the day with billing panel saying "the node is currently locked".

I expect your VM is being migrated to Ryzen currently. The normal billing panel will show up again after the migration is complete.

VirMach · July 2022

@skorous said:

@VirMach said:
Template Syncs - Ongoing, many more re-synced. OS installs should work better. QuadraNet's "DDoS Protection" is essentially just hefty false positives though so for Los Angeles I might have to literally drive down a hard drive and load them on at this point since all the tweaks they do still don't allow for it.

I'm sure this got dropped in all the billion other things going on but since you're talking about template syncs I figured I'd bring it back up. BF-SPECIAL-2020 only shows the stock four ( C7, C8, Deb8, Deb9 ) for mountable ISOs. Absolutely not a show-stopper or anything just wanted to make sure it was on a list somewhere.

The templates were all fixed for all migrations up until yesterday or so, but only on SolusVM. Those aren't tied to their original package anymore. WHMCS is next if I can squeeze it in.

VirMach · July 2022

@AlwaysSkint said:
..Still get boot failure from CD on DALZ007, when trying a Ryzen Debian 10 (and likely others).

Yeah looks like syncs got absolutely gutted by everything I said above as well.

FrankZ · July 2022

@yoursunny said: Miami beach club update - DC hands are being distracted by sexy FrankZ dancing on the bus.

Last time FrankZ was in Miami it did not go so well.

willie · July 2022

Thanks @Virmach and much sympathy. Maybe you can bring on someone to help with this stuff. I guess I'll just have to wait for the non-Ryzen node to be sorted. My Ryzen VM is reachable and responds on port 22, but the ssh host key changed and I can no longer log into it, which makes it sound as if it's been reinstalled. I didn't try the VNC console.

VirMach · July 2022

@willie said:
Thanks @Virmach and much sympathy. Maybe you can bring on someone to help with this stuff. I guess I'll just have to wait for the non-Ryzen node to be sorted. My Ryzen VM is reachable and responds on port 22, but the ssh host key changed and I can no longer log into it, which makes it sound as if it's been reinstalled. I didn't try the VNC console.

It could be an IP conflict or IP change. If it continues let me know.

We've brought people on, we actually have someone helping but unfortunately they're not at the level we need them to be to really make an impact. There's also someone I'm supposed to be hiring for multiple months now from OGF but I basically have no time left to even go through the onboarding process. The most recent issue I described though wasn't a lack of time issue or anything like that, it's a QuadraNet breaking our transfers with nullroute and other DC being slow with hands request problem but it looks like it's finally about to get done for all servers except one, which I may have to revert.

VirMach · July 2022

SolusVM has migrated to either an Epyc or Ryzen servers, I don't remember. We got it like a year ago probably at this point and never used it until now. Let's see if this improves anything or if we're still stuck with PHP/MySQL bottlenecks.

Mason · July 2022

@VirMach said:
The most recent issue I described though wasn't a lack of time issue or anything like that, it's a QuadraNet breaking our transfers with nullroute...

Ahh yes, the famous, premium, VEST Anti-DDoS technology! Was renting a dedi with them to use as a Plex server a few years back. Was using GDrive as storage at the time and every scan for new media would trigger their "protection" and null route my IP for 24 hours. Graphs they had showed network traffic at like 5 Gbps sustained despite being on a 1G port... it just didn't make any sense. Left after the third or fourth null route and moved on to greener pastures.

netrix · July 2022

@VirMach said:
It could be an IP conflict or IP change. If it continues let me know.

my vps ip also has this issue (laxa018: 149.57.135.x)

willie · July 2022

So far no change on mine, unreachable right now, "node is locked" in client area an hour or so ago.

VirMach · July 2022

@Mason said:

@VirMach said:
The most recent issue I described though wasn't a lack of time issue or anything like that, it's a QuadraNet breaking our transfers with nullroute...

Ahh yes, the famous, premium, VEST Anti-DDoS technology! Was renting a dedi with them to use as a Plex server a few years back. Was using GDrive as storage at the time and every scan for new media would trigger their "protection" and null route my IP for 24 hours. Graphs they had showed network traffic at like 5 Gbps sustained despite being on a 1G port... it just didn't make any sense. Left after the third or fourth null route and moved on to greener pastures.

I was already expecting their DDoS protection to be pretty bad, as in an attack would leak through. Never would I have guessed that it's so good that it doesn't let ANYTHING through. They've solved the universal problem of denial of service attack by beating the attacker to the punch. Truly remarkable.

Honestly my only complaint in all this is that they don't just offer 10Gbps unmetered traffic. After all, they'll never actually physically serve since anything above 30MB/s is a denial of service attack.

[2022] ★ VirMach ★ RYZEN ★ NVMe ★★ The Epic Sales Offer Thread ★★

Comments