Slight delay: I finally figured out that the 40G cable that got delivered is defective. Luckily I ordered several of them, and one of the others arrived this morning. I'm going back to grab it and get some breakfast, then heading back to the DC with the new cable.
Some good news: I just confirmed the second cabinet will be ready today, so I might also bring back more servers and get working on that as well.
Should have a more thorough update later today. It's been raining a lot, which kind of makes it difficult to load/unload more equipment, and my driver-side window broke last week. I'm fine getting rained on while I drive, but there's no covered parking at the datacenter.
The second cabinet was set up and I've already loaded it with some more servers. I'm getting the rest ready to hopefully mostly fill them out today and tomorrow. Networking ran into some issues; I think the QFX5100 needs an update, but Juniper doesn't make the process very easy. Once I'm back at the facility I'll try a few other workarounds. The second we get this one link up, though, the rest is pretty much ready to go.
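For anyone curious, the usual Junos upgrade path on a QFX5100 looks roughly like the sketch below. The package filename, admin user, and switch hostname are placeholders (the exact image depends on the target Junos release), so treat it as a rough outline rather than the exact steps being used here.

```
# Copy the install package to the switch (filename and hostname are placeholders)
scp jinstall-host-qfx-5-flex-x.y.z-signed.tgz admin@qfx5100:/var/tmp/

# From the Junos CLI on the switch: install the package and reboot into the new release
request system software add /var/tmp/jinstall-host-qfx-5-flex-x.y.z-signed.tgz reboot

# After it comes back up, confirm the version and take a recovery snapshot
show version
request system snapshot
```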
@VirMach said: As for routing to Dallas first, I assume you mean if it has to get to the west coast. I know of a transit provider that has a 60G direct route to San Jose; it's just way too expensive. If we expand the location out, or other providers become interested, we could add that. It's about 2-3x the price of Lumen, which is already several times the price of Cogent. They also have a direct route to Chicago. So if we do that, it would 100% be faster than Dallas, at least to San Jose. I think it's already faster to Chicago.
Some great news: I was able to get a deal worked out, so we're now going to also have a third carrier. Cogent's going to be another 2-3 weeks. I haven't received a timeline for setup yet, but I think this is going to greatly improve the network. They've essentially been the main fiber provider in Oklahoma for the past 10+ years. Plus they have a funny name and quite possibly the worst transit map I've ever seen.
Here's an equally confusing older map.
For some reason I can't find the latest map anymore but I think it pretty much looks the same.
I have two stories for you guys, backstories for two servers we got back.
First, let's talk about the miracle of MIAZ014. I had my noise-cancelling headphones on while packing the servers in Miami, so I did not notice this at all: the tech in Miami, after some maintenance, had unscrewed the motherboard and left it unsecured. Of course, when I get the box, it also happens to be the one that's royally crushed; it looked like the UPS guy stomped on it (maybe to make the rattling stop). The other ones were fine.
I get it back and I have my noise-cancelling headphones on (wow, they're really good at cancelling noise). Since the chassis was crushed, it had pinned the motherboard. Immediately I thought it was toast, so I just kind of hammered the chassis back into shape and figured we'd use it for something else. For some reason, it ended up in the wrong server pile and made its way to the datacenter. It got racked, but then I noticed it was beat up and remembered the name. Just for fun, I decide to try to power it on. It doesn't power on, no surprise. I'm about to just unrack it and take it home, and that's when I first notice it rattling. I open it up and see the mess: NVMe SSDs everywhere, crushed heatsink. Yeah, no wonder it didn't turn on. Except I notice the tech also never plugged the power pins back in; he just plugged the power back in, and since it was set to auto-boot he called it a day, and it had just been running like that since, for like a year or two. So I plug in the power pins, and it boots. Everything's fine. No data loss, nothing. I'll still do testing and replace some things after we get any customers off.
Then let's talk about the most infuriating story ever: Dedipath and the curse of Ernie. For this one I'm going to try to word everything relatively neutrally and let you make your own assumptions about what really happened.
At the beginning, for some reason, we had a lot of trouble with the servers sent to Dedipath. Pretty much in every location, in some way; it was weird. Lots and lots of troubleshooting just to get them going, even though they were tested before being sent in. There were many servers, but there was a pair of twins in Atlanta that were really giving us a headache, and they seemed to regress as they were worked on. So we'd ask them to troubleshoot, and then it wouldn't improve, and now it wouldn't even POST, etc. I do not remember many of those details clearly, but there's one part of it I remember very clearly, and it's the reason I didn't even bother touching this server since 2022. It's just been sitting there until today.
I still have some of the chat logs/communication on these two servers, so I was able to verify I'm not misremembering. After all this work, we sent in a board replacement for one of these. They allegedly swapped out the board, but it had the same exact problem. Strange, so it must be what's on the board then, even though that didn't make much sense given the details we collected while troubleshooting. So we sent in a memory replacement. More troubleshooting, and we also sent in another set of memory for the other one. I got back the potentially bad memory, and it was fine. So I built out a new server; we didn't have a chassis readily available, but think of it like a motherboard with everything attached, tested, wrapped, and sent in ready for a transplant. The maintenance was done, but we still had the exact same issue. I think it was a total of two sets of memory and motherboards for one, then three other motherboards and five sets of memory for the other, plus NVMe SSDs, etc.
This is where Ernie comes in. I'm still trying to figure out what's going on with everything, because it doesn't seem like a swap was done, and I'm getting different information from different people. I ask Ernie to make sure maintenance is being done properly on it, ask him again if the board LED is on, if it's responding to being powered on, what the board is displaying. No answer on any of that, but he says something really interesting. For these servers, Ernie said he personally flew to Atlanta, swapped the RAM, and reseated the CPUs. He specifically mentions he also reseated the RAM and made sure it's in, and that they're "simply giving no output." That they've exhausted their capability, the servers need to be sent back for repair, and it's going to be expensive to keep doing remote hands on them. "I have very little faith they will work." Finally, he ends it by saying:
"I'll be very candid I'm not sure what vendor is doing your initial builds but Frankenstein on some of these is an understatement. I do have a source for these to properly be built if you want a intro."
I let him know I'm the vendor doing the builds and reiterate everything above. He says the board swaps were done, etc., then follows it up by saying the most recent motherboard that was sent in to fix the issue was placed in storage. Then he says it might have been put into recycling. That doesn't make sense, but I'm ready to just get whatever back. He offers to credit us $540 for all this, which I don't know if we ever got, but we definitely got a bill of $550 or so just to have these sent back, plus a few thousand in remote hands, plus we lost all that hardware.
One of the servers we get back is one of Dedipath's own servers. I already know what's about to happen, but I let Ernie know, and of course we send back their hardware (and never hear back about our server, which they sent wherever instead). So we lose another server.
The one I did get back, the one that's been sitting in a corner, I finally open today. The one Ernie confirmed he worked on personally.
First of all, there's an NVMe missing. Whoops, I wonder what happened to that. Weird. Two DIMMs are not seated properly, and when I say not seated properly, I don't mean they didn't "click." I mean whoever worked on it pressed in one side and didn't even align the other side at all, so one is visibly hanging sideways, and the other is the same but got pushed in a bit more.
The heatsink seemed to be a little loose, but honestly I can't really confirm that part 100%; it could just be that I'm tired and thought it came off too easily. The CPU isn't even in the socket, it's stuck to the heatsink, but to be fair this is a potentially known problem and a reason the CPU would have been opened up (to fix). Except there's thermal paste EVERYWHERE: on parts of the heatsink that don't even go above the CPU, on the board, UNDER the CPU. (This isn't the first time this has happened, to be fair. No wait, actually this was in 2022, so it likely was the first time this happened.)
I cleaned up the paste, pushed in the memory, hit the power button, and it works fine. Took maybe 3 minutes. And that's how we lost... I don't know, what's the tally? The end.
This all makes me so much more excited for Oklahoma. Speaking of which, I'm finally about to head to the facility. The rain stopped several hours ago, and I'm just trying to take down a big batch of servers without forgetting anything.
Comments
Will my 1200GB NVMe server be there?
@Mods: Can we flag someone for incessant trolling or do we have to continually say STFU ?
Perfect
Will DM you.
RAM, don't forget about RAM.
They can request an Official Troll tag just like us.
These choices aren't mutually exclusive .... wait, are we talking about me?
you can start flash deals now: 3 cores, 300GB NVMe, 0GB RAM, 0TB bandwidth
Nonono, 1200GB for @imok - he never specified needing any RAM.
How 'bout BYOR?
Maybe @virmach should make it 1 GB bandwidth with 1200 GB for @imok. Was that BF 2023 on OGF when someone got a VPS with 1 GB BW?
Does it support RAMoverIP ?
I would go here
Downloadmoreram.com
Any bench ?
I see IPv6 addresses in the control panel for AMS, but when I try to use one it doesn't seem to work. What might I be missing?
Try just using the single main IPv6 address, not the subnet.
Trying but not having any luck; I might have the wrong gateway and CIDR. Any good place for me to find those for the single IP?
The Fix Network button mostly fixed it; I just needed to manually edit the interface ID.
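For anyone else hitting the same thing, here's a minimal sketch of bringing a single IPv6 address up by hand to test it before making it permanent. The address, prefix length, gateway, and interface name below are placeholders, not VirMach's actual values; substitute whatever your control panel shows.

```
# Placeholders only: substitute the address/gateway from your control panel
IP6="2001:db8:1234:5678::2"    # the single main IPv6 address from the panel
GW6="2001:db8:1234:5678::1"    # gateway (check the panel; often the ::1 of the same /64)
DEV="eth0"                     # interface name may differ (ens3, venet0, etc.)

# Assign the address with its prefix length, then add a default IPv6 route
ip -6 addr add "${IP6}/64" dev "$DEV"
ip -6 route add default via "$GW6" dev "$DEV"

# Quick connectivity test
ping -6 -c 3 ipv6.google.com
```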
#KeepCalmAndGetDrenched
HAHA
COX
It's really bad luck, but is there any way to prevent this in the future?
Yay for the progress. And the weird 'keeps life interesting' discoveries :-D
Congrats. :-)
@Virmach, how far away do you live, i.e. how much travel time to the Oklahoma datacenter is it for you?
Is it a case where, whenever something goes wrong, it would be easier to just jump in your car and be there in 15 minutes, rather than ask datacenter hands to check things out?
Or not exactly that conveniently close by?