Slight delay: I finally figured out that the 40G cable that got delivered is defective. Luckily I ordered several of them, and one of the others arrived this morning. I'm going back to grab it and get some breakfast, then heading back to the DC with the new cable.
Some good news: I just confirmed the second cabinet will be ready today, so I might also bring back more servers and get working on that as well.
Should have a more thorough update later today. It's been raining a lot, which kind of makes it difficult to load/unload more equipment, and my driver-side window broke last week. I'm fine getting rained on while I drive, but there's no covered parking at the datacenter.
The second cabinet was set up and I've already loaded it with some more servers. I'm getting the rest ready to hopefully mostly fill them out today and tomorrow. Networking ran into some issues; I think the QFX5100 needs an update, but Juniper doesn't make the process very easy. Once I'm back at the facility I'll try a few other workarounds. The second we get this one link up, though, the rest is pretty much ready to go.
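For anyone curious, the usual Junos upgrade path on a QFX5100 looks roughly like the sketch below. The package filename, admin user, and switch hostname are placeholders (the exact image depends on the target Junos release), so treat it as a rough outline rather than the exact steps being used here.

```
# Copy the install package to the switch (filename and hostname are placeholders)
scp jinstall-host-qfx-5-flex-x.y.z-signed.tgz admin@qfx5100:/var/tmp/

# From the Junos CLI on the switch: install the package and reboot into the new release
request system software add /var/tmp/jinstall-host-qfx-5-flex-x.y.z-signed.tgz reboot

# After it comes back up, confirm the version and take a recovery snapshot
show version
request system snapshot
```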
@VirMach said: As for routing to Dallas first, I assume you mean if it has to get to the west coast. I know of a transit provider that has a 60G direct route to San Jose; it's just way too expensive. If we expand the location out, or other providers become interested, we could add that. It's about 2-3x the price of Lumen, which is already several times the price of Cogent. They also have a direct route to Chicago. So if we do that, it would 100% be faster than Dallas, at least to San Jose. I think it's already faster to Chicago.
Some great news: I was able to get a deal worked out, so we're now going to also have a third carrier. Cogent's going to be another 2-3 weeks. I haven't received a timeline for setup yet, but I think this is going to greatly improve the network. They've essentially been the main fiber provider in Oklahoma for the past 10+ years. Plus they have a funny name and quite possibly the worst transit map I've ever seen.
Here's an equally confusing older map.
For some reason I can't find the latest map anymore but I think it pretty much looks the same.
I have two stories for you guys, backstories for two servers we got back.
First, let's talk about the miracle of MIAZ014. I had my noise-cancelling headphones on while packing the servers in Miami, so I did not notice this at all: the tech in Miami, after some maintenance, had unscrewed the motherboard and left it unsecured. Of course, when I get the box, it also happens to be the one that's royally crushed; it looked like the UPS guy stomped on it (maybe to make the rattling stop). The other ones were fine.
I get it back and I have my noise-cancelling headphones on (wow, they're really good at cancelling noise). Since the chassis was crushed, it had pinned the motherboard. Immediately I thought it was toast, so I just kind of hammered the chassis back into shape and figured we'd use it for something else. For some reason, it ended up in the wrong server pile and made its way to the datacenter. It got racked, but then I noticed it was beat up and remembered the name. Just for fun, I decide to try to power it on. It doesn't power on, no surprise. I'm about to just unrack it and take it home, and that's when I first notice it rattling. I open it up and see the mess: NVMe SSDs everywhere, crushed heatsink. Yeah, no wonder it didn't turn on. Except I notice the tech also never plugged the power pins back in; he just plugged the power back in, and since it was set to auto-boot he called it a day, and it had just been running like that since, for like a year or two. So I plug in the power pins, and it boots. Everything's fine. No data loss, nothing. I'll still do testing and replace some things after we get any customers off.
Then let's talk about the most infuriating story ever: Dedipath and the curse of Ernie. For this one I'm going to try to word everything relatively neutrally and let you make your own assumptions about what really happened.
At the beginning, for some reason, we had a lot of trouble with the servers sent to Dedipath. Pretty much in every location, in some way; it was weird. Lots and lots of troubleshooting just to get them going, even though they were tested before being sent in. There were many servers, but there was a pair of twins in Atlanta that were really giving us a headache, and they seemed to regress as they were worked on. So we'd ask them to troubleshoot, and then it wouldn't improve, and now it wouldn't even POST, etc. I do not remember many of those details clearly, but there's one part of it I remember very clearly, and it's the reason I didn't even bother touching this server since 2022. It's just been sitting there until today.
I still have some of the chat logs/communication on these two servers, so I was able to verify I'm not misremembering. After all this work, we sent in a board replacement for one of these. They allegedly swapped out the board, but it had the same exact problem. Strange, so it must be what's on the board then, even though that didn't make much sense given the details we collected while troubleshooting. So we sent in a memory replacement. More troubleshooting, and we also sent in another set of memory for the other one. I got back the potentially bad memory, and it was fine. So I built out a new server; we didn't have a chassis readily available, but think of it like a motherboard with everything attached, tested, wrapped, and sent in ready for a transplant. The maintenance was done, but we still had the exact same issue. I think it was a total of two sets of memory and motherboards for one, then three other motherboards and five sets of memory for the other, plus NVMe SSDs, etc.
This is where Ernie comes in. I'm still trying to figure out what's going on with everything, because it doesn't seem like a swap was done, and I'm getting different information from different people. I ask Ernie to make sure maintenance is being done properly on it, ask him again if the board LED is on, if it's responding to being powered on, what the board is displaying. No answer on any of that, but he says something really interesting. For these servers, Ernie said he personally flew to Atlanta, swapped the RAM, and reseated the CPUs. He specifically mentions he also reseated the RAM and made sure it's in, and that they're "simply giving no output." That they've exhausted their capability, the servers need to be sent back for repair, and it's going to be expensive to keep doing remote hands on them. "I have very little faith they will work." Finally, he ends it by saying:
"I'll be very candid I'm not sure what vendor is doing your initial builds but Frankenstein on some of these is an understatement. I do have a source for these to properly be built if you want a intro."
I let him know I'm the vendor doing the builds and reiterate everything above. He says the board swaps were done, etc., then follows it up by saying the most recent motherboard that was sent in to fix the issue was placed in storage. Then he says it might have been put into recycling. That doesn't make sense, but I'm ready to just get whatever back. He offers to credit us $540 for all this, which I don't know if we ever got, but we definitely got a bill of $550 or so just to have these sent back, plus a few thousand in remote hands, plus we lost all that hardware.
One of the servers we get back is one of Dedipath's own servers. I already know what's about to happen, but I let Ernie know, and of course we send back their hardware (and never hear back about our server, which they sent wherever instead). So we lose another server.
The one I did get back, the one that's been sitting in a corner, I finally open today. The one Ernie confirmed he worked on personally.
First of all, there's an NVMe missing. Whoops, I wonder what happened to that. Weird. Two DIMMs are not seated properly, and when I say not seated properly, I don't mean they didn't "click." I mean whoever worked on it pressed in one side and didn't even align the other side at all, so one is visibly hanging sideways, and the other is the same but got pushed in a bit more.
The heatsink seemed to be a little loose, but honestly I can't really confirm that part 100%; it could just be that I'm tired and thought it came off too easily. The CPU isn't even in the socket, it's stuck to the heatsink, but to be fair this is a potentially known problem and a reason the CPU would have been opened up (to fix). Except there's thermal paste EVERYWHERE: on parts of the heatsink that don't even go above the CPU, on the board, UNDER the CPU. (This isn't the first time this has happened, to be fair. No wait, actually this was in 2022, so it likely was the first time this happened.)
I cleaned up the paste, pushed in the memory, hit the power button, and it works fine. Took maybe 3 minutes. And that's how we lost... I don't know, what's the tally? The end.
This all makes me so much more excited for Oklahoma. Speaking of which, I'm finally about to head to the facility. The rain stopped several hours ago, and I'm just trying to take down a big batch of servers without forgetting anything.
Comments
Will my 1200GB NVMe server be there?
@Mods: Can we flag someone for incessant trolling or do we have to continually say STFU ?
Perfect
Will DM you.
RAM, don't forget about RAM.
They can request an Official Troll tag just like us.
These choices aren't mutually exclusive .... wait, are we talking about me?
you can start flash deals now: 3 cores, 300GB NVMe, 0GB RAM, 0TB bandwidth
Nonono, 1200GB for @imok - he never specified needing any RAM.
How 'bout BYOR?
Maybe @virmach should make it 1 GB bandwidth with 1200 GB for @imok. Was that BF 2023 on OGF when someone got a VPS with 1 GB BW?
Does it support RAMoverIP ?
I would go here
Downloadmoreram.com
Any bench ?
I see IPv6 addresses in the control panel for AMS, but when I try to use one it doesn't seem to work. What might I be missing?
Try just using the single main IPv6 address, not the subnet.
Trying but not having any luck; I might have the wrong gateway and CIDR. Any good place for me to find those for the single IP?
The Fix Network button mostly fixed it; I just needed to manually edit the interface ID.
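For anyone else hitting the same thing, here's a minimal sketch of bringing a single IPv6 address up by hand to test it before making it permanent. The address, prefix length, gateway, and interface name below are placeholders, not VirMach's actual values; substitute whatever your control panel shows.

```
# Placeholders only: substitute the address/gateway from your control panel
IP6="2001:db8:1234:5678::2"    # the single main IPv6 address from the panel
GW6="2001:db8:1234:5678::1"    # gateway (check the panel; often the ::1 of the same /64)
DEV="eth0"                     # interface name may differ (ens3, venet0, etc.)

# Assign the address with its prefix length, then add a default IPv6 route
ip -6 addr add "${IP6}/64" dev "$DEV"
ip -6 route add default via "$GW6" dev "$DEV"

# Quick connectivity test
ping -6 -c 3 ipv6.google.com
```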
#KeepCalmAndGetDrenched
HAHA
COX
It's really bad luck, but is there any way to prevent this in the future?
Yay for the progress. And the weird 'keeps life interesting' discoveries :-D
Congrats. :-)
@Virmach, how far away do you live, i.e. how much travel time to the Oklahoma datacenter is it for you?
Is it a case where, whenever something goes wrong, it would be easier to just jump in your car and be there in 15 minutes, rather than ask datacenter hands to check things out?
Or not exactly that conveniently close by?