VirMach - Complain - Moan - Praise - Chit Chat

Comments

  • VirMach Hosting Provider

    @ZA_capetown said:
    Yay for the progress. And the weird keeps life 'interesting' discoveries :-D
    Congrats. :-)

    @Virmach how far away do you live / how much travel time to the Oklahoma datacenter is it for you?

    Is it a case of whenever something goes wrong it would be easier to just jump in your car and be there in 15 minutes, rather than ask datacenter hands to check things out?

    Or not exactly that conveniently close by?

    I live closer to the datacenter than the two main people that work there. Once my car window rolls back up during thunderstorms I’ll be unstoppable.

    Thanked by (2)ZA_capetown AlwaysSkint
  • VirMach Hosting Provider

    @cybertech said:

    @VirMach said:
    I have two stories for you guys, backstories for two servers we got back.


    First let's talk about the miracle of MIAZ014. I had my noise cancelling headphones while packing the servers in Miami, so I did not notice this at all: the tech in Miami, after some maintenance, had unscrewed the motherboard and left it off. Of course, when I get the box, it also happens to be the one that's royally crushed, it looked like the UPS guy stomped on it (maybe to make the rattling stop.) Other ones, fine.

    I get it back and I have my noise cancelling headphones on (wow, they're really good at cancelling noise.) Since the chassis was crushed, it had pinned the motherboard. Immediately I thought it was toast, and I just kind of hammered the chassis back in shape, figured we'd use it for something else. For some reason, this ended up in the wrong server pile and it made its way to the datacenter. It got racked, but then I noticed it was beat up and remembered the name. Just for fun, I decide to try to power it on. It doesn't power on, no surprise. I'm about to just unrack it and take it home, and that's when I first notice it rattling. I open it up, and see the mess. NVMe SSDs everywhere, crushed heatsink, yeah no wonder it didn't turn on. Except I notice the tech also didn't plug back in the power pins, he just plugged it back in and it was set to auto boot so he called it a day and it was just running since, for like a year or two. So I plug in the power pins, and it boots. Everything's fine. No data loss, nothing. I'll still do testing and replace some things after we get any customers off.


    Then let's talk about the most infuriating story ever, Dedipath and the curse of Ernie. For this one I'm going to try to word everything relatively neutrally and let you make your own assumptions on what really happened.

    At the beginning, for some reason, we had a lot of trouble with the servers sent to Dedipath. Pretty much in every location in some way, it was weird. Like lots and lots of troubleshooting just to get them going even though they were tested before being sent in. There were many servers, but there were a pair of twins in Atlanta that were really giving us a headache, and they seemed to regress as they were worked on. So we'd ask them to troubleshoot, and then it wouldn't improve and now it wouldn't even post, etc. I do not remember many of those details clearly, but there's one part of it I remember very clearly, and it's the reason I didn't even bother touching this server since 2022. It's just been sitting there until today.

    I still have some of the chat logs/communication on these two servers so I was able to verify and make sure I'm not misremembering. After all this work, we sent in a board replacement for one of these. They allegedly swapped out the board, but it had the same exact problem. Strange, must be what's on the board then, even though that didn't make much sense after the details we collected while troubleshooting. So we sent in a memory replacement. After more troubleshooting, we also sent in another set of memory for the other one. I got back the potentially bad memory, and it was fine. So I built out a new server, we didn't have a chassis readily available, but think of it like a motherboard with everything attached, tested, wrapped, and sent in ready to do a transplant. The maintenance was done, but we had the exact same issue still. I think it was a total of two sets of memory and motherboards for one, then three other motherboards and 5 sets of memory for the other, plus NVMe SSDs, etc.

    This is where Ernie comes in. I'm still trying to figure out what's going on with everything, because it doesn't seem like a swap was done, I'm getting different information from different people. I ask Ernie to make sure maintenance is being done properly on it, ask him again if the board LED is on, if it's responding to being powered on, what the board is displaying. No answer on any of that, but he says something really interesting. For these servers, Ernie said he personally flew to Atlanta, swapped the RAM, and reseated the CPUs. He specifically also mentions he also reseated the RAM and made sure it's in and they're "simply giving no output." That they've exhausted their capability, they need to be sent back for repair, and it's going to be expensive to keep doing remote hands on them. "I have very little faith they will work." Finally, he ends it by saying:

    "I'll be very candid I'm not sure what vendor is doing your initial builds but Frankenstein on some of these is an understatement. I do have a source for these to properly be built if you want a intro."

    I let him know I'm the vendor doing the builds, reiterate everything above. He says the board swaps were done, etc, then follows it up by saying the most recent motherboard that was sent in to fix the issue was placed in storage. Then he says it might have been put into recycling. That doesn't make sense but I'm ready to just get whatever back. He offers to credit us $540 for all this, which I don't know if we ever got but we definitely got a $550 or so bill just to have these sent back plus a few thousand in RH, plus we lost all that hardware.

    One of the servers we get back is one of Dedipath's servers. I already know what's about to happen but I let Ernie know, and of course we send back their hardware (and never hear back on our server they sent wherever instead.) So we lose another server.

    The one I did get back, that's been sitting in a corner, I finally open it today. The one Ernie confirmed he worked on personally.

    First of all, there's an NVMe missing. Whoops, I wonder what happened to that, weird. Two DIMMs are not seated properly, and when I say not seated properly, what I mean isn't that they didn't "click" but that whoever worked on it pressed in one side and didn't even align the other side at all, so one is visibly hanging sideways, and the other is the same but got pushed in a bit more.

    The heatsink seemed to be a little loose, but honestly I can't really confirm that part 100%; it could just be that I'm tired and thought it opened too easily. The CPU isn't even in, it's stuck to the heatsink, but to be fair this is a potentially known problem and a reason the CPU would have been opened up (to fix.) Except there's thermal paste EVERYWHERE, like on parts of the heatsink that don't even go above the CPU, on the board, UNDER the CPU (this isn't the first time this has happened, to be fair. No wait, actually this was in 2022 so it likely was the first time this happened.)

    I cleaned up the paste, pushed in the memory, hit the power button and it works fine. Took maybe 3 minutes. And that's how we lost... I don't know, what's the tally? The end.


    This all makes me so much more excited for Oklahoma. Speaking of which, I'm finally about to head to the facility. The rain stopped several hours ago, I'm just trying to take down a big batch of servers without forgetting anything.

    it's really bad luck, but any way to prevent this in future?

    I mean it kind of took care of itself, they can’t hurt us if they’re out of business. I do want to give a huge shoutout to Ivan at LAX though he was an amazing tech and I hope he found a great position somewhere else.

    Also to kind of balance out these two stories, this isn’t much of a story, but xTom moved our equipment to the new facility and it felt as if it never happened, perfect move, way better than I did at LAX.

    Thanked by (2)ZA_capetown skorous
  • VirMach Hosting Provider
    edited 7:37AM

    I’ve relocated the memory for the storage server for probably the 5th time. I need to stop walking around with it in my hand. Now time to relocate the 16GB DDR4 that I need for that other specific server.

    Someone remind me to put a camera in every room and feed it to ChatGPT to keep track of items. Or maybe I’ll just condition myself to walk around holding dumbbells then one day I’ll also be able to pick up more than two or three servers at a time again.

    Hopefully I’ll actually get to leave soon. But hey good thing this time I have a proper checklist.

    The worst part is the second I remember I have to do something else I just leave whatever I’m holding in the nearest shelf and there’s plenty, so I have to scan up and down 6 feet for it.

    (Edit) casually just accidentally found the 8TB U.2 NVMe that I’ve been desperately trying to find for over a week.

  • VirMach Hosting Provider

    @yoursunny I’ve located a box that has your username on it, let me know if you still want it and I should be able to send it out within a couple more years as long as you still live there.

  • VirMach Hosting Provider

    Alright heading out now, I’ll stay there until I get the network up.

  • edited 8:54AM

    @VirMach said:
    @yoursunny I’ve located a box that has your username on it, let me know if you still want it and I should be able to send it out within a couple more years as long as you still live there.

    Yes, it's the XPG drive you promised us three years ago.
    https://lowendspirit.com/discussion/comment/89977/#Comment_89977

    We have been waiting for it to upgrade our desktop.
    We'll ring the 308 buzzer every time we go to Los Angeles until we receive the box.

    Thanked by (1)VirMach

    No hostname left!

  • Jab Senpai

    @VirMach said: (Edit) casually just accidentally found the 8TB U.2 NVMe that I’ve been desperately trying to find for over a week.

    @imok TIME TO ASK THE QUESTION!

    Haven't bought a single service in VirMach Great Ryzen 2022 - 2023 Flash Sale.
    https://lowendspirit.com/uploads/editor/gi/ippw0lcmqowk.png

  • cybertech OG Benchmark King

    @VirMach said:
    Alright heading out now, I’ll stay there until I get the network up.

    please dont leave until the migrations are complete 🤭

    I bench YABS 24/7/365 unless it's a leap year.

  • VirMach Hosting Provider

    @cybertech said:

    @VirMach said:
    Alright heading out now, I’ll stay there until I get the network up.

    please dont leave until the migrations are complete 🤭

    Okay deal, unless my hotspot gets any worse than it already is right now.

    Awesome news by the way, I got the connectivity up between the router and switch. I must have just been sleep deprived last time. Well I'm also sleep deprived now but perhaps it helps that I worked on this first thing when I came in rather than racking a bunch of servers. Other theories include: 1) I was starving last time but I had a sandwich this time, 2) I didn't get any smarter, I just got lucky this time, 3) it's Thursday and I realized we're about to fall behind another whole week, 4) I didn't want to get stuck in the datacenter forever per my last promise.

    Let's see what other issues await us.

    @Jab said:

    @VirMach said: (Edit) casually just accidentally found the 8TB U.2 NVMe that I’ve been desperately trying to find for over a week.

    @imok TIME TO ASK THE QUESTION!

    Bad news though, the server it's meant for needs a specific Epyc processor that's high core count and while I have at least two of those and it's a huge CPU, they look very tiny next to 200-300 servers worth of equipment and junk. I have to locate the processors now because they're not in the motherboard like I thought. There's also a second 8TB that I have to find unless we've accepted it's not going to be RAID1, but the original plan was RAID1 and I did also locate something like 5 controllers specifically designed for U.2 as well, right next to the 8TB drive. Hey, at least I'm pretty organized, everything's in little related clusters but in random spots. It's like a VirMach universe and we just have to locate all the galaxies.

    With all that said, this is not meant for an @imok 1200GB special.
