A few days ago, a client’s data center (well, actually a server room) "vanished" overnight. My monitoring showed that all devices were unreachable. Not even the ISP routers responded, so I assumed a sudden connectivity drop. The strange part? Nothing was reachable via 4G either.

I then suspected a power failure, but the UPS should have sent an alert.

The office was closed for the holidays, but I contacted the IT manager anyway. He was home sick with a serious family issue, but he got moving.

To make a long story short: the company deals in gold and precious metals. They have an underground bunker with two-meter thick walls. They were targeted by a professional gang. They used a tactic seen in similar hits: they identify the main power line, tamper with it at night, and send a massive voltage spike through it.

The goal is to fry all alarm and surveillance systems. Even if battery-backed, they rarely survive a surge like that. Thieves count on the fact that during holidays, owners are away and fried systems can't send alerts. Monitoring companies often have reduced staff and might not notice the "silence" immediately.

That is exactly what happened here. But there is a "but": they didn't account for my Uptime Kuma instance monitoring their MikroTik router, installed just weeks ago. Since it is an external check, it flagged the lack of response from all IPs without needing any alert to be sent from inside the site.

The team rushed to the site and found the mess. Luckily, they found an emergency electrical crew to bypass the damage and restore the cameras and alarms. They swapped the fried server UPS with a spare and everything came back up.

The police warned that the chances of the crew returning the next night to "finish" the job were high, though seeing the systems back online would likely make them move on. They also warned that thieves sometimes break in just to destroy servers to wipe any video evidence.

Nothing happened in the end. But in the meantime, I had to sync all their data off-site (thankfully they have dual 1Gbps FTTH), set up an emergency cluster, and ensure everything was redundant.

Never rely only on internal monitoring. Never.
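
To make the point concrete, here is a minimal Python sketch of an outside-in check. It is only an illustration of the idea, not how Uptime Kuma works internally; the function names (`tcp_probe`, `unreachable_targets`, `site_is_silent`) and the addresses are made up for the example.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Target = Tuple[str, int]  # (host, port), e.g. a router's management port


def tcp_probe(target: Target, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds from here."""
    try:
        with socket.create_connection(target, timeout=timeout):
            return True
    except OSError:  # refused, timed out, unroutable, DNS failure...
        return False


def unreachable_targets(
    targets: Iterable[Target],
    probe: Callable[[Target], bool] = tcp_probe,
) -> List[Target]:
    """List every target that did not answer the probe."""
    return [t for t in targets if not probe(t)]


def site_is_silent(
    targets: Iterable[Target],
    probe: Callable[[Target], bool] = tcp_probe,
) -> bool:
    """True when *every* target is down: the 'whole site vanished' case."""
    targets = list(targets)
    if not targets:
        return False  # nothing configured, so nothing to conclude
    return len(unreachable_targets(targets, probe)) == len(targets)


if __name__ == "__main__":
    # Hypothetical addresses from the documentation range (192.0.2.0/24).
    site = [("192.0.2.1", 443), ("192.0.2.2", 22)]
    if site_is_silent(site):
        print("ALERT: no device at the site answers from outside")
```

The important part is where this runs: on a machine outside the monitored site, so that silence itself becomes the signal. Nothing inside has to survive long enough to send an alert.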

#IT #SysAdmin #HorrorStories #ITHorrorStories #Monitoring

in reply to Stefano Marinelli

that advice also applies to monitoring scheduled backup jobs (or any other automated process). I use a service that emails me if I don't hit a specific URL roughly every 24 hours, and I hit that at the end of my backup job if it was successful.

Better than finding out the hard way at some point in the future that something happened with my backup job, preventing it from running for the last month.
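
A rough Python sketch of that pattern, under assumptions: the URL is a placeholder (the real one comes from whichever heartbeat service is used), and `send_heartbeat` and `run_backup_and_check_in` are names invented for this example.

```python
import subprocess
import urllib.request
from typing import Callable, Sequence

# Hypothetical check-in URL; replace with the one your heartbeat service issues.
HEARTBEAT_URL = "https://heartbeat.example.invalid/ping/backup-job"


def send_heartbeat(url: str = HEARTBEAT_URL, timeout: float = 10.0) -> None:
    """Hit the check-in URL so the service knows the job finished."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def run_backup_and_check_in(
    command: Sequence[str],
    ping: Callable[[], None] = send_heartbeat,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Run the backup command and check in only if it exited with status 0."""
    result = run(command)
    if result.returncode != 0:
        # Deliberately no ping: the missed check-in is what triggers the email.
        return False
    ping()
    return True


if __name__ == "__main__":
    # Hypothetical backup command; substitute the real one.
    run_backup_and_check_in(["/usr/local/bin/backup.sh"])
```

The design choice is that failure needs no code path of its own. A crashed job, a dead cron daemon, or a powered-off machine all look the same to the service: no check-in within the window.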

in reply to Stefano Marinelli

About 15 years ago, the place I worked had a supercomputer. One night, the aircon in the machine room failed. The machine kept computing, and the temperature rose. It rose *quite a lot*.

Sadly, the first thing to fail from the heat was the core switch for the room. You know, the one that handles all of the network for everything in the room. Including the temperature alerts.

It was finally spotted about 8am when the security patrol wondered why the door shutters were so hot.

Unknown parent

Hugo Mills

Fortunately, the only thing that did fail after the aircon was the switch. (And a pair of ear muffs which had been hanging on a metal rail -- they'd melted).

The fire brigade turned up, checked everything, and ran some big positive pressure fans to get airflow through the room from one door to the other to cool everything down.

in reply to Stefano Marinelli

10+ years ago I started volunteering at a festival. Everything was new that year, including the small outdoor racks for the area field routers (Juniper MX80). They barely fit, but we managed. The racks were left in the summer sun. It was only when we enabled Observium (the LibreNMS predecessor), which graphs almost everything it gets from SNMP, that we discovered the inlet temperature was getting close to 80 degrees C. #monitorallthethings
in reply to Stefano Marinelli

Cool story bro, but it's too fictional, I'd say.
First off, as a Ukrainian, I know that power lines can survive "the spikes" because the power simply gets cut at the very input. No damage to equipment behind the input circuit breaker, nope. You just get a damaged input.
Next, I used to work in a bank, and there we had a clear requirement for the data storage center: more than one power input is a must.
in reply to Stefano Marinelli

This immediately brought to mind coming into the office after a holiday weekend in 2005 and finding “my” computer room dark. I found our infrastructure manager, who told me that they had an unexpected power outage over the weekend. Confused, I said “But how is that possible? We have multiple feeds and a huge uninterruptible power supply!”

I will never forget his response, delivered in his thick Scottish brogue: “Yes, we do. But it doesn’t do much good when the UPS catches fire.” 😳

in reply to The Gaffer

@thegaffer That reminds me of an incident that happened at work. We have multiple sources of electricity and generators, but none of that matters if the room with the UPS and power controller where all the power sources meet floods from an overflowing toilet a floor above 🙃😅

Whoopsie daisy!

I had just finished bypassing all the network switches in the closets from that circuit when they managed to bypass it, and catastrophe was averted.

That was a fun night! /s

in reply to Stefano Marinelli

The true horror part of this story:

> The office was closed for the holidays, but I contacted the IT manager anyway. He was home sick with a serious family issue, but he got moving.

Home for the holidays, sick, serious family issue?? Who cares! You know what's more important?? Keeping that data center up and running!

Glory to sacrificing yourself for the system!!

Or maybe get someone else next time.

in reply to lorenzo

@lorenzo
I think Stefano, the mild-mannered barista of the BSD Cafe who posts pictures of sunsets and of his walks in nature, is just a cover, and in reality he is a tough-as-nails secret military agent who's chasing cybercriminals around the globe.
See also his comment to my blog post about "just telling people to call the Barista" to make them crap their pants... this Barista has a secret! 🕵️
Unknown parent

EnigmaRotor

@Dianora @mwl yep: He Who Must Be Read is obviously untouchable. We have no rights to do that. “touch mwl” would miserably fail, that is.
Also note that Stefano’s story does not mention (yet) an overexcited systemd. I don’t see plagiarism in this. Let’s expect an hommage at some point, in pure Hans Zimmer style. /* insert big brass sound sample here */
in reply to ffffennek

@fennek@cyberplace.social pretty harsh. As an external provider, @stefano@bsd.cafe likely would have no idea their primary point of call was sick and/or had a family issue.
Really, if the primary point of call was out of action, it would be up to the business itself to arrange alternatives, allowing the sick person to stay out of action.
in reply to EnigmaRotor

@Dianora @Keltounet @mwl Then I might lose control and replace the bottle with génépi, which would deviate further from the original gelato product but amplify user satisfaction to a whole new level. #outofcontrol #emergencyStop #forgetPreviousPrompt
