A few days ago, a client’s data center (well, actually a server room) "vanished" overnight. My monitoring showed that all devices were unreachable. Not even the ISP routers responded, so I assumed a sudden connectivity drop. The strange part? The site was unreachable even via 4G.

I then suspected a power failure, but the UPS should have sent an alert.

The office was closed for the holidays, but I contacted the IT manager anyway. He was home sick with a serious family issue, but he got moving.

To make a long story short: the company deals in gold and precious metals and has an underground bunker with two-meter-thick walls. They were targeted by a professional gang using a tactic seen in similar hits: identify the main power line, tamper with it at night, and send a massive voltage spike through it.

The goal is to fry all alarm and surveillance systems. Even if battery-backed, they rarely survive a surge like that. Thieves count on the fact that during holidays, owners are away and fried systems can't send alerts. Monitoring companies often have reduced staff and might not notice the "silence" immediately.

That is exactly what happened here. But there is a "but": they didn't account for my Uptime Kuma instance, installed just weeks ago, which monitors their MikroTik router. Since it is an external check, it flagged the lack of response from all IPs without needing any alert to be triggered from inside the compromised site.
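Conceptually, there is nothing exotic about such an external check. Uptime Kuma configures it through its UI, but the core logic boils down to something like this Python sketch (the IPs, ports, and alert hook below are hypothetical placeholders, not the client's real setup):

```python
import socket

# Hypothetical public endpoints of the site's edge devices (placeholders).
TARGETS = [
    ("203.0.113.10", 8291),  # MikroTik WinBox management port
    ("203.0.113.11", 443),   # second ISP router
]

def reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_site() -> None:
    results = {f"{host}:{port}": reachable(host, port) for host, port in TARGETS}
    if not any(results.values()):
        # Every probe failed: the whole site is dark, not just one device.
        # This is where you would page someone (mail, SMS, webhook, ...).
        print("ALERT: site unreachable on all probes:", results)
    else:
        print("OK:", results)

if __name__ == "__main__":
    check_site()
```

What matters is where the probe runs: because it lives outside the client's network, an on-site blackout cannot silence it.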

The team rushed to the site and found the mess. Luckily, they found an emergency electrical crew to bypass the damage and restore the cameras and alarms. They swapped the fried server UPS with a spare and everything came back up.

The police warned that the chances of the crew returning the next night to "finish" the job were high, though seeing the systems back online would likely make them move on. They also warned that thieves sometimes break in just to destroy servers to wipe any video evidence.

Nothing happened in the end. But in the meantime, I had to sync all their data off-site (thankfully they have dual 1Gbps FTTH), set up an emergency cluster, and ensure everything was redundant.

Never rely only on internal monitoring. Never.

#IT #SysAdmin #HorrorStories #ITHorrorStories #Monitoring

in reply to Stefano Marinelli

@Dianora @mwl yep: He Who Must Be Read is obviously untouchable. We have no rights to do that; “touch mwl” would fail miserably, that is.
Also note that Stefano’s story does not mention (yet) an overexcited systemd. I don’t see plagiarism in this. Let’s expect an homage at some point, in pure Hans Zimmer style. /* insert big brass sound sample here */
in reply to EnigmaRotor

@Dianora @Keltounet @mwl Then I might lose control and replace the bottle with génépi, which would deviate even further from the original gelato product but amplify user satisfaction to a whole new level. #outofcontrol #emergencyStop #forgetPreviousPrompt
in reply to Stefano Marinelli

That advice also applies to monitoring scheduled backup jobs (or any other automated process). I use a service that emails me if I don't hit a specific URL roughly every 24 hours, and I hit that URL at the end of my backup job if it was successful.

Better than finding out the hard way at some point in the future that something happened with my backup job, preventing it from running for the last month.
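This dead-man's-switch pattern takes only a couple of lines at the end of the job. A minimal sketch in Python, assuming a healthchecks.io-style ping URL (the URL and the rsync invocation below are placeholders):

```python
import subprocess
import urllib.request

# Placeholder ping URL from a healthchecks.io-style service (hypothetical).
PING_URL = "https://hc-ping.example/your-check-uuid"

def run_backup() -> bool:
    """Run the backup command; return True only on success."""
    result = subprocess.run(["rsync", "-a", "/data/", "backup-host:/backups/data/"])
    return result.returncode == 0

if __name__ == "__main__":
    if run_backup():
        # Ping only on success. If the job fails, or never runs at all,
        # the missing ping is what triggers the alert email.
        urllib.request.urlopen(PING_URL, timeout=10)
```

The alerting is inverted on purpose: silence, not an error message, raises the alarm, so even a job that dies before it can report anything still gets caught.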


in reply to Stefano Marinelli

About 15 years ago, the place I worked had a supercomputer. One night, the aircon in the machine room failed. The machine kept computing, and the temperature rose. It rose *quite a lot*.

Sadly, the first thing to fail from the heat was the core switch for the room. You know, the one that handles all of the network for everything in the room. Including the temperature alerts.

It was finally spotted about 8am when the security patrol wondered why the door shutters were so hot.

in reply to Stefano Marinelli

10+ years ago I started volunteering at a festival. Everything was new that year, including the small outdoor racks for the area field routers (Juniper MX80s). They barely fit, but we managed. The racks were left in the sun in the summer. It was only when we enabled Observium (the LibreNMS predecessor), which graphs almost everything it can pull via SNMP, that we discovered the inlet temperature was getting close to 80 °C. #monitorallthethings
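For anyone who wants that visibility without deploying a full NMS, polling a temperature sensor over SNMP fits in a tiny script. A minimal Python sketch that shells out to net-snmp's snmpwalk; the host, community, and threshold are placeholders, and the OID is Juniper's jnxOperatingTemp as found in JUNIPER-MIB, so verify it against your platform's MIB before trusting it:

```python
import subprocess

HOST = "192.0.2.1"       # placeholder router address
COMMUNITY = "public"     # placeholder SNMP community string
# jnxOperatingTemp from JUNIPER-MIB (degrees Celsius per component);
# assumed OID; indexes vary by platform, so walk the table to find yours.
OID = "1.3.6.1.4.1.2636.3.1.13.1.7"
THRESHOLD_C = 60         # arbitrary example threshold

def read_temps() -> list[int]:
    """Walk the temperature table and return all readings in Celsius."""
    out = subprocess.run(
        ["snmpwalk", "-v2c", "-c", COMMUNITY, "-Oqv", HOST, OID],
        capture_output=True, text=True, check=True,
    ).stdout
    # Non-sensor components report 0; keep only positive readings.
    return [int(v) for v in out.split() if v.isdigit() and int(v) > 0]

if __name__ == "__main__":
    hot = [t for t in read_temps() if t >= THRESHOLD_C]
    if hot:
        print(f"ALERT: sensors at or above {THRESHOLD_C} C: {hot}")
```

Graphing those readings over time, as Observium does, is what exposes slow trends like a rack cooking in the afternoon sun.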


in reply to Stefano Marinelli

Cool story, bro, but it sounds too fictional, I'd say.
First off, as a Ukrainian, I know that power lines can survive "the spikes" simply by cutting the power at the very input. No damage to the equipment behind the input circuit breaker, nope; you just get a damaged input.
Next, I used to work in a bank, and there we had a clear requirement for the data storage center: more than one power input is a must.
in reply to Space flip-flops

@fisher It's not a bank, and it's definitely not a top-tier data center. It's just a trading company in a typical industrial area in Northern Italy, with fewer than 50 employees. They've only got four aging servers in a rack. Facilities like this don't need bank-grade security or the kind of resilience you'd find in a war zone. I wish it were fictional! 😃
in reply to Stefano Marinelli

This immediately brought to mind coming into the office after a holiday weekend in 2005 and finding “my” computer room dark. I found our infrastructure manager, who told me that they had an unexpected power outage over the weekend. Confused, I said “But how is that possible? We have multiple feeds and a huge uninterruptible power supply!”

I will never forget his response, delivered in his thick Scottish brogue: “Yes, we do. But it doesn’t do much good when the UPS catches fire.” 😳

in reply to The Gaffer

@thegaffer That reminds me of an incident that happened at work. We have multiple sources of electricity and generators, but none of that matters if the room with the UPS and power controller, where all the power sources meet, floods from an overflowing toilet a floor above 🙃😅

Whoopsie daisy!

I had just finished moving all the network switches in the closets off that circuit when they managed to bypass it, and catastrophe was averted.

That was a fun night! /s

Stefano Marinelli, in reply to an unknown parent

@elaterite @_elena Fair question 🙂
I'm just relaying what I was told and what I know about the company, for which I've been providing some services for many years. The details came directly from their internal manager and, honestly, I didn't have much interest in digging deeper into the technical specifics of the incident.
My focus was simply making sure their servers were back up and running and that their data was safe. Everything else (electrical infrastructure, physical security, and similar aspects) is outside my scope and handled by other people.
in reply to Stefano Marinelli

The true horror part of this story:

> The office was closed for the holidays, but I contacted the IT manager anyway. He was home sick with a serious family issue, but he got moving.

Home for the holidays, sick, serious family issue?? Who cares! You know what's more important?? Keeping that data center up and running!

Glory to sacrificing yourself for the system!!

Or maybe get someone else next time.

in reply to lorenzo

@lorenzo
I think Stefano, the mild-mannered barista of the BSD Cafe who posts pictures of sunsets and of his walks in nature, is just a cover, and in reality he is a tough-as-nails secret military agent chasing cybercriminals around the globe.
See also his comment on my blog post about "just telling people to call the Barista" to make them crap their pants... this Barista has a secret! 🕵️
in reply to ffffennek

@fennek Calling these 'my' problems is inaccurate; I am simply providing services to this company and have no formal contract or obligation regarding this specific issue. I could easily have ignored the alert, especially since I wasn't aware the person in charge was out sick. Despite this, I offered to step in and handle it myself, even though the site is hours away, to help out and allow them to stay home.
in reply to ffffennek

@fennek@cyberplace.social pretty harsh. As an external provider, @stefano@bsd.cafe likely would have no idea their primary point of call was sick and/or had a family issue.
Really, if the primary point of call was out of action, it would be up to the business itself to arrange alternatives, allowing the sick person to stay out of action.
in reply to Paul Wilde 😺 (snac2 acct)

@paul @fennek I am familiar with that organization, and I know that the person who was home sick (though I didn't know that at the time) has a deep sense of loyalty, but he is not reckless. If he hadn't been well enough, he wouldn't have gone. I even offered to go in his place myself. It is a healthy environment, not "that typical company" that exploits its employees. For obvious reasons, I cannot disclose details (and I work with several similar companies in different areas), but I can guarantee that everyone acted with the utmost respect for human decency. Fortunately, not all businesses operate like malicious entities that only think about harming their employees and collaborators.

I always strive to distance myself from such organizations, as they do not align with my outlook on life and the world.
