

A few days ago, a client’s data center (well, actually a server room) "vanished" overnight. My monitoring showed that all devices were unreachable. Not even the ISP routers responded, so I assumed a sudden connectivity drop. The strange part? The site wasn't reachable via the 4G failover either.

I then suspected a power failure, but the UPS should have sent an alert.

The office was closed for the holidays, but I contacted the IT manager anyway. He was home sick with a serious family issue, but he got moving.

To make a long story short: the company deals in gold and precious metals. They have an underground bunker with two-meter thick walls. They were targeted by a professional gang. They used a tactic seen in similar hits: they identify the main power line, tamper with it at night, and send a massive voltage spike through it.

The goal is to fry all alarm and surveillance systems. Even if battery-backed, they rarely survive a surge like that. Thieves count on the fact that during holidays, owners are away and fried systems can't send alerts. Monitoring companies often have reduced staff and might not notice the "silence" immediately.

That is exactly what happened here. But there is a "but": they didn't account for my Uptime Kuma instance monitoring their MikroTik router, installed just weeks ago. Since it is an external check, it flagged the lack of response from all IPs without needing any alert to be triggered from the inside.

The team rushed to the site and found the mess. Luckily, they found an emergency electrical crew to bypass the damage and restore the cameras and alarms. They swapped the fried server UPS with a spare and everything came back up.

The police warned that the chances of the crew returning the next night to "finish" the job were high, though seeing the systems back online would likely make them move on. They also warned that thieves sometimes break in just to destroy servers to wipe any video evidence.

Nothing happened in the end. But in the meantime, I had to sync all their data off-site (thankfully they have dual 1Gbps FTTH), set up an emergency cluster, and ensure everything was redundant.

Never rely only on internal monitoring. Never.
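
For anyone who wants to see the idea in code, here is a minimal sketch of an outside-in check. It is not how Uptime Kuma works internally; the function names and the target list are assumptions made up for illustration. It runs on a machine that shares nothing with the monitored site and treats "nobody answers" as the alarm condition:

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unroutable hosts.
        return False


def site_is_dark(targets, probe=is_reachable) -> bool:
    """A site counts as dark only when every target fails to answer.

    `targets` is a list of (host, port) pairs. `probe` is injectable so the
    logic can be exercised without touching the network.
    """
    results = [probe(host, port) for host, port in targets]
    return bool(results) and not any(results)
```

Called as `site_is_dark([("192.0.2.10", 443), ("192.0.2.11", 443)])` (documentation-range addresses, to be replaced with the site's real public IPs), it returns `True` only when all of them are silent, which is the "everything vanished at once" pattern described above.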

#IT #SysAdmin #HorrorStories #ITHorrorStories #Monitoring

This entry was edited (1 day ago)
in reply to Stefano Marinelli

nice story! and, yeah, internal monitoring is a must, but you also need an external one, operated by someone other than yourself.
in reply to Stefano Marinelli

Oh, if genre is horror, then don’t forget to tell the tale of the guy who pronounced “Microsoft” 3 times before his mirror. What happened next, the blue mirror of death, is frightening to the bones.
in reply to EnigmaRotor

@EnigmaRotor
I am quite keen to look into Uptime Kuma. Our current monitor is antiquated.

On a side note, you guys are hilarious! I genuinely had a good laugh at your comments.

in reply to Marios Efstathiou

@marios 😄 that’s part of the concept, I think we do need and deserve to get smiles on our faces. As often as we possibly can 😃
in reply to Stefano Marinelli

@Dianora @mwl yep: He Who Must Be Read is obviously untouchable. We have no rights to do that. “touch mwl” would miserably fail, that is.
Also note that Stefano’s story does not mention (yet) an overexcited systemd. I don’t see plagiarism in this. Let’s expect an homage at some point, in pure Hans Zimmer style. /* insert big brass sound sample here */
in reply to Stefano Marinelli

@Dianora @Keltounet @mwl I’ll virtually enjoy from remote… (well, no, but that’s for another conf a bit closer to home, someday). (I don’t think gelato payloads can make it to my computer, probably due to firewalls, would they?).
in reply to EnigmaRotor

@Dianora @Keltounet @mwl I might as well purchase vanilla ice cream and put a large amount of Grand Marnier on it. Call it emulation of some sort, and try to synchronize with you folks. Well, timezones would make this fun, indeed.
in reply to Stefano Marinelli

You know the trade-offs of emulation, this could lead to a suboptimal experience… thus I might get rid of the ice cream and focus on the bottle. (From simplicity comes user satisfaction)

#drunkCipherMachine #veryRandomNumberGenerator

This entry was edited (16 hours ago)
in reply to EnigmaRotor

@Dianora @Keltounet @mwl Then I might lose control and replace the bottle with génépi, which would further deviate from the original gelato product but amplify user satisfaction even further, to a whole new level. #outofcontrol #emergencyStop #forgetPreviousPrompt
in reply to EnigmaRotor

@EnigmaRotor reading this at lunch in a cafe near my house and I keep chuckling and smiling from ear to ear. @stefano is such a treasure 🙌🏆
in reply to ozoned

@ozoned maybe! Especially if it’s motivation enough for you to keep practicing your Italian! 😂 and definitely at the very least a cameo with a line from Spaceballs
in reply to Elena Rossini ⁂

@_elena And the café is the treasure island (“X” marks the place). 🎶“Heeeeee is a pirate, a jar of whiskey and a bottle of winnnneeeeee”🎶. Well that was a spontaneous Jack Sparrow moment. Sorry!
in reply to Stefano Marinelli

that advice also applies to monitoring scheduled backup jobs (or any other automated process). I use a service that emails me if I don't hit a specific URL roughly every 24 hours, and I hit that at the end of my backup job if it was successful.

Better than finding out the hard way at some point in the future that something happened with my backup job, preventing it from running for the last month.
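
A minimal sketch of that pattern, assuming a hypothetical check-in URL and a backup command of your own: the wrapper runs the job and hits the URL only when the job exits cleanly, so a failed or skipped run shows up as a missing check-in on the service side. The `run` and `opener` parameters are there only so the sketch can be exercised without a real backup or network access.

```python
import subprocess
import urllib.request


def run_and_report(command, ping_url,
                   run=subprocess.run, opener=urllib.request.urlopen,
                   timeout=10):
    """Run `command`; hit `ping_url` only if it exits with status 0.

    Returns True when the job succeeded and the check-in was delivered.
    """
    result = run(command)
    if result.returncode != 0:
        # No check-in: the external service will notice the silence.
        return False
    try:
        with opener(ping_url, timeout=timeout):
            pass
    except OSError:
        # urllib's URLError and HTTPError are OSError subclasses.
        return False
    return True
```

A cron entry could then call something like `run_and_report(["/usr/local/bin/backup.sh"], "https://example.invalid/ping/my-backup")`, where both the script path and the URL are placeholders.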

Stefano Marinelli reshared this.

in reply to Johan Sköld

@rhoot I have my cronjob scripts touch a file as their final action and my monitoring stuff alarms if the file is too old
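
The monitoring half of that approach can be sketched in a few lines. The stamp-file path and the age limit are whatever your own setup uses; the names below are illustrative. A missing file is treated as stale, since a job that never ran is the case you most want to catch.

```python
import os
import time


def is_stale(path, max_age_seconds, now=None):
    """Return True if `path` is missing or was last modified too long ago.

    `now` defaults to the current time and can be overridden for testing.
    """
    now = time.time() if now is None else now
    try:
        age = now - os.stat(path).st_mtime
    except FileNotFoundError:
        return True
    return age > max_age_seconds
```

For a daily job, something like `is_stale("/var/run/backup.stamp", 26 * 3600)` leaves a couple of hours of slack before alarming (the path is a placeholder).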
in reply to Stefano Marinelli

Sounds like a case of either good design or *very* good luck too that the UPS took the brunt of it.

We can't protect against everything, but we *can* have an idea for what to do when the unimagined happens.

in reply to mkj

@mkj yes, that is (was) a very good UPS and it did its job.
in reply to Stefano Marinelli

@mkj Not sure if that's the case, but those setups often include a surge protection device upstream of the UPS
in reply to Stefano Marinelli

@mkj Aye, there are different kinds of SPDs, and unfortunately electrical installers usually go for the cheapest ones unless you specifically ask for certain specs when getting a quote. Like always, it comes down to money, but at the very least the customer should be told about the limitations
in reply to Stefano Marinelli

so refreshing to read a quality tech tale on Mastodon. Thanks for sharing!
in reply to Stefano Marinelli

it is the criminals among us who make life difficult for all. Not even the greatest sci-fi authors have been able to imagine how beautiful and fun a future we all would have without them!
in reply to Stefano Marinelli

you’re a hero Stefano! As your Fedi friend and documentary filmmaker I hope I get preferential treatment when one of your amazing stories gets optioned for a film 🤗
in reply to Elena Rossini ⁂

@_elena Thank you! Sure, I will 👍
But, to be honest, I don't think any of those stories will ever be a film.

The big, most scary one is yet to come, anyway...

in reply to Stefano Marinelli

I don't know, you told this short story like a pro. It starts out: ya, data center suddenly goes dark over the holidays, UPS fails, kind of ya, ya, still interesting. Then you introduce the gold, two-meter-thick walls, professional thieves. Wow, that's some drama! Although I wonder how they were able to send such a massive power surge down the lines, and why the bus mains didn't blow before the equipment was damaged. Looking forward to your next tale!

@_elena

in reply to Bob Tregilus

@elaterite @_elena Fair question 🙂
I'm just relaying what I was told and what I know about the company, for which I've been providing some services for many years. The details came directly from their internal manager and, honestly, I didn't have much interest in digging deeper into the technical specifics of the incident.
My focus was simply making sure their servers were back up and running and that their data was safe. Everything else, electrical infrastructure, physical security, and similar aspects, is outside of my scope and handled by other people.
in reply to Stefano Marinelli

Indeed. Still would be interesting to find out about the details of the infrastructure failure and how they pulled it off. Sounds like a good story for a documentary, especially if this is something that has happened in the past.

@_elena

in reply to Bob Tregilus

@elaterite @_elena The police are investigating, and I know some technicians are scheduled to go over in the next few days. There will also be an insurance report, so I’ll try to get some more information.
in reply to Stefano Marinelli

This is such a good, if niche, example of "paying attention to the fundamentals and the alerts covers all sorts of things you'd never imagine happening."

Thanks for sharing.

in reply to Ian Campbell 🏴

@neurovagrant thank you! My rule is: we need moooarr alerts, as you never know how and when (not if - we know it will happen) your alerting system will break.
in reply to Stefano Marinelli

Great job!

This is why I always run uptime monitoring on different servers in other places!

Perfect!

in reply to stux⚡️

@stux thank you! Yes, that's a very wise approach. I have some internal and external monitoring tools. And the monitoring tools monitoring the monitoring tools, with different technologies (so a bug won't hit all the tools at the same time). Yet, I always feel I need moooarrr monitoring 🙂
in reply to Stefano Marinelli

About 15 years ago, the place I worked had a supercomputer. One night, the aircon in the machine room failed. The machine kept computing, and the temperature rose. It rose *quite a lot*.

Sadly, the first thing to fail from the heat was the core switch for the room. You know, the one that handles all of the network for everything in the room. Including the temperature alerts.

It was finally spotted about 8am when the security patrol wondered why the door shutters were so hot.

in reply to Hugo Mills

@darkling nice story! Unfortunately I had to manage something like that, too (A/C broken - switch dead, etc)
in reply to Stefano Marinelli

Fortunately, the only thing that did fail after the aircon was the switch. (And a pair of ear muffs which had been hanging on a metal rail -- they'd melted).

The fire brigade turned up, checked everything, and ran some big positive pressure fans to get airflow through the room from one door to the other to cool everything down.

in reply to Stefano Marinelli

10+ years ago I started volunteering at a festival. Everything was new that year, including the small outdoor racks for the area field routers (Juniper MX80). They barely fit but we managed. The racks were left in the sun in the summer. It was only when we enabled Observium (LibreNMS predecessor), which graphs almost everything it gets from SNMP, that we discovered the inlet temperature was getting close to 80 degrees C. #monitorallthethings

Stefano Marinelli reshared this.

in reply to Lasse Leegaard

@lasseleegaard true. I'm using my switch's fan speed to understand if my home office room is too warm
in reply to Stefano Marinelli

There was an attack a few years back near here where they dropped burning rubbish into manholes around a data centre; the theory at the time was it was to try and cut off some CCTV or alarm monitoring for something. Well caught!
in reply to Stefano Marinelli

Cool story bro, but it's too fictional, I'd say.
First off, as a Ukrainian, I know that powerlines can survive "the spikes" by just cutting the power at the very input. No damage to equipment behind the input electric circuit breaker, nope. You just get damaged input.
Next, I used to work in a bank. And there we had a clear requirement for the data storage center: more than one power input is a must.
in reply to Space flip-flops

Third, given it's a data center, power consumption is probably tens of kW. The "gang" could probably be killed in action playing with it.
Fourth, if there is a power spike and cut off, it won't go unnoticed by those who control power lines. They will be the first on site to see what happened.
in reply to Space flip-flops

@fisher It's not a bank, and it's definitely not a top-tier data center. It's just a trading company in a typical industrial area in Northern Italy, with fewer than 50 employees. They've only got four aging servers in a rack. Facilities like this don't need bank-grade security or the kind of resilience you'd find in a war zone. I wish it were fictional! 😃
in reply to Stefano Marinelli

thank you for this knowledge, I have boosted it for reference for others. 🤗
in reply to Stefano Marinelli

This is pretty important knowledge to have! I bookmarked it for future reference!
This entry was edited (1 day ago)
in reply to Stefano Marinelli

This immediately brought to mind coming into the office after a holiday weekend in 2005 and finding “my” computer room dark. I found our infrastructure manager, who told me that they had an unexpected power outage over the weekend. Confused, I said “But how is that possible? We have multiple feeds and a huge uninterruptible power supply!”

I will never forget his response, delivered in his thick Scottish brogue: “Yes, we do. But it doesn’t do much good when the UPS catches fire.” 😳

in reply to The Gaffer

@thegaffer That reminds me of an incident that happened at work. We have multiple sources of electricity and generators, but none of that matters if the room with the UPS and power controller where all the power sources meet floods from an overflowing toilet a floor above 🙃😅

Whoopsie daisy!

I had just finished bypassing all the network switches in the closets from that circuit when they managed to bypass it, and catastrophe was averted.

That was a fun night! /s

This entry was edited (1 day ago)
in reply to Stefano Marinelli

why do you assume that via 4G there would be connectivity? I don't get this part, what am I missing?
in reply to Pedro Bufulin

@pedro if the two FTTH providers are down, the router will use the failover 4g connection to reach my VPN (and alert me).
in reply to Stefano Marinelli

@pedro power line monitoring is important even for "normal" failures, because some are destructive.
Since 9/11 there are a few new spooky things, and one is modulating the power with pulses
in reply to Stefano Marinelli

how do you think they managed to burn 4G? I suppose the battery for 4G should not even be in the same "grid" as the other stuff, right? (I'm not sure anymore if I know how electricity works, guess I always took it for granted)
in reply to Pedro Bufulin

@pedro the 4g router was connected to the same UPS. So it wasn't destroyed, just off.
in reply to Stefano Marinelli

that's impressive. meanwhile I accidentally stumbled on your website:
You have shared many useful items in a thoughtful way. I appreciate it, and am glad to let you know. 😀
in reply to Stefano Marinelli

And while not relying on internal monitoring make sure your external monitoring doesn't share anything with the monitored systems:

Different ISP, different cloud provider if in the cloud, no shared infra at any level
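
One way to make that checklist mechanical is to write down where the monitor and each monitored system live and compare the two. This is only a sketch: the `Placement` fields below are assumptions about what "shared infra" might mean in a given setup, not an exhaustive list.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Placement:
    """Where a system lives; every field is a potential shared failure point."""
    isp: str
    cloud_provider: str
    power_feed: str
    site: str


def shared_dependencies(monitor: Placement, target: Placement) -> list:
    """Return the names of the fields on which monitor and target coincide."""
    return sorted(
        f.name
        for f in fields(monitor)
        if getattr(monitor, f.name) == getattr(target, f.name)
    )
```

An empty result means the two share nothing on the listed dimensions; anything else is a dependency worth removing or at least knowing about.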

in reply to Stefano Marinelli

zapping the power lines, eh? Looks like the perfect solution to my nuisance neighbors with the big loudspeakers.
in reply to Stefano Marinelli

The true horror part of this story:

> The office was closed for the holidays, but I contacted the IT manager anyway. He was home sick with a serious family issue, but he got moving.

Home for the holidays, sick, serious family issue?? Who cares! You know what's more important?? Keeping that data center up and running!

Glory to sacrificing yourself for the system!!

Or maybe get someone else next time.

in reply to Dan 🌻

@danvolchek to be honest, I offered to rush there myself. But he refused and decided to go (he wasn't far from there)
in reply to Stefano Marinelli

AFAIK, professional alarm systems should function based on the principle that "if it doesn't send periodic alerts saying that everything is ok, and there's no scheduled downtime, then something clearly isn't ok, and somebody needs to be sent to investigate it asap."
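
On the receiving side, that principle can be sketched roughly as follows. The class name, the grace period and the way maintenance windows are stored are all assumptions for illustration, not a description of any real alarm product; timestamps are plain seconds.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatWatchdog:
    """Flags a site when its periodic "all ok" message stops arriving."""
    interval: float                 # expected seconds between heartbeats
    grace: float = 0.0              # extra slack before raising the alarm
    last_seen: float = 0.0          # timestamp of the latest heartbeat
    maintenance: list = field(default_factory=list)  # (start, end) windows

    def beat(self, now: float) -> None:
        """Record an "all ok" message received at time `now`."""
        self.last_seen = now

    def in_maintenance(self, now: float) -> bool:
        return any(start <= now <= end for start, end in self.maintenance)

    def needs_investigation(self, now: float) -> bool:
        """True when the heartbeat is overdue outside scheduled downtime."""
        if self.in_maintenance(now):
            return False
        return now - self.last_seen > self.interval + self.grace
```

The key design choice is that silence is the trigger: nothing on the monitored side has to survive long enough to send an alert.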
in reply to miki

@miki I agree. In fact, their first idea is to check why they didn't call/intervene
in reply to lorenzo

@lorenzo
I think Stefano, the mild mannered barista of the BSD Cafe who posts pictures of sunsets and from his walks in nature is just a cover, and in reality he is a tough-as-nails secret military agent who's chasing cybercriminals around the globe.
See also his comment to my blog post about "just telling people to call the Barista" to make them crap their pants... this Barista has a secret! 🕵️
in reply to Stefano Marinelli

great story, thanks for sharing. Probably @mwl can make a novel "Heroic Stories of a Tiny Router" or so.
in reply to Stefano Marinelli

In the first sentence you mention a "data center", but such an attack would not work with a data center: to be one, you need to have two buildings with independent power supplies, at a safe distance, etc. I think this was at best a hosting room, not a data center.
This entry was edited (1 day ago)
in reply to Uriel Fanelli

@uriel sure - we tend to call "data center" a specific place, inside the company, that will host the servers (with A/C, etc). Maybe a little inappropriate, here.
in reply to Stefano Marinelli

Well, not "a little". The one you described is - at best - a server room, not even a hosting center, since according to the blueprints, there was no redundancy....
in reply to Uriel Fanelli

@uriel You're right. I've updated the original post to clarify it. Thank you for pointing it out!
in reply to Stefano Marinelli

wow, cool story and well done! 👍

And yes, sometimes the truth is really better than fiction (thinking about something I was part of in my job a while back that could easily have been from a badly scripted reality TV show. Can't go into details because of NDA 🙈 )

in reply to Tionisla

@Tionisla Thank you. Yes, this is true. Sometimes things IRL are stranger than in fiction. And, if I look back, I've lived some incredible experiences. If I had told my 20-year-old self about them, he would never have believed it
in reply to Stefano Marinelli

heh, yeah and even now you have to sit down rub your eyes and go "wtf". :-D
in reply to Stefano Marinelli

I must repeat this: never trust onsite backups either. Fire will destroy those. And RAID is not backup.
You know this but it bears repeating!

Stefano Marinelli reshared this.

in reply to Stefano Marinelli

@Dianora The local backup is a remnant of the encrypted backup off network. If you can use it, it'll be faster. But you should assume you will never use it.

Stefano Marinelli reshared this.

in reply to Stefano Marinelli

even my new home alarm is coupled with an external monitoring alarm center that recognizes tampering/sabotage in addition to the "normal" alarms based on sensors etc. it costs a yearly subscription, but having had a break-in in the past, we considered it worthwhile when we renovated our home.
in reply to Stefano Marinelli

I just want to say, this is one of those long, esoteric, fascinating, entertaining threads like you used to see on Reddit, and it's great to see here on the Fedi, minus all the Reddit bullshit. Good job everyone!

Stefano Marinelli reshared this.

in reply to Stefano Marinelli

I wasn't aware of this kind of problem with internal monitoring and the importance of external monitoring. However, I think it is more important to monitor the monitoring server, or to have a heartbeat from the monitoring system (external or internal), because the external monitoring system could also fail without anyone being aware of it.
in reply to zako

@zako sure. Monitoring the monitor is more important than monitoring the services.
in reply to tkr

@tkr I will - but it's too fresh and still not totally over. When I have all the final details, this will be a blog post
in reply to Stefano Marinelli

Have to integrate this story into the pitch for our monitoring service 😁
in reply to Stefano Marinelli

Good for you. If next time, you could solve your problems without involving people who are sick at home with a serious family issue on top, that would be great.
in reply to ffffennek

@fennek Calling these 'my' problems is inaccurate; I am simply providing services to this company and I have no formal contract or obligation regarding this specific issue. I could have easily ignored the alert, especially since I wasn't aware the person in charge was out sick. Despite this, I offered to step in and handle it myself - even though it’s hours away - to help out and allow them to stay home.
in reply to ffffennek

@fennek@cyberplace.social pretty harsh. As an external provider, @stefano@bsd.cafe likely would have no idea their primary point of call was sick and/or had a family issue.
Really, if the primary point of call was out of action, it would be up to the business itself to arrange alternatives, allowing the sick person to stay out of action.
in reply to Paul Wilde 😺 (snac2 acct)

@paul @fennek I am familiar with that organization - and I know that the person (the one who was home sick, even if I didn't know he was home sick) has a deep sense of loyalty, but he is not reckless. If he hadn't been well enough, he wouldn't have gone. I even offered to go in his place myself. It is a healthy environment, not "that typical company" that exploits its employees. For obvious reasons, I cannot disclose details (and I work with several similar companies in different areas), but I can guarantee that everyone acted with the utmost respect for human decency. Fortunately, not all businesses operate like malicious entities that only think about harming their employees and collaborators.

I always strive to distance myself from such organizations, as they do not align with my outlook on life and the world.

in reply to ffffennek

@fennek No problem! Indeed, it may not have been entirely clear from the original text.
in reply to Stefano Marinelli

Damn you Stefano.

You just spoiled a future Netflix movie.

Instead of watching in 2027 : `The Power Surge Heist`... we will have `The Uptime` with Stefano as sysadmin.

Following you so I can keep up with all the movies I will be missing.

in reply to Stefano Marinelli

I'm upset this didn't turn into a story of how the police laid a trap.

Uptime Kuma is dope tho.
