For the curious, today's scraperbot attack on @lwn has run to well over 800,000 unique IP addresses in the last few hours.
We've made some tweaks that are holding it off for now, but it is ongoing and could go bad again at any time.
If you are a real user and are being turned away by the site, could you let me know what your user agent is?
Diego Roversi
in reply to Jonathan Corbet
I noticed that a few hours ago access to LWN timed out.
My user agent is: "Mozilla/5.0 (X11; Linux x86_64; rv:140.0) Gecko/20100101 Firefox/140.0"
Lars Wirzenius
in reply to Jonathan Corbet
I did not get an error, but the site took so long to open that I went to my private, highly unauthorized, personal archive of LWN weekly issues to grep for the name of a person I was trying to recall.
I'm glad to hear your technical countermeasures are helping, even if I was impatient.
Jonathan Corbet
in reply to Lars Wirzenius
@liw Ah, so you are part of the scraper problem :)
Seriously, though, our content is CC-licensed once it escapes the paywall, so your archive is entirely authorized in truth.
Countermeasures are helping for the moment; I do not expect it to be a long-lasting thing.
Closing in on 1M unique IPs this morning. The net is broken.
Lars Wirzenius
in reply to Jonathan Corbet
Not so much scraping as opening the weekly issue every week, saving it to a file with a Firefox extension (Save Page WE), and then putting that in a Git repository.
I admit I went back through the archives to backfill my collection.
Why? I wanted to count how many times you've quoted me. I'm that vain.