Sunday, August 6, 2017

Bot Rant

It’s been a little while since I mentioned this, but just in case you ever have trouble reaching Flush Twice from various locations, I figured you ought to know why.

The Flush Twice server gets requests every day. Virtually every request gets added to the log file. Sometimes it receives requests from actual humans, but other times it gets requests that no human visitor would ever make. Now the best thing to do is to just ignore the weird inhuman requests, but sometimes the logs fill up with nothing but these weird requests. Next thing you know, your site is being DDoS’ed or hacked. For the longest time I was able to collect huge lists of username/password pairs that bots would use in a vain attempt to gain access to this site. Good thing they never tried “hunter2”. Oh wait… They tried that one as well.

So after a while I started to block IP addresses that were almost certainly bad bots. This list grew to huge proportions, and I was left with pathetically low visitor numbers. Seeing such low numbers made me sad, and I thought that perhaps I was blocking legitimate traffic from accessing the site. I deleted the block lists. The visitor numbers went back up, but the spammy site scraping behaviour returned. Off and on I’ve vacillated between blocking bots and not. At one point I even tried to create a website devoted to the problem.

Ultimately I gave up worrying about how many people actually visit this site. If there are only five of you, then by golly, the five of you should have the best jokes and comics I can come up with! Even if no one is watching, this site is going to be here in case someone ever does. At the end of the day, I still like to point at this site and say, “I made this for you!”

The most recent scourge I’ve encountered is the site scrapers. They are kind of like spiders, only they disguise themselves as actual visitors. They don’t do a very good job of it, because they “scrape” through the site, trying to download every last page in under 15 minutes. This grinds my gears for a variety of reasons, but suffice it to say, they aren’t doing it for the benefit of this site or its actual visitors.
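
For the curious, here is roughly what "bot-like behavior" could look like in code. This is only a sketch in Python, not the actual detection used on this site: the function name `flag_scrapers`, the `(ip, timestamp)` input shape, and the `max_requests` threshold of 200 are all assumptions made for illustration. Only the 15-minute window comes from the paragraph above.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def flag_scrapers(hits, window=timedelta(minutes=15), max_requests=200):
    """Return the set of IPs that made more than max_requests hits
    inside any sliding window of the given length.

    hits is an iterable of (ip, datetime) pairs, e.g. parsed from an
    access log. The threshold is a made-up number, not a recommendation.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = set()
    for ip, when in sorted(hits, key=lambda hit: hit[1]):
        queue = recent[ip]
        queue.append(when)
        # Drop timestamps that have fallen out of the window.
        while when - queue[0] > window:
            queue.popleft()
        if len(queue) > max_requests:
            flagged.add(ip)
    return flagged
```

A human clicking through a few comics stays well under any sane threshold; something pulling every page in a quarter of an hour does not. Where exactly to draw the line is a judgment call.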

And how many IP addresses do I block? That’s actually hard to say. These days, when I see bot-like behavior, I check who owns the culprit’s IP address. If the owner is a server farm, I look up a list of all the CIDRs they own, run that list through a script I cobbled together, document it, and add it to the “deny from” section of my htaccess file. Currently there are nearly 16000 CIDRs in this list. With each CIDR typically covering between 256 and 65536 IP addresses, that adds up to a lot of places on the internet that are permanently blocked from accessing this site!
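
The cobbled-together script itself isn't shown here, so what follows is a stand-in, not the real thing: a minimal Python sketch of the "CIDR list in, deny lines out" step, assuming IPv4-only input with one CIDR per line. The function name `build_deny_lines` is hypothetical. It uses the standard library's `ipaddress` module to merge adjacent or overlapping ranges, which keeps the htaccess file shorter, and it totals up how many addresses the list covers.

```python
import ipaddress


def build_deny_lines(cidrs):
    """Turn IPv4 CIDR strings into Apache "deny from" lines.

    Returns (lines, total_addresses). Blank lines and lines starting
    with "#" are skipped. IPv4-only input is an assumption of this sketch.
    """
    networks = []
    for text in cidrs:
        text = text.strip()
        if not text or text.startswith("#"):
            continue
        # strict=False accepts entries with host bits set, e.g. 192.0.2.5/24
        networks.append(ipaddress.IPv4Network(text, strict=False))
    # Merge adjacent and overlapping ranges into the fewest CIDRs.
    merged = list(ipaddress.collapse_addresses(networks))
    lines = [f"deny from {net}" for net in merged]
    total = sum(net.num_addresses for net in merged)
    return lines, total
```

For scale, a /24 covers 256 addresses and a /16 covers 65536, which is where the range quoted above comes from. Summing `num_addresses` over the merged list gives the honest answer to "how many IP addresses do I block?" without double-counting overlaps.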

And while blocking that many IP addresses makes me a little sad, I didn’t block you. No, not you. You’re special. You’re the person I made this site for. You’re the reason I’m doing all this. It’s always been about you. Nobody but you.

Pleasant dreams.

Pax,

-f2x