Bad Bots
From Docunext Technology Wiki
I've been thinking about bad bots for a long time. They cause problems for many reasons: poor design, ignoring the robots.txt file, and clogging up internet traffic, among other issues. They are also the source of a lot of spam.
My website and blog networks are getting more traffic these days, and not surprisingly, a lot of it is from bots. I've improved my robots.txt file, and I've been thinking about adding my bad bot blocker to my Apache configuration again, or, more drastically, setting up a firewall to block offending IPs from port 80.
What is a bad bot?
In my opinion, bad bots are:
- spyware
- spambots
- zombie PCs
- content scrapers that copy pages for pasting onto other sites
- automated submission bots that add comments and register non-existent users on forums
- bots that submit spam through forms that send email
- email address harvesters
- scanners that probe for network and software vulnerabilities
- brute-force attack bots that try to hack email, shell, or web accounts
Bad bot defenses
You can defend against bad bots in many ways, similar to how you might block spam. Here are a few strategies:
- Network blocks based on source IP address via real-time blacklists - extreme measure
- Server defense based on user-agent - similar to a robots.txt file and slightly more effective, but an extreme measure, and unfortunately user-agents are easy to spoof
- Intrusion detection systems (like Snort) - not extreme, but they only monitor
- Failure sensors like fail2ban (very good software) - not extreme, but can be spoofed to cause a DoS
- Robots.txt file - not obeyed by all bots
- Quick Apache configuration to block libwww-perl
- Iptables hashtable
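To make the user-agent strategy from the list above a bit more concrete, here is a minimal sketch in Python. It is not the Apache configuration mentioned in the list, only an illustration of the same idea at the application level. The names BLOCKED_AGENT_PATTERNS, is_blocked_agent, and block_bad_bots are made up for this example, and libwww-perl is the only agent taken from this page. Since user-agents are easy to spoof, this only stops bots that identify themselves honestly.

```python
import re

# Hypothetical deny list for illustration; libwww-perl is the one
# agent named on this page. Add more patterns as needed.
BLOCKED_AGENT_PATTERNS = [
    re.compile(r"libwww-perl", re.IGNORECASE),
]


def is_blocked_agent(user_agent):
    """Return True when the User-Agent string matches a deny-list pattern."""
    if not user_agent:
        # Assumption: requests with no user-agent are let through here.
        return False
    return any(p.search(user_agent) for p in BLOCKED_AGENT_PATTERNS)


def block_bad_bots(app):
    """Wrap a WSGI application and answer 403 for blocked user-agents."""
    def middleware(environ, start_response):
        if is_blocked_agent(environ.get("HTTP_USER_AGENT", "")):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everything else is passed on to the wrapped application.
        return app(environ, start_response)
    return middleware
```

Any WSGI application can be wrapped with block_bad_bots(app). Doing the check in the web server or firewall instead is cheaper, because the request never reaches the application.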
Configuration Files for Bots
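The robots.txt file is the simplest configuration file aimed at bots. As a rough sketch, the Python snippet below parses a small sample file with the standard library's urllib.robotparser and shows how a well-behaved bot would interpret it. The sample rules and the bot names BadBot and GoodBot are invented for this example and are not this site's actual configuration. As noted above, a bad bot can simply ignore the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# BadBot is shut out of the whole site.
print(parser.can_fetch("BadBot", "/index.html"))   # False
# Other bots may fetch public pages but not /private/.
print(parser.can_fetch("GoodBot", "/index.html"))  # True
print(parser.can_fetch("GoodBot", "/private/x"))   # False
```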
Related Pages
- Docunext List of Bots
- This Knuckle-head got blocked
- Odd Requests
- knockd
- Fail2ban
- Honeypot
- GeoDNS
- Surphace Scout
- Atrax