Stop AboutUs.org from scraping your blog
Tom’s posted some good tips for preventing AboutUs.org from scraping your site:
Block their robot
Add the following lines to your robots.txt (in the “root” folder of your website):
User-agent: AboutUsBot
Disallow: /
Block their IP range
In your .htaccess file (if you’re on Apache) add the following lines:
deny from 66.249.16.
Block the bot’s user agent
If you do user agent blocking, block the bot’s user agent:
(currently Mozilla/5.0 (compatible; AboutUsBot/0.9; +http://www.aboutus.org/AboutUsBot))
Block the DomainTools.com IP range
AboutUs.org uses DomainTools services to generate thumbnail images of site content, so block their IP range too:
deny from 66.249.4.
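For the user agent blocking step above, here is a minimal .htaccess sketch. It assumes Apache 2.2-style access control (the same family as the “deny from” lines) with mod_setenvif available. The variable name block_aboutus is made up for illustration, and none of this comes from Tom’s post or from AboutUs.org.

```apache
# Flag any request whose User-Agent header contains "AboutUsBot"
# (case-insensitive). "block_aboutus" is a hypothetical variable name.
SetEnvIfNoCase User-Agent "AboutUsBot" block_aboutus

# Allow everyone by default, then refuse the flagged requests.
Order Allow,Deny
Allow from all
Deny from env=block_aboutus
```

Matching on the short “AboutUsBot” token rather than the full string should keep the rule working if the version number changes. On Apache 2.4 these directives only work with mod_access_compat loaded; the newer Require syntax is the usual replacement there.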
Update: Tom appears to have deleted the post from his site. (It’s back.) Thankfully, AboutUs.org has a page explaining how to block their bot from your site, and it’s a bit simpler than what’s posted above:
To prevent the AboutUsBot from collecting your site content in the future, please include the following lines in your /robots.txt file.
- User-agent: AboutUsBot
- Disallow: /
The AboutUsBot will include the following in its User-Agent string:
- Mozilla/5.0 (compatible; AboutUsBot/0.9; +http://www.aboutus.org/AboutUsBot)
Please note that the current AboutUsBot behavior is to visit each site only once to initialize the AboutUs.org page.
Also, they’ve posted a page for people to voice their concerns! So, go voice them…
Well, the morning brings a change to the “concerns” page: a promise of manual deletion (once you prove you own the domain). Shame this “proof” isn’t required to create and edit a record in the first place…
He still hasn’t explained the purpose of the site, or what “valuable service” it’s providing…