Blocking Site Rippers, Crawlers and Email Scrapers

May 1, 2012 in Website Security by [XTS] Jeremy

Website security comes in many forms. This article presents one method that can be used for sites running on Apache2 with mod_rewrite.

Apache2 makes use of a special file (.htaccess) that can be placed in the root of your site or in individual folders. .htaccess has a multitude of options to help combat the darker side of the web we live with. We highly recommend reading http://perishablepress.com/stupid-htaccess-tricks/#sec9 for more ideas.

.htaccess rules can inspect the user-agent of each request and either redirect unwanted user-agents to another URL or answer them with an error status such as 403 or 500. This requires that the Apache2 module mod_rewrite is enabled.
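Because the rules depend on mod_rewrite, one defensive option is to wrap them in an IfModule block. Without the wrapper, a server that does not have the module loaded will typically answer every request with a 500 error because it does not recognise the Rewrite directives. With the wrapper, the rules are silently skipped instead. The following is a minimal sketch only; ExampleBadBot is a made-up placeholder, not a real user-agent from the lists in this article.

```apache
# Sketch: apply the rewrite rules only when mod_rewrite is loaded.
<IfModule mod_rewrite.c>
    RewriteEngine On
    # "ExampleBadBot" is a hypothetical placeholder user-agent.
    RewriteCond %{HTTP_USER_AGENT} ^ExampleBadBot [NC]
    # F answers with 403 Forbidden; "-" means the URL is left unchanged.
    RewriteRule ^ - [F]
</IfModule>
```

The trade-off is that a missing module then goes unnoticed: the site keeps working, but nothing is blocked.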

Add the following to your .htaccess file to enable this feature:

#Enable RewriteEngine
RewriteEngine On 

# Stop the Nasties!!!
 RewriteBase /
 RewriteCond %{HTTP_USER_AGENT} ^Anarchie [OR]
 RewriteCond %{HTTP_USER_AGENT} ^ASPSeek [OR]
 RewriteCond %{HTTP_USER_AGENT} ^attach [OR]
 RewriteCond %{HTTP_USER_AGENT} ^autoemailspider [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Xenu [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Zeus
 RewriteRule ^.* - [F,L]
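A few notes on how the block above is read. Each RewriteCond pattern is a regular expression tested against the User-Agent header. A leading ^ means the header must start with that text, while a pattern without ^ matches anywhere in the header. The [OR] flag joins a condition to the next one, so the last condition in the list must not carry it. The conditions apply only to the RewriteRule that immediately follows them. As a sketch of a more compact form, the first four conditions could be combined with regex alternation. Note that the added [NC] flag makes the match case-insensitive, which is a change from the original rules.

```apache
RewriteEngine On
# The same four start-anchored names, combined into one condition.
# [NC] (case-insensitive) is an addition not present in the original list.
RewriteCond %{HTTP_USER_AGENT} ^(Anarchie|ASPSeek|attach|autoemailspider) [NC]
# F answers with 403 Forbidden and implies L, so [F] alone is sufficient.
RewriteRule ^ - [F]
```

One condition per line is easier to maintain when entries are added or removed often; the alternation form is shorter but harder to scan.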

To get a better idea of which user-agents might be affecting your site hosted with Xtreme Services, open http://yoursite/stats in your browser and scroll down to the section that lists user-agents.

In our example, a user-agent named ZmEu has been bombarding the site, followed by the Baidu spider.

To learn more about the ZmEu user-agent, search Google to see what others are saying, or go to www.botsvsbrowsers.com and search to see if it has been classified.

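The ZmEu user-agent can be added to the block list as follows. Note that the condition that used to be last (^Zeus) now needs the [OR] flag, and the new last condition must not have it: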

#Enable RewriteEngine
RewriteEngine On 

# Stop the Nasties!!!
 RewriteBase /
 RewriteCond %{HTTP_USER_AGENT} ^Anarchie [OR]
 RewriteCond %{HTTP_USER_AGENT} ^ASPSeek [OR]
 RewriteCond %{HTTP_USER_AGENT} ^attach [OR]
 RewriteCond %{HTTP_USER_AGENT} ^autoemailspider [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Xenu [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Zeus [OR]
 RewriteCond %{HTTP_USER_AGENT} ZmEu [NC]
 RewriteRule ^.* - [F,L]

To test that this additional condition is effective, visit http://www.botsvsbrowsers.com/SimulateUserAgent.asp, paste ZmEu into the User Agent text box, enter your website address in the URL box below it, and click GO. If the new condition works, you should see a Forbidden message.

A comprehensive example:

#Enable RewriteEngine
RewriteEngine On

# Stop the Nasties!!
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^Anarchie [OR]
RewriteCond %{HTTP_USER_AGENT} ^ASPSeek [OR]
RewriteCond %{HTTP_USER_AGENT} ^attach [OR]
RewriteCond %{HTTP_USER_AGENT} ^autoemailspider [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [OR]
RewriteCond %{HTTP_USER_AGENT} ZmEu [NC]
RewriteRule ^.* - [F,L]

If you would rather not be so friendly, you can redirect the listed user-agents to a website of your choice. Replace the final RewriteRule line of the block with:

# Redirect to a site that gets the message across
RewriteRule ^.*$ https://www.google.com/search?q=getlost [R,L]
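On its own, a RewriteRule with no RewriteCond lines in front of it applies to every visitor, so the redirect should always follow the conditions that select the unwanted user-agents. A minimal sketch, assuming ZmEu is the only user-agent you want to redirect:

```apache
RewriteEngine On
# Redirect only requests whose User-Agent contains "ZmEu" (case-insensitive).
RewriteCond %{HTTP_USER_AGENT} ZmEu [NC]
# R=302 sends a temporary redirect; L stops further rewrite processing.
RewriteRule ^ https://www.google.com/search?q=getlost [R=302,L]
```

A plain [R] flag also produces a 302 by default; writing R=302 simply makes the status explicit.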

If you know a user-agent is a spam bot or mail scraper, you can use the following rule instead to keep it busy:

# Redirect to a Spam Poison site : )
RewriteRule ^.*$ http://english-1335903213.spampoison.com [R,L]
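Since a set of RewriteCond lines only governs the single RewriteRule that follows it, different groups of user-agents can be given different treatment by writing one block per group. The sketch below is illustrative only: EmailSiphon, EmailWolf, Wget and ZmEu are taken from the lists in this article, but which of them you redirect and which you forbid is an assumption to adapt to your own site.

```apache
RewriteEngine On

# Block 1: send known email harvesters to the spam poison site.
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf
RewriteRule ^ http://english-1335903213.spampoison.com [R=302,L]

# Block 2: other unwanted user-agents still receive 403 Forbidden.
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ZmEu [NC]
RewriteRule ^ - [F]
```

A user-agent should appear in only one block; if it matches the first block, the L flag ends rewrite processing for that request and the second block is never reached.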

 

Disclaimer: this article is a how-to guide only. Implement its contents at your own risk. Be sure to test all changes you make to .htaccess thoroughly so that valid user-agents are not blocked or redirected by mistake.
