Some time ago I was going through web logs and noticed a number of requests being served on a domain that did not belong to my employer. It turned out the entire website was indexed in the search engines under this other domain.
It is simple for anyone to pull this off. All it takes is pointing a DNS entry at some other website's IP address, and all of that site's pages will appear under that domain if the target site is not protecting itself. In a sense, the domain name is siphoning off the content of another website.
It turned out there were a number of domains doing this to my employer's website. I can only guess at the motivation. It might be a free way to build up SEO keyword relevance before the domain is switched over to other content. It might be the buildup to a social engineering attack. Regardless, it is not a situation you want to allow.
There are probably many ways to deal with this issue, but I decided to use rewrite rules to protect against this situation. I implemented the concept of authorized domains for a website.
I've used Helicon Tech's ISAPI_Rewrite, and now Ape, under IIS for years, but I've tested these rules using mod_rewrite on Apache 2.2 as well.
I implemented two things that should make attackers steer clear of targeting a domain: return a robots.txt that tells the search engines not to index any content, and return an HTTP 410 Gone for any other page request. Together these tell any bot or user that nothing exists on a siphoning domain targeting your website. Valid content will only be returned from domains you authorize.
I created a rewrite map which lists authorized domains. For example, assume the domains www.domain-test1.com and images.domain-test1.com are the domains you use for your site. Create a text file named AuthorizedDomains.txt with the following content:
www.domain-test1.com -
images.domain-test1.com -
Place a file called robotsDisallow.txt in your website's root with the following content.
User-Agent: *
Disallow: /
The robotsDisallow.txt will be served for robots.txt requests on an unauthorized domain.
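As a quick sanity check, the disallow file can be run through a robots.txt parser to confirm that it really blocks every crawler from every path. The sketch below uses Python's standard `urllib.robotparser`; the helper name `allows` and the sample user agents are assumptions made for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Content of robotsDisallow.txt as given above.
ROBOTS_DISALLOW = """User-Agent: *
Disallow: /
"""


def allows(robots_text: str, user_agent: str, path: str) -> bool:
    """Return True if robots_text lets user_agent fetch path (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Every agent should be refused for every path.
    for agent in ("Googlebot", "bingbot", "*"):
        print(agent, allows(ROBOTS_DISALLOW, agent, "/index.html"))
```

Keep in mind that this only models how a well-behaved crawler reads the file. It says nothing about bots that ignore robots.txt, which is why the 410 response is still needed.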
The following rewrite rules should be placed in your httpd.conf file.
RewriteMap lower int:tolower
RewriteMap AuthorizedDomainsMap txt:AuthorizedDomains.txt
# Serve a robots.txt file which tells search engines not to index unauthorized domains.
RewriteCond ${AuthorizedDomainsMap:${lower:%{SERVER_NAME}}|NOT_FOUND} NOT_FOUND
RewriteRule ^/robots\.txt$ /robotsDisallow.txt [NC,L]
# Return an HTTP 410 page for unauthorized domains.
RewriteCond ${AuthorizedDomainsMap:${lower:%{SERVER_NAME}}|NOT_FOUND} NOT_FOUND
RewriteRule .? - [G]
For Apache, I put the AuthorizedDomains.txt file in the Apache installation folder on Windows, but you can specify the path to the file as well.
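To make the intent of the two rules easier to follow, here is a minimal Python sketch of the same decision logic. It is not something the server runs; it only models the behavior. The function names `parse_rewrite_map` and `decide`, the return labels, and the host `siphon.example` are all assumptions for illustration, and the sketch assumes the keys in AuthorizedDomains.txt are written in lowercase, since the host is lowercased before the lookup.

```python
def parse_rewrite_map(text: str) -> dict:
    """Parse a 'key value' per-line text map into a dict (hypothetical helper)."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        mapping[parts[0]] = parts[1] if len(parts) > 1 else ""
    return mapping


def decide(host: str, path: str, authorized: dict) -> str:
    """Model the outcome of the two rewrite rules for one request."""
    # Authorized domains fall through both rules and are served normally.
    if host.lower() in authorized:
        return "serve"
    # First rule: robots.txt on an unauthorized domain gets the disallow file.
    if path.lower() == "/robots.txt":
        return "robots-disallow"
    # Second rule: everything else on an unauthorized domain is Gone.
    return "410"


if __name__ == "__main__":
    domains = parse_rewrite_map(
        "www.domain-test1.com -\nimages.domain-test1.com -\n"
    )
    for host, path in [
        ("www.domain-test1.com", "/index.html"),
        ("siphon.example", "/robots.txt"),
        ("siphon.example", "/index.html"),
    ]:
        print(host, path, decide(host, path, domains))
```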
A few notes:
You can implement a functionally equivalent set of rules using IIS's URL Rewrite Module. Again, I created a map for the authorized domain lookup, but unlike mod_rewrite, the map is not stored in a separate file. As in the previous example, assume the domains www.domain-test1.com and images.domain-test1.com are your authorized domains.
Place a file called robotsDisallow.txt in your website's root with the following content.
User-Agent: *
Disallow: /
The robotsDisallow.txt will be served for robots.txt requests on an unauthorized domain.
The following rules go in the website's web.config file.
<rewrite>
  <rules>
    <rule name="robots for unauthorized domain" stopProcessing="true">
      <match url="^robots\.txt$" />
      <conditions logicalGrouping="MatchAll">
        <add input="{Authorized domains:{SERVER_NAME}}" pattern="-" negate="true" />
      </conditions>
      <action type="Rewrite" url="/robotsDisallow.txt" />
    </rule>
    <rule name="Authorized domain check" stopProcessing="true">
      <match url=".?" />
      <conditions logicalGrouping="MatchAll">
        <add input="{Authorized domains:{SERVER_NAME}}" pattern="-" negate="true" />
      </conditions>
      <action type="CustomResponse" statusCode="410"
              statusReason="Gone"
              statusDescription="The requested resource is no longer available" />
    </rule>
  </rules>
  <rewriteMaps>
    <rewriteMap name="Authorized domains">
      <add key="www.domain-test1.com" value="-" />
      <add key="images.domain-test1.com" value="-" />
    </rewriteMap>
  </rewriteMaps>
</rewrite>
Like the mod_rewrite rules, these rules return a robots.txt file that tells search engines not to index any content on an unauthorized domain, and return an HTTP 410 Gone for all other page requests on an unauthorized domain.
Another option is to use host headers in IIS to restrict the domains for a website, but this mechanism should be used as a last resort. Host headers in IIS have a number of limitations, so avoid them if possible.
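Because the IIS map lives inside web.config rather than in its own file, it can be handy to list the authorized domains from a script when auditing several sites. The following is a rough sketch using Python's standard `xml.etree.ElementTree`. The function name `authorized_domains` is hypothetical, and the `SAMPLE` wrapper elements around the `rewriteMaps` section are an assumption about where that section sits in a typical web.config.

```python
import xml.etree.ElementTree as ET

# Assumed minimal web.config containing only the rewrite map shown above.
SAMPLE = """<configuration>
  <system.webServer>
    <rewrite>
      <rewriteMaps>
        <rewriteMap name="Authorized domains">
          <add key="www.domain-test1.com" value="-" />
          <add key="images.domain-test1.com" value="-" />
        </rewriteMap>
      </rewriteMaps>
    </rewrite>
  </system.webServer>
</configuration>"""


def authorized_domains(config_xml: str, map_name: str = "Authorized domains") -> list:
    """Return the keys of the named rewrite map, in document order."""
    root = ET.fromstring(config_xml)
    keys = []
    # iter() walks the root and all descendants, so nesting depth does not matter.
    for rewrite_map in root.iter("rewriteMap"):
        if rewrite_map.get("name") == map_name:
            for entry in rewrite_map.findall("add"):
                key = entry.get("key")
                if key is not None:
                    keys.append(key)
    return keys


if __name__ == "__main__":
    for domain in authorized_domains(SAMPLE):
        print(domain)
```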
To check for this situation, review your website's web logs and look for any unfamiliar domain names. If you find that a website has already fallen victim to this attack and is indexed by the search engines under the siphoning domain, you can implement authorized domains, and then you have to wait. It can take quite a while for your pages on the siphoning domain to start dropping out of the search engines, but it will happen eventually. If I remember correctly, it took a few months for the search engines to start dropping the pages.
It is surprising how many websites are open to this attack. I checked a number of major websites, and all but one were open to this issue. I'm sure there are legal ways to stop someone from doing this, but a fairly small technical implementation averts it much more cheaply.
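The log review can be partly automated. Below is a minimal Python sketch that counts requests per unfamiliar host name. It assumes W3C extended format logs (the kind IIS writes) with the `cs-host` field enabled, which is not always the case by default; for other log formats the parsing would need to change. The function name `unfamiliar_hosts` and the sample log lines are made up for illustration.

```python
from collections import Counter


def unfamiliar_hosts(log_lines, authorized):
    """Count requests whose cs-host value is not an authorized domain."""
    allowed = {host.lower() for host in authorized}
    fields = []
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        # The #Fields directive names the columns of the lines that follow.
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or "cs-host" not in fields:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # Skip malformed lines.
        host = values[fields.index("cs-host")].lower()
        host = host.split(":")[0]  # Drop an optional :port (IPv6 literals not handled).
        if host != "-" and host not in allowed:
            counts[host] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "#Fields: date time cs-method cs-uri-stem cs-host sc-status",
        "2011-01-01 00:00:00 GET /index.html www.domain-test1.com 200",
        "2011-01-01 00:00:01 GET /index.html siphon.example 200",
    ]
    known = ["www.domain-test1.com", "images.domain-test1.com"]
    for host, hits in unfamiliar_hosts(sample, known).most_common():
        print(host, hits)
```

Not every unfamiliar host is a siphoning domain. Requests made directly to the IP address or with junk Host headers from scanners will show up as well, so the output is a list of candidates to look into.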
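One way to test a site you are responsible for is to request a page from its IP address while sending a Host header for a domain it should not answer to. The sketch below does that with Python's standard `http.client`. The names `probe_status` and `looks_protected`, the address `192.0.2.10`, and the host `siphon.example` are placeholders, and treating only a 410 as "protected" is an assumption that matches the rules in this post; a site may reasonably protect itself with a different status.

```python
import http.client


def probe_status(address, fake_host, port=80, path="/", timeout=10):
    """Request path from address with an arbitrary Host header; return the status code."""
    conn = http.client.HTTPConnection(address, port, timeout=timeout)
    try:
        # skip_host=True stops http.client from adding its own Host header.
        conn.putrequest("GET", path, skip_host=True)
        conn.putheader("Host", fake_host)
        conn.endheaders()
        return conn.getresponse().status
    finally:
        conn.close()


def looks_protected(status):
    """True if the status is the 410 Gone the rules in this post return."""
    return status == 410


if __name__ == "__main__":
    # Placeholder values: substitute your own server's IP address.
    status = probe_status("192.0.2.10", "siphon.example")
    print(status, "protected" if looks_protected(status) else "possibly open")
```

Only run this against servers you own or are allowed to test.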
If you have any questions please let me know. I hope this helps.
Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.
© Copyright 2011, Nathan Fox