It is not necessarily bad guys out there doing this deliberately. As long as arbitrary subdomains can be used, it will happen one way or another.
For example, suppose a search engine's index contains a list of two domains somewhere:
domain.com
www.sott.net
If the line break gets lost, you end up with domain.comwww.sott.net, which I have actually seen (with a different domain name, of course).
Or someone posts a URL in obfuscated form, like ww w.sott.net / article …, and you have the same problem.
For example here: _http://www.youtube.com/watch?v=dZk0ZGNHkoQ
There is a user comment (obviously written with a malfunctioning keyboard):
comment said:
ht tp:/ /cryptome. org/eyeball/daiichi-npp15/daiichi-photos15.htm Cryptome.org has info that everyone imo should know + ht tp://w w w.sott.net/articles/show/228933-HAARP-and-The-Canary-in-the-Mine starts in 2004 to present day. Dunno if yt purged my previous comment or you did. If you I appreciate the protection 4 me effort. Shared sott.net also with henning. The rest of this with you. TC.
And so on …
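How such text turns into requests for odd hostnames depends on the crawler, but a deliberately naive hostname extractor in Python illustrates both cases above. The pattern and the function name `hosts_in` are assumptions for illustration, not how any particular search engine works.

```python
import re

# A deliberately naive "link extractor": grab anything that looks like a
# hostname ending in .sott.net. Real crawlers differ; this is only a model.
HOST_PATTERN = re.compile(r"[A-Za-z0-9.-]+\.sott\.net", re.IGNORECASE)


def hosts_in(text: str) -> list[str]:
    """Return every .sott.net hostname the naive pattern finds in `text`."""
    return HOST_PATTERN.findall(text)


# 1. The line break between two list entries gets lost.
print(hosts_in("domain.com" + "www.sott.net"))  # ['domain.comwww.sott.net']
# 2. Obfuscated URLs: only the part after the last space looks like a host.
print(hosts_in("ww w.sott.net / article"))  # ['w.sott.net']
print(hosts_in("ht tp://w w w.sott.net/articles/show/228933"))  # ['w.sott.net']
```

Both results are syntactically valid subdomains of sott.net, so if arbitrary subdomains resolve, each one becomes a separate host serving the same content.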
On the other hand, when you look at hostnames like zihggbu.sott.net, xxx.sott.net or 66.sott.net, you really have to wonder how they came about.
Try this for the redirection (add subdomains to the list as needed):
Code:
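# Subdomain handling
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteBase /
# Junk glued in front of a known subdomain (e.g. domain.comwww.sott.net):
# redirect to the known subdomain the host ends with
RewriteCond %{HTTP_HOST} ^.+(www|de|es|fr|mail)\.sott\.net$ [NC]
RewriteRule ^(.*)$ http://%1.sott.net/$1 [R=301,L]
# Any other host that is not exactly one of the known subdomains
# (e.g. xxx.sott.net or sott.net): redirect to www
RewriteCond %{HTTP_HOST} !^(www|de|es|fr|mail)\.sott\.net$ [NC]
RewriteRule ^(.*)$ http://www.sott.net/$1 [R=301,L]
</IfModule>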
Don't use the other code: the dots are not escaped there, so it was only meant as an illustration.
You should also list in your robots.txt file the pages that are not intended for search engines. Append:
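To check which hosts the rules above catch before deploying them, here is a rough Python emulation. It is not Apache: it only mirrors the two RewriteCond patterns (Python's `re` treats patterns this simple the same way as the PCRE engine mod_rewrite uses), with `re.IGNORECASE` standing in for [NC], and it assumes .htaccess context, where RewriteRule sees the path without its leading slash. The function name `redirect_target` and the look-alike host names are made up for illustration. The last lines also show why the escaping matters: an unescaped `.` matches any character.

```python
import re
from typing import Optional

# Same alternatives and patterns as the two RewriteCond lines; [NC] -> IGNORECASE.
KNOWN = "(www|de|es|fr|mail)"
GLUED = re.compile(r"^.+" + KNOWN + r"\.sott\.net$", re.IGNORECASE)
EXACT = re.compile(r"^" + KNOWN + r"\.sott\.net$", re.IGNORECASE)


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the 301 target the two rule pairs would produce, or None if the
    request is served as-is. `path` is given without its leading slash."""
    glued = GLUED.match(host)
    if glued:  # rule 1: junk glued in front of a known subdomain
        return f"http://{glued.group(1)}.sott.net/{path}"
    if not EXACT.match(host):  # rule 2: any other unknown host
        return f"http://www.sott.net/{path}"
    return None  # already a known subdomain, no redirect


for host in ["domain.comwww.sott.net", "zihggbu.sott.net", "66.sott.net",
             "sott.net", "de.sott.net", "code.sott.net"]:
    print(host, "->", redirect_target(host, "article"))

# Without the backslashes, "." matches any character, so look-alikes pass.
UNESCAPED = re.compile(r"^" + KNOWN + r".sott.net$", re.IGNORECASE)
print(bool(UNESCAPED.match("wwwXsott-net")), bool(EXACT.match("wwwXsott-net")))
```

One side effect shows up in that loop: because `.+` accepts any prefix, code.sott.net ends in "de" and is sent to de.sott.net rather than www. That is harmless for junk hosts, but worth remembering if real subdomains are added later.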
Code:
User-agent: *
Disallow: /users/login
Disallow: /users/signup
Add further pages if necessary.
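Python's standard library ships a robots.txt parser, `urllib.robotparser`, which is handy for checking that the rules block what you intend before uploading them. A minimal sketch using the rules above; the article URL is only an example.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /users/login
Disallow: /users/signup
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # parse from a string instead of fetching

for url in ["http://www.sott.net/users/login",
            "http://www.sott.net/users/login/",
            "http://www.sott.net/users/signup",
            "http://www.sott.net/articles/show/228933"]:
    # can_fetch() answers: may this user agent crawl this URL?
    print(url, parser.can_fetch("*", url))
```

Disallow works by path prefix, so /users/login also covers /users/login/.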
Another SEO issue is archive pages such as http://www.sott.net/signs/archive/en/2012/signs20120906.htm
Search engines may crawl them, but they should not be indexed, because they push the article pages and the actual content out of the results. Put the following in the head of such pages:
Code:
<meta name="robots" content="noindex, follow">
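To spot-check that archive pages actually carry the tag, you can fetch one and parse it. This Python sketch uses only the standard `html.parser` module and works on an HTML string; fetching the page is left out, and the names `RobotsMetaFinder` and `robots_directives` are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(html: str) -> set[str]:
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


archive_page = ('<html><head><meta name="robots" content="noindex, follow">'
                '</head><body>archive</body></html>')
print(sorted(robots_directives(archive_page)))  # ['follow', 'noindex']
```

An archive page should report both noindex and follow; an article page should not report noindex.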
Redirection for trailing slashes is also missing. For example:
users/login and users/login/ coexist! This is also where a canonical link (rel="canonical") comes in handy.
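Whichever form you pick, one should 301-redirect to the other, and the page should name the chosen form in a canonical link. A minimal Python sketch of that decision, assuming the form without the trailing slash is the canonical one (the opposite choice works just as well); the function names are hypothetical.

```python
import html
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Strip trailing slashes from the path, leaving the bare root "/" alone.
    Assumes the slash-less form is the one you want indexed."""
    parts = urlsplit(url)
    path = parts.path
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def slash_redirect(url: str) -> Optional[str]:
    """URL to 301-redirect to, or None if `url` is already canonical."""
    target = canonical_url(url)
    return None if target == url else target


def canonical_link(url: str) -> str:
    """The <link rel="canonical"> element for the page's head."""
    return f'<link rel="canonical" href="{html.escape(canonical_url(url))}">'


for url in ["http://www.sott.net/users/login/",
            "http://www.sott.net/users/login",
            "http://www.sott.net/"]:
    print(url, "->", slash_redirect(url))
print(canonical_link("http://www.sott.net/users/login/"))
```

The redirect removes the duplicate for visitors and crawlers that follow it; the canonical link covers copies that are still reachable some other way.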