Missing redirection

Sirius · Sep 8, 2012

Hi,

Recently I noticed that Google has indexed pseudo-subdomains of Sott.net. There is no redirection set up, so you end up with duplicate content, etc., which is bad and can be fixed with a single line of code.
But see for yourself: https://www.google.de/#hl=en&q=site:www.de.sott.net&oq=site:www.de.sott.net

Related topic: http://cassiopaea.org/forum/index.php?topic=27149.0

Scottie · Sep 9, 2012

Oops... Fixed now.

Thanks!

Sirius · Sep 9, 2012

If you redirect those pages with “301 Moved Permanently”, Google will automatically recognise the change and remove the old URLs from its search index. This is the recommended approach.
You should also specify a canonical URL for each page; see: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139066

Scottie · Sep 9, 2012

Well... The pages haven't moved. They never should have been there in the first place. Google will catch on anyway, which is good because otherwise we'd have to differentiate between all possible valid links (and return the 301) and invalid links. Otherwise we'd end up with "moved permanently to 404 - not found".

It appears this was also screwing up the RSS feeds.

So, anybody with a malfunctioning RSS feed should check the feed URL and make sure it's not something like:

mail.es.sott.net

or

www.de.sott.net

Crikey...

Sirius · Sep 9, 2012

What I meant was to use .htaccess redirection like this:

Code:

RewriteCond %{HTTP_HOST} ^www.de.sott.net$  [NC]
RewriteRule ^(.*)$  http://de.sott.net/$1  [R=301,L]

You can then extend it with multiple subdomains, etc.
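For illustration, here is a small Python sketch of how the pattern in that RewriteCond behaves. Python's `re` module stands in for Apache's PCRE regex engine here (an assumption that holds for simple patterns like this one), and `re.IGNORECASE` plays the role of the `[NC]` flag. An unescaped `.` matches any single character, so the pattern also accepts strings that are not the intended host:

```python
import re

# Pattern as written in the RewriteCond: the dots are NOT escaped,
# so each "." matches any single character.
UNESCAPED = re.compile(r"^www.de.sott.net$", re.IGNORECASE)
# Same pattern with escaped dots: each "\." matches only a literal dot.
ESCAPED = re.compile(r"^www\.de\.sott\.net$", re.IGNORECASE)

for host in ["www.de.sott.net", "WWW.DE.SOTT.NET", "wwwXdeYsott-net"]:
    # Columns: host, matched by unescaped pattern, matched by escaped pattern
    print(host, bool(UNESCAPED.match(host)), bool(ESCAPED.match(host)))
```

Only the escaped version rejects the last string, which is why escaping the dots is the safer habit.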

You also need to generate a canonical URL for each page (very important).
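A canonical URL is declared with a `<link rel="canonical" href="...">` element in the page's `<head>`. The Python helper below is a hypothetical sketch of generating one per page; it assumes `http://www.sott.net` is the preferred origin and that the form without a trailing slash is the canonical one. Both are assumptions you would adjust to the site.

```python
from html import escape

# Assumption: www.sott.net is the preferred origin for these pages.
PREFERRED_ORIGIN = "http://www.sott.net"


def canonical_link(path: str) -> str:
    """Build a <link rel="canonical"> tag for a request path (no query string)."""
    # Normalise slashes so /users/login and /users/login/ share one
    # canonical URL; the root path stays "/".
    path = "/" + path.strip("/")
    return f'<link rel="canonical" href="{escape(PREFERRED_ORIGIN + path, quote=True)}">'


print(canonical_link("/articles/show/123456/"))
# prints <link rel="canonical" href="http://www.sott.net/articles/show/123456">
```

Whatever host the request actually arrived on (www.de.sott.net, mail.es.sott.net, ...), the tag always points at the one preferred URL.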

Scottie · Sep 9, 2012

Oh! I see...

Thanks for that bit of Apache redirect stuff... That stuff always makes me cry because it never quite works the way I want it to.

I did a little test when making the 404 error work, and I found out that in about 5 minutes, 245 hits to the SOTT server had malformed URLs, like:

http://blocking.azw.sott.net/articles/show/123456

???

I also put the canonical URL link tags for articles and pages. I had always wondered what those were for, and now I know. In all the SEO reading I've done (like half of Google's content), nobody ever mentioned the canonical URL thing! Sheesh.

Thanks2

:flowers:

Sirius · Sep 9, 2012

Much more has gone wrong:
https://www.google.de/webhp#q=inurl:*.sott.net+-inurl:de.sott+-inurl:www.sott+-inurl:es.sott+-inurl:fr.sott+-inurl:facebook.com
You need much more aggressive filtering to fix this (these are usually valid URLs, just with the wrong prefix).
1) Collect all real or valid subdomains first, e.g. de, fr, es, www, mail, etc.
2) Redirect everything from *.sott.net to sott.net except the valid subdomains collected above.
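The two steps amount to a whitelist check. This Python version is illustrative only: `redirect_target` is a hypothetical helper, the subdomain set is the example list from step 1, and, following step 2, the bare domain sott.net is treated as the redirect target.

```python
from typing import Optional

# Step 1: whitelist of real subdomains (example list; extend as needed).
VALID_SUBDOMAINS = {"www", "de", "fr", "es", "mail"}
BASE_DOMAIN = "sott.net"


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if the host is fine as-is."""
    host = host.lower().split(":", 1)[0]  # hostnames are case-insensitive; drop any port
    if host == BASE_DOMAIN:
        return None  # the redirect target itself
    suffix = "." + BASE_DOMAIN
    if host.endswith(suffix) and host[: -len(suffix)] in VALID_SUBDOMAINS:
        return None  # a whitelisted subdomain: serve normally
    # Step 2: everything else under *.sott.net goes to the bare domain.
    return f"http://{BASE_DOMAIN}{path}"


print(redirect_target("de.sott.net", "/"))
print(redirect_target("www.de.sott.net", "/articles/show/123456"))
```

The whitelist is what makes this "aggressive": the bad prefixes are unbounded, so instead of listing them, only the known-good hosts are let through.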

Mr. Scott said:
I had always wondered what those were for, and now I know. In all the SEO reading I've done (like half of Google's content), nobody ever mentioned the canonical URL thing! Sheesh.

It's documented in Google's webmaster section. I posted a link to it above.

Scottie · Sep 9, 2012

Sirius said:
Much more has gone wrong:
https://www.google.de/webhp#q=inurl:*.sott.net+-inurl:de.sott+-inurl:www.sott+-inurl:es.sott+-inurl:fr.sott+-inurl:facebook.com
You need much more aggressive filtering to fix this (these are usually valid URLs, just with the wrong prefix).
1) Collect all real or valid subdomains first, e.g. de, fr, es, www, mail, etc.
2) Redirect everything from *.sott.net to sott.net except the valid subdomains collected above.

Oh dear...

Well, I improved the regex quite a bit, but I haven't gotten the above to work yet. I need a negative lookahead, I think, but of course it doesn't work.

Oh well, there's always tomorrow!

Scottie · Sep 9, 2012

On the other hand, there's also right now.

I let Apache handle the basic cases with my expanded regex, and then for the "everything else -> www.sott.net" case, I changed the 404 redirect in the SOTT app itself to a 301 redirect to www.sott.net/BLAH.

That will take care of the "_www.dildomania-info.sott.net_" links on Google, with a proper 301 redirect to the real article, AND it has the canonical link in the page.

Not ideal since mod_rewrite is faster, but it works.
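The app-level fallback described above might look roughly like this, written as a Python WSGI application purely for illustration (the thread doesn't say what the SOTT app is built with). `KNOWN_HOSTS` and `app` are hypothetical names; an unrecognised host gets a 301 to the same path on www.sott.net instead of a 404, with any query string kept.

```python
# Hypothetical list of hosts the application serves directly.
KNOWN_HOSTS = {"www.sott.net", "de.sott.net", "es.sott.net", "fr.sott.net", "mail.sott.net"}


def app(environ, start_response):
    """Minimal WSGI app: 301 unknown hosts to www.sott.net, serve the rest."""
    host = environ.get("HTTP_HOST", "").lower().split(":", 1)[0]
    path = environ.get("PATH_INFO", "/") or "/"
    if host not in KNOWN_HOSTS:
        # Former 404 branch: redirect permanently to the same path on the main host.
        location = "http://www.sott.net" + path
        query = environ.get("QUERY_STRING", "")
        if query:
            location += "?" + query
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"normal page handling would go here\n"]


# Quick check with a fake start_response that records status and headers.
captured = {}


def fake_start_response(status, headers):
    captured["status"], captured["headers"] = status, dict(headers)


app({"HTTP_HOST": "blocking.azw.sott.net", "PATH_INFO": "/articles/show/123456"},
    fake_start_response)
print(captured["status"], captured["headers"]["Location"])
```

As noted above, doing this in the application costs more per request than letting mod_rewrite answer, but the application already knows which pages exist and can emit the canonical tag on the target page.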

Now I can sleep soundly.

:zzz:

Scottie · Sep 10, 2012

One last thought: at first, I thought this was Google doing its "I'll try a URL that I know is wrong to see whether I get a proper 404 error" check.

But after you posted that Google link above, Sirius, I'm thinking somebody has been doing "reverse site promotion" for SOTT, trying really hard to get us dropped in the rankings.

Hmm.

Sirius · Sep 10, 2012

It's not necessarily bad guys doing this deliberately. As long as arbitrary subdomains are accepted, it will happen one way or another.

For example, suppose a list containing these two domains sits somewhere in some search engine's index:
domain.com
www.sott.net
Then the line break gets dropped and you end up with domain.comsott.net, which I have actually seen (with a different domain name, of course).

Someone posts a URL in obfuscated form, like ww w.sott.net / article …, and you get the same problem.
For example here: _http://www.youtube.com/watch?v=dZk0ZGNHkoQ
There is a user comment (obviously written with a malfunctioning keyboard):

comment said:
ht tp:/ /cryptome. org/eyeball/daiichi-npp15/daiichi-photos15.htm Cryptome.org has info that everyone imo should know + ht tp://w w w.sott.net/articles/show/228933-HAARP-and-The-Canary-in-the-Mine starts in 2004 to present day. Dunno if yt purged my previous comment or you did. If you I appreciate the protection 4 me effort. Shared sott.net also with henning. The rest of this with you. TC.

And so on …
On the other hand, if you look at constructions like zihggbu.sott.net or xxx.sott.net or 66.sott.net, one might really wonder what actually happened.

Try this one for redirection (add subdomains as you wish):

Code:

# Subdomain handling
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} ^.+(www|de|es|fr|mail)\.sott\.net$ [NC]
RewriteRule ^(.*)$ http://%1.sott.net/$1 [R=301,L]
RewriteCond %{HTTP_HOST} !^(www|de|es|fr|mail)\.sott\.net$ [NC]
RewriteRule ^(.*)$ http://www.sott.net/$1 [R=301,L]
</IfModule>

Don't use the earlier code: the dots are not escaped there. It was only meant as an illustration.
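To check what the two rule pairs do before deploying them, here is a rough Python model of their logic. It is a sketch, not Apache itself: Python's `re` stands in for mod_rewrite's regex engine, `[NC]` becomes `re.IGNORECASE`, `%1` becomes the group captured by the first RewriteCond, and the path is passed without a leading slash because that is what the RewriteRule pattern sees in .htaccess context. `rewrite` is a hypothetical name.

```python
import re
from typing import Optional

# The two RewriteCond patterns from the .htaccess snippet ([NC] -> IGNORECASE).
PREFIXED = re.compile(r"^.+(www|de|es|fr|mail)\.sott\.net$", re.IGNORECASE)
VALID = re.compile(r"^(www|de|es|fr|mail)\.sott\.net$", re.IGNORECASE)


def rewrite(host: str, path: str) -> Optional[str]:
    """Mimic the two rule pairs: return the 301 Location, or None (no redirect)."""
    m = PREFIXED.match(host)
    if m:
        # First pair: %1 is the valid subdomain found at the end of the host.
        return f"http://{m.group(1)}.sott.net/{path}"
    if not VALID.match(host):
        # Second pair: anything that is not a valid subdomain goes to www.
        return f"http://www.sott.net/{path}"
    return None  # already a valid host: neither rule fires


for host in ["www.de.sott.net", "mail.es.sott.net", "domain.comwww.sott.net",
             "66.sott.net", "de.sott.net"]:
    print(host, "->", rewrite(host, "articles/show/123456"))
```

Because the first pattern has no `\.` after `.+`, glued-together hosts like domain.comwww.sott.net from the line-break example are also caught and sent to the subdomain at the end.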

You should also list pages that are not intended for search engines in your robots.txt file. Append:

Code:

User-agent: *
Disallow: /users/login
Disallow: /users/signup

Add further pages if necessary.
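Before uploading, the rules can be sanity-checked with Python's standard `urllib.robotparser` module. It only does plain prefix matching (no wildcard extensions), which is all these two rules need:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /users/login
Disallow: /users/signup
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ["http://www.sott.net/users/login",
            "http://www.sott.net/users/login/",
            "http://www.sott.net/articles/show/123456"]:
    # True means a crawler identifying as Googlebot may fetch the URL.
    print(url, parser.can_fetch("Googlebot", url))
```

Because Disallow rules are prefix matches, the /users/login/ variant with a trailing slash is covered by the same line.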

Another SEO issue is archive pages like http://www.sott.net/signs/archive/en/2012/signs20120906.htm
They may be visited by search engines but should not be indexed. They displace article pages and actual content. Such pages should be configured with:

Code:

<meta name="robots" content="noindex, follow">
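For static archive files like the one above, the tag can be added in bulk. A hypothetical Python sketch: `add_noindex` inserts the tag right after the opening `<head>` tag unless the page already has a robots meta tag, and `process_tree` applies it to every `.htm` file under a directory. Editing HTML with regexes is a deliberate simplification that assumes simple, well-formed pages; the UTF-8 encoding is also an assumption.

```python
import re
from pathlib import Path

META = '<meta name="robots" content="noindex, follow">'
HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)  # <head ...>, not <header>
HAS_ROBOTS = re.compile(r"""<meta\s+name=["']robots["']""", re.IGNORECASE)


def add_noindex(html: str) -> str:
    """Insert the robots meta tag right after <head>, unless one is already there."""
    if HAS_ROBOTS.search(html):
        return html  # page already declares a robots policy
    m = HEAD_OPEN.search(html)
    if not m:
        return html  # no <head>: nothing safe to do
    return html[: m.end()] + "\n" + META + html[m.end():]


def process_tree(root: str) -> None:
    """Hypothetical bulk run: rewrite every .htm file under root in place."""
    for path in Path(root).rglob("*.htm"):
        text = path.read_text(encoding="utf-8")
        new_text = add_noindex(text)
        if new_text != text:  # only touch files that actually change
            path.write_text(new_text, encoding="utf-8")
```

Since `add_noindex` leaves already-tagged pages alone, running it more than once is harmless.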

A redirect for trailing slashes is also missing. For example,
users/login and users/login/ both exist! This is also where the canonical URL comes in handy.
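Picking one form and 301-redirecting the other removes the duplicate. A Python sketch under the assumption that the form without the trailing slash is the preferred one (either choice works as long as it is consistent): `trailing_slash_redirect` is a hypothetical helper returning the redirect target, or None when the URL is already in the preferred form. The query string is kept.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def trailing_slash_redirect(url: str) -> Optional[str]:
    """Return the URL without its trailing slash (root excluded), else None."""
    parts = urlsplit(url)
    if parts.path != "/" and parts.path.endswith("/"):
        # Strip the slash(es) but keep scheme, host, query and fragment.
        return urlunsplit(parts._replace(path=parts.path.rstrip("/") or "/"))
    return None


print(trailing_slash_redirect("http://www.sott.net/users/login/"))
print(trailing_slash_redirect("http://www.sott.net/users/login"))
```

The redirect and the canonical tag complement each other: the redirect stops new duplicates from being crawled, while the canonical tag tells search engines which version to keep for pages already indexed.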

Scottie · Sep 10, 2012

Sirius said:
It's not necessarily bad guys doing this deliberately. As long as arbitrary subdomains are accepted, it will happen one way or another.

For example, suppose a list containing these two domains sits somewhere in some search engine's index:
domain.com
www.sott.net
Then the line break gets dropped and you end up with domain.comsott.net, which I have actually seen (with a different domain name, of course).

That makes sense.

Sirius said:
On the other hand, if you look at constructions like zihggbu.sott.net or xxx.sott.net or 66.sott.net, one might really wonder what actually happened.

Yeah, that's what I am wondering about. That and the more racy links!

Sirius said:
Try this one for redirection (add subdomains as you wish):

So simple! It always is, once you see how to do it. I made a few tweaks, and it ended up being way shorter than my solution last night.

Sirius said:
You should also list pages that are not intended for search engines in your robots.txt file.

Okay, I added them.

Sirius said:
Another SEO issue is archive pages like http://www.sott.net/signs/archive/en/2012/signs20120906.htm
They may be visited by search engines but should not be indexed. They displace article pages and actual content.

DOH! I went in and added the meta tags recursively to the existing files, and all new ones from here on out will have the meta tag added.

Sirius said:
A redirect for trailing slashes is also missing. For example,
users/login and users/login/ both exist! This is also where the canonical URL comes in handy.

Okay, I fixed the trailing slash, too. Hmm... Okay, I also added the canonical URL for some more pages on the site.

Geez... All this work just to make Google happy.

Thanks again for your help!
