Daniel Brandt (Scroogle, Google Watch) on Google ranking anomalies

November 17, 2008
Seth Finkelstein's Infothought blog

[Below is a guest post from Daniel Brandt, who describes his experiences and speculations. His views are of course his own and not necessarily mine, but I do believe them worth hearing.]

There is definitely some sort of filtering going on in Google's rankings for certain keywords. It took 18 months for any of the pages on my wikipedia-watch.org site to rank better than 200 deep or so for any combination of keywords from those pages. During this time, Yahoo and Live.com were ranking the same pages well for the same terms.

When I test terms on Google, I test with a multi-threaded private tool that checks more than 30 Google data centers on different Class C networks and shows the rank up to 100 on each one. I can see changes kicking in and out as they propagate across these data centers. The transitions can take several days in normal cases, as when a new or modified page is incorporated into the results.
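
For the curious, here is a bare-bones sketch in Python of the general idea behind such a tool. Everything in it is illustrative: the datacenter IPs are historical examples that stopped resolving to Google frontends long ago, and the result parsing is deliberately crude. It shows the check-many-datacenters-in-parallel approach, not my actual private tool.

    import re
    import urllib.parse
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    # Historical Google datacenter IPs, for illustration only; these
    # rotated constantly and have long since been retired.
    DATACENTERS = ["64.233.161.104", "66.102.7.99", "216.239.59.104"]

    def rank_on(dc_ip, query, target_domain):
        """Return the 1-based rank of target_domain in the top 100, or None."""
        url = "http://%s/search?%s" % (
            dc_ip, urllib.parse.urlencode({"q": query, "num": 100}))
        req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        html = urllib.request.urlopen(req, timeout=10).read().decode("utf-8", "replace")
        # Crude URL extraction; real SERP markup changes often and needs
        # sturdier parsing than this.
        hits = re.findall(r'href="(https?://[^"]+)"', html)
        for i, hit in enumerate(hits[:100], 1):
            if target_domain in hit:
                return i
        return None

    def check_all(query, target_domain):
        with ThreadPoolExecutor(max_workers=len(DATACENTERS)) as pool:
            ranks = pool.map(lambda ip: rank_on(ip, query, target_domain), DATACENTERS)
        for ip, rank in zip(DATACENTERS, ranks):
            print("%-16s %s" % (ip, rank if rank else ">100"))

    check_all("wikipedia watch", "wikipedia-watch.org")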

Wikipedia-watch.org has been online for 36 months now. During the first half of that period, no pages ranked higher than 200 deep or so, even if you used two fairly uncommon words from a page to search for it (this is documented at wikipedia-watch.org/goohate.html). During the second half of that period, after a transition that took about four months to settle, the deeper pages ranked okay and were on a par with Yahoo and Live. But there was still one glaring exception to this rule: a search for the single word "wikipedia" failed to turn up the home page in the first 100 results almost all of the time during this second period.

When it did show up, it always ranked within the top 15. When it didn't show up, it was always deeper than 100. There was never anything in between, and I've been watching this curiosity for the last six months now. For the first five of those months, it might kick in for a few hours on all data centers and then disappear. This happened several times. Twice it kicked in for a few days and then disappeared from the top 100 again. During the last 30 days it has been present about half of the total time, for several days each time, and then gone again for days. It's always one or the other -- in the top 15 or not even in the top 100. Meanwhile, the deep pages have ranked okay for the last 18 months and have been stable the entire time.
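
If you wanted to log this pattern yourself, here is a toy sketch (Python, with invented observation data) of tallying the two buckets; the point is that the middle bucket stays empty:

    from collections import Counter

    def classify(rank):
        """Bucket an observed rank; None means 'not in the top 100'."""
        if rank is None:
            return "absent (>100)"
        if rank <= 15:
            return "top 15"
        return "in between"  # never actually observed, per the pattern above

    # Invented observations, one per polling run, purely for illustration.
    observations = [3, None, None, 7, None, 12, None, 9]
    print(Counter(classify(r) for r in observations))
    # Counter({'top 15': 4, 'absent (>100)': 4})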

This behavior is something I'm seeing only for the home page, and only on Google, not on Yahoo or Live. It happens almost exclusively when the word "wikipedia" is the solitary search term, or perhaps that one word plus another term that also appears on the page. If you add a third term, my home page begins ranking reasonably well, presumably because the search is now specific enough to override the filtering. By the way, this home page has a PageRank of 5, and Yahoo counts 3,500 external backlinks to it (there's a counting tool at microsoft-watch.org/cgi-bin/ranking.htm). You cannot use Google to count backlinks, because for years now Google has been deliberately suppressing this information.
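
Yahoo's linkdomain: operator was the standard way to get such counts (it has since been retired). Here is a rough sketch of how a counting tool might have used it; the scraping of the result count is an assumption about Yahoo's page markup, nothing more:

    import re
    import urllib.parse
    import urllib.request

    def yahoo_backlink_count(domain):
        """Estimate external backlinks via Yahoo's linkdomain: operator."""
        query = "linkdomain:%s -site:%s" % (domain, domain)  # exclude internal links
        url = "http://search.yahoo.com/search?" + urllib.parse.urlencode({"p": query})
        req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        html = urllib.request.urlopen(req, timeout=10).read().decode("utf-8", "replace")
        # The "of about N results" string is markup-dependent and illustrative.
        m = re.search(r"of about ([\d,]+)", html)
        return int(m.group(1).replace(",", "")) if m else None

    print(yahoo_backlink_count("wikipedia-watch.org"))  # roughly 3,500 at the time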

I should also add that for three years running, another site of mine, Scroogle.org, had a tool that compared the top 100 Google results for a search with the top 100 Yahoo results for the same search. This may come as a surprise to some, but the divergence was consistently about 80 percent, search after search. In other words, only 20 out of 100 links showed up on both Yahoo and Google for any given search; the other 80 on each engine were unique to its top 100. The overall quality of the results was about even for each engine. To put this another way, there's a lot of wiggle room for a particular engine to vary its top results and still look like it's providing the most relevant links.
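
The comparison itself is trivial to express in code. A minimal sketch with synthetic placeholder lists (the real Scroogle tool scraped live results from both engines):

    def overlap_percent(google_top100, yahoo_top100):
        """Percentage of one engine's top 100 also present in the other's."""
        shared = set(google_top100) & set(yahoo_top100)
        return 100.0 * len(shared) / len(google_top100)

    # Synthetic lists arranged to reproduce the ~20 percent overlap reported above.
    google = ["http://example.com/page%d" % i for i in range(100)]
    yahoo = ["http://example.com/page%d" % i for i in range(80, 180)]
    print(overlap_percent(google, yahoo))  # 20.0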

To make a long story short, I believe there is some sort of backend filter that affects which top results Google shows. This actually makes some sense, since most searchers never go beyond the first page of results (at 10 links per page). That means Google's reputation and ad revenue depend heavily on the utility of that first page. A filter that favors recency is one component of this, because Google jacks up recent forum and blog posts (and increasingly even news posts). Everyone expects this by now. Static sites such as wikipedia-watch.org must compete in this sort of environment.
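
As a sketch of what such a recency component could look like -- the half-life and boost ceiling below are invented numbers, not anything Google has disclosed:

    import math
    import time

    def boosted_score(base_score, published_ts, half_life_days=7.0, max_boost=2.0):
        """Multiply a relevance score by a freshness factor that decays over time."""
        age_days = (time.time() - published_ts) / 86400.0
        freshness = math.exp(-math.log(2) * age_days / half_life_days)  # 1.0 -> 0.0
        return base_score * (1.0 + (max_boost - 1.0) * freshness)

    # A day-old blog post outscores an equally relevant three-year-old static page.
    print(boosted_score(1.0, time.time() - 1 * 86400))        # ~1.91
    print(boosted_score(1.0, time.time() - 3 * 365 * 86400))  # ~1.00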

In addition to the recency factor, I think there is filter weighting based on what I call "newbie searches." A newbie search is grandpa or grandma searching for a single word such as "wikipedia" or "email" that normally returns millions of results, which is of course useless to the searcher. Such searches are stupid to begin with, but Google must cater to stupidity in order to push ads, since ad revenue is 99 percent of its total revenue. There might even be some sort of rotational weighting for newbie searches.
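
This is pure conjecture, but rotational weighting could be as simple as cycling which slice of a larger candidate pool fills the visible slots in each time bucket:

    def rotated_top(candidates, slots=10, period_hours=24, now_hours=0):
        """Deterministically rotate which slice of the pool fills the top slots."""
        bucket = now_hours // period_hours
        offset = (bucket * slots) % len(candidates)
        return (candidates + candidates)[offset:offset + slots]

    pool = ["site-%02d" % i for i in range(30)]
    print(rotated_top(pool, now_hours=0))   # site-00 .. site-09
    print(rotated_top(pool, now_hours=24))  # site-10 .. site-19
    print(rotated_top(pool, now_hours=48))  # site-20 .. site-29

Under a scheme like this, a given site would be visible for whole periods and invisible for others, which is at least consistent with the all-or-nothing pattern described earlier.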

And call me a tin-foil hatter if you must, but I also believe that "hand jobs" -- manual interventions -- are involved in tweaking this filter. In other words, there is a political dimension to it as well. Regrettably, I cannot prove this. We need more transparency from Google, and we need it now, before the situation becomes even more suspicious.
