Analysis of Results
Microsoft’s beta release of their new search engine prompted several quick reviews on the web from organizations as diverse as BBC Online and Search Engine Journal. Unfortunately, these quick glances do not take a detailed, comprehensive look at the differences between the results returned by the engines or at the quality of the content.
The analysis I have completed below indicates that the search wars are wide open. Google currently provides the highest quality content and the best matches to the search query in its results, but is only slightly ahead of Yahoo!’s search engine. MSN’s beta release trails Yahoo! by a small amount in the numbers, but a look at the individual results reveals that much work is needed to catch up. From the results, it also appears that both Yahoo! and MSN are sorely lacking the index size that Google has. This competitive advantage may be the difference in Google’s ability to provide more relevant results.
A look at the PageRank & measurable backlinks to the sites reveals similar trends between the 3, but Google’s massively larger index and ability to crawl more web pages more frequently may be the key to their current success. Yahoo! and MSN still have only a fraction of the pages in cache that Google does. A rough guess based on the results would be that Yahoo!’s index is approximately 1/2 the size of Google’s, while Microsoft’s is probably less than 1/4 Google’s size.
The measures of content quality and match quality were made as objectively as possible and show that while Google on average provides the best results, there is still a very long way to go towards providing an excellent search engine for the web. The search term “HDTV Comparison”, for example, produced not 1 result among 30 (the top 10 results at each search engine) that I considered excellent. Searches for “George Plimpton” & “Order Digital Prints Online” produced very satisfactory results at Yahoo! & Google, with Yahoo! actually providing slightly better results overall. Google, meanwhile, was very dominant in results for “US National Holidays”, providing many excellent resources that the competition did not have.
Spamming each of the search engines is still surprisingly easy. Many results at all 3 search engines were clearly little more than traffic grabbing pages with no real, unique content. The search engine that deals with this issue most effectively will gain a significant advantage for the future, as sites & pages like these multiply on the web into the millions.
I was initially disappointed in MSN’s poor results, but it appears that they are not far off the results of Yahoo! & Google. While their beta release cannot be called a superlative effort, it clearly shows the potential to provide competition to the other industry players. The key to short-term success for both Yahoo! & MSN will be finding a way to build their indexes fast enough to compete with Google’s giant head start.
I expect to add further analysis that reflects the discussion on SEO forums about these results.
For an SEO professional, the results below indicate that Google’s trend of favoring “authority” sites as defined by the Hilltop algorithm continues unabated. With the exception of several odd standouts, Google, Yahoo! & MSN all give great credence to sites that have many thousands of backlinks, a high PageRank & a large number of pages on the site. The era of small, niche sites is falling by the wayside as the search engines favor major web presences like news organizations, government websites & large commercial endeavors.
For MSN specifically, the results are very hard to pick apart. It appears that MSN may be using some additional pieces of ranking technology in their algorithm, possibly further questioning the value of non-relevant links. However, the current size of their index in comparison with Yahoo! & Google makes it exceptionally difficult to determine which pieces they consider important and which pieces are simply left out because they have not yet been spidered. The only piece which stuck out in my analysis, although it was not included in my statistics, is that MSN does not appear to have Google’s marked preference for older sites.
For additional information see Google vs. Yahoo! Results.
I welcome additional analysis from anyone who wishes to use the results. Please reference this page if you copy or use the information contained herein.
Survey Data & Key
This survey comprises four relatively common searches at the three major search engines, including MSN’s new Beta search engine release. Each search was conducted with the default settings at each of the following URLs:
The tables contain six measurements for each result in the SERPs.
- PR – This is the Google PageRank according to the toolbar for the homepage of the site
- YLinkD – This is the number of results according to a query run at Yahoo! for the number of links to the top level domain (linkdomain:url.com -site:url.com).
- MSNLink – This is the number of results according to a query run at MSN’s Beta Search engine for the number of links to the top level domain (link:url.com).
- Size – This value is taken from a search at Google for the number of pages in the index (site:url.com).
- Qual – This is a subjective measure of the quality of the page’s content. The score is X/10 based on the terms listed in the quality of content criteria chart.
- Match – This is a subjective measure of the closeness of the match in the site’s content to the search terms entered in the query. The score is X/10 based on the terms listed in the match criteria chart.
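As a rough sketch of how one row of these tables could be recorded, the snippet below models the six measurements and builds the three query strings quoted in the key. The class name, field names, and helper function are hypothetical and chosen only for illustration; the survey itself was compiled by hand, and the 0 to 10 range check on PR is an assumption about the toolbar scale.

```python
from dataclasses import dataclass


@dataclass
class ResultMeasurement:
    """One row of the survey tables: a single result in a SERP (hypothetical layout)."""
    url: str
    pr: int        # Google toolbar PageRank of the site's homepage (assumed 0-10)
    ylinkd: int    # Yahoo! result count for "linkdomain:... -site:..."
    msnlink: int   # MSN Beta result count for "link:..."
    size: int      # Google result count for "site:..."
    qual: int      # subjective quality-of-content score, X/10
    match: int     # subjective match score, X/10

    def __post_init__(self):
        # The scored fields are all expressed on a 0-10 scale.
        for name in ("pr", "qual", "match"):
            value = getattr(self, name)
            if not 0 <= value <= 10:
                raise ValueError(f"{name} must be between 0 and 10, got {value}")


def measurement_queries(domain: str) -> dict:
    """Build the query strings described in the key for a given domain."""
    return {
        "YLinkD": f"linkdomain:{domain} -site:{domain}",  # run at Yahoo!
        "MSNLink": f"link:{domain}",                      # run at MSN Beta
        "Size": f"site:{domain}",                         # run at Google
    }
```

For example, `measurement_queries("url.com")["YLinkD"]` gives the same `linkdomain:url.com -site:url.com` query shown in the key.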
Quality of Content Criteria Chart
The quality of content criteria judge the usability, writing, images, files & overall quality of information on the page. They do not factor in how relevant the information was to the query.
- 0/10 – Did not have any valuable/unique content
- 1/10 – Entirely duplicate/spam content
- 2/10 – Mostly duplicate/spam content
- 3/10 – Very low quality content
- 4/10 – Poor content with some helpful information
- 5/10 – Mediocre content
- 6/10 – Effectively covered the subject
- 7/10 – Very effectively covered the subject, provided links to other resources, images, files, etc.
- 8/10 – In-depth coverage, very high quality information
- 9/10 – Excellent resource on the subject, exceptional information
- 10/10 – Authoritative resource worthy of an award
Match Criteria Chart
The match criteria judge how effectively the content provided on the page matches the query that was entered. For queries with multiple potential meanings (e.g. George Plimpton, US National Holidays), the highest match scores go to pages that tackle multiple potential aspects of the search.
- 0/10 – Had nothing to do with the search
- 1/10 – Almost no relation to the search
- 2/10 – Very slight match
- 3/10 – Some information matched the query
- 4/10 – Nearly matched the search
- 5/10 – Mediocre match for the search
- 6/10 – Acceptable match
- 7/10 – Above average match, covered at least one aspect very well
- 8/10 – Good match, covered one aspect very well or multiple aspects
- 9/10 – Near perfect match, covered multiple aspects in-depth
- 10/10 – Flawless match, perfect coverage of every possible aspect of the query
PageRank was clearly most important to Google, with the average PR for a result from Google at 6.9, compared to 6.5 for Yahoo! & 5.0 for MSN.
Massive numbers of backlinks, however, seemed most important to Yahoo!, whose average number of backlinks was 5.6 million, compared to 1.9 million for Google & 900K for MSN.
The highest match score for a single search was a 7.7, for “Order Digital Prints Online” at Google, while the highest quality score came from the Yahoo! search for “George Plimpton”.
MSN’s average quality score slightly edged out Yahoo!’s average, 5.4 to 5.3.
Yahoo! returned results from the largest sites: an average of 807K pages per site, compared to Google’s 474K and MSN’s 58K.
Results for averages were often widely skewed by sites with exceptional numbers of links & pages. Normalized results will appear here once these are complete to allow for a more balanced analysis.
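To illustrate how a single site with an exceptional link count can skew an average, here is a minimal sketch comparing the mean with the median. The backlink counts are made-up values for illustration only, not survey data, and the median is just one possible way to get a more balanced figure; it is not necessarily the normalization that will be used here.

```python
import statistics


def summarize(values):
    """Return the mean and median of a list of counts."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Hypothetical backlink counts for ten results; the last site is an outlier.
backlinks = [1200, 3400, 5600, 8000, 15000, 22000, 40000, 65000, 90000, 50_000_000]

summary = summarize(backlinks)
print(f"mean:   {summary['mean']:,.0f}")
print(f"median: {summary['median']:,.0f}")
```

In this made-up set the mean lands in the millions because of one very large site, while the median stays close to what a typical result looks like.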