Search Engines Methodology
Having identified the major components of search engines, let us now look at their methodology for producing results.
This look involves two steps.
The first is the way search engines rank pages for query relevance. This has a material impact on the results you see for various keywords.
Second, it is a good idea to know which search engine supplies results to whom. Yes, that's right: not all search sites are original. Some just borrow and tweak the results generated by other sites (under a license, of course) and present them in their own formats. Knowing this helps you understand the structure of the web search industry.
Let's say you are looking for a review of a certain Hollywood flick. What would you do? Go to your favorite search engine (everyone has one) and type the name of the movie. Then, maybe as an afterthought, add the word "review".
This tells the search engine (SE) that you are looking for a review of that particular movie. Based on this, it scans its index for relevant matches. Let's say it finds 20,000 matches for your query. How will it present them to you? Not all of the results will be relevant to you, and some will be more relevant than others, so it has to come up with a priority mechanism that lists the results in some order.
This, then, is the backdrop against which SEs display their proprietary hidden arsenal: the ranking algorithm. Minus the hype, this means a software program that ranks the results according to a certain set of parameters. However, this is where the fun lies. There is a huge number of ways in which an SE can rank web sites for a specified search term. SEs that have devised better ranking algorithms are more successful than their competitors. Therein lies the key factor in Google's popularity.
Most search engines guard the parameters, and the specific weights assigned to them in their algorithms, with great zeal, because access to them would let a webmaster manipulate his rankings for a specified search term (keyword). Generally, however, two sets of parameters shape an SE's rankings:
On page factors
Off page factors
On page factors are the occurrences of the search term in the page's title, description, meta tags, body copy and so on. Two things matter for on page factors: the location and the frequency of the search term. The higher up the page the term appears and the more often it appears, the more relevant the page looks to the search engine. (Oh, did I mention that search engines rank individual pages for your search terms, not entire web sites? To illustrate, for one search term a page from a web site can appear in the top position while another page from the same site shows up at the 1000th rank.)
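To make the location and frequency idea concrete, here is a minimal sketch in Python. It is not any real search engine's formula: the function name `on_page_score`, the equal 0.5 weights and the single-word search term are all assumptions made for illustration.

```python
import re


def on_page_score(page_text: str, term: str) -> float:
    """Toy on page score for a single-word search term.

    Assumption: frequency and location count equally (0.5 each).
    Real engines keep their parameters and weights secret.
    """
    words = re.findall(r"[a-z0-9']+", page_text.lower())
    term = term.lower()
    positions = [i for i, word in enumerate(words) if word == term]
    if not positions:
        return 0.0  # the term never appears on the page
    # Frequency: share of the page's words that match the term.
    frequency = len(positions) / len(words)
    # Location: 1.0 if the term is the very first word, lower further down.
    earliness = 1.0 - positions[0] / len(words)
    return 0.5 * frequency + 0.5 * earliness
```

A page that mentions "review" early and often scores higher for that term than a page that mentions it once at the very end, which is also exactly why this kind of factor is easy to game.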
On page factors are easier to manipulate, hence the widespread temptation to do so. Such manipulation is called spamming: strategies by which webmasters try to mislead SEs about a page's content. As a result, webmasters are sometimes penalized for it.
Off page factors
As a result of widespread spamming in the late nineties, web search through the popular search engines became a very frustrating experience. Sites that had no relevance to the search term started to appear at the top of the search results. This was because SEs considered only on page factors for rankings, and those were often "guided".
Something had to happen to redeem the situation, and it did. Search engines got smarter and took the next step in tweaking their algorithms. They started to include factors that were not directly on the page as part of their ranking methodology. This typically meant factors like links coming to your page from other pages. The more quality incoming links you had (that is, the more people were linking to you, or the more popular you were), the more it suggested to the search engine that you were relevant to a specific search term.
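The simplest version of this idea is just counting incoming links. The sketch below does that and nothing more. It ignores link quality, which real engines weigh heavily, and the function name `incoming_link_counts` and the `(source, target)` pair format are assumptions for illustration.

```python
from collections import Counter


def incoming_link_counts(links):
    """Count distinct incoming links per page.

    links: iterable of (source_page, target_page) pairs (hypothetical format).
    Self-links and repeated links from the same source are ignored.
    """
    counts = Counter()
    seen = set()
    for source, target in links:
        if source == target or (source, target) in seen:
            continue  # a page cannot vote for itself or vote twice
        seen.add((source, target))
        counts[target] += 1
    return counts
```

Calling `incoming_link_counts(links).most_common()` then lists pages from most linked to least linked.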
Pioneered by Google, this innovation has been refined by Teoma, which now classifies web sites as hubs (meta sites which provide lots of links) and resources (sites which have incoming links). As an aside, meta sites are in fact a good place to start a broad topic search, since they offer the advantage of both relevance and coverage.
Another off page factor is click-through measurement. This means that pages which attract more visitors for a specific search term are ranked higher than their less visited competitors.
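As a rough illustration of click-through measurement, the sketch below orders pages by how many clicks they received for one search term. The click log format and the name `rank_by_clicks` are hypothetical. How real engines collect and weigh this data is not public.

```python
from collections import defaultdict


def rank_by_clicks(click_log, term):
    """Order pages by clicks received for one search term.

    click_log: iterable of (search_term, clicked_page) pairs (hypothetical format).
    Ties are broken alphabetically so the order is deterministic.
    """
    clicks = defaultdict(int)
    for logged_term, page in click_log:
        if logged_term == term:
            clicks[page] += 1
    return sorted(clicks, key=lambda page: (-clicks[page], page))
```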
An advantage of including off page factors is that they are more difficult to manipulate. Hence they give search engines more control over their results and give searchers better results.
Now almost all SEs use some combination of off page and on page factors in their algorithms.
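One simple way to picture such a combination is a weighted sum of the two kinds of score. The sketch below assumes both scores have already been scaled to the range 0 to 1. The weights 0.4 and 0.6 and the name `combined_rank` are made up for illustration, since the real weights are exactly what search engines keep secret.

```python
def combined_rank(pages, on_page_weight=0.4, off_page_weight=0.6):
    """Order pages by a weighted sum of on page and off page scores.

    pages: dict mapping page -> (on_page_score, off_page_score),
    each score assumed to lie between 0 and 1. The default weights
    are illustrative assumptions, not real values.
    """
    def total(page):
        on_page, off_page = pages[page]
        return on_page_weight * on_page + off_page_weight * off_page

    return sorted(pages, key=total, reverse=True)
```

With these weights, a keyword-stuffed page with few incoming links drops below a moderately optimized page that many people link to. Setting `off_page_weight` to 0 reproduces the late-nineties situation in which on page factors alone decided the ranking.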
Search Engines: who is behind whom
Major consolidations have happened in the search engine industry. Friends have become competitors. Competitors have shaken hands. SEs of yore have repositioned themselves as portals. The industry as it stands now has narrowed down to a few players who provide the basic search function. The rest have started to package those results in their own formats.
To give an analogy, this is also how the news industry operates. Most newsletters and chronicles are fed breaking news stories by wire services, which they then present in their own formats. Reputed dailies supplement this with reports from their own correspondents in the field. The SE counterpart is mixing their own or borrowed directory results into their final results.
Search results sharing goes hand in hand with advertising revenue sharing, so it is a mutually beneficial arrangement for all. The basic feeders reach a larger audience and hence can command greater ad revenue. The receptors earn ad revenue without actually incurring the costs of running a search service. Overture has been the most outstanding player in the paid listing category. It has the largest network for distributing its paid inclusion results to other companies.
Here is a brief list of feeders and receptors, covering the major players in summary. For more detailed charts and listings, you can check this excellent visual relationship chart by Bruce Clay (PDF) at http://www.bruceclay.com/searchenginechart.pdf.
Alternatively, you can look at the table at searchenginewatch. Here is a snapshot:
Columns: Search Engine (read down), Provider (Google), Provider (Yahoo/Overture), Notes
Google: Main & Paid Open
Yahoo: Main & Paid Yahoo Directory an option
MSN: Main & Paid (12/05 & 6/05)
AOL: Main & Paid Main & Paid AOL-owned (est. 10/05+) (12/05 & 6/05) Open Directory an option
Excite Network: Main & Paid Excite.com (at iWon, MyWay, is InfoSpace-powered My Web Search)
Ask Jeeves: Paid Main from Ask-owned Teoma (until 2007)
InfoSpace: Runs several meta search engines. Dogpile is most popular, representative of others. Google (2006), Yahoo (3/06) & many small providers have distribution deals.
AltaVista: Main & Paid Open Directory an option; owned by Yahoo
AllTheWeb: Main & Paid Owned by Yahoo
HotBot: Paid Main Backup from Google & Ask; Owned by Lycos
Lycos: Paid Backup Main from LookSmart; Open Directory an option
Netscape: Main & Paid 3.40 Owned by AOL; (est. 10/05+) Open Directory an option
Teoma: Paid 2.29 Main from Teoma; owned (Sept 05) by Ask; Paid can end as early as 9/04
This, then, is a summary of the way search engines rank web sites and of the way the search industry operates.