Monday, November 28, 2011

Indexing a web page

After a web page is crawled, the next important step is to index its content. The indexed page is stored in a giant database on servers around the world, from which it can later be retrieved. Essentially, indexing means identifying the key words and expressions that best describe a web page and assigning the page to those keywords. No human could process such amounts of information, but search engines like Google generally handle this task just fine. Sometimes they might get the meaning of a web page wrong, but if you help them by optimizing it, it will be easier for them to classify your pages correctly and for you to get a higher ranking.
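To make the idea of "assigning the page to keywords" a bit more concrete, here is a toy Python sketch of an inverted index, a structure that maps each word to the pages that contain it. It is only an illustration under simplifying assumptions: every word counts as a keyword, there is no stop-word filtering or stemming, and the pages (with made-up URLs) sit in a plain dictionary. A real search engine's index is far more elaborate and spread across many servers.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Simplifying assumption: lowercase the text and keep runs of letters/digits as words
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of page URLs that contain it (a toy inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

# Hypothetical pages, URL -> page text
pages = {
    "example.com/seo": "SEO tips for better search ranking",
    "example.com/cats": "Photos of cats and more cats",
}
index = build_index(pages)
print(sorted(index["cats"]))  # ['example.com/cats']
```

Looking up a search word is then a single dictionary access instead of a scan over every stored page, which is why indexing is done ahead of time.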

When a search request comes in, the search engine processes it and compares the search string with the pages indexed in its database. Since more than one page is likely to contain the search string, the search engine then uses its algorithms to calculate how relevant each of those pages is to the search string. There are various algorithms for calculating the relevancy of content or a web site. Each of them gives different relative weights to common factors like keyword density, internal and external links, the page description and so on. That is why different search engines return different results pages for the same search string, each trying to deliver the best and most relevant content.
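Here is a rough Python sketch of how different weights on the same factors can produce different rankings. The three factors, the way each is scaled to a 0 to 1 range, the two weight tables and the sample pages are all hypothetical choices for illustration; no real search engine publishes its formula, and real ones use many more signals.

```python
# Two hypothetical "engines" that weight the same factors differently
DENSITY_FIRST = {"keyword_density": 0.8, "inbound_links": 0.1, "description_match": 0.1}
LINKS_FIRST = {"keyword_density": 0.2, "inbound_links": 0.5, "description_match": 0.3}

def keyword_density(words, term):
    """Share of the page's words that are exactly the search term."""
    return words.count(term) / len(words) if words else 0.0

def relevancy(page, term, weights):
    """Weighted sum of a few illustrative ranking factors, each scaled to [0, 1]."""
    term = term.lower()
    words = page["text"].lower().split()
    links = page["inbound_links"]
    factors = {
        "keyword_density": keyword_density(words, term),
        # Squash the raw link count into [0, 1); the constant 10 is arbitrary
        "inbound_links": links / (links + 10),
        "description_match": 1.0 if term in page["description"].lower() else 0.0,
    }
    return sum(weights[name] * value for name, value in factors.items())

# Hypothetical pages: one stuffed with the keyword, one with many inbound links
page_a = {"text": "cheap flights cheap hotels cheap", "inbound_links": 0, "description": "Budget travel"}
page_b = {"text": "flights and hotels cheap", "inbound_links": 90, "description": "Cheap flights"}

for name, weights in [("density-first", DENSITY_FIRST), ("links-first", LINKS_FIRST)]:
    a = relevancy(page_a, "cheap", weights)
    b = relevancy(page_b, "cheap", weights)
    print(name, "ranks", "page_a" if a > b else "page_b", "first")
# density-first ranks page_a first
# links-first ranks page_b first
```

Only the weight table differs between the two runs, yet a different page comes out on top, which is the effect described above.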

What is more, it is well known that all major search engines, like Google or Yahoo, periodically change their algorithms (some as often as every month), and if you want to stay at the top, you also need to adapt your pages to the latest changes in these important algorithms. This is one reason (the other is your competitors) to devote ongoing effort to SEO if you'd like to be at the top of the search results. The last step in a search engine's work is retrieving and displaying the results. Basically, this is nothing more than showing them in the browser: the endless pages of search results, sorted from the most relevant to the least relevant web sites.
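That final step, sorting by relevancy and splitting the list into result pages, can be pictured with this small Python sketch. The relevancy scores are made-up numbers, and the default of 10 results per page is just an assumption.

```python
def results_pages(scores, per_page=10):
    """Sort URLs from most to least relevant and split them into result pages."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [ranked[i:i + per_page] for i in range(0, len(ranked), per_page)]

# Hypothetical relevancy scores, URL -> score
scores = {"a.com": 0.42, "b.com": 0.91, "c.com": 0.10}
print(results_pages(scores, per_page=2))  # [['b.com', 'a.com'], ['c.com']]
```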
