Wednesday, July 09, 2008

Introduction to Google Ranking

7/09/2008 08:50:00 AM Posted by Amit Singhal, Google Fellow

In May, Udi Manber introduced our search quality group, the group responsible for the ranking of search results. He introduced various teams within "Quality" (as we like to call the group) including Core Ranking, International Search, User Interfaces, Evaluation, Webspam, and other teams. In this post, I want to tell you more about one of these: the Core Ranking team.

Let me introduce myself. My name is Amit Singhal. I'm a Google Fellow in charge of the ranking team at Google. I've worked in the field of search for the past eighteen years, having been introduced to search in 1990 as a graduate student in computer science. In the academic world, the field of search is known as Information Retrieval (or IR). After spending a decade as an IR researcher, I came to Google in 2000, and have worked on Google ranking ever since.

Google ranking is a collection of algorithms used to find the most relevant documents for a user query. We do this for hundreds of millions of queries a day, from a collection of billions and billions of pages. These algorithms are run for every query entered into most of Google's search services. While our web search is the most used Google search service and the most widely known, the same ranking algorithms are also used - with some modifications - for other Google search services, including Images, News, YouTube, Maps, Product Search, Book Search, and more.

The most common question I get asked about Google's ranking is "how do you do it?" Of course, there is a lot that goes into building a state-of-the-art ranking system like ours, and I will delve deeper into the technology behind it in a later post. Today, I would like to briefly share the philosophies behind Google ranking:

1) Best locally relevant results served globally.
2) Keep it simple.
3) No manual intervention.

The first one is obvious. Given our passion for search, we absolutely want to make sure that every user query gets the most relevant results. We often call this the "no query left behind" principle. Whenever we return less than ideal results for any query in any language in any country - and we do (search is by no means a solved problem) - we use that as an inspiration for future improvements.

The second principle seems obvious. Isn't it the desire of all system architects to keep their systems simple? Well, as search systems go, given the wide variety of user queries we have to respond to in multiple languages, it is easy to go down the path where more and more complexity creeps into the system to serve the next incremental fraction of the queries. We work very hard to keep our system simple without compromising on the quality of results. This is an ongoing effort, and a worthy one. We make about ten ranking changes every week, and simplicity is a big consideration in launching every change. Our engineers understand exactly why a page was ranked the way it was for a given query. This simple, understandable system has allowed us to innovate quickly, and it shows. The "keep it simple" philosophy has served us well.

No discussion of Google's ranking would be complete without asking the common - but misguided! :) - question: "Does Google manually edit its results?"

Let me just answer that with our third philosophy: no manual intervention. In our view, the web is built by people. You are the ones creating pages and linking to pages. We are using all this human contribution through our algorithms. The final ordering of the results is decided by our algorithms using the contributions of the greater Internet community, not manually by us. We believe that the subjective judgment of any individual is, well ... subjective, and information distilled by our algorithms from the vast amount of human knowledge encoded in the web pages and their links is better than individual subjectivity.

The second reason we have a principle against manually adjusting our results is that often a broken query is just a symptom of a potential improvement to be made to our ranking algorithm. Improving the underlying algorithm not only improves that one query, it improves an entire class of queries, and often for all languages. I should add, however, that there are clear written policies for websites recommended by Google, and we do take action on sites that are in violation of our policies or for a small number of other reasons (e.g. legal requirements, child porn, viruses/malware, etc.).

Stay tuned for my follow-up post, where I will discuss in detail the technologies behind our ranking and show examples of several state-of-the-art ranking techniques in action. Let me just conclude this post by saying that our passion for search is stronger than ever - and as a search researcher, I have the best job in the world :-).