My research at Stanford mainly involved developing algorithms for Web search. The basic technology behind search engines has existed for many years (since the 1960s). The advent of the World Wide Web greatly changed the scale of the data, requiring much more sophisticated technology, but many of the key building blocks existed previously.

A search engine allows a user to type in a query, which is a set of terms describing what the user is looking for. The search engine must then return a list of relevant web pages. People often ask, "How is it possible for a search engine to search the entire web for my query in a fraction of a second?" The heart of a search engine is the index; that is what allows it to answer your queries so quickly.

There is a simple analogy that everyone is familiar with. If I give you a book about India and ask you to find the passage describing Mumbai, would you flip through the book page by page, looking for a paragraph discussing Mumbai? No, you would immediately turn to the end of the book, where there is an index listing exactly the pages on which the word Mumbai appears. You then need to concern yourself only with those pages when looking for information about Mumbai. A search engine index works in the same way: before you have ever issued a query, the search engine crawls the Web and builds an index that lists, for each word, all of the web pages that contain that word.

Of course, a book has only a few hundred pages, whereas the Web has billions of pages, making the problem much more complex in practice. But at a very high level, the analogy with finding information in a book holds. One of the critical challenges search engines face is that the user wants to see only a few (say ten) results; figuring out which ten results to display for the query Mumbai, out of the millions of pages that discuss Mumbai, is a very difficult problem and is the target of substantial research and development.
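
To make the book-index analogy concrete, here is a minimal sketch in Python of the kind of word-to-pages index described above. It is only an illustration under simplifying assumptions: the function names `build_index` and `search`, the three-page toy corpus, and the whitespace-based word splitting are all hypothetical, and a real search engine handles billions of pages with far more sophisticated machinery.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercased word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the sorted ids of pages that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    # index.get avoids creating empty entries for words the index has never seen.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical toy corpus standing in for crawled web pages.
pages = {
    "page1": "Mumbai is the largest city in India",
    "page2": "Delhi is the capital of India",
    "page3": "Monsoon season in Mumbai",
}

# The index is built once, before any query is issued.
index = build_index(pages)

print(search(index, "Mumbai"))        # ['page1', 'page3']
print(search(index, "Mumbai India"))  # ['page1']
```

Answering a query only touches the entries for the query's own words, which is why the lookup stays fast no matter how many other pages exist. Note that this sketch returns every matching page in arbitrary (here, alphabetical) order; it does nothing about the harder problem of choosing which ten results to show.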