“Google it.” The phrase is everywhere, but do you really know how Google’s search engine works? How does it decide which sites are listed first, and which will help you find the information you’re looking for? In a new section titled “How Search Works”, Google walks interested users through how billions of searches happen each day. The interactive infographic lets users learn about the search process and how Google deals with spam and other useless pages.
According to SearchEngineLand, the new area was inspired by Google’s The Story Of Send, an interactive infographic that Google released last year to explain how it handles email.
The interactive infographic has three parts: crawling & indexing, algorithms and fighting spam.
Section 1: Crawling and Indexing
Google uses software called “web crawlers” to discover publicly available webpages. The best-known crawler is “Googlebot”. A crawler, sometimes also called a “spider”, performs the crawling process, by which Googlebot discovers new and updated pages to add to the Google index. It looks at webpages and follows the links on those pages, much as you would if you were browsing content on the web. The crawlers go from link to link and bring data about those webpages back to Google’s servers. The software pays special attention to new sites, changes to existing sites and dead links.
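To make the link-following idea concrete, here is a minimal sketch of a breadth-first crawl. It is not Googlebot's implementation: the `TOY_WEB` dictionary, the example URLs and the `crawl` function are all made up for illustration, and the "web" is an in-memory dictionary rather than real pages fetched over HTTP.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A real crawler would fetch pages over the network; this keeps the sketch offline.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["b.example/gone", "a.example/about"],
}

def crawl(seed, web):
    """Follow links breadth-first from seed.

    Returns (pages discovered, in visit order; links that point nowhere).
    """
    seen = {seed}
    queue = deque([seed])
    discovered = []
    dead_links = []
    while queue:
        url = queue.popleft()
        if url not in web:
            # The link exists on some page, but the target is missing.
            dead_links.append(url)
            continue
        discovered.append(url)
        for link in web[url]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return discovered, dead_links

pages, dead = crawl("a.example/", TOY_WEB)
print(pages)
print(dead)
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other, as the two `a.example` pages do here.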
The internet can be compared to a limitless and ever-growing library. Google gathers pages during the crawl process and creates an index, so it knows exactly how to look things up. Much like the index in the back of a book, the Google index includes information about words and their locations. When you search, at the most basic level, Google’s algorithms look up your search terms in the index to find the appropriate pages.
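The book-index comparison maps onto a data structure usually called an inverted index. The sketch below is a toy version under simple assumptions: pages are plain strings split on whitespace, and the page contents, URLs and function names (`build_index`, `lookup`) are invented for the example. Google's real index is far more elaborate.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the pages it appears on and its positions there."""
    index = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            index[word][url].append(position)
    return index

def lookup(index, query):
    """Return the URLs that contain every term in the query, sorted."""
    terms = query.lower().split()
    if not terms:
        return []
    # .get() avoids creating empty entries for words that were never indexed.
    matches = set(index.get(terms[0], {}))
    for term in terms[1:]:
        matches &= set(index.get(term, {}))
    return sorted(matches)

# Hypothetical page contents.
PAGES = {
    "a.example/dogs": "dogs are loyal pets",
    "b.example/cats": "cats and dogs can share a home",
}

index = build_index(PAGES)
print(lookup(index, "dogs"))
print(lookup(index, "cats dogs"))
print(dict(index["dogs"]))
```

Looking up a term is a dictionary access rather than a scan of every page, which is the point of building the index ahead of time.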
However, the search process gets a little more complicated from there. When a user searches for “dogs” they don’t necessarily want a page with the word “dogs” on it hundreds of times. They probably want pictures, videos or a list of breeds. Google’s indexing systems note many different aspects of pages, such as when they were published, whether they contain pictures and videos, and much more.
As users explore the interactive infographic, they will notice links and hidden pop-ups that reveal more information when they hover over or click certain areas.
Section 2: Algorithms
Users want answers, not trillions of webpages. Algorithms are computer programs that look for clues to give users back exactly what they want. This section of the infographic also lets users explore different aspects of the process to learn more.
Algorithms are the computer processes and formulas that take your questions and turn them into answers. Today Google’s algorithms rely on more than 200 unique signals or “clues” that make it possible to guess what you might really be looking for. These signals include things like the terms on websites, the freshness of content, your region and PageRank.
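Google does not publish how its signals are combined, so the following is only a toy illustration of the general idea of scoring pages from several clues. The four signals are the ones named above; the weights, the freshness formula, the cap on term matches and the sample pages are all assumptions made up for this sketch.

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    term_matches: int   # how often the search terms appear on the page
    age_days: int       # days since the page was published
    region: str         # region the page is aimed at
    pagerank: float     # link-based importance, assumed to lie in [0, 1]

# Hypothetical weights; real ranking systems are not public.
WEIGHTS = {"terms": 1.0, "freshness": 0.5, "region": 0.75, "pagerank": 2.0}

def score(page, user_region):
    """Combine the signals into a single number (higher is better)."""
    # Cap term matches so repeating a word hundreds of times does not help.
    terms = min(page.term_matches, 5)
    # Newer pages score closer to 1.0, older pages closer to 0.0.
    freshness = 1.0 / (1.0 + page.age_days / 30.0)
    region_match = 1.0 if page.region == user_region else 0.0
    return (WEIGHTS["terms"] * terms
            + WEIGHTS["freshness"] * freshness
            + WEIGHTS["region"] * region_match
            + WEIGHTS["pagerank"] * page.pagerank)

def rank(pages, user_region):
    """Return the pages ordered from best to worst score."""
    return sorted(pages, key=lambda p: score(p, user_region), reverse=True)

candidates = [
    Page("stuffed.example", term_matches=200, age_days=400, region="us", pagerank=0.01),
    Page("breeds.example", term_matches=4, age_days=10, region="us", pagerank=0.6),
]
for page in rank(candidates, user_region="us"):
    print(page.url, round(score(page, "us"), 3))
```

With these made-up numbers, the page that repeats the search term 200 times still ranks below the fresher, better-linked page, because raw term count is only one clue among several.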
Section 3: Fighting Spam
Every day, millions of useless spam pages are created. Google makes a valiant effort to fight spam through a combination of computer algorithms and manual review.
Spam sites attempt to make their way to the top of search results through techniques such as repeating keywords over and over, buying links that pass PageRank or putting invisible text on the screen. This is horrible for search because relevant websites get buried below all of the nonsense, and it’s bad for legitimate website owners because their sites become harder to find.
The good news is that Google’s algorithms can detect the vast majority of spam and demote it automatically. For the rest, Google has teams who manually review sites.
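How Google's spam detection actually works is not described in the infographic. As a rough illustration of how one of the techniques above, keyword repetition, could be flagged automatically, here is a naive check based on how much of a page a single word takes up. The function names, the 50% threshold and the minimum length are arbitrary assumptions for this sketch.

```python
from collections import Counter

def keyword_density(text):
    """Return (most common word, share of all words it accounts for)."""
    words = text.lower().split()
    if not words:
        return "", 0.0
    word, count = Counter(words).most_common(1)[0]
    return word, count / len(words)

def looks_stuffed(text, threshold=0.5, min_words=5):
    """Flag text where one word makes up more than `threshold` of the page.

    Both parameters are illustrative guesses, not values used by any
    real search engine. Very short texts are never flagged.
    """
    if len(text.split()) < min_words:
        return False
    _, density = keyword_density(text)
    return density > threshold

print(looks_stuffed("dogs dogs dogs dogs buy dogs"))
print(looks_stuffed("dogs are loyal pets and good company"))
```

A check this simple would be easy to evade and would miss most of the spam types listed below, which is one reason automated detection is paired with manual review.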
The different types of spam include: cloaking and/or sneaky redirects, hacked sites, hidden text and/or keyword stuffing, parked domains, pure spam, spammy free hosts and dynamic DNS providers, thin content with little or no added value, unnatural links from a site, unnatural links to a site and user-generated spam.
Source: Google










