It was Amit Patel who first realized the value of Google’s logs. Patel was one of Google’s very first hires, arriving in early 1999 as a part-timer still working on his Stanford CS PhD. He had been studying programming language theory but realized he didn’t much like the subject. (Unlike his bosses, though, he would complete his degree.) Google seemed more fun, and fun was important for Patel, a cherub-faced lover of games and distractions whose business card reads “Troublemaker.” One of his first projects at Google turned out to be more significant than anyone expected. “Go find out how many people are using Google, who’s using it, and what they’re doing with it,” he was told.
The task appealed to Patel, who was only beginning to learn about search engines and data analysis. He realized that Google could be a broad sensor of human behavior. For instance, he noticed that homework questions spiked on weekends. “People would wait until Sunday night to do their homework, and then they’d look up things on Google,” he says. Also, by tracking what queries Google saw the most, you could get a glimpse in real time of what the world was interested in. (A few years later, Patel would be instrumental in constructing the Google Zeitgeist, an annual summation of the most popular search subjects that Google would release to the public at the end of the year.)
But the information that users provided to Google went far beyond the subject matter of their queries. Google had the capacity to capture everything people did on the site in its logs, a digital trail of activities whose retention could provide a key to future innovations. Every aspect of user behavior had a value: how many queries there were, how long they were, what the top words in queries were, how users punctuated, how often they clicked on the first result, who had referred them to Google, where they were geographically. “Just basic knowledge,” he recalls.
Those logs told stories. Not only when or how people used Google but what kind of people the users were and how they thought. Patel came to realize that the logs could make Google smarter, and he shared log information with search engineers such as Jeff Dean and Krishna Bharat, who were keenly interested in improving search quality.
To that point, Google had not been methodical about storing the information that told it who its users were and what they were doing. “In those days the data was stored on disks which were failing very often, and those machines were often repurposed for something else,” says Patel. One day, to Patel’s horror, one of the engineers pointed to three machines and announced that he needed them for his project and was going to reformat the disks, which at that point contained thousands of query logs. Patel began working on systems that would transfer these data to a safe place. As Google began to evolve a division of labor, it eventually mandated that at least one person be working on the web server, one on the index, and one on the logs.
Some years earlier, an artificial intelligence researcher named Douglas Lenat had begun Cyc, an incredibly ambitious effort to teach computers all the commonsense knowledge understood by every human. Lenat hired students to painstakingly type in an endless stream of even the most mundane truisms: a house is a building … people live in houses … houses have front doors … houses have back doors … houses have bedrooms and a kitchen … if you light a fire in a house, it could burn down—millions of pieces of information that a computer could draw upon so that when it came time to analyze a statement that mentioned a house, the computer could make proper inferences. The project never did produce a computer that could process information as well as a four-year-old child.
But the information Google began gathering was far more voluminous, and the company received it for free. Google came to see that instant feedback as the basis of an artificial intelligence learning mechanism. “Doug Lenat did his thing by hiring these people and training them to write things down in a certain way,” says Peter Norvig, who joined Google as director of machine learning in 2001. “We did it by saying ‘Let’s take things that people are doing naturally.’”
On the most basic level, Google could see how satisfied users were. To paraphrase Tolstoy, happy users were all the same. The best sign of their happiness was the “long click”—this occurred when someone went to a search result, ideally the top one, and did not return. That meant Google had successfully fulfilled the query. But unhappy users were unhappy in their own ways. Most telling were the “short clicks” where a user followed a link and immediately returned to try again. “If people type something and then go and change their query, you could tell they aren’t happy,” says Patel. “If they go to the next page of results, it’s a sign they’re not happy. You can use those signs that someone’s not happy with what we gave them to go back and study those cases and find places to improve search.”
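To make the distinction concrete, here is a minimal sketch in Python (the log record format and the thirty-second threshold are invented for illustration; Google’s real pipeline is vastly more elaborate) of how a log processor might separate long clicks from short ones:

```python
from dataclasses import dataclass

# Hypothetical log record for one click on a search result.
@dataclass
class Click:
    query: str            # what the user typed
    position: int         # rank of the clicked result (1 = top)
    dwell_seconds: float  # time spent on the result page
    returned: bool        # did the user come back to the results?

def classify(click: Click, threshold: float = 30.0) -> str:
    """Label a click as a satisfaction signal. A 'long click' (the
    user never came back, or stayed away a long while) suggests the
    query was answered; a quick bounce back suggests it was not.
    The threshold is an illustrative assumption."""
    if not click.returned or click.dwell_seconds >= threshold:
        return "long_click"   # happy: the result held the user
    return "short_click"      # unhappy: back to try again

def unhappy_queries(clicks: list[Click]) -> dict[str, int]:
    """Count short clicks per query; the worst offenders are the
    cases worth studying to improve search."""
    counts: dict[str, int] = {}
    for c in clicks:
        if classify(c) == "short_click":
            counts[c.query] = counts.get(c.query, 0) + 1
    return counts
```

Aggregated over millions of sessions, the queries that pile up short clicks become exactly the cases Patel describes engineers going back to study.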
Those logs were tutorials on human knowledge. Google’s search engine slowly built up enough knowledge that the engineers could confidently allow it to choose when to swap out one word for another. What helped make this possible were earlier improvements in Google’s infrastructure, including the techniques that Jeff Dean and Sanjay Ghemawat had developed to compress data so that Google could put its index into computer memory instead of on hard disks. That was a case where a technical engineering project meant to speed up search queries enabled a totally different kind of innovation. “One of the big deals about the in-memory index is that it made it much more feasible to take a three-word query and say, ‘I want to look at the data for fifteen synonymous words, because they’re all kind of related,’” says Dean. “You could never afford to do that on a disk-based system, because you’d have to do fifteen disk seeks instead of three, and it would blow up your serving costs tremendously. An in-memory index made for much more aggressive exploration of synonyms and those kinds of things.”
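Dean’s point about seeks can be sketched in a few lines. In the toy example below (the data structures are illustrative, not Google’s; how the synonym mappings were learned is described next), expanding each query term into its synonyms costs only a few extra dictionary reads once the postings lists live in RAM, where on disk every extra candidate term would be another seek:

```python
# Toy in-memory index: term -> set of document ids (a postings list).
index: dict[str, set[int]] = {
    "picture": {1, 3, 5, 7},
    "dog":     {1, 2, 5},
    "puppy":   {2, 3},
    "canine":  {5, 7},
}

# Toy synonym table, of the kind mined from user behavior.
synonyms: dict[str, list[str]] = {"dog": ["puppy", "canine"]}

def expanded_lookup(query_terms: list[str]) -> set[int]:
    """Match each term OR any of its synonyms, then intersect across
    terms. In RAM the expansion is a few extra dictionary reads; on a
    disk-based index each extra candidate would be another seek."""
    result: set[int] | None = None
    for term in query_terms:
        candidates = [term] + synonyms.get(term, [])
        docs = set().union(*(index.get(t, set()) for t in candidates))
        result = docs if result is None else result & docs
    return result if result is not None else set()

print(expanded_lookup(["picture", "dog"]))  # {1, 3, 5, 7}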
“We discovered a very early nifty thing,” says search engineer Amit Singhal, who worked hard on synonyms. “People change words in their queries. So someone would say, ‘Pictures of dogs,’ and then they’ll say ‘Pictures of puppies.’ That said that maybe dogs and puppies were interchangeable. We also learned that when you boil water, it’s hot water. We were learning semantics from humans, and that was a great advance.”
Similarly, by analyzing how people retraced their steps after a misspelling, Google devised its own spell checker. It built that knowledge into the system; if you typed a word inaccurately, Google would give you the right results anyway.
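A rough sketch of the underlying idea, assuming a simplified session format: scan consecutive queries within a session, and when exactly one word changes, record the swap. Counted at scale, near-identical swaps look like spelling corrections, while swaps between dissimilar strings look like synonym candidates.

```python
from collections import Counter
from itertools import pairwise  # Python 3.10+

def substitution_pairs(session: list[str]) -> list[tuple[str, str]]:
    """Compare consecutive queries in one user session; if exactly one
    word changed, return the (old, new) pair as a rewrite candidate."""
    pairs = []
    for q1, q2 in pairwise(session):
        w1, w2 = q1.split(), q2.split()
        if len(w1) == len(w2):
            diffs = [(a, b) for a, b in zip(w1, w2) if a != b]
            if len(diffs) == 1:
                pairs.append(diffs[0])
    return pairs

# Counted across millions of sessions, near-identical swaps look like
# spelling fixes and dissimilar ones look like synonym candidates.
sessions = [
    ["pictures of dogs", "pictures of puppies"],
    ["how to recieve mail", "how to receive mail"],
]
counts = Counter(p for s in sessions for p in substitution_pairs(s))
print(counts.most_common())
# [(('dogs', 'puppies'), 1), (('recieve', 'receive'), 1)]
```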
But there were obstacles. Google’s synonym system came to understand that a dog was similar to a puppy and that boiling water was hot. But its engineers also discovered that the search engine considered that a hot dog was the same as a boiling puppy. The problem was fixed, Singhal says, by a breakthrough late in 2002 that utilized Ludwig Wittgenstein’s theories on how words are defined by context. As Google crawled and archived billions of documents and web pages, it analyzed which words were close to each other. “Hot dog” would be found in searches that also contained “bread” and “mustard” and “baseball games”—not “puppies with roasting fur.” Eventually the knowledge base of Google understood what to do with a query involving hot dogs—and millions of other words. “Today, if you type ‘Gandhi bio,’ we know that ‘bio’ means ‘biography,’” says Singhal. “And if you type ‘bio warfare,’ it means ‘biological.’”
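The context idea can be illustrated with a toy co-occurrence profile (the corpus and stopword list here are invented): two phrases are interchangeable only if the words found near them overlap.

```python
from collections import Counter

# Toy corpus standing in for billions of crawled pages.
docs = [
    "hot dog with mustard on fresh bread at the baseball game",
    "hot dog stand sells mustard and buns",
    "the puppy chased its tail in the yard",
    "a cute puppy with soft fur",
]
STOPWORDS = {"the", "a", "with", "and", "at", "on", "in", "its"}

def context_profile(phrase: str) -> Counter:
    """Count the content words appearing near a phrase; phrases with
    similar neighbor profiles tend to mean similar things."""
    profile: Counter = Counter()
    banned = set(phrase.split()) | STOPWORDS
    for doc in docs:
        if phrase in doc:
            profile.update(w for w in doc.split() if w not in banned)
    return profile

shared = set(context_profile("hot dog")) & set(context_profile("puppy"))
print(sorted(shared) or "no shared context")  # "no shared context":
# hot dogs live near mustard and baseball, never near fur and tails,
# so the system learns not to substitute "puppy" into that query.
```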
Over the years, Google would make the data in its logs the key to evolving its search engine. It would also draw on those data in virtually every other product the company developed. It would not only take note of user behavior in its released products but measure such behavior in countless experiments to test out new ideas and various improvements. The more Google’s system learned, the more new signals could be built into the search engine to better determine relevance.
Sergey Brin had written the original part of the Google search engine that dealt with relevance. At that point it was largely based on PageRank, but as early as 2000 Amit Singhal realized that as time went on, more and more interpretive signals would be added, making PageRank a diminishing factor in determining results. (Indeed, by 2009, Google would say it made use of more than two hundred signals—though the real number was almost certainly much more—including synonyms, geographic signals, freshness signals, and even a signal for websites selling pizzas.) The code badly needed a rewrite; Singhal couldn’t even stand to read the code that Brin had produced. “I just wrote new,” he says.
Singhal completed a version of the new code in two months and by January 2001 was testing it. Over the next few months, Google exposed it to a percentage of its users and liked the results. They were happier. Sometime that summer, Google flipped the switch and became a different, more accurate service. In accordance with the company’s fanatical secrecy on such matters, it made no announcement. Five years later, Singhal’s work was acknowledged when he was named a Google Fellow, an honor that came with an undisclosed prize almost certainly in the millions of dollars. There was a press release announcing that Singhal had received the award, but it did not specify the reason.
Google’s search engines would thereafter undergo major transformations every two or three years, with similar stealth. “It’s like changing the engines on a plane flying a thousand kilometers an hour, thirty thousand feet above the earth,” says Singhal. “You have to do it so the passengers don’t feel that something just happened. And in my time, we have replaced our propellers with turboprops and our turboprops with jet engines. The passengers don’t notice, but the ride is more comfortable and the people get there faster.”
In between the major rewrites, Google’s search quality teams constantly produced incremental improvements. “We’re looking at queries all the time and we find failures and say, ‘Why, why, why?’” says Singhal, who himself became involved in a perpetual quest to locate poor results that might have indicated bigger problems in the algorithm. He got into the habit of sampling the logs kept by Google on its users’ behavior and extracting random queries. When testing a new version of the search engine, his experimentation intensified. He would compile a list of tens of thousands of queries, simultaneously running them on the current version of Google search and the proposed revision. The secondary benefit of such a test was that it often detected a pattern of failure in certain queries.
As best as he could remember, that was how the vexing query of Audrey Fino came into Amit Singhal’s life.
It seemed so simple: someone had typed “Audrey Fino” into Google and was unhappy with the result. It was easy for Singhal to see why. The results for that query were dominated by pages in Italian gushing about the charms of the Belgian-born actress Audrey Hepburn. This did not seem to be what the user was looking for. “We realized that this was a person’s name,” says Singhal. “There’s a person somewhere named Audrey Fino, and we didn’t have the smarts in the system to know this.” What’s more, he realized that it was a symptom of a larger failure that required algorithmic therapy. As good as Google was, the search engine stumbled with names.
This spurred a multiyear effort by Singhal and his team to produce a name detection system within the search engine. Names were important. Only 8 percent of Google’s queries were names—and half of those celebrities—but the more obscure name queries were cases where users had specific, important needs (including “vanity searches” where people Googled themselves, a ridiculously common practice). So how would you devise new signals to more skillfully identify names in queries and dig them out of the web corpus? Singhal and his colleagues began where they almost always did: with data. To improve search, Google often integrated external databases, and in this case it licensed the White Pages, giving it all the information contained in hundreds of thick newsprint tomes whose content consisted of nothing but names (and addresses and phone numbers). Google’s search engine sucked up the names and analyzed them until it had an understanding of what a name was and how to recognize one in the system.
But the solution was trickier than that. One had to take context into account. Consider the query “houston baker.” Was the user looking for a person who baked bread in Texas? Probably. But if you were making that query very far from the Lone Star State, it was more likely that you were seeking someone named after the famous Texan. Google had to teach its search engine to tell the difference. And a lot of the instruction was done by the users themselves, clicking millions of times and steering the results toward the happy zone of long clicks.
“This is all just learning,” says Singhal. “We had a computer learning algorithm on which we built our name classifier.”
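Google has never published that classifier, but its flavor can be suggested with a crude, entirely hypothetical sketch: score a two-word query as a name using word lists (of the sort the White Pages licensing would supply) and demote the name reading when a place-word interpretation fits the user’s location. The word lists, features, and weights below are invented for illustration.

```python
# Entirely hypothetical sketch of a name classifier of this flavor.
FIRST_NAMES = {"audrey", "houston", "mike"}  # e.g., mined from the White Pages
LAST_NAMES  = {"fino", "baker", "siwek"}
PLACE_WORDS = {"houston", "michigan"}

def name_score(words: list[str], user_near_place: bool) -> float:
    """Score how strongly a two-word query reads as '<first> <last>'.
    Context matters: 'houston baker' typed far from Texas is more
    likely a person than a Houston bread-baker."""
    if len(words) != 2:
        return 0.0
    first, last = words[0].lower(), words[1].lower()
    score = (0.5 if first in FIRST_NAMES else 0.0) + \
            (0.5 if last in LAST_NAMES else 0.0)
    # The place reading competes with the name reading only when the
    # user is actually near that place.
    if first in PLACE_WORDS and user_near_place:
        score -= 0.4
    return score

print(name_score(["houston", "baker"], user_near_place=True))   # 0.6
print(name_score(["houston", "baker"], user_near_place=False))  # 1.0
```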
Within a few months Singhal’s team built the system to make use of that information and properly parse name queries. One day not long after that, Singhal typed in the troublesome query once more. This time, rising above the pages gushing about the gamine who starred in Roman Holiday, there was a link providing information about an attorney who was, at least for a time, based in Malta: Ms. Audrey Fino.
“So now we can recognize names and do the right thing when one comes up,” says Singhal five years after the quest. “And our name recognition system is now far better than when I invented it, and is better than anything else out there, no matter what anyone says.”
One day in 2009, he showed a visitor how well it worked, also illuminating other secrets of the search engine. He opened his laptop and typed in a query: “mike siwek lawyer mi.”
He jabbed at the enter key. In a time span best measured in beats of a hummingbird’s wing, ten results appeared. These were the familiar “ten blue links” of Google search. (The actual links to the pages cited as results were highlighted in blue text.) Early in Google’s history, Page and Brin had decided that ten links was the proper number to show on a page, and numerous tests over the years had reinforced the conviction that ten was the number users preferred to see. In this case, the top result was a link to the home page of an attorney named Michael Siwek in Grand Rapids, Michigan. This success came as a result of the efforts put into motion by the Audrey Fino problem. The key to understanding a query like this, Singhal said, was the black art of “bigram breakage”: that is, how should a search engine parse a series of words entered into the query field, making the kind of distinctions that a smart human being would make?
For instance, “New York” represents two words that go together (in other words, a bigram). But so do the three words in “New York Times,” which clearly indicate a different kind of search. And everything changes when the query is “New York Times Square,” in which case the breakage would come … well, you know where.
“Deconstruct this [Siwek] query from an engineer’s point of view,” says Singhal. “Most search engines I have known in my academic life will go ‘one word, two words, three words, four words, done.’ We at Google say, ‘Aha! We can break this here!’ We figure that ‘lawyer’ is not a last name and ‘Siwek’ is not a middle name,” he says. “And by the way, lawyer is not a town in Michigan. A lawyer is an attorney.”
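One way to picture bigram breakage (this is a generic segmentation sketch with made-up phrase scores, not Google’s algorithm) is as a search over split points, where statistics gathered from the web corpus and the logs decide which words deserve to stay glued together:

```python
# Generic segmentation sketch with made-up phrase scores; real systems
# derive such statistics from the web corpus and the query logs.
PHRASE_SCORE = {
    ("new", "york"): 8.0,
    ("new", "york", "times"): 9.0,
    ("times", "square"): 7.0,
}

def best_segmentation(words: tuple[str, ...]) -> tuple[float, list]:
    """Try every split point for the first phrase, recursing on the
    rest; the highest-scoring combination wins."""
    if not words:
        return 0.0, []
    best_score, best_segs = float("-inf"), []
    for i in range(1, len(words) + 1):
        phrase = words[:i]
        score = PHRASE_SCORE.get(phrase, 0.1)  # unknown phrases score low
        rest_score, rest_segs = best_segmentation(words[i:])
        if score + rest_score > best_score:
            best_score, best_segs = score + rest_score, [phrase] + rest_segs
    return best_score, best_segs

print(best_segmentation(("new", "york", "times", "square")))
# (15.0, [('new', 'york'), ('times', 'square')])
```

With phrase statistics like these, “new york times square” breaks after “york,” exactly where a smart human would break it.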
This was the hard-won view from inside the Google search engine: a rock is a rock. It’s also a stone, and it could be a boulder. Spell it rokc, and it’s still a rock. But put “little” in front of “rock,” and it’s the capital of Arkansas. Which is not an “ark.” Unless “Noah” is around.
All this helped to explain how Google could find someone whose name may have never appeared in a search before. (One-third of all search queries are virgin requests.) “Mike Siwek is some person with almost no Internet presence,” says Singhal. “Finding that needle in that haystack, it just happened.”
Amit Singhal turned forty in 2008. The search team celebrated with a party in his honor. As one might expect, it was a joyous celebration. Certainly there was much to celebrate besides a birthday. Consider that these were geeky mathematicians who in an earlier era would have been writing obscure papers and scraping by on an academic’s salary. Now their work directly benefited hundreds of millions of people, and they had in some way changed the world. Plus, many of them owned stock options that had made them very wealthy.
Just before the dinner was to commence, Singhal’s boss handed a phone to him. “Someone wants to talk to you,” he said.
A female voice that Singhal did not recognize congratulated him on his milestone. “I’m sorry,” he said. “Do I know you? Did we overlap academically?”
“Oh, I’m an academic,” she said. “But we didn’t overlap.”
“Did I influence your work, or did you influence my work?”
“Well,” the woman said, “I think I influenced your work.”
Singhal was at a loss.
“I’m Audrey Fino,” she said.
Actually, she was not Audrey Fino. Singhal’s boss had hired an actress to portray the woman. The Google search engine had been able to locate the digital trail of Audrey Fino, but could not produce the actual person. That sort of magic would have to wait until later.
The secret history of Google was punctuated by similar advances, a legacy of breaking ground in computer science and keeping its corporate mouth shut. The heroes of Google search were heroes at Google but nowhere else. In every one of the four aspects of search—crawling, indexing, relevance, and speedy delivery of results—Google made advances. Search quality specialists such as Amit Singhal were like the quarterbacks and wide receivers on a football team: the eye-popping results of their ranking efforts got the lion’s share of attention. But those results relied on collecting as much information as possible. Google called this “comprehensiveness” and had a team of around three hundred engineers making sure that the indexes captured everything. “Ideally what we want to have is sort of a true mirror of the web,” says a Google engineering VP. “We want to have a copy of every document that’s out there, or as many as we can possibly get; we want our copy to be as close to that original as possible, both in time and in terms of representation; and then we want to organize that in such a way that it’s easy and efficient to serve, and ultimately to rank.”
Google did all it could to access those pages. If a web page required users to fill out a form to see certain content, Google had probably taught its spiders how to fill out the form. Sometimes content was locked inside programs that ran when users visited a page—applications written in the JavaScript language or media programs like Adobe’s Flash. Google knew how to look inside those programs and suck out the content for its indexes. Google even used optical character recognition to figure out whether an image on a website had text on it.