The accumulation of all those improvements lengthened Google’s lead over its competitors, and the circle of early adopters who first discovered Google was eventually joined by the masses, building a dominant market share. Even Google’s toughest competitors had to admit that Brin and Page had built something special. “In the search engine business, Google blew away the early innovators, just blew them away,” says Bill Gates. “And the remains of those people will be long forgotten.”
One of PageRank’s glories (and its original advantage over AltaVista) was its resistance to spam. (The term in this sense meant not unwanted email but links on its results pages that secured undeservedly high rankings by somehow tricking the system.) But as Google became the first place that millions of people looked for information on shopping, medical concerns, their friends, and themselves, the stakes were raised.
The engineer who found himself at the center of the company’s spam efforts was an inveterately social twenty-eight-year-old Kentuckian named Matt Cutts. In the summer of 1999, he was pursuing a doctorate at the University of North Carolina when he got stuck with his thesis and on a whim called Google asking what it paid engineers. He got a response saying that it didn’t reveal such information until it was actually negotiating with job candidates. Cutts went back to his thesis, but a couple of days later, he got another message: “Would you like to be in active negotiation?” Clearly, he’d been Googled. After some phone screens, he flew out to California, getting a taste of the company’s frugality when Google put him up in one of the funky clapboard motels on El Camino Real. Visiting the Google headquarters, he was taken aback by the scene: people working at haphazardly placed sawhorse desks and the director of engineering, Urs Hölzle, playing a high-tech game of fetch with his huge dog, making the floppy beast chase the beam of a laser pointer. In the whirl of interviews, Cutts would remember one question: “How’s your UNIX kung fu?” (UNIX being a popular operating system used in many of Google’s operations.) “My UNIX kung fu is strong,” Cutts replied, deadpan.
He got the job, though his fiancée wouldn’t move to California unless they married immediately. After a courthouse wedding and a Caribbean honeymoon, bride and groom drove across the country to Cutts’s new job in January 2000, where he sat in a cubicle outside Larry and Sergey’s office. Eventually he found himself in an office with Amit Singhal, Ben Gomes, and Krishna Bharat. It was like entering the high temple of search.
Cutts’s first job was helping to create a product called SafeSearch, which would allow people to block pornography from search results. Getting rid of unwanted porn was always a priority for Google. Its first attempt was to construct a list of five hundred or so nasty words. But in 2000, Google signed a contract with a provider that wanted to offer its customers a family-safe version of search. It needed to step up its game. Brin and Page asked Cutts how he felt about porn. He’d have to see a lot of it to produce a system to filter it out of Google.
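The word-list approach can be sketched in a few lines. This is a minimal illustration, not Google’s actual filter; the blocklist terms below are placeholder stand-ins, since the real list of roughly five hundred words and its scoring logic were never published.

```python
# Crude first-pass filter of the kind described: flag a page when its
# text contains any term from a fixed blocklist. The terms below are
# hypothetical stand-ins for the real list's roughly five hundred words.

BLOCKLIST = {"badword1", "badword2", "badword3"}  # placeholder terms

def looks_unsafe(page_text):
    """Return True if any blocklisted term appears as a whole word."""
    words = set(page_text.lower().split())
    return not BLOCKLIST.isdisjoint(words)

assert looks_unsafe("a page that mentions badword2 prominently")
assert not looks_unsafe("an innocent page about candles")
```

Exact word matching like this both over-blocks (innocent uses of flagged words) and under-blocks (misspellings, images, pages with no incriminating text), which is one reason a bare word list could not carry SafeSearch on its own.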
Cutts asked his colleagues to help him locate adult websites so he could extract signals to better identify and block them, but everyone was too busy. “No one will help me look for porn!” he complained to his wife one night. She volunteered to bake chocolate chip cookies for Cutts to award to Googlers who found porn sites that slipped through Cutts’s blockade. At the time, Google was updating the index once a month, and before the new version was released, Cutts would host a Look for Porn Day, bringing in his spouse’s confections. “She’s still known as the porn cookie lady at Google,” he says.
The major porn sites were fine with the process; they knew it was bad for them when searchers unintentionally stumbled upon their warehouses of sin, making them a target for muckrakers and publicity-seeking legislators. But not all such sites were good citizens. Cutts noticed that one nasty site used some clever methods to game Google’s blocking system and score high in search results. “It was an eye-opening moment,” says Cutts. “PageRank and link analysis may be spam-resistant, but nothing is spam-proof.”
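The link analysis Cutts refers to can be made concrete with a toy PageRank computation. This is a minimal sketch of the published PageRank idea (power iteration with a damping factor), not Google’s production system; the graph and page names are invented for illustration. It also shows the limit Cutts describes: adding a farm of pages that all link to one target inflates that target’s score.

```python
# Toy PageRank via power iteration, using the 0.85 damping factor from
# the original Brin/Page description. All pages and scores are invented.

def pagerank(graph, damping=0.85, iterations=50):
    """graph: dict mapping each page to the list of pages it links to."""
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in graph.items():
            if not outlinks:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

# A small honest web: A and B cite each other, C cites A.
web = {"A": ["B"], "B": ["A"], "C": ["A"]}
honest = pagerank(web)

# The same web plus a "link farm": ten spam pages all pointing at C.
farmed = dict(web, **{f"spam{i}": ["C"] for i in range(10)})
boosted = pagerank(farmed)

assert boosted["C"] > honest["C"]  # the farm inflates C's score
```

The honest ranking rewards genuinely cited pages, which is the spam resistance; the final assertion is the loophole: cheap manufactured links still move the needle, which is exactly what link farms exploited.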
The problem went far beyond porn. Google had won its audience in part because it had been effective in eliminating search spam. But now that Google was the dominant means of finding things on the Internet, a high ranking for a given keyword could drive millions of dollars of business to a site. Sites were now spending time, energy, and technical wizardry to deconstruct Google’s processes and artificially boost their rankings. The practice was called search engine optimization, or SEO. You could see their handiwork when you typed in the name of a hotel. The website of the actual hotel would not appear on the first page. Instead, the top results would be dominated by companies specializing in hotel bookings. This made Google less useful. Cutts went to Wayne Rosing and told him that the company really needed to work on stopping spam. Rosing told him to go ahead and try.
A delicate balance was required. Legitimate businesses as well as shady ones partook in the sport. Highly paid consultants tried to reverse-engineer PageRank and other Google techniques. Even amateurs could partake in the hunt for “Google juice,” buying books like Search Engine Optimization for Dummies. The conjurers of this field would gather several times a year at conferences, with hotel ballrooms packed to the gills with webmasters and consultants.
Google maintained that certain SEO methods—such as making sure that the subject matter of the page was reflected in the title and convincing webmasters of popular websites to put links to your site when relevant—were good for the web in general. This raised the question: if a website had to hire outside help to improve its rankings, wasn’t that a failure of Google, whose job it was to find the best results for its users, no matter how the information was formatted or who linked to it?
“Ideally, no one would need to learn SEO at all,” Cutts says. “But the fact is that it exists and people will be trying to promote themselves, so you want to be a part of the conversation and say, ‘Here are some good ethical things to do. Here are some things that are very high risk. Stay away from them.’” Cutts would admit that because not everyone has SEO expertise, sometimes Google underranks worthy sites. One famous example was the query “Eika Kerzen”: the name of a German candle manufacturer (Kerzen is the German word for “candles”) whose site ranked shamefully low for keywords that should have unearthed its excellent products. The matter was dumped on Amit Singhal, who launched an algorithmic revamp of the threshold by which Google translated part of a query into another language, a solution that resolved a whole category of such troublesome results.
A perpetual arms race was waged between Google’s search quality algorithms and companies attacking the system for gain. For several years, Google implemented spam-fighting changes in its monthly index update. It generally aligned those updates to the lunar cycle. “Whenever the full moon was about to appear, people would start jonesing for a Google update,” says Cutts. The SEO community would nervously await changes that could potentially knock its links down the relevance chain. As soon as the new values were reflected in the scores, the SEO crowd would try to divine the logic behind the new algorithms and devise responses so the downgraded links could reclaim their previous rankings. This interaction was dubbed “the Google dance.” (Things got more complicated after the BART project switched index updates from batch-processed to incremental.)
Often the changes in ranking were slight and there were measures available to restore a link to former glory. But other times Google would identify behavior that it judged an attempt to exploit vulnerabilities in its ranking system and would adjust the system to shore up those weaknesses—relegating those using that method to the bottom of the results pile. Generally, the places that got such treatment had no business showing up in the upper reaches of results for popular keywords: they sneakily worked their way up by creating Potemkin villages full of “link farms” designed to pump up a site’s PageRank. Nonetheless, companies whose sites were downgraded in that manner were often outraged. “It’s not like we’ve put all our eggs in one basket,” said
the president of an SEO company called WebGuerrilla to CNET in October 2002, “it’s just that there’s no other basket.” That was the month that a company called SearchKing sued Google after a bad night at the Google dance lowered its PageRank score from 8 to 4 and its business tanked. (In May 2003, a judge dismissed the suit, on the grounds that PageRank is essentially an opinion about a website—albeit an opinion expressed by algorithms—and thus was constitutionally protected.)
Cutts understood that the obscurity of the process could sour people on the company and took it upon himself to be the company’s conduit to the SEO world. Using the pseudonym “GoogleGuy,” Cutts would answer questions and try as best he could to dispel various conspiracy theories, many of them centered on the suspicion that a sure way to rise in search rankings was to buy ads from Google. But there was only so much he could tell. In large part because of the threat from spammers—as well as fear that the knowledge could benefit competitors—Google treated its search algorithms with utmost confidentiality. Over the years Cutts’s spam team grew considerably (as was typical for Google, Cutts wouldn’t specify the number). “I’m proud to say that web spam is much lower than it was a few years ago,” he says.
But Google’s approach had its cost. As the company gained a dominant market share in search—more than 70 percent in the United States, higher in some other countries—critics would be increasingly uncomfortable with the idea that they had to take Google’s word that it wasn’t manipulating its algorithm for business or competitive purposes. To defend itself, Google would characteristically invoke logic: any variance from the best possible results for its searchers would make the product less useful and drive people away, it argued. But it withheld the data that would prove that it was playing fair. Google was ultimately betting on maintaining the public trust. If you didn’t trust Google, how could you trust the world it presented in its results?
3
“If you’ve Googled it, you’ve researched it, and otherwise you haven’t.”
To get a sense of how far Google search advanced in the first six or seven years of the company, one could look through the eyes of Udi Manber.
Manber had watched it all happen, from the outside. He was born in the town of Kiryat Haim, north of Haifa in Israel. He spent so much time in the small library there that he knew nearly every volume in the collection. Manber loved telling visitors to the library which books they might enjoy and which ones might answer their questions. He studied information retrieval and eventually wound up at Yahoo where he brokered the Google deal, until he quit in disgust in 2002. His next job was as the leader of A9, a search start-up funded by Jeff Bezos. In February 2006, he accepted an offer from Google to become the czar of search engineering. It was like someone who worked on space science all his life finally arriving at NASA. “Suddenly I’m in charge of everybody asking questions in the whole world,” he says. “I thought I had a reasonable idea of the main problems facing search—what was minor and major. When I got here, I saw they solved many of the minor problems and made more headway on the major problems than I thought possible. Google hadn’t just said, ‘Here’s the state of the art, here’s what the textbooks say, let’s do it,’ they developed things from scratch and did it better.”
He was also amazed at how pampered employees were. Every search engineer had exclusive use of a set of servers that stored an index of the entire web—it was the digital equivalent of giving a physicist her own particle accelerator.
One of the first things that happened on Manber’s watch was something called Universal Search. In its first few years, Google had developed a number of specialized forms of search, known as verticals, for various corpuses—such as video, images, shopping catalogs, and locations (maps). Krishna Bharat had created one of those verticals called Google News, a virtual wire service with a front page determined not by editors but by algorithms. Another vertical product, called Google Scholar, accessed academic journals. But to access those verticals, users had to choose the vertical. Page and Brin were pushing for a system where one search would find everything.
The key engineer in this project was David Bailey, who had worked with Manber at A9. Bailey was a Berkeley computer science PhD who had once worried that by following his interests—artificial intelligence and the way computers dealt with natural language—he was locking himself in a field with few practical applications. “I figured that no one is ever going to employ someone who’s got a PhD in those things because everybody knows that no computer application worth its salt would deal with plain English text.” That was before Google, which he joined in 2004.
At Google, he had the luxury to figure out what he wanted to do. He found himself in an office with Amit Singhal, Matt Cutts, and Ben Gomes (who’d been his buddy in grad school)—“definitely the cool kids’ office,” he says—and was bowled over by the rich conversations. He needed all the expertise he could find when he was assigned the task of augmenting Google search so that the results page included not only web results but hits from pictures, books, videos, and other sources. If Google really cared about “organizing and making accessible the world’s information,” as it continually boasted (to the point of arrogance, it seemed), it really had to expand its ten blue links beyond web pages. But the challenges were considerable, and several attempts at executing that vision had flopped. “It had become the project of death,” says Bailey.
Nonetheless, Bailey took on the task. He gathered together a team that included a bright product manager named Johanna Wright. Even though Universal Search was something that Larry Page had been urging for years, there was a lot of resistance. “There was definitely a momentum-gathering phase,” says Wright, “and finally there was a point where everyone wanted to work on the project, and it all came together.”
A big challenge in Universal Search was how to determine the relative value of information when it came from different places. Google had gotten pretty good at figuring out how to rank websites for a given query, and it had also learned a lot about ordering the corpus of pictures or video results to satisfy search requests. Every corpus had a different mix of signals. (Everything on the web, of course, had the benefit of linking information, but things such as videos did not have an equivalent.)
For Universal Search, though, Google had to figure out the relative weight to assign to different sets of signals. It became known as the apples-and-oranges problem. The answer, as with many things in Google, lay in determining context from the data in its logs—specifically in analyzing the long clicks in the past. “We have a lot of signals that tell us the intent of the queries,” says Wright. “There could be information in the query that tells us a news result is really relevant and extremely important, and then we’d put it on top of the page.” But clearly the solution involved decoding the intent of a query. In some cases, it turned out that Google’s signals in a given area weren’t effective enough. “It became an opportunity for us to revisit the rankings on those,” says Bailey. Eventually, they got to the point where Google, he says, “transformed the ranking problem to be apples to apples.”
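The shape of the apples-and-oranges problem can be illustrated with a sketch. Everything here is hypothetical: the raw scores, the “long-click rate” numbers, and the blending formula are invented for illustration and are not Google’s actual signals or math. The idea shown is only that scores from different corpora live on different scales, so one plausible approach is to normalize within each vertical and then weight by how often that vertical has historically satisfied similar queries.

```python
# Hypothetical illustration of the apples-and-oranges problem: raw
# ranking scores from different verticals are not comparable, so we
# normalize within each vertical and weight by a (made-up) historical
# long-click rate before merging. None of this is Google's real system.

# Made-up raw scores, each on its own corpus-specific scale.
web_results = [("hotel-site.example", 0.92), ("booking.example", 0.88)]
news_results = [("hotel-strike-story", 14.2)]

# Made-up calibration: fraction of past clicks on each vertical (for
# queries like this one) that were "long clicks," i.e. satisfied users.
long_click_rate = {"web": 0.61, "news": 0.34}

def blend(intent_rates, **verticals):
    merged = []
    for name, results in verticals.items():
        top_raw = max(score for _, score in results)
        for item, score in results:
            # Normalize within the vertical, then weight by how often
            # this vertical satisfied similar queries in the past.
            merged.append((item, (score / top_raw) * intent_rates[name]))
    return sorted(merged, key=lambda pair: pair[1], reverse=True)

ranking = blend(long_click_rate, web=web_results, news=news_results)
assert ranking[0][0] == "hotel-site.example"  # web wins for this intent
```

For a query where the news vertical had historically earned more long clicks, the same formula would push the news story to the top—the point being that intent signals, not raw scores, decide the blend.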
A knottier problem turned out to be how to show these results on the page. Although Google could figure out that certain results—a video clip, a book, a picture, or a scholarly article—might be relevant to a request, the fact was that users mainly expected web links to dominate the results page.
When the Universal Search team showed a prototype to Google’s top executives, everyone realized that taking on the project of death had been worth it. The results in that early attempt were all in the wrong order, but the reaction was visceral—you typed in a word, and all this stuff came out. It had just never happened before. “It definitely was one of the riskier things,” says Bailey. “It was hard, because it’s not just science—there are some judgment calls involved here. We are to some degree using our gut. I still get up in the morning and am astonished that this whole thing even works.”
Google’s search now wasn’t just searching the web. It was searching everything.
In his 1991 book, Mirror Worlds, Yale computer scientist David Gelernter sketched out a future where humans would interact, and transact, with modeled digital representations of the real world. Gelernter described these doppelgänger realities as “a true-to-life mirror image trapped inside a computer.” He made it a point to distinguish his vision from the trendy sci-fi sensation of the moment, virtual reality—fantasy simulations inside the computer as opposed to a digital companion of the physical world. “The whole point of a mirror world is that it’s wired in real time and place—it’s supposed to mirror reality rather than being a parallel reality or cyberworld,” he once said. But though Gelernter looked on the overall prospect of mirror worlds with enthusiasm, he worried as well. “I definitely feel ambivalent about mirror worlds. There are obvious risks of surveillance, but I think it poses deeper risks,” he said. His main concern was that mirror worlds would be steered by the geeky corporations who built them, as opposed to the public. “These risks should be confronted by society at large, not by techno-nerds,” he said. “I don’t trust them. They are not broad-minded and don’t know enough. They don’t know enough history, they don’t have enough of a feel for the nature of society. I think that’s a recipe for disaster.”
But like it or not, Google, the ultimate techno-nerd corporation, was building a mirror world. For many practical purposes, information not stored in the vast Google indexes, which contained, among other things, all the pages of the publicly available web, may as well not have existed. “I’d like to get it to a state that people think of it as ‘If you’ve Googled it, you’ve researched it, and otherwise you haven’t, and that’s it,’” says Sergey Brin.