The trouble with such “exceptional responders,” as Solit called them, was that they had traditionally been ignored, brushed off as random variations, attributed to errors in diagnosis or ascribed, simply, to extraordinary good fortune. The catchphrase attached to these case histories carried the stamp of ultimate scientific damnation: single patient anecdotes (of all words, scientists find anecdote particularly poisonous, since it connotes a subjective memory rather than an objective observation). Medical journals had long refused to publish these reports. At scientific conferences, when such cases were described, researchers generally rolled their eyes and avoided the topic. When the trials ended, these responders were formally annotated as “outliers,” and the drug was quietly binned.
But Solit wanted to understand these rare responses. These “exceptional responders,” he reasoned, might have some peculiar combination of factors—genes, behaviors, risk factors, environmental exposures—that had made them respond so briskly and durably. He decided to use the latest medical tools to understand their responses as deeply and comprehensively as possible. He had inverted a paradigm: rather than spending an enormous effort trying to figure out why a drug had commonly failed, as most of his colleagues might have, he would try to understand why it had occasionally succeeded. He would try to map the landscape of the valley of death—not by querying all those who had fallen into it, but by asking the one or two patients who had clambered out.
In 2012, Solit’s team published the first analysis of one such trial. Forty-four patients with advanced bladder cancer had been treated with a new drug called everolimus. The results had been uniformly disappointing. Some tumors may have shrunk a little, but none of the patients had shown a striking response. Then, in mid-April 2010, there was patient 45—a seventy-three-year-old woman with tumors filling her entire abdomen and invading her kidneys and lymph nodes. She started the medicine that month. Within weeks, her tumors had begun to involute. The mass invading the kidney necrosed and disappeared. Fifteen months later, when her CAT scans were checked again, her doctors had to squint hard to see any visible signs of tumor in her abdomen.
Solit focused on just that case. Reasoning that genes were likely involved, he pulled patient 45’s tumor sample out of the freezer and sequenced every gene to find the ones that were mutated (in most human cancers, between 10 and 150 genes can be mutated). The woman’s tumor had 140 mutations. Of all those, two stood out: one in a gene named TSC1 and another in a gene named NF2. Both of these genes had been suspected to modulate the response to everolimus, but before Solit, no one had found formal proof of the link in human patients.
But this was still a “single patient anecdote”; scientists would still roll their eyes. Solit’s team now returned to the original trial and sequenced the same genes in the larger cohort of patients. A pattern emerged immediately. Four other patients who had mutations in the TSC1 gene had shown modest responses, while none of the other patients, whose tumors carried mutations in other genes but not in TSC1, had shown even a sliver of a response. Via just one variable—the mutation in the TSC1 gene—you could segregate the trial into moderate or strong responders versus nonresponders. “Single patient anecdotes are often dismissed,” Solit wrote. But here, exactly such an anecdote had turned out to be a portal to a new scientific direction. In a future trial, a cohort of patients might be sequenced up front, and only those with mutations in the TSC1 gene might be treated with the drug. Perhaps more important, the relationship between the gene and the susceptibility of the tumor cells opened a new series of scientific investigations into the mechanism of this selective vulnerability, leading in turn to new trials and novel drugs.
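To make that single-variable segregation concrete, here is a minimal sketch in Python (my own illustration, not Solit’s analysis; the patient records and numbers are hypothetical) of how a trial cohort might be split on TSC1 status and the response rates compared:

```python
# Hypothetical trial records: each patient has a TSC1 mutation status
# and a recorded response to everolimus.
patients = [
    {"id": 45, "tsc1_mutant": True,  "response": "complete"},
    {"id": 12, "tsc1_mutant": True,  "response": "partial"},
    {"id": 7,  "tsc1_mutant": False, "response": "none"},
    # ... the remaining patients in the cohort
]

# Segregate the cohort on the single variable: TSC1 status.
mutant = [p for p in patients if p["tsc1_mutant"]]
wild_type = [p for p in patients if not p["tsc1_mutant"]]

def response_rate(group):
    """Fraction of a group showing any tumor response."""
    responders = [p for p in group if p["response"] != "none"]
    return len(responders) / len(group) if group else 0.0

print(f"TSC1-mutant response rate:    {response_rate(mutant):.0%}")
print(f"TSC1-wild-type response rate: {response_rate(wild_type):.0%}")
```

In the actual study, this split was the generative step: the mutation predicted the response, and the prediction could then be tested prospectively.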
But is it a law of medicine that such outliers will provide the most informative pieces of data in our attempt to revamp the core of our discipline? In Lewis Thomas’s time, such a law would have made no sense: there was nothing to “outlie.” The range of medical and surgical interventions was so severely limited that any assessment of variations in response was useless; if every patient with heart failure was destined to die, then it made little sense discriminating one from another (and even if some survived long term, no tools existed to investigate them). But this is precisely what has changed: pieces of data that do not fit our current models of illness have become especially important not only because we are reassessing the nature of our knowledge, but also because we are generating more such pieces of data every day. Think about the vast range of medicines and surgical procedures not as therapeutic interventions but as investigational probes. Think of every drug as a chemical tool—a molecular scalpel—that perturbs human physiology. Aspirin flicks off a switch in the inflammatory system. Lipitor tightens a screw on cholesterol metabolism. The more such investigational probes we use, the more likely we are to alter physiology. And the more we alter physiology, the more we will find variations in response, and thereby discover its hidden, inner logic.
....
One morning in the spring of 2015, I led a group of medical students at Columbia University on what I called “outlier rounds.” We were hunting for variant responses to wound healing. Most patients with surgical incisions heal their wounds in a week. But what about the few patients whose wounds don’t heal? We moved from room to room across the hospital, trying to find cases where postsurgical wounds had failed to heal. Most of these were predictable—elderly patients with complex surgical incisions, or diabetics, who are known to heal poorly. But after about nine such cases, we entered the room of a young woman recovering from an abdominal procedure whose incision was still raw and unhealed. The students looked puzzled. Nothing about this woman, or her incision, seemed any different from the hundreds of others that had healed perfectly. After a long pause they began to ask questions. One of them asked about her family history: Had anyone else in her family had a similar experience? Another wondered if he might swab the tissue to check for unusual, indolent infections. The orthodox models of wound healing were coming apart at the seams, I suspected, and a novel way of thinking about an old problem was being born.
We have spent much of our time in medicine dissecting and understanding what we might call the “inlier” problem. By “inliers,” I am referring to the range of normalcy; we have compiled a vast catalog of normal physiological parameters: blood pressure, height, body mass, metabolic rate. Even pathological states are described in terms that have been borrowed from normalcy: there is an average diabetic, a typical case of heart failure, and a standard responder to cancer chemotherapy.
But we have little understanding of what makes an individual lie outside the normal range. “Inliers” allow us to create rules—but “outliers” act as portals to understand deeper laws. The standard formula—average weight in kilograms equals height in centimeters minus 100, give or take 10 percent—is a rule that works for most of the human population. But it takes a single encounter with a person with genetic dwarfism to know that there are genes that control this relationship and that mutations can disrupt it quite acutely.
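A worked instance of that rule (my own illustration, using an arbitrary height):

```latex
% Rule of thumb from the text: average weight (kg) = height (cm) - 100,
% give or take 10 percent. For a person 170 cm tall:
\[
  170 - 100 = 70\ \text{kg},
  \qquad
  70 \pm 10\% \approx 63\text{--}77\ \text{kg}.
\]
```

A person with genetic dwarfism falls far outside this band, which is precisely what makes the outlier informative.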
In his 1934 book, The Logic of Scientific Discovery, the philosopher Karl Popper proposed a crucial criterion for distinguishing a scientific system from an unscientific one. The fundamental feature of a scientific system, Popper argued, is not that its propositions are verifiable but that they are falsifiable—that is, every theory carries within it an inherent possibility of being proven false. A theory or proposition can be judged “scientific” only if it generates a prediction or observation that could prove it false. Theories that fail to generate such “falsifiable” conjectures are not scientific. If medicine is to become a bona fide science, then we will have to take up every opportunity to falsify its models, so that they can be replaced by new ones.
....
LAW THREE
* * *
For every perfect medical experiment, there is a perfect human bias.
In the summer of 2003, I finished my three-year residency in internal medicine and began a fellowship in oncology. It was an exhilarating time. The Human Genome Project had laid the foundation for the new science of genomics—the study of the entire genome. Although frequent criticism of the project appeared in the media—it had not lived up to its promises, some complained—it was nothing short of a windfall for cancer biology. Cancer is a genetic disease, an illness caused by mutations in genes. Until that time, most scientists had examined cancer cells one gene at a time. With the advent of new technologies to examine thousands of genes in parallel, the true complexity of cancers was becoming evident. The human genome has about twenty-four thousand genes in total. In some cancers, up to a hundred and twenty genes were altered—one in every two hundred genes—while in others, only two or three genes were mutated. (Why do some cancers carry such complexity, while others are genetically simpler? Even the questions—not just the answers—thrown up by the genome-sequencing project were unexpected.)
More important, the capacity to examine thousands of genes in parallel, without making any presuppositions about which genes might be mutant, allowed researchers to find novel, previously unknown genetic associations with cancer. Some of the newly discovered mutations in cancer were truly unexpected: the genes did not control growth directly, but affected the metabolism of nutrients or chemical modifications of DNA. The transformation has been likened to the difference between measuring a single point in space and surveying an entire landscape—but it was more than that. Looking at cancer before genome sequencing meant looking at the known unknown. With genome sequencing at hand, it was like encountering the unknown unknown.
Much of the excitement around the discovery of these genes was driven by the idea that they could open new vistas for cancer treatment. If cancer cells depended on the mutant genes for their survival or growth—“addicted” to the mutations, as biologists liked to describe it—then targeting these addictions with specific molecules might force cancer cells to die. The battle-ax chemical poisons of cellular growth would become obsolete at last. The most spectacular example of such a drug, Gleevec, used for a variant of leukemia, had galvanized the entire field. I still recall the first patient whom I treated with Gleevec: Mr. K, a fifty-six-year-old man whose bone marrow had been so eaten by leukemia that he had virtually no platelets left and would bleed profusely from every biopsy that we performed. A fellow had to meet him in the exam room with a brick-size pack of sterile gauze pads and press on his biopsy site for half an hour to prevent bleeding. About four weeks after he started treatment with Gleevec, it was my turn to perform his biopsy. I came prepared with the requisite armfuls of gauze, dreading the half-hour ordeal—except when I withdrew the needle, the wound stopped bleeding by itself. Through that nick of the skin, its edges furling with a normal-looking clot, I could see the birth of a revolution in cancer treatment.
Around the first week of my fellowship, I learned that another such drug, a molecular cousin of Gleevec’s, was being tested in our hospital for a different form of cancer. The drug had shown promising effects in animal models and in early human experiments—and an early trial was forging ahead with human patients.
I had inherited a group of patients on the trial from a former fellow who had graduated from the program. Even a cursory examination of the trial patients on my roster indicated a spectacular response rate. One woman with massive tumors in her belly found the masses melting away within a few weeks. Another patient had a dramatic reduction in pain from his metastases. The other fellows, too, were witnessing similarly dramatic responses in their patients. We spoke reverentially about the drug, its striking response rate, and how it might change the landscape for the treatment of cancer.
Yet six months later, the overall results of the study were surprisingly disappointing. Far from the 70 or 80 percent response rates that we had been expecting from our own rosters, the overall rate was an abysmal 15 percent. The discrepancy made no sense at first, but the reason behind it became evident over the next few weeks, when we looked more deeply at the data. The oncology fellowship runs for three years, and every graduating batch of fellows passes some patients from his or her roster on to the new batch and assigns the rest to the more experienced attending physicians in the hospital. Whether a patient gets passed on to a fellow or an attending doctor is a personal decision. The only injunction is that a patient who gets reassigned to a new fellow must be a case of “educational value.”
In fact, every patient moved to the new fellows was a drug responder, while all patients shunted to the attending physicians were nonresponders. Concerned that the new fellows would be unable to handle the more complex medical needs of men and women with no drug response—patients with the most treatment-resistant, recalcitrant variants of the disease—the graduating fellows had moved all the nonresponding patients to more experienced attending physicians. The assignment had no premeditated bias, yet the simple desire to help patients had sharply distorted the experiment.
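A minimal simulation in Python (my own sketch, not drawn from the trial’s data; the cohort size and routing rule are assumptions based on the account above) makes the distortion vivid: a true response rate of 15 percent looks spectacular to the fellows and dismal to the attendings once responders are preferentially routed one way.

```python
import random

random.seed(0)
TRUE_RESPONSE_RATE = 0.15  # the overall rate the trial eventually reported

# Simulate a cohort: True means the patient responded to the drug.
cohort = [random.random() < TRUE_RESPONSE_RATE for _ in range(400)]

# The routing rule described above: responders go to the new fellows,
# nonresponders to the attendings. (In reality the split was not quite
# this clean, which is why we saw 70 to 80 percent rather than 100.)
fellows_roster = [p for p in cohort if p]
attendings_roster = [p for p in cohort if not p]

def rate(roster):
    """Response rate observed within one roster."""
    return sum(roster) / len(roster) if roster else 0.0

print(f"True cohort response rate:    {rate(cohort):.0%}")
print(f"Rate seen by the new fellows: {rate(fellows_roster):.0%}")
print(f"Rate seen by the attendings:  {rate(attendings_roster):.0%}")
```

No one falsified anything; the bias entered through a benign administrative decision, which is what made it so hard to see.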
....
Every science suffers from human biases. Even as we train massive machines to collect, store, and manipulate data for us, humans are the final observers, interpreters, and arbiters of that data. In medicine, the biases are particularly acute for two reasons. The first is hope: we want our medicines to work. Hope is a beautiful thing in medicine—its most tender center—but it is also the most dangerous. Few stories involving the mix of hope and illusion in medicine are more tragic, or more drawn out, than that of the radical mastectomy.
By the early 1900s, during the brisk efflorescence of modern surgery, surgeons had devised meticulous operations to remove malignant tumors from the breast. Many women with cancer were cured by these surgical “extirpations”—yet, despite surgery, some women still relapsed with metastasis all over their bodies. This postsurgical relapse preoccupied great surgical minds. In Baltimore, the furiously productive surgeon William Halsted argued that malignant tissue left behind during the original surgery caused this relapse. He described breast-cancer surgery as an “unclean operation.” Scattered scraps of tumor left behind by the surgeon, he argued, were the reason for the metastatic spread.
Halsted’s hypothesis was logically coherent—but incorrect. For most women with breast cancer, the real reason for postsurgical relapse was not the local outgrowth of remnant scraps of malignant tissue. Rather, the cancer had migrated out of the breast long before surgery. Cancer cells, contrary to Halsted’s expectations, did not circle in orderly metastatic parabolas around the original tumor; their spread through the body was more capricious and unpredictable. But Halsted was haunted by the “unclean operation.” To test his theory of the local spread of cancer, he amputated not just the breast, but a vast mass of underlying tissue, including the muscles that move the arm and the shoulders and the deep lymph nodes in the chest, all in an effort to “cleanse” the site of the operation.
Halsted called the procedure a radical mastectomy, using the word radical in its original meaning from the Latin word for “root”; his aggressive mastectomy was meant to pull cancer out by its roots from the body. In time, though, the word itself would metastasize in meaning and transform into one of the most inscrutable sources of bias. Halsted’s students—and women with breast cancer—came to think of the word radical in its second meaning: “brazen, innovative, bold.” What surgeon or woman, faced with a lethal, relapsing disease, would choose the nonradical mastectomy? Untested and uncontested, a theory became a law: no surgeon was willing to run a trial for a surgical operation that he knew would work. Halsted’s proposition ossified into surgical doctrine. Cutting more had to translate into curing more.
Yet women relapsed—not occasionally, either, but in large numbers. In the 1940s, a small band of insurgent surgeons—most prominently Geoffrey Keynes in London—tried to challenge the core logic of the radical mastectomy, but to little avail. In 1980, nearly eight decades after Halsted’s first operation, a randomized trial comparing radical mastectomy with a more conservative surgery was formally launched. (Bernie Fisher, the surgeon leading the trial, wrote, “In God we trust. All others must bring data.”) Even that trial barely limped to its conclusion. Captivated by the logic and bravura of radical surgery, American surgeons were so reluctant to put the procedure to the test that enrollment in the control arm faltered. Surgeons from Canada and other nations had to be persuaded to help complete the study.
The results were strikingly negative. Women with the radical procedure suffered a host of debilitating complications, but received no benefits: their chance of relapsing with metastatic disease was identical to that of women treated with more conservative surgery, coupled with local radiation. Breast cancer patients had been ground in the crucible of radical surgery for no real reason. The result was so destabilizing to the field that the trial was revisited in the 1990s, and again in 2000; more than two decades later, there was still no difference in outcome. It is hard to measure the full breadth of its effects, but roughly one hundred thousand to five hundred thousand women were treated with radical mastectomies between 1900 and 1985. The procedure is rarely, if ever, performed today.
....
In retrospect, the sources of bias in radical surgery are easy to spot: a powerful surgeon obsessed with innovation, a word that mutated in meaning, a generation of women forced to trust a physician’s commands, and a culture of perfection that was often resistant to criticism. But other sources of bias in medicine are far more difficult to identify because they are more subtle. Unlike in virtually any of the other sciences, in medicine the subject—i.e., the patient—is not passive, but an active participant in an experiment. In the atomic world, Heisenberg’s uncertainty principle holds that the position and momentum of a particle cannot simultaneously be measured with absolute accuracy. If you send a wave of light to measure the position of a particle, Heisenberg reasoned, then the wave’s hitting the particle changes its momentum, and thereby its position, and so forth ad infinitum; you cannot measure both with absolute certainty. Medicine has its own version of “Heisenbergian” uncertainty: when you enroll a patient in a study, you inevitably alter the nature of the patient’s psyche and, therefore, alter the study. The device used to measure the subject transforms the nature of the subject.
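For reference, the formal statement of Heisenberg’s principle (standard physics notation, not from the text) sets a hard lower bound on how precisely position and momentum can be known at once:

```latex
% Heisenberg's uncertainty relation: the product of the uncertainties
% in position (x) and momentum (p) can never fall below hbar/2,
% where hbar is the reduced Planck constant.
\[
  \Delta x \, \Delta p \;\geq\; \frac{\hbar}{2}
\]
```

Medicine’s analogue has no such tidy constant, but the structure of the problem is the same: the act of measurement is itself a perturbation.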