Thursday, March 26, 2020

MICROBE FORECASTING


It was a large city. And it was hit hard. The first cases emerged in late August, and 
the victims suffered terribly. The earliest symptoms were profuse diarrhea and 
vomiting. They experienced severe dehydration, increased heart rate, muscle 
cramps, restlessness, severe thirst, and the loss of skin elasticity. Some of the 
cases progressed to kidney failure, while others led to coma or shock. Many of 
those who came down with the disease died. 
Then on the night of August 31, the outbreak truly broke. Over the next three 
days, 127 people in a single neighborhood died. And by September 10 the number 
of fatalities would reach 500. The epidemic seemed to spare no one. Children and 
adults alike were killed. Few families did not have at least one member who came 
down with the disease. 
The epidemic led to intense panic. Within a week, three-quarters of the 
neighborhood’s residents fled. Stores closed. Homes were locked. And you could 
walk down a formerly bustling urban street without seeing a single person. 
Early in the outbreak, a forty-one-year-old epidemiologist began an investigation to 
determine its source. He consulted community leaders and methodically inter- 
viewed families of the victims and made careful maps of every single case. Fol- 
lowing his hunch about a waterborne disease, he studied the sources of the 
community’s water and determined that it came from only one of two urban water 
utilities. He conducted microscopic and chemical analyses of specimens from the 
water system, which proved inconclusive. 
In his report to the responsible officials, he presented his analysis and con- 
cluded that contaminated water was to blame. Despite the lack of definitive results 
from the analyses, the mapping of cases strongly supported his conclusion that 
one particular water outlet was the source of the outbreak. He recommended shut- 
ting down the water supply, and the officials agreed. And while the outbreak may 
have already been in decline because of the mass exodus, that investigation and
water closure proved pivotal. 


What was unusual about this outbreak was not the procedural investigation that 
followed. Modern epidemiologists in countries throughout the world conduct ex- 
actly this kind of investigation regularly. They enlist the help of local leaders, study 
the distribution of cases, conduct analyses on potential sources, and then often 
argue with officials as to the best course of action. What was unusual was that the 
outbreak was in 1854—before the field of epidemiology existed. 
As you may have guessed, the investigator responsible for cracking the outbreak 
was none other than John Snow, the now famous London physician 
considered one of the founders of contemporary epidemiology. The culprit was, of 
course, the bacterium Vibrio cholerae, the cause of cholera. By finding that water was the source 
rather than “foul air,” Snow contributed to the modern germ theory of infectious 
diseases—that communicable diseases are caused by microbes. To this day, you 
can see a replica of the famous Broad Street pump that Snow identified as the 
source of the 1854 outbreak, in Soho, London. 
It seems intuitive to us today, but the way that Snow used interviews, case iden- 
tification, and mapping to chart the origin of the Broad Street cholera outbreak of 
1854 was revolutionary in its time. While maps had certainly been used extensively 
prior to 1854, the map he made of Soho is considered the first of its kind, not only 
in epidemiology but also in cartography. He was the first to utilize maps to analyze 
geographically related events to make a conclusion about causality—namely, that 
the Broad Street pump was the source of the outbreak. By doing so he has been 
credited with using the first geographic information system, or GIS, a now com- 
monly used cartographic system for capturing and analyzing geographic infor- 
mation. 
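
The logic of Snow’s map can be sketched in a few lines of Python. This is a minimal illustration, not a reconstruction of his data: the pump names other than Broad Street, all coordinates, and the helper names `nearest_pump` and `cases_per_pump` are hypothetical. Each case is assigned to its closest pump by straight-line distance, and the pump that collects the most cases becomes the prime suspect.

```python
import math
from collections import Counter

# Hypothetical pump and case coordinates in arbitrary map units.
# These are illustrative values, not figures from Snow's actual map.
PUMPS = {
    "Broad Street": (0.0, 0.0),
    "Pump B": (5.0, 1.0),
    "Pump C": (-4.0, 3.0),
}
CASES = [(0.5, 0.2), (-0.3, 0.8), (1.0, -0.5), (0.2, 1.1), (4.6, 1.2), (-0.8, -0.4)]


def nearest_pump(case, pumps):
    """Return the name of the pump closest to a case location."""
    return min(pumps, key=lambda name: math.dist(case, pumps[name]))


def cases_per_pump(cases, pumps):
    """Count how many cases lie closest to each pump."""
    return Counter(nearest_pump(case, pumps) for case in cases)


counts = cases_per_pump(CASES, PUMPS)
suspect, _ = counts.most_common(1)[0]
print(counts)
print("Prime suspect:", suspect)
```

Straight-line distance is a simplification. A fuller analysis would weigh the additional layers a contemporary GIS can supply.
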


In contemporary GIS, layers of information are added to maps like Snow’s to 
provide depth of geographic information and to suggest patterns of causality. 
While Snow’s map included streets, homes, locations of illness and water sources, 
a contemporary version could include many more layers: genetic information 
from cholera specimens collected in different locations, a time dimension that 
tracks spatial changes, an added weather layer, or social connections between 
the individuals in the various homes. 
Modern GIS is among a range of contemporary tools that are radically changing 
the way that we investigate outbreaks and understand the transmission of diseases. 
When used in a coordinated and comprehensive way, these tools have the poten- 
tial to fundamentally change the way that we monitor for outbreaks and stop them 
in their tracks. 
We now have multiple scientific and technical advantages that Snow lacked in 
the mid-nineteenth century. Among the most profound is that we have significantly 
improved our capacity to catch the bugs we’re chasing and to document their 
diversity. The revolution in molecular biology, in particular the techniques for 
capturing and sequencing genetic information, has profoundly changed our ability 
to identify the microbes that surround us.

[Figure: The map of London used by John Snow to find the source of the cholera outbreak.] 

Miraculous but now standard techniques like the polymerase chain reaction 
(PCR), which earned its inventor Kary Mullis a Nobel Prize, allow us to 
snip out tiny pieces of genetic information from microbes and create billions of 
identical copies, whose sequences can then be read and sorted out according to 
the family of microbes to which they belong. Yet standard PCR requires that you 
know what you’re looking for. If, for example, we want to find an unknown malaria 
parasite, we can use PCR designed to identify malaria-specific sequences, since all 
malaria parasites have genetic regions that look similar enough to each other. But 
what if we don’t know what we’re looking for?

In the early 2000s, intent on finding unknown microbes, a bright young molec- 
ular biologist, Joe DeRisi, and his colleagues adapted an interesting technique 
developed by DeRisi’s doctoral adviser, Pat Brown, a Stanford biochemist. The 
DNA microarray chip consisted of thousands of tiny bits of distinct artificial genetic 
sequence distributed in an orderly fashion across a small glass slide. Since genetic 
information sticks to its mirror image sequence, if you flush solution from a
specimen containing genetic information across a slide like this, the bits that 
match the designed sequences on the slide will fuse. You can then determine what 
was in the specimen by determining which of the sequences on the slide trapped 
their natural siblings. The technique had already provided thousands of scientists 
with a new way of characterizing the bits of genetic information that flow through 
living systems by the time DeRisi got his hands on it. 
Prior to DeRisi’s innovation, the microarray chips had been used primarily to 
help determine the internal workings of the genes of humans and animals, but De- 
Risi and his colleagues realized that the technique could be modified to create a 
powerful viral detection system. Instead of designing the chips with bits of artificial 
human genetic information, he and his colleagues designed chips with bits of viral 
genetic information. By carefully reviewing the scientific databases for genetic 
information on all of the viruses known to science, they crafted chips that had bits 
of genetic information from a whole range of viral families lined up in neat rows. If 
they introduced genetic information from a sick patient, and it contained a virus 
with a sequence similar to one on the chip, the sequence would be trapped and— 
bingo!—we’d know the bug we were dealing with. 
Viral microarrays, as these specialized chips became known, have proliferated 
and spread to labs throughout the world. They’ve helped quickly identify the micro- 
bial villain responsible for new pandemics, like the coronavirus that causes SARS. 
Yet they are not perfect. These chips can only be made to capture viruses from 
families of viruses already known to science. If there are groups of viruses out 
there whose sequences we are completely unaware of, and there certainly are, then 
we have nothing with which to engineer the chips. Truly unknown viruses would 
slide right by. 
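
The way a chip traps matching sequences, and the way an unknown virus slides by, can be mimicked in a toy Python sketch. Everything here is an assumption made for illustration: the probe sequences are invented, the function names `reverse_complement` and `hybridize` are hypothetical, and exact matching stands in for real hybridization, which tolerates some mismatch.

```python
# Map each DNA base to its pairing partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Invented probe sequences, one per viral family (illustrative only).
PROBES = {
    "coronavirus": "ATGGCTAGCTAG",
    "arenavirus": "CCGTTAGGCATA",
    "influenza": "GGATCCTTAAGC",
}


def reverse_complement(seq):
    """Return the 'mirror image' strand that would stick to seq."""
    return seq.translate(COMPLEMENT)[::-1]


def hybridize(sample_fragments, probes):
    """Return the families whose probe finds its mirror image in the sample."""
    hits = set()
    for family, probe in probes.items():
        target = reverse_complement(probe)
        if any(target in fragment for fragment in sample_fragments):
            hits.add(family)
    return hits


# A specimen holding a fragment that mirrors the coronavirus probe,
# plus a fragment from something the chip was never designed for.
known = "TT" + reverse_complement(PROBES["coronavirus"]) + "GA"
unknown = "ACACACACACACAC"

print(hybridize([known, unknown], PROBES))  # the known family is trapped
print(hybridize([unknown], PROBES))         # the unknown fragment slides by
```
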


Within the past few years, viral microarrays have been supplemented with a series 
of bold new genetic sequencing approaches. New machines churn out mammoth 
amounts of sequence data from specimens—amounts of sequence that previously 
would have been prohibitively expensive or time consuming. These machines are 
permitting an entirely new form of viral discovery. 
Rather than look for particular bits of information, the approach is to take a 
specimen—say a drop of blood—and sequence every bit of genetic information it 
contains. Technically, it’s more complicated than that, but the result is similar to 
what you’d expect. We are approaching a moment when we will be able to read 
every single sequence within a given biological specimen. Every bit of DNA or RNA 
from the host specimen, and critically, every bit from the microbes that are riding 
along with them. 
One of the central problems becomes the bioinformatics—how to sort through 
all of the billions of bits of information that are produced by these incredible tech- 
nologies. Fortunately, in an enlightened move, scientists at the NIH picked up and 
nurtured an electronic repository of sequencing information developed at the 
famed Los Alamos National Laboratory and now called GenBank. Since scientists 
are required by funding sources and journals to submit sequences to GenBank 
prior to submitting academic papers, we collectively contribute billions of bits of 
genetic information each year. GenBank right now holds over a hundred billion bits 
of sequence information. And it’s growing rapidly. When a new sequence is iden- 
tified from a sequencing run, it can be rapidly compared electronically to what’s in 
GenBank to see if there’s a match. 
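
The comparison step can be pictured with a small Python sketch. It is not how GenBank searches are actually run. The reference sequences are invented, the names `build_index` and `classify` are hypothetical, and matching on short exact substrings (k-mers) is a simplifying assumption. Reads that match nothing in the reference are the interesting ones, since they may belong to something new.

```python
from collections import Counter

K = 8  # length of the substrings used for matching (an arbitrary choice)

# Invented stand-in for a sequence database.
REFERENCE = {
    "human": "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
    "known_virus_X": "TTGACCGGTAACCGTTAGCATCGGA",
}


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(reference, k):
    """Map every k-mer to the reference entries that contain it."""
    index = {}
    for name, seq in reference.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)
    return index


def classify(read, index, k, min_fraction=0.5):
    """Return the best-matching reference name, or None if nothing matches well."""
    read_kmers = kmers(read, k)
    votes = Counter()
    for kmer in read_kmers:
        for name in index.get(kmer, ()):
            votes[name] += 1
    if not votes:
        return None
    name, count = votes.most_common(1)[0]
    return name if count / len(read_kmers) >= min_fraction else None


index = build_index(REFERENCE, K)
reads = [
    REFERENCE["human"][3:17],          # a read from the host
    REFERENCE["known_virus_X"][5:20],  # a read from a catalogued virus
    "CACACACACATCTCTC",                # a read matching nothing on file
]
labels = [classify(read, index, K) for read in reads]
print(labels)
```
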
In late 2006 and early 2007 these techniques were used to good effect. In early 
December 2006 the organs of a patient who had died of a brain hemorrhage in 
Dandenong hospital in Australia were harvested for transplantation. A sixty-three- 
year-old grandmother received one of the kidneys, another unnamed recipient re- 
ceived the other kidney, and a sixty-four-year-old lecturer in a local university re- 
ceived the man’s liver. By early January all three had died. 
The local hospital and collaborating labs looked for all of the usual suspects. 
They utilized PCR and tried to grow up the microbe on culture media. They even 
tried one of the viral microarrays, to no avail. A virus was only found when the 
specimen was subjected to massive sequencing. The team that found it, led by Ian
Lipkin, a world-class laboratory virologist at Columbia University, had to sort 
through over a hundred thousand sequences to find the fourteen sequences be- 
longing to the mystery virus. Truly a needle in a haystack! The mystery virus ended 
up being in a group of viruses called arenaviruses that often live in rodents. With- 
out massive sequencing, the virus would not likely have been found. 


But while identifying what’s actually in a small new outbreak is vital, it’s only the 
beginning. As we get better and better at understanding what’s out there, we will 
have to start asking a tougher question: where is it going? Will it become a pan- 
demic? 
There are three primary objectives to the emerging science of pandemic preven- 
tion: 

1. We need to identify epidemics early. 
2. We need to assess the probability that they will grow into pandemics. 
3. We need to stop the deadly ones before they grow into pandemics. 

The viral microarray and sequencing techniques give us a snapshot of what is 
causing an epidemic, but more is needed to assess the possibility that a new agent 
in a limited outbreak has the right stuff to go pandemic. This is exactly the objec- 
tive of a new program being developed by DARPA, the U.S. Department of De- 
fense’s Advanced Research Projects Agency. DARPA has had a stunning impact on 
the contemporary world of technology, including sponsoring early research that 
has contributed in substantive ways to the development of modern computing, vir- 
tual reality, and the Internet itself. 
DARPA is developing a program called Prophecy, whose objective is to “suc- 
cessfully predict the natural evolution of any virus.” Prophecy seeks nothing less 
than to use technology to predict where an outbreak will go by combining it with 
the support of a team of local on-the-ground experts in hotspots around the world.

Predicting the future trajectory of a virus seems like science fiction, but DARPA 
does not shy away from high-risk/high-payoff ideas, and Prophecy falls clearly in 
that mold. Fortunately, what we know about pandemics and the technologies avail- 
able today bring the objectives it seeks within the realm of possibility. 
Cutting-edge experimental virologists like Raul Andino, at the University of California, 
San Francisco, work to make rational predictions about the evolution of 
viruses. Viruses reproduce rapidly, so any viral infection, even if it’s the result of 
infection with a single viral particle, will rapidly develop into a swarm,¹ a group of 
viruses, some identical, but mostly mutants differing in one way or another from 
the parental strain that created them. By documenting and studying the way that 
the overall viral swarms respond to different environments, Andino and his col- 
leagues have worked to develop rational strategies for the production of vaccines 
that use live viruses, a subject we will return to in chapter 11. He also hopes to use 
the same information to determine the boundaries within which a swarm can 
evolve. Swarms can’t go in every direction, and getting a sense of what a swarm is 
composed of will help us get a sense of what it can evolve into. 
Another scientist working to change the ways we can forecast microbial evolu- 
tion is not a microbiologist at all but rather a physics-trained bioengineer. Steve 
Quake, an awardee of the same NIH Pioneer Program that has funded my own re- 
search, develops technology that permits us to study and manipulate life in sur- 
prising and incredibly useful ways. In the past ten years this jeans-wearing ski bum 
has spun off multiple companies, developed handfuls of patents, and published 
scores of papers in some of the highest-ranking journals—all while maintaining a 
successful teaching program at Stanford University. Among the useful innovations 
coming from Quake’s group are microfluidic platforms. Essentially, he’s produced 
entire laboratories on small laboratory chips. 
In one particularly notable application, he’s taken the tedious and complex work 
of cell culture, where cells from mammals and other organisms are grown under 
laboratory conditions, from the bench to the chip. The chips he and his team have 
created, just a few centimeters long, house ninety-six separate compartments
where cells grow for weeks at a time and can be carefully measured and manipulated. 
While there are many applications for having cell culture on an automated 
and compact chip, one is the speed and efficiency it brings to evaluating new 
viruses from large numbers of specimens. It’s not difficult to imagine a chip-based 
system that quickly tells us in what kind of cells a new agent can survive and there- 
fore how it’s most likely to spread (e.g., by sex, blood, sneezes, and so on). 
When we see an outbreak, there are a number of questions we’d like to have an- 
swered. First, what’s the microbe behind it? Techniques like viral microarrays and 
high throughput sequencing are increasing the speed at which we can identify new 
agents and also helping us to find things that we’d have missed through older 
techniques. But once we’ve identified a microbe, we want to know where it’s going. 
We’ll return in chapter 12 to a vision of what the ultimate pandemic prevention sys- 
tem will look like, but it would certainly involve approaches like those developed by 
the Andino lab to assess the potential evolutionary directions that a virus can take. 
And the tools that Quake’s group has developed might one day form a set of 
high-speed chips that quickly evaluate how it’s likely to spread. 


Modern information and communication technology provides us with another set 
of tools that does something distinct and complementary to the biotech advances 
discussed above. In fact, some of this technology is sitting in your pocket as you 
read this sentence. 
In one of our research sites in southwest Cameroon sits a rubber plantation 
called Hevecam, where we conducted an experiment. This experiment represents 
one of the exciting new trends in public health. And it’s all based on simple cell 
phones. 
In Hevecam, a plantation with nearly a hundred thousand inhabitants, when 
individuals get sick they go to a local clinic. If they’re sufficiently ill, they then move 
from that local clinic to the referral hospital in the center of the plantation. Yet 
traditionally there has been no good way for the referral hospital to monitor what’s
happening in the local clinics. A few years ago Lucky Gunasekara, who now heads 
up our program on digital epidemiology, and his partners at the nonprofit Frontli- 
neSMS:Medic that he co-founded, set up a simple system based on text messages 
to allow the referral hospital to monitor what was occurring in the local clinics. By 
simply texting a series of preset codes, the vast majority of vital clinical information 
could be communicated up the medical hierarchy clearly, instantly, and efficiently. 
Using predetermined codes and simple text message forms, the local clinics could 
rapidly inform everyone else of how many cases of malaria, diarrhea, and other ill- 
nesses they were seeing. 
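
A coded text message system of this kind might work roughly as follows. The message layout (a clinic identifier followed by code and count pairs), the three-letter codes, and the function names `parse_report` and `tally` are all assumptions made for this sketch. They are not the format actually used at Hevecam.

```python
from collections import defaultdict

# Hypothetical preset codes; the real code list is not given in the text.
CODES = {"MAL": "malaria", "DIA": "diarrhea", "RES": "respiratory infection"}


def parse_report(message):
    """Parse '<clinic> <code> <count> ...' into (clinic, {illness: count})."""
    tokens = message.split()
    if len(tokens) < 3 or len(tokens) % 2 == 0:
        raise ValueError(f"malformed report: {message!r}")
    clinic, pairs = tokens[0], tokens[1:]
    counts = {}
    for code, number in zip(pairs[::2], pairs[1::2]):
        code = code.upper()
        if code not in CODES:
            raise ValueError(f"unknown code: {code}")
        counts[CODES[code]] = int(number)
    return clinic, counts


def tally(messages):
    """Sum case counts per illness across all clinic reports."""
    totals = defaultdict(int)
    for message in messages:
        _, counts = parse_report(message)
        for illness, number in counts.items():
            totals[illness] += number
    return dict(totals)


inbox = ["C1 MAL 12 DIA 4", "C2 MAL 7 RES 3", "C3 dia 9"]
totals = tally(inbox)
print(totals)
```
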
Simple technologies can have dramatic impact. With a few simple techniques, 
medical conditions at Hevecam could be monitored not only in the referral hos- 
pital but also remotely over a web dashboard for anyone with appropriate access. 
By giving local clinicians or patients themselves the capacity to communicate, 
information can be accumulated, organized, and analyzed, leading to a much more 
rapid and localized sense of what’s going on during a health emergency. 
Something just like this occurred during the earthquake in Haiti in 2010. Im- 
mediately after the earthquake, organizations like Ushahidi² set up short, free 
codes to which people could text “help” messages. They then turned to the local 
DJs who, along with popular word of mouth, publicized the numbers. Amazingly, 
when the dust cleared, the statistical analysis of the text message distributions 
mapped accurately onto high-resolution aerial imagery of damage. Effectively, peo- 
ple’s text messages gave highly informative clues as to where the greatest damage 
occurred. More importantly for those in Haiti, the messages saved lives, with the 
critical information transmitted to the heroic rescue workers on the scene. 
Similar systems have been used during outbreaks, such as the cholera outbreak 
in Haiti in the fall of 2010. The ultimate hope is that outbreak detection can be 
crowdsourced, with small bits of information provided by sufferers that converge 
into a real-time picture of the beginnings of outbreaks and their subsequent 
spread. The short codes are only the start. As more and more countries adopt elec- 
tronic medical records, people around the world will increasingly link to them

including the critical week of February 3, 2008, when a 5.9 magnitude earthquake 
occurred in the Lake Kivu region. By establishing a baseline for the frequency of 
calls, Eagle and his team were able to see telltale clues of unusual calling patterns 
during the period immediately following the earthquake. They were able to detect 
the time of the quake through a peak in call numbers. They were also able to estab- 
lish the epicenter of the quake by using location data from cell towers, placing the 
epicenter central to the locations of the heaviest call volumes. 
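
The two steps described here, spotting a spike against a baseline and locating its center, can be sketched in Python. The numbers are invented and the function names `spike_hours` and `weighted_center` are hypothetical. The three-standard-deviation threshold and the call-weighted average of tower positions are assumptions of this sketch, not necessarily the methods the team used.

```python
from statistics import mean, stdev


def spike_hours(baseline, observed, threshold=3.0):
    """Return indices of hours whose call count sits far above the baseline."""
    mu, sigma = mean(baseline), stdev(baseline)
    return [hour for hour, calls in enumerate(observed)
            if (calls - mu) / sigma > threshold]


def weighted_center(towers):
    """Average tower positions, weighting each by its excess call volume.

    towers is a list of (x, y, excess_calls) tuples.
    """
    total = sum(weight for _, _, weight in towers)
    x = sum(tx * weight for tx, _, weight in towers) / total
    y = sum(ty * weight for _, ty, weight in towers) / total
    return (x, y)


# Invented hourly call counts for ordinary days and for the day in question.
baseline = [100, 96, 104, 99, 101, 98, 102, 100]
observed = [101, 97, 180, 140, 103]

# Invented tower positions with calls above baseline during the spike.
towers = [(0, 0, 10), (10, 0, 30), (10, 10, 50), (0, 10, 10)]

hours = spike_hours(baseline, observed)
center = weighted_center(towers)
print("Unusual hours:", hours)
print("Estimated center:", center)
```
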
The idea that data derived from cell phones can be used to detect an earthquake in 
space and time is amazing. It also suggests a range of different applications. Individuals 
who are ill may have fundamentally different call patterns than those who 
are not, and call patterns may also alter as a new outbreak spreads. Analyses of call 
data records alone might not provide perfect early detection of a new outbreak, but 
combined with other sources of outbreak data from organizations like ours and 
other health institutions, it might help us chart early epidemic spread. 


Cell phones are growing more ubiquitous by the day and will likely be critical tools 
in helping to detect and respond quickly to outbreaks before they become pan- 
demics. Yet they are not the only technology-heavy solutions being used in the 
growing field of digital surveillance. In 2009 my colleagues at Google³ published a 
fascinating paper showing that individuals’ online search patterns also provide a 
sense of what people are becoming infected with. 
With the vast stores of search data kept by Google and US influenza surveil- 
lance data collected by the CDC, the team was able to calibrate their system to 
determine the key search words that sick people or their caregivers used to indicate 
the presence of illness. The team used searches on words related to influenza and 
its symptoms and remedies to establish a system that accurately tracked the in- 
fluenza statistics generated by the CDC. In fact, they did better. Since Google 
search data is available immediately, and CDC influenza surveillance data lags
because of time needed for reporting and posting, Google was able to beat the 
CDC in providing accurate influenza trends before the traditional surveillance sys- 
tem. 
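
The calibration idea can be illustrated with a deliberately simple Python sketch: fit a straight line relating the share of flu-related searches to the officially reported illness rate for past weeks, then apply it to this week’s searches, for which no official figure exists yet. The data are invented, the names `fit_line` and `nowcast` are hypothetical, and an ordinary least-squares line is a simplification rather than the model the Google team used.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nowcast(search_share, slope, intercept):
    """Estimate the illness rate from the current share of flu searches."""
    return slope * search_share + intercept


# Invented past weeks: share of searches that were flu-related (percent)
# and the illness rate later reported by surveillance (percent of visits).
search_share = [1.0, 2.0, 3.0, 4.0]
reported_rate = [1.4, 2.6, 3.4, 4.6]

slope, intercept = fit_line(search_share, reported_rate)

# This week's searches are already in hand; the official report is not.
estimate = nowcast(5.0, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} estimate={estimate:.2f}")
```
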
Early data on seasonal influenza, as provided by the Google Flu Trends system, 
is interesting and potentially important. This early data provides health organi- 
zations time to order medications and prepare for different triage needs. But early 
detection of seasonal influenza is not the Holy Grail. That honor would go to a sys- 
tem that could detect a newly emerging pandemic. Google is now working to ex- 
tend its influenza findings to other kinds of diseases. As more and more people 
use search engines like Google, and more and more data is acquired, the hope is 
that better and better trend analyses will be developed for agents other than in- 
fluenza. Perhaps at some point a community experiencing the beginning of a pan- 
demic will signal its arrival just by Googling. 


The explosion of online social media provides another set of big data in which 
weak but potentially valuable signals of a coming plague may be found. Computer 
scientists, like Vasileios Lampos and Nello Cristianini from the University of Bristol, 
have taken an approach similar to that of the scientists at Google, sorting through hundreds 
of millions of Twitter messages. Like their colleagues at Google, Lampos and 
Cristianini used key words to watch trends in Twitter and find associations with in- 
fluenza statistics, in this case provided by the UK’s Health Protection Agency. 
In 2009 they tracked the frequency of tweets related to influenza during the 
H1N1 pandemic and found they were able to track the official health data with 97 
percent accuracy. As with the findings by the Google Flu Trends team, this work 
provides a rapid and potentially inexpensive way to supplement traditional epi- 
demiological data gathering. It also has the potential to be extended to more than 
just influenza. 
While online social media can be scanned to see what people are communicating 
about, online social networking may provide a richer and subtler range of
possible uses. In fascinating recent work, two leading social scientists, Nicholas 
Christakis and James Fowler, have studied how social networks can inform surveil- 
lance for infectious diseases. 
In a clever experiment, these two scientists followed Harvard students who were 
divided into two groups. The first group was randomly selected from the Harvard 
student population. The second group was chosen from individuals that the first 
group named as friends. Because individuals near the center of a social network 
are likely to be infected sooner than those on the periphery, Christakis and Fowler 
hypothesized that during an outbreak the friend group would become infected 
sooner than the random, and therefore on average less socially central, group. The 
results were dramatic. During an influenza outbreak in 2009, the friend group be- 
came infected on average fourteen days ahead of the randomly chosen group. 
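
Why friends make better sentinels comes down to simple arithmetic: people named as friends tend to have more connections than people picked at random. The toy network below is invented, and the function names are hypothetical. As a simplification, the sketch averages over every friendship nomination in the network rather than sampling individuals as the study did.

```python
# Invented friendship network: each person maps to the friends they name.
FRIENDS = {
    "ana": ["ben", "cy", "dee", "eli"],
    "ben": ["ana"],
    "cy": ["ana", "dee"],
    "dee": ["ana", "cy"],
    "eli": ["ana"],
}


def mean_connections_of_people(network):
    """Average number of friends across everyone in the network."""
    return sum(len(friends) for friends in network.values()) / len(network)


def mean_connections_of_named_friends(network):
    """Average number of friends across everyone who gets named as a friend.

    A person named by many others is counted once per nomination, which is
    exactly why well-connected people dominate this average.
    """
    named = [friend for friends in network.values() for friend in friends]
    return sum(len(network[friend]) for friend in named) / len(named)


random_group = mean_connections_of_people(FRIENDS)
friend_group = mean_connections_of_named_friends(FRIENDS)
print(f"random group: {random_group:.1f} connections on average")
print(f"friend group: {friend_group:.1f} connections on average")
```
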
The hope is that social science can identify novel kinds of sentinels to monitor 
for new outbreaks and catch them early.⁴ Determining friends would be time con- 
suming, however—something we could accomplish on a single college campus 
but perhaps not nationally. Now self-identified friends in massive online social net- 
works may make this task much easier. Online social networks like Facebook, 
while not designed to help monitor for outbreaks, have created relatively easy-to-monitor 
systems that can be mined to determine the frequency of illness, identify 
social sentinels, and perhaps eventually provide predictions for spread of a new 
agent within a community. 


When John Snow created the first Geographic Information System in 1854, he took 
actions that would seem very logical and straightforward to us today. He took a 
map, he plotted where sick people were, and he plotted possible sources of conta- 
gion. Snow could not have predicted the directions in which his first tentative step 
would lead or the data that would eventually become available for today’s GIS. 
In the end it may be that no single data source reigns supreme. If Snow were 
alive today and investigating an outbreak, he’d want it all. He’d want to know where
the sick people were, and he’d be glad to get the data more quickly and easily 
through text messages or Internet searches. He’d like to know exactly what cases 
were infected with, down to the very specific microbial genetic strain. He’d seek to 
use call data records to monitor people’s movements in order to track the move- 
ment of the disease or where it was seeded. He’d like to know how people were 
connected socially, and he’d certainly follow individuals who were likely to become 
infected first or show signs earlier than the rest. 
You can imagine the ultimate outbreak GIS, or in terms more familiar to Silicon 
Valley, what Lucky Gunasekara, the head of my data team, calls the ultimate out- 
break mash-up: a map with layer after layer of critical information—where people 
are, what they’re concerned about, what they’re infected with, where they’re mov- 
ing, and who they’re connected to. Developing and maintaining this combined dig- 
ital and biological mash-up is the precise objective of Lucky’s team and something 
to which we’ll return in the final chapter of this book. Ideally, over time the data can 
be analyzed jointly, the various factors can be trained on actual outbreaks, and all 
the technology can be weighted optimally to maximize predictive power. 


When people ask me whether or not I’m optimistic about the future of predicting 
pandemics, the answer is always a resounding yes. Given the first two-thirds of this 
book, you may wonder if my optimism is warranted. A steady wave of intercon- 
nectedness among humans and animals has created a perfect storm for new pan- 
demics. That is true. Yet the interconnectedness among humans that now exists 
through communication and information technology gives us unprecedented 
capacity to catch outbreaks early, which, when combined with amazing advances in 
our ability to study the diversity of the tiny life forms that cause epidemics, cer- 
tainly makes optimism warranted. 
What will win out in the end? Will pandemics sweep through the human popu- 
lation destroying millions of lives? Will technology and science ride in to the res- 
cue?