MICROBE FORECASTING
It was a large city. And it was hit hard. The first cases emerged in late August, and
the victims suffered terribly. The earliest symptoms were profuse diarrhea and
vomiting. They experienced severe dehydration, increased heart rate, muscle
cramps, restlessness, severe thirst, and the loss of skin elasticity. Some of the
cases progressed to kidney failure, while others led to coma or shock. Many of
those who came down with the disease died.
Then on the night of August 31, the outbreak truly broke. Over the next three
days, 127 people in a single neighborhood died. And by September 10 the number
of fatalities would reach 500. The epidemic seemed to spare no one. Children and
adults alike were killed. Few families did not have at least one member who came
down with the disease.
The epidemic led to intense panic. Within a week, three-quarters of the
neighborhood’s residents fled. Stores closed. Homes were locked. And you could
walk down a formerly bustling urban street without seeing a single person.
Early in the outbreak, a forty-year-old epidemiologist began an investigation to
determine its source. He consulted community leaders and methodically inter-
viewed families of the victims and made careful maps of every single case. Fol-
lowing his hunch about a waterborne disease, he studied the sources of the
community’s water and determined that it came from only one of two urban water
utilities. He conducted microscopic and chemical analyses of specimens from the
water system, which proved inconclusive.
In his report to the responsible officials, he presented his analysis and con-
cluded that contaminated water was to blame. Despite the lack of definitive results
from the analyses, the mapping of cases strongly supported his conclusion that
one particular water outlet was the source of the outbreak. He recommended shut-
ting down the water supply, and the officials agreed. And while the outbreak may
have already been in decline because of the mass exodus, that investigation and
water closure proved pivotal.
What was unusual about this outbreak was not the procedural investigation that
followed. Modern epidemiologists in countries throughout the world conduct ex-
actly this kind of investigation regularly. They enlist the help of local leaders, study
the distribution of cases, conduct analyses on potential sources, and then often
argue with officials as to the best course of action. What was unusual was that the
outbreak was in 1854—before the field of epidemiology existed.
As you may have guessed, the investigator responsible for cracking the outbreak
was none other than John Snow, the now famous London physician considered one
of the founders of contemporary epidemiology. The culprit was, of course, the
bacterium Vibrio cholerae, the agent of cholera. By finding that water was the source
rather than “foul air,” Snow contributed to the modern germ theory of infectious
diseases—that communicable diseases are caused by microbes. To this day, you
can see a replica of the famous Broad Street pump that Snow identified as the
source of the 1854 outbreak, in Soho, London.
It seems intuitive to us today, but the way that Snow used interviews, case iden-
tification, and mapping to chart the origin of the Broad Street cholera outbreak of
1854 was revolutionary in its time. While maps had certainly been used extensively
prior to 1854, the map he made of Soho is considered the first of its kind, not only
in epidemiology but also in cartography. He was the first to use maps to analyze
geographically related events and draw a conclusion about causality: namely, that
the Broad Street pump was the source of the outbreak. By doing so he has been
credited with creating the first geographic information system, or GIS, a now
commonly used system for capturing and analyzing geographic information.
In contemporary GIS, layers of information are added to maps like Snow’s to
provide depth of geographic information and to suggest patterns of causality.
While Snow’s map included streets, homes, locations of illness, and water sources,
a contemporary version could include many more layers: genetic information from
cholera specimens collected at different locations, a dimension of time to track how
cases shift spatially, a weather layer, or the social connections between the
individuals in the various homes.
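To make the layering idea concrete, here is a minimal sketch in Python of how such a map might be represented as stacked layers. The layer names, coordinates, and attributes are invented for illustration and are not drawn from Snow’s actual data or from any particular GIS package.

```python
# A minimal, hypothetical sketch of layered GIS data in plain Python.
# Layer names, coordinates, and attributes are illustrative, not Snow's actual data.
from dataclasses import dataclass, field

@dataclass
class Feature:
    lat: float                 # latitude of the mapped feature
    lon: float                 # longitude of the mapped feature
    attributes: dict = field(default_factory=dict)

@dataclass
class Layer:
    name: str
    features: list

# Snow's original map reduced to two layers: cases and water sources.
cases = Layer("cholera_cases", [Feature(51.5133, -0.1369, {"deaths": 3})])
pumps = Layer("water_pumps", [Feature(51.5132, -0.1366, {"name": "Broad Street"})])

# A contemporary version simply stacks more layers over the same coordinates.
modern_map = [
    cases,
    pumps,
    Layer("pathogen_genetics", []),   # sequences tied to sampling locations
    Layer("weather", []),             # rainfall and temperature over time
    Layer("social_ties", []),         # connections between households
]

for layer in modern_map:
    print(layer.name, len(layer.features), "feature(s)")
```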
Modern GIS is among a range of contemporary tools that are radically changing
the way that we investigate outbreaks and understand the transmission of diseases.
When used in a coordinated and comprehensive way, these tools have the poten-
tial to fundamentally change the way that we monitor for outbreaks and stop them
in their tracks.
We now have multiple scientific and technical advantages that Snow lacked in
the mid-nineteenth century. Among the most profound is that we have significantly
improved our capacity to catch the bugs we’re chasing and to document their
diversity. The revolution in molecular biology, in particular the techniques for
capturing and sequencing genetic information, has profoundly changed our ability
to identify the microbes that surround us.
[Figure: The map of London used by John Snow to find the source of the cholera outbreak.]
Miraculous but now standard techniques like the polymerase chain reaction
(PCR), which earned its inventor, Kary Mullis, a Nobel Prize, allow us to
snip out tiny pieces of genetic information from microbes and create billions of
identical copies, whose sequences can then be read and sorted out according to
the family of microbes to which they belong. Yet standard PCR requires that you
know what you’re looking for. If, for example, we want to find an unknown malaria
parasite, we can use PCR designed to identify malaria-specific sequences, since all
malaria parasites have genetic regions that look similar enough to each other. But
what if we don’t know what we’re looking for?
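A toy example can make the limitation concrete. The sketch below simply checks whether known primer sequences occur in a target genome; the sequences and names are invented, and real primer design and amplification are far more involved.

```python
# A toy illustration (not real primer design) of why standard PCR needs prior
# knowledge: amplification only happens if the primers match sequence we already know.
def primers_match(genome: str, forward_site: str, reverse_site: str) -> bool:
    """Return True if both primer-binding sites are present in the genome."""
    return forward_site in genome and reverse_site in genome

# Hypothetical sequences, for illustration only.
known_parasite  = "ATGCGTACGTTAGCCGGATTACGA"
unknown_microbe = "TTGACCGTAAGGCTTACCGGTTAA"
forward_site    = "ATGCGTACG"   # designed from a known, conserved region
reverse_site    = "GGATTACGA"   # written here as it appears in the genome

print(primers_match(known_parasite, forward_site, reverse_site))   # True: the assay finds it
print(primers_match(unknown_microbe, forward_site, reverse_site))  # False: invisible to this assay
```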
In the early 2000s, intent on finding unknown microbes, a bright young molec-
ular biologist, Joe DeRisi, and his colleagues adapted an interesting technique
developed by DeRisi’s doctoral adviser, Pat Brown, a Stanford biochemist. The
DNA microarray chip consisted of thousands of tiny bits of distinct artificial genetic
sequence distributed in an orderly fashion across a small glass slide. Since a strand
of genetic information binds to its mirror-image, or complementary, sequence, if you
wash a solution prepared from a specimen across a slide like this, the bits that
match the designed sequences on the slide will stick. You can then determine what
was in the specimen by seeing which of the sequences on the slide trapped their
natural counterparts. The technique had already provided thousands of scientists
with a new way of characterizing the bits of genetic information that flow through
living systems by the time DeRisi got his hands on it.
Prior to DeRisi’s innovation, the microarray chips had been used primarily to
help determine the internal workings of the genes of humans and animals, but De-
Risi and his colleagues realized that the technique could be modified to create a
powerful viral detection system. Instead of designing the chips with bits of artificial
human genetic information, he and his colleagues designed chips with bits of viral
genetic information. By carefully reviewing the scientific databases for genetic
information on all of the viruses known to science, they crafted chips that had bits
of genetic information from a whole range of viral families lined up in neat rows. If
they introduced genetic information from a sick patient, and it contained a virus
with a sequence similar to one on the chip, the sequence would be trapped and—
bingo!—we’d know the bug we were dealing with.
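The logic of the chip can be sketched in a few lines of Python. The probe sequences and family labels below are invented, and real hybridization chemistry is of course far richer, but the matching step is essentially a search for complementary sequence.

```python
# A schematic sketch of microarray-style detection: a specimen fragment is "trapped"
# when it contains the complement of a probe printed on the chip. Probe sequences
# and family names are invented for illustration.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

# The chip: probes laid out by viral family (illustrative sequences only).
chip_probes = {
    "coronavirus-like": "ATGGCTTAGCGT",
    "influenza-like":   "CCGTTAAGGCAT",
}

def scan_specimen(fragments):
    """Report which probe families trap material from the specimen fragments."""
    hits = []
    for family, probe in chip_probes.items():
        target = reverse_complement(probe)
        if any(target in fragment for fragment in fragments):
            hits.append(family)
    return hits

specimen = ["GGGACGCTAAGCCATTTT", "AAATTTCCGGA"]  # hypothetical patient-derived reads
print(scan_specimen(specimen))                    # -> ['coronavirus-like']
```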
Viral microarrays, as these specialized chips became known, have proliferated
and spread to labs throughout the world. They’ve helped quickly identify the micro-
bial villain responsible for new pandemics, like the coronavirus that causes SARS.
Yet they are not perfect. These chips can only be made to capture viruses from
families of viruses already known to science. If there are groups of viruses out
there whose sequences we are completely unaware of, and there certainly are, then
we have nothing with which to engineer the chips. Truly unknown viruses would
slide right by.
Within the past few years, viral microarrays have been supplemented with a series
of bold new genetic sequencing approaches. New machines churn out mammoth
amounts of sequence data from specimens—amounts of sequence that previously
would have been prohibitively expensive or time consuming. These machines are
permitting an entirely new form of viral discovery.
Rather than look for particular bits of information, the approach is to take a
specimen—say a drop of blood—and sequence every bit of genetic information it
contains. Technically, it’s more complicated than that, but the result is similar to
what you’d expect. We are approaching a moment when we will be able to read
every single sequence within a given biological specimen. Every bit of DNA or RNA
from the host specimen and, critically, every bit from the microbes that are riding
along with it.
One of the central problems becomes the bioinformatics—how to sort through
all of the billions of bits of information that are produced by these incredible tech-
nologies. Fortunately, in an enlightened move, scientists at the NIH picked up and
nurtured an electronic repository of sequencing information developed at the
famed Los Alamos National Laboratory and now called GenBank. Since scientists
are required by funding sources and journals to submit sequences to GenBank
prior to submitting academic papers, we collectively contribute billions of bits of
genetic information each year. GenBank right now holds over a hundred billion bits
of sequence information. And it’s growing rapidly. When a new sequence is iden-
tified from a sequencing run, it can be rapidly compared electronically to what’s in
GenBank to see if there’s a match.
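In spirit, that comparison step looks something like the following sketch, which scores a new read against a tiny stand-in reference set by counting shared subsequences. Real pipelines align reads against the full GenBank database with tools like BLAST; the sequences and labels here are invented.

```python
# A crude stand-in for the database-comparison step: score a new read against a tiny
# reference set by counting shared k-mers. Real pipelines align against the full
# GenBank database with alignment tools; sequences and labels here are invented.
def kmers(seq: str, k: int = 6) -> set:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

reference_db = {                      # stand-in for a few GenBank entries
    "arenavirus-like": "ATGCGGATTACCGTAAGCTTGACC",
    "flavivirus-like": "TTGGCCAATCGGATTCCAAGGTTA",
}

def best_match(read: str, k: int = 6):
    read_kmers = kmers(read, k)
    scores = {name: len(read_kmers & kmers(ref, k)) for name, ref in reference_db.items()}
    name, score = max(scores.items(), key=lambda item: item[1])
    return (name, score) if score > 0 else (None, 0)

mystery_read = "GGATTACCGTAAGCTT"     # hypothetical read from a sequencing run
print(best_match(mystery_read))       # -> ('arenavirus-like', 11)
```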
In late 2006 and early 2007 these techniques were used to good effect. In early
December 2006 the organs of a patient who had died of a brain hemorrhage in
Dandenong hospital in Australia were harvested for transplantation. A sixty-three-
year-old grandmother received one of the kidneys, another unnamed recipient re-
ceived the other kidney, and a sixty-four-year-old lecturer at a local university
received the man’s liver. By early January all three had died.
The local hospital and collaborating labs looked for all of the usual suspects.
They used PCR and tried to grow the microbe on culture media. They even
tried one of the viral microarrays, to no avail. The virus was found only when the
specimen was subjected to massive sequencing. The team that found it, led by Ian
Lipkin, a world-class laboratory virologist at Columbia University, had to sort
through over a hundred thousand sequences to find the fourteen sequences be-
longing to the mystery virus. Truly a needle in a haystack! The mystery virus ended
up being in a group of viruses called arenaviruses that often live in rodents. With-
out massive sequencing, the virus would not likely have been found.
But while identifying what’s actually in a small new outbreak is vital, it’s only the
beginning. As we get better and better at understanding what’s out there, we will
have to start asking a tougher question: where is it going? Will it become a pan-
demic?
There are three primary objectives to the emerging science of pandemic preven-
tion:
1. We need to identify epidemics early.
2. We need to assess the probability that they will grow into pandemics.
3. We need to stop the deadly ones before they grow into pandemics.
The viral microarray and sequencing techniques give us a snapshot of what is
causing an epidemic, but more is needed to assess the possibility that a new agent
in a limited outbreak has the right stuff to go pandemic. This is exactly the objec-
tive of a new program being developed by DARPA, the U.S. Department of De-
fense’s Advanced Research Projects Agency. DARPA has had a stunning impact on
the contemporary world of technology, including sponsoring early research that
has contributed in substantive ways to the development of modern computing, vir-
tual reality, and the Internet itself.
DARPA is developing a program called Prophecy, whose objective is to “successfully
predict the natural evolution of any virus.” Prophecy seeks nothing less than to use
technology to predict where an outbreak will go, combining that technology with the
support of teams of local on-the-ground experts in hotspots around the world.
Predicting the future trajectory of a virus seems like science fiction, but DARPA
does not shy away from high-risk/high-payoff ideas, and Prophecy falls clearly in
that mold. Fortunately, what we know about pandemics and the technologies avail-
able today bring the objectives it seeks within the realm of possibility.
Cutting-edge experimental virologists like Raul Andino, at the University of
California, San Francisco, are working to make rational predictions about the
evolution of viruses. Viruses reproduce rapidly, so any viral infection, even if it’s the result of
infection with a single viral particle, will rapidly develop into a swarm,¹ a group of
viruses, some identical, but mostly mutants differing in one way or another from
the parental strain that created them. By documenting and studying the way that
the overall viral swarms respond to different environments, Andino and his col-
leagues have worked to develop rational strategies for the production of vaccines
that use live viruses, a subject we will return to in chapter 11. He also hopes to use
the same information to determine the boundaries within which a swarm can
evolve. Swarms can’t go in every direction, and knowing what a swarm is composed
of will help us anticipate what it can evolve into.
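A small simulation illustrates how quickly a swarm forms. This is a cartoon of the process, not the Andino lab’s models, and the genome length, error rate, and population size are arbitrary.

```python
# A cartoon of swarm formation, not the Andino lab's actual models: copy a parental
# genome with a per-base error rate and see how quickly mutants dominate.
# Genome length, error rate, and population size are arbitrary.
import random

random.seed(1)
BASES = "ACGT"

def replicate(genome: str, error_rate: float = 0.01) -> str:
    """Copy a genome, substituting a random base at each position with some probability."""
    return "".join(
        random.choice([b for b in BASES if b != base]) if random.random() < error_rate else base
        for base in genome
    )

parent = "".join(random.choice(BASES) for _ in range(300))   # hypothetical 300-base genome
swarm = [replicate(parent) for _ in range(1000)]             # one round of copying

identical = sum(1 for genome in swarm if genome == parent)
print(f"{identical} of {len(swarm)} copies identical to the parent; "
      f"{len(set(swarm))} distinct sequences in the swarm")
```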
Another scientist working to change the ways we can forecast microbial evolu-
tion is not a microbiologist at all but rather a physics-trained bioengineer. Steve
Quake, an awardee of the same NIH Pioneer Program that has funded my own re-
search, develops technology that permits us to study and manipulate life in sur-
prising and incredibly useful ways. In the past ten years this jeans-wearing ski bum
has spun off multiple companies, developed handfuls of patents, and published
scores of papers in some of the highest-ranking journals—all while maintaining a
successful teaching program at Stanford University. Among the useful innovations
coming from Quake’s group are microfluidic platforms. Essentially, he’s produced
entire laboratories on small chips.
In one particularly notable application, he’s taken the tedious and complex work
of cell culture, where cells from mammals and other organisms are grown under
laboratory conditions, from the bench to the chip. The chips he and his team have
created, just a few centimeters long, house ninety-six separate compartments
where cells grow for weeks at a time and can be carefully measured and manip-
ulated. While there are many applications for having cell culture on an automated
and compact chip, one of them is the rapid and efficient evaluation of new viruses
from large numbers of specimens. It’s not difficult to imagine a chip-based
system that quickly tells us in what kind of cells a new agent can survive and there-
fore how it’s most likely to spread (e.g., by sex, blood, sneezes, and so on).
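As a purely illustrative sketch of that inference, and not a description of Quake’s actual platform, one could imagine summarizing chip results like this; the cell types, readouts, and route mappings below are simplifications invented for the example.

```python
# An illustrative sketch, not Quake's actual platform: summarize which cell types on a
# chip support growth and what route of spread that tropism would hint at. The cell
# types, readouts, and route mappings are simplifications invented for this example.
growth_on_chip = {                 # hypothetical readout from a multi-compartment chip
    "respiratory epithelium": True,
    "gut epithelium": False,
    "white blood cells": False,
}

ROUTE_HINTS = {
    "respiratory epithelium": "respiratory droplets (sneezes, coughs)",
    "gut epithelium": "fecal-oral (contaminated food or water)",
    "white blood cells": "blood or sexual contact",
}

likely_routes = [ROUTE_HINTS[cell] for cell, grows in growth_on_chip.items() if grows]
print("Transmission routes to investigate first:", likely_routes or ["unknown"])
```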
When we see an outbreak, there are a number of questions we’d like to have an-
swered. First, what’s the microbe behind it? Techniques like viral microarrays and
high throughput sequencing are increasing the speed at which we can identify new
agents and also helping us to find things that we’d have missed through older
techniques. But once we’ve identified a microbe, we want to know where it’s going.
We’ll return in chapter 12 to a vision of what the ultimate pandemic prevention sys-
tem will look like, but it would certainly involve approaches like those developed by
the Andino lab to assess the potential evolutionary directions that a virus can take.
And the tools that Quake’s group has developed might one day form a set of
high-speed chips that quickly evaluate how it’s likely to spread.
Modern information and communication technology provides us with another set
of tools that does something distinct and complementary to the biotech advances
discussed above. In fact, some of this technology is sitting in your pocket as you
read this sentence.
At one of our research sites in southwest Cameroon, a rubber plantation called
Hevecam, we conducted an experiment. This experiment represents
one of the exciting new trends in public health. And it’s all based on simple cell
phones.
In Hevecam, a plantation with nearly a hundred thousand inhabitants, when
individuals get sick they go to a local clinic. If they’re sufficiently ill, they then move
from that local clinic to the referral hospital in the center of the plantation. Yet
traditionally there has been no good way for the referral hospital to monitor what’s
happening in the local clinics. A few years ago Lucky Gunasekara, who now heads
up our program on digital epidemiology, and his partners at FrontlineSMS:Medic,
the nonprofit he co-founded, set up a simple system based on text messages
to allow the referral hospital to monitor what was occurring in the local clinics. By
simply texting a series of preset codes, the vast majority of vital clinical information
could be communicated up the medical hierarchy clearly, instantly, and efficiently.
Using predetermined codes and simple text message forms, the local clinics could
rapidly inform everyone else of how many cases of malaria, diarrhea, and other ill-
nesses they were seeing.
Simple technologies can have a dramatic impact. With a few simple techniques,
medical conditions at Hevecam could be monitored not only at the referral hospital
but also remotely, over a web dashboard, by anyone with appropriate access.
By allowing local clinicians or patients themselves the capacity to communicate,
information can be accumulated, organized, and analyzed, leading to a much more
rapid and localized sense of what’s going on during a health emergency.
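The flavor of the system can be captured in a short sketch. The message format and codes below are invented for illustration rather than taken from FrontlineSMS:Medic’s actual protocol.

```python
# A minimal sketch of the preset-code idea. The message format and codes below are
# invented for illustration; they are not FrontlineSMS:Medic's actual protocol.
from collections import defaultdict

CODES = {"MAL": "malaria", "DIA": "diarrhea", "FEV": "fever"}

def parse_report(clinic_id: str, message: str) -> dict:
    """Turn a message like 'MAL 12 DIA 4' into {'malaria': 12, 'diarrhea': 4}."""
    tokens = message.split()
    counts = {}
    for code, value in zip(tokens[::2], tokens[1::2]):
        if code in CODES:
            counts[CODES[code]] = int(value)
    return {"clinic": clinic_id, "counts": counts}

# Aggregate incoming texts into a simple dashboard view for the referral hospital.
dashboard = defaultdict(int)
for report in [parse_report("clinic-03", "MAL 12 DIA 4"),
               parse_report("clinic-07", "MAL 3 FEV 9")]:
    for illness, count in report["counts"].items():
        dashboard[illness] += count

print(dict(dashboard))   # {'malaria': 15, 'diarrhea': 4, 'fever': 9}
```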
Something just like this occurred during the earthquake in Haiti in 2010. Im-
mediately after the earthquake, organizations like Ushahidi² set up short, free
codes to which people could text “help” messages. They then turned to the local
DJs who, along with popular word of mouth, publicized the numbers. Amazingly,
when the dust cleared, the statistical distribution of the text messages mapped
accurately onto high-resolution aerial imagery of the damage. Effectively, peo-
ple’s text messages gave highly informative clues as to where the greatest damage
occurred. More importantly for those in Haiti, the messages saved lives, with the
critical information transmitted to the heroic rescue workers on the scene.
Similar systems have been used during outbreaks, such as the cholera outbreak
in Haiti in the fall of 2010. The ultimate hope is that outbreak detection can be
crowdsourced, with small bits of information provided by sufferers converging
into a real-time picture of the beginnings of outbreaks and their subsequent
spread. The short codes are only the start. As more and more countries adopt
electronic medical records, people around the world will increasingly link to them,
adding still more data that can be tapped during a health emergency. Call data
records, the logs that mobile phone operators keep of when and where calls are
placed, offer yet another source of signal. The researcher Nathan Eagle and his team
analyzed years of such records, including the critical week of February 3, 2008,
when a 5.9 magnitude earthquake occurred in the Lake Kivu region. By establishing
a baseline for the frequency of
calls, Eagle and his team were able to see telltale clues of unusual calling patterns
during the period immediately following the earthquake. They were able to detect
the time of the quake through a peak in call numbers. They were also able to
estimate the epicenter of the quake by using location data from cell towers, placing
it near the center of the area with the heaviest call volumes.
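A toy version of that baseline-and-spike logic might look like the following; the call counts are made up, and the simple threshold used here stands in for the far more careful statistical analysis the team actually performed.

```python
# A toy version of baseline-and-spike detection on hourly call counts. The numbers are
# made up, and a mean-plus-three-standard-deviations threshold is far simpler than the
# analyses actually performed on the real call data records.
from statistics import mean, stdev

baseline = [120, 118, 125, 130, 122, 119, 127, 124]   # hourly call counts on normal days
threshold = mean(baseline) + 3 * stdev(baseline)

observed = [121, 126, 123, 410, 395, 180, 128]        # hours around a hypothetical quake
spikes = [hour for hour, calls in enumerate(observed) if calls > threshold]
print("Anomalous hours:", spikes)                     # hours 3-5, the surge after the event
```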
The idea that data derived from cell phones can be used to detect an earthquake in
space and time is amazing. It also suggests a range of different applications. Indi-
viduals who are ill may have fundamentally different call patterns than those who
are not, and call patterns may also change as a new outbreak spreads. Analyses of
call data records alone might not provide perfect early detection of a new outbreak,
but combined with other sources of outbreak data from organizations like ours and
other health institutions, they might help us chart early epidemic spread.
Cell phones are growing more ubiquitous by the day and will likely be critical tools
in helping to detect and respond quickly to outbreaks before they become pan-
demics. Yet they are not the only technology-heavy solution being used in the
growing field of digital surveillance. In 2009 my colleagues at Google³ published a
fascinating paper showing that individuals’ online search patterns also provide a
sense of what people are becoming infected with.
With the vast stores of search data kept by Google and US influenza surveil-
lance data collected by the CDC, the team was able to calibrate their system to
determine the key search words that sick people or their caregivers used to indicate
the presence of illness. The team used searches on words related to influenza and
its symptoms and remedies to establish a system that accurately tracked the in-
fluenza statistics generated by the CDC. In fact, they did better. Since Google
search data is available immediately, while CDC influenza surveillance data lags
because of the time needed for reporting and posting, Google was able to provide
accurate influenza trends ahead of the traditional surveillance system.
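Stripped to its core, the calibration idea can be sketched as fitting a line between past query frequency and past case counts, then applying it to this week’s queries, which are available immediately. The numbers below are invented, and Google’s actual models used many query terms and far more sophisticated fitting.

```python
# A greatly simplified sketch of the calibration behind search-based flu tracking:
# fit a line between past query frequency and past surveillance counts, then apply
# it to this week's query data, which is available immediately. Numbers are invented.
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

query_freq = [0.8, 1.1, 1.6, 2.2, 2.9]      # past weeks: relative flu-related search volume
ili_cases  = [400, 560, 820, 1100, 1450]    # past weeks: reported influenza-like illness

slope, intercept = fit_line(query_freq, ili_cases)
this_week_queries = 3.4                      # search data for the current, not-yet-reported week
print("Estimated cases this week:", round(slope * this_week_queries + intercept))
```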
Early data on seasonal influenza, as provided by the Google Flu Trends system,
is interesting and potentially important. It gives health organizations time to order
medications and prepare for different triage needs. But early
detection of seasonal influenza is not the Holy Grail. That honor would go to a sys-
tem that could detect a newly emerging pandemic. Google is now working to ex-
tend its influenza findings to other kinds of diseases. As more and more people
use search engines like Google, and more and more data is acquired, the hope is
that better and better trend analyses will be developed for agents other than in-
fluenza. Perhaps at some point a community experiencing the beginning of a pan-
demic will signal its arrival just by Googling.
The explosion of online social media provides another set of big data in which
weak but potentially valuable signals of a coming plague may be found. Computer
scientists, like Vasileios Lampos and Nello Cristianini from the University of Bris-
tol, have taken an approach similar to that of the scientists at Google, sorting
through hundreds of millions of Twitter messages. Like their colleagues at Google, Lampos and
Cristianini used key words to watch trends in Twitter and find associations with in-
fluenza statistics, in this case provided by the UK’s Health Protection Agency.
In 2009 they monitored the frequency of tweets related to influenza during the
H1N1 pandemic and found they could track the official health data with 97
percent accuracy. As with the findings by the Google Flu Trends team, this work
provides a rapid and potentially inexpensive way to supplement traditional epi-
demiological data gathering. It also has the potential to be extended to more than
just influenza.
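A bare-bones version of the tweet-scoring idea is sketched below; the marker words and messages are invented, and the published work learned a weighted set of markers rather than using a simple tally like this.

```python
# A bare-bones version of the tweet-scoring idea: the fraction of messages mentioning
# flu-related markers. The marker list and sample tweets are invented; the published
# work used a learned, weighted set of markers rather than this simple tally.
FLU_MARKERS = {"flu", "fever", "cough", "sore throat"}

def flu_score(tweets):
    """Fraction of tweets that mention at least one flu-related marker."""
    hits = sum(1 for tweet in tweets if any(marker in tweet.lower() for marker in FLU_MARKERS))
    return hits / len(tweets)

sample = [
    "Stuck in bed with a fever and an awful cough",
    "Great match last night!",
    "Half the office is out with the flu this week",
    "New coffee place on the corner is excellent",
]
print(flu_score(sample))   # 0.5 for this toy sample
```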
While online social media can be scanned to see what people are communi-
cating about, online social networking may provide a richer and subtler range of
possible uses. In fascinating recent work, two leading social scientists, Nicholas
Christakis and James Fowler, have studied how social networks can inform surveil-
lance for infectious diseases.
In a clever experiment, these two scientists followed Harvard students who were
divided into two groups. The first group was randomly selected from the Harvard
student population. The second group was chosen from individuals whom the first
group named as friends. Because individuals near the center of a social network
are likely to be infected sooner than those on the periphery, Christakis and Fowler
hypothesized that during an outbreak the friend group would become infected
sooner than the random, and therefore on average less socially central, group. The
results were dramatic. During an influenza outbreak in 2009, the friend group be-
came infected on average fourteen days ahead of the randomly chosen group.
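The statistical effect behind the experiment, sometimes called the friendship paradox, can be demonstrated numerically: in most networks, a randomly chosen friend tends to have more connections, and to sit closer to the center, than a randomly chosen person. The sketch below builds a toy network and compares the two groups; the construction and numbers are illustrative only.

```python
# A numerical illustration of the "friendship paradox" behind the sentinel idea:
# in most networks, a randomly chosen friend has more connections, on average, than
# a randomly chosen person. The network construction and sizes are toy choices.
import random

random.seed(0)
M = 3                                  # links added per new node (preferential attachment)
neighbors = {i: set() for i in range(M + 1)}
for i in range(M + 1):                 # small fully connected seed network
    for j in range(i + 1, M + 1):
        neighbors[i].add(j)
        neighbors[j].add(i)
endpoints = [node for node, nbrs in neighbors.items() for _ in nbrs]

for new_node in range(M + 1, 500):     # newcomers tend to befriend well-connected people
    targets = set()
    while len(targets) < M:
        targets.add(random.choice(endpoints))
    neighbors[new_node] = set()
    for target in targets:
        neighbors[new_node].add(target)
        neighbors[target].add(new_node)
        endpoints += [new_node, target]

def average_contacts(group):
    return sum(len(neighbors[person]) for person in group) / len(group)

random_group = random.sample(list(neighbors), 100)
friend_group = [random.choice(list(neighbors[person])) for person in random_group]
print("average contacts, random group:", round(average_contacts(random_group), 1))
print("average contacts, friend group:", round(average_contacts(friend_group), 1))  # noticeably higher
```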
The hope is that social science can identify novel kinds of sentinels to monitor
for new outbreaks and catch them early.⁴ Determining who people’s friends are
would be time-consuming, however, something we could accomplish on a single
college campus but perhaps not nationally. Now self-identified friends in massive
online social networks may make this task much easier. Online social networks like
Facebook, while not designed to help monitor for outbreaks, have created relatively
easy-to-monitor systems that can be mined to determine the frequency of illness, identify
social sentinels, and perhaps eventually provide predictions for spread of a new
agent within a community.
When John Snow created the first Geographic Information System in 1854, he took
actions that would seem very logical and straightforward to us today. He took a
map, he plotted where sick people were, and he plotted possible sources of conta-
gion. Snow could not have predicted the directions in which his first tentative step
would lead or the data that would eventually become available for today’s GIS.
In the end it may be that no single data source reigns supreme. If Snow were
alive today and investigating an outbreak, he’d want it all. He’d want to know where
the sick people were, and he’d be glad to get the data more quickly and easily
through text messages or Internet searches. He’d like to know exactly what the cases
were infected with, down to the very specific microbial genetic strain. He’d seek to
use call data records to monitor people’s movements in order to track the move-
ment of the disease or where it was seeded. He’d like to know how people were
connected socially, and he’d certainly follow individuals who were likely to become
infected first or show signs earlier than the rest.
You can imagine the ultimate outbreak GIS, or in terms more familiar to Silicon
Valley, what Lucky Gunasekara, the head of my data team, calls the ultimate out-
break mash-up: a map with layer after layer of critical information—where people
are, what they’re concerned about, what they’re infected with, where they’re mov-
ing, and who they’re connected to. Developing and maintaining this combined dig-
ital and biological mash-up is the precise objective of Lucky’s team and something
to which we’ll return in the final chapter of this book. Ideally, over time the data can
be analyzed jointly, the predictive models can be trained on actual outbreaks, and
the various data streams can be weighted optimally to maximize predictive power.
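In schematic form, the mash-up might combine several per-region signals into a single risk score, as in the sketch below. The signals, weights, and regions are hypothetical; in practice the weights would have to be learned from data on real outbreaks.

```python
# A schematic of the "mash-up" idea: several per-region signals combined into one risk
# score. The signals, weights, and regions are hypothetical; in practice the weights
# would be learned from data on real outbreaks rather than set by hand.
signals = {                       # each value is a normalized signal between 0 and 1
    "region-A": {"search_spike": 0.7, "clinic_reports": 0.6, "call_anomaly": 0.2},
    "region-B": {"search_spike": 0.1, "clinic_reports": 0.0, "call_anomaly": 0.1},
}
weights = {"search_spike": 0.3, "clinic_reports": 0.5, "call_anomaly": 0.2}

def risk_score(region_signals: dict) -> float:
    """Weighted sum of the available signals for one region."""
    return sum(weights[name] * value for name, value in region_signals.items())

for region, region_signals in signals.items():
    print(region, round(risk_score(region_signals), 2))   # region-A stands out
```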
When people ask me whether or not I’m optimistic about the future of predicting
pandemics, the answer is always a resounding yes. Given the first two-thirds of this
book, you may wonder if my optimism is warranted. A steady wave of intercon-
nectedness among humans and animals has created a perfect storm for new pan-
demics. That is true. Yet the interconnectedness among humans that now exists
through communication and information technology gives us unprecedented
capacity to catch outbreaks early, which, when combined with amazing advances in
our ability to study the diversity of the tiny life forms that cause epidemics, cer-
tainly makes optimism warranted.
What will win out in the end? Will pandemics sweep through the human popu-
lation destroying millions of lives? Will technology and science ride in to the res-
cue?