Wastewater-Based Epidemiology for Biosecurity - Demystified
A non-technical explainer for anyone who wants to understand the fundamentals of a technology that can help us fight pathogens old and new
Imagine a tool that can spot the spread of a virus before anyone even shows symptoms. It might sound like science fiction, but this is the promise of wastewater-based epidemiology - sometimes called WBE or wastewater epi. In this explainer, I’ll explore the basics of this powerful public health tool, its limitations, and what we can expect from it in the future.
The Challenge: How Do We Detect Outbreaks Early?
It’s the early hours of an outbreak - doctors suspect a highly concerning virus when they diagnose one or a few patients. To have any chance of containment, public health professionals race to learn how many people are sick, where they are, and whom they have contacted. Equally important, they calculate measures like basic and effective reproduction numbers, generation time, serial intervals, and lethality - these help characterize the outbreak’s scale, severity, and containability.
What do these words mean?
Basic reproduction number (R0): How many people one person can infect when no one is immune and no protections are in place.
Effective reproduction number (Rt): How many people one person is actually infecting at a specific time, considering factors like immunity or prevention measures.
Generation time: The average time between someone getting infected and passing it to someone else.
Serial intervals: The time between when one person shows symptoms and when the person they infected shows symptoms.
Lethality: The percentage of people who die from the virus out of those who get infected.
Patient-level testing and contact tracing are, of course, the first and most important steps to determine how an outbreak should be addressed. However, traditional diagnostics have limitations and rely heavily on human behaviors. Patients may not be tested if they are asymptomatic, lack access to testing services, or simply do not want to test. Even if patients are tested, results can be delayed or inaccurate. When patients use at-home rapid tests, most results will never get reported to public health officials. Additionally, privacy regulations require the removal or masking of personal identifiers before data is aggregated and shared, which can further slow down the reporting process.
To be clear, wastewater epi should be used to complement patient-level testing, not replace it. WBE doesn’t tell us which individual is sick, how severe their illness is, or with whom they have been in contact. We will always need patient-level data to answer those questions. But wastewater can help fill in the gaps left by patient-level testing by telling us sooner, more accurately, and more cheaply whether, and how much, a virus is circulating within a community.
Testing challenges in action
At the start of COVID-19, patient-level testing faced significant challenges. The U.S. CDC's initial tests were flawed, and strict federal testing criteria delayed diagnoses, even for patients showing symptoms. By late February 2020, fewer than 4,000 tests had been conducted nationwide, meaning there was little understanding of the scale and severity of the outbreak. In fact, it wasn't until researchers from the Seattle Flu Study—defying federal and state guidelines—tested flu samples for COVID-19 that community transmission was confirmed.
Getting on the Same Page: What Is Wastewater-Based Epidemiology?
Before diving in, let’s clarify a key distinction between wastewater epi and environmental wastewater monitoring.
Environmental wastewater monitoring usually means a focus on water quality and addressing Clean Water Act concerns like pollution, factory runoff, and contamination. Wastewater epi, on the other hand, looks at wastewater samples to understand the health of a community. Instead of checking for pollution, it checks for things like viruses, chemicals, or even drugs to see what might be circulating in a population. These two types of wastewater monitoring do not need to be mutually exclusive; however, they generally involve different priorities, methods, and stakeholders.
While people usually talk about wastewater epi in terms of biological concerns, like viruses, it can also be used to look at chemicals. Analyzing chemicals in wastewater has been a helpful tool in combating the opioid epidemic.
We could talk all day about wastewater epi, but for the sake of brevity, for the rest of this explainer we will focus specifically on how biological wastewater epi is used in the U.S.
How Does Wastewater-Based Epi Work?
At its core, wastewater epi requires three key components:
1. Samples: Location, Location, Location
Population Size:
Choosing the right sample location is crucial. The granularity of the data depends on where the sample is collected.
For instance, let’s say all wastewater from a large city is treated at a single wastewater treatment plant, and you collect your sample from the water flowing into that plant. The data you generate from that one sample will cover the entire city - maybe up to millions of people. On the other hand, let’s say you collect at manholes in each neighborhood. Those samples will tell you only about the people within the neighborhoods connected to the manholes.
When sampling wastewater from large catchment areas (meaning larger populations), the data is less noisy and the process is more economical: each sample yields high-quality data covering many people. When sampling from smaller catchment areas (meaning smaller populations), you can obtain more granular data to detect local trends, but the data tends to be noisier (making it difficult to interpret at times) and you increase the risk of encroaching on privacy.
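A toy simulation can make this noise trade-off concrete. This is a minimal sketch, not a real shedding model. It assumes that each person is infected independently and that each infected person sheds a random amount drawn from a lognormal distribution. The 5% prevalence, the `sigma=1.5` spread, and the function names are hypothetical choices for illustration. The sketch compares the day-to-day relative noise (coefficient of variation) of the per-person signal for a small and a large catchment with the same true prevalence.

```python
import random
import statistics

def daily_signal_per_person(population: int, prevalence: float, rng: random.Random) -> float:
    """One simulated day: total shedding from infected people, divided by population.

    Assumption: each person is infected independently with probability
    `prevalence`, and each infected person sheds a lognormally distributed
    amount (arbitrary units) to mimic person-to-person variability.
    """
    total = 0.0
    for _ in range(population):
        if rng.random() < prevalence:
            total += rng.lognormvariate(0.0, 1.5)
    return total / population

def relative_noise(population: int, prevalence: float = 0.05, days: int = 100, seed: int = 1) -> float:
    """Coefficient of variation (stdev / mean) of the daily signal across simulated days."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    values = [daily_signal_per_person(population, prevalence, rng) for _ in range(days)]
    return statistics.stdev(values) / statistics.mean(values)

small = relative_noise(population=100)    # e.g., a single building or small neighborhood
large = relative_noise(population=5_000)  # a larger catchment
print(f"relative noise, 100 people:   {small:.2f}")
print(f"relative noise, 5,000 people: {large:.2f}")
```

The true prevalence is identical in both runs; only the number of contributors differs. When more people are pooled into each sample, individual high shedders average out, so the larger catchment's signal is steadier. Real catchments add other noise sources (dilution, sampling, lab variability) that this sketch ignores.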
On the subject of privacy, the CDC released a report showing that 75% of adults in the U.S. support wastewater epi for infectious diseases, and 95% say they would take measures to protect themselves if wastewater showed that there was disease transmission in their area!
Accessing Sewage:
Regardless of your sampling strategy, you will also need to contend with real-world limitations in accessing wastewater. Whether you’re collecting from a wastewater treatment plant, a manhole, an airport, or even just one building, relationships with wastewater professionals are imperative as you will need cooperation and support from those who maintain these spaces.
When it comes to the actual sampling, not all wastewater is created equal.
Composite sample [gold standard]: involves collecting small amounts of wastewater throughout a 24-hour period. This method decreases noise and increases the likelihood that your data is truly representative of the population you sample from. Don’t worry, there are machines that can collect composite samples automatically!
Grab sample: you scoop up a sample at a single moment in time. This approach is straightforward and does not require specialized equipment; however, it only captures what is happening at that exact moment. Grab samples can be particularly problematic if the time of day chosen for sampling is inconsistent or correlates with unrelated population dynamics, such as the middle of the night vs. the middle of the day. Studies have shown (yes, these exist lol) that poop times vary!
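A toy comparison shows why timing matters. It is a sketch under made-up assumptions: the viral concentration in the sewer is assumed to follow a smooth daily cycle that peaks at 8 a.m., and `signal_at`, `composite_sample`, and `grab_sample` are hypothetical helpers. The composite here is time-proportional (equal aliquots at fixed intervals). Many autosamplers can also collect flow-proportional composites, which this sketch does not model.

```python
import math

def signal_at(hour: float) -> float:
    """Hypothetical viral concentration (copies/L) in the sewer at a given hour.

    Assumed daily cycle: averages 1,000 copies/L, peaks at 8 a.m.,
    and bottoms out at 8 p.m. Illustrative numbers only.
    """
    baseline, amplitude, peak_hour = 1000.0, 600.0, 8.0
    return baseline + amplitude * math.cos(2 * math.pi * (hour - peak_hour) / 24)

def composite_sample(aliquots_per_day: int = 24) -> float:
    """Time-proportional composite: equal aliquots spread evenly over 24 hours, then mixed."""
    step = 24 / aliquots_per_day
    aliquots = [signal_at(i * step) for i in range(aliquots_per_day)]
    return sum(aliquots) / len(aliquots)

def grab_sample(hour: float) -> float:
    """A single scoop at one moment in time."""
    return signal_at(hour)

print(f"24-hour composite: {composite_sample():.0f} copies/L")
print(f"grab at 8 a.m.:    {grab_sample(8):.0f} copies/L")
print(f"grab at 8 p.m.:    {grab_sample(20):.0f} copies/L")
```

Under these assumptions, the composite lands on the true daily average (1,000 copies/L). Grab samples taken at different hours differ fourfold (1,600 vs. 400 copies/L), even though nothing changed in the community.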
2. Validated Assays: Decoding the Data
Once the samples are collected, they’re sent to a lab for analysis. The key question here is whether the target biomarker is present in fecal matter, urine, or skin particles (that might come off in the shower). In fact, scientists have only just begun to answer the question “what diseases can we find in wastewater?”
Scientists have already published a wealth of research and detailed methodologies for analyzing wastewater for pathogens like SARS-CoV-2, influenza, RSV, polio, mpox, hepatitis A, and norovirus, and even for chemicals associated with opioids and illicit drugs.
What else could we find in wastewater?
In theory, wastewater epi could extend far beyond the targets listed above. Inflammatory bowel disease, colorectal cancer, celiac disease, Clostridioides difficile (C. diff), kidney disease, bladder cancer, prostate cancer, tuberculosis, diabetes, cirrhosis, hepatitis C, Zika, Ebola, malaria, HIV, and syphilis all have biomarkers that can be found in individual stool and urine samples. Our bodies also shed other chemicals in the bathroom that tell us a lot about how a community is doing, like adrenaline, cortisol, caffeine, and prescription drugs. To be clear, researchers have not yet looked at these additional diseases or chemicals in wastewater at any kind of scale, but it sure would be cool if they did!
Answering “How Much”
People infected with SARS-CoV-2 shed viral RNA in their feces, which can be detected in wastewater via PCR-based assays. These PCR tests are not too dissimilar to the tests you likely became familiar with during the COVID-19 pandemic - but it’s not as simple as sticking a Q-tip in the toilet:
qPCR (Quantitative PCR): amplifies viral genetic material and measures it in real-time, allowing you to see not just if a virus is present but also how much of it is in the sample.
dPCR (Digital PCR) and ddPCR (Droplet Digital PCR) [gold standard]: more precise forms of PCR, breaking samples into tiny compartments (or “droplets”) for individual analysis. These methods provide more accurate counts, especially when virus levels are very low. ddPCR, in particular, is highly sensitive, making it ideal for tracking early or late-stage outbreaks when viral load is low.
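Digital PCR can give absolute counts without a standard curve because of a simple statistical correction. Copies land in partitions at random, so some partitions receive more than one copy. The mean number of copies per partition is therefore estimated from the fraction of positive partitions using the Poisson relationship λ = -ln(1 - p). The minimal sketch below assumes a hypothetical function name and a default partition volume of 0.85 nL. Both are illustrative assumptions: droplet size differs between platforms, so check your instrument's documentation.

```python
import math

def dpcr_copies_per_microliter(positive: int, total: int, partition_volume_nl: float = 0.85) -> float:
    """Estimate target concentration in a digital PCR reaction.

    positive: number of partitions (droplets) that lit up
    total: number of partitions analyzed
    partition_volume_nl: volume of one partition in nanoliters (assumed default)
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    fraction_positive = positive / total
    # Poisson correction: mean copies per partition, accounting for partitions
    # that held two or more copies but still count as a single "positive".
    copies_per_partition = -math.log1p(-fraction_positive)
    partition_volume_ul = partition_volume_nl / 1000.0  # nL -> µL
    return copies_per_partition / partition_volume_ul

naive = (10_000 / 20_000) / (0.85 / 1000.0)  # ignores partitions holding multiple copies
corrected = dpcr_copies_per_microliter(10_000, 20_000)
print(f"naive: {naive:.0f} copies/µL, Poisson-corrected: {corrected:.0f} copies/µL")
```

When half the partitions are positive, the naive count (about 588 copies/µL) undercounts the corrected estimate (about 815 copies/µL) by more than a quarter. At very low concentrations, almost no partition holds two copies, so the two estimates converge. This is part of why digital PCR works well when virus levels are low.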
Answering “What Kind”
Knowing which variants are circulating helps us understand if a more virulent or lethal version of the virus is spreading. This is extra important if a new mutation might render current vaccines, monoclonal antibodies, and/or antivirals ineffective.
Sequencing (especially Whole Genome Sequencing): reads the entire genetic code of a virus, helping to identify specific variants or mutations.
Metagenomic Sequencing: analyzes genetic material from all organisms in a sample, allowing for the detection of multiple pathogens and their genetic variations without prior knowledge of what is present.
There are other resources out there that do a better job of explaining how these assays work. What is important to understand at a basic level is that any wastewater epi program will need to develop, through a process of trial and error, an assay that is run on specialized equipment by skilled personnel. If different programs use different methods, comparing their results becomes very challenging.
3. Data Analysis: Making Sense of the Results
Normalization:
After running the assays, the next step is analyzing the data. One crucial process is normalization, which helps account for variables like population size or weather events.
For example, if a city’s population temporarily doubles due to an influx of conference attendees, the amount of virus detected in wastewater might appear to spike—when, in reality, the infection rate could remain the same. Normalizing the data helps avoid incorrect conclusions and allows for more accurate tracking of infection rates.
There are generally three approaches to normalization:
Biomarker normalization [gold standard]: This method ideally involves using a consistent biomarker like pepper mild mottle virus (PMMoV), which is nearly universally present in human waste. PMMoV levels provide a stable reference point to adjust the concentration of a target substance like a pathogen. Normalizing with PMMoV allows us to adjust for the amount of human waste in a sewer system, i.e. adjusting for community population fluctuations. This method helps ensure the data reflect the true viral load in a population.
Flow normalization: This approach involves adjusting the concentration of the target substance based on the total volume of wastewater processed during the collection period. In theory, if twice as many people are using bathrooms, the wastewater volume should increase accordingly. However, this method can be flawed due to external factors like rainstorms, which can increase water runoff and dilute the samples, or varying water usage efficiency across different systems.
Population normalization: Similarly, population normalization involves adjusting the concentration of the target substance based on an estimate of the contributing population, but this can introduce uncertainty if the population is not well defined or if good population data is lacking. This might work if you know the exact number of people who worked in a building complex on a given day; however, any fluctuations not accounted for (a visiting tour, or sick people staying home) would introduce noise and inaccuracies into the data.
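The three approaches can be compared in a toy model that replays the conference-attendee and rainstorm scenarios from above. This is a hedged sketch, not a production pipeline. All per-person shedding and water-use figures are invented, and the helper names are hypothetical. The model also treats PMMoV as perfectly constant per person, which real-world data is not, because diet and other factors make it vary.

```python
from dataclasses import dataclass

# Invented per-person daily values, for illustration only.
TARGET_SHED_PER_INFECTED = 1e9  # pathogen copies shed per infected person per day
PMMOV_SHED_PER_PERSON = 1e10    # PMMoV copies shed per person per day
WATER_USE_PER_PERSON_L = 300.0  # liters of wastewater per person per day

@dataclass
class Sample:
    target_copies_per_l: float  # measured pathogen concentration
    pmmov_copies_per_l: float   # measured PMMoV concentration
    flow_l_per_day: float       # plant influent flow on the sampling day

def simulate_day(population: int, prevalence: float, stormwater_l: float = 0.0) -> Sample:
    """Toy 24-hour composite at a treatment plant; stormwater dilutes everything."""
    flow = population * WATER_USE_PER_PERSON_L + stormwater_l
    target_load = population * prevalence * TARGET_SHED_PER_INFECTED
    pmmov_load = population * PMMOV_SHED_PER_PERSON
    return Sample(target_load / flow, pmmov_load / flow, flow)

def biomarker_normalized(s: Sample) -> float:
    """Pathogen copies per PMMoV copy (unitless)."""
    return s.target_copies_per_l / s.pmmov_copies_per_l

def flow_normalized(s: Sample) -> float:
    """Copies per person per day, using flow as a proxy for how many people contributed."""
    estimated_population = s.flow_l_per_day / WATER_USE_PER_PERSON_L
    return s.target_copies_per_l * s.flow_l_per_day / estimated_population

def population_normalized(s: Sample, census_population: int) -> float:
    """Copies per person per day, using a fixed population estimate."""
    return s.target_copies_per_l * s.flow_l_per_day / census_population

CENSUS = 100_000
scenarios = {
    "typical day": simulate_day(100_000, prevalence=0.01),
    "conference (2x people)": simulate_day(200_000, prevalence=0.01),
    "rainstorm": simulate_day(100_000, prevalence=0.01, stormwater_l=30_000_000),
}
for name, s in scenarios.items():
    print(f"{name:24} raw={s.target_copies_per_l:9.0f} copies/L  "
          f"PMMoV ratio={biomarker_normalized(s):.4f}  "
          f"flow-based={flow_normalized(s):.2e}  "
          f"population-based={population_normalized(s, CENSUS):.2e}")
```

Prevalence never changes, yet each simpler method is fooled once:

* The raw concentration halves in the rainstorm.
* Flow-based normalization mistakes stormwater for extra people and reports a drop.
* Population-based normalization with a stale census reports a doubling during the conference.

Only the PMMoV ratio holds steady, but in this toy model that is true by construction.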
Validation:
Validation is necessary to ensure that your assay is accurately measuring disease prevalence. Validation is particularly challenging because there is rarely a source of truth to compare against (that’s the whole point of wastewater surveillance - it shows what other data collection methods do not!).
The first and most straightforward validation method is to look at positive controls. In this approach, you prepare a series of samples with known concentrations of the target substance to ensure that the assay is working correctly and producing expected results.
However, factors outside of your lab can also affect your results. These range from temperature fluctuations that degrade viral material to changes in shedding patterns across different viral variants. Accounting for these kinds of real-world effects requires ongoing detective-style work from epidemiologists and data scientists. Often this means evaluating the wastewater data against other data sources, like hospitalization rates or patient-level test positivity. It can also include cross-referencing with independent wastewater epi programs run by other researchers, comparing notes and running experiments on an ongoing basis.
Data Visualization:
Communicating wastewater epi findings can be challenging for two primary reasons: non-standard geographic units and unfamiliarity with wastewater concentration data as an output.
Consider what people consuming wastewater epi data usually want to know: how many people are sick with a given virus in a given area. But as we just learned, wastewater data can’t tell you exactly how many people are sick. It can only tell you the concentration of a virus in wastewater, whether that is going up or down, by how much, and how much of each viral strain is present. In addition, wastewater catchment areas do not map neatly onto zip codes (or other standard geographic units). Thoughtful data analysis and visualization can help bridge the gap between this complex output and practical decision-making.
What are we measuring?
Wastewater epi results are typically reported in terms of “copies per liter.” “Copies” means how many pieces of the virus’s genetic code (DNA or RNA) were found in 1 liter of wastewater. The more copies detected, the more of the virus is likely circulating among the people who contributed to that sewage.
Check out how this looks in a graph. This is for Massachusetts from July 2024. You can pretty quickly see that southern MA saw an increase about halfway through the month. And if you look over to the Y-axis, you can see exactly how many “copies” were found in the sample. Thank you MA for the public data!
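Getting from a lab reading to copies per liter of wastewater means undoing each concentration and dilution step along the way. The minimal sketch below assumes a simple workflow: concentrate a known volume of wastewater, extract nucleic acids into a fixed elution volume, then load a portion of that extract into a PCR reaction. Every volume, the hypothetical function name, and the optional `recovery` fraction are illustrative assumptions; real protocols vary.

```python
def copies_per_liter_wastewater(
    copies_per_ul_reaction: float,
    reaction_volume_ul: float,
    template_volume_ul: float,
    extract_volume_ul: float,
    wastewater_volume_ml: float,
    recovery: float = 1.0,
) -> float:
    """Back-calculate copies per liter of wastewater from a PCR result.

    recovery: assumed fraction of virus that survives concentration and
    extraction (1.0 means no losses). Some programs estimate it with a
    spiked-in control, which this sketch does not model.
    """
    copies_in_reaction = copies_per_ul_reaction * reaction_volume_ul
    # Only part of the extract went into the reaction; scale up to the whole extract.
    copies_in_extract = copies_in_reaction * (extract_volume_ul / template_volume_ul)
    # All of that extract came from this volume of wastewater.
    wastewater_liters = wastewater_volume_ml / 1000.0
    return copies_in_extract / wastewater_liters / recovery

result = copies_per_liter_wastewater(
    copies_per_ul_reaction=10.0,  # e.g., from a digital PCR readout
    reaction_volume_ul=20.0,
    template_volume_ul=5.0,
    extract_volume_ul=100.0,
    wastewater_volume_ml=50.0,
)
print(f"{result:,.0f} copies/L")
```

Here the chain is as follows: 10 copies/µL × 20 µL gives 200 copies in the reaction. Scaling from the 5 µL template to the 100 µL extract gives 4,000 copies. Dividing by 0.05 L of wastewater gives 80,000 copies per liter. Assuming only half the virus was recovered (`recovery=0.5`) would double that estimate.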
The Benefits and Limitations of Wastewater Epi
Pro: Inclusive Surveillance
Not everyone has access to healthcare (1 in 3 Americans don’t even have a primary care physician), and most diseases and chemicals are not reportable to the CDC. But everybody uses the bathroom. About 83% of the U.S. is connected to sewers, and even people who rely on septic systems will likely still contribute to the wastewater “grid,” if you will, when they use the bathroom at work, at school, or elsewhere.
A thoughtful approach to collecting wastewater can result in a more inclusive and accurate measure of disease prevalence.
Pro: Early Warning Signal
Wastewater epi is particularly valuable for detecting diseases in which infected people shed the pathogen in their feces during the incubation period, before symptoms begin. This means the virus could appear in wastewater before the first hospitalization, allowing for earlier intervention, potentially weeks before traditional diagnostic methods would indicate a problem.
Exact lead times depend on program design and availability of hospitalization data, but generally the lead time for SARS-CoV-2 in wastewater compared to clinical data was about one to three weeks. Similar lead times have been found for influenza and RSV.
Pro & Con: Privacy Concerns
PCR and whole genome sequencing assays are naturally anonymized - that means they can tell you how much of a virus there is and what kind of strains are circulating, but not who, specifically, is sick. This can make wastewater epi attractive as a population monitoring tool when compared with patient-level testing data.
That being said, there are still privacy concerns when sampling smaller populations. If you were to sample one individual’s house, for example, you would know the virus in the wastewater probably came from that individual. Respiratory virus transmission is also relatively low-stigma, but privacy concerns are more likely to appear when testing for more sensitive targets - such as STIs or illicit drugs.
Legally speaking, there is little discussion of the ownership of sewage or sewage privacy beyond a broad recognition of the Fourth Amendment. Ethically, however, it is imperative that any wastewater epi program be transparent with, and receive buy-in from, the population it monitors, especially when dealing with small populations and culturally sensitive biomarkers. And there is no reason not to: as noted earlier, the CDC report found that most U.S. adults support wastewater epi for infectious diseases.
Con: Limited Granularity
Wastewater epi can tell us about trends (how disease prevalence is rising or falling), but it doesn’t provide patient-level data. If you’re monitoring for a family of potential pandemic pathogens and something shows up in wastewater, you can’t exactly rush patients into isolated care settings. If you’re monitoring for renewed polio transmission, you might pick it up in wastewater first, but you won’t know to whom to offer free vaccines. You could, however, know which neighborhoods to start sending resources to!
It would be a mistake to say that wastewater has no value because, by the time you find a pathogen in it, “the cat is out of the bag,” so to speak. It is better to hold the full nuance: wastewater can buy you valuable time and help target limited resources in a response, and it should be coupled with patient-level testing data from care settings.
The Future of Wastewater Epi
As a relatively young field, wastewater epi has enormous potential for growth. Three key themes will shape its future:
Standardization: Consistent sampling procedures, lab assays, and data analysis methods will make it easier to compare data across regions and even countries.
Sustainable Funding: With pandemic-related funding on the decline, finding new sources of support will be crucial. The private healthcare sector and defense organizations could play significant roles in sustaining this valuable technology.
Metagenomic Sequencing: Instead of searching for one specific pathogen, metagenomic sequencing allows us to analyze EVERYTHING in a sample, offering a powerful tool for detecting novel biological threats.
Conclusion
Wastewater epidemiology is a powerful tool in the biosecurity toolbox. It offers a more inclusive, timely, and cost-effective way to monitor disease prevalence across populations. While we’ve only scratched the surface today, I hope this introduction sparks your interest and encourages you to explore this fascinating field further.
Further reading and resources
CDC Center for Forecasting and Outbreak Analytics (CFA) Behind the Model
Wastewater Surveillance for Infectious Disease: A Systematic Review
This was created as part of the Bluedot Impact Biosecurity Fundamentals program