Using Data Science to Find the Next El Niño

The El Niño/La Niña pattern in the Pacific Ocean is notorious for its long-distance effects on weather as far away as Africa and the Midwestern United States. But climate experts also know of several other such patterns, known as “teleconnections,” and believe that there are many more to be discovered.

The new TRIPODS+Climate project, a collaboration among the University of Wisconsin-Madison, the University of Chicago, and the University of California-Irvine, will develop novel data science tools to sniff out these hidden patterns, improving weather forecasts and scientific understanding of global climate.

The collaboration is an expansion of the National Science Foundation TRIPODS program, which funded several research centers in 2017 to explore the fundamentals of data science—the modern intersection of mathematics, statistics, and computer science. Stephen Wright, Professor of Computer Sciences at UW-Madison and the Wisconsin Institute for Discovery, and Rebecca Willett, Professor of Statistics and Computer Science at UChicago (and an alumna of WID), lead one of the TRIPODS Institutes. With TRIPODS+Climate, they will work with a team of climate scientists to apply data science methods such as machine learning, network analysis, and predictive modeling to the growing flood of climate data.

“There are fundamental challenges pervasive in data science that are epitomized in the climate science setting, making this collaboration a nice opportunity for advances on a number of fronts,” Willett said. “The question really is, can we find some middle ground that’s going to allow us to harness climate data as fully as possible without ignoring existing physical models of climate?”

While El Niño, formally known as the El Niño–Southern Oscillation, is the best-known climate teleconnection, scientists have found many similar patterns in the Pacific and Atlantic Oceans. For example, TRIPODS+Climate co-investigators at UC-Irvine led by Prof. Efi Foufoula-Georgiou recently found that sea temperature changes near the coast of New Zealand strongly predict precipitation changes three months later and thousands of miles away in the southwestern United States.
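
As a rough illustration of what a lagged relationship of this kind looks like in data, the Python sketch below scans several monthly lags for the strongest correlation between two series. It is not the UC-Irvine team's method, which the article does not describe. The function name `lagged_correlation`, the variables `sst` and `precip`, and the synthetic data (built with a three-month lag on purpose) are all assumptions made for this example.

```python
import numpy as np

def lagged_correlation(driver, response, lag):
    """Pearson correlation between driver[t] and response[t + lag]."""
    driver = np.asarray(driver, dtype=float)
    response = np.asarray(response, dtype=float)
    if driver.shape != response.shape:
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(driver) - 1:
        raise ValueError("lag must be between 0 and len(series) - 2")
    # Shift the response forward in time relative to the driver.
    x = driver[:len(driver) - lag]
    y = response[lag:]
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic monthly anomalies (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n_months = 240
sst = rng.standard_normal(n_months)        # stand-in for sea temperature
precip = rng.standard_normal(n_months)     # stand-in for precipitation
# Make precipitation follow sea temperature three months later, plus noise.
precip[3:] = 0.8 * sst[:-3] + 0.3 * rng.standard_normal(n_months - 3)

# Scan lags of 0 to 6 months and keep the one with the strongest correlation.
correlations = {lag: lagged_correlation(sst, precip, lag) for lag in range(7)}
best_lag = max(correlations, key=lambda lag: abs(correlations[lag]))
print(best_lag, round(correlations[best_lag], 2))
```

On real observations a scan like this is only a starting point, since trends, seasonality, and autocorrelation can all produce strong lagged correlations without any physical link.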

But despite an unprecedented increase in the volume and resolution of climate observations, these phenomena are difficult to detect in the data. Researchers working with high-dimensional and noisy data must spot complex relationships across geography and time while ruling out spurious correlations and other false positives. Enter data science.

“Interrogating observations and climate model outputs to discover, characterize and understand climate modes of variability and change is fundamental for improving seasonal to sub-seasonal forecasts,” said Foufoula-Georgiou. “However, the large internal variability of the climate system, non-stationarities and space-time dependencies make it hard to discern causal predictive relationships.”

TRIPODS+Climate will create new methodologies in machine learning and network estimation that reveal the structure of the Earth’s climate system and its regional hydroclimatic impacts. Machine learning, where statistical algorithms use large datasets to detect patterns and make predictions, can be used to find teleconnections previously hidden from human observation. Network estimation methods can mathematically conceptualize global climate as an interconnected structure of nodes, so that scientists can better quantify and understand complex influences across geography and time.
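
One simple way to picture climate as a network of nodes is to treat each location as a node and connect two nodes when their time series are strongly correlated. The Python sketch below does this by thresholding a correlation matrix. It is a deliberately basic stand-in, not the network estimation methodology the project plans to develop. The function name `correlation_network`, the threshold of 0.5, and the synthetic data are assumptions chosen for illustration.

```python
import numpy as np

def correlation_network(series, threshold):
    """Build an undirected network from time series.

    series: array of shape (n_times, n_nodes), one column per location.
    Returns a boolean adjacency matrix in which two nodes are linked
    when the absolute correlation of their series is at least `threshold`.
    """
    corr = np.corrcoef(series, rowvar=False)
    adjacency = np.abs(corr) >= threshold
    np.fill_diagonal(adjacency, False)  # no self-links
    return adjacency

# Synthetic data: six hypothetical locations, 500 time steps of noise.
rng = np.random.default_rng(1)
n_times, n_nodes = 500, 6
series = rng.standard_normal((n_times, n_nodes))

# Give locations 0 and 4 a shared signal, mimicking a long-distance link.
shared = rng.standard_normal(n_times)
series[:, 0] = shared + 0.3 * rng.standard_normal(n_times)
series[:, 4] = shared + 0.3 * rng.standard_normal(n_times)

adjacency = correlation_network(series, threshold=0.5)
# List each undirected edge once by reading the upper triangle.
edges = [(int(i), int(j)) for i, j in zip(*np.nonzero(np.triu(adjacency)))]
print(edges)
```

A fixed threshold is the crudest possible choice. With thousands of locations, some unrelated pairs will cross any threshold by chance, which is one reason more careful statistical estimation is needed.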

“Data science techniques are especially useful for sifting through massive troves of data to discover unexpected relationships between events,” Wright said. “We have seen examples of this phenomenon in the relationships between genetics, environment, and disease. Climate science is an area in which very large collections of data are ready and waiting to be analyzed.”

These tools will then be used to build new computational climate models and create new platforms for climate diagnostics and prognostics, improving seasonal and subseasonal forecasts. More accurate predictions will help scientists and policymakers understand and prepare for climate change, extreme weather events, and water allocation under conditions of high or low precipitation.

Like the other TRIPODS+X programs announced today by the NSF, TRIPODS+Climate will also strengthen the broader data science community by training students and post-docs at the interface of data and climate science.

“This project will help spread the influence of modern data science through the climate community, and put young data science researchers in touch with a critical area of research that is a rich source of data analysis problems,” Willett said.

Author: Rob Mitchum, University of Chicago
