Max Griswold: Big Data + Small Stories = Better Policy
Diane Baldwin/RAND Corporation
What Max Griswold (cohort '18) likes best about big data are the small stories.
"Every single row in a dataset tells a story, sometimes tragic, sometimes uplifting," he explains. "When I'm working through the data, I spin the wheel and look at random lines, reading the stories they tell."
Looking at individual lines is imperative, he says. "No, you can't read them all, but you have to make sure the data extraction is valid." On a recent project examining Medicare claims data, he noticed "there was one patient who had two heart attacks and had $7 million in charges in one year. Was that accurate? It turns out it was. This was a crazy example, and you have to make sure."
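The article doesn't show how those random lines get pulled, and on files too large to load into memory a naive shuffle isn't an option. One standard way to spot-check rows from a stream of unknown length is reservoir sampling, sketched below in Python; the `sample_rows` helper, column names, and toy claims table are illustrative assumptions, not details from the project.

```python
import csv
import io
import random

def sample_rows(lines, k, seed=0):
    """Reservoir-sample k rows from an arbitrarily large CSV stream.

    Memory use stays at O(k) regardless of how many rows the file has,
    so individual records can be spot-checked without loading the dataset.
    """
    rng = random.Random(seed)
    reader = csv.reader(lines)
    header = next(reader)          # keep the header out of the sample
    reservoir = []
    for i, row in enumerate(reader):
        if i < k:
            reservoir.append(row)  # fill the reservoir first
        else:
            # Replace an existing pick with probability k / (i + 1)
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return header, reservoir

# Hypothetical toy claims file standing in for real Medicare data.
data = "patient_id,charges\n" + "\n".join(f"p{i},{i * 10}" for i in range(1000))
header, rows = sample_rows(io.StringIO(data), k=5)
print(header)  # ['patient_id', 'charges']
for row in rows:
    print(row)
```

Because each surviving row is an actual record, an analyst can read its "story" directly, which is exactly the kind of validity check described above.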
Griswold has worked with big data and scalable computing for more than a decade now, and he's seen it change greatly in that time.
"My work on any big-data project these days involves running a statistical model millions of times, on sources of data so large that we wouldn't have even been able to open the dataset 10 years ago," he says.

Before he began his Ph.D. studies at Pardee RAND and started working with researchers in the Center for Scalable Computing and Analysis, he was a senior research scientist at the Institute for Health Metrics and Evaluation, an independent population health research center at UW Medicine, part of the University of Washington.
Although his academic background is in economics, his role at IHME was epidemiological. "IHME has the largest longitudinal collection of health data in the world, and I would use big data every day," he says. "University of Washington had one of the first academic computing clusters in the U.S., and I did a lot of alcohol, drug, and occupational health and safety research with data from 30 years of longitudinal surveys — 50 to 70 thousand data sets per topic area — and large statistical models estimating attributable risk."
While at IHME he coauthored more than a dozen articles published in The Lancet and JAMA documenting findings from the multidecade Global Burden of Disease Study. Topics ranged from the United Nations' health-related Sustainable Development Goals to the burden of cardiovascular and other diseases in individual countries including the United States, Brazil, and Russia.
Griswold came to Pardee RAND to grow and strengthen his analytic skills, especially — but not only — in the area of health care data analysis.
As part of Pardee RAND's On-the-Job Training requirements, Griswold is working on a range of projects that involve scalable computing both at RAND and in the cloud.
The project in which he's exploring Medicare and Medicaid claims data, for example, aims to determine appropriate pricing for Medicare Advantage. "We're looking at 10-20 thousand different disease codings, huge demographic datasets, and trying out machine learning models on data to determine pricing," he says. "To do this research we're using a Hadoop cluster for data extraction. We have a data use agreement with CMS, so all analysis has to be done in house at RAND."
A different project has him analyzing geospatial data on Amazon Web Services. "We're looking at the built environment's effects on determinants of crime. We have a complete census of buildings in Los Angeles, Pittsburgh, Detroit, and New Orleans, overlaid with incidents where police were called for firearms-related violent crimes. We're looking at the relationship between buildings and crime," he explains.
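The study's actual geospatial tooling isn't described in the article; a minimal sketch of the overlay idea is to count incidents within a fixed radius of each building using great-circle distance. Everything below — the `incidents_near` helper, the building names, and the coordinates — is hypothetical and stands in for the real building census and incident data.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def incidents_near(buildings, incidents, radius_m=100.0):
    """For each building, count incident points within radius_m meters."""
    return {
        name: sum(
            1 for inc in incidents
            if haversine_m(lat, lon, inc[0], inc[1]) <= radius_m
        )
        for name, (lat, lon) in buildings.items()
    }

# Hypothetical coordinates; the real work uses a complete building census.
buildings = {"vacant_lot": (34.0500, -118.2500), "library": (34.0600, -118.2600)}
incidents = [(34.0501, -118.2501), (34.0502, -118.2499), (34.0700, -118.2700)]
print(incidents_near(buildings, incidents))
```

At city scale this brute-force pairwise comparison would be replaced by a spatial index or a cloud-backed spatial join, but the underlying question — which built structures sit near clusters of calls for service — is the same.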
As with the Medicare study, he says, "Examining random subsets also helps my research. It can be really purposeful to understand the stories behind the data, even if — in this case — so many of the stories are tragic."
Griswold is in the planning stages of his dissertation, which will also involve a hefty amount of data analysis. His working title is "Mapping the Obesity Epidemic: A New Approach to Meta-Analysis," and his goal is to determine the underlying causes of the epidemic. "It's a huge big-data problem. We have a lot of data," he says. "It just needs to be analyzed."
"Big data is definitely my bread and butter," he says, "but I see it as a means to an end, to improve policy. And you need to understand the stories, what the data looks like in terms of lived experience, to see the big picture."