
What Does Working with Large Claims Databases Mean For Researchers?

In 2017, the annual premiums for employer-sponsored health insurance for a family of four were $18,764. It turns out that you can get a pretty decent new car for that kind of money. A 2018 Toyota Corolla in Barcelona Red Metallic paint has an MSRP of $18,550. And, because of the insurance mandate (which I support) in the Affordable Care Act, every working family in the US is, for all intents and purposes, being compelled by law to buy a Toyota Corolla’s worth of insurance each year. What are we getting for our money, and how do those costs vary?

My research focuses on the factors that drive health spending growth for the privately insured and that influence variation in health care spending across the US. My work has benefitted tremendously from the growing availability of health insurance claims data. As recently as a decade ago, because of limited data availability, most of our understanding of health spending came from the analysis of Medicare data. I have been fortunate to work with mountains of data from the Health Care Cost Institute (HCCI). The HCCI data include insurance claims from three of the five largest insurers in the US and contain detailed information on health spending for nearly a third of individuals in the US with employer-sponsored coverage. Recently, HCCI announced that it was making these data accessible to all researchers.

Below, I discuss a few key topics for researchers who want to work with large insurance claims databases.

Research Questions Still Matter, Maybe More Than Ever

There is a temptation for researchers to view so-called big data as an end, not a means. Increasingly, universities are bragging about their investment in cutting-edge analytics. It’s good to see this investment, but ultimately, research is only as good as the questions that researchers are asking. There is a risk that, with so much data, scholars will produce a flood of statistics that are nearly impossible to put in a wider context. Research has to be motivated by a question (e.g., why does health spending on the privately insured vary across the US by a factor of three?). Researchers must focus, more than ever, on working out which questions are the most important. The growing availability of insurance claims data provides new avenues for researchers to answer those questions. Game-changing research occurs when scholars use new methods and new data to answer hugely important questions and can then translate their findings for non-academic audiences. Ultimately, ‘big data’ is another tool for researchers to answer important questions.

The Ability to Visualize Data and Results is Critical

Researchers need to think hard about how to visualize their results. Even the best analysis is of limited use if it cannot be digested by other researchers and, crucially, by policymakers. While most researchers can understand outputs from a multinomial logistic regression, many policymakers and journalists can’t. The growing availability of data and the credibility revolution in applied econometrics have driven researchers to use ever more technical analytic tools. It is crucial that researchers can still express and present their findings to those outside their field and outside the academic community.

More data actually allows researchers to create better visualizations of their findings. How do you get better at visualizing your results? Go to seminars, read journal articles, analyze how data-driven journalists (e.g., The Upshot at The New York Times) present their stories, and see how the best researchers in the world are presenting their empirical results. I’m a huge fan of how Raj Chetty at Stanford University has visualized his analysis of tax data to study intergenerational mobility. Are there lessons from how he has presented his data that can inform your work?

My goal is to be able to present the key message of my papers in three fairly simple figures.

How Research Teams Are Structured

Producing empirical research increasingly requires research assistants and mountains of code. With so much data, it is almost impossible for a single researcher to do all of the analysis alone. Research assistants are crucial to my work. These 23- to 30-year-olds work with me for two or three years and then go on to PhD programs. As research assistants become more and more important, researchers need to think more about becoming effective managers, developing the ability to identify and nurture talent, and understanding how to run teams. These skills are not generally taught in PhD programs. Maybe they should be.

 

Zack Cooper’s work has analyzed health spending patterns for the privately insured, the drivers of surprise billing in hospital emergency departments, and the links between politics and health spending. Cooper’s work has been presented at the White House, the Department of Justice, the Federal Trade Commission, and the Department of Health and Human Services and has been featured extensively in the popular press.

Author

Zack Cooper, PhD

Assistant Professor of Public Health (Health Policy), Assistant Professor of Economics, and Assistant Professor in the Institution for Social and Policy Studies - Yale University

Zack Cooper is an Assistant Professor of Health Policy and of Economics at Yale University.
