Academic Medical Center Chief Data Scientists: Pathways of the Pioneers

Among the many new data science programs, tools, and services that underpin today’s successful clinical research and care delivery operations, the position of chief data scientist (CDS) has recently emerged as a key innovation role in academic medical centers (AMCs). To better understand this trend and the paths that led to these roles, we met with recently appointed leaders at Georgetown University Medical Center (GUMC) and Northwestern University Feinberg School of Medicine (NUFSM). Here, we explore how their unconventional experiences positioned them to seize opportunity and become pioneers in institutional strategic data management with an emphasis on translational and clinical research.

First, here’s a quick background on the career paths of our spotlighted data leaders. In September 2018, Subha Madhavan, Ph.D., was named the first Chief Data Scientist and Director of GUMC’s new Office for Health Data Science and Informatics. Dr. Madhavan’s research career began with her formal Ph.D. training at the Uniformed Services University of the Health Sciences and postdoctoral training in molecular biology at Johns Hopkins University, coupled with a master’s degree in computer science from the University of Maryland. After her graduate work, she led programs in the new field of bioinformatics at a startup genomics company and then joined the National Cancer Institute at the advent of whole genome sequencing and genomic database research analysis. Since coming to Georgetown’s Lombardi Cancer Center, she has built the Innovation Center for Biomedical Informatics (ICBI) into a nationally leading program for health IT, data science, and clinical informatics. Now, in her new role, she’s extending her successes into new realms of the Georgetown and MedStar Health system data enterprise.

This past December, Nicholas Soulakis, Ph.D., was named Director of the Data Services Center for the Center for Data Sciences and Informatics (CDSI) at NUFSM. His career pathway was atypical in that informatics skills from his ‘day job’ became integrated with his studies in epidemiology at the University of Pittsburgh Graduate School of Public Health. His work centered on outbreak detection services for Ebola, hantavirus, AIDS, polio, and influenza. During this time, he worked full-time at the Center for Biomedical Informatics at the University of Pittsburgh during the day and took classes at night. He applied his technical skills in data warehousing, messaging routers, and data integration to his epidemiologic surveillance research. Dr. Soulakis’ work was thrust into demand at the turn of the 21st century, when the rise in bioterrorism threats took center stage. Part of his graduate studies took him to the New York City Department of Health, where he drew on practical experience in algorithm development and data analysis for daily public health operations, whether the issue was H1N1 flu, opioid overdoses, or heat waves. That professional experience gave him the working tools for population health problems and ultimately led him to Northwestern and the Department of Preventive Medicine.

Despite their non-traditional training paths, Drs. Soulakis and Madhavan shared with us some commonalities in their inspiring new work as both are affiliated with their institutional hubs for the NCATS Clinical and Translational Science Awards Program, and actively engage in training programs. We asked them to address some questions about their work and the opportunities ahead for data scientists.

What were the important initial projects that got you going and what approaches to research do you take?

Madhavan: An important thread throughout my career has been building databases and researcher-friendly tools for cancer research. I started out as a lab rat, really, just building software tools for imaging and genomic analyses at Hopkins. Noteworthy of this early work is the Repository of Molecular Brain Neoplasia Data (REMBRANDT) that I established early in my time at NCI to facilitate clinical information and genomic characterization data from patients with advanced brain tumors. I’m proud of our contributions to this through our Georgetown Database of Cancer (G-DOC system) which is a publicly available data resource for cancer research. I led the development of The Cancer Genome Atlas (TCGA) data portal while at NCI which helped shape my career in multi-disciplinary data science to drive translational research. These are leading resources that are spurring precision medicine programs in clinical translational research including with private sector partners. In recent years, we’ve built on these to support data science needs in many other research programs at Georgetown and with our partners at MedStar Health. The MedStar system is a very large, diverse health system in the Mid-Atlantic region that has its own research institute and, most importantly, strong supportive leadership that advocates for our partnership in many data projects. We also have many strong partnerships with other institutions, such as Howard University, Dept. of Veterans Affairs and many private partners where our data science can be leveraged in innovative research collaborations.

Soulakis: From my graduate work, and the atypical approach I took in my research, I have to say that passion and mission really forced me to learn things and do what it takes to accomplish the job. Initially, I followed the path of a traditional researcher in doing open studies and doing some algorithm development. I do a lot of graph theory and network analysis, for example. Now I use these skills and experiences and apply them in clinical departments to understand how you can use operational data for quality improvement projects and improve productivity. We use transactional data from EMRs, or scheduling data or customer relationship management (CRM) data, or paging system data – we develop systems to exploit data systems to answer important questions no matter what format.

What impact does a Center or an Office dedicated to data science have on advancing research?

Soulakis: Our dean provided resources for the CDSI which is led by Dr. Justin Starren and this has enabled us to aggregate, convene and support bright young researchers with good data science ideas and some innovative work. Simply put, it brings together critical ideas, tools and people. My job in this environment is to make data science available for everybody – no matter whether you are a skilled person or whether you want to increase the capabilities of your staff, retool, retrain. Really, my mission is to increase the data science abilities of the entire system no matter what your role or objectives. We have a lot of genomics programs now, for example, that have needs for tools that we can help support. In another case, I help with making cloud infrastructure available for financial management analysis and we set up project teams to solve important projects with informatics tools to use the data.

Madhavan: The objective of my position and our Office is to harness the power of information from multiple disparate data sources using advanced techniques such as machine learning and artificial intelligence to produce novel, actionable insights. In this role, I will work to develop our data science enterprise with a foundation on interdisciplinary teams and groups. To accelerate Georgetown’s data capabilities, we’ll be expanding our collaborative efforts with departments across campus as well as with our clinical partner, MedStar Health. In doing so, we’ll focus on current and emerging topics, such as digital health, precision medicine, integration of social determinants of health and behavioral health with other clinical data, real-world evidence, and apply informatics and data analytic approaches to leverage big data that will advance health care and improve outcomes.

It is likely that your position will help guide institutional practices for data sharing. What have you learned and what do we need to learn to engage in these practices and do so responsibly?

Madhavan: Yes, this is a big part of our future and we have a lot of work to do going forward. I’m quite proud of our track record on data sharing and the leadership position that Georgetown has taken in making research databases publicly available. We receive a lot of attention and interest from others, including from the Biden Cancer Initiative, the Global Alliance for Genomics and Health, and the National Cancer Institute Moonshot Program, where it is critical that genomic and clinical information is shared to maximize the speed of, and impact from, research. Nationally, we’ve made many important data contributions to these cancer programs and now we’re applying this in other areas of research to take on critical programs for health disparities, outcomes research, neuroscience, infectious diseases and behavioral health. We need new models for sharing, and also for working with our industry partners, and that’s part of our portfolio. It is important that we develop strong and trusted relationships with our providers and patients and do so in a responsible fashion. Trust is most important in all aspects of our work. We are fortunate to be working with colleagues from Georgetown’s Kennedy Institute of Ethics on these important areas, particularly those involving patient privacy, data security, and the ethical uses of data – especially in the area of artificial intelligence. We have many tools and talents to set the course properly for data sharing.

Soulakis: Indeed, this is important, and we’re taking a step-wise approach. Recently, I met with our chief information security officer (CISO) because, as you might know, there are many issues when a medical center places clinical data into a commercially available cloud service. We have all sorts of data storage and workflow sensitivities – particularly with highly sensitive, personally identifiable data sources. So, we’re starting with open, anonymous, and aggregated data that doesn’t have those complexities associated with it. For example, we’re heavily into the uses of open government data. We also have anonymous data from genomic sequencing projects. For instance, we’re looking at pneumonia in our critical care units and trying to depict models of pathogenesis. Our physics and engineering teams form great modeling teams that are working with this data in the cloud because none of it is personally identifiable. So, we’ve started there, and we’re working closely with our CISO, chief information officer, and the bioinformatics center staff, and establishing relationships that enable us to chart out the framework for the big things to come.

How do clinicians, researchers and collaborators find you and know of your capabilities?

Soulakis: Keep in mind, we’re really early in our journey. The primary place where we do this outreach is through our data science steering committee and we have institutional leadership, and broad organizational support. All of them are vested and committed to using more data-driven approaches and improving Northwestern’s research capabilities in data science. A lot of this becomes known through demonstration projects and increasing the practical applications of what we can do. There is so much misconception about what you can do with data science, artificial intelligence and machine learning – it’s the listening and going to meetings to understand what the problem is and seeing if we can help solve for it.

Madhavan: Throughout my time at Georgetown, I’ve had fantastic support from the leadership here and that helps immensely. Going back to our foundation in 2012 with the ICBI, each year we hold a symposium that showcases not only our team’s work but also work across the University in data science applications. This has been extremely important and we’ve had very effective communications and collaborations stemming from that. Another aspect of our work that helps build our data ecosystem is that we share a lot of our work on GitHub, so it’s transparent and engaging for others to know our capabilities. I feel we have been strongly received because of our open culture and commitment to sharing knowledge and resources wherever we can. I am grateful for our team’s approach to collaboration and all of our partners who bring important contributions to our community of data scientists.

What steps are you taking to help shape the health data science workforce and next generation of problem solvers?

Madhavan: Education is a core component of our initiative. I’m very pleased that we are launching our new master’s program in health information and data science (MS HIDS). We think we are well positioned to help train the workforce needed to support many elements of the health data ecosystem – from clinical research, regulatory sciences, and outcomes research to device manufacturers and more. The future is here and there are exciting career opportunities emerging; we anticipate our industry-driven program can create pathways that make it easier to make connections and find ways to contribute than it was when I started 20 years ago in informatics.

Soulakis: My formal didactic teachings are at the University of Chicago in population informatics, and evaluation methods in health informatics. I also lead a variety of graduate student mentoring projects in my routine work. I tell my students that to be an informatics scientist, your two greatest tools are the listening and observing processes. It’s fundamentally important in your collaborations to fully understand the shared vision with your colleagues of what is possible. Also, I’ve been a methodologist all through my career and that’s what I emphasize. That as a methodologist, I’m not the one formulating the hypothesis. I’m the person that provides the data science services, and I don’t get to have an opinion on the outcome or value of the project. I need to be objective and facilitate the analysis of the data, and it’s important for me to know that the person wanting the analysis is invested in the question, and regardless of whether the answer is positive or negative, that they are going to derive value from it.

How do we inspire and create pathways for more people like you?

Soulakis: I don’t want to get on a soapbox here, but I feel like there are certainly barriers to technical professions. A lot of it is socioeconomics, and I feel like we’re really missing an opportunity for some of the more creative, motivated, driven professionals given that the technical barriers are so high. If students can’t afford laptops, if they can’t get into programming classes, if they can’t afford that robotics club, then we’ve lost out. I think a lot of the STEM barriers are so high that they keep out some of the best students, and thus we have a real diversity problem. Personally, I would love to see more free training and more credentialing to get them to their first career opportunity. Those two things, coupled with passion and mission, can take you a really long way.

Madhavan: Well, this is important to me because, in my career, I was often the only woman in the room. I am grateful for both male and female mentors who have guided me throughout my career. I’ve also found that it is important for women to support and sponsor each other and make sure our voices are heard when the decisions are being made. There are not very many women with the title of chief data scientist or chief data officer. We have to change that. Some of this is cultural adaptation, and some of it is more emphasis on STEM education in our early years. Here at Georgetown we’re very conscious of this and we always make sure we have women scientists represented on panels and on our committees. We need to reinforce each other. I recall that I was guided by Dr. Laura Esserman, the breast cancer surgeon, researcher and informatician entrepreneur from UCSF, in her revolutionary work on adaptive clinical trials for cancer. She supported me and empowered me to take chances and speak up. As the years have gone on, I’ve learned to speak up more and I’ve been supported so much in our efforts for greater diversity and representation of women in our work here. I continue to engage with Georgetown Women in Medicine and Women in AMIA communities to advance this mission.

As the role of chief data scientist matures in academia and industry, we’ll watch for opportunities and support ideas for change that allow new career development paths to emerge in this exciting and high-demand area of research. We welcome our readers’ ideas on where AcademyHealth’s organizational capabilities should be applied to strengthen these opportunities.