The following events take place Tuesday, March 26th, a day prior to the Health Datapalooza.

Health Data Workshops

In collaboration with:

CareSet

Location: Washington Hilton (1919 Connecticut Ave., NW, Washington, D.C. 20009)
Cost: $150/class
Registration: Coming Soon

Session 1

Session 1A: Introduction to NPI and NPPES
Time: 9:00 a.m. - 12:30 p.m. ET
Location: Lincoln East

NPPES is the public data that lists every provider (both doctor and hospital) in the United States. It details where they provide treatment, what type of healthcare provider they are, and basic contact information. Typically, claims data analysis and healthcare system modeling begin with this dataset. Data Source:

This year, NPPES is releasing three new files: health information exchange endpoints, additional practice locations, and other business names. This class will explain all of the fields, give basic instructions for loading the data into an online database such as AWS or Google database products, and show how to filter the data on basic fields. Key topics will include:

  • Short history of HIPAA and its relationship to NPPES.
  • How to break the NPPES data into state-level data that can be loaded easily into Excel.
  • How to use csvkit to filter the NPPES csv file.
  • Understanding the National Uniform Claim Committee Healthcare Provider Taxonomy.
  • How to properly determine the “primary” provider type (taxonomy) for a healthcare provider.
  • Working with credentials in NPPES (be careful).
  • Review the address information in NPPES.
  • Review phone information in NPPES.
  • Review new endpoints file.
  • Review new other-name file.
  • Review new practice location file.
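
As a rough illustration of the state-level split mentioned in the topics above, here is a minimal Python sketch that uses only the standard library. The column name and the sample rows are assumptions made for illustration; check the header row of the actual NPPES download before relying on them.

```python
import csv
import io
from collections import defaultdict

# Assumed column name; verify it against the header row of the real NPPES file.
STATE_COLUMN = "Provider Business Practice Location Address State Name"


def split_rows_by_state(csv_file, state_column=STATE_COLUMN):
    """Group the rows of an NPPES-style CSV by practice-location state."""
    rows_by_state = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # Normalize case and treat blank values as UNKNOWN.
        state = (row.get(state_column) or "").strip().upper() or "UNKNOWN"
        rows_by_state[state].append(row)
    return dict(rows_by_state)


# Tiny made-up sample standing in for the much larger real file.
SAMPLE = io.StringIO(
    "NPI,Entity Type Code,Provider Business Practice Location Address State Name\n"
    "1000000001,1,TX\n"
    "1000000002,2,MD\n"
    "1000000003,1,tx\n"
)

by_state = split_rows_by_state(SAMPLE)
for state, rows in sorted(by_state.items()):
    print(state, len(rows))
```

This version holds every row in memory, which suits a small sample. For the full file, writing each row straight to a per-state output file would likely be the safer approach.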

Session 1B: Introduction to NDC data
Time: 9:00 a.m. - 12:30 p.m. ET
Location: Lincoln West

NDC is the basic building block for medication analysis in the US. It is the list of FDA-approved medications and their specific packaging permutations. The NDC data is a linchpin dataset that links together data from the patent office, and is relied upon by the National Library of Medicine for drug data mapping in RxNorm and UMLS. Key topics will include:

  • Short history of the NDC code.
  • Working with the FDA NDC search tool as an introduction to NDC data.
  • Downloading and parsing CSV results from the FDA search tool.
  • Understand the basic layout of the NDC data download.
  • Understanding the different NDC10 structures.
  • Understanding NDC11 and how it is calculated.
  • Linking Package Data to Product Data.
  • Importing portions of the data into Excel.
  • Understanding how NDC data relates to Label Data.
  • Understanding how NDC data relates to Orange Book Data and Medication Patents.
  • Understanding how NDC connects to the FDA Pillbox application.
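
To make the NDC10-to-NDC11 topic concrete, here is a minimal Python sketch. It assumes the widely used convention that a hyphenated 10-digit NDC comes in one of three layouts (4-4-2, 5-3-2, or 5-4-1) and that the 11-digit form is obtained by zero-padding the short segment to reach 5-4-2. The function name and the example codes are made up for illustration.

```python
# Segment-length layouts assumed for hyphenated 10-digit NDCs.
VALID_LAYOUTS = {(4, 4, 2), (5, 3, 2), (5, 4, 1)}


def ndc10_to_ndc11(ndc10: str) -> str:
    """Convert a hyphenated 10-digit NDC to the unhyphenated 11-digit form."""
    parts = ndc10.strip().split("-")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected three numeric segments, got {ndc10!r}")
    if tuple(len(part) for part in parts) not in VALID_LAYOUTS:
        raise ValueError(f"unrecognized NDC10 layout: {ndc10!r}")
    labeler, product, package = parts
    # Left-pad each segment with zeros to the 5-4-2 layout.
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)


# Made-up codes, one per layout.
for code in ("0001-2345-67", "12345-678-90", "12345-6789-0"):
    print(code, "->", ndc10_to_ndc11(code))
```

The hyphens matter here: without them, a bare 10-digit string does not reveal which segment is short, so the padding position cannot be recovered reliably.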

Session 2

Session 2A: Referral/Patient Sharing Data Tutorial
Time: 1:30 p.m. – 5:00 p.m. ET
Location: Lincoln East

The “DocGraph” dataset shows how Medicare providers share patients over time. This large graph dataset can reveal the structure of the healthcare system, showing how patients flow through Medicare. Learn about the dataset, which uses real names and is frequently studied as one of the largest social graph datasets available to the public. This class will also cover MrPUP, which is the explicit referral dataset, as opposed to the implicit referral dataset. Key topics will include:

  • Brief history of the dataset.
  • The basic concepts of working with graph datasets, notions of centrality, etc.
  • Understanding the basic structure of the DocGraph HOP dataset.
  • Understanding how DocGraph “Hop” differs from the original FOIA version of the dataset.
  • Using NPPES to filter a subset of the graph.
  • Loading the graph of a small state.
  • Querying the graph for a single provider.
  • Looking at secondary relationships, considering the “dandelion graph.”
  • Understanding the relationships between referral data and medication/utilization data.
  • Understanding MrPUP and explicit referral relationships.
  • Review of tools for next steps: Neo4j and Gephi.
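
The single-provider query and the “dandelion” second hop listed above can be sketched in plain Python before moving to a graph database. The edge layout (from NPI, to NPI, shared patient count) and all values below are assumptions for illustration, not the actual file format.

```python
from collections import defaultdict

# Made-up directed edges: (from_npi, to_npi, shared_patient_count).
EDGES = [
    ("1000000001", "1000000002", 40),
    ("1000000001", "1000000003", 15),
    ("1000000002", "1000000004", 22),
    ("1000000005", "1000000001", 11),
]


def build_graph(edges):
    """Build an adjacency map: from_npi -> {to_npi: shared_patient_count}."""
    graph = defaultdict(dict)
    for src, dst, count in edges:
        graph[src][dst] = count
    return graph


def neighbors(graph, npi):
    """Return the providers that a single provider shares patients with."""
    return dict(graph.get(npi, {}))


def two_hop(graph, npi):
    """Return providers reachable in exactly two steps (the outer ring)."""
    first = set(graph.get(npi, {}))
    second = set()
    for neighbor in first:
        second.update(graph.get(neighbor, {}))
    # Exclude the center and its direct neighbors.
    return second - first - {npi}


graph = build_graph(EDGES)
print(neighbors(graph, "1000000001"))
print(two_hop(graph, "1000000001"))
```

A dictionary of dictionaries is enough for a small state's subset. Tools such as Neo4j or Gephi become more useful once the graph no longer fits comfortably in memory or needs visual exploration.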

Session 2B: Open Payments data
Time: 1:30 p.m. – 5:00 p.m. ET
Location: Lincoln West

Congress has required that pharmaceutical and other life-science companies release detailed data on the payments that they make to healthcare professionals in the United States under the Physician Payments Sunshine Act (PPSA), which was a section (6002) of the ACA. There are now several years’ worth of data releases under the act. This class will delve into that dataset, cover related datasets, and provide the context needed to study the data. Key topics will include:

  • History of the data, including ProPublica’s Dollars for Docs efforts.
  • Using the Socrata Open Payments Browsing Tool.
  • Studying the data submission and protest process.
  • Downloading and working with the raw open data files.
  • Understanding the different classes of payments.
  • Linking the data to the NDC database, and limitations.
  • Linking the data to the NPI database, limitations and ProPublica’s link file.
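
As a small illustration of working with the raw files and the different classes of payments, here is a Python sketch that totals payment amounts by payment class. The two column names and the sample rows are assumptions for illustration; confirm them against the data dictionary published with the actual download.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Assumed column names; verify them against the real file's header row.
NATURE_COLUMN = "Nature_of_Payment_or_Transfer_of_Value"
AMOUNT_COLUMN = "Total_Amount_of_Payment_USDollars"


def total_by_nature(csv_file):
    """Sum payment amounts for each class (nature) of payment."""
    totals = defaultdict(Decimal)
    for row in csv.DictReader(csv_file):
        # Decimal avoids floating-point rounding on currency values.
        totals[row[NATURE_COLUMN]] += Decimal(row[AMOUNT_COLUMN])
    return dict(totals)


# Made-up rows standing in for the real payment records.
SAMPLE = io.StringIO(
    "Nature_of_Payment_or_Transfer_of_Value,Total_Amount_of_Payment_USDollars\n"
    "Food and Beverage,12.50\n"
    "Consulting Fee,1500.00\n"
    "Food and Beverage,7.25\n"
)

totals = total_by_nature(SAMPLE)
for nature, amount in sorted(totals.items()):
    print(nature, amount)
```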

Time: 9:00 a.m. - 5:00 p.m. ET
Location: AcademyHealth (1666 K Street NW, Ste. 1100, Washington, D.C.)
Cost: $40
Register: Coming Soon

As Application Programming Interfaces (APIs) proliferate across the industry, consumers, data holders and application developers still face challenges. How can API endpoints be discovered? How can consumers have confidence in applications that are receiving their health data? How can data holders know which applications to trust? How can we address all of these challenges without limiting choice for consumers? is hosting a pre-Palooza un-conference that will examine and brainstorm these issues and discuss standard approaches to addressing them, with the aim of unleashing innovation and accelerating the adoption and use of APIs for health data. If you are an application developer producing consumer-facing health apps, or a data holder who needs to provide an API that consumers can use to share their health data, you will want to be at this event.