user-doctor-hairNPPES Provider Golden Record

About the Dataset

Every team that builds on NPPES eventually builds the same thing — a cleanup layer to make the raw registry usable. The Python scripts, the address scrubbers, the dedup logic, the exclusion cross-checks, the FHIR mapping work. It's the project nobody budgeted for and everybody ends up owning. We built it once, so your team doesn't have to.

The NPPES Provider Golden Record is the cleaned, deduplicated, enriched version of the Centers for Medicare & Medicaid Services (CMS) National Plan and Provider Enumeration System (NPPES) — refreshed weekly within days of each CMS release.

We validate every practice address against USPS postal records, standardize it to deliverable form, and score each with a confidence rating — so claim denials from bad ZIPs and typos stop at our pipeline, not yours. We surface the duplicate NPI pairs — candidate matches across millions of providers — and resolve them with tiered-confidence scoring, so the same physician doesn't show up three times in your network map. We cross-reference every NPI against CMS PECOS for Medicare enrollment, specialty, and group practice PAC ID — and flag OIG/LEIE exclusions and deactivated NPIs before they reach your billing team.

We fill missing NUCC taxonomy gaps, attach hospital CCN and group practice affiliations from CMS Care Compare, and enrich every address with census tract, CBSA metro, HPSA shortage scores, MUA designations, congressional and state-legislative districts, IANA timezone, and a native Snowflake geo_point GEOGRAPHY column for ST_DISTANCE and ST_DWITHIN spatial queries. And we publish pre-built FHIR R4.0.1 Practitioner, Organization, Location, and PractitionerRole resources — so your next TEFCA, QHIN, HIE, or EHR integration project starts on day thirty instead of day zero.

One weekly-refreshed source of truth for revenue cycle, credentialing, network planning, compliance, and TEFCA-ready interop.

circle-exclamation

What You Get

Addresses validated & geocoded

Every practice address USPS-standardized, confidence-scored, and paired with a native Snowflake GEOGRAPHY point for ST_DISTANCE / ST_DWITHIN spatial queries

Duplicate NPIs resolved

Tiered-confidence clustering collapses multi-NPI providers into unique real-world entities, with the canonical and all alternate NPIs preserved

Medicare enrollment joined

PECOS enrollment status, specialty, enrollment state, and group practice PAC ID attached to every NPI

Compliance risks flagged

OIG/LEIE exclusions, deactivated NPIs, and critical data-quality issues surfaced as clean booleans

Taxonomy gaps filled

Missing NUCC primary taxonomy codes inferred and flagged, so specialty classification is complete across every record

Hospital & group affiliations appended

CMS Care Compare hospital CCN affiliations, group practice, medical school, graduation year

Geographic & demographic context enriched

Census tract, CBSA metro, HPSA shortage scores, MUA designations, congressional and state legislative districts, IANA timezone, ACS demographics and economics

Interop resources pre-built

FHIR R4.0.1 Practitioner, Organization, Location, and PractitionerRole resources ready to drop into TEFCA, QHIN, HIE, or EHR pipelines

Coverage

Every US state, DC, and territories

Refresh cadence

Weekly — our pipeline monitors for new CMS NPPES releases and loads the validated, enriched data within days of publication, automatically

Consumer tables (11 in the share):

  • Provider master: PROVIDERS, PROVIDERS_ADDRESSES, PROVIDERS_IDENTIFIERS, PROVIDERS_LICENSES, PROVIDERS_TAXONOMIES, TAXONOMIES

  • Enrichment outputs: PROVIDERS_ENTITIES, PROVIDERS_FHIR, PROVIDERS_DUPLICATES

  • Metadata: PRODUCT_INFO, DATA_DICTIONARY

Overview

The NPPES Provider Golden Record provides analytics-ready access to every US healthcare provider with enrichment that isn't available in the raw CMS feed.

Provider master:

  • PROVIDERS – One row per NPI. Identity, lifecycle, secondary-name fields, taxonomy enrichment, OIG exclusion, CMS Care Compare hospital/group affiliations, PECOS Medicare enrollment, duplicate cluster flags, quality signals

  • PROVIDERS_ADDRESSES – One row per (NPI, address_type). Both Practice_Location and Business_Mailing rows per provider. Practice rows carry USPS-validated address, geocoded lat/long + native Snowflake geo_point GEOGRAPHY, census tract/block/place codes, ACS demographics, HPSA and MUA shortage designations, congressional + state-legislative districts, IANA timezone — mailing rows are raw NPPES passthrough

  • PROVIDERS_IDENTIFIERS – Other provider identifiers (PECOS, Medicaid, prior NPIs, state license numbers per NPPES OTHER_PROVIDER_IDENTIFIER_*). Join PROVIDER_ID = PROVIDERS.ID

  • PROVIDERS_LICENSES – State medical licenses (state + license number)

  • PROVIDERS_TAXONOMIES – Per-NPI taxonomy rows with the raw NPPES PRIMARY flag (sparsely populated ~15% of rows — for dense primary-taxonomy signal use PROVIDERS.INFERRED_PRIMARY_TAXONOMY_CODE)

  • TAXONOMIES – NUCC taxonomy reference (~880 codes with grouping, classification, specialization, definition)

Enrichment outputs:

  • PROVIDERS_ENTITIES – One row per unique real-world provider with multi-NPI duplicate clusters pre-collapsed and critical rollups preserved (any_medicare_enrolled, any_oig_excluded) — query this for accurate unique-provider counts

  • PROVIDERS_FHIR – FHIR R4.0.1 Practitioner (or Organization), PractitionerRole for Medicare enrollment, and Location resources stored as VARIANT, ready to drop into TEFCA, QHIN, EHR, or HIE pipelines

  • PROVIDERS_DUPLICATES – Bridge table of suspected duplicate NPI pairs with similarity scores, match type, confidence tier, and recommended surviving NPI — forensic view behind PROVIDERS_ENTITIES

Metadata Tables

Every Dataplex data product includes these standard metadata tables:

Table
Purpose

PRODUCT_INFO

Single-row product snapshot — current provider count, address count, enrichment coverage, last refresh

DATA_DICTIONARY

Column descriptions for every consumer-facing view

Entity Relationship Diagram

NPPES Provider Golden Record Entity Relationship Diagram

The primary join key is npi for every consumer-facing table. PROVIDERS.ID (VARCHAR UUID) is the FK target for PROVIDERS_IDENTIFIERS.PROVIDER_ID, PROVIDERS_LICENSES.PROVIDER_ID, and PROVIDERS_TAXONOMIES.PROVIDER_ID. Address enrichment joins via NPI: PROVIDERS_ADDRESSES is one row per (NPI, address_type) so each provider has up to two rows (Practice + Mailing).

Data Tables

PROVIDERS — the primary consumer table

One row per NPI. Identity, taxonomy, OIG exclusion, CMS Care Compare, PECOS Medicare enrollment, duplicate flags, and quality scoring. Join to PROVIDERS_ADDRESSES on npi for validated addresses and geo/demographic enrichment. Filter is_canonical = TRUE to dedupe a roster. The id column (VARCHAR UUID) is the FK target for PROVIDERS_IDENTIFIERS, PROVIDERS_LICENSES, and PROVIDERS_TAXONOMIES — join via PROVIDER_ID = PROVIDERS.ID.

Key Features:

  • One row per NPI (9.5M+ US healthcare providers including active + deactivated)

  • Validated identity and lifecycle fields including the PROVIDER_OTHER_* secondary-name family (maiden, former, DBA/AKA)

  • PECOS Medicare enrollment joined per NPI — medicare_enrolled is strict TRUE/FALSE

  • OIG/LEIE exclusion flagged per NPI with exclusion type and date

  • CMS Care Compare hospital affiliations, group practice, medical school, graduation year

  • Duplicate cluster labeling with a stable is_canonical filter for safe roster dedup

  • Refreshed weekly on the CMS NPPES release cadence

Key Columns

circle-info

This table highlights the columns used most often. PROVIDERS has 85+ columns in total including identity fields (id, 10× PROVIDER_OTHER_*). Query DWV.DATA_DICTIONARY WHERE table_name = 'PROVIDERS' for the complete, authoritative spec with types and descriptions — we publish the metadata to Snowflake so it always matches what's in the view.

Column
Type
Description

npi

NUMBER

National Provider Identifier — 10-digit CMS ID, primary key

address_id

VARCHAR

FK to PROVIDERS_ADDRESSES.address_id — join for address + enrichment

entity_type_code

NUMBER

1 = Individual, 2 = Organization

is_individual / is_organization

BOOLEAN

Convenience flags derived from entity_type_code

provider_organization_name

VARCHAR

Legal business name (organizations)

provider_last_name / first_name / middle_name

VARCHAR

Name parts (individuals)

provider_full_name

VARCHAR

Derived display name

provider_credential_text

VARCHAR

MD, DO, NP, etc.

provider_enumeration_date

DATE

When CMS first issued the NPI

is_active

BOOLEAN

Currently active (not deactivated)

npi_status

VARCHAR

active or deactivated

years_in_practice

NUMBER

Years since enumeration

primary_taxonomy_code

VARCHAR

NUCC taxonomy code (e.g., 207R00000X = Internal Medicine)

specialty_description

VARCHAR

Human-readable specialty from NUCC lookup

is_specialty_inferred

BOOLEAN

TRUE when taxonomy was inferred to fill a NPPES gap

is_excluded

BOOLEAN

On the OIG/LEIE exclusion list

exclusion_type_description

VARCHAR

OIG exclusion reason

exclusion_date

DATE

Date added to OIG exclusion list

medicare_enrolled

BOOLEAN

Enrolled in Medicare per PECOS (strict TRUE/FALSE)

medicare_specialty_description

VARCHAR

PECOS-reported Medicare specialty

medicare_enrollment_state

VARCHAR

State of PECOS enrollment

medicare_group_practice_pac_id

VARCHAR

PECOS group practice PAC ID

medical_school

VARCHAR

From CMS Care Compare

graduation_year

NUMBER

From CMS Care Compare

hospital_affiliation_count

NUMBER

Number of CCN affiliations (0–3)

group_practice_name

VARCHAR

Primary group practice from CMS Care Compare

is_canonical

BOOLEAN

TRUE for the canonical NPI in a duplicate cluster — filter for a deduplicated roster

is_duplicate_suspected

BOOLEAN

Participates in a high-confidence duplicate pair

duplicate_group_id

NUMBER

Cluster ID across all members (NULL if singleton)

duplicate_cluster_size

NUMBER

Count of NPIs in this cluster

has_quality_flags

BOOLEAN

Any active data quality flag

has_critical_quality_flag

BOOLEAN

Critical-severity flag (e.g., undeliverable address)

overall_quality_score

FLOAT

0.0–1.0 composite quality score

created_at

TIMESTAMP

When the source data was last loaded

updated_at

TIMESTAMP

When this record was last refreshed

PROVIDERS_ADDRESSES — validated addresses with full geo and demographic enrichment

One row per (NPI, address_type) — both Practice_Location and Business_Mailing rows per provider. Practice rows carry the full enrichment stack (USPS validation, Geocodio census/CBSA/HPSA/MUA, lat/lng/geo_point GEOGRAPHY, political districts, timezone, ACS demographics). Mailing rows are raw NPPES fields only with enrichment columns NULL.

Key Features:

  • One row per (NPI, address_type) — ~19M rows total, ~9.5M practice + ~9.5M mailing

  • Raw NPPES fields: street1, street2, city, state, zipcode, countrycode, telephone_number, fax_number, address_type, id (VARCHAR UUID), provider_id (VARCHAR UUID FK to PROVIDERS.ID)

  • Practice rows carry USPS validation with address_validation_status indicating valid / corrected / undeliverable / not_found (see status values below)

  • Geocoded lat/long coordinates and native geo_point GEOGRAPHY on practice rows

  • Census tract, block, place FIPS codes

  • ACS demographics and economics (population, median age, race/ethnicity, median household income, housing, education, veteran status)

  • HRSA HPSA shortage scores (primary care, mental health, dental) and MUA designations

  • Congressional and state legislative districts

  • IANA timezone with UTC offset and DST observance

circle-info

Filter WHERE address_type = 'Practice_Location' for enriched geo analytics. Filter WHERE address_type = 'Business_Mailing' for correspondence addresses.

Key Columns

circle-info

This table highlights the columns used most often. PROVIDERS_ADDRESSES has ~60 columns in total. Query DWV.DATA_DICTIONARY WHERE table_name = 'PROVIDERS_ADDRESSES' for the complete, authoritative spec — we publish the metadata to Snowflake so it always matches what's in the view.

Column
Type
Description

id

VARCHAR(36)

Row PK

provider_id

VARCHAR(36)

FK to PROVIDERS.ID

npi

NUMBER

NPI — matches PROVIDERS.NPI

address_type

VARCHAR

Practice_Location or Business_Mailing

street1 / street2 / city / state / zipcode

VARCHAR

Address as filed with CMS

countrycode

VARCHAR

ISO country code (US)

telephone_number / fax_number

VARCHAR

Contact numbers as filed

validated_address_line1 / validated_city / validated_state / validated_zip / validated_zip4

VARCHAR

USPS-validated canonical address (practice rows only)

latitude / longitude

FLOAT

Geocoded coordinates (practice rows only)

geo_point

GEOGRAPHY

Native Snowflake spatial type — use with ST_DISTANCE, ST_DWITHIN, ST_CONTAINS (practice rows only)

address_validation_status

VARCHAR

Outcome of address verification (practice rows) — see status values below

address_confidence_score

FLOAT

0.0–1.0 validation confidence

state_fips / county_fips

VARCHAR

Census FIPS codes

county_name

VARCHAR

County display name

tract_code / block_code / block_group

VARCHAR

Census Bureau tract/block geography

metro_area_name / metro_area_code / metro_area_type

VARCHAR

CBSA name, code, metropolitan/micropolitan

combined_statistical_area_name

VARCHAR

CSA (where applicable)

congressional_district / congressional_district_name

VARCHAR

US House district

state_senate_district / state_house_district

VARCHAR

State legislative districts

timezone_name

VARCHAR

IANA timezone (e.g., America/New_York)

timezone_utc_offset

NUMBER

UTC offset in hours

total_population

NUMBER

ACS total population for the tract

median_household_income

NUMBER

ACS median household income

pct_bachelors_or_higher

FLOAT

ACS educational attainment

is_in_hpsa

BOOLEAN

County has an active HRSA HPSA designation

hpsa_primary_care_score / mental_health_score / dental_score

NUMBER

HPSA severity (0–26, higher = more severe shortage)

is_in_mua

BOOLEAN

HRSA Medically Underserved Area designation

is_rural

BOOLEAN

Not metropolitan

created_at / updated_at

TIMESTAMP

Lineage

address_validation_status values

What each outcome means when you're working with the data:

Value
What it means for you

valid

The address as filed with CMS is deliverable and standardized — no changes needed. Safe to use practice_* or validated_* columns interchangeably.

corrected

The filed address had issues (typos, non-canonical casing, missing ZIP+4). The validated_* columns contain the corrected, deliverable version — use those for mailing and claim submission, not the practice_* originals.

undeliverable

Address verification determined the location cannot receive mail (invalid apartment, deleted street, etc.). Flag these providers for outreach or directory correction before using them for claims or mailings.

not_found

The address could not be matched against authoritative address records — could be a new construction, an irregular PO box format, or a data-entry error. Verify before mailing or submitting claims.

PROVIDERS_ENTITIES — deduplicated provider roster

One row per unique real-world provider. Multi-NPI duplicate clusters collapsed into a single canonical entity with the full NPI list preserved in all_npis. Critical rollups (any_oig_excluded, any_medicare_enrolled, any_has_critical_quality_flag) aggregate across cluster members so compliance signals are never silently deduped away.

Key Features:

  • Use this for accurate unique-provider counts (avoid inflating from multi-NPI clusters)

  • all_npis ARRAY preserves every NPI in the cluster for roster remediation

  • Any OIG exclusion or Medicare enrollment on any cluster member is surfaced

Key Columns

circle-info

This table highlights the columns used most often. PROVIDERS_ENTITIES has 38 columns in total. Query DWV.DATA_DICTIONARY WHERE table_name = 'PROVIDERS_ENTITIES' for the complete spec.

Column
Type
Description

entity_id

NUMBER

Stable cluster key (duplicate_group_id for clusters, canonical_npi for singletons)

canonical_npi

NUMBER

Representative NPI for the entity

all_npis

ARRAY

Every NPI in this entity (length 1 for singletons)

npi_count

NUMBER

Cluster size

any_medicare_enrolled

BOOLEAN

TRUE if ANY cluster NPI is Medicare-enrolled per PECOS

any_oig_excluded

BOOLEAN

Compliance rollup — TRUE if ANY cluster NPI is on the OIG/LEIE list

any_has_quality_flags

BOOLEAN

TRUE if ANY cluster NPI has a data quality flag

any_has_critical_quality_flag

BOOLEAN

TRUE if ANY cluster NPI has a critical quality flag

entity_type_code

NUMBER

1 = Individual, 2 = Organization

provider_full_name

VARCHAR

Derived display name from canonical NPI

provider_organization_name / last/first/middle_name

VARCHAR

Name parts from canonical NPI

provider_credential_text

VARCHAR

Credentials from canonical NPI

primary_taxonomy_code / specialty_description

VARCHAR

NUCC taxonomy from canonical NPI

cms_specialty

VARCHAR

CMS Care Compare specialty from canonical NPI

is_active

BOOLEAN

Canonical NPI's active status

npi_status

VARCHAR

active or deactivated

is_excluded / exclusion_type_description / exclusion_date

BOOLEAN / VARCHAR / DATE

Canonical NPI's OIG status

medical_school / graduation_year / hospital_affiliation_count / group_practice_name

From CMS Care Compare

overall_quality_score

FLOAT

Composite score from canonical NPI

address_id

VARCHAR

FK to PROVIDERS_ADDRESSES (canonical NPI's practice address)

created_at / updated_at

TIMESTAMP

Lineage

PROVIDERS_FHIR — drop-in FHIR R4.0.1 resources

One row per NPI, 1:1 with PROVIDERS. Three FHIR R4.0.1 resources stored as VARIANT: a Practitioner or Organization (switched on entity_type_code), a PractitionerRole representing Medicare enrollment (NULL when not PECOS-enrolled), and a Location with coordinates. Ready to drop directly into TEFCA, QHIN, EHR, or HIE pipelines — no FHIR transformer required.

Key Features:

  • US Core-aligned field conventions

  • Vendor-scoped extensions for OIG/LEIE exclusion on Practitioner.extension

  • meta.lastUpdated preserved from source

  • Conditional PractitionerRole (Medicare enrollment only) mirrors real-world semantics

Column Reference

All 7 columns:

Column
Type
Description

npi

NUMBER

National Provider Identifier (1:1 with PROVIDERS.npi)

fhir_resource_type

VARCHAR

Practitioner (individual) or Organization (Type 2 NPI)

fhir_practitioner

VARIANT

FHIR R4.0.1 Practitioner or Organization resource as JSON

fhir_practitioner_role

VARIANT

FHIR R4.0.1 PractitionerRole for Medicare enrollment (NULL when not enrolled)

fhir_location

VARIANT

FHIR R4.0.1 Location resource with coordinates

created_at

TIMESTAMP

When the source data was last loaded

updated_at

TIMESTAMP

When this record was last refreshed

PROVIDERS_DUPLICATES — bridge table of suspected duplicate pairs

Tiered-confidence duplicate NPI pairs with similarity scores and recommended primary NPI. The forensic view behind PROVIDERS_ENTITIES — use this to audit dedup quality or build custom dedup logic.

Column Reference

All 10 columns:

Column
Type
Description

npi_1

NUMBER

First NPI in the pair

npi_2

NUMBER

Second NPI (always greater than npi_1)

similarity_score

FLOAT

0.0–1.0 weighted similarity score across name, address, phone, and taxonomy

confidence_tier

VARCHAR

Evidence strength for treating the two NPIs as the same entity — see tier values below

match_type

VARCHAR

Pattern of similarity that triggered the match — see match type values below

entity_class

VARCHAR

individual or organization

recommended_primary

NUMBER

Suggested survivor NPI (lower-numbered = earlier CMS registration)

detection_date

DATE

When the pair was first detected

created_at

TIMESTAMP

When the source data was last loaded

updated_at

TIMESTAMP

When this record was last refreshed

confidence_tier values

How much trust to place in a pair when collapsing a roster:

Value
What it means for you

high_confidence

Strong evidence the two NPIs are the same real-world provider (exact name + shared address or phone). Safe to use directly for roster deduplicationPROVIDERS_ENTITIES pre-collapses these clusters for you.

possible_match

Partial similarity that warrants a human review step before deduplication. Useful for surfacing dedup candidates in an MDM workflow, but don't merge blindly.

match_type values

The pattern of similarity that triggered the pair:

Value
What it means for you

exact_match

Name, address, and/or phone match character-for-character on both NPIs. Highest-confidence category.

probable_match

Strong fuzzy similarity (e.g., name + shared practice address or phone). Meaningful overlap but with minor variations.

fuzzy_variant

Name spelling variants sharing location or taxonomy signals — typically hyphen/space/punctuation differences, or common name aliases. Lower precision but surfaces legitimate name changes and data-entry inconsistencies.

PROVIDERS_IDENTIFIERS — other provider identifiers

Other provider identifiers per NPI — PECOS enrollment IDs, Medicaid provider IDs, prior NPIs, state license numbers per NPPES OTHER_PROVIDER_IDENTIFIER_* spec. One row per (provider, identifier). ~2.76M rows.

Key Features:

  • ~2.76M identifier rows across every NPI that filed one

  • Join PROVIDER_ID = PROVIDERS.ID

  • Includes ALL NPIs (canonical + non-canonical) so historical claim joins don't silently drop deactivated providers

Column
Type
Description

id

VARCHAR(36)

Row PK

provider_id

VARCHAR(36)

FK to PROVIDERS.ID

identifier

VARCHAR

The other identifier value

type_code

VARCHAR

NPPES Other Provider Identifier Type Code

type_value

VARCHAR

Human-readable type label

state

VARCHAR

State code where the identifier was issued

issuer

VARCHAR

Issuing authority

created_at / updated_at

TIMESTAMP

Lineage

PROVIDERS_LICENSES — state medical licenses

State medical licenses per provider. One row per (provider, license). ~6.96M rows.

Key Features:

  • ~6.96M license rows

  • Join PROVIDER_ID = PROVIDERS.ID

Column
Type
Description

id

VARCHAR(36)

Row PK

provider_id

VARCHAR(36)

FK to PROVIDERS.ID

licenses_number

VARCHAR

License number as filed

state

VARCHAR

State code that issued the license

created_at / updated_at

TIMESTAMP

Lineage

PROVIDERS_TAXONOMIES — per-NPI taxonomy rows

Per-NPI taxonomy rows with the raw NPPES PRIMARY flag preserved. One row per (NPI, taxonomy) — up to 15 taxonomies per provider. ~9.94M rows. The HEATLHCARE_PROVIDER_TAXONOMY_CODE column name reflects the spelling carried over from the source NPPES schema.

circle-info

The raw PRIMARY flag is sparsely populated (~15% of rows) per NPPES spec. For a dense inferred-primary signal, use PROVIDERS.INFERRED_PRIMARY_TAXONOMY_CODE on the main PROVIDERS table — we use a ROW_NUMBER fallback to pick the best taxonomy when the raw flag is missing.

Key Features:

  • ~9.94M taxonomy rows

  • Join PROVIDER_ID = PROVIDERS.ID

  • Join HEATLHCARE_PROVIDER_TAXONOMY_CODE = TAXONOMIES.CODE to get classification/specialization names

Column
Type
Description

id

VARCHAR(36)

Row PK

provider_id

VARCHAR(36)

FK to PROVIDERS.ID

taxonomy_id

VARCHAR(36)

FK to TAXONOMIES.ID

heatlhcare_provider_taxonomy_code

VARCHAR

NUCC taxonomy code

primary

BOOLEAN

Raw NPPES primary flag — sparsely populated

created_at / updated_at

TIMESTAMP

Lineage

TAXONOMIES — NUCC taxonomy reference

NUCC (National Uniform Claim Committee) taxonomy code definitions. ~880 rows, updated quarterly. Join from PROVIDERS_TAXONOMIES.HEATLHCARE_PROVIDER_TAXONOMY_CODE or PROVIDERS.PRIMARY_TAXONOMY_CODE to TAXONOMIES.CODE for classification names.

Column
Type
Description

id

VARCHAR

Row PK

code

VARCHAR

10-char NUCC taxonomy code

grouping

VARCHAR

Top-level grouping (e.g., Allopathic & Osteopathic Physicians)

classification

VARCHAR

Mid-level classification (e.g., Internal Medicine)

specialization

VARCHAR

Specialization (e.g., Cardiovascular Disease)

definition

VARCHAR

NUCC's plain-language definition

notes

VARCHAR

Additional NUCC notes

display_name

VARCHAR

Pre-concatenated human-readable name

section

VARCHAR

NUCC spec section reference

created_at / updated_at

TIMESTAMP

Lineage

Data Quality

Standardization

Every address is USPS-validated and geocoded, with AI-driven normalization as a fallback for addresses USPS can't resolve. Taxonomy codes are filled in via inference when NPPES is missing a primary — is_specialty_inferred flags which records were enriched. NPI lifecycle fields (is_active, npi_status) are computed from deactivation and reactivation dates to handle NPPES's sparse status flagging.

Data Freshness

Check when data was last refreshed using PRODUCT_INFO:

Getting Started

Platform Schema Reference

This dataset is available on Snowflake Marketplace. Queries use a schema-only reference — the database is set by the share context:

Platform
Schema
Example

Snowflake

DWV

DWV.PROVIDERS

Discover Available Data

Start with PRODUCT_INFO for a product-wide snapshot and DATA_DICTIONARY for column-level descriptions on every view.

Foundational Queries

Specialty breakdown with Medicare enrollment rate

Revenue Cycle: Reduce Claim Denials from Bad NPI Data

See original CMS addresses next to USPS-corrected addresses side by side — the records that would have triggered denials on the raw NPPES feed.

Spatial and Demographic Analytics

Market sizing: provider density by metro with ACS demographics

Network adequacy: providers within 10 miles of an anchor NPI (native geo_point)

Anchor the search to a real provider's practice location and find every other provider within a radius. Swap anchor_npi for any NPI to re-target the search.

TEFCA and FHIR-Ready Interop

Drop-in FHIR resources for EHR, HIE, and QHIN pipelines

Three FHIR R4.0.1 resources per NPI — a Practitioner or Organization, a PractitionerRole for Medicare enrollment, and a Location. Inspect the VARIANT in Snowsight's JSON viewer to verify conformance.

FHIR bulk export: one resource type per row for NDJSON streaming

Compose a FHIR bulk export bundle — UNION ALL the three resource types into a single NDJSON-shaped stream for QHIN or $export pipelines.

Medicare Network Planning: PECOS-Enrolled Providers in HPSA Areas

Cross-reference PECOS Medicare enrollment with HRSA HPSA shortage designations to surface in-network providers where they're most needed.

Roster Dedup Without Losing Signal

High-confidence duplicate NPI pairs with similarity scores

Tiered-confidence duplicate pairs with match type and recommended primary NPI — safe to use for roster collapse without losing historical claim lineage.

HIPAA Compliance: OIG-Excluded Providers Still Active in NPPES

OIG/LEIE-flagged NPIs that are still showing as active in NPPES — pull this before billing or claim submission.

Who Uses This Data

Common Use Cases

  • Revenue cycle: reduce first-pass claim denials — Replace raw NPPES with validated, USPS-corrected addresses, verified NPI lifecycle status, and PECOS-confirmed Medicare enrollment to cut denials caused by bad provider master data

  • Biennial revalidation and HIPAA compliance — OIG/LEIE exclusion flags and revalidation-aware NPI status help meet the 2026 CMS biennial revalidation mandate and reduce exposure to HIPAA fines

  • TEFCA, QHIN, and HIE interoperability — Query pre-built FHIR R4.0.1 Practitioner, Organization, PractitionerRole, and Location resources for drop-in TEFCA and bulk-export pipelines without building a FHIR transformer

  • Provider network design for VBC and ACO programs — Build deduplicated provider networks with PROVIDERS_ENTITIES, cross-reference PECOS enrollment, and filter by HPSA shortage and congressional district

  • Network adequacy modeling and market sizing — Use native geo_point GEOGRAPHY with ST_DWITHIN / ST_DISTANCE plus ACS demographics (median household income, population, education) and CBSA metro boundaries

  • Provider roster master data management (MDM)PROVIDERS_DUPLICATES and PROVIDERS_ENTITIES together provide a golden record of unique real-world providers with the signal (OIG, Medicare, quality) preserved across every multi-NPI cluster

  • Underserved-area provider analytics — HRSA HPSA primary care, mental health, and dental shortage scores plus MUA/MUP designations layered on every provider for rural health, CMMI, and 340B analysis

This dataset pairs well with:

  • CMS NPPES Provider Dataset — The raw source registry that this Golden Record is built on. Subscribe to the raw feed if you need the unprocessed relational tables (addresses, licenses, identifiers, taxonomies) for custom research; subscribe to the Golden Record if you want the same provider universe cleaned, deduped, enriched, and FHIR-ready

  • HRSA Healthcare Resources Dataset — Source of the county-level HPSA, MUA, and AHRF workforce signals used here. Join to HRSA for workforce rate analysis by county

  • CMS Data Feeds Dataset — Complete CMS data catalog including Care Compare and PECOS source feeds

  • TEFCA / QHIN implementations — Drop-in FHIR R4.0.1 resources compatible with the ASTP TEFCA Common Agreement and the QHIN Directory schema

Frequently Asked Questions

How is the NPPES Provider Golden Record different from the raw CMS NPPES Provider Dataset? The raw dataset is the unprocessed NPI registry as CMS publishes it weekly — addresses with typos and outdated ZIPs, no deduplication, no Medicare enrollment, no OIG exclusion flags, no FHIR mapping, no geo enrichment. The Golden Record is that same universe after USPS address validation (around 23% of addresses get corrected), geocoding with census and ACS enrichment, PECOS Medicare enrollment joins, OIG/LEIE exclusion checks, CMS Care Compare hospital and group affiliations, NUCC taxonomy gap-filling, tiered-confidence duplicate detection, and FHIR R4.0.1 resource generation. Most NPPES consumers eventually write their own cleanup pipeline — this dataset replaces that pipeline.

How often is the data updated? Weekly, automatically. Our pipeline monitors CMS for each new NPPES release and loads the updated, validated, and enriched data within days of publication — no manual intervention needed. Coverage spans every active and deactivated NPI in the registry (9M+ providers).

What FHIR version is supported? FHIR R4.0.1. Every row in PROVIDERS_FHIR contains a US Core-aligned Practitioner or Organization resource, a PractitionerRole for Medicare enrollment (NULL when not PECOS-enrolled), and a Location resource. The resources use standard http://hl7.org/fhir/sid/us-npi identifier systems and can be dropped directly into a QHIN, EHR, or HIE pipeline.

Can I use the Snowflake GEOGRAPHY column for spatial queries? Yes. PROVIDERS_ADDRESSES.geo_point is a native Snowflake GEOGRAPHY type built from the geocoded lat/long. Use it with ST_DISTANCE, ST_DWITHIN, ST_CONTAINS, and other Snowflake spatial functions for proximity, radius, and polygon containment queries without converting from lat/long on every query.

How do I query "unique providers" without double-counting multi-NPI entities? Use PROVIDERS_ENTITIES. It pre-collapses multi-NPI duplicate clusters into one row per real-world entity while preserving any_oig_excluded and any_medicare_enrolled rollups so compliance signals aren't silently deduped away. Alternatively, filter PROVIDERS with is_canonical = TRUE for a one-row-per-entity view that keeps the full column set.

Are inactive or deactivated NPIs included? Yes. NPPES never recycles an NPI and historical claims still reference deactivated NPIs, so the dataset preserves them. Use is_active = TRUE and is_canonical = TRUE for a clean active-provider roster, or query without those filters for historical analysis.

What's the pricing and is there a trial? Full access is $800/month on Snowflake Marketplace. A 14-day free trial includes 1,500 NPIs that are consistent across every view — the trial set is pre-materialized for fast evaluation queries. Metadata views (PRODUCT_INFO, DATA_DICTIONARY) are always fully available.

Get Started

circle-check

About Dataplex

Dataplex Consulting & Data Products delivers turnkey, analytics-ready data products that make complex public and commercial data easy to use across modern data platforms. Our data pipelines include automated quality checks and active monitoring to ensure timely, reliable, and well-structured data that is ready for downstream analytics, machine learning, and operational use.

In addition to data products, Dataplex provides data engineering and analytics consulting services to organizations of all sizes. We bring deep, hands-on experience supporting both early-stage companies and large enterprises, helping teams build scalable data platforms, improve data reliability, and become more data-driven.

Last updated