NPPES Provider Golden Record
About the Dataset
Every team that builds on NPPES eventually builds the same thing — a cleanup layer to make the raw registry usable. The Python scripts, the address scrubbers, the dedup logic, the exclusion cross-checks, the FHIR mapping work. It's the project nobody budgeted for and everybody ends up owning. We built it once, so your team doesn't have to.
The NPPES Provider Golden Record is the cleaned, deduplicated, enriched version of the Centers for Medicare & Medicaid Services (CMS) National Plan and Provider Enumeration System (NPPES) — refreshed weekly within days of each CMS release.
We validate every practice address against USPS postal records, standardize it to deliverable form, and score each with a confidence rating — so claim denials from bad ZIPs and typos stop at our pipeline, not yours. We surface the duplicate NPI pairs — candidate matches across millions of providers — and resolve them with tiered-confidence scoring, so the same physician doesn't show up three times in your network map. We cross-reference every NPI against CMS PECOS for Medicare enrollment, specialty, and group practice PAC ID — and flag OIG/LEIE exclusions and deactivated NPIs before they reach your billing team.
We fill missing NUCC taxonomy gaps, attach hospital CCN and group practice affiliations from CMS Care Compare, and enrich every address with census tract, CBSA metro, HPSA shortage scores, MUA designations, congressional and state-legislative districts, IANA timezone, and a native Snowflake geo_point GEOGRAPHY column for ST_DISTANCE and ST_DWITHIN spatial queries. And we publish pre-built FHIR R4.0.1 Practitioner, Organization, Location, and PractitionerRole resources — so your next TEFCA, QHIN, HIE, or EHR integration project starts on day thirty instead of day zero.
One weekly-refreshed source of truth for revenue cycle, credentialing, network planning, compliance, and TEFCA-ready interop.
Get Full Access | Snowflake Marketplace | Free Trial
Upgrade Path for CMS NPPES Provider Dataset Subscribers: This is the premium counterpart to the raw CMS NPPES Provider Dataset. The Golden Record delivers every NPI in NPPES after address validation, census/ACS enrichment, PECOS enrollment joins, OIG exclusion checks, duplicate clustering, and FHIR mapping — consolidating work that teams typically distribute across internal data quality, interoperability, and compliance pipelines into a single analytics-ready dataset refreshed weekly on the CMS release cadence.
Backwards-compatible by design: every table name, column, and join path from the CMS NPPES Provider Dataset is preserved in the Golden Record. Existing subscribers can swap the database name in their queries — PROVIDERS, PROVIDERS_ADDRESSES, PROVIDERS_IDENTIFIERS, PROVIDERS_LICENSES, PROVIDERS_TAXONOMIES, and TAXONOMIES keep their original shape, so existing SQL runs unchanged. New enrichment columns and tables are layered alongside.
What You Get
Addresses validated & geocoded
Every practice address USPS-standardized, confidence-scored, and paired with a native Snowflake GEOGRAPHY point for ST_DISTANCE / ST_DWITHIN spatial queries
Duplicate NPIs resolved
Tiered-confidence clustering collapses multi-NPI providers into unique real-world entities, with the canonical and all alternate NPIs preserved
Medicare enrollment joined
PECOS enrollment status, specialty, enrollment state, and group practice PAC ID attached to every NPI
Compliance risks flagged
OIG/LEIE exclusions, deactivated NPIs, and critical data-quality issues surfaced as clean booleans
Taxonomy gaps filled
Missing NUCC primary taxonomy codes inferred and flagged, so specialty classification is complete across every record
Hospital & group affiliations appended
CMS Care Compare hospital CCN affiliations, group practice, medical school, graduation year
Geographic & demographic context enriched
Census tract, CBSA metro, HPSA shortage scores, MUA designations, congressional and state legislative districts, IANA timezone, ACS demographics and economics
Interop resources pre-built
FHIR R4.0.1 Practitioner, Organization, Location, and PractitionerRole resources ready to drop into TEFCA, QHIN, HIE, or EHR pipelines
Coverage
Every US state, DC, and territories
Refresh cadence
Weekly — our pipeline monitors for new CMS NPPES releases and loads the validated, enriched data within days of publication, automatically
Consumer tables (11 in the share):
Provider master:
PROVIDERS,PROVIDERS_ADDRESSES,PROVIDERS_IDENTIFIERS,PROVIDERS_LICENSES,PROVIDERS_TAXONOMIES,TAXONOMIESEnrichment outputs:
PROVIDERS_ENTITIES,PROVIDERS_FHIR,PROVIDERS_DUPLICATESMetadata:
PRODUCT_INFO,DATA_DICTIONARY
Overview
The NPPES Provider Golden Record provides analytics-ready access to every US healthcare provider with enrichment that isn't available in the raw CMS feed.
Provider master:
PROVIDERS – One row per NPI. Identity, lifecycle, secondary-name fields, taxonomy enrichment, OIG exclusion, CMS Care Compare hospital/group affiliations, PECOS Medicare enrollment, duplicate cluster flags, quality signals
PROVIDERS_ADDRESSES – One row per (NPI, address_type). Both
Practice_LocationandBusiness_Mailingrows per provider. Practice rows carry USPS-validated address, geocoded lat/long + native Snowflakegeo_pointGEOGRAPHY, census tract/block/place codes, ACS demographics, HPSA and MUA shortage designations, congressional + state-legislative districts, IANA timezone — mailing rows are raw NPPES passthroughPROVIDERS_IDENTIFIERS – Other provider identifiers (PECOS, Medicaid, prior NPIs, state license numbers per NPPES
OTHER_PROVIDER_IDENTIFIER_*). JoinPROVIDER_ID = PROVIDERS.IDPROVIDERS_LICENSES – State medical licenses (state + license number)
PROVIDERS_TAXONOMIES – Per-NPI taxonomy rows with the raw NPPES
PRIMARYflag (sparsely populated ~15% of rows — for dense primary-taxonomy signal usePROVIDERS.INFERRED_PRIMARY_TAXONOMY_CODE)TAXONOMIES – NUCC taxonomy reference (~880 codes with grouping, classification, specialization, definition)
Enrichment outputs:
PROVIDERS_ENTITIES – One row per unique real-world provider with multi-NPI duplicate clusters pre-collapsed and critical rollups preserved (
any_medicare_enrolled,any_oig_excluded) — query this for accurate unique-provider countsPROVIDERS_FHIR – FHIR R4.0.1
Practitioner(orOrganization),PractitionerRolefor Medicare enrollment, andLocationresources stored as VARIANT, ready to drop into TEFCA, QHIN, EHR, or HIE pipelinesPROVIDERS_DUPLICATES – Bridge table of suspected duplicate NPI pairs with similarity scores, match type, confidence tier, and recommended surviving NPI — forensic view behind
PROVIDERS_ENTITIES
Metadata Tables
Every Dataplex data product includes these standard metadata tables:
PRODUCT_INFO
Single-row product snapshot — current provider count, address count, enrichment coverage, last refresh
DATA_DICTIONARY
Column descriptions for every consumer-facing view
Entity Relationship Diagram

The primary join key is npi for every consumer-facing table. PROVIDERS.ID (VARCHAR UUID) is the FK target for PROVIDERS_IDENTIFIERS.PROVIDER_ID, PROVIDERS_LICENSES.PROVIDER_ID, and PROVIDERS_TAXONOMIES.PROVIDER_ID. Address enrichment joins via NPI: PROVIDERS_ADDRESSES is one row per (NPI, address_type) so each provider has up to two rows (Practice + Mailing).
Data Tables
PROVIDERS — the primary consumer table
One row per NPI. Identity, taxonomy, OIG exclusion, CMS Care Compare, PECOS Medicare enrollment, duplicate flags, and quality scoring. Join to PROVIDERS_ADDRESSES on npi for validated addresses and geo/demographic enrichment. Filter is_canonical = TRUE to dedupe a roster. The id column (VARCHAR UUID) is the FK target for PROVIDERS_IDENTIFIERS, PROVIDERS_LICENSES, and PROVIDERS_TAXONOMIES — join via PROVIDER_ID = PROVIDERS.ID.
Key Features:
One row per NPI (9.5M+ US healthcare providers including active + deactivated)
Validated identity and lifecycle fields including the
PROVIDER_OTHER_*secondary-name family (maiden, former, DBA/AKA)PECOS Medicare enrollment joined per NPI —
medicare_enrolledis strict TRUE/FALSEOIG/LEIE exclusion flagged per NPI with exclusion type and date
CMS Care Compare hospital affiliations, group practice, medical school, graduation year
Duplicate cluster labeling with a stable
is_canonicalfilter for safe roster dedupRefreshed weekly on the CMS NPPES release cadence
Key Columns
This table highlights the columns used most often. PROVIDERS has 85+ columns in total including identity fields (id, 10× PROVIDER_OTHER_*). Query DWV.DATA_DICTIONARY WHERE table_name = 'PROVIDERS' for the complete, authoritative spec with types and descriptions — we publish the metadata to Snowflake so it always matches what's in the view.
npi
NUMBER
National Provider Identifier — 10-digit CMS ID, primary key
address_id
VARCHAR
FK to PROVIDERS_ADDRESSES.address_id — join for address + enrichment
entity_type_code
NUMBER
1 = Individual, 2 = Organization
is_individual / is_organization
BOOLEAN
Convenience flags derived from entity_type_code
provider_organization_name
VARCHAR
Legal business name (organizations)
provider_last_name / first_name / middle_name
VARCHAR
Name parts (individuals)
provider_full_name
VARCHAR
Derived display name
provider_credential_text
VARCHAR
MD, DO, NP, etc.
provider_enumeration_date
DATE
When CMS first issued the NPI
is_active
BOOLEAN
Currently active (not deactivated)
npi_status
VARCHAR
active or deactivated
years_in_practice
NUMBER
Years since enumeration
primary_taxonomy_code
VARCHAR
NUCC taxonomy code (e.g., 207R00000X = Internal Medicine)
specialty_description
VARCHAR
Human-readable specialty from NUCC lookup
is_specialty_inferred
BOOLEAN
TRUE when taxonomy was inferred to fill a NPPES gap
is_excluded
BOOLEAN
On the OIG/LEIE exclusion list
exclusion_type_description
VARCHAR
OIG exclusion reason
exclusion_date
DATE
Date added to OIG exclusion list
medicare_enrolled
BOOLEAN
Enrolled in Medicare per PECOS (strict TRUE/FALSE)
medicare_specialty_description
VARCHAR
PECOS-reported Medicare specialty
medicare_enrollment_state
VARCHAR
State of PECOS enrollment
medicare_group_practice_pac_id
VARCHAR
PECOS group practice PAC ID
medical_school
VARCHAR
From CMS Care Compare
graduation_year
NUMBER
From CMS Care Compare
hospital_affiliation_count
NUMBER
Number of CCN affiliations (0–3)
group_practice_name
VARCHAR
Primary group practice from CMS Care Compare
is_canonical
BOOLEAN
TRUE for the canonical NPI in a duplicate cluster — filter for a deduplicated roster
is_duplicate_suspected
BOOLEAN
Participates in a high-confidence duplicate pair
duplicate_group_id
NUMBER
Cluster ID across all members (NULL if singleton)
duplicate_cluster_size
NUMBER
Count of NPIs in this cluster
has_quality_flags
BOOLEAN
Any active data quality flag
has_critical_quality_flag
BOOLEAN
Critical-severity flag (e.g., undeliverable address)
overall_quality_score
FLOAT
0.0–1.0 composite quality score
created_at
TIMESTAMP
When the source data was last loaded
updated_at
TIMESTAMP
When this record was last refreshed
PROVIDERS_ADDRESSES — validated addresses with full geo and demographic enrichment
One row per (NPI, address_type) — both Practice_Location and Business_Mailing rows per provider. Practice rows carry the full enrichment stack (USPS validation, Geocodio census/CBSA/HPSA/MUA, lat/lng/geo_point GEOGRAPHY, political districts, timezone, ACS demographics). Mailing rows are raw NPPES fields only with enrichment columns NULL.
Key Features:
One row per (NPI, address_type) — ~19M rows total, ~9.5M practice + ~9.5M mailing
Raw NPPES fields:
street1,street2,city,state,zipcode,countrycode,telephone_number,fax_number,address_type,id(VARCHAR UUID),provider_id(VARCHAR UUID FK toPROVIDERS.ID)Practice rows carry USPS validation with
address_validation_statusindicating valid / corrected / undeliverable / not_found (see status values below)Geocoded lat/long coordinates and native
geo_point GEOGRAPHYon practice rowsCensus tract, block, place FIPS codes
ACS demographics and economics (population, median age, race/ethnicity, median household income, housing, education, veteran status)
HRSA HPSA shortage scores (primary care, mental health, dental) and MUA designations
Congressional and state legislative districts
IANA timezone with UTC offset and DST observance
Filter WHERE address_type = 'Practice_Location' for enriched geo analytics. Filter WHERE address_type = 'Business_Mailing' for correspondence addresses.
Key Columns
This table highlights the columns used most often. PROVIDERS_ADDRESSES has ~60 columns in total. Query DWV.DATA_DICTIONARY WHERE table_name = 'PROVIDERS_ADDRESSES' for the complete, authoritative spec — we publish the metadata to Snowflake so it always matches what's in the view.
id
VARCHAR(36)
Row PK
provider_id
VARCHAR(36)
FK to PROVIDERS.ID
npi
NUMBER
NPI — matches PROVIDERS.NPI
address_type
VARCHAR
Practice_Location or Business_Mailing
street1 / street2 / city / state / zipcode
VARCHAR
Address as filed with CMS
countrycode
VARCHAR
ISO country code (US)
telephone_number / fax_number
VARCHAR
Contact numbers as filed
validated_address_line1 / validated_city / validated_state / validated_zip / validated_zip4
VARCHAR
USPS-validated canonical address (practice rows only)
latitude / longitude
FLOAT
Geocoded coordinates (practice rows only)
geo_point
GEOGRAPHY
Native Snowflake spatial type — use with ST_DISTANCE, ST_DWITHIN, ST_CONTAINS (practice rows only)
address_validation_status
VARCHAR
Outcome of address verification (practice rows) — see status values below
address_confidence_score
FLOAT
0.0–1.0 validation confidence
state_fips / county_fips
VARCHAR
Census FIPS codes
county_name
VARCHAR
County display name
tract_code / block_code / block_group
VARCHAR
Census Bureau tract/block geography
metro_area_name / metro_area_code / metro_area_type
VARCHAR
CBSA name, code, metropolitan/micropolitan
combined_statistical_area_name
VARCHAR
CSA (where applicable)
congressional_district / congressional_district_name
VARCHAR
US House district
state_senate_district / state_house_district
VARCHAR
State legislative districts
timezone_name
VARCHAR
IANA timezone (e.g., America/New_York)
timezone_utc_offset
NUMBER
UTC offset in hours
total_population
NUMBER
ACS total population for the tract
median_household_income
NUMBER
ACS median household income
pct_bachelors_or_higher
FLOAT
ACS educational attainment
is_in_hpsa
BOOLEAN
County has an active HRSA HPSA designation
hpsa_primary_care_score / mental_health_score / dental_score
NUMBER
HPSA severity (0–26, higher = more severe shortage)
is_in_mua
BOOLEAN
HRSA Medically Underserved Area designation
is_rural
BOOLEAN
Not metropolitan
created_at / updated_at
TIMESTAMP
Lineage
address_validation_status values
address_validation_status valuesWhat each outcome means when you're working with the data:
valid
The address as filed with CMS is deliverable and standardized — no changes needed. Safe to use practice_* or validated_* columns interchangeably.
corrected
The filed address had issues (typos, non-canonical casing, missing ZIP+4). The validated_* columns contain the corrected, deliverable version — use those for mailing and claim submission, not the practice_* originals.
undeliverable
Address verification determined the location cannot receive mail (invalid apartment, deleted street, etc.). Flag these providers for outreach or directory correction before using them for claims or mailings.
not_found
The address could not be matched against authoritative address records — could be a new construction, an irregular PO box format, or a data-entry error. Verify before mailing or submitting claims.
PROVIDERS_ENTITIES — deduplicated provider roster
One row per unique real-world provider. Multi-NPI duplicate clusters collapsed into a single canonical entity with the full NPI list preserved in all_npis. Critical rollups (any_oig_excluded, any_medicare_enrolled, any_has_critical_quality_flag) aggregate across cluster members so compliance signals are never silently deduped away.
Key Features:
Use this for accurate unique-provider counts (avoid inflating from multi-NPI clusters)
all_npisARRAY preserves every NPI in the cluster for roster remediationAny OIG exclusion or Medicare enrollment on any cluster member is surfaced
Key Columns
This table highlights the columns used most often. PROVIDERS_ENTITIES has 38 columns in total. Query DWV.DATA_DICTIONARY WHERE table_name = 'PROVIDERS_ENTITIES' for the complete spec.
entity_id
NUMBER
Stable cluster key (duplicate_group_id for clusters, canonical_npi for singletons)
canonical_npi
NUMBER
Representative NPI for the entity
all_npis
ARRAY
Every NPI in this entity (length 1 for singletons)
npi_count
NUMBER
Cluster size
any_medicare_enrolled
BOOLEAN
TRUE if ANY cluster NPI is Medicare-enrolled per PECOS
any_oig_excluded
BOOLEAN
Compliance rollup — TRUE if ANY cluster NPI is on the OIG/LEIE list
any_has_quality_flags
BOOLEAN
TRUE if ANY cluster NPI has a data quality flag
any_has_critical_quality_flag
BOOLEAN
TRUE if ANY cluster NPI has a critical quality flag
entity_type_code
NUMBER
1 = Individual, 2 = Organization
provider_full_name
VARCHAR
Derived display name from canonical NPI
provider_organization_name / last/first/middle_name
VARCHAR
Name parts from canonical NPI
provider_credential_text
VARCHAR
Credentials from canonical NPI
primary_taxonomy_code / specialty_description
VARCHAR
NUCC taxonomy from canonical NPI
cms_specialty
VARCHAR
CMS Care Compare specialty from canonical NPI
is_active
BOOLEAN
Canonical NPI's active status
npi_status
VARCHAR
active or deactivated
is_excluded / exclusion_type_description / exclusion_date
BOOLEAN / VARCHAR / DATE
Canonical NPI's OIG status
medical_school / graduation_year / hospital_affiliation_count / group_practice_name
—
From CMS Care Compare
overall_quality_score
FLOAT
Composite score from canonical NPI
address_id
VARCHAR
FK to PROVIDERS_ADDRESSES (canonical NPI's practice address)
created_at / updated_at
TIMESTAMP
Lineage
PROVIDERS_FHIR — drop-in FHIR R4.0.1 resources
One row per NPI, 1:1 with PROVIDERS. Three FHIR R4.0.1 resources stored as VARIANT: a Practitioner or Organization (switched on entity_type_code), a PractitionerRole representing Medicare enrollment (NULL when not PECOS-enrolled), and a Location with coordinates. Ready to drop directly into TEFCA, QHIN, EHR, or HIE pipelines — no FHIR transformer required.
Key Features:
US Core-aligned field conventions
Vendor-scoped extensions for OIG/LEIE exclusion on
Practitioner.extensionmeta.lastUpdatedpreserved from sourceConditional
PractitionerRole(Medicare enrollment only) mirrors real-world semantics
Column Reference
All 7 columns:
npi
NUMBER
National Provider Identifier (1:1 with PROVIDERS.npi)
fhir_resource_type
VARCHAR
Practitioner (individual) or Organization (Type 2 NPI)
fhir_practitioner
VARIANT
FHIR R4.0.1 Practitioner or Organization resource as JSON
fhir_practitioner_role
VARIANT
FHIR R4.0.1 PractitionerRole for Medicare enrollment (NULL when not enrolled)
fhir_location
VARIANT
FHIR R4.0.1 Location resource with coordinates
created_at
TIMESTAMP
When the source data was last loaded
updated_at
TIMESTAMP
When this record was last refreshed
PROVIDERS_DUPLICATES — bridge table of suspected duplicate pairs
Tiered-confidence duplicate NPI pairs with similarity scores and recommended primary NPI. The forensic view behind PROVIDERS_ENTITIES — use this to audit dedup quality or build custom dedup logic.
Column Reference
All 10 columns:
npi_1
NUMBER
First NPI in the pair
npi_2
NUMBER
Second NPI (always greater than npi_1)
similarity_score
FLOAT
0.0–1.0 weighted similarity score across name, address, phone, and taxonomy
confidence_tier
VARCHAR
Evidence strength for treating the two NPIs as the same entity — see tier values below
entity_class
VARCHAR
individual or organization
recommended_primary
NUMBER
Suggested survivor NPI (lower-numbered = earlier CMS registration)
detection_date
DATE
When the pair was first detected
created_at
TIMESTAMP
When the source data was last loaded
updated_at
TIMESTAMP
When this record was last refreshed
confidence_tier values
confidence_tier valuesHow much trust to place in a pair when collapsing a roster:
high_confidence
Strong evidence the two NPIs are the same real-world provider (exact name + shared address or phone). Safe to use directly for roster deduplication — PROVIDERS_ENTITIES pre-collapses these clusters for you.
possible_match
Partial similarity that warrants a human review step before deduplication. Useful for surfacing dedup candidates in an MDM workflow, but don't merge blindly.
match_type values
match_type valuesThe pattern of similarity that triggered the pair:
exact_match
Name, address, and/or phone match character-for-character on both NPIs. Highest-confidence category.
probable_match
Strong fuzzy similarity (e.g., name + shared practice address or phone). Meaningful overlap but with minor variations.
fuzzy_variant
Name spelling variants sharing location or taxonomy signals — typically hyphen/space/punctuation differences, or common name aliases. Lower precision but surfaces legitimate name changes and data-entry inconsistencies.
PROVIDERS_IDENTIFIERS — other provider identifiers
Other provider identifiers per NPI — PECOS enrollment IDs, Medicaid provider IDs, prior NPIs, state license numbers per NPPES OTHER_PROVIDER_IDENTIFIER_* spec. One row per (provider, identifier). ~2.76M rows.
Key Features:
~2.76M identifier rows across every NPI that filed one
Join
PROVIDER_ID = PROVIDERS.IDIncludes ALL NPIs (canonical + non-canonical) so historical claim joins don't silently drop deactivated providers
id
VARCHAR(36)
Row PK
provider_id
VARCHAR(36)
FK to PROVIDERS.ID
identifier
VARCHAR
The other identifier value
type_code
VARCHAR
NPPES Other Provider Identifier Type Code
type_value
VARCHAR
Human-readable type label
state
VARCHAR
State code where the identifier was issued
issuer
VARCHAR
Issuing authority
created_at / updated_at
TIMESTAMP
Lineage
PROVIDERS_LICENSES — state medical licenses
State medical licenses per provider. One row per (provider, license). ~6.96M rows.
Key Features:
~6.96M license rows
Join
PROVIDER_ID = PROVIDERS.ID
id
VARCHAR(36)
Row PK
provider_id
VARCHAR(36)
FK to PROVIDERS.ID
licenses_number
VARCHAR
License number as filed
state
VARCHAR
State code that issued the license
created_at / updated_at
TIMESTAMP
Lineage
PROVIDERS_TAXONOMIES — per-NPI taxonomy rows
Per-NPI taxonomy rows with the raw NPPES PRIMARY flag preserved. One row per (NPI, taxonomy) — up to 15 taxonomies per provider. ~9.94M rows. The HEATLHCARE_PROVIDER_TAXONOMY_CODE column name reflects the spelling carried over from the source NPPES schema.
The raw PRIMARY flag is sparsely populated (~15% of rows) per NPPES spec. For a dense inferred-primary signal, use PROVIDERS.INFERRED_PRIMARY_TAXONOMY_CODE on the main PROVIDERS table — we use a ROW_NUMBER fallback to pick the best taxonomy when the raw flag is missing.
Key Features:
~9.94M taxonomy rows
Join
PROVIDER_ID = PROVIDERS.IDJoin
HEATLHCARE_PROVIDER_TAXONOMY_CODE = TAXONOMIES.CODEto get classification/specialization names
id
VARCHAR(36)
Row PK
provider_id
VARCHAR(36)
FK to PROVIDERS.ID
taxonomy_id
VARCHAR(36)
FK to TAXONOMIES.ID
heatlhcare_provider_taxonomy_code
VARCHAR
NUCC taxonomy code
primary
BOOLEAN
Raw NPPES primary flag — sparsely populated
created_at / updated_at
TIMESTAMP
Lineage
TAXONOMIES — NUCC taxonomy reference
NUCC (National Uniform Claim Committee) taxonomy code definitions. ~880 rows, updated quarterly. Join from PROVIDERS_TAXONOMIES.HEATLHCARE_PROVIDER_TAXONOMY_CODE or PROVIDERS.PRIMARY_TAXONOMY_CODE to TAXONOMIES.CODE for classification names.
id
VARCHAR
Row PK
code
VARCHAR
10-char NUCC taxonomy code
grouping
VARCHAR
Top-level grouping (e.g., Allopathic & Osteopathic Physicians)
classification
VARCHAR
Mid-level classification (e.g., Internal Medicine)
specialization
VARCHAR
Specialization (e.g., Cardiovascular Disease)
definition
VARCHAR
NUCC's plain-language definition
notes
VARCHAR
Additional NUCC notes
display_name
VARCHAR
Pre-concatenated human-readable name
section
VARCHAR
NUCC spec section reference
created_at / updated_at
TIMESTAMP
Lineage
Data Quality
Standardization
Every address is USPS-validated and geocoded, with AI-driven normalization as a fallback for addresses USPS can't resolve. Taxonomy codes are filled in via inference when NPPES is missing a primary — is_specialty_inferred flags which records were enriched. NPI lifecycle fields (is_active, npi_status) are computed from deactivation and reactivation dates to handle NPPES's sparse status flagging.
Data Freshness
Check when data was last refreshed using PRODUCT_INFO:
Getting Started
Platform Schema Reference
This dataset is available on Snowflake Marketplace. Queries use a schema-only reference — the database is set by the share context:
Snowflake
DWV
DWV.PROVIDERS
Discover Available Data
Start with PRODUCT_INFO for a product-wide snapshot and DATA_DICTIONARY for column-level descriptions on every view.
Foundational Queries
Specialty breakdown with Medicare enrollment rate
Revenue Cycle: Reduce Claim Denials from Bad NPI Data
See original CMS addresses next to USPS-corrected addresses side by side — the records that would have triggered denials on the raw NPPES feed.
Spatial and Demographic Analytics
Market sizing: provider density by metro with ACS demographics
Network adequacy: providers within 10 miles of an anchor NPI (native geo_point)
geo_point)Anchor the search to a real provider's practice location and find every other provider within a radius. Swap anchor_npi for any NPI to re-target the search.
TEFCA and FHIR-Ready Interop
Drop-in FHIR resources for EHR, HIE, and QHIN pipelines
Three FHIR R4.0.1 resources per NPI — a Practitioner or Organization, a PractitionerRole for Medicare enrollment, and a Location. Inspect the VARIANT in Snowsight's JSON viewer to verify conformance.
FHIR bulk export: one resource type per row for NDJSON streaming
Compose a FHIR bulk export bundle — UNION ALL the three resource types into a single NDJSON-shaped stream for QHIN or $export pipelines.
Medicare Network Planning: PECOS-Enrolled Providers in HPSA Areas
Cross-reference PECOS Medicare enrollment with HRSA HPSA shortage designations to surface in-network providers where they're most needed.
Roster Dedup Without Losing Signal
High-confidence duplicate NPI pairs with similarity scores
Tiered-confidence duplicate pairs with match type and recommended primary NPI — safe to use for roster collapse without losing historical claim lineage.
HIPAA Compliance: OIG-Excluded Providers Still Active in NPPES
OIG/LEIE-flagged NPIs that are still showing as active in NPPES — pull this before billing or claim submission.
Who Uses This Data
Common Use Cases
Revenue cycle: reduce first-pass claim denials — Replace raw NPPES with validated, USPS-corrected addresses, verified NPI lifecycle status, and PECOS-confirmed Medicare enrollment to cut denials caused by bad provider master data
Biennial revalidation and HIPAA compliance — OIG/LEIE exclusion flags and revalidation-aware NPI status help meet the 2026 CMS biennial revalidation mandate and reduce exposure to HIPAA fines
TEFCA, QHIN, and HIE interoperability — Query pre-built FHIR R4.0.1
Practitioner,Organization,PractitionerRole, andLocationresources for drop-in TEFCA and bulk-export pipelines without building a FHIR transformerProvider network design for VBC and ACO programs — Build deduplicated provider networks with
PROVIDERS_ENTITIES, cross-reference PECOS enrollment, and filter by HPSA shortage and congressional districtNetwork adequacy modeling and market sizing — Use native
geo_pointGEOGRAPHY withST_DWITHIN/ST_DISTANCEplus ACS demographics (median household income, population, education) and CBSA metro boundariesProvider roster master data management (MDM) —
PROVIDERS_DUPLICATESandPROVIDERS_ENTITIEStogether provide a golden record of unique real-world providers with the signal (OIG, Medicare, quality) preserved across every multi-NPI clusterUnderserved-area provider analytics — HRSA HPSA primary care, mental health, and dental shortage scores plus MUA/MUP designations layered on every provider for rural health, CMMI, and 340B analysis
Related Datasets & Research
This dataset pairs well with:
CMS NPPES Provider Dataset — The raw source registry that this Golden Record is built on. Subscribe to the raw feed if you need the unprocessed relational tables (addresses, licenses, identifiers, taxonomies) for custom research; subscribe to the Golden Record if you want the same provider universe cleaned, deduped, enriched, and FHIR-ready
HRSA Healthcare Resources Dataset — Source of the county-level HPSA, MUA, and AHRF workforce signals used here. Join to HRSA for workforce rate analysis by county
CMS Data Feeds Dataset — Complete CMS data catalog including Care Compare and PECOS source feeds
TEFCA / QHIN implementations — Drop-in FHIR R4.0.1 resources compatible with the ASTP TEFCA Common Agreement and the QHIN Directory schema
Frequently Asked Questions
How is the NPPES Provider Golden Record different from the raw CMS NPPES Provider Dataset? The raw dataset is the unprocessed NPI registry as CMS publishes it weekly — addresses with typos and outdated ZIPs, no deduplication, no Medicare enrollment, no OIG exclusion flags, no FHIR mapping, no geo enrichment. The Golden Record is that same universe after USPS address validation (around 23% of addresses get corrected), geocoding with census and ACS enrichment, PECOS Medicare enrollment joins, OIG/LEIE exclusion checks, CMS Care Compare hospital and group affiliations, NUCC taxonomy gap-filling, tiered-confidence duplicate detection, and FHIR R4.0.1 resource generation. Most NPPES consumers eventually write their own cleanup pipeline — this dataset replaces that pipeline.
How often is the data updated? Weekly, automatically. Our pipeline monitors CMS for each new NPPES release and loads the updated, validated, and enriched data within days of publication — no manual intervention needed. Coverage spans every active and deactivated NPI in the registry (9M+ providers).
What FHIR version is supported?
FHIR R4.0.1. Every row in PROVIDERS_FHIR contains a US Core-aligned Practitioner or Organization resource, a PractitionerRole for Medicare enrollment (NULL when not PECOS-enrolled), and a Location resource. The resources use standard http://hl7.org/fhir/sid/us-npi identifier systems and can be dropped directly into a QHIN, EHR, or HIE pipeline.
Can I use the Snowflake GEOGRAPHY column for spatial queries?
Yes. PROVIDERS_ADDRESSES.geo_point is a native Snowflake GEOGRAPHY type built from the geocoded lat/long. Use it with ST_DISTANCE, ST_DWITHIN, ST_CONTAINS, and other Snowflake spatial functions for proximity, radius, and polygon containment queries without converting from lat/long on every query.
How do I query "unique providers" without double-counting multi-NPI entities?
Use PROVIDERS_ENTITIES. It pre-collapses multi-NPI duplicate clusters into one row per real-world entity while preserving any_oig_excluded and any_medicare_enrolled rollups so compliance signals aren't silently deduped away. Alternatively, filter PROVIDERS with is_canonical = TRUE for a one-row-per-entity view that keeps the full column set.
Are inactive or deactivated NPIs included?
Yes. NPPES never recycles an NPI and historical claims still reference deactivated NPIs, so the dataset preserves them. Use is_active = TRUE and is_canonical = TRUE for a clean active-provider roster, or query without those filters for historical analysis.
What's the pricing and is there a trial?
Full access is $800/month on Snowflake Marketplace. A 14-day free trial includes 1,500 NPIs that are consistent across every view — the trial set is pre-materialized for fast evaluation queries. Metadata views (PRODUCT_INFO, DATA_DICTIONARY) are always fully available.
Get Started
Ready to access the NPPES Provider Golden Record?
Questions? Contact our team for a walkthrough.
Snowflake
Free Trial
About Dataplex
Dataplex Consulting & Data Products delivers turnkey, analytics-ready data products that make complex public and commercial data easy to use across modern data platforms. Our data pipelines include automated quality checks and active monitoring to ensure timely, reliable, and well-structured data that is ready for downstream analytics, machine learning, and operational use.
In addition to data products, Dataplex provides data engineering and analytics consulting services to organizations of all sizes. We bring deep, hands-on experience supporting both early-stage companies and large enterprises, helping teams build scalable data platforms, improve data reliability, and become more data-driven.
Last updated

