CMS Medicaid Provider Spending Dataset

About the Dataset

The CMS Medicaid Provider Spending Dataset delivers 200+ million outpatient Medicaid claims aggregated by provider, procedure, and month across 50+ states and territories, and is updated automatically when CMS publishes new T-MSIS (Transformed Medicaid Statistical Information System) data. This analytics-ready product enriches spending records with NPI provider demographics from NPPES and state-level data quality ratings from the CMS DQ Atlas. It is designed for health policy analysts, Medicaid program administrators, and healthcare consultants who need provider-level Medicaid expenditure data without months of data wrangling.

Quick Access

Tables: PROVIDER_SPENDING, BILLING_PROVIDERS, SERVICING_PROVIDERS, HCPCS_CODES, STATE_QUALITY

Sources: CMS T-MSIS Analytic Files, NPPES, CMS DQ Atlas

Automatic Updates: Pipeline monitors CMS for new releases and loads data within days of publication

Coverage: 50+ US states and territories, 2018-present (7+ years)

Overview

The CMS Medicaid Provider Spending Dataset provides comprehensive access to Medicaid outpatient spending data including:

  • PROVIDER_SPENDING (spending) - 200M+ claims aggregated by billing provider NPI, servicing provider NPI, HCPCS procedure code, and month — with total paid, claims, and beneficiary counts

  • BILLING_PROVIDERS (billing_providers) - 600K+ billing provider demographics including name, address, specialty taxonomy, and NPI assignment date

  • SERVICING_PROVIDERS (servicing_providers) - 1.5M+ servicing provider demographics with the same NPI-enriched fields

  • HCPCS_CODES (hcpcs_codes) - 20K+ HCPCS Level II and CPT procedure code descriptions from CMS reference files

  • STATE_QUALITY (state_quality) - CMS DQ Atlas state-level data quality ratings for T-MSIS expenditure data across 54 states/territories

Metadata Tables

Every Dataplex data product includes these standard metadata tables:

| Table | Purpose |
| --- | --- |
| FEEDS | Dataset catalog: available tables, descriptions, update dates |
| FEEDS_FILES | Batch load history with is_latest flag for data freshness |
| CHANGELOG | Change log: data loads, schema changes, corrections |
| DATA_DICTIONARY | Column descriptions for all tables |

Entity Relationship Diagram

[Figure: CMS Medicaid Provider Spending Entity Relationship Diagram]

PROVIDER_SPENDING is the central fact table. Join to BILLING_PROVIDERS via BILLING_PROVIDER_NPI_NUM = NPI, to SERVICING_PROVIDERS via SERVICING_PROVIDER_NPI_NUM = NPI, and to HCPCS_CODES via HCPCS_CODE (LEFT JOIN — ~83% coverage). STATE_QUALITY joins to BILLING_PROVIDERS via STATE = STATE_CODE for quality-adjusted analysis. All tables link to FEEDS and FEEDS_FILES via feed_id and feeds_files_id for data lineage.
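The join paths described above can be sketched as follows. This is a minimal illustration rather than the product's reference query: it runs the SQL against an in-memory SQLite database seeded with made-up rows (the NPIs, names, measure names, and amounts are all hypothetical), and table names are left unqualified where a Snowflake query would use the DWV schema. STATE_QUALITY appears to hold one row per state and measure, so the sketch collapses it to one row per state before joining; that grain is an assumption.

```python
import sqlite3

# Toy stand-ins for the real tables; every value below is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROVIDER_SPENDING (
    BILLING_PROVIDER_NPI_NUM TEXT, SERVICING_PROVIDER_NPI_NUM TEXT,
    HCPCS_CODE TEXT, CLAIM_FROM_DATE TEXT, TOTAL_PAID REAL);
CREATE TABLE BILLING_PROVIDERS (NPI TEXT, ORG_NAME TEXT, STATE TEXT);
CREATE TABLE SERVICING_PROVIDERS (NPI TEXT, LAST_NAME TEXT);
CREATE TABLE HCPCS_CODES (HCPCS_CODE TEXT, DESCRIPTION TEXT);
CREATE TABLE STATE_QUALITY (
    STATE_CODE TEXT, MEASURE_NAME TEXT, OVERALL_QUALITY TEXT);

INSERT INTO PROVIDER_SPENDING VALUES
    ('1000000001', '2000000001', '99213', '2023-01-01', 1500.0),
    ('1000000001', '2000000001', 'D0120', '2023-01-01',  400.0);
INSERT INTO BILLING_PROVIDERS VALUES ('1000000001', 'EXAMPLE CLINIC', 'OH');
INSERT INTO SERVICING_PROVIDERS VALUES ('2000000001', 'DOE');
INSERT INTO HCPCS_CODES VALUES ('99213', 'Office visit, established patient');
INSERT INTO STATE_QUALITY VALUES
    ('OH', 'Measure A', 'Low Concern'),
    ('OH', 'Measure B', 'Low Concern');
""")

JOIN_QUERY = """
SELECT s.HCPCS_CODE, h.DESCRIPTION, b.ORG_NAME, v.LAST_NAME,
       q.OVERALL_QUALITY, s.TOTAL_PAID
FROM PROVIDER_SPENDING s
JOIN BILLING_PROVIDERS b   ON s.BILLING_PROVIDER_NPI_NUM = b.NPI
JOIN SERVICING_PROVIDERS v ON s.SERVICING_PROVIDER_NPI_NUM = v.NPI
-- LEFT JOIN keeps spending rows whose code has no description.
LEFT JOIN HCPCS_CODES h    ON s.HCPCS_CODE = h.HCPCS_CODE
-- Collapse STATE_QUALITY to one row per state to avoid duplicating rows.
LEFT JOIN (SELECT DISTINCT STATE_CODE, OVERALL_QUALITY
           FROM STATE_QUALITY) q ON b.STATE = q.STATE_CODE
ORDER BY s.HCPCS_CODE
"""

rows = conn.execute(JOIN_QUERY).fetchall()
for row in rows:
    print(row)
```

In the toy data the dental code D0120 has no match in HCPCS_CODES, so its DESCRIPTION comes back as NULL instead of the row being dropped.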

Medicaid Provider Spending Tables

Provider Spending (PROVIDER_SPENDING)

Medicaid outpatient claims aggregated by billing provider, servicing provider, HCPCS procedure code, and month. The core fact table with 200M+ rows spanning 7+ years of monthly data.

Key Features:

  • 200M+ rows — one per billing NPI + servicing NPI + HCPCS code + month

  • Total paid, total claims, unique beneficiaries per aggregation

  • Computed metrics: average paid per claim and per beneficiary

  • CLAIM_FROM_DATE enables native date arithmetic and time-series functions

  • Privacy floor: minimum 12 beneficiaries per row (CMS suppression)

Column Reference

| Column | Type | Description |
| --- | --- | --- |
| BILLING_PROVIDER_NPI_NUM | VARCHAR | National Provider Identifier of the billing provider |
| SERVICING_PROVIDER_NPI_NUM | VARCHAR | National Provider Identifier of the servicing provider |
| HCPCS_CODE | VARCHAR | Healthcare Common Procedure Coding System Level II code |
| CLAIM_FROM_MONTH | VARCHAR | Month of claim service (YYYY-MM format) |
| CLAIM_FROM_DATE | DATE | First day of the claim month (enables date arithmetic) |
| TOTAL_UNIQUE_BENEFICIARIES | NUMBER | Number of unique Medicaid beneficiaries served (min 12 for privacy) |
| TOTAL_CLAIMS | NUMBER | Total number of Medicaid claims submitted |
| TOTAL_PAID | NUMBER | Total Medicaid amount paid in USD |
| AVG_PAID_PER_CLAIM | NUMBER | Average payment per claim (TOTAL_PAID / TOTAL_CLAIMS) |
| AVG_PAID_PER_BENEFICIARY | NUMBER | Average payment per beneficiary (TOTAL_PAID / TOTAL_UNIQUE_BENEFICIARIES) |
| feed_id | VARCHAR | FK to FEEDS; identifies which dataset this row belongs to |
| feeds_files_id | VARCHAR | FK to FEEDS_FILES; identifies which batch loaded this data |
| created_at | TIMESTAMP | When the source data was loaded into the warehouse |
| updated_at | TIMESTAMP | When dbt last rebuilt this table |


Billing Providers (BILLING_PROVIDERS)

Billing provider demographics enriched with NPI data — 600K+ unique NPIs with name, address, specialty, and credentials.

Key Features:

  • 600K+ unique billing NPIs

  • NPI-linked demographics: name, address, phone, taxonomy code

  • Entity type distinguishes individuals (1) from organizations (2)

  • State field for geographic analysis (mostly 2-letter codes; ~0.01% non-standard from NPPES)

Column Reference

| Column | Type | Description |
| --- | --- | --- |
| NPI | VARCHAR | National Provider Identifier |
| ENTITY_TYPE | VARCHAR | 1 = Individual provider, 2 = Organization |
| ORG_NAME | VARCHAR | Organization name (NULL for individual providers) |
| LAST_NAME | VARCHAR | Provider last name (NULL for organizations) |
| FIRST_NAME | VARCHAR | Provider first name (NULL for organizations) |
| MIDDLE_NAME | VARCHAR | Provider middle name |
| CREDENTIAL | VARCHAR | Provider credential (MD, DO, NP, etc.) |
| ADDRESS_LINE1 | VARCHAR | Practice location street address |
| CITY | VARCHAR | Practice location city |
| STATE | VARCHAR | Practice location state (2-letter code) |
| ZIP | VARCHAR | Practice location ZIP code (5-digit) |
| PHONE | VARCHAR | Practice phone number |
| SEX | VARCHAR | Provider sex (individuals only) |
| TAXONOMY_CODE | VARCHAR | Primary healthcare provider taxonomy code |
| ENUMERATION_DATE | DATE | Date NPI was assigned |
| feed_id | VARCHAR | FK to FEEDS; identifies which dataset this row belongs to |
| feeds_files_id | VARCHAR | FK to FEEDS_FILES; identifies which batch loaded this data |
| created_at | TIMESTAMP | When the source data was loaded into the warehouse |
| updated_at | TIMESTAMP | When dbt last rebuilt this table |


Servicing Providers (SERVICING_PROVIDERS)

Servicing provider demographics — 1.5M+ unique NPIs. Same schema as BILLING_PROVIDERS. The servicing provider is the individual or organization that actually performed the service.

Key Features:

  • 1.5M+ unique servicing NPIs

  • Same NPI-linked fields as BILLING_PROVIDERS

  • Larger population because servicing includes both billing and non-billing providers

Column Reference

Same columns as BILLING_PROVIDERS above.


HCPCS Codes (HCPCS_CODES)

HCPCS Level II and CPT procedure/service code descriptions. Combines CMS Physician Fee Schedule (CPT codes) and CMS HCPCS quarterly file (Level II alpha-prefix codes).

Key Features:

  • 20K+ procedure codes with descriptions

  • Covers ~83% of distinct codes in PROVIDER_SPENDING

  • Remaining ~17% are temporary codes, state-specific codes, and ADA dental codes — use LEFT JOIN

Column Reference

| Column | Type | Description |
| --- | --- | --- |
| HCPCS_CODE | VARCHAR | HCPCS Level II (alpha prefix A-V) or CPT (numeric) code |
| DESCRIPTION | VARCHAR | Short description of the procedure or service |
| feed_id | VARCHAR | FK to FEEDS; identifies which dataset this row belongs to |
| feeds_files_id | VARCHAR | FK to FEEDS_FILES; identifies which batch loaded this data |
| created_at | TIMESTAMP | When the source data was loaded into the warehouse |
| updated_at | TIMESTAMP | When dbt last rebuilt this table |


State Quality (STATE_QUALITY)

CMS DQ Atlas state-level data quality ratings for T-MSIS spending data. CMS rates most states as "Unusable" under their stringent DQ Atlas methodology — this reflects CMS's quality standards, not that the spending data is invalid.

Key Features:

  • 2,400+ quality assessments across states and topic areas

  • OVERALL_QUALITY rollup across 4 spending-relevant topics

  • Use to flag states with known CMS-identified reporting gaps

  • Source: download.medicaid.gov DQ Atlas bulk CSV

Column Reference

| Column | Type | Description |
| --- | --- | --- |
| STATE_CODE | VARCHAR | Two-letter US state/territory code |
| STATE_NAME | VARCHAR | Full state/territory name |
| MEASURE_NAME | VARCHAR | CMS DQ Atlas measure name (e.g., 'IP Stays', 'Enrollment Counts') |
| TOPIC_AREA | VARCHAR | DQ Atlas topic area grouping |
| RATING | VARCHAR | Quality rating for this state + measure (Low/Medium/High Concern, Unusable) |
| OVERALL_QUALITY | VARCHAR | Worst-of quality rollup across spending-relevant DQ Atlas topics |
| feed_id | VARCHAR | FK to FEEDS; identifies which dataset this row belongs to |
| feeds_files_id | VARCHAR | FK to FEEDS_FILES; identifies which batch loaded this data |
| created_at | TIMESTAMP | When the source data was loaded into the warehouse |
| updated_at | TIMESTAMP | When dbt last rebuilt this table |

Data Quality

Standardization

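  • Data columns use UPPERCASE naming consistent with Snowflake conventions (the lineage and audit columns feed_id, feeds_files_id, created_at, and updated_at are listed in lowercase in the column references)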

  • CLAIM_FROM_DATE computed from CLAIM_FROM_MONTH for native date arithmetic

  • AVG_PAID_PER_CLAIM and AVG_PAID_PER_BENEFICIARY pre-computed for convenience

  • Provider NPI fields validated as 10-digit identifiers

  • STATE_QUALITY filtered to 4 spending-relevant topic areas with OVERALL_QUALITY rollup

  • HCPCS_CODES merged from 3 CMS reference sources with deduplication

Data Freshness

Check when data was last updated:
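A minimal freshness check, using only columns documented on this page, could look like the sketch below. It reads the most recent claim month and load timestamp from PROVIDER_SPENDING; FEEDS_FILES presumably carries load timestamps too, but its columns are not listed here, so the sketch does not rely on them. The SQL runs against an in-memory SQLite table with made-up rows, and the timestamps shown are hypothetical.

```python
import sqlite3

# Toy copy of PROVIDER_SPENDING with only the columns the check needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROVIDER_SPENDING (
    CLAIM_FROM_DATE TEXT, TOTAL_PAID REAL, created_at TEXT);
INSERT INTO PROVIDER_SPENDING VALUES
    ('2024-11-01', 100.0, '2025-03-02 08:00:00'),
    ('2024-12-01', 250.0, '2025-03-02 08:00:00');
""")

FRESHNESS_QUERY = """
SELECT MAX(CLAIM_FROM_DATE) AS latest_claim_month,
       MAX(created_at)      AS last_loaded_at
FROM PROVIDER_SPENDING
"""

latest_claim_month, last_loaded_at = conn.execute(FRESHNESS_QUERY).fetchone()
print("Latest claim month:", latest_claim_month)
print("Last load time:", last_loaded_at)
```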

How to Query CMS Medicaid Provider Spending Data

Platform Schema Reference

This dataset is available on both Snowflake and Databricks. Queries use schema-only references — the database is already set by the share or catalog context:

| Platform | Schema | Example |
| --- | --- | --- |
| Snowflake | DWV | DWV.PROVIDER_SPENDING |
| Databricks | cms_tmsis_provider_spending_dwv | cms_tmsis_provider_spending_dwv.provider_spending |

Discover Available Data

Start with the FEEDS table to see what's available, and FEEDS_FILES to understand data freshness and load history.

Working with Data Lineage

Every data row links to FEEDS_FILES via feeds_files_id, which tells you exactly which batch loaded that data. Use this to filter to the current data version or trace any row back to its source load.
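Filtering to the current data version might look like the following sketch. It assumes FEEDS_FILES is keyed by a feeds_files_id column and that is_latest behaves as a boolean; the key name and the batch identifiers are assumptions, since the FEEDS_FILES schema is not listed on this page. The SQL runs against in-memory SQLite tables with made-up rows.

```python
import sqlite3

# Toy tables: two batches, only the second is flagged as latest.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FEEDS_FILES (feeds_files_id TEXT, is_latest INTEGER);
CREATE TABLE PROVIDER_SPENDING (
    BILLING_PROVIDER_NPI_NUM TEXT, TOTAL_PAID REAL, feeds_files_id TEXT);
INSERT INTO FEEDS_FILES VALUES ('batch-001', 0), ('batch-002', 1);
INSERT INTO PROVIDER_SPENDING VALUES
    ('1000000001', 100.0, 'batch-001'),
    ('1000000001', 120.0, 'batch-002'),
    ('1000000002',  80.0, 'batch-002');
""")

LATEST_BATCH_QUERY = """
SELECT s.BILLING_PROVIDER_NPI_NUM, s.TOTAL_PAID, s.feeds_files_id
FROM PROVIDER_SPENDING s
JOIN FEEDS_FILES f ON s.feeds_files_id = f.feeds_files_id
WHERE f.is_latest            -- keep only rows loaded by the current batch
ORDER BY s.BILLING_PROVIDER_NPI_NUM
"""

current_rows = conn.execute(LATEST_BATCH_QUERY).fetchall()
for row in current_rows:
    print(row)
```

The third column of each result row is the batch identifier, which is what lets any row be traced back to its source load.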

Top Billing Providers by State

Identify the highest-spending Medicaid billing providers in any state.
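One possible shape for this query is sketched below: sum TOTAL_PAID per billing NPI, filter on the provider's STATE, and sort descending. The helper name top_billing_providers, the state filter value, and all seeded rows are hypothetical, and the SQL runs against in-memory SQLite tables rather than the live share.

```python
import sqlite3

# Toy copies of the two tables; NPIs, names, and amounts are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BILLING_PROVIDERS (
    NPI TEXT, ORG_NAME TEXT, LAST_NAME TEXT, FIRST_NAME TEXT, STATE TEXT);
CREATE TABLE PROVIDER_SPENDING (
    BILLING_PROVIDER_NPI_NUM TEXT, CLAIM_FROM_DATE TEXT, TOTAL_PAID REAL);
INSERT INTO BILLING_PROVIDERS VALUES
    ('1000000001', 'EXAMPLE CLINIC', NULL, NULL, 'OH'),
    ('1000000002', NULL, 'DOE', 'JANE', 'OH'),
    ('1000000003', 'OTHER HEALTH', NULL, NULL, 'TX');
INSERT INTO PROVIDER_SPENDING VALUES
    ('1000000001', '2023-01-01',  500.0),
    ('1000000001', '2023-02-01',  700.0),
    ('1000000002', '2023-01-01',  900.0),
    ('1000000003', '2023-01-01', 5000.0);
""")

TOP_PROVIDERS_QUERY = """
SELECT b.NPI,
       -- Organizations have ORG_NAME; individuals fall back to "LAST, FIRST".
       COALESCE(b.ORG_NAME, b.LAST_NAME || ', ' || b.FIRST_NAME) AS provider_name,
       SUM(s.TOTAL_PAID) AS provider_total_paid
FROM PROVIDER_SPENDING s
JOIN BILLING_PROVIDERS b ON s.BILLING_PROVIDER_NPI_NUM = b.NPI
WHERE b.STATE = ?
GROUP BY b.NPI, b.ORG_NAME, b.LAST_NAME, b.FIRST_NAME
ORDER BY provider_total_paid DESC
LIMIT ?
"""

def top_billing_providers(state, limit=10):
    """Return (NPI, name, total paid) for the highest-paid billing NPIs in a state."""
    return conn.execute(TOP_PROVIDERS_QUERY, (state, limit)).fetchall()

print(top_billing_providers("OH"))
```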

Track Medicaid spending and provider participation over 7+ years of monthly data, including pandemic impact periods.
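A trend query along these lines groups by CLAIM_FROM_DATE and counts distinct billing NPIs as a rough participation measure. The sketch below is illustrative only: it uses an in-memory SQLite table with made-up rows, and the month-over-month percentage is computed in Python for simplicity rather than with a window function.

```python
import sqlite3

# Toy monthly rows; NPIs and amounts are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROVIDER_SPENDING (
    BILLING_PROVIDER_NPI_NUM TEXT, CLAIM_FROM_DATE TEXT, TOTAL_PAID REAL);
INSERT INTO PROVIDER_SPENDING VALUES
    ('1000000001', '2020-03-01', 100.0),
    ('1000000002', '2020-03-01', 200.0),
    ('1000000001', '2020-04-01', 150.0);
""")

MONTHLY_TREND_QUERY = """
SELECT CLAIM_FROM_DATE AS claim_month,
       SUM(TOTAL_PAID) AS monthly_paid,
       COUNT(DISTINCT BILLING_PROVIDER_NPI_NUM) AS active_billing_providers
FROM PROVIDER_SPENDING
GROUP BY CLAIM_FROM_DATE
ORDER BY CLAIM_FROM_DATE
"""

trend = conn.execute(MONTHLY_TREND_QUERY).fetchall()

# Month-over-month percent change in paid amount, computed client-side.
changes = [
    (cur[0], round(100.0 * (cur[1] - prev[1]) / prev[1], 1))
    for prev, cur in zip(trend, trend[1:])
]

for month, paid, providers in trend:
    print(month, paid, providers)
print(changes)
```

The sketch deliberately avoids summing TOTAL_UNIQUE_BENEFICIARIES across rows, since the same beneficiary can appear under several providers or procedure codes and a sum would likely overcount.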

Spending by HCPCS Procedure Category

Analyze which procedures drive the most Medicaid spending.
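A procedure-level rollup could be sketched as below. It uses a LEFT JOIN so that codes without a description still count toward totals, and it derives paid-per-claim from the summed columns rather than averaging the pre-computed AVG_PAID_PER_CLAIM values, which would weight small and large rows equally. The seeded rows, the description text, and the '(no description)' placeholder are made up, and the SQL runs against in-memory SQLite tables.

```python
import sqlite3

# Toy tables; D0120 intentionally has no entry in HCPCS_CODES.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROVIDER_SPENDING (
    HCPCS_CODE TEXT, TOTAL_PAID REAL, TOTAL_CLAIMS INTEGER);
CREATE TABLE HCPCS_CODES (HCPCS_CODE TEXT, DESCRIPTION TEXT);
INSERT INTO PROVIDER_SPENDING VALUES
    ('99213', 1000.0, 20), ('99213', 500.0, 10), ('D0120', 300.0, 10);
INSERT INTO HCPCS_CODES VALUES ('99213', 'Office visit, established patient');
""")

SPENDING_BY_CODE_QUERY = """
SELECT s.HCPCS_CODE,
       COALESCE(h.DESCRIPTION, '(no description)') AS code_description,
       SUM(s.TOTAL_PAID)   AS code_total_paid,
       SUM(s.TOTAL_CLAIMS) AS code_total_claims,
       SUM(s.TOTAL_PAID) / SUM(s.TOTAL_CLAIMS) AS paid_per_claim
FROM PROVIDER_SPENDING s
LEFT JOIN HCPCS_CODES h ON s.HCPCS_CODE = h.HCPCS_CODE
GROUP BY s.HCPCS_CODE, h.DESCRIPTION
ORDER BY code_total_paid DESC
"""

by_code = conn.execute(SPENDING_BY_CODE_QUERY).fetchall()
for row in by_code:
    print(row)
```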

State Quality-Adjusted Spending Analysis

Cross-reference spending totals with CMS DQ Atlas quality ratings to identify states where data quality may affect analysis reliability.
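The sketch below shows one way to attach a quality flag to state spending totals. STATE_QUALITY seems to carry one row per state and measure, so joining it directly would multiply spending rows; the sketch first reduces it to one row per state, assuming OVERALL_QUALITY is constant within a state. All seeded values, including the measure names, are hypothetical, and the SQL runs against in-memory SQLite tables.

```python
import sqlite3

# Toy tables; two measures per state so a naive join would double totals.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BILLING_PROVIDERS (NPI TEXT, STATE TEXT);
CREATE TABLE PROVIDER_SPENDING (BILLING_PROVIDER_NPI_NUM TEXT, TOTAL_PAID REAL);
CREATE TABLE STATE_QUALITY (
    STATE_CODE TEXT, MEASURE_NAME TEXT, RATING TEXT, OVERALL_QUALITY TEXT);
INSERT INTO BILLING_PROVIDERS VALUES ('1000000001', 'OH'), ('1000000003', 'TX');
INSERT INTO PROVIDER_SPENDING VALUES
    ('1000000001', 1200.0), ('1000000003', 5000.0);
INSERT INTO STATE_QUALITY VALUES
    ('OH', 'Measure A', 'Low Concern', 'Low Concern'),
    ('OH', 'Measure B', 'Low Concern', 'Low Concern'),
    ('TX', 'Measure A', 'Low Concern', 'Unusable'),
    ('TX', 'Measure B', 'Unusable',    'Unusable');
""")

QUALITY_ADJUSTED_QUERY = """
WITH state_rollup AS (
    -- One row per state so the join below cannot multiply spending rows.
    SELECT DISTINCT STATE_CODE, OVERALL_QUALITY FROM STATE_QUALITY
)
SELECT b.STATE, q.OVERALL_QUALITY, SUM(s.TOTAL_PAID) AS state_total_paid
FROM PROVIDER_SPENDING s
JOIN BILLING_PROVIDERS b ON s.BILLING_PROVIDER_NPI_NUM = b.NPI
LEFT JOIN state_rollup q ON b.STATE = q.STATE_CODE
GROUP BY b.STATE, q.OVERALL_QUALITY
ORDER BY b.STATE
"""

by_state = conn.execute(QUALITY_ADJUSTED_QUERY).fetchall()
for row in by_state:
    print(row)
```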

Billing Outlier Detection

Find providers billing significantly above state medians for specific procedures — useful for fraud detection and billing compliance.
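As a rough illustration of the idea, the sketch below computes paid-per-claim for each billing provider within a state and procedure code, then flags providers whose value exceeds a multiple of the group median. The 2.0 threshold, the find_outliers helper, and all seeded rows are hypothetical choices for the example, not a recommended screening rule. The median is taken in Python because SQLite has no built-in median aggregate; on Snowflake the MEDIAN aggregate could do this step in SQL.

```python
import sqlite3
from statistics import median

# Toy data: five OH providers billing 99213, one far above the rest.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BILLING_PROVIDERS (NPI TEXT, STATE TEXT);
CREATE TABLE PROVIDER_SPENDING (
    BILLING_PROVIDER_NPI_NUM TEXT, HCPCS_CODE TEXT,
    TOTAL_PAID REAL, TOTAL_CLAIMS INTEGER);
INSERT INTO BILLING_PROVIDERS VALUES
    ('1000000001', 'OH'), ('1000000002', 'OH'), ('1000000003', 'OH'),
    ('1000000004', 'OH'), ('1000000005', 'OH');
INSERT INTO PROVIDER_SPENDING VALUES
    ('1000000001', '99213',  500.0, 10),
    ('1000000002', '99213',  520.0, 10),
    ('1000000003', '99213',  480.0, 10),
    ('1000000004', '99213',  550.0, 10),
    ('1000000005', '99213', 1500.0, 10);
""")

PER_PROVIDER_QUERY = """
SELECT b.STATE, s.HCPCS_CODE, s.BILLING_PROVIDER_NPI_NUM,
       SUM(s.TOTAL_PAID) / SUM(s.TOTAL_CLAIMS) AS paid_per_claim
FROM PROVIDER_SPENDING s
JOIN BILLING_PROVIDERS b ON s.BILLING_PROVIDER_NPI_NUM = b.NPI
GROUP BY b.STATE, s.HCPCS_CODE, s.BILLING_PROVIDER_NPI_NUM
"""

def find_outliers(ratio_threshold=2.0):
    """Return (state, code, NPI, ratio to median) for providers above the threshold."""
    groups = {}
    for state, code, npi, ppc in conn.execute(PER_PROVIDER_QUERY):
        groups.setdefault((state, code), []).append((npi, ppc))

    outliers = []
    for (state, code), members in groups.items():
        group_median = median(ppc for _, ppc in members)
        for npi, ppc in members:
            if group_median > 0 and ppc > ratio_threshold * group_median:
                outliers.append((state, code, npi, round(ppc / group_median, 2)))
    return outliers

print(find_outliers())
```

A flagged provider is only a candidate for review; case mix, specialty, and the state's data quality rating can all explain a high ratio.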

Tracking Data Changes Over Time

FEEDS_FILES records every batch load with row_count_delta showing what changed. Use this to monitor source data updates.

Who Uses This Data

Common Use Cases

  • Medicaid billing outlier detection — Identify providers billing significantly above state medians by procedure code and geography, flagging potential fraud or billing errors

  • Medicaid spending trend analysis — Track spending patterns across 84 months (2018-2024) including COVID-19 pandemic impact on Medicaid utilization and costs

  • Cross-state Medicaid comparisons — Compare per-provider and per-procedure spending across 50+ states with quality-adjusted analysis using CMS DQ Atlas ratings

  • Provider network analysis — Map billing-to-servicing provider relationships to identify coverage gaps, network concentration, and referral patterns

  • HCPCS procedure cost benchmarking — Benchmark procedure-level costs against state and national medians for rate-setting and contract negotiation

  • Grant writing and policy research — Access structured Medicaid expenditure data for FQHC grant applications, state Medicaid program evaluations, and federal policy analysis

This dataset pairs well with:

  • CMS NPPES Provider Dataset — Extended NPI provider attributes including additional practice locations, other names, and full taxonomy classification details

  • HRSA Healthcare Resources — County-level healthcare workforce and shortage area data for correlating spending patterns with provider supply

  • CMS Data Feeds Dataset — Medicare physician spending comparisons via the Medicare Provider Utilization and Payment Data feeds

Frequently Asked Questions

What states are included in the T-MSIS provider spending data?
All 50 US states plus DC, Puerto Rico, US Virgin Islands, and Guam, for 54 jurisdictions in total. Coverage varies by year because states onboarded to T-MSIS between 2015 and 2020. By 2020, all states report through T-MSIS.

How often is the data updated?
CMS publishes new T-MSIS spending data approximately annually. Our pipeline automatically monitors for new releases and loads updated data within days of publication, with no manual intervention needed. Coverage currently spans 2018-present and grows with each CMS release.

What does "Unusable" mean in STATE_QUALITY?
CMS's DQ Atlas applies stringent quality thresholds to T-MSIS data. Most states (~50 of 54) receive an "Unusable" rating on at least one spending-related measure. This reflects CMS's internal quality standards; it does not mean the spending data is invalid or should be discarded. Use OVERALL_QUALITY to identify states with known reporting gaps and adjust analysis accordingly.

Why do only ~83% of HCPCS codes have descriptions?
The HCPCS_CODES table combines CMS's Physician Fee Schedule (CPT codes) and HCPCS Level II quarterly file. The remaining ~17% are temporary local codes (e.g., T-codes, S-codes), state-specific codes, and ADA dental codes not published in national CMS reference files. Always use LEFT JOIN when joining PROVIDER_SPENDING to HCPCS_CODES.

| Platform | Action |
| --- | --- |
| Snowflake | Coming soon: listing in preparation |
