CMS Medicaid Provider Spending Dataset
About the Dataset
The CMS Medicaid Provider Spending Dataset delivers 200+ million outpatient Medicaid claims aggregated by provider, procedure, and month across 50+ states and territories, and it is updated automatically when CMS publishes new T-MSIS (Transformed Medicaid Statistical Information System) data. This analytics-ready product enriches spending records with NPI provider demographics from NPPES and state-level data quality ratings from the CMS DQ Atlas. It is designed for health policy analysts, Medicaid program administrators, and healthcare consultants who need provider-level Medicaid expenditure intelligence without months of data wrangling.
Coming Soon: This product is being prepared for marketplace listing.
Quick Access
Tables: PROVIDER_SPENDING, BILLING_PROVIDERS, SERVICING_PROVIDERS, HCPCS_CODES, STATE_QUALITY
Sources: CMS T-MSIS Analytic Files, NPPES, CMS DQ Atlas
Automatic Updates: Pipeline monitors CMS for new releases and loads data within days of publication
Coverage: 50+ US states and territories, 2018-present (7+ years)
Overview
The CMS Medicaid Provider Spending Dataset provides comprehensive access to Medicaid outpatient spending data including:
PROVIDER_SPENDING (spending) - 200M+ claims aggregated by billing provider NPI, servicing provider NPI, HCPCS procedure code, and month — with total paid, claims, and beneficiary counts
BILLING_PROVIDERS (billing_providers) - 600K+ billing provider demographics including name, address, specialty taxonomy, and NPI assignment date
SERVICING_PROVIDERS (servicing_providers) - 1.5M+ servicing provider demographics with the same NPI-enriched fields
HCPCS_CODES (hcpcs_codes) - 20K+ HCPCS Level II and CPT procedure code descriptions from CMS reference files
STATE_QUALITY (state_quality) - CMS DQ Atlas state-level data quality ratings for T-MSIS expenditure data across 54 states/territories
Metadata Tables
Every Dataplex data product includes these standard metadata tables:
FEEDS - Dataset catalog: available tables, descriptions, update dates
FEEDS_FILES - Batch load history with is_latest flag for data freshness
CHANGELOG - Change log: data loads, schema changes, corrections
DATA_DICTIONARY - Column descriptions for all tables
Entity Relationship Diagram

PROVIDER_SPENDING is the central fact table. Join to BILLING_PROVIDERS via BILLING_PROVIDER_NPI_NUM = NPI, to SERVICING_PROVIDERS via SERVICING_PROVIDER_NPI_NUM = NPI, and to HCPCS_CODES via HCPCS_CODE (LEFT JOIN — ~83% coverage). STATE_QUALITY joins to BILLING_PROVIDERS via STATE = STATE_CODE for quality-adjusted analysis. All tables link to FEEDS and FEEDS_FILES via feed_id and feeds_files_id for data lineage.
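The HCPCS join described above can be checked with a short coverage query. The sketch below is a minimal illustration, not the product's own tooling: it uses Python's built-in sqlite3 module as a local stand-in for the shared DWV schema, and every row is made up (the code X9999 and both descriptions are hypothetical). The SQL string itself uses only documented table and column names.

```python
import sqlite3

# Toy in-memory stand-in for the shared schema; every row below is made up.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS DWV")
conn.executescript("""
CREATE TABLE DWV.PROVIDER_SPENDING (HCPCS_CODE TEXT, TOTAL_PAID REAL);
CREATE TABLE DWV.HCPCS_CODES (HCPCS_CODE TEXT, DESCRIPTION TEXT);
INSERT INTO DWV.PROVIDER_SPENDING VALUES
    ('99213', 1500.0), ('A0428', 800.0), ('X9999', 300.0);
INSERT INTO DWV.HCPCS_CODES VALUES
    ('99213', 'Example office visit'),
    ('A0428', 'Example ambulance service');
""")

# LEFT JOIN keeps spending rows whose code has no description,
# so both counts come from the same pass over the fact table.
coverage_sql = """
SELECT COUNT(DISTINCT s.HCPCS_CODE) AS codes_in_spending,
       COUNT(DISTINCT h.HCPCS_CODE) AS codes_with_description
FROM DWV.PROVIDER_SPENDING AS s
LEFT JOIN DWV.HCPCS_CODES AS h
       ON s.HCPCS_CODE = h.HCPCS_CODE
"""
codes_in_spending, codes_with_description = conn.execute(coverage_sql).fetchone()
print(codes_in_spending, codes_with_description)
```

With an inner join the undescribed code would drop out of both counts, which is why the LEFT JOIN matters here.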
Medicaid Provider Spending Tables
Provider Spending (PROVIDER_SPENDING)
Medicaid outpatient claims aggregated by billing provider, servicing provider, HCPCS procedure code, and month. The core fact table with 200M+ rows spanning 7+ years of monthly data.
Key Features:
200M+ rows — one per billing NPI + servicing NPI + HCPCS code + month
Total paid, total claims, unique beneficiaries per aggregation
Computed metrics: average paid per claim and per beneficiary
CLAIM_FROM_DATE enables native date arithmetic and time-series functions
Privacy floor: minimum 12 beneficiaries per row (CMS suppression)
Column Reference
Column | Type | Description
BILLING_PROVIDER_NPI_NUM | VARCHAR | National Provider Identifier of the billing provider
SERVICING_PROVIDER_NPI_NUM | VARCHAR | National Provider Identifier of the servicing provider
HCPCS_CODE | VARCHAR | Healthcare Common Procedure Coding System code (HCPCS Level II or CPT)
CLAIM_FROM_MONTH | VARCHAR | Month of claim service (YYYY-MM format)
CLAIM_FROM_DATE | DATE | First day of the claim month (enables date arithmetic)
TOTAL_UNIQUE_BENEFICIARIES | NUMBER | Number of unique Medicaid beneficiaries served (min 12 for privacy)
TOTAL_CLAIMS | NUMBER | Total number of Medicaid claims submitted
TOTAL_PAID | NUMBER | Total Medicaid amount paid in USD
AVG_PAID_PER_CLAIM | NUMBER | Average payment per claim (TOTAL_PAID / TOTAL_CLAIMS)
AVG_PAID_PER_BENEFICIARY | NUMBER | Average payment per beneficiary (TOTAL_PAID / TOTAL_UNIQUE_BENEFICIARIES)
feed_id | VARCHAR | FK to FEEDS; identifies which dataset this row belongs to
feeds_files_id | VARCHAR | FK to FEEDS_FILES; identifies which batch loaded this data
created_at | TIMESTAMP | When the source data was loaded into the warehouse
updated_at | TIMESTAMP | When dbt last rebuilt this table
Billing Providers (BILLING_PROVIDERS)
Billing provider demographics enriched with NPI data — 600K+ unique NPIs with name, address, specialty, and credentials.
Key Features:
600K+ unique billing NPIs
NPI-linked demographics: name, address, phone, taxonomy code
Entity type distinguishes individuals (1) from organizations (2)
State field for geographic analysis (mostly 2-letter codes; ~0.01% non-standard from NPPES)
Column Reference
Column | Type | Description
NPI | VARCHAR | National Provider Identifier
ENTITY_TYPE | VARCHAR | 1 = Individual provider, 2 = Organization
ORG_NAME | VARCHAR | Organization name (NULL for individual providers)
LAST_NAME | VARCHAR | Provider last name (NULL for organizations)
FIRST_NAME | VARCHAR | Provider first name (NULL for organizations)
MIDDLE_NAME | VARCHAR | Provider middle name
CREDENTIAL | VARCHAR | Provider credential (MD, DO, NP, etc.)
ADDRESS_LINE1 | VARCHAR | Practice location street address
CITY | VARCHAR | Practice location city
STATE | VARCHAR | Practice location state (2-letter code)
ZIP | VARCHAR | Practice location ZIP code (5-digit)
PHONE | VARCHAR | Practice phone number
SEX | VARCHAR | Provider sex (individuals only)
TAXONOMY_CODE | VARCHAR | Primary healthcare provider taxonomy code
ENUMERATION_DATE | DATE | Date NPI was assigned
feed_id | VARCHAR | FK to FEEDS; identifies which dataset this row belongs to
feeds_files_id | VARCHAR | FK to FEEDS_FILES; identifies which batch loaded this data
created_at | TIMESTAMP | When the source data was loaded into the warehouse
updated_at | TIMESTAMP | When dbt last rebuilt this table
Servicing Providers (SERVICING_PROVIDERS)
Servicing provider demographics — 1.5M+ unique NPIs. Same schema as BILLING_PROVIDERS. The servicing provider is the individual or organization that actually performed the service.
Key Features:
1.5M+ unique servicing NPIs
Same NPI-linked fields as BILLING_PROVIDERS
Larger population because servicing includes both billing and non-billing providers
Column Reference
Same columns as BILLING_PROVIDERS above.
HCPCS Codes (HCPCS_CODES)
HCPCS Level II and CPT procedure/service code descriptions. Combines CMS Physician Fee Schedule (CPT codes) and CMS HCPCS quarterly file (Level II alpha-prefix codes).
Key Features:
20K+ procedure codes with descriptions
Covers ~83% of distinct codes in PROVIDER_SPENDING
Remaining ~17% are temporary codes, state-specific codes, and ADA dental codes — use LEFT JOIN
Column Reference
Column | Type | Description
HCPCS_CODE | VARCHAR | HCPCS Level II (alpha prefix A-V) or CPT (numeric) code
DESCRIPTION | VARCHAR | Short description of the procedure or service
feed_id | VARCHAR | FK to FEEDS; identifies which dataset this row belongs to
feeds_files_id | VARCHAR | FK to FEEDS_FILES; identifies which batch loaded this data
created_at | TIMESTAMP | When the source data was loaded into the warehouse
updated_at | TIMESTAMP | When dbt last rebuilt this table
State Quality (STATE_QUALITY)
CMS DQ Atlas state-level data quality ratings for T-MSIS spending data. CMS rates most states as "Unusable" under their stringent DQ Atlas methodology — this reflects CMS's quality standards, not that the spending data is invalid.
Key Features:
2,400+ quality assessments across states and topic areas
OVERALL_QUALITY rollup across 4 spending-relevant topics
Use to flag states with known CMS-identified reporting gaps
Source: download.medicaid.gov DQ Atlas bulk CSV
Column Reference
Column | Type | Description
STATE_CODE | VARCHAR | Two-letter US state/territory code
STATE_NAME | VARCHAR | Full state/territory name
MEASURE_NAME | VARCHAR | CMS DQ Atlas measure name (e.g., 'IP Stays', 'Enrollment Counts')
TOPIC_AREA | VARCHAR | DQ Atlas topic area grouping
RATING | VARCHAR | Quality rating for this state + measure (Low/Medium/High Concern, Unusable)
OVERALL_QUALITY | VARCHAR | Worst-of quality rollup across spending-relevant DQ Atlas topics
feed_id | VARCHAR | FK to FEEDS; identifies which dataset this row belongs to
feeds_files_id | VARCHAR | FK to FEEDS_FILES; identifies which batch loaded this data
created_at | TIMESTAMP | When the source data was loaded into the warehouse
updated_at | TIMESTAMP | When dbt last rebuilt this table
Data Quality
Standardization
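Data columns use UPPERCASE naming consistent with Snowflake conventions; the lineage and audit columns (feed_id, feeds_files_id, created_at, updated_at) are listed in lowercase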
CLAIM_FROM_DATE computed from CLAIM_FROM_MONTH for native date arithmetic
AVG_PAID_PER_CLAIM and AVG_PAID_PER_BENEFICIARY pre-computed for convenience
Provider NPI fields validated as 10-digit identifiers
STATE_QUALITY filtered to 4 spending-relevant topic areas with OVERALL_QUALITY rollup
HCPCS_CODES merged from CMS reference files (Physician Fee Schedule and HCPCS Level II quarterly file) with deduplication
Data Freshness
Check when data was last updated:
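One way to do this, using only columns documented for PROVIDER_SPENDING, is to read the latest claim month alongside the most recent load and rebuild timestamps. The sketch below is illustrative: Python's sqlite3 module stands in for the warehouse, and the two rows and their timestamps are made up.

```python
import sqlite3

# Toy in-memory stand-in for the shared schema; dates below are made up.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS DWV")
conn.executescript("""
CREATE TABLE DWV.PROVIDER_SPENDING (
    CLAIM_FROM_DATE TEXT, created_at TEXT, updated_at TEXT);
INSERT INTO DWV.PROVIDER_SPENDING VALUES
    ('2024-11-01', '2025-02-10 08:00:00', '2025-02-10 09:30:00'),
    ('2024-12-01', '2025-03-15 08:00:00', '2025-03-15 09:30:00');
""")

# Latest service month covered, plus the most recent load and rebuild times.
freshness_sql = """
SELECT MAX(CLAIM_FROM_DATE) AS latest_claim_month,
       MAX(created_at)      AS last_loaded_at,
       MAX(updated_at)      AS last_rebuilt_at
FROM DWV.PROVIDER_SPENDING
"""
latest_claim_month, last_loaded_at, last_rebuilt_at = (
    conn.execute(freshness_sql).fetchone()
)
print(latest_claim_month, last_loaded_at, last_rebuilt_at)
```

On a 200M+ row table this scan may be slower than reading FEEDS_FILES, so treat it as a simple starting point.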
How to Query CMS Medicaid Provider Spending Data
Platform Schema Reference
This dataset is available on both Snowflake and Databricks. Queries use schema-only references — the database is already set by the share or catalog context:
Platform | Schema | Example table reference
Snowflake | DWV | DWV.PROVIDER_SPENDING
Databricks | cms_tmsis_provider_spending_dwv | cms_tmsis_provider_spending_dwv.provider_spending
Discover Available Data
Start with the FEEDS table to see what's available, and FEEDS_FILES to understand data freshness and load history.
Working with Data Lineage
Every data row links to FEEDS_FILES via feeds_files_id, which tells you exactly which batch loaded that data. Use this to filter to the current data version or trace any row back to its source load.
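Filtering to the current data version might look like the sketch below. It is a hedged illustration: sqlite3 stands in for the warehouse, the batch identifiers are hypothetical, and is_latest is assumed to behave as a boolean flag (stored here as 0/1).

```python
import sqlite3

# Toy in-memory stand-in; batch ids and amounts are made up.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS DWV")
conn.executescript("""
CREATE TABLE DWV.FEEDS_FILES (
    feeds_files_id TEXT, feed_id TEXT, is_latest INTEGER);
CREATE TABLE DWV.PROVIDER_SPENDING (
    BILLING_PROVIDER_NPI_NUM TEXT, TOTAL_PAID REAL, feeds_files_id TEXT);
INSERT INTO DWV.FEEDS_FILES VALUES
    ('batch-001', 'feed-a', 0),
    ('batch-002', 'feed-a', 1);
INSERT INTO DWV.PROVIDER_SPENDING VALUES
    ('1000000001', 1200.0, 'batch-001'),
    ('1000000001', 1250.0, 'batch-002');
""")

# Keep only rows loaded by the batch flagged as latest (assumed boolean).
current_rows_sql = """
SELECT s.BILLING_PROVIDER_NPI_NUM, s.TOTAL_PAID, s.feeds_files_id
FROM DWV.PROVIDER_SPENDING AS s
JOIN DWV.FEEDS_FILES AS ff
  ON s.feeds_files_id = ff.feeds_files_id
WHERE ff.is_latest
"""
current_rows = conn.execute(current_rows_sql).fetchall()
print(current_rows)
```

Dropping the WHERE clause and selecting ff.feeds_files_id instead traces each row back to the batch that loaded it.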
Top Billing Providers by State
Identify the highest-spending Medicaid billing providers in any state.
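A minimal sketch of such a ranking follows, using sqlite3 as a local stand-in for the warehouse. The providers, names, and the state filter 'OH' are made up for illustration; the SQL relies only on documented columns, and the aliases paid_sum and claims_sum are arbitrary.

```python
import sqlite3

# Toy in-memory stand-in; providers and amounts are made up.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS DWV")
conn.executescript("""
CREATE TABLE DWV.PROVIDER_SPENDING (
    BILLING_PROVIDER_NPI_NUM TEXT, TOTAL_PAID REAL, TOTAL_CLAIMS INTEGER);
CREATE TABLE DWV.BILLING_PROVIDERS (
    NPI TEXT, ORG_NAME TEXT, FIRST_NAME TEXT, LAST_NAME TEXT, STATE TEXT);
INSERT INTO DWV.PROVIDER_SPENDING VALUES
    ('1000000001', 9000.0, 90), ('1000000001', 1000.0, 10),
    ('1000000002', 4000.0, 40), ('1000000003', 50000.0, 500);
INSERT INTO DWV.BILLING_PROVIDERS VALUES
    ('1000000001', 'Example Clinic', NULL, NULL, 'OH'),
    ('1000000002', NULL, 'Pat', 'Doe', 'OH'),
    ('1000000003', 'Other State Health', NULL, NULL, 'TX');
""")

# Organizations carry ORG_NAME; individuals carry first and last name.
top_providers_sql = """
SELECT bp.NPI,
       COALESCE(bp.ORG_NAME, bp.FIRST_NAME || ' ' || bp.LAST_NAME) AS provider_name,
       SUM(s.TOTAL_PAID)   AS paid_sum,
       SUM(s.TOTAL_CLAIMS) AS claims_sum
FROM DWV.PROVIDER_SPENDING AS s
JOIN DWV.BILLING_PROVIDERS AS bp
  ON s.BILLING_PROVIDER_NPI_NUM = bp.NPI
WHERE bp.STATE = 'OH'
GROUP BY bp.NPI, bp.ORG_NAME, bp.FIRST_NAME, bp.LAST_NAME
ORDER BY paid_sum DESC
LIMIT 10
"""
top_providers = conn.execute(top_providers_sql).fetchall()
for row in top_providers:
    print(row)
```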
Monthly Medicaid Spending Trends
Track Medicaid spending and provider participation over 7+ years of monthly data, including pandemic impact periods.
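A monthly trend can be sketched by grouping on CLAIM_FROM_DATE, as below. This is an illustration with made-up rows and sqlite3 standing in for the warehouse. Unique beneficiary counts are left out on purpose: summing TOTAL_UNIQUE_BENEFICIARIES across rows may count the same person more than once.

```python
import sqlite3

# Toy in-memory stand-in; months and amounts are made up.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS DWV")
conn.executescript("""
CREATE TABLE DWV.PROVIDER_SPENDING (
    CLAIM_FROM_DATE TEXT, BILLING_PROVIDER_NPI_NUM TEXT,
    TOTAL_PAID REAL, TOTAL_CLAIMS INTEGER);
INSERT INTO DWV.PROVIDER_SPENDING VALUES
    ('2020-03-01', '1000000001', 1000.0, 10),
    ('2020-03-01', '1000000002',  500.0,  5),
    ('2020-04-01', '1000000001',  400.0,  4);
""")

# One output row per month: spending, claim volume, and active billing providers.
monthly_sql = """
SELECT CLAIM_FROM_DATE,
       SUM(TOTAL_PAID)                          AS paid_sum,
       SUM(TOTAL_CLAIMS)                        AS claims_sum,
       COUNT(DISTINCT BILLING_PROVIDER_NPI_NUM) AS billing_providers
FROM DWV.PROVIDER_SPENDING
GROUP BY CLAIM_FROM_DATE
ORDER BY CLAIM_FROM_DATE
"""
monthly = conn.execute(monthly_sql).fetchall()
for row in monthly:
    print(row)
```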
Spending by HCPCS Procedure Category
Analyze which procedures drive the most Medicaid spending.
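The sketch below ranks procedure codes by total paid while keeping codes that have no entry in HCPCS_CODES. It is illustrative only: sqlite3 stands in for the warehouse, the rows and descriptions are made up, and the placeholder label 'No description available' is a hypothetical choice. It also assumes HCPCS_CODES holds one row per code.

```python
import sqlite3

# Toy in-memory stand-in; codes, descriptions, and amounts are made up.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS DWV")
conn.executescript("""
CREATE TABLE DWV.PROVIDER_SPENDING (
    HCPCS_CODE TEXT, TOTAL_PAID REAL, TOTAL_CLAIMS INTEGER);
CREATE TABLE DWV.HCPCS_CODES (HCPCS_CODE TEXT, DESCRIPTION TEXT);
INSERT INTO DWV.PROVIDER_SPENDING VALUES
    ('99213', 1500.0, 30), ('99213', 500.0, 10),
    ('X9999', 3000.0, 20), ('A0428', 800.0, 4);
INSERT INTO DWV.HCPCS_CODES VALUES
    ('99213', 'Example office visit'),
    ('A0428', 'Example ambulance service');
""")

# LEFT JOIN so codes without a description still contribute to the ranking.
by_procedure_sql = """
SELECT s.HCPCS_CODE,
       COALESCE(h.DESCRIPTION, 'No description available') AS code_description,
       SUM(s.TOTAL_PAID)   AS paid_sum,
       SUM(s.TOTAL_CLAIMS) AS claims_sum
FROM DWV.PROVIDER_SPENDING AS s
LEFT JOIN DWV.HCPCS_CODES AS h
       ON s.HCPCS_CODE = h.HCPCS_CODE
GROUP BY s.HCPCS_CODE, h.DESCRIPTION
ORDER BY paid_sum DESC
LIMIT 20
"""
by_procedure = conn.execute(by_procedure_sql).fetchall()
for row in by_procedure:
    print(row)
```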
State Quality-Adjusted Spending Analysis
Cross-reference spending totals with CMS DQ Atlas quality ratings to identify states where data quality may affect analysis reliability.
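One possible shape for this query is sketched below. STATE_QUALITY has one row per state and measure, so the sketch first reduces it to one rating per state to avoid multiplying spending rows in the join; it assumes OVERALL_QUALITY is constant within a state. The data is made up, the measure names are hypothetical, and sqlite3 stands in for the warehouse.

```python
import sqlite3

# Toy in-memory stand-in; providers, measures, and ratings are made up.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS DWV")
conn.executescript("""
CREATE TABLE DWV.PROVIDER_SPENDING (BILLING_PROVIDER_NPI_NUM TEXT, TOTAL_PAID REAL);
CREATE TABLE DWV.BILLING_PROVIDERS (NPI TEXT, STATE TEXT);
CREATE TABLE DWV.STATE_QUALITY (
    STATE_CODE TEXT, MEASURE_NAME TEXT, RATING TEXT, OVERALL_QUALITY TEXT);
INSERT INTO DWV.PROVIDER_SPENDING VALUES
    ('1000000001', 1000.0), ('1000000002', 2500.0);
INSERT INTO DWV.BILLING_PROVIDERS VALUES
    ('1000000001', 'OH'), ('1000000002', 'TX');
INSERT INTO DWV.STATE_QUALITY VALUES
    ('OH', 'Measure A', 'Low Concern', 'Low Concern'),
    ('OH', 'Measure B', 'Low Concern', 'Low Concern'),
    ('TX', 'Measure A', 'Unusable',    'Unusable'),
    ('TX', 'Measure B', 'Low Concern', 'Unusable');
""")

quality_sql = """
WITH state_spend AS (
    SELECT bp.STATE, SUM(s.TOTAL_PAID) AS paid_sum
    FROM DWV.PROVIDER_SPENDING AS s
    JOIN DWV.BILLING_PROVIDERS AS bp
      ON s.BILLING_PROVIDER_NPI_NUM = bp.NPI
    GROUP BY bp.STATE
),
state_rating AS (
    -- Collapse per-measure rows to one rollup rating per state.
    SELECT DISTINCT STATE_CODE, OVERALL_QUALITY
    FROM DWV.STATE_QUALITY
)
SELECT ss.STATE, ss.paid_sum, sr.OVERALL_QUALITY
FROM state_spend AS ss
LEFT JOIN state_rating AS sr
       ON ss.STATE = sr.STATE_CODE
ORDER BY ss.paid_sum DESC
"""
by_state = conn.execute(quality_sql).fetchall()
for row in by_state:
    print(row)
```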
Billing Outlier Detection
Find providers billing significantly above state medians for specific procedures — useful for fraud detection and billing compliance.
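A simple version of this check is sketched below: SQL computes each provider's paid-per-claim by state and procedure, then Python compares each value with the group median. The 3x threshold is a hypothetical cutoff, the rows are made up, and sqlite3 stands in for the warehouse. In practice the median step could instead be pushed into the warehouse with a median or percentile aggregate.

```python
import sqlite3
from collections import defaultdict
from statistics import median

# Toy in-memory stand-in; providers and amounts are made up.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS DWV")
conn.executescript("""
CREATE TABLE DWV.PROVIDER_SPENDING (
    BILLING_PROVIDER_NPI_NUM TEXT, HCPCS_CODE TEXT,
    TOTAL_PAID REAL, TOTAL_CLAIMS INTEGER);
CREATE TABLE DWV.BILLING_PROVIDERS (NPI TEXT, STATE TEXT);
INSERT INTO DWV.PROVIDER_SPENDING VALUES
    ('1000000001', '99213',  500.0, 10),
    ('1000000002', '99213',  600.0, 10),
    ('1000000003', '99213',  550.0, 10),
    ('1000000004', '99213', 3000.0, 10);
INSERT INTO DWV.BILLING_PROVIDERS VALUES
    ('1000000001', 'OH'), ('1000000002', 'OH'),
    ('1000000003', 'OH'), ('1000000004', 'OH');
""")

per_provider_sql = """
SELECT bp.STATE, s.HCPCS_CODE, s.BILLING_PROVIDER_NPI_NUM,
       SUM(s.TOTAL_PAID) / SUM(s.TOTAL_CLAIMS) AS paid_per_claim
FROM DWV.PROVIDER_SPENDING AS s
JOIN DWV.BILLING_PROVIDERS AS bp
  ON s.BILLING_PROVIDER_NPI_NUM = bp.NPI
GROUP BY bp.STATE, s.HCPCS_CODE, s.BILLING_PROVIDER_NPI_NUM
"""

# Group providers by (state, procedure) so each is compared with its peers.
groups = defaultdict(list)
for state, code, npi, paid_per_claim in conn.execute(per_provider_sql):
    groups[(state, code)].append((npi, paid_per_claim))

THRESHOLD = 3.0  # hypothetical multiple of the group median
outliers = []
for (state, code), providers in groups.items():
    group_median = median(value for _, value in providers)
    for npi, value in providers:
        if value > THRESHOLD * group_median:
            outliers.append((state, code, npi, value, group_median))

print(outliers)
```

A flagged provider is only a candidate for review; small peer groups and differences in case mix can produce high ratios without any billing problem.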
Tracking Data Changes Over Time
FEEDS_FILES records every batch load with row_count_delta showing what changed. Use this to monitor source data updates.
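The sketch below lists batches that changed the row count, newest first. The columns feeds_files_id, feed_id, is_latest, and row_count_delta come from this page; the created_at load timestamp on FEEDS_FILES is an assumption used only for ordering, and the batch rows are made up. sqlite3 stands in for the warehouse.

```python
import sqlite3

# Toy in-memory stand-in; batch ids, deltas, and timestamps are made up.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS DWV")
conn.executescript("""
CREATE TABLE DWV.FEEDS_FILES (
    feeds_files_id TEXT, feed_id TEXT, is_latest INTEGER,
    row_count_delta INTEGER,
    created_at TEXT  -- assumed column name for the load timestamp
);
INSERT INTO DWV.FEEDS_FILES VALUES
    ('batch-001', 'feed-a', 0, 1000, '2024-01-05 00:00:00'),
    ('batch-002', 'feed-a', 0,    0, '2024-06-05 00:00:00'),
    ('batch-003', 'feed-a', 1,  -20, '2025-01-05 00:00:00');
""")

# Skip no-op loads; a negative delta means rows were removed by that batch.
changes_sql = """
SELECT feeds_files_id, row_count_delta, is_latest
FROM DWV.FEEDS_FILES
WHERE row_count_delta <> 0
ORDER BY created_at DESC
"""
changes = conn.execute(changes_sql).fetchall()
for row in changes:
    print(row)
```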
Who Uses This Data
Common Use Cases
Medicaid billing outlier detection — Identify providers billing significantly above state medians by procedure code and geography, flagging potential fraud or billing errors
Medicaid spending trend analysis — Track spending patterns across 84 months (2018-2024) including COVID-19 pandemic impact on Medicaid utilization and costs
Cross-state Medicaid comparisons — Compare per-provider and per-procedure spending across 50+ states with quality-adjusted analysis using CMS DQ Atlas ratings
Provider network analysis — Map billing-to-servicing provider relationships to identify coverage gaps, network concentration, and referral patterns
HCPCS procedure cost benchmarking — Benchmark procedure-level costs against state and national medians for rate-setting and contract negotiation
Grant writing and policy research — Access structured Medicaid expenditure data for FQHC grant applications, state Medicaid program evaluations, and federal policy analysis
Related Datasets and Research
This dataset pairs well with:
CMS NPPES Provider Dataset — Extended NPI provider attributes including additional practice locations, other names, and full taxonomy classification details
HRSA Healthcare Resources — County-level healthcare workforce and shortage area data for correlating spending patterns with provider supply
CMS Data Feeds Dataset — Medicare physician spending comparisons via the Medicare Provider Utilization and Payment Data feeds
Frequently Asked Questions
What states are included in the T-MSIS provider spending data?
All 50 US states plus DC, Puerto Rico, the US Virgin Islands, and Guam: 54 jurisdictions in total. Coverage varies by year because states onboarded to T-MSIS between 2015 and 2020. By 2020, all states report through T-MSIS.
How often is the data updated?
CMS publishes new T-MSIS spending data approximately annually. Our pipeline automatically monitors for new releases and loads updated data within days of publication, with no manual intervention needed. Coverage currently spans 2018-present and grows with each CMS release.
What does "Unusable" mean in STATE_QUALITY?
CMS's DQ Atlas applies stringent quality thresholds to T-MSIS data. Most states (~50 of 54) receive an "Unusable" rating on at least one spending-related measure. This reflects CMS's internal quality standards; it does not mean the spending data is invalid or should be discarded. Use OVERALL_QUALITY to identify states with known reporting gaps and adjust analysis accordingly.
Why do only ~83% of HCPCS codes have descriptions?
The HCPCS_CODES table combines CMS's Physician Fee Schedule (CPT codes) and HCPCS Level II quarterly file. The remaining ~17% are temporary local codes (e.g., T-codes, S-codes), state-specific codes, and ADA dental codes not published in national CMS reference files. Always use LEFT JOIN when joining PROVIDER_SPENDING to HCPCS_CODES.