# CMS Medicaid Provider Spending Dataset

### About the Dataset

The **CMS Medicaid Provider Spending Dataset** delivers **227+ million** outpatient **Medicaid** claims aggregated by provider, procedure, and month across 50+ states — **automatically updated** when **CMS** publishes new **T-MSIS** (Transformed Medicaid Statistical Information System) data. This analytics-ready product enriches spending records with **NPI** provider demographics from **NPPES** and state-level data quality ratings from the **CMS DQ Atlas**, so you know which states to trust before you query. With 6 states flagged unusable and 16 more rated high concern by CMS, the built-in quality scoring saves analysts weeks of cross-referencing. Designed for health policy analysts, Medicaid program administrators, and healthcare consultants who need provider-level Medicaid expenditure intelligence without months of data wrangling.

{% hint style="success" %}
**Now Available** on the Snowflake Marketplace — [Get Listing](https://app.snowflake.com/marketplace/listing/GZT1Z7QRSYVE)
{% endhint %}

### Quick Access

**Tables**: PROVIDER\_SPENDING, BILLING\_PROVIDERS, SERVICING\_PROVIDERS, HCPCS\_CODES, STATE\_QUALITY\
**Sources**: CMS T-MSIS Analytic Files, NPPES, CMS DQ Atlas\
**Automatic Updates**: Pipeline monitors CMS for new releases and loads data within days of publication\
**Coverage**: 50+ US states and territories, 2018-present (7+ years)

## Overview

The CMS Medicaid Provider Spending Dataset provides comprehensive access to Medicaid outpatient spending data including:

* **PROVIDER\_SPENDING** (spending) - 200M+ claims aggregated by billing provider NPI, servicing provider NPI, HCPCS procedure code, and month — with total paid, claims, and beneficiary counts
* **BILLING\_PROVIDERS** (billing\_providers) - 600K+ billing provider demographics including name, address, specialty taxonomy, and NPI assignment date
* **SERVICING\_PROVIDERS** (servicing\_providers) - 1.5M+ servicing provider demographics with the same NPI-enriched fields
* **HCPCS\_CODES** (hcpcs\_codes) - 20K+ HCPCS Level II and CPT procedure code descriptions from CMS reference files
* **STATE\_QUALITY** (state\_quality) - CMS DQ Atlas state-level data quality ratings for T-MSIS expenditure data across 54 states/territories

### Metadata Tables

Every Dataplex data product includes these standard metadata tables:

| Table             | Purpose                                                        |
| ----------------- | -------------------------------------------------------------- |
| `FEEDS`           | Dataset catalog — available tables, descriptions, update dates |
| `FEEDS_FILES`     | Batch load history with `is_latest` flag for data freshness    |
| `CHANGELOG`       | Change log — data loads, schema changes, corrections           |
| `DATA_DICTIONARY` | Column descriptions for all tables                             |

## Entity Relationship Diagram

![CMS Medicaid Provider Spending Entity Relationship Diagram](https://813439891-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FDeimGtBflXKQn786VLvj%2Fuploads%2Fgit-blob-368102d2b465e99a8aac3a3e44c6736241a0b496%2Fentity-relationship.png?alt=media)

PROVIDER\_SPENDING is the central fact table. Join to BILLING\_PROVIDERS via `BILLING_PROVIDER_NPI_NUM = NPI`, to SERVICING\_PROVIDERS via `SERVICING_PROVIDER_NPI_NUM = NPI`, and to HCPCS\_CODES via `HCPCS_CODE` (LEFT JOIN — \~83% coverage). STATE\_QUALITY joins to BILLING\_PROVIDERS via `STATE = STATE_CODE` for quality-adjusted analysis. All tables link to FEEDS and FEEDS\_FILES via `feed_id` and `feeds_files_id` for data lineage.

## Medicaid Provider Spending Tables

### Provider Spending (PROVIDER\_SPENDING)

Medicaid outpatient claims aggregated by billing provider, servicing provider, HCPCS procedure code, and month. The core fact table with 200M+ rows spanning 7+ years of monthly data.

**Key Features:**

* 200M+ rows — one per billing NPI + servicing NPI + HCPCS code + month
* Total paid, total claims, unique beneficiaries per aggregation
* Computed metrics: average paid per claim and per beneficiary
* CLAIM\_FROM\_DATE enables native date arithmetic and time-series functions
* Privacy floor: minimum 12 beneficiaries per row (CMS suppression)

#### Column Reference

| Column                        | Type      | Description                                                                  |
| ----------------------------- | --------- | ---------------------------------------------------------------------------- |
| BILLING\_PROVIDER\_NPI\_NUM   | VARCHAR   | National Provider Identifier of the billing provider                         |
| SERVICING\_PROVIDER\_NPI\_NUM | VARCHAR   | National Provider Identifier of the servicing provider                       |
| HCPCS\_CODE                   | VARCHAR   | Healthcare Common Procedure Coding System Level II code                      |
| CLAIM\_FROM\_MONTH            | VARCHAR   | Month of claim service (YYYY-MM format)                                      |
| CLAIM\_FROM\_DATE             | DATE      | First day of the claim month (enables date arithmetic)                       |
| TOTAL\_UNIQUE\_BENEFICIARIES  | NUMBER    | Number of unique Medicaid beneficiaries served (min 12 for privacy)          |
| TOTAL\_CLAIMS                 | NUMBER    | Total number of Medicaid claims submitted                                    |
| TOTAL\_PAID                   | NUMBER    | Total Medicaid amount paid in USD                                            |
| AVG\_PAID\_PER\_CLAIM         | NUMBER    | Average payment per claim (TOTAL\_PAID / TOTAL\_CLAIMS)                      |
| AVG\_PAID\_PER\_BENEFICIARY   | NUMBER    | Average payment per beneficiary (TOTAL\_PAID / TOTAL\_UNIQUE\_BENEFICIARIES) |
| feed\_id                      | VARCHAR   | FK to FEEDS — identifies which dataset this row belongs to                   |
| feeds\_files\_id              | VARCHAR   | FK to FEEDS\_FILES — identifies which batch loaded this data                 |
| created\_at                   | TIMESTAMP | When the source data was loaded into the warehouse                           |
| updated\_at                   | TIMESTAMP | When dbt last rebuilt this table                                             |

***

### Billing Providers (BILLING\_PROVIDERS)

Billing provider demographics enriched with NPI data — 600K+ unique NPIs with name, address, specialty, and credentials.

**Key Features:**

* 600K+ unique billing NPIs
* NPI-linked demographics: name, address, phone, taxonomy code
* Entity type distinguishes individuals (1) from organizations (2)
* State field for geographic analysis (mostly 2-letter codes; \~0.01% non-standard from NPPES)

#### Column Reference

| Column            | Type      | Description                                                  |
| ----------------- | --------- | ------------------------------------------------------------ |
| NPI               | VARCHAR   | National Provider Identifier                                 |
| ENTITY\_TYPE      | VARCHAR   | 1 = Individual provider, 2 = Organization                    |
| ORG\_NAME         | VARCHAR   | Organization name (NULL for individual providers)            |
| LAST\_NAME        | VARCHAR   | Provider last name (NULL for organizations)                  |
| FIRST\_NAME       | VARCHAR   | Provider first name (NULL for organizations)                 |
| MIDDLE\_NAME      | VARCHAR   | Provider middle name                                         |
| CREDENTIAL        | VARCHAR   | Provider credential (MD, DO, NP, etc.)                       |
| ADDRESS\_LINE1    | VARCHAR   | Practice location street address                             |
| CITY              | VARCHAR   | Practice location city                                       |
| STATE             | VARCHAR   | Practice location state (2-letter code)                      |
| ZIP               | VARCHAR   | Practice location ZIP code (5-digit)                         |
| PHONE             | VARCHAR   | Practice phone number                                        |
| SEX               | VARCHAR   | Provider sex (individuals only)                              |
| TAXONOMY\_CODE    | VARCHAR   | Primary healthcare provider taxonomy code                    |
| ENUMERATION\_DATE | DATE      | Date NPI was assigned                                        |
| feed\_id          | VARCHAR   | FK to FEEDS — identifies which dataset this row belongs to   |
| feeds\_files\_id  | VARCHAR   | FK to FEEDS\_FILES — identifies which batch loaded this data |
| created\_at       | TIMESTAMP | When the source data was loaded into the warehouse           |
| updated\_at       | TIMESTAMP | When dbt last rebuilt this table                             |

***

### Servicing Providers (SERVICING\_PROVIDERS)

Servicing provider demographics — 1.5M+ unique NPIs. Same schema as BILLING\_PROVIDERS. The servicing provider is the individual or organization that actually performed the service.

**Key Features:**

* 1.5M+ unique servicing NPIs
* Same NPI-linked fields as BILLING\_PROVIDERS
* Larger population because servicing includes both billing and non-billing providers

#### Column Reference

Same columns as BILLING\_PROVIDERS above.

***

### HCPCS Codes (HCPCS\_CODES)

HCPCS Level II and CPT procedure/service code descriptions. Combines CMS Physician Fee Schedule (CPT codes) and CMS HCPCS quarterly file (Level II alpha-prefix codes).

**Key Features:**

* 20K+ procedure codes with descriptions
* Covers \~83% of distinct codes in PROVIDER\_SPENDING
* Remaining \~17% are temporary codes, state-specific codes, and ADA dental codes — use LEFT JOIN

#### Column Reference

| Column           | Type      | Description                                                  |
| ---------------- | --------- | ------------------------------------------------------------ |
| HCPCS\_CODE      | VARCHAR   | HCPCS Level II (alpha prefix A-V) or CPT (numeric) code      |
| DESCRIPTION      | VARCHAR   | Short description of the procedure or service                |
| feed\_id         | VARCHAR   | FK to FEEDS — identifies which dataset this row belongs to   |
| feeds\_files\_id | VARCHAR   | FK to FEEDS\_FILES — identifies which batch loaded this data |
| created\_at      | TIMESTAMP | When the source data was loaded into the warehouse           |
| updated\_at      | TIMESTAMP | When dbt last rebuilt this table                             |

***

### State Quality (STATE\_QUALITY)

CMS DQ Atlas state-level data quality ratings for T-MSIS spending data. CMS rates most states as "Unusable" under their stringent DQ Atlas methodology — this reflects CMS's quality standards, not that the spending data is invalid.

**Key Features:**

* 2,400+ quality assessments across states and topic areas
* OVERALL\_QUALITY rollup across 4 spending-relevant topics
* Use to flag states with known CMS-identified reporting gaps
* Source: download.medicaid.gov DQ Atlas bulk CSV

#### Column Reference

| Column           | Type      | Description                                                                 |
| ---------------- | --------- | --------------------------------------------------------------------------- |
| STATE\_CODE      | VARCHAR   | Two-letter US state/territory code                                          |
| STATE\_NAME      | VARCHAR   | Full state/territory name                                                   |
| MEASURE\_NAME    | VARCHAR   | CMS DQ Atlas measure name (e.g., 'IP Stays', 'Enrollment Counts')           |
| TOPIC\_AREA      | VARCHAR   | DQ Atlas topic area grouping                                                |
| RATING           | VARCHAR   | Quality rating for this state + measure (Low/Medium/High Concern, Unusable) |
| OVERALL\_QUALITY | VARCHAR   | Worst-of quality rollup across spending-relevant DQ Atlas topics            |
| feed\_id         | VARCHAR   | FK to FEEDS — identifies which dataset this row belongs to                  |
| feeds\_files\_id | VARCHAR   | FK to FEEDS\_FILES — identifies which batch loaded this data                |
| created\_at      | TIMESTAMP | When the source data was loaded into the warehouse                          |
| updated\_at      | TIMESTAMP | When dbt last rebuilt this table                                            |

## Data Quality

### Standardization

* All columns use UPPERCASE naming consistent with Snowflake conventions
* CLAIM\_FROM\_DATE computed from CLAIM\_FROM\_MONTH for native date arithmetic
* AVG\_PAID\_PER\_CLAIM and AVG\_PAID\_PER\_BENEFICIARY pre-computed for convenience
* Provider NPI fields validated as 10-digit identifiers
* STATE\_QUALITY filtered to 4 spending-relevant topic areas with OVERALL\_QUALITY rollup
* HCPCS\_CODES merged from 3 CMS reference sources with deduplication

### Data Freshness

Check when data was last updated:

{% tabs %}
{% tab title="Snowflake" %}

```sql
SELECT source_name, data_period, row_count, created_at, is_latest
FROM DWV.FEEDS_FILES
WHERE is_latest = TRUE;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT source_name, data_period, row_count, created_at, is_latest
FROM cms_tmsis_provider_spending_dwv.feeds_files
WHERE is_latest = TRUE;
```

{% endtab %}
{% endtabs %}

## How to Query CMS Medicaid Provider Spending Data

### Platform Schema Reference

This dataset is available on both Snowflake and Databricks. Queries use schema-only references — the database is already set by the share or catalog context:

| Platform       | Schema                            | Example                                             |
| -------------- | --------------------------------- | --------------------------------------------------- |
| **Snowflake**  | `DWV`                             | `DWV.PROVIDER_SPENDING`                             |
| **Databricks** | `cms_tmsis_provider_spending_dwv` | `cms_tmsis_provider_spending_dwv.provider_spending` |

### Discover Available Data

Start with the `FEEDS` table to see what's available, and `FEEDS_FILES` to understand data freshness and load history.

{% tabs %}
{% tab title="Snowflake" %}

```sql
-- Browse the data catalog
SELECT table_name, description, latest_data_period,
       latest_row_count, last_loaded_at, total_batches
FROM DWV.FEEDS
ORDER BY table_name;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
-- Browse the data catalog
SELECT table_name, description, latest_data_period,
       latest_row_count, last_loaded_at, total_batches
FROM cms_tmsis_provider_spending_dwv.feeds
ORDER BY table_name;
```

{% endtab %}
{% endtabs %}

{% tabs %}
{% tab title="Snowflake" %}

```sql
-- Check data freshness per source
-- is_latest = TRUE marks the current batch for each source
SELECT source_name, data_period, row_count, row_count_delta,
       created_at AS loaded_at, is_latest
FROM DWV.FEEDS_FILES
ORDER BY source_name, created_at DESC;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT source_name, data_period, row_count, row_count_delta,
       created_at AS loaded_at, is_latest
FROM cms_tmsis_provider_spending_dwv.feeds_files
ORDER BY source_name, created_at DESC;
```

{% endtab %}
{% endtabs %}

### Working with Data Lineage

Every data row links to `FEEDS_FILES` via `feeds_files_id`, which tells you exactly which batch loaded that data. Use this to filter to the current data version or trace any row back to its source load.

{% tabs %}
{% tab title="Snowflake" %}

```sql
-- Get current spending data with batch lineage
SELECT
    ps.BILLING_PROVIDER_NPI_NUM,
    ps.HCPCS_CODE,
    ps.TOTAL_PAID,
    ps.TOTAL_CLAIMS,
    ff.data_period,
    ff.created_at AS data_loaded_at
FROM DWV.PROVIDER_SPENDING ps
JOIN DWV.FEEDS_FILES ff ON ps.feeds_files_id = ff.id
WHERE ff.is_latest = TRUE
LIMIT 20;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT
    ps.BILLING_PROVIDER_NPI_NUM,
    ps.HCPCS_CODE,
    ps.TOTAL_PAID,
    ps.TOTAL_CLAIMS,
    ff.data_period,
    ff.created_at AS data_loaded_at
FROM cms_tmsis_provider_spending_dwv.provider_spending ps
JOIN cms_tmsis_provider_spending_dwv.feeds_files ff ON ps.feeds_files_id = ff.id
WHERE ff.is_latest = TRUE
LIMIT 20;
```

{% endtab %}
{% endtabs %}

### Top Billing Providers by State

Identify the highest-spending Medicaid billing providers in any state.

{% tabs %}
{% tab title="Snowflake" %}

```sql
SELECT bp.STATE, bp.NPI, bp.ORG_NAME, bp.LAST_NAME, bp.FIRST_NAME,
       SUM(ps.TOTAL_PAID) AS total_paid,
       SUM(ps.TOTAL_CLAIMS) AS total_claims
FROM DWV.PROVIDER_SPENDING ps
JOIN DWV.BILLING_PROVIDERS bp ON ps.BILLING_PROVIDER_NPI_NUM = bp.NPI
WHERE bp.STATE = 'CA'
GROUP BY 1, 2, 3, 4, 5
ORDER BY total_paid DESC
LIMIT 25;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT bp.STATE, bp.NPI, bp.ORG_NAME, bp.LAST_NAME, bp.FIRST_NAME,
       SUM(ps.TOTAL_PAID) AS total_paid,
       SUM(ps.TOTAL_CLAIMS) AS total_claims
FROM cms_tmsis_provider_spending_dwv.provider_spending ps
JOIN cms_tmsis_provider_spending_dwv.billing_providers bp ON ps.BILLING_PROVIDER_NPI_NUM = bp.NPI
WHERE bp.STATE = 'CA'
GROUP BY 1, 2, 3, 4, 5
ORDER BY total_paid DESC
LIMIT 25;
```

{% endtab %}
{% endtabs %}

### Monthly Medicaid Spending Trends

Track Medicaid spending and provider participation over 7+ years of monthly data, including pandemic impact periods.

{% tabs %}
{% tab title="Snowflake" %}

```sql
SELECT CLAIM_FROM_MONTH,
       COUNT(DISTINCT BILLING_PROVIDER_NPI_NUM) AS unique_providers,
       SUM(TOTAL_CLAIMS) AS total_claims,
       SUM(TOTAL_PAID) AS total_paid
FROM DWV.PROVIDER_SPENDING
GROUP BY 1
ORDER BY 1;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT CLAIM_FROM_MONTH,
       COUNT(DISTINCT BILLING_PROVIDER_NPI_NUM) AS unique_providers,
       SUM(TOTAL_CLAIMS) AS total_claims,
       SUM(TOTAL_PAID) AS total_paid
FROM cms_tmsis_provider_spending_dwv.provider_spending
GROUP BY 1
ORDER BY 1;
```

{% endtab %}
{% endtabs %}

### Spending by HCPCS Procedure Category

Analyze which procedures drive the most Medicaid spending.

{% tabs %}
{% tab title="Snowflake" %}

```sql
SELECT ps.HCPCS_CODE, h.DESCRIPTION,
       SUM(ps.TOTAL_PAID) AS total_paid,
       SUM(ps.TOTAL_CLAIMS) AS total_claims,
       SUM(ps.TOTAL_UNIQUE_BENEFICIARIES) AS total_beneficiaries
FROM DWV.PROVIDER_SPENDING ps
LEFT JOIN DWV.HCPCS_CODES h ON ps.HCPCS_CODE = h.HCPCS_CODE
GROUP BY 1, 2
ORDER BY total_paid DESC
LIMIT 50;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT ps.HCPCS_CODE, h.DESCRIPTION,
       SUM(ps.TOTAL_PAID) AS total_paid,
       SUM(ps.TOTAL_CLAIMS) AS total_claims,
       SUM(ps.TOTAL_UNIQUE_BENEFICIARIES) AS total_beneficiaries
FROM cms_tmsis_provider_spending_dwv.provider_spending ps
LEFT JOIN cms_tmsis_provider_spending_dwv.hcpcs_codes h ON ps.HCPCS_CODE = h.HCPCS_CODE
GROUP BY 1, 2
ORDER BY total_paid DESC
LIMIT 50;
```

{% endtab %}
{% endtabs %}

### State Quality-Adjusted Spending Analysis

Cross-reference spending totals with CMS DQ Atlas quality ratings to identify states where data quality may affect analysis reliability.

{% tabs %}
{% tab title="Snowflake" %}

```sql
SELECT bp.STATE, sq.OVERALL_QUALITY,
       SUM(ps.TOTAL_PAID) AS total_paid,
       COUNT(DISTINCT ps.BILLING_PROVIDER_NPI_NUM) AS unique_providers
FROM DWV.PROVIDER_SPENDING ps
JOIN DWV.BILLING_PROVIDERS bp ON ps.BILLING_PROVIDER_NPI_NUM = bp.NPI
LEFT JOIN (
    SELECT DISTINCT STATE_CODE, OVERALL_QUALITY
    FROM DWV.STATE_QUALITY
) sq ON bp.STATE = sq.STATE_CODE
GROUP BY 1, 2
ORDER BY total_paid DESC;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT bp.STATE, sq.OVERALL_QUALITY,
       SUM(ps.TOTAL_PAID) AS total_paid,
       COUNT(DISTINCT ps.BILLING_PROVIDER_NPI_NUM) AS unique_providers
FROM cms_tmsis_provider_spending_dwv.provider_spending ps
JOIN cms_tmsis_provider_spending_dwv.billing_providers bp ON ps.BILLING_PROVIDER_NPI_NUM = bp.NPI
LEFT JOIN (
    SELECT DISTINCT STATE_CODE, OVERALL_QUALITY
    FROM cms_tmsis_provider_spending_dwv.state_quality
) sq ON bp.STATE = sq.STATE_CODE
GROUP BY 1, 2
ORDER BY total_paid DESC;
```

{% endtab %}
{% endtabs %}

### Billing Outlier Detection

Find providers billing significantly above state medians for specific procedures — useful for fraud detection and billing compliance.

{% tabs %}
{% tab title="Snowflake" %}

```sql
WITH state_medians AS (
    SELECT bp.STATE, ps.HCPCS_CODE,
           MEDIAN(ps.AVG_PAID_PER_CLAIM) AS median_per_claim
    FROM DWV.PROVIDER_SPENDING ps
    JOIN DWV.BILLING_PROVIDERS bp ON ps.BILLING_PROVIDER_NPI_NUM = bp.NPI
    GROUP BY 1, 2
    HAVING median_per_claim > 1.00
)
SELECT ps.BILLING_PROVIDER_NPI_NUM, bp.STATE, ps.HCPCS_CODE,
       h.DESCRIPTION,
       ps.AVG_PAID_PER_CLAIM, sm.median_per_claim,
       ps.AVG_PAID_PER_CLAIM / sm.median_per_claim AS ratio_to_median
FROM DWV.PROVIDER_SPENDING ps
JOIN DWV.BILLING_PROVIDERS bp ON ps.BILLING_PROVIDER_NPI_NUM = bp.NPI
JOIN state_medians sm ON bp.STATE = sm.STATE AND ps.HCPCS_CODE = sm.HCPCS_CODE
LEFT JOIN DWV.HCPCS_CODES h ON ps.HCPCS_CODE = h.HCPCS_CODE
WHERE ps.AVG_PAID_PER_CLAIM > sm.median_per_claim * 3
ORDER BY ratio_to_median DESC
LIMIT 100;
```

{% endtab %}
{% endtabs %}

### Tracking Data Changes Over Time

`FEEDS_FILES` records every batch load with `row_count_delta` showing what changed. Use this to monitor source data updates.

{% tabs %}
{% tab title="Snowflake" %}

```sql
SELECT source_name, data_period, row_count, row_count_delta,
       created_at AS loaded_at, is_latest
FROM DWV.FEEDS_FILES
WHERE source_name = 'spending'
ORDER BY created_at;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT source_name, data_period, row_count, row_count_delta,
       created_at AS loaded_at, is_latest
FROM cms_tmsis_provider_spending_dwv.feeds_files
WHERE source_name = 'spending'
ORDER BY created_at;
```

{% endtab %}
{% endtabs %}

## Who Uses This Data

### Common Use Cases

* **Medicaid billing outlier detection** — Identify providers billing significantly above state medians by procedure code and geography, flagging potential fraud or billing errors
* **Medicaid spending trend analysis** — Track spending patterns across 84 months (2018-2024) including COVID-19 pandemic impact on Medicaid utilization and costs
* **Cross-state Medicaid comparisons** — Compare per-provider and per-procedure spending across 50+ states with quality-adjusted analysis using CMS DQ Atlas ratings
* **Provider network analysis** — Map billing-to-servicing provider relationships to identify coverage gaps, network concentration, and referral patterns
* **HCPCS procedure cost benchmarking** — Benchmark procedure-level costs against state and national medians for rate-setting and contract negotiation
* **Grant writing and policy research** — Access structured Medicaid expenditure data for FQHC grant applications, state Medicaid program evaluations, and federal policy analysis

### Related Datasets and Research

This dataset pairs well with:

* [**CMS NPPES Provider Dataset**](https://docs.dataplex-consulting.com/data-catalog/cms-nppes-provider-dataset) — Extended NPI provider attributes including additional practice locations, other names, and full taxonomy classification details
* [**HRSA Healthcare Resources**](https://docs.dataplex-consulting.com/data-catalog/hrsa-dataset) — County-level healthcare workforce and shortage area data for correlating spending patterns with provider supply
* [**CMS Data Feeds Dataset**](https://docs.dataplex-consulting.com/data-catalog/cms-data-feeds-dataset) — Medicare physician spending comparisons via the Medicare Provider Utilization and Payment Data feeds

### Frequently Asked Questions

**What states are included in the T-MSIS provider spending data?** All 50 US states plus DC, Puerto Rico, US Virgin Islands, and Guam — 54 jurisdictions total. Coverage varies by year as states onboarded to T-MSIS between 2015-2020. By 2020, all states report through T-MSIS.

**How often is the data updated?** CMS publishes new T-MSIS spending data approximately annually. Our pipeline automatically monitors for new releases and loads updated data within days of publication — no manual intervention needed. Coverage currently spans 2018-present and grows with each CMS release.

**What does "Unusable" mean in STATE\_QUALITY?** CMS's DQ Atlas applies stringent quality thresholds to T-MSIS data. Most states (\~50 of 54) receive an "Unusable" rating on at least one spending-related measure. This reflects CMS's internal quality standards — it does not mean the spending data is invalid or should be discarded. Use OVERALL\_QUALITY to identify states with known reporting gaps and adjust analysis accordingly.

**Why do only \~83% of HCPCS codes have descriptions?** The HCPCS\_CODES table combines CMS's Physician Fee Schedule (CPT codes) and HCPCS Level II quarterly file. The remaining \~17% are temporary local codes (e.g., T-codes, S-codes), state-specific codes, and ADA dental codes not published in national CMS reference files. Always use LEFT JOIN when joining PROVIDER\_SPENDING to HCPCS\_CODES.

{% hint style="success" %}
**Ready to access CMS Medicaid Provider Spending data?**
{% endhint %}

| Platform       | Action                                                                    |
| -------------- | ------------------------------------------------------------------------- |
| **Snowflake**  | [Get Listing](https://app.snowflake.com/marketplace/listing/GZT1Z7QRSYVE) |
| **Databricks** | [Free Trial](https://trial.dataplex-consulting.com)                       |

{% hint style="success" %}
Questions? [Contact our team](mailto:support@dataplex-consulting.com) for a walkthrough.
{% endhint %}
