# CDC Open Data Product

### About the Dataset

The CDC Open Data Product is a comprehensive data solution that transforms and delivers more than 1,485 high-quality, regularly updated public health datasets from the Centers for Disease Control and Prevention (CDC). It contains more than 678 GB of data (and growing) and over 32,458 attributes, giving researchers, analysts, and data scientists unparalleled access to a broad range of public health information.

{% hint style="info" %}
**Get Full Access** | [Snowflake Marketplace](https://app.snowflake.com/marketplace/listing/GZT1Z125KEY) | [Databricks](https://checkout.dataplex-consulting.com/b/14A6oAcXFfOw13y9y4bQY01) | [Free Trial](https://trial.dataplex-consulting.com)
{% endhint %}

### Quick Access

**Schemas**: `DWV*` on Snowflake, `cdc_dwv*` on Databricks (Data Warehouse Views)\
**Total Datasets**: 1,485 views across 50+ categories\
**Update Frequency**: Daily to annual, depending on CDC source

## Overview

The CDC Open Data Product provides standardized access to diverse health datasets, including the following (view counts are approximate; see Data Categories for per-platform counts):

* **NCHS Statistics** (213 views) - National Center for Health Statistics data
* **NNDSS Data** (295 views) - National Notifiable Diseases Surveillance System
* **CDC Cities** (57 views) - City-level health indicator data
* **Motor Vehicles** (45 views) - Vehicle safety and injury data
* **Vaccination Data** (82 views) - Immunization coverage and safety
* **Environmental Health** (25 views) - Environmental exposures and health impacts
* **Heart & Stroke Prevention** (26 views) - Cardiovascular health data
* **Legislative Data** (33 views) - Health policy and legislation tracking

{% hint style="success" %}
**Ready to access CDC Open Data?**

Questions? [Contact our team](mailto:support@dataplex-consulting.com) for a walkthrough.
{% endhint %}

| Platform       | Action                                                                                                                                                  |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Snowflake**  | [Get on Marketplace →](https://app.snowflake.com/marketplace/listing/GZT1Z125KEY)                                                                       |
| **Databricks** | [Subscribe →](https://checkout.dataplex-consulting.com/b/14A6oAcXFfOw13y9y4bQY01) \| [Start 14-Day Free Trial →](https://trial.dataplex-consulting.com) |

## Getting Started

### Basic Query Examples

{% tabs %}
{% tab title="Snowflake" %}

```sql
-- List the first 10 dataset views, numbered in schema/name order
SELECT table_schema, table_name,
       ROW_NUMBER() OVER (ORDER BY table_schema, table_name) AS dataset_number
FROM information_schema.views
WHERE table_schema LIKE 'DWV%'
  AND table_name NOT IN ('DATASETS', 'DATASETS_BATCHES')
ORDER BY dataset_number
LIMIT 10;
```

```sql
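-- Search for COVID-related datasets
-- Names are compared case-insensitively: INFORMATION_SCHEMA stores unquoted
-- identifiers in uppercase, while the metadata table may store them in lowercase
SELECT v.table_schema, v.table_name, d.dataset_name, d.description
FROM information_schema.views v
JOIN dwv.datasets d
  ON UPPER(v.table_name) = UPPER(d.view_name)
 AND UPPER(v.table_schema) = UPPER(d.schema_name)
WHERE v.table_schema LIKE 'DWV%'
  AND (UPPER(d.dataset_name) LIKE '%COVID%'
       OR UPPER(d.description) LIKE '%COVID%')
ORDER BY v.table_schema, v.table_name;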
```

```sql
-- Get sample data from a specific dataset
SELECT *
FROM dwv_nchs_stats.weekly_deaths_count_state_causes_2014_2__3yf8_kanr
LIMIT 100;
```

{% endtab %}
{% endtabs %}
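The basic queries above are written for Snowflake. A rough Databricks equivalent of the first two is sketched below. It assumes the catalog holding the product is your current catalog (for example via `USE CATALOG`), that Unity Catalog's `information_schema` is available in it, and that schemas use the lowercase `cdc_dwv` prefix listed under Data Categories.

```sql
-- Databricks sketch: list the first 10 dataset views
-- Assumes the product catalog is the current catalog
SELECT table_schema, table_name
FROM information_schema.views
WHERE table_schema LIKE 'cdc_dwv%'
  AND table_name NOT IN ('datasets', 'datasets_batches')
ORDER BY table_schema, table_name
LIMIT 10;

-- Databricks sketch: search dataset metadata for COVID-related entries
SELECT d.view_name, d.dataset_name, d.description
FROM cdc_dwv.datasets d
WHERE d.dataset_name ILIKE '%covid%'
   OR d.description ILIKE '%covid%'
ORDER BY d.dataset_name;
```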

### AI-Powered Dataset Discovery

{% tabs %}
{% tab title="Snowflake" %}
Use Snowflake Cortex AI (`SNOWFLAKE.CORTEX.COMPLETE`) to find relevant datasets with a natural-language question:

```sql
-- Use Cortex AI to find datasets about flu, influenza, or respiratory illness
-- Note: COMPLETE runs once per row of dwv.datasets, and each call consumes Cortex credits
WITH find_dataset_prompt AS (
    SELECT
        -- =====================================================================
        -- STEP 1: UPDATE YOUR SEARCH PROMPT HERE
        -- =====================================================================
        -- Write a question that can be answered with YES or NO.
        -- The AI evaluates each dataset against this question.
        -- TIP: Be specific, but include synonyms for better matches.
        -- =====================================================================
        'Is this dataset related to flu, influenza, or respiratory illness?' AS prompt
),

dataset_analysis AS (
    SELECT
        source_dataset_id,
        dataset_name,
        description,
        schema_name || '.' || view_name AS view_location,

        -- =====================================================================
        -- CORTEX AI EVALUATION
        -- =====================================================================
        -- Sends each dataset's name and (truncated) description to the model,
        -- which judges whether it matches the question, going beyond keywords.
        -- =====================================================================
        SNOWFLAKE.CORTEX.COMPLETE(
            'snowflake-arctic',  -- model to use
            p.prompt ||
            ' Dataset name: ' || dataset_name ||
            ' Description: ' || COALESCE(LEFT(description, 500), 'No description') ||
            ' Answer with just YES or NO.'  -- asks for a concise response
        ) AS matches_criteria
    FROM dwv.datasets
    CROSS JOIN find_dataset_prompt p
)

SELECT
    source_dataset_id,
    dataset_name,
    description,
    view_location,
    matches_criteria AS ai_response  -- optional: shows the raw model response
FROM dataset_analysis
-- LIKE 'YES%' also accepts replies such as "Yes." or "YES, because..."
WHERE UPPER(TRIM(matches_criteria)) LIKE 'YES%'
ORDER BY dataset_name;
```

**Example Results:**

| SOURCE\_DATASET\_ID | DATASET\_NAME                                        | DESCRIPTION                                                                                                                                 | VIEW\_LOCATION                                                                     | AI\_RESPONSE |
| ------------------- | ---------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------- | ------------ |
| qvzb-qs6p           | 1998-2023 Serotype Data for Invasive Pneumococcal... | CDC monitors invasive bacterial infections that cause bloodstream infections, sepsis, and meningitis in persons living in the community...  | dwv\_pub\_health\_surv.abcs\_pneumococcal\_serotype\_data\_1998\_20\_\_qvzb\_qs6p  | Yes          |
| 7mra-9cq9           | 2023 Respiratory Virus Response - NSSP Emergency...  | 2023 Respiratory Viruses Response – National Syndromic Surveillance Program Emergency Department Visit Trajectories - COVID-19, Flu, RSV... | dwv\_pub\_health\_surv.nssp\_ed\_visit\_trajectories\_by\_state\_202\_\_7mra\_9cq9 | Yes          |

{% endtab %}
{% endtabs %}
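
Because `COMPLETE` runs once for every row of `dwv.datasets`, each full scan of the catalog consumes Cortex credits. A cheaper variant, sketched below, first narrows candidates with a keyword `ILIKE ANY` filter and only sends the survivors to the model. The keyword patterns and the 200-row cap are illustrative assumptions, not product settings. Short patterns such as `%flu%` also match unrelated words (for example "fluoride"); the model step is what filters those out, while datasets that match none of the keywords are never evaluated, so keep the keyword list broad.

```sql
-- Sketch: keyword pre-filter so COMPLETE only runs on likely candidates
-- The keyword patterns and the 200-row cap are illustrative; adjust to your topic
WITH candidates AS (
    SELECT
        source_dataset_id,
        dataset_name,
        description,
        schema_name || '.' || view_name AS view_location
    FROM dwv.datasets
    WHERE dataset_name ILIKE ANY ('%flu%', '%influenza%', '%respirat%', '%rsv%')
       OR description ILIKE ANY ('%flu%', '%influenza%', '%respirat%', '%rsv%')
    LIMIT 200  -- upper bound on model calls per run
),

scored AS (
    SELECT
        c.*,
        SNOWFLAKE.CORTEX.COMPLETE(
            'snowflake-arctic',
            'Is this dataset related to flu, influenza, or respiratory illness?' ||
            ' Dataset name: ' || c.dataset_name ||
            ' Description: ' || COALESCE(LEFT(c.description, 500), 'No description') ||
            ' Answer with just YES or NO.'
        ) AS ai_response
    FROM candidates c
)

SELECT source_dataset_id, dataset_name, view_location, ai_response
FROM scored
WHERE UPPER(TRIM(ai_response)) LIKE 'YES%'
ORDER BY dataset_name;
```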

### Entity Relationship Diagram

The CDC Open Data Product follows a standardized data architecture where metadata tables track datasets and their processing batches, while actual health data is stored in categorized schemas:

<figure><img src="/files/p7EnaVuxudlhOylaCj2Q" alt="Entity relationship diagram of the CDC Open Data Product metadata tables and data schemas"><figcaption></figcaption></figure>

**Color Legend:**

* 🔵 **Blue (Metadata)**: Core system tables (`datasets`) - Dataset catalog and metadata
* 🟠 **Orange (Processing)**: Batch management (`datasets_batches`) - Processing workflow and versioning
* 🔴 **Red (Mortality)**: Death and mortality data - NCHS vital statistics
* 🟢 **Green (Vaccination)**: Immunization and vaccine data - Coverage and safety monitoring
* 🟣 **Purple (Surveillance)**: Disease surveillance - Infectious disease tracking
* 🟢 **Teal (Environmental)**: Environmental health - Air quality and toxicology

**Key Relationships:**

* Each `dataset` can have multiple `datasets_batches` over time
* Each data table record links to both its parent `dataset` and specific `batch_id`
* Use `datasets_batches.is_latest_batch = TRUE` to get current data
* Data tables share the standard columns `id`, `dataset_id`, `batch_id`, and `created_at` in addition to their dataset-specific columns
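
The one-to-many relationship between `datasets` and `datasets_batches` can be inspected directly. The sketch below uses Snowflake naming (replace `dwv` with `cdc_dwv` on Databricks) and only columns that appear in this page's other queries. It counts batches per dataset and picks out the batch flagged as latest; `latest_flag_count` is expected to be 1 for any dataset with batches, assuming exactly one batch per dataset carries the flag.

```sql
-- Sketch: batches per dataset, plus the batch flagged as latest
SELECT
    d.id AS dataset_id,
    d.dataset_name,
    COUNT(b.id) AS batch_count,
    COUNT_IF(b.is_latest_batch) AS latest_flag_count,  -- expected: 1
    MAX(CASE WHEN b.is_latest_batch THEN b.id END) AS latest_batch_id
FROM dwv.datasets d
LEFT JOIN dwv.datasets_batches b ON d.id = b.dataset_id
GROUP BY d.id, d.dataset_name
ORDER BY batch_count DESC
LIMIT 20;
```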

## Data Categories

{% tabs %}
{% tab title="Snowflake" %}

#### Core Health Statistics

| Schema                     | Description                                                              | View Count |
| -------------------------- | ------------------------------------------------------------------------ | ---------- |
| **DWV\_NCHS\_STATS**       | National Center for Health Statistics - mortality, births, health trends | 236        |
| **DWV\_NNDSS\_DATA**       | National Notifiable Diseases Surveillance System                         | 295        |
| **DWV\_PUB\_HEALTH\_SURV** | Public Health Surveillance data                                          | 76         |
| **DWV\_CDC\_DATA\_CAT**    | CDC Data Catalog entries                                                 | 61         |

#### Disease-Specific Data

| Schema                         | Description                                     | View Count |
| ------------------------------ | ----------------------------------------------- | ---------- |
| **DWV\_VACC\_DATA**            | Vaccination coverage, adverse events, hesitancy | 89         |
| **DWV\_FLU\_VACCINATIONS**     | Influenza vaccination data                      | 13         |
| **DWV\_CHILD\_VAX**            | Childhood immunization coverage                 | 15         |
| **DWV\_FOOD\_WATER\_DISEASES** | Foodborne and waterborne illness tracking       | 9          |

#### Chronic Disease & Prevention

| Schema                         | Description                                        | View Count |
| ------------------------------ | -------------------------------------------------- | ---------- |
| **DWV\_HEART\_STROKE\_PREV**   | Cardiovascular disease prevention data             | 28         |
| **DWV\_B\_RISK\_FACTORS**      | Behavioral Risk Factor Surveillance System (BRFSS) | 28         |
| **DWV\_TOBACCO\_USE**          | Tobacco use surveillance and cessation             | 1          |
| **DWV\_SMOKING\_TOBACCO\_USE** | Smoking and tobacco use patterns                   | 3          |

#### Environmental & Occupational Health

| Schema                    | Description                           | View Count |
| ------------------------- | ------------------------------------- | ---------- |
| **DWV\_ENV\_HEALTH\_TOX** | Environmental health and toxicology   | 29         |
| **DWV\_MOTOR\_VEHICLES**  | Vehicle safety, crashes, and injuries | 45         |
| **DWV\_TBI\_DATA**        | Traumatic brain injury surveillance   | 3          |

#### Additional Categories

| Schema                          | Description                                                                                            | View Count |
| ------------------------------- | ------------------------------------------------------------------------------------------------------ | ---------- |
| **DWV\_ADM\_DATA**              | Hospital admission data                                                                                | 8          |
| **DWV\_ART\_CDC**               | Assisted Reproductive Technology data                                                                  | 12         |
| **DWV\_CANCER\_RESEARCH\_CITA** | Cancer research citations                                                                              | 1          |
| **DWV\_CDC\_CASE\_SURV**        | Case surveillance data                                                                                 | 6          |
| **DWV\_CDC\_CHRONIC\_DISEASE**  | Chronic disease surveillance                                                                           | 2          |
| **DWV\_CDC\_CITIES**            | City-level health indicators                                                                           | 65         |
| **DWV\_CDC\_MODELS**            | Disease modeling data                                                                                  | 2          |
| **DWV\_CESSATION\_COV**         | Smoking cessation coverage                                                                             | 7          |
| **DWV\_CORONA\_RESPIRATORY**    | Coronavirus respiratory data                                                                           | 2          |
| **DWV\_DISABILITY\_HEALTH**     | Disability health data                                                                                 | 3          |
| **DWV\_FUNDING\_DATA**          | CDC funding information                                                                                | 9          |
| **DWV\_GLOBAL\_HEALTH\_DATA**   | Global health indicators                                                                               | 8          |
| **DWV\_GLOBAL\_SURVEY\_DATA**   | Global survey data                                                                                     | 4          |
| **DWV\_HEALTHY\_AGING**         | Healthy aging data                                                                                     | 3          |
| **DWV\_HLTH\_COSTS**            | Healthcare costs data                                                                                  | 2          |
| **DWV\_HLTH\_PEOPLE2020**       | Healthy People 2020 indicators                                                                         | 1          |
| **DWV\_HLTH\_STATS**            | Health statistics                                                                                      | 11         |
| **DWV\_LAB\_SURVEILLANCE**      | Laboratory surveillance                                                                                | 9          |
| **DWV\_LEGIS\_DATA**            | Legislative data                                                                                       | 33         |
| **DWV\_MCH\_HEALTH**            | Maternal and Child Health                                                                              | 12         |
| **DWV\_MH\_DATA**               | Mental health data                                                                                     | 4          |
| **DWV\_NCEH\_DATA**             | Environmental health (NCEH)                                                                            | 1          |
| **DWV\_NCIRD\_IMMUNIZATION**    | Immunization data                                                                                      | 3          |
| **DWV\_NUTRI\_PHYS\_OBES**      | Nutrition and obesity data                                                                             | 9          |
| **DWV\_ORAL\_HEALTH**           | Oral health data                                                                                       | 11         |
| **DWV\_POLICY\_DATA**           | Healthcare policy data                                                                                 | 5          |
| **DWV\_POL\_SURV**              | Policy surveillance                                                                                    | 17         |
| **DWV\_PREG\_VACC**             | Pregnancy vaccination data                                                                             | 13         |
| **DWV\_PUB\_HEALTH\_INFRA**     | Public health infrastructure                                                                           | 2          |
| **DWV\_QUITLINE\_DATA**         | Quitline data                                                                                          | 8          |
| **DWV\_SURVEY\_DATA**           | Survey data                                                                                            | 18         |
| **DWV\_TEEN\_VAX**              | Teen vaccination data                                                                                  | 1          |
| **DWV\_UNSAFERESPONSEFILTER**   | Response filtering data                                                                                | 4          |
| **DWV\_V\_EYE\_HEALTH**         | Vision and eye health                                                                                  | 17         |
| **DWV\_WEB\_METRICS**           | Web metrics data                                                                                       | 6          |
| **DWV\_YRB\_BEHAVIORS**         | Youth risk behaviors                                                                                   | 18         |
| **DWV\_NCEZ\_ID**               | National Center for Emerging and Zoonotic Infectious Diseases (NCEZID) data                            | 7          |
| **DWV\_NCH\_PREV**              | National Center for Health Statistics - prevalence of chronic diseases and risk factors                | 2          |
| **DWV\_OTHER\_DATA**            | Other health-related data not covered by the categories above                                          | 204        |

{% endtab %}

{% tab title="Databricks" %}

#### Core Health Statistics

| Schema                          | Description                                                              | View Count |
| ------------------------------- | ------------------------------------------------------------------------ | ---------- |
| **cdc\_dwv\_nchs\_stats**       | National Center for Health Statistics - mortality, births, health trends | 235        |
| **cdc\_dwv\_nndss\_data**       | National Notifiable Diseases Surveillance System                         | 295        |
| **cdc\_dwv\_pub\_health\_surv** | Public Health Surveillance data                                          | 73         |
| **cdc\_dwv\_cdc\_data\_cat**    | CDC Data Catalog entries                                                 | 57         |

#### Disease-Specific Data

| Schema                              | Description                                     | View Count |
| ----------------------------------- | ----------------------------------------------- | ---------- |
| **cdc\_dwv\_vacc\_data**            | Vaccination coverage, adverse events, hesitancy | 87         |
| **cdc\_dwv\_flu\_vaccinations**     | Influenza vaccination data                      | 13         |
| **cdc\_dwv\_child\_vax**            | Childhood immunization coverage                 | 15         |
| **cdc\_dwv\_food\_water\_diseases** | Foodborne and waterborne illness tracking       | 9          |

#### Chronic Disease & Prevention

| Schema                              | Description                                        | View Count |
| ----------------------------------- | -------------------------------------------------- | ---------- |
| **cdc\_dwv\_heart\_stroke\_prev**   | Cardiovascular disease prevention data             | 28         |
| **cdc\_dwv\_b\_risk\_factors**      | Behavioral Risk Factor Surveillance System (BRFSS) | 28         |
| **cdc\_dwv\_tobacco\_use**          | Tobacco use surveillance and cessation             | 1          |
| **cdc\_dwv\_smoking\_tobacco\_use** | Smoking and tobacco use patterns                   | 3          |

#### Environmental & Occupational Health

| Schema                         | Description                           | View Count |
| ------------------------------ | ------------------------------------- | ---------- |
| **cdc\_dwv\_env\_health\_tox** | Environmental health and toxicology   | 25         |
| **cdc\_dwv\_motor\_vehicles**  | Vehicle safety, crashes, and injuries | 45         |
| **cdc\_dwv\_tbi\_data**        | Traumatic brain injury surveillance   | 3          |

#### Additional Categories

| Schema                               | Description                           | View Count |
| ------------------------------------ | ------------------------------------- | ---------- |
| **cdc\_dwv\_adm\_data**              | Hospital admission data               | 8          |
| **cdc\_dwv\_art\_cdc**               | Assisted Reproductive Technology data | 12         |
| **cdc\_dwv\_cancer\_research\_cita** | Cancer research citations             | 1          |
| **cdc\_dwv\_cdc\_case\_surv**        | Case surveillance data                | 6          |
| **cdc\_dwv\_cdc\_chronic\_disease**  | Chronic disease surveillance          | 2          |
| **cdc\_dwv\_cdc\_cities**            | City-level health indicators          | 64         |
| **cdc\_dwv\_cdc\_models**            | Disease modeling data                 | 2          |
| **cdc\_dwv\_cessation\_cov**         | Smoking cessation coverage            | 7          |
| **cdc\_dwv\_corona\_respiratory**    | Coronavirus respiratory data          | 2          |
| **cdc\_dwv\_disability\_health**     | Disability health data                | 3          |
| **cdc\_dwv\_funding\_data**          | CDC funding information               | 9          |
| **cdc\_dwv\_global\_health\_data**   | Global health indicators              | 8          |
| **cdc\_dwv\_global\_survey\_data**   | Global survey data                    | 4          |
| **cdc\_dwv\_healthy\_aging**         | Healthy aging data                    | 3          |
| **cdc\_dwv\_hlth\_costs**            | Healthcare costs data                 | 2          |
| **cdc\_dwv\_hlth\_people2020**       | Healthy People 2020 indicators        | 1          |
| **cdc\_dwv\_hlth\_stats**            | Health statistics                     | 11         |
| **cdc\_dwv\_lab\_surveillance**      | Laboratory surveillance               | 9          |
| **cdc\_dwv\_legis\_data**            | Legislative data                      | 33         |
| **cdc\_dwv\_mch\_health**            | Maternal and Child Health             | 12         |
| **cdc\_dwv\_mh\_data**               | Mental health data                    | 4          |
| **cdc\_dwv\_nceh\_data**             | Environmental health (NCEH)           | 1          |
| **cdc\_dwv\_ncird\_immunization**    | Immunization data                     | 3          |
| **cdc\_dwv\_nutri\_phys\_obes**      | Nutrition and obesity data            | 9          |
| **cdc\_dwv\_oral\_health**           | Oral health data                      | 11         |
| **cdc\_dwv\_policy\_data**           | Healthcare policy data                | 5          |
| **cdc\_dwv\_pol\_surv**              | Policy surveillance                   | 17         |
| **cdc\_dwv\_preg\_vacc**             | Pregnancy vaccination data            | 13         |
| **cdc\_dwv\_pub\_health\_infra**     | Public health infrastructure          | 2          |
| **cdc\_dwv\_quitline\_data**         | Quitline data                         | 8          |
| **cdc\_dwv\_survey\_data**           | Survey data                           | 18         |
| **cdc\_dwv\_teen\_vax**              | Teen vaccination data                 | 1          |
| **cdc\_dwv\_unsaferesponsefilter**   | Response filtering data               | 4          |
| **cdc\_dwv\_v\_eye\_health**         | Vision and eye health                 | 17         |
| **cdc\_dwv\_web\_metrics**           | Web metrics data                      | 6          |
| **cdc\_dwv\_yrb\_behaviors**         | Youth risk behaviors                  | 18         |

{% endtab %}
{% endtabs %}

## Dataset Features

* **Comprehensive Coverage**: Over 50 health topic areas from infectious diseases to health behaviors
* **Continuous Monitoring**: New CDC datasets and updates are automatically incorporated
* **Rigorous Quality Assurance**: Each dataset batch undergoes automated QA checks
* **Schema Evolution**: Automatic adaptation to changes in source data structures
* **Standardized Format**: Consistent column naming and data types across all datasets

## Data Quality and Maintenance

### Quality Assurance Process

* **Automated Checks**: Each dataset batch is subject to automated QA checks before being made available
* **Data Validation**: Checks for data integrity, including row count validation and data type consistency
* **Schema Evolution**: System automatically adapts to changes in source data schemas
* **Freshness Tracking**: `last_refresh_timestamp` in `DWV.DATASETS` (Snowflake) or `cdc_dwv.datasets` (Databricks) records each dataset's most recent refresh
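
To see which datasets were refreshed most recently, sort the metadata table by `last_refresh_timestamp`. This is a minimal Snowflake sketch; on Databricks, use `cdc_dwv.datasets` and adjust the date arithmetic to Databricks SQL.

```sql
-- Most recently refreshed datasets, with days elapsed since the last refresh
SELECT
    view_name,
    dataset_name,
    last_refresh_timestamp,
    DATEDIFF('day', last_refresh_timestamp, CURRENT_TIMESTAMP()) AS days_since_refresh
FROM dwv.datasets
ORDER BY last_refresh_timestamp DESC NULLS LAST
LIMIT 20;
```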

### Update Schedule

* **Daily**: NNDSS surveillance data
* **Weekly**: NCHS mortality data, vaccination monitoring
* **Monthly**: BRFSS, chronic disease indicators
* **Quarterly**: Survey data, policy tracking
* **Annual**: Comprehensive health surveys
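
The schedule above describes typical source cadences. To estimate the cadence actually observed for each dataset, you can measure the gaps between consecutive processing batches. This Snowflake sketch assumes historical batches remain in `datasets_batches` and that `processing_date` is a date or timestamp column.

```sql
-- Sketch: observed days between consecutive batches, per dataset
WITH batch_gaps AS (
    SELECT
        d.id AS dataset_id,
        d.dataset_name,
        DATEDIFF(
            'day',
            LAG(b.processing_date) OVER (PARTITION BY b.dataset_id ORDER BY b.processing_date),
            b.processing_date
        ) AS days_since_prior_batch  -- NULL for a dataset's first batch
    FROM dwv.datasets d
    JOIN dwv.datasets_batches b ON d.id = b.dataset_id
)
SELECT
    dataset_id,
    dataset_name,
    COUNT(days_since_prior_batch) AS gaps_observed,
    ROUND(AVG(days_since_prior_batch), 1) AS avg_days_between_batches
FROM batch_gaps
GROUP BY dataset_id, dataset_name
HAVING COUNT(days_since_prior_batch) > 0  -- skip datasets with a single batch
ORDER BY avg_days_between_batches;
```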

## Business Applications

The CDC Open Data Product supports a range of business applications:

* **Public Health Research**: Epidemiological studies and population health analysis
* **Healthcare Policy Development**: Evidence-based policy creation and evaluation
* **Risk Assessment**: Health risk assessment and management across populations
* **Resource Allocation**: Data-driven healthcare resource planning
* **Disease Surveillance**: Outbreak monitoring and prediction modeling
* **Population Health Management**: Community health assessment and intervention planning

## Data Dictionary

### Common Columns

Most CDC datasets include these standard columns:

| Column       | Description                 | Type           |
| ------------ | --------------------------- | -------------- |
| `ID`         | Unique record identifier    | VARCHAR(36)    |
| `DATASET_ID` | Links to dataset metadata   | VARCHAR(36)    |
| `BATCH_ID`   | Processing batch identifier | VARCHAR(36)    |
| `CREATED_AT` | Record creation timestamp   | TIMESTAMP\_NTZ |
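
The standard columns sit alongside each dataset's own columns. To list every column of a specific view with its data type, you can query `information_schema.columns`. This Snowflake sketch reuses the weekly deaths view from the examples on this page and assumes the product database is your current database; unquoted identifiers are stored in uppercase, hence the uppercase literals.

```sql
-- List the columns of one view in declaration order
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'DWV_NCHS_STATS'
  AND table_name = 'WEEKLY_DEATHS_COUNT_STATE_CAUSES_2014_2__3YF8_KANR'
ORDER BY ordinal_position;
```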

### Metadata Access

{% tabs %}
{% tab title="Snowflake" %}

```sql
-- Get detailed dataset information
SELECT
    d.source_dataset_id,
    d.dataset_name,
    d.description,
    d.category,
    d.url as source_url,
    d.last_refresh_timestamp
FROM dwv.datasets d
WHERE d.view_name = 'YOUR_VIEW_NAME';
```

{% endtab %}

{% tab title="Databricks" %}

```sql
-- Get detailed dataset information
SELECT
    d.source_dataset_id,
    d.dataset_name,
    d.description,
    d.category,
    d.url as source_url,
    d.last_refresh_timestamp
FROM cdc_dwv.datasets d
WHERE d.view_name = 'YOUR_VIEW_NAME';
```

{% endtab %}
{% endtabs %}

## Support & Documentation

### Source Data Access

Access original CDC data sources and documentation:

{% tabs %}
{% tab title="Snowflake" %}

```sql
-- Find datasets with original source URLs
SELECT
    view_name,
    dataset_name,
    url as source_url
FROM dwv.datasets
WHERE url IS NOT NULL
ORDER BY dataset_name;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
-- Find datasets with original source URLs
SELECT
    view_name,
    dataset_name,
    url as source_url
FROM cdc_dwv.datasets
WHERE url IS NOT NULL
ORDER BY dataset_name;
```

{% endtab %}
{% endtabs %}

### Working with Latest Data Batches

**Important**: CDC datasets are processed in batches over time. To avoid duplicate records and ensure you're working with the most recent data, always filter for the latest batch using `datasets_batches.is_latest_batch = TRUE`.

{% tabs %}
{% tab title="Snowflake" %}

```sql
-- Get latest batch information for datasets
SELECT
    d.dataset_name,
    d.view_name,
    b.id batch_id,
    b.total_rows_processed row_count,
    b.created_at as batch_created,
    b.processing_date as batch_completed
FROM dwv.datasets d
JOIN dwv.datasets_batches b ON d.id = b.dataset_id
WHERE b.is_latest_batch = TRUE
ORDER BY b.processing_date DESC
LIMIT 10;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
-- Get latest batch information for datasets
SELECT
    d.dataset_name,
    d.view_name,
    b.id batch_id,
    b.total_rows_processed row_count,
    b.created_at as batch_created,
    b.processing_date as batch_completed
FROM cdc_dwv.datasets d
JOIN cdc_dwv.datasets_batches b ON d.id = b.dataset_id
WHERE b.is_latest_batch = TRUE
ORDER BY b.processing_date DESC
LIMIT 10;
```

{% endtab %}
{% endtabs %}

### Joining to Actual Data Tables

Here are examples showing how to properly join from metadata tables to actual data tables using the latest batch filter:

#### Example 1: Latest Weekly Death Counts by State

{% tabs %}
{% tab title="Snowflake" %}

```sql
-- Weekly death counts by state for trend analysis (latest batch only)
-- Columns are qualified with table aliases to avoid ambiguity
SELECT
    a.jurisdiction_of_occurrence,
    a.mmwryear,
    a.mmwrweek,
    a.allcause,
    a.naturalcause,
    a.flag_diab
FROM dwv.datasets d
JOIN dwv.datasets_batches db ON d.id = db.dataset_id
JOIN dwv_nchs_stats.weekly_deaths_count_state_causes_2014_2__3yf8_kanr a
    ON db.id = a.batch_id
WHERE a.mmwryear >= 2018
    AND db.is_latest_batch = TRUE
ORDER BY a.jurisdiction_of_occurrence, a.mmwryear, a.mmwrweek;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
-- Weekly death counts by state for trend analysis (latest batch only)
-- Columns are qualified with table aliases to avoid ambiguity
SELECT
    a.jurisdiction_of_occurrence,
    a.mmwryear,
    a.mmwrweek,
    a.allcause,
    a.naturalcause,
    a.flag_diab
FROM cdc_dwv.datasets d
JOIN cdc_dwv.datasets_batches db ON d.id = db.dataset_id
JOIN cdc_dwv_nchs_stats.weekly_deaths_count_state_causes_2014_2__3yf8_kanr a
    ON db.id = a.batch_id
WHERE a.mmwryear >= 2018
    AND db.is_latest_batch = TRUE
ORDER BY a.jurisdiction_of_occurrence, a.mmwryear, a.mmwrweek;
```

{% endtab %}
{% endtabs %}

#### Example 2: Latest Vaccination Coverage with Metadata

{% tabs %}
{% tab title="Snowflake" %}

```sql
-- Get latest childhood vaccination data with full context
SELECT
    d.dataset_name,
    d.category,
    d.url as source_url,
    vax.geography state,
    vax.year_season,
    vax.vaccine,
    vax.dimension_type,
    vax.coverage_estimate
FROM dwv.datasets d
JOIN dwv.datasets_batches b ON d.id = b.dataset_id
JOIN dwv_child_vax.vaccination_coverage_young_children_0_3__fhky_rtsk vax
    ON vax.batch_id = b.id
WHERE b.is_latest_batch = TRUE
ORDER BY vax.geography, vax.vaccine;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
-- Get latest childhood vaccination data with full context
SELECT
    d.dataset_name,
    d.category,
    d.url as source_url,
    vax.geography state,
    vax.year_season,
    vax.vaccine,
    vax.dimension_type,
    vax.coverage_estimate
FROM cdc_dwv.datasets d
JOIN cdc_dwv.datasets_batches b ON d.id = b.dataset_id
JOIN cdc_dwv_child_vax.vaccination_coverage_young_children_0_3__fhky_rtsk vax
    ON vax.batch_id = b.id
WHERE b.is_latest_batch = TRUE
ORDER BY vax.geography, vax.vaccine;
```

{% endtab %}
{% endtabs %}

#### Example 3: Latest Environmental Health Data with Batch Tracking

{% tabs %}
{% tab title="Snowflake" %}

```sql
-- Get latest air quality measures with processing information
SELECT
    d.dataset_name,
    air.statename,
    air.countyname,
    air.reportyear,
    air.measurename,  -- distinguishes the several PM2.5 measures matched below
    air.measuretype,
    air.value,
    air.unit,
    b.total_rows_processed as total_records_in_batch,
    b.processing_date as data_refresh_date
FROM dwv.datasets d
JOIN dwv.datasets_batches b ON d.id = b.dataset_id
JOIN dwv_env_health_tox.air_quality_measures_neht_network__cjae_szjv air
    ON air.batch_id = b.id
WHERE b.is_latest_batch = TRUE
    AND air.measurename LIKE '%PM2.5%'
ORDER BY air.statename, air.countyname, air.reportyear DESC;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
-- Get latest air quality measures with processing information
SELECT
    d.dataset_name,
    air.statename,
    air.countyname,
    air.reportyear,
    air.measurename,  -- distinguishes the several PM2.5 measures matched below
    air.measuretype,
    air.value,
    air.unit,
    b.total_rows_processed as total_records_in_batch,
    b.processing_date as data_refresh_date
FROM cdc_dwv.datasets d
JOIN cdc_dwv.datasets_batches b ON d.id = b.dataset_id
JOIN cdc_dwv_env_health_tox.air_quality_measures_neht_network__cjae_szjv air
    ON air.batch_id = b.id
WHERE b.is_latest_batch = TRUE
    AND air.measurename LIKE '%PM2.5%'
ORDER BY air.statename, air.countyname, air.reportyear DESC;
```

{% endtab %}
{% endtabs %}
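
If you query the same dataset often, one option is to wrap the latest-batch join in a view in a database you own, so the filter is applied automatically. In this Snowflake sketch, `CDC_OPEN_DATA` (the name under which the product database is mounted in your account) and `MY_DB.ANALYTICS` (your own database and schema) are hypothetical placeholders. Fully qualified names are used because the view lives outside the product database.

```sql
-- Sketch: a reusable "latest batch only" view in your own database
-- CDC_OPEN_DATA and MY_DB.ANALYTICS are hypothetical names; substitute your own
CREATE OR REPLACE VIEW my_db.analytics.weekly_deaths_latest AS
SELECT a.*
FROM cdc_open_data.dwv_nchs_stats.weekly_deaths_count_state_causes_2014_2__3yf8_kanr a
JOIN cdc_open_data.dwv.datasets_batches b
    ON a.batch_id = b.id
WHERE b.is_latest_batch = TRUE;

-- Downstream queries no longer need to repeat the batch filter
SELECT jurisdiction_of_occurrence, mmwryear, mmwrweek, allcause
FROM my_db.analytics.weekly_deaths_latest
WHERE mmwryear >= 2018;
```

Because a standard view is evaluated at query time, it follows the `is_latest_batch` flag to each new batch without being recreated.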

### Why Use Latest Batch Filtering?

**Without latest batch filtering**, you may encounter:

* Duplicate records from historical processing runs
* Inconsistent row counts across queries
* Outdated data mixed with current data

**With `is_latest_batch = TRUE`**, you ensure:

* Only the most recent version of each dataset
* Consistent results across different query runs
* Accurate row counts and data freshness
* Better query performance, since fewer rows are processed
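
To see the effect on a particular view, compare its total row count with the rows that belong to the latest batch. This Snowflake sketch uses the weekly deaths view from Example 1; the `LEFT JOIN` keeps rows whose batch is missing from `datasets_batches` in the total. If `all_rows` exceeds `latest_batch_rows`, unfiltered queries against this view include records from older batches.

```sql
-- Compare all rows in a view with the rows from its latest batch
SELECT
    COUNT(*) AS all_rows,
    COUNT_IF(b.is_latest_batch) AS latest_batch_rows,
    COUNT(DISTINCT a.batch_id) AS batches_present
FROM dwv_nchs_stats.weekly_deaths_count_state_causes_2014_2__3yf8_kanr a
LEFT JOIN dwv.datasets_batches b
    ON a.batch_id = b.id;
```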

## Example Use Cases

1. **COVID-19 Impact Analysis**: Analyze trends in COVID-19 deaths across different age groups and regions
2. **Tobacco Consumption Trends**: Track changes in tobacco consumption patterns over time and across states
3. **Bacterial Surveillance**: Monitor the prevalence of invasive bacterial infections across demographics
4. **Cardiovascular Disease Risk Assessment**: Analyze risk factors and trends in cardiovascular diseases
5. **Immunization Coverage Evaluation**: Assess vaccination rates and their impact on disease outbreaks

***

## Get Started

{% hint style="success" %}
**CDC Public Health Data Access**

#### Choose Your Platform

{% endhint %}

| Platform       | Get Access                                                                        | Free Trial                                                         |
| -------------- | --------------------------------------------------------------------------------- | ------------------------------------------------------------------ |
| **Snowflake**  | [Get on Marketplace →](https://app.snowflake.com/marketplace/listing/GZT1Z125KEY) | Available via Marketplace                                          |
| **Databricks** | [Subscribe →](https://checkout.dataplex-consulting.com/b/14A6oAcXFfOw13y9y4bQY01) | [Start 14-Day Free Trial →](https://trial.dataplex-consulting.com) |

|                  |                                                                  |
| ---------------- | ---------------------------------------------------------------- |
| **Includes**     | All 1,294+ datasets, daily to annual updates, full documentation |
| **Support**      | Email support included                                           |
| **Cancellation** | Cancel anytime, no long-term commitment                          |

***

Docs Last Updated: 5/9/2026


