CDC Open Data Product

About the Dataset

The CDC Open Data Product transforms and delivers 1,197 up-to-date public health datasets from the Centers for Disease Control and Prevention (CDC). It currently holds more than 511 GB of data across more than 28,580 attributes and continues to grow, giving researchers, analysts, and data scientists access to a broad range of public health information.

🔗 Find the CDC Open Data Product on the Snowflake Marketplace.

Quick Access

  • Schemas: DWV* (Data Warehouse Views)

  • Total Datasets: 1,197 views across 50+ categories

  • Update Frequency: Daily to annual, depending on CDC source

Overview

The CDC Open Data Product provides standardized access to diverse health datasets including:

  • NCHS Statistics (213 views) - National Center for Health Statistics data

  • NNDSS Data (295 views) - National Notifiable Diseases Surveillance System

  • CDC Cities (57 views) - City-level health indicator data

  • Motor Vehicles (45 views) - Vehicle safety and injury data

  • Vaccination Data (82 views) - Immunization coverage and safety

  • Environmental Health (25 views) - Environmental exposures and health impacts

  • Heart & Stroke Prevention (26 views) - Cardiovascular health data

  • Legislative Data (33 views) - Health policy and legislation tracking
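The per-category view counts listed above can be checked against the information schema. This is a minimal sketch, assuming the shared CDC database is the active database in your session and that every data view lives in a schema whose name starts with DWV; the two metadata views are excluded the same way the basic query examples exclude them.

```sql
-- Count views per schema to see how datasets are distributed across categories.
-- Assumption: the CDC Open Data Product database is the current database.
SELECT table_schema,
       COUNT(*) AS view_count
FROM information_schema.views
WHERE table_schema LIKE 'DWV%'
  AND table_name NOT IN ('DATASETS', 'DATASETS_BATCHES')  -- skip metadata views
GROUP BY table_schema
ORDER BY view_count DESC, table_schema;
```

Counts may drift from the figures in this document as new CDC datasets are added.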

Getting Started

Basic Query Examples

-- List available datasets (first 10 shown; remove LIMIT to list all)
SELECT table_schema, table_name,
       ROW_NUMBER() OVER (ORDER BY table_schema, table_name) AS dataset_number
FROM information_schema.views
WHERE table_schema LIKE 'DWV%'
  AND table_name NOT IN ('DATASETS', 'DATASETS_BATCHES')
ORDER BY table_schema, table_name  -- makes the LIMIT deterministic
LIMIT 10;

-- Search for COVID-related datasets
SELECT v.table_schema, v.table_name, d.dataset_name, d.description
FROM information_schema.views v
JOIN dwv.datasets d ON v.table_name = d.view_name
WHERE v.table_schema LIKE 'DWV%'
  AND (UPPER(d.dataset_name) LIKE '%COVID%'
       OR UPPER(d.description) LIKE '%COVID%')
ORDER BY v.table_schema, v.table_name;

-- Get sample data from a specific dataset
SELECT *
FROM dwv_nchs_stats.weekly_deaths_count_state_causes_2014_2__3yf8_kanr
LIMIT 100;

AI-Powered Dataset Discovery

Use Snowflake Cortex AI to discover relevant datasets with a natural-language question:

-- Use Cortex AI to find datasets that match a natural-language question
WITH find_dataset_prompt AS (
      SELECT
          -- =====================================================================
          -- STEP 1: UPDATE YOUR SEARCH PROMPT HERE
          -- =====================================================================
          -- Write a question that can be answered with YES or NO
          -- The AI will evaluate each dataset against this question
          -- TIP: Be specific but include synonyms for better matches
          -- =====================================================================
          'Is this dataset related to flu, influenza, or respiratory illness?' AS prompt
  ),

  dataset_analysis AS (
      SELECT
          source_dataset_id,
          dataset_name,
          description,
          schema_name || '.' || view_name AS view_location,

          -- =====================================================================
          -- CORTEX AI EVALUATION
          -- =====================================================================
          -- This sends each dataset to Snowflake's AI model for evaluation
          -- The AI reads the dataset name and description to determine if it
          -- matches your search criteria, understanding context beyond keywords
          -- =====================================================================
          SNOWFLAKE.CORTEX.COMPLETE(
              'snowflake-arctic',  -- AI model to use
              find_dataset_prompt.prompt ||
              ' Dataset name: ' || dataset_name ||
              ' Description: ' || COALESCE(LEFT(description, 500), 'No description') ||
              ' Answer with just YES or NO.'  -- Forces concise response
          ) AS matches_criteria
      FROM dwv.datasets
      CROSS JOIN find_dataset_prompt  -- single-row CTE: pairs every dataset with the prompt
  )
  
  SELECT
      source_dataset_id,
      dataset_name,
      description,
      view_location,
      matches_criteria AS ai_response  -- Optional: include to see actual AI response
  FROM dataset_analysis
  WHERE UPPER(TRIM(matches_criteria)) LIKE 'YES%'  -- Matches "YES", "Yes.", and similar AI responses
  ORDER BY dataset_name;

Example Results:

| SOURCE_DATASET_ID | DATASET_NAME | DESCRIPTION_PREVIEW | FULL_TABLE_NAME | AVAILABLE |
| --- | --- | --- | --- | --- |
| qvzb-qs6p | 1998-2023 Serotype Data for Invasive Pneumococcal... | CDC monitors invasive bacterial infections that cause bloodstream infections, sepsis, and meningitis in persons living in the community... | dwv_pub_health_surv.abcs_pneumococcal_serotype_data_1998_20__qvzb_qs6p | Yes |
| 7mra-9cq9 | 2023 Respiratory Virus Response - NSSP Emergency... | 2023 Respiratory Viruses Response – National Syndromic Surveillance Program Emergency Department Visit Trajectories - COVID-19, Flu, RSV... | dwv_pub_health_surv.nssp_ed_visit_trajectories_by_state_202__7mra_9cq9 | Yes |

Entity Relationship Diagram

The CDC Open Data Product follows a standardized data architecture where metadata tables track datasets and their processing batches, while actual health data is stored in categorized schemas:

Color Legend:

  • 🔵 Blue (Metadata): Core system tables (datasets) - Dataset catalog and metadata

  • 🟠 Orange (Processing): Batch management (datasets_batches) - Processing workflow and versioning

  • 🔴 Red (Mortality): Death and mortality data - NCHS vital statistics

  • 🟢 Green (Vaccination): Immunization and vaccine data - Coverage and safety monitoring

  • 🟣 Purple (Surveillance): Disease surveillance - Infectious disease tracking

  • 🟢 Teal (Environmental): Environmental health - Air quality and toxicology

Key Relationships:

  • Each dataset can have multiple datasets_batches over time

  • Each data table record links to both its parent dataset and specific batch_id

  • Use datasets_batches.is_latest_batch = TRUE to get current data

  • All data tables follow the same schema pattern with id, dataset_id, batch_id, and created_at
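The dataset-to-batch relationship described above can be inspected directly from the metadata views. The following is a sketch only: it uses the column names that appear elsewhere in this document (id, dataset_id, is_latest_batch, processing_date) and assumes is_latest_batch is a BOOLEAN. The expectation that each dataset has exactly one batch flagged as latest is an assumption, not something the source states.

```sql
-- Count how many batches each dataset has accumulated and how many of them
-- are flagged as the latest (expected to be 1 per dataset; this is an assumption).
SELECT d.dataset_name,
       COUNT(*)                    AS batch_count,
       COUNT_IF(b.is_latest_batch) AS latest_batch_flags,
       MAX(b.processing_date)      AS most_recent_processing_date
FROM dwv.datasets d
JOIN dwv.datasets_batches b ON d.id = b.dataset_id
GROUP BY d.dataset_name
ORDER BY batch_count DESC
LIMIT 20;
```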

Data Categories

Core Health Statistics

| Schema | Description | View Count |
| --- | --- | --- |
| DWV_NCHS_STATS | National Center for Health Statistics - mortality, births, health trends | 214 |
| DWV_NNDSS_DATA | National Notifiable Diseases Surveillance System | 295 |
| DWV_PUB_HEALTH_SURV | Public Health Surveillance data | 66 |
| DWV_CDC_DATA_CAT | CDC Data Catalog entries | 56 |

Disease-Specific Data

| Schema | Description | View Count |
| --- | --- | --- |
| DWV_VACC_DATA | Vaccination coverage, adverse events, hesitancy | 82 |
| DWV_FLU_VACCINATIONS | Influenza vaccination data | 12 |
| DWV_CHILD_VAX | Childhood immunization coverage | 15 |
| DWV_FOOD_WATER_DISEASES | Foodborne and waterborne illness tracking | 9 |

Chronic Disease & Prevention

| Schema | Description | View Count |
| --- | --- | --- |
| DWV_HEART_STROKE_PREV | Cardiovascular disease prevention data | 26 |
| DWV_B_RISK_FACTORS | Behavioral Risk Factor Surveillance System (BRFSS) | 28 |
| DWV_TOBACCO_USE | Tobacco use surveillance and cessation | 1 |
| DWV_SMOKING_TOBACCO_USE | Smoking and tobacco use patterns | 3 |

Environmental & Occupational Health

| Schema | Description | View Count |
| --- | --- | --- |
| DWV_ENV_HEALTH_TOX | Environmental health and toxicology | 25 |
| DWV_MOTOR_VEHICLES | Vehicle safety, crashes, and injuries | 45 |
| DWV_TBI_DATA | Traumatic brain injury surveillance | 3 |
| DWV_ADM_DATA | Hospital admission data including patient demographics, diagnoses, procedures, and outcomes. | 8 |
| DWV_ART_CDC | Art Cdc - Data schema focusing on healthcare data related to ART (Assisted Reproductive Technology) from the Centers for Disease Control and Prevention. | 12 |
| DWV_CANCER_RESEARCH_CITA | Cancer research data schema focusing on clinical trials and treatment outcomes. | 1 |
| DWV_CDC_CASE_SURV | CDC healthcare data schema DWV_CDC_CASE_SURV focuses on tracking and analyzing cases of various diseases and health conditions for surveillance and response purposes. | 6 |
| DWV_CDC_CHRONIC_DISEASE | Chronic disease surveillance and epidemiology data collected by the CDC for monitoring and analyzing trends in long-term health conditions. | 2 |
| DWV_CDC_CITIES | CDC cities data schema focusing on health indicators and statistics related to urban populations. | 57 |
| DWV_CDC_MODELS | CDC healthcare data schema focusing on disease surveillance, outbreak modeling, and epidemiological analysis. | 2 |
| DWV_CESSATION_COV | Smoking cessation coverage data for evaluating tobacco cessation programs and interventions. | 7 |
| DWV_CORONA_RESPIRATORY | Respiratory data related to the coronavirus outbreak, including testing results, symptoms, and outcomes. | 1 |
| DWV_DISABILITY_HEALTH | Disability health data schema focusing on healthcare information related to disabilities. | 3 |
| DWV_FUNDING_DATA | CDC Funding Data schema (DWV_FUNDING_DATA) provides comprehensive information on funding sources and allocations related to healthcare initiatives and programs. | 9 |
| DWV_GLOBAL_HEALTH_DATA | Global health data schema containing comprehensive data on various health indicators, diseases, and trends worldwide. | 8 |
| DWV_GLOBAL_SURVEY_DATA | Global survey data on various health topics collected and analyzed by the CDC. | 4 |
| DWV_HEALTHY_AGING | Healthy aging data schema focusing on various health indicators and trends related to aging populations, including chronic conditions, preventive care, and quality of life measures. | 3 |
| DWV_HLTH_COSTS | Healthcare costs data schema focusing on tracking and analyzing various health-related expenditures and financial aspects within the healthcare system. | 2 |
| DWV_HLTH_PEOPLE2020 | Health data schema focusing on people-related indicators for the year 2020, including demographics, health behaviors, and outcomes. | 1 |
| DWV_HLTH_STATS | Health statistics data schema focusing on a wide range of healthcare statistics and trends. | 11 |
| DWV_LAB_SURVEILLANCE | Laboratory surveillance data schema focusing on monitoring and tracking of various health-related lab tests and results. | 8 |
| DWV_LEGIS_DATA | Legislative data related to healthcare policies and regulations. | 33 |
| DWV_MCH_HEALTH | Maternal and Child Health data including vital statistics, pregnancy outcomes, and child health indicators. | 12 |
| DWV_MH_DATA | Mental health data including diagnoses, treatments, and outcomes. | 1 |
| DWV_NCEH_DATA | Environmental health data collected by the National Center for Environmental Health (NCEH), including information on toxicology, exposures, and related topics. | 1 |
| DWV_NCIRD_IMMUNIZATION | Immunization coverage, vaccine-preventable diseases, vaccination schedules. | 2 |
| DWV_NUTRI_PHYS_OBES | Nutrition, physical activity, and obesity data schema for CDC healthcare analysis. | 8 |
| DWV_ORAL_HEALTH | Oral health data schema focusing on dental care, oral hygiene, and related health indicators. | 11 |
| DWV_POLICY_DATA | Healthcare policy data including information on coverage, regulations, and trends. | 5 |
| DWV_POL_SURV | Public health surveillance data related to political factors and their impact on health outcomes. | 17 |
| DWV_PREG_VACC | Pregnancy vaccination data including coverage rates and adverse events. | 12 |
| DWV_PUB_HEALTH_INFRA | Public health infrastructure data schema focusing on key indicators related to healthcare systems, facilities, and resources. | 2 |
| DWV_QUITLINE_DATA | Quitline data schema for tracking and analyzing smoking cessation efforts and outcomes. | 8 |
| DWV_SURVEY_DATA | Survey data related to various health topics, collected and analyzed by the CDC for public health research and surveillance purposes. | 18 |
| DWV_TEEN_VAX | Adolescent vaccination coverage and trends data schema. | 1 |
| DWV_UNSAFERESPONSEFILTER | Unsaferesponsefilter - CDC healthcare data schema focusing on identifying and filtering unsafe responses in healthcare settings. | 4 |
| DWV_V_EYE_HEALTH | Vision health data schema focusing on eye health indicators and trends. | 15 |
| DWV_WEB_METRICS | Web metrics data schema focusing on tracking and analyzing various health-related metrics on CDC websites for monitoring and improving online health information dissemination. | 4 |
| DWV_YRB_BEHAVIORS | Youth Risk Behavior Surveillance System - behaviors related to health risks and protective factors among youth. | 18 |

Dataset Features

  • Comprehensive Coverage: Over 50 health topic areas from infectious diseases to health behaviors

  • Continuous Monitoring: New CDC datasets and updates are automatically incorporated

  • Rigorous Quality Assurance: Each dataset batch undergoes automated QA checks

  • Schema Evolution: Automatic adaptation to changes in source data structures

  • Standardized Format: Consistent column naming and data types across all datasets

Data Quality and Maintenance

Quality Assurance Process

  • Automated Checks: Each dataset batch is subject to automated QA checks before being made available

  • Data Validation: Checks for data integrity, including row count validation and data type consistency

  • Schema Evolution: System automatically adapts to changes in source data schemas

  • Freshness Tracking: last_refresh_timestamp in DWV.DATASETS indicates the most recent update for each dataset
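Freshness tracking can be queried directly from the dataset catalog. This is a hedged sketch: it assumes last_refresh_timestamp is stored as a timestamp type, and the 30-day window is an arbitrary illustration rather than a product threshold.

```sql
-- List datasets refreshed within the last 30 days, most recent first.
-- Assumptions: last_refresh_timestamp is a TIMESTAMP column; 30 days is illustrative.
SELECT schema_name || '.' || view_name AS view_location,
       dataset_name,
       last_refresh_timestamp
FROM dwv.datasets
WHERE last_refresh_timestamp >= DATEADD(day, -30, CURRENT_TIMESTAMP())
ORDER BY last_refresh_timestamp DESC;
```

Reversing the comparison (`<` instead of `>=`) gives a quick list of datasets that have not been refreshed recently.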

Update Schedule

  • Daily: NNDSS surveillance data

  • Weekly: NCHS mortality data, vaccination monitoring

  • Monthly: BRFSS, chronic disease indicators

  • Quarterly: Survey data, policy tracking

  • Annual: Comprehensive health surveys

Business Applications

The CDC Open Data Product can be utilized in various business applications:

  • Public Health Research: Epidemiological studies and population health analysis

  • Healthcare Policy Development: Evidence-based policy creation and evaluation

  • Risk Assessment: Health risk assessment and management across populations

  • Resource Allocation: Data-driven healthcare resource planning

  • Disease Surveillance: Outbreak monitoring and prediction modeling

  • Population Health Management: Community health assessment and intervention planning

Data Dictionary

Common Columns

Most CDC datasets include these standard columns:

| Column | Description | Type |
| --- | --- | --- |
| ID | Unique record identifier | VARCHAR(36) |
| DATASET_ID | Links to dataset metadata | VARCHAR(36) |
| BATCH_ID | Processing batch identifier | VARCHAR(36) |
| CREATED_AT | Record creation timestamp | TIMESTAMP_NTZ |
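To see the standard columns on a real view, you can select just those four fields. The sketch below uses the weekly deaths view referenced in the basic query examples; the document says most (not all) datasets carry these columns, so treat their presence on any particular view as an assumption to verify.

```sql
-- Peek at the standard tracking columns on one data view.
-- Assumption: this view exposes id, dataset_id, batch_id, and created_at.
SELECT id,
       dataset_id,
       batch_id,
       created_at
FROM dwv_nchs_stats.weekly_deaths_count_state_causes_2014_2__3yf8_kanr
ORDER BY created_at DESC
LIMIT 5;
```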

Metadata Access

-- Get detailed dataset information
SELECT 
    d.source_dataset_id,
    d.dataset_name,
    d.description,
    d.category,
    d.url as source_url,
    d.last_refresh_timestamp
FROM dwv.datasets d
WHERE d.view_name = 'YOUR_VIEW_NAME';

Support & Documentation

Source Data Access

Access original CDC data sources and documentation:

-- Find datasets with original source URLs
SELECT 
    view_name,
    dataset_name,
    url as source_url
FROM dwv.datasets
WHERE url IS NOT NULL
ORDER BY dataset_name;

Working with Latest Data Batches

Important: CDC datasets are processed in batches over time. To avoid duplicate records and ensure you're working with the most recent data, always filter for the latest batch using datasets_batches.is_latest_batch = TRUE.

-- Get latest batch information for datasets
SELECT 
    d.dataset_name,
    d.view_name,
    b.id AS batch_id,
    b.total_rows_processed AS row_count,
    b.created_at as batch_created,
    b.processing_date as batch_completed
FROM dwv.datasets d
JOIN dwv.datasets_batches b ON d.id = b.dataset_id
WHERE b.is_latest_batch = TRUE
ORDER BY b.processing_date DESC
LIMIT 10;

Joining to Actual Data Tables

Here are examples showing how to properly join from metadata tables to actual data tables using the latest batch filter:

Example 1: Latest Weekly Mortality Data

-- Weekly death counts by state for trend analysis
SELECT
    a.jurisdiction_of_occurrence,
    a.mmwryear,
    a.mmwrweek,
    a.allcause,
    a.naturalcause,
    a.flag_diab
FROM dwv.datasets d
JOIN dwv.datasets_batches db ON d.id = db.dataset_id
JOIN dwv_nchs_stats.weekly_deaths_count_state_causes_2014_2__3yf8_kanr a
    ON db.id = a.batch_id
WHERE a.mmwryear >= 2018
  AND db.is_latest_batch = TRUE  -- keep only rows from the current batch
ORDER BY a.jurisdiction_of_occurrence, a.mmwryear, a.mmwrweek;

Example 2: Latest Vaccination Coverage with Metadata

-- Get latest childhood vaccination data with full context
SELECT 
    d.dataset_name,
    d.category,
    d.url as source_url,
    vax.geography AS state,
    vax.year_season,
    vax.vaccine,
    vax.dimension_type,
    vax.coverage_estimate
FROM dwv.datasets d
JOIN dwv.datasets_batches b ON d.id = b.dataset_id
JOIN dwv_child_vax.vaccination_coverage_young_children_0_3__fhky_rtsk vax 
    ON vax.batch_id = b.id
WHERE b.is_latest_batch = TRUE
ORDER BY vax.geography, vax.vaccine;

Example 3: Latest Environmental Health Data with Batch Tracking

-- Get latest air quality measures with processing information
SELECT 
    d.dataset_name,
    air.statename,
    air.countyname,
    air.reportyear,
    air.measuretype,
    air.value,
    air.unit,
    b.total_rows_processed as total_records_in_batch,
    b.processing_date as data_refresh_date
FROM dwv.datasets d
JOIN dwv.datasets_batches b ON d.id = b.dataset_id
JOIN dwv_env_health_tox.air_quality_measures_neht_network__cjae_szjv air 
    ON air.batch_id = b.id
WHERE b.is_latest_batch = TRUE
  AND air.measurename LIKE '%PM2.5%'
ORDER BY air.statename, air.countyname, air.reportyear DESC;

Why Use Latest Batch Filtering?

Without latest batch filtering, you may encounter:

  • Duplicate records from historical processing runs

  • Inconsistent row counts across queries

  • Outdated data mixed with current data

With is_latest_batch = TRUE, you ensure:

  • Only the most recent version of each dataset

  • Consistent results across different query runs

  • Accurate row counts and data freshness

  • Optimal query performance
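One way to see the effect of batch filtering is to count rows per batch for a single view. This is an illustrative sketch using the weekly deaths view from the earlier examples; whether a given view actually exposes more than one batch at a time is an assumption to check, not a documented guarantee.

```sql
-- Row counts per batch for one data view. If more than one batch appears,
-- querying without the latest-batch filter would mix those rows together.
SELECT a.batch_id,
       db.is_latest_batch,
       db.processing_date,
       COUNT(*) AS row_count
FROM dwv_nchs_stats.weekly_deaths_count_state_causes_2014_2__3yf8_kanr a
JOIN dwv.datasets_batches db ON db.id = a.batch_id
GROUP BY a.batch_id, db.is_latest_batch, db.processing_date
ORDER BY db.processing_date DESC;
```

If only one row comes back, the view already exposes a single batch and the filter acts as a safeguard rather than a correction.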

Example Use Cases

  1. COVID-19 Impact Analysis: Analyze trends in COVID-19 deaths across different age groups and regions

  2. Tobacco Consumption Trends: Track changes in tobacco consumption patterns over time and across states

  3. Bacterial Surveillance: Monitor the prevalence of invasive bacterial infections across demographics

  4. Cardiovascular Disease Risk Assessment: Analyze risk factors and trends in cardiovascular diseases

  5. Immunization Coverage Evaluation: Assess vaccination rates and their impact on disease outbreaks


Docs Last Updated: October 27th, 2025
