CDC Open Data Product

About the Dataset

The CDC Open Data Product transforms and delivers more than 1,200 regularly updated public health datasets from the Centers for Disease Control and Prevention (CDC). It currently holds more than 500 GB of data and over 27,000 attributes, and it continues to grow, giving researchers, analysts, and data scientists access to a broad range of public health information.

🔗 Find the CDC Open Data Product on the Snowflake Marketplace.

Quick Access

  • Schemas: DWV* (Data Warehouse Views)

  • Total Datasets: 1,200+ views across 50+ categories

  • Update Frequency: Daily to annual, depending on CDC source

Overview

The CDC Open Data Product provides standardized access to diverse health datasets including:

  • NCHS Statistics (213 views) - National Center for Health Statistics data

  • NNDSS Data (295 views) - National Notifiable Diseases Surveillance System

  • CDC Cities (57 views) - City-level health indicator data

  • Motor Vehicles (45 views) - Vehicle safety and injury data

  • Vaccination Data (82 views) - Immunization coverage and safety

  • Environmental Health (25 views) - Environmental exposures and health impacts

  • Heart & Stroke Prevention (26 views) - Cardiovascular health data

  • Legislative Data (33 views) - Health policy and legislation tracking

Browse by Category

| Category | Datasets |
| --- | --- |
| Administrative | 4 |
| 500 Cities & Places | 57 |
| Assisted Reproductive Technology (ART) | 11 |
| Behavioral Risk Factors | 28 |
| Cancer Research Citation Search | 1 |
| Case Surveillance | 71 |
| Cessation Coverage | 6 |
| Child Vaccinations | 15 |
| Chronic Disease Indicators | 44 |
| Coronavirus and Other Respiratory Viruses | 18 |
| Disability and Health | 11 |
| Environmental Health and Toxicology | 25 |
| Flu Vaccinations | 12 |
| Foodborne, Waterborne, and Related Diseases | 9 |
| Global Health | 2 |
| Global Survey Data | 1 |
| Health Consequences and Costs | 10 |
| Health Statistics | 18 |
| Healthy Aging | 1 |
| Healthy People 2020 | 1 |
| Heart Disease and Stroke Prevention | 26 |
| Injury and Violence | 25 |
| Laboratory Surveillance | 8 |
| Legislation | 33 |
| Maternal and Child Health | 21 |
| Mental Health | 11 |
| Motor Vehicle | 45 |
| National Center for Environmental Health | 3 |
| National Center for Health Statistics | 213 |
| National Center for Immunization and Respiratory Diseases | 1 |
| National Center for State, Tribal, Local, and Territorial Public Health Infrastructure and Workforce | 1 |
| Nutrition, Physical Activity, and Obesity | 16 |
| Oral Health | 4 |
| Policy Surveillance | 5 |
| Pregnancy and Vaccination | 1 |
| Public Health Surveillance | 66 |
| Smoking and Tobacco Use | 3 |
| Survey Data | 30 |
| Survey Questions Tobacco Use | 1 |
| Teen Vaccinations | 4 |
| Traumatic Brain Injury | 3 |
| Vaccinations | 82 |
| Vision and Eye Health | 4 |
| Web Metrics | 3 |
| Youth Risk Behaviors | 28 |
| Uncategorized | 55 |

Total Categories: 47 | Total Datasets: 1,200+

Getting Started

Basic Query Examples

-- List available datasets (first 10, in schema and view order)
SELECT table_schema, table_name,
       ROW_NUMBER() OVER (ORDER BY table_schema, table_name) AS dataset_number
FROM information_schema.views
WHERE table_schema LIKE 'DWV%'
  AND table_name NOT IN ('DATASETS', 'DATASETS_BATCHES')
ORDER BY table_schema, table_name
LIMIT 10;

-- Search for COVID-related datasets
-- information_schema stores unquoted identifiers in upper case,
-- so view_name is normalized with UPPER() before comparing
SELECT v.table_schema, v.table_name, d.dataset_name, d.description
FROM information_schema.views v
JOIN dwv.datasets d ON v.table_name = UPPER(d.view_name)
WHERE v.table_schema LIKE 'DWV%'
  AND (UPPER(d.dataset_name) LIKE '%COVID%'
       OR UPPER(d.description) LIKE '%COVID%')
ORDER BY v.table_schema, v.table_name;

-- Get sample data from a specific dataset
SELECT *
FROM dwv_nchs_stats.weekly_deaths_count_state_causes_2014_2__3yf8_kanr
LIMIT 100;
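The view names used in this document end in a double underscore followed by what looks like the CDC source dataset ID, with the ID's hyphen written as an underscore (for example `..._2014_2__3yf8_kanr`). Assuming that convention holds for every view, which this page does not state outright, a small helper can recover the source ID from a view name. The function name `source_id_from_view` is hypothetical.

```python
from typing import Optional


def source_id_from_view(view_name: str) -> Optional[str]:
    """Return the source dataset ID embedded in a view name, or None.

    Assumes the naming pattern '<truncated_slug>__<id with "-" as "_">',
    inferred from the examples in this document.
    """
    # Drop an optional schema prefix such as "dwv_nchs_stats."
    name = view_name.rsplit(".", 1)[-1]
    # Split on the LAST double underscore; the slug itself may contain "_"
    _slug, sep, suffix = name.rpartition("__")
    if not sep or not suffix:
        return None  # no "__<id>" suffix, e.g. the "datasets" metadata view
    return suffix.replace("_", "-").lower()


print(source_id_from_view(
    "dwv_nchs_stats.weekly_deaths_count_state_causes_2014_2__3yf8_kanr"
))
```

The authoritative mapping is still the `source_dataset_id` column in `dwv.datasets`; treat the helper as a convenience for eyeballing view names.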

AI-Powered Dataset Discovery

Use Snowflake's CORTEX AI to discover relevant datasets using natural language:

-- Use CORTEX AI to find datasets about flu, influenza, or respiratory illness
WITH find_dataset_prompt AS (
    SELECT
        -- =====================================================================
        -- STEP 1: UPDATE YOUR SEARCH PROMPT HERE
        -- =====================================================================
        -- Write a question that can be answered with YES or NO
        -- The AI will evaluate each dataset against this question
        -- TIP: Be specific but include synonyms for better matches
        -- =====================================================================
        'Is this dataset related to flu, influenza, or respiratory illness?' AS prompt
),

dataset_analysis AS (
    SELECT
        source_dataset_id,
        dataset_name,
        description,
        schema_name || '.' || view_name AS view_location,

        -- =====================================================================
        -- CORTEX AI EVALUATION
        -- =====================================================================
        -- This sends each dataset to Snowflake's AI model for evaluation
        -- The AI reads the dataset name and description to determine if it
        -- matches your search criteria, understanding context beyond keywords
        -- =====================================================================
        SNOWFLAKE.CORTEX.COMPLETE(
            'snowflake-arctic',  -- AI model to use
            find_dataset_prompt.prompt ||
            ' Dataset name: ' || dataset_name ||
            ' Description: ' || COALESCE(LEFT(description, 500), 'No description') ||
            ' Answer with just YES or NO.'  -- Asks for a concise response
        ) AS matches_criteria
    FROM dwv.datasets
    CROSS JOIN find_dataset_prompt  -- Single-row CTE: every dataset gets the same prompt
)

SELECT
    source_dataset_id,
    dataset_name,
    description,
    view_location,
    matches_criteria AS ai_response  -- Optional: include to see actual AI response
FROM dataset_analysis
WHERE UPPER(TRIM(matches_criteria)) LIKE 'YES%'  -- Accepts "YES", "Yes.", and similar replies
ORDER BY dataset_name;

Example Results:

| SOURCE_DATASET_ID | DATASET_NAME | DESCRIPTION_PREVIEW | FULL_TABLE_NAME | AVAILABLE |
| --- | --- | --- | --- | --- |
| qvzb-qs6p | 1998-2023 Serotype Data for Invasive Pneumococcal... | CDC monitors invasive bacterial infections that cause bloodstream infections, sepsis, and meningitis in persons living in the community... | dwv_pub_health_surv.abcs_pneumococcal_serotype_data_1998_20__qvzb_qs6p | Yes |
| 7mra-9cq9 | 2023 Respiratory Virus Response - NSSP Emergency... | 2023 Respiratory Viruses Response – National Syndromic Surveillance Program Emergency Department Visit Trajectories - COVID-19, Flu, RSV... | dwv_pub_health_surv.nssp_ed_visit_trajectories_by_state_202__7mra_9cq9 | Yes |
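Language models do not always follow a "YES or NO" instruction to the letter; a reply may arrive as `Yes.`, in mixed case, or with extra words. If you evaluate the replies in application code rather than in the WHERE clause, a lenient parser along the following lines may help. This is a minimal sketch; the helper name `is_yes` is hypothetical and not part of the product.

```python
import re
from typing import Optional

# Leading punctuation/whitespace, then a whole-word "yes" or "no"
_ANSWER = re.compile(r"\W*(yes|no)\b", flags=re.IGNORECASE)


def is_yes(response: Optional[str]) -> bool:
    """Return True only when the reply clearly starts with the word 'yes'."""
    match = _ANSWER.match(response or "")
    return bool(match) and match.group(1).lower() == "yes"


# Replies that a strict equality check on 'YES' would miss
for reply in ["YES", " yes.", "**Yes**, it covers influenza.", "No", "Yesterday"]:
    print(repr(reply), "->", is_yes(reply))
```

The word-boundary check keeps replies such as "Yesterday" from counting as a match. Anything ambiguous is treated as a non-match, which favors precision over recall.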

Entity Relationship Diagram

The CDC Open Data Product follows a standardized data architecture where metadata tables track datasets and their processing batches, while actual health data is stored in categorized schemas:

Color Legend:

  • 🔵 Blue (Metadata): Core system tables (datasets) - Dataset catalog and metadata

  • 🟠 Orange (Processing): Batch management (datasets_batches) - Processing workflow and versioning

  • 🔴 Red (Mortality): Death and mortality data - NCHS vital statistics

  • 🟢 Green (Vaccination): Immunization and vaccine data - Coverage and safety monitoring

  • 🟣 Purple (Surveillance): Disease surveillance - Infectious disease tracking

  • 🟢 Teal (Environmental): Environmental health - Air quality and toxicology

Key Relationships:

  • Each dataset can have multiple datasets_batches over time

  • Each data table record links to both its parent dataset and specific batch_id

  • Use datasets_batches.is_latest_batch = TRUE to get current data

  • All data tables follow the same schema pattern with id, dataset_id, batch_id, and created_at

Data Categories

Core Health Statistics

| Schema | Description | View Count |
| --- | --- | --- |
| DWV_NCHS_STATS | National Center for Health Statistics - mortality, births, health trends | 213 |
| DWV_NNDSS_DATA | National Notifiable Diseases Surveillance System | 295 |
| DWV_PUB_HEALTH_SURV | Public Health Surveillance data | 66 |
| DWV_CDC_DATA_CAT | CDC Data Catalog entries | 55 |

Disease-Specific Data

| Schema | Description | View Count |
| --- | --- | --- |
| DWV_VACC_DATA | Vaccination coverage, adverse events, hesitancy | 82 |
| DWV_FLU_VACCINATIONS | Influenza vaccination data | 12 |
| DWV_CHILD_VAX | Childhood immunization coverage | 15 |
| DWV_FOOD_WATER_DISEASES | Foodborne and waterborne illness tracking | 9 |

Chronic Disease & Prevention

| Schema | Description | View Count |
| --- | --- | --- |
| DWV_HEART_STROKE_PREV | Cardiovascular disease prevention data | 26 |
| DWV_B_RISK_FACTORS | Behavioral Risk Factor Surveillance System (BRFSS) | 28 |
| DWV_TOBACCO_USE | Tobacco use surveillance and cessation | 1 |
| DWV_SMOKING_TOBACCO_USE | Smoking and tobacco use patterns | 3 |

Environmental & Occupational Health

| Schema | Description | View Count |
| --- | --- | --- |
| DWV_ENV_HEALTH_TOX | Environmental health and toxicology | 25 |
| DWV_MOTOR_VEHICLES | Vehicle safety, crashes, and injuries | 45 |
| DWV_TBI_DATA | Traumatic brain injury surveillance | 3 |

Complete category listing →

Dataset Features

  • Comprehensive Coverage: Over 50 health topic areas from infectious diseases to health behaviors

  • Continuous Monitoring: New CDC datasets and updates are automatically incorporated

  • Rigorous Quality Assurance: Each dataset batch undergoes automated QA checks

  • Schema Evolution: Automatic adaptation to changes in source data structures

  • Standardized Format: Consistent column naming and data types across all datasets

Data Quality and Maintenance

Quality Assurance Process

  • Automated Checks: Each dataset batch is subject to automated QA checks before being made available

  • Data Validation: Checks for data integrity, including row count validation and data type consistency

  • Schema Evolution: System automatically adapts to changes in source data schemas

  • Freshness Tracking: last_refresh_timestamp in DWV.DATASETS indicates the most recent update of each dataset

Update Schedule

  • Daily: NNDSS surveillance data

  • Weekly: NCHS mortality data, vaccination monitoring

  • Monthly: BRFSS, chronic disease indicators

  • Quarterly: Survey data, policy tracking

  • Annual: Comprehensive health surveys
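The update cadences above can be combined with `last_refresh_timestamp` to flag datasets that look overdue. The sketch below is illustrative only: the tolerance per cadence is an assumption chosen for the example, not a documented service level, and the function name `is_stale` is hypothetical.

```python
from datetime import datetime, timedelta

# Assumed tolerances: roughly one cadence period plus some slack.
# These numbers are NOT documented by the product; tune them to your needs.
MAX_AGE = {
    "daily": timedelta(days=2),
    "weekly": timedelta(days=10),
    "monthly": timedelta(days=45),
    "quarterly": timedelta(days=120),
    "annual": timedelta(days=400),
}


def is_stale(last_refresh: datetime, cadence: str, now: datetime) -> bool:
    """Return True when the last refresh is older than the cadence allows."""
    return now - last_refresh > MAX_AGE[cadence]


now = datetime(2025, 7, 15)
print(is_stale(datetime(2025, 7, 1), "daily", now))    # two weeks old, daily feed
print(is_stale(datetime(2025, 7, 1), "monthly", now))  # two weeks old, monthly feed
```

A stale timestamp does not necessarily indicate a pipeline problem; the CDC source itself may simply not have published an update.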

Business Applications

The CDC Open Data Product can be utilized in various business applications:

  • Public Health Research: Epidemiological studies and population health analysis

  • Healthcare Policy Development: Evidence-based policy creation and evaluation

  • Risk Assessment: Health risk assessment and management across populations

  • Resource Allocation: Data-driven healthcare resource planning

  • Disease Surveillance: Outbreak monitoring and prediction modeling

  • Population Health Management: Community health assessment and intervention planning

Data Dictionary

Common Columns

Most CDC datasets include these standard columns:

| Column | Description | Type |
| --- | --- | --- |
| ID | Unique record identifier | VARCHAR(36) |
| DATASET_ID | Links to dataset metadata | VARCHAR(36) |
| BATCH_ID | Processing batch identifier | VARCHAR(36) |
| CREATED_AT | Record creation timestamp | TIMESTAMP_NTZ |

Metadata Access

-- Get detailed dataset information
SELECT
    d.source_dataset_id,
    d.dataset_name,
    d.description,
    d.category,
    d.url AS source_url,
    d.last_refresh_timestamp
FROM dwv.datasets d  -- schema-qualified, as in the other examples
WHERE d.view_name = 'YOUR_VIEW_NAME';

Support & Documentation

Source Data Access

Access original CDC data sources and documentation:

-- Find datasets with original source URLs
SELECT 
    view_name,
    dataset_name,
    url as source_url
FROM dwv.datasets
WHERE url IS NOT NULL
ORDER BY dataset_name;

Working with Latest Data Batches

Important: CDC datasets are processed in batches over time. To avoid duplicate records and ensure you're working with the most recent data, always filter for the latest batch using datasets_batches.is_latest_batch = TRUE.

-- Get latest batch information for datasets
SELECT
    d.dataset_name,
    d.view_name,
    b.id AS batch_id,
    b.total_rows_processed AS row_count,
    b.created_at AS batch_created,
    b.processing_date AS batch_completed
FROM dwv.datasets d
JOIN dwv.datasets_batches b ON d.id = b.dataset_id
WHERE b.is_latest_batch = TRUE
ORDER BY b.processing_date DESC
LIMIT 10;

Joining to Actual Data Tables

Here are examples showing how to properly join from metadata tables to actual data tables using the latest batch filter:

Example 1: Latest Weekly Mortality Data

-- Weekly death counts by state for trend analysis
SELECT
    a.jurisdiction_of_occurrence,
    a.mmwryear,
    a.mmwrweek,
    a.allcause,
    a.naturalcause,
    a.flag_diab
FROM dwv.datasets d
JOIN dwv.datasets_batches db ON d.id = db.dataset_id
JOIN dwv_nchs_stats.weekly_deaths_count_state_causes_2014_2__3yf8_kanr a
    ON db.id = a.batch_id
WHERE a.mmwryear >= 2018
  AND db.is_latest_batch = TRUE
ORDER BY a.jurisdiction_of_occurrence, a.mmwryear, a.mmwrweek;

Example 2: Latest Vaccination Coverage with Metadata

-- Get latest childhood vaccination data with full context
SELECT 
    d.dataset_name,
    d.category,
    d.url as source_url,
    vax.geography AS state,
    vax.year_season,
    vax.vaccine,
    vax.dimension_type,
    vax.coverage_estimate
FROM dwv.datasets d
JOIN dwv.datasets_batches b ON d.id = b.dataset_id
JOIN dwv_child_vax.vaccination_coverage_young_children_0_3__fhky_rtsk vax 
    ON vax.batch_id = b.id
WHERE b.is_latest_batch = TRUE
ORDER BY vax.geography, vax.vaccine;

Example 3: Latest Environmental Health Data with Batch Tracking

-- Get latest air quality measures with processing information
SELECT 
    d.dataset_name,
    air.statename,
    air.countyname,
    air.reportyear,
    air.measuretype,
    air.value,
    air.unit,
    b.total_rows_processed as total_records_in_batch,
    b.processing_date as data_refresh_date
FROM dwv.datasets d
JOIN dwv.datasets_batches b ON d.id = b.dataset_id
JOIN dwv_env_health_tox.air_quality_measures_neht_network__cjae_szjv air 
    ON air.batch_id = b.id
WHERE b.is_latest_batch = TRUE
  AND air.measurename LIKE '%PM2.5%'
ORDER BY air.statename, air.countyname, air.reportyear DESC;

Why Use Latest Batch Filtering?

Without latest batch filtering, you may encounter:

  • Duplicate records from historical processing runs

  • Inconsistent row counts across queries

  • Outdated data mixed with current data

With is_latest_batch = TRUE, you ensure:

  • Only the most recent version of each dataset

  • Consistent results across different query runs

  • Accurate row counts and data freshness

  • Less data to scan, because rows from superseded batches are excluded
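The effect of the filter can be reproduced on a toy model. The sketch below uses an in-memory SQLite database as a stand-in for Snowflake, with simplified table and column names and made-up values, purely to show how joining without `is_latest_batch` double-counts records that were reprocessed in a later batch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datasets (id TEXT PRIMARY KEY, dataset_name TEXT);
CREATE TABLE datasets_batches (
    id TEXT PRIMARY KEY, dataset_id TEXT, is_latest_batch INTEGER);
CREATE TABLE weekly_deaths (
    id TEXT, dataset_id TEXT, batch_id TEXT, state TEXT, allcause INTEGER);

-- Toy data: one dataset processed twice; batch b2 supersedes b1
INSERT INTO datasets VALUES ('d1', 'Weekly deaths (toy example)');
INSERT INTO datasets_batches VALUES ('b1', 'd1', 0), ('b2', 'd1', 1);
INSERT INTO weekly_deaths VALUES
    ('r1', 'd1', 'b1', 'Alabama', 1000),
    ('r2', 'd1', 'b1', 'Alaska',    80),
    ('r3', 'd1', 'b2', 'Alabama', 1010),
    ('r4', 'd1', 'b2', 'Alaska',    80);
""")

JOIN_SQL = """
SELECT COUNT(*), SUM(a.allcause)
FROM datasets d
JOIN datasets_batches b ON d.id = b.dataset_id
JOIN weekly_deaths a ON a.batch_id = b.id
"""

all_batches = conn.execute(JOIN_SQL).fetchone()
latest_only = conn.execute(JOIN_SQL + " WHERE b.is_latest_batch = 1").fetchone()

print("all batches:", all_batches)  # (4, 2170): both processing runs counted
print("latest only:", latest_only)  # (2, 1090): current version only
```

Without the filter, each state appears once per batch and the total is roughly doubled; with it, only the rows from batch `b2` remain.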

Example Use Cases

  1. COVID-19 Impact Analysis: Analyze trends in COVID-19 deaths across different age groups and regions

  2. Tobacco Consumption Trends: Track changes in tobacco consumption patterns over time and across states

  3. Bacterial Surveillance: Monitor the prevalence of invasive bacterial infections across demographics

  4. Cardiovascular Disease Risk Assessment: Analyze risk factors and trends in cardiovascular diseases

  5. Immunization Coverage Evaluation: Assess vaccination rates and their impact on disease outbreaks


Docs Last Updated: July 2025 | Total Views: 1,200+ | Data Source: Centers for Disease Control and Prevention (CDC)
