CDC Open Data Product

About the Dataset

The CDC Open Data Product transforms and delivers more than 1,298 up-to-date public health datasets from the Centers for Disease Control and Prevention (CDC). It currently holds more than 558 GB of data across more than 31,140 attributes and continues to grow, giving researchers, analysts, and data scientists access to a broad range of public health information.

Quick Access

  • Schemas: DWV* (Data Warehouse Views)

  • Total Datasets: 1,298 views across 50+ categories

  • Update Frequency: Daily to annual, depending on CDC source

Overview

The CDC Open Data Product provides standardized access to diverse health datasets including:

  • NCHS Statistics (213 views) - National Center for Health Statistics data

  • NNDSS Data (295 views) - National Notifiable Diseases Surveillance System

  • CDC Cities (57 views) - City-level health indicator data

  • Motor Vehicles (45 views) - Vehicle safety and injury data

  • Vaccination Data (82 views) - Immunization coverage and safety

  • Environmental Health (25 views) - Environmental exposures and health impacts

  • Heart & Stroke Prevention (26 views) - Cardiovascular health data

  • Legislative Data (33 views) - Health policy and legislation tracking


Getting Started

Basic Query Examples

AI-Powered Dataset Discovery

Use Snowflake Cortex AI to discover relevant datasets using natural language:

Example Results:

| SOURCE_DATASET_ID | DATASET_NAME | DESCRIPTION_PREVIEW | FULL_TABLE_NAME | AVAILABLE |
| --- | --- | --- | --- | --- |
| qvzb-qs6p | 1998-2023 Serotype Data for Invasive Pneumococcal... | CDC monitors invasive bacterial infections that cause bloodstream infections, sepsis, and meningitis in persons living in the community... | dwv_pub_health_surv.abcs_pneumococcal_serotype_data_1998_20__qvzb_qs6p | Yes |
| 7mra-9cq9 | 2023 Respiratory Virus Response - NSSP Emergency... | 2023 Respiratory Viruses Response – National Syndromic Surveillance Program Emergency Department Visit Trajectories - COVID-19, Flu, RSV... | dwv_pub_health_surv.nssp_ed_visit_trajectories_by_state_202__7mra_9cq9 | Yes |
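In the two example rows, the view name in FULL_TABLE_NAME appears to end with the SOURCE_DATASET_ID, with the hyphen replaced by an underscore and a double underscore as the separator. The sketch below recovers the source ID from a view name under that assumption. The convention is inferred from these two rows only and is not documented here, and the helper name `source_id_from_view` is hypothetical.

```python
import re
from typing import Optional

# Assumed convention (inferred from the two example rows only):
#   <schema>.<truncated_name>__<xxxx>_<xxxx>
# where the trailing part is the SOURCE_DATASET_ID with "-" replaced by "_".
_SUFFIX = re.compile(r"__([a-z0-9]{4})_([a-z0-9]{4})$")


def source_id_from_view(full_table_name: str) -> Optional[str]:
    """Return the source dataset ID embedded in a view name, or None."""
    match = _SUFFIX.search(full_table_name.lower())
    if match is None:
        return None
    return f"{match.group(1)}-{match.group(2)}"


print(source_id_from_view(
    "dwv_pub_health_surv.abcs_pneumococcal_serotype_data_1998_20__qvzb_qs6p"
))
```

If a view name does not follow the assumed pattern, the helper returns None rather than guessing, so the catalog metadata remains the authoritative source for the ID.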

Entity Relationship Diagram

The CDC Open Data Product follows a standardized data architecture where metadata tables track datasets and their processing batches, while actual health data is stored in categorized schemas:

Color Legend:

  • 🔵 Blue (Metadata): Core system tables (datasets) - Dataset catalog and metadata

  • 🟠 Orange (Processing): Batch management (datasets_batches) - Processing workflow and versioning

  • 🔴 Red (Mortality): Death and mortality data - NCHS vital statistics

  • 🟢 Green (Vaccination): Immunization and vaccine data - Coverage and safety monitoring

  • 🟣 Purple (Surveillance): Disease surveillance - Infectious disease tracking

  • 🟢 Teal (Environmental): Environmental health - Air quality and toxicology

Key Relationships:

  • Each dataset can have multiple datasets_batches over time

  • Each data table record links to both its parent dataset and specific batch_id

  • Use datasets_batches.is_latest_batch = TRUE to get current data

  • All data tables follow the same schema pattern with id, dataset_id, batch_id, and created_at
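The relationships above can be sketched with an in-memory SQLite database standing in for Snowflake. This is a minimal illustration, not the product's actual DDL: the table `example_health_data`, the `name` and `deaths` columns, and the assumption that `datasets_batches.id` is the key referenced by `batch_id` are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datasets (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE datasets_batches (
    id TEXT PRIMARY KEY,          -- assumed key referenced by batch_id
    dataset_id TEXT REFERENCES datasets(id),
    is_latest_batch INTEGER       -- SQLite has no BOOLEAN; 1 stands in for TRUE
);
-- Hypothetical data table following the id / dataset_id / batch_id / created_at pattern
CREATE TABLE example_health_data (
    id TEXT PRIMARY KEY,
    dataset_id TEXT,
    batch_id TEXT,
    created_at TEXT,
    deaths INTEGER                -- hypothetical measure column
);
INSERT INTO datasets VALUES ('ds1', 'Example dataset');
INSERT INTO datasets_batches VALUES ('b1', 'ds1', 0), ('b2', 'ds1', 1);
INSERT INTO example_health_data VALUES
    ('r1', 'ds1', 'b1', '2025-01-01', 10),
    ('r2', 'ds1', 'b2', '2025-02-01', 12);
""")

# Each data row links to its parent dataset and to a specific batch;
# only rows from the batch flagged as latest are returned.
LATEST_ROWS = """
SELECT d.name, t.id, t.batch_id, t.deaths
FROM example_health_data AS t
JOIN datasets_batches AS b
  ON b.id = t.batch_id AND b.dataset_id = t.dataset_id
JOIN datasets AS d
  ON d.id = t.dataset_id
WHERE b.is_latest_batch = 1   -- in Snowflake: b.is_latest_batch = TRUE
"""
rows = conn.execute(LATEST_ROWS).fetchall()
print(rows)
```

Only the row from batch `b2` comes back, because `b1` is no longer flagged as the latest batch for the dataset.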

Data Categories

Core Health Statistics

| Schema | Description | View Count |
| --- | --- | --- |
| DWV_NCHS_STATS | National Center for Health Statistics - mortality, births, health trends | 235 |
| DWV_NNDSS_DATA | National Notifiable Diseases Surveillance System | 295 |
| DWV_PUB_HEALTH_SURV | Public Health Surveillance data | 73 |
| DWV_CDC_DATA_CAT | CDC Data Catalog entries | 60 |

Disease-Specific Data

| Schema | Description | View Count |
| --- | --- | --- |
| DWV_VACC_DATA | Vaccination coverage, adverse events, hesitancy | 87 |
| DWV_FLU_VACCINATIONS | Influenza vaccination data | 13 |
| DWV_CHILD_VAX | Childhood immunization coverage | 15 |
| DWV_FOOD_WATER_DISEASES | Foodborne and waterborne illness tracking | 9 |

Chronic Disease & Prevention

| Schema | Description | View Count |
| --- | --- | --- |
| DWV_HEART_STROKE_PREV | Cardiovascular disease prevention data | 28 |
| DWV_B_RISK_FACTORS | Behavioral Risk Factor Surveillance System (BRFSS) | 28 |
| DWV_TOBACCO_USE | Tobacco use surveillance and cessation | 1 |
| DWV_SMOKING_TOBACCO_USE | Smoking and tobacco use patterns | 3 |

Environmental & Occupational Health

| Schema | Description | View Count |
| --- | --- | --- |
| DWV_ENV_HEALTH_TOX | Environmental health and toxicology | 25 |
| DWV_MOTOR_VEHICLES | Vehicle safety, crashes, and injuries | 45 |
| DWV_TBI_DATA | Traumatic brain injury surveillance | 3 |

Additional Categories

| Schema | Description | View Count |
| --- | --- | --- |
| DWV_ADM_DATA | Hospital admission data | 8 |
| DWV_ART_CDC | Assisted Reproductive Technology data | 12 |
| DWV_CANCER_RESEARCH_CITA | Cancer research citations | 1 |
| DWV_CDC_CASE_SURV | Case surveillance data | 6 |
| DWV_CDC_CHRONIC_DISEASE | Chronic disease surveillance | 2 |
| DWV_CDC_CITIES | City-level health indicators | 64 |
| DWV_CDC_MODELS | Disease modeling data | 2 |
| DWV_CESSATION_COV | Smoking cessation coverage | 7 |
| DWV_CORONA_RESPIRATORY | Coronavirus respiratory data | 2 |
| DWV_DISABILITY_HEALTH | Disability health data | 3 |
| DWV_FUNDING_DATA | CDC funding information | 9 |
| DWV_GLOBAL_HEALTH_DATA | Global health indicators | 8 |
| DWV_GLOBAL_SURVEY_DATA | Global survey data | 4 |
| DWV_HEALTHY_AGING | Healthy aging data | 3 |
| DWV_HLTH_COSTS | Healthcare costs data | 2 |
| DWV_HLTH_PEOPLE2020 | Healthy People 2020 indicators | 1 |
| DWV_HLTH_STATS | Health statistics | 11 |
| DWV_LAB_SURVEILLANCE | Laboratory surveillance | 9 |
| DWV_LEGIS_DATA | Legislative data | 33 |
| DWV_MCH_HEALTH | Maternal and Child Health | 12 |
| DWV_MH_DATA | Mental health data | 4 |
| DWV_NCEH_DATA | Environmental health (NCEH) | 1 |
| DWV_NCIRD_IMMUNIZATION | Immunization data | 3 |
| DWV_NUTRI_PHYS_OBES | Nutrition and obesity data | 9 |
| DWV_ORAL_HEALTH | Oral health data | 11 |
| DWV_POLICY_DATA | Healthcare policy data | 5 |
| DWV_POL_SURV | Policy surveillance | 17 |
| DWV_PREG_VACC | Pregnancy vaccination data | 13 |
| DWV_PUB_HEALTH_INFRA | Public health infrastructure | 2 |
| DWV_QUITLINE_DATA | Quitline data | 8 |
| DWV_SURVEY_DATA | Survey data | 18 |
| DWV_TEEN_VAX | Teen vaccination data | 1 |
| DWV_UNSAFERESPONSEFILTER | Response filtering data | 4 |
| DWV_V_EYE_HEALTH | Vision and eye health | 17 |
| DWV_WEB_METRICS | Web metrics data | 6 |
| DWV_YRB_BEHAVIORS | Youth risk behaviors | 18 |
| DWV_NCEZ_ID | National Center for Emerging and Zoonotic Infectious Diseases (NCEZID) data | 6 |
| DWV_NCH_PREV | National Center for Health Statistics - prevalence of chronic diseases and risk factors | 1 |
| DWV_OTHER_DATA | Other health-related data not covered by specific categories within the CDC healthcare data warehouse | 35 |

Dataset Features

  • Comprehensive Coverage: Over 50 health topic areas from infectious diseases to health behaviors

  • Continuous Monitoring: New CDC datasets and updates are automatically incorporated

  • Rigorous Quality Assurance: Each dataset batch undergoes automated QA checks

  • Schema Evolution: Automatic adaptation to changes in source data structures

  • Standardized Format: Consistent column naming and data types across all datasets

Data Quality and Maintenance

Quality Assurance Process

  • Automated Checks: Each dataset batch is subject to automated QA checks before being made available

  • Data Validation: Checks for data integrity, including row count validation and data type consistency

  • Schema Evolution: System automatically adapts to changes in source data schemas

  • Freshness Tracking: The last_refresh_timestamp column in DW.DATASETS records when each dataset was most recently refreshed
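As a rough illustration of freshness tracking, the sketch below lists datasets whose last_refresh_timestamp is older than a chosen threshold. It uses in-memory SQLite as a stand-in for DW.DATASETS. The `name` column, the sample rows, the text-based timestamp storage, and the helper `stale_datasets` are assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timedelta


def stale_datasets(conn, as_of: datetime, max_age_days: int):
    """Return (name, last_refresh_timestamp) for datasets older than the cutoff."""
    cutoff = (as_of - timedelta(days=max_age_days)).isoformat(sep=" ")
    return conn.execute(
        "SELECT name, last_refresh_timestamp FROM datasets "
        "WHERE last_refresh_timestamp < ? "
        "ORDER BY last_refresh_timestamp",
        (cutoff,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
# Stand-in for DW.DATASETS; timestamps are stored as ISO-8601 text so that
# string comparison matches chronological order.
conn.execute(
    "CREATE TABLE datasets (id TEXT, name TEXT, last_refresh_timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO datasets VALUES (?, ?, ?)",
    [
        ("ds1", "Daily surveillance feed", "2026-01-26 06:00:00"),
        ("ds2", "Annual survey", "2025-03-01 00:00:00"),
    ],
)

stale = stale_datasets(conn, as_of=datetime(2026, 1, 27), max_age_days=30)
print(stale)
```

Because update frequency ranges from daily to annual, a single threshold will flag annual datasets that are in fact current. A per-category threshold is likely a better fit in practice.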

Update Schedule

  • Daily: NNDSS surveillance data

  • Weekly: NCHS mortality data, vaccination monitoring

  • Monthly: BRFSS, chronic disease indicators

  • Quarterly: Survey data, policy tracking

  • Annual: Comprehensive health surveys

Business Applications

The CDC Open Data Product supports a range of business applications:

  • Public Health Research: Epidemiological studies and population health analysis

  • Healthcare Policy Development: Evidence-based policy creation and evaluation

  • Risk Assessment: Health risk assessment and management across populations

  • Resource Allocation: Data-driven healthcare resource planning

  • Disease Surveillance: Outbreak monitoring and prediction modeling

  • Population Health Management: Community health assessment and intervention planning

Data Dictionary

Common Columns

Most CDC datasets include these standard columns:

| Column | Description | Type |
| --- | --- | --- |
| ID | Unique record identifier | VARCHAR(36) |
| DATASET_ID | Links to dataset metadata | VARCHAR(36) |
| BATCH_ID | Processing batch identifier | VARCHAR(36) |
| CREATED_AT | Record creation timestamp | TIMESTAMP_NTZ |

Metadata Access

Support & Documentation

Source Data Access

Access original CDC data sources and documentation:

Working with Latest Data Batches

Important: CDC datasets are processed in batches over time. To avoid duplicate records and ensure you're working with the most recent data, always filter for the latest batch using datasets_batches.is_latest_batch = TRUE.

Joining to Actual Data Tables

Here are examples showing how to properly join from metadata tables to actual data tables using the latest batch filter:

Example 1: Latest COVID-19 Mortality Data

Example 2: Latest Vaccination Coverage with Metadata

Example 3: Latest Environmental Health Data with Batch Tracking

Why Use Latest Batch Filtering?

Without latest batch filtering, you may encounter:

  • Duplicate records from historical processing runs

  • Inconsistent row counts across queries

  • Outdated data mixed with current data

With is_latest_batch = TRUE, you ensure:

  • Only the most recent version of each dataset

  • Consistent results across different query runs

  • Accurate row counts and data freshness

  • Optimal query performance
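The effect of the filter on row counts can be seen in a small sketch, again using in-memory SQLite in place of Snowflake. The table `example_view`, its `state` column, and the join on `datasets_batches.id` are hypothetical. The same two source records are loaded once per batch to mimic repeated processing runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datasets_batches (id TEXT, dataset_id TEXT, is_latest_batch INTEGER);
CREATE TABLE example_view (id TEXT, dataset_id TEXT, batch_id TEXT, state TEXT);
INSERT INTO datasets_batches VALUES ('b1', 'ds1', 0), ('b2', 'ds1', 1);
-- The same two source records, loaded once per batch
INSERT INTO example_view VALUES
    ('r1', 'ds1', 'b1', 'CA'), ('r2', 'ds1', 'b1', 'NY'),
    ('r3', 'ds1', 'b2', 'CA'), ('r4', 'ds1', 'b2', 'NY');
""")

# Without the filter, every historical batch is counted.
unfiltered = conn.execute("SELECT COUNT(*) FROM example_view").fetchone()[0]

# With the filter, only rows from the latest batch remain.
latest_only = conn.execute("""
    SELECT COUNT(*)
    FROM example_view AS t
    JOIN datasets_batches AS b ON b.id = t.batch_id
    WHERE b.is_latest_batch = 1   -- in Snowflake: b.is_latest_batch = TRUE
""").fetchone()[0]

print(unfiltered, latest_only)  # 4 2
```

The unfiltered count doubles with the second batch, which is the duplicate-record effect described above.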

Example Use Cases

  1. COVID-19 Impact Analysis: Analyze trends in COVID-19 deaths across different age groups and regions

  2. Tobacco Consumption Trends: Track changes in tobacco consumption patterns over time and across states

  3. Bacterial Surveillance: Monitor the prevalence of invasive bacterial infections across demographics

  4. Cardiovascular Disease Risk Assessment: Analyze risk factors and trends in cardiovascular diseases

  5. Immunization Coverage Evaluation: Assess vaccination rates and their impact on disease outbreaks


Get Started


  • Includes: All 1,294+ datasets, daily to annual updates, full documentation

  • Support: Email support included

  • Cancellation: Cancel anytime, no long-term commitment



Docs Last Updated: 1/27/2026
