CDC Open Data Product
About the Dataset
The CDC Open Data Product is a comprehensive data solution that transforms and delivers more than 1,200 high-quality, up-to-date public health datasets from the Centers for Disease Control and Prevention (CDC). It spans more than 500 GB of data (and growing) and over 27,000 attributes, giving researchers, analysts, and data scientists broad access to public health information.
🔗 Find the CDC Open Data Product on the Snowflake Marketplace.
Quick Access
Schemas: DWV* (Data Warehouse Views)
Total Datasets: 1,200+ views across 50+ categories
Update Frequency: Daily to annual, depending on CDC source
Overview
The CDC Open Data Product provides standardized access to diverse health datasets including:
NCHS Statistics (213 views) - National Center for Health Statistics data
NNDSS Data (295 views) - National Notifiable Diseases Surveillance System
CDC Cities (57 views) - City-level health indicator data
Motor Vehicles (45 views) - Vehicle safety and injury data
Vaccination Data (82 views) - Immunization coverage and safety
Environmental Health (25 views) - Environmental exposures and health impacts
Heart & Stroke Prevention (26 views) - Cardiovascular health data
Legislative Data (33 views) - Health policy and legislation tracking
Browse by Category
National Center for State, Tribal, Local, and Territorial Public Health Infrastructure and Workforce (1 view)
Total Categories: 47 | Total Datasets: 1,200+
Getting Started
Basic Query Examples
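-- List the first 10 available dataset views
SELECT table_schema, table_name,
ROW_NUMBER() OVER (ORDER BY table_schema, table_name) AS dataset_number
FROM information_schema.views
WHERE table_schema LIKE 'DWV%'
AND table_name NOT IN ('DATASETS', 'DATASETS_BATCHES') -- exclude the metadata views
ORDER BY dataset_number -- without an outer ORDER BY, LIMIT returns an arbitrary 10 rows
LIMIT 10;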
-- Search for COVID-related datasets
SELECT v.table_schema, v.table_name, d.dataset_name, d.description
FROM information_schema.views v
JOIN dwv.datasets d
ON v.table_name = UPPER(d.view_name) -- information_schema stores unquoted identifiers in uppercase
AND v.table_schema = UPPER(d.schema_name)
WHERE v.table_schema LIKE 'DWV%'
AND (UPPER(d.dataset_name) LIKE '%COVID%'
OR UPPER(d.description) LIKE '%COVID%')
ORDER BY v.table_schema, v.table_name;
-- Get sample data from a specific dataset
SELECT *
FROM dwv_nchs_stats.weekly_deaths_count_state_causes_2014_2__3yf8_kanr
LIMIT 100;
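The COVID search above hard-codes its keyword in two places. The following is a minimal sketch of turning it into a reusable, parameterized search. It runs locally with Python's standard `sqlite3` module against a toy `datasets` table whose rows are invented, so it does not use Snowflake. `search_datasets` is a hypothetical helper. Against Snowflake itself, you would bind the keyword through your client's parameter mechanism in the same way instead of editing the SQL text.

```python
import sqlite3

# In-memory stand-in for dwv.datasets with a few invented rows; only the
# columns used by the keyword search are modelled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasets (view_name TEXT, dataset_name TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO datasets VALUES (?, ?, ?)",
    [
        ("covid_deaths_example", "Provisional COVID-19 Deaths", "Weekly counts"),
        ("flu_vax_example", "Influenza Vaccination Coverage", "Mentions covid boosters"),
        ("air_quality_example", "Air Quality Measures", None),
    ],
)


def search_datasets(conn, keyword):
    """Case-insensitive substring search on name or description.

    The keyword is passed as a bound parameter rather than pasted into the SQL,
    mirroring the UPPER(...) LIKE '%COVID%' pattern of the query above.
    """
    pattern = "%" + keyword.upper() + "%"
    return conn.execute(
        "SELECT view_name, dataset_name FROM datasets "
        "WHERE UPPER(dataset_name) LIKE ? OR UPPER(description) LIKE ? "
        "ORDER BY view_name",
        (pattern, pattern),
    ).fetchall()


print(search_datasets(conn, "covid"))
```

A dataset matches if either its name or its description contains the keyword. That is why `flu_vax_example` is returned for "covid" even though its name does not mention COVID.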
AI-Powered Dataset Discovery
Use Snowflake Cortex AI to discover relevant datasets with natural language:
-- Use Cortex AI to find datasets about flu, influenza, or respiratory illness
WITH find_dataset_prompt AS (
SELECT
-- =====================================================================
-- STEP 1: UPDATE YOUR SEARCH PROMPT HERE
-- =====================================================================
-- Write a question that can be answered with YES or NO
-- The AI will evaluate each dataset against this question
-- TIP: Be specific but include synonyms for better matches
-- =====================================================================
'Is this dataset related to flu, influenza, or respiratory illness?' AS prompt
),
dataset_analysis AS (
SELECT
source_dataset_id,
dataset_name,
description,
schema_name || '.' || view_name AS view_location,
-- =====================================================================
-- CORTEX AI EVALUATION
-- =====================================================================
-- This sends each dataset to Snowflake's AI model for evaluation
-- The AI reads the dataset name and description to determine if it
-- matches your search criteria, understanding context beyond keywords
-- =====================================================================
SNOWFLAKE.CORTEX.COMPLETE(
'snowflake-arctic', -- AI model to use
find_dataset_prompt.prompt ||
' Dataset name: ' || dataset_name ||
' Description: ' || COALESCE(LEFT(description, 500), 'No description') ||
' Answer with just YES or NO.' -- Forces concise response
) AS matches_criteria
FROM dwv.datasets
CROSS JOIN find_dataset_prompt -- pair every dataset row with the single prompt row
)
SELECT
source_dataset_id,
dataset_name,
description,
view_location,
matches_criteria AS ai_response -- Optional: include to see actual AI response
FROM dataset_analysis
WHERE UPPER(TRIM(matches_criteria)) LIKE 'YES%' -- Matches "YES", "Yes", "Yes." and similar replies
ORDER BY dataset_name;
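Before spending model calls, it can help to check locally what text each dataset actually sends to the model. The sketch below mirrors how the query above builds each prompt: it concatenates the question, the name, and the description, truncates the description to 500 characters, and falls back to 'No description' when the description is NULL. `build_prompt` and `is_match` are hypothetical helpers, and neither calls Snowflake or any model. `is_match` accepts any reply that begins with YES (for example "Yes."), so trailing punctuation from the model does not cause a miss.

```python
from typing import Optional

# Hypothetical local helpers mirroring the prompt assembly and answer check
# of the Cortex query above. They do not call Snowflake or any AI model.


def build_prompt(question: str, dataset_name: str, description: Optional[str]) -> str:
    """Append a dataset's name and truncated description to the YES/NO question."""
    # Same fallback and 500-character cut as COALESCE(LEFT(description, 500), 'No description')
    desc = description[:500] if description is not None else "No description"
    return (
        question
        + " Dataset name: " + dataset_name
        + " Description: " + desc
        + " Answer with just YES or NO."
    )


def is_match(response: str) -> bool:
    """Treat a reply as a match if it starts with YES after trimming and uppercasing."""
    return response.strip().upper().startswith("YES")


print(build_prompt("Is this dataset related to flu?", "Example Flu Data", None))
```

The SQL version calls `SNOWFLAKE.CORTEX.COMPLETE` once for every row in `dwv.datasets` (1,200+ rows). Trying a new question on a `LIMIT`-ed subset first keeps the number of model calls small.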
Example Results:
source_dataset_id | dataset_name | description | view_location | ai_response
qvzb-qs6p | 1998-2023 Serotype Data for Invasive Pneumococcal... | CDC monitors invasive bacterial infections that cause bloodstream infections, sepsis, and meningitis in persons living in the community... | dwv_pub_health_surv.abcs_pneumococcal_serotype_data_1998_20__qvzb_qs6p | Yes
7mra-9cq9 | 2023 Respiratory Virus Response - NSSP Emergency... | 2023 Respiratory Viruses Response – National Syndromic Surveillance Program Emergency Department Visit Trajectories - COVID-19, Flu, RSV... | dwv_pub_health_surv.nssp_ed_visit_trajectories_by_state_202__7mra_9cq9 | Yes
Entity Relationship Diagram
The CDC Open Data Product follows a standardized data architecture where metadata tables track datasets and their processing batches, while actual health data is stored in categorized schemas:

Color Legend:
🔵 Blue (Metadata): Core system tables (datasets) - Dataset catalog and metadata
🟠 Orange (Processing): Batch management (datasets_batches) - Processing workflow and versioning
🔴 Red (Mortality): Death and mortality data - NCHS vital statistics
🟢 Green (Vaccination): Immunization and vaccine data - Coverage and safety monitoring
🟣 Purple (Surveillance): Disease surveillance - Infectious disease tracking
🟢 Teal (Environmental): Environmental health - Air quality and toxicology
Key Relationships:
Each dataset can have multiple datasets_batches over time.
Each data table record links to both its parent dataset and a specific batch_id.
Use datasets_batches.is_latest_batch = TRUE to get current data.
All data tables follow the same schema pattern, with id, dataset_id, batch_id, and created_at columns.
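To make these relationships concrete, here is a toy in-memory model built with Python's standard `sqlite3` module. The `datasets` and `datasets_batches` names and their key columns come from this page. The `example_data` table, its `value` column, and all rows are invented for illustration. SQLite has no BOOLEAN type, so 1 stands in for TRUE. The model shows one dataset with two batches, and data rows that reach their dataset through `batch_id`.

```python
import sqlite3

# Toy in-memory model of the relationships above. Table and column names come
# from this page; example_data and all row values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datasets (id TEXT PRIMARY KEY, dataset_name TEXT);
CREATE TABLE datasets_batches (
    id TEXT PRIMARY KEY,
    dataset_id TEXT REFERENCES datasets(id),
    is_latest_batch INTEGER            -- SQLite has no BOOLEAN type, so 1 means TRUE
);
CREATE TABLE example_data (            -- hypothetical data table with the common columns
    id TEXT PRIMARY KEY,
    dataset_id TEXT REFERENCES datasets(id),
    batch_id TEXT REFERENCES datasets_batches(id),
    created_at TEXT,
    value INTEGER
);
INSERT INTO datasets VALUES ('ds1', 'Example dataset');
INSERT INTO datasets_batches VALUES ('b1', 'ds1', 0), ('b2', 'ds1', 1);
-- The dataset was loaded twice, and batch b2 carries one revised value
INSERT INTO example_data VALUES
    ('r1', 'ds1', 'b1', '2025-06-01', 10),
    ('r2', 'ds1', 'b1', '2025-06-01', 20),
    ('r3', 'ds1', 'b2', '2025-07-01', 10),
    ('r4', 'ds1', 'b2', '2025-07-01', 25);
""")

# One dataset, many batches over time.
batch_counts = conn.execute(
    "SELECT d.dataset_name, COUNT(*) FROM datasets d "
    "JOIN datasets_batches b ON b.dataset_id = d.id GROUP BY d.dataset_name"
).fetchall()

# Data rows reach their dataset through batch_id; keep only the latest batch.
latest_rows = conn.execute(
    "SELECT x.id, x.value FROM datasets d "
    "JOIN datasets_batches b ON b.dataset_id = d.id "
    "JOIN example_data x ON x.batch_id = b.id "
    "WHERE b.is_latest_batch = 1 ORDER BY x.id"
).fetchall()

print(batch_counts, latest_rows)
```

The table holds four data rows, but only the two rows from batch `b2` survive the latest-batch filter. In Snowflake, the same filter is written `WHERE b.is_latest_batch = TRUE`.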
Data Categories
Core Health Statistics
DWV_NCHS_STATS (213 views) - National Center for Health Statistics: mortality, births, health trends
DWV_NNDSS_DATA (295 views) - National Notifiable Diseases Surveillance System
DWV_PUB_HEALTH_SURV (66 views) - Public Health Surveillance data
DWV_CDC_DATA_CAT (55 views) - CDC Data Catalog entries
Disease-Specific Data
DWV_VACC_DATA (82 views) - Vaccination coverage, adverse events, hesitancy
DWV_FLU_VACCINATIONS (12 views) - Influenza vaccination data
DWV_CHILD_VAX (15 views) - Childhood immunization coverage
DWV_FOOD_WATER_DISEASES (9 views) - Foodborne and waterborne illness tracking
Chronic Disease & Prevention
DWV_HEART_STROKE_PREV (26 views) - Cardiovascular disease prevention data
DWV_B_RISK_FACTORS (28 views) - Behavioral Risk Factor Surveillance System (BRFSS)
DWV_TOBACCO_USE (1 view) - Tobacco use surveillance and cessation
DWV_SMOKING_TOBACCO_USE (3 views) - Smoking and tobacco use patterns
Environmental & Occupational Health
DWV_ENV_HEALTH_TOX (25 views) - Environmental health and toxicology
DWV_MOTOR_VEHICLES (45 views) - Vehicle safety, crashes, and injuries
DWV_TBI_DATA (3 views) - Traumatic brain injury surveillance
Dataset Features
Comprehensive Coverage: Over 50 health topic areas from infectious diseases to health behaviors
Continuous Monitoring: New CDC datasets and updates are automatically incorporated
Rigorous Quality Assurance: Each dataset batch undergoes automated QA checks
Schema Evolution: Automatic adaptation to changes in source data structures
Standardized Format: Consistent column naming and data types across all datasets
Data Quality and Maintenance
Quality Assurance Process
Automated Checks: Each dataset batch is subject to automated QA checks before being made available
Data Validation: Checks for data integrity, including row count validation and data type consistency
Schema Evolution: System automatically adapts to changes in source data schemas
Freshness Tracking: last_refresh_timestamp in DWV.DATASETS indicates when each dataset was last refreshed
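As an illustration of using `last_refresh_timestamp`, the following plain-Python sketch flags datasets that have not been refreshed within a chosen window. It assumes the timestamps have already been fetched (for example with `SELECT dataset_name, last_refresh_timestamp FROM dwv.datasets`) into a dict of naive `datetime` values. `stale_datasets`, the dataset names, and the 30-day window are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_datasets(last_refresh: Dict[str, datetime], now: datetime,
                   max_age: timedelta) -> List[str]:
    """Return dataset names refreshed before now - max_age, oldest first."""
    cutoff = now - max_age
    stale = [(ts, name) for name, ts in last_refresh.items() if ts < cutoff]
    return [name for ts, name in sorted(stale)]


# Invented example values
refreshes = {
    "A": datetime(2025, 7, 1),
    "B": datetime(2025, 5, 1),
    "C": datetime(2025, 3, 15),
}
print(stale_datasets(refreshes, datetime(2025, 7, 15), timedelta(days=30)))  # ['C', 'B']
```

CDC sources update anywhere from daily to annually, so a single cutoff for every dataset is a simplification. In practice, the window would follow each dataset's own cadence.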
Update Schedule
Daily: NNDSS surveillance data
Weekly: NCHS mortality data, vaccination monitoring
Monthly: BRFSS, chronic disease indicators
Quarterly: Survey data, policy tracking
Annual: Comprehensive health surveys
Business Applications
The CDC Open Data Product supports a range of applications, including:
Public Health Research: Epidemiological studies and population health analysis
Healthcare Policy Development: Evidence-based policy creation and evaluation
Risk Assessment: Health risk assessment and management across populations
Resource Allocation: Data-driven healthcare resource planning
Disease Surveillance: Outbreak monitoring and prediction modeling
Population Health Management: Community health assessment and intervention planning
Data Dictionary
Common Columns
Most CDC datasets include these standard columns:
Column | Description | Data Type
ID | Unique record identifier | VARCHAR(36)
DATASET_ID | Links to dataset metadata | VARCHAR(36)
BATCH_ID | Processing batch identifier | VARCHAR(36)
CREATED_AT | Record creation timestamp | TIMESTAMP_NTZ
Metadata Access
-- Get detailed dataset information
SELECT
d.source_dataset_id,
d.dataset_name,
d.description,
d.category,
d.url as source_url,
d.last_refresh_timestamp
FROM dwv.datasets d
WHERE d.view_name = 'YOUR_VIEW_NAME';
Support & Documentation
Source Data Access
Access original CDC data sources and documentation:
-- Find datasets with original source URLs
SELECT
view_name,
dataset_name,
url as source_url
FROM dwv.datasets
WHERE url IS NOT NULL
ORDER BY dataset_name;
Working with Latest Data Batches
Important: CDC datasets are processed in batches over time. To avoid duplicate records and ensure you're working with the most recent data, always filter for the latest batch using datasets_batches.is_latest_batch = TRUE.
-- Get latest batch information for datasets
SELECT
d.dataset_name,
d.view_name,
b.id batch_id,
b.total_rows_processed row_count,
b.created_at as batch_created,
b.processing_date as batch_completed
FROM dwv.datasets d
JOIN dwv.datasets_batches b ON d.id = b.dataset_id
WHERE b.is_latest_batch = TRUE
ORDER BY b.processing_date DESC
LIMIT 10;
Joining to Actual Data Tables
Here are examples showing how to properly join from metadata tables to actual data tables using the latest batch filter:
Example 1: Latest Weekly Death Counts by State
-- Weekly death counts by state for trend analysis
SELECT
a.jurisdiction_of_occurrence,
a.mmwryear,
a.mmwrweek,
a.allcause,
a.naturalcause,
a.flag_diab
FROM dwv.datasets d
JOIN dwv.datasets_batches db ON d.id = db.dataset_id
JOIN dwv_nchs_stats.weekly_deaths_count_state_causes_2014_2__3yf8_kanr a
ON db.id = a.batch_id
WHERE a.mmwryear >= 2018
AND db.is_latest_batch = TRUE
ORDER BY a.jurisdiction_of_occurrence, a.mmwryear, a.mmwrweek;
Example 2: Latest Vaccination Coverage with Metadata
-- Get latest childhood vaccination data with full context
SELECT
d.dataset_name,
d.category,
d.url as source_url,
vax.geography state,
vax.year_season,
vax.vaccine,
vax.dimension_type,
vax.coverage_estimate
FROM dwv.datasets d
JOIN dwv.datasets_batches b ON d.id = b.dataset_id
JOIN dwv_child_vax.vaccination_coverage_young_children_0_3__fhky_rtsk vax
ON vax.batch_id = b.id
WHERE b.is_latest_batch = TRUE
ORDER BY vax.geography, vax.vaccine;
Example 3: Latest Environmental Health Data with Batch Tracking
-- Get latest air quality measures with processing information
SELECT
d.dataset_name,
air.statename,
air.countyname,
air.reportyear,
air.measuretype,
air.value,
air.unit,
b.total_rows_processed as total_records_in_batch,
b.processing_date as data_refresh_date
FROM dwv.datasets d
JOIN dwv.datasets_batches b ON d.id = b.dataset_id
JOIN dwv_env_health_tox.air_quality_measures_neht_network__cjae_szjv air
ON air.batch_id = b.id
WHERE b.is_latest_batch = TRUE
AND air.measurename LIKE '%PM2.5%'
ORDER BY air.statename, air.countyname, air.reportyear DESC;
Why Use Latest Batch Filtering?
Without latest batch filtering, you may encounter:
Duplicate records from historical processing runs
Inconsistent row counts across queries
Outdated data mixed with current data
With is_latest_batch = TRUE, you ensure:
Only the most recent version of each dataset
Consistent results across different query runs
Accurate row counts and data freshness
Smaller result sets, since rows from superseded batches are excluded
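The duplicate problem is easy to reproduce on a small scale. The plain-Python sketch below uses invented rows from one dataset that was loaded in two batches. The `state`, `week`, and `deaths` fields are hypothetical, not columns of a specific view. Without the latest-batch filter, every (state, week) key appears twice and totals double-count. With the filter, each key appears once and the revised figure wins.

```python
from collections import Counter

# Invented rows from one dataset loaded in two batches (field names are hypothetical).
batches = {"b1": False, "b2": True}  # batch_id -> is_latest_batch
rows = [
    {"batch_id": "b1", "state": "Ohio", "week": 1, "deaths": 100},
    {"batch_id": "b1", "state": "Ohio", "week": 2, "deaths": 110},
    {"batch_id": "b2", "state": "Ohio", "week": 1, "deaths": 100},
    {"batch_id": "b2", "state": "Ohio", "week": 2, "deaths": 115},  # revised figure
]


def latest_only(rows, batches):
    """Keep rows whose batch is flagged as the latest (is_latest_batch = TRUE)."""
    return [r for r in rows if batches[r["batch_id"]]]


def duplicate_keys(rows):
    """Return (state, week) keys that appear more than once, with their counts."""
    counts = Counter((r["state"], r["week"]) for r in rows)
    return {key: n for key, n in counts.items() if n > 1}


unfiltered_total = sum(r["deaths"] for r in rows)
filtered_total = sum(r["deaths"] for r in latest_only(rows, batches))
print(unfiltered_total, filtered_total)  # 425 215
```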
Example Use Cases
COVID-19 Impact Analysis: Analyze trends in COVID-19 deaths across different age groups and regions
Tobacco Consumption Trends: Track changes in tobacco consumption patterns over time and across states
Bacterial Surveillance: Monitor the prevalence of invasive bacterial infections across demographics
Cardiovascular Disease Risk Assessment: Analyze risk factors and trends in cardiovascular diseases
Immunization Coverage Evaluation: Assess vaccination rates and their impact on disease outbreaks
Docs Last Updated: July 2025
Total Views: 1,200+
Data Source: Centers for Disease Control and Prevention (CDC)