CDC Open Data Product
About the Dataset
The CDC Open Data Product transforms and delivers 1,197 up-to-date public health datasets from the Centers for Disease Control and Prevention (CDC) in a standardized, queryable form. The product contains more than 511 GB of data (and growing) across more than 28,580 attributes, giving researchers, analysts, and data scientists broad access to public health information.
🔗 Find the CDC Open Data Product on the Snowflake Marketplace.
Quick Access
Schemas: DWV* (Data Warehouse Views)
Total Datasets: 1,197 views across 50+ categories
Update Frequency: Daily to annual, depending on CDC source
Overview
The CDC Open Data Product provides standardized access to diverse health datasets including:
- NCHS Statistics (214 views) - National Center for Health Statistics data 
- NNDSS Data (295 views) - National Notifiable Diseases Surveillance System 
- CDC Cities (57 views) - City-level health indicator data 
- Motor Vehicles (45 views) - Vehicle safety and injury data 
- Vaccination Data (82 views) - Immunization coverage and safety 
- Environmental Health (25 views) - Environmental exposures and health impacts 
- Heart & Stroke Prevention (26 views) - Cardiovascular health data 
- Legislative Data (33 views) - Health policy and legislation tracking 
Getting Started
Basic Query Examples
-- List available dataset views (first 10 by schema and name)
SELECT table_schema, table_name,
       ROW_NUMBER() OVER (ORDER BY table_schema, table_name) AS dataset_number
FROM information_schema.views
WHERE table_schema LIKE 'DWV%'
  AND table_name NOT IN ('DATASETS', 'DATASETS_BATCHES')
ORDER BY dataset_number
LIMIT 10;

-- Search for COVID-related datasets
SELECT v.table_schema, v.table_name, d.dataset_name, d.description
FROM information_schema.views v
JOIN dwv.datasets d
  -- INFORMATION_SCHEMA stores unquoted identifiers in uppercase, so compare case-insensitively
  ON UPPER(v.table_name) = UPPER(d.view_name)
 AND UPPER(v.table_schema) = UPPER(d.schema_name)
WHERE v.table_schema LIKE 'DWV%'
  AND (UPPER(d.dataset_name) LIKE '%COVID%'
       OR UPPER(d.description) LIKE '%COVID%')
ORDER BY v.table_schema, v.table_name;

-- Get sample data from a specific dataset
-- (rows may span several processing batches; see Working with Latest Data Batches)
SELECT *
FROM dwv_nchs_stats.weekly_deaths_count_state_causes_2014_2__3yf8_kanr
LIMIT 100;
AI-Powered Dataset Discovery
Use Snowflake Cortex AI to discover relevant datasets with natural-language questions:
-- Use Cortex AI to find datasets about flu and other respiratory illnesses
-- Note: COMPLETE runs once per row of dwv.datasets and is billed by tokens processed
WITH find_dataset_prompt AS (
    SELECT
        -- =====================================================================
        -- UPDATE YOUR SEARCH PROMPT HERE
        -- =====================================================================
        -- Write a question that can be answered with YES or NO.
        -- The AI evaluates each dataset against this question.
        -- TIP: Be specific, but include synonyms for better matches.
        -- =====================================================================
        'Is this dataset related to flu, influenza, or respiratory illness?' AS prompt
),
dataset_analysis AS (
    SELECT
        source_dataset_id,
        dataset_name,
        description,
        schema_name || '.' || view_name AS view_location,
        -- =====================================================================
        -- CORTEX AI EVALUATION
        -- =====================================================================
        -- Sends each dataset's name and (truncated) description to the model,
        -- which judges whether it matches the search question using context
        -- rather than exact keywords.
        -- =====================================================================
        SNOWFLAKE.CORTEX.COMPLETE(
            'snowflake-arctic',  -- AI model to use
            find_dataset_prompt.prompt ||
            ' Dataset name: ' || dataset_name ||
            ' Description: ' || COALESCE(LEFT(description, 500), 'No description') ||
            ' Answer with just YES or NO.'  -- Forces a concise response
        ) AS matches_criteria
    FROM dwv.datasets
    CROSS JOIN find_dataset_prompt  -- one-row CTE: attaches the prompt to every dataset
)
SELECT
    source_dataset_id,
    dataset_name,
    description,
    view_location,
    matches_criteria AS ai_response  -- Optional: shows the raw AI response
FROM dataset_analysis
-- STARTSWITH also accepts replies such as "Yes." or "YES, because..."
WHERE STARTSWITH(UPPER(TRIM(matches_criteria)), 'YES')
ORDER BY dataset_name;
Example Results:

| source_dataset_id | dataset_name | description | view_location | ai_response |
|---|---|---|---|---|
| qvzb-qs6p | 1998-2023 Serotype Data for Invasive Pneumococcal... | CDC monitors invasive bacterial infections that cause bloodstream infections, sepsis, and meningitis in persons living in the community... | dwv_pub_health_surv.abcs_pneumococcal_serotype_data_1998_20__qvzb_qs6p | Yes |
| 7mra-9cq9 | 2023 Respiratory Virus Response - NSSP Emergency... | 2023 Respiratory Viruses Response – National Syndromic Surveillance Program Emergency Department Visit Trajectories - COVID-19, Flu, RSV... | dwv_pub_health_surv.nssp_ed_visit_trajectories_by_state_202__7mra_9cq9 | Yes |
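
Apart from the model call, the discovery query is plain string handling: build one prompt per dataset, then keep the rows whose reply reads as YES. The sketch below reproduces that logic in Python so it can be tested offline. `fake_complete` is a hypothetical keyword-matching stub standing in for `SNOWFLAKE.CORTEX.COMPLETE` (it is not the real model), and the helper names are illustrative. `is_yes` accepts any reply that begins with YES after trimming and upper-casing, which is slightly more lenient than an exact `'YES'` comparison.

```python
from typing import Callable, Dict, List, Optional

Dataset = Dict[str, Optional[str]]
SUFFIX = " Answer with just YES or NO."


def build_prompt(question: str, dataset_name: str, description: Optional[str]) -> str:
    """Mirror the SQL concatenation, including COALESCE(LEFT(description, 500), 'No description')."""
    desc = description[:500] if description is not None else "No description"
    return question + " Dataset name: " + dataset_name + " Description: " + desc + SUFFIX


def is_yes(response: str) -> bool:
    """Trim, upper-case, and accept any reply that begins with YES (e.g. 'Yes.')."""
    return response.strip().upper().startswith("YES")


def find_matches(question: str, datasets: List[Dataset],
                 complete: Callable[[str], str]) -> List[Dataset]:
    """Ask `complete` about each dataset and keep the YES answers, ordered by name."""
    hits = [
        d for d in datasets
        if is_yes(complete(build_prompt(question, d["dataset_name"], d.get("description"))))
    ]
    return sorted(hits, key=lambda d: d["dataset_name"])


def fake_complete(prompt: str) -> str:
    """Hypothetical offline stub for SNOWFLAKE.CORTEX.COMPLETE: keyword match, no model."""
    body = prompt.split(" Dataset name: ", 1)[1].lower()  # ignore the question itself
    return "Yes." if any(k in body for k in ("flu", "respiratory", "rsv")) else "NO"


if __name__ == "__main__":
    question = "Is this dataset related to flu, influenza, or respiratory illness?"
    datasets = [
        {"dataset_name": "Hypothetical tobacco survey", "description": None},
        {"dataset_name": "2023 Respiratory Virus Response - NSSP Emergency...",
         "description": "Emergency Department Visit Trajectories - COVID-19, Flu, RSV..."},
    ]
    for d in find_matches(question, datasets, fake_complete):
        print(d["dataset_name"])  # prints only the respiratory dataset
```

Keeping the prompt builder separate from the model call makes it easy to check truncation and the "No description" fallback before spending Cortex credits.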
Entity Relationship Diagram
The CDC Open Data Product follows a standardized data architecture where metadata tables track datasets and their processing batches, while actual health data is stored in categorized schemas:

Color Legend:
- 🔵 Blue (Metadata): Core system tables (datasets) for the dataset catalog and metadata
- 🟠 Orange (Processing): Batch management (datasets_batches) for processing workflow and versioning
- 🔴 Red (Mortality): Death and mortality data - NCHS vital statistics 
- 🟢 Green (Vaccination): Immunization and vaccine data - Coverage and safety monitoring 
- 🟣 Purple (Surveillance): Disease surveillance - Infectious disease tracking 
- 🟢 Teal (Environmental): Environmental health - Air quality and toxicology 
Key Relationships:
- Each dataset can have multiple datasets_batches over time
- Each data table record links to both its parent dataset and a specific batch_id
- Use datasets_batches.is_latest_batch = TRUE to get current data
- All data tables follow the same schema pattern, with id, dataset_id, batch_id, and created_at
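
To make these relationships concrete, here is a minimal sketch that rebuilds them in an in-memory SQLite database with Python's built-in `sqlite3` module. The tables carry only columns named in this document; `demo_view` and all rows are hypothetical, and SQLite stores the Snowflake BOOLEAN as 1/0. Running the same join with and without the latest-batch filter shows that, without it, rows from every batch come back.

```python
import sqlite3

# Hypothetical miniature of the product layout: one dataset, two batches, one data view.
SCHEMA_AND_DATA = """
CREATE TABLE datasets (
    id TEXT PRIMARY KEY,
    dataset_name TEXT,
    view_name TEXT
);
CREATE TABLE datasets_batches (
    id TEXT PRIMARY KEY,
    dataset_id TEXT REFERENCES datasets(id),  -- one dataset, many batches
    is_latest_batch INTEGER                   -- SQLite has no BOOLEAN: 1 = TRUE
);
CREATE TABLE demo_view (                      -- stands in for any DWV_* data view
    id TEXT PRIMARY KEY,
    dataset_id TEXT,
    batch_id TEXT REFERENCES datasets_batches(id),
    created_at TEXT,
    value INTEGER
);
INSERT INTO datasets VALUES ('d1', 'Demo dataset', 'demo_view');
INSERT INTO datasets_batches VALUES ('b1', 'd1', 0), ('b2', 'd1', 1);
INSERT INTO demo_view VALUES
    ('r1', 'd1', 'b1', '2025-01-01', 10),
    ('r2', 'd1', 'b1', '2025-01-01', 20),
    ('r3', 'd1', 'b2', '2025-02-01', 11),
    ('r4', 'd1', 'b2', '2025-02-01', 21);
"""

JOIN_SQL = """
SELECT a.id, a.value
FROM datasets d
JOIN datasets_batches b ON d.id = b.dataset_id
JOIN demo_view a ON a.batch_id = b.id
{where}
ORDER BY a.id
"""


def query_rows(con: sqlite3.Connection, latest_only: bool):
    """Run the metadata-to-data join, optionally keeping only the latest batch."""
    where = "WHERE b.is_latest_batch = 1" if latest_only else ""
    return con.execute(JOIN_SQL.format(where=where)).fetchall()


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA_AND_DATA)
    print(len(query_rows(con, latest_only=False)))  # 4: rows from both batches
    print(query_rows(con, latest_only=True))        # [('r3', 11), ('r4', 21)]
```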
Data Categories
Core Health Statistics

| Schema | Description | Views |
|---|---|---|
| DWV_NCHS_STATS | National Center for Health Statistics - mortality, births, health trends | 214 |
| DWV_NNDSS_DATA | National Notifiable Diseases Surveillance System | 295 |
| DWV_PUB_HEALTH_SURV | Public Health Surveillance data | 66 |
| DWV_CDC_DATA_CAT | CDC Data Catalog entries | 56 |

Disease-Specific Data

| Schema | Description | Views |
|---|---|---|
| DWV_VACC_DATA | Vaccination coverage, adverse events, hesitancy | 82 |
| DWV_FLU_VACCINATIONS | Influenza vaccination data | 12 |
| DWV_CHILD_VAX | Childhood immunization coverage | 15 |
| DWV_FOOD_WATER_DISEASES | Foodborne and waterborne illness tracking | 9 |

Chronic Disease & Prevention

| Schema | Description | Views |
|---|---|---|
| DWV_HEART_STROKE_PREV | Cardiovascular disease prevention data | 26 |
| DWV_B_RISK_FACTORS | Behavioral Risk Factor Surveillance System (BRFSS) | 28 |
| DWV_TOBACCO_USE | Tobacco use surveillance and cessation | 1 |
| DWV_SMOKING_TOBACCO_USE | Smoking and tobacco use patterns | 3 |

Environmental & Occupational Health

| Schema | Description | Views |
|---|---|---|
| DWV_ENV_HEALTH_TOX | Environmental health and toxicology | 25 |
| DWV_MOTOR_VEHICLES | Vehicle safety, crashes, and injuries | 45 |
| DWV_TBI_DATA | Traumatic brain injury surveillance | 3 |

Additional Schemas

| Schema | Description | Views |
|---|---|---|
| DWV_ADM_DATA | Hospital admission data, including patient demographics, diagnoses, procedures, and outcomes | 8 |
| DWV_ART_CDC | Assisted Reproductive Technology (ART) data from the CDC | 12 |
| DWV_CANCER_RESEARCH_CITA | Cancer research data focusing on clinical trials and treatment outcomes | 1 |
| DWV_CDC_CASE_SURV | Case surveillance data for tracking and analyzing cases of diseases and health conditions | 6 |
| DWV_CDC_CHRONIC_DISEASE | Chronic disease surveillance and epidemiology data for monitoring trends in long-term health conditions | 2 |
| DWV_CDC_CITIES | Health indicators and statistics for urban populations | 57 |
| DWV_CDC_MODELS | Disease surveillance, outbreak modeling, and epidemiological analysis | 2 |
| DWV_CESSATION_COV | Smoking cessation coverage data for evaluating tobacco cessation programs and interventions | 7 |
| DWV_CORONA_RESPIRATORY | Coronavirus respiratory data, including testing results, symptoms, and outcomes | 1 |
| DWV_DISABILITY_HEALTH | Health information related to disabilities | 3 |
| DWV_FUNDING_DATA | CDC funding sources and allocations for health initiatives and programs | 9 |
| DWV_GLOBAL_HEALTH_DATA | Global health indicators, diseases, and trends | 8 |
| DWV_GLOBAL_SURVEY_DATA | Global survey data on health topics collected and analyzed by the CDC | 4 |
| DWV_HEALTHY_AGING | Health indicators for aging populations, including chronic conditions, preventive care, and quality-of-life measures | 3 |
| DWV_HLTH_COSTS | Health-related expenditures and other financial aspects of the healthcare system | 2 |
| DWV_HLTH_PEOPLE2020 | Healthy People 2020 indicators, including demographics, health behaviors, and outcomes | 1 |
| DWV_HLTH_STATS | Health statistics covering a wide range of healthcare measures and trends | 11 |
| DWV_LAB_SURVEILLANCE | Laboratory surveillance data for monitoring health-related lab tests and results | 8 |
| DWV_LEGIS_DATA | Legislative data on healthcare policies and regulations | 33 |
| DWV_MCH_HEALTH | Maternal and child health data, including vital statistics, pregnancy outcomes, and child health indicators | 12 |
| DWV_MH_DATA | Mental health data, including diagnoses, treatments, and outcomes | 1 |
| DWV_NCEH_DATA | Environmental health data from the National Center for Environmental Health (NCEH), including toxicology and exposures | 1 |
| DWV_NCIRD_IMMUNIZATION | Immunization coverage, vaccine-preventable diseases, and vaccination schedules | 2 |
| DWV_NUTRI_PHYS_OBES | Nutrition, physical activity, and obesity data | 8 |
| DWV_ORAL_HEALTH | Oral health data, including dental care, oral hygiene, and related indicators | 11 |
| DWV_POLICY_DATA | Healthcare policy data, including coverage, regulations, and trends | 5 |
| DWV_POL_SURV | Public health surveillance data related to political factors and their impact on health outcomes | 17 |
| DWV_PREG_VACC | Pregnancy vaccination data, including coverage rates and adverse events | 12 |
| DWV_PUB_HEALTH_INFRA | Public health infrastructure indicators for healthcare systems, facilities, and resources | 2 |
| DWV_QUITLINE_DATA | Quitline data for tracking smoking cessation efforts and outcomes | 8 |
| DWV_SURVEY_DATA | Survey data on health topics collected by the CDC for public health research and surveillance | 18 |
| DWV_TEEN_VAX | Adolescent vaccination coverage and trends | 1 |
| DWV_UNSAFERESPONSEFILTER | Unsafe response filter data, focused on identifying and filtering unsafe responses in healthcare settings | 4 |
| DWV_V_EYE_HEALTH | Vision and eye health indicators and trends | 15 |
| DWV_WEB_METRICS | Metrics on health information published on CDC websites, used to monitor and improve online dissemination | 4 |
| DWV_YRB_BEHAVIORS | Youth Risk Behavior Surveillance System: health-risk behaviors and protective factors among youth | 18 |
Dataset Features
- Comprehensive Coverage: Over 50 health topic areas from infectious diseases to health behaviors 
- Continuous Monitoring: New CDC datasets and updates are automatically incorporated 
- Rigorous Quality Assurance: Each dataset batch undergoes automated QA checks 
- Schema Evolution: Automatic adaptation to changes in source data structures 
- Standardized Format: Consistent column naming and data types across all datasets 
Data Quality and Maintenance
Quality Assurance Process
- Automated Checks: Each dataset batch is subject to automated QA checks before being made available 
- Data Validation: Checks for data integrity, including row count validation and data type consistency 
- Schema Evolution: System automatically adapts to changes in source data schemas 
- Freshness Tracking: last_refresh_timestamp in DWV.DATASETS indicates the most recent update
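
The product's own QA runs before publication, and its implementation isn't documented here. To spot-check a batch yourself, one option is to compare `datasets_batches.total_rows_processed` with the number of rows that carry that `batch_id` in the data view. The sketch assumes the two numbers should match, which this document does not state outright; the function name is illustrative, and in practice the inputs would come from queries against `dwv.datasets_batches` and a data view.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def row_count_mismatches(batches: List[Dict], row_batch_ids: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Return {batch_id: (expected, actual)} for batches whose loaded row count differs.

    batches: dicts with 'id' and 'total_rows_processed' (columns of datasets_batches)
    row_batch_ids: the batch_id value of every row in a data view
    """
    actual = Counter(row_batch_ids)
    mismatches = {}
    for b in batches:
        expected = b["total_rows_processed"]
        got = actual.get(b["id"], 0)
        if got != expected:
            mismatches[b["id"]] = (expected, got)
    return mismatches


if __name__ == "__main__":
    batches = [{"id": "b1", "total_rows_processed": 2}, {"id": "b2", "total_rows_processed": 3}]
    rows = ["b1", "b1", "b2", "b2"]  # hypothetical: one row missing from b2
    print(row_count_mismatches(batches, rows))  # {'b2': (3, 2)}
```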
Update Schedule
- Daily: NNDSS surveillance data 
- Weekly: NCHS mortality data, vaccination monitoring 
- Monthly: BRFSS, chronic disease indicators 
- Quarterly: Survey data, policy tracking 
- Annual: Comprehensive health surveys 
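
The cadences above suggest a simple freshness check against `last_refresh_timestamp`. A minimal sketch, assuming the gaps in `EXPECTED_GAP` and a one-day grace period; both are illustrative choices, not values published by the product. Assigning a cadence to each dataset is left to you, since this document doesn't name a metadata field that carries it.

```python
from datetime import datetime, timedelta

# Illustrative maximum refresh gaps per cadence (assumed, not published by the product).
EXPECTED_GAP = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=31),
    "quarterly": timedelta(days=92),
    "annual": timedelta(days=366),
}


def is_stale(last_refresh: datetime, cadence: str, now: datetime,
             grace: timedelta = timedelta(days=1)) -> bool:
    """Return True if last_refresh is older than the cadence allows plus a grace period.

    last_refresh and now must both be naive or both be timezone-aware.
    """
    return now - last_refresh > EXPECTED_GAP[cadence.lower()] + grace


if __name__ == "__main__":
    now = datetime(2025, 10, 27)
    print(is_stale(datetime(2025, 10, 20), "weekly", now))  # False: 7 days <= 7 + 1
    print(is_stale(datetime(2025, 10, 10), "weekly", now))  # True: 17 days > 7 + 1
```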
Business Applications
The CDC Open Data Product can be utilized in various business applications:
- Public Health Research: Epidemiological studies and population health analysis 
- Healthcare Policy Development: Evidence-based policy creation and evaluation 
- Risk Assessment: Health risk assessment and management across populations 
- Resource Allocation: Data-driven healthcare resource planning 
- Disease Surveillance: Outbreak monitoring and prediction modeling 
- Population Health Management: Community health assessment and intervention planning 
Data Dictionary
Common Columns
Most CDC datasets include these standard columns:

| Column | Description | Type |
|---|---|---|
| ID | Unique record identifier | VARCHAR(36) |
| DATASET_ID | Links to dataset metadata | VARCHAR(36) |
| BATCH_ID | Processing batch identifier | VARCHAR(36) |
| CREATED_AT | Record creation timestamp | TIMESTAMP_NTZ |
Metadata Access
-- Get detailed dataset information
SELECT 
    d.source_dataset_id,
    d.dataset_name,
    d.description,
    d.category,
    d.url as source_url,
    d.last_refresh_timestamp
FROM dwv.datasets d
-- Replace YOUR_VIEW_NAME; UPPER on both sides makes the match case-insensitive
WHERE UPPER(d.view_name) = UPPER('YOUR_VIEW_NAME');
Support & Documentation
Source Data Access
Access original CDC data sources and documentation:
-- Find datasets with original source URLs
SELECT 
    view_name,
    dataset_name,
    url as source_url
FROM dwv.datasets
WHERE url IS NOT NULL
ORDER BY dataset_name;
Working with Latest Data Batches
Important: CDC datasets are processed in batches over time. To avoid duplicate records and ensure you're working with the most recent data, always filter for the latest batch using datasets_batches.is_latest_batch = TRUE.
-- Get latest batch information for datasets
SELECT 
    d.dataset_name,
    d.view_name,
    b.id AS batch_id,
    b.total_rows_processed AS row_count,
    b.created_at AS batch_created,
    b.processing_date AS batch_completed
FROM dwv.datasets d
JOIN dwv.datasets_batches b ON d.id = b.dataset_id
WHERE b.is_latest_batch = TRUE
ORDER BY b.processing_date DESC
LIMIT 10;
Joining to Actual Data Tables
Here are examples showing how to properly join from metadata tables to actual data tables using the latest batch filter:
Example 1: Latest Weekly Mortality Data by State
-- Weekly death counts by state for trend analysis
SELECT
    a.jurisdiction_of_occurrence,
    a.mmwryear,
    a.mmwrweek,
    a.allcause,
    a.naturalcause,
    a.flag_diab
FROM dwv.datasets d
JOIN dwv.datasets_batches db ON d.id = db.dataset_id
JOIN dwv_nchs_stats.weekly_deaths_count_state_causes_2014_2__3yf8_kanr a
    ON db.id = a.batch_id
WHERE a.mmwryear >= 2018
  AND db.is_latest_batch = TRUE
ORDER BY a.jurisdiction_of_occurrence, a.mmwryear, a.mmwrweek;
Example 2: Latest Vaccination Coverage with Metadata
-- Get latest childhood vaccination data with full context
SELECT 
    d.dataset_name,
    d.category,
    d.url as source_url,
    vax.geography AS state,
    vax.year_season,
    vax.vaccine,
    vax.dimension_type,
    vax.dimension,
    vax.coverage_estimate
FROM dwv.datasets d
JOIN dwv.datasets_batches b ON d.id = b.dataset_id
JOIN dwv_child_vax.vaccination_coverage_young_children_0_3__fhky_rtsk vax
    ON vax.batch_id = b.id
WHERE b.is_latest_batch = TRUE
ORDER BY vax.geography, vax.vaccine;
Example 3: Latest Environmental Health Data with Batch Tracking
-- Get latest air quality measures with processing information
SELECT 
    d.dataset_name,
    air.statename,
    air.countyname,
    air.reportyear,
    air.measuretype,
    air.value,
    air.unit,
    b.total_rows_processed as total_records_in_batch,
    b.processing_date as data_refresh_date
FROM dwv.datasets d
JOIN dwv.datasets_batches b ON d.id = b.dataset_id
JOIN dwv_env_health_tox.air_quality_measures_neht_network__cjae_szjv air 
    ON air.batch_id = b.id
WHERE b.is_latest_batch = TRUE
  AND air.measurename LIKE '%PM2.5%'
ORDER BY air.statename, air.countyname, air.reportyear DESC;
Why Use Latest Batch Filtering?
Without latest batch filtering, you may encounter:
- Duplicate records from historical processing runs 
- Inconsistent row counts across queries 
- Outdated data mixed with current data 
With is_latest_batch = TRUE, you ensure:
- Only the most recent version of each dataset 
- Consistent results across different query runs 
- Accurate row counts and data freshness 
- Smaller result sets, since rows from earlier batches are excluded 
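
These guarantees rely on each dataset having exactly one batch flagged as latest. The sketch below checks that against an in-memory SQLite mock of `datasets_batches` (hypothetical rows; window functions need SQLite 3.25 or newer). The first query lists datasets whose flag count isn't 1. The second picks the newest batch per dataset by `processing_date` with `ROW_NUMBER`, an independent cross-check that assumes the newest processing date marks the latest batch, which this document implies but does not state. In Snowflake, where `is_latest_batch` is a BOOLEAN, `COUNT_IF(is_latest_batch)` is a clearer way to count flagged batches than `SUM`.

```python
import sqlite3

# Hypothetical mock of dwv.datasets_batches (SQLite stores TRUE/FALSE as 1/0).
SETUP_SQL = """
CREATE TABLE datasets_batches (
    id TEXT PRIMARY KEY,
    dataset_id TEXT,
    is_latest_batch INTEGER,
    processing_date TEXT  -- ISO-8601 strings sort chronologically
);
INSERT INTO datasets_batches VALUES
    ('b1', 'd1', 0, '2025-09-01'),
    ('b2', 'd1', 1, '2025-10-01'),
    ('b3', 'd2', 1, '2025-10-05');
"""

# Datasets that do not have exactly one batch flagged as latest.
FLAG_CHECK_SQL = """
SELECT dataset_id, SUM(is_latest_batch) AS latest_count
FROM datasets_batches
GROUP BY dataset_id
HAVING SUM(is_latest_batch) <> 1
ORDER BY dataset_id
"""

# Independent cross-check: newest batch per dataset by processing_date.
NEWEST_BY_DATE_SQL = """
SELECT dataset_id, id
FROM (
    SELECT dataset_id, id,
           ROW_NUMBER() OVER (PARTITION BY dataset_id ORDER BY processing_date DESC) AS rn
    FROM datasets_batches
)
WHERE rn = 1
ORDER BY dataset_id
"""

FLAGGED_SQL = "SELECT dataset_id, id FROM datasets_batches WHERE is_latest_batch = 1 ORDER BY dataset_id"

if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP_SQL)
    print(con.execute(FLAG_CHECK_SQL).fetchall())  # [] means every dataset has one latest batch
    print(con.execute(FLAGGED_SQL).fetchall() == con.execute(NEWEST_BY_DATE_SQL).fetchall())  # True
```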
Example Use Cases
- COVID-19 Impact Analysis: Analyze trends in COVID-19 deaths across different age groups and regions 
- Tobacco Consumption Trends: Track changes in tobacco consumption patterns over time and across states 
- Bacterial Surveillance: Monitor the prevalence of invasive bacterial infections across demographics 
- Cardiovascular Disease Risk Assessment: Analyze risk factors and trends in cardiovascular diseases 
- Immunization Coverage Evaluation: Assess vaccination rates and their impact on disease outbreaks 
Docs Last Updated: October 27th, 2025