© 2025 Dataplex Consulting & Data Products

CDC Open Data Product

Last updated 3 months ago

About the Dataset

The CDC Open Data Product is a comprehensive data solution that transforms and delivers over 1,300 high-quality, up-to-date public health datasets from the Centers for Disease Control and Prevention (CDC). The product contains more than 500 GB of data and is still growing, with over 27,000 attributes, giving researchers, analysts, and data scientists access to a broad range of public health information.

Dataset Features

  • Covers topics including infectious diseases, chronic conditions, health behaviors, and healthcare utilization

  • Continuous monitoring for new CDC datasets and updates

  • Rigorous quality assurance process ensuring data reliability

  • Schema evolution support to handle changes in source data structures

Data Quality and Maintenance

The CDC Open Data Product undergoes a rigorous quality assurance process:

  • Automated Checks: Each dataset batch is subject to automated QA checks before being made available.

  • Schema Evolution: The system automatically adapts to changes in source data schemas, ensuring data consistency over time.

  • Data Validation: Checks are performed to ensure data integrity, including row count validation and data type consistency.

  • Regular Updates: Datasets are updated based on the CDC's update frequency for each dataset.

  • Freshness Tracking: The last_refresh_timestamp in the dwv.datasets table indicates the most recent update for each dataset.
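
The freshness column can be used to spot datasets that have not been refreshed recently. The query below is a minimal sketch, assuming last_refresh_timestamp is stored as a timestamp type; the 30-day threshold is an arbitrary illustration, not a documented service level.

```sql
-- Datasets whose most recent refresh is more than 30 days old.
-- The 30-day cutoff is an arbitrary example value; adjust it as needed.
SELECT dataset_name,
       last_refresh_timestamp,
       DATEDIFF('day', last_refresh_timestamp, CURRENT_TIMESTAMP()) AS days_since_refresh
FROM dwv.datasets
WHERE last_refresh_timestamp < DATEADD('day', -30, CURRENT_TIMESTAMP())
ORDER BY last_refresh_timestamp;
```

An old timestamp does not necessarily indicate a problem, since each dataset follows the CDC's own update frequency.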

Business Applications

The CDC Open Data Product can be utilized in various business applications, including:

  • Public health research and analysis

  • Healthcare policy development

  • Epidemiological studies

  • Health risk assessment and management

  • Population health management

  • Healthcare resource allocation

  • Disease outbreak monitoring and prediction

Example Use Cases

  1. COVID-19 Impact Analysis: Analyze trends in COVID-19 deaths across different age groups and regions.

  2. Tobacco Consumption Trends: Track changes in tobacco consumption patterns over time and across states.

  3. Bacterial Surveillance: Monitor the prevalence of invasive bacterial infections across different demographics.

  4. Cardiovascular Disease Risk Assessment: Analyze risk factors and trends in cardiovascular diseases across populations.

  5. Immunization Coverage Evaluation: Assess vaccination rates and their impact on disease outbreaks.
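
A natural first step for any of these use cases is locating the relevant datasets in the metadata table. The query below is a sketch that searches dataset_name and description for a keyword; the keyword 'tobacco' is only an example, and whether it matches any rows depends on the current catalog contents.

```sql
-- Case-insensitive keyword search over the dataset metadata.
-- 'tobacco' is an example search term; substitute any topic of interest.
SELECT dataset_name, description, last_refresh_timestamp
FROM dwv.datasets
WHERE dataset_name ILIKE '%tobacco%'
   OR description ILIKE '%tobacco%'
ORDER BY dataset_name;
```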

Data Structure

The CDC Open Data Product is organized into three kinds of tables: two metadata tables in the DWV schema and one table per CDC dataset in category-specific DWV_ schemas:

  1. datasets: Contains metadata about each CDC dataset.

  2. datasets_batches: Provides information about data processing batches.

  3. {table_name}: Individual dataset tables, one per CDC dataset, stored in DWV_-prefixed schemas that correspond to the dataset category (for example, dwv_pub_health_surv).
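
To see which dataset tables are available in the category schemas, Snowflake's INFORMATION_SCHEMA can be queried from within the database that holds this product. This is a sketch that assumes schema names are stored in upper case (the Snowflake default for unquoted identifiers) and that the current database context is the shared database.

```sql
-- List the objects in schemas whose names start with DWV_.
-- Assumes upper-case schema names; run with the product database as the current database.
SELECT table_schema, table_name, table_type
FROM information_schema.tables
WHERE STARTSWITH(table_schema, 'DWV_')
ORDER BY table_schema, table_name;
```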

Common system columns across all datasets include:

  • id: Unique identifier for each record

  • dataset_id: Reference to the dataset metadata

  • batch_id: Reference to the processing batch

  • source_dataset_id: Original CDC dataset identifier
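
The system columns make it possible to tie any dataset row back to its batch and dataset metadata. As a sketch, the query below counts rows per batch in one dataset table (the table name is taken from the sample queries on this page) and shows the count next to total_rows_processed. Whether the two numbers are expected to match exactly is an assumption, not something this page confirms.

```sql
-- Rows per batch in one dataset table, shown next to the batch metadata.
-- Assumption: total_rows_processed is comparable to the row count stored for that batch.
SELECT db.id AS batch_id,
       db.processing_date,
       db.total_rows_processed,
       COUNT(v.id) AS rows_in_table
FROM dwv.datasets_batches db
JOIN dwv_pub_health_surv.nssp_emergency_department_visit_trajectories_by_st__rdmq_nq56 v
    ON v.batch_id = db.id AND v.dataset_id = db.dataset_id
GROUP BY db.id, db.processing_date, db.total_rows_processed
ORDER BY db.processing_date DESC;
```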

Entity Relationship Diagram

Sample Queries

  1. List all available datasets:

SELECT dataset_name, description, last_refresh_timestamp
FROM dwv.datasets
ORDER BY last_refresh_timestamp DESC;

  2. Get the latest processing batch for a specific dataset:

SELECT db.processing_date, db.total_rows_processed
FROM dwv.datasets_batches db
JOIN dwv.datasets d ON db.dataset_id = d.id
WHERE d.dataset_name = 'AH Provisional COVID-19 Deaths by Week and Age'
ORDER BY db.processing_date DESC
LIMIT 1;

  3. Query COVID-19 related datasets:

SELECT *
FROM dwv.datasets d
WHERE ARRAYS_OVERLAP(
    tags,
    ARRAY_CONSTRUCT(
        'coronavirus',
        'covid-19',
        'covid19',
        'sars-cov-2',
        'immunization'
    )
)
ORDER BY dataset_name;

  4. Query the latest batch data for a single dataset:

SELECT v.*
FROM dwv.datasets d
JOIN dwv.datasets_batches db ON d.id = db.dataset_id
JOIN dwv_pub_health_surv.nssp_emergency_department_visit_trajectories_by_st__rdmq_nq56 v
    ON db.id = v.batch_id AND db.dataset_id = v.dataset_id
QUALIFY RANK() OVER(ORDER BY db.processing_date DESC) = 1
ORDER BY geography;
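
The single-dataset batch lookup can be generalized to all datasets at once. The following is a sketch using only columns that appear in the queries above; it assumes processing_date orders batches chronologically within each dataset.

```sql
-- Most recent processing batch for every dataset.
-- ROW_NUMBER() keeps exactly one batch per dataset, even if two batches share a processing_date.
SELECT d.dataset_name,
       db.processing_date,
       db.total_rows_processed
FROM dwv.datasets d
JOIN dwv.datasets_batches db ON db.dataset_id = d.id
QUALIFY ROW_NUMBER() OVER (PARTITION BY d.id ORDER BY db.processing_date DESC) = 1
ORDER BY d.dataset_name;
```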

Support and Contact

About Dataplex

Dataplex Consulting & Data Products offers top-notch, turnkey data products, making data easily accessible for any business. Our data pipelines feature automatic quality checks and active monitoring, ensuring timely, clean, and high-quality data designed for seamless ingestion.

We also offer data consulting services to companies of all sizes. With 20+ years of experience serving small businesses and Fortune 500 companies, our team has gained a wealth of practical expertise in the field. Our track record shows success in enhancing data management, boosting revenue, and helping companies become more data-driven.

For questions, support, or feedback regarding the CDC Open Data Product, please contact our data support team at support@dataplex-consulting.com.

See the catalog for the full list of tables.