National Center for State, Tribal, Local, and Territorial Public Health Infrastructure and Workforce

National Center for State, Tribal, Local, and Territorial Public Health Infrastructure and Workforce datasets in the CDC Open Data Catalog

This page contains all datasets in the National Center for State, Tribal, Local, and Territorial Public Health Infrastructure and Workforce category of the CDC Open Data Catalog.

Total Datasets in Category: 2

Last Updated: 1/27/2026

CDC Text Corpora for Learners: HTML Mirrors of MMWR, EID, and PCD

  • Description: The attached ZIP archives are part of the CDC Text Corpora for Learners program. This version, comprising 33,567 articles, was constructed on 2024-03-01 using source content retrieved on 2024-01-09.

The three attached ZIP archives contain the 33,567 articles in 33,576 compiled HTML mirrors of MMWR (Morbidity and Mortality Weekly Report), including its series (Weekly Reports, Recommendations and Reports, Surveillance Summaries, Supplements, and Notifiable Diseases, a subset of Weekly Reports constructed ad hoc); EID (Emerging Infectious Diseases); and PCD (Preventing Chronic Disease). There is one archive per series. The archive attachments are located in the About this Dataset section of this landing page: click Show More in that section, then look under Attachments.

The files were retrieved and organized with as few changes to the raw sources as possible, to support as many downstream uses as possible.

  • Snowflake Schema: dwv_pub_health_infra

  • Databricks Schema: cdc_dwv_pub_health_infra

  • Table Name: cdc_text_corpora_html_mirrors_mmwr_eid___ut5n_bmc3

  • Dataset ID: ut5n-bmc3

  • Category: National Center for State, Tribal, Local, and Territorial Public Health Infrastructure and Workforce

  • Total Rows: 67,152

  • Last Refresh: 1/1/2026

  • Total Batches: 2

  • Tags: informatics, harvest-cdc-journals, ai, text analysis, semantic, phic, pcd, nlp, ncstltphiw, morphology, mmwr, ml, machine learning, corpora, corpus, data science, eid, llm, linguistic, language
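Once an archive attachment has been downloaded, its contents can be inspected without unpacking it to disk. The following is a minimal Python sketch using only the standard library. The function names are illustrative, and the internal folder layout and file extensions of the archives are assumptions, since this page does not describe them. Members are returned as raw bytes, in keeping with the goal of leaving the sources unchanged.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_source):
    """Count non-directory archive members by lowercase file suffix."""
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if not info.is_dir():
                counts[PurePosixPath(info.filename).suffix.lower()] += 1
    return counts


def iter_html(zip_source):
    """Yield (member name, raw bytes) for each .html or .htm member.

    Assumption: the mirrored pages use one of these two extensions.
    Bytes are returned undecoded because the encoding is not stated here.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.filename.lower().endswith((".html", ".htm")):
                yield info.filename, archive.read(info)
```

For example, `summarize_archive("mmwr.zip")` would report how many members of each file type a downloaded archive holds, where `mmwr.zip` is a hypothetical local file name rather than the actual attachment name.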

CDC Text Corpora for Learners: MMWR, EID, and PCD Article Metadata

This dataset contains the tabulated metadata for the combined 33,567 articles of the MMWR, EID, and PCD collections, whose contents are organized into three ZIP-archived JSON files per collection. The JSON value output formats include UTF-8 HTML, UTF-8 markdown, and ASCII plain text.

The JSON files are located in the program's repository. This version was constructed on 2024-03-01 using source content retrieved on 2024-01-09.

  • Snowflake Schema: dwv_pub_health_infra

  • Databricks Schema: cdc_dwv_pub_health_infra

  • Table Name: cdc_text_corpora_learners_mmwr_eid_pcd___7rih_tqi5

  • Dataset ID: 7rih-tqi5

  • Category: National Center for State, Tribal, Local, and Territorial Public Health Infrastructure and Workforce

  • Total Rows: 67,134

  • Last Refresh: 1/1/2026

  • Total Batches: 2

  • Tags: text analysis, corpus, corpora, linguistics, language, informatics, harvest-cdc-journals, eid, pcd, ncstltphiw, mmwr, ml, data science, phic, machine learning, morphology, semantics, smokefree indoor air
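The schema and table names listed for the two datasets can be combined into qualified table references for queries. The sketch below is a hypothetical helper in Python, not a documented interface. It assumes the usual two-part `schema.table` form; the enclosing database or catalog name is not given on this page and would need to be prepended, and access to either platform may require credentials.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One dataset from this category, using the values listed on this page."""
    dataset_id: str
    table_name: str
    snowflake_schema: str = "dwv_pub_health_infra"
    databricks_schema: str = "cdc_dwv_pub_health_infra"

    def qualified_name(self, platform: str) -> str:
        """Return schema.table for 'snowflake' or 'databricks' (assumed form)."""
        schemas = {
            "snowflake": self.snowflake_schema,
            "databricks": self.databricks_schema,
        }
        try:
            schema = schemas[platform.lower()]
        except KeyError:
            raise ValueError(f"unknown platform: {platform!r}") from None
        return f"{schema}.{self.table_name}"


ENTRIES = [
    CatalogEntry("ut5n-bmc3", "cdc_text_corpora_html_mirrors_mmwr_eid___ut5n_bmc3"),
    CatalogEntry("7rih-tqi5", "cdc_text_corpora_learners_mmwr_eid_pcd___7rih_tqi5"),
]


def count_query(entry: CatalogEntry, platform: str) -> str:
    """Build a row-count query string; it is not executed here."""
    return f"SELECT COUNT(*) FROM {entry.qualified_name(platform)}"
```

In both listed table names, the suffix is the Dataset ID with its hyphen replaced by an underscore, which gives a quick consistency check when matching a table to its catalog entry.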
