# National Center for State, Tribal, Local, and Territorial Public Health Infrastructure and Workforce

This page contains all datasets in the **National Center for State, Tribal, Local, and Territorial Public Health Infrastructure and Workforce** category of the CDC Open Data Catalog.

**Total Datasets in Category**: 2

**Last Updated**: 5/24/2026

#### CDC Text Corpora for Learners: HTML Mirrors of MMWR, EID, and PCD

* **Description**: The attached ZIP archives are part of the [CDC Text Corpora for Learners](https://github.com/cmheilig/harvest-cdc-journals) program. This version, comprising 33,567 articles, was constructed on 2024-03-01 using source content retrieved on 2024-01-09.

The three attached ZIP archives contain the 33,567 articles in 33,576 compiled HTML mirrors of the MMWR ([Morbidity and Mortality Weekly Report](https://www.cdc.gov/mmwr/)), including its series *Weekly Reports*, *Recommendations and Reports*, *Surveillance Summaries*, *Supplements*, and *Notifiable Diseases* (a subset of *Weekly Reports*, constructed ad hoc); EID ([Emerging Infectious Diseases](https://www.cdc.gov/eid/)); and PCD ([Preventing Chronic Disease](https://www.cdc.gov/pcd/)). There is one archive per series. The archive attachments are located in the *About this Dataset* section of this landing page. Click Show More in that section to find them under *Attachments*.

The files were retrieved and organized with as few changes to the raw sources as possible, to support as many downstream uses as possible.

* **Snowflake Schema**: dwv\_pub\_health\_infra
* **Databricks Schema**: cdc\_dwv\_pub\_health\_infra
* **Table Name**: cdc\_text\_corpora\_html\_mirrors\_mmwr\_eid\_\_\_ut5n\_bmc3
* **Dataset ID**: ut5n-bmc3
* **Category**: National Center for State, Tribal, Local, and Territorial Public Health Infrastructure and Workforce
* **Total Rows**: 100,728
* **Last Refresh**: 5/2/2026
* **Total Batches**: 3
* **Tags**: pcd, ai, corpora, corpus, data science, eid, harvest-cdc-journals, informatics, language, linguistic, llm, machine learning, ml, mmwr, morphology, ncstltphiw, nlp, phic, semantic, text analysis
* **Source Data**: <https://data.cdc.gov/d/ut5n-bmc3>
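
The schema and table names in this listing are shown with Markdown backslash escapes before each underscore. The following is a minimal Python sketch, assuming those backslashes are only Markdown escaping and are not part of the real identifiers, that recovers the plain names and joins them into a dotted `schema.table` reference. The helper names `unescape_markdown` and `qualified_table` are hypothetical, and whether a database or catalog prefix is also required is not stated on this page.

```python
def unescape_markdown(name: str) -> str:
    """Drop the backslashes that Markdown uses to escape underscores."""
    return name.replace("\\_", "_")


def qualified_table(schema: str, table: str) -> str:
    """Join schema and table into a dotted name after unescaping both."""
    return f"{unescape_markdown(schema)}.{unescape_markdown(table)}"


# Values copied from the listing above, including the Markdown escapes.
snowflake_schema = r"dwv\_pub\_health\_infra"
databricks_schema = r"cdc\_dwv\_pub\_health\_infra"
table_name = r"cdc\_text\_corpora\_html\_mirrors\_mmwr\_eid\_\_\_ut5n\_bmc3"

print(qualified_table(snowflake_schema, table_name))
print(qualified_table(databricks_schema, table_name))
```

The same helpers apply to any other entry in this category, since the entries follow the same schema and table naming layout.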

#### CDC Text Corpora for Learners: MMWR, EID, and PCD Article Metadata

* **Description**: This landing page is part of the [CDC Text Corpora for Learners](https://github.com/cmheilig/harvest-cdc-journals) program, which includes the 33,576 compiled CDC Text Corpora for Learners [HTML mirrors](https://data.cdc.gov/National-Center-for-State-Tribal-Local-and-Territo/CDC-Text-Corpora-for-Learners-HTML-Mirrors-of-MMWR/ut5n-bmc3/about_data) of the MMWR ([Morbidity and Mortality Weekly Report](https://www.cdc.gov/mmwr/)), including its series *Weekly Reports*, *Recommendations and Reports*, *Surveillance Summaries*, *Supplements*, and *Notifiable Diseases* (a subset of *Weekly Reports*, constructed ad hoc); EID ([Emerging Infectious Diseases](https://www.cdc.gov/eid/)); and PCD ([Preventing Chronic Disease](https://www.cdc.gov/pcd/)).

The data represented here is the tabulated [metadata](https://github.com/cmheilig/harvest-cdc-journals/blob/main/README.md#metadata-fields) of the combined 33,567 articles of the [MMWR, EID, and PCD collections](https://github.com/cmheilig/harvest-cdc-journals?tab=readme-ov-file#collections), whose contents are organized into three ZIP-archived JSON files per collection. The JSON value output formats include UTF-8 HTML, UTF-8 Markdown, and ASCII plain text.

The [JSON files](https://github.com/cmheilig/harvest-cdc-journals?tab=readme-ov-file#collections) are located in the [program's repository](https://github.com/cmheilig/harvest-cdc-journals). This version was constructed on 2024-03-01 using source content retrieved on 2024-01-09.

* **Snowflake Schema**: dwv\_pub\_health\_infra
* **Databricks Schema**: cdc\_dwv\_pub\_health\_infra
* **Table Name**: cdc\_text\_corpora\_learners\_mmwr\_eid\_pcd\_\_\_7rih\_tqi5
* **Dataset ID**: 7rih-tqi5
* **Category**: National Center for State, Tribal, Local, and Territorial Public Health Infrastructure and Workforce
* **Total Rows**: 100,701
* **Last Refresh**: 5/2/2026
* **Total Batches**: 3
* **Tags**: eid, text analysis, smokefree indoor air, semantics, phic, pcd, ncstltphiw, morphology, mmwr, ml, machine learning, linguistics, language, informatics, harvest-cdc-journals, data science, corpus, corpora
* **Source Data**: <https://data.cdc.gov/d/7rih-tqi5>
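
Because this entry is tabular metadata, it may be convenient to read a few rows directly from the source portal. The sketch below only builds a request URL from the Dataset ID. It assumes that data.cdc.gov exposes this dataset through the usual Socrata SODA pattern `/resource/<dataset-id>.json` with a `$limit` parameter. That pattern is a general Socrata convention, not something this page confirms. The function name `soda_url` is hypothetical.

```python
from urllib.parse import urlencode


def soda_url(dataset_id: str, limit: int = 5, domain: str = "data.cdc.gov") -> str:
    """Build a SODA-style JSON URL for a dataset (assumed endpoint layout)."""
    # Keep "$" unescaped so the parameter reads as "$limit" in the URL.
    query = urlencode({"$limit": limit}, safe="$")
    return f"https://{domain}/resource/{dataset_id}.json?{query}"


# Dataset ID taken from the listing above.
print(soda_url("7rih-tqi5", limit=3))
```

No request is sent here. Passing the resulting URL to an HTTP client is left to the reader, since availability and response fields are not described on this page.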


