# Census LEHD LODES Employment Dataset

### About the Dataset

Census block-level employment data from the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) program. Origin-destination commute flows, workforce demographics, and job characteristics with geographic crosswalk for county, metro, and state aggregation. 22 years of annual data (2002-2023) covering all 50 states, DC, and Puerto Rico.

**75,000 files condensed into one SQL query.**

{% hint style="info" %}
**Get Full Access** | [Snowflake Marketplace](https://app.snowflake.com/marketplace/listing/GZT1Z7QRSWHE/dataplex-consulting-data-products-census-lehd-lodes-employment-dataset) | [Free Trial](https://trial.dataplex-consulting.com)
{% endhint %}

### Quick Access

**Tables**: OD, RAC, WAC, XWALK + 4 metadata tables\
**Sources**: 4 Census LEHD data sources\
**Coverage**: All US Census blocks (\~8 million), 22 years (2002-2023)\
**Update Frequency**: Annually (1-2 year lag from reference year to Census release)

## Overview

The Census LEHD LODES dataset provides comprehensive access to employment geography data including:

* **Origin-Destination** (`OD`) - Block-to-block commute flows: where workers live and where they work. 2.6 billion rows across 22 years.
* **Residence Area Characteristics** (`RAC`) - Jobs by where workers live with 42 demographic columns (age, earnings, 20 NAICS sectors, race, ethnicity, education, sex). 119 million rows.
* **Workplace Area Characteristics** (`WAC`) - Jobs by where people work with 52 columns including firm age and firm size (WAC-only). 48.5 million rows.
* **Geographic Crosswalk** (`XWALK`) - Maps every 2020 Census block to tracts, counties, metros, congressional districts, ZCTAs, and coordinates. 8.2 million rows. Always fully available (no trial limit).

### Metadata Tables

Every Dataplex data product includes these standard metadata tables:

| Table             | Purpose                                                        |
| ----------------- | -------------------------------------------------------------- |
| `FEEDS`           | Dataset catalog — available tables, descriptions, update dates |
| `FEEDS_FILES`     | Batch load history with `is_latest` flag for data freshness    |
| `CHANGELOG`       | Change log — data loads, schema changes, corrections           |
| `DATA_DICTIONARY` | Column descriptions for all tables                             |

## Entity Relationship Diagram

![Census LEHD LODES Entity Relationship Diagram](/files/S1JXWDh95cPMkd8BBPxR)

**Join pattern:** OD, RAC, and WAC all join to XWALK via geocode columns (`w_geocode` or `h_geocode` = `tabblk2020`) for geographic aggregation from blocks to counties, metros, and states. All data tables link to `FEEDS` and `FEEDS_FILES` via `feed_id` and `feeds_files_id` for data lineage.

## Data Tables

### OD (Origin-Destination)

Block-to-block commute flows: where workers live and where they work. Each row represents a unique home-block to work-block pair for a given year. JOIN to XWALK on `w_geocode` or `h_geocode` to aggregate to county, metro, or state level.

**Key Features:**

* 2.6 billion rows across 22 years (2002-2023)
* Census block-level granularity (15-digit FIPS codes)
* Job counts segmented by age (3), earnings (3), and industry (3)
* `main` files (both residence and workplace in same state) and `aux` files (workplace in state, residence elsewhere)

**Primary Key:** `w_geocode` + `h_geocode` + `year` + `part`

See [Schema Reference](/data-catalog/census-lehd-lodes-dataset/schema-reference.md#od-origin-destination) for all 19 columns.

### RAC (Residence Area Characteristics)

Jobs by where workers live. Each row is a Census block where workers reside, with 42 demographic breakdown columns. JOIN to XWALK on `h_geocode` to aggregate.

**Key Features:**

* 119 million rows across 22 years
* 20 NAICS industry sectors (CNS01-CNS20)
* 7 race categories, 2 ethnicity groups, 4 education levels, 2 sex categories
* 3 age segments and 3 earnings brackets

**Primary Key:** `h_geocode` + `year`

See [Schema Reference](/data-catalog/census-lehd-lodes-dataset/schema-reference.md#rac-residence-area-characteristics) for all 48 columns.

### WAC (Workplace Area Characteristics)

Jobs by where people work. Each row is a Census block where jobs are located. Identical demographic columns to RAC plus firm age (CFA) and firm size (CFS) columns not available in RAC.

**Key Features:**

* 48.5 million rows across 22 years
* All RAC columns plus 5 firm age groups (CFA01-CFA05) and 5 firm size groups (CFS01-CFS05)
* WAC uses noise infusion (NOT synthetic like RAC/OD) — reliable at 10+ jobs per block

**Primary Key:** `w_geocode` + `year`

See [Schema Reference](/data-catalog/census-lehd-lodes-dataset/schema-reference.md#wac-workplace-area-characteristics) for all 58 columns.

### XWALK (Geographic Crosswalk)

Reference table mapping every 2020 Census block to higher geographies. This is the aggregation enabler — JOIN OD/RAC/WAC to XWALK to roll up block-level data to counties, metros, states, congressional districts, or ZCTAs.

**Key Features:**

* 8.2 million rows (one per 2020 Census block)
* Maps to: state, county, tract, block group, CBSA/metro, ZCTA, congressional district, place, school district, and more
* Includes block centroid coordinates (latitude/longitude)
* Always fully available — no trial limit (essential for any analysis)

**Primary Key:** `tabblk2020`

See [Schema Reference](/data-catalog/census-lehd-lodes-dataset/schema-reference.md#xwalk-geographic-crosswalk) for all 45 columns.

## Data Quality

### Data Generation Methods

| Table | Method                         | Reliability                                                      |
| ----- | ------------------------------ | ---------------------------------------------------------------- |
| OD    | Full synthetic data generation | Statistically representative; reliable at county level and above |
| RAC   | Full synthetic data generation | Statistically representative; reliable at county level and above |
| WAC   | Multiplicative noise infusion  | Reliable at 10+ jobs per block                                   |
| XWALK | Exact Census geography         | Exact                                                            |

### Standardization

* All geocode columns zero-padded to 15 digits (preserved as strings, not integers)
* All geographic codes preserve leading zeros (state, county, tract, CBSA)
* Year extracted from filenames and added as a typed integer column
* All job count columns cast to NUMBER with `TRY_TO_NUMBER` for safe handling

### Data Freshness

Check when data was last updated:

```sql
SELECT source_name, data_period, row_count, created_at, is_latest
FROM DWV.FEEDS_FILES
WHERE is_latest = TRUE;
```

## Getting Started

### Platform Schema Reference

This dataset is available on both Snowflake and Databricks. Queries use schema-only references — the database is already set by the share or catalog context:

| Platform       | Schema                  | Example                    |
| -------------- | ----------------------- | -------------------------- |
| **Snowflake**  | `DWV`                   | `DWV.OD`                   |
| **Databricks** | `census_lehd_lodes_dwv` | `census_lehd_lodes_dwv.od` |

### Discover Available Data

Start with the `FEEDS` table to see what's available, and `FEEDS_FILES` to understand data freshness.

{% tabs %}
{% tab title="Snowflake" %}

```sql
-- Browse the data catalog
SELECT table_name, description, latest_data_period,
       latest_row_count, last_loaded_at, total_batches
FROM DWV.FEEDS
ORDER BY table_name;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
-- Browse the data catalog
SELECT table_name, description, latest_data_period,
       latest_row_count, last_loaded_at, total_batches
FROM census_lehd_lodes_dwv.feeds
ORDER BY table_name;
```

{% endtab %}
{% endtabs %}

{% tabs %}
{% tab title="Snowflake" %}

```sql
-- Check data freshness per source
SELECT source_name, data_period, row_count, row_count_delta,
       created_at AS loaded_at, is_latest
FROM DWV.FEEDS_FILES
ORDER BY source_name, created_at DESC;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT source_name, data_period, row_count, row_count_delta,
       created_at AS loaded_at, is_latest
FROM census_lehd_lodes_dwv.feeds_files
ORDER BY source_name, created_at DESC;
```

{% endtab %}
{% endtabs %}

### Working with Data Lineage

Every data row links to `FEEDS_FILES` via `feeds_files_id`, which tells you exactly which batch loaded that data. Use this to filter to the current data version or trace any row back to its source load.

{% tabs %}
{% tab title="Snowflake" %}

```sql
-- Get current WAC data with batch lineage
SELECT w.w_geocode, w.year, w.C000, w.CNS16,
       ff.data_period, ff.created_at AS data_loaded_at
FROM DWV.WAC w
JOIN DWV.FEEDS_FILES ff ON w.feeds_files_id = ff.id
WHERE ff.is_latest = TRUE
LIMIT 20;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT w.w_geocode, w.year, w.C000, w.CNS16,
       ff.data_period, ff.created_at AS data_loaded_at
FROM census_lehd_lodes_dwv.wac w
JOIN census_lehd_lodes_dwv.feeds_files ff ON w.feeds_files_id = ff.id
WHERE ff.is_latest = TRUE
LIMIT 20;
```

{% endtab %}
{% endtabs %}

### Top Employment Counties

Aggregate workplace block data to county level to find the highest employment centers.

{% tabs %}
{% tab title="Snowflake" %}

```sql
SELECT x.ctyname AS county, x.stname AS state,
       SUM(w.C000) AS total_jobs,
       SUM(w.CNS16) AS healthcare_jobs,
       SUM(w.CNS05) AS manufacturing_jobs
FROM DWV.WAC w
JOIN DWV.XWALK x ON w.w_geocode = x.tabblk2020
WHERE w.year = 2023
GROUP BY 1, 2
ORDER BY total_jobs DESC
LIMIT 25;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT x.ctyname AS county, x.stname AS state,
       SUM(w.C000) AS total_jobs,
       SUM(w.CNS16) AS healthcare_jobs,
       SUM(w.CNS05) AS manufacturing_jobs
FROM census_lehd_lodes_dwv.wac w
JOIN census_lehd_lodes_dwv.xwalk x ON w.w_geocode = x.tabblk2020
WHERE w.year = 2023
GROUP BY 1, 2
ORDER BY total_jobs DESC
LIMIT 25;
```

{% endtab %}
{% endtabs %}

### Commute Flows Between Counties

Find the largest commute flows between counties using OD + XWALK.

{% tabs %}
{% tab title="Snowflake" %}

```sql
SELECT xw.ctyname AS work_county, xw.stname AS work_state,
       xh.ctyname AS home_county, xh.stname AS home_state,
       SUM(od.S000) AS total_commuters
FROM DWV.OD od
JOIN DWV.XWALK xw ON od.w_geocode = xw.tabblk2020
JOIN DWV.XWALK xh ON od.h_geocode = xh.tabblk2020
WHERE od.year = 2023 AND od.part = 'main'
GROUP BY 1, 2, 3, 4
ORDER BY total_commuters DESC
LIMIT 25;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT xw.ctyname AS work_county, xw.stname AS work_state,
       xh.ctyname AS home_county, xh.stname AS home_state,
       SUM(od.S000) AS total_commuters
FROM census_lehd_lodes_dwv.od od
JOIN census_lehd_lodes_dwv.xwalk xw ON od.w_geocode = xw.tabblk2020
JOIN census_lehd_lodes_dwv.xwalk xh ON od.h_geocode = xh.tabblk2020
WHERE od.year = 2023 AND od.part = 'main'
GROUP BY 1, 2, 3, 4
ORDER BY total_commuters DESC
LIMIT 25;
```

{% endtab %}
{% endtabs %}

### Workforce Demographics by Metro

Analyze workforce age, earnings, and industry composition at the metropolitan area level.

{% tabs %}
{% tab title="Snowflake" %}

```sql
SELECT x.cbsaname AS metro_area,
       SUM(r.C000) AS total_workers,
       ROUND(SUM(r.CE03) / NULLIF(SUM(r.C000), 0) * 100, 1) AS pct_high_earners,
       ROUND(SUM(r.CA01) / NULLIF(SUM(r.C000), 0) * 100, 1) AS pct_under_30
FROM DWV.RAC r
JOIN DWV.XWALK x ON r.h_geocode = x.tabblk2020
WHERE r.year = 2023 AND x.cbsa IS NOT NULL
GROUP BY 1
HAVING SUM(r.C000) > 100000
ORDER BY total_workers DESC
LIMIT 25;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT x.cbsaname AS metro_area,
       SUM(r.C000) AS total_workers,
       ROUND(SUM(r.CE03) / NULLIF(SUM(r.C000), 0) * 100, 1) AS pct_high_earners,
       ROUND(SUM(r.CA01) / NULLIF(SUM(r.C000), 0) * 100, 1) AS pct_under_30
FROM census_lehd_lodes_dwv.rac r
JOIN census_lehd_lodes_dwv.xwalk x ON r.h_geocode = x.tabblk2020
WHERE r.year = 2023 AND x.cbsa IS NOT NULL
GROUP BY 1
HAVING SUM(r.C000) > 100000
ORDER BY total_workers DESC
LIMIT 25;
```

{% endtab %}
{% endtabs %}

### Employment Trend Over Time

Track total job counts by county across years to identify growth and decline.

{% tabs %}
{% tab title="Snowflake" %}

```sql
SELECT x.ctyname AS county, x.stname AS state,
       w.year,
       SUM(w.C000) AS total_jobs
FROM DWV.WAC w
JOIN DWV.XWALK x ON w.w_geocode = x.tabblk2020
WHERE x.cty = '06037'  -- Los Angeles County
GROUP BY 1, 2, 3
ORDER BY w.year;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT x.ctyname AS county, x.stname AS state,
       w.year,
       SUM(w.C000) AS total_jobs
FROM census_lehd_lodes_dwv.wac w
JOIN census_lehd_lodes_dwv.xwalk x ON w.w_geocode = x.tabblk2020
WHERE x.cty = '06037'  -- Los Angeles County
GROUP BY 1, 2, 3
ORDER BY w.year;
```

{% endtab %}
{% endtabs %}

### Tracking Data Changes Over Time

`FEEDS_FILES` records every batch load with `row_count_delta` showing what changed. Use this to monitor source data updates.

{% tabs %}
{% tab title="Snowflake" %}

```sql
SELECT source_name, data_period, row_count, row_count_delta,
       created_at AS loaded_at, is_latest
FROM DWV.FEEDS_FILES
ORDER BY source_name, created_at;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT source_name, data_period, row_count, row_count_delta,
       created_at AS loaded_at, is_latest
FROM census_lehd_lodes_dwv.feeds_files
ORDER BY source_name, created_at;
```

{% endtab %}
{% endtabs %}

{% hint style="success" %}
**Ready to access Census LEHD LODES data?**
{% endhint %}

| Platform       | Action                                                                                                                                                  |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Snowflake**  | [Get on Marketplace](https://app.snowflake.com/marketplace/listing/GZT1Z7QRSWHE/dataplex-consulting-data-products-census-lehd-lodes-employment-dataset) |
| **Databricks** | [Free Trial](https://trial.dataplex-consulting.com)                                                                                                     |

{% hint style="success" %}
Questions? [Contact our team](mailto:support@dataplex-consulting.com) for a walkthrough.
{% endhint %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.dataplex-consulting.com/data-catalog/census-lehd-lodes-dataset.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
