# Census LEHD LODES Employment Dataset

### About the Dataset

Census block-level employment data from the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) program. Origin-destination commute flows, workforce demographics, and job characteristics with geographic crosswalk for county, metro, and state aggregation. 22 years of annual data (2002-2023) covering all 50 states, DC, and Puerto Rico.

**75,000 files condensed into one SQL query.**

{% hint style="info" %}
**Get Full Access** | [Snowflake Marketplace](https://app.snowflake.com/marketplace/listing/GZT1Z7QRSWHE/dataplex-consulting-data-products-census-lehd-lodes-employment-dataset) | [Free Trial](https://trial.dataplex-consulting.com)
{% endhint %}

### Quick Access

**Tables**: OD, RAC, WAC, XWALK + 4 metadata tables\
**Sources**: 4 Census LEHD data sources\
**Coverage**: All US Census blocks (\~8 million), 22 years (2002-2023)\
**Update Frequency**: Annually (1-2 year lag from reference year to Census release)

## Overview

The Census LEHD LODES dataset provides comprehensive access to employment geography data including:

* **Origin-Destination** (`OD`) - Block-to-block commute flows: where workers live and where they work. 2.6 billion rows across 22 years.
* **Residence Area Characteristics** (`RAC`) - Jobs by where workers live with 42 demographic columns (age, earnings, 20 NAICS sectors, race, ethnicity, education, sex). 119 million rows.
* **Workplace Area Characteristics** (`WAC`) - Jobs by where people work with 52 columns including firm age and firm size (WAC-only). 48.5 million rows.
* **Geographic Crosswalk** (`XWALK`) - Maps every 2020 Census block to tracts, counties, metros, congressional districts, ZCTAs, and coordinates. 8.2 million rows. Always fully available (no trial limit).

### Metadata Tables

Every Dataplex data product includes these standard metadata tables:

| Table             | Purpose                                                        |
| ----------------- | -------------------------------------------------------------- |
| `FEEDS`           | Dataset catalog — available tables, descriptions, update dates |
| `FEEDS_FILES`     | Batch load history with `is_latest` flag for data freshness    |
| `CHANGELOG`       | Change log — data loads, schema changes, corrections           |
| `DATA_DICTIONARY` | Column descriptions for all tables                             |

## Entity Relationship Diagram

![Census LEHD LODES Entity Relationship Diagram](https://813439891-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FDeimGtBflXKQn786VLvj%2Fuploads%2Fgit-blob-dc93c606ab9c4ac998a22d89313c5eb86d8afb33%2Fentity-relationship.png?alt=media)

**Join pattern:** OD, RAC, and WAC all join to XWALK via geocode columns (`w_geocode` or `h_geocode` = `tabblk2020`) for geographic aggregation from blocks to counties, metros, and states. All data tables link to `FEEDS` and `FEEDS_FILES` via `feed_id` and `feeds_files_id` for data lineage.

## Data Tables

### OD (Origin-Destination)

Block-to-block commute flows: where workers live and where they work. Each row represents a unique home-block to work-block pair for a given year. JOIN to XWALK on `w_geocode` or `h_geocode` to aggregate to county, metro, or state level.

**Key Features:**

* 2.6 billion rows across 22 years (2002-2023)
* Census block-level granularity (15-digit FIPS codes)
* Job counts segmented by age (3), earnings (3), and industry (3)
* `main` files (both residence and workplace in same state) and `aux` files (workplace in state, residence elsewhere)

**Primary Key:** `w_geocode` + `h_geocode` + `year` + `part`

See [Schema Reference](https://docs.dataplex-consulting.com/data-catalog/schema-reference#od-origin-destination) for all 19 columns.

### RAC (Residence Area Characteristics)

Jobs by where workers live. Each row is a Census block where workers reside, with 42 demographic breakdown columns. JOIN to XWALK on `h_geocode` to aggregate.

**Key Features:**

* 119 million rows across 22 years
* 20 NAICS industry sectors (CNS01-CNS20)
* 7 race categories, 2 ethnicity groups, 4 education levels, 2 sex categories
* 3 age segments and 3 earnings brackets

**Primary Key:** `h_geocode` + `year`

See [Schema Reference](https://docs.dataplex-consulting.com/data-catalog/schema-reference#rac-residence-area-characteristics) for all 48 columns.

### WAC (Workplace Area Characteristics)

Jobs by where people work. Each row is a Census block where jobs are located. Identical demographic columns to RAC plus firm age (CFA) and firm size (CFS) columns not available in RAC.

**Key Features:**

* 48.5 million rows across 22 years
* All RAC columns plus 5 firm age groups (CFA01-CFA05) and 5 firm size groups (CFS01-CFS05)
* WAC uses noise infusion (NOT synthetic like RAC/OD) — reliable at 10+ jobs per block

**Primary Key:** `w_geocode` + `year`

See [Schema Reference](https://docs.dataplex-consulting.com/data-catalog/schema-reference#wac-workplace-area-characteristics) for all 58 columns.

### XWALK (Geographic Crosswalk)

Reference table mapping every 2020 Census block to higher geographies. This is the aggregation enabler — JOIN OD/RAC/WAC to XWALK to roll up block-level data to counties, metros, states, congressional districts, or ZCTAs.

**Key Features:**

* 8.2 million rows (one per 2020 Census block)
* Maps to: state, county, tract, block group, CBSA/metro, ZCTA, congressional district, place, school district, and more
* Includes block centroid coordinates (latitude/longitude)
* Always fully available — no trial limit (essential for any analysis)

**Primary Key:** `tabblk2020`

See [Schema Reference](https://docs.dataplex-consulting.com/data-catalog/schema-reference#xwalk-geographic-crosswalk) for all 45 columns.

## Data Quality

### Data Generation Methods

| Table | Method                         | Reliability                                                      |
| ----- | ------------------------------ | ---------------------------------------------------------------- |
| OD    | Full synthetic data generation | Statistically representative; reliable at county level and above |
| RAC   | Full synthetic data generation | Statistically representative; reliable at county level and above |
| WAC   | Multiplicative noise infusion  | Reliable at 10+ jobs per block                                   |
| XWALK | Exact Census geography         | Exact                                                            |

### Standardization

* All geocode columns zero-padded to 15 digits (preserved as strings, not integers)
* All geographic codes preserve leading zeros (state, county, tract, CBSA)
* Year extracted from filenames and added as a typed integer column
* All job count columns cast to NUMBER with `TRY_TO_NUMBER` for safe handling

### Data Freshness

Check when data was last updated:

```sql
SELECT source_name, data_period, row_count, created_at, is_latest
FROM DWV.FEEDS_FILES
WHERE is_latest = TRUE;
```

## Getting Started

### Platform Schema Reference

This dataset is available on both Snowflake and Databricks. Queries use schema-only references — the database is already set by the share or catalog context:

| Platform       | Schema                  | Example                    |
| -------------- | ----------------------- | -------------------------- |
| **Snowflake**  | `DWV`                   | `DWV.OD`                   |
| **Databricks** | `census_lehd_lodes_dwv` | `census_lehd_lodes_dwv.od` |

### Discover Available Data

Start with the `FEEDS` table to see what's available, and `FEEDS_FILES` to understand data freshness.

{% tabs %}
{% tab title="Snowflake" %}

```sql
-- Browse the data catalog
SELECT table_name, description, latest_data_period,
       latest_row_count, last_loaded_at, total_batches
FROM DWV.FEEDS
ORDER BY table_name;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
-- Browse the data catalog
SELECT table_name, description, latest_data_period,
       latest_row_count, last_loaded_at, total_batches
FROM census_lehd_lodes_dwv.feeds
ORDER BY table_name;
```

{% endtab %}
{% endtabs %}

{% tabs %}
{% tab title="Snowflake" %}

```sql
-- Check data freshness per source
SELECT source_name, data_period, row_count, row_count_delta,
       created_at AS loaded_at, is_latest
FROM DWV.FEEDS_FILES
ORDER BY source_name, created_at DESC;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT source_name, data_period, row_count, row_count_delta,
       created_at AS loaded_at, is_latest
FROM census_lehd_lodes_dwv.feeds_files
ORDER BY source_name, created_at DESC;
```

{% endtab %}
{% endtabs %}

### Working with Data Lineage

Every data row links to `FEEDS_FILES` via `feeds_files_id`, which tells you exactly which batch loaded that data. Use this to filter to the current data version or trace any row back to its source load.

{% tabs %}
{% tab title="Snowflake" %}

```sql
-- Get current WAC data with batch lineage
SELECT w.w_geocode, w.year, w.C000, w.CNS16,
       ff.data_period, ff.created_at AS data_loaded_at
FROM DWV.WAC w
JOIN DWV.FEEDS_FILES ff ON w.feeds_files_id = ff.id
WHERE ff.is_latest = TRUE
LIMIT 20;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT w.w_geocode, w.year, w.C000, w.CNS16,
       ff.data_period, ff.created_at AS data_loaded_at
FROM census_lehd_lodes_dwv.wac w
JOIN census_lehd_lodes_dwv.feeds_files ff ON w.feeds_files_id = ff.id
WHERE ff.is_latest = TRUE
LIMIT 20;
```

{% endtab %}
{% endtabs %}

### Top Employment Counties

Aggregate workplace block data to county level to find the highest employment centers.

{% tabs %}
{% tab title="Snowflake" %}

```sql
SELECT x.ctyname AS county, x.stname AS state,
       SUM(w.C000) AS total_jobs,
       SUM(w.CNS16) AS healthcare_jobs,
       SUM(w.CNS05) AS manufacturing_jobs
FROM DWV.WAC w
JOIN DWV.XWALK x ON w.w_geocode = x.tabblk2020
WHERE w.year = 2023
GROUP BY 1, 2
ORDER BY total_jobs DESC
LIMIT 25;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT x.ctyname AS county, x.stname AS state,
       SUM(w.C000) AS total_jobs,
       SUM(w.CNS16) AS healthcare_jobs,
       SUM(w.CNS05) AS manufacturing_jobs
FROM census_lehd_lodes_dwv.wac w
JOIN census_lehd_lodes_dwv.xwalk x ON w.w_geocode = x.tabblk2020
WHERE w.year = 2023
GROUP BY 1, 2
ORDER BY total_jobs DESC
LIMIT 25;
```

{% endtab %}
{% endtabs %}

### Commute Flows Between Counties

Find the largest commute flows between counties using OD + XWALK.

{% tabs %}
{% tab title="Snowflake" %}

```sql
SELECT xw.ctyname AS work_county, xw.stname AS work_state,
       xh.ctyname AS home_county, xh.stname AS home_state,
       SUM(od.S000) AS total_commuters
FROM DWV.OD od
JOIN DWV.XWALK xw ON od.w_geocode = xw.tabblk2020
JOIN DWV.XWALK xh ON od.h_geocode = xh.tabblk2020
WHERE od.year = 2023 AND od.part = 'main'
GROUP BY 1, 2, 3, 4
ORDER BY total_commuters DESC
LIMIT 25;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT xw.ctyname AS work_county, xw.stname AS work_state,
       xh.ctyname AS home_county, xh.stname AS home_state,
       SUM(od.S000) AS total_commuters
FROM census_lehd_lodes_dwv.od od
JOIN census_lehd_lodes_dwv.xwalk xw ON od.w_geocode = xw.tabblk2020
JOIN census_lehd_lodes_dwv.xwalk xh ON od.h_geocode = xh.tabblk2020
WHERE od.year = 2023 AND od.part = 'main'
GROUP BY 1, 2, 3, 4
ORDER BY total_commuters DESC
LIMIT 25;
```

{% endtab %}
{% endtabs %}

### Workforce Demographics by Metro

Analyze workforce age, earnings, and industry composition at the metropolitan area level.

{% tabs %}
{% tab title="Snowflake" %}

```sql
SELECT x.cbsaname AS metro_area,
       SUM(r.C000) AS total_workers,
       ROUND(SUM(r.CE03) / NULLIF(SUM(r.C000), 0) * 100, 1) AS pct_high_earners,
       ROUND(SUM(r.CA01) / NULLIF(SUM(r.C000), 0) * 100, 1) AS pct_under_30
FROM DWV.RAC r
JOIN DWV.XWALK x ON r.h_geocode = x.tabblk2020
WHERE r.year = 2023 AND x.cbsa IS NOT NULL
GROUP BY 1
HAVING SUM(r.C000) > 100000
ORDER BY total_workers DESC
LIMIT 25;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT x.cbsaname AS metro_area,
       SUM(r.C000) AS total_workers,
       ROUND(SUM(r.CE03) / NULLIF(SUM(r.C000), 0) * 100, 1) AS pct_high_earners,
       ROUND(SUM(r.CA01) / NULLIF(SUM(r.C000), 0) * 100, 1) AS pct_under_30
FROM census_lehd_lodes_dwv.rac r
JOIN census_lehd_lodes_dwv.xwalk x ON r.h_geocode = x.tabblk2020
WHERE r.year = 2023 AND x.cbsa IS NOT NULL
GROUP BY 1
HAVING SUM(r.C000) > 100000
ORDER BY total_workers DESC
LIMIT 25;
```

{% endtab %}
{% endtabs %}

### Employment Trend Over Time

Track total job counts by county across years to identify growth and decline.

{% tabs %}
{% tab title="Snowflake" %}

```sql
SELECT x.ctyname AS county, x.stname AS state,
       w.year,
       SUM(w.C000) AS total_jobs
FROM DWV.WAC w
JOIN DWV.XWALK x ON w.w_geocode = x.tabblk2020
WHERE x.cty = '06037'  -- Los Angeles County
GROUP BY 1, 2, 3
ORDER BY w.year;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT x.ctyname AS county, x.stname AS state,
       w.year,
       SUM(w.C000) AS total_jobs
FROM census_lehd_lodes_dwv.wac w
JOIN census_lehd_lodes_dwv.xwalk x ON w.w_geocode = x.tabblk2020
WHERE x.cty = '06037'  -- Los Angeles County
GROUP BY 1, 2, 3
ORDER BY w.year;
```

{% endtab %}
{% endtabs %}

### Tracking Data Changes Over Time

`FEEDS_FILES` records every batch load with `row_count_delta` showing what changed. Use this to monitor source data updates.

{% tabs %}
{% tab title="Snowflake" %}

```sql
SELECT source_name, data_period, row_count, row_count_delta,
       created_at AS loaded_at, is_latest
FROM DWV.FEEDS_FILES
ORDER BY source_name, created_at;
```

{% endtab %}

{% tab title="Databricks" %}

```sql
SELECT source_name, data_period, row_count, row_count_delta,
       created_at AS loaded_at, is_latest
FROM census_lehd_lodes_dwv.feeds_files
ORDER BY source_name, created_at;
```

{% endtab %}
{% endtabs %}

{% hint style="success" %}
**Ready to access Census LEHD LODES data?**
{% endhint %}

| Platform       | Action                                                                                                                                                  |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Snowflake**  | [Get on Marketplace](https://app.snowflake.com/marketplace/listing/GZT1Z7QRSWHE/dataplex-consulting-data-products-census-lehd-lodes-employment-dataset) |
| **Databricks** | [Free Trial](https://trial.dataplex-consulting.com)                                                                                                     |

{% hint style="success" %}
Questions? [Contact our team](mailto:support@dataplex-consulting.com) for a walkthrough.
{% endhint %}
