Census LEHD LODES Employment Dataset

About the Dataset

Census block-level employment data from the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) program. Origin-destination commute flows, workforce demographics, and job characteristics with geographic crosswalk for county, metro, and state aggregation. 22 years of annual data (2002-2023) covering all 50 states, DC, and Puerto Rico.

75,000 files condensed into one SQL query.

Get Full Access | Snowflake Marketplace | Free Trial

Quick Access

Tables: OD, RAC, WAC, XWALK + 4 metadata tables Sources: 4 Census LEHD data sources Coverage: All US Census blocks (~8 million), 22 years (2002-2023) Update Frequency: Annually (1-2 year lag from reference year to Census release)

Overview

The Census LEHD LODES dataset provides comprehensive access to employment geography data including:

Origin-Destination (OD) - Block-to-block commute flows: where workers live and where they work. 2.6 billion rows across 22 years.
Residence Area Characteristics (RAC) - Jobs by where workers live with 42 demographic columns (age, earnings, 20 NAICS sectors, race, ethnicity, education, sex). 119 million rows.
Workplace Area Characteristics (WAC) - Jobs by where people work with 52 columns including firm age and firm size (WAC-only). 48.5 million rows.
Geographic Crosswalk (XWALK) - Maps every 2020 Census block to tracts, counties, metros, congressional districts, ZCTAs, and coordinates. 8.2 million rows. Always fully available (no trial limit).

Metadata Tables

Every Dataplex data product includes these standard metadata tables:

Table

Purpose

FEEDS

Dataset catalog — available tables, descriptions, update dates

FEEDS_FILES

Batch load history with is_latest flag for data freshness

CHANGELOG

Change log — data loads, schema changes, corrections

DATA_DICTIONARY

Column descriptions for all tables

Entity Relationship Diagram

Join pattern: OD, RAC, and WAC all join to XWALK via geocode columns (w_geocode or h_geocode = tabblk2020) for geographic aggregation from blocks to counties, metros, and states. All data tables link to FEEDS and FEEDS_FILES via feed_id and feeds_files_id for data lineage.

Data Tables

OD (Origin-Destination)

Block-to-block commute flows: where workers live and where they work. Each row represents a unique home-block to work-block pair for a given year. JOIN to XWALK on w_geocode or h_geocode to aggregate to county, metro, or state level.

Key Features:

2.6 billion rows across 22 years (2002-2023)
Census block-level granularity (15-digit FIPS codes)
Job counts segmented by age (3), earnings (3), and industry (3)
main files (both residence and workplace in same state) and aux files (workplace in state, residence elsewhere)

Primary Key: w_geocode + h_geocode + year + part

See Schema Reference for all 19 columns.

RAC (Residence Area Characteristics)

Jobs by where workers live. Each row is a Census block where workers reside, with 42 demographic breakdown columns. JOIN to XWALK on h_geocode to aggregate.

Key Features:

119 million rows across 22 years
20 NAICS industry sectors (CNS01-CNS20)
7 race categories, 2 ethnicity groups, 4 education levels, 2 sex categories
3 age segments and 3 earnings brackets

Primary Key: h_geocode + year

See Schema Reference for all 48 columns.

WAC (Workplace Area Characteristics)

Jobs by where people work. Each row is a Census block where jobs are located. Identical demographic columns to RAC plus firm age (CFA) and firm size (CFS) columns not available in RAC.

Key Features:

48.5 million rows across 22 years
All RAC columns plus 5 firm age groups (CFA01-CFA05) and 5 firm size groups (CFS01-CFS05)
WAC uses noise infusion (NOT synthetic like RAC/OD) — reliable at 10+ jobs per block

Primary Key: w_geocode + year

See Schema Reference for all 58 columns.

XWALK (Geographic Crosswalk)

Reference table mapping every 2020 Census block to higher geographies. This is the aggregation enabler — JOIN OD/RAC/WAC to XWALK to roll up block-level data to counties, metros, states, congressional districts, or ZCTAs.

Key Features:

8.2 million rows (one per 2020 Census block)
Maps to: state, county, tract, block group, CBSA/metro, ZCTA, congressional district, place, school district, and more
Includes block centroid coordinates (latitude/longitude)
Always fully available — no trial limit (essential for any analysis)

Primary Key: tabblk2020

See Schema Reference for all 45 columns.

Data Quality

Data Generation Methods

Table

Method

Reliability

Full synthetic data generation

Statistically representative; reliable at county level and above

RAC

Full synthetic data generation

Statistically representative; reliable at county level and above

WAC

Multiplicative noise infusion

Reliable at 10+ jobs per block

XWALK

Exact Census geography

Exact

Standardization

All geocode columns zero-padded to 15 digits (preserved as strings, not integers)
All geographic codes preserve leading zeros (state, county, tract, CBSA)
Year extracted from filenames and added as a typed integer column
All job count columns cast to NUMBER with TRY_TO_NUMBER for safe handling

Data Freshness

Check when data was last updated:

SELECT source_name, data_period, row_count, created_at, is_latest
FROM DWV.FEEDS_FILES
WHERE is_latest = TRUE;

Getting Started

Platform Schema Reference

This dataset is available on both Snowflake and Databricks. Queries use schema-only references — the database is already set by the share or catalog context:

Platform

Schema

Example

Snowflake

DWV

DWV.OD

Databricks

census_lehd_lodes_dwv

census_lehd_lodes_dwv.od

Discover Available Data

Start with the FEEDS table to see what's available, and FEEDS_FILES to understand data freshness.

-- Browse the data catalog
SELECT table_name, description, latest_data_period,
       latest_row_count, last_loaded_at, total_batches
FROM DWV.FEEDS
ORDER BY table_name;

-- Check data freshness per source
SELECT source_name, data_period, row_count, row_count_delta,
       created_at AS loaded_at, is_latest
FROM DWV.FEEDS_FILES
ORDER BY source_name, created_at DESC;

Working with Data Lineage

Every data row links to FEEDS_FILES via feeds_files_id, which tells you exactly which batch loaded that data. Use this to filter to the current data version or trace any row back to its source load.

-- Get current WAC data with batch lineage
SELECT w.w_geocode, w.year, w.C000, w.CNS16,
       ff.data_period, ff.created_at AS data_loaded_at
FROM DWV.WAC w
JOIN DWV.FEEDS_FILES ff ON w.feeds_files_id = ff.id
WHERE ff.is_latest = TRUE
LIMIT 20;

SELECT w.w_geocode, w.year, w.C000, w.CNS16,
       ff.data_period, ff.created_at AS data_loaded_at
FROM census_lehd_lodes_dwv.wac w
JOIN census_lehd_lodes_dwv.feeds_files ff ON w.feeds_files_id = ff.id
WHERE ff.is_latest = TRUE
LIMIT 20;

Top Employment Counties

Aggregate workplace block data to county level to find the highest employment centers.

SELECT x.ctyname AS county, x.stname AS state,
       SUM(w.C000) AS total_jobs,
       SUM(w.CNS16) AS healthcare_jobs,
       SUM(w.CNS05) AS manufacturing_jobs
FROM DWV.WAC w
JOIN DWV.XWALK x ON w.w_geocode = x.tabblk2020
WHERE w.year = 2023
GROUP BY 1, 2
ORDER BY total_jobs DESC
LIMIT 25;

SELECT x.ctyname AS county, x.stname AS state,
       SUM(w.C000) AS total_jobs,
       SUM(w.CNS16) AS healthcare_jobs,
       SUM(w.CNS05) AS manufacturing_jobs
FROM census_lehd_lodes_dwv.wac w
JOIN census_lehd_lodes_dwv.xwalk x ON w.w_geocode = x.tabblk2020
WHERE w.year = 2023
GROUP BY 1, 2
ORDER BY total_jobs DESC
LIMIT 25;

Commute Flows Between Counties

Find the largest commute flows between counties using OD + XWALK.

SELECT xw.ctyname AS work_county, xw.stname AS work_state,
       xh.ctyname AS home_county, xh.stname AS home_state,
       SUM(od.S000) AS total_commuters
FROM DWV.OD od
JOIN DWV.XWALK xw ON od.w_geocode = xw.tabblk2020
JOIN DWV.XWALK xh ON od.h_geocode = xh.tabblk2020
WHERE od.year = 2023 AND od.part = 'main'
GROUP BY 1, 2, 3, 4
ORDER BY total_commuters DESC
LIMIT 25;

SELECT xw.ctyname AS work_county, xw.stname AS work_state,
       xh.ctyname AS home_county, xh.stname AS home_state,
       SUM(od.S000) AS total_commuters
FROM census_lehd_lodes_dwv.od od
JOIN census_lehd_lodes_dwv.xwalk xw ON od.w_geocode = xw.tabblk2020
JOIN census_lehd_lodes_dwv.xwalk xh ON od.h_geocode = xh.tabblk2020
WHERE od.year = 2023 AND od.part = 'main'
GROUP BY 1, 2, 3, 4
ORDER BY total_commuters DESC
LIMIT 25;

Workforce Demographics by Metro

Analyze workforce age, earnings, and industry composition at the metropolitan area level.

SELECT x.cbsaname AS metro_area,
       SUM(r.C000) AS total_workers,
       ROUND(SUM(r.CE03) / NULLIF(SUM(r.C000), 0) * 100, 1) AS pct_high_earners,
       ROUND(SUM(r.CA01) / NULLIF(SUM(r.C000), 0) * 100, 1) AS pct_under_30
FROM DWV.RAC r
JOIN DWV.XWALK x ON r.h_geocode = x.tabblk2020
WHERE r.year = 2023 AND x.cbsa IS NOT NULL
GROUP BY 1
HAVING SUM(r.C000) > 100000
ORDER BY total_workers DESC
LIMIT 25;

SELECT x.cbsaname AS metro_area,
       SUM(r.C000) AS total_workers,
       ROUND(SUM(r.CE03) / NULLIF(SUM(r.C000), 0) * 100, 1) AS pct_high_earners,
       ROUND(SUM(r.CA01) / NULLIF(SUM(r.C000), 0) * 100, 1) AS pct_under_30
FROM census_lehd_lodes_dwv.rac r
JOIN census_lehd_lodes_dwv.xwalk x ON r.h_geocode = x.tabblk2020
WHERE r.year = 2023 AND x.cbsa IS NOT NULL
GROUP BY 1
HAVING SUM(r.C000) > 100000
ORDER BY total_workers DESC
LIMIT 25;

Employment Trend Over Time

Track total job counts by county across years to identify growth and decline.

SELECT x.ctyname AS county, x.stname AS state,
       w.year,
       SUM(w.C000) AS total_jobs
FROM DWV.WAC w
JOIN DWV.XWALK x ON w.w_geocode = x.tabblk2020
WHERE x.cty = '06037'  -- Los Angeles County
GROUP BY 1, 2, 3
ORDER BY w.year;

SELECT x.ctyname AS county, x.stname AS state,
       w.year,
       SUM(w.C000) AS total_jobs
FROM census_lehd_lodes_dwv.wac w
JOIN census_lehd_lodes_dwv.xwalk x ON w.w_geocode = x.tabblk2020
WHERE x.cty = '06037'  -- Los Angeles County
GROUP BY 1, 2, 3
ORDER BY w.year;

Tracking Data Changes Over Time

FEEDS_FILES records every batch load with row_count_delta showing what changed. Use this to monitor source data updates.

SELECT source_name, data_period, row_count, row_count_delta,
       created_at AS loaded_at, is_latest
FROM DWV.FEEDS_FILES
ORDER BY source_name, created_at;

Ready to access Census LEHD LODES data?

Platform

Action

Snowflake

Get on Marketplace

Databricks

Free Trial

Questions? Contact our team for a walkthrough.

PreviousYouth Risk Behaviors NextSchema Reference

Last updated 39 minutes ago

hashtagAbout the Dataset

hashtagQuick Access

hashtagOverview

hashtagMetadata Tables

hashtagEntity Relationship Diagram

hashtagData Tables

hashtagOD (Origin-Destination)

hashtagRAC (Residence Area Characteristics)

hashtagWAC (Workplace Area Characteristics)

hashtagXWALK (Geographic Crosswalk)

hashtagData Quality

hashtagData Generation Methods

hashtagStandardization

hashtagData Freshness

hashtagGetting Started

hashtagPlatform Schema Reference

hashtagDiscover Available Data

hashtagWorking with Data Lineage

hashtagTop Employment Counties

hashtagCommute Flows Between Counties

hashtagWorkforce Demographics by Metro

hashtagEmployment Trend Over Time

hashtagTracking Data Changes Over Time

About the Dataset

Quick Access

Overview

Metadata Tables

Entity Relationship Diagram

Data Tables

OD (Origin-Destination)

RAC (Residence Area Characteristics)

WAC (Workplace Area Characteristics)

XWALK (Geographic Crosswalk)

Data Quality

Data Generation Methods

Standardization

Data Freshness

Getting Started

Platform Schema Reference

Discover Available Data

Working with Data Lineage

Top Employment Counties

Commute Flows Between Counties

Workforce Demographics by Metro

Employment Trend Over Time

Tracking Data Changes Over Time