Reddit/Subreddit Dataset

About the Dataset

The Reddit/Subreddit Dataset is a comprehensive collection of data tracking attributes of over 1.1 million subreddits, including daily subscriber counts. This dataset is available on the Snowflake Marketplace and offers valuable insights into Reddit's vast community of interest-based forums. Key features include:

  • Daily tracking of subscriber counts for each subreddit

  • Detailed attributes for each subreddit

  • AI-appended categories and related subjects for enhanced analysis

  • Coverage of public and private subreddits

  • Language and advertising status information

A free trial is available, providing seven days of access to data for all subreddits with more than 100,000 subscribers.

🔗 Find the Reddit/Subreddit Dataset on the Snowflake Marketplace.

Dataset Features

  • Comprehensive Coverage: Data on over 1.1 million subreddits

  • Daily Updates: New records added daily for each subreddit

  • Rich Attribute Set: Includes subreddit type, description, creation date, and more

  • Subscriber Tracking: Daily subscriber count for each subreddit

  • AI-Enhanced Categorization: Appended categories and related subjects for improved analysis

  • Advertising Information: Includes advertising status and categories

  • Language Data: ISO language codes for each subreddit

  • Content Policy Indicators: Flags for NSFW content, image/video permissions, etc.

Data Quality and Maintenance

At Dataplex Consulting & Data Products, we prioritize data quality:

  • Daily monitoring of ingestion and ETL jobs

  • Automated data quality checks to prevent bad data from reaching customers

  • Regular updates to ensure data freshness and accuracy

The dataset is refreshed daily, adding new records for each subreddit to track attributes and subscriber numbers.

Business Applications

The Reddit/Subreddit Dataset offers valuable insights for various business needs:

  • Market Analysis: Identify interest trends based on advertising categories and subscriber growth

  • Audience Segmentation: Discover fast-growing subreddit communities and break down metrics by language and advertising status

  • Customer Acquisition: Target audiences directly interested in products by leveraging appended categories and subscriber data

  • Content Strategy: Understand popular topics and community engagement across different subreddits

  • Trend Forecasting: Analyze subscriber growth patterns to predict emerging interests

Example Use Cases

  1. Identify top subreddits by category that allow advertisements

  2. Track the fastest-growing subreddits over a specified time period

  3. Analyze subscriber growth trends for specific interest areas

  4. Compare engagement across different languages or regions

  5. Discover related subreddits for targeted marketing campaigns

Data Structure

The dataset consists of three main tables:

  1. dim_subreddit: Dimension table containing subreddit attributes

  2. fct_subreddit_day: Fact table tracking daily subscriber counts

  3. dim_date: Date dimension table for time-based analysis
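To make the relationship between the three tables concrete, here is a minimal sketch that builds toy versions of them in an in-memory SQLite database and joins the fact table to both dimensions. Only the table names come from the dataset. Every column name (subreddit_id, subreddit_name, date_id, calendar_date, subscriber_count) and every row is a hypothetical placeholder, so check the actual schema in Snowflake before adapting the query.

```python
import sqlite3

# Toy star schema. Table names match the dataset documentation; all
# column names and rows are hypothetical, not the published schema.
SCHEMA = """
CREATE TABLE dim_subreddit (
    subreddit_id   INTEGER PRIMARY KEY,
    subreddit_name TEXT
);
CREATE TABLE dim_date (
    date_id       INTEGER PRIMARY KEY,
    calendar_date TEXT
);
CREATE TABLE fct_subreddit_day (
    subreddit_id     INTEGER REFERENCES dim_subreddit (subreddit_id),
    date_id          INTEGER REFERENCES dim_date (date_id),
    subscriber_count INTEGER
);
"""


def build_toy_db():
    """Create an in-memory database with three subreddits over two days."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dim_subreddit VALUES (?, ?)",
        [(1, "alpha"), (2, "beta"), (3, "gamma")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?)",
        [(20240101, "2024-01-01"), (20240102, "2024-01-02")],
    )
    conn.executemany(
        "INSERT INTO fct_subreddit_day VALUES (?, ?, ?)",
        [
            (1, 20240101, 100), (2, 20240101, 300), (3, 20240101, 200),
            (1, 20240102, 150), (2, 20240102, 310), (3, 20240102, 190),
        ],
    )
    return conn


def largest_subreddits(conn, limit=10):
    """Return (name, subscribers) for the most recent date, largest first."""
    query = """
        SELECT s.subreddit_name, f.subscriber_count
        FROM fct_subreddit_day AS f
        JOIN dim_subreddit AS s ON s.subreddit_id = f.subreddit_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        WHERE d.calendar_date = (SELECT MAX(calendar_date) FROM dim_date)
        ORDER BY f.subscriber_count DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()
```

With the toy rows above, `largest_subreddits(build_toy_db(), limit=2)` returns beta followed by gamma. The same join pattern should carry over to Snowflake SQL once the placeholder column names are swapped for the real ones.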

Entity Relationship Diagram

Figure: Reddit Subreddit Data Model

Sample Queries

  1. Find the top 10 largest subreddits

  2. Get the top 10 trending subreddits in the past two days

  3. Find the top subreddit for each advertising category
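As a rough illustration of how the trending and per-category queries listed above might be approached, the sketch below runs them against a toy in-memory SQLite database. It is not the vendor's query text. The table names are taken from the Data Structure section, while all column names (including advertising_category), the measure of "trending" as absolute subscriber growth between two dates, and the sample rows are assumptions made for the example.

```python
import sqlite3


def build_toy_db():
    """Toy tables; column names and rows are hypothetical placeholders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_subreddit (
            subreddit_id INTEGER PRIMARY KEY,
            subreddit_name TEXT,
            advertising_category TEXT
        );
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, calendar_date TEXT);
        CREATE TABLE fct_subreddit_day (
            subreddit_id INTEGER, date_id INTEGER, subscriber_count INTEGER
        );
    """)
    conn.executemany(
        "INSERT INTO dim_subreddit VALUES (?, ?, ?)",
        [(1, "alpha", "gaming"), (2, "beta", "gaming"), (3, "gamma", "food")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?)",
        [(20240101, "2024-01-01"), (20240103, "2024-01-03")],
    )
    conn.executemany(
        "INSERT INTO fct_subreddit_day VALUES (?, ?, ?)",
        [
            (1, 20240101, 100), (2, 20240101, 300), (3, 20240101, 200),
            (1, 20240103, 150), (2, 20240103, 310), (3, 20240103, 190),
        ],
    )
    return conn


def trending_subreddits(conn, start_date, end_date, limit=10):
    """Rank subreddits by subscriber growth between two calendar dates."""
    query = """
        SELECT s.subreddit_name,
               cur.subscriber_count - prev.subscriber_count AS growth
        FROM fct_subreddit_day AS cur
        JOIN dim_date AS d_cur ON d_cur.date_id = cur.date_id
        JOIN fct_subreddit_day AS prev ON prev.subreddit_id = cur.subreddit_id
        JOIN dim_date AS d_prev ON d_prev.date_id = prev.date_id
        JOIN dim_subreddit AS s ON s.subreddit_id = cur.subreddit_id
        WHERE d_cur.calendar_date = :end_date
          AND d_prev.calendar_date = :start_date
        ORDER BY growth DESC
        LIMIT :limit
    """
    params = {"start_date": start_date, "end_date": end_date, "limit": limit}
    return conn.execute(query, params).fetchall()


def top_subreddit_per_category(conn, on_date):
    """Return the largest subreddit in each (assumed) advertising category.

    Ties within a category would yield more than one row for it.
    """
    query = """
        SELECT s.advertising_category, s.subreddit_name, f.subscriber_count
        FROM fct_subreddit_day AS f
        JOIN dim_date AS d ON d.date_id = f.date_id
        JOIN dim_subreddit AS s ON s.subreddit_id = f.subreddit_id
        WHERE d.calendar_date = :on_date
          AND f.subscriber_count = (
              SELECT MAX(f2.subscriber_count)
              FROM fct_subreddit_day AS f2
              JOIN dim_date AS d2 ON d2.date_id = f2.date_id
              JOIN dim_subreddit AS s2 ON s2.subreddit_id = f2.subreddit_id
              WHERE d2.calendar_date = :on_date
                AND s2.advertising_category = s.advertising_category
          )
        ORDER BY s.advertising_category
    """
    return conn.execute(query, {"on_date": on_date}).fetchall()
```

The self-join and correlated subquery are used here only because they run on any SQLite build. On Snowflake, window functions such as LAG and ROW_NUMBER are likely a more natural way to express the same logic.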

Support and Contact

For any questions or assistance with the Reddit/Subreddit Dataset, please contact our support team.

We monitor our data pipelines daily and are committed to providing high-quality, timely support to all our customers.

About Dataplex

Dataplex Consulting & Data Products delivers turnkey, analytics-ready data products that make complex public and commercial data easy to use across modern data platforms. Our data pipelines include automated quality checks and active monitoring to ensure timely, reliable, and well-structured data that is ready for downstream analytics, machine learning, and operational use.

In addition to data products, Dataplex provides data engineering and analytics consulting services to organizations of all sizes. We bring deep, hands-on experience supporting both early-stage companies and large enterprises, helping teams build scalable data platforms, improve data reliability, and become more data-driven.
