
© 2025 Dataplex Consulting & Data Products


Reddit/Subreddit Dataset

About the Dataset

The Reddit/Subreddit Dataset is a comprehensive collection of data tracking attributes of over 1.1 million subreddits, including daily subscriber counts. This dataset is available on the Snowflake Marketplace and offers valuable insights into Reddit's vast community of interest-based forums. Key features include:

  • Daily tracking of subscriber counts for each subreddit

  • Detailed attributes for each subreddit

  • AI-appended categories and related subjects for enhanced analysis

  • Coverage of public and private subreddits

  • Language and advertising status information

A free trial is available, providing seven days of access to data for all subreddits with more than 100,000 subscribers.

Dataset Features

  • Comprehensive Coverage: Data on over 1.1 million subreddits

  • Daily Updates: New records added daily for each subreddit

  • Rich Attribute Set: Includes subreddit type, description, creation date, and more

  • Subscriber Tracking: Daily subscriber count for each subreddit

  • AI-Enhanced Categorization: Appended categories and related subjects for improved analysis

  • Advertising Information: Includes advertising status and categories

  • Language Data: ISO language codes for each subreddit

  • Content Policy Indicators: Flags for NSFW content, image/video permissions, etc.

Data Quality and Maintenance

At Dataplex Consulting & Data Products, we prioritize data quality:

  • Daily monitoring of ingestion and ETL jobs

  • Automated data quality checks to prevent bad data from reaching customers

  • Regular updates to ensure data freshness and accuracy

The dataset is refreshed daily, adding new records for each subreddit to track attributes and subscriber numbers.
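As a consumer-side illustration of what a check on a daily load might look like, the sketch below flags subreddit/day pairs that appear more than once and subreddits that have no record on the latest day. It is a minimal sketch only: the record layout `(subreddit_id, date, subscribers)` and the function name `check_daily_load` are hypothetical and are not a description of Dataplex's internal checks.

```python
from collections import Counter
from datetime import date

# Hypothetical daily records: (subreddit_id, snapshot date, subscriber count).
# This layout is an assumption for illustration, not the published schema.
Record = tuple[int, date, int]


def check_daily_load(records: list[Record]) -> dict[str, list]:
    """Return duplicate (subreddit, day) pairs and subreddits missing on the latest day."""
    if not records:
        return {"duplicates": [], "missing_latest": []}

    # A daily refresh is expected to add one row per subreddit per day.
    pair_counts = Counter((sid, day) for sid, day, _ in records)
    duplicates = sorted(pair for pair, n in pair_counts.items() if n > 1)

    # Every subreddit seen so far should also appear on the most recent day.
    latest = max(day for _, day, _ in records)
    all_ids = {sid for sid, _, _ in records}
    ids_on_latest = {sid for sid, day, _ in records if day == latest}
    missing_latest = sorted(all_ids - ids_on_latest)

    return {"duplicates": duplicates, "missing_latest": missing_latest}


sample = [
    (1, date(2025, 1, 1), 1000),
    (1, date(2025, 1, 2), 1010),
    (2, date(2025, 1, 1), 500),
    (2, date(2025, 1, 1), 500),  # duplicate row for the same day
]
report = check_daily_load(sample)
```

Here subreddit 2 is reported both as a duplicate on 2025-01-01 and as missing from the latest day.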

Business Applications

The Reddit/Subreddit Dataset offers valuable insights for various business needs:

  • Market Analysis: Identify interest trends based on advertising categories and subscriber growth

  • Audience Segmentation: Discover fast-growing subreddit communities and break down metrics by language and advertising status

  • Customer Acquisition: Target audiences directly interested in products by leveraging appended categories and subscriber data

  • Content Strategy: Understand popular topics and community engagement across different subreddits

  • Trend Forecasting: Analyze subscriber growth patterns to predict emerging interests

Example Use Cases

  1. Identify top subreddits by category that allow advertisements

  2. Track the fastest-growing subreddits over a specified time period

  3. Analyze subscriber growth trends for specific interest areas

  4. Compare engagement across different languages or regions

  5. Discover related subreddits for targeted marketing campaigns
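
The growth-oriented use cases above come down to comparing a subreddit's subscriber count at the start and end of a window. The following is a minimal sketch of that arithmetic in Python, assuming daily snapshots have already been exported as `(subreddit, date, subscribers)` tuples; the function name `growth_rates` and the sample values are hypothetical.

```python
from datetime import date


def growth_rates(rows, start, end, min_subscribers=0):
    """Percent growth per subreddit between its first and last snapshot in [start, end]."""
    by_sub = {}
    for name, day, subs in rows:
        if start <= day <= end:
            by_sub.setdefault(name, []).append((day, subs))

    result = []
    for name, snaps in by_sub.items():
        snaps.sort()  # order snapshots by date
        first, last = snaps[0][1], snaps[-1][1]
        # Skip small communities and avoid dividing by zero.
        if first > min_subscribers and first > 0:
            result.append((name, round((last - first) / first * 100, 2)))

    # Rank on the numeric rate, fastest-growing first.
    return sorted(result, key=lambda r: r[1], reverse=True)


d1, d3 = date(2025, 1, 1), date(2025, 1, 3)
rows = [
    ("a", d1, 20000), ("a", d3, 21000),  # grew by 5%
    ("b", d1, 50000), ("b", d3, 49000),  # shrank by 2%
    ("c", d1, 500), ("c", d3, 1000),     # below the size threshold
]
ranked = growth_rates(rows, d1, d3, min_subscribers=10000)
```

Using the first and last snapshot by date, rather than the minimum and maximum count, keeps shrinking communities from being reported as growing.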

Data Structure

The dataset consists of three main tables:

  1. dim_subreddit: Dimension table containing subreddit attributes

  2. fct_subreddit_day: Fact table tracking daily subscriber counts

  3. dim_date: Date dimension table for time-based analysis

Entity Relationship Diagram
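
The way the three tables relate can be sketched locally with SQLite stand-ins. This is an illustration only: the join columns (`id`, `subreddit_id`, `date_key`) and the `subscribers`, `date_at`, `is_current_day`, and `advertiser_category` columns are taken from the sample queries on this page, while the column types, the integer `date_key` format, and all sample rows are assumptions.

```python
import sqlite3

# In-memory stand-ins for the three tables; types and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_subreddit (
    id INTEGER PRIMARY KEY,
    subreddit TEXT,
    advertiser_category TEXT
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    date_at TEXT,
    is_current_day INTEGER
);
CREATE TABLE fct_subreddit_day (
    subreddit_id INTEGER REFERENCES dim_subreddit(id),
    date_key INTEGER REFERENCES dim_date(date_key),
    subscribers INTEGER
);
INSERT INTO dim_subreddit VALUES (1, 'example_a', 'Technology'), (2, 'example_b', 'Gaming');
INSERT INTO dim_date VALUES (20250101, '2025-01-01', 0), (20250102, '2025-01-02', 1);
INSERT INTO fct_subreddit_day VALUES
    (1, 20250101, 100), (1, 20250102, 110),
    (2, 20250101, 50),  (2, 20250102, 55);
""")

# The fact table joins to each dimension on its key.
rows = conn.execute("""
SELECT s.subreddit, dd.date_at, f.subscribers
FROM fct_subreddit_day f
JOIN dim_subreddit s ON s.id = f.subreddit_id
JOIN dim_date dd ON dd.date_key = f.date_key
WHERE dd.is_current_day = 1
ORDER BY f.subscribers DESC
""").fetchall()
```

Each row of `fct_subreddit_day` represents one subreddit on one day, so filtering `dim_date` to the current day returns one row per subreddit.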

Sample Queries

  1. Find the top 10 largest subreddits:

SELECT subreddit, subreddit_url, last_subscribers_count
FROM dwv.dim_subreddit
ORDER BY last_subscribers_count DESC
LIMIT 10
  2. Get the top 10 trending subreddits in the past two days:

WITH subreddit_growth AS (
  SELECT s.subreddit,
         -- Subscriber counts on the earliest and latest dates in the window;
         -- MIN/MAX of the counts would report a shrinking subreddit as growing
         MIN_BY(sd.subscribers, dd.date_at) AS subscribers_start,
         MAX_BY(sd.subscribers, dd.date_at) AS subscribers_end
  FROM dwv.dim_subreddit s
  JOIN dwv.fct_subreddit_day sd ON s.id = sd.subreddit_id
  JOIN dwv.dim_date dd ON sd.date_key = dd.date_key
  WHERE dd.date_at > DATEADD(DAY, -2, CURRENT_DATE())
    AND NOT s.over18
  GROUP BY s.subreddit
)
SELECT subreddit,
       TO_CHAR(subscribers_start, '999,999,999') AS subscribers_start_fmt,
       TO_CHAR(subscribers_end, '999,999,999') AS subscribers_end_fmt,
       ROUND(((subscribers_end - subscribers_start) / subscribers_start) * 100, 2) || '%' AS growth_rate
FROM subreddit_growth
WHERE subscribers_start > 10000
-- Sort on the numeric rate; the formatted '%' string would sort lexicographically
ORDER BY (subscribers_end - subscribers_start) / subscribers_start DESC
LIMIT 10
  3. Find the top subreddit for each advertising category:

WITH ranked_subreddits AS (
  SELECT d.subreddit,
         d.advertiser_category,
         fct.subscribers,
         RANK() OVER (PARTITION BY d.advertiser_category ORDER BY fct.subscribers DESC) AS category_rank
  FROM dwv.dim_subreddit d
  JOIN dwv.fct_subreddit_day fct ON d.id = fct.subreddit_id
  JOIN dwv.dim_date dd ON fct.date_key = dd.date_key
  WHERE dd.is_current_day = 1
)
SELECT advertiser_category, subreddit, subscribers
FROM ranked_subreddits
WHERE category_rank = 1
ORDER BY subscribers DESC

Support and Contact

For any questions or assistance with the Reddit/Subreddit Dataset, please contact our support team at support@dataplex-consulting.com.

We monitor our data pipelines daily and are committed to providing high-quality, timely support to all our customers.

About Dataplex

Dataplex Consulting & Data Products is a leading provider of turnkey data products and consulting services. With over 20 years of experience serving businesses of all sizes, including Fortune 500 companies, we specialize in:

  • Creating accessible, high-quality data products

  • Implementing robust data pipelines with automatic quality checks

  • Offering expert data consulting services

  • Helping companies become more data-driven and increase revenue

Our team's extensive practical expertise ensures that we deliver solutions tailored to your specific business needs, driving success through data-informed decision-making.

Last updated 3 months ago

Email: support@dataplex-consulting.com