Reddit/Subreddit Dataset
About the Dataset
The Reddit/Subreddit Dataset tracks the attributes of over 1.1 million subreddits, including daily subscriber counts. It is available on the Snowflake Marketplace and offers insight into Reddit's community of interest-based forums. Key features include:
Daily tracking of subscriber counts for each subreddit
Detailed attributes for each subreddit
AI-appended categories and related subjects for enhanced analysis
Coverage of public and private subreddits
Language and advertising status information
A free trial is available, providing seven days of access to data for all subreddits with more than 100,000 subscribers.
Dataset Features
Comprehensive Coverage: Data on over 1.1 million subreddits
Daily Updates: New records added daily for each subreddit
Rich Attribute Set: Includes subreddit type, description, creation date, and more
Subscriber Tracking: Daily subscriber count for each subreddit
AI-Enhanced Categorization: Appended categories and related subjects for improved analysis
Advertising Information: Includes advertising status and categories
Language Data: ISO language codes for each subreddit
Content Policy Indicators: Flags for NSFW content, image/video permissions, etc.
Data Quality and Maintenance
At Dataplex Consulting & Data Products, we prioritize data quality:
Daily monitoring of ingestion and ETL jobs
Automated data quality checks to prevent bad data from reaching customers
Regular updates to ensure data freshness and accuracy
The dataset is refreshed daily, adding new records for each subreddit to track attributes and subscriber numbers.
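The daily refresh described above can be spot-checked from the consumer side. The query below is a minimal sketch, assuming the fact table holds one row per subreddit per day and using only the table and column names that appear in the sample queries in this document; the seven-day window is an arbitrary choice.

```sql
-- Rows loaded per day over the last week, newest first.
-- Assumption: fct_subreddit_day has one row per subreddit per day,
-- so rows_loaded and subreddits_covered should match for each date.
SELECT dd.date_at,
       COUNT(*)                        AS rows_loaded,
       COUNT(DISTINCT sd.subreddit_id) AS subreddits_covered
FROM dwv.fct_subreddit_day sd
JOIN dwv.dim_date dd ON sd.date_key = dd.date_key
WHERE dd.date_at > DATEADD(DAY, -7, CURRENT_DATE())
GROUP BY dd.date_at
ORDER BY dd.date_at DESC;
```

A missing date or a sharp drop in rows_loaded would suggest a delayed or partial load for that day.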
Business Applications
The Reddit/Subreddit Dataset offers valuable insights for various business needs:
Market Analysis: Identify interest trends based on advertising categories and subscriber growth
Audience Segmentation: Discover fast-growing subreddit communities and break down metrics by language and advertising status
Customer Acquisition: Target audiences directly interested in products by leveraging appended categories and subscriber data
Content Strategy: Understand popular topics and community engagement across different subreddits
Trend Forecasting: Analyze subscriber growth patterns to predict emerging interests
Example Use Cases
Identify top subreddits by category that allow advertisements
Track the fastest-growing subreddits over a specified time period
Analyze subscriber growth trends for specific interest areas
Compare subscriber counts and growth across different languages
Discover related subreddits for targeted marketing campaigns
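As one illustration of these use cases, the sketch below summarizes the current day's subscribers by advertising category. It relies only on columns that appear in the sample queries in this document; treating a NULL advertiser_category as "uncategorized" and filtering it out is an assumption.

```sql
-- Subreddit count and total subscribers per advertising category,
-- based on the current day's snapshot.
SELECT d.advertiser_category,
       COUNT(*)             AS subreddit_count,
       SUM(fct.subscribers) AS total_subscribers
FROM dwv.dim_subreddit d
JOIN dwv.fct_subreddit_day fct ON d.id = fct.subreddit_id
JOIN dwv.dim_date dd ON fct.date_key = dd.date_key
WHERE dd.is_current_day = 1
  AND d.advertiser_category IS NOT NULL  -- assumption: NULL means uncategorized
GROUP BY d.advertiser_category
ORDER BY total_subscribers DESC;
```

Because one Reddit user can subscribe to many subreddits, total_subscribers counts subscriptions, not unique people.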
Data Structure
The dataset consists of three main tables:
dim_subreddit: Dimension table containing subreddit attributes
fct_subreddit_day: Fact table tracking daily subscriber counts
dim_date: Date dimension table for time-based analysis
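The three tables join on the keys used in the sample queries in this document: dim_subreddit.id to fct_subreddit_day.subreddit_id, and fct_subreddit_day.date_key to dim_date.date_key. The sketch below follows that join path to build a 30-day subscriber history for a single subreddit. The filter value 'python' is a hypothetical example, and the exact format of names in the subreddit column is an assumption.

```sql
-- Daily subscriber history with day-over-day change for one subreddit.
SELECT s.subreddit,
       dd.date_at,
       sd.subscribers,
       -- Difference from the previous day's count; NULL on the first day in the window
       sd.subscribers
         - LAG(sd.subscribers) OVER (PARTITION BY s.id ORDER BY dd.date_at) AS daily_change
FROM dwv.dim_subreddit s
JOIN dwv.fct_subreddit_day sd ON s.id = sd.subreddit_id
JOIN dwv.dim_date dd ON sd.date_key = dd.date_key
WHERE s.subreddit = 'python'  -- hypothetical value; adjust to the stored name format
  AND dd.date_at > DATEADD(DAY, -30, CURRENT_DATE())
ORDER BY dd.date_at;
```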
Entity Relationship Diagram
Sample Queries
Find the top 10 largest subreddits:
SELECT subreddit, subreddit_url, last_subscribers_count
FROM dwv.dim_subreddit
ORDER BY last_subscribers_count DESC
LIMIT 10;
Get the top 10 trending subreddits in the past two days:
WITH subreddit_growth AS (
    SELECT s.subreddit,
           -- Counts on the earliest and latest day in the window, so that
           -- shrinking subreddits are not reported as growing
           MIN_BY(sd.subscribers, dd.date_at) AS subscribers_start,
           MAX_BY(sd.subscribers, dd.date_at) AS subscribers_end
    FROM dwv.dim_subreddit s
    JOIN dwv.fct_subreddit_day sd ON s.id = sd.subreddit_id
    JOIN dwv.dim_date dd ON sd.date_key = dd.date_key
    WHERE dd.date_at > DATEADD(DAY, -2, CURRENT_DATE())
      AND NOT s.over18  -- exclude NSFW subreddits
    GROUP BY s.subreddit
)
SELECT subreddit,
       TO_CHAR(subscribers_start, '999,999,999') AS subscribers_start_fmt,
       TO_CHAR(subscribers_end, '999,999,999') AS subscribers_end_fmt,
       ROUND(((subscribers_end - subscribers_start) / subscribers_start) * 100, 2) AS growth_rate_pct
FROM subreddit_growth
WHERE subscribers_start > 10000  -- ignore small subreddits
-- Sort on the numeric rate; a string such as '9.5%' would sort lexicographically
ORDER BY growth_rate_pct DESC
LIMIT 10;
Find the top subreddit for each advertising category:
WITH ranked_subreddits AS (
    SELECT d.subreddit,
           d.advertiser_category,
           fct.subscribers,
           RANK() OVER (PARTITION BY d.advertiser_category ORDER BY fct.subscribers DESC) AS category_rank
    FROM dwv.dim_subreddit d
    JOIN dwv.fct_subreddit_day fct ON d.id = fct.subreddit_id
    JOIN dwv.dim_date dd ON fct.date_key = dd.date_key
    WHERE dd.is_current_day = 1
)
SELECT advertiser_category, subreddit, subscribers
FROM ranked_subreddits
WHERE category_rank = 1
ORDER BY subscribers DESC;
Support and Contact
For any questions or assistance with the Reddit/Subreddit Dataset, please contact our support team:
Email: [email protected]
We monitor our data pipelines daily and are committed to providing high-quality, timely support to all our customers.
About Dataplex
Dataplex Consulting & Data Products is a leading provider of turnkey data products and consulting services. With over 20 years of experience serving businesses of all sizes, including Fortune 500 companies, we specialize in:
Creating accessible, high-quality data products
Implementing robust data pipelines with automatic quality checks
Offering expert data consulting services
Helping companies become more data-driven and increase revenue
Our team's extensive practical expertise ensures that we deliver solutions tailored to your specific business needs, driving success through data-informed decision-making.