Let’s scrape Reddit data in R!

Manpreet Singh
4 min read · May 25, 2021

Welcome back! R is an awesome programming language for data science, so let’s do some data processing with it! In this project we’ll scrape some data from Reddit and format it. It’s a pretty basic project, but a great one nonetheless. Fun fact: I’ve created a tool for myself (definitely a bit more intricate than this) that I use on an everyday basis, so some components of this project are definitely reusable in other projects. With that out of the way, let’s get to coding!

Requirements

Before going further, make sure you have RStudio installed on your machine. I wrote an article the other month on how to install it.

I’m hoping you also have some experience with R. With that out of the way, let’s get started!

Building The Project

Let’s start off by installing the following packages (if needed):

install.packages("jsonlite")
install.packages("tidyverse")
install.packages("dplyr")
install.packages("httr")
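If you only want to install what is actually missing, here is a minimal sketch of one way to check. The helper name missing_packages is my own invention for illustration, not part of any package, and the install call is left commented out so nothing gets installed by accident.

```r
# Hypothetical helper (name chosen for illustration): return the packages
# from `pkgs` that cannot be loaded on this machine.
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE/FALSE without attaching the package
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("jsonlite", "tidyverse", "dplyr", "httr"))
print(to_install)

# Uncomment to install only the missing ones:
# if (length(to_install) > 0) install.packages(to_install)
```

Note that tidyverse already includes dplyr, so installing both is harmless but not strictly required.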

Great, now let’s load these packages in our R script:

library(jsonlite)
library(tidyverse)
library(dplyr)
library(httr)

Awesome! Luckily for us, Reddit exposes a JSON version of its pages: adding .json to the end of a URL returns that page’s data as JSON.
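To make the .json trick concrete, here is a small sketch that builds the JSON URL from a regular page URL. The function name reddit_json_url and the example subreddit are assumptions I picked for illustration, and the sketch assumes the URL has no query string.

```r
# Hypothetical helper (not from any package): turn a Reddit page URL into
# its JSON counterpart by appending ".json".
reddit_json_url <- function(url) {
  # Drop trailing slashes so the result does not end in "/.json"
  url <- sub("/+$", "", url)
  paste0(url, ".json")
}

# Example subreddit chosen purely for illustration
json_url <- reddit_json_url("https://www.reddit.com/r/rstats/")
print(json_url)
```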

We can use the jsonlite package to parse this JSON into R objects, including data frames. We do so with the following command:

btc <- jsonlite::fromJSON("ENTER REDDIT URL HERE")

btc is just a variable name, so you can change it to whatever you want. You also need to replace the placeholder above with a real Reddit URL (in this case, a subreddit). In my example this is the code that I’m going to use…
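Separately from that code, here is a rough offline sketch of what fromJSON hands back, so you can explore the structure without hitting Reddit. The JSON string below is a made-up sample that only mimics the general shape I’d expect from a subreddit listing (a data object holding a children array, each child with its own data object). The titles and numbers are invented, and a real response carries many more fields.

```r
library(jsonlite)

# Made-up sample (assumption): mimics the nesting of a subreddit listing
sample_json <- '{
  "kind": "Listing",
  "data": {
    "children": [
      {"kind": "t3", "data": {"title": "First post",  "score": 10, "num_comments": 2}},
      {"kind": "t3", "data": {"title": "Second post", "score": 5,  "num_comments": 0}}
    ]
  }
}'

# fromJSON accepts a JSON string as well as a URL
listing <- jsonlite::fromJSON(sample_json)

# Arrays of objects are simplified to data frames, so the posts end up
# in a nested data frame with one row per post
posts <- listing$data$children$data
str(posts)
```

With a real URL in place of sample_json, the same listing$data$children$data path is where I’d look first for the posts, but check str() on your own result since the layout is Reddit’s to change.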
