How to scrape data from any website using R!

Manpreet Singh
5 min read · Apr 24, 2021

Welcome back! Web scraping is one of my favorite things to do (if you couldn’t tell from the many articles I’ve written about it), so let’s do some web scraping using the fantastic programming language R! This is a very beginner-friendly tutorial, but I’m assuming you have R installed on your machine and know a little bit about how the language works. If that sounds like you, then let’s get started!

Installation

The package we’re going to be using is rvest, which is pretty much Beautiful Soup (the Python package) for our R environment. To install it, run the following command in your R console:

install.packages("rvest")

Awesome! You’ve just installed the package for this tutorial!
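If you want to confirm the install worked, here is a minimal sketch that loads the package and parses a tiny piece of HTML. The inline HTML string, the variable names, and the CRAN mirror URL are my own assumptions for illustration, and it assumes a recent rvest (1.0.0 or later) for `html_element()`.

```r
# Install rvest only if it is missing. The mirror URL is an assumption;
# any CRAN mirror should work.
if (!requireNamespace("rvest", quietly = TRUE)) {
  install.packages("rvest", repos = "https://cloud.r-project.org")
}

# Load the package so its functions are available in this session.
library(rvest)

# Parse a small, made-up HTML string instead of a real website.
page <- read_html("<html><body><p>Hello, rvest!</p></body></html>")

# Find the first <p> element and pull out the text inside it.
greeting <- html_text(html_element(page, "p"))
print(greeting)
```

If this prints the text from the paragraph without errors, the package is installed and ready to use.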

Understanding HTML

Before we start pulling data, we first need to understand how the data we want to scrape is laid out on the page. When I started using this package, I saw a ton of tutorials speed past this part, which left me stuck on tons of basic steps. This is a very important concept to understand when web scraping, so let’s take a look at the following HTML code:

<!DOCTYPE html>
<html>
<body>
<p1>This is a test.</p1>
<p2>This is not a test.</p2>
<p3>This is still a…
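To see why the layout matters, here is a rough sketch of how tag names map to what you can pull out with rvest. It uses its own made-up page rather than the snippet above, so the HTML string and variable names are assumptions for illustration only. It also assumes rvest 1.0.0 or later.

```r
library(rvest)

# Hypothetical sample page: every tag name here is something a selector can target.
html <- "<html><body>
  <h1>Sample page</h1>
  <p>This is a test.</p>
  <p>This is not a test.</p>
</body></html>"

# read_html() accepts literal HTML as well as a URL or a file path.
page <- read_html(html)

# Select every <p> element by tag name, then extract the text inside each one.
paragraphs <- html_text(html_elements(page, "p"))
print(paragraphs)

# Select the single <h1> element the same way.
heading <- html_text(html_element(page, "h1"))
print(heading)
```

The selector string ("p" or "h1") is just the tag name, which is why knowing how a page is structured tells you what to ask for.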
