Welcome back! Web scraping is one of my favorite things to do (if you couldn’t tell from the millions of articles I’ve written about it), so let’s do some web scraping using the fantastic programming language R! This is a very beginner-friendly tutorial, but I’m assuming you have R installed on your machine and know a little bit about how the language works. If that sounds like you, then let’s get started!
The specific package we’re going to be using is rvest. It’s pretty much BeautifulSoup (the Python package) but for our R environment. To install this package, use the following command in your R console:
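A minimal sketch of the install step, assuming you're pulling the package from CRAN (the usual home for R packages) and have an internet connection:

```r
# Download and install rvest from CRAN (assumed source); run this once per machine.
install.packages("rvest")
```

You only need to install once. In each new R session you'd then load the package with library(rvest) before using it.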
Awesome! You’ve just installed the package for this tutorial!
Now, before we start pulling data, we first need to learn how the data we’re going to scrape is laid out. When I started using this package, I saw a ton of tutorials speed past this part, which left me stuck on tons of basic steps. This is a very important concept to understand in web scraping, so let’s take a look at the following HTML code:
<body><p1>This is a test.</p1>
<p2>This is not a test.</p2>
<p3>This is still a test.</p3></body>
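To make this layout concrete, here's a minimal sketch of how rvest could read a snippet like the one above. It assumes rvest 1.0 or newer (older versions use html_node() in place of html_element()), and I've written each closing tag to match its opening tag so the parser sees three separate elements. The variable names page and second are just my picks for illustration.

```r
library(rvest)

# read_html() accepts a literal string of HTML as well as a URL.
page <- read_html("<body><p1>This is a test.</p1>
<p2>This is not a test.</p2>
<p3>This is still a test.</p3></body>")

# Pick a single element by its tag name, then extract the text inside it.
second <- html_text(html_element(page, "p2"))
print(second)
```

The idea to take away is that each tag name acts like an address: once you know which tag wraps the text you want, you can ask for exactly that piece and ignore the rest.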
Also, if you’re wondering how you can find the HTML code of a website, almost every browser (including Safari, Firefox and Chrome) lets you see it. To do so, right-click on any part of the page and select Inspect; you may have to enable this functionality in your settings. That’s a very quick walkthrough but that type of…