How To Scrape Websites Using Python & BeautifulSoup4

Welcome back! I’ve discussed plenty of ways to scrape data from websites using different languages and packages. Now let’s talk about one of the biggest web scraping packages for Python: Beautiful Soup.
Installation
First off, we’re going to install the Beautiful Soup package. To do this, use one of the following pip commands:
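# beautifulsoup4 is the package's name on PyPI; in code it is imported as bs4
pip install beautifulsoup4
# OR
pip3 install beautifulsoup4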
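If you want to confirm the install worked, a quick sanity check like the one below can help. This is only a minimal sketch: it assumes the package was installed into the same Python environment you run it from, and the one-line HTML string is made up for the check.

```python
import bs4
from bs4 import BeautifulSoup

# Print the installed Beautiful Soup version
print(bs4.__version__)

# Parse a tiny, made-up HTML snippet with Python's built-in parser
check_soup = BeautifulSoup("<b>ok</b>", "html.parser")
check_text = check_soup.b.get_text()
print(check_text)
```

If the import fails with a ModuleNotFoundError, the package most likely went into a different Python installation than the one running the script.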
Awesome, we’re pretty much ready to start scraping websites, but there is one more important thing to keep in mind.
Understanding HTML
If you’re wondering, yes, I took this from my other article, but it’s such a good description of the process ☺️. Before scraping anything, we first need to understand how the data we want is laid out on the page. When I started using this package, I saw a ton of tutorials speed past this part, and it left me stuck on lots of basic steps. So this is a very important concept to understand for web scraping. Let’s take a look at the following HTML code:
<!DOCTYPE html>
<html>
<body><p1>This is a test.</p1>
<p2>This is not a test.</p2>
<p3>This is still a test.</p3></body>
</html>
As most of you know, nearly every web page is built using HTML, CSS, JavaScript, or some variation of these languages. HTML is essentially where the raw data lives, and since we’re scraping text, this is where we want to focus our attention. Let’s say we wanted to scrape the “This is a test.” text. We would point our web scraper at the p1 HTML tag, since it’s a unique identifier on this page. If we wanted to do the same thing for the “This is not a test.” text, we would scrape the text from the p2 HTML tag instead. (Keep in mind that p1, p2, and p3 are made-up tag names for this example. Real pages usually use standard tags such as p, told apart by attributes like id or class.)
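To make that idea concrete, here is a rough sketch of how it could look with Beautiful Soup. It uses a copy of the sample page stored in a string, written with matching closing tags so the parser sees three separate elements. The variable names (html_doc, soup, first_text, second_text) are just my picks for illustration, and "html.parser" is Python's built-in parser, so nothing extra needs to be installed.

```python
from bs4 import BeautifulSoup

# A copy of the sample page; <p1>, <p2> and <p3> are made-up tag names
html_doc = """<!DOCTYPE html>
<html>
<body><p1>This is a test.</p1>
<p2>This is not a test.</p2>
<p3>This is still a test.</p3></body>
</html>"""

# Parse the HTML string using Python's built-in parser
soup = BeautifulSoup(html_doc, "html.parser")

# find() returns the first tag with the given name,
# and get_text() returns the text inside that tag
first_text = soup.find("p1").get_text()
second_text = soup.find("p2").get_text()

print(first_text)   # This is a test.
print(second_text)  # This is not a test.
```

One thing to watch out for: find() returns None when no matching tag exists, so on a real page it's worth checking the result before calling get_text() on it.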
Also, if you’re wondering how you can find the HTML code of a website, almost every browser (including Safari, Firefox, and Chrome) allows you to see it. To do so, right-click on any portion of the page and select Inspect. You may have to enable this functionality in your browser’s settings first. That’s a very quick walkthrough, but that type of thinking is what got me to understand this concept. Let’s try this out with some actual examples!