How To Scrape Websites Using Python & BeautifulSoup4
Welcome back! I’ve covered plenty of ways to scrape data from websites using different languages and packages. Now let’s talk about one of the most widely used web scraping packages for Python: Beautiful Soup.
Installation
First off, we’re going to install the Beautiful Soup package. To do this, use one of the following pip commands:
pip install beautifulsoup4
# OR
pip3 install beautifulsoup4
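If you want to confirm the install worked before moving on, a quick check like the one below can help. This is only a sketch: the helper name is_installed is my own invention, and it relies on Beautiful Soup being importable under the module name bs4.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("bs4"):
    import bs4

    # bs4 exposes its version string as bs4.__version__
    print("Beautiful Soup version:", bs4.__version__)
else:
    print("bs4 was not found; try the pip command above again.")
```

If the second message prints, pip most likely installed the package for a different Python interpreter than the one you are running.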
Awesome, we’re pretty much ready to start scraping websites, but there is one more important thing to keep in mind.
Understanding HTML
If you’re wondering, yes, I took this from my other article, but it’s such a good description of the process ☺️. Before we scrape anything, we first must learn how the data we want is laid out in the page. When I started using this package, I saw a ton of tutorials speed past this part, which left me stuck on a lot of basic steps. It’s a very important concept in web scraping, so let’s take a look at the following HTML code:
<!DOCTYPE html>
<html>
<body><p>This is a test.</p>
<p>This is not a test.</p>
<p>This is still a test.</p></body>
</html>
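To make the layout concrete, here is a rough sketch of how Beautiful Soup could read a page like this one. The sample is restated inside the script with standard <p> tags so it runs on its own; the names SAMPLE_HTML and paragraph_texts are just illustrative, and I’m assuming Python’s built-in "html.parser" backend so nothing extra needs to be installed.

```python
from bs4 import BeautifulSoup

# Illustrative sample page: three ordinary <p> paragraphs inside <body>.
SAMPLE_HTML = """<!DOCTYPE html>
<html>
<body><p>This is a test.</p>
<p>This is not a test.</p>
<p>This is still a test.</p></body>
</html>"""


def paragraph_texts(html):
    """Parse the HTML and return the text of every <p> tag, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text(strip=True) for p in soup.find_all("p")]


for text in paragraph_texts(SAMPLE_HTML):
    print(text)
```

The point is that each tag is something you can look up by name: once you know the data you want lives in <p> tags, find_all("p") gets you every one of them.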
As most of you know, every single web page is built using HTML, CSS, JavaScript or some variation of…