Let’s build a TikTok scraping tool with Python!

Welcome back! Let’s develop a TikTok scraping tool using Python. This tool is a little different from the others I’ve created: we’ll run it from the terminal / command prompt, which will essentially be the front end. The flow is this: the user runs the program in their terminal > inputs a link to a TikTok user’s profile > the program outputs the number of followers, following and total likes on that profile. If all that sounds good to you, then let’s get started!

First off

As always, if you have any suggestions, thoughts or just want to connect, feel free to contact / follow me on Twitter! Also, below is a link to some of my favorite resources for learning programming, Python, R, Data Science, etc.

Let’s begin!

Basic introduction you could probably skip that I copied from my other article

First things first, we need to have Python installed; read my article here to make sure you have Python and an IDE installed. Next, I wrote an article on using Selenium in Python. Selenium is a browser automation package that lets us drive a web browser from Python, which makes it useful for web scraping. It might be best to read that article for more of an understanding of web scraping, but it’s not a necessity; you can read that article here.

Let’s get started!

Now that we have our Python environment set up, let’s open up a blank Python script. If you haven’t installed the packages yet, run pip install selenium pandas webdriver-manager in your terminal. Once installed, import the following packages:

#IMPORT THESE PACKAGES
import time
import selenium
from selenium import webdriver
import pandas as pd
#OPTIONAL PACKAGE, BUT MAY BE NEEDED
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options

As I stated in my previous articles, we are using the Google Chrome browser, but Selenium supports other browsers too. If you’d like to use a different browser, go for it! Just make sure to have that specific browser installed on your machine.

Now for a small new addition: we don’t want this program to load up the full Google Chrome window every time. Instead, we want to run it in “headless” mode, meaning the browser runs in the background without appearing on screen. We do that by using the following code:

options = Options()
options.add_argument('--headless')
options.add_argument('--disable-gpu')

Now, within Selenium we need to define our web browser, so let’s do so by using the following line of code:

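#THIS INITIALIZES THE DRIVER (AKA THE WEB BROWSER)
#SELENIUM 4 TAKES THE DRIVER PATH THROUGH A SERVICE OBJECT AND THE OPTIONS THROUGH "options"
from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)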

I would recommend running all of your code up to this point to see if it runs successfully. If so, you’re pretty much ready to continue!

TikTok Data Pipeline

Now let’s go ahead and develop our mini TikTok data pipeline. Remember, the user will input the link to a TikTok user profile and the program will output the followers, following and likes. To store these, we need to set up a pandas data frame with those data points as its columns, using the following code:

data1 = {'Followers': [], 'Following': [], 'Likes': []}
fulldf = pd.DataFrame(data1)
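To see how this empty data frame gets filled later in the program, here is a minimal sketch that builds the same three columns and appends one row with the `.loc[len(df)]` pattern used further down. The values in `sample_row` are made-up placeholders, not real TikTok numbers.

```python
import pandas as pd

# Same empty frame as in the article: three named columns, zero rows.
columns = {'Followers': [], 'Following': [], 'Likes': []}
df = pd.DataFrame(columns)

# Made-up placeholder values standing in for the scraped text.
sample_row = ['1.2M', '150', '30.5M']

# len(df) is the next unused integer label, so this adds one new row.
df.loc[len(df)] = sample_row

print(df)
```

Because the row is a plain list, its order has to match the column order (Followers, Following, Likes).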

Awesome, next up we want to get the actual input from the user in the command prompt / terminal. To do this, we use the built-in “input” function from base Python and store the result in a variable. In this specific case we’ll be bringing in a link, so let’s ask the user to input a link and store it inside a variable called link:

link = input("Enter the link here: ")

Great, now we want to use the Selenium “driver.get” function to point our browser to the link the user pasted in. To do this, we use the following command:

driver.get(link)
time.sleep(2)

The “time.sleep(2)” line just tells the program to wait 2 seconds before moving on to the next steps. This gives the page time to load, although a fixed pause does not guarantee that the page has finished loading.
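If the fixed pause turns out to be too short or too long, one alternative is to poll for a condition until it becomes true. The sketch below is plain Python with a hypothetical helper named `wait_until` and a fake "page ready" check standing in for the browser; it is not part of the original program. Selenium also ships its own `WebDriverWait` class for the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met in time")
        time.sleep(interval)


# Stand-in for a page that only becomes "ready" on the third check.
checks = {"count": 0}


def fake_page_ready():
    checks["count"] += 1
    return checks["count"] >= 3


wait_until(fake_page_ready, timeout=2.0, interval=0.01)
print(checks["count"])
```

In the real program the condition would be something like "the followers element can be found", so the script moves on as soon as the page is ready instead of always waiting the full 2 seconds.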

Now we need to make our way to any TikTok user profile and gather our data points from that page. Remember, all TikTok user profiles share the same layout, so we only need to do this once and it works for all of them. Let’s start off by copying and pasting the following lines of code into your Python environment:

#find_element("xpath", ...) REPLACES find_element_by_xpath, WHICH SELENIUM 4.3 REMOVED
Followers = driver.find_element('xpath', 'PASTE XPATH HERE').text
Following = driver.find_element('xpath', 'PASTE XPATH HERE').text
Likes = driver.find_element('xpath', 'PASTE XPATH HERE').text

Now, all we have to do is make our way to any TikTok user profile page > hover over the followers number > right click & click on inspect element. You will see the inspector load up somewhere on the page:

Now all we have to do is right click over the corresponding number within the inspector > click copy > copy full xpath, use the following image as an example of how to do this:

Paste the copied XPath over the capitalized placeholder in the “Followers” line we set up in our Python code. We then do the exact same thing for the following number and the likes number. Our code will now look something like this (keep in mind TikTok may change their website in the future, so it may be best to verify this still works):

Followers = driver.find_element('xpath', '/html/body/div/div/div[2]/div[2]/div/header/h2[1]/div[2]/strong').text
Following = driver.find_element('xpath', '/html/body/div/div/div[2]/div[2]/div/header/h2[1]/div[1]/strong').text
Likes = driver.find_element('xpath', '/html/body/div/div/div[2]/div[2]/div/header/h2[1]/div[3]/strong').text

We are almost done. At this point all we want to do is show the user the data points we pulled in and put those data points inside the data frame we created at the beginning of our program. We will use the following lines of code to accomplish this:

print(Followers)
print(Following)
print(Likes)
row = [Followers, Following, Likes]
fulldf.loc[len(fulldf)] = row
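Keep in mind that “.text” returns strings, and profile pages often abbreviate large counts (for example “1.2M”). If you want real numbers in the data frame, a small converter helps. This is a sketch under the assumption that counts use K, M or B suffixes; `parse_count` is a hypothetical helper, not something from Selenium or pandas.

```python
def parse_count(text):
    """Convert a display string such as '1.2M' or '15,300' into an int."""
    multipliers = {'K': 1_000, 'M': 1_000_000, 'B': 1_000_000_000}
    cleaned = text.strip().replace(',', '').upper()
    # A trailing K/M/B means the number in front of it must be scaled up.
    if cleaned and cleaned[-1] in multipliers:
        return round(float(cleaned[:-1]) * multipliers[cleaned[-1]])
    return int(cleaned)


print(parse_count('1.2M'))
print(parse_count('10.5K'))
print(parse_count('987'))
```

With this in place, the row could be built as [parse_count(Followers), parse_count(Following), parse_count(Likes)] so the stored values can be sorted and compared as numbers.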

Believe it or not, we are actually done!

Running This Program

All we have to do now is save this program as a .py file anywhere on our PC. Make sure it’s somewhere accessible, because we then need to copy and paste its file path. Once saved, we want to make our way over to our terminal / command prompt and type in the following command:

python3 PATH/TO/.PY/FILE/HERE

This is what this line will look like for me:

Once we run this program this is what the output will look like:

This is awesome! All we have to do now is type the full TikTok user profile URL into our terminal. Once we do that, our terminal should show us the number of followers, the number of users that user is following and the total number of likes on the account. This is what my terminal shows me:

YAY! This program has just run successfully! That’s all we have to do to get this basic tool set up, and you should definitely be proud if you were able to build it. At this point I would highly suggest that you iterate on this project. Can you add a check in case someone puts in a link that’s invalid? Are there any other data points you could print out and store in the data set? Could you make the program take in a username rather than the URL and have the program run that way? These types of questions are important when improving on a project. Other than that, hopefully you enjoyed this article!
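As a starting point for two of those questions, here is a minimal sketch of a link check and a username-to-URL helper. It assumes profile links look like https://www.tiktok.com/@username and that usernames contain only letters, digits, underscores and periods; both function names are hypothetical, and the rules may need adjusting if TikTok’s URL format differs.

```python
import re
from urllib.parse import urlparse

# Assumed username rule for this sketch: letters, digits, underscores, periods.
USERNAME_PATTERN = re.compile(r'^[A-Za-z0-9_.]+$')


def profile_url_from_username(username):
    """Build a profile URL from 'name' or '@name'."""
    name = username.strip().lstrip('@')
    if not USERNAME_PATTERN.match(name):
        raise ValueError(f"not a plausible username: {username!r}")
    return f"https://www.tiktok.com/@{name}"


def looks_like_profile_url(link):
    """Return True only for links shaped like a TikTok profile page."""
    parts = urlparse(link.strip())
    if parts.scheme not in ('http', 'https'):
        return False
    if parts.netloc.lower() not in ('tiktok.com', 'www.tiktok.com'):
        return False
    path = parts.path.strip('/')
    return path.startswith('@') and bool(USERNAME_PATTERN.match(path[1:]))


print(looks_like_profile_url('https://www.tiktok.com/@example_user'))
print(looks_like_profile_url('not a link'))
print(profile_url_from_username('@example_user'))
```

The program could call looks_like_profile_url(link) right after the input step and ask again when it returns False, before the browser is ever pointed at the link.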

Remember, if you have any suggestions, thoughts or just want to connect, feel free to contact / follow me on Twitter! Also, below is a link to some of my favorite resources for learning programming, Python, R, Data Science, etc.

❤️

Data Scientist / Engineer