How To Build a TikTok Scraping Tool In Python!

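TikTok is one of my favorite apps and I love Python, so why not combine the two and build a tool that scrapes data from any given TikTok video? The finished tool is a small web app with a box for pasting a video link, a button that starts the data pull, a progress bar, and a table showing the video's likes, comments, shares, and the creator's username.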

Prerequisites

For this project we'll use Streamlit for the front-end GUI, Selenium to drive the web browser, and pandas to store the results. I wrote an article on the basics of Streamlit yesterday, and I'd recommend reading it first if you're new to the package. You'll also need Google Chrome installed, because the scraper drives a Chrome browser. Install the packages with:

pip install streamlit
pip install selenium
pip install pandas
pip install webdriver-manager

Let's Start Building!

Here is the full script, starting with the package imports:

#IMPORT THESE PACKAGES
import streamlit as st
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
# WEBDRIVER-MANAGER DOWNLOADS A MATCHING CHROMEDRIVER FOR YOU.
# ON SELENIUM 4.6+ YOU CAN SKIP IT AND CALL webdriver.Chrome() WITH NO
# ARGUMENTS, BECAUSE SELENIUM MANAGER LOCATES THE DRIVER ON ITS OWN
from webdriver_manager.chrome import ChromeDriverManager

# CREATES AN EMPTY DATAFRAME
data1 = {'LIKES': [], 'COMMENTS': [], 'SHARES': [], 'USERNAME': []}
fulldf = pd.DataFrame(data1)

#CREATING TITLE OF THE TOOL
st.title('TikTok Tool')

#CREATING USER INPUT, WE WILL PASTE THE FULL TIKTOK URL (INCLUDING https://) HERE
user_input = st.text_input("Paste TikTok Link Below", "https://www.tiktok.com/")

#SPLITTING THE PAGE INTO LEFT AND RIGHT COLUMNS (st.beta_columns WAS RENAMED TO st.columns)
left_column, right_column = st.columns(2)

#CREATING BUTTON FOR DATA PULL FROM TIKTOK
pressed2 = left_column.button('Press For Data Pull')

#CREATING THE LOADING BAR LABEL (STREAMLIT "MAGIC" DISPLAYS A BARE STRING)
'Loading'
#CREATING AN EMPTY LOADING BAR
bar = st.progress(0)

#THE IF STATEMENT FOR THE BUTTON, WHEN IT'S PRESSED IT WILL EXECUTE THIS BLOCK
if pressed2:
    # THIS INITIALIZES THE DRIVER (AKA THE WEB BROWSER)
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    # WAIT UP TO 10 SECONDS FOR EACH ELEMENT, SINCE TIKTOK BUILDS THE PAGE WITH JAVASCRIPT
    driver.implicitly_wait(10)
    try:
        #INCREASING PROGRESS BAR TO 30%
        bar.progress(30)
        # THIS TELLS THE WEB BROWSER WHICH WEBSITE TO GO TO
        driver.get(user_input)
        #INCREASING PROGRESS BAR TO 50%
        bar.progress(50)

        # THIS IS THE IMPORTANT PART, SO I'LL BREAK IT DOWN:
        # - find_element(By.XPATH, ...) LOCATES THE ELEMENT WE WANT ON THE PAGE
        # - .text RETURNS THAT ELEMENT'S VISIBLE TEXT AS A PYTHON STRING
        # - THE STRING IS STORED IN A VARIABLE
        # NOTE: THESE ABSOLUTE XPATHS MATCH TIKTOK'S LAYOUT AT THE TIME OF WRITING
        # AND WILL BREAK WHENEVER TIKTOK CHANGES ITS PAGE STRUCTURE

        # THIS LINE GETS THE NUMBER OF LIKES
        LIKES = driver.find_element(By.XPATH,
            '/html/body/div/div/div[2]/div[2]/div/div/main/div/div[1]/span[1]/div/div[1]/div[4]/div[2]/div[1]/strong').text
        # THIS LINE GETS THE NUMBER OF COMMENTS
        COMMENTS = driver.find_element(By.XPATH,
            '/html/body/div/div/div[2]/div[2]/div/div/main/div/div[1]/span[1]/div/div[1]/div[4]/div[2]/div[2]/strong').text
        # THIS LINE GETS THE NUMBER OF SHARES
        SHARES = driver.find_element(By.XPATH,
            '/html/body/div/div/div[2]/div[2]/div/div/main/div/div[1]/span[1]/div/div[1]/div[4]/div[2]/div[3]/strong').text
        # THIS LINE GETS THE USERNAME OF THE VIDEO'S CREATOR
        USERNAME = driver.find_element(By.XPATH,
            '/html/body/div/div/div[2]/div[2]/div/div/main/div/div[1]/span[1]/div/div[1]/div[1]/a[1]/h3').text
        #INCREASING PROGRESS BAR TO 70% AND THEN TO 90%
        bar.progress(70)
        bar.progress(90)
        # APPENDING THE DATA PULLED ABOVE AS A NEW ROW OF THE DATAFRAME
        row = [LIKES, COMMENTS, SHARES, USERNAME]
        fulldf.loc[len(fulldf.index)] = row
        #INCREASING PROGRESS BAR TO 100%
        bar.progress(100)
        st.write('Data Pull Is Done!')
    finally:
        #CLOSING OUT WEB BROWSER (quit() ALSO ENDS THE CHROMEDRIVER PROCESS, EVEN IF A LOOKUP FAILED)
        driver.quit()
    #PRINTING OUT DATAFRAME TO UI
    st.write(fulldf)
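The values pulled with .text are strings exactly as TikTok displays them, so large counts come back abbreviated (for example 1.2M). If you want to sort or chart the numbers, a small converter helps. This is a minimal sketch of my own, not part of the original tool: parse_count is a hypothetical helper name, and it assumes counts only ever use a K, M or B suffix, which may not cover every format TikTok shows.

```python
import re

# Assumed suffix multipliers for abbreviated counts (K = thousand, M = million, B = billion)
_MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text: str) -> int:
    """Convert a displayed count such as '1.2M', '34.5K' or '987' into an int.

    Hypothetical helper: raises ValueError for anything that does not look
    like a number with an optional K/M/B suffix.
    """
    match = re.fullmatch(r"\s*([\d.,]+)\s*([KMB]?)\s*", text.upper())
    if match is None:
        raise ValueError(f"Unrecognised count: {text!r}")
    number, suffix = match.groups()
    # Drop thousands separators, then scale by the suffix and round to a whole count
    value = float(number.replace(",", ""))
    return round(value * _MULTIPLIERS[suffix])
```

To convert a column of the DataFrame you could then write fulldf['LIKES'] = fulldf['LIKES'].apply(parse_count).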

Running The Code

To run the code, save the file and copy its full path. Then open your Terminal / Command Prompt and run the file with the streamlit run command. Streamlit then opens the app in your browser (by default at http://localhost:8501). The syntax looks like this:

streamlit run DIRECTORY TO YOUR PYTHON FILE/file.py
streamlit run /Users/users/Documents/tiktokuiproject.py
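Once the app is running, whatever you paste into the text box is handed straight to driver.get, so a link without https:// or a link to the wrong page will fail only after Chrome has already launched. A quick check on the string can catch these mistakes earlier. This is a sketch of my own, not part of the original tool: looks_like_tiktok_video is a hypothetical helper, and it assumes video links follow the https://www.tiktok.com/@username/video/<id> shape, which may not hold for every link TikTok generates (shortened share links, for example).

```python
from urllib.parse import urlparse


def looks_like_tiktok_video(url: str) -> bool:
    """Rough check that `url` is a full TikTok video link (assumed format)."""
    parsed = urlparse(url.strip())
    # driver.get needs an absolute URL with a scheme
    if parsed.scheme not in ("http", "https"):
        return False
    # Accept tiktok.com and its subdomains, such as www.tiktok.com
    host = (parsed.hostname or "").lower()
    if host != "tiktok.com" and not host.endswith(".tiktok.com"):
        return False
    # Assumed path shape: /@username/video/<numeric id>
    parts = [p for p in parsed.path.split("/") if p]
    return (
        len(parts) >= 3
        and parts[0].startswith("@")
        and parts[1] == "video"
        and parts[2].isdigit()
    )
```

Inside the app you could call it right after the button press, for example showing st.error("That doesn't look like a TikTok video link") and skipping the scrape when it returns False.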

Data Scientist / Engineer
