Predict Ethereum price using Machine Learning + Python

Welcome back! A few weeks ago I walked through some methods for predicting stock prices using Machine Learning and Python, and we also covered how to do essentially the same thing with Dogecoin and Bitcoin. Now let’s try it with Ethereum. This is a pretty high-level walkthrough, not a full tutorial on Machine Learning; it’s more a look at what Machine Learning may be capable of. We’re going to use Google Colab to run this code. Luckily for us, the code was pretty much already developed, so please give all the credit to this website for the code; I only added a little more functionality in the attached code. Here is the link to access the Google Colab project, and you can also copy the code over to your Python IDE if you prefer.

Quick Note: This is a somewhat advanced tutorial. I’m assuming you know your way around Python / Object Oriented Programming, know a bit of Machine Learning, and have some familiarity with Pandas / NumPy. I have tutorials that should bring you up to speed, and here’s a basic introduction to Machine Learning that I wrote up. Okay, now let’s get started!

Importing / Installing the packages

As always, we need to import a few packages. We’re using numpy, pandas, matplotlib and scikit-learn, as well as Yahoo’s finance packages. You can pip install these, but since we’re using Google Colab most of them should already be available. If you do need to install the Yahoo finance packages, this is how:

#INSTALL THESE PACKAGES IF NEEDED
!pip install yfinance
!pip install yahoofinancials

Now let’s import all of the packages we need with the following statements:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error as mae
from sklearn.preprocessing import StandardScaler
import yfinance as yf
from yahoofinancials import YahooFinancials

Awesome! Next up we need to bring in the dataset. The yfinance package (imported as yf) lets us download the data using the ticker symbol and a date range. Here is the code to do so:

#CREATING THE DATAFRAME TO STORE DATA
df = yf.download('ETH-USD',
                 start='2021-01-01',
                 end='2021-04-01',
                 progress=False)
df.head()

We can print this data out using the df.head() function:

You can also bring in another dataset if you choose to. Next up, we want to grab the close price of our crypto, store it in a separate variable, and reshape it into a single column. We do so with the following line:

series = df['Close'].values.reshape(-1, 1)

Great! We store this in another variable because it’s the specific value our machine learning model will work with. Next up we have to normalize the data. We start by declaring our scaler, which rescales data to a mean of 0 and a standard deviation of 1. We then fit the scaler on the first half of the close data only (roughly the portion we’ll later use for training), transform the whole series with it, and assign the result back to the “series” variable as a 1D array using NumPy’s “.flatten” method. Seems like a lot (and it is), but here is the code to do so:

scaler = StandardScaler()
scaler.fit(series[:len(series) // 2])
series = scaler.transform(series).flatten()
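To see what fitting on only the first half does, here is a minimal sketch on synthetic prices (the random-walk data and the names `prices`, `series_demo`, `scaler_demo` are assumptions for illustration, not real ETH data). It suggests that the first half ends up with mean 0 and standard deviation 1, while the full series generally does not, and that `inverse_transform` recovers the original values.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for closing prices (assumption: not real ETH data)
rng = np.random.default_rng(0)
prices = 1000 + np.cumsum(rng.normal(0, 20, size=90))
series_demo = prices.reshape(-1, 1)

half = len(series_demo) // 2
scaler_demo = StandardScaler()
scaler_demo.fit(series_demo[:half])  # statistics come from the first half only
scaled = scaler_demo.transform(series_demo).flatten()

print("first-half mean:", scaled[:half].mean())   # very close to 0
print("first-half std: ", scaled[:half].std())    # very close to 1
print("full-series mean:", scaled.mean())         # usually not 0

# Undo the scaling to get back to price units
restored = scaler_demo.inverse_transform(scaled.reshape(-1, 1)).flatten()
```

Fitting on the earlier half keeps statistics from the later, held-out data from leaking into the scaling.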

Awesome! Now we need a few variables to hold our data. Let’s go line by line (almost), starting with these variables and empty lists:

T = 10
D = 1
X = []
Y = []

Awesome! We’re going to use “T” as our lookback window, aka how many days we go back in order to predict the next one. Next up, we use a for loop to go through our series data. Let’s start off by declaring it with the following line:

for t in range(len(series) - T):

Notice that we’re using a lowercase “t”, which is our counter in this example. Next, let’s fill in the loop body. We slice “T” values out of the series starting at our counter, store them in another variable (x in this example), and append that slice to the uppercase X list we declared above. Here are those lines in our for loop:

    x = series[t:t+T]
    X.append(x)

Now we do the same thing for the target, but instead of slicing we take the single value that comes right after the window and append it to the Y list we created earlier:

    y = series[t+T]
    Y.append(y)

Finally, we convert X to a NumPy array and reshape it so each row holds one window of “T” values, which gives the data a new shape without changing any of the values. We convert the “Y” list to an array as well, then store the length of the “X” array in a new variable called “N”.

X = np.array(X).reshape(-1, T)
Y = np.array(Y)
N = len(X)
print("X.shape", X.shape, "Y.shape", Y.shape)
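The windowing steps above can be sketched as one small function and tried on a toy series where the result is easy to check by eye. The helper name `make_windows` and the toy data are assumptions for illustration; the loop itself mirrors the walkthrough.

```python
import numpy as np

def make_windows(series, T):
    """Return (X, Y): X[i] holds T consecutive values, Y[i] is the value right after them."""
    X, Y = [], []
    for t in range(len(series) - T):
        X.append(series[t:t + T])   # the T values starting at position t
        Y.append(series[t + T])     # the next value, used as the target
    return np.array(X).reshape(-1, T), np.array(Y)

toy = np.arange(15, dtype=float)    # 0.0, 1.0, ..., 14.0
X_toy, Y_toy = make_windows(toy, T=10)

print("X.shape", X_toy.shape, "Y.shape", Y_toy.shape)  # X.shape (5, 10) Y.shape (5,)
print(X_toy[0], "->", Y_toy[0])     # window 0..9 maps to target 10.0
```

A series of length L yields L - T windows, so with three months of daily prices and T = 10 you should expect roughly 80 rows.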

Awesome! Now we need to create a class for our Machine Learning model. This is the fun stuff! Let’s create a class called BaselineModel with a predict function that simply returns the last value of each input window:

class BaselineModel:
    def predict(self, X):
        return X[:, -1]  # return the last value for each input sequence
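To make the baseline concrete, here is a tiny sketch with two hand-written windows. The class name `LastValueBaseline` and the numbers are assumptions for illustration; the predict logic matches the class above. Note that nothing is learned here: the "prediction" for the next day is just the most recent value in each window.

```python
import numpy as np

class LastValueBaseline:
    """Same idea as the walkthrough's baseline: predict the last observed value."""
    def predict(self, X):
        return X[:, -1]

windows = np.array([[1.0, 2.0, 3.0],
                    [2.0, 3.0, 5.0]])
preds = LastValueBaseline().predict(windows)
print(preds)  # [3. 5.]
```

A baseline like this is mainly useful as a yardstick: a trained model should beat it before you trust it.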

Next up we have to split our data into a train set and a test set. We do so by creating the Xtrain and Ytrain variables and filling them with the first half of “X” and “Y”, using the “N” variable from before to find the midpoint. We do essentially the same thing for our “Xtest” and “Ytest” variables with the other half of the data:

Xtrain, Ytrain = X[:-N//2], Y[:-N//2]
Xtest, Ytest = X[-N//2:], Y[-N//2:]
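One detail worth checking is how `-N//2` behaves when N is odd. In Python the unary minus is applied before the floor division, so `-9 // 2` is -5, not -4. A quick sketch with toy arrays (the sizes 8 and 9 are arbitrary choices for illustration):

```python
import numpy as np

sizes = {}
for n in (8, 9):
    data = np.arange(n)
    # Same slicing pattern as the train/test split above
    train, test = data[:-n // 2], data[-n // 2:]
    sizes[n] = (len(train), len(test))
    print(n, "->", sizes[n])
# 8 -> (4, 4)
# 9 -> (4, 5)
```

So the two halves never overlap, and with an odd number of windows the test set gets the extra one.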

Awesome! Next up let’s set up our model. We create a “model” variable that holds an instance of our “BaselineModel” class, then create new variables holding its predictions for the train and test data:

model = BaselineModel()
Ptrain = model.predict(Xtrain)
Ptest = model.predict(Xtest)

Great! Our targets and predictions are still in scaled units, so now we use the scaler’s inverse_transform to convert them back to prices. Each array is reshaped into a column first, and the result is flattened back into a 1D array with NumPy:

Ytrain2 = scaler.inverse_transform(Ytrain.reshape(-1, 1)).flatten()
Ytest2 = scaler.inverse_transform(Ytest.reshape(-1, 1)).flatten()
Ptrain2 = scaler.inverse_transform(Ptrain.reshape(-1, 1)).flatten()
Ptest2 = scaler.inverse_transform(Ptest.reshape(-1, 1)).flatten()
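The imports at the top bring in `mean_absolute_error` as `mae`, but the walkthrough never calls it. One possible use, sketched here as an assumption rather than the original author's code, is to score predictions in price units once they have been unscaled. The numbers below are made up so the arithmetic is easy to follow.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error as mae

# Made-up prices and predictions (assumption: illustrative values only)
y_true = np.array([100.0, 110.0, 105.0, 120.0])
y_pred = np.array([98.0, 100.0, 110.0, 105.0])

# Absolute errors are 2, 10, 5 and 15, so the mean is 32 / 4
error = mae(y_true, y_pred)
print("MAE:", error)  # MAE: 8.0
```

With the variables from this walkthrough, the equivalent calls would be `mae(Ytrain2, Ptrain2)` and `mae(Ytest2, Ptest2)`, which report the average miss in dollars.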

Almost done! Now we build the forecast. Starting from the first test window, each prediction is appended to our “forecast” variable and fed back in as the newest input value. We then plot the result with matplotlib. This is the code to do that:

# right forecast
forecast = []
input_ = Xtest[0]
while len(forecast) < len(Ytest):
    f = model.predict(input_.reshape(1, T))[0]
    forecast.append(f)
    # make a new input with the latest forecast
    input_ = np.roll(input_, -1)
    input_[-1] = f
plt.plot(Ytest, label='target')
plt.plot(forecast, label='prediction')
plt.legend()
plt.title("Right forecast")
plt.show()

And this is our output!
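It helps to trace what the forecast loop does with a last-value baseline. In this minimal sketch (the window values, `T_demo`, and the class name `LastValueBaseline` are assumptions for illustration), every forecast equals the last value of the starting window, because each prediction is fed back in as the newest input and then simply repeated.

```python
import numpy as np

class LastValueBaseline:
    def predict(self, X):
        return X[:, -1]

T_demo = 4
first_window = np.array([0.1, 0.3, 0.2, 0.5])
model_demo = LastValueBaseline()

forecast_demo = []
input_ = first_window.copy()
while len(forecast_demo) < 6:
    f = model_demo.predict(input_.reshape(1, T_demo))[0]
    forecast_demo.append(f)
    # shift the window left by one and put the forecast in the last slot
    input_ = np.roll(input_, -1)
    input_[-1] = f

print(forecast_demo)  # six copies of 0.5
```

So with this baseline the prediction line in the plot is flat, which is worth keeping in mind when reading the chart.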

Congrats! You’ve just predicted the future of Ethereum using Machine Learning! I would highly recommend not using this as financial advice at all; this is just a project to develop your skills. Although we were able to make a prediction, you can see it wasn’t completely accurate. One reason I think this may be the case is Ethereum’s extreme recent volatility. Assumptions like these are very important to consider when developing Machine Learning projects.

As Always

As always, if you have any suggestions, thoughts or just want to connect, feel free to contact / follow me on Twitter! Also, below is a link to some of my favorite resources for learning programming, Python, R, Data Science, etc.

Thanks so much for reading!

Data Scientist / Engineer
