Web Scraping Stock Market Data

This was one of the projects I did when I took the IBM Data Analyst Professional Certification.

Project Overview

For this project, I will assume the role of a Data Scientist / Data Analyst working for a new startup investment firm that helps customers invest their money in stocks. My job is to extract financial data, such as historical share prices and quarterly revenue reports, for popular stocks from various sources using Python libraries and web scraping. After collecting this data, I will visualize it in a dashboard to identify patterns or trends. The stocks we will work with are Tesla and GameStop.


I started by importing all the libraries that I need.

import yfinance as yf
import pandas as pd
import requests
from bs4 import BeautifulSoup as soup
import plotly.graph_objects as go
from plotly.subplots import make_subplots

Since we have already imported yfinance, we can now use `Ticker` to create a ticker object for the stock we want to extract data for. The stock is Tesla and its ticker symbol is `TSLA`.

tesla = yf.Ticker("TSLA")

Using the ticker object and its `history` method, extract the stock information and save it in a DataFrame named `tesla_data`. Set the `period` parameter to `max` so we get information for the maximum amount of time.

Reset the index with `reset_index(inplace=True)` so that `Date` becomes a regular column of `tesla_data`, then display the first five rows with the `head` function.

tesla_data = tesla.history(period="max")
tesla_data.reset_index(inplace=True)

## view the first 5 rows of the data
tesla_data.head()
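Resetting the index matters because the graph script later reads the dates from a `Date` column. Here is a minimal sketch of what the reset does, using a small made-up frame (`prices` and its values are hypothetical stand-ins for what `history` returns, so it runs without a network connection):

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by Ticker.history():
# closing prices indexed by a DatetimeIndex named "Date".
prices = pd.DataFrame(
    {"Close": [10.0, 10.5, 11.2]},
    index=pd.DatetimeIndex(
        ["2021-01-04", "2021-01-05", "2021-01-06"], name="Date"
    ),
)

# Before the reset, "Date" is the index and not one of the columns.
print(list(prices.columns))

# reset_index() moves the index into a regular column and
# replaces it with a default 0..n-1 integer index.
flat = prices.reset_index()
print(list(flat.columns))
```

After the reset, `flat.Date` can be passed straight to a plotting function.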

Now that we have collected the stock data using yfinance, let's get the company revenue by requesting the revenue page with `urllib`.

from urllib.request import urlopen as uReq
my_url_tsla = 'https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue'
uClient = uReq(my_url_tsla)
html_data = uClient.read()

Use the HTML parser to parse the page.

page_soup_tsla = soup(html_data, "html.parser")

We can now scrape the data off the website.

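# Collect the rows in a list first; DataFrame.append was removed in pandas 2.0
tesla_rows = []

# [1] selects the second "col-xs-6" block on the page
tesla_data_html = page_soup_tsla.find_all("div", {"class": "col-xs-6"})[1].find("tbody").find_all("tr")
for row in tesla_data_html:
    col = row.find_all("td")
    date = col[0].text
    # Strip the dollar sign and thousands separators so the value can be cast later
    revenue = col[1].text.replace("$", "").replace(",", "")
    tesla_rows.append({"Date": date, "Revenue": revenue})

tesla_revenue = pd.DataFrame(tesla_rows, columns=["Date", "Revenue"])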
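To make the scraping step easier to follow, here is a small offline sketch of the same idea. The HTML snippet, its numbers, and the helper name `parse_revenue_table` are all made up for illustration; the real page may be structured differently or change over time.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML that mimics two "col-xs-6" blocks, each holding a table.
# The dates and amounts are invented for this example.
SAMPLE_HTML = """
<div class="col-xs-6"><table><tbody>
<tr><td>2020</td><td>$5,000</td></tr>
</tbody></table></div>
<div class="col-xs-6"><table><tbody>
<tr><td>2021-03-31</td><td>$1,234</td></tr>
<tr><td>2020-12-31</td><td>$987</td></tr>
<tr><td>2020-09-30</td><td></td></tr>
</tbody></table></div>
"""

def parse_revenue_table(html, table_index=1):
    """Return a Date/Revenue DataFrame from the chosen "col-xs-6" block."""
    page = BeautifulSoup(html, "html.parser")
    blocks = page.find_all("div", {"class": "col-xs-6"})
    rows = []
    for tr in blocks[table_index].find("tbody").find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) < 2:
            continue  # skip rows that do not have both a date and a value
        revenue = cells[1].text.replace("$", "").replace(",", "").strip()
        rows.append({"Date": cells[0].text.strip(), "Revenue": revenue})
    # Build the frame once from the collected rows
    return pd.DataFrame(rows, columns=["Date", "Revenue"])

sample_revenue = parse_revenue_table(SAMPLE_HTML)
print(sample_revenue)
```

Note that the last row keeps an empty string for Revenue, which is why the column still needs cleaning before it can be plotted.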

Upon checking the data types, I realised that the Revenue column was stored as `object`, so I have to cast it to float before I can create the graph later on.

#casting to float

tesla_revenue["Revenue"] = pd.to_numeric(tesla_revenue["Revenue"], downcast="float")
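Scraped tables sometimes contain empty or non-numeric cells. As a rough sketch, assuming such cells should simply be dropped, `pd.to_numeric` can turn them into `NaN` with `errors="coerce"` (the frame `raw` and its values below are made up for illustration):

```python
import pandas as pd

# Hypothetical scraped values: two clean strings, one empty cell, one text cell.
raw = pd.DataFrame(
    {
        "Date": ["2021-03-31", "2020-12-31", "2020-09-30", "2020-06-30"],
        "Revenue": ["1234", "987", "", "n/a"],
    }
)

# errors="coerce" converts anything that cannot be parsed into NaN
# instead of raising an exception.
raw["Revenue"] = pd.to_numeric(raw["Revenue"], errors="coerce")

# Drop the rows where no usable revenue value was found.
clean = raw.dropna(subset=["Revenue"]).reset_index(drop=True)

print(clean.dtypes)
print(clean)
```

Whether dropping those rows is the right call depends on the data; here it only keeps the chart free of gaps.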

Now that the Tesla part is done, we can do exactly the same thing for GameStop. The ticker symbol for GameStop is `GME`.

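GameStop = yf.Ticker("GME")
gme_data = GameStop.history(period="max")
# Move Date out of the index so the graph script can read it as a column
gme_data.reset_index(inplace=True)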

Scraping GameStop revenue off the website.

my_url_gme = 'https://www.macrotrends.net/stocks/charts/GME/gamestop/revenue'
uClient = uReq(my_url_gme)
html_data = uClient.read()
page_soup_gme = soup(html_data, "html.parser")

# Collect the rows in a list first; DataFrame.append was removed in pandas 2.0
gme_rows = []

gme_data_html = page_soup_gme.find_all("div", {"class": "col-xs-6"})[1].find("tbody").find_all("tr")
for row in gme_data_html:
    col = row.find_all("td")
    date = col[0].text
    revenue = col[1].text.replace("$", "").replace(",", "")
    gme_rows.append({"Date": date, "Revenue": revenue})

gme_revenue = pd.DataFrame(gme_rows, columns=["Date", "Revenue"])

Checking the data types and casting Revenue to float.

gme_revenue['Revenue'] = pd.to_numeric(gme_revenue['Revenue'], downcast="float")

Now that we have the data we need, we can use the predefined graph script below to create a simple line chart.

def make_graph(stock_data, revenue_data, stock):
    # Two stacked panels sharing the x-axis: share price on top, revenue below
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=("Historical Share Price", "Historical Revenue"), vertical_spacing=.3)
    # infer_datetime_format is deprecated in pandas 2.0, so it is left out here
    fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data.Date), y=stock_data.Close.astype("float"), name="Share Price"), row=1, col=1)
    fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data.Date), y=revenue_data.Revenue.astype("float"), name="Revenue"), row=2, col=1)
    fig.update_xaxes(title_text="Date", row=1, col=1)
    fig.update_xaxes(title_text="Date", row=2, col=1)
    fig.update_yaxes(title_text="Price ($US)", row=1, col=1)
    fig.update_yaxes(title_text="Revenue ($US Millions)", row=2, col=1)
    # Use the stock name as the chart title, then render the figure
    fig.update_layout(title=stock)
    fig.show()

Let's create the graph for Tesla.

make_graph(tesla_data, tesla_revenue, 'Tesla')

Now let's create the graph for GameStop.

make_graph(gme_data, gme_revenue, 'GameStop')


My initial hypothesis was that revenue and share price are correlated: the more money a company makes, the higher its share price. This holds for Tesla, but not for GameStop.
As you can see in the graph above, there is no clear correlation between GameStop's share price and its revenue. The revenue fluctuates a lot and shows a small downward trend, while the share price stays fairly consistent over the years, apart from an interesting spike in 2021. The reason for that spike, a short squeeze, is very interesting, and you can learn more about it here.
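The comparison above was done by eye. One way to put a number on it, sketched here with made-up figures rather than the scraped data, is to match each revenue date with the most recent closing price and compute the Pearson correlation. The frames `revenue` and `prices` are hypothetical, and the sketch assumes both `Date` columns share the same timezone-naive datetime type.

```python
import pandas as pd

# Hypothetical quarterly revenue, invented purely to illustrate the calculation.
revenue = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2020-03-31", "2020-06-30", "2020-09-30", "2020-12-31"]
        ),
        "Revenue": [100.0, 120.0, 140.0, 160.0],
    }
)

# Hypothetical daily closing prices around those quarter ends.
prices = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2020-03-30", "2020-03-31", "2020-06-30",
             "2020-09-29", "2020-09-30", "2020-12-31"]
        ),
        "Close": [9.0, 10.0, 12.0, 13.5, 14.0, 16.0],
    }
)

# merge_asof needs both frames sorted by the key. With the default
# "backward" direction, each revenue date picks up the last price
# on or before it, which also covers quarter ends on non-trading days.
aligned = pd.merge_asof(
    revenue.sort_values("Date"), prices.sort_values("Date"), on="Date"
)

# Pearson correlation between revenue and the matched closing price.
correlation = aligned["Revenue"].corr(aligned["Close"])
print(aligned)
print(round(correlation, 3))
```

With real data the value would need careful reading, since a handful of quarterly points and a single event like the 2021 spike can move it a lot.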