line_data = [
    {"date": "2020-10-01", "stock_price_usd": 100.00},
    {"date": "2020-10-02", "stock_price_usd": 101.01},
    {"date": "2020-10-03", "stock_price_usd": 120.20},
    {"date": "2020-10-04", "stock_price_usd": 107.07},
    {"date": "2020-10-05", "stock_price_usd": 142.42},
    {"date": "2020-10-06", "stock_price_usd": 135.35},
    {"date": "2020-10-07", "stock_price_usd": 160.60},
    {"date": "2020-10-08", "stock_price_usd": 162.62},
]
23 Data Visualization in Python
In Python, it is easy to create data visualizations with the help of a third-party package. There are a handful of charting libraries, such as matplotlib, seaborn, and altair; however, the package we recommend is called plotly.
Whichever package we choose, we should endeavor to follow data visualization best practices, such as adding a title and axis labels to our charts.
23.1 The plotly Package
Let’s explore how to create some basic charts using the plotly package, focusing on a submodule called Plotly Express, which provides a number of different chart-making functions, such as bar, line, pie, scatter, etc.
In practice, we first decide which chart we would like to make, and then we consult the docs for the corresponding chart-maker function.
The examples below provide an overview of the most popular basic chart types.
Making the charts is the easy part. The hard part is using our data processing techniques to get the data into a format the chart likes. In most cases, the chart-maker functions expect one list of simple values for the X axis and another list for the Y axis.
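As a minimal sketch of that mapping step, the helper below pulls one key out of every record in a list of dictionaries. The name `pluck` is hypothetical (it is not part of plotly), and the records are the first three rows of the stock price example data.

```python
def pluck(records, key):
    """Return the value stored under `key` for each record, in order."""
    return [record[key] for record in records]


# first three records of the stock price example data
records = [
    {"date": "2020-10-01", "stock_price_usd": 100.00},
    {"date": "2020-10-02", "stock_price_usd": 101.01},
    {"date": "2020-10-03", "stock_price_usd": 120.20},
]

# one list per axis, both in the same order as the original records
dates = pluck(records, "date")
prices = pluck(records, "stock_price_usd")

print(dates)
print(prices)
```

Because both lists are built from the same records in the same order, the first date lines up with the first price, and so on.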
23.1.1 Line Charts
Starting with some example data, the line_data list of daily stock prices shown at the top of this page:
Mapping the data, to get it into a format the chart likes (separate lists):
dates = []
prices = []
for item in line_data:
    dates.append(item["date"])
    prices.append(item["stock_price_usd"])

print(dates)
print(prices)
['2020-10-01', '2020-10-02', '2020-10-03', '2020-10-04', '2020-10-05', '2020-10-06', '2020-10-07', '2020-10-08']
[100.0, 101.01, 120.2, 107.07, 142.42, 135.35, 160.6, 162.62]
from plotly.express import line

fig = line(x=dates, y=prices, height=350,
    title="Stock Prices over Time",
    labels={"x": "Date", "y": "Stock Price ($)"}
)
fig.show()
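A line chart connects the points in the order they appear in the lists, so it can help to put the records in chronological order before mapping. The sketch below assumes the dates are ISO-formatted strings, as in the example data, and uses a shortened, deliberately shuffled copy of that data. The name `records` is only for illustration.

```python
from datetime import date

# shortened, deliberately shuffled copy of the example data
records = [
    {"date": "2020-10-03", "stock_price_usd": 120.20},
    {"date": "2020-10-01", "stock_price_usd": 100.00},
    {"date": "2020-10-02", "stock_price_usd": 101.01},
]

# parse each ISO string into a date object so the records sort chronologically
chronological = sorted(records, key=lambda item: date.fromisoformat(item["date"]))

# map AFTER sorting, so the two lists stay in corresponding order
dates = [item["date"] for item in chronological]
prices = [item["stock_price_usd"] for item in chronological]

print(dates)
print(prices)
```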
23.1.2 Bar Charts
Starting with some example data:
bar_data = [
    {"genre": "Thriller", "viewers": 123456},
    {"genre": "Mystery", "viewers": 234567},
    {"genre": "Sci-Fi", "viewers": 987654},
    {"genre": "Fantasy", "viewers": 876543},
    {"genre": "Documentary", "viewers": 283105},
    {"genre": "Action", "viewers": 544099},
    {"genre": "Romantic Comedy", "viewers": 121212}
]
Mapping the data, to get it into a format the chart likes (separate lists):
genres = []
viewers = []
for item in bar_data:
    genres.append(item["genre"])
    viewers.append(item["viewers"])

print(genres)
print(viewers)
['Thriller', 'Mystery', 'Sci-Fi', 'Fantasy', 'Documentary', 'Action', 'Romantic Comedy']
[123456, 234567, 987654, 876543, 283105, 544099, 121212]
from plotly.express import bar

fig = bar(x=genres, y=viewers, height=350,
    title="Viewership by Genre",
    labels={"x": "Genre", "y": "Viewers"}
)
fig.show()
23.1.2.1 Horizontal Bar Chart
A better version is a horizontal bar chart, with the bars sorted so the largest are on top:
from operator import itemgetter

sorted_bar_data = sorted(bar_data, key=itemgetter("viewers"))

genres = []
viewers = []
for item in sorted_bar_data:
    genres.append(item["genre"])
    viewers.append(item["viewers"])

print(genres)
print(viewers)
['Romantic Comedy', 'Thriller', 'Mystery', 'Documentary', 'Action', 'Fantasy', 'Sci-Fi']
[121212, 123456, 234567, 283105, 544099, 876543, 987654]
When sorting the data, we have to sort BEFORE mapping, to ensure the two resulting lists are in corresponding order!
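To see why the order of operations matters, the sketch below compares the two approaches on the first three records of the example data. The variable names ending in `_wrong` are only for illustration.

```python
from operator import itemgetter

# first three records of the example data
bar_data = [
    {"genre": "Thriller", "viewers": 123456},
    {"genre": "Mystery", "viewers": 234567},
    {"genre": "Sci-Fi", "viewers": 987654},
]

# WRONG: map first, then sort each list on its own
genres_wrong = sorted(item["genre"] for item in bar_data)
viewers_wrong = sorted(item["viewers"] for item in bar_data)
# the genres are now alphabetical and the viewers ascending,
# so 'Mystery' ends up paired with the Thriller count
print(list(zip(genres_wrong, viewers_wrong)))

# RIGHT: sort the records first, then map
sorted_bar_data = sorted(bar_data, key=itemgetter("viewers"))
genres = [item["genre"] for item in sorted_bar_data]
viewers = [item["viewers"] for item in sorted_bar_data]
print(list(zip(genres, viewers)))
```

Sorting the records keeps each genre attached to its own viewer count, because the whole dictionary moves as one unit.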
With a horizontal bar chart, we use the orientation parameter, and also flip the X and Y references:
fig = bar(y=genres, x=viewers, orientation="h", height=350,
    title="Viewership by Genre",
    labels={"y": "Genre", "x": "Viewers"}
)
fig.show()
23.1.3 Scatter Plots
We can use a scatter plot to examine the relationship between two variables (x and y).
Starting with some example data:
scatter_data = [
    {"income": 30_000, "life_expectancy": 65.5},
    {"income": 35_000, "life_expectancy": 62.1},
    {"income": 50_000, "life_expectancy": 66.7},
    {"income": 55_000, "life_expectancy": 71.0},
    {"income": 70_000, "life_expectancy": 72.5},
    {"income": 75_000, "life_expectancy": 77.3},
    {"income": 90_000, "life_expectancy": 82.9},
    {"income": 95_000, "life_expectancy": 80.0},
]
Mapping the data, to get it into a format the chart likes (separate lists):
incomes = []
expectancies = []
for item in scatter_data:
    incomes.append(item["income"])
    expectancies.append(item["life_expectancy"])

print(incomes)
print(expectancies)
[30000, 35000, 50000, 55000, 70000, 75000, 90000, 95000]
[65.5, 62.1, 66.7, 71.0, 72.5, 77.3, 82.9, 80.0]
from plotly.express import scatter

fig = scatter(x=incomes, y=expectancies, height=350,
    title="Life Expectancy by Income",
    labels={"x": "Income", "y": "Life Expectancy (years)"}
)
fig.show()
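One way to put a number on the relationship the scatter plot shows is the Pearson correlation coefficient. The sketch below is an optional aside rather than part of the charting workflow: the `pearson` function is a hypothetical helper written with the standard library only, applied to the same two lists.

```python
from math import sqrt

incomes = [30000, 35000, 50000, 55000, 70000, 75000, 90000, 95000]
expectancies = [65.5, 62.1, 66.7, 71.0, 72.5, 77.3, 82.9, 80.0]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # how the two variables move together around their means
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # how much each variable varies on its own
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


r = pearson(incomes, expectancies)
print(round(r, 2))
```

A value close to 1 indicates a strong positive linear relationship, which matches the upward trend of the points in the chart.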
23.1.4 Pie Charts
Starting with some example data:
pie_data = [
    {"company": "Company X", "market_share": 0.55},
    {"company": "Company Y", "market_share": 0.30},
    {"company": "Company Z", "market_share": 0.15}
]
Mapping the data, to get it into a format the chart likes (separate lists):
companies = []
market_shares = []
for item in pie_data:
    companies.append(item["company"])
    market_shares.append(item["market_share"])

print(companies)
print(market_shares)
['Company X', 'Company Y', 'Company Z']
[0.55, 0.3, 0.15]
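A pie chart presents its slices as parts of a whole, so a quick sanity check on the values can be worthwhile before charting. The sketch below assumes the market shares are fractions that should add up to 1. The chart can be drawn either way; the check is only there to catch data entry mistakes.

```python
from math import isclose

companies = ["Company X", "Company Y", "Company Z"]
market_shares = [0.55, 0.30, 0.15]

# floats rarely add up exactly, so compare with a tolerance
total = sum(market_shares)
shares_are_complete = isclose(total, 1.0)
print(shares_are_complete)

# human-readable percentages, handy for checking against the chart
for company, share in zip(companies, market_shares):
    print(f"{company}: {share:.0%}")
```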
from plotly.express import pie

# names sets the slice labels; values sets the slice sizes
fig = pie(names=companies, values=market_shares, height=350,
    title="Market Share by Company",
)
fig.show()
23.1.5 Histograms
Starting with some example data:
histo_data = [
    {"user": "User A", "average_opinion": 0.1},
    {"user": "User B", "average_opinion": 0.4},
    {"user": "User C", "average_opinion": 0.4},
    {"user": "User D", "average_opinion": 0.8},
    {"user": "User E", "average_opinion": 0.86},
    {"user": "User F", "average_opinion": 0.75},
    {"user": "User G", "average_opinion": 0.90},
    {"user": "User H", "average_opinion": 0.99},
]
Mapping the data, to get it into a format the chart likes (in this case, a single list):
opinions = [item["average_opinion"] for item in histo_data]
print(opinions)
[0.1, 0.4, 0.4, 0.8, 0.86, 0.75, 0.9, 0.99]
from plotly.express import histogram

fig = histogram(x=opinions, height=350, nbins=5,
    title="User Average Opinions",
    labels={"x": "Average Opinion"}
)
fig.show()
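To get a feel for what a histogram does with the list, the sketch below counts the values by hand. It assumes five equal-width bins over the range 0 to 1; plotly chooses its own bin edges, so the bars in the chart may not match these counts exactly.

```python
opinions = [0.1, 0.4, 0.4, 0.8, 0.86, 0.75, 0.9, 0.99]

NBINS = 5             # assumed number of equal-width bins
LOW, HIGH = 0.0, 1.0  # assumed range of the opinion scores
width = (HIGH - LOW) / NBINS

counts = [0] * NBINS
for value in opinions:
    # index of the bin this value falls into;
    # a value on the top edge (1.0) is placed in the last bin
    index = min(int((value - LOW) / width), NBINS - 1)
    counts[index] += 1

# print a simple text histogram, one row per bin
for i, count in enumerate(counts):
    start = LOW + i * width
    print(f"{start:.1f} to {start + width:.1f}: {'#' * count}")
```

Each row of output corresponds to one bar: the more values that fall in a bin, the taller the bar.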