23  Data Visualization in Python

Python makes it easy to create data visualizations with the help of a third-party package. There are a handful of charting libraries to choose from, such as matplotlib, seaborn, and altair, but the package we recommend is plotly.

Whichever package we choose, we should endeavor to follow data visualization best practices, such as adding a title and axis labels to our charts.

23.1 The plotly Package

Let’s explore how to create some basic charts using the plotly package, focusing on a submodule called Plotly Express, which provides a number of chart-making functions, such as bar, line, pie, and scatter.

In practice, we first decide which chart we would like to make, and then we consult the docs for the corresponding chart-maker function.

The examples below provide an overview of the most popular basic chart types.

Making the charts is the easy part. The hard part is using our data processing techniques to get the data into the format the chart-maker function expects. In most cases, these functions expect separate lists of simple values: one to plot on the x-axis and one to plot on the y-axis.
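
As a minimal sketch of that reshaping step, the hypothetical helper below pulls one key out of a list of dictionaries, producing the kind of simple list the chart-maker functions accept. The name `pluck` and the sample `records` are made up for illustration and are not part of plotly:

```python
def pluck(records, key):
    """Return a list holding the value stored under `key` in each record."""
    return [record[key] for record in records]


# Illustrative records, invented for this sketch
records = [
    {"city": "Boston", "temp_f": 61},
    {"city": "Denver", "temp_f": 55},
    {"city": "Miami", "temp_f": 84},
]

# One list per axis, in corresponding order
cities = pluck(records, "city")
temps = pluck(records, "temp_f")

print(cities)
print(temps)
```

The examples in the rest of this chapter perform the same mapping with an explicit loop, which does the same job in a more step-by-step way.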

23.1.1 Line Charts

Starting with some example data:

line_data = [
    {"date": "2020-10-01", "stock_price_usd": 100.00},
    {"date": "2020-10-02", "stock_price_usd": 101.01},
    {"date": "2020-10-03", "stock_price_usd": 120.20},
    {"date": "2020-10-04", "stock_price_usd": 107.07},
    {"date": "2020-10-05", "stock_price_usd": 142.42},
    {"date": "2020-10-06", "stock_price_usd": 135.35},
    {"date": "2020-10-07", "stock_price_usd": 160.60},
    {"date": "2020-10-08", "stock_price_usd": 162.62},
]

Mapping the data, to get it into a format the chart likes (separate lists):

dates = []
prices = []

for item in line_data:
    dates.append(item["date"])
    prices.append(item["stock_price_usd"])

print(dates)
print(prices)

['2020-10-01', '2020-10-02', '2020-10-03', '2020-10-04', '2020-10-05', '2020-10-06', '2020-10-07', '2020-10-08']
[100.0, 101.01, 120.2, 107.07, 142.42, 135.35, 160.6, 162.62]

from plotly.express import line

fig = line(x=dates, y=prices, height=350,
          title="Stock Prices over Time",
          labels={"x": "Date", "y": "Stock Price ($)"}
)
fig.show()
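
A line chart connects the points in the order they are supplied, so the records should be in chronological order before mapping. The example data above is already sorted. If it were not, a sketch like the following could sort it first. It assumes ISO-formatted date strings (YYYY-MM-DD), which sort chronologically when compared as text. The `unsorted_data` list is a shuffled subset of the example data, used only for illustration:

```python
from operator import itemgetter

# A shuffled subset of the example data (illustrative)
unsorted_data = [
    {"date": "2020-10-03", "stock_price_usd": 120.20},
    {"date": "2020-10-01", "stock_price_usd": 100.00},
    {"date": "2020-10-04", "stock_price_usd": 107.07},
    {"date": "2020-10-02", "stock_price_usd": 101.01},
]

# Sort the records by date first, then map them into separate lists
chronological = sorted(unsorted_data, key=itemgetter("date"))

dates = [item["date"] for item in chronological]
prices = [item["stock_price_usd"] for item in chronological]

print(dates)
print(prices)
```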

23.1.2 Bar Charts

Starting with some example data:

bar_data = [
    {"genre": "Thriller", "viewers": 123456},
    {"genre": "Mystery", "viewers": 234567},
    {"genre": "Sci-Fi", "viewers": 987654},
    {"genre": "Fantasy", "viewers": 876543},
    {"genre": "Documentary", "viewers": 283105},
    {"genre": "Action", "viewers": 544099},
    {"genre": "Romantic Comedy", "viewers": 121212}
]

Mapping the data, to get it into a format the chart likes (separate lists):

genres = []
viewers = []

for item in bar_data:
    genres.append(item["genre"])
    viewers.append(item["viewers"])

print(genres)
print(viewers)

['Thriller', 'Mystery', 'Sci-Fi', 'Fantasy', 'Documentary', 'Action', 'Romantic Comedy']
[123456, 234567, 987654, 876543, 283105, 544099, 121212]

from plotly.express import bar

fig = bar(x=genres, y=viewers, height=350,
          title="Viewership by Genre",
          labels={"x": "Genre", "y": "Viewers"}
)
fig.show()

23.1.2.1 Horizontal Bar Chart

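A horizontal bar chart, with the bars sorted so the largest appears on top, is often easier to read. Plotly draws horizontal bars from the bottom up, so we sort the data in ascending order to place the largest bar on top: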

from operator import itemgetter

sorted_bar_data = sorted(bar_data, key=itemgetter("viewers"))

genres = []
viewers = []
for item in sorted_bar_data:
    genres.append(item["genre"])
    viewers.append(item["viewers"])
print(genres)
print(viewers)

['Romantic Comedy', 'Thriller', 'Mystery', 'Documentary', 'Action', 'Fantasy', 'Sci-Fi']
[121212, 123456, 234567, 283105, 544099, 876543, 987654]

Important Note

When sorting the data, we have to sort BEFORE mapping, to ensure the two resulting lists are in corresponding order!
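
To see why the order of operations matters, the sketch below uses a shortened copy of the example data and compares the two approaches. Sorting the mapped lists independently breaks the pairing between each genre and its viewer count, while sorting the records first keeps them aligned. The variable names are illustrative:

```python
from operator import itemgetter

# A shortened copy of the bar chart data (illustrative)
records = [
    {"genre": "Thriller", "viewers": 123456},
    {"genre": "Mystery", "viewers": 234567},
    {"genre": "Action", "viewers": 544099},
]

# WRONG: map first, then sort each list on its own.
# Genres end up alphabetical and viewers end up numeric,
# so the values at each position no longer belong together.
wrong_genres = sorted(item["genre"] for item in records)
wrong_viewers = sorted(item["viewers"] for item in records)
wrong_pairs = list(zip(wrong_genres, wrong_viewers))
print(wrong_pairs)

# RIGHT: sort the records first, then map.
sorted_records = sorted(records, key=itemgetter("viewers"))
right_genres = [item["genre"] for item in sorted_records]
right_viewers = [item["viewers"] for item in sorted_records]
right_pairs = list(zip(right_genres, right_viewers))
print(right_pairs)
```

In the first result, "Action" is paired with the count that belongs to "Thriller", which would produce a misleading chart without raising any error.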

For a horizontal bar chart, we set the orientation parameter to "h", and also swap the X and Y references:

fig = bar(y=genres, x=viewers, orientation="h", height=350,
          title="Viewership by Genre",
          labels={"y": "Genre", "x": "Viewers"}
)
fig.show()

23.1.3 Scatter Plots

We can use a scatter plot to examine the relationship between two variables (x and y).

Starting with some example data:

scatter_data = [
    {"income": 30_000, "life_expectancy": 65.5},
    {"income": 30_000, "life_expectancy": 62.1},
    {"income": 50_000, "life_expectancy": 66.7},
    {"income": 50_000, "life_expectancy": 71.0},
    {"income": 70_000, "life_expectancy": 72.5},
    {"income": 70_000, "life_expectancy": 77.3},
    {"income": 90_000, "life_expectancy": 82.9},
    {"income": 90_000, "life_expectancy": 80.0},
]

Mapping the data, to get it into a format the chart likes (separate lists):

incomes = []
expectancies = []

for item in scatter_data:
    incomes.append(item["income"])
    expectancies.append(item["life_expectancy"])

print(incomes)
print(expectancies)

[30000, 30000, 50000, 50000, 70000, 70000, 90000, 90000]
[65.5, 62.1, 66.7, 71.0, 72.5, 77.3, 82.9, 80.0]

from plotly.express import scatter

fig = scatter(x=incomes, y=expectancies, height=350,
          title="Life Expectancy by Income",
          labels={"x": "Income", "y": "Life Expectancy (years)"}
)
fig.show()
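
A scatter plot shows the relationship visually. As an optional complement, the sketch below summarizes the same relationship with a single number, the Pearson correlation coefficient. This measure is not used elsewhere in this chapter, and the `pearson` helper is a hypothetical name written out by hand for illustration:

```python
from math import sqrt

incomes = [30000, 30000, 50000, 50000, 70000, 70000, 90000, 90000]
expectancies = [65.5, 62.1, 66.7, 71.0, 72.5, 77.3, 82.9, 80.0]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of the products of each pair's deviations from the means
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / sqrt(spread_x * spread_y)


r = pearson(incomes, expectancies)
print(round(r, 3))
```

A value close to 1 indicates a strong positive linear relationship, which matches the upward trend visible in the chart.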

23.1.4 Pie Charts

Starting with some example data:

pie_data = [
    {"company": "Company X", "market_share": 0.55},
    {"company": "Company Y", "market_share": 0.30},
    {"company": "Company Z", "market_share": 0.15}
]

Mapping the data, to get it into a format the chart likes (separate lists):

companies = []
market_shares = []

for item in pie_data:
    companies.append(item["company"])
    market_shares.append(item["market_share"])

print(companies)
print(market_shares)

['Company X', 'Company Y', 'Company Z']
[0.55, 0.3, 0.15]

from plotly.express import pie

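# the names parameter supplies the label for each slice
# (the labels parameter is a dictionary for renaming, as in the charts above)
fig = pie(names=companies, values=market_shares, height=350,
          title="Market Share by Company",
)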
fig.show()
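
A pie chart represents parts of a whole, so it can be worth checking that the shares add up to 1 before plotting. The sketch below is one way to do that. It uses `math.isclose` because sums of floating-point numbers can be off by a tiny amount, and the variable names `shares_are_complete` and `percent_labels` are illustrative:

```python
from math import isclose

companies = ["Company X", "Company Y", "Company Z"]
market_shares = [0.55, 0.30, 0.15]

# Compare with a tolerance rather than ==, to allow for float rounding
total = sum(market_shares)
shares_are_complete = isclose(total, 1.0)
print(shares_are_complete)

# Format each share as a whole-number percentage for display
percent_labels = [
    f"{company}: {share:.0%}"
    for company, share in zip(companies, market_shares)
]
print(percent_labels)
```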

23.1.5 Histograms

Starting with some example data:

histo_data = [
    {"user": "User A", "average_opinion": 0.1},
    {"user": "User B", "average_opinion": 0.4},
    {"user": "User C", "average_opinion": 0.4},
    {"user": "User D", "average_opinion": 0.8},
    {"user": "User E", "average_opinion": 0.86},
    {"user": "User F", "average_opinion": 0.75},
    {"user": "User G", "average_opinion": 0.90},
    {"user": "User H", "average_opinion": 0.99},
]

Mapping the data, to get it into a format the chart likes (a single list of values, since a histogram plots only one variable):

opinions = [item["average_opinion"] for item in histo_data]
print(opinions)

[0.1, 0.4, 0.4, 0.8, 0.86, 0.75, 0.9, 0.99]

from plotly.express import histogram

fig = histogram(x=opinions, height=350, nbins=5,
          title="User Average Opinions",
          labels={"x": "Average Opinion"}
)
fig.show()
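
To get a feel for what a histogram counts, the sketch below tallies the opinions by hand into five equal-width bins between 0 and 1. This is an illustration only: the fixed bin width of 0.2 is an assumption made for this sketch, and plotly chooses its own bin edges, which may differ.

```python
from collections import Counter

opinions = [0.1, 0.4, 0.4, 0.8, 0.86, 0.75, 0.9, 0.99]
n_bins = 5  # assumed for this sketch: five bins, each 0.2 wide

# int(value * n_bins) gives the index of the bin a value falls into;
# min() keeps a value of exactly 1.0 inside the last bin
bin_indexes = [min(int(value * n_bins), n_bins - 1) for value in opinions]
counts = Counter(bin_indexes)

# A Counter returns 0 for bins that received no values
for index in range(n_bins):
    low = index / n_bins
    high = (index + 1) / n_bins
    print(f"{low:.1f} to {high:.1f}: {counts[index]}")
```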