7  Data Visualization with Tabular Data

We have previously covered how to create data visualizations using the plotly package.

In that introductory chapter, we passed simple lists to the chart-making functions. However, the plotly package also provides an easy-to-use, intuitive interface for working with tabular data.

Now that we know how to work with DataFrame objects, let’s revisit each of the previous examples, but this time using tabular data.

7.0.1 Line Charts, Revisited

We start with the same example data as before. Because the data is in an eligible format (in this case, a list of dictionaries), this time we can construct a DataFrame object from it:

from pandas import DataFrame

line_data = [
    {"date": "2020-10-01", "stock_price_usd": 100.00},
    {"date": "2020-10-02", "stock_price_usd": 101.01},
    {"date": "2020-10-03", "stock_price_usd": 120.20},
    {"date": "2020-10-04", "stock_price_usd": 107.07},
    {"date": "2020-10-05", "stock_price_usd": 142.42},
    {"date": "2020-10-06", "stock_price_usd": 135.35},
    {"date": "2020-10-07", "stock_price_usd": 160.60},
    {"date": "2020-10-08", "stock_price_usd": 162.62},
]

df = DataFrame(line_data)
df.head()
         date  stock_price_usd
0  2020-10-01           100.00
1  2020-10-02           101.01
2  2020-10-03           120.20
3  2020-10-04           107.07
4  2020-10-05           142.42

Because the data now lives in a DataFrame, we get to skip the mapping step and move directly to the chart-making step.

We have two options for passing this data to the chart-making function: a Series oriented approach or a DataFrame oriented approach.
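To make the skipped step concrete, here is a minimal sketch contrasting the two routes. The list comprehensions stand in for the mapping step from the introductory chapter; their exact form there is an assumption, and the data is shortened to three rows for brevity.

```python
from pandas import DataFrame

line_data = [
    {"date": "2020-10-01", "stock_price_usd": 100.00},
    {"date": "2020-10-02", "stock_price_usd": 101.01},
    {"date": "2020-10-03", "stock_price_usd": 120.20},
]

# Without a DataFrame: map each record to the value we want to plot
# (an assumed reconstruction of the earlier chapter's mapping step).
dates = [row["date"] for row in line_data]
prices = [row["stock_price_usd"] for row in line_data]

# With a DataFrame: each column already holds those values.
df = DataFrame(line_data)
same_dates = df["date"].tolist() == dates
same_prices = df["stock_price_usd"].tolist() == prices
print(same_dates, same_prices)
```

Selecting a column with df["date"] returns a Series, so no per-record mapping is needed before charting.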

7.0.1.1 Series Oriented Approach

In the Series oriented approach, we pass individual columns to the chart-making function. This works because each column is a Series, which is list-like:

from plotly.express import line

fig = line(x=df["date"], y=df["stock_price_usd"], height=350,
          title="Stock Prices over Time",
          labels={"x": "Date", "y": "Stock Price ($)"}
)
fig.show()

7.0.1.2 DataFrame Oriented Approach

Alternatively, we can use a DataFrame oriented approach, where we pass the DataFrame as the first parameter to the chart-making function:

from plotly.express import line

fig = line(df, x="date", y="stock_price_usd", height=350,
          title="Stock Prices over Time",
          labels={"date": "Date", "stock_price_usd": "Stock Price ($)"}
)
fig.show()

Notice that when we pass the DataFrame as the first parameter, the x and y parameters are strings naming the columns in that DataFrame to plot on the x and y axes, respectively. The keys of the labels parameter now reference those column names as well.

For the remaining examples, we will use this DataFrame oriented approach.

7.0.2 Bar Charts, Revisited

Constructing a DataFrame from the raw data:

bar_data = [
    {"genre": "Thriller", "viewers": 123456},
    {"genre": "Mystery", "viewers": 234567},
    {"genre": "Sci-Fi", "viewers": 987654},
    {"genre": "Fantasy", "viewers": 876543},
    {"genre": "Documentary", "viewers": 283105},
    {"genre": "Action", "viewers": 544099},
    {"genre": "Romantic Comedy", "viewers": 121212}
]
df = DataFrame(bar_data)
df.head()
         genre  viewers
0     Thriller   123456
1      Mystery   234567
2       Sci-Fi   987654
3      Fantasy   876543
4  Documentary   283105

Charting the data:

from plotly.express import bar

fig = bar(df, x="genre", y="viewers", height=350,
          title="Viewership by Genre",
          labels={"genre": "Genre", "viewers": "Viewers"}
)
fig.show()

7.0.2.1 Horizontal Bar Chart, Revisited

With categorical data, a horizontal bar chart can be a better choice than a vertical bar chart. Ideally, the bars are sorted so the largest are on top. This helps tell the story of which are the “top genres”.

Before charting, we use a pandas sorting operation to get the bars in the right order:

df.sort_values(by="viewers", inplace=True)
df.head()
             genre  viewers
6  Romantic Comedy   121212
0         Thriller   123456
1          Mystery   234567
4      Documentary   283105
5           Action   544099

Important Note

Notice that to get the bars in DESCENDING order from top to bottom, we sort the data in ASCENDING order.

Counter-intuitively, plotly draws a horizontal bar chart from the bottom up: the first row passed in becomes the bottom bar, and the last row becomes the top bar.

fig = bar(df, y="genre", x="viewers", orientation="h", height=350,
          title="Viewership by Genre",
          labels={"genre": "Genre", "viewers": "Viewers"}
)
fig.show()

7.0.3 Scatter Plots, Revisited

Constructing a DataFrame from raw data:

scatter_data = [
    {"income": 30_000, "life_expectancy": 65.5},
    {"income": 35_000, "life_expectancy": 62.1},
    {"income": 50_000, "life_expectancy": 66.7},
    {"income": 55_000, "life_expectancy": 71.0},
    {"income": 70_000, "life_expectancy": 72.5},
    {"income": 75_000, "life_expectancy": 77.3},
    {"income": 90_000, "life_expectancy": 82.9},
    {"income": 95_000, "life_expectancy": 80.0},
]
df = DataFrame(scatter_data)
df.head()
   income  life_expectancy
0   30000             65.5
1   35000             62.1
2   50000             66.7
3   55000             71.0
4   70000             72.5

Plotting the data:

from plotly.express import scatter

fig = scatter(df, x="income", y="life_expectancy", height=350,
          title="Life Expectancy by Income",
          labels={"income": "Income", "life_expectancy": "Life Expectancy (years)"}
)
fig.show()

7.0.4 Pie Charts, Revisited

Constructing a DataFrame from raw data:

pie_data = [
    {"company": "Company X", "market_share": 0.55},
    {"company": "Company Y", "market_share": 0.30},
    {"company": "Company Z", "market_share": 0.15}
]
df = DataFrame(pie_data)
df.head()
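     company  market_share
0  Company X          0.55
1  Company Y          0.30
2  Company Z          0.15

Charting the data:

from plotly.express import pie

# the names parameter picks the column of slice names;
# labels is reserved for a dictionary of display names
fig = pie(df, names="company", values="market_share", height=350,
          title="Market Share by Company",
)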
fig.show()

7.0.5 Histograms, Revisited

Constructing a DataFrame from raw data:

histo_data = [
    {"user": "User A", "average_opinion": 0.1},
    {"user": "User B", "average_opinion": 0.4},
    {"user": "User C", "average_opinion": 0.4},
    {"user": "User D", "average_opinion": 0.8},
    {"user": "User E", "average_opinion": 0.86},
    {"user": "User F", "average_opinion": 0.75},
    {"user": "User G", "average_opinion": 0.90},
    {"user": "User H", "average_opinion": 0.99},
]
df = DataFrame(histo_data)
df.head()
     user  average_opinion
0  User A             0.10
1  User B             0.40
2  User C             0.40
3  User D             0.80
4  User E             0.86

Charting the data:

from plotly.express import histogram

fig = histogram(df, x="average_opinion", height=350, nbins=5,
          title="User Average Opinions",
          labels={"average_opinion": "Average Opinion"}
)
fig.show()
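To get a feel for what a five-bin histogram of this data counts, the sketch below tallies the values with pandas' cut function. The bin edges are an assumption chosen for illustration. Plotly treats nbins as an upper bound and picks its own bin edges, so the bars in the chart may not match these counts exactly.

```python
from pandas import DataFrame, cut

histo_data = [
    {"user": "User A", "average_opinion": 0.1},
    {"user": "User B", "average_opinion": 0.4},
    {"user": "User C", "average_opinion": 0.4},
    {"user": "User D", "average_opinion": 0.8},
    {"user": "User E", "average_opinion": 0.86},
    {"user": "User F", "average_opinion": 0.75},
    {"user": "User G", "average_opinion": 0.90},
    {"user": "User H", "average_opinion": 0.99},
]
df = DataFrame(histo_data)

# Five equal-width bins on [0, 1] (assumed edges); each bin includes its right edge.
edges = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
binned = cut(df["average_opinion"], bins=edges)

# Count users per bin, keeping empty bins and the natural bin order.
counts = binned.value_counts().sort_index()
print(counts)
```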