7 Data Visualization with Tabular Data

We have previously covered how to create data visualizations using the plotly package.

In that introductory chapter, we passed simple lists to the chart-making functions, however the plotly package provides an easy-to-use, intuitive interface when working with tabular data.

Now that we know how to work with DataFrame objects, let’s revisit each of the previous examples, but this time using tabular data.

7.0.1 Line Charts, Revisited

Starting with some example data, like before, this time we construct a DataFrame object from the data (because the data is in an eligible format, in this case a list of dictionaries):

from pandas import DataFrame

line_data = [
    {"date": "2020-10-01", "stock_price_usd": 100.00},
    {"date": "2020-10-02", "stock_price_usd": 101.01},
    {"date": "2020-10-03", "stock_price_usd": 120.20},
    {"date": "2020-10-04", "stock_price_usd": 107.07},
    {"date": "2020-10-05", "stock_price_usd": 142.42},
    {"date": "2020-10-06", "stock_price_usd": 135.35},
    {"date": "2020-10-07", "stock_price_usd": 160.60},
    {"date": "2020-10-08", "stock_price_usd": 162.62},
]

df = DataFrame(line_data)
df.head()

	date	stock_price_usd
0	2020-10-01	100.00
1	2020-10-02	101.01
2	2020-10-03	120.20
3	2020-10-04	107.07
4	2020-10-05	142.42

If we construct a DataFrame from this data, we get to skip the mapping step, and move directly to the chart-making step.

Now we have a few options about how to pass this data to the chart-making function. We can use a Series oriented approach, or a DataFrame oriented approach.

7.0.1.1 `Series` Oriented Approach

In the Series oriented approach, we pass the columns to the chart-making function, because when representing a column, the series is list-like:

from plotly.express import line

fig = line(x=df["date"], y=df["stock_price_usd"], height=350,
          title="Stock Prices over Time",
          labels={"x": "Date", "y": "Stock Price ($)"}
)
fig.show()

7.0.1.2 `DataFrame` Oriented Approach

Alternatively, we can use a DataFrame oriented approach where we pass the DataFrame as the first parameter to the chart-maker function.

from plotly.express import line

fig = line(df, x="date", y="stock_price_usd", height=350,
          title="Stock Prices over Time",
          labels={"date": "Date", "stock_price_usd": "Stock Price ($)"}
)
fig.show()

Notice, when we pass the DataFrame as the first parameter, now the x and y parameters refer to string column names in that DataFrame to be plotted on the x and y axis, respectively. The labels parameter keys now reference the column names as well.

For the remaining examples, we will use this DataFrame oriented approach.

7.0.2 Bar Charts, Revisited

Constructing a DataFrame from the raw data:

bar_data = [
    {"genre": "Thriller", "viewers": 123456},
    {"genre": "Mystery", "viewers": 234567},
    {"genre": "Sci-Fi", "viewers": 987654},
    {"genre": "Fantasy", "viewers": 876543},
    {"genre": "Documentary", "viewers": 283105},
    {"genre": "Action", "viewers": 544099},
    {"genre": "Romantic Comedy", "viewers": 121212}
]
df = DataFrame(bar_data)
df.head()

	genre	viewers
0	Thriller	123456
1	Mystery	234567
2	Sci-Fi	987654
3	Fantasy	876543
4	Documentary	283105

Charting the data:

from plotly.express import bar

fig = bar(df, x="genre", y="viewers", height=350,
          title="Viewership by Genre",
          labels={"genre": "Genre", "viewers": "Viewers"}
)
fig.show()

7.0.2.1 Horizontal Bar Chart, Revisited

With categorical data, a horizontal bar chart can be a better choice than a vertical bar chart. Ideally, the bars are sorted so the largest are on top. This helps tell the story of which are the “top genres”.

Before charting, we use a pandas sorting operation to get the bars in the right order:

df.sort_values(by="viewers", inplace=True)
df.head()

	genre	viewers
6	Romantic Comedy	121212
0	Thriller	123456
1	Mystery	234567
4	Documentary	283105
5	Action	544099

Important Note

Notice, here in order to get bars in DESCENDING order, we sort the data in ASCENDING order.

Oddly, and counter-intuitively, plotly plots the data in reverse order as it was passed in.

fig = bar(df, y="genre", x="viewers", orientation="h", height=350,
          title="Viewership by Genre",
          labels={"genre": "Genre", "viewers": "Viewers"}
)
fig.show()

7.0.3 Scatter Plots, Revisited

Constructing a DataFrame from raw data:

scatter_data = [
    {"income": 30_000, "life_expectancy": 65.5},
    {"income": 35_000, "life_expectancy": 62.1},
    {"income": 50_000, "life_expectancy": 66.7},
    {"income": 55_000, "life_expectancy": 71.0},
    {"income": 70_000, "life_expectancy": 72.5},
    {"income": 75_000, "life_expectancy": 77.3},
    {"income": 90_000, "life_expectancy": 82.9},
    {"income": 95_000, "life_expectancy": 80.0},
]
df = DataFrame(scatter_data)
df.head()

	income	life_expectancy
0	30000	65.5
1	35000	62.1
2	50000	66.7
3	55000	71.0
4	70000	72.5

Plotting the data:

from plotly.express import scatter

fig = scatter(df, x="income", y="life_expectancy", height=350,
          title="Life Expectancy by Income",
          labels={"income": "Income", "life_expectancy": "Life Expectancy (years)"}
)
fig.show()

7.0.4 Pie Charts, Revisited

Constructing a DataFrame from raw data:

pie_data = [
    {"company": "Company X", "market_share": 0.55},
    {"company": "Company Y", "market_share": 0.30},
    {"company": "Company Z", "market_share": 0.15}
]
df = DataFrame(pie_data)
df.head()

	company	market_share
0	Company X	0.55
1	Company Y	0.30
2	Company Z	0.15

from plotly.express import pie

fig = pie(df, labels="company", values="market_share", height=350,
          title="Market Share by Company",
)
fig.show()

7.0.5 Histograms, Revisited

Constructing a DataFrame from raw data:

histo_data = [
    {"user": "User A", "average_opinion": 0.1},
    {"user": "User B", "average_opinion": 0.4},
    {"user": "User C", "average_opinion": 0.4},
    {"user": "User D", "average_opinion": 0.8},
    {"user": "User E", "average_opinion": 0.86},
    {"user": "User F", "average_opinion": 0.75},
    {"user": "User G", "average_opinion": 0.90},
    {"user": "User H", "average_opinion": 0.99},
]
df = DataFrame(histo_data)
df.head()

	user	average_opinion
0	User A	0.10
1	User B	0.40
2	User C	0.40
3	User D	0.80
4	User E	0.86

Charting the data:

from plotly.express import histogram

fig = histogram(df, x="average_opinion", height=350, nbins=5,
          title="User Average Opinions",
          labels={"average_opinion": "Average Opinion"}
)
fig.show()

7.0.1 Line Charts, Revisited

7.0.1.1 Series Oriented Approach

7.0.1.2 DataFrame Oriented Approach

7.0.2 Bar Charts, Revisited

7.0.2.1 Horizontal Bar Chart, Revisited

7.0.3 Scatter Plots, Revisited

7.0.4 Pie Charts, Revisited

7.0.5 Histograms, Revisited

7.0.1.1 `Series` Oriented Approach

7.0.1.2 `DataFrame` Oriented Approach