We previously studied how to create scatter plots with trendlines. The same approach works with tabular data stored in a pandas DataFrame.
Constructing a DataFrame from raw data:
from pandas import DataFrame
scatter_data = [
    {"income": 30_000, "life_expectancy": 65.5},
    {"income": 35_000, "life_expectancy": 62.1},
    {"income": 50_000, "life_expectancy": 66.7},
    {"income": 55_000, "life_expectancy": 71.0},
    {"income": 70_000, "life_expectancy": 72.5},
    {"income": 75_000, "life_expectancy": 77.3},
    {"income": 90_000, "life_expectancy": 82.9},
    {"income": 95_000, "life_expectancy": 80.0},
]
df = DataFrame(scatter_data)
df.head()
| | income | life_expectancy |
|---|---|---|
| 0 | 30000 | 65.5 |
| 1 | 35000 | 62.1 |
| 2 | 50000 | 66.7 |
| 3 | 55000 | 71.0 |
| 4 | 70000 | 72.5 |
Linear trends using the “ols” (ordinary least squares) trendline parameter value, which requires the statsmodels package to be installed:
from plotly.express import scatter
fig = scatter(df, x="income", y="life_expectancy", height=350,
    title="Life Expectancy by Income",
    # label keys must match the DataFrame column names, not the axis names
    labels={"income": "Income", "life_expectancy": "Life Expectancy (years)"},
    trendline="ols", trendline_color_override="red"
)
fig.show()
Non-linear trends using the “lowess” (locally weighted scatterplot smoothing) trendline parameter value, which also requires the statsmodels package:
fig = scatter(df, x="income", y="life_expectancy", height=350,
    title="Life Expectancy by Income",
    # label keys must match the DataFrame column names, not the axis names
    labels={"income": "Income", "life_expectancy": "Life Expectancy (years)"},
    trendline="lowess", trendline_color_override="red"
)
fig.show()
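The idea behind a “lowess” trendline can be sketched in plain Python: for each point, fit a small weighted straight line to its nearest neighbors and keep the fitted value at that point. The version below is a deliberately simplified illustration. The function names and the `frac` default of 2/3 are assumptions for this example, and it omits the robustness iterations that the statsmodels implementation adds, so its output will not match the plotted curve exactly.

```python
import math

# Same values as the DataFrame columns above, repeated so the example runs on its own.
incomes = [30_000, 35_000, 50_000, 55_000, 70_000, 75_000, 90_000, 95_000]
life_expectancies = [65.5, 62.1, 66.7, 71.0, 72.5, 77.3, 82.9, 80.0]


def tricube(u):
    """Weight that is 1 at the target point and falls to 0 at the window edge."""
    return (1 - abs(u) ** 3) ** 3 if abs(u) < 1 else 0.0


def local_linear_smooth(xs, ys, frac=2 / 3):
    """Simplified lowess-style smoother: one weighted linear fit per point.

    frac is the share of the data used in each local fit (assumed default).
    Assumes the x values are distinct.
    """
    n = len(xs)
    k = max(2, math.ceil(frac * n))  # number of neighbors in each window
    smoothed = []
    for x0 in xs:
        # Window half-width: distance from x0 to its k-th nearest point.
        h = sorted(abs(x - x0) for x in xs)[k - 1]
        weights = [tricube((x - x0) / h) for x in xs]
        total = sum(weights)
        # Weighted means, then the weighted least-squares slope.
        mean_x = sum(w * x for w, x in zip(weights, xs)) / total
        mean_y = sum(w * y for w, y in zip(weights, ys)) / total
        sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
        sxy = sum(w * (x - mean_x) * (y - mean_y)
                  for w, x, y in zip(weights, xs, ys))
        slope = sxy / sxx if sxx else 0.0
        smoothed.append(mean_y + slope * (x0 - mean_x))
    return smoothed


trend = local_linear_smooth(incomes, life_expectancies)
for income, value in zip(incomes, trend):
    print(f"{income:>6}: {value:.1f}")
```

Because each fitted value depends only on nearby points, the resulting curve can bend to follow local changes in the data, which a single straight line cannot do. A smaller `frac` makes the curve follow the data more closely; a larger one makes it smoother.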