Appendix N — Scatter Plot Trendlines w/ `plotly`

Adding a trendline to a chart can help us examine the relationship between variables.

In plotly express, the scatter function is the only chart function that supports trendlines.

To illustrate how to add trendlines, let’s revisit the previous scatter plot example:

scatter_data = [
    {"income": 30_000, "life_expectancy": 65.5},
    {"income": 35_000, "life_expectancy": 62.1},
    {"income": 50_000, "life_expectancy": 66.7},
    {"income": 55_000, "life_expectancy": 71.0},
    {"income": 70_000, "life_expectancy": 72.5},
    {"income": 75_000, "life_expectancy": 77.3},
    {"income": 90_000, "life_expectancy": 82.9},
    {"income": 95_000, "life_expectancy": 80.0},
]

incomes = []
expectancies = []
for item in scatter_data:
    incomes.append(item["income"])
    expectancies.append(item["life_expectancy"])

from plotly.express import scatter

fig = scatter(x=incomes, y=expectancies, height=350,
                title="Life Expectancy by Income",
                labels={"x": "Income", "y": "Life Expectancy (years)"},
)
fig.show()

Viewing the chart, it looks like there may be evidence of a trend.

N.1 Linear Trends

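The scatter function accepts trendline-related parameters, including trendline (the type of trend to fit) and trendline_color_override (the color of the trend line):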

from plotly.express import scatter

fig = scatter(x=incomes, y=expectancies, height=350,
                title="Life Expectancy by Income",
                labels={"x": "Income", "y": "Life Expectancy (years)"},
                trendline="ols", trendline_color_override="red"
)
fig.show()

FYI

Under the hood, plotly uses the statsmodels package to calculate the trend, so you may have to install that package as well.
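As a quick sanity check before requesting a trendline, the following minimal sketch tests whether statsmodels can be imported in the current environment. The variable name has_statsmodels and the suggested pip command are illustrative assumptions; the right install command depends on how your environment is managed.

```python
from importlib.util import find_spec

# find_spec returns None when the package cannot be located
has_statsmodels = find_spec("statsmodels") is not None

if not has_statsmodels:
    # Assumption: a pip-managed environment; adapt for conda or others
    print("statsmodels is not installed; try: pip install statsmodels")
```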

A linear trend assumes a straight-line relationship between the independent and dependent variables. In the context of our example, a linear trend suggests that life expectancy changes by a constant amount for each additional dollar of income. Linear regression finds the best-fit line, the one that minimizes the sum of squared residuals (differences between the actual and predicted values), under the assumption that the underlying relationship is linear.
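To make the least-squares idea concrete, here is a minimal sketch that fits a straight line to the example data by hand, using only the standard closed-form formulas for simple linear regression. The helper names fit_line and r_squared are hypothetical and are not part of plotly or statsmodels, which may compute the fit differently internally.

```python
# Same data as the scatter plot example
incomes = [30_000, 35_000, 50_000, 55_000, 70_000, 75_000, 90_000, 95_000]
expectancies = [65.5, 62.1, 66.7, 71.0, 72.5, 77.3, 82.9, 80.0]

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of cross-deviations between x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

slope, intercept = fit_line(incomes, expectancies)
r2 = r_squared(incomes, expectancies, slope, intercept)
print(f"slope={slope:.6f}, intercept={intercept:.4f}, R^2={r2:.3f}")
```

The slope is expressed in years of life expectancy per dollar of income, which is why it is so small. The intercept and R-squared should come out close to the const and R-squared values in the regression summary in this appendix.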

For linear trends only, plotly provides access to the underlying regression results summary, to tell us more about how well the trend line fits the data:

from plotly.express import get_trendline_results

results = get_trendline_results(fig)
print(results.px_fit_results.iloc[0].summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.916
Model:                            OLS   Adj. R-squared:                  0.901
Method:                 Least Squares   F-statistic:                     65.01
Date:                Fri, 18 Jul 2025   Prob (F-statistic):           0.000195
Time:                        18:02:51   Log-Likelihood:                -16.910
No. Observations:                   8   AIC:                             37.82
Df Residuals:                       6   BIC:                             37.98
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         53.9321      2.415     22.336      0.000      48.024      59.840
x1             0.0003   3.63e-05      8.063      0.000       0.000       0.000
==============================================================================
Omnibus:                        4.148   Durbin-Watson:                   2.235
Prob(Omnibus):                  0.126   Jarque-Bera (JB):                1.044
Skew:                           0.209   Prob(JB):                        0.593
Kurtosis:                       1.280   Cond. No.                     1.96e+05
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.96e+05. This might indicate that there are
strong multicollinearity or other numerical problems.

Linear regression is simple and interpretable but can be overly restrictive when the real-world data follows a more complex, non-linear pattern.

N.2 Non-linear Trends

In addition to the "ols" (Ordinary Least Squares) linear trend, we can use a "lowess" (Locally Weighted Scatterplot Smoothing) trend, which may be a better fit for non-linear relationships.

from plotly.express import scatter

fig = scatter(x=incomes, y=expectancies, height=350,
                title="Life Expectancy by Income",
                labels={"x": "Income", "y": "Life Expectancy (years)"},
                trendline="lowess", trendline_color_override="red"
)
fig.show()

LOWESS is a non-parametric method that fits many local regressions to different segments of the data. Instead of assuming a global linear relationship, it captures local patterns by fitting simple models in a small neighborhood around each point, giving nearby observations more weight than distant ones. These local fits are then combined into a smooth curve that adjusts to non-linearities in the data. A LOWESS trend can follow curves, bends, and other gradual changes in the data, making it well suited to datasets where the relationship between the variables differs across the range of the independent variable.
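The local-fitting idea can be sketched in plain Python. This is a simplified illustration, not the statsmodels implementation that plotly relies on: it uses tricube weights and one weighted straight-line fit per point, and it omits the robustness iterations that full LOWESS implementations typically add. The names tricube and lowess_sketch, and the default frac value, are assumptions made for this example.

```python
incomes = [30_000, 35_000, 50_000, 55_000, 70_000, 75_000, 90_000, 95_000]
expectancies = [65.5, 62.1, 66.7, 71.0, 72.5, 77.3, 82.9, 80.0]

def tricube(u):
    """Tricube kernel: weight 1 at u = 0, falling to 0 once |u| >= 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def lowess_sketch(xs, ys, frac=2 / 3):
    """Smooth ys with one weighted local line per point (assumes distinct xs)."""
    n = len(xs)
    k = max(2, min(n, round(frac * n)))  # size of each local neighborhood
    smoothed = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest point (itself included)
        h = sorted(abs(x - x0) for x in xs)[k - 1]
        if h == 0:
            raise ValueError("this sketch requires distinct x values")
        weights = [tricube((x - x0) / h) for x in xs]
        # Weighted least-squares line through the neighborhood
        total = sum(weights)
        mean_x = sum(w * x for w, x in zip(weights, xs)) / total
        mean_y = sum(w * y for w, y in zip(weights, ys)) / total
        sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
        sxy = sum(w * (x - mean_x) * (y - mean_y)
                  for w, x, y in zip(weights, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Evaluate the local line at x0 to get the smoothed value
        smoothed.append(mean_y + slope * (x0 - mean_x))
    return smoothed

trend = lowess_sketch(incomes, expectancies)
for income, value in zip(incomes, trend):
    print(f"{income:>7,}: {value:.2f}")
```

A smaller frac makes each neighborhood narrower, so the curve follows local wiggles more closely, while a larger frac produces a smoother curve that behaves more like a single straight line.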