8  Statistical Tests

We can use the scipy package to perform basic statistical tests.

For these examples, let’s use this familiar example dataset of monthly financial and economic indicators:

from pandas import read_csv

repo_url = "https://raw.githubusercontent.com/prof-rossetti/python-for-finance"
request_url = f"{repo_url}/main/docs/data/monthly-indicators.csv"

df = read_csv(request_url)
df.head()
timestamp cpi fed spy gld
0 2024-05-01 314.069 5.33 525.6718 215.30
1 2024-04-01 313.548 5.33 500.3636 211.87
2 2024-03-01 312.332 5.33 521.3857 205.72
3 2024-02-01 310.326 5.33 504.8645 189.31
4 2024-01-01 308.417 5.33 479.8240 188.45

8.1 Normality Tests

We can use the normaltest function from scipy.stats to conduct a normality test, to see if a given variable is normally distributed.

This function tests the null hypothesis that a sample comes from a normal distribution.

If the p-value is “small” - that is, if there is a low probability of sampling data from a normally distributed population that produces such an extreme value of the statistic - this may be taken as evidence against the null hypothesis in favor of the alternative: the data were not drawn from a normal distribution.

In this example, we pass a column or list of values to the normaltest function, which produces a result containing the statistic and p value:

from scipy.stats import normaltest

x = df["fed"]

result = normaltest(x)
print(result)
NormaltestResult(statistic=np.float64(34.68795952886342), pvalue=np.float64(2.9349809995776456e-08))

Interpreting the results:

To determine whether the data do not follow a normal distribution, compare the p-value to the significance level. Usually, a significance level (denoted as α or alpha) of 0.05 works well. A significance level of 0.05 indicates a 5% risk of concluding that the data do not follow a normal distribution when the data do follow a normal distribution.

P-value ≤ α: The data do not follow a normal distribution (Reject H0). If the p-value is less than or equal to the significance level, the decision is to reject the null hypothesis and conclude that your data do not follow a normal distribution.

P-value > α: You cannot conclude that the data do not follow a normal distribution (Fail to reject H0). If the p-value is larger than the significance level, the decision is to fail to reject the null hypothesis. You do not have enough evidence to conclude that your data do not follow a normal distribution. - source

In code, we compare the p-value to the significance level we set (in this case 0.05). If the p-value is less than or equal to that level, we reject the null hypothesis and conclude the data are not normally distributed. Otherwise, we fail to reject the null hypothesis, meaning the data could be normally distributed:

if result.pvalue <= 0.05:
    print("REJECT (NOT NORMAL)")
else:
    print("NOT ABLE TO REJECT (COULD BE NORMAL)")
REJECT (NOT NORMAL)

Looks like the federal funds rate does not have a normal distribution (as of June 28th, 2024, when this notebook was run).

How about the market?

x = df["spy"]

result = normaltest(x)
print(result)

if result.pvalue <= 0.05:
    print("REJECT (NOT NORMAL)")
else:
    print("NOT ABLE TO REJECT (COULD BE NORMAL)")
NormaltestResult(statistic=np.float64(27.560328618235523), pvalue=np.float64(1.0359783530157106e-06))
REJECT (NOT NORMAL)
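
To build intuition for how this test behaves, it can help to run it on data whose distribution we already know. The following is a minimal sketch using synthetic samples; the seed, sample sizes, and distribution parameters are assumptions chosen only for illustration and are not part of the dataset above.

```python
import numpy as np
from scipy.stats import normaltest

# synthetic data (illustrative assumptions, not from the indicators dataset)
rng = np.random.default_rng(seed=42)
normal_sample = rng.normal(loc=0, scale=1, size=500)   # drawn from a normal distribution
skewed_sample = rng.exponential(scale=1, size=500)     # strongly right-skewed, not normal

for label, sample in [("normal", normal_sample), ("exponential", skewed_sample)]:
    result = normaltest(sample)
    if result.pvalue <= 0.05:
        verdict = "REJECT (NOT NORMAL)"
    else:
        verdict = "NOT ABLE TO REJECT (COULD BE NORMAL)"
    print(label, result.pvalue, verdict)
```

The exponential sample should be rejected decisively. The normal sample will usually not be rejected, although at a 0.05 significance level roughly one in twenty truly normal samples is still rejected by chance.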

8.2 T-Tests

Reference: https://www.investopedia.com/terms/t/t-test.asp

A t-test is an inferential statistic used to determine if there is a significant difference between the means of two groups and how they are related. T-tests are used when the data sets follow a normal distribution and have unknown variances, like the data set recorded from flipping a coin 100 times.

8.2.1 T-Test Considerations

Reference: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6676026/#sec-2title

A T-Test assumes the data are approximately normally distributed. The normality tests above rejected that assumption for this dataset, so the examples below may not be the most methodologically sound. However, they should provide code examples you can adapt for other use cases in the future.

8.2.2 2 Sample T-Test

A two sample T-test is used to determine whether two independent samples come from populations with the same mean.

Let’s split the most recent rates (those on or after a cutoff date) from the rest, and see whether the recent period is statistically different.

#cutoff_date = "2022-06-01" # you can choose a different one if you'd like
cutoff_date = "2022-10-01"

rates_recent = df[df["timestamp"] >= cutoff_date]["fed"]
print(len(rates_recent))
print(rates_recent)
20
0     5.33
1     5.33
2     5.33
3     5.33
4     5.33
5     5.33
6     5.33
7     5.33
8     5.33
9     5.33
10    5.12
11    5.08
12    5.06
13    4.83
14    4.65
15    4.57
16    4.33
17    4.10
18    3.78
19    3.08
Name: fed, dtype: float64
rates_historic = df[df["timestamp"] < cutoff_date]["fed"]
print(len(rates_historic))
print(rates_historic)
214
20     2.56
21     2.33
22     1.68
23     1.21
24     0.77
       ... 
229    2.79
230    2.63
231    2.50
232    2.28
233    2.16
Name: fed, Length: 214, dtype: float64

Reference: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

Calculate the T-test for the means of two independent samples of scores.

This is a test for the null hypothesis that 2 independent samples have identical average (expected) values. This test assumes that the populations have identical variances by default.

The t-test quantifies the difference between the arithmetic means of the two samples. The p-value quantifies the probability of observing as or more extreme values assuming the null hypothesis, that the samples are drawn from populations with the same population means, is true. A p-value larger than a chosen threshold (e.g. 5% or 1%) indicates that our observation is not so unlikely to have occurred by chance. Therefore, we do not reject the null hypothesis of equal population means. If the p-value is smaller than our threshold, then we have evidence against the null hypothesis of equal population means.

print(rates_recent.var())
print(rates_historic.var())
0.4033105263157895
2.7065506493791407
from scipy.stats import ttest_ind

result = ttest_ind(rates_recent, rates_historic)
print(result)

if result.pvalue <= 0.05:
    print("REJECT (MEANS NOT THE SAME)")
else:
    print("NOT ABLE TO REJECT (MEANS COULD BE THE SAME)")
TtestResult(statistic=np.float64(9.743816217891522), pvalue=np.float64(5.021356895595338e-19), df=np.float64(232.0))
REJECT (MEANS NOT THE SAME)
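
The two variances printed above (about 0.40 versus 2.71) differ substantially, while ttest_ind assumes equal population variances by default. One commonly used option in that situation is Welch's t-test, which scipy exposes through the equal_var=False parameter of the same function. The sketch below uses synthetic samples; the seed, sizes, means, and spreads are assumptions chosen only to loosely resemble the two groups, so it illustrates the call rather than reproducing the results above.

```python
import numpy as np
from scipy.stats import ttest_ind

# synthetic stand-ins for the two groups (illustrative assumptions only)
rng = np.random.default_rng(seed=42)
sample_a = rng.normal(loc=5.0, scale=0.6, size=20)    # small group, low variance
sample_b = rng.normal(loc=1.2, scale=1.6, size=214)   # large group, high variance

# default: Student's t-test, assumes equal population variances
student = ttest_ind(sample_a, sample_b)

# Welch's t-test: does not assume equal population variances
welch = ttest_ind(sample_a, sample_b, equal_var=False)

print("STUDENT:", student.statistic, student.pvalue)
print("WELCH:", welch.statistic, welch.pvalue)

if welch.pvalue <= 0.05:
    print("REJECT (MEANS NOT THE SAME)")
else:
    print("NOT ABLE TO REJECT (MEANS COULD BE THE SAME)")
```

To try this on the real data, the same parameter could be passed in the call above, as in ttest_ind(rates_recent, rates_historic, equal_var=False).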

8.2.3 1 Sample T-Test

Reference: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_1samp.html

Calculate the T-test for the mean of ONE group of scores.

This is a test for the null hypothesis that the expected value (mean) of a sample of independent observations is equal to the given population mean, popmean.

Under certain assumptions about the population from which a sample is drawn, the confidence interval with confidence level 95% is expected to contain the true population mean in 95% of sample replications.

Suppose we wish to test the null hypothesis that the mean of the fed funds rates is equal to 2.5%.

We pass as parameters the column of values, and the population mean we wish to test. Then we inspect the p value to interpret the results.

from scipy.stats import ttest_1samp

x = df["fed"]
print(x.mean())

popmean = 2.5 # for example
result = ttest_1samp(x, popmean=popmean)
print(result)

if result.pvalue <= 0.05:
    print("REJECT (MEAN NOT EQUAL TO POPMEAN)")
else:
    print("NOT ABLE TO REJECT (MEAN COULD BE EQUAL TO POPMEAN)")
1.5887606837606836
TtestResult(statistic=np.float64(-7.415864219982758), pvalue=np.float64(2.2306437030862214e-12), df=np.int64(233))
REJECT (MEAN NOT EQUAL TO POPMEAN)
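
To see where the statistic and p-value come from, we can compute them by hand: the statistic is the difference between the sample mean and popmean, divided by the standard error of the mean, and the two-sided p-value is the probability in both tails of a t distribution with n - 1 degrees of freedom. This is a minimal sketch on synthetic data; the seed and distribution parameters are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import t, ttest_1samp

# synthetic sample (illustrative assumption, not the fed funds data)
rng = np.random.default_rng(seed=0)
x = rng.normal(loc=1.6, scale=1.9, size=234)
popmean = 2.5

n = len(x)
std_err = x.std(ddof=1) / np.sqrt(n)          # standard error of the mean
t_stat = (x.mean() - popmean) / std_err       # one-sample t statistic
p_value = 2 * t.sf(abs(t_stat), df=n - 1)     # two-sided p-value

# compare against scipy's result
result = ttest_1samp(x, popmean=popmean)
print(t_stat, result.statistic)
print(p_value, result.pvalue)
```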

Finally, we can access information about the confidence interval for this test:

ci = result.confidence_interval(confidence_level=0.95)
print(ci)
ConfidenceInterval(low=np.float64(1.346668668631088), high=np.float64(1.8308526988902791))
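
The interval above is centered on the sample mean, not on popmean. As a rough check, the same kind of interval can be rebuilt by hand as the sample mean plus or minus a critical t value times the standard error. The sketch below uses synthetic data; the seed and distribution parameters are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import sem, t, ttest_1samp

# synthetic sample (illustrative assumption, not the fed funds data)
rng = np.random.default_rng(seed=0)
x = rng.normal(loc=1.6, scale=1.9, size=234)

n = len(x)
confidence_level = 0.95
# critical value leaving 2.5% in each tail of the t distribution
t_crit = t.ppf(1 - (1 - confidence_level) / 2, df=n - 1)
margin = t_crit * sem(x)                      # sem uses ddof=1 by default
low, high = x.mean() - margin, x.mean() + margin
print(low, high)

# compare against the interval reported by scipy
ci = ttest_1samp(x, popmean=2.5).confidence_interval(confidence_level=confidence_level)
print(ci.low, ci.high)
```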