Stats 2 Extra Activity 5

Statistical Data Analysis Project

In this project, we explore statistical data analysis by fitting a known probability distribution to the given dataset. We begin by visualizing the data with a histogram and then apply two estimation methods, the method of moments and maximum likelihood estimation (MLE), to estimate the parameters of the selected distribution. Finally, we use the bootstrap method to construct approximate confidence intervals, which quantify the sampling variability of our parameter estimates.

1. Histogram for the Data

The first step involves creating a histogram to visualize the distribution of the given data. This will help us understand the underlying patterns and identify the possible distribution to fit.

2. Fit a Known Distribution

We fit a suitable known distribution (here, the Normal distribution) to the data. Using the scipy.stats library, we can estimate the parameters of the distribution that best fits the data. For the Normal distribution, norm.fit returns the mean and the standard deviation (not the variance).

3. Method of Moments Estimate and Maximum Likelihood Estimate

The method of moments and maximum likelihood estimates are calculated for the selected distribution. These methods provide estimates for the unknown parameters, such as the mean and variance, based on the data.
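
For the Normal model used in this project, both estimators have closed forms. The following is a brief worked derivation, where $x_1, \dots, x_n$ denote the observations and $\bar{x}$ their average (this notation is introduced here for illustration and does not appear in the original text):

```latex
\begin{align*}
\text{MoM:}\quad
  & \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x},
  \qquad
  \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 \\[4pt]
\text{MLE:}\quad
  & \ell(\mu, \sigma^2) = -\frac{n}{2}\log\!\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (x_i - \mu)^2 \\
  & \frac{\partial \ell}{\partial \mu} = 0 \;\Rightarrow\; \hat{\mu} = \bar{x},
  \qquad
  \frac{\partial \ell}{\partial \sigma^2} = 0 \;\Rightarrow\;
  \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2
\end{align*}
```

Because $\frac{1}{n}\sum x_i^2 - \bar{x}^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$, the two methods give the same estimates for the Normal distribution. Both divide by $n$, so the variance estimate is slightly smaller than the unbiased estimate that divides by $n - 1$.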

4. Bootstrap Method for Confidence Intervals

We use the bootstrap method to form approximate confidence intervals for the estimated parameters (such as mean and variance). This technique involves resampling the data with replacement and calculating the sample statistics for each resample to generate confidence intervals.
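
Written out, and assuming the percentile method (which is what the code later in this post computes with np.percentile), the interval can be sketched as follows. The symbols $B$, $\theta$, and $q_p$ are notation introduced here for illustration:

```latex
\begin{align*}
& \text{For } b = 1, \dots, B:\quad
  x^{*(b)} = \bigl(x^{*(b)}_1, \dots, x^{*(b)}_n\bigr)
  \text{ drawn with replacement from } (x_1, \dots, x_n),
  \qquad
  \hat{\theta}^{*(b)} = \hat{\theta}\bigl(x^{*(b)}\bigr) \\[4pt]
& \text{Approximate } 100(1-\alpha)\% \text{ interval:}\quad
  \Bigl[\, q_{\alpha/2}\bigl(\hat{\theta}^{*(1)}, \dots, \hat{\theta}^{*(B)}\bigr),\;
          q_{1-\alpha/2}\bigl(\hat{\theta}^{*(1)}, \dots, \hat{\theta}^{*(B)}\bigr) \Bigr]
\end{align*}
```

Here $q_p$ is the empirical $p$-quantile of the $B$ bootstrap replicates, and $\hat{\theta}$ stands for whichever statistic is of interest (the mean or the variance in this project).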

The code below is written for Google Colab.

Activity 2.1

Code for histogram of data

Histogram of Data

import numpy as np
import matplotlib.pyplot as plt

# Load the data into a NumPy array
data = np.array([75, 80, 20, 45, 42, 40, 40, 50, 87, 10, 60, 20, 30, 80, 60, 30, 40, 60, 22, 18,
80, 23, 48, 67, 35, 80, 52, 60, 50, 42, 70, 15, 65, 72, 30, 30, 80, 60, 50, 70, 50, 50, 60, 50,
27, 60, 10, 50, 60, 17, 50, 40, 77, 65, 30, 45, 25, 33, 40, 70, 33, 40, 30, 50, 50, 80, 30, 50,
43, 30, 60, 30, 47, 37, 37, 45, 50, 30, 30, 87, 62, 35, 47, 70, 50, 60, 50, 60, 80, 73, 30, 90,
70, 82, 60, 80, 30, 80, 10, 40, 60, 65, 30, 10, 40, 40, 30, 27, 50, 75, 37, 40, 65, 55, 80, 40,
60, 43, 30, 20, 60, 40, 27, 50, 40, 70, 40, 97, 55, 30, 47, 40, 80, 80, 27, 85, 40, 37, 30, 35,
30, 30, 10, 40, 65, 95, 90, 60, 50, 65, 17, 53, 40, 60, 43, 55, 57, 50, 60, 80, 50, 80, 55, 60,
20, 60, 55, 80, 60, 80, 80, 50, 60, 45, 35, 28, 60, 37, 25, 38, 60, 70, 60, 60, 25, 47, 33, 57,
90, 50, 30, 20, 22, 47, 47, 30, 33, 30, 30, 80, 50, 70, 57, 20, 62, 33, 80, 20, 40, 25, 35, 40,
73, 60, 60, 23, 20, 17, 50, 70, 57, 35, 67, 50, 30, 72, 80, 20, 60, 50, 50, 47, 35, 37, 40, 60,
40, 37, 50, 52, 67, 40, 28, 20, 20, 35, 30, 60, 35, 20, 30, 50, 40, 63, 40, 47, 0, 70, 80, 55,
18, 63, 75, 30, 60, 80, 25, 87, 62, 90, 30, 20, 50, 30, 23, 30, 83, 48, 53, 55, 25, 52, 55, 62,
40, 80, 45, 70, 30, 73, 40, 50, 60, 20, 30, 40, 33, 70, 35, 47, 83, 33, 20, 30, 40, 42, 52, 25,
40, 20, 65, 47, 40, 40, 75, 47, 63, 45, 10, 80, 48, 60, 87, 10, 80, 20, 45, 40, 30, 60, 100, 38,
60, 25, 42, 82, 48, 40, 47, 50, 72, 60, 50, 57, 48, 85, 60, 70, 45, 57, 50, 30, 60, 40, 30, 55,
52, 55, 28, 47, 40, 35, 38, 43, 50, 50, 35, 53, 30, 35, 33, 52, 77, 60, 20, 40, 70, 43, 40, 45,
58, 40, 50, 67, 40, 45, 10, 20, 55, 10, 40, 77, 60, 72, 25, 30, 90, 45, 40, 50, 50, 30, 47, 100,
57, 40, 57, 33, 10, 20, 37, 27, 40, 30, 60, 30, 40, 40, 30, 77, 47, 70, 27, 40, 48, 30, 70, 35,
23, 70, 30, 30, 77, 20, 50, 30, 30, 80, 95, 20, 40, 55, 40, 20, 43, 60, 50, 42, 35, 40, 70, 60,
60, 7, 50, 70, 50, 27, 43, 18, 40, 33, 60, 85, 45, 25, 70, 30, 50, 45, 40, 50, 15, 42, 40, 50,
60, 62, 40, 60, 65, 0, 40, 47, 25, 40, 50, 30, 77, 50, 50, 30, 37, 25, 50, 90, 70, 35, 33, 85,
60, 27, 30, 40, 52, 57, 20, 60, 70, 20, 28, 40, 40, 15, 20, 33, 52, 40, 10, 80, 50, 30, 38, 45,
80, 40, 67, 70, 40, 70, 35, 50, 80, 75, 33, 35, 80, 20, 40, 52, 38, 20, 50, 40, 70, 10, 45, 90,
40, 55, 40, 5, 50, 40, 40, 70, 17, 45, 93, 20, 22, 50, 20, 87, 27, 40, 50, 28, 42, 40, 30, 47,
30, 25, 5, 87, 30, 25, 65, 50, 15, 82, 40, 50, 30, 25, 65, 30, 50, 15, 55, 22, 30, 25, 10, 50,
17, 12, 23, 40, 85, 50, 50, 40, 37, 20, 50, 22, 50, 60, 77, 35, 50, 60, 68, 65, 40, 50, 50, 30,
33, 25, 20, 55, 77, 15, 40, 30, 20, 47, 32, 55, 37, 20, 82, 47, 15, 52, 50, 65, 30, 40, 90, 20,
35, 30, 25, 35, 53, 80, 67, 60, 35, 45, 70, 70, 27, 70, 20, 27, 32, 53, 40, 73, 45, 40, 28, 60,
60, 85, 63, 23, 25, 50, 40, 37, 15, 60, 10, 70, 45, 25, 35, 35, 40, 40, 35, 20, 35, 65, 30, 77,
37, 42, 22, 30, 40, 35, 35, 42, 35, 35, 40, 22, 22, 60, 20, 55, 45, 32, 35, 65, 50, 43, 20, 30,
40, 20, 50, 40, 20, 30, 45, 20, 23, 40, 30, 55, 80, 30, 70, 40, 57, 50, 37, 77, 20, 60, 30, 45])

# Create a histogram
plt.hist(data, bins=20, edgecolor='black', color='skyblue')
plt.title('Histogram of Data')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
            

Activity 2.2

Fitting a normal distribution to the data.

Histogram with Fitted Normal Distribution

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Your data
data = np.array([75, 80, 20, 45, 42, 40, 40, 50, 87, 10, 60, 20, 30, 80, 60, 30, 40, 60, 22, 18,
80, 23, 48, 67, 35, 80, 52, 60, 50, 42, 70, 15, 65, 72, 30, 30, 80, 60, 50, 70, 50, 50, 60, 50,
27, 60, 10, 50, 60, 17, 50, 40, 77, 65, 30, 45, 25, 33, 40, 70, 33, 40, 30, 50, 50, 80, 30, 50,
43, 30, 60, 30, 47, 37, 37, 45, 50, 30, 30, 87, 62, 35, 47, 70, 50, 60, 50, 60, 80, 73, 30, 90,
70, 82, 60, 80, 30, 80, 10, 40, 60, 65, 30, 10, 40, 40, 30, 27, 50, 75, 37, 40, 65, 55, 80, 40,
60, 43, 30, 20, 60, 40, 27, 50, 40, 70, 40, 97, 55, 30, 47, 40, 80, 80, 27, 85, 40, 37, 30, 35,
30, 30, 10, 40, 65, 95, 90, 60, 50, 65, 17, 53, 40, 60, 43, 55, 57, 50, 60, 80, 50, 80, 55, 60,
20, 60, 55, 80, 60, 80, 80, 50, 60, 45, 35, 28, 60, 37, 25, 38, 60, 70, 60, 60, 25, 47, 33, 57,
90, 50, 30, 20, 22, 47, 47, 30, 33, 30, 30, 80, 50, 70, 57, 20, 62, 33, 80, 20, 40, 25, 35, 40,
73, 60, 60, 23, 20, 17, 50, 70, 57, 35, 67, 50, 30, 72, 80, 20, 60, 50, 50, 47, 35, 37, 40, 60,
40, 37, 50, 52, 67, 40, 28, 20, 20, 35, 30, 60, 35, 20, 30, 50, 40, 63, 40, 47, 0, 70, 80, 55,
18, 63, 75, 30, 60, 80, 25, 87, 62, 90, 30, 20, 50, 30, 23, 30, 83, 48, 53, 55, 25, 52, 55, 62,
40, 80, 45, 70, 30, 73, 40, 50, 60, 20, 30, 40, 33, 70, 35, 47, 83, 33, 20, 30, 40, 42, 52, 25,
40, 20, 65, 47, 40, 40, 75, 47, 63, 45, 10, 80, 48, 60, 87, 10, 80, 20, 45, 40, 30, 60, 100, 38,
60, 25, 42, 82, 48, 40, 47, 50, 72, 60, 50, 57, 48, 85, 60, 70, 45, 57, 50, 30, 60, 40, 30, 55,
52, 55, 28, 47, 40, 35, 38, 43, 50, 50, 35, 53, 30, 35, 33, 52, 77, 60, 20, 40, 70, 43, 40, 45,
58, 40, 50, 67, 40, 45, 10, 20, 55, 10, 40, 77, 60, 72, 25, 30, 90, 45, 40, 50, 50, 30, 47, 100,
57, 40, 57, 33, 10, 20, 37, 27, 40, 30, 60, 30, 40, 40, 30, 77, 47, 70, 27, 40, 48, 30, 70, 35,
23, 70, 30, 30, 77, 20, 50, 30, 30, 80, 95, 20, 40, 55, 40, 20, 43, 60, 50, 42, 35, 40, 70, 60,
60, 7, 50, 70, 50, 27, 43, 18, 40, 33, 60, 85, 45, 25, 70, 30, 50, 45, 40, 50, 15, 42, 40, 50,
60, 62, 40, 60, 65, 0, 40, 47, 25, 40, 50, 30, 77, 50, 50, 30, 37, 25, 50, 90, 70, 35, 33, 85,
60, 27, 30, 40, 52, 57, 20, 60, 70, 20, 28, 40, 40, 15, 20, 33, 52, 40, 10, 80, 50, 30, 38, 45,
80, 40, 67, 70, 40, 70, 35, 50, 80, 75, 33, 35, 80, 20, 40, 52, 38, 20, 50, 40, 70, 10, 45, 90,
40, 55, 40, 5, 50, 40, 40, 70, 17, 45, 93, 20, 22, 50, 20, 87, 27, 40, 50, 28, 42, 40, 30, 47,
30, 25, 5, 87, 30, 25, 65, 50, 15, 82, 40, 50, 30, 25, 65, 30, 50, 15, 55, 22, 30, 25, 10, 50,
17, 12, 23, 40, 85, 50, 50, 40, 37, 20, 50, 22, 50, 60, 77, 35, 50, 60, 68, 65, 40, 50, 50, 30,
33, 25, 20, 55, 77, 15, 40, 30, 20, 47, 32, 55, 37, 20, 82, 47, 15, 52, 50, 65, 30, 40, 90, 20,
35, 30, 25, 35, 53, 80, 67, 60, 35, 45, 70, 70, 27, 70, 20, 27, 32, 53, 40, 73, 45, 40, 28, 60,
60, 85, 63, 23, 25, 50, 40, 37, 15, 60, 10, 70, 45, 25, 35, 35, 40, 40, 35, 20, 35, 65, 30, 77,
37, 42, 22, 30, 40, 35, 35, 42, 35, 35, 40, 22, 22, 60, 20, 55, 45, 32, 35, 65, 50, 43, 20, 30,
40, 20, 50, 40, 20, 30, 45, 20, 23, 40, 30, 55, 80, 30, 70, 40, 57, 50, 37, 77, 20, 60, 30, 45])
# Fit a normal distribution to the data
mu, sigma = norm.fit(data)

# Plot the histogram
plt.hist(data, bins=20, density=True, edgecolor='black', alpha=0.6, color='skyblue')

# Generate points for the fitted curve
x = np.linspace(min(data), max(data), 100)
pdf = norm.pdf(x, mu, sigma)

# Plot the fitted curve
# Raw f-string (rf'...') so that \mu and \sigma are not treated as invalid escape sequences
plt.plot(x, pdf, 'r-', label=rf'Normal Fit ($\mu$={mu:.2f}, $\sigma$={sigma:.2f})')

# Customize the plot
plt.title('Histogram with Fitted Normal Distribution')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.grid(True)
plt.show()
            

Activity 2.3

Method of moments and maximum likelihood estimates for the unknown parameters.

Method of Moments (MoM) and Maximum Likelihood Estimations (MLE)

Replace the placeholder array in this code with the data from Activity 2.1.

import numpy as np

# Sample data (replace with your own data)
data = np.array([12.3, 15.6, 14.2, 13.8, 12.7, 14.5, 13.4, 12.9, 14.0, 13.3])

# Method of Moments (MoM) Estimations
# Step 1: Calculate the sample mean and sample second moment
mu_mom = np.mean(data)
sigma2_mom = np.mean(data**2) - mu_mom**2

# Maximum Likelihood Estimate (MLE)
# Step 1: MLE for mean (it is the sample mean)
mu_mle = mu_mom  # Since MLE for mean is also the sample mean

# Step 2: MLE for variance (the variance with divisor n, i.e. ddof=0)
sigma2_mle = np.var(data)  # np.var defaults to ddof=0; the unbiased version would use ddof=1

# Display results
print("Method of Moments (MoM) Estimations:")
print(f"mu_MoM = {mu_mom:.4f}")
print(f"sigma2_MoM = {sigma2_mom:.4f}")

print("\nMaximum Likelihood Estimations (MLE):")
print(f"mu_MLE = {mu_mle:.4f}")
print(f"sigma2_MLE = {sigma2_mle:.4f}")
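
As a quick sanity check, the closed-form estimates can be compared with what scipy.stats.norm.fit returns. The sketch below reuses the placeholder sample from the snippet above; the variable names (mu_fit, sigma_fit, sigma2_unbiased) are chosen here for illustration. It assumes the Normal model, for which norm.fit returns the maximum likelihood location (mean) and scale (standard deviation).

```python
import numpy as np
from scipy.stats import norm

# Placeholder sample (same illustrative values as in the snippet above)
data = np.array([12.3, 15.6, 14.2, 13.8, 12.7, 14.5, 13.4, 12.9, 14.0, 13.3])

# norm.fit returns (loc, scale) = (mean, standard deviation) by maximum likelihood
mu_fit, sigma_fit = norm.fit(data)

# Closed-form estimates
mu_hat = np.mean(data)
sigma2_hat = np.var(data)               # divisor n (ddof=0): MoM and MLE
sigma2_unbiased = np.var(data, ddof=1)  # divisor n - 1: unbiased estimate

# The fitted scale is a standard deviation, so square it before comparing
print("Means agree:", np.isclose(mu_fit, mu_hat))
print("Variances agree:", np.isclose(sigma_fit**2, sigma2_hat))
print("Unbiased variance is larger:", sigma2_unbiased > sigma2_hat)
```

Note that the fitted scale must be squared before it is compared with a variance, and that the unbiased estimate (ddof=1) differs from both the MoM and MLE values.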
        

Activity 2.4

Use the bootstrap to find confidence intervals

Bootstrap Confidence Interval Estimation for Mean and Variance

This code generates bootstrap confidence intervals for the mean and variance of the sample data.

import numpy as np

# Sample data (replace with your own data)
data = np.array([12.3, 15.6, 14.2, 13.8, 12.7, 14.5, 13.4, 12.9, 14.0, 13.3])

# Number of bootstrap samples
n_bootstrap = 1000
alpha = 0.05  # Significance level for 95% confidence interval

# Function to compute bootstrap replicates of the mean and variance
def bootstrap_statistic(data, n_bootstrap):
    # Initialize lists to store bootstrap statistics
    means = []
    variances = []

    # Resampling with replacement
    for _ in range(n_bootstrap):
        resample = np.random.choice(data, size=len(data), replace=True)
        means.append(np.mean(resample))
        variances.append(np.var(resample))

    return np.array(means), np.array(variances)

# Generate bootstrap statistics
bootstrap_means, bootstrap_variances = bootstrap_statistic(data, n_bootstrap)

# Calculate the percentiles for the confidence intervals
mean_lower = np.percentile(bootstrap_means, 100*alpha/2)
mean_upper = np.percentile(bootstrap_means, 100*(1-alpha/2))
var_lower = np.percentile(bootstrap_variances, 100*alpha/2)
var_upper = np.percentile(bootstrap_variances, 100*(1-alpha/2))

# Display the confidence intervals
print(f"Bootstrap Confidence Interval for Mean (95% CI): ({mean_lower:.4f}, {mean_upper:.4f})")
print(f"Bootstrap Confidence Interval for Variance (95% CI): ({var_lower:.4f}, {var_upper:.4f})")
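
Because the resampling above is unseeded, the intervals change slightly on every run. One possible variation, sketched below, draws all resamples at once with a seeded NumPy Generator so that results are reproducible. The helper name percentile_bootstrap_ci and its seed parameter are hypothetical and not part of the original code; the interval is still the percentile interval used above.

```python
import numpy as np

def percentile_bootstrap_ci(data, stat, n_bootstrap=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a statistic that accepts an `axis` argument."""
    data = np.asarray(data)
    rng = np.random.default_rng(seed)  # seeded generator for reproducibility
    n = len(data)
    # Row b of `idx` holds the indices of bootstrap resample b (with replacement)
    idx = rng.integers(0, n, size=(n_bootstrap, n))
    replicates = stat(data[idx], axis=1)  # one statistic per resample
    lower, upper = np.percentile(replicates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper

# Placeholder sample (replace with your own data)
data = np.array([12.3, 15.6, 14.2, 13.8, 12.7, 14.5, 13.4, 12.9, 14.0, 13.3])

mean_ci = percentile_bootstrap_ci(data, np.mean)
var_ci = percentile_bootstrap_ci(data, np.var)  # np.var uses ddof=0, as above

print(f"95% CI for mean: ({mean_ci[0]:.4f}, {mean_ci[1]:.4f})")
print(f"95% CI for variance: ({var_ci[0]:.4f}, {var_ci[1]:.4f})")
```

Calling the helper twice with the same seed returns the same interval; changing the seed gives an idea of how much the interval moves due to resampling noise alone.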
            
