Statistical Data Analysis Project
In this project, we explore statistical data analysis by fitting a known probability distribution to the given dataset. We begin by visualizing the data with a histogram and then apply two estimation methods, the method of moments and maximum likelihood estimation (MLE), to estimate the parameters of the selected distribution. Finally, we use the bootstrap method to construct approximate confidence intervals, which quantify how much our parameter estimates would vary from sample to sample.
1. Histogram for the Data
The first step involves creating a histogram to visualize the distribution of the given data. This will help us understand the underlying patterns and identify the possible distribution to fit.
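A histogram splits the range of the data into bins and counts how many observations fall in each. The sketch below shows those counts directly with NumPy. It assumes the ten-value placeholder sample used later in this post, and the `'fd'` (Freedman-Diaconis) bin rule is just one of the rules NumPy offers; the plotting code in Activity 2.1 uses a fixed 20 bins instead.

```python
import numpy as np

# Placeholder sample (the same ten values used in Activities 2.3 and 2.4);
# swap in the full data array to inspect the real histogram.
data = np.array([12.3, 15.6, 14.2, 13.8, 12.7, 14.5, 13.4, 12.9, 14.0, 13.3])

# Freedman-Diaconis rule: bin width = 2 * IQR / n**(1/3)
edges = np.histogram_bin_edges(data, bins='fd')
counts, _ = np.histogram(data, bins=edges)

# Bins are half-open [left, right); the last bin also includes its right edge
for left, right, c in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:6.2f}, {right:6.2f}): {c}")
```

Every observation lands in exactly one bin, so the counts add up to the sample size.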
2. Fit a Known Distribution
We fit a suitable known distribution (e.g., the normal distribution) to the data. Using the scipy.stats library, we estimate the parameters of the distribution that best fits the data; for the normal distribution, norm.fit returns the mean μ and the standard deviation σ (not the variance).
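For the normal distribution, `norm.fit` returns maximum likelihood estimates, and these have closed forms. This sketch, using the same ten-value placeholder sample, checks that the fitted parameters match the sample mean and the standard deviation computed with divisor n (`ddof=0`).

```python
import numpy as np
from scipy.stats import norm

# Placeholder sample; replace with the full data array
data = np.array([12.3, 15.6, 14.2, 13.8, 12.7, 14.5, 13.4, 12.9, 14.0, 13.3])

mu_fit, sigma_fit = norm.fit(data)   # (loc, scale) = (mean, standard deviation)

mu_hat = data.mean()
sigma_hat = data.std(ddof=0)         # divide by n, not n - 1

print(f"norm.fit:    mu={mu_fit:.4f}, sigma={sigma_fit:.4f}")
print(f"closed form: mu={mu_hat:.4f}, sigma={sigma_hat:.4f}")
```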
3. Method of Moments Estimate and Maximum Likelihood Estimate
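The method of moments and maximum likelihood estimates of the unknown parameters (the mean and variance) are then calculated for the selected distribution. For the normal distribution the two methods agree: both estimate the mean by the sample mean and the variance by the average squared deviation from it, dividing by n rather than n - 1.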
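Under the normal model assumed in this post, both estimators can be derived in closed form. The standard derivation below shows why the two methods coincide and why the variance estimate divides by n.

```latex
% Assumed model: X_1, \dots, X_n \stackrel{\text{iid}}{\sim} \mathcal{N}(\mu, \sigma^2)

% Method of moments: match E[X] = \mu and E[X^2] = \sigma^2 + \mu^2 to the sample moments
\hat{\mu}_{\text{MoM}} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\hat{\sigma}^2_{\text{MoM}} = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2
  = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2

% Maximum likelihood: log-likelihood and its stationary point
\ell(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (x_i - \mu)^2

\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n} (x_i - \mu) = 0
\;\Rightarrow\; \hat{\mu}_{\text{MLE}} = \bar{x}

\frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n} (x_i - \mu)^2 = 0
\;\Rightarrow\; \hat{\sigma}^2_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2
```

The unbiased sample variance, which divides by n - 1, is neither of these estimators.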
4. Bootstrap Method for Confidence Intervals
We use the bootstrap method to form approximate confidence intervals for the estimated parameters (mean and variance). The data are resampled with replacement many times, the statistic is recomputed on each resample, and the lower and upper percentiles of the resulting values (the 2.5th and 97.5th for a 95% interval) form the confidence interval.
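The resampling can also be vectorized by drawing all resample indices at once instead of looping. This is a minimal sketch, assuming NumPy's `Generator` API; `percentile_bootstrap_ci` is a hypothetical helper name, and the data is the ten-value placeholder sample used later in this post.

```python
import numpy as np

def percentile_bootstrap_ci(data, stat, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for stat(sample); `stat` must accept an axis argument."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # Each row of idx indexes one resample of size n, drawn with replacement
    idx = rng.integers(0, n, size=(n_boot, n))
    resamples = data[idx]
    boot_stats = stat(resamples, axis=1)  # one statistic per resample
    lower, upper = np.percentile(boot_stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper

# Placeholder sample; replace with the full data array
data = np.array([12.3, 15.6, 14.2, 13.8, 12.7, 14.5, 13.4, 12.9, 14.0, 13.3])

mean_ci = percentile_bootstrap_ci(data, np.mean)
var_ci = percentile_bootstrap_ci(data, np.var)  # np.var defaults to ddof=0
print("95% CI for mean:", mean_ci)
print("95% CI for variance:", var_ci)
```

Fixing the seed makes the interval reproducible between runs; drop it to see how much the endpoints move with fresh resamples.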
The code below is written for Google Colab.
Activity 2.1
Code for histogram of data
Histogram of Data
```python
import numpy as np
import matplotlib.pyplot as plt

# Load the data into a NumPy array
data = np.array([75, 80, 20, 45, 42, 40, 40, 50, 87, 10, 60, 20, 30, 80, 60, 30, 40, 60, 22, 18, 80, 23, 48, 67, 35, 80, 52, 60, 50, 42, 70, 15, 65, 72, 30, 30, 80, 60, 50, 70, 50, 50, 60, 50, 27, 60, 10, 50, 60, 17, 50, 40, 77, 65, 30, 45, 25, 33, 40, 70, 33, 40, 30, 50, 50, 80, 30, 50, 43, 30, 60, 30, 47, 37, 37, 45, 50, 30, 30, 87, 62, 35, 47, 70, 50, 60, 50, 60, 80, 73, 30, 90, 70, 82, 60, 80, 30, 80, 10, 40, 60, 65, 30, 10, 40, 40, 30, 27, 50, 75, 37, 40, 65, 55, 80, 40, 60, 43, 30, 20, 60, 40, 27, 50, 40, 70, 40, 97, 55, 30, 47, 40, 80, 80, 27, 85, 40, 37, 30, 35, 30, 30, 10, 40, 65, 95, 90, 60, 50, 65, 17, 53, 40, 60, 43, 55, 57, 50, 60, 80, 50, 80, 55, 60, 20, 60, 55, 80, 60, 80, 80, 50, 60, 45, 35, 28, 60, 37, 25, 38, 60, 70, 60, 60, 25, 47, 33, 57, 90, 50, 30, 20, 22, 47, 47, 30, 33, 30, 30, 80, 50, 70, 57, 20, 62, 33, 80, 20, 40, 25, 35, 40, 73, 60, 60, 23, 20, 17, 50, 70, 57, 35, 67, 50, 30, 72, 80, 20, 60, 50, 50, 47, 35, 37, 40, 60, 40, 37, 50, 52, 67, 40, 28, 20, 20, 35, 30, 60, 35, 20, 30, 50, 40, 63, 40, 47, 0, 70, 80, 55, 18, 63, 75, 30, 60, 80, 25, 87, 62, 90, 30, 20, 50, 30, 23, 30, 83, 48, 53, 55, 25, 52, 55, 62, 40, 80, 45, 70, 30, 73, 40, 50, 60, 20, 30, 40, 33, 70, 35, 47, 83, 33, 20, 30, 40, 42, 52, 25, 40, 20, 65, 47, 40, 40, 75, 47, 63, 45, 10, 80, 48, 60, 87, 10, 80, 20, 45, 40, 30, 60, 100, 38, 60, 25, 42, 82, 48, 40, 47, 50, 72, 60, 50, 57, 48, 85, 60, 70, 45, 57, 50, 30, 60, 40, 30, 55, 52, 55, 28, 47, 40, 35, 38, 43, 50, 50, 35, 53, 30, 35, 33, 52, 77, 60, 20, 40, 70, 43, 40, 45, 58, 40, 50, 67, 40, 45, 10, 20, 55, 10, 40, 77, 60, 72, 25, 30, 90, 45, 40, 50, 50, 30, 47, 100, 57, 40, 57, 33, 10, 20, 37, 27, 40, 30, 60, 30, 40, 40, 30, 77, 47, 70, 27, 40, 48, 30, 70, 35, 23, 70, 30, 30, 77, 20, 50, 30, 30, 80, 95, 20, 40, 55, 40, 20, 43, 60, 50, 42, 35, 40, 70, 60, 60, 7, 50, 70, 50, 27, 43, 18, 40, 33, 60, 85, 45, 25, 70, 30, 50, 45, 40, 50, 15, 42,
                 40, 50, 60, 62, 40, 60, 65, 0, 40, 47, 25, 40, 50, 30, 77, 50, 50, 30, 37, 25, 50, 90, 70, 35, 33, 85, 60, 27, 30, 40, 52, 57, 20, 60, 70, 20, 28, 40, 40, 15, 20, 33, 52, 40, 10, 80, 50, 30, 38, 45, 80, 40, 67, 70, 40, 70, 35, 50, 80, 75, 33, 35, 80, 20, 40, 52, 38, 20, 50, 40, 70, 10, 45, 90, 40, 55, 40, 5, 50, 40, 40, 70, 17, 45, 93, 20, 22, 50, 20, 87, 27, 40, 50, 28, 42, 40, 30, 47, 30, 25, 5, 87, 30, 25, 65, 50, 15, 82, 40, 50, 30, 25, 65, 30, 50, 15, 55, 22, 30, 25, 10, 50, 17, 12, 23, 40, 85, 50, 50, 40, 37, 20, 50, 22, 50, 60, 77, 35, 50, 60, 68, 65, 40, 50, 50, 30, 33, 25, 20, 55, 77, 15, 40, 30, 20, 47, 32, 55, 37, 20, 82, 47, 15, 52, 50, 65, 30, 40, 90, 20, 35, 30, 25, 35, 53, 80, 67, 60, 35, 45, 70, 70, 27, 70, 20, 27, 32, 53, 40, 73, 45, 40, 28, 60, 60, 85, 63, 23, 25, 50, 40, 37, 15, 60, 10, 70, 45, 25, 35, 35, 40, 40, 35, 20, 35, 65, 30, 77, 37, 42, 22, 30, 40, 35, 35, 42, 35, 35, 40, 22, 22, 60, 20, 55, 45, 32, 35, 65, 50, 43, 20, 30, 40, 20, 50, 40, 20, 30, 45, 20, 23, 40, 30, 55, 80, 30, 70, 40, 57, 50, 37, 77, 20, 60, 30, 45])

# Create a histogram
plt.hist(data, bins=20, edgecolor='black', color='skyblue')
plt.title('Histogram of Data')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
```
Activity 2.2
Fitting a normal distribution to the data.
Histogram with Fitted Normal Distribution
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Your data
data = np.array([75, 80, 20, 45, 42, 40, 40, 50, 87, 10, 60, 20, 30, 80, 60, 30, 40, 60, 22, 18, 80, 23, 48, 67, 35, 80, 52, 60, 50, 42, 70, 15, 65, 72, 30, 30, 80, 60, 50, 70, 50, 50, 60, 50, 27, 60, 10, 50, 60, 17, 50, 40, 77, 65, 30, 45, 25, 33, 40, 70, 33, 40, 30, 50, 50, 80, 30, 50, 43, 30, 60, 30, 47, 37, 37, 45, 50, 30, 30, 87, 62, 35, 47, 70, 50, 60, 50, 60, 80, 73, 30, 90, 70, 82, 60, 80, 30, 80, 10, 40, 60, 65, 30, 10, 40, 40, 30, 27, 50, 75, 37, 40, 65, 55, 80, 40, 60, 43, 30, 20, 60, 40, 27, 50, 40, 70, 40, 97, 55, 30, 47, 40, 80, 80, 27, 85, 40, 37, 30, 35, 30, 30, 10, 40, 65, 95, 90, 60, 50, 65, 17, 53, 40, 60, 43, 55, 57, 50, 60, 80, 50, 80, 55, 60, 20, 60, 55, 80, 60, 80, 80, 50, 60, 45, 35, 28, 60, 37, 25, 38, 60, 70, 60, 60, 25, 47, 33, 57, 90, 50, 30, 20, 22, 47, 47, 30, 33, 30, 30, 80, 50, 70, 57, 20, 62, 33, 80, 20, 40, 25, 35, 40, 73, 60, 60, 23, 20, 17, 50, 70, 57, 35, 67, 50, 30, 72, 80, 20, 60, 50, 50, 47, 35, 37, 40, 60, 40, 37, 50, 52, 67, 40, 28, 20, 20, 35, 30, 60, 35, 20, 30, 50, 40, 63, 40, 47, 0, 70, 80, 55, 18, 63, 75, 30, 60, 80, 25, 87, 62, 90, 30, 20, 50, 30, 23, 30, 83, 48, 53, 55, 25, 52, 55, 62, 40, 80, 45, 70, 30, 73, 40, 50, 60, 20, 30, 40, 33, 70, 35, 47, 83, 33, 20, 30, 40, 42, 52, 25, 40, 20, 65, 47, 40, 40, 75, 47, 63, 45, 10, 80, 48, 60, 87, 10, 80, 20, 45, 40, 30, 60, 100, 38, 60, 25, 42, 82, 48, 40, 47, 50, 72, 60, 50, 57, 48, 85, 60, 70, 45, 57, 50, 30, 60, 40, 30, 55, 52, 55, 28, 47, 40, 35, 38, 43, 50, 50, 35, 53, 30, 35, 33, 52, 77, 60, 20, 40, 70, 43, 40, 45, 58, 40, 50, 67, 40, 45, 10, 20, 55, 10, 40, 77, 60, 72, 25, 30, 90, 45, 40, 50, 50, 30, 47, 100, 57, 40, 57, 33, 10, 20, 37, 27, 40, 30, 60, 30, 40, 40, 30, 77, 47, 70, 27, 40, 48, 30, 70, 35, 23, 70, 30, 30, 77, 20, 50, 30, 30, 80, 95, 20, 40, 55, 40, 20, 43, 60, 50, 42, 35, 40, 70, 60, 60, 7, 50, 70, 50, 27, 43, 18, 40, 33, 60, 85, 45, 25, 70, 30, 50, 45, 40, 50, 15, 42,
                 40, 50, 60, 62, 40, 60, 65, 0, 40, 47, 25, 40, 50, 30, 77, 50, 50, 30, 37, 25, 50, 90, 70, 35, 33, 85, 60, 27, 30, 40, 52, 57, 20, 60, 70, 20, 28, 40, 40, 15, 20, 33, 52, 40, 10, 80, 50, 30, 38, 45, 80, 40, 67, 70, 40, 70, 35, 50, 80, 75, 33, 35, 80, 20, 40, 52, 38, 20, 50, 40, 70, 10, 45, 90, 40, 55, 40, 5, 50, 40, 40, 70, 17, 45, 93, 20, 22, 50, 20, 87, 27, 40, 50, 28, 42, 40, 30, 47, 30, 25, 5, 87, 30, 25, 65, 50, 15, 82, 40, 50, 30, 25, 65, 30, 50, 15, 55, 22, 30, 25, 10, 50, 17, 12, 23, 40, 85, 50, 50, 40, 37, 20, 50, 22, 50, 60, 77, 35, 50, 60, 68, 65, 40, 50, 50, 30, 33, 25, 20, 55, 77, 15, 40, 30, 20, 47, 32, 55, 37, 20, 82, 47, 15, 52, 50, 65, 30, 40, 90, 20, 35, 30, 25, 35, 53, 80, 67, 60, 35, 45, 70, 70, 27, 70, 20, 27, 32, 53, 40, 73, 45, 40, 28, 60, 60, 85, 63, 23, 25, 50, 40, 37, 15, 60, 10, 70, 45, 25, 35, 35, 40, 40, 35, 20, 35, 65, 30, 77, 37, 42, 22, 30, 40, 35, 35, 42, 35, 35, 40, 22, 22, 60, 20, 55, 45, 32, 35, 65, 50, 43, 20, 30, 40, 20, 50, 40, 20, 30, 45, 20, 23, 40, 30, 55, 80, 30, 70, 40, 57, 50, 37, 77, 20, 60, 30, 45])

# Fit a normal distribution to the data (returns the mean and standard deviation)
mu, sigma = norm.fit(data)

# Plot the histogram, normalized to a density so it matches the scale of the pdf
plt.hist(data, bins=20, density=True, edgecolor='black', alpha=0.6, color='skyblue')

# Generate points for the fitted curve
x = np.linspace(data.min(), data.max(), 100)
pdf = norm.pdf(x, mu, sigma)

# Plot the fitted curve (raw f-string so the LaTeX backslashes are not treated as escapes)
plt.plot(x, pdf, 'r-', label=rf'Normal Fit ($\mu$={mu:.2f}, $\sigma$={sigma:.2f})')

# Customize the plot
plt.title('Histogram with Fitted Normal Distribution')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.grid(True)
plt.show()
```
Activity 2.3
Method of moments estimate and ML estimate for the unknown parameter(s)
Method of Moments (MoM) and Maximum Likelihood Estimation (MLE)
Replace the placeholder values in the code below with the data array from Activity 2.1.
```python
import numpy as np

# Sample data (replace with your own data)
data = np.array([12.3, 15.6, 14.2, 13.8, 12.7, 14.5, 13.4, 12.9, 14.0, 13.3])

# Method of Moments (MoM) estimates
# Match the first two sample moments to E[X] = mu and E[X^2] = sigma^2 + mu^2
mu_mom = np.mean(data)
sigma2_mom = np.mean(data**2) - mu_mom**2

# Maximum Likelihood Estimates (MLE)
# Step 1: the MLE of the mean is the sample mean
mu_mle = np.mean(data)
# Step 2: the MLE of the variance divides by n, which is np.var's default (ddof=0);
# it is not the unbiased sample variance, which divides by n - 1
sigma2_mle = np.var(data)

# Display results
print("Method of Moments (MoM) Estimates:")
print(f"mu_MoM = {mu_mom:.4f}")
print(f"sigma2_MoM = {sigma2_mom:.4f}")
print("\nMaximum Likelihood Estimates (MLE):")
print(f"mu_MLE = {mu_mle:.4f}")
print(f"sigma2_MLE = {sigma2_mle:.4f}")
```
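As a numerical sanity check, one can maximize the normal log-likelihood directly with SciPy (preinstalled in Colab) and compare the result with the closed-form estimates. `neg_log_likelihood` is a hypothetical helper for this sketch; optimizing log σ instead of σ is a design choice that keeps σ positive without adding constraints.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Placeholder sample; replace with the full data array
data = np.array([12.3, 15.6, 14.2, 13.8, 12.7, 14.5, 13.4, 12.9, 14.0, 13.3])

def neg_log_likelihood(params, x):
    mu, log_sigma = params  # work with log(sigma) so sigma stays positive
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

# Rough starting point, deliberately not equal to the answer
start = np.array([np.median(data), np.log(data.std() + 1.0)])
result = minimize(neg_log_likelihood, start, args=(data,), method="Nelder-Mead")
mu_num = result.x[0]
sigma2_num = np.exp(result.x[1]) ** 2

print(f"numerical MLE: mu={mu_num:.4f}, sigma^2={sigma2_num:.4f}")
print(f"closed form:   mu={data.mean():.4f}, sigma^2={np.var(data):.4f}")  # ddof=0
print(f"unbiased s^2 (ddof=1), for comparison: {np.var(data, ddof=1):.4f}")
```

The optimizer should land on the sample mean and the divide-by-n variance, while the unbiased variance comes out slightly larger.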
Activity 2.4
Use the bootstrap to find confidence intervals
Bootstrap Confidence Interval Estimation for Mean and Variance
This code generates bootstrap confidence intervals for the mean and variance of the sample data.
```python
import numpy as np

# Sample data (replace with your own data)
data = np.array([12.3, 15.6, 14.2, 13.8, 12.7, 14.5, 13.4, 12.9, 14.0, 13.3])

# Number of bootstrap samples
n_bootstrap = 1000
alpha = 0.05  # Significance level for a 95% confidence interval

# Draw bootstrap resamples and record the mean and variance of each one
def bootstrap_statistic(data, n_bootstrap):
    # Initialize lists to store bootstrap statistics
    means = []
    variances = []
    # Resampling with replacement
    for _ in range(n_bootstrap):
        resample = np.random.choice(data, size=len(data), replace=True)
        means.append(np.mean(resample))
        variances.append(np.var(resample))  # ddof=0, matching the MLE of the variance
    return np.array(means), np.array(variances)

# Generate bootstrap statistics
bootstrap_means, bootstrap_variances = bootstrap_statistic(data, n_bootstrap)

# Percentile method: the alpha/2 and 1 - alpha/2 percentiles bound the interval
mean_lower = np.percentile(bootstrap_means, 100 * alpha / 2)
mean_upper = np.percentile(bootstrap_means, 100 * (1 - alpha / 2))
var_lower = np.percentile(bootstrap_variances, 100 * alpha / 2)
var_upper = np.percentile(bootstrap_variances, 100 * (1 - alpha / 2))

# Display the confidence intervals
print(f"Bootstrap Confidence Interval for Mean (95% CI): ({mean_lower:.4f}, {mean_upper:.4f})")
print(f"Bootstrap Confidence Interval for Variance (95% CI): ({var_lower:.4f}, {var_upper:.4f})")
```
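SciPy also provides `scipy.stats.bootstrap` (SciPy 1.7 or later), which can serve as a cross-check on the hand-written loop. This sketch assumes the same placeholder sample and requests the percentile method so the intervals are comparable; no seed is passed, so the endpoints will differ slightly from run to run.

```python
import numpy as np
from scipy.stats import bootstrap

# Placeholder sample; replace with the full data array
data = np.array([12.3, 15.6, 14.2, 13.8, 12.7, 14.5, 13.4, 12.9, 14.0, 13.3])

# bootstrap() expects a sequence of samples, hence the 1-tuple (data,)
res_mean = bootstrap((data,), np.mean, n_resamples=1000,
                     confidence_level=0.95, method="percentile")
res_var = bootstrap((data,), np.var, n_resamples=1000,
                    confidence_level=0.95, method="percentile")

print("Mean 95% CI:", res_mean.confidence_interval)
print("Variance 95% CI:", res_var.confidence_interval)
```

SciPy's default method is BCa, which corrects the percentile interval for bias and skewness; that can matter for the variance, whose bootstrap distribution is skewed.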