Statistics Week 1-12 Summary

Statistics and Probability Summary

📚

Week 1 to 4

Descriptive and Inferential Statistics

Descriptive Statistics: Methods for summarizing and organizing data. Examples include measures of central tendency (mean, median, mode) and measures of dispersion (range, variance, standard deviation).

Example: Calculating the mean, median, and mode of the dataset [2, 3, 3, 5, 7].

Inferential Statistics: Techniques for making predictions or inferences about a population based on a sample of data. This includes hypothesis testing and confidence intervals.

Example: Using a sample of 100 students' test scores to estimate the average test score of all students in a school.

Data Types

Categorical Data: Data that can be divided into groups or categories. Examples include gender, race, and yes/no responses.

Example: Survey responses to the question "What is your favorite color?" with options like red, blue, green, etc.

Numerical Data: Data that represents quantities and can be measured. Examples include height, weight, and age.

Example: Recording the heights of students in a class.

Scale of Measurement

Nominal: Categories without a specific order. Examples include types of fruit, colors.

Example: Classifying students by their favorite fruit: apple, banana, orange.

Ordinal: Categories with a specific order but no consistent difference between them. Examples include rankings, satisfaction levels.

Example: Ranking students as first, second, and third in a race.

Interval: Numerical data with equal intervals between values but no true zero point. Examples include temperature in Celsius or Fahrenheit.

Example: Measuring temperature in degrees Celsius.

Ratio: Numerical data with equal intervals and a true zero point. Examples include height, weight, and age.

Example: Measuring the weight of different fruits.

Frequency and Relative Frequency

Frequency: The number of times a value occurs in a dataset.

Example: Counting the number of students who scored above 90 in a test.

Relative Frequency: The proportion of times a value occurs in a dataset, calculated as frequency divided by the total number of observations.

Example: If 10 out of 50 students scored above 90, the relative frequency is 10/50 = 0.2 or 20%.
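As a rough illustration of the two definitions above, the sketch below counts frequencies and relative frequencies with Python's standard library. The list `grades` is made-up data, not something from the course material.

```python
from collections import Counter

# Hypothetical letter grades for 10 students (illustrative data only).
grades = ["A", "B", "A", "C", "B", "A", "B", "B", "C", "A"]

frequency = Counter(grades)        # how many times each grade occurs
total = sum(frequency.values())    # total number of observations

# Relative frequency = frequency / total number of observations.
relative_frequency = {grade: count / total for grade, count in frequency.items()}

for grade in sorted(frequency):
    print(grade, frequency[grade], relative_frequency[grade])
```

The relative frequencies always add up to 1, which is a quick check that no observation was missed.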

Charts

Pie Chart: A circular chart divided into sectors, each representing a proportion of the whole.

Example: A pie chart showing the distribution of favorite fruits among students.

Bar Chart: A chart with rectangular bars representing the frequency or relative frequency of different categories.

Example: A bar chart showing the number of students in different grade levels.

Pareto Chart: A bar chart where categories are ordered by frequency in descending order, often combined with a cumulative frequency line.

Example: A Pareto chart showing the most common reasons for student absences.

Measures of Central Tendency

Mean: The average of a dataset, calculated as the sum of all values divided by the number of values.

Example: The mean of the dataset [2, 3, 3, 5, 7] is (2+3+3+5+7)/5 = 4.

Median: The middle value of a dataset when the values are arranged in ascending order.

Example: The median of the dataset [2, 3, 3, 5, 7] is 3.

Mode: The most frequently occurring value in a dataset.

Example: The mode of the dataset [2, 3, 3, 5, 7] is 3.
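The three worked examples above can be checked with Python's built-in `statistics` module. This is only a small sketch using the same dataset as the notes.

```python
import statistics

data = [2, 3, 3, 5, 7]

mean_value = statistics.mean(data)      # sum of values / number of values
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # most frequently occurring value

print(mean_value, median_value, mode_value)
```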

Measures of Dispersion

Range: The difference between the highest and lowest values in a dataset.

Example: The range of the dataset [2, 3, 3, 5, 7] is 7-2 = 5.

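Variance: The average of the squared differences from the mean. Dividing by the number of values n gives the population variance; the sample variance divides by n-1 instead.

Example: For the dataset [2, 3, 3, 5, 7], the population variance is [(2-4)^2 + (3-4)^2 + (3-4)^2 + (5-4)^2 + (7-4)^2]/5 = 16/5 = 3.2.

Standard Deviation: The square root of the variance. It measures the typical distance of values from the mean, in the same units as the data.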

Example: The standard deviation of the dataset [2, 3, 3, 5, 7] is √3.2 ≈ 1.79.
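A short sketch of the dispersion measures for the same dataset, again using the standard `statistics` module. It computes the population versions (dividing by n), which match the hand calculation in the notes, and also shows the sample variance (dividing by n-1) for comparison.

```python
import statistics

data = [2, 3, 3, 5, 7]

data_range = max(data) - min(data)                # highest minus lowest value
population_variance = statistics.pvariance(data)  # divides by n
population_sd = statistics.pstdev(data)           # square root of pvariance
sample_variance = statistics.variance(data)       # divides by n - 1

print(data_range, population_variance, round(population_sd, 2), sample_variance)
```

Which version to use depends on whether the data is the whole population or a sample drawn from it.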

Percentile

A measure indicating the value below which a given percentage of observations fall. For example, the 25th percentile is the value below which 25% of observations fall.

Example: In a dataset of test scores, the 90th percentile is the score below which 90% of the scores fall.
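There are several conventions for computing percentiles, and they can give slightly different answers on small datasets. The sketch below uses the nearest-rank method, which is one common choice and not necessarily the one used in the course. The function name and the `scores` list are made up for illustration.

```python
import math

def nearest_rank_percentile(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    ordered = sorted(values)
    # Rank of the smallest value with at least p% of the data at or below it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical test scores (illustrative data only).
scores = [55, 60, 62, 68, 70, 75, 80, 85, 90, 95]

p25 = nearest_rank_percentile(scores, 25)
p90 = nearest_rank_percentile(scores, 90)
print(p25, p90)
```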

Association

Contingency Table: A table used to summarize the relationship between two categorical variables.

Example: A contingency table showing the relationship between gender (male, female) and preference for a subject (math, science).

Covariance: A measure of how two variables change together. A positive covariance indicates that the variables tend to increase together, while a negative covariance indicates that one variable tends to increase when the other decreases.

Example: Calculating the covariance between students' study hours and their test scores.

Correlation: A standardized measure of the linear relationship between two variables, ranging from -1 to 1. A correlation of 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship (the variables may still be related in a non-linear way).

Example: Calculating the correlation between students' heights and weights.

Point Biserial Coefficient: A measure of the relationship between a binary variable and a continuous variable.

Example: Calculating the point biserial coefficient between gender (male, female) and test scores.
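The association measures above can be sketched in a few lines. The code below assumes the population form of covariance (dividing by n) and treats the point biserial coefficient as the Pearson correlation computed with the binary variable coded as 0 and 1. All data values and the variable names (`study_hours`, `test_scores`, `attended_review`) are hypothetical.

```python
import math

def population_covariance(xs, ys):
    """Average product of the deviations of xs and ys from their means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

def pearson_correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = math.sqrt(population_covariance(xs, xs))
    sd_y = math.sqrt(population_covariance(ys, ys))
    return population_covariance(xs, ys) / (sd_x * sd_y)

# Hypothetical data for five students (illustrative only).
study_hours = [1, 2, 3, 4, 5]
test_scores = [52, 60, 61, 70, 77]
attended_review = [0, 0, 1, 1, 1]   # binary variable coded as 0/1

cov_hours_scores = population_covariance(study_hours, test_scores)
corr_hours_scores = pearson_correlation(study_hours, test_scores)
point_biserial = pearson_correlation(attended_review, test_scores)

print(cov_hours_scores, round(corr_hours_scores, 3), round(point_biserial, 3))
```

Covariance depends on the units of the variables, while correlation is unit-free, which is why correlation is easier to compare across datasets.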

Week 5 to 8

Probability and Combinatorics

Permutation: An arrangement of objects in a specific order. The number of permutations of n objects taken r at a time is given by nPr = n! / (n-r)!

Example: The number of ways to arrange 3 out of 5 books on a shelf is 5P3 = 5! / (5-3)! = 60.

Combination: A selection of objects without regard to order. The number of combinations of n objects taken r at a time is given by nCr = n! / [r!(n-r)!]

Example: The number of ways to choose 3 out of 5 books is 5C3 = 5! / [3!(5-3)!] = 10.

Permutation vs Combination: Permutations consider order, while combinations do not.

Example: Arranging 3 books out of 5 in a specific order is a permutation problem, calculated as 5P3 = 5! / (5-3)! = 60. Choosing 3 books out of 5 without regard to order is a combination problem, calculated as 5C3 = 5! / [3!(5-3)!] = 10.
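The book examples can be checked with `math.perm` and `math.comb`, which are available in Python 3.8 and later. The sketch also verifies the link between the two counts: each selection of r objects can be ordered in r! ways, so nPr = nCr * r!.

```python
import math

n, r = 5, 3

arrangements = math.perm(n, r)   # ordered: n! / (n - r)!
selections = math.comb(n, r)     # unordered: n! / (r! * (n - r)!)

print(arrangements, selections)

# Every unordered selection corresponds to r! different orderings.
assert arrangements == selections * math.factorial(r)
```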

Probability

Probability: The likelihood of an event occurring, ranging from 0 (impossible) to 1 (certain).

Example: The probability of rolling a 4 on a fair six-sided die is 1/6.

Conditional Probability: The probability of an event occurring given that another event has occurred, calculated as P(A|B) = P(A and B) / P(B), provided P(B) > 0.

Example: When drawing cards without replacement, the probability of drawing an ace given that a king has already been drawn is 4/51, because 4 aces remain among the 51 remaining cards.
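One way to check the card example is to enumerate every ordered pair of distinct cards and apply the definition P(A|B) = P(A and B) / P(B) by counting. This is a brute-force sketch, assuming a standard 52-card deck and two draws without replacement; the rank and suit labels are arbitrary.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["S", "H", "D", "C"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# All equally likely ordered pairs (first card, second card), no replacement.
draws = list(permutations(deck, 2))

king_first = [d for d in draws if d[0][0] == "K"]             # event B
king_then_ace = [d for d in king_first if d[1][0] == "A"]     # event A and B

# Because all pairs are equally likely, the ratio of counts equals
# P(A and B) / P(B).
p_ace_given_king = Fraction(len(king_then_ace), len(king_first))
print(p_ace_given_king)
```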

Disjoint Events: Events that cannot occur simultaneously. The probability of either event occurring is the sum of their individual probabilities.

Example: The probability of rolling a 2 or a 5 on a fair six-sided die is 1/6 + 1/6 = 1/3.

Independent Events: Events where the occurrence of one does not affect the probability of the other. The probability of both events occurring is the product of their individual probabilities.

Example: The probability of flipping a coin and getting heads, and then rolling a die and getting a 6 is 1/2 * 1/6 = 1/12.

Total Probability: The probability of an event is the sum of the probabilities of the event occurring under different conditions, weighted by the probability of each condition.

Example: If a factory produces 70% of its products in Plant A with a defect rate of 2%, and 30% in Plant B with a defect rate of 5%, the total probability of a product being defective is (0.7 * 0.02) + (0.3 * 0.05) = 0.014 + 0.015 = 0.029 or 2.9%.
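The factory calculation can be written as a weighted sum over the plants. The sketch below uses exact fractions to avoid rounding; the dictionary layout and variable names are only one possible way to organize it.

```python
from fractions import Fraction

# plant -> (share of production, defect rate), taken from the example above
plants = {
    "A": (Fraction(70, 100), Fraction(2, 100)),
    "B": (Fraction(30, 100), Fraction(5, 100)),
}

# Law of total probability: sum of P(plant) * P(defective | plant).
p_defective = sum(share * defect_rate for share, defect_rate in plants.values())

print(p_defective, float(p_defective))
```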

Bayes' Theorem: A formula for updating probabilities based on new information, given by P(A|B) = [P(B|A) * P(A)] / P(B).

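Example: If 1% of the population has a disease and a test for the disease is 99% accurate (taken here to mean it is correct for 99% of people who have the disease and for 99% of people who do not), the probability that a person has the disease given a positive test result can be calculated using Bayes' Theorem, with P(positive) obtained from the total probability rule.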
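To make the disease-test example concrete, the sketch below works it through with exact fractions. It assumes "99% accurate" means both a 99% true positive rate and a 99% true negative rate; the notes do not spell this out, so treat that as an assumption. The variable names are illustrative.

```python
from fractions import Fraction

p_disease = Fraction(1, 100)               # prior: 1% of the population
p_pos_given_disease = Fraction(99, 100)    # assumed true positive rate
p_pos_given_healthy = Fraction(1, 100)     # assumed false positive rate

# Total probability of a positive result.
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))

# Bayes' Theorem: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_disease_given_pos = p_pos_given_disease * p_disease / p_positive

print(p_disease_given_pos)
```

Under these assumptions the answer is 1/2: because the disease is rare, false positives from the large healthy group are as numerous as true positives from the small sick group.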

Week 9 to 12

Random Variables and Distributions

Discrete Random Variable: A variable that can take on a finite or countably infinite number of values. Examples include the number of heads in coin tosses (Bernoulli, Binomial), the number of successes in a sample without replacement (Hypergeometric), and the number of events in a fixed interval (Poisson).

Example: The number of heads obtained in 10 coin tosses is a discrete random variable.
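For the coin-toss example, the number of heads follows a binomial distribution. The sketch below uses the standard binomial formula P(X = k) = nCk * p^k * (1-p)^(n-k), assuming a fair coin (p = 0.5); the function name is made up for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n_tosses = 10
pmf = [binomial_pmf(k, n_tosses, 0.5) for k in range(n_tosses + 1)]

print(pmf[5])     # probability of exactly 5 heads
print(sum(pmf))   # probabilities over all possible values add up to 1
```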

Continuous Random Variable: A variable that can take on any value within a given range. Examples include time, height, and weight.

Example: The time it takes for a student to complete an exam is a continuous random variable.

Uniform Distribution: A distribution where all outcomes are equally likely within a given range.

Example: The outcome of rolling a fair six-sided die follows a (discrete) uniform distribution: each face has probability 1/6.

Non-Uniform Distribution: A distribution where some outcomes are more likely than others.

Example: The distribution of grades in a class where some grades are more common than others.

Exponential Distribution: A distribution describing the time between events in a Poisson process, with a constant rate of occurrence.

Example: If buses arrive at a bus stop according to a Poisson process, the time between successive arrivals is exponentially distributed.
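For an exponential distribution with rate λ, the probability of waiting longer than t is e^(-λt). The sketch below applies this to the bus example with a hypothetical rate of one bus every 10 minutes on average; that rate is an assumption made only for illustration.

```python
import math

rate_per_minute = 1 / 10   # hypothetical: on average one bus every 10 minutes

def prob_wait_longer_than(t_minutes, rate):
    """P(T > t) for an exponentially distributed waiting time T."""
    return math.exp(-rate * t_minutes)

mean_wait = 1 / rate_per_minute                        # mean of the distribution
p_over_15 = prob_wait_longer_than(15, rate_per_minute)

print(mean_wait, round(p_over_15, 3))
```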

Normal Distribution: A symmetric, bell-shaped distribution characterized by its mean and standard deviation. The 68-95-99.7 rule states that approximately 68% of data falls within one standard deviation of the mean, 95% within two, and 99.7% within three.

Example: The heights of adult men in a population are approximately normally distributed.
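The 68-95-99.7 rule can be checked numerically with `statistics.NormalDist` (Python 3.8 and later). The sketch uses the standard normal distribution, but the same proportions hold for any mean and standard deviation.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within k standard deviations of the mean.
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, probability in within.items():
    print(k, round(probability, 4))
```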

Standard Normal Distribution: A normal distribution with a mean of 0 and a standard deviation of 1.

Example: Standardizing a dataset of test scores to compare them to a standard normal distribution.
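Standardizing means converting each value to a z-score, z = (x - mean) / standard deviation. The sketch below does this for a made-up list of test scores, using the population standard deviation; after standardizing, the z-scores have mean 0 and standard deviation 1.

```python
import statistics

# Hypothetical test scores (illustrative data only).
scores = [55, 60, 65, 70, 75, 80, 85]

mean_score = statistics.mean(scores)
sd_score = statistics.pstdev(scores)   # population standard deviation

# z-score: how many standard deviations a score lies from the mean.
z_scores = [(score - mean_score) / sd_score for score in scores]

print(z_scores)
```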
