Statistical Concepts
Summary and Association Between Two Categorical Variables
When analyzing the relationship between two categorical variables, we often use a contingency table. This table displays the frequency distribution of the variables, allowing us to observe any potential association between them.
Contingency Table
A contingency table, also known as a cross-tabulation or crosstab, arranges the categories of one variable in rows and the categories of the other variable in columns. Each cell holds the number of observations that fall into that particular combination of categories, and the margins hold the row totals, the column totals, and the grand total.
| Row variable | Category 1 | Category 2 | Total |
|--------------|------------|------------|-------|
| Level A | 10 | 20 | 30 |
| Level B | 15 | 25 | 40 |
| Total | 25 | 45 | 70 |
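As a minimal sketch of how such a table can be built from raw data, the snippet below counts paired observations with the standard library. The observation list is hypothetical and was chosen only so that the counts reproduce the table above; the short labels `"A"` and `"B"` stand for the two rows.

```python
from collections import Counter

# Hypothetical paired observations: (row label, column category).
# The repetition counts are chosen to reproduce the example table.
observations = (
    [("A", "Category 1")] * 10
    + [("A", "Category 2")] * 20
    + [("B", "Category 1")] * 15
    + [("B", "Category 2")] * 25
)


def contingency_table(pairs):
    """Count how often each (row, column) combination occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return table, rows, cols


table, rows, cols = contingency_table(observations)

# Marginal totals: one per row, one per column, plus the grand total.
row_totals = {r: sum(table[r].values()) for r in rows}
col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
grand_total = sum(row_totals.values())

print(table)
print(row_totals, col_totals, grand_total)
```

In practice a data-frame library would usually do this in one call; the hand-written version is shown only to make the counting step explicit.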
Row Relative Frequency
Row relative frequency is calculated by dividing each cell frequency by the total frequency of its row. It shows the proportion of each category within a row.
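The calculation can be sketched in a few lines of Python. The counts are the ones from the example table; the dictionary layout and the function name `row_relative_frequencies` are illustrative choices, not part of any library.

```python
# Counts from the example table: rows A and B, columns Category 1 and Category 2.
counts = {
    "A": {"Category 1": 10, "Category 2": 20},
    "B": {"Category 1": 15, "Category 2": 25},
}


def row_relative_frequencies(table):
    """Divide each cell by its row total, so every row sums to 1."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: n / total for col, n in cells.items()}
    return result


row_freq = row_relative_frequencies(counts)
for row, cells in row_freq.items():
    print(row, {col: round(p, 3) for col, p in cells.items()})
```

For row A this gives 10/30 and 20/30, and for row B 15/40 and 25/40, so Category 1 makes up about 33% of row A but 37.5% of row B.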
Column Relative Frequency
Column relative frequency is calculated by dividing each cell frequency by the total frequency of its column. It shows the proportion of each category within a column.
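A comparable sketch for the column version, again using the counts from the example table. The function name `column_relative_frequencies` is a hypothetical helper introduced for illustration.

```python
# Counts from the example table: rows A and B, columns Category 1 and Category 2.
counts = {
    "A": {"Category 1": 10, "Category 2": 20},
    "B": {"Category 1": 15, "Category 2": 25},
}


def column_relative_frequencies(table):
    """Divide each cell by its column total, so every column sums to 1."""
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {
        row: {col: n / col_totals[col] for col, n in cells.items()}
        for row, cells in table.items()
    }


col_freq = column_relative_frequencies(counts)
for row, cells in col_freq.items():
    print(row, {col: round(p, 3) for col, p in cells.items()})
```

Here Category 1 splits 10/25 and 15/25 between rows A and B, while Category 2 splits 20/45 and 25/45.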
Stacked Bar Chart
A stacked bar chart is a graphical representation of data where each bar is divided into segments representing different categories. It is useful for comparing the relative proportions of categories within each group.
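To make the idea concrete without assuming any particular plotting library, the sketch below draws each bar as a line of text. Every bar has the same width, and the segment lengths follow the row proportions from the example table. The function `stacked_bar` and the symbols used for the segments are illustrative assumptions; a real chart would normally be drawn with a plotting package.

```python
def stacked_bar(proportions, width=40, symbols="#."):
    """Render one bar as text; each segment's length is proportional to its share."""
    shares = list(proportions)
    segments = []
    used = 0
    for i, p in enumerate(shares):
        if i == len(shares) - 1:
            length = width - used  # last segment absorbs any rounding error
        else:
            length = round(p * width)
        segments.append(symbols[i % len(symbols)] * length)
        used += length
    return "".join(segments)


# Row proportions from the example table (Category 1 share, Category 2 share).
row_shares = {"A": [10 / 30, 20 / 30], "B": [15 / 40, 25 / 40]}

for label, shares in row_shares.items():
    print(f"{label} |{stacked_bar(shares)}|")
```

Because both bars have the same total length, the boundary between the two segments can be compared directly across groups, which is the comparison a stacked bar chart of proportions is meant to support.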
Association Between Two Numerical Variables
To analyze the relationship between two numerical variables, we use measures such as the covariance and the correlation coefficient.
Covariance
Covariance measures the direction of the linear relationship between two variables. A positive covariance indicates that the variables tend to increase together, while a negative covariance indicates that one variable tends to increase as the other decreases. Its magnitude depends on the units of both variables, so covariance alone does not show how strong the relationship is.
Population Covariance
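Population covariance is calculated using the entire population data: the products of each pair's deviations from the population means are summed and divided by the population size N.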
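The text does not give the formula itself, so the standard definition is written out here. The symbols are assumed notation: N is the population size, and μ_X and μ_Y are the population means of the two variables.

```latex
\sigma_{XY} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_X)(y_i - \mu_Y)
```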
Sample Covariance
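Sample covariance is calculated using sample data. The sum of products of deviations from the sample means is divided by n - 1 rather than n, which makes it an unbiased estimate of the population covariance.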
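The two versions differ only in the divisor, which the following sketch makes explicit: the population form divides by the number of pairs, the sample form by one less. The data (`hours`, `scores`) are hypothetical values invented for illustration, and the `sample` flag is a design choice of this sketch.

```python
def covariance(xs, ys, sample=True):
    """Covariance of paired data.

    sample=True divides by n - 1 (sample covariance);
    sample=False divides by n (population covariance).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < (2 if sample else 1):
        raise ValueError("not enough data points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the two means.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


# Hypothetical data: hours studied and test scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

print("population:", covariance(hours, scores, sample=False))
print("sample:    ", covariance(hours, scores, sample=True))
```

With these numbers the sum of products is 41, so the population covariance is 41/5 and the sample covariance is 41/4. Both are positive, matching the fact that the scores rise with the hours.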
Correlation Coefficient
The correlation coefficient measures the strength and direction of the linear relationship between two variables. It ranges from -1 to 1, where -1 indicates a perfect negative linear relationship, 0 indicates no linear relationship, and 1 indicates a perfect positive linear relationship.
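A minimal sketch of the Pearson correlation coefficient, computed as the covariance divided by the product of the two standard deviations (the common divisor cancels, so only the sums of deviations are needed). The function name `pearson_r` and the small data sets are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2 * v + 1 for v in x])      # points on an increasing line
r_down = pearson_r(x, [10 - 3 * v for v in x])   # points on a decreasing line
r_curve = pearson_r(x, [(v - 3) ** 2 for v in x])  # symmetric U-shape

print(r_up, r_down, r_curve)
```

The third case is worth noting: the U-shaped data are perfectly determined by x, yet the coefficient is zero, because the coefficient only captures linear association.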
Association Between Categorical and Numerical Variables
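To analyze the relationship between a binary categorical variable and a numerical variable, we use the point-biserial correlation coefficient.
Point-Biserial Correlation Coefficient
The point-biserial correlation coefficient measures the strength and direction of the relationship between a binary categorical variable and a numerical variable. It is the member of the biserial family that matches the quantities listed below, and it equals the Pearson correlation coefficient computed with the binary variable coded as 0 and 1.
$$r_{pb} = \frac{M_1 - M_0}{S} \sqrt{p\,q}$$
Where: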
- M1 = Mean of the numerical variable for the group coded as 1
- M0 = Mean of the numerical variable for the group coded as 0
- S = Standard deviation of the numerical variable over all observations (both groups combined)
- p = Proportion of the group coded as 1
- q = Proportion of the group coded as 0
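The quantities in this list combine as (M1 - M0) / S × √(p·q), which is the usual point-biserial form of the coefficient. The sketch below computes it under two stated assumptions: S is taken as the population standard deviation (divisor n) of all values, which is the version that pairs with the √(p·q) factor, and the group codes are exactly 0 and 1. The function name and the data are hypothetical.

```python
import math


def point_biserial(groups, values):
    """(M1 - M0) / S * sqrt(p * q), with S the population standard deviation."""
    n = len(values)
    if n == 0 or n != len(groups):
        raise ValueError("groups and values must be non-empty and equally long")
    ones = [v for g, v in zip(groups, values) if g == 1]
    zeros = [v for g, v in zip(groups, values) if g == 0]
    if len(ones) + len(zeros) != n:
        raise ValueError("groups must be coded as 0 or 1")
    if not ones or not zeros:
        raise ValueError("both groups must contain at least one observation")

    m1 = sum(ones) / len(ones)    # mean of the group coded 1
    m0 = sum(zeros) / len(zeros)  # mean of the group coded 0
    mean_all = sum(values) / n
    s = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if s == 0:
        raise ValueError("undefined when the numerical variable is constant")
    p = len(ones) / n
    q = len(zeros) / n            # equals 1 - p
    return (m1 - m0) / s * math.sqrt(p * q)


# Hypothetical data: three observations in each group.
groups = [0, 0, 0, 1, 1, 1]
values = [2, 4, 6, 8, 10, 12]

r_pb = point_biserial(groups, values)
print(round(r_pb, 4))
```

For these data M1 = 10, M0 = 4, and p = q = 0.5, and the result agrees with the Pearson correlation between the 0/1 codes and the values, which is a useful check on any implementation.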