Exploratory Factor Analysis (EFA) is a data reduction technique that aims to identify latent factors, or constructs, that explain the patterns of correlations among observed variables. It is commonly used in fields such as psychology, sociology, education, and market research to uncover the underlying structure of data. This article walks through an EFA in the R programming language.
Exploratory Factor Analysis
Exploratory Factor Analysis (EFA) is a statistical method used to discover the underlying structure of a large set of variables. Think of it as a way to find hidden patterns or groups (called “factors”) within the data. The main purpose of EFA is to simplify complex data by summarizing many variables with a smaller number of factors. These factors help us understand how the variables are related to each other. For example, if you’re studying various personality traits, EFA can help identify the broader traits under which individual items group together.
Importance of Preparing the Dataset for EFA
Before running EFA, the dataset needs to be prepared so that the results are reliable. The main steps are:
- Dealing with Missing Data: Missing data should be handled carefully. Methods include removing missing data, imputing values, or using statistical techniques to estimate missing values.
- Checking for Outliers: Outliers are extreme values that can skew results. Identifying and potentially removing outliers can improve the accuracy of EFA.
- Ensuring Adequate Sample Size: A larger sample size provides more reliable results. A common guideline is having at least 5-10 observations per variable.
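The missing-data and sample-size checks above can be sketched in base R. This is a minimal illustration, assuming the built-in mtcars dataset that the walkthrough below also uses. The variable names (`missing_per_variable`, `obs_per_variable`, `meets_guideline`) are only illustrative, and the threshold of 5 is the lower end of the rule of thumb quoted above, not a hard requirement.

```r
# Illustrative data preparation checks on the built-in mtcars dataset
data(mtcars)

# Missing data: count NAs per variable and the number of complete rows
missing_per_variable <- colSums(is.na(mtcars))
n_complete <- sum(complete.cases(mtcars))

# Sample size: observations per variable, compared with the 5-10 guideline
obs_per_variable <- nrow(mtcars) / ncol(mtcars)
meets_guideline <- obs_per_variable >= 5

print(missing_per_variable)
cat("Complete rows:", n_complete, "\n")
cat("Observations per variable:", round(obs_per_variable, 2), "\n")
cat("Meets the 5-per-variable guideline:", meets_guideline, "\n")
```

With 32 rows and 11 variables, mtcars has fewer than 3 observations per variable, so it falls short of the guideline. An analysis on it is best read as a demonstration of the workflow.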
Step-by-Step Guide to Perform EFA Using R
The following steps walk through an Exploratory Factor Analysis (EFA) in R.
Step 1: Install and Load Packages
Install and load the packages required for Exploratory Factor Analysis.
R
# Install each package only if it is not already available
for (pkg in c("psych", "factoextra", "lavaan")) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
}
library(psych)       # psychometric analyses, including fa()
library(factoextra)  # enhanced visualization of multivariate data analysis
library(lavaan)      # structural equation modeling (SEM)
Step 2: Load and Inspect the Dataset
Load the mtcars dataset and view the first few rows.
R
data(mtcars)
head(mtcars)
Output:
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Step 3: Performing EDA
Check for missing values and outliers in the dataset.
R
sum(is.na(mtcars))
boxplot(mtcars)
Output:
[1] 0
Figure: Boxplot of each variable
The boxplot visualizes the distribution and potential outliers for each variable.
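Because the mtcars variables are measured on very different scales, a raw boxplot is dominated by the largest ones (such as disp and hp). The sketch below is one possible refinement, not part of the original workflow. It standardizes the variables before plotting and then counts the points that the boxplot rule would flag in each variable. The names `mtcars_scaled` and `outlier_counts` are illustrative.

```r
data(mtcars)

# Standardize so every variable is drawn on a comparable scale
mtcars_scaled <- as.data.frame(scale(mtcars))
boxplot(mtcars_scaled, las = 2, main = "Standardized mtcars variables")

# Count the points each boxplot would draw as outliers
# (boxplot.stats applies the same 1.5 * IQR whisker rule as boxplot)
outlier_counts <- sapply(mtcars, function(x) length(boxplot.stats(x)$out))
print(outlier_counts)
```

Flagged points are candidates for inspection, not automatic removal. Whether to drop them depends on whether they are data errors or genuine extreme cases.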
Step 4: Determine the Number of Factors
Determining the number of factors to retain is a crucial step in factor analysis. Several methods can inform the decision, including statistical criteria and visual inspection. The Kaiser Criterion is one of them.
Kaiser Criterion
- Calculate the eigenvalues of the correlation matrix and use the Kaiser Criterion (retain factors with eigenvalues > 1) to determine the number of factors.
- The first two eigenvalues are greater than 1, suggesting 2 factors.
R
eigenvalues <- eigen(cor(mtcars))$values
print(eigenvalues)
Output:
 [1] 6.60840025 2.65046789 0.62719727 0.26959744 0.22345110 0.21159612
 [7] 0.13526199 0.12290143 0.07704665 0.05203544 0.02204441
Now visualize the eigenvalues with a scree plot to identify the “elbow.”
R
# Compute the eigenvalues once instead of repeating the decomposition
ev <- eigen(cor(mtcars))$values
scree_plot <- data.frame(component = seq_along(ev), eigenvalues = ev)
plot(scree_plot$component, scree_plot$eigenvalues, type = "b",
     xlab = "Component Number", ylab = "Eigenvalue",
     main = "Scree Plot")
abline(h = 1, col = "red", lty = 2)  # reference line for the Kaiser Criterion
Output:
Figure: Scree Plot
The scree plot shows a sharp drop after the second component, supporting the choice of 2 factors.
Step 5: Conduct EFA
Perform EFA with 2 factors and Varimax rotation.
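The Kaiser Criterion can also be applied programmatically instead of reading the count off the printed eigenvalues. The following is a minimal sketch in base R. The names `n_kaiser`, `prop_var`, and `cum_var` are illustrative.

```r
data(mtcars)
eigenvalues <- eigen(cor(mtcars))$values

# Kaiser Criterion: number of eigenvalues greater than 1
n_kaiser <- sum(eigenvalues > 1)

# Share of total variance carried by each component; the eigenvalues of a
# correlation matrix sum to the number of variables (11 here)
prop_var <- eigenvalues / sum(eigenvalues)
cum_var <- cumsum(prop_var)

print(n_kaiser)
print(round(cum_var[1:3], 3))
```

For mtcars, `n_kaiser` is 2, matching the eigenvalues listed in this step. Note that these shares describe principal components of the correlation matrix, so they can differ slightly from the variance proportions reported by a fitted factor model.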
R
efa_result <- fa(r = mtcars, nfactors = 2, rotate = "varimax")
print(efa_result)
Output:
Factor Analysis using method = minres
Call: fa(r = mtcars, nfactors = 2, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
       MR1   MR2   h2    u2 com
mpg   0.68 -0.63 0.85 0.147 2.0
cyl  -0.63  0.73 0.94 0.064 2.0
disp -0.73  0.61 0.90 0.102 1.9
hp   -0.32  0.88 0.88 0.124 1.3
drat  0.81 -0.22 0.71 0.292 1.1
wt   -0.78  0.45 0.82 0.179 1.6
qsec -0.15 -0.87 0.78 0.216 1.1
vs    0.30 -0.79 0.71 0.292 1.3
am    0.90  0.07 0.82 0.183 1.0
gear  0.88  0.15 0.80 0.200 1.1
carb  0.05  0.81 0.66 0.342 1.0

                       MR1  MR2
SS loadings           4.46 4.39
Proportion Var        0.41 0.40
Cumulative Var        0.41 0.81
Proportion Explained  0.50 0.50
Cumulative Proportion 0.50 1.00

Mean item complexity = 1.4
Test of the hypothesis that 2 factors are sufficient.

df null model = 55 with the objective function = 15.4 with Chi Square = 408.01
df of the model are 34 and the objective function was 2.76

The root mean square of the residuals (RMSR) is 0.04
The df corrected root mean square of the residuals is 0.06

The harmonic n.obs is 32 with the empirical chi square 6.87 with prob < 1
The total n.obs was 32 with Likelihood Chi Square = 69.56 with prob < 0.00031

Tucker Lewis Index of factoring reliability = 0.827
RMSEA index = 0.178 and the 90 % confidence intervals are 0.121 0.245
BIC = -48.28
Fit based upon off diagonal values = 0.99
Measures of factor score adequacy
                                                   MR1  MR2
Correlation of (regression) scores with factors   0.98 0.98
Multiple R square of scores with factors          0.95 0.96
Minimum correlation of possible factor scores     0.91 0.92

This output shows the factor loadings, the proportion of variance explained by each factor, and various fit indices.
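- Examine the factor loadings to understand which variables are associated with each factor.
- MR1 is associated with mpg, drat, am, and gear (positive loadings) and with cyl, disp, and wt (negative loadings).
- MR2 is associated with hp, carb, cyl, and disp (positive loadings) and with qsec, vs, and mpg (negative loadings).
- Several variables (mpg, cyl, disp) load substantially on both factors, as their complexity (com) values of about 2 indicate.
- Together, the two factors explain about 81% of the variance (Cumulative Var = 0.81).
- The RMSEA (0.178) and Tucker Lewis Index (0.827) fall short of commonly cited cutoffs, so with only 32 observations the two-factor solution should be read as illustrative.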
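The interpretation above can be reproduced from the fitted object without reading values off the printout. The sketch below assumes the psych package is installed and refits the same model. `loadings_mat`, `dominant_factor`, and `prop_var` are illustrative names, and the 0.5 cutoff is a common convention, not a fixed rule.

```r
library(psych)
data(mtcars)

efa_result <- fa(r = mtcars, nfactors = 2, rotate = "varimax")

# Plain numeric matrix of loadings (variables in rows, factors in columns)
loadings_mat <- unclass(efa_result$loadings)

# For each variable, the factor with the largest absolute loading
dominant_factor <- colnames(loadings_mat)[apply(abs(loadings_mat), 1, which.max)]
names(dominant_factor) <- rownames(loadings_mat)
print(dominant_factor)

# Variance explained per factor = sum of squared loadings / number of variables
prop_var <- colSums(loadings_mat^2) / nrow(loadings_mat)
print(round(prop_var, 2))
print(round(sum(prop_var), 3))

# Print the loadings again, hiding entries with absolute value below 0.5
print(efa_result$loadings, cutoff = 0.5)
```

Assigning each variable to a single dominant factor is a simplification. Variables with sizeable loadings on both factors (cross-loadings) remain visible in the thresholded printout and should be interpreted with care.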
Conclusion
In this article, Exploratory Factor Analysis (EFA) was applied to the mtcars car data to find hidden patterns, or groups, among its variables. The analysis suggested that the data can be summarized by two main factors, with related features, such as those describing the car’s performance (miles per gallon, horsepower), grouping together. This makes it easier to see how different aspects of cars are connected.
Exploratory Factor Analysis (EFA) in R - FAQs
When should I use Exploratory Factor Analysis (EFA) versus Confirmatory Factor Analysis (CFA)?
EFA is preferred when exploring unknown factor structures without pre-specified hypotheses, while CFA is suitable for testing pre-defined structures based on theoretical assumptions.
How do I determine the number of factors to retain in EFA?
The number of factors to retain in EFA can be determined using criteria such as eigenvalues (the >1 rule or a scree plot), explained variance, and theoretical relevance, balancing interpretability and simplicity.
Can EFA handle categorical variables?
Yes. Binary and ordinal variables are usually handled by factoring tetrachoric or polychoric correlations instead of Pearson correlations. Categorical principal component analysis (CATPCA) is a related alternative for incorporating categorical data.
How should I interpret factor loadings in EFA?
Loadings with an absolute value above roughly 0.5 are commonly read as strong relationships between a variable and a factor. A factor is easier to interpret when the items loading on it do so consistently and share a common meaning.
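What measures are used to assess model fit in EFA?
Before extraction, the suitability of the data is checked with the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (values above 0.6 are usually taken to indicate suitability) and Bartlett’s Test of Sphericity (a significant result indicates that the variables are correlated enough to factor). After extraction, fit is judged with indices such as the RMSR, RMSEA, and Tucker Lewis Index reported by fa(), alongside scree plots and rotation methods for settling the number of factors and simplifying the solution.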
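The KMO measure and Bartlett’s test mentioned in this answer are both available in the psych package. The following is a minimal sketch on the mtcars correlation matrix, assuming psych is installed. The object names `R_mat`, `kmo`, and `bartlett` are illustrative.

```r
library(psych)
data(mtcars)

R_mat <- cor(mtcars)

# Kaiser-Meyer-Olkin measure of sampling adequacy (overall and per variable)
kmo <- KMO(R_mat)
print(round(kmo$MSA, 2))
print(round(kmo$MSAi, 2))

# Bartlett's test of sphericity: the null hypothesis is that the
# correlation matrix is an identity matrix (no shared variance to factor)
bartlett <- cortest.bartlett(R_mat, n = nrow(mtcars))
print(bartlett$chisq)
print(bartlett$df)
print(bartlett$p.value)
```

An overall MSA above about 0.6 and a small Bartlett p-value are usually taken as signs that the correlation matrix is worth factoring. Neither says how many factors to retain.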
How should I report EFA results?
When reporting EFA results, include factor loadings tables, communalities, scree plots, rotation matrices, and suitability and fit statistics (KMO, Bartlett’s test), along with interpretations of factors based on loadings and theoretical context.
What are common pitfalls to avoid in EFA?
Common pitfalls in EFA include over-extracting or under-extracting factors based on arbitrary criteria, ignoring data preparation steps such as handling missing values and outliers, and misinterpreting factor loadings without considering the context and theoretical framework.