Exploratory Factor Analysis (EFA) in R

EFA is a data reduction technique that identifies latent factors, or constructs, that explain the pattern of correlations among observed variables. It is commonly used in fields such as psychology, sociology, education, and market research to uncover the underlying structure of data, and R provides several packages for running it.

Exploratory Factor Analysis

Exploratory Factor Analysis (EFA) is a statistical method used to discover the underlying structure of a large set of variables. Think of it as a way to find hidden groups (called “factors”) within the data. The main purpose of EFA is to simplify complex data by summarizing many variables with a smaller number of factors, which also shows how the variables relate to each other. For example, if you are studying many specific personality traits, EFA can help identify the broader traits under which they group together.

Importance of Preparing the Dataset for EFA

EFA is based on the correlations among variables, so problems in the data carry straight into the factor solution. Three checks matter most before running the analysis:

Dealing with Missing Data: Missing data should be handled carefully. Methods include removing missing data, imputing values, or using statistical techniques to estimate missing values.
Checking for Outliers: Outliers are extreme values that can skew results. Identifying and potentially removing outliers can improve the accuracy of EFA.
Ensuring Adequate Sample Size: A larger sample size provides more reliable results. A common guideline is having at least 5-10 observations per variable.
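The missing-data and sample-size checks above can be sketched in a few lines of base R. The helper name check_efa_readiness is hypothetical (it is not part of any package), and it is applied here to the built-in mtcars data that the guide uses later.

```r
# Hypothetical helper: summarize missing data and sample size for a numeric data frame.
check_efa_readiness <- function(df) {
  stopifnot(is.data.frame(df), all(sapply(df, is.numeric)))
  list(
    n_missing = sum(is.na(df)),               # total number of missing cells
    rows_complete = sum(complete.cases(df)),  # rows left after listwise deletion
    obs_per_variable = nrow(df) / ncol(df)    # compare with the 5-10 guideline
  )
}

readiness <- check_efa_readiness(mtcars)
print(readiness)
```

If n_missing is above zero, na.omit(df) performs the simplest fix (removing incomplete rows). Imputation is often preferable when many rows would be lost, but it is not shown in this sketch.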

Step-by-Step Guide to Perform EFA Using R

The following steps walk through an exploratory factor analysis in R, from installing packages to interpreting the output.

Step 1: Install and Load Packages

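Install and load the required packages. Only psych, which provides fa(), is needed for the steps below; factoextra and lavaan are optional companions for visualization and for follow-up confirmatory models.

# install.packages() only needs to run once per machine; library() runs every session.
install.packages("psych")       # psychometric analyses, including fa()
library(psych)
install.packages("factoextra")  # enhanced visualization of multivariate analyses
library(factoextra)
install.packages("lavaan")      # structural equation modeling (SEM), including CFA
library(lavaan)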

Step 2: Load and Inspect the Dataset

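Load the built-in mtcars dataset (32 cars, 11 numeric variables) and view the first few rows. With about 3 observations per variable, it falls short of the sample-size guideline above, so treat the results as a demonstration of the workflow rather than a reliable factor structure.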

data(mtcars)
head(mtcars)

Output:

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
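Beyond head(), it can help to check how many distinct values each column takes, because binary or coarsely discrete variables behave differently from continuous ones in a correlation-based analysis. The following is a minimal sketch; the cutoff of 6 distinct values is an arbitrary choice for illustration.

```r
# Count distinct values per column to spot binary or coarsely discrete variables.
n_distinct <- sapply(mtcars, function(x) length(unique(x)))
print(n_distinct)

# Variables with only a handful of levels (the cutoff of 6 is an arbitrary choice).
discrete_vars <- names(n_distinct)[n_distinct <= 6]
print(discrete_vars)
```

In mtcars this flags cyl, vs, am, gear, and carb. The guide still treats them as numeric, which is a simplification.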

Step 3: Perform Exploratory Data Analysis (EDA)

Check for missing values and outliers in the dataset.

sum(is.na(mtcars))

boxplot(mtcars)

Output:

[1] 0

Creating Boxplot

sum(is.na(mtcars)) returns 0, so the dataset has no missing values. The boxplot shows the distribution of each variable and marks potential outliers as points beyond the whiskers.
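Because the mtcars variables are on very different scales (disp and hp run into the hundreds while vs and am are 0 or 1), the raw boxplot is dominated by a few columns. A possible refinement, sketched here with base R only, is to standardize the variables before plotting and to list the values that the boxplot rule flags.

```r
# Standardize so every variable shares one scale, then redraw the boxplot.
mtcars_z <- as.data.frame(scale(mtcars))
boxplot(mtcars_z, las = 2, main = "Standardized mtcars variables")

# List the values beyond the boxplot whiskers (1.5 * IQR rule) for each variable.
outliers <- lapply(mtcars, function(x) boxplot.stats(x)$out)
outliers <- outliers[lengths(outliers) > 0]  # keep only variables with flagged values
print(outliers)
```

Flagged values are candidates for a closer look, not automatic deletions. In a dataset this small, dropping rows also reduces an already limited sample.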

Step 4: Determine the Number of Factors

Determining the number of factors to retain is a crucial step in factor analysis. Several methods can inform the decision, including statistical criteria and visual inspection. The Kaiser criterion is one of the most common.

Kaiser Criterion

Calculate the eigenvalues of the correlation matrix and apply the Kaiser criterion, which retains factors whose eigenvalue is greater than 1.
For mtcars, the first two eigenvalues are greater than 1, suggesting 2 factors.

eigenvalues <- eigen(cor(mtcars))$values
print(eigenvalues)

Output:

[1] 6.60840025 2.65046789 0.62719727 0.26959744 0.22345110 0.21159612
[7] 0.13526199 0.12290143 0.07704665 0.05203544 0.02204441
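The Kaiser rule can also be applied in code instead of by reading the printed values. The sketch below counts the eigenvalues above 1 and converts them to shares of total variance. Note that these shares describe principal components of the correlation matrix, so they will not match the factor-based variance reported by fa() exactly.

```r
eigenvalues <- eigen(cor(mtcars))$values

# Kaiser criterion: count the eigenvalues above 1.
n_kaiser <- sum(eigenvalues > 1)

# Eigenvalues of a correlation matrix sum to the number of variables,
# so dividing by that sum gives each component's share of total variance.
prop_var <- eigenvalues / sum(eigenvalues)
cum_var <- cumsum(prop_var)

print(n_kaiser)
print(round(cum_var[1:3], 3))
```

For the eigenvalues printed above, n_kaiser is 2, and the first two components together account for roughly 84% of the total variance.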

Next, plot the eigenvalues in a scree plot to identify the “elbow,” the point after which the eigenvalues level off.

# Compute the eigenvalues once and reuse them for the plot data.
ev <- eigen(cor(mtcars))$values
scree_plot <- data.frame(
  eigenvalues = ev,
  component = seq_along(ev)
)

plot(scree_plot$component, scree_plot$eigenvalues, type = "b",
     xlab = "Component Number", ylab = "Eigenvalue",
     main = "Scree Plot")
abline(h = 1, col = "red", lty = 2)

Output:

Scree Plot

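The scree plot shows a sharp drop after the second component (the eigenvalue falls from 2.65 to 0.63) and only the first two points lie above the red reference line at 1, supporting the choice of 2 factors.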
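The Kaiser rule and the scree plot are both known to be rough guides. A commonly used cross-check that is not part of the original walkthrough is parallel analysis, which compares the observed eigenvalues with those from random data of the same size. The base R sketch below is a simplified version under stated assumptions: it simulates normal data, uses 500 replications and a 95th percentile threshold, and fixes an arbitrary seed. The psych package offers fa.parallel() as a fuller implementation.

```r
set.seed(123)  # arbitrary seed so the simulation is repeatable

n <- nrow(mtcars)
p <- ncol(mtcars)
observed <- eigen(cor(mtcars))$values

# Eigenvalues from 500 random normal data sets of the same size
# (one column of the result per simulation).
simulated <- replicate(500, {
  random_data <- matrix(rnorm(n * p), nrow = n, ncol = p)
  eigen(cor(random_data))$values
})

# 95th percentile of each ordered eigenvalue under pure noise.
threshold <- apply(simulated, 1, quantile, probs = 0.95)

# Retain leading components until one fails to beat its noise threshold.
beats_noise <- observed > threshold
n_parallel <- if (all(beats_noise)) p else which(!beats_noise)[1] - 1
print(n_parallel)
```

If n_parallel agrees with the Kaiser count, the choice of factors rests on firmer ground. If the two disagree, parallel analysis is usually the more conservative guide.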

Step 5: Conduct EFA

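Perform EFA with 2 factors using fa() from the psych package. By default fa() extracts factors with the minimum residual (minres) method; rotate = "varimax" applies an orthogonal rotation that keeps the factors uncorrelated.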

efa_result <- fa(r = mtcars, nfactors = 2, rotate = "varimax")
print(efa_result)

Output:

Factor Analysis using method =  minres
Call: fa(r = mtcars, nfactors = 2, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
       MR1   MR2   h2    u2 com
mpg   0.68 -0.63 0.85 0.147 2.0
cyl  -0.63  0.73 0.94 0.064 2.0
disp -0.73  0.61 0.90 0.102 1.9
hp   -0.32  0.88 0.88 0.124 1.3
drat  0.81 -0.22 0.71 0.292 1.1
wt   -0.78  0.45 0.82 0.179 1.6
qsec -0.15 -0.87 0.78 0.216 1.1
vs    0.30 -0.79 0.71 0.292 1.3
am    0.90  0.07 0.82 0.183 1.0
gear  0.88  0.15 0.80 0.200 1.1
carb  0.05  0.81 0.66 0.342 1.0

                       MR1  MR2
SS loadings           4.46 4.39
Proportion Var        0.41 0.40
Cumulative Var        0.41 0.81
Proportion Explained  0.50 0.50
Cumulative Proportion 0.50 1.00

Mean item complexity =  1.4
Test of the hypothesis that 2 factors are sufficient.

df null model =  55  with the objective function =  15.4 with Chi Square =  408.01
df of  the model are 34  and the objective function was  2.76 

The root mean square of the residuals (RMSR) is  0.04 
The df corrected root mean square of the residuals is  0.06 

The harmonic n.obs is  32 with the empirical chi square  6.87  with prob <  1 
The total n.obs was  32  with Likelihood Chi Square =  69.56  with prob <  0.00031 

Tucker Lewis Index of factoring reliability =  0.827
RMSEA index =  0.178  and the 90 % confidence intervals are  0.121 0.245
BIC =  -48.28
Fit based upon off diagonal values = 0.99
Measures of factor score adequacy             
                                                   MR1  MR2
Correlation of (regression) scores with factors   0.98 0.98
Multiple R square of scores with factors          0.95 0.96
Minimum correlation of possible factor scores     0.91 0.92

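This output shows the factor loadings (MR1, MR2), the communality (h2) and uniqueness (u2) of each variable, the proportion of variance explained by each factor, and various fit indices. By common rules of thumb the fit is mixed: the RMSR of 0.04 is low, but the RMSEA of 0.178 and the Tucker Lewis Index of 0.827 fall short of the usual thresholds (below about 0.08 and above about 0.90), so the two-factor solution is a useful summary rather than a close fit.

Examine the factor loadings to understand which variables are associated with each factor.
MR1 is associated with mpg, drat, am, and gear (positive loadings) and with cyl, disp, and wt (negative loadings).
MR2 is associated with hp, carb, cyl, and disp (positive loadings) and with qsec, vs, and mpg (negative loadings).
mpg, cyl, and disp load above 0.6 in absolute value on both factors, so they do not belong cleanly to either one.
Together, the two factors explain about 80.5% of the variance (Cumulative Var = 0.81 in the output).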
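The summary figures in the output follow directly from the loadings. As a worked check, the sketch below re-enters the rounded loadings printed above and recomputes the sums of squared loadings, the variance explained, and the communalities. The 0.5 cutoff for a salient loading is a rule of thumb, not something fa() enforces.

```r
# Loadings copied from the printed fa() output above (rounded to two decimals).
vars <- c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb")
loadings <- matrix(
  c( 0.68, -0.63, -0.73, -0.32,  0.81, -0.78, -0.15,  0.30, 0.90, 0.88, 0.05,  # MR1
    -0.63,  0.73,  0.61,  0.88, -0.22,  0.45, -0.87, -0.79, 0.07, 0.15, 0.81), # MR2
  ncol = 2, dimnames = list(vars, c("MR1", "MR2"))
)

# Variables with a salient loading (|loading| >= 0.5) on each factor.
salient <- lapply(c(MR1 = "MR1", MR2 = "MR2"),
                  function(f) vars[abs(loadings[, f]) >= 0.5])

# Sum of squared loadings per factor, and its share of the 11 units
# of standardized variance (one unit per variable).
ss_loadings <- colSums(loadings^2)
prop_var <- ss_loadings / nrow(loadings)

# Communality (h2): variance of each variable explained by both factors together.
communality <- rowSums(loadings^2)

print(salient)
print(round(ss_loadings, 2))    # MR1 4.46, MR2 4.39, matching "SS loadings"
print(round(sum(prop_var), 3))  # 0.805, the 80.5% total
```

With a fitted model in hand, the same quantities are available without retyping: efa_result$loadings holds the unrounded loadings and efa_result$communality the h2 column.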

Conclusion

In this guide, EFA was applied to the mtcars data to look for latent structure among 11 car characteristics. The eigenvalues and the scree plot pointed to two factors, which together account for about 80% of the variance. Variables describing the transmission, gearing, and weight (am, gear, drat, wt) load mainly on the first factor, while variables describing engine power and acceleration (hp, carb, qsec, vs) load mainly on the second. Summarizing 11 variables with two factors makes it easier to see how different aspects of the cars are connected.

Exploratory Factor Analysis (EFA) in R: FAQs

When should I use Exploratory Factor Analysis (EFA) versus Confirmatory Factor Analysis (CFA)?

EFA is preferred when exploring unknown factor structures without pre-specified hypotheses, while CFA is suitable for testing pre-defined structures based on theoretical assumptions.

How do I determine the number of factors to retain in EFA?

The number of factors to retain in EFA can be determined using criteria such as eigenvalues (>1 rule or scree plot), explained variance, and theoretical relevance, balancing interpretability and simplicity.

Can EFA handle categorical variables?

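Yes, with adjustments. Standard EFA on Pearson correlations assumes roughly continuous variables, so ordinal or binary items are usually handled by factoring polychoric or tetrachoric correlations instead (in psych, fa() accepts a cor argument such as cor = "poly" for this). Categorical principal component analysis (CATPCA) is a related option, but it is a component method rather than a factor model.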

How should I interpret factor loadings in EFA?

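A factor loading is the correlation between a variable and a factor (for orthogonal rotations). Loadings above about 0.5 in absolute value indicate a strong relationship, and the sign shows its direction. A factor is easiest to interpret when several variables load strongly on it and weakly on the others.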

What measures are used to assess model fit in EFA?

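Two checks assess whether the data are suitable for factoring in the first place: the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (values above 0.6 are usually considered acceptable) and Bartlett’s Test of Sphericity (a significant result means the variables are correlated enough to factor). Fit of the extracted model is then judged with indices such as RMSR, the Tucker Lewis Index, RMSEA, and BIC, all of which appear in the fa() output. Scree plots and rotation help choose the number of factors and simplify the solution, but they are not fit measures.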
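Both KMO and Bartlett’s test can be computed from the correlation matrix alone. The base R sketch below spells out the standard formulas for illustration. In practice the psych functions KMO() and cortest.bartlett() do the same job and should be preferred.

```r
R <- cor(mtcars)
n <- nrow(mtcars)
p <- ncol(mtcars)

# Bartlett's test of sphericity: the null hypothesis is that R is an identity matrix.
chi_sq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
df <- p * (p - 1) / 2
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)

# KMO: compare squared correlations with squared partial correlations,
# where the partial correlations come from the inverse of R.
R_inv <- solve(R)
partial <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))
off_diag <- row(R) != col(R)
kmo <- sum(R[off_diag]^2) / (sum(R[off_diag]^2) + sum(partial[off_diag]^2))

print(c(chi_sq = chi_sq, df = df, p_value = p_value))
print(kmo)
```

For mtcars the Bartlett statistic should be close to the null-model chi-square reported in the fa() output (408.01 on 55 degrees of freedom), since both are derived from the determinant of the correlation matrix.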

How should I report EFA results?

When reporting EFA results, state the extraction and rotation methods and include the factor loadings table, communalities, the scree plot, the variance explained, factorability checks (KMO, Bartlett’s test), and model fit statistics (such as RMSR and RMSEA), along with interpretations of the factors based on loadings and theoretical context.

What are common pitfalls to avoid in EFA?

Common pitfalls in EFA include over-extracting or under-extracting factors based on arbitrary criteria, ignoring data preparation steps such as handling missing values and outliers, and misinterpreting factor loadings without considering the context and theoretical framework.

Referred: https://www.geeksforgeeks.org
