Full Information Maximum Likelihood for Missing Data in R

Missing data is a common issue in statistical analysis and can lead to biased results if not handled properly. Full Information Maximum Likelihood (FIML) is a robust method for dealing with missing data, particularly when the data is missing at random (MAR). FIML uses all available data to estimate parameters, providing unbiased and efficient estimates without the need for imputation. This article explains how to implement FIML for handling missing data in the R Programming Language.

What is Full Information Maximum Likelihood (FIML)?

FIML is an estimation method that uses every available data point in a dataset to estimate model parameters, even when some values are missing. Instead of deleting or imputing incomplete cases, it computes each case's likelihood from whichever variables that case has observed and maximizes the sum of these casewise log-likelihoods, so no observed information is discarded.
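
To make the idea concrete, here is a minimal sketch of a casewise FIML log-likelihood for a bivariate normal model. It assumes the mvtnorm package is available (it is not used elsewhere in this article), and fiml_loglik, mu, and sigma are illustrative names with fixed parameter values rather than estimates; lavaan performs this kind of casewise computation internally and optimizes it over the model parameters.

R
# Sketch of the FIML objective: sum the log-likelihood case by case,
# each time using only the variables observed for that case.
# fiml_loglik is an illustrative helper (not part of lavaan);
# mu and sigma are fixed example values, not estimates.
library(mvtnorm)

fiml_loglik <- function(data, mu, sigma) {
  sum(apply(data, 1, function(row) {
    obs <- !is.na(row)                      # variables observed for this case
    dmvnorm(row[obs],                       # density over the observed subset only
            mean  = mu[obs],
            sigma = sigma[obs, obs, drop = FALSE],
            log   = TRUE)
  }))
}

# Small illustration: two correlated variables, ten values missing on the second
set.seed(1)
x <- rmvnorm(100, mean = c(0, 0), sigma = matrix(c(1, 0.5, 0.5, 1), 2))
x[sample(100, 10), 2] <- NA
fiml_loglik(x, mu = c(0, 0), sigma = matrix(c(1, 0.5, 0.5, 1), 2))

The steps below show how to apply FIML in practice with the lavaan package: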

  • Install and Load Necessary Packages: Ensure you have the required packages installed and loaded.
  • Load the Data: Import your dataset into R.
  • Introduce Missing Data: For demonstration purposes, add some missing values to the dataset.
  • Fit a Model Using FIML: Use structural equation modeling (SEM) with the lavaan package to fit a model using FIML.

Step 1: Install and Load Necessary Packages

Install the lavaan package, which supports FIML for handling missing data.

R
# Install and load the lavaan package
install.packages("lavaan")
library(lavaan)

Step 2: Load the Data

For demonstration purposes, we’ll use the built-in HolzingerSwineford1939 dataset from the lavaan package.

R
# Load the lavaan package
library(lavaan)

# Load the example dataset
data("HolzingerSwineford1939")
head(HolzingerSwineford1939)

Output:

  id sex ageyr agemo  school grade       x1   x2    x3       x4   x5        x6
1  1   1    13     1 Pasteur     7 3.333333 7.75 0.375 2.333333 5.75 1.2857143
2  2   2    13     7 Pasteur     7 5.333333 5.25 2.125 1.666667 3.00 1.2857143
3  3   2    13     1 Pasteur     7 4.500000 5.25 1.875 1.000000 1.75 0.4285714
4  4   1    13     2 Pasteur     7 5.333333 7.75 3.000 2.666667 4.50 2.4285714
5  5   2    12     2 Pasteur     7 4.833333 4.75 0.875 2.666667 4.00 2.5714286
6  6   2    14     1 Pasteur     7 5.333333 5.00 2.250 1.000000 3.00 0.8571429
        x7   x8       x9
1 3.391304 5.75 6.361111
2 3.782609 6.25 7.916667
3 3.260870 3.90 4.416667
4 3.000000 5.30 4.861111
5 3.695652 6.30 5.916667
6 4.347826 6.65 7.500000

Step 3: Introduce Missing Data

To demonstrate FIML, we’ll artificially introduce missing values into two of the observed indicators (x1 and x2) that the CFA model below uses.

R
# Introduce missing data for demonstration
set.seed(123)
HolzingerSwineford1939$x1[1:10] <- NA
HolzingerSwineford1939$x2[11:20] <- NA
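
An optional sanity check, assuming the modification above, is to count how many values are now missing in each of the nine indicators the CFA model will use:

R
# Count NA values in the indicators x1-x9 (expecting 10 each in x1 and x2)
colSums(is.na(HolzingerSwineford1939[, paste0("x", 1:9)]))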

Step 4: Fit a Model Using FIML

Specify and fit a confirmatory factor analysis (CFA) model using the lavaan package. lavaan does not use FIML by default (its default treatment of missing data is listwise deletion), so we request it explicitly with the missing = "fiml" argument.

R
# Specify a CFA model
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Fit the model using FIML (requested explicitly; lavaan defaults to listwise deletion)
fit <- cfa(model, data = HolzingerSwineford1939, missing = "fiml")

# Summarize the model fit
summary(fit, fit.measures = TRUE, standardized = TRUE)

Output:

lavaan 0.6.17 ended normally after 35 iterations

Estimator ML
Optimization method NLMINB
Number of model parameters 30

Number of observations 301
Number of missing patterns 1

Model Test User Model:

Test statistic 85.306
Degrees of freedom 24
P-value (Chi-square) 0.000

Model Test Baseline Model:

Test statistic 918.852
Degrees of freedom 36
P-value 0.000

User Model versus Baseline Model:

Comparative Fit Index (CFI) 0.931
Tucker-Lewis Index (TLI) 0.896

Robust Comparative Fit Index (CFI) 0.931
Robust Tucker-Lewis Index (TLI) 0.896

Loglikelihood and Information Criteria:

Loglikelihood user model (H0) -3737.745
Loglikelihood unrestricted model (H1) -3695.092

Akaike (AIC) 7535.490
Bayesian (BIC) 7646.703
Sample-size adjusted Bayesian (SABIC) 7551.560

Root Mean Square Error of Approximation:

RMSEA 0.092
90 Percent confidence interval - lower 0.071
90 Percent confidence interval - upper 0.114
P-value H_0: RMSEA <= 0.050 0.001
P-value H_0: RMSEA >= 0.080 0.840

Robust RMSEA 0.092
90 Percent confidence interval - lower 0.071
90 Percent confidence interval - upper 0.114
P-value H_0: Robust RMSEA <= 0.050 0.001
P-value H_0: Robust RMSEA >= 0.080 0.840

Standardized Root Mean Square Residual:

SRMR 0.060

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Observed
  Observed information based on                Hessian

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual =~
    x1                1.000                               0.900    0.772
    x2                0.554    0.109    5.066    0.000    0.498    0.424
    x3                0.729    0.117    6.220    0.000    0.656    0.581
  textual =~
    x4                1.000                               0.990    0.852
    x5                1.113    0.065   17.128    0.000    1.102    0.855
    x6                0.926    0.056   16.481    0.000    0.917    0.838
  speed =~
    x7                1.000                               0.619    0.570
    x8                1.180    0.150    7.851    0.000    0.731    0.723
    x9                1.082    0.195    5.543    0.000    0.670    0.665

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual ~~
    textual           0.408    0.080    5.124    0.000    0.459    0.459
    speed             0.262    0.055    4.735    0.000    0.471    0.471
  textual ~~
    speed             0.173    0.049    3.518    0.000    0.283    0.283

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1                4.936    0.067   73.473    0.000    4.936    4.235
   .x2                6.088    0.068   89.855    0.000    6.088    5.179
   .x3                2.250    0.065   34.579    0.000    2.250    1.993
   .x4                3.061    0.067   45.694    0.000    3.061    2.634
   .x5                4.341    0.074   58.452    0.000    4.341    3.369
   .x6                2.186    0.063   34.667    0.000    2.186    1.998
   .x7                4.186    0.063   66.766    0.000    4.186    3.848
   .x8                5.527    0.058   94.854    0.000    5.527    5.467
   .x9                5.374    0.058   92.546    0.000    5.374    5.334

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1                0.549    0.119    4.612    0.000    0.549    0.404
   .x2                1.134    0.104   10.875    0.000    1.134    0.821
   .x3                0.844    0.095    8.881    0.000    0.844    0.662
   .x4                0.371    0.048    7.739    0.000    0.371    0.275
   .x5                0.446    0.058    7.703    0.000    0.446    0.269
   .x6                0.356    0.043    8.200    0.000    0.356    0.298
   .x7                0.799    0.088    9.130    0.000    0.799    0.676
   .x8                0.488    0.092    5.321    0.000    0.488    0.477
   .x9                0.566    0.091    6.250    0.000    0.566    0.558
    visual            0.809    0.150    5.404    0.000    1.000    1.000
    textual           0.979    0.112    8.729    0.000    1.000    1.000
    speed             0.384    0.092    4.168    0.000    1.000    1.000

Key points when interpreting this output:

  • Model Fit Indices: The summary reports several fit indices (e.g., CFI, TLI, RMSEA) that assess how well the model fits the data; the sketch after this list shows how to extract them directly.
  • Standardized Estimates: The Std.lv and Std.all columns give the standardized coefficients for each path in the model.
  • Parameter Estimates: The estimated value of each parameter, along with its standard error and p-value.
  • Significant Parameters: Parameters with p-values below a chosen significance level (e.g., 0.05) are considered statistically significant.
  • Model Fit: Good fit is typically indicated by CFI and TLI values close to 1 and RMSEA values below about 0.05 (values up to 0.08 are often taken as acceptable).
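
If you only need the headline fit indices rather than the full summary, they can be pulled from the fitted object with lavaan's fitMeasures() function, as sketched here using the fit object from Step 4:

R
# Extract selected fit indices from the fitted lavaan model
fitMeasures(fit, c("cfi", "tli", "rmsea", "srmr"))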

The output from the lavaan package in R shows that a confirmatory factor analysis (CFA) model with three latent variables (visual, textual, speed) was successfully estimated using Maximum Likelihood (ML) and converged after 35 iterations. The model fit indices indicate a moderately good fit (CFI=0.931, TLI=0.896, RMSEA=0.092). The factor loadings for all observed variables on their respective latent variables are significant. Covariances between latent variables are also significant, indicating relationships among them. Variances and intercepts of the observed variables reflect their levels and variability. Overall, the model provides a reasonable representation of the data structure, though some fit indices suggest room for improvement.
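
To see what FIML contributes, one option is to refit the same model with listwise deletion (lavaan's default treatment of missing data) and compare how many cases each approach uses. This is a brief sketch that assumes the missing values introduced in Step 3; fit_listwise is an illustrative name.

R
# Refit the model, this time dropping every row that has a missing value
fit_listwise <- cfa(model, data = HolzingerSwineford1939, missing = "listwise")

# Number of cases used by each fit: FIML retains all rows,
# listwise deletion discards the incomplete ones
lavInspect(fit, "nobs")
lavInspect(fit_listwise, "nobs")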

Conclusion

Full Information Maximum Likelihood (FIML) is a powerful method for handling missing data in statistical models. The lavaan package in R provides a convenient way to implement FIML in structural equation modeling. By following the steps outlined in this guide, you can effectively use FIML to handle missing data and obtain reliable parameter estimates in your analysis.



