Full Information Maximum Likelihood for Missing Data in R

Missing data is a common issue in statistical analysis and can bias results if handled poorly, for example by simply dropping incomplete rows. Full Information Maximum Likelihood (FIML) is a principled method for dealing with missing data, particularly when the data are missing at random (MAR). FIML uses all available data to estimate parameters and, when the MAR assumption holds and the model is correctly specified, yields consistent and efficient estimates without the need for imputation. This article explains how to implement FIML for handling missing data in the R programming language.

What is Full Information Maximum Likelihood (FIML)?

FIML is an estimation method that uses all available data points in a dataset to estimate model parameters, even when some data points are missing. Each case contributes to the likelihood through whichever variables it has observed, and the parameter estimates are the values that maximize the sum of these casewise contributions. No cases are dropped and no missing values are filled in.
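
The idea of casewise likelihood contributions can be sketched in a few lines of base R. This is only an illustration under a multivariate normal assumption, not how lavaan implements FIML internally; the helper names casewise_loglik and fiml_loglik, and the toy values of Y, mu, and Sigma, are hypothetical.

```r
# Log-likelihood contribution of one case, using only its observed variables.
# mu and Sigma are the model-implied mean vector and covariance matrix.
casewise_loglik <- function(y, mu, Sigma) {
  obs <- !is.na(y)                      # variables observed for this case
  p <- sum(obs)
  if (p == 0) return(0)                 # a fully missing case adds nothing
  d <- y[obs] - mu[obs]                 # deviations from the implied means
  S <- Sigma[obs, obs, drop = FALSE]    # implied covariances of observed part
  log_det <- as.numeric(determinant(S, logarithm = TRUE)$modulus)
  quad <- sum(d * solve(S, d))          # Mahalanobis-type quadratic form
  -0.5 * (p * log(2 * pi) + log_det + quad)
}

# The FIML objective is the sum of the casewise contributions.
fiml_loglik <- function(Y, mu, Sigma) {
  sum(apply(Y, 1, casewise_loglik, mu = mu, Sigma = Sigma))
}

# Toy data: three cases, two variables, one value missing in cases 2 and 3
Y <- rbind(c(1.0, 2.0),
           c(0.5, NA),
           c(NA, 1.5))
mu <- c(0, 0)
Sigma <- diag(2)

fiml_loglik(Y, mu, Sigma)
```

An optimizer would search over the model parameters that generate mu and Sigma to maximize this quantity; the incomplete cases 2 and 3 still inform the estimates through the variable each of them has observed.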

Implementing FIML in R involves four steps:

Install and Load Necessary Packages: Ensure you have the required packages installed and loaded.
Load the Data: Import your dataset into R.
Induce Missing Data: For demonstration purposes, create missing data in the dataset.
Fit a Model Using FIML: Use structural equation modeling (SEM) with the lavaan package to fit a model using FIML.

Step 1: Install and Load Necessary Packages

Install the lavaan package, which supports FIML for handling missing data.

# Install and load the lavaan package
install.packages("lavaan")
library(lavaan)

Step 2: Load the Data

For demonstration purposes, we’ll use the built-in HolzingerSwineford1939 dataset from the lavaan package.

# Load the lavaan package
library(lavaan)

# Load the example dataset
data("HolzingerSwineford1939")
head(HolzingerSwineford1939)

Output:

  id sex ageyr agemo  school grade       x1   x2    x3       x4   x5        x6
1  1   1    13     1 Pasteur     7 3.333333 7.75 0.375 2.333333 5.75 1.2857143
2  2   2    13     7 Pasteur     7 5.333333 5.25 2.125 1.666667 3.00 1.2857143
3  3   2    13     1 Pasteur     7 4.500000 5.25 1.875 1.000000 1.75 0.4285714
4  4   1    13     2 Pasteur     7 5.333333 7.75 3.000 2.666667 4.50 2.4285714
5  5   2    12     2 Pasteur     7 4.833333 4.75 0.875 2.666667 4.00 2.5714286
6  6   2    14     1 Pasteur     7 5.333333 5.00 2.250 1.000000 3.00 0.8571429
        x7   x8       x9
1 3.391304 5.75 6.361111
2 3.782609 6.25 7.916667
3 3.260870 3.90 4.416667
4 3.000000 5.30 4.861111
5 3.695652 6.30 5.916667
6 4.347826 6.65 7.500000

Step 3: Induce Missing Data

To demonstrate FIML, we’ll artificially introduce some missing values into the dataset.

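# Introduce missing data for demonstration.
# The indicators are named x1 to x9; "visual" is a latent factor defined
# later in the model, not a column of the data set.
HolzingerSwineford1939$x1[1:10]  <- NA
HolzingerSwineford1939$x2[11:20] <- NA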
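
Before fitting, it can help to check that the missing values landed where intended. The following is a minimal sketch that works on a fresh copy of the data (the names dat, items, and pattern are illustrative) with NAs placed in x1 and x2.

```r
library(lavaan)

# Work on a copy so the packaged data set stays intact
data("HolzingerSwineford1939", package = "lavaan")
dat <- HolzingerSwineford1939
dat$x1[1:10]  <- NA
dat$x2[11:20] <- NA

items <- paste0("x", 1:9)

# Number of missing values per indicator
colSums(is.na(dat[, items]))

# Distinct missing-data patterns (1 = missing, 0 = observed, in the order
# x1..x9) and the number of rows that share each pattern
pattern <- apply(is.na(dat[, items]), 1,
                 function(r) paste(as.integer(r), collapse = ""))
table(pattern)
```

A column that is entirely NA, or a pattern count that does not match expectations, is easier to catch here than in the model summary.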

Step 4: Fit a Model Using FIML

Specify and fit a confirmatory factor analysis (CFA) model using the lavaan package. By default, lavaan handles missing data by listwise deletion, so FIML has to be requested explicitly with missing = "fiml".

# Specify a CFA model
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Fit the model using FIML (requested explicitly; lavaan's default is listwise deletion)
fit <- cfa(model, data = HolzingerSwineford1939, missing = "fiml")

# Summarize the model fit
summary(fit, fit.measures = TRUE, standardized = TRUE)

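Output (note that "Number of missing patterns" is 1, which means this run was produced without any missing values in x1 to x9; once values are actually missing, the pattern count and the estimates will differ):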

lavaan 0.6.17 ended normally after 35 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        30

  Number of observations                           301
  Number of missing patterns                         1

Model Test User Model:
                                                      
  Test statistic                                85.306
  Degrees of freedom                                24
  P-value (Chi-square)                           0.000

Model Test Baseline Model:

  Test statistic                               918.852
  Degrees of freedom                                36
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.931
  Tucker-Lewis Index (TLI)                       0.896
                                                      
  Robust Comparative Fit Index (CFI)             0.931
  Robust Tucker-Lewis Index (TLI)                0.896

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -3737.745
  Loglikelihood unrestricted model (H1)      -3695.092
                                                      
  Akaike (AIC)                                7535.490
  Bayesian (BIC)                              7646.703
  Sample-size adjusted Bayesian (SABIC)       7551.560

Root Mean Square Error of Approximation:

  RMSEA                                          0.092
  90 Percent confidence interval - lower         0.071
  90 Percent confidence interval - upper         0.114
  P-value H_0: RMSEA <= 0.050                    0.001
  P-value H_0: RMSEA >= 0.080                    0.840
                                                      
  Robust RMSEA                                   0.092
  90 Percent confidence interval - lower         0.071
  90 Percent confidence interval - upper         0.114
  P-value H_0: Robust RMSEA <= 0.050             0.001
  P-value H_0: Robust RMSEA >= 0.080             0.840

Standardized Root Mean Square Residual:

  SRMR                                           0.060

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Observed
  Observed information based on                Hessian

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual =~                                                             
    x1                1.000                               0.900    0.772
    x2                0.554    0.109    5.066    0.000    0.498    0.424
    x3                0.729    0.117    6.220    0.000    0.656    0.581
  textual =~                                                            
    x4                1.000                               0.990    0.852
    x5                1.113    0.065   17.128    0.000    1.102    0.855
    x6                0.926    0.056   16.481    0.000    0.917    0.838
  speed =~                                                              
    x7                1.000                               0.619    0.570
    x8                1.180    0.150    7.851    0.000    0.731    0.723
    x9                1.082    0.195    5.543    0.000    0.670    0.665

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual ~~                                                             
    textual           0.408    0.080    5.124    0.000    0.459    0.459
    speed             0.262    0.055    4.735    0.000    0.471    0.471
  textual ~~                                                            
    speed             0.173    0.049    3.518    0.000    0.283    0.283

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1                4.936    0.067   73.473    0.000    4.936    4.235
   .x2                6.088    0.068   89.855    0.000    6.088    5.179
   .x3                2.250    0.065   34.579    0.000    2.250    1.993
   .x4                3.061    0.067   45.694    0.000    3.061    2.634
   .x5                4.341    0.074   58.452    0.000    4.341    3.369
   .x6                2.186    0.063   34.667    0.000    2.186    1.998
   .x7                4.186    0.063   66.766    0.000    4.186    3.848
   .x8                5.527    0.058   94.854    0.000    5.527    5.467
   .x9                5.374    0.058   92.546    0.000    5.374    5.334

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1                0.549    0.119    4.612    0.000    0.549    0.404
   .x2                1.134    0.104   10.875    0.000    1.134    0.821
   .x3                0.844    0.095    8.881    0.000    0.844    0.662
   .x4                0.371    0.048    7.739    0.000    0.371    0.275
   .x5                0.446    0.058    7.703    0.000    0.446    0.269
   .x6                0.356    0.043    8.200    0.000    0.356    0.298
   .x7                0.799    0.088    9.130    0.000    0.799    0.676
   .x8                0.488    0.092    5.321    0.000    0.488    0.477
   .x9                0.566    0.091    6.250    0.000    0.566    0.558
    visual            0.809    0.150    5.404    0.000    1.000    1.000
    textual           0.979    0.112    8.729    0.000    1.000    1.000
    speed             0.384    0.092    4.168    0.000    1.000    1.000

Model Fit Indices: The summary reports several fit indices (e.g., CFI, TLI, RMSEA, SRMR) for assessing how well the model fits the data.
Standardized Estimates: The Std.lv and Std.all columns give the standardized coefficients for each path in the model (Std.lv standardizes the latent variables only; Std.all standardizes both latent and observed variables).
Parameter Estimates: The Estimate column gives the estimated value of each parameter, with its standard error, z-value, and p-value alongside.
Significant Parameters: Parameters with p-values less than a chosen significance level (e.g., 0.05) are considered statistically significant.
Model Fit: By common rules of thumb, good fit is indicated by CFI and TLI values close to 1 and RMSEA values below 0.05; RMSEA values up to about 0.08 are often treated as acceptable.
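
The indices discussed above can also be pulled out programmatically rather than read off the printed summary. The sketch below uses lavaan's fitMeasures() and standardizedSolution() on the data as shipped with the package; the cutoff values in the last step are rule-of-thumb assumptions chosen for illustration, not fixed standards.

```r
library(lavaan)

data("HolzingerSwineford1939", package = "lavaan")

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

fit <- cfa(model, data = HolzingerSwineford1939, missing = "fiml")

# Selected fit indices as a named numeric vector
fm <- fitMeasures(fit, c("chisq", "df", "cfi", "tli", "rmsea", "srmr"))
round(fm, 3)

# Compare each index with an assumed rule-of-thumb cutoff
checks <- c(
  cfi_close_to_1 = fm[["cfi"]]   >= 0.95,
  tli_close_to_1 = fm[["tli"]]   >= 0.95,
  rmsea_below_05 = fm[["rmsea"]] <  0.05
)
checks

# Standardized loadings only (rows where the operator is "=~")
std <- standardizedSolution(fit)
std[std$op == "=~", c("lhs", "rhs", "est.std", "pvalue")]
```

Collecting the indices in a vector like this makes it straightforward to tabulate several competing models side by side.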

The output shows that a confirmatory factor analysis (CFA) model with three latent variables (visual, textual, speed) was estimated by maximum likelihood (ML) and converged after 35 iterations. The fit is mixed rather than good: CFI = 0.931 is in an acceptable range, but TLI = 0.896 falls just below 0.90 and RMSEA = 0.092 is above the 0.08 level often taken as the upper bound for acceptable fit. All factor loadings are statistically significant, as are the covariances among the three latent variables, which indicates that the factors are positively related (correlations between 0.283 and 0.471 in the Std.all column). The intercepts are the estimated means of the observed variables, and the variances marked with a leading dot are residual variances, the part of each indicator not explained by its factor. Overall, the model captures the broad structure of the data, but the fit indices leave clear room for improvement.
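
One way to see what FIML changes is to fit the same model twice on data that really contain missing values: once dropping incomplete rows (missing = "listwise") and once with missing = "fiml". The sketch below assumes NAs in the first ten rows of x1 and the next ten rows of x2; the object names dat, fit_listwise, and fit_fiml are illustrative.

```r
library(lavaan)

data("HolzingerSwineford1939", package = "lavaan")
dat <- HolzingerSwineford1939
dat$x1[1:10]  <- NA
dat$x2[11:20] <- NA

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Listwise deletion: any row with an NA in x1..x9 is dropped
fit_listwise <- cfa(model, data = dat, missing = "listwise")

# FIML: every row contributes through its observed variables
fit_fiml <- cfa(model, data = dat, missing = "fiml")

lavInspect(fit_listwise, "nobs")   # 281 rows used
lavInspect(fit_fiml, "nobs")       # 301 rows used

# Missing-data patterns seen by the FIML fit and pairwise coverage
# (proportion of cases with both variables observed)
lavInspect(fit_fiml, "patterns")
lavInspect(fit_fiml, "coverage")
```

The listwise fit discards 20 of the 301 cases, while the FIML fit keeps all of them, which is the practical payoff of the method when incomplete rows still carry information on the other indicators.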

Conclusion

Full Information Maximum Likelihood (FIML) is a powerful method for handling missing data in statistical models. The lavaan package in R provides a convenient way to implement FIML in structural equation modeling through the missing = "fiml" argument. By following the steps outlined in this guide, you can use FIML to handle missing data and, provided the missing-at-random assumption is plausible, obtain reliable parameter estimates in your analysis.

Referred: https://www.geeksforgeeks.org
