How to Include an Interaction Term in GAM in R?

Generalized Additive Models (GAMs) are an extension of Generalized Linear Models (GLMs) that allow for flexibility in modeling nonlinear relationships between predictors and the outcome variable. GAMs are particularly useful when the relationship between the predictor variables and the response variable is not well represented by a straight line.

What is the Interaction Term?

Interaction terms in GAMs allow us to explore how the effect of one predictor variable on the response variable changes at different levels of another predictor variable. In this article, we’ll demonstrate how to include an interaction term in a GAM using the Boston housing dataset. We will fit the model with the mgcv package in R.
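In symbols, a GAM with two predictors and an interaction can be sketched as follows. The notation is introduced here for illustration only: g is the link function, y the response, f_1 and f_2 the main-effect smooths of predictors x_1 and x_2, and f_12 the interaction smooth.

```latex
g\bigl(\mathbb{E}[y]\bigr) = \beta_0 + f_1(x_1) + f_2(x_2) + f_{12}(x_1, x_2)
```

If f_12 is zero everywhere, the effect of x_1 on the response is the same at every value of x_2; a non-zero f_12 is what lets that effect vary.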

Basic Components of a GAM

A GAM has four basic components:

  • Linear Predictors: GAMs can include ordinary parametric (linear) terms, as in traditional linear regression, alongside smooth terms.
  • Smooth Functions: GAMs employ smooth functions to capture non-linear relationships. These functions are typically spline functions or other types of smooth curves.
  • Link Function: Like generalized linear models (GLMs), GAMs use a link function to relate the expected value of the response variable to the linear predictor.
  • Additive Structure: GAMs are additive models, meaning that the contribution of each smooth function is additive, allowing for the modelling of complex relationships as a sum of simpler components.
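The components above can be seen in a small fitted model. The following is a minimal sketch on simulated data (the variables x1, x2 and y and the Poisson family are assumptions chosen for illustration, not part of the Boston example). It checks that the linear predictor is the intercept plus the sum of the smooth terms, and that the link function connects it to the fitted mean.

```r
library(mgcv)

# Simulated data (an assumption for illustration; not the Boston data)
set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
y  <- rpois(n, lambda = exp(0.5 + sin(2 * pi * x1) + x2))

# Two smooth functions, combined additively, with a log link
fit <- gam(y ~ s(x1) + s(x2), family = poisson(link = "log"), method = "REML")

# Per-term contributions on the link scale; the intercept is stored
# in the "constant" attribute of the returned matrix
term_effects <- predict(fit, type = "terms")
intercept    <- attr(term_effects, "constant")

# Additive structure: linear predictor = intercept + sum of smooth terms
eta_by_hand <- as.numeric(intercept + rowSums(term_effects))
eta_model   <- as.numeric(predict(fit, type = "link"))
all.equal(eta_by_hand, eta_model)

# Link function: the inverse link maps the linear predictor to the mean
all.equal(exp(eta_model), as.numeric(fitted(fit)))
```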

Generalized Additive Model on Boston Housing Dataset

The Boston housing dataset consists of various predictors such as crime rate (CRIM), average number of rooms per dwelling (RM), and proportion of owner-occupied units built before 1940 (AGE), among others, to predict the median value of owner-occupied homes (MEDV). We’ll be using a subset of these predictors to demonstrate the use of interaction terms in a GAM.

Step 1: Install and load required libraries

Install the packages listed below if they are not already installed, then load them in RStudio.

R
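# Run once if the packages are not installed yet
# install.packages(c("mgcv", "dplyr"))
library(mgcv)   # gam(), s(), ti()
library(dplyr)  # general data manipulation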

Step 2: Load and Prepare the Dataset

For this example, we are using the Boston housing dataset mentioned earlier in this article. Use the code below to load the dataset in your application.

Dataset Link: Boston housing dataset

R
# Read the CSV file (adjust the path to where the file is stored)
boston_data <- read.csv("/mnt/data/HousingData.csv")
# Preview the data
head(boston_data)

Output:

     CRIM ZN INDUS CHAS   NOX    RM  AGE    DIS RAD TAX PTRATIO      B LSTAT MEDV
1 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296    15.3 396.90  4.98 24.0
2 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242    17.8 396.90  9.14 21.6
3 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242    17.8 392.83  4.03 34.7
4 0.03237  0  2.18    0 0.458 6.998 45.8 6.0622   3 222    18.7 394.63  2.94 33.4
5 0.06905  0  2.18    0 0.458 7.147 54.2 6.0622   3 222    18.7 396.90    NA 36.2
6 0.02985  0  2.18    0 0.458 6.430 58.7 6.0622   3 222    18.7 394.12  5.21 28.7
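The preview shows a missing LSTAT value in row 5, so it may be worth counting missing values in the columns used for modeling before fitting. The sketch below uses the Boston data shipped with the MASS package as a stand-in. This is an assumption made so that the snippet runs without the CSV file; its column names are lower case and are upper-cased here, and its contents may differ from HousingData.csv.

```r
# Stand-in for HousingData.csv (assumption): the Boston data from MASS
boston_data <- MASS::Boston
names(boston_data) <- toupper(names(boston_data))

# Columns used in the model later in the article
model_vars <- c("MEDV", "RM", "AGE")

# Number of missing values per modeling column
colSums(is.na(boston_data[, model_vars]))

# Keep only complete rows for the modeling columns
model_data <- na.omit(boston_data[, model_vars])
nrow(model_data)
```

Under R's default settings, gam() drops rows with missing values in the variables that appear in the formula, so this step mainly makes the number of rows used explicit.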

Step 3: Fit a GAM with an Interaction Term

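We’ll fit a GAM to predict MEDV using RM (average number of rooms per dwelling) and AGE (proportion of owner-occupied units built before 1940) with an interaction term. In mgcv, ti() builds a tensor product interaction smooth that excludes the main effects, which is why s(RM) and s(AGE) appear as separate terms. The interaction term will allow us to see how the relationship between RM and MEDV changes with different levels of AGE.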

R
# Fit a GAM with an interaction term between RM and AGE
gam_model <- gam(MEDV ~ s(RM) + s(AGE) + ti(RM, AGE), data = boston_data)
# Summary of the model
summary(gam_model)

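Output (the formula and n = 32 in this printout correspond to a model of mpg on hp and wt rather than to the MEDV model fitted above; the layout is the same, so read s(hp), s(wt) and ti(hp,wt) as the counterparts of s(RM), s(AGE) and ti(RM,AGE)):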

Family: gaussian 
Link function: identity 

Formula:
mpg ~ s(hp) + s(wt) + ti(hp, wt)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  18.8436     0.4925   38.26   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
            edf Ref.df     F p-value   
s(hp)     1.000   1.00 0.546 0.46840   
s(wt)     1.000   1.00 6.853 0.01648 * 
ti(hp,wt) 8.542  10.24 3.793 0.00706 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.917   Deviance explained = 94.5%
GCV = 4.6958  Scale est. = 3.0021    n = 32
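One way to judge whether the interaction term is worth keeping is to read its row in the smooth-term table and to compare the model against one without the interaction. The sketch below does both. It assumes the Boston data from the MASS package as a stand-in for HousingData.csv (column names upper-cased to match), and it fits both models with method = "ML", a choice made here so that the two fits are compared on the same criterion.

```r
library(mgcv)

# Stand-in for HousingData.csv (assumption): the Boston data from MASS
boston_data <- MASS::Boston
names(boston_data) <- toupper(names(boston_data))

# Main effects only
m_main <- gam(MEDV ~ s(RM) + s(AGE), data = boston_data, method = "ML")

# Main effects plus the ti() interaction
m_int <- gam(MEDV ~ s(RM) + s(AGE) + ti(RM, AGE),
             data = boston_data, method = "ML")

# Approximate significance of each smooth, including ti(RM,AGE)
summary(m_int)$s.table

# Lower AIC indicates the better trade-off between fit and complexity
AIC(m_main, m_int)
```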

Step 4: Visualize the Interaction Term

Visualizing the interaction term can help us understand how RM and AGE interact to influence MEDV. We use the plot function in R for this purpose.

R
# Plot the interaction term
plot(gam_model, pages = 1)

Output:

Figure: Include an Interaction Term in GAM in R

The figure shows the output plots from the Generalized Additive Model (GAM) fitted with an interaction term between RM (average number of rooms per dwelling) and AGE (proportion of owner-occupied units built before 1940) from the Boston housing dataset. Here is what each plot shows:

Top Left Plot (s(RM, 8.11))

  • This plot shows the smooth function for RM (average number of rooms).
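  • The y-axis represents the contribution (partial effect) of RM to the response variable MEDV (median value of owner-occupied homes in $1000s). The number in the label s(RM, 8.11) is the estimated effective degrees of freedom of the smooth.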
  • The solid line in the plot represents the estimated smooth function, and the dashed lines indicate the confidence intervals.
  • The non-linear shape suggests that RM has a non-linear effect on MEDV.

Top Right Plot (s(AGE, 1.78))

  • This plot shows the smooth function for AGE (proportion of owner-occupied units built before 1940).
  • The y-axis represents the contribution of AGE to MEDV.
  • The almost flat line indicates that AGE has a relatively minor effect on MEDV.

Bottom Plot (ti(RM, AGE))

  • This plot shows the interaction effect between RM and AGE.
  • The contours in this plot represent different levels of the interaction effect, with the x-axis being RM and the y-axis being AGE.
  • The black contour lines show the estimated interaction surface. In mgcv's default contour plot, the red and green lines show the same surface shifted down and up by one standard error, which gives a sense of the uncertainty.
  • Contours that bend or change value across the plot indicate that the effect of RM on MEDV depends on the value of AGE and vice versa. Whether the interaction is statistically significant should be judged from the ti(RM,AGE) row of the model summary, not from the plot alone.
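The interaction panel can also be drawn on its own, and the combined fitted surface can be viewed with vis.gam(). The sketch below assumes the Boston data from the MASS package as a stand-in for HousingData.csv (column names upper-cased to match), and it assumes ti(RM, AGE) is the third smooth in the formula, as it is in the model above.

```r
library(mgcv)

# Stand-in for HousingData.csv (assumption): the Boston data from MASS
boston_data <- MASS::Boston
names(boston_data) <- toupper(names(boston_data))

gam_model <- gam(MEDV ~ s(RM) + s(AGE) + ti(RM, AGE), data = boston_data)

# Draw only the third smooth, ti(RM, AGE), as a heat map with contours
plot(gam_model, select = 3, scheme = 2)

# Predicted MEDV over RM and AGE: intercept plus all three smooths
vis.gam(gam_model, view = c("RM", "AGE"),
        plot.type = "contour", type = "response")
```

The first plot isolates the interaction, while the second shows predictions from the whole model, so the two surfaces are not expected to look alike.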

Overall, the plots suggest that RM has a notable non-linear effect on MEDV, that the effect of AGE is relatively small, and that the effect of RM varies with AGE. The statistical significance of the interaction should be confirmed from the summary of the model.
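To see the interaction in terms of predicted values, the model can be evaluated on a small grid of RM values at a few AGE levels. This is a minimal sketch: the grid values are arbitrary choices for illustration, and the Boston data from the MASS package is assumed as a stand-in for HousingData.csv (column names upper-cased to match).

```r
library(mgcv)

# Stand-in for HousingData.csv (assumption): the Boston data from MASS
boston_data <- MASS::Boston
names(boston_data) <- toupper(names(boston_data))

gam_model <- gam(MEDV ~ s(RM) + s(AGE) + ti(RM, AGE), data = boston_data)

# RM from 5 to 8 rooms at three illustrative AGE levels
grid <- expand.grid(RM = seq(5, 8, by = 0.5), AGE = c(20, 60, 95))
grid$MEDV_hat <- as.numeric(predict(gam_model, newdata = grid))

# One row per RM value, one column per AGE level
with(grid, tapply(MEDV_hat, list(RM = RM, AGE = AGE), mean))
```

If the columns of the resulting table rise at different rates as RM increases, the effect of RM on predicted MEDV differs across AGE levels.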

Conclusion

Including interaction terms in a GAM gives a deeper understanding of the relationships between predictors and the response variable. Using the Boston housing dataset, this article walked through the steps of fitting a GAM with an interaction term. This allows us to investigate how the relationship between the number of rooms per dwelling (RM) and the median value of dwellings (MEDV) varies with the proportion of owner-occupied units built before 1940 (AGE). You can apply the same method to other datasets and predictors to find intricate relationships in your data.




Referred: https://www.geeksforgeeks.org

