Regression analysis is a statistical tool for understanding the relationship between a dependent variable and one or more independent variables. A crucial part of regression analysis is evaluating the accuracy of the model by examining its residuals. Residuals are the differences between observed and predicted values, and they provide insight into the model's performance. In this guide, we explore how to calculate residuals in regression analysis using the R programming language.

What are Residuals?

Residuals are the differences between the observed values of a variable and the values predicted by a model. In statistical modeling, they measure the discrepancy between each actual data point and the value the regression model predicts for it.

$$e_i = y_i - \hat{y}_i$$

Where $e_i$ is the residual for the $i$-th observation, $y_i$ is the observed value, and $\hat{y}_i$ is the value predicted by the model.
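The residual formula $e_i = y_i - \hat{y}_i$ can be checked by hand on a tiny dataset. The sketch below uses made-up `x` and `y` values (purely illustrative, not from this guide's housing example) and compares the manual subtraction with R's built-in `residuals()`.

```r
# Toy data: x and y are made-up values for illustration only.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Fit a simple linear regression of y on x.
fit <- lm(y ~ x)

# Residual = observed value minus predicted (fitted) value.
manual_residuals <- y - fitted(fit)

# The manual calculation agrees with R's residuals() extractor.
all.equal(unname(manual_residuals), unname(residuals(fit)))
```

Because the model includes an intercept, the residuals of a least-squares fit sum to zero up to floating-point error, which is a quick sanity check on any manual calculation.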
In multiple linear regression, where there are several independent variables, the formula is the same, but the predicted value comes from the regression equation involving all the independent variables: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \dots + \hat{\beta}_p x_{ip}$.

Why are residuals important in regression analysis?

Why is calculating residuals useful?
We generate sample housing data, fit a multiple linear regression model to predict housing prices, calculate the residuals, and visualize them in a residual plot.

Step 1: Load necessary libraries

We load the ggplot2 library, a popular R package for data visualization, which is used later for plotting.
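A minimal sketch of this step, assuming ggplot2 is already installed on the system:

```r
# Load ggplot2 for the residual plot in Step 5.
# Assumes the package is installed; if not, run install.packages("ggplot2") first.
library(ggplot2)
```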
Step 2: Generate sample data
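A minimal sketch of how such data could be simulated. The seed, sample size, distributions, and price coefficients are assumptions chosen to resemble the output shown in this guide, so the exact numbers it produces may differ from the printed ones.

```r
# Fix the random seed so the simulated data are reproducible (seed is an assumption).
set.seed(123)

n <- 100  # number of houses, matching the 100 residuals reported later

# Simulated predictors (distribution parameters are illustrative assumptions).
square_footage <- rnorm(n, mean = 2000, sd = 500)
num_bedrooms   <- sample(1:5, n, replace = TRUE)
location       <- sample(c("Urban", "Suburban", "Rural"), n, replace = TRUE)

# Price = linear signal + location premium + Gaussian noise (all values assumed).
location_premium <- c(Rural = 0, Suburban = 60000, Urban = 100000)
price <- 100 * square_footage + 20000 * num_bedrooms +
  unname(location_premium[location]) + rnorm(n, mean = 0, sd = 8000)

# Assemble the data frame with the column names used in the model formula.
housing_data <- data.frame(
  Square_Footage = square_footage,
  Num_Bedrooms   = num_bedrooms,
  Location       = location,
  Price          = price
)

# Inspect the first six rows.
head(housing_data)
```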
Output:

      Square_Footage Num_Bedrooms Location    Price
    1       1719.762            4    Urban 362371.5
    2       1884.911            1    Urban 310189.4
    3       2779.354            3    Urban 465522.5
    4       2035.254            4    Urban 400230.5
    5       2064.644            3    Rural 284817.9
    6       2857.532            5    Urban 522903.9

Step 3: Fit a Multiple Regression Model

We use the lm() function to fit a multiple linear regression model. The formula Price ~ Square_Footage + Num_Bedrooms + Location indicates that we are predicting the price based on square footage, number of bedrooms, and location.
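A sketch of the model fit. So that the block runs on its own, it first recreates illustrative housing data; the seed, distributions, and coefficients used for that are assumptions, so the summary will not match the printed output digit for digit.

```r
# Recreate illustrative housing data so this block is self-contained
# (seed, distributions, and coefficients are assumptions).
set.seed(123)
n <- 100
housing_data <- data.frame(
  Square_Footage = rnorm(n, mean = 2000, sd = 500),
  Num_Bedrooms   = sample(1:5, n, replace = TRUE),
  Location       = sample(c("Urban", "Suburban", "Rural"), n, replace = TRUE)
)
premium <- c(Rural = 0, Suburban = 60000, Urban = 100000)
housing_data$Price <- 100 * housing_data$Square_Footage +
  20000 * housing_data$Num_Bedrooms +
  unname(premium[as.character(housing_data$Location)]) +
  rnorm(n, mean = 0, sd = 8000)

# Fit the multiple linear regression. lm() treats Location as a factor and
# uses the alphabetically first level (Rural) as the baseline category.
model <- lm(Price ~ Square_Footage + Num_Bedrooms + Location, data = housing_data)

# Coefficient estimates, standard errors, and overall fit statistics.
summary(model)
```

With Rural as the baseline, the LocationSuburban and LocationUrban coefficients are read as price differences relative to a rural house with the same square footage and bedroom count.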
Output:

    Call:
    lm(formula = Price ~ Square_Footage + Num_Bedrooms + Location,
        data = housing_data)

    Residuals:
         Min       1Q   Median       3Q      Max
    -21149.4  -3326.2    125.2   5022.3  18806.8

    Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
    (Intercept)        932.438   4063.619   0.229    0.819
    Square_Footage     106.321      1.716  61.973   <2e-16 ***
    Num_Bedrooms     20945.034    497.287  42.119   <2e-16 ***
    LocationSuburban 64337.365   1934.405  33.260   <2e-16 ***
    LocationUrban    95754.979   1881.764  50.886   <2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 7660 on 95 degrees of freedom
    Multiple R-squared:  0.9884,  Adjusted R-squared:  0.9879
    F-statistic:  2016 on 4 and 95 DF,  p-value: < 2.2e-16

Step 4: Calculate Residuals

We calculate the residuals of the regression model using the residuals() function and store them in a variable called residuals.
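A sketch of the residual calculation. The data and model are rebuilt here under the same assumed seed, distributions, and coefficients so the block runs on its own; the values it prints will therefore differ from the printed output.

```r
# Recreate illustrative data and the fitted model (all simulation settings assumed).
set.seed(123)
n <- 100
housing_data <- data.frame(
  Square_Footage = rnorm(n, mean = 2000, sd = 500),
  Num_Bedrooms   = sample(1:5, n, replace = TRUE),
  Location       = sample(c("Urban", "Suburban", "Rural"), n, replace = TRUE)
)
premium <- c(Rural = 0, Suburban = 60000, Urban = 100000)
housing_data$Price <- 100 * housing_data$Square_Footage +
  20000 * housing_data$Num_Bedrooms +
  unname(premium[as.character(housing_data$Location)]) +
  rnorm(n, mean = 0, sd = 8000)
model <- lm(Price ~ Square_Footage + Num_Bedrooms + Location, data = housing_data)

# Extract one residual per row: observed Price minus fitted Price.
residuals <- residuals(model)

# Print all residuals, labelled by row number.
print(residuals)
```

Naming the variable `residuals` matches the text, and R still finds the `residuals()` function when it is called afterwards; a distinct name such as `model_residuals` (hypothetical) avoids any confusion between the two.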
Output:

                 1             2             3             4             5
       -942.943094  -7848.767618  10496.239789   3372.669231   1535.387347
                 6             7             8             9            10
      17675.571973   6042.861621   9968.944483   5250.472224  -2676.649331
                11            12            13            14            15
       5152.597587   2296.406649   5631.501853    456.675537 -10076.142571
                16            17            18            19            20
       5650.024885   9350.439133 -16670.172726  -3132.977647    558.588523
                21            22            23            24            25
      -3821.112884    907.077122 -10236.856584   5593.526603   1670.440058
                26            27            28            29            30
      -5590.737716   3904.111955   6994.387866  -3950.453738   1504.496056
                31            32            33            34            35
      -1072.377352  -1344.673114 -10604.305226 -10465.640513   1775.992734
                36            37            38            39            40
        -84.177278   9733.910312    150.637067   8032.284566   8640.310207
                41            42            43            44            45
      -1024.834974  10177.218996  18806.782008 -20999.403485  14207.243496
                46            47            48            49            50
      -7846.034120   8823.218408   1504.596371  -9664.537069     99.718207
                51            52            53            54            55
       1568.032462  -1909.367878   1229.180938   1715.646864  -7818.629356
                56            57            58            59            60
       7152.389441  -7702.260011  -8070.371295   7469.795913  -4797.538442
                61            62            63            64            65
      -3677.927338    799.584920  -2443.508624  16792.598320  -2773.445386
                66            67            68            69            70
      -1143.521343  -3208.941126   5102.933548   -160.349639 -20029.353841
                71            72            73            74            75
      -1705.140019 -15958.087846   4213.057903   5432.842456   1097.418097
                76            77            78            79            80
      -2833.705250  10804.400539  -5458.665560   -468.383637   4995.446344
                81            82            83            84            85
      -1846.253065   3071.728138  -2538.683623   2493.368535  -1672.555099
                86            87            88            89            90
      -1246.544914   1227.355437     -1.376243    661.219895  -3839.375244
                91            92            93            94            95
        -29.310642   9696.151552  -5247.799470  -3011.886497  -2623.101721
                96            97            98            99           100
       1258.832019 -21149.369307  -6971.401515  -4831.575526    474.910311

Step 5: Visualize Residuals
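A sketch of a residuals-versus-fitted plot with ggplot2. The layout (points plus a dashed zero line), the axis labels, and the names `plot_data` and `residual_plot` are assumptions; the data and model are rebuilt under assumed simulation settings so the block runs on its own.

```r
library(ggplot2)

# Recreate illustrative data and the fitted model (all simulation settings assumed).
set.seed(123)
n <- 100
housing_data <- data.frame(
  Square_Footage = rnorm(n, mean = 2000, sd = 500),
  Num_Bedrooms   = sample(1:5, n, replace = TRUE),
  Location       = sample(c("Urban", "Suburban", "Rural"), n, replace = TRUE)
)
premium <- c(Rural = 0, Suburban = 60000, Urban = 100000)
housing_data$Price <- 100 * housing_data$Square_Footage +
  20000 * housing_data$Num_Bedrooms +
  unname(premium[as.character(housing_data$Location)]) +
  rnorm(n, mean = 0, sd = 8000)
model <- lm(Price ~ Square_Footage + Num_Bedrooms + Location, data = housing_data)

# Pair each fitted value with its residual for plotting.
plot_data <- data.frame(Fitted = fitted(model), Residuals = residuals(model))

# Scatter plot of residuals against fitted values with a reference line at zero.
residual_plot <- ggplot(plot_data, aes(x = Fitted, y = Residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs Fitted", x = "Fitted values", y = "Residuals")

print(residual_plot)
```

Points scattered evenly around the zero line with no visible curve or funnel shape are consistent with the linearity and constant-variance assumptions of the model.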
Output:

Figure: Residuals vs Fitted

Conclusion

Regression analysis is a valuable tool for understanding the relationship between variables, and residual analysis is central to evaluating a fitted model. Residuals, the differences between observed and predicted values, show how well the model performs. In this guide we generated example data, fit a regression model, calculated its residuals, and visualized them. This process provides insight into model validity and guides further analysis and model refinement.
Referred: https://www.geeksforgeeks.org