Analyzing Online Course Engagement in R

Analyzing online course engagement involves exploring, cleaning, transforming, and visualizing data. These steps show how learners behave online, reveal trends, and point to ways of improving online learning. This article explains why teachers and online platforms need to know how engaged students are, and then walks through the analysis of an online course dataset in R, step by step.

Introduction to Online Course Engagement

Online course engagement refers to the level of interaction and participation of learners in an online course. High engagement typically correlates with better learning outcomes and satisfaction. Engagement can be measured through various indicators, including login frequency, time spent on course materials, participation in discussions, assignment submissions, and quiz performance.

Dataset Overview

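The dataset comes from an online learning platform; the course_id values in the sample output below are HarvardX and MITx courses. Each row appears to describe one user's registration in one course, together with engagement metrics for that user. The main columns are: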

course_id: Identifier for the course
userid_DI: Unique identifier for the user
final_cc_cname_DI: Country name of the user
LoE_DI: Level of education of the user
gender: Gender of the user
start_time_DI: Date on which the user started (registered for) the course
last_event_DI: Last event date for the user
grade: Grade achieved by the user
YoB: Year of birth of the user

Other columns include various engagement metrics like registered, viewed, explored, certified, as well as user roles, activity days, video play counts, etc.
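
The flag columns lend themselves to a simple engagement funnel. The sketch below is illustrative only: it uses a small invented data frame (toy) with the same flag names and base R rather than the real file, and it assumes each flag is coded 0 or 1, as the summary output later in this article suggests.

```r
# Toy stand-in for the real data: one row per user-course registration.
# The flag columns mirror registered/viewed/explored/certified (0 or 1).
toy <- data.frame(
  course_id  = c("A", "A", "A", "B", "B"),
  registered = c(1, 1, 1, 1, 1),
  viewed     = c(1, 1, 0, 1, 0),
  explored   = c(1, 0, 0, 0, 0),
  certified  = c(1, 0, 0, 0, 0)
)

# The mean of a 0/1 flag is the share of registrations that reached that stage.
funnel <- aggregate(cbind(viewed, explored, certified) ~ course_id,
                    data = toy, FUN = mean)
print(funnel)
```

Reading the result row by row shows, per course, what fraction of registered users viewed, explored, and were certified.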

Dataset Link: Online Course Engagement

Insights from this dataset can help in optimizing course design, improving user engagement, and enhancing overall learning outcomes in online education platforms.

Step-by-Step Implementation in R

This section walks through the analysis of online course engagement in R, step by step.

Step 1: Load the necessary libraries and Dataset

First we will load the required libraries and the dataset.

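# Install any required packages that are not already installed
required_packages <- c("dplyr", "ggplot2", "readr")
missing_packages <- setdiff(required_packages, rownames(installed.packages()))
if (length(missing_packages) > 0) install.packages(missing_packages)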

# Load libraries
library(dplyr)
library(ggplot2)
library(readr)

# Load the dataset (replace the path with the location of Courses.csv on your machine)
file_path <- "C:/Users/Tonmoy/Downloads/Dataset/Courses.csv"
courses_data <- read_csv(file_path)

head(courses_data)

Output:

  index Random                  course_id      userid_DI registered viewed explored
1     0     86 HarvardX/CB22x/2013_Spring MHxPC130442623          1      0        0
2     1      7        HarvardX/CS50x/2012 MHxPC130442623          1      1        0
3     2     70 HarvardX/CB22x/2013_Spring MHxPC130275857          1      0        0
4     3     60        HarvardX/CS50x/2012 MHxPC130275857          1      0        0
5     4      3 HarvardX/ER22x/2013_Spring MHxPC130275857          1      0        0
6     5     69  HarvardX/PH207x/2012_Fall MHxPC130275857          1      1        1
  certified final_cc_cname_DI LoE_DI YoB gender grade start_time_DI last_event_DI nevents
1         0     United States         NA            0    12/19/2012    11/17/2013      NA
2         0     United States         NA            0    10/15/2012                    NA
3         0     United States         NA            0      2/8/2013    11/17/2013      NA
4         0     United States         NA            0     9/17/2012                    NA
5         0     United States         NA            0    12/19/2012                    NA
6         0     United States         NA            0     9/17/2012     5/23/2013     502
  ndays_act nplay_video nchapters nforum_posts roles incomplete_flag
1         9          NA        NA            0    NA               1
2         9          NA         1            0    NA               1
3        16          NA        NA            0    NA               1
4        16          NA        NA            0    NA               1
5        16          NA        NA            0    NA               1
6        16          50        12            0    NA              NA
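
In the output above, rows 2, 4, and 5 have an empty last_event_DI. How such blanks are read depends on the reader: readr's read_csv() treats empty fields as NA by default, whereas base read.csv() keeps empty text fields as empty strings unless told otherwise. The following is a minimal sketch of the base R variant, using a hypothetical three-row extract (csv_text) typed inline instead of the real file.

```r
# Hypothetical three-row extract in the same shape as the head() output above.
csv_text <- "userid_DI,start_time_DI,last_event_DI,nevents
u1,12/19/2012,11/17/2013,
u2,10/15/2012,,
u3,9/17/2012,5/23/2013,502"

# na.strings makes both empty fields and the literal text "NA" come back as NA.
sample_data <- read.csv(text = csv_text,
                        na.strings = c("", "NA"),
                        stringsAsFactors = FALSE)

str(sample_data)

# Missing values per column, now including the blank dates.
colSums(is.na(sample_data))
```

Marking blanks as NA at read time means that later missing-value counts include them instead of reporting zero for the date columns.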

Step 2: Perform Exploratory Data Analysis

Next we perform exploratory data analysis (EDA). summary(courses_data) reports descriptive measures such as the minimum, maximum, mean, median, and quartiles for numerical variables, and frequency counts for categorical variables. colSums(is.na(courses_data)) then shows how many values are missing in each column.

Data cleaning matters because missing values and anomalies distort later results. Here the numeric engagement counts are filled with the column median, missing forum posts with 0, and the remaining columns with the label "Unknown".

# View summary statistics of the dataset
summary(courses_data)

# View columns with missing values
colSums(is.na(courses_data))

# Fill missing values (dplyr is already loaded in Step 1)
courses_data <- courses_data %>%
  mutate(
    # Numeric engagement counts: replace NA with the column median
    nevents = ifelse(is.na(nevents), median(nevents, na.rm = TRUE), nevents),
    ndays_act = ifelse(is.na(ndays_act), median(ndays_act, na.rm = TRUE), ndays_act),
    nplay_video = ifelse(is.na(nplay_video), median(nplay_video, na.rm = TRUE),
                         nplay_video),
    nchapters = ifelse(is.na(nchapters), median(nchapters, na.rm = TRUE), nchapters),
    # A missing forum post count is treated as zero posts
    nforum_posts = ifelse(is.na(nforum_posts), 0, nforum_posts),
    # Filling with "Unknown" converts these two columns to character
    roles = ifelse(is.na(roles), "Unknown", roles),
    incomplete_flag = ifelse(is.na(incomplete_flag), "Unknown", incomplete_flag)
  )

# Count the remaining missing values to verify the changes (0 means none are left)
sum(is.na(courses_data))

Output:

     index            Random                            course_id     
 Min.   :     0   Min.   :  1.00   HarvardX/CS50x/2012       :169621  
 1st Qu.:160284   1st Qu.: 25.00   MITx/6.00x/2012_Fall      : 66731  
 Median :320569   Median : 50.00   MITx/6.00x/2013_Spring    : 57715  
 Mean   :320569   Mean   : 50.43   HarvardX/ER22x/2013_Spring: 57406  
 3rd Qu.:480853   3rd Qu.: 75.00   HarvardX/PH207x/2012_Fall : 41592  
 Max.   :641137   Max.   :100.00   MITx/6.002x/2012_Fall     : 40811  
                                   (Other)                   :207262  
          userid_DI        registered     viewed          explored        certified      
 MHxPC130027283:    16   Min.   :1    Min.   :0.0000   Min.   :0.0000   Min.   :0.00000  
 MHxPC130103410:    16   1st Qu.:1    1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000  
 MHxPC130121287:    16   Median :1    Median :1.0000   Median :0.0000   Median :0.00000  
 MHxPC130126780:    16   Mean   :1    Mean   :0.6243   Mean   :0.0619   Mean   :0.02759  
 MHxPC130183602:    16   3rd Qu.:1    3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.00000  
 MHxPC130200926:    16   Max.   :1    Max.   :1.0000   Max.   :1.0000   Max.   :1.00000  
 (Other)       :641042                                                                   
      final_cc_cname_DI      LoE_DI           YoB           gender          grade        
 United States :184240   Min.   :1.000   Min.   :1931   Min.   :1.000   Min.   :0.00000  
 India         : 88696   1st Qu.:2.000   1st Qu.:1983   1st Qu.:2.000   1st Qu.:0.00000  
 Unknown/Other : 82029   Median :2.000   Median :1988   Median :3.000   Median :0.00000  
 Other Europe  : 40377   Mean   :3.511   Mean   :1986   Mean   :2.507   Mean   :0.03095  
 Other Africa  : 23897   3rd Qu.:6.000   3rd Qu.:1991   3rd Qu.:3.000   3rd Qu.:0.00000  
 United Kingdom: 22131   Max.   :6.000   Max.   :2013   Max.   :4.000   Max.   :1.01000  
 (Other)       :199768                                                                   
    start_time_DI       last_event_DI       nevents         ndays_act     
 8/17/2012 : 10165             :178954   Min.   :     1   Min.   :  1.00  
 1/23/2013 :  8368   11/17/2013:  7046   1st Qu.:     3   1st Qu.:  1.00  
 10/15/2012:  6766   3/13/2013 :  4747   Median :    24   Median :  2.00  
 8/16/2012 :  6369   3/14/2013 :  3470   Mean   :   431   Mean   :  5.71  
 12/20/2012:  5858   10/16/2012:  3339   3rd Qu.:   158   3rd Qu.:  4.00  
 2/14/2013 :  5810   10/15/2012:  3125   Max.   :197757   Max.   :205.00  
 (Other)   :597802   (Other)   :440457   NA's   :199151   NA's   :162743  
  nplay_video        nchapters       nforum_posts       roles         incomplete_flag 
 Min.   :    1.0   Min.   : 1.00    Min.   : 0.00000   Mode:logical   Min.   :1       
 1st Qu.:    5.0   1st Qu.: 1.00    1st Qu.: 0.00000   NA's:641138    1st Qu.:1       
 Median :   18.0   Median : 2.00    Median : 0.00000                  Median :1       
 Mean   :  114.8   Mean   : 3.63    Mean   : 0.01897                  Mean   :1       
 3rd Qu.:   73.0   3rd Qu.: 4.00    3rd Qu.: 0.00000                  3rd Qu.:1       
 Max.   :98517.0   Max.   :48.00    Max.   :20.00000                  Max.   :1       
 NA's   :457530    NA's   :258753                                     NA's   :540977  

            index            Random         course_id         userid_DI        registered 
                0                 0                 0                 0                 0 
           viewed          explored         certified final_cc_cname_DI            LoE_DI 
                0                 0                 0                 0                 0 
              YoB            gender             grade     start_time_DI     last_event_DI 
                0                 0                 0                 0                 0 
          nevents         ndays_act       nplay_video         nchapters      nforum_posts 
           199151            162743            457530            258753                 0 
            roles   incomplete_flag 
           641138            540977 

[1] 0
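
The counts above show that some columns are mostly missing: nplay_video has 457,530 NA values out of 641,138 rows, or about 71%. When so much of a column is filled in, it can help to keep a record of which values were imputed. The sketch below shows one possible way to do that in base R on an invented column; the indicator name nplay_video_missing is an assumption made for illustration and is not part of the dataset.

```r
# Toy column with the same kind of gaps as nplay_video.
toy <- data.frame(nplay_video = c(5, NA, 18, NA, NA, 73))

# Share of missing values per column (0 to 1).
missing_share <- colMeans(is.na(toy))
print(missing_share)

# Record which rows were missing before overwriting them.
toy$nplay_video_missing <- is.na(toy$nplay_video)

# Median imputation, as in the step above.
med <- median(toy$nplay_video, na.rm = TRUE)
toy$nplay_video[is.na(toy$nplay_video)] <- med

print(toy)
```

With the indicator in place, later plots or summaries can be restricted to observed values when the imputed ones would dominate the picture.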

Step 3: Data Transformation

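Data transformation involves converting data types and creating new variables where needed. In this step, the date columns are converted to the Date type, and a new variable holds the number of days between each user's start date and last recorded event. Rows with a blank last_event_DI become NA after conversion, so the new variable is NA for them as well. Although the column is named course_duration_days, start_time_DI differs between users of the same course in the sample output, so the value is better read as each user's active span than as the length of the course.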

# Convert dates to Date type
courses_data <- courses_data %>%
  mutate(
    start_time_DI = as.Date(start_time_DI, format = "%m/%d/%Y"),
    last_event_DI = as.Date(last_event_DI, format = "%m/%d/%Y")
  )

# Create a new column for course duration in days
courses_data <- courses_data %>%
  mutate(course_duration_days = as.numeric(last_event_DI - start_time_DI))
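
After a transformation like this, it is worth checking how many durations are missing or negative. The following sketch repeats the same conversion in base R on three invented rows (dates taken from the head() output, with one blank last event) and counts both cases.

```r
# Three invented rows; the second has no last event, as in the real data.
toy <- data.frame(
  start_time_DI = c("12/19/2012", "10/15/2012", "9/17/2012"),
  last_event_DI = c("11/17/2013", "", "5/23/2013"),
  stringsAsFactors = FALSE
)

# Same conversion as above; a blank string cannot be parsed and becomes NA.
toy$start_time_DI <- as.Date(toy$start_time_DI, format = "%m/%d/%Y")
toy$last_event_DI <- as.Date(toy$last_event_DI, format = "%m/%d/%Y")
toy$course_duration_days <- as.numeric(toy$last_event_DI - toy$start_time_DI)

# Sanity checks: durations that are missing or negative.
n_missing  <- sum(is.na(toy$course_duration_days))
n_negative <- sum(toy$course_duration_days < 0, na.rm = TRUE)

print(toy)
cat("Missing durations:", n_missing, " Negative durations:", n_negative, "\n")
```

A negative duration would mean a last event recorded before the start date, which usually points to a data problem rather than real behaviour.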

Step 4: Data Visualization

Visualize the distribution of grades achieved by students in the courses. This visualization helps understand the overall performance of students in the courses. It can highlight whether the majority of students are performing well or if there’s a wide variation in grades.

# Histogram of grades
ggplot(courses_data, aes(x = grade)) +
  geom_histogram(binwidth = 0.1, fill = "blue", color = "black") +
  labs(title = "Distribution of Grades", x = "Grade", y = "Frequency")

Output:

Distribution Of Grades
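
The summary in Step 2 reports a median and third quartile of 0 for grade, so at least three quarters of the rows have a grade of zero and the histogram is dominated by its first bar. A rough sketch of how the zero share and the non-zero grades could be looked at separately is shown below; the grades vector is invented for illustration, and the 0.25-wide bins are an arbitrary choice.

```r
# Invented grades with many zeros, loosely imitating the real distribution.
grades <- c(0, 0, 0, 0, 0, 0, 0.12, 0.55, 0.83, 1)

# Share of rows with a grade of exactly zero.
zero_share <- mean(grades == 0)
print(zero_share)

# Bin only the non-zero grades so their shape is not hidden by the zeros.
nonzero <- grades[grades > 0]
bins <- cut(nonzero, breaks = seq(0, 1, by = 0.25), include.lowest = TRUE)
print(table(bins))
```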

Relationship between Interactions and Assessment Score

The scatter plot displays individual data points, where each point represents a student’s assessment score (on the y-axis) against the number of interactions they had (on the x-axis). By observing the scatter plot, you can identify any trends or patterns in the relationship between the number of interactions and assessment scores.

# Scatter plot of interactions (number of events) vs. assessment score (grade)
ggplot(courses_data, aes(x = nevents, y = grade)) +
  geom_point(color = "blue") +
  labs(title = "Relationship between Interactions and Assessment Score",
       x = "Number of Events",
       y = "Assessment Score (Grade)")

Output:

Scatter plot of Number of Events vs. Assessment Score (Grade)
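
The event counts are heavily skewed (the summary shows a median of 24 against a maximum of 197,757), so most points in a raw scatter plot crowd near the left edge. Two common remedies are a log scale for the x-axis and a rank-based correlation. The sketch below illustrates both in base R on invented numbers; it is not a result from the dataset.

```r
# Invented, skewed event counts and grades for illustration only.
nevents <- c(1, 3, 24, 158, 900, 20000)
grade   <- c(0, 0, 0, 0.2, 0.7, 0.9)

# A log10 transform spreads out the small counts.
log_events <- log10(nevents)

# Pearson measures linear association and is sensitive to the extreme value;
# Spearman uses ranks, so it is unchanged by the log transform.
pearson      <- cor(nevents, grade)
spearman     <- cor(nevents, grade, method = "spearman")
spearman_log <- cor(log_events, grade, method = "spearman")

print(c(pearson = pearson, spearman = spearman, spearman_log = spearman_log))
```

In ggplot2, the same idea can be applied to the plot itself by adding scale_x_log10() to the scatter plot.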

Visualize Enrollment by Country

The bar plot shows which countries contribute the most enrollments and how their enrollment counts compare. The tallest bars mark the top enrolling countries.

The distribution of enrollments across countries indicates the geographic reach and popularity of the courses. It helps in understanding global participation trends and in identifying regions for targeted marketing or outreach.

# Bar plot of enrollments by country
top_countries <- courses_data %>%
  group_by(final_cc_cname_DI) %>%
  summarise(enrollments = n()) %>%
  arrange(desc(enrollments)) %>%
  head(10)

ggplot(top_countries, aes(x = reorder(final_cc_cname_DI, -enrollments),
                          y = enrollments)) +
  geom_bar(stat = "identity", fill = "green") +
  labs(title = "Top 10 Countries by Enrollment", x = "Country",
       y = "Number of Enrollments") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Output:

Visualize the country based on enrollment
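
The summary in Step 2 shows that final_cc_cname_DI contains catch-all labels such as "Unknown/Other", "Other Europe", and "Other Africa" alongside individual countries, so a top 10 list will mix the two. The sketch below shows one possible way to set the catch-all labels aside before ranking, using an invented vector and base R; the rule of matching labels that start with "Other" or "Unknown" is an assumption based on the labels visible in that summary.

```r
# Invented country labels, including catch-all buckets seen in the summary.
country <- c("United States", "India", "Unknown/Other", "Other Europe",
             "United States", "India", "United States", "United Kingdom")

# Flag labels that are regional or unknown buckets rather than single countries.
is_bucket <- grepl("^(Other|Unknown)", country)
named <- country[!is_bucket]

# Rank the remaining countries and express counts as percentages.
counts <- sort(table(named), decreasing = TRUE)
shares <- round(100 * prop.table(counts), 1)

print(counts)
print(shares)
```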

Course Enrollment Over Time

Plot the number of course enrollments over time. The code below counts enrollments per start date; the same counts can also be grouped by month or year. This visualization shows trends in course enrollment, indicating periods of high or low enrollment, and can help identify seasonality or other patterns in enrollment behavior.

# Bar plot of enrollments per start date
ggplot(courses_data, aes(x = start_time_DI)) +
  geom_bar(stat = "count", fill = "lightgreen") +
  labs(title = "Course Enrollment Over Time", x = "Date", y = "Number of Enrollments")

Output:

Visualize Course Enrollment Over Time
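
Daily counts can be noisy, so grouping by month often makes the trend easier to read. The following is a minimal base R sketch of that grouping on a handful of invented start dates in the dataset's month/day/year format.

```r
# Invented start dates in the same text format as start_time_DI.
start <- as.Date(c("8/17/2012", "8/16/2012", "10/15/2012", "12/20/2012",
                   "1/23/2013", "2/14/2013", "8/17/2012"),
                 format = "%m/%d/%Y")

# Reduce each date to a year-month label such as "2012-08".
month <- format(start, "%Y-%m")

# Count enrollments per month; the labels sort chronologically as text.
monthly <- table(month)
print(monthly)
```

Because the labels are in year-month order, plotting the table (for example with barplot(monthly)) keeps the months in time order.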

Gender Distribution

Create a pie chart showing the distribution of gender among course participants. The gender distribution describes the demographic makeup of the student population and helps assess whether there are gender disparities in course enrollment.

# Calculate gender distribution
gender_distribution <- courses_data %>%
  group_by(gender) %>%
  summarise(participants = n())

# Create pie chart
ggplot(gender_distribution, aes(x = "", y = participants, fill = factor(gender))) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  labs(title = "Gender Distribution Among Course Participants", fill = "Gender") +
  theme_void() +
  theme(legend.position = "right")

Output:

Visualize Gender distribution
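
A pie chart makes exact shares hard to read, and rows with no recorded gender are easy to overlook. The sketch below shows one way to tabulate counts and percentages, with missing values kept as their own category. The gender codes "m", "f", and "o" and the values themselves are invented for illustration; the actual coding in the file may differ.

```r
# Invented gender codes, with NA standing for "not recorded".
gender <- c("m", "f", "m", NA, "f", "m", NA, "o")

# useNA = "ifany" keeps missing values as a separate category.
counts <- table(gender, useNA = "ifany")
percent <- round(100 * prop.table(counts), 1)

print(counts)
print(percent)
```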

Age Distribution of Students

Visualize the age profile of students with a histogram. The dataset stores year of birth (YoB) rather than age, so the histogram below plots birth years in 5-year bins; later birth years correspond to younger students. The age profile indicates the target audience of the online courses and can help tailor course content and engagement strategies to different age groups.

# Histogram of year of birth (a proxy for age)
ggplot(courses_data, aes(x = YoB)) +
  geom_histogram(binwidth = 5, fill = "lightgray", color = "black") +
  labs(title = "Age Distribution of Students", x = "Year of Birth", y = "Frequency")

Output:

Check the Age Distribution of Students
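
To plot age itself rather than year of birth, an approximate age at the start date can be derived from the two columns. The sketch below does this in base R on invented rows. The column names age_at_start and age_plausible are hypothetical, and the 10 to 90 plausibility range is an arbitrary assumption, included because the summary shows birth years from 1931 up to 2013.

```r
# Invented rows; start_time_DI is assumed to be a Date already (see Step 3).
toy <- data.frame(
  YoB = c(1988, 1991, 1983, 2013, 1931),
  start_time_DI = as.Date(c("2012-09-17", "2013-02-08", "2012-12-19",
                            "2012-10-15", "2013-01-23"))
)

# Approximate age: start year minus birth year (ignores month and day).
toy$age_at_start <- as.integer(format(toy$start_time_DI, "%Y")) - toy$YoB

# Flag ages outside an assumed plausible range for later inspection.
toy$age_plausible <- toy$age_at_start >= 10 & toy$age_at_start <= 90

print(toy)
```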

Conclusion

Analyzing online course engagement involves exploring, cleaning, transforming, and visualizing data. These steps show how learners behave online, reveal trends, and point to ways of improving the learning experience, which is why educators and platforms need to know how engaged their students are. In this article, each step in R, from the dataset overview to the visualizations, added information about engagement. Histograms, scatter plots, bar charts, and pie charts summarized grades, activity, enrollment patterns, and demographics. Such insights help educators design better courses and help platforms improve learning outcomes.

Referred: https://www.geeksforgeeks.org
