Analyzing Online Course Engagement in R

Analyzing online course engagement involves exploring, cleaning, transforming, and visualizing data. These steps show how learners behave on a platform, reveal trends, and point to ways of improving online learning. This article explains why teachers and online platforms need to know how engaged students are, then walks through clear steps for analyzing online course data in R.

Introduction to Online Course Engagement

Online course engagement refers to the level of interaction and participation of learners in an online course. High engagement typically correlates with better learning outcomes and satisfaction. Engagement can be measured through various indicators, including login frequency, time spent on course materials, participation in discussions, assignment submissions, and quiz performance.

Dataset Overview

The dataset is sourced from an online learning platform. It contains information about online courses and student engagement metrics.

  • course_id: Identifier for the course
  • userid_DI: Unique identifier for the user
  • final_cc_cname_DI: Country name of the user
  • LoE_DI: Level of education of the user
  • gender: Gender of the user
  • start_time_DI: Date the user registered for the course
  • last_event_DI: Date of the user's last recorded interaction with the course
  • grade: Grade achieved by the user
  • YoB: Year of birth of the user

Other columns include various engagement metrics like registered, viewed, explored, certified, as well as user roles, activity days, video play counts, etc.
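
Because registered, viewed, explored, and certified are 0/1 flags, the mean of each flag is the share of enrollments that reached that stage. The following is a minimal sketch of that idea on a small made-up data frame; the `toy` values are invented for illustration and are not taken from the dataset.

```r
# Hypothetical toy data: one row per enrollment, 0/1 flags as in the dataset
toy <- data.frame(
  registered = c(1, 1, 1, 1, 1),
  viewed     = c(1, 1, 1, 0, 0),
  explored   = c(1, 0, 0, 0, 0),
  certified  = c(1, 0, 0, 0, 0)
)

# The mean of a 0/1 flag is the share of rows where the flag is set
stage_rates <- colMeans(toy[, c("registered", "viewed", "explored", "certified")])
print(round(stage_rates, 2))
```

Read from left to right, the rates form a simple engagement funnel from registration to certification.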

Dataset Link: Online Course Engagement

Insights from this dataset can help in optimizing course design, improving user engagement, and enhancing overall learning outcomes in online education platforms.

Step-by-Step Implementation in R

The following steps walk through analyzing online course engagement in the R programming language.

Step 1: Load the necessary libraries and Dataset

First we will load the required libraries and the dataset.

R
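# Install any required packages that are not already installed
required_packages <- c("dplyr", "ggplot2", "readr")
missing_packages <- setdiff(required_packages, rownames(installed.packages()))
if (length(missing_packages) > 0) install.packages(missing_packages)

# Load libraries
library(dplyr)
library(ggplot2)
library(readr)

# Load the dataset (adjust the path to wherever Courses.csv is saved)
file_path <- "C:/Users/Tonmoy/Downloads/Dataset/Courses.csv"
courses_data <- read_csv(file_path)

# Preview the first six rows
head(courses_data)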

Output:

  index Random                  course_id      userid_DI registered viewed explored
1     0     86 HarvardX/CB22x/2013_Spring MHxPC130442623          1      0        0
2     1      7        HarvardX/CS50x/2012 MHxPC130442623          1      1        0
3     2     70 HarvardX/CB22x/2013_Spring MHxPC130275857          1      0        0
4     3     60        HarvardX/CS50x/2012 MHxPC130275857          1      0        0
5     4      3 HarvardX/ER22x/2013_Spring MHxPC130275857          1      0        0
6     5     69  HarvardX/PH207x/2012_Fall MHxPC130275857          1      1        1
  certified final_cc_cname_DI LoE_DI YoB gender grade start_time_DI last_event_DI nevents
1         0     United States         NA            0    12/19/2012    11/17/2013      NA
2         0     United States         NA            0    10/15/2012                    NA
3         0     United States         NA            0      2/8/2013    11/17/2013      NA
4         0     United States         NA            0     9/17/2012                    NA
5         0     United States         NA            0    12/19/2012                    NA
6         0     United States         NA            0     9/17/2012     5/23/2013     502
  ndays_act nplay_video nchapters nforum_posts roles incomplete_flag
1         9          NA        NA            0    NA               1
2         9          NA         1            0    NA               1
3        16          NA        NA            0    NA               1
4        16          NA        NA            0    NA               1
5        16          NA        NA            0    NA               1
6        16          50        12            0    NA              NA

Step 2: Perform Exploratory Data Analysis

Next, we perform exploratory data analysis (EDA). Calling summary(courses_data) returns key descriptive measures: the mean, median, minimum, maximum, and quartiles for numerical variables, and frequency counts for categorical variables. colSums(is.na(courses_data)) then counts the missing values in each column. Data cleaning is crucial for handling missing values or anomalies, so we fill the gaps with a method suited to each column: the median for the numeric activity counts, 0 for forum posts, and "Unknown" for the roles and incomplete_flag columns.

R
# View summary statistics of the dataset
summary(courses_data)

# Count missing values in each column
colSums(is.na(courses_data))

# Fill missing values: median for activity counts, 0 for forum posts,
# and "Unknown" for roles and incomplete_flag (dplyr is loaded in Step 1)
courses_data <- courses_data %>%
  mutate(
    nevents = ifelse(is.na(nevents), median(nevents, na.rm = TRUE), nevents),
    ndays_act = ifelse(is.na(ndays_act), median(ndays_act, na.rm = TRUE), ndays_act),
    nplay_video = ifelse(is.na(nplay_video), median(nplay_video, na.rm = TRUE),
                         nplay_video),
    nchapters = ifelse(is.na(nchapters), median(nchapters, na.rm = TRUE), nchapters),
    nforum_posts = ifelse(is.na(nforum_posts), 0, nforum_posts),
    roles = ifelse(is.na(roles), "Unknown", roles),
    incomplete_flag = ifelse(is.na(incomplete_flag), "Unknown", incomplete_flag)
  )

# Count the missing values that remain after imputation
sum(is.na(courses_data))

Output:

     index            Random                            course_id
 Min.   :     0   Min.   :  1.00   HarvardX/CS50x/2012       :169621
 1st Qu.:160284   1st Qu.: 25.00   MITx/6.00x/2012_Fall      : 66731
 Median :320569   Median : 50.00   MITx/6.00x/2013_Spring    : 57715
 Mean   :320569   Mean   : 50.43   HarvardX/ER22x/2013_Spring: 57406
 3rd Qu.:480853   3rd Qu.: 75.00   HarvardX/PH207x/2012_Fall : 41592
 Max.   :641137   Max.   :100.00   MITx/6.002x/2012_Fall     : 40811
                                   (Other)                   :207262

       userid_DI         registered      viewed           explored         certified
 MHxPC130027283:    16   Min.   :1   Min.   :0.0000   Min.   :0.0000   Min.   :0.00000
 MHxPC130103410:    16   1st Qu.:1   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000
 MHxPC130121287:    16   Median :1   Median :1.0000   Median :0.0000   Median :0.00000
 MHxPC130126780:    16   Mean   :1   Mean   :0.6243   Mean   :0.0619   Mean   :0.02759
 MHxPC130183602:    16   3rd Qu.:1   3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.00000
 MHxPC130200926:    16   Max.   :1   Max.   :1.0000   Max.   :1.0000   Max.   :1.00000
 (Other)       :641042

   final_cc_cname_DI         LoE_DI           YoB            gender           grade
 United States :184240   Min.   :1.000   Min.   :1931   Min.   :1.000   Min.   :0.00000
 India         : 88696   1st Qu.:2.000   1st Qu.:1983   1st Qu.:2.000   1st Qu.:0.00000
 Unknown/Other : 82029   Median :2.000   Median :1988   Median :3.000   Median :0.00000
 Other Europe  : 40377   Mean   :3.511   Mean   :1986   Mean   :2.507   Mean   :0.03095
 Other Africa  : 23897   3rd Qu.:6.000   3rd Qu.:1991   3rd Qu.:3.000   3rd Qu.:0.00000
 United Kingdom: 22131   Max.   :6.000   Max.   :2013   Max.   :4.000   Max.   :1.01000
 (Other)       :199768

   start_time_DI        last_event_DI         nevents          ndays_act
 8/17/2012 : 10165             :178954   Min.   :     1   Min.   :  1.00
 1/23/2013 :  8368   11/17/2013:  7046   1st Qu.:     3   1st Qu.:  1.00
 10/15/2012:  6766   3/13/2013 :  4747   Median :    24   Median :  2.00
 8/16/2012 :  6369   3/14/2013 :  3470   Mean   :   431   Mean   :  5.71
 12/20/2012:  5858   10/16/2012:  3339   3rd Qu.:   158   3rd Qu.:  4.00
 2/14/2013 :  5810   10/15/2012:  3125   Max.   :197757   Max.   :205.00
 (Other)   :597802   (Other)   :440457   NA's   :199151   NA's   :162743

  nplay_video         nchapters        nforum_posts         roles        incomplete_flag
 Min.   :    1.0   Min.   : 1.00    Min.   : 0.00000   Mode:logical   Min.   :1
 1st Qu.:    5.0   1st Qu.: 1.00    1st Qu.: 0.00000   NA's:641138    1st Qu.:1
 Median :   18.0   Median : 2.00    Median : 0.00000                  Median :1
 Mean   :  114.8   Mean   : 3.63    Mean   : 0.01897                  Mean   :1
 3rd Qu.:   73.0   3rd Qu.: 4.00    3rd Qu.: 0.00000                  3rd Qu.:1
 Max.   :98517.0   Max.   :48.00    Max.   :20.00000                  Max.   :1
 NA's   :457530    NA's   :258753                                     NA's   :540977

            index            Random         course_id         userid_DI        registered
                0                 0                 0                 0                 0
           viewed          explored         certified final_cc_cname_DI            LoE_DI
                0                 0                 0                 0                 0
              YoB            gender             grade     start_time_DI     last_event_DI
                0                 0                 0                 0                 0
          nevents         ndays_act       nplay_video         nchapters      nforum_posts
           199151            162743            457530            258753                 0
            roles   incomplete_flag
           641138            540977

[1] 0
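
Raw missing-value counts are easier to compare as proportions, and median imputation hides which values were originally absent. The sketch below illustrates both points on a made-up data frame; the `toy` values and the `nevents_was_missing` flag column are assumptions added for illustration, not part of the original workflow.

```r
# Hypothetical toy data with missing engagement counts
toy <- data.frame(
  nevents     = c(10, NA, 250, NA),
  nplay_video = c(NA, NA, 40, NA)
)

# Proportion of missing values per column (0 = complete, 1 = all missing)
missing_share <- colMeans(is.na(toy))
print(missing_share)

# Record which rows were missing before imputing, then fill with the median
toy$nevents_was_missing <- is.na(toy$nevents)
toy$nevents[is.na(toy$nevents)] <- median(toy$nevents, na.rm = TRUE)
print(toy)
```

Keeping the indicator column makes it possible to exclude or separately mark imputed rows in later plots, which matters when a column is mostly missing.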

Step 3: Data Transformation

Data transformation involves converting data types and creating new variables where needed. In this step, the two date columns are converted to the Date type, and a new variable, course_duration_days, stores the number of days between start_time_DI and last_event_DI for each enrollment.

R
# Convert dates to Date type
courses_data <- courses_data %>%
  mutate(
    start_time_DI = as.Date(start_time_DI, format = "%m/%d/%Y"),
    last_event_DI = as.Date(last_event_DI, format = "%m/%d/%Y")
  )

# Create a new column for course duration in days
courses_data <- courses_data %>%
  mutate(course_duration_days = as.numeric(last_event_DI - start_time_DI))
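
To see what the conversion produces, here is a small base R sketch on made-up rows that use the same month/day/year text format as the dataset. The `toy` data frame is hypothetical; the missing last event stands in for an enrollment with no recorded activity.

```r
# Hypothetical toy rows in the dataset's m/d/Y text format
toy <- data.frame(
  start_time_DI = c("12/19/2012", "9/17/2012", "10/15/2012"),
  last_event_DI = c("11/17/2013", "5/23/2013", NA)
)

# Parse the text into Date values
toy$start_time_DI <- as.Date(toy$start_time_DI, format = "%m/%d/%Y")
toy$last_event_DI <- as.Date(toy$last_event_DI, format = "%m/%d/%Y")

# Subtracting two Dates gives a difference in days; NA propagates
toy$course_duration_days <- as.numeric(toy$last_event_DI - toy$start_time_DI)
print(toy)

# Sanity check: count rows where the last event precedes the start date
n_negative <- sum(toy$course_duration_days < 0, na.rm = TRUE)
print(n_negative)
```

A nonzero count of negative durations on the real data would suggest a parsing problem or inconsistent records worth inspecting before plotting.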

Step 4: Data Visualization

Visualize the distribution of grades achieved by students in the courses. This visualization helps understand the overall performance of students in the courses. It can highlight whether the majority of students are performing well or if there’s a wide variation in grades.

R
# Histogram of grades
ggplot(courses_data, aes(x = grade)) +
  geom_histogram(binwidth = 0.1, fill = "blue", color = "black") +
  labs(title = "Distribution of Grades", x = "Grade", y = "Frequency")

Output:

Figure: Distribution of Grades
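
A histogram can be complemented by a couple of numbers, for example the share of enrollments with a grade of zero and a summary of the nonzero grades. The sketch below uses an invented `grade` vector purely for illustration.

```r
# Hypothetical grades on the dataset's 0-to-1 scale
grade <- c(0, 0, 0, 0, 0.2, 0.65, 0.9, 1)

# Share of enrollments with a grade of exactly zero
share_zero <- mean(grade == 0)
print(share_zero)

# Distribution of the grades that are above zero
nonzero_grades <- grade[grade > 0]
print(summary(nonzero_grades))
```

On the real data, the same two lines could be run on courses_data$grade to check how much of the histogram's first bar is made up of zero grades.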

Relationship between Interactions and Assessment Score

The scatter plot displays individual data points, where each point represents a student’s assessment score (on the y-axis) against the number of interactions they had (on the x-axis). By observing the scatter plot, you can identify any trends or patterns in the relationship between the number of interactions and assessment scores.

R
# Scatter plot of interactions (number of events) vs. assessment score (grade)
ggplot(courses_data, aes(x = nevents, y = grade)) +
  geom_point(color = "blue") +
  labs(title = "Relationship between Interactions and Assessment Score",
       x = "Number of Events",
       y = "Assessment Score (Grade)")

Output:

Figure: Relationship between Interactions and Assessment Score
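
A trend seen in a scatter plot can be summarized with a correlation coefficient. The sketch below compares Pearson and Spearman correlation on invented values; the `toy` data frame is hypothetical, and Spearman is shown as one reasonable option for heavily skewed counts such as event totals, not as the method used in this article.

```r
# Hypothetical toy data: event counts are strongly right-skewed
toy <- data.frame(
  nevents = c(1, 5, 20, 150, 900, 4000),
  grade   = c(0, 0.05, 0.1, 0.4, 0.7, 0.95)
)

# Pearson measures linear association and is sensitive to extreme values
pearson <- cor(toy$nevents, toy$grade, method = "pearson")

# Spearman works on ranks, so it captures any monotonic relationship
spearman <- cor(toy$nevents, toy$grade, method = "spearman")

print(c(pearson = pearson, spearman = spearman))
```

In this toy example the grade rises with every increase in events, so the rank correlation is perfect while the linear correlation is weaker.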

Visualize Enrollment by Country

The plot allows us to quickly see which countries contribute the most enrollments to the dataset. We can observe the relative enrollment sizes between different countries.

  • The countries with the tallest bars represent the top enrolling countries. These countries have the highest number of enrollments in the dataset.
  • By analyzing the enrollment distribution across countries, we can gain insights into the geographic reach and popularity of the courses. It helps in understanding the global participation trends and potentially identifying regions of interest for targeted marketing or outreach efforts.

R
# Bar plot of enrollments by country
top_countries <- courses_data %>%
  group_by(final_cc_cname_DI) %>%
  summarise(enrollments = n()) %>%
  arrange(desc(enrollments)) %>%
  head(10)

ggplot(top_countries, aes(x = reorder(final_cc_cname_DI, -enrollments),
                          y = enrollments)) +
  geom_bar(stat = "identity", fill = "green") +
  labs(title = "Top 10 Countries by Enrollment", x = "Country",
       y = "Number of Enrollments") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Output:

Figure: Top 10 Countries by Enrollment
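
Relative enrollment sizes are often easier to compare as percentages than as raw counts. The following base R sketch shows one way to compute them; the `countries` vector is made up for illustration.

```r
# Hypothetical country labels, one per enrollment
countries <- c("United States", "India", "United States",
               "Unknown/Other", "India", "United States")

# Count enrollments per country, largest first
counts <- sort(table(countries), decreasing = TRUE)

# Convert counts to percentages of all enrollments
shares <- round(100 * prop.table(counts), 1)
print(shares)
```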

Course Enrollment Over Time

Plot the number of course enrollments over time. The code below counts enrollments for each date in start_time_DI; the same counts can also be grouped by month or year. This visualization shows trends in course enrollment, indicating periods of high or low enrollment, and can help identify seasonality or other patterns in enrollment behavior.

R
# Course enrollment over time: one bar per start date
# (uses start_time_DI as converted to Date in Step 3)
ggplot(courses_data, aes(x = start_time_DI)) +
  geom_bar(stat = "count", fill = "lightgreen") +
  labs(title = "Course Enrollment Over Time", x = "Date", y = "Number of Enrollments")

Output:

Figure: Course Enrollment Over Time
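
Daily counts can be noisy, so grouping by month is a common alternative. The sketch below shows one way to do that in base R on invented dates; the `start_dates` vector and the `monthly` data frame are hypothetical names used only for this example.

```r
# Hypothetical enrollment start dates
start_dates <- as.Date(c("2012-08-17", "2012-08-16", "2012-10-15",
                         "2013-01-23", "2013-01-02"))

# Reduce each date to a year-month label such as "2012-08"
enroll_month <- format(start_dates, "%Y-%m")

# Count enrollments per month and return a tidy data frame
monthly <- as.data.frame(table(month = enroll_month),
                         responseName = "enrollments")
print(monthly)
```

The resulting data frame could then be passed to ggplot() with geom_col() to draw one bar per month instead of one per day.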

Gender Distribution

Create a pie chart or bar chart showing the distribution of gender among course participants. Understanding the gender distribution can provide insights into the demographic makeup of the student population. It can also help assess whether there are any gender disparities in course enrollment.

R
# Calculate gender distribution
gender_distribution <- courses_data %>%
  group_by(gender) %>%
  summarise(participants = n())

# Create pie chart; factor() makes ggplot treat the gender codes as
# discrete categories rather than a continuous colour scale
ggplot(gender_distribution, aes(x = "", y = participants, fill = factor(gender))) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  labs(title = "Gender Distribution Among Course Participants", fill = "Gender") +
  theme_void() +
  theme(legend.position = "right")

Output:

Figure: Gender Distribution Among Course Participants
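
The same counts can be drawn as a bar chart, which usually makes small differences between groups easier to read than a pie chart. This is a minimal sketch with ggplot2 on invented data; the `toy` data frame and its gender labels are hypothetical.

```r
library(ggplot2)

# Hypothetical gender labels, one per participant
toy <- data.frame(gender = c("m", "f", "m", "o", "m", "f"))

# Count participants per gender category
gender_counts <- as.data.frame(table(gender = toy$gender),
                               responseName = "participants")

# Bar chart: one bar per category, height equal to the count
p <- ggplot(gender_counts, aes(x = gender, y = participants)) +
  geom_col(fill = "steelblue") +
  labs(title = "Gender Distribution Among Course Participants",
       x = "Gender", y = "Number of Participants")
print(p)
```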

Age Distribution of Students

Visualize the age profile of students with a histogram of year of birth (YoB). The dataset has no age column, so year of birth serves as a proxy: later birth years correspond to younger students. Understanding the age distribution of students can provide insights into the target audience of the online courses. It can also help tailor course content and engagement strategies to different age groups.

R
# Histogram of year of birth in 5-year bins (a proxy for student age)
ggplot(courses_data, aes(x = YoB)) +
  geom_histogram(binwidth = 5, fill = "lightgray", color = "black") +
  labs(title = "Age Distribution of Students", x = "Year of Birth", y = "Frequency")

Output:

Figure: Age Distribution of Students
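
If an actual age is preferred over year of birth, one rough approach is to subtract YoB from the year of the start date. The sketch below assumes start_time_DI is already a Date and uses invented rows; `age_at_start` is a hypothetical column name, and the result is approximate because only the birth year is known.

```r
# Hypothetical toy rows: year of birth and course start date
toy <- data.frame(
  YoB = c(1988, 1975, 1993, NA),
  start_time_DI = as.Date(c("2012-12-19", "2012-09-17",
                            "2013-02-08", "2013-01-23"))
)

# Approximate age = start year minus birth year (NA stays NA)
start_year <- as.integer(format(toy$start_time_DI, "%Y"))
toy$age_at_start <- start_year - toy$YoB
print(toy)
```

A histogram of age_at_start would then show ages directly on the x-axis.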

Conclusion

Analyzing online course engagement involves exploring, cleaning, transforming, and visualizing data. These steps help us understand how people behave online, spot trends, and enhance learning experiences, which is why it is vital for educators and platforms to know how engaged students are. From the dataset overview to the final visualization, each step in R uncovers information about engagement dynamics. Histograms, scatter plots, bar charts, and pie charts simplify complex data, revealing grade distributions, enrollment patterns, and demographics. These insights help educators design better courses and help platforms improve learning outcomes.




Referred: https://www.geeksforgeeks.org

