Splitting comma-separated strings in a column into separate rows is a common data manipulation task in the R programming language. The transformation is useful when multiple values are concatenated within a single cell and you want each value in its own row for further analysis or visualization. This article explores three approaches: tidyr, dplyr combined with stringr, and base R. Let’s start with an example data frame that contains comma-separated strings in one of its columns:
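A minimal sketch for building that data frame, assuming the values shown in the printed output below. `ID` is entered as a double (`c(1, 2, 3)`), which is consistent with the `<dbl>` column type that appears in the tibble output later on; the variable name `df` is an illustrative choice.

```r
# Build the example data frame: each person's skills are stored
# as a single comma-separated string in the Skills column.
df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Aliyana", "Boby", "Charlie"),
  Skills = c("R,Python,SQL", "Excel,Tableau", "Java,C++,Python"),
  stringsAsFactors = FALSE  # keep the strings as character, not factors
)

print(df)
```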
Output:
  ID    Name          Skills
1  1 Aliyana    R,Python,SQL
2  2    Boby   Excel,Tableau
3  3 Charlie Java,C++,Python

In this example, the Skills column contains comma-separated strings representing the skills of each person. We aim to split these strings into separate rows.

Splitting with tidyr separate_rows

The separate_rows function from the tidyr package is designed for this purpose. It splits a column of delimiter-separated values into individual rows, repeating the values of the other columns for each piece.

Using separate_rows to Split Comma-Separated Strings
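A minimal sketch of the separate_rows call, assuming tidyr 1.0 or later. The example data frame is recreated so the snippet runs on its own, and both the original data and the split result are printed. Note the explicit `sep = ","`: tidyr's default separator is a regular expression matching any run of non-alphanumeric characters, which would reduce "C++" to "C".

```r
library(tidyr)

# Example data, recreated so this snippet is self-contained
df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Aliyana", "Boby", "Charlie"),
  Skills = c("R,Python,SQL", "Excel,Tableau", "Java,C++,Python"),
  stringsAsFactors = FALSE
)
print(df)

# Split Skills on commas; ID and Name are repeated for every piece.
# sep = "," is set explicitly because the default separator pattern
# treats "+" as a delimiter and would mangle "C++".
result <- separate_rows(df, Skills, sep = ",")
print(result)
```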
Output:
  ID    Name          Skills
1  1 Aliyana    R,Python,SQL
2  2    Boby   Excel,Tableau
3  3 Charlie Java,C++,Python
# A tibble: 8 × 3
     ID Name    Skills
  <dbl> <chr>   <chr>
1     1 Aliyana R
2     1 Aliyana Python
3     1 Aliyana SQL
4     2 Boby    Excel
5     2 Boby    Tableau
6     3 Charlie Java
7     3 Charlie C++
8     3 Charlie Python

This code splits the Skills column on the comma delimiter, producing a data frame in which each skill occupies its own row and the ID and Name values are repeated as needed.

Splitting with dplyr and stringr

The dplyr and stringr packages can also be combined to achieve this task. First use stringr::str_split to split each string into a list of values, then use tidyr::unnest to expand that list-column into separate rows.
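A minimal sketch of this route, assuming str_split() is applied inside dplyr's mutate() and that tidyr is loaded for unnest(). The example data frame is recreated so the snippet is self-contained.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Example data, recreated so this snippet is self-contained
df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Aliyana", "Boby", "Charlie"),
  Skills = c("R,Python,SQL", "Excel,Tableau", "Java,C++,Python"),
  stringsAsFactors = FALSE
)
print(df)

result <- df %>%
  # str_split() turns each string into a character vector, so Skills
  # becomes a list-column holding one vector per person
  mutate(Skills = str_split(Skills, ",")) %>%
  # unnest() gives every element of the list-column its own row
  unnest(Skills)
print(result)
```

Because str_split() takes a regular expression, a comma needs no escaping here; a pattern such as `",\\s*"` would additionally absorb spaces after the commas if the data contained them.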
Output:
  ID    Name          Skills
1  1 Aliyana    R,Python,SQL
2  2    Boby   Excel,Tableau
3  3 Charlie Java,C++,Python
# A tibble: 8 × 3
     ID Name    Skills
  <dbl> <chr>   <chr>
1     1 Aliyana R
2     1 Aliyana Python
3     1 Aliyana SQL
4     2 Boby    Excel
5     2 Boby    Tableau
6     3 Charlie Java
7     3 Charlie C++
8     3 Charlie Python

This code first splits the Skills column into a list of individual skills and then uses unnest to create a separate row for each skill.

Splitting with Base R

Base R provides a more manual approach for splitting comma-separated strings into separate rows, and it needs no additional packages. This approach uses strsplit, rep, and unlist.
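A minimal base R sketch of the strsplit/rep/unlist approach, assuming the same example data (recreated here). It also uses lengths(), a base R function that returns the length of each list element, to decide how many times each row is repeated.

```r
# Example data, recreated so this snippet is self-contained
df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Aliyana", "Boby", "Charlie"),
  Skills = c("R,Python,SQL", "Excel,Tableau", "Java,C++,Python"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list: one character vector of skills per row.
# fixed = TRUE treats "," literally instead of as a regular expression.
split_skills <- strsplit(df$Skills, ",", fixed = TRUE)

# Repeat each row index once per skill, then subset the data frame with it
row_index <- rep(seq_len(nrow(df)), times = lengths(split_skills))
result <- df[row_index, ]

# unlist() flattens the list in row order, matching the repeated rows
result$Skills <- unlist(split_skills)

# Reset the row names (otherwise they read 1, 1.1, 1.2, ...)
rownames(result) <- NULL
print(result)
```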
Output:
ID Name Skills

This code splits the Skills column into a list of character vectors, repeats each row of the original data frame as many times as it has skills, and then assigns the flattened skills to the expanded data frame.

Conclusion

Splitting comma-separated strings in a column into separate rows is a useful data transformation technique in R. This article presented three approaches: tidyr::separate_rows, the combination of dplyr and stringr, and a base R approach. separate_rows is the most concise, the str_split and unnest pipeline fits naturally into existing dplyr code, and the base R version works without any additional packages, so choose the one that best fits your workflow.
Referred: https://www.geeksforgeeks.org
Type: | Geek |
Category: | Coding |
Sub Category: | Tutorial |
Uploaded by: | Admin |