World Bank Dataset in R

The World Bank publishes a wealth of data used by researchers, economists, and policymakers. It covers many aspects of global development, including economic indicators, health statistics, education metrics, and environmental data. This guide walks through accessing, manipulating, and visualizing World Bank data in R, a programming language built for statistical computing.

1. Accessing World Bank Data in R

To work with World Bank data in R, you need to install and load specific packages designed to interact with the World Bank’s API.

Installing and Loading Required Packages

The WDI package is a popular choice for accessing World Bank data in R. Install it from CRAN and load it with the following commands:

R
install.packages("WDI")
library(WDI)

2. Retrieving Data

The WDI package provides functions to search for and download data. The primary function used is WDI(), which allows you to fetch data based on specific indicators, countries, and time periods.

To find the indicators you are interested in, use the WDIsearch() function. For example, to search for GDP-related indicators:

R
gdp_indicators <- WDIsearch("gdp")
head(gdp_indicators)

Output:

              indicator                                                 name
712      5.51.01.10.gdp                                Per capita GDP growth
714     6.0.GDP_current                                      GDP (current $)
715      6.0.GDP_growth                                GDP growth (annual %)
716         6.0.GDP_usd                                GDP (constant 2005 $)
717  6.0.GDPpc_constant GDP per capita, PPP (constant 2011 international $) 
1557  BG.GSR.NFSV.GD.ZS                         Trade in services (% of GDP)

This command returns a data frame of GDP-related indicators, with one column for the indicator code and one for its description.
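
A broad term such as "gdp" matches many indicators. As a minimal sketch, the search can be narrowed with a regular expression, or pointed at indicator codes instead of names through the field argument of WDIsearch(). The two search patterns below are illustrative choices, not part of the original example.

```r
library(WDI)

# Match indicator names that contain "gdp" followed later by "current US".
# The string is treated as a regular expression.
gdp_current <- WDIsearch(string = "gdp.*current US", field = "name")

# Search the indicator codes instead of the names
gdp_codes <- WDIsearch(string = "NY.GDP.MKTP", field = "indicator")
head(gdp_codes)
```

Searching by code is handy once you know the prefix of an indicator family and want to see its variants.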

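Once you have identified the indicators you need, you can download the data. When no country is specified, WDI() returns every available entity, including regional aggregates such as "Africa Eastern and Southern". For example, to download GDP data for all of them from 2000 to 2020, use: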

R
gdp_data <- WDI(indicator = "NY.GDP.MKTP.CD", start = 2000, end = 2020)
head(gdp_data)

Output:

                      country iso2c iso3c year NY.GDP.MKTP.CD
1 Africa Eastern and Southern    ZH   AFE 2020   9.288802e+11
2 Africa Eastern and Southern    ZH   AFE 2019   1.006191e+12
3 Africa Eastern and Southern    ZH   AFE 2018   1.012521e+12
4 Africa Eastern and Southern    ZH   AFE 2017   9.399593e+11
5 Africa Eastern and Southern    ZH   AFE 2016   8.297383e+11
6 Africa Eastern and Southern    ZH   AFE 2015   8.992556e+11

In this example, "NY.GDP.MKTP.CD" is the indicator code for GDP (current US$).
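
If only a few countries are needed, the download itself can be restricted. The sketch below assumes ISO-2 country codes in the country argument and uses a named indicator vector so that the value column gets a readable name; the name gdp is an arbitrary choice made for this example.

```r
library(WDI)

# Request two countries by ISO-2 code. The name "gdp" on the indicator
# vector becomes the column name in the returned data frame.
gdp_two <- WDI(
  country   = c("US", "CN"),
  indicator = c(gdp = "NY.GDP.MKTP.CD"),
  start     = 2000,
  end       = 2020
)
head(gdp_two)
```

Limiting the request this way keeps the download small and removes the need to filter out other countries afterwards.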

3. Data Manipulation

After downloading the data, you might need to clean and manipulate it to fit your analysis needs. Common tasks include handling missing values, filtering data, and reshaping data frames.

Handling Missing Values

To handle missing values, you can use functions from the tidyverse package, which provides a cohesive set of data manipulation tools:

R
library(tidyverse)
gdp_data <- gdp_data %>%
  filter(!is.na(NY.GDP.MKTP.CD))
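
Before dropping rows, it can help to see how many values are missing per country. The following is a small sketch on a toy data frame that mimics the layout of the WDI() output; the two GDP values are taken from the output shown earlier, and the NA is inserted purely for illustration.

```r
library(dplyr)

# Stand-in for the WDI() result; the NA is artificial
toy <- data.frame(
  country = "Africa Eastern and Southern",
  year = c(2020, 2019, 2018),
  NY.GDP.MKTP.CD = c(9.288802e+11, NA, 1.012521e+12)
)

# Count missing and total observations per country
missing_by_country <- toy %>%
  group_by(country) %>%
  summarise(n_missing = sum(is.na(NY.GDP.MKTP.CD)), n_total = n())

# Then remove the incomplete rows
cleaned <- toy %>%
  filter(!is.na(NY.GDP.MKTP.CD))
```

A country with many missing years may be better excluded entirely than analyzed with gaps.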

Filtering Data

To focus on specific countries or regions, you can filter the dataset:

R
gdp_us_china <- gdp_data %>%
  filter(country %in% c("United States", "China"))
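
The full download also contains regional aggregates, which are usually unwanted in country-level analysis. One possible approach, sketched here, is to request extra metadata with extra = TRUE and filter on the region column. This assumes that aggregates are labelled "Aggregates" in that column, which is how the metadata bundled with WDI marks them; check the values in your own download before relying on it.

```r
library(WDI)
library(dplyr)

# extra = TRUE adds metadata columns such as region and income
gdp_extra <- WDI(indicator = "NY.GDP.MKTP.CD", start = 2000, end = 2020,
                 extra = TRUE)

# Keep only rows that are not aggregates (assumed label: "Aggregates").
# Rows with a missing region are dropped by filter() as well.
gdp_countries <- gdp_extra %>%
  filter(region != "Aggregates")
```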

Reshaping Data

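For certain types of analyses, you might need to reshape your data. The tidyr package from the tidyverse collection is particularly useful for this. Setting id_cols = year keeps one row per year; without it, the iso2c and iso3c columns would also act as row identifiers, leaving each country on its own row with NA in the other country's column:

R
# One row per year, one column per country
gdp_wide <- gdp_us_china %>%
  pivot_wider(id_cols = year, names_from = country, values_from = NY.GDP.MKTP.CD)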
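
To see how the choice of identifier columns affects the reshaped result, here is a small sketch on a toy data frame. The GDP values are dummy numbers, and the frame only mimics the layout of the WDI() output.

```r
library(tidyr)

toy <- data.frame(
  country = c("United States", "China", "United States", "China"),
  iso2c   = c("US", "CN", "US", "CN"),
  year    = c(2019, 2019, 2020, 2020),
  NY.GDP.MKTP.CD = c(1, 2, 3, 4)  # dummy values, not real GDP figures
)

# Default: every column not used in names_from/values_from identifies a row,
# so iso2c keeps the two countries on separate rows
naive <- pivot_wider(toy, names_from = country, values_from = NY.GDP.MKTP.CD)

# Restricting the identifiers to year gives one row per year
wide <- pivot_wider(toy, id_cols = year,
                    names_from = country, values_from = NY.GDP.MKTP.CD)

nrow(naive)  # 4 rows, each with one NA
nrow(wide)   # 2 rows
```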

4. Data Visualization

Visualizing data is crucial for conveying insights effectively. The ggplot2 package, also part of the tidyverse, is a powerful tool for creating a variety of visualizations.

To plot GDP trends over time for the United States and China, you can use the following code:

R
ggplot(gdp_us_china, aes(x = year, y = NY.GDP.MKTP.CD, color = country)) +
  geom_line() +
  labs(title = "GDP Trends: United States vs. China",
       x = "Year",
       y = "GDP (current US$)") +
  theme_minimal()

Output:

[Figure: World Bank Dataset in R]

This code creates a line plot that shows how the GDP of the United States and China has evolved from 2000 to 2020.

Customizing Plots

ggplot2 allows extensive customization of plots. For instance, to change the color palette and add data points:

R
ggplot(gdp_us_china, aes(x = year, y = NY.GDP.MKTP.CD, color = country)) +
  geom_line() +
  geom_point() +
  scale_color_manual(values = c("United States" = "blue", "China" = "red")) +
  labs(title = "GDP Trends: United States vs. China",
       x = "Year",
       y = "GDP (current US$)",
       color = "Country") +
  theme_minimal()

Output:

[Figure: World Bank Dataset in R]
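
GDP in current US$ runs into the trillions, so the default axis labels appear in scientific notation. As a sketch of one further customization, the y axis can be relabelled in trillions with a small labelling function. The data frame below uses dummy values in place of the downloaded data, and the "T" suffix is an arbitrary formatting choice.

```r
library(ggplot2)

# Dummy values, not real GDP figures
toy <- data.frame(
  country = rep(c("United States", "China"), each = 3),
  year = rep(2018:2020, times = 2),
  NY.GDP.MKTP.CD = c(1e12, 2e12, 3e12, 2e12, 3e12, 4e12)
)

p <- ggplot(toy, aes(x = year, y = NY.GDP.MKTP.CD, color = country)) +
  geom_line() +
  geom_point() +
  # Divide by 1e12 and append "T" so ticks read 1T, 2T, ...
  scale_y_continuous(labels = function(x) paste0(x / 1e12, "T")) +
  scale_x_continuous(breaks = 2018:2020) +
  labs(x = "Year", y = "GDP (current US$, trillions)", color = "Country") +
  theme_minimal()

p
```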

5. Advanced Analysis

For more advanced analyses, you can integrate World Bank data with other datasets or perform statistical analyses.

Combining Datasets

To combine World Bank data with other data sources, use functions such as left_join() from the dplyr package. Joining on every column the two data frames share (country, iso2c, iso3c, and year) avoids duplicated .x and .y columns in the result:

R
population_data <- WDI(indicator = "SP.POP.TOTL", start = 2000, end = 2020)
# Join on all shared identifier columns so each appears only once
combined_data <- left_join(gdp_data, population_data,
                           by = c("country", "iso2c", "iso3c", "year"))
head(combined_data)

Output:

                      country iso2c iso3c year NY.GDP.MKTP.CD SP.POP.TOTL
1 Africa Eastern and Southern    ZH   AFE 2020   9.288802e+11   685112979
2 Africa Eastern and Southern    ZH   AFE 2019   1.006191e+12   667242986
3 Africa Eastern and Southern    ZH   AFE 2018   1.012521e+12   649757148
4 Africa Eastern and Southern    ZH   AFE 2017   9.399593e+11   632746570
5 Africa Eastern and Southern    ZH   AFE 2016   8.297383e+11   616377605
6 Africa Eastern and Southern    ZH   AFE 2015   8.992556e+11   600008424
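
Once GDP and population sit in the same data frame, derived measures are one mutate() call away. The sketch below computes GDP per capita from two rows of the output above, rebuilt by hand so the example runs without a download; the column name gdp_per_capita is chosen for illustration.

```r
library(dplyr)

# Two rows copied from the output above
gdp <- data.frame(
  country = "Africa Eastern and Southern",
  year = c(2020, 2019),
  NY.GDP.MKTP.CD = c(9.288802e+11, 1.006191e+12)
)
pop <- data.frame(
  country = "Africa Eastern and Southern",
  year = c(2020, 2019),
  SP.POP.TOTL = c(685112979, 667242986)
)

# Join, then divide GDP by population
combined <- left_join(gdp, pop, by = c("country", "year")) %>%
  mutate(gdp_per_capita = NY.GDP.MKTP.CD / SP.POP.TOTL)

combined
```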

Statistical Analysis

You can perform various statistical analyses using base R or other specialized packages. For example, to calculate the correlation between GDP and population:

R
correlation <- cor(combined_data$NY.GDP.MKTP.CD, combined_data$SP.POP.TOTL, 
                   use = "complete.obs")
print(correlation)

Output:

[1] 0.6817722
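
cor() returns only the coefficient. As a sketch, cor.test() from base R also reports a confidence interval and a p-value. The example uses only the six rows printed in the join output above, all for a single aggregate, so its result is illustrative and says nothing about the full data set.

```r
# Six rows for "Africa Eastern and Southern" (2020 down to 2015),
# copied from the join output above
gdp <- c(9.288802e+11, 1.006191e+12, 1.012521e+12,
         9.399593e+11, 8.297383e+11, 8.992556e+11)
pop <- c(685112979, 667242986, 649757148,
         632746570, 616377605, 600008424)

# Pearson correlation with confidence interval and p-value
res <- cor.test(gdp, pop, method = "pearson")

res$estimate   # correlation coefficient
res$conf.int   # 95% confidence interval
res$p.value    # p-value for the null hypothesis of zero correlation
```

With so few observations the interval will be wide; on the full combined_data the same call applies with the two columns passed in place of the vectors.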

Conclusion

Accessing and analyzing World Bank data in R opens up a vast array of opportunities for research and insights into global development trends. By following the steps outlined in this guide, you can efficiently retrieve, manipulate, and visualize World Bank data to support your analyses and presentations. With the powerful combination of the WDI and tidyverse packages, R provides a robust framework for handling complex datasets and deriving meaningful conclusions.




Source: https://www.geeksforgeeks.org

