coronavirus

R-CMD Data Pipeline CRAN_Status_Badge lifecycle License: MIT GitHub commit Downloads

The coronavirus package provides a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic and the vaccination efforts by country. The raw data is being pulled from the Johns Hopkins University Center for Systems Science and Engineering (JHU CCSE) Coronavirus repository.

More details available here, and a csv format of the package dataset available here

Source: Centers for Disease Control and Prevention’s Public Health Image Library

Important Notes

Vignettes

Additional documentation available on the followng vignettes:

Installation

Install the CRAN version:

install.packages("coronavirus")

Install the Github version (refreshed on a daily bases):

# install.packages("devtools")
devtools::install_github("RamiKrispin/coronavirus")

Datasets

The package provides the following two datasets:

Data refresh

While the coronavirus CRAN version is updated every month or two, the Github (Dev) version is updated on a daily bases. The update_dataset function enables to overcome this gap and keep the installed version with the most recent data available on the Github version:

library(coronavirus)
update_dataset()

Note: must restart the R session to have the updates available

Alternatively, you can pull the data using the Covid19R project data standard format with the refresh_coronavirus_jhu function:

covid19_df <- refresh_coronavirus_jhu()
head(covid19_df)
#>         date    location location_type location_code location_code_type  data_type value      lat      long
#> 1 2021-10-04 Afghanistan       country            AF         iso_3166_2 deaths_new     6 33.93911 67.709953
#> 2 2021-10-03 Afghanistan       country            AF         iso_3166_2 deaths_new     0 33.93911 67.709953
#> 3 2020-11-09 Afghanistan       country            AF         iso_3166_2 deaths_new     6 33.93911 67.709953
#> 4 2021-10-10 Afghanistan       country            AF         iso_3166_2 deaths_new     4 33.93911 67.709953
#> 5 2021-10-06 Afghanistan       country            AF         iso_3166_2 deaths_new     6 33.93911 67.709953
#> 6 2020-11-10 Afghanistan       country            AF         iso_3166_2 deaths_new    12 33.93911 67.709953

Usage

data("coronavirus")

head(coronavirus)
#>         date province country     lat      long      type cases
#> 1 2020-01-22  Alberta  Canada 53.9333 -116.5765 confirmed     0
#> 2 2020-01-23  Alberta  Canada 53.9333 -116.5765 confirmed     0
#> 3 2020-01-24  Alberta  Canada 53.9333 -116.5765 confirmed     0
#> 4 2020-01-25  Alberta  Canada 53.9333 -116.5765 confirmed     0
#> 5 2020-01-26  Alberta  Canada 53.9333 -116.5765 confirmed     0
#> 6 2020-01-27  Alberta  Canada 53.9333 -116.5765 confirmed     0

Summary of the total confrimed cases by country (top 20):

library(dplyr)

summary_df <- coronavirus %>% 
  filter(type == "confirmed") %>%
  group_by(country) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases)

summary_df %>% head(20) 
#> # A tibble: 20 × 2
#>    country        total_cases
#>    <chr>                <int>
#>  1 US                44683014
#>  2 India             34020730
#>  3 Brazil            21597949
#>  4 United Kingdom     8311851
#>  5 Russia             7742899
#>  6 Turkey             7540193
#>  7 France             7164924
#>  8 Iran               5742083
#>  9 Argentina          5268653
#> 10 Spain              4980206
#> 11 Colombia           4975656
#> 12 Italy              4707087
#> 13 Germany            4343591
#> 14 Indonesia          4231046
#> 15 Mexico             3732429
#> 16 Poland             2928065
#> 17 South Africa       2913880
#> 18 Ukraine            2697176
#> 19 Philippines        2690455
#> 20 Malaysia           2361529

Summary of new cases during the past 24 hours by country and type (as of 2021-10-13):

library(tidyr)

coronavirus %>% 
  filter(date == max(date)) %>%
  select(country, type, cases) %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type,
              values_from = total_cases) %>%
  arrange(-confirmed)
#> # A tibble: 195 × 4
#> # Groups:   country [195]
#>    country        confirmed death recovered
#>    <chr>              <int> <int>     <int>
#>  1 US                120321  3054        NA
#>  2 United Kingdom     41669   136        NA
#>  3 Turkey             31248   236        NA
#>  4 Russia             27926   962        NA
#>  5 India              18987   246        NA
#>  6 Ukraine            17100   493        NA
#>  7 Romania            15733   390        NA
#>  8 Germany            12317    14        NA
#>  9 Iran               12298   194        NA
#> 10 Thailand           10064    82        NA
#> 11 Malaysia            7950    68        NA
#> 12 Brazil              7852   176        NA
#> 13 Philippines         7083   173        NA
#> 14 Serbia              6699    51        NA
#> 15 Georgia             4837    26        NA
#> 16 Netherlands         3772    13        NA
#> 17 Belgium             3667    13        NA
#> 18 Vietnam             3461   106        NA
#> 19 Bulgaria            3327    98        NA
#> 20 Singapore           3190     9        NA
#> 21 Cameroon            3003    33        NA
#> 22 Italy               2769    37        NA
#> 23 Spain               2758    42        NA
#> 24 Australia           2744    18        NA
#> 25 Lithuania           2740    26        NA
#> 26 Canada              2706    79        NA
#> 27 Poland              2640    40        NA
#> 28 Austria             2614    15        NA
#> 29 Slovakia            2406    20        NA
#> 30 Cuba                2354    28        NA
#> 31 Greece              2312    31        NA
#> 32 Latvia              2236    17        NA
#> 33 Kazakhstan          2084    35        NA
#> 34 Belarus             2060    17        NA
#> 35 Moldova             2052    29        NA
#> 36 Ireland             2051    26        NA
#> 37 Croatia             2022    27        NA
#> 38 Korea, South        1937    13        NA
#> 39 Mongolia            1920    15        NA
#> 40 Iraq                1766    35        NA
#> # … with 155 more rows

Plotting daily confirmed and death cases in Brazil:

library(plotly)

coronavirus %>% 
  group_by(type, date) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type, values_from = total_cases) %>%
  arrange(date) %>%
  mutate(active = confirmed - death - recovered) %>%
  mutate(active_total = cumsum(active),
                recovered_total = cumsum(recovered),
                death_total = cumsum(death)) %>%
  plot_ly(x = ~ date,
                  y = ~ active_total,
                  name = 'Active', 
                  fillcolor = '#1f77b4',
                  type = 'scatter',
                  mode = 'none', 
                  stackgroup = 'one') %>%
  add_trace(y = ~ death_total, 
             name = "Death",
             fillcolor = '#E41317') %>%
  add_trace(y = ~recovered_total, 
            name = 'Recovered', 
            fillcolor = 'forestgreen') %>%
  layout(title = "Distribution of Covid19 Cases Worldwide",
         legend = list(x = 0.1, y = 0.9),
         yaxis = list(title = "Number of Cases"),
         xaxis = list(title = "Source: Johns Hopkins University Center for Systems Science and Engineering"))

Plot the confirmed cases distribution by counrty with treemap plot:

conf_df <- coronavirus %>% 
  filter(type == "confirmed") %>%
  group_by(country) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases) %>%
  mutate(parents = "Confirmed") %>%
  ungroup() 
  
  plot_ly(data = conf_df,
          type= "treemap",
          values = ~total_cases,
          labels= ~ country,
          parents=  ~parents,
          domain = list(column=0),
          name = "Confirmed",
          textinfo="label+value+percent parent")

data(covid19_vaccine)

head(covid19_vaccine)
#>   country_region       date doses_admin people_partially_vaccinated people_fully_vaccinated report_date_string uid province_state iso2 iso3 code3 fips      lat      long combined_key population
#> 1    Afghanistan 2021-02-22           0                           0                       0         2021-02-22   4           <NA>   AF  AFG     4 <NA> 33.93911 67.709953  Afghanistan   38928341
#> 2    Afghanistan 2021-02-23           0                           0                       0         2021-02-23   4           <NA>   AF  AFG     4 <NA> 33.93911 67.709953  Afghanistan   38928341
#> 3    Afghanistan 2021-02-24           0                           0                       0         2021-02-24   4           <NA>   AF  AFG     4 <NA> 33.93911 67.709953  Afghanistan   38928341
#> 4    Afghanistan 2021-02-25           0                           0                       0         2021-02-25   4           <NA>   AF  AFG     4 <NA> 33.93911 67.709953  Afghanistan   38928341
#> 5    Afghanistan 2021-02-26           0                           0                       0         2021-02-26   4           <NA>   AF  AFG     4 <NA> 33.93911 67.709953  Afghanistan   38928341
#> 6    Afghanistan 2021-02-27           0                           0                       0         2021-02-27   4           <NA>   AF  AFG     4 <NA> 33.93911 67.709953  Afghanistan   38928341
#>   continent_name continent_code
#> 1           Asia             AS
#> 2           Asia             AS
#> 3           Asia             AS
#> 4           Asia             AS
#> 5           Asia             AS
#> 6           Asia             AS

Plot the top 20 vaccinated countries:

covid19_vaccine %>% 
  filter(date == max(date),
         !is.na(population)) %>% 
  mutate(fully_vaccinated_ratio = people_fully_vaccinated / population) %>%
  arrange(- fully_vaccinated_ratio) %>%
  slice_head(n = 20) %>%
  arrange(fully_vaccinated_ratio) %>%
  mutate(country = factor(country_region, levels = country_region)) %>%
  plot_ly(y = ~ country,
          x = ~ round(100 * fully_vaccinated_ratio, 2),
          text = ~ paste(round(100 * fully_vaccinated_ratio, 1), "%"),
          textposition = 'auto',
          orientation = "h",
          type = "bar") %>%
  layout(title = "Percentage of Fully Vaccineted Population - Top 20 Countries",
         yaxis = list(title = ""),
         xaxis = list(title = "Source: Johns Hopkins Centers for Civic Impact",
                      ticksuffix = "%"))

Dashboard

Note: Currently, the dashboard is under maintenance due to recent changes in the data structure. Please see this issue

A supporting dashboard is available here

Data Sources

The raw data pulled and arranged by the Johns Hopkins University Center for Systems Science and Engineering (JHU CCSE) from the following resources: