Global Mortality Rates

Death comes for us all. And what a splendid variety of forms it can take! Cardiovascular disease, cancer, natural disasters, malaria, infections — the list is almost endless.

This week we’ll look at mortality data collected by the World Health Organization (WHO) to see how cause of death varies by the socio-demographic index of the country in which you live.

library(tidyverse)
library(readxl)

download.file("https://github.com/rfordatascience/tidytuesday/blob/master/data/2018/2018-04-16/global_mortality.xlsx?raw=true", "global_mortality.xlsx")

mortality <- read_xlsx("global_mortality.xlsx") %>%
  filter(str_detect(country, "SDI")) %>%
  rename(sociodemographic_index = country) %>%
  mutate(sociodemographic_index = sub(" SDI", "", sociodemographic_index)) %>%
  mutate(sociodemographic_index = fct_relevel(sociodemographic_index, !! c("High", "High-middle", "Middle", "Low-middle", "Low")))

mortality

# A tibble: 135 x 35
   sociodemographi… country_code  year `Cardiovascular… `Cancers (%)` `Respiratory di…
   <chr>            <chr>        <dbl>            <dbl>         <dbl>            <dbl>
 1 High             NA            1990             42.7          24.2             4.45
 2 High             NA            1991             42.3          24.4             4.47
 3 High             NA            1992             41.8          24.6             4.50
 4 High             NA            1993             41.4          24.8             4.56
 5 High             NA            1994             40.9          25.0             4.59
 6 High             NA            1995             40.5          25.1             4.65
 7 High             NA            1996             40.1          25.4             4.70
 8 High             NA            1997             39.7          25.6             4.76
 9 High             NA            1998             39.2          25.8             4.81
10 High             NA            1999             38.8          25.9             4.89
# … with 125 more rows, and 29 more variables: `Diabetes (%)` <dbl>, `Dementia (%)` <dbl>,
#   `Lower respiratory infections (%)` <dbl>, `Neonatal deaths (%)` <dbl>, `Diarrheal diseases
#   (%)` <dbl>, `Road accidents (%)` <dbl>, `Liver disease (%)` <dbl>, `Tuberculosis
#   (%)` <dbl>, `Kidney disease (%)` <dbl>, `Digestive diseases (%)` <dbl>, `HIV/AIDS
#   (%)` <dbl>, `Suicide (%)` <dbl>, `Malaria (%)` <dbl>, `Homicide (%)` <dbl>, `Nutritional
#   deficiencies (%)` <dbl>, `Meningitis (%)` <dbl>, `Protein-energy malnutrition (%)` <dbl>,
#   `Drowning (%)` <dbl>, `Maternal deaths (%)` <dbl>, `Parkinson disease (%)` <dbl>, `Alcohol
#   disorders (%)` <dbl>, `Intestinal infectious diseases (%)` <dbl>, `Drug disorders
#   (%)` <dbl>, `Hepatitis (%)` <dbl>, `Fire (%)` <dbl>, `Heat-related (hot and cold exposure)
#   (%)` <dbl>, `Natural disasters (%)` <dbl>, `Conflict (%)` <dbl>, `Terrorism (%)` <dbl>

The dataset is in a wide format, so we’ll need to pivot into a longer, tidy format.

mortality <- mortality %>%
  select_if(~ all(!is.na(.))) %>%
  rename_all(~ sub(" [(]%[)]", "", .)) %>%
  gather(cause_of_death, deaths_per_1k, -sociodemographic_index, -year) %>%
  mutate(deaths_per_1k = 10 * deaths_per_1k,
         cause_of_death = replace(cause_of_death, cause_of_death == "Parkinson disease", "Parkinson's disease"),
         cause_of_death = replace(cause_of_death, cause_of_death == "Heat-related (hot and cold exposure)", "Heat-related"))

mortality

# A tibble: 4,050 x 4
   sociodemographic_index  year cause_of_death          deaths_per_1k
   <chr>                  <dbl> <chr>                           <dbl>
High                    1990 Cardiovascular diseases          427.
High                    1991 Cardiovascular diseases          423.
High                    1992 Cardiovascular diseases          418.
High                    1993 Cardiovascular diseases          414.
High                    1994 Cardiovascular diseases          409.
High                    1995 Cardiovascular diseases          405.
High                    1996 Cardiovascular diseases          401.
High                    1997 Cardiovascular diseases          397.
High                    1998 Cardiovascular diseases          392.
High                    1999 Cardiovascular diseases          388.
# … with 4,040 more rows

This part looks a little esoteric at first, but the point of it is to calculate an invisible upper limit on a series of facetted plots.

parse_factor_to_numeric <- function(x) as.numeric(levels(x))[x]

yvals <- mortality %>%
  group_by(cause_of_death) %>%
  summarize(ymax = max(deaths_per_1k),
            yupper = cut(max(deaths_per_1k), c(0, 5, 15, 30, 50, 150, 300, 500), 
                         labels = c(5, 15, 30, 50, 150, 300, 500))) %>%
  mutate(yupper = parse_factor_to_numeric(yupper)) %>%
  arrange(desc(ymax))

mortality <- mortality %>%
  left_join(yvals, by = "cause_of_death") %>%
  mutate(cause_of_death = factor(cause_of_death, levels = yvals$cause_of_death))

mortality

# A tibble: 4,050 x 6
   sociodemographic_index  year cause_of_death          deaths_per_1k  ymax yupper
   <chr>                  <dbl> <fct>                           <dbl> <dbl>  <dbl>
High                    1990 Cardiovascular diseases          427.  459.    500
High                    1991 Cardiovascular diseases          423.  459.    500
High                    1992 Cardiovascular diseases          418.  459.    500
High                    1993 Cardiovascular diseases          414.  459.    500
High                    1994 Cardiovascular diseases          409.  459.    500
High                    1995 Cardiovascular diseases          405.  459.    500
High                    1996 Cardiovascular diseases          401.  459.    500
High                    1997 Cardiovascular diseases          397.  459.    500
High                    1998 Cardiovascular diseases          392.  459.    500
High                    1999 Cardiovascular diseases          388.  459.    500
# … with 4,040 more rows

Lookin’ good.

Now we’ll make line graphs with year on the x-axis, deaths per 1,000 people on the y-axis, and lines colored by the socio-demoographic index (SDI) from ‘Low’ to ‘High’.

ggplot(mortality, aes(year, deaths_per_1k, color = sociodemographic_index)) +
  geom_hline(yintercept = 0,  size = 0.3, alpha = 0.5) +
  geom_hline(aes(yintercept = yupper), size = 0.3, alpha = 0) +
  geom_line() +
  facet_wrap(~ cause_of_death, scales = "free_y", ncol = 3) +
  scale_color_manual(values = c("#ffd700", "#fa8775", "#ea5f94", "#cd34b5", "#0000ff")) +
  labs(title = "Global mortality rates by Socio-demographic Index (SDI) from 1990 to 2016",
       subtitle = "",
       caption = "Source: ourworldindata.org | Graphic: nsgrantham.com/global-mortality",
       x = NULL, y = "Deaths per 1,000 people") +
  theme_minimal(base_family = "Fira Sans Extra Condensed Light", base_size = 12) +
  theme(plot.title = element_text(family = "Fira Sans Extra Condensed"),
        legend.position = c(0.33, 1.0358),
        legend.direction = "horizontal",
        legend.title = element_blank(),
        axis.text = element_text(size = 7),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.grid.major.x = element_line(size = 0.4),
        panel.grid.minor.x = element_blank())

ggsave("global-mortality.png", width = 6.5, height = 12)

Boy, what a heart-warming graphic.