Kinship Structures

Introduction

Course description:

Kinship is a fundamental property of human populations and a key form of social structure. Demographers have long been interested in the interplay between demographic change and family configuration. This has led to the development of sophisticated methodological and conceptual approaches for the study of kinship, some of which are reviewed in this course.

Some useful things to know:

Download the syllabus: https://github.com/amandamartinsal/EDSD_kinship_24-25/blob/master/EDSD_2024_25_kinship_syllabus.pdf
The course’s slides are available here: https://github.com/amandamartinsal/EDSD_kinship_24-25/tree/master/slides
Find this website’s source code on GitHub: https://github.com/amandamartinsal/EDSD_kinship_24-25

Lab session 1 (Monday)

Getting started with matrix kinship models in R using DemoKin

The computer lab sessions start soon, so it would be great to have the R environment prepared in advance. First, you will need R and RStudio installed. Second, install DemoKin!

1. Installation

# Install the development version from GitHub

#install.packages("devtools")
#devtools::install_github("IvanWilli/DemoKin")

Load other packages that will be useful:

library(DemoKin)
library(dplyr)
library(tidyr)
library(ggplot2)

2. Built-in data

The DemoKin package includes data from Sweden as an example. These data come from the Human Mortality Database and the Human Fertility Database.

2.1. `swe_px` matrix; survival probabilities by age

First we have survival probabilities by age:

data("swe_px", package="DemoKin")

swe_px[1:5, 1:5]

##      1900    1901    1902    1903    1904
## 0 0.91060 0.90673 0.92298 0.91890 0.92357
## 1 0.97225 0.97293 0.97528 0.97549 0.97847
## 2 0.98525 0.98579 0.98630 0.98835 0.98921
## 3 0.98998 0.98947 0.99079 0.99125 0.99226
## 4 0.99158 0.99133 0.99231 0.99352 0.99272

It has years in columns and ages in rows. Plotting $q_x$ (the complement of $p_x$) over age for 2018 gives:

swe_px %>%
    as.data.frame() %>%
    select(px = `2018`) %>%
    mutate(ages = 1:nrow(swe_px)-1) %>%
    ggplot() +
    geom_line(aes(x = ages, y = 1-px)) +
    scale_y_log10()
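
The age column in the pipeline above relies on R's operator precedence: `:` is evaluated before `-`, so `1:nrow(swe_px)-1` is read as `(1:nrow(swe_px)) - 1`, giving ages that start at 0. A minimal sketch with a made-up survival matrix (the name `toy_px` and its values are illustrative, not DemoKin data):

```r
# Toy survival matrix: ages in rows, years in columns (made-up values)
toy_px <- matrix(c(0.95, 0.99, 0.98,
                   0.96, 0.99, 0.97),
                 nrow = 3,
                 dimnames = list(as.character(0:2), c("2017", "2018")))

# `:` is evaluated before `-`, so this yields ages 0, 1, 2
ages <- 1:nrow(toy_px) - 1

# Select one year by column name and take the complement of p_x
qx_2018 <- 1 - toy_px[, "2018"]

data.frame(age = ages, qx = qx_2018)
```

Writing `0:(nrow(toy_px) - 1)` gives the same ages and makes the intent explicit.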

2.2. `swe_asfr` matrix; age-specific fertility rates

And age-specific fertility rates:

data("swe_asfr", package="DemoKin")

swe_asfr[15:20, 1:4]

##       1900    1901    1902    1903
## 14 0.00013 0.00006 0.00008 0.00008
## 15 0.00053 0.00054 0.00057 0.00057
## 16 0.00275 0.00319 0.00322 0.00259
## 17 0.00932 0.00999 0.00965 0.00893
## 18 0.02328 0.02337 0.02347 0.02391
## 19 0.04409 0.04357 0.04742 0.04380

Plotted over age for the same year:

swe_asfr %>%
      as.data.frame() %>%
      select(fx = `2018`) %>%
      mutate(age = 1:nrow(swe_asfr)-1) %>%
      ggplot() +
      geom_line(aes(x = age, y = fx))

3. ‘Keyfitz’ kinship diagram

We can visualize the implied kin counts for a Focal girl aged 5 in a time-invariant population using a network or ‘Keyfitz’ kinship diagram (Keyfitz and Caswell 2005) with the plot_diagram function.

First, get vectors for a given year:

swe_surv_2018 <- DemoKin::swe_px[,"2018"]
swe_asfr_2018 <- DemoKin::swe_asfr[,"2018"]

Run kinship models

swe_2018 <- kin(p = swe_surv_2018, f = swe_asfr_2018, time_invariant = TRUE)

swe_2018$kin_summary %>% 
  filter(age_focal == 5) %>% 
  select(kin, count = count_living) %>% 
  plot_diagram(rounding = 2)

4. What if we want to work with other countries?

We can access demographic rates from any country in the world, produced by the World Population Prospects (WPP) project.

In the lab sessions we will work with Brazil, so let's look at the Brazilian female population in 2023. For today's session the data are provided; you will download them later as homework.

# Load the .RData file

load("brazil_data.RData") # px and fx - Brazil in 2023

We have to reshape fertility and mortality into matrices that DemoKin can use (i.e., matrices with years as columns and ages as rows):

Reshape fertility

country_fert <- brazil_data %>%
  select(age, year, fx) %>%
  pivot_wider(names_from = year, values_from = fx) %>%
  select(-age) %>%
  as.matrix()

Reshape survival

country_surv <- brazil_data %>%
  select(age, year, px) %>%
  pivot_wider(names_from = year, values_from = px) %>%
  select(-age) %>%
  as.matrix()
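
For reference, the same long-to-wide reshaping can be sketched in base R. The example runs on a made-up long table (`toy_long` and its values are hypothetical) and only assumes the column names `age`, `year` and `fx` used above.

```r
# Made-up long-format rates: one row per age-year combination
toy_long <- data.frame(
  age  = rep(0:2, times = 2),
  year = rep(c(2022, 2023), each = 3),
  fx   = c(0.00, 0.05, 0.10, 0.00, 0.04, 0.09)
)

# Cross-tabulate: ages become rows, years become columns
toy_fert <- tapply(toy_long$fx, list(toy_long$age, toy_long$year), sum)

toy_fert
dim(toy_fert)  # number of ages, number of years
```

With a single year of data, as in the Brazil 2023 file, the result is a one-column matrix.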

5. ‘Keyfitz’ kinship diagram

Run kinship models

br_2023 <- 
  kin(p = country_surv, f = country_fert, time_invariant = TRUE)

br_2023$kin_summary %>% 
  filter(age_focal == 5) %>% 
  select(kin, count = count_living) %>% 
  plot_diagram(rounding = 2)

6. Exercise

6.1 We saw the ‘Keyfitz’ kinship diagram for a Focal girl aged 5, but what about an older Focal, for example a woman aged 65? What would you expect? Build the kinship diagram for this Focal for Brazil and discuss (groups of 2 or 3).

Now share with the class what you have discussed in your groups!

7. Homework

We will download fertility, mortality and population data for all countries from 1950 to 2023.

First, we will load the download_wpp24() function, which automates the process of downloading these data. Please note that this process may take a couple of minutes.

# Load the function to download data
source("UNWPP_download.R")

# Download the data
download_wpp24()

## [1] "2024 World Population Prospects data downloaded and saved successfully as CSV files."

After downloading the data, we can filter the information we are interested in, for example, the country of interest. To do this, we will load two functions:

UNWPP_data: This function filters the data for mortality (px) and fertility (fx) by the country of interest and the relevant time period.
UNWPP_pop: This function filters data for the population size (N), which will be necessary when discussing time-variant contexts on Wednesday.

To choose the country of interest, you can look at the name of the country here.

# Load function to filter data
source("UNWPP_data.R")

# Select country, year and sex to obtain px and fx

brazil_data <-
  UNWPP_data(country = "Brazil",
                   start_year =  2023,
                   end_year = 2023,
                   sex = "Female")

# Select country, year and sex to obtain N 

brazil_pop <-
  UNWPP_pop(country = "Brazil",
            start_year = 2023,
            end_year = 2023,
            sex = "Female")

Using the WPP data you downloaded, build the ‘Keyfitz’ kinship diagram for a Focal girl aged 5 yo in a time-invariant population. Discuss the results in relation to what we have seen for Brazil: identify the main differences and the reasons for them.

Lab session 2 (Tuesday)

In today’s session we will look at the DemoKin functions and how to run a one-sex, time-invariant model.

Load the packages and download the data

library(dplyr)
library(tidyr)
library(ggplot2)
library(DemoKin)

Today (and tomorrow) we will use the Brazilian data from WPP that we downloaded yesterday. Select the data again if necessary:

# Load function
source("UNWPP_data.R")

# Select country, year and sex

brazil_data <- UNWPP_data(country = "Brazil",
                          start_year = 2023,
                          end_year = 2023,
                          sex = "Female")

1. The function `kin()`

DemoKin can be used to compute the number and age distribution of Focal’s relatives under a range of assumptions, including living and deceased kin. The function DemoKin::kin() currently does most of the heavy lifting in terms of implementing matrix kinship models.

This is what it looks like in action, in this case assuming time-invariant demographic rates:

# First, reshape fertility and survival for a given year

br_asfr_2023 <- brazil_data %>%
  select(age, year, fx) %>%
  pivot_wider(names_from = year, values_from = fx) %>%
  select(-age) %>%
  as.matrix()

br_surv_2023 <- brazil_data %>%
  select(age, year, px) %>%
  pivot_wider(names_from = year, values_from = px) %>%
  select(-age) %>%
  as.matrix()

# Run kinship models

br_2023 <- kin(p = br_surv_2023, f = br_asfr_2023, time_invariant = TRUE)

1.1. Arguments

p numeric. A vector (atomic) or matrix of survival probabilities with rows as ages (and columns as years in case of matrix).
f numeric. Same as p but for fertility rates.
time_invariant logical. Assume time-invariant rates. Default TRUE.
output_kin character. kin types to return: “m” for mother, “d” for daughter, …

1.2. Relative types

Relatives for the output_kin argument are identified by a unique code. Note that the relationship codes used in DemoKin differ from those in Caswell (2019). The equivalence between the two sets of codes is given in the following table:

demokin_codes

##    DemoKin Caswell               Labels_female                   Labels_male
## 1      coa       t    Cousins from older aunts     Cousins from older uncles
## 2      cya       v  Cousins from younger aunts   Cousins from younger uncles
## 3        c    <NA>                     Cousins                       Cousins
## 4        d       a                   Daughters                          Sons
## 5       gd       b             Grand-daughters                    Grand-sons
## 6      ggd       c       Great-grand-daughters              Great-grand-sons
## 7      ggm       h          Great-grandmothers            Great-grandfathers
## 8       gm       g                Grandmothers                  Grandfathers
## 9        m       d                      Mother                        Father
## 10     nos       p   Nieces from older sisters   Nephews from older brothers
## 11     nys       q Nieces from younger sisters Nephews from younger brothers
## 12       n    <NA>                      Nieces                       Nephews
## 13      oa       r     Aunts older than mother     Uncles older than fathers
## 14      ya       s   Aunts younger than mother    Uncles younger than father
## 15       a    <NA>                       Aunts                        Uncles
## 16      os       m               Older sisters                Older brothers
## 17      ys       n             Younger sisters              Younger brothers
## 18       s    <NA>                     Sisters                      Brothers
##                          Labels_2sex
## 1    Cousins from older aunts/uncles
## 2  Cousins from younger aunts/uncles
## 3                            Cousins
## 4                           Children
## 5                    Grand-childrens
## 6              Great-grand-childrens
## 7                Great-grandfparents
## 8                       Grandparents
## 9                            Parents
## 10      Niblings from older siblings
## 11    Niblings from younger siblings
## 12                          Niblings
## 13   Aunts/Uncles older than parents
## 14 Aunts/Uncles younger than parents
## 15                      Aunts/Uncles
## 16                    Older siblings
## 17                  Younger siblings
## 18                          Siblings

1.3. Value

DemoKin::kin() returns a list containing two data frames: kin_full and kin_summary.

str(br_2023)

## List of 2
##  $ kin_full   : tibble [142,814 × 7] (S3: tbl_df/tbl/data.frame)
##   ..$ kin      : chr [1:142814] "d" "d" "d" "d" ...
##   ..$ age_kin  : int [1:142814] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ age_focal: int [1:142814] 0 1 2 3 4 5 6 7 8 9 ...
##   ..$ living   : num [1:142814] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ dead     : num [1:142814] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ cohort   : logi [1:142814] NA NA NA NA NA NA ...
##   ..$ year     : logi [1:142814] NA NA NA NA NA NA ...
##  $ kin_summary: tibble [1,414 × 9] (S3: tbl_df/tbl/data.frame)
##   ..$ age_focal     : int [1:1414] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ kin           : chr [1:1414] "coa" "cya" "d" "gd" ...
##   ..$ year          : logi [1:1414] NA NA NA NA NA NA ...
##   ..$ count_living  : num [1:1414] 0.2183 0.0704 0 0 0 ...
##   ..$ mean_age      : num [1:1414] 11.48 5.53 NaN NaN NaN ...
##   ..$ sd_age        : num [1:1414] 8.08 4.7 NaN NaN NaN ...
##   ..$ count_dead    : num [1:1414] 0.000168 0.000105 0 0 0 ...
##   ..$ count_cum_dead: num [1:1414] 0.000168 0.000105 0 0 0 ...
##   ..$ mean_age_lost : num [1:1414] 0 0 NaN NaN NaN 0 0 0 0 NaN ...

`kin_full`

This data frame contains expected kin counts by year (or cohort), age of Focal, type of kin, and age of kin, including living and dead kin at that age.

head(br_2023$kin_full)

## # A tibble: 6 × 7
##   kin   age_kin age_focal living  dead cohort year 
##   <chr>   <int>     <int>  <dbl> <dbl> <lgl>  <lgl>
## 1 d           0         0      0     0 NA     NA   
## 2 d           0         1      0     0 NA     NA   
## 3 d           0         2      0     0 NA     NA   
## 4 d           0         3      0     0 NA     NA   
## 5 d           0         4      0     0 NA     NA   
## 6 d           0         5      0     0 NA     NA

`kin_summary`

This is a ‘summary’ data frame derived from kin_full. To produce it, we sum over all ages of kin to produce a data frame of expected kin counts by year or cohort and age of Focal (but not by age of kin). This is how the kin_summary object is derived:

kin_by_age_focal <- 
  br_2023$kin_full %>% 
  group_by(kin, age_focal) %>% 
  summarise(count = sum(living)) %>% 
  ungroup()

# Check that they are identical (for living kin only here)

kin_by_age_focal %>% 
  select(kin, age_focal, count) %>% 
  identical(
    br_2023$kin_summary %>% 
      select(kin, age_focal, count = count_living) %>% 
      arrange(kin, age_focal)
  )

## [1] TRUE

2. Example: kin counts in time-invariant populations

Following Caswell (2019), we assume a female closed population in which everyone experiences the Brazilian 2023 mortality and fertility rates at each age throughout their life. We then ask:

How can we characterize the kinship network of an average member of the population (call her ‘Focal’)?

output_kin <- c("c", "d", "gd", "ggd", "ggm", "gm", "m", "n", "a", "s")

# Run kinship models
br_2023 <- kin(p = br_surv_2023, f = br_asfr_2023, output_kin = output_kin, time_invariant = TRUE)

2.1. Living kin

Now, let’s visualize how the expected number of daughters, siblings, cousins, etc., changes over the life course of Focal (now, with full names to identify each relative type using the function DemoKin::rename_kin()).

br_2023$kin_summary %>%
  rename_kin() %>%
  ggplot() +
  geom_line(aes(age_focal, count_living))  +
  theme_bw() +
  labs(x = "Age of focal", y= "Number of living female relatives") +
  facet_wrap(~kin_label)

Note that we are working in a time invariant framework. You can think of the results as analogous to life expectancy (i.e., expected years of life for a synthetic cohort experiencing a given set of period mortality rates).
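
The life expectancy analogy can be made concrete. The sketch below uses made-up survival probabilities (a Gompertz-type schedule with arbitrary parameters, not the Brazilian data) and computes the life expectancy of a synthetic cohort exposed to one fixed set of period rates at every age. Time-invariant kin counts can be read in the same spirit.

```r
# Made-up period survival probabilities for ages 0 to 100
ages <- 0:100
px <- exp(-0.0002 * exp(0.085 * ages))

# Survivorship of a synthetic cohort exposed to these rates at every age
lx <- c(1, cumprod(px))

# Person-years lived in each age interval, assuming deaths occur
# mid-interval on average; survival beyond age 101 is ignored
Lx <- (lx[-length(lx)] + lx[-1]) / 2

# Life expectancy at birth of the synthetic cohort
e0 <- sum(Lx)
e0
```

No real cohort ever experiences exactly these rates throughout life; the result summarizes what the period rates imply if they stayed fixed.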

How does overall family size (and family composition) vary over life for an average woman who survives to each age?

counts <- 
  br_2023$kin_summary %>%
  group_by(age_focal) %>% 
  summarise(count_living = sum(count_living)) %>% 
  ungroup()

br_2023$kin_summary %>%
  select(age_focal, kin, count_living) %>% 
  rename_kin() %>% 
  ggplot(aes(x = age_focal, y = count_living)) +
  geom_area(aes(fill = kin_label), colour = "black") +
  geom_line(data = counts, size = 2) +
  labs(x = "Age of focal",
       y = "Number of living female relatives",
       fill = "Kin") +
  coord_cartesian(ylim = c(0, 6)) +
  theme_bw() +
  theme(legend.position = "bottom")

2.2. Age distribution of living kin

How old are Focal’s relatives? Using the kin_full data frame, we can visualize the age distribution of Focal’s relatives throughout Focal’s life. For example, what are the ages of her relatives when Focal is 35?

br_2023$kin_full %>%
  rename_kin() %>%
  filter(age_focal == 35) %>%
  ggplot() +
  geom_line(aes(age_kin, living)) +
  labs(x = "Age of kin",
       y = "Expected number of living relatives") +
  theme_bw() +
  facet_wrap(~kin_label)

2.3. Deceased kin

We have focused on living kin, but what about relatives who have died during Focal’s life? The output of kin also includes information on kin deaths experienced by Focal.

We start by considering the number of kin deaths that Focal can expect to experience at each age. In other words, the non-cumulative number of deaths in the family that Focal experiences at a given age.

loss1 <- 
  br_2023$kin_summary %>%
  filter(age_focal>0) %>%
  group_by(age_focal) %>% 
  summarise(count_dead = sum(count_dead)) %>% 
  ungroup()

br_2023$kin_summary %>%
  rename_kin() %>%
  filter(age_focal > 0) %>%
  group_by(age_focal, kin_label) %>%
  summarise(count_dead = sum(count_dead)) %>%
  ungroup() %>%
  ggplot(aes(x = age_focal, y = count_dead)) +
    geom_area(aes(fill = kin_label), colour = "black") +
    geom_line(data = loss1, size = 2) +
    labs(x = "Age of focal",
         y = "Number of kin deaths experienced at each age",
         fill = "Kin") +
    coord_cartesian(ylim = c(0, 0.086)) +
    theme_bw() +
    theme(legend.position = "bottom")

Now, we combine all kin types to show the cumulative burden of kin death for an average member of the population surviving to each age:

loss2 <- 
  br_2023$kin_summary %>%
  group_by(age_focal) %>% 
  summarise(count_cum_dead = sum(count_cum_dead)) %>% 
  ungroup()

br_2023$kin_summary %>%
  rename_kin() %>% 
  group_by(age_focal, kin_label) %>% 
  summarise(count_cum_dead = sum(count_cum_dead)) %>% 
  ungroup() %>% 
 ggplot(aes(x = age_focal, y = count_cum_dead)) +
  geom_area(aes(fill = kin_label), colour = "black") +
  geom_line(data = loss2, aes(y = count_cum_dead), size = 2) +
  labs(x = "Age of focal",
       y = "Number of kin deaths experienced (cumulative)",
       fill = "Kin") +
  theme_bw() +
  theme(legend.position = "bottom")

By ages 15, 50, and 65, a member of the population will have experienced, on average, the following cumulative numbers of deaths of relatives, respectively:

loss2 %>%
  filter(age_focal %in% c(15, 50, 65)) %>%
  select(count_cum_dead) %>%
  pull(count_cum_dead) %>%
  round(1) %>%
  paste(collapse = ", ")

## [1] "0.6, 2.2, 3.1"
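
The difference between the two measures of kin loss can be illustrated with made-up numbers. The sketch assumes that the cumulative count at each age is the running sum of the age-specific counts, which is how we read the `count_dead` and `count_cum_dead` columns; the values below are hypothetical.

```r
# Made-up expected number of kin deaths experienced at Focal's ages 0 to 4
count_dead <- c(0.01, 0.02, 0.02, 0.03, 0.05)

# Cumulative burden: deaths experienced up to and including each age
count_cum_dead <- cumsum(count_dead)

data.frame(age_focal = 0:4, count_dead, count_cum_dead)
```

The cumulative series can never decrease with Focal's age, whereas the age-specific series can go up or down.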

3. Exercises

For all exercises, assume time-invariant rates at the 2023 levels in the country of your choice and a female-only population.

3.1 Offspring availability and loss

Use DemoKin (assuming time-invariant rates at the 2023 levels in the country of your choice and a female-only population) to explore offspring survival and loss for mothers. Answer: What is the expected number of surviving offspring for an average woman aged 65?

Answer: What is the cumulative number of offspring deaths experienced by an average woman who survives to age 65?

3.2 Mean age of kin

The output of DemoKin::kin includes information on the average age of Focal’s relatives (in the columns kin_summary$mean_age and kin_summary$sd_age). For example, this allows us to determine the mean age, standard deviation and coefficient of variation of Focal’s sisters over Focal’s life course.
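
As a sketch of what such summaries represent, the following computes a mean age, standard deviation and coefficient of variation from a made-up age distribution of kin. It assumes the summaries are weighted by the expected number of living kin at each age; the vector `living` is hypothetical, not DemoKin output.

```r
# Made-up expected number of living kin by age of kin (ages 0 to 4)
age_kin <- 0:4
living  <- c(0.1, 0.2, 0.4, 0.2, 0.1)

# Weighted mean and standard deviation of kin age
mean_age <- sum(age_kin * living) / sum(living)
sd_age   <- sqrt(sum(living * (age_kin - mean_age)^2) / sum(living))

# Coefficient of variation: dispersion relative to the mean
cv_age <- sd_age / mean_age

c(mean = mean_age, sd = sd_age, cv = cv_age)
```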

3.2.1 Using only the raw output in kin_full, get the mean age of living mother, daughters and sisters for a female aged 35.

3.2.2 Living mother

What is the probability that Focal (an average woman in your country of choice) has a living mother over Focal’s life?

Instructions

Use DemoKin to obtain $M_1(a)$, the probability of having a living mother at age $a$ in a stable population. Conditional on ego’s survival, $M_1(a)$ can be thought of as a survival probability in a life table: it has to be equal to one when $a$ is equal to zero (the mother is alive when she gives birth), and goes monotonically to zero.
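
To build intuition for these properties, here is a minimal sketch that does not use DemoKin. It evaluates the classical expression $M_1(a) = \sum_x \pi(x)\, l(x+a)/l(x)$, where $\pi(x)$ is the age distribution of mothers at Focal's birth and $l(x)$ is survivorship. The rates are made up (arbitrary Gompertz mortality and a bell-shaped fertility schedule), and the population is assumed stationary (growth rate of zero) to keep $\pi(x)$ simple.

```r
ages <- 0:100

# Made-up mortality: Gompertz-type hazard with arbitrary parameters
px <- exp(-0.0002 * exp(0.085 * ages))
lx <- c(1, cumprod(px))[seq_along(ages)]  # survivorship to exact age x

# Made-up fertility schedule, positive between ages 15 and 49 only
fx <- ifelse(ages >= 15 & ages <= 49, dnorm(ages, mean = 28, sd = 6), 0)

# Age distribution of mothers at Focal's birth (stationary case, r = 0)
pi_x <- lx * fx / sum(lx * fx)

# Survivorship padded with zeros: nobody survives beyond age 100 here
lx_ext <- c(lx, rep(0, length(ages)))

# M1(a) = sum_x pi(x) * l(x + a) / l(x)
M1 <- sapply(ages, function(a) sum(pi_x * lx_ext[ages + a + 1] / lx))

round(M1[c(1, 26, 71)], 3)  # Focal aged 0, 25 and 70
```

In a one-sex model Focal has exactly one mother, so the expected number of living mothers at each age of Focal can be compared directly with this probability.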

Answer: What is the probability that Focal has a living mother when Focal turns 70 years old? And 25 years old?

Lab session 3 (Wednesday)

In today’s session we will see how to apply the one-sex time-variant model and the two-sex time-variant and time-invariant models.

First, load required libraries:

library(DemoKin)
library(dplyr)
library(tidyr)
library(ggplot2)

1. Living kin

We saw yesterday the case of Brazil in 2023 assuming constant rates. But the demography of Brazil is, as you know, changing every year. This means that Focal and her relatives will have experienced changing mortality and fertility rates over time.

Let’s select the Brazilian mortality and fertility rates for a range of years.

# Load function
source("UNWPP_data.R")

# Select country, year and sex
brazil_data <- UNWPP_data(country = "Brazil",
                   start_year = 1950 ,
                   end_year = 2023,
                   sex = "Female")

# Reshape fertility
br_asfr <- brazil_data %>%
  select(age, year, fx) %>%
  pivot_wider(names_from = year, values_from = fx) %>%
  select(-age) %>%
  as.matrix()

# Reshape survival
br_px <- brazil_data %>%
  select(age, year, px) %>%
  pivot_wider(names_from = year, values_from = px) %>%
  select(-age) %>%
  as.matrix()

The data we are using have years in columns and ages in rows. Here, we plot $q_x$ (the complement of $p_x$) over age and time:

br_px %>%
    as.data.frame() %>%
    mutate(age = 1:nrow(br_px)-1) %>%
    pivot_longer(-age, names_to = "year", values_to = "px") %>%
    mutate(qx = 1-px) %>%
    ggplot() +
    geom_line(aes(x = age, y = qx, col = year)) +
    scale_y_log10() +
    theme(legend.position = "none")

Age-specific fertility rates:

br_asfr %>% as.data.frame() %>%
     mutate(age = 1:nrow(br_asfr)-1) %>%
     pivot_longer(-age, names_to = "year", values_to = "asfr") %>%
     mutate(year = as.integer(year)) %>%
     ggplot() + geom_tile(aes(x = year, y = age, fill = asfr)) +
     scale_x_continuous(breaks = seq(1900,2020,10), labels = seq(1900,2020,10))

And the female population by age. Remember that on Monday we saw how to download and select the population counts (N); we will use them in this section.

# Load function to filter data
source("UNWPP_data.R")

br_pop <-
  UNWPP_pop(country_name = "Brazil",
            start_year = 1950,
            end_year = 2023,
            sex = "Female")

With this input we can model kinship structure in Age-Period-Cohort (APC) dimensions:

1.1 Cohort approach

Let’s take a look at the resulting kin counts from a time-variant (argument time_invariant = FALSE) model for a Focal born in 1960, limiting the output to a selection of relatives (see argument output_kin) and a given cohort (argument output_cohort). Do you see any new parameter?

br_time_varying_1960_cohort <-
  DemoKin::kin(p = br_px,
    f = br_asfr,
    n = br_pop,
    time_invariant =FALSE,
    output_cohort = 1960,
    output_kin = c("d","gd","ggd","m","gm","ggm"))

# plot
br_time_varying_1960_cohort$kin_summary %>%
  rename_kin() %>%
  ggplot(aes(age_focal,count_living)) +
  geom_line()+
  scale_y_continuous(name = "Expected number of living relatives",labels = seq(0,3,.2),breaks = seq(0,3,.2))+
  facet_wrap(~kin_label)+
  labs(x = "Age of Focal")+
  theme_bw()

These are the living kin for an average woman born in 1960, given the time-variant fertility, mortality and population distribution for the 1950-2023 period. Note the argument output_cohort = 1960, used to extract estimates for a given cohort of Focals (a diagonal in the Lexis diagram). This is a subset of all possible results (101 age classes and 74 years, 1950-2023). Estimates stop at age 63 because we only provided (period) input data up to year 2023 (2023 - 1960 = 63).
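
The Lexis-diagonal logic can be checked with a few lines of arithmetic. This sketch only uses the year range of the input data and the cohort chosen above; the object names are illustrative.

```r
# Period input data cover these calendar years
years  <- 1950:2023
cohort <- 1960

# A cohort is followed along a Lexis diagonal: age = year - cohort
ages_observed <- years[years >= cohort] - cohort

range(ages_observed)  # youngest and oldest age with estimates
```

Changing `cohort` to 1990 shows why the younger cohort can only be followed over a shorter age range.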

Let us now compare across cohorts. We can, for example compare the 1960 and 1990 cohorts.

br_time_varying_1990_1960_cohort <-
  kin(p = br_px,
    f = br_asfr,
    n = br_pop,
    time_invariant =FALSE,
    output_cohort = c(1990, 1960),
    output_kin = c("d","gd","ggd","m","gm","ggm"))

# plot
br_time_varying_1990_1960_cohort$kin_summary %>%
  rename_kin() %>%
  mutate(cohort = as.factor(cohort)) %>%
  ggplot(aes(age_focal,count_living,color=cohort)) +
  geom_line()+
  scale_y_continuous(name = "Expected number of living relatives",labels = seq(0,3,.2), breaks = seq(0,3,.2))+
  labs(x = "Age of Focal")+
  facet_wrap(~kin_label)+
  theme_bw()

1.2 Period approach

Maybe you are interested in taking a snapshot of kin distribution in some year, for example 1960. You can do this by specifying the argument output_period = 1960.

br_time_varying_1960_period <-
  kin(
    p = br_px,
    f = br_asfr,
    n = br_pop,
    time_invariant =FALSE,
    output_period = 1960,
    output_kin = c("d","gd","ggd","m","gm","ggm")
    )

# plot
br_time_varying_1960_period$kin_summary %>%
  rename_kin() %>%
  ggplot(aes(age_focal, count_living)) +
  geom_line() +
  scale_y_continuous(
    name = "Expected number of living relatives",
    limits = c(0, 5),  
    labels = seq(0, 5, 0.5),  
    breaks = seq(0, 5, 0.5)  
  ) +
  facet_wrap(~kin_label, scales = "free") +
  labs(x = "Age of Focal")+
  theme_bw()

Answer: Do these ‘period’ plots look similar to the ‘cohort’ plots shown above? When would you prefer a period over a cohort approach?

1.3 DemoKin doesn’t like cohort-period combinations

DemoKin will only return values for either periods OR cohorts, but never for period-cohort combinations. This is related to time/memory issues. For example, providing all possible period-cohort estimates in our example (74 years, 101 ages of Focal, 101 ages of kin and 14 kin types) would give a data frame with 74 x 101 x 101 x 14 ≈ 10.6 million rows.

Consider the following code, which will give an error since we are asking for both a cohort and period output at the same time:

kin(p = br_px,
    f = br_asfr,
    n = br_pop,
    time_invariant =FALSE,
    output_cohort = c(1960, 1990),
    output_period = 2000,
    output_kin = c("d","gd","ggd","m","gm","ggm"))

## Error in kin(p = br_px, f = br_asfr, n = br_pop, time_invariant = FALSE, : sorry, you can not select cohort and period. Choose one please

2. Kin death

Kin loss can have severe consequences for bereaved relatives as it affects, for example, the provision of care support and intergenerational transfers over the life course. The function kin provides information on the number of relatives lost by Focal during her life, stored in the column kin_summary$count_cum_dead. The plot below compares patterns of kin loss for the 1960 and 1990 cohorts.

br_time_varying_1990_1960_cohort$kin_summary %>%
  rename_kin() %>%
  mutate(cohort = as.factor(cohort)) %>%
  ggplot() +
  geom_line(aes(age_focal, count_cum_dead, col = cohort)) +
  labs(y = "Expected number of deceased relatives") +
  theme_bw() +
  labs(x = "Age of Focal")+
  facet_wrap(~kin_label,scales="free")

Answer: Based on the previous plot, which kin types show the largest differences in terms of kin loss across the two cohorts? Discuss with regards to absolute and relative differences in the expected number of deaths by kin type.

Given these population-level measures, we can also compute Focal’s mean age at the time of her relative’s death.

br_time_varying_1990_1960_cohort$kin_summary %>%
  rename_kin() %>%
  filter(age_focal == 30) %>%
  select(kin_label, cohort, mean_age_lost) %>%
  pivot_wider(names_from = cohort, values_from = mean_age_lost) %>%
  mutate_if(is.numeric, round, 1)

## # A tibble: 6 × 3
##   kin_label             `1960` `1990`
##   <chr>                  <dbl>  <dbl>
## 1 Daughters               23.8   23.1
## 2 Grand-daughters        NaN    NaN  
## 3 Great-grand-daughters  NaN    NaN  
## 4 Great-grandmothers       8.7   10.2
## 5 Grandmothers            16.1   17.1
## 6 Mother                  17.4   18.6

Answer: Consider a Focal aged 30 in both cohorts: how would you describe the differences in terms of her mean age at kin loss for different relative types?

3. Age-classified two-sex kinship models and some exercises

Human males generally live shorter and reproduce later than females. These sex-specific processes affect kinship dynamics in a number of ways. For example, the degree to which an average member of the population, call her Focal, has a living grandparent is affected by differential mortality affecting the parental generation at older ages. We may also be interested in considering how kinship structures vary by Focal’s sex: a male Focal may have a different number of grandchildren than a female Focal given differences in fertility by sex. Documenting these differences matters since women often face greater expectations to provide support and informal care to relatives. As they live longer, they may find themselves at greater risk of having no living kin. The function kin2sex implements two-sex kinship models as introduced by Caswell (2022).

3.1 Demographic rates by sex

Data on male fertility by age are less common than data on female fertility. Using a sample of 160 countries, Schoumaker (2019) shows that the male Total Fertility Rate (TFR) is almost always higher than the female TFR, and that this gap decreases with the fertility transition.

For this example, we use data from 2012 France (from Caswell (2022)) to exemplify the use of the two-sex function in DemoKin. Data on female and male fertility and mortality are included in the package.

age <- 0:100
ages <- length(age)
fra_fert_f <- fra_asfr_sex[,"ff"]
fra_fert_m <- fra_asfr_sex[,"fm"]
fra_surv_f <- fra_surv_sex[,"pf"]
fra_surv_m <- fra_surv_sex[,"pm"]

# plot
data.frame(value = c(fra_fert_f, fra_fert_m, fra_surv_f, fra_surv_m),
           age = rep(age, 4),
           sex = rep(c(rep("f", ages), rep("m", ages)), 2),
           risk = c(rep("Fertility rate", ages * 2), rep("Survival probability", ages * 2))) %>%
  ggplot(aes(age, value, col=sex)) +
  geom_line() +
  facet_wrap(~ risk, scales = "free_y") +
  theme_bw()
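
Two common summaries for comparing fertility schedules by sex are the total fertility rate (the sum of age-specific rates) and the mean age at childbearing. The sketch below computes both for made-up female and male schedules; the vectors `toy_ff` and `toy_fm` are hypothetical, their levels are not realistic, and they only mimic the pattern of later male fertility.

```r
ages <- 0:100

# Made-up fertility schedules: the male curve peaks three years later
toy_ff <- ifelse(ages >= 15 & ages <= 49, dnorm(ages, mean = 30, sd = 5), 0)
toy_fm <- ifelse(ages >= 15 & ages <= 59, dnorm(ages, mean = 33, sd = 6), 0)

# Total fertility rate: sum of single-year age-specific rates
tfr <- c(female = sum(toy_ff), male = sum(toy_fm))

# Mean age at childbearing, using mid-interval ages (x + 0.5)
mac <- c(female = sum((ages + 0.5) * toy_ff) / sum(toy_ff),
         male   = sum((ages + 0.5) * toy_fm) / sum(toy_fm))

tfr
mac
mac["male"] - mac["female"]  # difference in mean ages
```

The same expressions can be applied to `fra_fert_f` and `fra_fert_m` to summarize the French schedules plotted above.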

3.2 Time-invariant two-sex kinship models

We now introduce the function kin2sex, which is similar to the one-sex function kin (see ?kin) with two exceptions. First, the user needs to specify mortality and fertility by sex. Second, the user needs to indicate the sex of Focal (which is assumed to be female by default, as in the one-sex model). Let us first consider the application for time-invariant populations:

fra_kin_2sex <- kin2sex(
  pf = fra_surv_f,
  pm = fra_surv_m,
  ff = fra_fert_f,
  fm = fra_fert_m,
  time_invariant = TRUE,
  sex_focal = "f",
  birth_female = .5)

The output of kin2sex is equivalent to that of kin, except that it includes a column sex_kin to specify the sex of the given relatives. Take a look with head(fra_kin_2sex$kin_summary).

A note on terminology: The function kin2sex uses the same codes as kin to identify relatives (see demokin_codes). Note that when running a two-sex model, the code ‘m’ refers to either mothers or fathers! Use the column sex_kin to filter by the sex of a given relative. For example, in order to consider only sons and ignore daughters, use:

fra_kin_2sex$kin_summary %>%
  filter(kin == "d", sex_kin == "m") %>%
  head()

## # A tibble: 6 × 11
##   age_focal kin   sex_kin year  cohort count_living mean_age sd_age count_dead
##       <int> <chr> <chr>   <lgl> <lgl>         <dbl>    <dbl>  <dbl>      <dbl>
## 1         0 d     m       NA    NA                0      NaN    NaN          0
## 2         1 d     m       NA    NA                0      NaN    NaN          0
## 3         2 d     m       NA    NA                0      NaN    NaN          0
## 4         3 d     m       NA    NA                0      NaN    NaN          0
## 5         4 d     m       NA    NA                0      NaN    NaN          0
## 6         5 d     m       NA    NA                0      NaN    NaN          0
## # ℹ 2 more variables: count_cum_dead <dbl>, mean_age_lost <dbl>

Let’s select a few kin types and visualize the number of living kin by sex and Focal’s age.

kin_out <- fra_kin_2sex$kin_summary %>%
  rename_kin(sex = "2sex") %>%
  filter(kin_label %in% c("Children",
                          "Grand-childrens",
                          "Grandparents",
                          "Parents",
                          "Younger siblings",
                          "Older siblings"))

kin_out %>%
  summarise(count=sum(count_living), .by = c(kin_label, age_focal, sex_kin)) %>%
  ggplot(aes(age_focal, count, fill=sex_kin))+
  geom_area()+
  theme_bw() +
  labs(y = "Expected number of living kin by sex and Focal's age",
       x = "Age of Focal",
       fill = "Sex of Kin") +
  facet_wrap(~kin_label)

Information on kin availability by sex allows us to consider sex ratios, a traditional measure in demography, with females often in the denominator. The following figure, for example, shows that a 25-year-old French woman in our hypothetical population can expect to have 0.5 grandfathers for every grandmother. Is it always the case that the sex ratio will decrease with Focal’s age?

kin_out %>%
  group_by(kin_label, age_focal) %>%
  summarise(sex_ratio = sum(count_living[sex_kin=="m"], na.rm=T)/sum(count_living[sex_kin=="f"], na.rm=T)) %>%
  ggplot(aes(age_focal, sex_ratio))+
  geom_line()+
  theme_bw() +
  labs(y = "Sex ratio",
       x = "Age of Focal") +
  facet_wrap(~kin_label, scales = "free")
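
To make the arithmetic behind the sex ratio explicit, here is a minimal sketch for a single kin type and a single age of Focal. The counts in toy_counts are invented for illustration and chosen so that the ratio equals 0.5. The code uses base R only.

```r
# Invented expected counts of living grandparents for a Focal aged 25
toy_counts <- data.frame(
  kin_label    = c("Grandparents", "Grandparents"),
  sex_kin      = c("f", "m"),
  count_living = c(1.2, 0.6)
)

# Sex ratio = male kin per female kin (females in the denominator)
males     <- sum(toy_counts$count_living[toy_counts$sex_kin == "m"])
females   <- sum(toy_counts$count_living[toy_counts$sex_kin == "f"])
sex_ratio <- males / females
sex_ratio
```

A value below 1 means fewer living male kin than female kin of that type. The ratio is undefined when the expected number of female kin is zero.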

Question: Should the total number of living aunts be the same in the one-sex model as in the two-sex model? What about daughters?

The experience of kin loss for Focal depends on differences in mortality between the sexes. A female Focal starts losing fathers earlier than mothers. We see a slightly different pattern for grandparents, since Focal’s experience of grandparental loss depends on the initial availability of grandparents (i.e. if Focal’s grandfather died before her birth, she will never experience his death). What do you think?

kin_out %>%
  summarise(count=sum(count_dead), .by = c(kin_label, sex_kin, age_focal)) %>%
  ggplot(aes(age_focal, count, col=sex_kin))+
  geom_line()+
  theme_bw() +
  labs(y = "Expected number of deceased kin by sex and Focal's age",
       x = "Age of Focal",
       col = "Sex of Kin") +
  facet_wrap(~kin_label)

3.3 Time-variant two-sex kinship models

We now look at populations where demographic rates are not static but change from year to year. For this, we extend the period using the data in “docs/fra_2sex.Rdata”, which you can load with the function load as we did on previous days. These are UN data; as a further exercise, you could use HMD and HFD data to go further back in time. For this example, we will ‘pretend’ that male fertility rates are the same as female fertility rates but shifted to older ages, translating the age schedule by the difference in mean age at childbearing observed in 2012 (which you calculated before). There is in fact some data for the period 1998-2013 in the HFD, but we keep things simple for now (using it would also require extrapolating level and pattern back in time).

load("fra_2sex.Rdata")
years <- ncol(fra_asfr_females)
ages <- nrow(fra_asfr_females)

# difference between sex in mean age in 2012
mac_females_2012 <- sum(0:100 * fra_fert_f)/sum(fra_fert_f)
mac_males_2012   <- sum(0:100 * fra_fert_m)/sum(fra_fert_m)
dif_mac_2012     <- trunc(mac_males_2012 - mac_females_2012)

# create a matrix of male fertility
fra_asfr_males <- matrix(0, ages, years)
colnames(fra_asfr_males) <- colnames(fra_asfr_females)
fra_asfr_males[(dif_mac_2012+1):ages,] <- fra_asfr_females[1:(ages-dif_mac_2012),]

# plot any year
age <- 0:(ages - 1)  # ages on the x-axis (assumes rows start at age 0)
plot(age, fra_asfr_females[,"1990"], type = "l", col = 2, ylab = "asfr")
lines(age, fra_asfr_males[,"1990"], col = 4)
legend("topright", c("females", "males"), col = c(2,4), lty = 1)
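
The shift used above can be checked on a toy schedule. The sketch below is an illustration under stated assumptions: asfr_f and shift are invented values, and mac() is a hypothetical helper. It confirms that moving a schedule upward by a whole number of ages raises the mean age at childbearing by exactly that number, as long as no fertility falls off the top of the age range.

```r
# Toy female fertility schedule over ages 0..10 (invented values)
age    <- 0:10
asfr_f <- c(0, 0, 0, 0.1, 0.3, 0.4, 0.2, 0, 0, 0, 0)
shift  <- 2  # hypothetical age difference between sexes, in whole years

# Move the schedule `shift` ages upward; the youngest ages get zero fertility
n <- length(asfr_f)
asfr_m <- numeric(n)
asfr_m[(shift + 1):n] <- asfr_f[1:(n - shift)]

# Hypothetical helper: mean age at childbearing of a schedule
mac <- function(fx, x) sum(x * fx) / sum(fx)

mac_diff <- mac(asfr_m, age) - mac(asfr_f, age)
mac_diff
```

Because trunc() drops the decimal part of the observed difference, the shifted schedule reproduces only the whole-year part of the gap in mean ages. The level of fertility is unchanged whenever the rates at the last `shift` ages are zero.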

We now run the time-variant two-sex models (note the time_invariant = FALSE argument):

kin_out_time_variant <- kin2sex(
                      pf = fra_surv_females,
                      pm = fra_surv_males,
                      ff = fra_asfr_females,
                      fm = fra_asfr_males,
                      sex_focal = "f",
                      time_invariant = FALSE,
                      birth_female = .5,
                      output_cohort = 1950)

We can plot data on kin availability alongside values from a time-invariant model to show how demographic change matters: the time-variant models take into account changes derived from the demographic transition, whereas the time-invariant models assume never-changing rates. Are the effects the same for each sex?

kin_out_time_invariant <- kin2sex(
                      pf = fra_surv_females[,"1950"],
                      pm = fra_surv_males[,"1950"],
                      ff = fra_asfr_females[,"1950"],
                      fm = fra_asfr_males[,"1950"],
                      time_invariant = TRUE,
                      sex_focal = "f", birth_female = .5)


kin_out_time_variant$kin_summary %>%
  filter(cohort == 1950) %>% mutate(type = "Variant") %>%
  bind_rows(kin_out_time_invariant$kin_summary %>% mutate(type = "Invariant")) %>%
  mutate(kin_label = case_when(kin == "d" ~ "Children",
                         kin == "m" ~ "Parents",
                         kin == "gm" ~ "Grandparents",
                         kin == "ggm" ~ "Great-grandparents",
                         kin == "os" ~ "Siblings",
                         kin == "ys" ~ "Siblings",
                         kin == "oa" ~ "Aunts/Uncles",
                         kin == "ya" ~ "Aunts/Uncles",
                         TRUE ~ "error?")) %>%
  filter(kin_label %in% c("Children", "Parents", "Grandparents", "Great-grandparents", "Siblings", "Aunts/Uncles")) %>%
  group_by(type, kin_label, age_focal, sex_kin) %>%
  summarise(count=sum(count_living)) %>%
  ggplot(aes(age_focal, count, linetype=type))+
  geom_line()+ theme_bw() +
  labs(y = "Expected number of living kin by sex and Focal's age",
       x = "Age of Focal",
       linetype = "Model") +
  facet_grid(cols = vars(kin_label), rows=vars(sex_kin), scales = "free")

A note on interpretation: the model does not track lines of descent or ascent. This means, for example, that we cannot distinguish whether a granddaughter descends from Focal’s son or from Focal’s daughter. You can see this by inspecting the DemoKin output and checking which quantities can be constructed from the variables it actually contains.

Lab session 4 (Thursday)

1. What about countries for which we don’t have information on male fertility?

1.1 Living kin by sex

1.1.1 Download data for your country from the United Nations Population Division. This time, also use the parameter `sex = "Male"`, because you will need male-specific survival patterns by age and time. Don’t forget to reshape the data to matrix format as we did yesterday. For this exercise, assume the same fertility pattern for males as for females and answer the questions that follow.

Let’s see an example for Brazil:

# Load function
source("UNWPP_data.R")

# Female data
data_females <- UNWPP_data(country = "Brazil",
                   start_year =  1950,
                   end_year = 2023,
                   sex = "Female")

# Male data

data_males <- UNWPP_data(country = "Brazil",
                   start_year =  1950,
                   end_year = 2023,
                   sex = "Male")
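
As a reminder of the reshaping step, here is a minimal sketch that turns a long table (one row per age and year) into an age-by-year matrix. The data frame long and its column names (age, year, px) are hypothetical, since the actual columns returned by UNWPP_data may differ. Base R is used to keep the sketch free of dependencies.

```r
# Toy long-format data: one row per age and year (invented values,
# hypothetical column names)
long <- expand.grid(age = 0:2, year = 1950:1952)
long$px <- round(seq(0.90, 0.98, length.out = nrow(long)), 2)

# Sort so that ages run fastest within each year, then fill column by column
long <- long[order(long$year, long$age), ]
px_mat <- matrix(
  long$px,
  nrow = length(unique(long$age)),
  dimnames = list(sort(unique(long$age)), sort(unique(long$year)))
)

px_mat
```

The result has ages in rows and years in columns, the same layout as swe_px. Sorting before calling matrix() matters, because matrix() fills columns in the order in which the values arrive.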

1.1.2 In a time-invariant model: how many living grandmothers and grandfathers can a woman expect to have at age 15 in 1950 and in 2023, and what are their mean ages? Draw a conclusion based on the results.

1.1.3 Compare ‘kin sex ratios’ of grandparents, parents, daughters and siblings in a time-variant framework for the cohort 1950, at each age of Focal.

2. For the country of your choice, how many living kin by sex can a woman expect to have at age 25 in 2023? Run the following settings:

One-sex model; time-invariant rates; GKP factors
Two-sex model; time-variant rates; approximate male kin using the androgynous assumption (i.e., male fertility is equivalent to female fertility); use mortality rates for males and females
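
For the one-sex setting, applying GKP factors amounts to multiplying female-line counts by a fixed number per kin type. The sketch below assumes the usual convention (2 for daughters, mothers and sisters; 4 for grandmothers and aunts), which you should check against the course slides and Caswell (2022). The counts in one_sex are invented and do not come from any model run.

```r
# Invented one-sex (female-line) counts of living kin for a Focal aged 25
one_sex <- data.frame(
  kin          = c("d", "m", "gm", "os", "ys", "oa", "ya"),
  count_living = c(0.4, 0.95, 1.1, 0.5, 0.6, 0.7, 0.8)
)

# Assumed GKP multipliers by kin code
gkp <- c(d = 2, m = 2, os = 2, ys = 2, gm = 4, oa = 4, ya = 4)

# Approximate kin of both sexes by scaling the female-line counts
one_sex$count_both_sexes <- one_sex$count_living * unname(gkp[one_sex$kin])
one_sex
```

These multipliers give totals for both sexes combined. They do not say how the total splits between male and female kin, which is what the two-sex model adds.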

3. Extensions

For a detailed description of extensions of the matrix kinship model, see:

time-invariant rates (Caswell 2019),
multistate models (Caswell 2020),
time-varying rates (Caswell and Song 2021), and
two-sex models (Caswell 2022).

Assignment

Description

The assignment should be completed in groups (of three) that will be defined at the start of the course.

You will use data on kinship structures to benchmark formal models of kinship. For this exercise, you will use the DemoKin R package to implement formal models of kinship. You should choose one country and run four different models according to the following specifications:

One-sex model; approximate male kin using GKP factors
- time-invariant rates
- time-variant rates
Two-sex model; approximate male kin using the androgynous assumption
- time-invariant rates
- time-variant rates

Use the output of the four models to answer the following questions:

1. Plot the expected number of living relatives by age of Focal for each specification. For extra points (i.e., this is optional), also plot the expected number of deceased relatives by age of Focal.
2. Discuss 1-2 key insights: when would you use the different specifications? Consider the specific context and the data available for the country you selected. (max 250 words)
3. Can you think of other ways of incorporating male fertility into the kinship models (beyond the options we discussed in the course)? (max 250 words)

Handing in the assignment

Assignments (one per group) should be sent by email to martins@demogr.mpg.de before midnight of Friday, November 22. You should hand in the following files:

An .RMD file with all your code and answers to the exercise questions
A compiled .pdf of your markdown file showing all the code
All input data needed to replicate your code

References

Caswell, Hal. 2019. “The Formal Demography of Kinship: A Matrix Formulation.” Demographic Research 41 (September): 679–712. https://doi.org/10.4054/DemRes.2019.41.24.

———. 2020. “The Formal Demography of Kinship II: Multistate Models, Parity, and Sibship.” Demographic Research 42 (June): 1097–1146. https://doi.org/10.4054/DemRes.2020.42.38.

———. 2022. “The Formal Demography of Kinship IV: Two-Sex Models and Their Approximations.” Demographic Research 47 (September): 359–96. https://doi.org/10.4054/DemRes.2022.47.13.

Caswell, Hal, and Xi Song. 2021. “The Formal Demography of Kinship. III. Kinship Dynamics with Time-Varying Demographic Rates.” Demographic Research 45: 517–46.

Keyfitz, Nathan, and Hal Caswell. 2005. Applied Mathematical Demography. New York: Springer.