Kinship is a fundamental property of human populations and a key form of social structure. Demographers have long been interested in the interplay between demographic change and family configuration. This has led to the development of sophisticated methodological and conceptual approaches for the study of kinship, some of which are reviewed in this course.
Some useful things to know:
Download the syllabus: https://github.com/amandamartinsal/EDSD_kinship_24-25/blob/master/EDSD_2024_25_kinship_syllabus.pdf
The course’s slides are available here: https://github.com/amandamartinsal/EDSD_kinship_24-25/tree/master/slides
Find this website’s source code on GitHub: https://github.com/amandamartinsal/EDSD_kinship_24-25
The computer lab sessions start soon, so it would be great to have the R environment prepared in advance. First, you will need R and RStudio installed. Second, install the DemoKin package.
The DemoKin package includes data from Sweden as an example. This comes from the Human Mortality Database and Human Fertility Database.
swe_px matrix; survival probabilities by age
First we have survival probabilities by age:
## 1900 1901 1902 1903 1904
## 0 0.91060 0.90673 0.92298 0.91890 0.92357
## 1 0.97225 0.97293 0.97528 0.97549 0.97847
## 2 0.98525 0.98579 0.98630 0.98835 0.98921
## 3 0.98998 0.98947 0.99079 0.99125 0.99226
## 4 0.99158 0.99133 0.99231 0.99352 0.99272
It has years in columns and ages in rows. Plotting \(q_x\) (the complement of \(p_x\)) over age for 2018 gives:
swe_px %>%
as.data.frame() %>%
select(px = `2018`) %>%
mutate(ages = 1:nrow(swe_px)-1) %>%
ggplot() +
geom_line(aes(x = ages, y = 1-px)) +
scale_y_log10()
swe_asfr matrix; age-specific fertility rates
And age-specific fertility rates:
## 1900 1901 1902 1903
## 14 0.00013 0.00006 0.00008 0.00008
## 15 0.00053 0.00054 0.00057 0.00057
## 16 0.00275 0.00319 0.00322 0.00259
## 17 0.00932 0.00999 0.00965 0.00893
## 18 0.02328 0.02337 0.02347 0.02391
## 19 0.04409 0.04357 0.04742 0.04380
Plotted over age for the same year (2018):
swe_asfr %>%
as.data.frame() %>%
select(fx = `2018`) %>%
mutate(age = 1:nrow(swe_asfr)-1) %>%
ggplot() +
geom_line(aes(x = age, y = fx))
We can visualize the implied kin counts for a Focal girl aged 5 in a time-invariant population using a network or ‘Keyfitz’ kinship diagram (Keyfitz and Caswell 2005) with the plot_diagram function:
First, get vectors for a given year:
Run kinship models
swe_2018$kin_summary %>%
filter(age_focal == 5) %>%
select(kin, count = count_living) %>%
plot_diagram(rounding = 2)
We can access demographic rates from any country in the world, produced by the World Population Prospects (WPP) project.
In the lab sessions we will work with Brazil, so let’s look at the Brazilian female population in 2023. For today’s session the data are provided; you will download them later as homework.
We have to reshape fertility and mortality to create a matrix to be used by DemoKin (i.e., create a matrix with years as columns and ages as rows):
Reshape fertility
country_fert <- brazil_data %>%
select(age, year, fx) %>%
pivot_wider(names_from = year, values_from = fx) %>%
select(-age) %>%
as.matrix()
Reshape survival
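The survival reshape mirrors the fertility one, with px in place of fx. As a minimal base-R sketch of the target layout (ages in rows, years in columns), assuming a long table with columns age, year and px like brazil_data; the object names toy_long and toy_px and all values are hypothetical:

```r
# Toy long-format data (hypothetical values, not WPP data)
toy_long <- data.frame(
  age  = rep(0:2, times = 2),
  year = rep(c(2022, 2023), each = 3),
  px   = c(0.98, 0.99, 0.995, 0.985, 0.992, 0.996)
)

# Order by year, then age, so the values fill the matrix column by column
toy_long <- toy_long[order(toy_long$year, toy_long$age), ]

# Ages in rows, years in columns
toy_px <- matrix(
  toy_long$px,
  nrow = length(unique(toy_long$age)),
  dimnames = list(
    as.character(sort(unique(toy_long$age))),
    as.character(sort(unique(toy_long$year)))
  )
)

toy_px
```

Each column of the resulting matrix is the survival schedule for one year, which is the shape the kinship functions used in this course take as input.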
We can visualize the implied kin counts for a Focal girl aged 5 in a time-invariant population using a network or ‘Keyfitz’ kinship diagram (Keyfitz and Caswell 2005) with the plot_diagram function:
Run kinship models
br_2023$kin_summary %>%
filter(age_focal == 5) %>%
select(kin, count = count_living) %>%
plot_diagram(rounding = 2)
Now share with the class what you have discussed in your groups!
We will download fertility, mortality and population data for all countries from 1950 to 2023.
First, we will load the download_wpp24() function, which automates the process of downloading these data. Please note that this process may take a couple of minutes.
# Load the function to download data
source("UNWPP_download.R")
# Download the data
download_wpp24()
## [1] "2024 World Population Prospects data downloaded and saved successfully as CSV files."
After downloading the data, we can filter the information we are interested in, for example, the country of interest. To do this, we will load two functions:
UNWPP_data: This function filters the data for mortality (px) and fertility (fx) by the country of interest and the relevant time period.
UNWPP_pop: This function filters data for the population size (N), which will be necessary when discussing time-variant contexts on Wednesday.
To choose the country of interest, you can look at the name of the country here.
# Load function to filter data
source("UNWPP_data.R")
# Select country, year and sex to obtain px and fx
brazil_data <-
UNWPP_data(country = "Brazil",
start_year = 2023,
end_year = 2023,
sex = "Female")
# Select country, year and sex to obtain N
brazil_pop <-
UNWPP_pop(country = "Brazil",
start_year = 2023,
end_year = 2023,
sex = "Female")
Using the WPP data you downloaded, build the ‘Keyfitz’ kinship diagram for a Focal girl aged 5 in a time-invariant population. Discuss the results in relation to what we have seen for Brazil: identify the main differences and the reasons for them.
In today’s session we will see the DemoKin functions and how to run a one-sex, time-invariant model.
Today (and tomorrow) we will use the Brazilian data from WPP that we downloaded yesterday. Select the data again if necessary:
kin()
DemoKin can be used to compute the number and age distribution of Focal’s relatives under a range of assumptions, including living and deceased kin. The function DemoKin::kin() currently does most of the heavy lifting in terms of implementing matrix kinship models.
This is what it looks like in action, in this case assuming time-invariant demographic rates:
# First, reshape fertility and survival for a given year
br_asfr_2023 <- brazil_data %>%
select(age, year, fx) %>%
pivot_wider(names_from = year, values_from = fx) %>%
select(-age) %>%
as.matrix()
br_surv_2023 <- brazil_data %>%
select(age, year, px) %>%
pivot_wider(names_from = year, values_from = px) %>%
select(-age) %>%
as.matrix()
Relatives for the output_kin argument are identified by a unique code. Note that the relationship codes used in DemoKin differ from those in Caswell (2019). The equivalence between the two sets of codes is given in the following table:
## DemoKin Caswell Labels_female Labels_male
## 1 coa t Cousins from older aunts Cousins from older uncles
## 2 cya v Cousins from younger aunts Cousins from younger uncles
## 3 c <NA> Cousins Cousins
## 4 d a Daughters Sons
## 5 gd b Grand-daughters Grand-sons
## 6 ggd c Great-grand-daughters Great-grand-sons
## 7 ggm h Great-grandmothers Great-grandfathers
## 8 gm g Grandmothers Grandfathers
## 9 m d Mother Father
## 10 nos p Nieces from older sisters Nephews from older brothers
## 11 nys q Nieces from younger sisters Nephews from younger brothers
## 12 n <NA> Nieces Nephews
## 13 oa r Aunts older than mother Uncles older than fathers
## 14 ya s Aunts younger than mother Uncles younger than father
## 15 a <NA> Aunts Uncles
## 16 os m Older sisters Older brothers
## 17 ys n Younger sisters Younger brothers
## 18 s <NA> Sisters Brothers
## Labels_2sex
## 1 Cousins from older aunts/uncles
## 2 Cousins from younger aunts/uncles
## 3 Cousins
## 4 Children
## 5 Grand-childrens
## 6 Great-grand-childrens
## 7 Great-grandfparents
## 8 Grandparents
## 9 Parents
## 10 Niblings from older siblings
## 11 Niblings from younger siblings
## 12 Niblings
## 13 Aunts/Uncles older than parents
## 14 Aunts/Uncles younger than parents
## 15 Aunts/Uncles
## 16 Older siblings
## 17 Younger siblings
## 18 Siblings
DemoKin::kin() returns a list containing two data frames: kin_full and kin_summary.
## List of 2
## $ kin_full : tibble [142,814 × 7] (S3: tbl_df/tbl/data.frame)
## ..$ kin : chr [1:142814] "d" "d" "d" "d" ...
## ..$ age_kin : int [1:142814] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ age_focal: int [1:142814] 0 1 2 3 4 5 6 7 8 9 ...
## ..$ living : num [1:142814] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ dead : num [1:142814] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ cohort : logi [1:142814] NA NA NA NA NA NA ...
## ..$ year : logi [1:142814] NA NA NA NA NA NA ...
## $ kin_summary: tibble [1,414 × 9] (S3: tbl_df/tbl/data.frame)
## ..$ age_focal : int [1:1414] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ kin : chr [1:1414] "coa" "cya" "d" "gd" ...
## ..$ year : logi [1:1414] NA NA NA NA NA NA ...
## ..$ count_living : num [1:1414] 0.2183 0.0704 0 0 0 ...
## ..$ mean_age : num [1:1414] 11.48 5.53 NaN NaN NaN ...
## ..$ sd_age : num [1:1414] 8.08 4.7 NaN NaN NaN ...
## ..$ count_dead : num [1:1414] 0.000168 0.000105 0 0 0 ...
## ..$ count_cum_dead: num [1:1414] 0.000168 0.000105 0 0 0 ...
## ..$ mean_age_lost : num [1:1414] 0 0 NaN NaN NaN 0 0 0 0 NaN ...
kin_full
This data frame contains expected kin counts by year (or cohort), age of Focal, type of kin, and age of kin, including living and dead kin at that age.
## # A tibble: 6 × 7
## kin age_kin age_focal living dead cohort year
## <chr> <int> <int> <dbl> <dbl> <lgl> <lgl>
## 1 d 0 0 0 0 NA NA
## 2 d 0 1 0 0 NA NA
## 3 d 0 2 0 0 NA NA
## 4 d 0 3 0 0 NA NA
## 5 d 0 4 0 0 NA NA
## 6 d 0 5 0 0 NA NA
kin_summary
This is a ‘summary’ data frame derived from kin_full. To produce it, we sum over all ages of kin to produce a data frame of expected kin counts by year or cohort and age of Focal (but not by age of kin). This is how the kin_summary object is derived:
kin_by_age_focal <-
br_2023$kin_full %>%
group_by(kin, age_focal) %>%
summarise(count = sum(living)) %>%
ungroup()
# Check that they are identical (for living kin only here)
kin_by_age_focal %>%
select(kin, age_focal, count) %>%
identical(
br_2023$kin_summary %>%
select(kin, age_focal, count = count_living) %>%
arrange(kin, age_focal)
)
## [1] TRUE
Following Caswell (2019), we assume a female closed population in which everyone experiences the Brazilian 2023 mortality and fertility rates at each age throughout their life. We then ask:
How can we characterize the kinship network of an average member of the population (call her ‘Focal’)?
output_kin <- c("c", "d", "gd", "ggd", "ggm", "gm", "m", "n", "a", "s")
# Run kinship models
br_2023 <- kin(p = br_surv_2023, f = br_asfr_2023, output_kin = output_kin, time_invariant = TRUE)
Now, let’s visualize how the expected number of daughters, siblings, cousins, etc., changes over the life course of Focal (now, with full names to identify each relative type using the function DemoKin::rename_kin()).
br_2023$kin_summary %>%
rename_kin() %>%
ggplot() +
geom_line(aes(age_focal, count_living)) +
theme_bw() +
labs(x = "Age of focal", y= "Number of living female relatives") +
facet_wrap(~kin_label)
Note that we are working in a time-invariant framework. You can think of the results as analogous to life expectancy (i.e., expected years of life for a synthetic cohort experiencing a given set of period mortality rates).
How does overall family size (and family composition) vary over life for an average woman who survives to each age?
counts <-
br_2023$kin_summary %>%
group_by(age_focal) %>%
summarise(count_living = sum(count_living)) %>%
ungroup()
br_2023$kin_summary %>%
select(age_focal, kin, count_living) %>%
rename_kin() %>%
ggplot(aes(x = age_focal, y = count_living)) +
geom_area(aes(fill = kin_label), colour = "black") +
geom_line(data = counts, linewidth = 2) +
labs(x = "Age of focal",
y = "Number of living female relatives",
fill = "Kin") +
coord_cartesian(ylim = c(0, 6)) +
theme_bw() +
theme(legend.position = "bottom")
How old are Focal’s relatives? Using the kin_full data frame, we can visualize the age distribution of Focal’s relatives throughout Focal’s life. For example, what are the ages of her relatives when Focal is 35?
br_2023$kin_full %>%
rename_kin() %>%
filter(age_focal == 35) %>%
ggplot() +
geom_line(aes(age_kin, living)) +
labs(x = "Age of kin",
y = "Expected number of living relatives") +
theme_bw() +
facet_wrap(~kin_label)
We have focused on living kin, but what about relatives who have died during Focal’s life? The output of kin also includes information on kin deaths experienced by Focal.
We start by considering the number of kin deaths that Focal can expect to experience at each age. In other words, the non-cumulative number of deaths in the family that Focal experiences at a given age.
loss1 <-
br_2023$kin_summary %>%
filter(age_focal>0) %>%
group_by(age_focal) %>%
summarise(count_dead = sum(count_dead)) %>%
ungroup()
br_2023$kin_summary %>%
rename_kin() %>%
filter(age_focal > 0) %>%
group_by(age_focal, kin_label) %>%
summarise(count_dead = sum(count_dead)) %>%
ungroup() %>%
ggplot(aes(x = age_focal, y = count_dead)) +
geom_area(aes(fill = kin_label), colour = "black") +
geom_line(data = loss1, linewidth = 2) +
labs(x = "Age of focal",
y = "Number of kin deaths experienced at each age",
fill = "Kin") +
coord_cartesian(ylim = c(0, 0.086)) +
theme_bw() +
theme(legend.position = "bottom")
Now, we combine all kin types to show the cumulative burden of kin death for an average member of the population surviving to each age:
loss2 <-
br_2023$kin_summary %>%
group_by(age_focal) %>%
summarise(count_cum_dead = sum(count_cum_dead)) %>%
ungroup()
br_2023$kin_summary %>%
rename_kin() %>%
group_by(age_focal, kin_label) %>%
summarise(count_cum_dead = sum(count_cum_dead)) %>%
ungroup() %>%
ggplot(aes(x = age_focal, y = count_cum_dead)) +
geom_area(aes(fill = kin_label), colour = "black") +
geom_line(data = loss2, aes(y = count_cum_dead), linewidth = 2) +
labs(x = "Age of focal",
y = "Number of kin deaths experienced (cumulative)",
fill = "Kin") +
theme_bw() +
theme(legend.position = "bottom")
On average, a member of the population aged 15, 50, or 65 will have experienced the following cumulative numbers of kin deaths, respectively:
loss2 %>%
filter(age_focal %in% c(15, 50, 65)) %>%
select(count_cum_dead) %>%
pull(count_cum_dead) %>%
round(1) %>%
paste(collapse = ", ")
## [1] "0.6, 2.2, 3.1"
For all exercises, assume time-invariant rates at the 2023 levels in the country of your choice and a female-only population.
Use DemoKin (assuming time-invariant rates at the 2023 levels in the country of your choice and a female-only population) to explore offspring survival and loss for mothers.
Answer: What is the expected number of surviving offspring for an average woman aged 65?
Answer: What is the cumulative number of offspring deaths experienced by an average woman who survives to age 65?
The output of DemoKin::kin includes information on the average age of Focal’s relatives (in the columns kin_summary$mean_age and kin_summary$sd_age). For example, this allows us to determine the mean age, standard deviation and coefficient of variation of Focal’s sisters over Focal’s life-course:
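These summary columns can be read as moments of the age distribution of living kin. A minimal sketch, assuming the mean and standard deviation are weighted by the expected number of living sisters at each age of kin (the living column of kin_full); the ages and counts below are hypothetical, not model output:

```r
# Hypothetical age distribution of Focal's living sisters at one age of Focal
age_kin <- c(30, 35, 40)
living  <- c(0.2, 0.5, 0.3)  # expected number of living sisters at each age

# Weighted mean and standard deviation of kin age
mean_age <- sum(age_kin * living) / sum(living)
sd_age   <- sqrt(sum(age_kin^2 * living) / sum(living) - mean_age^2)

# Coefficient of variation: dispersion relative to the mean
cv_age <- sd_age / mean_age

round(c(mean = mean_age, sd = sd_age, cv = cv_age), 3)
```

With the course objects, the same coefficient of variation can be obtained by dividing kin_summary$sd_age by kin_summary$mean_age for the sister rows at each age of Focal.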
What is the probability that Focal (an average woman in your country of choice) has a living mother over Focal’s life?
Instructions
Use DemoKin to obtain \(M_1(a)\), the probability of having a living mother at age \(a\) in a stable population. Conditional on Focal’s survival, \(M_1(a)\) can be thought of as a survival probability in a life table: it has to be equal to one when \(a\) is equal to zero (the mother is alive when she gives birth), and it decreases monotonically to zero.
Answer: What is the probability that Focal has a living mother when Focal turns 70 years old? And 25 years old?
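To build intuition before using DemoKin, the classic expression \(M_1(a) = \sum_x w(x)\, l(x+a)/l(x)\) can be computed by hand, where \(w(x)\) is the distribution of mothers’ ages at childbirth and \(l(x)\) is survivorship. The sketch below is an illustration only: the survivorship curve and the age distribution of mothers are hypothetical, and survival beyond age 100 is assumed to be zero.

```r
# Hypothetical survivorship l(x) for ages 0..100 (toy Gompertz-type curve, not real data)
max_age <- 100
ages <- 0:max_age
lx <- exp(-0.0001 * (exp(0.09 * ages) - 1) / 0.09)

# Hypothetical distribution of mothers' ages at childbirth, restricted to ages 15..49
w <- dnorm(ages, mean = 28, sd = 6) * (ages >= 15 & ages <= 49)
w <- w / sum(w)

# M1(a): average over mothers' ages x of the probability of surviving from x to x + a
m1 <- sapply(ages, function(a) {
  x <- ages[w > 0]
  surv <- ifelse(x + a <= max_age,
                 lx[pmin(x + a, max_age) + 1] / lx[x + 1],
                 0)
  sum(w[x + 1] * surv)
})

# Probability of a living mother when Focal is 25 and 70 (toy inputs)
round(m1[c(25, 70) + 1], 3)
```

The vector starts at one for \(a = 0\) and never increases with \(a\), matching the life-table interpretation above. In a one-sex, time-invariant model, the expected number of living mothers at each age of Focal plays the same role.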
In today’s session we will see how to apply the one-sex time-variant model and the two-sex time-variant and time-invariant models.
We saw yesterday the case of Brazil in 2023 assuming constant rates. But the demography of Brazil is, as you know, changing every year. This means that Focal and her relatives will have experienced changing mortality and fertility rates over time.
Let’s select the Brazilian mortality and fertility rates for a range of years.
# Load function
source("UNWPP_data.R")
# Select country, year and sex
brazil_data <- UNWPP_data(country = "Brazil",
start_year = 1950 ,
end_year = 2023,
sex = "Female")
# Reshape fertility
br_asfr <- brazil_data %>%
select(age, year, fx) %>%
pivot_wider(names_from = year, values_from = fx) %>%
select(-age) %>%
as.matrix()
# Reshape survival
br_px <- brazil_data %>%
select(age, year, px) %>%
pivot_wider(names_from = year, values_from = px) %>%
select(-age) %>%
as.matrix()
The data we are using has years in columns and ages in rows. Here, we plot \(q_x\) (the complement of \(p_x\)) over age and time:
br_px %>%
as.data.frame() %>%
mutate(age = 1:nrow(br_px)-1) %>%
pivot_longer(-age, names_to = "year", values_to = "px") %>%
mutate(qx = 1-px) %>%
ggplot() +
geom_line(aes(x = age, y = qx, col = year)) +
scale_y_log10() +
theme(legend.position = "none")
Age-specific fertility rates:
br_asfr %>% as.data.frame() %>%
mutate(age = 1:nrow(br_asfr)-1) %>%
pivot_longer(-age, names_to = "year", values_to = "asfr") %>%
mutate(year = as.integer(year)) %>%
ggplot() + geom_tile(aes(x = year, y = age, fill = asfr)) +
scale_x_continuous(breaks = seq(1900,2020,10), labels = seq(1900,2020,10))
And female population by age (remember that on Monday we saw how to download and select the population, N; we will use it in this section):
# Load function to filter data
source("UNWPP_data.R")
br_pop <-
UNWPP_pop(country_name = "Brazil",
start_year = 1950,
end_year = 2023,
sex = "Female")
With this input we can model kinship structure in Age-Period-Cohort (APC) dimensions:
Let’s take a look at the resulting kin counts from a time-variant (argument time_invariant = FALSE) model for a Focal born in 1960, limiting the output to a selection of relatives (see argument output_kin) and a given cohort (argument output_cohort). Do you see any new parameter?
br_time_varying_1960_cohort <-
DemoKin::kin(p = br_px,
f = br_asfr,
n = br_pop,
time_invariant =FALSE,
output_cohort = 1960,
output_kin = c("d","gd","ggd","m","gm","ggm"))
# plot
br_time_varying_1960_cohort$kin_summary %>%
rename_kin() %>%
ggplot(aes(age_focal,count_living)) +
geom_line()+
scale_y_continuous(name = "Expected number of living relatives",labels = seq(0,3,.2),breaks = seq(0,3,.2))+
facet_wrap(~kin_label)+
labs(x = "Age of Focal")+
theme_bw()
These are the expected numbers of living kin for an average woman born in 1960, given the time-variant fertility, mortality and population distribution for the 1950-2023 period. Note the argument output_cohort = 1960, used to extract estimates for a given cohort of Focals (a diagonal in the Lexis diagram). This is a subset of all possible results (101 age classes and 74 years, 1950-2023). Estimates stop at age 63 because we only provided (period) input data up to year 2023 (2023 - 1960 = 63).
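The Lexis-diagonal logic can be checked with a few lines of arithmetic. This is only an illustrative sketch; the object names are hypothetical and no model is run:

```r
# Years covered by the input data and the cohort of interest
years_available <- 1950:2023
cohort <- 1960

# A cohort is observed at age = year - cohort, from its birth year onwards
ages_observed <- years_available[years_available >= cohort] - cohort

range(ages_observed)   # youngest and oldest age at which the 1960 cohort is observed
length(years_available) # number of calendar years in the input
```

The same calculation for the 1990 cohort gives a maximum observed age of 33, which is why cohort comparisons are limited to the ages both cohorts have reached.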
Let us now compare across cohorts. We can, for example, compare the 1960 and 1990 cohorts.
br_time_varying_1990_1960_cohort <-
kin(p = br_px,
f = br_asfr,
n = br_pop,
time_invariant =FALSE,
output_cohort = c(1990, 1960),
output_kin = c("d","gd","ggd","m","gm","ggm"))
# plot
br_time_varying_1990_1960_cohort$kin_summary %>%
rename_kin() %>%
mutate(cohort = as.factor(cohort)) %>%
ggplot(aes(age_focal,count_living,color=cohort)) +
geom_line()+
scale_y_continuous(name = "Expected number of living relatives",labels = seq(0,3,.2), breaks = seq(0,3,.2))+
labs(x = "Age of Focal")+
facet_wrap(~kin_label)+
theme_bw()
Maybe you are interested in taking a snapshot of kin distribution in some year, for example 1960. You can do this by specifying the argument output_period = 1960.
br_time_varying_1960_period <-
kin(
p = br_px,
f = br_asfr,
n = br_pop,
time_invariant =FALSE,
output_period = 1960,
output_kin = c("d","gd","ggd","m","gm","ggm")
)
# plot
br_time_varying_1960_period$kin_summary %>%
rename_kin() %>%
ggplot(aes(age_focal, count_living)) +
geom_line() +
scale_y_continuous(
name = "Expected number of living relatives",
limits = c(0, 5),
labels = seq(0, 5, 0.5),
breaks = seq(0, 5, 0.5)
) +
facet_wrap(~kin_label, scales = "free") +
labs(x = "Age of Focal")+
theme_bw()
Answer: Do these ‘period’ plots look similar to the ‘cohort’ plots shown above? When would you prefer a period over a cohort approach?
DemoKin will only return values for either periods OR cohorts, but never for period-cohort combinations. This is related to time/memory issues. E.g., with 119 years of input data, providing all possible period-cohort estimates would give a data frame with 119 x 101 x 101 x 14 ≈ 17 million rows.
Consider the following code, which will give an error since we are asking for both a cohort and period output at the same time:
kin(p = br_px,
f = br_asfr,
n = br_pop,
time_invariant =FALSE,
output_cohort = c(1960, 1990),
output_period = 2000,
output_kin = c("d","gd","ggd","m","gm","ggm"))
## Error in kin(p = br_px, f = br_asfr, n = br_pop, time_invariant = FALSE, : sorry, you can not select cohort and period. Choose one please
Kin loss can have severe consequences for bereaved relatives as it affects, for example, the provision of care support and intergenerational transfers over the life course. The function kin provides information on the number of relatives lost by Focal during her life, stored in the column kin_summary$count_cum_dead. The plot below compares patterns of kin loss for the 1960 and 1990 cohorts.
br_time_varying_1990_1960_cohort$kin_summary %>%
rename_kin() %>%
mutate(cohort = as.factor(cohort)) %>%
ggplot() +
geom_line(aes(age_focal, count_cum_dead, col = cohort)) +
labs(y = "Expected number of deceased relatives") +
theme_bw() +
labs(x = "Age of Focal")+
facet_wrap(~kin_label,scales="free")
Answer: Based on the previous plot, which kin types show the largest differences in terms of kin loss across the two cohorts? Discuss with regard to absolute and relative differences in the expected number of deaths by kin type.
Given these population-level measures, we can also compute Focal’s mean age at the time of her relative’s death.
br_time_varying_1990_1960_cohort$kin_summary %>%
rename_kin() %>%
filter(age_focal == 30) %>%
select(kin_label, cohort, mean_age_lost) %>%
pivot_wider(names_from = cohort, values_from = mean_age_lost) %>%
mutate_if(is.numeric, round, 1)
## # A tibble: 6 × 3
## kin_label `1960` `1990`
## <chr> <dbl> <dbl>
## 1 Daughters 23.8 23.1
## 2 Grand-daughters NaN NaN
## 3 Great-grand-daughters NaN NaN
## 4 Great-grandmothers 8.7 10.2
## 5 Grandmothers 16.1 17.1
## 6 Mother 17.4 18.6
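One way to read these columns is as running totals over Focal’s age. The sketch below assumes that the cumulative count is the running sum of age-specific kin deaths and that the mean age at loss weights Focal’s age by those deaths; the values are hypothetical and the object names are for illustration only:

```r
# Hypothetical kin deaths experienced by Focal at each age (one kin type)
age_focal  <- 0:4
count_dead <- c(0, 0.01, 0.02, 0.01, 0.04)

# Cumulative kin deaths experienced up to each age of Focal
count_cum_dead <- cumsum(count_dead)

# Focal's mean age at kin loss, among losses experienced so far
mean_age_lost <- cumsum(age_focal * count_dead) / count_cum_dead

data.frame(age_focal, count_dead, count_cum_dead, mean_age_lost)
```

Where no deaths have been experienced yet, the ratio is 0/0 and R returns NaN, which is consistent with the NaN entries shown for kin types that a 30-year-old Focal has not yet lost.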
Answer: Consider a Focal aged 30 in both cohorts: how would you describe the differences in terms of her mean age at kin loss for different relative types?
Human males generally live shorter and reproduce later than females.
These sex-specific processes affect kinship dynamics in a number of
ways. For example, the degree to which an average member of the
population, call her Focal, has a living grandparent is affected by
differential mortality affecting the parental generation at older ages.
We may also be interested in considering how kinship structures vary by
Focal’s sex: a male Focal may have a different number of grandchildren
than a female Focal given differences in fertility by sex. Documenting
these differences matters since women often face greater expectations to
provide support and informal care to relatives. As they live longer,
they may find themselves at greater risk of having no living kin.
The function kin2sex implements two-sex kinship models as introduced by Caswell (2022).
Data on male fertility by age are less common than data on female fertility. Using a sample of 160 countries, Schoumaker (2019) shows that the male Total Fertility Rate (TFR) is almost always higher than the female TFR, and that this gap decreases over the fertility transition.
For this example, we use data from 2012 France (from Caswell (2022)) to exemplify the use of the two-sex function in DemoKin. Data on female and male fertility and mortality are included in the package.
age <- 0:100
ages <- length(age)
fra_fert_f <- fra_asfr_sex[,"ff"]
fra_fert_m <- fra_asfr_sex[,"fm"]
fra_surv_f <- fra_surv_sex[,"pf"]
fra_surv_m <- fra_surv_sex[,"pm"]
# plot
data.frame(value = c(fra_fert_f, fra_fert_m, fra_surv_f, fra_surv_m),
age = rep(age, 4),
sex = rep(c(rep("f", ages), rep("m", ages)), 2),
risk = c(rep("Fertility rate", ages * 2), rep("Survival probability", ages * 2))) %>%
ggplot(aes(age, value, col=sex)) +
geom_line() +
facet_wrap(~ risk, scales = "free_y") +
theme_bw()
We now introduce the function kin2sex, which is similar to the one-sex function kin (see ?kin) with two exceptions. First, the user needs to specify mortality and fertility by sex. Second, the user needs to indicate the sex of Focal (which is assumed to be female by default, as in the one-sex model). Let us first consider the application for time-invariant populations:
fra_kin_2sex <- kin2sex(
pf = fra_surv_f,
pm = fra_surv_m,
ff = fra_fert_f,
fm = fra_fert_m,
time_invariant = TRUE,
sex_focal = "f",
birth_female = .5)
The output of kin2sex is equivalent to that of kin, except that it includes a column sex_kin to specify the sex of the given relatives. Take a look with head(fra_kin_2sex$kin_summary).
A note on terminology: The function kin2sex uses the same codes as kin to identify relatives (see demokin_codes()). Note that when running a two-sex model, the code ‘m’ refers to either mothers or fathers! Use the column sex_kin to filter the sex of a given relative. For example, in order to consider only sons and ignore daughters, use:
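A minimal sketch of this filter on a toy data frame that has the same kin and sex_kin columns as the two-sex output (the object names and values are hypothetical). On the course object, the same condition would be applied to fra_kin_2sex$kin_summary, whose first rows are shown next:

```r
# Toy stand-in for a two-sex kin_summary table (hypothetical values)
toy_summary <- data.frame(
  age_focal    = c(30, 30, 30, 30),
  kin          = c("d", "d", "m", "m"),
  sex_kin      = c("f", "m", "f", "m"),
  count_living = c(0.45, 0.47, 0.98, 0.95)
)

# In the two-sex output the code "d" covers all children;
# adding sex_kin == "m" keeps sons and drops daughters
sons <- subset(toy_summary, kin == "d" & sex_kin == "m")

sons
```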
## # A tibble: 6 × 11
## age_focal kin sex_kin year cohort count_living mean_age sd_age count_dead
## <int> <chr> <chr> <lgl> <lgl> <dbl> <dbl> <dbl> <dbl>
## 1 0 d m NA NA 0 NaN NaN 0
## 2 1 d m NA NA 0 NaN NaN 0
## 3 2 d m NA NA 0 NaN NaN 0
## 4 3 d m NA NA 0 NaN NaN 0
## 5 4 d m NA NA 0 NaN NaN 0
## 6 5 d m NA NA 0 NaN NaN 0
## # ℹ 2 more variables: count_cum_dead <dbl>, mean_age_lost <dbl>
Let’s select a subset of kin types and visualize the number of living kin by sex and Focal’s age.
kin_out <- fra_kin_2sex$kin_summary %>%
rename_kin(sex = "2sex") %>%
filter(kin_label %in% c("Children",
"Grand-childrens",
"Grandparents",
"Parents",
"Younger siblings",
"Older siblings"))
kin_out %>%
summarise(count=sum(count_living), .by = c(kin_label, age_focal, sex_kin)) %>%
ggplot(aes(age_focal, count, fill=sex_kin))+
geom_area()+
theme_bw() +
labs(y = "Expected number of living kin by sex and Focal's age",
x = "Age of Focal",
fill = "Sex of Kin") +
facet_wrap(~kin_label)
Information on kin availability by sex allows us to consider sex ratios, a traditional measure in demography, with females often in the denominator. The following figure, for example, shows that a 25-year-old French woman in our hypothetical population can expect to have 0.5 grandfathers for every grandmother. Is it always the case that the sex ratio decreases with Focal’s age?
kin_out %>%
group_by(kin_label, age_focal) %>%
summarise(sex_ratio = sum(count_living[sex_kin=="m"], na.rm=T)/sum(count_living[sex_kin=="f"], na.rm=T)) %>%
ggplot(aes(age_focal, sex_ratio))+
geom_line()+
theme_bw() +
labs(y = "Sex ratio",
x = "Age of Focal") +
facet_wrap(~kin_label, scales = "free")
Answer: Should the total number of living aunts be the same in the one-sex model compared to the two-sex models? What about daughters?
The experience of kin loss for Focal depends on differences in mortality between sexes. A female Focal starts losing fathers earlier than mothers. We see a slightly different pattern for grandparents since Focal’s experience of grandparental loss is dependent on the initial availability of grandparents (i.e., if one of Focal’s grandparents died before her birth, she will never experience that death). What do you think?
kin_out %>%
summarise(count=sum(count_dead), .by = c(kin_label, sex_kin, age_focal)) %>%
ggplot(aes(age_focal, count, col=sex_kin))+
geom_line()+
theme_bw() +
labs(y = "Expected number of deceased kin by sex and Focal's age",
x = "Age of Focal",
col = "Sex of Kin") +
facet_wrap(~kin_label)
We now look at populations where demographic rates are not static but change on a yearly basis. For this, we extend the period using data located in “docs/fra_2sex.Rdata”, which you can load with the function load, as we did on previous days. This is UN data, so another exercise could be done with HMD and HFD data going further back in time. For this example, we will ‘pretend’ that male fertility rates are the same as female fertility rates but shifted to slightly older ages, translating the schedule by the difference in the mean age at childbearing observed in 2012 (which you calculated before). There is actually some male fertility data for the period 1998-2013 in the HFD, but we keep things simple for now (using it would also require extrapolating level and pattern back in time).
load("fra_2sex.Rdata")
years <- ncol(fra_asfr_females)
ages <- nrow(fra_asfr_females)
# difference between sex in mean age in 2012
mac_females_2012 <- sum(0:100 * fra_fert_f)/sum(fra_fert_f)
mac_males_2012 <- sum(0:100 * fra_fert_m)/sum(fra_fert_m)
dif_mac_2012 <- trunc(mac_males_2012 - mac_females_2012)
# create a matrix of male fertility
fra_asfr_males <- matrix(0, ages, years)
colnames(fra_asfr_males) <- colnames(fra_asfr_females)
fra_asfr_males[(dif_mac_2012+1):ages,] <- fra_asfr_females[1:(ages-dif_mac_2012),]
# plot any year
plot(age, fra_asfr_females[,"1990"], t="l", col=2, ylab = "asfr")
lines(age, fra_asfr_males[,"1990"], col=4)
legend("topright", c("females", "males"), col=c(2,4), lty=1)
We now run the time-variant two-sex models (note the time_invariant = FALSE argument):
kin_out_time_variant <- kin2sex(
pf = fra_surv_females,
pm = fra_surv_males,
ff = fra_asfr_females,
fm = fra_asfr_males,
sex_focal = "f",
time_invariant = FALSE,
birth_female = .5,
output_cohort = 1950)
We can plot data on kin availability alongside values coming from a time-invariant model to show how demographic change matters: the time-variant models take into account changes derived from the demographic transition, whereas the time-invariant models assume never-changing rates. Are the effects the same for each sex?
kin_out_time_invariant <- kin2sex(
pf = fra_surv_females[,"1950"],
pm = fra_surv_males[,"1950"],
ff = fra_asfr_females[,"1950"],
fm = fra_asfr_males[,"1950"],
time_invariant = TRUE,
sex_focal = "f", birth_female = .5)
kin_out_time_variant$kin_summary %>%
filter(cohort == 1950) %>% mutate(type = "Variant") %>%
bind_rows(kin_out_time_invariant$kin_summary %>% mutate(type = "Invariant")) %>%
mutate(kin_label = case_when(kin == "d" ~ "Children",
kin == "m" ~ "Parents",
kin == "gm" ~ "Grandparents",
kin == "ggm" ~ "Great-grandparents",
kin == "os" ~ "Siblings",
kin == "ys" ~ "Siblings",
kin == "oa" ~ "Aunts/Uncles",
kin == "ya" ~ "Aunts/Uncles",
T ~ "error?")) %>%
filter(kin_label %in% c("Children", "Parents", "Grandparents", "Great-grandparents", "Siblings", "Aunts/Uncles")) %>%
group_by(type, kin_label, age_focal, sex_kin) %>%
summarise(count=sum(count_living)) %>%
ggplot(aes(age_focal, count, linetype=type))+
geom_line()+ theme_bw() +
labs(y = "Expected number of living kin by sex and Focal's age",
x = "Age of Focal",
linetype = "Model") +
facet_grid(cols = vars(kin_label), rows=vars(sex_kin), scales = "free")
An interpretation note: we are not tracking lines of descent or ascent. This means, for example, that grand-daughters cannot be differentiated by whether they are the offspring of Focal’s son or of Focal’s daughter. You can see this by looking at the DemoKin output and checking which quantities can be constructed from the available variables.
sex = "Male" because you will need male-specific survival patterns by age and time. Don’t forget to reshape the data to matrix format as we did yesterday. Assume for this exercise the same fertility pattern for males as for females and answer:
Let’s see an example for Brazil:
One-sex model; time-invariant rates; GKP factors
Two-sex model; time-variant rates; approximate male kin using the androgynous assumption (i.e., male fertility is equivalent to female fertility); use mortality rates for males and females
One-sex model; time-invariant rates; GKP factors
Two-sex model; time-variant rates; approximate male kin using the androgynous assumption
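GKP factors approximate two-sex kin counts by multiplying the female kin counts from a one-sex model by a fixed factor per kin type. The sketch below assumes the multipliers commonly used under the androgynous approximation (2 for daughters, mothers and sisters; 4 for grand-daughters, grandmothers, nieces and aunts; 8 for great-grand-daughters, great-grandmothers and cousins); check them against Caswell (2022) before relying on them. The kin counts are hypothetical.

```r
# Assumed GKP multipliers, indexed by DemoKin kin code
gkp_factors <- c(d = 2, gd = 4, ggd = 8,
                 m = 2, gm = 4, ggm = 8,
                 s = 2, n = 4, a = 4, c = 8)

# Hypothetical one-sex (female) kin counts for a Focal of a given age
female_counts <- c(d = 0.9, m = 0.95, gm = 0.6, s = 1.1)

# Approximate counts of kin of both sexes
two_sex_approx <- female_counts * gkp_factors[names(female_counts)]

two_sex_approx
```

Because the factors are constant, this approximation cannot reflect sex differences in mortality or fertility, which is what the two-sex specifications are meant to capture.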
For a detailed description of extensions of the matrix kinship model, see:
The assignment should be completed in groups (of three) that will be defined at the start of the course.
You will use data on kinship structures to benchmark formal models of kinship. For this exercise, you will use the DemoKin R package to implement formal models of kinship. You should choose one country and run four different models according to the following specifications:
Use the output of the four models to answer the following questions:
1. Plot the expected number of living relatives by age of focal for each specification. For extra points (i.e., this is optional), also plot the expected number of deceased relatives by age of focal.
2. Discuss 1-2 key insights: when would you use different specifications? Consider the specific context and the data available for the country you selected. (max 250 words)
3. Can you think of other ways of incorporating male fertility into the kinship models (beyond the options we discussed in the course)? (max 250 words)
Assignments (one per group) should be sent by email to martins@demogr.mpg.de before midnight of Friday, November 22. You should hand in the following files: