This release adds a pipeline of composable functions for building contact
matrices ([.contact_survey, assign_age_groups(), weigh(), compute_matrix(),
symmetrise(), split_matrix(), per_capita()) and a contact_matrix S3
class. The vignette and README are rewritten around the pipeline (#288).
Minimum R version bumped to 4.1.0 (from 3.5.0). Examples in the pipeline
functions use the native |> pipe, introduced in 4.1.0.
Terminal age group labels now use [N,Inf) notation instead of N+ when
bracket notation is used (e.g. [0,5), [5,15), [15,Inf)). This matches
the contactmatrix package and gives parseable interval notation across all
age groups. It affects matrix dimnames and the age.group column in
$participants; code that matches on strings like "15+" will need
updating to "[15,Inf)". Dash notation (e.g. "15+") is unchanged.
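Here is a minimal base R sketch of the new label format. It assumes labels are built from sorted lower age limits, with an open-ended terminal group. bracket_labels_sketch() is a hypothetical helper for illustration, not a socialmixr function. The sketch also shows why string matches on "15+" need updating.

```r
# Hypothetical helper (not part of socialmixr): builds bracket-notation
# labels from lower age limits, closing the last group at Inf.
bracket_labels_sketch <- function(limits) {
  limits <- sort(unique(limits))
  upper <- c(limits[-1], Inf)
  paste0("[", limits, ",", upper, ")")
}

labels <- bracket_labels_sketch(c(0, 5, 15))
# labels is c("[0,5)", "[5,15)", "[15,Inf)")

# Old-style string matching silently finds nothing; the new label matches.
any(labels == "15+")         # FALSE
any(labels == "[15,Inf)")    # TRUE
```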
New [.contact_survey method allows filtering survey objects with
expressions, e.g. polymod[country == "United Kingdom"] (#161).
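The sketch below shows one way expression-based filtering can work. It uses a hypothetical toy_survey class rather than socialmixr's contact_survey. The expression is evaluated inside the participants table, and contacts are then restricted to the retained participants. Whether the real [.contact_survey method treats contacts the same way is an assumption here.

```r
# Hypothetical toy survey class (not socialmixr's contact_survey):
# a list of participants and contacts linked by part_id.
toy_survey <- function(participants, contacts) {
  structure(list(participants = participants, contacts = contacts),
            class = "toy_survey")
}

# Subset method: evaluate the filter expression inside the participants
# table, then keep only contacts reported by the retained participants.
`[.toy_survey` <- function(x, i) {
  keep <- eval(substitute(i), x$participants, parent.frame())
  keep <- !is.na(keep) & keep            # treat NA as "drop"
  parts <- x$participants[keep, , drop = FALSE]
  conts <- x$contacts[x$contacts$part_id %in% parts$part_id, , drop = FALSE]
  toy_survey(parts, conts)
}

s <- toy_survey(
  data.frame(part_id = 1:3, country = c("UK", "BE", "UK")),
  data.frame(part_id = c(1, 1, 2, 3), cnt_age = c(5, 30, 40, 70))
)
uk <- s[country == "UK"]
```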
New weigh() function for composable participant weighting: supports
day-of-week groups, named target vectors, direct numeric columns, and
population post-stratification (#161).
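Population post-stratification is commonly done by weighting each participant by the ratio of their age group's population share to its sample share. The base R sketch below uses a hypothetical helper, post_stratify_sketch(). It is not the weigh() implementation, and weigh()'s exact formula is not stated here.

```r
# Hypothetical helper (not socialmixr's weigh()): post-stratification weight
# = population share of the participant's group / sample share of that group.
# Groups missing from `pop` get NA weights.
post_stratify_sketch <- function(age_group, pop) {
  n <- length(age_group)
  sample_share <- table(factor(age_group, levels = names(pop))) / n
  pop_share <- pop / sum(pop)
  unname(pop_share[age_group] / as.numeric(sample_share[age_group]))
}

w <- post_stratify_sketch(
  c("child", "child", "adult", "adult"),
  c(child = 20, adult = 80)
)
# Children are over-represented (50% of sample vs 20% of population),
# so they are down-weighted; weights sum to the sample size.
```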
New compute_matrix() function computes a contact matrix from a prepared
survey. It is the final step of the pipeline after assign_age_groups()
and (optionally) weigh() (#161).
New post-processing functions symmetrise(), split_matrix(), and
per_capita() operate on compute_matrix() output. symmetrise() enforces
reciprocity, split_matrix() decomposes into mean contacts, normalisation,
and an assortativity matrix, and per_capita() converts to per-capita rates.
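The sketch below illustrates symmetrisation and per-capita conversion in base R. It assumes rows index participant age groups, columns index contact age groups, and N_i is the population of group i. Under those assumptions it applies a common formulation, c'_ij = (c_ij N_i + c_ji N_j) / (2 N_i), and then divides column j by N_j. The helpers are hypothetical; whether symmetrise() and per_capita() match these formulas exactly is an assumption.

```r
# Hypothetical helpers, not socialmixr's implementation.
# m[i, j]: mean contacts a participant in group i reports with group j.
# pop[i]:  population size of age group i.
symmetrise_sketch <- function(m, pop) {
  total <- pop * m                    # total contacts from group i to j (row-wise scaling)
  (total + t(total)) / (2 * pop)      # average both directions, back to per-participant
}

per_capita_sketch <- function(m, pop) {
  sweep(m, 2, pop, "/")               # divide column j by the population of group j
}

m <- matrix(c(2, 1, 4, 3), nrow = 2)
pop <- c(100, 200)
s <- symmetrise_sketch(m, pop)
# Reciprocity holds afterwards: s[1, 2] * pop[1] equals s[2, 1] * pop[2].
```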
Example workflow (#161):
library(socialmixr)
uk_pop <- data.frame(
lower.age.limit = c(0, 5, 15),
population = c(3500000, 6000000, 50000000)
)
polymod[country == "United Kingdom"] |>
assign_age_groups(age_limits = c(0, 5, 15)) |>
compute_matrix() |>
symmetrise(survey_pop = uk_pop)
Pipeline functions (compute_matrix(), symmetrise(), split_matrix(),
per_capita()) return objects of S3 class contact_matrix, with print(), plot(),
and as.matrix() methods. The class inherits from list, so existing code
using $matrix or $participants continues to work.
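The design works because the class vector ends in "list": S3 methods dispatch on the specific class, while $ access falls back to ordinary list behaviour. The sketch below illustrates this. It uses a hypothetical class name, contact_matrix_sketch, so that it does not mask the real socialmixr methods. It is not the package's constructor.

```r
# Hypothetical constructor illustrating S3 inheritance from list.
new_contact_matrix_sketch <- function(mat, participants) {
  structure(
    list(matrix = mat, participants = participants),
    class = c("contact_matrix_sketch", "list")
  )
}

# Methods dispatch on the specific class...
as.matrix.contact_matrix_sketch <- function(x, ...) x$matrix
print.contact_matrix_sketch <- function(x, ...) {
  cat("<contact_matrix_sketch>", nrow(x$matrix), "age groups\n")
  invisible(x)
}

# ...while list-style access such as cm$matrix keeps working.
m <- matrix(c(2, 1, 4, 3), nrow = 2)
cm <- new_contact_matrix_sketch(
  m,
  data.frame(age.group = c("[0,5)", "[5,Inf)"), participants = c(10, 20))
)
```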
New contact_age_distribution() function extracts the empirical age
distribution of contacts from a survey. Pass it to
assign_age_groups(estimated_contact_age = ...) to impute ages from
ranges by sampling from the reference distribution instead of uniformly.
This matters for surveys where many contacts have broad age bands, since
uniform sampling would flatten age-assortativity.
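The difference between the two imputation strategies can be sketched in base R. impute_age_sketch() is a hypothetical helper, not socialmixr's algorithm. It draws an exact age for each contact reported as a range [lo, hi] from the reference ages that fall in that range. If no reference age falls in range, it falls back to uniform sampling over integer ages.

```r
# Hypothetical helper (not socialmixr's implementation).
impute_age_sketch <- function(lo, hi, reference) {
  mapply(function(l, h) {
    candidates <- reference[reference >= l & reference <= h]
    if (length(candidates) == 0) {
      # Fallback: uniform over integer ages l..h (safe even when l == h)
      return(l + sample.int(h - l + 1, 1) - 1)
    }
    # Sample an observed age, so common ages are drawn more often
    candidates[sample.int(length(candidates), 1)]
  }, lo, hi)
}

# Reference distribution heavily concentrated at age 30 within [30, 39].
reference <- c(rep(30, 9), 39)
set.seed(1)
ages <- impute_age_sketch(rep(30, 100), rep(39, 100), reference)
# Most imputed ages are 30; uniform sampling would spread them over 30..39.
```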
New agegroups_to_limits() function converts age group labels back to lower
age limits, the inverse of limits_to_agegroups().
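One plausible way to invert labels back to limits is to take the leading number of each label. The sketch below does this with a regular expression and handles bracket ("[5,15)"), dash ("5-14") and plus ("15+") notation. lower_limits_sketch() is hypothetical and not socialmixr's agegroups_to_limits().

```r
# Hypothetical parser: extracts the lower age limit as the first number
# in each label, skipping an optional leading "[".
lower_limits_sketch <- function(labels) {
  as.numeric(sub("^\\[?([0-9.]+).*$", "\\1", labels))
}

lower_limits_sketch(c("[0,5)", "[5,15)", "[15,Inf)"))  # 0 5 15
lower_limits_sketch(c("0-4", "5-14", "15+"))           # 0 5 15
```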
compute_matrix() gains a weight_threshold parameter to cap extreme
weights before normalisation, matching the contact_matrix() option (#131).
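The sketch below shows weight capping under an assumed standardise, cap, re-standardise procedure. cap_weights_sketch() is hypothetical and is not compute_matrix()'s code. Note that re-standardising can push the largest weight back above the threshold. Capping still reduces the spread between weights.

```r
# Hypothetical helper illustrating a weight threshold.
cap_weights_sketch <- function(w, threshold = NA) {
  w <- w / mean(w)                   # standardise so weights average 1
  if (!is.na(threshold)) {
    w <- pmin(w, threshold)          # cap extreme standardised weights
    w <- w / mean(w)                 # re-standardise after capping
  }
  w
}

raw <- c(1, 1, 1, 9)
uncapped <- cap_weights_sketch(raw)                  # max/min ratio 9
capped <- cap_weights_sketch(raw, threshold = 2)     # max/min ratio 6
```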
Fixed bug where participants with NA dayofweek were incorrectly weighted
as weekend days. They now receive an average weight across all days (#131).
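The sketch below illustrates the pitfall and one reading of the fix. It assumes dayofweek runs from 0 (Sunday) to 6 (Saturday), weekdays are up-weighted to 5/7 of the total and weekends to 2/7, and "average weight across all days" means the 7-day average of the weekday and weekend weights. All three are assumptions, and dayofweek_weights_sketch() is hypothetical.

```r
# Hypothetical day-of-week weighting (not socialmixr's weigh()).
# Assumes at least one known weekday and one known weekend participant.
dayofweek_weights_sketch <- function(dayofweek) {
  known <- !is.na(dayofweek)
  weekday <- dayofweek %in% 1:5      # NA %in% 1:5 is FALSE: the old pitfall
  n_known <- sum(known)
  w_weekday <- (5 / 7) / (sum(known & weekday) / n_known)
  w_weekend <- (2 / 7) / (sum(known & !weekday) / n_known)
  w <- ifelse(weekday, w_weekday, w_weekend)
  # Assumed fix: unknown days get the 7-day average, not the weekend weight
  w[!known] <- (5 * w_weekday + 2 * w_weekend) / 7
  w
}

w <- dayofweek_weights_sketch(c(1, 2, 3, 0, NA))
```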
Fixed unmatched-merge warning count when merging files with duplicate keys; previously, the count could be wrong (or negative) due to counting join pairs rather than distinct matched rows (#289).
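In the base R illustration below, a key appears several times in the additional file. The merge therefore yields more join pairs than there are matched rows. Subtracting pairs from rows goes negative, while counting distinct unmatched keys gives the right answer. This shows the counting issue only and is not socialmixr's merge code.

```r
# Base R illustration of join pairs vs distinct matched rows.
participants <- data.frame(part_id = c(1, 2, 3))
extra <- data.frame(part_id = c(1, 1, 1, 1, 2), wave = 1:5)

joined <- merge(participants, extra, by = "part_id")   # 5 join pairs

naive_unmatched <- nrow(participants) - nrow(joined)   # 3 - 5 = -2 (wrong)
distinct_unmatched <- sum(!participants$part_id %in% extra$part_id)  # 1 (part_id 3)
```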
matrix_plot() now restores all graphical parameters (par()) on exit,
including when the function errors mid-plot. Previously, the legend
parameters (new, pty) and the error-reporting parameter (err) were left
modified in the user's session (#307).
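The usual base R idiom for this is to snapshot par() and register the restore with on.exit(), which also runs when the function errors. plot_matrix_sketch() below is a hypothetical function illustrating the pattern, not matrix_plot() itself.

```r
# Hypothetical plotting function showing save-and-restore of par().
plot_matrix_sketch <- function(m, fail = FALSE) {
  old_par <- par(no.readonly = TRUE)   # snapshot all settable parameters
  on.exit(par(old_par), add = TRUE)    # runs on normal exit and on error
  par(pty = "s", mar = c(2, 2, 1, 1))
  image(m)
  if (fail) stop("simulated failure mid-plot")
  invisible(m)
}
```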
wpp_age() and wpp_countries() are now soft-deprecated. Pass population
data directly via the survey_pop argument instead. The underlying
wpp2017 data is also outdated; the wpp2024 package from GitHub provides
more recent data (#258).
contact_matrix() now warns when it would look up population data
automatically via wpp_age(). This automatic lookup happens when
symmetric, split, per_capita, weigh_age, or return_demography is
set and countries is given (or the participant data has a country
column) without an explicit survey_pop. The implicit lookup will be
removed in a future release; pass survey_pop directly (e.g. from
survey_country_population() or the wpp2024 package) to silence the
warning and make the code forwards-compatible.
get_survey(), download_survey(), list_surveys(), get_citation(), and
survey_countries() now warn unconditionally when called. These functions
were soft-deprecated in 0.5.0 and users should switch to the
contactsurveys package (#269).
contact_matrix() now uses assign_age_groups() internally, removing
duplicated code (#227).
contact_matrix() now uses weigh() internally for all weighting
(day-of-week, age, and user-defined). The helpers
warn_multiple_observations() and normalise_weights() were extracted
so compute_matrix() can share them (#131).
Enabled cyclocomp_linter, line_length_linter, and object_usage_linter.
Disabled indentation_linter (air handles indentation). Reduced cyclomatic
complexity of check.contact_survey(), [.contact_survey(),
find_unique_key(), and try_merge_additional_files() by extracting
helper functions (#289).
This is a patch release with a bug fix and documentation updates.
load_survey() no longer fails when merging contact files that lack a
cont_id column (#278).
The vignette now points to the contactsurveys package for downloading surveys from Zenodo, and no longer uses deprecated functions (#269).
Added Nicholas Tierney (@njtierney) as package author (#277).
This release focuses on improved modularity and flexibility for contact matrix workflows. Key highlights include new standalone functions for age group assignment and population data retrieval, more intuitive handling of age limits, and the beginning of a transition to the contactsurveys package for survey downloads.
contact_matrix() now preserves all user-specified age_limits, even when
no participants exist in some age groups. Previously, age groups beyond the
maximum participant age were silently dropped. Empty age groups now show
0 participants and NA values in the matrix. This may change matrix dimensions
for existing code (@Bisaloo, #144, #231).
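Base R's cut() illustrates the same idea. When ages are cut into fixed groups, groups with no participants remain as factor levels, so a count table keeps one entry per requested group. This is an analogy for the behaviour described above, not socialmixr's code.

```r
# Base R illustration: empty age groups are kept as zero-count levels.
age_limits <- c(0, 5, 15, 65)
participant_ages <- c(1, 3, 7, 12)     # nobody aged 15 or over
groups <- cut(participant_ages, breaks = c(age_limits, Inf), right = FALSE)
counts <- table(groups)
# counts has four entries: [0,5) = 2, [5,15) = 2, [15,65) = 0, [65,Inf) = 0
```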
contact_matrix(counts = TRUE)$matrix now returns an array rather than an
xtabs object. This matches the existing output format of
contact_matrix(counts = FALSE)$matrix (@Bisaloo, #118).
When age_limits is not specified, it is now inferred from both participant
and contact ages, not just participant ages. This may result in more age
groups if contacts include ages beyond the participant age range (#230).
as_contact_survey() no longer requires country and year columns. These
columns are now auto-detected if present, but surveys without them can be
loaded successfully (#193, #199).
New assign_age_groups() and survey_country_population() functions allow
modular pre-processing of survey data (#131, #226).
Reduced verbosity: messages about dropping participants or contacts with missing ages are no longer shown (#228).
clean() now correctly processes age values with units (e.g., "6 months",
"52 weeks") (@LloydChapman, #250, #256).
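The sketch below converts ages with units to years in base R. It assumes 12 months and 52 weeks per year; the conversion factors clean() actually uses are not stated here. age_in_years_sketch() is a hypothetical helper.

```r
# Hypothetical helper (not socialmixr's clean()): converts ages with
# month/week units to years; plain numbers are taken as years.
age_in_years_sketch <- function(x) {
  x <- trimws(tolower(x))
  value <- as.numeric(sub("^([0-9.]+).*$", "\\1", x))
  divisor <- ifelse(grepl("month", x), 12,
             ifelse(grepl("week", x), 52, 1))
  value / divisor
}

age_in_years_sketch(c("6 months", "52 weeks", "3"))   # 0.5 1 3
```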
contact_matrix() now warns when a survey contains multiple observations per
participant, as results will aggregate across all observations (#260).
load_survey() now correctly loads longitudinal surveys with repeated
observations per participant (e.g., sday files with wave/studyDay columns).
Previously, these columns were silently dropped (@njtierney, #192, #194).
Fixed a bug leading to excess contacts with NA age if the lowest age group
did not start at 0 (@lwillem, #170).
Argument names with dots (e.g., age.limits) have been deprecated in favour
of underscores (e.g., age_limits) in contact_matrix(),
as_contact_survey(), pop_age(), and clean(). The old argument names
still work but will produce deprecation warnings (#160).
get_survey(), download_survey(), get_citation(), list_surveys(), and
survey_countries() have been soft-deprecated and moved to
contactsurveys. This is
part of decoupling these features from socialmixr to reduce dependencies
(@njtierney, #179, #207). These will continue to work until version 1.0.0.
The missing_contact_age = "sample" option in contact_matrix() and
assign_age_groups() has been soft-deprecated. Use "remove" to exclude
participants who have any contacts with missing ages, "keep" to retain such
contacts as a separate age group, or "ignore" to drop only those contacts (#273).
limits_to_agegroups() has been changed to return bracket-notated age ranges by default.
An error in list_surveys() that stopped it from working was fixed.
contact_matrix() was updated to accept only survey objects, not DOIs, which matches the documentation. It is still possible to get a contact matrix from a DOI, but the survey must first be obtained with get_survey():
# No longer works!
contact_matrix("10.5281/zenodo.1095664")
# Recommended workflow
get_survey("10.5281/zenodo.1095664") |>
contact_matrix()
The efficiency of contact_matrix() was improved.
The cite() function has been deprecated and replaced with get_citation() (#84).
The columns argument has been removed from check.survey() (#81).
The complexity of download_survey() has been reduced by externalising the find_common_prefix() function and failing early instead of relying on unnecessary if/else sequences.
The error argument has been removed from check(), which now always returns warnings. To turn these warnings into errors, use options(warn = 2).
The quiet argument has been removed from check(), cite(), contact_matrix(), and get_survey(). To silence diagnostic messages, use idiomatic R mechanisms such as suppressMessages().
The n and bootstrap options of contact_matrix() have been deprecated and replaced with a sample.participants argument; bootstrapping is now explained in the vignette instead.
Added a matrix_plot() function to plot contact matrices.
chkDots() is now used to ensure that no argument is silently ignored by S3 methods.
get_survey() has been split into separate functions for downloading and processing survey data.
reduce_agegroups
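The removed error and quiet arguments of check() have base R replacements. The minimal sketch below demonstrates them with noisy_check(), a hypothetical stand-in for any function that emits a diagnostic message and a warning.

```r
# Hypothetical stand-in function, not socialmixr's check().
noisy_check <- function() {
  message("diagnostic: checking columns")
  warning("column 'x' not found")
  invisible(TRUE)
}

# Silence diagnostic messages (the warning still surfaces):
suppressMessages(noisy_check())

# Promote warnings to errors for a block of code, then restore the option:
old <- options(warn = 2)
result <- tryCatch(noisy_check(), error = function(e) "error raised")
options(old)
```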