Title: | Subnational Data for COVID-19 Epidemiology |
---|---|
Description: | An interface to subnational and national level COVID-19 data sourced from both official sources, such as Public Health England in the UK, and from other COVID-19 data collections, including the World Health Organisation (WHO), European Centre for Disease Prevention and Control (ECDC), Johns Hopkins University (JHU), Google Open Data and others. Designed to streamline COVID-19 data extraction, cleaning, and processing from a range of data sources in an open and transparent way. This allows users to inspect and scrutinise the data, and the tools used to process it, at every step. For all countries supported, data includes a daily time-series of cases. Where available, data is also provided for deaths, hospitalisations, and tests. National level data are also supported using a range of sources. |
Authors: | Joseph Palmer [aut] , Katharine Sherratt [aut] , Richard Martin-Nielsen [aut] (https://github.com/RichardMN), Jonnie Bevan [aut], Hamish Gibbs [aut] , Hugo Gruson [aut] , Sophie Meakin [ctb], Joel Hellewell [ctb] , Patrick Barks [ctb], Paul Campbell [ctb], Flavio Finger [ctb] , Richard Boyes [ctb] (https://github.com/rboyes), Vang Le [ctb] (https://github.com/biocyberman), Sebastian Funk [aut], Sam Abbott [aut, cre] |
Maintainer: | Sam Abbott <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.9.3 |
Built: | 2024-10-31 21:26:01 UTC |
Source: | https://github.com/epiforecasts/covidregionaldata |
Adds extra columns filled with NAs to a dataset. This ensures that all datasets from the covidregionaldata package return datasets of the same underlying structure (i.e. same columns).
add_extra_na_cols(data)
data | A data frame |
A tibble with relevant NA columns added
Compulsory processing functions: calculate_columns_from_existing_data(), complete_cumulative_columns(), fill_empty_dates_with_na()
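The idea behind add_extra_na_cols() can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper name add_na_cols_sketch, the expected_cols argument and the toy data are hypothetical, and the package keeps its own list of standard columns, which is not reproduced here.

```r
# Hypothetical stand-in for add_extra_na_cols(); `expected_cols` is an
# assumed list of standard column names, not the package's actual list.
add_na_cols_sketch <- function(data, expected_cols) {
  # Columns that the standard structure expects but the data lacks
  missing_cols <- setdiff(expected_cols, names(data))
  for (col in missing_cols) {
    data[[col]] <- rep(NA_real_, nrow(data))
  }
  data
}

toy <- data.frame(
  date = as.Date("2020-03-01") + 0:1,
  cases_new = c(3, 5)
)
expected_cols <- c("cases_new", "cases_total", "deaths_new", "deaths_total")
padded <- add_na_cols_sketch(toy, expected_cols)
names(padded)
```

Existing columns are left untouched; only the absent ones are appended and filled with NA.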
Available datasets
all_country_data
An object of class tbl_df (inherits from tbl, data.frame) with 23 rows and 10 columns.
A tibble of available datasets and related information.
Information for downloading, cleaning and processing COVID-19 region level 1 and 2 data for Belgium.
covidregionaldata::DataClass
-> Belgium
origin
name of origin to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
ISO 3166-2 codes are used for both region and province levels in Belgium, and for provinces these are marked as being iso_3166_2_province.
common_data_urls
List of named links to raw data that are common across levels.
level_data_urls
List of named links to raw data specific to each level of regions. For Belgium, there are only additional data for level 1 regions.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
set_region_codes()
Set up a table of region codes for clean data
Belgium$set_region_codes()
download()
Downloads data from source and (for Belgium) applies an initial data patch.
Belgium$download()
clean_level_1()
Region-level Data Cleaning
Belgium$clean_level_1()
clean_level_2()
Province-level Data Cleaning
Belgium$clean_level_2()
clone()
The objects of this class are cloneable with this method.
Belgium$clone(deep = FALSE)
deep
Whether to make a deep clone.
https://epistat.sciensano.be/Data/COVID19BE_CASES_AGESEX.csv
Subnational data sources: Brazil, Canada, Colombia, Covid19DataHub, Cuba, Estonia, France, Germany, Google, India, Italy, JHU, Lithuania, Mexico, Netherlands, SouthAfrica, Switzerland, UK, USA
## Not run:
region <- Belgium$new(verbose = TRUE, steps = TRUE, get = TRUE, level = "2")
region$return()
## End(Not run)
Information for downloading, cleaning and processing COVID-19 region data for Brazil.
Data available on Github, curated by Wesley Cota: DOI 10.1590/SciELOPreprints.362
covidregionaldata::DataClass
-> Brazil
origin
name of origin to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
common_data_urls
List of named links to raw data. Data is available at the city level and is aggregated to provide state data.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
set_region_codes()
Set up a table of region codes for clean data
Brazil$set_region_codes()
clean_common()
Common data cleaning for both levels
Brazil$clean_common()
clean_level_1()
State Level Data Cleaning
Brazil$clean_level_1()
clean_level_2()
City Level Data Cleaning
Brazil$clean_level_2()
clone()
The objects of this class are cloneable with this method.
Brazil$clone(deep = FALSE)
deep
Whether to make a deep clone.
https://github.com/wcota/covid19br
Subnational data sources: Belgium, Canada, Colombia, Covid19DataHub, Cuba, Estonia, France, Germany, Google, India, Italy, JHU, Lithuania, Mexico, Netherlands, SouthAfrica, Switzerland, UK, USA
## Not run:
region <- Brazil$new(verbose = TRUE, steps = TRUE, get = TRUE)
region$return()
## End(Not run)
Checks which columns are missing (cumulative/daily counts) and if one is present and the other not then calculates the second from the first.
calculate_columns_from_existing_data(data)
data | A data frame |
A data frame with extra columns if required
Compulsory processing functions: add_extra_na_cols(), complete_cumulative_columns(), fill_empty_dates_with_na()
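A minimal base R sketch of the daily/cumulative derivation described for calculate_columns_from_existing_data() is given below. It assumes a single region, a single count type (cases_new and cases_total) and no missing values; the helper name derive_counts_sketch is hypothetical and the package function may handle more columns and edge cases.

```r
# Hypothetical stand-in: derive whichever of cases_new / cases_total is absent.
derive_counts_sketch <- function(data) {
  has_new <- "cases_new" %in% names(data)
  has_total <- "cases_total" %in% names(data)
  # Counts must be in date order before accumulating or differencing
  data <- data[order(data$date), , drop = FALSE]
  if (has_new && !has_total) {
    data$cases_total <- cumsum(data$cases_new)
  } else if (has_total && !has_new) {
    # The first day's new count equals the first running total
    data$cases_new <- c(data$cases_total[1], diff(data$cases_total))
  }
  data
}

dates <- as.Date("2020-03-01") + 0:2
daily <- data.frame(date = dates, cases_new = c(2, 0, 4))
cumulative <- data.frame(date = dates, cases_total = c(2, 2, 6))

derive_counts_sketch(daily)$cases_total
derive_counts_sketch(cumulative)$cases_new
```

If both columns are already present, the sketch returns the data unchanged apart from the date ordering.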
Information for downloading, cleaning and processing COVID-19 region data for Canada.
covidregionaldata::DataClass
-> Canada
origin
name of origin to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
common_data_urls
List of named links to raw data that are common across levels.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
set_region_codes()
Set up a table of region codes for clean data
Canada$set_region_codes()
clean_common()
Provincial Level Data cleaning
Canada$clean_common()
...
pass additional arguments
clone()
The objects of this class are cloneable with this method.
Canada$clone(deep = FALSE)
deep
Whether to make a deep clone.
https://health-infobase.canada.ca
Subnational data sources: Belgium, Brazil, Colombia, Covid19DataHub, Cuba, Estonia, France, Germany, Google, India, Italy, JHU, Lithuania, Mexico, Netherlands, SouthAfrica, Switzerland, UK, USA
## Not run:
region <- Canada$new(verbose = TRUE, steps = TRUE, get = TRUE)
region$return()
## End(Not run)
Checks a given level is supported
check_level(level, supported_levels)
level | A character string indicating the current level. |
supported_levels | A character vector of supported levels |
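A check of this kind might look like the sketch below. The reference entry does not say what happens when a level is unsupported, so raising an error, the wording of the message and the helper name check_level_sketch are all assumptions made for illustration.

```r
# Hypothetical stand-in for check_level(); erroring on an unsupported
# level is an assumption, not documented behaviour.
check_level_sketch <- function(level, supported_levels) {
  if (!(level %in% supported_levels)) {
    stop(
      "Level ", level, " is not supported. Supported levels: ",
      paste(supported_levels, collapse = ", ")
    )
  }
  invisible(TRUE)
}

check_level_sketch("2", c("1", "2"))
result <- tryCatch(
  check_level_sketch("3", c("1", "2")),
  error = function(e) conditionMessage(e)
)
result
```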
Information for downloading, cleaning and processing COVID-19 region data for Colombia
covidregionaldata::DataClass
-> Colombia
origin
name of origin to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
common_data_urls
List of named links to raw data.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
set_region_codes()
Set up a table of region codes for clean data
Colombia$set_region_codes()
download()
Colombia specific download using Socrata API. This uses the RSocrata package if it is installed or downloads a much larger csv file if that package is not available.
Colombia$download()
clean_common()
Colombia specific data cleaning
Colombia$clean_common()
clean_level_1()
Colombia Specific Department Level Data Cleaning
Aggregates data to the level 1 (department) regional level. Data is provided by the source at the level 2 (municipality) regional level.
Colombia$clean_level_1()
clone()
The objects of this class are cloneable with this method.
Colombia$clone(deep = FALSE)
deep
Whether to make a deep clone.
https://www.datos.gov.co/Salud-y-Protecci-n-Social/Casos-positivos-de-COVID-19-en-Colombia/gt2j-8ykr
Subnational data sources: Belgium, Brazil, Canada, Covid19DataHub, Cuba, Estonia, France, Germany, Google, India, Italy, JHU, Lithuania, Mexico, Netherlands, SouthAfrica, Switzerland, UK, USA
## Not run:
region <- Colombia$new(verbose = TRUE, steps = TRUE, get = TRUE)
region$return()
## End(Not run)
The region codes for Colombia
colombia_codes
An object of class data.frame with 1119 rows and 4 columns.
A tibble of region codes and related information.
If a dataset had a row of NAs added to it (using fill_empty_dates_with_na) then cumulative data columns will have NAs which can cause issues later. This function fills these values with the previous non-NA value.
complete_cumulative_columns(data)
data | A data frame |
A tibble with NAs filled in for cumulative data columns.
Compulsory processing functions: add_extra_na_cols(), calculate_columns_from_existing_data(), fill_empty_dates_with_na()
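Filling cumulative columns with the previous non-NA value can be sketched in base R as below. The helper fill_forward_sketch, the column names and the toy data are hypothetical; the sketch assumes rows are already ordered by date within each region, and it leaves leading NAs (before the first observation) as NA, which may differ from what the package does.

```r
# Hypothetical helper: carry the last non-NA value forward.
fill_forward_sketch <- function(x) {
  # Number of non-NA values seen so far at each position (0 if none yet)
  idx <- cumsum(!is.na(x))
  # Prepend NA so that positions with idx == 0 stay NA
  c(NA, x[!is.na(x)])[idx + 1]
}

toy <- data.frame(
  region = c("A", "A", "A", "B", "B", "B"),
  date = rep(as.Date("2020-03-01") + 0:2, times = 2),
  cases_total = c(1, NA, 4, NA, 2, NA)
)

# Fill within each region so values never leak across regions
toy$cases_total <- ave(toy$cases_total, toy$region, FUN = fill_forward_sketch)
toy$cases_total
```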
Acts as parent class for national data classes, allowing them to access the general methods defined in DataClass().
On top of the methods documented in DataClass(), this class implements a custom filter function that supports partial matching to English country names using the countrycode package.
covidregionaldata::DataClass
-> CountryDataClass
filter_level
Character The level of the data to filter at. Defaults to the country level of the data.
filter()
Filter method for country level data. Uses countryname to match input countries with known names.
CountryDataClass$filter(countries, level)
countries
A character vector of target countries. Overrides the current class setting for target_regions. If the filter_level field or the level argument is set to anything other than level 1, this is passed directly to the parent DataClass() filter() method with no alteration.
level
Character. The level of the data to filter at. Defaults to the country level if not specified.
clone()
The objects of this class are cloneable with this method.
CountryDataClass$clone(deep = FALSE)
deep
Whether to make a deep clone.
Data interface functions: DataClass, get_available_datasets(), get_national_data(), get_regional_data(), initialise_dataclass()
Attributes and methods for COVID-19 data provided by the Covid19 Data Hub
This dataset supports both national and subnational data sources, with national level data returned by default. National data is sourced from Johns Hopkins University and so we recommend using the JHU class included in this package. Subnational data is supported for a subset of countries which can be found after cleaning using the available_regions() method; see the examples for more details. These data sets are minimally cleaned data files hosted by the team at COVID19 Data Hub, so please see their source repository for further details (https://github.com/covid19datahub/COVID19/#data-sources). If using these data for analysis, checking the source for further details is strongly advised.
If using this class please cite: "Guidotti et al., (2020). COVID-19 Data Hub. Journal of Open Source Software, 5(51), 2376, https://doi.org/10.21105/joss.02376"
covidregionaldata::DataClass
-> covidregionaldata::CountryDataClass
-> Covid19DataHub
origin
name of country to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
level_data_urls
List of named links to raw data. The first, and only, entry is named main.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
clean_common()
Covid19 Data Hub specific data cleaning. This takes all the raw data, renames some columns and checks types.
Covid19DataHub$clean_common()
clone()
The objects of this class are cloneable with this method.
Covid19DataHub$clone(deep = FALSE)
deep
Whether to make a deep clone.
https://covid19datahub.io/articles/data.html
Aggregated data sources: Google, JHU
National data sources: ECDC, Google, JHU, JRC, WHO
Subnational data sources: Belgium, Brazil, Canada, Colombia, Cuba, Estonia, France, Germany, Google, India, Italy, JHU, Lithuania, Mexico, Netherlands, SouthAfrica, Switzerland, UK, USA
# nolint start
## Not run:
# set up a data cache
start_using_memoise()
# get all countries data
cv19dh <- Covid19DataHub$new(level = "1", get = TRUE)
cv19dh$return()
# show available regions with data at the second level of interest
cv19dh_level_2 <- Covid19DataHub$new(level = "2")
cv19dh_level_2$download()
cv19dh_level_2$clean()
cv19dh$available_regions()
# get all region data for the uk
cv19dh_level_2$filter("uk")
cv19dh_level_2$process()
cv19dh_level_2$return()
# get all regional data for the UK
uk <- Covid19DataHub$new(regions = "uk", level = "2", get = TRUE)
uk$return()
# get all subregional data for the UK
uk <- Covid19DataHub$new(regions = "uk", level = "3", get = TRUE)
uk$return()
## End(Not run)
# nolint end
Checks for use of memoise and then uses vroom::vroom.
csv_reader(file, verbose = FALSE, guess_max = 1000, ...)
file | A URL or filepath to a CSV |
verbose | Logical, defaults to FALSE. |
guess_max | Maximum number of records to use for guessing column types. Defaults to 1000. |
... | extra parameters to be passed to vroom::vroom |
A data table
Information for downloading, cleaning and processing COVID-19 region data for Cuba
covidregionaldata::DataClass
-> Cuba
origin
name of origin to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
common_data_urls
List of named links to raw data.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
set_region_codes()
Set up a table of region codes for clean data
Cuba$set_region_codes()
clean_common()
Cuba specific state level data cleaning
Cuba$clean_common()
clone()
The objects of this class are cloneable with this method.
Cuba$clone(deep = FALSE)
deep
Whether to make a deep clone.
https://covid19cubadata.github.io/
Subnational data sources: Belgium, Brazil, Canada, Colombia, Covid19DataHub, Estonia, France, Germany, Google, India, Italy, JHU, Lithuania, Mexico, Netherlands, SouthAfrica, Switzerland, UK, USA
## Not run:
region <- Cuba$new(verbose = TRUE, steps = TRUE, get = TRUE)
region$return()
## End(Not run)
A parent class containing non-dataset specific methods.
All data sets have shared methods for extracting geographic codes, downloading, processing, and returning data. These functions are contained within this parent class and so are accessible by all data sets which inherit from it. Individual data sets can overwrite any functions or fields provided they define a method with the same name, and can be extended with additional functionality. See the individual method documentation for further details.
origin
the origin of the data source. For regional data sources this will usually be the name of the country.
data
Once initialised, a list of named data frames: raw (list of named raw data frames), clean (cleaned data) and processed (processed data). Data is accessed using $data.
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
region_name
string Name for the region column, e.g. 'region'. This field is filled at initialisation with the region name for the specified level (supported_region_names$level).
code_name
string Name for the codes column, e.g. 'iso_3166_2' Filled at initialisation with the code name associated with the requested level (supported_region_codes$level).
codes_lookup
string or tibble. Region codes for the target origin, filled by origin specific codes in set_region_codes()
data_urls
List of named common and shared url links to raw data. Prefers shared if there is a name conflict.
common_data_urls
List of named links to raw data that are common across levels. The first entry should be named main.
level_data_urls
List of named links to raw data that are level specific. Any urls that share a name with a url from common_data_urls will be selected preferentially. Each top level list should be named after a supported level.
source_data_cols
existing columns within the raw data
level
target region level. This field is filled at initialisation using user inputs or defaults in $new()
data_name
string. The country name followed by the level. E.g. "Italy at level 1"
totals
Boolean. If TRUE, returns totalled data per region up to today's date. This field is filled at initialisation using user inputs or defaults in $new()
localise
Boolean. Should region names be localised. This field is filled at initialisation using user inputs or defaults in $new()
verbose
Boolean. Display information at various stages. This field is filled at initialisation using user inputs or defaults in $new()
steps
Boolean. Keep data from each processing step. This field is filled at initialisation using user inputs or defaults in $new()
target_regions
A character vector of regions to filter for. Used by the filter method.
process_fns
array, additional, user supplied functions to process the data.
filter_level
Character The level of the data to filter at. Defaults to the target level.
set_region_codes()
Place holder for custom country specific function to load region codes.
DataClass$set_region_codes()
new()
Initialize function used by all DataClass objects. Set up the DataClass class with attributes set to input parameters. Should only be called by a DataClass class object.
DataClass$new( level = "1", filter_level, regions, totals = FALSE, localise = TRUE, verbose = TRUE, steps = FALSE, get = FALSE, process_fns )
level
A character string indicating the target administrative level of the data with the default being "1". Currently supported options are level 1 ("1") and level 2 ("2").
filter_level
A character string indicating the level to filter at. Defaults to the level of the data if not specified and if not otherwise defined in the class. Use get_available_datasets() for supported options by dataset.
regions
A character vector of target regions to be assigned to the target_regions field if present.
totals
Logical, defaults to FALSE. If TRUE, returns totalled data per region up to today's date. If FALSE, returns the full dataset stratified by date and region.
localise
Logical, defaults to TRUE. Should region names be localised.
verbose
Logical, defaults to TRUE. Should verbose processing messages be displayed.
steps
Logical, defaults to FALSE. Should all processing and cleaning steps be kept and output in a list.
get
Logical, defaults to FALSE. Should the class get method be called (this will download, clean, and process data at initialisation).
process_fns
Array, additional functions to process the data. Users can supply their own functions here which would act on clean data and they will be called alongside our default processing functions. The default optional function, set_negative_values_to_zero, is added if process_fns is not set (see the process_fns field for all defaults). If you want to keep it when supplying your own processing functions, remember to add it to your list as well. If you have created a processing function that others could benefit from, please submit a Pull Request to our GitHub repository and we will consider adding it to the package.
download()
Download raw data from data_urls, stores a named list of the data_url name and the corresponding raw data table in data$raw
DataClass$download()
download_JSON()
Download raw data from data_urls, stores a named list of the data_url name and the corresponding raw data table in data$raw. Designed as a drop-in replacement for download so it can be used in sub-classes.
DataClass$download_JSON()
clean()
Cleans raw data (corrects format, converts column types, etc). Works on raw data and so should be called after download(). Calls the class specific cleaning method (clean_common) followed by the level specific cleaning methods, clean_level_[1/2]. Cleaned data is stored in data$clean
DataClass$clean()
clean_common()
Cleaning methods that are common across a class. By default this method is empty; any code required should be defined in a child class specific clean_common method.
DataClass$clean_common()
available_regions()
Show regions that are available to be used for filtering operations. Can only be called once clean() has been called. Filtering level is determined by checking the filter_level field.
DataClass$available_regions(level)
level
A character string indicating the level to filter at. Defaults to using the filter_level field if not specified
filter()
Filter cleaned data for a specific region. To be called after clean()
DataClass$filter(regions, level)
regions
A character vector of target regions. Overrides the current class setting for target_regions.
level
Character The level of the data to filter at. Defaults to the lowest level in the data.
process()
Processes data by adding and calculating absent columns. Called on clean data (after clean()). Some countries may have data as new events (e.g. number of new cases for that day) whilst others have a running total up to that date. Processing calculates these based on what the data comes with via the functions region_dispatch() and process_internal(), which do the following:
Adds columns not present in the data: add_extra_na_cols()
Ensures there are no negative values: set_negative_values_to_zero()
Removes NA dates: fill_empty_dates_with_na()
Calculates cumulative data: complete_cumulative_columns()
Calculates missing columns from existing ones: calculate_columns_from_existing_data()
DataClass$process(process_fns)
process_fns
Array, additional functions to process the data. Users can supply their own functions here which would act on clean data and they will be called alongside our default processing functions. The default optional function, set_negative_values_to_zero, is added if process_fns is not set (see the process_fns field for all defaults).
get()
Get data related to the data class. This runs each distinct step in the workflow in order. Internally calls the download(), clean(), filter() and process() methods.
DataClass$get()
return()
Return data. Designed to be called after process(), this uses the steps argument to return either a list of all the data preserved at each step or just the processed data. For most datasets a custom method should not be needed.
DataClass$return()
summary()
Create a table of summary information for the data set being processed.
DataClass$summary()
Returns a single row summary tibble containing the origin of the data source, class, level 1 and 2 region names, the type of data, the urls of the raw data and the columns present in the raw data.
test()
Run tests on a country class instance. Calling test() on a class instance runs tests with the settings in use. For example, if you set level = "1" and localise = FALSE the tests will be run on level 1 data which is not localised. Rather than downloading data for a test, users can provide a path to a snapshot file of data to test instead. Tests are run on a clone of the class. This method calls generic tests for all country class objects. It also calls country specific tests which can be defined in an individual country class method called specific_tests(). The snapshots contain the first 1000 rows of data. For more details see the 'testing' vignette: vignette("testing").
DataClass$test( download = FALSE, snapshot_dir = paste0(tempdir(), "/snapshots"), all = FALSE, ... )
download
logical. To download the data (TRUE) or use a snapshot (FALSE). Defaults to FALSE.
snapshot_dir
character_array the name of a directory to save the downloaded data or read from. If not defined a directory called 'snapshots' will be created in the temp directory. Snapshots are saved as rds files with the class name and level: e.g. Italy_level_1.rds.
all
logical. Run tests with all settings (TRUE) or with those defined in the current class instance (FALSE). Defaults to FALSE.
...
Additional parameters to pass to specific_tests
clone()
The objects of this class are cloneable with this method.
DataClass$clone(deep = FALSE)
deep
Whether to make a deep clone.
Data interface functions: CountryDataClass, get_available_datasets(), get_national_data(), get_regional_data(), initialise_dataclass()
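To illustrate the process_fns argument described above, the sketch below defines a user-supplied processing function in base R. It assumes, from the description of process_fns, that each function takes a data frame of clean data and returns a data frame; the function name round_counts_sketch and the toy data are hypothetical, and the commented-out call shows only one plausible way of passing it alongside set_negative_values_to_zero.

```r
# Hypothetical user-supplied processing function (not part of the package).
# Assumption: functions in process_fns take a data frame and return one.
round_counts_sketch <- function(data) {
  # Only touch count columns that are actually present
  count_cols <- intersect(c("cases_new", "deaths_new"), names(data))
  for (col in count_cols) {
    data[[col]] <- round(data[[col]])
  }
  data
}

toy <- data.frame(
  date = as.Date("2020-03-01") + 0:1,
  cases_new = c(1.4, 2.6)
)
rounded <- round_counts_sketch(toy)
rounded$cases_new

# Assumed usage with the package installed (not run here):
# italy <- Italy$new(
#   get = TRUE,
#   process_fns = c(set_negative_values_to_zero, round_counts_sketch)
# )
```

Keeping the function free of side effects and returning the full data frame makes it easy to try on a small table before handing it to a class.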
Download Excel Documents
download_excel(url, archive, verbose = FALSE, transpose = TRUE, ...)
url | Character string containing the full URL to the Excel document. |
archive | Character string naming the file name to assign in the temporary directory. |
verbose | Logical, defaults to FALSE. |
transpose | Logical, should the read-in data be transposed |
... | Additional parameters to pass to |
A data.frame.
Information for downloading, cleaning and processing the European Centre for Disease Prevention and Control COVID-19 data.
covidregionaldata::DataClass
-> covidregionaldata::CountryDataClass
-> ECDC
origin
name of origin to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
common_data_urls
List of named links to raw data.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
clean_common()
ECDC specific state level data cleaning
ECDC$clean_common()
return()
Specific return settings for the ECDC dataset.
ECDC$return()
specific_tests()
Run additional tests on ECDC class. Tests ECDC has required additional columns and that there is only one row per country. Designed to be run from test and not run directly.
ECDC$specific_tests(self_copy, ...)
self_copy
R6class the object to test
...
Extra params passed to specific download functions
clone()
The objects of this class are cloneable with this method.
ECDC$clone(deep = FALSE)
deep
Whether to make a deep clone.
National data sources: Covid19DataHub, Google, JHU, JRC, WHO
## Not run:
national <- ECDC$new(verbose = TRUE, steps = TRUE, get = TRUE)
national$return()
## End(Not run)
Information for downloading, cleaning and processing COVID-19 region data for Estonia
covidregionaldata::DataClass
-> Estonia
origin
name of origin to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
common_data_urls
List of named links to raw data.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
set_region_codes()
Set up a table of region codes for clean data
Estonia$set_region_codes()
clean_common()
Estonia specific state level data cleaning
Estonia$clean_common()
clone()
The objects of this class are cloneable with this method.
Estonia$clone(deep = FALSE)
deep
Whether to make a deep clone.
https://www.terviseamet.ee/et/koroonaviirus/avaandmed
Subnational data sources: Belgium, Brazil, Canada, Colombia, Covid19DataHub, Cuba, France, Germany, Google, India, Italy, JHU, Lithuania, Mexico, Netherlands, SouthAfrica, Switzerland, UK, USA
## Not run:
region <- Estonia$new(verbose = TRUE, steps = TRUE, get = TRUE)
region$return()
## End(Not run)
Checks the date column is an s3 class and that region level column is a character in the cleaned data (data$clean)
expect_clean_cols(data, level)
data | The clean data to check |
level | character_array the level of the data to check |
Functions used for testing data is cleaned and processed correctly: expect_columns_contain_data(), expect_processed_cols(), test_cleaning(), test_download_JSON(), test_download(), test_processing(), test_return()
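The two checks described for expect_clean_cols() can be sketched in base R as below. This is an assumption-laden illustration: it reads "s3 class" as the Date class, takes the region column name as a hypothetical region_col argument, and returns a vector of problem messages rather than raising test expectations as the package function presumably does.

```r
# Hypothetical stand-in for expect_clean_cols(); `region_col` is an assumed
# argument naming the region column for the level being checked.
check_clean_cols_sketch <- function(data, region_col) {
  problems <- character(0)
  if (!inherits(data$date, "Date")) {
    problems <- c(problems, "date column is not of class Date")
  }
  if (!is.character(data[[region_col]])) {
    problems <- c(problems, paste(region_col, "is not a character column"))
  }
  problems
}

good <- data.frame(
  date = as.Date("2020-03-01"),
  region = "A",
  stringsAsFactors = FALSE
)
bad <- data.frame(date = "2020-03-01", region = 1, stringsAsFactors = FALSE)

check_clean_cols_sketch(good, "region")
check_clean_cols_sketch(bad, "region")
```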
Checks that cleaned columns cases, deaths, recovered and test (new and total) are not entirely composed of NAs.
expect_columns_contain_data(DataClass_obj)
DataClass_obj | The DataClass object (R6Class) to perform checks on. Must be a |
Functions used for testing data is cleaned and processed correctly: expect_clean_cols(), expect_processed_cols(), test_cleaning(), test_download_JSON(), test_download(), test_processing(), test_return()
Checks that the processed data columns date, cases_new, cases_total, deaths_new, deaths_total and the region level column have the correct types.
expect_processed_cols(data, level = "1", localised = TRUE)
data | The data to check |
level | character_array the level of the data to check |
localised | logical to check localised data or not, defaults to TRUE. |
Functions used for testing data is cleaned and processed correctly: expect_clean_cols(), expect_columns_contain_data(), test_cleaning(), test_download_JSON(), test_download(), test_processing(), test_return()
There are points, particularly early during data collection, where data was not collected for all regions. This function finds dates which have data for some regions, but not all, and adds rows of NAs for the missing regions. This is mainly for reasons of completeness.
fill_empty_dates_with_na(data)
data |
A data frame |
A tibble with rows of NAs added.
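The idea of padding incomplete dates can be sketched in base R as follows. This is a minimal illustration under assumed column names (`date`, `level_1_region`, `cases_new`) and toy data; it is not the package's own implementation.

```r
# Toy data: region "B" has no row for 2020-03-02.
cases <- data.frame(
  date = as.Date(c("2020-03-01", "2020-03-01", "2020-03-02")),
  level_1_region = c("A", "B", "A"),
  cases_new = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Build every date/region combination, then left join the observed data so
# that missing combinations appear as rows of NAs.
all_combinations <- expand.grid(
  date = unique(cases$date),
  level_1_region = unique(cases$level_1_region),
  stringsAsFactors = FALSE
)
filled <- merge(all_combinations, cases, all.x = TRUE)
filled <- filled[order(filled$date, filled$level_1_region), ]
print(filled)
```

The result has one row per date and region, with `cases_new` set to NA where the source had no record.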
Compulsory processing functions
add_extra_na_cols(), calculate_columns_from_existing_data(), complete_cumulative_columns()
Information for downloading, cleaning and processing COVID-19 region data for France.
covidregionaldata::DataClass -> France
origin
name of origin to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
level_data_urls
List of named links to raw data that are level specific.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
set_region_codes()
Set up a table of region codes for clean data
France$set_region_codes()
clean_level_1()
Region Level Data Cleaning
France$clean_level_1()
clean_level_2()
Department Level Data Cleaning
France$clean_level_2()
clone()
The objects of this class are cloneable with this method.
France$clone(deep = FALSE)
deep
Whether to make a deep clone.
https://www.data.gouv.fr/fr/datasets/r/406c6a23-e283-4300-9484-54e78c8ae675
https://www.data.gouv.fr/fr/datasets/r/6fadff46-9efd-4c53-942a-54aca783c30c
https://www.data.gouv.fr/fr/datasets/r/001aca18-df6a-45c8-89e6-f82d689e6c01
Subnational data sources
Belgium, Brazil, Canada, Colombia, Covid19DataHub, Cuba, Estonia, Germany, Google, India, Italy, JHU, Lithuania, Mexico, Netherlands, SouthAfrica, Switzerland, UK, USA
## Not run:
region <- France$new(level = "2", verbose = TRUE, steps = TRUE, get = TRUE)
region$return()
## End(Not run)
The region codes for France
france_codes
An object of class data.frame with 104 rows and 5 columns.
A tibble of region codes and related information.
Information for downloading, cleaning and processing COVID-19 region level 1 and 2 data for Germany.
covidregionaldata::DataClass -> Germany
origin
name of origin to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
common_data_urls
List of named links to raw data. The first, and only, entry is named main.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
set_region_codes()
Set up a table of region codes for clean data
Germany$set_region_codes()
clean_common()
Common Data Cleaning
Germany$clean_common()
clean_level_1()
Bundesland Level Data Cleaning
Germany$clean_level_1()
clean_level_2()
Landkreis Level Data Cleaning
Germany$clean_level_2()
clone()
The objects of this class are cloneable with this method.
Germany$clone(deep = FALSE)
deep
Whether to make a deep clone.
https://opendata.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0.csv
Subnational data sources
Belgium, Brazil, Canada, Colombia, Covid19DataHub, Cuba, Estonia, France, Google, India, Italy, JHU, Lithuania, Mexico, Netherlands, SouthAfrica, Switzerland, UK, USA
## Not run:
region <- Germany$new(verbose = TRUE, steps = TRUE, level = "2", get = TRUE)
region$return()
## End(Not run)
Returns data on what countries are available from the data provided with this package either using a cached dataset or built by searching the target namespace.
get_available_datasets(type, render = FALSE, namespace = "covidregionaldata")
type |
A character vector indicating the types of data to
return. Current options include "national" (which are datasets at the
national level which inherit from |
render |
Logical. If TRUE the supported data set table is built from the
available classes using |
namespace |
Character string. The name of the namespace to search for class objects. Defaults to "covidregionaldata" as the package. |
A list of available data sets and the spatial aggregation levels for which data are available.
Data interface functions
CountryDataClass, DataClass, get_national_data(), get_regional_data(), initialise_dataclass()
# see all available datasets
get_available_datasets()

# see only national level datasets
get_available_datasets("national")

# see only regional level datasets
get_available_datasets("regional")

# render the data
get_available_datasets(render = TRUE)
Provides an interface to source specific classes which
support national level data. For simple use cases this allows downloading
clean, standardised, national-level COVID-19 data sets. Internally this uses
the CountryDataClass()
parent class which allows documented downloading,
cleaning, and processing. Optionally all steps of data processing can be
returned along with the functions used for processing but by default just
the finalised processed data is returned. See the examples for some
potential use cases and the links to lower level functions for more details
and options.
get_national_data( countries, source = "who", level = "1", totals = FALSE, steps = FALSE, class = FALSE, verbose = TRUE, ... )
countries |
A character vector specifying country names of interest. Used to filter the data. |
source |
A character string specifying the data source (not case
dependent). Defaults to WHO (the World Health Organisation). See
|
level |
A character string indicating the target administrative level
of the data with the default being "1". Currently supported options are
level 1 ("1") and level 2 ("2"). Use |
totals |
Logical, defaults to FALSE. If TRUE, returns totalled data per region up to today's date. If FALSE, returns the full dataset stratified by date and region. |
steps |
Logical, defaults to FALSE. Should all processing and cleaning steps be kept and output in a list. |
class |
Logical, defaults to FALSE. If TRUE returns the
|
verbose |
Logical, defaults to |
... |
Additional arguments to pass to class specific functionality. |
A tibble with data related to cases, deaths, hospitalisations, recoveries and testing.
WHO(), ECDC(), JHU(), Google()
Data interface functions
CountryDataClass, DataClass, get_available_datasets(), get_regional_data(), initialise_dataclass()
## Not run:
# set up a data cache
start_using_memoise()

# download all national data from the WHO
get_national_data(source = "who")

# download data for Canada keeping all processing steps
get_national_data(countries = "canada", source = "ecdc", steps = TRUE)

# download data for Canada from the JHU and return the full class
jhu <- get_national_data(countries = "canada", source = "jhu", class = TRUE)
jhu

# return the JHU data for Canada
jhu$return()

# check which regions the JHU supports national data for
jhu$available_regions()

# filter instead for France (and then reprocess)
jhu$filter("France")
jhu$process()

# explore the structure of the stored JHU data
jhu$data
## End(Not run)
Provides an interface to source specific classes which
support regional level data. For simple use cases this allows downloading
clean, standardised, regional-level COVID-19 data sets. Internally this uses
the DataClass()
parent class which allows documented downloading, cleaning,
and processing. Optionally all steps of data processing can be returned
along with the functions used for processing but by default just the
finalised processed data is returned. See the examples for some potential
use cases and the links to lower level functions for more details and
options.
get_regional_data( country, level = "1", totals = FALSE, localise = TRUE, steps = FALSE, class = FALSE, verbose = TRUE, regions, ... )
country |
A character string specifying the country to get data from.
Not case dependent. Name should be the English name. For a list of
options use |
level |
A character string indicating the target administrative level
of the data with the default being "1". Currently supported options are
level 1 ("1") and level 2 ("2"). Use |
totals |
Logical, defaults to FALSE. If TRUE, returns totalled data per region up to today's date. If FALSE, returns the full dataset stratified by date and region. |
localise |
Logical, defaults to TRUE. Should region names be localised. |
steps |
Logical, defaults to FALSE. Should all processing and cleaning steps be kept and output in a list. |
class |
Logical, defaults to FALSE. If TRUE returns the
|
verbose |
Logical, defaults to |
regions |
A character vector of target regions to be assigned to the
|
... |
Additional arguments to pass to class specific functionality. |
A tibble with data related to cases, deaths, hospitalisations, recoveries and testing stratified by regions within the given country.
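To illustrate the difference between the date-stratified output and the `totals = TRUE` output, the base R sketch below sums a toy daily series by region. The column names and values are assumptions chosen for the example, and the package's own totals may include further columns or a different ordering.

```r
# Toy daily series stratified by date and region (made-up values).
daily <- data.frame(
  date = as.Date(c("2020-03-01", "2020-03-02", "2020-03-01", "2020-03-02")),
  level_1_region = c("A", "A", "B", "B"),
  cases_new = c(1, 4, 2, 0),
  deaths_new = c(0, 1, 0, 0)
)

# Sum the daily counts within each region to obtain one row per region.
totals <- aggregate(
  cbind(cases_new, deaths_new) ~ level_1_region,
  data = daily,
  FUN = sum
)
names(totals) <- c("level_1_region", "cases_total", "deaths_total")
print(totals)
```

Note that the formula interface of `aggregate()` drops rows containing NA by default, so real data with gaps would need explicit handling.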
Data interface functions
CountryDataClass, DataClass, get_available_datasets(), get_national_data(), initialise_dataclass()
## Not run:
# set up a data cache
start_using_memoise()

# download data for Italy
get_regional_data("italy")

# return totals for Italy with no localisation
get_regional_data("italy", localise = FALSE, totals = TRUE)

# download data for the UK but return the class
uk <- get_regional_data("United Kingdom", class = TRUE)
uk

# return UK data from the class object
uk$return()
## End(Not run)
Glue the spatial level into a variable name
glue_level(level)
level |
A character string indicating the current level. |
A string in the form "level_1_region".
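A one-line equivalent of this behaviour can be written in base R. The function name `glue_level_sketch` is hypothetical and the package may build the string differently; only the output format is taken from the description above.

```r
# Hypothetical stand-in that reproduces the documented output format.
glue_level_sketch <- function(level) {
  paste0("level_", level, "_region")
}

glue_level_sketch("1")
glue_level_sketch("2")
```

Such a name can then be used to select the region column for a given level, for example `data[[glue_level_sketch("2")]]`.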
Google specific information for downloading, cleaning and processing COVID-19 region data for an example country. The class works the same as other national data sources; however, data from Google supports three levels (country, subregion and subregion2), which can be accessed using the 'level' argument. More data is also available, such as hospitalisations data. The raw data comes as three separate data sets: "epidemiology", which comprises cases, tests and deaths; "index", which holds information about countries and links the other data sets; and "hospitalizations", which holds data about the number of people in hospital, ICU, etc.
covidregionaldata::DataClass -> covidregionaldata::CountryDataClass -> Google
origin
name of country to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
common_data_urls
List of named links to raw data.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
clean_common()
Google specific data cleaning common to all levels. This takes all the raw data, combines it into a single data frame, renames some columns and checks types.
Google$clean_common()
clean_level_1()
Google specific country level data cleaning. Takes the data cleaned by clean_common and aggregates it to the country level (level 1).
Google$clean_level_1()
clean_level_2()
Google specific subregion level data cleaning. Takes the data cleaned by clean_common and aggregates it to the subregion level (level 2).
Google$clean_level_2()
new()
custom initialize for Google
Google$new(...)
...
arguments to be passed to DataClass
and initialize Google
clone()
The objects of this class are cloneable with this method.
Google$clone(deep = FALSE)
deep
Whether to make a deep clone.
https://github.com/GoogleCloudPlatform/covid-19-open-data
Aggregated data sources
Covid19DataHub, JHU
National data sources
Covid19DataHub, ECDC, JHU, JRC, WHO
Subnational data sources
Belgium, Brazil, Canada, Colombia, Covid19DataHub, Cuba, Estonia, France, Germany, India, Italy, JHU, Lithuania, Mexico, Netherlands, SouthAfrica, Switzerland, UK, USA
# nolint start
## Not run:
# set up a data cache
start_using_memoise()

# get all countries
national <- Google$new(level = "1", get = TRUE)
national$return()

# show available regions with data at the second level of interest
google_level_2 <- Google$new(level = "2")
google_level_2$download()
google_level_2$clean()
google_level_2$available_regions()

# get all region data for the UK
google_level_2$filter("uk")
google_level_2$process()
google_level_2$return()

# get all regional data for the UK
uk <- Google$new(regions = "uk", level = "2", get = TRUE)
uk$return()

# get all subregional data for the UK
uk <- Google$new(regions = "uk", level = "3", get = TRUE)
uk$return()
## End(Not run)
# nolint end
Information for downloading, cleaning and processing COVID-19 region data for India.
covidregionaldata::DataClass -> India
origin
name of origin to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
common_data_urls
List of named links to raw data.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
set_region_codes()
Set up a table of region codes for clean data
India$set_region_codes()
clean_common()
India state level data cleaning
India$clean_common()
get_desired_status()
Extract data from raw table
India$get_desired_status(status)
status
The data to extract
clone()
The objects of this class are cloneable with this method.
India$clone(deep = FALSE)
deep
Whether to make a deep clone.
Subnational data sources
Belgium, Brazil, Canada, Colombia, Covid19DataHub, Cuba, Estonia, France, Germany, Google, Italy, JHU, Lithuania, Mexico, Netherlands, SouthAfrica, Switzerland, UK, USA
## Not run:
region <- India$new(verbose = TRUE, steps = TRUE, get = TRUE)
region$return()
## End(Not run)
This function initialises classes based on the DataClass()
which allows documented downloading, cleaning, and processing. See the
examples for some potential use cases and the DataClass()
documentation
for more details.
initialise_dataclass( class = character(), level = "1", totals = FALSE, localise = TRUE, regions, verbose = TRUE, steps = FALSE, get = FALSE, type = c("national", "regional"), ... )
class |
A character string specifying the |
level |
A character string indicating the target administrative level
of the data with the default being "1". Currently supported options are
level 1 ("1") and level 2 ("2"). Use |
totals |
Logical, defaults to FALSE. If TRUE, returns totalled data per region up to today's date. If FALSE, returns the full dataset stratified by date and region. |
localise |
Logical, defaults to TRUE. Should region names be localised. |
regions |
A character vector of target regions to be assigned to the
|
verbose |
Logical, defaults to |
steps |
Logical, defaults to FALSE. Should all processing and cleaning steps be kept and output in a list. |
get |
Logical, defaults to FALSE. Should the class |
type |
A character vector indicating the types of data to
return. Current options include "national" (which are datasets at the
national level which inherit from |
... |
Additional arguments to pass to class specific functionality. |
An initialised version of the target class if available,
e.g. Italy()
Data interface functions
CountryDataClass, DataClass, get_available_datasets(), get_national_data(), get_regional_data()
## Not run:
# set up a cache to store data to avoid downloading repeatedly
start_using_memoise()

# check currently available datasets
get_available_datasets()

# initialise a data set in the United Kingdom
# at the UTLA level
utla <- UK$new(level = "2")

# download UTLA data
utla$download()

# clean UTLA data
utla$clean()

# inspect available level 1 regions
utla$available_regions(level = "1")

# filter data to the East of England
utla$filter("East of England")

# process UTLA data
utla$process()

# return processed and filtered data
utla$return()

# inspect all data steps
utla$data

# initialise Italian data, download, clean and process it
italy <- initialise_dataclass("Italy", get = TRUE)
italy$return()

# initialise ECDC data, fully process it, and return totals
ecdc <- initialise_dataclass("ecdc", get = TRUE, totals = TRUE)
ecdc$return()
## End(Not run)
Information for downloading, cleaning and processing COVID-19 region data for Italy.
covidregionaldata::DataClass -> Italy
origin
name of origin to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
common_data_urls
List of named links to raw data. The first, and only, entry is named main.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
set_region_codes()
Set up a table of region codes for clean data
Italy$set_region_codes()
clean_common()
State level data cleaning
Italy$clean_common()
clone()
The objects of this class are cloneable with this method.
Italy$clone(deep = FALSE)
deep
Whether to make a deep clone.
https://github.com/pcm-dpc/COVID-19/
Subnational data sources
Belgium, Brazil, Canada, Colombia, Covid19DataHub, Cuba, Estonia, France, Germany, Google, India, JHU, Lithuania, Mexico, Netherlands, SouthAfrica, Switzerland, UK, USA
## Not run:
region <- Italy$new(verbose = TRUE, steps = TRUE, get = TRUE)
region$return()
## End(Not run)
Attributes and methods for COVID-19 data used for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). Supported by ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL)
This dataset supports both national and subnational data sources, with national level data returned by default. Subnational data is supported for a subset of countries, which can be found after cleaning using the available_regions() method; see the examples for more details. These data sets are sourced, cleaned, and standardised by the JHU team, so please see the source repository for further details. Note that, unlike for many other data sets, this means methods applied to this source are not being applied to raw surveillance data but instead to already cleaned data. If using this source for analysis, checking the JHU source for further details is advisable.
If using this data please cite: "Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Inf Dis. 20(5):533-534. doi: 10.1016/S1473-3099(20)30120-1"
covidregionaldata::DataClass -> covidregionaldata::CountryDataClass -> JHU
origin
name of country to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
common_data_urls
List of named links to raw data. The first, and only, entry is named main.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
set_region_codes()
Set up a table of region codes for clean data
JHU$set_region_codes()
clean_common()
JHU specific data cleaning. Joins the raw data sets, checks column types and renames where needed.
JHU$clean_common()
clean_level_1()
JHU specific country level data cleaning. Aggregates the data to the country level (level 1).
JHU$clean_level_1()
clone()
The objects of this class are cloneable with this method.
JHU$clone(deep = FALSE)
deep
Whether to make a deep clone.
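The country level aggregation performed by clean_level_1() can be pictured with a small base R sketch. The column names (`country`, `region`, `cases_total`) and the values are assumptions made for illustration; the actual JHU cleaning code works on the joined raw data sets and may differ in detail.

```r
# Toy subnational series: country X has two regions, country Y has one.
subnational <- data.frame(
  date = as.Date(rep(c("2020-03-01", "2020-03-02"), times = 3)),
  country = c("X", "X", "X", "X", "Y", "Y"),
  region = c("X1", "X1", "X2", "X2", "Y1", "Y1"),
  cases_total = c(1, 3, 2, 2, 5, 6)
)

# Sum over regions so that each date and country has a single row.
national <- aggregate(cases_total ~ date + country, data = subnational, FUN = sum)
print(national)
```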
https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data
Aggregated data sources
Covid19DataHub, Google
National data sources
Covid19DataHub, ECDC, Google, JRC, WHO
Subnational data sources
Belgium, Brazil, Canada, Colombia, Covid19DataHub, Cuba, Estonia, France, Germany, Google, India, Italy, Lithuania, Mexico, Netherlands, SouthAfrica, Switzerland, UK, USA
# nolint start
## Not run:
# set up a data cache
start_using_memoise()

# get all countries data
jhu <- JHU$new(level = "1", get = TRUE)
jhu$return()

# show available regions with data at the second level of interest
jhu_level_2 <- JHU$new(level = "2")
jhu_level_2$download()
jhu_level_2$clean()
jhu_level_2$available_regions()

# get all region data for the UK
jhu_level_2$filter("uk")
jhu_level_2$process()
jhu_level_2$return()
## End(Not run)
# nolint end
The region codes for JHU
JHU_codes
An object of class spec_tbl_df (inherits from tbl_df, tbl, data.frame) with 4193 rows and 2 columns.
A tibble of region codes and related information.
Class for downloading, cleaning and processing COVID-19 region data from the European Commission's Joint Research Centre. Subnational data (admin level 1) on the numbers of COVID-19 cases and fatalities, collected directly from the national authoritative sources (national monitoring websites, when available). For more details see https://github.com/ec-jrc/COVID-19
covidregionaldata::DataClass -> covidregionaldata::CountryDataClass -> JRC
origin
name of origin to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
level_data_urls
List of named links to raw data.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
clean_common()
JRC specific data cleaning. The raw source data columns are converted to the correct type and renamed appropriately to match the standard for general processing.
JRC$clean_common()
clean_level_1()
JRC specific country level data cleaning. Selects country level (level 1) columns from the data ready for further processing.
JRC$clean_level_1()
clean_level_2()
JRC specific region level data cleaning. Selects country (level 1) and region (level 2) columns from the data ready for further processing.
JRC$clean_level_2()
clone()
The objects of this class are cloneable with this method.
JRC$clone(deep = FALSE)
deep
Whether to make a deep clone.
https://github.com/ec-jrc/COVID-19
National data sources
Covid19DataHub, ECDC, Google, JHU, WHO
## Not run:
# get country level data
jrc_level_1 <- JRC$new(level = "1", verbose = TRUE, steps = TRUE, get = TRUE)
jrc_level_1$return()

# show available regions with data at the first level of interest (country)
jrc_level_1$available_regions()

# get region level data
jrc_level_2 <- JRC$new(level = "2", verbose = TRUE, steps = TRUE, get = TRUE)
jrc_level_2$return()

# show available regions with data at the second level of interest (region)
jrc_level_2$available_regions()
## End(Not run)
Checks for use of memoise and then uses jsonlite::fromJSON.
json_reader(file, verbose = FALSE, ...)
file |
A URL or filepath to a JSON |
verbose |
Logical, defaults to |
... |
extra parameters to be passed to jsonlite::fromJSON |
A data table
Information for downloading, cleaning and processing COVID-19 region level 1 and 2 data for Lithuania.
The Official Statistics Portal (OSP) provides many data series in their table. The full range of these vectors can be returned by setting all_osp_fields to TRUE.
The following describes the data provided by the OSP.
field | description |
---|---|
date | The reporting day during which the events occurred or at the end of which the accounting was performed |
municipality_code * | Code of the municipality assigned to persons |
municipality_name + | The name of the municipality assigned to the persons |
population | Population size according to the data of the beginning of 2021, according to the declared place of residence |
ab_pos_day | Number of positive antibody test responses, daily |
ab_neg_day | Number of negative antibody test responses, daily |
ab_tot_day | Number of antibody tests, daily |
ab_prc_day | Percentage of positive antibody test responses per day |
ag_pos_day | Number of positive antigen test responses, daily |
ag_neg_day | Number of negative antigen test responses, daily |
ag_tot_day | Number of antigen tests, daily |
ag_prc_day | Percentage of positive antigen test responses per day |
pcr_pos_day | Number of positive PCR test responses, daily |
pcr_neg_day | Number of negative PCR test responses, daily |
pcr_tot_day | Number of PCR tests, daily |
pcr_prc_day | Percentage of positive PCR test responses per day |
dgn_pos_day | Number of positive responses to diagnostic examinations / tests, daily |
dgn_neg_day | Number of negative responses to diagnostic examinations / tests, daily |
dgn_prc_day | Percentage of positive responses to diagnostic examinations / tests per day |
dgn_tot_day | Number of diagnostic examinations / tests, daily |
dgn_tot_day_gmp | Number of diagnostic examinations / tests of samples collected at mobile points, daily |
daily_deaths_def1 | Number of new deaths per day according to the (narrowest) COVID death definition No. 1. # |
daily_deaths_def2 | Number of new deaths per day according to COVID death definition No. 2. # |
daily_deaths_def3 | Number of new deaths per day according to COVID death definition No. 3. # |
daily_deaths_all | Daily deaths in Lithuania (by date of death) |
incidence + | Number of new COVID cases per day (laboratory or physician confirmed) |
cumulative_totals + | Total number of COVID cases (laboratory or physician confirmed) |
active_de_jure | Declared number of people with COVID |
active_sttstcl | Statistical number of people with COVID |
dead_cases | The number of dead persons who were ever diagnosed with COVID |
recovered_de_jure | Declared number of recovered live persons |
recovered_sttstcl | Statistical number of recovered live persons |
map_colors $ | The map colour-coding for the municipality, based on averages of test positivity and incidence per capita |
(*) The municipality_code is discarded since it does not correspond to ISO-3166:2 codes used elsewhere in the package.
(+) These fields are renamed but returned unmodified.
(#) Lithuania offers counts according to three different definitions of whether a death is attributable to COVID-19.
($) This field is not recalculated for counties and is deleted.
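When municipality data are combined into counties, count fields can be summed but percentage fields cannot. The sketch below illustrates this with toy values: counts are summed and the percentage is recomputed from the summed counts. The county lookup, the values, and the assumption that pcr_prc_day equals 100 * pcr_pos_day / pcr_tot_day are all hypothetical; the source does not state how the package handles these fields.

```r
# Toy municipality data with a hypothetical county assignment.
municipalities <- data.frame(
  county = c("C1", "C1", "C2"),
  pcr_pos_day = c(5, 15, 10),
  pcr_tot_day = c(100, 100, 50)
)

# Sum the count fields within each county.
counties <- aggregate(
  cbind(pcr_pos_day, pcr_tot_day) ~ county,
  data = municipalities,
  FUN = sum
)

# Recompute the percentage from the summed counts rather than summing or
# averaging the municipality percentages.
counties$pcr_prc_day <- 100 * counties$pcr_pos_day / counties$pcr_tot_day
print(counties)
```

A derived field with no such formula, like map_colors, has no obvious county level equivalent, which is consistent with it being dropped.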
Beginning in February 2021 the OSP publishes death counts according to three different criteria, from most to least strictly attributed to COVID-19.
of
Number of deaths with COVID-19 (coronavirus infection) as the leading cause of death. The indicator is calculated by summing all registered records of medical form E106 (unique persons) in which the main cause of death is ICD disease code U07.1 or U07.2. Deaths due to external causes are not included (ICD disease codes V00-Y36, Y85-Y87, Y89, S00-T79, or T89-T98).
with
Number of deaths with COVID-19 (coronavirus infection) recorded as any cause of death. The indicator is calculated by summing all registered records of medical form E106 (unique persons) in which the ICD disease codes U07.1, U07.2, U07.3, U07.4, U07.5 are indicated as the main, direct, or intermediate cause of death or other important pathological condition, or identified as related to COVID-19 disease (coronavirus infection). Deaths due to external causes are not included (ICD disease codes V00-Y36, Y85-Y87, Y89, S00-T79, or T89-T98).
after
Number of deaths with COVID-19 recorded as any cause of death, or from any non-external cause within 28 days of a COVID-19 diagnosis. The indicator is calculated by summing all registered records of medical form E106 (unique persons) in which the ICD disease codes U07.1, U07.2, U07.3, U07.4, U07.5 are indicated as the main, direct, or intermediate cause of death or other important pathological condition, or identified as related to COVID-19 disease (coronavirus infection), and all records of medical form E106 (unique individuals) where the person died within 28 days of receiving a positive diagnostic response to the SARS-CoV-2 test or had an entry in medical form E025 with ICD disease code U07.2 or U07.1. Deaths due to external causes are not included (ICD disease codes V00-Y36, Y85-Y87, Y89, S00-T79, or T89-T98).
The number of deaths reported in the last day is preliminary and increases by about 20-40% in a few days. Such a "delay" in the data is natural: for example, for those who died last night, a death certificate is likely to be issued as soon as this report is published this morning.
Beginning in February 2021 the OSP makes statistical estimates of the number of recovered and active cases, since a review of the data showed that some individuals still counted as active cases had recovered but had not been documented or registered as such.
These are listed by the OSP as active_de_jure
and
recovered_de_jure
(officially still considered sick),
and active_sttstcl
and recovered_sttstcl
(an estimate of how
many of these are still ill).
covidregionaldata::DataClass
-> Lithuania
origin
name of origin to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
common_data_urls
List of named links to raw data that are common across levels.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
death_definition
which criteria of deaths attributed to COVID to use
recovered_definition
whether to use the official counts of recovered cases or the statistical estimates provided by OSP
all_osp_fields
whether to return all the data vectors provided by OSP
national_data
whether to return data rows for national results
set_region_codes()
Set up a table of region codes for clean data
Lithuania$set_region_codes()
clean_common()
Common data cleaning for both levels
Lithuania$clean_common()
clean_level_1()
Lithuania Specific County Level Data Cleaning
Aggregates data to the level 1 (county) regional level. Data is provided by the source at the level 2 (municipality) regional level.
Lithuania$clean_level_1()
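The aggregation from municipalities to counties described above can be pictured with a small base R sketch. The data frame, its column names and its values are made up for illustration and are not the OSP schema or the package's internal code.

```r
# Hypothetical municipality-level (level 2) records; values are illustrative only.
municipal <- data.frame(
  date = as.Date(c("2021-03-01", "2021-03-01", "2021-03-01")),
  county = c("Vilnius", "Vilnius", "Kaunas"),
  municipality = c("Vilnius city", "Trakai", "Kaunas city"),
  cases_new = c(120, 8, 75)
)

# Sum municipal counts within each county and date to obtain level 1 totals.
county <- aggregate(cases_new ~ date + county, data = municipal, FUN = sum)
county <- county[order(county$county), ]
print(county)
```

Each county row then carries the sum of its municipalities for that date, which is the shape a level 1 request would be expected to return.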
new()
Initialize the country
Lithuania$new( death_definition = "of", recovered_definition = "official", all_osp_fields = FALSE, national_data = FALSE, ... )
death_definition
A character string. Determines which criteria
for attributing deaths to COVID is used. Should be "of"
,
"with"
, or "after"
. Can also be "daily_deaths_def1"
,
"daily_deaths_def2"
, or "daily_deaths_def3"
. (Defaults
to "of"
, the strictest definition.)
recovered_definition
A character string. Determines whether
the count of officially-recovered (de jure) cases is used, or
the statistical estimate provided by OSP. Should be "official"
or "statistical"
. (Defaults to "official"
.)
all_osp_fields
A logical scalar. Should all the meaningful
data fields from the OSP source be returned? (Defaults FALSE
)
national_data
A logical scalar. Should national values be
returned? (Defaults FALSE
)
...
Parameters passed to DataClass()
initialize
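As a rough illustration of how the two spellings of death_definition could be reconciled, the sketch below maps the short names onto the longer ones. The helper resolve_death_definition() is hypothetical, and the pairing of "of", "with" and "after" with def1, def2 and def3 is an assumption based on the order in which they are listed, not something confirmed by the package source.

```r
# Hypothetical helper: resolve a death_definition argument to a single column name.
resolve_death_definition <- function(death_definition = "of") {
  # Assumed correspondence between the short names and the OSP-style names.
  lookup <- c(
    of = "daily_deaths_def1",
    with = "daily_deaths_def2",
    after = "daily_deaths_def3"
  )
  if (death_definition %in% names(lookup)) {
    return(unname(lookup[death_definition]))
  }
  if (death_definition %in% lookup) {
    return(death_definition)
  }
  stop(
    "death_definition must be one of: ",
    paste(c(names(lookup), lookup), collapse = ", ")
  )
}

print(resolve_death_definition("with"))
```

Accepting both spellings in one place keeps the rest of a cleaning function free of repeated string checks.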
clone()
The objects of this class are cloneable with this method.
Lithuania$clone(deep = FALSE)
deep
Whether to make a deep clone.
https://hub.arcgis.com/datasets/d49a63c934be4f65a93b6273785a8449_0
Subnational data sources
Belgium
,
Brazil
,
Canada
,
Colombia
,
Covid19DataHub
,
Cuba
,
Estonia
,
France
,
Germany
,
Google
,
India
,
Italy
,
JHU
,
Mexico
,
Netherlands
,
SouthAfrica
,
Switzerland
,
UK
,
USA
## Not run:
region <- Lithuania$new(verbose = TRUE, steps = TRUE, get = TRUE)
## End(Not run)
The region codes for Lithuania
lithuania_codes
An object of class spec_tbl_df
(inherits from tbl_df
, tbl
, data.frame
) with 61 rows and 6 columns.
A tibble of region codes and related information, including ISO 3166:2 codes for counties (apskritis) and municipalities (savivaldybe), and noting which municipalities are city municipalities or regional municipalities.
Makes a github workflow yaml file for a given source to be used as an action to check the data as a github action.
make_github_workflow( source, workflow_path = paste0(".github/workflows/", source, ".yaml"), cron = "36 12 * * *" )
source |
character_array The name of the class to create the workflow for. |
workflow_path |
character_array The path to where the workflow file should be saved. Defaults to '.github/workflows/' |
cron |
character_array the cron time to run the tests, defaults to 36 12 * * *, following the minute, hour, day(month), month and day(week) format. |
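To make the five-field cron format concrete, the default value can be split into its named parts. This is only a reading aid written in base R; the field labels are chosen here for illustration and are not part of the package.

```r
# The default schedule from the usage above.
cron <- "36 12 * * *"

# Split on spaces and label each field in cron order.
fields <- strsplit(cron, " ", fixed = TRUE)[[1]]
names(fields) <- c("minute", "hour", "day_of_month", "month", "day_of_week")
print(fields)
```

Read this way, the default asks for a run at minute 36 of hour 12 on every day of every month. GitHub Actions evaluates scheduled workflows in UTC.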
Makes a new regional or national country class with the name provided as the source. This forms a basic template for the user to fill in with the specific field values and cleaning functions required. This also creates a github workflow file for the same country.
make_new_data_source( source, type = "subnational", newfile_path = paste0("R/", source, ".R") )
source |
character_array The name of the class to create. Must start with a capital letter (be upper camel case or an acronym in all caps such as WHO). |
type |
character_array the type of class to create, subnational or
National defaults to subnational. Regional classes are individual countries,
such as UK, Italy, India, etc. These inherit from |
newfile_path |
character_array the place to save the class file |
A wrapper for message
that only prints output when
verbose = TRUE
.
message_verbose(verbose = TRUE, ...)
verbose |
Logical, defaults to |
... |
Additional arguments passed to |
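A wrapper of this kind can be written in a few lines. The function below is a stand-in named message_verbose_sketch() to make clear that it is an illustration of the described behaviour and not the package's own source.

```r
# Illustrative stand-in: forward ... to message() only when verbose is TRUE.
message_verbose_sketch <- function(verbose = TRUE, ...) {
  if (verbose) {
    message(...)
  }
  invisible(NULL)
}

message_verbose_sketch(verbose = TRUE, "Downloading data")
message_verbose_sketch(verbose = FALSE, "This message is suppressed")
```

Only the first call produces output, so callers can pass their own verbose flag straight through without wrapping every message in an if statement.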
Information for downloading, cleaning and processing COVID-19 region data for Mexico.
Notes on region codes:
Level 1 codes = ISO-3166-2, source: https://en.wikipedia.org/wiki/ISO_3166-2:MX
Level 2 codes = INEGI Mexican official statistics geocoding, source: raw data
Level 1 INEGI codes are the first 2 characters of Level 2 INEGI codes
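The relationship between the two INEGI code levels can be shown with substr(). The codes below are illustrative strings in an assumed five-character format (two characters for the state followed by three for the municipality); they are not taken from mexico_codes.

```r
# Illustrative level 2 INEGI codes, kept as character to preserve leading zeros.
level_2_codes <- c("01001", "09010", "19039")

# The level 1 INEGI code is the first two characters of each level 2 code.
level_1_inegi <- substr(level_2_codes, 1, 2)
print(level_1_inegi)
```

Keeping the codes as character strings matters here, since converting them to numbers would drop the leading zero and break the prefix match.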
covidregionaldata::DataClass
-> Mexico
origin
name of origin to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
common_data_urls
List of named links to raw data.
level_data_urls
List of named links to raw data that are level specific.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
set_region_codes()
Set up a table of region codes for clean data
Mexico$set_region_codes()
download()
Data download()
function for Mexico data. This replaces
the generic download function in DataClass()
. To get the latest data
use a PHP script from the website.
Mexico$download()
clean_common()
Common Data Cleaning
Mexico$clean_common()
clean_level_1()
Estados Level Data Cleaning
Mexico$clean_level_1()
clean_level_2()
Municipality Level Data Cleaning
Mexico$clean_level_2()
clone()
The objects of this class are cloneable with this method.
Mexico$clone(deep = FALSE)
deep
Whether to make a deep clone.
https://datos.covid-19.conacyt.mx/
Subnational data sources
Belgium
,
Brazil
,
Canada
,
Colombia
,
Covid19DataHub
,
Cuba
,
Estonia
,
France
,
Germany
,
Google
,
India
,
Italy
,
JHU
,
Lithuania
,
Netherlands
,
SouthAfrica
,
Switzerland
,
UK
,
USA
## Not run:
region <- Mexico$new(verbose = TRUE, steps = TRUE, get = TRUE)
region$return()
## End(Not run)
Details of the region codes used for the Mexico dataset.
mexico_codes
An object of class spec_tbl_df
(inherits from tbl_df
, tbl
, data.frame
) with 2489 rows and 4 columns.
A nested tibble of region codes and related information.
Class for downloading, cleaning and processing COVID-19 sub-regional data for the Netherlands, provided by RIVM (English: National Institute for Public Health and the Environment). These data contain the number of newly reported cases (that have tested positive), the number of newly reported hospital admissions and the number of newly reported deaths going back to 27/02/2020. Data are provided at both the province and municipality level.
covidregionaldata::DataClass
-> Netherlands
origin
name of origin to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
common_data_urls
List of named links to raw data. The first, and only, entry is named main.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
set_region_codes()
Set up a table of region codes for clean data
Netherlands$set_region_codes()
clean_common()
Common cleaning steps to be applied to raw data, regardless of level (province or municipality) for raw Netherlands data.
Netherlands$clean_common()
clean_level_1()
Netherlands specific province level data cleaning. Takes
the data cleaned by clean_common
and aggregates it to the Province
level (level 1).
Netherlands$clean_level_1()
clone()
The objects of this class are cloneable with this method.
Netherlands$clone(deep = FALSE)
deep
Whether to make a deep clone.
Subnational data sources
Belgium
,
Brazil
,
Canada
,
Colombia
,
Covid19DataHub
,
Cuba
,
Estonia
,
France
,
Germany
,
Google
,
India
,
Italy
,
JHU
,
Lithuania
,
Mexico
,
SouthAfrica
,
Switzerland
,
UK
,
USA
## Not run:
region <- Netherlands$new(verbose = TRUE, steps = TRUE, get = TRUE)
region$return()
## End(Not run)
Internal shared regional data cleaning designed to be called
by process
.
process_internal( clean_data, level, group_vars, totals = FALSE, localise = TRUE, verbose = TRUE, process_fns )
clean_data |
The clean data for a class, e.g. |
level |
The level of the data, e.g. 'level_1_region' |
group_vars |
Grouping variables, used for grouping and to localise names. It is assumed that the first entry indicates the main region variable and the second indicates the geocode for this variable. |
totals |
Logical, defaults to |
localise |
Logical, defaults to |
verbose |
Logical, defaults to |
process_fns |
array, additional functions to be called after default processing steps |
Functions used in the processing pipeline
run_default_processing_fns()
,
run_optional_processing_fns()
Controls the grouping variables used in
process_internal
based on the supported regions present in the
class.
region_dispatch(level, all_levels, region_names, region_codes)
level |
A character string indicating the current level. |
all_levels |
A character vector indicating all the levels supported. |
region_names |
A named list of region names named after the levels supported. |
region_codes |
A named list of region codes named after the levels supported. |
Reset Cache and Update all Local Data
reset_cache()
Null
Controls data return for get_regional_data
and
get_national_data
return_data(obj, class = FALSE)
obj |
A Class based on a |
class |
Logical, defaults to FALSE. If TRUE returns the
|
The default processing steps, which are always run. Runs on clean data.
run_default_processing_fns(data)
data |
A data table |
Functions used in the processing pipeline
process_internal()
,
run_optional_processing_fns()
User-supplied processing steps, which are run after the default steps.
run_optional_processing_fns(data, process_fns)
data |
A data table |
process_fns |
array, additional functions to be called after default processing steps |
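One way to picture "additional functions called after the default steps" is a loop that feeds the output of each function into the next. The sketch below is an assumption about the general pattern, written in base R with hypothetical helper names, and does not reproduce the package's implementation.

```r
# Illustrative stand-in: apply each user-supplied function to the data in turn.
run_optional_sketch <- function(data, process_fns = list()) {
  for (fn in process_fns) {
    data <- fn(data)
  }
  data
}

# Two hypothetical processing steps whose order matters.
order_by_date <- function(d) d[order(d$date), , drop = FALSE]
add_running_total <- function(d) {
  d$cases_total <- cumsum(d$cases_new)
  d
}

raw <- data.frame(
  date = as.Date(c("2021-01-03", "2021-01-01", "2021-01-02")),
  cases_new = c(3, 5, 2)
)
out <- run_optional_sketch(raw, list(order_by_date, add_running_total))
print(out)
```

Because the functions run in the order supplied, the running total is computed on date-sorted rows; swapping the two steps would give a different result.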
Functions used in the processing pipeline
process_internal()
,
run_default_processing_fns()
Set data values to 0 if they are negative in a dataset. Data in the datasets should always be >= 0.
set_negative_values_to_zero(data)
data |
A data frame |
A data frame with all relevant data >= 0.
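The effect of this step can be reproduced on a toy data frame. The function below is a base R stand-in that treats every numeric column as "relevant"; which columns the package actually touches is not stated here, so that choice is an assumption.

```r
# Illustrative stand-in: replace negative values in numeric columns with 0.
set_negative_sketch <- function(data) {
  numeric_cols <- vapply(data, is.numeric, logical(1))
  data[numeric_cols] <- lapply(data[numeric_cols], function(x) {
    x[!is.na(x) & x < 0] <- 0
    x
  })
  data
}

example <- data.frame(
  region = c("A", "B", "C"),
  cases_new = c(4, -1, NA),
  deaths_new = c(-3, 0, 2)
)
cleaned <- set_negative_sketch(example)
print(cleaned)
```

Missing values are left as NA and only negative entries are changed, so the result is non-negative wherever a value is present.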
Optional processing function
totalise_data()
Information for downloading, cleaning and processing COVID-19 region data for South Africa.
covidregionaldata::DataClass
-> SouthAfrica
origin
name of origin to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
common_data_urls
List of named links to raw data.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
set_region_codes()
Set up a table of region codes for clean data
SouthAfrica$set_region_codes()
clean_common()
Province level data cleaning
SouthAfrica$clean_common()
clone()
The objects of this class are cloneable with this method.
SouthAfrica$clone(deep = FALSE)
deep
Whether to make a deep clone.
https://github.com/dsfsi/covid19za/
Subnational data sources
Belgium
,
Brazil
,
Canada
,
Colombia
,
Covid19DataHub
,
Cuba
,
Estonia
,
France
,
Germany
,
Google
,
India
,
Italy
,
JHU
,
Lithuania
,
Mexico
,
Netherlands
,
Switzerland
,
UK
,
USA
## Not run:
region <- SouthAfrica$new(verbose = TRUE, steps = TRUE, get = TRUE)
region$return()
## End(Not run)
Adds useMemoise to options meaning memoise is used when reading data in.
start_using_memoise(path = tempdir(), verbose = TRUE)
path |
Path to cache directory, defaults to a temporary directory. |
verbose |
Logical, defaults to |
Sets useMemoise in options to NULL, meaning memoise isn't used when reading data in
stop_using_memoise()
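The option toggling described for these two functions can be mimicked with base R's options(). The value TRUE used below is an assumption; the text only says that the option is added and later set to NULL.

```r
# Remember any existing setting so it can be restored afterwards.
old <- getOption("useMemoise")

# Mimic switching caching on (the value TRUE is assumed).
options(useMemoise = TRUE)
enabled <- isTRUE(getOption("useMemoise"))

# Mimic switching caching off: setting an option to NULL removes it.
options(useMemoise = NULL)
disabled <- is.null(getOption("useMemoise"))

# Restore whatever was set before this sketch ran.
options(useMemoise = old)
print(c(enabled = enabled, disabled = disabled))
```

Code that reads data can then branch on whether getOption("useMemoise") is NULL to decide if cached results should be used.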
Information for downloading, cleaning and processing COVID-19 region data for Switzerland
covidregionaldata::DataClass
-> Switzerland
origin
name of origin to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
common_data_urls
List of named links to raw data. This url links to a JSON file which provides the addresses for the most recently-updated CSV files, which are then downloaded.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
set_region_codes()
Set up a table of region codes for clean data
Switzerland$set_region_codes()
download()
Download function to get raw data. Downloads
the updated list of CSV files using download_JSON
, filters
that to identify the required CSV files, then uses the parent
method download
to download the CSV files.
Switzerland$download()
clean_common()
Switzerland specific state level data cleaning
Switzerland$clean_common()
clone()
The objects of this class are cloneable with this method.
Switzerland$clone(deep = FALSE)
deep
Whether to make a deep clone.
Subnational data sources
Belgium
,
Brazil
,
Canada
,
Colombia
,
Covid19DataHub
,
Cuba
,
Estonia
,
France
,
Germany
,
Google
,
India
,
Italy
,
JHU
,
Lithuania
,
Mexico
,
Netherlands
,
SouthAfrica
,
UK
,
USA
## Not run:
region <- Switzerland$new(verbose = TRUE, steps = TRUE, get = TRUE)
region$return()
## End(Not run)
Test data can be cleaned properly. The clean method is invoked
to generate clean data. This data is checked to ensure it is a data.frame,
is not empty, has at least two columns and that columns are clean by calling
expect_clean_cols
. Also tests that available_regions()
are not NA and
they are all characters.
test_cleaning(DataClass_obj)
DataClass_obj |
The R6Class object to perform checks on.
Must be a |
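The structural checks listed in the description can be approximated with stopifnot(). This is a plain base R sketch on a toy data frame; the column name level_1_region is borrowed from elsewhere in this manual, and the real function additionally calls expect_clean_cols, which is not reproduced here.

```r
# Illustrative stand-in for the structural checks on cleaned data.
check_clean_sketch <- function(clean_data) {
  stopifnot(
    is.data.frame(clean_data),
    nrow(clean_data) > 0,
    ncol(clean_data) >= 2
  )
  # Region names should be character and contain no missing values.
  regions <- unique(clean_data$level_1_region)
  stopifnot(is.character(regions), !anyNA(regions))
  invisible(TRUE)
}

toy <- data.frame(
  date = as.Date(c("2021-01-01", "2021-01-02")),
  level_1_region = c("North", "South"),
  cases_new = c(3, 7),
  stringsAsFactors = FALSE
)
result <- check_clean_sketch(toy)
print(result)
```

An empty data frame, a single-column data frame or a region column containing NA would each stop with an error instead of returning TRUE.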
Functions used for testing data is cleaned and processed correctly
expect_clean_cols()
,
expect_columns_contain_data()
,
expect_processed_cols()
,
test_download_JSON()
,
test_download()
,
test_processing()
,
test_return()
Test data can be downloaded if download = TRUE
, or a requested
snapshot file is not found, and stores a snapshot in the snapshot_dir
. If
an existing snapshot file is found then load this data to use in future tests
test_download(DataClass_obj, download, snapshot_path)
DataClass_obj |
The R6Class object to perform checks on.
Must be a |
download |
Logical check to download or use a snapshot of the data |
snapshot_path |
character_array the path to save the downloaded snapshot to. |
Functions used for testing data is cleaned and processed correctly
expect_clean_cols()
,
expect_columns_contain_data()
,
expect_processed_cols()
,
test_cleaning()
,
test_download_JSON()
,
test_processing()
,
test_return()
Test data can be downloaded if download = TRUE
, or a requested
snapshot file is not found, and stores a snapshot in the snapshot_dir
. If
an existing snapshot file is found then load this data to use in future tests
test_download_JSON(DataClass_obj, download, snapshot_path)
DataClass_obj |
The R6Class object to perform checks on.
Must be a |
download |
Logical check to download or use a snapshot of the data |
snapshot_path |
character_array the path to save the downloaded snapshot to. |
Functions used for testing data is cleaned and processed correctly
expect_clean_cols()
,
expect_columns_contain_data()
,
expect_processed_cols()
,
test_cleaning()
,
test_download()
,
test_processing()
,
test_return()
Test data can be processed correctly using the process method.
process is invoked to generate processed data which is then checked to ensure
it is a data.frame, which is not empty, has at least 2 columns and calls
expect_processed_cols
to check each column's type.
test_processing(DataClass_obj, all = FALSE)
DataClass_obj |
The R6Class object to perform checks on.
Must be a |
all |
Logical. Run tests with all settings (TRUE) or with those defined in the current class instance (FALSE). Defaults to FALSE. |
Functions used for testing data is cleaned and processed correctly
expect_clean_cols()
,
expect_columns_contain_data()
,
expect_processed_cols()
,
test_cleaning()
,
test_download_JSON()
,
test_download()
,
test_return()
Test data can be returned correctly using the return method. return is invoked to generate returned data which is then checked to ensure it is a data.frame, not empty and has at least 2 columns. Each column is then checked to ensure it contains data and is not just composed of NAs.
test_return(DataClass_obj)
DataClass_obj |
The R6Class object to perform checks on.
Must be a |
Functions used for testing data is cleaned and processed correctly
expect_clean_cols()
,
expect_columns_contain_data()
,
expect_processed_cols()
,
test_cleaning()
,
test_download_JSON()
,
test_download()
,
test_processing()
Get totals data given the time series data.
totalise_data(data)
data |
A data table |
A data table, totalled up
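A minimal picture of "totalling up" a time series is a per-region sum of the daily counts. The sketch below assumes that reading and uses made-up column names; the package's own function may group or name its output differently.

```r
# Hypothetical daily time series for two regions.
daily <- data.frame(
  region = c("North", "North", "South", "South"),
  cases_new = c(10, 15, 4, NA),
  deaths_new = c(1, 0, 0, 2),
  stringsAsFactors = FALSE
)

# One row per region, with daily counts summed and missing days ignored.
totals <- data.frame(region = sort(unique(daily$region)), stringsAsFactors = FALSE)
totals$cases_total <- as.numeric(
  tapply(daily$cases_new, daily$region, sum, na.rm = TRUE)[totals$region]
)
totals$deaths_total <- as.numeric(
  tapply(daily$deaths_new, daily$region, sum, na.rm = TRUE)[totals$region]
)
print(totals)
```

The result has no date column, since each region's series has been collapsed into a single set of totals.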
Optional processing function
set_negative_values_to_zero()
Extracts daily COVID-19 data for the UK, stratified by region and nation. Additional options for this class are: to return subnational English regions using NHS region boundaries instead of PHE boundaries (nhsregions = TRUE), a release date to download from (release_date) and a geographical resolution (resolution).
covidregionaldata::DataClass
-> UK
origin
name of origin to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
common_data_urls
List of named links to raw data. The first, and only, entry is named main.
level_data_urls
List of named links to raw data that are level specific.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
query_filters
Set what filters to use to query the data
nhsregions
Whether to include NHS regions in the data
release_date
The release date for the data
resolution
The resolution of the data to return
authority_data
The raw data for creating authority lookup tables
set_region_codes()
Specific function for getting region codes for the UK.
UK$set_region_codes()
download()
UK specific download()
function.
UK$download()
clean_level_1()
Region Level Data Cleaning
UK$clean_level_1()
clean_level_2()
Level 2 Data Cleaning
UK$clean_level_2()
new()
Initialize the UK Class
UK$new(nhsregions = FALSE, release_date = NULL, resolution = "utla", ...)
nhsregions
Return subnational English regions using NHS region boundaries instead of PHE boundaries.
release_date
Date data was released. Default is to extract latest release. Dates should be in the format "yyyy-mm-dd".
resolution
"utla" (default) or "ltla", depending on which geographical resolution is preferred
...
Optional arguments passed to DataClass()
initialize.
\dontrun{
UK$new(
  level = 1, localise = TRUE,
  verbose = TRUE, steps = FALSE,
  nhsregions = FALSE, release_date = NULL,
  resolution = "utla"
)
}
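Since release_date is expected in "yyyy-mm-dd" form, a light check before constructing the class can catch other layouts early. The helper parse_release_date() below is hypothetical and is not part of the package; it only shows how base R's as.Date() can be used for this.

```r
# Hypothetical helper: check that a release date follows the yyyy-mm-dd layout.
parse_release_date <- function(release_date = NULL) {
  # NULL means "use the latest release", so there is nothing to check.
  if (is.null(release_date)) {
    return(NULL)
  }
  parsed <- as.Date(release_date, format = "%Y-%m-%d")
  if (is.na(parsed)) {
    stop("release_date should be in the format \"yyyy-mm-dd\"")
  }
  parsed
}

print(parse_release_date("2021-04-07"))
```

This is a loose check: as.Date() rejects layouts such as "07/04/2021" and impossible dates, but it tolerates some variations such as missing zero padding.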
download_filter()
Helper function for downloading data API
UK$download_filter(filter)
filter
region filters
set_filters()
Set filters for UK data api query.
UK$set_filters()
download_nhs_regions()
Download NHS data for level 1 regions. Separate NHS data are available for "first" admissions, excluding readmissions. These are available for England and English regions only. Data are available separately for the periods 2020-08-01 to 2021-04-06, and 2021-04-07 to present. See: https://www.england.nhs.uk/statistics/statistical-work-areas/covid-19-hospital-activity/ Section 2, "2. Estimated new hospital cases"
UK$download_nhs_regions()
nhs data.frame of nhs regions
add_nhs_regions()
Add NHS data for level 1 regions. Separate NHS data are available for "first" admissions, excluding readmissions. These are available for England and English regions only. See: https://www.england.nhs.uk/statistics/statistical-work-areas/covid-19-hospital-activity/ Section 2, "2. Estimated new hospital cases"
UK$add_nhs_regions(clean_data, nhs_data)
clean_data
Cleaned UK covid-19 data
nhs_data
NHS region data
specific_tests()
Specific tests for UK data. In addition to the generic tests run
by DataClass$test()
, data for NHS regions are downloaded and run through
the same generic checks (test_cleaning, test_processing, test_return). If
download = TRUE or a snapshot file is not found, the nhs data is
downloaded and saved to the snapshot location provided. If an existing
snapshot file is found then this data is used in the next tests.
Tests that data can be downloaded, cleaned, processed and returned. Designed
to be run from test
and not run directly.
UK$specific_tests( self_copy, download = FALSE, all = FALSE, snapshot_path = "", ... )
self_copy
R6class the object to test.
download
logical. To download the data (TRUE) or use a snapshot (FALSE). Defaults to FALSE.
all
logical. Run tests with all settings (TRUE) or with those defined in the current class instance (FALSE). Defaults to FALSE.
snapshot_path
character_array the path to save the downloaded
snapshot to. Works on the snapshot path constructed by test
but adds
...
Additional parameters to pass to specific_tests
clone()
The objects of this class are cloneable with this method.
UK$clone(deep = FALSE)
deep
Whether to make a deep clone.
https://coronavirus.data.gov.uk/details/download
Subnational data sources
Belgium
,
Brazil
,
Canada
,
Colombia
,
Covid19DataHub
,
Cuba
,
Estonia
,
France
,
Germany
,
Google
,
India
,
Italy
,
JHU
,
Lithuania
,
Mexico
,
Netherlands
,
SouthAfrica
,
Switzerland
,
USA
## Not run:
# set up a data cache
start_using_memoise()

# download, clean and process level 1 UK data with hospital admissions
region <- UK$new(level = "1", nhsregions = TRUE)
region$return()

# initialise level 2 data
utla <- UK$new(level = "2")

# download UTLA data
utla$download()

# clean UTLA data
utla$clean()

# inspect available level 1 regions
utla$available_regions(level = "1")

# filter data to the East of England
utla$filter("East of England")

# process UTLA data
utla$process()

# return processed and filtered data
utla$return()

# inspect all data steps
utla$data
## End(Not run)

## ------------------------------------------------
## Method `UK$new`
## ------------------------------------------------

## Not run:
UK$new(
  level = 1, localise = TRUE,
  verbose = TRUE, steps = FALSE,
  nhsregions = FALSE, release_date = NULL,
  resolution = "utla"
)
## End(Not run)
The UK authority lookup table, providing the region codes used for level 2 UK data.
uk_codes
An object of class tbl_df
(inherits from tbl
, data.frame
) with 429 rows and 4 columns.
A tibble of region codes and related information.
Information for downloading, cleaning and processing COVID-19 region data for USA.
covidregionaldata::DataClass
-> USA
origin
name of origin to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
level_data_urls
List of named links to raw data that are level specific.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
set_region_codes()
Set up a table of region codes for clean data
USA$set_region_codes()
clean_level_1()
State Level Data Cleaning
USA$clean_level_1()
clean_level_2()
County Level Data Cleaning
USA$clean_level_2()
clone()
The objects of this class are cloneable with this method.
USA$clone(deep = FALSE)
deep
Whether to make a deep clone.
https://github.com/nytimes/covid-19-data/
Subnational data sources
Belgium
,
Brazil
,
Canada
,
Colombia
,
Covid19DataHub
,
Cuba
,
Estonia
,
France
,
Germany
,
Google
,
India
,
Italy
,
JHU
,
Lithuania
,
Mexico
,
Netherlands
,
SouthAfrica
,
Switzerland
,
UK
## Not run:
region <- USA$new(verbose = TRUE, steps = TRUE, get = TRUE)
region$return()
## End(Not run)
The region codes for Viet Nam
vietnam_codes
An object of class data.frame
with 63 rows and 2 columns.
A tibble of region codes and related information.
Information for downloading, cleaning and processing COVID-19 region data from the World Health Organisation
covidregionaldata::DataClass
-> covidregionaldata::CountryDataClass
-> WHO
origin
name of origin to fetch data for
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
common_data_urls
List of named links to raw data. The first, and only, entry is named main.
source_data_cols
existing columns within the raw data
source_text
Plain text description of the source of the data
source_url
Website address for explanation/introduction of the data
clean_common()
WHO specific data cleaning
WHO$clean_common()
return()
Specific return settings for the WHO dataset.
WHO$return()
specific_tests()
Run additional tests on WHO data. Tests that there is only
one row per country. Designed to be run from test
and not run directly.
WHO$specific_tests(self_copy, ...)
self_copy
R6class the object to test
...
Extra params passed to specific download functions
clone()
The objects of this class are cloneable with this method.
WHO$clone(deep = FALSE)
deep
Whether to make a deep clone.
National data sources
Covid19DataHub
,
ECDC
,
Google
,
JHU
,
JRC
## Not run:
national <- WHO$new(verbose = TRUE, steps = TRUE, get = TRUE)
national$return()
## End(Not run)