Get necessary data for unify_colnames() and unify_classes()

This function belongs to a group of functions designed for the case where equivalent data is stored heterogeneously across files (e.g. with different column names and data types).

Usage

get_unifying_file_info(dict, file, selected_columns)

Arguments

dict: Data frame that represents a refined unifying dictionary (possibly created by sort_partial_dictionary() and refined by the user) containing information about a group of files intended to be processed together. It must have at least three columns: uniname, uniclass, and a column named after the file.
file: String with the name of a file that belongs to the group of files (i.e. it is a column name in dict).
selected_columns: Atomic vector that is a subset of the uniname column of dict. In other words, a set of desired columns, each of which should be present in at least one file of the group.

Value

A list with three fields:

original_colnames: Atomic character vector that contains the names of the file's columns that correspond to the uninames in selected_columns.
new_colnames: Atomic character vector that contains the uninames associated with the file's variables in original_colnames. Note that they are ordered to align with original_colnames.
new_classes: Atomic character vector that contains the uniclasses associated with the file's variables in original_colnames. Note that they are ordered to align with original_colnames.
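
To make the shape of the returned list concrete, here is a minimal base R sketch. It is not the package's implementation: sketch_file_info() is a hypothetical stand-in, the toy dictionary and its file names are made up, and it assumes that a variable missing from a file is marked with NA in that file's column of dict.

```r
# Toy unifying dictionary: one row per unified variable.
# The columns 'a.csv' and 'b.csv' hold each file's own column names.
# Assumption: NA marks a variable the file does not contain.
dict <- data.frame(
  uniname  = c("YEAR", "MONTH", "SEX"),
  uniclass = c("integer", "integer", "character"),
  a.csv    = c("yr", "mon", NA),
  b.csv    = c("year", NA, "sex"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for get_unifying_file_info(); not the package's code.
sketch_file_info <- function(dict, file, selected_columns) {
  # Keep the rows for requested uninames that the file actually contains.
  keep <- dict$uniname %in% selected_columns & !is.na(dict[[file]])
  list(
    original_colnames = dict[[file]][keep],
    new_colnames      = dict$uniname[keep],
    new_classes       = dict$uniclass[keep]
  )
}

info <- sketch_file_info(dict, "a.csv", c("YEAR", "MONTH", "SEX"))
str(info)
```

In this sketch the three vectors share the same row filter, which is what keeps new_colnames and new_classes aligned with original_colnames. The real function may order its output differently.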

Details

This function returns auxiliary information that is employed by unify_colnames() to unify the column names and by unify_classes() to unify the column data types across a group of files.
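
The following base R sketch illustrates how such auxiliary information could be applied to one file's data. It does not call unify_colnames() or unify_classes(), whose signatures are not documented here. The data frame raw, the hand-written info list, and the use of as.<class>() conversion functions are all assumptions made for illustration.

```r
# Made-up data as it might be read from one file of the group.
raw <- data.frame(
  yr    = c("2020", "2021"),
  mon   = c("1", "2"),
  other = c("x", "y"),
  stringsAsFactors = FALSE
)

# Hand-written list with the three fields described under Value.
info <- list(
  original_colnames = c("yr", "mon"),
  new_colnames      = c("YEAR", "MONTH"),
  new_classes       = c("integer", "integer")
)

# Step 1: keep the relevant columns and give them their unified names.
unified <- raw[, info$original_colnames, drop = FALSE]
names(unified) <- info$new_colnames

# Step 2: coerce each column to its unified class.
# Assumption: every class has a matching base converter such as as.integer().
for (i in seq_along(info$new_classes)) {
  convert <- match.fun(paste0("as.", info$new_classes[i]))
  unified[[i]] <- convert(unified[[i]])
}

str(unified)
```

Because the three vectors are aligned position by position, a single index i is enough to pair each column with its new name and class.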

Examples

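if (FALSE) {
# dict must be a data frame, so read the dictionary file first
# (readxl is an assumption here; use whichever reader fits your file)
dict <- readxl::read_excel('dict.xlsx')
unifying_file_info <- get_unifying_file_info(
    dict, 'example.parquet', c('YEAR', 'MONTH', 'SEX'))
}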
