
This function belongs to a group of functions designed for situations in which equivalent data may be stored heterogeneously (e.g. with different column names and data types).

Usage

get_unifying_file_info(dict, file, selected_columns)

Arguments

dict

Data frame that represents a refined unifying dictionary (possibly created by sort_partial_dictionary() and then refined by the user) and contains information about a group of files intended to be processed together. It must have at least three columns: uniname, uniclass, and a column named after the file.

file

String that represents a file name that is part of the group of files (i.e. is a column name in dict).

selected_columns

Atomic vector that is a subset of dict's uniname column. In other words, it is the set of desired columns, each of which should be present in at least one file of the group.

Value

A list with three fields:

  • original_colnames: Atomic character vector that contains the names of the file's columns that correspond to the uninames in selected_columns.

  • new_colnames: Atomic character vector that contains the uninames associated with the file's original_colnames. Note that its elements are ordered to align with original_colnames.

  • new_classes: Atomic character vector that contains the uniclasses associated with the file's original_colnames. Note that its elements are ordered to align with original_colnames.
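
To make the shape of the returned list concrete, the following base R sketch builds a small dictionary and derives the three fields from it. It is not the package's implementation: the helper sketch_file_info(), the dictionary contents, and the convention that a unified variable absent from a file is marked with NA are all assumptions made for illustration.

```r
# Hypothetical dictionary: one row per unified variable, plus one column
# (named after the file) holding that file's original column names.
# NA marks a variable that the file does not contain (an assumption).
dict <- data.frame(
  uniname  = c("YEAR", "MONTH", "SEX", "AGE"),
  uniclass = c("integer", "integer", "character", "integer"),
  `example.parquet` = c("yr", "mes", NA, "age"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper that mimics the documented return value.
sketch_file_info <- function(dict, file, selected_columns) {
  # Keep rows whose uniname was requested and that exist in this file.
  keep <- dict$uniname %in% selected_columns & !is.na(dict[[file]])
  list(
    original_colnames = dict[[file]][keep],
    new_colnames      = dict$uniname[keep],
    new_classes       = dict$uniclass[keep]
  )
}

info <- sketch_file_info(dict, "example.parquet", c("YEAR", "MONTH", "SEX"))
# info$original_colnames is c("yr", "mes"); SEX is dropped because this
# hypothetical file has no matching column.
```

Because all three vectors are filtered with the same logical index, they stay aligned position by position, which is the property described in the list above.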

Details

This function returns auxiliary information that is employed by unify_colnames() to unify the column names and by unify_classes() to unify the column data types across a group of files.
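
As a rough illustration of how such auxiliary information can drive renaming and type conversion, the base R sketch below applies a hand-written info list to a small data frame. It does not reproduce unify_colnames() or unify_classes(); the data frame, the info list, and the choice to cast with the base as.<class>() functions are assumptions.

```r
# Hypothetical raw data as it might be read from one file.
df <- data.frame(
  yr    = c("2020", "2021"),
  mes   = c("1", "2"),
  other = c("a", "b"),
  stringsAsFactors = FALSE
)

# Hypothetical list with the three documented fields.
info <- list(
  original_colnames = c("yr", "mes"),
  new_colnames      = c("YEAR", "MONTH"),
  new_classes       = c("integer", "integer")
)

# Select the mapped columns and rename them to their uninames.
out <- df[, info$original_colnames, drop = FALSE]
names(out) <- info$new_colnames

# Cast each column to its uniclass using the base as.<class>() function.
for (i in seq_along(info$new_colnames)) {
  cast <- match.fun(paste0("as.", info$new_classes[i]))
  out[[i]] <- cast(out[[i]])
}
```

The positional alignment of the three vectors is what allows a single index i to pick the original name, the new name, and the target class together.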

See also

For a full example, see the vignette process_data_with_partial_dict on the website or run the command vignette('process_data_with_partial_dict', package = 'dataRC').

Examples

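if (FALSE) {
# `dict` must be a data frame, not a file path. Here it is assumed that the
# refined dictionary was saved as 'dict.xlsx' and is read with readxl.
dict <- readxl::read_excel('dict.xlsx')
unifying_file_info <- get_unifying_file_info(
    dict, 'example.parquet', c('YEAR', 'MONTH', 'SEX'))
}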