This function belongs to a group of functions for handling equivalent data that may be stored heterogeneously across files (e.g., under different column names and data types).
Arguments
- dict
Data frame representing a refined unifying dictionary (possibly created by sort_partial_dictionary() and then refined by the user) that describes a group of files intended to be processed together. It must have at least three columns: uniname, uniclass, and a column named after the file.
- file
String with the name of a file in the group of files (i.e., a column name in dict).
- selected_columns
Atomic vector that is a subset of dict's uniname column, i.e., the set of desired columns, each of which should appear in at least one file of the group.
Value
A list with three fields:
- original_colnames: Atomic character vector with the names of the file's columns that correspond to the uninames in selected_columns.
- new_colnames: Atomic character vector with the uninames associated with the file's original_colnames, aligned element-wise with original_colnames.
- new_classes: Atomic character vector with the uniclasses associated with the file's original_colnames, aligned element-wise with original_colnames.
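To make the alignment of the three fields concrete, the following is a minimal base-R sketch, not the dataRC implementation. It assumes that dict's file column holds each variable's original column name in that file and NA when the file lacks the variable; the helper sketch_unifying_file_info() and the toy dictionary are hypothetical.

```r
# Hypothetical sketch, NOT dataRC's get_unifying_file_info().
# Assumption: dict[[file]] holds each variable's original column name in
# that file, or NA when the file does not contain the variable.
sketch_unifying_file_info <- function(dict, file, selected_columns) {
  # Keep rows for requested uninames that actually exist in this file
  keep <- dict$uniname %in% selected_columns & !is.na(dict[[file]])
  rows <- dict[keep, , drop = FALSE]
  # The three vectors come from the same rows, so they stay aligned
  list(
    original_colnames = as.character(rows[[file]]),
    new_colnames      = as.character(rows$uniname),
    new_classes       = as.character(rows$uniclass)
  )
}

# Toy dictionary for two files whose columns are named differently
dict <- data.frame(
  uniname    = c("YEAR", "MONTH", "SEX"),
  uniclass   = c("integer", "integer", "character"),
  file_a.csv = c("ano", "mes", NA),
  file_b.csv = c("year", "month", "sex"),
  stringsAsFactors = FALSE
)

# SEX is dropped for file_a.csv because that file has no such column
info <- sketch_unifying_file_info(dict, "file_a.csv", c("YEAR", "SEX"))
str(info)
```

Because all three vectors are taken from the same filtered rows, element i of each vector always refers to the same variable.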
Details
This function returns auxiliary information that unify_colnames() uses to unify the column names, and that unify_classes() uses to unify the column data types, across a group of files.
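As an illustration of how such a list could drive renaming and type coercion for one file, here is a minimal base-R sketch. It is not the code of unify_colnames() or unify_classes(); the helper apply_unifying_info(), the toy data, and the restriction to four basic classes are assumptions made for the example.

```r
# Hypothetical sketch, NOT dataRC's unify_colnames()/unify_classes().
# Applies an info list (original_colnames, new_colnames, new_classes)
# to a single data frame.
apply_unifying_info <- function(df, info) {
  # Keep only the mapped columns, in the aligned order
  out <- df[, info$original_colnames, drop = FALSE]
  # Rename them to the unifying names
  names(out) <- info$new_colnames
  # Coerce each column to its unifying class (basic classes only)
  for (i in seq_along(info$new_classes)) {
    out[[i]] <- switch(info$new_classes[i],
      integer   = as.integer(out[[i]]),
      numeric   = as.numeric(out[[i]]),
      character = as.character(out[[i]]),
      logical   = as.logical(out[[i]]),
      stop("unsupported class: ", info$new_classes[i])
    )
  }
  out
}

# Toy file where numbers were stored as text under local column names
df_a <- data.frame(ano = c("2020", "2021"), mes = c("1", "2"),
                   stringsAsFactors = FALSE)
info_a <- list(
  original_colnames = c("ano", "mes"),
  new_colnames      = c("YEAR", "MONTH"),
  new_classes       = c("integer", "integer")
)
unified <- apply_unifying_info(df_a, info_a)
str(unified)
```

Running the same helper on every file of the group, each with its own info list, would yield data frames that share column names and classes and can therefore be stacked with rbind().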
See also
For a full example, see the vignette process_data_with_partial_dict on the package website, or run vignette('process_data_with_partial_dict', package = 'dataRC').
Examples
if (FALSE) {
unifying_file_info <- get_unifying_file_info(
'dict.xlsx', 'example.parquet', c('YEAR', 'MONTH', 'SEX'))
}
