This function is part of a group of functions intended to handle scenarios where equivalent data is stored heterogeneously across files (e.g. with different column names and data types).
Arguments
- dict
Data frame that represents a refined unifying dictionary (possibly created by sort_partial_dictionary() and refined by the user) that contains information about a group of files intended to be processed together. It must have at least three columns: uniname, uniclass and a column named after file.
- file
String with the name of a file that belongs to the group of files (i.e. a column name in dict).
- selected_columns
Atomic vector that is a subset of the uniname column of dict. In other words, it is the set of desired columns, each of which should be present in at least one file of the group.
Value
A list with three fields:
- original_colnames: Atomic character vector with the names of the file's columns that correspond to the uninames in selected_columns.
- new_colnames: Atomic character vector with the uninames associated with original_colnames, aligned element-wise with original_colnames.
- new_classes: Atomic character vector with the uniclasses associated with original_colnames, also aligned element-wise with original_colnames.
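To make the alignment of the three returned vectors concrete, here is a minimal base R sketch of the lookup. It assumes that the column of dict named after file holds, for each uniname, the file's original column name, and NA when the file lacks that variable. The helper sketch_unifying_file_info and the toy dictionary are hypothetical; this is not dataRC's actual implementation.

```r
# Hypothetical sketch, not dataRC's implementation.
# Assumption: dict[[file]] holds the file's original column name for each
# uniname, or NA when the file does not contain that variable.
sketch_unifying_file_info <- function(dict, file, selected_columns) {
  # Keep only the dictionary rows for the requested uninames
  rows <- dict[dict$uniname %in% selected_columns, , drop = FALSE]
  # Drop variables that this particular file does not contain (assumed NA)
  rows <- rows[!is.na(rows[[file]]), , drop = FALSE]
  # The three vectors come from the same rows, so they stay aligned
  list(
    original_colnames = as.character(rows[[file]]),
    new_colnames      = as.character(rows$uniname),
    new_classes       = as.character(rows$uniclass)
  )
}

# Toy dictionary describing two files (hypothetical data)
dict <- data.frame(
  uniname   = c("YEAR", "MONTH", "SEX", "AGE"),
  uniclass  = c("integer", "integer", "character", "numeric"),
  a.parquet = c("anio", "mes", "sexo", NA),
  b.parquet = c("year", "month", NA, "age"),
  check.names = FALSE, stringsAsFactors = FALSE
)

info <- sketch_unifying_file_info(dict, "b.parquet", c("YEAR", "MONTH", "SEX"))
str(info)
```

Because b.parquet has no entry for SEX in this toy dictionary, the sketch returns only YEAR and MONTH, in dictionary row order.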
Details
This function returns auxiliary information that is used by unify_colnames() to unify the column names and by unify_classes() to unify the column data types across a group of files.
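As a rough illustration of how such a list could be consumed, the sketch below renames and casts the columns of one file's data frame in base R. It is not the code of unify_colnames() or unify_classes(): apply_unifying_info is a hypothetical helper, and casting with methods::as() assumes that each uniclass is a class name that as() can coerce to (e.g. "integer" or "character").

```r
# Hypothetical helper; not the implementation of unify_colnames()/unify_classes().
apply_unifying_info <- function(df, info) {
  # Keep only the relevant columns, in the order given by original_colnames
  df <- df[, info$original_colnames, drop = FALSE]
  # Rename: new_colnames is aligned element-wise with original_colnames
  names(df) <- info$new_colnames
  # Cast each column to its unified class (assumes as() supports the class name)
  for (i in seq_along(info$new_classes)) {
    df[[i]] <- methods::as(df[[i]], info$new_classes[[i]])
  }
  df
}

# Example input shaped like the documented return value (hypothetical data)
info <- list(
  original_colnames = c("year", "month"),
  new_colnames      = c("YEAR", "MONTH"),
  new_classes       = c("integer", "integer")
)
raw <- data.frame(
  year = c("2020", "2021"), month = c(1, 2), other = c("x", "y"),
  stringsAsFactors = FALSE
)
unified <- apply_unifying_info(raw, info)
str(unified)
```

Relying on the element-wise alignment of the three vectors is what lets a single index i drive both the renaming and the casting.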
See also
For a full example, see the vignette process_data_with_partial_dict on the website, or run vignette('process_data_with_partial_dict', package = 'dataRC').
Examples
if (FALSE) {
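# dict must be a data frame, e.g. a refined dictionary read from a spreadsheet
dict <- readxl::read_excel('dict.xlsx')
unifying_file_info <- get_unifying_file_info(
  dict, 'example.parquet', c('YEAR', 'MONTH', 'SEX'))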
}