
Refine the "raw partial dictionary"
sort_partial_dictionary.Rd
This function refines the raw partial dictionary created by create_partial_dictionary(). It reads the raw dictionary file, sorts the column names by frequency (across all files) and alphabetically, and creates some columns with descriptive statistics for each variable. The sorted (refined) dictionary is saved to an xlsx file.
Arguments
- old_dict_path
Path to the raw dictionary file.
- new_dict_path
Path to save the sorted (refined) dictionary. If NULL, the dictionary is saved to old_dict_path (the default is NULL).
- overwrite
Logical indicating whether to overwrite the existing dictionary file if new_dict_path already exists. Its default value is FALSE to avoid undesired changes.
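A minimal usage sketch of the arguments above. The file paths are hypothetical, the xlsx extension of the raw dictionary is an assumption, and the call is guarded so that it only runs when the dataRC package is installed and the raw dictionary actually exists:

```r
# Hypothetical paths, used only for illustration.
raw_dict_path <- file.path("dictionaries", "raw_dict.xlsx")
sorted_dict_path <- file.path("dictionaries", "sorted_dict.xlsx")

# Run only when the package is available and the raw dictionary exists.
if (requireNamespace("dataRC", quietly = TRUE) && file.exists(raw_dict_path)) {
  dataRC::sort_partial_dictionary(
    old_dict_path = raw_dict_path,
    new_dict_path = sorted_dict_path, # write to a separate file
    overwrite = FALSE                 # default: do not replace an existing file
  )
}
```

Writing to a separate new_dict_path keeps the raw dictionary untouched, which can be convenient while the refined version is still being reviewed.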
Value
None. The function saves the sorted dictionary file to the specified location. The dictionary has the following columns:
- uniname: Suggested unifying name for each variable. It groups variable names that are identical across files, ignoring case.
- uniclass: Empty column intended to be filled manually by the user with the unifying class for each variable. In theory it could hold any value, but values supported by unify_classes() are recommended, since the column is intended to be used with that function.
- coverage: The percentage of files that have a match with the uniname.
- class_mode: The class mode per uniname, computed with get_mode().
- unique_classes: All the classes per uniname.
- One column per file, named as the file itself, excluding the folder path. Using only the file name ensures that the dictionary remains valid even if the dataset is moved to a different location, as long as the file structure is preserved.
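The summary columns can be illustrated with a small base R sketch. This is not the package's implementation: the toy files, the helper mode_of() (a stand-in for get_mode()), the tie-breaking rule, and the descending sort order are all assumptions made for illustration, and the per-file columns are omitted.

```r
# Toy input: column names and classes found in each (hypothetical) file.
files <- list(
  "a.csv" = c(ID = "integer", Name = "character"),
  "b.csv" = c(id = "character", name = "character", age = "numeric"),
  "c.csv" = c(Id = "integer", AGE = "numeric")
)

# Hypothetical stand-in for get_mode(): most frequent value;
# ties go to the value that sorts first alphabetically.
mode_of <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Long table with one row per (file, column); names are lower-cased
# so that ID, id and Id share the same uniname.
long <- do.call(rbind, lapply(names(files), function(f) {
  data.frame(
    file = f,
    uniname = tolower(names(files[[f]])),
    class = unname(files[[f]]),
    stringsAsFactors = FALSE
  )
}))

by_name <- split(long, long$uniname)
dict <- data.frame(
  uniname = names(by_name),
  uniclass = NA_character_, # left empty, to be filled manually
  coverage = vapply(by_name, function(d) {
    100 * length(unique(d$file)) / length(files)
  }, numeric(1), USE.NAMES = FALSE),
  class_mode = vapply(by_name, function(d) mode_of(d$class),
                      character(1), USE.NAMES = FALSE),
  unique_classes = vapply(by_name, function(d) {
    paste(sort(unique(d$class)), collapse = ", ")
  }, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

# Sort by frequency (assumed descending), then alphabetically.
dict <- dict[order(-dict$coverage, dict$uniname), ]
rownames(dict) <- NULL
```

In this toy case id appears in all three files, so it is listed first with a coverage of 100, while age and name each appear in two of the three files and follow in alphabetical order.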
Details
After the dictionary is created, the user must manually validate the uniname suggestions and make the necessary changes. The user must also complete the uniclass column with values supported by unify_classes(). After this, it is recommended to use the functions unify_colnames(), unify_classes() and relocate_columns() for efficient data processing.
See also
For a full example, see the vignette process_data_with_partial_dict on the website or with the command vignette('process_data_with_partial_dict', package = 'dataRC').