This function refines the raw partial dictionary created by create_partial_dictionary(). It reads the raw dictionary file, sorts the column names by frequency (across all files) and then alphabetically, and adds columns with descriptive statistics for each variable. The sorted (refined) dictionary is saved to an xlsx file.

Usage

sort_partial_dictionary(old_dict_path, new_dict_path = NULL, overwrite = F)

Arguments

old_dict_path

Path to the raw dictionary file.

new_dict_path

Path where the sorted (refined) dictionary is saved. If NULL (the default), old_dict_path is used, so the sorted dictionary replaces the original file.

overwrite

Logical indicating whether to overwrite the existing dictionary file if new_dict_path already exists. Its default value is FALSE to avoid undesired changes.

Value

None. The function saves the sorted dictionary file to the specified location. The dictionary has the following columns:

  • uniname: Suggested unifying name for each variable. It groups variable names that are identical across files, ignoring case.

  • uniclass: Empty column intended to be filled in manually by the user with the unifying class for each variable. In principle any value can be entered, but it is recommended to use values supported by unify_classes(), since the column is intended to be used with that function.

  • coverage: The percentage of files that have a match with the uniname.

  • class_mode: The class mode per uniname computed with get_mode().

  • unique_classes: All the classes per uniname.

  • One column per file named as the file itself, excluding the folder path. Using only the file name ensures that the dictionary remains valid even if the dataset is moved to a different location, as long as the file structure is preserved.
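The statistics columns described above can be illustrated with a small base R sketch. This is not the package's implementation: the input table, the helper mode_of() (a stand-in for get_mode()), and the descending sort order are all assumptions made for illustration.

```r
# Hypothetical input: the class of each variable in three files
# (NA means the variable is absent from that file).
# All object names here are illustrative and not part of dataRC.
classes <- data.frame(
  file_a.csv = c("character", "numeric", NA),
  file_b.csv = c("character", "numeric", "Date"),
  file_c.csv = c(NA, "integer", NA),
  row.names = c("name", "id", "birth_date"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Stand-in for get_mode(): most frequent non-missing value;
# ties resolve to the first value encountered.
mode_of <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

stats <- data.frame(
  uniname = rownames(classes),
  # Percentage of files in which the variable appears.
  coverage = 100 * rowMeans(!is.na(classes)),
  class_mode = apply(classes, 1, mode_of),
  unique_classes = apply(classes, 1, function(x) {
    paste(sort(unique(x[!is.na(x)])), collapse = ", ")
  }),
  stringsAsFactors = FALSE
)

# Assumed ordering: most frequent variables first, then alphabetical.
stats <- stats[order(-stats$coverage, stats$uniname), ]
print(stats, row.names = FALSE)
```

In this toy input, id appears in every file and therefore sorts first, while birth_date appears in only one file and sorts last.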

Details

After the dictionary is created, the user must manually validate the uniname suggestions and make any necessary changes. The user must also complete the uniclass column with values supported by unify_classes(). After that, it is recommended to use the functions unify_colnames(), unify_classes() and relocate_columns() for efficient data processing.
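Before moving on to the unifying functions, it can help to confirm that no uniclass entry was left blank during manual editing. The following is a minimal sketch in base R, assuming the edited dictionary has already been loaded into a data frame; the object dict and its contents are hypothetical, and in practice the data would come from the xlsx file via an xlsx reader of your choice.

```r
# Hypothetical dictionary after manual editing (illustrative values only).
dict <- data.frame(
  uniname = c("id", "name", "birth_date"),
  uniclass = c("numeric", "", NA),
  stringsAsFactors = FALSE
)

# Variables whose uniclass is still missing or blank.
is_blank <- is.na(dict$uniclass) | trimws(dict$uniclass) == ""
pending <- dict$uniname[is_blank]

if (length(pending) > 0) {
  message("uniclass still empty for: ", paste(pending, collapse = ", "))
}
```

This check only detects empty cells; whether a filled-in value is actually supported depends on unify_classes() and is not verified here.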

See also

For a full example, see the vignette process_data_with_partial_dict on the website or run the command vignette('process_data_with_partial_dict', package = 'dataRC').

Examples

if (FALSE) {
# Sort a partial dictionary file
sort_partial_dictionary(old_dict_path = "original_dictionary.xlsx",
                        overwrite = TRUE)
}