
Partition a parquet file into multiple parquet files of a maximum size.
partition_data.Rd
This function partitions a parquet file into multiple partitions based on the specified maximum partition size.
Arguments
- original_file
Path to the original parquet file.
- partition_folder
Path to the folder where partitions must be stored.
- max_partition_size
Maximum size of each partition (the default is 25).
- units
Units of storage supported by files_size() (the default is 'mb').
Note
To improve performance, the size of each partition is estimated by
assuming that storage demand is homogeneous along the original file.
This assumption may be unrealistic, so max_partition_size does not
guarantee that the largest partition is at most this size.
This is especially true for small files/partitions, since the memory
gains from using parquet become weaker.
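
The estimate described in this note can be illustrated with a minimal base R sketch. It is not the package's implementation: plan_partitions() and its arguments file_size and n_rows are hypothetical names introduced here, and the sketch only assumes that every row takes the same amount of storage.

```r
# Hypothetical helper (not part of the package): plans partitions from sizes
# alone, assuming every row takes the same amount of storage.
plan_partitions <- function(file_size, n_rows, max_partition_size = 25) {
  stopifnot(file_size > 0, n_rows > 0, max_partition_size > 0)

  # Number of partitions needed so that the average partition fits the limit.
  n_partitions <- max(1L, as.integer(ceiling(file_size / max_partition_size)))

  # Spread the rows as evenly as possible across the partitions.
  rows_per_partition <- as.integer(ceiling(n_rows / n_partitions))

  list(
    n_partitions = n_partitions,
    rows_per_partition = rows_per_partition,
    # Only an average: actual partitions can be larger if rows are not uniform.
    expected_partition_size = file_size / n_partitions
  )
}

# A 110 mb file with one million rows and a 25 mb limit.
plan <- plan_partitions(file_size = 110, n_rows = 1e6, max_partition_size = 25)
```

Because the plan is based on an average, a partition holding rows that compress poorly could still exceed max_partition_size, which is the caveat stated above.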