disdrodb.utils package#
Submodules#
disdrodb.utils.compression module#
DISDRODB raw data compression utility.
- disdrodb.utils.compression.archive_station_data(metadata_filepath: str) str [source]#
Archive station data into a zip file for subsequent data upload.
It creates the zip file in a temporary directory.
- Parameters
metadata_filepath (str) – Metadata file path.
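As a rough illustration of the technique (not the library's implementation), a station data directory can be zipped into a temporary directory with the standard library. The helper name and return convention below are assumptions for the sketch:

```python
import shutil
import tempfile
from pathlib import Path

def zip_station_data(station_dir: str) -> str:
    """Zip a station data directory into a temporary directory (illustrative sketch).

    Returns the path of the created zip file. The caller is responsible
    for cleaning up the temporary directory afterwards.
    """
    station_dir = Path(station_dir)
    tmp_dir = tempfile.mkdtemp()
    # shutil.make_archive appends the ".zip" extension itself
    zip_base = str(Path(tmp_dir) / station_dir.name)
    return shutil.make_archive(zip_base, "zip", root_dir=station_dir)
```

Returning the zip path (rather than extracting in place) matches the documented use case of preparing an archive for a subsequent upload step.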
- disdrodb.utils.compression.compress_station_files(base_dir: str, data_source: str, campaign_name: str, station_name: str, method: str = 'gzip', skip: bool = True) None [source]#
Compress each raw file of a station.
- Parameters
base_dir (str) – Base directory of DISDRODB
data_source (str) – Name of data source of interest.
campaign_name (str) – Name of the campaign of interest.
station_name (str) – Station name of interest.
method (str) – Compression method. "zip", "gzip" or "bzip2".
skip (bool) – Whether to raise an error if a file is already compressed. If True, it does not raise an error and tries to compress the other files. If False, it raises an error and stops the compression routine. The default is True.
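The per-file compression with a skip flag can be sketched with the standard library as follows (illustrative only; the function name and the ValueError are assumptions, not the library's code):

```python
import gzip
import shutil
from pathlib import Path

def gzip_files_in_directory(dir_path: str, skip: bool = True) -> list:
    """Gzip every regular file in a directory (illustrative sketch).

    If ``skip`` is True, files that already carry a ``.gz`` suffix are
    skipped; if False, encountering one raises an error, mirroring the
    behaviour described for ``compress_station_files``.
    """
    compressed = []
    for path in sorted(Path(dir_path).iterdir()):
        if not path.is_file():
            continue
        if path.suffix == ".gz":
            if skip:
                continue
            raise ValueError(f"{path} is already compressed.")
        gz_path = path.with_name(path.name + ".gz")
        with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        compressed.append(str(gz_path))
    return compressed
```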
disdrodb.utils.directories module#
Define utilities for Directory/File Checks/Creation/Deletion.
- disdrodb.utils.directories.copy_file(src_filepath, dst_filepath)[source]#
Copy a file from one location to another.
- disdrodb.utils.directories.count_directories(dir_path, glob_pattern, recursive=False)[source]#
Return the number of directories (excluding files).
- disdrodb.utils.directories.count_files(dir_path, glob_pattern, recursive=False)[source]#
Return the number of files (excluding directories).
- disdrodb.utils.directories.create_directory(path: str, exist_ok=True) None [source]#
Create a directory at the provided path.
- disdrodb.utils.directories.create_required_directory(dir_path, dir_name)[source]#
Create the directory dir_name inside the dir_path directory.
- disdrodb.utils.directories.ensure_string_path(path, msg, accepth_pathlib=False)[source]#
Ensure that the path is a string.
- disdrodb.utils.directories.is_empty_directory(path)[source]#
Check if a directory path is empty.
Return False if the path is a file or a non-empty directory. If the path does not exist, raise an error.
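The documented behaviour can be sketched with the standard library (illustrative, not the library's code):

```python
import os

def is_empty_directory(path: str) -> bool:
    """Return True only if ``path`` is an existing, empty directory (sketch).

    Returns False for files and non-empty directories; raises OSError if
    the path does not exist, mirroring the documented behaviour.
    """
    if not os.path.exists(path):
        raise OSError(f"{path} does not exist.")
    if not os.path.isdir(path):
        return False
    return len(os.listdir(path)) == 0
```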
- disdrodb.utils.directories.list_directories(dir_path, glob_pattern, recursive=False)[source]#
Return a list of directory paths (exclude file paths).
- disdrodb.utils.directories.list_files(dir_path, glob_pattern, recursive=False)[source]#
Return a list of filepaths (exclude directory paths).
- disdrodb.utils.directories.list_paths(dir_path, glob_pattern, recursive=False)[source]#
Return a list of filepaths and directory paths.
- disdrodb.utils.directories.remove_if_exists(path: str, force: bool = False) None [source]#
Remove the file or directory if it exists and force=True. If force=False, it raises an error.
- disdrodb.utils.directories.remove_path_trailing_slash(path: str) str [source]#
Removes a trailing slash or backslash from a file path if it exists.
This function ensures that the provided file path is normalized by removing any trailing directory separator characters ('/' or '\\'). This is useful for maintaining consistency in path strings and for preparing paths for operations that may not expect a trailing slash.
- Parameters
path (str) – The file path to normalize.
- Returns
The normalized path without a trailing slash.
- Return type
str
- Raises
TypeError – If the input path is not a string.
Examples
>>> remove_path_trailing_slash("some/path/")
'some/path'
>>> remove_path_trailing_slash("another\\path\\")
'another\\path'
disdrodb.utils.logger module#
DISDRODB logger utility.
- disdrodb.utils.logger.close_logger(logger: logging.Logger) None [source]#
Close the logger.
- Parameters
logger (logger) – Logger object.
- disdrodb.utils.logger.create_file_logger(processed_dir, product, station_name, filename, parallel)[source]#
Create file logger.
- disdrodb.utils.logger.define_summary_log(list_logs)[source]#
Define a station summary and a problems log file from the list of input logs.
The summary log selects only the logged lines containing the root, WARNING and ERROR keywords. The problems log file selects only the logged lines containing the ERROR keyword. The two log files are saved in the parent directory of the input list_logs.
The function assumes that the log files are located at:
/DISDRODB/Processed/<DATA_SOURCE>/<CAMPAIGN_NAME>/logs/<product>/<station_name>/*.log
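The keyword-based selection can be sketched as follows (illustrative; the real function reads and writes log files rather than lists of lines):

```python
def split_logs(log_lines, summary_keywords=("root", "WARNING", "ERROR")):
    """Split log lines into a summary and a problems list (illustrative sketch).

    The summary keeps lines containing any of ``summary_keywords``; the
    problems list keeps only lines containing "ERROR", mirroring the
    selection described for ``define_summary_log``.
    """
    summary = [line for line in log_lines if any(k in line for k in summary_keywords)]
    problems = [line for line in log_lines if "ERROR" in line]
    return summary, problems
```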
- disdrodb.utils.logger.log_debug(logger: logging.Logger, msg: str, verbose: bool = False) None [source]#
Include debug entry into log.
- Parameters
logger (logger) – Log object.
msg (str) – Message.
verbose (bool, optional) – Whether to also print the message. The default is False.
- disdrodb.utils.logger.log_error(logger: logging.Logger, msg: str, verbose: bool = False) None [source]#
Include error entry into log.
- Parameters
logger (logger) – Log object.
msg (str) – Message.
verbose (bool, optional) – Whether to also print the message. The default is False.
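The log_* helpers pair a logger call with an optional console echo. A sketch of the pattern, assuming the verbose flag simply mirrors the message to stdout (an assumption about the exact behaviour):

```python
import logging

def log_error(logger: logging.Logger, msg: str, verbose: bool = False) -> None:
    """Record an error in the log and optionally echo it (illustrative sketch)."""
    logger.error(msg)
    if verbose:
        # Assumed behaviour: mirror the message to the console when verbose
        print(f" - {msg}")
```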
disdrodb.utils.netcdf module#
DISDRODB netCDF utility.
- disdrodb.utils.netcdf.ensure_monotonic_dimension(list_ds: list, filepaths: str, dim: str = 'time', verbose: bool = False) list [source]#
Ensure that a list of xr.Dataset has monotonically increasing (non-duplicated) dimension values.
- Parameters
list_ds (list) – List of xarray Dataset.
filepaths (list) – List of netCDF file paths.
dim (str, optional) – Dimension name. The default is "time".
- Returns
list – List of xarray Dataset.
list – List of netCDF file paths.
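The underlying idea, stripped of xarray, is to sort records along the dimension and drop duplicated coordinate values. A minimal pure-Python sketch (not the library's implementation, which operates on xr.Dataset objects):

```python
def ensure_monotonic(times, values):
    """Sort records by time and drop duplicated timestamps (illustrative sketch).

    The stable sort preserves the original order among equal timestamps,
    so the first-seen record wins; the result has a strictly increasing
    time axis, the property enforced by ``ensure_monotonic_dimension``.
    """
    seen = set()
    pairs = []
    for t, v in sorted(zip(times, values), key=lambda p: p[0]):
        if t in seen:
            continue
        seen.add(t)
        pairs.append((t, v))
    out_times = [t for t, _ in pairs]
    out_values = [v for _, v in pairs]
    return out_times, out_values
```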
- disdrodb.utils.netcdf.ensure_unique_dimension_values(list_ds: list, filepaths: str, dim: str = 'time', verbose: bool = False) list [source]#
Ensure that a list of xr.Dataset has non-duplicated dimension values.
- Parameters
list_ds (list) – List of xarray Dataset.
filepaths (list) – List of netCDF file paths.
dim (str, optional) – Dimension name. The default is "time".
- Returns
list – List of xarray Dataset.
list – List of netCDF file paths.
- disdrodb.utils.netcdf.get_list_ds(filepaths: str) list [source]#
Get list of xarray datasets from file paths.
- Parameters
filepaths (list) – List of netCDF file paths.
- Returns
List of xarray datasets.
- Return type
list
- disdrodb.utils.netcdf.xr_concat_datasets(filepaths: str, verbose=False) Dataset [source]#
Concatenate xr.Dataset objects in a robust and parallel way.
It checks for time dimension monotonicity.
- Parameters
filepaths (list) – List of netCDF file paths.
- Returns
A single xarray dataset.
- Return type
xr.Dataset
- Raises
ValueError – Error if the merging/concatenation operations cannot be achieved.
disdrodb.utils.pandas module#
Pandas utility.
disdrodb.utils.scripts module#
DISDRODB scripts utility.
- disdrodb.utils.scripts.click_base_dir_option(function: object)[source]#
Click command line argument for DISDRODB base_dir.
- Parameters
function (object) – Function.
- disdrodb.utils.scripts.click_station_arguments(function: object)[source]#
Click command line arguments for DISDRODB station processing.
- Parameters
function (object) – Function.
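These helpers follow click's decorator-composition pattern: each one wraps a command function with additional click.option / click.argument declarations. A minimal sketch of such a helper, assuming click is installed; the option name, type, default and help text here are illustrative assumptions, not the library's actual definitions:

```python
import click

def click_base_dir_option(function):
    """Attach a --base_dir option to a click command (illustrative sketch;
    the real helper's default and help text may differ)."""
    return click.option(
        "--base_dir",
        type=str,
        default=None,
        help="DISDRODB base directory",  # assumed help text
    )(function)

@click.command()
@click_base_dir_option
def run(base_dir):
    click.echo(f"base_dir={base_dir}")
```

Because click decorators return the decorated function, several such helpers can be stacked on one command, which is how station arguments and the base_dir option combine.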
disdrodb.utils.xarray module#
Xarray utility.
- disdrodb.utils.xarray.get_dataset_start_end_time(ds: Dataset)[source]#
Retrieve the dataset starting and ending time.
- Parameters
ds (xr.Dataset) – Input dataset
- Returns
(starting_time, ending_time)
- Return type
tuple
- disdrodb.utils.xarray.regularize_dataset(ds: xarray.Dataset, freq: str, time_dim='time', method=None, fill_value=dtypes.NA)[source]#
Regularize a dataset across time dimension with uniform resolution.
- Parameters
ds (xr.Dataset) – xarray Dataset.
time_dim (str, optional) – The time dimension in the xr.Dataset. The default is "time".
freq (str) – The freq string to pass to pd.date_range to define the new time coordinates. Example: freq="2min".
method (str, optional) – Method to use for filling missing timesteps. If None, fill with fill_value. The default is None. For other possible methods, see https://docs.xarray.dev/en/stable/generated/xarray.Dataset.reindex.html
fill_value (float, optional) – Fill value for missing timesteps. The default is dtypes.NA.
- Returns
ds_reindexed – Regularized dataset.
- Return type
xr.Dataset
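The mechanics rest on building a uniform time axis with pd.date_range and reindexing onto it, with missing timesteps taking the fill value. A minimal pandas sketch of the same idea (the real function operates on an xr.Dataset via xarray's reindex):

```python
import pandas as pd

# Irregular 1-minute series with missing timesteps at 00:02 and 00:03.
times = pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:01", "2024-01-01 00:04"])
series = pd.Series([1.0, 2.0, 3.0], index=times)

# Build a uniform time axis with pd.date_range and reindex onto it;
# timesteps absent from the original index become NaN (the fill_value analogue).
full_index = pd.date_range(start=times.min(), end=times.max(), freq="1min")
regular = series.reindex(full_index)
```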
disdrodb.utils.yaml module#
YAML utility.