disdrodb.utils package#

Submodules#

disdrodb.utils.compression module#

DISDRODB raw data compression utility.

disdrodb.utils.compression.archive_station_data(metadata_filepath: str) str[source]#

Archive station data into a zip file for subsequent data upload.

It creates a zip file in a temporary directory.

Parameters

metadata_filepath (str) – Metadata file path.
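As a rough illustration of archiving a directory into a zip file placed in a temporary directory, the following standard-library sketch may help. The helper name zip_directory_to_tmp is hypothetical and is not part of disdrodb; how archive_station_data derives the station directory from the metadata file path is not shown here.

```python
import os
import shutil
import tempfile


def zip_directory_to_tmp(dir_path):
    """Zip the content of dir_path into a new temporary directory.

    Hypothetical helper for illustration, not the disdrodb implementation.
    Returns the path of the created zip file.
    """
    tmp_dir = tempfile.mkdtemp()
    # Name the archive after the source directory.
    archive_name = os.path.basename(os.path.normpath(dir_path))
    base_name = os.path.join(tmp_dir, archive_name)
    # make_archive appends the ".zip" extension and returns the final path.
    return shutil.make_archive(base_name, "zip", root_dir=dir_path)
```

Because the archive lives in a temporary directory, the caller is responsible for removing it once the upload has finished.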

disdrodb.utils.compression.compress_station_files(base_dir: str, data_source: str, campaign_name: str, station_name: str, method: str = 'gzip', skip: bool = True) None[source]#

Compress each raw file of a station.

Parameters
  • base_dir (str) – Base directory of DISDRODB

  • data_source (str) – Name of data source of interest.

  • campaign_name (str) – Name of the campaign of interest.

  • station_name (str) – Station name of interest.

  • method (str) – Compression method. "zip", "gzip" or "bzip2".

  • skip (bool) – Whether to skip files that are already compressed. If True, it does not raise an error and tries to compress the other files. If False, it raises an error and stops the compression routine. The default is True.
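The skip behaviour described above can be sketched for a single file with the standard library. This is a minimal sketch under assumptions: the helper name gzip_file is hypothetical, only the "gzip" method is covered, a file is treated as already compressed when its name ends in ".gz", and the original file is left in place. None of this is confirmed to match the disdrodb implementation.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(filepath, skip=True):
    """Compress one file with gzip and return the path of the new file.

    Hypothetical helper for illustration, not part of disdrodb.
    """
    src = Path(filepath)
    if src.suffix == ".gz":
        # Assumption: a ".gz" suffix means the file is already compressed.
        if skip:
            return None  # nothing to do, the caller can move to the next file
        raise ValueError(f"File already compressed: {src}")
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```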

disdrodb.utils.compression.unzip_file(filepath: str, dest_path: str) None[source]#

Unzip a file into a directory.

Parameters
  • filepath (str) – Path of the file to unzip.

  • dest_path (str) – Path of the destination directory.
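For reference, unzipping a file into a directory can be done with the standard zipfile module. The helper name unzip below is hypothetical; whether unzip_file uses zipfile or another routine internally is an assumption.

```python
import zipfile


def unzip(filepath, dest_path):
    """Extract every member of the zip file into dest_path.

    Hypothetical helper for illustration, not part of disdrodb.
    """
    with zipfile.ZipFile(filepath) as zf:
        zf.extractall(dest_path)
```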

disdrodb.utils.directories module#

Define utilities for Directory/File Checks/Creation/Deletion.

disdrodb.utils.directories.check_directory_exists(dir_path)[source]#

Check if the directory exists.

disdrodb.utils.directories.copy_file(src_filepath, dst_filepath)[source]#

Copy a file from a location to another.

disdrodb.utils.directories.count_directories(dir_path, glob_pattern, recursive=False)[source]#

Return the number of directories (exclude files).

disdrodb.utils.directories.count_files(dir_path, glob_pattern, recursive=False)[source]#

Return the number of files (exclude directories).

disdrodb.utils.directories.create_directory(path: str, exist_ok=True) None[source]#

Create a directory at the provided path.

disdrodb.utils.directories.create_required_directory(dir_path, dir_name)[source]#

Create directory dir_name inside the dir_path directory.

disdrodb.utils.directories.ensure_string_path(path, msg, accepth_pathlib=False)[source]#

Ensure that the path is a string.

disdrodb.utils.directories.is_empty_directory(path)[source]#

Check if a directory path is empty.

Return False if path is a file or non-empty directory. If the path does not exist, raise an error.
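The three cases described above (missing path, file, directory) can be sketched as follows. The helper name is_empty_dir and the exception type are assumptions made for illustration; the source only states that an error is raised when the path does not exist.

```python
import os


def is_empty_dir(path):
    """Return True only for an existing directory without entries.

    Hypothetical stand-in for illustration, not the disdrodb implementation.
    """
    if not os.path.exists(path):
        # Assumption: OSError is used here; the actual exception type may differ.
        raise OSError(f"{path} does not exist.")
    if not os.path.isdir(path):
        return False  # a file is never an empty directory
    with os.scandir(path) as entries:
        # An empty directory yields no entry at all.
        return next(entries, None) is None
```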

disdrodb.utils.directories.list_directories(dir_path, glob_pattern, recursive=False)[source]#

Return a list of directory paths (exclude file paths).

disdrodb.utils.directories.list_files(dir_path, glob_pattern, recursive=False)[source]#

Return a list of filepaths (exclude directory paths).

disdrodb.utils.directories.list_paths(dir_path, glob_pattern, recursive=False)[source]#

Return a list of filepaths and directory paths.
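The difference between listing paths, files and directories can be illustrated with the standard glob module. The helper names below are hypothetical, and the way the recursive flag is turned into a "**" pattern is an assumption about one plausible implementation, not a description of disdrodb internals.

```python
import glob
import os


def list_matching_paths(dir_path, glob_pattern, recursive=False):
    """Return the sorted paths inside dir_path that match glob_pattern.

    Hypothetical helper for illustration, not part of disdrodb.
    """
    if recursive:
        # "**" matches zero or more nested directories when recursive=True.
        pattern = os.path.join(dir_path, "**", glob_pattern)
    else:
        pattern = os.path.join(dir_path, glob_pattern)
    return sorted(glob.glob(pattern, recursive=recursive))


def only_files(paths):
    """Keep file paths and drop directory paths."""
    return [path for path in paths if os.path.isfile(path)]


def only_directories(paths):
    """Keep directory paths and drop file paths."""
    return [path for path in paths if os.path.isdir(path)]
```

Counting files or directories then reduces to taking the length of the corresponding filtered list.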

disdrodb.utils.directories.remove_if_exists(path: str, force: bool = False) None[source]#

Remove the file or directory if it exists and force=True.

If the path exists and force=False, it raises an error.
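A minimal sketch of this force logic, using only the standard library, could look as follows. The helper name remove_path and the ValueError exception type are assumptions made for illustration.

```python
import os
import shutil


def remove_path(path, force=False):
    """Remove an existing file or directory only when force=True.

    Hypothetical helper for illustration, not the disdrodb implementation.
    """
    if not os.path.exists(path):
        return  # nothing to remove
    if not force:
        # Assumption: ValueError is used here; the actual exception may differ.
        raise ValueError(f"force=False and a path already exists at: {path}")
    if os.path.isdir(path):
        shutil.rmtree(path)
    else:
        os.remove(path)
```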

disdrodb.utils.directories.remove_path_trailing_slash(path: str) str[source]#

Removes a trailing slash or backslash from a file path if it exists.

This function ensures that the provided file path is normalized by removing any trailing directory separator characters ('/' or '\\'). This is useful for maintaining consistency in path strings and for preparing paths for operations that may not expect a trailing slash.

Parameters

path (str) – The file path to normalize.

Returns

The normalized path without a trailing slash.

Return type

str

Raises

TypeError – If the input path is not a string.

Examples

>>> remove_path_trailing_slash("some/path/")
'some/path'
>>> remove_path_trailing_slash("another\\path\\")
'another\\path'

disdrodb.utils.logger module#

DISDRODB logger utility.

disdrodb.utils.logger.close_logger(logger: <Logger asyncio (WARNING)>) None[source]#

Close the logger.

Parameters

logger (logger) – Logger object.

disdrodb.utils.logger.create_file_logger(processed_dir, product, station_name, filename, parallel)[source]#

Create file logger.

disdrodb.utils.logger.define_summary_log(list_logs)[source]#

Define a station summary and a problems log file from the list of input logs.

The summary log selects only logged lines with the root, WARNING and ERROR keywords. The problems log file selects only logged lines with the ERROR keyword. The two log files are saved in the parent directory of the input list_logs.

The function assumes that the log files are located at:

/DISDRODB/Processed/<DATA_SOURCE>/<CAMPAIGN_NAME>/logs/<product>/<station_name>/*.log
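The keyword-based selection described above can be sketched in a few lines. The helper name select_lines and the sample log lines are made up for illustration; the real log format written by disdrodb may differ.

```python
def select_lines(lines, keywords):
    """Keep the lines that contain at least one of the keywords.

    Hypothetical helper for illustration, not part of disdrodb.
    """
    return [line for line in lines if any(keyword in line for keyword in keywords)]


# Made-up log lines, only for illustration.
log_lines = [
    "INFO - root - Processing started",
    "DEBUG - reader - 120 rows parsed",
    "WARNING - reader - Missing values found",
    "ERROR - writer - File could not be written",
]

# Summary log: lines with the root, WARNING or ERROR keywords.
summary = select_lines(log_lines, ["root", "WARNING", "ERROR"])
# Problems log: lines with the ERROR keyword only.
problems = select_lines(log_lines, ["ERROR"])
```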

disdrodb.utils.logger.log_debug(logger: <Logger asyncio (WARNING)>, msg: str, verbose: bool = False) None[source]#

Include debug entry into log.

Parameters
  • logger (logger) – Log object.

  • msg (str) – Message.

  • verbose (bool, optional) – Whether to print verbose output during processing. The default is False.

disdrodb.utils.logger.log_error(logger: <Logger asyncio (WARNING)>, msg: str, verbose: bool = False) None[source]#

Include error entry into log.

Parameters
  • logger (logger) – Log object.

  • msg (str) – Message.

  • verbose (bool, optional) – Whether to print verbose output during processing. The default is False.

disdrodb.utils.logger.log_info(logger: <Logger asyncio (WARNING)>, msg: str, verbose: bool = False) None[source]#

Include info entry into log.

Parameters
  • logger (logger) – Log object.

  • msg (str) – Message.

  • verbose (bool, optional) – Whether to print verbose output during processing. The default is False.

disdrodb.utils.logger.log_warning(logger: <Logger asyncio (WARNING)>, msg: str, verbose: bool = False) None[source]#

Include warning entry into log.

Parameters
  • logger (logger) – Log object.

  • msg (str) – Message.

  • verbose (bool, optional) – Whether to print verbose output during processing. The default is False.

disdrodb.utils.netcdf module#

DISDRODB netCDF utility.

disdrodb.utils.netcdf.ensure_monotonic_dimension(list_ds: list, filepaths: str, dim: str = 'time', verbose: bool = False) list[source]#

Ensure that a list of xr.Dataset objects has monotonically increasing (non-duplicated) dimension values.

Parameters
  • list_ds (list) – List of xarray Dataset.

  • filepaths (list) – List of netCDF file paths.

  • dim (str, optional) – Dimension name. The default is "time".

Returns

  • list – List of xarray Dataset.

  • list – List of netCDF file paths.

disdrodb.utils.netcdf.ensure_unique_dimension_values(list_ds: list, filepaths: str, dim: str = 'time', verbose: bool = False) list[source]#

Ensure that a list of xr.Dataset objects has no duplicated dimension values.

Parameters
  • list_ds (list) – List of xarray Dataset.

  • filepaths (list) – List of netCDF file paths.

  • dim (str, optional) – Dimension name. The default is "time".

Returns

  • list – List of xarray Dataset.

  • list – List of netCDF file paths.

disdrodb.utils.netcdf.get_list_ds(filepaths: str) list[source]#

Get list of xarray datasets from file paths.

Parameters

filepaths (list) – List of netCDF file paths.

Returns

List of xarray datasets.

Return type

list

disdrodb.utils.netcdf.xr_concat_datasets(filepaths: str, verbose=False) Dataset[source]#

Concatenate xr.Dataset objects in a robust and parallel way.

  1. It checks for time dimension monotonicity

Parameters

filepaths (list) – List of netCDF file paths.

Returns

A single xarray dataset.

Return type

xr.Dataset

Raises

ValueError – Raised if the merging/concatenation operations cannot be achieved.

disdrodb.utils.pandas module#

Pandas utility.

disdrodb.utils.pandas.get_dataframe_start_end_time(df: DataFrame)[source]#

Retrieve the starting and ending time of the dataframe.

Parameters

df (pd.DataFrame) – Input dataframe

Returns

(starting_time, ending_time)

Return type

tuple

disdrodb.utils.scripts module#

DISDRODB scripts utility.

disdrodb.utils.scripts.click_base_dir_option(function: object)[source]#

Click command line argument for DISDRODB base_dir.

Parameters

function (object) – Function.

disdrodb.utils.scripts.click_station_arguments(function: object)[source]#

Click command line arguments for DISDRODB station processing.

Parameters

function (object) – Function.

disdrodb.utils.scripts.parse_arg_to_list(args)[source]#

Utility to pass list to command line scripts.

If args = '' returns None. If args = 'None' returns None. If args = 'variable' returns [variable]. If args = 'variable1 variable2' returns [variable1, variable2].
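The four rules above can be written down as a short sketch. The function name parse_to_list is hypothetical, and splitting on any whitespace is an assumption about how multiple values are separated.

```python
def parse_to_list(args):
    """Convert a command line string into a list of values, or None.

    Hypothetical re-implementation of the rules listed above, for illustration.
    """
    if args in ("", "None"):
        return None
    # Assumption: values are separated by whitespace.
    return args.split()
```

With this sketch, parse_to_list("variable1 variable2") gives ["variable1", "variable2"], while both parse_to_list("") and parse_to_list("None") give None.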

disdrodb.utils.scripts.parse_base_dir(base_dir)[source]#

Utility to parse base_dir provided by command line.

If base_dir = 'None' returns None. If base_dir = '' returns None.

disdrodb.utils.xarray module#

Xarray utility.

disdrodb.utils.xarray.get_dataset_start_end_time(ds: Dataset)[source]#

Retrieve the starting and ending time of the dataset.

Parameters

ds (xr.Dataset) – Input dataset

Returns

(starting_time, ending_time)

Return type

tuple

disdrodb.utils.xarray.regularize_dataset(ds: ~xarray.core.dataset.Dataset, freq: str, time_dim='time', method=None, fill_value=<NA>)[source]#

Regularize a dataset across time dimension with uniform resolution.

Parameters
  • ds (xr.Dataset) – xarray Dataset.

  • time_dim (str, optional) – The time dimension in the xr.Dataset. The default is "time".

  • freq (str) – The freq string to pass to pd.date_range to define the new time coordinates. Example: freq="2min".

  • method (str, optional) – Method to use for filling missing timesteps. If None, fill with fill_value. The default is None. For other possible methods, see https://docs.xarray.dev/en/stable/generated/xarray.Dataset.reindex.html

  • fill_value (float, optional) – Fill value to fill missing timesteps. The default is dtypes.NA.

Returns

ds_reindexed – Regularized dataset.

Return type

xr.Dataset
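To make the idea of regularizing a time dimension concrete, the following pure-Python sketch places irregular observations on a uniform time grid and fills the missing timesteps with a fill value. It is only an analogy: the function name regularize is hypothetical, it works on plain lists instead of an xr.Dataset, it takes a timedelta instead of a freq string, and it uses None in place of dtypes.NA.

```python
from datetime import datetime, timedelta


def regularize(times, values, step, fill_value=None):
    """Place values on a uniform time grid from the first to the last timestamp.

    Hypothetical pure-Python analogue for illustration, not the xarray-based
    implementation. Missing timesteps receive fill_value.
    """
    observed = dict(zip(times, values))
    new_times = []
    current, end = min(times), max(times)
    while current <= end:
        new_times.append(current)
        current += step
    new_values = [observed.get(t, fill_value) for t in new_times]
    return new_times, new_values


# Observations at minutes 0, 2 and 6: the timestep at minute 4 is missing.
t0 = datetime(2024, 1, 1)
times = [t0, t0 + timedelta(minutes=2), t0 + timedelta(minutes=6)]
new_times, new_values = regularize(times, [1.0, 2.0, 3.0], timedelta(minutes=2))
```

Here new_times contains four timesteps spaced two minutes apart, and the value at minute 4 is the fill value.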

disdrodb.utils.yaml module#

YAML utility.

disdrodb.utils.yaml.read_yaml(filepath: str) dict[source]#

Read a YAML file into a dictionary.

Parameters

filepath (str) – Input YAML file path.

Returns

Dictionary with the attributes read from the YAML file.

Return type

dict

disdrodb.utils.yaml.write_yaml(dictionary, filepath, sort_keys=False)[source]#

Write a dictionary into a YAML file.

Parameters

dictionary (dict) – Dictionary to write into a YAML file.

Module contents#