disdrodb.utils package#

Submodules#

disdrodb.utils.attrs module#

DISDRODB netCDF4 attributes utilities.

disdrodb.utils.attrs.get_coords_attrs_dict()[source][source]#: Return dictionary with DISDRODB coordinates attributes.

disdrodb.utils.attrs.set_attrs(ds, attrs_dict)[source][source]#: Set attributes to the variables of the xr.Dataset.

disdrodb.utils.attrs.set_coordinate_attributes(ds)[source][source]#: Set coordinates attributes.

disdrodb.utils.attrs.set_disdrodb_attrs(ds, product: str)[source][source]#

Add DISDRODB processing information to the netCDF global attributes.

It assumes that the station metadata have already been added to the dataset.

Parameters:

ds (xarray.Dataset) – Dataset
product (str) – DISDRODB product.

Returns:

Dataset.

Return type:

xarray.Dataset

disdrodb.utils.attrs.update_disdrodb_attrs(ds, product: str)[source][source]#

Add DISDRODB processing information to the netCDF global attributes.

It assumes that the station metadata have already been added to the dataset.

Parameters:

ds (xarray.Dataset) – Dataset
product (str) – DISDRODB product.

Returns:

Dataset.

Return type:

xarray.Dataset

disdrodb.utils.cli module#

DISDRODB command-line-interface scripts utilities.

disdrodb.utils.cli.click_data_archive_dir_option(function: object)[source][source]#

Click command line argument for DISDRODB data_archive_dir.

Parameters:: function (object) – Function.

disdrodb.utils.cli.click_l0_archive_options(function: object)[source][source]#

Click command line arguments for L0 processing archiving of a station.

Parameters:: function (object) – Function.

disdrodb.utils.cli.click_metadata_archive_dir_option(function: object)[source][source]#

Click command line argument for DISDRODB metadata_archive_dir.

Parameters:: function (object) – Function.

disdrodb.utils.cli.click_processing_options(function: object)[source][source]#

Click command line default parameters for L0 processing options.

Parameters:: function (object) – Function.

disdrodb.utils.cli.click_remove_l0a_option(function: object)[source][source]#: Click command line argument for remove_l0a.

disdrodb.utils.cli.click_remove_l0b_option(function: object)[source][source]#: Click command line argument for remove_l0b.

disdrodb.utils.cli.click_station_arguments(function: object)[source][source]#

Click command line arguments for DISDRODB station processing.

Parameters:: function (object) – Function.

disdrodb.utils.cli.click_stations_options(function: object)[source][source]#

Click command line options for DISDRODB archive L0 processing.

Parameters:: function (object) – Function.

disdrodb.utils.cli.parse_archive_dir(archive_dir: str)[source][source]#

Utility to parse archive directories provided by command line.

If archive_dir = 'None' returns None. If archive_dir = '' returns None.

disdrodb.utils.cli.parse_arg_to_list(args)[source][source]#

Utility to pass list to command line scripts.

If args = '' returns None. If args = 'None' returns None. If args = 'variable' returns [variable]. If args = 'variable1 variable2' returns [variable1, variable2].
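The mapping above can be illustrated with a minimal sketch. This is not the DISDRODB implementation: the name `parse_arg_to_list_sketch` is hypothetical, and splitting on whitespace is an assumption inferred from the 'variable1 variable2' case.

```python
def parse_arg_to_list_sketch(args):
    """Mimic the documented string-to-list mapping (assumed behavior)."""
    # The empty string and the literal 'None' both map to None.
    if args in ("", "None"):
        return None
    # Assumption: the items are separated by whitespace.
    return args.split()


print(parse_arg_to_list_sketch("variable1 variable2"))
```

Passing a single space-separated string is a common way to hand a list to a command-line script without defining a repeated option.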

disdrodb.utils.compression module#

DISDRODB raw data compression utility.

disdrodb.utils.compression.archive_station_data(metadata_filepath: str, data_archive_dir: str) → str[source][source]#

Archive station data into a zip file for subsequent data upload.

It creates a zip file in a temporary directory.

Parameters:: metadata_filepath (str) – Metadata file path.

disdrodb.utils.compression.check_consistent_station_name(metadata_filepath, station_name)[source][source]#: Check consistent station_name between YAML file name and metadata key.

disdrodb.utils.compression.compress_station_files(data_archive_dir: str, data_source: str, campaign_name: str, station_name: str, method: str = 'gzip', skip: bool = True) → None[source][source]#

Compress each raw file of a station.

Parameters:

data_archive_dir (str) – DISDRODB Data Archive directory
data_source (str) – Name of data source of interest.
campaign_name (str) – Name of the campaign of interest.
station_name (str) – Station name of interest.
method (str) – Compression method. "zip", "gzip" or "bzip2".
skip (bool) – Whether to skip files that are already compressed instead of raising an error. If True, no error is raised and the routine tries to compress the other files. If False, an error is raised and the compression routine stops. The default value is True.

disdrodb.utils.compression.unzip_file(filepath: str, dest_path: str) → None[source][source]#

Unzip a file into a directory.

Parameters:

filepath (str) – Path of the file to unzip.
dest_path (str) – Path of the destination directory.

disdrodb.utils.dask module#

Utilities for Dask Distributed computations.

disdrodb.utils.dask.close_dask_cluster(cluster, client)[source][source]#: Close Dask Cluster.

disdrodb.utils.dask.initialize_dask_cluster()[source][source]#: Initialize Dask Cluster.

disdrodb.utils.decorators module#

DISDRODB decorators.

disdrodb.utils.decorators.check_pytmatrix_availability(func)[source][source]#: Decorator to ensure that the ‘pytmatrix’ package is installed.

disdrodb.utils.decorators.check_software_availability(software, conda_package)[source][source]#

A decorator to ensure that a software package is installed.

Parameters:

software (str) – The package name as recognized by Python’s import system.
conda_package (str) – The package name as recognized by conda-forge.
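A decorator of this kind could look roughly like the sketch below. It is an illustration under stated assumptions, not the DISDRODB code: the name `require_package_sketch`, the use of `importlib.util.find_spec`, the `ImportError` type, and the wording of the error message are all hypothetical.

```python
import functools
import importlib.util


def require_package_sketch(software, conda_package):
    """Return a decorator that checks `software` is importable before each call."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # find_spec returns None when a top-level module cannot be found.
            if importlib.util.find_spec(software) is None:
                raise ImportError(
                    f"The '{software}' package is required. Install it with: "
                    f"conda install -c conda-forge {conda_package}"
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@require_package_sketch(software="json", conda_package="python")
def dump_value(value):
    import json  # imported lazily, only once the availability check has passed

    return json.dumps(value)
```

Checking availability at call time rather than at import time keeps the optional dependency truly optional: the module can be imported without it, and only the decorated functions fail.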

disdrodb.utils.decorators.delayed_if_parallel(function)[source][source]#: Decorator to make the function delayed if its parallel argument is True.

disdrodb.utils.decorators.single_threaded_if_parallel(function)[source][source]#: Decorator to make a function use a single thread if its parallel argument is True.

disdrodb.utils.directories module#

Define utilities for Directory/File Checks/Creation/Deletion.

disdrodb.utils.directories.check_directory_exists(dir_path)[source][source]#: Check if the directory exists.

disdrodb.utils.directories.check_glob_pattern(pattern: str) → None[source][source]#

Check if glob pattern is a string and is a valid pattern.

Parameters:: pattern (str) – String to be checked.

disdrodb.utils.directories.check_glob_patterns(patterns: str | list) → list[source][source]#: Check if glob patterns are valid.

disdrodb.utils.directories.contains_files(dir_path: str) → bool[source][source]#

Check (recursively) if a directory contains any file.

Under the hood, os.walk uses os.scandir. Combining the os.walk file generator with any() avoids the use of a while loop.

The function returns True as soon as one file is found (short-circuit); False otherwise.
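The short-circuit pattern described above can be sketched with the standard library alone. The function name `contains_files_sketch` is hypothetical and the body is an assumption about how such a check may be written, not a copy of the DISDRODB source.

```python
import os


def contains_files_sketch(dir_path):
    """Return True as soon as any file is found under dir_path."""
    # os.walk lazily yields (dirpath, dirnames, filenames) tuples. any() stops
    # at the first non-empty filenames list, so the rest of the tree is skipped.
    return any(filenames for _, _, filenames in os.walk(dir_path))
```

Because the generator is consumed lazily, a directory tree with thousands of files is only scanned up to the first file found.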

disdrodb.utils.directories.contains_netcdf_or_parquet_files(dir_path: str) → bool[source][source]#

Check (recursively) if a directory has any Parquet or netCDF file.

Under the hood, os.walk uses os.scandir. Combining the os.walk file generator with any() avoids the use of a while loop.

The function returns True as soon as one file is found (short-circuit); False otherwise.

disdrodb.utils.directories.copy_file(src_filepath, dst_filepath)[source][source]#: Copy a file from a location to another.

disdrodb.utils.directories.count_directories(dir_path, glob_pattern, recursive=False)[source][source]#: Return the number of directories (exclude files).

disdrodb.utils.directories.count_files(dir_path, glob_pattern, recursive=False)[source][source]#: Return the number of files (exclude directories).

disdrodb.utils.directories.create_directory(path: str, exist_ok=True) → None[source][source]#: Create a directory at the provided path.

disdrodb.utils.directories.create_required_directory(dir_path, dir_name, exist_ok=True)[source][source]#: Create directory dir_name inside the dir_path directory.

disdrodb.utils.directories.ensure_string_path(path, msg, accepth_pathlib=False)[source][source]#: Ensure that the path is a string.

disdrodb.utils.directories.is_empty_directory(path)[source][source]#

Check if a directory path is empty.

Return False if path is a file or non-empty directory. If the path does not exist, raise an error.
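The three documented outcomes can be sketched as follows. The name `is_empty_directory_sketch` is hypothetical, and raising `FileNotFoundError` is an assumption: the reference only states that an error is raised when the path does not exist.

```python
import os


def is_empty_directory_sketch(path):
    """Return True only if path is an existing directory with no entries."""
    if not os.path.exists(path):
        # Assumption: the exact exception type is not stated in the reference.
        raise FileNotFoundError(f"{path} does not exist.")
    if not os.path.isdir(path):
        return False  # a file is never an empty directory
    # scandir is lazy: reading one entry is enough to decide.
    with os.scandir(path) as entries:
        return next(entries, None) is None
```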

disdrodb.utils.directories.list_directories(dir_path, glob_pattern, recursive=False)[source][source]#: Return a list of directory paths (exclude file paths).

disdrodb.utils.directories.list_files(dir_path, glob_pattern, recursive=False)[source][source]#: Return a list of filepaths (exclude directory paths).

disdrodb.utils.directories.list_paths(dir_path, glob_pattern, recursive=False)[source][source]#

Return a list of filepaths and directory paths.

This function also accepts a list of glob patterns.

disdrodb.utils.directories.remove_if_exists(path: str, force: bool = False, logger=None) → None[source][source]#

Remove file or directory if exists and force=True.

If force=False, it raises an error.
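A minimal sketch of this force-guarded removal is shown below. It rests on several assumptions that the reference does not confirm: a non-existent path is silently ignored, the error raised when force=False is a `ValueError`, and the logger argument is omitted. The name `remove_if_exists_sketch` is hypothetical.

```python
import os
import shutil


def remove_if_exists_sketch(path, force=False):
    """Remove an existing file or directory only when force is True."""
    if not os.path.exists(path):
        return  # assumption: nothing to do when the path is absent
    if not force:
        # Assumption: the exception type is illustrative only.
        raise ValueError(f"force=False and a file or directory already exists at {path}.")
    if os.path.isdir(path):
        shutil.rmtree(path)  # remove the directory and all its content
    else:
        os.remove(path)
```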

disdrodb.utils.directories.remove_path_trailing_slash(path: str) → str[source][source]#

Removes a trailing slash or backslash from a file path if it exists.

This function ensures that the provided file path is normalized by removing any trailing directory separator characters ('/' or '\\'). This is useful for maintaining consistency in path strings and for preparing paths for operations that may not expect a trailing slash.

Parameters:: path (str) – The file path to normalize.
Returns:: The normalized path without a trailing slash.
Return type:: str
Raises:: TypeError – If the input path is not a string.

Examples

>>> remove_path_trailing_slash("some/path/")
'some/path'
>>> remove_path_trailing_slash("another\\path\\")
'another\\path'

disdrodb.utils.encoding module#

DISDRODB netCDF4 encoding utilities.

disdrodb.utils.encoding.get_time_encoding() → dict[source][source]#

Create time encoding.

Returns:: Time encoding.
Return type:: dict

disdrodb.utils.encoding.rechunk_dataset(ds: Dataset, encoding_dict: dict) → Dataset[source][source]#

Coerce the dataset arrays to have the chunk size specified in the encoding dictionary.

Parameters:

ds (xarray.Dataset) – Input xarray dataset
encoding_dict (dict) – Dictionary containing the encoding to write the xarray dataset as a netCDF.

Returns:

Output xarray dataset

Return type:

xarray.Dataset

disdrodb.utils.encoding.sanitize_encodings_dict(encoding_dict: dict, ds: Dataset) → dict[source][source]#

Ensure chunk size to be smaller than the array shape.

Parameters:

encoding_dict (dict) – Dictionary containing the variable encodings.
ds (xarray.Dataset) – Input dataset.

Returns:

Encoding dictionary.

Return type:

dict
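The idea behind sanitize_encodings_dict can be illustrated without xarray. In the sketch below, the `shapes` dictionary stands in for the per-variable array shapes of the dataset, and the `"chunksizes"` key follows the netCDF4 encoding convention used by xarray. The function name and the plain-dictionary interface are hypothetical.

```python
def clip_chunksizes_sketch(encoding_dict, shapes):
    """Clip each chunk size so that it does not exceed the array dimension length."""
    sanitized = {}
    for var, encoding in encoding_dict.items():
        encoding = dict(encoding)  # shallow copy, the input is left untouched
        chunksizes = encoding.get("chunksizes")
        if chunksizes is not None and var in shapes:
            encoding["chunksizes"] = tuple(
                min(chunk, size) for chunk, size in zip(chunksizes, shapes[var])
            )
        sanitized[var] = encoding
    return sanitized


encodings = {"raw_drop_number": {"chunksizes": (5000, 32), "zlib": True}}
shapes = {"raw_drop_number": (1200, 32)}  # hypothetical (time, diameter) shape
print(clip_chunksizes_sketch(encodings, shapes))
```

Clipping matters because the netCDF4 backend rejects chunk sizes larger than the corresponding fixed dimension, which typically happens with short files.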

disdrodb.utils.encoding.set_encodings(ds: Dataset, encoding_dict: dict) → Dataset[source][source]#

Apply the encodings to the xarray Dataset.

Parameters:

ds (xarray.Dataset) – Input xarray dataset.
encoding_dict (dict) – Dictionary with encoding specifications.

Returns:

Output xarray dataset.

Return type:

xarray.Dataset

disdrodb.utils.list module#

Utilities to work with lists.

disdrodb.utils.list.flatten_list(nested_list)[source][source]#: Flatten a nested list into a single-level list.
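A recursive flattening routine could be written as in the sketch below. Whether the DISDRODB function handles arbitrary nesting depth or non-list iterables is not stated here, so this behavior is an assumption and the name `flatten_list_sketch` is hypothetical.

```python
def flatten_list_sketch(nested_list):
    """Flatten arbitrarily nested lists into a single-level list."""
    flat = []
    for item in nested_list:
        if isinstance(item, list):
            # Recurse into sub-lists and append their items in order.
            flat.extend(flatten_list_sketch(item))
        else:
            flat.append(item)
    return flat


print(flatten_list_sketch([1, [2, [3, 4]], [5]]))
```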

disdrodb.utils.logger module#

DISDRODB logger utility.

disdrodb.utils.logger.close_logger(logger) → None[source][source]#

Close the logger.

Parameters:: logger (logging.Logger) – Logger object.

disdrodb.utils.logger.create_logger_file(logs_dir, filename, parallel)[source][source]#: Create logger file.

disdrodb.utils.logger.create_product_logs(product, data_source, campaign_name, station_name, data_archive_dir=None, list_logs=None, **product_kwargs)[source][source]#

Create station summary and station problems log files.

The summary log selects only logged lines with root, WARNING, and ERROR keywords. The problems log file selects only logged lines with the ERROR keyword.

The logs directory structure is the following:

/logs
- /files/<product_acronym>/<station> (same structure as data … a log for each processed file)
- /summary –> SUMMARY.<PRODUCT_ACRONYM>.<CAMPAIGN_NAME>.<STATION_NAME>.log
- /problems –> PROBLEMS.<PRODUCT_ACRONYM>.<CAMPAIGN_NAME>.<STATION_NAME>.log

Parameters:

product (str) – The DISDRODB product.
data_source (str) – The data source name.
campaign_name (str) – The campaign name.
station_name (str) – The station name.
data_archive_dir (str, optional) – The base directory path. Default is None.
sample_interval (str, optional) – The sample interval for L2E option. Default is None.
rolling (str, optional) – The rolling option for L2E. Default is None.
model_name (str, optional) – The model name for L2M. Default is None.
list_logs (list, optional) – List of log file paths. If None, the function will list the log files.

Return type:

None
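The keyword-based selection that create_product_logs describes for the summary and problems logs can be sketched as a simple line filter. The helper name and the sample log lines are hypothetical, and matching keywords by substring is an assumption.

```python
SUMMARY_KEYWORDS = ("root", "WARNING", "ERROR")
PROBLEMS_KEYWORDS = ("ERROR",)


def select_log_lines_sketch(lines, keywords):
    """Keep only the lines that contain at least one of the keywords."""
    return [line for line in lines if any(keyword in line for keyword in keywords)]


# Hypothetical log content, for illustration only.
log_lines = [
    "INFO - root - Processing of station X started.",
    "INFO - Reading file a.txt",
    "WARNING - Some timesteps are duplicated.",
    "ERROR - File b.txt could not be parsed.",
]
summary = select_log_lines_sketch(log_lines, SUMMARY_KEYWORDS)
problems = select_log_lines_sketch(log_lines, PROBLEMS_KEYWORDS)
```

Here `summary` keeps three lines (root, WARNING, ERROR) while `problems` keeps only the ERROR line.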

disdrodb.utils.logger.log_debug(logger: logging.Logger, msg: str, verbose: bool = False) → None[source][source]#

Include debug entry into log.

Parameters:

logger (logging.Logger) – Log object.
msg (str) – Message.
verbose (bool, optional) – Whether to enable verbose output. The default value is False.

disdrodb.utils.logger.log_error(logger: logging.Logger, msg: str, verbose: bool = False) → None[source][source]#

Include error entry into log.

Parameters:

logger (logging.Logger) – Log object.
msg (str) – Message.
verbose (bool, optional) – Whether to enable verbose output. The default value is False.

disdrodb.utils.logger.log_info(logger: logging.Logger, msg: str, verbose: bool = False) → None[source][source]#

Include info entry into log.

Parameters:

logger (logging.Logger) – Log object.
msg (str) – Message.
verbose (bool, optional) – Whether to enable verbose output. The default value is False.

disdrodb.utils.logger.log_warning(logger: logging.Logger, msg: str, verbose: bool = False) → None[source][source]#

Include warning entry into log.

Parameters:

logger (logging.Logger) – Log object.
msg (str) – Message.
verbose (bool, optional) – Whether to enable verbose output. The default value is False.

disdrodb.utils.time module#

This module contains utilities related to the processing of temporal datasets.

disdrodb.utils.time.acronym_to_seconds(acronym)[source][source]#

Extract the interval in seconds from the duration acronym.

Parameters:: acronym (str) – A string representing a duration: e.g., “1H30MIN”, “ROLL1H30MIN”.
Returns:: seconds – Duration in seconds.
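As an illustration of the conversion, the sketch below parses a duration acronym with a regular expression. The unit tokens (D, H, MIN, S) are taken from the acronym examples in this reference; the function name, the handling of the "ROLL" prefix, and the `ValueError` on malformed input are assumptions rather than the DISDRODB implementation.

```python
import re

# Assumed unit tokens and their length in seconds.
_UNIT_SECONDS = {"D": 86400, "H": 3600, "MIN": 60, "S": 1}
_TOKEN = re.compile(r"(\d+)(MIN|D|H|S)")


def acronym_to_seconds_sketch(acronym):
    """Convert an acronym such as '1H30MIN' or 'ROLL1H30MIN' into seconds."""
    # Drop the optional rolling prefix before parsing the duration.
    if acronym.startswith("ROLL"):
        acronym = acronym[len("ROLL"):]
    if not re.fullmatch(r"(?:\d+(?:MIN|D|H|S))+", acronym):
        raise ValueError(f"Invalid duration acronym: {acronym!r}")
    # Sum value * unit length over all (value, unit) pairs.
    return sum(int(value) * _UNIT_SECONDS[unit] for value, unit in _TOKEN.findall(acronym))


print(acronym_to_seconds_sketch("1H30MIN"))
```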

disdrodb.utils.time.ensure_sample_interval_in_seconds(sample_interval)[source][source]#

Ensure the sample interval is in seconds.

Parameters:: sample_interval (int, numpy.ndarray, xarray.DataArray, or numpy.timedelta64) – The sample interval to be converted to seconds. It can be:
- An integer representing the interval in seconds.
- A numpy array or xarray DataArray of integers representing intervals in seconds.
- A numpy.timedelta64 object representing the interval.
- A numpy array or xarray DataArray of numpy.timedelta64 objects representing intervals.
Returns:: The sample interval converted to seconds. The return type matches the input type:
- If the input is an integer, the output is an integer.
- If the input is a numpy array, the output is a numpy array of integers (unless NaN is present).
- If the input is an xarray DataArray, the output is an xarray DataArray of integers (unless NaN is present).
Return type:: int, numpy.ndarray, or xarray.DataArray

disdrodb.utils.time.ensure_sorted_by_time(obj, time='time')[source][source]#: Ensure an xarray object or pandas DataFrame is sorted by time.

disdrodb.utils.time.get_dataframe_start_end_time(df: DataFrame, time_column='time')[source][source]#

Retrieves dataframe starting and ending time.

Parameters:

df (pandas.DataFrame) – Input dataframe
time_column (str) – Name of the time column. The default is “time”. The column must be of type datetime.

Returns:

(start_time, end_time) – File start and end time of type pandas.Timestamp.

Return type:

tuple

disdrodb.utils.time.get_dataset_start_end_time(ds: Dataset, time_dim='time')[source][source]#

Retrieves dataset starting and ending time.

Parameters:

ds (xarray.Dataset) – Input dataset
time_dim (str) – Name of the time dimension. The default is “time”.

Returns:

(start_time, end_time) – File start and end time of type pandas.Timestamp.

Return type:

tuple

disdrodb.utils.time.get_file_start_end_time(obj, time='time')[source][source]#

Retrieves object starting and ending time.

Parameters:

obj (xarray.Dataset or pandas.DataFrame) – Input object with time dimension or column respectively.
time (str) – Name of the time dimension or column. The default is “time”.

Returns:

(start_time, end_time) – File start and end time of type pandas.Timestamp.

Return type:

tuple

disdrodb.utils.time.get_problematic_timestep_indices(timesteps, sample_interval)[source][source]#: Identify timesteps with missing previous or following timesteps.

disdrodb.utils.time.get_resampling_information(sample_interval_acronym)[source][source]#

Extract resampling information from the sample interval acronym.

Parameters:: sample_interval_acronym (str) – A string representing the sample interval: e.g., “1H30MIN”, “ROLL1H30MIN”.
Returns:: sample_interval_seconds, rolling – Sample interval in seconds and whether rolling is enabled.
Return type:: tuple

disdrodb.utils.time.infer_sample_interval(ds, robust=False, verbose=False, logger=None)[source][source]#

Infer the sample interval of a dataset.

Duplicated timesteps are removed before inferring the sample interval.

NOTE: This function is used only for the reader preparation.

disdrodb.utils.time.regularize_dataset(xr_obj, freq: str, time_dim: str = 'time', method: str | None = None, fill_value=None)[source][source]#

Regularize a dataset across time dimension with uniform resolution.

Parameters:

xr_obj (xarray.Dataset or xarray.DataArray) – xarray object with time dimension.
time_dim (str, optional) – The time dimension in the xarray object. The default value is "time".
freq (str) – The freq string to pass to pd.date_range() to define the new time coordinates. Examples: freq="2min".
method (str, optional) – Method to use for filling missing timesteps. If None, fill with fill_value. The default value is None. For other possible methods, see xarray.Dataset.reindex().
fill_value ((float, dict), optional) – Fill value to fill missing timesteps. If not specified, for float variables it uses dtypes.NA, while for integer variables it uses the maximum allowed integer value or, in case of undecoded variables, the _FillValue DataArray attribute.

Returns:

ds_reindexed – Regularized dataset.

Return type:

xarray.Dataset

disdrodb.utils.time.regularize_timesteps(ds, sample_interval, robust=False, add_quality_flag=True, logger=None, verbose=True)[source][source]#

Ensure timesteps match with the sample_interval.

This function:
- drops dataset indices with duplicated timesteps,
- but does not add missing timesteps to the dataset.

disdrodb.utils.time.seconds_to_acronym(seconds)[source][source]#

Convert a duration in seconds to a readable string format (e.g., “1H30MIN”, “1D2H”).

Parameters:: seconds (int) – Duration in seconds.
Returns:: The duration as a string in a format like “30S”, “1MIN30S”, “1H30MIN”, or “1D2H”.
Return type:: str
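The output formats listed above can be reproduced with repeated integer division. The following is a sketch under assumptions, not the library code: the name `seconds_to_acronym_sketch` is hypothetical, and units with a zero count are assumed to be omitted, as the listed examples suggest.

```python
def seconds_to_acronym_sketch(seconds):
    """Build an acronym such as '1H30MIN' from a duration in seconds."""
    parts = []
    # Go from the largest unit to the smallest, keeping non-zero counts only.
    for unit, size in (("D", 86400), ("H", 3600), ("MIN", 60), ("S", 1)):
        count, seconds = divmod(seconds, size)
        if count:
            parts.append(f"{count}{unit}")
    return "".join(parts)


print(seconds_to_acronym_sketch(5400))
```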

disdrodb.utils.warnings module#

Warning utilities.

disdrodb.utils.warnings.suppress_warnings()[source][source]#: Context manager suppressing RuntimeWarnings and UserWarnings.
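A context manager with this effect can be built on the standard `warnings` module. The sketch below is one possible way to do it, under the assumption that the filters are restored on exit; the name `suppress_warnings_sketch` is hypothetical.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def suppress_warnings_sketch():
    """Silence RuntimeWarning and UserWarning inside the with-block."""
    # catch_warnings restores the previous filter list when the block exits.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)
        warnings.simplefilter("ignore", UserWarning)
        yield


with suppress_warnings_sketch():
    warnings.warn("not shown", UserWarning)
```

Other categories, such as DeprecationWarning, are left untouched by this sketch.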

disdrodb.utils.writer module#

DISDRODB product writers.

disdrodb.utils.writer.write_product(ds: Dataset, filepath: str, product: str, force: bool = False) → None[source][source]#

Save the xarray dataset into a NetCDF file.

Parameters:

ds (xarray.Dataset) – Input xarray dataset.
filepath (str) – Output file path.
product (str) – DISDRODB product name.
force (bool, optional) – Whether to overwrite existing data. If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default value is False.

disdrodb.utils.xarray module#

Xarray utilities.

disdrodb.utils.xarray.define_dataarray_fill_value(da)[source][source]#: Define the fill value for a numerical xarray.DataArray.

disdrodb.utils.xarray.define_dataarray_fill_value_dictionary(da)[source][source]#

Define fill values for numerical variables and coordinates of an xarray.DataArray.

Return a dict of fill values:

floating → NaN
integer → ds[var].attrs["_FillValue"] if present, else np.iinfo(dtype).max

disdrodb.utils.xarray.define_dataset_fill_value_dictionary(ds)[source][source]#

Define fill values for numerical variables and coordinates of an xarray.Dataset.

Return a dict of per-variable fill values:

floating –> NaN
integer –> ds[var].attrs["_FillValue"] if present, else the maximum allowed number.

disdrodb.utils.xarray.define_fill_value_dictionary(xr_obj)[source][source]#

Define fill values for numerical variables and coordinates of an xarray object.

Return a dict of per-variable fill values:

floating –> NaN
integer –> ds[var].attrs["_FillValue"] if present, else the maximum allowed number.

disdrodb.utils.xarray.remove_diameter_coordinates(xr_obj)[source][source]#: Drop diameter coordinates from xarray object.

disdrodb.utils.xarray.remove_velocity_coordinates(xr_obj)[source][source]#: Drop velocity coordinates from xarray object.

disdrodb.utils.xarray.xr_get_last_valid_idx(da_condition, dim, fill_value=None)[source][source]#

Get the index of the last True value along a specified dimension in an xarray DataArray.

This function finds the last index along the given dimension where the condition is True. If all values are False or NaN along that dimension, the function returns fill_value.

Parameters:

da_condition (xarray.DataArray) – A boolean DataArray where True indicates valid or desired values. Should have the dimension specified in dim.
dim (str) – The name of the dimension along which to find the last True index.
fill_value (int or float) – The fill value when all values are False or NaN along the specified dimension. The default value is dim_size - 1.

Returns:

last_idx – An array containing the index of the last True value along the specified dimension. If all values are False or NaN, the corresponding entry in last_idx will be NaN.

Return type:

xarray.DataArray

Notes

The function works by reversing the DataArray along the specified dimension and using argmax to find the first True value in the reversed array. It then calculates the corresponding index in the original array. To handle cases where all values are False or NaN (and argmax would return 0), the function checks if there is any True value along the dimension and assigns NaN to last_idx where appropriate.

Examples

>>> import xarray as xr
>>> da = xr.DataArray([[False, False, True], [False, False, False]], dims=["time", "my_dimension"])
>>> last_idx = xr_get_last_valid_idx(da, "my_dimension")
>>> print(last_idx)
<xarray.DataArray (time: 2)>
array([2., nan])
Dimensions without coordinates: time

In this example, for the first time step, the last True index is 2. For the second time step, all values are False, so the function returns NaN.

disdrodb.utils.yaml module#

YAML utility.

disdrodb.utils.yaml.read_yaml(filepath: str) → dict[source][source]#

Read a YAML file into a dictionary.

Parameters:: filepath (str) – Input YAML file path.
Returns:: Dictionary with the attributes read from the YAML file.
Return type:: dict

disdrodb.utils.yaml.write_yaml(dictionary, filepath, sort_keys=False)[source][source]#

Write a dictionary into a YAML file.

Parameters:: dictionary (dict) – Dictionary to write into a YAML file.

Module contents#

DISDRODB Utils Module.
