disdrodb.l0 package

Subpackages#

Submodules#

disdrodb.l0.check_configs module#

Check configuration files.

class disdrodb.l0.check_configs.L0BEncodingSchema(*, contiguous: bool, dtype: str, zlib: bool, complevel: int, shuffle: bool, fletcher32: bool, chunksizes: int | list[int] | None)[source][source]#

Bases: BaseModel

Pydantic model for DISDRODB L0B encodings.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod check_chunksizes_and_zlib(values)[source][source]#

Check the chunksizes validity.

classmethod check_contiguous_and_fletcher32(values)[source][source]#

Check the validity of the fletcher32 value.

classmethod check_contiguous_and_zlib(values)[source][source]#

Check the validity of the compression value.

chunksizes: int | list[int] | None#
complevel: int#
contiguous: bool#
dtype: str#
fletcher32: bool#
model_config: ClassVar[ConfigDict] = {}#

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) None[source]#

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

shuffle: bool#
zlib: bool#
class disdrodb.l0.check_configs.RawDataFormatSchema(*, n_digits: int | None, n_characters: int | None, n_decimals: int | None, n_naturals: int | None, data_range: list[float] | None, nan_flags: int | str | None = None, valid_values: list[float] | None = None, dimension_order: list[str] | None = None, n_values: int | None = None, field_number: str | None = None)[source][source]#

Bases: BaseModel

Pydantic model for the DISDRODB RAW Data Format YAML files.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod check_list_length(value)[source][source]#

Check the data_range validity.

data_range: list[float] | None#
dimension_order: list[str] | None#
field_number: str | None#
model_config: ClassVar[ConfigDict] = {}#

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

n_characters: int | None#
n_decimals: int | None#
n_digits: int | None#
n_naturals: int | None#
n_values: int | None#
nan_flags: int | str | None#
valid_values: list[float] | None#
exception disdrodb.l0.check_configs.SchemaValidationException[source][source]#

Bases: Exception

Exception raised when schema validation fails.

disdrodb.l0.check_configs.check_all_sensors_configs() None[source][source]#

Check all sensors configuration YAML files.

disdrodb.l0.check_configs.check_l0a_encoding(sensor_name: str) None[source][source]#

Check l0a_encodings.yml file.

Parameters:

sensor_name (str) – Name of the sensor.

Raises:

ValueError – Error raised if the value of a key is not in the list of accepted values.

disdrodb.l0.check_configs.check_l0b_encoding(sensor_name: str) None[source][source]#

Check l0b_encodings.yml file based on the schema defined in the class L0BEncodingSchema.

Parameters:

sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.check_sensor_configs(sensor_name: str) None[source][source]#

Check validity of sensor configuration YAML files.

Parameters:

sensor_name (str) – Name of the sensor.

disdrodb.l0.check_standards module#

Check data standards.

disdrodb.l0.check_standards.check_l0a_column_names(df: DataFrame, sensor_name: str) None[source][source]#

Check that the dataframe columns respect the DISDRODB standards.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

Raises:

ValueError – Error if some columns do not meet the DISDRODB standards or if the 'time' column is missing in the dataframe.

disdrodb.l0.check_standards.check_l0a_standards(df: DataFrame, sensor_name: str, logger=None, verbose: bool = True) None[source][source]#

Check that a dataframe respects the DISDRODB L0A standards.

Parameters:
  • df (pandas.DataFrame) – L0A dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool, optional) – Whether to print detailed processing information. The default value is True.

Raises:

ValueError – Error if some columns have inconsistent values.

disdrodb.l0.check_standards.check_l0b_standards(x: str) None[source][source]#

Check L0B standards.

disdrodb.l0.l0_reader module#

Define DISDRODB L0 readers routines.

disdrodb.l0.l0_reader.available_readers(sensor_name, data_sources=None, return_path=False)[source][source]#

Retrieve available readers information.

disdrodb.l0.l0_reader.check_metadata_reader(metadata)[source][source]#

Check the metadata reader key is available and points to an existing disdrodb reader.

disdrodb.l0.l0_reader.check_reader_arguments(reader)[source][source]#

Check that the reader function has the expected input arguments.

disdrodb.l0.l0_reader.check_reader_exists(reader_reference, sensor_name)[source][source]#

Check the reader exists.

disdrodb.l0.l0_reader.check_reader_reference(reader_reference)[source][source]#

Check the reader_reference value.

disdrodb.l0.l0_reader.check_software_readers()[source][source]#

Check the validity of all readers included in the disdrodb software.

disdrodb.l0.l0_reader.define_reader_path(sensor_name, reader_reference)[source][source]#

Define the reader path based on the reader reference name.

disdrodb.l0.l0_reader.define_readers_directory(sensor_name='') str[source][source]#

Returns the path to the disdrodb.l0.readers directory within the disdrodb package.

disdrodb.l0.l0_reader.get_reader(reader_reference, sensor_name)[source][source]#

Retrieve the reader function.

Parameters:
  • reader_reference (str) – The reader reference name. The reader is located at disdrodb.l0.readers.{sensor_name}.{reader_reference}. The reader_reference naming convention is "{DATA_SOURCE}"/"{CAMPAIGN_NAME}_{OPTIONAL_SUFFIX}".

  • sensor_name (str) – The sensor name.

Returns:

The reader() function.

Return type:

callable
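The naming convention above can be illustrated with a small sketch. It assumes that reader_reference is a string of the form DATA_SOURCE/CAMPAIGN_NAME_SUFFIX and that the slash maps to a dot in the module location. The helpers split_reader_reference and reader_module_path are hypothetical names introduced for illustration, not part of disdrodb, and the sensor and campaign values are example inputs only.

```python
def split_reader_reference(reader_reference):
    """Split a "DATA_SOURCE/CAMPAIGN_NAME_SUFFIX" string into its two parts."""
    parts = reader_reference.split("/")
    # Exactly two non-empty components are expected (assumed convention).
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"Invalid reader_reference: {reader_reference!r}")
    data_source, reader_name = parts
    return data_source, reader_name


def reader_module_path(sensor_name, reader_reference):
    """Build the dotted location of a reader (assumed mapping of "/" to ".")."""
    data_source, reader_name = split_reader_reference(reader_reference)
    return f"disdrodb.l0.readers.{sensor_name}.{data_source}.{reader_name}"


# Example values chosen for illustration only.
module_path = reader_module_path("PARSIVEL", "EPFL/LOCARNO_2018")
```

In practice get_reader(reader_reference, sensor_name) performs the lookup and returns the callable, so a helper like this is only useful for reasoning about where a reader lives.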

disdrodb.l0.l0_reader.get_reader_from_metadata(metadata)[source][source]#

Retrieve the reader function based on the metadata information.

The reader_reference naming convention is "{DATA_SOURCE}"/"{CAMPAIGN_NAME}_{OPTIONAL_SUFFIX}". The reader is located at disdrodb.l0.readers.{sensor_name}.{reader_reference}.

disdrodb.l0.l0_reader.get_specific_readers_path(sensor_name)[source][source]#

Returns a dictionary with the file paths of the available readers for each data source.

disdrodb.l0.l0_reader.get_specific_readers_references(sensor_name)[source][source]#

Returns a dictionary with the readers references available for each data source.

disdrodb.l0.l0_reader.get_station_reader(data_source, campaign_name, station_name, metadata_archive_dir=None)[source][source]#

Retrieve the reader function of a specific DISDRODB station.

disdrodb.l0.l0_reader.is_documented_by(original)[source][source]#

Wrapper function to apply generic docstring to the decorated function.

Parameters:

original (function) – Function to take the docstring from.
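The docstring-sharing pattern described here can be sketched as follows. This is a minimal illustration of the general decorator technique, not necessarily how disdrodb implements it. The functions generic_docstring and my_reader are hypothetical.

```python
def is_documented_by(original):
    """Return a decorator that copies the docstring of ``original``."""

    def wrapper(target):
        # Reuse the docstring of the original function on the decorated one.
        target.__doc__ = original.__doc__
        return target

    return wrapper


def generic_docstring():
    """Reader to convert a raw data file to DISDRODB L0A or L0B format."""


@is_documented_by(generic_docstring)
def my_reader(filepath, logger=None):
    # Hypothetical reader body; a real reader would parse the file here.
    return filepath
```

After decoration, my_reader.__doc__ is the same string as generic_docstring.__doc__, so every reader can expose one shared description.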

disdrodb.l0.l0_reader.list_readers_paths(sensor_name) list[source][source]#

Returns the file paths of the available readers for a given sensor in disdrodb.l0.readers.{sensor_name}.

disdrodb.l0.l0_reader.list_readers_references(sensor_name)[source][source]#

Returns the readers references available for a given sensor in disdrodb.l0.readers.{sensor_name}.

disdrodb.l0.l0_reader.reader_generic_docstring()[source][source]#

Reader to convert a raw data file to DISDRODB L0A or L0B format.

Raw text files are read and converted to a pandas.DataFrame (L0A format). Raw netCDF files are read and converted to an xarray.Dataset (L0B format).

Parameters:
  • filepath (str) – Filepath of the raw data file to be processed.

  • logger (logging.Logger, optional) – Logger to use for logging messages. Default is None, which means no logger is used.

disdrodb.l0.l0a_processing module#

Functions to process raw text files into DISDRODB L0A Apache Parquet.

disdrodb.l0.l0a_processing.cast_column_dtypes(df: DataFrame, sensor_name: str) DataFrame[source][source]#

Convert 'object' dataframe columns into DISDRODB L0A dtype standards.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

Returns:

Dataframe with corrected columns types.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.check_matching_column_number(df, column_names)[source][source]#

Check the number of columns in the dataframe matches the length of column names.

disdrodb.l0.l0a_processing.coerce_corrupted_values_to_nan(df: DataFrame, sensor_name: str) DataFrame[source][source]#

Coerce corrupted values in dataframe numeric columns to np.nan.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

Returns:

Dataframe with corrupted values in numeric columns set to np.nan.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.concatenate_dataframe(list_df: list, logger=None, verbose: bool = False) DataFrame[source][source]#

Concatenate a list of dataframes.

Parameters:
  • list_df (list) – List of dataframes.

  • verbose (bool, optional) – If True, print messages. If False, print nothing. The default is False.

Returns:

Concatenated dataframe.

Return type:

pandas.DataFrame

Raises:

ValueError – If the dataframes cannot be concatenated.

disdrodb.l0.l0a_processing.drop_time_periods(df, time_periods)[source][source]#

Drop problematic time periods.

disdrodb.l0.l0a_processing.drop_timesteps(df, timesteps)[source][source]#

Drop problematic time steps.

disdrodb.l0.l0a_processing.is_raw_array_string_not_corrupted(string)[source][source]#

Check that the raw array string is not corrupted.

disdrodb.l0.l0a_processing.preprocess_reader_kwargs(reader_kwargs: dict) dict[source][source]#

Preprocess arguments required to read raw text file into Pandas.

Parameters:

reader_kwargs (dict) – Initial parameter dictionary.

Returns:

Parameter dictionary that matches either Pandas or Dask.

Return type:

dict

disdrodb.l0.l0a_processing.read_l0a_dataframe(filepaths: str | list, verbose: bool = False, logger=None, debugging_mode: bool = False) DataFrame[source][source]#

Read DISDRODB L0A Apache Parquet file(s).

Parameters:
  • filepaths (str or list) – Either a list or a single filepath.

  • verbose (bool) – Whether to print detailed processing information into terminal. The default is False.

  • debugging_mode (bool) – If True, it reduces the amount of data to process. If filepaths is a list, it reads only the first 3 files. For each file, it selects only the first 100 rows. The default is False.

Returns:

L0A Dataframe.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.read_raw_text_file(filepath: str, column_names: list, reader_kwargs: dict, logger=None) DataFrame[source][source]#

Read a raw file into a dataframe.

Parameters:
  • filepath (str) – Raw file path.

  • column_names (list) – Column names.

  • reader_kwargs (dict) – Pandas pd.read_csv arguments.

  • logger (logging.Logger) – Logger object. The default is None. If None, the logger is created using the module name. If logger is passed, it will be used to log messages.

Returns:

Pandas dataframe.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.read_raw_text_files(filepaths: list | str, reader, sensor_name, verbose=True, logger=None) DataFrame[source][source]#

Read and parse a list of raw files into a dataframe.

Parameters:
  • filepaths (Union[list,str]) – Path(s) of the raw file(s).

  • reader – DISDRODB reader function. Format: reader(filepath, logger=None)

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information. The default is True.

Returns:

Dataframe

Return type:

pandas.DataFrame

Raises:

ValueError – If the input parameters cannot be used or the raw file cannot be processed.

disdrodb.l0.l0a_processing.remove_corrupted_rows(df)[source][source]#

Remove corrupted rows by checking conversion of raw fields to numeric.

Note: The delimiter at the start and end of the raw array must be stripped beforehand.

disdrodb.l0.l0a_processing.remove_duplicated_timesteps(df: DataFrame, logger=None, verbose: bool = False)[source][source]#

Remove duplicated timesteps.

Only the first occurrence of each timestep is kept.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • verbose (bool) – Whether to print detailed processing information. The default is False.

Returns:

Dataframe with valid unique timesteps.

Return type:

pandas.DataFrame
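The keep-first behaviour can be sketched without pandas on a list of row dictionaries. This is an illustrative stand-in, assuming each row carries a "time" key. The function name keep_first_timestep_occurrence and the sample rows are hypothetical.

```python
def keep_first_timestep_occurrence(rows):
    """Keep only the first row seen for each value of the "time" key."""
    seen = set()
    unique_rows = []
    for row in rows:
        if row["time"] in seen:
            continue  # A row with this timestep was already kept.
        seen.add(row["time"])
        unique_rows.append(row)
    return unique_rows


rows = [
    {"time": "2020-01-01T00:00:00", "value": 1},
    {"time": "2020-01-01T00:00:00", "value": 2},  # duplicate timestep, dropped
    {"time": "2020-01-01T00:00:30", "value": 3},
]
unique_rows = keep_first_timestep_occurrence(rows)
```

With a pandas dataframe, the equivalent operation is typically df.drop_duplicates(subset="time", keep="first").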

disdrodb.l0.l0a_processing.remove_issue_timesteps(df, issue_dict, logger=None, verbose=False)[source][source]#

Drop dataframe rows with timesteps listed in the issue dictionary.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • issue_dict (dict) – Issue dictionary.

  • verbose (bool) – Whether to print detailed processing information. The default is False.

Returns:

Dataframe with problematic timesteps removed.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.remove_rows_with_missing_time(df: ~pandas.core.frame.DataFrame, logger=<Logger disdrodb.l0.l0a_processing (WARNING)>, verbose: bool = False)[source][source]#

Remove dataframe rows where the "time" is NaT.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • verbose (bool) – Whether to print detailed processing information. The default is False.

Returns:

Dataframe with valid timesteps.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.replace_nan_flags(df, sensor_name, logger=None, verbose=False)[source][source]#

Set values corresponding to nan_flags to np.nan.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information. The default is False.

Returns:

Dataframe without nan_flags values.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.sanitize_df(df, sensor_name, verbose=True, issue_dict=None, logger=None)[source][source]#

Sanitize a dataframe parsed from raw text files into an L0A dataframe.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information. The default is True.

  • issue_dict (dict) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are 'timesteps' and 'time_periods'. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.

Returns:

Dataframe

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.set_nan_invalid_values(df, sensor_name, logger=None, verbose=False)[source][source]#

Set invalid (class) values to np.nan.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information. The default is False.

Returns:

Dataframe without invalid values.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.set_nan_outside_data_range(df, sensor_name, logger=None, verbose=False)[source][source]#

Set values outside the data range as np.nan.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information. The default is False.

Returns:

Dataframe without values outside the expected data range.

Return type:

pandas.DataFrame
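The range masking can be illustrated on a plain list of numbers. The sketch assumes that data_range is a two-element [minimum, maximum] list and that both bounds are inclusive, which this page does not confirm. The name mask_outside_range is hypothetical.

```python
import math


def mask_outside_range(values, data_range):
    """Replace values outside the closed interval [low, high] with NaN."""
    low, high = data_range
    # NaN inputs fail the comparison and therefore stay NaN.
    return [v if low <= v <= high else math.nan for v in values]


masked = mask_outside_range([-5.0, 0.0, 12.5, 99.0], data_range=[0.0, 50.0])
```

Here the first and last values fall outside [0.0, 50.0] and become NaN, while the two values inside the range are left untouched.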

disdrodb.l0.l0a_processing.strip_delimiter(string)[source][source]#

Remove the first and last delimiter occurrence from a string.
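As a rough sketch of the idea, the following helper removes one leading and one trailing delimiter if present. It assumes the delimiter is known and passed explicitly. The function name strip_leading_trailing_delimiter and the default comma are assumptions made for illustration.

```python
def strip_leading_trailing_delimiter(string, delimiter=","):
    """Remove one delimiter from the start and one from the end, if present."""
    if not delimiter:
        raise ValueError("The delimiter must be a non-empty string.")
    if string.startswith(delimiter):
        string = string[len(delimiter):]
    if string.endswith(delimiter):
        string = string[: -len(delimiter)]
    return string


stripped = strip_leading_trailing_delimiter(",000,001,002,")
```

Only one delimiter is removed per side, so an empty first or last field (two consecutive delimiters) is preserved.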

disdrodb.l0.l0a_processing.strip_delimiter_from_raw_arrays(df)[source][source]#

Remove the first and last delimiter occurrence from the raw array fields.

disdrodb.l0.l0a_processing.strip_string_spaces(df: DataFrame, sensor_name: str) DataFrame[source][source]#

Strip leading/trailing spaces from dataframe string columns.

Parameters:
Returns:

Dataframe with string columns without leading/trailing spaces.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.write_l0a(df: DataFrame, filepath: str, force: bool = False, logger=None, verbose: bool = False)[source][source]#

Save the dataframe into an Apache Parquet file.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • filepath (str) – Output file path.

  • force (bool, optional) – Whether to overwrite existing data. If True, existing data in the destination directories are overwritten. If False (the default), an error is raised if data already exist in the destination directories.

  • verbose (bool, optional) – Whether to print detailed processing information. The default is False.

Raises:
  • ValueError – The input dataframe cannot be written as an Apache Parquet file.

  • NotImplementedError – The input dataframe cannot be processed.

disdrodb.l0.l0b_nc_processing module#

Functions to process DISDRODB raw netCDF files into DISDRODB L0B netCDF files.

disdrodb.l0.l0b_nc_processing.add_dataset_missing_variables(ds, missing_vars, sensor_name)[source][source]#

Add missing xr.Dataset variables as np.nan xr.DataArrays.

disdrodb.l0.l0b_nc_processing.drop_time_periods(ds, time_periods: list)[source][source]#

Drop all time steps within any of the specified time intervals.

Parameters:
  • ds (xarray.Dataset) – Input dataset with a ‘time’ dimension.

  • time_periods (list of tuple) – Each tuple is (start_time, end_time), datetime-like, inclusive.

Returns:

Dataset with all times within the given periods removed.

Return type:

xarray.Dataset

Raises:

ValueError – If no timesteps remain after removal.
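The inclusive interval filtering and the empty-result error can be sketched on a plain list of datetimes. This stands in for the xarray-based routine and is not its implementation. The name drop_times_in_periods and the sample timestamps are hypothetical.

```python
from datetime import datetime


def drop_times_in_periods(times, time_periods):
    """Drop every timestep falling inside any (start, end) period, inclusive."""
    kept = [
        t
        for t in times
        if not any(start <= t <= end for start, end in time_periods)
    ]
    if not kept:
        raise ValueError("No timesteps left after removing the time periods.")
    return kept


times = [datetime(2020, 1, 1, 0, minute) for minute in range(4)]
periods = [(datetime(2020, 1, 1, 0, 1), datetime(2020, 1, 1, 0, 2))]
kept = drop_times_in_periods(times, periods)
```

Because both bounds are inclusive, the timesteps at 00:01 and 00:02 are removed and only 00:00 and 00:03 remain.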

disdrodb.l0.l0b_nc_processing.drop_timesteps(ds, timesteps: list)[source][source]#

Drop specific time steps from a Dataset.

Parameters:
  • ds (xarray.Dataset) – Input dataset with a ‘time’ dimension.

  • timesteps (list) – List of datetime-like values to remove.

Returns:

Dataset with specified timesteps removed.

Return type:

xarray.Dataset

Raises:

ValueError – If no timesteps remain after removal.

disdrodb.l0.l0b_nc_processing.open_raw_netcdf_file(filepath, logger=None, engine='netcdf4', cache=False, chunks=None, decode_timedelta=False, **kwargs)[source][source]#

Open a raw netCDF file.

Parameters:

filepath (str) – Path to the raw netCDF file.

Returns:

Raw netCDF file as an xarray Dataset.

Return type:

xarray.Dataset

disdrodb.l0.l0b_nc_processing.remove_issue_timesteps(ds, issue_dict: dict, logger=None, verbose: bool = False)[source][source]#

Remove bad timesteps and time periods from an xarray Dataset according to issue definitions.

Parameters:
  • ds (xarray.Dataset) – Input dataset with a ‘time’ dimension.

  • issue_dict (dict) – Dictionary with optional keys ‘timesteps’ (list of datetimes) and ‘time_periods’ (list of (start, end) tuples).

  • logger (any, optional) – Logger instance to record dropped steps, by default None.

  • verbose (bool, optional) – Whether to log informational messages, by default False.

Returns:

Cleaned dataset.

Return type:

xarray.Dataset

Raises:

ValueError – If after removing specified timesteps/periods no data remains.

disdrodb.l0.l0b_nc_processing.rename_dataset(ds, dict_names)[source][source]#

Rename xr.Dataset variables, coordinates and dimensions.

disdrodb.l0.l0b_nc_processing.replace_custom_nan_flags(ds, dict_nan_flags, logger=None, verbose=False)[source][source]#

Set values corresponding to nan_flags to np.nan.

This function must be used in a reader, if necessary.

Parameters:
  • ds (xarray.Dataset) – Input xarray dataset.

  • dict_nan_flags (dict) – Dictionary with nan flags value to set as np.nan.

  • verbose (bool) – Whether to print detailed processing information. The default value is False.

Returns:

Dataset without nan_flags values.

Return type:

xarray.Dataset
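The flag replacement can be sketched with plain dictionaries. The layout of dict_nan_flags is assumed here to map a variable name to a list of flag values, which this page does not specify. The function replace_flags_with_nan and the variable names are hypothetical.

```python
import math


def replace_flags_with_nan(data, dict_nan_flags):
    """Return a copy of ``data`` where flagged values are replaced by NaN."""
    cleaned = {}
    for name, values in data.items():
        # Variables without an entry in dict_nan_flags are copied unchanged.
        flags = dict_nan_flags.get(name, [])
        cleaned[name] = [math.nan if v in flags else v for v in values]
    return cleaned


data = {"air_temperature": [-99, 12, 15], "rainfall_rate": [0.0, 999.9]}
dict_nan_flags = {"air_temperature": [-99], "rainfall_rate": [999.9]}
cleaned = replace_flags_with_nan(data, dict_nan_flags)
```

The input dictionary is left unmodified, which mirrors the usual expectation that such cleaning steps return a new object.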

disdrodb.l0.l0b_nc_processing.replace_nan_flags(ds, sensor_name, verbose, logger=None)[source][source]#

Set values corresponding to nan_flags to np.nan.

Parameters:
  • ds (xarray.Dataset) – Input xarray dataset

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information.

Returns:

Dataset without nan_flags values.

Return type:

xarray.Dataset

disdrodb.l0.l0b_nc_processing.sanitize_ds(ds, sensor_name, metadata, issue_dict=None, verbose=False, logger=None)[source][source]#

Convert a raw xr.Dataset into a DISDRODB L0B netCDF.

Parameters:
  • ds (xarray.Dataset) – Raw xarray dataset

  • metadata (dict) – Metadata to attach as global attributes to the xr.Dataset.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information.

Returns:

L0B xr.Dataset

Return type:

xarray.Dataset

disdrodb.l0.l0b_nc_processing.set_nan_invalid_values(ds, sensor_name, verbose, logger=None)[source][source]#

Set invalid (class) values to np.nan.

Parameters:
  • ds (xarray.Dataset) – Input xarray dataset

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information.

Returns:

Dataset without invalid values.

Return type:

xarray.Dataset

disdrodb.l0.l0b_nc_processing.set_nan_outside_data_range(ds, sensor_name, verbose, logger=None)[source][source]#

Set values outside the data range as np.nan.

Parameters:
  • ds (xarray.Dataset) – Input xarray dataset

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information.

Returns:

Dataset without values outside the expected data range.

Return type:

xarray.Dataset

disdrodb.l0.l0b_nc_processing.standardize_raw_dataset(ds, dict_names, sensor_name)[source][source]#

Preprocess a raw netCDF dataset to improve compatibility with DISDRODB standards.

This function checks the validity of dict_names, then renames and subsets the data accordingly. If some variables specified in dict_names are missing, they are added as np.nan xr.DataArrays.

Parameters:
  • ds (xarray.Dataset) – Raw netCDF to be converted to DISDRODB standards.

  • dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.

  • sensor_name (str) – Sensor name.

Returns:

ds – xarray Dataset with variables compliant with DISDRODB conventions.

Return type:

xarray.Dataset

disdrodb.l0.l0b_nc_processing.subset_dataset(ds, dict_names, sensor_name)[source][source]#

Subset xr.Dataset with expected variables.

disdrodb.l0.l0b_processing module#

Functions to process DISDRODB L0A files into DISDRODB L0B netCDF files.

disdrodb.l0.l0b_processing.add_dataset_crs_coords(ds)[source][source]#

Add the CRS coordinate to the xr.Dataset.

disdrodb.l0.l0b_processing.create_l0b_from_l0a(df: DataFrame, metadata: dict, logger=None, verbose: bool = False) Dataset[source][source]#

Transform the L0A dataframe to the L0B xr.Dataset.

Parameters:
  • df (pandas.DataFrame) – DISDRODB L0A dataframe. The raw drop number spectrum is reshaped to a 2D(+time) array. The raw drop concentration and velocity are reshaped to 1D(+time) arrays.

  • metadata (dict) – DISDRODB station metadata. To use this function outside the DISDRODB routines, the dictionary must contain the fields: sensor_name, latitude, longitude, altitude, platform_type.

  • verbose (bool, optional) – Whether to print detailed processing information. The default value is False.

Returns:

DISDRODB L0B dataset.

Return type:

xarray.Dataset

Raises:

ValueError – Error if the DISDRODB L0B xarray dataset can not be created.

disdrodb.l0.l0b_processing.finalize_dataset(ds, sensor_name, attrs)[source][source]#

Finalize DISDRODB L0B Dataset.

disdrodb.l0.l0b_processing.infer_split_str(string: str) str[source][source]#

Infer the delimiter inside a string.

Parameters:

string (str) – Input string.

Returns:

Inferred delimiter.

Return type:

str
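One plausible way to infer a delimiter is to count candidate characters and pick the most frequent one. This is a sketch of that heuristic, not the rule used by infer_split_str, whose logic is not described here. The function name infer_delimiter and the candidate set are assumptions.

```python
def infer_delimiter(string, candidates=(",", ";")):
    """Return the candidate delimiter that occurs most often in the string."""
    counts = {candidate: string.count(candidate) for candidate in candidates}
    # On ties, max() returns the first candidate in the given order.
    best = max(counts, key=counts.get)
    if counts[best] == 0:
        raise ValueError("No candidate delimiter found in the string.")
    return best


delimiter = infer_delimiter("000;001;002")
```

Raising an error when no candidate is found keeps a malformed raw field from being silently split on the wrong character.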

disdrodb.l0.l0b_processing.retrieve_l0b_arrays(df: DataFrame, sensor_name: str, logger=None, verbose: bool = False) dict[source][source]#

Retrieves the L0B data matrix.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

Returns:

Dictionary with data arrays.

Return type:

dict

disdrodb.l0.l0b_processing.set_geolocation_coordinates(ds, attrs)[source][source]#

Add geolocation coordinates to dataset.

disdrodb.l0.l0b_processing.set_l0b_encodings(ds: Dataset, sensor_name: str)[source][source]#

Apply the L0B encodings to the xarray Dataset.

Parameters:
  • ds (xarray.Dataset) – Input xarray dataset.

  • sensor_name (str) – Name of the sensor.

Returns:

Output xarray dataset.

Return type:

xarray.Dataset

disdrodb.l0.l0b_processing.write_l0b(ds: Dataset, filepath: str, force=False) None[source][source]#

Save the xarray dataset into a NetCDF file.

Parameters:
  • ds (xarray.Dataset) – Input xarray dataset.

  • filepath (str) – Output file path.

  • force (bool, optional) – Whether to overwrite existing data. If True, existing data in the destination directories are overwritten. If False (the default), an error is raised if data already exist in the destination directories.

disdrodb.l0.l0c_processing module#

Functions to process DISDRODB L0B files into DISDRODB L0C netCDF files.

disdrodb.l0.l0c_processing.check_timesteps_regularity(ds, sample_interval, verbose=False, logger=None)[source][source]#

Check for the regularity of timesteps.

disdrodb.l0.l0c_processing.create_daily_file(day, filepaths, measurement_intervals, ensure_variables_equality=True, logger=None, verbose=True)[source][source]#

Create a daily file by merging and processing data from multiple filepaths.

Parameters:
  • day (str or numpy.datetime64) – The day for which the daily file is to be created. Should be in a format that can be converted to numpy.datetime64.

  • filepaths (list of str) – List of filepaths to the data files to be processed.

Returns:

The processed dataset containing data for the specified day.

Return type:

xarray.Dataset

Raises:

ValueError – If less than 5 timesteps are available for the specified day.

Notes

  • The function adds a tolerance for searching timesteps before and after 00:00 to account for imprecise logging times.

  • It checks that duplicated timesteps have the same raw drop number values.

  • The function infers the time integration sample interval and regularizes timesteps to handle trailing seconds.

  • The data is loaded into memory and connections to source files are closed before returning the dataset.

disdrodb.l0.l0c_processing.drop_timesteps_with_invalid_sample_interval(ds, measurement_intervals, verbose=True, logger=None)[source][source]#

Drop timesteps with unexpected sample intervals.

disdrodb.l0.l0c_processing.finalize_l0c_dataset(ds, sample_interval, start_day, end_day, verbose=True, logger=None)[source][source]#

Finalize an L0C dataset with a unique sampling interval.

It adds the sampling_interval coordinate and regularizes the timesteps for trailing seconds.

disdrodb.l0.l0c_processing.get_files_per_days(filepaths)[source][source]#

Organize files by the days they cover based on their start and end times.

Parameters:

filepaths (list of str) – List of file paths to be processed.

Returns:

Dictionary where keys are days (as strings) and values are lists of file paths that cover those days.

Return type:

dict

Notes

This function adds a tolerance of 60 seconds to account for imprecise time logging by the sensors.
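The grouping with a 60-second tolerance can be sketched as below. The sketch assumes that the start and end times of each file are already known and that the tolerance widens the covered interval on both sides, whereas the real function derives the times from the file paths. The name group_files_by_day and the file names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def group_files_by_day(file_intervals, tolerance_seconds=60):
    """Map each day (YYYY-MM-DD) to the files whose time span touches it."""
    tolerance = timedelta(seconds=tolerance_seconds)
    files_per_day = defaultdict(list)
    for filepath, start_time, end_time in file_intervals:
        # Widen the interval so files ending just before midnight also
        # contribute to the following day (and vice versa).
        day = (start_time - tolerance).date()
        last_day = (end_time + tolerance).date()
        while day <= last_day:
            files_per_day[day.isoformat()].append(filepath)
            day += timedelta(days=1)
    return dict(files_per_day)


file_intervals = [
    ("a.nc", datetime(2020, 1, 1, 0, 0, 30), datetime(2020, 1, 1, 23, 59, 30)),
    ("b.nc", datetime(2020, 1, 2, 0, 1, 0), datetime(2020, 1, 2, 12, 0, 0)),
]
files_per_day = group_files_by_day(file_intervals)
```

In this example a.nc starts 30 seconds after midnight and ends 30 seconds before the next midnight, so with the tolerance it is also listed for the previous and the following day.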

disdrodb.l0.l0c_processing.has_same_value_over_time(da)[source][source]#

Check if a DataArray has the same value over all timesteps, considering NaNs as equal.

Parameters:

da (xarray.DataArray) – The DataArray to check. Must have a ‘time’ dimension.

Returns:

True if the values are the same (or NaN in the same positions) across all timesteps, False otherwise.

Return type:

bool
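The NaN-aware comparison can be illustrated with nested lists, where each inner list holds the values of one timestep. This is a pure-Python stand-in for the xarray.DataArray logic. The names same_value_over_time and _equal_or_both_nan are hypothetical.

```python
import math


def _equal_or_both_nan(a, b):
    """Compare two scalars, treating NaN as equal to NaN."""
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


def same_value_over_time(series):
    """Return True if every timestep equals the first one (NaN == NaN)."""
    if not series:
        return True
    first = series[0]
    return all(
        len(step) == len(first)
        and all(_equal_or_both_nan(x, y) for x, y in zip(step, first))
        for step in series[1:]
    )


nan = math.nan
constant = same_value_over_time([[1.0, nan], [1.0, nan], [1.0, nan]])
varying = same_value_over_time([[1.0, nan], [1.0, 2.0]])
```

A plain equality check would report the first series as varying, because NaN never compares equal to itself. The explicit both-NaN branch avoids that.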

disdrodb.l0.l0c_processing.remove_duplicated_timesteps(ds, ensure_variables_equality=True, logger=None, verbose=True)[source][source]#

Remove duplicated timesteps from an xarray dataset.

disdrodb.l0.l0c_processing.retrieve_possible_measurement_intervals(metadata)[source][source]#

Retrieve the list of possible measurement intervals.

disdrodb.l0.l0c_processing.split_dataset_by_sampling_intervals(ds, measurement_intervals, min_sample_interval=10, min_block_size=5)[source][source]#

Split a dataset into subsets where each subset has a consistent sampling interval.

Parameters:
  • ds (xarray.Dataset) – The input dataset with a ‘time’ dimension.

  • measurement_intervals (list or array-like) – A list of possible primary sampling intervals (in seconds) that the dataset might have.

  • min_sample_interval (int, optional) – The minimum expected sampling interval in seconds. Defaults to 10s.

  • min_block_size (float, optional) – The minimum number of timesteps with a given sampling interval for a block to be kept. Shorter blocks are discarded. Defaults to 5 timesteps.

Returns:

A dictionary where keys are the identified sampling intervals (in seconds), and values are xarray.Datasets containing only data from those intervals.

Return type:

dict
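A simplified version of the splitting idea is sketched below on a sorted list of times in seconds. Each timestep is assigned the interval to its predecessor, consecutive timesteps with the same interval form a block, and short or unexpected blocks are discarded. The real function works on xarray datasets and also uses min_sample_interval, which is omitted here. The name split_by_sampling_interval is hypothetical.

```python
def split_by_sampling_interval(times, measurement_intervals, min_block_size=5):
    """Group sorted times (in seconds) by the interval to the previous time."""
    if len(times) < 2:
        return {}
    deltas = [b - a for a, b in zip(times[:-1], times[1:])]
    # The first timestep has no predecessor, so it reuses the first interval.
    deltas = [deltas[0]] + deltas

    # Build blocks of consecutive timesteps sharing the same interval.
    blocks = []
    for time, delta in zip(times, deltas):
        if blocks and blocks[-1][0] == delta:
            blocks[-1][1].append(time)
        else:
            blocks.append((delta, [time]))

    # Keep only blocks with an expected interval and enough timesteps.
    subsets = {}
    for interval, block in blocks:
        if interval in measurement_intervals and len(block) >= min_block_size:
            subsets.setdefault(interval, []).extend(block)
    return subsets


times = [0, 30, 60, 90, 120, 150, 210, 270, 330]
subsets = split_by_sampling_interval(times, [30, 60], min_block_size=3)
strict = split_by_sampling_interval(times, [30, 60], min_block_size=5)
```

With min_block_size=3 both the 30-second and the 60-second portions are kept. With the stricter value of 5, the three-timestep 60-second block is discarded.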

disdrodb.l0.routines module#

Implement DISDRODB L0 processing.

disdrodb.l0.routines.run_l0a_station(data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#

Run the L0A processing of a specific DISDRODB station when invoked from the terminal.

This function is intended to be called through the disdrodb_run_l0a_station command-line interface.

Parameters:
  • data_source (str) – The name of the institution (for campaigns spanning multiple countries) or the name of the country (for campaigns or sensor networks within a single country). Must be provided in UPPER CASE.

  • campaign_name (str) – The name of the campaign. Must be provided in UPPER CASE.

  • station_name (str) – The name of the station.

  • force (bool, optional) – If True, existing data in the destination directories will be overwritten. If False (default), an error will be raised if data already exists in the destination directories.

  • verbose (bool, optional) – If True, detailed processing information will be printed to the terminal. If False (default), less information will be displayed.

  • parallel (bool, optional) – If True (default), files will be processed in multiple processes simultaneously with each process using a single thread. If False, files will be processed sequentially in a single process, and multi-threading will be automatically exploited to speed up I/O tasks.

  • debugging_mode (bool, optional) – If True, the amount of data processed will be reduced. Only the first 3 raw data files will be processed. The default value is False.

  • data_archive_dir (str, optional) – The base directory of DISDRODB, expected in the format <...>/DISDRODB. If not specified, the path specified in the DISDRODB active configuration will be used.

disdrodb.l0.routines.run_l0b_station(data_source, campaign_name, station_name, remove_l0a: bool = False, force: bool = False, verbose: bool = True, parallel: bool = True, debugging_mode: bool = False, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#

Run the L0B processing of a specific DISDRODB station when invoked from the terminal.

This function is intended to be called through the disdrodb_run_l0b_station command-line interface.

Parameters:
  • data_source (str) – The name of the institution (for campaigns spanning multiple countries) or the name of the country (for campaigns or sensor networks within a single country). Must be provided in UPPER CASE.

  • campaign_name (str) – The name of the campaign. Must be provided in UPPER CASE.

  • station_name (str) – The name of the station.

  • force (bool, optional) – If True, existing data in the destination directories will be overwritten. If False (default), an error will be raised if data already exists in the destination directories.

  • verbose (bool, optional) – If True (default), detailed processing information will be printed to the terminal. If False, less information will be displayed.

  • parallel (bool, optional) – If True (default), files will be processed in multiple processes simultaneously, with each process using a single thread to avoid issues with the HDF/netCDF library. If False, files will be processed sequentially in a single process, and multi-threading will be automatically exploited to speed up I/O tasks.

  • debugging_mode (bool, optional) – If True, the amount of data processed will be reduced. Only the first 100 rows of 3 L0A files will be processed. The default value is False.

  • remove_l0a (bool, optional) – Whether to remove the processed L0A files. The default value is False.

  • data_archive_dir (str, optional) – The base directory of DISDRODB, expected in the format <...>/DISDRODB. If not specified, the path specified in the DISDRODB active configuration will be used.
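As a minimal sketch of a direct Python call, the keyword arguments below mirror the signature documented above. The data source, campaign, and station names are placeholders rather than real archive entries, and the import is deferred so that the snippet can be read without DISDRODB installed.

```python
# Keyword arguments mirroring the documented signature of run_l0b_station.
# The three names below are placeholders (assumptions), not real archive entries.
l0b_kwargs = {
    "data_source": "DATA_SOURCE",      # must be UPPER CASE
    "campaign_name": "CAMPAIGN_NAME",  # must be UPPER CASE
    "station_name": "station_name",
    "remove_l0a": False,
    "force": False,
    "verbose": True,
    "parallel": True,
    "debugging_mode": True,   # process only a reduced amount of data
    "data_archive_dir": None,  # fall back to the active DISDRODB configuration
}


def run_l0b_example():
    """Call run_l0b_station with the keyword arguments defined above."""
    # Imported lazily: this requires DISDRODB and a configured data archive.
    from disdrodb.l0.routines import run_l0b_station

    run_l0b_station(**l0b_kwargs)
```

Calling `run_l0b_example()` only makes sense once the placeholders are replaced with a station that exists in the archive.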

disdrodb.l0.routines.run_l0c_station(data_source, campaign_name, station_name, remove_l0b: bool = False, force: bool = False, verbose: bool = True, parallel: bool = True, debugging_mode: bool = False, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#

Run the L0C processing of a specific DISDRODB station when invoked from the terminal.

The DISDRODB L0A and L0B routines just convert source raw data into netCDF format. The DISDRODB L0C routine ingests L0B files and performs data homogenization. The DISDRODB L0C routine takes care of:

  • removing duplicated timesteps across files,

  • merging/splitting files into daily files,

  • regularizing timesteps for potentially trailing seconds,

  • ensuring that each L0C file has a unique sample interval.
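The timestep regularization and daily grouping listed above could be sketched as follows. This is an illustration only, not the DISDRODB implementation: it assumes that "trailing seconds" are corrected by rounding each timestamp to the nearest multiple of the sample interval, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def regularize_timestep(t: datetime, sample_interval: int) -> datetime:
    """Round a timestamp to the nearest multiple of sample_interval seconds."""
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = (t - midnight).total_seconds()
    rounded = round(elapsed / sample_interval) * sample_interval
    return midnight + timedelta(seconds=rounded)


def group_by_day(timesteps):
    """Group timestamps by calendar day, as a stand-in for writing daily files."""
    groups = defaultdict(list)
    for t in timesteps:
        groups[t.date()].append(t)
    return dict(groups)


# A logger writing at 12:00:01 instead of 12:00:00 with a 30 s sample interval.
raw = [datetime(2024, 1, 1, 12, 0, 1), datetime(2024, 1, 1, 12, 0, 29)]
regular = [regularize_timestep(t, sample_interval=30) for t in raw]
daily = group_by_day(regular)
```

Rounding relative to midnight keeps the regularized timesteps aligned on a fixed grid within each day, which is convenient when the data are later split into daily files.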

Duplicated timesteps are automatically dropped if their variable values coincide; otherwise, an error is raised.
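The duplicate-handling rule can be illustrated with a small pure-Python sketch. The record layout (a list of `(timestep, values)` tuples) and the function name are assumptions made for this example; the actual routine operates on L0B netCDF files.

```python
def drop_duplicated_timesteps(records):
    """Keep one record per timestep; fail if duplicates disagree.

    records: iterable of (timestep, values) tuples, possibly read from several files.
    """
    unique = {}
    for timestep, values in records:
        if timestep not in unique:
            unique[timestep] = values
        elif unique[timestep] != values:
            # Same timestep, different variable values: cannot be resolved automatically.
            raise ValueError(f"Conflicting values at duplicated timestep {timestep!r}.")
    # Return the records ordered by timestep.
    return sorted(unique.items(), key=lambda item: item[0])


records = [
    ("2024-01-01T00:00:00", {"rainfall_rate": 0.1}),
    ("2024-01-01T00:00:00", {"rainfall_rate": 0.1}),  # exact duplicate: dropped
    ("2024-01-01T00:01:00", {"rainfall_rate": 0.2}),
]
cleaned = drop_duplicated_timesteps(records)
```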

This function is intended to be called through the disdrodb_run_l0c_station command-line interface.

Parameters:
  • data_source (str) – The name of the institution (for campaigns spanning multiple countries) or the name of the country (for campaigns or sensor networks within a single country). Must be provided in UPPER CASE.

  • campaign_name (str) – The name of the campaign. Must be provided in UPPER CASE.

  • station_name (str) – The name of the station.

  • force (bool, optional) – If True, existing data in the destination directories will be overwritten. If False (default), an error will be raised if data already exists in the destination directories.

  • verbose (bool, optional) – If True (default), detailed processing information will be printed to the terminal. If False, less information will be displayed.

  • parallel (bool, optional) – If True (default), files will be processed in multiple processes simultaneously, with each process using a single thread to avoid issues with the HDF/netCDF library. If False, files will be processed sequentially in a single process, and multi-threading will be automatically exploited to speed up I/O tasks.

  • debugging_mode (bool, optional) – If True, the amount of data processed will be reduced. Only the first 3 files will be processed. The default value is False.

  • remove_l0b (bool, optional) – Whether to remove the processed L0B files. The default value is False.

  • data_archive_dir (str, optional) – The base directory of DISDRODB, expected in the format <...>/DISDRODB. If not specified, the path specified in the DISDRODB active configuration will be used.

disdrodb.l0.standards module#

Retrieve L0 sensor standards.

disdrodb.l0.standards.allowed_l0_variables(sensor_name: str) list[source][source]#

Get the list of allowed L0 variables for a given sensor.

disdrodb.l0.standards.get_bin_coords_dict(sensor_name: str) dict[source][source]#

Retrieve diameter (and velocity) bin coordinates.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with coordinates arrays.

Return type:

dict

disdrodb.l0.standards.get_data_format_dict(sensor_name: str) dict[source][source]#

Get a dictionary containing the data format of each sensor variable.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Data format of each sensor variable.

Return type:

dict

disdrodb.l0.standards.get_data_range_dict(sensor_name: str) dict[source][source]#

Get the variable data range.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected data value range for each data field. It excludes variables without a specified data_range key.

Return type:

dict
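A data range dictionary of this kind could be used to flag suspicious values, as in the sketch below. The variable name, the range, and the helper function are hypothetical; how DISDRODB itself treats out-of-range values is not stated here.

```python
import math

# Hypothetical data range dictionary: {variable name: [lower, upper]}.
data_range_dict = {"example_variable": [0.0, 10.0]}


def find_out_of_range(values, data_range):
    """Return the values lying outside the inclusive [lower, upper] range.

    NaN values are skipped, and a missing range (None) disables the check.
    """
    if data_range is None:
        return []
    lower, upper = data_range
    return [v for v in values if not math.isnan(v) and not (lower <= v <= upper)]


suspicious = find_out_of_range(
    [0.0, 5.0, 12.0, math.nan],
    data_range_dict.get("example_variable"),
)
```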

disdrodb.l0.standards.get_diameter_bin_center(sensor_name: str) list[source][source]#

Get diameter bin center.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Diameter bin center.

Return type:

list

disdrodb.l0.standards.get_diameter_bin_lower(sensor_name: str) list[source][source]#

Get diameter bin lower bound.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Diameter bin lower bound.

Return type:

list

disdrodb.l0.standards.get_diameter_bin_upper(sensor_name: str) list[source][source]#

Get diameter bin upper bound.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Diameter bin upper bound.

Return type:

list

disdrodb.l0.standards.get_diameter_bin_width(sensor_name: str) list[source][source]#

Get diameter bin width.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Diameter bin width.

Return type:

list

disdrodb.l0.standards.get_diameter_bins_dict(sensor_name: str) dict[source][source]#

Get dictionary with sensor_name diameter bins information.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Sensor diameter bins information.

Return type:

dict
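The four bin quantities documented above (center, lower bound, upper bound, width) are related in a simple way, sketched below. The bin edges are made-up values, the dictionary layout is an assumption, and the sketch assumes that each center is the midpoint of its bin.

```python
def build_bins_dict(lower, upper):
    """Derive bin centers and widths from lower and upper bin bounds."""
    center = [(lo + up) / 2 for lo, up in zip(lower, upper)]
    width = [up - lo for lo, up in zip(lower, upper)]
    return {"center": center, "lower": lower, "upper": upper, "width": width}


# Hypothetical diameter bin edges in mm (not taken from any sensor manual).
bins = build_bins_dict(lower=[0.0, 0.125, 0.25], upper=[0.125, 0.25, 0.5])
```

Bins do not need to have equal widths: in this example the last bin is twice as wide as the first two.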

disdrodb.l0.standards.get_dims_size_dict(sensor_name: str) dict[source][source]#

Get the number of bins for each dimension.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the number of bins for each dimension.

Return type:

dict

disdrodb.l0.standards.get_field_nchar_dict(sensor_name: str) dict[source][source]#

Get the total number of characters from the instrument default string standards.

Important note: the count also includes the comma and the minus sign.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected number of characters for each data field.

Return type:

dict

disdrodb.l0.standards.get_field_ndigits_decimals_dict(sensor_name: dict) dict[source][source]#

Get number of digits on the right side of the comma from the instrument default string standards.

Example: 123,45 -> 45 -> 2 decimal digits.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected number of decimal digits for each data field.

Return type:

dict

disdrodb.l0.standards.get_field_ndigits_dict(sensor_name: str) dict[source][source]#

Get number of digits from the instrument default string standards.

Important note: the count excludes the comma but includes the minus sign.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected number of digits for each data field.

Return type:

dict

disdrodb.l0.standards.get_field_ndigits_natural_dict(sensor_name: str) dict[source][source]#

Get number of digits on the left side of the comma from the instrument default string standards.

Example: 123,45 -> 123 -> 3 natural digits.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected number of natural digits for each data field.

Return type:

dict
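The four counting conventions described above (total characters, digits, decimal digits, natural digits) can be compared on a single field string. The helper below is a hypothetical illustration, not a library function; it assumes a comma decimal separator and that natural digits do not include the minus sign.

```python
def count_field_characters(value: str) -> dict:
    """Count characters and digits of a raw field string such as '-123,45'."""
    body = value.lstrip("-")                   # drop the sign for natural/decimal counts
    natural, _, decimals = body.partition(",")
    return {
        "n_characters": len(value),               # includes the comma and the minus sign
        "n_digits": len(value.replace(",", "")),  # excludes the comma, counts the minus sign
        "n_naturals": len(natural),               # digits left of the comma
        "n_decimals": len(decimals),              # digits right of the comma
    }


counts = count_field_characters("-123,45")
```

For `"-123,45"` this gives 7 characters, 6 digits, 3 natural digits, and 2 decimal digits.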

disdrodb.l0.standards.get_l0a_dtype(sensor_name: str) dict[source][source]#

Get a dictionary containing the L0A dtype.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the L0A dtype.

Return type:

dict

disdrodb.l0.standards.get_l0a_encodings_dict(sensor_name: str) dict[source][source]#

Get a dictionary containing the L0A encodings.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

L0A encodings.

Return type:

dict

disdrodb.l0.standards.get_l0b_cf_attrs_dict(sensor_name: str) dict[source][source]#

Get a dictionary containing the CF attributes of each sensor variable.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

CF attributes of each sensor variable. For each variable, the ‘units’, ‘description’, and ‘long_name’ attributes are specified.

Return type:

dict

disdrodb.l0.standards.get_l0b_encodings_dict(sensor_name: str) dict[source][source]#

Get a dictionary containing the encoding to write L0B netCDFs.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Encoding to write L0B netCDFs.

Return type:

dict
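An encoding dictionary of this kind plausibly maps each variable to the fields of L0BEncodingSchema (contiguous, dtype, zlib, complevel, shuffle, fletcher32, chunksizes). The sketch below shows one hypothetical entry and a consistency check; the values are illustrative, and the check reflects the general netCDF4/HDF5 rule that contiguous storage cannot be combined with compression, checksums, or chunking, rather than the exact validators used by DISDRODB.

```python
def check_l0b_encoding(encoding: dict) -> None:
    """Raise ValueError if contiguous storage is combined with incompatible options."""
    if encoding["contiguous"]:
        if encoding["zlib"]:
            raise ValueError("'contiguous' storage cannot be combined with zlib compression.")
        if encoding["fletcher32"]:
            raise ValueError("'contiguous' storage cannot be combined with fletcher32 checksums.")
        if encoding["chunksizes"] is not None:
            raise ValueError("'contiguous' storage cannot be combined with chunking.")


# Hypothetical encoding entry; the numbers are illustrative, not DISDRODB defaults.
example_encodings = {
    "raw_drop_number": {
        "contiguous": False,
        "dtype": "int16",
        "zlib": True,
        "complevel": 3,
        "shuffle": True,
        "fletcher32": False,
        "chunksizes": [5000, 32, 32],
    },
}

for variable, encoding in example_encodings.items():
    check_l0b_encoding(encoding)
```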

disdrodb.l0.standards.get_n_diameter_bins(sensor_name)[source][source]#

Get the number of diameter bins.

disdrodb.l0.standards.get_n_velocity_bins(sensor_name)[source][source]#

Get the number of velocity bins.

disdrodb.l0.standards.get_nan_flags_dict(sensor_name: str) dict[source][source]#

Get the variable nan_flags.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected nan_flags list for each data field. It excludes variables without a specified nan_flags key.

Return type:

dict
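One way such flags might be applied is to replace flagged values with NaN, as sketched below. The flag value and the helper name are assumptions for illustration only.

```python
import math


def mask_nan_flags(values, nan_flags):
    """Replace every value that matches one of the nan_flags with NaN."""
    # Accept either a single flag or a list of flags.
    flags = nan_flags if isinstance(nan_flags, list) else [nan_flags]
    return [math.nan if v in flags else v for v in values]


# -99 is a hypothetical "missing value" flag logged by a sensor.
masked = mask_nan_flags([1.0, -99, 3.0], nan_flags=[-99])
```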

disdrodb.l0.standards.get_raw_array_dims_order(sensor_name: str) dict[source][source]#

Get the dimension order of the raw fields.

The order of dimension specified for raw_drop_number controls the reshaping of the precipitation raw spectrum.

Examples

OTT Parsivel spectrum [v1d1 … v1d32, v2d1, …, v2d32] -> dimension_order = ["velocity_bin_center", "diameter_bin_center"]

Thies LPM spectrum [v1d1 … v20d1, v1d2, …, v20d2] -> dimension_order = ["diameter_bin_center", "velocity_bin_center"]

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dimension order dictionary.

Return type:

dict
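To make the role of the dimension order concrete, the sketch below reshapes a flattened spectrum into a nested list, with the first listed dimension varying slowest. The helper name and the tiny 2 x 3 spectrum are hypothetical; real spectra are much larger.

```python
def reshape_spectrum(flat, dimension_order, dims_size):
    """Reshape a flat spectrum so that dimension_order[0] is the outer (slowest) axis."""
    n_outer = dims_size[dimension_order[0]]
    n_inner = dims_size[dimension_order[1]]
    if len(flat) != n_outer * n_inner:
        raise ValueError("The flat spectrum length does not match the dimension sizes.")
    return [flat[i * n_inner:(i + 1) * n_inner] for i in range(n_outer)]


# Parsivel-like layout: all diameters of velocity bin 1, then of velocity bin 2.
flat = ["v1d1", "v1d2", "v1d3", "v2d1", "v2d2", "v2d3"]
spectrum = reshape_spectrum(
    flat,
    dimension_order=["velocity_bin_center", "diameter_bin_center"],
    dims_size={"velocity_bin_center": 2, "diameter_bin_center": 3},
)
```

Here `spectrum[1][0]` is `"v2d1"`: the first index selects the velocity bin and the second the diameter bin.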

disdrodb.l0.standards.get_raw_array_nvalues(sensor_name: str) dict[source][source]#

Get a dictionary with the number of values expected for each raw array.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected number of values for each raw array field.

Return type:

dict

disdrodb.l0.standards.get_sensor_logged_variables(sensor_name: str) list[source][source]#

Get the sensor logged variables list.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

List of the variables logged by the sensor.

Return type:

list

disdrodb.l0.standards.get_valid_coordinates_names(sensor_name)[source][source]#

Get list of valid coordinates for DISDRODB L0B.

disdrodb.l0.standards.get_valid_dimension_names(sensor_name)[source][source]#

Get list of valid dimension names for DISDRODB L0B.

disdrodb.l0.standards.get_valid_names(sensor_name)[source][source]#

Return the list of valid variable and coordinates names for DISDRODB L0B.

disdrodb.l0.standards.get_valid_values_dict(sensor_name: str) dict[source][source]#

Get the list of valid values for a variable.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected values for specific variables. It excludes variables without a specified valid_values key.

Return type:

dict

disdrodb.l0.standards.get_valid_variable_names(sensor_name)[source][source]#

Get list of valid variables.

disdrodb.l0.standards.get_variables_dimension(sensor_name: str)[source][source]#

Return a dictionary with the variable dimensions of an L0B product.

disdrodb.l0.standards.get_velocity_bin_center(sensor_name: str) list[source][source]#

Get velocity bin center.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Velocity bin center.

Return type:

list

disdrodb.l0.standards.get_velocity_bin_lower(sensor_name: str) list[source][source]#

Get velocity bin lower bound.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Velocity bin lower bound.

Return type:

list

disdrodb.l0.standards.get_velocity_bin_upper(sensor_name: str) list[source][source]#

Get velocity bin upper bound.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Velocity bin upper bound.

Return type:

list

disdrodb.l0.standards.get_velocity_bin_width(sensor_name: str) list[source][source]#

Get velocity bin width.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Velocity bin width.

Return type:

list

disdrodb.l0.standards.get_velocity_bins_dict(sensor_name: str) dict[source][source]#

Get dictionary with sensor_name velocity bins information.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Sensor velocity bins information.

Return type:

dict

disdrodb.l0.template_tools module#

Useful tools helping in the implementation of the DISDRODB L0 readers.

disdrodb.l0.template_tools.check_column_names(column_names: list, sensor_name: str) None[source][source]#

Check that the column names respect DISDRODB standards.

Parameters:
  • column_names (list) – List of column names.

  • sensor_name (str) – Name of the sensor.

Raises:

TypeError – Error if some columns do not meet the DISDRODB standards.

disdrodb.l0.template_tools.get_decimal_ndigits(string: str) int[source][source]#

Get the number of decimal digits.

Parameters:

string (str) – Input string.

Returns:

The number of decimal digits.

Return type:

int

disdrodb.l0.template_tools.get_df_columns_unique_values_dict(df: DataFrame, column_indices: int | slice | list | None = None, column_names: bool = True)[source][source]#

Create a dictionary {column: unique values}.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • column_indices (Union[int,slice,list], optional) – Column indices. If None, select all columns.

  • column_names (bool, optional) – If True, the dictionary keys are the column names. The default value is True.

disdrodb.l0.template_tools.get_natural_ndigits(string: str) int[source][source]#

Get the number of natural digits.

Parameters:

string (str) – Input string.

Returns:

The number of natural digits.

Return type:

int

disdrodb.l0.template_tools.get_nchar(string: str) int[source][source]#

Get the number of characters.

Parameters:

string (str) – Input string.

Returns:

The number of characters.

Return type:

int

disdrodb.l0.template_tools.get_ndigits(string: str) int[source][source]#

Get the number of total numeric digits.

Parameters:

string (str) – Input string.

Returns:

The number of total digits.

Return type:

int

disdrodb.l0.template_tools.get_unique_sorted_values(array)[source][source]#

Return unique sorted values.

It handles np.nan within an array of strings by converting object dtype to str.
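The reason for the string conversion can be shown without NumPy: sorting a mix of strings and a float NaN fails, while sorting their string representations works. The helper below is a pure-Python stand-in with a hypothetical name, not the library function.

```python
def unique_sorted_as_str(values):
    """Return the unique values, converted to str, in sorted order."""
    return sorted({str(v) for v in values})


# A column of strings with a missing entry stored as a float NaN.
column = ["b", float("nan"), "a", "b"]

# sorted(set(column)) would raise TypeError because str and float cannot be compared.
unique_values = unique_sorted_as_str(column)
```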

disdrodb.l0.template_tools.infer_column_names(df: DataFrame, sensor_name: str, row_idx: int = 1)[source][source]#

Try to guess the dataframe column names based on string characteristics.

Parameters:
  • df (pandas.DataFrame) – The dataframe to analyse.

  • sensor_name (str) – Name of the sensor.

  • row_idx (int, optional) – The row index of the dataframe to use to infer the column names. The default row index is 1.

Returns:

Dictionary with the keys being the column ids and the values being the guessed column names.

Return type:

dict

disdrodb.l0.template_tools.print_allowed_column_names(sensor_name: str) None[source][source]#

Print valid column names from the standard.

Parameters:

sensor_name (str) – Name of the sensor.

disdrodb.l0.template_tools.print_df_column_names(df: DataFrame) None[source][source]#

Print dataframe column names.

Parameters:

df (pandas.DataFrame) – The dataframe.

disdrodb.l0.template_tools.print_df_columns_unique_values(df: DataFrame, column_indices: int | slice | list | None = None, print_column_names: bool = True) None[source][source]#

Print columns’ unique values.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • column_indices (Union[int,slice,list], optional) – Column indices. If None, select all columns.

  • print_column_names (bool, optional) – If True, print the column names. The default value is True.

disdrodb.l0.template_tools.print_df_first_n_rows(df: DataFrame, n: int = 5, print_column_names: bool = True) None[source][source]#

Print the first n rows of the dataframe by column.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • n (int, optional) – Number of rows to print. The default is 5.

  • print_column_names (bool, optional) – If True, print the column names. The default value is True.

disdrodb.l0.template_tools.print_df_random_n_rows(df: DataFrame, n: int = 5, print_column_names: bool = True) None[source][source]#

Print the content of the dataframe by column, randomly chosen.

Parameters:
  • df (pandas.DataFrame) – The dataframe.

  • n (int, optional) – The number of rows to print. The default is 5.

  • print_column_names (bool, optional) – If True, print the column names. The default value is True.

disdrodb.l0.template_tools.print_df_summary_stats(df: DataFrame, column_indices: int | slice | list | None = None, print_column_names: bool = True)[source][source]#

Create a summary of column statistics.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • column_indices (Union[int,slice,list], optional) – Column indices. If None, select all columns.

  • print_column_names (bool, optional) – If True, print the column names. The default value is True.

Raises:

ValueError – Error if the column types are not numeric.

disdrodb.l0.template_tools.print_df_with_any_nan_rows(df: DataFrame) None[source][source]#

Print the rows that contain any NaN value.

Parameters:

df (pandas.DataFrame) – Input dataframe.

disdrodb.l0.template_tools.str_has_decimal_digits(string: str) bool[source][source]#

Check if a string has decimals.

Parameters:

string (str) – Input string.

Returns:

True if the string has decimal digits.

Return type:

bool

disdrodb.l0.template_tools.str_is_integer(string: str) bool[source][source]#

Check if a string represents an integer.

Parameters:

string (str) – Input string.

Returns:

True if the string represents an integer.

Return type:

bool

disdrodb.l0.template_tools.str_is_number(string: str) bool[source][source]#

Check if a string represents a number.

Parameters:

string (str) – Input string.

Returns:

True if the string represents a number.

Return type:

bool
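The three string checks documented above could be approximated with Python's built-in conversions, as in the sketch below. The function names are hypothetical stand-ins, the decimal separator is assumed to be a dot, and the library's own implementation may differ (for instance in how it treats "nan" or "inf", which `float()` accepts).

```python
def looks_like_number(string: str) -> bool:
    """Return True if float() can parse the string."""
    try:
        float(string)
    except ValueError:
        return False
    return True


def looks_like_integer(string: str) -> bool:
    """Return True if int() can parse the string."""
    try:
        int(string)
    except ValueError:
        return False
    return True


def has_decimal_digits(string: str) -> bool:
    """Return True if the string is a number written with a decimal point."""
    return looks_like_number(string) and "." in string


checks = {s: (looks_like_number(s), looks_like_integer(s), has_decimal_digits(s))
          for s in ["12.50", "-7", "abc"]}
```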

Module contents#

DISDRODB L0 software.