disdrodb.l0 package

Subpackages#

Submodules#

disdrodb.l0.check_configs module#

Check configuration files.

class disdrodb.l0.check_configs.L0BEncodingSchema(*, contiguous: bool, dtype: str, zlib: bool, complevel: int, shuffle: bool, fletcher32: bool, _FillValue: int | float | None = None, chunksizes: int | list[int] | None, add_offset: float | None = None, scale_factor: float | None = None)[source][source]#

Bases: CustomBaseModel

Pydantic model for DISDRODB netCDF encodings.

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

FillValue: int | float | None#
add_offset: float | None#
classmethod check_chunksizes_and_zlib(values)[source][source]#

Check the chunksizes validity.

classmethod check_contiguous_and_fletcher32(values)[source][source]#

Check the fletcher value validity.

classmethod check_contiguous_and_zlib(values)[source][source]#

Check the compression value validity.

classmethod check_integer_fillvalue(values)[source][source]#

Check that integer dtypes have valid _FillValue.

chunksizes: int | list[int] | None#
complevel: int#
contiguous: bool#
dtype: str#
fletcher32: bool#
model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'hide_error_urls': True}#

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

remove_offset_and_scale_if_not_int()[source][source]#

Ensure add_offset and scale_factor only apply to integer/uint dtypes.

scale_factor: float | None#
shuffle: bool#
zlib: bool#
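
The cross-field checks listed above can be pictured with a small standard-library sketch. It is not the pydantic model itself: the helper name check_encoding and the concrete rules are assumptions, chosen because they mirror common netCDF4 constraints (contiguous storage excludes zlib compression and fletcher32 checksums) and one plausible reading of "valid _FillValue".

```python
# Hypothetical, standard-library sketch; not the disdrodb implementation.
INT_RANGES = {
    "int8": (-2**7, 2**7 - 1),
    "int16": (-2**15, 2**15 - 1),
    "int32": (-2**31, 2**31 - 1),
    "uint8": (0, 2**8 - 1),
    "uint16": (0, 2**16 - 1),
    "uint32": (0, 2**32 - 1),
}


def check_encoding(encoding: dict) -> list:
    """Return a list of problems found in a single variable encoding."""
    problems = []
    # Assumed rule: contiguous storage cannot be combined with compression.
    if encoding["contiguous"] and encoding["zlib"]:
        problems.append("'contiguous' and 'zlib' cannot both be True")
    # Assumed rule: checksums require chunked (non-contiguous) storage.
    if encoding["contiguous"] and encoding["fletcher32"]:
        problems.append("'contiguous' and 'fletcher32' cannot both be True")
    # Assumed rule: integer dtypes need a _FillValue inside the dtype range.
    dtype = encoding["dtype"]
    if dtype in INT_RANGES:
        fill = encoding.get("_FillValue")
        low, high = INT_RANGES[dtype]
        if fill is None or not low <= fill <= high:
            problems.append(f"invalid _FillValue for dtype {dtype!r}")
    return problems


good = {"contiguous": False, "dtype": "uint16", "zlib": True,
        "fletcher32": False, "_FillValue": 65535}
bad = {"contiguous": True, "dtype": "uint16", "zlib": True,
       "fletcher32": False, "_FillValue": None}
print(check_encoding(good))
print(check_encoding(bad))
```

Collecting the problems in a list, rather than raising on the first one, is a choice made for this sketch only.
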
class disdrodb.l0.check_configs.RawDataFormatSchema(*, n_digits: int | None, n_characters: int | None, n_decimals: int | None, n_naturals: int | None, data_range: list[float] | None, nan_flags: int | float | str | list | None = None, valid_values: list[float] | None = None, dimension_order: list[str] | None = None, n_values: int | None = None, field_number: str | None = None)[source][source]#

Bases: CustomBaseModel

Pydantic model for the DISDRODB RAW Data Format YAML files.

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod check_list_length(value)[source][source]#

Check the data_range validity.

data_range: list[float] | None#
dimension_order: list[str] | None#
field_number: str | None#
model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'hide_error_urls': True}#

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_characters: int | None#
n_decimals: int | None#
n_digits: int | None#
n_naturals: int | None#
n_values: int | None#
nan_flags: int | float | str | list | None#
valid_values: list[float] | None#
disdrodb.l0.check_configs.check_all_sensors_configs() None[source][source]#

Check all sensors configuration YAML files.

disdrodb.l0.check_configs.check_bin_consistency(sensor_name: str) None[source][source]#

Check bin consistency from config file.

The first and last bins are not checked.

Parameters:

sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.check_cf_attributes(sensor_name: str) None[source][source]#

Check that the l0b_cf_attrs.yml description, long_name and units values are strings.

Parameters:

sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.check_l0a_encoding(sensor_name: str) None[source][source]#

Check l0a_encodings.yml file.

Parameters:

sensor_name (str) – Name of the sensor.

Raises:

ValueError – Error raised if the value of a key is not in the list of accepted values.

disdrodb.l0.check_configs.check_l0b_encoding(sensor_name: str) None[source][source]#

Check l0b_encodings.yml file based on the schema defined in the class L0BEncodingSchema.

Parameters:

sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.check_raw_array(sensor_name: str) None[source][source]#

Check raw array consistency from config file.

Parameters:

sensor_name (str) – Name of the sensor.

Raises:

ValueError – Error if the chunksizes are not consistent.

disdrodb.l0.check_configs.check_raw_data_format(sensor_name: str) None[source][source]#

Check raw_data_format.yml file based on the schema defined in the class RawDataFormatSchema.

Parameters:

sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.check_sensor_configs(sensor_name: str) None[source][source]#

Check validity of sensor configuration YAML files.

Parameters:

sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.check_variable_consistency(sensor_name: str) None[source][source]#

Check variable consistency across config files.

The variables specified within l0b_encodings.yml must also be defined in the other config files. The raw_data_format.yml file can contain some extra variables.

Parameters:

sensor_name (str) – Name of the sensor.

Raises:

ValueError – If the keys are not consistent.
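
The consistency rule can be sketched with plain dictionaries keyed by variable name. The helper name check_keys_consistency, the variable names and the split between "strict" files and raw_data_format.yml are assumptions made for illustration; only the idea that raw_data_format.yml may hold extra variables comes from the description above.

```python
# Hypothetical sketch: each config file is represented by a dict keyed by variable name.
def check_keys_consistency(l0b_encodings, strict_configs, raw_data_format):
    """Raise ValueError if the variables of l0b_encodings are not defined elsewhere."""
    expected = set(l0b_encodings)
    # Assumed rule: the other config files define exactly the same variables.
    for filename, config in strict_configs.items():
        if set(config) != expected:
            diff = sorted(set(config) ^ expected)
            raise ValueError(f"{filename}: inconsistent variables {diff}")
    # raw_data_format.yml may contain extra variables, but none may be missing.
    missing = sorted(expected - set(raw_data_format))
    if missing:
        raise ValueError(f"raw_data_format.yml: missing variables {missing}")


# Illustrative variable names only.
encodings = {"raw_drop_number": {}, "sensor_status": {}}
cf_attrs = {"raw_drop_number": {}, "sensor_status": {}}
raw_format = {"raw_drop_number": {}, "sensor_status": {}, "extra_field": {}}

check_keys_consistency(encodings, {"l0b_cf_attrs.yml": cf_attrs}, raw_format)
print("configuration keys are consistent")
```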

disdrodb.l0.check_configs.check_yaml_files_exists(sensor_name: str) None[source][source]#

Check if all L0 config YAML files exist.

Parameters:

sensor_name (str) – Name of the sensor.

disdrodb.l0.check_standards module#

Check data standards.

disdrodb.l0.check_standards.check_l0a_column_names(df: DataFrame, sensor_name: str) None[source][source]#

Check that the dataframe columns respect DISDRODB standards.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

Raises:

ValueError – Error if some columns do not meet the DISDRODB standards or if the 'time' column is missing in the dataframe.

disdrodb.l0.check_standards.check_l0a_standards(df: DataFrame, sensor_name: str, logger=None, verbose: bool = True) None[source][source]#

Check that a dataframe respects the DISDRODB L0A standards.

Parameters:
  • df (pandas.DataFrame) – L0A dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool, optional) – Whether to print detailed information during processing. The default value is True.

Raises:

ValueError – Error if some columns have inconsistent values.

disdrodb.l0.check_standards.check_l0b_standards(x: str) None[source][source]#

Check L0B standards.

disdrodb.l0.l0_reader module#

Define DISDRODB L0 readers routines.

disdrodb.l0.l0_reader.available_readers(sensor_name, data_sources=None, return_path=False)[source][source]#

Retrieve available readers information.

disdrodb.l0.l0_reader.check_metadata_reader(metadata)[source][source]#

Check the metadata reader key is available and points to an existing disdrodb reader.

disdrodb.l0.l0_reader.check_reader_arguments(reader)[source][source]#

Check that the reader function has the expected input arguments.

disdrodb.l0.l0_reader.check_reader_exists(reader_reference, sensor_name)[source][source]#

Check the reader exists.

disdrodb.l0.l0_reader.check_reader_reference(reader_reference)[source][source]#

Check the reader_reference value.

disdrodb.l0.l0_reader.check_software_readers()[source][source]#

Check the validity of all readers included in the disdrodb software.

disdrodb.l0.l0_reader.define_reader_path(sensor_name, reader_reference)[source][source]#

Define the reader path based on the reader reference name.

disdrodb.l0.l0_reader.define_readers_directory(sensor_name='') str[source][source]#

Returns the path to the disdrodb.l0.readers directory within the disdrodb package.

disdrodb.l0.l0_reader.get_reader(reader_reference, sensor_name)[source][source]#

Retrieve the reader function.

Parameters:
  • reader_reference (str) – The reader reference name. The reader is located at disdrodb.l0.readers.{sensor_name}.{reader_reference}. The reader_reference naming convention is "{DATA_SOURCE}"/"{CAMPAIGN_NAME}_{OPTIONAL_SUFFIX}".

  • sensor_name (str) – The sensor name.

Returns:

The reader() function.

Return type:

callable
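
The naming convention can be illustrated with a short sketch. Both helpers are hypothetical and not part of disdrodb, the reference "EPFL/LOCARNO_2018" is only an example value, and mapping the "/" of the reference to a "." in the module path is an assumption.

```python
# Hypothetical helpers for illustration; not part of disdrodb.
def split_reader_reference(reader_reference: str) -> dict:
    """Split '{DATA_SOURCE}/{CAMPAIGN_NAME}_{OPTIONAL_SUFFIX}' into its two parts."""
    parts = reader_reference.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"Expected 'DATA_SOURCE/READER_NAME', got {reader_reference!r}")
    data_source, reader_name = parts
    return {"data_source": data_source, "reader_name": reader_name}


def reader_module_path(sensor_name: str, reader_reference: str) -> str:
    """Build a dotted module path (assumes '/' in the reference maps to '.')."""
    ref = split_reader_reference(reader_reference)
    return f"disdrodb.l0.readers.{sensor_name}.{ref['data_source']}.{ref['reader_name']}"


print(reader_module_path("PARSIVEL", "EPFL/LOCARNO_2018"))
# disdrodb.l0.readers.PARSIVEL.EPFL.LOCARNO_2018
```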

disdrodb.l0.l0_reader.get_reader_from_metadata(metadata)[source][source]#

Retrieve the reader function based on the metadata information.

The reader_reference naming convention is "{DATA_SOURCE}"/"{CAMPAIGN_NAME}_{OPTIONAL_SUFFIX}". The reader is located at disdrodb.l0.readers.{sensor_name}.{reader_reference}.

disdrodb.l0.l0_reader.get_specific_readers_path(sensor_name)[source][source]#

Returns a dictionary with the file paths of the available readers for each data source.

disdrodb.l0.l0_reader.get_specific_readers_references(sensor_name)[source][source]#

Returns a dictionary with the readers references available for each data source.

disdrodb.l0.l0_reader.get_station_reader(data_source, campaign_name, station_name, metadata_archive_dir=None)[source][source]#

Retrieve the reader function of a specific DISDRODB station.

disdrodb.l0.l0_reader.is_documented_by(original)[source][source]#

Wrapper function to apply generic docstring to the decorated function.

Parameters:

original (callable) – Function to take the docstring from.
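
A docstring-sharing decorator of this kind is usually only a few lines long. The sketch below shows the general pattern under assumed names (documented_by, generic_docstring, reader); it is not the disdrodb source.

```python
# Generic pattern; the names below are placeholders, not the disdrodb code.
def documented_by(original):
    """Return a decorator that copies the docstring of ``original``."""
    def wrapper(target):
        target.__doc__ = original.__doc__
        return target
    return wrapper


def generic_docstring():
    """Reader to convert a raw data file to DISDRODB L0A or L0B format."""


@documented_by(generic_docstring)
def reader(filepath, logger=None):
    raise NotImplementedError("placeholder reader body")


print(reader.__doc__)
```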

disdrodb.l0.l0_reader.list_readers_paths(sensor_name) list[source][source]#

Returns the file paths of the available readers for a given sensor in disdrodb.l0.readers.{sensor_name}.

disdrodb.l0.l0_reader.list_readers_references(sensor_name)[source][source]#

Returns the readers references available for a given sensor in disdrodb.l0.readers.{sensor_name}.

disdrodb.l0.l0_reader.reader_generic_docstring()[source][source]#

Reader to convert a raw data file to DISDRODB L0A or L0B format.

Raw text files are read and converted to a pandas.DataFrame (L0A format). Raw netCDF files are read and converted to an xarray.Dataset (L0B format).

Parameters:
  • filepath (str) – Filepath of the raw data file to be processed.

  • logger (logging.Logger, optional) – Logger to use for logging messages. Default is None, which means no logger is used.

disdrodb.l0.l0a_processing module#

Functions to process raw text files into DISDRODB L0A Apache Parquet.

disdrodb.l0.l0a_processing.cast_column_dtypes(df: DataFrame, sensor_name: str) DataFrame[source][source]#

Convert 'object' dataframe columns into DISDRODB L0A dtype standards.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

Returns:

Dataframe with corrected columns types.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.check_matching_column_number(df, column_names)[source][source]#

Check the number of columns in the dataframe matches the length of column names.

disdrodb.l0.l0a_processing.coerce_corrupted_values_to_nan(df: DataFrame, sensor_name: str) DataFrame[source][source]#

Coerce corrupted values in dataframe numeric columns to np.nan.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

Returns:

Dataframe with string columns without corrupted values.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.concatenate_dataframe(list_df: list, logger=None, verbose: bool = False) DataFrame[source][source]#

Concatenate a list of dataframes.

Parameters:
  • list_df (list) – List of dataframes.

  • verbose (bool, optional) – If True, print messages. If False, no print.

Returns:

Concatenated dataframe.

Return type:

pandas.DataFrame

Raises:

ValueError – If the dataframes cannot be concatenated.

disdrodb.l0.l0a_processing.drop_time_periods(df, time_periods)[source][source]#

Drop problematic time periods.

disdrodb.l0.l0a_processing.drop_timesteps(df, timesteps)[source][source]#

Drop problematic time steps.

disdrodb.l0.l0a_processing.generate_l0a(filepaths: list | str, reader, sensor_name, issue_dict=None, verbose=True, logger=None) DataFrame[source][source]#

Read and parse a list of raw files and generate a DISDRODB L0A dataframe.

Parameters:
  • filepaths (Union[list,str]) – Path(s) of the raw file(s).

  • reader – DISDRODB reader function. Format: reader(filepath, logger=None)

  • sensor_name (str) – Name of the sensor.

  • issue_dict (dict, optional) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are 'timesteps' and 'time_periods'. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.

  • verbose (bool) – Whether to print detailed information during processing. The default is True.

Returns:

Dataframe

Return type:

pandas.DataFrame

Raises:

ValueError – If the input parameters cannot be used or the raw file cannot be processed.

disdrodb.l0.l0a_processing.is_raw_array_string_not_corrupted(string)[source][source]#

Check that the raw array string is not corrupted.

disdrodb.l0.l0a_processing.preprocess_reader_kwargs(reader_kwargs: dict) dict[source][source]#

Preprocess arguments required to read raw text file into Pandas.

Parameters:

reader_kwargs (dict) – Initial parameter dictionary.

Returns:

Parameter dictionary that matches either Pandas or Dask.

Return type:

dict

disdrodb.l0.l0a_processing.read_l0a_dataframe(filepaths: str | list, debugging_mode: bool = False) DataFrame[source][source]#

Read DISDRODB L0A Apache Parquet file(s).

Parameters:
  • filepaths (str or list) – Either a list or a single filepath.

  • debugging_mode (bool) – If True, it reduces the amount of data to process. If filepaths is a list, it reads only the first 3 files. It selects only 100 rows sampled from the first 3 files. The default is False.

Returns:

L0A Dataframe.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.read_raw_text_file(filepath: str, column_names: list, reader_kwargs: dict, logger=None) DataFrame[source][source]#

Read a raw file into a dataframe.

Parameters:
  • filepath (str) – Raw file path.

  • column_names (list) – Column names.

  • reader_kwargs (dict) – Pandas pd.read_csv arguments.

  • logger (logging.Logger) – Logger object. The default is None. If None, the logger is created using the module name. If logger is passed, it will be used to log messages.

Returns:

Pandas dataframe.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.remove_corrupted_rows(df)[source][source]#

Remove corrupted rows by checking conversion of raw fields to numeric.

Note: The delimiters at the start and end of the raw array must be stripped beforehand.

disdrodb.l0.l0a_processing.remove_duplicated_timesteps(df: DataFrame, logger=None, verbose: bool = False)[source][source]#

Remove duplicated timesteps.

Only the first occurrence of each timestep is kept.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • verbose (bool) – Whether to print detailed information during processing. The default is False.

Returns:

Dataframe with valid unique timesteps.

Return type:

pandas.DataFrame
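
The "keep the first occurrence" rule can be sketched without pandas, using a list of row dictionaries. The helper name keep_first_timestep and the row layout are assumptions for illustration.

```python
from datetime import datetime


# Hypothetical helper: rows are plain dicts with a "time" key.
def keep_first_timestep(rows):
    """Keep only the first row seen for each 'time' value, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row["time"] not in seen:
            seen.add(row["time"])
            unique.append(row)
    return unique


rows = [
    {"time": datetime(2024, 1, 1, 0, 0), "value": 1},
    {"time": datetime(2024, 1, 1, 0, 1), "value": 2},
    {"time": datetime(2024, 1, 1, 0, 1), "value": 3},  # duplicated timestep, dropped
]
print([row["value"] for row in keep_first_timestep(rows)])  # [1, 2]
```

With pandas, the same effect is commonly obtained with DataFrame.drop_duplicates(subset="time", keep="first"); whether disdrodb uses that exact call is not stated here.
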

disdrodb.l0.l0a_processing.remove_issue_timesteps(df, issue_dict, logger=None, verbose=False)[source][source]#

Drop dataframe rows with timesteps listed in the issue dictionary.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • issue_dict (dict) – Issue dictionary.

  • verbose (bool) – Whether to print detailed information during processing. The default is False.

Returns:

Dataframe with problematic timesteps removed.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.remove_rows_with_missing_time(df: DataFrame, logger=<Logger disdrodb.l0.l0a_processing (WARNING)>, verbose: bool = False)[source][source]#

Remove dataframe rows where the "time" is NaT.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • verbose (bool) – Whether to print detailed information during processing. The default is False.

Returns:

Dataframe with valid timesteps.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.replace_nan_flags(df, sensor_name, logger=None, verbose=False)[source][source]#

Set values corresponding to nan_flags to np.nan.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed information during processing. The default is False.

Returns:

Dataframe without nan_flags values.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.sanitize_df(df, sensor_name, verbose=True, issue_dict=None, logger=None)[source][source]#

Sanitize a dataframe parsed from a raw text file into an L0A dataframe.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed information during processing. The default is True.

  • issue_dict (dict) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are 'timesteps' and 'time_periods'. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.

Returns:

Dataframe

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.set_nan_invalid_values(df, sensor_name, logger=None, verbose=False)[source][source]#

Set invalid (class) values to np.nan.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed information during processing. The default is False.

Returns:

Dataframe without invalid values.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.set_nan_outside_data_range(df, sensor_name, logger=None, verbose=False)[source][source]#

Set values outside the data range as np.nan.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed information during processing. The default is False.

Returns:

Dataframe without values outside the expected data range.

Return type:

pandas.DataFrame
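
The masking logic can be sketched on a plain list of floats. The helper name mask_outside_range is hypothetical, and treating both bounds of data_range as inclusive is an assumption.

```python
import math


# Hypothetical helper; bounds are assumed to be inclusive.
def mask_outside_range(values, data_range):
    """Replace values outside the [min, max] range with NaN."""
    low, high = data_range
    return [v if low <= v <= high else math.nan for v in values]


masked = mask_outside_range([0.5, -3.0, 120.0, 42.0], data_range=[0, 100])
print(masked)  # [0.5, nan, nan, 42.0]
```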

disdrodb.l0.l0a_processing.strip_delimiter(string)[source][source]#

Remove the first and last delimiter occurrence from a string.
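
A minimal sketch of this kind of stripping is shown below. The helper name strip_edge_delimiter and the set of delimiters are assumptions; the sketch removes at most one delimiter on each side.

```python
# Hypothetical helper; the delimiter set is an assumption.
def strip_edge_delimiter(string, delimiters=(",", ";")):
    """Drop one leading and one trailing delimiter, if present."""
    if string.startswith(delimiters):
        string = string[1:]
    if string.endswith(delimiters):
        string = string[:-1]
    return string


print(strip_edge_delimiter(",1,2,3,"))  # 1,2,3
print(strip_edge_delimiter("1;2;3"))    # 1;2;3
```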

disdrodb.l0.l0a_processing.strip_delimiter_from_raw_arrays(df)[source][source]#

Remove the first and last delimiter occurrence from the raw array fields.

disdrodb.l0.l0a_processing.strip_string_spaces(df: DataFrame, sensor_name: str) DataFrame[source][source]#

Strip leading/trailing spaces from dataframe string columns.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

Returns:

Dataframe with string columns without leading/trailing spaces.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.write_l0a(df: DataFrame, filepath: str, force: bool = False, logger=None, verbose: bool = False)[source][source]#

Save the dataframe into an Apache Parquet file.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • filepath (str) – Output file path.

  • force (bool, optional) – Whether to overwrite existing data. If True, overwrite existing data in destination directories. If False, raise an error if there are already data in destination directories. This is the default.

  • verbose (bool, optional) – Whether to print detailed information during processing. The default is False.

Raises:
  • ValueError – The input dataframe cannot be written as an Apache Parquet file.

  • NotImplementedError – The input dataframe can not be processed.

disdrodb.l0.l0b_nc_processing module#

Functions to process DISDRODB raw netCDF files into DISDRODB L0B netCDF files.

disdrodb.l0.l0b_nc_processing.add_dataset_missing_variables(ds, missing_vars, sensor_name)[source][source]#

Add missing xr.Dataset variables as np.nan xr.DataArrays.

disdrodb.l0.l0b_nc_processing.drop_time_periods(ds, time_periods: list)[source][source]#

Drop all time steps within any of the specified time intervals.

Parameters:
  • ds (xarray.Dataset) – Input dataset with a ‘time’ dimension.

  • time_periods (list of tuple) – Each tuple is (start_time, end_time), datetime-like, inclusive.

Returns:

Dataset with all times within the given periods removed.

Return type:

xarray.Dataset

Raises:

ValueError – If no timesteps remain after removal.

disdrodb.l0.l0b_nc_processing.drop_timesteps(ds, timesteps: list)[source][source]#

Drop specific time steps from a Dataset.

Parameters:
  • ds (xarray.Dataset) – Input dataset with a ‘time’ dimension.

  • timesteps (list) – List of datetime-like values to remove.

Returns:

Dataset with specified timesteps removed.

Return type:

xarray.Dataset

Raises:

ValueError – If no timesteps remain after removal.

disdrodb.l0.l0b_nc_processing.generate_l0b_from_nc(filepaths: list | str, reader, sensor_name, metadata, issue_dict=None, verbose=True, logger=None)[source][source]#

Read and parse a list of raw netCDF files and generate a DISDRODB L0B dataset.

Parameters:
  • filepaths (Union[list,str]) – Path(s) of the raw netCDF file(s).

  • reader – DISDRODB reader function. Format: reader(filepath, logger=None)

  • sensor_name (str) – Name of the sensor.

  • metadata (dict) – Station metadata to attach as global attributes to the xr.Dataset.

  • issue_dict (dict, optional) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are 'timesteps' and 'time_periods'. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.

  • verbose (bool) – Whether to print detailed information during processing. The default is True.

Returns:

DISDRODB L0B Dataset.

Return type:

xarray.Dataset

Raises:

ValueError – If the input parameters cannot be used or the raw file cannot be processed.

disdrodb.l0.l0b_nc_processing.open_raw_netcdf_file(filepath, logger=None, engine='netcdf4', cache=False, chunks=None, decode_timedelta=False, **kwargs)[source][source]#

Open a raw netCDF file.

Parameters:

filepath (str) – Path to the raw netCDF file.

Returns:

Raw netCDF file as an xarray Dataset.

Return type:

xarray.Dataset

disdrodb.l0.l0b_nc_processing.remove_issue_timesteps(ds, issue_dict: dict, logger=None, verbose: bool = False)[source][source]#

Remove bad timesteps and time periods from an xarray Dataset according to issue definitions.

Parameters:
  • ds (xarray.Dataset) – Input dataset with a ‘time’ dimension.

  • issue_dict (dict) – Dictionary with optional keys ‘timesteps’ (list of datetimes) and ‘time_periods’ (list of (start, end) tuples).

  • logger (optional) – Logger instance to record dropped steps, by default None.

  • verbose (bool, optional) – Whether to log informational messages, by default False.

Returns:

Cleaned dataset.

Return type:

xarray.Dataset

Raises:

ValueError – If after removing specified timesteps/periods no data remains.
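
The filtering described for issue_dict can be sketched on a plain list of datetimes. The helper name remove_issue_times is hypothetical and standard-library datetime objects stand in for the time coordinate; the inclusive period bounds and the ValueError on an empty result follow the descriptions above.

```python
from datetime import datetime


# Hypothetical helper operating on a list of datetimes instead of an xarray.Dataset.
def remove_issue_times(times, issue_dict):
    """Return the timesteps not flagged by the issue dictionary."""
    bad_steps = set(issue_dict.get("timesteps", []))
    periods = issue_dict.get("time_periods", [])
    kept = [
        t for t in times
        if t not in bad_steps
        # Period bounds are inclusive.
        and not any(start <= t <= end for start, end in periods)
    ]
    if not kept:
        raise ValueError("No timesteps left after removing the flagged ones.")
    return kept


times = [datetime(2024, 1, 1, 0, minute) for minute in range(5)]
issue_dict = {
    "timesteps": [datetime(2024, 1, 1, 0, 0)],
    "time_periods": [(datetime(2024, 1, 1, 0, 2), datetime(2024, 1, 1, 0, 3))],
}
print(remove_issue_times(times, issue_dict))  # keeps 00:01 and 00:04
```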

disdrodb.l0.l0b_nc_processing.rename_dataset(ds, dict_names)[source][source]#

Rename xr.Dataset variables, coordinates and dimensions.

disdrodb.l0.l0b_nc_processing.replace_custom_nan_flags(ds, dict_nan_flags, logger=None, verbose=False)[source][source]#

Set values corresponding to nan_flags to np.nan.

This function must be used in a reader, if necessary.

Parameters:
  • ds (xarray.Dataset) – Input xarray dataset.

  • dict_nan_flags (dict) – Dictionary with the nan flag values to set as np.nan.

  • verbose (bool) – Whether to print detailed information during processing. The default value is False.

Returns:

Dataset without nan_flags values.

Return type:

xarray.Dataset

disdrodb.l0.l0b_nc_processing.replace_nan_flags(ds, sensor_name, verbose, logger=None)[source][source]#

Set values corresponding to nan_flags to np.nan.

Parameters:
  • ds (xarray.Dataset) – Input xarray dataset.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed information during processing.

Returns:

Dataset without nan_flags values.

Return type:

xarray.Dataset

disdrodb.l0.l0b_nc_processing.sanitize_ds(ds, sensor_name, metadata, issue_dict=None, verbose=False, logger=None)[source][source]#

Convert a raw xr.Dataset into a DISDRODB L0B netCDF.

Parameters:
  • ds (xarray.Dataset) – Raw xarray dataset

  • metadata (dict) – Station metadata to attach as global attributes to the xr.Dataset.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed information during processing.

Returns:

L0B xr.Dataset

Return type:

xarray.Dataset

disdrodb.l0.l0b_nc_processing.set_nan_invalid_values(ds, sensor_name, verbose, logger=None)[source][source]#

Set invalid (class) values to np.nan.

Parameters:
  • ds (xarray.Dataset) – Input xarray dataset

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed information during processing.

Returns:

Dataset without invalid values.

Return type:

xarray.Dataset

disdrodb.l0.l0b_nc_processing.set_nan_outside_data_range(ds, sensor_name, verbose, logger=None)[source][source]#

Set values outside the data range as np.nan.

Parameters:
  • ds (xarray.Dataset) – Input xarray dataset

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed information during processing.

Returns:

Dataset without values outside the expected data range.

Return type:

xarray.Dataset

disdrodb.l0.l0b_nc_processing.standardize_raw_dataset(ds, dict_names, sensor_name)[source][source]#

Preprocess a raw netCDF dataset to improve compatibility with DISDRODB standards.

This function checks the validity of dict_names, then renames and subsets the data accordingly. If some variables specified in dict_names are missing, they are added as np.nan xr.DataArrays.

Parameters:
  • ds (xarray.Dataset) – Raw netCDF to be converted to DISDRODB standards.

  • dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.

  • sensor_name (str) – Sensor name.

Returns:

ds – xarray Dataset with variables compliant with DISDRODB conventions.

Return type:

xarray.Dataset

disdrodb.l0.l0b_nc_processing.subset_dataset(ds, dict_names, sensor_name)[source][source]#

Subset xr.Dataset with expected variables.

disdrodb.l0.l0b_processing module#

Functions to process DISDRODB L0A files into DISDRODB L0B netCDF files.

disdrodb.l0.l0b_processing.convert_object_variables_to_string(ds: Dataset) Dataset[source][source]#

Convert variables with object dtype to string.

Parameters:

ds (xarray.Dataset) – Input dataset.

Returns:

Output dataset.

Return type:

xarray.Dataset

disdrodb.l0.l0b_processing.ensure_valid_geolocation(ds: Dataset, coord: str, errors: str = 'ignore') Dataset[source][source]#

Ensure valid geolocation coordinates.

‘altitude’ must be >= 0, ‘latitude’ must be within [-90, 90] and ‘longitude’ within [-180, 180].

It can deal with coordinates varying with time.

Parameters:
  • ds (xarray.Dataset) – Dataset containing the coordinate.

  • coord (str) – Name of the coordinate variable to validate.

  • errors (str, optional) –

    How to handle invalid values. Options are:

    • ”ignore”: nothing is done.

    • ”raise” : raise ValueError if invalid values are found.

    • ”coerce”: out-of-range values are replaced with NaN.

Returns:

Dataset with validated coordinate values.

Return type:

xarray.Dataset
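
The three error modes can be sketched on a plain list of floats. The helper name validate_coordinate is hypothetical; the valid ranges are the ones stated above, and treating NaN inputs as invalid is a side effect of this sketch rather than documented behaviour.

```python
import math

# Valid ranges as stated above; altitude has no upper bound.
VALID_RANGES = {
    "altitude": (0.0, math.inf),
    "latitude": (-90.0, 90.0),
    "longitude": (-180.0, 180.0),
}


# Hypothetical helper working on a list instead of an xarray.Dataset.
def validate_coordinate(values, coord, errors="ignore"):
    """Validate coordinate values according to the 'errors' mode."""
    if errors not in ("ignore", "raise", "coerce"):
        raise ValueError(f"Invalid 'errors' option: {errors!r}")
    low, high = VALID_RANGES[coord]
    invalid = [not (low <= v <= high) for v in values]
    if errors == "ignore" or not any(invalid):
        return list(values)
    if errors == "raise":
        raise ValueError(f"Invalid {coord!r} values found.")
    # errors == "coerce": replace out-of-range values with NaN.
    return [math.nan if bad else v for v, bad in zip(values, invalid)]


print(validate_coordinate([46.5, 95.0], "latitude", errors="coerce"))  # [46.5, nan]
```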

disdrodb.l0.l0b_processing.finalize_dataset(ds, sensor_name, metadata)[source][source]#

Finalize DISDRODB L0B Dataset.

disdrodb.l0.l0b_processing.format_string_array(string: str, n_values: int)[source][source]#

Split a string with multiple numbers separated by a delimiter into a 1D array.

For example, format_string_array(“2,44,22,33”, 4) returns [ 2. 44. 22. 33.].

  • If the string is empty (“”), an array of zeros is returned.

  • If the number of values differs from n_values, an array of np.nan is returned.

The function strips potential delimiters at the start and end before splitting.

Parameters:
  • string (str) – Input string

  • n_values (int) – Expected length of the output array.

Returns:

array of float

Return type:

numpy.ndarray
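
The rules above can be sketched with the standard library, returning a list of floats instead of a numpy array. The helper name format_string_list and the fixed "," delimiter are assumptions.

```python
import math


# Hypothetical pure-Python counterpart; returns a list rather than a numpy array.
def format_string_list(string, n_values, delimiter=","):
    """Split a delimited string into a list of ``n_values`` floats."""
    if string == "":
        return [0.0] * n_values  # empty string: zeros
    # Strip delimiters at the start and end before splitting.
    values = string.strip(delimiter).split(delimiter)
    if len(values) != n_values:
        return [math.nan] * n_values  # unexpected length: NaN
    return [float(v) for v in values]


print(format_string_list("2,44,22,33", 4))  # [2.0, 44.0, 22.0, 33.0]
print(format_string_list("", 3))            # [0.0, 0.0, 0.0]
print(format_string_list("1,2", 3))         # [nan, nan, nan]
```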

disdrodb.l0.l0b_processing.generate_l0b(df: DataFrame, metadata: dict, logger=None, verbose: bool = False) Dataset[source][source]#

Transform the DISDRODB L0A dataframe to the DISDRODB L0B xr.Dataset.

Parameters:
  • df (pandas.DataFrame) – DISDRODB L0A dataframe. The raw drop number spectrum is reshaped to a 2D(+time) array. The raw drop concentration and velocity are reshaped to 1D(+time) arrays.

  • metadata (dict) – DISDRODB station metadata. To use this function outside the DISDRODB routines, the dictionary must contain the fields: sensor_name, latitude, longitude, altitude, platform_type.

  • verbose (bool, optional) – Whether to print detailed information during processing. The default value is False.

Returns:

DISDRODB L0B dataset.

Return type:

xarray.Dataset

Raises:

ValueError – If the DISDRODB L0B xarray dataset cannot be created.

disdrodb.l0.l0b_processing.infer_split_str(string: str) str[source][source]#

Infer the delimiter inside a string.

Parameters:

string (str) – Input string.

Returns:

Inferred delimiter.

Return type:

str

disdrodb.l0.l0b_processing.replace_empty_strings_with_zeros(values)[source][source]#

Replace empty comma separated strings with ‘0’.

disdrodb.l0.l0b_processing.reshape_raw_spectrum(arr, dims_order: list, dims_size_dict: dict, n_timesteps: int)[source][source]#

Reshape the raw spectrum to a 2D+time array.

The array has dimensions [“time”] + dims_order

Parameters:
  • arr (numpy.ndarray) – Input array.

  • dims_order (list) – The order of the dimensions in the raw spectrum. The OTT PARSIVEL spectrum is ordered as [v1d1 … v1d32, v2d1, …, v2d32], thus dims_order = ["diameter_bin_center", "velocity_bin_center"]. The Thies LPM spectrum is ordered as [v1d1 … v20d1, v1d2, …, v20d2], thus dims_order = ["velocity_bin_center", "diameter_bin_center"].

  • dims_size_dict (dict) –

    Dictionary with the number of bins for each dimension:

    • PARSIVEL and PARSIVEL2: {"diameter_bin_center": 32, "velocity_bin_center": 32}

    • LPM: {"diameter_bin_center": 22, "velocity_bin_center": 20}

    • PWS100: {"diameter_bin_center": 34, "velocity_bin_center": 34}

  • n_timesteps (int) – Number of timesteps.

Returns:

Output array.

Return type:

numpy.ndarray

Raises:

ValueError – If the raw spectrum cannot be reshaped.
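
The target shape follows directly from dims_order and dims_size_dict. The sketch below only computes that shape and checks that the number of values fits; both helper names are hypothetical, and the sketch makes no claim about how the values are laid out in memory.

```python
from math import prod


# Hypothetical helpers; they compute and check shapes only.
def expected_spectrum_shape(dims_order, dims_size_dict, n_timesteps):
    """Return the target shape: time first, then the dimensions in dims_order."""
    return (n_timesteps, *(dims_size_dict[dim] for dim in dims_order))


def check_reshapable(n_elements, shape):
    """Raise ValueError if n_elements values cannot fill the given shape."""
    if n_elements != prod(shape):
        raise ValueError(f"Cannot reshape {n_elements} values into shape {shape}.")


shape = expected_spectrum_shape(
    dims_order=["diameter_bin_center", "velocity_bin_center"],
    dims_size_dict={"diameter_bin_center": 32, "velocity_bin_center": 32},
    n_timesteps=10,
)
print(shape)  # (10, 32, 32)
check_reshapable(10 * 32 * 32, shape)
```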

disdrodb.l0.l0b_processing.retrieve_l0b_arrays(df: DataFrame, sensor_name: str, logger=None, verbose: bool = False) dict[source][source]#

Retrieves the L0B data matrix.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

Returns:

Dictionary with data arrays.

Return type:

dict

disdrodb.l0.l0b_processing.set_geolocation_coordinates(ds, metadata)[source][source]#

Add geolocation coordinates to dataset.

disdrodb.l0.l0b_processing.set_l0b_encodings(ds: Dataset, sensor_name: str)[source][source]#

Apply the L0B encodings to the xarray Dataset.

Parameters:
  • ds (xarray.Dataset) – Input xarray dataset.

  • sensor_name (str) – Name of the sensor.

Returns:

Output xarray dataset.

Return type:

xarray.Dataset

disdrodb.l0.l0b_processing.set_variable_attributes(ds: Dataset, sensor_name: str) Dataset[source][source]#

Set attributes to each xr.Dataset variable.

Parameters:
  • ds (xarray.Dataset) – Input xarray dataset.

  • sensor_name (str) – Name of the sensor.

Returns:

Dataset with variable attributes.

Return type:

xarray.Dataset

disdrodb.l0.l0c_processing module#

Functions to process DISDRODB L0B files into DISDRODB L0C netCDF files.

disdrodb.l0.l0c_processing.check_timesteps_regularity(ds, sample_interval, verbose=False, logger=None)[source][source]#

Check for the regularity of timesteps.

disdrodb.l0.l0c_processing.create_l0c_datasets(event_info, measurement_intervals, ensure_variables_equality=True, logger=None, verbose=True)[source][source]#

Create a single dataset by merging and processing data from multiple filepaths.

Parameters:

event_info (dict) – Dictionary with start_time, end_time and filepaths keys.

Returns:

A dictionary with an xarray.Dataset for each measurement interval.

Return type:

dict

Raises:

ValueError – If fewer than 5 timesteps are available for the specified day.

Notes

  • Data is loaded into memory and connections to source files are closed before returning the dataset.

  • Tolerance in input files is used around expected dataset start_time and end_time to account for imprecise logging times and ensuring correct definition of qc_time at files boundaries (e.g. 00:00).

  • Duplicated timesteps with different raw drop number values are dropped.

  • First occurrence of duplicated timesteps with equal raw drop number values is kept.

  • Regularizes timesteps to handle trailing seconds.
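
The two rules on duplicated timesteps can be sketched with plain tuples. The helper name resolve_duplicated_timesteps and the record layout are assumptions, and the sketch ignores the ensure_variables_equality option.

```python
from collections import defaultdict


# Hypothetical helper; records are (time, raw_drop_number) tuples in file order.
def resolve_duplicated_timesteps(records):
    """Keep one record per timestep, dropping timesteps with conflicting values."""
    groups = defaultdict(list)
    for time, raw in records:
        groups[time].append(raw)
    resolved = []
    for time, raws in groups.items():
        # Keep the first occurrence only when all duplicates agree.
        if all(raw == raws[0] for raw in raws):
            resolved.append((time, raws[0]))
    return resolved


records = [
    ("00:00", (0, 1)),
    ("00:01", (2, 2)),
    ("00:01", (2, 2)),  # equal duplicate: first occurrence kept
    ("00:02", (5, 0)),
    ("00:02", (9, 9)),  # conflicting duplicate: timestep dropped
]
print(resolve_duplicated_timesteps(records))
# [('00:00', (0, 1)), ('00:01', (2, 2))]
```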

disdrodb.l0.l0c_processing.drop_timesteps_with_invalid_sample_interval(ds, measurement_intervals, verbose=True, logger=None)[source][source]#

Drop timesteps with unexpected sample intervals.

disdrodb.l0.l0c_processing.finalize_l0c_dataset(ds, sample_interval, sensor_name, verbose=True, logger=None)[source][source]#

Finalize a L0C dataset with unique sampling interval.

It adds the sample_interval coordinate and regularizes the timesteps to handle trailing seconds.

disdrodb.l0.l0c_processing.generate_l0c(ds, measurement_interval, ensure_variables_equality=True, logger=None, verbose=True)[source][source]#

Generate a single L0C dataset for a specific measurement interval.

This is a convenience wrapper around generate_l0c_datasets that returns only the dataset corresponding to the specified measurement interval.

Parameters:
  • ds (xarray.Dataset) – Input L0B dataset to process.

  • measurement_interval (int) – The expected measurement interval (in seconds) of the data.

  • ensure_variables_equality (bool, optional) – If True, drops duplicated timesteps where variables other than ‘raw_drop_number’ have different values. If False, keeps duplicated timesteps if ‘raw_drop_number’ values are equal, even if other variables differ. Default is True.

  • logger (logging.Logger, optional) – Logger instance for logging warnings and information. Default is None.

  • verbose (bool, optional) – If True, prints log messages to console in addition to logging. Default is True.

Returns:

Processed L0C dataset for the specified measurement interval with regularized timesteps and quality flags.

Return type:

xarray.Dataset

Notes

Processing steps:

  1. Drops timesteps with invalid measurement interval (if the ‘sample_interval’ variable exists).
  2. Removes duplicated timesteps based on ‘raw_drop_number’ equality.
  3. Regularizes timesteps to handle trailing seconds.
  4. Adds the sample_interval coordinate and updates attributes.

disdrodb.l0.l0c_processing.generate_l0c_datasets(ds, measurement_intervals, ensure_variables_equality=True, logger=None, verbose=True)[source][source]#

Generate L0C datasets from L0B data separating timesteps with different measurement intervals.

This function processes an L0B dataset by removing invalid and duplicated timesteps, splitting timesteps by measurement intervals, and finalizing each subset into L0C format.

Parameters:
  • ds (xarray.Dataset) – Input L0B dataset to process.

  • measurement_intervals (list or int) – List of expected measurement intervals (in seconds).

  • ensure_variables_equality (bool, optional) – If True, drops duplicated timesteps where variables other than ‘raw_drop_number’ have different values. If False, keeps duplicated timesteps if ‘raw_drop_number’ values are equal, even if other variables differ. Default is True.

  • logger (logging.Logger, optional) – Logger instance for logging warnings and information. Default is None.

  • verbose (bool, optional) – If True, prints log messages to console in addition to logging. Default is True.

Returns:

Dictionary with measurement intervals (int, in seconds) as keys and processed xarray.Dataset objects as values. Each dataset contains data for a single measurement interval with regularized timesteps and quality flags.

Return type:

dict

Notes

Processing steps:

  1. Drops timesteps with invalid sample intervals (if the ‘sample_interval’ variable exists).
  2. Removes duplicated timesteps based on ‘raw_drop_number’ equality.
  3. Splits the dataset by sampling intervals.
  4. Regularizes timesteps to handle trailing seconds.
  5. Adds quality control flags for the time coordinate.
  6. Adds the sample_interval coordinate and updates attributes.

Multiple sampling intervals may be present in a single input dataset, resulting in multiple output datasets.

disdrodb.l0.l0c_processing.get_problematic_timestep_indices(timesteps, sample_interval)[source][source]#

Identify timesteps with missing previous or following timesteps.
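As a rough illustration of what "missing previous or following timesteps" means, the sketch below flags timesteps whose neighbour one sample_interval away is absent. The helper name find_isolated_indices and the two-list return value are assumptions made for this example; the actual return format of get_problematic_timestep_indices is not documented here.

```python
from datetime import datetime, timedelta


def find_isolated_indices(timesteps, sample_interval):
    """Return indices lacking the previous timestep and indices lacking the next one."""
    step = timedelta(seconds=sample_interval)
    present = set(timesteps)
    missing_previous = [i for i, t in enumerate(timesteps) if t - step not in present]
    missing_next = [i for i, t in enumerate(timesteps) if t + step not in present]
    return missing_previous, missing_next


start = datetime(2024, 1, 1, 0, 0, 0)
# 60 s sampling with a gap: the 00:03 record is absent.
timesteps = [start + timedelta(minutes=m) for m in (0, 1, 2, 4, 5)]
missing_previous, missing_next = find_isolated_indices(timesteps, 60)
```

In this simplified version the first and last timesteps are always flagged, because nothing precedes or follows them.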

disdrodb.l0.l0c_processing.has_same_value_over_time(da)[source][source]#

Check if a DataArray has the same value over all timesteps, considering NaNs as equal.

Parameters:

da (xarray.DataArray) – The DataArray to check. Must have a ‘time’ dimension.

Returns:

True if the values are the same (or NaN in the same positions) across all timesteps, False otherwise.

Return type:

bool
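The NaN-aware comparison can be sketched without xarray. The example below assumes each timestep is a plain list of floats and uses a hypothetical helper name; it only illustrates the idea that NaN values in matching positions count as equal.

```python
import math


def same_value_over_time(rows):
    """rows: list of per-timestep lists of floats, all of the same length."""

    def equal(a, b):
        # Treat NaN == NaN as True, unlike the IEEE float comparison.
        return a == b or (math.isnan(a) and math.isnan(b))

    first = rows[0]
    return all(equal(a, b) for row in rows[1:] for a, b in zip(first, row))


nan = float("nan")
constant = [[1.0, nan, 3.0], [1.0, nan, 3.0], [1.0, nan, 3.0]]
varying = [[1.0, nan, 3.0], [1.0, 2.0, 3.0]]
```

Here same_value_over_time(constant) is True, while same_value_over_time(varying) is False because a NaN is replaced by 2.0 in the second timestep.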

disdrodb.l0.l0c_processing.nearest_expected_times(times, expected_times)[source][source]#

Return index of nearest expected time.
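A nearest-time lookup of this kind can be sketched with a binary search. The helper below is hypothetical: it takes times as plain seconds and assumes expected_times is sorted, neither of which is stated for the real function.

```python
from bisect import bisect_left


def nearest_index(times, expected_times):
    """For each value in times, return the index of the closest expected time.

    Both inputs are seconds since an arbitrary origin; expected_times must be sorted.
    """
    indices = []
    for t in times:
        pos = bisect_left(expected_times, t)
        # Candidates are the neighbours on either side of the insertion point.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(expected_times)]
        indices.append(min(candidates, key=lambda i: abs(expected_times[i] - t)))
    return indices


expected = [0, 30, 60, 90]
observed = [1, 29, 62, 100]
nearest = nearest_index(observed, expected)
```

In case of a tie, this sketch picks the earlier expected time.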

disdrodb.l0.l0c_processing.regularize_timesteps(ds, sample_interval, robust=False, add_quality_flag=True, logger=None, verbose=True)[source][source]#

Ensure timesteps match with the sample_interval.

This function:

  • drops dataset indices with duplicated timesteps,

  • does not add missing timesteps to the dataset.
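A minimal sketch of this behaviour, assuming timestamps are plain integer seconds and that "trailing seconds" are fixed by snapping to the nearest multiple of sample_interval. The helper name and the keep-first rule for duplicates created by snapping are assumptions for illustration only.

```python
def regularize(seconds, sample_interval):
    """Snap timestamps (in seconds) to the sampling grid without filling gaps."""
    snapped = []
    seen = set()
    for t in seconds:
        # Round half up to the nearest multiple of sample_interval.
        grid_time = ((t + sample_interval // 2) // sample_interval) * sample_interval
        if grid_time in seen:
            continue  # duplicated timestep after snapping: keep the first one only
        seen.add(grid_time)
        snapped.append(grid_time)
    return snapped


# 30 s sampling with trailing seconds and one missing record (t = 90).
raw = [0, 31, 59, 61, 121]
regular = regularize(raw, 30)
```

The missing record at 90 s stays missing: the result is shorter than a complete 30 s grid.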

disdrodb.l0.l0c_processing.remove_duplicated_timesteps(ds, ensure_variables_equality=True, logger=None, verbose=True)[source][source]#

Remove duplicated timesteps from an xarray dataset.

disdrodb.l0.l0c_processing.split_dataset_by_sampling_intervals(ds, measurement_intervals, min_sample_interval=10, min_block_size=5, time_is_interval_end=True)[source][source]#

Split a dataset into subsets where each subset has a consistent sampling interval.

Parameters:
  • ds (xarray.Dataset) – The input dataset with a ‘time’ dimension.

  • measurement_intervals (list or array-like) – A list of possible primary sampling intervals (in seconds) that the dataset might have.

  • min_sample_interval (int, optional) – The minimum expected sampling interval in seconds. Defaults to 10s. This is used to deal with possible trailing seconds errors.

  • min_block_size (float, optional) – The minimum number of consecutive timesteps with a given sampling interval required to keep a block. Shorter blocks are discarded. Defaults to 5 timesteps.

  • time_is_interval_end (bool) – Whether time refers to the end of the measurement interval. The default is True.

Notes

Does not modify timesteps (regularization is left to regularize_timesteps).

Assumes no duplicated timesteps in the dataset.

If only one measurement interval is specified, no timestep-diff checks are performed.

If multiple measurement intervals are specified:

  • Raises an error if none of the expected intervals appear.

  • Splits where interval changes.

Segments shorter than min_block_size are discarded.

Returns:

A dictionary where keys are the identified sampling intervals (in seconds), and values are xarray.Datasets containing only data from those sampling intervals.

Return type:

dict[int, xarray.Dataset]
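The splitting rules described in the notes can be sketched on a plain list of timestamps. This is a simplified illustration, not the library's algorithm: split_by_interval is a hypothetical helper, timestamps are integer seconds, min_sample_interval is ignored, and each timestep is labelled with the spacing to its previous neighbour (which assumes time marks the interval end).

```python
from itertools import groupby


def split_by_interval(times, measurement_intervals, min_block_size=5):
    """Group timestamps (seconds) by the spacing to their previous neighbour."""
    if len(measurement_intervals) == 1:
        # Single expected interval: no timestep-diff checks.
        return {measurement_intervals[0]: list(times)}

    diffs = [b - a for a, b in zip(times, times[1:])]
    # Assumption: the first timestep inherits the first spacing.
    labels = [diffs[0], *diffs]
    if not set(labels) & set(measurement_intervals):
        raise ValueError("None of the expected intervals appear in the data.")

    result = {}
    position = 0
    for interval, run in groupby(labels):
        size = len(list(run))
        block = times[position:position + size]
        position += size
        # Keep only expected intervals that last long enough.
        if interval in measurement_intervals and size >= min_block_size:
            result.setdefault(interval, []).extend(block)
    return result


times = [0, 30, 60, 90, 120, 150, 210, 270, 330, 390, 450, 460]
subsets = split_by_interval(times, [30, 60], min_block_size=5)
```

The final timestep (460) follows a 10 s step that matches no expected interval, so it is discarded, while the 30 s and 60 s blocks are returned under separate keys.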

disdrodb.l0.standards module#

Retrieve L0 sensor standards.

disdrodb.l0.standards.allowed_l0_variables(sensor_name: str) list[source][source]#

Get the list of allowed L0 variables for a given sensor.

disdrodb.l0.standards.get_bin_coords_dict(sensor_name: str) dict[source][source]#

Retrieve diameter (and velocity) bin coordinates.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with coordinates arrays.

Return type:

dict

disdrodb.l0.standards.get_data_format_dict(sensor_name: str) dict[source][source]#

Get a dictionary containing the data format of each sensor variable.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Data format of each sensor variable.

Return type:

dict

disdrodb.l0.standards.get_data_range_dict(sensor_name: str) dict[source][source]#

Get the variable data range.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected data value range for each data field. It excludes variables without specified data_range key.

Return type:

dict

disdrodb.l0.standards.get_diameter_bin_center(sensor_name: str) list[source][source]#

Get diameter bin center.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Diameter bin center.

Return type:

list

disdrodb.l0.standards.get_diameter_bin_edges(sensor_name: str) list[source][source]#

Get diameter bin edges.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Diameter bin edges.

Return type:

list

disdrodb.l0.standards.get_diameter_bin_lower(sensor_name: str) list[source][source]#

Get diameter bin lower bound.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Diameter bin lower bound.

Return type:

list

disdrodb.l0.standards.get_diameter_bin_upper(sensor_name: str) list[source][source]#

Get diameter bin upper bound.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Diameter bin upper bound.

Return type:

list

disdrodb.l0.standards.get_diameter_bin_width(sensor_name: str) list[source][source]#

Get diameter bin width.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Diameter bin width.

Return type:

list

disdrodb.l0.standards.get_diameter_bins_dict(sensor_name: str) dict[source][source]#

Get dictionary with sensor_name diameter bins information.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Sensor diameter bins information.

Return type:

dict
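The diameter bin quantities returned by the functions above are related to each other. The sketch below shows one plausible relationship, assuming contiguous bins whose centers are the midpoints of their bounds; the bin values are made up and describe_bins is a hypothetical helper, not part of disdrodb.

```python
def describe_bins(lower, upper):
    """Derive centers, widths and edges from lower and upper bin bounds."""
    center = [(lo + up) / 2 for lo, up in zip(lower, upper)]
    width = [up - lo for lo, up in zip(lower, upper)]
    # For contiguous bins the edges are every lower bound plus the last upper bound.
    edges = [*lower, upper[-1]]
    return {"center": center, "width": width, "edges": edges}


# Made-up bin bounds in millimetres, not taken from any sensor configuration.
bins = describe_bins(lower=[0.0, 0.25, 0.5, 1.0], upper=[0.25, 0.5, 1.0, 2.0])
```

With n bins there are n centers and widths but n + 1 edges, which is worth keeping in mind when using edges for plotting or rebinning.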

disdrodb.l0.standards.get_dims_size_dict(sensor_name: str) dict[source][source]#

Get the number of bins for each dimension.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the number of bins for each dimension.

Return type:

dict

disdrodb.l0.standards.get_field_nchar_dict(sensor_name: str) dict[source][source]#

Get the total number of characters from the instrument default string standards.

Important: the count includes the comma and the minus sign.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected number of characters for each data field.

Return type:

dict

disdrodb.l0.standards.get_field_ndigits_decimals_dict(sensor_name: dict) dict[source][source]#

Get number of digits on the right side of the comma from the instrument default string standards.

Example: 123,45 -> 45 -> 2 decimal digits.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected number of decimal digits for each data field.

Return type:

dict

disdrodb.l0.standards.get_field_ndigits_dict(sensor_name: str) dict[source][source]#

Get number of digits from the instrument default string standards.

Important: the count excludes the comma but includes the minus sign.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected number of digits for each data field.

Return type:

dict

disdrodb.l0.standards.get_field_ndigits_natural_dict(sensor_name: str) dict[source][source]#

Get number of digits on the left side of the comma from the instrument default string standards.

Example: 123,45 -> 123 -> 3 natural digits.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected number of natural digits for each data field.

Return type:

dict
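The four counting conventions described above (characters, digits, natural digits, decimal digits) can be compared on a single raw string. The sketch below is an illustration only: count_field_characters is a hypothetical helper, the separator is assumed to be the comma used in the examples, and excluding the minus sign from the natural digits is an assumption.

```python
def count_field_characters(value, separator=","):
    """Count characters and digits of a raw numeric string such as '123,45'."""
    natural, _, decimals = value.lstrip("-").partition(separator)
    return {
        # Every character, including the separator and the minus sign.
        "n_characters": len(value),
        # Separator excluded, minus sign still counted.
        "n_digits": len(value) - value.count(separator),
        # Digits left of the separator (minus sign excluded: an assumption).
        "n_naturals": len(natural),
        # Digits right of the separator.
        "n_decimals": len(decimals),
    }


positive = count_field_characters("123,45")
negative = count_field_characters("-123,45")
```

For "123,45" this gives 6 characters, 5 digits, 3 natural digits and 2 decimal digits; the leading minus sign in "-123,45" raises the first two counts by one.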

disdrodb.l0.standards.get_l0a_dtype(sensor_name: str) dict[source][source]#

Get a dictionary containing the L0A dtype.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the L0A dtype.

Return type:

dict

disdrodb.l0.standards.get_l0a_encodings_dict(sensor_name: str) dict[source][source]#

Get a dictionary containing the L0A encodings.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

L0A encodings.

Return type:

dict

disdrodb.l0.standards.get_l0b_cf_attrs_dict(sensor_name: str) dict[source][source]#

Get a dictionary containing the CF attributes of each sensor variable.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

CF attributes of each sensor variable. For each variable, the ‘units’, ‘description’, and ‘long_name’ attributes are specified.

Return type:

dict

disdrodb.l0.standards.get_l0b_encodings_dict(sensor_name: str) dict[source][source]#

Get a dictionary containing the encoding to write L0B netCDFs.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Encoding to write L0B netCDFs.

Return type:

dict
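To make the role of such an encoding concrete, the sketch below builds one per-variable entry using the field names of L0BEncodingSchema and applies the usual netCDF packing arithmetic (stored = (value - add_offset) / scale_factor). All values are illustrative and not taken from any DISDRODB configuration; pack and unpack are hypothetical helpers, since in practice the netCDF writer performs this conversion.

```python
# Illustrative encoding for one variable; the keys mirror L0BEncodingSchema fields.
encoding = {
    "dtype": "uint16",
    "zlib": True,
    "complevel": 3,
    "shuffle": True,
    "fletcher32": False,
    "contiguous": False,
    "chunksizes": 5000,
    "_FillValue": 65535,
    "scale_factor": 0.01,
    "add_offset": 0.0,
}


def pack(value, enc):
    """Convert a physical value to its stored integer; None marks a missing value."""
    if value is None:
        return enc["_FillValue"]
    return round((value - enc["add_offset"]) / enc["scale_factor"])


def unpack(stored, enc):
    """Invert pack(): stored * scale_factor + add_offset, with the fill value as None."""
    if stored == enc["_FillValue"]:
        return None
    return stored * enc["scale_factor"] + enc["add_offset"]


stored = pack(12.34, encoding)
restored = unpack(stored, encoding)
```

With a scale_factor of 0.01 the value 12.34 is stored as the integer 1234, so the variable keeps two decimal places while using an integer dtype on disk.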

disdrodb.l0.standards.get_n_diameter_bins(sensor_name)[source][source]#

Get the number of diameter bins.

disdrodb.l0.standards.get_n_velocity_bins(sensor_name)[source][source]#

Get the number of velocity bins.

disdrodb.l0.standards.get_nan_flags_dict(sensor_name: str) dict[source][source]#

Get the variable nan_flags.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected nan_flags list for each data field. It excludes variables without specified nan_flags key.

Return type:

dict

disdrodb.l0.standards.get_raw_array_dims_order(sensor_name: str) dict[source][source]#

Get the dimension order of the raw fields.

The order of dimension specified for raw_drop_number controls the reshaping of the precipitation raw spectrum.

Examples

  • OTT Parsivel spectrum [d1v1 … d32v1, d1v2, …, d32v2] (diameter increases first) –> dimension_order = [“velocity_bin_center”, “diameter_bin_center”]

  • Thies LPM spectrum [v1d1 … v20d1, v1d2, …, v20d2] (velocity increases first) –> dimension_order = [“diameter_bin_center”, “velocity_bin_center”]

  • PWS 100 spectrum [d1v1 … d1v34, d2v1, …, d2v34] (velocity increases first) –> dimension_order = [“diameter_bin_center”, “velocity_bin_center”]

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dimension order dictionary.

Return type:

dict
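The link between the flattened spectrum and dimension_order follows row-major reshaping: the dimension that increases first in the raw string must come last. The sketch below demonstrates this on a toy 3 x 2 spectrum with nested lists; reshape_spectrum is a hypothetical helper and the bin counts are deliberately tiny.

```python
def reshape_spectrum(flat, dimension_order, sizes):
    """Reshape a flat raw spectrum into nested lists following dimension_order.

    The last dimension in dimension_order is the one that varies fastest in flat.
    """
    n_outer = sizes[dimension_order[0]]
    n_inner = sizes[dimension_order[1]]
    if len(flat) != n_outer * n_inner:
        raise ValueError("Unexpected number of values in the raw spectrum.")
    return [flat[i * n_inner:(i + 1) * n_inner] for i in range(n_outer)]


# Toy sensor with 3 diameter bins and 2 velocity bins, diameter increasing first.
flat = ["d1v1", "d2v1", "d3v1", "d1v2", "d2v2", "d3v2"]
sizes = {"diameter_bin_center": 3, "velocity_bin_center": 2}
spectrum = reshape_spectrum(flat, ["velocity_bin_center", "diameter_bin_center"], sizes)
```

The result is indexed as spectrum[velocity][diameter], which matches the OTT Parsivel case where the diameter increases first.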

disdrodb.l0.standards.get_raw_array_nvalues(sensor_name: str) dict[source][source]#

Get a dictionary with the number of values expected for each raw array.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the number of values expected for each raw array.

Return type:

dict

disdrodb.l0.standards.get_sensor_logged_variables(sensor_name: str) list[source][source]#

Get the sensor logged variables list.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

List of the variables logged by the sensor.

Return type:

list

disdrodb.l0.standards.get_valid_coordinates_names(sensor_name)[source][source]#

Get list of valid coordinates for DISDRODB L0B.

disdrodb.l0.standards.get_valid_dimension_names(sensor_name)[source][source]#

Get list of valid dimension names for DISDRODB L0B.

disdrodb.l0.standards.get_valid_names(sensor_name)[source][source]#

Return the list of valid variable and coordinates names for DISDRODB L0B.

disdrodb.l0.standards.get_valid_values_dict(sensor_name: str) dict[source][source]#

Get the list of valid values for a variable.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected values for specific variables. It excludes variables without specified valid_values key.

Return type:

dict

disdrodb.l0.standards.get_valid_variable_names(sensor_name)[source][source]#

Get list of valid variables.

disdrodb.l0.standards.get_variables_dimension(sensor_name: str)[source][source]#

Returns a dictionary with the variable dimensions of a L0B product.

disdrodb.l0.standards.get_velocity_bin_center(sensor_name: str) list[source][source]#

Get velocity bin center.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Velocity bin center.

Return type:

list

disdrodb.l0.standards.get_velocity_bin_edges(sensor_name: str) list[source][source]#

Get velocity bin edges.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Velocity bin edges.

Return type:

list

disdrodb.l0.standards.get_velocity_bin_lower(sensor_name: str) list[source][source]#

Get velocity bin lower bound.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Velocity bin lower bound.

Return type:

list

disdrodb.l0.standards.get_velocity_bin_upper(sensor_name: str) list[source][source]#

Get velocity bin upper bound.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Velocity bin upper bound.

Return type:

list

disdrodb.l0.standards.get_velocity_bin_width(sensor_name: str) list[source][source]#

Get velocity bin width.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Velocity bin width.

Return type:

list

disdrodb.l0.standards.get_velocity_bins_dict(sensor_name: str) dict[source][source]#

Get dictionary with sensor_name velocity bins information.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Sensor velocity bins information.

Return type:

dict

disdrodb.l0.template_tools module#

Useful tools helping in the implementation of the DISDRODB L0 readers.

disdrodb.l0.template_tools.check_column_names(column_names: list, sensor_name: str) None[source][source]#

Check that the column names respect the DISDRODB standards.

Parameters:
  • column_names (list) – List of column names.

  • sensor_name (str) – Name of the sensor.

Raises:

TypeError – Error if some columns do not meet the DISDRODB standards.

disdrodb.l0.template_tools.get_decimal_ndigits(string: str) int[source][source]#

Get the number of decimal digits.

Parameters:

string (str) – Input string.

Returns:

The number of decimal digits.

Return type:

int

disdrodb.l0.template_tools.get_df_columns_unique_values_dict(df: DataFrame, column_indices: int | slice | list | None = None, column_names: bool = True)[source][source]#

Create a dictionary {column: unique values}.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • column_indices (Union[int,slice,list], optional) – Column indices. If None, select all columns.

  • column_names (bool, optional) – If True, the dictionary keys are the column names. The default value is True.

disdrodb.l0.template_tools.get_natural_ndigits(string: str) int[source][source]#

Get the number of natural digits.

Parameters:

string (str) – Input string.

Returns:

The number of natural digits.

Return type:

int

disdrodb.l0.template_tools.get_nchar(string: str) int[source][source]#

Get the number of characters.

Parameters:

string (str) – Input string.

Returns:

The number of characters.

Return type:

int

disdrodb.l0.template_tools.get_ndigits(string: str) int[source][source]#

Get the number of total numeric digits.

Parameters:

string (str) – Input string.

Returns:

The number of total digits.

Return type:

int

disdrodb.l0.template_tools.get_unique_sorted_values(array)[source][source]#

Return unique sorted values.

It deals with np.nan within an array of string by converting object dtype to str.

disdrodb.l0.template_tools.infer_column_names(df: DataFrame, sensor_name: str, row_idx: int = 0)[source][source]#

Try to guess the dataframe columns names based on string characteristics.

Parameters:
  • df (pandas.DataFrame) – The dataframe to analyse.

  • sensor_name (str) – Name of the sensor.

  • row_idx (int, optional) – The row index of the dataframe to use to infer the column names. The default row index is 0.

Returns:

Dictionary with the column ids as keys and the guessed column names as values.

Return type:

dict

disdrodb.l0.template_tools.print_allowed_column_names(sensor_name: str) None[source][source]#

Print the valid column names from the standard.

Parameters:

sensor_name (str) – Name of the sensor.

disdrodb.l0.template_tools.print_df_column_names(df: DataFrame) None[source][source]#

Print the dataframe column names.

Parameters:

df (pandas.DataFrame) – The dataframe.

disdrodb.l0.template_tools.print_df_columns_unique_values(df: DataFrame, column_indices: int | slice | list | None = None, print_column_names: bool = True) None[source][source]#

Print columns’ unique values.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • column_indices (Union[int,slice,list], optional) – Column indices. If None, select all columns.

  • print_column_names (bool, optional) – If True, print the column names. The default value is True.

disdrodb.l0.template_tools.print_df_first_n_rows(df: DataFrame, n: int = 5, print_column_names: bool = True) None[source][source]#

Print the first n rows of the dataframe, column by column.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • n (int, optional) – Number of rows to print. The default is 5.

  • print_column_names (bool, optional) – If True, print the column names. The default value is True.

disdrodb.l0.template_tools.print_df_random_n_rows(df: DataFrame, n: int = 5, print_column_names: bool = True) None[source][source]#

Print the content of the dataframe by column, randomly chosen.

Parameters:
  • df (pandas.DataFrame) – The dataframe.

  • n (int, optional) – The number of rows to print. The default is 5.

  • print_column_names (bool, optional) – If true, print the column names. The default value is True.

disdrodb.l0.template_tools.print_df_summary_stats(df: DataFrame, column_indices: int | slice | list | None = None, print_column_names: bool = True)[source][source]#

Create a summary of column statistics.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • column_indices (Union[int,slice,list], optional) – Column indices. If None, select all columns.

  • print_column_names (bool, optional) – If True, print the column names. The default value is True.

Raises:

ValueError – If the column types are not numeric.

disdrodb.l0.template_tools.print_df_with_any_nan_rows(df: DataFrame) None[source][source]#

Print the rows that contain any NaN value.

Parameters:

df (pandas.DataFrame) – Input dataframe.

disdrodb.l0.template_tools.str_has_decimal_digits(string: str) bool[source][source]#

Check if a string has decimals.

Parameters:

string (str) – Input string.

Returns:

True if the string has decimal digits.

Return type:

bool

disdrodb.l0.template_tools.str_is_integer(string: str) bool[source][source]#

Check if a string represents an integer.

Parameters:

string (str) – Input string.

Returns:

True if integer.

Return type:

bool

disdrodb.l0.template_tools.str_is_number(string: str) bool[source][source]#

Check if a string represents a number.

Parameters:

string (str) – Input string.

Returns:

True if the string represents a number.

Return type:

bool

Module contents#

DISDRODB L0 software.

disdrodb.l0.available_readers(sensor_name, data_sources=None, return_path=False)[source][source]#

Retrieve available readers information.

disdrodb.l0.generate_l0a(filepaths: list | str, reader, sensor_name, issue_dict=None, verbose=True, logger=None) DataFrame[source][source]#

Read and parse a list of raw files and generate a DISDRODB L0A dataframe.

Parameters:
  • filepaths (Union[list,str]) – File(s) path(s)

  • reader – DISDRODB reader function. Format: reader(filepath, logger=None)

  • sensor_name (str) – Name of the sensor.

  • issue_dict (dict, optional) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are 'timesteps' and 'time_periods'. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.

  • verbose (bool) – Whether to print detailed processing information. The default is True.

Returns:

DISDRODB L0A dataframe.

Return type:

pandas.DataFrame

Raises:

ValueError – If the input parameters cannot be used or the raw file cannot be processed.
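The effect of an issue_dict can be illustrated with a small filter. This is a hedged sketch rather than the library's code: it uses standard-library datetime objects in place of datetime64 values, assumes each entry of 'time_periods' is a (start, end) pair with both ends inclusive, and remove_flagged_timesteps is a hypothetical helper.

```python
from datetime import datetime


def remove_flagged_timesteps(timesteps, issue_dict=None):
    """Drop timesteps listed in 'timesteps' or inside any 'time_periods' range."""
    issue_dict = issue_dict or {}
    bad_times = set(issue_dict.get("timesteps", []))
    periods = issue_dict.get("time_periods", [])
    return [
        t
        for t in timesteps
        if t not in bad_times and not any(start <= t <= end for start, end in periods)
    ]


issue_dict = {
    "timesteps": [datetime(2024, 1, 1, 0, 1, 0)],
    # Assumed layout: each period is a (start, end) pair, both ends inclusive.
    "time_periods": [(datetime(2024, 1, 1, 0, 3, 0), datetime(2024, 1, 1, 0, 4, 0))],
}
timesteps = [datetime(2024, 1, 1, 0, m, 0) for m in range(6)]
kept = remove_flagged_timesteps(timesteps, issue_dict)
```

Of the six one-minute timesteps, only those at minutes 0, 2 and 5 remain.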

disdrodb.l0.generate_l0b(df: DataFrame, metadata: dict, logger=None, verbose: bool = False) Dataset[source][source]#

Transform the DISDRODB L0A dataframe to the DISDRODB L0B xr.Dataset.

Parameters:
  • df (pandas.DataFrame) – DISDRODB L0A dataframe. The raw drop number spectrum is reshaped to a 2D(+time) array. The raw drop concentration and velocity are reshaped to 1D(+time) arrays.

  • metadata (dict) – DISDRODB station metadata. To use this function outside the DISDRODB routines, the dictionary must contain the fields: sensor_name, latitude, longitude, altitude, platform_type.

  • verbose (bool, optional) – Whether to print detailed processing information. The default value is False.

Returns:

DISDRODB L0B dataset.

Return type:

xarray.Dataset

Raises:

ValueError – If the DISDRODB L0B xarray dataset cannot be created.
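When calling this function outside the DISDRODB routines, it may help to check the metadata dictionary first. The sketch below only verifies that the fields listed above are present; the helper name and the metadata values are illustrative placeholders, not validated DISDRODB entries.

```python
# Fields that the metadata parameter description lists as required.
REQUIRED_METADATA_KEYS = (
    "sensor_name",
    "latitude",
    "longitude",
    "altitude",
    "platform_type",
)


def missing_metadata_keys(metadata):
    """Return the required keys that are absent from the metadata dictionary."""
    return [key for key in REQUIRED_METADATA_KEYS if key not in metadata]


# Placeholder values for illustration only.
metadata = {
    "sensor_name": "PARSIVEL",
    "latitude": 46.5,
    "longitude": 6.6,
    "altitude": 400,
    "platform_type": "fixed",
}
missing = missing_metadata_keys(metadata)
incomplete = missing_metadata_keys({"sensor_name": "PARSIVEL"})
```

An empty list means every required field is present; this check says nothing about whether the values themselves are valid.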

disdrodb.l0.generate_l0b_from_nc(filepaths: list | str, reader, sensor_name, metadata, issue_dict=None, verbose=True, logger=None)[source][source]#

Read and parse a list of raw netCDF files and generate a DISDRODB L0B dataset.

Parameters:
  • filepaths (Union[list,str]) – File(s) path(s)

  • reader – DISDRODB reader function. Format: reader(filepath, logger=None)

  • sensor_name (str) – Name of the sensor.

  • metadata (dict) – Station metadata to attach as global attributes to the xr.Dataset.

  • issue_dict (dict, optional) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are 'timesteps' and 'time_periods'. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.

  • verbose (bool) – Whether to print detailed processing information. The default is True.

Returns:

DISDRODB L0B Dataset.

Return type:

xarray.Dataset

Raises:

ValueError – If the input parameters cannot be used or the raw file cannot be processed.

disdrodb.l0.generate_l0c(ds, measurement_interval, ensure_variables_equality=True, logger=None, verbose=True)[source][source]#

Generate a single L0C dataset for a specific measurement interval.

This is a convenience wrapper around generate_l0c_datasets that returns only the dataset corresponding to the specified measurement interval.

Parameters:
  • ds (xarray.Dataset) – Input L0B dataset to process.

  • measurement_interval (int) – The expected measurement interval (in seconds) of the data.

  • ensure_variables_equality (bool, optional) – If True, drops duplicated timesteps where variables other than ‘raw_drop_number’ have different values. If False, keeps duplicated timesteps if ‘raw_drop_number’ values are equal, even if other variables differ. Default is True.

  • logger (logging.Logger, optional) – Logger instance for logging warnings and information. Default is None.

  • verbose (bool, optional) – If True, prints log messages to console in addition to logging. Default is True.

Returns:

Processed L0C dataset for the specified measurement interval with regularized timesteps and quality flags.

Return type:

xarray.Dataset

Notes

Processing steps:

  1. Drops timesteps with invalid measurement interval (if the ‘sample_interval’ variable exists).
  2. Removes duplicated timesteps based on ‘raw_drop_number’ equality.
  3. Regularizes timesteps to handle trailing seconds.
  4. Adds the sample_interval coordinate and updates attributes.

See also

generate_l0c_datasets

disdrodb.l0.get_reader(reader_reference, sensor_name)[source][source]#

Retrieve the reader function.

Parameters:
  • reader_reference (str) – The reader reference name. The reader is located at disdrodb.l0.readers.{sensor_name}.{reader_reference}. The reader_reference naming convention is "{DATA_SOURCE}/{CAMPAIGN_NAME}_{OPTIONAL_SUFFIX}".

  • sensor_name (str) – The sensor name.

Returns:

The reader() function.

Return type:

callable

disdrodb.l0.get_station_reader(data_source, campaign_name, station_name, metadata_archive_dir=None)[source][source]#

Retrieve the reader function of a specific DISDRODB station.