disdrodb.l0 package

Subpackages#

Submodules#

disdrodb.l0.check_configs module#

Check configuration files.

class disdrodb.l0.check_configs.L0BEncodingSchema(*, contiguous: bool, dtype: str, zlib: bool, complevel: int, shuffle: bool, fletcher32: bool, _FillValue: int | float | None = None, chunksizes: int | list[int] | None, add_offset: float | None = None, scale_factor: float | None = None)[source][source]#

Bases: CustomBaseModel

Pydantic model for DISDRODB netCDF encodings.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

FillValue: int | float | None#
add_offset: float | None#
classmethod check_chunksizes_and_zlib(values)[source][source]#

Check the chunksizes validity.

classmethod check_contiguous_and_fletcher32(values)[source][source]#

Check the fletcher32 value validity.

classmethod check_contiguous_and_zlib(values)[source][source]#

Check the compression value validity.

classmethod check_integer_fillvalue(values)[source][source]#

Check that integer dtypes have valid _FillValue.

chunksizes: int | list[int] | None#
complevel: int#
contiguous: bool#
dtype: str#
fletcher32: bool#
model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'hide_error_urls': True}#

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

remove_offset_and_scale_if_not_int()[source][source]#

Ensure add_offset and scale_factor only apply to integer/uint dtypes.

scale_factor: float | None#
shuffle: bool#
zlib: bool#
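
As a rough illustration of the kind of cross-field consistency check the validators above perform, the following standalone sketch uses only the standard library and is not the DISDRODB implementation. It encodes one rule that holds for netCDF4/HDF5 storage in general: contiguous variables cannot be compressed or checksummed, because HDF5 filters require a chunked layout. The function name `check_contiguous_rules` is hypothetical.

```python
def check_contiguous_rules(encoding: dict) -> dict:
    """Raise ValueError if contiguous storage is combined with HDF5 filters.

    Hypothetical helper for illustration only; the real checks live in
    the L0BEncodingSchema validators.
    """
    if encoding["contiguous"]:
        if encoding["zlib"]:
            raise ValueError("'zlib' must be False when 'contiguous' is True.")
        if encoding["fletcher32"]:
            raise ValueError("'fletcher32' must be False when 'contiguous' is True.")
    return encoding


valid = {"contiguous": False, "zlib": True, "fletcher32": False}
invalid = {"contiguous": True, "zlib": True, "fletcher32": False}

check_contiguous_rules(valid)  # passes silently
try:
    check_contiguous_rules(invalid)
except ValueError as err:
    message = str(err)  # explains which option conflicts with 'contiguous'
```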
class disdrodb.l0.check_configs.RawDataFormatSchema(*, n_digits: int | None, n_characters: int | None, n_decimals: int | None, n_naturals: int | None, data_range: list[float] | None, nan_flags: int | float | str | None = None, valid_values: list[float] | None = None, dimension_order: list[str] | None = None, n_values: int | None = None, field_number: str | None = None)[source][source]#

Bases: CustomBaseModel

Pydantic model for the DISDRODB RAW Data Format YAML files.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod check_list_length(value)[source][source]#

Check the data_range validity.

data_range: list[float] | None#
dimension_order: list[str] | None#
field_number: str | None#
model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'hide_error_urls': True}#

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

n_characters: int | None#
n_decimals: int | None#
n_digits: int | None#
n_naturals: int | None#
n_values: int | None#
nan_flags: int | float | str | None#
valid_values: list[float] | None#
disdrodb.l0.check_configs.check_all_sensors_configs() None[source][source]#

Check all sensors configuration YAML files.

disdrodb.l0.check_configs.check_bin_consistency(sensor_name: str) None[source][source]#

Check bin consistency from config file.

The first and last bins are not checked.

Parameters:

sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.check_cf_attributes(sensor_name: str) None[source][source]#

Check that the l0b_cf_attrs.yml description, long_name and units values are strings.

Parameters:

sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.check_l0a_encoding(sensor_name: str) None[source][source]#

Check l0a_encodings.yml file.

Parameters:

sensor_name (str) – Name of the sensor.

Raises:

ValueError – Error raised if the value of a key is not in the list of accepted values.

disdrodb.l0.check_configs.check_l0b_encoding(sensor_name: str) None[source][source]#

Check l0b_encodings.yml file based on the schema defined in the class L0BEncodingSchema.

Parameters:

sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.check_raw_array(sensor_name: str) None[source][source]#

Check raw array consistency from config file.

Parameters:

sensor_name (str) – Name of the sensor.

Raises:

ValueError – Error if the chunksizes are not consistent.

disdrodb.l0.check_configs.check_raw_data_format(sensor_name: str) None[source][source]#

Check raw_data_format.yml file based on the schema defined in the class RawDataFormatSchema.

Parameters:

sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.check_sensor_configs(sensor_name: str) None[source][source]#

Check validity of sensor configuration YAML files.

Parameters:

sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.check_variable_consistency(sensor_name: str) None[source][source]#

Check variable consistency across config files.

The variables specified within l0b_encodings.yml must also be defined in the other config files. The raw_data_format.yml file can contain some extra variables.

Parameters:

sensor_name (str) – Name of the sensor.

Raises:

ValueError – If the keys are not consistent.
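
The rule above can be sketched with plain sets. This is a minimal standard-library illustration, not the library's code: the helper name `check_keys_consistency`, the variable names, and the reading that only raw_data_format.yml may hold extra variables are all assumptions made for the example.

```python
def check_keys_consistency(l0b_encoding_keys, other_configs,
                           allow_extra=("raw_data_format.yml",)):
    """Compare the variables of each config file against l0b_encodings.yml.

    other_configs maps a config file name to its list of variable names.
    """
    reference = set(l0b_encoding_keys)
    for filename, keys in other_configs.items():
        keys = set(keys)
        missing = reference - keys
        extra = keys - reference
        if missing:
            raise ValueError(f"{filename} lacks variables: {sorted(missing)}")
        if extra and filename not in allow_extra:
            raise ValueError(f"{filename} has unexpected variables: {sorted(extra)}")


# Illustrative variable names only.
encoding_keys = ["raw_drop_number", "rainfall_rate"]
configs = {
    "l0b_cf_attrs.yml": ["raw_drop_number", "rainfall_rate"],
    "raw_data_format.yml": ["raw_drop_number", "rainfall_rate", "sensor_status"],
}
check_keys_consistency(encoding_keys, configs)  # consistent: no error

try:
    check_keys_consistency(encoding_keys, {"l0b_cf_attrs.yml": ["rainfall_rate"]})
    inconsistent_detected = False
except ValueError:
    inconsistent_detected = True
```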

disdrodb.l0.check_configs.check_yaml_files_exists(sensor_name: str) None[source][source]#

Check if all L0 config YAML files exist.

Parameters:

sensor_name (str) – Name of the sensor.

disdrodb.l0.check_standards module#

Check data standards.

disdrodb.l0.check_standards.check_l0a_column_names(df: DataFrame, sensor_name: str) None[source][source]#

Check that the dataframe columns respect DISDRODB standards.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

Raises:

ValueError – Error if some columns do not meet the DISDRODB standards or if the 'time' column is missing in the dataframe.

disdrodb.l0.check_standards.check_l0a_standards(df: DataFrame, sensor_name: str, logger=None, verbose: bool = True) None[source][source]#

Checks that a file respects the DISDRODB L0A standards.

Parameters:
  • df (pandas.DataFrame) – L0A dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool, optional) – Whether to print detailed processing messages. The default value is True.

Raises:

ValueError – Error if some columns have inconsistent values.

disdrodb.l0.check_standards.check_l0b_standards(x: str) None[source][source]#

Check L0B standards.

disdrodb.l0.l0_reader module#

Define DISDRODB L0 readers routines.

disdrodb.l0.l0_reader.available_readers(sensor_name, data_sources=None, return_path=False)[source][source]#

Retrieve available readers information.

disdrodb.l0.l0_reader.check_metadata_reader(metadata)[source][source]#

Check the metadata reader key is available and points to an existing disdrodb reader.

disdrodb.l0.l0_reader.check_reader_arguments(reader)[source][source]#

Check that the reader function has the expected input arguments.

disdrodb.l0.l0_reader.check_reader_exists(reader_reference, sensor_name)[source][source]#

Check the reader exists.

disdrodb.l0.l0_reader.check_reader_reference(reader_reference)[source][source]#

Check the reader_reference value.

disdrodb.l0.l0_reader.check_software_readers()[source][source]#

Check the validity of all readers included in the disdrodb software.

disdrodb.l0.l0_reader.define_reader_path(sensor_name, reader_reference)[source][source]#

Define the reader path based on the reader reference name.

disdrodb.l0.l0_reader.define_readers_directory(sensor_name='') str[source][source]#

Returns the path to the disdrodb.l0.readers directory within the disdrodb package.

disdrodb.l0.l0_reader.get_reader(reader_reference, sensor_name)[source][source]#

Retrieve the reader function.

Parameters:
  • reader_reference (str) – The reader reference name. The reader is located at disdrodb.l0.readers.{sensor_name}.{reader_reference}. The reader_reference naming convention is "{DATA_SOURCE}"/"{CAMPAIGN_NAME}_{OPTIONAL_SUFFIX}".

  • sensor_name (str) – The sensor name.

Returns:

The reader() function.

Return type:

callable
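
To make the naming convention concrete, here is a small standard-library sketch that splits a reader reference and builds the dotted location described above. The helper names are hypothetical, the reference `MY_SOURCE/MY_CAMPAIGN_V2` is a placeholder, and mapping the `/` separator to a `.` in the module path is an assumption, not something this page confirms.

```python
def split_reader_reference(reader_reference: str) -> tuple[str, str]:
    """Split '<DATA_SOURCE>/<CAMPAIGN_NAME>[_<SUFFIX>]' into its two parts."""
    parts = reader_reference.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError("Expected '<DATA_SOURCE>/<CAMPAIGN_NAME>[_<SUFFIX>]'.")
    data_source, reader_name = parts
    return data_source, reader_name


def reader_module_path(sensor_name: str, reader_reference: str) -> str:
    """Build the dotted path under disdrodb.l0.readers (assumed '/' -> '.')."""
    data_source, reader_name = split_reader_reference(reader_reference)
    return f"disdrodb.l0.readers.{sensor_name}.{data_source}.{reader_name}"


path = reader_module_path("PARSIVEL", "MY_SOURCE/MY_CAMPAIGN_V2")
```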

disdrodb.l0.l0_reader.get_reader_from_metadata(metadata)[source][source]#

Retrieve the reader function based on the metadata information.

The reader_reference naming convention is "{DATA_SOURCE}"/"{CAMPAIGN_NAME}_{OPTIONAL_SUFFIX}". The reader is located at disdrodb.l0.readers.{sensor_name}.{reader_reference}.

disdrodb.l0.l0_reader.get_specific_readers_path(sensor_name)[source][source]#

Returns a dictionary with the file paths of the available readers for each data source.

disdrodb.l0.l0_reader.get_specific_readers_references(sensor_name)[source][source]#

Returns a dictionary with the readers references available for each data source.

disdrodb.l0.l0_reader.get_station_reader(data_source, campaign_name, station_name, metadata_archive_dir=None)[source][source]#

Retrieve the reader function of a specific DISDRODB station.

disdrodb.l0.l0_reader.is_documented_by(original)[source][source]#

Wrapper function to apply generic docstring to the decorated function.

Parameters:

original (function) – Function to take the docstring from.

disdrodb.l0.l0_reader.list_readers_paths(sensor_name) list[source][source]#

Returns the file paths of the available readers for a given sensor in disdrodb.l0.readers.{sensor_name}.

disdrodb.l0.l0_reader.list_readers_references(sensor_name)[source][source]#

Returns the readers references available for a given sensor in disdrodb.l0.readers.{sensor_name}.

disdrodb.l0.l0_reader.reader_generic_docstring()[source][source]#

Reader to convert a raw data file to DISDRODB L0A or L0B format.

Raw text files are read and converted to a pandas.DataFrame (L0A format). Raw netCDF files are read and converted to a xarray.Dataset (L0B format).

Parameters:
  • filepath (str) – Filepath of the raw data file to be processed.

  • logger (logging.Logger, optional) – Logger to use for logging messages. Default is None, which means no logger is used.
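
The docstring-sharing pattern behind is_documented_by and reader_generic_docstring can be sketched as follows. The definitions below are stand-ins that reuse the documented names for readability; they are not the library's source, and the placeholder reader only follows the documented reader(filepath, logger=None) call format.

```python
def is_documented_by(original):
    """Return a decorator that copies the docstring of ``original``."""
    def wrapper(target):
        target.__doc__ = original.__doc__
        return target
    return wrapper


def reader_generic_docstring():
    """Reader to convert a raw data file to DISDRODB L0A or L0B format."""


@is_documented_by(reader_generic_docstring)
def reader(filepath, logger=None):
    # A real reader would parse the file; this placeholder only echoes the path.
    return filepath
```

With this pattern every reader exposes the same help text without repeating it in each reader module.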

disdrodb.l0.l0a_processing module#

Functions to process raw text files into DISDRODB L0A Apache Parquet.

disdrodb.l0.l0a_processing.cast_column_dtypes(df: DataFrame, sensor_name: str) DataFrame[source][source]#

Convert 'object' dataframe columns into DISDRODB L0A dtype standards.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

Returns:

Dataframe with corrected columns types.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.check_matching_column_number(df, column_names)[source][source]#

Check the number of columns in the dataframe matches the length of column names.

disdrodb.l0.l0a_processing.coerce_corrupted_values_to_nan(df: DataFrame, sensor_name: str) DataFrame[source][source]#

Coerce corrupted values in dataframe numeric columns to np.nan.

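Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

Returns:

Dataframe with corrupted values in numeric columns replaced by np.nan.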

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.concatenate_dataframe(list_df: list, logger=None, verbose: bool = False) DataFrame[source][source]#

Concatenate a list of dataframes.

Parameters:
  • list_df (list) – List of dataframes.

  • verbose (bool, optional) – If True, print messages. If False, no print.

Returns:

Concatenated dataframe.

Return type:

pandas.DataFrame

Raises:

ValueError – If the dataframes cannot be concatenated.

disdrodb.l0.l0a_processing.drop_time_periods(df, time_periods)[source][source]#

Drop problematic time periods.

disdrodb.l0.l0a_processing.drop_timesteps(df, timesteps)[source][source]#

Drop problematic time steps.

disdrodb.l0.l0a_processing.generate_l0a(filepaths: list | str, reader, sensor_name, issue_dict=None, verbose=True, logger=None) DataFrame[source][source]#

Read and parse a list of raw files and generate a DISDRODB L0A dataframe.

Parameters:
  • filepaths (Union[list,str]) – File(s) path(s)

  • reader – DISDRODB reader function. Format: reader(filepath, logger=None)

  • sensor_name (str) – Name of the sensor.

  • issue_dict (dict, optional) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are 'timesteps' and 'time_periods'. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.

  • verbose (bool) – Whether to print detailed processing messages. The default is True.

Returns:

Dataframe

Return type:

pandas.DataFrame

Raises:

ValueError – If the input parameters cannot be used or the raw file cannot be processed.

disdrodb.l0.l0a_processing.is_raw_array_string_not_corrupted(string)[source][source]#

Check that the raw array string is not corrupted.

disdrodb.l0.l0a_processing.preprocess_reader_kwargs(reader_kwargs: dict) dict[source][source]#

Preprocess arguments required to read raw text file into Pandas.

Parameters:

reader_kwargs (dict) – Initial parameter dictionary.

Returns:

Parameter dictionary that matches either Pandas or Dask.

Return type:

dict

disdrodb.l0.l0a_processing.read_l0a_dataframe(filepaths: str | list, debugging_mode: bool = False) DataFrame[source][source]#

Read DISDRODB L0A Apache Parquet file(s).

Parameters:
  • filepaths (str or list) – Either a list or a single filepath.

  • debugging_mode (bool) – If True, it reduces the amount of data to process. If filepaths is a list, it reads only the first 3 files. It selects only 100 rows sampled from the first 3 files. The default is False.

Returns:

L0A Dataframe.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.read_raw_text_file(filepath: str, column_names: list, reader_kwargs: dict, logger=None) DataFrame[source][source]#

Read a raw file into a dataframe.

Parameters:
  • filepath (str) – Raw file path.

  • column_names (list) – Column names.

  • reader_kwargs (dict) – Pandas pd.read_csv arguments.

  • logger (logging.Logger) – Logger object. The default is None. If None, the logger is created using the module name. If logger is passed, it will be used to log messages.

Returns:

Pandas dataframe.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.remove_corrupted_rows(df)[source][source]#

Remove corrupted rows by checking conversion of raw fields to numeric.

Note: The delimiters at the start and end of the raw array must be stripped beforehand.

disdrodb.l0.l0a_processing.remove_duplicated_timesteps(df: DataFrame, logger=None, verbose: bool = False)[source][source]#

Remove duplicated timesteps.

Only the first occurrence of each timestep is kept.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • verbose (bool) – Whether to print detailed processing messages. The default is False.

Returns:

Dataframe with valid unique timesteps.

Return type:

pandas.DataFrame
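
The keep-first behaviour can be illustrated without pandas. This is a minimal standard-library sketch, assuming rows are dictionaries with a "time" key; the helper name `keep_first_timestep` is hypothetical and the sample values are made up.

```python
from datetime import datetime


def keep_first_timestep(rows):
    """Keep only the first row seen for each distinct 'time' value."""
    seen = set()
    unique_rows = []
    for row in rows:
        if row["time"] not in seen:
            seen.add(row["time"])
            unique_rows.append(row)
    return unique_rows


rows = [
    {"time": datetime(2024, 1, 1, 0, 0, 0), "value": 1},
    {"time": datetime(2024, 1, 1, 0, 0, 30), "value": 2},
    {"time": datetime(2024, 1, 1, 0, 0, 0), "value": 3},  # duplicated timestep
]
unique_rows = keep_first_timestep(rows)
```

On a pandas dataframe, a comparable result can be obtained with `df.drop_duplicates(subset="time", keep="first")`.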

disdrodb.l0.l0a_processing.remove_issue_timesteps(df, issue_dict, logger=None, verbose=False)[source][source]#

Drop dataframe rows with timesteps listed in the issue dictionary.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • issue_dict (dict) – Issue dictionary.

  • verbose (bool) – Whether to print detailed processing messages. The default is False.

Returns:

Dataframe with problematic timesteps removed.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.remove_rows_with_missing_time(df: DataFrame, logger=<Logger disdrodb.l0.l0a_processing (WARNING)>, verbose: bool = False)[source][source]#

Remove dataframe rows where the "time" is NaT.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • verbose (bool) – Whether to print detailed processing messages. The default is False.

Returns:

Dataframe with valid timesteps.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.replace_nan_flags(df, sensor_name, logger=None, verbose=False)[source][source]#

Set values corresponding to nan_flags to np.nan.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing messages. The default is False.

Returns:

Dataframe without nan_flags values.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.sanitize_df(df, sensor_name, verbose=True, issue_dict=None, logger=None)[source][source]#

Sanitize a raw dataframe into a DISDRODB L0A dataframe.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing messages. The default is True.

  • issue_dict (dict) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are 'timesteps' and 'time_periods'. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.

Returns:

Dataframe

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.set_nan_invalid_values(df, sensor_name, logger=None, verbose=False)[source][source]#

Set invalid (class) values to np.nan.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing messages. The default is False.

Returns:

Dataframe without invalid values.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.set_nan_outside_data_range(df, sensor_name, logger=None, verbose=False)[source][source]#

Set values outside the data range as np.nan.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing messages. The default is False.

Returns:

Dataframe without values outside the expected data range.

Return type:

pandas.DataFrame
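
The masking step can be sketched on a plain list of floats. This is an illustrative standard-library example, not the library's implementation; the helper name `mask_outside_range` is hypothetical and the bounds are assumed to be inclusive.

```python
import math


def mask_outside_range(values, data_range):
    """Replace values outside [min, max] with NaN (bounds assumed inclusive)."""
    vmin, vmax = data_range
    # NaN inputs fail the comparison and therefore stay NaN.
    return [v if vmin <= v <= vmax else math.nan for v in values]


masked = mask_outside_range([-5.0, 0.0, 12.5, 99.0], data_range=[0.0, 50.0])
```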

disdrodb.l0.l0a_processing.strip_delimiter(string)[source][source]#

Remove the first and last delimiter occurrence from a string.

disdrodb.l0.l0a_processing.strip_delimiter_from_raw_arrays(df)[source][source]#

Remove the first and last delimiter occurrence from the raw array fields.

disdrodb.l0.l0a_processing.strip_string_spaces(df: DataFrame, sensor_name: str) DataFrame[source][source]#

Strip leading/trailing spaces from dataframe string columns.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

Returns:

Dataframe with string columns without leading/trailing spaces.

Return type:

pandas.DataFrame

disdrodb.l0.l0a_processing.write_l0a(df: DataFrame, filepath: str, force: bool = False, logger=None, verbose: bool = False)[source][source]#

Save the dataframe into an Apache Parquet file.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • filepath (str) – Output file path.

  • force (bool, optional) – Whether to overwrite existing data. If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. This is the default.

  • verbose (bool, optional) – Whether to print detailed processing messages. The default is False.

Raises:
  • ValueError – The input dataframe cannot be written as an Apache Parquet file.

  • NotImplementedError – The input dataframe cannot be processed.

disdrodb.l0.l0b_nc_processing module#

Functions to process DISDRODB raw netCDF files into DISDRODB L0B netCDF files.

disdrodb.l0.l0b_nc_processing.add_dataset_missing_variables(ds, missing_vars, sensor_name)[source][source]#

Add missing xr.Dataset variables as np.nan xr.DataArrays.

disdrodb.l0.l0b_nc_processing.drop_time_periods(ds, time_periods: list)[source][source]#

Drop all time steps within any of the specified time intervals.

Parameters:
  • ds (xarray.Dataset) – Input dataset with a ‘time’ dimension.

  • time_periods (list of tuple) – Each tuple is (start_time, end_time), datetime-like, inclusive.

Returns:

Dataset with all times within the given periods removed.

Return type:

xarray.Dataset

Raises:

ValueError – If no timesteps remain after removal.

disdrodb.l0.l0b_nc_processing.drop_timesteps(ds, timesteps: list)[source][source]#

Drop specific time steps from a Dataset.

Parameters:
  • ds (xarray.Dataset) – Input dataset with a ‘time’ dimension.

  • timesteps (list) – List of datetime-like values to remove.

Returns:

Dataset with specified timesteps removed.

Return type:

xarray.Dataset

Raises:

ValueError – If no timesteps remain after removal.

disdrodb.l0.l0b_nc_processing.generate_l0b_from_nc(filepaths: list | str, reader, sensor_name, metadata, issue_dict=None, verbose=True, logger=None)[source][source]#

Read and parse a list of raw netCDF files and generate a DISDRODB L0B dataset.

Parameters:
  • filepaths (Union[list,str]) – File(s) path(s)

  • reader – DISDRODB reader function. Format: reader(filepath, logger=None)

  • sensor_name (str) – Name of the sensor.

  • metadata (dict) – Station metadata to attach as global attributes to the xr.Dataset.

  • issue_dict (dict, optional) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are 'timesteps' and 'time_periods'. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.

  • verbose (bool) – Whether to print detailed processing messages. The default is True.

Returns:

DISDRODB L0B Dataset.

Return type:

xarray.Dataset

Raises:

ValueError – If the input parameters cannot be used or the raw file cannot be processed.

disdrodb.l0.l0b_nc_processing.open_raw_netcdf_file(filepath, logger=None, engine='netcdf4', cache=False, chunks=None, decode_timedelta=False, **kwargs)[source][source]#

Open a raw netCDF file.

Parameters:

filepath (str) – Path to the raw netCDF file.

Returns:

Raw netCDF file as an xarray Dataset.

Return type:

xarray.Dataset

disdrodb.l0.l0b_nc_processing.remove_issue_timesteps(ds, issue_dict: dict, logger=None, verbose: bool = False)[source][source]#

Remove bad timesteps and time periods from an xarray Dataset according to issue definitions.

Parameters:
  • ds (xarray.Dataset) – Input dataset with a ‘time’ dimension.

  • issue_dict (dict) – Dictionary with optional keys ‘timesteps’ (list of datetimes) and ‘time_periods’ (list of (start, end) tuples).

  • logger (any, optional) – Logger instance to record dropped steps, by default None.

  • verbose (bool, optional) – Whether to log informational messages, by default False.

Returns:

Cleaned dataset.

Return type:

xarray.Dataset

Raises:

ValueError – If after removing specified timesteps/periods no data remains.
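
The filtering logic described above can be sketched on a plain list of timestamps. This standard-library example is only an illustration: it uses datetime objects in place of an xarray time coordinate, treats time periods as inclusive (as documented for drop_time_periods), and the helper name `remove_issue_times` is hypothetical.

```python
from datetime import datetime


def remove_issue_times(times, issue_dict):
    """Drop flagged timesteps and every time inside a flagged (start, end) period."""
    bad_timesteps = set(issue_dict.get("timesteps", []))
    periods = issue_dict.get("time_periods", [])
    kept = [
        t for t in times
        if t not in bad_timesteps
        and not any(start <= t <= end for start, end in periods)
    ]
    if not kept:
        raise ValueError("No timesteps remain after removing the flagged ones.")
    return kept


times = [datetime(2024, 1, 1, 0, minute) for minute in range(6)]
issue_dict = {
    "timesteps": [datetime(2024, 1, 1, 0, 0)],
    "time_periods": [(datetime(2024, 1, 1, 0, 3), datetime(2024, 1, 1, 0, 4))],
}
kept = remove_issue_times(times, issue_dict)
```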

disdrodb.l0.l0b_nc_processing.rename_dataset(ds, dict_names)[source][source]#

Rename xr.Dataset variables, coordinates and dimensions.

disdrodb.l0.l0b_nc_processing.replace_custom_nan_flags(ds, dict_nan_flags, logger=None, verbose=False)[source][source]#

Set values corresponding to nan_flags to np.nan.

This function must be used in a reader, if necessary.

Parameters:
  • ds (xarray.Dataset) – Input xarray dataset.

  • dict_nan_flags (dict) – Dictionary with nan flags value to set as np.nan.

  • verbose (bool) – Whether to print detailed processing messages. The default value is False.

Returns:

Dataset without nan_flags values.

Return type:

xarray.Dataset

disdrodb.l0.l0b_nc_processing.replace_nan_flags(ds, sensor_name, verbose, logger=None)[source][source]#

Set values corresponding to nan_flags to np.nan.

Parameters:
  • ds (xarray.Dataset) – Input xarray dataset.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing messages.

Returns:

Dataset without nan_flags values.

Return type:

xarray.Dataset

disdrodb.l0.l0b_nc_processing.sanitize_ds(ds, sensor_name, metadata, issue_dict=None, verbose=False, logger=None)[source][source]#

Convert a raw xr.Dataset into a DISDRODB L0B netCDF.

Parameters:
  • ds (xarray.Dataset) – Raw xarray dataset

  • metadata (dict) – Station metadata to attach as global attributes to the xr.Dataset.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing messages.

Returns:

L0B xr.Dataset

Return type:

xarray.Dataset

disdrodb.l0.l0b_nc_processing.set_nan_invalid_values(ds, sensor_name, verbose, logger=None)[source][source]#

Set invalid (class) values to np.nan.

Parameters:
  • ds (xarray.Dataset) – Input xarray dataset

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing messages.

Returns:

Dataset without invalid values.

Return type:

xarray.Dataset

disdrodb.l0.l0b_nc_processing.set_nan_outside_data_range(ds, sensor_name, verbose, logger=None)[source][source]#

Set values outside the data range as np.nan.

Parameters:
  • ds (xarray.Dataset) – Input xarray dataset

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing messages.

Returns:

Dataset without values outside the expected data range.

Return type:

xarray.Dataset

disdrodb.l0.l0b_nc_processing.standardize_raw_dataset(ds, dict_names, sensor_name)[source][source]#

Preprocess a raw netCDF dataset to improve compatibility with DISDRODB standards.

This function checks the validity of dict_names, then renames and subsets the data accordingly. If some variables specified in dict_names are missing, it adds them as np.nan xr.DataArrays.

Parameters:
  • ds (xarray.Dataset) – Raw netCDF to be converted to DISDRODB standards.

  • dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.

  • sensor_name (str) – Sensor name.

Returns:

ds – xarray Dataset with variables compliant with DISDRODB conventions.

Return type:

xarray.Dataset

disdrodb.l0.l0b_nc_processing.subset_dataset(ds, dict_names, sensor_name)[source][source]#

Subset xr.Dataset with expected variables.

disdrodb.l0.l0b_processing module#

Functions to process DISDRODB L0A files into DISDRODB L0B netCDF files.

disdrodb.l0.l0b_processing.convert_object_variables_to_string(ds: Dataset) Dataset[source][source]#

Convert variables with object dtype to string.

Parameters:

ds (xarray.Dataset) – Input dataset.

Returns:

Output dataset.

Return type:

xarray.Dataset

disdrodb.l0.l0b_processing.ensure_valid_geolocation(ds: Dataset, coord: str, errors: str = 'ignore') Dataset[source][source]#

Ensure valid geolocation coordinates.

‘altitude’ must be >= 0, ‘latitude’ must be within [-90, 90] and ‘longitude’ within [-180, 180].

It can deal with coordinates varying with time.

Parameters:
  • ds (xarray.Dataset) – Dataset containing the coordinate.

  • coord (str) – Name of the coordinate variable to validate.

  • errors ({"ignore", "raise", "coerce"}, default "ignore") –

    • “ignore”: nothing is done.

    • “raise”: raise ValueError if invalid values are found.

    • “coerce”: out-of-range values are replaced with NaN.

Returns:

Dataset with validated coordinate values.

Return type:

xr.Dataset
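
The three error modes can be sketched on a plain list of coordinate values. This is a standard-library illustration of the documented ranges and modes, not the library's implementation: the helper name `validate_coordinate` is hypothetical, and treating existing NaN values as missing rather than invalid is an assumption of this sketch.

```python
import math

# Valid ranges as documented above; altitude only has a lower bound.
VALID_RANGES = {
    "altitude": (0.0, math.inf),
    "latitude": (-90.0, 90.0),
    "longitude": (-180.0, 180.0),
}


def validate_coordinate(values, coord, errors="ignore"):
    """Validate coordinate values according to the chosen error mode."""
    if errors not in {"ignore", "raise", "coerce"}:
        raise ValueError("errors must be 'ignore', 'raise' or 'coerce'.")
    if errors == "ignore":
        return list(values)
    vmin, vmax = VALID_RANGES[coord]
    # Assumption: NaN is considered missing, not out of range.
    is_invalid = [not math.isnan(v) and not (vmin <= v <= vmax) for v in values]
    if errors == "raise":
        if any(is_invalid):
            raise ValueError(f"Invalid {coord} values found.")
        return list(values)
    # errors == "coerce": replace out-of-range values with NaN.
    return [math.nan if bad else v for v, bad in zip(values, is_invalid)]


coerced = validate_coordinate([45.0, 95.0], "latitude", errors="coerce")
```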

disdrodb.l0.l0b_processing.finalize_dataset(ds, sensor_name, metadata)[source][source]#

Finalize DISDRODB L0B Dataset.

disdrodb.l0.l0b_processing.format_string_array(string: str, n_values: int) array[source][source]#

Split a string with multiple numbers separated by a delimiter into a 1D array.

For example, format_string_array(“2,44,22,33”, 4) returns [ 2. 44. 22. 33.].

If the string is empty (“”), an array of zeros is returned. If the number of values is not n_values, an array of np.nan is returned.

The function strips potential delimiters at the start and end before splitting.

Parameters:
  • string (str) – Input string

  • n_values (int) – Expected length of the output array.

Returns:

array of float

Return type:

np.array
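
The documented behaviour can be mimicked with plain Python lists. The sketch below is not the library function: the name `format_string_list` is hypothetical, it returns a list instead of a NumPy array, it takes the delimiter as an argument rather than inferring it, and it assumes the zero-filled result has length n_values.

```python
import math


def format_string_list(string, n_values, delimiter=","):
    """Split a delimited string of numbers into a list of n_values floats."""
    # Strip surrounding whitespace and leading/trailing delimiters.
    stripped = string.strip().strip(delimiter)
    if stripped == "":
        return [0.0] * n_values
    parts = stripped.split(delimiter)
    if len(parts) != n_values:
        return [math.nan] * n_values
    # Malformed tokens raise ValueError in this sketch.
    return [float(part) for part in parts]


values = format_string_list("2,44,22,33", 4)
zeros = format_string_list("", 3)
mismatched = format_string_list("1,2", 3)
```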

disdrodb.l0.l0b_processing.generate_l0b(df: DataFrame, metadata: dict, logger=None, verbose: bool = False) Dataset[source][source]#

Transform the DISDRODB L0A dataframe to the DISDRODB L0B xr.Dataset.

Parameters:
  • df (pandas.DataFrame) – DISDRODB L0A dataframe. The raw drop number spectrum is reshaped to a 2D(+time) array. The raw drop concentration and velocity are reshaped to 1D(+time) arrays.

  • metadata (dict) – DISDRODB station metadata. To use this function outside the DISDRODB routines, the dictionary must contain the fields: sensor_name, latitude, longitude, altitude, platform_type.

  • verbose (bool, optional) – Whether to print detailed processing messages. The default value is False.

Returns:

DISDRODB L0B dataset.

Return type:

xarray.Dataset

Raises:

ValueError – Error if the DISDRODB L0B xarray dataset cannot be created.

disdrodb.l0.l0b_processing.infer_split_str(string: str) str[source][source]#

Infer the delimiter inside a string.

Parameters:

string (str) – Input string.

Returns:

Inferred delimiter.

Return type:

str

disdrodb.l0.l0b_processing.replace_empty_strings_with_zeros(values)[source][source]#

Replace empty comma separated strings with ‘0’.

disdrodb.l0.l0b_processing.reshape_raw_spectrum(arr: array, dims_order: list, dims_size_dict: dict, n_timesteps: int) array[source][source]#

Reshape the raw spectrum to a 2D+time array.

The array has dimensions [“time”] + dims_order

Parameters:
  • arr (np.array) – Input array.

  • dims_order (list) – The order of the dimensions in the raw spectrum. Examples:

    • OTT PARSIVEL spectrum [v1d1 … v1d32, v2d1, …, v2d32] –> dims_order = [“diameter_bin_center”, “velocity_bin_center”]

    • Thies LPM spectrum [v1d1 … v20d1, v1d2, …, v20d2] –> dims_order = [“velocity_bin_center”, “diameter_bin_center”]

  • dims_size_dict (dict) – Dictionary with the number of bins for each dimension. For PARSIVEL and PARSIVEL2: {“diameter_bin_center”: 32, “velocity_bin_center”: 32}. For LPM: {“diameter_bin_center”: 22, “velocity_bin_center”: 20}. For PWS100: {“diameter_bin_center”: 34, “velocity_bin_center”: 34}.

  • n_timesteps (int) – Number of timesteps.

Returns:

Output array.

Return type:

np.array

Raises:

ValueError – Impossible to reshape the raw_spectrum matrix

disdrodb.l0.l0b_processing.retrieve_l0b_arrays(df: DataFrame, sensor_name: str, logger=None, verbose: bool = False) dict[source][source]#

Retrieves the L0B data matrix.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

Returns:

Dictionary with data arrays.

Return type:

dict

disdrodb.l0.l0b_processing.set_geolocation_coordinates(ds, metadata)[source][source]#

Add geolocation coordinates to dataset.

disdrodb.l0.l0b_processing.set_l0b_encodings(ds: Dataset, sensor_name: str)[source][source]#

Apply the L0B encodings to the xarray Dataset.

Parameters:
  • ds (xarray.Dataset) – Input xarray dataset.

  • sensor_name (str) – Name of the sensor.

Returns:

Output xarray dataset.

Return type:

xarray.Dataset

disdrodb.l0.l0b_processing.set_variable_attributes(ds: Dataset, sensor_name: str) Dataset[source][source]#

Set attributes to each xr.Dataset variable.

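Parameters:
  • ds (xarray.Dataset) – Input xarray dataset.

  • sensor_name (str) – Name of the sensor.

Returns:

Dataset with variable attributes set.

Return type:

xarray.Dataset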

disdrodb.l0.l0c_processing module#

Functions to process DISDRODB L0B files into DISDRODB L0C netCDF files.

disdrodb.l0.l0c_processing.check_timesteps_regularity(ds, sample_interval, verbose=False, logger=None)[source][source]#

Check for the regularity of timesteps.

disdrodb.l0.l0c_processing.create_l0c_datasets(event_info, measurement_intervals, sensor_name, ensure_variables_equality=True, logger=None, verbose=True)[source][source]#

Create a single dataset by merging and processing data from multiple filepaths.

Parameters:

event_info (dict) – Dictionary with start_time, end_time and filepaths keys.

Returns:

A dictionary with an xarray.Dataset for each measurement interval.

Return type:

dict

Raises:

ValueError – If fewer than 5 timesteps are available for the specified day.

Notes

  • Data is loaded into memory and connections to source files are closed before returning the dataset.

  • Tolerance in input files is used around expected dataset start_time and end_time to account for imprecise logging times and ensuring correct definition of qc_time at files boundaries (e.g. 00:00).

  • Duplicated timesteps with different raw drop number values are dropped.

  • First occurrence of duplicated timesteps with equal raw drop number values is kept.

  • Regularizes timesteps to handle trailing seconds.

disdrodb.l0.l0c_processing.drop_timesteps_with_invalid_sample_interval(ds, measurement_intervals, verbose=True, logger=None)[source][source]#

Drop timesteps with unexpected sample intervals.

disdrodb.l0.l0c_processing.get_problematic_timestep_indices(timesteps, sample_interval)[source][source]#

Identify timesteps with missing previous or following timesteps.
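
As a rough illustration of what "missing previous or following timesteps" can mean, the sketch below works on plain integer seconds. The function name and the returned pair of index lists are assumptions made for this example; the actual return value of get_problematic_timestep_indices is not documented here.

```python
def find_timesteps_with_missing_neighbours(timesteps, sample_interval):
    """Return indices lacking a previous and/or a following timestep.

    `timesteps` is a sorted list of times in seconds (an assumption of
    this sketch; the real function may operate on datetime64 arrays).
    """
    previous_missing = []
    following_missing = []
    last = len(timesteps) - 1
    for i, t in enumerate(timesteps):
        has_previous = i > 0 and t - timesteps[i - 1] == sample_interval
        has_following = i < last and timesteps[i + 1] - t == sample_interval
        if not has_previous:
            previous_missing.append(i)
        if not has_following:
            following_missing.append(i)
    return previous_missing, following_missing


# The timestep at 180 s is missing, so index 2 lacks a following
# timestep and index 3 lacks a previous one.
print(find_timesteps_with_missing_neighbours([0, 60, 120, 240, 300], 60))
```

Note that in this sketch the first and last timesteps are always reported, since they have no neighbour on one side.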

disdrodb.l0.l0c_processing.has_same_value_over_time(da)[source][source]#

Check if a DataArray has the same value over all timesteps, considering NaNs as equal.

Parameters:

da (xarray.DataArray) – The DataArray to check. Must have a ‘time’ dimension.

Returns:

True if the values are the same (or NaN in the same positions) across all timesteps, False otherwise.

Return type:

bool
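
The NaN-aware comparison described above can be sketched without xarray. This is only an illustration under the assumption that each timestep is a plain list of numbers; same_value_over_time is a hypothetical name, not the library function.

```python
import math


def _equal_or_both_nan(a, b):
    """Treat two NaN values as equal, unlike the default float comparison."""
    return a == b or (math.isnan(a) and math.isnan(b))


def same_value_over_time(rows):
    """Check that every timestep (row) matches the first one."""
    first = rows[0]
    return all(
        len(row) == len(first)
        and all(_equal_or_both_nan(a, b) for a, b in zip(row, first))
        for row in rows[1:]
    )


nan = float("nan")
print(same_value_over_time([[1.0, nan], [1.0, nan]]))  # NaN in the same positions
print(same_value_over_time([[1.0, nan], [1.0, 2.0]]))  # NaN replaced by a value
```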

disdrodb.l0.l0c_processing.nearest_expected_times(times, expected_times)[source][source]#

Return index of nearest expected time.
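
One common way to find the index of the nearest expected time is a binary search. The sketch below uses the standard-library bisect module on integer seconds; it is an assumption about the technique, not a copy of the DISDRODB code, and ties are resolved towards the earlier expected time by choice.

```python
from bisect import bisect_left


def nearest_expected_index(times, expected_times):
    """Return, for each time, the index of the closest expected time.

    `expected_times` must be sorted in ascending order.
    """
    indices = []
    for t in times:
        pos = bisect_left(expected_times, t)
        if pos == 0:
            indices.append(0)
        elif pos == len(expected_times):
            indices.append(len(expected_times) - 1)
        else:
            before = expected_times[pos - 1]
            after = expected_times[pos]
            # Ties go to the earlier expected time.
            indices.append(pos - 1 if t - before <= after - t else pos)
    return indices


expected = [0, 60, 120, 180]
print(nearest_expected_index([2, 58, 95, 400], expected))
```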

disdrodb.l0.l0c_processing.regularize_timesteps(ds, sample_interval, robust=False, add_quality_flag=True, logger=None, verbose=True)[source][source]#

Ensure timesteps match with the sample_interval.

This function:

  • drops dataset indices with duplicated timesteps,

  • does not add missing timesteps to the dataset.
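
A minimal sketch of this behaviour, assuming that regularization means rounding each timestep to the nearest multiple of sample_interval, is shown below. The rounding rule, the function name and the integer-second inputs are assumptions of the example; the robust and add_quality_flag options are not modelled.

```python
def regularize(timesteps, sample_interval):
    """Snap timesteps to multiples of sample_interval and drop duplicates.

    Returns the indices that were kept and the regularized times.
    Missing timesteps are not filled in.
    """
    seen = set()
    kept_indices = []
    regular_times = []
    for index, t in enumerate(timesteps):
        # Round to the nearest multiple of sample_interval (ties round up).
        rounded = (t + sample_interval // 2) // sample_interval * sample_interval
        if rounded in seen:
            continue  # duplicated timestep after rounding: drop this index
        seen.add(rounded)
        kept_indices.append(index)
        regular_times.append(rounded)
    return kept_indices, regular_times


# 119 and 121 both snap to 120, so the second one is dropped;
# the gap between 120 and 300 is left as it is.
print(regularize([0, 61, 119, 121, 300], 60))
```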

disdrodb.l0.l0c_processing.remove_duplicated_timesteps(ds, ensure_variables_equality=True, logger=None, verbose=True)[source][source]#

Remove duplicated timesteps from an xarray dataset.

disdrodb.l0.l0c_processing.split_dataset_by_sampling_intervals(ds, measurement_intervals, min_sample_interval=10, min_block_size=5, time_is_end_interval=True)[source][source]#

Split a dataset into subsets where each subset has a consistent sampling interval.

Notes

  • Does not modify timesteps (regularization is left to regularize_timesteps).

  • Assumes no duplicated timesteps in the dataset.

  • If only one measurement interval is specified, no timestep-diff checks are performed.

  • If multiple measurement intervals are specified:
    • Raises an error if none of the expected intervals appear.

    • Splits where interval changes.

  • Segments shorter than min_block_size are discarded.

Parameters:
  • ds (xarray.Dataset) – The input dataset with a ‘time’ dimension.

  • measurement_intervals (list or array-like) – A list of possible primary sampling intervals (in seconds) that the dataset might have.

  • min_sample_interval (int, optional) – The minimum expected sampling interval in seconds. Defaults to 10s. This is used to deal with possible trailing seconds errors.

  • min_block_size (float, optional) – The minimum number of timesteps with a given sampling interval required for a segment to be kept; shorter segments are discarded. Defaults to 5 timesteps.

  • time_is_end_interval (bool) – Whether time refers to the end of the measurement interval. The default is True.

Returns:

A dictionary where keys are the identified sampling intervals (in seconds), and values are xarray.Datasets containing only data from those sampling intervals.

Return type:

dict[int, xr.Dataset]
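
The splitting logic listed in the notes can be sketched on a plain list of times in seconds. This is an illustrative approximation only: the function name is hypothetical, min_sample_interval is not modelled, and time is assumed to mark the end of the measurement interval.

```python
def split_by_sampling_interval(timesteps, measurement_intervals, min_block_size=5):
    """Group timesteps by the sampling interval of the block they belong to."""
    if len(measurement_intervals) == 1:
        # A single expected interval: no timestep-diff checks are performed.
        return {measurement_intervals[0]: list(timesteps)}

    diffs = [b - a for a, b in zip(timesteps, timesteps[1:])]
    if not any(diff in measurement_intervals for diff in diffs):
        raise ValueError("None of the expected intervals appear in the data.")

    # Assumption: time marks the end of the measurement interval, so each
    # timestep takes the difference from the previous one; the first
    # timestep borrows the interval of the second.
    step_intervals = [diffs[0], *diffs]

    # Collect runs of consecutive timesteps sharing the same interval.
    runs = []
    for time, interval in zip(timesteps, step_intervals):
        if runs and runs[-1][0] == interval:
            runs[-1][1].append(time)
        else:
            runs.append((interval, [time]))

    # Keep only expected intervals and sufficiently long segments.
    result = {}
    for interval, block in runs:
        if interval in measurement_intervals and len(block) >= min_block_size:
            result.setdefault(interval, []).extend(block)
    return result


times = [0, 60, 120, 180, 240, 300, 330, 360, 390]
print(split_by_sampling_interval(times, [30, 60]))
print(split_by_sampling_interval(times, [30, 60], min_block_size=3))
```

With the default min_block_size of 5, the three 30-second timesteps at the end are discarded; lowering the threshold to 3 keeps them under the key 30.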

disdrodb.l0.standards module#

Retrieve L0 sensor standards.

disdrodb.l0.standards.allowed_l0_variables(sensor_name: str) list[source][source]#

Get the list of allowed L0 variables for a given sensor.

disdrodb.l0.standards.get_bin_coords_dict(sensor_name: str) dict[source][source]#

Retrieve diameter (and velocity) bin coordinates.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with coordinates arrays.

Return type:

dict

disdrodb.l0.standards.get_data_format_dict(sensor_name: str) dict[source][source]#

Get a dictionary containing the data format of each sensor variable.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Data format of each sensor variable.

Return type:

dict

disdrodb.l0.standards.get_data_range_dict(sensor_name: str) dict[source][source]#

Get the variable data range.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected data value range for each data field. It excludes variables without specified data_range key.

Return type:

dict

disdrodb.l0.standards.get_diameter_bin_center(sensor_name: str) list[source][source]#

Get diameter bin center.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Diameter bin center.

Return type:

list

disdrodb.l0.standards.get_diameter_bin_lower(sensor_name: str) list[source][source]#

Get diameter bin lower bound.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Diameter bin lower bound.

Return type:

list

disdrodb.l0.standards.get_diameter_bin_upper(sensor_name: str) list[source][source]#

Get diameter bin upper bound.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Diameter bin upper bound.

Return type:

list

disdrodb.l0.standards.get_diameter_bin_width(sensor_name: str) list[source][source]#

Get diameter bin width.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Diameter bin width.

Return type:

list

disdrodb.l0.standards.get_diameter_bins_dict(sensor_name: str) dict[source][source]#

Get dictionary with sensor_name diameter bins information.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Sensor diameter bins information.

Return type:

dict

disdrodb.l0.standards.get_dims_size_dict(sensor_name: str) dict[source][source]#

Get the number of bins for each dimension.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the number of bins for each dimension.

Return type:

dict

disdrodb.l0.standards.get_field_nchar_dict(sensor_name: str) dict[source][source]#

Get the total number of characters from the instrument default string standards.

Important note: the count also includes the comma and the minus sign.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected number of characters for each data field.

Return type:

dict

disdrodb.l0.standards.get_field_ndigits_decimals_dict(sensor_name: dict) dict[source][source]#

Get number of digits on the right side of the comma from the instrument default string standards.

Example: in 123,45 the decimal part is 45, giving 2 decimal digits.

Parameters:

sensor_name (dict) – Name of the sensor.

Returns:

Dictionary with the expected number of decimal digits for each data field.

Return type:

dict

disdrodb.l0.standards.get_field_ndigits_dict(sensor_name: str) dict[source][source]#

Get number of digits from the instrument default string standards.

Important note: the count excludes the comma but includes the minus sign.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected number of digits for each data field.

Return type:

dict

disdrodb.l0.standards.get_field_ndigits_natural_dict(sensor_name: str) dict[source][source]#

Get number of digits on the left side of the comma from the instrument default string standards.

Example: in 123,45 the natural part is 123, giving 3 natural digits.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected number of natural digits for each data field.

Return type:

dict
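
The four counting conventions described above (characters, digits, natural digits and decimal digits) can be compared side by side with a small sketch. The helper describe_field is hypothetical, and counting only numeric characters for the natural part of a negative value is an assumption of this example.

```python
def describe_field(string, separator=","):
    """Count characters and digits of a raw field such as '123,45'."""
    natural_part, _, decimal_part = string.partition(separator)
    return {
        # Everything, including the comma and the minus sign.
        "n_characters": len(string),
        # The comma is excluded, the minus sign is counted.
        "n_digits": len(string.replace(separator, "")),
        # Digits on the left side of the comma (sign not counted here).
        "n_naturals": sum(ch.isdigit() for ch in natural_part),
        # Digits on the right side of the comma.
        "n_decimals": sum(ch.isdigit() for ch in decimal_part),
    }


print(describe_field("123,45"))
print(describe_field("-123,45"))
```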

disdrodb.l0.standards.get_l0a_dtype(sensor_name: str) dict[source][source]#

Get a dictionary containing the L0A dtype.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the L0A dtype.

Return type:

dict

disdrodb.l0.standards.get_l0a_encodings_dict(sensor_name: str) dict[source][source]#

Get a dictionary containing the L0A encodings.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

L0A encodings.

Return type:

dict

disdrodb.l0.standards.get_l0b_cf_attrs_dict(sensor_name: str) dict[source][source]#

Get a dictionary containing the CF attributes of each sensor variable.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

CF attributes of each sensor variable. For each variable, the ‘units’, ‘description’, and ‘long_name’ attributes are specified.

Return type:

dict

disdrodb.l0.standards.get_l0b_encodings_dict(sensor_name: str) dict[source][source]#

Get a dictionary containing the encoding to write L0B netCDFs.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Encoding to write L0B netCDFs.

Return type:

dict

disdrodb.l0.standards.get_n_diameter_bins(sensor_name)[source][source]#

Get the number of diameter bins.

disdrodb.l0.standards.get_n_velocity_bins(sensor_name)[source][source]#

Get the number of velocity bins.

disdrodb.l0.standards.get_nan_flags_dict(sensor_name: str) dict[source][source]#

Get the variable nan_flags.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected nan_flags list for each data field. It excludes variables without specified nan_flags key.

Return type:

dict

disdrodb.l0.standards.get_raw_array_dims_order(sensor_name: str) dict[source][source]#

Get the dimension order of the raw fields.

The order of dimension specified for raw_drop_number controls the reshaping of the precipitation raw spectrum.

Examples

  • OTT Parsivel spectrum [d1v1 … d32v1, d1v2, …, d32v2] (diameter increases first) –> dimension_order = [“velocity_bin_center”, “diameter_bin_center”]

  • Thies LPM spectrum [v1d1 … v20d1, v1d2, …, v20d2] (velocity increases first) –> dimension_order = [“diameter_bin_center”, “velocity_bin_center”]

  • PWS 100 spectrum [d1v1 … d1v34, d2v1, …, d2v34] (velocity increases first) –> dimension_order = [“diameter_bin_center”, “velocity_bin_center”]

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dimension order dictionary.

Return type:

dict
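
To see why the dimension that increases first is listed last in dimension_order, consider a row-major reshape of a small flattened spectrum. The sketch below uses nested lists and tiny bin counts; reshape_spectrum is a hypothetical helper, not part of DISDRODB.

```python
def reshape_spectrum(flat, n_diameter, n_velocity, dimension_order):
    """Reshape a flat raw spectrum into rows following dimension_order.

    In a row-major reshape the last dimension varies fastest, so the
    dimension that increases first in the raw string must come last.
    """
    sizes = {
        "diameter_bin_center": n_diameter,
        "velocity_bin_center": n_velocity,
    }
    n_rows = sizes[dimension_order[0]]
    n_cols = sizes[dimension_order[1]]
    if len(flat) != n_rows * n_cols:
        raise ValueError("Unexpected number of values in the raw spectrum.")
    return [flat[row * n_cols:(row + 1) * n_cols] for row in range(n_rows)]


# Parsivel-like ordering with 3 diameter bins and 2 velocity bins:
# the diameter index increases first.
flat = ["d1v1", "d2v1", "d3v1", "d1v2", "d2v2", "d3v2"]
order = ["velocity_bin_center", "diameter_bin_center"]
for row in reshape_spectrum(flat, 3, 2, order):
    print(row)
```

Each printed row then holds all diameter bins for one velocity bin, which matches the Parsivel example above.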

disdrodb.l0.standards.get_raw_array_nvalues(sensor_name: str) dict[source][source]#

Get a dictionary with the number of values expected for each raw array.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the number of values expected for each raw array.

Return type:

dict

disdrodb.l0.standards.get_sensor_logged_variables(sensor_name: str) list[source][source]#

Get the sensor logged variables list.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

List of the variables logged by the sensor.

Return type:

list

disdrodb.l0.standards.get_valid_coordinates_names(sensor_name)[source][source]#

Get list of valid coordinates for DISDRODB L0B.

disdrodb.l0.standards.get_valid_dimension_names(sensor_name)[source][source]#

Get list of valid dimension names for DISDRODB L0B.

disdrodb.l0.standards.get_valid_names(sensor_name)[source][source]#

Return the list of valid variable and coordinates names for DISDRODB L0B.

disdrodb.l0.standards.get_valid_values_dict(sensor_name: str) dict[source][source]#

Get the list of valid values for a variable.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Dictionary with the expected values for specific variables. It excludes variables without specified valid_values key.

Return type:

dict

disdrodb.l0.standards.get_valid_variable_names(sensor_name)[source][source]#

Get list of valid variables.

disdrodb.l0.standards.get_variables_dimension(sensor_name: str)[source][source]#

Returns a dictionary with the variable dimensions of a L0B product.

disdrodb.l0.standards.get_velocity_bin_center(sensor_name: str) list[source][source]#

Get velocity bin center.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Velocity bin center.

Return type:

list

disdrodb.l0.standards.get_velocity_bin_lower(sensor_name: str) list[source][source]#

Get velocity bin lower bound.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Velocity bin lower bound.

Return type:

list

disdrodb.l0.standards.get_velocity_bin_upper(sensor_name: str) list[source][source]#

Get velocity bin upper bound.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Velocity bin upper bound.

Return type:

list

disdrodb.l0.standards.get_velocity_bin_width(sensor_name: str) list[source][source]#

Get velocity bin width.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Velocity bin width.

Return type:

list

disdrodb.l0.standards.get_velocity_bins_dict(sensor_name: str) dict[source][source]#

Get dictionary with sensor_name velocity bins information.

Parameters:

sensor_name (str) – Name of the sensor.

Returns:

Sensor velocity bins information.

Return type:

dict

disdrodb.l0.template_tools module#

Useful tools helping in the implementation of the DISDRODB L0 readers.

disdrodb.l0.template_tools.check_column_names(column_names: list, sensor_name: str) None[source][source]#

Check that the column names respect DISDRODB standards.

Parameters:
  • column_names (list) – List of column names.

  • sensor_name (str) – Name of the sensor.

Raises:

TypeError – Error if some columns do not meet the DISDRODB standards.

disdrodb.l0.template_tools.get_decimal_ndigits(string: str) int[source][source]#

Get the number of decimal digits.

Parameters:

string (str) – Input string.

Returns:

The number of decimal digits.

Return type:

int

disdrodb.l0.template_tools.get_df_columns_unique_values_dict(df: DataFrame, column_indices: int | slice | list | None = None, column_names: bool = True)[source][source]#

Create a dictionary {column: unique values}.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • column_indices (Union[int,slice,list], optional) – Column indices. If None, select all columns.

  • column_names (bool, optional) – If True, the dictionary keys are the column names. The default value is True.

disdrodb.l0.template_tools.get_natural_ndigits(string: str) int[source][source]#

Get the number of natural digits.

Parameters:

string (str) – Input string.

Returns:

The number of natural digits.

Return type:

int

disdrodb.l0.template_tools.get_nchar(string: str) int[source][source]#

Get the number of characters.

Parameters:

string (str) – Input string.

Returns:

The number of characters.

Return type:

int

disdrodb.l0.template_tools.get_ndigits(string: str) int[source][source]#

Get the number of total numeric digits.

Parameters:

string (str) – Input string.

Returns:

The number of total digits.

Return type:

int

disdrodb.l0.template_tools.get_unique_sorted_values(array)[source][source]#

Return unique sorted values.

It deals with np.nan within an array of strings by converting object dtype to str.
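
The reason for the string conversion is that a float NaN cannot be ordered against strings. The pure-Python sketch below illustrates the idea with a fallback to str; it is an assumption about the approach and does not use NumPy, unlike the actual function.

```python
def unique_sorted(values):
    """Return the unique values in sorted order.

    Mixed str/float content (e.g. strings plus NaN) cannot be compared,
    so in that case every value is converted to str before sorting.
    """
    try:
        return sorted(set(values))
    except TypeError:
        return sorted({str(value) for value in values})


print(unique_sorted([3, 1, 3]))
print(unique_sorted(["b", "a", float("nan"), "a"]))
```

In the second call the NaN ends up as the string "nan" in the result.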

disdrodb.l0.template_tools.infer_column_names(df: DataFrame, sensor_name: str, row_idx: int = 0)[source][source]#

Try to guess the dataframe column names based on string characteristics.

Parameters:
  • df (pandas.DataFrame) – The dataframe to analyse.

  • sensor_name (str) – Name of the sensor.

  • row_idx (int, optional) – The row index of the dataframe to use to infer the column names. The default row index is 0.

Returns:

Dictionary with the keys being the column id and the values being the guessed column names.

Return type:

dict

disdrodb.l0.template_tools.print_allowed_column_names(sensor_name: str) None[source][source]#

Print valid column names from the standard.

Parameters:

sensor_name (str) – Name of the sensor.

disdrodb.l0.template_tools.print_df_column_names(df: DataFrame) None[source][source]#

Print dataframe column names.

Parameters:

df (pandas.DataFrame) – The dataframe.

disdrodb.l0.template_tools.print_df_columns_unique_values(df: DataFrame, column_indices: int | slice | list | None = None, print_column_names: bool = True) None[source][source]#

Print columns’ unique values.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • column_indices (Union[int,slice,list], optional) – Column indices. If None, select all columns.

  • print_column_names (bool, optional) – If True, print the column names. The default value is True.

disdrodb.l0.template_tools.print_df_first_n_rows(df: DataFrame, n: int = 5, print_column_names: bool = True) None[source][source]#

Print the first n rows of the dataframe by column.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • n (int, optional) – Number of rows. The default is 5.

  • print_column_names (bool, optional) – If True, print the column names. The default value is True.

disdrodb.l0.template_tools.print_df_random_n_rows(df: DataFrame, n: int = 5, print_column_names: bool = True) None[source][source]#

Print the content of the dataframe by column, randomly chosen.

Parameters:
  • df (pandas.DataFrame) – The dataframe.

  • n (int, optional) – The number of rows to print. The default is 5.

  • print_column_names (bool, optional) – If true, print the column names. The default value is True.

disdrodb.l0.template_tools.print_df_summary_stats(df: DataFrame, column_indices: int | slice | list | None = None, print_column_names: bool = True)[source][source]#

Create a columns statistics summary.

Parameters:
  • df (pandas.DataFrame) – Input dataframe.

  • column_indices (Union[int,slice,list], optional) – Column indices. If None, select all columns.

  • print_column_names (bool, optional) – If True, print the column names. The default value is True.

Raises:

ValueError – Error if the column types are not numeric.

disdrodb.l0.template_tools.print_df_with_any_nan_rows(df: DataFrame) None[source][source]#

Print rows containing at least one NaN value.

Parameters:

df (pandas.DataFrame) – Input dataframe.

disdrodb.l0.template_tools.str_has_decimal_digits(string: str) bool[source][source]#

Check if a string has decimals.

Parameters:

string (str) – Input string.

Returns:

True if the string has decimal digits.

Return type:

bool

disdrodb.l0.template_tools.str_is_integer(string: str) bool[source][source]#

Check if a string represents an integer.

Parameters:

string (str) – Input string.

Returns:

True if integer.

Return type:

bool

disdrodb.l0.template_tools.str_is_number(string: str) bool[source][source]#

Check if a string represents a number.

Parameters:

string (str) – Input string.

Returns:

True if the string represents a number.

Return type:

bool
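
The three string checks above can be approximated with the built-in conversions. The sketch below is a guess at reasonable behaviour, not the DISDRODB implementation: the function names are hypothetical, a dot is assumed as decimal separator, and float() also accepts strings such as "nan" or "inf".

```python
def is_number(string):
    """Return True if the string can be parsed as a float."""
    try:
        float(string)
    except ValueError:
        return False
    return True


def is_integer(string):
    """Return True if the string can be parsed as an integer."""
    try:
        int(string)
    except ValueError:
        return False
    return True


def has_decimal_digits(string):
    """Return True if at least one digit follows a decimal point."""
    _, separator, decimals = string.partition(".")
    return separator == "." and decimals.isdigit()


for value in ["12", "12.5", "abc"]:
    print(value, is_number(value), is_integer(value), has_decimal_digits(value))
```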

Module contents#

DISDRODB L0 software.

disdrodb.l0.available_readers(sensor_name, data_sources=None, return_path=False)[source][source]#

Retrieve available readers information.

disdrodb.l0.generate_l0a(filepaths: list | str, reader, sensor_name, issue_dict=None, verbose=True, logger=None) DataFrame[source][source]#

Read and parse a list of raw files and generate a DISDRODB L0A dataframe.

Parameters:
  • filepaths (Union[list,str]) – File path(s).

  • reader – DISDRODB reader function. Format: reader(filepath, logger=None)

  • sensor_name (str) – Name of the sensor.

  • issue_dict (dict, optional) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are 'timesteps' and 'time_periods'. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.

  • verbose (bool) – Whether to print detailed processing information. The default is True.

Returns:

DISDRODB L0A dataframe.

Return type:

pandas.DataFrame

Raises:

ValueError – Input parameters cannot be used or the raw file cannot be processed.
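
To make the role of issue_dict more concrete, the sketch below shows one possible shape of the dictionary and how flagged timesteps could be filtered out. Several points are assumptions: standard-library datetime objects are used instead of datetime64 values, each entry of 'time_periods' is taken to be a (start, end) pair with inclusive bounds, and drop_flagged_timesteps is a hypothetical helper.

```python
from datetime import datetime, timedelta

# Hypothetical issue dictionary with the two documented keys.
issue_dict = {
    "timesteps": [datetime(2020, 1, 1, 0, 1, 0)],
    "time_periods": [
        (datetime(2020, 1, 1, 0, 3, 0), datetime(2020, 1, 1, 0, 4, 0)),
    ],
}


def drop_flagged_timesteps(times, issue_dict):
    """Remove single timesteps and whole periods listed in issue_dict."""
    bad_timesteps = set(issue_dict.get("timesteps", []))
    periods = issue_dict.get("time_periods", [])
    return [
        t
        for t in times
        if t not in bad_timesteps
        and not any(start <= t <= end for start, end in periods)
    ]


start = datetime(2020, 1, 1, 0, 0, 0)
times = [start + timedelta(minutes=i) for i in range(6)]
kept = drop_flagged_timesteps(times, issue_dict)
print([t.strftime("%H:%M") for t in kept])
```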

disdrodb.l0.generate_l0b(df: DataFrame, metadata: dict, logger=None, verbose: bool = False) Dataset[source][source]#

Transform the DISDRODB L0A dataframe to the DISDRODB L0B xr.Dataset.

Parameters:
  • df (pandas.DataFrame) – DISDRODB L0A dataframe. The raw drop number spectrum is reshaped to a 2D(+time) array. The raw drop concentration and velocity are reshaped to 1D(+time) arrays.

  • metadata (dict) – DISDRODB station metadata. To use this function outside the DISDRODB routines, the dictionary must contain the fields: sensor_name, latitude, longitude, altitude, platform_type.

  • verbose (bool, optional) – Whether to print detailed processing information. The default value is False.

Returns:

DISDRODB L0B dataset.

Return type:

xarray.Dataset

Raises:

ValueError – Error if the DISDRODB L0B xarray dataset cannot be created.

disdrodb.l0.generate_l0b_from_nc(filepaths: list | str, reader, sensor_name, metadata, issue_dict=None, verbose=True, logger=None)[source][source]#

Read and parse a list of raw netCDF files and generate a DISDRODB L0B dataset.

Parameters:
  • filepaths (Union[list,str]) – File path(s).

  • reader – DISDRODB reader function. Format: reader(filepath, logger=None)

  • sensor_name (str) – Name of the sensor.

  • metadata (dict) – Station metadata to attach as global attributes to the xr.Dataset.

  • issue_dict (dict, optional) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are 'timesteps' and 'time_periods'. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.

  • verbose (bool) – Whether to print detailed processing information. The default is True.

Returns:

DISDRODB L0B Dataset.

Return type:

xarray.Dataset

Raises:

ValueError – Input parameters cannot be used or the raw file cannot be processed.

disdrodb.l0.get_reader(reader_reference, sensor_name)[source][source]#

Retrieve the reader function.

Parameters:
  • reader_reference (str) – The reader reference name. The reader is located at disdrodb.l0.readers.{sensor_name}.{reader_reference}. The reader_reference naming convention is "{DATA_SOURCE}"/"{CAMPAIGN_NAME}_{OPTIONAL_SUFFIX}".

  • sensor_name (str) – The sensor name.

Returns:

The reader() function.

Return type:

callable

disdrodb.l0.get_station_reader(data_source, campaign_name, station_name, metadata_archive_dir=None)[source][source]#

Retrieve the reader function of a specific DISDRODB station.