disdrodb.l0 package#
Subpackages#
Submodules#
disdrodb.l0.check_configs module#
Check configuration files.
- class disdrodb.l0.check_configs.L0BEncodingSchema(*, contiguous: bool, dtype: str, zlib: bool, complevel: int, shuffle: bool, fletcher32: bool, _FillValue: int | float | None = None, chunksizes: int | list[int] | None, add_offset: float | None = None, scale_factor: float | None = None)[source][source]#
Bases:
CustomBaseModel
Pydantic model for DISDRODB netCDF encodings.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- classmethod check_contiguous_and_fletcher32(values)[source][source]#
Check the validity of the fletcher32 value.
- classmethod check_contiguous_and_zlib(values)[source][source]#
Check the validity of the compression value.
- classmethod check_integer_fillvalue(values)[source][source]#
Check that integer dtypes have valid _FillValue.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'hide_error_urls': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
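The two `check_contiguous_and_*` validators suggest that `contiguous` constrains `zlib` and `fletcher32`. The following is a minimal sketch of such a cross-field check, not the library's implementation. It assumes that contiguous storage cannot be combined with compression or with the fletcher32 checksum. `EncodingSketch` and `check_encoding` are hypothetical names, and a plain dataclass stands in for the pydantic model.

```python
from dataclasses import dataclass


@dataclass
class EncodingSketch:
    """Hypothetical stand-in for three of the L0BEncodingSchema fields."""

    contiguous: bool
    zlib: bool
    fletcher32: bool


def check_encoding(enc: EncodingSketch) -> None:
    # Assumption: contiguous storage is incompatible with compression and
    # with the fletcher32 checksum, because both rely on chunked storage.
    if enc.contiguous and enc.zlib:
        raise ValueError("'zlib' must be False when 'contiguous' is True.")
    if enc.contiguous and enc.fletcher32:
        raise ValueError("'fletcher32' must be False when 'contiguous' is True.")


# A chunked, compressed encoding passes the checks silently.
check_encoding(EncodingSketch(contiguous=False, zlib=True, fletcher32=False))
```

In the real schema, checks of this kind run when the model is instantiated and an invalid combination surfaces as a validation error.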
- class disdrodb.l0.check_configs.RawDataFormatSchema(*, n_digits: int | None, n_characters: int | None, n_decimals: int | None, n_naturals: int | None, data_range: list[float] | None, nan_flags: int | float | str | None = None, valid_values: list[float] | None = None, dimension_order: list[str] | None = None, n_values: int | None = None, field_number: str | None = None)[source][source]#
Bases:
CustomBaseModel
Pydantic model for the DISDRODB RAW Data Format YAML files.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'hide_error_urls': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- disdrodb.l0.check_configs.check_all_sensors_configs() None[source][source]#
Check all sensors configuration YAML files.
- disdrodb.l0.check_configs.check_bin_consistency(sensor_name: str) None[source][source]#
Check bin consistency from config file.
The first and last bins are not checked.
- Parameters:
sensor_name (str) – Name of the sensor.
- disdrodb.l0.check_configs.check_cf_attributes(sensor_name: str) None[source][source]#
Check that the l0b_cf_attrs.yml description, long_name and units values are strings.
- Parameters:
sensor_name (str) – Name of the sensor.
- disdrodb.l0.check_configs.check_l0a_encoding(sensor_name: str) None[source][source]#
Check the l0a_encodings.yml file.
- Parameters:
sensor_name (str) – Name of the sensor.
- Raises:
ValueError – Error raised if the value of a key is not in the list of accepted values.
- disdrodb.l0.check_configs.check_l0b_encoding(sensor_name: str) None[source][source]#
Check the l0b_encodings.yml file based on the schema defined in the class L0BEncodingSchema.
- Parameters:
sensor_name (str) – Name of the sensor.
- disdrodb.l0.check_configs.check_raw_array(sensor_name: str) None[source][source]#
Check raw array consistency from config file.
- Parameters:
sensor_name (str) – Name of the sensor.
- Raises:
ValueError – Error if the chunksizes are not consistent.
- disdrodb.l0.check_configs.check_raw_data_format(sensor_name: str) None[source][source]#
Check the raw_data_format.yml file based on the schema defined in the class RawDataFormatSchema.
- Parameters:
sensor_name (str) – Name of the sensor.
- disdrodb.l0.check_configs.check_sensor_configs(sensor_name: str) None[source][source]#
Check validity of sensor configuration YAML files.
- Parameters:
sensor_name (str) – Name of the sensor.
- disdrodb.l0.check_configs.check_variable_consistency(sensor_name: str) None[source][source]#
Check variable consistency across config files.
The variables specified within l0b_encodings.yml must also be defined in the other config files. The raw_data_format.yml file can contain some extra variables.
- Parameters:
sensor_name (str) – Name of the sensor.
- Raises:
ValueError – If the keys are not consistent.
disdrodb.l0.check_standards module#
Check data standards.
- disdrodb.l0.check_standards.check_l0a_column_names(df: DataFrame, sensor_name: str) None[source][source]#
Checks that the dataframe columns respect the DISDRODB standards.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
- Raises:
ValueError – Error if some columns do not meet the DISDRODB standards or if the 'time' column is missing in the dataframe.
- disdrodb.l0.check_standards.check_l0a_standards(df: DataFrame, sensor_name: str, logger=None, verbose: bool = True) None[source][source]#
Checks that a file respects the DISDRODB L0A standards.
- Parameters:
df (pandas.DataFrame) – L0A dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool, optional) – Whether to print detailed processing information. The default value is True.
- Raises:
ValueError – Error if some columns have inconsistent values.
disdrodb.l0.l0_reader module#
Define DISDRODB L0 readers routines.
- disdrodb.l0.l0_reader.available_readers(sensor_name, data_sources=None, return_path=False)[source][source]#
Retrieve available readers information.
- disdrodb.l0.l0_reader.check_metadata_reader(metadata)[source][source]#
Check that the metadata reader key is available and points to an existing disdrodb reader.
- disdrodb.l0.l0_reader.check_reader_arguments(reader)[source][source]#
Check that the reader function has the expected input arguments.
- disdrodb.l0.l0_reader.check_reader_exists(reader_reference, sensor_name)[source][source]#
Check the reader exists.
- disdrodb.l0.l0_reader.check_reader_reference(reader_reference)[source][source]#
Check the reader_reference value.
- disdrodb.l0.l0_reader.check_software_readers()[source][source]#
Check the validity of all readers included in the disdrodb software.
- disdrodb.l0.l0_reader.define_reader_path(sensor_name, reader_reference)[source][source]#
Define the reader path based on the reader reference name.
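As an illustration only, the sketch below maps a reader reference to a file path. It assumes a reference of the form `DATA_SOURCE/CAMPAIGN_NAME` and a directory layout of `<readers_dir>/<sensor_name>/<DATA_SOURCE>/<CAMPAIGN_NAME>.py`. The helper name `sketch_reader_path`, the root directory and the `MY_SOURCE/MY_CAMPAIGN` reference are hypothetical and are not taken from the library.

```python
from pathlib import PurePosixPath


def sketch_reader_path(readers_dir: str, sensor_name: str, reader_reference: str) -> PurePosixPath:
    # Assumption: the reference has exactly two non-empty parts separated by "/".
    parts = reader_reference.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError("Expected a reference of the form 'DATA_SOURCE/CAMPAIGN_NAME'.")
    data_source, reader_name = parts
    # Assumed layout: one sub-directory per sensor, then one per data source.
    return PurePosixPath(readers_dir) / sensor_name / data_source / f"{reader_name}.py"


path = sketch_reader_path("disdrodb/l0/readers", "PARSIVEL", "MY_SOURCE/MY_CAMPAIGN")
print(path)  # disdrodb/l0/readers/PARSIVEL/MY_SOURCE/MY_CAMPAIGN.py
```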
- disdrodb.l0.l0_reader.define_readers_directory(sensor_name='') str[source][source]#
Returns the path to the disdrodb.l0.readers directory within the disdrodb package.
- disdrodb.l0.l0_reader.get_reader(reader_reference, sensor_name)[source][source]#
Retrieve the reader function.
- Parameters:
- Returns:
The reader() function.
- Return type:
callable
- disdrodb.l0.l0_reader.get_reader_from_metadata(metadata)[source][source]#
Retrieve the reader function based on the metadata information.
The reader_reference naming convention is "{DATA_SOURCE}"/"{CAMPAIGN_NAME}_{OPTIONAL_SUFFIX}". The reader is located at disdrodb.l0.readers.{sensor_name}.{reader_reference}.
- disdrodb.l0.l0_reader.get_specific_readers_path(sensor_name)[source][source]#
Returns a dictionary with the file paths of the available readers for each data source.
- disdrodb.l0.l0_reader.get_specific_readers_references(sensor_name)[source][source]#
Returns a dictionary with the readers references available for each data source.
- disdrodb.l0.l0_reader.get_station_reader(data_source, campaign_name, station_name, metadata_archive_dir=None)[source][source]#
Retrieve the reader function of a specific DISDRODB station.
- disdrodb.l0.l0_reader.is_documented_by(original)[source][source]#
Wrapper function to apply generic docstring to the decorated function.
- Parameters:
original (function) – Function to take the docstring from.
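A docstring-sharing decorator of this kind can be sketched as follows. This is a generic pattern, not necessarily how the library implements it. The names `copy_docstring_from`, `generic_docstring` and `reader` are hypothetical.

```python
def copy_docstring_from(original):
    """Return a decorator that copies the docstring of ``original``."""

    def wrapper(target):
        # Overwrite the decorated function's docstring and return it unchanged otherwise.
        target.__doc__ = original.__doc__
        return target

    return wrapper


def generic_docstring():
    """Shared description for every reader."""


@copy_docstring_from(generic_docstring)
def reader(filepath, logger=None):
    return filepath


print(reader.__doc__)  # Shared description for every reader.
```

The benefit is that many reader functions can share one description without repeating it in each file.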
- disdrodb.l0.l0_reader.list_readers_paths(sensor_name) list[source][source]#
Returns the file paths of the available readers for a given sensor in disdrodb.l0.readers.{sensor_name}.
- disdrodb.l0.l0_reader.list_readers_references(sensor_name)[source][source]#
Returns the readers references available for a given sensor in disdrodb.l0.readers.{sensor_name}.
- disdrodb.l0.l0_reader.reader_generic_docstring()[source][source]#
Reader to convert a raw data file to DISDRODB L0A or L0B format.
Raw text files are read and converted to a pandas.DataFrame (L0A format). Raw netCDF files are read and converted to an xarray.Dataset (L0B format).
- Parameters:
filepath (str) – Filepath of the raw data file to be processed.
logger (logging.Logger, optional) – Logger to use for logging messages. Default is None, which means no logger is used.
disdrodb.l0.l0a_processing module#
Functions to process raw text files into DISDRODB L0A Apache Parquet.
- disdrodb.l0.l0a_processing.cast_column_dtypes(df: DataFrame, sensor_name: str) DataFrame[source][source]#
Convert 'object' dataframe columns into DISDRODB L0A dtype standards.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
- Returns:
Dataframe with corrected columns types.
- Return type:
- disdrodb.l0.l0a_processing.check_matching_column_number(df, column_names)[source][source]#
Check the number of columns in the dataframe matches the length of column names.
- disdrodb.l0.l0a_processing.coerce_corrupted_values_to_nan(df: DataFrame, sensor_name: str) DataFrame[source][source]#
Coerce corrupted values in dataframe numeric columns to np.nan.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
- Returns:
Dataframe with string columns without corrupted values.
- Return type:
- disdrodb.l0.l0a_processing.concatenate_dataframe(list_df: list, logger=None, verbose: bool = False) DataFrame[source][source]#
Concatenate a list of dataframes.
- Parameters:
- Returns:
Concatenated dataframe.
- Return type:
- Raises:
ValueError – Concatenation can not be done.
- disdrodb.l0.l0a_processing.drop_time_periods(df, time_periods)[source][source]#
Drop problematic time periods.
- disdrodb.l0.l0a_processing.drop_timesteps(df, timesteps)[source][source]#
Drop problematic time steps.
- disdrodb.l0.l0a_processing.generate_l0a(filepaths: list | str, reader, sensor_name, issue_dict=None, verbose=True, logger=None) DataFrame[source][source]#
Read and parse a list of raw files and generate a DISDRODB L0A dataframe.
- Parameters:
reader – DISDRODB reader function. Format: reader(filepath, logger=None)
sensor_name (str) – Name of the sensor.
issue_dict (dict, optional) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are 'timesteps' and 'time_periods'. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.
verbose (bool) – Whether to print detailed processing information. The default is True.
- Returns:
Dataframe
- Return type:
- Raises:
ValueError – Input parameters can not be used or the raw file can not be processed.
- disdrodb.l0.l0a_processing.is_raw_array_string_not_corrupted(string)[source][source]#
Check if the raw array is corrupted.
- disdrodb.l0.l0a_processing.preprocess_reader_kwargs(reader_kwargs: dict) dict[source][source]#
Preprocess arguments required to read raw text file into Pandas.
- disdrodb.l0.l0a_processing.read_l0a_dataframe(filepaths: str | list, debugging_mode: bool = False) DataFrame[source][source]#
Read DISDRODB L0A Apache Parquet file(s).
- Parameters:
- Returns:
L0A Dataframe.
- Return type:
- disdrodb.l0.l0a_processing.read_raw_text_file(filepath: str, column_names: list, reader_kwargs: dict, logger=None) DataFrame[source][source]#
Read a raw file into a dataframe.
- Parameters:
filepath (str) – Raw file path.
column_names (list) – Column names.
reader_kwargs (dict) – Pandas pd.read_csv arguments.
logger (logging.Logger) – Logger object. The default is None. If None, the logger is created using the module name. If logger is passed, it will be used to log messages.
- Returns:
Pandas dataframe.
- Return type:
- disdrodb.l0.l0a_processing.remove_corrupted_rows(df)[source][source]#
Remove corrupted rows by checking conversion of raw fields to numeric.
Note: the raw array must first be stripped of delimiters at its start and end.
- disdrodb.l0.l0a_processing.remove_duplicated_timesteps(df: DataFrame, logger=None, verbose: bool = False)[source][source]#
Remove duplicated timesteps.
Only the first occurrence of each timestep is kept.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
verbose (bool) – Whether to print detailed processing information. The default is False.
- Returns:
Dataframe with valid unique timesteps.
- Return type:
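The keep-first rule for duplicated timesteps can be sketched without pandas. This is a simplified illustration on `(time, value)` tuples. The helper name `keep_first_occurrence` is hypothetical and the real function operates on a dataframe.

```python
def keep_first_occurrence(rows):
    # rows: list of (time, value) tuples in file order.
    seen = set()
    unique_rows = []
    for time, value in rows:
        if time in seen:
            continue  # later duplicate of a timestep that was already kept
        seen.add(time)
        unique_rows.append((time, value))
    return unique_rows


rows = [("00:00", 1), ("00:01", 2), ("00:00", 3)]
print(keep_first_occurrence(rows))  # [('00:00', 1), ('00:01', 2)]
```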
- disdrodb.l0.l0a_processing.remove_issue_timesteps(df, issue_dict, logger=None, verbose=False)[source][source]#
Drop dataframe rows with timesteps listed in the issue dictionary.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
issue_dict (dict) – Issue dictionary.
verbose (bool) – Whether to print detailed processing information. The default is False.
- Returns:
Dataframe with problematic timesteps removed.
- Return type:
- disdrodb.l0.l0a_processing.remove_rows_with_missing_time(df: ~pandas.core.frame.DataFrame, logger=<Logger disdrodb.l0.l0a_processing (WARNING)>, verbose: bool = False)[source][source]#
Remove dataframe rows where the "time" is NaT.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
verbose (bool) – Whether to print detailed processing information. The default is False.
- Returns:
Dataframe with valid timesteps.
- Return type:
- disdrodb.l0.l0a_processing.replace_nan_flags(df, sensor_name, logger=None, verbose=False)[source][source]#
Set values corresponding to nan_flags to np.nan.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information. The default is False.
- Returns:
Dataframe without nan_flags values.
- Return type:
- disdrodb.l0.l0a_processing.sanitize_df(df, sensor_name, verbose=True, issue_dict=None, logger=None)[source][source]#
Read and parse a raw text file into an L0A dataframe.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information. The default is True.
issue_dict (dict) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are 'timesteps' and 'time_periods'. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.
- Returns:
Dataframe
- Return type:
- disdrodb.l0.l0a_processing.set_nan_invalid_values(df, sensor_name, logger=None, verbose=False)[source][source]#
Set invalid (class) values to np.nan.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information. The default is False.
- Returns:
Dataframe without invalid values.
- Return type:
- disdrodb.l0.l0a_processing.set_nan_outside_data_range(df, sensor_name, logger=None, verbose=False)[source][source]#
Set values outside the data range as np.nan.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information. The default is False.
- Returns:
Dataframe without values outside the expected data range.
- Return type:
- disdrodb.l0.l0a_processing.strip_delimiter(string)[source][source]#
Remove the first and last delimiter occurrence from a string.
- disdrodb.l0.l0a_processing.strip_delimiter_from_raw_arrays(df)[source][source]#
Remove the first and last delimiter occurrence from the raw array fields.
- disdrodb.l0.l0a_processing.strip_string_spaces(df: DataFrame, sensor_name: str) DataFrame[source][source]#
Strip leading/trailing spaces from dataframe string columns.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
- Returns:
Dataframe with string columns without leading/trailing spaces.
- Return type:
- disdrodb.l0.l0a_processing.write_l0a(df: DataFrame, filepath: str, force: bool = False, logger=None, verbose: bool = False)[source][source]#
Save the dataframe into an Apache Parquet file.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
filepath (str) – Output file path.
force (bool, optional) – Whether to overwrite existing data. If True, overwrite existing data in the destination directories. If False (the default), raise an error if data already exist in the destination directories.
verbose (bool, optional) – Whether to print detailed processing information. The default is False.
- Raises:
ValueError – The input dataframe can not be written as an Apache Parquet file.
NotImplementedError – The input dataframe can not be processed.
disdrodb.l0.l0b_nc_processing module#
Functions to process DISDRODB raw netCDF files into DISDRODB L0B netCDF files.
- disdrodb.l0.l0b_nc_processing.add_dataset_missing_variables(ds, missing_vars, sensor_name)[source][source]#
Add missing xr.Dataset variables as np.nan xr.DataArrays.
- disdrodb.l0.l0b_nc_processing.drop_time_periods(ds, time_periods: list)[source][source]#
Drop all time steps within any of the specified time intervals.
- Parameters:
ds (xarray.Dataset) – Input dataset with a ‘time’ dimension.
time_periods (list of tuple) – Each tuple is (start_time, end_time), datetime-like, inclusive.
- Returns:
Dataset with all times within the given periods removed.
- Return type:
- Raises:
ValueError – If no timesteps remain after removal.
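The inclusive-interval rule and the error raised when nothing remains can be illustrated on a plain list of datetimes. This is a sketch under simplifying assumptions: the real function works on the 'time' dimension of an xarray.Dataset, and the helper name `drop_periods` is hypothetical.

```python
from datetime import datetime


def drop_periods(times, time_periods):
    # Keep a timestep only if it falls in none of the (start, end) intervals.
    # Both interval bounds are treated as inclusive.
    kept = [
        t for t in times
        if not any(start <= t <= end for start, end in time_periods)
    ]
    if not kept:
        raise ValueError("No timesteps left after removing the time periods.")
    return kept


times = [datetime(2024, 1, 1, 0, minute) for minute in range(5)]
periods = [(datetime(2024, 1, 1, 0, 1), datetime(2024, 1, 1, 0, 2))]
kept = drop_periods(times, periods)
print([t.minute for t in kept])  # [0, 3, 4]
```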
- disdrodb.l0.l0b_nc_processing.drop_timesteps(ds, timesteps: list)[source][source]#
Drop specific time steps from a Dataset.
- Parameters:
ds (xarray.Dataset) – Input dataset with a ‘time’ dimension.
timesteps (list) – List of datetime-like values to remove.
- Returns:
Dataset with specified timesteps removed.
- Return type:
- Raises:
ValueError – If no timesteps remain after removal.
- disdrodb.l0.l0b_nc_processing.generate_l0b_from_nc(filepaths: list | str, reader, sensor_name, metadata, issue_dict=None, verbose=True, logger=None)[source][source]#
Read and parse a list of raw netCDF files and generate a DISDRODB L0B dataset.
- Parameters:
reader – DISDRODB reader function. Format: reader(filepath, logger=None)
sensor_name (str) – Name of the sensor.
metadata (dict) – Station metadata to attach as global attributes to the xr.Dataset.
issue_dict (dict, optional) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are 'timesteps' and 'time_periods'. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.
verbose (bool) – Whether to print detailed processing information. The default is True.
- Returns:
DISDRODB L0B Dataset.
- Return type:
- Raises:
ValueError – Input parameters can not be used or the raw file can not be processed.
- disdrodb.l0.l0b_nc_processing.open_raw_netcdf_file(filepath, logger=None, engine='netcdf4', cache=False, chunks=None, decode_timedelta=False, **kwargs)[source][source]#
Open a raw netCDF file.
- Parameters:
filepath (str) – Path to the raw netCDF file.
- Returns:
Raw netCDF file as an xarray Dataset.
- Return type:
- disdrodb.l0.l0b_nc_processing.remove_issue_timesteps(ds, issue_dict: dict, logger=None, verbose: bool = False)[source][source]#
Remove bad timesteps and time periods from an xarray Dataset according to issue definitions.
- Parameters:
ds (xarray.Dataset) – Input dataset with a ‘time’ dimension.
issue_dict (dict) – Dictionary with optional keys ‘timesteps’ (list of datetimes) and ‘time_periods’ (list of (start, end) tuples).
logger (any, optional) – Logger instance to record dropped steps, by default None.
verbose (bool, optional) – Whether to log informational messages, by default False.
- Returns:
Cleaned dataset.
- Return type:
- Raises:
ValueError – If after removing specified timesteps/periods no data remains.
- disdrodb.l0.l0b_nc_processing.rename_dataset(ds, dict_names)[source][source]#
Rename xr.Dataset variables, coordinates and dimensions.
- disdrodb.l0.l0b_nc_processing.replace_custom_nan_flags(ds, dict_nan_flags, logger=None, verbose=False)[source][source]#
Set values corresponding to nan_flags to np.nan.
This function must be used in a reader, if necessary.
- Parameters:
ds (xarray.Dataset) – Input xarray dataset.
dict_nan_flags (dict) – Dictionary with nan flags value to set as np.nan.
verbose (bool) – Whether to print detailed processing information. The default value is False.
- Returns:
Dataset without nan_flags values.
- Return type:
- disdrodb.l0.l0b_nc_processing.replace_nan_flags(ds, sensor_name, verbose, logger=None)[source][source]#
Set values corresponding to nan_flags to np.nan.
- Parameters:
ds (xarray.Dataset) – Input xarray dataset.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns:
Dataset without nan_flags values.
- Return type:
- disdrodb.l0.l0b_nc_processing.sanitize_ds(ds, sensor_name, metadata, issue_dict=None, verbose=False, logger=None)[source][source]#
Convert a raw xr.Dataset into a DISDRODB L0B netCDF.
- Parameters:
ds (xarray.Dataset) – Raw xarray dataset
metadata (dict) – Station metadata to attach as global attributes to the xr.Dataset.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns:
L0B xr.Dataset
- Return type:
- disdrodb.l0.l0b_nc_processing.set_nan_invalid_values(ds, sensor_name, verbose, logger=None)[source][source]#
Set invalid (class) values to np.nan.
- Parameters:
ds (xarray.Dataset) – Input xarray dataset.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns:
Dataset without invalid values.
- Return type:
- disdrodb.l0.l0b_nc_processing.set_nan_outside_data_range(ds, sensor_name, verbose, logger=None)[source][source]#
Set values outside the data range as np.nan.
- Parameters:
ds (xarray.Dataset) – Input xarray dataset.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns:
Dataset without values outside the expected data range.
- Return type:
- disdrodb.l0.l0b_nc_processing.standardize_raw_dataset(ds, dict_names, sensor_name)[source][source]#
This function preprocesses a raw netCDF to improve compatibility with DISDRODB standards.
This function checks the validity of dict_names, then renames and subsets the data accordingly. If some variables specified in dict_names are missing, it adds a np.nan xr.DataArray.
- Parameters:
ds (xarray.Dataset) – Raw netCDF to be converted to DISDRODB standards.
dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.
sensor_name (str) – Sensor name.
- Returns:
ds – xarray Dataset with variables compliant with DISDRODB conventions.
- Return type:
disdrodb.l0.l0b_processing module#
Functions to process DISDRODB L0A files into DISDRODB L0B netCDF files.
- disdrodb.l0.l0b_processing.convert_object_variables_to_string(ds: Dataset) Dataset[source][source]#
Convert variables with object dtype to string.
- Parameters:
ds (xarray.Dataset) – Input dataset.
- Returns:
Output dataset.
- Return type:
- disdrodb.l0.l0b_processing.ensure_valid_geolocation(ds: Dataset, coord: str, errors: str = 'ignore') Dataset[source][source]#
Ensure valid geolocation coordinates.
‘altitude’ must be >= 0, ‘latitude’ must be within [-90, 90] and ‘longitude’ within [-180, 180].
It can deal with coordinates varying with time.
- Parameters:
ds (xarray.Dataset) – Dataset containing the coordinate.
coord (str) – Name of the coordinate variable to validate.
errors ({"ignore", "raise", "coerce"}, default "ignore") –
“ignore”: nothing is done.
“raise”: raise ValueError if invalid values are found.
“coerce”: out-of-range values are replaced with NaN.
- Returns:
Dataset with validated coordinate values.
- Return type:
xr.Dataset
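The three `errors` modes can be sketched on a plain list of floats. This is an illustration of the documented ranges and options only. The real function operates on an xarray.Dataset coordinate, and the names `VALID_RANGES` and `validate_coord` are hypothetical.

```python
import math

# Valid ranges taken from the description above.
VALID_RANGES = {
    "altitude": (0.0, math.inf),
    "latitude": (-90.0, 90.0),
    "longitude": (-180.0, 180.0),
}


def validate_coord(values, coord, errors="ignore"):
    if errors not in ("ignore", "raise", "coerce"):
        raise ValueError(f"Invalid errors option: {errors!r}")
    if errors == "ignore":
        return list(values)
    low, high = VALID_RANGES[coord]
    # Note: NaN inputs compare as out of range in this sketch.
    is_valid = [low <= v <= high for v in values]
    if errors == "raise":
        if not all(is_valid):
            raise ValueError(f"Invalid {coord} values found.")
        return list(values)
    # errors == "coerce": replace out-of-range values with NaN.
    return [v if ok else math.nan for v, ok in zip(values, is_valid)]


print(validate_coord([46.5, 95.0], "latitude", errors="coerce"))  # [46.5, nan]
```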
- disdrodb.l0.l0b_processing.finalize_dataset(ds, sensor_name, metadata)[source][source]#
Finalize DISDRODB L0B Dataset.
- disdrodb.l0.l0b_processing.format_string_array(string: str, n_values: int) array[source][source]#
Split a string with multiple numbers separated by a delimiter into a 1D array.
Example: format_string_array(“2,44,22,33”, 4) returns [ 2. 44. 22. 33.]
If the string is empty (“”) or “”, an array of zeros is returned.
If the list length is not n_values, an array of np.nan is returned.
The function strips potential delimiters at the start and end before splitting.
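The rules above can be sketched with plain Python lists standing in for the NumPy array. This is a simplified illustration, not the library code. It assumes a comma delimiter passed explicitly (the library infers it), and the helper name `format_string_list` is hypothetical.

```python
import math


def format_string_list(string: str, n_values: int, delimiter: str = ",") -> list:
    # Strip surrounding spaces and any leading/trailing delimiters before splitting.
    stripped = string.strip().strip(delimiter)
    if stripped == "":
        return [0.0] * n_values
    fields = stripped.split(delimiter)
    if len(fields) != n_values:
        # Unexpected number of values: flag the whole record as missing.
        return [math.nan] * n_values
    return [float(field) for field in fields]


print(format_string_list("2,44,22,33", 4))    # [2.0, 44.0, 22.0, 33.0]
print(format_string_list(",2,44,22,33,", 4))  # [2.0, 44.0, 22.0, 33.0]
print(format_string_list("", 4))              # [0.0, 0.0, 0.0, 0.0]
```

Non-numeric fields are not handled in this sketch and would raise a `ValueError` from `float`.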
- disdrodb.l0.l0b_processing.generate_l0b(df: DataFrame, metadata: dict, logger=None, verbose: bool = False) Dataset[source][source]#
Transform the DISDRODB L0A dataframe to the DISDRODB L0B xr.Dataset.
- Parameters:
df (pandas.DataFrame) – DISDRODB L0A dataframe. The raw drop number spectrum is reshaped to a 2D(+time) array. The raw drop concentration and velocity are reshaped to 1D(+time) arrays.
metadata (dict) – DISDRODB station metadata. To use this function outside the DISDRODB routines, the dictionary must contain the fields: sensor_name, latitude, longitude, altitude, platform_type.
verbose (bool, optional) – Whether to print detailed processing information. The default value is False.
- Returns:
DISDRODB L0B dataset.
- Return type:
- Raises:
ValueError – Error if the DISDRODB L0B xarray dataset can not be created.
- disdrodb.l0.l0b_processing.infer_split_str(string: str) str[source][source]#
Infer the delimiter inside a string.
- disdrodb.l0.l0b_processing.replace_empty_strings_with_zeros(values)[source][source]#
Replace empty comma separated strings with ‘0’.
- disdrodb.l0.l0b_processing.reshape_raw_spectrum(arr: array, dims_order: list, dims_size_dict: dict, n_timesteps: int) array[source][source]#
Reshape the raw spectrum to a 2D+time array.
The array has dimensions [“time”] + dims_order
- Parameters:
arr (np.array) – Input array.
dims_order (list) – The order of the dimensions in the raw spectrum.
Examples:
OTT PARSIVEL spectrum [v1d1 … v1d32, v2d1, …, v2d32] –> dims_order = [“diameter_bin_center”, “velocity_bin_center”]
Thies LPM spectrum [v1d1 … v20d1, v1d2, …, v20d2] –> dims_order = [“velocity_bin_center”, “diameter_bin_center”]
dims_size_dict (dict) – Dictionary with the number of bins for each dimension.
For PARSIVEL and PARSIVEL2: {“diameter_bin_center”: 32, “velocity_bin_center”: 32}
For LPM: {“diameter_bin_center”: 22, “velocity_bin_center”: 20}
For PWS100: {“diameter_bin_center”: 34, “velocity_bin_center”: 34}
n_timesteps (int) – Number of timesteps.
- Returns:
Output array.
- Return type:
np.array
- Raises:
ValueError – Impossible to reshape the raw_spectrum matrix
- disdrodb.l0.l0b_processing.retrieve_l0b_arrays(df: DataFrame, sensor_name: str, logger=None, verbose: bool = False) dict[source][source]#
Retrieves the L0B data matrix.
- Parameters:
df (pandas.DataFrame) – Input dataframe
sensor_name (str) – Name of the sensor
- Returns:
Dictionary with data arrays.
- Return type:
- disdrodb.l0.l0b_processing.set_geolocation_coordinates(ds, metadata)[source][source]#
Add geolocation coordinates to dataset.
disdrodb.l0.l0c_processing module#
Functions to process DISDRODB L0B files into DISDRODB L0C netCDF files.
- disdrodb.l0.l0c_processing.check_timesteps_regularity(ds, sample_interval, verbose=False, logger=None)[source][source]#
Check for the regularity of timesteps.
- disdrodb.l0.l0c_processing.create_l0c_datasets(event_info, measurement_intervals, sensor_name, ensure_variables_equality=True, logger=None, verbose=True)[source][source]#
Create a single dataset by merging and processing data from multiple filepaths.
- Parameters:
event_info (dict) – Dictionary with start_time, end_time and filepaths keys.
- Returns:
A dictionary with an xarray.Dataset for each measurement interval.
- Return type:
- Raises:
ValueError – If less than 5 timesteps are available for the specified day.
Notes
Data is loaded into memory and connections to source files are closed before returning the dataset.
Tolerance in input files is used around expected dataset start_time and end_time to account for imprecise logging times and ensuring correct definition of qc_time at files boundaries (e.g. 00:00).
Duplicated timesteps with different raw drop number values are dropped.
First occurrence of duplicated timesteps with equal raw drop number values is kept.
Regularizes timesteps to handle trailing seconds.
- disdrodb.l0.l0c_processing.drop_timesteps_with_invalid_sample_interval(ds, measurement_intervals, verbose=True, logger=None)[source][source]#
Drop timesteps with unexpected sample intervals.
- disdrodb.l0.l0c_processing.get_problematic_timestep_indices(timesteps, sample_interval)[source][source]#
Identify timesteps with missing previous or following timesteps.
- disdrodb.l0.l0c_processing.has_same_value_over_time(da)[source][source]#
Check if a DataArray has the same value over all timesteps, considering NaNs as equal.
- Parameters:
da (xarray.DataArray) – The DataArray to check. Must have a ‘time’ dimension.
- Returns:
True if the values are the same (or NaN in the same positions) across all timesteps, False otherwise.
- Return type:
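The NaN-aware comparison can be sketched on nested lists, with one inner list per timestep. This is an illustration only. The real function takes an xarray.DataArray, and the helper name `same_value_over_time` is hypothetical.

```python
import math


def same_value_over_time(rows):
    # rows: one list of floats per timestep, all of equal length (at least one row).
    def equal(a, b):
        # Two NaNs at the same position count as equal.
        return a == b or (math.isnan(a) and math.isnan(b))

    first = rows[0]
    return all(
        all(equal(a, b) for a, b in zip(first, row))
        for row in rows[1:]
    )


print(same_value_over_time([[1.0, math.nan], [1.0, math.nan]]))  # True
print(same_value_over_time([[1.0, math.nan], [1.0, 2.0]]))       # False
```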
- disdrodb.l0.l0c_processing.nearest_expected_times(times, expected_times)[source][source]#
Return index of nearest expected time.
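A nearest-index lookup for a single time can be sketched with the standard library. It assumes sorted expected times given as plain numbers (for example seconds) and resolves ties toward the earlier expected time, which is an arbitrary choice of this sketch. The helper name `nearest_index` is hypothetical.

```python
from bisect import bisect_left


def nearest_index(time, expected_times):
    # expected_times must be sorted in ascending order.
    pos = bisect_left(expected_times, time)
    if pos == 0:
        return 0
    if pos == len(expected_times):
        return len(expected_times) - 1
    before, after = expected_times[pos - 1], expected_times[pos]
    # Ties go to the earlier expected time.
    return pos - 1 if time - before <= after - time else pos


expected = [0, 30, 60, 90]
print([nearest_index(t, expected) for t in [1, 29, 46, 200]])  # [0, 1, 2, 3]
```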
- disdrodb.l0.l0c_processing.regularize_timesteps(ds, sample_interval, robust=False, add_quality_flag=True, logger=None, verbose=True)[source][source]#
Ensure timesteps match with the sample_interval.
This function drops dataset indices with duplicated timesteps, but does not add missing timesteps to the dataset.
- disdrodb.l0.l0c_processing.remove_duplicated_timesteps(ds, ensure_variables_equality=True, logger=None, verbose=True)[source][source]#
Removes duplicated timesteps from an xarray dataset.
- disdrodb.l0.l0c_processing.split_dataset_by_sampling_intervals(ds, measurement_intervals, min_sample_interval=10, min_block_size=5, time_is_end_interval=True)[source][source]#
Split a dataset into subsets where each subset has a consistent sampling interval.
Notes
Does not modify timesteps (regularization is left to regularize_timesteps).
Assumes no duplicated timesteps in the dataset.
If only one measurement interval is specified, no timestep-diff checks are performed.
- If multiple measurement intervals are specified:
Raises an error if none of the expected intervals appear.
Splits where interval changes.
Segments shorter than min_block_size are discarded.
- Parameters:
ds (xarray.Dataset) – The input dataset with a ‘time’ dimension.
measurement_intervals (list or array-like) – A list of possible primary sampling intervals (in seconds) that the dataset might have.
min_sample_interval (int, optional) – The minimum expected sampling interval in seconds. Defaults to 10s. This is used to deal with possible trailing seconds errors.
min_block_size (float, optional) – The minimum number of timesteps with a given sampling interval to be considered. Shorter portions of data are discarded. Defaults to 5 timesteps.
time_is_end_interval (bool) – Whether time refers to the end of the measurement interval. The default is True.
- Returns:
A dictionary where keys are the identified sampling intervals (in seconds), and values are xarray.Datasets containing only data from those sampling intervals.
- Return type:
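The splitting rules listed in the notes can be sketched on a list of timestamps in seconds. This is a simplified illustration, not the library algorithm. It ignores `min_sample_interval` and `time_is_end_interval`, needs at least two timesteps when several intervals are given, and assumes that the interval of a timestep is its difference from the previous one. The helper name `split_by_interval` is hypothetical.

```python
def split_by_interval(times, measurement_intervals, min_block_size=5):
    # times: sorted, unique timestamps in seconds.
    if len(measurement_intervals) == 1:
        # Single expected interval: no timestep-difference checks.
        return {measurement_intervals[0]: list(times)}
    # Interval of each timestep: difference from the previous one
    # (the first timestep borrows the difference of the second).
    diffs = [b - a for a, b in zip(times, times[1:])]
    diffs = [diffs[0], *diffs]
    if not any(d in measurement_intervals for d in diffs):
        raise ValueError("None of the expected intervals appear in the data.")
    # Collect runs of consecutive timesteps sharing the same interval.
    runs = []
    for t, d in zip(times, diffs):
        if runs and runs[-1][0] == d:
            runs[-1][1].append(t)
        else:
            runs.append((d, [t]))
    # Keep only expected intervals and runs that are long enough.
    out = {}
    for d, block in runs:
        if d in measurement_intervals and len(block) >= min_block_size:
            out.setdefault(d, []).extend(block)
    return out


times = list(range(0, 300, 30)) + list(range(330, 930, 60))
blocks = split_by_interval(times, [30, 60], min_block_size=5)
print({interval: len(block) for interval, block in blocks.items()})  # {30: 10, 60: 10}
```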
disdrodb.l0.standards module#
Retrieve L0 sensor standards.
- disdrodb.l0.standards.allowed_l0_variables(sensor_name: str) list[source][source]#
Get the list of allowed L0 variables for a given sensor.
- disdrodb.l0.standards.get_bin_coords_dict(sensor_name: str) dict[source][source]#
Retrieve diameter (and velocity) bin coordinates.
- disdrodb.l0.standards.get_data_format_dict(sensor_name: str) dict[source][source]#
Get a dictionary containing the data format of each sensor variable.
- disdrodb.l0.standards.get_data_range_dict(sensor_name: str) dict[source][source]#
Get the variable data range.
- disdrodb.l0.standards.get_diameter_bin_center(sensor_name: str) list[source][source]#
Get diameter bin center.
- disdrodb.l0.standards.get_diameter_bin_lower(sensor_name: str) list[source][source]#
Get diameter bin lower bound.
- disdrodb.l0.standards.get_diameter_bin_upper(sensor_name: str) list[source][source]#
Get diameter bin upper bound.
- disdrodb.l0.standards.get_diameter_bin_width(sensor_name: str) list[source][source]#
Get diameter bin width.
- disdrodb.l0.standards.get_diameter_bins_dict(sensor_name: str) dict[source][source]#
Get dictionary with sensor_name diameter bins information.
- disdrodb.l0.standards.get_dims_size_dict(sensor_name: str) dict[source][source]#
Get the number of bins for each dimension.
- disdrodb.l0.standards.get_field_nchar_dict(sensor_name: str) dict[source][source]#
Get the total number of characters from the instrument default string standards.
Important note: the count also includes the comma and the minus sign.
- disdrodb.l0.standards.get_field_ndigits_decimals_dict(sensor_name: dict) dict[source][source]#
Get number of digits on the right side of the comma from the instrument default string standards.
Example: 123,45 -> 45 -> 2 decimal digits.
- disdrodb.l0.standards.get_field_ndigits_dict(sensor_name: str) dict[source][source]#
Get number of digits from the instrument default string standards.
Important note: the count excludes the comma but includes the minus sign.
- disdrodb.l0.standards.get_field_ndigits_natural_dict(sensor_name: str) dict[source][source]#
Get number of digits on the left side of the comma from the instrument default string standards.
Example: 123,45 -> 123 -> 3 natural digits.
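The counting conventions used by the four field-size functions above can be sketched in a few lines of Python. The helper describe_field is hypothetical and not part of disdrodb. It uses the comma as decimal separator, as in the examples above. Not counting the minus sign among the natural digits is an assumption, since the documentation does not state it.

```python
def describe_field(string, sep=","):
    """Count characters and digits of a raw numeric field string.

    Hypothetical helper for illustration only; not part of disdrodb.
    """
    unsigned = string.lstrip("-")
    naturals, _, decimals = unsigned.partition(sep)
    return {
        # Total characters: includes the separator and the minus sign
        "n_characters": len(string),
        # Digits: excludes the separator but counts the minus sign
        "n_digits": len(string.replace(sep, "")),
        # Digits on the right side of the separator
        "n_decimals": len(decimals),
        # Digits on the left side of the separator (assumption: sign not counted)
        "n_naturals": len(naturals),
    }


print(describe_field("123,45"))
# {'n_characters': 6, 'n_digits': 5, 'n_decimals': 2, 'n_naturals': 3}
print(describe_field("-123,45"))
# {'n_characters': 7, 'n_digits': 6, 'n_decimals': 2, 'n_naturals': 3}
```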
- disdrodb.l0.standards.get_l0a_dtype(sensor_name: str) dict[source][source]#
Get a dictionary containing the L0A dtype.
- disdrodb.l0.standards.get_l0a_encodings_dict(sensor_name: str) dict[source][source]#
Get a dictionary containing the L0A encodings.
- disdrodb.l0.standards.get_l0b_cf_attrs_dict(sensor_name: str) dict[source][source]#
Get a dictionary containing the CF attributes of each sensor variable.
- disdrodb.l0.standards.get_l0b_encodings_dict(sensor_name: str) dict[source][source]#
Get a dictionary containing the encoding to write L0B netCDFs.
- disdrodb.l0.standards.get_n_diameter_bins(sensor_name)[source][source]#
Get the number of diameter bins.
- disdrodb.l0.standards.get_n_velocity_bins(sensor_name)[source][source]#
Get the number of velocity bins.
- disdrodb.l0.standards.get_nan_flags_dict(sensor_name: str) dict[source][source]#
Get the variable nan_flags.
- disdrodb.l0.standards.get_raw_array_dims_order(sensor_name: str) dict[source][source]#
Get the dimension order of the raw fields.
The order of dimension specified for raw_drop_number controls the reshaping of the precipitation raw spectrum.
Examples
OTT Parsivel spectrum [d1v1 … d32v1, d1v2, …, d32v2] (diameter increases first) -> dimension_order = ["velocity_bin_center", "diameter_bin_center"]
Thies LPM spectrum [v1d1 … v20d1, v1d2, …, v20d2] (velocity increases first) -> dimension_order = ["diameter_bin_center", "velocity_bin_center"]
PWS 100 spectrum [d1v1 … d1v34, d2v1, …, d2v34] (velocity increases first) -> dimension_order = ["diameter_bin_center", "velocity_bin_center"]
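The relation between the flat raw spectrum and dimension_order can be illustrated with a small pure-Python sketch. It assumes row-major reshaping, in which the last dimension of dimension_order varies fastest, which is consistent with the examples above. The helper reshape_raw_spectrum and the reduced bin counts (3 diameter bins, 2 velocity bins) are hypothetical and only serve the illustration.

```python
def reshape_raw_spectrum(flat, dimension_order, dims_size):
    """Reshape a flat spectrum into nested lists following dimension_order.

    Hypothetical helper for illustration only; not part of disdrodb.
    The last dimension in dimension_order is the one that varies fastest.
    """
    n_outer = dims_size[dimension_order[0]]
    n_inner = dims_size[dimension_order[1]]
    if len(flat) != n_outer * n_inner:
        raise ValueError("The number of values does not match the dimension sizes.")
    return [flat[i * n_inner:(i + 1) * n_inner] for i in range(n_outer)]


dims_size = {"diameter_bin_center": 3, "velocity_bin_center": 2}

# Parsivel-like layout: diameter increases first
parsivel_flat = ["d1v1", "d2v1", "d3v1", "d1v2", "d2v2", "d3v2"]
parsivel = reshape_raw_spectrum(
    parsivel_flat, ["velocity_bin_center", "diameter_bin_center"], dims_size
)
print(parsivel[1][0])  # velocity index 1, diameter index 0 -> "d1v2"

# LPM-like layout: velocity increases first
lpm_flat = ["v1d1", "v2d1", "v1d2", "v2d2", "v1d3", "v2d3"]
lpm = reshape_raw_spectrum(
    lpm_flat, ["diameter_bin_center", "velocity_bin_center"], dims_size
)
print(lpm[2][0])  # diameter index 2, velocity index 0 -> "v1d3"
```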
- disdrodb.l0.standards.get_raw_array_nvalues(sensor_name: str) dict[source][source]#
Get a dictionary with the number of values expected for each raw array.
- disdrodb.l0.standards.get_sensor_logged_variables(sensor_name: str) list[source][source]#
Get the sensor logged variables list.
- disdrodb.l0.standards.get_valid_coordinates_names(sensor_name)[source][source]#
Get list of valid coordinates for DISDRODB L0B.
- disdrodb.l0.standards.get_valid_dimension_names(sensor_name)[source][source]#
Get list of valid dimension names for DISDRODB L0B.
- disdrodb.l0.standards.get_valid_names(sensor_name)[source][source]#
Return the list of valid variable and coordinates names for DISDRODB L0B.
- disdrodb.l0.standards.get_valid_values_dict(sensor_name: str) dict[source][source]#
Get the list of valid values for a variable.
- disdrodb.l0.standards.get_valid_variable_names(sensor_name)[source][source]#
Get list of valid variables.
- disdrodb.l0.standards.get_variables_dimension(sensor_name: str)[source][source]#
Returns a dictionary with the variable dimensions of a L0B product.
- disdrodb.l0.standards.get_velocity_bin_center(sensor_name: str) list[source][source]#
Get velocity bin center.
- disdrodb.l0.standards.get_velocity_bin_lower(sensor_name: str) list[source][source]#
Get velocity bin lower bound.
- disdrodb.l0.standards.get_velocity_bin_upper(sensor_name: str) list[source][source]#
Get velocity bin upper bound.
disdrodb.l0.template_tools module#
Useful tools helping in the implementation of the DISDRODB L0 readers.
- disdrodb.l0.template_tools.check_column_names(column_names: list, sensor_name: str) None[source][source]#
Check that the column names respect the DISDRODB standards.
- disdrodb.l0.template_tools.get_decimal_ndigits(string: str) int[source][source]#
Get the number of decimal digits.
- disdrodb.l0.template_tools.get_df_columns_unique_values_dict(df: DataFrame, column_indices: int | slice | list | None = None, column_names: bool = True)[source][source]#
Create a dictionary {column: unique values}.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
column_indices (Union[int,slice,list], optional) – Column indices. If None, select all columns.
column_names (bool, optional) – If True, the dictionary keys are the column names. The default value is True.
- disdrodb.l0.template_tools.get_natural_ndigits(string: str) int[source][source]#
Get the number of natural digits.
- disdrodb.l0.template_tools.get_nchar(string: str) int[source][source]#
Get the number of characters.
- disdrodb.l0.template_tools.get_ndigits(string: str) int[source][source]#
Get the number of total numeric digits.
- disdrodb.l0.template_tools.get_unique_sorted_values(array)[source][source]#
Return unique sorted values.
It handles np.nan within an array of strings by converting the object dtype to str.
- disdrodb.l0.template_tools.infer_column_names(df: DataFrame, sensor_name: str, row_idx: int = 0)[source][source]#
Try to guess the dataframe column names based on string characteristics.
- Parameters:
df (pandas.DataFrame) – The dataframe to analyse.
sensor_name (str) – Name of the sensor.
row_idx (int, optional) – The row index of the dataframe to use to infer the column names. The default row index is 0.
- Returns:
Dictionary whose keys are the column ids and whose values are the guessed column names.
- Return type:
dict
- disdrodb.l0.template_tools.print_allowed_column_names(sensor_name: str) None[source][source]#
Print the valid column names from the standard.
- Parameters:
sensor_name (str) – Name of the sensor.
- disdrodb.l0.template_tools.print_df_column_names(df: DataFrame) None[source][source]#
Print the dataframe column names.
- Parameters:
df (pandas.DataFrame) – The dataframe.
- disdrodb.l0.template_tools.print_df_columns_unique_values(df: DataFrame, column_indices: int | slice | list | None = None, print_column_names: bool = True) None[source][source]#
Print columns’ unique values.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
column_indices (Union[int,slice,list], optional) – Column indices. If None, select all columns.
print_column_names (bool, optional) – If True, print the column names. The default value is True.
- disdrodb.l0.template_tools.print_df_first_n_rows(df: DataFrame, n: int = 5, print_column_names: bool = True) None[source][source]#
Print the first n rows of the dataframe by column.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
n (int, optional) – Number of rows. The default is 5.
print_column_names (bool, optional) – If True, print the column names. The default value is True.
- disdrodb.l0.template_tools.print_df_random_n_rows(df: DataFrame, n: int = 5, print_column_names: bool = True) None[source][source]#
Print the content of the dataframe by column, randomly chosen.
- Parameters:
df (pandas.DataFrame) – The dataframe.
n (int, optional) – The number of rows to print. The default is 5.
print_column_names (bool, optional) – If True, print the column names. The default value is True.
- disdrodb.l0.template_tools.print_df_summary_stats(df: DataFrame, column_indices: int | slice | list | None = None, print_column_names: bool = True)[source][source]#
Create a summary of the column statistics.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
column_indices (Union[int,slice,list], optional) – Column indices. If None, select all columns.
print_column_names (bool, optional) – If True, print the column names. The default value is True.
- Raises:
ValueError – If the column types are not numeric.
- disdrodb.l0.template_tools.print_df_with_any_nan_rows(df: DataFrame) None[source][source]#
Print the rows that contain any NaN value.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
- disdrodb.l0.template_tools.str_has_decimal_digits(string: str) bool[source][source]#
Check if a string has decimals.
Module contents#
DISDRODB L0 software.
- disdrodb.l0.available_readers(sensor_name, data_sources=None, return_path=False)[source][source]#
Retrieve available readers information.
- disdrodb.l0.generate_l0a(filepaths: list | str, reader, sensor_name, issue_dict=None, verbose=True, logger=None) DataFrame[source][source]#
Read and parse a list of raw files and generate a DISDRODB L0A dataframe.
- Parameters:
reader – DISDRODB reader function. Format: reader(filepath, logger=None)
sensor_name (str) – Name of the sensor.
issue_dict (dict, optional) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are 'timesteps' and 'time_periods'. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.
verbose (bool) – Whether to print detailed processing information. The default is True.
- Returns:
DISDRODB L0A dataframe.
- Return type:
pandas.DataFrame
- Raises:
ValueError – If the input parameters cannot be used or the raw file cannot be processed.
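The role of the issue_dict can be illustrated with a short standard-library sketch. It is not the DISDRODB implementation. The helper remove_issue_timesteps is hypothetical, datetime.datetime objects stand in for datetime64 values, and representing each entry of 'time_periods' as a [start, end] pair with inclusive bounds is an assumption.

```python
from datetime import datetime


def remove_issue_timesteps(times, issue_dict=None):
    """Drop the timesteps flagged in an issue dictionary.

    Hypothetical helper for illustration only; not part of disdrodb.
    """
    issue_dict = issue_dict or {}
    bad_timesteps = set(issue_dict.get("timesteps", []))
    # Assumption: each time period is a [start, end] pair, both bounds inclusive
    periods = issue_dict.get("time_periods", [])
    return [
        t
        for t in times
        if t not in bad_timesteps
        and not any(start <= t <= end for start, end in periods)
    ]


times = [datetime(2020, 1, 1, 0, 0, s) for s in (0, 10, 20, 30, 40, 50)]
issue_dict = {
    "timesteps": [datetime(2020, 1, 1, 0, 0, 10)],
    "time_periods": [[datetime(2020, 1, 1, 0, 0, 30), datetime(2020, 1, 1, 0, 0, 40)]],
}
kept = remove_issue_timesteps(times, issue_dict)
print([t.second for t in kept])  # [0, 20, 50]
```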
- disdrodb.l0.generate_l0b(df: DataFrame, metadata: dict, logger=None, verbose: bool = False) Dataset[source][source]#
Transform the DISDRODB L0A dataframe to the DISDRODB L0B xr.Dataset.
- Parameters:
df (pandas.DataFrame) – DISDRODB L0A dataframe. The raw drop number spectrum is reshaped to a 2D(+time) array. The raw drop concentration and velocity are reshaped to 1D(+time) arrays.
metadata (dict) – DISDRODB station metadata. To use this function outside the DISDRODB routines, the dictionary must contain the fields: sensor_name, latitude, longitude, altitude, platform_type.
verbose (bool, optional) – Whether to print detailed processing information. The default value is False.
- Returns:
DISDRODB L0B dataset.
- Return type:
xarray.Dataset
- Raises:
ValueError – If the DISDRODB L0B xarray dataset cannot be created.
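When calling generate_l0b outside the DISDRODB routines, it may help to check the metadata dictionary for the required fields listed above before passing it in. The following is a minimal sketch. The helper missing_metadata_fields is hypothetical and not part of disdrodb, and the metadata values are illustrative placeholders rather than values taken from a real station.

```python
# Fields that the metadata dictionary must contain, as listed in the parameters
REQUIRED_METADATA_FIELDS = (
    "sensor_name",
    "latitude",
    "longitude",
    "altitude",
    "platform_type",
)


def missing_metadata_fields(metadata):
    """Return the required metadata fields absent from the dictionary.

    Hypothetical helper for illustration only; not part of disdrodb.
    """
    return [field for field in REQUIRED_METADATA_FIELDS if field not in metadata]


# Placeholder values, assumed for illustration only
metadata = {
    "sensor_name": "PARSIVEL",
    "latitude": 46.5,
    "longitude": 6.6,
    "altitude": 400,
    "platform_type": "fixed",
}
print(missing_metadata_fields(metadata))  # []
print(missing_metadata_fields({"sensor_name": "PARSIVEL"}))
# ['latitude', 'longitude', 'altitude', 'platform_type']
```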
- disdrodb.l0.generate_l0b_from_nc(filepaths: list | str, reader, sensor_name, metadata, issue_dict=None, verbose=True, logger=None)[source][source]#
Read and parse a list of raw netCDF files and generate a DISDRODB L0B dataset.
- Parameters:
reader – DISDRODB reader function. Format: reader(filepath, logger=None)
sensor_name (str) – Name of the sensor.
metadata (dict) – Station metadata to attach as global attributes to the xr.Dataset.
issue_dict (dict, optional) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are 'timesteps' and 'time_periods'. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.
verbose (bool) – Whether to print detailed processing information. The default is True.
- Returns:
DISDRODB L0B Dataset.
- Return type:
xarray.Dataset
- Raises:
ValueError – If the input parameters cannot be used or the raw file cannot be processed.