disdrodb.l0 package
Subpackages
Submodules
disdrodb.l0.check_configs module
Check configuration files.
- class disdrodb.l0.check_configs.L0BEncodingSchema(*, contiguous: bool, dtype: str, zlib: bool, complevel: int, shuffle: bool, fletcher32: bool, chunksizes: int | list[int] | None)
Bases: BaseModel
Pydantic model for DISDRODB L0B encodings.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- classmethod check_contiguous_and_fletcher32(values)
Check the validity of the fletcher32 value given the contiguous setting.
- classmethod check_contiguous_and_zlib(values)
Check the validity of the zlib compression value given the contiguous setting.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
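The two validators above reject inconsistent storage settings. A minimal sketch of that kind of cross-field rule, written in plain Python rather than pydantic and assuming (as in HDF5/netCDF4) that zlib compression and Fletcher-32 checksums require chunked rather than contiguous storage; `validate_encoding` is a hypothetical name, not part of disdrodb:

```python
def validate_encoding(encoding: dict) -> dict:
    """Hypothetical check in the spirit of check_contiguous_and_zlib/fletcher32.

    Assumption: zlib compression and Fletcher-32 checksums are HDF5 filters,
    which only apply to chunked storage, so they conflict with contiguous=True.
    """
    if encoding.get("contiguous") and encoding.get("zlib"):
        raise ValueError("zlib=True requires contiguous=False.")
    if encoding.get("contiguous") and encoding.get("fletcher32"):
        raise ValueError("fletcher32=True requires contiguous=False.")
    return encoding


# Chunked storage with compression and checksums passes the check.
validate_encoding({"contiguous": False, "zlib": True, "fletcher32": True})

# Contiguous storage with compression is rejected.
try:
    validate_encoding({"contiguous": True, "zlib": True, "fletcher32": False})
except ValueError as err:
    print(err)
```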
- class disdrodb.l0.check_configs.RawDataFormatSchema(*, n_digits: int | None, n_characters: int | None, n_decimals: int | None, n_naturals: int | None, data_range: list[float] | None, nan_flags: int | str | None = None, valid_values: list[float] | None = None, dimension_order: list[str] | None = None, n_values: int | None = None, field_number: str | None = None)
Bases: BaseModel
Pydantic model for the DISDRODB raw data format YAML files.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- exception disdrodb.l0.check_configs.SchemaValidationException
Bases: Exception
Exception raised when schema validation fails.
- disdrodb.l0.check_configs.check_all_sensors_configs() → None
Check the configuration YAML files of all sensors.
- disdrodb.l0.check_configs.check_l0a_encoding(sensor_name: str) → None
Check the l0a_encodings.yml file.
- Parameters:
sensor_name (str) – Name of the sensor.
- Raises:
ValueError – Error raised if the value of a key is not in the list of accepted values.
disdrodb.l0.check_standards module
Check data standards.
- disdrodb.l0.check_standards.check_l0a_column_names(df: DataFrame, sensor_name: str) → None
Check that the dataframe columns respect DISDRODB standards.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
- Raises:
ValueError – Error if some columns do not meet the DISDRODB standards or if the 'time' column is missing in the dataframe.
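In spirit, the column check above compares the dataframe columns against the sensor's allowed variables and requires a `time` column. A minimal stdlib sketch, assuming the allowed names are given as a plain list (in disdrodb they come from the sensor standards, e.g. `allowed_l0_variables`); `check_column_names` and the variable names used below are hypothetical:

```python
def check_column_names(columns, allowed_variables):
    """Hypothetical sketch of an L0A column-name check."""
    if "time" not in columns:
        raise ValueError("The 'time' column is missing in the dataframe.")
    # 'time' is always accepted; every other column must be a standard variable.
    unexpected = sorted(set(columns) - set(allowed_variables) - {"time"})
    if unexpected:
        raise ValueError(f"Columns not compliant with DISDRODB standards: {unexpected}")


allowed = ["raw_drop_number", "sensor_temperature"]  # hypothetical standard names
check_column_names(["time", "raw_drop_number"], allowed)  # passes silently
```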
- disdrodb.l0.check_standards.check_l0a_standards(df: DataFrame, sensor_name: str, logger=None, verbose: bool = True) → None
Check that an L0A dataframe respects the DISDRODB L0A standards.
- Parameters:
df (pandas.DataFrame) – L0A dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool, optional) – Whether to print detailed processing information. The default value is True.
- Raises:
ValueError – Error if some columns have inconsistent values.
disdrodb.l0.l0_reader module
Define DISDRODB L0 reader routines.
- disdrodb.l0.l0_reader.available_readers(sensor_name, data_sources=None, return_path=False)
Retrieve information on the available readers.
- disdrodb.l0.l0_reader.check_metadata_reader(metadata)
Check that the metadata reader key is available and points to an existing DISDRODB reader.
- disdrodb.l0.l0_reader.check_reader_arguments(reader)
Check that the reader function has the expected input arguments.
- disdrodb.l0.l0_reader.check_reader_exists(reader_reference, sensor_name)
Check that the reader exists.
- disdrodb.l0.l0_reader.check_reader_reference(reader_reference)
Check the reader_reference value.
- disdrodb.l0.l0_reader.check_software_readers()
Check the validity of all readers included in the disdrodb software.
- disdrodb.l0.l0_reader.define_reader_path(sensor_name, reader_reference)
Define the reader path based on the reader reference name.
- disdrodb.l0.l0_reader.define_readers_directory(sensor_name='') → str
Return the path to the disdrodb.l0.readers directory within the disdrodb package.
- disdrodb.l0.l0_reader.get_reader(reader_reference, sensor_name)
Retrieve the reader function.
- Parameters:
reader_reference (str) – Reader reference name.
sensor_name (str) – Name of the sensor.
- Returns:
The reader() function.
- Return type:
callable
- disdrodb.l0.l0_reader.get_reader_from_metadata(metadata)
Retrieve the reader function based on the metadata information.
The reader_reference naming convention is "{DATA_SOURCE}"/"{CAMPAIGN_NAME}_{OPTIONAL_SUFFIX}". The reader is located at disdrodb.l0.readers.{sensor_name}.{reader_reference}.
- disdrodb.l0.l0_reader.get_specific_readers_path(sensor_name)
Returns a dictionary with the file paths of the available readers for each data source.
- disdrodb.l0.l0_reader.get_specific_readers_references(sensor_name)
Returns a dictionary with the reader references available for each data source.
- disdrodb.l0.l0_reader.get_station_reader(data_source, campaign_name, station_name, metadata_archive_dir=None)
Retrieve the reader function of a specific DISDRODB station.
- disdrodb.l0.l0_reader.is_documented_by(original)
Decorator that applies the docstring of another function to the decorated function.
- Parameters:
original (function) – Function to take the docstring from.
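A decorator with this behavior can be written in a few lines. The sketch below is a standalone re-implementation under a hypothetical name (`documented_by`), not the disdrodb source; it copies the docstring of `original` onto the decorated function, which is how a single generic reader docstring can be shared by many readers:

```python
def documented_by(original):
    """Return a decorator that copies ``original.__doc__`` to its target."""

    def wrapper(target):
        target.__doc__ = original.__doc__
        return target

    return wrapper


def generic_docstring():
    """Reader to convert a raw data file to DISDRODB L0A or L0B format."""


@documented_by(generic_docstring)
def reader(filepath, logger=None):
    # A real reader would parse ``filepath`` here.
    return None


print(reader.__doc__)
```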
- disdrodb.l0.l0_reader.list_readers_paths(sensor_name) → list
Returns the file paths of the available readers for a given sensor in disdrodb.l0.readers.{sensor_name}.
- disdrodb.l0.l0_reader.list_readers_references(sensor_name)
Returns the reader references available for a given sensor in disdrodb.l0.readers.{sensor_name}.
- disdrodb.l0.l0_reader.reader_generic_docstring()
Reader to convert a raw data file to DISDRODB L0A or L0B format.
Raw text files are read and converted to a pandas.DataFrame (L0A format). Raw netCDF files are read and converted to an xarray.Dataset (L0B format).
- Parameters:
filepath (str) – Filepath of the raw data file to be processed.
logger (logging.Logger, optional) – Logger to use for logging messages. Default is None, which means no logger is used.
disdrodb.l0.l0a_processing module
Functions to process raw text files into DISDRODB L0A Apache Parquet.
- disdrodb.l0.l0a_processing.cast_column_dtypes(df: DataFrame, sensor_name: str) → DataFrame
Convert 'object' dataframe columns into DISDRODB L0A dtype standards.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
- Returns:
Dataframe with corrected column types.
- Return type:
pandas.DataFrame
- disdrodb.l0.l0a_processing.check_matching_column_number(df, column_names)
Check that the number of dataframe columns matches the number of column names.
- disdrodb.l0.l0a_processing.coerce_corrupted_values_to_nan(df: DataFrame, sensor_name: str) → DataFrame
Coerce corrupted values in dataframe numeric columns to np.nan.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
- Returns:
Dataframe with corrupted numeric values set to np.nan.
- Return type:
pandas.DataFrame
- disdrodb.l0.l0a_processing.concatenate_dataframe(list_df: list, logger=None, verbose: bool = False) → DataFrame
Concatenate a list of dataframes.
- Parameters:
list_df (list) – List of dataframes to concatenate.
verbose (bool, optional) – Whether to print detailed processing information. The default is False.
- Returns:
Concatenated dataframe.
- Return type:
pandas.DataFrame
- Raises:
ValueError – The concatenation cannot be done.
- disdrodb.l0.l0a_processing.drop_time_periods(df, time_periods)
Drop problematic time periods.
- disdrodb.l0.l0a_processing.drop_timesteps(df, timesteps)
Drop problematic time steps.
- disdrodb.l0.l0a_processing.is_raw_array_string_not_corrupted(string)
Check whether the raw array string is not corrupted.
- disdrodb.l0.l0a_processing.preprocess_reader_kwargs(reader_kwargs: dict) → dict
Preprocess the arguments required to read a raw text file with pandas.
- disdrodb.l0.l0a_processing.read_l0a_dataframe(filepaths: str | list, verbose: bool = False, logger=None, debugging_mode: bool = False) → DataFrame
Read DISDRODB L0A Apache Parquet file(s).
- Parameters:
filepaths (str or list) – Either a list or a single filepath.
verbose (bool) – Whether to print detailed processing information in the terminal. The default is False.
debugging_mode (bool) – If True, it reduces the amount of data to process. If filepaths is a list, only the first 3 files are read. For each file, only the first 100 rows are selected. The default is False.
- Returns:
L0A Dataframe.
- Return type:
pandas.DataFrame
- disdrodb.l0.l0a_processing.read_raw_text_file(filepath: str, column_names: list, reader_kwargs: dict, logger=None) → DataFrame
Read a raw file into a dataframe.
- Parameters:
filepath (str) – Raw file path.
column_names (list) – Column names.
reader_kwargs (dict) – Arguments passed to pandas pd.read_csv.
logger (logging.Logger) – Logger object. The default is None. If None, the logger is created using the module name. If logger is passed, it is used to log messages.
- Returns:
Pandas dataframe.
- Return type:
pandas.DataFrame
- disdrodb.l0.l0a_processing.read_raw_text_files(filepaths: list | str, reader, sensor_name, verbose=True, logger=None) → DataFrame
Read and parse a list of raw files into a dataframe.
- Parameters:
filepaths (list or str) – Either a list or a single filepath.
reader (callable) – Reader function used to read each raw file.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information. The default is True.
- Returns:
Dataframe
- Return type:
pandas.DataFrame
- Raises:
ValueError – Input parameters cannot be used or the raw file cannot be processed.
- disdrodb.l0.l0a_processing.remove_corrupted_rows(df)
Remove corrupted rows by checking the conversion of raw fields to numeric values.
Note: the leading and trailing delimiters must be stripped from the raw array beforehand.
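Stripping first matters because a leading or trailing delimiter produces empty fields that fail numeric conversion. A minimal sketch of the strip-then-check logic, assuming a comma delimiter passed explicitly (the documented `strip_delimiter(string)` takes only the string, so the `delimiter` argument and both function names here are hypothetical):

```python
def strip_outer_delimiter(string: str, delimiter: str = ",") -> str:
    """Remove one leading and one trailing delimiter, if present."""
    if string.startswith(delimiter):
        string = string[len(delimiter):]
    if string.endswith(delimiter):
        string = string[: -len(delimiter)]
    return string


def is_raw_array_valid(string: str, delimiter: str = ",") -> bool:
    """Return True if every field of the raw array converts to a number."""
    try:
        for field in string.split(delimiter):
            float(field)
    except ValueError:
        return False
    return True


raw = ",000,001,002,"
print(is_raw_array_valid(raw))  # empty first/last fields make this False
print(is_raw_array_valid(strip_outer_delimiter(raw)))  # True once stripped
```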
- disdrodb.l0.l0a_processing.remove_duplicated_timesteps(df: DataFrame, logger=None, verbose: bool = False)
Remove duplicated timesteps.
Only the first occurrence of each timestep is kept.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
verbose (bool) – Whether to print detailed processing information. The default is False.
- Returns:
Dataframe with valid unique timesteps.
- Return type:
pandas.DataFrame
- disdrodb.l0.l0a_processing.remove_issue_timesteps(df, issue_dict, logger=None, verbose=False)
Drop dataframe rows with timesteps listed in the issue dictionary.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
issue_dict (dict) – Issue dictionary.
verbose (bool) – Whether to print detailed processing information. The default is False.
- Returns:
Dataframe with problematic timesteps removed.
- Return type:
pandas.DataFrame
- disdrodb.l0.l0a_processing.remove_rows_with_missing_time(df: DataFrame, logger=<Logger disdrodb.l0.l0a_processing (WARNING)>, verbose: bool = False)
Remove dataframe rows where "time" is NaT.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
verbose (bool) – Whether to print detailed processing information. The default is False.
- Returns:
Dataframe with valid timesteps.
- Return type:
pandas.DataFrame
- disdrodb.l0.l0a_processing.replace_nan_flags(df, sensor_name, logger=None, verbose=False)
Set values corresponding to nan_flags to np.nan.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information. The default is False.
- Returns:
Dataframe without nan_flags values.
- Return type:
pandas.DataFrame
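Conceptually, NaN-flag replacement maps sensor-specific sentinel values to missing values, column by column. A stdlib sketch, using a dict of lists as a stand-in for the dataframe and assuming the flags of each variable are given as a list (in disdrodb they come from the `nan_flags` entries of the sensor data format); `apply_nan_flags` and the example values are hypothetical:

```python
import math


def apply_nan_flags(columns: dict, nan_flags: dict) -> dict:
    """Replace each variable's flag values with NaN; other columns are kept as-is."""
    cleaned = {}
    for name, values in columns.items():
        flags = nan_flags.get(name, [])
        cleaned[name] = [math.nan if value in flags else value for value in values]
    return cleaned


data = {"sensor_temperature": [21, -9999, 22], "weather_code": [0, 61, 0]}
print(apply_nan_flags(data, {"sensor_temperature": [-9999]}))
```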
- disdrodb.l0.l0a_processing.sanitize_df(df, sensor_name, verbose=True, issue_dict=None, logger=None)
Sanitize a raw dataframe into a DISDRODB L0A dataframe.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information. The default is True.
issue_dict (dict) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are 'timesteps' and 'time_periods'. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.
- Returns:
Dataframe
- Return type:
pandas.DataFrame
- disdrodb.l0.l0a_processing.set_nan_invalid_values(df, sensor_name, logger=None, verbose=False)
Set invalid (class) values to np.nan.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information. The default is False.
- Returns:
Dataframe without invalid values.
- Return type:
pandas.DataFrame
- disdrodb.l0.l0a_processing.set_nan_outside_data_range(df, sensor_name, logger=None, verbose=False)
Set values outside the data range to np.nan.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information. The default is False.
- Returns:
Dataframe without values outside the expected data range.
- Return type:
pandas.DataFrame
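Range masking follows the same pattern, using each variable's `data_range` from the sensor data format. A minimal sketch, assuming `data_range` is a `[min, max]` pair with inclusive bounds (the inclusivity is an assumption); `mask_outside_range` is hypothetical:

```python
import math


def mask_outside_range(values, data_range):
    """Set values outside the inclusive [min, max] data_range to NaN."""
    vmin, vmax = data_range
    # NaN inputs fail both comparisons and therefore stay NaN.
    return [v if vmin <= v <= vmax else math.nan for v in values]


print(mask_outside_range([-1, 0, 5, 10, 11], [0, 10]))
```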
- disdrodb.l0.l0a_processing.strip_delimiter(string)
Remove the first and last delimiter occurrence from a string.
- disdrodb.l0.l0a_processing.strip_delimiter_from_raw_arrays(df)
Remove the first and last delimiter occurrence from the raw array fields.
- disdrodb.l0.l0a_processing.strip_string_spaces(df: DataFrame, sensor_name: str) → DataFrame
Strip leading/trailing spaces from dataframe string columns.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
- Returns:
Dataframe with string columns without leading/trailing spaces.
- Return type:
pandas.DataFrame
- disdrodb.l0.l0a_processing.write_l0a(df: DataFrame, filepath: str, force: bool = False, logger=None, verbose: bool = False)
Save the dataframe into an Apache Parquet file.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
filepath (str) – Output file path.
force (bool, optional) – Whether to overwrite existing data. If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. This is the default.
verbose (bool, optional) – Whether to print detailed processing information. The default is False.
- Raises:
ValueError – The input dataframe cannot be written as an Apache Parquet file.
NotImplementedError – The input dataframe cannot be processed.
disdrodb.l0.l0b_nc_processing module
Functions to process DISDRODB raw netCDF files into DISDRODB L0B netCDF files.
- disdrodb.l0.l0b_nc_processing.add_dataset_missing_variables(ds, missing_vars, sensor_name)
Add missing xr.Dataset variables as np.nan xr.DataArrays.
- disdrodb.l0.l0b_nc_processing.drop_time_periods(ds, time_periods: list)
Drop all time steps within any of the specified time intervals.
- Parameters:
ds (xarray.Dataset) – Input dataset with a ‘time’ dimension.
time_periods (list of tuple) – Each tuple is (start_time, end_time), datetime-like, inclusive.
- Returns:
Dataset with all times within the given periods removed.
- Return type:
xarray.Dataset
- Raises:
ValueError – If no timesteps remain after removal.
- disdrodb.l0.l0b_nc_processing.drop_timesteps(ds, timesteps: list)
Drop specific time steps from a Dataset.
- Parameters:
ds (xarray.Dataset) – Input dataset with a ‘time’ dimension.
timesteps (list) – List of datetime-like values to remove.
- Returns:
Dataset with specified timesteps removed.
- Return type:
xarray.Dataset
- Raises:
ValueError – If no timesteps remain after removal.
- disdrodb.l0.l0b_nc_processing.open_raw_netcdf_file(filepath, logger=None, engine='netcdf4', cache=False, chunks=None, decode_timedelta=False, **kwargs)
Open a raw netCDF file.
- Parameters:
filepath (str) – Path to the raw netCDF file.
- Returns:
Raw netCDF file as an xarray Dataset.
- Return type:
xarray.Dataset
- disdrodb.l0.l0b_nc_processing.remove_issue_timesteps(ds, issue_dict: dict, logger=None, verbose: bool = False)
Remove bad timesteps and time periods from an xarray Dataset according to issue definitions.
- Parameters:
ds (xarray.Dataset) – Input dataset with a ‘time’ dimension.
issue_dict (dict) – Dictionary with optional keys ‘timesteps’ (list of datetimes) and ‘time_periods’ (list of (start, end) tuples).
logger (any, optional) – Logger instance to record dropped steps, by default None.
verbose (bool, optional) – Whether to log informational messages, by default False.
- Returns:
Cleaned dataset.
- Return type:
xarray.Dataset
- Raises:
ValueError – If after removing specified timesteps/periods no data remains.
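The issue-based filtering can be sketched without xarray on a plain list of datetimes. This assumes, as documented above, that `timesteps` lists individual times and `time_periods` lists inclusive `(start, end)` tuples; `filter_issue_times` is a hypothetical name:

```python
from datetime import datetime


def filter_issue_times(times, issue_dict):
    """Drop listed timesteps and any time inside an inclusive (start, end) period."""
    bad_steps = set(issue_dict.get("timesteps", []))
    periods = issue_dict.get("time_periods", [])
    kept = [
        t
        for t in times
        if t not in bad_steps and not any(start <= t <= end for start, end in periods)
    ]
    if not kept:
        raise ValueError("No timesteps remain after removing the issue timesteps.")
    return kept


times = [datetime(2020, 1, 1, 0, minute) for minute in range(6)]
issue = {
    "timesteps": [datetime(2020, 1, 1, 0, 0)],
    "time_periods": [(datetime(2020, 1, 1, 0, 2), datetime(2020, 1, 1, 0, 3))],
}
print(filter_issue_times(times, issue))  # keeps 00:01, 00:04 and 00:05
```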
- disdrodb.l0.l0b_nc_processing.rename_dataset(ds, dict_names)
Rename xr.Dataset variables, coordinates and dimensions.
- disdrodb.l0.l0b_nc_processing.replace_custom_nan_flags(ds, dict_nan_flags, logger=None, verbose=False)
Set values corresponding to nan_flags to np.nan.
This function must be used in a reader, if necessary.
- Parameters:
ds (xarray.Dataset) – Input xarray dataset.
dict_nan_flags (dict) – Dictionary with the nan flag values to set as np.nan.
verbose (bool) – Whether to print detailed processing information. The default value is False.
- Returns:
Dataset without nan_flags values.
- Return type:
xarray.Dataset
- disdrodb.l0.l0b_nc_processing.replace_nan_flags(ds, sensor_name, verbose, logger=None)
Set values corresponding to nan_flags to np.nan.
- Parameters:
ds (xarray.Dataset) – Input xarray dataset.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns:
Dataset without nan_flags values.
- Return type:
xarray.Dataset
- disdrodb.l0.l0b_nc_processing.sanitize_ds(ds, sensor_name, metadata, issue_dict=None, verbose=False, logger=None)
Convert a raw xr.Dataset into a DISDRODB L0B xr.Dataset.
- Parameters:
ds (xarray.Dataset) – Raw xarray dataset.
metadata (dict) – Global metadata to attach as global attributes to the xr.Dataset.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns:
L0B xr.Dataset
- Return type:
xarray.Dataset
- disdrodb.l0.l0b_nc_processing.set_nan_invalid_values(ds, sensor_name, verbose, logger=None)
Set invalid (class) values to np.nan.
- Parameters:
ds (xarray.Dataset) – Input xarray dataset
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns:
Dataset without invalid values.
- Return type:
xarray.Dataset
- disdrodb.l0.l0b_nc_processing.set_nan_outside_data_range(ds, sensor_name, verbose, logger=None)
Set values outside the data range to np.nan.
- Parameters:
ds (xarray.Dataset) – Input xarray dataset
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns:
Dataset without values outside the expected data range.
- Return type:
xarray.Dataset
- disdrodb.l0.l0b_nc_processing.standardize_raw_dataset(ds, dict_names, sensor_name)
Preprocess a raw netCDF dataset to improve its compatibility with DISDRODB standards.
This function checks the validity of dict_names, then renames and subsets the data accordingly. If some variables specified in dict_names are missing, it adds them as np.nan xr.DataArrays.
- Parameters:
ds (xarray.Dataset) – Raw netCDF to be converted to DISDRODB standards.
dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.
sensor_name (str) – Sensor name.
- Returns:
ds – xarray Dataset with variables compliant with DISDRODB conventions.
- Return type:
xarray.Dataset
disdrodb.l0.l0b_processing module
Functions to process DISDRODB L0A files into DISDRODB L0B netCDF files.
- disdrodb.l0.l0b_processing.add_dataset_crs_coords(ds)
Add the CRS coordinate to the xr.Dataset.
- disdrodb.l0.l0b_processing.create_l0b_from_l0a(df: DataFrame, metadata: dict, logger=None, verbose: bool = False) → Dataset
Transform the L0A dataframe to the L0B xr.Dataset.
- Parameters:
df (pandas.DataFrame) – DISDRODB L0A dataframe. The raw drop number spectrum is reshaped to a 2D(+time) array. The raw drop concentration and velocity are reshaped to 1D(+time) arrays.
metadata (dict) – DISDRODB station metadata. To use this function outside the DISDRODB routines, the dictionary must contain the fields: sensor_name, latitude, longitude, altitude, platform_type.
verbose (bool, optional) – Whether to print detailed processing information. The default value is False.
- Returns:
DISDRODB L0B dataset.
- Return type:
xarray.Dataset
- Raises:
ValueError – Error if the DISDRODB L0B xarray dataset cannot be created.
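Reshaping the raw drop number spectrum turns one delimited string per timestep into a 2D (diameter × velocity) array. A stdlib sketch, assuming a comma-separated string whose values run through all diameter bins of the first velocity bin, then of the second, and so on; the actual order is sensor-specific (see `dimension_order` in the raw data format schema), and `unflatten_spectrum` is hypothetical:

```python
def unflatten_spectrum(raw: str, n_diameter: int, n_velocity: int, delimiter: str = ","):
    """Return a list of n_velocity rows, each holding n_diameter drop counts."""
    values = [int(v) for v in raw.split(delimiter)]
    expected = n_diameter * n_velocity
    if len(values) != expected:
        raise ValueError(f"Expected {expected} values, got {len(values)}.")
    # Row i holds the counts of velocity bin i across all diameter bins.
    return [values[i * n_diameter:(i + 1) * n_diameter] for i in range(n_velocity)]


print(unflatten_spectrum("0,1,2,3,4,5", n_diameter=3, n_velocity=2))
```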
- disdrodb.l0.l0b_processing.finalize_dataset(ds, sensor_name, attrs)
Finalize the DISDRODB L0B dataset.
- disdrodb.l0.l0b_processing.infer_split_str(string: str) → str
Infer the delimiter inside a string.
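Delimiter inference can be approximated by counting candidate separators. A minimal sketch, assuming a small, hypothetical candidate set; the actual candidates and tie-breaking used by `infer_split_str` are not documented here, and `infer_delimiter` is hypothetical:

```python
def infer_delimiter(string: str, candidates=(",", ";", " ")) -> str:
    """Return the candidate delimiter that occurs most often in ``string``."""
    counts = {delimiter: string.count(delimiter) for delimiter in candidates}
    best = max(counts, key=counts.get)  # ties resolve to the earliest candidate
    if counts[best] == 0:
        raise ValueError("No candidate delimiter found.")
    return best


print(infer_delimiter("000;001;002"))
```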
- disdrodb.l0.l0b_processing.retrieve_l0b_arrays(df: DataFrame, sensor_name: str, logger=None, verbose: bool = False) → dict
Retrieve the L0B data arrays.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
- Returns:
Dictionary with data arrays.
- Return type:
dict
- disdrodb.l0.l0b_processing.set_geolocation_coordinates(ds, attrs)
Add geolocation coordinates to the dataset.
- disdrodb.l0.l0b_processing.set_l0b_encodings(ds: Dataset, sensor_name: str)
Apply the L0B encodings to the xarray Dataset.
- Parameters:
ds (xarray.Dataset) – Input xarray dataset.
sensor_name (str) – Name of the sensor.
- Returns:
Output xarray dataset.
- Return type:
xarray.Dataset
- disdrodb.l0.l0b_processing.write_l0b(ds: Dataset, filepath: str, force=False) → None
Save the xarray dataset into a NetCDF file.
- Parameters:
ds (xarray.Dataset) – Input xarray dataset.
filepath (str) – Output file path.
force (bool, optional) – Whether to overwrite existing data. If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. This is the default.
disdrodb.l0.l0c_processing module
Functions to process DISDRODB L0B files into DISDRODB L0C netCDF files.
- disdrodb.l0.l0c_processing.check_timesteps_regularity(ds, sample_interval, verbose=False, logger=None)
Check the regularity of timesteps.
- disdrodb.l0.l0c_processing.create_daily_file(day, filepaths, measurement_intervals, ensure_variables_equality=True, logger=None, verbose=True)
Create a daily file by merging and processing data from multiple filepaths.
- Parameters:
day (str or numpy.datetime64) – The day for which the daily file is to be created. Should be in a format that can be converted to numpy.datetime64.
filepaths (list of str) – List of filepaths to the data files to be processed.
- Returns:
The processed dataset containing data for the specified day.
- Return type:
xarray.Dataset
- Raises:
ValueError – If fewer than 5 timesteps are available for the specified day.
Notes
- The function adds a tolerance for searching timesteps before and after 00:00 to account for imprecise logging times.
- It checks that duplicated timesteps have the same raw drop number values.
- The function infers the time integration sample interval and regularizes timesteps to handle trailing seconds.
- The data are loaded into memory and connections to source files are closed before returning the dataset.
- disdrodb.l0.l0c_processing.drop_timesteps_with_invalid_sample_interval(ds, measurement_intervals, verbose=True, logger=None)
Drop timesteps with unexpected sample intervals.
- disdrodb.l0.l0c_processing.finalize_l0c_dataset(ds, sample_interval, start_day, end_day, verbose=True, logger=None)
Finalize an L0C dataset with a unique sampling interval.
It adds the sampling_interval coordinate and regularizes the timesteps for trailing seconds.
- disdrodb.l0.l0c_processing.get_files_per_days(filepaths)
Organize files by the days they cover based on their start and end times.
- Parameters:
filepaths (list of str) – List of file paths to be processed.
- Returns:
Dictionary where keys are days (as strings) and values are lists of file paths that cover those days.
- Return type:
dict
Notes
This function adds a tolerance of 60 seconds to account for imprecise time logging by the sensors.
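The day assignment with a 60-second tolerance can be sketched as follows. Unlike `get_files_per_days`, which takes file paths, this hypothetical `group_files_by_day` assumes each file's start and end times are already known:

```python
from datetime import datetime, timedelta


def group_files_by_day(file_times: dict, tolerance_seconds: int = 60) -> dict:
    """Map ISO days to the files whose padded [start, end] span touches that day."""
    tolerance = timedelta(seconds=tolerance_seconds)
    files_per_day = {}
    for path, (start, end) in file_times.items():
        day = (start - tolerance).date()
        last_day = (end + tolerance).date()
        while day <= last_day:
            files_per_day.setdefault(day.isoformat(), []).append(path)
            day += timedelta(days=1)
    return files_per_day


files = {
    # Ends 30 s before midnight: the tolerance also assigns it to 2020-01-02.
    "a.nc": (datetime(2020, 1, 1, 12, 0), datetime(2020, 1, 1, 23, 59, 30)),
    "b.nc": (datetime(2020, 1, 2, 6, 0), datetime(2020, 1, 2, 18, 0)),
}
print(group_files_by_day(files))
```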
- disdrodb.l0.l0c_processing.has_same_value_over_time(da)
Check if a DataArray has the same value over all timesteps, considering NaNs as equal.
- Parameters:
da (xarray.DataArray) – The DataArray to check. Must have a ‘time’ dimension.
- Returns:
True if the values are the same (or NaN in the same positions) across all timesteps, False otherwise.
- Return type:
bool
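The NaN-aware comparison can be sketched on nested lists, one list of values per timestep, standing in for an `xarray.DataArray` with a `time` dimension; `same_values_over_time` is hypothetical:

```python
import math


def _equal(a, b) -> bool:
    """Compare two values, treating NaN as equal to NaN."""
    if isinstance(a, float) and isinstance(b, float) and math.isnan(a) and math.isnan(b):
        return True
    return a == b


def same_values_over_time(series) -> bool:
    """Return True if every timestep holds the same values as the first one."""
    if not series:
        return True
    first = series[0]
    return all(
        len(row) == len(first) and all(_equal(x, y) for x, y in zip(row, first))
        for row in series[1:]
    )


nan = float("nan")
print(same_values_over_time([[1.0, nan], [1.0, nan]]))  # True
print(same_values_over_time([[1.0, nan], [1.0, 2.0]]))  # False
```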
- disdrodb.l0.l0c_processing.remove_duplicated_timesteps(ds, ensure_variables_equality=True, logger=None, verbose=True)
Remove duplicated timesteps from an xarray dataset.
- disdrodb.l0.l0c_processing.retrieve_possible_measurement_intervals(metadata)
Retrieve the list of possible measurement intervals.
- disdrodb.l0.l0c_processing.split_dataset_by_sampling_intervals(ds, measurement_intervals, min_sample_interval=10, min_block_size=5)
Split a dataset into subsets where each subset has a consistent sampling interval.
- Parameters:
ds (xarray.Dataset) – The input dataset with a ‘time’ dimension.
measurement_intervals (list or array-like) – A list of possible primary sampling intervals (in seconds) that the dataset might have.
min_sample_interval (int, optional) – The minimum expected sampling interval in seconds. Defaults to 10s.
min_block_size (float, optional) – The minimum number of timesteps with a given sampling interval required for that portion of data to be kept; shorter portions are discarded. Defaults to 5 timesteps.
- Returns:
A dictionary where keys are the identified sampling intervals (in seconds), and values are xarray.Datasets containing only data from those intervals.
- Return type:
dict
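A simplified sketch of the splitting logic, operating on timestamps expressed in seconds: each timestep is tagged with the interval to its predecessor (the first one borrows the next interval), consecutive timesteps sharing an interval form a block, and blocks with unexpected intervals or fewer than `min_block_size` timesteps are dropped. The `min_sample_interval` handling of the real function is omitted, and `split_by_sampling_interval` is hypothetical:

```python
from itertools import groupby


def split_by_sampling_interval(times, measurement_intervals, min_block_size=5):
    """Group timesteps (seconds) into {interval: [timesteps]} blocks."""
    if len(times) < 2:
        return {}
    # Interval preceding each timestep; the first timestep borrows the next one.
    deltas = [times[1] - times[0]] + [b - a for a, b in zip(times, times[1:])]
    blocks = {}
    for interval, group in groupby(zip(deltas, times), key=lambda pair: pair[0]):
        block = [t for _, t in group]
        # Keep only expected intervals with enough consecutive timesteps.
        if interval in measurement_intervals and len(block) >= min_block_size:
            blocks.setdefault(interval, []).extend(block)
    return blocks


times = [0, 60, 120, 180, 240, 300, 330, 360, 390, 420, 450, 480]
print(split_by_sampling_interval(times, measurement_intervals=[30, 60]))
```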
disdrodb.l0.routines module
Implement DISDRODB L0 processing.
- disdrodb.l0.routines.run_l0a_station(data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)
Run the L0A processing of a specific DISDRODB station when invoked from the terminal.
This function is intended to be called through the disdrodb_run_l0a_station command-line interface.
- Parameters:
data_source (str) – The name of the institution (for campaigns spanning multiple countries) or the name of the country (for campaigns or sensor networks within a single country). Must be provided in UPPER CASE.
campaign_name (str) – The name of the campaign. Must be provided in UPPER CASE.
station_name (str) – The name of the station.
force (bool, optional) – If True, existing data in the destination directories will be overwritten. If False (default), an error will be raised if data already exist in the destination directories.
verbose (bool, optional) – If True, detailed processing information will be printed to the terminal. If False (default), less information will be displayed.
parallel (bool, optional) – If True (default), files will be processed in multiple processes simultaneously, with each process using a single thread. If False, files will be processed sequentially in a single process, and multi-threading will be automatically exploited to speed up I/O tasks.
debugging_mode (bool, optional) – If True, the amount of data processed will be reduced. Only the first 3 raw data files will be processed. The default value is False.
data_archive_dir (str, optional) – The base directory of DISDRODB, expected in the format <...>/DISDRODB. If not specified, the path specified in the DISDRODB active configuration will be used.
- disdrodb.l0.routines.run_l0b_station(data_source, campaign_name, station_name, remove_l0a: bool = False, force: bool = False, verbose: bool = True, parallel: bool = True, debugging_mode: bool = False, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)
Run the L0B processing of a specific DISDRODB station when invoked from the terminal.
This function is intended to be called through the disdrodb_run_l0b_station command-line interface.
- Parameters:
data_source (str) – The name of the institution (for campaigns spanning multiple countries) or the name of the country (for campaigns or sensor networks within a single country). Must be provided in UPPER CASE.
campaign_name (str) – The name of the campaign. Must be provided in UPPER CASE.
station_name (str) – The name of the station.
force (bool, optional) – If True, existing data in the destination directories will be overwritten. If False (default), an error will be raised if data already exist in the destination directories.
verbose (bool, optional) – If True (default), detailed processing information will be printed to the terminal. If False, less information will be displayed.
parallel (bool, optional) – If True (default), files will be processed in multiple processes simultaneously, with each process using a single thread to avoid issues with the HDF/netCDF library. If False, files will be processed sequentially in a single process, and multi-threading will be automatically exploited to speed up I/O tasks.
debugging_mode (bool, optional) – If True, the amount of data processed will be reduced. Only the first 100 rows of 3 L0A files will be processed. The default value is False.
remove_l0a (bool, optional) – Whether to remove the processed L0A files. The default value is False.
data_archive_dir (str, optional) – The base directory of DISDRODB, expected in the format <...>/DISDRODB. If not specified, the path specified in the DISDRODB active configuration will be used.
- disdrodb.l0.routines.run_l0c_station(data_source, campaign_name, station_name, remove_l0b: bool = False, force: bool = False, verbose: bool = True, parallel: bool = True, debugging_mode: bool = False, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)
Run the L0C processing of a specific DISDRODB station when invoked from the terminal.
The DISDRODB L0A and L0B routines only convert the source raw data into the Apache Parquet (L0A) and netCDF (L0B) formats. The DISDRODB L0C routine ingests L0B files and performs data homogenization. The DISDRODB L0C routine takes care of:
removing duplicated timesteps across files,
merging/splitting files into daily files,
regularizing timesteps for potentially trailing seconds,
ensuring L0C files with unique sample intervals.
Duplicated timesteps are automatically dropped if their variable values coincides, otherwise an error is raised.
This function is intended to be called through the
disdrodb_run_l0c_stationcommand-line interface.- Parameters:
data_source (str) – The name of the institution (for campaigns spanning multiple countries) or the name of the country (for campaigns or sensor networks within a single country). Must be provided in UPPER CASE.
campaign_name (str) – The name of the campaign. Must be provided in UPPER CASE.
station_name (str) – The name of the station.
force (bool, optional) – If
True, existing data in the destination directories will be overwritten. IfFalse(default), an error will be raised if data already exists in the destination directories.verbose (bool, optional) – If
True(default), detailed processing information will be printed to the terminal. IfFalse, less information will be displayed.parallel (bool, optional) – If
True, files will be processed in multiple processes simultaneously, with each process using a single thread to avoid issues with the HDF/netCDF library. IfFalse(default), files will be processed sequentially in a single process, and multi-threading will be automatically exploited to speed up I/O tasks.debugging_mode (bool, optional) – If
True, the amount of data processed will be reduced. Only the first 3 files will be processed. The default value isFalse.remove_l0b (bool, optional) – Whether to remove the processed L0B files. The default value is
False.data_archive_dir (str, optional) – The base directory of DISDRODB, expected in the format
<...>/DISDRODB. If not specified, the path specified in the DISDRODB active configuration will be used.
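The duplicate-handling rule described above (drop a duplicated timestep when all its variable values coincide, raise an error otherwise) can be sketched in plain Python. This is a minimal illustration, not the disdrodb implementation: the helper name drop_duplicated_timesteps and the record layout (a list of (timestep, values) pairs, where values is a dict of variable values) are hypothetical.

```python
from datetime import datetime
from typing import Iterable


def drop_duplicated_timesteps(records: Iterable[tuple[datetime, dict]]) -> list[tuple[datetime, dict]]:
    """Keep one record per timestep and raise if duplicated timesteps disagree.

    Hypothetical helper illustrating the L0C duplicate rule; not the disdrodb code.
    """
    unique: dict[datetime, dict] = {}
    for timestep, values in records:
        if timestep not in unique:
            unique[timestep] = values
        elif unique[timestep] != values:
            # Same timestep but different variable values: no safe choice exists.
            raise ValueError(f"Conflicting values for duplicated timestep {timestep}.")
        # Otherwise the duplicate is identical and is silently dropped.
    # Return the surviving records in chronological order.
    return sorted(unique.items(), key=lambda item: item[0])
```

Note that float("nan") != float("nan") in Python, so this sketch would report two identical records containing missing values as conflicting; a real implementation needs a NaN-aware comparison.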
disdrodb.l0.standards module
Retrieve L0 sensor standards.
- disdrodb.l0.standards.allowed_l0_variables(sensor_name: str) → list
Get the list of allowed L0 variables for a given sensor.
- disdrodb.l0.standards.get_bin_coords_dict(sensor_name: str) → dict
Retrieve the diameter (and velocity) bin coordinates.
- disdrodb.l0.standards.get_data_format_dict(sensor_name: str) → dict
Get a dictionary containing the data format of each sensor variable.
- disdrodb.l0.standards.get_data_range_dict(sensor_name: str) → dict
Get the data range of each variable.
- disdrodb.l0.standards.get_diameter_bin_center(sensor_name: str) → list
Get the diameter bin centers.
- disdrodb.l0.standards.get_diameter_bin_lower(sensor_name: str) → list
Get the diameter bin lower bounds.
- disdrodb.l0.standards.get_diameter_bin_upper(sensor_name: str) → list
Get the diameter bin upper bounds.
- disdrodb.l0.standards.get_diameter_bin_width(sensor_name: str) → list
Get the diameter bin widths.
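The relationship between the four diameter-bin getters above can be sketched as follows, assuming contiguous bins whose centers are the midpoints of their bounds (an assumption: some sensors publish rounded manufacturer values instead). The helper bin_center_and_width is hypothetical, and the bounds used in the usage note are illustrative, not a real sensor's classes.

```python
def bin_center_and_width(lower: list[float], upper: list[float]) -> tuple[list[float], list[float]]:
    """Derive bin centers and widths from lower/upper bounds.

    Hypothetical helper: assumes each center is the midpoint of its bin.
    """
    if len(lower) != len(upper):
        raise ValueError("lower and upper must have the same length.")
    centers = [(lo + up) / 2 for lo, up in zip(lower, upper)]
    widths = [up - lo for lo, up in zip(lower, upper)]
    return centers, widths
```

For example, bounds [0.0, 0.25, 0.5] and [0.25, 0.5, 1.0] give centers [0.125, 0.375, 0.75] and widths [0.25, 0.25, 0.5].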
- disdrodb.l0.standards.get_diameter_bins_dict(sensor_name: str) → dict
Get a dictionary with the diameter bins information of sensor_name.
- disdrodb.l0.standards.get_dims_size_dict(sensor_name: str) → dict
Get the number of bins for each dimension.
- disdrodb.l0.standards.get_field_nchar_dict(sensor_name: str) → dict
Get the total number of characters of each field from the instrument default string standards.
Important note: it also counts the decimal comma and the minus sign.
- disdrodb.l0.standards.get_field_ndigits_decimals_dict(sensor_name: dict) → dict
Get the number of digits to the right of the decimal comma from the instrument default string standards.
Example: 123,45 → 45 → 2 decimal digits.
- disdrodb.l0.standards.get_field_ndigits_dict(sensor_name: str) → dict
Get the number of digits from the instrument default string standards.
Important note: it excludes the decimal comma but counts the minus sign.
- disdrodb.l0.standards.get_field_ndigits_natural_dict(sensor_name: str) → dict
Get the number of digits to the left of the decimal comma from the instrument default string standards.
Example: 123,45 → 123 → 3 natural digits.
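The character and digit counting conventions above can be illustrated with a small helper. The function name count_field_characteristics is hypothetical; accepting both "." and "," as decimal separator and excluding the minus sign from the natural digits are assumptions, not documented disdrodb behaviour.

```python
def count_field_characteristics(field: str) -> dict:
    """Count characters and digits of a raw field string.

    Hypothetical helper mirroring the conventions described above:
    - n_characters counts every character, including the separator and minus sign;
    - n_digits excludes the decimal separator but counts the minus sign;
    - n_naturals / n_decimals count the digits left / right of the separator.
    """
    # Assumption: both "." and "," may act as the decimal separator.
    normalized = field.replace(",", ".")
    natural_part, _, decimal_part = normalized.partition(".")
    n_naturals = sum(ch.isdigit() for ch in natural_part)
    n_decimals = sum(ch.isdigit() for ch in decimal_part)
    n_minus = field.count("-")
    return {
        "n_characters": len(field),
        "n_digits": n_naturals + n_decimals + n_minus,
        "n_naturals": n_naturals,
        "n_decimals": n_decimals,
    }
```

With this sketch, "123,45" yields 6 characters, 5 digits, 3 natural and 2 decimal digits, while "-123.45" yields 7 characters and 6 digits, because the minus sign counts as a character and as a digit.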
- disdrodb.l0.standards.get_l0a_dtype(sensor_name: str) → dict
Get a dictionary containing the L0A dtype of each variable.
- disdrodb.l0.standards.get_l0a_encodings_dict(sensor_name: str) → dict
Get a dictionary containing the L0A encodings.
- disdrodb.l0.standards.get_l0b_cf_attrs_dict(sensor_name: str) → dict
Get a dictionary containing the CF attributes of each sensor variable.
- disdrodb.l0.standards.get_l0b_encodings_dict(sensor_name: str) → dict
Get a dictionary containing the encodings used to write L0B netCDFs.
- disdrodb.l0.standards.get_n_diameter_bins(sensor_name)
Get the number of diameter bins.
- disdrodb.l0.standards.get_n_velocity_bins(sensor_name)
Get the number of velocity bins.
- disdrodb.l0.standards.get_nan_flags_dict(sensor_name: str) → dict
Get the nan_flags of each variable.
- disdrodb.l0.standards.get_raw_array_dims_order(sensor_name: str) → dict
Get the dimension order of the raw fields.
The dimension order specified for raw_drop_number controls the reshaping of the raw precipitation spectrum.
Examples
OTT Parsivel spectrum [v1d1 … v1d32, v2d1, …, v2d32] → dimension_order = ["velocity_bin_center", "diameter_bin_center"]
Thies LPM spectrum [v1d1 … v20d1, v1d2, …, v20d2] → dimension_order = ["diameter_bin_center", "velocity_bin_center"]
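The effect of dimension_order can be sketched as a row-major reshape in which the first listed dimension varies slowest. The helper reshape_spectrum and the tiny 2-velocity by 3-diameter spectra in the usage note are hypothetical; real spectra are much larger.

```python
def reshape_spectrum(flat: list, dimension_order: list[str], dims_size: dict[str, int]) -> list[list]:
    """Reshape a flat raw spectrum into a nested list following dimension_order.

    Hypothetical helper: the first dimension in dimension_order is the outer
    (slowest-varying) one, i.e. a row-major (C-order) reshape.
    """
    n_outer = dims_size[dimension_order[0]]
    n_inner = dims_size[dimension_order[1]]
    if len(flat) != n_outer * n_inner:
        raise ValueError("Unexpected number of values in the raw spectrum.")
    # Each consecutive run of n_inner values forms one row of the outer dimension.
    return [flat[i * n_inner:(i + 1) * n_inner] for i in range(n_outer)]
```

With dims_size = {"velocity_bin_center": 2, "diameter_bin_center": 3}, the Parsivel-ordered input ["v1d1", "v1d2", "v1d3", "v2d1", "v2d2", "v2d3"] reshaped with ["velocity_bin_center", "diameter_bin_center"] is indexed as result[velocity][diameter], so result[1][0] is "v2d1". An LPM-ordered input reshaped with ["diameter_bin_center", "velocity_bin_center"] is indexed as result[diameter][velocity].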
- disdrodb.l0.standards.get_raw_array_nvalues(sensor_name: str) → dict
Get a dictionary with the number of values expected for each raw array.
- disdrodb.l0.standards.get_sensor_logged_variables(sensor_name: str) → list
Get the list of variables logged by the sensor.
- disdrodb.l0.standards.get_valid_coordinates_names(sensor_name)
Get the list of valid coordinate names for DISDRODB L0B.
- disdrodb.l0.standards.get_valid_dimension_names(sensor_name)
Get the list of valid dimension names for DISDRODB L0B.
- disdrodb.l0.standards.get_valid_names(sensor_name)
Return the list of valid variable and coordinate names for DISDRODB L0B.
- disdrodb.l0.standards.get_valid_values_dict(sensor_name: str) → dict
Get a dictionary with the list of valid values of each variable.
- disdrodb.l0.standards.get_valid_variable_names(sensor_name)
Get the list of valid variable names.
- disdrodb.l0.standards.get_variables_dimension(sensor_name: str)
Return a dictionary with the dimensions of each variable of an L0B product.
- disdrodb.l0.standards.get_velocity_bin_center(sensor_name: str) → list
Get the velocity bin centers.
- disdrodb.l0.standards.get_velocity_bin_lower(sensor_name: str) → list
Get the velocity bin lower bounds.
- disdrodb.l0.standards.get_velocity_bin_upper(sensor_name: str) → list
Get the velocity bin upper bounds.
disdrodb.l0.template_tools module
Tools that help implement the DISDRODB L0 readers.
- disdrodb.l0.template_tools.check_column_names(column_names: list, sensor_name: str) → None
Check that the column names respect the DISDRODB standards.
- disdrodb.l0.template_tools.get_decimal_ndigits(string: str) → int
Get the number of decimal digits.
- disdrodb.l0.template_tools.get_df_columns_unique_values_dict(df: DataFrame, column_indices: int | slice | list | None = None, column_names: bool = True)
Create a dictionary {column: unique values}.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
column_indices (Union[int,slice,list], optional) – Column indices. If None, select all columns.
column_names (bool, optional) – If True, the dictionary keys are the column names. The default value is True.
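A minimal pandas sketch of the {column: unique values} dictionary, showing how an int, slice, list or None column_indices can be resolved to column positions. The helper name columns_unique_values_dict is hypothetical; dropping NaN before collecting unique values and keying by positional index when column_names is False are assumptions, not taken from the disdrodb source.

```python
import pandas as pd


def columns_unique_values_dict(df: pd.DataFrame, column_indices=None, column_names: bool = True) -> dict:
    """Map each selected column to its sorted unique non-NaN values.

    Hypothetical sketch; column_indices may be an int, a slice,
    a list of ints, or None (all columns).
    """
    positions = list(range(df.shape[1]))
    if column_indices is None:
        selected = positions
    elif isinstance(column_indices, int):
        selected = [positions[column_indices]]
    elif isinstance(column_indices, slice):
        selected = positions[column_indices]
    else:
        selected = [positions[i] for i in column_indices]
    result = {}
    for pos in selected:
        # Key by column name, or by positional index (an assumption of this sketch).
        key = df.columns[pos] if column_names else pos
        result[key] = sorted(df.iloc[:, pos].dropna().unique().tolist())
    return result
```

Resolving every selector to integer positions first keeps a single code path for the four accepted column_indices types.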
- disdrodb.l0.template_tools.get_natural_ndigits(string: str) → int
Get the number of natural digits.
- disdrodb.l0.template_tools.get_nchar(string: str) → int
Get the number of characters.
- disdrodb.l0.template_tools.get_ndigits(string: str) → int
Get the total number of numeric digits.
- disdrodb.l0.template_tools.get_unique_sorted_values(array)
Return the unique sorted values.
It handles np.nan within an array of strings by converting the object dtype to str.
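Converting to str helps because sorting an object array that mixes strings and np.nan requires comparing a float with a str, which raises a TypeError in Python 3. The sketch below (hypothetical helper unique_sorted_values, using NumPy) casts object arrays to str first, so np.nan becomes the string "nan" and np.unique can sort the values.

```python
import numpy as np


def unique_sorted_values(array) -> np.ndarray:
    """Return unique sorted values, tolerating np.nan inside string arrays.

    Hypothetical sketch: object arrays (e.g. strings mixed with np.nan) are
    cast to str first, so np.nan becomes the string "nan".
    """
    array = np.asarray(array)
    if array.dtype == object:
        array = array.astype(str)
    # np.unique returns the sorted unique elements.
    return np.unique(array)
```

For example, unique_sorted_values(np.array(["b", np.nan, "a"], dtype=object)) returns the values "a", "b" and "nan"; numeric inputs are left untouched.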
- disdrodb.l0.template_tools.infer_column_names(df: DataFrame, sensor_name: str, row_idx: int = 1)
Try to guess the dataframe column names based on string characteristics.
- Parameters:
df (pandas.DataFrame) – The dataframe to analyse.
sensor_name (str) – Name of the sensor.
row_idx (int, optional) – The row index of the dataframe to use to infer the column names. The default row index is 1.
- Returns:
Dictionary whose keys are the column IDs and whose values are the guessed column names.
- Return type:
dict
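Guessing column names from string characteristics can be sketched as matching each value of a sample row against per-variable format entries. The helper guess_column_names, the data_format layout (borrowing the n_characters and n_decimals field names of the RAW data format schema) and the variable names in the usage note are hypothetical; the actual infer_column_names may rely on more characteristics.

```python
def guess_column_names(row_values: list, data_format: dict) -> dict:
    """Guess candidate variable names for each column of one sample row.

    Hypothetical sketch: a column matches a variable when the string length
    and the number of decimal digits (after ".") agree with the variable's
    {"n_characters": int, "n_decimals": int} format entry.
    """
    guesses = {}
    for col_idx, value in enumerate(row_values):
        value = str(value)
        n_decimals = len(value.split(".", 1)[1]) if "." in value else 0
        guesses[col_idx] = [
            name
            for name, fmt in data_format.items()
            if fmt.get("n_characters") == len(value) and fmt.get("n_decimals") == n_decimals
        ]
    return guesses
```

With data_format = {"variable_a": {"n_characters": 8, "n_decimals": 3}, "variable_b": {"n_characters": 3, "n_decimals": 0}}, the row ["0000.000", "021"] maps column 0 to ["variable_a"] and column 1 to ["variable_b"]. A column can match several variables, which is why the result lists candidates rather than a single name.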
- disdrodb.l0.template_tools.print_allowed_column_names(sensor_name: str) → None
Print the valid column names from the DISDRODB standards.
- Parameters:
sensor_name (str) – Name of the sensor.
- disdrodb.l0.template_tools.print_df_column_names(df: DataFrame) → None
Print the dataframe column names.
- Parameters:
df (pandas.DataFrame) – The dataframe.
- disdrodb.l0.template_tools.print_df_columns_unique_values(df: DataFrame, column_indices: int | slice | list | None = None, print_column_names: bool = True) → None
Print the unique values of each column.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
column_indices (Union[int,slice,list], optional) – Column indices. If None, select all columns.
print_column_names (bool, optional) – If True, print the column names. The default value is True.
- disdrodb.l0.template_tools.print_df_first_n_rows(df: DataFrame, n: int = 5, print_column_names: bool = True) → None
Print the first n rows of the dataframe, column by column.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
n (int, optional) – Number of rows. The default is 5.
print_column_names (bool, optional) – If True, print the column names. The default value is True.
- disdrodb.l0.template_tools.print_df_random_n_rows(df: DataFrame, n: int = 5, print_column_names: bool = True) → None
Print n randomly chosen rows of the dataframe, column by column.
- Parameters:
df (pandas.DataFrame) – The dataframe.
n (int, optional) – The number of rows to print. The default is 5.
print_column_names (bool, optional) – If True, print the column names. The default value is True.
- disdrodb.l0.template_tools.print_df_summary_stats(df: DataFrame, column_indices: int | slice | list | None = None, print_column_names: bool = True)
Create a summary of column statistics.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
column_indices (Union[int,slice,list], optional) – Column indices. If None, select all columns.
print_column_names (bool, optional) – If True, print the column names. The default value is True.
- Raises:
ValueError – If the column types are not numeric.
- disdrodb.l0.template_tools.print_df_with_any_nan_rows(df: DataFrame) → None
Print the rows that contain at least one NaN value.
- Parameters:
df (pandas.DataFrame) – Input dataframe.
- disdrodb.l0.template_tools.str_has_decimal_digits(string: str) → bool
Check whether a string has decimal digits.
Module contents
DISDRODB L0 software.