disdrodb.l0 package#
Subpackages#
Submodules#
disdrodb.l0.check_configs module#
Check configuration files.
- class disdrodb.l0.check_configs.L0BEncodingSchema(*, contiguous: bool, dtype: str, zlib: bool, complevel: int, shuffle: bool, fletcher32: bool, chunksizes: Optional[Union[int, list[int]]])[source]#
Bases:
BaseModel
Pydantic model for DISDRODB L0B encodings.
- chunksizes: Optional[Union[int, list[int]]]#
- complevel: int#
- contiguous: bool#
- dtype: str#
- fletcher32: bool#
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}#
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'chunksizes': FieldInfo(annotation=Union[int, list[int], NoneType], required=True), 'complevel': FieldInfo(annotation=int, required=True), 'contiguous': FieldInfo(annotation=bool, required=True), 'dtype': FieldInfo(annotation=str, required=True), 'fletcher32': FieldInfo(annotation=bool, required=True), 'shuffle': FieldInfo(annotation=bool, required=True), 'zlib': FieldInfo(annotation=bool, required=True)}#
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- model_post_init(__context: Any) None #
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters
self – The BaseModel instance.
__context – The context.
- shuffle: bool#
- zlib: bool#
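The field list above can be read as a contract for one variable's encoding entry. As a rough illustration only, the following standard-library sketch applies comparable type checks to a plain dictionary. It does not use pydantic, `check_encoding` and the example values are hypothetical and not part of disdrodb, and the checks are stricter than pydantic's default behaviour, which may coerce compatible values.

```python
# Hypothetical, standard-library-only sketch of the checks implied by the
# L0BEncodingSchema fields. This is not the disdrodb implementation.
EXPECTED_TYPES = {
    "contiguous": bool,
    "dtype": str,
    "zlib": bool,
    "complevel": int,
    "shuffle": bool,
    "fletcher32": bool,
}


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def check_encoding(encoding):
    """Raise ValueError if the encoding dictionary violates the field types."""
    expected_keys = set(EXPECTED_TYPES) | {"chunksizes"}
    missing = expected_keys - set(encoding)
    if missing:
        raise ValueError(f"Missing encoding keys: {sorted(missing)}")
    for key, expected_type in EXPECTED_TYPES.items():
        value = encoding[key]
        if expected_type is int:
            valid = _is_int(value)
        else:
            valid = isinstance(value, expected_type)
        if not valid:
            raise ValueError(f"'{key}' must be of type {expected_type.__name__}")
    # chunksizes may be None, an integer, or a list of integers.
    chunksizes = encoding["chunksizes"]
    if chunksizes is None or _is_int(chunksizes):
        return
    if isinstance(chunksizes, list) and all(_is_int(c) for c in chunksizes):
        return
    raise ValueError("'chunksizes' must be None, an int, or a list of int")


# Example values are illustrative assumptions, not disdrodb defaults.
example_encoding = {
    "contiguous": False,
    "dtype": "float32",
    "zlib": True,
    "complevel": 3,
    "shuffle": True,
    "fletcher32": False,
    "chunksizes": [5000],
}
check_encoding(example_encoding)  # passes silently
```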
- class disdrodb.l0.check_configs.RawDataFormatSchema(*, n_digits: Optional[int], n_characters: Optional[int], n_decimals: Optional[int], n_naturals: Optional[int], data_range: Optional[list[float]], nan_flags: Optional[Union[int, str]] = None, valid_values: Optional[list[float]] = None, dimension_order: Optional[list[str]] = None, n_values: Optional[int] = None, field_number: Optional[str] = None)[source]#
Bases:
BaseModel
Pydantic model for the DISDRODB Raw Data Format YAML files.
- data_range: Optional[list[float]]#
- dimension_order: Optional[list[str]]#
- field_number: Optional[str]#
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}#
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'data_range': FieldInfo(annotation=Union[list[float], NoneType], required=True), 'dimension_order': FieldInfo(annotation=Union[list[str], NoneType], required=False), 'field_number': FieldInfo(annotation=Union[str, NoneType], required=False), 'n_characters': FieldInfo(annotation=Union[int, NoneType], required=True), 'n_decimals': FieldInfo(annotation=Union[int, NoneType], required=True), 'n_digits': FieldInfo(annotation=Union[int, NoneType], required=True), 'n_naturals': FieldInfo(annotation=Union[int, NoneType], required=True), 'n_values': FieldInfo(annotation=Union[int, NoneType], required=False), 'nan_flags': FieldInfo(annotation=Union[int, str, NoneType], required=False), 'valid_values': FieldInfo(annotation=Union[list[float], NoneType], required=False)}#
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- n_characters: Optional[int]#
- n_decimals: Optional[int]#
- n_digits: Optional[int]#
- n_naturals: Optional[int]#
- n_values: Optional[int]#
- nan_flags: Optional[Union[int, str]]#
- valid_values: Optional[list[float]]#
- exception disdrodb.l0.check_configs.SchemaValidationException[source]#
Bases:
Exception
Exception raised when schema validation fails.
- disdrodb.l0.check_configs.check_all_sensors_configs() None [source]#
Check all sensors configuration YAML files.
- disdrodb.l0.check_configs.check_l0a_encoding(sensor_name: str) None [source]#
Check the l0a_encodings.yml file.
- Parameters
sensor_name (str) – Name of the sensor.
- Raises
ValueError – Error raised if the value of a key is not in the list of accepted values.
disdrodb.l0.check_standards module#
Check data standards.
- disdrodb.l0.check_standards.check_l0a_column_names(df: DataFrame, sensor_name: str) None [source]#
Check that the dataframe columns respect the DISDRODB standards.
- Parameters
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
- Raises
ValueError – Error if some columns do not meet the DISDRODB standards or if the 'time' column is missing in the dataframe.
- disdrodb.l0.check_standards.check_l0a_standards(df: DataFrame, sensor_name: str, verbose: bool = True) None [source]#
Check that a dataframe respects the DISDRODB L0A standards.
- Parameters
df (pd.DataFrame) – L0A dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool, optional) – Whether to print detailed processing information. The default is True.
- Raises
ValueError – Error if some columns have inconsistent values.
disdrodb.l0.io module#
Define DISDRODB Data Input/Output.
- disdrodb.l0.io.get_l0a_filepaths(processed_dir, station_name, debugging_mode=False)[source]#
Retrieve the L0A files for a given station.
- Parameters
processed_dir (str) – Directory of the campaign where to search for the L0A files. Format: <..>/DISDRODB/Processed/<DATA_SOURCE>/<CAMPAIGN_NAME>.
station_name (str) – ID of the station.
debugging_mode (bool, optional) – If True, select at most 3 files for debugging purposes. The default is False.
- Returns
filepaths – List of L0A file paths.
- Return type
list
- disdrodb.l0.io.get_raw_filepaths(raw_dir, station_name, glob_patterns, verbose=False, debugging_mode=False)[source]#
Get the list of files from a directory based on input parameters.
Currently concatenates all files provided by the glob patterns. In future, this might be modified to enable DISDRODB processing when raw data are separated in multiple files.
- Parameters
raw_dir (str) – Directory of the campaign where to search for files. Format: <..>/DISDRODB/Raw/<DATA_SOURCE>/<CAMPAIGN_NAME>.
station_name (str) – ID of the station.
verbose (bool, optional) – Whether to print detailed processing information. The default is False.
debugging_mode (bool, optional) – If True, select at most 3 files for debugging purposes. The default is False.
- Returns
filepaths – List of file paths.
- Return type
list
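The file search described above can be pictured with a short standard-library sketch. It assumes the raw files live under `<raw_dir>/data/<station_name>`, accepts one or several glob patterns, and keeps at most 3 files in debugging mode. `list_raw_files` is a hypothetical helper, and the sorting and de-duplication are assumptions, not documented behaviour of `get_raw_filepaths`.

```python
import glob
import os


def list_raw_files(raw_dir, station_name, glob_patterns, debugging_mode=False):
    """Hypothetical sketch: list the files matching the glob pattern(s)."""
    # Accept either a single pattern or a list of patterns.
    if isinstance(glob_patterns, str):
        glob_patterns = [glob_patterns]
    data_dir = os.path.join(raw_dir, "data", station_name)
    filepaths = []
    for pattern in glob_patterns:
        # Collect the matches of every pattern into a single list.
        filepaths.extend(glob.glob(os.path.join(data_dir, pattern)))
    # Assumption: remove duplicates and sort for a reproducible order.
    filepaths = sorted(set(filepaths))
    if debugging_mode:
        # Keep at most 3 files for debugging purposes.
        filepaths = filepaths[:3]
    return filepaths
```

With a pattern such as `"*.txt"`, the helper returns an empty list when the station directory does not exist, since `glob.glob` simply finds no match.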
- disdrodb.l0.io.read_l0a_dataframe(filepaths: Union[str, list], verbose: bool = False, debugging_mode: bool = False) DataFrame [source]#
Read DISDRODB L0A Apache Parquet file(s).
- Parameters
filepaths (str or list) – Either a single file path or a list of file paths.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
debugging_mode (bool) – If True, reduce the amount of data to process. If filepaths is a list, read only the first 3 files. For each file, select only the first 100 rows. The default is False.
- Returns
L0A Dataframe.
- Return type
pd.DataFrame
disdrodb.l0.l0_processing module#
Implement DISDRODB L0 processing.
- disdrodb.l0.l0_processing.run_l0a(raw_dir, processed_dir, station_name, glob_patterns, column_names, reader_kwargs, df_sanitizer_fun, parallel, verbose, force, debugging_mode)[source]#
Run the L0A processing for a specific DISDRODB station.
This function is called in each reader to convert raw text files into DISDRODB L0A products.
- Parameters
raw_dir (str) – The directory path where all the raw content of a specific campaign is stored. The path must have the following structure: <...>/DISDRODB/Raw/<DATA_SOURCE>/<CAMPAIGN_NAME>. Inside the raw_dir directory, the following structure is required:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yml
Important points:
- For each <station_name>, there must be a corresponding YAML file in the metadata subdirectory.
- The campaign_name is expected to be in UPPER CASE.
- The <CAMPAIGN_NAME> must semantically match between the raw_dir and processed_dir directory paths, and with the key campaign_name within the metadata YAML files.
processed_dir (str) – The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure: <...>/DISDRODB/Processed/<DATA_SOURCE>/<CAMPAIGN_NAME>. For testing purposes, this function exceptionally also accepts a directory path simply ending with <CAMPAIGN_NAME> (e.g., /tmp/<CAMPAIGN_NAME>).
station_name (str) – The name of the station.
glob_patterns (str) – Glob pattern to search for data files in <raw_dir>/data/<station_name>.
column_names (list) – Column names of the raw text file.
reader_kwargs (dict) – Arguments for the Pandas read_csv function to open the text file.
df_sanitizer_fun (callable, optional) – Sanitizer function to format the DataFrame into the DISDRODB L0A standard. Default is None.
parallel (bool, optional) – If True, process the files simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed.LocalCluster. If False, process the files sequentially in a single process. Default is False.
verbose (bool, optional) – If True, print detailed processing information to the terminal. Default is False.
force (bool, optional) – If True, overwrite existing data in destination directories. If False, raise an error if data already exists in destination directories. Default is False.
debugging_mode (bool, optional) – If True, reduce the amount of data to process. Only the first 100 rows of 3 raw data files are processed. Default is False.
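The directory requirements listed for raw_dir can be checked before launching the processing. The sketch below is a minimal, standard-library illustration of those rules. `check_raw_dir_layout` is a hypothetical helper and not a disdrodb function, and it does not verify the campaign_name key inside the metadata YAML file.

```python
from pathlib import Path


def check_raw_dir_layout(raw_dir, station_name):
    """Hypothetical sketch: return a list of layout problems (empty if none)."""
    raw_dir = Path(raw_dir)
    problems = []
    # The last path component is expected to be the <CAMPAIGN_NAME>.
    campaign_name = raw_dir.name
    if campaign_name != campaign_name.upper():
        problems.append(f"Campaign name '{campaign_name}' is not UPPER CASE.")
    # Raw files are expected in <raw_dir>/data/<station_name>/.
    if not (raw_dir / "data" / station_name).is_dir():
        problems.append(f"Missing data directory for station '{station_name}'.")
    # Each station needs a matching <raw_dir>/metadata/<station_name>.yml file.
    if not (raw_dir / "metadata" / f"{station_name}.yml").is_file():
        problems.append(f"Missing metadata YAML file for station '{station_name}'.")
    return problems
```

An empty list means the layout satisfies the three checks; each string in a non-empty list names one violated requirement.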
- disdrodb.l0.l0_processing.run_l0a_station(data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, base_dir: Optional[str] = None)[source]#
Run the L0A processing of a specific DISDRODB station when invoked from the terminal.
This function is intended to be called through the disdrodb_run_l0a_station command-line interface.
- Parameters
data_source (str) – The name of the institution (for campaigns spanning multiple countries) or the name of the country (for campaigns or sensor networks within a single country). Must be provided in UPPER CASE.
campaign_name (str) – The name of the campaign. Must be provided in UPPER CASE.
station_name (str) – The name of the station.
force (bool, optional) – If True, existing data in the destination directories will be overwritten. If False (default), an error will be raised if data already exists in the destination directories.
verbose (bool, optional) – If True, detailed processing information will be printed to the terminal. If False (default), less information will be displayed.
parallel (bool, optional) – If True (default), files will be processed in multiple processes simultaneously, with each process using a single thread. If False, files will be processed sequentially in a single process, and multi-threading will be automatically exploited to speed up I/O tasks.
debugging_mode (bool, optional) – If True, the amount of data processed will be reduced. Only the first 3 raw data files will be processed. By default, False.
base_dir (str, optional) – The base directory of DISDRODB, expected in the format <...>/DISDRODB. If not specified, the path specified in the DISDRODB active configuration will be used.
- disdrodb.l0.l0_processing.run_l0b(processed_dir, station_name, parallel, force, verbose, debugging_mode)[source]#
Run the L0B processing for a specific DISDRODB station.
- Parameters
raw_dir (str) – The directory path where all the raw content of a specific campaign is stored. The path must have the following structure: <...>/DISDRODB/Raw/<DATA_SOURCE>/<CAMPAIGN_NAME>. Inside the raw_dir directory, the following structure is required:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yml
Important points:
- For each <station_name>, there must be a corresponding YAML file in the metadata subdirectory.
- The campaign_name is expected to be in UPPER CASE.
- The <CAMPAIGN_NAME> must semantically match between the raw_dir and processed_dir directory paths, and with the key campaign_name within the metadata YAML files.
processed_dir (str) – The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure: <...>/DISDRODB/Processed/<DATA_SOURCE>/<CAMPAIGN_NAME>. For testing purposes, this function exceptionally also accepts a directory path simply ending with <CAMPAIGN_NAME> (e.g., /tmp/<CAMPAIGN_NAME>).
station_name (str) – The name of the station.
force (bool, optional) – If True, overwrite existing data in destination directories. If False, raise an error if data already exists in destination directories. Default is False.
verbose (bool, optional) – If True, print detailed processing information to the terminal. Default is True.
parallel (bool, optional) – If True, process the files simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed.LocalCluster. Ensure that threads_per_worker (the number of threads per process) is set to 1 to avoid HDF errors. Also, ensure that the HDF5_USE_FILE_LOCKING environment variable is set to False. If False, process the files sequentially in a single process. Default is False.
debugging_mode (bool, optional) – If True, reduce the amount of data to process. Only the first 3 raw data files will be processed. Default is False.
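The two precautions mentioned for parallel processing (one thread per worker process, and HDF5 file locking disabled) could be set up roughly as follows. This is a sketch under stated assumptions: `create_single_threaded_cluster` is a hypothetical helper, the worker count is arbitrary, and the string "FALSE" is the value the HDF5 library recognizes for disabling file locking.

```python
import os

# Disable HDF5 file locking. Set this early, before any HDF5/netCDF file
# is opened by the process.
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"


def create_single_threaded_cluster(n_workers=4):
    """Hypothetical helper: local Dask cluster with one thread per process."""
    # Imported lazily so this sketch can be loaded without dask installed.
    from dask.distributed import LocalCluster

    return LocalCluster(
        n_workers=n_workers,      # number of simultaneous processes (assumed value)
        threads_per_worker=1,     # one thread per process, to avoid HDF errors
        processes=True,           # use separate processes rather than threads
    )
```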
- disdrodb.l0.l0_processing.run_l0b_concat(processed_dir, station_name, verbose=False)[source]#
Concatenate all L0B netCDF files into a single netCDF file.
The single netCDF file is saved at <processed_dir>/L0B.
- disdrodb.l0.l0_processing.run_l0b_concat_station(data_source, campaign_name, station_name, remove_l0b=False, verbose=True, base_dir: Optional[str] = None)[source]#
Define the L0B file concatenation of a station.
This function is intended to be called through the disdrodb_run_l0b_concat_station command-line interface.
- Parameters
data_source (str) – The name of the institution (for campaigns spanning multiple countries) or the name of the country (for campaigns or sensor networks within a single country). Must be provided in UPPER CASE.
campaign_name (str) – The name of the campaign. Must be provided in UPPER CASE.
station_name (str) – The name of the station.
verbose (bool, optional) – If True (default), detailed processing information will be printed to the terminal. If False, less information will be displayed.
base_dir (str, optional) – The base directory of DISDRODB, expected in the format <...>/DISDRODB. If not specified, the path specified in the DISDRODB active configuration will be used.
- disdrodb.l0.l0_processing.run_l0b_from_nc(raw_dir, processed_dir, station_name, glob_patterns, dict_names, ds_sanitizer_fun, parallel, verbose, force, debugging_mode)[source]#
Run the L0B processing for a specific DISDRODB station with raw netCDFs.
This function is called in the reader where raw netCDF files must be converted into DISDRODB L0B format.
- Parameters
raw_dir (str) – The directory path where all the raw content of a specific campaign is stored. The path must have the following structure: <...>/DISDRODB/Raw/<DATA_SOURCE>/<CAMPAIGN_NAME>. Inside the raw_dir directory, the following structure is required:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yml
Important points:
- For each <station_name>, there must be a corresponding YAML file in the metadata subdirectory.
- The campaign_name is expected to be in UPPER CASE.
- The <CAMPAIGN_NAME> must semantically match between the raw_dir and processed_dir directory paths, and with the key campaign_name within the metadata YAML files.
processed_dir (str) – The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure: <...>/DISDRODB/Processed/<DATA_SOURCE>/<CAMPAIGN_NAME>. For testing purposes, this function exceptionally also accepts a directory path simply ending with <CAMPAIGN_NAME> (e.g., /tmp/<CAMPAIGN_NAME>).
station_name (str) – The name of the station.
glob_patterns (str) – Glob pattern to search for data files in <raw_dir>/data/<station_name>. Example: glob_patterns = "*.nc"
dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.
ds_sanitizer_fun (object, optional) – Sanitizer function to format the raw netCDF into the DISDRODB L0B standard.
force (bool, optional) – If True, overwrite existing data in destination directories. If False, raise an error if data already exists in destination directories. Default is False.
verbose (bool, optional) – If True, print detailed processing information to the terminal. Default is True.
parallel (bool, optional) – If True, process the files simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed.LocalCluster. Ensure that threads_per_worker (the number of threads per process) is set to 1 to avoid HDF errors. Also, ensure that the HDF5_USE_FILE_LOCKING environment variable is set to False. If False, process the files sequentially in a single process; multi-threading is then automatically exploited to speed up I/O tasks. Default is False.
debugging_mode (bool, optional) – If True, reduce the amount of data to process. Only the first 3 raw netCDF files will be processed. Default is False.
- disdrodb.l0.l0_processing.run_l0b_station(data_source, campaign_name, station_name, force: bool = False, verbose: bool = True, parallel: bool = True, debugging_mode: bool = False, remove_l0a: bool = False, base_dir: Optional[str] = None)[source]#
Run the L0B processing of a specific DISDRODB station when invoked from the terminal.
This function is intended to be called through the disdrodb_run_l0b_station command-line interface.
- Parameters
data_source (str) – The name of the institution (for campaigns spanning multiple countries) or the name of the country (for campaigns or sensor networks within a single country). Must be provided in UPPER CASE.
campaign_name (str) – The name of the campaign. Must be provided in UPPER CASE.
station_name (str) – The name of the station.
force (bool, optional) – If True, existing data in the destination directories will be overwritten. If False (default), an error will be raised if data already exists in the destination directories.
verbose (bool, optional) – If True (default), detailed processing information will be printed to the terminal. If False, less information will be displayed.
parallel (bool, optional) – If True (default), files will be processed in multiple processes simultaneously, with each process using a single thread to avoid issues with the HDF/netCDF library. If False, files will be processed sequentially in a single process, and multi-threading will be automatically exploited to speed up I/O tasks.
debugging_mode (bool, optional) – If True, the amount of data processed will be reduced. Only the first 100 rows of 3 L0A files will be processed. By default, False.
base_dir (str, optional) – The base directory of DISDRODB, expected in the format <...>/DISDRODB. If not specified, the path specified in the DISDRODB active configuration will be used.
disdrodb.l0.l0_reader module#
Define DISDRODB L0 readers routines.
- disdrodb.l0.l0_reader.available_readers(data_sources=None, reader_path=False)[source]#
Retrieve available readers information.
- disdrodb.l0.l0_reader.check_available_readers()[source]#
Check the arguments of all the readers in the package.
- disdrodb.l0.l0_reader.get_reader_function(reader_data_source: str, reader_name: str) object [source]#
Return the reader function based on the input parameters.
- Parameters
reader_data_source (str) – The directory within which the reader_name is located in the disdrodb.l0.readers directory.
reader_name (str) – The reader name.
- Returns
The reader() function.
- Return type
object
- disdrodb.l0.l0_reader.get_reader_function_from_metadata_key(reader_data_source_name)[source]#
Retrieve the reader function from the reader metadata value.
The convention for the metadata reader key is <data_source/reader_name> in disdrodb.l0.readers.
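The key convention above suggests that the metadata value is split into a data source and a reader name before the reader is looked up. A minimal sketch of that split, assuming a single `/` separator, is shown below. `split_reader_key` and the example key are hypothetical, and the actual lookup inside disdrodb.l0.readers is not reproduced.

```python
def split_reader_key(reader_key):
    """Hypothetical sketch: split '<data_source>/<reader_name>' into its parts."""
    parts = reader_key.split("/")
    # Exactly two non-empty components are expected.
    if len(parts) != 2 or not all(parts):
        raise ValueError(
            f"Expected '<data_source>/<reader_name>', got '{reader_key}'."
        )
    reader_data_source, reader_name = parts
    return reader_data_source, reader_name


# Illustrative key; the names are placeholders, not real readers.
data_source, name = split_reader_key("MY_SOURCE/MY_READER")
```

The two components correspond to the reader_data_source and reader_name arguments documented for get_reader_function.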
- disdrodb.l0.l0_reader.get_station_reader_function(data_source, campaign_name, station_name, base_dir=None)[source]#
Retrieve the reader function from the station metadata.
- disdrodb.l0.l0_reader.is_documented_by(original)[source]#
Decorator that applies a generic docstring to the decorated function.
- Parameters
original (function) – Function to take the docstring from.
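A docstring-sharing decorator of this kind is a common Python pattern. The following is a minimal sketch of the idea, using the hypothetical names `document_like`, `generic_docstring`, and `reader`; it is not claimed to be the disdrodb source.

```python
def document_like(original):
    """Return a decorator that copies the docstring of `original`."""

    def decorator(target):
        # Reuse the docstring of the original function on the target.
        target.__doc__ = original.__doc__
        return target

    return decorator


def generic_docstring():
    """Generic docstring shared by several functions."""


@document_like(generic_docstring)
def reader(raw_dir, processed_dir):
    # The function body is unaffected; only the docstring is replaced.
    return raw_dir, processed_dir
```

The decorated function keeps its own behaviour and signature, so `help(reader)` shows the shared text while calls to `reader` work as before.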
- disdrodb.l0.l0_reader.reader_generic_docstring()[source]#
Script to convert the raw data to L0A format.
- Parameters
raw_dir (str) – The directory path where all the raw content of a specific campaign is stored. The path must have the following structure: <...>/DISDRODB/Raw/<DATA_SOURCE>/<CAMPAIGN_NAME>. Inside the raw_dir directory, the following structure is required:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yml
Important points:
- For each <station_name>, there must be a corresponding YAML file in the metadata subdirectory.
- The <CAMPAIGN_NAME> is expected to be in UPPER CASE.
- The <CAMPAIGN_NAME> must semantically match between the raw_dir and processed_dir directory paths, and with the key campaign_name within the metadata YAML files.
processed_dir (str) – The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure: <...>/DISDRODB/Processed/<DATA_SOURCE>/<CAMPAIGN_NAME>. For testing purposes, this function exceptionally also accepts a directory path simply ending with <CAMPAIGN_NAME> (e.g., /tmp/<CAMPAIGN_NAME>).
station_name (str) – The name of the station.
force (bool, optional) – If True, overwrite existing data in destination directories. If False, raise an error if data already exists in destination directories. Default is False.
verbose (bool, optional) – If True, print detailed processing information to the terminal. Default is True.
parallel (bool, optional) – If True, process the files simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed.LocalCluster. If False, process the files sequentially in a single process. Default is False.
debugging_mode (bool, optional) – If True, reduce the amount of data to process. Only the first 3 raw data files will be processed. Default is False.
disdrodb.l0.l0a_processing module#
Functions to process raw text files into DISDRODB L0A Apache Parquet.
- disdrodb.l0.l0a_processing.cast_column_dtypes(df: DataFrame, sensor_name: str) DataFrame [source]#
Convert 'object' dataframe columns into DISDRODB L0A dtype standards.
- Parameters
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
- Returns
Dataframe with corrected column types.
- Return type
pd.DataFrame
- disdrodb.l0.l0a_processing.coerce_corrupted_values_to_nan(df: DataFrame, sensor_name: str) DataFrame [source]#
Coerce corrupted values in dataframe numeric columns to np.nan.
- Parameters
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
- Returns
Dataframe with numeric columns without corrupted values.
- Return type
pd.DataFrame
- disdrodb.l0.l0a_processing.concatenate_dataframe(list_df: list, verbose: bool = False) DataFrame [source]#
Concatenate a list of dataframes.
- Parameters
list_df (list) – List of dataframes.
verbose (bool, optional) – If True, print messages. If False, print nothing.
- Returns
Concatenated dataframe.
- Return type
pd.DataFrame
- Raises
ValueError – Error if the concatenation cannot be performed.
- disdrodb.l0.l0a_processing.drop_time_periods(df, time_periods)[source]#
Drop problematic time periods.
- disdrodb.l0.l0a_processing.process_raw_file(filepath, column_names, reader_kwargs, df_sanitizer_fun, sensor_name, verbose=True, issue_dict=None)[source]#
Read and parse a raw text file into an L0A dataframe.
- Parameters
filepath (str) – File path.
column_names (list) – Column names.
reader_kwargs (dict) – Pandas read_csv arguments.
df_sanitizer_fun (object, optional) – Sanitizer function to format the dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information. The default is True.
issue_dict (dict) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are 'timesteps' and 'time_periods'. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.
- Returns
Dataframe
- Return type
pd.DataFrame
- disdrodb.l0.l0a_processing.read_raw_file(filepath: str, column_names: list, reader_kwargs: dict) DataFrame [source]#
Read a raw file into a dataframe.
- Parameters
filepath (str) – Raw file path.
column_names (list) – Column names.
reader_kwargs (dict) – Pandas pd.read_csv arguments.
- Returns
Pandas dataframe.
- Return type
pandas.DataFrame
- disdrodb.l0.l0a_processing.read_raw_files(filepaths: Union[list, str], column_names: list, reader_kwargs: dict, sensor_name: str, verbose: bool, df_sanitizer_fun: object = None) DataFrame [source]#
Read and parse a list of raw files into a dataframe.
- Parameters
filepaths (Union[list,str]) – File path(s).
column_names (list) – Column names.
reader_kwargs (dict) – Pandas read_csv arguments.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information. The default is False.
df_sanitizer_fun (object, optional) – Sanitizer function to format the dataframe.
- Returns
Dataframe
- Return type
pd.DataFrame
- Raises
ValueError – Error if the input parameters cannot be used or the raw file cannot be processed.
- disdrodb.l0.l0a_processing.remove_corrupted_rows(df)[source]#
Remove corrupted rows by checking conversion of raw fields to numeric.
Note: the delimiters at the start and end of the raw array must be stripped beforehand!
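To make the idea concrete, the sketch below checks whether every field of a delimited raw array string converts to a number, stripping the leading and trailing delimiters first so that they do not produce empty fields. It works on plain strings rather than on a dataframe. `is_valid_raw_array`, the comma delimiter, and the decision to treat an empty array as invalid are all assumptions made for illustration.

```python
def is_valid_raw_array(raw_array, delimiter=","):
    """Hypothetical sketch: True if all delimited fields convert to a number."""
    # Strip delimiters at the start and end first; otherwise the split
    # below would yield empty fields that cannot be converted.
    stripped = raw_array.strip(delimiter)
    if not stripped:
        # Assumption: an empty array is treated as corrupted.
        return False
    for field in stripped.split(delimiter):
        try:
            float(field)
        except ValueError:
            return False
    return True


rows = [",000,001,002,", "000,0A1,002", "003,,004"]
# Keep only the rows whose raw array is fully numeric.
valid_rows = [row for row in rows if is_valid_raw_array(row)]
```

In this example only the first row survives: the second contains a non-numeric field and the third an empty field in the middle of the array.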
- disdrodb.l0.l0a_processing.remove_duplicated_timesteps(df: DataFrame, verbose: bool = False)[source]#
Remove duplicated timesteps.
Only the first occurrence of each timestep is kept.
- Parameters
df (pd.DataFrame) – Input dataframe.
verbose (bool) – Whether to print detailed processing information. The default is False.
- Returns
Dataframe with valid unique timesteps.
- Return type
pd.DataFrame
- disdrodb.l0.l0a_processing.remove_issue_timesteps(df, issue_dict, verbose=False)[source]#
Drop dataframe rows with timesteps listed in the issue dictionary.
- Parameters
df (pd.DataFrame) – Input dataframe.
issue_dict (dict) – Issue dictionary.
verbose (bool) – Whether to print detailed processing information. The default is False.
- Returns
Dataframe with problematic timesteps removed.
- Return type
pd.DataFrame
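As an illustration of how an issue dictionary with the 'timesteps' and 'time_periods' keys might be applied, the following sketch filters a list of timestamps. It uses `datetime.datetime` objects from the standard library as a stand-in for datetime64 values, and it assumes that each time period is a (start, end) pair with inclusive bounds. `drop_issue_times` is a hypothetical helper, not the disdrodb function.

```python
from datetime import datetime


def drop_issue_times(times, issue_dict):
    """Hypothetical sketch: keep the timesteps not flagged in the issue dict."""
    bad_timesteps = set(issue_dict.get("timesteps", []))
    bad_periods = issue_dict.get("time_periods", [])
    kept = []
    for t in times:
        if t in bad_timesteps:
            continue
        # Assumption: each period is a (start, end) pair, bounds included.
        if any(start <= t <= end for start, end in bad_periods):
            continue
        kept.append(t)
    return kept


times = [
    datetime(2020, 1, 1, 0, 0, 0),
    datetime(2020, 1, 1, 0, 0, 30),
    datetime(2020, 1, 1, 0, 1, 0),
    datetime(2020, 1, 1, 0, 2, 0),
    datetime(2020, 1, 1, 0, 3, 0),
]
issue_dict = {
    "timesteps": [datetime(2020, 1, 1, 0, 0, 30)],
    "time_periods": [(datetime(2020, 1, 1, 0, 2, 0), datetime(2020, 1, 1, 0, 3, 0))],
}
kept = drop_issue_times(times, issue_dict)
```

Here the single flagged timestep and the two timestamps inside the flagged period are dropped, leaving the first and third entries.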
- disdrodb.l0.l0a_processing.remove_rows_with_missing_time(df: DataFrame, verbose: bool = False)[source]#
Remove dataframe rows where the "time" is NaT.
- Parameters
df (pd.DataFrame) – Input dataframe.
verbose (bool) – Whether to print detailed processing information. The default is False.
- Returns
Dataframe with valid timesteps.
- Return type
pd.DataFrame
- disdrodb.l0.l0a_processing.replace_nan_flags(df, sensor_name, verbose=False)[source]#
Set values corresponding to nan_flags to np.nan.
- Parameters
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information. The default is False.
- Returns
Dataframe without nan_flags values.
- Return type
pd.DataFrame
- disdrodb.l0.l0a_processing.set_nan_invalid_values(df, sensor_name, verbose=False)[source]#
Set invalid (class) values to np.nan.
- Parameters
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information. The default is False.
- Returns
Dataframe without invalid values.
- Return type
pd.DataFrame
- disdrodb.l0.l0a_processing.set_nan_outside_data_range(df, sensor_name, verbose=False)[source]#
Set values outside the data range to np.nan.
- Parameters
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information. The default is False.
- Returns
Dataframe without values outside the expected data range.
- Return type
pd.DataFrame
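The masking step can be illustrated on a plain list of numbers. The sketch below replaces values outside a `[min, max]` range with NaN, using `math.nan` in place of np.nan. `mask_outside_range` is a hypothetical helper, the inclusive bounds are an assumption, and the real function looks up the range from the sensor configuration rather than taking it as an argument.

```python
import math


def mask_outside_range(values, data_range):
    """Hypothetical sketch: replace values outside [min, max] with NaN."""
    if data_range is None:
        # No range defined for this variable: leave the values untouched.
        return list(values)
    vmin, vmax = data_range
    # Assumption: both bounds are inclusive.
    return [v if vmin <= v <= vmax else math.nan for v in values]


masked = mask_outside_range([-5.0, 0.0, 12.5, 9999.0], data_range=[0.0, 100.0])
```

Values already equal to NaN fail both comparisons and therefore stay NaN.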
- disdrodb.l0.l0a_processing.strip_delimiter_from_raw_arrays(df)[source]#
Remove the first and last delimiter occurrence from the raw array fields.
- disdrodb.l0.l0a_processing.strip_string_spaces(df: DataFrame, sensor_name: str) DataFrame [source]#
Strip leading/trailing spaces from dataframe string columns.
- Parameters
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
- Returns
Dataframe with string columns without leading/trailing spaces.
- Return type
pd.DataFrame
- disdrodb.l0.l0a_processing.write_l0a(df: DataFrame, filepath: str, force: bool = False, verbose: bool = False)[source]#
Save the dataframe into an Apache Parquet file.
- Parameters
df (pd.DataFrame) – Input dataframe.
filepath (str) – Output file path.
force (bool, optional) – Whether to overwrite existing data. If True, overwrite existing data in the destination directories. If False (default), raise an error if data already exists in the destination directories.
verbose (bool, optional) – Whether to print detailed processing information. The default is False.
- Raises
ValueError – The input dataframe can not be written as an Apache Parquet file.
NotImplementedError – The input dataframe can not be processed.
disdrodb.l0.l0b_nc_processing module#
Functions to process DISDRODB raw netCDF files into DISDRODB L0B netCDF files.
- disdrodb.l0.l0b_nc_processing.add_dataset_missing_variables(ds, missing_vars, sensor_name)[source]#
Add missing xr.Dataset variables as np.nan xr.DataArrays.
- disdrodb.l0.l0b_nc_processing.create_l0b_from_raw_nc(ds, dict_names, ds_sanitizer_fun, sensor_name, verbose, attrs)[source]#
Convert a raw xr.Dataset into a DISDRODB L0B netCDF.
- Parameters
ds (xr.Dataset) – Raw xarray dataset.
dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.
ds_sanitizer_fun (function) – Sanitizer function to do ad-hoc processing of the xr.Dataset.
attrs (dict) – Global metadata to attach as global attributes to the xr.Dataset.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns
L0B xr.Dataset
- Return type
xr.Dataset
- disdrodb.l0.l0b_nc_processing.preprocess_raw_netcdf(ds, dict_names, sensor_name)[source]#
Preprocess a raw netCDF to improve compatibility with DISDRODB standards.
This function checks the validity of dict_names, then renames and subsets the data accordingly. If some variables specified in dict_names are missing, it adds a np.nan xr.DataArray.
- Parameters
ds (xr.Dataset) – Raw netCDF to be converted to DISDRODB standards.
dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.
sensor_name (str) – Sensor name.
- Returns
ds – xarray Dataset with variables compliant to DISDRODB conventions.
- Return type
xr.Dataset
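The rename, subset, and fill-with-NaN behaviour described above can be sketched without xarray by representing a dataset as a dictionary of lists. This is only an analogy: `rename_and_subset`, the `n_times` argument, and all variable names below are hypothetical, and the real function operates on an xr.Dataset.

```python
import math


def rename_and_subset(raw_vars, dict_names, n_times):
    """Hypothetical sketch of renaming, subsetting and filling missing variables.

    raw_vars maps raw variable names to lists of values;
    dict_names maps raw names to the target (standard) names.
    """
    standardized = {}
    for raw_name, standard_name in dict_names.items():
        if raw_name in raw_vars:
            # Rename the variable to its standard name.
            standardized[standard_name] = raw_vars[raw_name]
        else:
            # Variable listed in dict_names but absent: fill with NaN.
            standardized[standard_name] = [math.nan] * n_times
    # Variables not listed in dict_names are dropped (subsetting).
    return standardized


raw = {"T": [1.0, 2.0], "rr": [0.1, 0.2], "extra": [9, 9]}
# Placeholder target names, not the actual DISDRODB variable names.
names = {"T": "temperature", "rr": "rain_rate", "vis": "visibility"}
result = rename_and_subset(raw, names, n_times=2)
```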
- disdrodb.l0.l0b_nc_processing.rename_dataset(ds, dict_names)[source]#
Rename xr.Dataset variables, coordinates and dimensions.
- disdrodb.l0.l0b_nc_processing.replace_custom_nan_flags(ds, dict_nan_flags, verbose=False)[source]#
Set values corresponding to nan_flags to np.nan.
This function must be used in a reader, if necessary.
- Parameters
ds (xr.Dataset) – Input xarray dataset.
dict_nan_flags (dict) – Dictionary with the nan flag values to set as np.nan.
verbose (bool) – Whether to print detailed processing information. The default is False.
- Returns
Dataset without nan_flags values.
- Return type
xr.Dataset
- disdrodb.l0.l0b_nc_processing.replace_nan_flags(ds, sensor_name, verbose)[source]#
Set values corresponding to nan_flags to np.nan.
- Parameters
ds (xr.Dataset) – Input xarray dataset.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns
Dataset without nan_flags values.
- Return type
xr.Dataset
- disdrodb.l0.l0b_nc_processing.set_nan_invalid_values(ds, sensor_name, verbose)[source]#
Set invalid (class) values to np.nan.
- Parameters
ds (xr.Dataset) – Input xarray dataset.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns
Dataset without invalid values.
- Return type
xr.Dataset
- disdrodb.l0.l0b_nc_processing.set_nan_outside_data_range(ds, sensor_name, verbose)[source]#
Set values outside the data range as np.nan.
- Parameters
ds (xr.Dataset) – Input xarray dataset.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns
Dataset without values outside the expected data range.
- Return type
xr.Dataset
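The idea of masking values outside a data range can be sketched with the standard library alone. The function name `mask_outside_range` is hypothetical, and treating both bounds as inclusive is an assumption of this sketch, not something the documentation above states.

```python
import math


def mask_outside_range(values, data_range):
    """Replace values outside ``data_range`` (lower, upper) with NaN.

    Both bounds are treated as inclusive here (an assumption of this sketch).
    NaN inputs stay NaN because every comparison with NaN is False.
    """
    lower, upper = data_range
    return [value if lower <= value <= upper else math.nan for value in values]


# Hypothetical range and values, for illustration only.
masked = mask_outside_range([-5.0, 0.0, 42.0, 120.0], data_range=(0.0, 100.0))
```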
disdrodb.l0.l0b_processing module#
Functions to process DISDRODB L0A files into DISDRODB L0B netCDF files.
- disdrodb.l0.l0b_processing.add_dataset_crs_coords(ds)[source]#
Add the CRS coordinate to the xr.Dataset.
- disdrodb.l0.l0b_processing.create_l0b_from_l0a(df: DataFrame, attrs: dict, verbose: bool = False) Dataset [source]#
Transform the L0A dataframe to the L0B xr.Dataset.
- Parameters
df (pd.DataFrame) – DISDRODB L0A dataframe.
attrs (dict) – Station metadata.
verbose (bool, optional) – Whether to print detailed processing information. The default is False.
- Returns
DISDRODB L0B dataset.
- Return type
xr.Dataset
- Raises
ValueError – If the DISDRODB L0B xarray dataset cannot be created.
- disdrodb.l0.l0b_processing.finalize_dataset(ds, sensor_name)[source]#
Finalize DISDRODB L0B Dataset.
- disdrodb.l0.l0b_processing.infer_split_str(string: str) str [source]#
Infer the delimiter inside a string.
- Parameters
string (str) – Input string.
- Returns
Inferred delimiter.
- Return type
str
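One simple way to infer a delimiter is to count candidate separators and keep the most frequent one. The sketch below does exactly that; the helper name `guess_delimiter` and the candidate list are hypothetical, and the actual heuristic used by infer_split_str may differ.

```python
def guess_delimiter(string, candidates=(",", ";", "|", "\t")):
    """Return the candidate delimiter that occurs most often in ``string``.

    Ties are resolved in favour of the candidate listed first.
    """
    counts = {separator: string.count(separator) for separator in candidates}
    best = max(candidates, key=counts.get)
    if counts[best] == 0:
        raise ValueError("No known delimiter found in the input string.")
    return best


delimiter = guess_delimiter("0.1;0.2;0.3;0.4")
fields = "0.1;0.2;0.3;0.4".split(delimiter)
```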
- disdrodb.l0.l0b_processing.rechunk_dataset(ds: Dataset, encoding_dict: dict) Dataset [source]#
Coerce the dataset arrays to have the chunk size specified in the encoding dictionary.
- Parameters
ds (xr.Dataset) – Input xarray dataset
encoding_dict (dict) – Dictionary containing the encoding to write the xarray dataset as a netCDF.
- Returns
Output xarray dataset
- Return type
xr.Dataset
- disdrodb.l0.l0b_processing.retrieve_l0b_arrays(df: DataFrame, sensor_name: str, verbose: bool = False) dict [source]#
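Retrieve the L0B data arrays.
- Parameters
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool, optional) – Whether to print detailed processing information. The default is False.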
- Returns
Dictionary with data arrays.
- Return type
dict
- disdrodb.l0.l0b_processing.sanitize_encodings_dict(encoding_dict: dict, ds: Dataset) dict [source]#
Ensure chunk size to be smaller than the array shape.
- Parameters
encoding_dict (dict) – Dictionary containing the encoding to write DISDRODB L0B netCDFs.
ds (xr.Dataset) – Input dataset.
- Returns
Encoding dictionary.
- Return type
dict
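The sanitizing step can be pictured as clipping each requested chunk size to the length of the corresponding dimension. The sketch below assumes encodings are plain dictionaries with a `chunksizes` entry (the field name used by L0BEncodingSchema) and takes the array shapes as a separate dictionary instead of reading them from an xr.Dataset. The helper name `clip_chunksizes` and the sample values are hypothetical.

```python
def clip_chunksizes(encoding_dict, shapes):
    """Return a copy of ``encoding_dict`` with chunk sizes clipped to the array shapes.

    ``shapes`` maps variable names to shape tuples. An integer ``chunksizes``
    is treated as the chunk size of a 1-D array.
    """
    sanitized = {}
    for name, encoding in encoding_dict.items():
        encoding = dict(encoding)  # shallow copy, the input is not modified
        chunks = encoding.get("chunksizes")
        if chunks is not None and name in shapes:
            if isinstance(chunks, int):
                chunks = [chunks]
            encoding["chunksizes"] = [
                min(chunk, size) for chunk, size in zip(chunks, shapes[name])
            ]
        sanitized[name] = encoding
    return sanitized


# Hypothetical encoding and shape: 120 timesteps, 32 x 32 bins.
encodings = {"raw_drop_number": {"zlib": True, "chunksizes": [5000, 32, 32]}}
shapes = {"raw_drop_number": (120, 32, 32)}
sanitized = clip_chunksizes(encodings, shapes)
```

Only the first dimension is clipped in this example, since the requested chunk of 5000 exceeds the 120 available timesteps.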
- disdrodb.l0.l0b_processing.set_encodings(ds: Dataset, sensor_name: str) Dataset [source]#
Apply the encodings to the xarray Dataset.
- Parameters
ds (xr.Dataset) – Input xarray dataset.
sensor_name (str) – Name of the sensor.
- Returns
Output xarray dataset.
- Return type
xr.Dataset
- disdrodb.l0.l0b_processing.write_l0b(ds: Dataset, filepath: str, force=False) None [source]#
Save the xarray dataset into a netCDF file.
- Parameters
ds (xr.Dataset) – Input xarray dataset.
filepath (str) – Output file path.
force (bool, optional) – Whether to overwrite existing data. If True, overwrite existing data in the destination directories. If False (the default), raise an error if data already exist in the destination directories.
disdrodb.l0.routines module#
Implement DISDRODB wrappers to launch L0 processing in the terminal.
- disdrodb.l0.routines.click_l0_archive_options(function: object)[source]#
Click command line arguments for L0 processing archiving of a station.
- Parameters
function (object) – Function.
- disdrodb.l0.routines.click_l0_processing_options(function: object)[source]#
Click command line default parameters for L0 processing options.
- Parameters
function (object) – Function.
- disdrodb.l0.routines.click_l0_stations_options(function: object)[source]#
Click command line options for DISDRODB archive L0 processing.
- Parameters
function (object) – Function.
- disdrodb.l0.routines.click_l0b_concat_options(function: object)[source]#
Click command line default parameters for L0B concatenation.
- Parameters
function (object) – Function.
- disdrodb.l0.routines.click_remove_l0a_option(function: object)[source]#
Click command line argument for remove_l0a.
- disdrodb.l0.routines.run_disdrodb_l0(data_sources=None, campaign_names=None, station_names=None, l0a_processing: bool = True, l0b_processing: bool = True, l0b_concat: bool = False, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, base_dir: Optional[str] = None)[source]#
Run the L0 processing of DISDRODB stations.
This function launches the processing of many DISDRODB stations with a single command. From the list of all available DISDRODB stations, it runs the processing of the stations matching the provided data_sources, campaign_names and station_names.
- Parameters
data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default is None.
station_names (list) – Station names to process. The default is None.
l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default is True.
l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default is True.
l0b_concat (bool) – Whether to concatenate all raw files into a single L0B netCDF file. If l0b_concat=True, all raw files will be saved into a single L0B netCDF file. If l0b_concat=False, each raw file will be converted into the corresponding L0B netCDF file. The default is False.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default is False.
remove_l0b (bool) – Whether to remove the L0B files after having concatenated all L0B netCDF files. It takes place only if l0b_concat=True. The default is False.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process will use a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files. For L0B, it processes just the first 100 rows of 3 L0A files. The default is False.
base_dir (str, optional) – Base directory of DISDRODB. Format: <...>/DISDRODB. If None (the default), the base_dir path specified in the DISDRODB active configuration will be used.
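The station matching described above can be pictured as a filter over (data source, campaign, station) triplets. The sketch below is only an illustration: the helper name `select_stations` and the sample names are hypothetical, and it assumes that None means "no filter" and that the three filters are combined with a logical AND.

```python
def select_stations(available, data_sources=None, campaign_names=None, station_names=None):
    """Return the (data_source, campaign_name, station_name) triplets matching the filters.

    A filter set to None accepts every value (assumption of this sketch).
    """

    def keep(value, allowed):
        return allowed is None or value in allowed

    return [
        (data_source, campaign_name, station_name)
        for data_source, campaign_name, station_name in available
        if keep(data_source, data_sources)
        and keep(campaign_name, campaign_names)
        and keep(station_name, station_names)
    ]


# Hypothetical archive content, for illustration only.
available = [
    ("SOURCE_A", "CAMPAIGN_1", "station_10"),
    ("SOURCE_A", "CAMPAIGN_2", "station_20"),
    ("SOURCE_B", "CAMPAIGN_3", "station_10"),
]
selected = select_stations(available, data_sources=["SOURCE_A"], station_names=["station_10"])
```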
- disdrodb.l0.routines.run_disdrodb_l0_station(data_source, campaign_name, station_name, l0a_processing: bool = True, l0b_processing: bool = True, l0b_concat: bool = False, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, base_dir: Optional[str] = None)[source]#
Run the L0 processing of a specific DISDRODB station from the terminal.
- Parameters
data_source (str) – Institution name (when campaign data spans more than 1 country), or country (when all campaigns (or sensor networks) are inside a given country). Must be UPPER CASE.
campaign_name (str) – Campaign name. Must be UPPER CASE.
station_name (str) – Station name.
l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default is True.
l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default is True.
l0b_concat (bool) – Whether to concatenate all raw files into a single L0B netCDF file. If l0b_concat=True, all raw files will be saved into a single L0B netCDF file. If l0b_concat=False, each raw file will be converted into the corresponding L0B netCDF file. The default is False.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default is False.
remove_l0b (bool) – Whether to remove the L0B files after having concatenated all L0B netCDF files. It takes place only if l0b_concat=True. The default is False.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process will use a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files for each station. For L0B, it processes just the first 100 rows of 3 L0A files for each station. The default is False.
base_dir (str, optional) – Base directory of DISDRODB. Format: <...>/DISDRODB. If None (the default), the base_dir path specified in the DISDRODB active configuration will be used.
- disdrodb.l0.routines.run_disdrodb_l0a(data_sources=None, campaign_names=None, station_names=None, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, base_dir: Optional[str] = None)[source]#
Run the L0A processing of DISDRODB stations.
This function launches the processing of many DISDRODB stations with a single command. From the list of all available DISDRODB stations, it runs the processing of the stations matching the provided data_sources, campaign_names and station_names.
- Parameters
data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default is None.
station_names (list) – Station names to process. The default is None.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files. The default is False.
base_dir (str, optional) – Base directory of DISDRODB. Format: <...>/DISDRODB. If None (the default), the base_dir path specified in the DISDRODB active configuration will be used.
- disdrodb.l0.routines.run_disdrodb_l0a_station(data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, base_dir: Optional[str] = None)[source]#
Run the L0A processing of a station calling the disdrodb_l0a_station in the terminal.
- disdrodb.l0.routines.run_disdrodb_l0b(data_sources=None, campaign_names=None, station_names=None, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, base_dir: Optional[str] = None, remove_l0a: bool = False)[source]#
Run the L0B processing of DISDRODB stations.
This function launches the processing of many DISDRODB stations with a single command. From the list of all available DISDRODB L0A stations, it runs the processing of the stations matching the provided data_sources, campaign_names and station_names.
- Parameters
data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default is None.
station_names (list) – Station names to process. The default is None.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0B, it processes just the first 100 rows of 3 L0A files. The default is False.
base_dir (str, optional) – Base directory of DISDRODB. Format: <...>/DISDRODB. If None (the default), the base_dir path specified in the DISDRODB active configuration will be used.
- disdrodb.l0.routines.run_disdrodb_l0b_concat(data_sources=None, campaign_names=None, station_names=None, remove_l0b=False, verbose=False, base_dir=None)[source]#
Concatenate the L0B files of the DISDRODB archive.
This function is called by the disdrodb_run_l0b_concat script.
- disdrodb.l0.routines.run_disdrodb_l0b_concat_station(data_source, campaign_name, station_name, remove_l0b=False, verbose=False, base_dir=None)[source]#
Concatenate the L0B files of a single DISDRODB station.
This function runs the disdrodb_run_l0b_concat_station script in the terminal.
- disdrodb.l0.routines.run_disdrodb_l0b_station(data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, base_dir: Optional[str] = None, remove_l0a: bool = False)[source]#
Run the L0B processing of a station calling disdrodb_run_l0b_station in the terminal.
disdrodb.l0.standards module#
Retrieve L0 sensor standards.
- disdrodb.l0.standards.get_bin_coords_dict(sensor_name: str) dict [source]#
Retrieve diameter (and velocity) bin coordinates.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dictionary with coordinates arrays.
- Return type
dict
- disdrodb.l0.standards.get_coords_attrs_dict()[source]#
Return dictionary with DISDRODB coordinates attributes.
- disdrodb.l0.standards.get_data_format_dict(sensor_name: str) dict [source]#
Get a dictionary containing the data format of each sensor variable.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Data format of each sensor variable.
- Return type
dict
- disdrodb.l0.standards.get_data_range_dict(sensor_name: str) dict [source]#
Get the variable data range.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dictionary with the expected data value range for each data field. It excludes variables without specified data_range key.
- Return type
dict
- disdrodb.l0.standards.get_diameter_bin_center(sensor_name: str) list [source]#
Get diameter bin center.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Diameter bin center.
- Return type
list
- disdrodb.l0.standards.get_diameter_bin_lower(sensor_name: str) list [source]#
Get diameter bin lower bound.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Diameter bin lower bound.
- Return type
list
- disdrodb.l0.standards.get_diameter_bin_upper(sensor_name: str) list [source]#
Get diameter bin upper bound.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Diameter bin upper bound.
- Return type
list
- disdrodb.l0.standards.get_diameter_bin_width(sensor_name: str) list [source]#
Get diameter bin width.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Diameter bin width.
- Return type
list
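The four bin descriptors (center, lower bound, upper bound, width) are related to each other. The sketch below derives width and center from the bounds; the helper name `describe_bins`, the dictionary keys and the sample bounds are hypothetical. Taking the center as the midpoint of the bounds is an assumption of this sketch, and the values stored in the sensor configuration may be defined differently.

```python
def describe_bins(lower, upper):
    """Build center and width lists from lower and upper bin bounds."""
    if len(lower) != len(upper):
        raise ValueError("lower and upper must have the same length.")
    width = [high - low for low, high in zip(lower, upper)]
    # Midpoint centers: an assumption of this sketch.
    center = [(low + high) / 2 for low, high in zip(lower, upper)]
    return {"center": center, "lower": list(lower), "upper": list(upper), "width": width}


# Hypothetical bounds in mm, for illustration only.
bins = describe_bins(lower=[0.0, 0.25, 0.5], upper=[0.25, 0.5, 1.0])
```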
- disdrodb.l0.standards.get_diameter_bins_dict(sensor_name: str) dict [source]#
Get dictionary with sensor_name diameter bins information.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Sensor diameter bins information.
- Return type
dict
- disdrodb.l0.standards.get_dims_size_dict(sensor_name: str) dict [source]#
Get the number of bins for each dimension.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dictionary with the number of bins for each dimension.
- Return type
dict
- disdrodb.l0.standards.get_field_nchar_dict(sensor_name: str) dict [source]#
Get the total number of characters from the instrument default string standards.
Important note: it also accounts for the comma and the minus sign.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dictionary with the expected number of characters for each data field.
- Return type
dict
- disdrodb.l0.standards.get_field_ndigits_decimals_dict(sensor_name: dict) dict [source]#
Get number of digits on the right side of the comma from the instrument default string standards.
Example: 123,45 -> 45 -> 2 decimal digits.
- Parameters
sensor_name (dict) – Name of the sensor.
- Returns
Dictionary with the expected number of decimal digits for each data field.
- Return type
dict
- disdrodb.l0.standards.get_field_ndigits_dict(sensor_name: str) dict [source]#
Get number of digits from the instrument default string standards.
Important note: it excludes the comma but counts the minus sign.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dictionary with the expected number of digits for each data field.
- Return type
dict
- disdrodb.l0.standards.get_field_ndigits_natural_dict(sensor_name: str) dict [source]#
Get number of digits on the left side of the comma from the instrument default string standards.
Example: 123,45 -> 123 -> 3 natural digits.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dictionary with the expected number of natural digits for each data field.
- Return type
dict
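The four counts described above (characters, digits, decimal digits, natural digits) can be reproduced for a single field string as follows. The helper name `field_counts` is hypothetical; the dictionary keys reuse the field names of RawDataFormatSchema. Excluding the minus sign from the natural digits is an assumption of this sketch.

```python
def field_counts(string, separator=","):
    """Count characters and digits of a numeric field string such as '123,45'."""
    natural, _, decimals = string.partition(separator)
    return {
        # Every character, including the comma and the minus sign.
        "n_characters": len(string),
        # Everything but the comma, so the minus sign is counted.
        "n_digits": len(string.replace(separator, "")),
        # Digits on the right side of the comma.
        "n_decimals": len(decimals),
        # Digits on the left side of the comma, minus sign excluded (assumption).
        "n_naturals": len(natural.lstrip("-")),
    }


positive = field_counts("123,45")
negative = field_counts("-1,5")
```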
- disdrodb.l0.standards.get_l0a_dtype(sensor_name: str) dict [source]#
Get a dictionary containing the L0A dtype.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dictionary with the L0A dtype.
- Return type
dict
- disdrodb.l0.standards.get_l0a_encodings_dict(sensor_name: str) dict [source]#
Get a dictionary containing the L0A encodings.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
L0A encodings.
- Return type
dict
- disdrodb.l0.standards.get_l0b_cf_attrs_dict(sensor_name: str) dict [source]#
Get a dictionary containing the CF attributes of each sensor variable.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
CF attributes of each sensor variable. For each variable, the ‘units’, ‘description’, and ‘long_name’ attributes are specified.
- Return type
dict
- disdrodb.l0.standards.get_l0b_encodings_dict(sensor_name: str) dict [source]#
Get a dictionary containing the encoding to write L0B netCDFs.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Encoding to write L0B netCDFs
- Return type
dict
- disdrodb.l0.standards.get_nan_flags_dict(sensor_name: str) dict [source]#
Get the variable nan_flags.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dictionary with the expected nan_flags list for each data field. It excludes variables without specified nan_flags key.
- Return type
dict
- disdrodb.l0.standards.get_raw_array_dims_order(sensor_name: str) dict [source]#
Get the dimension order of the raw fields.
The order of dimension specified for raw_drop_number controls the reshaping of the precipitation raw spectrum.
Examples
OTT Parsivel spectrum [v1d1 … v1d32, v2d1, …, v2d32] -> dimension_order = ["velocity_bin_center", "diameter_bin_center"]
Thies LPM spectrum [v1d1 … v20d1, v1d2, …, v20d2] -> dimension_order = ["diameter_bin_center", "velocity_bin_center"]
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dimension order dictionary.
- Return type
dict
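To make the role of the dimension order concrete, the sketch below reshapes a flat spectrum into a nested list, with the first name in `dimension_order` as the outer (slowest varying) dimension. The helper name `reshape_spectrum` and the tiny 2 x 3 spectra are hypothetical; a real reader would work on arrays with the bin counts of the sensor.

```python
def reshape_spectrum(flat, dimension_order, dims_size):
    """Reshape a flat spectrum so that ``dimension_order[0]`` is the outer dimension."""
    n_outer = dims_size[dimension_order[0]]
    n_inner = dims_size[dimension_order[1]]
    if len(flat) != n_outer * n_inner:
        raise ValueError("The spectrum length does not match the dimension sizes.")
    return [flat[i * n_inner:(i + 1) * n_inner] for i in range(n_outer)]


sizes = {"velocity_bin_center": 2, "diameter_bin_center": 3}

# Parsivel-like layout: the diameter index varies fastest.
parsivel_like = ["v1d1", "v1d2", "v1d3", "v2d1", "v2d2", "v2d3"]
by_velocity = reshape_spectrum(
    parsivel_like, ["velocity_bin_center", "diameter_bin_center"], sizes
)

# Thies-like layout: the velocity index varies fastest.
thies_like = ["v1d1", "v2d1", "v1d2", "v2d2", "v1d3", "v2d3"]
by_diameter = reshape_spectrum(
    thies_like, ["diameter_bin_center", "velocity_bin_center"], sizes
)
```

In both cases the element for velocity bin 2 and diameter bin 3 ends up where the dimension order says it should: `by_velocity[1][2]` and `by_diameter[2][1]`.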
- disdrodb.l0.standards.get_raw_array_nvalues(sensor_name: str) dict [source]#
Get a dictionary with the number of values expected for each raw array.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dictionary with the number of values expected for each raw array.
- Return type
dict
- disdrodb.l0.standards.get_sensor_logged_variables(sensor_name: str) list [source]#
Get the sensor logged variables list.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
List of the variables logged by the sensor.
- Return type
list
- disdrodb.l0.standards.get_time_encoding() dict [source]#
Create time encoding.
- Returns
Time encoding.
- Return type
dict
- disdrodb.l0.standards.get_valid_coordinates_names(sensor_name)[source]#
Get list of valid coordinates for DISDRODB L0B.
- disdrodb.l0.standards.get_valid_dimension_names(sensor_name)[source]#
Get list of valid dimension names for DISDRODB L0B.
- disdrodb.l0.standards.get_valid_names(sensor_name)[source]#
Return the list of valid variable and coordinates names for DISDRODB L0B.
- disdrodb.l0.standards.get_valid_values_dict(sensor_name: str) dict [source]#
Get the list of valid values for a variable.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dictionary with the expected values for specific variables. It excludes variables without specified valid_values key.
- Return type
dict
- disdrodb.l0.standards.get_variables_dimension(sensor_name: str)[source]#
Returns a dictionary with the variable dimensions of a L0B product.
- disdrodb.l0.standards.get_velocity_bin_center(sensor_name: str) list [source]#
Get velocity bin center.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Velocity bin center.
- Return type
list
- disdrodb.l0.standards.get_velocity_bin_lower(sensor_name: str) list [source]#
Get velocity bin lower bound.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Velocity bin lower bound.
- Return type
list
- disdrodb.l0.standards.get_velocity_bin_upper(sensor_name: str) list [source]#
Get velocity bin upper bound.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Velocity bin upper bound.
- Return type
list
- disdrodb.l0.standards.get_velocity_bin_width(sensor_name: str) list [source]#
Get velocity bin width.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Velocity bin width.
- Return type
list
- disdrodb.l0.standards.get_velocity_bins_dict(sensor_name: str) dict [source]#
Get dictionary with sensor_name velocity bins information.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Sensor velocity bins information.
- Return type
dict
- disdrodb.l0.standards.set_disdrodb_attrs(ds, product: str)[source]#
Add DISDRODB processing information to the netCDF global attributes.
It assumes that the station metadata have already been added to the dataset.
- Parameters
ds (xr.Dataset) – Dataset.
product (str) – DISDRODB product.
- Returns
Dataset.
- Return type
xr.Dataset
disdrodb.l0.template_tools module#
Useful tools helping in the implementation of the DISDRODB L0 readers.
- disdrodb.l0.template_tools.check_column_names(column_names: list, sensor_name: str) None [source]#
Check that the column names respect DISDRODB standards.
- Parameters
column_names (list) – List of column names.
sensor_name (str) – Name of the sensor.
- Raises
TypeError – Error if some columns do not meet the DISDRODB standards.
- disdrodb.l0.template_tools.get_decimal_ndigits(string: str) int [source]#
Get the number of decimal digits.
- Parameters
string (str) – Input string.
- Returns
The number of decimal digits.
- Return type
int
- disdrodb.l0.template_tools.get_df_columns_unique_values_dict(df: DataFrame, column_indices: Optional[Union[int, slice, list]] = None, column_names: bool = True)[source]#
Create a dictionary {column: unique values}.
- Parameters
df (pd.DataFrame) – Input dataframe
column_indices (Union[int,slice,list], optional) – Column indices. If None, select all columns.
column_names (bool, optional) – If True, the dictionary keys are the column names. The default is True.
- disdrodb.l0.template_tools.get_natural_ndigits(string: str) int [source]#
Get the number of natural digits.
- Parameters
string (str) – Input string.
- Returns
The number of natural digits.
- Return type
int
- disdrodb.l0.template_tools.get_nchar(string: str) int [source]#
Get the number of characters.
- Parameters
string (str) – Input string.
- Returns
The number of characters.
- Return type
int
- disdrodb.l0.template_tools.get_ndigits(string: str) int [source]#
Get the number of total numeric digits.
- Parameters
string (str) – Input string
- Returns
The number of total digits.
- Return type
int
- disdrodb.l0.template_tools.infer_column_names(df: DataFrame, sensor_name: str, row_idx: int = 1)[source]#
Try to guess the dataframe column names based on string characteristics.
- Parameters
df (pd.DataFrame) – The dataframe to analyse.
sensor_name (str) – Name of the sensor.
row_idx (int, optional) – The row index of the dataframe to use to infer the column names. The default row index is 1.
- Returns
Dictionary with the column ids as keys and the guessed column names as values.
- Return type
dict
- disdrodb.l0.template_tools.print_df_column_names(df: DataFrame) None [source]#
Print the dataframe column names.
- Parameters
df (pd.DataFrame) – The dataframe.
- disdrodb.l0.template_tools.print_df_columns_unique_values(df: DataFrame, column_indices: Optional[Union[int, slice, list]] = None, print_column_names: bool = True) None [source]#
Print columns’ unique values.
- Parameters
df (pd.DataFrame) – Input dataframe
column_indices (Union[int,slice,list], optional) – Column indices. If None, select all columns.
print_column_names (bool, optional) – If True, print the column names. The default is True.
- disdrodb.l0.template_tools.print_df_first_n_rows(df: DataFrame, n: int = 5, print_column_names: bool = True) None [source]#
Print the first n rows of the dataframe by column.
- Parameters
df (pd.DataFrame) – Input dataframe.
n (int, optional) – Number of rows. The default is 5.
print_column_names (bool, optional) – If True, print the column names. The default is True.
- disdrodb.l0.template_tools.print_df_random_n_rows(df: DataFrame, n: int = 5, print_column_names: bool = True) None [source]#
Print n randomly chosen rows of the dataframe by column.
- Parameters
df (pd.DataFrame) – The dataframe.
n (int, optional) – The number of rows to print. The default is 5.
print_column_names (bool, optional) – If True, print the column names. The default is True.
- disdrodb.l0.template_tools.print_df_summary_stats(df: DataFrame, column_indices: Optional[Union[int, slice, list]] = None, print_column_names: bool = True)[source]#
Create a summary of the column statistics.
- Parameters
df (pd.DataFrame) – Input dataframe.
column_indices (Union[int,slice,list], optional) – Column indices. If None, select all columns.
print_column_names (bool, optional) – If True, print the column names. The default is True.
- Raises
ValueError – If the column types are not numeric.
- disdrodb.l0.template_tools.print_df_with_any_nan_rows(df: DataFrame) None [source]#
Print the rows containing at least one NaN value.
- Parameters
df (pd.DataFrame) – Input dataframe.
- disdrodb.l0.template_tools.print_valid_l0_column_names(sensor_name: str) None [source]#
Print the valid column names from the standard.
- Parameters
sensor_name (str) – Name of the sensor.
- disdrodb.l0.template_tools.str_has_decimal_digits(string: str) bool [source]#
Check if a string has decimals.
- Parameters
string (str) – Input string.
- Returns
True if the string has decimal digits.
- Return type
bool
Module contents#
- disdrodb.l0.available_readers(data_sources=None, reader_path=False)[source]#
Retrieve available readers information.
- disdrodb.l0.run_disdrodb_l0(data_sources=None, campaign_names=None, station_names=None, l0a_processing: bool = True, l0b_processing: bool = True, l0b_concat: bool = False, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, base_dir: Optional[str] = None)[source]#
Run the L0 processing of DISDRODB stations.
This function launches the processing of many DISDRODB stations with a single command. From the list of all available DISDRODB stations, it runs the processing of the stations matching the provided data_sources, campaign_names and station_names.
- Parameters
data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default is None.
station_names (list) – Station names to process. The default is None.
l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default is True.
l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default is True.
l0b_concat (bool) – Whether to concatenate all raw files into a single L0B netCDF file. If l0b_concat=True, all raw files will be saved into a single L0B netCDF file. If l0b_concat=False, each raw file will be converted into the corresponding L0B netCDF file. The default is False.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default is False.
remove_l0b (bool) – Whether to remove the L0B files after having concatenated all L0B netCDF files. It takes place only if l0b_concat=True. The default is False.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process will use a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files. For L0B, it processes just the first 100 rows of 3 L0A files. The default is False.
base_dir (str, optional) – Base directory of DISDRODB. Format: <...>/DISDRODB. If None (the default), the base_dir path specified in the DISDRODB active configuration will be used.
- disdrodb.l0.run_disdrodb_l0_station(data_source, campaign_name, station_name, l0a_processing: bool = True, l0b_processing: bool = True, l0b_concat: bool = False, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, base_dir: Optional[str] = None)[source]#
Run the L0 processing of a specific DISDRODB station from the terminal.
- Parameters
data_source (str) – Institution name (when campaign data spans more than 1 country), or country (when all campaigns (or sensor networks) are inside a given country). Must be UPPER CASE.
campaign_name (str) – Campaign name. Must be UPPER CASE.
station_name (str) – Station name.
l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default is True.
l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default is True.
l0b_concat (bool) – Whether to concatenate all raw files into a single L0B netCDF file. If l0b_concat=True, all raw files will be saved into a single L0B netCDF file. If l0b_concat=False, each raw file will be converted into the corresponding L0B netCDF file. The default is False.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default is False.
remove_l0b (bool) – Whether to remove the L0B files after having concatenated all L0B netCDF files. It takes place only if l0b_concat=True. The default is False.
force (bool) – If True, overwrite existing data in destination directories. If False, raise an error if there are already data in destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process will use a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined by os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files for each station. For L0B, it processes just the first 100 rows of 3 L0A files for each station. The default is False.
base_dir (str (optional)) – Base directory of DISDRODB. Format: <...>/DISDRODB. If None (the default), the base_dir path specified in the DISDRODB active configuration will be used.
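As a minimal sketch of how the arguments above fit together, the snippet below collects them into a dictionary and checks the UPPER CASE requirement before any processing is launched. The helper build_l0_station_kwargs and the names MY_SOURCE, MY_CAMPAIGN and station_01 are hypothetical placeholders, not part of disdrodb; the defaults are copied from the signature of run_disdrodb_l0_station shown above.

```python
def build_l0_station_kwargs(data_source, campaign_name, station_name, **options):
    """Collect keyword arguments for run_disdrodb_l0_station (hypothetical helper)."""
    # The documentation requires data_source and campaign_name to be UPPER CASE.
    for label, value in (("data_source", data_source), ("campaign_name", campaign_name)):
        if value != value.upper():
            raise ValueError(f"{label} must be UPPER CASE, got {value!r}")

    kwargs = {
        "data_source": data_source,
        "campaign_name": campaign_name,
        "station_name": station_name,
        # Defaults as listed in the documented signature.
        "l0a_processing": True,
        "l0b_processing": True,
        "l0b_concat": False,
        "remove_l0a": False,
        "remove_l0b": False,
        "force": False,
        "verbose": False,
        "debugging_mode": False,
        "parallel": True,
        "base_dir": None,
    }
    # Reject option names that the documented signature does not list.
    unknown = set(options) - set(kwargs)
    if unknown:
        raise TypeError(f"Unexpected option(s): {sorted(unknown)}")
    kwargs.update(options)
    return kwargs


# Placeholder names; a quick sequential test run on a reduced amount of data.
kwargs = build_l0_station_kwargs(
    "MY_SOURCE", "MY_CAMPAIGN", "station_01", debugging_mode=True, parallel=False
)

# With disdrodb installed and the raw data in place, the call would then be:
# from disdrodb.l0 import run_disdrodb_l0_station
# run_disdrodb_l0_station(**kwargs)
```

Setting debugging_mode=True together with parallel=False is a convenient combination for a first check of a new station, since it limits the data processed and keeps everything in a single process.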
- disdrodb.l0.run_disdrodb_l0a(data_sources=None, campaign_names=None, station_names=None, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, base_dir: Optional[str] = None)[source]#
Run the L0A processing of DISDRODB stations.
This function launches the processing of many DISDRODB stations with a single command. From the list of all available DISDRODB stations, it runs the processing of the stations matching the provided data_sources, campaign_names and station_names.
- Parameters
data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default is None.
station_names (list) – Station names to process. The default is None.
force (bool) – If True, overwrite existing data in destination directories. If False, raise an error if there are already data in destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. By default, the number of processes is defined by os.cpu_count(). If False, the files are processed sequentially in a single process.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files. The default is False.
base_dir (str (optional)) – Base directory of DISDRODB. Format: <...>/DISDRODB. If None (the default), the base_dir path specified in the DISDRODB active configuration will be used.
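The station matching described above can be pictured with a small sketch. The function select_stations and the station tuples are hypothetical illustrations, not the disdrodb implementation; it is assumed here that a filter left as None matches everything and that the three filters are combined with a logical AND.

```python
def select_stations(available, data_sources=None, campaign_names=None, station_names=None):
    """Return the (data_source, campaign_name, station_name) tuples matching the filters.

    Hypothetical helper: a filter set to None is assumed to match every value.
    """
    def matches(value, allowed):
        return allowed is None or value in allowed

    return [
        (data_source, campaign_name, station_name)
        for data_source, campaign_name, station_name in available
        if matches(data_source, data_sources)
        and matches(campaign_name, campaign_names)
        and matches(station_name, station_names)
    ]


# Placeholder station list standing in for "all available DISDRODB stations".
available = [
    ("SOURCE_A", "CAMPAIGN_1", "s1"),
    ("SOURCE_A", "CAMPAIGN_2", "s2"),
    ("SOURCE_B", "CAMPAIGN_3", "s1"),
]

selected = select_stations(available, data_sources=["SOURCE_A"])
```

Under these assumptions, calling the function with no filters returns every available station, which mirrors the documented behavior when all three arguments are left at None.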
- disdrodb.l0.run_disdrodb_l0a_station(data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, base_dir: Optional[str] = None)[source]#
Run the L0A processing of a station by calling the disdrodb_l0a_station command in the terminal.
- disdrodb.l0.run_disdrodb_l0b(data_sources=None, campaign_names=None, station_names=None, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, base_dir: Optional[str] = None, remove_l0a: bool = False)[source]#
Run the L0B processing of DISDRODB stations.
This function launches the processing of many DISDRODB stations with a single command. From the list of all available DISDRODB L0A stations, it runs the processing of the stations matching the provided data_sources, campaign_names and station_names.
- Parameters
data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default is None.
station_names (list) – Station names to process. The default is None.
force (bool) – If True, overwrite existing data in destination directories. If False, raise an error if there are already data in destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. By default, the number of processes is defined by os.cpu_count(). If False, the files are processed sequentially in a single process.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0B, it processes just the first 100 rows of 3 L0A files. The default is False.
base_dir (str (optional)) – Base directory of DISDRODB. Format: <...>/DISDRODB. If None (the default), the base_dir path specified in the DISDRODB active configuration will be used.
- disdrodb.l0.run_disdrodb_l0b_station(data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, base_dir: Optional[str] = None, remove_l0a: bool = False)[source]#
Run the L0B processing of a station by calling the disdrodb_run_l0b_station command in the terminal.
- disdrodb.l0.run_l0a(raw_dir, processed_dir, station_name, glob_patterns, column_names, reader_kwargs, df_sanitizer_fun, parallel, verbose, force, debugging_mode)[source]#
Run the L0A processing for a specific DISDRODB station.
This function is called in each reader to convert raw text files into DISDRODB L0A products.
- Parameters
raw_dir (str) – The directory path where all the raw content of a specific campaign is stored. The path must have the following structure: <...>/DISDRODB/Raw/<DATA_SOURCE>/<CAMPAIGN_NAME>. Inside the raw_dir directory, it is required to adopt the following structure:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yml
Important points:
- For each <station_name>, there must be a corresponding YAML file in the metadata subdirectory.
- The campaign_name is expected to be UPPER CASE.
- The <CAMPAIGN_NAME> must semantically match between the raw_dir and processed_dir directory paths, and with the key campaign_name within the metadata YAML files.
processed_dir (str) – The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure: <...>/DISDRODB/Processed/<DATA_SOURCE>/<CAMPAIGN_NAME>. For testing purposes, this function exceptionally also accepts a directory path simply ending with <CAMPAIGN_NAME> (e.g., /tmp/<CAMPAIGN_NAME>).
station_name (str) – The name of the station.
glob_patterns (str) – Glob pattern to search for data files in <raw_dir>/data/<station_name>.
column_names (list) – Column names of the raw text file.
reader_kwargs (dict) – Arguments for the Pandas read_csv function to open the text file.
df_sanitizer_fun (callable, optional) – Sanitizer function to format the DataFrame into DISDRODB L0A standard. Default is None.
parallel (bool, optional) – If True, process the files simultaneously in multiple processes. The number of simultaneous processes can be customized using dask.distributed.LocalCluster. If False, process the files sequentially in a single process. Default is False.
verbose (bool, optional) – If True, print detailed processing information to the terminal. Default is False.
force (bool, optional) – If True, overwrite existing data in destination directories. If False, raise an error if data already exists in destination directories. Default is False.
debugging_mode (bool, optional) – If True, reduce the amount of data to process. Processes only the first 100 rows of 3 raw data files. Default is False.
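The raw_dir layout requirements above can be checked before launching a reader. The following is a rough sketch using only the standard library; check_raw_dir_layout is a hypothetical helper (not a disdrodb function), and the directory and station names are placeholders created in a temporary directory.

```python
import tempfile
from pathlib import Path


def check_raw_dir_layout(raw_dir, station_name, glob_patterns="*"):
    """Return a list of layout problems found in raw_dir (hypothetical helper)."""
    raw_dir = Path(raw_dir)
    problems = []

    # The last path component is <CAMPAIGN_NAME>, expected to be UPPER CASE.
    if raw_dir.name != raw_dir.name.upper():
        problems.append(f"campaign name {raw_dir.name!r} is not UPPER CASE")

    # Raw files are expected in <raw_dir>/data/<station_name>.
    data_dir = raw_dir / "data" / station_name
    if not data_dir.is_dir():
        problems.append(f"missing data directory: {data_dir}")
    elif not any(data_dir.glob(glob_patterns)):
        problems.append(f"no files matching {glob_patterns!r} in {data_dir}")

    # Each station needs a YAML file in <raw_dir>/metadata.
    metadata_file = raw_dir / "metadata" / f"{station_name}.yml"
    if not metadata_file.is_file():
        problems.append(f"missing metadata file: {metadata_file}")

    return problems


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder campaign directory following <...>/DISDRODB/Raw/<DATA_SOURCE>/<CAMPAIGN_NAME>.
    raw_dir = Path(tmp) / "DISDRODB" / "Raw" / "MY_SOURCE" / "MY_CAMPAIGN"
    (raw_dir / "data" / "station_01").mkdir(parents=True)
    (raw_dir / "data" / "station_01" / "file_001.txt").write_text("raw content")
    (raw_dir / "metadata").mkdir()

    # The metadata YAML file is still missing at this point.
    problems_before = check_raw_dir_layout(raw_dir, "station_01", "*.txt")

    (raw_dir / "metadata" / "station_01.yml").write_text("campaign_name: MY_CAMPAIGN\n")
    problems_after = check_raw_dir_layout(raw_dir, "station_01", "*.txt")
```

The sketch only inspects the directory structure; it does not verify that the campaign_name key inside the YAML file matches the directory name, which the documentation also requires.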
- disdrodb.l0.run_l0b_from_nc(raw_dir, processed_dir, station_name, glob_patterns, dict_names, ds_sanitizer_fun, parallel, verbose, force, debugging_mode)[source]#
Run the L0B processing for a specific DISDRODB station with raw netCDFs.
This function is called in the reader where raw netCDF files must be converted into DISDRODB L0B format.
- Parameters
raw_dir (str) – The directory path where all the raw content of a specific campaign is stored. The path must have the following structure: <...>/DISDRODB/Raw/<DATA_SOURCE>/<CAMPAIGN_NAME>. Inside the raw_dir directory, it is required to adopt the following structure:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yml
Important points:
- For each <station_name>, there must be a corresponding YAML file in the metadata subdirectory.
- The campaign_name is expected to be UPPER CASE.
- The <CAMPAIGN_NAME> must semantically match between the raw_dir and processed_dir directory paths, and with the key campaign_name within the metadata YAML files.
processed_dir (str) – The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure: <...>/DISDRODB/Processed/<DATA_SOURCE>/<CAMPAIGN_NAME>. For testing purposes, this function exceptionally also accepts a directory path simply ending with <CAMPAIGN_NAME> (e.g., /tmp/<CAMPAIGN_NAME>).
station_name (str) – The name of the station.
glob_patterns (str) – Glob pattern to search for data files in <raw_dir>/data/<station_name>. Example: glob_patterns = "*.nc"
dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.
ds_sanitizer_fun (object, optional) – Sanitizer function to format the raw netCDF into DISDRODB L0B standard.
force (bool, optional) – If True, overwrite existing data in destination directories. If False, raise an error if data already exists in destination directories. Default is False.
verbose (bool, optional) – If True, print detailed processing information to the terminal. Default is True.
parallel (bool, optional) – If True, process the files simultaneously in multiple processes. The number of simultaneous processes can be customized using dask.distributed.LocalCluster. Ensure that threads_per_worker (number of threads per process) is set to 1 to avoid HDF errors. Also, ensure that the HDF5_USE_FILE_LOCKING environment variable is set to False. If False, process the files sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks. Default is False.
debugging_mode (bool, optional) – If True, reduce the amount of data to process. Only the first 3 raw netCDF files will be processed. Default is False.
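The parallel settings recommended above might be prepared as in the sketch below. The helper local_cluster_kwargs is hypothetical, and the choice of the string "FALSE" for the environment variable is an assumption about the accepted spelling; the dask calls are left commented out so that the snippet runs without dask installed.

```python
import os

# Assumption: "FALSE" is the string form of the value the text calls False.
# It is set before any worker process is created so that workers inherit it.
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"


def local_cluster_kwargs(n_workers=None):
    """Build keyword arguments for dask.distributed.LocalCluster (hypothetical helper)."""
    return {
        # One worker per CPU unless stated otherwise.
        "n_workers": n_workers or os.cpu_count() or 1,
        # A single thread per process, as recommended to avoid HDF errors.
        "threads_per_worker": 1,
        # Use separate processes rather than threads.
        "processes": True,
    }


cluster_kwargs = local_cluster_kwargs(n_workers=4)

# With dask.distributed installed, the cluster could then be started with:
# from dask.distributed import Client, LocalCluster
# cluster = LocalCluster(**cluster_kwargs)
# client = Client(cluster)
```

Starting the cluster before calling the reader with parallel=True is one way to control the number of simultaneous processes; whether the reader picks up an already running client is not stated in this section.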