disdrodb package#

Subpackages#

Submodules#

disdrodb.configs module#

DISDRODB Configuration File functions.

disdrodb.configs.define_disdrodb_configs(data_archive_dir: str | None = None, metadata_archive_dir: str | None = None, folder_partitioning: str | None = None, zenodo_token: str | None = None, zenodo_sandbox_token: str | None = None)[source][source]#

Defines the DISDRODB configuration file with the given credentials and base directory.

Parameters:

data_archive_dir (str) – The directory path where the DISDRODB Data Archive is located.
metadata_archive_dir (str) – The directory path where the DISDRODB Metadata Archive is located.
folder_partitioning (str) –
The folder partitioning scheme used in the DISDRODB Data Archive. Allowed values are:
- “”: No additional subdirectories, files are saved directly in <station_dir>.
- “year”: Files are stored under a subdirectory for the year (<station_dir>/2025).
- “year/month”: Files are stored under subdirectories by year and month (<station_dir>/2025/04).
- “year/month/day”: Files are stored under subdirectories by year, month and day (<station_dir>/2025/04/01).
- “year/month_name”: Files are stored under subdirectories by year and month name (<station_dir>/2025/April).
- “year/quarter”: Files are stored under subdirectories by year and quarter (<station_dir>/2025/Q2).
zenodo_token (str) – Zenodo Access Token. It is required to upload station data to Zenodo.
zenodo_sandbox_token (str) – Zenodo Sandbox Access Token. It is required to upload station data to Zenodo Sandbox.

Notes

This function writes or updates the DISDRODB config YAML file. The file is located in the user’s home directory at ~/.config_disdrodb.yml and is used to run the various DISDRODB operations.
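The folder partitioning schemes listed above can be illustrated with a small standalone sketch. It does not call DISDRODB; `partition_subdir` and `MONTH_NAMES` are hypothetical names introduced here, and the mapping simply reproduces the example paths given in the parameter description.

```python
from datetime import date

# English month names, hard-coded so the result does not depend on the locale.
MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def partition_subdir(folder_partitioning: str, day: date) -> str:
    """Return the subdirectory (relative to <station_dir>) for a given date.

    Hypothetical helper: it mirrors the documented schemes, not the
    actual DISDRODB implementation.
    """
    year = f"{day.year:04d}"
    month = f"{day.month:02d}"
    builders = {
        "": lambda: "",
        "year": lambda: year,
        "year/month": lambda: f"{year}/{month}",
        "year/month/day": lambda: f"{year}/{month}/{day.day:02d}",
        "year/month_name": lambda: f"{year}/{MONTH_NAMES[day.month - 1]}",
        "year/quarter": lambda: f"{year}/Q{(day.month - 1) // 3 + 1}",
    }
    if folder_partitioning not in builders:
        raise ValueError(f"Unknown folder partitioning: {folder_partitioning!r}")
    return builders[folder_partitioning]()


example_day = date(2025, 4, 1)
for scheme in ("", "year", "year/month", "year/month/day", "year/month_name", "year/quarter"):
    print(repr(scheme), "->", repr(partition_subdir(scheme, example_day)))
```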

disdrodb.configs.get_data_archive_dir(data_archive_dir=None)[source][source]#: Return the DISDRODB base directory.

disdrodb.configs.get_folder_partitioning()[source][source]#: Return the folder partitioning.

disdrodb.configs.get_metadata_archive_dir(metadata_archive_dir=None)[source][source]#: Return the DISDRODB Metadata Archive Directory.

disdrodb.configs.get_zenodo_token(sandbox: bool)[source][source]#: Return the Zenodo access token.

disdrodb.configs.read_disdrodb_configs() → dict[str, str][source][source]#

Reads the DISDRODB configuration file and returns a dictionary with the configuration settings.

Returns:: A dictionary containing the configuration settings for the DISDRODB.
Return type:: dict
Raises:: ValueError – If the configuration file has not been defined yet. Use disdrodb.define_configs() to specify the configuration file path and settings.

Notes

This function reads the YAML configuration file located at ~/.config_disdrodb.yml.
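The read-or-fail behaviour described above can be sketched without DISDRODB installed. The helper `read_flat_config` is hypothetical, and it parses only flat `key: value` lines as a stand-in for a real YAML parser; the actual function may behave differently in its details.

```python
import os
import tempfile


def read_flat_config(path: str) -> dict[str, str]:
    """Read a flat 'key: value' file, raising ValueError if it does not exist.

    Hypothetical stand-in for a YAML reader: no nesting, no quoting rules.
    """
    if not os.path.exists(path):
        raise ValueError(f"Configuration file not found: {path}")
    config = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, _, value = line.partition(":")
            config[key.strip()] = value.strip()
    return config


# Demonstration on a temporary file (the real file lives at ~/.config_disdrodb.yml).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, ".config_disdrodb.yml")
    with open(path, "w", encoding="utf-8") as f:
        f.write("data_archive_dir: /tmp/DISDRODB\nfolder_partitioning: year/month\n")
    config = read_flat_config(path)

print(config)
```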

disdrodb.docs module#

Open the documentation for the relevant sensor.

disdrodb.docs.open_documentation()[source][source]#: Open the DISDRODB documentation in the browser.

disdrodb.docs.open_sensor_documentation(sensor_name)[source][source]#: Open the sensor documentation PDF in the browser.

disdrodb.routines module#

DISDRODB CLI routine wrappers.

disdrodb.routines.run_l0(data_sources=None, campaign_names=None, station_names=None, l0a_processing: bool = True, l0b_processing: bool = True, l0c_processing: bool = True, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#

Run the L0 processing of DISDRODB stations.

This function launches the processing of many DISDRODB stations with a single command. From the list of all available DISDRODB stations, it runs the processing of the stations matching the provided data_sources, campaign_names and station_names.

Parameters:

data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default value is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default value is None.
station_names (list) – Station names to process. The default value is None.
l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default value is True.
l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default value is True.
l0c_processing (bool) – Whether to launch processing to generate L0C netCDF4 file(s) from L0B data. The default value is True.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default value is False.
remove_l0b (bool) – Whether to remove the L0B files after having produced all L0C netCDF files. The default value is False.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default value is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default value is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process will use a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks. The default value is True.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files. For L0B, it processes just the first 100 rows of 3 L0A files. The default value is False.
data_archive_dir (str (optional)) – The directory path where the DISDRODB Data Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the data_archive_dir path specified in the DISDRODB active configuration.
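The station selection described above (matching on data_sources, campaign_names and station_names, with None meaning no restriction) can be sketched as follows. This is an illustration only: `select_stations` is a hypothetical helper, and the GPM campaign and all station names below are invented.

```python
def select_stations(stations, data_sources=None, campaign_names=None, station_names=None):
    """Keep the (data_source, campaign_name, station_name) tuples matching the filters.

    A filter set to None does not restrict the selection.
    """
    def keep(value, allowed):
        return allowed is None or value in allowed

    return [
        (data_source, campaign_name, station_name)
        for data_source, campaign_name, station_name in stations
        if keep(data_source, data_sources)
        and keep(campaign_name, campaign_names)
        and keep(station_name, station_names)
    ]


# Invented station list, for illustration only.
all_stations = [
    ("EPFL", "EPFL_ROOF_2012", "STATION_A"),
    ("EPFL", "EPFL_ROOF_2012", "STATION_B"),
    ("GPM", "CAMPAIGN_X", "STATION_A"),
]

print(select_stations(all_stations))                              # no filter: all stations
print(select_stations(all_stations, data_sources=["EPFL"]))       # only EPFL stations
print(select_stations(all_stations, station_names=["STATION_A"])) # STATION_A of any campaign
```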

disdrodb.routines.run_l0_station(data_source, campaign_name, station_name, l0a_processing: bool = True, l0b_processing: bool = True, l0c_processing: bool = True, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#

Run the L0 processing of a specific DISDRODB station from the terminal.

Parameters:

data_source (str) – Institution name (when campaign data spans more than 1 country), or country (when all campaigns (or sensor networks) are inside a given country). Must be UPPER CASE.
campaign_name (str) – Campaign name. Must be UPPER CASE.
station_name (str) – Station name.
l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default value is True.
l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default value is True.
l0c_processing (bool) – Whether to launch processing to generate L0C netCDF4 file(s) from L0B data. The default value is True.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default value is False.
remove_l0b (bool) – Whether to remove the L0B files after having produced L0C netCDF files. The default is False.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default value is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default value is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process will use a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks. The default value is True.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files for each station. For L0B, it processes just the first 100 rows of 3 L0A files for each station. The default value is False.
data_archive_dir (str (optional)) – The directory path where the DISDRODB Data Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the data_archive_dir path specified in the DISDRODB active configuration.
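How the processing and removal flags combine can be sketched as a simple plan builder. The function `plan_l0_steps` is hypothetical, and the sketch assumes that an intermediate product is removed only when the step consuming it is enabled; the library may handle this case differently.

```python
def plan_l0_steps(
    l0a_processing=True,
    l0b_processing=True,
    l0c_processing=True,
    remove_l0a=False,
    remove_l0b=False,
):
    """Return the ordered list of actions implied by the flags (illustrative only)."""
    steps = []
    if l0a_processing:
        steps.append("generate L0A from raw data")
    if l0b_processing:
        steps.append("generate L0B from L0A")
        if remove_l0a:  # assumption: removal only follows an L0B run
            steps.append("remove L0A files")
    if l0c_processing:
        steps.append("generate L0C from L0B")
        if remove_l0b:  # assumption: removal only follows an L0C run
            steps.append("remove L0B files")
    return steps


for step in plan_l0_steps(remove_l0a=True, remove_l0b=True):
    print(step)
```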

disdrodb.routines.run_l0a(data_sources=None, campaign_names=None, station_names=None, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#

Run the L0A processing of DISDRODB stations.

Parameters:

data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default value is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default value is None.
station_names (list) – Station names to process. The default value is None.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default value is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default value is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process. The default value is True.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files. The default value is False.
data_archive_dir (str (optional)) – The directory path where the DISDRODB Data Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the data_archive_dir path specified in the DISDRODB active configuration.
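The requirement that data_archive_dir ends with <...>/DISDRODB can be checked up front with a few lines of standard-library code. This is a sketch under assumptions: `check_archive_dir` is a hypothetical helper, it assumes POSIX-style paths, and it tolerates a trailing slash, which the library itself may or may not do.

```python
from pathlib import PurePosixPath


def check_archive_dir(data_archive_dir: str) -> str:
    """Return the normalized path if its last component is 'DISDRODB', else raise."""
    path = PurePosixPath(data_archive_dir)
    if path.name != "DISDRODB":
        raise ValueError(
            f"The archive directory must end with <...>/DISDRODB, got {data_archive_dir!r}"
        )
    return str(path)


print(check_archive_dir("/data/DISDRODB"))
print(check_archive_dir("/data/DISDRODB/"))  # trailing slash is dropped by pathlib
```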

disdrodb.routines.run_l0a_station(data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#: Run the L0A processing of a station by calling disdrodb_run_l0a_station in the terminal.

disdrodb.routines.run_l0b(data_sources=None, campaign_names=None, station_names=None, remove_l0a: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#

Run the L0B processing of DISDRODB stations.

This function launches the processing of many DISDRODB stations with a single command. From the list of all available DISDRODB L0A stations, it runs the processing of the stations matching the provided data_sources, campaign_names and station_names.

Parameters:

data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default value is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default value is None.
station_names (list) – Station names to process. The default value is None.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default value is False.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default value is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default value is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process. The default value is True.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0B, it processes just the first 100 rows of 3 L0A files. The default value is False.
data_archive_dir (str (optional)) – The directory path where the DISDRODB Data Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the data_archive_dir path specified in the DISDRODB active configuration.

disdrodb.routines.run_l0b_station(data_source, campaign_name, station_name, remove_l0a: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#: Run the L0B processing of a station calling disdrodb_run_l0b_station in the terminal.

disdrodb.routines.run_l0c(data_sources=None, campaign_names=None, station_names=None, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#

Run the L0C processing of DISDRODB stations.

Parameters:

data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default value is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default value is None.
station_names (list) – Station names to process. The default value is None.
remove_l0b (bool) – Whether to remove the L0B files after having produced L0C netCDF files. The default is False.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default value is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default value is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process will use a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks. The default value is True.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0C, it processes just 3 L0B files. The default value is False.
data_archive_dir (str (optional)) – The directory path where the DISDRODB Data Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the data_archive_dir path specified in the DISDRODB active configuration.

disdrodb.routines.run_l0c_station(data_source, campaign_name, station_name, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#: Run the L0C processing of a station by calling disdrodb_run_l0c_station in the terminal.

disdrodb.routines.run_l1(data_sources=None, campaign_names=None, station_names=None, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#

Run the L1 processing of DISDRODB stations.

Parameters:

data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default value is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default value is None.
station_names (list) – Station names to process. The default value is None.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default value is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default value is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process will use a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks. The default value is True.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L1, it processes just 3 L0B files. The default value is False.
data_archive_dir (str (optional)) – The directory path where the DISDRODB Data Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the data_archive_dir path specified in the DISDRODB active configuration.

disdrodb.routines.run_l1_station(data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#: Run the L1 processing of a station by calling disdrodb_run_l1_station in the terminal.

disdrodb.routines.run_l2e(data_sources=None, campaign_names=None, station_names=None, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#

Run the L2E processing of DISDRODB stations.

Parameters:

data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default value is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default value is None.
station_names (list) – Station names to process. The default value is None.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default value is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default value is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process will use a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks. The default value is True.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L2E, it processes just 3 L1 files. The default value is False.
data_archive_dir (str (optional)) – The directory path where the DISDRODB Data Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the data_archive_dir path specified in the DISDRODB active configuration.

disdrodb.routines.run_l2e_station(data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#: Run the L2E processing of a station by calling disdrodb_run_l2e_station in the terminal.

disdrodb.routines.run_l2m(data_sources=None, campaign_names=None, station_names=None, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#

Run the L2M processing of DISDRODB stations.

Parameters:

data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default value is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default value is None.
station_names (list) – Station names to process. The default value is None.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default value is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default value is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process will use a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks. The default value is True.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L2M, it processes just 3 L0B files. The default value is False.
data_archive_dir (str (optional)) – The directory path where the DISDRODB Data Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the data_archive_dir path specified in the DISDRODB active configuration.

disdrodb.routines.run_l2m_station(data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#: Run the L2M processing of a station by calling disdrodb_run_l2m_station in the terminal.

Module contents#

DISDRODB software.

disdrodb.available_campaigns(product=None, data_sources=None, station_names=None, available_data=False, raise_error_if_empty=False, invalid_fields_policy='raise', data_archive_dir=None, metadata_archive_dir=None, **product_kwargs)[source][source]#: Return the campaign names for which stations are available.

disdrodb.available_data_sources(product=None, campaign_names=None, station_names=None, available_data=False, raise_error_if_empty=False, invalid_fields_policy='raise', data_archive_dir=None, metadata_archive_dir=None, **product_kwargs)[source][source]#: Return data sources for which stations are available.

disdrodb.available_readers(sensor_name, data_sources=None, return_path=False)[source][source]#: Retrieve available readers information.

disdrodb.available_sensor_names() → list[source][source]#

Get available names of sensors.

Returns:: sensor_names – Sorted list of the available sensors
Return type:: list

disdrodb.available_stations(product=None, data_sources=None, campaign_names=None, station_names=None, return_tuple=True, available_data=False, raise_error_if_empty=False, invalid_fields_policy='raise', data_archive_dir=None, metadata_archive_dir=None, **product_kwargs)[source][source]#

Return stations information for which metadata or product data are available on disk.

This function queries the DISDRODB Metadata Archive and, optionally, the local DISDRODB Data Archive to identify stations that satisfy the specified filters.

If the DISDRODB product is not specified, it lists the stations present in the DISDRODB Metadata Archive given the specified filtering criteria. If the DISDRODB product is specified, it lists the stations present in the local DISDRODB Data Archive given the specified filtering criteria.

Parameters:

product (str or None, optional) –
Name of the product to filter on (e.g., “RAW”, “L0A”, “L1”).

If the DISDRODB product is not specified (default), it lists the stations present in the DISDRODB Metadata Archive given the specified filtering criteria.

If the DISDRODB product is specified, it lists the stations present in the local DISDRODB Data Archive given the specified filtering criteria. The default is None.
data_sources (str or sequence of str, optional) – One or more data source identifiers to filter stations by. The name(s) must be UPPER CASE. If None, no filtering on data source is applied. The default is None.
campaign_names (str or sequence of str, optional) – One or more campaign names to filter stations by. The name(s) must be UPPER CASE. If None, no filtering on campaign is applied. The default is None.
station_names (str or sequence of str, optional) – One or more station names to include. If None, all stations matching other filters are considered. The default is None.
available_data (bool, optional) –
If product is not specified:
- if available_data is False, return stations present in the DISDRODB Metadata Archive
- if available_data is True, return stations with data available on the
online DISDRODB Decentralized Data Archive (i.e., stations with the disdrodb_data_url in the metadata).
If product is specified:
- if available_data is False, return stations where the product directory exists in the local DISDRODB Data Archive
- if available_data is True, return stations where product data exists in the local DISDRODB Data Archive.
The default is False.
return_tuple (bool, optional) – If True, return a list of tuples (data_source, campaign_name, station_name). If False, return only a list of station names. The default is True.
raise_error_if_empty (bool, optional) – If True and no stations satisfy the criteria, raise a ValueError. If False, return an empty list/tuple. The default is False.
invalid_fields_policy ({'raise', 'warn', 'ignore'}, optional) –
How to handle invalid filter values for data_sources, campaign_names, or station_names that are not present in the metadata archive:
- ’raise’ : raise a ValueError (default)
- ’warn’ : emit a warning, then ignore invalid entries
- ’ignore’: silently drop invalid entries
data_archive_dir (str or Path-like, optional) – Path to the root of the local DISDRODB Data Archive. Required only if product is specified. If None, the default data archive base directory is used. Default is None.
metadata_archive_dir (str or Path-like, optional) – Path to the root of the DISDRODB Metadata Archive. If None, the default metadata base directory is used. Default is None.
**product_kwargs (dict, optional) – Additional arguments required for some products. For example, for the “L2E” product, you need to specify rolling and sample_interval. For the “L2M” product, you need to specify also the model_name.

Returns:

If return_tuple=True, return a list of tuples (data_source, campaign_name, station_name). If return_tuple=False, return a list of station names.

Return type:

list

Examples

>>> # List all stations present in the DISDRODB Metadata Archive
>>> stations = available_stations()
>>> # List all stations present in the online DISDRODB Data Archive
>>> stations = available_stations(available_data=True)
>>> # List stations with raw data available in the local DISDRODB Data Archive
>>> raw_stations = available_stations(product="RAW", available_data=True)
>>> # List stations of specific data sources
>>> stations = available_stations(data_sources=["GPM", "EPFL"])
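The three invalid_fields_policy options can be illustrated with a small standalone sketch. The helper `filter_valid` is hypothetical and does not reflect how DISDRODB implements the check; it only demonstrates the documented raise, warn and ignore behaviours on a list of requested values.

```python
import warnings


def filter_valid(requested, known, policy="raise"):
    """Drop requested values that are not in `known`, following the given policy."""
    if policy not in ("raise", "warn", "ignore"):
        raise ValueError(f"Unknown policy: {policy!r}")
    invalid = [value for value in requested if value not in known]
    if invalid:
        message = f"Invalid values: {invalid}"
        if policy == "raise":
            raise ValueError(message)
        if policy == "warn":
            warnings.warn(message)
        # policy == "ignore": fall through and silently drop invalid entries
    return [value for value in requested if value in known]


known_sources = ["EPFL", "GPM"]
# "UNKNOWN" is an invented name that is not in the known list.
print(filter_valid(["EPFL", "UNKNOWN"], known_sources, policy="ignore"))
```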

disdrodb.check_metadata_archive(metadata_archive_dir: str | None = None, raise_error=False)[source][source]#

Check the archive metadata compliance.

Parameters:

metadata_archive_dir (str (optional)) – The directory path where the DISDRODB Metadata Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the metadata_archive_dir path specified in the DISDRODB active configuration.
raise_error (bool (optional)) – Whether to raise an error and interrupt the archive check if a metadata file is not compliant. The default value is False.

Returns:

If the check succeeds, the result is True, otherwise False.

Return type:

bool

disdrodb.check_metadata_archive_geolocation(metadata_archive_dir: str | None = None)[source][source]#

Check whether the metadata files have missing or wrong geolocation.

Parameters:: metadata_archive_dir (str (optional)) – The directory path where the DISDRODB Metadata Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the metadata_archive_dir path specified in the DISDRODB active configuration.
Returns:: If the check succeeds, the result is True, otherwise False.
Return type:: bool

disdrodb.check_station_metadata(data_source, campaign_name, station_name, metadata_archive_dir=None)[source][source]#: Check DISDRODB metadata compliance.

Defines the DISDRODB configuration file with the given credentials and base directory.

Parameters:

data_archive_dir (str) – The directory path where the DISDRODB Data Archive is located.
metadata_archive_dir (str) – The directory path where the DISDRODB Metadata Archive is located.
folder_partitioning (str) –
The folder partitioning scheme used in the DISDRODB Data Archive. Allowed values are:
- “”: No additional subdirectories, files are saved directly in <station_dir>.
- “year”: Files are stored under a subdirectory for the year (<station_dir>/2025).
- “year/month”: Files are stored under subdirectories by year and month (<station_dir>/2025/04).
- “year/month/day”: Files are stored under subdirectories by year, month and day (<station_dir>/2025/04/01).
- “year/month_name”: Files are stored under subdirectories by year and month name (<station_dir>/2025/April).
- “year/quarter”: Files are stored under subdirectories by year and quarter (<station_dir>/2025/Q2).
zenodo_token (str) – Zenodo Access Token. It is required to upload station data to Zenodo.
zenodo_sandbox_token (str) – Zenodo Sandbox Access Token. It is required to upload station data to Zenodo Sandbox.

Notes

Download DISDRODB stations with the disdrodb_data_url in the metadata.

Parameters:

data_sources (str or list of str, optional) – Data source name (e.g., EPFL). If not provided (None), all data sources will be downloaded. The default value is None.
campaign_names (str or list of str, optional) – Campaign name (e.g., EPFL_ROOF_2012). If not provided (None), all campaigns will be downloaded. The default value is None.
station_names (str or list of str, optional) – Station name. If not provided (None), all stations will be downloaded. The default value is None.
force (bool, optional) – If True, overwrite the already existing raw data file. The default value is False.
data_archive_dir (str (optional)) – DISDRODB Data Archive directory. Format: <...>/DISDRODB. If None (the default), the disdrodb config variable data_archive_dir is used.

disdrodb.download_metadata_archive(directory_path, force=False)[source][source]#

Download the DISDRODB Metadata Archive to the specified directory.

Parameters:

directory_path (str) – The directory path where the DISDRODB-METADATA directory will be downloaded.
force (bool, optional) – If True, the existing DISDRODB-METADATA directory will be removed and a new one will be downloaded. The default value is False.

Returns:

metadata_archive_dir – The DISDRODB Metadata Archive directory path.

Return type:

str

disdrodb.download_station(data_source: str, campaign_name: str, station_name: str, force: bool = False, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None) → None[source][source]#

Download data of a single DISDRODB station from the DISDRODB remote repository.

Parameters:

data_source (str) – The name of the institution (for campaigns spanning multiple countries) or the name of the country (for campaigns or sensor networks within a single country). Must be provided in UPPER CASE.
campaign_name (str) – The name of the campaign. Must be provided in UPPER CASE.
station_name (str) – The name of the station.
force (bool, optional) – If True, overwrite the already existing raw data file. The default value is False.
data_archive_dir (str, optional) – DISDRODB Data Archive directory. Format: <...>/DISDRODB. If None (the default), the data_archive_dir path specified in the DISDRODB active configuration is used.
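
The UPPER CASE requirement on data_source and campaign_name can be enforced before the call. The following is a minimal sketch, not part of disdrodb: normalize_station_id and download are hypothetical helper names, and the disdrodb import is deferred so that the normalization can be tried without the package installed.

```python
def normalize_station_id(data_source, campaign_name, station_name):
    """Return the identifiers with data_source and campaign_name upper-cased.

    Hypothetical helper: station_name is only stripped of whitespace because
    the parameter list above requires UPPER CASE for the first two only.
    """
    return (
        data_source.strip().upper(),
        campaign_name.strip().upper(),
        station_name.strip(),
    )


def download(data_source, campaign_name, station_name, force=False, data_archive_dir=None):
    """Download one station, importing disdrodb only when actually called."""
    import disdrodb  # assumed to be installed and configured

    data_source, campaign_name, station_name = normalize_station_id(
        data_source, campaign_name, station_name
    )
    disdrodb.download_station(
        data_source=data_source,
        campaign_name=campaign_name,
        station_name=station_name,
        force=force,
        data_archive_dir=data_archive_dir,
    )
```

The keyword names passed to disdrodb.download_station follow the signature shown above. Any station identifiers used to try the helper are placeholders.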

disdrodb.find_files(data_source, campaign_name, station_name, product, debugging_mode: bool = False, data_archive_dir: str | None = None, glob_pattern='*', **product_kwargs)[source][source]#

Retrieve DISDRODB product files for a given station.

Parameters:

data_source (str) – The name of the institution (for campaigns spanning multiple countries) or the name of the country (for campaigns or sensor networks within a single country). Must be provided in UPPER CASE.
campaign_name (str) – The name of the campaign. Must be provided in UPPER CASE.
station_name (str) – The name of the station.
product (str) – The name of the DISDRODB product.
debugging_mode (bool, optional) – If True, select at most 3 files for debugging purposes. The default value is False.
data_archive_dir (str, optional) – The base directory of DISDRODB, expected in the format <...>/DISDRODB. If not specified, the path specified in the DISDRODB active configuration will be used.
glob_pattern (str, optional) – Glob pattern to search for raw data files. The default is "*". The argument is used only if product="RAW".
sample_interval (int, optional) – The sampling interval of the product, in seconds. Specify it only for the L2E and L2M products.
rolling (bool, optional) – Whether the dataset has been resampled by aggregating or rolling. Specify it only for the L2E and L2M products.
model_name (str) – The model name of the statistical distribution for the DSD. Specify it only for the L2M product.

Returns:

filepaths – List of file paths.

Return type:

list
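
The product-specific arguments above follow a simple rule: sample_interval and rolling apply to L2E and L2M, and model_name applies to L2M only. A minimal sketch of that rule is given below. build_product_kwargs is a hypothetical helper, not part of disdrodb, and it assumes that the arguments are mandatory for the products they apply to, which the parameter list does not state explicitly.

```python
def build_product_kwargs(product, sample_interval=None, rolling=None, model_name=None):
    """Collect the extra keyword arguments that apply to a given product.

    Encodes only the rules stated in the parameter list. Treating the
    arguments as required for L2E/L2M is an assumption of this sketch.
    """
    allowed = {
        "L2E": ("sample_interval", "rolling"),
        "L2M": ("sample_interval", "rolling", "model_name"),
    }.get(product, ())
    given = {
        "sample_interval": sample_interval,
        "rolling": rolling,
        "model_name": model_name,
    }

    # Reject arguments that do not belong to this product.
    unexpected = [k for k, v in given.items() if v is not None and k not in allowed]
    if unexpected:
        raise ValueError(f"{unexpected} must not be specified for product {product!r}")

    # Require every argument that belongs to this product.
    missing = [k for k in allowed if given[k] is None]
    if missing:
        raise ValueError(f"{missing} must be specified for product {product!r}")

    return {k: given[k] for k in allowed}
```

Because find_files accepts **product_kwargs, the returned dictionary can be unpacked directly into the call, for example find_files(..., product="L2E", **kwargs).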

disdrodb.get_data_archive_dir(data_archive_dir=None)[source][source]#: Return the DISDRODB base directory.

disdrodb.get_metadata_archive_dir(metadata_archive_dir=None)[source][source]#: Return the DISDRODB Metadata Archive Directory.

disdrodb.get_reader(reader_reference, sensor_name)[source][source]#

Retrieve the reader function.

Parameters:

reader_reference (str) – The reader reference name. The reader is located at disdrodb.l0.readers.{sensor_name}.{reader_reference}. The reader_reference naming convention is "{DATA_SOURCE}"/"{CAMPAIGN_NAME}_{OPTIONAL_SUFFIX}".
sensor_name (str) – The sensor name.

Returns:

The reader() function.

Return type:

callable
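
The reader_reference naming convention above can be assembled from its parts. The sketch below is illustrative only: build_reader_reference is a hypothetical helper, and forcing the data source and campaign name to upper case is an assumption based on the placeholders used in the convention.

```python
def build_reader_reference(data_source, campaign_name, suffix=None):
    """Build a reader reference of the form DATA_SOURCE/CAMPAIGN_NAME[_SUFFIX]."""
    name = campaign_name.upper()
    if suffix:
        # The suffix is optional and appended after an underscore.
        name = f"{name}_{suffix}"
    return f"{data_source.upper()}/{name}"
```

The resulting string is what get_reader expects as reader_reference, together with the sensor_name that selects the readers subdirectory.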

disdrodb.get_station_reader(data_source, campaign_name, station_name, metadata_archive_dir=None)[source][source]#: Retrieve the reader function of a specific DISDRODB station.

disdrodb.open_dataset(data_source, campaign_name, station_name, product, product_kwargs=None, debugging_mode: bool = False, data_archive_dir: str | None = None, **open_kwargs)[source][source]#

Open the DISDRODB product files of a given station as an xarray.Dataset.

Parameters:

data_source (str) – The name of the institution (for campaigns spanning multiple countries) or the name of the country (for campaigns or sensor networks within a single country). Must be provided in UPPER CASE.
campaign_name (str) – The name of the campaign. Must be provided in UPPER CASE.
station_name (str) – The name of the station.
product (str) – The name of the DISDRODB product.
sample_interval (int, optional) – The sampling interval of the product, in seconds. Specify it only for the L2E and L2M products.
rolling (bool, optional) – Whether the dataset has been resampled by aggregating or rolling. Specify it only for the L2E and L2M products.
model_name (str) – The model name of the statistical distribution for the DSD. Specify it only for the L2M product.
debugging_mode (bool, optional) – If True, select at most 3 files for debugging purposes. The default value is False.
data_archive_dir (str, optional) – The base directory of DISDRODB, expected in the format <...>/DISDRODB. If not specified, the path specified in the DISDRODB active configuration will be used.

Return type:

xarray.Dataset

disdrodb.open_documentation()[source][source]#: Open the DISDRODB documentation in the browser.

disdrodb.open_logs_directory(data_source, campaign_name, station_name=None, data_archive_dir=None)[source][source]#: Open the DISDRODB Data Archive logs directory of a station.

disdrodb.open_metadata_directory(data_source, campaign_name, station_name=None, metadata_archive_dir=None)[source][source]#: Open the DISDRODB Metadata Archive station(s) metadata directory.

disdrodb.open_product_directory(product, data_source, campaign_name, station_name, data_archive_dir=None)[source][source]#: Open the DISDRODB Data Archive station product directory.

disdrodb.open_sensor_documentation(sensor_name)[source][source]#: Open the sensor documentation PDF in the browser.

disdrodb.read_metadata_archive(metadata_archive_dir=None, data_sources=None, campaign_names=None, station_names=None, available_data=False)[source][source]#

Read the DISDRODB Metadata Archive Database.

Parameters:

metadata_archive_dir (str or Path-like, optional) – Path to the root of the DISDRODB Metadata Archive. If None, the default metadata base directory is used. Default is None.
data_sources (str or sequence of str, optional) – One or more data source identifiers to filter stations by. If None, no filtering on data source is applied. The default is None.
campaign_names (str or sequence of str, optional) – One or more campaign names to filter stations by. If None, no filtering on campaign is applied. The default is None.
station_names (str or sequence of str, optional) – One or more station names to include. If None, all stations matching other filters are considered. The default is None.
available_data (bool, optional) – If True, only information on stations with data available in the online DISDRODB Decentralized Data Archive is returned. If False (the default), all stations present in the DISDRODB Metadata Archive matching the filtering criteria are returned.

Return type:

pandas.DataFrame
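
The filter semantics described above (a single string or a sequence of strings per filter, with None meaning no filtering) can be pictured on plain dictionaries. This is a minimal sketch, not the disdrodb implementation: filter_stations and the record keys data_source, campaign_name and station_name are assumptions made for illustration.

```python
def _as_set(names):
    """Turn None, a single string, or a sequence of strings into a set (or None)."""
    if names is None:
        return None
    if isinstance(names, str):
        return {names}
    return set(names)


def filter_stations(records, data_sources=None, campaign_names=None, station_names=None):
    """Keep the records that match every filter that is not None."""
    filters = {
        "data_source": _as_set(data_sources),
        "campaign_name": _as_set(campaign_names),
        "station_name": _as_set(station_names),
    }
    return [
        rec
        for rec in records
        if all(allowed is None or rec[key] in allowed for key, allowed in filters.items())
    ]
```

Filters combine with a logical AND, so passing both data_sources and station_names keeps only the stations that satisfy both.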

disdrodb.read_station_metadata(data_source, campaign_name, station_name, metadata_archive_dir=None)[source][source]#

Open the station metadata YAML file into a dictionary.

Parameters:

data_source (str) – The name of the institution (for campaigns spanning multiple countries) or the name of the country (for campaigns or sensor networks within a single country). Must be provided in UPPER CASE.
campaign_name (str) – The name of the campaign. Must be provided in UPPER CASE.
station_name (str) – The name of the station.
metadata_archive_dir (str, optional) – The directory path where the DISDRODB Metadata Archive is located. If not specified, the path specified in the DISDRODB active configuration will be used. Expected path format: <...>/DISDRODB.

Returns:

metadata – The station metadata dictionary.

Return type:

dict

disdrodb.run_l0(data_sources=None, campaign_names=None, station_names=None, l0a_processing: bool = True, l0b_processing: bool = True, l0c_processing: bool = True, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#

Run the L0 processing of DISDRODB stations.

Parameters:

data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default value is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default value is None.
station_names (list) – Station names to process. The default value is None.
l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default value is True.
l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default value is True.
l0c_processing (bool) – Whether to launch processing to generate L0C netCDF4 file(s) from L0B data. The default value is True.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default value is False.
remove_l0b (bool) – Whether to remove the L0B files after having produced all L0C netCDF files. The default value is False.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default value is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default value is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process uses a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined by os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks. The default value is True.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files. For L0B, it processes just the first 100 rows of 3 L0A files. The default value is False.
data_archive_dir (str, optional) – The directory path where the DISDRODB Data Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the data_archive_dir path specified in the DISDRODB active configuration.
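
The interplay of the processing and removal flags above can be summarized as an ordered list of steps. The sketch below is purely illustrative and is not how disdrodb is implemented: plan_l0_steps is a hypothetical helper, and it assumes that a removal only takes place when the step consuming those files has run.

```python
def plan_l0_steps(
    l0a_processing=True,
    l0b_processing=True,
    l0c_processing=True,
    remove_l0a=False,
    remove_l0b=False,
):
    """Return the ordered L0 steps implied by the flags."""
    steps = []
    if l0a_processing:
        steps.append("generate L0A from raw data")
    if l0b_processing:
        steps.append("generate L0B from L0A")
        if remove_l0a:
            # Assumption: L0A files are removed only once L0B has been produced.
            steps.append("remove L0A")
    if l0c_processing:
        steps.append("generate L0C from L0B")
        if remove_l0b:
            # Assumption: L0B files are removed only once L0C has been produced.
            steps.append("remove L0B")
    return steps
```

With the default flags the plan contains the three generation steps in order and no removal.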

disdrodb.run_l0_station(data_source, campaign_name, station_name, l0a_processing: bool = True, l0b_processing: bool = True, l0c_processing: bool = True, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#

Run the L0 processing of a specific DISDRODB station from the terminal.

Parameters:

data_source (str) – The name of the institution (when the campaign data span more than one country) or of the country (when all campaigns or sensor networks are within a single country). Must be UPPER CASE.
campaign_name (str) – Campaign name. Must be UPPER CASE.
station_name (str) – Station name.
l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default value is True.
l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default value is True.
l0c_processing (bool) – Whether to launch processing to generate L0C netCDF4 file(s) from L0B data. The default value is True.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default value is False.
remove_l0b (bool) – Whether to remove the L0B files after having produced L0C netCDF files. The default is False.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default value is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default value is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process uses a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined by os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks. The default value is True.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files for each station. For L0B, it processes just the first 100 rows of 3 L0A files for each station. The default value is False.
data_archive_dir (str, optional) – The directory path where the DISDRODB Data Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the data_archive_dir path specified in the DISDRODB active configuration.

disdrodb.run_l0a(data_sources=None, campaign_names=None, station_names=None, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#

Run the L0A processing of DISDRODB stations.

Parameters:

data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default value is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default value is None.
station_names (list) – Station names to process. The default value is None.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default value is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default value is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. By default, the number of processes is defined by os.cpu_count(). If False, the files are processed sequentially in a single process. The default value is True.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files. The default value is False.
data_archive_dir (str, optional) – The directory path where the DISDRODB Data Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the data_archive_dir path specified in the DISDRODB active configuration.
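
The parallel switch described above (one worker process per CPU by default, or a sequential loop) can be sketched with the standard library. This is a stand-in written with concurrent.futures and is not the mechanism disdrodb itself uses; process_file and process_files are hypothetical names, and the per-file work is a placeholder.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def process_file(filepath):
    """Placeholder for the per-file work: return the length of the file name."""
    return len(os.path.basename(filepath))


def process_files(filepaths, parallel=True):
    """Process files in a pool of worker processes, or one by one."""
    if not parallel:
        # Sequential mode: a single process handles every file in order.
        return [process_file(fp) for fp in filepaths]
    # Parallel mode: as many worker processes as os.cpu_count() reports.
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as executor:
        return list(executor.map(process_file, filepaths))
```

On platforms that start worker processes with the spawn method, the parallel call should be made from code protected by an if __name__ == "__main__": guard.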

disdrodb.run_l0a_station(data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#: Run the L0A processing of a station by calling the disdrodb_run_l0a_station command in the terminal.

disdrodb.run_l0b(data_sources=None, campaign_names=None, station_names=None, remove_l0a: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#

Run the L0B processing of DISDRODB stations.

Parameters:

data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default value is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default value is None.
station_names (list) – Station names to process. The default value is None.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default value is False.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default value is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default value is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. By default, the number of processes is defined by os.cpu_count(). If False, the files are processed sequentially in a single process. The default value is True.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0B, it processes just the first 100 rows of 3 L0A files. The default value is False.
data_archive_dir (str, optional) – The directory path where the DISDRODB Data Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the data_archive_dir path specified in the DISDRODB active configuration.
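
The data reduction performed by debugging_mode (3 files, first 100 rows) can be pictured with plain lists. This is a minimal sketch with hypothetical helper names. Whether disdrodb sorts the files before selecting them is not stated here, so the sorting is an assumption.

```python
def select_debug_files(filepaths, debugging_mode=False, max_files=3):
    """Return all files, or only the first max_files when debugging."""
    filepaths = sorted(filepaths)  # assumption: "first" means first in sorted order
    return filepaths[:max_files] if debugging_mode else filepaths


def select_debug_rows(rows, debugging_mode=False, max_rows=100):
    """Return all rows, or only the first max_rows when debugging."""
    return rows[:max_rows] if debugging_mode else rows
```

With debugging_mode left at False both helpers return their input unchanged apart from the sorting of the file list.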

disdrodb.run_l0b_station(data_source, campaign_name, station_name, remove_l0a: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#: Run the L0B processing of a station calling disdrodb_run_l0b_station in the terminal.

disdrodb.run_l0c(data_sources=None, campaign_names=None, station_names=None, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#

Run the L0C processing of DISDRODB stations.

Parameters:

data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default value is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default value is None.
station_names (list) – Station names to process. The default value is None.
remove_l0b (bool) – Whether to remove the L0B files after having produced L0C netCDF files. The default is False.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default value is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default value is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process uses a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined by os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks. The default value is True.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0C, it processes just 3 L0B files. The default value is False.
data_archive_dir (str, optional) – The directory path where the DISDRODB Data Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the data_archive_dir path specified in the DISDRODB active configuration.

disdrodb.run_l0c_station(data_source, campaign_name, station_name, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#: Run the L0C processing of a station by calling the disdrodb_run_l0c_station command in the terminal.

disdrodb.run_l1(data_sources=None, campaign_names=None, station_names=None, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#

Run the L1 processing of DISDRODB stations.

Parameters:

data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default value is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default value is None.
station_names (list) – Station names to process. The default value is None.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default value is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default value is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process uses a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined by os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks. The default value is True.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L1, it processes just 3 L0C files. The default value is False.
data_archive_dir (str, optional) – The directory path where the DISDRODB Data Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the data_archive_dir path specified in the DISDRODB active configuration.

disdrodb.run_l1_station(data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#: Run the L1 processing of a station by calling the disdrodb_run_l1_station command in the terminal.

disdrodb.run_l2e(data_sources=None, campaign_names=None, station_names=None, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#

Run the L2E processing of DISDRODB stations.

Parameters:

data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default value is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default value is None.
station_names (list) – Station names to process. The default value is None.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default value is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default value is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process uses a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined by os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks. The default value is True.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L2E, it processes just 3 L1 files. The default value is False.
data_archive_dir (str, optional) – The directory path where the DISDRODB Data Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the data_archive_dir path specified in the DISDRODB active configuration.

disdrodb.run_l2e_station(data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#: Run the L2E processing of a station by calling the disdrodb_run_l2e_station command in the terminal.

disdrodb.run_l2m(data_sources=None, campaign_names=None, station_names=None, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#

Run the L2M processing of DISDRODB stations.

Parameters:

data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default value is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default value is None.
station_names (list) – Station names to process. The default value is None.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default value is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default value is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process uses a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined by os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks. The default value is True.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L2M, it processes just 3 L2E files. The default value is False.
data_archive_dir (str, optional) – The directory path where the DISDRODB Data Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the data_archive_dir path specified in the DISDRODB active configuration.

disdrodb.run_l2m_station(data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)[source][source]#: Run the L2M processing of a station by calling the disdrodb_run_l2m_station command in the terminal.
