disdrodb.metadata package

Submodules

disdrodb.metadata.checks module

Check metadata.

disdrodb.metadata.checks.check_metadata_archive(metadata_archive_dir: str | None = None, raise_error=False)

Check the archive metadata compliance.

Parameters:
  • metadata_archive_dir (str (optional)) – The directory path where the DISDRODB Metadata Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the metadata_archive_dir path specified in the DISDRODB active configuration.

  • raise_error (bool (optional)) – Whether to raise an error and interrupt the archive check as soon as a metadata file is found not to be compliant. The default value is False.

Returns:

If the check succeeds, the result is True, otherwise False.

Return type:

bool

disdrodb.metadata.checks.check_metadata_archive_campaign_name(metadata_archive_dir: str | None = None) -> bool

Check metadata campaign_name.

Parameters:

metadata_archive_dir (str (optional)) – The directory path where the DISDRODB Metadata Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the metadata_archive_dir path specified in the DISDRODB active configuration.

Returns:

If the check succeeds, the result is True, otherwise False.

Return type:

bool

disdrodb.metadata.checks.check_metadata_archive_data_source(metadata_archive_dir: str | None = None) -> bool

Check metadata data_source.

Parameters:

metadata_archive_dir (str (optional)) – The directory path where the DISDRODB Metadata Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the metadata_archive_dir path specified in the DISDRODB active configuration.

Returns:

If the check succeeds, the result is True, otherwise False.

Return type:

bool

disdrodb.metadata.checks.check_metadata_archive_geolocation(metadata_archive_dir: str | None = None)

Check whether the metadata files have missing or wrong geolocation.

Parameters:

metadata_archive_dir (str (optional)) – The directory path where the DISDRODB Metadata Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the metadata_archive_dir path specified in the DISDRODB active configuration.

Returns:

If the check succeeds, the result is True, otherwise False.

Return type:

bool

disdrodb.metadata.checks.check_metadata_archive_keys(metadata_archive_dir: str | None = None) -> bool

Check that all metadata files have valid keys.

Parameters:

metadata_archive_dir (str (optional)) – The directory path where the DISDRODB Metadata Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the metadata_archive_dir path specified in the DISDRODB active configuration.

Returns:

If the check succeeds, the result is True, otherwise False.

Return type:

bool

disdrodb.metadata.checks.check_metadata_archive_reader(metadata_archive_dir: str | None = None) -> bool

Check that the reader key is available and that the associated reader exists.

Parameters:

metadata_archive_dir (str (optional)) – The directory path where the DISDRODB Metadata Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the metadata_archive_dir path specified in the DISDRODB active configuration.

Returns:

If the check succeeds, the result is True, otherwise False.

Return type:

bool

disdrodb.metadata.checks.check_metadata_archive_sensor_name(metadata_archive_dir: str | None = None) -> bool

Check metadata sensor_name.

Parameters:

metadata_archive_dir (str (optional)) – The directory path where the DISDRODB Metadata Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the metadata_archive_dir path specified in the DISDRODB active configuration.

Returns:

If the check succeeds, the result is True, otherwise False.

Return type:

bool

disdrodb.metadata.checks.check_metadata_archive_station_name(metadata_archive_dir: str | None = None) -> bool

Check metadata station_name.

Parameters:

metadata_archive_dir (str (optional)) – The directory path where the DISDRODB Metadata Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the metadata_archive_dir path specified in the DISDRODB active configuration.

Returns:

If the check succeeds, the result is True, otherwise False.

Return type:

bool

disdrodb.metadata.checks.check_station_metadata(data_source, campaign_name, station_name, metadata_archive_dir=None)

Check DISDRODB metadata compliance.

disdrodb.metadata.checks.check_station_metadata_geolocation(metadata) -> None

Identify metadata with missing or wrong geolocation.
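
The exact rules applied by this check are not stated on this page. As a rough illustration only, a geolocation check on a metadata dictionary could look like the sketch below. The function name `find_geolocation_issues`, the assumption that coordinates are stored under `latitude` and `longitude` keys, and the accepted ranges are all hypothetical and not taken from the DISDRODB source.

```python
def find_geolocation_issues(metadata):
    """Return a list of human-readable problems with latitude/longitude.

    Hypothetical helper: the real DISDRODB check may apply different rules.
    """
    issues = []
    # Assumed valid ranges in decimal degrees.
    bounds = {"latitude": (-90.0, 90.0), "longitude": (-180.0, 180.0)}
    for key, (low, high) in bounds.items():
        value = metadata.get(key)
        # bool is a subclass of int, so it is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            issues.append(f"{key} is missing or not numeric: {value!r}")
        elif not low <= value <= high:
            # NaN also ends up here, because comparisons with NaN are False.
            issues.append(f"{key}={value} is outside [{low}, {high}]")
    return issues


print(find_geolocation_issues({"latitude": 46.5, "longitude": 6.6}))
print(find_geolocation_issues({"latitude": 120, "longitude": None}))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue of a station in a single pass.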

disdrodb.metadata.checks.get_metadata_invalid_keys(metadata)

Return the DISDRODB metadata keys which are not valid.

disdrodb.metadata.checks.get_metadata_missing_keys(metadata)

Return the DISDRODB metadata keys which are missing.
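
To make the distinction between invalid and missing keys concrete, here is a minimal sketch that compares a metadata dictionary against a list of valid keys. The `VALID_KEYS` list and both helper names are placeholders chosen for illustration; the real list is the one returned by `get_valid_metadata_keys()` and is not reproduced here.

```python
# Hypothetical subset of valid keys, used only for this illustration.
VALID_KEYS = ["data_source", "campaign_name", "station_name", "sensor_name"]


def missing_keys(metadata, valid_keys=VALID_KEYS):
    """Valid keys that are absent from the metadata dictionary."""
    return [key for key in valid_keys if key not in metadata]


def invalid_keys(metadata, valid_keys=VALID_KEYS):
    """Keys present in the metadata dictionary that are not valid keys."""
    return [key for key in metadata if key not in valid_keys]


example = {"data_source": "EPFL", "station_name": "10", "colour": "red"}
print(missing_keys(example))
print(invalid_keys(example))
```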

disdrodb.metadata.checks.identify_empty_metadata_keys(metadata_filepaths: list, keys: str | list) -> None

Identify empty metadata keys.

Parameters:
  • metadata_filepaths (list) – Input YAML file paths.

  • keys (Union[str,list]) – Attributes whose presence is verified.

disdrodb.metadata.checks.identify_missing_metadata_coords(metadata_filepaths: str) -> None

Identify missing coordinates.

Parameters:

metadata_filepaths (str) – Input YAML file path.

Raises:

TypeError – Error if latitude or longitude coordinates are not present or are wrongly formatted.

disdrodb.metadata.download module

Routine to download the DISDRODB Metadata Archive from GitHub.

disdrodb.metadata.download.download_metadata_archive(directory_path, force=False)

Download the DISDRODB Metadata Archive to the specified directory.

Parameters:
  • directory_path (str) – The directory path where the DISDRODB-METADATA directory will be downloaded.

  • force (bool, optional) – If True, the existing DISDRODB-METADATA directory will be removed and a new one will be downloaded. The default value is False.

Returns:

metadata_archive_dir – The DISDRODB Metadata Archive directory path.

Return type:

str

disdrodb.metadata.geolocation module

Metadata tools to verify/complete geolocation information.

disdrodb.metadata.geolocation.infer_altitude(latitude, longitude, dem='aster30m')

Infer station altitude using a Digital Elevation Model (DEM).

This function uses the OpenTopoData API to infer the altitude of a given location specified by latitude and longitude. By default, it uses the ASTER DEM at 30m resolution.

Parameters:
  • latitude (float) – The latitude of the location for which to infer the altitude.

  • longitude (float) – The longitude of the location for which to infer the altitude.

  • dem (str, optional) – The DEM to use for altitude inference. Options are “aster30m” (default), “srtm30”, and “mapzen”.

Returns:

elevation – The inferred altitude of the specified location.

Return type:

float

Raises:

ValueError – If the altitude retrieval fails.

Notes

  • The OpenTopoData API has a limit of 1000 calls per day.

  • Each request can include up to 100 locations.

  • The API allows a maximum of 1 call per second.

References

https://www.opentopodata.org/api/

disdrodb.metadata.geolocation.infer_altitudes(lats, lons, dem='aster30m')

Infer the altitudes of multiple locations using the OpenTopoData API.

Parameters:
  • lats (list or array-like) – List or array of latitude coordinates.

  • lons (list or array-like) – List or array of longitude coordinates.

  • dem (str, optional) – Digital Elevation Model (DEM) to use for altitude inference. The default DEM is “aster30m”.

Returns:

elevations – Array of inferred altitudes corresponding to the input coordinates.

Return type:

numpy.ndarray

Raises:

ValueError – If the latitude and longitude arrays do not have the same length, or if altitude retrieval fails for any block of coordinates.

Notes

  • The OpenTopoData API has a limit of 1000 calls per day.

  • Each request can include up to 100 locations.

  • The API allows a maximum of 1 call per second.

  • The API requests are made in blocks of up to 100 coordinates, with a 2-second delay between requests.
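
The batching described in the notes can be sketched without touching the network. The helpers below are illustrative only: `iter_coordinate_blocks` and `build_locations_parameter` are hypothetical names, not functions of this module. The `lat,lon|lat,lon` string follows the `locations` query format described in the OpenTopoData API documentation.

```python
def iter_coordinate_blocks(lats, lons, block_size=100):
    """Yield (lats, lons) blocks holding at most block_size coordinates."""
    lats, lons = list(lats), list(lons)
    if len(lats) != len(lons):
        raise ValueError("lats and lons must have the same length.")
    for start in range(0, len(lats), block_size):
        stop = start + block_size
        yield lats[start:stop], lons[start:stop]


def build_locations_parameter(lats, lons):
    """Format one block as 'lat,lon|lat,lon|...' for the request URL."""
    return "|".join(f"{lat},{lon}" for lat, lon in zip(lats, lons))


# 250 coordinates are split into blocks of 100, 100 and 50.
sizes = [len(block_lats) for block_lats, _ in iter_coordinate_blocks(range(250), range(250))]
print(sizes)
print(build_locations_parameter([46.5, 47.0], [6.6, 8.5]))
# A caller would send one request per block and pause (e.g. time.sleep(2))
# between requests to stay under the 1-call-per-second limit.
```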

disdrodb.metadata.info module

Test Metadata Info Extraction.

disdrodb.metadata.info.get_archive_metadata_key_value(key: str, return_tuple: bool = True, metadata_archive_dir: str | None = None)

Return the values of a metadata key for all the archive.

Parameters:
  • data_archive_dir (str) – Path to the disdrodb directory.

  • key (str) – Metadata key.

  • return_tuple (bool, optional) – If True, returns a tuple (data_source, campaign_name, station_name, key_value). If False, returns a list of the key values. The default value is True.

  • metadata_archive_dir (str (optional)) – The directory path where the DISDRODB Metadata Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the metadata_archive_dir path specified in the DISDRODB active configuration.

Returns:

List or tuple of values of the metadata key.

Return type:

list or tuple

disdrodb.metadata.manipulation module

Metadata Manipulation Tools.

disdrodb.metadata.manipulation.add_missing_metadata_keys(metadata)

Add missing keys to the metadata dictionary.

disdrodb.metadata.manipulation.remove_invalid_metadata_keys(metadata)

Remove invalid keys from the metadata dictionary.

disdrodb.metadata.manipulation.sort_metadata_dictionary(metadata)

Sort the keys of the metadata dictionary by valid_metadata_keys list order.
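
Sorting a dictionary by a reference key order can be sketched as follows. This is not the library's implementation: `sort_by_key_order` is a hypothetical name, and placing keys that are absent from the reference list at the end is an assumption made for the example.

```python
def sort_by_key_order(metadata, key_order):
    """Return a new dict whose keys follow the order given by key_order."""
    position = {key: index for index, key in enumerate(key_order)}
    # Keys not in key_order share the largest rank and, because sorted() is
    # stable, keep their original relative order at the end.
    last = len(key_order)
    ordered_items = sorted(metadata.items(), key=lambda item: position.get(item[0], last))
    return dict(ordered_items)


unsorted = {"station_name": "10", "extra": 1, "data_source": "EPFL"}
print(sort_by_key_order(unsorted, ["data_source", "campaign_name", "station_name"]))
```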

disdrodb.metadata.reader module

Routines to read the DISDRODB Metadata.

disdrodb.metadata.reader.read_metadata_archive(metadata_archive_dir=None, data_sources=None, campaign_names=None, station_names=None, available_data=False)

Read the DISDRODB Metadata Archive Database.

Parameters:
  • metadata_archive_dir (str or Path-like, optional) – Path to the root of the DISDRODB Metadata Archive. If None, the default metadata base directory is used. Default is None.

  • data_sources (str or sequence of str, optional) – One or more data source identifiers to filter stations by. If None, no filtering on data source is applied. The default is None.

  • campaign_names (str or sequence of str, optional) – One or more campaign names to filter stations by. If None, no filtering on campaign is applied. The default is None.

  • station_names (str or sequence of str, optional) – One or more station names to include. If None, all stations matching other filters are considered. The default is None.

  • available_data (bool, optional) – If True, only information on stations with data available in the online DISDRODB Decentralized Data Archive is returned. If False (the default), all stations present in the DISDRODB Metadata Archive matching the filtering criteria are returned.

Return type:

pandas.DataFrame

disdrodb.metadata.reader.read_station_metadata(data_source, campaign_name, station_name, metadata_archive_dir=None)

Open the station metadata YAML file into a dictionary.

Parameters:
  • data_source (str) – The name of the institution (for campaigns spanning multiple countries) or the name of the country (for campaigns or sensor networks within a single country). Must be provided in UPPER CASE.

  • campaign_name (str) – The name of the campaign. Must be provided in UPPER CASE.

  • station_name (str) – The name of the station.

  • metadata_archive_dir (str, optional) – The directory path where the DISDRODB Metadata Archive is located. If not specified, the path specified in the DISDRODB active configuration will be used. Expected path format: <...>/DISDRODB.

Returns:

metadata – The station metadata dictionary

Return type:

dictionary

disdrodb.metadata.search module

Routines to manipulate the DISDRODB Metadata Archive.

disdrodb.metadata.search.get_list_metadata(data_sources=None, campaign_names=None, station_names=None, product=None, available_data=False, raise_error_if_empty=False, invalid_fields_policy='raise', data_archive_dir=None, metadata_archive_dir=None, **product_kwargs)

Get station metadata filepaths.

By default, it returns the metadata file paths of stations present in the DISDRODB Metadata Archive matching the filtering criteria.

If the DISDRODB product is specified, it lists only metadata file paths of stations with the specified product present in the local DISDRODB Data Archive.

Parameters:
  • product (str or None, optional) –

    Name of the product to filter on (e.g., “RAW”, “L0A”, “L1”).

    If the DISDRODB product is not specified (default), it returns the metadata file paths of stations present in the DISDRODB Metadata Archive matching the filtering criteria.

    If the DISDRODB product is specified, it lists only metadata file paths of stations with the specified product present in the local DISDRODB Data Archive.

  • available_data (bool, optional) –

    If product is not specified:

    • if available_data is False, return metadata filepaths of stations present in the DISDRODB Metadata Archive

    • if available_data is True, return metadata filepaths of stations with data available on the online DISDRODB Decentralized Data Archive (i.e., stations with the disdrodb_data_url in the metadata).

    If product is specified:

    • if available_data is False, return metadata filepaths of stations where the product directory exists in the local DISDRODB Data Archive

    • if available_data is True, return metadata filepaths of stations where product data exists in the local DISDRODB Data Archive.

    The default is False.

  • data_sources (str or sequence of str, optional) – One or more data source identifiers to filter stations by. The name(s) must be UPPER CASE. If None, no filtering on data source is applied. The default is None.

  • campaign_names (str or sequence of str, optional) – One or more campaign names to filter stations by. The name(s) must be UPPER CASE. If None, no filtering on campaign is applied. The default is None.

  • station_names (str or sequence of str, optional) – One or more station names to include. If None, all stations matching other filters are considered. The default is None.

  • raise_error_if_empty (bool, optional) – If True and no stations satisfy the criteria, raise a ValueError. If False, return an empty list/tuple. The default is False.

  • invalid_fields_policy ({'raise', 'warn', 'ignore'}, optional) –

    How to handle invalid filter values for data_sources, campaign_names, or station_names that are not present in the metadata archive:

    • 'raise': raise a ValueError (default)

    • 'warn': emit a warning, then ignore invalid entries

    • 'ignore': silently drop invalid entries

  • data_archive_dir (str or Path-like, optional) – Path to the root of the local DISDRODB Data Archive. Format: <...>/DISDRODB. Required only if product is specified. If None, the data_archive_dir path specified in the DISDRODB active configuration file is used. The default is None.

  • metadata_archive_dir (str or Path-like, optional) – Path to the root of the DISDRODB Metadata Archive. Format: <...>/DISDRODB. If None, the metadata_archive_dir path specified in the DISDRODB active configuration is used. The default is None.

  • **product_kwargs (dict, optional) – Additional arguments required for some products. For example, for the “L2E” product, you need to specify rolling and sample_interval. For the “L2M” product, you also need to specify model_name.

Returns:

metadata_filepaths – List of metadata YAML file paths

Return type:

list
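
The three values of invalid_fields_policy can be illustrated with a small stand-alone filter. This sketch only mirrors the behaviour described in the parameter list above; `filter_requested_values` and the example data source names are hypothetical, and the library's internal implementation may differ.

```python
import warnings


def filter_requested_values(requested, available, policy="raise"):
    """Keep only the requested values that exist, handling invalid ones per policy."""
    if policy not in {"raise", "warn", "ignore"}:
        raise ValueError(f"Unknown policy: {policy!r}")
    if isinstance(requested, str):
        requested = [requested]  # accept a single name as well as a sequence
    invalid = [value for value in requested if value not in available]
    if invalid:
        message = f"Values not found in the archive: {invalid}"
        if policy == "raise":
            raise ValueError(message)
        if policy == "warn":
            warnings.warn(message, stacklevel=2)
        # policy == "ignore": invalid entries are dropped without notice
    return [value for value in requested if value in available]


available_sources = {"EPFL", "NASA"}  # placeholder names for the example
print(filter_requested_values(["EPFL", "UNKNOWN"], available_sources, policy="ignore"))
```

With the default 'raise' policy, a typo in a filter value fails immediately instead of silently yielding fewer stations than expected.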

disdrodb.metadata.standards module

Define DISDRODB Metadata Standards.

disdrodb.metadata.standards.get_valid_metadata_keys() -> list

Get the list of valid DISDRODB metadata keys.

Returns:

List of valid metadata keys

Return type:

list

disdrodb.metadata.writer module

Routines to write the DISDRODB Metadata.

disdrodb.metadata.writer.create_station_metadata(metadata_archive_dir, data_source, campaign_name, station_name)

Write a default (semi-empty) YAML metadata file for a DISDRODB station.

An error is raised if the file already exists.

Parameters:
  • data_archive_dir (str, optional) – The base directory of DISDRODB, expected in the format <...>/DISDRODB. If not specified, the path specified in the DISDRODB active configuration will be used.

  • data_source (str) – The name of the institution (for campaigns spanning multiple countries) or the name of the country (for campaigns or sensor networks within a single country). Must be provided in UPPER CASE.

  • campaign_name (str) – The name of the campaign. Must be provided in UPPER CASE.

  • station_name (str) – The name of the station.

disdrodb.metadata.writer.get_default_metadata_dict() -> dict

Get DISDRODB metadata default values.

Returns:

Dictionary of default metadata attributes.

Return type:

dict

Module contents

disdrodb.metadata.download_metadata_archive(directory_path, force=False)

Download the DISDRODB Metadata Archive to the specified directory.

Parameters:
  • directory_path (str) – The directory path where the DISDRODB-METADATA directory will be downloaded.

  • force (bool, optional) – If True, the existing DISDRODB-METADATA directory will be removed and a new one will be downloaded. The default value is False.

Returns:

metadata_archive_dir – The DISDRODB Metadata Archive directory path.

Return type:

str

disdrodb.metadata.get_archive_metadata_key_value(key: str, return_tuple: bool = True, metadata_archive_dir: str | None = None)

Return the values of a metadata key for all the archive.

Parameters:
  • data_archive_dir (str) – Path to the disdrodb directory.

  • key (str) – Metadata key.

  • return_tuple (bool, optional) – If True, returns a tuple (data_source, campaign_name, station_name, key_value). If False, returns a list of the key values. The default value is True.

  • metadata_archive_dir (str (optional)) – The directory path where the DISDRODB Metadata Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the metadata_archive_dir path specified in the DISDRODB active configuration.

Returns:

List or tuple of values of the metadata key.

Return type:

list or tuple

disdrodb.metadata.get_list_metadata(data_sources=None, campaign_names=None, station_names=None, product=None, available_data=False, raise_error_if_empty=False, invalid_fields_policy='raise', data_archive_dir=None, metadata_archive_dir=None, **product_kwargs)

Get station metadata filepaths.

By default, it returns the metadata file paths of stations present in the DISDRODB Metadata Archive matching the filtering criteria.

If the DISDRODB product is specified, it lists only metadata file paths of stations with the specified product present in the local DISDRODB Data Archive.

Parameters:
  • product (str or None, optional) –

    Name of the product to filter on (e.g., “RAW”, “L0A”, “L1”).

    If the DISDRODB product is not specified (default), it returns the metadata file paths of stations present in the DISDRODB Metadata Archive matching the filtering criteria.

    If the DISDRODB product is specified, it lists only metadata file paths of stations with the specified product present in the local DISDRODB Data Archive.

  • available_data (bool, optional) –

    If product is not specified:

    • if available_data is False, return metadata filepaths of stations present in the DISDRODB Metadata Archive

    • if available_data is True, return metadata filepaths of stations with data available on the online DISDRODB Decentralized Data Archive (i.e., stations with the disdrodb_data_url in the metadata).

    If product is specified:

    • if available_data is False, return metadata filepaths of stations where the product directory exists in the local DISDRODB Data Archive

    • if available_data is True, return metadata filepaths of stations where product data exists in the local DISDRODB Data Archive.

    The default is False.

  • data_sources (str or sequence of str, optional) – One or more data source identifiers to filter stations by. The name(s) must be UPPER CASE. If None, no filtering on data source is applied. The default is None.

  • campaign_names (str or sequence of str, optional) – One or more campaign names to filter stations by. The name(s) must be UPPER CASE. If None, no filtering on campaign is applied. The default is None.

  • station_names (str or sequence of str, optional) – One or more station names to include. If None, all stations matching other filters are considered. The default is None.

  • raise_error_if_empty (bool, optional) – If True and no stations satisfy the criteria, raise a ValueError. If False, return an empty list/tuple. The default is False.

  • invalid_fields_policy ({'raise', 'warn', 'ignore'}, optional) –

    How to handle invalid filter values for data_sources, campaign_names, or station_names that are not present in the metadata archive:

    • 'raise': raise a ValueError (default)

    • 'warn': emit a warning, then ignore invalid entries

    • 'ignore': silently drop invalid entries

  • data_archive_dir (str or Path-like, optional) – Path to the root of the local DISDRODB Data Archive. Format: <...>/DISDRODB. Required only if product is specified. If None, the data_archive_dir path specified in the DISDRODB active configuration file is used. The default is None.

  • metadata_archive_dir (str or Path-like, optional) – Path to the root of the DISDRODB Metadata Archive. Format: <...>/DISDRODB. If None, the metadata_archive_dir path specified in the DISDRODB active configuration is used. The default is None.

  • **product_kwargs (dict, optional) – Additional arguments required for some products. For example, for the “L2E” product, you need to specify rolling and sample_interval. For the “L2M” product, you also need to specify model_name.

Returns:

metadata_filepaths – List of metadata YAML file paths

Return type:

list

disdrodb.metadata.read_metadata_archive(metadata_archive_dir=None, data_sources=None, campaign_names=None, station_names=None, available_data=False)

Read the DISDRODB Metadata Archive Database.

Parameters:
  • metadata_archive_dir (str or Path-like, optional) – Path to the root of the DISDRODB Metadata Archive. If None, the default metadata base directory is used. Default is None.

  • data_sources (str or sequence of str, optional) – One or more data source identifiers to filter stations by. If None, no filtering on data source is applied. The default is None.

  • campaign_names (str or sequence of str, optional) – One or more campaign names to filter stations by. If None, no filtering on campaign is applied. The default is None.

  • station_names (str or sequence of str, optional) – One or more station names to include. If None, all stations matching other filters are considered. The default is None.

  • available_data (bool, optional) – If True, only information on stations with data available in the online DISDRODB Decentralized Data Archive is returned. If False (the default), all stations present in the DISDRODB Metadata Archive matching the filtering criteria are returned.

Return type:

pandas.DataFrame

disdrodb.metadata.read_station_metadata(data_source, campaign_name, station_name, metadata_archive_dir=None)

Open the station metadata YAML file into a dictionary.

Parameters:
  • data_source (str) – The name of the institution (for campaigns spanning multiple countries) or the name of the country (for campaigns or sensor networks within a single country). Must be provided in UPPER CASE.

  • campaign_name (str) – The name of the campaign. Must be provided in UPPER CASE.

  • station_name (str) – The name of the station.

  • metadata_archive_dir (str, optional) – The directory path where the DISDRODB Metadata Archive is located. If not specified, the path specified in the DISDRODB active configuration will be used. Expected path format: <...>/DISDRODB.

Returns:

metadata – The station metadata dictionary

Return type:

dictionary