disdrodb.data_transfer package

Submodules

disdrodb.data_transfer.download_data module

Routines to download data from the DISDRODB Decentralized Data Archive.

disdrodb.data_transfer.download_data.build_webserver_wget_command(url: str, cut_dirs: int, dst_dir: str, force: bool, verbose: bool) → list[str]

Construct the wget command list for subprocess.run.

Notes

The following wget arguments are used:
  • -q : quiet mode (no detailed progress)

  • -r : recursive

  • -np : no parent

  • -nH : no host directories

  • --timestamping : download missing files or when remote version is newer

  • --cut-dirs : strip all but the last path segment from the remote path

  • -P dst_dir : download into dst_dir

  • url
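The argument list above could be assembled roughly as follows. This is a minimal sketch, not the library's implementation: the name build_wget_command_sketch is hypothetical, and the way force and verbose toggle --timestamping and -q is an assumption.

```python
def build_wget_command_sketch(url, cut_dirs, dst_dir, force=False, verbose=False):
    """Return a wget argument list suitable for subprocess.run (illustrative only)."""
    cmd = ["wget"]
    if not verbose:
        # Assumption: quiet mode is used only when verbose output is not requested.
        cmd.append("-q")
    # Recursive download, never ascend to the parent, no host-named directory.
    cmd += ["-r", "-np", "-nH"]
    if not force:
        # Assumption: without force, only missing or newer remote files are fetched.
        cmd.append("--timestamping")
    # Strip leading remote path components and download into dst_dir.
    cmd += [f"--cut-dirs={cut_dirs}", "-P", dst_dir, url]
    return cmd
```

Returning a list rather than a single string lets subprocess.run execute the command without a shell, so URLs and paths need no manual quoting.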

disdrodb.data_transfer.download_data.check_consistent_station_name(metadata_filepath, station_name)

Check consistent station_name between YAML file name and metadata key.
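A consistency check of this kind might look like the sketch below. It assumes that station_name is the value read from the metadata and that the YAML file is named after the station; the function name and the choice of ValueError are hypothetical and may differ from what the library does.

```python
from pathlib import Path


def check_station_name_sketch(metadata_filepath, station_name):
    """Raise ValueError if the YAML file name and the station name disagree."""
    # The file name without its extension is taken as the expected station name.
    expected = Path(metadata_filepath).stem
    if station_name != expected:
        raise ValueError(
            f"Metadata file '{metadata_filepath}' is named after '{expected}', "
            f"but the station_name key is '{station_name}'."
        )
```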

disdrodb.data_transfer.download_data.click_download_archive_options(function: object)

Click command line options for DISDRODB archive download.

Parameters:

function (object) – Function.

disdrodb.data_transfer.download_data.click_download_options(function: object)

Click command line options for DISDRODB download.

Parameters:

function (object) – Function.

disdrodb.data_transfer.download_data.compute_cut_dirs(url: str) → int

Compute the wget cut_dirs value to download directly in dst_dir.

Given a URL ending with ‘/’, compute the total number of path segments. By returning len(segments), we strip away all of them—so that files within that final directory land directly in dst_dir without creating an extra subfolder.
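Counting the path segments of a URL can be sketched with the standard library as below. The function name is hypothetical and the snippet only illustrates the counting described here; it is not taken from the disdrodb source.

```python
from urllib.parse import urlparse


def count_path_segments_sketch(url):
    """Return the number of non-empty path segments in a URL."""
    path = urlparse(url).path
    # Splitting "/a/b/" on "/" yields empty strings at both ends; drop them.
    segments = [segment for segment in path.split("/") if segment]
    return len(segments)
```

For the URL https://ruisdael.citg.tudelft.nl/parsivel/PAR001_Cabauw/2021/202101/ the segments are parsivel, PAR001_Cabauw, 2021 and 202101, so the count is 4.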

disdrodb.data_transfer.download_data.download_archive(data_sources: str | list[str] | None = None, campaign_names: str | list[str] | None = None, station_names: str | list[str] | None = None, force: bool = False, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None)

Download DISDRODB stations with the disdrodb_data_url in the metadata.

Parameters:
  • data_sources (str or list of str, optional) – Data source name (e.g. EPFL). If not provided (None), all data sources will be downloaded. The default value is data_sources=None.

  • campaign_names (str or list of str, optional) – Campaign name (e.g. EPFL_ROOF_2012). If not provided (None), all campaigns will be downloaded. The default value is campaign_names=None.

  • station_names (str or list of str, optional) – Station name. If not provided (None), all stations will be downloaded. The default value is station_names=None.

  • force (bool, optional) – If True, overwrite the already existing raw data file. The default value is False.

  • data_archive_dir (str, optional) – DISDRODB Data Archive directory. Format: <...>/DISDRODB. If None (the default), the disdrodb config variable data_archive_dir is used.

disdrodb.data_transfer.download_data.download_station(data_source: str, campaign_name: str, station_name: str, force: bool = False, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None) → None

Download data of a single DISDRODB station from the DISDRODB remote repository.

Parameters:
  • data_source (str) – The name of the institution (for campaigns spanning multiple countries) or the name of the country (for campaigns or sensor networks within a single country). Must be provided in UPPER CASE.

  • campaign_name (str) – The name of the campaign. Must be provided in UPPER CASE.

  • station_name (str) – The name of the station.

  • data_archive_dir (str, optional) – DISDRODB Data Archive directory, expected in the format <...>/DISDRODB. If None (the default), the data_archive_dir path specified in the active DISDRODB configuration is used.

  • force (bool, optional) – If True, overwrite the already existing raw data file. The default value is False.

disdrodb.data_transfer.download_data.download_station_data(metadata_filepath: str, data_archive_dir: str, force: bool = False) → None

Download and unzip the station data.

Parameters:
  • metadata_filepath (str) – Metadata file path.

  • data_archive_dir (str (optional)) – DISDRODB Data Archive directory. Format: <...>/DISDRODB. If None (the default), the disdrodb config variable data_archive_dir is used.

  • force (bool, optional) – If True, delete existing files and redownload them. The default value is False.

disdrodb.data_transfer.download_data.download_web_server_data(url: str, dst_dir: str, force=True, verbose=True) → None

Download data from a web server via HTTP or HTTPS.

Use the system’s wget command to recursively download all files and subdirectories under the given HTTP(S) “directory” URL. Works on both Windows and Linux, provided that wget is installed and on the PATH. The function proceeds as follows:

  1. Ensure wget is available.

  2. Normalize URL to end with ‘/’.

  3. Compute cut-dirs so that only the last segment of the path remains locally.

  4. Build and run the wget command.

Example:

download_web_server_data("https://ruisdael.citg.tudelft.nl/parsivel/PAR001_Cabauw/2021/202101/", dst_dir)
# → Creates a local folder “202101/” with all files and subfolders.

disdrodb.data_transfer.download_data.download_zenodo_zip_file(url, dst_dir, force)

Download a zip file from Zenodo and extract the station raw data.

disdrodb.data_transfer.download_data.ensure_trailing_slash(url: str) → str

Return url guaranteed to end with a slash.

disdrodb.data_transfer.download_data.ensure_wget_available() → None

Raise FileNotFoundError if ‘wget’ is not on the system PATH.
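The two small helpers documented above could be written along these lines. This is a sketch using only the standard library: the function names and the executable parameter are hypothetical, and the actual implementations may differ.

```python
import shutil


def ensure_trailing_slash_sketch(url):
    """Return url unchanged if it ends with '/', otherwise append one."""
    return url if url.endswith("/") else url + "/"


def ensure_executable_available_sketch(executable="wget"):
    """Raise FileNotFoundError if the executable is not found on the PATH."""
    # shutil.which returns None when no matching executable is found.
    if shutil.which(executable) is None:
        raise FileNotFoundError(
            f"'{executable}' was not found on the system PATH."
        )
```

Checking for the executable up front gives a clear error message before any download is attempted, instead of a failure inside subprocess.run.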

disdrodb.data_transfer.upload_data module

Routines to upload data to the DISDRODB Decentralized Data Archive.

disdrodb.data_transfer.upload_data.click_upload_archive_options(function: object)

Click command line options for DISDRODB archive upload.

Parameters:

function (object) – Function.

disdrodb.data_transfer.upload_data.click_upload_options(function: object)

Click command arguments for DISDRODB data upload.

disdrodb.data_transfer.upload_data.upload_archive(platform: str | None = None, force: bool = False, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None, **fields_kwargs) → None

Find all stations containing local data and upload them to a remote repository.

Parameters:
  • platform (str, optional) – Name of the remote platform. The default platform is "sandbox.zenodo" (for testing purposes). Switch to "zenodo" for final data dissemination.

  • force (bool, optional) – If True, upload even if data already exists on another remote location. The default value is force=False.

  • data_archive_dir (str, optional) – The directory path where the DISDRODB Data Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the data_archive_dir path specified in the DISDRODB active configuration.

  • data_sources (str or list of str, optional) – Data source name (e.g. EPFL). If not provided (None), all data sources will be uploaded. The default value is data_sources=None.

  • campaign_names (str or list of str, optional) – Campaign name (e.g. EPFL_ROOF_2012). If not provided (None), all campaigns will be uploaded. The default value is campaign_names=None.

  • station_names (str or list of str, optional) – Station name. If not provided (None), all stations will be uploaded. The default value is station_names=None.

disdrodb.data_transfer.upload_data.upload_station(data_source: str, campaign_name: str, station_name: str, platform: str | None = 'sandbox.zenodo', force: bool = False, data_archive_dir: str | None = None, metadata_archive_dir: str | None = None) → None

Upload data from a single DISDRODB station to a remote repository.

This function also automatically updates the disdrodb_data_url in the metadata file.

Parameters:
  • data_source (str) – The name of the institution (for campaigns spanning multiple countries) or the name of the country (for campaigns or sensor networks within a single country). Must be provided in UPPER CASE.

  • campaign_name (str) – The name of the campaign. Must be provided in UPPER CASE.

  • station_name (str) – The name of the station.

  • data_archive_dir (str (optional)) – The directory path where the DISDRODB Data Archive is located. The directory path must end with <...>/DISDRODB. If None, it uses the data_archive_dir path specified in the DISDRODB active configuration.

  • platform (str, optional) – Name of the remote data storage platform. The default platform is "sandbox.zenodo" (for testing purposes). Switch to "zenodo" for final data dissemination.

  • force (bool, optional) – If True, upload the data and overwrite the disdrodb_data_url. The default value is force=False.

disdrodb.data_transfer.zenodo module

DISDRODB Zenodo utility.

disdrodb.data_transfer.zenodo.upload_station_to_zenodo(metadata_filepath: str, station_zip_filepath: str, sandbox: bool = True) → str

Zip station data, upload data to Zenodo and update the metadata disdrodb_data_url.

Parameters:
  • metadata_filepath (str) – Path to the station metadata file.

  • station_zip_filepath (str) – Path to the zip file containing the station data.

  • sandbox (bool) – If True, upload to Zenodo Sandbox (for testing purposes). If False, upload to Zenodo.

Module contents

Routines to download and upload data to the DISDRODB Decentralized Data Archive.