Quick Start#

This quick start guide demonstrates how to download raw disdrometer data from the DISDRODB Decentralized Data Archive and generate DISDRODB products on your local machine.

Prerequisites:

Virtual environment with DISDRODB installed (see Installation)
DISDRODB Metadata Archive cloned locally
Sufficient disk space for raw data and products

What You’ll Learn:

How to configure DISDRODB directories
How to download station data from the archive
How to generate DISDRODB L0, L1, and L2E products
How to open and analyze products with xarray
How to explore available stations and filter by criteria

1. Download the DISDRODB Metadata Archive#

The DISDRODB Metadata Archive contains station information, data sources, and pointers to raw data in the Decentralized Data Archive.

Navigate to the directory where you want to store the DISDRODB Metadata Archive:

cd /path/to/directory/where/to/store/the/metadata/archive

Clone the DISDRODB Metadata Archive repository:

git clone https://github.com/ltelab/DISDRODB-METADATA.git

This creates a DISDRODB-METADATA directory.

Note

Git Required: Ensure git is installed on your system.

Ubuntu/Debian: sudo apt-get install git
macOS: brew install git or install Xcode Command Line Tools
Windows: Download from https://git-scm.com/

Note

The DISDRODB Metadata Archive is regularly updated with new stations and metadata. We recommend updating your local copy periodically by running git pull inside the DISDRODB-METADATA directory.

2. Configure DISDRODB Directories#

DISDRODB requires two main directory paths to operate:

Metadata Archive Directory (metadata_archive_dir): Path to the DISDRODB subdirectory within your cloned DISDRODB-METADATA repository (contains station information and metadata)
Data Archive Directory (data_archive_dir): Path where DISDRODB will store downloaded raw data and all processed products (L0, L1, L2)

DISDRODB will search for a configuration file named ~/.config_disdrodb.yml in your home directory.

Create Configuration File

To create the DISDRODB configuration file, adapt and run the following Python code snippet. See disdrodb.define_configs() for more details. Note that paths must end with \DISDRODB on Windows or /DISDRODB on macOS/Linux.

import disdrodb

metadata_archive_dir = "<path_to>/DISDRODB-METADATA/DISDRODB"
data_archive_dir = "<path_of_choice_to_the_local_data_archive>/DISDRODB"
disdrodb.define_configs(metadata_archive_dir=metadata_archive_dir, data_archive_dir=data_archive_dir)

This creates ~/.config_disdrodb.yml in your home directory, which DISDRODB uses as the default configuration.

Verify Configuration

Restart your Python session and verify the configuration:

import disdrodb

print("DISDRODB Metadata Archive Directory: ", disdrodb.get_metadata_archive_dir())
print("DISDRODB Data Archive Directory: ", disdrodb.get_data_archive_dir())

You can also verify and print the default DISDRODB Metadata Archive and Data Archive directories using the following terminal commands:

disdrodb_data_archive_directory
disdrodb_metadata_archive_directory

See disdrodb.get_metadata_archive_dir() and disdrodb.get_data_archive_dir() for API details.

Although not recommended for beginner users, you can also define the DISDRODB Data and Metadata Archive directories using environment variables. Set the DISDRODB_DATA_ARCHIVE_DIR and DISDRODB_METADATA_ARCHIVE_DIR variables either directly in your terminal or by adding them to your .bashrc (or equivalent shell configuration) file.

To set them in the terminal:

export DISDRODB_DATA_ARCHIVE_DIR="<path_of_choice_to_the_local_data_archive>/DISDRODB"
export DISDRODB_METADATA_ARCHIVE_DIR="<path_to>/DISDRODB-METADATA/DISDRODB"

Note

Environment variables DISDRODB_DATA_ARCHIVE_DIR and DISDRODB_METADATA_ARCHIVE_DIR, if defined, take priority over the paths specified in the .config_disdrodb.yml file.

Optional: Configure T-Matrix Scattering Tables (Advanced)

If you installed pyTmatrix to simulate radar variables in DISDRODB L2 products, you can specify a custom directory for storing T-matrix scattering lookup tables (these tables can be large and are reused across processing runs):

import disdrodb

scattering_table_dir = (
    "<path_of_choice_to_the_local_scattering_table_dir>/"  # Created automatically if it doesn't exist
)
disdrodb.define_configs(scattering_table_dir=scattering_table_dir)

3. Download Raw Disdrometer Data#

The DISDRODB Decentralized Data Archive stores raw disdrometer data contributed by the community. A growing number of stations are currently available for download, with new stations being added regularly.

Check Available Stations

Use disdrodb.available_stations() to list stations with downloadable data:

import disdrodb

disdrodb.available_stations(available_data=True)

By periodically updating the DISDRODB Metadata Archive (git pull), you can access newly available stations.

Download Station Data

To download raw data for specific stations:

disdrodb_download_archive --data_sources <data_source> --campaign_names <campaign_name> --station_names <station_name> --force False

The data_sources, campaign_names, and station_names parameters are optional and allow you to restrict the download to specific data sources, campaigns, and/or stations.

Command Parameters:

--data_sources (optional): Filter by data source (e.g., institution or country name)
--campaign_names (optional): Filter by campaign name (measurement campaign or network)
--station_names (optional): Filter by station name
--force (optional, default = False): Overwrite existing files if set to True

To download data from multiple data sources, campaigns, or stations, provide a space-separated string.

For example:

To download all EPFL and NASA data: --data_sources "EPFL NASA",
To download stations from specific campaigns: --campaign_names "HYMEX_LTE_SOP3 HYMEX_LTE_SOP4",
To download specific stations: --station_names "station_name1 station_name2".

Quick Start Example

For this tutorial, we’ll download a single station from the EPFL data source:

disdrodb_download_station EPFL HYMEX_LTE_SOP3 10

This command downloads data for:

Data Source: EPFL (École Polytechnique Fédérale de Lausanne)
Campaign: HYMEX_LTE_SOP3 (HyMeX Long-Term Experiment Special Observation Period 3)
Station: 10 (station identifier)

See disdrodb.download_station() for Python API.

4. Generate DISDRODB Products#

Once you have downloaded the raw data, you can generate standardized DISDRODB products.

Understanding the Processing Chain

DISDRODB processes raw disdrometer data through several stages:

L0 Processing: Converts raw data into standardized NetCDF format with quality control. Each day’s data is saved as a separate NetCDF file with CF-compliant metadata.
L1 Processing: Ingests L0C files and aggregates data at user-defined temporal resolutions (e.g., 1-minute, 5-minute, 10-minute). Performs quality checks, data homogenization, and applies a hydrometeor classification algorithm to differentiate between rain, snow, mixed precipitation, and non-hydrometeor particles.
L2 Processing: Generates advanced products from L1 data, including DSD moments, microphysical parameters (rain rate, liquid water content), and simulated radar variables.

Processing Chain Overview

Raw Data → L0A → L0B → L0C → L1 → L2E → L2M

L0: Standardized format with quality control (see L0A, L0B, L0C)
L1: Temporally resampled with hydrometeor classification (see L1)
L2E: Empirical rainfall parameters and radar observables (see L2E)
L2M: Modeled DSD parameters from parametric fitting (see L2M)

For detailed information about DISDRODB products and processing customization, see the DISDRODB Products and Products Configuration sections.

Quick Test Run (Debugging Mode)

To quickly test the processing chain on a small sample, use debugging mode. This processes only 3 raw files with verbose output:

disdrodb_run_l0_station EPFL HYMEX_LTE_SOP3 10 --debugging_mode True --parallel False --verbose True
disdrodb_run_l1_station EPFL HYMEX_LTE_SOP3 10 --debugging_mode True --parallel False --verbose True

This is useful for testing before running the full processing on all station data.

Process All Station Data

To process all available data for the station (recommended for actual data analysis):

disdrodb_run_l0_station EPFL HYMEX_LTE_SOP3 10 --parallel True --force True
disdrodb_run_l1_station EPFL HYMEX_LTE_SOP3 10 --parallel True --force True
disdrodb_run_l2e_station EPFL HYMEX_LTE_SOP3 10 --parallel True --force True

See disdrodb.run_l0_station(), disdrodb.run_l1_station(), and disdrodb.run_l2e_station() for Python API.

Create All Products with a Single Command

Generate all DISDRODB products (L0 → L1 → L2E) in one command:

disdrodb_run_station EPFL HYMEX_LTE_SOP3 10 --parallel True --force True

See disdrodb.run_station() for Python API.

Monitor Processing

While processing runs, you can monitor progress:

Dask Dashboard: View real-time parallel processing status at http://localhost:8787/status
Processing Logs: Check detailed logs for troubleshooting:
```
disdrodb_open_logs_directory EPFL HYMEX_LTE_SOP3 10
```

Common Command Options

--force True: Overwrite existing product files (useful when reprocessing)
--parallel True: Enable parallel processing across multiple CPU cores (default: True)
--verbose True: Print detailed processing information to console
--debugging_mode True: Process only 3 files for quick testing (default: False)

For complete processing options and batch processing, see Archive Processing.

5. Create Summary Figures and Tables#

After generating products, DISDRODB can automatically create summary visualizations and statistics for your station. These summaries provide a quick overview of data availability, quality, and key measurements.

disdrodb_create_summary_station EPFL HYMEX_LTE_SOP3 10

To view the generated summaries, open the summary directory:

disdrodb_open_product_directory SUMMARY HYMEX_LTE_SOP3 10

6. Open and Analyze DISDRODB L0 Products#

DISDRODB provides convenient functions to open and analyze processed products. Use disdrodb.open_dataset() to lazily load all station files for a product as an xarray.Dataset (or pandas.DataFrame for L0A products).

Open All Station Files

This approach opens all files for a station at once using lazy loading (data is only read from disk when needed):

import disdrodb

# Define station arguments
data_source = "EPFL"
campaign_name = "HYMEX_LTE_SOP3"
station_name = "10"
product = "L0C"

# Open all station files of a given DISDRODB product
ds = disdrodb.open_dataset(
    product=product,
    # Station arguments
    data_source=data_source,
    campaign_name=campaign_name,
    station_name=station_name,
)
ds

List and Open Individual Files

Alternatively, you can list all product files and open them individually. This is useful when you want to process files one at a time or inspect specific dates:

import disdrodb
import xarray as xr

# Define station arguments
data_source = "EPFL"
campaign_name = "HYMEX_LTE_SOP3"
station_name = "10"
product = "L0C"

# List all files
filepaths = disdrodb.find_files(
    product=product,
    data_source=data_source,
    campaign_name=campaign_name,
    station_name=station_name,
)
# Open a single file
ds = xr.open_dataset(filepaths[0])
ds

7. Open and Analyze the DISDRODB L1 Product#

The L1 product is the recommended product for most analyses, as it provides quality-controlled, temporally aggregated data with hydrometeor classification.

When opening L1 products, you must specify the temporal resolution (e.g., 1MIN, 5MIN, 10MIN).

Open L1 Dataset

import disdrodb

# Define station arguments
data_source = "EPFL"
campaign_name = "HYMEX_LTE_SOP3"
station_name = "10"
product = "L1"
temporal_resolution = "1MIN"

# Open all station files of the DISDRODB L1 product
ds = disdrodb.open_dataset(
    product=product,
    # Station arguments
    data_source=data_source,
    campaign_name=campaign_name,
    station_name=station_name,
    # Product options
    temporal_resolution=temporal_resolution,
)
ds = ds.compute()

Understanding Hydrometeor Classification

The L1 product includes classification variables to identify different precipitation types. This allows you to filter data by hydrometeor type (rain, snow, graupel, hail) and analysis quality (valid measurements vs. artifacts).

For details on the classification methodology, see L1 Product.

Filter Data by Precipitation Type

Here’s how to subset and analyze the dataset by precipitation type:

ds["precipitation_type"]
print(ds["precipitation_type"].attrs)

ds["hydrometeor_type"]
print(ds["hydrometeor_type"].attrs)

# Select timesteps with rain
ds_rain = ds.isel(time=(ds["precipitation_type"] == 0))

# Sum over time and plot the spectrum
ds_rain.disdrodb.plot_spectrum()

# Plot raw spectrum of the timestep with more drops
ds_rain.isel(time=ds_rain["n_particles"].argmax().item()).disdrodb.plot_spectrum()

# Select timesteps with likely graupel
ds_graupel = ds.isel(time=(ds["hydrometeor_type"] == 8))
ds_graupel.disdrodb.plot_spectrum()

# Select timesteps with large hail
ds_hail = ds.isel(time=(ds["flag_hail"] == 2))
ds_hail.disdrodb.plot_spectrum()

8. Open and Analyze the DISDRODB L2E Product#

The L2E product provides derived microphysical parameters and simulated radar variables, making it ideal for rainfall analysis and radar intercomparison studies.

Open L2E Dataset

import disdrodb

# Define station arguments
data_source = "EPFL"
campaign_name = "HYMEX_LTE_SOP3"
station_name = "10"
product = "L2E"
temporal_resolution = "1MIN"

# Open all station files of the DISDRODB L2E product
ds = disdrodb.open_dataset(
    product=product,
    # Station arguments
    data_source=data_source,
    campaign_name=campaign_name,
    station_name=station_name,
    # Product options
    temporal_resolution=temporal_resolution,
)

L2E Product Contents

The L2E product focuses on rainfall observations and provides:

Drop size distribution (DSD) spectra and concentration
DSD moments
Microphysical parameters (rain rate, liquid water content, etc.)
Simulated polarimetric radar variables at multiple frequencies
Quality control flags

For details, see L2E Product and Radar Variable Simulations.

Example Analysis

# Compute and load dataset
ds = ds.compute()

# Analyze rain rate time series
ds["R"].plot()

# Check radar reflectivity at C-band
ds["ZH"].sel(frequency=5.6, method="nearest").plot()

# Analyze DSD moments
ds[["M3", "M4", "M6"]].to_dataframe().describe()

9. Explore Available Stations#

DISDRODB provides powerful filtering capabilities to explore the station catalog. Use disdrodb.available_stations() to discover stations that match your criteria.

List All Known Stations

By default, this function returns all stations registered in the DISDRODB Metadata Archive, regardless of whether their raw data are currently available for download.

To see only stations with downloadable data, use available_data=True. Note that some contributors have registered their stations but not yet made their data publicly available.

import disdrodb

disdrodb.available_stations()  # available_data=False by default
disdrodb.available_stations(available_data=True)

Filter by Sensor and Measurement Interval

import disdrodb

disdrodb.available_stations(sensor_name="PWS100", available_data=True)
disdrodb.available_stations(sensor_name="LPM", available_data=True)
disdrodb.available_stations(sensor_name="PARSIVEL", measurement_interval=10, available_data=True)
disdrodb.available_stations(sensor_name=["PARSIVEL", "PARSIVEL2"], measurement_interval=60, available_data=True)
disdrodb.available_stations(sensor_name=["RD80"], measurement_interval=10, available_data=True)

Filter by Station Identifiers

You can filter stations by data source, campaign name, or station name. Multiple filters can be combined for precise selection:

import disdrodb

disdrodb.available_stations(data_sources=["ITALY", "EPFL"])
disdrodb.available_stations(campaign_names="RELAMPAGO")
disdrodb.available_stations(station_names=["TC-TO", "TC-AQ"], measurement_interval=60)

List Processed Stations

After generating products locally, list available processed stations with:

import disdrodb

disdrodb.available_stations(product="L1", temporal_resolution="1MIN")
disdrodb.available_stations(sensor_name="PARSIVEL2", product="L1", temporal_resolution="1MIN")
disdrodb.available_stations(sensor_name="PARSIVEL2", product="L2E", temporal_resolution="1MIN")

Note

The temporal_resolution argument is required when listing L1 and L2 products, as these products can exist at multiple temporal resolutions.

10. What’s Next?#

Congratulations! You’ve completed the DISDRODB quick start tutorial. Here are some next steps to deepen your knowledge and make the most of DISDRODB:

Learn More About Products

Products: Detailed descriptions of each DISDRODB product level
Products Configuration: Customize processing parameters, archive strategies, and model fitting
Radar Variable Simulations: Configure multi-frequency radar simulations

Process Your Data

Archive Processing: Batch process multiple stations and campaigns
Near-Real-Time Processing: Process individual files for operational applications
Multi-Frequency Radar Tutorial: Compute radar variables across frequency ranges

Contribute to DISDRODB

Contribute Data: Add your disdrometer stations to the Decentralized Data Archive
Contribute Code: Develop new readers, improve processing, or add features

Advanced Workflows

L2M Products: Fit parametric DSD models (gamma, exponential, etc.) using grid search, maximum likelihood, or method of moments (see L2M Product and disdrodb.run_l2m_station())
Custom Processing: Use the Python API for fine-grained control over individual processing steps (see disdrodb.generate_l0a(), disdrodb.generate_l1(), disdrodb.generate_l2e(), disdrodb.generate_l2m())
Data Analysis: Leverage DISDRODB’s xarray accessor methods for specialized visualizations, event detection, and DSD computations

API Reference

disdrodb.open_dataset(): Open products as xarray datasets
disdrodb.find_files(): List available product files
disdrodb.available_stations(): Explore station catalog
disdrodb.download_station(): Download raw data programmatically
disdrodb.run_station(): Process complete chain for single station

Get Help

GitHub Issues: Report bugs or request features
Discussions: Ask questions and share ideas
Documentation: Comprehensive guides and API reference

Stay Updated

Star the DISDRODB repository to follow development
Update your Metadata Archive regularly: cd DISDRODB-METADATA && git pull
Join the community to stay informed about new stations and features

Important: Data Citation

Warning

When using DISDRODB data in your research, you must properly cite and acknowledge each station’s data source. This is essential for recognizing data contributors’ efforts and maintaining the open science ecosystem.

Citation information, DOIs, and recommended references are available in:

DISDRODB NetCDF/xarray.Dataset global attributes
DISDRODB Metadata YAML file of each station