disdrodb.retrievals package

Submodules

disdrodb.retrievals.lut module

Routines for 1D and 2D lookup tables (LUTs).

class disdrodb.retrievals.lut.NearestNeighbourLUT1D(df, x, columns=None, dtype=numpy.float32)

Bases: object

A 1D nearest-neighbor lookup table backed by a k-d tree.

This class builds a k-d tree from 1D points and their associated values, enabling fast nearest-neighbor queries for interpolation or lookup purposes.

Parameters:
  • df (pandas.DataFrame) – Input DataFrame containing the coordinate and values.

  • x (str) – Name of the column representing the x-coordinate.

  • columns (list of str, optional) – List of column names to include in the lookup table values. If None, all columns are included. Default is None.

  • dtype (numpy.dtype, optional) – Data type for storing points and values. Default is np.float32.

Notes

Rows with NaNs in the selected value columns are discarded before constructing the lookup table.
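
The construction described above can be sketched with scipy.spatial.cKDTree directly. This is only an illustration under stated assumptions, not the class's actual implementation: the arrays x_table and values are hypothetical stand-ins for the coordinate column and a single value column of df.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical table: one coordinate column and one value column.
x_table = np.array([0.0, 1.0, 2.0, 3.0], dtype=np.float32)
values = np.array([10.0, np.nan, 30.0, 40.0], dtype=np.float32)

# Discard rows whose value is NaN before building the tree.
valid = ~np.isnan(values)
x_table, values = x_table[valid], values[valid]

# cKDTree expects points of shape (n, k); a 1D table uses k = 1.
tree = cKDTree(x_table.reshape(-1, 1))

# Find the nearest remaining table entry for each query coordinate.
x_query = np.array([0.2, 1.1, 2.9], dtype=np.float32)
distance, index = tree.query(x_query.reshape(-1, 1), k=1)
nearest_values = values[index]
```

Because the row at x = 1.0 is dropped, the query at 1.1 resolves to the entry at x = 2.0 rather than to the discarded row.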

predict(x, return_distance=False, max_distance=None)

Query the lookup table for nearest neighbor values.

Parameters:
  • x (array-like) – x-coordinates of query points. Can be scalar or array.

  • return_distance (bool, optional) – If True, include the distance to the nearest neighbor in the output. Default is False.

  • max_distance (float, tuple of float, or None, optional) –

    Maximum distance threshold for valid predictions.

    • If None: no distance masking is applied.

    • If float: points with distance > max_distance are set to NaN.

    • If tuple (dx,): points are masked if |x - x_nearest| > dx.

    Default is None.

Returns:

DataFrame containing the nearest neighbor values for each query point. Columns include x coordinates, the lookup values, and optionally the distance to the nearest neighbor. Values beyond max_distance are NaN.

Return type:

pandas.DataFrame
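
The effect of a float max_distance can be illustrated with plain NumPy. This is a minimal sketch, assuming a brute-force nearest-neighbor search as a stand-in for the k-d tree query; the array names are hypothetical and the class itself returns a DataFrame rather than an array.

```python
import numpy as np

# Hypothetical lookup table and query points.
x_table = np.array([0.0, 1.0, 2.0])
values = np.array([10.0, 20.0, 30.0])
x_query = np.array([0.1, 1.4, 5.0])

# Brute-force nearest neighbor (stand-in for the k-d tree query).
index = np.abs(x_query[:, None] - x_table[None, :]).argmin(axis=1)
distance = np.abs(x_query - x_table[index])

# Predictions farther than max_distance from their neighbor become NaN.
max_distance = 0.5
result = np.where(distance > max_distance, np.nan, values[index])
```

Here the query at 5.0 lies 3.0 away from its nearest table entry, so its prediction is masked, while the other two queries keep their looked-up values.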

predict_dict(x)

Query the lookup table and return result as a dictionary.

classmethod read_lut(filename)

Load a lookup table from a pickle file.

save_lut(filename)

Save the lookup table to a file using pickle.

class disdrodb.retrievals.lut.NearestNeighbourLUT2D(df, x, y, columns=None, dtype=numpy.float32)

Bases: object

A 2D nearest-neighbor lookup table backed by a k-d tree.

This class builds a k-d tree from 2D points and their associated values, enabling fast nearest-neighbor queries for interpolation or lookup purposes.

Parameters:
  • df (pandas.DataFrame) – Input DataFrame containing the coordinates and values.

  • x (str) – Name of the column representing the x-coordinate.

  • y (str) – Name of the column representing the y-coordinate.

  • columns (list of str, optional) – List of column names to include in the lookup table values. If None, all columns are included. Default is None.

  • dtype (numpy.dtype, optional) – Data type for storing points and values. Default is np.float32.

Attributes:
  • x (str) – Name of the x-coordinate column.

  • y (str) – Name of the y-coordinate column.

  • columns (list of str) – Column names for the lookup values.

  • dtype (numpy.dtype) – Data type used for storage.

  • points (numpy.ndarray) – Array of shape (n, 2) containing the coordinates.

  • values (numpy.ndarray) – Array of shape (n, len(columns)) containing the lookup values.

  • tree (scipy.spatial.cKDTree) – k-d tree for fast nearest neighbor queries.

Raises:
  • ValueError – If x or y are not columns in the DataFrame.

  • ValueError – If any of the specified columns are not found in the DataFrame.

Notes

Rows with NaNs in the selected value columns are discarded before constructing the lookup table.

Examples

>>> import pandas as pd
>>> from disdrodb.retrievals import NearestNeighbourLUT2D
>>> df = pd.DataFrame({'x': [0, 1, 2], 'y': [0, 1, 2], 'value': [10, 20, 30]})
>>> lut = NearestNeighbourLUT2D(df, 'x', 'y', columns=['value'])
>>> lut.predict([0.5], [0.5])

predict(x, y, return_distance=False, max_distance=None)

Query the lookup table for nearest neighbor values.

Parameters:
  • x (array-like) – x-coordinates of query points. Can be scalar or array.

  • y (array-like) – y-coordinates of query points. Can be scalar or array.

  • return_distance (bool, optional) – If True, include the distance to the nearest neighbor in the output. Default is False.

  • max_distance (float, tuple of float, or None, optional) –

    Maximum distance threshold for valid predictions.

    • If None: no distance masking is applied.

    • If float: points with Euclidean distance > max_distance are set to NaN.

    • If tuple (dx, dy): points are masked if |x - x_nearest| > dx OR |y - y_nearest| > dy.

    Default is None.

Returns:

DataFrame containing the nearest neighbor values for each query point. Columns include x, y coordinates, the lookup values, and optionally the distance to the nearest neighbor. Values beyond max_distance are NaN.

Return type:

pandas.DataFrame

Raises:

ValueError – If x and y do not have the same shape.

Notes

The method automatically converts scalar inputs to 1D arrays. Input values that are NaN or inf will produce NaN output rows.
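
The two max_distance modes differ in shape: a float defines a circle around the nearest neighbor, while a tuple (dx, dy) defines a box. The sketch below illustrates the masking rules as documented above; distance_mask is a hypothetical helper, not part of disdrodb, and the nearest array is assumed to have been obtained from a prior nearest-neighbor query.

```python
import numpy as np

def distance_mask(query, nearest, max_distance):
    """Return True where a prediction would be set to NaN (hypothetical helper)."""
    delta = np.abs(query - nearest)  # per-axis offsets, shape (n, 2)
    if max_distance is None:
        # No distance masking is applied.
        return np.zeros(len(query), dtype=bool)
    if isinstance(max_distance, tuple):
        # Box criterion: masked if either axis exceeds its own threshold.
        dx, dy = max_distance
        return (delta[:, 0] > dx) | (delta[:, 1] > dy)
    # Float criterion: masked if the Euclidean distance exceeds the threshold.
    return np.hypot(delta[:, 0], delta[:, 1]) > max_distance

# Hypothetical query points and their already-found nearest table points.
query = np.array([[0.4, 0.4], [1.0, 1.9], [2.0, 2.0]])
nearest = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])

mask_none = distance_mask(query, nearest, None)
mask_euclid = distance_mask(query, nearest, 0.5)
mask_box = distance_mask(query, nearest, (0.5, 0.5))
```

The first query point is about 0.57 from its neighbor, so it is masked by the float threshold 0.5 but kept by the tuple (0.5, 0.5), since neither per-axis offset of 0.4 exceeds 0.5.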

predict_dict(x, y)

Query the lookup table and return result as a dictionary.

Parameters:
  • x (scalar) – x-coordinate of the query point.

  • y (scalar) – y-coordinate of the query point.

Returns:

Dictionary with column names as keys and nearest neighbor values as values, including x and y coordinates.

Return type:

dict

Raises:

ValueError – If x or y are not scalars.

Notes

This is a convenience method for single-point queries returning a dict instead of a DataFrame.
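
The shape of the returned dictionary can be sketched as follows. This is an illustration only: row_to_dict is a hypothetical helper, the key names "x" and "y" are assumed to match the coordinate column names, and the scalar check via numpy.ndim is one possible way to enforce the documented ValueError.

```python
import numpy as np

def row_to_dict(x, y, columns, row_values):
    """Build a single-point result dict (hypothetical helper)."""
    # Reject lists and arrays: only scalar coordinates are accepted.
    if np.ndim(x) != 0 or np.ndim(y) != 0:
        raise ValueError("x and y must be scalars.")
    # Coordinates first, then one entry per lookup column.
    result = {"x": x, "y": y}
    result.update(zip(columns, row_values))
    return result

single = row_to_dict(0.4, 0.4, ["value"], [10.0])
```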

classmethod read_lut(filename)

Load a lookup table from a pickle file.

Parameters:

filename (str) – Path to the file containing the saved lookup table.

Returns:

The loaded lookup table instance.

Return type:

NearestNeighbourLUT2D

save_lut(filename)

Save the lookup table to a file using pickle.

Parameters:

filename (str) – Path to the file where the lookup table will be saved.

Notes

Uses pickle with protocol=pickle.HIGHEST_PROTOCOL for efficient serialization.
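
A save and load round trip with the standard library pickle module looks roughly like this. It is a sketch under stated assumptions: the module-level functions save_lut and read_lut and the dictionary standing in for the lookup table are hypothetical, whereas the real methods operate on the class instance.

```python
import os
import pickle
import tempfile

def save_lut(lut, filename):
    """Serialize an object with the highest available pickle protocol."""
    with open(filename, "wb") as f:
        pickle.dump(lut, f, protocol=pickle.HIGHEST_PROTOCOL)

def read_lut(filename):
    """Load a previously pickled object."""
    with open(filename, "rb") as f:
        return pickle.load(f)

# Hypothetical stand-in for the lookup table state.
lut = {"points": [[0.0, 0.0], [1.0, 1.0]], "values": [10.0, 20.0]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lut.pkl")
    save_lut(lut, path)
    loaded = read_lut(path)
```

Files written with HIGHEST_PROTOCOL may not be readable by older Python versions that lack that protocol, which matters when lookup tables are shared between environments.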

Module contents

DISDRODB module for DSD retrievals.

class disdrodb.retrievals.NearestNeighbourLUT1D(df, x, columns=None, dtype=numpy.float32)

Package-level export of disdrodb.retrievals.lut.NearestNeighbourLUT1D. Its parameters, notes, and methods (predict, predict_dict, read_lut, save_lut) are identical to those documented under the disdrodb.retrievals.lut module above.

class disdrodb.retrievals.NearestNeighbourLUT2D(df, x, y, columns=None, dtype=numpy.float32)

Package-level export of disdrodb.retrievals.lut.NearestNeighbourLUT2D. Its parameters, attributes, notes, examples, and methods (predict, predict_dict, read_lut, save_lut) are identical to those documented under the disdrodb.retrievals.lut module above.