DISDRODB reader preparation tutorial
This notebook guides you through creating a reader for the raw files logged by a disdrometer device.
First, it provides functions to display and investigate the content of your raw data files.
Then, you will define a series of parameters that control the reader behaviour. These pieces of code will be consolidated in the reader_template.py file to generate a DISDRODB L0 reader.
In this notebook, we use a lightweight dataset for illustrative purposes. You may reuse and adapt the notebook to explore your own dataset when preparing a new reader.
Following the documentation in How to Contribute New Data to DISDRODB, you should have already:
defined the metadata for the stations for which you want to create the reader
copied the raw data into the correct folder of the local DISDRODB archive
copied reader_template.py into the correct disdrodb.l0.<READER_DATA_SOURCE> directory and renamed it <READER_NAME>.py
For this tutorial, we have prepared some sample data in the folder data/DISDRODB of the disdrodb repository. Here, this data/DISDRODB directory acts as a toy DISDRODB base directory.
The data correspond to measurements taken at two stations (station_name_1 and station_name_2) during two days of a field campaign led by the EPFL LTE laboratory.
📁 DISDRODB
├── 📁 Raw
    ├── 📁 DATA_SOURCE
        ├── 📁 CAMPAIGN_NAME
            ├── 📁 data
                ├── 📁 station_name_1
                    ├── 📜 file60_20180817.dat.gz
                    ├── 📜 file60_20180818.dat.gz
                ├── 📁 station_name_2
                    ├── 📜 file61_20180817.dat.gz
                    ├── 📜 file61_20180818.dat.gz
            ├── 📁 info
            ├── 📁 issue
                ├── 📜 station_name_1.yml
                ├── 📜 station_name_2.yml
            ├── 📁 metadata
                ├── 📜 station_name_1.yml
                ├── 📜 station_name_2.yml
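To check that your own archive follows this layout, the raw files of each station can be listed with the standard library. The sketch below is only an illustration: it recreates the toy layout in a temporary directory using empty placeholder files, and the helper name list_station_files is hypothetical (it is not part of disdrodb).

```python
import tempfile
from pathlib import Path


def list_station_files(campaign_dir):
    """Map each station folder under <campaign_dir>/data to its sorted file names."""
    data_dir = Path(campaign_dir) / "data"
    return {
        station_dir.name: sorted(p.name for p in station_dir.iterdir() if p.is_file())
        for station_dir in sorted(data_dir.iterdir())
        if station_dir.is_dir()
    }


# Recreate the toy layout with empty placeholder files (illustration only).
tmp_dir = tempfile.TemporaryDirectory()
campaign_dir = Path(tmp_dir.name) / "DISDRODB" / "Raw" / "DATA_SOURCE" / "CAMPAIGN_NAME"
layout = {
    "station_name_1": ["file60_20180817.dat.gz", "file60_20180818.dat.gz"],
    "station_name_2": ["file61_20180817.dat.gz", "file61_20180818.dat.gz"],
}
for station, filenames in layout.items():
    station_dir = campaign_dir / "data" / station
    station_dir.mkdir(parents=True)
    for filename in filenames:
        (station_dir / filename).touch()

# List the files found for each station
station_files = list_station_files(campaign_dir)
for station, filenames in station_files.items():
    print(station, filenames)

# Remove the temporary directory
tmp_dir.cleanup()
```

On a real archive, you would pass your own campaign directory to the helper instead of the temporary one.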
Step 1: Read and analyse the data
The goal of Step 1 is to define the specifications to read the raw data into a dataframe and ensure that the dataframe columns match the DISDRODB standards. At the end of this tutorial, you should be able to generate Apache Parquet files from your input raw data.
Here we load the required modules and packages. Nothing needs to be changed here.
[1]:
# Define project base directory
import os
root_path = os.path.dirname(os.getcwd()) # something like /home/ghiggi/Projects/disdrodb
print(root_path)
/home/ghiggi/Python_Packages/disdrodb
[4]:
import pandas as pd
from disdrodb.api.checks import check_sensor_name
# Directory
from disdrodb.api.create_directories import create_l0_directory_structure
from disdrodb.api.info import infer_path_info_dict
# Standards
from disdrodb.api.path import define_campaign_dir
from disdrodb.l0.check_standards import check_l0a_column_names
# L0A processing
from disdrodb.l0.io import get_raw_filepaths
from disdrodb.l0.l0a_processing import (
read_raw_file,
read_raw_files,
)
# L0B processing
from disdrodb.l0.l0b_processing import (
create_l0b_from_l0a,
)
# Tools to develop the reader
from disdrodb.l0.template_tools import (
check_column_names,
get_df_columns_unique_values_dict,
infer_column_names,
print_df_column_names,
print_df_columns_unique_values,
print_df_first_n_rows,
print_df_random_n_rows,
print_df_summary_stats,
print_valid_l0_column_names,
)
# Metadata
from disdrodb.metadata import read_station_metadata
1. Define paths and running parameters
In the following section, we define the raw and processed directory paths. Change them if you are using another folder.
NB:
- In a real use case, DATA_SOURCE and CAMPAIGN_NAME should be replaced by meaningful names!
- The raw_dir and processed_dir must end with the same CAMPAIGN_NAME (in upper case).
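The two constraints above can be verified before running anything else. The following is a minimal sketch using only the standard library; the function name check_campaign_dirs and the example paths are hypothetical and do not belong to the disdrodb API.

```python
import os


def check_campaign_dirs(raw_dir, processed_dir):
    """Raise ValueError unless both paths end with the same upper-case campaign name."""
    # normpath drops a possible trailing separator before taking the last component
    raw_campaign = os.path.basename(os.path.normpath(raw_dir))
    processed_campaign = os.path.basename(os.path.normpath(processed_dir))
    if raw_campaign != processed_campaign:
        raise ValueError(
            f"Campaign names differ: {raw_campaign!r} vs {processed_campaign!r}"
        )
    if raw_campaign != raw_campaign.upper():
        raise ValueError(f"Campaign name {raw_campaign!r} must be upper case")
    return raw_campaign


# Hypothetical example paths
campaign = check_campaign_dirs(
    "/tmp/DISDRODB/Raw/DATA_SOURCE/CAMPAIGN_NAME",
    "/tmp/DISDRODB/Processed/DATA_SOURCE/CAMPAIGN_NAME/",
)
print(campaign)
```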
[5]:
base_dir = os.path.join(root_path, "data", "DISDRODB")
data_source = "DATA_SOURCE"
campaign_name = "CAMPAIGN_NAME"
raw_dir = define_campaign_dir(base_dir=base_dir,
product="RAW",
data_source=data_source,
campaign_name=campaign_name,
)
processed_dir = define_campaign_dir(base_dir=base_dir,
product="L0A",
data_source=data_source,
campaign_name=campaign_name,
)
assert os.path.exists(raw_dir), "Raw directory does not exist"
print(f"raw_dir: {raw_dir}")
print(f"processed_dir: {processed_dir}")
raw_dir: /home/ghiggi/Python_Packages/disdrodb/data/DISDRODB/Raw/DATA_SOURCE/CAMPAIGN_NAME
processed_dir: /home/ghiggi/Python_Packages/disdrodb/data/DISDRODB/Processed/DATA_SOURCE/CAMPAIGN_NAME
Then we define the reader execution parameters. Once the new reader is created, these parameters will become the reader function arguments. Please have a look at the documentation for a full description.
[6]:
force = True
parallel = False
verbose = True
debugging_mode = True
sensor_name = "OTT_Parsivel"
2. Selection of the station
In this example, we implement and run the reader for station station_name_1. However, feel free to change the station name :)
[7]:
station_name = "station_name_1"
3. Initialization
We run some initial checks and create the output directory structure. Nothing needs to be changed here.
[10]:
# Create directory structure
create_l0_directory_structure(
raw_dir=raw_dir,
processed_dir=processed_dir,
station_name=station_name,
force=force,
product="L0A",
)
Please make sure to run the cell above only once. If it is run multiple times, the log file blocks the folder creation.
4. Get the list of files to process
We now list all raw data files available for the selected station. Here we need to specify the glob pattern that selects all the relevant data files. Since the files in this case study are named like file<XXX>_<TIME>.dat.gz, we define the glob pattern "*.dat*". Note that "*.dat.gz" or "file*.dat.gz" would also have worked.
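To try a pattern before using it, the standard-library fnmatch module applies the same kind of shell-style wildcard rules to plain file names. This is only a sketch for experimenting with patterns: it does not show how get_raw_filepaths searches for files internally, and notes.txt is a made-up file name added to show a non-matching case.

```python
from fnmatch import fnmatchcase

# Two file names from the toy dataset plus one made-up non-data file
filenames = ["file60_20180817.dat.gz", "file60_20180818.dat.gz", "notes.txt"]
patterns = ["*.dat*", "*.dat.gz", "file*.dat.gz"]

# For each pattern, keep the file names it matches
matches = {
    pattern: [name for name in filenames if fnmatchcase(name, pattern)]
    for pattern in patterns
}
for pattern, matched in matches.items():
    print(f"{pattern!r} -> {matched}")
```

All three patterns select the two data files and leave notes.txt out.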
[11]:
glob_pattern = "*.dat*"
filepaths = get_raw_filepaths(
raw_dir=raw_dir,
station_name=station_name,
glob_patterns=glob_pattern,
verbose=verbose,
debugging_mode=debugging_mode,
)
print(filepaths)
- - 2 files to process in /home/ghiggi/Python_Packages/disdrodb/data/DISDRODB/Raw/DATA_SOURCE/CAMPAIGN_NAME/data/station_name_1
['/home/ghiggi/Python_Packages/disdrodb/data/DISDRODB/Raw/DATA_SOURCE/CAMPAIGN_NAME/data/station_name_1/file60_20180817.dat.gz', '/home/ghiggi/Python_Packages/disdrodb/data/DISDRODB/Raw/DATA_SOURCE/CAMPAIGN_NAME/data/station_name_1/file60_20180818.dat.gz']
🚨 The glob_pattern variable definition will be transferred into your reader function at the end of this notebook.
Remember that the glob_pattern variable depends on the file naming and extensions of your raw data!
5. Retrieve metadata from YAML files
We now load the metadata file of the station.
If the name of the station is not correctly defined, an error message is raised.
[12]:
# Retrieve metadata
attrs = read_station_metadata(station_name=station_name,
product="RAW",
**infer_path_info_dict(raw_dir))
# Retrieve sensor name
sensor_name = attrs["sensor_name"]
check_sensor_name(sensor_name)
5. Load one file into a dataframe
In the reader_kwargs dictionary, you may set any arguments needed to read the raw text file into a pandas.DataFrame.
[13]:
reader_kwargs = {}
# - Define delimiter
reader_kwargs["delimiter"] = ","
# - Avoid first column to become df index !!!
reader_kwargs["index_col"] = False
# Since column names are expected to be passed explicitly, header is set to None
reader_kwargs["header"] = None
# - Number of rows to be skipped at the beginning of the file
reader_kwargs["skiprows"] = None
# - Define behaviour when encountering bad lines
reader_kwargs["on_bad_lines"] = "skip"
# - Define reader engine
# - C engine is faster
# - Python engine is more feature-complete
reader_kwargs["engine"] = "python"
# - Define on-the-fly decompression of on-disk data
# - Available: gzip, bz2, zip
reader_kwargs["compression"] = "infer"
# - Strings to recognize as NA/NaN and replace with standard NA flags
# - Already included: '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN',
# '-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A',
# 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'
reader_kwargs["na_values"] = ["na", "", "error"]
# -----------------------------------------------------------
# Select first file
filepath = filepaths[0]
# Try to read the raw file
df_raw = read_raw_file(filepath, column_names=None, reader_kwargs=reader_kwargs)
# Print the dataframe
print(f"Dataframe for the file {os.path.basename(filepath)} :")
display(df_raw)
Dataframe for the file file60_20180817.dat.gz :
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 362511 | 4612.0301 | 00847.4977 | 01-08-2018 12:44:30 | NaN | OK | 0000.000 | 0056.49 | 00 | 00 | ... | 035 | 0.06 | 24.9 | 0 | 005.649 | 000 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 0 |
1 | 362512 | 4612.0301 | 00847.4978 | 01-08-2018 12:45:01 | NaN | OK | 0000.000 | 0056.49 | 00 | 00 | ... | 035 | 0.06 | 24.9 | 0 | 005.649 | 000 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 0 |
2 | 362513 | 4612.0301 | 00847.4985 | 01-08-2018 12:45:30 | NaN | OK | 0000.000 | 0056.49 | 00 | 00 | ... | 035 | 0.06 | 24.9 | 0 | 005.649 | 000 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 0 |
3 | 362514 | 4612.0305 | 00847.4990 | 01-08-2018 12:46:01 | NaN | OK | 0000.000 | 0056.49 | 00 | 00 | ... | 035 | 0.05 | 24.9 | 0 | 005.649 | 000 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 0 |
4 | 362515 | 4612.0303 | 00847.4992 | 01-08-2018 12:46:31 | NaN | OK | 0000.000 | 0056.49 | 00 | 00 | ... | 034 | 0.06 | 24.9 | 0 | 005.649 | 000 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4736 | 367249 | 4612.0313 | 00847.4956 | 03-08-2018 04:13:25 | NaN | OK | 0000.000 | 0056.71 | 00 | 00 | ... | 015 | 0.06 | 24.9 | 0 | 005.671 | 000 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 0 |
4737 | 367250 | 4612.0313 | 00847.4955 | 03-08-2018 04:13:56 | NaN | OK | 0000.000 | 0056.71 | 00 | 00 | ... | 015 | 0.06 | 24.9 | 0 | 005.671 | 000 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 0 |
4738 | 367251 | 4612.0313 | 00847.4955 | 03-08-2018 04:14:26 | NaN | OK | 0000.000 | 0056.71 | 00 | 00 | ... | 015 | 0.06 | 24.9 | 0 | 005.671 | 000 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 0 |
4739 | 367252 | 4612.0313 | 00847.4954 | 03-08-2018 04:14:55 | NaN | OK | 0000.000 | 0056.71 | 00 | 00 | ... | 015 | 0.06 | 24.9 | 0 | 005.671 | 000 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 0 |
4740 | 367253 | 4612.0313 | 00847.4954 | 03-08-2018 04:15:25 | NaN | OK | 0000.000 | 0056.71 | 00 | 00 | ... | 015 | 0.07 | 24.9 | 0 | 005.671 | 000 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 0 |
4741 rows × 24 columns
[14]:
print("Column names:", df_raw.columns)
print("Row Index:", df_raw.index)
Column names: Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23],
dtype='int64')
Row Index: RangeIndex(start=0, stop=4741, step=1)
Here we expect df_raw to have:
- numeric column names (i.e. an Index with dtype int64)
- a numeric row index (i.e. a RangeIndex)
If the structure of the dataframe looks fine (no header and no row index), we are on the right track!
Depending on the schema of your data, this reader_kwargs dictionary may be fairly different from the one above.
🚨 The reader_kwargs dictionary will be transferred to your reader function at the end of this notebook.
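To see what some of these options do in isolation, the sketch below passes a similar dictionary directly to pandas.read_csv on a small in-memory sample. How read_raw_file uses reader_kwargs internally is not shown in this notebook, so treat this as an approximation. The sample rows are made up, and dtype=str is an assumption added here only to keep leading zeros intact.

```python
import io

import pandas as pd

# Made-up sample: two records, the second one carrying an "error" flag
raw_text = "362511,4612.0301,OK,0056.49\n362512,4612.0301,error,0056.49\n"

reader_kwargs = {
    "delimiter": ",",
    "index_col": False,  # do not turn the first column into the index
    "header": None,  # the file has no header row
    "engine": "python",
    "on_bad_lines": "skip",
    "na_values": ["na", "", "error"],
}

# dtype=str is an assumption of this sketch (keeps values such as "0056.49" as text)
df = pd.read_csv(io.StringIO(raw_text), dtype=str, **reader_kwargs)

print(list(df.columns))  # integer column names
print(type(df.index).__name__)  # default row index
print(df)
```

The "error" entry is read as a missing value because it is listed in na_values, while the other fields stay as strings.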
6. Data exploration
Since the settings for searching and reading the raw data are now specified, we can load one file and analyse its content to see if there are any errors or inconsistencies.
Here are some instructions:
Do not assign column names to the dataframe columns yet
Do not assign a dtype to the dataframe columns yet
Possibly look at multiple files!
We print the content of the first 3 rows (feel free to change the value of n to see more or fewer rows):
[15]:
print_df_first_n_rows(df_raw, n=2, print_column_names=False)
- Column 0 :
['362511' '362512' '362513']
- Column 1 :
['4612.0301' '4612.0301' '4612.0301']
- Column 2 :
['00847.4977' '00847.4978' '00847.4985']
- Column 3 :
['01-08-2018 12:44:30' '01-08-2018 12:45:01' '01-08-2018 12:45:30']
- Column 4 :
[nan nan nan]
- Column 5 :
['OK' 'OK' 'OK']
- Column 6 :
['0000.000' '0000.000' '0000.000']
- Column 7 :
['0056.49' '0056.49' '0056.49']
- Column 8 :
['00' '00' '00']
- Column 9 :
['00' '00' '00']
- Column 10 :
['-9.999' '-9.999' '-9.999']
- Column 11 :
['9999' '9999' '9999']
- Column 12 :
['12611' '12617' '12600']
- Column 13 :
['00000' '00000' '00000']
- Column 14 :
['035' '035' '035']
- Column 15 :
['0.06' '0.06' '0.06']
- Column 16 :
['24.9' '24.9' '24.9']
- Column 17 :
['0' '0' '0']
- Column 18 :
['005.649' '005.649' '005.649']
- Column 19 :
['000' '000' '000']
- Column 20 :
['-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
'-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
'-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,']
- Column 21 :
['00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
'00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
'00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,']
- Column 22 :
['000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,'
'000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,'
'000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,']
- Column 23 :
['0' '0' '0']
[16]:
df_raw.head(3)
[16]:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 362511 | 4612.0301 | 00847.4977 | 01-08-2018 12:44:30 | NaN | OK | 0000.000 | 0056.49 | 00 | 00 | ... | 035 | 0.06 | 24.9 | 0 | 005.649 | 000 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 0 |
1 | 362512 | 4612.0301 | 00847.4978 | 01-08-2018 12:45:01 | NaN | OK | 0000.000 | 0056.49 | 00 | 00 | ... | 035 | 0.06 | 24.9 | 0 | 005.649 | 000 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 0 |
2 | 362513 | 4612.0301 | 00847.4985 | 01-08-2018 12:45:30 | NaN | OK | 0000.000 | 0056.49 | 00 | 00 | ... | 035 | 0.06 | 24.9 | 0 | 005.649 | 000 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 0 |
3 rows × 24 columns
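A similar per-column view can be produced with plain pandas, which is handy when you want to adapt the output to your own needs. The sketch below is not the implementation of print_df_first_n_rows; the helper first_values_per_column and the sample dataframe are hypothetical.

```python
import pandas as pd


def first_values_per_column(df, n=3):
    """Return {column: list of the first n values}, one entry per column."""
    return {column: df[column].head(n).tolist() for column in df.columns}


# Made-up sample with integer column names, as in df_raw
df = pd.DataFrame(
    {
        0: ["362511", "362512", "362513", "362514"],
        1: ["OK", "OK", "OK", "OK"],
        2: ["035", "035", "035", "034"],
    }
)

# Print the first values of each column, one column at a time
for column, values in first_values_per_column(df, n=3).items():
    print(f" - Column {column} :")
    print(f"   {values}")

# Unique values help to spot constant columns and placeholder flags
unique_values = {column: sorted(df[column].unique()) for column in df.columns}
print(unique_values)
```

Looking at the unique values of each column is a quick way to notice fields that never change or that only contain placeholder values.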
We print the content of n randomly picked rows:
[17]:
print_df_random_n_rows(df_raw, n=6, print_column_names=False)
- Column 0 :
['362996' '367008' '364763' '363666' '366535' '366533']
- Column 1 :
['4612.0298' '4612.0289' '4612.0321' '4612.0315' '4612.0310' '4612.0309']
- Column 2 :
['00847.4959' '00847.4950' '00847.4965' '00847.4955' '00847.4951'
'00847.4950']
- Column 3 :
['01-08-2018 16:47:00' '03-08-2018 02:13:01' '02-08-2018 07:30:31'
'01-08-2018 22:22:01' '02-08-2018 22:16:30' '02-08-2018 22:15:30']
- Column 4 :
[nan nan nan nan nan nan]
- Column 5 :
['OK' 'OK' 'OK' 'OK' 'OK' 'OK']
- Column 6 :
['0000.000' '0000.000' '0000.000' '0000.000' '0000.000' '0000.000']
- Column 7 :
['0056.52' '0056.71' '0056.67' '0056.67' '0056.71' '0056.71']
- Column 8 :
['00' '00' '00' '00' '00' '00']
- Column 9 :
['00' '00' '00' '00' '00' '00']
- Column 10 :
['-9.999' '-9.999' '-9.999' '-9.999' '-9.999' '-9.999']
- Column 11 :
['9999' '9999' '9999' '9999' '9999' '9999']
- Column 12 :
['12510' '11702' '12580' '12540' '12144' '12128']
- Column 13 :
['00000' '00000' '00000' '00000' '00000' '00000']
- Column 14 :
['021' '015' '025' '018' '016' '016']
- Column 15 :
['0.06' '0.06' '0.05' '0.06' '0.06' '0.06']
- Column 16 :
['24.9' '24.9' '24.9' '24.9' '24.9' '24.9']
- Column 17 :
['0' '0' '0' '0' '0' '0']
- Column 18 :
['005.652' '005.671' '005.667' '005.667' '005.671' '005.671']
- Column 19 :
['000' '000' '000' '000' '000' '000']
- Column 20 :
['-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
'-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
'-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
'-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
'-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
'-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,']
- Column 21 :
['00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
'00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
'00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
'00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
'00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
'00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,']
- Column 22 :
['000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,'
'000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,'
'000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,'
'000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,'
'000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,'
'000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,']
- Column 23 :
['0' '0' '0' '0' '0' '0']
Get the number of columns:
[18]:
len(df_raw.columns)
[18]:
24
Look at the unique values of a single column:
[19]:
print_df_columns_unique_values(df_raw, column_indices=11, print_column_names=False)
- Column 11 :
['0824', '0906', '1363', '1397', '2921', '3203', '3326', '3816', '4465', '9999']
Look at the unique values of a few columns:
Note: use column_indices=None
to get the unique values of all columns.
[20]:
print_df_columns_unique_values(df_raw, column_indices=slice(10, 12), print_column_names=False)
- Column 10 :
['-9.999', '02.669', '04.241', '04.745', '04.826', '04.879', '05.430', '06.095', '06.220', '07.415', '08.436', '08.489', '08.506', '08.724', '08.956', '09.079', '09.894', '10.057', '10.567', '11.705', '12.097', '12.390', '12.923', '13.114', '13.407', '13.684', '14.324', '15.060', '16.530', '16.636', '16.668', '17.194', '17.382', '17.829', '17.918', '18.334', '18.655', '19.526', '20.329', '21.134', '21.426', '23.098', '23.664', '23.760', '24.472', '25.473', '25.957', '29.270', '31.271', '32.255', '33.844', '36.196']
- Column 11 :
['0824', '0906', '1363', '1397', '2921', '3203', '3326', '3816', '4465', '9999']
Get the unique values as a dictionary:
[21]:
get_df_columns_unique_values_dict(df_raw, column_indices=slice(10, 12), column_names=False)
[21]:
{'Column 10': ['-9.999',
'02.669',
'04.241',
'04.745',
'04.826',
'04.879',
'05.430',
'06.095',
'06.220',
'07.415',
'08.436',
'08.489',
'08.506',
'08.724',
'08.956',
'09.079',
'09.894',
'10.057',
'10.567',
'11.705',
'12.097',
'12.390',
'12.923',
'13.114',
'13.407',
'13.684',
'14.324',
'15.060',
'16.530',
'16.636',
'16.668',
'17.194',
'17.382',
'17.829',
'17.918',
'18.334',
'18.655',
'19.526',
'20.329',
'21.134',
'21.426',
'23.098',
'23.664',
'23.760',
'24.472',
'25.473',
'25.957',
'29.270',
'31.271',
'32.255',
'33.844',
'36.196'],
'Column 11': ['0824',
'0906',
'1363',
'1397',
'2921',
'3203',
'3326',
'3816',
'4465',
'9999']}
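As a rough illustration of what such a helper does, a dictionary of this shape can be built with plain pandas. This is a minimal sketch on a toy dataframe, not the disdrodb implementation; the function name unique_values_per_column and the toy values are assumptions made for this example.

```python
import pandas as pd

# Toy all-string dataframe standing in for df_raw (values are illustrative).
df_toy = pd.DataFrame(
    {
        0: ["-9.999", "02.669", "-9.999"],
        1: ["9999", "0824", "9999"],
    }
)


def unique_values_per_column(df, column_indices=None):
    """Return {"Column i": sorted unique values} for the selected column positions."""
    positions = list(range(df.shape[1]))
    if isinstance(column_indices, slice):
        positions = positions[column_indices]
    elif isinstance(column_indices, int):
        positions = [column_indices]
    elif column_indices is not None:
        positions = list(column_indices)
    # None keeps every column, mirroring the column_indices=None behaviour noted above.
    return {f"Column {i}": sorted(df.iloc[:, i].unique().tolist()) for i in positions}


unique_dict = unique_values_per_column(df_toy, column_indices=slice(0, 2))
print(unique_dict)
```

Sorting the unique strings makes sentinel values such as -9.999 or 9999 easy to spot at the start or end of each list.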
7. Column names
We have now inspected the content of our data. It’s time to take care of its structure: the column names.
The function infer_column_names()
tries to guess the column names based on the type of sensor and the sensor specifications described within the raw_data_format.yml config file.
[23]:
infer_column_names(df_raw, sensor_name=sensor_name)
[23]:
{0: [],
1: [],
2: [],
3: [],
4: [],
5: [],
6: ['rainfall_rate_32bit'],
7: ['rainfall_accumulated_32bit', 'rainfall_accumulated_16bit'],
8: ['weather_code_synop_4680', 'weather_code_synop_4677'],
9: ['weather_code_synop_4680', 'weather_code_synop_4677'],
10: [],
11: ['mor_visibility'],
12: ['sample_interval', 'number_particles', 'laser_amplitude'],
13: ['sample_interval', 'number_particles', 'laser_amplitude'],
14: ['error_code', 'sensor_temperature'],
15: ['sensor_heating_current'],
16: ['sensor_battery_voltage'],
17: ['sensor_status'],
18: ['rainfall_amount_absolute_32bit'],
19: ['error_code', 'sensor_temperature'],
20: ['raw_drop_concentration', 'raw_drop_average_velocity'],
21: ['raw_drop_concentration', 'raw_drop_average_velocity'],
22: ['raw_drop_number'],
23: ['sensor_status']}
This can help us define the column_names
list later.
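The general idea behind this kind of inference can be sketched as follows: each candidate variable is described by the string pattern its values are expected to follow, and a column is a candidate for every variable whose pattern matches all of its values. The patterns and the function name guess_column_names below are hypothetical and chosen only for illustration; they are not taken from raw_data_format.yml, and this is not how infer_column_names() is actually implemented.

```python
import re

# Hypothetical string patterns, for illustration only.
HYPOTHETICAL_PATTERNS = {
    "mor_visibility": r"\d{4}",
    "rainfall_rate_32bit": r"\d{4}\.\d{3}",
    "sensor_status": r"\d",
}


def guess_column_names(columns, patterns):
    """Map each column index to the variables whose pattern matches all of its values."""
    guesses = {}
    for idx, values in columns.items():
        guesses[idx] = [
            name
            for name, pattern in patterns.items()
            if all(re.fullmatch(pattern, value) for value in values)
        ]
    return guesses


# Toy columns given as {index: list of raw string values}.
toy_columns = {
    0: ["0824", "9999"],
    1: ["0000.000", "0002.881"],
    2: ["0", "0"],
    3: ["abc"],
}
guesses = guess_column_names(toy_columns, HYPOTHETICAL_PATTERNS)
print(guesses)
```

A column with no matching pattern gets an empty list, and a column can match several variables when their formats coincide, which is why the inferred names are only a starting point.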
For reference, here is the list of valid column names (taken from l0a_encodings.yml
):
[24]:
print_valid_l0_column_names(sensor_name)
['rainfall_rate_32bit', 'rainfall_accumulated_32bit', 'weather_code_synop_4680', 'weather_code_synop_4677', 'weather_code_metar_4678', 'weather_code_nws', 'reflectivity_32bit', 'mor_visibility', 'sample_interval', 'laser_amplitude', 'number_particles', 'sensor_temperature', 'sensor_serial_number', 'firmware_iop', 'firmware_dsp', 'sensor_heating_current', 'sensor_battery_voltage', 'sensor_status', 'start_time', 'sensor_time', 'sensor_date', 'station_name', 'station_number', 'rainfall_amount_absolute_32bit', 'error_code', 'rainfall_rate_16bit', 'rainfall_rate_12bit', 'rainfall_accumulated_16bit', 'reflectivity_16bit', 'raw_drop_concentration', 'raw_drop_average_velocity', 'raw_drop_number']
It’s now time to define our column names:
Hints to define the names:
* get information from the disdrometer user guide and the data logger employed
* use infer_column_names()
to help you
* analyse the content column by column with print_df_columns_unique_values()
[25]:
column_names = [
"unknown1",
"unknown2",
"unknown3",
"timestep",
"unknown4",
"unknown5",
"rainfall_rate_32bit",
"rainfall_accumulated_32bit",
"weather_code_synop_4680",
"weather_code_synop_4677",
"reflectivity_32bit",
"mor_visibility",
"laser_amplitude",
"number_particles",
"sensor_temperature",
"sensor_heating_current",
"sensor_battery_voltage",
"sensor_status",
"rainfall_amount_absolute_32bit",
"error_code",
"raw_drop_concentration",
"raw_drop_average_velocity",
"raw_drop_number",
"unknown6",
]
🚨 The
column_names
list will be transferred to the reader function at the end of this notebook.
Check the validity of your definition
[26]:
check_column_names(column_names, sensor_name)
The following columns do no met the DISDRODB standards: ['unknown4', 'unknown2', 'unknown3', 'unknown5', 'unknown1', 'unknown6', 'timestep'].
Please remove such columns within the df_sanitizer_fun
Please be sure to create the 'time' column within the df_sanitizer_fun.
The 'time' column must be datetime with resolution in seconds (dtype='M8[s]').
OK, fair enough. Some columns need to be removed, and we also need to define a column "time"
with dtype datetime
to meet the DISDRODB standards.
These points will be addressed in Section 10 of this notebook.
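Conceptually, this check amounts to comparing the chosen names against the set of valid names. The following is a minimal sketch of that comparison, assuming a small illustrative subset of valid names; the helper split_column_names is hypothetical and is not part of disdrodb.

```python
# Illustrative subset of valid names; the full list is the one printed by
# print_valid_l0_column_names(sensor_name).
VALID_NAMES_SUBSET = {"rainfall_rate_32bit", "mor_visibility", "raw_drop_number"}


def split_column_names(column_names, valid_names):
    """Split names into (valid, invalid) lists, preserving the input order."""
    valid = [name for name in column_names if name in valid_names]
    invalid = [name for name in column_names if name not in valid_names]
    return valid, invalid


valid, invalid = split_column_names(
    ["unknown1", "timestep", "rainfall_rate_32bit", "mor_visibility"],
    VALID_NAMES_SUBSET,
)
print("Columns to drop or rename:", invalid)
```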
8. Read the dataframe with the correct column names
We can now create a new dataframe with the column names:
[27]:
df = read_raw_file(filepath=filepath, column_names=column_names, reader_kwargs=reader_kwargs)
And print the dataframe column names:
[28]:
print_df_column_names(df)
- Column 0 : unknown1
- Column 1 : unknown2
- Column 2 : unknown3
- Column 3 : timestep
- Column 4 : unknown4
- Column 5 : unknown5
- Column 6 : rainfall_rate_32bit
- Column 7 : rainfall_accumulated_32bit
- Column 8 : weather_code_synop_4680
- Column 9 : weather_code_synop_4677
- Column 10 : reflectivity_32bit
- Column 11 : mor_visibility
- Column 12 : laser_amplitude
- Column 13 : number_particles
- Column 14 : sensor_temperature
- Column 15 : sensor_heating_current
- Column 16 : sensor_battery_voltage
- Column 17 : sensor_status
- Column 18 : rainfall_amount_absolute_32bit
- Column 19 : error_code
- Column 20 : raw_drop_concentration
- Column 21 : raw_drop_average_velocity
- Column 22 : raw_drop_number
- Column 23 : unknown6
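The internals of read_raw_file() are not shown in this notebook. Assuming the raw file is a delimited text file read with pandas, assigning names while reading could look like the sketch below; the in-memory text and the two column names are toy stand-ins, not the tutorial data.

```python
import io

import pandas as pd

# Toy comma-separated content standing in for a raw file without a header row.
raw_text = "a,0824\nb,9999\n"
toy_column_names = ["unknown1", "mor_visibility"]

# header=None tells pandas the file has no header row, names assigns the column
# names, and dtype=str keeps every value as a string so nothing is converted.
df_named = pd.read_csv(
    io.StringIO(raw_text),
    header=None,
    names=toy_column_names,
    dtype=str,
)
print(df_named.columns.tolist())
```

Keeping the values as strings at this stage preserves leading zeros and sentinel values exactly as logged, which is useful while the column names are still being validated.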
9. Perform further tests and analysis to check the correctness of column_names
For example, you can check some statistics for a specific column.
[29]:
column_name = "rainfall_rate_32bit"
array_of_values = df.loc[:, [column_name]].astype("float")
print_df_summary_stats(array_of_values)
- Column 0 ( rainfall_rate_32bit ):
mean 0.005426
min 0.000000
25% 0.000000
50% 0.000000
75% 0.000000
max 2.881000
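If you prefer plain pandas, comparable statistics can be obtained with describe(). The sketch below runs on a small toy dataframe (the values are made up, only their zero-padded string format mimics the raw data), so that it can be executed on its own.

```python
import pandas as pd

# Toy column stored as zero-padded strings, as in the raw files.
toy = pd.DataFrame(
    {"rainfall_rate_32bit": ["0000.000", "0000.000", "0002.881", "0000.120"]}
)

# Cast to float before computing statistics, otherwise describe()
# would treat the column as categorical text.
values = toy["rainfall_rate_32bit"].astype(float)
stats = values.describe()

# Keep only the entries comparable to the summary shown above.
print(stats[["mean", "min", "25%", "50%", "75%", "max"]])
```

A maximum or minimum far outside the physically plausible range of a variable is usually a sign that a column name has been assigned to the wrong column.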
10. Final columns formatting
[30]:
check_l0a_column_names(df, sensor_name=sensor_name)
The following columns do no met the DISDRODB standards: ['unknown4', 'unknown2', 'unknown3', 'unknown5', 'unknown1', 'unknown6', 'timestep']
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/home/ghiggi/Python_Packages/disdrodb/tutorials/reader_preparation.ipynb Cell 62 line 1
----> 1 check_l0a_column_names(df, sensor_name=sensor_name)
File ~/Python_Packages/disdrodb/disdrodb/l0/check_standards.py:154, in check_l0a_column_names(df, sensor_name)
152 msg = f"The following columns do no met the DISDRODB standards: {invalid_columns}"
153 logger.error(msg)
--> 154 raise ValueError(msg)
155 # --------------------------------------------
156 # Check time column is present
157 if "time" not in df_columns:
ValueError: The following columns do no met the DISDRODB standards: ['unknown4', 'unknown2', 'unknown3', 'unknown5', 'unknown1', 'unknown6', 'timestep']
[31]:
check_column_names(column_names, sensor_name)
The following columns do no met the DISDRODB standards: ['unknown4', 'unknown2', 'unknown3', 'unknown5', 'unknown1', 'unknown6', 'timestep'].
Please remove such columns within the df_sanitizer_fun
Please be sure to create the 'time' column within the df_sanitizer_fun.
The 'time' column must be datetime with resolution in seconds (dtype='M8[s]').
Now it’s time to remove all the columns that do not match the DISDRODB standards.
[32]:
df = df.drop(columns=["unknown1", "unknown2", "unknown3", "unknown4", "unknown5", "unknown6"])
It’s also time to define the time column, which is required by the DISDRODB standards.
[33]:
df["time"] = pd.to_datetime(df["timestep"], format="%m-%d-%Y %H:%M:%S")
df = df.drop(columns=["timestep"])
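When writing the format string, keep in mind that it alone decides which field is read as the month and which as the day. The sketch below parses the same toy timestamp (not taken from the sample files) with two candidate formats; both succeed silently, so it is worth comparing the parsed dates with dates you know independently, for example those in the file names.

```python
import pandas as pd

# Toy timestamp whose first two fields are both valid as day or month.
raw = pd.Series(["01-08-2018 12:44:30"])

# Month-first interpretation: 8 January 2018.
as_month_first = pd.to_datetime(raw, format="%m-%d-%Y %H:%M:%S")

# Day-first interpretation: 1 August 2018.
as_day_first = pd.to_datetime(raw, format="%d-%m-%Y %H:%M:%S")

print(as_month_first.iloc[0])
print(as_day_first.iloc[0])
```

Timestamps with a day larger than 12 remove the ambiguity, since only one of the two formats can parse them.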
Now let’s check that the column names, after custom processing, conform with the DISDRODB standards:
[34]:
check_l0a_column_names(df, sensor_name=sensor_name)
Finally, check if the dataframe looks as desired:
[35]:
print_df_column_names(df)
- Column 0 : rainfall_rate_32bit
- Column 1 : rainfall_accumulated_32bit
- Column 2 : weather_code_synop_4680
- Column 3 : weather_code_synop_4677
- Column 4 : reflectivity_32bit
- Column 5 : mor_visibility
- Column 6 : laser_amplitude
- Column 7 : number_particles
- Column 8 : sensor_temperature
- Column 9 : sensor_heating_current
- Column 10 : sensor_battery_voltage
- Column 11 : sensor_status
- Column 12 : rainfall_amount_absolute_32bit
- Column 13 : error_code
- Column 14 : raw_drop_concentration
- Column 15 : raw_drop_average_velocity
- Column 16 : raw_drop_number
- Column 17 : time
[36]:
print_df_random_n_rows(df, n=5)
- Column 0 ( rainfall_rate_32bit ):
['0000.000' '0000.000' '0000.000' '0000.000' '0000.000']
- Column 1 ( rainfall_accumulated_32bit ):
['0056.52' '0056.67' '0056.71' '0056.71' '0056.71']
- Column 2 ( weather_code_synop_4680 ):
['00' '00' '00' '00' '00']
- Column 3 ( weather_code_synop_4677 ):
['00' '00' '00' '00' '00']
- Column 4 ( reflectivity_32bit ):
['-9.999' '-9.999' '-9.999' '-9.999' '-9.999']
- Column 5 ( mor_visibility ):
['9999' '9999' '9999' '9999' '9999']
- Column 6 ( laser_amplitude ):
['12529' '12595' '11388' '12456' '12248']
- Column 7 ( number_particles ):
['00000' '00000' '00000' '00000' '00000']
- Column 8 ( sensor_temperature ):
['023' '024' '015' '020' '017']
- Column 9 ( sensor_heating_current ):
['0.05' '0.06' '0.06' '0.05' '0.06']
- Column 10 ( sensor_battery_voltage ):
['24.9' '24.9' '24.9' '24.9' '24.9']
- Column 11 ( sensor_status ):
['0' '0' '0' '0' '0']
- Column 12 ( rainfall_amount_absolute_32bit ):
['005.652' '005.667' '005.671' '005.671' '005.671']
- Column 13 ( error_code ):
['000' '000' '000' '000' '000']
- Column 14 ( raw_drop_concentration ):
['-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
'-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
'-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
'-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,'
'-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.999,']
- Column 15 ( raw_drop_average_velocity ):
['00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
'00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
'00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
'00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,'
'00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,00.000,']
- Column 16 ( raw_drop_number ):
['000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
0,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,'
'000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,'
'000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,'
'000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,'
'000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,']
- Column 17 ( time ):
['2018-01-08T15:26:31.000000000' '2018-02-08T06:21:30.000000000'
'2018-03-08T02:27:30.000000000' '2018-02-08T17:41:01.000000000'
'2018-02-08T21:08:01.000000000']
[37]:
print_df_columns_unique_values(df, column_indices=2, print_column_names=True)
- Column 2 ( weather_code_synop_4680 ):
['00', '57', '61', '62', '71', '72', '88']
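The same kind of inspection can be done directly with pandas. This is a minimal sketch on a toy dataframe with made-up codes; listing the unique values of a categorical column is a quick way to spot unexpected entries.

```python
import pandas as pd

# Toy weather codes stored as strings, as in the raw files.
toy = pd.DataFrame({"weather_code_synop_4680": ["00", "61", "00", "57", "61"]})

# Sorted unique values of the column.
unique_codes = sorted(toy["weather_code_synop_4680"].unique())
print(unique_codes)
```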
11. Define the dataframe sanitizer function
The df_sanitizer_fun encapsulates the code, specific to each reader/dataset, that is required to obtain a dataframe compliant with the DISDRODB standards.
With the data used in this notebook, we need to drop some columns and define the time column.
From the code defined in Section 10, we define the following function:
[38]:
def df_sanitizer_fun(df):
    # Import pandas
    import pandas as pd

    # - Drop invalid columns
    columns_to_drop = [
        "unknown1",
        "unknown2",
        "unknown3",
        "unknown4",
        "unknown5",
        "unknown6",
    ]
    df = df.drop(columns=columns_to_drop)

    # - Convert timestep column to datetime format
    df["time"] = pd.to_datetime(df["timestep"], format="%m-%d-%Y %H:%M:%S")
    df = df.drop(columns=["timestep"])

    # - Return the dataframe
    return df
🚨 The df_sanitizer_fun() function will be transferred to the reader function at the end of this notebook.
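Before moving on, it can be reassuring to try the sanitizer on a tiny hand-made dataframe. The sketch below repeats the function so that it runs on its own; the toy dataframe and its values are invented for illustration.

```python
import pandas as pd


def df_sanitizer_fun(df):
    # Drop the columns that are not part of the standard.
    columns_to_drop = [f"unknown{i}" for i in range(1, 7)]
    df = df.drop(columns=columns_to_drop)
    # Build the mandatory 'time' column and drop the raw one.
    df["time"] = pd.to_datetime(df["timestep"], format="%m-%d-%Y %H:%M:%S")
    df = df.drop(columns=["timestep"])
    return df


# Toy input with one row: six unknown columns, a timestep and one variable.
toy = pd.DataFrame(
    {
        **{f"unknown{i}": ["x"] for i in range(1, 7)},
        "timestep": ["01-08-2018 12:44:30"],
        "rainfall_rate_32bit": ["0000.000"],
    }
)

clean = df_sanitizer_fun(toy)
print(list(clean.columns))
print(clean["time"].dtype)
```

Only the standard variable and the new time column should remain, and the time column should have a datetime dtype.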
12. Now let’s try calling the reader function as it will be called in the DISDRODB L0 reader
You may try with an increasing number of files (update filepaths).
Here we combine all raw files into a single dataframe.
The function read_raw_files takes the following arguments:
* filepaths: the list of files present in the specified station directory
* column_names: the list of column names (defined previously)
* reader_kwargs: the dictionary of arguments controlling how the data is loaded into the dataframe (defined previously)
* sensor_name: taken from the sensor_name key in the metadata YAML file of the station
* df_sanitizer_fun: the function to sanitize the dataframe (defined previously)
All these arguments are defined either in the data directory structure or earlier in the code.
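Conceptually, these arguments fit together as in the sketch below. This is not the code of read_raw_files: read_files_sketch and toy_sanitizer are hypothetical helpers, the inputs are in-memory text buffers rather than real files, and the checks driven by sensor_name are left out entirely.

```python
import io

import pandas as pd


def read_files_sketch(sources, column_names, reader_kwargs, df_sanitizer_fun):
    """Read each source, sanitize it, and concatenate the results."""
    frames = []
    for source in sources:
        # Read one raw file with the user-defined names and options.
        df = pd.read_csv(source, names=column_names, **reader_kwargs)
        # Apply the dataset-specific clean-up.
        frames.append(df_sanitizer_fun(df))
    return pd.concat(frames, ignore_index=True)


def toy_sanitizer(df):
    # Minimal stand-in for a real sanitizer function.
    df = df.drop(columns=["unknown1"])
    df["time"] = pd.to_datetime(df["timestep"], format="%m-%d-%Y %H:%M:%S")
    return df.drop(columns=["timestep"])


# Two toy "files", each holding a single semicolon-separated record.
sources = [
    io.StringIO("x;01-08-2018 12:44:30;0000.000\n"),
    io.StringIO("y;01-08-2018 12:45:01;0000.120\n"),
]

toy_df = read_files_sketch(
    sources,
    column_names=["unknown1", "timestep", "rainfall_rate_32bit"],
    reader_kwargs={"delimiter": ";", "header": None, "dtype": str},
    df_sanitizer_fun=toy_sanitizer,
)
print(toy_df)
```

Keeping the file-specific logic in the sanitizer and the generic reading options in reader_kwargs is what allows the same reading loop to serve very different datasets.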
[39]:
subset_filepaths = filepaths[:1]
df = read_raw_files(
    filepaths=subset_filepaths,
    column_names=column_names,
    reader_kwargs=reader_kwargs,
    sensor_name=sensor_name,
    verbose=verbose,
    df_sanitizer_fun=df_sanitizer_fun,
)
display(df)
- 1 / 1 processed successfully. File name: /home/ghiggi/Python_Packages/disdrodb/data/DISDRODB/Raw/DATA_SOURCE/CAMPAIGN_NAME/data/station_name_1/file60_20180817.dat.gz
- - 0 of 1 have been skipped.
 | rainfall_rate_32bit | rainfall_accumulated_32bit | weather_code_synop_4680 | weather_code_synop_4677 | reflectivity_32bit | mor_visibility | laser_amplitude | number_particles | sensor_temperature | sensor_heating_current | sensor_battery_voltage | sensor_status | rainfall_amount_absolute_32bit | error_code | raw_drop_concentration | raw_drop_average_velocity | raw_drop_number | time |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 56.490002 | 0.0 | 0.0 | -9.999 | 9999.0 | 12611.0 | 0.0 | 35.0 | 0.06 | 24.9 | 0.0 | 5.649 | 0.0 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 2018-01-08 12:44:30 |
1 | 0.0 | 56.490002 | 0.0 | 0.0 | -9.999 | 9999.0 | 12617.0 | 0.0 | 35.0 | 0.06 | 24.9 | 0.0 | 5.649 | 0.0 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 2018-01-08 12:45:01 |
2 | 0.0 | 56.490002 | 0.0 | 0.0 | -9.999 | 9999.0 | 12600.0 | 0.0 | 35.0 | 0.06 | 24.9 | 0.0 | 5.649 | 0.0 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 2018-01-08 12:45:30 |
3 | 0.0 | 56.490002 | 0.0 | 0.0 | -9.999 | 9999.0 | 12603.0 | 0.0 | 35.0 | 0.05 | 24.9 | 0.0 | 5.649 | 0.0 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 2018-01-08 12:46:01 |
4 | 0.0 | 56.490002 | 0.0 | 0.0 | -9.999 | 9999.0 | 12606.0 | 0.0 | 34.0 | 0.06 | 24.9 | 0.0 | 5.649 | 0.0 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 2018-01-08 12:46:31 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4736 | 0.0 | 56.709999 | 0.0 | 0.0 | -9.999 | 9999.0 | 11059.0 | 0.0 | 15.0 | 0.06 | 24.9 | 0.0 | 5.671 | 0.0 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 2018-03-08 04:13:25 |
4737 | 0.0 | 56.709999 | 0.0 | 0.0 | -9.999 | 9999.0 | 11175.0 | 0.0 | 15.0 | 0.06 | 24.9 | 0.0 | 5.671 | 0.0 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 2018-03-08 04:13:56 |
4738 | 0.0 | 56.709999 | 0.0 | 0.0 | -9.999 | 9999.0 | 11275.0 | 0.0 | 15.0 | 0.06 | 24.9 | 0.0 | 5.671 | 0.0 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 2018-03-08 04:14:26 |
4739 | 0.0 | 56.709999 | 0.0 | 0.0 | -9.999 | 9999.0 | 11361.0 | 0.0 | 15.0 | 0.06 | 24.9 | 0.0 | 5.671 | 0.0 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 2018-03-08 04:14:55 |
4740 | 0.0 | 56.709999 | 0.0 | 0.0 | -9.999 | 9999.0 | 11492.0 | 0.0 | 15.0 | 0.07 | 24.9 | 0.0 | 5.671 | 0.0 | -9.999,-9.999,-9.999,-9.999,-9.999,-9.999,-9.9... | 00.000,00.000,00.000,00.000,00.000,00.000,00.0... | 000,000,000,000,000,000,000,000,000,000,000,00... | 2018-03-08 04:15:25 |
4741 rows × 18 columns
Here we derive the corresponding xr.Dataset object:
[40]:
ds = create_l0b_from_l0a(df, attrs, verbose=False)
print(ds)
<xarray.Dataset>
Dimensions: (time: 4741, diameter_bin_center: 32,
velocity_bin_center: 32, crs: 1)
Coordinates: (12/13)
* diameter_bin_center (diameter_bin_center) float64 0.062 ... 24.5
diameter_bin_lower (diameter_bin_center) float64 0.0 ... 23.0
diameter_bin_upper (diameter_bin_center) float64 0.1245 ... ...
diameter_bin_width (diameter_bin_center) float64 0.125 ... 3.0
* velocity_bin_center (velocity_bin_center) float64 0.05 ... 20.8
velocity_bin_lower (velocity_bin_center) float64 0.0 ... 19.2
... ...
velocity_bin_width (velocity_bin_center) float64 0.1 ... 3.2
* time (time) datetime64[ns] 2018-01-08T12:44:30...
latitude float64 46.2
longitude float64 8.792
altitude int64 1671
* crs (crs) <U5 'WGS84'
Data variables: (12/17)
raw_drop_concentration (time, diameter_bin_center) float64 0.0 ....
raw_drop_average_velocity (time, velocity_bin_center) float64 0.0 ....
raw_drop_number (time, diameter_bin_center, velocity_bin_center) float64 ...
rainfall_rate_32bit (time) float32 0.0 0.0 0.0 ... 0.0 0.0 0.0
rainfall_accumulated_32bit (time) float32 56.49 56.49 ... 56.71 56.71
weather_code_synop_4680 (time) float32 0.0 0.0 0.0 ... 0.0 0.0 0.0
... ...
sensor_temperature (time) float32 35.0 35.0 35.0 ... 15.0 15.0
sensor_heating_current (time) float32 0.06 0.06 0.06 ... 0.06 0.07
sensor_battery_voltage (time) float32 24.9 24.9 24.9 ... 24.9 24.9
sensor_status (time) float32 0.0 0.0 0.0 ... 0.0 0.0 0.0
rainfall_amount_absolute_32bit (time) float32 5.649 5.649 ... 5.671 5.671
error_code (time) float32 0.0 0.0 0.0 ... 0.0 0.0 0.0
Attributes: (12/61)
data_source: DATA_SOURCE
campaign_name: CAMPAIGN_NAME
station_name: station_name_1
sensor_name: OTT_Parsivel
reader: EPFL/LOCARNO_2018
raw_data_format: raw
... ...
time_coverage_start: 2018-01-08T12:44:30.000000000
time_coverage_end: 2018-03-08T04:15:25.000000000
disdrodb_processing_date: 2023-12-01 13:36:52
disdrodb_product_version: V0
disdrodb_software_version: V0.0.18.dev57+g8911365.d20231103
disdrodb_product: L0B
/home/ghiggi/Python_Packages/disdrodb/disdrodb/l0/l0b_processing.py:475: UserWarning: Converting non-nanosecond precision datetime values to nanosecond precision. This behavior can eventually be relaxed in xarray, as it is an artifact from pandas which is now beginning to support non-nanosecond precision values. This warning is caused by passing non-nanosecond np.datetime64 or np.timedelta64 values to the DataArray or Variable constructor; it can be silenced by converting the values to nanosecond precision ahead of time.
ds = xr.Dataset(
The dataset can be saved as a DISDRODB L0B netCDF file by running the following code:
[41]:
# ds = set_encodings(ds, sensor_name)
# ds.to_netcdf("/path/where/to/save/the/file.nc")
Step 2: Create the reader#
Now we have all the parameters required to define a DISDRODB reader. All the DISDRODB reader parameters that we defined in this notebook must be transcribed into the reader function you are developing:
Update the glob_patterns string
Before:
glob_patterns = "*"
After:
glob_patterns = "*.dat*"
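To see what such a pattern selects, here is a minimal sketch using the standard-library `fnmatch` module as a stand-in for shell-style matching. How DISDRODB itself searches the station directory is not shown in this tutorial, so this only illustrates the wildcard semantics. The `candidates` list is hypothetical, built from the file names of the sample archive.

```python
from fnmatch import fnmatch

# Pattern chosen in this tutorial
pattern = "*.dat*"

# Hypothetical directory listing, based on the sample archive file names
candidates = [
    "file60_20180817.dat.gz",
    "file60_20180818.dat.gz",
    "station_name_1.yml",
]

# Keep only the names that match the shell-style pattern
matched = [name for name in candidates if fnmatch(name, pattern)]
print(matched)
```

The trailing `*` is what lets the pattern match compressed files such as `.dat.gz` as well as plain `.dat` files.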
Update the column_names list
Before:
column_names = []
After:
column_names = [
    "unknown1",
    "unknown2",
    "unknown3",
    "timestep",
    "unknown4",
    "unknown5",
    "rainfall_rate_32bit",
    "rainfall_accumulated_32bit",
    "weather_code_synop_4680",
    "weather_code_synop_4677",
    "reflectivity_32bit",
    "mor_visibility",
    "laser_amplitude",
    "number_particles",
    "sensor_temperature",
    "sensor_heating_current",
    "sensor_battery_voltage",
    "sensor_status",
    "rainfall_amount_absolute_32bit",
    "error_code",
    "raw_drop_concentration",
    "raw_drop_average_velocity",
    "raw_drop_number",
    "unknown6",
]
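As a quick sanity check, the sketch below counts the names in this list and the columns that remain once the `unknownN` placeholders are dropped and `timestep` is converted to `time`. The helper names `placeholders` and `kept` are illustrative only and are not part of DISDRODB.

```python
column_names = [
    "unknown1", "unknown2", "unknown3", "timestep", "unknown4", "unknown5",
    "rainfall_rate_32bit", "rainfall_accumulated_32bit",
    "weather_code_synop_4680", "weather_code_synop_4677",
    "reflectivity_32bit", "mor_visibility", "laser_amplitude",
    "number_particles", "sensor_temperature", "sensor_heating_current",
    "sensor_battery_voltage", "sensor_status",
    "rainfall_amount_absolute_32bit", "error_code",
    "raw_drop_concentration", "raw_drop_average_velocity",
    "raw_drop_number", "unknown6",
]

# Column names must be unique, otherwise the dataframe becomes ambiguous
assert len(column_names) == len(set(column_names))

# Placeholder columns that the sanitizer function removes later on
placeholders = [name for name in column_names if name.startswith("unknown")]

# Columns that survive: everything except the placeholders,
# with "timestep" replaced by the parsed "time" column
kept = [name for name in column_names if name not in placeholders]
kept = ["time" if name == "timestep" else name for name in kept]

print(len(column_names), len(placeholders), len(kept))
```

The count of kept columns agrees with the 18 columns of the dataframe displayed earlier in this notebook.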
Update the reader_kwargs dictionary
Before:
``` python
reader_kwargs = {}
```
After:
``` python
reader_kwargs = {}
# - Define delimiter
reader_kwargs["delimiter"] = ","
# - Prevent the first column from becoming the dataframe index
reader_kwargs["index_col"] = False
# - Since column names are expected to be passed explicitly, header is set to None
reader_kwargs["header"] = None
# - Number of rows to be skipped at the beginning of the file
reader_kwargs["skiprows"] = None
# - Define behaviour when encountering bad lines
reader_kwargs["on_bad_lines"] = "skip"
# - Define reader engine
# - C engine is faster
# - Python engine is more feature-complete
reader_kwargs["engine"] = "python"
# - Define on-the-fly decompression of on-disk data
# - Available: gzip, bz2, zip
reader_kwargs["compression"] = "infer"
# - Strings to recognize as NA/NaN and replace with standard NA flags
# - Already included: '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN',
#                     '-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A',
#                     'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'
reader_kwargs["na_values"] = ["na", "", "error"]
```
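The effect of these options can be tried on a small in-memory sample. The sketch below assumes the keyword arguments are forwarded to `pandas.read_csv`; the three-column `sample` text and the shortened `column_names` list are hypothetical and do not reproduce the real station file layout.

```python
import io

import pandas as pd

# Hypothetical three-field sample (not the real raw file layout)
sample = (
    "01-08-2018 12:44:30,0.0,24.9\n"
    "01-08-2018 12:45:01,na,24.9\n"
    "01-08-2018 12:45:30,error,\n"
)
column_names = ["timestep", "rainfall_rate_32bit", "sensor_battery_voltage"]

# Same options as in the reader above
reader_kwargs = {}
reader_kwargs["delimiter"] = ","
reader_kwargs["index_col"] = False
reader_kwargs["header"] = None
reader_kwargs["skiprows"] = None
reader_kwargs["on_bad_lines"] = "skip"
reader_kwargs["engine"] = "python"
reader_kwargs["compression"] = "infer"
reader_kwargs["na_values"] = ["na", "", "error"]

# Assumption: the kwargs are passed through to pandas.read_csv
df = pd.read_csv(io.StringIO(sample), names=column_names, **reader_kwargs)

# "na", "error" and the empty field are all read as missing values
n_missing_rate = int(df["rainfall_rate_32bit"].isna().sum())
n_missing_voltage = int(df["sensor_battery_voltage"].isna().sum())
print(df)
```

Running this kind of check on a few lines copied from your own raw files is a cheap way to confirm the delimiter and the NA flags before processing a whole station.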
Update the df_sanitizer_fun() function
Before:
def df_sanitizer_fun(df):
    # - Import dask or pandas
    import pandas as pd

    # - Add here below the reader required custom code
    pass

    # - Return the dataframe
    return df
After:
def df_sanitizer_fun(df):
    # - Import pandas
    import pandas as pd

    # - Drop invalid columns
    columns_to_drop = ["unknown1", "unknown2", "unknown3", "unknown4", "unknown5", "unknown6"]
    df = df.drop(columns=columns_to_drop)

    # - Convert timestep column to datetime format
    df["time"] = pd.to_datetime(df["timestep"], format="%m-%d-%Y %H:%M:%S")
    df = df.drop(columns=["timestep"])

    # - Return the dataframe
    return df
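Before wiring the function into the reader, it can be exercised on a one-row toy dataframe. This is a minimal sketch: the `raw` dataframe and its `timestep` string are hypothetical values chosen to fit the format string used above, not data taken from the station files.

```python
import pandas as pd


def df_sanitizer_fun(df):
    # Drop the placeholder columns that carry no usable information
    columns_to_drop = ["unknown1", "unknown2", "unknown3", "unknown4", "unknown5", "unknown6"]
    df = df.drop(columns=columns_to_drop)
    # Parse the timestep strings into a datetime "time" column
    df["time"] = pd.to_datetime(df["timestep"], format="%m-%d-%Y %H:%M:%S")
    df = df.drop(columns=["timestep"])
    return df


# Hypothetical one-row input with the placeholder columns present
raw = pd.DataFrame(
    {
        "unknown1": ["a"],
        "unknown2": ["b"],
        "unknown3": ["c"],
        "timestep": ["01-08-2018 12:44:30"],
        "unknown4": ["d"],
        "unknown5": ["e"],
        "rainfall_rate_32bit": [0.0],
        "unknown6": ["f"],
    }
)

clean = df_sanitizer_fun(raw)
print(clean.dtypes)
```

If the format string does not match the raw timestamps, `pd.to_datetime` raises an error by default, so this kind of dry run surfaces a wrong format early.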
You have reached the end of the tutorial. Well done 👋👋👋
At this point, you should be able to create a new reader for your own data. When you think your reader is ready, you can test it by following the Test the DISDRODB L0 processing documentation of the How to Contribute New Data to DISDRODB guidelines.
Do not hesitate to open a GitHub Issue if you need any clarification.
The DISDRODB team hopes you enjoyed this tutorial.