eogrow.pipelines.merge_samples

Implements a pipeline for merging sampled features into numpy arrays fit for training models.

class eogrow.pipelines.merge_samples.MergeSamplesPipeline(config, raw_config=None)

Pipeline to merge sampled data into joined numpy arrays

Parameters:

config (Schema) – A dictionary with configuration parameters
raw_config (RawConfig | None) – The configuration parameters pre-validation, for logging purposes only

pydantic model Schema

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Fields:

field features_to_merge: List[Feature] [Required]: List of all features for which samples are to be merged.

field id_filename: str | None = None: Filename of the array holding the patch IDs of the concatenated features. The patch ID is the index of the patch in the final patch list, so any filtering of the patch list changes the resulting IDs.

field input_folder_key: str [Required]

The storage manager key pointing to the input folder for the merge samples pipeline.


field num_threads: int = 1: Number of threads used to load data from EOPatches in parallel.

field output_folder_key: str [Required]

The storage manager key pointing to the output folder for the merge samples pipeline.

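As an illustration of how the Schema fields fit together, here is a minimal sketch of a configuration written as a Python dictionary. Only the field names come from the Schema above. The storage keys, the feature names, the `(feature type, feature name)` representation of a Feature and the ID filename are all assumptions made for this example.

```python
# Hypothetical configuration values; only the field names come from the Schema.
merge_samples_config = {
    "input_folder_key": "samples",         # assumed storage manager key
    "output_folder_key": "training_data",  # assumed storage manager key
    "features_to_merge": [
        # assumed representation: (feature type, feature name) pairs
        ["data", "FEATURES_SAMPLED"],
        ["mask_timeless", "LABELS_SAMPLED"],
    ],
    "id_filename": "PATCH_ID",  # optional, defaults to None
    "num_threads": 4,           # optional, defaults to 1
}
```

The three required fields must always be present. Leaving out `id_filename` and `num_threads` falls back to the defaults listed above.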

run_procedure()

Procedure which merges data from EOPatches into ML-ready numpy arrays

build_workflow()

Creates a workflow that outputs the requested features

merge_and_save_features(patches)

Merges features from EOPatches and saves data
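The general idea of merging per-patch samples into joined arrays can be sketched with plain NumPy. This is not the pipeline's actual implementation: `load_samples`, `merge_and_save` and the output filenames are hypothetical, and the sketch assumes that samples are concatenated along the first axis and that the patch ID is the position of the patch in the list passed in.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def load_samples(patch_index):
    """Hypothetical loader that stands in for reading one sampled feature.

    Patch i holds i + 1 samples with 3 values each.
    """
    return np.full((patch_index + 1, 3), patch_index, dtype=np.float32)


def merge_and_save(patch_indices, folder, num_threads=1):
    # Load per-patch arrays in parallel; map() preserves the input order.
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        arrays = list(executor.map(load_samples, patch_indices))

    # Join samples from all patches along the first (sample) axis.
    merged = np.concatenate(arrays, axis=0)

    # One ID per sample: the position of its patch in the patch list.
    patch_ids = np.concatenate(
        [np.full(len(array), position) for position, array in enumerate(arrays)]
    )

    # Assumed output filenames, chosen only for this sketch.
    np.save(os.path.join(folder, "FEATURES.npy"), merged)
    np.save(os.path.join(folder, "PATCH_ID.npy"), patch_ids)
    return merged, patch_ids


with tempfile.TemporaryDirectory() as folder:
    merged, patch_ids = merge_and_save([0, 1, 2], folder, num_threads=2)
    reloaded = np.load(os.path.join(folder, "FEATURES.npy"))
```

Because the IDs are positions in the patch list, running the same sketch on a filtered list such as `[0, 2]` assigns different IDs to the samples of the last patch. This is why filtering the patch list changes the contents of the ID array.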