eogrow.pipelines.sampling

Implements different pipelines for sampling from data.

class eogrow.pipelines.sampling.BaseSamplingPipeline(config, raw_config=None)[source]

Bases: Pipeline

Pipeline to run sampling on EOPatches

Parameters:

config (Schema) – A dictionary with configuration parameters
raw_config (RawConfig | None) – The configuration parameters pre-validation, for logging purposes only

pydantic model Schema[source]

Bases: Schema

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Fields:

apply_to (Dict[str, Dict[eolearn.core.constants.FeatureType, List[str]]])
mask_of_samples_name (str | None)
output_folder_key (str)
sampled_suffix (str | None)

field apply_to: Dict[str, Dict[FeatureType, List[str]]] [Required]: A dictionary defining which features to sample, its structure is {folder_key: {feature_type: [feature_name]}}
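As an illustration of that nested structure, here is a minimal sketch of an apply_to value. The folder keys ("features", "reference") and feature names ("BANDS", "LULC") are hypothetical, and feature types are written as plain strings instead of FeatureType members so that the snippet runs without eo-learn installed. The helper list_features is not part of eo-grow; it only shows how the mapping can be read.

```python
# Hypothetical apply_to value with the shape {folder_key: {feature_type: [feature_name]}}.
# Folder keys and feature names below are made up for illustration.
apply_to = {
    "features": {"data": ["BANDS"]},
    "reference": {"mask_timeless": ["LULC"]},
}


def list_features(apply_to):
    """Flatten the nested mapping into (folder_key, feature_type, feature_name) triples."""
    return [
        (folder_key, feature_type, feature_name)
        for folder_key, features in apply_to.items()
        for feature_type, names in features.items()
        for feature_name in names
    ]


flat = list_features(apply_to)
```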

field mask_of_samples_name: str | None = None: The name of a timeless mask output feature that records which pixels were sampled and how many times

field output_folder_key: str [Required]

The storage manager key pointing to the pipeline output folder.

Validated by:

validate_storage_key

field sampled_suffix: str | None = None: If provided, sampled features are saved with this suffix, e.g. for the suffix SAMPLED the sampled FEATURES are saved as FEATURES_SAMPLED.
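The naming rule can be sketched as follows. The helper sampled_name is an illustrative assumption and not an eo-grow function; it simply reproduces the FEATURES to FEATURES_SAMPLED example given in the field description.

```python
from typing import Optional


def sampled_name(feature_name: str, sampled_suffix: Optional[str] = None) -> str:
    """Illustrative helper (not part of eo-grow): append the suffix when one is given."""
    if sampled_suffix is None:
        # No suffix configured: the feature keeps its original name.
        return feature_name
    return f"{feature_name}_{sampled_suffix}"
```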

config: Schema

filter_patch_list(patch_list)[source]

Filters out EOPatches that have already been processed

Parameters: patch_list (List[Tuple[str, BBox]]) –
Return type: List[Tuple[str, BBox]]

build_workflow()[source]

Creates a workflow that is divided into the following sub-parts:

loading data,
preprocessing steps,
sampling features,
saving results.

Return type: EOWorkflow

class eogrow.pipelines.sampling.BaseRandomSamplingPipeline(*args, **kwargs)[source]

Bases: BaseSamplingPipeline

A base class for all sampling pipelines that work on a random selection of samples

Parameters:

config – A dictionary with configuration parameters
raw_config – The configuration parameters pre-validation, for logging purposes only
args (Any) –
kwargs (Any) –

pydantic model Schema[source]

Bases: Schema

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Fields:

seed (Optional[int])

field seed: int | None = 42: A random generator seed used to obtain the same results in every pipeline run.
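The effect of a fixed seed can be sketched with the standard library alone. This is not how eo-grow passes the seed to its sampling task; draw_pixels and its parameters are hypothetical and only demonstrate that the same seed yields the same selection on every run.

```python
import random


def draw_pixels(seed, n_pixels=100, n_samples=5):
    """Pick n_samples distinct pixel indices using a generator seeded with `seed`."""
    rng = random.Random(seed)  # a seed of None would give a different draw each run
    return rng.sample(range(n_pixels), n_samples)


first_run = draw_pixels(42)
second_run = draw_pixels(42)  # identical to first_run because the seed is the same
```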

config: Schema

get_execution_arguments(workflow, patch_list)[source]

Extends the basic method for adding execution arguments by adding seed arguments to a sampling task

Parameters:

workflow (EOWorkflow) –
patch_list (List[Tuple[str, BBox]]) –

Return type:

Dict[str, Dict[EONode, Dict[str, object]]]

class eogrow.pipelines.sampling.FractionSamplingPipeline(*args, **kwargs)[source]

Bases: BaseRandomSamplingPipeline

A pipeline to sample per-class with different distributions

Parameters:

config – A dictionary with configuration parameters
raw_config – The configuration parameters pre-validation, for logging purposes only
args (Any) –
kwargs (Any) –

pydantic model Schema[source]

Bases: Schema

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Fields:

erosion_dict (Optional[Dict[int, List[int]]])
exclude_values (List[int])
fraction_of_samples (Union[float, Dict[int, float]])
sampling_feature_name (str)

field erosion_dict: Dict[int, List[int]] | None = None: A dictionary specifying the disc radius of the erosion operation to be applied to a list of label IDs

field exclude_values: List[int] [Optional]: Values to be excluded from sampling

field fraction_of_samples: float | Dict[int, float] [Required]: A fraction or a dictionary of per-class fractions of valid pixels to sample from the sampling feature.
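The difference between a single fraction and per-class fractions can be sketched on a flat list of labels. This is a simplified illustration, not the sampling task used by the pipeline: sample_by_fraction is a hypothetical helper, and rounding down the per-class count, sampling without replacement, and skipping classes missing from the dictionary are all assumptions made here.

```python
import random


def sample_by_fraction(labels, fraction, exclude_values=(), seed=42):
    """Return sorted indices sampled per class from a flat list of integer labels."""
    rng = random.Random(seed)

    # Group pixel indices by class, dropping excluded values.
    by_class = {}
    for index, label in enumerate(labels):
        if label not in exclude_values:
            by_class.setdefault(label, []).append(index)

    chosen = []
    for label, indices in by_class.items():
        # A float applies to every class; a dict gives one fraction per class
        # (classes missing from the dict are not sampled in this sketch).
        class_fraction = fraction.get(label, 0.0) if isinstance(fraction, dict) else fraction
        count = int(len(indices) * class_fraction)
        chosen.extend(rng.sample(indices, count))
    return sorted(chosen)


labels = [0] * 10 + [1] * 20 + [2] * 10
picked = sample_by_fraction(labels, {1: 0.5, 2: 0.2}, exclude_values=[0])
```

With these inputs, half of the class 1 pixels and a fifth of the class 2 pixels are selected, while class 0 is excluded entirely.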

field sampling_feature_name: str [Required]: Name of the MASK_TIMELESS feature to be used to create sample points

config: Schema

class eogrow.pipelines.sampling.BlockSamplingPipeline(*args, **kwargs)[source]

Bases: BaseRandomSamplingPipeline

A pipeline to randomly sample blocks

Parameters:

config – A dictionary with configuration parameters
raw_config – The configuration parameters pre-validation, for logging purposes only
args (Any) –
kwargs (Any) –

pydantic model Schema[source]

Bases: Schema

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Fields:

fraction_of_samples (Optional[float])
number_of_samples (Optional[int])
sample_size (Tuple[int, int])

field fraction_of_samples: float | None = None

The fraction of samples to be sampled. Exactly one of the parameters fraction_of_samples and number_of_samples has to be given.

Validated by:

cannot_be_used_with_number_of_samples

field number_of_samples: int | None = None: The number of samples to be sampled. Exactly one of the parameters fraction_of_samples and number_of_samples has to be given.
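The "exactly one" rule shared by fraction_of_samples and number_of_samples can be sketched as a plain function. This is not the cannot_be_used_with_number_of_samples validator itself, whose implementation is not shown here; check_sample_count_options is a hypothetical stand-in that expresses the same constraint.

```python
from typing import Optional


def check_sample_count_options(
    fraction_of_samples: Optional[float] = None,
    number_of_samples: Optional[int] = None,
) -> None:
    """Raise ValueError unless exactly one of the two options is set."""
    given = [value is not None for value in (fraction_of_samples, number_of_samples)]
    if sum(given) != 1:
        raise ValueError(
            "Exactly one of fraction_of_samples and number_of_samples has to be given."
        )
```

Calling it with only one of the two arguments passes silently; calling it with both or with neither raises ValueError.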

field sample_size: Tuple[int, int] [Required]: The height and width of each block in pixels.

config: Schema

class eogrow.pipelines.sampling.GridSamplingPipeline(config, raw_config=None)[source]

Bases: BaseSamplingPipeline

A pipeline to sample blocks in a regular grid

Parameters:

config (Schema) – A dictionary with configuration parameters
raw_config (RawConfig | None) – The configuration parameters pre-validation, for logging purposes only

pydantic model Schema[source]

Bases: Schema

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Fields:

sample_size (Tuple[int, int])
stride (Tuple[int, int])

field sample_size: Tuple[int, int] [Required]: The height and width of each block in pixels.

field stride: Tuple[int, int] [Required]: A tuple describing the distance between the upper left corners of two consecutive sampled blocks. The first number is the vertical distance and the second number is the horizontal distance. If stride is smaller than sample_size in any dimension, the sampled blocks will overlap.
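The relation between sample_size and stride can be sketched by listing the upper left corners of the blocks. The helper grid_corners and the patch_shape argument are hypothetical, and keeping only blocks that fit entirely inside the patch is an assumption; the pipeline may treat patch edges differently.

```python
def grid_corners(patch_shape, sample_size, stride):
    """List (row, column) upper left corners of blocks laid out on a regular grid."""
    height, width = patch_shape
    block_height, block_width = sample_size
    vertical_step, horizontal_step = stride  # vertical distance first, then horizontal
    return [
        (row, column)
        for row in range(0, height - block_height + 1, vertical_step)
        for column in range(0, width - block_width + 1, horizontal_step)
    ]


# A stride of 2 with 4x4 blocks: neighbouring blocks share pixels, so they overlap.
corners = grid_corners((6, 6), sample_size=(4, 4), stride=(2, 2))
```

Here the stride is smaller than sample_size in both dimensions, so each block overlaps its neighbours by two pixels vertically and horizontally.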

config: Schema