eogrow.pipelines.sampling

Implements different pipelines for sampling from data.

class eogrow.pipelines.sampling.BaseSamplingPipeline(config, raw_config=None)[source]

Bases: Pipeline

Pipeline to run sampling on EOPatches

Parameters:
  • config (Schema) – A dictionary with configuration parameters

  • raw_config (RawConfig | None) – The configuration parameters pre-validation, for logging purposes only

pydantic model Schema[source]

Bases: Schema

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Fields:
  • apply_to (Dict[str, Dict[eolearn.core.constants.FeatureType, List[str]]])

  • mask_of_samples_name (str | None)

  • output_folder_key (str)

  • sampled_suffix (str | None)

field apply_to: Dict[str, Dict[FeatureType, List[str]]] [Required]

A dictionary defining which features to sample. Its structure is {folder_key: {feature_type: [feature_name]}}.
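To make the nesting concrete, the sketch below writes an apply_to value as a plain Python dictionary and flattens it. The folder key "data", the feature names, and the use of strings in place of FeatureType members are assumptions made for the example, and list_features is a hypothetical helper, not part of eo-grow.

```python
# Hypothetical apply_to value: {folder_key: {feature_type: [feature_name]}}.
# Strings stand in for FeatureType members; the folder key and feature
# names are made up for illustration.
apply_to = {
    "data": {
        "data": ["BANDS", "NDVI"],
        "mask_timeless": ["LABELS"],
    },
}


def list_features(apply_to):
    """Flatten the nested mapping into (folder_key, feature_type, feature_name) triples."""
    return [
        (folder_key, feature_type, name)
        for folder_key, features in apply_to.items()
        for feature_type, names in features.items()
        for name in names
    ]


for triple in list_features(apply_to):
    print(triple)
```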

field mask_of_samples_name: str | None = None

The name of a timeless mask output feature that records which pixels were sampled and how many times.

field output_folder_key: str [Required]

The storage manager key pointing to the pipeline output folder.

Validated by:
  • validate_storage_key

field sampled_suffix: str | None = None

If provided, features are saved with a suffix, e.g. for suffix SAMPLED the sampled FEATURES are saved as FEATURES_SAMPLED.
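The naming rule can be sketched as a small function. The helper sampled_feature_name is hypothetical, and keeping the original name when no suffix is given is an assumption, since the description above only covers the case where a suffix is provided.

```python
from typing import Optional


def sampled_feature_name(feature_name: str, sampled_suffix: Optional[str]) -> str:
    """Hypothetical helper: join name and suffix with an underscore if a suffix is given."""
    if sampled_suffix is None:
        # Assumption: without a suffix the feature keeps its original name.
        return feature_name
    return f"{feature_name}_{sampled_suffix}"


print(sampled_feature_name("FEATURES", "SAMPLED"))  # FEATURES_SAMPLED
print(sampled_feature_name("FEATURES", None))  # FEATURES
```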

config: Schema

filter_patch_list(patch_list)[source]

Filters out EOPatches whose output has already been processed.

Parameters:

patch_list (List[Tuple[str, BBox]]) –

Return type:

List[Tuple[str, BBox]]

build_workflow()[source]

Creates a workflow that is divided into the following sub-parts:

  1. loading data,

  2. preprocessing steps,

  3. sampling features,

  4. saving results.

Return type:

EOWorkflow

class eogrow.pipelines.sampling.BaseRandomSamplingPipeline(*args, **kwargs)[source]

Bases: BaseSamplingPipeline

A base class for all sampling pipelines that work by random selection of samples

Parameters:
  • config – A dictionary with configuration parameters

  • raw_config – The configuration parameters pre-validation, for logging purposes only

  • args (Any) –

  • kwargs (Any) –

pydantic model Schema[source]

Bases: Schema

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Fields:
  • seed (Optional[int])

field seed: int | None = 42

A random generator seed, used to obtain the same results on every pipeline run.

config: Schema

get_execution_arguments(workflow, patch_list)[source]

Extends the basic method for adding execution arguments by adding seed arguments to the sampling task

Parameters:
  • workflow (EOWorkflow) –

  • patch_list (List[Tuple[str, BBox]]) –

Return type:

Dict[str, Dict[EONode, Dict[str, object]]]
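The shape of the returned mapping, patch name to node to keyword arguments, can be illustrated with a sketch that derives one seed per patch from a single pipeline seed. How eo-grow actually derives these seeds is not stated here, so the derivation is an assumption. The function build_seed_arguments and the "sampling_node" key are hypothetical; the real mapping is keyed by EONode objects.

```python
import random


def build_seed_arguments(patch_names, seed):
    """Hypothetical sketch: derive one seed per patch from a single pipeline seed."""
    generator = random.Random(seed)
    # "sampling_node" is a placeholder string; the real mapping uses EONode keys.
    return {
        name: {"sampling_node": {"seed": generator.randrange(2**32)}}
        for name in patch_names
    }


first = build_seed_arguments(["patch_0", "patch_1"], seed=42)
second = build_seed_arguments(["patch_0", "patch_1"], seed=42)
print(first == second)  # True
```

Because every per-patch seed follows from the pipeline seed, repeating the run with the same seed and patch list reproduces the same arguments.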

class eogrow.pipelines.sampling.FractionSamplingPipeline(*args, **kwargs)[source]

Bases: BaseRandomSamplingPipeline

A pipeline to sample per-class with different distributions

Parameters:
  • config – A dictionary with configuration parameters

  • raw_config – The configuration parameters pre-validation, for logging purposes only

  • args (Any) –

  • kwargs (Any) –

pydantic model Schema[source]

Bases: Schema

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Fields:
  • erosion_dict (Optional[Dict[int, List[int]]])

  • exclude_values (List[int])

  • fraction_of_samples (Union[float, Dict[int, float]])

  • sampling_feature_name (str)

field erosion_dict: Dict[int, List[int]] | None = None

A dictionary mapping the disc radius of an erosion operation to the list of label IDs the erosion is applied to

field exclude_values: List[int] [Optional]

Values to be excluded from sampling

field fraction_of_samples: float | Dict[int, float] [Required]

A fraction or a dictionary of per-class fractions of valid pixels to sample from the sampling feature.

field sampling_feature_name: str [Required]

Name of the MASK_TIMELESS feature to be used to create sample points
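How fraction_of_samples and exclude_values interact can be sketched in plain Python on a small label mask. This is a minimal illustration, not the pipeline's implementation: the function sample_by_fraction is hypothetical, erosion is left out, and the choices to round the per-class count, to sample without replacement, and to skip classes missing from the dictionary are all assumptions.

```python
import random
from collections import defaultdict


def sample_by_fraction(labels, fraction_of_samples, exclude_values=(), seed=42):
    """Sample pixel coordinates per class from a 2D list of integer labels.

    fraction_of_samples is one float for all classes or a dict of per-class
    fractions. Assumption: classes missing from the dict are not sampled.
    """
    rng = random.Random(seed)
    # Group pixel coordinates by label value, skipping excluded values.
    pixels_by_class = defaultdict(list)
    for row_idx, row in enumerate(labels):
        for col_idx, value in enumerate(row):
            if value not in exclude_values:
                pixels_by_class[value].append((row_idx, col_idx))

    samples = {}
    for value, pixels in sorted(pixels_by_class.items()):
        if isinstance(fraction_of_samples, dict):
            fraction = fraction_of_samples.get(value, 0.0)
        else:
            fraction = fraction_of_samples
        # Assumption: round to the nearest count and sample without replacement.
        count = int(round(fraction * len(pixels)))
        samples[value] = rng.sample(pixels, count)
    return samples


labels = [
    [0, 1, 1, 2],
    [0, 1, 1, 2],
    [0, 1, 1, 2],
    [0, 1, 1, 2],
]
samples = sample_by_fraction(labels, {1: 0.5, 2: 1.0}, exclude_values=[0])
print({value: len(points) for value, points in samples.items()})  # {1: 4, 2: 4}
```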

config: Schema

class eogrow.pipelines.sampling.BlockSamplingPipeline(*args, **kwargs)[source]

Bases: BaseRandomSamplingPipeline

A pipeline to randomly sample blocks

Parameters:
  • config – A dictionary with configuration parameters

  • raw_config – The configuration parameters pre-validation, for logging purposes only

  • args (Any) –

  • kwargs (Any) –

pydantic model Schema[source]

Bases: Schema

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Fields:
  • fraction_of_samples (Optional[float])

  • number_of_samples (Optional[int])

  • sample_size (Tuple[int, int])

field fraction_of_samples: float | None = None

The fraction of samples to take. Exactly one of the parameters fraction_of_samples and number_of_samples has to be given.

Validated by:
  • cannot_be_used_with_number_of_samples

field number_of_samples: int | None = None

The number of samples to take. Exactly one of the parameters fraction_of_samples and number_of_samples has to be given.

field sample_size: Tuple[int, int] [Required]

The height and width of each block, in pixels.
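The rule that exactly one of fraction_of_samples and number_of_samples has to be given can be sketched as a plain check. The function check_sample_amount is hypothetical and is not the schema's actual validator; it only mirrors the stated constraint.

```python
from typing import Optional


def check_sample_amount(
    fraction_of_samples: Optional[float], number_of_samples: Optional[int]
) -> None:
    """Raise ValueError unless exactly one of the two parameters is set."""
    given = [value is not None for value in (fraction_of_samples, number_of_samples)]
    if sum(given) != 1:
        raise ValueError(
            "Exactly one of fraction_of_samples and number_of_samples has to be given."
        )


check_sample_amount(fraction_of_samples=0.1, number_of_samples=None)  # passes
check_sample_amount(fraction_of_samples=None, number_of_samples=50)  # passes

# Both missing and both present are rejected.
for bad in [(None, None), (0.1, 50)]:
    try:
        check_sample_amount(*bad)
    except ValueError as error:
        print(f"{bad}: {error}")
```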

config: Schema

class eogrow.pipelines.sampling.GridSamplingPipeline(config, raw_config=None)[source]

Bases: BaseSamplingPipeline

A pipeline to sample blocks in a regular grid

Parameters:
  • config (Schema) – A dictionary with configuration parameters

  • raw_config (RawConfig | None) – The configuration parameters pre-validation, for logging purposes only

pydantic model Schema[source]

Bases: Schema

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Fields:
  • sample_size (Tuple[int, int])

  • stride (Tuple[int, int])

field sample_size: Tuple[int, int] [Required]

The height and width of each block, in pixels.

field stride: Tuple[int, int] [Required]

A tuple describing the distance between the upper-left corners of two consecutive sampled blocks. The first number is the vertical distance and the second number is the horizontal distance. If stride is smaller than sample_size in any dimension, then sampled blocks will overlap.
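The relation between sample_size and stride can be sketched by listing the upper-left corners of the blocks on a small raster. The function grid_corners is hypothetical, and dropping blocks that do not fit fully inside the raster is an assumption; the actual sampling task may treat edges differently.

```python
def grid_corners(height, width, sample_size, stride):
    """Return upper-left (row, column) corners of blocks that fit fully inside the raster."""
    block_height, block_width = sample_size
    row_step, col_step = stride
    return [
        (row, col)
        for row in range(0, height - block_height + 1, row_step)
        for col in range(0, width - block_width + 1, col_step)
    ]


# Stride equal to sample_size: blocks tile the raster without overlap.
print(grid_corners(4, 4, sample_size=(2, 2), stride=(2, 2)))
# [(0, 0), (0, 2), (2, 0), (2, 2)]

# Horizontal stride smaller than the block width: neighbouring blocks share a column.
print(grid_corners(4, 4, sample_size=(2, 2), stride=(2, 1)))
# [(0, 0), (0, 1), (0, 2), (2, 0), (2, 1), (2, 2)]
```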

config: Schema