Config language

An important part of the eo-grow framework is its configuration parameters, which are kept separate from the code in the form of JSON files. In addition to the standard JSON syntax, the framework implements a set of language rules that define how configuration parameters are constructed and joined together.

Language rule	Signature	Description	When it is evaluated	Use cases
Config joins	A dictionary key that starts with `**` and points to the file path of another config, e.g. `{ ..., "**any_name": "path/to/another/config.json", ...}`.	Evaluation replaces the key with the keys and values from the referenced config file. The replacement happens recursively. In case of clashes, parameters that already exist in the config have priority. The `**` notation was chosen to resemble `**kwargs` in Python.	When config is read from a file.	For referencing config files with parameters that are shared between pipelines. This rule aims to reduce config and parameter duplication.
Path to the config file	A dictionary value containing `${config_path}`, e.g. `"${config_path}/path/to/a/file"`.	The signature is replaced with the path to the folder containing the current config file. The path is relative to the filesystem and doesn't end with `/`.	When config is read from a file.	Can be used to reference another config file with a path that is relative to the current config location.
Reference a variable	A dictionary value containing `${var:my_variable}` and a subdictionary in the form of `"variables": {"my_variable": "my_value", ...}`.	The signature is replaced with the corresponding value from the `variables` subdictionary, which is removed in the process.	At the pipeline initialization phase.	This reduces the number of duplicated or correlated config parameters and simplifies config parametrization.
Comments	`// A comment at the end of a line` or `/* A multi-line comment */`	The comments are ignored and removed when config is loaded.	When config is read from a file.	To explain why a parameter is set to a certain value.
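The three rules applied when a config is read from a file (comments, `${config_path}` and `**` joins) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not eo-grow's actual implementation: the names `strip_comments` and `load_config` are hypothetical, `${config_path}` is assumed to stand for the folder of the current config file, a joined config is merged only into the dictionary that contains the `**` key (keys already present win), and there is no detection of circular joins.

```python
import json
import os
from typing import Any, Callable


def _read_local_file(path: str) -> str:
    with open(path, encoding="utf-8") as file:
        return file.read()


def strip_comments(text: str) -> str:
    """Remove // and /* */ comments, leaving string literals (e.g. "s3://...") intact."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        char = text[i]
        if in_string:
            out.append(char)
            if char == "\\" and i + 1 < n:  # keep escaped characters such as \" as they are
                out.append(text[i + 1])
                i += 2
                continue
            if char == '"':
                in_string = False
            i += 1
        elif char == '"':
            in_string = True
            out.append(char)
            i += 1
        elif text.startswith("//", i):
            newline = text.find("\n", i)
            i = n if newline == -1 else newline  # the newline itself is kept
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("Unterminated /* comment")
            i = end + 2
        else:
            out.append(char)
            i += 1
    return "".join(out)


def load_config(path: str, read_text: Callable[[str], str] = _read_local_file) -> Any:
    """Hypothetical helper: read a config file and apply the file-reading rules."""
    text = strip_comments(read_text(path))
    # Assumption: ${config_path} is the folder of this config file, without a trailing "/".
    text = text.replace("${config_path}", os.path.dirname(path).rstrip("/"))
    return _resolve_joins(json.loads(text), read_text)


def _resolve_joins(value: Any, read_text: Callable[[str], str]) -> Any:
    """Recursively replace every "**name": "path" entry with the content of that file."""
    if isinstance(value, list):
        return [_resolve_joins(item, read_text) for item in value]
    if not isinstance(value, dict):
        return value
    result = {}
    join_paths = []
    for key, item in value.items():
        if key.startswith("**"):
            join_paths.append(item)
        else:
            result[key] = _resolve_joins(item, read_text)
    for join_path in join_paths:
        joined = load_config(join_path, read_text)  # joins inside the joined file are resolved too
        if not isinstance(joined, dict):
            raise TypeError(f"Joined config {join_path} must be a dictionary")
        for key, item in joined.items():
            result.setdefault(key, item)  # keys already in this config have priority
    return result
```

Reading through an injected `read_text` callable keeps the sketch independent of where config files are stored and makes it easy to try on in-memory strings, e.g. `load_config("configs/main.json", read_text=files.__getitem__)` for a dictionary `files` that maps paths to file contents.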

These rules are applied in 2 stages:

- When the config is read from a file.
  - This step is skipped if configuration parameters are passed to a pipeline object as a Python dictionary.
- At the pipeline initialization phase.
  - If the configuration is passed to a remote instance, this happens on the remote instance.
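Variable substitution, the rule applied at the pipeline initialization phase, can be sketched as below. This is a hedged illustration rather than eo-grow's implementation: the function name `substitute_variables` is hypothetical, the `variables` subdictionary is assumed to sit at the top level of the config, placeholders are interpolated into strings with `str()`, and an undefined variable raises `KeyError`. The placeholder pattern only accepts letters, numbers and `_` in variable names, following the naming rule for variables.

```python
import re
from typing import Any, Dict

# ${var:name}, where name consists only of letters, numbers and "_"
VARIABLE_PATTERN = re.compile(r"\$\{var:([A-Za-z0-9_]+)\}")


def substitute_variables(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: replace ${var:...} placeholders and drop the "variables" entry."""
    config = dict(config)  # shallow copy, so the caller's dictionary keeps its "variables"
    variables = config.pop("variables", {})

    def replace(value: Any) -> Any:
        # Walk any nested combination of dictionaries and lists.
        if isinstance(value, dict):
            return {key: replace(item) for key, item in value.items()}
        if isinstance(value, list):
            return [replace(item) for item in value]
        if isinstance(value, str):
            # Assumption: values are inserted as strings; a missing variable raises KeyError.
            return VARIABLE_PATTERN.sub(lambda match: str(variables[match.group(1)]), value)
        return value

    return replace(config)
```

Because the sketch works on an already loaded dictionary, it also applies when the configuration is passed to a pipeline object directly as a Python dictionary and the file-reading stage is skipped.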

Additional notes:

- Dictionary keys must always be strings.
- Config language interpretation supports any nested combination of dictionaries and lists.
- Variable names can only contain letters, numbers and `_`. Don't use `-`, `.` or any other characters.
- The config language is currently not fully OS-agnostic and might not support Windows file paths.

Pipeline chains

A typical configuration is a dictionary with pipeline parameters. However, it can also be a list of pipeline-execution dictionaries that specify:

- `pipeline_config`: a configuration for a single pipeline,
- `pipeline_resources` (optional): a dictionary that is passed to `ray.remote` to configure which resources the main pipeline process requests from the cluster (see the Ray documentation of `ray.remote` for the available options). By default, the pipeline requests 1 CPU and nothing else.

The order of the dictionaries defines the order in which the pipelines are run. Example:

[
  {
    "pipeline_config": {
      "pipeline": "FirstPipeline",
      "param1": "value1",
      ...
    }
  },
  {
    "pipeline_config": {
      "pipeline": "SecondPipeline",
      "param2": "value2",
      ...
    },
    "pipeline_resources": {"num_cpus": 2}
  },
  ...
]
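The way a chain is interpreted can be sketched as a function that turns it into an ordered list of `(pipeline_config, resources)` pairs. `parse_pipeline_chain` and `DEFAULT_RESOURCES` are hypothetical names; the one-CPU default mirrors the behavior described above, and actually launching each pipeline through Ray is only indicated in a comment, with `run_pipeline` standing for a hypothetical function that builds and runs a single pipeline.

```python
from typing import Any, Dict, List, Tuple

DEFAULT_RESOURCES: Dict[str, Any] = {"num_cpus": 1}  # 1 CPU and nothing else


def parse_pipeline_chain(
    chain: List[Dict[str, Any]],
) -> List[Tuple[Dict[str, Any], Dict[str, Any]]]:
    """Hypothetical helper: return (pipeline_config, resources) pairs in execution order."""
    steps = []
    for index, entry in enumerate(chain):
        if "pipeline_config" not in entry:
            raise ValueError(f"Chain entry {index} has no 'pipeline_config'")
        resources = entry.get("pipeline_resources") or dict(DEFAULT_RESOURCES)
        steps.append((entry["pipeline_config"], resources))
    # With Ray available, each step could then be run one after another, e.g.
    #   ray.get(ray.remote(**resources)(run_pipeline).remote(config))
    return steps
```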

There is currently no functionality to merge multiple pipeline chains, except by manually concatenating their contents into a single file.
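Manual concatenation amounts to joining the lists of the individual chains in the desired order, as in this sketch with hypothetical helpers `concatenate_chains` and `write_chain`. It assumes the chains are already loaded as Python lists; chain files that contain comments need a comment-aware loader rather than plain `json.load`. Since `${config_path}` refers to the location of the current config file, references in chains taken from different folders may resolve differently once their contents are moved into one file.

```python
import json
from typing import Any, Dict, List


def concatenate_chains(*chains: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: join pipeline chains so they run one after another."""
    combined: List[Dict[str, Any]] = []
    for chain in chains:
        if not isinstance(chain, list):
            raise TypeError(f"Expected a pipeline chain (list), got {type(chain).__name__}")
        combined.extend(chain)
    return combined


def write_chain(chain: List[Dict[str, Any]], path: str) -> None:
    """Hypothetical helper: store the combined chain as a single JSON config file."""
    with open(path, "w", encoding="utf-8") as file:
        json.dump(chain, file, indent=2)
```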