ML project configuration management

Introduction to Facebook Hydra

Georg Heiler

May 8, 2021 1 min read

Configuration handling can get quite messy in complex machine learning pipelines. Facebook Research created Hydra to cope with this. Hydra also makes it easy to compose and reconfigure such workflows.

Think of a simple project setup as outlined below:

├── conf
│   └── config.yml 
├── my_ml_script.py

NOTICE: the configuration already lives in its own folder to future-proof the project, i.e. so that specific configurations for each model, derived from some base configuration, can be added later.
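To make the idea of "specific configurations derived from a base configuration" concrete, here is a minimal sketch in plain Python. It only illustrates the concept and is not how Hydra implements composition; the `deep_merge` helper and the `model` keys are hypothetical names chosen for this example.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which values from `override` win over `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are nested sections: merge them recursively.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalar or new key: the override simply replaces the base value.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical base configuration shared by all models.
base_config = {"model": {"name": "baseline", "learning_rate": 0.01, "epochs": 10}}

# Hypothetical model-specific configuration that lists only what differs.
big_model_override = {"model": {"name": "big", "epochs": 50}}

big_model_config = deep_merge(base_config, big_model_override)
print(big_model_config)
```

The model-specific part stays small because it only names the values that deviate from the base, which is the main benefit of keeping configurations in a folder rather than in one large file.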

The config.yml file contains:

db:
  driver: mysql
  user: omry

Then you can use it directly in a Python script, my_ml_script.py:

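import logging

import hydra
from omegaconf import DictConfig, OmegaConf

log = logging.getLogger(__name__)


# config_path is resolved relative to this script; Hydra loads the named
# config, applies any command-line overrides and passes the result as cfg.
@hydra.main(config_path="conf", config_name="config.yml")
def my_app(cfg: DictConfig) -> None:
    # Log the fully resolved configuration as YAML.
    log.info(OmegaConf.to_yaml(cfg))


if __name__ == "__main__":
    my_app()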

and call it, overriding configuration values on the command line:

python my_ml_script.py db.driver=postgresql
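The dotted syntax addresses a nested key: `db.driver` means the `driver` entry inside the `db` section. The following plain-Python sketch shows that mapping under simplifying assumptions. It is not Hydra's actual override parser (which has a richer grammar and converts value types); `apply_overrides` is a hypothetical helper that keeps every value as a string.

```python
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply `a.b=value` style overrides to a nested dict, in place."""
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            # Walk down the nesting, creating missing sections on the way.
            node = node.setdefault(part, {})
        node[leaf] = value  # values stay plain strings in this sketch
    return config


# Same content as the config file shown earlier.
config = {"db": {"driver": "mysql", "user": "omry"}}
apply_overrides(config, ["db.driver=postgresql"])
print(config)
```

Only the addressed key changes; `db.user` keeps the value from the file.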

However, this decorator-based approach does not work directly in our beloved interactive Jupyter notebook environment. It is not too complicated to get it working there as well. A couple of other imports are needed, and the initialize function has to be called manually:

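# In Hydra 1.1 and later, these can be imported from the top-level package:
# from hydra import compose, initialize
from hydra.experimental import compose, initialize
from omegaconf import OmegaConf

# Point Hydra at the config folder (relative to the notebook).
initialize(config_path="conf")
# Compose the configuration; overrides replace the command-line arguments.
cfg = compose(config_name="config.yml", overrides=["db.driver=postgres", "db.user=me"])
print(OmegaConf.to_yaml(cfg))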

