ML project configuration management
Introduction to Facebook Hydra
Configuration handling can get quite messy in complex machine learning pipelines. Facebook Research created Hydra to cope with this. Hydra also makes it easy to compose and reconfigure such workflows.
Consider a simple project set up as outlined below:
├── conf
│   └── config.yml
└── my_ml_script.py
NOTICE: the configuration already lives in its own folder to future-proof the setup, i.e., so that model-specific configurations derived from a base configuration can be added later.
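To illustrate what "deriving model-specific settings from a base configuration" means, here is a minimal sketch using plain Python dicts. It is not how Hydra implements composition (Hydra builds the final config from its defaults list and config groups and merges them with OmegaConf). The deep_merge helper and the model keys below are hypothetical and only show the merge semantics: nested values from the model-specific config win, everything else is inherited from the base.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `override` merged recursively on top of `base`.

    Hypothetical helper for illustration only; neither input is mutated.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists or new keys: the override wins.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical base configuration shared by all models.
base = {"db": {"driver": "mysql", "user": "omry"}, "model": {"lr": 0.01, "epochs": 10}}
# Hypothetical model-specific configuration derived from the base.
xgboost = {"model": {"name": "xgboost", "lr": 0.1}}

cfg = deep_merge(base, xgboost)
print(cfg["model"])  # {'lr': 0.1, 'epochs': 10, 'name': 'xgboost'}
```

The base stays untouched, so several model-specific configs can be derived from the same base without interfering with each other.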
The config.yml file contains:
db:
  driver: mysql
  user: omry
You can then use it directly in a Python script:
import logging

import hydra
from omegaconf import DictConfig, OmegaConf

log = logging.getLogger(__name__)


@hydra.main(config_path="conf", config_name="config.yml")
def my_app(cfg: DictConfig) -> None:
    log.info(OmegaConf.to_yaml(cfg))


if __name__ == "__main__":
    my_app()
and run it, overriding configuration values from the command line:
python my_ml_script.py db.driver=postgresql
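Conceptually, each key=value argument follows the dotted path into the nested configuration and replaces the leaf value. The sketch below mimics this with plain dicts; apply_overrides is a hypothetical helper, not part of Hydra. Hydra itself parses overrides with its own grammar, converts values to typed ones (ints, floats, lists, ...) and rejects keys that do not already exist in the config unless they are prefixed with +. This sketch keeps values as strings and only reproduces the "unknown key" check.

```python
from copy import deepcopy
from typing import Any, Dict, List


def apply_overrides(cfg: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Return a copy of `cfg` with each 'dotted.key=value' override applied.

    Hypothetical helper for illustration only; values stay plain strings.
    """
    result = deepcopy(cfg)
    for override in overrides:
        dotted_key, sep, value = override.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {override!r}")
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            # Walk down the nested mapping, refusing paths that do not exist.
            if part not in node or not isinstance(node[part], dict):
                raise KeyError(f"unknown key {part!r} in {dotted_key!r}")
            node = node[part]
        if leaf not in node:
            # Like Hydra without a '+' prefix: only existing keys may be overridden.
            raise KeyError(f"unknown key {dotted_key!r}")
        node[leaf] = value
    return result


cfg = {"db": {"driver": "mysql", "user": "omry"}}
print(apply_overrides(cfg, ["db.driver=postgresql"]))
# {'db': {'driver': 'postgresql', 'user': 'omry'}}
```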
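However, the @hydra.main approach does not work directly in our beloved interactive Jupyter notebook environment, because the decorator expects to own the program's entry point and parse its command-line arguments.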
Fortunately, getting it to work in Jupyter is not complicated. A few more imports are needed, and the initialize function has to be called manually before compose:
from hydra.experimental import compose, initialize
from omegaconf import OmegaConf

initialize(config_path="conf")
cfg = compose(config_name="config.yml", overrides=["db.driver=postgres", "db.user=me"])
print(OmegaConf.to_yaml(cfg))