ML project configuration management
Configuration handling can get quite messy in complex machine learning pipelines. Facebook Research created Hydra to cope with this: it makes it easy to compose and re-configure such workflows.
Think of a simple project setup as outlined below:
├── conf
│   └── config.yml
└── my_ml_script.py
NOTICE: the configuration is already set up as a folder to future-proof it, i.e. so that specific configurations for each model, derived from some base configuration, can be added later.
The config.yml file contains:
db:
  driver: mysql
  user: omry
Then you can use it directly in a Python script, my_ml_script.py:
import logging

import hydra
from omegaconf import DictConfig, OmegaConf

log = logging.getLogger(__name__)

# config_path is resolved relative to this file.
@hydra.main(config_path="conf", config_name="config.yml")
def my_app(cfg: DictConfig) -> None:
    # Log the composed configuration (including command-line overrides) as YAML.
    log.info(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    my_app()
and call it, optionally overriding configuration values on the command line:
python my_ml_script.py db.driver=postgresql
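To make the effect of such a dotted override concrete, here is a minimal pure-Python sketch of the idea. It is not Hydra's implementation: `apply_overrides` is a hypothetical helper, it keeps every value as a plain string, and it ignores Hydra's value typing and its stricter handling of keys that are not already in the config.

```python
from copy import deepcopy
from typing import Any


def apply_overrides(config: dict[str, Any], overrides: list[str]) -> dict[str, Any]:
    """Return a copy of `config` with `key.path=value` overrides applied.

    Hypothetical helper for illustration only, not part of Hydra.
    """
    result = deepcopy(config)
    for item in overrides:
        dotted_key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk down the nested mapping, creating levels that are missing.
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


base = {"db": {"driver": "mysql", "user": "omry"}}
merged = apply_overrides(base, ["db.driver=postgresql"])
print(merged)
```

Only the overridden key changes; `db.user` keeps the value from the file, and the original `base` mapping is left untouched.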
However, this does not work directly in our beloved interactive Jupyter notebook environment.
But it is not too complicated to get it to work in Jupyter as well.
Only a few more imports are needed, and the initialize function needs to be called manually:
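# On Hydra 1.1 and later, import these from `hydra` directly;
# `hydra.experimental` is the older location and is deprecated.
from hydra.experimental import compose, initialize
from omegaconf import OmegaConf

initialize(config_path="conf")
# Compose the config and apply the same kind of overrides as on the command line.
cfg = compose(config_name="config.yml", overrides=["db.driver=postgres", "db.user=me"])
print(OmegaConf.to_yaml(cfg))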