Dockerized Model Training with MLflow
-------------------------------------
This directory contains an MLflow project that trains a linear regression model on the UC Irvine
Wine Quality Dataset. The project uses a Docker image to capture the dependencies needed to run
training code. Running a project in a Docker environment (as opposed to Conda) allows for capturing
non-Python dependencies, e.g. Java libraries. In the future, we also hope to add tools to MLflow
for running Dockerized projects, e.g. on a Kubernetes cluster for scale-out.

Structure of this MLflow Project
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This MLflow project contains a ``train.py`` file that trains a scikit-learn model and uses
MLflow Tracking APIs to log the model and its metadata (e.g., hyperparameters and metrics)
for later use and reference. ``train.py`` operates on the Wine Quality Dataset, which is included
in ``wine-quality.csv``.

Most importantly, the project also includes an ``MLproject`` file, which specifies the Docker
container environment in which to run the project using the ``docker_env`` field:

.. code-block:: yaml

  docker_env:
    image: mlflow-docker-example

Here, ``image`` can be any valid argument to ``docker run``, such as the tag, ID or URL of a Docker
image (see `Docker docs <https://docs.docker.com/engine/reference/run/#general-form>`_). The above
example references a locally-stored image (``mlflow-docker-example``) by tag.

Finally, the project includes a ``Dockerfile`` that is used to build the image referenced by the
``MLproject`` file. The ``Dockerfile`` specifies library dependencies required by the project, such
as ``mlflow`` and ``scikit-learn``.

Running this Example
^^^^^^^^^^^^^^^^^^^^

First, install MLflow (via ``pip install mlflow``) and install
`Docker <https://www.docker.com/get-started>`_.

Then, build the image for the project's Docker container environment. You must use the same image
name that is given by the ``docker_env.image`` field of the ``MLproject`` file. In this example, the
image name is ``mlflow-docker-example``. Issue the following command to build an image with this
name:

.. code-block:: bash

  docker build -t mlflow-docker-example -f Dockerfile .

Note that the name of the image used in the ``docker build`` command, ``mlflow-docker-example``,
matches the name of the image referenced in the ``MLproject`` file.

Finally, run the example project using ``mlflow run examples/docker -P alpha=0.5``.

.. note::
    If running this example on a Mac with Apple silicon, ensure that Docker Desktop is running and
    that you are logged in to the Docker Desktop service.
    If you are modifying the example ``Dockerfile`` to specify older versions of ``scikit-learn``,
    you should enable `Rosetta compatibility <https://docs.docker.com/desktop/settings/mac/#features-in-development>`_
    in the Docker Desktop configuration settings to ensure that the appropriate ``cython`` compiler is used.

What happens when the project is run?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Running ``mlflow run examples/docker`` builds a new Docker image based on ``mlflow-docker-example``
that also contains our project code. The resulting image is tagged as
``mlflow-docker-example-<git-version>`` where ``<git-version>`` is the git commit ID. After the image is
built, MLflow executes the default (main) project entry point within the container using ``docker run``.

Environment variables, such as ``MLFLOW_TRACKING_URI``, are propagated inside the container during
project execution. When running against a local tracking URI, MLflow mounts the host system's
tracking directory (e.g., a local ``mlruns`` directory) inside the container so that metrics and
params logged during project execution are accessible afterwards.
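
Because each ``mlflow run`` invocation records its params and metrics as a separate run, the
``mlflow run examples/docker -P alpha=0.5`` command can be scripted to compare several values of
``alpha``. The following is a minimal sketch, not part of this example project. It assumes the
``mlflow`` CLI is on the ``PATH``, the image has already been built, and the script is started
from the directory in which ``examples/docker`` resolves. ``build_mlflow_run_command`` and
``run_sweep`` are hypothetical helper names introduced here for illustration.

.. code-block:: python

  import shlex
  import subprocess


  def build_mlflow_run_command(project_uri, params):
      """Return the argv list for ``mlflow run <project_uri> -P key=value ...``."""
      command = ["mlflow", "run", project_uri]
      for key, value in params.items():
          # Each project parameter is passed as its own -P key=value pair.
          command += ["-P", f"{key}={value}"]
      return command


  def run_sweep(project_uri, alphas):
      """Run the project once per alpha value, stopping at the first failure."""
      for alpha in alphas:
          command = build_mlflow_run_command(project_uri, {"alpha": alpha})
          print("Running:", shlex.join(command))
          # check=True raises CalledProcessError if mlflow exits non-zero.
          subprocess.run(command, check=True)

For example, ``run_sweep("examples/docker", [0.1, 0.5, 1.0])`` would issue three runs in sequence.
The values 0.1 and 1.0 are illustrative only. Passing the command as a list rather than a single
string avoids shell quoting issues with parameter values.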