---
sidebar_position: 15
toc_max_heading_level: 4
sidebar_label: Tutorial
---

import { APILink } from "@site/src/components/APILink";

# Model Registry Tutorials

Explore the full functionality of the Model Registry in this tutorial, from registering a model and inspecting its structure to loading a specific model version for further use.

## Model Registry

Throughout this tutorial we will use a local tracking server and model registry for simplicity.
However, for production use cases we recommend using a
[remote tracking server](/ml/tracking/tutorials/remote-server).

### Step 0: Install Dependencies

```bash
pip install --upgrade mlflow
```

### Step 1: Register a Model

To use the MLflow Model Registry, you need to add your MLflow models to it. You can register a
model via either of the following commands:

- `mlflow.<model_flavor>.log_model(registered_model_name=<model_name>)`: registers the model
  **while** logging it to the tracking server.
- `mlflow.register_model(<model_uri>, <model_name>)`: registers the model **after** it has been
  logged to the tracking server. Note that you must log the model before running this command in
  order to have a model URI.

MLflow supports many model flavors. In the example below, we use scikit-learn's
`RandomForestRegressor` to demonstrate the simplest way to register a model, but note that you
can use any [supported model flavor](/ml/model#models_built-in-model-flavors).
In the code snippet below, we start an MLflow run and train a random forest model. We then log some
relevant hyperparameters and the model's mean squared error (MSE), and finally log and register the
model itself.
```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

import mlflow
import mlflow.sklearn

with mlflow.start_run() as run:
    X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    params = {"max_depth": 2, "random_state": 42}
    model = RandomForestRegressor(**params)
    model.fit(X_train, y_train)

    # Log parameters and metrics using the MLflow APIs
    mlflow.log_params(params)

    y_pred = model.predict(X_test)
    mlflow.log_metrics({"mse": mean_squared_error(y_test, y_pred)})

    # Log the sklearn model and register as version 1
    mlflow.sklearn.log_model(
        sk_model=model,
        name="sklearn-model",
        input_example=X_train,
        registered_model_name="sk-learn-random-forest-reg-model",
    )
```

```bash title="Example Output"
Successfully registered model 'sk-learn-random-forest-reg-model'.
Created version '1' of model 'sk-learn-random-forest-reg-model'.
```

Great! We've registered a model.

Before moving on, let's highlight some important implementation notes.

- To register a model, you can use the `registered_model_name` parameter in <APILink fn="mlflow.sklearn.log_model" />
  or call <APILink fn="mlflow.register_model" /> after logging the model. Generally, we suggest the former because it's more
  concise.
- [Model Signatures](/ml/model/signatures)
  provide validation for our model inputs and outputs. Passing an `input_example` to `log_model()`
  automatically infers and logs a signature. Again, we suggest this approach because
  it's concise.
## Explore the Registered Model

Now that we've logged an experiment and registered the model associated with that experiment run,
let's observe how this information is actually stored, both in the MLflow UI and in our local
directory. Note that we could also get this information programmatically, but for explanatory purposes
we'll use the MLflow UI.

### Step 1: Explore the `mlruns` Directory

Given that we're using our local filesystem as our tracking server and model registry, let's observe
the directory structure created when running the Python script in the prior step.

Before diving in, it's important to note that MLflow is designed to abstract complexity from the user
and this directory structure is shown for illustration purposes only. Furthermore, in remote deployments,
which are recommended for production use cases, the tracking server artifacts will be stored in
an object store (S3, ADLS, GCS, etc.) and the model registry will be backed by a relational database
(PostgreSQL, MySQL, etc.).
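As a concrete sketch of such a production setup, a tracking server can be launched against a database backend store and an object-store artifact destination. The URIs below are placeholders, not values from this tutorial:

```bash
mlflow server \
  --backend-store-uri postgresql://user:password@db-host:5432/mlflow \
  --artifacts-destination s3://my-mlflow-bucket/artifacts \
  --host 0.0.0.0 \
  --port 8080
```

See the [remote tracking server](/ml/tracking/tutorials/remote-server) tutorial for a full walkthrough of this configuration.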
```
mlruns/
├── 0/                                        # Experiment ID
│   ├── bc6dc2a4f38d47b4b0c99d154bbc77ad/     # Run ID
│   │   ├── metrics/
│   │   │   └── mse                           # Example metric file for mean squared error
│   │   ├── artifacts/                        # Artifacts associated with our run
│   │   │   └── sklearn-model/
│   │   │       ├── python_env.yaml
│   │   │       ├── requirements.txt          # Python package requirements
│   │   │       ├── MLmodel                   # MLflow model file with model metadata
│   │   │       ├── model.pkl                 # Serialized model file
│   │   │       ├── input_example.json
│   │   │       └── conda.yaml
│   │   ├── tags/
│   │   │   ├── mlflow.user
│   │   │   ├── mlflow.source.git.commit
│   │   │   ├── mlflow.runName
│   │   │   ├── mlflow.source.name
│   │   │   ├── mlflow.log-model.history
│   │   │   └── mlflow.source.type
│   │   ├── params/
│   │   │   ├── max_depth
│   │   │   └── random_state
│   │   └── meta.yaml
│   └── meta.yaml
├── models/                                   # Model Registry Directory
│   └── sk-learn-random-forest-reg-model/     # Registered model name
│       ├── version-1/                        # Model version directory
│       │   └── meta.yaml
│       └── meta.yaml
```

The tracking server is organized by _Experiment ID_ and _Run ID_ and is responsible for storing our
experiment artifacts, parameters, and metrics. The model registry, on the other hand, only stores
metadata with pointers to our tracking server.

As you can see, flavors that support [autologging](/ml/tracking/autolog) provide lots of additional
information out-of-the-box. Also note that even if autologging isn't available for our model of
interest, we can easily store this information with explicit logging calls.

One more interesting callout: by default you get three ways to manage your model's
environment: `python_env.yaml` (Python virtualenv), `requirements.txt` (PyPI requirements), and
`conda.yaml` (conda environment).

Now that we have a very high-level understanding of what is logged, let's use the MLflow UI to
view this information.
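To see what "metadata with pointers" looks like in practice, here is an illustrative sketch of the registry's `version-1/meta.yaml` file. The exact fields vary by MLflow version, and the `source` path, `run_id`, and timestamps below are placeholders:

```yaml
name: sk-learn-random-forest-reg-model
version: 1
creation_timestamp: 1700000000000
last_updated_timestamp: 1700000000000
source: file:///path/to/mlruns/0/bc6dc2a4f38d47b4b0c99d154bbc77ad/artifacts/sklearn-model
run_id: bc6dc2a4f38d47b4b0c99d154bbc77ad
status: READY
```

Note that the model weights themselves are not duplicated here; the `source` field simply points back at the artifact location in the tracking server.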
### Step 2: Start the Tracking Server

In the same directory as your `mlruns` folder, run the below command.

```bash
mlflow server --host 127.0.0.1 --port 8080
```

```
INFO: Started server process [26393]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8080 (Press CTRL+C to quit)
```

### Step 3: View the Tracking Server

Assuming there are no errors, you can visit `http://localhost:8080` in your web browser to
view the MLflow UI.

First, let's leave the experiment tracking tab and visit the model registry.

<div className="center-div" style={{ width: 1024, maxWidth: "100%" }}>

</div>

Next, let's add tags and a model version alias to
[facilitate model deployment](/ml/model-registry/workflow/#deploy-and-organize-models-with-aliases-and-tags).
You can add or edit tags and aliases by clicking the corresponding `Add` link or pencil icon in
the model version table. Let's...

1. Add a model version tag with a key of `problem_type` and value of `regression`.
2. Add a model version alias of `the_best_model_ever`.

<div className="center-div" style={{ width: 1024, maxWidth: "100%" }}>

</div>

## Load a Registered Model

To perform inference on a registered model version, we need to load it into memory. There are many
ways to find a model version, and the best method depends on the information you have
available. In the spirit of a quickstart, the code snippet below shows the simplest way to
load a model from the model registry via a specific model URI and perform inference.
```python
import mlflow.sklearn
from sklearn.datasets import make_regression

model_name = "sk-learn-random-forest-reg-model"
model_version = "latest"

# Load the model from the Model Registry
model_uri = f"models:/{model_name}/{model_version}"
model = mlflow.sklearn.load_model(model_uri)

# Generate a new dataset for prediction and predict
X_new, _ = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)
y_pred_new = model.predict(X_new)

print(y_pred_new)
```

Note that if you're using a different supported model flavor, you should use that flavor's
specific load method, e.g. `mlflow.<flavor>.load_model()`. If the model flavor is
not supported, you should use <APILink fn="mlflow.pyfunc.load_model" />. Throughout this tutorial
we use sklearn for demonstration purposes.

### Example 0: Load via Tracking Server

A model URI is a unique identifier for a serialized model. Given that the model artifact is stored with
experiments in the tracking server, you can use the below model URIs to bypass the model registry
and load the artifact into memory.

1. **Absolute local path**: `mlflow.sklearn.load_model("/Users/me/path/to/local/model")`
2. **Relative local path**: `mlflow.sklearn.load_model("relative/path/to/local/model")`
3. **Run ID**: `mlflow.sklearn.load_model(f"runs:/{mlflow_run_id}/{run_relative_path_to_model}")`

However, unless you're in the same environment in which you logged the model, you typically won't have
the above information. Instead, you should load the model using the model's name and
version.
### Example 1: Load via Name and Version

To load a model into memory via its `model_name` and monotonically increasing `model_version`,
use the below method:

```python
model = mlflow.sklearn.load_model(f"models:/{model_name}/{model_version}")
```

While this method is quick and easy, a monotonically increasing model version lacks flexibility.
Often, it's more effective to use a model version alias.

### Example 2: Load via Model Version Alias

Model version aliases are user-defined identifiers for a model version. Because they're mutable after
model registration, they decouple model versions from the code that uses them.

For instance, let's say we have a model version alias called `production_model`, corresponding to
a production model. When our team builds a better model that is ready for deployment, we don't have
to change our serving workload code. Instead, in MLflow we reassign the `production_model` alias
from the old model version to the new one. This can be done simply in the UI. In the API, we run
`client.set_registered_model_alias()` with the same model name, alias name, and **new** model version
number. It's that easy!

In the prior section, we added a model version alias to our model via the UI, but here's a programmatic example.
```python
import mlflow.sklearn
from mlflow import MlflowClient

client = MlflowClient()

# Set the model version alias
model_name = "sk-learn-random-forest-reg-model"
model_version_alias = "the_best_model_ever"
client.set_registered_model_alias(model_name, model_version_alias, "1")  # Duplicate of the UI step

# Get information about the model
model_info = client.get_model_version_by_alias(model_name, model_version_alias)
model_tags = model_info.tags
print(model_tags)

# Get the model version using a model URI
model_uri = f"models:/{model_name}@{model_version_alias}"
model = mlflow.sklearn.load_model(model_uri)

print(model)
```

```_ title="Output"
{'problem_type': 'regression'}
RandomForestRegressor(max_depth=2, random_state=42)
```

Model version aliases are highly dynamic and can correspond to anything that is meaningful for your
team. The most common example is a deployment state. For instance, let's say we have a `champion`
model in production but are developing a `challenger` model that will hopefully outperform our
production model. You can use `champion` and `challenger` model version aliases to uniquely
identify these model versions for easy access.

That's it! You should now be comfortable...

1. Registering a model
2. Finding a model and modifying the tags and model version alias via the MLflow UI
3. Loading the registered model for inference