---
sidebar_position: 15
toc_max_heading_level: 4
sidebar_label: Tutorial
---

import { APILink } from "@site/src/components/APILink";

# Model Registry Tutorials

Explore the full functionality of the Model Registry in this tutorial, from registering a model and inspecting its structure to loading a specific model version for further use.

## Model Registry

Throughout this tutorial we will use a local tracking server and model registry for simplicity.
However, for production use cases we recommend using a
[remote tracking server](/ml/tracking/tutorials/remote-server).

### Step 0: Install Dependencies

```bash
pip install --upgrade mlflow
```

### Step 1: Register a Model

To use the MLflow Model Registry, you need to add your MLflow models to it. You can register a
model via either of the following commands:

- `mlflow.<model_flavor>.log_model(registered_model_name=<model_name>)`: registers the model
  **while** logging it to the tracking server.
- `mlflow.register_model(<model_uri>, <model_name>)`: registers the model **after** it has been
  logged to the tracking server. Note that you must log the model before running this command in
  order to have a model URI.

MLflow supports many model flavors. In the example below, we use scikit-learn's
`RandomForestRegressor` to demonstrate the simplest way to register a model, but note that you
can use any [supported model flavor](/ml/model#models_built-in-model-flavors).
In the code snippet below, we start an MLflow run and train a random forest model. We then log some
relevant hyperparameters and the model's mean squared error (MSE), and finally log and register the
model itself.
```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

import mlflow
import mlflow.sklearn

with mlflow.start_run() as run:
    X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    params = {"max_depth": 2, "random_state": 42}
    model = RandomForestRegressor(**params)
    model.fit(X_train, y_train)

    # Log parameters and metrics using the MLflow APIs
    mlflow.log_params(params)

    y_pred = model.predict(X_test)
    mlflow.log_metrics({"mse": mean_squared_error(y_test, y_pred)})

    # Log the sklearn model and register as version 1
    mlflow.sklearn.log_model(
        sk_model=model,
        name="sklearn-model",
        input_example=X_train,
        registered_model_name="sk-learn-random-forest-reg-model",
    )
```

```bash title="Example Output"
Successfully registered model 'sk-learn-random-forest-reg-model'.
Created version '1' of model 'sk-learn-random-forest-reg-model'.
```

Great! We've registered a model.

Before moving on, let's highlight some important implementation notes.

- To register a model, you can use the `registered_model_name` parameter in <APILink fn="mlflow.sklearn.log_model" />
  or call <APILink fn="mlflow.register_model" /> after logging the model. Generally, we suggest the former because it's more
  concise.
- [Model Signatures](/ml/model/signatures)
  provide validation for our model inputs and outputs. Passing an `input_example` to `log_model()`
  automatically infers and logs a signature. Again, we suggest this approach because
  it's concise.
## Explore the Registered Model

Now that we've logged an experiment and registered the model associated with that experiment run,
let's observe how this information is actually stored, both in the MLflow UI and in our local
directory. Note that we could also get this information programmatically, but for explanatory purposes
we'll use the MLflow UI.

### Step 1: Explore the `mlruns` Directory

Given that we're using our local filesystem as our tracking server and model registry, let's observe
the directory structure created when running the Python script in the prior step.

Before diving in, it's important to note that MLflow is designed to abstract complexity from the user
and this directory structure is shown for illustration purposes only. Furthermore, in remote deployments,
which are recommended for production use cases, the tracking server artifacts will be stored in
an object store (S3, ADLS, GCS, etc.) and the model registry will be backed by a relational database
(PostgreSQL, MySQL, etc.).
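As a concrete sketch of such a production setup, a tracking server can be launched against a database backend store and an object-store artifact destination. The URIs below are placeholders, not values from this tutorial:

```bash
mlflow server \
  --backend-store-uri postgresql://user:password@db-host:5432/mlflow \
  --artifacts-destination s3://my-mlflow-bucket/artifacts \
  --host 0.0.0.0 \
  --port 8080
```

See the [remote tracking server](/ml/tracking/tutorials/remote-server) tutorial for a full walkthrough of this configuration.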
```
mlruns/
├── 0/                                        # Experiment ID
│   ├── bc6dc2a4f38d47b4b0c99d154bbc77ad/     # Run ID
│   │   ├── metrics/
│   │   │   └── mse                           # Example metric file for mean squared error
│   │   ├── artifacts/                        # Artifacts associated with our run
│   │   │   └── sklearn-model/
│   │   │       ├── python_env.yaml
│   │   │       ├── requirements.txt          # Python package requirements
│   │   │       ├── MLmodel                   # MLflow model file with model metadata
│   │   │       ├── model.pkl                 # Serialized model file
│   │   │       ├── input_example.json
│   │   │       └── conda.yaml
│   │   ├── tags/
│   │   │   ├── mlflow.user
│   │   │   ├── mlflow.source.git.commit
│   │   │   ├── mlflow.runName
│   │   │   ├── mlflow.source.name
│   │   │   ├── mlflow.log-model.history
│   │   │   └── mlflow.source.type
│   │   ├── params/
│   │   │   ├── max_depth
│   │   │   └── random_state
│   │   └── meta.yaml
│   └── meta.yaml
├── models/                                   # Model Registry Directory
│   └── sk-learn-random-forest-reg-model/     # Registered model name
│       ├── version-1/                        # Model version directory
│       │   └── meta.yaml
│       └── meta.yaml
```

The tracking server is organized by _Experiment ID_ and _Run ID_ and is responsible for storing our
experiment artifacts, parameters, and metrics. The model registry, on the other hand, only stores
metadata with pointers to our tracking server.

As you can see, flavors that support [autologging](/ml/tracking/autolog) provide lots of additional
information out-of-the-box. Also note that even if autologging isn't available for our model of
interest, we can easily store this information with explicit logging calls.

One more interesting callout: by default you get three ways to manage your model's
environment: `python_env.yaml` (Python virtualenv), `requirements.txt` (PyPI requirements), and
`conda.yaml` (conda environment).

Now that we have a very high-level understanding of what is logged, let's use the MLflow UI to
view this information.
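To see what "metadata with pointers" looks like in practice, here is an illustrative sketch of the registry's `version-1/meta.yaml` file. The exact fields vary by MLflow version, and the `source` path, `run_id`, and timestamps below are placeholders:

```yaml
name: sk-learn-random-forest-reg-model
version: 1
creation_timestamp: 1700000000000
last_updated_timestamp: 1700000000000
source: file:///path/to/mlruns/0/bc6dc2a4f38d47b4b0c99d154bbc77ad/artifacts/sklearn-model
run_id: bc6dc2a4f38d47b4b0c99d154bbc77ad
status: READY
```

Note that the model weights themselves are not duplicated here; the `source` field simply points back at the artifact location in the tracking server.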
### Step 2: Start the Tracking Server

In the same directory as your `mlruns` folder, run the below command.

```bash
mlflow server --host 127.0.0.1 --port 8080
```

```
INFO: Started server process [26393]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8080 (Press CTRL+C to quit)
```

### Step 3: View the Tracking Server

Assuming there are no errors, you can visit `http://localhost:8080` in your web browser to
view the MLflow UI.

First, let's leave the experiment tracking tab and visit the model registry.

<div className="center-div" style={{ width: 1024, maxWidth: "100%" }}>

</div>

Next, let's add tags and a model version alias to
[facilitate model deployment](/ml/model-registry/workflow/#deploy-and-organize-models-with-aliases-and-tags).
You can add or edit tags and aliases by clicking the corresponding `Add` link or pencil icon in
the model version table. Let's...

1. Add a model version tag with a key of `problem_type` and value of `regression`.
2. Add a model version alias of `the_best_model_ever`.

<div className="center-div" style={{ width: 1024, maxWidth: "100%" }}>

</div>

## Load a Registered Model

To perform inference on a registered model version, we need to load it into memory. There are many
ways to find a model version, and the best method depends on the information you have
available. In the spirit of a quickstart, the code snippet below shows the simplest way to
load a model from the model registry via a specific model URI and perform inference.
```python
import mlflow.sklearn
from sklearn.datasets import make_regression

model_name = "sk-learn-random-forest-reg-model"
model_version = "latest"

# Load the model from the Model Registry
model_uri = f"models:/{model_name}/{model_version}"
model = mlflow.sklearn.load_model(model_uri)

# Generate a new dataset for prediction and predict
X_new, _ = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)
y_pred_new = model.predict(X_new)

print(y_pred_new)
```

Note that if you're using a different supported model flavor, you should use that flavor's
specific load method, e.g. `mlflow.<flavor>.load_model()`. If the model flavor is
not supported, you should use <APILink fn="mlflow.pyfunc.load_model" />. Throughout this tutorial
we use sklearn for demonstration purposes.

### Example 0: Load via Tracking Server

A model URI is a unique identifier for a serialized model. Given that the model artifact is stored with
experiments in the tracking server, you can use the below model URIs to bypass the model registry
and load the artifact into memory.

1. **Absolute local path**: `mlflow.sklearn.load_model("/Users/me/path/to/local/model")`
2. **Relative local path**: `mlflow.sklearn.load_model("relative/path/to/local/model")`
3. **Run ID**: `mlflow.sklearn.load_model(f"runs:/{mlflow_run_id}/{run_relative_path_to_model}")`

However, unless you're in the same environment in which you logged the model, you typically won't have
the above information. Instead, you should load the model using the model's name and
version.
### Example 1: Load via Name and Version

To load a model into memory via its `model_name` and monotonically increasing `model_version`,
use the below method:

```python
model = mlflow.sklearn.load_model(f"models:/{model_name}/{model_version}")
```

While this method is quick and easy, a monotonically increasing model version lacks flexibility.
Often, it's more effective to use a model version alias.

### Example 2: Load via Model Version Alias

Model version aliases are user-defined identifiers for a model version. Because they're mutable after
model registration, they decouple model versions from the code that uses them.

For instance, let's say we have a model version alias called `production_model`, corresponding to
a production model. When our team builds a better model that is ready for deployment, we don't have
to change our serving workload code. Instead, in MLflow we reassign the `production_model` alias
from the old model version to the new one. This can be done simply in the UI. In the API, we run
`client.set_registered_model_alias()` with the same model name, alias name, and **new** model version
number. It's that easy!

In the prior section, we added a model version alias to our model via the UI, but here's a programmatic example.
```python
import mlflow.sklearn
from mlflow import MlflowClient

client = MlflowClient()

# Set the model version alias
model_name = "sk-learn-random-forest-reg-model"
model_version_alias = "the_best_model_ever"
client.set_registered_model_alias(model_name, model_version_alias, "1")  # Duplicate of the UI step

# Get information about the model
model_info = client.get_model_version_by_alias(model_name, model_version_alias)
model_tags = model_info.tags
print(model_tags)

# Get the model version using a model URI
model_uri = f"models:/{model_name}@{model_version_alias}"
model = mlflow.sklearn.load_model(model_uri)

print(model)
```

```_ title="Output"
{'problem_type': 'regression'}
RandomForestRegressor(max_depth=2, random_state=42)
```

Model version aliases are highly dynamic and can correspond to anything that is meaningful for your
team. The most common example is a deployment state. For instance, let's say we have a `champion`
model in production but are developing a `challenger` model that will hopefully outperform our
production model. You can use `champion` and `challenger` model version aliases to uniquely
identify these model versions for easy access.

That's it! You should now be comfortable...

1. Registering a model
2. Finding a model and modifying the tags and model version alias via the MLflow UI
3. Loading the registered model for inference