Cradicle Explorer

/ examples / llms / RAG / question_answer_source.csv

question_answer_source.csv

1 question,answer,chunk,chunk_id,source
2 What is the purpose of the MLflow Model Registry?,"The purpose of the MLflow Model Registry is to collaboratively manage the full lifecycle of an MLflow Model. It provides model lineage, versioning, aliasing, tagging, and annotations. It serves as a centralized model store with a set of APIs and a UI for organizing and deploying models. The Model Registry allows users to register, organize, and serve MLflow models across different environments.","Documentation MLflow Model Registry MLflow Model Registry The MLflow Model Registry component is a centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of an MLflow Model. It provides model lineage (which MLflow experiment and run produced the model), model versioning, model aliasing, model tagging, and annotations. Table of Contents Concepts Model Registry Workflows UI Workflow Register a Model Find Registered Models Deploy and Organize Models API Workflow Adding an MLflow Model to the Model Registry Deploy and Organize Models with Aliases and Tags Fetching an MLflow Model from the Model Registry Serving an MLflow Model from Model Registry Promoting an MLflow Model across environments Adding or Updating an MLflow Model Descriptions Renaming an MLflow Model Listing and Searching MLflow Models Deleting MLflow Models Registering a Model Saved Outside MLflow Registering an Unsupported Machine Learning Model Transitioning an MLflow Model's Stage Archiving an MLflow Model Concepts The Model Registry introduces a few concepts that describe and facilitate the full lifecycle of an MLflow Model. ModelAn MLflow Model is created from an experiment or run that is logged with one of the model flavor's mlflow.<model_flavor>.log_model() methods. Once logged, this model can then be registered with the Model Registry. Registered ModelAn MLflow Model can be registered with the Model Registry. A registered model has a unique name, contains versions,",0,model-registry.html
3 What is the purpose of registering a model with the Model Registry?,"The purpose of registering a model with the Model Registry is to have a unique name for the model, keep track of different versions of the model, store associated transitional stages, maintain model lineage, and store other metadata. By registering a model, you can easily refer to it using a model URI or the model registry API, making it convenient for deployment and updating of models in production workloads.","logged, this model can then be registered with the Model Registry. Registered ModelAn MLflow Model can be registered with the Model Registry. A registered model has a unique name, contains versions, associated transitional stages, model lineage, and other metadata. Model VersionEach registered model can have one or many versions. When a new model is added to the Model Registry, it is added as version 1. Each new model registered to the same model name increments the version number. Model AliasModel aliases allow you to assign a mutable, named reference to a particular version of a registered model. By assigning an alias to a specific model version, you can use the alias to refer that model version via a model URI or the model registry API. For example, you can create an alias named champion that points to version 1 of a model named MyModel. You can then refer to version 1 of MyModel by using the URI models:/MyModel@champion. Aliases are especially useful for deploying models. For example, you could assign a champion alias to the model version intended for production traffic and target this alias in production workloads. You can then update the model serving production traffic by reassigning the champion alias to a different model version. TagsTags are key-value pairs that you associate with registered models and model versions, allowing you to label and categorize them by function or status. For example, you could apply a tag with key ""task"" and value ""question-answering""",1,model-registry.html
4 What can you do with registered models and model versions?,"With registered models and model versions, you can label and categorize them by function or status. For example, you can apply tags to registered models to indicate their intended tasks, such as question-answering. At the model version level, you can tag versions undergoing pre-deployment validation or those cleared for deployment. Additionally, you can annotate the top-level model and each version individually using Markdown, providing descriptions and relevant information about the algorithm, dataset, or methodology.","associate with registered models and model versions, allowing you to label and categorize them by function or status. For example, you could apply a tag with key ""task"" and value ""question-answering"" (displayed in the UI as task:question-answering) to registered models intended for question answering tasks. At the model version level, you could tag versions undergoing pre-deployment validation with validation_status:pending and those cleared for deployment with validation_status:approved. Annotations and DescriptionsYou can annotate the top-level model and each version individually using Markdown, including description and any relevant information useful for the team such as algorithm descriptions, dataset employed or methodology. Model StageEach distinct model version can be assigned one stage at any given time. MLflow provides predefined stages for common use-cases such as Staging, Production or Archived. You can transition a model version from one stage to another stage. Model Registry Workflows If running your own MLflow server, you must use a database-backed backend store in order to access the model registry via the UI or API. See here for more information. Before you can add a model to the Model Registry, you must log it using the log_model methods of the corresponding model flavors. Once a model has been logged, you can add, modify, update, or delete the model in the Model Registry through the UI or the API. UI Workflow This section demonstrates how to use the MLflow",2,model-registry.html
5 "How can you add, modify, update, or delete a model in the Model Registry?","Once a model has been logged, you can add, modify, update, or delete the model in the Model Registry through the UI or the API.","flavors. Once a model has been logged, you can add, modify, update, or delete the model in the Model Registry through the UI or the API. UI Workflow This section demonstrates how to use the MLflow Model Registry UI to manage your MLflow models. Register a Model Follow the steps below to register your MLflow model in the Model Registry. Open the details page for the MLflow Run containing the logged MLflow model you'd like to register. Select the model folder containing the intended MLflow model in the Artifacts section. Click the Register Model button, which will trigger a form to pop up. In the Model dropdown menu on the form, you can either select "Create New Model", which creates a new registered model with your MLflow model as its initial version, or select an existing registered model, which registers your model under it as a new version. The screenshot below demonstrates registering the MLflow model to a new registered model named ""iris_model_testing"". Find Registered Models After you've registered your models in the Model Registry, you can navigate to them in the following ways. Navigate to the Registered Models page, which links to your registered models and correponding model versions. Go to the Artifacts section of your MLflow Runs details page, click the model folder, and then click the model version at the top right to view the version created from that model. Deploy and Organize Models You can deploy and organize your models in the Model Registry using model",3,model-registry.html
6 How can you deploy and organize models in the Model Registry?,"You can deploy and organize your models in the Model Registry using model aliases and tags. To set aliases and tags for model versions in your registered model, navigate to the overview page of your registered model. You can add or edit aliases and tags for a specific model version by clicking on the corresponding Add link or pencil icon in the model version table. To learn more about a specific model version, navigate to the details page for that model version. In this page, you can inspect model version details like the model signature, MLflow source run, and creation timestamp. You can also view and configure the version's aliases, tags, and description.","and then click the model version at the top right to view the version created from that model. Deploy and Organize Models You can deploy and organize your models in the Model Registry using model aliases and tags. To set aliases and tags for model versions in your registered model, navigate to the overview page of your registered model, such as the one below. You can add or edit aliases and tags for a specific model version by clicking on the corresponding Add link or pencil icon in the model verison table. To learn more about a specific model version, navigate to the details page for that model version. In this page, you can inspect model version details like the model signature, MLflow source run, and creation timestamp. You can also view and configure the verion's aliases, tags, and description. API Workflow An alternative way to interact with Model Registry is using the MLflow model flavor or MLflow Client Tracking API interface. In particular, you can register a model during an MLflow experiment run or after all your experiment runs. Adding an MLflow Model to the Model Registry There are three programmatic ways to add a model to the registry. First, you can use the mlflow.<model_flavor>.log_model() method. For example, in your code: from sklearn.datasets import make_regression from sklearn.ensemble import RandomForestRegressor from sklearn.metrics import mean_squared_error from sklearn.model_selection import train_test_split import mlflow import mlflow.sklearn from",4,model-registry.html
7 What is the purpose of the mlflow.sklearn.log_model() method?,"The purpose of the mlflow.sklearn.log_model() method is to log the sklearn model and register it as a new version in the MLflow registry. If a registered model with the specified name already exists, the method creates a new version of the model. This method is used to track and manage different versions of the trained models in MLflow.","from sklearn.ensemble import RandomForestRegressor from sklearn.metrics import mean_squared_error from sklearn.model_selection import train_test_split import mlflow import mlflow.sklearn from mlflow.models import infer_signature with mlflow.start_run() as run: X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False) X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, random_state=42 ) params = {""max_depth"": 2, ""random_state"": 42} model = RandomForestRegressor(**params) model.fit(X_train, y_train) # Infer the model signature y_pred = model.predict(X_test) signature = infer_signature(X_test, y_pred) # Log parameters and metrics using the MLflow APIs mlflow.log_params(params) mlflow.log_metrics({""mse"": mean_squared_error(y_test, y_pred)}) # Log the sklearn model and register as version 1 mlflow.sklearn.log_model( sk_model=model, artifact_path=""sklearn-model"", signature=signature, registered_model_name=""sk-learn-random-forest-reg-model"", ) In the above code snippet, if a registered model with the name doesn't exist, the method registers a new model and creates Version 1. If a registered model with the name exists, the method creates a new model version. The second way is to use the mlflow.register_model() method, after all your experiment runs complete and when you have decided which model is most suitable to add to the registry. For this method, you will need the run_id as part of the runs:URI argument. result =",5,model-registry.html
8 What method do you use to create a new registered model?,"To create a new registered model, you can use the create_registered_model() method. If the model name already exists, this method will throw an MlflowException because creating a new registered model requires a unique name. The create_registered_model() method creates an empty registered model with no version associated.","all your experiment runs complete and when you have decided which model is most suitable to add to the registry. For this method, you will need the run_id as part of the runs:URI argument. result = mlflow.register_model( ""runs:/d16076a3ec534311817565e6527539c0/sklearn-model"", ""sk-learn-random-forest-reg"" ) If a registered model with the name doesn't exist, the method registers a new model, creates Version 1, and returns a ModelVersion MLflow object. If a registered model with the name exists, the method creates a new model version and returns the version object. And finally, you can use the create_registered_model() to create a new registered model. If the model name exists, this method will throw an MlflowException because creating a new registered model requires a unique name. from mlflow import MlflowClient client = MlflowClient() client.create_registered_model(""sk-learn-random-forest-reg-model"") The method above creates an empty registered model with no version associated. You can use create_model_version() as shown below to create a new version of the model. client = MlflowClient() result = client.create_model_version( name=""sk-learn-random-forest-reg-model"", source=""mlruns/0/d16076a3ec534311817565e6527539c0/artifacts/sklearn-model"", run_id=""d16076a3ec534311817565e6527539c0"", ) Deploy and Organize Models with Aliases and Tags Model aliases and tags help you deploy and organize your models in the Model Registry. Set and delete aliases on models To set, update, and delete",6,model-registry.html
9 How can you deploy and organize models in the Model Registry?,"You can deploy and organize models in the Model Registry by using model aliases and tags. Model aliases allow you to set, update, and delete aliases on models, while model tags allow you to set and delete tags on models. To set an alias for a model, you can use the `set_registered_model_alias` function from the MLflow Client API. To delete an alias, you can use the `delete_registered_model_alias` function. Similarly, to set a tag for a model, you can use the `set_registered_model_tag` function, and to delete a tag, you can use the `delete_registered_model_tag` function. Additionally, you can set and delete tags for specific model versions using the `set_model_version_tag` and `delete_model_version_tag` functions respectively.",") Deploy and Organize Models with Aliases and Tags Model aliases and tags help you deploy and organize your models in the Model Registry. Set and delete aliases on models To set, update, and delete aliases using the MLflow Client API, see the examples below: from mlflow import MlflowClient client = MlflowClient() # create ""champion"" alias for version 1 of model ""example-model"" client.set_registered_model_alias(""example-model"", ""champion"", 1) # reassign the ""Champion"" alias to version 2 client.set_registered_model_alias(""example-model"", ""Champion"", 2) # get a model version by alias client.get_model_version_by_alias(""example-model"", ""Champion"") # delete the alias client.delete_registered_model_alias(""example-model"", ""Champion"") Set and delete tags on models To set and delete tags using the MLflow Client API, see the examples below: from mlflow import MlflowClient client = MlflowClient() # Set registered model tag client.set_registered_model_tag(""example-model"", ""task"", ""classification"") # Delete registered model tag client.delete_registered_model_tag(""example-model"", ""task"") # Set model version tag client.set_model_version_tag(""example-model"", ""1"", ""validation_status"", ""approved"") # Delete model version tag client.delete_model_version_tag(""example-model"", ""1"", ""validation_status"") For more details on alias and tag client APIs, see the mlflow.client API documentation. Fetching an MLflow Model from the Model Registry After you have registered an MLflow model, you can fetch that",7,model-registry.html
10 How can you fetch a specific model version?,"To fetch a specific model version, you can supply that version number as part of the model URI. For example, you can use the mlflow.pyfunc.load_model() function with the model_uri parameter set to 'models:/{model_name}/{model_version}' to fetch the desired model version. Once the model is fetched, you can use it for predictions or in inference workloads.","For more details on alias and tag client APIs, see the mlflow.client API documentation. Fetching an MLflow Model from the Model Registry After you have registered an MLflow model, you can fetch that model using mlflow.<model_flavor>.load_model(), or more generally, load_model(). You can use the loaded model for one off predictions or in inference workloads such as batch inference. Fetch a specific model version To fetch a specific model version, just supply that version number as part of the model URI. import mlflow.pyfunc model_name = ""sk-learn-random-forest-reg-model"" model_version = 1 model = mlflow.pyfunc.load_model(model_uri=f""models:/{model_name}/{model_version}"") model.predict(data) Fetch a model version by alias To fetch a model version by alias, specify the model alias in the model URI, and it will fetch the model version currently under it. import mlflow.pyfunc model_name = ""sk-learn-random-forest-reg-model"" alias = ""champion"" champion_version = mlflow.pyfunc.load_model(f""models:/{model_name}@{alias}"") champion_version.predict(data) Note that model alias assignments can be updated independently of your production code. If the champion alias in the snippet above is reassigned to a new model version in the Model Registry, the next execution of this snippet will automatically pick up the new model version. This allows you to decouple model deployments from your inference workloads. Fetch the latest model version in a specific stage To fetch a model version by stage,",8,model-registry.html
11 How can you fetch the latest model version in a specific stage?,"To fetch the latest model version in a specific stage, you can provide the model stage as part of the model URI and it will fetch the most recent version of the model in that stage.","pick up the new model version. This allows you to decouple model deployments from your inference workloads. Fetch the latest model version in a specific stage To fetch a model version by stage, simply provide the model stage as part of the model URI, and it will fetch the most recent version of the model in that stage. import mlflow.pyfunc model_name = ""sk-learn-random-forest-reg-model"" stage = ""Staging"" model = mlflow.pyfunc.load_model(model_uri=f""models:/{model_name}/{stage}"") model.predict(data) Serving an MLflow Model from Model Registry After you have registered an MLflow model, you can serve the model as a service on your host. #!/usr/bin/env sh # Set environment variable for the tracking URL where the Model Registry resides export MLFLOW_TRACKING_URI=http://localhost:5000 # Serve the production model from the model registry mlflow models serve -m ""models:/sk-learn-random-forest-reg-model@champion"" Promoting an MLflow Model across environments Over the course of a model's lifecycle, it might progress through various separate environments like development, testing, staging, production, and so on. This segregation facilitates continuous integration and deployment for the model. In MLflow, you can use registered models to set up environments for your MLflow Models, where each registered model corresponds to a specific environment. Furthermore, you can configure access controls for the registered models using MLflow Authentication. Then, to promote MLflow Models across",9,model-registry.html
12 What can you do to promote MLflow Models across environments?,"To promote MLflow Models across environments, you can use the copy_model_version() method to copy model versions across registered models.","each registered model corresponds to a specific environment. Furthermore, you can configure access controls for the registered models using MLflow Authentication. Then, to promote MLflow Models across environments, you can use the copy_model_version() method to copy model versions across registered models. client = MlflowClient() client.copy_model_version( src_model_uri=""models:/regression-model-staging@candidate"", dst_name=""regression-model-production"", ) This code snippet copies the model version with the candidate alias in the regression-model-staging model to the regression-model-production model as the latest version. Adding or Updating an MLflow Model Descriptions At any point in a model's lifecycle development, you can update a model version's description using update_model_version(). client = MlflowClient() client.update_model_version( name=""sk-learn-random-forest-reg-model"", version=1, description=""This model version is a scikit-learn random forest containing 100 decision trees"", ) Renaming an MLflow Model As well as adding or updating a description of a specific version of the model, you can rename an existing registered model using rename_registered_model(). client = MlflowClient() client.rename_registered_model( name=""sk-learn-random-forest-reg-model"", new_name=""sk-learn-random-forest-reg-model-100"", ) Listing and Searching MLflow Models You can fetch a list of registered models in the registry with a simple method. from pprint import pprint client =",10,model-registry.html
13 How can you fetch a list of registered models in the MLflow registry?,"You can fetch a list of registered models in the MLflow registry by using the `search_registered_models()` method provided by the `MlflowClient` class. This method returns a list of registered models, and you can iterate over the list to access the details of each model. Additionally, you can use the `pprint` module to print the details of each model in a more readable format.",") Listing and Searching MLflow Models You can fetch a list of registered models in the registry with a simple method. from pprint import pprint client = MlflowClient() for rm in client.search_registered_models(): pprint(dict(rm), indent=4) This outputs: { 'creation_timestamp': 1582671933216, 'description': None, 'last_updated_timestamp': 1582671960712, 'latest_versions': [<ModelVersion: creation_timestamp=1582671933246, current_stage='Production', description='A random forest model containing 100 decision trees trained in scikit-learn', last_updated_timestamp=1582671960712, name='sk-learn-random-forest-reg-model', run_id='ae2cc01346de45f79a44a320aab1797b', source='./mlruns/0/ae2cc01346de45f79a44a320aab1797b/artifacts/sklearn-model', status='READY', status_message=None, user_id=None, version=1>, <ModelVersion: creation_timestamp=1582671960628, current_stage='None', description=None, last_updated_timestamp=1582671960628, name='sk-learn-random-forest-reg-model', run_id='d994f18d09c64c148e62a785052e6723', source='./mlruns/0/d994f18d09c64c148e62a785052e6723/artifacts/sklearn-model', status='READY', status_message=None, user_id=None, version=2>], 'name': 'sk-learn-random-forest-reg-model'} With hundreds of models, it can be cumbersome to peruse the results returned from this call. A more efficient approach would be to search for a specific model name and list its version details using search_model_versions() method and provide a filter string such as",11,model-registry.html
14 What is the name of the model and its version details?,The name of the model is 'sk-learn-random-forest-reg-model' and it has two versions. Version 1 of the model was created on 1582671933246 and is currently in the 'Production' stage. It is a random forest model containing 100 decision trees trained in scikit-learn. Version 2 of the model was created on 1582671960628 and is not assigned to any stage.,"returned from this call. A more efficient approach would be to search for a specific model name and list its version details using search_model_versions() method and provide a filter string such as ""name='sk-learn-random-forest-reg-model'"" client = MlflowClient() for mv in client.search_model_versions(""name='sk-learn-random-forest-reg-model'""): pprint(dict(mv), indent=4) This outputs: { ""creation_timestamp"": 1582671933246, ""current_stage"": ""Production"", ""description"": ""A random forest model containing 100 decision trees "" ""trained in scikit-learn"", ""last_updated_timestamp"": 1582671960712, ""name"": ""sk-learn-random-forest-reg-model"", ""run_id"": ""ae2cc01346de45f79a44a320aab1797b"", ""source"": ""./mlruns/0/ae2cc01346de45f79a44a320aab1797b/artifacts/sklearn-model"", ""status"": ""READY"", ""status_message"": None, ""user_id"": None, ""version"": 1, } { ""creation_timestamp"": 1582671960628, ""current_stage"": ""None"", ""description"": None, ""last_updated_timestamp"": 1582671960628, ""name"": ""sk-learn-random-forest-reg-model"", ""run_id"": ""d994f18d09c64c148e62a785052e6723"", ""source"": ""./mlruns/0/d994f18d09c64c148e62a785052e6723/artifacts/sklearn-model"", ""status"": ""READY"", ""status_message"": None, ""user_id"": None, ""version"": 2, } Deleting MLflow Models Note Deleting registered models or model versions is irrevocable, so use it judiciously. You can either delete specific versions of a registered model or you can delete a registered model and all its versions. # Delete versions 1,2, and 3 of the model client =",12,model-registry.html
15 What is the purpose of saving the model in pickled format?,"The purpose of saving the model in pickled format is to store the trained model object in a serialized form. This allows the model to be easily loaded into memory at a later time without having to retrain it. Pickling is a way to convert a Python object into a byte stream, which can then be saved to a file. By saving the model in pickled format, it can be shared, deployed, or used for inference on new data without the need for the original training code or data.","# Make predictions using the testing set diabetes_y_pred = lr_model.predict(diabetes_X_test) print_predictions(lr_model, diabetes_y_pred) # save the model in the native sklearn format filename = ""lr_model.pkl"" pickle.dump(lr_model, open(filename, ""wb"")) Coefficients: [938.23786125] Mean squared error: 2548.07 Coefficient of determination: 0.47 Once saved in pickled format, we can load the sklearn model into memory using pickle API and register the loaded model with the Model Registry. import mlflow from mlflow.models import infer_signature import numpy as np from sklearn import datasets # load the model into memory loaded_model = pickle.load(open(filename, ""rb"")) # create a signature for the model based on the input and output data diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True) diabetes_X = diabetes_X[:, np.newaxis, 2] signature = infer_signature(diabetes_X, diabetes_y) # log and register the model using MLflow scikit-learn API mlflow.set_tracking_uri(""sqlite:///mlruns.db"") reg_model_name = ""SklearnLinearRegression"" print(""--"") mlflow.sklearn.log_model( loaded_model, ""sk_learn"", serialization_format=""cloudpickle"", signature=signature, registered_model_name=reg_model_name, ) -- Successfully registered model 'SklearnLinearRegression'. 2021/04/02 16:30:57 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: SklearnLinearRegression, version 1 Created version '1' of model",15,model-registry.html
16 What is an MLflow Model and what is its purpose?,"An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools. Its purpose is to provide a convention that lets you save a model in different 'flavors' that can be understood by different downstream tools. Flavors are the key concept that makes MLflow Models powerful, as they allow deployment tools to understand the model without having to integrate each tool with each library.","Documentation MLflow Models MLflow Models An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools—for example, real-time serving through a REST API or batch inference on Apache Spark. The format defines a convention that lets you save a model in different "flavors" that can be understood by different downstream tools. Table of Contents Storage Format Model Signature And Input Example Model API Built-In Model Flavors Model Evaluation Model Customization Built-In Deployment Tools Deployment to Custom Targets Community Model Flavors Storage Format Each MLflow Model is a directory containing arbitrary files, together with an MLmodel file in the root of the directory that can define multiple flavors that the model can be viewed in. Flavors are the key concept that makes MLflow Models powerful: they are a convention that deployment tools can use to understand the model, which makes it possible to write tools that work with models from any ML library without having to integrate each tool with each library. MLflow defines several "standard" flavors that all of its built-in deployment tools support, such as a "Python function" flavor that describes how to run the model as a Python function. However, libraries can also define and use other flavors. For example, MLflow's mlflow.sklearn library allows loading models back as a scikit-learn Pipeline object for use in code that is aware of scikit-learn, or as a generic",0,models.html
17 What are the flavors defined in the MLmodel file for the mlflow.sklearn library?,The flavors defined in the MLmodel file for the mlflow.sklearn library are sklearn and python_function. The sklearn flavor allows loading models back as a scikit-learn Pipeline object for use in code that is aware of scikit-learn. The python_function flavor allows loading models as a generic Python function for use in tools that just need to apply the model.,"define and use other flavors. For example, MLflow's mlflow.sklearn library allows loading models back as a scikit-learn Pipeline object for use in code that is aware of scikit-learn, or as a generic Python function for use in tools that just need to apply the model (for example, the mlflow deployments tool with the option -t sagemaker for deploying models to Amazon SageMaker). All of the flavors that a particular model supports are defined in its MLmodel file in YAML format. For example, mlflow.sklearn outputs models as follows: # Directory written by mlflow.sklearn.save_model(model, ""my_model"") my_model/ ├── MLmodel ├── model.pkl ├── conda.yaml ├── python_env.yaml └── requirements.txt And its MLmodel file describes two flavors: time_created: 2018-05-25T17:28:53.35 flavors: sklearn: sklearn_version: 0.19.1 pickled_model: model.pkl python_function: loader_module: mlflow.sklearn This model can then be used with any tool that supports either the sklearn or python_function model flavor. For example, the mlflow models serve command can serve a model with the python_function or the crate (R Function) flavor: mlflow models serve -m my_model Note If you wish to serve a model from inside a docker container (or to query it from another machine), you need to change the network address to 0.0.0.0 using the -h argument. mlflow models serve -h 0.0.0.0 -m my_model In addition, the mlflow deployments command-line tool can package and deploy models to AWS SageMaker as long as they support the",1,models.html
18 What command can be used to package and deploy models to AWS SageMaker?,The mlflow deployments command-line tool can be used to package and deploy models to AWS SageMaker.,"using the -h argument. mlflow models serve -h 0.0.0.0 -m my_model In addition, the mlflow deployments command-line tool can package and deploy models to AWS SageMaker as long as they support the python_function flavor: mlflow deployments create -t sagemaker -m my_model [other options] Note When a model registered in the MLflow Model Registry is downloaded, a YAML file named registered_model_meta is added to the model directory on the downloader's side. This file contains the name and version of the model referenced in the MLflow Model Registry, and will be used for deployment and other purposes. Fields in the MLmodel Format Apart from a flavors field listing the model flavors, the MLmodel YAML format can contain the following fields: time_createdDate and time when the model was created, in UTC ISO 8601 format. run_idID of the run that created the model, if the model was saved using MLflow Tracking. signaturemodel signature in JSON format. input_examplereference to an artifact with input example. databricks_runtimeDatabricks runtime version and type, if the model was trained in a Databricks notebook or job. mlflow_versionThe version of MLflow that was used to log the model. Additional Logged Files For environment recreation, we automatically log conda.yaml, python_env.yaml, and requirements.txt files whenever a model is logged. These files can then be used to reinstall dependencies using conda or virtualenv with pip. Note Anaconda Inc. updated their terms of service for",2,models.html
19 What is the default channel logged for models using MLflow v1.18 and above?,"The default channel logged for models using MLflow v1.18 and above is conda-forge, which points at the community managed https://conda-forge.org/.","and requirements.txt files whenever a model is logged. These files can then be used to reinstall dependencies using conda or virtualenv with pip. Note Anaconda Inc. updated their terms of service for anaconda.org channels. Based on the new terms of service you may require a commercial license if you rely on Anaconda's packaging and distribution. See Anaconda Commercial Edition FAQ for more information. Your use of any Anaconda channels is governed by their terms of service. MLflow models logged before v1.18 were by default logged with the conda defaults channel (https://repo.anaconda.com/pkgs/) as a dependency. Because of this license change, MLflow has stopped the use of the defaults channel for models logged using MLflow v1.18 and above. The default channel logged is now conda-forge, which points at the community managed https://conda-forge.org/. If you logged a model before MLflow v1.18 without excluding the defaults channel from the conda environment for the model, that model may have a dependency on the defaults channel that you may not have intended. To manually confirm whether a model has this dependency, you can examine channel value in the conda.yaml file that is packaged with the logged model. For example, a model's conda.yaml with a defaults channel dependency may look like this: name: mlflow-env channels: - defaults dependencies: - python=3.8.8 - pip - pip: - mlflow==2.3 - scikit-learn==0.23.2 - cloudpickle==1.6.0 If you would like to change the channel used in a",3,models.html
20 What information is stored in the conda.yaml file?,"The conda.yaml file stores the channel used in a model's environment and the dependencies required by the model. It also contains information about the Python version, version specifiers for pip, setuptools, and wheel, and the pip requirements of the model. Additional pip dependencies can be added to requirements.txt by including them as a pip dependency in the conda environment and logging the model with the environment or using the pip_requirements argument of the mlflow.<flavor>.log_model API.","like this: name: mlflow-env channels: - defaults dependencies: - python=3.8.8 - pip - pip: - mlflow==2.3 - scikit-learn==0.23.2 - cloudpickle==1.6.0 If you would like to change the channel used in a model's environment, you can re-register the model to the model registry with a new conda.yaml. You can do this by specifying the channel in the conda_env parameter of log_model(). For more information on the log_model() API, see the MLflow documentation for the model flavor you are working with, for example, mlflow.sklearn.log_model(). conda.yamlWhen saving a model, MLflow provides the option to pass in a conda environment parameter that can contain dependencies used by the model. If no conda environment is provided, a default environment is created based on the flavor of the model. This conda environment is then saved in conda.yaml. python_env.yamlThis file contains the following information that's required to restore a model environment using virtualenv: Python version Version specifiers for pip, setuptools, and wheel Pip requirements of the model (reference to requirements.txt) requirements.txtThe requirements file is created from the pip portion of the conda.yaml environment specification. Additional pip dependencies can be added to requirements.txt by including them as a pip dependency in a conda environment and logging the model with the environment or using the pip_requirements argument of the mlflow.<flavor>.log_model API. The following shows an example of saving a model",4,models.html
21 How can you save a model with a manually specified conda environment?,"You can save a model with a manually specified conda environment by using the conda_env parameter of the mlflow.sklearn.log_model API. This parameter allows you to specify the channels, dependencies, and name of the conda environment. For example, you can define the conda environment as follows: conda_env = { 'channels': ['conda-forge'], 'dependencies': ['python=3.8.8', 'pip'], 'pip': ['mlflow==2.3', 'scikit-learn==0.23.2', 'cloudpickle==1.6.0'], 'name': 'mlflow-env' } Then, you can pass this conda_env parameter to the mlflow.sklearn.log_model function to save the model with the specified conda environment.","in a conda environment and logging the model with the environment or using the pip_requirements argument of the mlflow.<flavor>.log_model API. The following shows an example of saving a model with a manually specified conda environment and the corresponding content of the generated conda.yaml and requirements.txt files. conda_env = { ""channels"": [""conda-forge""], ""dependencies"": [""python=3.8.8"", ""pip""], ""pip"": [""mlflow==2.3"", ""scikit-learn==0.23.2"", ""cloudpickle==1.6.0""], ""name"": ""mlflow-env"", } mlflow.sklearn.log_model(..., conda_env=conda_env) The written conda.yaml file: name: mlflow-env channels: - conda-forge dependencies: - python=3.8.8 - pip - pip: - mlflow==2.3 - scikit-learn==0.23.2 - cloudpickle==1.6.0 The written python_env.yaml file: python: 3.8.8 build_dependencies: - pip==21.1.3 - setuptools==57.4.0 - wheel==0.37.0 dependencies: - -r requirements.txt The written requirements.txt file: mlflow==2.3 scikit-learn==0.23.2 cloudpickle==1.6.0 Model Signature And Input Example When working with ML models you often need to know some basic functional properties of the model at hand, such as "What inputs does it expect?" and "What output does it produce?". MLflow models can include the following additional metadata about model inputs, outputs and params that can be used by downstream tooling: Model Inference Params - description of params used for model inference. Model Signature - description of a model's inputs, outputs and parameters. Model Input Example - example of a",5,models.html
22 What are inference params and how are they used during model inference?,"Inference params are parameters that are passed to the model at inference time. These parameters do not need to be specified when training the model, but could be useful for inference. With the advances in foundational models, more often 'inference configuration' is used to modify the behavior of a model. In some cases, especially popular LLMs, the same model may require different parameter configurations for different samples at inference time. By passing different params such as temperature, max_length, etc. to the model at inference time, you can easily control the output of the model. In order to use params at inference time, a valid Model Signature with params must be defined. The params are passed to the model at inference time as a dictionary and each param value will be validated against the corresponding param type defined in the model signature.","tooling: Model Inference Params - description of params used for model inference. Model Signature - description of a model's inputs, outputs and parameters. Model Input Example - example of a valid model input. Model Inference Params Inference params are parameters that are passed to the model at inference time. These parameters do not need to be specified when training the model, but could be useful for inference. With the advances in foundational models, more often "inference configuration" is used to modify the behavior of a model. In some cases, especially popular LLMs, the same model may require different parameter configurations for different samples at inference time. With this newly introduced feature, you can now specify a dictionary of inference params during model inference, providing a broader utility and improved control over the generated inference results, particularly for LLM use cases. By passing different params such as temperature, max_length, etc. to the model at inference time, you can easily control the output of the model. In order to use params at inference time, a valid Model Signature with params must be defined. The params are passed to the model at inference time as a dictionary and each param value will be validated against the corresponding param type defined in the model signature. Valid param types are DataType or a list of DataType as listed below. DataType.string or an array of DataType.string DataType.integer or an array of DataType.integer",6,models.html
23 What is the purpose of model signatures in MLflow?,"Model signatures in MLflow define input, output, and parameter schemas for models, providing a standard interface to codify and enforce the correct use of the models. Signatures are used by the MLflow Tracking UI and Model Registry UI to display model inputs, outputs, and parameters. They are also utilized by MLflow model deployment tools to validate inference inputs according to the model's assigned signature. To include a signature with a model, a model input example is passed as an argument to the log_model or save_model call. The model signature is stored in JSON format in the MLmodel file in the model artifacts.","[1, 2, 3]] # Passing some params -- add default values loaded_predict = loaded_model.predict([""input""], params={""str_param"": ""new_string""}) assert loaded_predict == [""new_string"", [1, 2, 3]] # Passing all params -- override loaded_predict = loaded_model.predict( [""input""], params={""str_param"": ""new_string"", ""int_array"": [4, 5, 6]} ) assert loaded_predict == [""new_string"", [4, 5, 6]] Model Signature Model signatures define input, output and parameters schemas for MLflow models, providing a standard interface to codify and enforce the correct use of your models. Signatures are fetched by the MLflow Tracking UI and Model Registry UI to display model inputs, outputs and params. They are also utilized by MLflow model deployment tools to validate inference inputs according to the model's assigned signature (see the Signature enforcement section for more details). To include a signature with your model, pass a model input example as an argument to the appropriate log_model or save_model call, e.g. sklearn.log_model(), and the model signature will be automatically inferred (see the How to log models with signatures section for more details). The model signature is stored in JSON format in the MLmodel file in your model artifacts, together with other model metadata. To set a signature on a logged or saved model, use the set_signature() API (see the How to set signatures on models section for more details). Model Signature Types A model signature consists on inputs and outputs schemas,",8,models.html
24 What is the API used to set signatures on models?,The API used to set signatures on models is the set_signature() API.,"a logged or saved model, use the set_signature() API (see the How to set signatures on models section for more details). Model Signature Types A model signature consists on inputs and outputs schemas, each of which can be either column-based or tensor-based. Column-based schemas are a sequence of (optionally) named columns with type specified as one of the MLflow data types. Tensor-based schemas are a sequence of (optionally) named tensors with type specified as one of the numpy data types. Params schema is a sequence of ParamSpec, each of which contains name, type, default and shape fields. type field must be specified as one of the MLflow data types, and shape field should be None for scalar parameters, or (-1,) for list parameters. See some examples of constructing them below. Column-based Signature Example Each column-based input and output is represented by a type corresponding to one of MLflow data types and an optional name. Input columns can also be marked as optional, indicating whether they are required as input to the model or can be omitted. The following example displays a modified MLmodel file excerpt containing the model signature for a classification model trained on the Iris dataset. The input has 4 named, numeric columns and 1 named, optional string column. The output is an unnamed integer specifying the predicted class. signature: inputs: '[{""name"": ""sepal length (cm)"", ""type"": ""double""}, {""name"": ""sepal width (cm)"", ""type"": ""double""}, {""name"": ""petal",9,models.html
25 What components are used to generate the final time series?,"The final time series is generated by adding the seasonal component, trend component, and residual component.","numpy as np import pandas as pd from statsmodels.tsa.arima.model import ARIMA # create a time series dataset with seasonality np.random.seed(0) # generate a time index with a daily frequency dates = pd.date_range(start=""2022-12-01"", end=""2023-12-01"", freq=""D"") # generate the seasonal component (weekly) seasonality = np.sin(np.arange(len(dates)) * (2 * np.pi / 365.25) * 7) # generate the trend component trend = np.linspace(-5, 5, len(dates)) + 2 * np.sin( np.arange(len(dates)) * (2 * np.pi / 365.25) * 0.1 ) # generate the residual component residuals = np.random.normal(0, 1, len(dates)) # generate the final time series by adding the components time_series = seasonality + trend + residuals # create a dataframe from the time series data = pd.DataFrame({""date"": dates, ""value"": time_series}) data.set_index(""date"", inplace=True) order = (1, 0, 0) # create the ARIMA model model = ARIMA(data, order=order) mlflow.statsmodels.autolog( log_models=True, disable=False, exclusive=False, disable_for_unsupported_versions=False, silent=False, registered_model_name=None, ) with mlflow.start_run(): res = model.fit() mlflow.log_params( { ""order"": order, ""trend"": model.trend, ""seasonal_order"": model.seasonal_order, } ) mlflow.log_params(res.params) mlflow.log_metric(""aic"", res.aic) mlflow.log_metric(""bic"", res.bic) model_info = mlflow.statsmodels.log_model(res, artifact_path=""ARIMA_model"") # load the pyfunc model statsmodels_pyfunc = mlflow.pyfunc.load_model(model_uri=model_info.model_uri) #",61,models.html
26 What functionality does the configuration DataFrame submitted to the pyfunc flavor provide?,The configuration DataFrame submitted to the pyfunc flavor provides the functionality to generate a subset of forecast predictions based on the grouping key values in the first row. This functionality eliminates the need to filter a subset from the full output of all groups forecasts when only a few or one group's results are needed.,"is present in the configuration DataFrame submitted to the pyfunc flavor, the grouping key values in the first row will be used to generate a subset of forecast predictions. This functionality removes the need to filter a subset from the full output of all groups forecasts if the results of only a few (or one) groups are needed. For a GroupedPmdarima model, an example configuration for the pyfunc predict() method is: import mlflow import pandas as pd from pmdarima.arima.auto import AutoARIMA from diviner import GroupedPmdarima with mlflow.start_run(): base_model = AutoARIMA(out_of_sample_size=96, maxiter=200) model = GroupedPmdarima(model_template=base_model).fit( df=df, group_key_columns=[""country"", ""city""], y_col=""watts"", datetime_col=""datetime"", silence_warnings=True, ) mlflow.diviner.save_model(diviner_model=model, path=""/tmp/diviner_model"") diviner_pyfunc = mlflow.pyfunc.load_model(model_uri=""/tmp/diviner_model"") predict_conf = pd.DataFrame( { ""n_periods"": 120, ""groups"": [ (""US"", ""NewYork""), (""CA"", ""Toronto""), (""MX"", ""MexicoCity""), ], # NB: List of tuples required. ""predict_col"": ""wattage_forecast"", ""alpha"": 0.1, ""return_conf_int"": True, ""on_error"": ""warn"", }, index=[0], ) subset_forecasts = diviner_pyfunc.predict(predict_conf) Note There are several instances in which a configuration DataFrame submitted to the pyfunc predict() method will cause an MlflowException to be raised: If neither horizon or n_periods are provided. The value of n_periods or horizon is not an",84,models.html
27 What is a common configuration for lowering the total memory pressure for pytorch models within transformers pipelines?,A common configuration for lowering the total memory pressure for pytorch models within transformers pipelines is to modify the processing data type. This is achieved through setting the torch_dtype argument when creating a Pipeline.,"flavor will automatically apply the appropriate general signature that the pipeline type supports (only for a single-entity; collections will not be inferred). Scalability for inference A common configuration for lowering the total memory pressure for pytorch models within transformers pipelines is to modify the processing data type. This is achieved through setting the torch_dtype argument when creating a Pipeline. For a full reference of these tunable arguments for configuration of pipelines, see the training docs . Note This feature does not exist in versions of transformers < 4.26.x In order to apply these configurations to a saved or logged run, there are two options: Save a pipeline with the torch_dtype argument set to the encoding type of your choice. Example: import transformers import torch import mlflow task = ""translation_en_to_fr"" my_pipeline = transformers.pipeline( task=task, model=transformers.T5ForConditionalGeneration.from_pretrained(""t5-small""), tokenizer=transformers.T5TokenizerFast.from_pretrained( ""t5-small"", model_max_length=100 ), framework=""pt"", torch_dtype=torch.bfloat16, ) with mlflow.start_run(): model_info = mlflow.transformers.log_model( transformers_model=my_pipeline, artifact_path=""my_pipeline"", ) # Illustrate that the torch data type is recorded in the flavor configuration print(model_info.flavors[""transformers""]) Result: {'transformers_version': '4.28.1', 'code': None, 'task': 'translation_en_to_fr', 'instance_type': 'TranslationPipeline',",98,models.html
28 What does the save_model() function do?,"The save_model() function writes the model dependencies to a requirements.txt and conda.yaml file in the model output directory. It also saves the model itself in the specified path using the specified serialization format. If the serialization format is pickle, the model is saved using the pickle.dump() function. If the serialization format is not pickle, the model is saved using the cloudpickle.dump() function. Additionally, the save_model() function adds the pip dependencies produced by the flavor to the get_default_pip_requirements() function, which returns the default requirements defined in get_default_pip_requirements().","_CONDA_ENV_FILE_NAME), ""w"") as f: yaml.safe_dump(conda_env, stream=f, default_flow_style=False) if pip_constraints: write_to(os.path.join(path, _CONSTRAINTS_FILE_NAME), ""\n"".join(pip_constraints)) write_to(os.path.join(path, _REQUIREMENTS_FILE_NAME), ""\n"".join(pip_requirements)) _PythonEnv.current().to_yaml(os.path.join(path, _PYTHON_ENV_FILE_NAME)) def _save_model(model, path, serialization_format): with open(path, ""wb"") as out: if serialization_format == SERIALIZATION_FORMAT_PICKLE: pickle.dump(model, out) else: import cloudpickle cloudpickle.dump(model, out) The save_model() function also writes the model dependencies to a requirements.txt and conda.yaml file in the model output directory. For this purpose the set of pip dependecies produced by this flavor need to be added to the get_default_pip_requirements() function. In this example only the minimum required dependencies are provided. In practice, additional requirements needed for preprocessing or post-processing steps could be included. Note that for any custom flavor, the mlflow.models.infer_pip_requirements() method in the save_model() function will return the default requirements defined in get_default_pip_requirements() as package imports are only inferred for built-in flavors. def get_default_pip_requirements(include_cloudpickle=False): pip_deps = [_get_pinned_requirement(""sktime"")] if include_cloudpickle: pip_deps += [_get_pinned_requirement(""cloudpickle"")] return pip_deps def",129,models.html
29 What is an MLflow Project?,"An MLflow Project is a format for packaging data science code in a reusable and reproducible way, based primarily on conventions. It allows data scientists to organize and describe their code in a directory or a Git repository. MLflow Projects can be run using an API or command-line tools, and they can be chained together to create workflows. Each project can have a human-readable name and specify entry points, which are commands that can be run within the project with their respective parameters.","Documentation MLflow Projects MLflow Projects An MLflow Project is a format for packaging data science code in a reusable and reproducible way, based primarily on conventions. In addition, the Projects component includes an API and command-line tools for running projects, making it possible to chain together projects into workflows. Table of Contents Overview Specifying Projects Running Projects Iterating Quickly Building Multistep Workflows Overview At the core, MLflow Projects are just a convention for organizing and describing your code to let other data scientists (or automated tools) run it. Each project is simply a directory of files, or a Git repository, containing your code. MLflow can run some projects based on a convention for placing files in this directory (for example, a conda.yaml file is treated as a Conda environment), but you can describe your project in more detail by adding a MLproject file, which is a YAML formatted text file. Each project can specify several properties: NameA human-readable name for the project. Entry PointsCommands that can be run within the project, and information about their parameters. Most projects contain at least one entry point that you want other users to call. Some projects can also contain more than one entry point: for example, you might have a single Git repository containing multiple featurization algorithms. You can also call any .py or .sh file in the project as an entry point. If you list your entry points in a MLproject",0,projects.html
30 What are the entry points in a MLproject file and how can you specify parameters for them?,"The entry points in a MLproject file are the .py or .sh files in the project that can be called as an entry point. By listing the entry points in a MLproject file, you can specify parameters for them, including data types and default values. This allows you to customize the behavior of the entry points and pass different inputs to them when running the project.","might have a single Git repository containing multiple featurization algorithms. You can also call any .py or .sh file in the project as an entry point. If you list your entry points in a MLproject file, however, you can also specify parameters for them, including data types and default values. EnvironmentThe software environment that should be used to execute project entry points. This includes all library dependencies required by the project code. See Project Environments for more information about the software environments supported by MLflow Projects, including Conda environments, Virtualenv environments, and Docker containers. You can run any project from a Git URI or from a local directory using the mlflow run command-line tool, or the mlflow.projects.run() Python API. These APIs also allow submitting the project for remote execution on Databricks and Kubernetes. Important By default, MLflow uses a new, temporary working directory for Git projects. This means that you should generally pass any file arguments to MLflow project using absolute, not relative, paths. If your project declares its parameters, MLflow automatically makes paths absolute for parameters of type path. Specifying Projects By default, any Git repository or local directory can be treated as an MLflow project; you can invoke any bash or Python script contained in the directory as a project entry point. The Project Directories section describes how MLflow interprets directories as projects. To provide",1,projects.html
31 What are the project environments supported by MLflow?,"MLflow currently supports the following project environments: Virtualenv environment, conda environment, Docker container environment, and system environment. Virtualenv environments support Python packages available on PyPI. Docker containers allow you to capture non-Python dependencies such as Java libraries.","you can invoke any bash or Python script contained in the directory as a project entry point. The Project Directories section describes how MLflow interprets directories as projects. To provide additional control over a project's attributes, you can also include an MLproject file in your project's repository or directory. Finally, MLflow projects allow you to specify the software environment that is used to execute project entry points. Project Environments MLflow currently supports the following project environments: Virtualenv environment, conda environment, Docker container environment, and system environment. Virtualenv environment (preferred)Virtualenv environments support Python packages available on PyPI. When an MLflow Project specifies a Virtualenv environment, MLflow will download the specified version of Python by using pyenv and create an isolated environment that contains the project dependencies using virtualenv, activating it as the execution environment prior to running the project code. You can specify a Virtualenv environment for your MLflow Project by including a python_env entry in your MLproject file. For details, see the Project Directories and Specifying an Environment sections. Docker container environmentDocker containers allow you to capture non-Python dependencies such as Java libraries. When you run an MLflow project that specifies a Docker image, MLflow runs your image as is with the parameters specified in your MLproject file. In this case you'll",2,projects.html
32 What is the purpose of the --build-image flag when running mlflow run?,The --build-image flag is used to run the project with a new image that is based on the specified image and contains the project's contents in the /mlflow/projects/code directory. It allows you to build and use a custom Docker image for running the MLflow project with the desired environment and code.,"such as Java libraries. When you run an MLflow project that specifies a Docker image, MLflow runs your image as is with the parameters specified in your MLproject file. In this case you'll need to pre build your images with both environment and code to run it. To run the project with a new image that's based on your image and contains the project's contents in the /mlflow/projects/code directory, use the --build-image flag when running mlflow run. Environment variables, such as MLFLOW_TRACKING_URI, are propagated inside the Docker container during project execution. Additionally, runs and experiments created by the project are saved to the tracking server specified by your tracking URI. When running against a local tracking URI, MLflow mounts the host system's tracking directory (e.g., a local mlruns directory) inside the container so that metrics, parameters, and artifacts logged during project execution are accessible afterwards. See Dockerized Model Training with MLflow for an example of an MLflow project with a Docker environment. To specify a Docker container environment, you must add an MLproject file to your project. For information about specifying a Docker container environment in an MLproject file, see Specifying an Environment. Conda environmentConda environments support both Python packages and native libraries (e.g, CuDNN or Intel MKL). When an MLflow Project specifies a Conda environment, it is activated before project code is run. Warning By using conda, you're",3,projects.html
33 What is the purpose of specifying a Conda environment in an MLflow project?,"The purpose of specifying a Conda environment in an MLflow project is to ensure that the project code is run in a specific environment with the required dependencies. When a Conda environment is specified, it is activated before the project code is executed. This allows for better reproducibility and control over the project's runtime environment. By using Conda, the user is responsible for adhering to Anaconda's terms of service. MLflow uses the system path to find and run the Conda binary by default, but a different Conda installation can be used by setting the MLFLOW_CONDA_HOME environment variable.","both Python packages and native libraries (e.g, CuDNN or Intel MKL). When an MLflow Project specifies a Conda environment, it is activated before project code is run. Warning By using conda, you're responsible for adhering to Anaconda's terms of service. By default, MLflow uses the system path to find and run the conda binary. You can use a different Conda installation by setting the MLFLOW_CONDA_HOME environment variable; in this case, MLflow attempts to run the binary at $MLFLOW_CONDA_HOME/bin/conda. You can specify a Conda environment for your MLflow project by including a conda.yaml file in the root of the project directory or by including a conda_env entry in your MLproject file. For details, see the Project Directories and Specifying an Environment sections. The mlflow run command supports running a conda environment project as a virtualenv environment project. To do this, run mlflow run with --env-manager virtualenv: mlflow run /path/to/conda/project --env-manager virtualenv Warning When a conda environment project is executed as a virtualenv environment project, conda dependencies will be ignored and only pip dependencies will be installed. System environmentYou can also run MLflow Projects directly in your current system environment. All of the project's dependencies must be installed on your system prior to project execution. The system environment is supplied at runtime. It is not part of the MLflow Project's directory contents or MLproject file. For information",4,projects.html
34 What is the purpose of the MLproject file?,"The MLproject file is used to specify the attributes of an MLflow project, such as the project's name, Conda environment, and entry points. It allows users to define and configure their projects in a standardized way. The MLproject file also allows users to specify runtime parameters that can be passed to the project's entry points when running the project.","must be installed on your system prior to project execution. The system environment is supplied at runtime. It is not part of the MLflow Project's directory contents or MLproject file. For information about using the system environment when running a project, see the Environment parameter description in the Running Projects section. Project Directories When running an MLflow Project directory or repository that does not contain an MLproject file, MLflow uses the following conventions to determine the project's attributes: The project's name is the name of the directory. The Conda environment is specified in conda.yaml, if present. If no conda.yaml file is present, MLflow uses a Conda environment containing only Python (specifically, the latest Python available to Conda) when running the project. Any .py and .sh file in the project can be an entry point. MLflow uses Python to execute entry points with the .py extension, and it uses bash to execute entry points with the .sh extension. For more information about specifying project entrypoints at runtime, see Running Projects. By default, entry points do not have any parameters when an MLproject file is not included. Parameters can be supplied at runtime via the mlflow run CLI or the mlflow.projects.run() Python API. Runtime parameters are passed to the entry point on the command line using --key value syntax. For more information about running projects and with runtime parameters, see Running Projects. MLproject File You can get",5,projects.html
35 How can you pass runtime parameters to the entry point of an MLflow Project?,"Runtime parameters can be passed to the entry point of an MLflow Project on the command line using the --key value syntax. For more information about running projects and with runtime parameters, see the 'Running Projects' section. Additionally, you can also add an MLproject file to the project's root directory to have more control over the project. The MLproject file is a text file in YAML syntax and can specify a name, a Conda or Docker environment, as well as more detailed information about each entry point. Each entry point defines a command to run and parameters to pass to the command, including data types.","are passed to the entry point on the command line using --key value syntax. For more information about running projects and with runtime parameters, see Running Projects. MLproject File You can get more control over an MLflow Project by adding an MLproject file, which is a text file in YAML syntax, to the project's root directory. The following is an example of an MLproject file: name: My Project python_env: python_env.yaml # or # conda_env: my_env.yaml # or # docker_env: # image: mlflow-docker-example entry_points: main: parameters: data_file: path regularization: {type: float, default: 0.1} command: ""python train.py -r {regularization} {data_file}"" validate: parameters: data_file: path command: ""python validate.py {data_file}"" The file can specify a name and a Conda or Docker environment, as well as more detailed information about each entry point. Specifically, each entry point defines a command to run and parameters to pass to the command (including data types). Specifying an Environment This section describes how to specify Conda and Docker container environments in an MLproject file. MLproject files cannot specify both a Conda environment and a Docker environment. Virtualenv environmentInclude a top-level python_env entry in the MLproject file. The value of this entry must be a relative path to a python_env YAML file within the MLflow project's directory. The following is an example MLProject file with a python_env definition: python_env: files/config/python_env.yaml",6,projects.html
36 What is the relative path to the python_env YAML file within the MLflow project's directory?,The relative path to the python_env YAML file within the MLflow project's directory is files/config/python_env.yaml.,"be a relative path to a python_env YAML file within the MLflow project's directory. The following is an example MLProject file with a python_env definition: python_env: files/config/python_env.yaml python_env refers to an environment file located at <MLFLOW_PROJECT_DIRECTORY>/files/config/python_env.yaml, where <MLFLOW_PROJECT_DIRECTORY> is the path to the MLflow project's root directory. The following is an example of a python_env.yaml file: # Python version required to run the project. python: ""3.8.15"" # Dependencies required to build packages. This field is optional. build_dependencies: - pip - setuptools - wheel==0.37.1 # Dependencies required to run the project. dependencies: - mlflow==2.3 - scikit-learn==1.0.2 Conda environmentInclude a top-level conda_env entry in the MLproject file. The value of this entry must be a relative path to a Conda environment YAML file within the MLflow project's directory. In the following example: conda_env: files/config/conda_environment.yaml conda_env refers to an environment file located at <MLFLOW_PROJECT_DIRECTORY>/files/config/conda_environment.yaml, where <MLFLOW_PROJECT_DIRECTORY> is the path to the MLflow project's root directory. Docker container environmentInclude a top-level docker_env entry in the MLproject file. The value of this entry must be the name of a Docker image that is accessible on the system executing the project; this image name may include a registry path and tags. Here are a couple of examples. Example 1: Image",7,projects.html
37 What are the additional local volume mounted and environment variables in the docker container?,"In this example, the docker container has one additional local volume mounted and two additional environment variables. The additional local volume is mounted using the 'new_var_value' parameter, and the two additional environment variables are one newly-defined variable and one copied from the host system using the 'VAR_TO_COPY_FROM_HOST_ENVIRONMENT' parameter.","""new_var_value""], ""VAR_TO_COPY_FROM_HOST_ENVIRONMENT""] In this example our docker container will have one additional local volume mounted, and two additional environment variables: one newly-defined, and one copied from the host system. Example 3: Image in a remote registry docker_env: image: 012345678910.dkr.ecr.us-west-2.amazonaws.com/mlflow-docker-example-environment:7.0 In this example, docker_env refers to the Docker image with name mlflow-docker-example-environment and tag 7.0 in the Docker registry with path 012345678910.dkr.ecr.us-west-2.amazonaws.com, which corresponds to an Amazon ECR registry. When the MLflow project is run, Docker attempts to pull the image from the specified registry. The system executing the MLflow project must have credentials to pull this image from the specified registry. Example 4: Build a new image docker_env: image: python:3.8 mlflow run ... --build-image To build a new image that's based on the specified image and files contained in the project directory, use the --build-image argument. In the above example, the image python:3.8 is pulled from Docker Hub if it's not present locally, and a new image is built based on it. The project is executed in a container created from this image. Command Syntax When specifying an entry point in an MLproject file, the command can be any string in Python format string syntax. All of the parameters declared in the entry point's parameters field are passed into this string for substitution. If you call the",9,projects.html
38 How does MLflow run a Project on Kubernetes?,"MLflow runs a Project on Kubernetes by pushing the new Project image to a specified Docker registry and starting a Kubernetes Job on a specified Kubernetes cluster. The Kubernetes Job then downloads the Project image and starts a corresponding Docker container. Finally, the container invokes the Project's entry point, logging parameters, tags, metrics, and artifacts to the MLflow tracking server.","Docker environment. MLflow then pushes the new Project image to your specified Docker registry and starts a Kubernetes Job on your specified Kubernetes cluster. This Kubernetes Job downloads the Project image and starts a corresponding Docker container. Finally, the container invokes your Project's entry point, logging parameters, tags, metrics, and artifacts to your MLflow tracking server. Execution guide You can run your MLflow Project on Kubernetes by following these steps: Add a Docker environment to your MLflow Project, if one does not already exist. For reference, see Specifying an Environment. Create a backend configuration JSON file with the following entries: kube-context The Kubernetes context where MLflow will run the job. If not provided, MLflow will use the current context. If no context is available, MLflow will assume it is running in a Kubernetes cluster and it will use the Kubernetes service account running the current pod ('in-cluster' configuration). repository-uri The URI of the docker repository where the Project execution Docker image will be uploaded (pushed). Your Kubernetes cluster must have access to this repository in order to run your MLflow Project. kube-job-template-path The path to a YAML configuration file for your Kubernetes Job - a Kubernetes Job Spec. MLflow reads the Job Spec and replaces certain fields to facilitate job execution and monitoring; MLflow does not modify the original template file. For more information about writing",14,projects.html
39 What fields are replaced when MLflow creates a Kubernetes Job for an MLflow Project?,"When MLflow creates a Kubernetes Job for an MLflow Project, the following fields are replaced: metadata.name is replaced with a string containing the name of the MLflow Project and the time of Project execution, spec.template.spec.container[0].name is replaced with the name of the MLflow Project, spec.template.spec.container[0].image is replaced with the URI of the Docker image created during Project execution (including the Docker image's digest hash), and spec.template.spec.container[0].command is replaced with the Project entry point command specified when executing the MLflow Project.","by creating Kubernetes Job resources. MLflow creates a Kubernetes Job for an MLflow Project by reading a user-specified Job Spec. When MLflow reads a Job Spec, it formats the following fields: metadata.name Replaced with a string containing the name of the MLflow Project and the time of Project execution spec.template.spec.container[0].name Replaced with the name of the MLflow Project spec.template.spec.container[0].image Replaced with the URI of the Docker image created during Project execution. This URI includes the Docker image's digest hash. spec.template.spec.container[0].command Replaced with the Project entry point command specified when executing the MLflow Project. The following example shows a simple Kubernetes Job Spec that is compatible with MLflow Project execution. Replaced fields are indicated using bracketed text. Example Kubernetes Job Spec apiVersion: batch/v1 kind: Job metadata: name: ""{replaced with MLflow Project name}"" namespace: mlflow spec: ttlSecondsAfterFinished: 100 backoffLimit: 0 template: spec: containers: - name: ""{replaced with MLflow Project name}"" image: ""{replaced with URI of Docker image created during Project execution}"" command: [""{replaced with MLflow Project entry point command}""] env: [""{appended with MLFLOW_TRACKING_URI, MLFLOW_RUN_ID and MLFLOW_EXPERIMENT_ID}""] resources: limits: memory: 512Mi requests: memory: 256Mi restartPolicy: Never The container.name, container.image, and container.command fields are only replaced for the first",16,projects.html
40 What is the syntax for searching runs using the MLflow UI and API?,"The syntax for searching runs using the MLflow UI and API is a search filter API, which is a simplified version of the SQL WHERE clause. It consists of one or more expressions joined by the AND keyword. Each expression has three parts: an identifier on the left-hand side (LHS), a comparator, and a constant on the right-hand side (RHS). The syntax does not support the OR keyword.","Documentation Search Runs Search Runs The MLflow UI and API support searching runs within a single experiment or a group of experiments using a search filter API. This API is a simplified version of the SQL WHERE clause. Table of Contents Syntax Example Expressions Identifier Entity Names Containing Special Characters Entity Names Starting with a Number Run Attributes Datasets MLflow Tags Comparator Constant Programmatically Searching Runs Python R Java Syntax A search filter is one or more expressions joined by the AND keyword. The syntax does not support OR. Each expression has three parts: an identifier on the left-hand side (LHS), a comparator, and constant on the right-hand side (RHS). Example Expressions Search for the subset of runs with logged accuracy metric greater than 0.92. metrics.accuracy > 0.92 Search for all completed runs. attributes.status = ""FINISHED"" Search for all failed runs. attributes.status = ""FAILED"" Search for runs created after UNIX timestamp 1670628787527. attributes.created > 1670628787527 attributes.Created > 1670628787527 attributes.start_time > 1670628787527 Search for the subset of runs with F1 score greater than 0.5. metrics.`f1 score` > 0.5 Search for runs created by user 'john@mlflow.com'. tags.`mlflow.user` = 'john@mlflow.com' Search for runs with models trained using scikit-learn (assumes runs have a tag called model whose value starts with sklearn). tags.`model` LIKE 'sklearn%' Search for runs with logistic regression models, ignoring",0,search-runs.html
41 What is the syntax for searching runs using the MLflow UI and API?,"The syntax for searching runs using the MLflow UI and API is a search filter API, which is a simplified version of the SQL WHERE clause. A search filter consists of one or more expressions joined by the AND keyword. Each expression has three parts: an identifier on the left-hand side (LHS), a comparator, and a constant on the right-hand side (RHS). The syntax does not support the OR keyword.","Documentation Search Runs Search Runs The MLflow UI and API support searching runs within a single experiment or a group of experiments using a search filter API. This API is a simplified version of the SQL WHERE clause. Table of Contents Syntax Example Expressions Identifier Entity Names Containing Special Characters Entity Names Starting with a Number Run Attributes Datasets MLflow Tags Comparator Constant Programmatically Searching Runs Python R Java Syntax A search filter is one or more expressions joined by the AND keyword. The syntax does not support OR. Each expression has three parts: an identifier on the left-hand side (LHS), a comparator, and constant on the right-hand side (RHS). Example Expressions Search for the subset of runs with logged accuracy metric greater than 0.92. metrics.accuracy > 0.92 Search for all completed runs. attributes.status = ""FINISHED"" Search for all failed runs. attributes.status = ""FAILED"" Search for runs created after UNIX timestamp 1670628787527. attributes.created > 1670628787527 attributes.Created > 1670628787527 attributes.start_time > 1670628787527 Search for the subset of runs with F1 score greater than 0.5. metrics.`f1 score` > 0.5 Search for runs created by user 'john@mlflow.com'. tags.`mlflow.user` = 'john@mlflow.com' Search for runs with models trained using scikit-learn (assumes runs have a tag called model whose value starts with sklearn). tags.`model` LIKE 'sklearn%' Search for runs with logistic regression models, ignoring",0,search-runs.html
42 What are the key parts of a search expression in MLflow?,"The key parts of a search expression in MLflow are the identifier, entity type, entity name, comparison operator, and comparison value. The identifier is used to signify the entity to compare against and has two parts: the type of the entity and the name of the entity. The entity type can be metrics, params, attributes, datasets, or tags. The entity name can contain alphanumeric characters and special characters. When a metric, parameter, or tag name contains a special character, it should be enclosed in double quotes or backticks. The comparison operator is used to specify the comparison to be performed, such as equals, not equals, greater than, less than, etc. The comparison value is the value to compare the entity against.","with models trained using scikit-learn (assumes runs have a tag called model whose value starts with sklearn). tags.`model` LIKE 'sklearn%' Search for runs with logistic regression models, ignoring case (assumes runs have a tag called type whose value contains logistic). tags.`type` ILIKE '%Logistic%' Search for runs whose names contain alpha. attributes.`run_name` ILIKE ""%alpha%"" attributes.`run name` ILIKE ""%alpha%"" attributes.`Run name` ILIKE ""%alpha%"" attributes.`Run Name` ILIKE ""%alpha%"" Search for runs created using a Logistic Regression model, a learning rate (lambda) of 0.001, and recorded error metric under 0.05. params.alpha = ""0.3"" and params.lambda = ""0.001"" and metrics.error <= 0.05 Identifier Required in the LHS of a search expression. Signifies an entity to compare against. An identifier has two parts separated by a period: the type of the entity and the name of the entity. The type of the entity is metrics, params, attributes, datasets, or tags. The entity name can contain alphanumeric characters and special characters. This section describes supported entity names and how to specify such names in search expressions. In this section: Entity Names Containing Special Characters Entity Names Starting with a Number Run Attributes Datasets MLflow Tags Entity Names Containing Special Characters When a metric, parameter, or tag name contains a special character like hyphen, space, period, and so on, enclose the entity name in double quotes or backticks. Examples",1,search-runs.html
43 What are some examples of entity names that contain special characters?,"Some examples of entity names that contain special characters are params.""model-type"" and metrics.`error rate`.","Containing Special Characters When a metric, parameter, or tag name contains a special character like hyphen, space, period, and so on, enclose the entity name in double quotes or backticks. Examples params.""model-type"" metrics.`error rate` Entity Names Starting with a Number Unlike SQL syntax for column names, MLflow allows logging metrics, parameters, and tags names that have a leading number. If an entity name contains a leading number, enclose the entity name in double quotes. For example: metrics.""2019-04-02 error rate"" Run Attributes You can search using the following run attributes contained in mlflow.entities.RunInfo: run_id, run_name, status, artifact_uri, user_id, start_time and end_time. The run_id, run_name, status, user_id and artifact_uri attributes have string values, while start_time and end_time are numeric. Other fields in mlflow.entities.RunInfo are not searchable. Run name, Run Name and run name are aliases for run_name. created and Created are aliases for start_time. Note The experiment ID is implicitly selected by the search API. A run's lifecycle_stage attribute is not allowed because it is already encoded as a part of the API's run_view_type field. To search for runs using run_id, it is more efficient to use get_run APIs. Example attributes.artifact_uri = 'models:/mymodel/1' attributes.status = 'ACTIVE' # RHS value for start_time and end_time are unix timestamp attributes.start_time >= 1664067852747 attributes.end_time < 1664067852747",2,search-runs.html
44 What are the key attributes for the model with the run_id 'a1b2c3d4' and run_name 'my-run'?,"The key attributes for the model with the run_id 'a1b2c3d4' and run_name 'my-run' are: attributes.status = 'ACTIVE', attributes.start_time >= 1664067852747, attributes.end_time < 1664067852747, attributes.user_id = 'user1', attributes.run_id = 'a1b2c3d4', and attributes.run_id IN ('a1b2c3d4', 'e5f6g7h8').","= 'models:/mymodel/1' attributes.status = 'ACTIVE' # RHS value for start_time and end_time are unix timestamp attributes.start_time >= 1664067852747 attributes.end_time < 1664067852747 attributes.user_id = 'user1' attributes.run_name = 'my-run' attributes.run_id = 'a1b2c3d4' attributes.run_id IN ('a1b2c3d4', 'e5f6g7h8') Datasets You can search using the following dataset attributes contained in mlflow.entities.Dataset: name, digest. Additionally, you may search for a specific mlflow.entities.InputTag: with key mlflow.data.context under the alias context. All dataset attributes are string values. Other fields in mlflow.entities.Dataset are not searchable. Example datasets.name = 'mydataset' datasets.digest = 's8ds293b' datasets.digest IN ('s8ds293b', 'jks834s2') datasets.context = 'train' MLflow Tags You can search for MLflow tags by enclosing the tag name in double quotes or backticks. For example, to search by owner of an MLflow run, specify tags.""mlflow.user"" or tags.`mlflow.user`. Examples tags.""mlflow.user"" tags.`mlflow.parentRunId` Comparator There are two classes of comparators: numeric and string. Numeric comparators (metrics): =, !=, >, >=, <, and <=. String comparators (params, tags, and attributes): =, !=, LIKE and ILIKE. Constant The search syntax requires the RHS of the expression to be a constant. The type of the constant depends on LHS. If LHS is a metric, the RHS must be an integer or float number. If LHS is a parameter or tag, the RHS must be a string constant",3,search-runs.html
45 What type of constant does the RHS need to be if LHS is a metric?,"If LHS is a metric, the RHS must be an integer or float number.","expression to be a constant. The type of the constant depends on LHS. If LHS is a metric, the RHS must be an integer or float number. If LHS is a parameter or tag, the RHS must be a string constant enclosed in single or double quotes. Programmatically Searching Runs The MLflow UI supports searching runs contained within the current experiment. To search runs across multiple experiments, use one of the client APIs. Python Use the MlflowClient.search_runs() or mlflow.search_runs() API to search programmatically. You can specify the list of columns to order by (for example, "metrics.rmse") in the order_by column. The column can contain an optional DESC or ASC value; the default is ASC. The default ordering is to sort by start_time DESC, then run_id. The mlflow.search_runs() API can be used to search for runs within specific experiments which can be identified by experiment IDs or experiment names, but not both at the same time. Warning Using both experiment_ids and experiment_names in the same call will result in error unless one of them is None or [] For example, if you'd like to identify the best active run from experiment ID 0 by accuracy, use: from mlflow import MlflowClient from mlflow.entities import ViewType run = MlflowClient().search_runs( experiment_ids=""0"", filter_string="""", run_view_type=ViewType.ACTIVE_ONLY, max_results=1, order_by=[""metrics.accuracy DESC""], )[0] To get all active runs from experiments IDs 3, 4, and 17 that used a CNN model with 10 layers and had a",4,search-runs.html
46 "How can you get all active runs from experiments IDs 3, 4, and 17 that used a CNN model with 10 layers and had a prediction accuracy of 94.5% or higher?","To get all active runs from experiments IDs 3, 4, and 17 that used a CNN model with 10 layers and had a prediction accuracy of 94.5% or higher, you can use the following code: from mlflow import MlflowClient from mlflow.entities import ViewType query = ""params.model = 'CNN' and params.layers = '10' and metrics.`prediction accuracy` >= 0.945"" runs = MlflowClient().search_runs(experiment_ids=[""3"", ""4"", ""17""], filter_string=query, run_view_type=ViewType.ACTIVE_ONLY)","run_view_type=ViewType.ACTIVE_ONLY, max_results=1, order_by=[""metrics.accuracy DESC""], )[0] To get all active runs from experiments IDs 3, 4, and 17 that used a CNN model with 10 layers and had a prediction accuracy of 94.5% or higher, use: from mlflow import MlflowClient from mlflow.entities import ViewType query = ""params.model = 'CNN' and params.layers = '10' and metrics.`prediction accuracy` >= 0.945"" runs = MlflowClient().search_runs( experiment_ids=[""3"", ""4"", ""17""], filter_string=query, run_view_type=ViewType.ACTIVE_ONLY, ) To search all known experiments for any MLflow runs created using the Inception model architecture: import mlflow from mlflow.entities import ViewType all_experiments = [exp.experiment_id for exp in mlflow.search_experiments()] runs = mlflow.search_runs( experiment_ids=all_experiments, filter_string=""params.model = 'Inception'"", run_view_type=ViewType.ALL, ) To get all runs from the experiment named "Social NLP Experiments", use: import mlflow runs = mlflow.search_runs(experiment_names=[""Social NLP Experiments""]) R The R API is similar to the Python API. library(mlflow) mlflow_search_runs( filter = ""metrics.rmse < 0.9 and tags.production = 'true'"", experiment_ids = as.character(1:2), order_by = ""params.lr DESC"" ) Java The Java API is similar to Python API. List<Long> experimentIds = Arrays.asList(""1"", ""2"", ""4"", ""8""); List<RunInfo> searchResult = client.searchRuns(experimentIds, ""metrics.accuracy_score < 99.90""); Previous Next © MLflow Project, a",5,search-runs.html
47 What is the purpose of the 'experimentIds' variable in the given paragraph?,"The 'experimentIds' variable is used to store a list of Long values, specifically the experiment IDs '1', '2', '4', and '8'. These experiment IDs are later used as input parameters in the 'client.searchRuns' method to search for runs associated with these experiments.","API. List<Long> experimentIds = Arrays.asList(""1"", ""2"", ""4"", ""8""); List<RunInfo> searchResult = client.searchRuns(experimentIds, ""metrics.accuracy_score < 99.90""); Previous Next © MLflow Project, a Series of LF Projects, LLC. All rights reserved.",6,search-runs.html
48 What is the MLflow Tracking component used for?,"The MLflow Tracking component is used for logging parameters, code versions, metrics, and output files when running machine learning code and for later visualizing the results. It provides an API and UI for logging and querying experiments using Python, REST, R API, and Java API APIs.","Documentation MLflow Tracking MLflow Tracking The MLflow Tracking component is an API and UI for logging parameters, code versions, metrics, and output files when running your machine learning code and for later visualizing the results. MLflow Tracking lets you log and query experiments using Python, REST, R API, and Java API APIs. Table of Contents Concepts Where Runs Are Recorded How runs and artifacts are recorded Scenario 1: MLflow on localhost Scenario 2: MLflow on localhost with SQLite Scenario 3: MLflow on localhost with Tracking Server Scenario 4: MLflow with remote Tracking Server, backend and artifact stores Scenario 5: MLflow Tracking Server enabled with proxied artifact storage access Scenario 6: MLflow Tracking Server used exclusively as proxied access host for artifact storage access Logging Data to Runs Logging functions Launching Multiple Runs in One Program Performance Tracking with Metrics Visualizing Metrics Automatic Logging Scikit-learn Keras Gluon XGBoost LightGBM Statsmodels Spark Fastai Pytorch Organizing Runs in Experiments Managing Experiments and Runs with the Tracking Service API Tracking UI Querying Runs Programmatically MLflow Tracking Servers Storage Networking Using the Tracking Server for proxied artifact access Logging to a Tracking Server System Tags Concepts MLflow Tracking is organized around the concept of runs, which are executions of some piece of data science code. Each run records the following information: Code VersionGit commit hash",0,tracking.html
49 What information does each run record in MLflow Tracking?,"Each run in MLflow Tracking records the following information: code version (Git commit hash used for the run), start and end time of the run, source (name of the file or project name and entry point), input parameters (key-value pairs), metrics (key-value pairs where the value is numeric), and artifacts (output files in any format). MLflow allows you to record runs using Python, R, Java, and REST APIs from anywhere you run your code, such as in a standalone program, on a remote cloud machine, or in an interactive notebook. You can also organize runs into experiments to group together runs for a specific task.","Tags Concepts MLflow Tracking is organized around the concept of runs, which are executions of some piece of data science code. Each run records the following information: Code VersionGit commit hash used for the run, if it was run from an MLflow Project. Start & End TimeStart and end time of the run SourceName of the file to launch the run, or the project name and entry point for the run if run from an MLflow Project. ParametersKey-value input parameters of your choice. Both keys and values are strings. MetricsKey-value metrics, where the value is numeric. Each metric can be updated throughout the course of the run (for example, to track how your model's loss function is converging), and MLflow records and lets you visualize the metric's full history. ArtifactsOutput files in any format. For example, you can record images (for example, PNGs), models (for example, a pickled scikit-learn model), and data files (for example, a Parquet file) as artifacts. You can record runs using MLflow Python, R, Java, and REST APIs from anywhere you run your code. For example, you can record them in a standalone program, on a remote cloud machine, or in an interactive notebook. If you record runs in an MLflow Project, MLflow remembers the project URI and source version. You can optionally organize runs into experiments, which group together runs for a specific task. You can create an experiment using the mlflow experiments CLI, with mlflow.create_experiment(), or using the corresponding REST",1,tracking.html
50 How can you create an experiment in MLflow?,"You can create an experiment in MLflow using the mlflow experiments CLI, with mlflow.create_experiment(), or using the corresponding REST parameters.","runs into experiments, which group together runs for a specific task. You can create an experiment using the mlflow experiments CLI, with mlflow.create_experiment(), or using the corresponding REST parameters. The MLflow API and UI let you create and search for experiments. Once your runs have been recorded, you can query them using the Tracking UI or the MLflow API. Where Runs Are Recorded MLflow runs can be recorded to local files, to a SQLAlchemy-compatible database, or remotely to a tracking server. By default, the MLflow Python API logs runs locally to files in an mlruns directory wherever you ran your program. You can then run mlflow ui to see the logged runs. To log runs remotely, set the MLFLOW_TRACKING_URI environment variable to a tracking server's URI or call mlflow.set_tracking_uri(). There are different kinds of remote tracking URIs: Local file path (specified as file:/my/local/dir), where data is just directly stored locally. Database encoded as <dialect>+<driver>://<username>:<password>@<host>:<port>/<database>. MLflow supports the dialects mysql, mssql, sqlite, and postgresql. For more details, see SQLAlchemy database uri. HTTP server (specified as https://my-server:5000), which is a server hosting an MLflow tracking server. Databricks workspace (specified as databricks or as databricks://<profileName>, a Databricks CLI profile. Refer to Access the MLflow tracking server from outside Databricks [AWS] [Azure], or the quickstart to easily get started with hosted",2,tracking.html
51 How can you create an experiment using MLflow?,"You can create an experiment using the mlflow experiments CLI, with mlflow.create_experiment(), or using the corresponding REST parameters.","runs into experiments, which group together runs for a specific task. You can create an experiment using the mlflow experiments CLI, with mlflow.create_experiment(), or using the corresponding REST parameters. The MLflow API and UI let you create and search for experiments. Once your runs have been recorded, you can query them using the Tracking UI or the MLflow API. Where Runs Are Recorded MLflow runs can be recorded to local files, to a SQLAlchemy-compatible database, or remotely to a tracking server. By default, the MLflow Python API logs runs locally to files in an mlruns directory wherever you ran your program. You can then run mlflow ui to see the logged runs. To log runs remotely, set the MLFLOW_TRACKING_URI environment variable to a tracking server's URI or call mlflow.set_tracking_uri(). There are different kinds of remote tracking URIs: Local file path (specified as file:/my/local/dir), where data is just directly stored locally. Database encoded as <dialect>+<driver>://<username>:<password>@<host>:<port>/<database>. MLflow supports the dialects mysql, mssql, sqlite, and postgresql. For more details, see SQLAlchemy database uri. HTTP server (specified as https://my-server:5000), which is a server hosting an MLflow tracking server. Databricks workspace (specified as databricks or as databricks://<profileName>, a Databricks CLI profile. Refer to Access the MLflow tracking server from outside Databricks [AWS] [Azure], or the quickstart to easily get started with hosted",2,tracking.html
52 What are the two components used by MLflow for storage?,"MLflow uses two components for storage: backend store and artifact store. The backend store persists MLflow entities such as runs, parameters, metrics, tags, notes, and metadata. On the other hand, the artifact store persists artifacts such as files, models, images, in-memory objects, or model summaries.","or as databricks://<profileName>, a Databricks CLI profile. Refer to Access the MLflow tracking server from outside Databricks [AWS] [Azure], or the quickstart to easily get started with hosted MLflow on Databricks Community Edition. How runs and artifacts are recorded As mentioned above, MLflow runs can be recorded to local files, to a SQLAlchemy-compatible database, or remotely to a tracking server. MLflow artifacts can be persisted to local files and a variety of remote file storage solutions. For storing runs and artifacts, MLflow uses two components for storage: backend store and artifact store. While the backend store persists MLflow entities (runs, parameters, metrics, tags, notes, metadata, etc), the artifact store persists artifacts (files, models, images, in-memory objects, or model summary, etc). The MLflow server can be configured with an artifacts HTTP proxy, passing artifact requests through the tracking server to store and retrieve artifacts without having to interact with underlying object store services. Usage of the proxied artifact access feature is described in Scenarios 5 and 6 below. The MLflow client can interface with a variety of backend and artifact storage configurations. Here are four common configuration scenarios: Scenario 1: MLflow on localhost Many developers run MLflow on their local machine, where both the backend and artifact store share a directory on the local filesystem—./mlruns—as shown in the diagram. The MLflow client directly",3,tracking.html
53 What interfaces does the MLflow client use to record MLflow entities and artifacts when running MLflow on a local machine with a SQLAlchemy-compatible database?,"When running MLflow on a local machine with a SQLAlchemy-compatible database, the MLflow client uses the following interfaces to record MLflow entities and artifacts: An instance of a LocalArtifactRepository to save artifacts and an instance of an SQLAlchemyStore to store MLflow entities to a SQLite file mlruns.db.","Many developers run MLflow on their local machine, where both the backend and artifact store share a directory on the local filesystem—./mlruns—as shown in the diagram. The MLflow client directly interfaces with an instance of a FileStore and LocalArtifactRepository. In this simple scenario, the MLflow client uses the following interfaces to record MLflow entities and artifacts: An instance of a LocalArtifactRepository (to store artifacts) An instance of a FileStore (to save MLflow entities) Scenario 2: MLflow on localhost with SQLite Many users also run MLflow on their local machines with a SQLAlchemy-compatible database: SQLite. In this case, artifacts are stored under the local ./mlruns directory, and MLflow entities are inserted in a SQLite database file mlruns.db. In this scenario, the MLflow client uses the following interfaces to record MLflow entities and artifacts: An instance of a LocalArtifactRepository (to save artifacts) An instance of an SQLAlchemyStore (to store MLflow entities to a SQLite file mlruns.db) Scenario 3: MLflow on localhost with Tracking Server Similar to scenario 1 but a tracking server is launched, listening for REST request calls at the default port 5000. The arguments supplied to the mlflow server <args> dictate what backend and artifact stores are used. The default is local FileStore. For example, if a user launched a tracking server as mlflow server --backend-store-uri sqlite:///mydb.sqlite, then SQLite would be used for backend storage",4,tracking.html
54 What is the default backend store used by MLflow?,"The default backend store used by MLflow is the local FileStore. For example, if a user launched a tracking server as 'mlflow server --backend-store-uri sqlite:///mydb.sqlite', then SQLite would be used for backend storage instead.","are used. The default is local FileStore. For example, if a user launched a tracking server as mlflow server --backend-store-uri sqlite:///mydb.sqlite, then SQLite would be used for backend storage instead. As in scenario 1, MLflow uses a local mlruns filesystem directory as a backend store and artifact store. With a tracking server running, the MLflow client interacts with the tracking server via REST requests, as shown in the diagram. Command to run the tracking server in this configuration mlflow server --backend-store-uri file:///path/to/mlruns --no-serve-artifacts To store all runs' MLflow entities, the MLflow client interacts with the tracking server via a series of REST requests: Part 1a and b: The MLflow client creates an instance of a RestStore and sends REST API requests to log MLflow entities The Tracking Server creates an instance of a FileStore to save MLflow entities and writes directly to the local mlruns directory For the artifacts, the MLflow client interacts with the tracking server via a REST request: Part 2a, b, and c: The MLflow client uses RestStore to send a REST request to fetch the artifact store URI location The Tracking Server responds with an artifact store URI location The MLflow client creates an instance of a LocalArtifactRepository and saves artifacts to the local filesystem location specified by the artifact store URI (a subdirectory of mlruns) Scenario 4: MLflow with remote Tracking Server, backend and artifact stores MLflow also supports",5,tracking.html
55 What is the architecture depicted in this example scenario?,"The architecture depicted in this example scenario is a distributed architecture with a remote MLflow Tracking Server, a Postgres database for backend entity storage, and an S3 bucket for artifact storage.","to the local filesystem location specified by the artifact store URI (a subdirectory of mlruns) Scenario 4: MLflow with remote Tracking Server, backend and artifact stores MLflow also supports distributed architectures, where the tracking server, backend store, and artifact store reside on remote hosts. This example scenario depicts an architecture with a remote MLflow Tracking Server, a Postgres database for backend entity storage, and an S3 bucket for artifact storage. Command to run the tracking server in this configuration mlflow server --backend-store-uri postgresql://user:password@postgres:5432/mlflowdb --default-artifact-root s3://bucket_name --host remote_host --no-serve-artifacts To record all runs' MLflow entities, the MLflow client interacts with the tracking server via a series of REST requests: Part 1a and b: The MLflow client creates an instance of a RestStore and sends REST API requests to log MLflow entities The Tracking Server creates an instance of an SQLAlchemyStore and connects to the remote host to insert MLflow entities in the database For artifact logging, the MLflow client interacts with the remote Tracking Server and artifact storage host: Part 2a, b, and c: The MLflow client uses RestStore to send a REST request to fetch the artifact store URI location from the Tracking Server The Tracking Server responds with an artifact store URI location (an S3 storage URI in this case) The MLflow client creates an instance of an S3ArtifactRepository, connects to",6,tracking.html
56 What information does autologging capture when launching short-lived MLflow runs?,"Autologging captures the following information: Framework, Metrics, Parameters, Tags, Artifacts. For example, in the case of fastai, it logs user-specified metrics, optimizer data as parameters, and the parameters of the EarlyStoppingCallback and OneCycleScheduler callbacks. Model checkpoints are logged to a 'models' directory. On training end, the MLflow Model (fastai Learner model) and model summary text are logged. In the case of Pytorch, autologging captures Framework/module, Metrics, Parameters, Tags, and Artifacts. It logs training loss, validation loss, average_test_accuracy, and user-defined metrics. It also logs fit() parameters, optimizer name, learning rate, and epsilon. On training start, it logs the model summary, and on training end, it logs the MLflow Model (Pytorch model) and metrics from the EarlyStoppingCallback.","when launching short-lived MLflow runs that result in datasource information not being logged. Fastai Call mlflow.fastai.autolog() before your training code to enable automatic logging of metrics and parameters. See an example usage with Fastai. Autologging captures the following information: Framework Metrics Parameters Tags Artifacts fastai user-specified metrics Logs optimizer data as parameters. For example, epochs, lr, opt_func, etc; Logs the parameters of the EarlyStoppingCallback and OneCycleScheduler callbacks – Model checkpoints are logged to a 'models' directory; MLflow Model (fastai Learner model) on training end; Model summary text is logged Pytorch Call mlflow.pytorch.autolog() before your Pytorch Lightning training code to enable automatic logging of metrics, parameters, and models. See example usages here. Note that currently, Pytorch autologging supports only models trained using Pytorch Lightning. Autologging is triggered on calls to pytorch_lightning.trainer.Trainer.fit and captures the following information: Framework/module Metrics Parameters Tags Artifacts pytorch_lightning.trainer.Trainer Training loss; validation loss; average_test_accuracy; user-defined-metrics. fit() parameters; optimizer name; learning rate; epsilon. – Model summary on training start, MLflow Model (Pytorch model) on training end; pytorch_lightning.callbacks.earlystopping Training loss; validation loss; average_test_accuracy; user-defined-metrics. Metrics from the EarlyStopping",21,tracking.html
57 What is the purpose of the --serve-artifacts flag?,"The purpose of the --serve-artifacts flag is to enable proxied access for artifacts. It allows the client to access artifacts via HTTP requests to the MLflow Tracking Server without the need to configure access tokens or username and password environment variables for the underlying object store. By default, a server is launched with this flag to simplify access requirements for users of the MLflow client when writing or retrieving artifacts.","relevant to that experiment. By default, a server is launched with the --serve-artifacts flag to enable proxied access for artifacts. The uri mlflow-artifacts:/ replaces an otherwise explicit object store destination (e.g., "s3:/my_bucket/mlartifacts") for interfacing with artifacts. The client can access artifacts via HTTP requests to the MLflow Tracking Server. This simplifies access requirements for users of the MLflow client, eliminating the need to configure access tokens or username and password environment variables for the underlying object store when writing or retrieving artifacts. To disable proxied access for artifacts, specify --no-serve-artifacts. Provided an MLflow server configuration where the --default-artifact-root is s3://my-root-bucket, the following patterns will all resolve to the configured proxied object store location of s3://my-root-bucket/mlartifacts: https://<host>:<port>/mlartifacts http://<host>/mlartifacts mlflow-artifacts://<host>/mlartifacts mlflow-artifacts://<host>:<port>/mlartifacts mlflow-artifacts:/mlartifacts If the host or host:port declaration is absent in client artifact requests to the MLflow server, the client API will assume that the host is the same as the MLflow Tracking uri. Note If an MLflow server is running with the --artifact-only flag, the client should interact with this server explicitly by including either a host or host:port definition for uri location references for artifacts. Otherwise, all artifact requests will",29,tracking.html