## MLflow Automatic Logging with SynapseML

[MLflow automatic logging](https://www.mlflow.org/docs/latest/tracking.html#automatic-logging) allows you to log metrics, parameters, and models without explicit log statements. SynapseML supports autologging for every model in the library.

Install the SynapseML library by following this [guide](https://microsoft.github.io/SynapseML/docs/getting_started/installation/).

The default MLflow [log_model_allowlist file](https://github.com/mlflow/mlflow/blob/master/mlflow/pyspark/ml/log_model_allowlist.txt) already includes some SynapseML models. To enable more models, you can either pass your own set to `mlflow.pyspark.ml.autolog(log_model_allowlist=YOUR_SET_OF_MODELS)`, or follow the guidance below to point the Spark configuration at a custom allowlist file.

To enable autologging with your custom log_model_allowlist file:

1. Put your customized log_model_allowlist file in a location your code can access (see the [SynapseML official log_model_allowlist file](https://mmlspark.blob.core.windows.net/publicwasb/log_model_allowlist.txt)). For example:

   - In Synapse: `wasb://<containername>@<accountname>.blob.core.windows.net/PATH_TO_YOUR/log_model_allowlist.txt`
   - In Databricks: `/dbfs/FileStore/PATH_TO_YOUR/log_model_allowlist.txt`

2. Set the Spark configuration `spark.mlflow.pysparkml.autolog.logModelAllowlistFile` to the path of your `log_model_allowlist.txt` file.
3. Call `mlflow.pyspark.ml.autolog()` before your training code to enable autologging for all supported models.

Note: to support autologging of PySpark models not present in the log_model_allowlist file, add those models to the file.

## Configuration process in Databricks as an example

1. Install the latest MLflow via `%pip install --upgrade mlflow`.
2. Upload your customized `log_model_allowlist.txt` file to DBFS by clicking the File/Upload Data button in the Databricks UI.
3. Set the cluster Spark configuration following [this documentation](https://docs.microsoft.com/en-us/azure/databricks/clusters/configure#spark-configuration):

   ```
   spark.mlflow.pysparkml.autolog.logModelAllowlistFile /dbfs/FileStore/PATH_TO_YOUR/log_model_allowlist.txt
   ```

4. Run the following lines before your training code executes:

   ```python
   import mlflow

   mlflow.pyspark.ml.autolog()
   ```

   You can customize how autologging works by supplying the appropriate [parameters](https://www.mlflow.org/docs/latest/python_api/mlflow.pyspark.ml.html#mlflow.pyspark.ml.autolog).

5. Find your experiment's results in the `Experiments` tab of the MLflow UI.
   <img src="https://mmlspark.blob.core.windows.net/graphics/adb_experiments.png" width="1200" />

## Example for ConditionalKNNModel

First, build and fit a `ConditionalKNN` model on a small training DataFrame:

```python
from pyspark.ml.linalg import Vectors
from synapse.ml.nn import ConditionalKNN

df = spark.createDataFrame(
    [
        (Vectors.dense(2.0, 2.0, 2.0), "foo", 1),
        (Vectors.dense(2.0, 2.0, 4.0), "foo", 3),
        (Vectors.dense(2.0, 2.0, 6.0), "foo", 4),
        (Vectors.dense(2.0, 2.0, 8.0), "foo", 3),
        (Vectors.dense(2.0, 2.0, 10.0), "foo", 1),
        (Vectors.dense(2.0, 2.0, 12.0), "foo", 2),
        (Vectors.dense(2.0, 2.0, 14.0), "foo", 0),
        (Vectors.dense(2.0, 2.0, 16.0), "foo", 1),
        (Vectors.dense(2.0, 2.0, 18.0), "foo", 3),
        (Vectors.dense(2.0, 2.0, 20.0), "foo", 0),
        (Vectors.dense(2.0, 4.0, 2.0), "foo", 2),
        (Vectors.dense(2.0, 4.0, 4.0), "foo", 4),
        (Vectors.dense(2.0, 4.0, 6.0), "foo", 2),
        (Vectors.dense(2.0, 4.0, 8.0), "foo", 2),
        (Vectors.dense(2.0, 4.0, 10.0), "foo", 4),
        (Vectors.dense(2.0, 4.0, 12.0), "foo", 3),
        (Vectors.dense(2.0, 4.0, 14.0), "foo", 2),
        (Vectors.dense(2.0, 4.0, 16.0), "foo", 1),
        (Vectors.dense(2.0, 4.0, 18.0), "foo", 4),
        (Vectors.dense(2.0, 4.0, 20.0), "foo", 4),
    ],
    ["features", "values", "labels"],
)

cnn = ConditionalKNN().setOutputCol("prediction")
cnnm = cnn.fit(df)
```
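Whether autologging captures a `fit()` call like the one above depends on the model class appearing in the active allowlist. The allowlist file is plain text with one fully qualified model class name per line. As a rough sketch of how membership works (the entries and helper below are illustrative, not MLflow's actual implementation or the official file's contents):

```python
# Illustrative allowlist text: one fully qualified model class name
# per line, blank lines ignored. These entries are examples only.
ALLOWLIST_TEXT = """
pyspark.ml.classification.LogisticRegressionModel
synapse.ml.nn.ConditionalKNNModel
"""


def parse_allowlist(text):
    """Return the set of class names listed in an allowlist file's text."""
    return {line.strip() for line in text.splitlines() if line.strip()}


allowlist = parse_allowlist(ALLOWLIST_TEXT)
print("synapse.ml.nn.ConditionalKNNModel" in allowlist)  # True for these entries
print("pyspark.ml.clustering.KMeansModel" in allowlist)  # False for these entries
```

If a model you train is missing from the allowlist in effect, its run is simply not logged, so this is the first thing to check when an expected run doesn't appear.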
Then score a test DataFrame, which adds a `conditioner` column:

```python
test_df = spark.createDataFrame(
    [
        (Vectors.dense(2.0, 2.0, 2.0), "foo", 1, [0, 1]),
        (Vectors.dense(2.0, 2.0, 4.0), "foo", 4, [0, 1]),
        (Vectors.dense(2.0, 2.0, 6.0), "foo", 2, [0, 1]),
        (Vectors.dense(2.0, 2.0, 8.0), "foo", 4, [0, 1]),
        (Vectors.dense(2.0, 2.0, 10.0), "foo", 4, [0, 1]),
    ],
    ["features", "values", "labels", "conditioner"],
)

display(cnnm.transform(test_df))
```

This code should log one run with a ConditionalKNNModel artifact and its parameters.
<img src="https://mmlspark.blob.core.windows.net/graphics/autologgingRunSample.png" width="1200" />
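As noted at the top of this guide, you can also pass the allowlist to `mlflow.pyspark.ml.autolog` directly instead of pointing the Spark configuration at a file. A minimal sketch, assuming MLflow and PySpark are installed in the session (the set contents are illustrative, and the import guard is only there so the sketch degrades gracefully elsewhere):

```python
import importlib.util

# Illustrative custom allowlist; replace with the model classes
# you want autologged.
custom_allowlist = {
    "pyspark.ml.classification.LogisticRegressionModel",
    "synapse.ml.nn.ConditionalKNNModel",
}

# Guard: mlflow.pyspark.ml.autolog requires both mlflow and pyspark.
if all(importlib.util.find_spec(m) for m in ("mlflow", "pyspark")):
    import mlflow

    # Enable autologging for the models in custom_allowlist,
    # without any log_model_allowlist file.
    mlflow.pyspark.ml.autolog(log_model_allowlist=custom_allowlist)
```

This keeps the allowlist in code next to your training logic, which can be simpler than managing a file when only a handful of models are involved.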