## MLflow Automatic Logging with SynapseML

[MLflow automatic logging](https://www.mlflow.org/docs/latest/tracking.html#automatic-logging) allows you to log metrics, parameters, and models without explicit log statements.
SynapseML supports autologging for every model in the library.

Install the SynapseML library by following this [guide](https://microsoft.github.io/SynapseML/docs/getting_started/installation/).

The default MLflow [log_model_allowlist file](https://github.com/mlflow/mlflow/blob/master/mlflow/pyspark/ml/log_model_allowlist.txt) already includes some SynapseML models. To enable more models, you can either pass your own set of models to `mlflow.pyspark.ml.autolog(log_model_allowlist=YOUR_SET_OF_MODELS)`, or follow the guidance below to point the Spark configuration at a custom allowlist file.
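
For example, a minimal sketch of passing an allowlist directly to `autolog`. The class names below are illustrative placeholders, not a required set; use the fully qualified names of the models you want logged:

```python
import mlflow

# Sketch: pass an explicit allowlist instead of configuring a file.
# The model class names here are illustrative examples only.
mlflow.pyspark.ml.autolog(
    log_model_allowlist={
        "pyspark.ml.classification.LogisticRegressionModel",
        "synapse.ml.nn.ConditionalKNNModel",
    }
)
```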
 9  
10  To enable autologging with your custom log_model_allowlist file:
11  
12  1. Put your customized log_model_allowlist file at a place that your code has access to. ([SynapseML official log_model_allowlist file](https://mmlspark.blob.core.windows.net/publicwasb/log_model_allowlist.txt))
13     For example:
14  
15  - In Synapse `wasb://<containername>@<accountname>.blob.core.windows.net/PATH_TO_YOUR/log_model_allowlist.txt`
16  - In Databricks `/dbfs/FileStore/PATH_TO_YOUR/log_model_allowlist.txt`.
17  
18  2. Set spark configuration `spark.mlflow.pysparkml.autolog.logModelAllowlistFile` to the path of your `log_model_allowlist.txt` file.
19  3. Call `mlflow.pyspark.ml.autolog()` before your training code to enable autologging for all supported models.
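
If you build the Spark session yourself (on managed clusters such as Databricks or Synapse, set this in the cluster's Spark configuration instead), one way to set the property is at session build time. A minimal sketch; the file path is a placeholder:

```python
from pyspark.sql import SparkSession

import mlflow

# Sketch: point autologging at a custom allowlist file when constructing
# the session yourself. The path below is a placeholder.
spark = (
    SparkSession.builder.config(
        "spark.mlflow.pysparkml.autolog.logModelAllowlistFile",
        "/dbfs/FileStore/PATH_TO_YOUR/log_model_allowlist.txt",
    ).getOrCreate()
)

mlflow.pyspark.ml.autolog()  # enable autologging before training
```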

Note:

If you want autologging to cover PySpark models that aren't present in the log_model_allowlist file, you can add them to the file.
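
The allowlist is a plain-text file with one fully qualified model class name per line. For illustration only (these entries are examples, not the official list):

```
pyspark.ml.classification.LogisticRegressionModel
pyspark.ml.regression.LinearRegressionModel
synapse.ml.nn.ConditionalKNNModel
```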

## Example configuration process in Databricks

1. Install the latest MLflow via `%pip install -U mlflow`
2. Upload your customized `log_model_allowlist.txt` file to DBFS by clicking the File/Upload Data button in the Databricks UI.
3. Set the cluster's Spark configuration following [this documentation](https://docs.microsoft.com/en-us/azure/databricks/clusters/configure#spark-configuration):

```
spark.mlflow.pysparkml.autolog.logModelAllowlistFile /dbfs/FileStore/PATH_TO_YOUR/log_model_allowlist.txt
```

4. Run the following lines before your training code executes:

```python
import mlflow

mlflow.pyspark.ml.autolog()
```

You can customize how autologging works by supplying appropriate [parameters](https://www.mlflow.org/docs/latest/python_api/mlflow.pyspark.ml.html#mlflow.pyspark.ml.autolog).
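
For example, a sketch using a few of the documented options (check the linked parameter reference for the full list in your MLflow version):

```python
import mlflow

# Sketch: a few documented autolog options; see the parameter reference
# linked above for your MLflow version.
mlflow.pyspark.ml.autolog(
    log_models=True,                 # log fitted models as artifacts
    log_input_examples=False,        # skip logging sample input rows
    log_model_signatures=True,       # record model input/output schemas
    log_post_training_metrics=True,  # capture metrics computed after fit()
)
```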
44  
45  5. To find your experiment's results via the `Experiments` tab of the MLflow UI.
46     <img src="https://mmlspark.blob.core.windows.net/graphics/adb_experiments.png" width="1200" />
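
You can also fetch logged runs programmatically. A minimal sketch using `mlflow.search_runs`; the experiment name is a placeholder:

```python
import mlflow

# Sketch: query logged runs as a pandas DataFrame. Replace the experiment
# name with your own; the value below is a placeholder.
runs = mlflow.search_runs(experiment_names=["/Users/me@example.com/my-experiment"])
print(runs[["run_id", "status"]].head())
```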

## Example for ConditionalKNNModel

```python
from pyspark.ml.linalg import Vectors
from synapse.ml.nn import ConditionalKNN

# Training data: a feature vector, a string value, and an integer label per row.
df = spark.createDataFrame(
    [
        (Vectors.dense(2.0, 2.0, 2.0), "foo", 1),
        (Vectors.dense(2.0, 2.0, 4.0), "foo", 3),
        (Vectors.dense(2.0, 2.0, 6.0), "foo", 4),
        (Vectors.dense(2.0, 2.0, 8.0), "foo", 3),
        (Vectors.dense(2.0, 2.0, 10.0), "foo", 1),
        (Vectors.dense(2.0, 2.0, 12.0), "foo", 2),
        (Vectors.dense(2.0, 2.0, 14.0), "foo", 0),
        (Vectors.dense(2.0, 2.0, 16.0), "foo", 1),
        (Vectors.dense(2.0, 2.0, 18.0), "foo", 3),
        (Vectors.dense(2.0, 2.0, 20.0), "foo", 0),
        (Vectors.dense(2.0, 4.0, 2.0), "foo", 2),
        (Vectors.dense(2.0, 4.0, 4.0), "foo", 4),
        (Vectors.dense(2.0, 4.0, 6.0), "foo", 2),
        (Vectors.dense(2.0, 4.0, 8.0), "foo", 2),
        (Vectors.dense(2.0, 4.0, 10.0), "foo", 4),
        (Vectors.dense(2.0, 4.0, 12.0), "foo", 3),
        (Vectors.dense(2.0, 4.0, 14.0), "foo", 2),
        (Vectors.dense(2.0, 4.0, 16.0), "foo", 1),
        (Vectors.dense(2.0, 4.0, 18.0), "foo", 4),
        (Vectors.dense(2.0, 4.0, 20.0), "foo", 4),
    ],
    ["features", "values", "labels"],
)

# Fitting the model creates an MLflow run when autologging is enabled.
cnn = ConditionalKNN().setOutputCol("prediction")
cnnm = cnn.fit(df)

# Test data includes a "conditioner" column restricting which labels may match.
test_df = spark.createDataFrame(
    [
        (Vectors.dense(2.0, 2.0, 2.0), "foo", 1, [0, 1]),
        (Vectors.dense(2.0, 2.0, 4.0), "foo", 4, [0, 1]),
        (Vectors.dense(2.0, 2.0, 6.0), "foo", 2, [0, 1]),
        (Vectors.dense(2.0, 2.0, 8.0), "foo", 4, [0, 1]),
        (Vectors.dense(2.0, 2.0, 10.0), "foo", 4, [0, 1]),
    ],
    ["features", "values", "labels", "conditioner"],
)

display(cnnm.transform(test_df))
```

This code should log one run with a ConditionalKNNModel artifact and its parameters.
<img src="https://mmlspark.blob.core.windows.net/graphics/autologgingRunSample.png" width="1200" />
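
Once the run is logged, one way to reload the model is via MLflow's Spark flavor. A sketch assuming the logged artifact path is `model`; verify the actual path in the run's artifact listing:

```python
import mlflow

# Sketch: reload the logged model. "model" is an assumed artifact path;
# confirm it in the run's artifacts before use.
run_id = "YOUR_RUN_ID"
loaded = mlflow.spark.load_model(f"runs:/{run_id}/model")
display(loaded.transform(test_df))
```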