  1  import { Table } from "@site/src/components/Table";
  2  
  3  # Usage Tracking
  4  
  5  Starting with version **3.2.0**, MLflow collects anonymized usage data by default. This data contains no sensitive or personally identifiable information.
  6  
  7  :::important
  8  MLflow does not collect any data that contains personal information, in accordance with GDPR and other privacy regulations.
  9  As a Linux Foundation project, MLflow adheres to the [**LF telemetry data collection and usage policy**](https://lfprojects.org/policies/telemetry-data-policy/).
 10  This implementation has been reviewed and approved by the Linux Foundation, and the approved proposal is documented in the [**Completed Reviews**](https://lfprojects.org/policies/telemetry-data-policy/) section
 11  of the official policy. See the [Data Explanation section](#data-explanation) below for details on what is collected.
 12  :::
 13  
 14  :::note
 15  Telemetry is only enabled in **Open Source MLflow**. If you're using MLflow through a managed service or distribution,
 16  please consult your vendor to determine whether telemetry is enabled in your environment.
 17  In all cases, you can choose to opt out by following the guidance provided in our documentation.
 18  :::
 19  
 20  ## Why is data being collected?
 21  
 22  MLflow uses anonymous telemetry to understand feature usage, which helps guide development priorities and improve the library.
 23  This data helps us identify which features are most valuable and where to focus on bug fixes or enhancements.
 24  
 25  ### GDPR Compliance
 26  
 27  Under the General Data Protection Regulation (GDPR), data controllers and processors are responsible for handling personal data with care, transparency, and accountability.
 28  
 29  MLflow complies with GDPR in the following ways:
 30  
 31  - **No Personal Data Collected**: The telemetry data collected is fully anonymized and does not include any personal or sensitive information (e.g., usernames, IP addresses, file names, parameters, or model content). MLflow generates a random UUID for each session for aggregating usage events, which cannot be used to identify or track individual users.
 32  - **Purpose Limitation**: Data is only used to improve the MLflow project based on aggregate feature usage patterns.
 33  - **Data Minimization**: Only the minimum necessary metadata is collected to inform project priorities (e.g., feature toggle state, SDK/platform used, version info).
 34  - **User Control**: Users can opt out of telemetry at any time by setting the environment variable **MLFLOW_DISABLE_TELEMETRY=true** or **DO_NOT_TRACK=true**. MLflow respects these settings immediately without requiring a restart.
 35  - **Transparency**: Telemetry endpoints and behavior are documented publicly, and MLflow users can inspect or block the relevant network calls.
 36  
 37  For further inquiries or data protection questions, users can file an issue on the [MLflow GitHub repository](https://github.com/mlflow/mlflow/issues).
 38  
 39  ## What data is collected?
 40  
 41  MLflow collects only non-sensitive, anonymized data to help us better understand usage patterns.
 42  The section below outlines the data collected in this version of MLflow. You can view the exact data collected [in the source code](https://github.com/mlflow/mlflow/blob/c71fd0d677c1806ba2d5928398435c4de2c25c0e/mlflow/telemetry/schemas.py).
 43  
 44  ### Data Explanation
 45  
 46  <Table>
 47    <thead>
 48      <tr>
 49        <th>Data Element</th>
 50        <th>Explanation</th>
 51        <th>Example</th>
 52        <th>Why we track this</th>
 53      </tr>
 54    </thead>
 55    <tbody>
 56      <tr>
 57        <td>Unique session ID</td>
 58        <td>A randomly generated, non-personally identifiable UUID is created for each session—defined as each time MLflow is imported</td>
 59        <td>45e2751243e84c7e87aca6ac25d75a0d</td>
 60        <td>To identify the data from the current MLflow session</td>
 61      </tr>
 62      <tr>
 63        <td>Unique installation ID</td>
 64        <td>A randomly generated, non-personally identifiable UUID is created for each installation—defined as each time MLflow is imported. Added in MLflow 3.7.0.</td>
 65        <td>45e2751243e84c7e87aca6ac25d75a0d</td>
 66        <td>To identify the data from the current MLflow installation</td>
 67      </tr>
 68      <tr>
 69        <td>Source SDK</td>
 70        <td>The name of the SDK currently in use</td>
 71        <td>mlflow | mlflow-skinny | mlflow-tracing</td>
 72        <td>To understand adoption of different MLflow SDKs and identify enhancement areas</td>
 73      </tr>
 74      <tr>
 75        <td>MLflow version</td>
 76        <td>The current SDK version</td>
 77        <td>3.2.0</td>
 78        <td>To identify version-specific usage patterns and support, bug fixes, or deprecation decisions</td>
 79      </tr>
 80      <tr>
 81        <td>Python version</td>
 82        <td>The current Python version</td>
 83        <td>3.10.16</td>
 84        <td>To ensure compatibility across Python versions and guide testing or upgrade recommendations</td>
 85      </tr>
 86      <tr>
 87        <td>Operating System</td>
 88        <td>The operating system on which MLflow is running</td>
 89        <td>macOS-15.4.1-arm64-arm-64bit</td>
 90        <td>To understand platform-specific usage and detect platform-dependent issues</td>
 91      </tr>
 92      <tr>
 93        <td>Tracking URI Scheme</td>
 94        <td>The scheme of the current tracking URI</td>
 95        <td>file | sqlite | mysql | postgresql | mssql | https | http | custom_scheme | None</td>
 96        <td>To determine which tracking backends are most commonly used and optimize backend support</td>
 97      </tr>
 98      <tr>
 99        <td>Event Name</td>
100        <td>The tracked event name (see the [table below](#tracked-events) for the events that are tracked)</td>
101        <td>create_experiment</td>
102        <td>To measure feature usage and improvements</td>
103      </tr>
104      <tr>
105        <td>Event Status</td>
106        <td>Whether the event succeeds or not</td>
107        <td>success | failure | unknown</td>
108        <td>To identify common failure points and improve reliability and error handling</td>
109      </tr>
110      <tr>
111        <td>Timestamp (nanoseconds)</td>
112        <td>Time when the event occurred</td>
113        <td>1753760188623715000</td>
114        <td>As an identifier for the event</td>
115      </tr>
116      <tr>
117        <td>Duration</td>
118        <td>The time the event call takes, in milliseconds</td>
119        <td>1000</td>
120        <td>To monitor performance trends and detect regressions in response time</td>
121      </tr>
122      <tr>
123        <td>Parameters (boolean or enumerated values)</td>
124        <td>See the [table below](#tracked-events) for the parameters collected for each event</td>
125        <td>create_logged_model event: `{"flavor": "langchain"}`</td>
126        <td>To better understand the usage pattern for each event</td>
127      </tr>
128    </tbody>
129  </Table>
130  
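As a rough illustration of why these fields carry no personal information, the sketch below builds three of them with the Python standard library: a random session ID, the scheme of a tracking URI, and a nanosecond timestamp. The helper names `new_session_id` and `tracking_uri_scheme` are hypothetical, and returning `None` for a URI without a scheme is an assumption. This is not MLflow's implementation, which lives in the linked source code.

```python
import time
import uuid
from typing import Optional
from urllib.parse import urlparse


def new_session_id() -> str:
    """Hypothetical helper: a random 32-character hex string, unrelated to the user or machine."""
    return uuid.uuid4().hex


def tracking_uri_scheme(uri: str) -> Optional[str]:
    """Hypothetical helper: keep only the URI scheme and drop host, credentials, and path."""
    scheme = urlparse(uri).scheme
    return scheme or None  # assumption: a URI without a scheme maps to None


session_id = new_session_id()
scheme = tracking_uri_scheme("postgresql://user:secret@db.example.com:5432/mlflow")
timestamp_ns = time.time_ns()  # integer nanoseconds since the Unix epoch

print(session_id, scheme, timestamp_ns)
```

In this sketch only the string `postgresql` survives from the tracking URI; the user name, password, host, and database name are discarded before anything is recorded.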
131  #### Tracked Events
132  
133  **No details about the specific model, code, or weights are collected.** Only the parameters listed in the `Tracked Parameters` column are recorded alongside the event.
134  For events with None in the `Tracked Parameters` column, only the event name is recorded. If the `MLFLOW_EXPERIMENT_ID` environment variable is set, it is tracked as a parameter.
135  For a comprehensive list of tracked events, please refer to the [source code](https://github.com/mlflow/mlflow/blob/005f8b18186d254286a7d258a564b414f0ee0f75/mlflow/telemetry/events.py).
136  
137  <Table>
138    <thead>
139      <tr>
140        <th style={{ width: "20%" }}>Event Name</th>
141        <th style={{ width: "40%" }}>Tracked Parameters</th>
142        <th style={{ width: "40%" }}>Example</th>
143      </tr>
144    </thead>
145    <tbody>
146      <tr>
147        <td>create_experiment</td>
148        <td>Created Experiment ID (random uuid or integer)</td>
149        <td>`{"experiment_id": "0"}`</td>
150      </tr>
151      <tr>
152        <td>create_run</td>
153        <td>Which packages among [MODULES_TO_CHECK_IMPORT](https://github.com/mlflow/mlflow/blob/c71fd0d677c1806ba2d5928398435c4de2c25c0e/mlflow/telemetry/constant.py#L19) are imported; the experiment ID used when creating the run</td>
154        <td>`{"imports": ["sklearn"], "experiment_id": "0"}`</td>
155      </tr>
156      <tr>
157        <td>create_logged_model</td>
158        <td>Flavor of the model (e.g. langchain, sklearn)</td>
159        <td>`{"flavor": "langchain"}`</td>
160      </tr>
161      <tr>
162        <td>get_logged_model</td>
163        <td>Which packages among [MODULES_TO_CHECK_IMPORT](https://github.com/mlflow/mlflow/blob/c71fd0d677c1806ba2d5928398435c4de2c25c0e/mlflow/telemetry/constant.py#L19) are imported</td>
164        <td>`{"imports": ["sklearn"]}`</td>
165      </tr>
166      <tr>
167        <td>create_registered_model</td>
168        <td>None</td>
169        <td>None</td>
170      </tr>
171      <tr>
172        <td>create_model_version</td>
173        <td>None</td>
174        <td>None</td>
175      </tr>
176      <tr>
177        <td>create_prompt</td>
178        <td>None</td>
179        <td>None</td>
180      </tr>
181      <tr>
182        <td>load_prompt</td>
183        <td>Whether alias is used</td>
184        <td>`{"uses_alias": True}`</td>
185      </tr>
186      <tr>
187        <td>start_trace</td>
188        <td>None</td>
189        <td>None</td>
190      </tr>
191      <tr>
192        <td>traces_received_by_server</td>
193        <td>Type of client (sanitized) that submitted the traces and number of completed traces received</td>
194        <td>`{"source": "MLFLOW_PYTHON_CLIENT", "count": 3}`</td>
195      </tr>
196      <tr>
197        <td>log_assessment</td>
198        <td>Type of the assessment and source</td>
199        <td>`{"type": "feedback", "source_type": "CODE"}`</td>
200      </tr>
201      <tr>
202        <td>evaluate</td>
203        <td>None</td>
204        <td>None</td>
205      </tr>
206      <tr>
207        <td>create_webhook</td>
208        <td>Entities of the webhook</td>
209        <td>`{"events": ["model_version.created"]}`</td>
210      </tr>
211      <tr>
212        <td>genai_evaluate</td>
213        <td>Builtin scorers used during GenAI Evaluate</td>
214        <td>`{"builtin_scorers": ["relevance_to_query"]}`</td>
215      </tr>
216      <tr>
217        <td>prompt_optimization</td>
218        <td>Optimizer type, number of prompts, and number of scorers</td>
219        <td>`{"optimizer_type": True, "prompt_count": 5, "scorer_count": 1}`</td>
220      </tr>
221      <tr>
222        <td>log_dataset</td>
223        <td>None</td>
224        <td>None</td>
225      </tr>
226      <tr>
227        <td>log_metric</td>
228        <td>Whether synchronous mode is on or not</td>
229        <td>`{"synchronous": False}`</td>
230      </tr>
231      <tr>
232        <td>log_param</td>
233        <td>Whether synchronous mode is on or not</td>
234        <td>`{"synchronous": True}`</td>
235      </tr>
236      <tr>
237        <td>log_batch</td>
238        <td>Information on whether metrics, parameters, or tags are logged, and the logging mode</td>
239        <td>`{"metrics": False, "params": True, "tags": False, "synchronous": False}`</td>
240      </tr>
241      <tr>
242        <td>invoke_custom_judge_model</td>
243        <td>Judge model provider</td>
244        <td>`{"model_provider": "databricks"}`</td>
245      </tr>
246      <tr>
247        <td>make_judge</td>
248        <td>Model provider (extracted from model string if format is provider:model)</td>
249        <td>`{"model_provider": "openai"}`</td>
250      </tr>
251      <tr>
252        <td>align_judge</td>
253        <td>Number of traces provided and optimizer type</td>
254        <td>`{"trace_count": 100, "optimizer_type": "AlignmentOptimizer"}`</td>
255      </tr>
256      <tr>
257        <td>autologging</td>
258        <td>Flavor and metadata</td>
259        <td>`{"flavor": "openai", "log_traces": True, "disable": False}`</td>
260      </tr>
261      <tr>
262        <td>ai_command_run</td>
263        <td>Command key and invocation context (cli or mcp)</td>
264        <td>`{"command_key": "genai/analyze_experiment", "context": "cli"}`</td>
265      </tr>
266      <tr>
267        <td>gateway_start</td>
268        <td>None</td>
269        <td>None</td>
270      </tr>
271      <tr>
272        <td>gateway_create_endpoint</td>
273        <td>Whether fallback config is set, routing strategy, and number of model configs</td>
274        <td>`{"has_fallback_config": true, "routing_strategy": "REQUEST_BASED_TRAFFIC_SPLIT", "num_model_configs": 2}`</td>
275      </tr>
276      <tr>
277        <td>gateway_update_endpoint</td>
278        <td>Whether fallback config is set, routing strategy, and number of model configs (null if not provided)</td>
279        <td>`{"has_fallback_config": false, "routing_strategy": "ROUND_ROBIN", "num_model_configs": 1}`</td>
280      </tr>
281      <tr>
282        <td>gateway_delete_endpoint</td>
283        <td>None</td>
284        <td>None</td>
285      </tr>
286      <tr>
287        <td>gateway_get_endpoint</td>
288        <td>None</td>
289        <td>None</td>
290      </tr>
291      <tr>
292        <td>gateway_list_endpoints</td>
293        <td>Whether filtering by provider</td>
294        <td>`{"filter_by_provider": true}`</td>
295      </tr>
296      <tr>
297        <td>gateway_create_secret</td>
298        <td>Provider name</td>
299        <td>`{"provider": "openai"}`</td>
300      </tr>
301      <tr>
302        <td>gateway_update_secret</td>
303        <td>None</td>
304        <td>None</td>
305      </tr>
306      <tr>
307        <td>gateway_delete_secret</td>
308        <td>None</td>
309        <td>None</td>
310      </tr>
311      <tr>
312        <td>gateway_list_secrets</td>
313        <td>Whether filtering by provider</td>
314        <td>`{"filter_by_provider": false}`</td>
315      </tr>
316      <tr>
317        <td>gateway_invocation</td>
318        <td>Whether streaming is enabled and the invocation type</td>
319        <td>`{"is_streaming": true, "invocation_type": "mlflow_chat_completions"}`</td>
320      </tr>
321      <tr>
322        <td>ui_event</td>
323        <td>A UI interaction event. See the [table below](#ui-interaction-metadata) for a description of the various metadata elements</td>
324        <td>`{ "eventType": "onClick", "componentViewId": "88fc9edd-5e9e-4a17-abd2-c543f505b8eb", "componentId": "mlflow.prompts.list.create", "componentType": "button", "timestamp_ns": 1765784028467000000 }`</td>
325      </tr>
326    </tbody>
327  </Table>
328  
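To make the "boolean or enumerated values" restriction concrete, here is a minimal sketch of how the `log_batch` parameters in the table could be derived from a call's arguments. The function name `log_batch_telemetry_params` and its signature are hypothetical and are not taken from MLflow's code; the point is only that the recorded payload says whether something was logged, never what was logged.

```python
from typing import Dict, Sequence


def log_batch_telemetry_params(
    metrics: Sequence, params: Sequence, tags: Sequence, synchronous: bool
) -> Dict[str, bool]:
    """Hypothetical helper: reduce the logged payload to presence flags only."""
    return {
        "metrics": bool(metrics),  # True if any metric was passed, values are dropped
        "params": bool(params),    # True if any parameter was passed
        "tags": bool(tags),        # True if any tag was passed
        "synchronous": synchronous,
    }


# A batch that logs one parameter asynchronously; the name and value never appear in the result.
recorded = log_batch_telemetry_params(
    metrics=[], params=[("learning_rate", "0.01")], tags=[], synchronous=False
)
print(recorded)
```

With these inputs the result matches the `log_batch` example row above: only the `params` flag is true.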
329  #### UI Interaction Metadata
330  
331  The table below describes the metadata that may be collected with a UI interaction event.
332  
333  <Table>
334    <thead>
335      <tr>
336        <th style={{ width: "20%" }}>Metadata Element</th>
337        <th style={{ width: "40%" }}>Explanation</th>
338        <th style={{ width: "40%" }}>Example</th>
339      </tr>
340    </thead>
341    <tbody>
342      <tr>
343        <td>Component ID of interactive UI elements</td>
344        <td>An ID string of an interactive element (e.g. button, switch, link, input field) in the UI. A log is generated upon clicking, typing, or otherwise interacting with such elements. A comprehensive list of component ID values can be found by [this search query](https://github.com/search?q=repo%3Amlflow%2Fmlflow%20componentId%3D&type=code).</td>
345        <td>`mlflow.prompts.list.create` (identifier for the "Create prompt" button on the prompts page)</td>
346      </tr>
347      <tr>
348        <td>Event type</td>
349        <td>An enumerated categorical value describing the nature of the interaction</td>
350        <td>`onView`, `onClick`, `onValueChange`</td>
351      </tr>
352      <tr>
353        <td>Component type</td>
354        <td>An enumerated categorical value describing the type of component that the interaction happened with</td>
355        <td>`button`, `alert`, `banner`, `radio`, `input`, ...</td>
356      </tr>
357      <tr>
358        <td>Component View ID</td>
359        <td>A randomly generated UUID that is regenerated whenever the UI element rerenders</td>
360        <td>`774db636-5cfa-4ce8-8f56-7e7126dc3439`</td>
361      </tr>
362      <tr>
363        <td>Timestamp</td>
364        <td>The client-side timestamp of when the interaction occurred</td>
365        <td>`1765789548484000`</td>
366      </tr>
367    </tbody>
368  </Table>
369  
370  ## Why is MLflow Telemetry Opt-Out?
371  
372  MLflow uses an opt-out telemetry model to help improve the platform for all users based on real-world usage patterns.
373  Collecting anonymous usage data by default allows us to:
374  
375  - Understand how MLflow is being used across a wide range of environments and workflows
376  - Identify common pain points and areas for feature improvement more effectively
377  - Measure the impact of changes and ensure they improve the experience for the broader community
378  
379  If telemetry were opt-in, only a small, self-selected subset of users would be represented, leading to biased insights and potentially misaligned priorities.
380  We are committed to transparency and user choice. Telemetry is clearly documented, anonymized, and can be easily disabled at any time through configuration.
381  This approach helps us make MLflow better for everyone while giving you full control. See the [What are we doing with this data?](#what-are-we-doing-with-this-data) section for more information.
382  
383  ## How to opt out?
384  
385  You can opt out of telemetry by setting either of the following environment variables:
386  
387  - **MLFLOW_DISABLE_TELEMETRY=true**
388  - **DO_NOT_TRACK=true**
389  
390  Setting either of these will **immediately disable telemetry**; there is no need to re-import MLflow or restart your session.
391  
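If you prefer to opt out from Python rather than from the shell, a minimal sketch is to set the variable on `os.environ`, which also covers child processes that inherit the environment. The helper `opt_out_requested` is hypothetical and only checks for the literal value `true` documented above; MLflow's own parsing of these variables may differ.

```python
import os

# The two opt-out variables named in this section.
OPT_OUT_VARS = ("MLFLOW_DISABLE_TELEMETRY", "DO_NOT_TRACK")


def opt_out_requested(environ=None) -> bool:
    """Hypothetical helper: True if either variable is set to the documented value "true"."""
    environ = os.environ if environ is None else environ
    return any(environ.get(name, "").strip().lower() == "true" for name in OPT_OUT_VARS)


# Opt out for the current process; setting one of the two variables is enough.
os.environ["MLFLOW_DISABLE_TELEMETRY"] = "true"

print(opt_out_requested())
```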
392  :::note
393  MLflow automatically disables telemetry in [**some CI environments**](https://github.com/mlflow/mlflow/blob/de6c11193ce6a68ffec4b33650f75bd163143178/mlflow/telemetry/utils.py#L22), including those listed below.
394  If you'd like support for additional CI environments, please [open an issue on our GitHub repository](https://github.com/mlflow/mlflow/issues).
395  
396  - CI
397  - GitHub Actions
398  - CircleCI
399  - GitLab CI/CD
400  - Jenkins Pipeline
401  - Travis CI
402  - Azure Pipelines
403  - Bitbucket
404  - AWS CodeBuild
405  - Buildkite
406  - ...
407    :::
408  
409  ### Scope of the setting
410  
411  - The environment variable only takes effect in processes where it is explicitly set or inherited.
412  - If you spawn subprocesses from a clean environment, they may not inherit your shell's environment variables, and telemetry could remain enabled, for example with `subprocess.run([...], env={})`.
413  - Setting this environment variable before running `mlflow server` also disables all UI telemetry.
414  
415  Recommendations to ensure telemetry is consistently disabled across all environments:
416  
417  - Add the variable to your shell startup file (~/.bashrc, ~/.zshrc, etc.): `export MLFLOW_DISABLE_TELEMETRY=true`
418  - If you're using subprocesses or isolated environments, use a dotenv manager or explicitly pass the variable when launching.
419  
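The inheritance behavior described above can be checked with the standard library alone. The sketch below launches two child Python processes: one whose environment lacks the opt-out variables, and one where `MLFLOW_DISABLE_TELEMETRY` is passed explicitly. The child only prints what it sees and does not import MLflow; the helper `run_child` is an illustrative name.

```python
import os
import subprocess
import sys

OPT_OUT_VARS = ("MLFLOW_DISABLE_TELEMETRY", "DO_NOT_TRACK")

# Child program: report whether the opt-out variable reached it.
CHILD_CODE = "import os; print(os.environ.get('MLFLOW_DISABLE_TELEMETRY', 'unset'))"


def run_child(env):
    """Run the child program with the given environment and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Simulate a launcher that does not forward the opt-out variables.
stripped_env = {k: v for k, v in os.environ.items() if k not in OPT_OUT_VARS}
without_opt_out = run_child(stripped_env)

# Pass the variable explicitly so the child is opted out regardless of the parent.
with_opt_out = run_child({**stripped_env, "MLFLOW_DISABLE_TELEMETRY": "true"})

print(without_opt_out, with_opt_out)
```

The first child reports the variable as unset and the second reports `true`, which is why passing the variable explicitly is the safer choice when a launcher builds its own environment.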
420  ### How to validate telemetry is disabled?
421  
422  Use the following code to verify that telemetry is disabled.
423  
424  ```python
425  from mlflow.telemetry import get_telemetry_client
426  
427  assert get_telemetry_client() is None, "Telemetry is enabled"
428  ```
429  
430  ### How to opt out for your organization?
431  
432  Aside from setting the environment variables described above, organizations can additionally opt out of telemetry by blocking network access to the `mlflow-telemetry.io` domain. When this domain is unreachable, telemetry will be disabled.
433  
434  ### Opting out of UI telemetry
435  
436  As described above, the admin of an MLflow server can set the `MLFLOW_DISABLE_TELEMETRY` or `DO_NOT_TRACK`
437  environment variables to disable UI telemetry globally for the server. However, if you are not
438  an admin (i.e. you have no ability to set environment variables), you can still personally opt
439  out of UI telemetry by visiting the "Settings" page in the MLflow UI (introduced in MLflow 3.8.0).
440  
441  Setting the toggle to "Off" will disable UI telemetry from your device, even if the admin has not
442  opted out server-side.
443  
444  ## What are we doing with this data?
445  
446  We aggregate anonymized usage data and plan to share insights with the community through public dashboards. You'll be able to see how MLflow features are used and help improve them by contributing.