# Contributing to Haystack

First off, thanks for taking the time to contribute! :blue_heart:

All types of contributions are encouraged and valued. See the [Table of Contents](#table-of-contents)
for different ways to help and details about how this project handles them. Please make sure to read
the relevant section before making your contribution. It will make it a lot easier for us maintainers
and smooth out the experience for all involved. The community looks forward to your contributions!

> [!TIP]
> If you like Haystack but just don't have time to contribute, that's fine. There are other easy ways to support the
> project and show your appreciation: star this repository ⭐, mention Haystack at local meetups and tell your
> friends/colleagues, or share what you build and tag [Haystack on X (Twitter)](https://x.com/Haystack_ai) and
> [Haystack on LinkedIn](https://www.linkedin.com/showcase/haystack-ai-framework). We'd love to see it!

## Your first PR — high-level to-do list

Use this checklist to stay on track for your first code PR:

- **Pick an issue**: Choose one labeled [good first issue](https://github.com/deepset-ai/haystack/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) or [contributions wanted](https://github.com/deepset-ai/haystack/issues?q=is%3Aissue%20state%3Aopen%20label%3A"Contributions%20wanted!"). Avoid issues marked or commented as [handled internally](#issues-not-open-for-external-contributions).
- **Fork and clone**: [Clone the repository](#clone-the-git-repository), run `pre-commit install`, and create a branch.
- **Set up and run**: [Set up your development environment](#setting-up-your-development-environment), run unit tests with `hatch run test:unit` and run quality checks with `hatch run test:types` and `hatch run fmt`.
- **Implement and test**: Make your changes, add or update tests as needed, and ensure tests and pre-commit checks pass locally.
- **Documentation**: If your change adds or alters user-facing behavior, add a new docs page or update the relevant one in `docs-website/` (edit under `docs/` for the next release; add new pages to `sidebars.js`). See the [Documentation Contributing Guide](docs-website/CONTRIBUTING.md) for where to edit, frontmatter, and navigation.
- **Release notes**: Add a release note under `releasenotes/notes` with `hatch run release-note your-change-name` (see [Release notes](#release-notes)); maintainers can add `ignore-for-release-notes` for tests-only or CI-only changes.
- **Open the PR**: Use a [conventional commit](https://www.conventionalcommits.org/en/v1.0.0/) title, fill the [PR template](.github/pull_request_template.md), and if the PR was fully AI-generated, add a [short disclaimer](#using-ai-assistants-to-contribute). Enable "Allow edits and access to secrets by maintainers" on the PR.
- **Sign the CLA**: A [Contributor Licence Agreement (CLA)](https://cla-assistant.io/deepset-ai/haystack) is required for all contributions. Sign when prompted so your PR is ready for review (see [CLA](#contributor-licence-agreement-cla)).
- **Once the PR is open**: Fix any [CI](#ci-continuous-integration) failures and address review feedback.
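
The conventional commit title mentioned in the checklist can be sanity-checked before you open the PR. The following is a minimal sketch, not the check Haystack's CI runs: the helper name `parse_pr_title` is hypothetical, and restricting the type to lowercase letters is an assumption of this sketch (the Conventional Commits specification itself is more permissive).

```python
import re
from typing import Optional

# Basic header shape from Conventional Commits v1.0.0:
#   <type>[optional (scope)][optional !]: <description>
# Assumption of this sketch: the type is written in lowercase letters.
TITLE_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)"            # e.g. feat, fix, docs
    r"(?:\((?P<scope>[^()]+)\))?"   # optional scope in parentheses
    r"(?P<breaking>!)?"             # optional breaking-change marker
    r": (?P<description>\S.*)$"     # colon, one space, non-empty description
)


def parse_pr_title(title: str) -> Optional[dict]:
    """Return the parts of a conventional-commit title, or None if it does not match."""
    match = TITLE_PATTERN.match(title)
    if match is None:
        return None
    parts = match.groupdict()
    # Turn the optional "!" into a boolean flag.
    parts["breaking"] = parts["breaking"] is not None
    return parts


if __name__ == "__main__":
    for candidate in ("feat: add retry option", "Update README"):
        print(candidate, "->", parse_pr_title(candidate))
```

A title that returns `None` here is worth rewording before you submit, although the project's own tooling remains the authority on what is accepted.
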

**Table of Contents**

- [Contributing to Haystack](#contributing-to-haystack)
  - [Your first PR — high-level to-do list](#your-first-pr--high-level-to-do-list)
  - [Code of Conduct](#code-of-conduct)
  - [I Have a Question](#i-have-a-question)
  - [Reporting Bugs](#reporting-bugs)
    - [Before Submitting a Bug Report](#before-submitting-a-bug-report)
    - [How Do I Submit a Good Bug Report?](#how-do-i-submit-a-good-bug-report)
  - [Suggesting Enhancements](#suggesting-enhancements)
    - [Before Submitting an Enhancement](#before-submitting-an-enhancement)
    - [How Do I Submit a Good Enhancement Suggestion?](#how-do-i-submit-a-good-enhancement-suggestion)
  - [Contributing to Documentation](#contributing-to-documentation)
  - [Contribute code](#contribute-code)
    - [Where to start](#where-to-start)
    - [Issues not open for external contributions](#issues-not-open-for-external-contributions)
    - [Example high-quality contributions](#example-high-quality-contributions)
    - [Using AI assistants to contribute](#using-ai-assistants-to-contribute)
    - [Setting up your development environment](#setting-up-your-development-environment)
    - [Clone the git repository](#clone-the-git-repository)
    - [Run the tests locally](#run-the-tests-locally)
    - [Run code quality checks locally](#run-code-quality-checks-locally)
  - [Requirements for Pull Requests](#requirements-for-pull-requests)
    - [Release notes](#release-notes)
  - [CI (Continuous Integration)](#ci-continuous-integration)
  - [Working from GitHub forks](#working-from-github-forks)
  - [Writing tests](#writing-tests)
    - [Unit test](#unit-test)
    - [Integration test](#integration-test)
    - [End to End (e2e) test](#end-to-end-e2e-test)
    - [Slow/unstable integration tests (for maintainers)](#slowunstable-integration-tests-for-maintainers)
  - [Contributor Licence Agreement (CLA)](#contributor-licence-agreement-cla)

## Code of Conduct

This project and everyone participating in it is governed by our [Code of Conduct](code_of_conduct.txt).
By participating, you are expected to uphold this code. Please report unacceptable behavior to haystack@deepset.ai.

## I Have a Question

Before you ask a question, it is best to search for existing [Issues](https://github.com/deepset-ai/haystack/issues) that might help you. If you
find a suitable issue and still need clarification, you can write your question in that issue. It is also advisable to
search the internet for answers first.

If you still need clarification after that, you can ask on [Haystack's Discord Server](https://discord.com/invite/xYvH6drSmA).

## Reporting Bugs

### Before Submitting a Bug Report

A good bug report shouldn't leave others needing to chase you up for more information. Therefore, we ask you to
investigate carefully, collect information, and describe the issue in detail in your report. Please complete the
following steps in advance to help us fix any potential bugs as fast as possible.

- Make sure that you are using the latest version.
- Determine if your bug is really a bug and not an error on your side, for example, using incompatible versions.
  Make sure that you have read the [documentation](https://docs.haystack.deepset.ai/docs/intro). If you are looking
  for support, you might want to check [this section](#i-have-a-question).
- To see if other users have experienced (and potentially already solved) the same issue you are having, check whether
  a bug report for your bug or error already exists in the [bug tracker](https://github.com/deepset-ai/haystack/issues).
- Also make sure to search the internet (including Stack Overflow) to see if users outside of the GitHub community have
  discussed the issue.
- Collect information about the bug:
  - OS, Platform and Version (Windows, Linux, macOS, x86, ARM)
  - Version of Haystack and the integrations you're using
  - Possibly your input and the output
  - If you can reliably reproduce the issue, a snippet of code we can use

### How Do I Submit a Good Bug Report?

> [!IMPORTANT]
> You must never report security-related issues, vulnerabilities, or bugs, including sensitive information, to the issue tracker, or elsewhere in public. Instead, sensitive bugs must be reported using [this link](https://github.com/deepset-ai/haystack/security/advisories/new).

We use GitHub issues to track bugs and errors. If you run into an issue with the project:

- Open an [Issue of type Bug Report](https://github.com/deepset-ai/haystack/issues/new?assignees=&labels=bug&projects=&template=bug_report.md&title=).
- Explain the behavior you would expect and the actual behavior.
- Please provide as much context as possible and describe the *reproduction steps* that someone else can follow to
  recreate the issue on their own. This usually includes your code. For good bug reports, you should isolate the problem
  and create a reduced test case.
- Provide the information you collected in the previous section.

Once it's filed:

- The project team will label the issue accordingly.
- A team member will try to reproduce the issue with your provided steps. If there are no reproduction steps or no
  obvious way to reproduce the issue, the team will ask you for those steps.
- If the team is able to reproduce it, the issue will be scheduled for a fix or left to be
  [picked up by a community contributor](https://github.com/deepset-ai/haystack/issues?q=is%3Aissue%20state%3Aopen%20label%3A"Contributions%20wanted!").
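
The environment details requested under "Before Submitting a Bug Report" can be gathered with a few lines of Python. The following is a minimal sketch, not an official Haystack tool: the helper name `collect_environment` is hypothetical, and it assumes Haystack is installed under the PyPI distribution name `haystack-ai`. Pass the names of any integration packages you use to include their versions too.

```python
import platform
from importlib import metadata


def collect_environment(packages=("haystack-ai",)):
    """Gather OS, architecture, Python, and package versions for a bug report."""
    info = {
        "os": platform.system(),            # e.g. Linux, Darwin, Windows
        "os_release": platform.release(),
        "machine": platform.machine(),      # e.g. x86_64, arm64
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

You can paste the printed lines directly into the bug report alongside your reproduction snippet.
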

## Suggesting Enhancements

This section guides you through submitting an enhancement suggestion, including new integrations and improvements
to existing ones. Following these guidelines will help maintainers and the community to understand your suggestion and
find related suggestions.

### Before Submitting an Enhancement

- Make sure that you are using the latest version.
- Read the [documentation](https://docs.haystack.deepset.ai/docs/intro) carefully and find out if the functionality
  is already covered, possibly via particular configuration parameters.
- Perform a [search](https://github.com/deepset-ai/haystack/issues) to see if the enhancement has already been suggested. If it has, add a comment to the
  existing issue instead of opening a new one.
- Find out whether your idea fits with the scope and aims of the project. It's up to you to make a strong case to
  convince the project's developers of the merits of this feature. Keep in mind that we want features that will be
  useful to the majority of our users and not just a small subset. If you're just targeting a minority of users,
  consider writing and distributing the integration on your own.

### How Do I Submit a Good Enhancement Suggestion?

Enhancement suggestions are tracked as GitHub issues of type [Feature request](https://github.com/deepset-ai/haystack/issues/new?template=feature_request.md).

- Use a **clear and descriptive title** for the issue to identify the suggestion.
- Fill in the issue following the template.

## Contributing to Documentation

If you'd like to improve the documentation by fixing errors, clarifying explanations, adding examples, or creating new guides, see the [Documentation Contributing Guide](docs-website/CONTRIBUTING.md).

## Contribute code

> [!IMPORTANT]
> When contributing to this project, you must agree that you have authored or carefully reviewed 100% of the content, that you have the necessary rights to the content and that the content you contribute may be provided under the project license.

### Where to start

If this is your first code contribution, a good starting point is looking for an open issue that's marked with the label
["good first issue"](https://github.com/deepset-ai/haystack/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
The core contributors periodically mark certain issues as good for first-time contributors. Those issues are usually
limited in scope, easily fixable and low priority, so there is absolutely no reason why you should not try fixing them.
It's a good excuse to start looking into the project and a safe space to experiment and fail: if you can't get the
hang of one, pick another! Once you become comfortable contributing to Haystack, you can have a look at the
list of issues marked as [contributions wanted](https://github.com/orgs/deepset-ai/projects/14/views/1) to look for your
next contribution!

### Issues not open for external contributions

Some issues are handled internally by the core team and are **not open for external contributions**. You may see a
comment on such issues like:

> 👋 Hello there! This issue will be handled internally and isn't open for external contributions. If you'd like to contribute, please take a look at issues labeled **contributions welcome** or **good first issue**. We'd really appreciate it!

> [!WARNING]
> **Please do not open pull requests for issues that are marked or commented as handled internally.** Your work may not be merged.
> Instead, look for issues labeled [good first issue](https://github.com/deepset-ai/haystack/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) or [contributions wanted](https://github.com/deepset-ai/haystack/issues?q=is%3Aissue%20state%3Aopen%20label%3A"Contributions%20wanted!"). We'd love your help there!

### Example high-quality contributions

Looking at strong pull requests is a great way to learn our standards. Example high-quality PRs: [#9270](https://github.com/deepset-ai/haystack/pull/9270), [#9227](https://github.com/deepset-ai/haystack/pull/9227), [#9271](https://github.com/deepset-ai/haystack/pull/9271), [#8648](https://github.com/deepset-ai/haystack/pull/8648), [#8767](https://github.com/deepset-ai/haystack/pull/8767). Use them as references for structure, testing, documentation, and how to describe changes in the PR description and release notes.

### Using AI assistants to contribute

You may use AI assistants or agents to help you implement a contribution. Please use them wisely:

- **Review and understand** all generated code before submitting. You are responsible for the contribution.
- **Run tests and checks** locally (e.g. `hatch run test:unit`, `hatch run fmt`) so your PR meets our quality bar.
- **If your PR was fully AI-generated**, add a short disclaimer in the PR description, for example: *"This PR was
  fully generated with an AI assistant. I have reviewed the changes and run the relevant tests."*

This helps maintainers and keeps the project ready for both human and AI contributors.

### Setting up your development environment

*To run Haystack tests locally, ensure your development environment uses Python >=3.10 and <3.14.*

Haystack makes heavy use of [Hatch](https://hatch.pypa.io/latest/), a Python project manager that we use to set up the
virtual environments, build the project, and publish packages.
As you can imagine, the first step towards becoming a
Haystack contributor is installing Hatch. There are a variety of installation methods depending on your operating system
platform, version, and personal taste: please have a look at [this page](https://hatch.pypa.io/latest/install/#installation)
and keep reading once you can run the following from your terminal:

```console
$ hatch --version
Hatch, version 1.14.1
```

To create a new virtual environment for Haystack with `hatch`, run:

```console
$ hatch shell
```

### Clone the git repository

You won't be able to make changes directly to this repo, so the first step is to [create a fork](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo).
Once your fork is ready, you can clone a local copy with:

```console
$ git clone https://github.com/YOUR-USERNAME/haystack
```

If everything worked, you should be able to do something like this (the output might be different):

```console
$ cd haystack

$ hatch version
2.3.0-rc0
```

Last, enter the virtual environment:

```console
$ hatch shell
```

and install the pre-commit hooks:

```console
$ pre-commit install
```

Note: It is important to run `pre-commit install` inside the virtual environment created with `hatch shell`. If you don't, you'll get an error message like this: `pre-commit: command not found`.

pre-commit will run some tasks right before all `git commit` operations.
From now on, your `git commit` output for Haystack should look something like this:

```console
> git commit -m "test"
check python ast.........................................................Passed
check json...........................................(no files to check)Skipped
check for merge conflicts................................................Passed
check that scripts with shebangs are executable..........................Passed
check toml...........................................(no files to check)Skipped
check yaml...........................................(no files to check)Skipped
fix end of files.........................................................Passed
mixed line ending........................................................Passed
don't commit to branch...................................................Passed
trim trailing whitespace.................................................Passed
ruff.....................................................................Passed
codespell................................................................Passed
Lint GitHub Actions workflow files...................(no files to check)Skipped
[massi/contrib d18a2577] test
 2 files changed, 178 insertions(+), 45 deletions(-)
```

### Run the tests locally

Tests will automatically run in our CI for every commit you push to your PR on GitHub. In order to save precious CI time, we encourage you to run the tests locally before pushing new commits to GitHub. From the root of the git repository, you can run all the unit tests like this:

```sh
hatch run test:unit
```

Hatch will create a dedicated virtual environment, sync the required dependencies and run all the unit tests from the
project.
If you want to run a subset of the tests or even one test in particular, `hatch` will accept all the
options you would normally pass to `pytest`, for example:

```sh
# run one test method from a specific test class in a test file
hatch run test:unit test/test_logging.py::TestSkipLoggingConfiguration::test_skip_logging_configuration
```

### Run code quality checks locally

We also use tools to ensure consistent code style, quality, and static type checking. The quality of your code will be
tested by the CI, but once again, running the checks locally will speed up the review cycle.

To check for static type errors, run:
```sh
hatch run test:types
```

To format your code and perform linting using Ruff (with automatic fixes), run:
```sh
hatch run fmt
```

## Requirements for Pull Requests

To ease the review process, please follow the instructions in this section when creating a Pull Request:

- For the title, use the [conventional commit convention](https://www.conventionalcommits.org/en/v1.0.0/).
- For the body, follow the existing [pull request template](https://github.com/deepset-ai/haystack/blob/main/.github/pull_request_template.md) to describe and document your changes.
- If you used an AI assistant and the PR was **fully AI-generated**, include a brief disclaimer in the PR description
  (see [Using AI assistants to contribute](#using-ai-assistants-to-contribute)).

### Release notes

Each PR must include a release notes file under the `releasenotes/notes` path created with `reno`, and a CI check will
fail if that's not the case. Pull requests with changes limited to tests, code comments or docstrings, and changes to
the CI/CD systems can be labeled with `ignore-for-release-notes` by a maintainer in order to bypass the CI check.
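
The rule above can be expressed as a small check over the files a PR touches. This is only a sketch of the rule as stated in prose; how the actual CI check is implemented is not described here, and the function name `needs_release_note` and the `.yaml` suffix test are assumptions made for illustration.

```python
from pathlib import PurePosixPath

NOTES_DIR = PurePosixPath("releasenotes/notes")
SKIP_LABEL = "ignore-for-release-notes"


def needs_release_note(changed_files, labels=()):
    """Return True if the change set still lacks a release note."""
    # A maintainer-applied label bypasses the requirement.
    if SKIP_LABEL in labels:
        return False
    for path in changed_files:
        candidate = PurePosixPath(path)
        # Assumption: release notes are YAML files directly under NOTES_DIR.
        if candidate.parent == NOTES_DIR and candidate.suffix == ".yaml":
            return False
    return True


if __name__ == "__main__":
    # Paths are relative to the repository root, as git reports them.
    print(needs_release_note(["pyproject.toml"]))
```

Running this against the output of `git diff --name-only` for your branch gives a quick local hint before the CI check runs.
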

For example, if your PR is bumping the `transformers` version in the `pyproject.toml` file, that's something that
requires release notes. To create the corresponding file, from the root of the repo run:

```
$ hatch run release-note bump-transformers-to-4-31
```

A release notes file in YAML format will be created in the appropriate folder, appending a unique id to the name of the
release note you provided (in this case, `bump-transformers-to-4-31`). To add the actual content of the release notes,
you must edit the file that's just been created. In the file, you will find multiple sections along with an explanation
of what they're for. Remove all the sections that don't fit your release notes. In this case, for example,
you would fill in the `enhancements` section to describe the change:

```yaml
enhancements:
  - |
    Upgrade transformers to the latest version 4.31.0 so that Haystack can support the new Llama 2 models.
```

Each section of the YAML file must follow [reStructuredText formatting](https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html).

For inline code, use double backticks to wrap the code.
```
``OpenAIChatGenerator``
```

For code blocks, use the [code block directive](https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#directive-code-block).

```
.. code:: python

    from haystack.dataclasses import ChatMessage

    message = ChatMessage.from_user("Hello!")
    print(message.text)
```

You can now add the file to the same branch containing the code changes. Your release note will be part of your pull
request and reviewed along with any code you changed.

## CI (Continuous Integration)

We use GitHub Actions for our Continuous Integration tasks.
This means that as soon as you open a PR, GitHub will start
executing some workflows on your changes, like automated tests, linting, formatting, API docs generation, etc.

If all goes well, at the bottom of your PR page you should see something like this, where all checks are green.



If you see some red checks (like the following), then something didn't work, and action is needed on your side.



Click on the failing test and see if there are instructions at the end of the logs of the failed test.
For example, in the case above, the CI will give you instructions on how to fix the issue.



## Working from GitHub forks

To help maintainers, we usually ask contributors to grant us push access to their fork.

To do so, please verify that "Allow edits and access to secrets by maintainers" on the PR preview page is checked
(you can check it later on the PR's sidebar once it's created).



## Writing tests

We formally define three scopes for tests in Haystack with different requirements and purposes:

### Unit test
- Tests a single logical concept
- Execution time is a few milliseconds
- Any external resource is mocked
- Always returns the same result
- Can run in any order
- Runs at every commit in PRs, automated through `hatch run test:unit`
- Can run locally with no additional setup
- **Goal: being confident in merging code**

### Integration test
- Tests a single logical concept
- Execution time is a few seconds
- It uses external resources that must be available before execution
- When using models, cannot use inference
- Always returns the same result or an error
- Can run in any order
- Runs at every commit in PRs, automated through `hatch run test:integration`
- Can run locally with some additional setup (e.g.
  Docker)
- **Goal: being confident in merging code**

### End to End (e2e) test
- Tests a sequence of multiple logical concepts
- Execution time has no limits (can be always on)
- Can use inference
- Evaluates the results of the execution or the status of the system
- It uses external resources that must be available before execution
- Can return different results
- Can be dependent on the order
- Can be wrapped into any process execution
- Runs outside the development cycle (nightly or on demand)
- Might not be possible to run locally due to system and hardware requirements
- **Goal: being confident in releasing Haystack**

### Slow/unstable Integration Tests (for maintainers)

To keep the CI stable and reasonably fast, we run certain tests in a separate workflow.

We use `@pytest.mark.slow` for tests that clearly meet one or more of the following conditions:
- Unstable (such as calls to unstable external services)
- Slow (such as model inference on CPU)
- Require special setup (such as installing system dependencies or running Docker containers)

⚠️ The main goal of this separation is to keep the regular integration tests fast and **stable**.

We should try to avoid including too many modules in the Slow Integration Tests workflow: doing so may reduce its effectiveness.

#### How does it work?

These tests are executed by the [Slow Integration Tests workflow](.github/workflows/slow.yml).

The workflow always runs, but the tests only execute when:

- There are changes to relevant files (as listed in the [workflow file](.github/workflows/slow.yml)).
  **Important**: If you mark a test but do not include both the test file and the file to be tested in the list, the test won't run automatically.
- The workflow is scheduled (runs nightly).
- The workflow is triggered manually (with the "Run workflow" button on [this page](https://github.com/deepset-ai/haystack/actions/workflows/slow.yml)).
- The PR has the "run-slow-tests" label (you can use this label to trigger the tests even if no relevant files are changed).
- The push is to a release branch.

If none of the above conditions are met, the workflow completes successfully without running tests, which satisfies the Branch Protection rules.

*Hatch commands for running Integration Tests*:
- `hatch run test:integration` runs all integration tests (fast + slow).
- `hatch run test:integration-only-fast` skips the slow tests.
- `hatch run test:integration-only-slow` runs only slow tests.

## Contributor Licence Agreement (CLA)

Significant contributions to Haystack require a Contributor License Agreement (CLA). If the contribution requires a CLA,
we will get in contact with you. CLAs are quite common among company-backed open-source frameworks, and our CLA’s wording
is similar to other popular projects, like [Rasa](https://cla-assistant.io/RasaHQ/rasa) or
[Google's TensorFlow](https://cla.developers.google.com/clas/new?domain=DOMAIN_GOOGLE&kind=KIND_INDIVIDUAL)
(retrieved 4th November 2021).

The agreement's main purpose is to protect the continued open use of Haystack. At the same time, it also helps in
protecting you as a contributor. Contributions under this agreement will ensure that your code will continue to be
open to everyone in the future (“You hereby grant to Deepset **and anyone** [...]”) as well as remove liabilities on
your end (“you provide your Contributions on an AS IS basis, without warranties or conditions of any kind [...]”). You
can find the Contributor Licence Agreement [here](https://cla-assistant.io/deepset-ai/haystack).

If you have further questions about the licensing, feel free to reach out to contributors@deepset.ai.