# Contributing to Haystack

First off, thanks for taking the time to contribute! :blue_heart:

All types of contributions are encouraged and valued. See the Table of Contents below
for different ways to help and details about how this project handles them. Please make sure to read
the relevant section before making your contribution. It will make it a lot easier for us maintainers
and smooth out the experience for all involved. The community looks forward to your contributions!

> [!TIP]
> If you like Haystack but just don't have time to contribute, that's fine. There are other easy ways to support the
> project and show your appreciation: star this repository ⭐, mention Haystack at local meetups and tell your
> friends/colleagues, or share what you build and tag [Haystack on X (Twitter)](https://x.com/Haystack_ai) and
> [Haystack on LinkedIn](https://www.linkedin.com/showcase/haystack-ai-framework) — we'd love to see it!

## Your first PR — high-level to-do list

Use this checklist to stay on track for your first code PR:

- **Pick an issue** — Choose one labeled [good first issue](https://github.com/deepset-ai/haystack/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) or [contributions wanted](https://github.com/deepset-ai/haystack/issues?q=is%3Aissue%20state%3Aopen%20label%3A"Contributions%20wanted!"). Avoid issues marked or commented as [handled internally](#issues-not-open-for-external-contributions).
- **Fork and clone** — [Clone the repository](#clone-the-git-repository), run `pre-commit install`, and create a branch.
- **Set up and run** — [Set up your development environment](#setting-up-your-development-environment), run unit tests with `hatch run test:unit` and run quality checks with `hatch run test:types` and `hatch run fmt`.
- **Implement and test** — Make your changes, add or update tests as needed, and ensure tests and pre-commit checks pass locally.
- **Documentation** — If your change adds or alters user-facing behavior, add a new docs page or update the relevant one in `docs-website/` (edit under `docs/` for the next release; add new pages to `sidebars.js`). See the [Documentation Contributing Guide](docs-website/CONTRIBUTING.md) for where to edit, frontmatter, and navigation.
- **Release notes** — Add a release note under `releasenotes/notes` with `hatch run release-note your-change-name` (see [Release notes](#release-notes)); maintainers can add `ignore-for-release-notes` for tests-only or CI-only changes.
- **Open the PR** — Use a [conventional commit](https://www.conventionalcommits.org/en/v1.0.0/) title, fill in the [PR template](.github/pull_request_template.md), and if the PR was fully AI-generated, add a [short disclaimer](#using-ai-assistants-to-contribute). Enable "Allow edits and access to secrets by maintainers" on the PR.
- **Sign the CLA** — A [Contributor Licence Agreement (CLA)](https://cla-assistant.io/deepset-ai/haystack) is required for all contributions. Sign when prompted so your PR is ready for review (see [CLA](#contributor-licence-agreement-cla)).
- **Once the PR is open** — Fix any [CI](#ci-continuous-integration) failures and address review feedback.

**Table of Contents**

- [Contributing to Haystack](#contributing-to-haystack)
  - [Your first PR — high-level to-do list](#your-first-pr--high-level-to-do-list)
  - [Code of Conduct](#code-of-conduct)
  - [I Have a Question](#i-have-a-question)
  - [Reporting Bugs](#reporting-bugs)
    - [Before Submitting a Bug Report](#before-submitting-a-bug-report)
    - [How Do I Submit a Good Bug Report?](#how-do-i-submit-a-good-bug-report)
  - [Suggesting Enhancements](#suggesting-enhancements)
    - [Before Submitting an Enhancement](#before-submitting-an-enhancement)
    - [How Do I Submit a Good Enhancement Suggestion?](#how-do-i-submit-a-good-enhancement-suggestion)
  - [Contributing to Documentation](#contributing-to-documentation)
  - [Contribute code](#contribute-code)
    - [Where to start](#where-to-start)
    - [Issues not open for external contributions](#issues-not-open-for-external-contributions)
    - [Example high-quality contributions](#example-high-quality-contributions)
    - [Using AI assistants to contribute](#using-ai-assistants-to-contribute)
    - [Setting up your development environment](#setting-up-your-development-environment)
    - [Clone the git repository](#clone-the-git-repository)
    - [Run the tests locally](#run-the-tests-locally)
    - [Run code quality checks locally](#run-code-quality-checks-locally)
  - [Requirements for Pull Requests](#requirements-for-pull-requests)
    - [Release notes](#release-notes)
  - [CI (Continuous Integration)](#ci-continuous-integration)
  - [Working from GitHub forks](#working-from-github-forks)
  - [Writing tests](#writing-tests)
    - [Unit test](#unit-test)
    - [Integration test](#integration-test)
    - [End to End (e2e) test](#end-to-end-e2e-test)
    - [Slow/unstable integration tests (for maintainers)](#slowunstable-integration-tests-for-maintainers)
  - [Contributor Licence Agreement (CLA)](#contributor-licence-agreement-cla)

## Code of Conduct

This project and everyone participating in it is governed by our [Code of Conduct](code_of_conduct.txt).
By participating, you are expected to uphold this code. Please report unacceptable behavior to haystack@deepset.ai.

## I Have a Question

Before you ask a question, it is best to search for existing [Issues](https://github.com/deepset-ai/haystack/issues) that might help you. If you have
found a suitable issue and still need clarification, you can write your question in that issue. It is also advisable to
search the internet for answers first.

If you still need clarification after that, you can ask on [Haystack's Discord Server](https://discord.com/invite/xYvH6drSmA).

## Reporting Bugs

### Before Submitting a Bug Report

A good bug report shouldn't leave others needing to chase you up for more information. Therefore, we ask you to
investigate carefully, collect information, and describe the issue in detail in your report. Please complete the
following steps in advance to help us fix any potential bugs as fast as possible.

- Make sure that you are using the latest version.
- Determine if your bug is really a bug and not an error on your side, for example, using incompatible versions.
  Make sure that you have read the [documentation](https://docs.haystack.deepset.ai/docs/intro). If you are looking
  for support, you might want to check [this section](#i-have-a-question).
- To see if other users have experienced (and potentially already solved) the same issue you are having, check whether
  a bug report for your bug or error already exists in the [bug tracker](https://github.com/deepset-ai/haystack/issues).
- Also make sure to search the internet (including Stack Overflow) to see if users outside of the GitHub community have
  discussed the issue.
- Collect information about the bug:
  - OS, platform, and version (Windows, Linux, macOS, x86, ARM)
  - Version of Haystack and the integrations you're using
  - Possibly your input and the output
  - If you can reliably reproduce the issue, a code snippet we can use to reproduce it
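
If it helps, you can gather the environment details above with a short Python script. This is a minimal sketch that uses only the standard library. It is not an official Haystack tool. It assumes Haystack 2.x is installed under its PyPI name `haystack-ai`. Add the PyPI names of any integration packages you use to the `packages` argument.

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version


def collect_bug_report_info(packages: tuple[str, ...] = ("haystack-ai",)) -> dict[str, str]:
    """Gather the environment details requested in the checklist above."""
    info = {
        # Operating system name and release, e.g. "Linux 6.8.0"
        "os": f"{platform.system()} {platform.release()}",
        # CPU architecture, e.g. "x86_64" or "arm64"
        "architecture": platform.machine(),
        # Python version without build details, e.g. "3.12.4"
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            info[name] = version(name)  # version of the installed distribution
        except PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    # Append the PyPI names of the integrations you use to this tuple.
    for key, value in collect_bug_report_info(("haystack-ai",)).items():
        print(f"{key}: {value}")
```

Paste the printed lines into your bug report next to your reproduction snippet.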

### How Do I Submit a Good Bug Report?

> [!IMPORTANT]
> You must never report security-related issues, vulnerabilities, or bugs that include sensitive information to the issue tracker or anywhere else in public. Instead, report sensitive bugs using [this link](https://github.com/deepset-ai/haystack/security/advisories/new).

We use GitHub issues to track bugs and errors. If you run into an issue with the project:

- Open an [Issue of type Bug Report](https://github.com/deepset-ai/haystack/issues/new?assignees=&labels=bug&projects=&template=bug_report.md&title=).
- Explain the behavior you would expect and the actual behavior.
- Please provide as much context as possible and describe the *reproduction steps* that someone else can follow to
  recreate the issue on their own. This usually includes your code. For good bug reports, you should isolate the problem
  and create a reduced test case.
- Provide the information you collected in the previous section.

Once it's filed:

- The project team will label the issue accordingly.
- A team member will try to reproduce the issue with your provided steps. If there are no reproduction steps or no
  obvious way to reproduce the issue, the team will ask you for those steps.
- If the team is able to reproduce it, the issue will be scheduled for a fix or left to be
  [picked up by a community contributor](https://github.com/deepset-ai/haystack/issues?q=is%3Aissue%20state%3Aopen%20label%3A"Contributions%20wanted!").

## Suggesting Enhancements

This section guides you through submitting an enhancement suggestion, including new integrations and improvements
to existing ones. Following these guidelines will help maintainers and the community to understand your suggestion and
find related suggestions.

### Before Submitting an Enhancement

- Make sure that you are using the latest version.
- Read the [documentation](https://docs.haystack.deepset.ai/docs/intro) carefully and find out if the functionality
  is already covered, possibly via particular configuration parameters.
- Perform a [search](https://github.com/deepset-ai/haystack/issues) to see if the enhancement has already been suggested. If it has, add a comment to the
  existing issue instead of opening a new one.
- Find out whether your idea fits with the scope and aims of the project. It's up to you to make a strong case to
  convince the project's developers of the merits of this feature. Keep in mind that we want features that will be
  useful to the majority of our users and not just a small subset. If you're just targeting a minority of users,
  consider writing and distributing the integration on your own.

### How Do I Submit a Good Enhancement Suggestion?

Enhancement suggestions are tracked as GitHub issues of type [Feature request](https://github.com/deepset-ai/haystack/issues/new?template=feature_request.md).

- Use a **clear and descriptive title** for the issue to identify the suggestion.
- Fill in the issue following the template.

## Contributing to Documentation

If you'd like to improve the documentation by fixing errors, clarifying explanations, adding examples, or creating new guides, see the [Documentation Contributing Guide](docs-website/CONTRIBUTING.md).

## Contribute code

> [!IMPORTANT]
> When contributing to this project, you must agree that you have authored or carefully reviewed 100% of the content, that you have the necessary rights to the content, and that the content you contribute may be provided under the project license.

### Where to start

If this is your first code contribution, a good starting point is looking for an open issue that's marked with the label
["good first issue"](https://github.com/deepset-ai/haystack/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
The core contributors periodically mark certain issues as good for first-time contributors. Those issues are usually
limited in scope, easily fixable and low priority, so there is absolutely no reason why you should not try fixing them.
It's a good excuse to start looking into the project and a safe space to experiment and fail: if you can't get to grips
with one issue, pick another one! Once you become comfortable contributing to Haystack, you can have a look at the
list of issues marked as [contributions wanted](https://github.com/orgs/deepset-ai/projects/14/views/1) to look for your
next contribution!

### Issues not open for external contributions

Some issues are handled internally by the core team and are **not open for external contributions**. You may see a
comment on such issues like:

> 👋 Hello there! This issue will be handled internally and isn't open for external contributions. If you'd like to contribute, please take a look at issues labeled **contributions welcome** or **good first issue**. We'd really appreciate it!

> [!WARNING]
> **Please do not open pull requests for issues that are marked or commented as handled internally.** Your work may not be merged. Instead, look for issues labeled [good first issue](https://github.com/deepset-ai/haystack/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) or [contributions wanted](https://github.com/deepset-ai/haystack/issues?q=is%3Aissue%20state%3Aopen%20label%3A"Contributions%20wanted!") — we'd love your help there!

### Example high-quality contributions

Looking at strong pull requests is a great way to learn our standards. Example high-quality PRs: [#9270](https://github.com/deepset-ai/haystack/pull/9270), [#9227](https://github.com/deepset-ai/haystack/pull/9227), [#9271](https://github.com/deepset-ai/haystack/pull/9271), [#8648](https://github.com/deepset-ai/haystack/pull/8648), [#8767](https://github.com/deepset-ai/haystack/pull/8767). Use them as references for structure, testing, documentation, and how to describe changes in the PR description and release notes.

### Using AI assistants to contribute

You may use AI assistants or agents to help you implement a contribution. Please use them wisely:

- **Review and understand** all generated code before submitting. You are responsible for the contribution.
- **Run tests and checks** locally (e.g. `hatch run test:unit`, `hatch run fmt`) so your PR meets our quality bar.
- **If your PR was fully AI-generated**, add a short disclaimer in the PR description, for example: *"This PR was
  fully generated with an AI assistant. I have reviewed the changes and run the relevant tests."*

These practices help maintainers review your PR and keep the project open to both human and AI-assisted contributions.

### Setting up your development environment

*To run Haystack tests locally, ensure your development environment uses Python >=3.10 and <3.14.*
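
As a quick sanity check, you can ask your interpreter whether it falls inside that range. This is a minimal sketch. The bounds are copied from the sentence above and may change between releases, so update them if the supported range changes.

```python
import sys

# Bounds copied from this guide: Python >=3.10 and <3.14.
MIN_VERSION = (3, 10)
MAX_VERSION_EXCLUSIVE = (3, 14)


def is_supported_python(version_info: tuple[int, ...]) -> bool:
    """Return True if the (major, minor) part of version_info is in the supported range."""
    return MIN_VERSION <= tuple(version_info[:2]) < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    supported = is_supported_python(sys.version_info)
    print(f"Python {major}.{minor}: {'supported' if supported else 'not supported'} for Haystack development")
```

Run it with the same interpreter you plan to use for development. Tuple comparison keeps the range check to a single expression.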

Haystack makes heavy use of [Hatch](https://hatch.pypa.io/latest/), a Python project manager that we use to set up the
virtual environments, build the project, and publish packages. As you can imagine, the first step towards becoming a
Haystack contributor is installing Hatch. There are a variety of installation methods depending on your operating system
platform, version, and personal taste: please have a look at [this page](https://hatch.pypa.io/latest/install/#installation)
and keep reading once you can run from your terminal:

```console
$ hatch --version
Hatch, version 1.14.1
```

From the root of your local clone of the repository (see [Clone the git repository](#clone-the-git-repository)), you can create a new virtual environment for Haystack with `hatch` by running:

```console
$ hatch shell
```

### Clone the git repository

You won't be able to make changes directly to this repo, so the first step is to [create a fork](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo).
Once your fork is ready, you can clone a local copy with:

```console
$ git clone https://github.com/YOUR-USERNAME/haystack
```

If everything worked, you should be able to do something like this (the output might be different):

```console
$ cd haystack

$ hatch version
2.3.0-rc0
```

Lastly, enter the virtual environment:

```console
$ hatch shell
```

and install the pre-commit hooks:

```console
$ pre-commit install
```

Note: It is important to run `pre-commit install` inside the virtual environment created with `hatch shell`. If you don't, you'll get an error message like this: `pre-commit: command not found`.

pre-commit will run some tasks right before every `git commit` operation. From now on, your `git commit` output for Haystack should look something like this:

```console
$ git commit -m "test"
check python ast.........................................................Passed
check json...........................................(no files to check)Skipped
check for merge conflicts................................................Passed
check that scripts with shebangs are executable..........................Passed
check toml...........................................(no files to check)Skipped
check yaml...........................................(no files to check)Skipped
fix end of files.........................................................Passed
mixed line ending........................................................Passed
don't commit to branch...................................................Passed
trim trailing whitespace.................................................Passed
ruff.....................................................................Passed
codespell................................................................Passed
Lint GitHub Actions workflow files...................(no files to check)Skipped
[massi/contrib d18a2577] test
 2 files changed, 178 insertions(+), 45 deletions(-)
```

### Run the tests locally

Tests will automatically run in our CI for every commit you push to your PR on GitHub. To save precious CI time, we encourage you to run the tests locally before pushing new commits to GitHub. From the root of the git repository, you can run all the unit tests like this:

```sh
hatch run test:unit
```

Hatch will create a dedicated virtual environment, sync the required dependencies and run all the unit tests from the
project. If you want to run a subset of the tests or even one test in particular, `hatch` will accept all the
options you would normally pass to `pytest`, for example:

```sh
# run one test method from a specific test class in a test file
hatch run test:unit test/test_logging.py::TestSkipLoggingConfiguration::test_skip_logging_configuration
```

### Run code quality checks locally

We also use tools to ensure consistent code style, quality, and static type checking. The quality of your code will be
tested by the CI, but once again, running the checks locally will speed up the review cycle.

To check for static type errors, run:
```sh
hatch run test:types
```

To format your code and perform linting using Ruff (with automatic fixes), run:
```sh
hatch run fmt
```

## Requirements for Pull Requests

To ease the review process, please follow the instructions in this section when creating a Pull Request:

- For the title, use the [conventional commit convention](https://www.conventionalcommits.org/en/v1.0.0/).
- For the body, follow the existing [pull request template](https://github.com/deepset-ai/haystack/blob/main/.github/pull_request_template.md) to describe and document your changes.
- If you used an AI assistant and the PR was **fully AI-generated**, include a brief disclaimer in the PR description
  (see [Using AI assistants to contribute](#using-ai-assistants-to-contribute)).

### Release notes

Each PR must include a release notes file under the `releasenotes/notes` path created with `reno`, and a CI check will
fail if that's not the case. Pull requests with changes limited to tests, code comments or docstrings, and changes to
the CI/CD systems can be labeled with `ignore-for-release-notes` by a maintainer to bypass the CI check.

For example, if your PR bumps the `transformers` version in the `pyproject.toml` file, that change
requires release notes. To create the corresponding file, run this from the root of the repo:

```console
$ hatch run release-note bump-transformers-to-4-31
```

A release notes file in YAML format will be created in the appropriate folder, with a unique ID appended to the name of the
release note you provided (in this case, `bump-transformers-to-4-31`). To add the actual content of the release notes,
you must edit the file that was just created. The file contains multiple sections, each with an explanation of what
it's for. Remove all the sections that don't apply to your change and fill in the relevant ones. In this example,
you would fill in the `enhancements` section to describe the change:

```yaml
enhancements:
  - |
    Upgrade transformers to version 4.31.0 so that Haystack can support the new Llama 2 models.
```

Each section of the YAML file must follow [reStructuredText formatting](https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html).

For inline code, use double backticks to wrap the code.
```rst
``OpenAIChatGenerator``
```
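
Writing inline code with single backticks is a common slip. In reStructuredText, single backticks mark interpreted text, which is usually rendered in italics rather than as code. The sketch below is a hypothetical helper, not part of the Haystack tooling, that flags such spans with a regular expression. It is only a heuristic. It skips roles such as `` :class:`Foo` `` and hyperlinks such as `` `text <url>`_ ``, but it does not implement the full reStructuredText inline-markup rules.

```python
import re
import sys
from pathlib import Path

# Heuristic pattern for a single-backtick span:
# - the opening backtick is at the start of the text or follows whitespace or an opening
#   bracket/quote, which skips roles such as :class:`Foo` and the backticks of ``code``;
# - the span starts and ends with a non-space character;
# - the closing backtick is not followed by another backtick, "_" (hyperlink), or a word character.
SINGLE_BACKTICK = re.compile(r"""(?<![^\s(\["'])`([^`\s](?:[^`\n]*[^`\s])?)`(?![`_\w])""")


def find_single_backticks(text: str) -> list[str]:
    """Return the contents of spans written with single backticks."""
    return SINGLE_BACKTICK.findall(text)


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_backticks.py releasenotes/notes/<note-file>.yaml
    for path in sys.argv[1:]:
        for span in find_single_backticks(Path(path).read_text(encoding="utf-8")):
            print(f"{path}: write ``{span}`` instead of `{span}`")
```

For example, `find_single_backticks("Use `OpenAIChatGenerator` here.")` returns `['OpenAIChatGenerator']`, while the double-backtick form above returns an empty list.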

For code blocks, use the [code block directive](https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#directive-code-block).

```rst
.. code:: python

  from haystack.dataclasses import ChatMessage

  message = ChatMessage.from_user("Hello!")
  print(message.text)
```

You can now add the file to the same branch containing the code changes. Your release note will be part of your pull
request and reviewed along with any code you changed.

## CI (Continuous Integration)

We use GitHub Actions for our Continuous Integration tasks. This means that as soon as you open a PR, GitHub will start
executing some workflows on your changes, like automated tests, linting, formatting, API docs generation, etc.

If all goes well, at the bottom of your PR page you should see something like this, where all checks are green.

![Successful CI](images/ci-success.png)

If you see some red checks (like the following), then something didn't work, and action is needed on your side.

![Failed CI](images/ci-failure-example.png)

Click on the failing check and look for instructions at the end of its logs.
For example, in the case above, the CI will give you instructions on how to fix the issue.

![Logs of failed CI, with instructions for fixing the failure](images/ci-failure-example-instructions.png)

## Working from GitHub forks

To help maintainers, we usually ask contributors to grant us push access to their fork.

To do so, please verify that "Allow edits and access to secrets by maintainers" on the PR preview page is checked
(you can check it later on the PR's sidebar once it's created).

![Allow access to your branch to maintainers](images/first_time_contributor_enable_access.png)

## Writing tests

We formally define three scopes for tests in Haystack with different requirements and purposes:

### Unit test
- Tests a single logical concept
- Execution time is a few milliseconds
- Any external resource is mocked
- Always returns the same result
- Can run in any order
- Runs at every commit in PRs, automated through `hatch run test:unit`
- Can run locally with no additional setup
- **Goal: being confident in merging code**
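
To illustrate these properties, here is a minimal sketch of a unit test written as a plain function, the style `pytest` collects. `WeatherFetcher` and its `client` are hypothetical. A real Haystack component would also use the `@component` decorator. The key point is that `unittest.mock.Mock` replaces the external service, so the test is fast, deterministic, and needs no network access or credentials.

```python
from unittest.mock import Mock


class WeatherFetcher:
    """Hypothetical component that wraps an external weather service client."""

    def __init__(self, client):
        self.client = client

    def run(self, city: str) -> dict:
        response = self.client.get(f"/weather/{city}")
        return {"summary": response["summary"].strip().lower()}


def test_weather_fetcher_normalizes_summary():
    # The external resource is mocked: no network, no API key, same result on every run.
    client = Mock()
    client.get.return_value = {"summary": "  Sunny "}

    result = WeatherFetcher(client=client).run(city="Berlin")

    # One logical concept: the summary is stripped and lower-cased.
    assert result == {"summary": "sunny"}
    client.get.assert_called_once_with("/weather/Berlin")
```

Placed in a `test_*.py` file under `test/`, a function like this would typically be collected by `hatch run test:unit`.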

### Integration test
- Tests a single logical concept
- Execution time is a few seconds
- Uses external resources that must be available before execution
- When using models, cannot use inference
- Always returns the same result or an error
- Can run in any order
- Runs at every commit in PRs, automated through `hatch run test:integration`
- Can run locally with some additional setup (e.g. Docker)
- **Goal: being confident in merging code**

### End to End (e2e) test
- Tests a sequence of multiple logical concepts
- Execution time has no limits (can be always on)
- Can use inference
- Evaluates the results of the execution or the status of the system
- Uses external resources that must be available before execution
- Can return different results
- Can be dependent on the order
- Can be wrapped into any process execution
- Runs outside the development cycle (nightly or on demand)
- Might not be possible to run locally due to system and hardware requirements
- **Goal: being confident in releasing Haystack**

### Slow/unstable Integration Tests (for maintainers)

To keep the CI stable and reasonably fast, we run certain tests in a separate workflow.

We use `@pytest.mark.slow` for tests that clearly meet one or more of the following conditions:
- Unstable (such as calling unstable external services)
- Slow (such as model inference on CPU)
- Require special setup (such as installing system dependencies or running Docker containers)

⚠️ The main goal of this separation is to keep the regular integration tests fast and **stable**.

We should try to avoid including too many modules in the Slow Integration Tests workflow: doing so may reduce its effectiveness.

#### How does it work?

These tests are executed by the [Slow Integration Tests workflow](.github/workflows/slow.yml).

The workflow always runs, but the tests only execute when:

- There are changes to relevant files (as listed in the [workflow file](.github/workflows/slow.yml)).
  **Important**: If you mark a test but do not include both the test file and the file to be tested in the list, the test won't run automatically.
- The workflow is scheduled (runs nightly).
- The workflow is triggered manually (with the "Run workflow" button on [this page](https://github.com/deepset-ai/haystack/actions/workflows/slow.yml)).
- The PR has the "run-slow-tests" label (you can use this label to trigger the tests even if no relevant files are changed).
- The push is to a release branch.

If none of the above conditions are met, the workflow completes successfully without running tests, so that Branch Protection rules are still satisfied.
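
The conditions above can be modeled as a single predicate. This is an illustrative Python sketch, not the actual workflow logic, which lives in `.github/workflows/slow.yml`. The function name, its parameters, and the file paths in the example are hypothetical. `schedule` and `workflow_dispatch` are the GitHub Actions event names for scheduled and manual runs. The example also shows why the note about listing both the test file and the file under test matters: a change to a file missing from the list does not trigger the tests.

```python
from fnmatch import fnmatch


def should_run_slow_tests(
    changed_files: list[str],
    relevant_patterns: list[str],
    event_name: str,
    pr_labels: set[str],
    is_release_branch_push: bool,
) -> bool:
    """Illustrative model of when the slow integration tests execute."""
    # Nightly (scheduled) runs and manual "Run workflow" runs always execute the tests.
    if event_name in {"schedule", "workflow_dispatch"}:
        return True
    # The label forces a run even when no relevant files changed.
    if "run-slow-tests" in pr_labels:
        return True
    # Pushes to a release branch always execute the tests.
    if is_release_branch_push:
        return True
    # Otherwise, only changes to files listed in the workflow trigger the tests.
    return any(
        fnmatch(path, pattern)
        for path in changed_files
        for pattern in relevant_patterns
    )


# Hypothetical list: it includes the source file but forgets its test file.
relevant = ["haystack/components/example/slow_component.py"]

source_change = ["haystack/components/example/slow_component.py"]
test_change = ["test/components/example/test_slow_component.py"]

print(should_run_slow_tests(source_change, relevant, "pull_request", set(), False))  # True
print(should_run_slow_tests(test_change, relevant, "pull_request", set(), False))  # False: test file not listed
```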

*Hatch commands for running Integration Tests*:
- `hatch run test:integration` runs all integration tests (fast + slow).
- `hatch run test:integration-only-fast` skips the slow tests.
- `hatch run test:integration-only-slow` runs only slow tests.

## Contributor Licence Agreement (CLA)

Significant contributions to Haystack require a Contributor License Agreement (CLA). If the contribution requires a CLA,
we will get in contact with you. CLAs are quite common among company-backed open-source frameworks, and our CLA’s wording
is similar to that of other popular projects, like [Rasa](https://cla-assistant.io/RasaHQ/rasa) or
[Google's TensorFlow](https://cla.developers.google.com/clas/new?domain=DOMAIN_GOOGLE&kind=KIND_INDIVIDUAL)
(retrieved 4th November 2021).

The agreement's main purpose is to protect the continued open use of Haystack. At the same time, it also helps
protect you as a contributor. Contributions under this agreement will ensure that your code will continue to be
open to everyone in the future (“You hereby grant to Deepset **and anyone** [...]”) as well as remove liabilities on
your end (“you provide your Contributions on an AS IS basis, without warranties or conditions of any kind [...]”). You
can find the Contributor Licence Agreement [here](https://cla-assistant.io/deepset-ai/haystack).

If you have further questions about the licensing, feel free to reach out to contributors@deepset.ai.