---
title: "Firecrawl"
id: integrations-firecrawl
description: "Firecrawl integration for Haystack"
slug: "/integrations-firecrawl"
---


## haystack_integrations.components.fetchers.firecrawl.firecrawl_crawler

### FirecrawlCrawler

A component that uses Firecrawl to crawl one or more URLs and return the content as Haystack Documents.

Crawling starts from each given URL and follows links to discover subpages, up to a configurable limit.
This is useful for ingesting entire websites or documentation sites, not just single pages.

Firecrawl is a service that crawls websites and returns content in a structured format (e.g. Markdown)
suitable for LLMs. You need a Firecrawl API key from [firecrawl.dev](https://firecrawl.dev).

### Usage example

```python
from haystack.utils import Secret
from haystack_integrations.components.fetchers.firecrawl import FirecrawlCrawler

# Read the API key from the environment and limit the crawl to 5 pages.
crawler = FirecrawlCrawler(
    api_key=Secret.from_env_var("FIRECRAWL_API_KEY"),
    params={"limit": 5},
)
crawler.warm_up()

result = crawler.run(urls=["https://docs.haystack.deepset.ai/docs/intro"])
documents = result["documents"]
```

#### __init__

```python
__init__(
    api_key: Secret = Secret.from_env_var("FIRECRAWL_API_KEY"),
    params: dict[str, Any] | None = None,
) -> None
```

Initialize the FirecrawlCrawler.

**Parameters:**

- **api_key** (<code>Secret</code>) – API key for Firecrawl.
  Defaults to the `FIRECRAWL_API_KEY` environment variable.
- **params** (<code>dict\[str, Any\] | None</code>) – Parameters for the crawl request. See the
  [Firecrawl API reference](https://docs.firecrawl.dev/api-reference/endpoint/crawl-post)
  for available parameters.
  Defaults to `{"limit": 1, "scrape_options": {"formats": ["markdown"]}}`.
  Without a limit, Firecrawl may crawl all subpages and consume credits quickly.
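
Because an unbounded crawl can consume credits quickly, it may help to guarantee that a `limit` is present before passing parameters to the component. The following is a minimal sketch of that idea, not part of the integration: `DEFAULT_CRAWL_PARAMS` and `ensure_limit` are hypothetical names, and the default values are copied from the parameter description above.

```python
import copy
from typing import Any, Optional

# Values copied from the documented default; the constant name is an assumption.
DEFAULT_CRAWL_PARAMS: dict[str, Any] = {
    "limit": 1,
    "scrape_options": {"formats": ["markdown"]},
}


def ensure_limit(
    params: Optional[dict[str, Any]], fallback_limit: int = 1
) -> dict[str, Any]:
    """Return a copy of `params` that always carries a `limit` key.

    Hypothetical helper for illustration; it is not part of the integration.
    """
    if params is None:
        # Nothing supplied: fall back to the documented defaults.
        return copy.deepcopy(DEFAULT_CRAWL_PARAMS)
    guarded = copy.deepcopy(params)
    # Only add a limit when the caller did not set one explicitly.
    guarded.setdefault("limit", fallback_limit)
    return guarded


crawl_params = ensure_limit(
    {"scrape_options": {"formats": ["markdown"]}}, fallback_limit=5
)
```

The resulting dictionary could then be passed as `params=crawl_params` when constructing the component.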

#### run

```python
run(urls: list[str], params: dict[str, Any] | None = None) -> dict[str, Any]
```

Crawls the given URLs and returns the extracted content as Documents.

**Parameters:**

- **urls** (<code>list\[str\]</code>) – List of URLs to crawl.
- **params** (<code>dict\[str, Any\] | None</code>) – Optional override of crawl parameters for this run.
  If provided, it fully replaces the init-time `params`.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with the following keys:
  - `documents`: List of documents, one for each page crawled.
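
The "fully replaces" rule has a practical consequence: a per-run override that sets only one key drops every other init-time key, including `limit`. The sketch below models the documented rule in plain Python. `effective_params` is a hypothetical stand-in written for this illustration and does not reflect the integration's actual code.

```python
from typing import Any, Optional


def effective_params(
    init_params: dict[str, Any], run_params: Optional[dict[str, Any]]
) -> dict[str, Any]:
    """Model the documented rule: run-time params replace init-time params wholesale."""
    return run_params if run_params is not None else init_params


init_params = {"limit": 5, "scrape_options": {"formats": ["markdown"]}}

# Overriding a single key loses `limit`, because nothing is merged.
partial = effective_params(init_params, {"scrape_options": {"formats": ["html"]}})

# To change one key and keep the rest, build the complete dictionary yourself.
full = effective_params(
    init_params, {**init_params, "scrape_options": {"formats": ["html"]}}
)
```

Under this model, `partial` carries no `limit`, while `full` keeps `limit` at 5 and changes only the output format.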

#### run_async

```python
run_async(
    urls: list[str], params: dict[str, Any] | None = None
) -> dict[str, Any]
```

Asynchronously crawls the given URLs and returns the extracted content as Documents.

**Parameters:**

- **urls** (<code>list\[str\]</code>) – List of URLs to crawl.
- **params** (<code>dict\[str, Any\] | None</code>) – Optional override of crawl parameters for this run.
  If provided, it fully replaces the init-time `params`.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with the following keys:
  - `documents`: List of documents, one for each page crawled.

#### warm_up

```python
warm_up() -> None
```

Warm up the component by initializing the Firecrawl clients.
This is useful to avoid cold start delays when crawling many URLs.

## haystack_integrations.components.websearch.firecrawl.firecrawl_websearch

### FirecrawlWebSearch

A component that uses Firecrawl to search the web and return results as Haystack Documents.

This component wraps the Firecrawl Search API, enabling web search queries that return
structured documents with content and links. It follows the standard Haystack WebSearch
component interface.

Firecrawl is a service that crawls and scrapes websites, returning content in formats suitable
for LLMs. You need a Firecrawl API key from [firecrawl.dev](https://firecrawl.dev).

### Usage example

```python
from haystack_integrations.components.websearch.firecrawl import FirecrawlWebSearch
from haystack.utils import Secret

websearch = FirecrawlWebSearch(
    api_key=Secret.from_env_var("FIRECRAWL_API_KEY"),
    top_k=5,
)
result = websearch.run(query="What is Haystack by deepset?")
documents = result["documents"]
links = result["links"]
```

#### __init__

```python
__init__(
    api_key: Secret = Secret.from_env_var("FIRECRAWL_API_KEY"),
    top_k: int | None = 10,
    search_params: dict[str, Any] | None = None,
) -> None
```

Initialize the FirecrawlWebSearch component.

**Parameters:**

- **api_key** (<code>Secret</code>) – API key for Firecrawl.
  Defaults to the `FIRECRAWL_API_KEY` environment variable.
- **top_k** (<code>int | None</code>) – Maximum number of documents to return.
  Defaults to 10. This can be overridden by the `"limit"` parameter in `search_params`.
- **search_params** (<code>dict\[str, Any\] | None</code>) – Additional parameters passed to the Firecrawl search API.
  See the [Firecrawl API reference](https://docs.firecrawl.dev/api-reference/endpoint/search)
  for available parameters. Supported keys include: `tbs`, `location`,
  `scrape_options`, `sources`, `categories`, `timeout`.
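
The interplay between `top_k` and a `"limit"` entry in `search_params` can be pictured as a simple precedence rule. The sketch below models only what the description above states. `effective_result_limit` is a hypothetical helper, not the integration's code, and the `location` value is purely illustrative.

```python
from typing import Any, Optional


def effective_result_limit(
    top_k: Optional[int], search_params: Optional[dict[str, Any]]
) -> Optional[int]:
    """Model the documented precedence: a `limit` in search_params wins over top_k."""
    if search_params and "limit" in search_params:
        return search_params["limit"]
    return top_k


# No search_params: top_k applies.
default_limit = effective_result_limit(10, None)

# An explicit limit overrides top_k.
overridden_limit = effective_result_limit(10, {"limit": 3})

# Other keys leave top_k untouched.
unaffected_limit = effective_result_limit(5, {"location": "Germany"})
```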

#### warm_up

```python
warm_up() -> None
```

Warm up the Firecrawl clients by initializing the sync and async clients.
This is useful to avoid cold start delays when performing searches.

#### run

```python
run(query: str, search_params: dict[str, Any] | None = None) -> dict[str, Any]
```

Search the web using Firecrawl and return results as Documents.

**Parameters:**

- **query** (<code>str</code>) – Search query string.
- **search_params** (<code>dict\[str, Any\] | None</code>) – Optional override of search parameters for this run.
  If provided, it fully replaces the init-time `search_params`.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with the following keys:
  - `documents`: List of documents with search result content.
  - `links`: List of URLs from the search results.
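
A small post-processing step over the returned dictionary might look like the sketch below. It assumes only that each document exposes a `content` attribute, as Haystack Documents do, and it makes no assumption about how `documents` and `links` line up with each other. `clean_search_result` is a hypothetical helper, and `SimpleNamespace` objects stand in for real Documents so the example runs without the integration installed.

```python
from types import SimpleNamespace
from typing import Any


def clean_search_result(result: dict[str, Any]) -> dict[str, Any]:
    """Drop documents without text and de-duplicate links, keeping their order."""
    documents = [doc for doc in result["documents"] if doc.content]
    # dict.fromkeys keeps the first occurrence of each link in insertion order.
    links = list(dict.fromkeys(result["links"]))
    return {"documents": documents, "links": links}


# Stand-in for the dictionary returned by `run`; real results hold Haystack Documents.
fake_result = {
    "documents": [
        SimpleNamespace(content="some page text"),
        SimpleNamespace(content=""),
    ],
    "links": [
        "https://example.com/a",
        "https://example.com/a",
        "https://example.com/b",
    ],
}
cleaned = clean_search_result(fake_result)
```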

#### run_async

```python
run_async(
    query: str, search_params: dict[str, Any] | None = None
) -> dict[str, Any]
```

Asynchronously search the web using Firecrawl and return results as Documents.

**Parameters:**

- **query** (<code>str</code>) – Search query string.
- **search_params** (<code>dict\[str, Any\] | None</code>) – Optional override of search parameters for this run.
  If provided, it fully replaces the init-time `search_params`.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with the following keys:
  - `documents`: List of documents with search result content.
  - `links`: List of URLs from the search results.