---
title: "FirecrawlWebSearch"
id: firecrawlwebsearch
slug: "/firecrawlwebsearch"
description: "Search engine using the Firecrawl API."
---

# FirecrawlWebSearch

Search the web and extract content using the Firecrawl API.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | Before a [`ChatPromptBuilder`](../builders/chatpromptbuilder.mdx) or right at the beginning of an indexing pipeline. |
| **Mandatory init variables** | `api_key`: The Firecrawl API key. Can be set with the `FIRECRAWL_API_KEY` env var. |
| **Mandatory run variables** | `query`: A string with your search query. |
| **Output variables** | `documents`: A list of Haystack Documents containing the scraped content and metadata. <br /> <br />`links`: A list of the resulting URLs as strings. |
| **API reference** | [Firecrawl Search API](/reference/integrations-firecrawl) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/firecrawl/src/haystack_integrations/components/websearch/firecrawl/firecrawl_websearch.py |

</div>

## Overview

When you give `FirecrawlWebSearch` a query, it uses the Firecrawl Search API to search the web, scrape the resulting pages, and return their content as a list of Haystack `Document` objects. It also returns a list of the URLs of those pages.

Because Firecrawl scrapes the pages it finds and converts their content into LLM-friendly formats such as Markdown, you generally don't need an additional component like `LinkContentFetcher` to read the web pages. `FirecrawlWebSearch` handles searching and scraping in a single step.

`FirecrawlWebSearch` requires a [Firecrawl](https://firecrawl.dev) API key. By default, it reads the key from the `FIRECRAWL_API_KEY` environment variable. Alternatively, you can pass the key explicitly with the `api_key` init parameter.
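
The key lookup described above can be sketched in plain Python. This is a minimal illustration, not the integration's code: `resolve_firecrawl_key` is a hypothetical helper, and in Haystack the key is actually wrapped in a `Secret` object (for example `Secret.from_env_var("FIRECRAWL_API_KEY")`) rather than handled as a bare string.

```python
import os
from typing import Optional

FIRECRAWL_ENV_VAR = "FIRECRAWL_API_KEY"


def resolve_firecrawl_key(explicit_key: Optional[str] = None) -> str:
    """Hypothetical helper: pick the Firecrawl API key to use.

    An explicitly passed key wins; otherwise fall back to the
    FIRECRAWL_API_KEY environment variable.
    """
    if explicit_key:
        return explicit_key
    env_key = os.environ.get(FIRECRAWL_ENV_VAR)
    if env_key:
        return env_key
    # Neither source is available: fail early with an actionable message.
    raise ValueError(
        f"No Firecrawl API key found: pass one explicitly or set {FIRECRAWL_ENV_VAR}."
    )
```

With `FIRECRAWL_API_KEY` exported in your shell, `resolve_firecrawl_key()` returns its value, while `resolve_firecrawl_key("my-key")` ignores the environment entirely.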

## Usage

### On its own

Here is a quick example of how `FirecrawlWebSearch` searches the web based on a query, scrapes the resulting web pages, and returns a list of Documents containing the page content.

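```python
from haystack_integrations.components.websearch.firecrawl import FirecrawlWebSearch
from haystack.utils import Secret

# Read the API key from the FIRECRAWL_API_KEY environment variable,
# keep the top 5 results, and ask Firecrawl to scrape each page as Markdown.
web_search = FirecrawlWebSearch(
    api_key=Secret.from_env_var("FIRECRAWL_API_KEY"),
    top_k=5,
    search_params={"scrape_options": {"formats": ["markdown"]}},
)
query = "What is Haystack by deepset?"

response = web_search.run(query=query)

# Each Document holds the scraped content of one result page.
for doc in response["documents"]:
    print(doc.content)

# The URLs of the results are returned separately.
print(response["links"])
```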

### In a pipeline

Here is an example of a Retrieval-Augmented Generation (RAG) pipeline that uses `FirecrawlWebSearch` to look up an answer. Because Firecrawl returns the actual text of the scraped pages, you can pass its `documents` output directly to the `ChatPromptBuilder` to give the LLM the necessary context.

```python
from haystack import Pipeline
from haystack.utils import Secret
from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack_integrations.components.websearch.firecrawl import FirecrawlWebSearch
from haystack.dataclasses import ChatMessage

# Search the web and scrape the top 2 results as Markdown.
web_search = FirecrawlWebSearch(
    api_key=Secret.from_env_var("FIRECRAWL_API_KEY"),
    top_k=2,
    search_params={"scrape_options": {"formats": ["markdown"]}},
)

# Jinja template: insert the content of every scraped Document, then the question.
prompt_template = [
    ChatMessage.from_system("You are a helpful assistant."),
    ChatMessage.from_user(
        "Given the information below:\n"
        "{% for document in documents %}{{ document.content }}\n{% endfor %}\n"
        "Answer the following question: {{ query }}.\nAnswer:",
    ),
]

prompt_builder = ChatPromptBuilder(
    template=prompt_template,
    required_variables={"query", "documents"},
)

llm = OpenAIChatGenerator(
    api_key=Secret.from_env_var("OPENAI_API_KEY"),
    model="gpt-5-nano",
)

pipe = Pipeline()
pipe.add_component("search", web_search)
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)

# search -> prompt_builder -> llm
pipe.connect("search.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.messages")

query = "What is Haystack by deepset?"

# The query goes both to the search component and to the prompt template.
result = pipe.run(data={"search": {"query": query}, "prompt_builder": {"query": query}})

print(result["llm"]["replies"][0].text)
```
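
To see roughly what the LLM receives, the following sketch renders the user message of `prompt_template` in plain Python instead of Jinja. It is an approximation, not how `ChatPromptBuilder` works internally: `FakeDocument` is a hypothetical stand-in for Haystack's `Document` (only `content` is used), and the sketch assumes Jinja's default whitespace settings, where the literal newlines in the template are kept.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeDocument:
    # Hypothetical stand-in for haystack.Document; the template only reads `content`.
    content: str


def render_user_message(documents: List[FakeDocument], query: str) -> str:
    # Header line, then each document's content followed by a newline
    # (the body of the {% for %} loop), then the newline after {% endfor %}.
    body = "".join(f"{doc.content}\n" for doc in documents)
    return (
        "Given the information below:\n"
        + body
        + "\n"
        + f"Answer the following question: {query}.\nAnswer:"
    )
```

Because every scraped page is inserted in full, the prompt grows with `top_k` and with page length, so keeping `top_k` small, as in the pipeline example, limits the prompt size.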