---
title: "FirecrawlWebSearch"
id: firecrawlwebsearch
slug: "/firecrawlwebsearch"
description: "Search engine using the Firecrawl API."
---

# FirecrawlWebSearch

Search the web and extract content using the Firecrawl API.

<div className="key-value-table">

| | |
| --- | --- |
| **Most common position in a pipeline** | Before a [`ChatPromptBuilder`](../builders/chatpromptbuilder.mdx) or right at the beginning of an indexing pipeline. |
| **Mandatory init variables** | `api_key`: The Firecrawl API key. Can be set with the `FIRECRAWL_API_KEY` env var. |
| **Mandatory run variables** | `query`: A string with your search query. |
| **Output variables** | `documents`: A list of Haystack Documents containing the scraped content and metadata. <br /> <br />`links`: A list of the URLs of the search results. |
| **API reference** | [Firecrawl Search API](/reference/integrations-firecrawl) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/firecrawl/src/haystack_integrations/components/websearch/firecrawl/firecrawl_websearch.py |

</div>

## Overview

When you give `FirecrawlWebSearch` a query, it uses the Firecrawl Search API to search the web, scrape the resulting pages, and return their content as a list of Haystack `Document` objects. It also returns a list of the underlying URLs.

Because Firecrawl scrapes the pages it finds and converts their content into LLM-friendly formats such as Markdown, you generally don't need an additional component like `LinkContentFetcher` to read the web pages. `FirecrawlWebSearch` handles both the search and the scraping in one step.

`FirecrawlWebSearch` requires a [Firecrawl](https://firecrawl.dev) API key to work. By default, it looks for a `FIRECRAWL_API_KEY` environment variable. Alternatively, you can pass an `api_key` directly during initialization.

## Usage

### On its own

Here is a quick example of how `FirecrawlWebSearch` searches the web based on a query, scrapes the resulting web pages, and returns a list of Documents containing the page content.

```python
from haystack_integrations.components.websearch.firecrawl import FirecrawlWebSearch
from haystack.utils import Secret

# Read the API key from the FIRECRAWL_API_KEY environment variable,
# keep the top 5 results, and ask Firecrawl to return page content as Markdown.
web_search = FirecrawlWebSearch(
    api_key=Secret.from_env_var("FIRECRAWL_API_KEY"),
    top_k=5,
    search_params={"scrape_options": {"formats": ["markdown"]}},
)
query = "What is Haystack by deepset?"

response = web_search.run(query=query)

# Each Document holds the scraped content of one result page.
for doc in response["documents"]:
    print(doc.content)
```

### In a pipeline

Here is an example of a Retrieval-Augmented Generation (RAG) pipeline that uses `FirecrawlWebSearch` to look up an answer. Because Firecrawl returns the actual text of the scraped pages, you can pass its `documents` output directly into the `ChatPromptBuilder` to give the LLM the necessary context.

```python
from haystack import Pipeline
from haystack.utils import Secret
from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack_integrations.components.websearch.firecrawl import FirecrawlWebSearch
from haystack.dataclasses import ChatMessage

# Keep only the top 2 results so the prompt stays short.
web_search = FirecrawlWebSearch(
    api_key=Secret.from_env_var("FIRECRAWL_API_KEY"),
    top_k=2,
    search_params={"scrape_options": {"formats": ["markdown"]}},
)

# The Jinja template loops over the Documents from the search step
# and inserts their content ahead of the question.
prompt_template = [
    ChatMessage.from_system("You are a helpful assistant."),
    ChatMessage.from_user(
        "Given the information below:\n"
        "{% for document in documents %}{{ document.content }}\n{% endfor %}\n"
        "Answer the following question: {{ query }}.\nAnswer:",
    ),
]

prompt_builder = ChatPromptBuilder(
    template=prompt_template,
    required_variables={"query", "documents"},
)

llm = OpenAIChatGenerator(
    api_key=Secret.from_env_var("OPENAI_API_KEY"),
    model="gpt-5-nano",
)

pipe = Pipeline()
pipe.add_component("search", web_search)
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)

# Scraped Documents feed the prompt; the rendered chat messages feed the LLM.
pipe.connect("search.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.messages")

query = "What is Haystack by deepset?"

# The query goes to both the search component and the prompt builder.
result = pipe.run(data={"search": {"query": query}, "prompt_builder": {"query": query}})

print(result["llm"]["replies"][0].text)
```