---
title: "FileToFileContent"
id: filetofilecontent
slug: "/filetofilecontent"
description: "`FileToFileContent` reads local files and converts them into `FileContent` objects"
---

# FileToFileContent

`FileToFileContent` reads local files and converts them into `FileContent` objects. These are ready for multimodal AI pipelines that need to pass PDFs and other file types to an LLM.

<div className="key-value-table">

| | |
| --- | --- |
| **Most common position in a pipeline** | Before a `ChatPromptBuilder` in a query pipeline |
| **Mandatory run variables** | `sources`: A list of file paths or `ByteStream` objects |
| **Output variables** | `file_contents`: A list of `FileContent` objects |
| **API reference** | [Converters](/reference/converters-api) |
| **GitHub link** | https://github.com/deepset-ai/haystack/blob/main/haystack/components/converters/file_to_file_content.py |

</div>

## Overview

`FileToFileContent` processes a list of file sources and converts them into `FileContent` objects that can be embedded into a `ChatMessage` and passed to a Language Model.

Each source can be:

- A file path (string or `Path`), or
- A `ByteStream` object.

Optionally, you can provide extra provider-specific information using the `extra` parameter. This can be a single dictionary (applied to all files) or a list matching the length of `sources`.

Support for passing files to LLMs varies by provider. Some providers do not support file inputs, some restrict support to PDF files, and others accept a wider range of file types.
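The broadcasting behavior of `extra` can be sketched in plain Python. The helper name and pairing logic below are illustrative assumptions, not Haystack internals:

```python
# Illustrative sketch (not Haystack internals): a single `extra` dict
# is broadcast to every source, while a list of dicts must match the
# number of sources one-to-one.
def pair_sources_with_extra(sources, extra=None):
    if extra is None:
        return [(source, {}) for source in sources]
    if isinstance(extra, dict):
        # One dict: apply the same metadata to every file.
        return [(source, extra) for source in sources]
    if len(extra) != len(sources):
        raise ValueError("`extra` list must match the length of `sources`")
    # One dict per source, paired positionally.
    return list(zip(sources, extra))
```

For example, `pair_sources_with_extra(["a.pdf", "b.mp3"], {"provider": "x"})` tags both files with the same dict, while passing a two-element list assigns each dict to its corresponding file.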
## Usage

### On its own

```python
from haystack.components.converters import FileToFileContent

converter = FileToFileContent()

sources = ["document.pdf", "recording.mp3"]

result = converter.run(sources=sources)
file_contents = result["file_contents"]
print(file_contents)

# [
#     FileContent(
#         base64_data='JVBERi0x...', mime_type='application/pdf',
#         filename='document.pdf', extra={}
#     ),
#     FileContent(
#         base64_data='SUQzBA...', mime_type='audio/mpeg',
#         filename='recording.mp3', extra={}
#     )
# ]
```

### In a pipeline

Use `FileToFileContent` together with a `LinkContentFetcher` and a `ChatPromptBuilder` to build a pipeline that fetches a remote file, converts it, and passes it to an LLM.

```python
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.converters import FileToFileContent
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.generators.chat.openai import OpenAIChatGenerator

template = """
{% message role="user" %}
{% for file in files %}
{{ file | templatize_part }}
{% endfor %}
What's the main takeaway of the following document? Just one sentence.
{% endmessage %}
"""

pipeline = Pipeline()
pipeline.add_component("fetcher", LinkContentFetcher())
pipeline.add_component("converter", FileToFileContent())
pipeline.add_component("prompt_builder", ChatPromptBuilder(template=template))
pipeline.add_component("llm", OpenAIChatGenerator(model="gpt-4.1-mini"))

pipeline.connect("fetcher", "converter")
pipeline.connect("converter", "prompt_builder")
pipeline.connect("prompt_builder", "llm")

results = pipeline.run({"fetcher": {"urls": ["https://arxiv.org/pdf/2309.08632"]}})

print(results["llm"]["replies"][0].text)

# The document is a satirical paper humorously claiming that pretraining a
# small language model exclusively on evaluation benchmark test sets can achieve
# perfect performance, highlighting issues of data contamination in model
# evaluation.
```
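Each `FileContent` carries the raw file as a base64 string in `base64_data`, as the output of the first example shows. If you ever need the original bytes back, for instance to inspect a payload while debugging, a single standard-library decode is enough. The helper below is a small sketch, not part of the Haystack API:

```python
import base64


def raw_bytes(base64_data: str) -> bytes:
    """Decode a FileContent-style base64 payload back to raw bytes."""
    return base64.b64decode(base64_data)


# For a PDF, the decoded payload starts with b"%PDF", which is what the
# 'JVBERi0x...' prefix in the example output above encodes.
```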