---
title: "Summarizers"
id: experimental-summarizers-api
description: "Components that summarize texts into concise versions."
slug: "/experimental-summarizers-api"
---

<a id="haystack_experimental.components.summarizers.llm_summarizer"></a>

## Module haystack\_experimental.components.summarizers.llm\_summarizer

<a id="haystack_experimental.components.summarizers.llm_summarizer.LLMSummarizer"></a>

### LLMSummarizer

Summarizes text using a language model.

It is inspired by code from the OpenAI cookbook: https://cookbook.openai.com/examples/summarizing_long_documents

Example:

```python
from haystack import Document
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack_experimental.components.summarizers.llm_summarizer import LLMSummarizer

text = (
    "Machine learning is a subset of artificial intelligence that provides systems "
    "the ability to automatically learn and improve from experience without being "
    "explicitly programmed. The process of learning begins with observations or data. "
    "Supervised learning algorithms build a mathematical model of sample data, known as "
    "training data, in order to make predictions or decisions. Unsupervised learning "
    "algorithms take a set of data that contains only inputs and find structure in the data. "
    "Reinforcement learning is an area of machine learning where an agent learns to behave "
    "in an environment by performing actions and seeing the results. Deep learning uses "
    "artificial neural networks to model complex patterns in data. Neural networks consist "
    "of layers of connected nodes, each performing a simple computation."
)

doc = Document(content=text)
chat_generator = OpenAIChatGenerator(model="gpt-4")
summarizer = LLMSummarizer(chat_generator=chat_generator)
summarizer.warm_up()
summarizer.run(documents=[doc])
```

<a id="haystack_experimental.components.summarizers.llm_summarizer.LLMSummarizer.__init__"></a>

#### LLMSummarizer.\_\_init\_\_

```python
def __init__(chat_generator: ChatGenerator,
             system_prompt: str | None = "Rewrite this text in summarized form.",
             summary_detail: float = 0,
             minimum_chunk_size: int | None = 500,
             chunk_delimiter: str = ".",
             summarize_recursively: bool = False,
             split_overlap: int = 0)
```

Initialize the LLMSummarizer component.

:param chat_generator: A ChatGenerator instance to use for summarization.
:param system_prompt: The prompt instructing the LLM to summarize text. Defaults to:
    "Rewrite this text in summarized form."
:param summary_detail: The level of detail for the summary (0-1), defaults to 0.
    This parameter controls the trade-off between conciseness and completeness by adjusting how many
    chunks the text is divided into. At detail=0, the text is processed as a single chunk (or very few
    chunks), producing the most concise summary. At detail=1, the text is split into the maximum number
    of chunks allowed by minimum_chunk_size, enabling more granular analysis and detailed summaries.
    The formula uses linear interpolation: num_chunks = 1 + detail * (max_chunks - 1), where max_chunks
    is determined by dividing the document length by minimum_chunk_size.
:param minimum_chunk_size: The minimum token count per chunk, defaults to 500.
:param chunk_delimiter: The character used to determine separator priority.
    "." uses sentence-based splitting, "\n" uses paragraph-based splitting. Defaults to ".".
:param summarize_recursively: Whether to use previous summaries as context, defaults to False.
:param split_overlap: Number of tokens to overlap between consecutive chunks, defaults to 0.

<a id="haystack_experimental.components.summarizers.llm_summarizer.LLMSummarizer.warm_up"></a>

#### LLMSummarizer.warm\_up

```python
def warm_up()
```

Warm up the chat generator and document splitter components.

<a id="haystack_experimental.components.summarizers.llm_summarizer.LLMSummarizer.to_dict"></a>

#### LLMSummarizer.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="haystack_experimental.components.summarizers.llm_summarizer.LLMSummarizer.from_dict"></a>

#### LLMSummarizer.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "LLMSummarizer"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: Dictionary with serialized data.

**Returns**:

An instance of the component.

<a id="haystack_experimental.components.summarizers.llm_summarizer.LLMSummarizer.num_tokens"></a>

#### LLMSummarizer.num\_tokens

```python
def num_tokens(text: str) -> int
```

Estimates the token count for a given text.

Uses the RecursiveDocumentSplitter's tokenization logic for consistency.
**Arguments**:

- `text`: The text to tokenize.

**Returns**:

The estimated token count.

<a id="haystack_experimental.components.summarizers.llm_summarizer.LLMSummarizer.summarize"></a>

#### LLMSummarizer.summarize

```python
def summarize(text: str,
              detail: float,
              minimum_chunk_size: int,
              summarize_recursively: bool = False) -> str
```

Summarizes text by splitting it into optimally sized chunks and processing each with an LLM.

**Arguments**:

- `text`: Text to summarize.
- `detail`: Detail level (0-1), where 0 is most concise and 1 is most detailed.
- `minimum_chunk_size`: Minimum token count per chunk.
- `summarize_recursively`: Whether to use previous summaries as context.

**Raises**:

- `ValueError`: If detail is not between 0 and 1.

**Returns**:

The textual content summarized by the LLM.

<a id="haystack_experimental.components.summarizers.llm_summarizer.LLMSummarizer.run"></a>

#### LLMSummarizer.run

```python
@component.output_types(summary=list[Document])
def run(*,
        documents: list[Document],
        detail: float | None = None,
        minimum_chunk_size: int | None = None,
        summarize_recursively: bool | None = None,
        system_prompt: str | None = None) -> dict[str, list[Document]]
```

Run the summarizer on a list of documents.

**Arguments**:

- `documents`: List of documents to summarize.
- `detail`: The level of detail for the summary (0-1). If given, it overrides the component's default (0).
- `minimum_chunk_size`: The minimum token count per chunk. If given, it overrides the component's default (500).
- `system_prompt`: If given, it overrides the prompt set at init time or the default one.
- `summarize_recursively`: Whether to use previous summaries as context. If given, it overrides the component's default (False).
**Raises**:

- `RuntimeError`: If the component wasn't warmed up.
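The `summary_detail` interpolation described in `__init__` can be sketched in plain Python. The helper below is hypothetical (the function name, the rounding step, and the token counts are assumptions, not part of the API); only the formula `num_chunks = 1 + detail * (max_chunks - 1)` comes from the documentation above:

```python
# Hypothetical helper illustrating the documented chunk-count interpolation.
# Not part of the LLMSummarizer API.

def target_num_chunks(document_tokens: int, detail: float,
                      minimum_chunk_size: int = 500) -> int:
    if not 0.0 <= detail <= 1.0:
        raise ValueError("detail must be between 0 and 1")
    # Most chunks possible while keeping each at least minimum_chunk_size tokens
    max_chunks = max(1, document_tokens // minimum_chunk_size)
    # Linear interpolation between 1 chunk (detail=0) and max_chunks (detail=1)
    return round(1 + detail * (max_chunks - 1))
```

For a 5000-token document with the default `minimum_chunk_size` of 500, `detail=0` yields a single chunk and `detail=1` yields 10.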
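Similarly, the effect of `summarize_recursively` can be illustrated with a minimal sketch of the chunk loop. `fake_llm` and `summarize_chunks` are stand-ins invented for this example; the real component delegates each prompt to the configured `ChatGenerator`:

```python
# Illustrative sketch of the chunked summarization loop, showing how
# summarize_recursively threads earlier summaries into each chunk's prompt.
# Not the actual implementation.

def fake_llm(prompt: str) -> str:
    # Stand-in for a chat-generator call; a real LLM would return a summary.
    return f"summary[{len(prompt)}]"

def summarize_chunks(chunks: list[str], summarize_recursively: bool = False) -> str:
    summaries: list[str] = []
    for chunk in chunks:
        if summarize_recursively and summaries:
            # Earlier summaries become context for the current chunk.
            prompt = "Context: " + " ".join(summaries) + "\n\nText: " + chunk
        else:
            prompt = chunk
        summaries.append(fake_llm(prompt))
    return " ".join(summaries)
```

With `summarize_recursively=True`, each chunk's prompt grows to include the summaries produced so far, which helps keep long summaries coherent at the cost of longer prompts.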