---
title: "Preprocessors"
id: experimental-preprocessors-api
description: "Pipelines wrapped as components."
slug: "/experimental-preprocessors-api"
---

<a id="haystack_experimental.components.preprocessors.md_header_level_inferrer"></a>

## Module haystack\_experimental.components.preprocessors.md\_header\_level\_inferrer

<a id="haystack_experimental.components.preprocessors.md_header_level_inferrer.MarkdownHeaderLevelInferrer"></a>

### MarkdownHeaderLevelInferrer

Infers and rewrites header levels in Markdown text to normalize the header hierarchy.

- **First header**: always becomes level 1 (`#`).
- **Subsequent headers**: the level increases relative to the previous header if there is no content between them, and stays the same if content exists.
- **Maximum level**: capped at 6 (`######`).

### Usage example
```python
from haystack import Document
from haystack_experimental.components.preprocessors import MarkdownHeaderLevelInferrer

# Create a document in which every header uses the same level (##)
text = "## Title\n## Subheader\nSection\n## Subheader\nMore Content"
doc = Document(content=text)

# Initialize the inferrer and process the document
inferrer = MarkdownHeaderLevelInferrer()
result = inferrer.run([doc])

# The headers are now normalized into a proper hierarchy
print(result["documents"][0].content)
# Output:
# # Title
# ## Subheader
# Section
# ## Subheader
# More Content
```

<a id="haystack_experimental.components.preprocessors.md_header_level_inferrer.MarkdownHeaderLevelInferrer.__init__"></a>

#### MarkdownHeaderLevelInferrer.\_\_init\_\_

```python
def __init__()
```

Initializes the MarkdownHeaderLevelInferrer.

<a id="haystack_experimental.components.preprocessors.md_header_level_inferrer.MarkdownHeaderLevelInferrer.run"></a>

#### MarkdownHeaderLevelInferrer.run

```python
@component.output_types(documents=list[Document])
def run(documents: list[Document]) -> dict
```

Infers and rewrites the header levels in the content of documents that use uniform header levels.

**Arguments**:

- `documents`: list of Document objects to process.

**Returns**:

dict: a dictionary with the key `documents` containing the processed Document objects.
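To make the three header rules concrete, here is a minimal pure-Python sketch of how they could be applied to a single string. It is not the component's actual implementation: the helper name `infer_header_levels`, the ATX-only header regex, and the choice to treat blank lines as non-content are all assumptions made for illustration.

```python
import re

# Assumption: only ATX-style headers (1-6 '#' characters followed by whitespace) are recognized.
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")
MAX_LEVEL = 6


def infer_header_levels(text: str) -> str:
    """Hypothetical helper applying the documented rules; not the library's code.

    Assumption: blank lines do not count as content between headers.
    """
    output_lines = []
    current_level = 0  # 0 means no header has been seen yet
    content_since_header = False

    for line in text.split("\n"):
        match = HEADER_RE.match(line)
        if match is None:
            # Any non-blank, non-header line counts as content.
            if line.strip():
                content_since_header = True
            output_lines.append(line)
            continue

        title = match.group(2)
        if current_level == 0:
            # Rule 1: the first header always becomes level 1.
            level = 1
        elif content_since_header:
            # Content exists since the previous header: keep the same level.
            level = current_level
        else:
            # No content since the previous header: go one level deeper, capped at 6.
            level = min(current_level + 1, MAX_LEVEL)

        output_lines.append("#" * level + " " + title)
        current_level = level
        content_since_header = False

    return "\n".join(output_lines)


print(infer_header_levels("## Title\n## Subheader\nSection\n## Subheader\nMore Content"))
```

Because this sketch scans line by line, it would also rewrite `#`-prefixed lines inside fenced code blocks; whether `MarkdownHeaderLevelInferrer` handles that case is not stated in this reference.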