/ releasenotes / notes / fix-missing-parent-header-error-MarkdownHeaderSplitter-b5db96e19011b6b9.yaml
---
fixes:
  - |
    When using the **MarkdownHeaderSplitter**, a child header could lose its direct parent header
    in the metadata of the split chunks. For example, running the code below:

    .. code:: python

        from haystack.components.preprocessors import MarkdownHeaderSplitter
        from haystack import Document

        text = """
        # header 1
        intro text

        ## header 1.1
        text 1

        ## header 1.2
        text 2

        ### header 1.2.1
        text 3

        ### header 1.2.2
        text 4
        """

        document = Document(content=text)

        splitter = MarkdownHeaderSplitter(
            keep_headers=True,
            secondary_split="word"
        )
        result = splitter.run(documents=[document])["documents"]

        for doc in result:
            print(f"Header: {doc.meta['header']}, parent headers: {doc.meta['parent_headers']}")

    The expected output is:

    .. code:: text

        Header: header 1, parent headers: []
        Header: header 1.1, parent headers: ['header 1']
        Header: header 1.2, parent headers: ['header 1']
        Header: header 1.2.1, parent headers: ['header 1', 'header 1.2']
        Header: header 1.2.2, parent headers: ['header 1', 'header 1.2']

    Instead, the actual output was:

    .. code:: text

        Header: header 1, parent headers: []
        Header: header 1.1, parent headers: []
        Header: header 1.2, parent headers: ['header 1']
        Header: header 1.2.1, parent headers: ['header 1']
        Header: header 1.2.2, parent headers: ['header 1', 'header 1.2']

    The error occurred when a parent header had its own content chunk before the first
    child header.

    This has been fixed: the parent headers are now preserved in the metadata even when a parent
    header has its own content chunk before the first child header.
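To make the expected ``parent_headers`` values concrete, here is a minimal, library-independent sketch of one way to derive them with a stack of open headers. It is not the Haystack implementation: the helper name ``parent_headers_by_header`` and the regular expression are assumptions made for this illustration, and it only recognizes simple ATX-style headers (``#`` to ``######``).

```python
import re
from typing import List, Tuple

# Hypothetical helper for illustration only; it is not part of Haystack.
# Matches ATX-style Markdown headers such as "## header 1.1".
HEADER_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def parent_headers_by_header(markdown: str) -> List[Tuple[str, List[str]]]:
    """Return (header, parent_headers) pairs in document order."""
    stack: List[Tuple[int, str]] = []  # (level, title) of the currently open headers
    result: List[Tuple[str, List[str]]] = []
    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match is None:
            # Body text never changes the hierarchy, so a parent that has its
            # own content before its first child stays on the stack.
            continue
        level, title = len(match.group(1)), match.group(2)
        # Close every open header at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        result.append((title, [open_title for _, open_title in stack]))
        stack.append((level, title))
    return result


text = """
# header 1
intro text

## header 1.1
text 1

## header 1.2
text 2

### header 1.2.1
text 3

### header 1.2.2
text 4
"""

for header, parents in parent_headers_by_header(text):
    print(f"Header: {header}, parent headers: {parents}")
```

For the sample document from the release note, this sketch prints the expected output listed there. In particular, ``header 1.1`` keeps ``header 1`` as its parent even though ``intro text`` sits between the two.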