---
fixes:
  - |
    Fixed **MarkdownHeaderSplitter** dropping the direct parent header from the
    ``parent_headers`` metadata of some split chunks. Previously, running the code below:

    .. code:: python

      from haystack.components.preprocessors import MarkdownHeaderSplitter
      from haystack import Document

      # Each header in this text is followed by its own body text.
      text = """
      # header 1
      intro text

      ## header 1.1
      text 1

      ## header 1.2
      text 2

      ### header 1.2.1
      text 3

      ### header 1.2.2
      text 4
      """

      document = Document(content=text)

      splitter = MarkdownHeaderSplitter(
          keep_headers=True,
          secondary_split="word"
      )
      result = splitter.run(documents=[document])["documents"]

      # Print each chunk's header together with its recorded parent headers.
      for doc in result:
          print(f"Header: {doc.meta['header']}, parent headers: {doc.meta['parent_headers']}")

    should have produced this output:

    .. code:: text

      Header: header 1, parent headers: []
      Header: header 1.1, parent headers: ['header 1']
      Header: header 1.2, parent headers: ['header 1']
      Header: header 1.2.1, parent headers: ['header 1', 'header 1.2']
      Header: header 1.2.2, parent headers: ['header 1', 'header 1.2']

    but instead produced:

    .. code:: text

      Header: header 1, parent headers: []
      Header: header 1.1, parent headers: []
      Header: header 1.2, parent headers: ['header 1']
      Header: header 1.2.1, parent headers: ['header 1']
      Header: header 1.2.2, parent headers: ['header 1', 'header 1.2']

    The error occurred when a parent header had its own content chunk before its first
    child header. In the output above, ``header 1.1`` and ``header 1.2.1`` are each the
    first child of such a parent, and each is missing its direct parent.

    This has been fixed: even when a parent header has its own content chunk before its
    first child header, every chunk now keeps its complete list of parent headers.
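
    The intended parent tracking can be illustrated with a minimal, library-independent
    sketch. This is not the ``MarkdownHeaderSplitter`` implementation:
    ``parent_headers_for`` is a hypothetical helper that assumes ATX-style headers
    (``#`` to ``######``) and does not skip fenced code blocks. It keeps a stack of open
    headers, so body text under a parent header cannot change which parents a child sees.

    ```python
    import re

    # One to six '#' characters, whitespace, then the header title.
    HEADER_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


    def parent_headers_for(markdown: str) -> list[tuple[str, list[str]]]:
        """Return (header, parent_headers) pairs in document order (illustrative only)."""
        stack: list[tuple[int, str]] = []  # currently open headers as (level, title)
        result: list[tuple[str, list[str]]] = []
        for line in markdown.splitlines():
            match = HEADER_RE.match(line.strip())
            if match is None:
                continue  # body text never touches the header stack
            level, title = len(match.group(1)), match.group(2)
            # Close headers at the same or a deeper level; the rest are ancestors.
            while stack and stack[-1][0] >= level:
                stack.pop()
            result.append((title, [parent for _, parent in stack]))
            stack.append((level, title))
        return result


    text = """
    # header 1
    intro text

    ## header 1.1
    text 1

    ## header 1.2
    text 2

    ### header 1.2.1
    text 3

    ### header 1.2.2
    text 4
    """

    for header, parents in parent_headers_for(text):
        print(f"Header: {header}, parent headers: {parents}")
    ```

    For the sample text, this sketch prints the same lines as the expected output listed
    above, including ``['header 1']`` for ``header 1.1``.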