# ADR-0005: Multi-Turn Chat in Query Console

## Status

Accepted

## Context

The Query Console currently operates as a single-shot Q&A interface: the user asks one question, gets one answer, and every subsequent question starts from scratch. This means follow-up questions like "What is so great about #1?" fail because the LLM has no memory of the previous exchange.

Real-world knowledge base exploration is conversational. Users ask an initial question, then drill down, rephrase, or ask follow-ups that reference prior answers. Without conversation history, each query is isolated and the user must repeat context manually.

The backend's `POST /query` endpoint currently accepts `{ question: string }`. The LLM provider builds a fresh two-message array (system + user) for every call. No conversation context is preserved.

## Decision

Add multi-turn conversation support across the full stack:

### Frontend (meridian-studio)

- Transform the Query Console from a single-response card into a **chat thread UI** with scrolling message history
- Maintain a `messages[]` array in component state tracking user questions and assistant responses
- Send the conversation history with each query so the backend can pass it to the LLM (see the sketch after this list)
- The conversation resets on page navigation or an explicit "New conversation" action
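
A minimal sketch of that client-side loop, written in Python for consistency with the backend sketches below even though meridian-studio itself is TypeScript; the endpoint URL and response shape are assumptions, not meridian's actual contract:

```python
import requests

QUERY_URL = "http://localhost:8000/query"  # placeholder, not a real deployment URL

# Conversation state lives entirely on the client, per this ADR.
messages: list[dict[str, str]] = []

def ask(question: str) -> str:
    payload: dict = {"question": question}
    if messages:
        # Include prior turns. On the first turn the field is omitted,
        # which is exactly the backward-compatible single-shot mode.
        payload["conversation_history"] = messages
    answer = requests.post(QUERY_URL, json=payload).json()["answer"]
    # Record both turns so the next question carries the full exchange.
    messages.append({"role": "user", "content": question})
    messages.append({"role": "assistant", "content": answer})
    return answer

def new_conversation() -> None:
    """Equivalent of the "New conversation" action: drop all history."""
    messages.clear()
```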

### Backend (meridian)

- Extend `POST /query` to accept an optional `conversation_history` field, an array of `{ role, content }` message objects
- Update `LLMProvider.generate()` to accept an optional `messages` parameter alongside the existing `prompt` (see the sketches after this list)
- The Azure OpenAI provider passes conversation history as prior turns in the `messages` array to `chat.completions.create()`
- The Ollama provider concatenates history into a single prompt string (Ollama's `/api/generate` endpoint does not support a messages array)
- Retrieval continues to use only the **latest user question** for vector search; prior turns provide LLM context, not retrieval context
- Governance (confidence threshold gating) remains unchanged; the conversation history does not bypass the retrieval confidence check
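
A sketch of the extended request schema and handler, assuming a FastAPI/pydantic-style backend; `retriever`, `governance_threshold`, `build_prompt`, and `llm` are hypothetical names standing in for meridian's actual components:

```python
from typing import Literal, Optional

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatMessage(BaseModel):
    role: Literal["user", "assistant"]
    content: str

class QueryRequest(BaseModel):
    question: str
    # Optional: when omitted, the request is the old single-shot shape.
    conversation_history: Optional[list[ChatMessage]] = None

@app.post("/query")
async def query(req: QueryRequest):
    # Retrieval sees ONLY the latest question; prior turns are not embedded.
    chunks, confidence = retriever.search(req.question)

    # Governance gating is unchanged: history cannot bypass the confidence check.
    if confidence < governance_threshold:
        return {"answer": None, "reason": "low_confidence"}  # placeholder refusal shape

    prompt = build_prompt(req.question, chunks)
    answer = llm.generate(prompt, messages=req.conversation_history)
    return {"answer": answer}
```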
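
And a sketch of the two providers under that `generate()` signature, reusing `ChatMessage` and `Optional` from the previous block; client construction, model names, and the system prompt are placeholders:

```python
import requests

class AzureOpenAIProvider:
    def generate(self, prompt: str, messages: Optional[list[ChatMessage]] = None) -> str:
        # Prior turns go in as real chat turns between the system message
        # and the current (retrieval-augmented) user prompt.
        chat = [{"role": "system", "content": self.system_prompt}]
        if messages:
            chat.extend({"role": m.role, "content": m.content} for m in messages)
        chat.append({"role": "user", "content": prompt})
        resp = self.client.chat.completions.create(model=self.deployment, messages=chat)
        return resp.choices[0].message.content

class OllamaProvider:
    def generate(self, prompt: str, messages: Optional[list[ChatMessage]] = None) -> str:
        # /api/generate takes one prompt string, so history is flattened into it.
        prefix = ""
        if messages:
            turns = "\n".join(f"{m.role}: {m.content}" for m in messages)
            prefix = f"Previous conversation:\n{turns}\n\n"
        resp = requests.post(
            f"{self.base_url}/api/generate",
            json={"model": self.model, "prompt": prefix + prompt, "stream": False},
        )
        return resp.json()["response"]
```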

### Message format

```json
{
  "question": "What is so great about #1?",
  "conversation_history": [
    { "role": "user", "content": "What topics are in the knowledge base?" },
    { "role": "assistant", "content": "The knowledge base covers deployment, rollback procedures, and..." }
  ]
}
```

The `conversation_history` field is optional. When omitted, behavior is identical to the current single-shot mode, preserving full backward compatibility.

### What does NOT change

- Retrieval uses only the latest question (not the full conversation)
- Governance threshold gating is unchanged
- The MCP `query_knowledge_base` tool keeps its current single-question interface (MCP clients manage their own conversation state)
- Telemetry logging structure is unchanged

## Alternatives Considered

- **Client-side prompt concatenation**: Concatenate all prior Q&A into a single mega-question on the frontend. Simpler, but it produces poor retrieval results (vector search over a concatenated blob) and wastes tokens.
- **Server-side session storage**: Store conversation state on the backend keyed by session ID. Adds statefulness, session management, and cleanup complexity. The frontend already has the history in memory; passing it is simpler.
- **Send history through MCP**: Extend the MCP tool schema with a `messages` field. Unnecessary complexity, since the frontend now calls `/query` directly.

## Consequences

- The Query Console becomes conversational, so follow-up questions work naturally
- Backend API gains an optional field (`conversation_history`) with no breaking changes
- LLM provider interface gains an optional `messages` parameter; existing implementations continue to work
- Token usage increases with conversation length: each turn re-sends the full history, so cumulative token cost grows roughly quadratically with the number of turns
- No server-side session state; the frontend owns the conversation lifecycle
- Ollama provider gets a best-effort implementation (flattened history in the prompt) since the `/api/generate` endpoint it uses lacks native chat support