# ADR-0005: Multi-Turn Chat in Query Console

## Status

Accepted

## Context

The Query Console currently operates as a single-shot Q&A interface — the user asks one question, gets one answer, and every subsequent question starts from scratch. This means follow-up questions like "What is so great about #1?" fail because the LLM has no memory of the previous exchange.

Real-world knowledge base exploration is conversational. Users ask an initial question, then drill down, rephrase, or ask follow-ups that reference prior answers. Without conversation history, each query is isolated and the user must repeat context manually.

The backend's `POST /query` endpoint currently accepts `{ question: string }`. The LLM provider builds a fresh 2-message array (system + user) for every call. No conversation context is preserved.

## Decision

Add multi-turn conversation support across the full stack:

### Frontend (meridian-studio)

- Transform the Query Console from a single-response card into a **chat thread UI** with scrolling message history
- Maintain a `messages[]` array in component state tracking user questions and assistant responses
- Send the conversation history with each query so the backend can pass it to the LLM
- The conversation resets on page navigation or explicit "New conversation" action

### Backend (meridian)

- Extend `POST /query` to accept an optional `conversation_history` field — an array of `{ role, content }` message objects
- Update `LLMProvider.generate()` to accept an optional `messages` parameter alongside the existing `prompt` (see the sketch after this list)
- The Azure OpenAI provider passes conversation history as prior turns in the `messages` array to `chat.completions.create()`
- The Ollama provider concatenates history into a single prompt string (Ollama's `/api/generate` endpoint does not support a messages array)
- Retrieval continues to use only the **latest user question** for vector search — prior turns provide LLM context, not retrieval context
- Governance (confidence threshold gating) remains unchanged — the conversation history does not bypass the retrieval confidence check
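To make the provider change concrete, here is a minimal sketch of the extended interface, assuming a Python backend. `LLMProvider`, `generate()`, and the optional `messages` parameter come from this ADR; the `Message` dataclass, the `AzureOpenAIProvider` class name, the constructor wiring, and the system-prompt handling are illustrative assumptions, not the shipped code.

```python
from dataclasses import dataclass
from typing import Optional

SYSTEM_PROMPT = "You are a knowledge base assistant."  # placeholder for the real system prompt


@dataclass
class Message:
    """One prior conversation turn (hypothetical type for this sketch)."""
    role: str      # "user" or "assistant"
    content: str


class LLMProvider:
    def generate(self, prompt: str, messages: Optional[list[Message]] = None) -> str:
        """Generate an answer; `messages` optionally carries prior turns."""
        raise NotImplementedError


class AzureOpenAIProvider(LLMProvider):
    def __init__(self, client, deployment: str):
        self._client = client          # assumed to be an openai.AzureOpenAI instance
        self._deployment = deployment

    def generate(self, prompt: str, messages: Optional[list[Message]] = None) -> str:
        # System message first, prior turns in order, current question last.
        # With no history this is the same system + user pair as today.
        chat_messages = [{"role": "system", "content": SYSTEM_PROMPT}]
        chat_messages += [{"role": m.role, "content": m.content} for m in messages or []]
        chat_messages.append({"role": "user", "content": prompt})

        response = self._client.chat.completions.create(
            model=self._deployment,
            messages=chat_messages,
        )
        return response.choices[0].message.content
```

Because `messages` defaults to `None`, existing call sites and other provider implementations keep working unchanged.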
### Message format

```json
{
  "question": "What is so great about #1?",
  "conversation_history": [
    { "role": "user", "content": "What topics are in the knowledge base?" },
    { "role": "assistant", "content": "The knowledge base covers deployment, rollback procedures, and..." }
  ]
}
```

The `conversation_history` field is optional. When omitted, behavior is identical to the current single-shot mode — full backward compatibility.

### What does NOT change

- Retrieval uses only the latest question (not the full conversation)
- Governance threshold gating is unchanged
- The MCP `query_knowledge_base` tool keeps its current single-question interface (MCP clients manage their own conversation state)
- Telemetry logging structure is unchanged

## Alternatives Considered

- **Client-side prompt concatenation**: Concatenate all prior Q&A into a single mega-question on the frontend. Simpler, but it produces poor retrieval results (vector search over a concatenated blob) and wastes tokens.
- **Server-side session storage**: Store conversation state on the backend keyed by session ID. This adds statefulness, session management, and cleanup complexity. The frontend already has the history in memory — passing it along is simpler.
- **Send history through MCP**: Extend the MCP tool schema with a `messages` field. Unnecessary complexity — the frontend now calls `/query` directly.

## Consequences

- The Query Console becomes conversational — follow-up questions work naturally
- The backend API gains an optional field (`conversation_history`) with no breaking changes
- The LLM provider interface gains an optional `messages` parameter — existing implementations continue to work
- Token usage increases with conversation length (each turn re-sends the full history)
- No server-side session state — the frontend owns the conversation lifecycle
- The Ollama provider gets a best-effort implementation (flattened history in the prompt) since it lacks native chat API support; a sketch of the flattening follows
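As an illustration of that last consequence, here is a minimal sketch of the history flattening for the Ollama provider, in the same assumed Python backend as above. The concatenation strategy (prior turns folded into a single prompt string for `/api/generate`) is the one this ADR describes; the helper name `flatten_history` and the transcript format are assumptions for the sketch, not the shipped code.

```python
def flatten_history(history: list[dict], question: str) -> str:
    """Fold prior turns plus the current question into one prompt string,
    since Ollama's /api/generate takes a single prompt, not a messages array."""
    lines = []
    for turn in history:
        speaker = "User" if turn["role"] == "user" else "Assistant"
        lines.append(f"{speaker}: {turn['content']}")
    lines.append(f"User: {question}")
    lines.append("Assistant:")
    return "\n".join(lines)
```

With an empty history this degenerates to `User: <question>` followed by `Assistant:`, so single-shot behavior is preserved on that path as well.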