Architecture & Concept

We designed a Retrieval-Augmented Generation (RAG) approach comprising three components (sketched below):
• Retriever / Search component — finds relevant knowledge fragments from the internal knowledge base (semantic/vector search).
• Generator / Summarizer — given the user query plus the retrieved fragments, generates a relevant answer in the form of a query-oriented, extractive, multi-document summary.
• Filtering / Verification module — ensures the generated answer stays within permitted internal sources and rejects speculative content.
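As a rough illustration, the three components can be expressed as interfaces. This is a minimal sketch, not a fixed API; all names (`Fragment`, `Retriever`, `Generator`, `Verifier`) and fields are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Fragment:
    text: str     # chunk of an internal document
    source: str   # link back to the knowledge base, used for citations
    score: float  # retrieval similarity score


class Retriever(Protocol):
    def search(self, query: str, top_n: int) -> list[Fragment]: ...


class Generator(Protocol):
    def summarize(self, query: str, fragments: list[Fragment]) -> str: ...


class Verifier(Protocol):
    def is_grounded(self, answer: str, fragments: list[Fragment]) -> bool: ...
```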
1. Data consolidation & preprocessing
o Audit and inventory knowledge sources (what documents, databases, systems)
o Normalize formats, chunk documents, and attach metadata (tags, date, author, division); a chunking sketch follows this step
o Deduplicate, cleanse, and review for sensitive content
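A minimal chunking sketch, assuming fixed-size character windows with overlap; the window sizes and metadata fields are illustrative and should be adapted to the actual source formats.

```python
def chunk_document(text: str, meta: dict, size: int = 800, overlap: int = 100) -> list[dict]:
    """Split one normalized document into overlapping chunks with metadata."""
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append({
            "text": text[start:start + size],
            # provenance metadata so every answer can cite its source
            "tags": meta.get("tags", []),
            "date": meta.get("date"),
            "author": meta.get("author"),
            "division": meta.get("division"),
        })
    return chunks
```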
2. Embedding & indexing
o Select an embedding model (e.g. Sentence Transformers, OpenAI embeddings, or a domain-adapted model); an indexing sketch follows this step
o Index embeddings in a vector database (Milvus, Weaviate, Pinecone, FAISS, etc.)
o Tune search relevance and ranking (similarity metric, top-N cut-off, optional re-ranking)
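A sketch of embedding and indexing with Sentence Transformers and FAISS; the model name and the inner-product index are example choices, not requirements.

```python
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model, not a requirement

def build_index(texts: list[str]) -> faiss.IndexFlatIP:
    # normalized vectors + inner product give cosine similarity
    vecs = model.encode(texts, normalize_embeddings=True)
    index = faiss.IndexFlatIP(vecs.shape[1])
    index.add(vecs)
    return index

def search(index: faiss.IndexFlatIP, query: str, top_n: int = 5):
    q = model.encode([query], normalize_embeddings=True)
    scores, ids = index.search(q, top_n)  # arrays of shape (1, top_n)
    return list(zip(ids[0].tolist(), scores[0].tolist()))
```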
3. Quality control / validation
o Enforce confidence thresholds: refuse to answer when the retrieved evidence is insufficient (see the gate sketch after this step)
o Fallback logic (e.g. “I’m not confident—please consult an expert”)
o Monitor metrics: accuracy, answer rejection rate, user feedback
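The confidence gate could look like the sketch below; `MIN_SCORE` and `MIN_FRAGMENTS` are illustrative tuning knobs, and rejections would feed the answer-rejection-rate metric.

```python
from typing import Optional

MIN_SCORE = 0.35    # illustrative cosine-similarity floor for usable evidence
MIN_FRAGMENTS = 2   # illustrative minimum number of supporting fragments

FALLBACK = "I'm not confident about this one. Please consult an expert."

def gate(fragments: list[dict]) -> tuple[bool, Optional[str]]:
    """Refuse to answer when the retrieved evidence is too weak."""
    strong = [f for f in fragments if f["score"] >= MIN_SCORE]
    if len(strong) < MIN_FRAGMENTS:
        # a rejection event; count it for the rejection-rate metric
        return False, FALLBACK
    return True, None
```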
4. Pilot, feedback loop, iteration & scaling
o Launch a pilot version in a single division
o Collect user feedback, refine retrieval, improve prompts
Query Flow

At runtime, a query moves through the system as follows (an end-to-end sketch follows the list):
1) A user types a question in the chat or portal
2) The backend converts the query to an embedding, performs vector search → retrieves top N fragments
3) These fragments + the original query are passed to the LLM, which generates a response
4) The answer is returned to the user, along with citations / source links
5) Admin dashboard tracks usage, flags weak answers, shows statistics
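Tying the steps together, a hedged end-to-end sketch: `retrieve` and `call_llm` are injected stand-ins for the search backend (steps 1-2) and the LLM (step 3), the prompt wording is an assumption, and `gate` reuses the quality-control sketch above.

```python
from typing import Callable

def answer_question(
    query: str,
    retrieve: Callable[[str, int], list[dict]],  # steps 1-2: embed query, vector search
    call_llm: Callable[[str], str],              # step 3: generation
    top_n: int = 5,
) -> dict:
    fragments = retrieve(query, top_n)
    ok, fallback = gate(fragments)               # quality gate from step 3 above
    if not ok:
        return {"answer": fallback, "sources": []}
    context = "\n\n".join(f["text"] for f in fragments)
    prompt = (
        "Answer the question using ONLY the sources below, and cite them.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )
    return {                                     # step 4: answer plus source links
        "answer": call_llm(prompt),
        "sources": [f["source"] for f in fragments],
    }
```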