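Combining RAG and Agents: Building Knowledge-Augmented AI Agents
An in-depth look at architectures that combine RAG (retrieval-augmented generation) with AI agents to build systems that can retrieve knowledge, reason, and act.
A traditional RAG system is a single-pass "retrieve-then-generate" pipeline: the user asks a question, relevant documents are retrieved and concatenated into the prompt, and the LLM generates an answer. Things get more interesting when RAG meets an agent: the agent can actively decide when to retrieve, what to retrieve, and how to validate what comes back, and it can refine its knowledge step by step across multiple turns.
From RAG to Agentic RAG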
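Traditional RAG:
User question → Retrieve documents → LLM generation → Answer
Agentic RAG:
User question → Agent reasoning → Decide whether retrieval is needed
↓
Needed → Select retrieval strategy → Run retrieval → Validate results
↓
Not needed → Answer directly
↓
Results insufficient → Retrieve again / switch strategy
↓
Synthesize and reason → Generate answer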
Architecture
interface AgenticRAGConfig {
  retrievers: Retriever[];
  validator: ResultValidator;
  maxRetries: number;
  confidenceThreshold: number;
}
class AgenticRAG {
  private retrievers: Map<string, Retriever> = new Map();
  private validator: ResultValidator;

  constructor(private config: AgenticRAGConfig, private llm: LLM) {
    this.validator = config.validator;
  }

  async answer(question: string, context: ConversationContext): Promise<Answer> {
    // 1. Agent reasoning: is retrieval needed?
    const decision = await this.decide(question, context);
    if (!decision.needsRetrieval) {
      return {
        content: await this.llm.generate(question, context),
        sources: [],
        confidence: 0.9, // hard-coded value for answers given without retrieval
      };
    }
    // 2. Select a retrieval strategy
    const strategy = this.selectStrategy(decision);
    // 3. Run retrieval (possibly several rounds)
    let results: RetrievalResult[] = [];
    let attempts = 0;
    while (attempts < this.config.maxRetries) {
      results = await strategy.retrieve(question);
      // 4. Validate the quality of the results
      const validation = await this.validator.validate(results, question);
      if (validation.sufficient) {
        break;
      }
      // 5. Adjust the strategy and retry
      // (refine() is assumed to be part of the RetrievalStrategy interface)
      strategy.refine(validation.feedback);
      attempts++;
    }
    // 6. Synthesize the results into an answer
    // (synthesize() and calculateConfidence() are helper methods not shown in this listing)
    const answer = await this.synthesize(question, results, context);
    return {
      content: answer,
      sources: results.map(r => r.source),
      confidence: this.calculateConfidence(results),
    };
  }

  private async decide(question: string, context: ConversationContext): Promise<RetrievalDecision> {
    // The prompt asks for "type" as well, because selectStrategy() switches on it
    const prompt = `Decide whether the following question requires retrieving external knowledge.
Question: ${question}
Conversation history: ${context.recentMessages.map(m => m.content).join('\n')}
Reply with JSON: { "needsRetrieval": boolean, "type": "factual" | "recent" | "complex" | "comparative", "reason": string }`;
    const response = await this.llm.generate(prompt);
    try {
      return JSON.parse(response);
    } catch {
      // Unparseable reply: fall back to retrieving with the default strategy
      return { needsRetrieval: true, type: 'factual', reason: 'could not parse decision' };
    }
  }

  private selectStrategy(decision: RetrievalDecision): RetrievalStrategy {
    switch (decision.type) {
      case 'factual':
        return new VectorSearchStrategy();
      case 'recent':
        return new TimeWeightedStrategy();
      case 'complex':
        return new MultiHopStrategy();
      case 'comparative':
        return new ComparativeStrategy();
      default:
        return new VectorSearchStrategy();
    }
  }
}
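The answer() method calls calculateConfidence(), which the listing does not define. One possible sketch is shown here, assuming each result carries a similarity score in [0, 1] and that the mean of the top few scores is an acceptable proxy for confidence. The RetrievalResult shape, the topK parameter, and the averaging rule are all illustrative assumptions rather than the article's implementation.

```typescript
// Hypothetical shape; the article's RetrievalResult is assumed to look like this.
interface RetrievalResult {
  content: string;
  source: string;
  score: number; // similarity score, assumed to lie in [0, 1]
}

// Confidence = mean score of the topK highest-scoring results, clamped to [0, 1].
// Returns 0 when nothing was retrieved.
function calculateConfidence(results: RetrievalResult[], topK = 3): number {
  if (results.length === 0) return 0;
  const top = [...results].sort((a, b) => b.score - a.score).slice(0, topK);
  const mean = top.reduce((sum, r) => sum + r.score, 0) / top.length;
  return Math.min(1, Math.max(0, mean));
}
```

A value computed this way could then be compared against confidenceThreshold from the config, for example to decide whether the answer should carry a warning.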
Retrieval strategies
Vector search
class VectorSearchStrategy implements RetrievalStrategy {
  private embedder: Embedder;
  private vectorStore: VectorStore;

  async retrieve(query: string, limit: number = 5): Promise<RetrievalResult[]> {
    // Embed the query, then fetch the nearest stored vectors
    const embedding = await this.embedder.embed(query);
    const results = await this.vectorStore.search(embedding, { topK: limit });
    return results.map(r => ({
      content: r.metadata.content,
      source: r.metadata.source,
      score: r.score,
    }));
  }
}
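The listing treats vectorStore.search as a black box. To illustrate what a top-K similarity search computes, here is a brute-force in-memory sketch using cosine similarity. StoredVector, cosineSimilarity, and searchTopK are hypothetical names introduced for this example; production vector databases typically use approximate nearest-neighbor indexes instead of scanning every vector.

```typescript
// Hypothetical record type for this sketch.
interface StoredVector {
  id: string;
  vector: number[];
}

// Cosine similarity of two equal-length vectors; 0 if either has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every stored vector against the query and returns the topK best.
function searchTopK(
  query: number[],
  items: StoredVector[],
  topK: number
): { id: string; score: number }[] {
  return items
    .map(item => ({ id: item.id, score: cosineSimilarity(query, item.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

The scan is linear in the number of stored vectors, which is fine for a few thousand chunks and is the reason dedicated vector stores exist for larger collections.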
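Multi-hop retrieval
Complex questions often need several retrieval steps, where each step's query depends on what the previous steps found:
class MultiHopStrategy implements RetrievalStrategy {
  private llm: LLM;
  private retrievers: Retriever[];
  private maxHops: number = 3; // upper bound on retrieval rounds

  async retrieve(query: string, limit: number = 5): Promise<RetrievalResult[]> {
    const allResults: RetrievalResult[] = [];
    if (this.retrievers.length === 0) return allResults;
    let currentQuery = query;
    for (let hop = 0; hop < this.maxHops; hop++) {
      // Retrieve for the current query
      const results = await this.retrievers[0].retrieve(currentQuery, limit);
      allResults.push(...results);
      // Generate the next query; null means enough information was gathered
      const nextQuery = await this.generateNextQuery(query, allResults);
      if (!nextQuery) break;
      currentQuery = nextQuery;
    }
    // deduplicate() is a helper that removes repeated chunks (not shown in this listing)
    return this.deduplicate(allResults).slice(0, limit);
  }

  private async generateNextQuery(
    originalQuery: string,
    currentResults: RetrievalResult[]
  ): Promise<string | null> {
    const prompt = `Original question: ${originalQuery}
Information retrieved so far: ${currentResults.map(r => r.content).join('\n---\n')}
Decide whether further retrieval is needed. If so, write the next retrieval query. If the information is already sufficient, return null.
JSON: { "query": string | null }`;
    const response = await this.llm.generate(prompt);
    try {
      const parsed = JSON.parse(response);
      return typeof parsed.query === 'string' ? parsed.query : null;
    } catch {
      // Unparseable reply: stop hopping rather than fail the whole retrieval
      return null;
    }
  }
}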
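MultiHopStrategy relies on a deduplicate helper that the listing does not define. Because several hops can return the same chunk, some form of deduplication is needed before truncating to the limit. A minimal sketch follows, assuming that two results are duplicates when they share a source and have the same whitespace-normalized content. That equality rule and the RetrievalResult shape are assumptions made for illustration.

```typescript
// Hypothetical shape; the article's RetrievalResult is assumed to look like this.
interface RetrievalResult {
  content: string;
  source: string;
  score: number;
}

// Keeps one result per (source, normalized content) pair, preferring the
// higher score, and returns the survivors sorted by score, best first.
function deduplicate(results: RetrievalResult[]): RetrievalResult[] {
  const best = new Map<string, RetrievalResult>();
  for (const r of results) {
    const normalized = r.content.trim().replace(/\s+/g, " ");
    const key = r.source + "\u0000" + normalized;
    const existing = best.get(key);
    if (!existing || r.score > existing.score) {
      best.set(key, r);
    }
  }
  return Array.from(best.values()).sort((a, b) => b.score - a.score);
}
```

Exact-match deduplication will not catch near-duplicates such as overlapping chunks; comparing embeddings would be one way to extend it.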
Time-weighted retrieval
class TimeWeightedStrategy implements RetrievalStrategy {
  private embedder: Embedder;
  private vectorStore: VectorStore;
  private halfLifeDays: number = 30;

  async retrieve(query: string, limit: number = 5): Promise<RetrievalResult[]> {
    const embedding = await this.embedder.embed(query);
    // Over-fetch so that re-ranking by recency has candidates to choose from
    const results = await this.vectorStore.search(embedding, { topK: limit * 2 });
    // Apply time decay (metadata.timestamp is in milliseconds since the epoch)
    const now = Date.now();
    const scored = results.map(r => {
      const ageDays = (now - r.metadata.timestamp) / (1000 * 60 * 60 * 24);
      const timeWeight = Math.pow(0.5, ageDays / this.halfLifeDays);
      const combinedScore = r.score * 0.7 + timeWeight * 0.3;
      return { ...r, combinedScore };
    });
    scored.sort((a, b) => b.combinedScore - a.combinedScore);
    // Map back to RetrievalResult, reporting the combined score
    return scored.slice(0, limit).map(r => ({
      content: r.metadata.content,
      source: r.metadata.source,
      score: r.combinedScore,
    }));
  }
}
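To make the decay formula concrete, the scoring step can be isolated into two small functions and evaluated on sample values. This sketch reuses the 30-day half-life and the 0.7 / 0.3 weights from the listing; the function names are hypothetical, and clamping negative ages to zero (for timestamps in the future) is an added assumption.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Exponential decay: the weight halves every halfLifeDays.
// Negative ages are clamped to 0 so the weight never exceeds 1.
function timeWeight(ageDays: number, halfLifeDays: number): number {
  return Math.pow(0.5, Math.max(0, ageDays) / halfLifeDays);
}

// Blends similarity and recency with the 0.7 / 0.3 weights from the listing.
function combinedScore(
  similarity: number,
  timestampMs: number,
  nowMs: number,
  halfLifeDays = 30
): number {
  const ageDays = (nowMs - timestampMs) / MS_PER_DAY;
  return similarity * 0.7 + timeWeight(ageDays, halfLifeDays) * 0.3;
}
```

With a similarity of 0.8, a brand-new document scores 0.56 + 0.30 = 0.86, a 30-day-old one scores 0.56 + 0.15 = 0.71, and a 60-day-old one scores 0.56 + 0.075 = 0.635. Recency can therefore move a result by at most 0.3, so a much more similar old document still outranks a weakly related new one.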
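Result validation
class ResultValidator {
  private llm: LLM;

  async validate(results: RetrievalResult[], question: string): Promise<ValidationResult> {
    // Nothing retrieved: no need to ask the LLM
    if (results.length === 0) {
      return { sufficient: false, feedback: 'No relevant documents were found' };
    }
    const prompt = `Question: ${question}
Retrieved results:
${results.map((r, i) => `[${i + 1}] ${r.content}`).join('\n')}
Assess whether these results are enough to answer the question:
1. Relevance: how closely the results relate to the question
2. Sufficiency: whether there is enough information to answer
3. Consistency: whether the results contradict each other
JSON: { "sufficient": boolean, "relevance": number, "feedback": string }`;
    const response = await this.llm.generate(prompt);
    try {
      return JSON.parse(response);
    } catch {
      // Unparseable reply: treat as insufficient so the caller retries
      return { sufficient: false, feedback: 'Validator reply was not valid JSON' };
    }
  }
}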
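The validator passes the model's reply straight to JSON.parse. In practice, model replies often wrap the JSON in extra text or omit a field, so it can help to extract and shape-check the object before trusting it. A sketch of such a parser follows; parseValidation is a hypothetical helper, and the ValidationResult shape is inferred from the prompt in the listing.

```typescript
// Shape inferred from the validator prompt; relevance is treated as optional.
interface ValidationResult {
  sufficient: boolean;
  relevance?: number;
  feedback: string;
}

// Takes the text between the first "{" and the last "}" of a model reply,
// parses it, and checks the field types. Returns null instead of throwing
// so the caller can fall back to a retry.
function parseValidation(reply: string): ValidationResult | null {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(reply.slice(start, end + 1));
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const obj = parsed as Record<string, unknown>;
  const sufficient = obj["sufficient"];
  const feedback = obj["feedback"];
  const relevance = obj["relevance"];
  if (typeof sufficient !== "boolean" || typeof feedback !== "string") return null;
  const result: ValidationResult = { sufficient, feedback };
  if (typeof relevance === "number") result.relevance = relevance;
  return result;
}
```

The same pattern applies to the other places where the listings parse model output, such as the retrieval decision and the next-hop query.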
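Knowledge synthesis
class KnowledgeSynthesizer {
  private llm: LLM;

  async synthesize(
    question: string,
    results: RetrievalResult[],
    context: ConversationContext
  ): Promise<string> {
    // Deduplicate and rank the results
    // (deduplicate() is a helper not shown in this listing)
    const uniqueResults = this.deduplicate(results);
    const rankedResults = this.rankByRelevance(uniqueResults, question);
    // Build the context block, one numbered entry per source
    const contextParts = rankedResults.map((r, i) =>
      `[Source ${i + 1}] ${r.source}\n${r.content}`
    );
    const prompt = `Answer the question using the reference material below. If the material is not enough to answer, say so.
Question: ${question}
Reference material:
${contextParts.join('\n\n')}
Requirements:
1. Answer the question directly
2. Cite sources (for example, [Source 1])
3. If information is missing, state what is missing`;
    return await this.llm.generate(prompt, context);
  }

  private rankByRelevance(results: RetrievalResult[], question: string): RetrievalResult[] {
    // Copy before sorting so the caller's array is not mutated
    return [...results].sort((a, b) => {
      // Blend semantic similarity (60%) with keyword overlap (40%)
      const scoreA = a.score * 0.6 + this.keywordOverlap(question, a.content) * 0.4;
      const scoreB = b.score * 0.6 + this.keywordOverlap(question, b.content) * 0.4;
      return scoreB - scoreA;
    });
  }

  // Fraction of query words that also appear in the content.
  // Splitting on whitespace assumes space-delimited text.
  private keywordOverlap(query: string, content: string): number {
    const queryWords = new Set(query.toLowerCase().split(/\s+/).filter(Boolean));
    if (queryWords.size === 0) return 0; // avoid dividing by zero on an empty query
    const contentWords = new Set(content.toLowerCase().split(/\s+/).filter(Boolean));
    const overlap = [...queryWords].filter(w => contentWords.has(w)).length;
    return overlap / queryWords.size;
  }
}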
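The keywordOverlap method splits text on whitespace, which yields one giant "word" for languages written without spaces, such as Chinese or Japanese. One language-agnostic alternative is to compare character bigrams instead of words. The sketch below is a swapped-in technique, not what the listing does, and charBigrams and bigramOverlap are hypothetical names.

```typescript
// Set of adjacent character pairs, ignoring case and whitespace.
// Array.from splits by code point, so CJK characters are handled one by one.
function charBigrams(text: string): Set<string> {
  const chars = Array.from(text.toLowerCase().replace(/\s+/g, ""));
  const grams = new Set<string>();
  for (let i = 0; i + 1 < chars.length; i++) {
    grams.add(chars[i] + chars[i + 1]);
  }
  return grams;
}

// Fraction of the query's bigrams that also occur in the content, in [0, 1].
// Returns 0 when the query has fewer than two characters.
function bigramOverlap(query: string, content: string): number {
  const queryGrams = charBigrams(query);
  if (queryGrams.size === 0) return 0;
  const contentGrams = charBigrams(content);
  let shared = 0;
  queryGrams.forEach(g => {
    if (contentGrams.has(g)) shared++;
  });
  return shared / queryGrams.size;
}
```

Bigram overlap is noisier than word overlap for English, so a reasonable design is to pick the tokenizer per language rather than replace one with the other everywhere.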
Document indexing pipeline
class DocumentIndexer {
  private embedder: Embedder;
  private vectorStore: VectorStore;
  private chunker: TextChunker;

  async indexDocument(document: Document): Promise<void> {
    // 1. Split the document into chunks
    const chunks = await this.chunker.chunk(document.content, {
      strategy: 'semantic', // semantic chunking
      maxChunkSize: 512,
      overlap: 50,
    });
    // 2. Generate an embedding for each chunk
    const embeddings = await Promise.all(
      chunks.map(chunk => this.embedder.embed(chunk.content))
    );
    // 3. Store the chunks in the vector database
    await this.vectorStore.upsert(
      chunks.map((chunk, i) => ({
        id: `${document.id}_chunk_${i}`,
        vector: embeddings[i],
        metadata: {
          documentId: document.id, // lets reindex() find this document's chunks by filter
          content: chunk.content,
          source: document.source,
          title: document.title,
          chunkIndex: i,
          totalChunks: chunks.length,
          timestamp: Date.now(),
        },
      }))
    );
  }

  // Takes the full Document so that source and title survive re-indexing
  async reindex(document: Document): Promise<void> {
    // Delete the old chunks
    await this.vectorStore.deleteByFilter({ documentId: document.id });
    // Index the new content
    await this.indexDocument(document);
  }
}
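The indexer delegates chunking to a TextChunker configured with maxChunkSize 512 and overlap 50, using a 'semantic' strategy that the listing does not show. To illustrate how those two parameters interact, here is a much simpler fixed-size sliding-window chunker. It is a different technique from semantic chunking, it measures sizes in characters (the listing does not say whether its units are characters or tokens), and chunkText and Chunk are hypothetical names.

```typescript
// Hypothetical chunk shape for this sketch.
interface Chunk {
  content: string;
  start: number; // offset of the chunk's first character in the source text
}

// Slides a window of maxChunkSize characters over the text, advancing by
// (maxChunkSize - overlap) each step so neighboring chunks share `overlap`
// characters of context.
function chunkText(text: string, maxChunkSize = 512, overlap = 50): Chunk[] {
  if (maxChunkSize <= 0) throw new Error("maxChunkSize must be positive");
  if (overlap < 0 || overlap >= maxChunkSize) {
    throw new Error("overlap must be in [0, maxChunkSize)");
  }
  const chunks: Chunk[] = [];
  const step = maxChunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push({ content: text.slice(start, start + maxChunkSize), start });
    if (start + maxChunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}
```

For a 1000-character text with the defaults, the windows start at offsets 0, 462, and 924, so the text becomes three chunks and the last one is 76 characters long.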
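Frequently asked questions (FAQ)
What is the main difference between Agentic RAG and plain RAG?
Plain RAG is a fixed "retrieve-then-generate" pipeline. Agentic RAG lets the agent decide for itself when to retrieve, what to retrieve, and how to validate the results, and it supports multiple retrieval rounds and strategy changes.
How do you evaluate the quality of a RAG system?
Key metrics are retrieval recall (Recall@K), answer correctness, and citation accuracy. Evaluate against a human-annotated test set.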
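As an illustration of the first metric, Recall@K can be computed from the ids a retriever returned and the ids a human annotator marked as relevant. The sketch below assumes documents are identified by string ids; recallAtK and meanRecallAtK are hypothetical names.

```typescript
// Recall@K: fraction of the relevant ids that appear among the first K retrieved ids.
// Returns 0 when no ids are marked relevant.
function recallAtK(retrievedIds: string[], relevantIds: string[], k: number): number {
  const relevant = new Set(relevantIds);
  if (relevant.size === 0) return 0;
  const topK = new Set(retrievedIds.slice(0, k));
  let hits = 0;
  relevant.forEach(id => {
    if (topK.has(id)) hits++;
  });
  return hits / relevant.size;
}

// Averages Recall@K over a labeled test set.
function meanRecallAtK(
  cases: { retrieved: string[]; relevant: string[] }[],
  k: number
): number {
  if (cases.length === 0) return 0;
  const total = cases.reduce((sum, c) => sum + recallAtK(c.retrieved, c.relevant, k), 0);
  return total / cases.length;
}
```

For example, if the annotator marked a, d, and e as relevant and the retriever returned a, b, c, d in that order, Recall@3 is 1/3 and Recall@4 is 2/3.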
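How do you choose a vector database?
As a rough rule of thumb, use Chroma at small scale, Pinecone or Weaviate at medium scale, and Milvus at large scale. Weigh latency, cost, and operational complexity for your own workload.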
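Summary
Combining RAG with agents produces a new generation of knowledge-augmented AI systems. The agent's reasoning turns RAG from passive lookup into active exploration, while multi-round retrieval and result validation help keep the retrieved knowledge accurate and complete. This architecture is a strong foundation for building enterprise knowledge assistants.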