RAG + Agent

Fusing RAG and Agents: Building Knowledge-Augmented AI Agents

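An in-depth look at architectures that combine RAG (Retrieval-Augmented Generation) with AI Agents, producing intelligent systems that can retrieve knowledge, reason, and act.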

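A traditional RAG system is a single-pass "retrieve-then-generate" pipeline: the user asks a question, the system retrieves relevant documents and splices them into the prompt, and the LLM generates an answer. When RAG meets an Agent, things get more interesting: the Agent can decide for itself when to retrieve, what to retrieve, and how to verify the results, and can even refine its knowledge step by step over multiple rounds of interaction.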

From RAG to Agentic RAG

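Traditional RAG:
User question → Retrieve documents → LLM generates → Answer

Agentic RAG:
User question → Agent reasoning → Decide whether retrieval is needed
    ├─ Not needed → Answer directly
    └─ Needed → Choose a retrieval strategy → Run retrieval → Validate results
          ├─ Results insufficient → Retrieve again / switch strategy (back to retrieval)
          └─ Results sufficient → Synthesize → Generate answer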

Architecture

interface AgenticRAGConfig {
  retrievers: Retriever[];
  validator: ResultValidator;
  maxRetries: number;
  confidenceThreshold: number;
}

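class AgenticRAG {
  private retrievers: Map<string, Retriever> = new Map();
  private llm: LLM;
  private validator: ResultValidator;
  // answer() reads this.config.maxRetries, so the config must be stored on the instance
  private config: AgenticRAGConfig;

  constructor(llm: LLM, config: AgenticRAGConfig) {
    this.llm = llm;
    this.config = config;
    this.validator = config.validator;
  }
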
  async answer(question: string, context: ConversationContext): Promise<Answer> {
    // 1. Agent reasoning: decide whether retrieval is needed
    const decision = await this.decide(question, context);

    if (!decision.needsRetrieval) {
      return {
        content: await this.llm.generate(question, context),
        sources: [],
        confidence: 0.9, // fixed placeholder: there are no retrieved sources to score
      };
    }

    // 2. Choose a retrieval strategy
    const strategy = this.selectStrategy(decision);

    // 3. Run retrieval (possibly over several rounds)
    let results: RetrievalResult[] = [];
    let attempts = 0;

    while (attempts < this.config.maxRetries) {
      results = await this.retrieve(strategy, question, context);

      // 4. Validate result quality
      const validation = await this.validator.validate(results, question);

      if (validation.sufficient) {
        break;
      }

      // 5. Refine the strategy and retry
      strategy.refine(validation.feedback);
      attempts++;
    }

    // 6. Synthesize an answer from the results (the last batch, if retries ran out)
    const answer = await this.synthesize(question, results, context);

    return {
      content: answer,
      sources: results.map(r => r.source),
      confidence: this.calculateConfidence(results),
    };
  }

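  private async decide(question: string, context: ConversationContext): Promise<RetrievalDecision> {
    // selectStrategy() switches on decision.type, so the prompt must ask for it
    const prompt = `Decide whether the following question requires retrieving external knowledge.
Question: ${question}
Conversation history: ${context.recentMessages.map(m => m.content).join('\n')}

Reply in JSON: { "needsRetrieval": boolean, "type": "factual" | "recent" | "complex" | "comparative", "reason": string }`;

    const response = await this.llm.generate(prompt);
    try {
      return JSON.parse(response) as RetrievalDecision;
    } catch {
      // Malformed JSON from the LLM: fall back to retrieving with the default strategy
      return { needsRetrieval: true, type: 'factual', reason: 'unparseable decision' };
    }
  }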

  private selectStrategy(decision: RetrievalDecision): RetrievalStrategy {
    switch (decision.type) {
      case 'factual':
        return new VectorSearchStrategy();
      case 'recent':
        return new TimeWeightedStrategy();
      case 'complex':
        return new MultiHopStrategy();
      case 'comparative':
        return new ComparativeStrategy();
      default:
        return new VectorSearchStrategy();
    }
  }
}
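The retrieve, validate, refine loop in answer() relies on helpers the class never declares (this.retrieve, strategy.refine). As a self-contained sketch, the loop below injects those steps as plain async functions. The names retrieveWithRetries, LoopDeps, and Doc are hypothetical, and rewriting the query stands in for strategy.refine.

```typescript
// Hypothetical minimal types for this sketch; not the article's actual interfaces
interface Doc { content: string; source: string; score: number; }
interface Validation { sufficient: boolean; feedback: string; }

interface LoopDeps {
  retrieve: (query: string) => Promise<Doc[]>;
  validate: (docs: Doc[], question: string) => Promise<Validation>;
  // Rewrites the query using validator feedback (stands in for strategy.refine)
  refine: (query: string, feedback: string) => Promise<string>;
}

interface LoopResult { docs: Doc[]; attempts: number; sufficient: boolean; }

// Retrieve and validate until the validator is satisfied or maxRetries is reached.
async function retrieveWithRetries(
  question: string,
  deps: LoopDeps,
  maxRetries: number,
): Promise<LoopResult> {
  let query = question;
  let docs: Doc[] = [];
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    docs = await deps.retrieve(query);
    const v = await deps.validate(docs, question);
    if (v.sufficient) return { docs, attempts: attempt, sufficient: true };
    query = await deps.refine(query, v.feedback);
  }
  // Out of retries: hand back the last batch and let the caller lower its confidence
  return { docs, attempts: maxRetries, sufficient: false };
}
```

Returning sufficient: false instead of throwing mirrors how answer() falls through to synthesis after maxRetries, while giving the caller an explicit signal it can use when computing confidence.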

Retrieval Strategies

Vector Search

class VectorSearchStrategy implements RetrievalStrategy {
  private embedder: Embedder;
  private vectorStore: VectorStore;

  async retrieve(query: string, limit: number = 5): Promise<RetrievalResult[]> {
    const embedding = await this.embedder.embed(query);
    const results = await this.vectorStore.search(embedding, { topK: limit });

    return results.map(r => ({
      content: r.metadata.content,
      source: r.metadata.source,
      score: r.score,
    }));
  }
}
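VectorStore.search is left abstract above. To show what it computes, here is a toy in-memory version that ranks stored vectors by cosine similarity to the query embedding. The Entry type and the cosine/searchTopK functions are hypothetical; real vector databases use approximate nearest-neighbour indexes rather than this brute-force scan over every stored vector.

```typescript
// Toy in-memory vector store for illustration only (hypothetical types and names)
interface Entry { id: string; vector: number[]; content: string; }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  // A zero vector has no direction; define its similarity as 0
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every entry against the query and keep the topK best
function searchTopK(store: Entry[], query: number[], topK: number): { id: string; score: number }[] {
  return store
    .map(e => ({ id: e.id, score: cosine(e.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```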

Multi-Hop Retrieval

Complex questions often require several retrieval steps:

class MultiHopStrategy implements RetrievalStrategy {
  private llm: LLM;
  private retrievers: Retriever[];

  async retrieve(query: string, limit: number = 5): Promise<RetrievalResult[]> {
    const allResults: RetrievalResult[] = [];
    let currentQuery = query;

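    // At most 3 hops, to bound latency and LLM cost
    for (let hop = 0; hop < 3; hop++) {
      // Retrieve for the current query
      const results = await this.retrievers[0].retrieve(currentQuery, limit);
      allResults.push(...results);

      // Generate the next query (null means the gathered information is sufficient)
      const nextQuery = await this.generateNextQuery(query, allResults);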
      if (!nextQuery) break;

      currentQuery = nextQuery;
    }

    return this.deduplicate(allResults).slice(0, limit);
  }

  private async generateNextQuery(
    originalQuery: string,
    currentResults: RetrievalResult[]
  ): Promise<string | null> {
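    const prompt = `Original question: ${originalQuery}
Information retrieved so far: ${currentResults.map(r => r.content).join('\n---\n')}

Decide whether further retrieval is needed. If so, generate the next retrieval query. If the information is already sufficient, return null.
JSON: { "query": string | null }`;

    const response = await this.llm.generate(prompt);
    try {
      return JSON.parse(response).query ?? null;
    } catch {
      return null; // Unparseable reply: stop hopping rather than crash
    }
  }

  // Drop chunks with identical content, keeping the first occurrence
  private deduplicate(results: RetrievalResult[]): RetrievalResult[] {
    const seen = new Set<string>();
    return results.filter(r => {
      if (seen.has(r.content)) return false;
      seen.add(r.content);
      return true;
    });
  }
}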
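MultiHopStrategy couples the hop loop to this.llm and this.retrievers[0]. A self-contained sketch with both dependencies injected as functions (RetrieveFn, NextQueryFn, and multiHop are hypothetical names) makes the control flow easy to test with stubs. One deliberate difference: it sorts by score before truncating to limit, whereas the class above keeps the first limit results after deduplication, which favours earlier hops.

```typescript
interface Hit { content: string; source: string; score: number; }

// Hypothetical injected dependencies: one retriever plus an LLM-backed query planner
type RetrieveFn = (query: string, limit: number) => Promise<Hit[]>;
type NextQueryFn = (original: string, gathered: Hit[]) => Promise<string | null>;

async function multiHop(
  query: string,
  retrieve: RetrieveFn,
  nextQuery: NextQueryFn,
  { limit = 5, maxHops = 3 } = {},
): Promise<{ hits: Hit[]; queries: string[] }> {
  const gathered: Hit[] = [];
  const queries: string[] = [];
  let current: string | null = query;
  for (let hop = 0; hop < maxHops; hop++) {
    if (current === null) break; // planner says we have enough
    queries.push(current);
    gathered.push(...(await retrieve(current, limit)));
    current = await nextQuery(query, gathered);
  }
  // Deduplicate by content (first occurrence wins), then keep the best-scoring hits
  const seen = new Set<string>();
  const unique: Hit[] = [];
  for (const h of gathered) {
    if (!seen.has(h.content)) {
      seen.add(h.content);
      unique.push(h);
    }
  }
  return { hits: unique.sort((a, b) => b.score - a.score).slice(0, limit), queries };
}
```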

Time-Weighted Retrieval

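class TimeWeightedStrategy implements RetrievalStrategy {
  private embedder: Embedder; // used below but missing from the original field list
  private vectorStore: VectorStore;
  private halfLifeDays: number = 30; // the time weight halves every 30 days

  async retrieve(query: string, limit: number = 5): Promise<RetrievalResult[]> {
    const embedding = await this.embedder.embed(query);
    // Over-fetch so recency re-ranking has candidates to promote
    const results = await this.vectorStore.search(embedding, { topK: limit * 2 });

    // Apply time decay (the 0.7/0.3 blend assumes r.score is a similarity in [0, 1])
    const now = Date.now();
    const scored = results.map(r => {
      const ageDays = (now - r.metadata.timestamp) / (1000 * 60 * 60 * 24);
      const timeWeight = Math.pow(0.5, ageDays / this.halfLifeDays);
      const combinedScore = r.score * 0.7 + timeWeight * 0.3;

      return { ...r, combinedScore };
    });

    scored.sort((a, b) => b.combinedScore - a.combinedScore);
    // Map raw store results back to RetrievalResult, exposing the blended score
    return scored.slice(0, limit).map(r => ({
      content: r.metadata.content,
      source: r.metadata.source,
      score: r.combinedScore,
    }));
  }
}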
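The decay above is w = 0.5^(ageDays / halfLifeDays), blended as 0.7 · similarity + 0.3 · w. Pulled out into pure functions (timeWeight and combinedScore are hypothetical names), the numbers are easy to check: with a 30-day half-life, a 90-day-old document keeps only 0.125 of the recency bonus, so a fresh document with similarity 0.8 (score 0.86) outranks a 90-day-old one with similarity 0.9 (score 0.6675).

```typescript
const MS_PER_DAY = 1000 * 60 * 60 * 24;

// Exponential decay: 1.0 for a brand-new document, halving every halfLifeDays days
function timeWeight(ageDays: number, halfLifeDays: number): number {
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Same 0.7/0.3 blend as TimeWeightedStrategy; assumes similarity is already in [0, 1].
// Ages are clamped at 0 so a timestamp slightly in the future (clock skew) cannot exceed weight 1.
function combinedScore(similarity: number, timestampMs: number, nowMs: number, halfLifeDays = 30): number {
  const ageDays = Math.max(0, (nowMs - timestampMs) / MS_PER_DAY);
  return similarity * 0.7 + timeWeight(ageDays, halfLifeDays) * 0.3;
}
```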

Result Validation

class ResultValidator {
  private llm: LLM;

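  async validate(results: RetrievalResult[], question: string): Promise<ValidationResult> {
    if (results.length === 0) {
      return { sufficient: false, relevance: 0, feedback: 'No relevant documents were found' };
    }

    const prompt = `Question: ${question}
Retrieved results:
${results.map((r, i) => `[${i + 1}] ${r.content}`).join('\n')}

Assess whether these results are sufficient to answer the question:
1. Relevance: how closely the results relate to the question
2. Sufficiency: whether the information is enough to answer it
3. Consistency: whether the results contradict one another

JSON: { "sufficient": boolean, "relevance": number, "feedback": string }`;

    const response = await this.llm.generate(prompt);
    try {
      return JSON.parse(response);
    } catch {
      // Treat an unparseable verdict as "insufficient" so the caller retries
      return { sufficient: false, relevance: 0, feedback: 'Validator returned malformed JSON' };
    }
  }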
}
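Every LLM call in this article feeds the raw reply straight into JSON.parse, which fails as soon as the model wraps its JSON in prose or a Markdown fence. A defensive sketch follows. parseLLMJson and isValidation are hypothetical helpers, and taking the span from the first "{" to the last "}" is a heuristic that breaks if the reply contains more than one JSON object.

```typescript
// Hypothetical helper: extract the outermost {...} span from an LLM reply and parse it.
// Tolerates surrounding prose or code fences; returns null instead of throwing.
function parseLLMJson<T>(reply: string): T | null {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end < start) return null;
  try {
    return JSON.parse(reply.slice(start, end + 1)) as T;
  } catch {
    return null;
  }
}

interface ValidationResult { sufficient: boolean; relevance: number; feedback: string; }

// Runtime shape check: a type assertion alone says nothing about what the model actually sent
function isValidation(x: unknown): x is ValidationResult {
  const v = x as Partial<ValidationResult> | null;
  return !!v && typeof v.sufficient === "boolean" && typeof v.relevance === "number"
    && typeof v.feedback === "string";
}
```

Combining the two lets validate() distinguish "the model said insufficient" from "the model's reply was unusable", which otherwise both look like a retry.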

Knowledge Synthesis

class KnowledgeSynthesizer {
  private llm: LLM;

  async synthesize(
    question: string,
    results: RetrievalResult[],
    context: ConversationContext
  ): Promise<string> {
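    // Deduplicate and rank the results
    const uniqueResults = this.deduplicate(results);
    const rankedResults = this.rankByRelevance(uniqueResults, question);

    // Build the context, labelling each source so the model can cite it
    const contextParts = rankedResults.map((r, i) =>
      `[Source ${i + 1}] ${r.source}\n${r.content}`
    );

    const prompt = `Answer the question using the reference material below. If the material is not sufficient to answer it, say so.

Question: ${question}

Reference material:
${contextParts.join('\n\n')}

Requirements:
1. Answer the question directly
2. Cite sources (e.g. [Source 1])
3. If information is missing, state what is missing`;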

    return await this.llm.generate(prompt, context.messages);
  }

  private rankByRelevance(results: RetrievalResult[], question: string): RetrievalResult[] {
    // Blend semantic similarity (60%) with keyword overlap (40%). Score each result once
    // and sort a copy, rather than recomputing inside the comparator and mutating the input.
    return results
      .map(r => ({ r, s: r.score * 0.6 + this.keywordOverlap(question, r.content) * 0.4 }))
      .sort((a, b) => b.s - a.s)
      .map(x => x.r);
  }

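  // Fraction of query words that also appear in the content. Splitting on whitespace only
  // works for space-delimited languages; Chinese or Japanese text needs a real tokenizer.
  private keywordOverlap(query: string, content: string): number {
    const queryWords = new Set(query.toLowerCase().split(/\s+/));
    const contentWords = new Set(content.toLowerCase().split(/\s+/));
    const overlap = [...queryWords].filter(w => contentWords.has(w)).length;
    return overlap / queryWords.size;
  }

  // Drop results with identical content, keeping the first occurrence
  private deduplicate(results: RetrievalResult[]): RetrievalResult[] {
    const seen = new Set<string>();
    return results.filter(r => {
      if (seen.has(r.content)) return false;
      seen.add(r.content);
      return true;
    });
  }
}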
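Requirement 2 of the prompt asks the model to cite sources, and citation accuracy is a key quality metric. A cheap automated check is to extract the cited indices and compare them with the sources actually supplied. The sketch below assumes the labels are rendered as [Source N]; checkCitations and CitationReport are hypothetical, and the regex must change if a different label is used.

```typescript
// Hypothetical post-processing report for "[Source N]" citations in a generated answer
interface CitationReport {
  cited: number[];   // distinct 1-based source indices, in order of first appearance
  invalid: number[]; // cited indices with no matching source (likely hallucinated)
  uncited: number[]; // supplied sources the answer never referenced
}

function checkCitations(answer: string, sourceCount: number): CitationReport {
  const cited: number[] = [];
  const re = /\[Source (\d+)\]/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(answer)) !== null) {
    const n = Number(m[1]);
    if (cited.indexOf(n) === -1) cited.push(n);
  }
  const invalid = cited.filter(n => n < 1 || n > sourceCount);
  const uncited: number[] = [];
  for (let i = 1; i <= sourceCount; i++) {
    if (cited.indexOf(i) === -1) uncited.push(i);
  }
  return { cited, invalid, uncited };
}
```

A non-empty invalid list is a strong signal to regenerate or drop the offending sentence; uncited sources are not necessarily an error, but many of them can indicate that retrieval returned noise.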

Document Indexing Pipeline

class DocumentIndexer {
  private embedder: Embedder;
  private vectorStore: VectorStore;
  private chunker: TextChunker;

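  async indexDocument(document: Document): Promise<void> {
    // 1. Split the document into chunks
    const chunks = await this.chunker.chunk(document.content, {
      strategy: 'semantic',  // semantic chunking
      maxChunkSize: 512,     // units (tokens or characters) depend on the chunker
      overlap: 50,
    });

    // 2. Generate embeddings (all chunks in parallel; batch or throttle large
    //    documents to stay within the embedding API's rate limits)
    const embeddings = await Promise.all(
      chunks.map(chunk => this.embedder.embed(chunk.content))
    );

    // 3. Store in the vector database
    await this.vectorStore.upsert(
      chunks.map((chunk, i) => ({
        id: `${document.id}_chunk_${i}`,
        vector: embeddings[i],
        metadata: {
          documentId: document.id,  // required so reindex() can delete by filter
          content: chunk.content,
          source: document.source,
          title: document.title,
          chunkIndex: i,
          totalChunks: chunks.length,
          timestamp: Date.now(),  // indexing time, which TimeWeightedStrategy treats as document age
        },
      }))
    );
  }

  async reindex(document: Document): Promise<void> {
    // Delete the old chunks (matched via the documentId metadata field)
    await this.vectorStore.deleteByFilter({ documentId: document.id });
    // Re-index with full metadata (source, title), not just the new content
    await this.indexDocument(document);
  }
}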
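How maxChunkSize and overlap interact is easiest to see with a plain fixed-size splitter. This is not the semantic chunking configured above: chunkFixed (hypothetical) cuts character windows that advance by maxChunkSize - overlap, so consecutive chunks share overlap characters. A token-based chunker applies the same arithmetic to tokens instead of characters.

```typescript
// Hypothetical fixed-size chunker: character windows with overlap (NOT semantic chunking)
interface Chunk { content: string; start: number; }

function chunkFixed(text: string, maxChunkSize: number, overlap: number): Chunk[] {
  if (maxChunkSize <= 0 || overlap < 0 || overlap >= maxChunkSize) {
    throw new Error("require maxChunkSize > 0 and 0 <= overlap < maxChunkSize");
  }
  const step = maxChunkSize - overlap; // how far each window advances
  const chunks: Chunk[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push({ content: text.slice(start, start + maxChunkSize), start });
    if (start + maxChunkSize >= text.length) break; // this window already reached the end
  }
  return chunks;
}
```

For example, "abcdefghij" with maxChunkSize 4 and overlap 1 yields "abcd", "defg", "ghij". Rejecting overlap >= maxChunkSize prevents a step of zero or less, which would loop forever.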

FAQ

What is the main difference between Agentic RAG and ordinary RAG?

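Ordinary RAG is a fixed "retrieve-then-generate" pipeline. Agentic RAG lets the Agent decide for itself when to retrieve, what to retrieve, and how to validate the results, and it supports multiple rounds of retrieval and strategy adjustment.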

How do you evaluate the quality of a RAG system?

Key metrics include retrieval recall (Recall@K), answer correctness, and citation accuracy, measured against a human-annotated test set.
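Recall@K for a single query is the fraction of human-labelled relevant documents that appear among the top K retrieved results; the test-set score is the mean over queries. A minimal sketch (recallAtK and meanRecallAtK are hypothetical names):

```typescript
// Recall@K for one query; duplicate IDs in the ranking are counted once
function recallAtK(retrieved: string[], relevant: Set<string>, k: number): number {
  // Queries with no labelled relevant documents are usually excluded; return 0 here by convention
  if (relevant.size === 0) return 0;
  const topK = retrieved.slice(0, k).filter((id, i, arr) => arr.indexOf(id) === i);
  let hits = 0;
  for (const id of topK) if (relevant.has(id)) hits++;
  return hits / relevant.size;
}

// Mean Recall@K over a labelled test set
function meanRecallAtK(cases: { retrieved: string[]; relevant: string[] }[], k: number): number {
  if (cases.length === 0) return 0;
  const total = cases.reduce((sum, c) => sum + recallAtK(c.retrieved, new Set(c.relevant), k), 0);
  return total / cases.length;
}
```

For example, if a query's relevant documents are {d1, d4} and the top 2 results are [d1, d2], Recall@2 is 0.5.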

How should I choose a vector database?

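A common rule of thumb: Chroma for small deployments, Pinecone or Weaviate for medium scale, and Milvus for large scale. Beyond data volume, weigh latency, cost, and operational complexity.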

Summary

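Combining RAG with Agents produces a new generation of knowledge-augmented AI systems. The Agent's reasoning turns RAG from passive retrieval into active exploration, while multi-round retrieval and result validation help improve the accuracy and completeness of the knowledge it draws on. This architecture is a strong foundation for building enterprise-grade knowledge assistants.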