AI Infra 2026-04-26

The AI Infra evolution roadmap: from LLM APIs to full Agent infrastructure

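This article traces how AI infrastructure has evolved, from the simplest API call to a full Agent runtime, orchestration layer, and governance platform, and looks at where it may go next.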

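Three years ago, calling an LLM API counted as "AI engineering". Today, a production-grade Agent system needs a tool runtime, a memory system, a workflow engine, an observability platform, a security gateway, and more. AI infrastructure (AI Infra) is evolving from "a single API call" into a complete technology stack.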

Evolution roadmap

Phase 1: API call era (2023)
┌──────────┐
│ LLM API  │ ← called directly, prompts assembled by hand
└──────────┘

Phase 2: Framework era (2024)
┌──────────┐  ┌──────────┐
│ LLM API  │  │ LangChain│ ← abstraction layer, chained calls
└──────────┘  └──────────┘

Phase 3: Agent era (2025)
┌──────────┐  ┌──────────┐  ┌──────────┐
│ LLM API  │  │ Agent    │  │ Tools    │ ← autonomous reasoning + tool calling
└──────────┘  │ Runtime  │  │ Registry │
              └──────────┘  └──────────┘

Phase 4: Platform era (2026)
┌──────────────────────────────────────────┐
│              Agent Platform              │
│  ┌────────┐ ┌────────┐ ┌────────┐        │
│  │Runtime │ │Gateway │ │Memory  │        │
│  ├────────┤ ├────────┤ ├────────┤        │
│  │Workflow│ │Tracing │ │Security│        │
│  ├────────┤ ├────────┤ ├────────┤        │
│  │Sandbox │ │CostCtrl│ │Registry│        │
│  └────────┘ └────────┘ └────────┘        │
└──────────────────────────────────────────┘

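Phase 1: API calls

The simplest form is calling the LLM API directly:

// Typical code in 2023
// Assumption: the API key is supplied through the OPENAI_API_KEY environment variable.
const API_KEY = process.env.OPENAI_API_KEY;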
async function chat(userMessage: string) {
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'gpt-4',
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: userMessage },
      ],
    }),
  });

  return response.json();
}

Problems at this stage:

No abstraction; the code is tightly coupled to one API
No error handling or retries
No cost tracking
No observability
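To make the "no error handling or retries" gap concrete, here is a minimal sketch of retrying an async call with exponential backoff. The names `withRetry`, `RetryOptions`, and the injectable `sleep` parameter are illustrative assumptions, not part of any library.

```typescript
// Hypothetical helper: retry an async operation with exponential backoff.
type RetryOptions = {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // delay before the second attempt
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
};

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  fn: (attempt: number) => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  let lastError: unknown;

  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < opts.maxAttempts) {
        // The delay doubles after each failure: base, 2*base, 4*base, ...
        await sleep(opts.baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }

  // Every attempt failed; surface the most recent error to the caller.
  throw lastError;
}
```

Note that `fetch` only rejects on network failures, so a wrapped call would need to check `response.ok` and throw on HTTP errors for the retry to apply to them. A production version would probably also retry only on transient failures (timeouts, HTTP 429 or 5xx) and add jitter to the delay.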

Phase 2: Framework abstraction

Frameworks such as LangChain provide an abstraction layer:

// Typical code in 2024
import { ChatOpenAI } from '@langchain/openai';
import { PromptTemplate } from '@langchain/core/prompts';
import { StringOutputParser } from '@langchain/core/output_parsers';

const model = new ChatOpenAI({ model: 'gpt-4' });
const prompt = PromptTemplate.fromTemplate('Please answer in Chinese: {question}');
// Compose prompt -> model -> parser into a single runnable chain.
const chain = prompt.pipe(model).pipe(new StringOutputParser());

const answer = await chain.invoke({ question: 'What is MCP?' });

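Progress:

Model abstraction; the Provider can be swapped
Chained composition; clearer code
Basic error handling

Problems:

Leaky abstractions that make debugging hard
Over-engineering
Chain-oriented abstractions are a poor fit for the Agent pattern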
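The model abstraction and chained composition listed under progress above can be sketched without any framework. This is a simplified illustration, not how LangChain is implemented; `ChatProvider`, `EchoProvider`, `renderTemplate`, and `makeChain` are hypothetical names.

```typescript
// Hypothetical provider interface: the rest of the code depends only on this.
interface ChatProvider {
  readonly name: string;
  complete(prompt: string): Promise<string>;
}

// Stand-in provider for illustration; a real one would call a vendor SDK or HTTP API.
class EchoProvider implements ChatProvider {
  constructor(readonly name: string) {}

  async complete(prompt: string): Promise<string> {
    return `  [${this.name}] ${prompt}  `;
  }
}

// Fill {placeholders} in a template; unknown placeholders are left untouched.
function renderTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) =>
    key in values ? values[key] : match,
  );
}

// Compose template -> provider -> output parsing (here, just trimming whitespace).
function makeChain(provider: ChatProvider, template: string) {
  return async (values: Record<string, string>): Promise<string> => {
    const prompt = renderTemplate(template, values);
    const raw = await provider.complete(prompt);
    return raw.trim();
  };
}
```

Swapping the Provider then means passing a different `ChatProvider` to `makeChain`; the template and parsing steps do not change.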

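Phase 3: Agent runtime

The Agent pattern takes off, and tool calling becomes the core capability:

// Typical code in 2025
// LLM, ToolRegistry, MemorySystem, and Message are application-defined types (not shown).
class AgentRuntime {
  constructor(
    private readonly llm: LLM,
    private readonly tools: ToolRegistry,
    private readonly memory: MemorySystem,
    private readonly systemPrompt: string,
    private readonly maxSteps = 20, // upper bound so the loop cannot run forever
  ) {}

  async run(task: string): Promise<string> {
    const messages: Message[] = [
      { role: 'system', content: this.systemPrompt },
      { role: 'user', content: task },
    ];

    for (let step = 0; step < this.maxSteps; step++) {
      const response = await this.llm.chat(messages, {
        tools: this.tools.getToolSchemas(),
      });

      // No tool calls means the model has produced its final answer.
      if (!response.toolCalls?.length) {
        return response.content;
      }

      // Record the assistant turn so the model can see which calls it made.
      messages.push({
        role: 'assistant',
        content: response.content,
        toolCalls: response.toolCalls,
      });

      for (const call of response.toolCalls) {
        const result = await this.tools.execute(call.name, call.args);
        // Assumes each call carries an id that links the result back to it.
        messages.push({ role: 'tool', toolCallId: call.id, content: result });
      }
    }

    throw new Error(`Agent did not finish within ${this.maxSteps} steps`);
  }
}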

Core components:

Agent Runtime: the reasoning-loop engine
Tool Registry: tool registration and discovery
Memory System: short-term and long-term memory
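Of the components above, the Tool Registry is the easiest to sketch. The version below is a minimal in-memory illustration that exposes the `getToolSchemas` and `execute` methods the runtime loop calls; `SimpleToolRegistry`, `ToolSchema`, and `ToolHandler` are assumed names, and the schema shape is simplified.

```typescript
// Hypothetical, simplified shapes for a tool description and its implementation.
type ToolSchema = {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // e.g. a JSON Schema object
};
type ToolHandler = (args: Record<string, unknown>) => Promise<string> | string;

class SimpleToolRegistry {
  private tools = new Map<string, { schema: ToolSchema; handler: ToolHandler }>();

  register(schema: ToolSchema, handler: ToolHandler): void {
    if (this.tools.has(schema.name)) {
      throw new Error(`Tool already registered: ${schema.name}`);
    }
    this.tools.set(schema.name, { schema, handler });
  }

  // Schemas are what gets sent to the model so it knows which tools exist.
  getToolSchemas(): ToolSchema[] {
    return Array.from(this.tools.values(), (tool) => tool.schema);
  }

  // Failures are returned as text rather than thrown, so the model can read
  // the error and decide how to recover within the reasoning loop.
  async execute(name: string, args: Record<string, unknown>): Promise<string> {
    const tool = this.tools.get(name);
    if (!tool) return `Error: unknown tool "${name}"`;
    try {
      return await tool.handler(args);
    } catch (err) {
      return `Error: ${err instanceof Error ? err.message : String(err)}`;
    }
  }
}
```

Returning errors as strings is one design choice among several; a stricter runtime might instead throw and let the caller decide whether to feed the failure back to the model.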

Phase 4: The full platform

By 2026, Agent platforms are becoming a standard part of the enterprise stack:

// Platform-level Agent system
// All component types below are application-defined (not shown).
class AgentPlatform {
  // Core runtime
  private runtime: AgentRuntime;
  private workflowEngine: WorkflowEngine;

  // Infrastructure
  private gateway: AIGateway;
  private sandbox: SandboxManager;
  private memory: MemorySystem;

  // Observability
  private tracer: Tracer;
  private metrics: Metrics;
  private logger: Logger;

  // Governance
  private auth: AuthManager;
  private costControl: CostController;
  private registry: AgentRegistry;

  async deployAgent(config: AgentConfig): Promise<AgentHandle> {
    // 1. Register the Agent
    const agent = await this.registry.register(config);

    // 2. Configure permissions
    await this.auth.configurePermissions(agent.id, config.permissions);

    // 3. Provision the sandbox
    const sandbox = await this.sandbox.create({
      image: config.runtimeImage,
      resources: config.resources,
    });

    // 4. Register tools
    for (const tool of config.tools) {
      await this.runtime.registerTool(agent.id, tool);
    }

    // 5. Start the Agent inside the sandbox provisioned in step 3
    //    (the options argument is an assumed part of this hypothetical runtime API)
    return this.runtime.start(agent.id, { sandboxId: sandbox.id });
  }
}

Key infrastructure

AI Gateway

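All LLM traffic is managed in one place:

class AIGateway {
  // auth, rateLimit, router, costTracker, and logger are injected
  // dependencies; their declarations are omitted for brevity.
  async proxy(request: LLMRequest): Promise<LLMResponse> {
    // Authenticate the caller
    await this.auth.authenticate(request);

    // Enforce the rate limit
    await this.rateLimit.check(request.userId);

    // Route to the best Provider for the requested model
    const provider = this.router.select(request.model);

    // Call the Provider
    const response = await provider.call(request);

    // Record cost
    await this.costTracker.record(request, response);

    // Write the request/response log
    await this.logger.log(request, response);

    return response;
  }
}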
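The rate-limiting step in the gateway can be implemented in several ways; a token bucket is one common choice. The sketch below is an assumption about how such a check could work, not the gateway's actual limiter. `TokenBucket`, its parameters, and the injectable `now` clock are hypothetical.

```typescript
// Token bucket: each request spends tokens; tokens refill at a steady rate
// up to a fixed capacity, which allows short bursts but caps sustained load.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,        // maximum burst size
    private readonly refillPerSecond: number, // sustained request rate
    private readonly now: () => number = () => Date.now(), // injectable clock (ms)
  ) {
    this.tokens = capacity;
    this.lastRefill = this.now();
  }

  // Returns true and spends `cost` tokens if the request is allowed.
  tryConsume(cost = 1): boolean {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;

    // Refill in proportion to elapsed time, never exceeding capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefill = current;

    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}
```

A per-user limiter could keep one bucket per `userId` in a `Map` and reject the request when `tryConsume()` returns false. In a multi-instance gateway the bucket state would need to live in shared storage rather than process memory.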

Agent Registry

import { randomUUID } from 'node:crypto';

class AgentRegistry {
  private agents: Map<string, AgentRegistration> = new Map();

  register(config: AgentConfig): AgentRegistration {
    const registration: AgentRegistration = {
      id: randomUUID(), // unique ID from Node's built-in crypto module
      name: config.name,
      version: config.version,
      capabilities: config.capabilities,
      tools: config.tools,
      permissions: config.permissions,
      status: 'active',
      registeredAt: Date.now(),
    };

    this.agents.set(registration.id, registration);
    return registration;
  }

  discover(requirements: CapabilityRequirements): AgentRegistration[] {
    return Array.from(this.agents.values()).filter(agent =>
      requirements.capabilities.every(cap =>
        agent.capabilities.includes(cap)
      )
    );
  }
}

Cost Controller

class CostController {
  private budgets: Map<string, Budget> = new Map();

  async checkBudget(agentId: string, estimatedCost: number): Promise<boolean> {
    const budget = this.budgets.get(agentId);
    if (!budget) return true;

    const currentSpend = await this.getCurrentSpend(agentId);
    return currentSpend + estimatedCost <= budget.limit;
  }

  async enforceQuota(agentId: string): Promise<void> {
    const budget = this.budgets.get(agentId);
    if (!budget) return;

    const currentSpend = await this.getCurrentSpend(agentId);

    // Hard limit: pause the Agent and send only the "exceeded" alert.
    if (currentSpend >= budget.limit) {
      await this.runtime.pauseAgent(agentId);
      await this.alert.notify(agentId, 'budget_exceeded');
      return;
    }

    // Soft limit: warn when spend reaches 90% of the budget.
    if (currentSpend >= budget.limit * 0.9) {
      await this.alert.notify(agentId, 'budget_warning', {
        current: currentSpend,
        limit: budget.limit,
        percentage: (currentSpend / budget.limit) * 100,
      });
    }
  }
}
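The controller above relies on an estimated cost and a running spend total but does not show where they come from. One plausible approach, sketched here under assumptions, is to price each call from its token usage and accumulate the result per Agent. `estimateCost`, `SpendLedger`, the model names, and every price in `PRICES` are placeholders, not real vendor pricing.

```typescript
type Usage = { inputTokens: number; outputTokens: number };
type Price = { inputPerMTok: number; outputPerMTok: number }; // USD per million tokens

// Placeholder prices for illustration only; NOT real vendor pricing.
const PRICES: Record<string, Price> = {
  'example-small': { inputPerMTok: 1, outputPerMTok: 4 },
  'example-large': { inputPerMTok: 10, outputPerMTok: 30 },
};

// Cost of one call: tokens times the per-million-token price, per direction.
function estimateCost(model: string, usage: Usage): number {
  const price = PRICES[model];
  if (!price) throw new Error(`No price configured for model: ${model}`);
  return (
    (usage.inputTokens * price.inputPerMTok +
      usage.outputTokens * price.outputPerMTok) /
    1_000_000
  );
}

// In-memory running total per Agent; a real system would persist this.
class SpendLedger {
  private spend = new Map<string, number>();

  record(agentId: string, cost: number): void {
    this.spend.set(agentId, this.getCurrentSpend(agentId) + cost);
  }

  getCurrentSpend(agentId: string): number {
    return this.spend.get(agentId) ?? 0;
  }
}
```

With the placeholder prices, a call to `example-large` using 2,000 input tokens and 1,000 output tokens costs (2000 × 10 + 1000 × 30) / 1,000,000 = 0.05 USD.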

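Technology selection guide

Component	Open-source options	Commercial options
LLM serving / Provider	vLLM, Ollama	OpenAI, Anthropic
Vector database	Milvus, Chroma	Pinecone, Weaviate Cloud
Workflow engine	Temporal, Prefect	-
Observability	Jaeger, Prometheus	Datadog, New Relic
Gateway	Kong, Envoy	-
Sandbox	gVisor, Firecracker	-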

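Frequently asked questions (FAQ)

When do you need an Agent platform?

As a rule of thumb, once you run more than 5 Agents, or once you need to manage permissions, cost, and observability in one place.

Build it yourself or use cloud services?

Use cloud services early on to validate quickly. As you scale, build the core components yourself and use cloud services to fill in peripheral capabilities.

What is the biggest challenge in AI Infra?

Standardization. The components currently lack common interface standards, which makes integration expensive. Protocols such as MCP (Model Context Protocol) and A2A (Agent2Agent) are working to address this.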

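Summary

AI Infra is evolving from simple API calls into a full Agent platform. Understanding this path helps you make better technology choices and avoid premature optimization or over-engineering. The key is to pick the level of abstraction that matches what your current stage actually needs.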