Four Years of Large Models

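Counting from 2022 as year one of large models, the field is now at its fourth year-end, and neither the attention it draws nor its influence shows any sign of fading. I went back to a post I wrote nearly three years ago, "Some Impressions and Thoughts on Using ChatGPT", and the points I made then read differently from where I stand today.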

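Take the claim that ChatGPT is "a better Wikipedia, but with poor freshness and no way to feed in new knowledge." That no longer holds. Combining large models with knowledge bases and knowledge graphs has given them much better freshness and extensibility. With retrieval-augmented generation (RAG), for example, a model can retrieve up-to-date information at answer time and respond more accurately and promptly. And as base models get stronger, particularly in reasoning and with the adoption of MoE, RAG is becoming less necessary in many scenarios. In just three years, the technology has moved from plain knowledge Q&A to complex reasoning and decision support.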

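The claim that "it can ghostwrite for you, but that is not creation" is still contested. Among the content creators I know, many already use large models as a creative aid for producing first drafts, finding inspiration, and polishing language. Models play a significant role wherever a lot of text has to be generated, such as writing, programming, and design. Their output still needs human review and polishing to ensure quality and accuracy, so they are seen more as a tool that amplifies creative ability than as a full replacement for human creators. Over the past two years, though, I have heard two sentiments from creators more and more clearly: fear and embrace. The fear is that models will take over part of their work, especially the repetitive, low-skill part. The embrace comes from seeing the efficiency gains and the inspiration, and from wanting to use models to become more capable and more competitive.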

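What is especially interesting is a headline from a couple of days ago: "52% vs. 48%: AI-generated content on the internet outnumbers human content for the first time." Three years ago I made a prediction on exactly this point that turned out to be quite accurate: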

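"The cost of generating passable content is extremely low, so high-quality content on the internet will be diluted very quickly. The areas most likely to pile up mountains of junk content are those that live on advertorials: shopping guides, marketing, paid posting, and the like."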

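Still, I really did not expect the tipping point to arrive this fast. It all but announces the end of certain trades: traditional SEO writers, low-quality content producers, even traditional search engines that refuse to evolve. Content creation will depend more and more on being one of a kind. Compare the search traffic and momentum of Baidu and Xiaohongshu and you will feel this keenly. Baidu has added AI-generated content to search, but users' trust in and satisfaction with that content still lag far behind. Platforms like Xiaohongshu, by contrast, put more weight on the authenticity and diversity of user-generated content (UGC) and have drawn large numbers of users to share and interact. Future content creation will therefore rely on a distinctive viewpoint, real experience, and personal expression rather than on AI-generated content alone.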

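On the "embedding camp" versus the "advent camp." The embedding camp holds that large-model AI is a tool that should be embedded into applications and services to improve user experience and productivity. The advent camp holds that it will bring disruptive change and may even alter how human society operates. Judging by current trends, the two camps map onto two classic ways of building systems: bottom-up and top-down. The curious part is that both are proceeding at the same time and keep merging at different scales. Anthropic's MCP and Claude Skills, for instance, are typical extensions of the embedding approach, while Claude Code gives developers powerful vibe coding capability and pushes the advent side forward. Technologies such as DeepSeek R1 and OCR lean more toward the advent approach, advancing model capability in leaps through innovation in model architecture and design.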

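If you work with large-model technology or follow it as an enthusiast, I suggest you carefully review how LangChain's architecture has changed over time. If I had to give one clear piece of advice today, it would be this: embed large-model capability into a system architecture you control, keep distilling what works into deterministic modules or subsystems, then widen the scale of embedding and keep iterating. Seen from this angle, it is almost the same design philosophy as a programming language working toward bootstrapping itself. After all, the center of LLM is Language.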

MCP in Practice: Landing It in Business Systems

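What you learn on paper always feels shallow; to truly know a thing you must do it yourself. After the previous post, "Some Recent LLM Trends, Part 2: MCP", it quickly became clear that MCP is not just a trend but something the industry is already racing to ship. I would rather not chase a topic just because it is hot, but after analysis and demos, several of our recent business scenarios turned out to be ones where MCP could markedly improve on old problems at low cost, on top of a standard, and in a way that can keep evolving. So this time the hot topic and the need clicked at once.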

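There are plenty of articles about MCP online, but most stop at MCP itself, and many are visibly AI-generated. That is simply where the times are heading, but those articles lack an end-to-end view of how MCP and an LLM fit together in a real deployment, and the key details you have to handle when actually shipping are missing too. Getting hands-on used to be an action and, even more, a statement of attitude; with the AI era arriving, it may be one of the few plots of land left to humans. Below, following my own practice and working up from the protocol level, I break down the details of landing MCP and some of the key questions.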

The MCP Protocol

MCP has published two specification versions so far. Both can be found in the specification section of the official site.
* 2024-11-05: https://modelcontextprotocol.io/specification/2024-11-05
* 2025-03-26: https://modelcontextprotocol.io/specification/2025-03-26

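The official documentation is of high quality overall, and I recommend that anyone landing MCP in a business system take the time to read it in full, especially the Best Practice parts, which contain many practical recommendations.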

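  • At this point in time, ecosystem support for 2024-11-05 is more complete, so for production use I suggest adopting this version first; compatibility will also be better.
  • In 2024-11-05 the transport layer supports two modes: stdio and HTTP with SSE (SSE for short below). The 2025-03-26 revision replaces the latter with Streamable HTTP.
    • stdio suits agent clients running locally.
    • HTTP with SSE suits agent clients running remotely.
  • Most B/S (browser/server) deployments should therefore use SSE as the transport, which makes an MCP gateway a natural piece of key infrastructure to build. That part is covered separately below.
  • Both transports use JSON-RPC 2.0 as the message format. I strongly recommend reading this part of the specification as well; it helps a lot later, whether you are building an MCP server or an MCP gateway.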
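To make the JSON-RPC 2.0 envelope concrete, here is a minimal Go sketch of the request and response shapes. The type and function names (`Request`, `Response`, `encodeRequest`, `decodeResponse`) are my own for illustration and are not taken from any MCP SDK; `tools/list` is a method name from the MCP specification. As a simplification, `id` is modeled as an integer, although JSON-RPC also allows string ids.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Request is a JSON-RPC 2.0 request envelope (illustrative type, not an SDK type).
type Request struct {
	JSONRPC string `json:"jsonrpc"` // always "2.0"
	ID      int    `json:"id"`      // simplified: the spec also allows string ids
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

// RPCError is the error object carried by a failed response.
type RPCError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

// Response is a JSON-RPC 2.0 response; either Result or Error is set.
type Response struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      int             `json:"id"`
	Result  json.RawMessage `json:"result,omitempty"`
	Error   *RPCError       `json:"error,omitempty"`
}

// encodeRequest serializes a request with the fixed "2.0" version tag.
func encodeRequest(id int, method string, params any) (string, error) {
	out, err := json.Marshal(Request{JSONRPC: "2.0", ID: id, Method: method, Params: params})
	return string(out), err
}

// decodeResponse parses a raw response into the envelope type.
func decodeResponse(raw string) (Response, error) {
	var resp Response
	err := json.Unmarshal([]byte(raw), &resp)
	return resp, err
}

func main() {
	req, _ := encodeRequest(1, "tools/list", nil)
	fmt.Println(req)

	resp, _ := decodeResponse(`{"jsonrpc":"2.0","id":1,"error":{"code":-32601,"message":"Method not found"}}`)
	fmt.Println(resp.Error.Code, resp.Error.Message)
}
```

The same envelope travels over both transports; only the framing around it differs (newline-delimited on stdio, POST bodies and SSE events over HTTP).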

How to Build an MCP Server or Convert an Existing Service into One

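These two questions are treated together because, with SSE as the transport, they are essentially the same question. In this mode an MCP server is an HTTP server that speaks the MCP protocol: it handles requests from the client and returns results to it. An MCP server needs to implement the following: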

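  • It mainly serves two endpoints: an SSE connection endpoint and a message endpoint.
    • SSE connection endpoint: implementations commonly use /sse. It establishes the long-lived session with the client over SSE. Authentication can be done here, and the server returns the message endpoint to the client.
    • Message endpoint: implementations commonly use /message. The client sends its messages here with a standard HTTP POST (server messages come back as events on the SSE stream). Authentication can be done here as well.
  • For other details, see [Transports](https://modelcontextprotocol.io/specification/2024-11-05/basic/transports) in the spec.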
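The two endpoints can be sketched with nothing but the Go standard library. This is a simplified illustration under my own assumptions, not the mcp-go implementation: the `server` type, the `sessionId` query parameter, the `s1`-style session ids, the 202 status, and the canned reply are all invented for the example, and authentication and real request dispatch are omitted.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// server keeps one outbound event queue per SSE session.
type server struct {
	mu       sync.Mutex
	sessions map[string]chan string
	nextID   int
}

// handleSSE opens the long-lived stream and announces the message endpoint.
func (s *server) handleSSE(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	s.mu.Lock()
	s.nextID++
	id := fmt.Sprintf("s%d", s.nextID)
	queue := make(chan string, 8)
	s.sessions[id] = queue
	s.mu.Unlock()

	w.Header().Set("Content-Type", "text/event-stream")
	fmt.Fprintf(w, "event: endpoint\ndata: /message?sessionId=%s\n\n", id)
	flusher.Flush()
	for {
		select {
		case msg := <-queue:
			fmt.Fprintf(w, "event: message\ndata: %s\n\n", msg)
			flusher.Flush()
		case <-r.Context().Done(): // client went away: drop the session
			s.mu.Lock()
			delete(s.sessions, id)
			s.mu.Unlock()
			return
		}
	}
}

// handleMessage accepts a POSTed JSON-RPC message; the reply goes out on the SSE stream.
func (s *server) handleMessage(w http.ResponseWriter, r *http.Request) {
	s.mu.Lock()
	queue, ok := s.sessions[r.URL.Query().Get("sessionId")]
	s.mu.Unlock()
	if r.Method != http.MethodPost || !ok {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	// A real server would decode r.Body and dispatch on the method; this sketch sends a canned reply.
	queue <- `{"jsonrpc":"2.0","id":1,"result":{}}`
	w.WriteHeader(http.StatusAccepted)
}

// demo plays the client side: open /sse, read the endpoint, POST a ping, read the reply.
func demo() (endpoint, reply string, err error) {
	s := &server{sessions: map[string]chan string{}}
	mux := http.NewServeMux()
	mux.HandleFunc("/sse", s.handleSSE)
	mux.HandleFunc("/message", s.handleMessage)
	ts := httptest.NewServer(mux)
	defer ts.Close()

	stream, err := http.Get(ts.URL + "/sse")
	if err != nil {
		return "", "", err
	}
	defer stream.Body.Close()
	lines := bufio.NewScanner(stream.Body)
	nextData := func() string { // returns the next "data:" payload on the stream
		for lines.Scan() {
			if strings.HasPrefix(lines.Text(), "data: ") {
				return strings.TrimPrefix(lines.Text(), "data: ")
			}
		}
		return ""
	}
	endpoint = nextData()
	resp, err := http.Post(ts.URL+endpoint, "application/json",
		strings.NewReader(`{"jsonrpc":"2.0","id":1,"method":"ping"}`))
	if err != nil {
		return endpoint, "", err
	}
	resp.Body.Close()
	return endpoint, nextData(), nil
}

func main() {
	endpoint, reply, err := demo()
	fmt.Println(endpoint, reply, err)
}
```

The point of the sketch is the split: the POST only acknowledges receipt, and the actual JSON-RPC reply is pushed through the session queue onto the long-lived stream.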

As a gopher, I took a quick look at how mcp-go implements these parts:

The SSE endpoint establishes the connection and returns the message endpoint:

func (s *SSEServer) handleSSE(w http.ResponseWriter, r *http.Request) {
    // ...
    // Send the initial endpoint event
    fmt.Fprintf(w, "event: endpoint\ndata: %s\r\n\r\n", s.GetMessageEndpointForClient(sessionID))
    flusher.Flush()

    // Main event loop - this runs in the HTTP handler goroutine
    for {
        select {
        case event := <-session.eventQueue:
            // Write the event to the response
            fmt.Fprint(w, event)
            flusher.Flush()
        case <-r.Context().Done():
            close(session.done)
            return
        case <-session.done:
            return
        }
    }
}

Note: for the SSE message format, see Ruan Yifeng's Server-Sent Events tutorial.
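On the client side, the stream has to be split back into events. The following is a minimal sketch of an SSE parser that handles only the `event:` and `data:` fields and comment lines; `Event` and `parseSSE` are names I made up for the example, and the `id:` and `retry:` fields are ignored. Following the SSE format, an event is dispatched at a blank line, and an unterminated event at the end of the stream is dropped.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Event is one parsed Server-Sent Event.
type Event struct {
	Name string // value of the "event:" field; "message" when absent
	Data string // "data:" lines joined with "\n"
}

// parseSSE splits a raw SSE stream into events. An event ends at a blank line.
func parseSSE(raw string) []Event {
	var events []Event
	var name string
	var data []string

	// flush emits the buffered event, if it has data, and resets the buffers.
	flush := func() {
		if len(data) > 0 {
			if name == "" {
				name = "message"
			}
			events = append(events, Event{Name: name, Data: strings.Join(data, "\n")})
		}
		name, data = "", nil
	}

	// bufio.Scanner's default line splitting also strips a trailing "\r",
	// so both "\n" and "\r\n" line endings are handled.
	sc := bufio.NewScanner(strings.NewReader(raw))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "":
			flush()
		case strings.HasPrefix(line, ":"):
			// comment line (often used as keep-alive): ignore
		case strings.HasPrefix(line, "event:"):
			name = strings.TrimSpace(strings.TrimPrefix(line, "event:"))
		case strings.HasPrefix(line, "data:"):
			// a single leading space after the colon is not part of the value
			data = append(data, strings.TrimPrefix(strings.TrimPrefix(line, "data:"), " "))
		}
	}
	return events // an unterminated trailing event is intentionally dropped
}

func main() {
	raw := "event: endpoint\ndata: /message?sessionId=abc\r\n\r\n" +
		"event: message\ndata: {\"id\":1}\n\n"
	for _, e := range parseSSE(raw) {
		fmt.Printf("%s -> %s\n", e.Name, e.Data)
	}
}
```

For long-lived connections a real client would read incrementally from the response body rather than from a string, but the framing rules are the same.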

Handling HTTP POST requests on the message endpoint:


    switch baseMessage.Method {
    case mcp.MethodInitialize:
        var request mcp.InitializeRequest
        var result *mcp.InitializeResult
        if unmarshalErr := json.Unmarshal(message, &request); unmarshalErr != nil {
            err = &requestError{
                id:   baseMessage.ID,
                code: mcp.INVALID_REQUEST,
                err:  &UnparsableMessageError{message: message, err: unmarshalErr, method: baseMessage.Method},
            }
        } else {
            s.hooks.beforeInitialize(ctx, baseMessage.ID, &request)
            result, err = s.handleInitialize(ctx, baseMessage.ID, request)
        }
        if err != nil {
            s.hooks.onError(ctx, baseMessage.ID, baseMessage.Method, &request, err)
            return err.ToJSONRPCError()
        }
        s.hooks.afterInitialize(ctx, baseMessage.ID, &request, result)
        return createResponse(baseMessage.ID, *result)
    case mcp.MethodPing:
        var request mcp.PingRequest
        var result *mcp.EmptyResult
        if unmarshalErr := json.Unmarshal(message, &request); unmarshalErr != nil {
            err = &requestError{
                id:   baseMessage.ID,
                code: mcp.INVALID_REQUEST,
                err:  &UnparsableMessageError{message: message, err: unmarshalErr, method: baseMessage.Method},
            }
        } else {
            s.hooks.beforePing(ctx, baseMessage.ID, &request)
            result, err = s.handlePing(ctx, baseMessage.ID, request)
        }
        if err != nil {
            s.hooks.onError(ctx, baseMessage.ID, baseMessage.Method, &request, err)
            return err.ToJSONRPCError()
        }
        s.hooks.afterPing(ctx, baseMessage.ID, &request, result)
        return createResponse(baseMessage.ID, *result)
    case mcp.MethodResourcesList:
        // ... the remaining methods are handled in the same way
    }

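As the code shows, initialization, listing of available interfaces, and tool calls are all carried out through HTTP POST requests. Note that an MCP server therefore has to support both the long-lived SSE endpoint and the short-lived HTTP POST endpoint.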
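The per-method `switch` above repeats the same decode, call, respond steps for every method. As a rough illustration of that routing idea, not of how mcp-go is actually structured, the sketch below uses a handler table instead. All type and function names are hypothetical; `ping` and `tools/list` are MCP method names, and -32700 and -32601 are the standard JSON-RPC codes for parse errors and unknown methods. Notifications (requests without an `id`) should not be answered in a real server; the sketch ignores that case.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type rpcRequest struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id"` // kept raw so string and numeric ids round-trip
	Method  string          `json:"method"`
	Params  json.RawMessage `json:"params"`
}

type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

type rpcResponse struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id"`
	Result  any             `json:"result,omitempty"`
	Error   *rpcError       `json:"error,omitempty"`
}

// handler processes the params of one method and returns a result or an error.
type handler func(params json.RawMessage) (any, *rpcError)

// handlers maps method names to handlers; the bodies here are stubs.
var handlers = map[string]handler{
	"ping": func(json.RawMessage) (any, *rpcError) { return struct{}{}, nil },
	"tools/list": func(json.RawMessage) (any, *rpcError) {
		return map[string]any{"tools": []any{}}, nil
	},
}

// dispatch decodes one POSTed message, routes it by method, and encodes the reply.
func dispatch(raw string) string {
	var req rpcRequest
	resp := rpcResponse{JSONRPC: "2.0"}
	if err := json.Unmarshal([]byte(raw), &req); err != nil {
		resp.ID = json.RawMessage("null")
		resp.Error = &rpcError{Code: -32700, Message: "Parse error"}
	} else if h, ok := handlers[req.Method]; !ok {
		resp.ID = req.ID
		resp.Error = &rpcError{Code: -32601, Message: "Method not found"}
	} else {
		resp.ID = req.ID
		resp.Result, resp.Error = h(req.Params)
	}
	out, _ := json.Marshal(resp)
	return string(out)
}

func main() {
	fmt.Println(dispatch(`{"jsonrpc":"2.0","id":1,"method":"ping"}`))
	fmt.Println(dispatch(`{"jsonrpc":"2.0","id":2,"method":"tools/list"}`))
	fmt.Println(dispatch(`{"jsonrpc":"2.0","id":3,"method":"nope"}`))
}
```

A table like this trades the explicit before/after hooks of the generated switch for brevity; which is preferable depends on how much per-method instrumentation you need.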

In a real business system, however, it is unlikely that every application service can be reworked this way. One recommended architecture:

MCP client -> MCP server gateway -> application server

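But it is an illusion to think that simply adopting this architecture will win every battle. In practice, you need to get at least the following right: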

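  • How to partition MCP servers by domain. Mapped onto tools, this is the question of how to layer and govern interfaces by domain.
  • How each tool is orchestrated onto application-layer interfaces.
  • An accurate description and parameter documentation for every tool, plus trimming of the returned data so that only necessary, unambiguous fields come back.
  • Fine-grained handling and reporting of errors in interface results. This takes some patience: the goal is for the LLM to understand the abnormal situation as well as it can and keep going in its ReAct loop, rather than having the subsequent reasoning cut off.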
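The last two points, trimming results and reporting errors in a way the model can act on, can be sketched as a small gateway helper. The shape of the result (a `content` array of text items plus an `isError` flag) follows the MCP `tools/call` result; everything else is an assumption made for illustration, including the helper names, the sample field names, and the wording of the error hint.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Content is one item of a tools/call result; only text content is modeled here.
type Content struct {
	Type string `json:"type"`
	Text string `json:"text"`
}

// ToolResult mirrors the shape of an MCP tools/call result.
type ToolResult struct {
	Content []Content `json:"content"`
	IsError bool      `json:"isError,omitempty"`
}

// errNotFound is a sample application-layer error used by the demo.
var errNotFound = errors.New("skill group not found")

// trim keeps only the whitelisted fields of an application-layer response.
func trim(payload map[string]any, keep ...string) map[string]any {
	out := make(map[string]any, len(keep))
	for _, k := range keep {
		if v, ok := payload[k]; ok {
			out[k] = v
		}
	}
	return out
}

// toToolResult converts the outcome of an application call into a tool result.
// Failures are returned as a readable result with isError set, so the model
// can see what went wrong and adjust its next step.
func toToolResult(payload map[string]any, err error, keep ...string) ToolResult {
	if err != nil {
		text := fmt.Sprintf("The call failed: %v. Check the arguments and try again, or choose another tool.", err)
		return ToolResult{Content: []Content{{Type: "text", Text: text}}, IsError: true}
	}
	body, _ := json.Marshal(trim(payload, keep...)) // map keys are emitted in sorted order
	return ToolResult{Content: []Content{{Type: "text", Text: string(body)}}}
}

func main() {
	payload := map[string]any{
		"skillGroupId":    "42",
		"value":           3.5,
		"internalTraceId": "x", // internal field the model does not need
	}
	fmt.Println(toToolResult(payload, nil, "skillGroupId", "value").Content[0].Text)
	fmt.Println(toToolResult(nil, errNotFound).Content[0].Text)
}
```

Returning the failure as tool output rather than as a protocol-level error is one way to keep the conversation going; whether a given failure should instead surface as a JSON-RPC error is a per-case decision.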

How to Use MCP in an Agent Client

The MCP client acts as a bridge: it talks to the MCP server according to the protocol and then hands the tools, resources, and prompts the server exposes to the LLM.
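For models driven purely through the prompt, "handing tools to the LLM" comes down to rendering the tool list into text. Here is a minimal sketch of that step, assuming the XML-style `<tools>` layout used by the captured prompt later in this post. `Tool` and `renderTools` are hypothetical names; the three fields correspond to the name, description, and input schema that a `tools/list` response carries. No escaping is applied, so descriptions containing angle brackets would need extra care.

```go
package main

import (
	"fmt"
	"strings"
)

// Tool holds the parts of an MCP tool definition that go into the prompt.
type Tool struct {
	Name        string
	Description string
	InputSchema string // JSON Schema as raw JSON text
}

// renderTools formats the tool list as an XML-style block for the system prompt.
func renderTools(tools []Tool) string {
	var b strings.Builder
	b.WriteString("<tools>\n")
	for _, t := range tools {
		fmt.Fprintf(&b,
			"<tool>\n  <name>%s</name>\n  <description>%s</description>\n  <arguments>\n    %s\n  </arguments>\n</tool>\n",
			t.Name, t.Description, t.InputSchema)
	}
	b.WriteString("</tools>")
	return b.String()
}

func main() {
	fmt.Println(renderTools([]Tool{
		{Name: "get_time", Description: "Get the current time", InputSchema: `{"type":"object"}`},
	}))
}
```

Clients that use a model's native function-calling API would pass the same three fields as structured tool definitions instead of rendering them into the prompt.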

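In practice, the LLM discovers and uses the MCP server through the prompt. The request below is shown as it was sent, so the tool descriptions and the user question are in Chinese; the user asks for the average productivity trend of a skill group since 18:00 today, presented as a table: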

{
"model": "deepseek-reasoner",
"messages": [
{
"role": "system",
"content": "In this environment you have access to a set of tools you can use to answer the user's question. You can use one tool per message, and will receive the result of that tool use in the user's response. You use tools step-by-step to accomplish a given task, with each tool use informed by the result of the previous tool use.\n\n## Tool Use Formatting\n\nTool use is formatted using XML-style tags. The tool name is enclosed in opening and closing tags, and each parameter is similarly enclosed within its own set of tags. Here's the structure:\n\n<tool_use>\n  <name>{tool_name}</name>\n  <arguments>{json_arguments}</arguments>\n</tool_use>\n\nThe tool name should be the exact name of the tool you are using, and the arguments should be a JSON object containing the parameters required by that tool. For example:\n<tool_use>\n  <name>python_interpreter</name>\n  <arguments>{\"code\": \"5 + 3 + 1294.678\"}</arguments>\n</tool_use>\n\nThe user will respond with the result of the tool use, which should be formatted as follows:\n\n<tool_use_result>\n  <name>{tool_name}</name>\n  <result>{result}</result>\n</tool_use_result>\n\nThe result should be a string, which can represent a file or any other output type. 
You can use this result as input for the next action.\nFor example, if the result of the tool use is an image file, you can use it in the next action like this:\n\n<tool_use>\n  <name>image_transformer</name>\n  <arguments>{\"image\": \"image_1.jpg\"}</arguments>\n</tool_use>\n\nAlways adhere to this format for the tool use to ensure proper parsing and execution.\n\n## Tool Use Examples\n\nHere are a few examples using notional tools:\n---\nUser: Generate an image of the oldest person in this document.\n\nAssistant: I can use the document_qa tool to find out who the oldest person is in the document.\n<tool_use>\n  <name>document_qa</name>\n  <arguments>{\"document\": \"document.pdf\", \"question\": \"Who is the oldest person mentioned?\"}</arguments>\n</tool_use>\n\nUser: <tool_use_result>\n  <name>document_qa</name>\n  <result>John Doe, a 55 year old lumberjack living in Newfoundland.</result>\n</tool_use_result>\n\nAssistant: I can use the image_generator tool to create a portrait of John Doe.\n<tool_use>\n  <name>image_generator</name>\n  <arguments>{\"prompt\": \"A portrait of John Doe, a 55-year-old man living in Canada.\"}</arguments>\n</tool_use>\n\nUser: <tool_use_result>\n  <name>image_generator</name>\n  <result>image.png</result>\n</tool_use_result>\n\nAssistant: the image is generated as image.png\n\n---\nUser: \"What is the result of the following operation: 5 + 3 + 1294.678?\"\n\nAssistant: I can use the python_interpreter tool to calculate the result of the operation.\n<tool_use>\n  <name>python_interpreter</name>\n  <arguments>{\"code\": \"5 + 3 + 1294.678\"}</arguments>\n</tool_use>\n\nUser: <tool_use_result>\n  <name>python_interpreter</name>\n  <result>1302.678</result>\n</tool_use_result>\n\nAssistant: The result of the operation is 1302.678.\n\n---\nUser: \"Which city has the highest population , Guangzhou or Shanghai?\"\n\nAssistant: I can use the search tool to find the population of Guangzhou.\n<tool_use>\n  <name>search</name>\n  
<arguments>{\"query\": \"Population Guangzhou\"}</arguments>\n</tool_use>\n\nUser: <tool_use_result>\n  <name>search</name>\n  <result>Guangzhou has a population of 15 million inhabitants as of 2021.</result>\n</tool_use_result>\n\nAssistant: I can use the search tool to find the population of Shanghai.\n<tool_use>\n  <name>search</name>\n  <arguments>{\"query\": \"Population Shanghai\"}</arguments>\n</tool_use>\n\nUser: <tool_use_result>\n  <name>search</name>\n  <result>26 million (2019)</result>\n</tool_use_result>\nAssistant: The population of Shanghai is 26 million, while Guangzhou has a population of 15 million. Therefore, Shanghai has the highest population.\n\n\n## Tool Use Available Tools\nAbove example were using notional tools that might not exist for you. You only have access to these tools:\n<tools>\n\n<tool>\n  <name>fj5EiLHfjeWQhDMmQTXZUr</name>\n  <description>获取当前时间</description>\n  <arguments>\n    {\"type\":\"object\"}\n  </arguments>\n</tool>\n\n\n<tool>\n  <name>fq-YVmKZQ2p6aciTm5WVtX</name>\n  <description>获取技能组指标的当前值</description>\n  <arguments>\n    {\"type\":\"object\",\"properties\":{\"qualifierCodeList\":{\"description\":\"指标code列表\",\"type\":\"array\"},\"skillGroupId\":{\"description\":\"技能组ID\",\"type\":\"string\"}},\"required\":[\"skillGroupId\",\"qualifierCodeList\"]}\n  </arguments>\n</tool>\n\n\n<tool>\n  <name>fTlRF3f4Q22VGUdO0xbIxt</name>\n  <description>获取技能组指标的历史值</description>\n  <arguments>\n    {\"type\":\"object\",\"properties\":{\"beginTime\":{\"description\":\"开始时间,格式:2024-11-28 18:00:00,必须为整分钟\",\"type\":\"string\"},\"endTime\":{\"description\":\"结束时间,格式:2024-11-28 18:00:00,必须为整分钟\",\"type\":\"string\"},\"qualifierCode\":{\"description\":\"指标code\",\"type\":\"string\"},\"skillGroupId\":{\"description\":\"技能组ID\",\"type\":\"string\"}},\"required\":[\"skillGroupId\",\"qualifierCode\",\"beginTime\",\"endTime\"]}\n  </arguments>\n</tool>\n\n\n<tool>\n  <name>fU6szvcmSkfRaFBoMbptOz</name>\n  
<description>获取技能组的指标列表</description>\n  <arguments>\n    {\"type\":\"object\"}\n  </arguments>\n</tool>\n\n</tools>\n\n## Tool Use Rules\nHere are the rules you should always follow to solve your task:\n1. Always use the right arguments for the tools. Never use variable names as the action arguments, use the value instead.\n2. Call a tool only when needed: do not call the search agent if you do not need information, try to solve the task yourself.\n3. If no tool call is needed, just answer the question directly.\n4. Never re-do a tool call that you previously did with the exact same parameters.\n5. For tool use, MARK SURE use XML tag format as shown in the examples above. Do not use any other format.\n\n# User Instructions\n\n\nNow Begin! If you solve the task correctly, you will receive a reward of $1,000,000.\n"
},
{
"role": "user",
"content": "分析一下饿了么技能组 1221000000006593548 今天18点以来的平均产能趋势,用表格展示\n"
}
],
"stream": true
}

The system prompt, formatted for readability:

In this environment you have access to a set of tools you can use to answer the user's question. You can use one tool per message, and will receive the result of that tool use in the user's response. You use tools step-by-step to accomplish a given task, with each tool use informed by the result of the previous tool use.

## Tool Use Formatting

Tool use is formatted using XML-style tags. The tool name is enclosed in opening and closing tags, and each parameter is similarly enclosed within its own set of tags. Here's the structure:

<tool_use>
<name>{tool_name}</name>
<arguments>{json_arguments}</arguments>
</tool_use>

The tool name should be the exact name of the tool you are using, and the arguments should be a JSON object containing the parameters required by that tool. For example:
<tool_use>
<name>python_interpreter</name>
<arguments>{"code": "5 + 3 + 1294.678"}</arguments>
</tool_use>

The user will respond with the result of the tool use, which should be formatted as follows:

<tool_use_result>
<name>{tool_name}</name>
<result>{result}</result>
</tool_use_result>

The result should be a string, which can represent a file or any other output type. You can use this result as input for the next action.
For example, if the result of the tool use is an image file, you can use it in the next action like this:

<tool_use>
<name>image_transformer</name>
<arguments>{"image": "image_1.jpg"}</arguments>
</tool_use>

Always adhere to this format for the tool use to ensure proper parsing and execution.

## Tool Use Examples

Here are a few examples using notional tools:
---
User: Generate an image of the oldest person in this document.

Assistant: I can use the document_qa tool to find out who the oldest person is in the document.
<tool_use>
<name>document_qa</name>
<arguments>{"document": "document.pdf", "question": "Who is the oldest person mentioned?"}</arguments>
</tool_use>

User: <tool_use_result>
<name>document_qa</name>
<result>John Doe, a 55 year old lumberjack living in Newfoundland.</result>
</tool_use_result>

Assistant: I can use the image_generator tool to create a portrait of John Doe.
<tool_use>
<name>image_generator</name>
<arguments>{"prompt": "A portrait of John Doe, a 55-year-old man living in Canada."}</arguments>
</tool_use>

User: <tool_use_result>
<name>image_generator</name>
<result>image.png</result>
</tool_use_result>

Assistant: the image is generated as image.png

---
User: "What is the result of the following operation: 5 + 3 + 1294.678?"

Assistant: I can use the python_interpreter tool to calculate the result of the operation.
<tool_use>
<name>python_interpreter</name>
<arguments>{"code": "5 + 3 + 1294.678"}</arguments>
</tool_use>

User: <tool_use_result>
<name>python_interpreter</name>
<result>1302.678</result>
</tool_use_result>

Assistant: The result of the operation is 1302.678.

---
User: "Which city has the highest population , Guangzhou or Shanghai?"

Assistant: I can use the search tool to find the population of Guangzhou.
<tool_use>
<name>search</name>
<arguments>{"query": "Population Guangzhou"}</arguments>
</tool_use>

User: <tool_use_result>
<name>search</name>
<result>Guangzhou has a population of 15 million inhabitants as of 2021.</result>
</tool_use_result>

Assistant: I can use the search tool to find the population of Shanghai.
<tool_use>
<name>search</name>
<arguments>{"query": "Population Shanghai"}</arguments>
</tool_use>

User: <tool_use_result>
<name>search</name>
<result>26 million (2019)</result>
</tool_use_result>
Assistant: The population of Shanghai is 26 million, while Guangzhou has a population of 15 million. Therefore, Shanghai has the highest population.

## Tool Use Available Tools
Above example were using notional tools that might not exist for you. You only have access to these tools:
<tools>

<tool>
<name>fj5EiLHfjeWQhDMmQTXZUr</name>
<description>获取当前时间</description>
<arguments>
{"type":"object"}
</arguments>
</tool>

<tool>
<name>fq-YVmKZQ2p6aciTm5WVtX</name>
<description>获取技能组指标的当前值</description>
<arguments>
{"type":"object","properties":{"qualifierCodeList":{"description":"指标code列表","type":"array"},"skillGroupId":{"description":"技能组ID","type":"string"}},"required":["skillGroupId","qualifierCodeList"]}
</arguments>
</tool>

<tool>
<name>fTlRF3f4Q22VGUdO0xbIxt</name>
<description>获取技能组指标的历史值</description>
<arguments>
{"type":"object","properties":{"beginTime":{"description":"开始时间,格式:2024-11-28 18:00:00,必须为整分钟","type":"string"},"endTime":{"description":"结束时间,格式:2024-11-28 18:00:00,必须为整分钟","type":"string"},"qualifierCode":{"description":"指标code","type":"string"},"skillGroupId":{"description":"技能组ID","type":"string"}},"required":["skillGroupId","qualifierCode","beginTime","endTime"]}
</arguments>
</tool>

<tool>
<name>fU6szvcmSkfRaFBoMbptOz</name>
<description>获取技能组的指标列表</description>
<arguments>
{"type":"object"}
</arguments>
</tool>

</tools>

## Tool Use Rules
Here are the rules you should always follow to solve your task:
1. Always use the right arguments for the tools. Never use variable names as the action arguments, use the value instead.
2. Call a tool only when needed: do not call the search agent if you do not need information, try to solve the task yourself.
3. If no tool call is needed, just answer the question directly.
4. Never re-do a tool call that you previously did with the exact same parameters.
5. For tool use, MARK SURE use XML tag format as shown in the examples above. Do not use any other format.

# User Instructions

Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000.
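With a prompt like this, the client has to do the other half of the loop itself: find the `<tool_use>` block in the model's reply, call the tool through MCP, and send the outcome back as a `<tool_use_result>` in the next user message. The sketch below covers only the parsing and formatting steps and follows the tag layout from the prompt above; `ToolCall`, `parseToolUse`, and `formatToolResult` are hypothetical names, and only the first tool call in a reply is extracted, in line with the prompt's one-tool-per-message rule.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// toolUseRe matches one <tool_use> block; (?s) lets "." span line breaks.
var toolUseRe = regexp.MustCompile(
	`(?s)<tool_use>\s*<name>(.*?)</name>\s*<arguments>(.*?)</arguments>\s*</tool_use>`)

// ToolCall is a tool invocation extracted from a model reply.
type ToolCall struct {
	Name string
	Args map[string]any
}

// parseToolUse returns the first tool call found in the reply, if any.
// A reply without a well-formed block is treated as a final answer.
func parseToolUse(reply string) (ToolCall, bool) {
	m := toolUseRe.FindStringSubmatch(reply)
	if m == nil {
		return ToolCall{}, false
	}
	call := ToolCall{Name: strings.TrimSpace(m[1])}
	if err := json.Unmarshal([]byte(m[2]), &call.Args); err != nil {
		return ToolCall{}, false // arguments were not valid JSON
	}
	return call, true
}

// formatToolResult wraps a tool's output in the format the prompt tells the model to expect.
func formatToolResult(name, result string) string {
	return fmt.Sprintf("<tool_use_result>\n  <name>%s</name>\n  <result>%s</result>\n</tool_use_result>", name, result)
}

func main() {
	reply := "I can use the search tool to find the population of Shanghai.\n" +
		"<tool_use>\n  <name>search</name>\n  <arguments>{\"query\": \"Population Shanghai\"}</arguments>\n</tool_use>"
	if call, ok := parseToolUse(reply); ok {
		fmt.Println(call.Name, call.Args["query"])
		fmt.Println(formatToolResult(call.Name, "26 million (2019)"))
	}
}
```

In a full client, the parsed name and arguments would become the `params` of a `tools/call` request to the MCP server, and the loop would repeat until the model replies without a tool call.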

The simplest route, of course, is to use a client application that already supports MCP, such as Cursor or Cherry Studio.

Some Tools

  • MCP server debugging tool: [MCP inspector](https://github.com/modelcontextprotocol/inspector)
  • Cherry Studio: a flexible agent client with MCP support and a very good user experience.