Testing Notes and Issues: LangChain, MCP Server, Qwen-Agent
LangChain LangGraph
Reference: the official documentation at https://langchain-ai.github.io/langgraph/tutorials/introduction/
1. Qwen-series models are used for testing
Since the goal is to have an LLM call tools under LangGraph orchestration, the first step is to check which models support function calling:
https://help.aliyun.com/zh/model-studio/qwen-function-calling
1.1
Using a model hosted by a cloud provider requires applying for an API key:
https://bailian.console.aliyun.com/?apiKey=1#/api-key
可用的模型名称 https://help.aliyun.com/zh/model-studio/models#ced16cb6cdfsy
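The key obtained above can be exported before constructing the model. A minimal sketch (assumption: ChatTongyi reads the DASHSCOPE_API_KEY environment variable used by the DashScope SDK; it can also be passed explicitly as dashscope_api_key=...):
import os

# Assumption: the key is picked up from the DASHSCOPE_API_KEY environment variable
os.environ["DASHSCOPE_API_KEY"] = "sk-..."  # replace with your own key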
from langchain_community.chat_models.tongyi import ChatTongyi
llm = ChatTongyi(model="qwen-max")
llm_with_tools = llm.bind_tools(tools)
1.2
Tool calling fails when serving the model locally with vLLM:
BadRequestError: Error code: 400 - {'object': 'error', 'message': '"auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set', 'type': 'BadRequestError', 'param': None, 'code': 400}
Adjusting the launch arguments as the error suggests did not resolve the problem:
(LLM) root@42c2e682b768:/workspace# vllm serve /workspace/Qwen2.5/Qwen/Qwen2.5-7B-Instruct --tensor-parallel-size 4 --enable-auto-tool-choice --tool-call-parser
INFO 04-17 08:40:29 [__init__.py:239] Automatically detected platform cuda.
usage: vllm serve [model_tag] [options]
vllm serve: error: argument --tool-call-parser: expected one argument
(LLM) root@42c2e682b768:/workspace# vllm serve /workspace/Qwen2.5/Qwen/Qwen2.5-7B-Instruct --tensor-parallel-size 4 --enable-auto-tool-choice
INFO 04-17 08:42:08 [__init__.py:239] Automatically detected platform cuda.
Traceback (most recent call last):
File "/opt/conda/envs/LLM/bin/vllm", line 8, in <module>
sys.exit(main())
File "/opt/conda/envs/LLM/lib/python3.10/site-packages/vllm/entrypoints/cli/main.py", line 48, in main
cmds[args.subparser].validate(args)
File "/opt/conda/envs/LLM/lib/python3.10/site-packages/vllm/entrypoints/cli/serve.py", line 30, in validate
validate_parsed_serve_args(args)
File "/opt/conda/envs/LLM/lib/python3.10/site-packages/vllm/entrypoints/openai/cli_args.py", line 284, in validate_parsed_serve_args
raise TypeError("Error: --enable-auto-tool-choice requires "
TypeError: Error: --enable-auto-tool-choice requires --tool-call-parser
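The second error indicates that --tool-call-parser expects a parser name rather than being a bare flag. According to vLLM's tool-calling documentation, Qwen2.5 models typically use the hermes parser, so an invocation along the following lines should be closer to correct (not re-verified in this test):
vllm serve /workspace/Qwen2.5/Qwen/Qwen2.5-7B-Instruct --tensor-parallel-size 4 --enable-auto-tool-choice --tool-call-parser hermes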
Serving the model locally with Ollama
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    openai_api_base="http://127.0.0.1:11434/v1",  # Ollama's default listening port
    openai_api_key="ollama",                      # any non-empty value works; Ollama does not check it
    model="llama3.3:latest",                      # replace with the model pulled in Ollama, e.g. qwen:7b, mistral, llama3
    temperature=0.7,
    verbose=True                                  # print the model's responses
)
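A quick sanity check before wiring up any tools (assumes Ollama is running locally and the model has already been pulled):
# Simple round-trip to confirm the Ollama endpoint responds
print(llm.invoke("Say hello in one sentence.").content)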
With qwen2.5 served via Ollama, tool calls do work, but the model must be explicitly restricted to a single tool; when multiple tools are bound, it emits parallel tool calls and the assertion in chatbot() fails:
Cell In[25], line 110, in chatbot(state)
106 message = llm_with_tools.invoke(state["messages"])
107 # Because we will be interrupting during tool execution,
108 # we disable parallel tool calling to avoid repeating any
109 # tool invocations when we resume.
--> 110 assert len(message.tool_calls) <= 1
111 return {"messages": [message]}
AssertionError:
ChatTongyi, by contrast, can bind multiple tools and use them together.
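One workaround consistent with the note above (a sketch, not the author's exact code): bind only a single tool to the Ollama-backed llm defined earlier, so the model cannot return parallel tool calls.
from langchain_community.tools.tavily_search import TavilySearchResults

# Hypothetical single-tool binding for the Ollama-served qwen2.5 model
tavily_search = TavilySearchResults(max_results=2)
llm_with_tools = llm.bind_tools([tavily_search])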
1.3
The full test code follows. With tools bound, each response takes noticeably longer. The local document-retrieval tool is limited by the quality of the RAG pipeline: matching is poor without dedicated tuning. The free tavily_search tool gives mostly usable results; other (partly paid) search tools are listed at:
https://python.langchain.com/docs/integrations/tools/
from typing import Annotated
from typing import Literal
from typing_extensions import TypedDict
from langchain_community.chat_models.tongyi import ChatTongyi
from langchain_community.document_loaders import UnstructuredURLLoader
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.messages import ToolMessage, HumanMessage
from langchain_core.tools import InjectedToolCallId, tool
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END, MessagesState
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langgraph.types import Command, interrupt
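# NOTE: retrieve_context and human_assistance are referenced below, but their
# definitions were not included in the original post. The following are minimal
# sketches (assumptions), not the original implementations.

@tool
def retrieve_context(query: str) -> str:
    """Retrieve relevant passages from locally loaded documents (simple RAG)."""
    # Placeholder: the original version presumably loaded pages with
    # UnstructuredURLLoader, split them with RecursiveCharacterTextSplitter,
    # and searched the chunks; that retrieval logic is omitted here.
    return "No matching context found."

@tool
def human_assistance(query: str) -> str:
    """Request assistance from a human (LangGraph human-in-the-loop pattern)."""
    human_response = interrupt({"query": query})
    return human_response["data"]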
tavily_search = TavilySearchResults(max_results=2)
tools = [tavily_search, retrieve_context, human_assistance]
tool_node = ToolNode(tools=tools)

llm = ChatTongyi(model="qwen-max")
# Alternative: a local vLLM + Qwen model
# llm = ChatOpenAI(
#     openai_api_base="http://localhost:8000/v1",
#     openai_api_key="EMPTY",   # vLLM does not check the key by default
#     model_name="Qwen2.5-7B-Instruct",
#     temperature=0.7,
# )
llm_with_tools = llm.bind_tools(tools)
class State(TypedDict):
    messages: Annotated[list, add_messages]
    name: str
    birthday: str

def chatbot(state: State):
    message = llm_with_tools.invoke(state["messages"])
    # Because we will be interrupting during tool execution,
    # we disable parallel tool calling to avoid repeating any
    # tool invocations when we resume.
    assert len(message.tool_calls) <= 1
    return {"messages": [message]}

# Define the workflow with LangGraph
graph_builder = StateGraph(State)
graph_builder.add_node("chatbot", chatbot)
graph_builder.add_node("tools", tool_node)
graph_builder.add_conditional_edges(
    "chatbot",
    tools_condition,
)
graph_builder.add_edge("tools", "chatbot")
graph_builder.add_edge(START, "chatbot")

memory = MemorySaver()
graph = graph_builder.compile(checkpointer=memory)
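To exercise the compiled graph, an invocation along the following lines can be used (a sketch added for completeness; the question and thread_id are arbitrary, but some thread_id is required whenever a checkpointer is configured):
config = {"configurable": {"thread_id": "demo-1"}}
for event in graph.stream(
    {"messages": [HumanMessage(content="What's the weather in Hangzhou today?")]},
    config,
    stream_mode="values",
):
    event["messages"][-1].pretty_print()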
MCP Server
Following the official documentation (https://modelcontextprotocol.io/quickstart/server), write a server.py and a client.py.
The official example calls the Claude API, so it is mocked here:
class MockContent:
    def __init__(self, type_, text=None, name=None, input_=None):
        self.type = type_
        self.text = text
        self.name = name
        self.input = input_

class MockResponse:
    def __init__(self):
        self.content = [
            MockContent(type_='text', text="This is the assistant's initial answer."),
            MockContent(type_='tool_use', name='get_forecast', input_={'latitude': 40.7128, 'longitude': -74.0060}),
            MockContent(type_='tool_use', name='get_alerts', input_={'state': 'NY'}),
            MockContent(type_='text', text='Monday:\nTemperature: 25°C\nWind: 10 km/h NE\nForecast: sunny\n---\nTuesday:\nTemperature: 22°C\nWind: 8 km/h SW\nForecast: cloudy'),
        ]

# final_text, assistant_message_content and self.session are defined in the
# surrounding client code (the query-processing loop of the quickstart client).
response = MockResponse()
for content in response.content:
    if content.type == 'text':
        final_text.append(content.text)
        assistant_message_content.append(content)
    elif content.type == 'tool_use':
        tool_name = content.name
        tool_args = content.input
        # Execute tool call
        result = await self.session.call_tool(tool_name, tool_args)
The above only exercises server.py and client.py running and interacting locally. For remote access, note the official changelog:
https://modelcontextprotocol.io/specification/2025-03-26/changelog
Replaced the previous HTTP+SSE transport with a more flexible Streamable HTTP transport (PR #206)
So MCP itself is still maturing and may undergo major changes; for now, wait and see.
Compared with LangChain, MCP does not by itself add much new functionality, and LangChain already provides an MCP adapter, so the two are complementary rather than substitutes: https://github.com/langchain-ai/langchain-mcp-adapters
langchain-mcp-adapters
Implement server.py following the README on GitHub (a rough reproduction is included below), and modify client.py to use ChatTongyi:
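For reference, the server used here follows the math-server example in the langchain-mcp-adapters README, roughly:
# server.py (based on the README's math server example)
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Math")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers"""
    return a + b

@mcp.tool()
def multiply(a: int, b: int) -> int:
    """Multiply two numbers"""
    return a * b

if __name__ == "__main__":
    mcp.run(transport="stdio")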
cat client.py
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from langchain_mcp_adapters.tools import load_mcp_tools
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI
from langchain_community.chat_models.tongyi import ChatTongyi

async def main():
    # model = ChatOpenAI(model="gpt-4o")
    model = ChatTongyi(model="qwen2.5-72b-instruct")  # alternatives: qwen-max, qwen2.5-72b-instruct

    server_params = StdioServerParameters(
        command="python",
        # replace with the full path to the server script (server.py here)
        args=["server.py"],
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await load_mcp_tools(session)
            agent = create_react_agent(model, tools)
            agent_response = await agent.ainvoke({
                "messages": "what's (3 + 5) x 12?"
            })
            print(agent_response)

if __name__ == "__main__":
    asyncio.run(main())
The model automatically calls the math tools to complete the calculation. This can be verified by changing the messages in client.py: for an ordinary question, the math tools are not called.
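For example, swapping in an ordinary question (hypothetical prompt) should yield a response whose messages contain no tool calls:
agent_response = await agent.ainvoke({
    "messages": "Introduce yourself in one sentence."
})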
Qwen-Agent
Reference: the official documentation:
https://github.com/QwenLM/Qwen-Agent/blob/main/README_CN.md
It states:
Qwen-Agent supports connecting to the Qwen model service provided by Alibaba Cloud DashScope, as well as open-source Qwen models served through an OpenAI-compatible API.
So this framework is only suited to Qwen-series models.
Summary
The current goal is to build an agent that can connect to a range of open-source LLMs and call tools. Since MCP compatibility is also desired, langchain-mcp-adapters is the choice here.