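DeepSeek API: Streaming Output and Multi-Turn Conversation Examples

The DeepSeek API offers two models: deepseek-chat and deepseek-reasoner. deepseek-chat has no deep-thinking (reasoning) step: it is lighter and cheaper, and its answers are shorter and less thorough. deepseek-reasoner thinks before it answers: its output is longer and more comprehensive, but each question takes longer to answer and costs more.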

夏天的清晨 · Published 2025-05-31 12:43:22

1. Model information

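| | Model (1) | deepseek-chat | deepseek-reasoner |
|---|---|---|---|
| | Context length | 64K | 64K |
| | Output length (2) | Default 4K, max 8K | Default 32K, max 64K |
| Features | JSON Output | Supported | Supported |
| Features | Function Calling | Supported | Supported |
| Features | Chat Prefix Completion (Beta) | Supported | Supported |
| Features | FIM Completion (Beta) | Supported | Not supported |
| Standard price (08:30-00:30 Beijing time) | Input, per 1M tokens, cache hit (3) | 0.5 CNY | 1 CNY |
| Standard price (08:30-00:30 Beijing time) | Input, per 1M tokens, cache miss | 2 CNY | 4 CNY |
| Standard price (08:30-00:30 Beijing time) | Output, per 1M tokens (4) | 8 CNY | 16 CNY |
| Off-peak price (5) (00:30-08:30 Beijing time) | Input, per 1M tokens, cache hit | 0.25 CNY (50% off) | 0.25 CNY (75% off) |
| Off-peak price (5) (00:30-08:30 Beijing time) | Input, per 1M tokens, cache miss | 1 CNY (50% off) | 1 CNY (75% off) |
| Off-peak price (5) (00:30-08:30 Beijing time) | Output, per 1M tokens | 4 CNY (50% off) | 4 CNY (75% off) |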

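(1) The deepseek-chat model corresponds to DeepSeek-V3-0324; the deepseek-reasoner model corresponds to DeepSeek-R1-0528.
(2) For deepseek-reasoner, the max_tokens parameter caps the length of a single response, chain-of-thought output included.
(3) For details on context caching, see the DeepSeek disk cache documentation.
(4) The output token count of deepseek-reasoner covers every token of both the chain of thought and the final answer; both are billed at the same rate.
(5) The DeepSeek API now uses off-peak discount pricing. The discount window is 00:30-08:30 Beijing time every day, and the standard price applies at all other times. A request is billed according to the time at which it completes.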
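
To make the pricing rules concrete, here is a minimal sketch that estimates the cost of one request from the prices in the table above. The names `PRICES`, `is_off_peak`, and `estimate_cost` are hypothetical helpers, not part of any DeepSeek SDK, and treating 00:30 as inside the discount window and 08:30 as outside it is an assumption, since the table does not say how the boundaries are handled.

```python
from datetime import datetime, time, timedelta, timezone

# CNY per million tokens, copied from the price table above.
PRICES = {
    "deepseek-chat": {
        "standard": {"hit": 0.5, "miss": 2.0, "output": 8.0},
        "off_peak": {"hit": 0.25, "miss": 1.0, "output": 4.0},
    },
    "deepseek-reasoner": {
        "standard": {"hit": 1.0, "miss": 4.0, "output": 16.0},
        "off_peak": {"hit": 0.25, "miss": 1.0, "output": 4.0},
    },
}

# Beijing time is UTC+8 all year round (no daylight saving time).
BEIJING = timezone(timedelta(hours=8))


def is_off_peak(finished_at: datetime) -> bool:
    """Return True if a timezone-aware completion time falls in 00:30-08:30 Beijing time.

    Assumption: the window is closed at 00:30 and open at 08:30.
    """
    local = finished_at.astimezone(BEIJING).time()
    return time(0, 30) <= local < time(8, 30)


def estimate_cost(model, hit_tokens, miss_tokens, output_tokens, finished_at):
    """Estimate the cost in CNY of one request, billed by its completion time.

    For deepseek-reasoner, output_tokens should include the chain-of-thought tokens.
    """
    tier = "off_peak" if is_off_peak(finished_at) else "standard"
    price = PRICES[model][tier]
    total = (
        hit_tokens * price["hit"]
        + miss_tokens * price["miss"]
        + output_tokens * price["output"]
    )
    return total / 1_000_000


noon = datetime(2025, 6, 1, 12, 0, tzinfo=BEIJING)
night = datetime(2025, 6, 1, 2, 0, tzinfo=BEIJING)
print(estimate_cost("deepseek-reasoner", 0, 1_000_000, 1_000_000, noon))
print(estimate_cost("deepseek-reasoner", 0, 1_000_000, 1_000_000, night))
```

With one million uncached input tokens and one million output tokens, the same deepseek-reasoner request is estimated at 20 CNY at noon and 5 CNY at 02:00, which matches the 75% discount listed in the table.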

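Below are two examples that stream their output and support multi-turn conversation. They use the openai and colorama packages; fill in your API key and they are ready to run.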

2. Non-reasoning version (deepseek-chat)

from openai import OpenAI
from colorama import init, Fore

init(autoreset=True)  # reset terminal colors after each print

client = OpenAI(api_key="your-api-key", base_url="https://api.deepseek.com")
messages = [{"role": "system", "content": "Answer any question the user asks."}]

while True:
    print('-------------------------------------')
    sendMS = input(Fore.YELLOW + '\n\nEnter your message (q to quit): ')
    if sendMS == 'q':  # check for exit before touching the history
        break
    # Append the user turn to the history; resending it each time gives multi-turn conversation
    messages.append({"role": "user", "content": sendMS})

    stream_response = client.chat.completions.create(
        model="deepseek-chat",
        messages=messages,
        stream=True  # enable streaming
    )

    # Print the streamed response as it arrives
    print(Fore.GREEN + "\n\n--- Answer ---\n")
    full_res = ''
    for chunk in stream_response:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        text_chunk = chunk.choices[0].delta.content
        if text_chunk is not None:
            full_res += text_chunk
            print(text_chunk, end="", flush=True)
    # Store the full answer so the next request includes it
    messages.append({"role": "assistant", "content": full_res})
    print('\n-------------------------------------\n')
print('Conversation ended.')
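
Because the loop above appends every turn to `messages`, a long conversation will eventually approach the 64K context length listed in the table. The following is a minimal sketch of one way to cap the history before each request. `trim_history` is a hypothetical helper, and counting characters instead of tokens is a crude stand-in chosen only to keep the example dependency-free.

```python
def trim_history(messages, max_chars=20000):
    """Keep all system messages plus the most recent turns within a character budget.

    Assumption: character count is used as a rough proxy for token count.
    """
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    kept = []
    used = 0
    # Walk backwards so the newest turns are kept first.
    for message in reversed(turns):
        used += len(message["content"])
        # Always keep the newest message, even if it alone exceeds the budget.
        if kept and used > max_chars:
            break
        kept.append(message)
    kept.reverse()

    # Drop a leading assistant reply so the trimmed history starts with a user turn.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return system + kept


history = [
    {"role": "system", "content": "Answer any question the user asks."},
    {"role": "user", "content": "a" * 10},
    {"role": "assistant", "content": "b" * 10},
    {"role": "user", "content": "c" * 10},
]
trimmed = trim_history(history, max_chars=25)
print([m["role"] for m in trimmed])
```

In the chat loop, this could be applied as `messages=trim_history(messages)` in the `create` call, leaving the full history in `messages` untouched.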

3. Reasoning version (deepseek-reasoner)

from openai import OpenAI
from colorama import init, Fore

init(autoreset=True)  # reset terminal colors after each print

client = OpenAI(api_key="your-api-key", base_url="https://api.deepseek.com")
messages = [{"role": "system", "content": "Answer any question the user asks."}]

while True:
    print('-------------------------------------')
    flag = False  # becomes True once the final answer starts streaming
    sendMS = input(Fore.YELLOW + '\n\nEnter your message (q to quit): ')
    if sendMS == 'q':  # check for exit before touching the history
        break
    # Append the user turn to the history; resending it each time gives multi-turn conversation
    messages.append({"role": "user", "content": sendMS})

    stream_response = client.chat.completions.create(
        model="deepseek-reasoner",
        messages=messages,
        stream=True  # enable streaming (useful for long outputs)
    )

    # Print the streamed response: chain of thought first, then the final answer
    full_res = ''
    print(Fore.CYAN + "\n\nReasoning:\n")
    for chunk in stream_response:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta
        # reasoning_content carries the chain of thought; getattr guards chunks that lack the field
        reasoning_chunk = getattr(delta, "reasoning_content", None)
        if reasoning_chunk is not None:
            print(reasoning_chunk, end="", flush=True)
        if delta.content is not None:
            if not flag:
                print(Fore.GREEN + "\n\n--- Answer ---\n")
                flag = True
            full_res += delta.content
            print(delta.content, end="", flush=True)
    # Only the final answer goes back into the history; the chain of thought is not re-sent
    messages.append({"role": "assistant", "content": full_res})
    print('\n-------------------------------------\n\n')
print('Conversation ended.')
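
The loop above mixes parsing and printing, which makes it hard to check without a live API key. As a rough sketch, the same chunk handling can be pulled into a function that separates the chain of thought from the final answer. `collect_stream` and `fake_chunk` are hypothetical names, and the fake chunks only mimic the fields the loop reads (`choices[0].delta.reasoning_content` and `choices[0].delta.content`); they are not real SDK objects.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Split a stream of chat chunks into (reasoning_text, answer_text)."""
    reasoning_parts = []
    answer_parts = []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta
        reasoning = getattr(delta, "reasoning_content", None)
        content = getattr(delta, "content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        if content:
            answer_parts.append(content)
    return "".join(reasoning_parts), "".join(answer_parts)


def fake_chunk(reasoning=None, content=None):
    """Build a stand-in chunk for offline testing (assumption: not a real SDK type)."""
    delta = SimpleNamespace(reasoning_content=reasoning, content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


chunks = [
    fake_chunk(reasoning="Let me "),
    fake_chunk(reasoning="think."),
    SimpleNamespace(choices=[]),
    fake_chunk(content="Hello"),
    fake_chunk(content="!"),
]
reasoning, answer = collect_stream(chunks)
print(reasoning)
print(answer)
```

Only `answer` would be appended to `messages` as the assistant turn, mirroring what the loop above does with `full_res`.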
