Calling a local DeepSeek model through the REST API with SSE streaming output

Calling DeepSeek locally

2301_82105191

2301_82105191 · Published 2025-02-20 18:06:21

Let's start with the demo video. It was recorded with the built-in Windows screen recorder, so it may be a bit blurry.

Demo

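The model is deployed locally with Ollama. Go to the official site, download the Ollama build for your operating system, pick a model, copy its command, and run it in cmd to download the model. There are plenty of tutorials online, and the process is simple.

Once the download finishes, run the model from cmd with ollama run followed by the model name. If you can chat with it directly, the setup works.

Another way to use it is through a web page. This part relies on Docker: you install Open WebUI and use it from the browser. Tutorials are easy to find, and it is basically a Docker install. For details see https://github.com/open-webui/open-webui

Once Docker is installed, open cmd, paste the command below, and wait for the download to finish.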

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

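Once it succeeds, the open-webui entry shows up in Docker Desktop. Click the item shown in the figure below.

After the web UI opens, there is a sign-in step. Sign in with a GitHub or Google account, or register one if you don't have one.

OK, now for calling the model through the API.

Things to note:

Be sure to run ollama run deepseek-r1:1.5b locally first (see the screenshot at the start of the article, and substitute the version you downloaded). Otherwise the API call fails with a connection-refused error from the Ollama endpoint.
Set streaming headers on the response, because the endpoints Ollama exposes stream their output by default. For details see https://github.com/ollama/ollama/blob/main/docs/api.md
The example backend is a quick Node.js + Express setup. See the code below.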
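
To make the second note more concrete: with streaming on, /api/generate returns newline-delimited JSON, one object per line carrying a response fragment and a done flag. A network chunk is not guaranteed to line up with those lines, so a relay should buffer until it has a complete line before wrapping it as an SSE event. The following is a minimal sketch of that idea. The helper names createLineSplitter and toSseEvent are made up for illustration, and the sample chunks are hand-written, not real model output.

```javascript
// Hypothetical helper: turns arbitrary stream chunks into complete NDJSON lines.
function createLineSplitter() {
  let buffer = '';
  return function push(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // the last piece may be an incomplete line
    return lines.filter((line) => line.trim() !== '');
  };
}

// Wrap one JSON line as a single SSE "message" event.
function toSseEvent(line) {
  return `data: ${line}\n\n`;
}

// Hand-written chunks shaped like Ollama's streaming output,
// deliberately split in the middle of the second JSON object.
const push = createLineSplitter();
const events = [
  ...push('{"response":"Hel","done":false}\n{"respo'),
  ...push('nse":"lo","done":true}\n'),
].map(toSseEvent);

console.log(events.length); // one SSE event per complete JSON line
```

In the Express handler, push would be called from the data callback of the upstream body, and each returned line written with res.write(toSseEvent(line)).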

Ollama API description

// These are the headers that get set

Here's the code:

const express = require('express');
const fetch = require('node-fetch');
const app = express();
const port = 3002;
const baseUrl = 'http://127.0.0.1:11434/api/generate';
app.get('/predict', async (req, res) => {
  const text = req.query.text;
  // Set the SSE response headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  try {
    const response = await fetch(baseUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: 'deepseek-r1:1.5b', prompt: text })
    });

    if (!response.body) {
      throw new Error('Response body is null');
    }

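    // Ollama streams newline-delimited JSON. A chunk may hold a partial line
    // or several lines, so buffer and emit one SSE event per complete line.
    let buffer = '';
    response.body.on('data', (chunk) => {
      buffer += chunk.toString();
      const lines = buffer.split('\n');
      buffer = lines.pop(); // keep the trailing partial line for the next chunk
      for (const line of lines) {
        if (line.trim()) res.write(`data: ${line}\n\n`);
      }
    });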

    response.body.on('end', () => {
      res.write('event: done\ndata: {}\n\n');
      res.end();
    });

    response.body.on('error', (error) => {
      console.error('Error:', error);
      res.end();
    });

  } catch (error) {
    console.error('Request failed:', error);
    res.status(500).end();
  }
});
app.listen(port, () => {
  console.log(`Server running on port ${port}`);
});

Create app.js and paste the code in.

Run npm i to install the express and node-fetch dependencies:

"express": "^4.21.2",

"node-fetch": "^2.7.0"

Run it with node app.js.

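Brief notes:

Frontend notes

The example uses the Web API new EventSource to receive the stream. You can also read the stream with fetch or axios (response.body.getReader()).
Install the markdown-it package to render Markdown.
Quickly scaffold a Vue 3 project, via CDN or Vite. Either is fine as long as it can make the request.
Handle cross-origin requests: the local ports differ, so use a proxy or another approach.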
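
As an illustration of the fetch alternative mentioned in the first note, here is a rough sketch that reads the same /api/predict stream with response.body.getReader() and parses the SSE framing by hand. The names parseSseBuffer and streamPrediction are hypothetical, the /api prefix assumes the same dev proxy the EventSource example uses, and the parser only handles the event and data fields that the backend above emits.

```javascript
// Hypothetical helper: extracts complete SSE events from a text buffer.
// Returns the parsed events plus the unparsed remainder.
function parseSseBuffer(buffer) {
  const blocks = buffer.split('\n\n');
  const rest = blocks.pop(); // trailing partial event, if any
  const events = [];
  for (const block of blocks) {
    let event = 'message';
    const data = [];
    for (const line of block.split('\n')) {
      if (line.startsWith('event:')) event = line.slice(6).trim();
      else if (line.startsWith('data:')) data.push(line.slice(5).trim());
    }
    if (data.length > 0) events.push({ event, data: data.join('\n') });
  }
  return { events, rest };
}

// Reads /api/predict with fetch and calls onToken for every streamed fragment.
async function streamPrediction(text, onToken) {
  const response = await fetch(`/api/predict?text=${encodeURIComponent(text)}`);
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const parsed = parseSseBuffer(buffer);
    buffer = parsed.rest;
    for (const { event, data } of parsed.events) {
      if (event === 'done') return; // custom event sent by the backend
      onToken(JSON.parse(data).response);
    }
  }
}
```

A call such as streamPrediction('hello', (token) => console.log(token)) would log each fragment as it arrives. Unlike EventSource, this approach does not reconnect automatically, which is usually what you want for a one-shot generation.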

Here's the code:

<script setup>
import { ref, onUnmounted } from 'vue'
import MarkdownIt from 'markdown-it'
const inputText = ref('')
const markdownits = new MarkdownIt()
const eventSource = ref(null)
const messageInfo = ref([])
const isThinking = ref(true)
const getPrediction = async () => {
  messageInfo.value.push({
    type: 'user',
    content: inputText.value,
    id: Math.random()
  });
  eventSource.value = new EventSource(`/api/predict?text=${encodeURIComponent(inputText.value)}`);
  eventSource.value.onmessage = (event) => {
    const message = JSON.parse(event.data)
    if (message.response.includes('<think>')) {
      isThinking.value = true
      messageInfo.value[messageInfo.value.length - 1].think = ''
      message.response = ''
    }
    if (message.response.includes('</think>')) {
      isThinking.value = false
      message.response = ''
    }
    if (isThinking.value) {
      messageInfo.value[messageInfo.value.length - 1].think += message.response;
    } else {
      messageInfo.value[messageInfo.value.length - 1].content += message.response;
    }

  };
  eventSource.value.onopen = (e) => {
    inputText.value = ''
    messageInfo.value.push({
      type: 'ai',
      content: '',
      id: Math.random(),
      think: 'AI is thinking...'
    });

  }
  eventSource.value.addEventListener('done', (event) => {
    console.log('done:', event);
    eventSource.value.close();
  });
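  eventSource.value.onerror = (error) => {
    console.error('error:', error);
    // Close explicitly; otherwise EventSource reconnects automatically and resends the prompt
    eventSource.value.close();
  };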

}
// Lifecycle hooks must be registered synchronously during setup, not inside an async handler
onUnmounted(() => {
  if (eventSource.value) {
    eventSource.value.close();
  }
});
</script>

<template>
  <div class="containBox">
    <div style="display: flex;flex-direction: column;" v-for="item in messageInfo" :key="item.id">
      <div class="user" v-if="item.type === 'user'">
        {{ item.content }}
      </div>
      <div class="aiInfo" v-else>
        <div class="think" v-html="markdownits.render(item.think)">

        </div>
        <div class="content" v-html="markdownits.render(item.content)">

        </div>
      </div>
    </div>
  </div>
  <input @keyup.enter="getPrediction" v-model="inputText" />
  <button @click="getPrediction">predict</button>
</template>

<style scoped>
.containBox {
  display: flex;
  flex-direction: column;

  .user {
    align-self: flex-start;
  }

  .aiInfo {
    align-self: flex-end;
    width: 90%;
    text-align: left !important;

    .think {
      background-color: aquamarine;
    }
  }
}
</style>
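
The think/content routing inside onmessage can be easier to follow as a small pure function. The sketch below restates that logic outside Vue. The name appendToken and the sample tokens are made up for illustration, and, like the component above, it assumes the <think> and </think> tags each arrive as their own streamed fragment.

```javascript
// Hypothetical reducer: routes each streamed fragment into `think` or `content`.
function appendToken(state, token) {
  // An opening tag starts the reasoning section and clears the placeholder text.
  if (token.includes('<think>')) return { ...state, thinking: true, think: '' };
  // A closing tag switches to the final answer.
  if (token.includes('</think>')) return { ...state, thinking: false };
  return state.thinking
    ? { ...state, think: state.think + token }
    : { ...state, content: state.content + token };
}

// Hand-written fragments, not real model output.
const tokens = ['<think>', 'The user ', 'says hi.', '</think>', 'Hello', '!'];
const initial = { thinking: true, think: 'AI is thinking...', content: '' };
const result = tokens.reduce(appendToken, initial);

console.log(result.think);   // reasoning text only
console.log(result.content); // answer text only
```

Keeping this step free of Vue refs makes it straightforward to check in isolation before wiring it back into the message list.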

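Proxy sketch: if the project is not a Vite project and has no vite.config.js, configure the proxy directly on the request client, for example through the proxy option in axios.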
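
For a Vite project, a dev-server proxy along the following lines would match the code in this article. This is a sketch under assumptions rather than the author's exact file: it assumes the Vue plugin @vitejs/plugin-vue is installed, that the Express backend runs on port 3002 as in app.js, and that the /api prefix used by the frontend should be stripped before the request reaches the /predict route.

```javascript
// vite.config.js (sketch)
import { defineConfig } from 'vite'
import vue from '@vitejs/plugin-vue'

export default defineConfig({
  plugins: [vue()],
  server: {
    proxy: {
      // Forward /api/* from the Vite dev server to the Express backend
      '/api': {
        target: 'http://127.0.0.1:3002', // port taken from app.js
        changeOrigin: true,
        // Strip the /api prefix so /api/predict reaches the /predict route
        rewrite: (path) => path.replace(/^\/api/, ''),
      },
    },
  },
})
```

With this in place the browser only talks to the Vite dev server, so the differing ports no longer trigger a cross-origin error during development.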