Ollama [Deployment 02] Linux Local Deployment and SpringBoot 2.X Integration with Ollama (ollama-linux-amd64.tgz, latest version 0.6.2, shared via netdisk)
Installation resources (shared):
Baidu netdisk link: https://pan.baidu.com/s/17qK0Nx73bFOsicLgLmA8-A?pwd=tc61 extraction code: tc61
Included files:
- Windows OllamaSetup.exe (version 0.5.7)
- Linux ollama-linux-amd64-0.3.9.tgz
- Linux ollama-linux-amd64-0.5.11.tgz
- Linux ollama-linux-amd64-0.6.2.tgz
- Chatbox-1.9.8-Setup.exe (desktop client)
- AnythingLLMDesktop-v1.7.4.exe (desktop client)
1. Local Deployment
1.1 Software Installation
- Ollama official site: https://ollama.com
1.1.1 Script Install
curl -fsSL https://ollama.com/install.sh | sh
The script installs to /usr/local by default; if that mount point does not have much space, a manual install is recommended.
1.1.2 Manual Install
Click Manual install instructions to open the manual installation docs, which cover:
- Standard Linux install
- AMD GPU install
- ARM64 install
- Adding Ollama as a startup service (recommended)
- Install CUDA drivers (optional)
- Install AMD ROCm drivers (optional)
- Customizing
- Updating
- Installing specific versions
- Viewing logs
- Uninstall
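The docs also describe pinning a specific version when using the install script, which is done by setting OLLAMA_VERSION; a sketch using the version from the title (adjust as needed):
# install a specific Ollama version via the script
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.6.2 sh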
This post follows the standard install:
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
The download is very slow: the file is 1604 MB and the estimated time was 54 hours, which is hard to bear. The netdisk share above also contains the file (version 0.3.9).
To download:
# download in the background
nohup curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz > download.log 2>&1 &
# resume an interrupted download with wget
nohup wget -c https://ollama.com/download/ollama-linux-amd64.tgz > download.log 2>&1 &
# the latest release URL that wget ends up following
https://github.com/ollama/ollama/releases/latest/download/ollama-linux-amd64.tgz
After a failed download you can resume it; it is still slow, but the data already downloaded is not wasted.
Extract and start:
# extract
sudo tar -C /usr -xzf ollama-linux-amd64.tgz
# start
ollama serve
# verify
ollama -v
# start in the background
nohup ./ollama serve >> serve.log 2>&1 &
# CPU-only startup log
msg="inference compute" id=0 library=cpu variant=avx2 compute="" driver=0.0 name="" total="156.1 GiB" available="105.4 GiB"
# GPU startup log
msg="inference compute" id=GPU-5892a465-7090-90e9-d072-f04f3e56380a library=cuda variant=v11 compute=7.5 driver=0.0 name="" total="14.8 GiB" available="11.9 GiB"
# nvidia-smi output while serving a request (deepseek-r1:1.5b)
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04 Driver Version: 460.27.04 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:2F:00.0 Off | 0 |
| N/A 77C P0 34W / 70W | 2831MiB / 15109MiB | 11% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 281537 C ...a_v11/ollama_llama_server 1809MiB |
+-----------------------------------------------------------------------------+
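The manual install docs also recommend registering Ollama as a startup service. A minimal systemd unit sketch matching the layout above (binary extracted to /usr/bin/ollama; User=root matches the root-based setup in this post, while the official docs create a dedicated ollama user instead):
# /etc/systemd/system/ollama.service
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
User=root
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target
Then reload and enable it:
# register and start the service
sudo systemctl daemon-reload
sudo systemctl enable --now ollama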
1.2 Model Installation
The install command is the same as on the Windows version:
ollama run deepseek-r1:1.5b
I am running as the root user, so models are installed under the /root/.ollama/models directory.
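The installation can be verified afterwards:
# list locally installed models
ollama list
# show models currently loaded in memory
ollama ps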
1.3 Port Mapping
Ollama binds to 127.0.0.1 by default, so use Nginx to map the service port to the outside:
server {
    listen 11435;
    server_name localhost;
    location / {
        proxy_pass http://127.0.0.1:11434;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
- Remember to open the mapped port in the firewall.
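Once Nginx has reloaded, the mapping can be checked from another machine (replace the placeholder with your server's address):
# should return the local model list through the mapped port
curl http://<server-ip>:11435/api/tags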
Alternatively, set the OLLAMA_HOST environment variable to 0.0.0.0 so that Ollama listens on all available network interfaces and accepts external connections:
export OLLAMA_HOST=0.0.0.0:11434
nohup ./ollama serve >> serve.log 2>&1 &
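A quick way to confirm external access (again from another machine) is to hit the root path, which answers with a plain status line:
# expected output: Ollama is running
curl http://<server-ip>:11434/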
Ollama's environment variables, from the ollama serve help output (a combined example follows the list):
Usage:
  ollama serve [flags]

Aliases:
  serve, start

Flags:
  -h, --help   help for serve

Environment Variables:
  OLLAMA_DEBUG               Show additional debug information (e.g. OLLAMA_DEBUG=1)
  OLLAMA_HOST                IP Address for the ollama server (default 127.0.0.1:11434)
  OLLAMA_KEEP_ALIVE          The duration that models stay loaded in memory (default "5m")
  OLLAMA_MAX_LOADED_MODELS   Maximum number of loaded models per GPU
  OLLAMA_MAX_QUEUE           Maximum number of queued requests
  OLLAMA_MODELS              The path to the models directory
  OLLAMA_NUM_PARALLEL        Maximum number of parallel requests
  OLLAMA_NOPRUNE             Do not prune model blobs on startup
  OLLAMA_ORIGINS             A comma separated list of allowed origins
  OLLAMA_SCHED_SPREAD        Always schedule model across all GPUs
  OLLAMA_TMPDIR              Location for temporary files
  OLLAMA_FLASH_ATTENTION     Enabled flash attention
  OLLAMA_LLM_LIBRARY         Set LLM library to bypass autodetection
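As an example of combining several of these, the following relocates the model directory and extends the keep-alive before starting the server (the path and duration are placeholders, adjust as needed):
export OLLAMA_HOST=0.0.0.0:11434
export OLLAMA_MODELS=/data/ollama/models
export OLLAMA_KEEP_ALIVE=30m
nohup ./ollama serve >> serve.log 2>&1 &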
2. SpringBoot Integration
The Spring AI reference documentation describes the Ollama API. However, Spring AI supports Spring Boot 3.2.x and 3.3.x and requires at least JDK 17; since this project is still on JDK 8, the API is wrapped by hand here.
2.1 Configuration and Config Class
spring:
  ai:
    ollama:
      baseUrl: http://192.168.0.1:11434
      temperature: 0.8
      maxTokens: 4096
      stream: false
@Data
@Component
@ConfigurationProperties(prefix = "spring.ai.ollama")
public class OllamaConfig {
    /**
     * Base URL of the Ollama service
     */
    public String baseUrl;
    /**
     * Precision vs. creativity [0-2]
     */
    public double temperature = 0.8;
    /**
     * Maximum number of tokens
     */
    public int maxTokens = 4096;
    /**
     * Whether to stream the output
     */
    public boolean stream = false;
}
2.2 Request and Response Objects
The shared message class:
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Message {
    @ApiModelProperty("Role")
    private String role;
    @ApiModelProperty("Content")
    private String content;
}
The two request objects are quite similar:
@Data
@NoArgsConstructor
@AllArgsConstructor
public class GenerateReq {
    @ApiModelProperty("Model")
    private String model;
    @ApiModelProperty("Prompt")
    private String prompt;
    @ApiModelProperty("Whether to stream the output")
    private boolean stream = false;
}
@Data
@NoArgsConstructor
@AllArgsConstructor
public class ChatReq {
    @ApiModelProperty("Model")
    private String model;
    @ApiModelProperty("Messages")
    private List<Message> messages;
    @ApiModelProperty("Whether to stream the output")
    private boolean stream = false;
}
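For reference, building a chat request with these objects might look like this (the model name matches the one installed in section 1.2; Arrays.asList is java.util):
ChatReq req = new ChatReq();
req.setModel("deepseek-r1:1.5b");
req.setMessages(Arrays.asList(
        new Message("system", "You are a helpful assistant."),
        new Message("user", "Hello!")));
req.setStream(false);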
The response objects are also quite similar:
@Data
@NoArgsConstructor
@AllArgsConstructor
public class GenerateRes {
    private String model;
    private String created_at;
    private String response;
    private String done_reason;
    private boolean done;
    // durations are reported in nanoseconds, so use long like ChatRes (int overflows after ~2.1 s)
    private long total_duration;
    private long load_duration;
    private long prompt_eval_count;
    private long prompt_eval_duration;
    private long eval_count;
    private long eval_duration;
}
@Data
@NoArgsConstructor
@AllArgsConstructor
public class ChatRes {
    private String model;
    private String created_at;
    private Message message;
    private String done_reason;
    private boolean done;
    private long total_duration;
    private long load_duration;
    private long prompt_eval_count;
    private long prompt_eval_duration;
    private long eval_count;
    private long eval_duration;
}
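For reference, a successful /api/chat response body looks roughly like this (values are illustrative; the duration fields are nanoseconds, hence the long types above):
{
    "model": "deepseek-r1:1.5b",
    "created_at": "2025-03-26T08:00:00.000Z",
    "message": {"role": "assistant", "content": "Hello! How can I help you?"},
    "done_reason": "stop",
    "done": true,
    "total_duration": 1500000000,
    "load_duration": 500000000,
    "prompt_eval_count": 10,
    "prompt_eval_duration": 100000000,
    "eval_count": 20,
    "eval_duration": 900000000
}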
2.3 OllamaController
Two simple endpoints for testing:
@Slf4j
@RestController
@RequestMapping("/api")
public class OllamaController {
    @Resource
    private OllamaComponent ollamaComponent;

    @PostMapping("/generate")
    @ApiOperation(value = "Generate a completion", tags = {"Chat"})
    public R<Object> generate(@RequestBody GenerateReq req) {
        return ollamaComponent.generate(req);
    }

    @PostMapping("/chat")
    @ApiOperation(value = "Generate a chat completion", tags = {"Chat"})
    public R<Object> chat(@RequestBody ChatReq req) {
        return ollamaComponent.chat(req);
    }
}
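Once the application is up, the endpoint can be exercised with a plain HTTP call (port 8080 and the absence of a context path are assumptions about the project):
curl -X POST http://localhost:8080/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-r1:1.5b","messages":[{"role":"user","content":"Hello"}],"stream":false}'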
2.4 OllamaComponent
The component that calls the service:
@Slf4j
@Component
public class OllamaComponent {
    @Resource
    private OllamaUtil ollamaUtil;

    public R<Object> generate(GenerateReq req) {
        GenerateRes generateRes = null;
        try {
            generateRes = ollamaUtil.generate(req);
        } catch (Exception e) {
            // log the stack trace through the logger instead of printStackTrace()
            log.error("generate failed!", e);
        }
        return R.ok(generateRes);
    }

    public R<Object> chat(ChatReq req) {
        ChatRes chatRes = null;
        try {
            chatRes = ollamaUtil.chat(req);
        } catch (Exception e) {
            log.error("chat failed!", e);
        }
        return R.ok(chatRes);
    }
}
2.5 OllamaUtil
The utility class that sends the HTTP requests. Note that it parses the body as a single JSON object, so it assumes stream is false:
@Slf4j
@Component
public class OllamaUtil {
    @Resource
    private OllamaConfig ollamaConfig;

    private static final int CODE200 = 200;

    public GenerateRes generate(GenerateReq req) {
        try {
            String bodyStr = ollamaRemoteApi("/api/generate", req);
            return JSON.parseObject(bodyStr, GenerateRes.class);
        } catch (IOException e) {
            log.error("generate request failed!", e);
        }
        return null;
    }

    public ChatRes chat(ChatReq req) {
        try {
            String bodyStr = ollamaRemoteApi("/api/chat", req);
            return JSON.parseObject(bodyStr, ChatRes.class);
        } catch (IOException e) {
            log.error("chat request failed!", e);
        }
        return null;
    }

    private String ollamaRemoteApi(String url, Object req) throws IOException {
        OkHttpClient client = OkHttpClientConfig.getUnsafeOkHttpClient();
        MediaType mediaType = MediaType.parse("application/json");
        RequestBody body = RequestBody.create(mediaType, JSON.toJSONString(req));
        Request request = new Request.Builder()
                .url(ollamaConfig.baseUrl + url)
                .method("POST", body)
                .addHeader("User-Agent", "Apifox/1.0.0 (https://apifox.com)")
                .addHeader("Content-Type", "application/json")
                .addHeader("Accept", "*/*")
                .addHeader("Connection", "keep-alive")
                .build();
        // close the response to return the connection to the pool
        try (Response response = client.newCall(request).execute()) {
            if (response.code() == CODE200 && response.body() != null) {
                return response.body().string();
            }
            return "";
        }
    }
}
2.6 OkHttpClientConfig
This class exists mainly to bypass HTTPS certificate validation (it works for ordinary requests too); since it trusts every certificate, it should only be pointed at trusted internal services. Contributed by engineer dongliang7, with thanks:
/**
 * Creates an OkHttpClient that performs no SSL (certificate) validation
 *
 * @author dongliang7
 * @date 2021-11-19 09:50:00
 */
@Slf4j
public class OkHttpClientConfig {
    public static OkHttpClient getUnsafeOkHttpClient() {
        try {
            // a trust manager that does not validate certificate chains
            final TrustManager[] trustAllCerts = new TrustManager[]{
                    new X509TrustManager() {
                        @Override
                        public void checkClientTrusted(java.security.cert.X509Certificate[] chain, String authType) {
                        }

                        @Override
                        public void checkServerTrusted(java.security.cert.X509Certificate[] chain, String authType) {
                        }

                        @Override
                        public java.security.cert.X509Certificate[] getAcceptedIssuers() {
                            return new java.security.cert.X509Certificate[]{};
                        }
                    }
            };
            // install the all-trusting trust manager
            final SSLContext sslContext = SSLContext.getInstance("SSL");
            sslContext.init(null, trustAllCerts, new java.security.SecureRandom());
            // create an SSL socket factory with the all-trusting manager
            final SSLSocketFactory sslSocketFactory = sslContext.getSocketFactory();
            OkHttpClient.Builder builder = new OkHttpClient.Builder()
                    .connectTimeout(60, TimeUnit.SECONDS)
                    .readTimeout(60, TimeUnit.SECONDS)
                    .writeTimeout(120, TimeUnit.SECONDS);
            builder.sslSocketFactory(sslSocketFactory);
            builder.hostnameVerifier((hostname, session) -> true);
            return builder.build();
        } catch (Exception e) {
            log.error("Failed to create the no-SSL-verification OkHttpClient: {}", e.getMessage());
            throw new RuntimeException(e);
        }
    }
}
3. Changelog
- 2025-03-26 Added the ollama-linux-amd64-0.5.11.tgz (version 0.5.11) resource to the netdisk share