
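Buying Guide for LLM All-in-One Appliances on a 100,000 Yuan Budget: How to Pick a Truly Cost-Effective Machine
With top reasoning models such as DeepSeek-R1 and QwQ-32B now released as open source, many companies want to deploy an LLM foundation and connect their OA, ERP, CRM, and other office systems to it. Given the company's data security and privacy requirements and a budget of around 100,000 yuan, how do you choose a cost-effective all-in-one LLM appliance that can be deployed on premises?
Working from that budget, this guide offers a reference configuration across several dimensions, including the appliance's hardware and the models it ships with.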
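I. Hardware Configuration
- Server platform: 4U, 4-GPU system on a Supermicro X12DPG-QT6 motherboard; 10GbE networking; 5 PCIe 4.0 slots; supports 4 GPUs; tower chassis; supports two 3rd Gen Intel Xeon Scalable processors (up to 270 W TDP); up to 16 memory slots (RDIMM ECC DDR4-3200); 2 onboard M.2 NVMe slots; dedicated IPMI management port; dual onboard 10GbE ports; 2400 W ATX power supply; by default supports 6 PCIe 4.0 x16 and 1 PCIe 5.0 x8
- CPU: 2 × Intel Xeon Silver 4310 (12 cores / 24 threads, 2.1 GHz, 120 W)
- Memory: 8 × 32 GB DDR4-3200 DIMM
- System disk: 1 × 2.5-inch SATA SSD, 960 GB
- Data disk: 1 × 2.5-inch SATA, 1 TB
- GPU: 2 × NVIDIA RTX 4090 24 GB (blower-style card)
- Power supply: 1 × 2000 W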
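Whether a 32B-class model fits on two 24 GB cards comes down to simple arithmetic. The sketch below estimates the memory needed for the weights alone at several precisions. The nominal parameter count (32 billion, taken from the "32B" in the model name) and the list of precisions are assumptions for illustration. The source does not say which precision or quantization the preinstalled model uses, and KV cache and framework overhead are deliberately left out.

```python
# Rough weight-memory estimate; ignores KV cache, activations and framework overhead.
GPU_COUNT = 2            # from the hardware list: RTX 4090 x 2
VRAM_PER_GPU_GB = 24     # from the hardware list
PARAMS = 32e9            # assumption: nominal "32B" parameter count


def weight_gb(params: float, bits_per_weight: int) -> float:
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


total_vram_gb = GPU_COUNT * VRAM_PER_GPU_GB

# The precisions listed here are illustrative, not the vendor's stated setup.
for name, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    need = weight_gb(PARAMS, bits)
    verdict = "fits" if need < total_vram_gb else "does not fit"
    print(f"{name:7s} {need:5.1f} GB of weights -> {verdict} in {total_vram_gb} GB")
```

Under these assumptions, 16-bit weights alone would exceed the 48 GB available, so a 32B model on this configuration would likely need a quantized build, with the remaining memory shared by the KV cache of concurrent requests.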
II. System Environment
- OS version: Ubuntu 22.04.4 LTS
- Kernel version: 5.15.0-134-generic
- NVIDIA driver version: 550.54.15
- CUDA version: 12.4
III. Preinstalled Model
- Inference framework: vLLM
- Model: QwQ-32B
- Built-in API endpoint:
curl -X POST http://172.16.7.200:8001/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "qwq-32b",
  "stream": true,
  "max_tokens": 32768,
  "messages": [
    {"role": "user", "content": "who are you?"}
  ]
}'
Optional models: QwQ-32B | DeepSeek-R1-Distill-Qwen-32B | DeepSeek-R1-Distill-Qwen-14B, among others.
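For connecting an internal system to this endpoint from application code, the same request can be sent from Python. The following is a minimal sketch using only the standard library. The address and model name are copied from the curl example and will differ per deployment. The function names are illustrative, and the chunk layout (`choices[0].delta.content` on `data:` lines, ending with `[DONE]`) is assumed from the OpenAI-style streaming format that the `/v1/chat/completions` path suggests.

```python
import json
import urllib.request
from typing import Iterator, Optional

# Address and model name are copied from the curl example; adjust to your deployment.
API_URL = "http://172.16.7.200:8001/v1/chat/completions"
MODEL = "qwq-32b"


def build_payload(prompt: str, max_tokens: int = 1024) -> dict:
    """Request body for an OpenAI-style chat completion with streaming enabled."""
    return {
        "model": MODEL,
        "stream": True,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }


def parse_sse_line(line: str) -> Optional[str]:
    """Return the text delta carried by one 'data: ...' line, or None."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    data = line[len("data:"):].strip()
    if data == "[DONE]":  # assumed end-of-stream marker
        return None
    chunk = json.loads(data)
    choices = chunk.get("choices") or []
    if not choices:
        return None
    return choices[0].get("delta", {}).get("content")


def stream_chat(prompt: str, timeout: float = 300.0) -> Iterator[str]:
    """Yield text pieces as the server streams them back."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        for raw in resp:  # the response object iterates line by line
            piece = parse_sse_line(raw.decode("utf-8"))
            if piece:
                yield piece


# Usage (requires network access to the appliance):
#   for piece in stream_chat("who are you?"):
#       print(piece, end="", flush=True)
```

Keeping payload construction and line parsing in separate functions makes both easy to test without a running server.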
IV. Performance Stress Test Data
1. 64 requests, 32 concurrent, 4K context length: throughput about 459 tokens/s, 100% success rate
Benchmarking summary:
+-----------------------------------+-----------------------------------------------------+
| Key | Value |
+===================================+=====================================================+
| Time taken for tests (s) | 165.84 |
+-----------------------------------+-----------------------------------------------------+
| Number of concurrency | 32 |
+-----------------------------------+-----------------------------------------------------+
| Total requests | 64 |
+-----------------------------------+-----------------------------------------------------+
| Succeed requests | 64 |
+-----------------------------------+-----------------------------------------------------+
| Failed requests | 0 |
+-----------------------------------+-----------------------------------------------------+
| Throughput(average tokens/s) | 459.607 |
+-----------------------------------+-----------------------------------------------------+
| Average QPS | 0.386 |
+-----------------------------------+-----------------------------------------------------+
| Average latency (s) | 46.558 |
+-----------------------------------+-----------------------------------------------------+
| Average time to first token (s) | 0.207 |
+-----------------------------------+-----------------------------------------------------+
| Average time per output token (s) | 0.041 |
+-----------------------------------+-----------------------------------------------------+
| Average input tokens per request | 29.594 |
+-----------------------------------+-----------------------------------------------------+
| Average output tokens per request | 1190.953 |
+-----------------------------------+-----------------------------------------------------+
| Average package latency (s) | 0.039 |
+-----------------------------------+-----------------------------------------------------+
| Average package per request | 1190.953 |
+-----------------------------------+-----------------------------------------------------+
| Expected number of requests | 64 |
+-----------------------------------+-----------------------------------------------------+
Percentile results:
+------------+----------+----------+-------------+--------------+---------------+----------------------+
| Percentile | TTFT (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Throughput(tokens/s) |
+------------+----------+----------+-------------+--------------+---------------+----------------------+
| 10% | 0.0674 | 0.0347 | 3.1883 | 21 | 77 | 22.8644 |
| 25% | 0.0704 | 0.0367 | 6.0251 | 24 | 140 | 24.4609 |
| 50% | 0.1328 | 0.0385 | 58.6126 | 29 | 1471 | 25.1108 |
| 66% | 0.318 | 0.0393 | 67.5388 | 32 | 1694 | 25.3569 |
| 75% | 0.3193 | 0.0399 | 72.0769 | 34 | 1809 | 25.529 |
| 80% | 0.3205 | 0.0404 | 73.4344 | 34 | 1844 | 25.6567 |
| 90% | 0.4189 | 0.0428 | 83.7376 | 40 | 2236 | 26.3543 |
| 95% | 0.4194 | 0.0474 | 103.771 | 43 | 2840 | 26.6591 |
| 98% | 0.4199 | 0.0636 | 141.9881 | 45 | 3742 | 27.0068 |
| 99% | 0.42 | 0.074 | 151.9243 | 45 | 3999 | 27.368 |
+------------+----------+----------+-------------+--------------+---------------+----------------------+
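The summary figures in such a report are tied together by simple arithmetic, which makes a handy sanity check. The sketch below recomputes average throughput and QPS from the first run's totals. The formulas are an assumption about how the benchmark tool derives these values, chosen because they reproduce the reported numbers.

```python
# Values copied from the summary table of the 32-concurrency run.
time_taken_s = 165.84
succeeded_requests = 64
avg_output_tokens = 1190.953

# Assumed definitions: output tokens of successful requests divided by wall time,
# and successful requests divided by wall time.
throughput_tok_s = succeeded_requests * avg_output_tokens / time_taken_s
qps = succeeded_requests / time_taken_s

print(f"throughput = {throughput_tok_s:.1f} tokens/s")  # reported: 459.607
print(f"QPS        = {qps:.3f}")                        # reported: 0.386
```

The same two formulas can be applied to the other runs; where a run has failures, using the succeeded count rather than the total is what matches the reported figures.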
2. 128 requests, 64 concurrent, 4K context length: throughput about 900 tokens/s, 100% success rate
Benchmarking summary:
+-----------------------------------+-----------------------------------------------------+
| Key | Value |
+===================================+=====================================================+
| Time taken for tests (s) | 174.458 |
+-----------------------------------+-----------------------------------------------------+
| Number of concurrency | 64 |
+-----------------------------------+-----------------------------------------------------+
| Total requests | 128 |
+-----------------------------------+-----------------------------------------------------+
| Succeed requests | 128 |
+-----------------------------------+-----------------------------------------------------+
| Failed requests | 0 |
+-----------------------------------+-----------------------------------------------------+
| Throughput(average tokens/s) | 900.453 |
+-----------------------------------+-----------------------------------------------------+
| Average QPS | 0.734 |
+-----------------------------------+-----------------------------------------------------+
| Average latency (s) | 58.153 |
+-----------------------------------+-----------------------------------------------------+
| Average time to first token (s) | 1.591 |
+-----------------------------------+-----------------------------------------------------+
| Average time per output token (s) | 0.053 |
+-----------------------------------+-----------------------------------------------------+
| Average input tokens per request | 28.812 |
+-----------------------------------+-----------------------------------------------------+
| Average output tokens per request | 1227.273 |
+-----------------------------------+-----------------------------------------------------+
| Average package latency (s) | 0.046 |
+-----------------------------------+-----------------------------------------------------+
| Average package per request | 1227.273 |
+-----------------------------------+-----------------------------------------------------+
| Expected number of requests | 128 |
+-----------------------------------+-----------------------------------------------------+
Percentile results:
+------------+----------+----------+-------------+--------------+---------------+----------------------+
| Percentile | TTFT (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Throughput(tokens/s) |
+------------+----------+----------+-------------+--------------+---------------+----------------------+
| 10% | 0.0668 | 0.0372 | 4.3761 | 20 | 77 | 16.408 |
| 25% | 0.0703 | 0.0397 | 12.0744 | 23 | 199 | 19.8301 |
| 50% | 0.4059 | 0.0425 | 67.8619 | 28 | 1398 | 21.003 |
| 66% | 0.7801 | 0.0444 | 77.7914 | 31 | 1673 | 21.6158 |
| 75% | 0.7856 | 0.0455 | 85.5781 | 34 | 1810 | 22.0796 |
| 80% | 0.7885 | 0.0463 | 91.5732 | 36 | 1952 | 22.4552 |
| 90% | 4.4932 | 0.0481 | 103.4461 | 40 | 2199 | 23.2734 |
| 95% | 9.5955 | 0.0513 | 125.3587 | 42 | 2757 | 24.3639 |
| 98% | 18.7952 | 0.0774 | 154.1247 | 45 | 3587 | 25.1411 |
| 99% | 19.4686 | 0.1236 | 174.4351 | 48 | 4096 | 25.2272 |
+------------+----------+----------+-------------+--------------+---------------+----------------------+
3. 160 requests, 96 concurrent, 4K context length: throughput about 951 tokens/s, 100% success rate
Benchmarking summary:
+-----------------------------------+-----------------------------------------------------+
| Key | Value |
+===================================+=====================================================+
| Time taken for tests (s) | 205.116 |
+-----------------------------------+-----------------------------------------------------+
| Number of concurrency | 96 |
+-----------------------------------+-----------------------------------------------------+
| Total requests | 160 |
+-----------------------------------+-----------------------------------------------------+
| Succeed requests | 160 |
+-----------------------------------+-----------------------------------------------------+
| Failed requests | 0 |
+-----------------------------------+-----------------------------------------------------+
| Throughput(average tokens/s) | 951.706 |
+-----------------------------------+-----------------------------------------------------+
| Average QPS | 0.78 |
+-----------------------------------+-----------------------------------------------------+
| Average latency (s) | 80.641 |
+-----------------------------------+-----------------------------------------------------+
| Average time to first token (s) | 4.946 |
+-----------------------------------+-----------------------------------------------------+
| Average time per output token (s) | 0.078 |
+-----------------------------------+-----------------------------------------------------+
| Average input tokens per request | 29.319 |
+-----------------------------------+-----------------------------------------------------+
| Average output tokens per request | 1220.062 |
+-----------------------------------+-----------------------------------------------------+
| Average package latency (s) | 0.062 |
+-----------------------------------+-----------------------------------------------------+
| Average package per request | 1220.062 |
+-----------------------------------+-----------------------------------------------------+
| Expected number of requests | 160 |
+-----------------------------------+-----------------------------------------------------+
Percentile results:
+------------+----------+----------+-------------+--------------+---------------+----------------------+
| Percentile | TTFT (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Throughput(tokens/s) |
+------------+----------+----------+-------------+--------------+---------------+----------------------+
| 10% | 0.0698 | 0.037 | 5.6135 | 20 | 84 | 11.5917 |
| 25% | 0.0796 | 0.0423 | 14.2502 | 24 | 224 | 13.5395 |
| 50% | 0.8165 | 0.0499 | 93.074 | 29 | 1398 | 16.5264 |
| 66% | 0.8261 | 0.0557 | 113.212 | 32 | 1632 | 16.9418 |
| 75% | 1.0795 | 0.0578 | 125.709 | 34 | 1776 | 17.1497 |
| 80% | 1.0873 | 0.0591 | 130.813 | 36 | 1859 | 17.2425 |
| 90% | 1.1001 | 0.0626 | 146.4731 | 40 | 2288 | 17.4515 |
| 95% | 43.8405 | 0.0706 | 161.2143 | 42 | 2785 | 18.204 |
| 98% | 59.0927 | 0.0961 | 181.7891 | 45 | 3543 | 19.4896 |
| 99% | 61.9677 | 0.1718 | 201.432 | 48 | 4096 | 20.3344 |
+------------+----------+----------+-------------+--------------+---------------+----------------------+
4. 256 requests, 128 concurrent, 4K context length: throughput about 784 tokens/s, 97.7% success rate (6 of 256 requests failed)
Benchmarking summary:
+-----------------------------------+-----------------------------------------------------+
| Key | Value |
+===================================+=====================================================+
| Time taken for tests (s) | 369.935 |
+-----------------------------------+-----------------------------------------------------+
| Number of concurrency | 128 |
+-----------------------------------+-----------------------------------------------------+
| Total requests | 256 |
+-----------------------------------+-----------------------------------------------------+
| Succeed requests | 250 |
+-----------------------------------+-----------------------------------------------------+
| Failed requests | 6 |
+-----------------------------------+-----------------------------------------------------+
| Throughput(average tokens/s) | 784.738 |
+-----------------------------------+-----------------------------------------------------+
| Average QPS | 0.676 |
+-----------------------------------+-----------------------------------------------------+
| Average latency (s) | 110.144 |
+-----------------------------------+-----------------------------------------------------+
| Average time to first token (s) | 11.86 |
+-----------------------------------+-----------------------------------------------------+
| Average time per output token (s) | 0.153 |
+-----------------------------------+-----------------------------------------------------+
| Average input tokens per request | 29.332 |
+-----------------------------------+-----------------------------------------------------+
| Average output tokens per request | 1161.208 |
+-----------------------------------+-----------------------------------------------------+
| Average package latency (s) | 0.085 |
+-----------------------------------+-----------------------------------------------------+
| Average package per request | 1161.208 |
+-----------------------------------+-----------------------------------------------------+
| Expected number of requests | 256 |
+-----------------------------------+-----------------------------------------------------+
| Result DB path | ./outputs\20250307_172013\qwq-32b\benchmark_data.db |
+-----------------------------------+-----------------------------------------------------+
Percentile results:
+------------+----------+----------+-------------+--------------+---------------+----------------------+
| Percentile | TTFT (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Throughput(tokens/s) |
+------------+----------+----------+-------------+--------------+---------------+----------------------+
| 10% | 0.0849 | 0.0394 | 7.5397 | 20 | 72 | 7.0122 |
| 25% | 0.6538 | 0.0448 | 29.9604 | 24 | 196 | 8.6525 |
| 50% | 1.3853 | 0.0558 | 119.713 | 29 | 1386 | 11.7059 |
| 66% | 1.4043 | 0.0622 | 155.7703 | 32 | 1604 | 13.5485 |
| 75% | 8.8918 | 0.0666 | 169.3069 | 35 | 1726 | 14.1706 |
| 80% | 23.9252 | 0.0687 | 176.7478 | 36 | 1825 | 15.1871 |
| 90% | 46.345 | 0.0736 | 201.3842 | 40 | 2097 | 15.5282 |
| 95% | 65.5189 | 0.0853 | 214.3025 | 42 | 2301 | 15.6163 |
| 98% | 86.6971 | 0.1251 | 225.9585 | 45 | 2650 | 15.7064 |
| 99% | 93.7098 | 0.394 | 235.0125 | 48 | 3110 | 15.7135 |
+------------+----------+----------+-------------+--------------+---------------+----------------------+
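Stress test conclusion:
At an average context length of 4K, the machine is rated to stably support up to about 120 concurrent requests. For reference, every request succeeded in the 32, 64, and 96 concurrency runs, while in the 128 concurrency run 6 of 256 requests failed and aggregate throughput fell from about 951 to about 784 tokens/s.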
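To weigh a concurrency target against the four runs, it helps to line up success rate, aggregate throughput, and the decoding speed each individual request sees. The sketch below does that with values copied from the summary tables. The per-stream speed is approximated as 1 / TPOT, which is an assumption made here for illustration, and the `Run` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    concurrency: int
    total: int
    succeeded: int
    throughput_tok_s: float   # "Throughput(average tokens/s)"
    avg_ttft_s: float         # "Average time to first token (s)"
    avg_tpot_s: float         # "Average time per output token (s)"

    @property
    def success_rate(self) -> float:
        return self.succeeded / self.total

    @property
    def per_stream_tok_s(self) -> float:
        # Decoding speed seen by a single request, approximated as 1 / TPOT.
        return 1.0 / self.avg_tpot_s


# Values copied from the four summary tables.
RUNS = [
    Run(32, 64, 64, 459.607, 0.207, 0.041),
    Run(64, 128, 128, 900.453, 1.591, 0.053),
    Run(96, 160, 160, 951.706, 4.946, 0.078),
    Run(128, 256, 250, 784.738, 11.86, 0.153),
]

for r in RUNS:
    print(f"{r.concurrency:4d} concurrent | {r.success_rate:6.1%} ok | "
          f"{r.throughput_tok_s:7.1f} tok/s total | "
          f"{r.per_stream_tok_s:5.1f} tok/s per stream | TTFT {r.avg_ttft_s:.2f} s")

# Highest tested concurrency that completed every request.
clean = [r for r in RUNS if r.succeeded == r.total]
best = max(clean, key=lambda r: r.concurrency)
print("highest error-free concurrency tested:", best.concurrency)
```

Read this way, aggregate throughput peaks at the 96-concurrency run while each stream slows from roughly 24 to 13 tokens/s, and average time to first token grows from about 0.2 s to about 5 s. Which point counts as acceptable depends on the latency your applications can tolerate.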
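V. Value-Added Services
- Hardware service: 3 years of standard hardware service, 5 × 9 (five days a week, nine hours a day)
- Software service: agent software can be preinstalled, covering enterprise knowledge base management, user management, and knowledge base access authorization, plus agents for web search, prompt writing, company policies, meeting agendas, daily, weekly, and monthly reports, translation, marketing copy, and programming
- Upgrade service: free periodic model upgrades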
VI. More Reference Models
See: https://api.baystoneai.com
- Technical specifications
- Model selection guide