Installing Tesseract on Ubuntu 20.04 (Doubao)

tlwlmy

tlwlmy · published 2026-03-20 15:57:25

1. Core installation steps (base package + Chinese language packs)

Refresh the package sources first so that apt installs the newest available version:

sudo apt update && sudo apt upgrade -y

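Alternatively, for a more cautious upgrade that lets you review every change first:

# Step 1: refresh the package index (always do this first)
sudo apt update

# Step 2: list the packages that can be upgraded, so you know what will change
apt list --upgradable

# Step 3: run dist-upgrade (omitting -y is safer, because you confirm manually)
sudo apt dist-upgrade
# apt lists the packages it will install, upgrade, or remove; type y and press Enter to confirm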

Install the Tesseract core package
The official Ubuntu 20.04 repositories already include Tesseract 4.x, so it can be installed directly:

sudo apt install tesseract-ocr -y

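Install the Chinese language packs (essential for recognizing Chinese)
Only English is included by default; the Chinese (Simplified / Traditional) language packs must be installed separately: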

Simplified Chinese (most commonly used)
sudo apt install tesseract-ocr-chi-sim -y

Optional: Traditional Chinese (if needed)
sudo apt install tesseract-ocr-chi-tra -y

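Verify the installation

# Check the Tesseract version (should report 4.x or later)
tesseract --version

# List the installed language packs (confirm that chi_sim is present)
tesseract --list-langs

If chi_sim appears in the output, the Simplified Chinese pack was installed successfully;
eng is the default English pack.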
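
The same check can be scripted. The following is a minimal sketch, assuming the `--list-langs` output consists of one header line starting with "List of" followed by one language code per line. The helper names `parse_langs` and `missing_packages` are hypothetical and not part of Tesseract.

```python
import shutil
import subprocess


def parse_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Skip the header line ("List of available languages ...").
    return [line for line in lines if not line.lower().startswith("list of")]


def missing_packages(installed, required):
    """Map each missing language code to its Ubuntu package name.

    Assumes the tesseract-ocr-<lang> naming with underscores written as
    hyphens, e.g. chi_sim -> tesseract-ocr-chi-sim.
    """
    return ["tesseract-ocr-" + lang.replace("_", "-")
            for lang in required if lang not in installed]


if __name__ == "__main__" and shutil.which("tesseract"):
    result = subprocess.run(["tesseract", "--list-langs"],
                            capture_output=True, text=True)
    # Read both streams in case a release prints the list to stderr.
    installed = parse_langs(result.stdout + result.stderr)
    print(missing_packages(installed, ["eng", "chi_sim"]))
```

An empty list means every required language is present; otherwise the printed names can be passed to `sudo apt install`.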

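2. Advanced: install the latest Tesseract 5.x (optional)

If you need the newer features of Tesseract 5.x (such as improved Chinese recognition), note that the official repositories only carry the older 4.x release; 5.x can be installed from a PPA:

# Add the PPA (the -devel PPA carries development builds)
sudo add-apt-repository ppa:alex-p/tesseract-ocr-devel -y
sudo apt update

# Install 5.x (this upgrades the existing 4.x package)
sudo apt install tesseract-ocr -y

# Reinstall the Chinese language pack (so that it matches 5.x)
sudo apt install tesseract-ocr-chi-sim -y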
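
To confirm which major version is active after switching sources, the version string can be parsed in a script. This is a rough sketch, assuming the first line of `tesseract --version` looks like `tesseract 4.1.1`; the function name `parse_version` is hypothetical.

```python
import re
import shutil
import subprocess


def parse_version(output):
    """Return (major, minor, patch) from `tesseract --version` output."""
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("could not find a Tesseract version string")
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__" and shutil.which("tesseract"):
    result = subprocess.run(["tesseract", "--version"],
                            capture_output=True, text=True)
    # Read both streams in case a release prints the version to stderr.
    version = parse_version(result.stdout + result.stderr)
    print("Tesseract 5.x or later installed:", version >= (5, 0, 0))
```

Comparing tuples keeps the check correct for versions such as 4.10.0, where a plain string comparison would not.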

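3. Basic usage example (recognition test)

Recognize text in an image (Simplified Chinese)

# Syntax: tesseract <image path> <output file base name> -l <language>
tesseract test.png result -l chi_sim

test.png: the image to recognize (prepare it beforehand);
result: the output base name (Tesseract writes result.txt);
-l chi_sim: recognize using Simplified Chinese (the default is eng).

Print directly to the terminal (no file is created)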

tesseract test.png stdout -l chi_sim
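
The command line above can also be assembled from a script. The sketch below is illustrative only: `build_command` and `recognize` are hypothetical helper names, and the sketch relies on Tesseract accepting several languages joined with `+` (for example `chi_sim+eng`) for mixed Chinese and English text.

```python
import os
import shutil
import subprocess


def build_command(image, output_base="stdout", langs=("chi_sim",)):
    """Build the argument list for a `tesseract` call.

    Several languages are joined with "+", e.g. chi_sim+eng.
    """
    return ["tesseract", image, output_base, "-l", "+".join(langs)]


def recognize(image, langs=("chi_sim",)):
    """Run Tesseract and return the recognized text from stdout."""
    result = subprocess.run(build_command(image, "stdout", langs),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # test.png is the sample image name used in this section.
    if shutil.which("tesseract") and os.path.exists("test.png"):
        print(recognize("test.png", ["chi_sim", "eng"]))
```

Passing the arguments as a list avoids shell quoting problems when the image path contains spaces.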

4. Troubleshooting common problems

Error "Error opening data file … chi_sim.traineddata"
Cause: the Chinese language pack is not installed, or the data path is wrong. Reinstall it:

sudo apt reinstall tesseract-ocr-chi-sim -y
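
If the error persists after reinstalling, it can help to check where the data file actually lives. The sketch below makes several assumptions: the two directories in `DEFAULT_DIRS` are guesses at the usual package locations for 4.x and 5.x, the `TESSDATA_PREFIX` environment variable is assumed to point at the tessdata directory itself, and the helper names are hypothetical.

```python
import os
from pathlib import Path

# Assumed locations of the language data; adjust them for your system.
DEFAULT_DIRS = ["/usr/share/tesseract-ocr/4.00/tessdata",
                "/usr/share/tesseract-ocr/5/tessdata"]


def find_traineddata(lang, search_dirs):
    """Return the first existing <lang>.traineddata path, or None."""
    for directory in search_dirs:
        candidate = Path(directory) / (lang + ".traineddata")
        if candidate.is_file():
            return candidate
    return None


def candidate_dirs():
    """Directories to search: TESSDATA_PREFIX first, then the defaults."""
    prefix = os.environ.get("TESSDATA_PREFIX")
    return ([prefix] if prefix else []) + DEFAULT_DIRS


if __name__ == "__main__":
    print(find_traineddata("chi_sim", candidate_dirs()))
```

A result of `None` suggests the file is missing from every searched directory, which points back to the package installation rather than to the image.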

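Garbled Chinese output / low accuracy
- Make sure the image is sharp and the text is neither blurred nor skewed;
- Prefer PNG (JPG compression can degrade recognition);
- Advanced: train a custom model, or use the higher-accuracy models from the tessdata_best repository.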
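
One common way to make text stand out before recognition is to binarize the image. This technique is not described in the tips above; it is offered here only as a possible sketch, using Otsu's method to pick the threshold. The function names are hypothetical, and whether binarization helps depends on the image.

```python
def otsu_threshold(histogram):
    """Pick the gray level that best separates dark and light pixels.

    `histogram` is a list of pixel counts per gray level (0..255).
    Pixels at or below the returned level form the dark class.
    """
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    best_level, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level, count in enumerate(histogram):
        weight_dark += count
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        sum_dark += level * count
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance, up to a constant factor.
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_level = variance, level
    return best_level


def binarize(path):
    """Load an image, convert it to grayscale, and binarize it."""
    from PIL import Image  # imported here so otsu_threshold works without Pillow

    gray = Image.open(path).convert("L")
    threshold = otsu_threshold(gray.histogram())
    return gray.point(lambda p: 255 if p > threshold else 0)
```

The image returned by `binarize` can be saved as PNG and passed to Tesseract in place of the original.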

Calling Tesseract from Python (optional)
If you need to use it from code, install pytesseract:

# Install the Python libraries
pip install pytesseract pillow

# Python example
import pytesseract
from PIL import Image

# Set the Tesseract path (usually unnecessary on Ubuntu; the default is /usr/bin/tesseract)
# pytesseract.pytesseract.tesseract_cmd = '/usr/bin/tesseract'

# Recognize the image
img = Image.open('test.png')
text = pytesseract.image_to_string(img, lang='chi_sim')
print(text)
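
When accuracy matters, it may be useful to look at per-word confidence rather than only the plain text. The sketch below assumes the dictionary shape returned by `pytesseract.image_to_data` with `output_type=pytesseract.Output.DICT`, which contains parallel `text` and `conf` lists. The helper names and the cutoff of 60 are arbitrary choices for illustration.

```python
def confident_words(data, min_conf=60.0):
    """Keep the words whose confidence is at least min_conf.

    `data` is a dict with parallel "text" and "conf" lists.
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        # Rows that describe layout rather than a word have empty text.
        if text.strip() and float(conf) >= min_conf:
            words.append(text)
    return words


def recognize_confident(path, lang="chi_sim", min_conf=60.0):
    """Run OCR on an image file and return only the confident words."""
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(Image.open(path), lang=lang,
                                     output_type=pytesseract.Output.DICT)
    return confident_words(data, min_conf)
```

Calling `recognize_confident('test.png')` returns a list of words, so low-confidence fragments can be reviewed or discarded separately.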

5. Uninstalling Tesseract

# Remove the core package
sudo apt remove tesseract-ocr -y

# Clean up dependencies that are no longer needed
sudo apt autoremove -y
