DeepSeek-OCR-2 Open-Source OCR Tutorial: A Guide to Structured Parsing with Markdown Output

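This article shows how to automate deployment of the 🖋️ 深求·墨鉴 (DeepSeek-OCR-2) image on the 星图 GPU platform for structured document recognition. The tool recognizes the text in an image and also the document's structure (headings, tables, formulas), and it outputs a complete Markdown document directly. A typical use is turning a photo of a meeting whiteboard into clearly structured meeting minutes, which speeds up document digitization considerably.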

陳寶平 · Published 2026-03-19 00:09:05

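1. Introduction: When OCR Meets Markdown

Do any of these situations sound familiar?

You have a stack of paper documents to digitize, and typing them out by hand takes too long
You photographed a meeting whiteboard and want to turn its contents into meeting minutes
You found a research paper in PDF form and want to extract its figures and formulas
You scanned an old book and want to preserve its contents digitally

Traditional OCR tools can turn an image into text, but the result is often a jumble: tables lose their formatting, headings blend into body text, and the relative positions of figures and text are lost. You then have to spend a lot of time re-laying out the text.

DeepSeek-OCR-2, the subject of this tutorial, changes that. It recognizes the text and also understands the document's structure, then outputs a well-formatted Markdown document directly. Going from an image to an editable, ready-to-use document takes a single step.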

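2. What Is DeepSeek-OCR-2?

2.1 More Than Text Recognition

DeepSeek-OCR-2 is an open-source, deep-learning-based OCR engine whose capabilities go well beyond those of traditional OCR tools. In plain terms:

Traditional OCR is like a robot that can only read characters: it reads out the text in an image one character at a time, ignoring how the text is arranged and how the pieces relate to each other.

DeepSeek-OCR-2 is more like an experienced editor. Besides recognizing the text, it also understands:

Which text is a heading and which is body text
How many rows and columns a table has, and what each cell contains
How a mathematical formula is structured
Where figures sit relative to the text
How list items are nested

Most importantly, it converts this understanding directly into Markdown. Markdown is a lightweight markup language that expresses formatting with simple symbols. For example:

# Heading denotes a level-1 heading
**bold** denotes bold text
- item denotes a list item
| table | content | denotes a table row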

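2.2 Why Choose DeepSeek-OCR-2?

I have tested many of the OCR tools on the market, and DeepSeek-OCR-2 has several clear advantages:

High recognition accuracy: Chinese is handled especially well, across a variety of fonts, sizes, and layouts.

Structure is preserved: this is its biggest strength. Tables come out as tables, lists as lists, and the heading hierarchy is kept.

Open source and free: you can deploy it yourself and keep full control of your data, which addresses privacy concerns.

Multiple output formats: formats other than Markdown are supported, though Markdown is the most practical.

Easy to integrate: a Python API makes it straightforward to build into your own workflow.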

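3. Quick Start: From Installation to Your First Recognition

3.1 Preparing the Environment

First, you need a Python environment. I recommend Python 3.8 or later.

# Create a new virtual environment (optional but recommended)
python -m venv ocr_env
source ocr_env/bin/activate  # Linux/Mac
# or
ocr_env\Scripts\activate  # Windows

# Install the required dependencies
pip install torch torchvision torchaudio
pip install opencv-python pillow
pip install transformers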

3.2 Installing DeepSeek-OCR-2

Installation is simple:

# Clone the repository
git clone https://github.com/deepseek-ai/DeepSeek-OCR-2.git
cd DeepSeek-OCR-2

# Install the dependencies
pip install -r requirements.txt

# Install the DeepSeek-OCR-2 package
pip install -e .

3.3 Your First OCR Program

Let's write the simplest possible program to test it:

from deepseek_ocr import DeepSeekOCR
from PIL import Image
import os

# Initialize the OCR engine
ocr = DeepSeekOCR()

# Load an image
image_path = "test_document.jpg"
if os.path.exists(image_path):
    image = Image.open(image_path)

    # Run OCR
    result = ocr.recognize(image)

    # Print the result in Markdown format
    print("Recognition result (Markdown):")
    print("=" * 50)
    print(result.markdown)
    print("=" * 50)

    # Save to a file
    with open("output.md", "w", encoding="utf-8") as f:
        f.write(result.markdown)
    print("Result saved to output.md")
else:
    print(f"Please prepare a test image first: {image_path}")

This program does several things:

Initializes the OCR engine
Loads an image
Recognizes the text and structure in the image
Prints the result in Markdown format
Saves the result to a file

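4. Core Features in Detail

4.1 Text Recognition: More Than Reading Characters

DeepSeek-OCR-2 is strong at text recognition, but more importantly it understands the semantic role of each piece of text.

# Look at the detailed recognition result
result = ocr.recognize(image, return_details=True)

# Inspect the recognized text blocks
for i, block in enumerate(result.text_blocks):
    print(f"Text block {i+1}:")
    print(f"  Text: {block.text}")
    print(f"  Confidence: {block.confidence:.2f}")
    print(f"  Position: {block.bbox}")
    print(f"  Type: {block.type}")  # e.g. 'title', 'paragraph', 'list_item'
    print("-" * 30)

This is particularly useful because you can handle each block according to its type. For example, you can extract all the headings to generate a table of contents, or process only the body text.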
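As a minimal sketch of the table-of-contents idea, the snippet below keeps only the heading blocks and renders them as a Markdown list. `TextBlock` is a hypothetical stand-in that carries just the `text` and `type` fields used in the loop above; it is an assumption for illustration, not the library's own class.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """Hypothetical stand-in for an OCR text block (assumption, not the library's class)."""
    text: str
    type: str


def build_toc(blocks):
    """Return a Markdown bullet list with one entry per block typed as 'title'."""
    titles = [b.text for b in blocks if b.type == "title"]
    return "\n".join(f"- {t}" for t in titles)


blocks = [
    TextBlock("Quarterly Review", "title"),
    TextBlock("Revenue grew in all regions.", "paragraph"),
    TextBlock("Next Steps", "title"),
]
toc = build_toc(blocks)
print(toc)
```

With real results, the same function could presumably be called as `build_toc(result.text_blocks)`, provided the blocks expose those two attributes.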

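4.2 Table Recognition: Keeping the Structure Intact

Table recognition is one of DeepSeek-OCR-2's strengths. With traditional OCR tools, a recognized table comes back as a pile of text with the row and column relationships gone. DeepSeek-OCR-2 keeps the table structure intact.

# Handle tables specifically
tables = result.tables
for i, table in enumerate(tables):
    print(f"Table {i+1}:")
    print(f"  Rows: {table.rows}")
    print(f"  Columns: {table.cols}")

    # Get the table in Markdown format
    md_table = table.to_markdown()
    print("Markdown:")
    print(md_table)

    # The raw data is also available
    data = table.data
    for row in data:
        print(row)

A recognized table is converted into Markdown like this:

| Name | Age | Role |
|------|------|------|
| Zhang San | 28   | Engineer |
| Li Si | 32   | Manager |

You can paste this Markdown straight into Notion, Obsidian, Typora, or any other editor that supports Markdown, and it will be rendered as a table.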
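If `table.data` is a plain list of rows, as the loop above suggests (an assumption), you can also build the Markdown yourself, for instance after cleaning up some cells. The helper below is an illustrative sketch, not part of the library; `rows_to_markdown` is a made-up name.

```python
def rows_to_markdown(rows):
    """Render a list of rows (first row is the header) as a Markdown table."""
    if not rows:
        return ""

    def fmt(row):
        # Escape pipes so that cell text cannot break the column layout
        cells = [str(c).replace("|", "\\|") for c in row]
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    separator = "| " + " | ".join("---" for _ in header) + " |"
    return "\n".join([fmt(header), separator, *(fmt(r) for r in body)])


table = rows_to_markdown([
    ["Name", "Age", "Role"],
    ["Zhang San", 28, "Engineer"],
    ["Li Si", 32, "Manager"],
])
print(table)
```

Escaping the pipe character matters for OCR output in particular, because a stray `|` inside a recognized cell would otherwise be read as a column boundary.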

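4.3 Formula Recognition: A Boon for Academic Work

If you are a student or researcher, or you work with technical documents, formula recognition will save you a lot of effort.

# Check whether any formulas were found
if result.formulas:
    print(f"Recognized {len(result.formulas)} formulas")
    for i, formula in enumerate(result.formulas):
        print(f"Formula {i+1}:")
        print(f"  LaTeX: {formula.latex}")
        print(f"  Position: {formula.bbox}")

        # In Markdown, display formulas are usually wrapped in $$
        md_formula = f"$$\n{formula.latex}\n$$"
        print(f"  Markdown: {md_formula}")

Recognized formulas are returned as LaTeX, the standard notation for mathematics in academic and technical documents. In Markdown, wrapping a LaTeX formula in $$ lets renderers with math support display it correctly.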
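The wrapping step can be factored into a small helper. This is a sketch with a made-up name, `latex_to_markdown`; it also offers single-dollar inline math, which many math-enabled Markdown renderers accept, although support varies by editor.

```python
def latex_to_markdown(latex, display=True):
    """Wrap a LaTeX string in Markdown math delimiters."""
    latex = latex.strip()
    if display:
        # Display math: the formula sits on its own lines between $$ markers
        return f"$$\n{latex}\n$$"
    # Inline math: single dollar signs, no surrounding whitespace
    return f"${latex}$"


print(latex_to_markdown(r"E = mc^2"))
print(latex_to_markdown(r"x^2", display=False))
```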

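4.4 Mixed Image and Text Layouts

Many documents mix images and text, and DeepSeek-OCR-2 understands how the two relate.

# Handle mixed image and text layouts
for i, layout in enumerate(result.layout):
    print(f"Layout element {i+1}:")
    print(f"  Type: {layout.type}")  # 'text', 'image', 'table', 'formula'
    print(f"  Position: {layout.bbox}")
    print(f"  Content: {layout.content[:50]}...")  # show only the first 50 characters

This layout information is particularly useful because it describes the document's original structure, which you can use to rebuild the original layout.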
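One simple way to use the layout information is to put the elements back into reading order. The sketch below assumes that `bbox` is `(left, top, right, bottom)` in pixels and uses a hypothetical `LayoutElement` stand-in; it sorts top to bottom and breaks ties left to right for elements that start on roughly the same line. The `line_tolerance` value is an illustrative choice.

```python
from dataclasses import dataclass


@dataclass
class LayoutElement:
    """Hypothetical stand-in for a layout element (assumption, not the library's class)."""
    type: str
    bbox: tuple  # assumed format: (left, top, right, bottom)
    content: str


def reading_order(elements, line_tolerance=10):
    """Sort top to bottom; elements whose tops share a band are ordered left to right."""
    return sorted(
        elements,
        key=lambda e: (e.bbox[1] // line_tolerance, e.bbox[0]),
    )


elements = [
    LayoutElement("text", (300, 102, 500, 130), "right caption"),
    LayoutElement("text", (20, 400, 500, 430), "closing paragraph"),
    LayoutElement("image", (20, 100, 280, 300), "figure"),
    LayoutElement("text", (20, 10, 500, 40), "Title"),
]
ordered = [e.content for e in reading_order(elements)]
print(ordered)
```

Grouping tops into bands of `line_tolerance` pixels keeps small vertical jitter from reordering elements that sit side by side. Multi-column pages need a column-aware ordering instead.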

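5. Case Studies: Processing Real Documents

5.1 Case 1: Turning a Whiteboard Photo into Meeting Minutes

Suppose you photographed a meeting whiteboard that contains:

A meeting title
Several discussion points (as a list)
A simple table (task assignments)
Some handwritten notes

def process_meeting_whiteboard(image_path):
    """Process a photo of a meeting whiteboard."""
    # Load the image
    image = Image.open(image_path)

    # Recognize
    result = ocr.recognize(image, return_details=True)

    # Build the meeting minutes
    meeting_minutes = []

    # Add the title: the first title block recognized with high confidence
    for block in result.text_blocks:
        if block.type == 'title' and block.confidence > 0.8:
            meeting_minutes.append(f"# {block.text}\n")
            break

    # Add the time (the time the script runs, used as the meeting time)
    from datetime import datetime
    meeting_minutes.append(f"**Meeting time**: {datetime.now().strftime('%Y-%m-%d %H:%M')}\n")

    # Add the discussion points
    meeting_minutes.append("## Discussion Points\n")
    list_items = [b for b in result.text_blocks if b.type == 'list_item']
    for item in list_items:
        meeting_minutes.append(f"- {item.text}")

    # Add the task assignment table
    if result.tables:
        meeting_minutes.append("\n## Task Assignments\n")
        meeting_minutes.append(result.tables[0].to_markdown())

    # Add the notes
    meeting_minutes.append("\n## Notes\n")
    paragraphs = [b for b in result.text_blocks if b.type == 'paragraph']
    for para in paragraphs[:3]:  # use only the first three paragraphs as notes
        meeting_minutes.append(para.text + "\n")

    # Save the result
    output = "\n".join(meeting_minutes)
    with open("meeting_minutes.md", "w", encoding="utf-8") as f:
        f.write(output)

    return output

# Usage example
minutes = process_meeting_whiteboard("whiteboard_photo.jpg")
print("Generated meeting minutes:")
print(minutes)

This script automatically:

Uses the title recognized on the whiteboard as the meeting topic
Adds the current time, that is, the time the script runs, as the meeting time
Extracts the list items as discussion points
Uses the first recognized table as the task assignments
Takes up to three remaining paragraphs as notes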

5.2 Case 2: Turning Academic Paper Images into a Structured Document

Academic papers typically contain:

Complex mathematical formulas
Data tables
Figures
Multi-level headings
References

def process_academic_paper(image_path):
    """Process an image of an academic paper."""
    image = Image.open(image_path)
    result = ocr.recognize(image, return_details=True)

    # Build the structured document
    sections = []

    # Sort the text blocks by position (top to bottom)
    sorted_blocks = sorted(result.text_blocks, key=lambda x: x.bbox[1])

    current_section = []
    current_level = 0

    for block in sorted_blocks:
        if block.type == 'title':
            # Estimate the heading level (a text-length heuristic, see estimate_title_level)
            title_level = estimate_title_level(block)

            # A new section starts here, so save the current one
            if current_section:
                sections.append(("\n".join(current_section), current_level))
                current_section = []

            current_level = title_level
            current_section.append(f"{'#' * title_level} {block.text}")
        else:
            current_section.append(block.text)

    # Add the last section
    if current_section:
        sections.append(("\n".join(current_section), current_level))

    # Add the formulas (collected at the end, not at their original positions)
    if result.formulas:
        sections.append(("## Formulas", 2))
        for formula in result.formulas:
            sections.append((f"$$\n{formula.latex}\n$$", 3))

    # Add the tables (also collected at the end)
    if result.tables:
        sections.append(("## Data Tables", 2))
        for table in result.tables:
            sections.append((table.to_markdown(), 3))

    # Generate the final document
    final_doc = []
    for content, level in sections:
        final_doc.append(content)

    output = "\n\n".join(final_doc)

    with open("paper_structured.md", "w", encoding="utf-8") as f:
        f.write(output)

    return output

def estimate_title_level(block):
    """Estimate the heading level from features of the text block."""
    # Simple heuristic based on text length only: shorter headings are
    # assumed to be higher-level
    text_length = len(block.text)
    if text_length < 20:
        return 1  # level-1 heading
    elif text_length < 40:
        return 2  # level-2 heading
    else:
        return 3  # level-3 heading
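Text length is a weak signal for heading level. A variant closer to judging by font size compares the height of the heading's bounding box with the median height of body lines. The sketch below is only an illustration under stated assumptions: boxes are `(left, top, right, bottom)`, each box covers a single line of text, and the thresholds 1.8 and 1.4 are arbitrary starting points that would need tuning on real documents.

```python
from statistics import median


def line_height(bbox):
    """Height of a (left, top, right, bottom) box."""
    return bbox[3] - bbox[1]


def estimate_title_level_by_height(title_bbox, body_bboxes):
    """Map heading height relative to the median body-line height onto levels 1 to 3."""
    body = median(line_height(b) for b in body_bboxes)
    ratio = line_height(title_bbox) / body
    if ratio >= 1.8:
        return 1
    if ratio >= 1.4:
        return 2
    return 3


# Three body lines with heights 20, 20 and 21 pixels (median 20)
body_lines = [(0, 100, 400, 120), (0, 130, 400, 150), (0, 160, 400, 181)]
levels = [
    estimate_title_level_by_height((0, 0, 300, 40), body_lines),    # height 40
    estimate_title_level_by_height((0, 50, 300, 80), body_lines),   # height 30
    estimate_title_level_by_height((0, 90, 300, 114), body_lines),  # height 24
]
print(levels)
```

Using the median, not the mean, keeps one oversized block from skewing the body-text baseline.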

5.3 Case 3: Batch Processing Document Images

If you have many documents to process, you can handle them in a batch:

import os
from datetime import datetime
from pathlib import Path

from PIL import Image

def batch_process_ocr(input_folder, output_folder):
    """Process every image in a folder."""
    input_path = Path(input_folder)
    output_path = Path(output_folder)
    output_path.mkdir(parents=True, exist_ok=True)

    # Supported image formats
    image_extensions = ['.jpg', '.jpeg', '.png', '.bmp', '.tiff']

    processed_count = 0
    error_files = []

    for image_file in input_path.iterdir():
        if image_file.suffix.lower() in image_extensions:
            try:
                print(f"Processing: {image_file.name}")

                # Load the image
                image = Image.open(image_file)

                # Recognize
                result = ocr.recognize(image)

                # Save the result
                output_file = output_path / f"{image_file.stem}.md"
                with open(output_file, "w", encoding="utf-8") as f:
                    f.write(result.markdown)

                processed_count += 1
                print(f"  Done: {output_file}")

            except Exception as e:
                # Record the failure and continue with the next file
                error_files.append((image_file.name, str(e)))
                print(f"  Error: {e}")

    # Generate the processing report
    report = [
        "# OCR Batch Processing Report",
        f"**Processed at**: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}",
        f"**Input folder**: {input_folder}",
        f"**Output folder**: {output_folder}",
        f"**Succeeded**: {processed_count} files",
    ]

    if error_files:
        report.append(f"**Failed**: {len(error_files)} files")
        report.append("## Failed Files")
        for filename, error in error_files:
            report.append(f"- {filename}: {error}")

    report_path = output_path / "processing_report.md"
    with open(report_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(report))

    print("\nBatch processing complete!")
    print(f"Succeeded: {processed_count} files")
    if error_files:
        print(f"Failed: {len(error_files)} files")
        print(f"Detailed report: {report_path}")

    return processed_count, error_files

# Usage example
# batch_process_ocr("input_images", "output_markdown")

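6. Advanced Tips and Optimization

6.1 Improving Recognition Accuracy

DeepSeek-OCR-2 is already accurate, but a few techniques can improve the recognition rate further:

def optimize_ocr_recognition(image_path):
    """Optimize the OCR recognition process."""
    from PIL import Image, ImageEnhance

    # 1. Preprocess the image
    image = Image.open(image_path)

    # Adjust the contrast
    enhancer = ImageEnhance.Contrast(image)
    image = enhancer.enhance(1.5)  # increase contrast by 50%

    # Adjust the brightness
    enhancer = ImageEnhance.Brightness(image)
    image = enhancer.enhance(1.1)  # increase brightness by 10%

    # Convert to grayscale (works better for black-and-white documents)
    if image.mode != 'L':
        image = image.convert('L')

    # 2. Use a more detailed configuration
    config = {
        'detect_rotation': True,  # detect rotation automatically
        'language': 'chinese_english',  # mixed Chinese and English
        'paragraph_break': '\n\n',  # paragraph separator
        'table_structure': True,  # recognize table structure
        'formula_detection': True,  # recognize formulas
    }

    # 3. Run recognition
    result = ocr.recognize(image, **config)

    return result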

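6.2 Handling Special Layouts

Some documents have special layout features such as multiple columns, footnotes, or headers and footers:

def handle_special_layouts(image_path):
    """Handle documents with special layouts."""
    # reorganize_columns, detect_header_footer and remove_header_footer are
    # helper functions that this tutorial does not define; you need to supply them
    image = Image.open(image_path)
    result = ocr.recognize(image, return_details=True)

    # Detect a multi-column layout
    columns = detect_columns(result.text_blocks)

    if len(columns) > 1:
        print("Multi-column layout detected")
        # Reorganize the content column by column
        reorganized_content = reorganize_columns(columns)
        return reorganized_content

    # Detect headers and footers
    header_footer = detect_header_footer(result.text_blocks, image.size)

    if header_footer['header'] or header_footer['footer']:
        print("Header or footer detected")
        # Remove the header and footer
        content_without_hf = remove_header_footer(result, header_footer)
        return content_without_hf

    return result.markdown

def detect_columns(text_blocks, min_gap=50):
    """Detect columns by merging blocks whose horizontal extents overlap."""
    # Bucketing block centers into fixed 100-pixel bands would split almost any
    # page into several "columns", so group by horizontal overlap instead.
    # A block that spans the full page width (such as a title) merges everything
    # into one column; filter such blocks out first if that is a problem.
    columns = []  # each entry: [left, right, blocks]
    for block in sorted(text_blocks, key=lambda b: b.bbox[0]):
        left, right = block.bbox[0], block.bbox[2]
        if columns and left - columns[-1][1] < min_gap:
            # Overlaps or nearly touches the current column: merge into it
            columns[-1][1] = max(columns[-1][1], right)
            columns[-1][2].append(block)
        else:
            # A horizontal gap of at least min_gap pixels starts a new column
            columns.append([left, right, [block]])

    # Columns are already ordered left to right
    return [blocks for _, _, blocks in columns]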
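`handle_special_layouts` calls three helpers that are not shown: `reorganize_columns`, `detect_header_footer`, and `remove_header_footer`. One possible sketch of them follows. Everything here is an assumption made for illustration: blocks are taken to expose `text` and a `(left, top, right, bottom)` `bbox`, the helpers return plain text joined by blank lines, and the 8% page margin is an arbitrary default. `SimpleNamespace` objects stand in for real OCR results.

```python
from types import SimpleNamespace


def reorganize_columns(columns):
    """Read each column top to bottom, moving through the columns left to right."""
    parts = []
    for blocks in columns:
        ordered = sorted(blocks, key=lambda b: b.bbox[1])
        parts.append("\n\n".join(b.text for b in ordered))
    return "\n\n".join(parts)


def detect_header_footer(text_blocks, image_size, margin=0.08):
    """Flag blocks that lie entirely inside the top or bottom margin of the page."""
    _, height = image_size  # PIL reports size as (width, height)
    header = [b for b in text_blocks if b.bbox[3] <= height * margin]
    footer = [b for b in text_blocks if b.bbox[1] >= height * (1 - margin)]
    return {"header": header, "footer": footer}


def remove_header_footer(result, header_footer):
    """Join the text of every block that was not flagged as header or footer."""
    skip = {id(b) for b in header_footer["header"] + header_footer["footer"]}
    kept = [b for b in result.text_blocks if id(b) not in skip]
    kept.sort(key=lambda b: b.bbox[1])  # stable sort keeps ties in original order
    return "\n\n".join(b.text for b in kept)


def block(text, bbox):
    """Build a stand-in text block for the demo."""
    return SimpleNamespace(text=text, bbox=bbox)


page = SimpleNamespace(text_blocks=[
    block("Annual Report", (50, 20, 750, 60)),
    block("Left column", (50, 120, 380, 400)),
    block("Right column", (420, 120, 750, 400)),
    block("Page 3", (380, 950, 420, 980)),
])
hf = detect_header_footer(page.text_blocks, (800, 1000))
body = remove_header_footer(page, hf)
print(body)

columns_text = reorganize_columns([
    [block("L2", (0, 300, 100, 350)), block("L1", (0, 100, 100, 150))],
    [block("R1", (200, 100, 300, 150))],
])
print(columns_text)
```

A position-only rule like this will also drop legitimate content that happens to sit near the page edge, so on multi-page input it is safer to remove only text that repeats across pages.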

6.3 Customizing the Output Format

Markdown is the default output format, but you can customize the output:

def custom_output_format(result, template="default"):
    """Customize the output format."""
    import html

    if template == "notion":
        # Format adapted for Notion
        output = []

        for block in result.text_blocks:
            if block.type == 'title':
                level = block.metadata.get('level', 1)
                if level == 1:
                    output.append(f"# {block.text}")
                elif level == 2:
                    output.append(f"## {block.text}")
                else:
                    output.append(f"### {block.text}")
            elif block.type == 'list_item':
                indent = block.metadata.get('indent', 0)
                prefix = "  " * indent + "- "
                output.append(f"{prefix}{block.text}")
            else:
                output.append(block.text)

        return "\n".join(output)

    elif template == "html":
        # HTML output
        output = ["<html><body>"]
        in_list = False

        for block in result.text_blocks:
            # Escape characters such as < and & so recognized text cannot break the markup
            text = html.escape(block.text)
            is_item = block.type == 'list_item'

            # Open or close the <ul> wrapper around each run of list items
            if is_item and not in_list:
                output.append("<ul>")
            elif not is_item and in_list:
                output.append("</ul>")
            in_list = is_item

            if block.type == 'title':
                level = block.metadata.get('level', 1)
                output.append(f"<h{level}>{text}</h{level}>")
            elif is_item:
                output.append(f"<li>{text}</li>")
            else:
                output.append(f"<p>{text}</p>")

        if in_list:
            output.append("</ul>")
        output.append("</body></html>")
        return "\n".join(output)

    else:
        # Return Markdown by default
        return result.markdown

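7. Common Problems and Solutions

7.1 What If Recognition Is Slow?

DeepSeek-OCR-2 produces high-quality results, but it is correspondingly compute-intensive. Some ways to speed it up:

import multiprocessing as mp
from pathlib import Path

import torch
from PIL import Image
from deepseek_ocr import DeepSeekOCR

# 1. Use GPU acceleration when it is available
def create_ocr():
    """Create an OCR engine, on the GPU if one is available."""
    if torch.cuda.is_available():
        print("Using GPU acceleration")
        # Specify the device at initialization
        return DeepSeekOCR(device='cuda')
    print("Using the CPU; consider reducing the image size")
    return DeepSeekOCR()

# 2. Lower the image resolution (effective for large images)
def resize_image(image, max_size=2000):
    """Shrink the image so that its longer side is at most max_size pixels."""
    width, height = image.size
    if max(width, height) > max_size:
        ratio = max_size / max(width, height)
        new_size = (int(width * ratio), int(height * ratio))
        return image.resize(new_size, Image.Resampling.LANCZOS)
    return image

# 3. Recognize only the region you need
def recognize_region(ocr, image, region_bbox):
    """Recognize only the given region."""
    # region_bbox format: (left, top, right, bottom)
    cropped = image.crop(region_bbox)
    return ocr.recognize(cropped)

# 4. Use multiple processes for batch jobs.
# The worker functions live at module level because multiprocessing
# cannot pickle functions nested inside another function.
_worker_ocr = None

def init_worker():
    """Pool initializer: each worker process creates its own OCR engine."""
    global _worker_ocr
    _worker_ocr = create_ocr()

def process_single_image(args):
    """Process one image; returns (image_path, succeeded, error_message)."""
    image_path, output_dir = args
    try:
        image = Image.open(image_path)
        result = _worker_ocr.recognize(image)
        output_path = Path(output_dir) / f"{Path(image_path).stem}.md"
        with open(output_path, "w", encoding="utf-8") as f:
            f.write(result.markdown)
        return (image_path, True, None)
    except Exception as e:
        return (image_path, False, str(e))

def batch_process_parallel(image_paths, output_dir, num_workers=4):
    """Process images in parallel."""
    # Every worker loads its own copy of the model, so keep num_workers small on one GPU.
    # CUDA requires the "spawn" start method in child processes; call this function
    # from under an `if __name__ == "__main__":` guard.
    ctx = mp.get_context("spawn")
    with ctx.Pool(num_workers, initializer=init_worker) as pool:
        args = [(path, output_dir) for path in image_paths]
        results = pool.map(process_single_image, args)
    return results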

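7.2 What If the Result Contains Errors?

Even the best OCR tool makes recognition errors now and then. Here are some ways to correct them:

import re

def correct_ocr_errors(text, correction_rules=None):
    """Correct common OCR errors."""
    if correction_rules is not None:
        # Caller-supplied rules are applied everywhere as plain wrong -> right replacements
        corrected = text
        for wrong, right in correction_rules.items():
            corrected = corrected.replace(wrong, right)
        return corrected

    # Default: fix letters misread in place of digits (O for 0, I or l for 1, S for 5),
    # but only inside tokens that already contain a digit. Replacing globally
    # (every 'O' -> '0', every 'rn' -> 'm') would corrupt correct text, so
    # word-level confusions such as rn/m and cl/d are better left to a spell checker.
    # This is a heuristic: a token such as "5S" also becomes "55".
    letter_to_digit = str.maketrans('OoIlS', '00115')
    pattern = re.compile(r'(?<![A-Za-z0-9])(?=[0-9OoIlS]*\d)[0-9OoIlS]+(?![A-Za-z0-9])')
    return pattern.sub(lambda m: m.group().translate(letter_to_digit), text)

def interactive_correction(result):
    """Interactively correct recognition errors."""
    import difflib
    import sys

    print("Original recognition result:")
    print(result.markdown)
    print("\n" + "="*50 + "\n")

    # input() returns a single line, so read the whole corrected text until end of input
    print("Enter the corrected text, then press Ctrl-D (Ctrl-Z and Enter on Windows); "
          "enter nothing to keep the original:")
    corrected = sys.stdin.read()

    if corrected.strip():
        # Compute the differences
        original_lines = result.markdown.split('\n')
        corrected_lines = corrected.split('\n')

        diff = difflib.unified_diff(
            original_lines, corrected_lines,
            lineterm='', fromfile='original', tofile='corrected'
        )

        print("\nDifferences:")
        print('\n'.join(diff))

        return corrected
    else:
        print("Using the original text")
        return result.markdown

def use_spell_checker(text):
    """Spell-check English words, leaving whitespace and all other text untouched."""
    try:
        from spellchecker import SpellChecker
    except ImportError:
        print("Install the spell checker first: pip install pyspellchecker")
        return text

    spell = SpellChecker()

    # Split on whitespace but keep the separators, so line breaks and Markdown layout survive
    tokens = re.split(r'(\s+)', text)
    corrected_tokens = []

    for token in tokens:
        # Check only pure ASCII-letter words: str.isalpha() alone is also True for
        # Chinese characters. Words the checker already knows are left as they are.
        if token.isascii() and token.isalpha() and spell.unknown([token]):
            corrected_tokens.append(spell.correction(token) or token)
        else:
            corrected_tokens.append(token)

    return ''.join(corrected_tokens)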

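7.3 Handling Blurry or Low-Quality Images

Poor-quality images need extra preprocessing:

def enhance_image_quality(image_path):
    """Enhance image quality to improve OCR accuracy."""
    import cv2
    from pathlib import Path

    # Use OpenCV for more specialized processing
    img = cv2.imread(image_path)
    if img is None:
        # cv2.imread returns None, without raising, when the file cannot be read
        raise FileNotFoundError(f"Could not read image: {image_path}")

    # Convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Enhance contrast with CLAHE (especially effective for unevenly lit images)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
    enhanced = clahe.apply(gray)

    # Denoise
    denoised = cv2.fastNlMeansDenoising(enhanced, h=10)

    # Binarize (black and white) with Otsu's threshold
    _, binary = cv2.threshold(denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Save next to the original, e.g. scan.jpg -> scan_enhanced.jpg
    # (replacing every '.' in the path would break paths such as ./scan.v2.jpg)
    path = Path(image_path)
    output_path = str(path.with_name(f"{path.stem}_enhanced{path.suffix}"))
    cv2.imwrite(output_path, binary)

    print(f"Enhanced image saved: {output_path}")
    return output_path

def handle_blurry_images(image_path, method="unsharp"):
    """Sharpen a blurry image with either a sharpening kernel or unsharp masking."""
    import cv2
    import numpy as np
    from pathlib import Path

    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")

    if method == "kernel":
        # Option 1: a sharpening filter
        kernel = np.array([[-1,-1,-1],
                           [-1, 9,-1],
                           [-1,-1,-1]])
        sharpened = cv2.filter2D(img, -1, kernel)
    else:
        # Option 2: unsharp masking (subtract a blurred copy to boost edges)
        blurred = cv2.GaussianBlur(img, (0, 0), 3)
        sharpened = cv2.addWeighted(img, 1.5, blurred, -0.5, 0)

    path = Path(image_path)
    output_path = str(path.with_name(f"{path.stem}_sharpened{path.suffix}"))
    cv2.imwrite(output_path, sharpened)

    return output_path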