Splitting Long Images at Content-Aware Breakpoints

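Abstract: This article presents a system for splitting long images at content-aware breakpoints and captioning the slices automatically. Using OpenCV grayscale analysis and a row-gradient check, it searches within ±100 px of each target height for the best place to cut, so that blocks of text are not sliced in half. A GPT-4V vision model then writes a social-media-ready caption for each slice, with export to txt/Excel. The system follows a PRA workflow design and comprises an image-processing module, an AI analysis module, and a GUI, automating the full path from splitting to captioning for publishing scenarios such as WeChat Official Accounts and Xiaohongshu.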

木子乔乔

木子乔乔 · Published 2025-06-19 19:50:15

🎯 Goal:

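Split a long image into several slices that look natural and do not cut through blocks of text or charts, so they can be read in sequence on social media.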

🧠 Design approach

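Read the long image → convert to grayscale → measure how much the pixels change from each row to the next (the vertical gradient)
Treat regions where the content changes little as candidate breakpoints
Start from a fixed slice height and adjust each breakpoint to the flattest row within ±100 px of it
Write out the slices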
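
To make the "row change" measure concrete, here is a minimal sketch on a synthetic grayscale array. The helper name `row_change_profile` and the toy image are illustrative assumptions, not part of the original script. The cast to a signed type is a precaution: subtracting `uint8` rows directly would wrap around instead of going negative.

```python
import numpy as np

def row_change_profile(gray):
    """Sum of absolute differences between each row and the row below it."""
    # Cast to a signed type first: subtracting uint8 arrays wraps around.
    rows = gray.astype(np.int32)
    return np.abs(np.diff(rows, axis=0)).sum(axis=1)

# Synthetic 60x40 "page": noisy content, a blank white band, then more content.
rng = np.random.default_rng(0)
page = rng.integers(0, 256, size=(60, 40), dtype=np.uint8)
page[25:35] = 255  # rows 25..34 are uniformly white

profile = row_change_profile(page)
flattest = int(np.argmin(profile))  # index of the least-changing row pair
print(flattest, profile[flattest])
```

The flattest entry falls inside the white band, which is exactly the kind of place where a cut does no harm.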

✅ Implementation (with comments)

import cv2
import numpy as np
import os

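def find_smart_cut_positions(gray_img, slice_height, margin=100):
    """
    Find cut positions from grayscale changes so that text and graphics are not cut through.
    Around each target height, search within ±margin for the row boundary
    where the pixels change the least, and cut there.
    """
    height = gray_img.shape[0]
    cut_positions = [0]

    y = slice_height
    while y < height:
        # Keep every cut at least 100 px below the previous one
        start = max(y - margin, cut_positions[-1] + 100)
        end = min(y + margin, height - 1)
        if end - start < 2:
            break  # window too small to compare adjacent rows
        # Cast to a signed type: differences of uint8 rows would wrap around
        region = gray_img[start:end].astype(np.int32)

        # Per-row change: sum of absolute differences between adjacent rows
        diffs = np.abs(np.diff(region, axis=0)).sum(axis=1)
        best_offset = int(np.argmin(diffs))
        # diffs[i] compares rows i and i+1, so the cut goes between them
        best_y = start + best_offset + 1
        cut_positions.append(best_y)
        y = best_y + slice_height

    cut_positions.append(height)
    return cut_positions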

def smart_split_image(image_path, output_dir, slice_height=1350):
    os.makedirs(output_dir, exist_ok=True)

    img = cv2.imread(image_path)
    if img is None:
        print(f"Failed to read: {image_path}")
        return

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cut_positions = find_smart_cut_positions(gray, slice_height)

    base_name = os.path.splitext(os.path.basename(image_path))[0]
    for i in range(len(cut_positions) - 1):
        start = cut_positions[i]
        end = cut_positions[i + 1]
        slice_img = img[start:end]
        output_path = os.path.join(output_dir, f"{base_name}_smart_part{i+1}.png")
        cv2.imwrite(output_path, slice_img)
        print(f"Saved: {output_path} ({end - start}px tall)")

if __name__ == "__main__":
    # Settings
    image_path = "your_long_image.png"  # replace with the path to your long image
    output_dir = "./smart_slices"
    slice_height = 1350  # approximate target height per slice; adjust per platform

    smart_split_image(image_path, output_dir, slice_height)

📦 What it does

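Rather than dividing the image into equal parts, it searches within ±100 px of each target height for the row where the grayscale changes least and cuts there.
In most cases this avoids cutting through text and charts, which suits content-heavy long images such as:
- WeChat Official Account article screenshots
- Report-style long images
- Xiaohongshu-style image-and-text cards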
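
One refinement worth considering: `np.argmin` returns the first minimum, so inside a blank band the cut lands at the band's top edge, right under the preceding content. The sketch below is an assumption of mine rather than part of the original script; it picks the middle of the widest flat run instead, leaving padding on both sides of the cut. The function name and the `tolerance` parameter are hypothetical.

```python
import numpy as np

def center_of_widest_flat_band(profile, tolerance=0):
    """Index at the middle of the longest run where profile <= min + tolerance."""
    profile = np.asarray(profile)
    flat = profile <= profile.min() + tolerance
    best_start, best_len = 0, 0
    run_start = None
    for i, is_flat in enumerate(flat):
        if is_flat and run_start is None:
            run_start = i  # a new flat run begins
        elif not is_flat and run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None  # the run has ended
    # Handle a run that extends to the end of the profile.
    if run_start is not None and len(flat) - run_start > best_len:
        best_start, best_len = run_start, len(flat) - run_start
    return best_start + best_len // 2

# Toy profile: a one-row gap at index 1, a five-row gap at 3..7, a two-row gap at 9..10.
profile = [9, 0, 7, 0, 0, 0, 0, 0, 8, 0, 0, 5]
print(int(np.argmin(profile)))              # first minimum
print(center_of_widest_flat_band(profile))  # middle of the widest gap
```

To try it in the splitter, the returned index could stand in for the `np.argmin(diffs)` result; a small positive `tolerance` helps on screenshots with compression noise.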

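🧰 Optional enhancements:

Feature	Description
🔢 Batch-process a folder	Process multiple long images automatically
🧠 OCR-aware breakpoints	Use text recognition to keep cuts away from body text
🖼️ Visualize breakpoints	Draw the cut lines on the original image for preview
💬 Auto-caption each slice	Use GPT to analyze each image and write a caption (can be combined with Edge TTS)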
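
As an illustration of the breakpoint preview, the following sketch paints a horizontal line at each interior cut using plain NumPy indexing. `draw_cut_lines` is a hypothetical helper, and the canvas and cut positions are made up for the demo; in practice the image would come from `cv2.imread` and the positions from `find_smart_cut_positions`.

```python
import numpy as np

def draw_cut_lines(img, cut_positions, color=(0, 0, 255), thickness=3):
    """Return a copy of a BGR image with a horizontal line at each interior cut."""
    preview = img.copy()
    height = preview.shape[0]
    for y in cut_positions:
        if 0 < y < height:  # skip the top and bottom edges
            top = max(y - thickness // 2, 0)
            preview[top:top + thickness] = color  # paint full-width rows
    return preview

# Tiny demo on a blank white canvas with made-up cut positions.
canvas = np.full((300, 50, 3), 255, dtype=np.uint8)
preview = draw_cut_lines(canvas, [0, 100, 210, 300])
print(preview[100, 0], preview[50, 0])
```

The result can be written out with `cv2.imwrite("preview.png", preview)` and checked by eye before the real slices are saved.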

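The rest of this article covers the automatic captioning feature:

For each slice, generate a one-sentence caption that can be used as accompanying copy, for publishing, or as a podcast voice-over.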

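🎯 Goal:

Generate a short, engaging caption for every slice, suitable for Xiaohongshu-style image commentary, Weibo image posts, WeChat Moments posts, or a podcast script.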

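🧠 Overall workflow:

Original long image
   ↓
Smart splitting (content-aware breakpoints)
   ↓
Each slice → image analysis → one auto-generated sentence
   ↓
Output: images + matching captions + a combined text file (for reading or podcast use)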

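✅ Auto-captioning script (using a GPT API)

The script below sends each image to OpenAI (or your own GPT endpoint) for image understanding and gets back a short text description.

🔧 Sample code (requires network access and an API key)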

import base64
import os
from io import BytesIO

from openai import OpenAI  # client interface of openai>=1.0
from PIL import Image

# Keeping the key in an environment variable is recommended
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def encode_image(image_path):
    """Convert an image to a base64-encoded JPEG string."""
    with Image.open(image_path) as img:
        img = img.convert("RGB")
        buffered = BytesIO()
        img.save(buffered, format="JPEG")
        return base64.b64encode(buffered.getvalue()).decode("utf-8")

def generate_caption(image_path, prompt_prefix="What does this image show? Describe it in one sentence:"):
    base64_image = encode_image(image_path)
    response = client.chat.completions.create(
        model="gpt-4o",  # the original used "gpt-4-vision-preview", an older preview model
        messages=[
            {"role": "system", "content": "You are a content-creation assistant who writes social media captions for images."},
            {"role": "user", "content": [
                {"type": "text", "text": prompt_prefix},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}
            ]}
        ],
        max_tokens=100
    )
    # message.content can be None, so fall back to an empty string
    caption = (response.choices[0].message.content or "").strip()
    return caption

def process_all_images(image_folder, output_text="captions.txt"):
    result = []
    for filename in sorted(os.listdir(image_folder)):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            img_path = os.path.join(image_folder, filename)
            print(f"Generating caption: {filename}")
            caption = generate_caption(img_path)
            result.append((filename, caption))

    # Save as a text file
    with open(output_text, "w", encoding="utf-8") as f:
        for filename, caption in result:
            f.write(f"{filename}: {caption}\n")

    print(f"Captions saved to: {output_text}")
    return result
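
One detail to watch in `process_all_images`: `sorted(os.listdir(...))` compares names as plain strings, so once there are ten or more slices, `part10` sorts before `part2`. A possible fix is a numeric-aware sort key, sketched below. `natural_key` is a hypothetical helper, not part of the original script.

```python
import re

def natural_key(filename):
    """Split a name into text and integer chunks so numbers compare by value."""
    # With a capturing group, re.split alternates text, digits, text, ...
    parts = re.split(r"(\d+)", filename)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

names = ["doc_smart_part10.png", "doc_smart_part2.png", "doc_smart_part1.png"]
print(sorted(names))                   # plain string sort
print(sorted(names, key=natural_key))  # numeric-aware sort
```

Passing `key=natural_key` to the `sorted` call in the loop would keep the captions in the same order as the slices.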

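📌 Sample output

smart_part1.png: Introduces the three-layer architecture of data governance, with clear layering.
smart_part2.png: Shows the specific paths along which data flows through the platform, with clearly directed arrows.
smart_part3.png: Lists the categories of key data assets, easy for newcomers to follow.
...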

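The captions can also serve as a podcast script:

🔈 "This page shows how data flows through the platform, from collection at the source to governance in the data middle platform, in a clear sequence."
🔈 "The next page introduces how AI is applied in the data-processing stage."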

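🛠️ Requirements

OpenAI API key
Python 3.8+
Install the dependencies (opencv-python and numpy are needed by the splitting script above):

pip install openai pillow opencv-python numpy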

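🚀 Suggested extensions

Feature	Description
🎙️ Voice narration	Generate audio with Edge TTS / ElevenLabs
🖼️ Batch image + text export	Output a Word/PPT collection of images and captions
🧾 Excel caption summary	Save image filenames and captions to an Excel sheet
📲 One-click packaging	Bundle images, captions, and audio for publishing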
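
For the Excel summary, one lightweight option is to write a CSV file, which Excel opens directly. This is a sketch under that assumption, not a true .xlsx export; `save_captions_csv` is a hypothetical helper that takes the `(filename, caption)` list returned by `process_all_images`.

```python
import csv

def save_captions_csv(results, csv_path="captions.csv"):
    """Write (filename, caption) pairs to a CSV file that Excel can open."""
    # utf-8-sig adds a BOM so Excel detects UTF-8 and renders non-ASCII text.
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "caption"])
        writer.writerows(results)
    return csv_path

# Made-up sample row standing in for real captioning output.
sample = [("doc_smart_part1.png", "Introduces the three-layer data governance architecture.")]
save_captions_csv(sample)
```

A native .xlsx file would need a third-party package such as openpyxl; the CSV route keeps the dependency list unchanged.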
