DeepSeek测评：元宝满血版VS本地量化

比较两个JSON文件中的标注框，若两个框的坐标完全一致且标签相同，则视为重复，仅保留一个。可通过遍历所有框并使用集合或字典记录已存在的框来实现。

AI浩

801人浏览 · 2025-02-20 20:38:45

AI浩 · 2025-02-20 20:38:45 发布

测评任务

我在不同时期标注了两份Labelme格式的json数据，将其合并，如果重合区域超过了90%，默认为一个检测框。

测试元宝满血版的DeepSeek

问：合并两个Labelme标注的json，里面有相同的框，则保留一个

要合并两个Labelme标注的JSON文件并去重相同框，可通过以下步骤实现：

解析JSON数据
使用Python读取两个JSON文件，提取所有标注框的坐标和标签信息。Labelme的标注框数据通常包含points（坐标数组）和label（类别名称）字段。
定义去重逻辑
比较两个JSON文件中的标注框，若两个框的坐标完全一致且标签相同，则视为重复，仅保留一个。可通过遍历所有框并使用集合或字典记录已存在的框来实现。
合并数据并生成新JSON
将去重后的标注框合并到同一数据结构中，并关联对应的图片路径。最后将合并结果写入新的JSON文件。

示例代码框架（需根据实际JSON结构调整）：

import json

def load_json(file_path):
    with open(file_path, 'r') as f:
        return json.load(f)

def merge_annotations(json1, json2):
    merged = json1.copy()
    seen = set()
    for shape in json2['shapes']:
        points = tuple(shape['points'])
        label = shape['label']
        key = (points, label)
        if key not in seen:
            seen.add(key)
            merged['shapes'].append(shape)
    return merged

# 使用示例
json1 = load_json('file1.json')
json2 = load_json('file2.json')
merged_json = merge_annotations(json1, json2)
with open('merged.json', 'w') as f:
    json.dump(merged_json, f)

注意事项：

若标注框坐标存在微小差异（如浮点数精度问题），需添加容差判断。
合并后需检查图片路径是否一致，避免关联错误。

这个回答显然是错的。

再问：合并两个Labelme标注的json，里面有相同的框，则保留一个，通过重叠区域来判断是否是同一个区域，如果重合度超过了95%，则是同一个

以下是基于 重叠区域判断 合并两个Labelme标注JSON的完整代码实现，支持矩形和多边形标注的去重：

完整代码

import json
from shapely.geometry import Polygon

def load_labelme_json(file_path):
    """加载Labelme格式的JSON文件"""
    with open(file_path, 'r') as f:
        data = json.load(f)
    return data

def calculate_overlap(poly1, poly2):
    """计算两个多边形的重叠比例（基于较小面积）"""
    try:
        # 计算交集面积
        intersection = poly1.intersection(poly2).area
        # 计算较小面积
        min_area = min(poly1.area, poly2.area)
        return intersection / min_area if min_area > 0 else 0.0
    except:
        return 0.0  # 处理无效几何图形

def is_same_region(shape1, shape2, overlap_threshold=0.95):
    """判断两个标注是否为同一区域"""
    # 标签不同直接返回False
    if shape1['label'] != shape2['label']:
        return False
    
    # 将坐标转换为Shapely多边形
    try:
        poly1 = Polygon(shape1['points'])
        poly2 = Polygon(shape2['points'])
    except:
        return False  # 处理无效坐标
    
    # 计算重叠比例
    overlap_ratio = calculate_overlap(poly1, poly2)
    return overlap_ratio >= overlap_threshold

def merge_labelme_jsons(json1, json2, output_path='merged.json'):
    """合并两个Labelme JSON文件并去重"""
    # 校验是否为同一张图片
    if json1['imagePath'] != json2['imagePath'] or json1['imageHeight'] != json2['imageHeight'] or json1['imageWidth'] != json2['imageWidth']:
        raise ValueError("JSON文件不来自同一张图片，无法合并！")
    
    merged = json1.copy()
    existing_shapes = merged['shapes']
    
    # 遍历第二个JSON的所有标注
    for shape2 in json2['shapes']:
        is_duplicate = False
        # 检查是否与已有标注重复
        for shape1 in existing_shapes:
            if is_same_region(shape1, shape2):
                is_duplicate = True
                break
        if not is_duplicate:
            existing_shapes.append(shape2)
    
    # 保存合并后的JSON
    with open(output_path, 'w') as f:
        json.dump(merged, f, indent=2)
    print(f"合并完成，结果已保存至 {output_path}")

# 使用示例
if __name__ == '__main__':
    json1 = load_labelme_json('annotations1.json')
    json2 = load_labelme_json('annotations2.json')
    merge_labelme_jsons(json1, json2)

代码说明

核心功能
- is_same_region(): 通过标签一致性 + 重叠度判断是否为同一区域
- calculate_overlap(): 基于Shapely库计算多边形重叠比例
- merge_labelme_jsons(): 主函数实现合并与去重逻辑
支持标注类型
- 矩形（自动转换为4点多边形）
- 多边形（任意点数）
- 其他形状（需确保points字段格式正确）
关键参数
- overlap_threshold=0.95: 可调整的重叠阈值（默认95%）
- 自动处理浮点数精度问题（通过Shapely几何计算）

使用示例

假设有两个标注文件：

annotations1.json: 包含标签为cat的矩形框 [10,10,50,50]
annotations2.json: 包含标签为cat的矩形框 [12,12,50,50]（轻微偏移）

运行合并后：

两个框的重叠比例为 (38*38)/(40*40) = 90.25%（不满足95%阈值）
合并结果会保留两个标注

若将第二个框改为 [11,11,50,50]，则重叠比例达到 95.06%，合并后仅保留一个。

注意事项

依赖安装
```
pip install shapely
```

性能优化

如果标注数量较大（>1000），建议使用空间索引（如R-tree）加速查询：

from shapely.strtree import STRtree
# 构建空间索引
polygons = [Polygon(shape['points']) for shape in existing_shapes]
tree = STRtree(polygons)

特殊形状处理

对于圆形标注（Labelme中用shape_type: circle表示），需先将中心点+半径转换为多边形近似：

import numpy as np
def circle_to_polygon(center, radius, num_points=20):
    angles = np.linspace(0, 2*np.pi, num_points)
    points = [(center[0] + radius*np.cos(a), center[1] + radius*np.sin(a)) for a in angles]
    return points

这个回答也是错了，没有理解Labelme的shape[‘points’]，于是我将shape[‘points’]告诉它。

继续问：‘points’: [[660, 494], [706, 539]], ‘shape_type’: ‘rectangle’

根据提供的矩形标注框信息 'points': [[660, 494], [706, 539]], 'shape_type': 'rectangle'，其表示一个轴对齐矩形，左上角坐标为 $(660, 494)$ ，右下角坐标为 $(706, 539)$ 。在合并Labelme标注JSON时，若需判断该矩形与其他矩形是否重复（重合度超过95%），可参考以下步骤：

判断逻辑

坐标转换
将两个矩形标注的坐标转换为Shapely的Polygon对象：

from shapely.geometry import Polygon

# 当前矩形坐标转换
rect1 = Polygon([(660, 494), (706, 494), (706, 539), (660, 539)])

# 假设另一矩形坐标为 [[x1,y1], [x2,y2]]
rect2 = Polygon([(x1, y1), (x2, y1), (x2, y2), (x1, y2)])

重叠比例计算
计算交集面积与较小矩形面积的比值：
$overlap_ratio = intersection_area min ⁡ ( area 1 , area 2 ) \text{overlap\_ratio} = \frac{\text{intersection\_area}}{\min(\text{area}_1, \text{area}_2)}$
判定条件
若 overlap_ratio ≥ 0.95 且标签一致，则视为重复，仅保留一个标注。

代码实现示例

def is_duplicate_rectangle(shape1, shape2):
    # 标签不同直接返回False
    if shape1['label'] != shape2['label']:
        return False
    
    # 提取坐标点并构建多边形
    points1 = shape1['points']
    points2 = shape2['points']
    rect1 = Polygon([(points1[0][0], points1[0][1]),
                     (points1[1][0], points1[0][1]),
                     (points1[1][0], points1[1][1]),
                     (points1[0][0], points1[1][1])])
    rect2 = Polygon([(points2[0][0], points2[0][1]),
                     (points2[1][0], points2[0][1]),
                     (points2[1][0], points2[1][1]),
                     (points2[0][0], points2[1][1])])
    
    # 计算重叠比例
    intersection = rect1.intersection(rect2).area
    min_area = min(rect1.area, rect2.area)
    overlap_ratio = intersection / min_area if min_area > 0 else 0.0
    return overlap_ratio >= 0.95

注意事项

坐标顺序
Labelme矩形标注默认使用[[x1,y1], [x2,y2]]格式，需确保转换为多边形时逻辑正确（如代码中的四角点生成）。
性能优化
若需处理大量标注，建议使用空间索引（如STRtree）加速查询。
容差处理
可调整阈值（如0.95→0.9）以适应标注误差。

如需进一步操作（如批量合并JSON文件），可参考完整合并代码。

终于，能用了，但是没有给出我想要的答案。我想要IOU的重合大于95%。

本地Q4量化满血版

问：合并两个Labelme标注的json，里面有相同的框，则保留一个，通过重叠区域来判断是否是同一个区域，如果重合度超过了95%，则是同一个

好的，我需要将两个Labelme格式的JSON文件合并，当遇到相同位置的标注框时（根据超过95%的重叠面积判断为
同一目标），只保留其中一个。以下是我的逐步思考和解决方案：

1. 分析需求

输入: 两个Labelme JSON文件
输出: 合并后的一个JSON文件
核心逻辑:
- 检测两文件中标注框（shape）的重叠度，若≥95%则认为重复，保留其中一个；
- 非重叠的标注全部保留。

2. 技术关键点

如何计算两个矩形框的重叠率？
使用“交并比”（IoU, Intersection over Union）：交集面积 / 最小包围框面积（此处选择更严格的判定方
式，确保高重合度）。
如何处理Labelme JSON结构？
Labelme的JSON结构包含图像信息、标注形状列表等。合并时需要：
1. 读取两个JSON文件的shapes字段；
2. 去重处理；
3. 生成新的JSON文件。

3. 实现步骤

Step 1: 加载并解析两个JSON文件

import json

def load_json(file_path):
    with open(file_path, 'r') as f:
        return json.load(f)

json1 = load_json('file1.json')
json2 = load_json('file2.json')

shapes1 = json1['shapes']
shapes2 = json2['shapes']

Step 2: 定义矩形框重叠率计算函数

def calculate_iou(box1, box2):
    # Labelme的矩形框表示为 [[x1,y1], [x2,y2]]（左上和右下角点）
    x1_min = min(point[0] for point in box1['points'])
    y1_min = min(point[1] for point in box1['points'])
    x1_max = max(point[0] for point in box1['points'])
    y1_max = max(point[1] for point in box1['points'])

    x2_min = min(point[0] for point in box2['points'])
    y2_min = min(point[1] for point in box2['points'])
    x2_max = max(point[0] for point in box2['points'])
    y2_max = max(point[1] for point in box2['points'])

    # 计算交集区域
    inter_x_min = max(x1_min, x2_min)
    inter_y_min = max(y1_min, y2_min)
    inter_x_max = min(x1_max, x2_max)
    inter_y_max = min(y1_max, y2_max)

    if inter_x_max < inter_x_min or inter_y_max < inter_y_min:
        return 0.0

    # 交集面积
    inter_area = (inter_x_max - inter_x_min) * (inter_y_max - inter_y_min)
    
    # 最小包围框面积（此处采用更严格的重叠率定义）
    min_box1_area = (x1_max - x1_min) * (y1_max - y1_min)
    min_box2_area = (x2_max - x2_min) * (y2_max - y2_min)
    
    # 选择较小的面积作为分母，确保高重叠率
    min_area = min(min_box1_area, min_box2_area)
    iou = inter_area / min_area if min_area > 0 else 0.0

    return iou

Step 3: 遍历标注框去重
从shapes2中逐个检查是否与shapes1中的框高度重叠，若未重复则添加到合并列表。

merged_shapes = shapes1.copy()

for shape2 in shapes2:
    is_duplicate = False
    for shape1 in merged_shapes:
        # 仅处理相同标签的矩形（假设其他形状如多边形不参与去重）
        if shape1['shape_type'] == 'rectangle' and \
           shape2['shape_type'] == 'rectangle' and \
           shape1['label'] == shape2['label']:

            iou = calculate_iou(shape1, shape2)
            if iou >= 0.95:
                is_duplicate = True
                break

    if not is_duplicate:
        merged_shapes.append(shape2)

Step 4: 生成合并后的JSON文件
保留第一个文件的图像信息，更新shapes字段。

merged_json = {
    "version": json1["version"],
    "flags": json1["flags"],
    "imagePath": json1["imagePath"],
    "imageData": json1.get("imageData", None),
    "imageHeight": json1["imageHeight"],
    "imageWidth": json1["imageWidth"],
    "shapes": merged_shapes
}

with open('merged.json', 'w') as f:
    json.dump(merged_json, f, indent=2)

4. 注意事项

标签一致性: 仅合并相同label的标注框；
非矩形处理: 当前逻辑跳过非矩形的形状（如多边形、点），若需支持需调整IoU计算方式；
性能优化: 如果标注数量大，双重循环效率低，可改用空间索引（如R-tree）加速查询。

5. 完整代码示例
将上述步骤整合为一个Python脚本：

import json

def load_json(file_path):
    with open(file_path, 'r') as f:
        return json.load(f)

def calculate_iou(box1, box2):
    # ...（同上Step 2的calculate_iou函数）

def merge_labelme_jsons(json1_path, json2_path, output_path='merged.json', iou_threshold=0.95):
    json1 = load_json(json1_path)
    json2 = load_json(json2_path)

    merged_shapes = json1['shapes'].copy()

    for shape2 in json2['shapes']:
        is_duplicate = False
        for shape1 in merged_shapes:
            if (shape1['shape_type'] == 'rectangle' and 
                shape2['shape_type'] == 'rectangle' and 
                shape1['label'] == shape2['label']):
                
                iou = calculate_iou(shape1, shape2)
                if iou >= iou_threshold:
                    is_duplicate = True
                    break
        
        if not is_duplicate:
            merged_shapes.append(shape2)

    # 构建合并后的JSON数据
    merged_json = {
        "version": json1["version"],
        "flags": json1.get("flags", {}),
        "imagePath": json1["imagePath"],
        "imageData": json1.get("imageData"),
        "imageHeight": json1["imageHeight"],
        "imageWidth": json1["imageWidth"],
        "shapes": merged_shapes
    }

    with open(output_path, 'w') as f:
        json.dump(merged_json, f, indent=2)
    print(f"Merged JSON saved to {output_path}")

# 使用示例
merge_labelme_jsons('file1.json', 'file2.json')

6. 测试验证

场景一: 两个完全相同的框 → 保留一个；
场景二: 部分重叠但未超过95%的框 → 全部保留；
场景三: 不同标签的同位置框 → 视为不同，均保留。

建议在合并后通过Labelme工具可视化检查结果是否符合预期。
在这里插入图片描述
直接测试结果。很意外，完全正确，用IOU的方式，方法也符合预期！

DeepSeek技术社区

欢迎加入DeepSeek 技术社区。在这里，你可以找到志同道合的朋友，共同探索AI技术的奥秘。

更多推荐

关于Agent智能体的开发心得——从人的角度去理解智能体

DeepSeek技术社区

【深度学习】大模型-Transformer

DeepSeek技术社区

Kubernetes 上的大数据（三）

在本章中，你学习了如何在 Kubernetes 上部署和管理 Apache Spark、Apache Airflow 和 Apache Kafka 等关键大数据技术。将这些工具部署到 Kubernetes 上提供了多个好处，包括简化操作、更好的资源利用、扩展性、高可用性和统一的集群管理。你首先在 Kubernetes 上部署了 Spark 操作符，并运行了一个 Spark 应用程序来处理来自 Am