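05 · Detection Modules¶

This chapter walks through the two detection submodules line by line:
- swellcamera/swellcamera.py (expansion detection, 208 lines)
- watercamera/watercamera.py (water-content detection, 221 lines)

These are the two most algorithm-heavy files in the project: the voting algorithm, the confidence filter, and the label swap all live here.
Prerequisite: 04 · Main Program and GUI
Next: 06 · Models and Inference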

Part 1: Expansion Detection Module¶

5.1 swellcamera file layout¶

swellcamera/
├── __init__.py         1 line: from .swellcamera import main
├── __main__.py         8 lines: entry point for python -m swellcamera
└── swellcamera.py    208 lines: core logic

How the three entry points relate¶

flowchart LR
    A[python main.py] --> B[main.py:84 import swellcamera] --> C[__init__.py:1 from .swellcamera import main] --> D[swellcamera.swellcamera.main]
    E[python -m swellcamera] --> F[__main__.py:5 from swellcamera import main] --> C
    G[python swellcamera/swellcamera.py] --> D

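Whichever entry point is used, the same main() function ends up being called.

5.2 The RealTimeExpansionClassifier class (swellcamera.py L28-158)¶

This is the core class for expansion detection. main() instantiates it and then drives the main loop.

Constructor (L29-53)¶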

# swellcamera.py:29-53
def __init__(self):
    model_weight_path = get_resource_path(
        os.path.join("models", "best_model4.pth")
    )
    self.classifier = ExpansionClassifier(model_path=model_weight_path)

    self.stats = defaultdict(int)        # per-class occurrence counts
    self.frame_count = 0
    self.last_update_time = time.time()  # statistics are printed once per second

    self.font_path = self._get_font_path()
    self.font_size = 20
    self.text_color = (255, 255, 255)    # white text
    self.bg_color = (0, 0, 0)            # black background

    # label-swap mapping
    self.label_mapping = {
        "膨胀": "不膨胀",
        "不膨胀": "膨胀",
        "expanded": "normal",
        "normal": "expanded"
    }

Note that ExpansionClassifier comes from swellcamera.py:24:

try:
    from models.model4 import AgeClassifier as ExpansionClassifier
except ModuleNotFoundError:
    from model4 import AgeClassifier as ExpansionClassifier

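AgeClassifier is aliased to a more fitting name. The try/except supports two ways of running the code: imported as a package (models.model4 is reachable) versus run directly as a script (models is not on PYTHONPATH).

5.3 ⚠️ The label swap: the most important detail in this project¶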

# swellcamera.py:48-53
self.label_mapping = {
    "膨胀": "不膨胀",
    "不膨胀": "膨胀",
    "expanded": "normal",
    "normal": "expanded"
}

process_frame() applies this mapping at L88-99:

if result and 'class' in result:
    original_class = result['class']
    result['class'] = self.label_mapping.get(original_class, original_class)

    if 'probabilities' in result:
        new_probs = {}
        for k, v in result['probabilities'].items():
            new_key = self.label_mapping.get(k, k)
            new_probs[new_key] = v
        result['probabilities'] = new_probs

The net effect of this code is:

Actual model4 inference output	Shown after label_mapping
`"膨胀"`	`"不膨胀"`
`"不膨胀"`	`"膨胀"`

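Where did the swap come from?¶

The most likely explanation is that the positive and negative labels were flipped during training: images in the "膨胀" (expanded) directory were labeled 0 when they should have been 1, and vice versa. After training, gxl presumably found that the metrics looked fine but the semantics were inverted and, rather than retrain, patched it with a mapping in the inference code.

Problems it causes¶

- Poor readability: when you see class_names = ["膨胀", "不膨胀"] in model4.py, you cannot assume the output carries that meaning; the swap in swellcamera has to be applied on top
- The probability distribution is swapped too: L94-99 also renames the dict keys, so the displayed "probability of 膨胀" is really model4's "probability of 不膨胀"
- Retraining is error-prone: anyone who retrains in the future has to know the swap exists, otherwise the result will be inverted again

→ See 08 · Issues and Improvements, P0-4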
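To make the swap concrete, including its effect on the probability dict, here is a minimal standalone sketch. Only the mapping itself mirrors the source; the `remap_result` helper and the sample probabilities are hypothetical and exist only for illustration.

```python
# Hypothetical standalone helper that mirrors the remapping logic in 5.3.
LABEL_MAPPING = {
    "膨胀": "不膨胀",
    "不膨胀": "膨胀",
    "expanded": "normal",
    "normal": "expanded",
}

def remap_result(result, mapping=LABEL_MAPPING):
    """Return a copy of `result` with the class and probability keys swapped."""
    if not result or "class" not in result:
        return result
    remapped = dict(result)
    remapped["class"] = mapping.get(result["class"], result["class"])
    if "probabilities" in result:
        remapped["probabilities"] = {
            mapping.get(label, label): prob
            for label, prob in result["probabilities"].items()
        }
    return remapped

# Made-up model output, for illustration only.
raw = {"class": "膨胀", "probabilities": {"膨胀": 0.9, "不膨胀": 0.1}}
shown = remap_result(raw)
print(shown["class"])          # 不膨胀
print(shown["probabilities"])  # {'不膨胀': 0.9, '膨胀': 0.1}
```

Keeping the swap in one named function like this would at least make it visible in one place, whatever the eventual fix turns out to be.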

5.4 Font path lookup (swellcamera.py L55-74)¶

def _get_font_path(self):
    # 1. fonts/simhei.ttf bundled with the project
    font_path = get_resource_path(os.path.join("fonts", "simhei.ttf"))
    if os.path.exists(font_path):
        return font_path

    # 2. Windows system fonts
    system_fonts = [
        "C:/Windows/Fonts/simhei.ttf",
        "C:/Windows/Fonts/msyh.ttc",     # Microsoft YaHei
        "C:/Windows/Fonts/simsun.ttc",   # SimSun
    ]
    ]
    for font in system_fonts:
        if os.path.exists(font):
            return font

    # 3. fall back to the PIL default font (no Chinese support)
    return None

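Upside: the bundled font takes priority, so it is found even inside a PyInstaller bundle.

Downside: only Windows paths are listed. On Linux/macOS the function returns None, drawing falls back to PIL's default font (which does not cover Chinese), and Chinese text typically renders as boxes.

Compare this with font loading in watercamera (see 5.13), which hard-codes simhei.ttf and has no such search chain. The engineering quality of the two modules is inconsistent.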
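One way to give both modules the same lookup is a shared helper that walks a list of candidate paths. The sketch below is only an illustration: the `find_font` name is hypothetical, and the Linux and macOS paths are assumptions about where a CJK font might be installed, not paths taken from the project.

```python
import os

# Candidate paths are assumptions for illustration; adjust them to the
# fonts actually installed on the target machine.
CANDIDATE_FONTS = [
    "C:/Windows/Fonts/simhei.ttf",
    "C:/Windows/Fonts/msyh.ttc",
    "/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc",  # assumed Linux location
    "/System/Library/Fonts/PingFang.ttc",            # assumed macOS location
]

def find_font(bundled_path=None, candidates=CANDIDATE_FONTS):
    """Return the first existing font path, or None if nothing is found."""
    search_order = ([bundled_path] if bundled_path else []) + list(candidates)
    for path in search_order:
        if os.path.exists(path):
            return path
    return None

# Prints a path on machines that have one of the candidates, else None.
print(find_font())
```

The bundled font is still tried first, so behavior inside a packaged build would stay the same as in the current swellcamera code.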

5.5 process_frame(): the inference body (L76-111)¶

def process_frame(self, frame):
    """Process a single frame and return the classification result (labels already swapped)."""
    temp_path = "temp_frame.jpg"
    cv2.imwrite(temp_path, frame)           # ⚠️ writes to disk

    try:
        result = self.classifier.predict(temp_path)
        if os.path.exists(temp_path):
            os.remove(temp_path)            # delete immediately

        if result and 'class' in result:
            ...  # label_mapping swap elided here (see 5.3)

        if result:
            self.stats[result['class']] += 1
            self.frame_count += 1

        return result
    except Exception as e:
        print(f"分类出错: {str(e)}")
        if os.path.exists(temp_path):
            os.remove(temp_path)
        return None

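Two points deserve criticism:

- One disk write per frame: with a 30 fps camera that is 30 rounds of cv2.imwrite + Image.open + os.remove per second, so every inference pays for a JPEG encode, a file write, and a decode that could all be avoided
- The temp file name is fixed as temp_frame.jpg: if the expansion and water-content modules run at the same time, they race on the same file. One module can overwrite the file right after the other wrote it, so a model may read the wrong image

For the fix, see 08 · Issues and Improvements, P0-2: convert in memory with Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).

5.6 Drawing Chinese text (L113-130)¶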

def draw_chinese_text(self, frame, text, position):
    try:
        img_pil = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        draw = ImageDraw.Draw(img_pil)
        if self.font_path and os.path.exists(self.font_path):
            font = ImageFont.truetype(self.font_path, self.font_size)
        else:
            font = ImageFont.load_default()
        draw.text(position, text, font=font, fill=self.text_color)
        return cv2.cvtColor(np.array(img_pil), cv2.COLOR_RGB2BGR)
    except Exception as e:
        # on failure, fall back to OpenCV's Latin-only text rendering
        cv2.putText(frame, text, position, cv2.FONT_HERSHEY_SIMPLEX, 0.6, self.text_color, 2)
        return frame

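This is the classic three-step workaround: OpenCV cannot render Chinese → draw with PIL → convert back to OpenCV.

Ironically, L116 here does the BGR↔RGB conversion the right way, while the inference path in 5.5 takes a detour through the disk instead.

5.7 display_result(): overlay rendering (L132-158)¶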

def display_result(self, frame, result):
    if result is None:
        return frame

    class_text = f"状态: {result['class']}"
    prob_text = f"置信度: {max(result['probabilities'].values())*100:.1f}%"

    # black background rectangle (OpenCV)
    cv2.rectangle(frame, (10, 10), (300, 80), self.bg_color, -1)
    # Chinese text (PIL)
    frame = self.draw_chinese_text(frame, class_text, (15, 15))
    frame = self.draw_chinese_text(frame, prob_text, (15, 45))

    # print statistics to the terminal once per second
    current_time = time.time()
    if current_time - self.last_update_time >= 1.0:
        self.last_update_time = current_time
        total = max(1, self.frame_count)
        print("\n=== 实时统计 ===")
        for class_name, count in self.stats.items():
            print(f"{class_name}: {count}次 ({count/total:.1%})")

    return frame

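Note how the confidence is computed: it is the maximum over all values in the probabilities dict, that is, the probability of the most likely class. Because this is a binary classifier whose two probabilities sum to 1, the value is always ≥ 0.5.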
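The lower bound is easy to check numerically. The sketch below assumes the model returns a normalized two-class distribution (for example a softmax output, which this chapter does not confirm); `top_confidence` is a hypothetical helper and the sample values are made up.

```python
def top_confidence(probabilities):
    """Confidence as display_result() computes it: the largest class probability."""
    return max(probabilities.values())

# Made-up binary outputs; each pair sums to 1.
samples = [
    {"膨胀": 0.5, "不膨胀": 0.5},
    {"膨胀": 0.25, "不膨胀": 0.75},
    {"膨胀": 1.0, "不膨胀": 0.0},
]
confidences = [top_confidence(p) for p in samples]
print(confidences)  # [0.5, 0.75, 1.0]
```

A displayed confidence near 50% therefore means the model is close to guessing, not that it is half sure.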

5.8 main(): the main loop (L160-206)¶

def main():
    classifier = RealTimeExpansionClassifier()

    camera_index = 1                          # ⚠️ hard-coded
    cap = cv2.VideoCapture(camera_index)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

    if not cap.isOpened():
        print(f"无法打开摄像头，请检查索引{camera_index}是否正确")
        return

    print("实时膨胀分类系统已启动 (按Q退出)...")

    while True:                               # ⚠️ no stop_event check
        ret, frame = cap.read()
        if not ret:
            print("无法接收帧，可能摄像头已断开")
            break
        result = classifier.process_frame(frame)
        frame = classifier.display_result(frame, result)
        cv2.imshow('Real-time Expansion Detection', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()

    # final statistics
    print("\n=== 最终统计 ===")
    total = max(1, classifier.frame_count)
    for class_name, count in classifier.stats.items():
        print(f"{class_name}: {count}次 ({count/total:.1%})")

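Source of the P0-1 bug: the while True: at L179 has no if stop_event.is_set(): break, so apart from a failed frame read the only exit path is the q key at L195.

Source of the P0-3 bug: camera_index = 1 at L165 is hard-coded, with no fallback.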
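A loop that honors a stop signal could look roughly like the sketch below. It is a simulation under stated assumptions rather than the project's code: `run_loop`, `read_frame`, and `handle_frame` are hypothetical names, the fake reader stands in for `cap.read()`, and the `q` key check is left out for brevity.

```python
import threading

def run_loop(read_frame, handle_frame, stop_event):
    """Process frames until the source ends or stop_event is set."""
    processed = 0
    while not stop_event.is_set():
        ok, frame = read_frame()
        if not ok:
            break
        handle_frame(frame)
        processed += 1
    return processed

# Stand-in for cap.read(): yields five fake frames, then reports failure.
frames = iter(range(5))

def fake_read():
    frame = next(frames, None)
    return (frame is not None), frame

stop_event = threading.Event()
seen = []

def handle(frame):
    seen.append(frame)
    if frame == 2:  # simulate the GUI pressing "stop" mid-stream
        stop_event.set()

processed = run_loop(fake_read, handle, stop_event)
print(processed, seen)  # 3 [0, 1, 2]
```

In the real modules the GUI thread would own the `threading.Event` and pass it into `main()`, which is what the P0-1 fix asks for.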

Part 2: Water-Content Detection Module¶

5.9 watercamera file layout¶

watercamera/
├── __init__.py         1 line: from .watercamera import main
├── __main__.py         8 lines
└── watercamera.py    221 lines

The layout mirrors swellcamera exactly.

5.10 The WaterContentClassifier class (watercamera.py L11-149)¶

Constructor (L12-29)¶

# watercamera.py:12-29
def __init__(self):
    self.model_weights = {
        '1': 0.91,  # Model1 (30-35 vs 35-40)  ← comment
        '2': 0.95,  # Model2 (35-40 vs 40-45)  ← comment
        '3': 0.90   # Model3 (30-35 vs 40-45)  ← comment
    }

    self.class_names = {
        '1': ['30-35', '35-40'],
        '2': ['35-40', '40-45'],
        '3': ['30-35', '40-45']
    }

    self.stats = defaultdict(int)
    self.frame_count = 0
    self.last_update_time = time.time()

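⚠️ The comments disagree with the models' actual class order:
- watercamera.py:14 comments Model1 (30-35 vs 35-40), but model1.py:23 actually has class_names = ["35-40", "30-35"] (reversed)
- watercamera.py:15 comments Model2 (35-40 vs 40-45), but model2.py:23 actually has class_names = ["40-45", "35-40"]
- watercamera.py:16 comments Model3 (30-35 vs 40-45), but model3.py:23 actually has class_names = ["40-45", "30-35"]

This does not affect behavior (the voting algorithm works on dict key names, not on order), but it is misleading to read.

The self.class_names field never influences the vote either. Its only use is in predict_single_image(), where it is copied into the 'classes' entry of the debug details dict, and there its order disagrees with the models' real order. It is effectively dead data.

5.11 Three-model weighted voting (L31-77)¶

This is the most algorithmic part of the whole project. The full code: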

# watercamera.py:31-77
def predict_single_image(self, image_path):
    vote_results = defaultdict(float)        # class → accumulated weighted vote
    all_probs = {}                            # for debugging: each model's raw output
    valid_models = []                         # models that took part in the vote

    for model_id in ['1', '2', '3']:
        try:
            model_module = load_model(model_id)            # dynamic import via importlib
            classifier = model_module.AgeClassifier()
            pred_label, probs = classifier.predict(image_path)

            prob_values = list(probs.values())
            confidence_diff = abs(prob_values[0] - prob_values[1])

            if confidence_diff >= 0.1:                     # ★ confidence filter
                valid_models.append(model_id)
                weight = self.model_weights[model_id]
                for class_name, prob in probs.items():
                    vote_results[class_name] += prob * weight  # ★ weighted accumulation

            all_probs[f"Model{model_id}"] = {
                'classes': self.class_names[model_id],
                'probs': probs,
                'prediction': pred_label,
                'confidence_diff': confidence_diff
            }
        except Exception as e:
            print(f"模型 {model_id} 预测失败: {str(e)}")
            continue

    if not vote_results:
        return None

    # normalize + take the max
    total_weight = sum(vote_results.values())
    normalized_results = {k: v/total_weight for k, v in vote_results.items()}
    final_class = max(normalized_results.items(), key=lambda x: x[1])[0]
    confidence = max(normalized_results.values())

    return {
        'class': final_class,
        'confidence': confidence,
        'probabilities': normalized_results,
        'details': all_probs
    }

Algorithm breakdown¶

Step 1: run each binary classifier once¶

Model	Compares	Output (example)
model1	35-40 vs 30-35	`{"35-40": 0.8, "30-35": 0.2}`
model2	40-45 vs 35-40	`{"40-45": 0.3, "35-40": 0.7}`
model3	40-45 vs 30-35	`{"40-45": 0.1, "30-35": 0.9}`

Step 2: confidence filter¶

confidence_diff = abs(prob_values[0] - prob_values[1])
if confidence_diff >= 0.1:
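    ...  # include this model in the vote

Meaning: if the two class probabilities are too close (difference < 0.1), the model itself is unsure, so it is simply not allowed to vote.

With the example above:
- model1: |0.8 - 0.2| = 0.6 ≥ 0.1 → included
- model2: |0.3 - 0.7| = 0.4 ≥ 0.1 → included
- model3: |0.1 - 0.9| = 0.8 ≥ 0.1 → included

Step 3: weighted accumulation¶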

weight = self.model_weights[model_id]   # 0.91 / 0.95 / 0.90
for class_name, prob in probs.items():
    vote_results[class_name] += prob * weight

Working through the example above:

Class	From model1	From model2	From model3	Total
35-40	0.8 × 0.91 = 0.728	0.7 × 0.95 = 0.665	—	1.393
30-35	0.2 × 0.91 = 0.182	—	0.9 × 0.90 = 0.810	0.992
40-45	—	0.3 × 0.95 = 0.285	0.1 × 0.90 = 0.090	0.375

Step 4: normalize + argmax¶

total = 1.393 + 0.992 + 0.375 = 2.760
normalized: {"35-40": 0.505, "30-35": 0.359, "40-45": 0.136}
final_class = "35-40", confidence = 0.505
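The four steps can be replayed end to end in a few lines. This is a simplified sketch of the voting logic only: the `weighted_vote` function is a hypothetical stand-in for `predict_single_image()`, it takes the per-model probabilities as input instead of loading models, and it uses the example values from Step 1.

```python
from collections import defaultdict

MODEL_WEIGHTS = {"1": 0.91, "2": 0.95, "3": 0.90}
MIN_CONFIDENCE_DIFF = 0.1

def weighted_vote(model_outputs, weights=MODEL_WEIGHTS, min_diff=MIN_CONFIDENCE_DIFF):
    """Combine per-model binary probabilities into one normalized distribution."""
    votes = defaultdict(float)
    for model_id, probs in model_outputs.items():
        first, second = probs.values()
        if abs(first - second) < min_diff:
            continue  # the model is too uncertain, so it does not vote
        for class_name, prob in probs.items():
            votes[class_name] += prob * weights[model_id]
    if not votes:
        return None
    total = sum(votes.values())
    return {name: score / total for name, score in votes.items()}

# Example outputs from Step 1.
outputs = {
    "1": {"35-40": 0.8, "30-35": 0.2},
    "2": {"40-45": 0.3, "35-40": 0.7},
    "3": {"40-45": 0.1, "30-35": 0.9},
}
result = weighted_vote(outputs)
best = max(result, key=result.get)
print(best, {k: round(v, 3) for k, v in result.items()})
# 35-40 {'35-40': 0.505, '30-35': 0.359, '40-45': 0.136}
```

Note that the reported confidence of 0.505 is a share of the pooled weighted votes across three classes, so it is not directly comparable to the ≥ 0.5 confidence of a single binary classifier.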

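Is the design reasonable?¶

Why it is designed this way:
- Three classes taken pairwise give C(3,2) = 3 combinations, which is exactly why there are 3 binary classifiers
- Compared with a single three-class model, each binary classifier's training data is easier to balance (equal sample counts per class are easier to arrange)
- The weights presumably track each model's validation accuracy during training (0.91/0.95/0.90)

Issues:
- Each class is covered by only two of the three models (for example, "40-45" appears only in model2 and model3), so the maximum weight a class can collect differs slightly: 1.81 for 30-35, 1.86 for 35-40, 1.85 for 40-45. When the confidence filter drops a model, the imbalance grows
- The 0.1 threshold is an empirical value with no stated justification
- Votes are accumulated as weighted probabilities rather than as hard votes. This is soft voting (a weighted average of probabilities), which is not the same as averaging logits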
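How much total weight each class can receive follows directly from the weights and class pairs in the constructor. The sketch below just sums them; `max_weight_per_class` is a hypothetical helper, and the pairs are copied from `self.class_names`.

```python
from collections import defaultdict

MODEL_WEIGHTS = {"1": 0.91, "2": 0.95, "3": 0.90}
CLASS_PAIRS = {
    "1": ["30-35", "35-40"],
    "2": ["35-40", "40-45"],
    "3": ["30-35", "40-45"],
}

def max_weight_per_class(weights=MODEL_WEIGHTS, pairs=CLASS_PAIRS):
    """Sum the weights of every model that can vote for each class."""
    totals = defaultdict(float)
    for model_id, classes in pairs.items():
        for class_name in classes:
            totals[class_name] += weights[model_id]
    return {name: round(total, 2) for name, total in totals.items()}

coverage = max_weight_per_class()
print(coverage)  # {'30-35': 1.81, '35-40': 1.86, '40-45': 1.85}
```

The differences are small while all three models vote, but if the confidence filter excludes one model, the two classes it covers each lose about half of their available weight.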

5.12 process_frame(): reusing the temp file (L79-99)¶

def process_frame(self, frame):
    temp_path = "temp_frame.jpg"             # ⚠️ same name as in swellcamera!
    cv2.imwrite(temp_path, frame)
    try:
        result = self.predict_single_image(temp_path)
        os.remove(temp_path)
        if result:
            self.stats[result['class']] += 1
            self.frame_count += 1
        return result
    except Exception as e:
        print(f"处理帧时出错: {str(e)}")
        if os.path.exists(temp_path):
            os.remove(temp_path)
        return None

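Temp file name conflict: both modules use "temp_frame.jpg". If they run at the same time (which the GUI allows), they overwrite each other's file and may read the other module's frame.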
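If the file-based path has to stay for now, a unique temp file per call already removes the collision. This is a minimal sketch using the standard library's `tempfile.mkstemp`; `with_temp_image` and the fake writer and predictor are hypothetical stand-ins for `cv2.imwrite(path, frame)` and the classifier's `predict(path)`.

```python
import os
import tempfile

def with_temp_image(write_image, predict, prefix="frame_"):
    """Write a frame to a unique temp file, run predict on it, then clean up."""
    fd, temp_path = tempfile.mkstemp(prefix=prefix, suffix=".jpg")
    os.close(fd)  # the writer reopens the path itself
    try:
        write_image(temp_path)
        return predict(temp_path)
    finally:
        if os.path.exists(temp_path):
            os.remove(temp_path)

# Stand-ins for cv2.imwrite(path, frame) and classifier.predict(path).
def fake_write(path):
    with open(path, "wb") as f:
        f.write(b"not a real jpeg")

def fake_predict(path):
    return os.path.getsize(path)

size = with_temp_image(fake_write, fake_predict, prefix="swell_")
print(size)  # 15
```

The `finally` block also covers the cleanup that both `process_frame()` implementations currently repeat in their success and error branches. Converting the frame in memory, as P0-2 suggests, would still be the cleaner fix.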

5.13 display_result(): Chinese text rendering + 5-second statistics (L101-149)¶

def display_result(self, frame, result):
    if result is None:
        return frame

    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    pil_image = Image.fromarray(frame_rgb)
    draw = ImageDraw.Draw(pil_image)

    try:
        font = ImageFont.truetype("simhei.ttf", 32)    # ⚠️ relative path, depends on the working directory
    except:
        font = ImageFont.load_default()

    class_text = f"含水量: {result['class']}%"
    conf_text = f"置信度: {result['confidence']*100:.1f}%"
    draw.rectangle((10, 10, 300, 80), fill=(0, 0, 0, 180))
    draw.text((15, 15), class_text, font=font, fill=(255, 255, 255))
    draw.text((15, 45), conf_text, font=font, fill=(255, 255, 255))
    frame = cv2.cvtColor(np.array(pil_image), cv2.COLOR_RGB2BGR)

    # print statistics every 5 seconds
    if current_time - self.last_update_time >= 5.0:
        total = max(1, self.frame_count)
        for class_name, count in self.stats.items():
            print(f"{class_name}%: {count}次 ({count/total:.1f}%)")
        ...
        self.frame_count = 0                            # ⚠️ counter reset
        self.last_update_time = current_time

    return frame

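Compared with swellcamera: watercamera has no font search here. It hard-codes simhei.ttf and, if loading fails, the bare except silently falls back to ImageFont.load_default(). Loading itself therefore does not crash on Linux, but the default font cannot render Chinese: depending on the Pillow version, the labels may come out as placeholder boxes or draw.text() may raise an encoding error.

self.frame_count = 0: the counter is reset every 5 seconds. The stats dict is not reset, so its counts are cumulative since startup while frame_count covers only the last 5-second window. With that mismatch, count/total is no longer a meaningful percentage and can exceed 100%.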
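The mismatch is easy to reproduce in isolation. The sketch below keeps a cumulative `stats` dict and a windowed `frame_count` the same way the class does; `record` and `report` are hypothetical stand-ins for `process_frame()` and the periodic print in `display_result()`.

```python
from collections import defaultdict

stats = defaultdict(int)  # cumulative, never reset
frame_count = 0           # reset after every report

def record(label):
    """Count one classification result, as process_frame() does."""
    global frame_count
    stats[label] += 1
    frame_count += 1

def report():
    """Return count/total ratios the way the periodic print computes them."""
    global frame_count
    total = max(1, frame_count)
    ratios = {label: count / total for label, count in stats.items()}
    frame_count = 0
    return ratios

# First window: two results. Second window: one more result.
record("35-40")
record("35-40")
first = report()
record("35-40")
second = report()
print(first, second)  # {'35-40': 1.0} {'35-40': 3.0}
```

Resetting `stats` together with `frame_count`, or resetting neither, would make the ratio meaningful again.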

5.14 main(): inference every 5 seconds (L151-219)¶

def main():
    classifier = WaterContentClassifier()
    camera_index = 1
    cap = cv2.VideoCapture(camera_index)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)           # lower than swellcamera
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
    cap.set(cv2.CAP_PROP_BUFFERSIZE, 1)              # reduce latency
    cap.set(cv2.CAP_PROP_FPS, 30)
    cap.set(cv2.CAP_PROP_AUTO_EXPOSURE, 0.75)        # auto exposure

    if not cap.isOpened():
        print(f"无法打开摄像头，请检查索引 {camera_index} 是否正确")
        return

    print("实时含水量分类系统已启动 (按Q退出)...")

    last_process_time = time.time()
    process_interval = 5.0                           # ★ one inference every 5 seconds
    last_result = None

    while True:                                      # ⚠️ no stop_event
        ret, frame = cap.read()
        if not ret:
            break
        current_time = time.time()
        if (current_time - last_process_time) >= process_interval:
            result = classifier.process_frame(frame)
            if result is not None:
                last_result = result
            last_process_time = current_time

        # draw the latest result on every frame (even without a new inference)
        frame = classifier.display_result(frame, last_result)
        cv2.imshow('Water Content Classification', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()
    print("\n=== 最终统计 ===")
    total = sum(classifier.stats.values())
    if total > 0:
        for class_name, count in classifier.stats.items():
            print(f"{class_name}%: {count}次 ({count/total:.1%})")

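Practical effect of the 5-second sampling:
- The video itself is live at 30 fps
- The classification result updates only once every 5 seconds; in between, the overlay keeps showing the previous last_result
- What the user sees is a live camera feed with a label that can be up to 5 seconds stale

Why 5 seconds: one pass through the three EfficientNet-B3 models takes roughly 1-3 seconds on CPU, so running inference on every frame would stall the loop. A 5-second interval leaves enough headroom.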
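The throttling pattern can be simulated without a camera by feeding in timestamps. This is a sketch of the loop's timing logic only: `run_throttled` is a hypothetical function, the timestamps replace `time.time()`, and the fake `infer` callback replaces `classifier.process_frame(frame)`.

```python
def run_throttled(timestamps, infer, interval=5.0):
    """Return the label shown at each timestamp when inference is rate-limited."""
    if not timestamps:
        return []
    last_process_time = timestamps[0]
    last_result = None
    shown = []
    for now in timestamps:
        if now - last_process_time >= interval:
            result = infer(now)
            if result is not None:
                last_result = result
            last_process_time = now
        shown.append(last_result)  # every frame displays the latest result
    return shown

# Fake clock: one frame per second for 12 seconds. The fake model labels
# each inference with the time at which it ran.
ticks = [float(t) for t in range(12)]
shown = run_throttled(ticks, infer=lambda now: f"result@{now:.0f}")
print(shown)
```

With these inputs the first five entries are `None`, because `last_process_time` starts at the launch time and the first inference only happens once the interval has elapsed. The same applies to the real loop: nothing is overlaid during the first 5 seconds.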

5.15 Side-by-side comparison of the two modules¶

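Dimension	swellcamera	watercamera
Inference frequency	Every frame (~30 fps)	Every 5 seconds
Camera resolution	1280×720	640×480
Number of models	1 (model4)	3 (model1+2+3)
Algorithm	Single model + label swap	Three-model weighted voting + confidence filter
Font loading	Three-tier fallback	Single relative path, PIL default as the only fallback
Chinese text rendering	Separate function `draw_chinese_text`	Inlined in display_result
Font size	20 px	32 px
stop_event check	❌ None	❌ None
Temp file	`temp_frame.jpg` (every frame)	`temp_frame.jpg` (every 5 seconds)
Terminal statistics interval	1 second	5 seconds
Startup message	"按Q退出"	"按Q退出"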

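Overall: watercamera has the more refined algorithm (vote fusion), while swellcamera has the more robust engineering (font fallback, more complete error handling). The two modules appear to have been written separately, without much effort to align their style.

5.16 Most common changes to these two files¶

What you want to do	Where to change
Change the camera index	`swellcamera.py:165` + `watercamera.py:156`
Change the sampling interval	`watercamera.py:180` `process_interval = 5.0`
Change the voting weights	`watercamera.py:14-18`
Change the confidence threshold	`watercamera.py:46` `>= 0.1`
Remove the label swap	Delete the whole `label_mapping` at `swellcamera.py:48-53`, together with the remapping block in `process_frame()` (L88-99) that reads it
Change the font	`swellcamera.py:55-74` + `watercamera.py:113`
Make the stop button work	Add `if stop_event.is_set(): break` to the main loop in main(), and pass `stop_event` in as a parameter
Eliminate the temp file	Replace `cv2.imwrite + Image.open` with `Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))`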