Training an object detector in Python with dlib's simple_object_detector

- Introduction -

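This is a follow-up to a previous article, and is about doing object detection with dlib.

Well, to be precise, dlib has no "face detector training" as such, so this article is about using its "object detector training" feature to retrain a face detector.

I hope it is a useful reference when working with dlib.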

- About dlib's object detector -

Object detector training was added to dlib in 2014.
Internally it uses HOG + SVM, and compared with training in OpenCV it can reach quite good accuracy with far less training data.

Original release post: dlib C++ Library: Dlib 18.6 released: Make your own object detector!

In this article we train an object detector using dlib's Python API.

Python documentation: the "Classes" page of the dlib documentation

We use dlib.simple_object_detector.
For reference, the official training sample is here:

http://dlib.net/train_object_detector.py.html

It covers most things, but not every parameter is described, so I put a version that I translated into Japanese and tinkered with in a repository. Please take a look.

github.com

- Training format and sample -

The script below trains from the images in a directory and a text file containing the rectangle information.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
u"""Train a dlib detector from rect.txt and the image data."""

import dlib
import os
from skimage import io

input_folder = "./test/"
rect_file = "./true_rect.txt"
output_svm = "detector.svm"

def get_rect(rect_file):
    u"""Read the rectangle file and return its lines as a list."""
    # Use a context manager so the file is closed after reading
    with open(rect_file, 'r') as f:
        return f.readlines()


def make_train_data(rect_list):
    u"""Build the training data (boxes and images) from the rectangle list."""
    boxes = []
    images = []
    for line in rect_list:

        # Split on whitespace; this also drops the trailing newline
        one_data = line.split()
        if not one_data:
            continue
        # Number of rectangles k (integer division, so range(k) works)
        k = len(one_data) // 4

        # Collect the rectangles as dlib.rectangle objects
        img_rect = []
        for j in range(k):
            left = int(one_data[j*4])
            top = int(one_data[j*4+1])
            right = int(one_data[j*4+2])
            bottom = int(one_data[j*4+3])
            img_rect.append(dlib.rectangle(left, top, right, bottom))

        # The field after the coordinates is the image name (no extension).
        # Append to boxes and images together so boxes[n] matches images[n].
        f_path = input_folder + one_data[k*4] + '.jpg'
        if os.path.exists(f_path):
            boxes.append(img_rect)
            images.append(io.imread(f_path))

    return boxes, images


def training(boxes, images):
    u"""Train the detector and save it to output_svm."""
    # Get the training options for simple_object_detector
    options = dlib.simple_object_detector_training_options()
    # Also train on left-right mirrored copies of the images (uses more memory)
    options.add_left_right_image_flips = True
    # An SVM is used internally, so its C parameter has to be set
    options.C = 5
    # Number of threads
    options.num_threads = 16
    # Whether to print progress during training
    options.be_verbose = True
    # Stopping tolerance for training
    options.epsilon = 0.001
    # Maximum number of times images may be upsampled (large values eat memory)
    options.upsample_limit = 8
    # Minimum detection window size in pixels (80*80 = 6400)
    options.detection_window_size = 6400

    # Train and save the svm file
    print('train...')
    detector = dlib.train_simple_object_detector(images, boxes, options)
    detector.save(output_svm)


if __name__ == '__main__':
    rect_list = get_rect(rect_file)
    boxes, images = make_train_data(rect_list)
    training(boxes, images)

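train_simple_object_detector expands the data internally (left-right flips and image upsampling), and this can be tuned with the upsample_limit and add_left_right_image_flips options.
Since this basic data augmentation is already done for you, a minimal amount of training data is enough.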

In fact, the official sample code builds a high-accuracy face detector from a training set of just 22 sample images with rectangle information.

Feeding in too many images causes a MemoryError.
If training stops roughly like this, you need to retune the option parameters, reduce the number of images, or add memory.

Traceback (most recent call last):
  File "detector.py", line 104, in <module>
    boxes, images = make_train_data(rect_list)
  File "detector.py", line 70, in make_train_data
    images.append(io.imread(f_path))
  File "C:\Python27\lib\site-packages\skimage\io\_io.py", line 61, in imread
    img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
  File "C:\Python27\lib\site-packages\skimage\io\manage_plugins.py", line 211, in call_plugin
    return func(*args, **kwargs)
  File "C:\Python27\lib\site-packages\skimage\io\_plugins\pil_plugin.py", line 37, in imread
    return pil_to_ndarray(im, dtype=dtype, img_num=img_num)
  File "C:\Python27\lib\site-packages\skimage\io\_plugins\pil_plugin.py", line 111, in pil_to_ndarray
    frame = np.array(frame, dtype=dtype)
MemoryError

On dlib's official Q&A, someone asked "I get a MemoryError..." and the author answered "Buy Memory!", so there is not much to be done about it.

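As a rough feel, even a machine with 32 GB of memory falls over when trained on 1000 images of 100x100 pixels with something like add_left_right_image_flips=True and upsample_limit=4.
Training also uses the CPU fully, so in the worst case the PC can freeze.
Reducing the training data is the quickest fix, but the detector then copes with fewer environments.
It is better to get by somehow with a bigger machine or with the parameters.
(For this reason, my impression is that dlib's object detector training is extremely strong when the background and surroundings are fixed.)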

Even with 64 GB of memory and a 16-core CPU, about 2000 images of 100x100 pixels seems to be the limit.
Beyond that, no amount of parameter tuning helped.
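
To get a feel for why the numbers above blow up, here is a rough back-of-the-envelope sketch. It is not how dlib accounts for memory. It assumes 8-bit RGB pixels, that left-right flips double the number of images, and that every upsampling step doubles each side of every image (so 4x the pixels). As far as I understand, dlib only upsamples as many times as it needs to, so treat this as a worst case. HOG features and the SVM solver's working memory come on top and are ignored. The function name is made up for this example.

```python
def estimate_pixel_bytes(n_images, width, height, upsample_times,
                         add_flips=True, channels=3):
    """Worst-case size of the raw pixel data in bytes.

    Assumptions (mine, not dlib's documentation): 8-bit pixels, flips
    double the image count, each upsampling step quadruples the pixels.
    HOG features and SVM working memory are not included.
    """
    per_image = width * height * channels
    count = n_images * (2 if add_flips else 1)
    return count * per_image * (4 ** upsample_times)


if __name__ == "__main__":
    # The two setups mentioned in the text, with illustrative upsample counts
    for n, up in [(1000, 4), (2000, 2)]:
        gib = estimate_pixel_bytes(n, 100, 100, up) / 2.0 ** 30
        print("%d images, %d upsamplings: about %.1f GiB of pixels" % (n, up, gib))
```

The point is only that upsample_limit is the expensive knob: each extra step multiplies the pixel data by four, while flips merely double it.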

In Python terms, the rectangle and image information for training is passed in the following form.
All that matters is that boxes[n] and images[n] refer to the same image.

boxes_img1 = ([dlib.rectangle(left=329, top=78, right=437, bottom=186),
               dlib.rectangle(left=224, top=95, right=314, bottom=185),
               dlib.rectangle(left=125, top=65, right=214, bottom=155)])
boxes_img2 = ([dlib.rectangle(left=154, top=46, right=228, bottom=121),
               dlib.rectangle(left=266, top=280, right=328, bottom=342)])
boxes = [boxes_img1, boxes_img2]
images = [io.imread(dir_path + '/xxxxxx.jpg'),
          io.imread(dir_path + '/yyyyyy.jpg')]

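The rect.txt used for training is assumed to be

x1 y1 x2 y2 file_name
x1 y1 x2 y2 file_name2

that is, a space-separated, CSV-like file.
When an image has several rectangles, a line looks like

x1 y1 x2 y2 x3 y3 x4 y4 file_name

and the script parses data saved in this format.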
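
The line format above can be sketched as a small standalone parser. This is just an illustration of the format, separate from the training script. parse_rect_line is a name I made up, and it assumes the file name is the last field and contains no spaces.

```python
def parse_rect_line(line):
    """Split one rect.txt line into ([(x1, y1, x2, y2), ...], file_name).

    Assumes 4*k integer coordinates followed by one file name field.
    """
    fields = line.split()
    if len(fields) < 5 or (len(fields) - 1) % 4 != 0:
        raise ValueError("expected 4*k coordinates and a file name: %r" % line)
    coords = [int(v) for v in fields[:-1]]
    # Group the flat coordinate list into one tuple per rectangle
    rects = [tuple(coords[i:i + 4]) for i in range(0, len(coords), 4)]
    return rects, fields[-1]


if __name__ == "__main__":
    print(parse_rect_line("329 78 437 186 224 95 314 185 xxxxxx\n"))
```

Unlike the training script, this version rejects malformed lines instead of failing later with an IndexError, which makes a broken rect.txt easier to track down.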

I'll switch to xml someday.
There were people who built training data and trained from xml, so I'll put a link here.
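
For what it's worth, here is a minimal sketch of what the xml side might look like. It assumes the layout written by dlib's imglab tool (a dataset element containing images, each image holding box elements with top, left, width and height). build_dataset_xml is a hypothetical helper. The inclusive width/height conversion (right - left + 1) is my assumption about how corner coordinates map to that layout.

```python
import xml.etree.ElementTree as ET


def build_dataset_xml(entries):
    """Build an imglab-style <dataset> element.

    entries: list of (file_name, [(left, top, right, bottom), ...]).
    Width and height are computed inclusively, which is an assumption.
    """
    dataset = ET.Element("dataset")
    images = ET.SubElement(dataset, "images")
    for file_name, rects in entries:
        image = ET.SubElement(images, "image", {"file": file_name})
        for left, top, right, bottom in rects:
            ET.SubElement(image, "box", {
                "top": str(top),
                "left": str(left),
                "width": str(right - left + 1),
                "height": str(bottom - top + 1),
            })
    return dataset


if __name__ == "__main__":
    root = build_dataset_xml([("xxxxxx.jpg", [(329, 78, 437, 186)])])
    print(ET.tostring(root).decode("utf-8"))
```

The official sample passes an xml file name and an output svm file name to dlib.train_simple_object_detector instead of in-memory lists, so a file written from this element should be usable in the same way. I have not tried that combination here.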

- Using the trained svm -

Modify the detector.run part of the previous article as follows.

- detector = dlib.get_frontal_face_detector()
+ detector = dlib.simple_object_detector("detector.svm")

- dets, scores, idx = detector.run(img_rgb, 0)
+ dets = detector(img_rgb, 0)

A detector you trained yourself does not seem to return scores or second candidates.
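
Since only rectangles come back, the calling code can be wrapped up in a small helper like the sketch below. detect_boxes and run_on_file are names I made up, and the file names are placeholders. The only dlib calls used are the ones already shown in this article, plus the left(), top(), right() and bottom() accessors of dlib.rectangle.

```python
def detect_boxes(detector, image, upsample_num_times=0):
    """Run a detector and return plain (left, top, right, bottom) tuples."""
    dets = detector(image, upsample_num_times)
    return [(d.left(), d.top(), d.right(), d.bottom()) for d in dets]


def run_on_file(svm_path, image_path):
    """Load a trained detector and print the boxes found in one image."""
    # Imported here so detect_boxes can be used (and tested) without dlib
    import dlib
    from skimage import io

    detector = dlib.simple_object_detector(svm_path)
    img_rgb = io.imread(image_path)
    for box in detect_boxes(detector, img_rgb):
        print(box)
```

For example, run_on_file("detector.svm", "sample.jpg") would print one tuple per detection, assuming both files exist.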