darkflow yolo v2 training from scratch not working

Two single-class training attempts have been made: one produced reliable bounding boxes, while the other failed to produce even one. The successful case was a single-class ‘car’ detector; the failure was a ‘face’ detector. The training results do not make sense, and this post documents the erratic behavior.


first training with YOLOv2

Intro

I have been studying YOLOv2 for a while, and my first experiment was car detection in actual road situations. I used tiny-yolo as the base model with the pre-trained binary weights. While it recognized cars very well in conventional full-shot car images, like the ones a person sees in a commercial, it did not work well on images of cars as seen from the driver’s seat.

Preparing Dataset

Get Images from Blackbox Video Footage

Clearly, the pretrained model was not trained on driver’s-POV car images. To gather some data, I took the liberty of copying the blackbox (dashcam) videos from my Dad’s car, roughly an hour of footage in total. I used the ffprobe tool to extract a screenshot every 3~10 seconds from all the videos. There were two cases where I removed images from my dataset:

  • Night images. The night images were very dark and differed greatly from what the driver would see in daylight, so it seemed a good idea to work only with daylight images for now.
  • Nearly identical images during halts. When the car is waiting at a signal it is stopped, but the blackbox keeps recording, so some extracted screenshots were nearly identical and labeling them would have been redundant.

In the end I was able to extract 330 images. That is small, but let’s see how far we can go with this amount of data.
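Frame extraction of this kind is commonly done with ffmpeg (a sibling tool of ffprobe); a minimal sketch, with hypothetical file paths, pulling one frame every 5 seconds:

```shell
# Pull one frame every 5 seconds (within the 3~10 s range above).
# blackbox.mp4 and frames/ are placeholder paths.
mkdir -p frames
ffmpeg -i blackbox.mp4 -vf fps=1/5 frames/img_%04d.png
```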

Labeling the Images

This drew on another separate project of mine: a tool for labeling objects in images. I first tried making an Android app, which I did roughly complete, but it turned out to be quite irritating to do this job on a small screen. Labeling productivity was lower than I had expected.

I moved on to creating a website to increase labeling efficiency. After 2 weeks of Django programming, I was able to set up a website where I could not only label my images but also manage them in sets.

With this website, I was able to label all 330 images within approximately 1 hour. Of course, this doesn’t include the time I had to spend fixing bugs in the website once I started labeling.

Converting label data to darkflow compatible json format

After studying the XML example included in the darkflow source code, I identified the key-value pairs that darkflow requires. The box data accumulated on my web server then had to be converted into this compatible format.

Training

Prepare Custom JSON Parser

By default, the darkflow source code can only parse the XML format. However, I find JSON much easier to handle, so I added a custom JSON parser to darkflow and tweaked it to read JSON files instead of XML files.

The training procedure is simple and documented in the README of the darkflow source code. Following the guidelines was sufficient to start training.
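For reference, a training invocation along the lines of the darkflow README; the cfg/weights names and data paths are placeholders for my setup:

```shell
# --annotation points at the converted label files, --dataset at the images.
flow --model cfg/tiny-yolo-voc.cfg --load bin/tiny-yolo-voc.weights \
     --train --annotation annotations/ --dataset images/ \
     --batch 16 --epoch 10
```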

Result

Train Stats

  • total images: 330
  • batch size: 16
  • epoch: 10

Training Progress Graph

[Figure: training loss graph (Selection_001.png)]

In total 200 steps were run (330 images / batch size 16 ≈ 20 steps per epoch, × 10 epochs), and the loss dropped to roughly half of its initial value.

Test Set Images

I had another driver-POV video, recorded separately, to use as a test set. I picked a few images and ran three kinds of predictions:

  • Pretrained (no additional training done by me)
  • Step-105 model
  • Step-200 model

Below are the results.

Pretrained (Step-0)

Step-105

Step-200

Conclusion

  • Training with driver’s-POV images, even on a small dataset, noticeably improves car detection from the driver’s POV.
  • The step-200 model seems to draw excessive rectangles. I don’t know whether this is due to immature detection of half-hidden cars; with better training on half-hidden cars, this issue might disappear.
  • The rectangle position and size are still quite far from what I anticipated. How can I improve this?
    • Position and size prediction is tied to the grid size and anchor boxes inside YOLOv2. I should take a deeper look into this.
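As background on that last point, YOLOv2 decodes each box from a grid-cell offset plus an anchor prior; a sketch of the published formulas (illustrative code, not darkflow’s implementation):

```python
import math

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """YOLOv2 box decoding: the network predicts raw offsets (tx, ty, tw, th);
    (cx, cy) is the grid cell's top-left corner and (pw, ph) the anchor prior.
    Center falls inside the cell via the sigmoid; size scales the anchor."""
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    bx = cx + sigmoid(tx)   # box center x, in grid units
    by = cy + sigmoid(ty)   # box center y, in grid units
    bw = pw * math.exp(tw)  # box width: anchor width times exp(tw)
    bh = ph * math.exp(th)  # box height: anchor height times exp(th)
    return bx, by, bw, bh
```

So badly sized rectangles can come from anchor priors that do not match the dataset’s box shapes, which is why the anchors deserve a closer look.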

darkflow training tips

install Cython.

 $ sudo pip install Cython


when installing darkflow at the beginning, install it in editable mode so that code changes take effect immediately.

pip install -e .

the original source code only contains a parser for VOC XML as an example. However, using it as a reference, it is easy to create a custom JSON parser that interfaces properly with the existing system.

if you do not pass the parsed JSON attributes to the system properly, it may give a wrong result like the following:

Dataset of 4 instance(s)
Training statistics: 
 Learning rate : 1e-05
 Batch size : 4
 Epoch number : 2
 Backup every : 2000
step 1 - loss nan - moving ave loss nan
Finish 1 epoch(es)
step 2 - loss nan - moving ave loss nan
Finish 2 epoch(es)

You can see that the system is not able to calculate the loss properly, giving nan instead.

A few mistakes a user can make are: 1) not converting the JSON string values with str(), and 2) mixing up the order of the x,y values.

['007081.jpg', [500, 375, [['dog', 152, 84, 281, 315], ['person', 32, 84, 223, 351], ['person', 259, 105, 500, 375]]]]

This is a proper sample of the dumps list that is passed on to the system.

If the user doesn’t str() the JSON string values, it will give something like this:

[u'testimage1.png', [1920, 1080, [[u'car', 898, 544, 591, 390]]]]

Another case is when the user mixes up the order of the x,y values. The order should be (xmin, ymin, xmax, ymax). You can see that the wrong example above has the min/max the other way around.
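Both mistakes can be caught before training with a small sanity check over the dumps list; a sketch (the helper name is mine):

```python
def validate_annotation(entry):
    """Check one darkflow-style annotation entry:
    [filename, [width, height, [[label, xmin, ymin, xmax, ymax], ...]]].
    Raises AssertionError on the two mistakes described above."""
    filename, (width, height, boxes) = entry
    assert isinstance(filename, str), "filename must be a plain str"
    for label, xmin, ymin, xmax, ymax in boxes:
        assert isinstance(label, str), "label must be a plain str"
        # min must come before max, and boxes must fit inside the image
        assert 0 <= xmin < xmax <= width, "x order wrong: expected xmin < xmax"
        assert 0 <= ymin < ymax <= height, "y order wrong: expected ymin < ymax"
    return True
```

Running this over every entry before starting `flow` is much cheaper than discovering nan losses two epochs in.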

After fixing these minor mistakes, the loss calculation works fine.