Cat detector
Experiment 1: fixed-scale (Aug 30, 2006)
Scale-dependent, using supervision:
- Training data: take images with Francois' ground truth. Resample each image so that the cat's head is 64 pixels across. Use the region inside the hand-drawn cat body box to train the cat category, and regions outside the box to train the background category (see the sketch after this list).
- Test data: as above, but different pictures. Each test image is either a box exactly containing a cat or a box containing no cat.
- System: Lana Lazebnik's CVPR06 system as implemented by Greg.
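As a concrete illustration of the preprocessing, here is a minimal Python sketch (not Greg's actual code; the file handling, box format and PIL usage are my assumptions):

    from PIL import Image

    HEAD_SIZE = 64  # target width of the cat's head in pixels (fixed-scale setup)

    def fixed_scale_crop(image_path, head_width_px, body_box):
        # head_width_px: width of the cat's head in the original image (from the ground truth)
        # body_box:      (left, top, right, bottom) hand-drawn cat body box, in original coordinates
        img = Image.open(image_path)
        scale = HEAD_SIZE / float(head_width_px)
        new_size = (int(round(img.width * scale)), int(round(img.height * scale)))
        img = img.resize(new_size, Image.BILINEAR)
        # Rescale the body box into the resampled coordinate frame and crop it.
        box = tuple(int(round(c * scale)) for c in body_box)
        return img.crop(box)

    # The crop returned above is a positive (cat) training example;
    # negative (background) examples are crops taken outside the box.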
Results:
- ROC curves for different size training data shown at right.
Notice:
- Marked improvement as the number of training examples increases. Even from 500 to 900 there is no saturation. It would be interesting to see how many more training examples we can use and still see improvement.
- Results look quite impressive (85% correct at a 0.5% false alarm rate), but we need to see what happens when the scale of the cat is unknown and we also have to search for the cat across the picture.
To do:
- Plot performance (e.g. % detected at a 0.1% false alarm rate) as a function of the number of training examples.
- Retrain with obvious symmetries of the training data: left-right flip and rotations (full circle, or just ±30°?); see the augmentation sketch after this list.
- Test current system with rescaled pictures of cats, to test degree of scale invariance (we need to discuss details).
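A minimal sketch of the proposed augmentation (the rotation range is still to be decided; the PIL usage is an assumption):

    from PIL import Image

    def augment(img, step_deg=10, max_deg=30):
        # Obvious symmetries of a training crop: left-right flip,
        # plus small rotations of both the original and the flipped image.
        variants = [img, img.transpose(Image.FLIP_LEFT_RIGHT)]
        out = list(variants)
        for base in variants:
            for angle in range(-max_deg, max_deg + 1, step_deg):
                if angle != 0:
                    out.append(base.rotate(angle, resample=Image.BILINEAR))
        return out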
Experiment 2: fixed-scale with corrections/improvements (Sep. 8, 2006)
- Bottom line:
- In experiment 1 we got 87.1% true positives @ 1% false positives at Ntrain=900. In experiment 2 we got 85.7% at Ntrain=800. Thus the results have not changed significantly.
There was a concern in the first experiment that I was including all cropped cats and all cropped clutter in one pool of potential training and test images, on the (incorrect) assumption that none of the cropped images overlapped. The sets of training and test images that were chosen from this pool were, of course, disjoint. However, there was a small possibility that some of the images beginning with the prefix "train" might overlap with other "train"-prefixed images, and likewise for the images beginning with "test", so that one member of an overlapping pair could land in the training set and the other in the test set.
The probability of this seems small, but Francois and I just wanted to check to make sure the test set was not getting contaminated with duplicates from the training set.
Things that were done differently this time:
- Only images that begin with "train" and "test" are picked for training and testing (respectively).
- During testing (NOT training) I use 10 times more clutter images than cat images. This improves the signal-to-noise at low false positive rates (see the back-of-the-envelope numbers after this list).
- We are now limited to a maximum of 834 training and 179 test images (because the two image pools are now isolated).
- Can no longer go up to Ntrain=900
- This time the code produces a list of the worst false positive examples, so we can see what they look like
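To put rough numbers on the signal-to-noise point above (assuming the 179 test images are the cat images and the clutter count scales with them): with only as many clutter images as cat images, a single misclassified clutter crop already corresponds to a false positive rate of about 1/179 ≈ 0.56%, so operating points around 0.1-1% false positives are dominated by quantization noise. With 10 times more clutter (~1790 images), the granularity drops to about 0.06% per image, and a 1% false positive rate rests on roughly 18 clutter images rather than ~2.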
Let's take a closer look, specifically at the example of Ntrain=800. The ROC curve to the right comes from moving the SVM threshold (black vertical bar) from right to left, and plotting the fraction of misidentified clutter and cat images at each step:
The "worst" false positives are circled in yellow in the above plot. Below are images of the 5 worst false positives, ie. the 5 worst examples of clutter being misidentified as cats, starting with the most extreme misidentification on the left. I'm not quite sure what pillows and bad upholstery have to do with cats? You be the judge:
The same pillow image appears to be duplicated here many times; looking at, say, the worst 50 false positives, this same pillow segment crops up repeatedly. I'm not sure this is what we want if the goal is to represent clutter accurately. Some scan-window overlap is understandable, but too much overlap may be unrealistic, and it potentially hurts our measured performance because one bad piece of pillow counts against us over and over again during testing.
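For reference, a minimal Python sketch of how the ROC curve described above can be computed from the raw SVM scores, and how the worst false positives can be pulled out (numpy is an assumption; the actual code is Greg's):

    import numpy as np

    def roc_from_scores(cat_scores, clutter_scores):
        # Sweep the SVM decision threshold from right (high) to left (low) and record,
        # at each step, the fraction of clutter images above threshold (false positives)
        # and the fraction of cat images above threshold (true positives).
        cat_scores = np.asarray(cat_scores)
        clutter_scores = np.asarray(clutter_scores)
        thresholds = np.sort(np.concatenate([cat_scores, clutter_scores]))[::-1]
        fp = np.array([(clutter_scores >= t).mean() for t in thresholds])
        tp = np.array([(cat_scores >= t).mean() for t in thresholds])
        return fp, tp

    def worst_false_positives(clutter_scores, clutter_ids, k=5):
        # Clutter images with the highest SVM scores, most cat-like first.
        order = np.argsort(clutter_scores)[::-1]
        return [clutter_ids[i] for i in order[:k]]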
One thing we would like to plot is performance vs. Ntrain for a given level of false positives that is deemed acceptable. For now here is a quick-and-dirty version. I'll try to come up with something prettier later.
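As a sketch of what the prettier version could compute (assuming per-Ntrain ROC arrays like those returned by roc_from_scores above, and matplotlib for the plot):

    import numpy as np
    import matplotlib.pyplot as plt

    def tp_at_fp(fp, tp, target_fp=0.01):
        # True positive rate at the lowest threshold whose false positive
        # rate is still within the acceptable level (e.g. 1%).
        ok = np.where(fp <= target_fp)[0]
        return tp[ok[-1]] if len(ok) else 0.0

    def plot_perf_vs_ntrain(rocs, target_fp=0.01):
        # rocs: dict mapping Ntrain -> (fp, tp) arrays
        ns = sorted(rocs)
        perf = [tp_at_fp(*rocs[n], target_fp=target_fp) for n in ns]
        plt.plot(ns, perf, 'o-')
        plt.xlabel('Ntrain')
        plt.ylabel('true positives @ %.1f%% false positives' % (100 * target_fp))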
Experiment 3: test scale invariance
Changes as compared to the other 2 experiments:
- Cat faces are no longer exactly 64 pixels across.
- Crop size is constant, face size is not
- More like a real scan strategy
- Generate a tuning curve charting performance as a function of cat scale
- Fix an acceptable rate of false positives based on scan strategy.
- We need to decide what is reasonable. 0.1%? 1%? (See the back-of-the-envelope numbers after this list.)
- Now for a range of Ntrain values
- Plot cat scale on the x axis and true positives on the y axis
- What scales can we tolerate, while maintaining acceptable performance?
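To make that choice concrete (the scan parameters here are purely illustrative, not decided): scanning a 640x480 image with a 64x64 window at an 8-pixel stride over 5 scales gives on the order of 73 x 53 x 5 ≈ 19,000 windows per image, so a 0.1% per-window false positive rate already means roughly 19 false alarms per image, and 1% means about 190. The acceptable per-window rate therefore depends strongly on how densely we scan and on how much post-processing (e.g. non-maximum suppression) cleans up afterwards.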
To create the tuning curve, we will need an auxiliary file which contains the scale of the cat (face) in each of the cat training and test images.
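A minimal sketch of what that auxiliary file and the resulting tuning curve could look like (the one-line-per-image "<image_name> <face_width_in_pixels>" format and the binning are my assumptions, not decisions):

    import numpy as np
    import matplotlib.pyplot as plt

    def load_scales(path):
        # Assumed format: one line per image, "<image_name> <face_width_in_pixels>"
        scales = {}
        with open(path) as f:
            for line in f:
                name, width = line.split()
                scales[name] = float(width)
        return scales

    def tuning_curve(cat_names, cat_scores, scales, threshold, n_bins=8):
        # Fraction of cats detected (score >= threshold, with the threshold fixed by the
        # acceptable false positive rate) as a function of cat face size in pixels.
        sizes = np.array([scales[n] for n in cat_names])
        detected = np.asarray(cat_scores) >= threshold
        bins = np.logspace(np.log10(sizes.min()), np.log10(sizes.max()), n_bins + 1)
        idx = np.clip(np.digitize(sizes, bins) - 1, 0, n_bins - 1)
        centers = np.sqrt(bins[:-1] * bins[1:])  # geometric bin centers
        rates = [detected[idx == b].mean() if (idx == b).any() else np.nan
                 for b in range(n_bins)]
        plt.semilogx(centers, rates, 'o-')
        plt.xlabel('cat face size (pixels)')
        plt.ylabel('true positive rate at the chosen threshold')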