Pollen Notebook, Fall 2008
Oct 7
Evaluating The Appearance Model
cd ~/pollen/mar/sift8; cats=getcategories; cats([6 7 10 14 16])=[];
histpollen('../hist8/1',cats,50,50,200); trainpollen(50,50,200);
svm1= testpollen(50,50,200);
share pollen_svm_Oct7 svm1
I generate 4 such confusion matrices svm[1234] in the corresponding directories hist8/[1234] and average them to get the confusion matrix shown here:
conf=(svm1.conf+svm2.conf+svm3.conf+svm4.conf)/4; svm= svm1; viewconfuse(svm);
Categories eucalyp, fungus, liqamber, palm, and poplar were removed because they had fewer than 100 training examples.
Bottom line:
- This looks reasonably good to me, especially since we're only using ntrain=50 (the real model uses ntrain=200).
- If any of the 5 categories that were left out are a problem, maybe makesvm is having some problem with categories smaller than ntrain=200. Possible reasons:
  - Bug in makesvm
  - Expected SVM bias, which should be corrected (see variable ntrain_bias in classifypollen.m)
Bug Fix
In classifypollen the appearance model wasn't doing anything, due to a problem in getsift if the input img is not of type uint8. See the subversion log for details (revision 1877); a sketch of the kind of guard involved follows.
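For reference, a minimal sketch of the general shape of such a fix (my assumption, not the actual revision-1877 diff; requires the Image Processing Toolbox):

% Hypothetical guard: getsift assumed uint8 input, so non-uint8 images
% silently produced bad descriptors.
if ~isa(img,'uint8')
    img= im2uint8(mat2gray(img));   % rescale to [0,1], then convert to uint8
end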
March Counts
cd ~/pollen/data/03-25-08; files= findpollen();
tic; [g,t]= classifypollen(files,'displaymode',1,'shape_bias',xxx,'clutter_bias',xxx); toc/3600
With code revision 1878 and the following parameter values:
id | machine | shape_bias | clutter_bias |
1 | vision315 | .25 | .20 |
2 | vision305 | .40 | .20 |
3 | vision303 | .25 | .40 |
4 | vision302 | .40 | .40 |
5 | vision301 | .25 | .60 |
6 | vision308 | .40 | .60 |
7 | vision107 | .25 | .80 |
8 | vision121 | .40 | .80 |
Variables g[1-8] and t[1-8] can be obtained via take pollen_mar_Oct8.
Results:
>> viewcount({'expert','1','2','3','4','5','6','7','8'},t1,g1,g2,g3,g4,g5,g6,g7,g8)
          expert    1    2    3    4    5    6    7    8
 1 birc:       3   12   14   15   15   13   12    8   14
 2 bubb:      --  306  189  289  180  261  170  247  143
 3 chen:      --    1    0    1    0    1    0    1    0
 4 clut:      --  712  823  844  907  957  990 1031 1058
 5 cypr:       3   20   14   16   15   12   10   11   11
 6 euca:       1    3    1    2    2    0    0    1    1
 7 fung:      --   92   55   58   35   25   21   19   13
 8 gink:       3   83  119   67   96   66   80   64   82
 9 gras:       2   17   24   14   24   18   22   14   23
10 liq.:       3    7    0    4    1    3    1    2    3
11 liqa:      --    7    0    4    1    3    1    2    3
12 mulb:      37   97  141   81  118   66  103   53   85
13 oak:       57   56   73   51   66   50   65   43   52
14 oliv:      --   76   39   58   36   37   32   25   24
15 palm:      --    0    0    0    0    0    0    0    0
16 pine:       8  100   80   85   76   74   71   62   71
17 popl:       7    0    0    0    0    0    0    0    0
18 sage:       8    3    3    2    3    1    4    2    4
19 syca:      13   --   --   --   --   --   --   --   --
20 unkn:      47   --   --   --   --   --   --   --   --
21 waln:       3    7   10    5    9    5    9    6    9
September Counts
Preprocessing August data on vision311 and vision310:
cd ~/pollen/aug/images8; siftpollen('../sift8',getcategories,xxx,2);
histpollen('../hist8',getcategories,200,0,200); trainpollen(200,0,200);
The intended target is the 9-22-08 set, which (I believe?) was actually taken on 8-28-08 (confirm this with Jim). For now I am just creating a Chinese elm detector, with the following categories in ~/pollen/aug/images8:
1 bubble 2 chinelm 3 clutter 4 fungus
Things to Try Next
1. Lots of errors at the edges, where the pollen got truncated. Might get better counts by removing those detections that fell near the edge. In principle these images don't overlap... but is this true? (confirm)
2. Now that the appearance model bug is fixed, try reinstating the following in classifypollen:
possible= unique([guesses{i} clutter.id]);
3. I still see it picking up a lot of large dim objects. Maybe a good statistic would be mean brightness, i.e. total luminosity / total area? A large dim object (e.g. a smudge) is more suspicious than a small dim object (e.g. a mulberry). A sketch of the idea follows this list.
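A minimal sketch of the proposed statistic, assuming img is a grayscale image and mask is the binary support of one detection (both names are stand-ins):

% Mean brightness of a detection: total luminosity / total area.
lum  = sum(double(img(mask)));   % total luminosity inside the detection
area = nnz(mask);                % detection area in pixels
meanbright = lum/area;           % low values over large areas suggest smudges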
Interesting Test Image
Image 72+/-1 (for the March files I'm running counts on) is really densely clumped. This could be useful for seeing (in a single image) whether my clutter_bias or other parameters are in the right ballpark. I already get the feeling (looking at this) that my clutter_bias may need to be larger than .60.
Oct 8
Continuing to Explore Parameter Space
id | machine | shape_bias | clutter_bias |
9 | vision315 | .25 | 1.2 |
10 | vision305 | .40 | 1.2 |
11 | vision303 | .55 | 1.2 |
12 | vision302 | .25 | 1.8 |
13 | vision301 | .40 | 1.8 |
14 | vision308 | .55 | 1.8 |
15 | vision107 | 0 | 1.2 |
16 | vision121 | 0 | 1.8 |
>> viewcount({'expert','16','14','13','12','11','10','9','8','7','6','5','4','3','2','1'},t1,g16,g14,g13,g12,g11,g10,g9,g8,g7,g6,g5,g4,g3,g2,g1)
          expert   16   14   13   12   11   10    9    8    7    6    5    4    3    2    1
 1 birc:       3    0    9    4    0   13   12    6   14    8   12   13   15   15   14   12
 2 bubb:      --  225   95  123  155   94  131  207  143  247  170  261  180  289  189  306
 3 chen:      --    0    0    0    0    0    0    0    0    1    0    1    0    1    0    1
 4 clut:      -- 1338 1219 1286 1348 1096 1156 1197 1058 1031  990  957  907  844  823  712
 5 cypr:       3    2    4    3    2    6    6    3   11   11   10   12   15   16   14   20
 6 euca:       1    0    0    0    0    0    0    0    1    1    0    0    2    2    1    3
 7 fung:      --    1    0    0    1    2    2    4   13   19   21   25   35   58   55   92
 8 gink:       3    3   41   30   11   71   66   41   82   64   80   66   96   67  119   83
 9 gras:       2    8   24   21   14   33   15   11   23   14   22   18   24   14   24   17
10 liq.:       3    0    0    0    0    0    2    1    3    2    1    3    1    4    0    7
11 liqa:      --    0    0    0    0    0    2    1    3    2    1    3    1    4    0    7
12 mulb:      37    2   43   30   17   87   55   30   85   53  103   66  118   81  141   97
13 oak:       57    0   52   30   13   61   51   30   52   43   65   50   66   51   73   56
14 oliv:      --    1   15    8    3   25   19   13   24   25   32   37   36   58   39   76
15 palm:      --    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0
16 pine:       8   11   76   56   29   87   69   49   71   62   71   74   76   85   80  100
17 popl:       7    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
18 sage:       8    0    2    1    1    3    2    0    4    2    4    1    3    2    3    3
19 syca:      13   --   --   --   --   --   --   --   --   --   --   --   --   --   --   --
20 unkn:      47   --   --   --   --   --   --   --   --   --   --   --   --   --   --   --
21 waln:       3    0    6    4    2   11    7    6    9    6    9    5    9    5   10    7
What seems to work best so far is 16, where there is no shape bias at all. Maybe instead of the shape-bias approach, I should use the shape information (i.e. the nearest-neighbor filter on loop and brightness parameters) to restrict the search carried out by the appearance model; a sketch of that restriction follows.
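A minimal sketch of the restriction idea, with assumed names (shape_nn for the indices of the nearest training loops, margins for the per-category appearance scores; neither is the notebook's actual variable):

% Let the shape filter propose candidate categories; the appearance
% model then decides among those candidates only.
cand = unique(cats(shape_nn));   % categories of the nearest shape neighbors
[~,k] = max(margins(cand));      % best appearance margin among candidates
guess = cand(k);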
So try this:
- switch to pairwise svm, see if this gets similar results
- switch to using bias as listed above (which, at the moment, only works in pairwise mode I think?)
Notes (from Meeting w/ Jim)
Need at least a smattering of small features to catch things like beaded walls (e.g. olive). At the other extreme: perhaps they are well-described by just one BIG sift feature? I need to consider feature size more carefully.
The word "weed" includes: sagebrush (artemisia), chenopod, ambrosia (ragweed), rumex
Oct 9
Modified classifypollen
Code revision 1883 includes the following modifications:
- New attribute strict can have 3 values:
  - 0: nearest-neighbor stage merely weights classifications
  - 1: nearest-neighbor filters the final classification
  - 2: same as 1, but the clutter category is also filtered
- Default values (first 3 seen below):
defaults= struct( ...   % List of image files (if numeric, ...
    'clutter_bias', 1.00 , ...
    'shape_bias'  , 0.20 , ...
    'radius_bias' , 5    , ...
Tests
cd ~/pollen/data/03-25-08
files= findpollen(); [g,t]= deal({});
for i=1:3, [g{i},t{i}]= classifypollen(files,'strict',i-1, xxx); end
Where xxx is replaced with the following parameters:
- vision310: 'radius_bias',3
- vision309: 'radius_bias',5
- vision302: 'radius_bias',5,'clutter_bias',0.8
This gives a total of 9 values, stored in pollen_mar_Oct10.
Results
>> viewcount({'expert','310 0','310 1','310 2','308 0','308 1','308 2','302 0','302 1','302 2'},t310{1},g310{1},g310{2},g310{3},g308{1},g308{2},g308{3},g302{1},g302{2},g302{3});
          expert  310 0  310 1  310 2  308 0  308 1  308 2  302 0  302 1  302 2
 1 birc:       3      6      6     21      7      6     20      5      9     22
 2 bubb:      --     86     19     20     81     28     26     98     30     30
 3 chen:      --      1      2     12      2      0     10      1      1     11
 4 clut:      --    844    916    691    866    922    701    772    861    688
 5 cypr:       3     16     19     31     16     20     31     19     21     28
 6 euca:       1      0      1      2      0      2      2      2      2      5
 7 fung:      --      7      0      0      8      0      0     10      0      1
 8 gink:       3      8      2      4      6      1      4      7      4      7
 9 gras:       2      3      6     12      2      7     12      3      4     12
10 liq.:       3      2      3     13      1      2      9      1      2      9
11 liqa:      --      2      3     13      1      2      9      1      2      9
12 mulb:      37     87     99    199     78     86    189    113    120    191
13 oak:       57      8     11     31     11     17     32     19     21     31
14 oliv:      --     29     21     47     19     17     43     31     23     49
15 palm:      --      0      0      2      1      0      1      0      0      2
16 pine:       8     46     35     33     46     32     35     59     39     39
17 popl:       7      0      0      0      0      0      0      0      0      0
18 sage:       8      5      5     16      4      5     19      5      6     17
19 syca:      13     --     --     --     --     --     --     --     --     --
20 unkn:      47     --     --     --     --     --     --     --     --     --
21 waln:       3     10     12     25      9     12     28     12     17     26
Oct 12
Improvements to implement:
- Make sure training (and testing?) sets don't have interrupted loops
  - current center is just the mean of contour points
  - but contour points may not be smoothly distributed (i.e. gaps)
  - this can happen also with "post-pinching"
  - also caused by training sets being cropped (when running getstats)
  - determine radius, ellipticity and jaggedness in a less fragile way?
  - maybe compute loop stats in a totally different way (i.e. use fft of r, first few components)
- Of nearest n neighbors, remove those that are too far away
  - adds an additional parameter nnlimit
- Use pairwise multi-class comparisons and limit the pairs
  - better discriminative power?
- Add old clutter set back in as additional category (see mar/images9) DONE
  - call it clutter1 and the current one "clutter2"
  - right now nn filtering stage often misses the clutter category
- Remove detections that overlap right and bottom edges FIXED
  - currently only removing those on left and top
  - their ellipticity is wrong, misclassified as pine/eucalyptus??
- Better choice of sift grid DONE
  - regular scales instead of random
  - how discriminative is it to just use the entire box as one big feature?
- Different appearance models based on large, medium and small features
  - lower cutoff threshold (right now misses, for example, structure inside pine)
  - different cutoffs for getsift if it is a file or cropped image?
SIFT feature extraction (#6 above)
Going to try extracting features at 3 different scales and generating 3 different appearance models. This will have a few advantages:
- Find out which level (if any) is doing the most work
- Capture a broader range of scales without "diluting" them during matching
  - For example, Jim often points at very small features in walls
- Each scale will now have its own customized vocabulary
- May need fewer features overall, therefore faster extraction at test-time
getsift(f,'scales',[16 14],'s',[12 12],'frame',.5);
getsift(f,'scales', [8 7],'s',[18 18],'frame',.2);
getsift(f,'scales', [4 3],'s',[24 24],'frame',0);
The resulting feature grids are shown in the accompanying figures (not reproduced here). See siftpollen.m, which keeps backwards compatibility with the current syntax:
switch feature_mode
  case 0  % features used originally
    scales= [6 2];   grid= [48 48]; frame= 1.0;
  case 1  % larger scale (worked better)
    scales= [8 4];   grid= [48 48]; frame= 1.0;
  case 2  % new large scale
    scales= [16 14]; grid= [24 24]; frame= 0.5;
  case 3  % new medium scale
    scales= [8 7];   grid= [24 24]; frame= 0.2;
  case 4  % new small scale
    scales= [4 3];   grid= [24 24]; frame= 0.0;
end
Note: grid values must be divisible by 4 in SPM (presumably so the 4x4 pyramid level divides the grid evenly).
Testing these features now:
cd ~/pollen/mar/sift9/large;
cats=getcategories; cats([7 8 11 15 17])= [];
rand_seed(1);   % very important if summing match kernels later!
histpollen('../hist9/large/1',cats,50,50,100); trainpollen(50,50,100);
svm1= testpollen(50,50,100);
...and so on for medium and small. The thinking is that, since the features are not being smeared over many scales, nwords=100 will now capture enough distinctive features (not to mention, it's faster).
Oct 14
Meeting with Jim
On meeting with Jim and talking this over, I think I have a notion of why the small features seem to contribute very little. I'm adding a parameter thresh=0.5 which will keep it from matching images based on how much whitespace they contain. As you might imagine, this will vary a bit between the training set (where boxes are determined by a human) and the testing set.
See examples below:
f='palm/April06c_00190.jpg';
getsift(f,'scales', [4 3],'s',[24 24],'frame',0,'thresh',xxx);
Note that larger-scale features are hardly affected because the gaps are usually so small that bigger features don't see them.
I'm going to use thresh=0.5 because it seems important to keep some inner structure (in the mulberry for example). The disadvantage would be a little less emphasis on the pollen walls. But still much more emphasis than you get with thresh=0.0!
Nevermind
Turns out I was already using thresh=0.9 (in both siftpollen and classifypollen). This should really be putting the emphasis on the pollen walls already. I could still try thresh=0.5, but it is doubtful that this would make a huge difference. Might help with things like mulberry, which is currently a problem category.
Going to run sift9/wide2, which corresponds to feature_mode=6 and focuses a little more on the interior. Just for comparison. Also switching from scale=[16 4] to scale=[12 4] in the hopes of using interior texture to some benefit.
Loop Statistics
Much-improved loop statistics now make use of fapfft and capture the first few Fourier components of the radius.
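For reference, a sketch of the underlying idea (fapfft itself is not reproduced here; x and y are assumed to be the contour points of one loop):

% Describe a loop by the low-order Fourier components of its radius
% function; this is less fragile than pointwise radius/jaggedness stats.
cx= mean(x); cy= mean(y);
theta= atan2(y-cy, x-cx);
r= hypot(x-cx, y-cy);
[~,i]= sort(theta); r= r(i);   % order the points around the loop
R= abs(fft(r));
loopstats= R(2:5)/R(1);        % first few components, scale-normalized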
cd ~/pollen/mar/images9; cats= getcategories; makestats(cats,500);
One rough way to see if this is working is to plot just the first 3 principal components:
load stats.mat
[xmean,xstd]= deal(mean(stats),std(stats));
stats= rescale(stats,xmean,xstd);
[eigenvecs,z,eigenvals]= princomp(stats);
newstats= stats*eigenvecs;
f1= find(cats<=10); f2= find(cats>10);
hold off;
scatter3(newstats(f1,1),newstats(f1,2),newstats(f1,3),20,cats(f1),'filled');
hold on;
scatter3(newstats(f2,1),newstats(f2,2),newstats(f2,3),20,cats(f2)-10);
plot(eigenvals,'*');
The thought here is that, since the first 4 components contain most of the information, I should do the nearest-neighbor search in this space; a sketch follows. In fact, as long as we're going to bother projecting the stats into another space, we might as well try Fisher LD.
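A minimal sketch of that search, reusing the names from the code above (query, a row of raw stats for one detection, is an assumed name):

% Project the query with the same whitening and eigenvectors, then find
% nearest neighbors among the first 4 principal components only.
q= rescale(query,xmean,xstd)*eigenvecs(:,1:4);
d= sum((newstats(:,1:4)-repmat(q,size(newstats,1),1)).^2, 2);
[~,nn]= sort(d);
candidates= cats(nn(1:10));    % categories of the 10 nearest loop shapes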
Oct 15
New nearest-neighbor model (using PCA-ed loop statistics) is kicking ass.
3-25-08_1036.tif: example of finding a mulberry that was missed?
Next:
- shape model apparently lacks big bubble pieces (3-25-08_1170.tif). Keeps calling pine.
- or, alternately, add clutter_bias to clutter1, clutter2 AND bubble
- add sycam to training set, remove olive and palm
301: cd ~/pollen/mar/images10; siftpollen('../sift10',getcategories,0,2,5);
305: cd ~/pollen/mar/images10; siftpollen('../sift10',getcategories,1,2,5);
Note: have not added big bubble pieces yet, but adding them will only mean rerunning siftpollen for a quick incremental update.
Oct 16
Final Flight Check: Appearance Model
[greg@vision304 images10]$ cat DUWC
  627 birch
  105 bubble
  102 chenop
 3492 clutter1
  803 clutter2
  381 cypress
   64 eucalyp    % < 100
   78 fungus     % < 100
  241 ginkgo
  562 grass
   62 liqamber   % < 100
  576 mulberry
  483 oak
 1942 pine
   53 poplar     % < 100
  197 sageb
  452 sycam
  540 walnut

>> showstrings(categories,2)
 1 birch       2 bubble
 3 chenop      4 clutter1
 5 clutter2    6 cypress
 7 eucalyp     8 fungus
 9 ginkgo     10 grass
11 liqamber   12 mulberry
13 oak        14 pine
15 poplar     16 sageb
17 sycam      18 walnut
cd ~/pollen/mar/sift10/;
cats=getcategories; cats([7 8 11 15])= [];   % kill categories too small to train/test
rand_seed(1);
histpollen('../hist10/1',cats,50,50,200); trainpollen(50,50,200);
svm1= testpollen(50,50,200);
...and so on for run 2, yielding the mean confusion matrix at left. Presumably the actual appearance model will have better performance because it uses Ntrain=200.
Generate New Improved Appearance and Shape Models (March)
If we don't need to split datasets into train/test, we can now afford to use more training examples:
% appearance model:
cd ~/pollen/mar/sift10/; cats=getcategories; rand_seed(1);
histpollen('../hist10/1',cats,200,0,200); trainpollen(200,0,200);

% shape model:
cd ~/pollen/mar/images10; cats=getcategories; rand_seed(1);
makestats(getcategories,200);
There's nothing sacred about rand_seed; I'm just trying to make the results reproducible.
Also, there's some concern that if I used npercat=1000 in makestats (as I originally did) it heavily biases large categories, to the exclusion of those with <100 elements (including fungus and bubble).
For the record: I'm at subversion code revision 1896.
Oct 17
Last Night's Counts
cd ~/pollen/data/03-25-08
files= findpollen();
[gxxx,txxx]=classifypollen(files,'displaymode',false,'verbose',0);
With the following extra parameters:
- vision310: (defaults)
- vision304: 'shape_bias',.25
- vision302: 'clutter_bias',.5
- vision112: 'radius_bias',10
This gives a total of 4 values, stored in pollen_mar_Oct17:
>> viewcount({'expert','310','304','302','121'},t310,g310,g304,g302,g121);
              expert   310   304   302   121
 1 birch:          3     6     8     6    10
 2 bubble:        --    23    23    22    25
 3 chenop:        --     2     0     0     0
 4 clutter1:      --    84    97    80    92
 5 clutter2:      --   505   515   540   463
 6 cypress:        3     8     8     5     9
 7 eucalyp:        1     2     1     2     4
 8 eucalyp.:       1    --    --    --    --
 9 fungus:        --     1     2     1     2
10 ginkgo:         3    10    11     7     8
11 grass:          2     5     6     4     5
12 liq.ambe:       3    --    --    --    --
13 liqamber:      --     2     1     1     0
14 mulberry:      37    36    25    28    29
15 oak:           57    32    29    24    30
16 pine:           8    16    14    14    20
17 poplar:         7     0     0     0     0
18 sageb:          8     3     3     2     3
19 sageb.:         8    --    --    --    --
20 sycam:         13    12    10     9     7
21 sycam.:        13    --    --    --    --
22 unknown:       47    --    --    --    --
23 walnut:         3     3     4     4     6
The "Final" March Count
None of the parameters I added last night helped performance, so I'll stick with the default (that ran on vision310):
category | expert | computer |
birch | 3 | 6 |
bubble | -- | 23 |
chenop | -- | 2 |
clutter1 | -- | 84 |
clutter2 | -- | 505 |
cypress | 3 | 8 |
eucalyp | 1 | 2 |
fungus | -- | 1 |
ginkgo | 3 | 10 |
grass | 2 | 5 |
liqamber | 3 | 2 |
mulberry | 37 | 36 |
oak | 57 | 32 |
pine | 8 | 16 |
poplar | 7 | 0 |
sageb | 8 | 3 |
sycam | 13 | 12 |
unknown | 47 | -- |
walnut | 3 | 3 |
Revised Appearance Model
Using sift features of type 1 instead of type 5 (I'm not sure the type 5 features are performing very well).
cd ~/pollen/mar/sift10/1;
cats=getcategories; cats([7 8 11 15])= [];   % remove smaller categories
rand_seed(1);
histpollen('../hist10/1/1',cats,50,50,200); trainpollen(50,50,200);
svm1= testpollen(50,50,200);
Here is the older Oct 16 model performance, for comparison.
I have yet to try this on the final counts, to see if they improve.
Meanwhile in August, Month of the Chinese Elm
classifypollen(files,'traindir','aug','histdir','hist8/1', ...
    'statdir','images8','shape_bias',.25,'showloops',false,'htmldir','html');
Nov 18 : Stuff for Demo
March 2008
Rerunning sift features from scratch, just to make sure.
- images20: a copy of images8, except that clutter has been renamed clutter2, and images7b/clutter has been added as clutter1. (This was previously done in a hacky way, with links, in mar/images10.)
- images21: same as above, except that oak of questionable size has been moved to oak/questionable so that it is no longer part of the training set.
cd ~/pollen/mar/images20; rand_seed(1);
makestats(getcategories,400);
siftpollen('../sift20',getcategories);
histpollen('../hist20/1',getcategories,200,0,200); trainpollen(200,0,200);
Classification next:
cd ~/pollen/data/03-25-08
files= findpollen();
displayargs= {'displaymode',false, 'verbose',0};
classifyargs= {'traindir','mar', 'histdir','hist20/1', 'statdir','images20', ...
               'trainsuffix','200_000_200'};
[gxxx,txxx]=classifypollen(files,displayargs{:},classifyargs{:});
Now repeat the same thing, replacing 20 with 21. This will tell me if tweaking the oak training set helped at all.
Demo
For the following sets:
- Aug 2008
- Mar 2008
- Jan 2008
show:
- classifypollen in action (realtime)
- Final histogram counts
cd ~/pollen/data/09-22-08
files= findpollen();
classifyargs= { 'traindir' 'aug' ...
                'histdir' 'hist8' ...
                'statdir' 'images8' ...
                'shape_bias' .25 ...
                'trainsuffix', '200_000_200' };
classifypollen( files, classifyargs{:});
take pollen_aug_Nov17
taug.types{1}= 'chinelm';
viewcount({'expert','computer'},[1 3 5 9],taug,gaug); zlim([0 120])
cd ~/pollen/data/03-25-08
files= findpollen();
classifyargs= { 'traindir' 'mar' ...
                'histdir' 'hist20/1' ...
                'statdir' 'images20' ...
                'trainsuffix', '200_000_200' };
classifypollen( files, classifyargs{:});
take pollen_mar_Oct17
viewcount({'expert','computer'},[2 4 5 8 13 19 21 22],t310,g310); zlim([0 60]);
take pollen_mar_Nov18
viewcount({'expert','computer'},[2 4 5 8 13 19 21 22],t20,g21b); zlim([0 60]);
March: What Now?
- Which is the best appearance + shape model?
  - Benchmark with jan/feb/mar datasets
    - Size
    - Appearance
    - Size + Appearance
- How to make improvements?
  - Dataset modifications?
  - Use nbnn + spm?
  - Parameter search
  - Add different features?
- Simplify
  - Instead of generating separate jan/feb/mar etc. directories
  - Have one 1-vs-1 model, and mark unwanted categories
  - Sanity check: does 1-vs-1 work as well as 1-vs-all?
Running the Ubermodel
On 16 machines at once (approx 30 minutes per machine):
cd ~/pollen/images
siftpollen('../sift',getcategories,0,16);
On one of the 8-processor machines, histogramming takes about 15 minutes (utilizing ~600% of the CPU) and generating the match kernel takes another 15 minutes (~430-530% utilization). That's about 25-28 kilomatches per second, with one other process running on the machine besides mine.
matlabpool   % using MATLAB 2009a I now get 8 labs
cd ~/pollen/sift
rand_seed(1);
histpollen('~/pollen/hist/1',getcategories,200,0,200);
trainpollen(200,0,200);
Unfortunately I am not yet utilizing parfor in makesvm or in any of its daughter routines, so its utilization is still just 100%. A sketch of the kind of change needed follows.
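A minimal sketch, assuming the per-classifier training iterations in makesvm are independent (trainone and data are stand-in names, not real routines):

% Independent iterations can become a parfor and use all 8 labs.
results= cell(1,n);
parfor i= 1:n
    results{i}= trainone(data{i});
end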
Testing the Appearance Model
>> cd ~/pollen/images
>> cats= getcategories;
>> showstrings(cats)
>> !duwc [a-z]*
 1 alder          723 alder
 2 ash            555 ash
 3 birch          627 birch
 4 bubble         105 bubble
 5 chenop         102 chenop
 6 chinelm       2596 chinelm
 7 clutter1      3492 clutter1
 8 clutter2       803 clutter2
 9 crepemyrtle     19 crepemyrtle   % <100
10 cypress        381 cypress
11 eucalyp         64 eucalyp       % <100
12 fungus          78 fungus        % <100
13 ginkgo         241 ginkgo
14 grass          562 grass
15 jacaran         79 jacaran       % <100
16 liqamber        62 liqamber      % <100
17 mulberry       576 mulberry
18 oak            420 oak
19 olive          842 olive
20 palm            85 palm          % <100
21 pecan           12 pecan         % <100
22 pine          1942 pine
23 poplar          53 poplar        % <100
24 sageb          197 sageb
25 sycam          452 sycam
26 walnut         540 walnut
Categories 9,11,12,15,16,20,21,23 are too small.
cd ~/pollen/sift; rand_seed(1);
cats=getcategories; cats([9 11 12 15 16 20 21 23])=[];
histpollen('~/pollen/hist/1',cats,50,50,200); trainpollen(50,50,200);
svm1= testpollen(50,50,200);
Groupings
What sort of groupings are found based on interconfusion?
cd ~/pollen/hist/1
[conf,cats]= fromstruct(svm1,'conf','categories');
clist= 1:length(cats);
groups= makegroups(conf,clist,2,5);
viewgroups(conf,groups([1 4 3 2 5]),cats);
NBNN
cd ~/pollen/sift; rand_seed(1);
cats=getcategories; cats([9 11 12 15 16 20 21 23])=[];
makenn('~/pollen/nn/1',cats,50,50);
makenbnn('.',50,50,25);
runnbnn(50,50,25);
optionally,
cd ~/pollen/sift;
makenn('~/pollen/nn/1',cats,50,50,'teston','train');
makenbnn('.',50,50,25,'teston','train');
runnbnn(50,50,25,'teston','train');
Non-test Model: The real deal
Just to recap:
% appearance model:
cd ~/pollen/mar/sift; cats=getcategories; rand_seed(1);
histpollen('~/pollen/hist/1',cats,200,0,200); trainpollen(200,0,200);

% shape model:
cd ~/pollen/mar/images; cats=getcategories; rand_seed(1);
makestats(getcategories,500);
How to improve on the old way?
- Add NBNN
- Combine multiple SPM models (but are they entirely redundant??)
- Add different features
02-19-09
Class | Linecount | LabelGUI | Computer v1 | Computer v2 | Computer v3 |
alder | 2 | 1 | | | |
ash+privet | 4+6 | 3 | | | |
birch | 3 | 2 | | | |
cypress | 5 | 4 | 5 | 4 | 3 |
eucalyp | 3 | 1 | | | |
grass | 2 | NA | 3 | 1 | |
olive | 1 | 1 | 1 | | |
oak | 3 | 1 | 1 | 1 | |
palm | 4 | 1 | | | |
pine | 13 | 7 | 16 | 17 | 6 |
poplar+cottonwood | 5 | 4 | | | |
sycam=planetree | 3 | 6 | 6 | 6 | 3 |
unknown | 2 | 8 | NA | | |
v2: ntrain_bias=0.5; shape_bias=0.2; clutter_bias=0.0;
v3: ntrain_bias=0.5; shape_bias=0.2; clutter_bias=-0.0
Here is a more complete record of the options that are in use for the above classifypollen tests.
April 7, 2009
load model.mat
boxes= classifypollen(findpollen,'verbose',true,'clutter_bias',0,...
    'shape_bias',1.0,'possible',possible,'strict',1,...
    'radius_bias',3,'outdir','images');
April 10, 2009
- Cross-validation (see the sketch after this list)
  - Use half the dataset to classify the other half.
  - Do this repeatedly and gather statistics.
  - Iteratively remove the worst offenders.
    - What do they look like?
- Shape stats before and after the above process
  - Are the stat distributions tighter?
  - Can stats be used to cull the training data even more?
    - Iteratively remove n-sigma outliers?
- Try new dataset against Jim's most recent datasets (02-19-09, etc.)
  - Better performance?
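A minimal sketch of the cull loop, under assumed names (classifyhalf is a stand-in: train on one half, return the indices of misclassified held-out files):

% Repeated 50/50 splits; tally how often each file is misclassified
% when held out, then drop the worst offenders.
errcount= zeros(1,numel(files));
for trial= 1:20
    idx= randperm(numel(files));
    held= idx(1:floor(end/2));
    wrong= classifyhalf(files,held);             % hypothetical helper
    errcount(wrong)= errcount(wrong)+1;
end
[~,order]= sort(errcount,'descend');
offenders= order(1:round(0.05*numel(files)));    % e.g. cull the worst 5%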
Make master hist file (step 1)
cd ~/pollen/sift
cats= getcategories
files= pickfiles(cats);
makehist('~/pollen/spm',{},'999_999_200.mat',files,{});
Investigate data pruning based on size
n= max(cats);
for i=1:n
  f= find(cats==i);
  subplot(13,2,i);
  hist(stats(f,1),0:150); xlim([0 150]);
  title(categories{i});
end
scalesubplots(1.2,2.0);
There are some questionable areas outlined in red. I need to
- scale these into units of microns
- overlay the limits we found in the classification manual.
- should I prepend the radius (in microns) to the filename, for convenience?
sizenames
cd ~/pollen/images; sizename('~/pollen/sizes');
Then I renamed sizes as images and put the old images in 2009/Apr10.
This allows me to easily sort by size (in gthumb) and remove size outliers that are pointed out in the histograms to the right.
Side note: one unintended consequence of this is that it makes it easy to spot duplicates. For example, there was a duplicate in ash and multiple duplicates in alder.
Problem with duplicates
Holy crow, now that I'm sorting by size with the new filename scheme, I can see that there are a ton of duplicates in some categories. Let's try to find these later: we should find, for example, that
03457_Jan06_00635.jpg
03574_Jan06_01363.jpg
03631_Jan06_00083.jpg
03724_Dec05_NoPine_00078.jpg
are all duplicates.
Conclusion: solve this later in the match stage by looking for abnormally high match kernel values.
Pruning
Meanwhile I moved about 340 images into _questionable subdirectories, mostly by looking for questionable files in the lower and upper size range for the category.
Running the New Pruned Data Set
cd ~/pollen/images;
siftpollen('~/pollen/sift',getcategories,0,4);
siftpollen('~/pollen/sift',getcategories,1,4);
siftpollen('~/pollen/sift',getcategories,2,4);
siftpollen('~/pollen/sift',getcategories,3,4);
cd ~/pollen/sift
cats= getcategories
files= pickfiles(cats);
makehist('~/pollen/spm',{},'999_999_200.mat',files,{});
Duplicates
What threshold to use for identifying duplicates? Explore:
cd ~/pollen/spm; load match999_999_200.mat;
thresh= 0.5:0.02:1.0;
n= thresh*0; nthresh= length(thresh);
for i= 1:nthresh
  printf('%d %d\n',i,nthresh);
  [x,y]= findmatch(mtrain,thresh(i));
  n(i)= length(x);
end
In conclusion, 0.6667 isn't bad. This is the new default for findmatch. A sketch of the thresholding idea follows.
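For the record, a minimal sketch of the general approach (my assumption, not necessarily what findmatch actually does internally):

% Normalize the match kernel so self-matches are 1, then flag
% off-diagonal pairs above the threshold as candidate duplicates.
K= mtrain ./ sqrt(diag(mtrain)*diag(mtrain)');
K(1:size(K,1)+1:end)= 0;            % ignore the diagonal
[x,y]= find(triu(K > 0.6667));      % candidate duplicate pairs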
Duplicate Removal
Continuing from the above,
cd ~/pollen/images; pruneduplicates(ftrain(y));
Here are the results:
                 Apr10    Now
 1 alder           723    701
 2 ash             555    551
 3 birch           627    595
 4 bubble          105     88
 5 chenop          102     85
 6 chinelm        2596   2585
 7 clutter1       3492   3430
 8 clutter2        803    795
 9 crepemyrtle      19     16
10 cypress         381    369
11 eucalyp          64     64
12 fungus           78     78
13 ginkgo          241    225
14 grass           562    497
15 jacaran          79     74
16 liqamber         62     57
17 mulberry        576    573
18 oak             420    412
19 olive           842    811
20 palm             85     81
21 pecan            12     12
22 pine           1942   1758
23 poplar           53     53
24 sageb           197    189
25 sycam           363    359
26 walnut          540    525
Testing The Appearance Model: Before and After
Notice that the smallest of the retained categories now has only 85 images, not 100+. So using ntrain=40 instead of 50.
rootdir='~/pollen/2009/Apr10';   % rootdir='~/pollen';
for i=1:10
  chdir(rootdir); chdir('sift'); rand_seed(i); pwd
  spmdir= sprintf('%s/spm/%d',rootdir,i)
  cats=getcategories; cats([9 11 12 15 16 20 21 23])=[];
  histpollen(spmdir,cats,40,40,200);
  trainpollen(40,40,200);
  svm{i}= testpollen(40,40,200);
end
To view that,
svm= {};
for i=1:10
  cd(sprintf('%d',i)); svm{i}= testpollen(40,40,200); cd('..');
end
for i=2:10, svm{1}.conf= svm{1}.conf + svm{i}.conf; end
svm{1}.conf= svm{1}.conf/10;
Overall performance improved only about 3 points (from 63.75% to 66.95%). But this doesn't tell the whole story:
- Out-of-sample data had been contaminated with duplicates of in-sample images
  - ~550 duplicates have been removed
  - Thus the new classification test is more difficult, and
  - Test results should now generalize better to out-of-sample sets
- Removed many images whose shape stats were outliers
  - Better nn shape filter, presumably?
Running Some Actual Models (for use by classifypollen)
for i=1:10
  rootdir='~/pollen'; chdir(rootdir); chdir('sift'); rand_seed(-1); pwd
  spmdir= sprintf('%s/spm/%d',rootdir,i)
  cats=getcategories;
  histpollen(spmdir,cats,100,0,200);
  trainpollen(100,0,200);
end
The idea is to have classifypollen read multiple classifiers and aggregate all their margin scores to see who wins. Does this improve performance? A sketch of the aggregation follows.
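A minimal sketch of the aggregation (the margin field is an assumed name, not classifypollen's actual interface):

% Sum each category's margins over the 10 models, then take the argmax.
total= zeros(1,ncats);
for m= 1:10
    total= total + svms{m}.margin;   % per-category margins from model m
end
[~,guess]= max(total);               % category with the largest summed margin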
Let's try some counts and find out!
f= findpollen;
boxes= classifypollen(f,'verbose',true,'clutter_bias',-0.1,'shape_bias',0.5,...
    'possible',possible,'radius_bias',3,'strict',0);
boxes= classifypollen(f,'verbose',true,'clutter_bias',-0.0,'shape_bias',0.5,...
    'possible',possible,'radius_bias',3,'strict',1);
Appearance Bias
Need a new bias which biases against higher-performing categories. Pine just seems to get called more than other things... there should be something that biases the appearance margins accordingly.
Use the ntrain/ntest/ntrial=40/40/10 results above to establish a useful bias, then multiply the margins by it; a sketch follows. At the moment things like alder and ash have a tough time getting called, because the margins are simply never all that high.
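A minimal sketch of the proposal (per the update below, this was never implemented; the names are assumptions):

% Derive a per-category bias from the 40/40/10 confusion matrix and
% rescale the appearance margins with it before taking the argmax.
recall= diag(svm.conf)./sum(svm.conf,2);   % per-category hit rate
bias= mean(recall)./max(recall,0.05);      % boost rarely-called categories
margins= margins .* bias(:)';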
Update: did not implement this. If the SVM is doing its job, it does not seem like this should matter.
Current Best Parameters?
boxes= classifypollen(f(end:-1:1),'verbose',true,'clutter_bias',0.333333,...
    'shape_bias',1.0,'possible',possible,'radius_bias',1,'strict',0,...
    'ntrain_bias',1.0);
Not very satisfied with the performance here. I need to go back to basics and
- Evaluate shape model separately
- Evaluate appearance model separately
- Need to model blurred/non-blurred versions of each pollen?
- Evaluate in-sample test performance vs. test performance on the 2009 datasets
How about using scores instead of votes?
Is n being incremented in the correct place in pairwise.fwd?
Apr 21
Need to evaluate two things separately, using the latest testing data. And figure out how and why they aren't working.
Out-of-sample Test Set
I'm going to start using the in-sample pre-2007 data for training and test directly against the out-of-sample 2008-2009 data. This is because I often get great performance when dividing pre-2007 data into train/test sets, but it feels like the performance is not generalizing well to the 2008-2009 data.
If that's true, let's figure out why.
For now the model is bogus for all but 02-19-09. We just want to extract test images, so the classifications don't matter.
Note: for now, this just saves the expert boxes, which means no clutter category. To get clutter we would have to take all boxes identified by the computer that were not labeled by the expert. This may be a logical next step. For now...
cd ~/pollen/data/02-19-09
load model.mat
args={'possible',possible,'outdir','images','displaymode',false};
% already did this on a previous date
% boxes= classifypollen(findpollen,args{:});
cd ~/pollen/data/02-22-09; boxes= classifypollen(findpollen,args{:});
cd ~/pollen/data/03-03-09; boxes= classifypollen(findpollen,args{:});
That turns out to be a pretty pitiful sampling, so I did the above for 01-08-08, 03-25-08, and 09-22-08 as well.
Now consolidate into one testing set.
mkdir ~/pollen/out
cd ~/pollen/data
for dir in 02-19-09 02-22-09 03-03-09 03-25-08 09-22-08; do
  echo $dir
  rsync -ax $dir/images/ ~/pollen/out/images/
done
cd ~/pollen/out/images; duwc *
With the result:
  1 alder      33 ash        13 birch       0 chin.
101 chin.elm   15 cypress     6 eucalyp.    3 ginkgo
  4 grass       3 liq.amber  42 mulberry   58 oak
  2 palm      210 pine       14 poplar      3 rumex
  9 sageb.     26 sycam.    120 unknown     3 walnut
Finally, some trivial renaming is required:
cd ~/pollen/out/images
mv chin.elm chinelm
mv eucalyp. eucalyp
mv liq.amber liqamber
mv sageb. sageb
mv sycam. sycam
rm -rf chin.
Then run it like any other database:
cd ~/pollen/out/images
siftpollen('../sift',getcategories);
Note: had to remove bogus files:
ash/3-3-09TL_0367_001.jpg
Finally, merge the in-sample and out-of-sample data into one big dataset. Just as there is 101, 256, and 357, so there will now be (default), out, and inout:
mkdir inout
rsync -ax out/images/ inout/images/
rsync -ax out/sift/   inout/sift/
rsync -ax images/     inout/images/
rsync -ax sift/       inout/sift/
Out-Of-Sample Evaluation of Appearance Model
Analogous to:
cd ~/101/sift/80x60; cats101= getcategories; files101= pickfiles(cats101);
cd ~/256/sift/80x60; cats256= getcategories; files256= pickfiles(cats256);
cd ~/357/sift/80x60;
makehist('~/357/spm/80x60',{},'101_256_200.mat',files101,files256);
we now try:
cd ~/pollen/sift;     cats= getcategories; filesin=  pickfiles(cats);
cd ~/pollen/out/sift; cats= getcategories; filesout= pickfiles(cats);
cd ~/pollen/inout/sift
makehist('~/pollen/inout/spm',{},'001_002_200.mat',filesin,filesout);
This may take a couple hours.
Bug?
Meanwhile, need to think about a potential bug, in the NIPS2009 paper topic as well as in pollen here (postmatch.m):
% get training/test class values
[ctrain,Ctrain]= file2class(ftrain);
[ctest, Ctest ]= file2class(ftest);
What if the categories in ftrain and ftest are different: are the mappings of categories to numbers still consistent across train/test?
Now I think I've fixed this in both postmatch and makesvm.
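A sketch of one consistent scheme (the variable names are stand-ins for what file2class extracts, not the actual implementation):

% Number categories from a single shared list so the train and test
% mappings can never disagree.
allcats= union(traincats,testcats);
[~,ctrain]= ismember(file_traincats,allcats);
[~,ctest ]= ismember(file_testcats, allcats);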
Appearance Model Results
Below: some test categories are entirely absent, yet the train/test category numbers seem to be consistent.
Right: Final confusion matrix
Actual performance across the categories we care about is more like 19%. Easier to see this way.
Learning a Bias for the Appearance Margins
The idea is that some kinds of pollen (pine??) get called a lot, but that some stragglers (ash?) need a little extra margin boost. Learning what this bias looks like:
[bias,perf]=makebias(5,50,10,1,'method',13);
The danger here is: what if these biases don't extrapolate well to future datasets in different months?
Did this 4 times and averaged the results (see plot at left).
This can be found in ~/pollen/inout/spm/bias.mat:
bias= [0.950 0.850 1.225 1.000 1.000 1.378 0.825 0.900 1.000 1.150 ...
       0.988 0.869 1.000 0.900 0.800 1.091 1.150 1.207 0.972 1.150 ...
       1.000 1.184 1.000 1.038 0.944 1.000];
Now try randomizing over different samplings:
bias={}; perf=[];
for i=1:20
  hostlabel(num2str(i));
  postmatch(001,002,200,50,10,i);
  [b,p]= makebias(5,50,10,1,'method',13);
  bias{i}= b; perf(i)= p;
end
Trying this on a 2nd machine (to see if an unequal number of training examples per category is a factor).
bias={}; perf=[];
for i=1:20
  hostlabel(num2str(i));
  postmatch(001,002,200,200,10,i);
  [b,p]= makebias(5,200,10,1,'method',13);
  bias{i}= b; perf(i)= p;
end
Question: are the categories that need biasing the same as the categories that had fewer than 50 samples (and are thus under-represented in training)?
top; load bias50_10
mean= bias{1}*0;
for i=1:20, plot(bias{i},'.'); hold on; mean= mean+bias{i}; end
plot(mean/20,'ro'); title('bias 50 10','fontsize',16)

bot; load bias200_10
mean= bias{1}*0;
for i=1:20, plot(bias{i},'.'); hold on; mean= mean+bias{i}; end
plot(mean/20,'ro'); title('bias 200 10','fontsize',16)
These correspond to the following categories (26 instead of 28, because rumex and unknown have no corresponding training category):
>> showstrings(categories([1:23 25 26 28]),5)
 1 alder        2 ash          3 birch        4 bubble       5 chenop
 6 chinelm      7 clutter1     8 clutter2     9 crepemyrtle 10 cypress
11 eucalyp     12 fungus      13 ginkgo      14 grass       15 jacaran
16 liqamber    17 mulberry    18 oak         19 olive       20 palm
21 pecan       22 pine        23 poplar      24 sageb       25 sycam
26 walnut
Keep in mind that the test set was pretty limited, and only the following 19 categories are represented:
>> load svm050_010_001.mat
>> showstrings(categories(unique(svm.ctest)),5)
 1 alder        2 ash          3 birch        4 chinelm      5 cypress
 6 eucalyp      7 ginkgo       8 grass        9 liqamber    10 mulberry
11 oak         12 palm        13 pine        14 poplar      15 rumex
16 sageb       17 sycam       18 unknown     19 walnut
This means that only the bias values for
>> useful= intersect([1:23 25 26 28],unique(svm.ctest));
>> showstrings(categories(useful),5)
 1 alder        2 ash          3 birch        4 chinelm      5 cypress
 6 eucalyp      7 ginkgo       8 grass        9 liqamber    10 mulberry
11 oak         12 palm        13 pine        14 poplar      15 sageb
16 sycam       17 walnut
are very useful. The others should probably be set to 1. At the end of the day, something like this:
>> useless= setdiff([1:23 25 26 28],unique(svm.ctest));
>> useless(3:4)= [];   % clutter biases are useful
>> load bias50_10
>> mean(useless)= deal(1);
>> spmbias= mean;
>> sav spmbias.mat spmbias

My hope is that the above biases will improve the appearance model.
May 4
02-19-09
cd ~/pollen/data/02-19-09
load model.mat
load spmbias3.mat
pollen=classifypollen(findpollen,'possible',possible,'verbose',1,'spmbias',spmbias);
04-23-09
According to archived data from last year, the possible classifications are:
weeds, grass, pine, oak, birch, poplar, plane (sycam), eucalypt, olive, ginkgo, palm, sagebrush, chenopod, ambrosia
The available classifications are:
 1 alder        2 ash          3 birch        4 bubble       5 chenop
 6 chinelm      7 clutter1     8 clutter2     9 crepemyrtle 10 cypress
11 eucalyp     12 fungus      13 ginkgo      14 grass       15 jacaran
16 liqamber    17 mulberry    18 oak         19 olive       20 palm
21 pecan       22 pine        23 poplar      24 sageb       25 sycam
26 walnut
So
possible= sort([14 22 18 3 23 25 11 19 13 20 24 5 7 8]);
which is stored in ~/pollen/data/04-23-09/model.mat.
cd ~/pollen/data/04-23-09
load model.mat
load spmbias.mat
pollen=classifypollen(findpollen,'possible',possible,'verbose',1,'spmbias',spmbias);
To Try Next
- Single-scale or 2-scale SIFT features (no range of randomized scales)
  - Or how about simpler, non-sift features?
    - Edges, perhaps?
- Optimize shape stats
  - Stats act on blurred images now, so no "rattiness" metric
  - How about using an edge detector instead?
    - might help with bright edges a la 4-26-09_0430
- Semi-supervised labelGUI
  - How many images can be totally avoided?
- Optimize for parfor
  - What's taking the most time right now?
    - load (preload?)
    - convolve
    - (find loops)
May 28
Below, the xxx values tried are 050 100 200 400 800:
cd ~/pollen/16x16/sift; cats=getcategories;
histpollen('~/pollen/hist',cats,9999,9999,xxx);
postmatch(9999,9999,xxx,50,20,1,'post',2);

 1:  701 => ( 50 10)     2:  551 => ( 50 10)
 3:  595 => ( 50 10)     4:   88 => ( 50 10)
 5:   85 => ( 50 10)     6: 2585 => ( 50 10)
 7: 3430 => ( 50 10)     8:  795 => ( 50 10)
 9:   16 => ( 16  0)    10:  369 => ( 50 10)
11:   64 => ( 50 10)    12:   78 => ( 50 10)
13:  225 => ( 50 10)    14:  497 => ( 50 10)
15:   74 => ( 50 10)    16:   57 => ( 50  7)
17:  573 => ( 50 10)    18:  412 => ( 50 10)
19:  811 => ( 50 10)    20:   81 => ( 50 10)
21:   12 => ( 12  0)    22: 1758 => ( 50 10)
23:   53 => ( 50  3)    24:  189 => ( 50 10)
25:  359 => ( 50 10)    26:  525 => ( 50 10)
showstrings(cats,4)
 1 alder        2 ash          3 birch        4 bubble
 5 chenop       6 chinelm      7 clutter1     8 clutter2
 9 crepemyrtle 10 cypress     11 eucalyp     12 fungus
13 ginkgo      14 grass       15 jacaran     16 liqamber
17 mulberry    18 oak         19 olive       20 palm
21 pecan       22 pine        23 poplar      24 sageb
25 sycam       26 walnut

trainpollen(50,10,1); svm= testpollen(50,10,1);
Since certain categories like olive hardly ever get called, try biasing.
Note: changed dbias from 0.8 to 0.5 because it did not seem to have enough range to find the optimal bias.
[bias,perf]=makebias(10,50,10,1,'method',13);
Optimal value for nword
cd ~/pollen/16x16/hist;
nword=[50 100 200 400 800]; perf=zeros(10,5);
for i=1:5
  cd(num2str(nword(i)));
  p= scansvm; perf(:,i)= p(:,3)';
  cd ..;
end
[mean(perf); std(perf)]

ans =
   45.8730   49.0536   49.8591   50.5337   50.7877
    2.5587    2.5345    2.5546    2.2381    4.6325
So 400 is optimal (2nd best performance and least variance). But not significantly so.
Optimal Bias
Ran the following on 5 different machines for 5 seed values XXX=1..5:
cd ~/pollen/16x16/hist/400; [biasXXX,perfXXX]=makebias(20,50,10,XXX,'method',13);
Mean bias (and individual biases) saved in ~/pollen/16x16/hist/400/bias.mat
Runs
Today's runs (4-26-09) are as follows:
- pollen1: uses pre-existing model1.mat and spmbias.mat
- pollen2: uses corrected model (based on actual counts)
- pollen3: removes old spmbias (now spmbias1.mat) and replaces it with 1.0 across the board
- pollen4: null bias still, but new model based on 16x16 grid of sift features (10, not 14 or 7)
- pollen5: puts bias1 in from /common/greg/pollen/16x16/hist/final
pollen2 counts way too much ginkgo and oak. pollen3 counts even more (no bias in place).
Sept 20, 2009
Confusion Matrix For Jim's Poster
- Build training model using 50 images per category (N largest categories)
- Find test images not used in training set
- Evaluate confusion
Training Model
Shape model:
cd ~/pollen/images
makestats(getcategories,200);
Appearance Model: use existing
/common/greg/pollen/spm_v1/24x24_1014/200/hist050_010_010.mat
Create testing set:
[greg@vision401 testing]$ cd ~/pollen/data
makepollentest    # creates ./testing from ./training
Consolidate labels:
cd ~/pollen/data/training
label= mergelabels;
cd ~/pollen/data/testing
save label.mat label
Keep only images with pollen in them that aren't in the training set:
files= findpollen();
load /common/greg/pollen/spm_v1/24x24_1014/200/hist050_010_010.mat
for i=1:length(ftrain)
  indx= first(find(ftrain{i}=='_'));
  ftrain{i}= strrep(ftrain{i}(indx+1:end),'.mat','.tif');
  ftrain{i}= strrep(ftrain{i},'_','');
end
[files,ftrain]= apply(@sort,files,ftrain);
ftest= setdiff(files,ftrain);
[length(ftrain) length(files) length(ftest)]
This leaves 7068 files in ftest to be classified:
pollen= classifypollen(ftest,'verbose',0);
Nov 16, 2009
Baseline appearance model
cd /common/greg/pollen/spm_v1/24x24_1014/200
[svms,svm]= testpollens(50,10,1:10);
viewconfuse(svm)
Performance for crepemyrtle and pecan is zero because there are not enough training examples to construct independent train and test sets at Ntrain=50. So the baseline performance of 55.8% is somewhat depressed due to this.