CAP 5638, Pattern Recognition, Fall, 2005
Department of Computer Science,
Points: 100
Due: Thursday, October 6, 2005
Problem 1 (15 points) Problem 1 (Chapter 3 in the textbook).
(a) See the following plots.



(c) Note that the problem statement itself is unclear. For large n, since we assume that the samples are generated according to the distribution, we have
. It should be marked on the plot of p(x|θ) with x=2. Note that it does not correspond to the maximum, as one might expect (the plot is for one particular sample only).

Problem 2 (15 points) Problem 2 (Chapter 3 in the textbook).



Problem 3 (15 points) Problem 3 (Chapter 3 in the textbook).



Problem 4 (25 points) Problem 7 (Chapter 3 in the textbook).



(e) This example shows that the parametric form is very important for maximum-likelihood estimation. If the assumed form is far from the true underlying model, the ML estimate can give a larger error than other models in the same assumed family. To get good results with ML estimation, one needs to choose a form that matches the unknown underlying model as closely as possible, based on prior knowledge, experience, or experiments on some data. If several models are available, they should be evaluated and compared using test data.
Problem 5 As we did in class, suppose that there are C classes (ω1, …, ωC) and d features (x1, …, xd). The features are assumed to be statistically independent and normally distributed with unknown mean and variance, and the parameters are estimated using the maximum likelihood method.
1) (15 points) Write down the training steps and a set of discriminant functions
for minimum error rate classification. You need to include the details.
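First we fix the notation. Let $x_{ik}^{(j)}$ represent the kth feature of the ith training sample of class ωj. Assume that we have n training samples in total, with n1 for ω1, …, and nC for ωC.
Several equivalent discriminant functions give minimum-error-rate classification. Here I choose
$$g_j(\mathbf{x}) = \ln p(\mathbf{x}\mid\omega_j) + \ln P(\omega_j), \quad j = 1, \ldots, C.$$
Here, for class j, we need to estimate both $P(\omega_j)$ and $p(\mathbf{x}\mid\omega_j)$. For $P(\omega_j)$, using maximum likelihood estimation, we have
$$\hat{P}(\omega_j) = \frac{n_j}{n}.$$
To estimate $p(\mathbf{x}\mid\omega_j)$, based on the assumptions that the features are statistically independent and normally distributed, we have
$$p(\mathbf{x}\mid\omega_j) = \prod_{k=1}^{d} \frac{1}{\sqrt{2\pi}\,\sigma_{jk}} \exp\left(-\frac{(x_k-\mu_{jk})^2}{2\sigma_{jk}^2}\right).$$
Here class ωj has 2d parameters, and they are estimated according to maximum likelihood estimation:
$$\hat{\mu}_{jk} = \frac{1}{n_j}\sum_{i=1}^{n_j} x_{ik}^{(j)}, \qquad \hat{\sigma}_{jk}^2 = \frac{1}{n_j}\sum_{i=1}^{n_j}\left(x_{ik}^{(j)}-\hat{\mu}_{jk}\right)^2.$$
Plugging these results into the discriminant function given above and ignoring the common constants, we have
$$g_j(\mathbf{x}) = \ln n_j - \sum_{k=1}^{d}\left[\ln\hat{\sigma}_{jk} + \frac{(x_k-\hat{\mu}_{jk})^2}{2\hat{\sigma}_{jk}^2}\right].$$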
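The training steps and the discriminant functions can be tried on a small made-up example. The following is a minimal sketch, assuming the log form of the discriminant with the prior estimated as nj/n and the common constants dropped. The data, the test point, and the variable names (X, labels, mu, s2, nj, g) are hypothetical and are not taken from the wine dataset.

```matlab
% Toy training data (made up for illustration): rows are samples, columns are d=2 features.
X = [1.0 2.0; 1.2 1.8; 0.8 2.2; 3.0 0.5; 3.2 0.7; 2.8 0.3; 3.0 0.6];
labels = [1 1 1 2 2 2 2]';
C = 2;
d = size(X,2);

% Training: ML estimates of the per-class, per-feature means and variances.
mu = zeros(C,d);
s2 = zeros(C,d);
nj = zeros(C,1);
for j = 1:C
    Xj = X(labels==j,:);
    nj(j) = size(Xj,1);
    mu(j,:) = mean(Xj,1);
    s2(j,:) = mean((Xj - repmat(mu(j,:),nj(j),1)).^2, 1);  % divides by nj, not nj-1
end

% Classification of a hypothetical test point x.
x = [1.1 1.9];
g = zeros(1,C);
for j = 1:C
    % log prior (up to a common constant) plus the sum of per-feature log densities
    g(j) = log(nj(j)) - sum(0.5*log(s2(j,:)) + (x - mu(j,:)).^2 ./ (2*s2(j,:)));
end
[gmax, jhat] = max(g);   % jhat is the index of the class with the largest discriminant
```

Since the test point lies close to the samples of class 1, jhat should come out as 1 here. Note that 0.5*log(s2) equals the logarithm of the standard deviation, so the loop matches the independent-normal model term by term.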
2) (15 points) Implement your steps using a
programming language of your choice and then apply your program to the wine dataset
for leave-one-out classification (available on the course web page and http://www.ics.uci.edu/~mlearn/MLRepository.html).
You need to include the results from your program and your source code.
Given the equations above, to do leave-one-out recognition on the wine dataset, we need to do the following:
- For each sample in the dataset
o We form a training set by removing it from the entire data set (leaving 177 samples).
o We estimate 26 parameters for each class (13 means and 13 variances), for a total of 78 parameters.
o For the sample that was left out, we compute g1, g2, and g3 and assign it to the class with the largest g.
o We compare the classification result with the true label: if they are different, it is a mistake; otherwise, the sample is correctly classified.
- Output the classification rate
Here is a Matlab program:
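C=3;
ns=zeros(1,C);
wine_uci % read the dataset into imgWine (column 1: class label, other columns: features)
% number of samples in each class
for i=1:C,
    ns(i)=sum(imgWine(:,1)==i);
end
% startInd(i) is the row offset of class i once the rows are sorted by class
startInd=zeros(1,C+1);
for i=2:(C+1),
    startInd(i)=startInd(i-1)+ns(i-1);
end
% reorder the samples so that each class occupies a contiguous block of rows
nsInd=zeros(1,C);
imgMat=imgWine(:,2:size(imgWine,2));
for i=1:size(imgWine,1),
    k=imgWine(i,1);
    nsInd(k)=nsInd(k)+1;
    imgMat(nsInd(k)+startInd(k),:)=imgWine(i,2:size(imgWine,2));
end
% data matrix used by the classifier; no dimension reduction is applied here
imgMatP=imgMat;
clf;
colormap(gray(256));
correct=0;
wrong=0;
total=0;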
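for c=1:C,
    for k=1:ns(c),
        % Leave-one-out classification: sample k of class c is the test sample
        subplot(2,1,1);
        gx=zeros(1,C);
        for i=1:C,
            subInd=(startInd(i)+1):startInd(i+1);
            if i==c,
                % remove the test sample from the training rows of its own class
                subInd=startInd(i)+[1:(k-1) (k+1):ns(i)];
            end
            subMat=imgMatP(subInd,:);
            % ML estimates of the per-feature mean and variance
            mean_vec=sum(subMat)/size(subInd,2);
            var_vec=sum(subMat.^2)/size(subInd,2)-mean_vec.^2;
            % guard against zero (or slightly negative) variances
            if min(var_vec) <= 0.000000001,
                var_vec=var_vec-min(var_vec)+0.000000001;
            end
            gx_marg=-0.5*log(var_vec)-(imgMatP(startInd(c)+k,:)-mean_vec).^2./(2*var_vec);
            % log-likelihood plus log prior; the prior n_i/(n-1) contributes log(n_i)
            % once the common constant -log(n-1) is dropped
            gx(i)=sum(gx_marg)+log(size(subInd,2));
        end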
        format long;
        [Y, c1]=max(gx);
        subplot(2,1,2);
        p=plot(1:C,gx,'-');
        set(p,'LineWidth',2);
        gxStr=sprintf('%6.2f ',gx);
        disp(sprintf('Test sample %d from class %d: Number of features %d, classified as %d with\n\t gx=[%s]',k,c,size(imgMatP,2),c1,gxStr));
        if c1==c,
            resStr='correct';
            correct=correct+1;
        else
            resStr=sprintf('wrong (class %d)',c);
            wrong=wrong+1;
        end
        total=total+1;
        title(sprintf('Classified as \\omega%d, which is %s (total %d (%d correct %d wrong), which is %4.2f%%)',c1,resStr,total,correct,wrong,correct*100/total),'FontSize',12);
        pause(2)
    end
end
disp(sprintf('Total %d (%d correct %d wrong), which is %4.2f%%',total,correct,wrong,correct*100/total));
3) (Extra credit, 10 points) Implement
linear dimension reduction assuming that a linear transformation is known. Then
apply your program to the wine dataset for leave-one-out classification using
either 1) a randomly generated linear transformation, 2) principal component
analysis, or 3) Fisher discriminant analysis. You need to include the results
from your program and your source code.
This is similar to the original program except that we reduce the dimension first by multiplying the data matrix with W, the dimension reduction matrix.
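As an illustration of the dimension reduction step, the following is a minimal sketch that builds W by principal component analysis and projects the data. It assumes a small synthetic data matrix X (not the wine data) and an arbitrarily chosen target dimension m. The names X, m, W, and XP are hypothetical; XP plays the role of the projected data matrix that the classifier would then use.

```matlab
% Synthetic data (hypothetical, for illustration only): n samples, d features.
n = 20; d = 3; m = 2;            % m is the reduced dimension
t = (1:n)';
X = [t, 2*t + sin(t), 0.1*cos(3*t)];

% Center the data and form the ML covariance matrix.
Xc = X - repmat(mean(X,1), n, 1);
S = (Xc'*Xc)/n;
S = (S + S')/2;                  % enforce exact symmetry before calling eig

% Principal directions: eigenvectors with the m largest eigenvalues.
[V, D] = eig(S);
[vals, order] = sort(diag(D), 'descend');
W = V(:, order(1:m));            % d x m dimension reduction matrix

% Project the data; the classifier is then run on XP instead of X.
XP = X*W;
```

For a randomly generated transformation, W could instead be set to randn(d,m). In a leave-one-out setting, W could also be recomputed from the training samples in each round so that the left-out sample does not influence the transformation.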