WS09 Pattern Recognition Homework 1

Due: Wed, November 4, 2009

1 New Applications of Pattern Recognition

Identify two problems in computer science that are currently solved algorithmically that might benefit from pattern recognition (machine learning).  Discuss how these problems are solved currently and how pattern recognition might help improve the solution.  Write up what you find (1-2 pages) and submit at recitation.

2 Simple Nearest Neighbor Classifier

Implement a nearest neighbor classifier for the MNIST dataset. Download mnist.py and the MNIST datasets. Use the MNIST data worksheet as an example to see how to read MNIST images. Load the data into two variables: images and labels.  Note that images is a rank 3 array; images[i] is a rank 2 array that represents a 28x28 image of a character. The label for images[123] is contained in labels[123]. 

There are 60000 images and corresponding labels in the MNIST dataset.  Divide these into two subsets:

training_images = images[:50000]
training_labels = labels[:50000]
test_images = images[50000:]
test_labels = labels[50000:]

Use training_images and training_labels as the dataset (the data used to build the model) and use test_images and test_labels to compute the errors.

Now, let the pixels of an image be b_{ij} and define the euclidean distance of two images b and b' as:

d(b,b') = \sqrt{ \sum_{ij} (b_{ij}-b'_{ij})^2 }

For each test image in test_images, find the training image that has the smallest distance from the test image and use the corresponding label as the prediction.  This is known as the nearest neighbor algorithm.

Compute the fraction of test images for which the actual label (stored in test_labels) differs from the predicted label.  This is the nearest neighbor error rate.

As you will notice, pattern recognition is quite compute-intensive, so it is best to do development and testing in three steps:

  • First, during development, perform a "smoke test" using training_images[:100] and training_labels[:100] for training and test_images[:100] and test_labels[:100] for testing
  • Second, use training_images[:1000] and training_labels[:1000] for training and test_images[:1000] and test_labels[:1000] for testing
  • Then (if your algorithm is sufficiently fast), use the full training and test sets.

Report error rates from the second and third step.
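The nearest neighbor loop described above can be sketched as follows (a minimal sketch assuming the images are already loaded as NumPy arrays; the function name is ours):

```python
import numpy as np

def nn_error_rate(train_images, train_labels, test_images, test_labels):
    # Flatten each 28x28 image into a vector of 784 pixels.
    tr = train_images.reshape(len(train_images), -1).astype(np.float64)
    te = test_images.reshape(len(test_images), -1).astype(np.float64)
    errors = 0
    for x, label in zip(te, test_labels):
        # Squared Euclidean distance to every training image; the argmin is
        # the same for d and d^2, so the square root can be skipped.
        d2 = ((tr - x) ** 2).sum(axis=1)
        predicted = train_labels[np.argmin(d2)]
        errors += predicted != label
    return errors / float(len(test_labels))
```

The inner computation is vectorized over the training set, which is usually fast enough for the 1000-sample step; the full 50000 x 10000 run is where the three-step plan pays off.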

WS09 Pattern Recognition Homework 2

Due: Monday, November 9, 2009

1 Comparing Dissimilarity Measures

Using training_images[:1000] and test_images[:1000], and using the distance implementations from scipy.spatial.distance, compare the error rates of all the distances that it makes sense to compare.  Summarize your results.

  1. Which distance is best?  Which distance is worst?
  2. Try to come up with some explanation for the behaviors you see.
  3. Your tests were on a small dataset; do you think that distances that perform poorly on small datasets could start performing well for larger training sets?  Explain.
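One way to set this experiment up (a sketch; the helper name and the metric list are our choices) is to compute the full test-by-training distance matrix with scipy.spatial.distance.cdist, once per metric:

```python
import numpy as np
from scipy.spatial.distance import cdist

def nn_error_for_metric(train_x, train_y, test_x, test_y, metric):
    # One row per test sample, one column per training sample.
    D = cdist(test_x, train_x, metric=metric)
    predictions = train_y[D.argmin(axis=1)]
    return (predictions != test_y).mean()

# Some metrics from scipy.spatial.distance that make sense for dense pixel vectors:
METRICS = ["euclidean", "cityblock", "chebyshev", "cosine",
           "correlation", "braycurtis", "canberra", "minkowski"]
```

Binary metrics such as "hamming" or "jaccard" only make sense after thresholding the images.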

2 Comparing Canonicalizations

Compare error rates using the correlation distance using the following canonicalizations:

  1. Matching the medoids of the template and the unknown characters.
  2. Matching the centroids of the template and the unknown characters.

Use test and training samples as in problem 1.  How do the error rates compare?  If you find any characters that are classified differently, look at them and try to explain the difference.
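A possible implementation of the centroid canonicalization (a sketch; the function name is ours, and the medoid variant would replace the mean coordinates with coordinate-wise medians):

```python
import numpy as np
from scipy.ndimage import shift

def center_on_centroid(img):
    # Intensity-weighted centroid of the image.
    total = img.sum()
    ys, xs = np.indices(img.shape)
    cy = (ys * img).sum() / total
    cx = (xs * img).sum() / total
    # Shift so the centroid lands on the geometric center of the frame.
    dy = (img.shape[0] - 1) / 2.0 - cy
    dx = (img.shape[1] - 1) / 2.0 - cx
    return shift(img, (dy, dx), order=1, mode="constant", cval=0.0)
```

Apply this to both the template and the unknown character before computing the correlation distance.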

WS09 Pattern Recognition Homework 3

Due: Monday, November 16, 2009
Please hand in in lecture or at recitation.

1 Mahalanobis Distance

Given a symmetric positive definite 2x2 matrix Q, define the norm

||x||_Q^2 = x^T Q x

Prove mathematically that this satisfies the triangle inequality; that is:

||u + v ||_Q \leq ||u||_Q + ||v||_Q

Please make sure your proof is readable; use a mix of mathematical notation and concise verbal explanations.
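A hint on one standard route (not the only one): since Q is symmetric positive definite, \langle u,v \rangle_Q = u^T Q v is an inner product, so the Cauchy-Schwarz inequality \langle u,v \rangle_Q \leq \|u\|_Q \|v\|_Q holds for it, and the triangle inequality follows by expanding the square:

```latex
\|u+v\|_Q^2
  = \|u\|_Q^2 + 2\,\langle u,v\rangle_Q + \|v\|_Q^2
  \le \|u\|_Q^2 + 2\,\|u\|_Q\|v\|_Q + \|v\|_Q^2
  = \bigl(\|u\|_Q + \|v\|_Q\bigr)^2 .
```

You still need to justify why \langle \cdot,\cdot \rangle_Q is an inner product (this is where positive definiteness enters) and why Cauchy-Schwarz applies.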

2 Covariance

Compute the covariance matrix \Sigma_{U_{ab}} of a 2D uniform density parameterized by a and b:

U(x,y;a,b) = \frac{1}{4ab} if |x| \leq a and |y| \leq b, 0 otherwise.

For a=3, b=1, in Python, plot the normal density with covariance matrix \Sigma_{U_{ab}} centered at the mean (either as an image representing density or as a set of iso-contours), and superimpose the outline of the rectangle [-a,a] \times [-b,b].

Python's plotting functions are documented here: http://matplotlib.sourceforge.net/; you can either draw everything into an array "by hand" and then use

from pylab import *
imshow(array,cmap=cm.gray); show()

or you might want to use one of the nicer plotting functions in matplotlib for contour plots.
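For checking your derivation: integrating x^2 against the density gives Var(x) = a^2/3 (and b^2/3 for y), with zero cross term by symmetry.  A sketch of the evaluation, assuming that result (the plotting call is left as a comment so the snippet runs headless; function names are ours):

```python
import numpy as np

def uniform_cov(a, b):
    # Covariance of the uniform density on [-a, a] x [-b, b]:
    # Var(x) = a^2/3, Var(y) = b^2/3, Cov(x, y) = 0 by symmetry.
    return np.diag([a ** 2 / 3.0, b ** 2 / 3.0])

def normal_density(X, Y, sigma):
    # Zero-mean bivariate normal density evaluated on a grid.
    inv = np.linalg.inv(sigma)
    norm = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(sigma)))
    q = inv[0, 0] * X ** 2 + 2 * inv[0, 1] * X * Y + inv[1, 1] * Y ** 2
    return norm * np.exp(-0.5 * q)

a, b = 3.0, 1.0
xs = np.linspace(-4, 4, 200)
ys = np.linspace(-2, 2, 100)
X, Y = np.meshgrid(xs, ys)
Z = normal_density(X, Y, uniform_cov(a, b))
# To plot:  from pylab import *
#           contour(X, Y, Z)
#           plot([-a, a, a, -a, -a], [-b, -b, b, b, -b]); show()
```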

WS09 Pattern Recognition Homework 4

Due: Monday, November 23, 2009
Please hand in in lecture or at recitation.

1 Translation Invariant Representation

Compute a translation invariant representation of the character images in MNIST by computing the Fourier spectrum.  Then compute the recognition error rate for the first 1000 test samples against the first 1000 training samples using nearest neighbor classification.  Repeat the experiment for different powers of the Fourier spectrum: F^{1/4}, F^{1/3}, F^{1/2}, F, F^2
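The feature computation can be sketched as follows (assuming NumPy; note the invariance is exact for circular shifts and approximate for ordinary translations of a character inside the frame):

```python
import numpy as np

def fourier_feature(img, power=1.0):
    # Magnitude of the 2D DFT: translation only changes the phase,
    # so the magnitude spectrum is translation invariant.
    spectrum = np.abs(np.fft.fft2(img))
    return (spectrum ** power).ravel()
```

Classification then proceeds as before, with the Euclidean distance applied to these feature vectors instead of the raw pixels.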


2 PCA

Perform PCA on the training samples from the MNIST database.  Use the first 1000 training and test samples for trying out nearest neighbor classification in the PCA-transformed space.  Compute error rates and approximation error (in terms of percentage of total variance) for PCA spaces of dimensions k = 5, 10, 50, 100, 500, 900.  Report and discuss your results.
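A compact PCA sketch via the SVD of the centered data (function names are ours; the explained-variance ratio corresponds to the percentage of total variance asked for above):

```python
import numpy as np

def pca_fit(X, k):
    # Center the data and keep the top-k right singular vectors.
    mu = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    variances = S ** 2  # proportional to the variance along each component
    explained = variances[:k].sum() / variances.sum()
    return mu, Vt[:k], explained

def pca_transform(X, mu, components):
    # Project (centered) data onto the retained components.
    return (X - mu) @ components.T
```

Fit on the training samples only, then transform both training and test samples with the same mean and components before running nearest neighbor.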

WS09 Pattern Recognition Homework 5

Due: Monday, November 30, 2009
Please hand in in lecture or at recitation.

1 Symmetric Bitflip Model

Assume that the MNIST data was actually generated by having one template per class and that the error model is a symmetric bitflip model, with different "flip" probabilities for each pixel.  Estimate the bitflip probabilities for each pixel and class and perform statistically optimal classification under this model.  Use images 0-50000 for estimating the bitflip probabilities, use images 0-1000 as templates, and images 50000-51000 for estimating error rates.  Compare your error rates with your previous error rates for the Euclidean model and the Hamming model.
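Under this model, optimal classification reduces to summing per-pixel Bernoulli log-likelihoods.  A sketch (names are ours; Laplace smoothing keeps the estimated probabilities away from 0 and 1, where the logarithms blow up):

```python
import numpy as np

def fit_pixel_probs(X, y, n_classes, smooth=1.0):
    # X: binarized images, shape (n, d).
    # Returns P[c, i] = estimated P(pixel i is on | class c), with smoothing.
    P = np.zeros((n_classes, X.shape[1]))
    for c in range(n_classes):
        Xc = X[y == c]
        P[c] = (Xc.sum(axis=0) + smooth) / (len(Xc) + 2.0 * smooth)
    return P

def classify_bitflip(X, P, log_prior=None):
    # Log-likelihood of each binary image under each class; pick the maximum.
    scores = X @ np.log(P).T + (1 - X) @ np.log1p(-P).T
    if log_prior is not None:
        scores = scores + log_prior
    return scores.argmax(axis=1)
```

For the template formulation, P(pixel on) is the template bit flipped with the per-pixel probability, which is equivalent to the per-pixel on-probabilities estimated here.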

2 Asymmetric Bitflip Model

Repeat the previous experiment, but use an asymmetric bitflip model.  Discuss and compare your results.

WS09 Pattern Recognition Homework 6

Due: Monday, December 7, 2009
Please hand in in lecture or at recitation.

1 Skew Correction

Perform skew correction on the MNIST data and then carry out nearest neighbor classification.  Compare the performance of nearest neighbor classification using both skew corrected and uncorrected data.  Use the Euclidean, correlation, and cosine dissimilarity measures.  Use samples 0:1000 for training and samples 50000:51000 for testing.

For skew correction, first compute the centered second order moments of the image (analogous to the covariance matrix).  Then, compute a skew matrix M = \begin{pmatrix} 1 & \alpha \\ 0 & 1 \end{pmatrix} that "deskews" the character, i.e., makes one of the axes of the second order moments vertical when the coordinates of the image are remapped as \begin{pmatrix} x' \\ y' \end{pmatrix} = M \begin{pmatrix} x \\ y \end{pmatrix}.  Use a function from scipy.ndimage.interpolation (e.g., affine_transform) to actually deskew the image by remapping the pixels.  Verify that your skew correction is working correctly; if it is, images of the digit "1" should usually just turn into vertical lines, no matter how they were slanted during writing.
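A possible deskewing routine following this recipe (a sketch; the sign of alpha and the (row, column) ordering depend on your coordinate conventions, so verify on a few slanted "1"s):

```python
import numpy as np
from scipy.ndimage import affine_transform

def image_moments(img):
    # Intensity-weighted centroid and centered second order moments.
    total = img.sum()
    ys, xs = np.indices(img.shape)
    cy = (ys * img).sum() / total
    cx = (xs * img).sum() / total
    mu11 = (((ys - cy) * (xs - cx)) * img).sum() / total
    mu02 = ((ys - cy) ** 2 * img).sum() / total
    return cy, cx, mu11, mu02

def deskew(img):
    cy, cx, mu11, mu02 = image_moments(img)
    alpha = mu11 / mu02  # shear of the column coordinate per row
    # affine_transform maps output coordinates o to input coordinates M @ o + offset.
    M = np.array([[1.0, 0.0], [alpha, 1.0]])
    center = np.array(img.shape) / 2.0
    offset = np.array([cy, cx]) - M @ center
    return affine_transform(img, M, offset=offset, order=1)
```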

2 Confusion Matrix

Compute confusion matrices for the nearest neighbor classifier you explored in exercise 1.  A confusion matrix is an N x N table (if there are N classes), where the cell at row i and column j contains the number of times that a sample that had the actual class j was predicted by the classifier to belong to class i.
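Sketched directly from that definition (the function name is ours):

```python
import numpy as np

def confusion_matrix(actual, predicted, n_classes):
    # C[i, j] counts samples of actual class j predicted as class i.
    C = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        C[p, a] += 1
    return C
```

Row sums then give how often each class was predicted, column sums how often each class actually occurred, and the diagonal holds the correct classifications.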

Which character pairs are most confused?  Which character pairs are the least confused?  Are there any relative differences in the confusion matrix for the different distance measures or preprocessing methods?

3 Normal Density Model of MNIST

Perform classification of MNIST characters using a multivariate normal density model of each class. That is, assume that p(x|c) = {\cal N}(x;\mu_c,\Sigma_c).  Estimate the values of \mu_c and \Sigma_c for each class c from samples 0:50000 of the MNIST data set.  Also estimate the prior P(c) from the data.  For \Sigma_c, use three different values: (1) the identity matrix, (2) the diagonal of the sample covariance matrix, and (3) the full sample covariance matrix.
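A sketch of such a classifier (names are ours; for the full covariance case on 784-dimensional pixel vectors you will likely need some regularization, here a small ridge term, since a few thousand samples per class barely determine the matrix):

```python
import numpy as np

def fit_class_gaussians(X, y, n_classes, cov_type="full", reg=1e-1):
    models = []
    for c in range(n_classes):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        if cov_type == "identity":
            S = np.eye(X.shape[1])
        elif cov_type == "diag":
            S = np.diag(Xc.var(axis=0) + reg)
        else:  # full sample covariance, ridge-regularized
            S = np.cov(Xc, rowvar=False) + reg * np.eye(X.shape[1])
        prior = len(Xc) / float(len(X))
        models.append((mu, S, prior))
    return models

def classify_gaussian(X, models):
    scores = []
    for mu, S, prior in models:
        Sinv = np.linalg.inv(S)
        _, logdet = np.linalg.slogdet(S)
        d = X - mu
        maha = np.einsum("ij,jk,ik->i", d, Sinv, d)  # per-sample Mahalanobis term
        scores.append(-0.5 * (maha + logdet) + np.log(prior))
    return np.argmax(scores, axis=0)
```

The identity-covariance case reduces to nearest-mean classification (up to the prior term), which is a useful sanity check.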

WS09 Pattern Recognition Homework 7

Due: Monday, December 14, 2009
Please hand in in lecture or at recitation.

1 Evaluation of k-Means Classifier

Perform k-means clustering of the first 50000 samples of the MNIST data for k = 10, 30, 100, 300, 1000.  For each one of the means, compute a histogram of the corresponding classes and then label each mean with the most frequent of the classes assigned to it.  Finally, perform classification of samples 50000:51000 by finding the nearest mean and classifying the unknown sample as the class of the corresponding mean.  Run three trials each, since the k-means results differ depending on initialization.  What error rates do you get for the different values of k?

Report your error rates and discuss.
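The labeling-and-classification step can be sketched with scipy.cluster.vq.kmeans2 (assuming flattened float images; kmeans2's result depends on its random initialization, which is why the three trials are asked for):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def kmeans_classifier(train_x, train_y, test_x, k, n_classes=10):
    means, assign = kmeans2(train_x, k, minit="points")
    # Label each mean by majority vote over the samples assigned to it.
    labels = np.zeros(len(means), dtype=int)
    for j in range(len(means)):
        members = train_y[assign == j]
        if len(members):
            labels[j] = np.bincount(members, minlength=n_classes).argmax()
    # Classify each test sample by its nearest mean.
    d2 = ((test_x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return labels[d2.argmin(axis=1)]
```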

2 Gaussian Mixture Classifier

Perform Gaussian mixture classification using a diagonal covariance matrix.  That is, implement a multidimensional generalization of the one-dimensional Gaussian mixture classifier, computing for each cluster the pixel means and the pixel variances.  Use k = 10, 30, 100, 300, 1000 centers and compute the error rate as in problem 1, using the Mahalanobis distance for classification.

You will need to do something special for the inverse of the covariance matrix.  If the covariance matrix is \Sigma = diag(\sigma_{1,1},...,\sigma_{d,d}), then the inverse covariance matrix is ordinarily going to be \Sigma^{-1} = diag(\sigma^{-1}_{1,1},...,\sigma^{-1}_{d,d}).  However, since some of the \sigma_{i,i} are going to be close to zero, pick a parameter \lambda and let \Sigma^{-1} = diag(\max(\lambda,\sigma_{1,1})^{-1},...,\max(\lambda,\sigma_{d,d})^{-1}).  Try different values of \lambda, such as \lambda = 10^{-6}, 10^{-2}, 10^{-1}, 0.3
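The clamped inverse and the resulting squared Mahalanobis distance to a cluster center can be sketched as (function names are ours):

```python
import numpy as np

def clamped_inv_var(variances, lam):
    # Replace variances below lambda by lambda before inverting.
    return 1.0 / np.maximum(lam, variances)

def mahalanobis_sq(X, center, inv_var):
    # Squared Mahalanobis distance under a diagonal covariance model.
    d = X - center
    return (d * d * inv_var).sum(axis=1)
```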

Report your error rates and discuss.

3 SOM Model

(a) Repeat experiment 1 but using a 10 x 10 self-organizing map to compute 100 cluster centers using the SOM code from the worksheet.  Visualize the cluster centers.  Compare your results with k=100 for the k-means classifier.

(b) Modify the SOM model to use a toroidal neighborhood relationship.  That is, (0,7) is adjacent to (9,7) and (3,0) is adjacent to (3,9).  This is a simple three-line addition to the existing code that changes the computation of the dx and dy values.  Repeat experiment 1 using a 10 x 10 toroidal self-organizing map.  Visualize the cluster centers.  Compare and discuss your results with the k=100 k-means classifier and with part (a).
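The wrap-around coordinate difference can be computed as follows (a sketch; this is the quantity the dx and dy computations should produce on the torus):

```python
import numpy as np

def toroidal_delta(i, j, size=10):
    # Shortest distance between grid coordinates i and j on a ring of length size.
    d = np.abs(np.asarray(i) - np.asarray(j))
    return np.minimum(d, size - d)
```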

WS09 Pattern Recognition Homework 8

Due: Monday, January 11, 2010
Please hand in in lecture or at recitation.

1 Multiple Linear Classifiers for Character Recognition

Split the MNIST data as usual (0:50000 for training, 50000:51000 for testing) and use the deskewed data.  Implement

  • the perceptron learning algorithm
  • linear least squares
  • the sigmoidal classifier with gradient descent training

Combine each of these classifiers with the following multiclass decomposition schemes:

  • 1-vs-rest
  • all pairs of classes

This will give you 6 different sets of binary vectors (10 dimensional or 100 dimensional) when you apply them to the training data.

Based on the classification methods for binary vectors that we have discussed in the past lectures, try to come up with good ways of assigning final classes to these binary vectors.  Measure the performance of your methods on the test set.
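Two common decoding schemes, sketched (the score and prediction layouts are our assumptions, not prescribed by the assignment): for 1-vs-rest, take the class whose classifier responds most strongly; for all-pairs, let each pairwise classifier vote.

```python
import numpy as np
from itertools import combinations

def one_vs_rest_decode(scores):
    # scores: (n_samples, n_classes) real-valued classifier outputs.
    return scores.argmax(axis=1)

def all_pairs_decode(pair_preds, n_classes=10):
    # pair_preds: (n_samples, n_pairs), entry 0/1 selecting the first or second
    # class of each (a, b) pair, pairs enumerated in lexicographic order.
    pairs = list(combinations(range(n_classes), 2))
    votes = np.zeros((len(pair_preds), n_classes), dtype=int)
    for k, (a, b) in enumerate(pairs):
        winners = np.where(pair_preds[:, k] == 0, a, b)
        for i, w in enumerate(winners):
            votes[i, w] += 1
    return votes.argmax(axis=1)
```

More refined decodings (e.g., using real-valued margins instead of hard votes) are possible and worth comparing.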

2 Pairwise Classification Failure

Try to describe/construct  a simple pattern recognition problem in which any of the methods from Problem 1 will fail to give good results.