The following questions, or similar questions, may appear on the final exam.  Use them as a study guide.  The questions cover lectures, exercises, readings, and homework.

Pattern Recognition

  • General Pattern Recognition
    • State Bayes rule.
    • Given histograms of the c.c.d.s, find the decision boundaries.
    • What is the "empirical error rate"?  How does it differ from the Bayes error rate?
    • How does pattern recognition differ from other kinds of software engineering?
    • * Explain pattern recognition as the interaction between "nature" and a "model".
    • * Have the ability to write and correct algorithms expressed in Python and NumPy.
  • Nearest Neighbor Classification
    • Describe the nearest neighbor classifier.
    • What is a distance metric?  What properties does it have to satisfy?
    • What is a norm?
    • What is the triangle inequality?
    • What is the relationship between a distance metric and a norm?
      • Minkowski norm
      • Euclidean norm
      • Manhattan norm
      • Hamming distance
      • maximum norm
    • What is a unit ball?
    • What is the relationship between the Euclidean norm and the inner product?
    • Describe applications of the nearest neighbor classifier.
    • How do transformations of the input vector affect nearest neighbor classifiers?  Give examples.
    • What is the relationship between the k-nearest neighbor classifier and posterior probabilities?
    • What asymptotic relation exists between the 1-nn classifier and the Bayes optimal classifier?
    • Derive an asymptotic bound on the error rate of the 1-nn classifier in terms of the error rate of the Bayes optimal classifier.
    • What is the relationship between nearest neighbor classification and non-parametric density estimation?
    • How does k need to grow in a k-nearest neighbor classifier in order to achieve the Bayes error rate asymptotically?
    • How would you pick the size of k in a k-nearest neighbor classifier in practice?
    • What is cross-validation of classifier parameters?
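Since the exam also expects working NumPy (see the Python questions at the end), a minimal loop-free sketch of the nearest neighbor classifier under the Euclidean metric may help; the toy data and the name `nn_classify` are illustrative, not from the lectures:

```python
import numpy as np

def nn_classify(Xtrain, ytrain, Xtest):
    # Squared Euclidean distances between every test and training vector,
    # computed loop-free via broadcasting: result is an (m, n) matrix.
    d2 = ((Xtest[:, None, :] - Xtrain[None, :, :]) ** 2).sum(axis=2)
    # Each test point gets the label of its nearest training point.
    return ytrain[d2.argmin(axis=1)]

# Toy example: two well-separated 2D classes.
Xtrain = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
ytrain = np.array([0, 0, 1, 1])
pred = nn_classify(Xtrain, ytrain, np.array([[0.05, 0.1], [5.1, 5.0]]))
```

Squaring the distances is safe here because the minimizer of the squared distance is also the minimizer of the distance.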

  • Feature Normalization
    • What are invariances?
    • What is the effect of invariances on nearest neighbor classification?
    • What is the centroid of a pattern?
    • How do you compute the centroid of a pattern?
    • How can you use the centroid in order to improve classification?
    • How can you compensate for different translations of characters?
    • What is a canonical representative (of a class)?
    • What is an invariant representation?
    • What are Hu moments and why are they important?
    • How can you use the Fourier transform to achieve a translation invariant representation?
    • How can you use the Fourier transform to achieve a rotation invariant representation?
    • What is the Fourier-Mellin transform?
    • What is a centralized second order moment?
    • What is skew correction and how is it computed?
    • State the equation for the coordinate transformation for skew correction.
    • * What is the mathematical definition of a norm/metric?
    • * What is the mathematical relationship between a norm and a metric?
    • * Write down the formulas for computing the Euclidean, Manhattan (Taxicab, Cityblock), Hamming, maximum (Chebyshev), and Minkowski norms/metrics.
    • * Give plausible examples of dissimilarity measures that violate each of the metric axioms.
    • * Consider a pattern recognition problem in which input vectors v are to be classified into classes c.  Explain how the problem is affected if the input vector v is transformed into kv for some real number k.
    • * Define a translation invariant metric.
    • * Explain the effect of using a translation invariant metric on a recognition problem.
    • * Explain how you can construct a useful translation invariant metric from a metric that is not translation invariant by geometric matching.
    • * Explain how you can construct a useful translation invariant metric from a metric that is not translation invariant by centroid matching.
    • * State the formula for the centroid.
    • * Write loop-free NumPy code for computing the centroid of an image.
    • * Images of handwritten characters are sometimes encoded with black=0/white=255, sometimes with black=255/white=0, and sometimes with black=1/white=0.  Explain the effect of each choice on the centroid computation.
    • * What is the canonical representative of a pattern, given a set of possible transformations?
    • * What is an equivalence class of patterns under a transformation group (e.g., translation)?
    • * Explain anisotropic size normalization of characters by their bounding box.
    • * What is the difference between anisotropic and isotropic rescaling?
    • * What is the difference between finding a canonical representative under translation using centroids, medoids, and bounding boxes?  Which one works best in practice?
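As a practice sketch for the loop-free centroid question above (assuming a grayscale image whose pixel values act as weights, with ink encoded as large values):

```python
import numpy as np

def centroid(img):
    # Intensity-weighted centroid of an image, loop-free.
    # Note the encoding question above: with black=255/white=0 the ink
    # carries the weight, as intended; with black=0/white=255 the
    # background dominates, so the image should be inverted first.
    total = img.sum()
    ys, xs = np.indices(img.shape)   # row and column coordinate grids
    return (ys * img).sum() / total, (xs * img).sum() / total

# Toy image: all mass at pixel (row=1, col=2).
img = np.zeros((4, 4))
img[1, 2] = 255.0
cy, cx = centroid(img)
```

With black=1/white=0 the centroid is the same as with black=255/white=0, since a common scale factor cancels in the ratio.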

  • Non-Parametric Density Estimation
    • Describe the kernel density estimation algorithm.
    • What is a window function?
    • What kinds of window functions are commonly used?
    • How does one use kernel density estimation for classification?
    • How should you choose the scale parameter so that a kernel density estimator converges asymptotically?
    • Derive the kernel density estimator.
    • What is the kNN density estimator?
    • How should you choose k for the kNN density estimator?
    • Is the kernel density estimate unbiased?
    • What does the kernel density estimate actually estimate?  How can you express that in terms of the true density and the window function?
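A sketch of a 1D kernel density estimator with a Gaussian window; the bandwidth `h` is the scale parameter asked about above, and all names and values are illustrative:

```python
import numpy as np

def kde(samples, x, h):
    # Kernel density estimate at points x: average of Gaussian windows
    # of width h centered at the samples, evaluated loop-free.
    u = (x[None, :] - samples[:, None]) / h       # (n_samples, n_points)
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)   # Gaussian window
    return k.mean(axis=0) / h                     # normalize by bandwidth

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=2000)
dens = kde(samples, np.array([0.0]), h=0.3)       # estimate near the N(0,1) peak
```

Note what this estimates: the expectation of the estimate is the true density convolved with the window, which is the biasedness question above in concrete form.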

  • Ellipses, Gaussians, Covariances
    • How is the expected value defined?
    • What is a random variable?
    • What is the definition of the variance of a random variable?
    • What is the definition of the expected deviation?
    • What is the definition of the covariance?
    • Given a generator of normally distributed samples, describe a procedure for generating samples from a multivariate normal density with a given mean and covariance matrix.
    • What is the sample mean?
    • What is a moment?  How is it computed?
    • Given samples from a 2D normal density, estimate the major and minor axes of its covariance matrix.
    • What do the eigenvalues of the covariance matrix represent?
    • What is the Mahalanobis distance?
    • Given the angle and lengths of the major and minor axes, and the center, write down an implicit equation for an ellipse.
    • Given a covariance matrix, find the angle and lengths of the major and minor axes of the level set of a normal density with that covariance matrix.
    • Explain the difference between the sample mean and the population mean.
    • Generally, how quickly does the sample mean converge to the population mean?
    • What is a quadratic form?
    • Compute the eigenvectors and eigenvalues of this ___ matrix.
    • What is the formula for the sample mean?
    • What is the formula for the sample covariance?
    • # What is the formula for the population/true mean?
    • # What is the formula for the population/true covariance?
    • # Compute the sample mean and covariance of this ___ density.
    • # What is the difference between "density" and "distribution"?
    • # What is a random variable?
    • # What is the expected deviation?
    • # Write down the formula for the 1D standard normal density.
    • # Write down the formula for the multivariate normal density.
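One standard answer to the sampling question above: if z is standard normal and Sigma = L Lᵀ is a Cholesky factorization, then mu + L z has mean mu and covariance L I Lᵀ = Sigma.  A NumPy sketch with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

L = np.linalg.cholesky(Sigma)          # Sigma = L @ L.T
z = rng.standard_normal((10000, 2))    # i.i.d. standard normal samples
samples = mu + z @ L.T                 # transformed to N(mu, Sigma)

# Sanity check: sample statistics should approximate the parameters.
sample_mean = samples.mean(axis=0)
sample_cov = np.cov(samples, rowvar=False)
```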

  • PCA
    • What is PCA?
    • Why would you use PCA?
    • How do you compute PCA?
    • How can you compute the eigenvector corresponding to the largest eigenvalue of a matrix iteratively?
    • How can you compute the first 10 eigenvectors of a positive definite matrix iteratively?
    • What is the effect of using PCA for nearest neighbor classification?
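A sketch of the iterative answer to the largest-eigenvector question: power iteration repeatedly applies the matrix and renormalizes.  For the first 10 eigenvectors of a positive definite matrix one would repeat this with deflation, subtracting `lam * np.outer(v, v)` after each eigenpair.  The matrix below is a toy example:

```python
import numpy as np

def largest_eigenvector(A, iters=200):
    # Power iteration: the iterate converges to the eigenvector of the
    # largest-magnitude eigenvalue (assuming a nonzero overlap at start).
    v = np.ones(A.shape[0])
    for _ in range(iters):
        v = A @ v
        v = v / np.linalg.norm(v)
    return v, v @ A @ v    # Rayleigh quotient estimates the eigenvalue

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])   # eigenvalues 4 and 2
v, lam = largest_eigenvector(A)
```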
  • Decision Regions for the Normal Density
    • What is the central limit theorem?
    • Why is the central limit theorem important for pattern recognition?
    • Given two 1D Gaussians, derive the optimal decision boundaries.
    • What is the "normal generative model"?
    • Give the formula for the univariate normal density.
    • Give the formula for a multivariate normal density.
    • What kind of decision boundaries occur in classification problems in which all c.c.d.s are normal densities?
    • In a classification problem with 2D normal c.c.d.s, what geometric shapes occur as decision boundaries?
    • Give an example of two c.c.d.s that have a circular decision boundary.
    • Give an example of two normal densities whose decision boundary is two parallel planes.
    • Given these two (non-normal) c.c.d.s ______, compute the decision boundary.
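For the two-1D-Gaussians question, one special case worth memorizing: with equal priors and equal variances the log-likelihoods are equal at the midpoint of the means, so the boundary is linear.  A quick numerical check, with toy values:

```python
import numpy as np

def gauss(x, mu, sigma):
    # Univariate normal density.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def boundary_equal_var(mu1, mu2):
    # Equal priors, equal variances: the quadratic terms of the
    # log-likelihoods cancel, leaving the midpoint of the means.
    return 0.5 * (mu1 + mu2)

b = boundary_equal_var(-1.0, 3.0)   # boundary between N(-1, 1) and N(3, 1)
```

With unequal variances the quadratic terms no longer cancel, which is how the curved (and even two-parallel-boundary) cases above arise.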

  • k-Means Algorithm, EM, GMM
    • State the k-means algorithm.
    • What is the purpose of the k-means algorithm?
    • How would you use the k-means algorithm for classification?
    • What is the advantage of using k-means vs nearest neighbors?
    • What is the EM algorithm?
    • Given samples from a Gaussian mixture, how would you recover the mixture parameters?
    • Describe the EM algorithm for Gaussian mixture estimation.
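A minimal k-means sketch; the deterministic initialization and toy clusters are for reproducibility only (practical implementations use random restarts or k-means++):

```python
import numpy as np

def kmeans(X, k, iters=100):
    # Simple deterministic init for this sketch: points spaced through X.
    means = X[::max(len(X) // k, 1)][:k].astype(float)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest mean.
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each mean becomes the centroid of its points.
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, means):
            break
        means = new
    return means, labels

# Two tight, well-separated 2D clusters (toy data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(10.0, 0.1, (20, 2))])
means, labels = kmeans(X, 2)
```

For classification one would run this per class (or over all data) and then classify by nearest mean, which is the k-means vs. nearest-neighbors trade-off asked about above.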
  • SOM
    • State the SOM algorithm.
    • Explain how the SOM algorithm works.
    • How would you implement (a kind of topology) with the SOM algorithm?
    • What does the SOM algorithm compute?
    • How would you use the SOM algorithm for classification?
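A sketch of a 1D self-organizing map: prototypes live on a line of grid positions, and each input pulls the best-matching unit and its grid neighbors toward it.  The annealing schedule and all parameter values are illustrative choices, not the lecture's:

```python
import numpy as np

def som_1d(data, n_units=10, epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.uniform(data.min(), data.max(), n_units)   # prototype values
    grid = np.arange(n_units)                          # 1D map topology
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)               # anneal the learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5   # shrink the neighborhood
        for x in rng.permutation(data):
            bmu = np.abs(w - x).argmin()      # best-matching unit
            h = np.exp(-0.5 * ((grid - bmu) / sigma) ** 2)
            w += lr * h * (x - w)             # pull BMU and neighbors toward x
    return w

data = np.linspace(0.0, 1.0, 200)
w = som_1d(data)
```

A different topology (e.g. a 2D sheet) only changes how `grid` and the neighborhood function are defined.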
  • Linear Discriminants
    • Name three different methods for computing a linear discriminant function from training data.

    • Give an example of a classification problem that is optimally solved by a linear discriminant function.
    • State the perceptron learning algorithm.
    • What is the difference between the batch and the single-sample perceptron learning algorithm?
    • What is linear separability?
    • What is the perceptron criterion function?
    • Derive the perceptron learning algorithm from the perceptron criterion function and gradient descent.
    • The optimal solution to some classification problem involving normal densities is a linear discriminant function; is the problem linearly separable?
    • What is a standardized feature vector in the context of perceptron learning?
    • Define: training, test, and validation set.
    • How do you use the pseudoinverse for finding a linear discriminant?
    • When applied to a non-separable problem, what kind of solution does the perceptron algorithm return?
    • Find a solution to a quadratic discriminant problem using gradient descent; derive the update algorithm.
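A sketch of single-sample perceptron learning on standardized (augmented, class-2-negated) feature vectors: after standardization every sample should satisfy a·y > 0, and misclassified samples are simply added to a.  The toy data are illustrative:

```python
import numpy as np

def perceptron(X, labels, epochs=100):
    # Augment with a bias component, then "standardize": negate class-1
    # samples so a correct solution satisfies a @ y > 0 for every row.
    Y = np.hstack([X, np.ones((len(X), 1))])
    Y[labels == 1] *= -1
    a = np.zeros(Y.shape[1])
    for _ in range(epochs):
        errors = 0
        for y in Y:
            if a @ y <= 0:      # misclassified (or on the boundary)
                a += y          # single-sample perceptron update
                errors += 1
        if errors == 0:         # converged: data were linearly separable
            break
    return a

# Linearly separable toy problem in 1D.
X = np.array([[-2.0], [-1.0], [2.0], [3.0]])
labels = np.array([0, 0, 1, 1])
a = perceptron(X, labels)
```

On a non-separable problem this loop never reaches `errors == 0` and the weight vector keeps cycling, which is the non-separable question above.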

  • SVM
    • Explain how to obtain non-linear discriminant (e.g., quadratic) functions via feature space transformations and linear discriminant classifiers.

    • Define and illustrate the curse of dimensionality.
    • Define and illustrate a maximum margin classifier.
    • Bound the generalization error of an SVM in terms of the number of support vectors.
    • Define and illustrate the kernel trick.
    • Derive a kernel version of the perceptron learning algorithm.
    • Give three commonly used kernel functions.
    • State the maximum margin perceptron learning algorithm.
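As a sketch of the kernelized perceptron question: the weight vector is an implicit sum of training samples, so the algorithm only needs dot products, which an RBF kernel replaces.  This lets it solve the XOR problem, which is not linearly separable in the input space.  All names and parameter values are illustrative:

```python
import numpy as np

def rbf(X1, X2, gamma=1.0):
    # RBF (Gaussian) kernel matrix between two sample sets.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def kernel_perceptron(X, y, epochs=50, gamma=1.0):
    # y in {-1, +1}; alpha[i] counts how often sample i was misclassified,
    # so the implicit weight vector is sum_i alpha[i] * y[i] * phi(x_i).
    K = rbf(X, X, gamma)
    alpha = np.zeros(len(X))
    for _ in range(epochs):
        errors = 0
        for i in range(len(X)):
            if y[i] * ((alpha * y) @ K[:, i]) <= 0:
                alpha[i] += 1
                errors += 1
        if errors == 0:
            break
    return alpha

# XOR-like problem: linearly inseparable, RBF-separable.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([-1, -1, 1, 1])
alpha = kernel_perceptron(X, y)
pred = np.sign((alpha * y) @ rbf(X, X))
```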

  • MLPs
    • Define:
      • # input/hidden/output layers
      • # weights
      • # inputs
      • # activation functions
      • # units
      • # activations
      • # credit assignment
      • # error
    • State the forward equations for an MLP.
    • State the update equations for an MLP.
    • Derive the backpropagation algorithm.
    • Describe the relationship between modern MLPs and historical multi-layer perceptrons.
    • Justify linear threshold machines in terms of McCulloch-Pitts neurons.
    • Provide an intuitive explanation for the backpropagation update rule.
    • Define: training set and validation set error.
    • Define overtraining in terms of learning curves.
    • Assume a neural network has been trained to perform some classification problem.  Now you need to solve a new classification problem in which the inputs have been linearly transformed.  What are the neural network weights for the new problem?
    • How should inputs to a neural network be normalized?
    • Assume you have trained a neural network on a classification problem.  Now you solve a classification problem with the same c.c.d.s but different priors.  How can you use the neural network you have to solve the new problem?
    • Assume you change the activation functions of a one-hidden-layer neural network to linear and then train.  What is the effect?
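A sketch of the forward equations and backpropagated gradients for a one-hidden-layer MLP with sigmoid units and squared error E = ½‖out − t‖²; the layer sizes and random values are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    h = sigmoid(W1 @ x + b1)       # hidden activations
    out = sigmoid(W2 @ h + b2)     # output activations
    return h, out

def backprop(x, t, W1, b1, W2, b2):
    h, out = forward(x, W1, b1, W2, b2)
    # "Deltas" are dE/d(pre-activation) at each layer; the chain rule
    # propagates them backward through the weights.
    d2 = (out - t) * out * (1.0 - out)
    d1 = (W2.T @ d2) * h * (1.0 - h)
    gW2, gb2 = d2[:, None] * h[None, :], d2   # gradients w.r.t. W2, b2
    gW1, gb1 = d1[:, None] * x[None, :], d1   # gradients w.r.t. W1, b1
    return gW2, gb2, gW1, gb1

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)
x, t = np.array([0.5, -1.0]), np.array([1.0])
gW2, gb2, gW1, gb1 = backprop(x, t, W1, b1, W2, b2)
```

The update equations then subtract a learning rate times each gradient; checking these gradients against finite differences is a good exam-prep exercise.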

  • Bayesian Parameter Estimation
    • define MAP estimator
    • derive maximum likelihood estimator of the mean of a gaussian
    • define bias of an estimator
    • bayesian parameter estimation
    • define reproducing densities and conjugate priors
    • explain difference between MAP and Bayesian parameter estimation
    • state bayesian parameter updates in its general (integral) form
    • know the conjugate priors for the normal and binomial densities [see exercises]
    • apply conjugate priors for bayesian parameter estimation
    • posteriors for estimates of binomial parameters
    • posteriors for estimates of the mean of a normal density
    • incremental updates of parameter estimates
    • explain the difference between Bayesian and ML estimates
    • Bayesian and ML estimate of the parameter of the uniform density U(0,theta)
    • minimum error decision boundaries for a single feature [exercises]
    • construction of single feature classifiers from data
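As a concrete instance of conjugate-prior updating (the binomial case from the exercises): a Beta(a, b) prior on the success probability, combined with k successes in n trials, yields a Beta(a + k, b + n − k) posterior, whose mean is (a + k)/(a + b + n).  A small sketch with illustrative counts:

```python
def beta_binomial_update(a, b, k, n):
    # Conjugate update: Beta prior + binomial likelihood -> Beta posterior.
    return a + k, b + n - k

a, b = 1.0, 1.0                    # uniform prior Beta(1, 1)
a, b = beta_binomial_update(a, b, k=7, n=10)
posterior_mean = a / (a + b)       # Bayesian point estimate
```

Because the posterior is again a Beta density, the same update can be applied incrementally as new observations arrive, which is the incremental-update item above.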
  • Python and Array Languages
    • Express this _______ algorithm loop-free.
    • Implement quicksort without loops.
    • Correct this _______ program for computing ___________.
    • Fill in the missing parts of this ______________ program so that it computes ______________.
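For the loop-free quicksort item, one possible reading: recursion replaces the loops and boolean masks do the partitioning.  A sketch (function name and test data are our own):

```python
import numpy as np

def quicksort(a):
    # Loop-free quicksort: recursion instead of loops, boolean masks
    # partition the array around the pivot; duplicates land in the
    # equality bucket, so recursion always shrinks the problem.
    if len(a) <= 1:
        return a
    pivot = a[len(a) // 2]
    return np.concatenate([
        quicksort(a[a < pivot]),
        a[a == pivot],
        quicksort(a[a > pivot]),
    ])

sorted_a = quicksort(np.array([3, 1, 4, 1, 5, 9, 2, 6]))
```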