Machine Learning Question Bank (5)

2019-08-30 20:23

3. LDA

Using a set of 100 labeled training examples (two classes), we train the following models:

GaussI : A Gaussian mixture model (one Gaussian per class), where the covariance matrices are both set to I (identity matrix).

GaussX: A Gaussian mixture model (one Gaussian per class) without any restrictions on the covariance matrices.

LinLog: A logistic regression model with linear features.

QuadLog: A logistic regression model, using all linear and quadratic features.

(1) After training, we measure for each model the average log probability of labels given examples in the training set. Specify all the equalities or inequalities that must always hold between the models relative to this performance measure. We are looking for statements like "model 1 <= model 2" or "model 1 = model 2". If no such statement holds, write "none".

GaussI <= LinLog (both have logistic posteriors, and LinLog is the logistic model that maximizes the average log probability)

GaussX <= QuadLog (both have logistic posteriors with quadratic features, and QuadLog is the model in this class that maximizes the average log probability)

LinLog <= QuadLog (logistic regression models with linear features are a subclass of logistic regression models with quadratic features; the maximum over the superclass is at least as high as the maximum over the subclass)

GaussI <= QuadLog (follows from above inequalities)

(GaussX will have a higher average log joint probability of examples and labels than GaussI. But a higher average log joint probability does not necessarily translate into a higher average log conditional probability.)
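The GaussI <= LinLog inequality can be checked numerically. The sketch below uses synthetic two-class data (the 100 training examples from the question are not reproduced here, so the data, learning rate, and iteration count are all illustrative assumptions): it derives the logistic posterior induced by the identity-covariance Gaussian model in closed form, fits logistic regression by gradient ascent on the same objective, and compares the average log probability of labels given examples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-class data standing in for the (unavailable) training set.
X0 = rng.normal([-1.0, 0.0], 1.0, size=(50, 2))
X1 = rng.normal([1.5, 0.5], 1.0, size=(50, 2))
X = np.vstack([X0, X1])
y = np.r_[np.zeros(50), np.ones(50)]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def avg_log_prob(w, b):
    # Average log p(y | x) under the logistic posterior sigmoid(w.x + b).
    p = sigmoid(X @ w + b)
    return np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# GaussI: one Gaussian per class, covariance fixed to I. Its posterior is
# logistic with parameters determined by the class means and proportions:
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
pi1 = y.mean()
w_g = mu1 - mu0
b_g = 0.5 * (mu0 @ mu0 - mu1 @ mu1) + np.log(pi1 / (1 - pi1))
ll_gauss = avg_log_prob(w_g, b_g)

# LinLog: maximize the average log conditional likelihood directly.
w, b = np.zeros(2), 0.0
for _ in range(20000):
    p = sigmoid(X @ w + b)
    g = y - p
    w += 0.2 * (X.T @ g) / len(y)
    b += 0.2 * g.mean()
ll_linlog = avg_log_prob(w, b)

# LinLog optimizes over a class that contains the GaussI posterior,
# so its training objective cannot be worse.
assert ll_gauss <= ll_linlog + 1e-6
```

Because the GaussI posterior is itself a linear-feature logistic model, the directly-optimized LinLog objective can never fall below it on the training set, which is exactly the argument above.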

(2) Which equalities and inequalities must always hold if we instead use the mean classification error in the training set as the performance measure? Again use the format "model 1 <= model 2" or "model 1 = model 2". Write "none" if no such statement holds.

None. Having higher average log conditional probabilities, or average log joint probabilities, does not necessarily translate to higher or lower classification error. Counterexamples can be constructed for all pairs in both directions.

Although no inequality always holds, it is commonly the case that GaussX <= GaussI and that QuadLog <= LinLog. Partial credit of up to two points was awarded for these inequalities.
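Such a counterexample is easy to exhibit concretely. The sketch below uses made-up probability assignments (not any of the models from the question) to show two predictors where the one with the higher average log-likelihood also has the higher classification error: a confident model that badly misses one point can still beat a cautious model that gets everything right.

```python
import numpy as np

# Probabilities each model assigns to the TRUE label of 3 points.
p_A = np.array([0.60, 0.60, 0.60])   # all correct (p > 0.5), but cautious
p_B = np.array([0.99, 0.99, 0.49])   # one error (p < 0.5), but confident

ll_A, ll_B = np.log(p_A).mean(), np.log(p_B).mean()
err_A, err_B = (p_A < 0.5).mean(), (p_B < 0.5).mean()

assert ll_B > ll_A       # B has the higher average log-likelihood...
assert err_B > err_A     # ...yet also the higher classification error.
```

Reversing the construction (make the cautious model the one with an error) gives the counterexample in the other direction, so no ordering between the two performance measures is forced.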

5. We consider here generative and discriminative approaches for solving the classification problem illustrated in Figure 4.1. Specifically, we will use a mixture of Gaussians model and regularized logistic regression models.

Figure 4.1. Labeled training set, where “+” corresponds to class y = 1.

(1) We will first estimate a mixture of Gaussians model, one Gaussian per class, with the constraint that the covariance matrices are identity matrices. The mixing proportions (class frequencies) and the means of the two Gaussians are free parameters.

a) Plot the maximum likelihood estimates of the means of the two class conditional Gaussians in Figure 4.1. Mark the means as points “x” and label them “0” and “1” according to the class.

The means should be close to the center of mass of the points in each class.

b) Draw the decision boundary in the same figure.

Since the two classes have the same number of points and the same covariance matrices, the decision boundary is a line and, moreover, should be drawn as the orthogonal bisector of the line segment connecting the class means.
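The bisector property can be verified directly: with identity covariances and equal priors, the posterior equals 0.5 exactly where a point is equidistant from the two class means. The means below are illustrative placeholders, since Figure 4.1 itself is not reproduced here.

```python
import numpy as np

# Hypothetical class means (stand-ins for the Figure 4.1 estimates).
mu0 = np.array([0.0, 2.0])
mu1 = np.array([3.0, 0.0])

# With identity covariances and equal priors, p(y=1|x) = 0.5 exactly when
# ||x - mu0|| = ||x - mu1||, i.e. on the perpendicular bisector of the
# segment joining the means.
mid = 0.5 * (mu0 + mu1)        # the bisector passes through the midpoint
normal = mu1 - mu0             # the boundary is orthogonal to this vector
perp = np.array([-normal[1], normal[0]])

for t in (-2.0, 0.0, 3.5):
    x = mid + t * perp         # arbitrary points along the boundary line
    d0 = np.linalg.norm(x - mu0)
    d1 = np.linalg.norm(x - mu1)
    assert abs(d0 - d1) < 1e-9  # equidistant, hence posterior = 0.5
```

The check confirms that every point on the line through the midpoint, orthogonal to the segment between the means, is equidistant from both means, which is precisely the decision boundary described above.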

(2) We have also trained regularized linear logistic regression models for the same data. The regularization penalty, added in penalized conditional log-likelihood estimation, was −C·wᵢ² for a single i ∈ {0, 1, 2}; in other words, only one of the parameters was regularized in each case. Based on the data in Figure 4.1, we generated three plots, one for each regularized parameter, of the number of misclassified training points as a function of C (Figure 4.2). The three plots are not identified with the corresponding parameters, however. Please assign the "top", "middle", and "bottom" plots to the correct parameter, w0, w1, or w2 (the parameter that was regularized in the plot), and provide a brief justification for each assignment.

• "top" = w1

By strongly regularizing w1 we force the boundary to be horizontal in the figure. The logistic regression model tries to maximize the log-probability of classifying the data correctly. The highest penalty comes from the misclassified points, so the boundary tends to balance the (worst) errors. In the figure this is, roughly speaking, the line x2 = 1, resulting in 4 errors.

• "middle" = w0

If we regularize w0, the boundary eventually passes through the origin (the bias term is driven to zero). Based on the figure, we can find a good linear boundary through the origin with only one error.

• "bottom" = w2

The training error is unaffected if we regularize w2 (constraining the boundary to be vertical); the value of w2 would be small even without regularization.
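The mechanism behind these assignments can be sketched numerically: penalizing a single coefficient drives it toward zero and thereby constrains the boundary's orientation. The code below uses made-up data (Figure 4.1 is not reproduced, so the dataset, C value, and optimizer settings are illustrative assumptions) and applies the −C·wᵢ² penalty via a proximal shrinkage step on just one coordinate.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: classes separated mostly along x2, slightly along x1.
X0 = rng.normal([0.0, -1.0], 0.7, size=(30, 2))
X1 = rng.normal([0.5, 1.0], 0.7, size=(30, 2))
X = np.vstack([X0, X1])
y = np.r_[np.zeros(30), np.ones(30)]
Xb = np.c_[np.ones(len(X)), X]           # columns: bias, x1, x2

def fit(C, which, lr=0.2, iters=3000):
    """Maximize avg log-likelihood minus C * w[which]**2."""
    w = np.zeros(3)                       # w = (w0, w1, w2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w += lr * Xb.T @ (y - p) / len(y)
        # Proximal step for the single penalized coordinate:
        w[which] /= 1.0 + 2.0 * lr * C
    return w

w_free = fit(C=0.0, which=1)
w_pen = fit(C=100.0, which=1)             # heavily penalize w1 only

# The penalized coefficient collapses toward zero, so the boundary
# w0 + w1*x1 + w2*x2 = 0 becomes (nearly) a horizontal line in (x1, x2).
assert abs(w_pen[1]) < 0.01
assert abs(w_pen[1]) < abs(w_free[1])
```

For this toy data the error barely changes when w1 is suppressed, mirroring the "bottom" reasoning; on data like Figure 4.1, where the suppressed direction matters, the same collapse would raise the training error as in the "top" plot.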

