Multiclass logistic regression
Instead of y = 0, 1 we will expand our definition so that y = 0, 1, ..., n. In essence, we re-run binary classification multiple times, once for each class.
Procedure

1. Divide the problem into n+1 binary classification problems (n+1 because the class indices start at 0).
2. For each class, predict the probability that the observations are in that single class.
3. prediction = max(probability of the classes)

For each sub-problem, we select one class (YES) and lump all the others into a second class (NO). Then we take the class with the highest predicted value, as in the sketch below.
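Here is a minimal one-vs-rest sketch of the procedure above. The sigmoid-based predict function and the per-class weight vectors are illustrative assumptions; they stand in for whichever binary classifier is trained on each sub-problem:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(features, weights):
    # Probability that each observation belongs to the "YES" class
    # of one binary sub-problem.
    return sigmoid(np.dot(features, weights))

def one_vs_rest_predict(features, weights_per_class):
    # weights_per_class: one weight vector per class, i.e. the result of
    # training n+1 binary classifiers (hypothetical here).
    # Rows = classes, columns = observations.
    probabilities = np.array([predict(features, w) for w in weights_per_class])
    # For each observation, take the class with the highest probability.
    return np.argmax(probabilities, axis=0)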
Softmax activation
The softmax function (also called softargmax or the normalized exponential function) takes as input a vector of K real numbers and normalizes it into a probability distribution of K probabilities proportional to the exponentials of the input numbers. That is, prior to applying softmax, some vector components could be negative or greater than one, and might not sum to 1; but after applying softmax, each component will be in the interval [0, 1] and the components will add up to 1, so they can be interpreted as probabilities. The standard (unit) softmax function is defined by the formula
\[ \sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \quad \text{for } i = 1, \dots, K \text{ and } z = (z_1, \dots, z_K) \]
In words: we apply the standard exponential function to each element z_i of the input vector z, then normalize these values by dividing by the sum of all the exponentials; this normalization ensures that the components of the output vector σ(z) sum to 1. A direct implementation is sketched below.
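Here is a minimal NumPy sketch of the formula above. Subtracting the max is a standard numerical-stability trick, not part of the definition; it leaves the output unchanged because softmax is invariant to adding a constant to every component:

import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; the result is unchanged
    # because softmax is invariant to adding a constant to every component.
    exps = np.exp(z - np.max(z))
    return exps / np.sum(exps)

softmax(np.array([1.0, 2.0, 3.0]))
# => array([0.09003057, 0.24472847, 0.66524096])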
Scikit-Learn example
Let’s compare our performance to the LogisticRegression model provided by
scikit-learn.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Normalize features to values between -1 and 1 for more efficient computation
normalized_range = MinMaxScaler(feature_range=(-1, 1))

# Extract Features + Labels
labels.shape = (100,)  # scikit-learn expects a 1-d label array
features = normalized_range.fit_transform(features)

# Create Test/Train
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.4)

# Scikit Logistic Regression
scikit_log_reg = LogisticRegression()
scikit_log_reg.fit(features_train, labels_train)

# Score is Mean Accuracy
scikit_score = scikit_log_reg.score(features_test, labels_test)
print('Scikit score:', scikit_score)

# Our Mean Accuracy
observations, features, labels, weights = run()
probabilities = predict(features, weights).flatten()
classifications = classifier(probabilities)
our_acc = accuracy(classifications, labels.flatten())
print('Our score:', our_acc)
Scikit score: 0.88
Our score: 0.89