Introduction
The perceptron is a fundamental algorithm for binary classification in machine learning. It is a mistake-driven online learning algorithm.
It is often said that the perceptron is modeled after neurons in the brain. It has $m$ input values (which correspond to the features of the examples in the training set) and one output value. Each input value $x_i$ is multiplied by a weight $w_i$. If the sum of these products, $\sum_{i=1}^{m} w_i x_i$, is larger than zero, the perceptron is activated and ‘fires’ a signal (+1). Otherwise it is not activated.
It takes as input a vector of real-valued inputs $x$ and makes a prediction $\hat{y} \in \{-1, +1\}$. This simplifies computations internally, but we again emphasize that your model’s predict method must return labels that are 0 or 1. Predictions are made using a linear classifier: $\hat{y} = \mathrm{sign}(w \cdot x)$. The term $w \cdot x$ is the dot product of $w$ and $x$, computed as $w \cdot x = \sum_{i=1}^{m} w_i x_i$. Updates to $w$ are made only when a prediction is incorrect: $\hat{y} \neq y$. The new weight vector $w'$ is a function of the current weight vector $w$ and example $(x, y)$: $w' = w + \eta \, y \, x$, where $\eta$ is the learning rate. The weight vector is updated so as to improve the prediction on the current example. Note that the perceptron naturally handles continuous and binary features, so no special processing is needed.
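For concreteness, here is a single update worked through, with an assumed learning rate $\eta = 1$ and made-up values. Suppose $w = (0, 0)$ and we receive $x = (2, 1)$ with $y = -1$. The prediction is $\hat{y} = \mathrm{sign}(w \cdot x) = \mathrm{sign}(0) = +1 \neq y$, so we update $w' = w + \eta \, y \, x = (0, 0) + (1)(-1)(2, 1) = (-2, -1)$. Now $w' \cdot x = -5 < 0$, so this example would be classified correctly on the next pass.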
The basic structure of the algorithm is:
- Initialize $w$ to 0, set learning rate $\eta$ and number of iterations $I$
- For each training iteration $k = 1 \ldots I$:
  (a) For each example $i = 1 \ldots N$:
      i. Receive an example $(x_i, y_i)$
      ii. Predict the label $\hat{y} = 1$ if $w \cdot x_i \ge 0$; $\hat{y} = -1$ otherwise.
      iii. If $\hat{y} \neq y_i$, make an update to $w$: $w \leftarrow w + \eta \, y_i \, x_i$
Note that there is no bias term in this version and you should not include one in your solution. Also observe that the definition of “sign” accounts for 0 values: a dot product of exactly 0 predicts +1. Once again, while sign returns −1 and 1, the outputs from your predict method must be the actual labels, which are 0 or 1.
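As a minimal sketch of that convention (not part of any provided starter code), the “sign” used here can be written as:

def sign(a):
    # A dot product of exactly 0 counts as positive and predicts +1.
    return 1 if a >= 0 else -1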
Dataset
Several real world binary classification datasets taken from a range of applications are used in this example. Each dataset is in the same format (described below) and contains a train, development, and test file. Here are some descriptions of these datasets:
Biology
Biological research produces large amounts of data to analyze. Applications of machine learning to biology include finding regions of DNA that encode for proteins, classification of gene expression data, and inferring regulatory networks from mRNA and proteomic data.
Our biology task of characterizing gene splice junction sequences comes from molecular biology, a field interested in the relationships of DNA, RNA, and proteins. Splice junctions are points on a sequence at which “superfluous” RNA is removed before the process of protein creation in higher organisms. Exons are nucleotide sequences that are retained after splicing while introns are spliced out. The goal of this prediction task is to recognize DNA sequences that contain boundaries between exons and introns. Sequences contain exon/intron (EI) boundaries, intron/exon (IE) boundaries, or no splice junction at all.
For a binary task, you will classify sequences as either EI boundaries (label 1) or non-splice sequences (label 0). Each learning instance contains a 60 base pair sequence (e.g., ACGT), with some ambiguous slots. Features encode which base pair occurs at each position of the sequence.
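The data files ship with these features already extracted, but as a hypothetical sketch of what such an encoding could look like (the function, the four-letter alphabet, and the handling of ambiguous slots are my assumptions, not the dataset’s documented recipe):

def encode(sequence, alphabet="ACGT"):
    # One 0/1 feature per (position, base) pair, 1-indexed like the data
    # files; ambiguous slots simply activate no feature here.
    features = {}
    for pos, base in enumerate(sequence):
        if base in alphabet:
            features[pos * len(alphabet) + alphabet.index(base) + 1] = 1.0
    return features

print(encode("ACGT"))  # {1: 1.0, 6: 1.0, 11: 1.0, 16: 1.0}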
Finance
Finance is a data rich field that employs numerous statistical methods for modeling and prediction, including the modeling of financial systems and portfolios. Our financial task is to predict which Australian credit card applications should be accepted (label 1) or rejected (label 0). Each example represents a credit card application, where all values and attributes have been anonymized for confidentiality. Features are a mix of continuous and discrete attributes; the discrete attributes have been binarized.
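To illustrate what binarizing a discrete attribute means, each category gets its own 0/1 feature. A tiny sketch with made-up values (the real attributes are anonymized):

values = ["red", "green", "red", "blue"]
categories = sorted(set(values))  # ['blue', 'green', 'red']
binarized = [[1.0 if v == c else 0.0 for c in categories] for v in values]
print(binarized)
# [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]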
NLP
Natural language processing studies the processing and understanding of human languages. Machine learning is widely used in NLP tasks, including document understanding, information extraction, machine translation and document classification.
Our NLP task is sentiment classification. Each example is a product review taken from Amazon kitchen appliance reviews. The review is either positive (label 1) or negative (label 0) towards the product. Reviews are represented as uni-gram and bi-grams; each one and two word phrase is extracted as a feature.
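The reviews come with these features pre-extracted, but a minimal sketch of uni-gram and bi-gram extraction (whitespace tokenization and lowercasing are my simplifying assumptions) looks like:

def ngrams(text):
    tokens = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams

print(ngrams("Great little blender"))
# ['great', 'little', 'blender', 'great little', 'little blender']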
Speech
Statistical speech processing has its roots in the 1980s and has been the focus of machine learning research for decades. The area deals with all aspects of processing speech signals, including speech transcription, speaker identification and speech information retrieval.
Our speech task is spoken letter identification. Each example comes from a speaker saying one of the twenty-six letters of the English alphabet. Our goal is to predict which letter was spoken. The data was collected by asking 150 subjects to speak each letter of the alphabet twice.
Each spoken utterance is represented as a collection of 617 real-valued attributes scaled to be between -1.0 and 1.0. Features include spectral coefficients, contour features, sonorant features, pre-sonorant features, and post-sonorant features. The binary task is to distinguish between the letter M (label 0) and N (label 1).
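The data is already scaled, but for reference, a min-max recipe that maps a feature column to [-1.0, 1.0] (an assumption about the preprocessing, not its documented pipeline) could look like:

import numpy as np

def scale(col):
    # Map the column's minimum to -1.0 and its maximum to 1.0.
    lo, hi = col.min(), col.max()
    return 2.0 * (col - lo) / (hi - lo) - 1.0

print(scale(np.array([0.0, 5.0, 10.0])))  # [-1.  0.  1.]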
Vision
Computer vision processes and analyzes images and videos and it is one of the fundamental areas of robotics. Machine learning applications include identifying objects in images, segmenting video and understanding scenes in film.
Our vision task is image segmentation. In image segmentation, an image is divided into segments that are labeled according to content. The images in our data have been divided into 3×3 regions. Each example is a region, and features include the centroids of parts of the image, the pixels in a region, contrast, intensity, color, saturation, and hue. The goal is to identify the primary element in the image as brickface, sky, foliage, cement, window, path, or grass. In the binary task, you will distinguish segments of foliage (label 0) from grass (label 1).
Synthetic Data
When developing algorithms it is often helpful to consider data with known properties, so we typically create synthetic data for this purpose. To help you test your algorithms, we are providing two synthetic datasets intended for development.
Easy
The easy data is labeled using a trivial classification function, so any reasonable learning algorithm should achieve near-flawless accuracy. Each example is a 10-dimensional instance drawn from a multivariate Gaussian distribution with zero mean and an identity covariance matrix. Each example is labeled according to the presence of one of 6 features; the remaining features are noise.
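A hypothetical generator in this spirit (the seed, the sample count, and the rule that a single designated feature decides the label are my assumptions; the actual data uses one of 6 features):

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))  # 10 Gaussian features, zero mean, identity covariance
y = (X[:, 0] > 0).astype(int)       # one feature determines the label; the rest are noise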
Hard
Examples in this data are randomly labeled. Since there is no pattern, no learning algorithm should achieve accuracy significantly different from random guessing (50%). The data is generated in the same manner as Easy, except there are 94 noisy features.
Data Format
The data are provided in what is known as SVM-light format. Each line contains a single example:
0 1:-0.2970 2:0.2092 5:0.3348 9:0.3892 25:0.7532 78:0.7280
The first entry on the line is the label. The label can be an integer (0/1 for binary classification) or a real-valued number (for regression). A classification label of −1 indicates an unlabeled example. Subsequent entries on the line are features. The entry 25:0.7532 means that feature 25 has value 0.7532. Features are 1-indexed.
My Implementation
To use these datasets, you need to write a Python script that parses a given data file and returns features and labels. In my implementation, the features are stored as a sparse matrix of floats (in particular, as a scipy.sparse.csr_matrix of floats), which has num_examples rows and num_features columns. The labels are stored as a dense 1-D array of integers with num_examples elements.
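A minimal sketch of such a loader (the function and variable names are mine; scikit-learn’s sklearn.datasets.load_svmlight_file provides the same functionality out of the box):

import numpy as np
from scipy.sparse import csr_matrix

def load_data(path):
    # Parse SVM-light lines like "0 1:-0.2970 2:0.2092" into a sparse
    # feature matrix and a dense label array; integer labels assumed.
    labels, rows, cols, vals = [], [], [], []
    with open(path) as f:
        for row, line in enumerate(f):
            parts = line.split()
            labels.append(int(parts[0]))
            for entry in parts[1:]:
                index, value = entry.split(":")
                rows.append(row)
                cols.append(int(index) - 1)  # features are 1-indexed in the file
                vals.append(float(value))
    X = csr_matrix((vals, (rows, cols)), shape=(len(labels), max(cols) + 1))
    y = np.array(labels, dtype=int)
    return X, y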
Below is my perceptron model.
import numpy as np


class Model(object):

    def __init__(self):
        self.num_input_features = None

    def fit(self, X, y):
        """ Fit the model.

        Args:
            X: A compressed sparse row matrix of floats with shape
               [num_examples, num_features].
            y: A dense array of ints with shape [num_examples].
        """
        raise NotImplementedError()

    def predict(self, X):
        """ Predict.

        Args:
            X: A compressed sparse row matrix of floats with shape
               [num_examples, num_features].

        Returns:
            A dense array of ints with shape [num_examples].
        """
        raise NotImplementedError()


class Perceptron(Model):

    def __init__(self, n, max_iter):
        super().__init__()
        self.n = n                # learning rate
        self.max_iter = max_iter  # number of passes over the training data
        self.w = []

    def fit(self, X, y):
        self.ne, self.nf = np.shape(X)
        self.w = np.zeros(self.nf)
        # Work with -1/+1 labels internally.
        temp = [-1 if ele == 0 else 1 for ele in y]
        X = X.toarray()
        for i in range(self.max_iter):
            for j in range(self.ne):
                a = np.dot(self.w, X[j])
                # sign(0) counts as +1, per the definition above.
                pre_y = 1 if a >= 0 else -1
                # Update the weights only on a mistake.
                if pre_y != temp[j]:
                    self.w += self.n * temp[j] * X[j]

    def predict(self, X):
        ne, nf = np.shape(X)
        # Align the feature dimension with what was seen in training:
        # truncate extra columns, zero-pad missing ones.
        if nf > self.nf:
            X = X[:, :self.nf]
        X = X.toarray()
        if nf < self.nf:
            X = np.pad(X, ((0, 0), (0, self.nf - nf)))
        predicted_Y = np.empty(np.shape(X)[0], dtype=int)
        for i in range(np.shape(X)[0]):
            # Map the -1/+1 prediction back to a 0/1 label; a dot product
            # of exactly 0 predicts 1, consistent with fit.
            predicted_Y[i] = 1 if np.dot(self.w, X[i]) >= 0 else 0
        return predicted_Y
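A short usage example, assuming the loader sketched above and hypothetical file names:

X_train, y_train = load_data("bio.train")
X_dev, y_dev = load_data("bio.dev")
model = Perceptron(n=1.0, max_iter=10)
model.fit(X_train, y_train)
accuracy = np.mean(model.predict(X_dev) == y_dev)
print(f"dev accuracy: {accuracy:.3f}")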
If this post helped you, please consider supporting me.
