# Implementing the Perceptron algorithm from scratch with Python

## Introduction

Perceptron is a fundamental algorithm for binary classiﬁcation in Machine Learning. The perceptron is a mistake-driven online learning algorithm.

It is often said that the perceptron is modeled after neurons in the brain. It has m input values (which correspond with the features of the examples in the training set) and one output value. Each input value $x_i$  is multiplied by a weight-factor $w_i$. If the sum of the products between the feature value and weight-factor is larger than zero, the perceptron is activated and ‘fires’ a signal (+1). Otherwise it is not activated. It takes as input a vector of real-valued inputs x and makes a prediction $\bar{y} \in \big\{-1, +1\big\}$ . This simpliﬁes computations internally, but we again emphasize that your model’s predict method must return labels that are 0 or 1. Predictions are made using a linear classiﬁer: $\bar{y} = sign(w \cdot x)$. The term $w \cdot x$ is the dot product P of w and x computed as $\sum_i x_i w_i$. Updates to w are made only when a prediction is incorrect: $\bar{y} \neq y$. The new weight vector $w^{'}$ is a function of the current weight vector w and example x, y. The weight vector is updated so as to improve the prediction on the current example. Note that Perceptron naturally handles continuous and binary features, so no special processing is needed.

The basic structure of the algorithm is:

1. Initialize w to 0, set learning rate $\eta$ and number of iterations I
2. For each training iteration k = 1 . . . I:

(a) For each example i = 1 . . . N:

i. Receive an example $x_i$

ii. Predict the label $\bar{y}_i = sign(w \cdot x_i ) = 1$ if $w \cdot x_i \geqslant 0$ ; $-1$ otherwise.

iii. If $\bar{y}_i \neq y_i$ make an update to w: $w^{'} = w + \eta y_i x_i$

Note that there is no bias term in this version and you should not include one in your solution. Also observe the deﬁnition of “sign” to account for 0 values. Once again, while sign returns −1 and 1, the outputs from your predict method must be the actual labels, which are 0 or 1.

## Dataset

Several real world binary classiﬁcation datasets taken from a range of applications are used in this example. Each dataset is in the same format (described below) and contains a train, development, and test ﬁle. Here are some descriptions of these datasets:

### Biology

Biological research produces large amounts of data to analyze. Applications of machine learning to biology include ﬁnding regions of DNA that encode for proteins, classiﬁcation of gene expression data, and inferring regulatory networks from mRNA and proteomic data.

Our biology task of characterizing gene splice junction sequences comes from molecular biology, a ﬁeld interested in the relationships of DNA, RNA, and proteins. Splice junctions are points on a sequence at which “superﬂuous” RNA is removed before the process of protein creation in higher organisms. Exons are nucleotide sequences that are retained after splicing while introns are spliced out. The goal of this prediction task is to recognize DNA sequences that contain boundaries between exons and introns. Sequences contain exon/intron (EI) boundaries, intron/exon (IE) boundaries, or do not contain splice examples.

For a binary task, you will classify sequences as either EI boundaries (label 1) or non-splice sequences (label 0). Each learning instance contains a 60 base pair sequence (ex. ACGT), with some ambiguous slots. Features encode which base pair occurs at each position of the sequence.

### Finance

Finance is a data rich ﬁeld that employs numerous statistical methods for modeling and prediction, including the modeling of ﬁnancial systems and portfolios. 1 Our ﬁnancial task is to predict which Australian credit card applications should be accepted (label 1) or rejected (label 0). Each example represents a credit card application, where all values and attributes have been anonymized for conﬁdentiality. Features are a mix of continuous and discrete attributes and discrete attributes have been binarized.

### NLP

Natural language processing studies the processing and understanding of human languages. Machine learning is widely used in NLP tasks, including document understanding, information extraction, machine translation and document classiﬁcation.

Our NLP task is sentiment classiﬁcation. Each example is a product review taken from Amazon kitchen appliance reviews. The review is either positive (label 1) or negative (label 0) towards the product. Reviews are represented as uni-gram and bi-grams; each one and two word phrase is extracted as a feature.

### Speech

Statistical speech processing has its roots in the 1980s and has been the focus of machine learning research for decades. The area deals with all aspects of processing speech signals, including speech transcription, speaker identiﬁcation and speech information retrieval.

Our speech task is spoken letter identiﬁcation. Each example comes from a speaker saying one of the twenty-six letters of English alphabet. Our goal is to predict which letter was spoken. The data was collected by asking 150 subjects to speak each letter of the alphabet twice.

Each spoken utterance is represented as a collection of 617 real valued attributes scaled to be between -1.0 and 1.0. Features include spectral coeﬃcients; contour features, sonorant features, pre-sonorant features, and post-sonorant features. The binary task is to distinguish between the letter M (label 0) and N (label 1).

### Vision

Computer vision processes and analyzes images and videos and it is one of the fundamental areas of robotics. Machine learning applications include identifying objects in images, segmenting video and understanding scenes in ﬁlm.

Our vision task is image segmentation. In image segmentation, an image is divided into segments are labeled according to content. The images in our data have been divided into 3×3 regions. Each example is a region and features include the centroids of parts of the image, pixels in a region, contrast, intensity, color, saturation and hue. The goal is to identify the primary element in the image as either a brickface, sky, foliage, cement, window, path or grass. In the binary task, you will distinguish segments of foliage (label 0) from grass (label 1).

### Synthetic Data

When developing algorithms it is often helpful to consider data with known properties. We typically create synthetic data for this purpose. To help test your algorithms, we are providing two synthetic datasets. These data are to help development.

#### Easy

The easy data is labeled using a trivial classiﬁcation function. Any reasonable learning algorithm should achieve near ﬂawless accuracy. Each example is a 10 dimensional instance drawn from a multi-variate Gaussian distribution with 0 mean and a diagonal identity covariance matrix. Each example is labeled according to the presence one of 6 features; the remaining features are noise.

#### Hard

Examples in this data are randomly labeled. Since there is no pattern, no learning algorithm should achieve accuracy signiﬁcantly diﬀerent from random guessing (50%). Data is generated in an identical manner as Easy except there are 94 noisy features.

## Data Format

The data are provided in what is known as SVM-light format. Each line contains a single example:

0 1:-0.2970 2:0.2092 5:0.3348 9:0.3892 25:0.7532 78:0.7280

The ﬁrst entry on the line is the label. The label can be an integer (0/1 for binary classiﬁcation) or a real valued number (for regression.) The classiﬁcation label of −1 indicates unlabeled. Subsequent entries on the line are features. The entry 25:0.7532 means that feature 25 has value 0.7532. Features are 1-indexed.

## My Implementation

To use these datasets, you need to write​ a python script to parses a given data ﬁle and returns features and labels. In my implementation​n, the features are stored as a sparse matrix of ﬂoats (and in particular as a scipy.sparse.csr matrix of ﬂoats), which has num examples rows and num features columns. The labels are stored as a dense 1-D array of integers with num examples elements.

Below are my perceptron model.

import numpy as np
class Model(object):

def __init__(self):
self.num_input_features = None

def fit(self, X, y):
""" Fit the model.
Args:
X: A compressed sparse row matrix of floats with shape
[num_examples, num_features].
y: A dense array of ints with shape [num_examples].
"""
raise NotImplementedError()

def predict(self, X):
""" Predict.
Args:
X: A compressed sparse row matrix of floats with shape
[num_examples, num_features].

Returns:
A dense array of ints with shape [num_examples].
"""
raise NotImplementedError()

class Perceptron(Model):

def __init__(self, n, max_iter):
super().__init__()
self.max_iter = max_iter
self.n = n
self.w = []

def fit(self, X, y):
self.ne, self.nf = np.shape(X)
self.w = np.zeros(self.nf)
temp = [-1 if ele ==0 else 1 for ele in y]
X = X.toarray()
for i in range(0, self.max_iter):
for j in range(0, self.ne):
a = np.dot(self.w, X[j])
if np.sign(a) >= 0:
pre_y = 1
else:
pre_y = -1
if pre_y != temp[j]:
self.w += self.n * temp[j] * X[j]

def predict(self, X):
ne, nf = np.shape(X)
if nf  self.nf:
X = X[:, :self.nf]
predicted_Y=np.empty([np.shape(X)], dtype=np.int)
X = X.toarray()
for i in range(np.shape(X)):
y_elem = int(np.sign(np.dot(self.w, X[i])))
if y_elem == -1:
y_elem = 0
predicted_Y[i] = y_elem
return predicted_Y


If this post helped you, please consider supporting me.

Categories: