Implementing the Perceptron algorithm from scratch with Python


The perceptron is a fundamental algorithm for binary classification in machine learning. It is a mistake-driven online learning algorithm.

It is often said that the perceptron is modeled after neurons in the brain. It has m input values (which correspond to the features of the examples in the training set) and one output value. Each input value x_i is multiplied by a weight factor w_i. If the sum of the products of feature values and weight factors is larger than zero, the perceptron is activated and ‘fires’ a signal (+1); otherwise it is not activated.


It takes as input a vector of real-valued inputs x and makes a prediction \bar{y} \in \big\{-1, +1\big\}. This simplifies computations internally, but we again emphasize that your model’s predict method must return labels that are 0 or 1. Predictions are made using a linear classifier: \bar{y} = sign(w \cdot x), where w \cdot x is the dot product of w and x, computed as \sum_i x_i w_i. Updates to w are made only when a prediction is incorrect: \bar{y} \neq y. The new weight vector w^{'} is a function of the current weight vector w and the example (x, y), and is chosen so as to improve the prediction on the current example. Note that the perceptron naturally handles continuous and binary features, so no special preprocessing is needed.

The basic structure of the algorithm is:

  1. Initialize w to 0, set learning rate \eta and number of iterations I
  2. For each training iteration k = 1 . . . I:

(a) For each example i = 1 . . . N:

i. Receive an example x_i

ii. Predict the label \bar{y}_i = sign(w \cdot x_i ) = 1 if w \cdot x_i \geqslant 0 ; -1 otherwise.

iii. If \bar{y}_i \neq y_i make an update to w: w^{'} = w + \eta y_i x_i

Note that there is no bias term in this version and you should not include one in your solution. Also observe the definition of “sign” above: it treats 0 as positive, so the perceptron predicts +1 when w \cdot x = 0. Once again, while sign returns −1 and 1, the outputs from your predict method must be the actual labels, which are 0 or 1.
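As a concrete illustration of steps (ii) and (iii), here is a single update on a made-up two-feature example (the numbers and \eta = 1 are hypothetical, not taken from the datasets below):

```python
import numpy as np

w = np.zeros(2)                          # step 1: initialize w to 0
x, y = np.array([1.0, -2.0]), -1         # one training example with label -1
eta = 1.0                                # learning rate

# step ii: predict with sign(w . x), where sign(0) counts as +1
y_hat = 1 if np.dot(w, x) >= 0 else -1   # w . x = 0, so y_hat = +1

# step iii: the prediction is wrong, so apply w' = w + eta * y * x
if y_hat != y:
    w = w + eta * y * x

print(w)                                 # [-1.  2.]
```

After the update, w \cdot x = (−1)(1) + (2)(−2) = −5 < 0, so the perceptron now correctly predicts −1 on this example.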


Several real-world binary classification datasets, taken from a range of applications, are used in this example. Each dataset is in the same format (described below) and contains a train, development, and test file. Here are some descriptions of these datasets:


Biological research produces large amounts of data to analyze. Applications of machine learning to biology include finding regions of DNA that encode for proteins, classification of gene expression data, and inferring regulatory networks from mRNA and proteomic data.

Our biology task of characterizing gene splice junction sequences comes from molecular biology, a field interested in the relationships of DNA, RNA, and proteins. Splice junctions are points on a sequence at which “superfluous” RNA is removed before the process of protein creation in higher organisms. Exons are nucleotide sequences that are retained after splicing, while introns are spliced out. The goal of this prediction task is to recognize DNA sequences that contain boundaries between exons and introns. Sequences contain exon/intron (EI) boundaries, intron/exon (IE) boundaries, or do not contain a splice junction at all.

For a binary task, you will classify sequences as either EI boundaries (label 1) or non-splice sequences (label 0). Each learning instance contains a 60 base pair sequence (e.g. ACGT), with some ambiguous slots. Features encode which base pair occurs at each position of the sequence.


Finance is a data-rich field that employs numerous statistical methods for modeling and prediction, including the modeling of financial systems and portfolios. Our financial task is to predict which Australian credit card applications should be accepted (label 1) or rejected (label 0). Each example represents a credit card application, where all values and attributes have been anonymized for confidentiality. Features are a mix of continuous and discrete attributes, and discrete attributes have been binarized.


Natural language processing studies the processing and understanding of human languages. Machine learning is widely used in NLP tasks, including document understanding, information extraction, machine translation and document classification.

Our NLP task is sentiment classification. Each example is a product review taken from Amazon kitchen appliance reviews. The review is either positive (label 1) or negative (label 0) towards the product. Reviews are represented as unigrams and bigrams; each one- and two-word phrase is extracted as a feature.


Statistical speech processing has its roots in the 1980s and has been the focus of machine learning research for decades. The area deals with all aspects of processing speech signals, including speech transcription, speaker identification and speech information retrieval.

Our speech task is spoken letter identification. Each example comes from a speaker saying one of the twenty-six letters of the English alphabet. Our goal is to predict which letter was spoken. The data was collected by asking 150 subjects to speak each letter of the alphabet twice.

Each spoken utterance is represented as a collection of 617 real-valued attributes scaled to be between -1.0 and 1.0. Features include spectral coefficients, contour features, sonorant features, pre-sonorant features, and post-sonorant features. The binary task is to distinguish between the letter M (label 0) and N (label 1).


Computer vision processes and analyzes images and videos, and is one of the fundamental areas of robotics. Machine learning applications include identifying objects in images, segmenting video and understanding scenes in film.

Our vision task is image segmentation. In image segmentation, an image is divided into segments that are labeled according to content. The images in our data have been divided into 3×3 regions. Each example is a region, and features include the centroids of parts of the image, pixels in a region, contrast, intensity, color, saturation and hue. The goal is to identify the primary element in the image as either a brickface, sky, foliage, cement, window, path or grass. In the binary task, you will distinguish segments of foliage (label 0) from grass (label 1).

Synthetic Data

When developing algorithms, it is often helpful to consider data with known properties, so we typically create synthetic data for this purpose. To help you test your algorithms, we are providing two synthetic datasets for development.


The easy data is labeled using a trivial classification function. Any reasonable learning algorithm should achieve near-flawless accuracy. Each example is a 10-dimensional instance drawn from a multivariate Gaussian distribution with zero mean and identity covariance matrix. Each example is labeled according to the presence of one of 6 features; the remaining features are noise.


Examples in this data are randomly labeled. Since there is no pattern, no learning algorithm should achieve accuracy significantly different from random guessing (50%). The data is generated in the same manner as the easy data, except there are 94 noisy features.

Data Format

The data are provided in what is known as SVM-light format. Each line contains a single example:

0 1:-0.2970 2:0.2092 5:0.3348 9:0.3892 25:0.7532 78:0.7280

The first entry on the line is the label. The label can be an integer (0/1 for binary classification) or a real-valued number (for regression). A classification label of −1 indicates an unlabeled example. Subsequent entries on the line are features: the entry 25:0.7532 means that feature 25 has value 0.7532. Features are 1-indexed.

Download My Dataset

My Implementation

To use these datasets, you need to write a Python script that parses a given data file and returns features and labels. In my implementation, the features are stored as a sparse matrix of floats (specifically, a scipy.sparse.csr_matrix), which has num_examples rows and num_features columns. The labels are stored as a dense 1-D array of integers with num_examples elements.
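A minimal parser along these lines might look like the following sketch. The function name and the choice to infer num_features from the largest index seen are my own assumptions; scikit-learn's `load_svmlight_file` offers a ready-made alternative.

```python
import numpy as np
from scipy.sparse import csr_matrix

def parse_svmlight_lines(lines):
    """Parse SVM-light lines into (csr_matrix of floats, 1-D int label array)."""
    data, indices, indptr, labels = [], [], [0], []
    for line in lines:
        parts = line.split()
        labels.append(int(parts[0]))        # first entry on the line is the label
        for entry in parts[1:]:             # remaining entries are index:value pairs
            idx, val = entry.split(':')
            indices.append(int(idx) - 1)    # features are 1-indexed in the file
            data.append(float(val))
        indptr.append(len(data))
    num_features = max(indices) + 1         # assumes at least one feature is present
    X = csr_matrix((data, indices, indptr), shape=(len(labels), num_features))
    return X, np.array(labels, dtype=int)

example = ["0 1:-0.2970 2:0.2092 5:0.3348 9:0.3892 25:0.7532 78:0.7280"]
X, y = parse_svmlight_lines(example)
print(X.shape, y)                           # (1, 78) [0]
```

In practice you would read the lines from the train, development, or test file instead of a literal list.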

Below is my perceptron model.

import numpy as np


class Model(object):

    def __init__(self):
        self.num_input_features = None

    def fit(self, X, y):
        """ Fit the model.

        Args:
            X: A compressed sparse row matrix of floats with shape
                [num_examples, num_features].
            y: A dense array of ints with shape [num_examples].
        """
        raise NotImplementedError()

    def predict(self, X):
        """ Predict.

        Args:
            X: A compressed sparse row matrix of floats with shape
                [num_examples, num_features].

        Returns:
            A dense array of ints with shape [num_examples].
        """
        raise NotImplementedError()

class Perceptron(Model):

    def __init__(self, n, max_iter):
        super().__init__()
        self.n = n                  # learning rate eta
        self.max_iter = max_iter    # number of training iterations
        self.w = []

    def fit(self, X, y):
        num_examples, self.num_input_features = np.shape(X)
        self.w = np.zeros(self.num_input_features)
        # map the {0, 1} labels to {-1, +1} for the update rule
        temp = [-1 if ele == 0 else 1 for ele in y]
        X = X.toarray()
        for i in range(self.max_iter):
            for j in range(num_examples):
                a = np.dot(self.w, X[j])
                # sign(a), with sign(0) defined as +1
                pre_y = 1 if a >= 0 else -1
                # update only when the prediction is wrong
                if pre_y != temp[j]:
                    self.w += self.n * temp[j] * X[j]

    def predict(self, X):
        ne, nf = np.shape(X)
        # the test set may not have the same number of features as the
        # training set; align the dimensions before predicting
        k = min(nf, self.num_input_features)
        X = X[:, :k].toarray()
        w = self.w[:k]
        predicted_y = np.zeros(ne, dtype=int)
        for i in range(ne):
            # sign(w . x) with sign(0) = +1, then map -1 back to label 0
            y_elem = 1 if np.dot(w, X[i]) >= 0 else -1
            if y_elem == -1:
                y_elem = 0
            predicted_y[i] = y_elem
        return predicted_y
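To sanity-check the logic, here is a condensed, self-contained version of the same fit/predict loop run on a tiny linearly separable dataset (the data and hyperparameters are made up for illustration):

```python
import numpy as np

def fit(X, y, eta=1.0, max_iter=10):
    # same update rule as the class above, on dense arrays
    w = np.zeros(X.shape[1])
    t = np.where(y == 0, -1, 1)              # map {0, 1} labels to {-1, +1}
    for _ in range(max_iter):
        for j in range(X.shape[0]):
            pred = 1 if np.dot(w, X[j]) >= 0 else -1
            if pred != t[j]:                 # update only on a mistake
                w += eta * t[j] * X[j]
    return w

def predict(w, X):
    # sign(w . x) with sign(0) = +1, mapped back to {0, 1}
    return np.array([1 if np.dot(w, x) >= 0 else 0 for x in X])

# toy separable data: label is 1 exactly when the first feature is positive
X = np.array([[1.0, 0.5], [2.0, -1.0], [-1.0, 0.3], [-2.0, -0.5]])
y = np.array([1, 1, 0, 0])
w = fit(X, y)
print(predict(w, X))  # [1 1 0 0]
```

Because the toy data is linearly separable, the perceptron convergence guarantee applies and the model reproduces the training labels exactly.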

If this post helped you, please consider supporting me.
