Getting Started with Support Vector Machine
Almost everyone interested in data analysis and data science goes through a stage where they are astonished by the staggering vastness of the field. The plethora of tools, methods, algorithms and concepts can overwhelm us as we move beyond the typical methods of analysis and prediction. At The Analytics Bay we try to break down new concepts for easier understanding, and in this article we focus on the Support Vector Machine, commonly known as SVM.
A popular analogy for the difference between SVM and regression-based models is this: regression is a sword, swung bluntly, and works on large but simpler data; SVM is a carefully crafted knife that offers a precision in predictions other algorithms cannot match. SVM is built to work on smaller but highly complex data, and is used to create stronger, more efficient models.
So, what is SVM?
Support Vector Machine, or SVM, is a supervised ML algorithm useful for both regression and classification problems. However, it is primarily used for classification. In SVM, each data point is plotted in n-dimensional space (n being the number of features in the data), with the coordinates of the point being the values of its features. Classification is performed by constructing a hyperplane that differentiates between the classes.
Now, multiple planes could be created to classify the data points. The most efficient plane is the one with the maximum margin, that is, the maximum distance from the nearest data points of each class. This is called the optimal hyperplane.
But what about datasets with more than two features? A hyperplane is simply the boundary that separates the classes. If the input has two features, the hyperplane is a line; if it has three, the hyperplane is a two-dimensional plane. As the number of features increases, the hyperplane becomes more complex and harder to visualize.
If we oversimplify things, constructing a linear hyperplane between two classes is quite easy. However, this leads to another pressing question: do we need to provide such transformed features manually to create the hyperplane? No, the SVM algorithm uses a brilliant method called the kernel trick. An SVM kernel is a function that takes a low-dimensional input space and transforms it into a higher-dimensional space, i.e. it converts a non-separable problem into a separable one. Kernels are mostly used to solve non-linear separation problems. In simpler terms, we leave it to the SVM model and its kernels to perform the complex data transformations and work out how to separate the classes using the inputs and labels defined by the user.
Consider, for example, data points arranged in concentric circles: no straight line separates them, so SVM effectively uses a circular boundary. It does this by deriving an additional feature from the existing coordinates using the formula z = x^2 + y^2. Plotted against the x and z axes, the two classes become separable by a straight line.
Therefore, we understand that SVM is not a one-trick pony, but can adapt itself to multiple complex forms of data for better classification.
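The circular example above can be sketched in a few lines of code. The data here is made up for illustration: two classes lying on concentric circles cannot be separated by a line in (x, y), but the added feature z = x^2 + y^2 (the squared radius) makes a single threshold enough to split them.

```python
import numpy as np

rng = np.random.default_rng(0)

# two classes: an inner circle (radius 1) and an outer circle (radius 3)
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.concatenate([np.full(100, 1.0), np.full(100, 3.0)])
x = radii * np.cos(angles)
y = radii * np.sin(angles)
labels = np.concatenate([np.zeros(100), np.ones(100)])

# the new feature z = x^2 + y^2 is just the squared radius,
# so a single threshold on z separates the classes perfectly
z = x**2 + y**2
predictions = (z > 4.0).astype(float)  # threshold between 1^2 and 9

print((predictions == labels).all())  # prints True
```

No line in the (x, y) plane achieves this, which is exactly the kind of transformation the kernel trick performs implicitly.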
Merits and Demerits of using SVM
Merits:
1) SVM works well for data that has a clear margin of separation.
2) It works well with multi-dimensional data.
3) It uses only a small subset of the data (the support vectors) to determine the decision function, and is therefore memory efficient.
Demerits:
1) Training can be time consuming, so SVM is not well suited to very large datasets.
2) Its efficiency decreases when data points overlap, i.e. when the data is noisy.
3) It does not give probability estimates directly; these have to be calculated using a separate, rather expensive method.
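On that last point: in scikit-learn, for instance, probability estimates are enabled with the probability=True flag, which fits them via an additional internal cross-validation step, which is why it is comparatively expensive. A minimal sketch:

```python
from sklearn import svm, datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

# probability=True triggers extra internal cross-validation
# (Platt scaling), making training noticeably more expensive
clf = svm.SVC(kernel='linear', probability=True).fit(X, y)

proba = clf.predict_proba(X[:1])
print(proba.shape)  # prints (1, 3): one row, one column per class
print(proba.sum())  # the class probabilities sum to 1
```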
SVM in Python
Here, we’ll look at how to implement SVM in Python on the Iris dataset using the scikit-learn package.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features; we could
                      # avoid this ugly slicing by using a two-dim dataset
y = iris.target

# we create an instance of SVM and fit our data; we do not scale the
# data since we want to plot the support vectors
C = 1.0  # SVM regularization parameter
svc = svm.SVC(kernel='linear', C=C).fit(X, y)
Plotting the result gives a figure in which a linear hyperplane separates the data points. We can also use the 'rbf' or 'poly' kernels for non-linear hyperplanes, although there is then a risk of overfitting.
svc = svm.SVC(kernel='rbf', C=C, gamma='auto').fit(X, y)
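The original figures are not reproduced here, but a common way to draw such a decision boundary over the two chosen features (a sketch, not necessarily the article's exact plotting code) is to evaluate the classifier on a grid:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere
import matplotlib.pyplot as plt
from sklearn import svm, datasets

iris = datasets.load_iris()
X = iris.data[:, :2]  # first two features, as above
y = iris.target

svc = svm.SVC(kernel='rbf', C=1.0, gamma='auto').fit(X, y)

# evaluate the classifier on a grid covering the feature space
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
    np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200),
)
Z = svc.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3)                  # decision regions
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')  # training points
plt.xlabel('sepal length (cm)')
plt.ylabel('sepal width (cm)')
plt.savefig('svm_decision_boundary.png')
```

Swapping 'rbf' for 'linear' or 'poly' in the same script shows how each kernel shapes the boundary.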
The SVM library has multiple parameters that determine the construction of the planes, and many of them can be altered according to the dataset. Do give this a try and play around with them to better understand Support Vector Machines.
Thus, we can conclude that Support Vector Machine is an amazing tool for a data scientist. In this article, we understood the conceptual working of the SVM and looked at how it segregates and classifies data points.