10. Support Vector Machines (SVM)#
Support Vector Machines are among the best “out-of-the-box” classifiers available. They rely on elegant geometric concepts to find the optimal boundary between classes.
10.1. 1. The Hyperplane Concept#
In 2 Dimensions, a hyperplane is a flat Line.
In 3 Dimensions, a hyperplane is a flat Plane.
In \(p\) Dimensions, it is a flat \((p-1)\)-dimensional subspace.
We want to find a hyperplane that separates our two classes of data (e.g., Blue dots vs Red dots) perfectly.
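Formally, a hyperplane is the set of points \(x = (x_1, \dots, x_p)\) satisfying the linear equation

\[
\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p = 0.
\]

Points where the left-hand side is positive lie on one side, and points where it is negative lie on the other; that sign is exactly what we use to assign a class label.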
10.2. 2. Maximal Margin Classifier#
Usually, there are infinitely many lines that can separate two perfectly separable clusters. Which one is best?
Intuition: We want the “widest street” possible.
We find a separator such that the distance to the nearest training data points is maximized.
This distance is called the Margin.
The classifier is the “center line” of this street.
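In the standard formulation, the maximal margin classifier solves

\[
\max_{\beta_0, \dots, \beta_p,\, M} \; M
\quad \text{subject to} \quad
\sum_{j=1}^{p} \beta_j^2 = 1,
\qquad
y_i \left( \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} \right) \ge M \;\; \text{for all } i,
\]

where \(y_i \in \{-1, +1\}\) is the class label. The constraints force every training point to sit at least a distance \(M\) (the margin) from the hyperplane, on its correct side.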
10.2.1. Support Vectors#
Interestingly, the position of the decision boundary depends only on the few observations that are closest to the line.
These closest points are called Support Vectors.
Points far away from the boundary do not affect the model at all. This makes SVM distinct from Logistic Regression (where all points contribute).
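To see this concretely, a fitted scikit-learn SVC exposes its support vectors directly. Here is a minimal sketch, using a make_blobs toy dataset chosen purely for illustration:

from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Two well-separated clusters: only a handful of points should define the boundary
X_toy, y_toy = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=0)
clf = SVC(kernel='linear', C=1.0).fit(X_toy, y_toy)

print(clf.support_vectors_)  # coordinates of the support vectors
print(clf.n_support_)        # number of support vectors per class

Deleting any point that is not a support vector and refitting leaves the boundary unchanged; moving a support vector can shift the entire boundary.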
10.3. 3. The Kernel Trick (Non-Linearity)#
What if data isn’t separable by a straight line? (e.g., data looks like a circle inside a ring).
Solution: Project the data into a higher dimension.
Imagine 2D data on a sheet of paper. You can’t draw a straight line to separate inner and outer rings.
“Lift” the inner ring up into 3D space.
Now you can slide a flat sheet (hyperplane) between them!
This mathematical projection is handled efficiently using Kernels (a concrete sketch follows the list of kernels below).
Common Kernels:
Linear: Standard straight line.
Polynomial: Curved lines.
Radial Basis Function (RBF): Can create complex, island-like shapes.
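To make the “lifting” picture concrete, here is a minimal sketch (using scikit-learn’s make_circles purely for illustration) that adds a hand-crafted third feature \(z = x_1^2 + x_2^2\); in the lifted space the rings become linearly separable:

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Inner ring vs outer ring: not linearly separable in the original 2D space
X_c, y_c = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Hand-crafted lift: add z = x1^2 + x2^2 (squared distance from the origin)
z = (X_c ** 2).sum(axis=1)
X_lifted = np.c_[X_c, z]

# A plain linear SVM now separates the rings with a flat hyperplane in 3D
clf = SVC(kernel='linear').fit(X_lifted, y_c)
print(clf.score(X_lifted, y_c))  # close to 1.0

In practice, kernels compute inner products in the lifted space without ever building the extra coordinates explicitly, which is why the trick stays cheap even for very high-dimensional liftings.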
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# 1. Generate Non-Linear Data (Moons)
X, y = make_moons(n_samples=400, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Visualize Data
plt.figure(figsize=(8,6))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='k')
plt.title('Data that is NOT linearly separable')
plt.show()
# Helper function to visualize boundaries
def plot_decision_boundary(model, X, y, title="Boundary"):
    h = 0.02  # mesh step size
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the class for every grid point, then reshape back to the grid for contouring
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.figure(figsize=(8, 6))
    plt.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='k')
    plt.title(title)
    plt.show()
10.3.1. Comparing Kernels#
We will try a Linear Kernel (which should struggle on this curved data) and an RBF Kernel (which should succeed).
# 2. Linear Kernel
svm_linear = SVC(kernel='linear', C=1.0)
svm_linear.fit(X_train, y_train)
acc_linear = accuracy_score(y_test, svm_linear.predict(X_test))
plot_decision_boundary(svm_linear, X, y, f"Linear Kernel (Acc: {acc_linear:.2f})")
# 3. RBF Kernel (Radial Basis Function)
svm_rbf = SVC(kernel='rbf', gamma=2, C=1.0)
svm_rbf.fit(X_train, y_train)
acc_rbf = accuracy_score(y_test, svm_rbf.predict(X_test))
plot_decision_boundary(svm_rbf, X, y, f"RBF Kernel (Acc: {acc_rbf:.2f})")
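As a quick check that ties back to Section 10.2.1, each fitted model reports how many training points ended up as support vectors:

# Total number of support vectors used by each model
print("Linear kernel:", svm_linear.n_support_.sum(), "support vectors")
print("RBF kernel:   ", svm_rbf.n_support_.sum(), "support vectors")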
10.4. 4. Hyperparameters#
Two key parameters control SVM behavior:
C (Cost): Controls how strict we are about the margin.
Low C: “Soft margin”. Allows more margin violations in exchange for a wider street. Generally generalizes better (lower variance).
High C: “Hard margin”. Strict: tries to classify every training point correctly, even at the cost of a narrow street. Can overfit (higher variance).
Gamma (for RBF): Defines how far the influence of a single training example reaches.
Low Gamma: Far reach. Smoother decision boundary.
High Gamma: Close reach. Boundary hugs data points tightly (can lead to “islands” around points).
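One way to see both effects, reusing the moons train/test split from above, is a small grid sweep; a large gap between train and test accuracy signals overfitting (this sketch just prints scores rather than plotting):

# Sweep C and gamma: watch the gap between train and test accuracy
for C in [0.01, 1.0, 100.0]:
    for gamma in [0.1, 2.0, 50.0]:
        model = SVC(kernel='rbf', C=C, gamma=gamma).fit(X_train, y_train)
        print(f"C={C:>6}, gamma={gamma:>4}: "
              f"train={model.score(X_train, y_train):.2f}, "
              f"test={model.score(X_test, y_test):.2f}")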
10.5. 5. Quiz#
Q1. Which Kernel is best suited for concentric circle data (one circle inside another)?
A) Linear Kernel
B) RBF or Polynomial Kernel
C) No SVM can handle this.

Q2. What are “Support Vectors”?
A) All data points in the training set.
B) The data points closest to the decision boundary (margin).
C) The misclassified points only.

Q3. If your SVM is overfitting (memorizing the noise), what should you try?
A) Increase C (make it stricter).
B) Decrease C (allow a wider margin/smoother boundary).
C) Use a more complex Kernel.
10.5.1. Sample Answers#
Q1: B). RBF/Polynomial kernels project the data into higher dimensions where the circles become separable by a plane.
Q2: B). They literally “support” (define) the boundary.
Q3: B). Decreasing C increases regularization (wider margin), reducing overfitting.