Introduction to ML Part 1#

This tutorial is designed to provide a bird’s eye view of the ML packages landscape. The goal is not to give an in-depth explanation of all the features of each package, but rather to demonstrate the purpose of a few widely used ML packages. For more details, we refer the reader to the packages’ documentation and other online tutorials.

You can go through the Jupyter, Numpy and Matplotlib sections before the course starts, and then start studying the next sections after you have completed unit 1.

https://github.com/varal7/ml-tutorial

Jupyter#

Jupyter is not strictly speaking an ML package. It provides a browser front-end connected to an instance of IPython, which gives you a REPL for quick testing and lets you create documents that intertwine code, output, images, and text. This is great for prototyping, demonstrations and tutorials, but terrible for actual coding.

6*7
42
def tokenize(text):
    return text.split(" ")
text = "In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley"
print(tokenize(text))
['In', 'a', 'shocking', 'finding,', 'scientist', 'discovered', 'a', 'herd', 'of', 'unicorns', 'living', 'in', 'a', 'remote,', 'previously', 'unexplored', 'valley']

Numpy#

import numpy as np

Numpy is designed to handle large multidimensional arrays and enable efficient computations with them. Behind the scenes, it runs pre-compiled C code, which is much faster than, say, a Python for loop.
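To make the speed claim concrete, here is a minimal sketch that sums the same numbers once with a plain Python for loop and once with Numpy. The array size and the helper name `python_loop_sum` are illustrative choices, and the measured timings will vary from machine to machine.

```python
import timeit

import numpy as np

# 200,000 floats: 0.0, 1.0, ..., 199999.0
values = np.arange(200_000, dtype=np.float64)


def python_loop_sum(arr):
    """Sum the elements one by one in the Python interpreter."""
    total = 0.0
    for v in arr:
        total += v
    return total


loop_result = python_loop_sum(values)
numpy_result = values.sum()  # the loop runs inside compiled code

# Time three repetitions of each approach
loop_time = timeit.timeit(lambda: python_loop_sum(values), number=3)
numpy_time = timeit.timeit(lambda: values.sum(), number=3)

print(f"Same result: {loop_result == numpy_result}")
print(f"Python loop: {loop_time:.4f}s, Numpy sum: {numpy_time:.4f}s")
```

Both approaches compute the same value; only the place where the loop runs differs.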

In the Numpy tutorial, we covered the basics of Numpy: numpy arrays, element-wise operations, matrix operations and generating random matrices. In this section, we’ll cover indexing, slicing and broadcasting, which are useful concepts that will be reused in Pandas and PyTorch.

Indexing and slicing#

Numpy arrays can be indexed and sliced like regular Python lists:

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# The element at index 2 is 3 (indices start at 0)
print(a[2])

# a[start:stop] starts at index `start` and stops just before index `stop`
print(a[2:4])

# The third parameter is the step: start at index 2 (value 3), then step by 2 to 5, then 7
print(a[2:7:2])

# A negative index counts from the end: -1 is the last element.
# Since the stop is exclusive, this slice ends at the second-to-last element
print(a[2:-1])

# A negative step walks backwards, so the start index must be larger than the stop index
print(a[6:1:-2])
3
[3, 4]
[3, 5, 7]
[3, 4, 5, 6, 7, 8]
[7, 5, 3]
a_py = [1, 2, 3, 4, 5, 6, 7, 8, 9]
a_np = np.array(a_py)
print(a_py[3:7:2], a_np[3:7:2])
print(a_py[2:-1:2], a_np[2:-1:2])
print(a_py[::-1], a_np[::-1])
[4, 6] [4 6]
[3, 5, 7] [3 5 7]
[9, 8, 7, 6, 5, 4, 3, 2, 1] [9 8 7 6 5 4 3 2 1]
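The same slicing syntax extends to several dimensions, with one index or slice per axis separated by commas. A small sketch, where the 3x4 array `m` is only an illustration:

```python
import numpy as np

# A 3x4 matrix:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
m = np.arange(12).reshape(3, 4)

single = m[1, 2]          # row 1, column 2 -> 6
column = m[:, 1]          # every row, column 1 -> [1 5 9]
block = m[0:2, ::2]       # rows 0 and 1, every other column -> [[0 2] [4 6]]
last_col_rev = m[::-1, -1]  # last column, rows in reverse order -> [11 7 3]

print(single)
print(column)
print(block)
print(last_col_rev)
```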

But you can also use arrays to index other arrays:

idx = np.array([7,2])
a_np[idx]
array([8, 3])
# a_py[idx] would raise a TypeError: a Python list cannot be indexed with an array

This allows convenient querying, reindexing and even sorting:

ages = np.random.randint(low=30, high=60, size=10)
heights = np.random.randint(low=150, high=210, size=10)

print(ages)
print(heights)
[35 57 44 56 57 37 48 32 45 54]
[168 204 188 204 177 186 160 207 186 182]
print(ages < 50)
[ True False  True False False  True  True  True  True False]
print(heights[ages < 50])
print(ages[ages < 50])
[168 188 186 160 207 186]
[35 44 37 48 32 45]
shuffled_idx = np.random.permutation(10)
print(shuffled_idx)
print(ages[shuffled_idx])
print(heights[shuffled_idx])
[6 1 8 2 7 5 4 3 0 9]
[48 57 45 44 32 37 57 56 35 54]
[160 204 186 188 207 186 177 204 168 182]
sorted_idx = np.argsort(ages)
print(sorted_idx)
print(ages[sorted_idx])
print(heights[sorted_idx])
[7 0 5 2 8 6 9 3 1 4]
[32 35 37 44 45 48 54 56 57 57]
[207 168 186 188 186 160 182 204 204 177]
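Boolean masks can also be combined. The sketch below reuses the `ages` and `heights` values printed above (hard-coded here so that the result is reproducible) and assumes we want people who are both under 50 and taller than 180.

```python
import numpy as np

# Values copied from the printed output above
ages = np.array([35, 57, 44, 56, 57, 37, 48, 32, 45, 54])
heights = np.array([168, 204, 188, 204, 177, 186, 160, 207, 186, 182])

# Combine element-wise conditions with & (and), | (or), ~ (not).
# The parentheses are required because & binds tighter than < and >.
mask = (ages < 50) & (heights > 180)

selected_idx = np.where(mask)[0]   # positions where the mask is True
print(selected_idx)                # [2 5 7 8]
print(ages[mask])                  # [44 37 32 45]
print(heights[mask])               # [188 186 207 186]

# argmax returns an index, which can be reused on the other array
age_of_tallest = ages[np.argmax(heights)]
print(age_of_tallest)              # 32
```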

Broadcasting#

When Numpy is asked to perform an operation between arrays of different shapes, it “broadcasts” the smaller one to match the shape of the bigger one, provided the two shapes are compatible.

a = np.array([4, 5, 6])
b = np.array([2, 2, 2])
a * b
array([ 8, 10, 12])
a = np.array([4, 5, 6])
b = 2
a * b
array([ 8, 10, 12])

The two snippets of code above are equivalent but the second is easier to read and also more efficient.

a = np.arange(10).reshape(1,10)
b = np.arange(12).reshape(12,1)
print(a)
print(b)
[[0 1 2 3 4 5 6 7 8 9]]
[[ 0]
 [ 1]
 [ 2]
 [ 3]
 [ 4]
 [ 5]
 [ 6]
 [ 7]
 [ 8]
 [ 9]
 [10]
 [11]]
print(a * b)
[[ 0  0  0  0  0  0  0  0  0  0]
 [ 0  1  2  3  4  5  6  7  8  9]
 [ 0  2  4  6  8 10 12 14 16 18]
 [ 0  3  6  9 12 15 18 21 24 27]
 [ 0  4  8 12 16 20 24 28 32 36]
 [ 0  5 10 15 20 25 30 35 40 45]
 [ 0  6 12 18 24 30 36 42 48 54]
 [ 0  7 14 21 28 35 42 49 56 63]
 [ 0  8 16 24 32 40 48 56 64 72]
 [ 0  9 18 27 36 45 54 63 72 81]
 [ 0 10 20 30 40 50 60 70 80 90]
 [ 0 11 22 33 44 55 66 77 88 99]]
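Shapes are compared axis by axis starting from the last one, and two axes are compatible when they are equal or when one of them is 1. The sketch below checks this with `np.broadcast_shapes` (available in NumPy 1.20 and later) and shows a common use, subtracting per-column means; the `data` array is made up for illustration.

```python
import numpy as np

# The (1, 10) and (12, 1) arrays from above broadcast to (12, 10)
print(np.broadcast_shapes((1, 10), (12, 1)))

# A 1-D array of length 3 is treated as shape (1, 3) against a (2, 3) array
print(np.broadcast_shapes((3,), (2, 3)))

# Typical use: subtract the mean of each column from every row
data = np.array([[1.0, 10.0, 100.0],
                 [3.0, 30.0, 300.0]])
col_means = data.mean(axis=0)   # shape (3,)
centered = data - col_means     # (2, 3) - (3,) -> (2, 3)
print(centered)

# Incompatible shapes raise a ValueError instead of guessing
incompatible = False
try:
    np.ones((2, 3)) + np.ones((2,))   # last axes are 3 and 2
except ValueError:
    incompatible = True
print("Incompatible shapes detected:", incompatible)
```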

Matplotlib#

%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [10, 7]

Matplotlib is the go-to library to produce plots with Python. It comes with two APIs: a MATLAB-like one that a lot of people have learned to use and love, and an object-oriented one that we recommend using.

x = np.linspace(-2*np.pi, 2*np.pi, 400)
y = np.tanh(x)
fig, ax = plt.subplots()
ax.plot(x, y)
[<matplotlib.lines.Line2D at 0x11ba23490>]
../../_images/c8397da2531e45dd614e2a57bbef7874cecd62a2344c0643611728951ff0e3da.png

You can plot multiple subplots in the same figure, or multiple functions in the same subplot:

x = np.linspace(0, 2*np.pi, 400)
y1 = np.tanh(x)
y2 = np.cos(x**2)
fig, axes = plt.subplots(1, 2, sharey=True)
axes[1].plot(x, y1)
axes[1].plot(x, -y1)
axes[0].plot(x, y2)
[<matplotlib.lines.Line2D at 0x11c515290>]
../../_images/a11fb44de6755276041fe291be0b031685f01cd0417ad473cbd2bf715056d2bd.png

Matplotlib also comes with a lot of different options to customize the colors, the labels, the axes, etc.

For instance, see this introduction to matplotlib
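As a small taste of these options, here is a sketch using the object-oriented API to set colors, line styles, labels, a title, axis limits and a legend. The specific colors and label strings are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 400)

fig, ax = plt.subplots()
# color, linestyle and label are set per line
ax.plot(x, np.sin(x), color="tab:blue", linestyle="-", label="sin(x)")
ax.plot(x, np.cos(x), color="tab:orange", linestyle="--", label="cos(x)")

# Axis labels, title, limits, grid and legend are set on the Axes object
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("Sine and cosine")
ax.set_xlim(0, 2 * np.pi)
ax.grid(True)
ax.legend(loc="lower left")
```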

Scikit-learn (read this after you have completed unit 1)#

Scikit-learn includes a number of features and utilities to kickstart your journey in Machine Learning.

A toy example#

from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=1000, centers=2, random_state=0)
X[:5], y[:5]
(array([[0.4666179 , 3.86571303],
        [2.84382807, 3.32650945],
        [0.61121486, 2.51245978],
        [3.81653365, 1.65175932],
        [1.28097244, 0.62827388]]),
 array([0, 0, 0, 1, 1]))
fig, ax = plt.subplots()
for label in [0, 1]:
    mask = (y == label)
    ax.scatter(X[mask, 0], X[mask, 1])
../../_images/687452f269159bad1972db74a63fa68c2ed56d7764b589755f910a3dd621ca18.png
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
fig, ax = plt.subplots()
for label in [0, 1]:
    mask = (y_train == label)
    ax.scatter(X_train[mask, 0], X_train[mask, 1])
for label in [0, 1]:
    mask = (y_test == label)
    ax.scatter(X_test[mask, 0], X_test[mask, 1])
../../_images/b76513830c55bfbe07feabe230bd114cdf773c232f0db386a957564495277734.png

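Sklearn uses a uniform and very consistent API, making it easy to switch algorithms.

For instance, here is how to train a perceptron and predict with it. Uncommenting the LinearSVC line swaps the algorithm without touching the rest of the code.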

from sklearn.linear_model import Perceptron
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score
clf = Perceptron(max_iter=40, random_state=0)
# clf = LinearSVC(max_iter=40, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print('Test accuracy: %.4f' % accuracy_score(y_test, y_pred))
Test accuracy: 0.9400
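As a sketch of what the uniform API makes possible, the loop below trains two classifiers with exactly the same `fit` and `predict` calls. The variable names ending in `_demo`, the fixed `random_state` of the split, and the explicit `dual=False` (set only to avoid the `dual` FutureWarning in some versions) are choices made for this illustration, so the accuracies need not match the number printed above.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X_demo, y_demo = make_blobs(n_samples=1000, centers=2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# Every estimator exposes the same fit / predict interface
classifiers = {
    "Perceptron": Perceptron(max_iter=40, random_state=0),
    "LinearSVC": LinearSVC(dual=False, random_state=0),
}

accuracies = {}
for name, model in classifiers.items():
    model.fit(X_tr, y_tr)
    predictions = model.predict(X_te)
    accuracies[name] = accuracy_score(y_te, predictions)
    print("%s test accuracy: %.4f" % (name, accuracies[name]))
```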
theta = clf.coef_[0]
theta_0 = clf.intercept_
fig, ax = plt.subplots()
for label in [0, 1]:
    mask = (y_train == label)
    ax.scatter(X_train[mask, 0], X_train[mask, 1])
for label in [0, 1]:
    mask = (y_test == label)
    ax.scatter(X_test[mask, 0], X_test[mask, 1])
x_bnd = np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1,  400)
y_bnd = - x_bnd * (theta[0] /theta[1]) - (theta_0 / theta[1])
ax.plot(x_bnd, y_bnd)
[<matplotlib.lines.Line2D at 0x13c6cd550>]
../../_images/b8b0a65a398fb5bc4e827393758d5dec166b11e44ff97ebdf5670de368b6eed7.png
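The line plotted above comes from solving theta[0] * x1 + theta[1] * x2 + theta_0 = 0 for x2. The following sketch checks that formula numerically: points computed this way should get a decision score of (almost exactly) zero. It refits a perceptron on the full toy dataset, and the names `w`, `b` and `model` are placeholders for `theta`, `theta_0` and `clf`. It also assumes the second weight is non-zero, which is needed to divide by it.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import Perceptron

X_demo, y_demo = make_blobs(n_samples=1000, centers=2, random_state=0)
model = Perceptron(max_iter=40, random_state=0).fit(X_demo, y_demo)

w = model.coef_[0]        # weight vector, plays the role of theta
b = model.intercept_[0]   # scalar offset, plays the role of theta_0

# Solve w[0] * x1 + w[1] * x2 + b = 0 for x2
x1 = np.linspace(X_demo[:, 0].min(), X_demo[:, 0].max(), 5)
x2 = -(w[0] / w[1]) * x1 - b / w[1]

# decision_function computes w . x + b, which should vanish on the boundary
boundary_points = np.column_stack([x1, x2])
scores = model.decision_function(boundary_points)
print(np.allclose(scores, 0.0, atol=1e-6))
```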

Another toy example#

X, y = make_blobs(n_samples=500, centers=3, random_state=7)
y[y==2] = 0
fig, ax = plt.subplots()
for label in [0, 1]:
    mask = (y == label)
    ax.scatter(X[mask, 0], X[mask, 1])
../../_images/0d1c033d51fbaf114de6d0af335c304c48c34ed4de3eecad71a6b0700265660d.png
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
from sklearn.svm import SVC
# clf = SVC(kernel="linear", random_state=0)
clf = SVC(kernel="rbf", random_state=0)
clf.fit(X_train, y_train)
SVC(random_state=0)
y_pred = clf.predict(X_test)

print('Test accuracy: %.4f' % accuracy_score(y_test, y_pred))
Test accuracy: 1.0000
x_min = X[:, 0].min()
x_max = X[:, 0].max()
y_min = X[:, 1].min()
y_max = X[:, 1].max()

XX, YY = np.mgrid[x_min:x_max:200j, y_min:y_max:200j]
Z = clf.decision_function(np.c_[XX.ravel(), YY.ravel()])

fig, ax = plt.subplots()
for label in [0, 1]:
    mask = (y == label)
    ax.scatter(X[mask, 0], X[mask, 1])
    
Z = Z.reshape(XX.shape)
ax.contour(XX, YY, Z, colors="black",
    linestyles=['--', '-', '--'], levels=[-.5, 0, .5])
<matplotlib.contour.QuadContourSet at 0x13ce4b590>
../../_images/e0c8901f0fcd1636effde86c8c9e6f29d1e255d585982175c61e62a3c3e162de.png
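The contour at level 0 in the plot above is the decision boundary: for a binary classifier, the sign of `decision_function` determines the predicted class. A minimal sketch of that relationship, refitting the same kind of model on the full dataset (the `_demo` names are placeholders):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X_demo, y_demo = make_blobs(n_samples=500, centers=3, random_state=7)
y_demo[y_demo == 2] = 0   # merge two blobs into class 0, as above

model = SVC(kernel="rbf", random_state=0).fit(X_demo, y_demo)

scores = model.decision_function(X_demo)   # one signed score per sample
predictions = model.predict(X_demo)

# Positive score -> class 1, negative score -> class 0
sign_matches = np.array_equal(predictions, (scores > 0).astype(int))
print("Sign of the score matches the prediction:", sign_matches)

# Samples lying between the two dashed contours drawn at -0.5 and 0.5
n_between = int(np.sum(np.abs(scores) < 0.5))
print("Samples with |score| < 0.5:", n_between)
```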

Classify digits#

# from sklearn.datasets import load_breast_cancer
# breast_cancer = load_breast_cancer()
# X, y = breast_cancer.data, breast_cancer.target
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
from sklearn.datasets import load_digits
digits = load_digits()
X, y = digits.data, digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
fig, ax = plt.subplots()
ax.matshow(digits.images[0])
<matplotlib.image.AxesImage at 0x13d045d90>
../../_images/6515d529014144847c26b9b233472a6ad4932eaeb5b9dba843b4daf8ae3c1883.png
X_train.shape
(1437, 64)
clf = Perceptron(max_iter=40, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print('Accuracy: %.4f' % accuracy_score(y_test, y_pred))
Accuracy: 0.9389
clf = LinearSVC(C=1, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print('Accuracy: %.4f' % accuracy_score(y_test, y_pred))
Accuracy: 0.9333
/Users/n03an/.pyenv/versions/3.11.5/lib/python3.11/site-packages/sklearn/svm/_classes.py:32: FutureWarning: The default value of `dual` will change from `True` to `'auto'` in 1.5. Set the value of `dual` explicitly to suppress the warning.
  warnings.warn(
/Users/n03an/.pyenv/versions/3.11.5/lib/python3.11/site-packages/sklearn/svm/_base.py:1250: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, clf.predict(X_test))
array([[27,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0, 32,  0,  0,  0,  0,  1,  0,  1,  1],
       [ 0,  1, 33,  2,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0, 29,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0, 30,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0, 38,  1,  0,  0,  1],
       [ 0,  1,  0,  0,  0,  0, 43,  0,  0,  0],
       [ 0,  1,  0,  0,  1,  0,  0, 37,  0,  0],
       [ 0,  3,  1,  1,  0,  0,  2,  0, 29,  3],
       [ 0,  0,  0,  2,  0,  1,  0,  0,  0, 38]])
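In scikit-learn's convention, row i of the confusion matrix corresponds to true label i and column j to predicted label j. The sketch below copies the matrix printed above and derives the overall accuracy, the per-digit recall, and the most frequent confusion from it.

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true, columns: predicted)
cm = np.array([
    [27,  0,  0,  0,  0,  0,  0,  0,  0,  0],
    [ 0, 32,  0,  0,  0,  0,  1,  0,  1,  1],
    [ 0,  1, 33,  2,  0,  0,  0,  0,  0,  0],
    [ 0,  0,  0, 29,  0,  0,  0,  0,  0,  0],
    [ 0,  0,  0,  0, 30,  0,  0,  0,  0,  0],
    [ 0,  0,  0,  0,  0, 38,  1,  0,  0,  1],
    [ 0,  1,  0,  0,  0,  0, 43,  0,  0,  0],
    [ 0,  1,  0,  0,  1,  0,  0, 37,  0,  0],
    [ 0,  3,  1,  1,  0,  0,  2,  0, 29,  3],
    [ 0,  0,  0,  2,  0,  1,  0,  0,  0, 38],
])

# Correct predictions sit on the diagonal
accuracy = cm.trace() / cm.sum()
print("Accuracy: %.4f" % accuracy)   # 0.9333, matching the score above

# Recall per digit: correct predictions divided by the number of true samples
recall = cm.diagonal() / cm.sum(axis=1)
worst_digit = int(np.argmin(recall))
print("Hardest digit:", worst_digit, "recall: %.4f" % recall[worst_digit])

# Largest off-diagonal cell (the first one if several cells tie)
errors = cm.copy()
np.fill_diagonal(errors, 0)
true_label, predicted_label = np.unravel_index(np.argmax(errors), errors.shape)
print("Most frequent confusion: true", true_label, "predicted", predicted_label)
```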

Scikit-learn also includes utilities to quickly compute a cross validation score…

clf = LinearSVC(C=1, random_state=0)
from sklearn.model_selection import cross_val_score
scores =  cross_val_score(clf, X_train, y_train, cv=5)
print("Mean: %.4f, Std: %.4f" % (np.mean(scores), np.std(scores)))
Mean: 0.9436, Std: 0.0158
clf = LinearSVC(C=0.1, random_state=0)
scores =  cross_val_score(clf, X_train, y_train, cv=5)
print("Mean: %.4f, Std: %.4f" % (np.mean(scores), np.std(scores)))
Mean: 0.9534, Std: 0.0065
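To show roughly what `cross_val_score` does, here is a sketch that reproduces it by hand. It relies on the documented behavior that, for a classifier and an integer `cv`, scikit-learn uses `StratifiedKFold` and the estimator's own `score` method (accuracy). A `Perceptron` on the full digits dataset is used here instead of `LinearSVC` to keep the run short; the `_demo` names are placeholders.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.linear_model import Perceptron
from sklearn.model_selection import StratifiedKFold, cross_val_score

X_demo, y_demo = load_digits(return_X_y=True)
model = Perceptron(random_state=0)

# Manual 5-fold cross-validation: train on 4 folds, score on the held-out fold
manual_scores = []
for train_idx, val_idx in StratifiedKFold(n_splits=5).split(X_demo, y_demo):
    fold_model = clone(model)   # fresh, unfitted copy for each fold
    fold_model.fit(X_demo[train_idx], y_demo[train_idx])
    manual_scores.append(fold_model.score(X_demo[val_idx], y_demo[val_idx]))
manual_scores = np.array(manual_scores)

# The one-line equivalent
auto_scores = cross_val_score(model, X_demo, y_demo, cv=5)

print("Manual: mean %.4f, std %.4f" % (manual_scores.mean(), manual_scores.std()))
print("Auto:   mean %.4f, std %.4f" % (auto_scores.mean(), auto_scores.std()))
```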

… or to perform a grid search
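Conceptually, a grid search is a loop over candidate hyperparameter values that keeps the one with the best cross-validation score. The sketch below spells that loop out for a handful of `C` values; it is an illustration of the idea rather than a description of how `GridSearchCV` is implemented. The reduced grid, the `_demo` names and `dual=False` (set to avoid the `dual` FutureWarning in some versions) are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import LinearSVC

X_demo, y_demo = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# Candidate values on a logarithmic grid: 1e-4, 1e-3, 1e-2, 1e-1
candidate_Cs = 10.0 ** np.arange(-4, 0)

mean_scores = []
for C in candidate_Cs:
    model = LinearSVC(C=C, dual=False, random_state=0)
    fold_scores = cross_val_score(model, X_tr, y_tr, cv=5)
    mean_scores.append(fold_scores.mean())
    print("C=%g: mean CV accuracy %.4f" % (C, mean_scores[-1]))

# Keep the value with the highest mean cross-validation accuracy
best_C = candidate_Cs[int(np.argmax(mean_scores))]
print("Best C:", best_C)
```

The test split is never touched during the search, so it remains available for a final, unbiased evaluation.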

from sklearn.model_selection import GridSearchCV
clf = LinearSVC(random_state=0)
param_grid = {'C': 10. ** np.arange(-6, 4)}
grid_search = GridSearchCV(clf, param_grid=param_grid, cv=5, verbose=3, return_train_score=True)
grid_search.fit(X_train, y_train);
Fitting 5 folds for each of 10 candidates, totalling 50 fits
[CV 1/5] END .......C=1e-06;, score=(train=0.883, test=0.844) total time=   0.0s
[CV 2/5] END .......C=1e-06;, score=(train=0.880, test=0.865) total time=   0.0s
[CV 3/5] END .......C=1e-06;, score=(train=0.869, test=0.909) total time=   0.0s
[CV 4/5] END .......C=1e-06;, score=(train=0.880, test=0.840) total time=   0.0s
[CV 5/5] END .......C=1e-06;, score=(train=0.873, test=0.899) total time=   0.0s
[CV 1/5] END .......C=1e-05;, score=(train=0.929, test=0.906) total time=   0.0s
[CV 2/5] END .......C=1e-05;, score=(train=0.930, test=0.917) total time=   0.0s
[CV 3/5] END .......C=1e-05;, score=(train=0.923, test=0.944) total time=   0.0s
[CV 4/5] END .......C=1e-05;, score=(train=0.933, test=0.906) total time=   0.0s
[CV 5/5] END .......C=1e-05;, score=(train=0.925, test=0.937) total time=   0.0s
[CV 1/5] END ......C=0.0001;, score=(train=0.963, test=0.938) total time=   0.0s
[CV 2/5] END ......C=0.0001;, score=(train=0.964, test=0.941) total time=   0.0s
[CV 3/5] END ......C=0.0001;, score=(train=0.960, test=0.969) total time=   0.0s
[CV 4/5] END ......C=0.0001;, score=(train=0.962, test=0.937) total time=   0.0s
[CV 5/5] END ......C=0.0001;, score=(train=0.957, test=0.969) total time=   0.0s
[CV 1/5] END .......C=0.001;, score=(train=0.983, test=0.962) total time=   0.0s
[CV 2/5] END .......C=0.001;, score=(train=0.983, test=0.965) total time=   0.0s
[CV 3/5] END .......C=0.001;, score=(train=0.983, test=0.969) total time=   0.0s
[CV 4/5] END .......C=0.001;, score=(train=0.981, test=0.962) total time=   0.0s
[CV 5/5] END .......C=0.001;, score=(train=0.982, test=0.979) total time=   0.0s
[CV 1/5] END ........C=0.01;, score=(train=0.992, test=0.958) total time=   0.1s
[CV 2/5] END ........C=0.01;, score=(train=0.994, test=0.962) total time=   0.1s
[CV 3/5] END ........C=0.01;, score=(train=0.996, test=0.969) total time=   0.1s
[CV 4/5] END ........C=0.01;, score=(train=0.993, test=0.962) total time=   0.1s
[CV 5/5] END ........C=0.01;, score=(train=0.994, test=0.972) total time=   0.1s
[CV 1/5] END .........C=0.1;, score=(train=0.997, test=0.948) total time=   0.1s
[CV 2/5] END .........C=0.1;, score=(train=0.996, test=0.965) total time=   0.1s
[CV 3/5] END .........C=0.1;, score=(train=0.998, test=0.948) total time=   0.1s
[CV 4/5] END .........C=0.1;, score=(train=0.992, test=0.955) total time=   0.1s
[CV 5/5] END .........C=0.1;, score=(train=0.996, test=0.951) total time=   0.1s
[CV 1/5] END .........C=1.0;, score=(train=0.993, test=0.938) total time=   0.1s
[CV 2/5] END .........C=1.0;, score=(train=0.994, test=0.969) total time=   0.1s
[CV 3/5] END .........C=1.0;, score=(train=0.989, test=0.920) total time=   0.1s
[CV 4/5] END .........C=1.0;, score=(train=0.994, test=0.948) total time=   0.1s
[CV 5/5] END .........C=1.0;, score=(train=0.995, test=0.944) total time=   0.1s
[CV 1/5] END ........C=10.0;, score=(train=0.998, test=0.944) total time=   0.1s
[CV 2/5] END ........C=10.0;, score=(train=0.997, test=0.972) total time=   0.1s
[CV 3/5] END ........C=10.0;, score=(train=0.997, test=0.941) total time=   0.1s
[CV 4/5] END ........C=10.0;, score=(train=0.977, test=0.951) total time=   0.1s
[CV 5/5] END ........C=10.0;, score=(train=0.994, test=0.944) total time=   0.1s
[CV 1/5] END .......C=100.0;, score=(train=0.993, test=0.941) total time=   0.1s
[CV 2/5] END .......C=100.0;, score=(train=0.992, test=0.965) total time=   0.1s
[CV 3/5] END .......C=100.0;, score=(train=0.997, test=0.937) total time=   0.1s
[CV 4/5] END .......C=100.0;, score=(train=0.990, test=0.951) total time=   0.1s
[CV 5/5] END .......C=100.0;, score=(train=0.993, test=0.948) total time=   0.1s
[CV 1/5] END ......C=1000.0;, score=(train=0.988, test=0.931) total time=   0.1s
[CV 2/5] END ......C=1000.0;, score=(train=0.989, test=0.955) total time=   0.1s
[CV 3/5] END ......C=1000.0;, score=(train=0.994, test=0.916) total time=   0.1s
[CV 4/5] END ......C=1000.0;, score=(train=0.994, test=0.944) total time=   0.1s
[CV 5/5] END ......C=1000.0;, score=(train=0.997, test=0.951) total time=   0.1s
print(grid_search.best_params_)
{'C': 0.001}
print(grid_search.best_score_)
0.9672981997677119
y_pred = grid_search.predict(X_test)
print('Accuracy: %.4f' % accuracy_score(y_test, y_pred))
Accuracy: 0.9639
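
The ConvergenceWarning and FutureWarning messages in the log above suggest two changes: raise the iteration limit and set dual explicitly. The sketch below shows one way to do that. It assumes the estimator is LinearSVC (the warnings come from sklearn/svm), and it uses synthetic data from make_classification as a stand-in, so the data, the parameter grid and the value max_iter=5000 are illustrative assumptions, not the tutorial's own setup.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in data (an assumption; the tutorial uses its own dataset).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Set dual explicitly and raise max_iter above its default of 1000.
model = LinearSVC(dual=False, max_iter=5000)
param_grid = {"C": [10.0 ** k for k in range(-3, 3)]}

grid_search = GridSearchCV(model, param_grid, cv=5, return_train_score=True)
grid_search.fit(X_train, y_train)

print(grid_search.best_params_)
print(grid_search.best_score_)

# With the default refit=True, predict() uses the best model,
# refit on the whole training set.
y_pred = grid_search.predict(X_test)
print("Accuracy: %.4f" % accuracy_score(y_test, y_pred))
```

Whether these settings remove every warning depends on the data; scaling the features first is another common remedy for slow convergence.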

And a lot more features! We have only looked at some classification models and some model selection features, but sklearn can also be used for regression, among other tasks.

Pandas#

import pandas as pd

Pandas is a library that provides a set of tools for data analysis (Python Data Analysis Library).

Pandas dataframes can be created by importing a CSV file (or TSV, or JSON, or SQL, etc.)

# df = pd.read_csv("file.csv")
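
As a self-contained sketch of CSV import, the example below reads from an in-memory string instead of a file on disk. The column names and values are made up for illustration, and df_csv is a hypothetical variable name.

```python
import io

import pandas as pd

# Made-up CSV content; io.StringIO lets read_csv treat a string like a file.
csv_text = "name,score\nalice,0.91\nbob,0.85\n"

df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv.shape)   # (number of rows, number of columns)
print(df_csv.dtypes)  # column types inferred by pandas
```

Passing a path such as "file.csv" instead of the StringIO object works the same way.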

Pandas dataframes can also be created directly from a dictionary of arrays.

print(grid_search.cv_results_)
{'mean_fit_time': array([0.012708  , 0.01136432, 0.01144166, 0.02166719, 0.07240119,
       0.06853347, 0.06847544, 0.06964402, 0.07199402, 0.07125392]), 'std_fit_time': array([0.00119576, 0.00028361, 0.00069896, 0.00088312, 0.0011855 ,
       0.00112866, 0.00268612, 0.00313414, 0.00544413, 0.00527056]), 'mean_score_time': array([0.00078311, 0.00059404, 0.00057449, 0.00061421, 0.00077777,
       0.00080886, 0.00088906, 0.00090394, 0.00081658, 0.00091605]), 'std_score_time': array([2.96987588e-04, 9.94822884e-05, 3.36971073e-05, 2.81335540e-05,
       6.25741177e-05, 5.08916395e-05, 2.31639276e-04, 1.11696659e-04,
       4.55138507e-05, 1.53209449e-04]), 'param_C': masked_array(data=[1e-06, 1e-05, 0.0001, 0.001, 0.01, 0.1, 1.0, 10.0,
                   100.0, 1000.0],
             mask=[False, False, False, False, False, False, False, False,
                   False, False],
       fill_value='?',
            dtype=object), 'params': [{'C': 1e-06}, {'C': 1e-05}, {'C': 0.0001}, {'C': 0.001}, {'C': 0.01}, {'C': 0.1}, {'C': 1.0}, {'C': 10.0}, {'C': 100.0}, {'C': 1000.0}], 'split0_test_score': array([0.84375   , 0.90625   , 0.9375    , 0.96180556, 0.95833333,
       0.94791667, 0.9375    , 0.94444444, 0.94097222, 0.93055556]), 'split1_test_score': array([0.86458333, 0.91666667, 0.94097222, 0.96527778, 0.96180556,
       0.96527778, 0.96875   , 0.97222222, 0.96527778, 0.95486111]), 'split2_test_score': array([0.90940767, 0.94425087, 0.96864111, 0.96864111, 0.96864111,
       0.94773519, 0.91986063, 0.94076655, 0.93728223, 0.91637631]), 'split3_test_score': array([0.83972125, 0.90592334, 0.93728223, 0.96167247, 0.96167247,
       0.95470383, 0.94773519, 0.95121951, 0.95121951, 0.94425087]), 'split4_test_score': array([0.8989547 , 0.93728223, 0.96864111, 0.97909408, 0.97212544,
       0.95121951, 0.94425087, 0.94425087, 0.94773519, 0.95121951]), 'mean_test_score': array([0.87128339, 0.92207462, 0.95060734, 0.9672982 , 0.96451558,
       0.9533706 , 0.94361934, 0.95058072, 0.94849739, 0.93945267]), 'std_test_score': array([0.02834893, 0.01589708, 0.01478263, 0.00643189, 0.00507105,
       0.00647594, 0.01581652, 0.01133844, 0.00971576, 0.01421674]), 'rank_test_score': array([10,  9,  4,  1,  2,  3,  7,  5,  6,  8], dtype=int32), 'split0_train_score': array([0.88250653, 0.92863359, 0.96344648, 0.98259356, 0.9921671 ,
       0.99738903, 0.99303742, 0.99825936, 0.99303742, 0.98781549]), 'split1_train_score': array([0.87989556, 0.93037424, 0.9643168 , 0.98259356, 0.99390775,
       0.99564839, 0.99390775, 0.99651871, 0.9921671 , 0.98868581]), 'split2_train_score': array([0.86869565, 0.9226087 , 0.96      , 0.9826087 , 0.99565217,
       0.99826087, 0.98869565, 0.9973913 , 0.99652174, 0.99391304]), 'split3_train_score': array([0.88      , 0.93304348, 0.96173913, 0.98086957, 0.99304348,
       0.99217391, 0.99391304, 0.9773913 , 0.98956522, 0.99391304]), 'split4_train_score': array([0.87304348, 0.92521739, 0.9573913 , 0.98173913, 0.99391304,
       0.99565217, 0.99478261, 0.99391304, 0.99304348, 0.99652174]), 'mean_train_score': array([0.87682824, 0.92797548, 0.96137874, 0.9820809 , 0.99373671,
       0.99582488, 0.99286729, 0.99269474, 0.99286699, 0.99216983]), 'std_train_score': array([0.0051415 , 0.00369543, 0.00248347, 0.00069114, 0.0011553 ,
       0.00208668, 0.0021576 , 0.00778877, 0.00222751, 0.00335008])}
df = pd.DataFrame(grid_search.cv_results_)
df
mean_fit_time std_fit_time mean_score_time std_score_time param_C params split0_test_score split1_test_score split2_test_score split3_test_score ... mean_test_score std_test_score rank_test_score split0_train_score split1_train_score split2_train_score split3_train_score split4_train_score mean_train_score std_train_score
0 0.012708 0.001196 0.000783 0.000297 0.000001 {'C': 1e-06} 0.843750 0.864583 0.909408 0.839721 ... 0.871283 0.028349 10 0.882507 0.879896 0.868696 0.880000 0.873043 0.876828 0.005141
1 0.011364 0.000284 0.000594 0.000099 0.00001 {'C': 1e-05} 0.906250 0.916667 0.944251 0.905923 ... 0.922075 0.015897 9 0.928634 0.930374 0.922609 0.933043 0.925217 0.927975 0.003695
2 0.011442 0.000699 0.000574 0.000034 0.0001 {'C': 0.0001} 0.937500 0.940972 0.968641 0.937282 ... 0.950607 0.014783 4 0.963446 0.964317 0.960000 0.961739 0.957391 0.961379 0.002483
3 0.021667 0.000883 0.000614 0.000028 0.001 {'C': 0.001} 0.961806 0.965278 0.968641 0.961672 ... 0.967298 0.006432 1 0.982594 0.982594 0.982609 0.980870 0.981739 0.982081 0.000691
4 0.072401 0.001186 0.000778 0.000063 0.01 {'C': 0.01} 0.958333 0.961806 0.968641 0.961672 ... 0.964516 0.005071 2 0.992167 0.993908 0.995652 0.993043 0.993913 0.993737 0.001155
5 0.068533 0.001129 0.000809 0.000051 0.1 {'C': 0.1} 0.947917 0.965278 0.947735 0.954704 ... 0.953371 0.006476 3 0.997389 0.995648 0.998261 0.992174 0.995652 0.995825 0.002087
6 0.068475 0.002686 0.000889 0.000232 1.0 {'C': 1.0} 0.937500 0.968750 0.919861 0.947735 ... 0.943619 0.015817 7 0.993037 0.993908 0.988696 0.993913 0.994783 0.992867 0.002158
7 0.069644 0.003134 0.000904 0.000112 10.0 {'C': 10.0} 0.944444 0.972222 0.940767 0.951220 ... 0.950581 0.011338 5 0.998259 0.996519 0.997391 0.977391 0.993913 0.992695 0.007789
8 0.071994 0.005444 0.000817 0.000046 100.0 {'C': 100.0} 0.940972 0.965278 0.937282 0.951220 ... 0.948497 0.009716 6 0.993037 0.992167 0.996522 0.989565 0.993043 0.992867 0.002228
9 0.071254 0.005271 0.000916 0.000153 1000.0 {'C': 1000.0} 0.930556 0.954861 0.916376 0.944251 ... 0.939453 0.014217 8 0.987815 0.988686 0.993913 0.993913 0.996522 0.992170 0.003350

10 rows × 21 columns
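
The same construction works on any dictionary that maps column names to equal-length arrays. Here is a minimal sketch with two columns; the values are rounded from the results above, and small_df is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Each key becomes a column name, each array becomes that column's values.
data = {
    "C": np.array([0.001, 0.01, 0.1]),
    "mean_test_score": np.array([0.967, 0.965, 0.953]),
}

small_df = pd.DataFrame(data)
print(small_df.shape)
print(list(small_df.columns))
```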

Pandas columns are Series backed by Numpy arrays, so they support the same boolean-mask indexing

df[df['param_C'] < 0.01]
mean_fit_time std_fit_time mean_score_time std_score_time param_C params split0_test_score split1_test_score split2_test_score split3_test_score ... mean_test_score std_test_score rank_test_score split0_train_score split1_train_score split2_train_score split3_train_score split4_train_score mean_train_score std_train_score
0 0.012708 0.001196 0.000783 0.000297 0.000001 {'C': 1e-06} 0.843750 0.864583 0.909408 0.839721 ... 0.871283 0.028349 10 0.882507 0.879896 0.868696 0.880000 0.873043 0.876828 0.005141
1 0.011364 0.000284 0.000594 0.000099 0.00001 {'C': 1e-05} 0.906250 0.916667 0.944251 0.905923 ... 0.922075 0.015897 9 0.928634 0.930374 0.922609 0.933043 0.925217 0.927975 0.003695
2 0.011442 0.000699 0.000574 0.000034 0.0001 {'C': 0.0001} 0.937500 0.940972 0.968641 0.937282 ... 0.950607 0.014783 4 0.963446 0.964317 0.960000 0.961739 0.957391 0.961379 0.002483
3 0.021667 0.000883 0.000614 0.000028 0.001 {'C': 0.001} 0.961806 0.965278 0.968641 0.961672 ... 0.967298 0.006432 1 0.982594 0.982594 0.982609 0.980870 0.981739 0.982081 0.000691

4 rows × 21 columns
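
To make the mechanism explicit, the sketch below builds the boolean mask as a separate step and combines two conditions. It uses a small hypothetical dataframe named scores, with values rounded from the results above.

```python
import pandas as pd

scores = pd.DataFrame({
    "param_C": [0.0001, 0.001, 0.01, 0.1],
    "mean_test_score": [0.9506, 0.9673, 0.9645, 0.9534],
})

# The comparison returns a boolean Series, one entry per row.
mask = scores["param_C"] < 0.01
print(mask.tolist())

# Masks combine with & (and), | (or) and ~ (not); the parentheses are required.
subset = scores[mask & (scores["mean_test_score"] > 0.96)]
print(subset)

# The underlying Numpy array is available through to_numpy().
values = scores["mean_test_score"].to_numpy()
print(values[values > 0.96])
```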

Dataframes also provide most of the functionality you would expect as a database user (df.sort_values, df.groupby, df.join, pd.concat, etc.)
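
A short sketch of these database-style operations follows. The dataframes runs, info and more, and all of their values, are hypothetical and only serve the illustration.

```python
import pandas as pd

runs = pd.DataFrame({
    "model": ["svm", "tree", "svm", "tree"],
    "fold": [0, 0, 1, 1],
    "score": [0.96, 0.90, 0.94, 0.92],
})

# ORDER BY score DESC
sorted_runs = runs.sort_values("score", ascending=False)

# GROUP BY model, then average the scores within each group
mean_by_model = runs.groupby("model")["score"].mean()

# JOIN: look up each row's model in the index of another dataframe
info = pd.DataFrame({"family": ["linear", "non-linear"]}, index=["svm", "tree"])
joined = runs.join(info, on="model")

# UNION ALL: concat is a top-level function that stacks dataframes
more = pd.DataFrame({"model": ["knn"], "fold": [0], "score": [0.88]})
all_runs = pd.concat([runs, more], ignore_index=True)

print(sorted_runs)
print(mean_by_model)
print(joined)
print(all_runs)
```

Each of these calls returns a new object and leaves runs unchanged.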

fig, ax = plt.subplots()
ax.plot(df['mean_test_score'], label="validation accuracy")
ax.plot(df['mean_train_score'], label="training accuracy")
# Fix the tick positions (one per row) before labelling them with the C values
ax.set_xticks(range(len(df)))
ax.set_xticklabels(df['param_C'])
ax.set_xlabel("C")
ax.set_ylabel("Accuracy")
ax.legend(loc='best');
[Figure: mean validation and training accuracy for each value of C]
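
Since the values of C span nine orders of magnitude, another option is to plot the scores against C itself on a logarithmic axis. The sketch below hard-codes the mean scores from the table above (rounded to four decimals) so that it runs on its own; the list names are hypothetical.

```python
import matplotlib.pyplot as plt

# Values copied (rounded) from the cv_results_ table above.
C_values = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0, 1000.0]
mean_test = [0.8713, 0.9221, 0.9506, 0.9673, 0.9645,
             0.9534, 0.9436, 0.9506, 0.9485, 0.9395]
mean_train = [0.8768, 0.9280, 0.9614, 0.9821, 0.9937,
              0.9958, 0.9929, 0.9927, 0.9929, 0.9922]

fig, ax = plt.subplots()
ax.plot(C_values, mean_test, marker="o", label="validation accuracy")
ax.plot(C_values, mean_train, marker="o", label="training accuracy")
ax.set_xscale("log")  # evenly spaced ticks for powers of ten
ax.set_xlabel("C")
ax.set_ylabel("Accuracy")
ax.legend(loc="best")
```

With a log-scaled axis the tick labels come from the data, so no manual tick labelling is needed.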

Other packages#

Other packages that didn’t make the cut:

  • Plotly and Seaborn: two other plotting libraries

  • Scipy: a scientific computing library built on top of Numpy

  • Scrapy: a web crawling library

  • pdb: a debugger for Python (not ML-specific but terribly useful)

  • tqdm: a progress bar (not ML-specific)

Next time: