SVMs with Julia

We’re almost at the end of another month of Julia here on MachineLearningGeek, and we’ll be quickly covering Support Vector Machines today.

Today we’ll be showcasing another machine learning library for Julia. Much like in Python you’d use SciKitLearn for simple algorithms like an SVM, while you’d use PyTorch or the likes for neural networks. We have MLJ.jl for simpler algorithms, while Flux.jl is reserved for neural networks. (There’s also MLJFlux for when you want to put both of them together). So lets get started.

As is the custom, we first import the packages we’ll be needing later, MLJ will be our workhorse, and Gafly we’ll use for plotting.

using MLJ
import RDatasets: dataset
using Gadfly

I prefer to use random values while learning, and so I define a function to initiate my variables like that:

function init()
	X = randn(40, 2)
	y = vcat(-ones(20), ones(20))
	return X,y
	end

	X,y=init()

Today we’re going to use a slightly different way of starting off our data, we’ll use y as a mask:

ym1 = y .== -1
ym2 = .!ym1

And then plot our data

plot(
layer(x=X[ym1, 1], y=X[ym1, 2], ),
layer(x=X[ym2, 1], y=X[ym2, 2], Theme(default_color=colorant"red")),
 #Scale.shape_discrete(levels=["+ve","-ve"]),
 Guide.manual_color_key("", ["+ve", "-ve"], ["deepskyblue","red"]),
	point_shapes=[Shape.circle, Shape.star1])

The reason we constructed our data in such a convoluted way was that in order to use our data with MLJ, we’re going to have to put it in a format that MLJ likes. So, now we do that by converting into tables:

tX = MLJ.table(X)
ty = categorical(y);

Finally, we do the modelling. So, you should know that just like Plots.jl, MLJ.jl, instead of writing everything from scratch, relies on pre-existing libraries. For SVMs, that pre-existing library is LIBSVM, which is pretty standard. So, we load it using the load macro provided by MLJ:

@load SVC pkg=LIBSVM

Then we instantiate it and feed the data:

svc_mdl = SVC()
svc = machine(svc_mdl, tX, ty)

To fit the data is as simple as:

fit!(svc);

And we can check the accuracy of our model by using the misclassification_rate function provided by MLJ

ypred = predict(svc, tX)
misclassification_rate(ypred, ty)

And you can see that even on random data, we get a 0.1 accuracy. That’s pretty good! You can check out the Pluto notebook at the following link:

https://github.com/mathmetal/Misc/blob/master/MLG/SimpleSVM.jl

Leave a Reply

Your email address will not be published.