Lately, I’ve started to learn some machine learning. As an intro, I completed one of the Coursera MOOCs devoted to the topic, namely the Machine Learning course designed by Stanford University. I find MOOCs super useful as introductory courses. It was a nice experience and I learnt many things I didn’t expect to learn. First among them: Octave. As Blas Benito told me on Twitter, Octave is a fully-developed programming language ‘compatible’ with Matlab (I had grossly labelled it an ‘open-source version of Matlab’). I worked a bit with Matlab during my PhD but then abandoned all my scripts because it isn’t free; I wish I had known about Octave back then. Anyway, the whole Machine Learning course is developed with scripts and exercises in Octave (which is quite intuitive if you already know R).
Now, to better integrate the exercises, and in case I want to use some of the methods in my future research, I’ve translated all the exercises and scripts to R. You can find them on my GitLab site. Over the next few weeks I hope to translate them to Python too, as this could help me catch up with Python. In the near future I’d also like to extend these scripts to accommodate multivariate data as dependent variables (maybe at the same time I write the Python functions).
One of the things that really got my attention during this course was the first lesson, on linear regression. I already knew how to run a linear regression from scratch and how to estimate the least-squares linear function analytically, with its slope and intercept. When I saw they were explaining an iterative process to obtain the best-fitting linear function, I got the impression that they were using an unnecessarily complicated, brute-force process to estimate something pretty simple. When things got more complicated later, I understood that this was a necessary introduction to more complex cases.
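The iterative process in question is batch gradient descent on the squared-error cost. As a rough sketch of the idea (in Python with NumPy rather than Octave or R, and with made-up toy data, so the variable names and constants are my own, not the course’s):

```python
import numpy as np

# Made-up toy data: y is roughly 2 + 3x plus some noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2 + 3 * x + rng.normal(0, 1, 50)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# Batch gradient descent on the mean-squared-error cost
theta = np.zeros(2)   # [intercept, slope]
alpha = 0.01          # learning rate (chosen by hand)
for _ in range(20000):
    gradient = X.T @ (X @ theta - y) / len(y)
    theta -= alpha * gradient

print(theta)  # should approach the least-squares intercept and slope
```

Thousands of iterations and a hand-tuned learning rate, all to recover two numbers a closed-form formula gives directly, which is exactly what struck me as brute force at first.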
I still think mathematicians and statisticians would be puzzled by the long procedures used by data scientists, while computer scientists might be amazed by the efficiency of such a long array of algorithms run on high-dimensional data. At the same time, I don’t think mathematicians and statisticians have a clear answer for many of the questions involving high-dimensional data, while computer scientists, even if by unsophisticated means, have found their way into largely unexplored areas with reasonable efficiency. Maybe trying to figure out an analytical way of explaining the algorithmic results would be the golden ticket. Meanwhile, I’ll stick to algorithms whenever I need them, but there’s no way I’ll abandon the normal equations for linear regression.
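For completeness, the analytical route I mean is the normal equations, θ = (XᵀX)⁻¹Xᵀy. A minimal sketch, again in Python with made-up data of my own:

```python
import numpy as np

# Made-up data for illustration: y is roughly 1.5 - 0.5x plus noise
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 30)
y = 1.5 - 0.5 * x + rng.normal(0, 0.5, 30)

X = np.column_stack([np.ones_like(x), x])

# Normal equations: solve (X'X) theta = X'y.
# Solving the linear system is preferred over explicitly
# inverting X'X, for numerical stability.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # [intercept, slope], the exact least-squares fit
```

No learning rate, no iterations: one linear solve gives the exact least-squares fit, which is why I’m keeping it for plain linear regression.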