I stumbled on an interesting post on Reddit, “Is ML all about statistical regressions at its core?”, and although I thought the responses were informative, they seemed a bit too narrowly focused to answer what I felt was really a philosophical question dressed in factual clothing.
I’d say that, yes, ML really is about regression. In its simplest form, regression is about approximating a function that represents the underlying relationship between different variables.
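To make that concrete, here’s a minimal sketch (my own toy example, not from the Reddit thread) of regression as function approximation: we generate noisy data from a known linear relationship and recover its parameters with a least-squares fit.

```python
import numpy as np

# Toy data: the "true" underlying function is y = 2x + 1,
# observed with a bit of Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.size)

# Least-squares fit: find the slope and intercept that best
# approximate the relationship between x and y.
slope, intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)  # close to 2 and 1
```

The fit won’t recover exactly 2 and 1 because of the noise, which is precisely the point: the model is an approximation of the underlying function, not the function itself.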
ML is all about finding the line, plane, or hyperplane that represents the assumed relationship between the variables. The function is represented with a model (a decision tree, Naïve Bayes, linear regression, or similar) that performs some translation between some number of variables. Variation from the underlying true model can be explained away as either noise or as approximation error. This applies to supervised and unsupervised learning alike. Unsupervised algorithms like k-Means or DBSCAN, which derive an ad hoc relationship from their input parameters, are just treating those additional parameters as extra input variables.
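Even k-Means fits this framing: it learns a function that maps each point to its nearest centroid. A minimal sketch of Lloyd’s algorithm (my own illustration, using two well-separated blobs as assumed toy data):

```python
import numpy as np

# Two obvious blobs, centered near 0 and near 5.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 0.5, (20, 2)),
                 rng.normal(5, 0.5, (20, 2))])

# Lloyd's algorithm: repeat (assign each point to the nearest
# centroid, then move each centroid to the mean of its points).
k = 2
centroids = pts[rng.choice(len(pts), k, replace=False)]
for _ in range(10):
    labels = np.argmin(
        np.linalg.norm(pts[:, None] - centroids, axis=2), axis=1)
    centroids = np.array(
        [pts[labels == j].mean(axis=0) for j in range(k)])

# The learned "function" is point -> nearest centroid; the
# centroids themselves are the model of the data's structure.
print(np.sort(centroids[:, 0]))
```

The output of training is still a function from inputs to outputs (here, from points to cluster labels), which is why the regression framing stretches to cover the unsupervised case too.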
So, is this observation useful? It may be if you come from the statistics side. It’s nice to be able to sum up and generalize ML algorithms as the search for a function qua model that represents the underlying data, but we already knew that, didn’t we? Call it a type of regression if you like, or if that helps you understand what’s going on, but I don’t think it gets you any closer to the model, which is what ML is all about.