Introduction to Clustering
In its use of mathematical and statistical language and formalism, supervised learning plays a role akin to that of physics: complex, yes, but well suited to that language and with a long history of applications. Unsupervised learning (such as clustering) is more like biology: it has not been studied with the same formalism or to the same extent because it is, quite simply, harder to do so (not because the algorithms are too complicated, but because their results are harder to validate). Interest in these methods is increasing, however. In this MCT, we discuss the basics of clustering and tackle some of its issues and challenges. We also introduce k-Means and hierarchical clustering, and discuss clustering validation.