As I mentioned in my post ‘A Dataset for Teaching Clustering – The Fruit Dataset’, one of the major challenges in understanding clustering results is visualizing them in a way that lets you see directly whether the algorithm has grouped datapoints in a way that is relevant to your specific context.
In the fruit dataset, there is an image of each object in question, which makes it much easier to understand the results. Most datasets do not come with images, however. In those cases, what strategies might be used to understand the clustering results visually?
My colleague Rich Webster spends a lot of time thinking about innovative ways to visualize data. During a discussion about ways to visualize high-level patterns in datasets, he mentioned a new package created by his colleague Nick Barrowman: vtree. vtree visualizes the overall composition of a dataset by creating variable trees. A variable tree shows how different factors in the dataset interact to partition it into subsets with particular qualities.
In the context of clustering, this strategy could also be used to visualize the composition of clusters (which are subsets of the dataset) relative to the dataset as a whole. This would seem to open up new avenues for a more intuitive understanding of clustering results. I look forward to seeing what comes out of combining vtree and clustering analysis, and will aim to post an update as new work on this front develops.
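To make the idea a little more concrete, here is a minimal sketch of what a variable tree computes when the cluster label is used as the first splitting variable. This is not vtree's API (vtree is an R package, and I am not reproducing its functions here); it is plain Python that I am using only to illustrate the nested counts. The function names `variable_tree` and `print_tree`, the field names `cluster` and `colour`, and the five toy rows are all made up for illustration.

```python
def variable_tree(rows, variables):
    """Split rows by the first variable, then recurse on the remaining ones.

    Returns a nested dict: value -> {"n": count, "children": subtree}.
    """
    if not variables:
        return {}
    first, rest = variables[0], variables[1:]
    groups = {}
    for row in rows:
        groups.setdefault(row[first], []).append(row)
    return {
        value: {"n": len(subset), "children": variable_tree(subset, rest)}
        for value, subset in sorted(groups.items())
    }


def print_tree(tree, parent_n, indent=0):
    """Print each node with its count and its share of the parent node."""
    for value, node in tree.items():
        share = 100 * node["n"] / parent_n
        print(" " * indent + f"{value}: {node['n']} ({share:.0f}%)")
        print_tree(node["children"], node["n"], indent + 2)


# Hypothetical toy data: a cluster label plus one descriptive variable.
rows = [
    {"cluster": 0, "colour": "red"},
    {"cluster": 0, "colour": "red"},
    {"cluster": 0, "colour": "yellow"},
    {"cluster": 1, "colour": "yellow"},
    {"cluster": 1, "colour": "yellow"},
]

tree = variable_tree(rows, ["cluster", "colour"])
print_tree(tree, len(rows))
```

Each level of the output shows how a subset breaks down by the next variable, which is the kind of composition view that could help in judging whether a cluster is meaningful. Putting the cluster label first shows what each cluster is made of; putting it last would instead show how each category is spread across clusters.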