Aspects of the present disclosure relate to machine learning techniques for identifying collections of items, such as furniture items, that are visually complementary. These techniques can rely on computer vision and item imagery. For example, a first portion of a machine learning system can be trained to extract aesthetic item qualities or attributes from pixel values of images of the items. A second portion of the machine learning system can learn correlations between these extracted aesthetic qualities and the level of visual coordination between items. Thus, the disclosed techniques use computer vision machine learning to programmatically determine whether items visually coordinate with one another based on pixel values of images of those items.
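By way of non-limiting illustration, the two-portion arrangement described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the class names `AttributeExtractor` and `CoordinationScorer`, the image size, the number of attributes, and the pairwise feature construction are all hypothetical. The weights are random rather than trained, and a single linear projection stands in for whatever image model the first portion would actually use.

```python
import numpy as np

rng = np.random.default_rng(0)

IMAGE_SHAPE = (32, 32, 3)  # assumed image size (height, width, channels)
NUM_ATTRIBUTES = 8         # assumed number of aesthetic attributes


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class AttributeExtractor:
    """First portion (hypothetical): maps pixel values to an attribute vector.

    A single linear layer with random weights stands in for a trained
    image model.
    """

    def __init__(self, image_shape, num_attributes, rng):
        n_pixels = int(np.prod(image_shape))
        self.weights = rng.normal(0.0, 0.01, size=(n_pixels, num_attributes))
        self.bias = np.zeros(num_attributes)

    def __call__(self, image):
        # Scale 8-bit pixel values to [0, 1] and flatten to a vector.
        pixels = image.astype(np.float64).reshape(-1) / 255.0
        return sigmoid(pixels @ self.weights + self.bias)


class CoordinationScorer:
    """Second portion (hypothetical): maps two attribute vectors to a score.

    The pairwise features (elementwise product and absolute difference)
    are symmetric, so the score does not depend on item order.
    """

    def __init__(self, num_attributes, rng):
        self.weights = rng.normal(0.0, 1.0, size=2 * num_attributes)
        self.bias = 0.0

    def __call__(self, attrs_a, attrs_b):
        features = np.concatenate(
            [attrs_a * attrs_b, np.abs(attrs_a - attrs_b)]
        )
        return float(sigmoid(features @ self.weights + self.bias))


def coordination_score(image_a, image_b, extractor, scorer):
    """Run both portions end to end on a pair of item images."""
    return scorer(extractor(image_a), extractor(image_b))


extractor = AttributeExtractor(IMAGE_SHAPE, NUM_ATTRIBUTES, rng)
scorer = CoordinationScorer(NUM_ATTRIBUTES, rng)

# Random arrays stand in for real item images.
sofa_image = rng.integers(0, 256, size=IMAGE_SHAPE, dtype=np.uint8)
lamp_image = rng.integers(0, 256, size=IMAGE_SHAPE, dtype=np.uint8)

score = coordination_score(sofa_image, lamp_image, extractor, scorer)
print(f"coordination score: {score:.3f}")
```

In a trained system, the extractor's parameters would be fit to predict aesthetic attributes from item images, and the scorer's parameters would be fit on examples of items that do or do not coordinate. The sketch shows only the data flow from pixel values to attributes to a coordination score.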