Media unit retrieval methods, systems and computer program products are provided that allow a user to search for an item by iteratively presenting media units such as images representing items to the user and receiving user input consisting of selections of the presented media units (including possibly the empty selection). Features, or attributes, a user is interested in, for example semantic features, are inferred from the interaction and media units are retrieved for presentation based on similarity with user-selected media units, through sampling of a probability distribution describing the intent or interests, or combinations of approaches. Accordingly, the user-experience is akin to a conversation about what the user is looking for. Retrieval may be based on both selected and unselected media units and the selection may comprise making a selection with a single action. Further, a database of media units can capture similarity relationships for efficient media unit retrieval.