Disclosed is a machine learning architecture for a two-dimensional image protocol detector configured to receive a first image representing at least a portion of a mouth of a user, and output user feedback for capturing a second image representing a portion of the mouth of the user, where the machine learning architecture outputs the user feedback in response to an image quality score of the first image not satisfying an image quality threshold.
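
The threshold-gated feedback described above can be illustrated with a minimal Python sketch. It is not the disclosed architecture: the names `detect`, `DetectorResult`, and `mean_brightness_score`, the threshold value of 0.5, and the feedback wording are all hypothetical. The brightness-based score is only a stand-in for whatever trained model would produce the image quality score, and the sketch assumes that "satisfying" the threshold means the score is greater than or equal to it.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical image type: rows of grayscale pixel values in [0, 255].
Image = Sequence[Sequence[float]]


@dataclass
class DetectorResult:
    score: float             # image quality score of the first image
    feedback: Optional[str]  # None when the threshold is satisfied


def mean_brightness_score(image: Image) -> float:
    """Stand-in quality score in [0, 1]; a trained model would replace this."""
    pixels = [p for row in image for p in row]
    if not pixels:
        return 0.0
    return sum(pixels) / (255.0 * len(pixels))


def detect(
    image: Image,
    score_fn: Callable[[Image], float] = mean_brightness_score,
    threshold: float = 0.5,  # assumed value, not taken from the source
) -> DetectorResult:
    """Score the first image; return feedback only if the score falls short."""
    score = score_fn(image)
    if score >= threshold:  # assumption: "satisfying" means score >= threshold
        return DetectorResult(score, None)
    return DetectorResult(
        score,
        "Image quality is too low. Please capture another image of your mouth.",
    )


# Two tiny synthetic images used only to exercise both branches.
dark = [[10, 20], [30, 40]]
bright = [[200, 220], [240, 250]]
```

Under these assumptions, `detect(dark)` returns a result carrying a feedback message that prompts the user to capture a second image, while `detect(bright)` returns a result whose `feedback` is `None`.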