A method, computer program product, and computing system for receiving audio-based content from a user who is reviewing an image on a display screen; receiving gaze information that defines a gaze location of the user; and temporally aligning the audio-based content and the gaze information to form location-based content.
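
The temporal alignment step can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes the audio-based content has already been split into timestamped segments (for example, transcribed phrases), that gaze information arrives as timestamped screen coordinates, and that both share one clock. The names `AudioSegment`, `GazeSample`, `LocatedContent`, and `align` are hypothetical. Averaging the gaze samples inside each segment's time window is one simple choice of alignment rule among many.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AudioSegment:
    """Hypothetical unit of audio-based content (e.g., a transcribed phrase)."""
    start: float  # seconds, on a clock shared with the gaze tracker (assumption)
    end: float
    text: str


@dataclass
class GazeSample:
    """Hypothetical gaze reading: screen coordinates at time t."""
    t: float
    x: float
    y: float


@dataclass
class LocatedContent:
    """Audio content paired with the display location viewed while it was spoken."""
    text: str
    location: Optional[Tuple[float, float]]  # None if no gaze sample fell in the window


def align(segments: List[AudioSegment], gaze: List[GazeSample]) -> List[LocatedContent]:
    """Pair each audio segment with the mean gaze location during its time span."""
    ordered = sorted(gaze, key=lambda g: g.t)
    times = [g.t for g in ordered]
    result = []
    for seg in segments:
        # Select the gaze samples whose timestamps lie in [seg.start, seg.end].
        lo = bisect_left(times, seg.start)
        hi = bisect_right(times, seg.end)
        window = ordered[lo:hi]
        if window:
            mean_x = sum(g.x for g in window) / len(window)
            mean_y = sum(g.y for g in window) / len(window)
            location = (mean_x, mean_y)
        else:
            location = None
        result.append(LocatedContent(seg.text, location))
    return result


if __name__ == "__main__":
    segments = [AudioSegment(0.0, 1.0, "first remark"), AudioSegment(2.0, 3.0, "second remark")]
    gaze = [GazeSample(0.2, 100.0, 200.0), GazeSample(0.8, 110.0, 220.0), GazeSample(5.0, 0.0, 0.0)]
    for item in align(segments, gaze):
        print(item.text, item.location)
```

In this sketch a segment with no gaze samples in its window yields a location of `None` rather than a guessed position; a real system might instead interpolate from neighboring samples or compensate for the lag between looking and speaking.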