Patent 7760372 was granted by the United States Patent and Trademark Office and assigned to Xerox in July 2010.
Provided is a method for the automated selection of sample documents or pages from a large collection, and more particularly an application of the method in a proof presentment environment, where it is used to select and review representative or extreme pages from a large document, such as one scheduled for printing. The method characterizes pages or documents in a multi-dimensional vector space based upon a set of characteristics, and then uses clustering techniques to group the pages, enabling the selection of typical pages from the groups, outlier pages from extremes lying outside of the groups, or both typical and outlier pages.
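
The selection idea described above can be illustrated with a minimal sketch. The abstract names neither the page characteristics nor the clustering technique, so everything specific here is an assumption made for illustration: the two features (text coverage and image coverage), the use of plain k-means, the farthest-point seeding, and the `outlier_factor` threshold are hypothetical choices, not the patented method.

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def kmeans(vectors, k, iterations=20):
    """Plain k-means with deterministic farthest-point seeding (an assumption)."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Seed the next centroid with the vector farthest from existing seeds.
        centroids.append(
            max(vectors, key=lambda v: min(distance(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: distance(v, centroids[j])) for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids

def select_pages(vectors, k, outlier_factor=2.0):
    """Return (typical, outliers) as lists of page indices.

    typical:  the page nearest each cluster centroid.
    outliers: pages whose distance to their centroid exceeds
              outlier_factor times the mean distance (hypothetical rule).
    """
    labels, centroids = kmeans(vectors, k)
    dists = [distance(v, centroids[lab]) for v, lab in zip(vectors, labels)]
    typical = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            typical.append(min(members, key=lambda i: dists[i]))
    threshold = outlier_factor * (sum(dists) / len(dists))
    outliers = [
        i for i, d in enumerate(dists) if d > threshold and i not in typical
    ]
    return sorted(typical), outliers

# Hypothetical pages described by (text coverage, image coverage).
pages = [
    (0.90, 0.05), (0.88, 0.07), (0.92, 0.04),  # text-heavy pages
    (0.10, 0.85), (0.12, 0.88), (0.08, 0.90),  # image-heavy pages
    (0.50, 0.50),                              # mixed page, unlike the rest
]
typical, outliers = select_pages(pages, k=2)
print("typical pages:", typical)
print("outlier pages:", outliers)
```

A reviewer would then proof only the returned indices rather than every page. In practice the feature set, the number of clusters, and the outlier rule would all need tuning for the document at hand.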