An apparatus, system, and method are disclosed for identifying content within a scanned document. The apparatus includes a modification module, an identification module, and a segmentation module. The modification module creates a modified content data set through application of a sigmoid function to a scanned content data set. The identification module identifies a content segment within the modified content data set. The segmentation module identifies a content segment type of the content segment. Exemplary content segment types include text, line art, and images.
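
As an illustration of the modification step only, the following minimal sketch applies a logistic (sigmoid) curve to 8-bit grayscale scan values. The function name `sigmoid_contrast`, the `midpoint` and `gain` parameters, and their default values are hypothetical choices made for this example. The abstract does not specify the form or parameters of the sigmoid function, so this should be read as one possible interpretation and not as the disclosed implementation.

```python
import math


def sigmoid_contrast(pixels, midpoint=128.0, gain=0.05):
    """Map 8-bit grayscale values through a logistic curve.

    pixels   -- list of rows, each a list of ints in 0..255 (the scanned data set)
    midpoint -- hypothetical input level that maps to mid-gray
    gain     -- hypothetical steepness of the curve
    Returns a new list of rows (the modified data set); the input is not changed.
    """
    modified = []
    for row in pixels:
        new_row = []
        for value in row:
            # Logistic function scaled to the 0..255 output range.
            scaled = 255.0 / (1.0 + math.exp(-gain * (value - midpoint)))
            new_row.append(round(scaled))
        modified.append(new_row)
    return modified


if __name__ == "__main__":
    scanned = [[0, 100, 128, 160, 255]]
    print(sigmoid_contrast(scanned))
```

With these assumed parameters, values below the midpoint are pushed toward black and values above it toward white, which increases the separation between foreground and background before any segment is identified.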