Patent attributes
Methods and systems for determining relevance for a new document are described. Existing documents that have a high probability of relevance can be chosen. A vocabulary of words in the existing documents can be built. Each word can be mapped into a vector such that each existing document can be represented by a sequence of vectors and each sentence and/or paragraph in each existing document can be represented by a subsequence of vectors including a subset of the sequence of vectors. Data augmentation can be applied changing an order of the subsequences in order to create additional documents represented by the subsequences. A deep neural network can be trained using the subsequences that represent the existing documents and the subsequences that represent additional documents. The new documents can be trained using a trained deep neural network. A relevant document can be output using the trained deep neural network.