Patent attributes
A system for automatically generating entity collections comprises a data graph including entities connected by edges, and instructions that cause a computer system to determine a set of entities from the data graph and to determine a set of constraints, the set having a quantity of constraints. Each constraint in the set represents a path in the data graph shared by at least two of the entities in the set of entities. The instructions also cause the computer system to generate candidate collection definitions from combinations of the constraints, where each candidate collection definition identifies at least one constraint and no more than the quantity of constraints. The instructions further cause the computer system to determine an information gain for at least some of the candidate collection definitions, and to store, as a candidate collection, at least one candidate collection definition whose information gain meets a threshold.
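The steps described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the toy triples, the function names, the treatment of a "path" as a single (edge, target) pair, and the entropy-based definition of information gain are all assumptions made for illustration, since the abstract does not specify how information gain is computed.

```python
from itertools import combinations
from math import log2

# Hypothetical toy data graph stored as (entity, edge, target) triples.
TRIPLES = [
    ("Tom Hanks", "profession", "Actor"),
    ("Tom Hanks", "born_in", "USA"),
    ("Meg Ryan", "profession", "Actor"),
    ("Meg Ryan", "born_in", "USA"),
    ("Judi Dench", "profession", "Actor"),
    ("Judi Dench", "born_in", "UK"),
    ("Ada Lovelace", "profession", "Mathematician"),
    ("Ada Lovelace", "born_in", "UK"),
]


def paths_of(entity):
    """Assumption: a path is a single (edge, target) pair leaving the entity."""
    return {(edge, target) for source, edge, target in TRIPLES if source == entity}


def shared_constraints(entities):
    """Constraints are paths shared by at least two entities in the set."""
    counts = {}
    for entity in entities:
        for path in paths_of(entity):
            counts[path] = counts.get(path, 0) + 1
    return sorted(path for path, n in counts.items() if n >= 2)


def candidate_definitions(constraints):
    """Every combination of 1 up to len(constraints) constraints."""
    return [
        combo
        for size in range(1, len(constraints) + 1)
        for combo in combinations(constraints, size)
    ]


def entropy(positives, total):
    """Binary entropy of a group containing `positives` seed entities."""
    if total == 0 or positives in (0, total):
        return 0.0
    p = positives / total
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def information_gain(definition, seed, universe):
    """Assumed measure: entropy reduction in seed membership after splitting
    the universe into entities that satisfy the definition and the rest."""
    matches = {e for e in universe if set(definition) <= paths_of(e)}
    rest = universe - matches
    before = entropy(len(seed & universe), len(universe))
    after = sum(
        len(part) / len(universe) * entropy(len(part & seed), len(part))
        for part in (matches, rest)
    )
    return before - after


def store_candidates(seed, universe, threshold):
    """Keep the definitions whose information gain meets the threshold."""
    scored = [
        (definition, information_gain(definition, seed, universe))
        for definition in candidate_definitions(shared_constraints(seed))
    ]
    return [(definition, gain) for definition, gain in scored if gain >= threshold]


if __name__ == "__main__":
    seed = {"Tom Hanks", "Meg Ryan"}
    universe = {source for source, _, _ in TRIPLES}
    for definition, gain in store_candidates(seed, universe, threshold=0.5):
        print(definition, round(gain, 3))
```

In this sketch the number of candidate definitions grows as 2^n - 1 for n constraints, which is one plausible reason the abstract speaks of evaluating information gain for "at least some" of the candidates rather than all of them.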