Patent attributes
Systems and methods for very fast grouping of “similar” SQL queries according to user-supplied similarity criteria. The user-supplied similarity criteria include a threshold quantifying the degree of similarity between SQL queries and common artifacts included in the queries. A similarity-characterizing data structure allows for the very fast grouping of “similar” SQL queries. Because the computation is distributed among multiple compute nodes, a small cluster of compute nodes takes a short time to compute the similarity-characterizing data on a workload of tens of millions of queries. The user can supply the similarity criteria through a UI or a command line tool. Furthermore, the user can adjust the degree of similarity by supplying new similarity criteria. Accordingly, the system can display in real time or near real time, updated SQL groupings corresponding to the newly supplied similarity criteria using the originally computed similarity-characterizing data structure.