Determining malicious activity in a monitored network using clustering algorithmic techniques in which a source of known malicious network entities and known legitimate network entities associated with network traffic flow are provided. A dataset is generated consisting of a plurality of known malicious network entities and a plurality of known legitimate network entities. Network related attributes are identified associated with each of the plurality of malicious network entities and the plurality of legitimate network entities contained in the generated dataset. A predetermined number (X) of clusters is generated based upon the plurality of malicious (bad) and legitimate (good) network entities. A generated cluster is tagged with a bad, good or an unknown tag. If a generated cluster is determined assigned a bad tag, it is then stored it in a database and assigned a clusterID for future use in machine learning techniques for detecting network attacks upon the monitored network.