As network traffic grows and network-based attacks such as DDoS, botnets, and advanced persistent threats become more sophisticated, identifying and responding to every possible threat can overwhelm security analysts. Furthermore, because network traffic is multi-dimensional, analysts who examine each data source in isolation struggle to accurately identify and rank the most important threats.
Detecting advanced threats requires correlating observations across multiple types of data, such as DNS, flow, HTTP, and TLS. No single data type is sufficient on its own. However, each additional data type adds more variables to search for correlations, both individually and in combination. This "curse of dimensionality" can lead to faulty statistical reasoning, increasing false positives and causing missed opportunities to identify and stop threats.
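One aspect of this effect can be illustrated with simple arithmetic. The sketch below is not taken from the talk; it assumes each pair of variables is tested independently for correlation at a significance level of 0.05, and the function names are hypothetical. Under those assumptions, it shows how quickly the chance of at least one spurious "significant" correlation grows as variables are added.

```python
from math import comb


def num_pairwise_combinations(num_variables: int) -> int:
    """Number of distinct variable pairs that could be tested for correlation."""
    return comb(num_variables, 2)


def family_wise_false_positive_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across num_tests
    independent tests, each run at significance level alpha.
    Independence is an assumption made for illustration only."""
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    # Hypothetical variable counts, e.g. fields drawn from DNS, flow, HTTP, TLS.
    for n in (5, 20, 50):
        pairs = num_pairwise_combinations(n)
        rate = family_wise_false_positive_rate(pairs)
        print(f"{n} variables -> {pairs} pairs, "
              f"P(at least one false positive) = {rate:.3f}")
```

Even with only pairwise tests, the number of comparisons grows quadratically with the number of variables, so a per-test threshold that looks conservative can still produce spurious findings almost surely.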
In this talk, we describe how recent multi-dimensional anomaly detection algorithms from machine learning can be used to combine traffic from multiple sources while addressing the curse of dimensionality. Then, using an open-source platform built on YAF, Apache Spark, and Apache Spot (incubating), we show how these algorithms can focus analysts' attention effectively and improve network security outcomes.