Detecting DGAs is like Forecasting Weather
By Philip Qian, Infoblox Senior Product Manager
DGAs (Domain Generation Algorithms) produce rendezvous domains that malware and hacker-controlled servers use to communicate. These domains are generated by rules or algorithms, are usually encoded/encrypted, and often have a short life span. Hackers use DGAs to evade detection or blocking by static blacklist-based systems, for example (some) firewalls that rely on threat intelligence data that is not updated as frequently as needed.
It is not easy to detect DGAs because their domains conform to the DNS protocol in every respect and there is usually no signature to identify them. This is why we use artificial intelligence/machine learning technology to detect them.
If you look at some DGA domains, you will find that they often appear to be collections of random characters. Because of that, from a lexical analysis point of view, they are statistically quite different from normal domains. On top of that, they usually do not resolve to IP addresses, since most of them are not even registered. Both of these are very important characteristics, or “features” as they are called in machine learning, to use when we train machine learning models.
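To make the idea of lexical features concrete, here is a minimal Python sketch. It is an illustration only, not the feature set of any particular product: the function name `lexical_features`, the choice of features (length, character entropy, vowel ratio, digit ratio), and the decision to look only at the leftmost label of the name passed in are all assumptions made for this example.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def lexical_features(domain: str) -> dict:
    """Compute simple lexical features for the leftmost label of a domain.

    Hypothetical helper for illustration; a real detector would likely
    use many more features, including whether the name resolves.
    """
    # Assumption: the caller passes a name such as "example.com",
    # so the leftmost label is the part worth analyzing.
    label = domain.lower().rstrip(".").split(".")[0]
    if not label:
        raise ValueError("empty domain label")

    n = len(label)
    counts = Counter(label)

    # Shannon entropy of the character distribution, in bits per character.
    # Random-looking strings tend to score higher than dictionary words.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    letters = [ch for ch in label if ch.isalpha()]
    vowel_ratio = (
        sum(ch in VOWELS for ch in letters) / len(letters) if letters else 0.0
    )
    digit_ratio = sum(ch.isdigit() for ch in label) / n

    return {
        "length": n,
        "entropy": entropy,
        "vowel_ratio": vowel_ratio,
        "digit_ratio": digit_ratio,
    }
```

Each domain becomes a small vector of numbers like these, which can then be combined with other signals, such as whether the name resolves to an IP address, and fed to a model during training.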
Machine learning works much like the way human beings forecast weather before certain modern technologies, including sensors and computers, were invented. We observed many independent signals, such as temperature, wind direction and animal behavior, then associated them with the weather that followed, based on a very long period of observation. If you think of some weather proverbs you will know what I mean.
Just like weather forecasting, machine learning can produce false detections, because the training data might be biased or not applicable. Since DGAs are an early indicator of malicious activity, such false detections can be handled depending on users’ risk tolerance levels.
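One common way to express risk tolerance, assuming the model outputs a score between 0 and 1 for each domain, is to choose the score threshold above which a domain is flagged. The sketch below is hypothetical: `count_errors` is a helper written for this example, and the scores in `SAMPLE` are made up for illustration, not real model output.

```python
def count_errors(scored, threshold):
    """Return (false_positives, false_negatives) when every domain with
    score >= threshold is flagged as a DGA domain."""
    false_positives = sum(
        1 for score, is_dga in scored if score >= threshold and not is_dga
    )
    false_negatives = sum(
        1 for score, is_dga in scored if score < threshold and is_dga
    )
    return false_positives, false_negatives


# Made-up (score, is_dga) pairs for illustration only.
SAMPLE = [
    (0.95, True),
    (0.80, True),
    (0.55, True),
    (0.70, False),
    (0.30, False),
    (0.10, False),
]

for threshold in (0.5, 0.75, 0.9):
    fp, fn = count_errors(SAMPLE, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

A lower threshold flags more domains and so catches more DGAs at the cost of more false alarms; a higher threshold does the opposite. Which trade-off is acceptable depends on how much risk the user is willing to tolerate.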