In 2008, Eva Chen, Trend Micro’s CEO, declared: “In the antivirus business, we have been lying to customers for 20 years”. Seven years later, Bryan Dye, Symantec’s SVP of Information Security, stated that “[antivirus] is dead”. Despite these declarations, antivirus remains a billion-dollar industry with millions of users, be they individuals, companies or governments. So why is that?
This series aims to help us understand this paradox by reviewing the forty-year-old cat-and-mouse game between antiviruses and malware.
The Antiviruses strike back… harder
Dynamic detection: gauge the sample’s demeanor
Dynamic detection methods rely on the sample’s execution rather than its structure. By observing the behavior of a sample, they are able to assess its maliciousness.
In an effort to thwart malware using obfuscation techniques such as encryption or packing, AVs first used a technique known as code emulation: emulating a processor and a memory management system to execute the first instructions of a sample and check whether it attempts suspicious actions such as unpacking or decrypting itself. Moreover, once the payload was decrypted or unpacked, the classic signature system could be run against it. This technique is used by several AV editors, as these patents (1, 2, 3) suggest.
However, as mentioned in the first part of this series, malware quickly circumvented these tools by adding costly, useless instructions before the decryption routines or by testing the emulated environment. Because they were constrained by a strict time limit or a maximum number of instructions to execute per file, emulators often failed to reach the relevant point in the code. Moreover, since the emulated environment consisted only of a CPU and a memory management system, it was easy for malware to detect, and therefore easy to evade.
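To make the two previous paragraphs concrete, here is a minimal Python sketch of a code emulator, assuming a made-up five-opcode CPU (nothing like real x86). It runs a sample for at most `max_steps` instructions, treats writes into the sample's own body as self-decryption, runs a signature scan over memory once the sample halts, and shows how a junk loop placed before the decryptor exhausts the budget. The names `emulate`, `Result`, the opcodes and the `max_steps` default are illustrative assumptions, not any vendor's design.

```python
from dataclasses import dataclass

# Hypothetical toy instruction set (illustration only, not a real AV engine):
#   ("mov", reg, value)   reg = value
#   ("dec", reg)          reg -= 1
#   ("jnz", reg, target)  jump to instruction `target` if reg != 0
#   ("xor", reg, key)     memory[reg] ^= key (one step of a decryption loop)
#   ("halt",)             stop execution


@dataclass
class Result:
    verdict: str  # "malicious", "suspicious", "clean" or "timeout"
    steps: int


def emulate(program, memory, code_region, signatures, max_steps=1_000):
    """Emulate a CPU and a flat memory for at most `max_steps` instructions."""
    regs = {}
    pc = steps = 0
    self_modifying = False
    while steps < max_steps:
        if pc >= len(program):
            break  # ran off the end of the code: treat like halt
        op, *args = program[pc]
        pc += 1
        steps += 1
        if op == "mov":
            regs[args[0]] = args[1]
        elif op == "dec":
            regs[args[0]] -= 1
        elif op == "jnz":
            if regs[args[0]] != 0:
                pc = args[1]
        elif op == "xor":
            addr = regs[args[0]]
            memory[addr] ^= args[1]
            if addr in code_region:
                self_modifying = True  # the sample rewrites its own body
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown opcode: {op!r}")
    else:
        # Budget exhausted before the sample halted: no conclusion reached
        return Result("timeout", steps)
    # Payload now decrypted in memory: run the classic signature scan on it
    if any(sig in memory for sig in signatures):
        return Result("malicious", steps)
    return Result("suspicious" if self_modifying else "clean", steps)


KEY = 0x5A
encrypted = bytes(b ^ KEY for b in b"EVIL")  # payload as stored on disk

decryptor = [
    ("mov", "r", len(encrypted)),
    ("dec", "r"),
    ("xor", "r", KEY),
    ("jnz", "r", 1),
    ("halt",),
]

# Same decryptor, preceded by a junk loop that burns the emulator's budget
padded = [
    ("mov", "j", 100_000),
    ("dec", "j"),
    ("jnz", "j", 1),
    ("mov", "r", len(encrypted)),
    ("dec", "r"),
    ("xor", "r", KEY),
    ("jnz", "r", 4),
    ("halt",),
]

print(emulate(decryptor, bytearray(encrypted), range(4), [b"EVIL"]))
# Result(verdict='malicious', steps=14)
print(emulate(padded, bytearray(encrypted), range(4), [b"EVIL"]))
# Result(verdict='timeout', steps=1000)
```

The padded sample is exactly as malicious as the plain one; it simply never reaches its decryption loop within the emulator's budget.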
Sandboxing: Hang the DJ
A sandbox is an isolated system used to run suspicious applications. Its goal is to execute a sample in a reasonably realistic environment to observe its behavior. It is a very broad term covering several technologies, ranging from simple CPU and memory emulation to fully-fledged virtual machines.
To understand the nuance between them in this context, let’s look at the difference between emulation and virtualization. When a sample runs in an emulated environment, the emulated parts are simulated by software that replicates the behavior of a specific component (operating system, hardware…). With virtualization, the sample runs in a virtual machine (VM) which, under the supervision of a hypervisor, uses the hardware of the machine hosting the VM.
According to Kruegel (2014), the main drawback of emulation is the overhead introduced by the software replicating the component. On the other hand, unlike virtualization, emulation can replicate only the small part of a system that is actually needed instead of a whole machine, and running a whole machine is often not suitable for a regular user’s computer.[i] As shown by Egele et al. (2012), sandboxing can be implemented in various ways, detailed in the following paragraphs.[ii]
Operating system emulation
This technique, introduced in AVs in the 1990s, consists of executing suspicious files in an emulated environment. The information extracted from the emulator can then be used to feed a heuristic engine or to match signatures. To go beyond the aforementioned limitations of code emulation, some AV editors decided to emulate an entire operating system (OS): this is known as operating system emulation. Note that, due to the rise of malware written in interpreted languages (such as the infamous GootKit), AV editors developed emulators equipped with an interpreter to execute such malware, as shown by this patent.
Of course, like any sandbox, both types of solutions can be bypassed if the emulation is detected: as malware researcher Emeric Nasi showed here, simple tricks can easily circumvent AV emulation.
The main appeal of this technique is that it requires less computing power than a virtualization-based sandbox. But both techniques suffer from being easy to detect: as reported by Kruegel (2014), emulating an OS is a difficult task, especially given the number of undocumented APIs in an ever-changing OS like Microsoft Windows. Any flaw in the emulation could allow malware to determine that it is being analyzed, and malware that knows it is being analyzed is unlikely to exhibit any malicious behavior.
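The detectability problem can be illustrated with a toy Python sketch of an OS-emulation layer, assuming a tiny table of stubbed APIs. The names borrow from Win32 (`CreateFileW`, `WriteFile`, `GetComputerNameW`), but the signatures and return values here are simplified and hypothetical, as are the `heuristic_score` rule and the `toy_sample` stand-in. The call log is what would feed a heuristic engine; when a single API the sample relies on is missing from the table, the sample notices and never reveals its behavior.

```python
class EmulatedOS:
    """Toy OS-emulation layer. The API names mimic Windows, but the signatures
    and return values are simplified stand-ins, not the real Win32 contracts."""

    def __init__(self, implemented=None):
        self.calls = []   # behaviour log, later fed to a heuristic engine
        self.files = {}   # emulated file system: path -> contents
        stubs = {
            "CreateFileW": self._create_file,
            "WriteFile": self._write_file,
            "GetComputerNameW": lambda: "WIN-DESKTOP01",
        }
        # `implemented` simulates an emulator with incomplete API coverage
        names = stubs.keys() if implemented is None else implemented
        self.apis = {name: stubs[name] for name in names}

    def _create_file(self, path):
        self.files.setdefault(path, b"")
        return path  # the path doubles as a fake handle

    def _write_file(self, handle, data):
        self.files[handle] += data
        return len(data)

    def call(self, name, *args):
        self.calls.append(name)
        handler = self.apis.get(name)
        if handler is None:
            return 0  # unemulated API fails where a real system would succeed
        return handler(*args)


def heuristic_score(os_layer):
    """Hypothetical rule: one point per file written under System32."""
    return sum(
        1 for path in os_layer.files
        if path.lower().startswith("c:\\windows\\system32\\")
    )


def toy_sample(os_api):
    """Stand-in for an evasive sample: it only acts if its probe succeeds."""
    if not os_api.call("GetComputerNameW"):
        return  # probe failed: assume it is being analyzed and stay quiet
    handle = os_api.call("CreateFileW", "C:\\Windows\\System32\\drop.dll")
    os_api.call("WriteFile", handle, b"MZ")


full = EmulatedOS()
toy_sample(full)
print(heuristic_score(full), full.calls)
# 1 ['GetComputerNameW', 'CreateFileW', 'WriteFile']

partial = EmulatedOS(implemented=["CreateFileW", "WriteFile"])
toy_sample(partial)
print(heuristic_score(partial), partial.calls)
# 0 ['GetComputerNameW']
```

With full coverage the drop into System32 is logged and scored; with one missing stub, the emulator only ever sees a harmless probe.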
Full system emulation
To avoid having to replicate an entire OS, some companies choose to emulate the hardware, whose interface, unlike an OS API, changes very little over time (to guarantee backward compatibility) and is very well documented. To minimize the chance of being detected, such tools emulate all the necessary components of a computer: this is known as full system emulation. Currently, this technique is mainly used in cloud-based solutions (more on cloud-based AVs below), mostly because the computing power it requires is not available on every user’s machine; the same goes for virtualization-based sandboxes.
Virtualization-based sandboxes
Some AV editors include a virtualization-based sandbox in their product or offer it as a separate module. This kind of sandbox has two main advantages. First, it usually performs better than full system emulation, because it avoids the emulation overhead discussed above.
Second, it is harder to detect than most OS emulation solutions. For these reasons, major AV editors use this technology in their products. Note, however, that companies rarely ship virtualization-based sandboxes meant to run locally, probably because of the resources they require, which makes them better suited to being hosted in the cloud.
Because of the resource cost of sandboxing techniques such as full system emulation and virtualization-based sandboxes, AVs usually weed out the majority of samples with static or simpler dynamic analysis mechanisms first. Moreover, to minimize the chance of evasion, some AV companies combine full system emulation and virtualization-based sandboxes.
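The tiered approach described above can be sketched in Python, assuming three stages: a SHA-256 lookup against known-bad and known-good lists, a byte-pattern signature scan, and a `FakeSandbox` placeholder standing in for an expensive VM or full-system-emulation run. All lists, patterns and function names here are hypothetical; the point is only that each stage either decides or escalates, so few samples pay the sandbox cost.

```python
import hashlib
from typing import Callable, Optional

Verdict = Optional[str]  # "malicious", "clean", or None (undecided: escalate)


def make_pipeline(known_bad, known_good, signatures, sandbox: Callable[[bytes], str]):
    """Return a scanner that tries stages from cheapest to most expensive."""

    def hash_lookup(sample: bytes) -> Verdict:
        digest = hashlib.sha256(sample).hexdigest()
        if digest in known_bad:
            return "malicious"
        if digest in known_good:
            return "clean"
        return None

    def signature_scan(sample: bytes) -> Verdict:
        return "malicious" if any(sig in sample for sig in signatures) else None

    stages = [("hash", hash_lookup), ("signature", signature_scan), ("sandbox", sandbox)]

    def scan(sample: bytes) -> tuple[str, str]:
        for name, stage in stages:
            verdict = stage(sample)
            if verdict is not None:
                return verdict, name  # first stage able to decide wins
        return "unknown", "none"

    return scan


class FakeSandbox:
    """Placeholder for a costly VM or full-system-emulation run (hypothetical)."""

    def __init__(self):
        self.runs = 0

    def __call__(self, sample: bytes) -> str:
        self.runs += 1
        return "malicious" if b"encrypt_all_files" in sample else "clean"


sandbox = FakeSandbox()
scan = make_pipeline(
    known_bad={hashlib.sha256(b"old malware").hexdigest()},
    known_good={hashlib.sha256(b"notepad").hexdigest()},
    signatures=[b"EVIL"],
    sandbox=sandbox,
)
samples = [b"old malware", b"notepad", b"xxEVILxx", b"new: encrypt_all_files", b"new tool"]
for s in samples:
    print(s, scan(s))
print("sandbox runs:", sandbox.runs)  # 2 of the 5 samples reach the sandbox
```

In this run, three of the five samples are settled by the cheap stages; only the two unknown ones are escalated to the sandbox.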
Monitoring: this time, it’s for real
As discussed above, analyzing a file in a sandbox can be expensive. Thus, several AVs have opted for an alternative solution: execute the sample on the endpoint, but monitor it closely. This technique is used by various AV editors, as we can see here and here. Moreover, several related patents have been registered by antivirus editors (1, 2, 3).
Process monitoring is usually done with hooks, but other methods exist, such as AMSI (the Antimalware Scan Interface, introduced in Windows 10), taint analysis, hardware-assisted virtualization, etc. Unsurprisingly, this AV firm owns several patents regarding hooking (1, 2).
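As an analogy to API hooking, the Python sketch below wraps a function so that a monitor sees every call before it runs, in the same spirit as an inline or import-table hook on Windows (which this code does not implement). The in-memory `files` dict, the `fs_rename` function and the ransomware-like rule (more than three renames to `.locked`) are all assumptions made for illustration.

```python
import functools


class BehaviourMonitor:
    """Sketch of hook-based monitoring: every hooked call is reported first."""

    def __init__(self, max_renames=3):
        self.events = []           # (api name, arguments) for each call seen
        self.max_renames = max_renames
        self.blocked = False

    def hook(self, name, func):
        """Wrap `func` so the monitor inspects each call before it runs."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            self.events.append((name, args))
            if self.blocked or self._looks_malicious():
                self.blocked = True
                raise PermissionError(f"{name} blocked by behaviour monitor")
            return func(*args, **kwargs)
        return wrapper

    def _looks_malicious(self):
        # Hypothetical rule: a burst of renames to ".locked" resembles ransomware
        renames = [args for name, args in self.events
                   if name == "rename" and args[1].endswith(".locked")]
        return len(renames) > self.max_renames


# In-memory stand-in for a file system API
files = {"report.docx": b"", "photo.jpg": b"", "notes.txt": b"", "budget.xlsx": b""}


def fs_rename(src, dst):
    files[dst] = files.pop(src)


api = {"rename": fs_rename}
monitor = BehaviourMonitor(max_renames=3)
api["rename"] = monitor.hook("rename", api["rename"])  # install the hook

for name in list(files):
    try:
        api["rename"](name, name + ".locked")  # the monitored process at work
    except PermissionError as err:
        print(err)  # rename blocked by behaviour monitor
        break
print(sorted(files))
# ['budget.xlsx', 'notes.txt.locked', 'photo.jpg.locked', 'report.docx.locked']
```

Because the sample runs for real, some damage happens before the rule triggers; the trade-off versus a sandbox is cost, not safety.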
Cloud-based antivirus
In the late 2000s, AV companies faced several challenges that prompted the need for cloud-based AV. The time lapse between the appearance of a new malware and the creation of an effective signature was (and still is) significant; it opens a vulnerability window during which endpoints are defenseless against the new malware.
Moreover, the diversification of malware forced AVs to become huge programs, whose growing complexity made them more vulnerable to exploitation. For more information on this topic, see this Black Hat talk. This, combined with the fact that AVs often need elevated privileges to run, made them excellent targets for malware.
To solve these problems, Oberheide, Cooke & Jahanian introduced the concept of cloud-based AV in 2008, aiming to increase the detection rate while reducing the vulnerability window by applying the idea of N-version programming to AV.[iii] Instead of a heavy AV agent running on each computer, they proposed a lightweight agent that queries an online component, which analyzes samples with several AV engines and returns the results to the agent.
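A minimal in-process sketch of this architecture in Python, assuming the network call is replaced by a direct method call and the engines are trivial hypothetical lambdas: the agent only hashes and queries, while the cloud service runs every engine in parallel and aggregates their votes. With `threshold=1`, a sample is flagged if any engine flags it, which is how combining engines can raise the detection rate; a higher threshold trades detection for fewer false positives. The class names and the per-digest cache are illustrative choices, not details taken from the CloudAV paper.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

Engine = Callable[[bytes], bool]  # True means "this engine flags the sample"


class CloudService:
    """Cloud side: runs every engine on a sample, N-version style."""

    def __init__(self, engines: Dict[str, Engine], threshold: int = 1):
        self.engines = engines
        self.threshold = threshold  # engines that must agree (a policy choice)
        self.cache: Dict[str, dict] = {}  # digest -> report, reused across agents

    def analyze(self, digest: str, sample: bytes) -> dict:
        if digest in self.cache:
            return self.cache[digest]
        with ThreadPoolExecutor() as pool:
            futures = {name: pool.submit(engine, sample)
                       for name, engine in self.engines.items()}
            flags = {name: fut.result() for name, fut in futures.items()}
        report = {"malicious": sum(flags.values()) >= self.threshold, "engines": flags}
        self.cache[digest] = report
        return report


class LightweightAgent:
    """Endpoint side: no engines, only hashing and a query to the cloud."""

    def __init__(self, cloud: CloudService):
        self.cloud = cloud  # stands in for a network client

    def check(self, sample: bytes) -> bool:
        digest = hashlib.sha256(sample).hexdigest()
        return self.cloud.analyze(digest, sample)["malicious"]


engines = {  # hypothetical engines, each with different blind spots
    "engine_a": lambda s: b"EVIL" in s,
    "engine_b": lambda s: s.startswith(b"MZ") and b"packed" in s,
    "engine_c": lambda s: False,
}
agent = LightweightAgent(CloudService(engines, threshold=1))
print(agent.check(b"MZ...packed..."))  # True: only engine_b flags it
print(agent.check(b"hello world"))     # False: no engine flags it
```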
This design has several advantages. First, the lightweight agent is nothing like the huge and complex program that a classic AV is, which makes it easier to harden and harder for malware to compromise. Second, at least in theory, a cloud-based AV uses several AV engines which, when combined, have a smaller vulnerability window and a higher detection rate. The original research paper emphasized this point, but in practice AV editors often use only their own engine.
Another benefit is that all the heavy lifting is done in the cloud, which allows the use of the resource-hungry sandboxing solutions mentioned above, which could not run on a typical endpoint. Moreover, a cloud-based AV is easy to deploy, since only the agent needs to be installed.
Cloud-based AV also makes retrospective detection easier, as the authors of the paper underlined. If a file downloaded by a computer is later deemed malicious by the AV, then all computers that downloaded similar files can be considered infected, and the administrators, or even the AV itself, can take the necessary incident-response steps.
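The bookkeeping behind retrospective detection can be sketched in Python, assuming exact SHA-256 matching only (catching merely similar files would need some form of fuzzy hashing, which this sketch leaves out). The cloud remembers which hosts submitted each digest; when a verdict later flips to malicious, it returns the hosts that need incident response. `RetroIndex` and its methods are hypothetical names.

```python
import hashlib
from collections import defaultdict


class RetroIndex:
    """Cloud-side record of which host has seen which file (exact SHA-256)."""

    def __init__(self):
        self.seen_by = defaultdict(set)  # digest -> hosts that submitted it
        self.verdicts = {}               # digest -> "clean" or "malicious"

    def record(self, host: str, sample: bytes, verdict: str) -> str:
        digest = hashlib.sha256(sample).hexdigest()
        self.seen_by[digest].add(host)
        # Keep the first verdict; later changes go through reclassify()
        self.verdicts.setdefault(digest, verdict)
        return digest

    def reclassify(self, digest: str, verdict: str) -> set:
        """Update a verdict; if the file just became malicious, list hosts to check."""
        previous = self.verdicts.get(digest)
        self.verdicts[digest] = verdict
        if verdict == "malicious" and previous != "malicious":
            return set(self.seen_by[digest])
        return set()


index = RetroIndex()
# Day 0: a new sample, no signature yet, so it is judged clean everywhere
d = index.record("host-a", b"invoice.pdf.exe contents", "clean")
index.record("host-b", b"invoice.pdf.exe contents", "clean")
index.record("host-c", b"quarterly report contents", "clean")
# Later: the cloud learns the sample is malicious
print(sorted(index.reclassify(d, "malicious")))  # ['host-a', 'host-b']
```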
Furthermore, conducting the analysis in the cloud makes reverse engineering the AV more complicated, because access to the analysis tools is limited. As shown by Koret & Bachaalany (2015), AV signatures stored on an endpoint can be unpacked by malware authors, making them easier to bypass. On the other hand, an obvious drawback is that if the service is down, the endpoint is left without any protection.
Several companies offer cloud-based solutions. Those that do not usually still rely on the cloud to perform sandboxing or to collect metadata used for data mining or machine learning.
Threats of tomorrow
Over the last 20 years, the malware author population progressively shifted from individuals searching for recognition or wanting to pull off a stunt to organized criminals and state-sponsored actors. Because malware is now written by professionals, attacks have become more sophisticated and have evolved to stay ahead of the countermeasures used by AV editors.
The professionalization of malware is spawning services aimed at helping malware authors. A good example is no-distribute scanners: services to which cybercriminals submit their malware to check whether it is detected by AV engines. Unlike similar websites such as VirusTotal, these services do not share the submitted samples with AV editors.
Malware authors use such websites to test and improve their malware’s stealth. These tools substantially undermine the ability of AVs to thwart new threats before they inflict significant damage. Furthermore, changes in the cyber landscape, such as fileless malware or the growing use of IoT devices, are seriously challenging the traditional way AVs operate.
As we have seen, the current threat landscape moves fast and keeping up with it is getting more and more difficult. The problem is: you can’t fight something if you don’t know what it is. This is where threat intelligence comes into play. By providing you with targeted and actionable data, Blueliv helps you to stay in the race using automation complemented by human intelligence.
This post was authored by Mathieu Gaucheler with the support of the Blueliv Labs team.
[i] Kruegel, C. (2014). Full System Emulation: Achieving Successful Automated Dynamic Analysis of Evasive Malware. Retrieved from https://www.blackhat.com/docs/us-14/materials/us-14-Kruegel-Full-System-Emulation-Achieving-Successful-Automated-Dynamic-Analysis-Of-Evasive-Malware-WP.pdf
[ii] Egele, M., Scholte, T., Kirda, E., & Kruegel, C. (2012). A Survey on Automated Dynamic Malware-Analysis Techniques and Tools. ACM Computing Surveys (CSUR), 44, 1-42. doi:10.1145/2089125.2089126
[iii] Oberheide, J., Cooke, E., & Jahanian, F. (2008). CloudAV: N-Version Antivirus in the Network Cloud. In Proceedings of the 17th USENIX Security Symposium (pp. 91-106).