In 2008, Eva Chen, Trend Micro’s CEO, declared: “In the antivirus business, we have been lying to customers for 20 years.” Six years later, Brian Dye, Symantec’s senior vice president for information security, stated that “[antivirus] is dead”. Despite these declarations, antivirus remains a billion-dollar industry with millions of users, whether individuals, companies or governments. So why is that?
This series aims to explain this paradox by reviewing the forty-year-old cat-and-mouse game between antivirus (AV) products and malware.
The Antiviruses strike back
By using the obfuscation techniques mentioned above on known malware, Christodorescu & Jha (2004) were able to escape detection by major antiviruses, showing false negative rates sometimes higher than 50%.[i]
Given these figures, it was clear that relying on a rigid signature-based strategy was not a suitable design for an AV. Thus, AV vendors started to explore new strategies to repel malware. These new strategies rely on a heuristic approach to detection, to thwart malware attempts at escaping an over-specific signature. They are presented in two categories: static and dynamic.
Static detection: evaluating the sample itself
Static detection groups together all the techniques that do not rely on analyzing the sample’s behavior when executed. The next paragraphs are mostly gleaned from research articles, because AV vendors are not very keen on discussing the internals of their engines. Nevertheless, they give an overview of the tools available to AVs.
One such technique counters an obfuscation method mentioned earlier: the insertion of useless instructions such as NOP. AVs implemented ways to match their signatures against a file while skipping these instructions. As noted by Bashari Rad, Masrom & Ibrahim (2011), the same approach can also detect macro viruses written in text form by ignoring TAB and space characters.[ii]
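To make the idea concrete, here is a minimal sketch of signature matching that skips NOP padding. It assumes x86 code, where the single-byte NOP is encoded as 0x90, and a signature that contains no 0x90 byte itself. The function names are hypothetical. A real engine would decode instructions rather than drop raw bytes, because 0x90 can also appear inside an operand.

```python
NOP = 0x90  # single-byte x86 NOP opcode (assumption: the sample is x86 code)


def strip_nops(code: bytes) -> bytes:
    """Return the code with every 0x90 byte removed (naive: no disassembly)."""
    return bytes(b for b in code if b != NOP)


def matches_ignoring_nops(sample: bytes, signature: bytes) -> bool:
    """True if the signature appears in the sample once NOP padding is dropped."""
    return signature in strip_nops(sample)


signature = b"\x31\xc0\x50\x68"
padded = b"\x31\x90\xc0\x90\x90\x50\x68"     # same bytes, interleaved with NOPs
print(signature in padded)                   # plain byte search misses it
print(matches_ignoring_nops(padded, signature))
```

The plain search fails on the padded sample, while the NOP-tolerant match succeeds.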
Another technique is mainly used on macro code. The macro is parsed and only its essential statements are kept (the “skeleton”). The AV then matches its signatures against the skeleton instead of the original macro.
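The skeleton idea can be sketched as follows. This is only an illustration under stated assumptions: the macro is VBA-like (comments start with an apostrophe, keywords are case-insensitive), and "essential" simply means "not a comment, not blank, whitespace normalized". The helper names are hypothetical, and a real parser would also handle apostrophes inside string literals.

```python
def skeleton(macro_source: str) -> str:
    """Reduce a VBA-like macro to normalized, comment-free statements."""
    kept = []
    for line in macro_source.splitlines():
        code = line.split("'", 1)[0]            # naive comment removal
        code = " ".join(code.split()).lower()   # collapse whitespace, fold case
        if code:                                # drop blank lines
            kept.append(code)
    return "\n".join(kept)


def skeleton_matches(macro_source: str, signature: str) -> bool:
    """Match a (already normalized) signature against the macro's skeleton."""
    return signature in skeleton(macro_source)


macro = "Sub AutoOpen()  ' runs on open\n\n\tShell   \"cmd.exe\"\nEnd Sub"
print(skeleton(macro))
print(skeleton_matches(macro, 'shell "cmd.exe"'))
```

Because comments, blank lines and spacing are discarded before matching, cosmetic edits to the macro no longer break the signature.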
Algorithmic scanning method
Also known as virus-specific detection, these techniques are deployed when standard detection struggles with a particular piece of malware. They include filtering (limiting the scan to the files relevant to the malware being searched for) and X-ray scanning (brute-forcing the malware’s encryption).
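As an illustration of X-ray scanning, the sketch below assumes the simplest possible case: the malware body is encrypted with a single-byte XOR key, and the AV knows a plaintext signature of the decrypted body. The function name is hypothetical. Real X-ray scanners handle richer ciphers; this only shows the brute-force principle.

```python
from typing import Optional


def xray_scan(data: bytes, plaintext_signature: bytes) -> Optional[int]:
    """Try every single-byte XOR key; return the key that reveals the signature.

    Encrypting the signature with each candidate key and searching for it is
    equivalent to decrypting the whole file with that key, but cheaper.
    """
    for key in range(256):
        encrypted_signature = bytes(b ^ key for b in plaintext_signature)
        if encrypted_signature in data:
            return key
    return None


body = b"MALICIOUS_PAYLOAD"
encrypted_file = b"stub" + bytes(b ^ 0x5A for b in body)
print(xray_scan(encrypted_file, b"PAYLOAD"))        # key recovered: 90 (0x5A)
print(xray_scan(b"clean file contents", b"PAYLOAD"))
```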
According to Koret & Bachaalany (2015): “an expert system is a heuristic engine that implements a set of algorithms that emulate the decision-making strategy of a human analyst.”[iii] In the case of AV, an expert system would test a sample and, given the results of the test, would decide whether it should stop the analysis, do some more tests or assume that the sample is malicious. Similar systems are used by modern AV.
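A toy version of such a decision process might look like the sketch below. The rules, weights and threshold are invented for illustration and are not taken from any real AV engine. Each rule is a test on the sample; after each test the system either has enough evidence to declare the sample malicious and stop, or moves on to the next test.

```python
from typing import Callable, List, Tuple

# (description, test, weight): all three rules are hypothetical examples.
Rule = Tuple[str, Callable[[bytes], bool], int]

RULES: List[Rule] = [
    ("adds itself to an autorun registry key", lambda s: b"CurrentVersion\\Run" in s, 3),
    ("references a download API", lambda s: b"URLDownloadToFile" in s, 2),
    ("spawns a command shell", lambda s: b"cmd.exe" in s, 2),
]


def evaluate(sample: bytes, rules: List[Rule] = RULES, malicious_at: int = 4) -> str:
    """Run the tests in order and stop as soon as the evidence is conclusive."""
    score = 0
    for _description, test, weight in rules:
        if test(sample):
            score += weight
        if score >= malicious_at:
            return "malicious"          # enough evidence: stop the analysis
    return "suspicious" if score > 0 else "clean"


dropper = b"MZ..URLDownloadToFileA..Software\\Microsoft\\Windows\\CurrentVersion\\Run"
print(evaluate(dropper))
print(evaluate(b"start cmd.exe /c dir"))
print(evaluate(b"hello world"))
```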
API calls analysis
System calls and API (Application Programming Interface) function calls are used by applications to communicate with the operating system. For example, NtCreateFile creates or opens a file or directory on Windows. These calls are a rich source of information for detecting malicious files, and several techniques have been applied to analyze them. Here are some examples.
Eskandari & Hashemi (2012) used graph theory to model the API calls made by portable executable (PE) files.[iv] Their method disassembles and parses the PE to collect its API calls, builds a graph from them, and then processes that graph to assess whether or not the sample is malicious.
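The following sketch shows what "building a graph out of API calls" can look like in its simplest form. It is not the authors' graph mining algorithm. It assumes the API calls have already been extracted as an ordered list, links each call to the one that follows it, and compares two graphs by the Jaccard similarity of their edge sets. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_call_graph(api_calls: List[str]) -> Graph:
    """Directed graph with an edge from each API call to the next one."""
    graph: Graph = defaultdict(set)
    for src, dst in zip(api_calls, api_calls[1:]):
        graph[src].add(dst)
    return graph


def edges(graph: Graph) -> Set[Tuple[str, str]]:
    return {(src, dst) for src, targets in graph.items() for dst in targets}


def edge_similarity(a: Graph, b: Graph) -> float:
    """Jaccard similarity of the two edge sets (1.0 means identical graphs)."""
    ea, eb = edges(a), edges(b)
    if not ea and not eb:
        return 1.0
    return len(ea & eb) / len(ea | eb)


injector = ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"]
known_bad = build_call_graph(injector)
variant = build_call_graph(["Sleep"] + injector)
benign = build_call_graph(["CreateFileW", "ReadFile", "CloseHandle"])
print(edge_similarity(known_bad, variant))   # 0.75
print(edge_similarity(known_bad, benign))    # 0.0
```

A variant that only prepends a harmless call still shares most of its edges with the known sample, which is the property a graph-based detector exploits.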
Alazab et al. (2011) applied several machine learning classifiers to a large sample set (66,703 executable files).[v] By disassembling and parsing the files, they collected the API calls and used them to train the classifiers. Their method reached a 98.5% accuracy rate. One advantage of machine learning for malware detection is that it is much easier to update a trained model than to update a large signature database, like the one needed by legacy AV.
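To give a feel for this kind of approach, here is a small sketch of a Bernoulli naive Bayes classifier trained on the set of API names each sample imports. It is written in plain Python for readability and is not the classifiers or feature set used in the paper. The four training samples are made-up toy data, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Set, Tuple


def train(samples: List[Tuple[Set[str], str]]):
    """Count, per class, how many samples use each API."""
    vocab = set().union(*(apis for apis, _ in samples))
    class_counts = Counter(label for _, label in samples)
    api_counts = {label: Counter() for label in class_counts}
    for apis, label in samples:
        api_counts[label].update(apis)
    return vocab, class_counts, api_counts


def predict(model, apis: Set[str]) -> str:
    """Return the class with the highest log-probability for this API set."""
    vocab, class_counts, api_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)                      # class prior
        for api in vocab:
            p = (api_counts[label][api] + 1) / (n + 2)   # Laplace smoothing
            score += math.log(p if api in apis else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


toy_training_set = [  # toy data, for illustration only
    ({"OpenProcess", "WriteProcessMemory", "CreateRemoteThread"}, "malicious"),
    ({"VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"}, "malicious"),
    ({"CreateFileW", "ReadFile", "CloseHandle"}, "benign"),
    ({"CreateFileW", "WriteFile", "CloseHandle"}, "benign"),
]
model = train(toy_training_set)
print(predict(model, {"OpenProcess", "VirtualAllocEx", "CreateRemoteThread"}))
print(predict(model, {"CreateFileW", "ReadFile", "CloseHandle"}))
```

Updating such a detector means retraining on new samples, rather than adding one signature per new variant.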
OPcode and binary analysis
One level below API calls, we find opcodes, short for operation codes. An opcode is the part of a machine-language instruction that specifies which low-level operation to perform.
In 2013, Shanmugam, Low & Stamp proposed a detection method for metamorphic malware.[vi] By developing a similarity score based on the opcodes embedded in a sample, analogous to Jakobsen’s algorithm (a tool used in cryptanalysis), they achieved a very high detection rate against metamorphic malware.
Deshpande, Park & Stamp (2013) designed another detection method for metamorphic malware.[vii] Their approach is based on eigenvalue analysis, a linear-algebra technique also used in facial recognition, among other fields. By training their classifier on a set of malware samples, they reached detection rates above 90% both with the samples’ opcodes and with their binary structure, although the binary structure yielded better results.
Baysa, Low & Stamp (2013) detected metamorphic malware by focusing solely on the samples’ entropy.[viii] Their procedure analyzes the entropy of two binary files to divide each of them into a sequence of segments, then aligns the two sequences according to their Levenshtein distance.
After doing so, they calculate the similarity between the two sequences to assess the nature of the sample. Using this method, they were able to perfectly differentiate a mutated metamorphic worm from a benign sample by matching both against a non-mutated version of the worm. Another advantage of this technique is the processing time: 0.2s per MB on a standard PC.
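A much-simplified sketch of this idea follows. It is not the authors' implementation. It assumes fixed-size windows instead of a proper segmentation step, rounds each window's Shannon entropy to an integer to obtain a short "entropy profile", and scores two files by the Levenshtein distance between their profiles. The function names and the window size are hypothetical choices.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a non-empty block, in bits per byte (0 to 8)."""
    n = len(block)
    return -sum(c / n * math.log2(c / n) for c in Counter(block).values())


def entropy_profile(data: bytes, window: int = 64) -> List[int]:
    """One rounded entropy value per fixed-size window."""
    return [round(shannon_entropy(data[i:i + window]))
            for i in range(0, len(data), window)]


def levenshtein(a: List[int], b: List[int]) -> int:
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def similarity(a: bytes, b: bytes, window: int = 64) -> float:
    """1.0 for identical entropy profiles, lower as they diverge."""
    pa, pb = entropy_profile(a, window), entropy_profile(b, window)
    longest = max(len(pa), len(pb))
    return 1.0 if longest == 0 else 1 - levenshtein(pa, pb) / longest


low, high = bytes(64), bytes(range(64))     # entropy 0 and 6 bits per byte
original = low + high + low
mutated = low + high + high + low           # same layout, one extra segment
unrelated = high * 3
print(similarity(original, mutated), similarity(original, unrelated))
```

The mutated file keeps a profile close to the original even though its bytes were rearranged, while the unrelated file scores noticeably lower.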
Let us also note the work of Singh et al. (2015).[ix] Using support-vector machines (SVM), they combined the techniques employed in the aforementioned papers with others, such as simple substitution distance, hidden Markov models, opcode graph similarity and function call graph analysis. By combining these tools, they achieved a better detection rate than any single tool used alone. Another benefit is that, as sample obfuscation increases, the SVM detection rate drops more slowly than that of any single tool.
Sadly, none of the aforementioned techniques can detect brand-new malware, since they all need a set of samples to train a classifier or create a signature. Moreover, given the rise of interpreted languages over the past two decades, malware authors now use languages such as Python or PowerShell to carry out attacks. The problem is that, with static analysis, the behavior of programs written in an interpreted language is harder to predict than that of programs written in a compiled language. For this reason, and to improve detection rates, AVs had to turn to dynamic detection methods.
The next part of this series will review some of these dynamic techniques and how researchers and AV vendors use them for malware detection.
[i] Christodorescu, Mihai & Jha, Somesh. (2004). Testing malware detectors. ACM SIGSOFT Software Engineering Notes. 29. 34-44. 10.1145/1007512.1007518.
[ii] Bashari Rad, Babak & Masrom, Maslin & Ibrahim, Suhaimi. (2011). Evolution of Computer Virus Concealment and Anti-Virus Techniques: A Short Survey. International Journal of Computer Science Issues (IJCSI). 8. 113-121.
[iii] Koret, J. & Bachaalany, E. (Eds.). (2015). The Antivirus Hacker’s Handbook. Indianapolis: Wiley Publishing.
[iv] Eskandari, Mojtaba & Hashemi, Sattar. (2012). A graph mining approach for detecting unknown malwares. Journal of Visual Languages & Computing. 23. 154–162. 10.1016/j.jvlc.2012.02.002.
[v] Alazab, Mamoun & Venkatraman, Sitalakshmi & Watters, Paul & Alazab, Moutaz. (2011). Zero-day Malware Detection based on Supervised Learning Algorithms of API call Signatures. CRPIT. 121.
[vi] Shanmugam, Gayathri & Low, Richard & Stamp, Mark. (2013). Simple substitution distance and metamorphic detection. Journal of Computer Virology and Hacking Techniques. 9. 10.1007/s11416-013-0184-5.
[vii] Deshpande, Sayali & Park, Younghee & Stamp, Mark. (2013). Eigenvalue analysis for metamorphic detection. Journal of Computer Virology and Hacking Techniques. 10. 10.1007/s11416-013-0193-4.
[viii] Baysa, Donabelle & Low, Richard & Stamp, Mark. (2013). Structural entropy and metamorphic malware. Journal of Computer Virology and Hacking Techniques. 9. 10.1007/s11416-013-0185-4.
[ix] Singh, Tanuvir & Di Troia, Fabio & Visaggio, Corrado Aaron & Austin, Thomas & Stamp, Mark. (2015). Support vector machines and malware detection. Journal of Computer Virology and Hacking Techniques. 10.1007/s11416-015-0252-0.