As is well known, Cuckoo Sandbox is a malware analysis system that lets us customize both its processing and reporting stages. In this context, we can feed Cuckoo with Yara rules based not only on the content of a malware sample, but also on its behavior.
One of the most prominent issues when working with Yara rules is knowing how accurate they are. Unfortunately, Cuckoo Sandbox doesn’t include a feature for measuring this. For this reason, we developed yaraqa.py, a Python script that lets you test your own Yara rulesets in a flexible and customizable way.
YaraQA
yaraqa.py applies your Yara ruleset to a malware repository, a goodware repository, or both. For each file, it checks whether the file is expected to match a rule and updates internal counters, so that it can finally print a statistical summary showing how accurate the ruleset is. The script can handle these options:
To launch yaraqa.py successfully, we first need to fill in its configuration file, yaraqa.conf. In this file, you specify where to find your goodware and malware repositories, your static and memory Yara rules, and the parameters Cuckoo needs. The README file has further information about the directory organization and filename directives that yaraqa.py requires in order to work properly.
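The match-and-count loop described above can be sketched in a few lines of Python. This is a minimal illustration and not the actual yaraqa.py code: the names Counters and evaluate, the shape of the samples argument, and the matches callback are all hypothetical, and the sketch assumes you already know for each file whether the rule is expected to match it.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable, Tuple


@dataclass
class Counters:
    """Hypothetical counters; yaraqa.py may track different ones."""
    true_pos: int = 0   # expected to match, rule matched
    false_neg: int = 0  # expected to match, rule did not match
    false_pos: int = 0  # not expected to match, rule matched
    true_neg: int = 0   # not expected to match, rule did not match

    @property
    def total(self) -> int:
        return self.true_pos + self.false_neg + self.false_pos + self.true_neg

    @property
    def accuracy(self) -> float:
        # Fraction of files on which the rule behaved as expected.
        if self.total == 0:
            return 0.0
        return (self.true_pos + self.true_neg) / self.total


def evaluate(samples: Iterable[Tuple[Path, bool]],
             matches: Callable[[Path], bool],
             targeted: bool = False) -> Counters:
    """samples yields (path, should_match); matches(path) applies the rule."""
    counters = Counters()
    for path, should_match in samples:
        if targeted and not should_match:
            # Targeted mode (assumed semantics): only test files that
            # are expected to match the rule.
            continue
        hit = matches(path)
        if should_match and hit:
            counters.true_pos += 1
        elif should_match:
            counters.false_neg += 1
        elif hit:
            counters.false_pos += 1
        else:
            counters.true_neg += 1
    return counters
```

With the yara-python package installed, the callback could be built from a compiled rule, for example rules = yara.compile(filepath="decebal.yara") and then matches = lambda p: bool(rules.match(str(p))). Keeping the matcher as a parameter means the counting logic can be checked without any Yara rules at hand.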
Examples
To run a test, we must state at least which family we want to test. If no optional parameters are passed, this is the default execution:
> python yaraqa.py --family decebal
This runs yaraqa with the following default arguments:
> python yaraqa.py --family decebal --static --memory --malware --goodware
This means that yaraqa will test both static and memory Yara rules against the malware and goodware repositories. We could test only the memory rules with --memory, or both kinds with --all.
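The default-filling behavior described above can be modeled with argparse. The following is only a sketch of that logic, not yaraqa.py's real argument handling. The flag names come from the commands shown here, while the function name parse_args, the program name yaraqa-sketch, and the assumption that the method flags and the repository flags each fall back to "both" independently are made up for illustration.

```python
import argparse


def parse_args(argv):
    """Parse a yaraqa-like command line (hypothetical re-creation)."""
    parser = argparse.ArgumentParser(prog="yaraqa-sketch")
    parser.add_argument("--family", required=True)
    parser.add_argument("--static", action="store_true")
    parser.add_argument("--memory", action="store_true")
    parser.add_argument("--all", action="store_true")
    parser.add_argument("--malware", action="store_true")
    parser.add_argument("--goodware", action="store_true")
    args = parser.parse_args(argv)

    # --all is shorthand for --static --memory.
    if args.all:
        args.static = True
        args.memory = True

    # Assumption: with no method flag, test both static and memory rules.
    if not (args.static or args.memory):
        args.static = True
        args.memory = True

    # Assumption: with no repository flag, test against both repositories.
    if not (args.malware or args.goodware):
        args.malware = True
        args.goodware = True

    return args
```

Calling parse_args(["--family", "decebal"]) under these assumptions enables all four of static, memory, malware and goodware, which mirrors the expanded default command above.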
This is how we would test decebal.yara with both the memory and static methods against the malware repository, plotting the results and without saving a log file. In addition, we test the rules only against targeted files. Both commands are equivalent:
> python yaraqa.py --family decebal --memory --static --malware --plot --nolog --targeted
> python yaraqa.py --family decebal --all --malware --plot --nolog --targeted
Results
The screenshots here give an idea of what the script’s terminal output looks like and how the plot is generated. In this example, we first checked our static Yara rule for identifying decebal against the malware and goodware repositories. As we did not specify the --targeted option, all files in both repositories are used to test the rule (see Fig. 2).
Now we would like to check the memory Yara rule as well. This time we set the --targeted option: because memory Yara rules take more time than static ones, we do not check the memory rule against all existing files, only against the ones we know should match. We also ask yaraqa to generate a plot with --plot (see Fig. 3 and Fig. 4).
This is what the generated .svg plots look like. We could also import yaraqa in a Python script to launch multiple Yara tests and render a plot with all the data retrieved from yaraqa, as we can see in the last screenshots (see Fig. 6 and Fig. 7).
If you want to test it, you can download the code here.
You can also share your Yara rules or malware samples with us, or give us feedback, by emailing us at community@blueliv.com. Don’t miss the chance to join our community to fight against cyber crime.
Alberto Marín
Cyber Threat Analyst