Pushdo has had close ties to the Cutwail botnet since 2008, often acting as a dropper for it. The reader is reminded, however, that once malware executes on a system it can do almost anything its controller wants.
Code execution is code execution, regardless of whether the malware has previously been used for sending spam, creating traffic for DoS attacks, or exfiltrating stolen business secrets to a drop server used by an advanced persistent threat actor during a nation-state sponsored cyber-espionage campaign.
Previous versions of Pushdo have used DNS smokescreens, URL path randomization, and DGA fallback techniques to obscure command and control (C2) communication. Recently, a new variant of the Pushdo implant surfaced which uses a new algorithm to generate domains. In an attempt to sever Pushdo communications for our customers, we reverse engineered the Pushdo sample, isolated the functionality that generated domains, and reimplemented the algorithm’s logic.
Unpacking the Sample
Generally, malware authors tend not to ship their binaries in “plain text”. Once they have written and compiled their creations into an executable, they run it through a tool called a “crypter”. These tools typically cut up the input file into pieces, encrypt them, and place them into another executable which has been specially crafted to reassemble the payload and have it run. These outer shells are updated multiple times per day to evade detection by security software, and many even employ server-side polymorphism on the malware repository, which means each individual victim will receive a distinct copy of the malicious file. It’s not hard to see why naive signature-based detection techniques cannot keep up with this style of distribution. To recover the inner payload, there are a few things to look for during analysis. In a somewhat half-hearted attempt to guard the inner workings of the crypter, the code which actually reassembles the payload is itself encrypted, so it first needs to be found and decrypted.
The first duty of an analyst is to locate this initial decryption code. For the Pushdo sample we analyzed, the outer shell was compiled against Microsoft’s MFC. This is both a blessing and a curse. The downside is that MFC applications tend to be a complete mess of events and callbacks, and control flow is not always easy to determine statically. The advantage, however, is that the tables of object methods are trivial to find, and so in practice it never takes very long to find suspicious functionality:
In this custom window class, we found the function that decrypts the second stage of the crypter. With a little trial and error in a debugger in an isolated environment, the transfer of control can be found with relative ease: there’s a jump into the middle of some dynamically allocated memory, the first instructions of which are the typical “call/pop” combination so that this code can orient itself and locate all of the information it will need to reassemble the payload. However, debuggers are pretty poor environments for analyzing code, so we dumped out this memory region and imported it back into an IDA database:
We even get to see some of what this stage will be up to: a dab of dynamic import fetching, a hint of memory protection fiddling, and a standard trick of setting up the payload to run using the Windows API UnmapViewOfFile.
The thing to remember about this second stage of the crypter is that many of them tend to have a fatal flaw: they will reconstruct the original payload in memory as it was on disk before the crypter ran. Therefore, we don’t need to worry about the details of the bit-smashing done during reassembly. We just need to keep our eye out for PE parsing code… like this:
Sure enough, setting a breakpoint here and investigating the area pointed to by edx, we find the original malware executable. One “.writemem” later in WinDbg and we’ve totally dispatched the crypter.
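Because the payload is rebuilt in memory exactly as it was on disk, a dumped memory region can also be searched offline for the tell-tale PE layout. The following is a minimal sketch of that idea, not the tooling used in this analysis: the function name find_embedded_pe is hypothetical, and the check is deliberately shallow (an “MZ” marker whose e_lfanew field at offset 0x3C points at a “PE\0\0” signature).

```python
import struct

def find_embedded_pe(dump: bytes) -> list:
    """Return offsets in `dump` that look like the start of a PE image.

    Hypothetical helper for illustration: it only checks the DOS "MZ"
    marker and the "PE\\0\\0" signature that e_lfanew points to.
    """
    offsets = []
    pos = dump.find(b"MZ")
    while pos != -1:
        # The DOS header is 0x40 bytes; e_lfanew is its last field (offset 0x3C).
        if pos + 0x40 <= len(dump):
            (e_lfanew,) = struct.unpack_from("<I", dump, pos + 0x3C)
            sig = pos + e_lfanew
            # An out-of-range slice is simply empty, so this comparison is safe.
            if dump[sig:sig + 4] == b"PE\x00\x00":
                offsets.append(pos)
        pos = dump.find(b"MZ", pos + 1)
    return offsets
```

Any offset returned is only a candidate; the region still has to be carved out and validated (section table, sizes) before it can be treated as the recovered payload.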
Reverse Engineering the Communication Mechanisms
Similar to previous variants of Pushdo, this sample uses a smokescreen technique in an attempt to hide its actual command and control (C2) communications. One notable difference between this and previous versions of Pushdo is that this version has moved away from recognizable URL patterns, such as “/?ptrxcz_”. In addition, gone are the POSTs to vmw.com and youtube.com. Interestingly though, according to domain features from Investigate, the average popularity score for a smokescreen domain in the previous version of Pushdo was 42.45, while the average popularity of a smokescreen domain used by this variant is 39.19, only a slight decrease. OpenDNS has noticed a large increase in queries for all smokescreen domains used by this variant of Pushdo starting around July 4. One such increase is depicted below. The domain resolutions from Pushdo implants could be the cause of this increase.
The hardcoded list of 100 domains, pictured below, is resolved, and each domain is contacted using an HTTP POST request. Most of these domains have no ties to the Pushdo malware and are completely legitimate. The actual C2 and benign domains are sent HTTP requests with identical features (data content, static HTTP headers and user-agent, etc.). This muddies traffic analysis, potentially causing confusion for some analysts and automated analysis systems.
After resolving and contacting the hardcoded domains, the sample falls back to algorithmically generated domains with a hardcoded TLD of ‘.kz’. Similar to Cryptolocker, the Pushdo DGA is seeded on time. The algorithm which generates the domains is a shared secret between the threat actor controlling the botnet and the samples being distributed to compromise victim machines. This shared secret provides a layer of protection for the botnet by making its C2 domains a moving target for blacklists and takedowns. Once a high-level understanding of Pushdo’s callbacks was established, deeper analysis was undertaken to fully comprehend how the implant programmatically generated callback domains.
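The idea of a time-seeded DGA can be illustrated with a toy example. The sketch below is not the Pushdo algorithm: the SHA-256 hashing, the ten-letter labels, the five domains per day, and the function name toy_dga are all assumptions chosen for illustration. Only the date-based seed and the ‘.kz’ TLD come from the behavior described above.

```python
import hashlib
from datetime import date

def toy_dga(day: date, count: int = 5, tld: str = ".kz") -> list:
    """Generate `count` deterministic pseudo-random domains for `day`.

    Illustrative only: hash choice, label length and count are assumptions,
    not values recovered from the Pushdo sample.
    """
    domains = []
    for i in range(count):
        # The date (plus an index) is the only input, so anyone who knows
        # the algorithm can compute the same list for any given day.
        seed = f"{day.isoformat()}:{i}".encode()
        digest = hashlib.sha256(seed).digest()
        # Map the first ten digest bytes onto lowercase letters.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:10])
        domains.append(label + tld)
    return domains
```

The point of the example is the symmetry: the same determinism that lets the botnet operator register tomorrow’s domains in advance lets a defender who has recovered the algorithm block them in advance.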
Loading the payload that Pushdo drops into IDA presents an embarrassment of riches. A cursory glance at the strings shows numerous avenues of initial investigation:
We see all the hallmarks of malicious software: setting run keys so that the malware will start executing again when the victim’s computer reboots, “svchost.exe” to do some good old-fashioned thread injection, an “http://%s” to reach out to a generated domain name, and even the alphabet used to do Base64 encoding. Since we were focused on finding the DGA, we followed the network-related strings first, which led straight to the heart of the matter:
This piece of code was responsible for finding a command and control server, with some backup plans in case of failure. It would first generate the domains for the given day and try to contact them in order. If none of these domains could be reached, it would try the domains from the previous day. The process continued until either a server responded or all domains from the previous 30 days had been tried. In the case of continued failure, it would try contacting domains for 15 days in the future as well, finally giving up if none could be reached.
All that was left to do was to translate this piece of assembly code into something a little more user-friendly, and then hook up this domain generation script into our main malicious domain feed. One crontab entry later, and our customers had complete protection.
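The fallback order described above translates naturally into a blocklist window. The following is a rough sketch under stated assumptions, not the script used in our feed: the names candidate_days and blocklist are hypothetical, generate stands in for the reimplemented DGA (any function that maps a date to a list of domains), and the window is read as today, then 30 prior days, then 15 future days.

```python
from datetime import date, timedelta

def candidate_days(today: date, back: int = 30, ahead: int = 15):
    """Yield dates in the order the implant is described as trying them."""
    yield today
    # Walk backwards one day at a time, most recent first.
    for n in range(1, back + 1):
        yield today - timedelta(days=n)
    # Only after the past window is exhausted, try future dates.
    for n in range(1, ahead + 1):
        yield today + timedelta(days=n)

def blocklist(today: date, generate) -> list:
    """Collect every domain an implant might try when run on `today`.

    `generate` is a placeholder for the reimplemented DGA: a callable
    taking a date and returning that day's domains.
    """
    return [domain for day in candidate_days(today) for domain in generate(day)]
```

Run once a day, for example from cron, a function like blocklist(date.today(), generate) would cover the whole 46-day window an infected machine could fall back to.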
Working Smarter, Not Just Harder
These days, security vendors collect tens if not hundreds of thousands of malware samples daily. It might seem that in the face of such volume, it would not be possible to complete such a detailed analysis on every file. This is correct: it is not possible, but that is not the whole story. The malware samples received daily are not all *essentially different* from one another. Many, if not most, of the differences are just the daily variations in the crypters used to hide the main payloads. The core malicious functionality changes at a far slower pace, with the exact same malicious binary being distributed for weeks or months at a time. This is where security companies can apply the most pressure.
By doing deep analysis like reverse engineering the DGA, we force the malware authors to change a core piece of functionality, or risk losing control of their existing infrastructure. Making such a change is slower and more error-prone than just running their binary through yet another crypter to evade detection for a day, and often doesn’t happen. For example, the differences between the DGA in this version of Pushdo and the first version of the DGA, which appeared in 2013, are mostly cosmetic. A few constants in the algorithm changed, along with the table of letters used to generate the domain name fragments, but the structure of the algorithm is virtually identical.
The image below shows the proactive blocking of a Pushdo C&C domain prior to it ever having been queried by an OpenDNS user.