Content Type Obfuscation
Polyglots
A polyglot is a file that is valid as several types simultaneously. It may bypass filters and evade security countermeasures.
Types
Simple polyglot file: the file is valid as several types, and which one is seen depends on how the file is handled.
Ambiguous file: the file is interpreted differently depending on the parser. One parser may crash or fail to process it, while another may return a valid file.
Chimera file: some of the file's data is shared and interpreted as different types.
Use in Malware
…allows remote attackers to execute arbitrary code or cause a denial of service (memory corruption) via (1) a crafted Flash application in a .pdf file or (2) a crafted .swf file, related to authplay.dll, as exploited in the wild in July 2009.
Strategies
Stacks: Data is appended to the file.
Cavities: Use blank (unused) space in the file.
Parasites: Use comments or metadata fields that allow content to be written.
Zippers: Mutual comments, where each format hides the other's data inside its own comments.
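The "stack" strategy is the easiest one to sketch. The snippet below is a minimal illustration, assuming a ZIP archive as the appended type; the host bytes are a hypothetical stand-in (a JPEG signature plus filler, not a real image) and build_stack_polyglot is a name made up for this example.

```python
import io
import zipfile


def build_stack_polyglot(host: bytes, name: str, payload: bytes) -> bytes:
    """Append a ZIP archive to arbitrary host bytes (the "stack" strategy)."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(name, payload)
    return host + archive.getvalue()


# Hypothetical host: a JPEG signature followed by filler, not a real image.
host = b"\xff\xd8\xff\xe0" + b"\x00" * 64
polyglot = build_stack_polyglot(host, "note.txt", b"hidden")

# A check on the first bytes sees a JPEG signature...
starts_like_jpeg = polyglot.startswith(b"\xff\xd8\xff")

# ...while a ZIP reader, which locates its directory from the end of the
# file, still finds and extracts the archive member.
with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
    extracted = zf.read("note.txt")
```

This works because Python's zipfile module, like many ZIP readers, tolerates data placed before the archive.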
Empty Space
Some file formats tolerate empty or unused space.
It can sit before, in the middle of, or after the actual content (appended).
Most common in block formats (ISO and ROM dumps, TAR archives).
NAND dumps, ROM dumps, and ISOs are directly mapped to sectors.
Some formats allow arbitrary bytes before the file starts (e.g. PDF).
PDFs are processed from the end.
“Empty space” can be abused to inject crafted content.
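As a rough sketch of such a cavity in a block format, the snippet below uses a TAR archive built with Python's tarfile module. It assumes the writer pads the archive with zero bytes after the end-of-archive marker (CPython's tarfile does); the helper names and the INJECTED marker are hypothetical.

```python
import io
import tarfile

BLOCK = 512  # TAR block size in bytes


def make_tar(name: str, data: bytes) -> bytearray:
    """Build an uncompressed TAR archive holding one member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return bytearray(buf.getvalue())


def list_members(raw: bytes) -> dict:
    """Return {member name: content} as seen by a TAR reader."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers()}


def cavity_start(raw: bytes) -> int:
    """Offset just past the last member's data and the two zero end blocks."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:") as tar:
        last = tar.getmembers()[-1]
    data_blocks = (last.size + BLOCK - 1) // BLOCK
    return last.offset_data + data_blocks * BLOCK + 2 * BLOCK


archive = make_tar("hello.txt", b"hello")
before = list_members(bytes(archive))

start = cavity_start(bytes(archive))
marker = b"INJECTED"  # hypothetical stand-in for crafted content
cavity_was_empty = not any(archive[start:])
if cavity_was_empty and len(archive) - start >= len(marker):
    archive[-len(marker):] = marker  # write into the unused padding

after = list_members(bytes(archive))
```

The reader stops at the end-of-archive marker, so the member listing is the same before and after the padding is modified.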
A simple bash-pdf polyglot
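The following is one possible way to generate such a file, not necessarily the example originally shown here. It is a sketch under assumptions: the shell script exits before the interpreter reaches the PDF bytes, the PDF header lands within the first 1024 bytes, and the XREF offsets are written as absolute file offsets. The function name and the script text are hypothetical.

```python
def build_pdf(base_offset: int = 0) -> bytes:
    """Build a minimal one-page PDF; offsets are shifted by base_offset."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] >>",
    ]
    out = bytearray(b"%PDF-1.5\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(base_offset + len(out))
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_at = base_offset + len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for the reserved object 0
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)


# The shell part: "exit 0" makes the shell stop before the PDF bytes.
SCRIPT = b'#!/bin/bash\necho "hello from the shell"\nexit 0\n'

polyglot = SCRIPT + build_pdf(base_offset=len(SCRIPT))

# To try it: write `polyglot` to a file, run it with bash, then open the
# same file in a PDF reader.
```

Readers differ in how strictly they treat data before the header, so the result should be checked against the specific viewer of interest.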
Why?
PDF is a collection of objects.
Objects are dictionaries of properties with a named type.
These are called COS objects, after the Carousel Object System.
Objects are added to the file incrementally: each new revision creates new objects that are appended.
A PDF can have unused objects.
Objects can contain executable code (the code is not executed by the PDF reader!).
Objects can contain anything!
Well… there is the Launch action, and JavaScript is a valid action type…
A simple object
Two objects
Two objects and something else that is not parsed
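The three cases named in these headings can be sketched as byte strings. The object contents (/Type /Example and the notes) are hypothetical, and the regex scan is only an illustration: a real reader locates objects through the XREF table rather than by searching the file.

```python
import re

# A simple object: "<number> <generation> obj", a dictionary, "endobj".
simple = b"1 0 obj\n<< /Type /Example /Note (hello) >>\nendobj\n"

# Two objects, one after the other.
second = b"2 0 obj\n<< /Type /Example /Note (world) >>\nendobj\n"
two = simple + second

# Two objects and something else in between that is not parsed.
with_junk = simple + b"echo 'not a PDF object'\n" + second

OBJ_RE = re.compile(rb"(\d+) (\d+) obj\b(.*?)\bendobj", re.DOTALL)


def find_objects(data: bytes) -> list:
    """Return (number, generation, body) for each object found in data."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3).strip())
        for m in OBJ_RE.finditer(data)
    ]


objects_plain = find_objects(two)
objects_with_junk = find_objects(with_junk)
```

Both scans return the same two objects; the extra line between them is simply skipped.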
The XREF Table
A PDF has a cross-reference (XREF) table with the byte offset of every object.
At the end!
The reader skips to the end of the file, reads the table and parses the objects.
That’s one reason why it ignores garbage between objects.
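The XREF table also defines where the file magic (%PDF-1.5\n\n) is.
There may be some bytes before the magic.
Up to roughly 1024 arbitrary bytes are tolerated: Acrobat only requires the header to appear somewhere within the first 1024 bytes of the file.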
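A minimal sketch of this end-first reading order follows. It assumes a classic (uncompressed) XREF table with a single subsection and absolute offsets; sample_pdf builds a hypothetical, deliberately incomplete test file, and none of the function names come from a real PDF library.

```python
def sample_pdf(prefix: bytes) -> bytes:
    """Hypothetical two-object file with `prefix` bytes before the header."""
    out = bytearray(prefix + b"%PDF-1.5\n")
    first_at = len(out)
    out += b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    out += b"this line is garbage between objects\n"
    second_at = len(out)
    out += b"2 0 obj\n(unused object)\nendobj\n"
    xref_at = len(out)
    out += b"xref\n0 3\n0000000000 65535 f \n"
    out += b"%010d 00000 n \n%010d 00000 n \n" % (first_at, second_at)
    out += b"trailer\n<< /Size 3 /Root 1 0 R >>\n"
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)


def read_xref(data: bytes) -> dict:
    """Map object numbers to byte offsets, starting from the end of the file."""
    # 1. The last "startxref" keyword gives the offset of the table.
    tail = data.rfind(b"startxref")
    xref_at = int(data[tail + len(b"startxref"):].split()[0])
    lines = data[xref_at:].split(b"\n")
    if lines[0].strip() != b"xref":
        raise ValueError("startxref does not point at an xref table")
    first, count = map(int, lines[1].split())
    # 2. Each in-use ("n") entry maps an object number to its offset.
    table = {}
    for i, line in enumerate(lines[2:2 + count]):
        offset, _generation, kind = line.split()
        if kind == b"n":
            table[first + i] = int(offset)
    return table


def header_offset(data: bytes) -> int:
    """Position of the magic within the first 1024 bytes, or -1."""
    return data[:1024].find(b"%PDF-")


prefix = b"#!/bin/sh\nexit 0\n"
pdf = sample_pdf(prefix)
table = read_xref(pdf)
magic_at = header_offset(pdf)
```

Because objects are reached through the table, the garbage line between the two objects is never visited.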
Practical application
Malware makes use of polyglots as a means to circumvent filters.
A packet, email, or web application firewall will block executables, but will it block JPGs? If it does, can it do so with a low rate of false positives?
The general process involves downloading a polyglot and a decoder.
The polyglot contains the malicious code.
The decoder is implemented in a less suspicious manner (e.g., JavaScript).
From a Reversing Perspective: how much effort will we spend analyzing a JPG?
Automated tools such as binwalk, TrId and file can help (but are limited).
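To give a feel for what such tools look for, here is a deliberately naive signature scan. It is not how binwalk, TrID or file are actually implemented; the signature list is a small hand-picked assumption, and short signatures will produce false positives on real data.

```python
# A few well-known magic numbers; the selection is an assumption.
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "pdf": b"%PDF-",
    "zip": b"PK\x03\x04",
    "elf": b"\x7fELF",
}


def scan(data: bytes) -> list:
    """Return sorted (offset, type) pairs for every signature occurrence."""
    hits = []
    for name, magic in SIGNATURES.items():
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, name))
            pos = data.find(magic, pos + 1)
    return sorted(hits)


def looks_like_polyglot(data: bytes) -> bool:
    """Flag data in which more than one distinct file type is detected."""
    return len({name for _, name in scan(data)}) > 1


# Hypothetical sample: a JPEG signature, filler, then a ZIP local header.
sample = b"\xff\xd8\xff\xe0" + b"\x00" * 32 + b"PK\x03\x04" + b"\x00" * 16
hits = scan(sample)
```

A hit at a non-zero offset is only a hint that something is embedded or appended; confirming it still requires parsing the candidate format.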