Content Type Obfuscation

Polyglots

A file that has several types simultaneously may bypass filters and evade security countermeasures.

Types

  • Simple polyglot file: the file is valid as several types; which type is seen depends on how it is handled.

  • Ambiguous file: a file that different parsers interpret differently. One parser may crash or fail to process it, while another returns a valid file.

  • Chimera file: some of the file's data is interpreted as different types.

Use in Malware

…allows remote attackers to execute arbitrary code or cause a denial of service (memory corruption) via (1) a crafted Flash application in a .pdf file or (2) a crafted .swf file, related to authplay.dll, as exploited in the wild in July 2009.

https://nvd.nist.gov/vuln/detail/CVE-2009-1862

Strategies

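  • Stacks: data is appended to the file.

  • Cavities: data is placed in blank (unused) space in the file.

  • Parasites: data is stored in comments or metadata fields that accept arbitrary content.

  • Zippers: two files are interleaved, each hiding its chunks inside the other's comments (mutual comments).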
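A stack is the easiest strategy to demonstrate. The sketch below, assuming Python 3.9+ and a hypothetical helper `make_stack_polyglot`, appends a ZIP archive to a shell script. ZIP readers locate the end-of-central-directory record by scanning backwards from the end of the file, so Python's `zipfile` still opens the result, while a shell running the script is expected to stop at `exit 0` before it reaches the binary data.

```python
import io
import zipfile


def make_stack_polyglot(cover: bytes, files: dict[str, bytes]) -> bytes:
    """Return `cover` with a freshly built ZIP archive appended (a stack)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return cover + buf.getvalue()


cover = b"#!/bin/sh\necho 'just a shell script'\nexit 0\n"
poly = make_stack_polyglot(cover, {"hidden.txt": b"stacked data"})

# zipfile finds the end-of-central-directory record from the end of the file
# and compensates for the bytes prepended in front of the archive.
with zipfile.ZipFile(io.BytesIO(poly)) as zf:
    print(zf.read("hidden.txt"))  # b'stacked data'
```

The same file is a shell script when run with `sh` and a ZIP archive when opened by an unzip tool: which type "wins" depends only on the handler.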

Empty Space

Some file formats tolerate empty or unused space:

  • Before, in the middle of, or after the actual content (appended).

  • Most common in block-based formats (ISO and ROM dumps, TAR archives).

    • NAND dumps, ROM dumps, and ISOs are directly mapped to sectors.

  • Some formats allow arbitrary bytes before the file starts (e.g. PDF).

    • PDF readers start parsing at the end of the file.

“Empty space” can be abused to inject crafted content.
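Abusing a cavity means finding a run of filler bytes large enough for the payload and overwriting it in place, so the file size and the surrounding structure stay unchanged. A minimal sketch with hypothetical helper names, assuming the filler byte is 0x00 (erased NAND pages, for instance, typically read as 0xFF instead):

```python
def find_cavity(data: bytes, size: int, filler: int = 0x00) -> int:
    """Return the offset of the first run of `size` filler bytes, or -1."""
    return data.find(bytes([filler]) * size)


def inject_into_cavity(data: bytes, payload: bytes, filler: int = 0x00) -> tuple[bytes, int]:
    """Overwrite the first large-enough cavity with `payload`; the length is unchanged."""
    offset = find_cavity(data, len(payload), filler)
    if offset < 0:
        raise ValueError("no cavity large enough for the payload")
    patched = data[:offset] + payload + data[offset + len(payload):]
    return patched, offset


# Toy "image": a header, 16 bytes of zero padding, then more content.
image = b"HDR" + bytes(16) + b"DATA"
patched, offset = inject_into_cavity(image, b"hello")
print(offset, patched)
```

A parser that only looks at the header and the data it references never notices the bytes sitting in the padding.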

A simple bash-pdf polyglot

Why?

PDF is a collection of objects.

  • Objects are dictionaries of properties with a named type.

  • They are called COS objects (Carousel Object System).

  • Objects are added to the file: new revisions append new objects instead of rewriting existing ones.

  • A PDF can have unused objects.

  • Objects can contain executable code (which the PDF reader does not execute!).

    • Objects can contain anything!

    • Well… there is the Launch action, and JavaScript is a valid action type…

A simple object

1 0 obj
<</Length 100>>
stream
…100 bytes…
endstream
endobj
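Getting /Length right is the fiddly part of writing a stream by hand. Per the PDF specification, it counts the bytes between the end-of-line after `stream` and the end-of-line before `endstream`. A minimal sketch with a hypothetical `stream_object` helper:

```python
def stream_object(num: int, data: bytes, gen: int = 0) -> bytes:
    """Serialize `data` as an indirect stream object with a matching /Length."""
    header = f"{num} {gen} obj\n<</Length {len(data)}>>\nstream\n".encode("ascii")
    # The EOL before "endstream" is not counted in /Length.
    return header + data + b"\nendstream\nendobj\n"


obj = stream_object(1, b"A" * 100)
print(obj.decode("ascii"))
```

Because /Length says exactly how many bytes to read, the stream body can hold anything, including shell commands or another file format.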

Two objects

1 0 obj
<</Length 100>>
stream
…100 bytes…
endstream
endobj
2 0 obj
<</Length 100>>
stream
…100 bytes…
endstream
endobj

Two objects and something else that is not parsed

1 0 obj
<</Length 100>>
stream
…100 bytes…
endstream
endobj
I should not be here, but who cares. And I could be anywhere
2 0 obj
<</Length 100>>
stream
…100 bytes…
endstream
endobj
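A conforming reader relies on the XREF table (next section) rather than scanning the file. Still, a naive scanner shows why the garbage line is harmless: anything outside an "N G obj ... endobj" pair never becomes part of an object. The sketch below uses a hypothetical `scan_objects` helper and loosely resembles the fallback some readers use to repair files whose XREF table is damaged; it is not how any particular reader is implemented.

```python
import re

# "N G obj ... endobj"; non-greedy, so it breaks if a stream itself contains "endobj".
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b", re.DOTALL)


def scan_objects(data: bytes) -> dict[tuple[int, int], bytes]:
    """Map (object number, generation) to the raw object body, skipping everything else."""
    return {
        (int(m.group(1)), int(m.group(2))): m.group(3).strip()
        for m in OBJ_RE.finditer(data)
    }


sample = b"""1 0 obj
<</Length 5>>
stream
hello
endstream
endobj
I should not be here, but who cares. And I could be anywhere
2 0 obj
<</Length 5>>
stream
world
endstream
endobj
"""
for key, body in scan_objects(sample).items():
    print(key, body)
```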

The XREF Table

A PDF has a cross-reference (XREF) table holding the byte offset of every object.

  • At the end!

  • The reader jumps to the end of the file, reads the table, and parses the objects.

    • That’s one reason why it ignores garbage between objects.
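The sketch below, with hypothetical helpers `build_pdf` and `read_xref`, writes a tiny PDF with junk bytes after every object and then reads it the way described above: find `startxref` near the end, follow it to the table, and use the recorded offsets. The junk never appears in the table, so the reader never looks at it. Assumptions: a single XREF subsection, and a skeleton catalog with an empty page tree that shows the structure but is not guaranteed to open in every viewer.

```python
def build_pdf(bodies: list[bytes], junk: bytes = b"", version: str = "1.5") -> bytes:
    """Write objects 1..n (object 1 is assumed to be the catalog), optional
    junk after each object, then the XREF table, trailer and startxref."""
    out = bytearray(f"%PDF-{version}\n".encode("ascii"))
    offsets = []
    for num, body in enumerate(bodies, start=1):
        offsets.append(len(out))  # byte offset of "num 0 obj"
        out += f"{num} 0 obj\n".encode("ascii") + body + b"\nendobj\n"
        out += junk  # never referenced by the XREF table
    xref_offset = len(out)
    size = len(bodies) + 1  # entry 0 is the mandatory free-list head
    out += f"xref\n0 {size}\n".encode("ascii")
    out += b"0000000000 65535 f \n"  # each entry is exactly 20 bytes
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode("ascii")
    out += f"trailer\n<</Size {size} /Root 1 0 R>>\n".encode("ascii")
    out += f"startxref\n{xref_offset}\n%%EOF\n".encode("ascii")
    return bytes(out)


def read_xref(pdf: bytes) -> list[int]:
    """Mimic the reader: go to the end, follow startxref, read in-use offsets.
    Handles only a single XREF subsection."""
    tail = pdf[-1024:]
    pos = tail.rindex(b"startxref")
    xref_offset = int(tail[pos + len(b"startxref"):].split()[0])
    lines = pdf[xref_offset:].split(b"\n")
    if lines[0].strip() != b"xref":
        raise ValueError("startxref does not point at an xref table")
    _first, count = map(int, lines[1].split())
    entries = lines[2:2 + count]
    return [int(e[:10]) for e in entries if e[17:18] == b"n"]


pdf = build_pdf(
    [b"<</Type /Catalog /Pages 2 0 R>>", b"<</Type /Pages /Kids [] /Count 0>>"],
    junk=b"I should not be here, but who cares.\n",
)
for off in read_xref(pdf):
    print(off, pdf[off:off + 7])
```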

The XREF table also defines where the file magic (%PDF-1.5\n\n) is.

  • There may be some bytes before the magic.

  • Up to 1024 arbitrary bytes are allowed.
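This header tolerance is what makes the bash-pdf polyglot possible: shell commands go in front of the magic, and `exit` stops the shell before it reaches the PDF bytes. A minimal sketch with hypothetical helpers `find_pdf_header` and `make_bash_pdf`, taking the 1024-byte window above as an assumption (actual reader behavior varies). It does not rewrite the XREF offsets; whether a viewer accepts them unchanged depends on how it resolves offsets relative to the header, which should be verified per reader.

```python
HEADER_WINDOW = 1024  # tolerance discussed above; actual reader behavior varies


def find_pdf_header(data: bytes, window: int = HEADER_WINDOW) -> int:
    """Offset of the %PDF- magic if it lies entirely within the first `window` bytes, else -1."""
    return data.find(b"%PDF-", 0, window)


def make_bash_pdf(script: str, pdf: bytes) -> bytes:
    """Prefix `pdf` with shell commands; `exit 0` stops the shell before the PDF bytes."""
    if not pdf.startswith(b"%PDF-"):
        raise ValueError("pdf must start with the %PDF- magic")
    prefix = script.rstrip("\n").encode("utf-8") + b"\nexit 0\n"
    poly = prefix + pdf
    if find_pdf_header(poly) != len(prefix):
        raise ValueError("script too long: the magic falls outside the header window")
    return poly


poly = make_bash_pdf("echo 'hello from bash'", b"%PDF-1.5\n%%EOF\n")
print(find_pdf_header(poly))
```

The intent is that `bash poly.pdf` runs the script lines and exits, while a lenient PDF reader skips the prefix and finds the magic inside the window.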

Practical application

Malware makes use of polyglots as a means to circumvent filters.

  • A packet, email, or web application firewall will block executables, but will it block JPGs? And if it does, can it do so with a low false-positive rate?

The general process involves downloading a polyglot and a decoder.

  • The polyglot contains the malicious code.

  • The decoder is implemented in a less suspicious form (e.g., JavaScript).

From a reversing perspective: how much effort will we spend analyzing a JPG?

  • Automated tools such as binwalk, TrID, and file can help (but they are limited).
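On the defensive side, a filter that lets JPGs through could at least check for bytes trailing the image. Searching for the first FF D9 (EOI) is not enough, because that sequence can also appear inside metadata such as an embedded thumbnail. The sketch below, with hypothetical helpers `jpeg_end_offset` and `trailing_data`, walks the segment structure instead; it assumes reasonably well-formed files and does not try to handle every malformed case.

```python
import struct


def jpeg_end_offset(data: bytes) -> int:
    """Return the offset just past the EOI marker (FF D9) by walking segments,
    so FF D9 inside metadata is skipped rather than mistaken for the end."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
        elif marker == 0xD9:  # EOI: end of image
            return pos + 2
        elif 0xD0 <= marker <= 0xD7 or marker == 0x01:  # markers without a length field
            pos += 2
        else:
            if pos + 4 > len(data):
                break
            (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
            pos += 2 + length  # the length field counts itself
            if marker == 0xDA:  # SOS: skip entropy-coded data up to the next real marker
                while pos + 1 < len(data) and (
                    data[pos] != 0xFF or data[pos + 1] == 0x00 or 0xD0 <= data[pos + 1] <= 0xD7
                ):
                    pos += 1
    raise ValueError("no EOI marker found")


def trailing_data(data: bytes) -> bytes:
    """Bytes after the end of the JPEG: empty for a plain image."""
    return data[jpeg_end_offset(data):]


# Toy JPEG: SOI, an APP1 segment containing FF D9, a fake SOS header,
# entropy-coded bytes with a stuffed FF 00 and an RST0 marker, then EOI.
app1 = b"\xff\xe1" + struct.pack(">H", 6) + b"\xff\xd9\xff\xd9"
sos = b"\xff\xda" + struct.pack(">H", 4) + b"\x00\x00"
scan = b"\x12\xff\x00\x34\xff\xd0\x56"
jpeg = b"\xff\xd8" + app1 + sos + scan + b"\xff\xd9"
print(trailing_data(jpeg + b"appended payload"))  # b'appended payload'
```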
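To get a feel for what tools like binwalk do (far more thoroughly), a naive scan can report every known magic number found anywhere in a file, not only at offset 0 as a header-based check would. A rough sketch with a hypothetical `scan_signatures` helper; the signature list is a small, hand-picked assumption and will produce false positives on random data.

```python
SIGNATURES = {
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP local file header",
    b"\xff\xd8\xff": "JPEG image",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"GIF89a": "GIF image",
    b"\x7fELF": "ELF executable",
}


def scan_signatures(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, type) for every signature occurrence, sorted by offset."""
    hits = []
    for magic, name in SIGNATURES.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, name))
            start = data.find(magic, start + 1)
    return sorted(hits)


sample = b"\xff\xd8\xff\xe0" + b"junk" + b"PK\x03\x04" + b"more" + b"%PDF-1.5"
for offset, name in scan_signatures(sample):
    print(offset, name)
```

A file that reports more than one type, or a type at a non-zero offset, is a candidate polyglot worth a closer look.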
