P olym orp h ic B len d in g A ttack swenke.gtisc.gatech.edu/papers/usenix_security_2006.pdfP olym...

Polymorphic Blending Attacks

Prahlad Fogla Monirul Sharif Roberto Perdisci Oleg Kolesnikov Wenke LeeCollege of Computing, Georgia Institute of Technology

801 Atlantic Drive, Atlanta, Georgia 30332{prahlad, msharif, rperdisc, ok, wenke}@cc.gatech.edu

Abstract

A very effective means to evade signature-based intru-sion detection systems (IDS) is to employ polymor-phic techniques to generate attack instances that donot share a fixed signature. Anomaly-based intrusiondetection systems provide good defense because existingpolymorphic techniques can make the attack instanceslook different from each other, but cannot make themlook like normal. In this paper we introduce a newclass of polymorphic attacks, called polymorphic blend-

ing attacks, that can effectively evade byte frequency-based network anomaly IDS by carefully matching thestatistics of the mutated attack instances to the normalprofiles. The proposed polymorphic blending attackscan be viewed as a subclass of the mimicry attacks. Wetake a systematic approach to the problem and formallydescribe the algorithms and steps required to carry outsuch attacks. We not only show that such attacks arefeasible but also analyze the hardness of evasion underdifferent circumstances. We present detailed techniquesusing PAYL, a byte frequency-based anomaly IDS, as acase study and demonstrate that these attacks are indeedfeasible. We also provide some insight into possiblecountermeasures that can be used as defense.

1 Introduction

In the continuing arms race in computer and networksecurity, a common trend is that attackers are employingpolymorphic techniques. Toolkits such as ADMmutate[17], PHATBOT [10], and CLET [5] are available fornovices to generate polymorphic attacks. The purpose ofusing polymorphism is to evade detection by an intrusiondetection system (IDS). Every instance of a polymor-phic attack looks different and yet carries out the samemalicious activities. For example, the payload of eachinstance of a polymorphic worm can have different bytecontents. It follows that signature-based (misuse) IDSmay not reliably detect a polymorphic attack becauseit may not have a fixed or predictable signature, or

because the invariant parts of the attack may not besufficient to construct a signature that produces veryfew false positives. On the other hand, each instanceof a polymorphic attack needs to contain exploit codethat is typically not used in normal activities. Thus,each instance looks different from normal. Existingpolymorphic techniques [28] focus on making the attackinstances look different from each other, and not much onmaking them look like normal. This means that networkpayload anomaly detection systems can provide a gooddefense against the current generation of polymorphicattacks. However, if a polymorphic attack can blendin with (or look like) normal traffic, it can successfullyevade an anomaly-based IDS that relies solely on pay-load statistics.

In this paper, we show that it is possible to evadenetwork anomaly IDS based on payload statistics us-ing a class of polymorphism that we call polymorphicblending. A polymorphic blending attack is a polymor-phic attack that also has the ability to evade a payloadstatistics-based anomaly IDS. In addition to making allthe mutated attack instances different, an attacker (or theattack code) attempts to make them appear normal bytransforming each instance in such a way that its payloadcharacteristics (e.g., the byte frequency distribution) fitthe normal profile used by the anomaly IDS. Sincepolymorphic blending attacks try to evade the IDS bymaking the attacks look like normal, they can be viewedas a subclass of the mimicry attacks [29, 33].

This paper makes several contributions. We studythe class of polymorphic blending attacks against bytefrequency-based network anomaly IDS, which was in-troduced by Kolesnikov et al. in [12]. We present thegeneral techniques and design considerations for suchattacks. We provide rationales of why these attacks arepractical and show that network anomaly IDS based onpayload statistics do not guarantee adequate protectionagainst sophisticated attacks.

Using 1-gram and 2-gram PAYL [35, 36] as a case

study, we take a systematic approach to the problemand describe the necessary steps required to carry outan effective attack. Our work provides insight into notonly how such an attack can be performed, but alsohow hard it is to launch these attacks under differentcircumstances. We analyze the amount of learning re-quired for the attacker and the time and space complexityrequired for blending. We use a real attack vector [8] toimplement a polymorphic blending attack and provideexperimental evidence that our attack can effectivelyevade detection. We also discuss possible countermea-sures that a defender (e.g., IDS designer or operator)can take to decrease the likelihood that a polymorphicblending attack will succeed.

Organization of the paper The rest of the paper isorganized as follows. We discuss related work in poly-morphic attacks and detection in Section 2. In Section 3,we introduce polymorphic blending attacks and discussthe general techniques and design issues of polymorphicblending attacks. We present our case study in Section 4and conclude the paper in Section 5.

2 Related Work

Transforming attack packets to avoid detection is acommon practice among attackers. Attackers can exploitthe ambiguities present in the traffic stream to trans-form an attack instance to another so that an IDS isnot able to recognize the attack pattern. IP and TCPtransformations ([11, 22]) techniques are used to evadeNIDS that analyzes TCP/IP headers. Vigna et al. [31]discussed multiple network, application and exploit layer(shellcode polymorphism) mutation mechanisms. Aformal model to combine multiple transformations waspresented by Rubin et al. [24]. Multiple tools suchas Fragroute [26], Whisker [23], and AGENT [24] areavailable that can perform attack mutation.

Code polymorphism has been used extensively byvirus writers to write polymorphic viruses. Mistfall, tPE,EXPO, and DINA [28, 37] are some of the polymorphicengines used by virus writers. Worm writers have alsostarted using polymorphic engines. ADMmutate [17],PHATBOT [10], and JempiScodes [25] are some ofthe polymorphic shellcode generators commonly used towrite polymorphic worms. Garbage and NOP insertions,register shuffling, equivalent code substitution, and en-cryption/decryption are some of the common techniquesused to write polymorphic shellcodes.

Quite a few approaches have been proposed to detectpolymorphic attacks. In [30], Toth et al. proposed atechnique to locate the presence of executable shellcodeinside the payload. They used abstract execution ofnetwork flows to find the MEL (Maximum ExecutableLength) of the payload. The flow is marked suspiciousif its MEL is above certain length. Chinchani et al. [2]

performed fast static analysis to check if a networkflow contains exploit code. STRIDE [1] focuses ondetecting polymorphic sleds used by buffer overflowattacks. In [14], Kruegel et al. used structural analysis ofbinary code to find similarities between different worminstances. Using a graph coloring technique on a worm’scontrol flow graph, this approach is able to accuratelymodel the structure of the worm. Given a set of suspi-cious flows, Polygraph [20] generates a set of disjoint in-variant substrings that are present in multiple suspiciousflows. These substrings can then be used as a signatureto detect worm instances. In a recent work, Perdisci et al.[21] proposed an attack on Polygraph [20] where noiseis injected into the dataset of suspicious flows so thatPolygraph is not able to generate a reliable signature forthe worm. Shield [34] uses transport layer filters to blockthe traffic that exploits a known vulnerability. Filters areexploit-independent, and vulnerabilities are described asa partial state machines of the vulnerable application.In [3], Christodorescu et al. proposed an instructionsemantics based worm detection technique. The pro-posed approach can detect code polymorphism that usesinstruction reordering, register shuffling, and garbageinsertions. It is worth noting that unless the attackercombines the polymorphic blending attack proposed inthis paper with other evasion techniques, the approachescited above [1, 2, 3, 14, 20, 30, 34] may be able to detectthe attack. We further discuss possible countermeasuresagainst the polymorphic blending attack in Section 4.7.

A number of attacks aimed at evading Host-basedanomaly IDS have been developed. Wagner et al. [33]and Tan et al. [29] presented mimicry attacks against thestide model [9] developed by Forrest et al. The mainidea behind these mimicry attacks was to inject dummysystem calls into an attack sequence to make the finalsystem call sequence look similar to the normal systemcall sequence. As a defense against mimicry attacks aswell as other impossible path attacks [7, 32], more ad-vanced detection approaches (e.g., [6, 7]) were proposed,which use call stack information along with the systemcall sequences. Recently, a more sophisticated mimicryattack was proposed by Kruegel et al. [13], which canevade most system call based anomaly IDS.

Several application payload-based anomaly IDS [15,18, 19] have been proposed which monitor the payloadof a packet for anomalies. In [16], Kruegel et al. pro-posed four different models, namely, length, characterdistribution, probabilistic grammar, and token finder, forthe detection of HTTP attacks. PAYL, proposed byWang and Stolfo [35], records the average frequencyof occurrences of each byte in the payload of a normalpacket. A separate profile is created for each port andpacket length. In their recent work [36], the authorssuggested an improved version of PAYL that computes

several profiles for each port. At the end of the training,clustering is performed to reduce the number of profiles.They proposed that instead of byte frequency, one canalso use an n-gram model in a similar fashion. One maindrawback of the system is that they do not consider anadvanced attacker, who may know the IDS running atthe target and actively try to evade it. In this paper weprovide strong evidence that such byte frequency basedanomaly IDS are open to attacks and may be easilyevaded.

CLET [5], an advanced polymorphic engine, comesclosest to our polymorphic blending attack. It performsspectrum analysis to evade IDS that use data miningmethods for worm detection. Given an attack pay-load, CLET adds padding bytes in a separate cramming

bytes zone (of a given length) to try and make thebyte frequency distribution of the attack close to thenormal traffic. However, the encoded shellcode (usingXOR) in CLET may still deviate significantly from thenormal distribution and the obtained polymorphic attackmay be detected by the IDS. A preliminary work byKolesnikov et al. [12] introduced and cursorily exploredpolymorphic blending attacks. In this paper we presenta systematic approach for evading byte frequency-basednetwork anomaly IDS, and provide detailed analysis ofthe design, complexity and possible countermeasures forthe polymorphic blending attacks. We also show that ourpolymorphic blending technique is much more effectivethan CLET in evading byte frequency-based anomalyIDS.

3 Blending Attacks

3.1 Polymorphism

A polymorphic attack is an attack that is able to changeits appearance with every instance. Thus, there maybe no fixed or predictable signature for the attack. Asa result, it may evade detection because most currentintrusion detection systems and anti-virus systems aresignature-based. Exploit mutation and shellcode poly-morphism are two common ways to generate polymor-phic attacks. In general, there are three components in apolymorphic attack:

1. Attack Vector: an attack vector is used for exploit-ing the vulnerability of the target host. Certainparts of the attack vector can be modified to createmutated but still valid exploits. There might stillbe certain parts, called the invariant, of the attackvector that have to be present in every mutant forthe attack to work. If the attack invariant is verysmall and exists in the normal traffic, then an IDSmay not be able to use it as a signature because itwill result in a high number of false positives.

2. Attack Body: the code that performs the intendedmalicious actions after the vulnerability is ex-ploited. Common techniques to achieve attack body(shellcode) polymorphism include register shuf-fling, equivalent instruction substitution, instructionreordering, garbage insertions, and encryption. Dif-ferent keys can be used in encryption for differentinstances of the attack to ensure that the byte se-quence is different every time.

3. Polymorphic Decryptor: this section contains thepart of the code that decrypts the shellcode. Itdecrypts the encrypted attack body and transferscontrol to it. Polymorphism of the decryptor canbe achieved using various code obfuscation tech-niques.

Detection of Polymorphic Attacks All attack in-stances contain exploit code and/or input data that aretypically not used in normal activities. For example, anattack instance, especially its decryptor and encryptedshellcode, may contain characters that have very lowprobability of appearing in a normal packet. Thus, ananomaly-based IDS can detect the polymorphic attackinstances by recognizing their deviation from the normalprofile. For example, Wang et al. [35, 36] showedthat the byte frequency distribution of an (polymorphic)attack is quite different from that of normal traffic, andcan thus be used by the anomaly-based IDS PAYL todetect simple polymorphic attacks. However, detectionof a sophisticated polymorphic attack is much morechallenging.

3.2 Blending Attacks

Clearly, if a polymorphic attack can “blend in” with(or “look” like) normal, it can evade detection by ananomaly-based IDS. Normal traffic contains a lot ofsyntactic and semantic information, but only a very smallamount of such information can be used by a high speed

network-based anomaly IDS. This is due to fundamentaldifficulties in modeling complex systems and perfor-mance overhead concerns in real-time monitoring. Thenetwork traffic profile used by high speed anomaly IDS,e.g., PAYL, typically includes simple statistics such asmaximum or average size and rate of packets, frequencydistribution of bytes in packets, and range of tokens atdifferent offsets.

Given the incompleteness and the imprecision of thenormal profiles based on simple traffic statistics, it isquite feasible to launch what we call polymorphic blend-

ing attacks. The main idea is that, when generating apolymorphic attack instance, care can be taken so thatits payload characteristics, as measured by the anomalyIDS, will match the normal profile. For example, in orderto evade detection by PAYL [35, 36], the polymorphic

attack instance can carefully choose the characters usedin encryption and pad the attack payload with a chosenset of characters, so that the resulting byte frequency ofthe attack instance closely matches the normal profilesand thus will be considered normal by PAYL.

3.2.1 A Realistic Attack Scenario

Before presenting the general strategies and techniquesused in polymorphic blending attacks, we present anattack scenario and argue that such attacks are realistic.Figure 1 shows the attack scenario that is the basis ofour case study. There are a few assumptions behind thisscenario:

• The adversary has already compromised a host Xinside a network A which communicates with thetarget host Y inside network B. Network A andhost X may lack sufficient security so that theattack can penetrate without getting detected, or theadversary may collude with an insider.

• The adversary has knowledge of the IDS (IDSB)that monitors the victim host network. This mightbe possible using a variety of approaches, e.g.,social engineering (e.g., company sales or purchasedata), fingerprinting, or trial-and-error. We arguethat one cannot assume that the IDS deployment isa secret, and security by obscurity is a very weakposition. We assume IDSB is a payload statistics

based system (e.g., PAYL). Since the adversaryknows the learning algorithm being used by IDSB ,given some packet data from X to Y , the adversarywill be able to generate its own version of thestatistical normal profile used by IDSB .

• A typical anomaly IDS has a threshold setting thatcan be adjusted to obtain a desired false positiverate. We assume that the adversary does not knowthe exact value of the threshold used by IDSB ,but has an estimation of the generally acceptablefalse positive and false negative rates. With thisknowledge, the adversary can estimate the errorthreshold when crafting a new attack instance tomatch the IDS profile.

We now explain the attack scenario. Once the adver-sary has control of host X , it observes the normal trafficgoing from X to Y . The adversary estimates a normalprofile for this traffic using the same modeling techniquethat IDSB uses. We call this an artificial profile. With it,the adversary creates a mutated instance of itself in sucha way that the statistics of the mutated instance match theartificial profile. When IDSB analyzes these mutatedattack packets, it is unable to discern them from normaltraffic because the artificial profile can be very close tothe actual profile in use by IDSB . Thus, the attacksuccessfully infiltrates the network B and compromiseshost Y .

Figure 1: Attack Scenario of Polymorphic BlendingAttack

3.2.2 Desirable Properties of Polymorphic BlendingAttacks

Clearly, the key for a polymorphic blending attack tosucceed in evading an IDS is to be able to learn anartificial profile that is very close to the actual nor-mal profile used by the IDS, and create polymorphicinstances that match the artificial profile. There areother desirable properties. First, the blending process(e.g., with encoding and padding) should not result inan abnormally large attack size. Otherwise, a simpledetection heuristic will be to monitor the network flowsize. Second, although we do not put any constraint onthe resources available to the adversary, the polymorphicblending process should be economical in terms of timeand space. Otherwise, it will not only slow down theattack, but also increase the chance of detection bythe local IDS (e.g., IDSA or host-based IDS.) Moreformally, given a description of the algorithm that theIDS uses to learn and match the normal profile and anattack instance, the time (and space) complexity of thealgorithm used to apply polymorphic blending to theattack instance should be a small degree polynomial withrespect to the initial attack size. Algorithms that requireexponential time and space may not be practical. Sincethe learning time should be small, the blending algorithmshould not require to collect a lot of normal packets tolearn the normal statistics.

3.3 Steps of Polymorphic Blending Attacks

The polymorphic blending attack has three basic steps:(1) learn the IDS normal profile; (2) encrypt the attackbody; (3) and generate a polymorphic decryptor.

3.3.1 Learning The IDS Normal Profile

The task at hand for the adversary is to observe thenormal traffic going from a host, say X , to another host

in the target network, say Y , and generate a normalprofile close to the one used by the IDS at the targetnetwork, say IDSB , using the same algorithm used bythe IDS.

A simple method to get the normal data is by sniffingthe network traffic going from network A to host Y .This can be easily accomplished in a bus network. In aswitched environment, it may be harder to obtain suchdata. Since the adversary knows the type of servicerunning at the target host, he can simply generate normalrequest packets and learn the artificial profile using thesepackets.

In theory, even if the adversary learns a profile fromjust a single normal packet, and then mutates an attackinstance so that it matches the statistics of the normalpacket perfectly, the resulting polymorphic blended at-tack packet should not be flagged as an anomaly byIDSB , provided the normal packet does not result ina false positive in the first place. On the other hand,it is beneficial to generate an artificial profile that is asclose to the normal profile used by IDSB as possible, sothat if a polymorphic blended attack packet matches theartificial profile closely it has a high chance of evadingIDSB . In general, if more normal packets are capturedand used by the adversary, she will be able to learn anartificial normal profile that is closer to the normal profileused by IDSB .

3.3.2 Attack Body Encryption

After learning the normal profile, the adversary creates anew attack instance and encrypts (and blends) it to matchthe normal profile. A straightforward byte substitutionscheme followed by padding can be used for encryption.The main idea here is that every character in the attackbody can be substituted by a character(s) observed fromthe normal traffic using a substitution table. The en-crypted attack body can then be padded with some moregarbage normal data so that the polymorphic blendedattack packet can match the normal profile even better.To keep the padding (and hence the packet size) minimal,the substituted attack body should already match thenormal profile closely. We can use this design criterionto produce a suitable substitution table.

To ensure that the substitution algorithm is reversible(for decrypting and running the attack code), a one-to-one or one-to-many mapping can be used. A single-byte substitution is preferred over multi-byte substitutionbecause multi-byte substitution will inflate the size of theattack body after substitution. An obvious requirementof such encryption scheme is that the encrypted attackbody should contain characters from only the normaltraffic. Although this may be hard for a general en-cryption technique (because the output typically looksrandom), it is an easy requirement for a simple byte

substitution scheme. However, finding an optimal substi-tution table that requires minimal padding is a complexproblem. In Section 4, we show that for certain casesthis is a very hard problem. We can instead use a greedymethod to find an acceptable substitution table. Themain idea is to first sort the statistical features in thedescending order of the frequency for both the attackbody and normal traffic. Then, for each unassigned entrywith the highest frequency in the attack body, we simplymap it to an available (not yet mapped) normal entry withthe highest frequency. This procedure is repeated untilall entries in the attack body are mapped. The featuremapping can be translated to a character mapping anda substitution table can be created for encryption anddecryption purposes.

3.3.3 Polymorphic Decryptor

A decryptor first removes all the extra padding from theencrypted attack body and then uses a reverse substitu-tion table (or decoding table) to decrypt the attack bodyto produce the original attack code (shellcode).

The decryptor is not encrypted but can be mutatedusing multiple iterations of shellcode polymorphism pro-cessing (e.g., mapping an instruction to an equivalent onerandomly chosen from a set of candidates). To reversethe substitution done during blending, the decryptorneeds to look up a decoding table that contains therequired reverse mappings. The decoding table for one-to-one mapping can be stored in an array where the i-thentry of the array represents the normal character usedto substitute attack character i. Such an decoding tablecontains only normal characters. Unused entries in thetable can be used for padding. On the other hand, storageof decoding tables for one-to-many mapping or variable-length mapping is complicated and typically requireslarger space.

3.4 Attack Design Issues

3.4.1 Incorporating Attack Vector and PolymorphicDecryptor in Blending

We discussed in Section 3.3.2 that the encryption of theattack body is guided by the need to make the attackpacket match the normal statistical profile (or moreprecisely, the learned artificial profile).

The attack vector, decryptor, and substitution tableare not encrypted. Their addition to the attack packetpayload alters the packet statistics. The new statisticsmay deviate significantly from the normal profile. Insuch a case, we must find a new substitution table inorder to match the whole attack packet to the normalprofile. First, we take the normal profile and subtract thefrequencies of characters in the attack vector, decryptor,and existing substitution table. Next, we find a newsubstitution table using the adjusted normal profile. If the

statistics of the new substitution table is not significantlydifferent from the old substitution table, we use the newsubstitution table for encryption. Otherwise we repeatthe above steps.

3.4.2 Packet Length based IDS Profile

If IDSB has different profiles for packets of differentlengths, as in the case of PAYL, the substitution phaseand padding phase need to use the normal profile cor-responding to the final attack packet size. A targetlength greater than the length of the original attack packet(before polymorphic blending) is chosen at first. Theencryption step is then applied and the packet is paddedto the target length. If the statistics of the resultingattack packet is not very close to the normal profile, adifferent target length is chosen and the above process isrepeated. Another strategy is to divide the attack bodyinto multiple small packets and perform the polymorphicblending process for all of them separately.

4 Evaluation and Results

To demonstrate that polymorphic blending attacks arefeasible and practical, we show how an attack can usepolymorphic blending to evade the anomaly IDS PAYL.

In this section, we first describe the polymorphicblending techniques to evade PAYL. Then we report theresults of the experiments we ran to evaluate the evasioncapabilities of the polymorphic blending attacks.

In our evaluation, we first established a baseline per-formance by sending polymorphic instances (generatedusing the CLET polymorphic engine) of the attack toPAYL and verified that all of the instances were detectedby the IDS as anomalies. Then, without changingthe configuration of PAYL, we used our polymorphicblending techniques to generate attack instances to seehow well they can evade the IDS.

4.1 PAYL Anomaly IDS as A Case Study

PAYL has been shown to be effective in detecting poly-morphic attacks and worms [35, 36]. For this reasonwe used PAYL in our case study. We used the 2-gramversion in addition to the 1-gram version to evaluate howpolymorphic blending attack is affected when an IDSuses a more comprehensive model.

PAYL uses n-gram analysis by recording the fre-quency distribution of n-grams in the payload of apacket. A sliding window of width n is used to recordthe number of occurrences of all the n-grams presentin the payload. A separate model is generated for eachpacket length. These models are clustered togetherat the end of the training to reduce the number ofmodels. Furthermore, the length of a packet is alsomonitored for anomalies. Thus a packet with an unseenor very low frequency length is flagged as an anomaly.

{f(xi),!(xi)} represents the PAYL model of normaltraffic, where xi is the ith gram, which is a character in1-gram PAYL, and a tuple in 2-gram PAYL. f(xi) is theaverage relative frequency of xi in the normal traffic, and!(xi) is the standard deviation of xi in the normal traffic.The anomaly score as calculated by PAYL is shown inEquation 1.

score(P ) =!

i

(f(xi) − f(xi))/(!(xi) + ") (1)

Here, P is the monitored packet, f(xi) is the relativefrequency of the ith gram xi in P , and " is a smoothingfactor used to prevent division by zero. For conveniencewe will use the term frequency to denote relative fre-quency.

We evaluated our polymorphic blending attack withthe first version of PAYL as described in [35]. Wang etal. [36] proposed some improvements on PAYL in theirrecent version. We believe that our attack still worksfor this new version of PAYL. The main improvement ofthe new version is to use multiple centroids for a givenpacket length, so that a low false positive rate can beachieved using a relatively low anomaly threshold. Inthis case, our polymorphic blending attack has to use thesame learning algorithm as the new version of PAYL.Furthermore, more normal traffic needs to be used tolearn an artificial profile that is close to the actual normalprofile. Thus, the effect is that our attack may take alittle more time. The new version also matches ingresssuspicious traffic with egress suspicious traffic to findworms. This feature does not have any effect on ourattack because the attack instances blend in with normal.

4.2 Evading 1-gram

To evade 1-gram PAYL, the frequency of each characterin the attack packet should be close to the average fre-quency recorded during the learning phase. We substitutethe characters in the attack packet with the charactersseen in the normal traffic, and apply sufficient amount ofpadding so that the 1-gram frequencies of the resultingpacket match the normal profile very closely. We firstpresent analytical results on the amount of paddingrequired to match the substituted attack body with thenormal profile perfectly. Then we present a substitutionalgorithm that uses the padding criteria to minimize theamount of required padding.

In the following sections, we assume that the normalfrequency f(x) has already been adjusted for the attackvector, the decryptor, and the decoding table (as dis-cussed in Section 3.4.1, these parts need to be accountedfor when computing the frequencies of characters to finda suitable substitution).

4.2.1 Padding

Let w and w be the substituted attack body before andafter padding, respectively. Let n be the number ofdistinct characters in the normal traffic. ‖s‖ denotesthe length of a string s, and #i denotes the number ofoccurrences of the normal character xi in the paddingsection of the blending packet. Then,

‖w‖ = ‖w‖ +n!

i=1

#i (2)

Suppose the relative frequency of character xi in thenormal traffic and the substituted attack body is f(xi)and f(xi), respectively. Since the final desired frequencyof xi is f(xi), the number of occurrences of xi in theblending packet should be ‖w‖f(xi). Thus, #i can bedefined using the following equation:

#i = ‖w‖f(xi) − ‖w‖f(xi), 1 ≤ i ≤ n (3)

Equation 3 can be re-written as,

‖w‖ =#i + ‖w‖f(xi)

f(xi), 1 ≤ i ≤ n (4)

Since f(x) and f(x) are relative frequency distributions,"i f(xi) =

"i f(xi) = 1. Unless they are identical,

there exists some character xi for which f(xi) > f(xi).The character xi is perhaps “overused” in the substitutedattack body. It is trivial to see that we need to pad allthe characters except the one that is most overused. Letxk be the character that has highest overuse and $ be thedegree of overuse. That is,

$ = $k = maxi{$i}, where $i =f(xi)

f(xi), 1 ≤ i ≤ n

(5)Since no padding is required for character xk, #k = 0.Putting this value in Equation (4) we get:

‖w‖ =0 + ‖w‖f(xk)

f(xk)= $‖w‖ (6)

The amount of padding required for each character xi

can be calculated by substituting the value of ‖w‖ inEquation (3):

#i = ‖w‖($f(xi) − f(xi)) (7)

Thus, using the padding defined by the above equation,we can match the final attack packet perfectly to thenormal frequency f(x). Furthermore, the amount ofpadding required by the above equation is the minimumamount that is needed to match the normal profile ex-actly. Please refer to Appendix 6.1 for the proof.

4.2.2 Substitution

The analysis in Section 4.2.1 shows that the amount ofpadding can be minimized by minimizing $, which is

max( f(xi)f(xi)

). This in turn means that the objective of the

substitution process is to minimize the resulting $. Thereare two possible cases for substitution. The first is whenthe number of distinct characters present in the attackbody (m) is less than or equal to the number of distinctcharacters present in the normal traffic (n), i.e. m ≤ n.In this case we can perform single-byte encoding, eitherone-to-one or one-to-many. If m > n, we need to usemulti-byte encoding.

Case: m ≤ n We suggest a greedy algorithm togenerate a one-to-many mapping from the attack charac-ters to the normal characters that provides an acceptablesolution and is computationally efficient. Our algorithmtries to minimize the ratio $ locally for each substitutionassignment.

Let xi represents a normal character and yj representan attack character. Let f(xi) be the frequency ofcharacter xi in normal traffic and g(yj) be the frequencyof character yj in the attack body. Let S(yj) be theset of normal characters to which yj is mapped. Let

tf(yj) = !xi!S(yj)f(xi). The probability that yj issubstituted by xi, xi ∈ S(yj), during substitution isf(xi)

tf(yj). Thus, the number of occurrences of xi in the

substituted attack body isf(xi)"g(yj)

tf(yj). We then have $i =

(f(xi)"g(yj)/tf(yj))f(xi)

= g(yj)

tf(yj). Our greedy algorithm

tries to minimize this ratio $i locally. The substitutionalgorithm is as follows.

Sort the normal character frequency f(x) and theattack character frequency g(y) in descending order. Forthe first m characters, map yi to xi and set S(yi) = {xi}and tf(yi) = f(xi), ∀1 ≤ i ≤ m. For the (m + 1)th

normal character, xm+1, find an attack character (yj)

with maximum ratio ofg(yj)

tf(yj). Assign xm+1 to yj and

set S(yj) = {xm+1} ∪ S(yj) and tf(yj) = tf(yj) +f(xm+1). This is performed for each of the remainingcharacters until we reach the end of the frequency listf(x). While substituting alphabet yj in the attackbody, we choose a character xi from the set S(yj) with

probabilityf(xi)

tf(yj).

Consider an example where f(a, b, c) ={0.3, 0.4, 0.3}, attack body w = qpqppqpq, andg(p, q) = {0.5, 0.5}. According to the above algorithm,initially, b and a are assigned to p and q respectively. At

this point, ratio g(p)ˆtf(p)

= 1.25 and g(q)ˆtf(q)

= 1.66. So we

assign c to q. Thus, p will be substituted by b and q willbe substituted by a with probability 0.5 and by c withprobability 0.5. Thus, the attack after substitution can be

w = cbabbcba.

In our experiments, we used a simple one-to-one map-ping where characters with the highest frequencies in theattack packet are mapped to characters with the highestfrequencies in normal traffic. This simple mapping isshown to be sufficient for the blending purpose.

Case: m > n We suggest a heuristic based onHuffman encoding scheme to obtain a small attack sizeafter encoding. Given the frequency distribution of thecharacters in the attack body being encoded, Huffmanencoding provides a minimum length packet after encod-ing. The weights of the nodes in Huffman tree is thesum of the relative frequencies of all its descendant leafnodes. The weight of a leaf node is the frequency of agiven character in the attack body. Every edge in the treeis assigned to a character from the normal profile. Inthe original Huffman coding the edges of the Huffmantree are labeled randomly. Random labeling of the edgesmay give us a very large value of $. We developed aheuristic to assign labels to edges of Huffman tree to finda mapping that gives us a very small $. Before stating theheuristic, we present the problem of optimally assigningthe labels to the edges in Huffman tree:

Given a Huffman tree, assign labels l(v) ∈ N to thevertices v in the tree, such that after substitution, $ =

max( f(x)f(x) ), ∀x ∈ N , is minimum. The constraint on

the label l(v) is that if parent(v1) = parent(v2), thenl(v1) '= l(v2).

We propose a greedy algorithm to find an approximatesolution for the above problem. First sort the verticesin descending order of their weight and initialize thecapacity of each character cap(xi) = f(xi), ∀xi ∈ N .Then starting from the leftmost unlabeled vertex vj , finda character xi with the maximum cap(xi) and that is notassigned to any of the direct siblings of vj . Assign xi

to vj and reduce the capacity of xi by the weight ofthe vertex. Repeat until all the vertices are assigned.The labels generated by the above algorithm are usedfor the substitution process. An example is explained inFigure 2.

4.3 Evading 2-gram

The 1-gram PAYL model assumes that the bytes oc-curring in the stream are independent. It does not tryto capture any information of byte sequencing of thenormal traffic. The 2-gram model on the other hand cancapture some byte sequencing information. It recordsthe frequencies of all the 2-grams present in the normaltraffic. It is easy to see that by matching 2-grams we areinherently performing 1-gram matching as well.

For 2-gram, the polymorphic blending process needsto match the frequencies of not only all the charactersbut also all the tuples. Similar to 1-gram substitution,

b

p q

1

r s

0.4 0.6

ab

0.15 0.25 0.25 0.35

baa

Figure 2: 1-gram multibyte encoding. Thefrequency of the normal character is f(a, b) ={0.5, 0.5}. Sorted weights of the nodes are{0.6, 0.4, 0.35, 0.25, 0.25, 0.15}. Using the proposedalgorithm we get S : {p, q, r, s} (→ {ba, bb, aa, ab}

d

a

b

c

Figure 3: 2-gram multibyte encoding. e0 = da, e1 = bc.w = 01101010. w = bdabcbcbdabcbdabcbda

one can either use single-byte encoding or multi-byteencoding for substitution. For single-byte encoding, thegoal is to find a one-to-one or one-to-many mapping thatensures that all the tuples in the substituted attack bodyare also present in normal profile. In Appendix 6.2,we show that this is NP-complete for the general caseby reducing the well known sub-graph isomorphismproblem [4] to the mapping problem. Unlike single-byteencoding, it is possible for an attacker to find a multi-byte encoding scheme that produces only valid 2-grams.Here, we present a viable multi-byte encoding scheme.

4.3.1 Multi-byte Encoding

A 2-gram normal profile can be viewed as a Mooremachine (FSM) which has a state for each character inN . Every state is a start state and end state. A transitionfrom state v1 to state v2 exists if and only if 2-gramv1v2 exists in normal profile. This FSM represents thelanguage accepted by the IDS with given 2-gram profile.Strings generated by the FSM contain only normal 2-grams. Characters in an attack body can be mappedto paths in this FSM. For example, suppose the statemachine has two cycles reachable from each other. e1

and e2 be two edges such that e1 is present only in thefirst cycle and e2 is present only in the second cycle.Given a bit representation of the attack body, we canencode 0 using e0 and 1 using e1. We can generate anybit string represented using these two tuples interleaved

by other non-informative characters present in the cyclesand in the paths between two cycles. Figure 3 shows anexample of such an encoding scheme. Such an encodedattack string will have a very large size. We use it toshow the existence of an encoding scheme that is ableto match the normal 2-grams. We can generate a moreefficient encoding scheme by using the entropy measureof transitions at each state. The complete details of suchan encoding scheme are not addressed in this paper. Theauthors suggest readers to refer to coding theory for moreon entropy based encoding.

4.3.2 Approximate Single-Byte Encoding

As discussed above, the problem of finding a single-bytesubstitution is hard for 2-gram. On the other hand, multi-byte encoding may increase the size of the attack pack-ets considerably. We can use a simple approximationalgorithm to find a good one-to-one substitution. Thealgorithm performs single byte substitution in such a waythat tuples with high frequencies in the attack packet aregreedily matched with tuples with high frequencies innormal traffic.

The details of the algorithm are as follows. First,sort the normal tuple frequencies f(xi,j) and the attacktuple frequencies g(yi,j) in descending order. Initially,all tuples in the list f(xi,j) are marked unused and thesubstitution table is cleared. The frequency list g(y)is traversed from the top. For every tuple yi,j in thesorted attack tuple list, the list f(x) is traversed fromthe beginning to find an unmarked tuple xi!,j! so thatsubstituting yi with xi! and yj with xj! does not violateany mappings that were already made. The tuple xi!,j!

is marked and the substitution table is updated. Theabove algorithm is fast and provides consistent reversiblematching. The algorithm does not guarantee to providethe best substitution, i.e., the closest distance to the targetfrequency distribution.

4.3.3 Padding

We introduce an efficient padding algorithm that does notprovide minimal padding but tries to match the targetdistribution in a greedy manner. Let df (xi,j) be thedifference between the frequency of tuple xi,j in thenormal profile and the substituted attack body. Find atuple xk,l from the list of normal tuples that starts withthe last padded character (xk) and that has the highestdf (xk,m), ∀1 ≤ m ≤ 256. The second character of thetuple, xl, is padded to the end of the packet and df (xk,l)is reduced. This step is repeated until the blending attacksize reaches a desired length.

4.4 Complexity of Blending Attacks

We now summarize the methods provided above andanalyze the hardness of a polymorphic blending attackwhile keeping the design goals (Section 3.2.2) in mind.

For 1-gram blending, although finding a substitution thatminimizes the padding seems to be a hard-problem andmay take exponential time, we have proposed greedyalgorithms that find a good substitution that requiresmall amount of padding to perfectly match the normalbyte frequency. For 2-gram blending, finding a single-byte substitution that ensures only normal tuples aftersubstitution is shown to be NP-hard (see the proof inAppendix 6.2). An approximation algorithm can be usedto efficiently compute a substitution that may introducea few invalid tuples. A multibyte encoding scheme canachieve a very good match with no invalid tuples at theexpense of very large attack sizes. An attacker has totherefore consider several trade-offs between the degreeof matching, attack size, and time complexity to mountsuccessful blending attacks.

4.5 Experiment Setup

4.5.1 Attack Vector

We chose an attack that targets a vulnerability in Win-dows Media Services (MS03-022). The attack vectorwe selected exploits a problem with the logging ISAPIextension that handles incoming client requests. It isbased on the implementation by firew0rker [8]. Thesize of the attack vector is 99 bytes and is required tobe present at the start of the HTTP request. The attackneeds to send approximately 10KB of data to cause thebuffer overflow and compromise the system. Our attackbody opens a TCP client socket to an IP address andsends system registry files. The size of the unencryptedattack body is 558 bytes and contains 109 differentcharacters. During the blending process, we divided ourattack into several packets. If our final blending attackafter padding does not add up to 10KB, we just sendsome normal packets as a part of the attack to cause thebuffer overflow. The decryptor was divided into multiplesections and distributed among different packets. Theattack body was divided among all the attack packets.

0

10000

20000

30000

40000

50000

60000

0 200 400 600 800 1000 1200 1400 1600

Fre

quency

of pack

ets

Packet length

IDS DataAttack Data

Figure 4: Packet length distribution

0

1000

2000

3000

4000

5000

6000

7000

8000

0 500 1000 1500 2000 2500 3000 3500 4000 4500

Num

ber

of uniq

ue g

ram

s

Number of packets (in thousands)

1-gram2-gram

Figure 5: Observed Unique 1-grams and 2-grams

4.5.2 Dataset

Data Type FeaturePacket length

418 730 1460

IDS TrainingNum. of Pkts 16,490 540 1,781One Grams 106 90 128Two Grams 4,325 3,791 3,903

Attack TrainingNum. of Pkts 2,168 82 249One Grams 89 86 86Two Grams 2,847 2,012 2,196

Table 1: HTTP Traffic dataset

We collected around 15 days of HTTP traffic comingto our department’s network in November 2004. We usedseveral IDSs, including Snort, to verify that this datacontains no known attack. We removed all the packetswith no TCP payload. We used the data of the first14 days (4,356,565 packets, 1.9GB) for IDS training toobtain the IDS normal profiles. A separate profile wascreated for each TCP payload length (or simply packetlength). The full payload section of each packet was usedto compute the profiles. The last day of the HTTP trafficwas made available to the attacker to learn the artificialprofile. We also used cross-validation, i.e., randomlypicking one of the 15 days for attack training and the restfor IDS training, to verify the results of our experiments.

The packet length distributions in the IDS trainingdataset and the attack training dataset are shown inFigure 4. Among this packet lengths, we chose three dif-ferent lengths to implement the blending attack, namely418, 730 and 1460. These packets lengths are largeenough to accommodate the attack data into a small num-ber of packets. These lengths also occurred frequentlyin the training dataset. A separate artificial profile wascreated for each packet length using the attack trainingdata of the same packet length. Thus, we generated three1-gram models and three 2-gram models for differentpacket lengths. Table 1 shows the details of the datasetsused for the evaluation. The numbers of unique 1-gramsand 2-grams in the data are also shown in the table.

4.6 Evaluation

Training time of 1-gram and 2-gram PAYL: We per-formed experiments on the training time required to learnthe profiles used by PAYL. Figure 5 shows the numbersof unique 1-grams and 2-grams observed in HTTP trafficstream. Since the numbers of observed 1-gram and 2-gram continue to increase as new packets arrive in thestream, the training of profiles for 1-gram and 2-gramtakes a long time to converge. We trained our IDS modelusing all of the available IDS training data.Traditional polymorphic attacks: To the best of ourknowledge, CLET [5] is the only publicly availabletool that implements evasion techniques against bytefrequency-based anomaly IDS. For this reason we usedCLET as our baseline. As mentioned in Section 2,given an attack CLET adds padding bytes in the payloadto make the byte frequency distribution of the attackclose to the normal traffic. However, CLET does notapply any byte substitution technique (see Section 4.2.2).Further, CLET does not address the evasion of 2-gramPAYL explicitly. We also generated polymorphic attacksusing other well known tools (e.g., ADMutate [17]), andverified that they are less effective than CLET in evadingPAYL.

We generated multiple polymorphic instances of ourattack body using CLET and tested them against PAYL.Each attack instance contained one or more attack pack-ets of given length. Different amount of bytes werecrammed (padded) to obtain the desired attack size.Attack training data was used to generate spectral filesused for cramming by the CLET engine. A polymorphicattack instance will evade an IDS model if and only if allthe attack packets corresponding to the attack instanceare able to evade the IDS. Thus, the anomaly score ofan attack instance was calculated as the highest of allthe anomaly scores (Equation 1) obtained by the attackpackets corresponding to the attack instance. Table 2shows the anomaly threshold setting of different PAYLmodels that result in the detection of all the attackinstances. The anomaly thresholds were calculated asthe minimum anomaly score over all the attack instances.Using the given thresholds, both 1-gram and 2-gramPAYLwere successful in detecting all the instances of theattack. Having established this “baseline” performance,we would like to show that our blending attacks canevade PAYL even if a lower threshold is used.

Packet Length 1-gram 2-gram

418 872 1,399

730 652 1,313

1460 355 977

Table 2: IDS anomaly threshold setting that detects allthe polymorphic attacks sent by the CLET engine

10

15

20

25

30

35

40

45

0 5 10 15 20 25 30 35 40 45 50

Anom

aly

Sco

re

Number of attack training packets

1460730418

(a) 1-gram

100

150

200

250

300

350

400

0 5 10 15 20 25 30 35 40 45 50

Anom

aly

Sco

re

Number of attack training packets

1460730418

(b) 2-gram

Figure 6: Anomaly score of Artificial Profile

Packet Length 1-gram 2-gram

418 8 20

730 8 18

1460 14 40

Table 3: Number of packets required for the convergenceof attacker’s training

4.6.1 Artificial Profile

We used a simple convergence technique, similar toPAYL, to stop the training of the artificial profile. Atevery certain interval (convergence check interval) wecheck if the Manhattan 1 distance between the artificialprofiles at the last interval and the current interval issmaller than a certain threshold (convergence threshold).It stops training if the distance is smaller than thethreshold. We set the convergence threshold (= 0.05)to be the same as the original implementation of PAYL.The artificial profile does not have to become very stableor match the normal profile perfectly because somedeviation from the normal profile can be tolerated. Toreduce the training time we set the convergence checkinterval to 2 packets. Thus, if we see two consecutivepackets of a given length that are close to the learnedprofile, we stop training. Table 3 shows the numberof packets required to converge the artificial profile ofdifferent packet lengths. As expected, the artificialprofile converges very fast. The 1-gram profile convergesfaster than the 2-gram profile for the same packet length.We show that a small number of packets are enoughto create an effective polymorphic blending attack. Inpractice, the attacker can use more learning data to createa better profile.

Figure 6 shows the anomaly score of the artificialnormal profile, as calculated by the IDS normal profile,versus the number of attack training packets used to learnthe artificial profile. As the number of attack trainingpackets increases, the anomaly score of artificial normal

profile decreases, which means that the artificial profiletrained using more packets is a better estimation of thePAYL normal profile. The score needs to be less thanthe anomaly threshold of PAYL for the blending attackpackets to have a realistic chance of evading PAYL. Forall attack training sizes shown in Figure 6, the scoreis well under the threshold (Table 2) used to configurePAYL to detect all the traditional (without blending)polymorphic attack instances.

4.6.2 Blending Attacks for 1-gram and 2-gram

PAYL

For each packet length, we generated both the 1-gramand 2-gram PAYL normal profiles using the entire IDStraining dataset (i.e., the first 14 days of HTTP traffic).For each packet length, the 1-gram and 2-gram artificialnormal models were learned using a fraction of the attacktraining dataset. The learning stops at the point themodels converge, as shown in Table 3.

We used the one-to-one single-byte substitution tech-nique discussed in Section 4.2.2 for constructing theblending attack against 1-gram PAYL, and the singlebyte encoding scheme discussed in Section 4.3.2 forthe blending attack against 2-gram PAYL. Two sets ofblending experiments were performed. In the first set ofexperiments, the substituted attack body was divided intomultiple packets and each packet was padded separatelyto match the normal profile. A single decoding tableis required to decode the whole attack flow. In thesecond set of experiments, the attack body was firstdivided into a given number of packets. Each of theattack body sections were substituted using one-to-onesingle byte substitution and then padded to match thenormal frequency. Individually substituting the attackbody for each packet allowed us to match the statisticalprofile of the substituted attack body closer to the normalprofile. But it requires a separate decoding table for eachpacket, thus reducing the padding space considerably.For convenience, we call the first set of experiments

0

0.01

0.02

0.03

0.04

0.05

0.06

0 50 100 150 200 250

Re

lativ

e F

req

ue

ncy

Character

normattack

(a) Original attack packet

0

0.01

0.02

0.03

0.04

0.05

0.06

0 50 100 150 200 250

Re

lativ

e F

req

ue

ncy

Character

normattack

(b) 1-gram Blending Packet for packet length 1460

Figure 7: Comparison of frequency distribution of normal profile and attack packet

0

20

40

60

80

100

120

140

160

0 2 4 6 8 10 12

An

om

aly

Sco

re

Number of attack packets

ATT-1460IDS-1460ATT-730IDS-730ATT-418IDS-418

(a) 1-gram

100

200

300

400

500

600

700

800

0 2 4 6 8 10 12

An

om

aly

Sco

re



(b) 2-gram

Figure 8: Anomaly score of the blending attack packets (with local substitution) for artificial profile and IDS profile

global substitution, and the second local substitution. Ifm > n for any of the above experiments, we simplysubstituted the low frequency attack characters usingnon-existing characters in the normal. This increasedthe error in blending attack but reduced the complexityof the blending attack algorithm. Figure 7 shows thecomparison of the frequency distribution of differentcharacters present in the HTTP traffic. The byte fre-quency distribution of the original attack instance is verydifferent from the normal profile because the normaldata has mainly printable ASCII characters whereas theattack payload has many characters that are unprintable.Thus, this was easily detected by both 1-gram and 2-gram IDS models. The attack was substituted and paddedto obtain a single packet of length 1460. As shown inFigure 7(b), the frequency distribution of attack payloadafter substitution and padding becomes almost identicalto the PAYL normal profile. This demonstrates theeffectiveness of our polymorphic blending techniques.We studied how dividing an attack instance into severalpackets and blending them separately help match theattack packets with the artificial profile and evade PAYL.

The experiments were performed with the number ofattack packets ranging from 1 to 12. We checked theanomaly score of each attack packet as calculated byboth the artificial profile and the IDS profile. Similarto the anomaly score of attack instances generated byCLET, the anomaly score of a blending attack instancewas calculated as the highest of all the scores obtained bythe attack packets corresponding to the blending attackinstance. Figure 8 and Figure 9 show the anomaly scoresof blending attacks with local substitution and globalsubstitution, respectively. For each attack flow, we showthe score of the packet with the highest score. It isevident that if the attack is divided into more packets,it matches the profile more closely. The reason is thatif the attack body is divided into multiple fragments,for each packet there is more padding space availableto match the profile. Also, local substitution worksbetter than global substitution scheme for all cases exceptfor 2-gram blending for packet length 418. Since oursubstitution table contains only normal 1-grams but maycontain foreign 2-grams, a large substitution table mayproduce a large error for the 2-gram model. Considering

0

50

100

150

200

250

0 2 4 6 8 10 12

An

om

aly

Sco

re



(a) 1-gram

100

200

300

400

500

600

700

800

900

0 2 4 6 8 10 12

An

om

aly

Sco

re



(b) 2-gram

Figure 9: Anomaly score of the blending attack packets (with global substitution) for artificial profile and IDS profile

that small packets have small padding space to reducethe error caused by the substitution table, having anindividual substitution table in each packet can causelarge error.

Although the score of the blending attack as calculatedby the IDS model is greater than the score calculated bythe artificial normal profile, it is still much lower thanthe anomaly threshold set for the detection of traditionalpolymorphic attacks.

Thus, our experiment clearly shows that unlike tradi-tional polymorphic attacks, our blending attack is veryeffective in evading 1-gram and 2-gram PAYL for all thepacket lengths and number of attack packets.

4.6.3 IDS False Positive Rate And Its Impact onBlending Attacks

We also studied the effect of false positive rates on thedetection of blending attacks. Anomaly threshold for agiven false positive rate (fp) is set such that only fpfraction of normal data has anomaly score higher than theanomaly threshold. The anomaly thresholds for differentfalse positive rates are shown in Table 4. The numberof attack packets required to evade the IDS successfullyfor a given threshold is shown in the parenthesis. Aswe increase the false positive rate, we need to dividethe attack into more packets to keep the score below theanomaly threshold. Thus, keeping a high false positiverate may increase the size of the blending attack. Fromthe table we can infer that even if the IDS keeps its falsepositive rate high to detect more attacks, blending attackcan still easily evade it using an attack size as small as3,650, i.e. five packets of length 730.

Since 2-gram PAYL records some sequence informa-tion along with byte frequencies, it seems to be a goodrepresentation of normal traffic. In our experiments wefound that 2-gram PAYL consistently produces higheranomaly score than 1-gram PAYL for all attack packetlengths. But at the same time, the 2-gram IDS needs

to set very high anomaly thresholds to avoid high falsepositive rates. Thus, in practice, the 2-gram PAYL isactually only marginally more effective than the 1-gramversion in detecting attacks.

Blending attacks can be successfully launched on both1-gram and 2-gram models. Larger packet lengths aremore suitable for blending attacks. With few exceptions,the local substitution scheme works better than the globalsubstitution scheme. The 2-gram model provides onlymarginal advantage over the 1-gram model in detectingblending attacks but requires huge space to store themodel. Thus, the 2-gram model may not be a betterchoice over the 1-gram model.

4.7 Countermeasures

The experimental results reported above show that thestatistical models used by PAYL are not sufficiently ac-curate to detect deliberate evasion attempts. We believethis problem is common to other network anomaly IDSthat use traffic statistics [15, 18]. By following the ideaspresented in this paper, it may be fairly easy to devisedifferent blending algorithms in order to evade othernetwork anomaly IDSs that rely solely on some form ofpacket statistics. The reason is that traffic statistics usedby such network-based anomaly IDS do not provide acomprehensive representation of normal traffic. Appli-cation syntax and semantics related information cannotbe modeled accurately using simple statistics of networkpackets. On the other hand, some of the IDS introducedin Section 2, e.g., [1, 2, 30], use syntax and semanticsrelated information and could be used to detect thepolymorphic blending attack. Nevertheless, modelingapplication syntax and semantic information is in generalmore expensive than measuring simple traffic statistics.Thus the trade-off between detection accuracy, hardnessof evasion and operational speed has to be considered.A key direction to explore is to develop a more efficientsemantic-based IDS that can be deployed on high-speed

False Positive 418 730 1460

1-gram 2-gram 1-gram 2-gram 1-gram 2-gram

0.1 61.07 (17,-) 373.4 (-,12) 63.70 (5,7) 467.6 (5,5) 74.50 (3,3) 447.7 (2,2)

0.01 78.61 (12,15) 456.9 (22,8) 143.6 (2,3) 625.5 (3,3) 81.98 (3,3) 531.0 (2,2)

0.001 125.5 (5,7) 561.8 (7,6) 164.6 (2,3) 670.5 (3,3) 239.2 (1,1) 931.9 (1,1)

0.0001 166.8 (5,5) 582.6 (7,5) 244.5 (2,2) 805.0 (2,2) 243.4 (1,1) 935.0 (1,1)

Table 4: Anomaly thresholds for different false positive rates in IDS models. Bracketed entries are the the numbers ofpackets required to evade the IDS using the local and global substitution scheme, respectively.

networks.

Another defense approach is to use multiple IDS mod-els that use independent features. Such a collective setof models may be a better representation of the normaltraffic. In such a case, a polymorphic blending attack willneed to evade all (or the majority) of the models.

One reason blending attacks work is that the attackerhas the complete knowledge of the IDS model beingused. This gives the attacker an enormous advantage. Apossible countermeasure is to introduce randomness [27]in the IDS model. Consider a model constructed bymeasuring the occurrence frequency of pairs of non-consecutive bytes that are separated by % number ofbytes. For example, given a payload containing thesequence of byte values {b1, b2, · · · , bl}, the IDS couldmeasure the occurrence frequency of the pair of bytevalues (bi, bi+!+1), ∀i = 0, · · · , (l − % − 1), where lis the payload length. We call this a 2!-gram model. For% = 0, the 2!-gram model is the same as the 2-gramPAYL model. If the IDS chooses % at random duringthe training phase, this makes the blending attack moredifficult given that the attacker needs to guess the valueof % before applying the blending algorithm (note that %is chosen at random before the model is created and isfixed for each packet. Therefore, the 2!-gram model isas complex as the 2-gram model used by PAYL). Fur-thermore, the IDS could construct m different models,each of them having a different randomly chosen %k,with k = 1, · · · , m, and combine their output in orderto obtain a more accurate decision about the packets. Inthis case the attacker needs to guess m values for theparameter % and needs to devise a blending algorithmthat “satisfies” all the m different models at the sametime. This means that even if the attacker knows exactlyhow the IDS performs the training and test phases, it ismuch more difficult to evade it.

Preliminary experimental results show that if % issmall with respect to the payload size, the 2!-grammodel is able to capture a sufficient amount of structuralinformation that allows to construct an accurate IDSmodel. Further, the combination of different 2!-grammodels appears to be a promising technique. However,the complexity of the detection system grows linearlywith m. A thorough analysis of this modeling techniqueis beyond the scope of this paper and will be the subject

of our future work.

While countermeasures may make evasion harder tosucceed, they typically require more resources and canbe more complex in design and implementation. Itmay also produce higher error rates if the IDS uses toomany features such that its models “overfit” the data. Inshort, trade-offs between “hardness of evasion” and otherperformance measures need to be carefully considered.

5 Conclusion

In this paper, we presented a new class of attacks calledpolymorphic blending attacks. Existing polymorphictechniques can be used for evading signature-based IDSbecause the attack instances do not share a consistentsignature. But anomaly IDS can detect these attackinstances because the polymorphism techniques fail tomask their statistical anomalies. Our proposed attackovercomes this very shortcoming. The idea is to firstlearn the normal profiles used by the IDS, and then, whilecreating a polymorphic instance of an attack, make surethat its statistics match the normal profiles.

We described the basic steps and general techniquesthat can be used to devise polymorphic blending attacks.We presented a case study using the anomaly IDS PAYLto demonstrate that these attacks are practical and feasi-ble. Our experiments showed that polymorphic blendingattacks can evade PAYL while traditional polymorphicattacks cannot. We also showed that an attacker does notneed a large number of packets to learn the normal profileand blend in successfully. The results with 2-gram PAYL

suggested that simply using more complex features ormodels do not always provide a good defense againstthese polymorphic blending attacks. We discussed somepossible defenses against polymorphic blending attacks.

Acknowledgments

This work is supported in part by NSF grantCCR-0133629 and Office of Naval Research grantN000140410735. The contents of this work are solelythe responsibility of the authors and do not necessar-ily represent the official views of NSF and the U.S.Navy. The authors would like to thank the anonymousreviewers for helpful comments and the shepherd of thispaper Professor Fabian Monrose at The Johns HopkinsUniversity for very valuable suggestions.

References

[1] P. Akritidis, E. P. Markatos, M. Polychronakis, and K. D. Anag-nostakis. Stride: Polymorphic sled detection through instructionsequence analysis. In Proceedings of the 20th IFIP InternationalInformation Security Conference (IFIP/SEC 2005), 2005.

[2] R. Chinchani and E.V.D. Berg. A fast static analysis approach todetect exploit code inside network flows. In Recent Advances inIntrusion Detection, 2005.

[3] M. Christodorescu, S. Jha, S. Seshia, D. Song, and R. Bryant.Semantics-aware malware detection. In Proceeding of the IEEESecurity and Privacy Conference, 2005.

[4] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction toalgorithms. The MIT Press, 1990.

[5] T. Detristan, T. Ulenspiegel, Y. Malcom, and M. Underduk.Polymorphic shellcode engine using spectrum analysis. PhrackIssue 0x3d, 2003.

[6] H. Feng, J. Giffin, Y. Huang, S. Jha, W. Lee, and B. Miller.Formalizing sensitivity in static analysis for intrusion detection.In Proceedings the IEEE Symposium on Security and Privacy,2004.

[7] H. Feng, O. Kolesnikov, P. Fogla, W. Lee, and W. Gong. Anomalydetection using call stack information. In Proceedings of theIEEE Security and Privacy Conference, 2003.

[8] Firew0rker. Windows media services remote command ex-ecution exploit. http://www.k-otik.com/exploits/07.01.nsiilog-titbit.cpp.php, 2003.

[9] S. Forrest, S.A. Hofmeyr, A. Somayaji, and T.A. Longstaff. Asense of self for unix processes. In Proceedings of the IEEESymposium on Security and Privacy, 1996.

[10] Threat Intelligence Group. Phatbot trojan analysis.http://www.lurhq.com/phatbot.html.

[11] M. Handley and V. Paxson. Network intrusion detection: Eva-sion, traffic normalization, and end-to-end protocol semantics. In10th USENIX Security Symposium, 2001.

[12] O. M. Kolesnikov, D. Dagon, and W. Lee. Advanced polymor-phic worms: Evading IDS by blending in with normal traffic.Technical Report GIT-CC-04-13, College of Computing, GeorgiaTech, 2004.

[13] C. Kruegel, E. Kirda, D. Mutz, W. Robertson, and G. Vigna.Automating mimicry attacks using static binary analysis. In 14thUsenix Security Symposium, 2005.

[14] C. Kruegel, E. Kirda, D. Mutz, W. Robertson, and G. Vigna.Polymorphic worm detection using structural information ofexecutables. In Recent Advances in Intrusion Detection, 2005.

[15] C. Kruegel, T. Toth, and E. Kirda. Service specific anomalydetection for network intrusion detection. In Proceedings of ACMSIGSAC, 2002.

[16] C. Kruegel and G. Vigna. Anomaly detection of web-basedattacks. In Proceedings of ACM CCS, pages 251–261, 2003.

[17] Ktwo. Admmutate: Shellcode mutation engine.http://www.ktwo.ca/ADMmutate-0.8.4.tar.gz, 2001.

[18] M. Mahoney. Network traffic anomaly detection based on packetbytes. In Proceedings of ACM SIGSAC, 2003.

[19] M. Mahoney and P.K. Chan. Learning nonstationary models ofnormal network traffic for detecting novel attacks. In Proceedingsof SIGKDD, 2002.

[20] J. Newsome, B. Karp, and D. Song. Polygraph: Automaticallygenerating signatures for polymorphic worms. In Proceeding ofthe IEEE Security and Privacy Conference, 2005.

[21] R. Perdisci, D. Dagon, W. Lee, P. Fogla, and M. Sharif. Mislead-ing worm signature generators using deliberate noise injection. InProceedings of the IEEE Security and Privacy Conference, 2006.

[22] T.H. Ptacek and T.N. Newsham. Insertion, evasion, and denial ofservice: Eluding network intrusion detection. Technical ReportT2R-0Y6, Secure Networks, Inc., 1998.

[23] Rain Forest Puppy. A look at whisker’s anti- idstactics just how bad can we ruin a good thing?www.wiretrip.net/rfp/txt/whiskerids.html, 1999.

[24] S. Rubin, S. Jha, and B.P. Miller. Automatic generation and anal-ysis of nids attacks. In Annual Computer Security ApplicationsConference (ACSAC), 2004.

[25] M. Sedalo. Jempiscodes: Polymorphic shellcode generator.www.shellcode.com.ar/en/proyectos.html.

[26] D. Song. Fragroute: a tcp/ip fragmenter.www.monkey.org/!dugsong/fragroute, 2002.

[27] Sal Stolfo. Personal communication. 2005.

[28] P. Szor. Advanced code evolution techniques and computer virusgenerator kits. The Art of Computer Virus Research and Defense,2005.

[29] K.M.C. Tan, K.S. Killourhy, and R.A. Maxion. Underminingan anomaly-based intrusion detection system using commonexploits. In Recent Advances in Intrusion Detection, 2002.

[30] T. Toth and C. Kruegel. Accurate buffer overflow detection viaabstract payload execution. In Recent Advances in IntrusionDetection, 2002.

[31] G. Vigna, W. Robertson, and D. Balzarotti. Testing network-based intrusion detection signatures using mutant exploits. InProceedings of the ACM Conference on Computer and Commu-nication Security (ACM CCS), pages 21–30, 2004.

[32] D. Wagner and D. Dean. Intrusion detection via static analysis. InProceeding of IEEE Symposium on Security and Privacy, 2001.

[33] D. Wagner and P. Soto. Mimicry attacks on host-based intrusiondetection systems. In Proceedings of the ACM Conference onComputer and Communication Security (ACM CCS), 2002.

[34] H. J. Wang, C. Guo, D. R. Simon, and A. Zugenmaier. Shield:Vulnerability-driven network filters for preventing known vulner-ability exploits. In the Proceedings of ACM SIGCOMM, 2004.

[35] K. Wang and S. Stolfo. Anomalous payload-based networkintrusion detection. In Recent Advances in Intrusion Detection,2004.

[36] K. Wang and S. Stolfo. Anomalous payload-based worm detec-tion and signature generation. In Recent Advances in IntrusionDetection, 2005.

[37] T. Yetiser. Polymorphic viruses: Implementation, detection, andprotection. Technical report, VDS Advanced Research Group,1993.

Notes1Used by authors of PAYL to compare two models for convergence

6 APPENDIX

6.1 Proof of Optimal Padding for 1-gramBlending Attack

We prove that the padding calculated using Equation (7)is minimum for matching the 1-gram profile exactly.

Theorem 6.1 #i ≥ 0, ∀1 ≤ i ≤ n

Proof We prove the theorem by contradiction. Assumethat for some j, #j < 0. Then from Equation (7),

‖w‖($f(xj) − f(xj)) < 0. Thus, $ < f(xj)f(xj)

. This

contradicts Equation (5), therefore, all #i ≥ 0.

The frequency of a character xi in the packet after

padding is f(xi) = #w#f(xi)+"i

#w# . Using Equation (7)

and Equation (4), f(xi) = f(xi). Thus, the final attackpacket after padding has the exact target distribution,f(xi).

Theorem 6.2 The padding calculated using Equa-

tion (7) is the minimum required padding to match

frequencies exactly.

Proof Suppose we perform padding using Equation (7).Suppose there exists another packet (say p, ‖p‖ <‖w‖) with smaller padding and matches the frequenciesexactly. Since #k = 0, the number of occurrences of xk

in p cannot decrease. Thus, frequency of xk in packet

p is fp(xk) = #w#f(xk)#p# = #w#f(xk)

#p# > f(xk). Thus,

packet p does not match the normal frequencies exactly.Thus, we have reached a contradiction.

6.2 Proof of Hardness of 2-gram Single-Byte Encoding

First, we look at the problem of evading a simple IDSthat stores all the 2-grams present in the normal stream.While monitoring, it checks if all the 2-grams present inthe traffic are also present in the normal 2-gram list. Inthe event that the IDS finds a 2-gram that was not presentin normal traffic, IDS raises an alarm. Blending theattack packet with the normal traffic requires the attackerto transform the packet such that all the 2-grams in thepacket after substitution are also present in the normal2-gram list. Matching the frequencies of the tuples is atleast as hard as the above simplified problem.

Suppose we have a normal traffic profile (N, TN) andan attack packet description (M, TM ), where N and Mis the set of normal and attack characters, respectively.TN and TM is the set of different 2-grams present innormal traffic and the attack, respectively. Also, theattacker is allowed to do only one-to-one substitutionfrom M to N . Then, blending of the packet translates tofinding a substitution S such that all the tuples in S(w)

are also present the normal profile. That is if a1a2 ∈ TM ,then S(a1a2) ∈ TN .

Theorem 6.3 The problem of finding a one-to-one sub-

stitution S to match 2-grams is NP-complete.

Proof To prove that the problem is in NP-complete,we need to show that the problem is polynomial timeverifiable and NP-hard.

Given a solution substitution S for the 2-gram match-ing problem, we can calculate S(w) in O(‖w‖) steps.For each 2-gram present in S(w), checking if it is presentin TN can be done in O(‖w‖.TM ) steps. Thus, thisproblem is poly-verifiable and consequently in NP.

To show that the problem is NP-hard, we reduce theproblem of sub-graph isomorphism to substitution prob-lem. A sub-graph isomorphism problem is that given twographs G(V, E) and G$(V $, E$), decide whether G$ isa sub-graph of G. Mathematically, we want to checkif there is a mapping S(V $ (→ V ), s.t. ∀(v1, v2) ∈E$, (S(v1), S(v2)) ∈ E.

Suppose, N = V . For each edge e = (v1, v2) ∈ E,add two 2-grams (v1v2, v2v1) in the normal profile (TN ).Suppose M = V $. For each edge e$ = (v1, v2) ∈ E$, weadd two 2-grams (v1v2, v2v1) in the attack profile (TM ).

If the above 2-gram matching problem has a solution,then we can find a mapping S(V $ (→ V ) such that forall 2-grams (a1a2) ∈ TM , S(a1a2) ∈ TN . Since the 2-grams in TM correspond to edges in G$ and the 2-gramsin TN correspond to edges in G, the above statementsuggests that ∀e$ ∈ G$, S(e$) ∈ G. This means thatgraph G$ is isomorphic to a sub-graph of G with mappinggiven by S.

Also, if there does not exist a solution to the 2-gram matching problem, then there does not exists asubstitution St such that G$ is a sub-graph of G aftersubstitution. Otherwise, St will result in a successful 2-gram mapping.

Thus, the 2-gram matching problem is at least as hardas the sub-graph isomorphism problem. It is known thatthe sub-graph isomorphism problem is NP-complete.Also, we have already proved that the 2-gram matchingproblem is in NP. Thus, the 2-gram matching problem isNP-complete.

Even if an IDS allows constant number of mismatches,it can be shown that the problem still remains NP-complete. This is followed by the result that sub-graphisomorphism with constant number of edge insertion,deletion, and substitution is also NP-complete. Thismeans that an attacker cannot get the substitution thatwill match the normal profile with a small constantnumber of mismatched 2-grams. Also, the one-to-onesubstitution problem can be easily reduced to one-to-many substitution. Thus, solving one-to-many substitu-tion is also hard.

P olym orp h ic B len d in g A ttack swenke.gtisc.gatech.edu/papers/usenix_security_2006.pdfP olym...

Documents

Transcript of P olym orp h ic B len d in g A ttack swenke.gtisc.gatech.edu/papers/usenix_security_2006.pdfP olym...