Cobra: Fine-grained Malware Analysis using Stealth Localized-executions


[...] engine. The W32/Ratos employs a very intelligent mechanism in that it uses the privilege level indicator in the segment selectors as a part of its internal computations (lines 3–9, Figure 10b). Thus, if [...]

[...] of analysis at different times for a given code stream. Therefore, we will concentrate on presenting the performance of Cobra based on analysis sessions with a Windows-based trojan, W32/Ratos (see
[Figure 11: four stacked-bar charts with one bar per analysis range (AR = Analysis Range), AR-1 to AR-4. Legend: latency due to block creations, xfer-stubs, stealth implants, and block purging. x-axis: latency in clock cycles (×10^7), 0 to 250. Panels: (a) Normal; (b) With block-coalescing; (c) With block-coalescing and skipping on standard code streams; (d) With block-coalescing and skipping on standard and non-standard code streams.]

Figure 11. (a)-(d) Performance of Cobra on Analysis Sessions with the W32/Ratos
Section 5). The performance of the framework for other analysis sessions can be estimated in a similar fashion.

Before we proceed to present the performance measurements of Cobra, a few words regarding the test-bench are in order. To validate Cobra, we make use of our prototype malware analysis environment, WiLDCAT. The current version of WiLDCAT employs Cobra for its functioning and runs under the Windows OS (9x and XP) on the IA-32 (and compatible) processors. For test purposes, an Intel 1.7 GHz processor with 512 MB of memory was used.

We divide the total run-time latency of Cobra into: (a) latency due to block creations, (b) latency due to xfer-stubs, (c) latency due to block purging, and (d) latency due to stealth implants. Readings were taken at various points within WiLDCAT and Cobra to measure these overheads. We use processor clock cycles as the performance metric for the run-time latency. This metric is chosen because it does not vary across processor speeds and because it is a standard in the literature related to micro-benchmarks. The processor performance counter registers were used to measure the clock cycles by using the RDMSR instruction.

Figure 11 shows the performance of Cobra under different analysis methods for various analysis ranges. The analysis ranges were chosen from our analysis sessions involving the encryption and decryption engine of the W32/Ratos. The choice of the analysis ranges (single-step handler, parts of first and second decryption layers) was such that their semantics are relatively constant and they occur for every instance of the trojan deployment. This allows us to obtain a deterministic performance measure of various aspects of the framework. For the graphs in Figure 11, the y-axis (category axis) represents the analysis ranges, and the x-axis is the number of extra clock cycles that are incurred as opposed to the native run-time of that particular range. Also, the data label next to each of the stacked bars in all the graphs represents the percentage of normalized latency and its corresponding time in seconds for a 1.7 GHz Intel processor.

Figure 11a shows the performance of Cobra when run normally (without applying performance enhancement techniques such as block-coalescing and/or skipping). As seen from the graph of Figure 11a, the latencies due to block-creations and xfer-stubs are present in every analysis range (analysis ranges 1–4 in this case) and form a major portion of Cobra's overall latency since these elements form the backbone of the framework. Latency due to block-purging only comes into effect when an analysis range involves self-modifying code (analysis ranges 1–3 in this case) and is due to the fact that the framework invalidates the blocks corresponding to the modified code regions. Latency due to stealth-implants occurs when Cobra needs to patch a block in order to prevent its detection. This is shown in analysis ranges 1, 2 and 4, which contain W32/Ratos anti-analysis code fragments. In general, Cobra incurs lower overall latency when the ratio of straight-line instructions to branches and loops is greater over a localized code region, as exemplified by analysis range 4. In other cases the overall latency of the framework depends upon the number and nature of the branches encountered in the code stream. As an example, analysis range 2 incurs a latency as high as 5 times the normal execution time due to a high amount of code obfuscation via jumps. This is due to the increased block-creation and xfer-stub overheads for such blocks, which ensure that the obfuscation is tackled completely during block execution (see Section 4.1.2).

Figure 11b shows the latency of Cobra employing block-coalescing on the same analysis ranges. Block-coalescing helps in reducing the latency due to xfer-stubs when analyzing code involving loops over instruction blocks. As seen from the graph in Figure 11b, analysis ranges 2 and 3, which contain a large number of loops, incur a much lower overall latency with block-coalescing when compared to their overall latency without block-coalescing (Figure 11a). However, for analysis range 1 there is negligible gain in performance with block-coalescing, since the number of code loops is very small in its case. A point to note is that the latency of analysis range 4 with block-coalescing is more than its latency without block-coalescing in Figure 11a. This is due to the fact that the W32/Ratos generates varying amounts of code for a given functionality and embeds a random amount of anti-analysis code fragments in different instances of its deployment due to its metamorphic nature.

Block-skipping helps to further reduce the overall latency by excluding standard and/or already analyzed code streams from the analysis process. Figure 11c shows the performance of Cobra with
block-coalescing and block-skipping applied to standard kernel code streams. As seen from the graph of Figure 11c, the latencies of analysis ranges 1 and 2 are reduced further when compared to their latencies without block-skipping in Figure 11b. This is because the code streams in analysis ranges 1 and 2 invoke standard kernel functions such as VirtualProtect, KeSetEvent, KeRaiseIrql, etc., which are excluded from the slicing process with block-skipping. However, analysis range 4 shows negligible improvement since it does not involve any calls to standard code streams. Figure 11d shows the performance of Cobra with block-coalescing and block-skipping on standard as well as already analyzed malware code streams. As an example, the single-step handler always invokes the code block
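
The latency accounting used in this section can be illustrated with a small sketch. It assumes that the extra clock cycles for each of the four overhead categories have already been collected, and that "normalized latency" means extra cycles relative to the native run-time of the range; both the container name `LatencyBreakdown` and the sample numbers are hypothetical and are not taken from the measurements reported in Figure 11. The sketch only shows how a cycle count could map to the percentage and seconds shown in a data label for a 1.7 GHz processor.

```python
from dataclasses import dataclass

CLOCK_HZ = 1.7e9  # 1.7 GHz test processor mentioned in the text


@dataclass
class LatencyBreakdown:
    """Extra clock cycles per overhead category (hypothetical container)."""
    block_creations: int
    xfer_stubs: int
    block_purging: int
    stealth_implants: int

    def total_cycles(self) -> int:
        # Total run-time latency is the sum of the four categories.
        return (self.block_creations + self.xfer_stubs
                + self.block_purging + self.stealth_implants)

    def seconds(self, clock_hz: float = CLOCK_HZ) -> float:
        # Convert extra cycles to wall-clock time at a given clock rate.
        return self.total_cycles() / clock_hz

    def normalized_percent(self, native_cycles: int) -> float:
        # Assumption: normalized latency = extra cycles / native cycles.
        return 100.0 * self.total_cycles() / native_cycles


# Made-up numbers, chosen only to keep the arithmetic easy to follow.
sample = LatencyBreakdown(
    block_creations=900_000_000,
    xfer_stubs=600_000_000,
    block_purging=150_000_000,
    stealth_implants=50_000_000,
)
native = 850_000_000  # hypothetical native run-time of the range, in cycles

print(f"({sample.normalized_percent(native):.0f}% , {sample.seconds():.2f}s)")
```

Keeping the categories separate rather than storing a single total mirrors the stacked bars of Figure 11, where each segment corresponds to one overhead category.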
