Saturday 31 August 2013

Advanced grepping through directory trees with binary data |Jakob Lell's Blog

When reverse engineering stuff you often get a directory tree with a whole bunch of files (both binaries and text files) and you want to quickly find all occurrences of the keywords you are interested in. Typical examples of this problem are program directories of software, extracted apps or the root filesystem of an embedded system.

When reverse engineering Linux-based firmware images you usually start by extracting the root filesystem (or initrd) so that you can analyze the userspace programs, scripts and configuration files. There are already some good tutorials ([1] [2] [3]) and tools like binwalk and firmware-mod-kit which automate many steps of finding/extracting the root filesystem from a binary firmware image. However, once you have obtained the root filesystem, you often find a whole bunch of files and it can be quite difficult to locate the interesting stuff to analyze. For instance, you might find a juicy configuration variable in /etc and want to find all references to this configuration variable in the firmware.

The normal grep utility does a great job at analyzing text files but it is not nearly as useful for binary files, which may still contain the keyword you are looking for. By default grep only states whether the keyword is present in a binary file and does not display the context around the keyword (as it does for text files). Forcing grep to treat binaries as text files using the -a option does not solve the problem either, since grep will then output a whole bunch of binary data before and after the match up to the next newline, and you probably don't want to see this binary data in your terminal.

But fortunately there are a lot of useful standard tools available on a Linux system and you can cleverly combine them to overcome this limitation. I have come up with the following command for grepping through directory trees:

find . -type f -print0|xargs -0 strings -a --print-file-name|grep -i -E ':.*your_keyword_here'|less -S

The find command just searches the current directory recursively for files and prints the filenames to standard output separated by a null byte. Using a null byte instead of a newline makes sure that it doesn't fail if filenames in the tree contain special characters such as spaces or newlines. Using the filter "-type f" makes sure that it only finds regular files and not directories, symlinks, devices or unix domain sockets, which may exist in your directory as well and would cause problems with the following tools.
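To see why the null byte matters, here is a minimal sketch that counts files both ways. The temporary directory and the file names are made up for this demo; only find, tr and wc are used.

```shell
# Demo tree with awkward names (all names here are invented for illustration)
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.bin" "$demo_dir/with space.txt"
touch "$demo_dir/$(printf 'with\nnewline.cfg')"
mkdir "$demo_dir/subdir"   # a directory, skipped by -type f

# Newline-separated output: the name containing a newline is counted twice
newline_count=$(find "$demo_dir" -type f -print | wc -l)

# Null-separated output: exactly one null byte per file
null_count=$(find "$demo_dir" -type f -print0 | tr -cd '\0' | wc -c)

echo "newline-separated: $newline_count, null-separated: $null_count"
rm -rf "$demo_dir"
```

There are three regular files in the tree, and only the null-separated count agrees with that.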

The output of find is piped to xargs, which will call the command strings for all files found by the find command. The option "-0" tells xargs that the input is separated by null bytes instead of newlines. The program strings looks through each file and outputs all sequences of at least four printable characters. Since grep processes the output of strings and not the actual files, grep can't show the filename of a match (as it does when using grep to recursively search in a directory). Because you typically want to know in which files your search results are, you can use the option --print-file-name of strings so that the output includes the filename as well. The -a option of strings tells it to parse the whole file and not only certain sections of ELF files.
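A small sketch of what this stage of the pipeline produces, assuming the strings program from GNU binutils is installed. The file blob.bin and its content are invented for the demo.

```shell
# Build a tiny "binary": a readable string surrounded by non-printable bytes
demo_dir=$(mktemp -d)
printf '\001\002\003config_update_url=http://example.invalid/fw\000\377\376' \
    > "$demo_dir/blob.bin"

# Each extracted string is prefixed with "<filename>: "
extracted=$(cd "$demo_dir" && find . -type f -print0 \
    | xargs -0 strings -a --print-file-name)
echo "$extracted"
rm -rf "$demo_dir"
```

The filename prefix is what allows a later grep on this output to tell you where a match came from.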

The next step is to use grep to filter the output of strings in order to search for a specific keyword. If you don't want to search case-insensitively, you'll have to remove the -i option of grep. Using the pattern ':.*' before the actual keyword makes sure that it won't flood your search results with all strings of a file if the filename (which is prepended by the --print-file-name option of strings) already contains the keyword you are searching for.
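The effect of the ':.*' prefix can be tried without any binaries at all. The sample lines below are invented, but they follow the "filename: string" layout that strings produces with --print-file-name.

```shell
# Invented sample in the "filename: string" format
sample='./etc/passwd_backup: root:x:0:0
./etc/passwd_backup: nobody:x:99:99
./bin/login: /etc/passwd
./bin/login: usage: login [user]'

# Without the prefix, every string from passwd_backup matches,
# because the keyword is part of the filename
naive=$(printf '%s\n' "$sample" | grep -c -i -E 'passwd')

# With the prefix, the keyword has to appear after a colon,
# so only the real reference inside ./bin/login is left
anchored=$(printf '%s\n' "$sample" | grep -i -E ':.*passwd')

echo "matches without prefix: $naive"
echo "$anchored"
```

Note that the prefix only requires a colon somewhere before the keyword, so a path that itself contains a colon ahead of the keyword would still slip through.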

Last but not least I recommend piping the results to less -S so that less will only use one line of the screen per result. This makes the results easier to interpret, especially if you have really long lines in the results (which sometimes happens with firmware images) and you don't want to have a hundred lines of wrapped text for one single search result. You can still see the full output lines by scrolling horizontally in less (or just use the search function of less to navigate to the actual keyword).

The search can take some time, especially for large directory trees. In that case you can easily speed up the process by saving the output of strings to a file:

find . -type f -print0|xargs -0 strings -a --print-file-name > /tmp/strings.txt

These intermediate results can then be used for multiple searches:

cat /tmp/strings.txt|grep -i -E ':.*your_keyword_here'|less -S

A test with the 2.1 GB /usr/lib/ directory on my laptop created a 1.2 GB strings.txt, and searching this file takes some 10 seconds provided that it is still cached in memory.
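If you search the saved file often, a tiny wrapper function saves some typing. This is only a sketch: search_strings is a hypothetical name I made up, and it assumes the dump lives in /tmp/strings.txt unless a second argument says otherwise.

```shell
# search_strings is a hypothetical helper name, not part of any tool.
# Usage: search_strings KEYWORD [DUMP_FILE]
search_strings() {
    keyword=$1
    dump=${2:-/tmp/strings.txt}
    # -e keeps grep from treating an odd-looking pattern as an option;
    # grep reads the file directly, so no cat is needed
    grep -i -E -e ":.*$keyword" "$dump"
}
```

It can then be used as search_strings your_keyword_here | less -S. The keyword is passed to grep as an extended regular expression, so special characters in it keep their regex meaning.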

The same commands can also be used for other reversing tasks such as program directories, extracted apps or even web applications (which may also contain binary files like sqlite databases).

If you expect other character encodings such as utf16 (which is quite common for Windows programs), you will need to use the -e option of strings. The following command tries ascii/utf8, utf16 and utf32:

for enc in S l L
do
 find . -type f -print0|xargs -0 strings -e $enc --print-file-name
done > /tmp/strings.txt
cat /tmp/strings.txt|grep -i -E ':.*your_keyword_here'|less -S
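A quick way to convince yourself that the -e option matters is to build a UTF-16 string by hand and compare the results. This sketch assumes GNU strings; the file wide.bin and the word "secret" are made up for the demo.

```shell
demo_dir=$(mktemp -d)
# "secret" as UTF-16LE: every ASCII byte is followed by a null byte
printf 's\000e\000c\000r\000e\000t\000' > "$demo_dir/wide.bin"

# Default encoding: no run of four printable bytes, so nothing is found
# (|| true because grep -c exits non-zero when the count is 0)
default_hits=$(strings -a "$demo_dir/wide.bin" | grep -c 'secret' || true)

# -e l reads 16-bit little-endian characters and recovers the word
wide_hits=$(strings -a -e l "$demo_dir/wide.bin" | grep -c 'secret' || true)

echo "default: $default_hits, -e l: $wide_hits"
rm -rf "$demo_dir"
```

The l and L encodings are little-endian; for big-endian data strings also accepts b (16-bit) and B (32-bit).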
