Od Hexdump



Karina Edling


Aug 3, 2024, 12:56:24 PM

to onritote

The file command can tell what this file is from its first 8 bytes. The libpng specification tells programmers what to look for: you can see that the first 8 bytes of this image file contain the string PNG. That is significant because it reveals how the file command knows what kind of file to report.
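The magic-number idea can be shown with a minimal Python sketch that compares the first 8 bytes of a file against the PNG signature. The names `PNG_SIGNATURE`, `looks_like_png` and `sniff` are hypothetical. The real file command consults a much larger database of patterns than this single comparison.

```python
# The 8-byte PNG signature, as given in the PNG specification.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(header: bytes) -> bool:
    """Return True if the first 8 bytes match the PNG signature."""
    return header[:8] == PNG_SIGNATURE


def sniff(path: str) -> bool:
    # Only the first 8 bytes matter for this check.
    with open(path, "rb") as f:
        return looks_like_png(f.read(8))


print(looks_like_png(PNG_SIGNATURE + b"\x00\x00\x00\rIHDR"))  # True
print(looks_like_png(b"GIF89a\x01\x00"))                      # False
```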

First, you want hexdump to process the PNG file in 8-byte chunks. Furthermore, the PNG specification documents the signature bytes as decimal values, which the hexdump documentation says are represented by %d.
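Here is a small Python sketch of the same idea. It splits the data into 8-byte chunks and prints each byte as a decimal number, which is how the PNG specification lists the signature. `dump_decimal` is a hypothetical helper. Python bytes are unsigned, so the values come out in the 0-255 range.

```python
def dump_decimal(data: bytes, width: int = 8) -> list:
    """Format data as lines of `width` decimal byte values."""
    lines = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        lines.append(" ".join(str(b) for b in chunk))
    return lines


signature = b"\x89PNG\r\n\x1a\n"
for line in dump_decimal(signature):
    print(line)  # 137 80 78 71 13 10 26 10
```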

Hexdump is a fascinating tool that not only teaches you more about how computers process and convert information, but also about how file formats and compiled binaries function. You should try running hexdump on files at random throughout the day as you work. You never know what kinds of information you may find, nor when having that insight may be useful.


As others have pointed out, this is because hexdump -x treats the file as containing 2-byte words. On little-endian systems (as almost all desktops are), the two bytes of each word are swapped before they are displayed: the byte values are printed in pairs, and the order of the bytes in each pair is reversed. Since you have an odd number of bytes, hexdump adds a zero byte to complete the final pair, and that zero is then swapped with the 0a. This is documented behaviour for hexdump, so it is not lying to you!
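A minimal Python sketch of the behaviour described above, assuming a little-endian host. It pads an odd-length input with a zero byte, then reads 2-byte little-endian words. `dump_x` is hypothetical. It reproduces only the word values, not hexdump's offset column or exact spacing.

```python
import struct


def dump_x(data: bytes) -> str:
    """Approximate the word values hexdump -x prints on a little-endian host."""
    if len(data) % 2:
        data += b"\x00"  # complete the final pair with a zero byte
    # "<" = little-endian, "H" = unsigned 16-bit word.
    words = struct.unpack("<%dH" % (len(data) // 2), data)
    return " ".join("%04x" % w for w in words)


print(dump_x(b"\xcf\x9e\n"))            # 9ecf 000a
print(dump_x(b"\xff\x11\x11\x11\x11"))  # 11ff 1111 0011
```

The second call uses the five bytes from the assembler question in this thread. It yields exactly the 11ff 1111 0011 that surprised its author.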

hexdump -C is a better command for getting formatted output that shows the bytes in the order they appear in the file. Also, the 0a is a newline and was probably added silently by whatever created the file (vim does this by default). For example, echo always adds a newline unless you tell it not to; in bash, echo -n suppresses it.
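For comparison, here is a rough Python approximation of one hexdump -C line. It prints each byte in file order, followed by a printable-ASCII column in which non-printing bytes show as '.'. `canonical_line` is a hypothetical helper. Its spacing is modelled on hexdump's canonical layout and is not guaranteed to match it byte for byte.

```python
def canonical_line(offset: int, chunk: bytes) -> str:
    """Roughly format up to 16 bytes the way hexdump -C lays out a line."""
    hex_bytes = ["%02x" % b for b in chunk]
    left = " ".join(hex_bytes[:8])    # first group of 8 bytes
    right = " ".join(hex_bytes[8:])   # second group of 8 bytes
    # Printable ASCII as-is, everything else as '.'.
    ascii_view = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
    return "%08x  %-23s  %-23s  |%s|" % (offset, left, right, ascii_view)


print(canonical_line(0, b"\xcf\x9e\n"))
```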

hexdump -x displays the values as if they were 2-byte integers. On a little-endian machine this will display each pair of bytes in swapped order, treating them as two-byte quantities with the high-order (second) byte first, followed by the low-order (first) byte.

As you've seen, using hexdump -C displays the actual bytes. The actual contents of your file are the two bytes 0xCF 0x9E, followed by the newline character 0x0A. Vim and ls are correctly telling you that there are 3 bytes (2 characters). The first two bytes comprise one Unicode character using the UTF-8 encoding.
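You can confirm the byte and character counts by decoding the same three bytes as UTF-8 in Python. This is only a quick cross-check, not how Vim or ls arrive at their numbers.

```python
data = b"\xcf\x9e\n"           # the bytes hexdump -C shows
text = data.decode("utf-8")    # one multi-byte character plus a newline

print(len(data))                # 3 bytes on disk
print(len(text))                # 2 characters
print("U+%04X" % ord(text[0]))  # U+03DE
```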

As a big-endian value, that would be 2^9 (512) + 2^24 (16777216). This is what I mean by us "thinking" in big-endianness. When we write out a binary number we use big-endian bit order (one byte 00000010 == 2), so when the number is longer than one byte, we also use big-endian byte order (two bytes 0000000000000010 == 2).
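Python's `int.from_bytes` takes the byte order as an argument, which makes the difference between the two orders easy to see. With the two bytes 00 02, big-endian gives 2 and little-endian gives 512 (2^9).

```python
two_bytes = b"\x00\x02"

# Big-endian: most significant byte first, the way we write numbers.
print(int.from_bytes(two_bytes, "big"))     # 2
# Little-endian: least significant byte first.
print(int.from_bytes(two_bytes, "little"))  # 512
```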

I was expecting an output of ff11 1111 11, but looking at it in hexdump showed me this: 11ff 1111 0011. At first I was confused and thought maybe I had discovered some obscurity in my assembler (obviously I have not used the .align directive here, so this code would be incorrect in real-life use, and I thought the assembler might be doing something weird because of this). However, when I checked the output with the program hexedit (if you are unfamiliar with it, it is just a simple command-line hex editor), it showed me what I expected (ff 11 11 11 11). Does anyone know why I am getting this odd output? Is this a bug in hexdump, or does hexdump not behave the way I expect for some other reason?

I was working through my most recent class, Application Security, and one of the exercises required us to find a secret message hidden in an image. Now, I know you can do this manually with hexdump -C. That output looks something like this:

This is fine unless your image is huge or your secret message has a bunch of garbage bytes mixed into it for extra secrets. So I was trying to look up a way to get it to just kick out the ASCII output on its own so I could use other tools like grep to search through it, when I stumbled over a reference to the strings command. What is the strings command?
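The strings command prints the runs of printable characters it finds in binary data. Below is a simplified Python sketch of its default behaviour, which keeps runs of at least 4 printable ASCII characters. `ascii_strings` is a hypothetical name, and the real tool offers more options, such as other character encodings. The sample bytes are made up for illustration.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least min_len printable ASCII characters."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len  # space through tilde
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]


sample = b"\x89PNG\r\n\x1a\n\x00\x00secret message\x00\xff\xfeab"
print(ascii_strings(sample))  # ['secret message']
```

Because the result is plain text, it can be piped to grep just as the strings output can.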

Huh, I've never seen text just appended to an image or binary to hide it.
They usually just modify the least significant bit of each color channel in each pixel and use those to construct a new binary, so 2 pixels per byte/character (r, g, b, a).
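Here is a minimal Python sketch of extracting such a hidden message. It assumes the pixel data is already a flat list of channel values (r, g, b, a, r, g, b, a, ...) and that bits are packed most significant bit first. Real steganography tools vary on both points, and `extract_lsb_message` is a hypothetical helper.

```python
def extract_lsb_message(channels: list) -> bytes:
    """Rebuild bytes from the least significant bit of each channel value.

    Assumes 8 channels (2 RGBA pixels) per byte, most significant bit first.
    """
    out = bytearray()
    for i in range(0, len(channels) - 7, 8):
        byte = 0
        for value in channels[i:i + 8]:
            byte = (byte << 1) | (value & 1)  # append this channel's LSB
        out.append(byte)
    return bytes(out)


# Hide b"Hi" in 16 channel values (4 RGBA pixels), then recover it.
bits = [int(b) for c in b"Hi" for b in format(c, "08b")]
pixels = [200 | bit for bit in bits]  # 200 is even, so its LSB becomes `bit`
print(extract_lsb_message(pixels))    # b'Hi'
```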

Yeah, I was looking for a quick and dirty way to do an example, but you're right. There are a lot more common and better ways to hide text in images. I knew they were out there, but I appreciate you explaining it here. Thanks!

Hexdump's ability to read binary data and format it appropriately so it can, for example, be piped to awk is very useful, but I regularly need to read files in which the binary data has a different endianness from the one native to the system. In particular, I need to read big-endian data on a little-endian machine. My ideal solution would be hexdump with a switch to reverse the endianness, but such a switch doesn't seem to exist.
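If hexdump itself has no such switch, one workaround is to decode the data elsewhere and pipe the result to awk. This Python sketch reads big-endian unsigned 32-bit values regardless of the host's byte order. `read_be_uint32` is hypothetical and assumes the values are 32 bits wide.

```python
import struct


def read_be_uint32(data: bytes) -> list:
    """Decode data as big-endian unsigned 32-bit integers.

    The '>' prefix forces big-endian order whatever the host's native order;
    any trailing bytes that do not fill a full value are ignored.
    """
    count = len(data) // 4
    return list(struct.unpack(">%dI" % count, data[:count * 4]))


raw = bytes([0x00, 0x00, 0x01, 0x00, 0xDE, 0xAD, 0xBE, 0xEF])
for value in read_be_uint32(raw):
    print(value)  # 256, then 3735928559
```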

What you are seeing there is the binary contents of the mydata file shown as lines of eight 16-bit values in hexadecimal. The first number on each line is the offset in the file of the first of the eight values that follow on that line. The * indicates that every line from 0000030 to 0001020 inclusive is identical to the 0000020 line. Collapsing them keeps the output condensed for large files with long, aligned repeating sequences. Be aware that, with no command-line options, hexdump dumps the entire file to the screen, so use it with care on large files.
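The following Python sketch approximates that default layout on a little-endian host: a 7-digit hex offset, eight 16-bit words per line, and a single * in place of repeated lines. `default_dump` and its `squeeze` parameter are hypothetical. Setting `squeeze=False` corresponds to what hexdump's -v option does. Spacing may differ slightly from real hexdump output.

```python
import struct


def default_dump(data: bytes, squeeze: bool = True) -> list:
    """Approximate hexdump's default output (16 bytes per line)."""
    lines = []
    previous = None
    starred = False
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        if squeeze and chunk == previous:
            if not starred:          # print one '*' per run of repeats
                lines.append("*")
                starred = True
            continue
        previous, starred = chunk, False
        padded = chunk + b"\x00" * (len(chunk) % 2)  # complete an odd last word
        words = struct.unpack("<%dH" % (len(padded) // 2), padded)
        lines.append("%07x " % offset + " ".join("%04x" % w for w in words))
    lines.append("%07x" % len(data))  # final line: total length as an offset
    return lines


data = b"\x01\x00" * 8 + bytes(48)
for line in default_dump(data):
    print(line)
```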

If you have a really big file where you are only interested in seeing some of the data you can use the -n option to specify how many bytes to dump. The -n 32 in the following example causes hexdump to show only 32 bytes of the file.

If you are only interested in the data some distance into the file, you can use the -s option to specify how far into the file to start dumping. The -s 16 in the following example causes hexdump to show the content of the file beginning 16 bytes from the start and continuing to the end of the file.

You can also combine these two switches to see a subset of the data some offset into the file. In this example, using both -s 16 and -n 32 causes hexdump to show 32 bytes of the file beginning 16 bytes from the start.
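In Python terms, -s and -n amount to a seek followed by a bounded read. This sketch uses a hypothetical `read_range` helper and an in-memory buffer as a stand-in for the mydata file. Seeking, rather than reading from the beginning, keeps it cheap on large files.

```python
import io
from typing import BinaryIO, Optional


def read_range(f: BinaryIO, skip: int = 0, length: Optional[int] = None) -> bytes:
    """Return the bytes selected by the equivalent of -s skip and -n length."""
    f.seek(skip)                      # like -s: start this far into the file
    return f.read() if length is None else f.read(length)  # like -n


f = io.BytesIO(bytes(range(64)))      # stand-in for open("mydata", "rb")
print(read_range(f, length=32)[-1])          # 31: last byte of the first 32
print(read_range(f, skip=16)[0])             # 16: dumping starts at offset 16
print(len(read_range(f, skip=16, length=32)))  # 32 bytes, offsets 16..47
```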

You can stop hexdump from replacing duplicate lines with a * by using the -v option. In the following example -v causes the previously collapsed duplicate rows at offsets 30 hex and 40 hex to be displayed:

Now the structure is beginning to show. After that I get to the values that will be useful in mounting the file systems and for which single byte output is not useful. The start sector number (logical block address, or LBA) and length (in sectors) for the partition are given as 32 bit values:

Note that the latest change adds a 2/4 count specifier, which means 2 instances of a 4-byte (32-bit) value. The %08x format means output the value as an 8-digit hexadecimal number with leading zeros. Since the LBA values have numeric significance, they will be more useful in decimal and without the zero padding.
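To cross-check what hexdump prints, here is a Python sketch that decodes one partition entry. It assumes the classic MBR layout: four 16-byte entries starting at offset 446 (0x1BE), each holding a little-endian 32-bit start LBA and sector count at offsets 8 and 12. The entry bytes and the helper name `parse_partition_entry` are hypothetical. The two print lines contrast the %08x and %d styles.

```python
import struct


def parse_partition_entry(entry: bytes) -> dict:
    """Decode one 16-byte MBR partition table entry.

    Layout (little-endian): boot flag (1 byte), CHS start (3), partition
    type (1), CHS end (3), start LBA (4), length in sectors (4).
    """
    boot, _chs_start, ptype, _chs_end, lba, sectors = struct.unpack("<B3sB3sII", entry)
    return {"boot": boot, "type": ptype, "lba": lba, "sectors": sectors}


# Hypothetical entry: bootable, type 0x83, starting at LBA 2048, 1048576 sectors.
entry = bytes([0x80, 0, 0, 0, 0x83, 0, 0, 0]) + struct.pack("<II", 2048, 1048576)
info = parse_partition_entry(entry)
print("%08x %08x" % (info["lba"], info["sectors"]))  # 00000800 00100000
print("%d %d" % (info["lba"], info["sectors"]))      # 2048 1048576
```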

Not all values in a binary file are numeric, the file may also contain text. hexdump provides a way to show that too. Unlike the C printf family of functions, you have to know in advance how long the text is and there are many ways to achieve the same thing. Going back to the ELF executable file that I started with, you could just dump the signature as-is:

Note that the first one, using the printf standard %c displayed nothing for the two characters following the ELF but the second one displayed a dot for each character after the ELF. That helps with alignment and it indicates when the data changes from text (printing characters) to non-text (non-printing characters).
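The dot behaviour matches hexdump's %_p conversion, which displays non-printing characters as a single '.'. A minimal Python equivalent is sketched below. It is applied to the first 8 bytes of a typical 64-bit little-endian ELF file, and `printable_or_dot` is a hypothetical name.

```python
def printable_or_dot(data: bytes) -> str:
    """Show printable ASCII as-is and every other byte as '.'."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)


# ELF magic (0x7f 'E' 'L' 'F'), then class, data, version and OS ABI bytes.
elf_start = b"\x7fELF\x02\x01\x01\x00"
print(printable_or_dot(elf_start))  # .ELF....
```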

Once again you can combine it with other format options to look at sequential elements of the file that have different formats. The default length of the %f conversion is 8 bytes so you can actually drop the 1/8 from the example because there are no other format values and the dump length is 8. If the number has too many digits then you can show it in exponential format:
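Here is a Python sketch of the same two views of an 8-byte floating point value, assuming little-endian IEEE 754 doubles. `struct` reads the 8 bytes, and %e switches to exponential notation. The sample value is arbitrary.

```python
import struct

raw = struct.pack("<d", 6.02214076e23)  # one double occupies 8 bytes
(value,) = struct.unpack("<d", raw)

print(len(raw))       # 8
print("%f" % value)   # fixed-point: a long run of digits
print("%e" % value)   # 6.022141e+23
```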

You can add the same to your own decoding by adding an _a prefix to any of the integer numeric format types: x, d, o. Suppose the sample file had 8 floating point values 4091 bytes into the file and I wanted to know the location of each one.

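The 4155 is the effect of the %_Ad; the extra indent was deliberate. Unlike _a, which prints the offset of the current byte, _A is applied only once, after all input has been processed. It therefore shows the offset just past the data read: 4091 + 8 × 8 = 4155. The _A option can be useful when you are building output from multiple hexdump command lines and want to know what -s value to use on the next command without having to calculate it.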
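The offset arithmetic can be sketched in Python. The per-value offset stands in for %_ad, and the returned end offset stands in for %_Ad. The file contents and the helper `doubles_with_offsets` are hypothetical, and little-endian doubles are assumed.

```python
import struct


def doubles_with_offsets(data: bytes, start: int, count: int):
    """Return [(offset, value), ...] plus the offset just past the last value."""
    rows = []
    for i in range(count):
        off = start + 8 * i                          # like %_ad
        (value,) = struct.unpack_from("<d", data, off)
        rows.append((off, value))
    return rows, start + 8 * count                   # like %_Ad


# Hypothetical file: 4091 filler bytes followed by 8 doubles 0.0 .. 7.0.
data = bytes(4091) + struct.pack("<8d", *[float(i) for i in range(8)])
rows, end = doubles_with_offsets(data, 4091, 8)
for off, value in rows:
    print("%d %f" % (off, value))   # 4091 0.000000, 4099 1.000000, ...
print(end)                          # 4155
```

The printed end offset is the -s value for a follow-up command.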

When specifying the field widths after a % you can use another syntax to describe the overall width to output the value in and how many digits of output to include. For example, the following are equivalent:

In the second one I replaced %08x with %8.8x. In the second syntax, the number before the dot is the overall width for the field and the number after the dot is the number of digits of output to generate. The value is output right justified if the first number is larger than the second. So, we could have the 8 digit hex value in a 10 character field:
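Python's printf-style % operator follows the same width.precision convention for integers. That makes it a convenient place to try the formats out before using them with hexdump.

```python
value = 0x800

print(repr("%08x" % value))    # '00000800'   zero-padded to 8 digits
print(repr("%8.8x" % value))   # '00000800'   width 8, at least 8 digits
print(repr("%10.8x" % value))  # '  00000800' 8 digits right-justified in 10
```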

I want to store a raw hexdump in a .pcap file instead of a .txt file. Is there a header/format available that I can use, so that the raw hexdump along with that header can be read as a pcap file by Wireshark?

You can either call text2pcap directly from your C application, or, if you observe the licence for text2pcap (GPL v2.0 or later) then re-use the code in text2pcap to achieve the same but note that it has dependencies on other parts of the wireshark suite.
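If calling text2pcap is inconvenient, the classic libpcap file format is simple enough to write directly. It consists of a 24-byte global header, then a 16-byte record header before each packet. This Python sketch assumes the hexdump holds Ethernet frames (link-layer type 1). `write_pcap` and the sample bytes are hypothetical, and a real tool would record each packet's actual capture time.

```python
import io
import struct
import time
from typing import BinaryIO, List


def write_pcap(f: BinaryIO, packets: List[bytes], linktype: int = 1) -> None:
    """Write raw packets to a classic (non-pcapng) libpcap stream."""
    # Global header: magic, version 2.4, thiszone, sigfigs, snaplen, link type.
    f.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, linktype))
    for pkt in packets:
        now = time.time()
        sec = int(now)
        usec = int((now - sec) * 1_000_000)
        # Record header: timestamp (s, us), captured length, original length.
        f.write(struct.pack("<IIII", sec, usec, len(pkt), len(pkt)))
        f.write(pkt)


# bytes.fromhex accepts the space-separated pairs a hexdump typically shows.
frame = bytes.fromhex("ff ff ff ff ff ff 00 11 22 33 44 55 08 00")
buf = io.BytesIO()
write_pcap(buf, [frame])
print(len(buf.getvalue()))  # 24 + 16 + 14 = 54
```

To produce a file Wireshark can open, pass a real file object instead, for example `with open("out.pcap", "wb") as f: write_pcap(f, [frame])`.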