storage engine: understanding the row format

Christoph Rupp

May 19, 2016, 8:16:04 AM

Hi,

If a row has variable-length blobs (fields of type
MYSQL_TYPE_VARCHAR), then the serialized row reserves the full declared
length of the blob, even if most bytes are unused. In such cases, I'd
like to "compress" the row before writing it to disk.

However, I have some difficulty understanding the row format. The
following two CREATE TABLE statements are quite similar (the first one
creates an additional index), but the first byte of their rows differs,
and I don't understand why.

CREATE TABLE test (value VARCHAR(30) NOT NULL, INDEX(value),
                   num INTEGER PRIMARY KEY);
INSERT INTO test VALUES("1", 1);

(gdb) x/8b buf
0x7fff5000e660: 1 49 -113 -113 -113 -113 -113 -113

buf[0] stores the length of 'value', buf[1] stores the data of 'value'.

CREATE TABLE test (value VARCHAR(30), num INTEGER PRIMARY KEY);
INSERT INTO test VALUES("1", 1);

(gdb) x/8b buf
0x7fff50013cf0: -2 1 49 -113 -113 -113 -113 -113

Now buf[1] stores the length and buf[2] stores the data of 'value'.

But what is buf[0]?

Is there documentation for the serialized row format?

Thanks
Christoph

PS: here's the code that I currently use:

static inline ups_record_t
pack_record(TABLE *table, uint8_t *buf, uint8_t *arena)
{
  assert(!row_is_fixed_length(table));

  uint8_t *src = buf;
  uint8_t *dst = arena;

  // copy the first byte - whatever it is
  // this causes problems because in some cases there is no "first byte"!
  *dst = *src;
  dst++;
  src++;

  for (Field **field = table->field; *field != 0; field++) {
    uint32_t type = (*field)->type();
    uint32_t key_size;  // 32 bits, so that 3- and 4-byte lengths fit
    uint32_t len_bytes;

    if (type == MYSQL_TYPE_VARCHAR) {
      // see Field_blob::Field_blob() (in field.h) - need 1-4 bytes to
      // store the real size
      if ((*field)->field_length <= 255) {
        len_bytes = 1;
        key_size = *src;
      }
      else if ((*field)->field_length <= 65535) {
        len_bytes = 2;
        key_size = *(uint16_t *)src;
      }
      else if ((*field)->field_length <= 16777215) {
        len_bytes = 3;
        // assemble the 3-byte length, assuming little-endian storage
        key_size = src[0] | (src[1] << 8) | (src[2] << 16);
      }
      else {
        len_bytes = 4;
        key_size = *(uint32_t *)src;
      }
    }
    else {
      // fixed-length field: no length prefix, copy the whole field
      len_bytes = 0;
      key_size = (*field)->key_length();
    }

    // copy the length prefix plus the bytes actually in use, then skip
    // the unused remainder of the field in the source row
    ::memcpy(dst, src, key_size + len_bytes);
    src += (*field)->pack_length();
    dst += key_size + len_bytes;
  }

  ups_record_t r = ups_make_record(arena, (uint32_t)(dst - arena));
  return r;
}

--
MySQL Internals Mailing List
For list archives: http://lists.mysql.com/internals
To unsubscribe: http://lists.mysql.com/internals

Øystein Grøvlen

May 19, 2016, 8:35:04 AM

Hi Christoph,

On 19 May 2016 14:15, Christoph Rupp wrote:
> Hi,
>
> If a row has variable-length blobs (fields of type
> MYSQL_TYPE_VARCHAR), then the serialized row stores the full length of
> the blob, even if most bytes are unused. In such cases, i'd like to
> "compress" the row before writing it to disk.

Note that VARCHAR and BLOB are different types. Unlike VARCHAR data,
blob data is not stored in the internal record buffer; it is allocated
in separate buffers.
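
For illustration, here is a minimal sketch of how the in-record VARCHAR
layout from the dumps above could be decoded. It assumes a length prefix of
1 byte when the declared maximum is below 256 bytes and 2 little-endian
bytes otherwise. The helper names varchar_len_bytes and varchar_used_length
are hypothetical and not part of the server API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of length-prefix bytes for a VARCHAR whose
// declared maximum is max_bytes bytes (assumed rule: 1 below 256, else 2).
static inline uint32_t varchar_len_bytes(uint32_t max_bytes)
{
  return max_bytes < 256 ? 1 : 2;
}

// Hypothetical helper: reads the length prefix at p and returns how many
// data bytes are actually in use. Assumes a little-endian prefix.
static inline uint32_t varchar_used_length(const uint8_t *p, uint32_t max_bytes)
{
  assert(p != nullptr);
  if (varchar_len_bytes(max_bytes) == 1)
    return p[0];
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}
```

For the NOT NULL table above, varchar_used_length(buf, 30) would return 1,
so only 2 of the bytes reserved for the column carry information.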

>
> However, i have a few difficulties understanding the row format. The
> following two CREATE TABLE statements are relatively similar (the
> first one creates an additional index). But their first byte differs,
> and I don't understand why.

The index is not the only difference. Nullability of the value column
also differs.

>
> CREATE TABLE test (value VARCHAR(30) NOT NULL, INDEX(value), num
> INTEGER PRIMARY KEY)
> INSERT INTO test VALUES("1", 1);
>
> (gdb) x/8b buf
> 0x7fff5000e660: 1 49 -113 -113 -113 -113 -113 -113
>
> buf[0] stores the length of 'value', buf[1] stores the data of 'value'.
>
> CREATE TABLE test (value VARCHAR(30), num INTEGER PRIMARY KEY)
> INSERT INTO test VALUES("1", 1);
>
> (gdb) x/8b buf
> 0x7fff50013cf0: -2 1 49 -113 -113 -113 -113 -113
>
> Now buf[1] stores the length and buf[2] stores the data of 'value'.
>
> But what is buf[0]?

The additional leading byte(s) record which nullable columns are
NULL. For tables that have no nullable columns, no such byte is
present.
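
As a rough sketch of what this means for packing a row: copy however many
NULL-indicator bytes the table has (possibly zero) instead of always
copying one byte, then trim each VARCHAR to its used length. ColumnDesc and
pack_row are hypothetical stand-ins so that the example is self-contained;
a real engine would take these values from the Field objects. The length
prefix rule (1 byte below 256, otherwise 2 little-endian bytes) is an
assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical column description, standing in for the server's Field.
struct ColumnDesc {
  bool is_varchar;       // true for a VARCHAR column
  uint32_t pack_length;  // bytes the column occupies in the unpacked row
  uint32_t max_bytes;    // declared maximum data length (VARCHAR only)
};

static std::vector<uint8_t>
pack_row(const uint8_t *buf, size_t null_bytes,
         const std::vector<ColumnDesc> &cols)
{
  assert(buf != nullptr);
  std::vector<uint8_t> out;
  const uint8_t *src = buf;

  // Copy the NULL-indicator bytes verbatim; there are none when the
  // table has no nullable columns.
  out.insert(out.end(), src, src + null_bytes);
  src += null_bytes;

  for (const ColumnDesc &col : cols) {
    uint32_t copy_len = col.pack_length;
    if (col.is_varchar) {
      // Assumed layout: 1-byte prefix below 256 bytes, else 2 bytes.
      uint32_t len_bytes = col.max_bytes < 256 ? 1 : 2;
      uint32_t used = src[0];
      if (len_bytes == 2)
        used |= (uint32_t)src[1] << 8;
      assert(len_bytes + used <= col.pack_length);
      copy_len = len_bytes + used;
    }
    out.insert(out.end(), src, src + copy_len);
    src += col.pack_length;  // skip the unused tail of the column
  }
  return out;
}
```

With one NULL-indicator byte, a 1-character VARCHAR(30) and a 4-byte
integer, the packed row shrinks to 7 bytes; for the NOT NULL variant of the
table it is 6 bytes.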

Regards,

--
Øystein

Christoph Rupp

May 19, 2016, 8:43:16 AM

Hi Øystein,

thanks for the fast and helpful reply!

I assume that table->s->null_bytes tells me how many bytes are used to
describe the nullable columns?

Best regards
Christoph

Øystein Grøvlen

May 19, 2016, 8:45:52 AM

Hi,

On 19 May 2016 14:42, Christoph Rupp wrote:
> Hi Øystein,
>
> thanks for the fast and helpful reply!
>
> I assume that table->s->null_bytes tells me how many bytes are used to
> describe the nullable columns?

Correct.
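
A small sketch of inspecting those leading bytes by hand, given their count.
The helper null_bit_is_set is hypothetical, and which bit belongs to which
column is not covered in this thread, so the bit index is left to the
caller; inside the server, Field::is_null() is the safer way to ask.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: tests one bit in the leading NULL-indicator bytes.
// bit_index counts from the least significant bit of the first byte.
static inline bool
null_bit_is_set(const uint8_t *buf, size_t null_bytes, size_t bit_index)
{
  assert(buf != nullptr);
  assert(bit_index < null_bytes * 8);
  return ((buf[bit_index / 8] >> (bit_index % 8)) & 1) != 0;
}
```

For the dump above, where buf[0] is -2 (0xFE), bit 0 is clear and bits 1 to
7 are set.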

--
Øystein