LONG: Variable internals and copy-on-write


Pete Boettcher

Dec 22, 2000, 2:47:33 PM

WARNING: Very long post

Enclosed is a MEX source file that demonstrates some of the internal
memory structure of Matlab variables. This file blatantly ignores
the Matlab API, and pokes around data structures directly! This might
destroy your data, explode your computer, etc. But I don't think it
will :) Seriously, though, save data in open programs before trying
this.

The rest of this post is a detailed description of the storage scheme
for Matlab variables, with an emphasis on the method Matlab uses to
implement "copy-on-write". Familiarity with C MEX is assumed.

The information presented here is only guesswork. It is not endorsed or
verified by The Mathworks, and I make no claims about its accuracy. I
and my employer disclaim all liability and responsibility for anything
that occurs due to use of this information. Use with caution!

The following is a more detailed version of the mxArray structure than
the one found in matrix.h. In that file, some of these fields are
grouped together and named "reserved" or some such.

struct mxArray_tag {
  char name[mxMAXNAM];
  int class;
  int vartype;
  mxArray *crosslink;
  int number_of_dims;
  int nelements_allocated;
  int dataflags;
  int rowdim;
  int coldim;
  union {
    struct {
      void *pdata;
      void *pimag_data;
      void *irptr;
      void *jcptr;
      int reserved;
      int nfields;
    } number_array;
  } data;
};

In general this speaks for itself... read through the source of the
MEX file attached to find the details of each field. However the
following special cases need some extra explanation:

Cell arrays:

The header of a cell array looks identical to the header for a numeric
real array (except the class, of course!). The data for a cell array,
however, consists of an array of pointers to more mxArray structures,
which are the contents of the cell array. This is the intuitive
arrangement if you consider that cell arrays are containers for all
other Matlab array types, including more cell arrays.
(See figures below).
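
As a rough illustration of that nesting, here is a minimal sketch in
plain C that does not touch Matlab at all. The toy_array type and
toy_count_leaves are hypothetical stand-ins invented for this example,
not the real header layout: a cell's data pointer holds an array of
pointers to further headers, which may themselves be cells.

```c
#include <stddef.h>

/* Hypothetical stand-in for an mxArray header; NOT the real layout. */
enum toy_class { TOY_DOUBLE, TOY_CHAR, TOY_CELL };

struct toy_array {
    enum toy_class cls;
    size_t numel;  /* number of elements */
    void *pdata;   /* double[], char[], or (for cells) struct toy_array *[] */
};

/* Count every non-cell array reachable from `a`, descending into
   nested cells. Empty (NULL) cell slots are skipped. */
size_t toy_count_leaves(const struct toy_array *a)
{
    struct toy_array *const *elems;
    size_t total = 0;
    size_t i;

    if (a == NULL)
        return 0;
    if (a->cls != TOY_CELL)
        return 1;

    /* For a cell, the data is an array of pointers to more headers. */
    elems = (struct toy_array *const *)a->pdata;
    for (i = 0; i < a->numel; i++)
        total += toy_count_leaves(elems[i]);
    return total;
}
```

Calling toy_count_leaves on a cell that holds a double array, a nested
cell with one string, and an empty slot visits two leaf arrays.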

Struct arrays:

Structs are stored exactly like cells, except for the field names.
The number of fields is stored in the element nfields, and the
imaginary data pointer (pimag_data) is a (char *) that points to a
list of field names. The first 32 bytes are the first field name
(null-terminated), the second 32 bytes are the second field name, etc.
The values of the fields are stored like a cell array, but interleave
the fields. That is, for a two element structure with 3 fields, 3
pointers for the 3 fields of the first element are stored, followed by
3 more pointers for the second element.
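
The two layouts just described (32-byte name slots, and value pointers
interleaved per element) reduce to simple index arithmetic. The sketch
below is standalone C; the toy_ names and TOY_NAME_LEN are made up for
illustration and assume the 32-byte slot width observed above.

```c
#include <stddef.h>
#include <string.h>

#define TOY_NAME_LEN 32  /* assumed width of one field-name slot */

/* Name of field number `field` in a packed table of 32-byte slots. */
const char *toy_field_name(const char *names, int field)
{
    return names + (size_t)field * TOY_NAME_LEN;
}

/* Position in the pointer array of the value for (element, field).
   All fields of one element are stored together, so the element
   index advances in steps of nfields. */
size_t toy_field_slot(size_t element, int field, int nfields)
{
    return element * (size_t)nfields + (size_t)field;
}

/* Linear search of the name table; returns the field number or -1. */
int toy_find_field(const char *names, int nfields, const char *wanted)
{
    int i;
    for (i = 0; i < nfields; i++)
        if (strncmp(toy_field_name(names, i), wanted, TOY_NAME_LEN) == 0)
            return i;
    return -1;
}
```

For the two-element, three-field example above, the third field of the
second element (zero-based element 1, field 2) lands in slot 5.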

Cross links:

This is the most interesting part of the whole thing! You might have
heard before that Matlab implements a "copy-on-write" algorithm. This
means that if you copy an array, only the header is copied; the data
itself is shared between the two arrays. The data is not duplicated
until one of the arrays is modified, at which point Matlab copies all
of it before applying the change.

Matlab uses what I call "cross links" to implement this. When you
copy an array (using b=a for instance), Matlab creates a new header,
and sets the data pointer (pdata) to point to the same data as the
source array. It then sets the crosslink field of the new header to
the address of the old header, and vice versa. Then, any time an
array is written to, the crosslink field is checked. If it is
non-zero, a copy of the data array is made, both crosslink fields are
zeroed, and the modification continues.

For more than one copy, a circular list is used.
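
To make the mechanism concrete, here is a toy model in standalone C.
It is only a sketch of the scheme as guessed above, not Matlab's
actual code: struct toy_hdr and the toy_ functions are hypothetical,
and ownership of the original data block is deliberately not modeled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical header; NOT the real mxArray. */
struct toy_hdr {
    struct toy_hdr *crosslink;  /* next header sharing pdata, or NULL */
    size_t n;                   /* number of doubles in pdata */
    double *pdata;
};

/* b = a: share a's data and splice b into a's circular list. */
void toy_share(struct toy_hdr *b, struct toy_hdr *a)
{
    b->n = a->n;
    b->pdata = a->pdata;
    b->crosslink = a->crosslink ? a->crosslink : a;
    a->crosslink = b;
}

/* Called before a write: if h shares its data, give it a private copy
   and unlink it from the ring. Returns 0 on success, -1 on failure. */
int toy_unshare(struct toy_hdr *h)
{
    struct toy_hdr *prev = h;
    double *copy;

    if (h->crosslink == NULL)
        return 0;  /* not shared, nothing to do */

    copy = malloc(h->n * sizeof *copy);
    if (copy == NULL)
        return -1;
    memcpy(copy, h->pdata, h->n * sizeof *copy);

    /* Find the header whose crosslink points at h. */
    while (prev->crosslink != h)
        prev = prev->crosslink;

    if (prev == h->crosslink)
        prev->crosslink = NULL;         /* only one sharer is left */
    else
        prev->crosslink = h->crosslink; /* close the ring without h */

    h->crosslink = NULL;
    h->pdata = copy;
    return 0;
}

/* a(i) = v with copy-on-write. Returns 0 on success, -1 on failure. */
int toy_write(struct toy_hdr *h, size_t i, double v)
{
    if (i >= h->n || toy_unshare(h) != 0)
        return -1;
    h->pdata[i] = v;
    return 0;
}
```

With three headers sharing one block, a write through the first leaves
the other two linked to each other; a write through the second then
leaves the third with a zero crosslink and the original data.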


-----------------------------------------
>> a = [35.7 100.2 1.2e7];

mxArray a
    pdata -----> 35.7 100.2 1.2e7
    crosslink=0


-----------------------------------------
>> b = a;

mxArray a
    pdata -----> 35.7 100.2 1.2e7
    crosslink     / \
      |  / \       |
      |   |        |
      |   |        |
     \ /  |        |
    crosslink      |
mxArray b          |
    pdata ----------

-----------------------------------------
>> a(1) = 1;

mxArray a
    pdata -----> (1) 100.2 1.2e7
    crosslink=0

    crosslink=0
mxArray b
    pdata ------> 35.7 100.2 1.2e7 ...

-----------------------------------------


Cross links and cell arrays:

When copying a cell array, the same thing happens. A new header is
created, but the data array (in this case an array of mxArray*) is
shared. The first time a cell element is modified, this array is
copied. However, don't forget that each element itself points to
another mxArray. So each of those mxArray headers is copied as well.
But ONLY the headers! The data for each of the cell elements is again
crosslinked between the two cell arrays.
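
The header-only copy can be sketched in standalone C as follows. This
is a guess at the shape of the operation, not Matlab's implementation;
struct toy_elem and toy_cell_split are hypothetical names. The pointer
array and every element header are duplicated, while each payload
stays shared and is crosslinked to the header it was copied from.

```c
#include <stdlib.h>

/* Hypothetical element header; NOT the real mxArray. */
struct toy_elem {
    struct toy_elem *crosslink;  /* next header sharing pdata, or NULL */
    void *pdata;                 /* payload, shared until written */
};

/* Duplicate a cell's pointer array of n element headers. Returns the
   new pointer array, or NULL if an allocation fails (in which case
   the original headers are left untouched). */
struct toy_elem **toy_cell_split(struct toy_elem *const *elems, size_t n)
{
    struct toy_elem **copy = calloc(n ? n : 1, sizeof *copy);
    size_t i;

    if (copy == NULL)
        return NULL;

    /* Pass 1: allocate the new headers without touching any links. */
    for (i = 0; i < n; i++) {
        if (elems[i] == NULL)
            continue;  /* empty slot stays empty */
        copy[i] = malloc(sizeof *copy[i]);
        if (copy[i] == NULL) {
            while (i > 0)
                free(copy[--i]);
            free(copy);
            return NULL;
        }
    }

    /* Pass 2: share each payload and crosslink old and new headers. */
    for (i = 0; i < n; i++) {
        if (elems[i] == NULL)
            continue;
        copy[i]->pdata = elems[i]->pdata;
        copy[i]->crosslink =
            elems[i]->crosslink ? elems[i]->crosslink : elems[i];
        elems[i]->crosslink = copy[i];
    }
    return copy;
}
```

After the split, a write to one element only has to unshare that one
payload; the other elements keep pointing at the same data from both
cell arrays.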

-----------------------------------------

mxArray a        mxArray         mxArray         mxArray
    pdata -----> pdata           pdata           pdata
    crosslink / \  |               |               |
      |  / \   |   |               |               |
      |   |    |   |               |               |
      |   |    |  \ /             \ /             \ /
     \ /  |    | [1 2]          'hello'    [100x100 double]
    crosslink  |
mxArray b      |
    pdata ------

-----------------------------------------

>> a{2}(1) = 'j';

mxArray a        mxArray         mxArray         mxArray
    pdata -----> pdata           pdata           pdata
    crosslink=0    |               |               |
                   |               |               |
                  \ /             \ /             \ /
                 [1 2]          'jello'    [100x100 double]
                  / \                             / \
    crosslink=0    |                               |
mxArray b        pdata                           pdata
    pdata -----> mxArray         mxArray         mxArray
                                 pdata
                                   |
                                   |
                                  \ /
                                'hello'

(Note: the mxArrays that share pdata have crosslinks between them)
-----------------------------------------

About the program:

Read the comments at the top of the file!
Compile it with a working mex installation using
>> mex headerdump.c

Run it with one argument: any Matlab variable of any type, including
temporary variables (as in headerdump([1])). It will display some
goop extracted from the header, including any crosslinks. To see
crosslinks, make a copy of a variable and run it on that.

For cell arrays, headerdump will print a list of all the elements (up
to 30), including crosslinks, if any.


Hope this is useful to someone!

-Peter Boettcher


Sorry, MIME is not working for me. There are two files here as inline
plain text. Cut and paste into an editor.


----begin headerdump.c----
/* headerdump.c: MEX file to show the internals of an mxArray
header.

WARNING! WARNING! WARNING! This program blatantly abuses
the Matlab API! It pokes around in private data structures
without remorse! The structures may change from version to
version (or even platform to platform for all we know).

That said, this program only looks at memory. So even if you
get a Matlab segfault, nothing should be corrupted. Please
let me know, though, if you do segfault, with instructions on
how to reproduce it.

This works on Matlab 5.3, and from looking at the header files
in R12, the structure is the same. No guarantees though!

The author and his employer disclaim all liability for any damage
caused by this program. For experimental use only!

Author: Peter Boettcher <boet...@ll.mit.edu>
Last modified: <Fri Dec 22 14:42:20 2000 by pwb> */

#include "mex.h"
#include "mxinternals.h"

/* Print one-liner describing mxArray, including any crosslinks */
void briefdump(const mxArray *in)
{
  int *tmp;
  int i;

  printf("(%s) ", mxGetClassName(in));
  printf("Address: %p", in);
  if(in->crosslink)
    printf(" <linked to %p>", in->crosslink);

  if(in->number_of_dims == 2) {
    printf(" [%i %i]\n", in->rowdim, in->coldim);
  } else {
    /* For N-D arrays, rowdim is treated as a pointer to the
       list of dimensions */
    printf(" [");
    tmp = (int *)(in->rowdim);
    for(i=0; i<in->number_of_dims; i++) {
      printf("%i ", tmp[i]);
    }
    printf("]\n");
  }
}

/* Print detailed description of mxArray */
void dumpMxArray(const mxArray *in)
{
  int *tmp;
  int i;
  int numel;

  printf("Name: %.32s\n", in->name);
  printf("Address: %p", in);
  /* Crosslink means two or more variables point to the same
     data. The link allows Matlab to copy the array and update
     the affected variables if someone wants to modify an element */
  if(in->crosslink)
    printf(" <Crosslinked to %p: %.32s>\n", in->crosslink,
           in->crosslink->name);
  else
    printf("\n");

  /* This field has a unique value for each class, but is not
     the same as the class ID */
  printf("Related to classID? %i (true: %i)\n",
         in->class, (int)mxGetClassID(in));

  printf("Variable type: ");
  switch(in->vartype) {
  case MXVARNORMAL:
    printf("Normal\n");
    break;
  case MXVARPERSIST:
    printf("Persistent\n");
    break;
  case MXVARGLOBAL:
    printf("Global\n");
    break;
  case MXVARSUBEL:
    printf("Subelement of a cell or struct\n");
    break;
  case MXVARTEMP:
    printf("Temporary\n");
    break;
  default:
    printf("Unknown variable type: %i (Please email boet...@ll.mit.edu)\n",
           in->vartype);
  }

  printf("Data Flags: Logical %i DblScalar %i (other: %x)\n",
         (in->dataflags&MXLOGICALMASK)!=0,
         (in->dataflags&MXSCALARMASK)!=0,
         (in->dataflags&0x00fffffc));

  if(in->dataflags & 0xff000000)
    printf("User Data: %x\n", (in->dataflags&0xff000000)>>24);


  printf("\nDimensions (%i): ", in->number_of_dims);
  if(in->number_of_dims == 2) {
    printf("[%i %i]\n", in->rowdim, in->coldim);
    numel = in->rowdim * in->coldim;
  } else {  /* multidimensional */
    numel = 1;
    printf("[");
    tmp = (int *)(in->rowdim);
    for(i=0; i<in->number_of_dims; i++) {
      numel *= tmp[i];
      printf("%i ", tmp[i]);
    }
    printf("]\n");
  }

  printf("Real data: %p", in->data.number_array.pdata);
  if(in->dataflags & MXSCALARMASK)
    printf(" [%g]", *((double *)in->data.number_array.pdata));
  printf("\n");

  if(in->data.number_array.pimag_data) {  /* complex */
    printf("Imag data: %p", in->data.number_array.pimag_data);
    if(in->dataflags & MXSCALARMASK)
      printf(" [%g]", *((double *)in->data.number_array.pimag_data));
    printf("\n");
  }

  if(in->nelements_allocated) {  /* sparse */
    printf("Sparse matrix: Nelements: %i\n", in->nelements_allocated);
    /* In the sparse API, ir holds row indices and jc holds
       column pointers */
    printf("Row index ptr (ir): %p\nColumn ptr (jc): %p\n",
           in->data.number_array.irptr, in->data.number_array.jcptr);
  }

  if(in->data.number_array.reserved)  /* what's this? */
    printf("Unknown field: %x (Please email boet...@ll.mit.edu)\n",
           in->data.number_array.reserved);

  if(in->class == 6) /* struct */ {
    printf("Number of struct fields: %i\n", in->data.number_array.nfields);
    numel *= in->data.number_array.nfields;
    /* Field names are packed in consecutive 32-byte slots */
    for(i=0; i<in->data.number_array.nfields; i++)
      printf("  %.32s\n", (char *)in->data.number_array.pimag_data + i*32);
    printf("\n");
  }
  printf("\n");

  /* Structs are stored like cells, using mxArray pointers in the real
     data spot. They are stored in "field major" order, meaning the
     pointers to the values of the fields of the first struct element
     are stored in order, then the columns of the struct array, then
     the rows. */
  /* Print a one-liner for each element of the cell or struct, up
     to a maximum of 30 */
  if(in->class == 6 || in->class == 5) {
    for(i=0; i<((numel < 30) ? numel : 30); i++)
      if(((mxArray **)in->data.number_array.pdata)[i])
        briefdump(((mxArray **)in->data.number_array.pdata)[i]);
      else
        printf("(nil)\n");
  }
}

void mexFunction( int nlhs, mxArray *plhs[],
                  int nrhs, const mxArray *prhs[])
{
  if(nrhs < 1)
    mexErrMsgTxt("One input required.");

  dumpMxArray(prhs[0]);
}

---end headerdump.c---

---begin mxinternals.h---
#ifndef MXINTERNALS_H
#define MXINTERNALS_H 1


#define MXVARNORMAL (0)
#define MXVARPERSIST (1)
#define MXVARGLOBAL (2)
#define MXVARSUBEL (3)
#define MXVARTEMP (4)

#define MXLOGICALMASK 0x02
#define MXSCALARMASK 0x01

/* Blatant disregard for MATLAB API. This allows you
to poke around an mxArray structure directly. Use
with extreme caution! May not be portable across
platforms or versions, may cause Matlab segfaults,
may cause your computer to explode and eat all your data... */
struct mxArray_tag {
  char name[mxMAXNAM];
  int class;
  int vartype;
  mxArray *crosslink;
  int number_of_dims;
  int nelements_allocated;
  int dataflags;
  int rowdim;
  int coldim;
  union {
    struct {
      void *pdata;
      void *pimag_data;
      void *irptr;
      void *jcptr;
      int reserved;
      int nfields;
    } number_array;
  } data;
};

#endif
---end mxinternals.h---

Ofek Shilon

Apr 30, 2017, 4:44:02 AM
This thread from 17 years ago still comes up when searching for info about mxArray internals. Here's similar work that is valid for Matlab releases 2014 and later:
https://github.com/OfekShilon/mxArrayWatch



On Friday, December 22, 2000 at 21:47:33 UTC+2, Pete Boettcher wrote: