Multithreading OpenJPEG


IrishJesus

May 26, 2008, 12:55:27 AM
to OpenJPEG
I've seen this come up a couple of times in past threads, and was
hoping it might have been addressed in v1.3, but it appears there is
no multi-threading done in OpenJPEG.

I'm totally impressed with the simplicity and stability that OpenJPEG
provides, but its performance is dismal. Let's move beyond a
"reference implementation" to a high-performance, open-source JP2K
library, and really get some adoption going!

I made a simple attempt at multithreading the decoding of individual
components within tcd.c, which at first seemed to make a marked
improvement in decode speed: from 1 sec per 2K tile (4 components,
i.e. 0.25 sec/comp/tile) to 0.1 sec/comp/tile, about a 60%
improvement. However, I'm guessing that the rest of the library
isn't thread-safe, because it quickly segfaults after decoding the
first tile, usually from bad pointers, etc. I'm no expert on
pthreads or on threading C programs, and I quickly found myself out
of my league.

Has there been any serious effort to support a multi-threading version
of the library? In 2.0 perhaps?

Speaking of 2.0, I downloaded the alpha code and compiled it. I see
you've abandoned cio_td in favor of a stream "class", but there is no
memory-buffer implementation, only FILE *. Does anyone know if a
mem-buffer implementation will be done for 2.0?


jerome

May 26, 2008, 9:16:36 AM
to OpenJPEG
Hi,

> I've seen this come up a couple of times in past threads, and was
> hoping it might have been addressed in v1.3, but it appears there is
> no multi-threading done in OpenJPEG.
>
>
> I made a simple attempt at multithreading the decoding of individual
> components within tcd.c, which at first seemed to make a marked
> improvement in decode speed: from 1 sec per 2K tile (4 components,
> i.e. 0.25 sec/comp/tile) to 0.1 sec/comp/tile, about a 60%
> improvement. However, I'm guessing that the rest of the library
> isn't thread-safe, because it quickly segfaults after decoding the
> first tile, usually from bad pointers, etc. I'm no expert on
> pthreads or on threading C programs, and I quickly found myself out
> of my league.

In the 2.0 alpha, it should be (fairly) easy to add support for
multithreading.

First, implement an I/O (input/output) thread that reads/writes data
on the stream.
Then follow the "normal" initialization phase.

Then copy the tcd_t structure and pass it to a thread that will
perform the tcd_(en|de)code_tile call.
Only parallelize the tcd_decode_tile / tcd_encode_tile part of the
(de)coding.

Warning: the coding parameters (struct cp_t) and the image
(opj_image_t) will be shared among all the threads, so concurrent
access to them may cause crashes.
It may be safer to copy these elements as well, but I am not sure (to
be tested).

The "clean" (but slower to implement) way would be:
1 Add the number of components of the image to the cp_t parameters
and get rid of any pointer to the image struct in decode and encode.
2 Store the number of decoded resolutions in the tcd_t struct and not
in the image.
3 I highly doubt the system will crash on concurrent access to the
cp_t struct (to be tested).
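
A rough sketch of what items 1 and 2 could look like. The struct and
function names below are hypothetical, heavily reduced stand-ins and
not the real cp_t, tcd_t or opj_image_t layouts; the point is only that
each worker owns its tcd and reads the shared coding parameters without
ever touching the image.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for opj_image_t (only the field used here). */
typedef struct { int numcomps; } image_stub_t;

/* Hypothetical stand-in for cp_t. */
typedef struct {
    int tw, th;      /* tile grid dimensions */
    int numcomps;    /* item 1: component count copied from the image */
} cp_stub_t;

/* Hypothetical stand-in for tcd_t: one instance per worker thread. */
typedef struct {
    const cp_stub_t *cp;      /* shared, read-only while tiles are coded */
    int tile_no;
    int decoded_resolutions;  /* item 2: kept here, not in the image */
} tcd_stub_t;

/* Called once, single-threaded, after the headers have been parsed. */
void cp_init_from_image(cp_stub_t *cp, const image_stub_t *image,
                        int tw, int th)
{
    cp->tw = tw;
    cp->th = th;
    cp->numcomps = image->numcomps;
}

/* Placeholder for the tile decode: writes only to its own tcd and only
   reads cp. Returns the number of components it would process. */
int decode_tile_stub(tcd_stub_t *tcd, int resolutions_in_tile)
{
    assert(tcd != NULL && tcd->cp != NULL);
    tcd->decoded_resolutions = resolutions_in_tile;
    return tcd->cp->numcomps;
}
```

With this layout two workers can hold different tcd_stub_t values that
point at the same cp_stub_t, and neither needs a pointer to the image.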

To sum up:
1 An input thread reads/writes and decodes/encodes the headers, then
adds the resulting tcd_t struct to a pool of pending tiles. If the
pool is full (it has reached its maximum memory), hold the input
stream.

2 Multiple threads (p) wait on the pool and process only
tcd_(en|de)code_tile. The resulting encoded/decoded data is indexed
by tile number and placed in a second pool. An output stream waits on
this pool and delivers the data to the client (file, decoding
application, ...) in order.
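
The two-pool scheme can be sketched with plain pthreads as below. This
is only an illustration under assumptions: tile_job_t, pipeline_t,
fake_decode_tile and run_pipeline are made-up names, the "decode" is a
trivial placeholder rather than a call into OpenJPEG, the second pool
is simply an array sized for all tiles, and error handling for
pthread_create is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define POOL_CAPACITY 4  /* max pending tiles held in memory at once */
#define NUM_WORKERS   3  /* the "p" threads that only (de)code tiles */

/* Hypothetical stand-in for a per-tile copy of tcd_t. */
typedef struct { int tile_no; int payload; } tile_job_t;

typedef struct {
    tile_job_t pending[POOL_CAPACITY]; /* first pool: ring buffer of jobs */
    int head, count, closed, num_tiles;
    int *result;                       /* second pool: indexed by tile no. */
    int *ready;                        /* ready[t] != 0 once tile t is done */
    pthread_mutex_t lock;
    pthread_cond_t changed;            /* broadcast on every state change */
} pipeline_t;

/* Placeholder for tcd_decode_tile: any pure per-tile computation. */
static int fake_decode_tile(const tile_job_t *job) { return job->payload * 2; }

static void *input_main(void *arg)
{
    pipeline_t *p = arg;
    for (int t = 0; t < p->num_tiles; t++) {
        tile_job_t job = { t, t + 1 };       /* pretend a header was read */
        pthread_mutex_lock(&p->lock);
        while (p->count == POOL_CAPACITY)    /* pool full: hold the input */
            pthread_cond_wait(&p->changed, &p->lock);
        p->pending[(p->head + p->count) % POOL_CAPACITY] = job;
        p->count++;
        pthread_cond_broadcast(&p->changed);
        pthread_mutex_unlock(&p->lock);
    }
    pthread_mutex_lock(&p->lock);
    p->closed = 1;                           /* no more tiles will arrive */
    pthread_cond_broadcast(&p->changed);
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

static void *worker_main(void *arg)
{
    pipeline_t *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (p->count == 0 && !p->closed)
            pthread_cond_wait(&p->changed, &p->lock);
        if (p->count == 0) {                 /* closed and drained */
            pthread_mutex_unlock(&p->lock);
            return NULL;
        }
        tile_job_t job = p->pending[p->head];
        p->head = (p->head + 1) % POOL_CAPACITY;
        p->count--;
        pthread_cond_broadcast(&p->changed); /* a slot was freed */
        pthread_mutex_unlock(&p->lock);

        int value = fake_decode_tile(&job);  /* heavy work, done unlocked */

        pthread_mutex_lock(&p->lock);
        p->result[job.tile_no] = value;
        p->ready[job.tile_no] = 1;
        pthread_cond_broadcast(&p->changed);
        pthread_mutex_unlock(&p->lock);
    }
}

/* Runs the pipeline; the caller plays the output stage and gets the
   tiles in order in out[0..num_tiles-1]. Returns 0 on success. */
int run_pipeline(int num_tiles, int *out)
{
    pipeline_t p = { .num_tiles = num_tiles, .result = out };
    pthread_t input, workers[NUM_WORKERS];

    assert(num_tiles > 0 && out != NULL);
    p.ready = calloc((size_t)num_tiles, sizeof *p.ready);
    if (!p.ready) return -1;
    pthread_mutex_init(&p.lock, NULL);
    pthread_cond_init(&p.changed, NULL);
    pthread_create(&input, NULL, input_main, &p);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&workers[i], NULL, worker_main, &p);

    for (int t = 0; t < num_tiles; t++) {    /* deliver strictly in order */
        pthread_mutex_lock(&p.lock);
        while (!p.ready[t])
            pthread_cond_wait(&p.changed, &p.lock);
        pthread_mutex_unlock(&p.lock);
        /* out[t] is final here; a real output stage would write it now */
    }
    pthread_join(input, NULL);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(workers[i], NULL);
    pthread_cond_destroy(&p.changed);
    pthread_mutex_destroy(&p.lock);
    free(p.ready);
    return 0;
}
```

A single condition variable with broadcast keeps the sketch short; a
real implementation would probably use separate "not full", "not
empty" and "tile ready" conditions to avoid waking every thread on
each change. Build with -pthread.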

Before multithreading the encoding part of the library, I advise
working with floats and getting rid of the fixed-point operations.
This will give a large boost to the encoding part.

> Anyone know if a mem-buffer
> implementation will be done for 2.0?
You can write one yourself on top of the stream callbacks. For example:

#include <string.h> /* memcpy */

typedef struct my_opj_memory
{
    OPJ_UINT32 m_total_size;     /* size of the buffer */
    OPJ_UINT32 m_current_offset; /* position in the buffer */
    OPJ_BYTE * m_buffer;         /* buffer */
} my_opj_memory_t;


OPJ_UINT32 opj_read_from_memory (void * p_buffer, OPJ_UINT32 p_nb_bytes,
                                 my_opj_memory_t * p_data)
{
    /* never read past the end of the buffer */
    OPJ_UINT32 l_remain = p_data->m_total_size - p_data->m_current_offset;
    l_remain = uint_min(l_remain,p_nb_bytes);
    memcpy(p_buffer,p_data->m_buffer+p_data->m_current_offset,l_remain);
    p_data->m_current_offset += l_remain;
    /* -1 (as an unsigned value) is returned when nothing could be read */
    return l_remain ? l_remain : (OPJ_UINT32)-1;
}

OPJ_UINT32 opj_write_to_memory (void * p_buffer, OPJ_UINT32 p_nb_bytes,
                                my_opj_memory_t * p_data)
{
    /* never write past the end of the buffer */
    OPJ_UINT32 l_remain = p_data->m_total_size - p_data->m_current_offset;
    l_remain = uint_min(l_remain,p_nb_bytes);
    memcpy(p_data->m_buffer+p_data->m_current_offset,p_buffer,l_remain);
    p_data->m_current_offset += l_remain;
    /* -1 (as an unsigned value) is returned when nothing could be written */
    return l_remain ? l_remain : (OPJ_UINT32)-1;
}

OPJ_SIZE_T opj_skip_from_memory (OPJ_SIZE_T p_nb_bytes,
                                 my_opj_memory_t * p_data)
{
    /* skip at most what is left in the buffer */
    OPJ_UINT32 l_remain = p_data->m_total_size - p_data->m_current_offset;
    l_remain = uint_min(l_remain,p_nb_bytes);
    p_data->m_current_offset += l_remain;
    return l_remain ? l_remain : (OPJ_SIZE_T)-1;
}

OPJ_BOOL opj_seek_from_memory (OPJ_SIZE_T p_nb_bytes,
                               my_opj_memory_t * p_data)
{
    /* p_nb_bytes is an absolute offset from the start of the buffer */
    if (p_nb_bytes > p_data->m_total_size)
    {
        return 0;
    }
    p_data->m_current_offset = p_nb_bytes;
    return 1;
}

opj_stream_t* OPJ_CALLCONV opj_stream_create_memory_stream
    (my_opj_memory_t * p_data,OPJ_UINT32 p_size,OPJ_BOOL p_is_read_stream)
{
    opj_stream_t* l_stream = 00;
    if (! p_data)
    {
        return 00;
    }
    l_stream = opj_stream_create(p_size,p_is_read_stream);
    if (! l_stream)
    {
        return 00;
    }
    /* route all stream I/O through the memory callbacks above */
    opj_stream_set_user_data(l_stream,p_data);
    opj_stream_set_read_function(l_stream,
        (opj_stream_read_fn) opj_read_from_memory);
    opj_stream_set_write_function(l_stream,
        (opj_stream_write_fn) opj_write_to_memory);
    opj_stream_set_skip_function(l_stream,
        (opj_stream_skip_fn) opj_skip_from_memory);
    opj_stream_set_seek_function(l_stream,
        (opj_stream_seek_fn) opj_seek_from_memory);
    return l_stream;
}

That should do the trick.
All that remains is to create a my_opj_memory_t struct and allocate a
buffer.
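
To make the buffer setup and the end-of-buffer behaviour concrete,
here is a small standalone sketch. It redeclares the OpenJPEG typedefs
as stand-ins so it compiles without the library, and the helpers
my_opj_memory_init and drain_in_chunks are hypothetical names used only
for this illustration; the read function repeats the logic of
opj_read_from_memory with the minimum written out by hand.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins so this compiles on its own (assumption: in real code
   these typedefs come from the OpenJPEG headers). */
typedef unsigned int  OPJ_UINT32;
typedef unsigned char OPJ_BYTE;

typedef struct my_opj_memory
{
    OPJ_UINT32 m_total_size;     /* size of the buffer */
    OPJ_UINT32 m_current_offset; /* position in the buffer */
    OPJ_BYTE * m_buffer;         /* buffer (not owned by the struct) */
} my_opj_memory_t;

/* Hypothetical helper: wrap an existing byte array without copying it. */
void my_opj_memory_init(my_opj_memory_t * p_data, OPJ_BYTE * p_bytes,
                        OPJ_UINT32 p_size)
{
    p_data->m_total_size = p_size;
    p_data->m_current_offset = 0;
    p_data->m_buffer = p_bytes;
}

/* Same logic as opj_read_from_memory, minimum computed inline. */
static OPJ_UINT32 read_from_memory(void * p_buffer, OPJ_UINT32 p_nb_bytes,
                                   my_opj_memory_t * p_data)
{
    OPJ_UINT32 l_remain = p_data->m_total_size - p_data->m_current_offset;
    if (l_remain > p_nb_bytes)
    {
        l_remain = p_nb_bytes;
    }
    memcpy(p_buffer, p_data->m_buffer + p_data->m_current_offset, l_remain);
    p_data->m_current_offset += l_remain;
    return l_remain ? l_remain : (OPJ_UINT32)-1;
}

/* Hypothetical helper: read the buffer in p_chunk-sized pieces, the way
   a stream would, and return the total number of bytes delivered. */
OPJ_UINT32 drain_in_chunks(my_opj_memory_t * p_data, OPJ_UINT32 p_chunk)
{
    OPJ_BYTE l_scratch[64];
    OPJ_UINT32 l_total = 0;
    OPJ_UINT32 l_got;

    assert(p_chunk > 0 && p_chunk <= sizeof l_scratch);
    /* (OPJ_UINT32)-1 marks the point where nothing is left to read */
    while ((l_got = read_from_memory(l_scratch, p_chunk, p_data))
           != (OPJ_UINT32)-1)
    {
        l_total += l_got;
    }
    return l_total;
}
```

For a read stream, m_buffer would point at the compressed data already
in memory and m_total_size would be its length; for a write stream the
buffer has to be allocated large enough up front, since the write
callback truncates rather than grows.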

Hope this helps,

Jérôme

IrishJesus

May 28, 2008, 7:44:49 PM
to OpenJPEG
Jerome,

Thanks for the info and code examples. I'm putting it all to good
use.

Perhaps you or someone else could help with a little 2.0 question.

In the test decode example, I notice there is a "data" pointer that is
malloced and passed into the decode function(s), but doesn't appear to
be used by the application anywhere itself. What is this? Is it an
arbitrary memory chunk used for decoding the file? or does it serve
some other, more sinister purpose?

If it is just a memory chunk for decoding, why not allocate the buffer
in the decode function? As it is now (in the example), it's allocated
early on, and then subsequently freed in every possible error
condition, even though it may never be used. At the least, allocate
the buffer just prior to decoding, and free it when you're done.


jerome

Jun 2, 2008, 4:56:25 AM
to OpenJPEG
Hi,

> Perhaps you or someone else could help with a little 2.0 question.
>
> In the test decode example, I notice there is a "data" pointer that is
> malloced and passed into the decode function(s), but doesn't appear to
> be used by the application anywhere itself. What is this? Is it an
> arbitrary memory chunk used for decoding the file? or does it serve
> some other, more sinister purpose?
>

Nothing really sinister here ;)
The decoded data will be stored here by the decode function.
It is not used since it is a "tutorial" on how to use the library.
You are free to use the decoded data as you wish.

> If it is just a memory chunk for decoding, why not allocate the buffer
> in the decode function? As it is now (in the example), it's allocated
> early on, and then subsequently freed in every possible error
> condition, even though it may never be used. At the least, allocate
> the buffer just prior to decoding, and free it when you're done.

Then again, this is just a "tutorial" on the use of the library. You
are free to allocate the memory when you want, and to adapt the code
in the way that best suits your needs (but you are right, I could have
allocated it later).
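
One common C pattern for this is to allocate the output buffer right
before the decode call and release it on a single cleanup path. The
sketch below only illustrates that pattern: fake_read_header,
fake_decode and decode_example are hypothetical stand-ins, not
functions of the library, and the "decode" just copies bytes around.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for header parsing: accepts a stream that
   starts with the JPEG 2000 SOC marker (0xFF 0x4F). */
static int fake_read_header(const unsigned char *p_stream, size_t p_size)
{
    return p_size >= 2 && p_stream[0] == 0xFF && p_stream[1] == 0x4F;
}

/* Hypothetical stand-in for the decode call: fills p_data from the
   stream bytes. Returns 1 on success, 0 on failure. */
static int fake_decode(const unsigned char *p_stream, size_t p_size,
                       unsigned char *p_data, size_t p_data_size)
{
    if (p_data_size == 0) return 0;
    for (size_t i = 0; i < p_data_size; i++)
        p_data[i] = p_stream[i % p_size];
    return 1;
}

/* Returns 0 on success and stores a byte sum of the decoded data in
   *p_sum; returns -1 on any failure. */
int decode_example(const unsigned char *p_stream, size_t p_size,
                   size_t p_data_size, unsigned long *p_sum)
{
    unsigned char *l_data = NULL;
    int l_status = -1;

    assert(p_stream != NULL && p_sum != NULL);

    /* Early failures need no cleanup: nothing is allocated yet. */
    if (!fake_read_header(p_stream, p_size))
        return -1;

    /* Allocate only now, just before the buffer is needed. */
    l_data = malloc(p_data_size);
    if (!l_data) goto cleanup;
    if (!fake_decode(p_stream, p_size, l_data, p_data_size)) goto cleanup;

    /* "Use" the decoded data. */
    *p_sum = 0;
    for (size_t i = 0; i < p_data_size; i++)
        *p_sum += l_data[i];
    l_status = 0;

cleanup:
    free(l_data); /* free(NULL) is a no-op, so one exit path is enough */
    return l_status;
}
```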

Regards,

Jérôme