I'm writing a library for communication with a number of external
devices. The main part of the public api is very similar to normal file I/O:
device_t *device = NULL;
device_open (&device, ...);
device_read (device, ...);
device_write (device, ...);
device_close (device);
Since most of the I/O operations are relative slow (ranging from a few
seconds to many minutes) they block the main thread. Especially in a GUI
application that is not acceptable and the code should run in its own
thread to keep the rest of the application responsive.
So far no problem, because the code is thread safe as long as the access
to the device handle from multiple threads is properly synchronized (e.g
no global variables, etc). But now I would like to add the capability to
cancel a running operation. What is the best approach for doing this?
The code should be cross-platform with at least support for Windows,
Linux and Mac. A different implementation for each platform is fine too.
I'm not an expert in threading, but I already came up with some ideas:
1. Killing the worker thread.
Simply expecting the application to kill the worker thread is probably
not a good option, because no cleanup can be performed.
2. Providing some sort of device_cancel() function.
This function would set some flag in the device handle to request
cancellation of the running operation. The library code can than check
this flag whenever it can exit cleanly. It's no problem if the exit is
not immediately. Thus something like this:
struct device_t {
volatile int cancelled;
...
}
device_cancel (device_t *device)
{
device->cancelled = 1;
}
device_read (device_t *device, ...)
{
while (!device->cancelled) {
/* do some slow stuff */
}
}
But to make this work, this function must be safe to be called from
different threads, otherwise it is almost useless. From what I
understand, this code should work as long as reading/writing the flag is
atomic. The question is now how can I achieve this? I have read some
literature that reading/writing integers is usually atomic, but not
always guaranteed. And how about the volatile?
Now I think another potential problem is when this device_cancel()
function is run while the device handle is being closed in the
device_close() function, this will fail. How could I prevent this? I
have been thinking about introducing a new cancellation object, that
could be attached to the device handle, so it becomes possible to
guarantee the lifetime of the cancellation object is longer than the
device handle.
cancel_t *cancel = NULL;
cancel_create (&cancel);
/* Start in separate thread */
device_t *device = NULL;
device_open (&device, ...);
device_set_cancel (device, cancel);
...
device_close (device);
/* End thread */
cancel_destroy (cancel);
Is this a reasonable (and correct) approach?
3. Any other ideas?
Thanks in advance,
Jef
> I'm writing a library for communication with a number of external
> devices. The main part of the public api is very similar to normal file I/O:
>
> device_t *device = NULL;
> device_open (&device, ...);
> device_read (device, ...);
> device_write (device, ...);
> device_close (device);
>
> Since most of the I/O operations are relative slow (ranging from a few
> seconds to many minutes) they block the main thread. Especially in a GUI
> application that is not acceptable and the code should run in its own
> thread to keep the rest of the application responsive.
>
> So far no problem, because the code is thread safe as long as the access
> to the device handle from multiple threads is properly synchronized (e.g
> no global variables, etc). But now I would like to add the capability to
> cancel a running operation. What is the best approach for doing this?
IMO, the best approach is as follows:
1) When you want to start an operation, you create an object that
reflects that operation. You have a reference to that object and the
thread that's performing the operation has a reference.
2) The operation object contains a mutex, condition variable, and any
information needed to assess the state of the operation.
3) The thread performing the operation updates the operation state and
checks for cancellation under the protection of the mutex. When that
thread is done with the operation or has given up on it, it decrements
its reference count (under the protection of the mutex), freeing it if
the count reaches zero.
4) The code requesting the operation is given the handle to the
operation. It may block on it (which results in a call to
pthread_mutex_[timed}wait) or cancel it (which decrements its
reference count, freeing it if it hits zero).
5) On completion, assuming no cancellation, the thread that created it
detects the completion (by seeing the completion in the status/state
variable) under the protection of the mutex and decrements its
reference, resulting in the reference dropping to zero.
DS
Am I correct that this would look somewhat like this pseudo code:
int main ()
{
operation_t *operation;
operation_create (&operation, ...);
pthread_t thread;
pthread_create (&thread, NULL, dowork, operation);
while (operation_is_still_running) {
if (user_requested_cancel)
operation_cancel (operation)
}
operation_destroy (operation);
}
void *dowork (void *arg)
{
operation_t *operation = (operation_t *) arg;
device_t *device = NULL;
device_open (&device, ...);
device_set_operation (device, operation);
/* Call a slow library function that
support cancellation through the
attached operation object. */
device_read (device, ...);
device_close (device);
}
device_read (device_t *device, ...)
{
while (!operation_is_canceled) {
/* do some slow stuff */
}
}
That sounds very similar to the cancellation object I mentioned. But
with the difference that your operation object is slightly more advanced
with reference counting and mutexes. Is that even necessary when I only
need the ability to cancel an operation? The way I see this, is that all
the thread management code should be kept outside of the library, since
it can be done in the application code. Only the cancellation object
needs to be thread aware of course.
Note: checking whether a cancellation is requested should be fast,
because some of the devices have very tight timing requirements. That's
why I was looking into atomic operations rather than mutexes.
> That sounds very similar to the cancellation object I mentioned. But
> with the difference that your operation object is slightly more advanced
> with reference counting and mutexes. Is that even necessary when I only
> need the ability to cancel an operation?
Yes, it is. At least, it is unless you do something else (comparably
complex) to handle the following problems: The thread that initiates
the operation cancels it (and cannot wait for it to complete or finish
cancellation) but another thread is still working on the operation.
Who frees the operation object?
> The way I see this, is that all
> the thread management code should be kept outside of the library, since
> it can be done in the application code. Only the cancellation object
> needs to be thread aware of course.
The operation object can be created by the library. Your functions can
take an operation object pointer which, if NULL, makes the operation
synchronous. So it goes like this:
u=some_library_function(NULL, a, b, c);
This is synchronous.
operation *op=create_operation();
u=some_library_function(op, a, b, c);
if(u==-1) /* early failure */
Now, you can do:
cancel_operation(op);
and move on.
Or you can do:
u=wait_operation(500);
(Wait up to 500mS). Maybe zero for wait forever.
Then maybe:
u=operation_return(op);
to get the final return value.
You may also need:
u=operation_isfinished(op);
You get the idea. One tricky part with this is you need to stash any
pointers for other return values or data. You must make sure that
cancelling the operation NULLs those pointers so they will not be used
if the operation completes. Some code may do:
operation_cancel(op);
return; // oops, pointers now point to stack entries that don't exist.
So once 'cancel' returns, ensure the pointers are not used.
Also, the 'cancel' function should return whether the function was
actually cancelled or whether it completed already.
> Note: checking whether a cancellation is requested should be fast,
> because some of the devices have very tight timing requirements. That's
> why I was looking into atomic operations rather than mutexes.
You can certainly have an 'operation state' object that supports
atomic operations. But locking a mutex is fast too. Beware of
premature implementation optimization. Do you plan to spin on an 'is
complete' function? Why not block on a 'wait' function instead?
DS
A good point. I didn't think about that.
If implemented this way, the library needs to implement all the
"heavyweight" threading code to support the async part of this api. When
requesting an async operation, a new thread has to be created and
monitored for its state.
But I'm looking for a more "lightweight" solution, where my library
supports only sync operations. And if that is not appropriate for an
application (e.g. a GUI should not freeze), the application is
responsible to run the slow code in a background thread. The typical use
case for my library is connecting to a device, download some data and
process it afterwards. During this task, some progress indicator is
shown and the rest of the application is waiting (but not blocking, so
it remains responsive) until the entire operation is done, or the user
cancels the operation. I don't see a real use for true async operations
here.
Leaving the threading code to the application greatly simplifies my
library. Only the cancellation would need some library support, to allow
something better than the brute force approach of killing the background
thread.
BTW, your description reminds me of the Win32 overlapped I/O api [1,2].
What I would like to achieve would be something similar to the
CancelSynchronousIo() function. The GLib library has a similar concept
with the GCancellable object [3].
[1] http://msdn.microsoft.com/en-us/library/aa365683(VS.85).aspx
[2] http://msdn.microsoft.com/en-us/library/aa363789(VS.85).aspx
[3]
http://library.gnome.org/devel/gio/stable/GInputStream.html#g-input-stream-read
>> Note: checking whether a cancellation is requested should be fast,
>> because some of the devices have very tight timing requirements. That's
>> why I was looking into atomic operations rather than mutexes.
>
> You can certainly have an 'operation state' object that supports
> atomic operations. But locking a mutex is fast too. Beware of
> premature implementation optimization.
It's not a premature optimization. I'm already reaching the limits, so I
just want to be careful not to add any other slowdowns that would push
my code over the limit.
I have no idea how slow or fast mutexes are (because I have very little
experience with multi threaded programming). I only read that atomic
operations are much faster.
> Do you plan to spin on an 'is
> complete' function? Why not block on a 'wait' function instead?
That was just an example for illustration.
> If implemented this way, the library needs to implement all the
> "heavyweight" threading code to support the async part of this api. When
> requesting an async operation, a new thread has to be created and
> monitored for its state.
Well, strictly speaking that's not true. An existing thread can be re-
used, or you can make the application supply you with threads.
> But I'm looking for a more "lightweight" solution, where my library
> supports only sync operations. And if that is not appropriate for an
> application (e.g. a GUI should not freeze), the application is
> responsible to run the slow code in a background thread.
But this ties up a thread for the entire duration of the operation.
What if there are 12,000 operations goings on? Do you really expect
the application to supply you with 12,000 threads to do them? This is
a real issue because:
> The typical use
> case for my library is connecting to a device, download some data and
> process it afterwards.
So your functions can block for an arbitrarily long amount of time
while waiting for external things to happen. So it can take, say, 10
seconds to complete an operation. That means an application that needs
to do, say, 50 operations a second may need 500 threads.
That may not actually be a disaster for your particular application,
but I think it reflects a choice of a broken design that you cannot
control resource usage. It will definitely limit what operations you
can do in the future.
> During this task, some progress indicator is
> shown and the rest of the application is waiting (but not blocking, so
> it remains responsive) until the entire operation is done, or the user
> cancels the operation. I don't see a real use for true async operations
> here.
How many operations might need to be pending at once, worst case? And
are you okay with a situation where you cannot increase that without a
complete redesign?
This also makes things ugly for the application designer. For example,
suppose he has to deal with another library that has threading
limitations -- maybe an operation has to be completed by the same
thread that started it. If he has an operation going, he can't call
your library with that thread because if he does, he can't complete
the operation he's already doing until your operation completes or he
cancels it (because the thread he needs to do it will be busy).
The net effect of dealing with libraries that have weird threading
dependencies (as yours would) is that the people using them typically
wind up having to quarantine the library with 'service' threads so
they don't have to deal with these limitations. Your library would be
yet another weird one I'd have to quarantine, being unable to call it
with critical threads because I'll lose them, and having to dispatch
it because you didn't.
And you can do it better than I can, since you know when the dispatch
and I have to dispatch all the time. For example, if you can tell that
an operation will never work or can complete immediately, you can
avoid the dispatch. How can I do that?
DS
I can't imagine my library needs to scale to those numbers (read below
for more info). But imagine for a moment it does. How would having an
async api solve this issue when the implementation would internally
create a thread for every operation. In that case, the number of threads
would be identical. Unless some sort of thread pool would be used, but
isn't that possible in the application code too?
>> During this task, some progress indicator is
>> shown and the rest of the application is waiting (but not blocking, so
>> it remains responsive) until the entire operation is done, or the user
>> cancels the operation. I don't see a real use for true async operations
>> here.
>
> How many operations might need to be pending at once, worst case? And
> are you okay with a situation where you cannot increase that without a
> complete redesign?
It is extremely rare if an end user owns more than one device. If I had
to come up with a number I would say less than 5%, which is probably
already too optimistic. Since we are dealing with real hardware, the
number of devices that can be attached to a PC is also limited by
physical factors, such as available serial/usb ports
So the numbers will always be very small. This is not like downloading
data from a network, where the number of connections (or other similar
resources) can increase relative easily.
I highly doubt that there will even be an application that will want to
support concurrent transfers. None of the existing applications does
this (regardless whether they are using my library or not).
I think the best analogy I can come up with is a digital camera. Most
end users only own one camera, and those that do own more have likely no
need to transfer pictures from multiple cameras at the same time. I
don't know a photo application that support this, even when the
underlying transfer protocol or library could support it.
> This also makes things ugly for the application designer. For example,
> suppose he has to deal with another library that has threading
> limitations -- maybe an operation has to be completed by the same
> thread that started it. If he has an operation going, he can't call
> your library with that thread because if he does, he can't complete
> the operation he's already doing until your operation completes or he
> cancels it (because the thread he needs to do it will be busy).
>
> The net effect of dealing with libraries that have weird threading
> dependencies (as yours would) is that the people using them typically
> wind up having to quarantine the library with 'service' threads so
> they don't have to deal with these limitations. Your library would be
> yet another weird one I'd have to quarantine, being unable to call it
> with critical threads because I'll lose them, and having to dispatch
> it because you didn't.
How would that be different from any other library with a sync api
(which I think is still the majority)?
> And you can do it better than I can, since you know when the dispatch
> and I have to dispatch all the time. For example, if you can tell that
> an operation will never work or can complete immediately, you can
> avoid the dispatch. How can I do that?
That's certainly true.
But my main concern with a full featured async api is that the async
stuff would complicate the library a lot (to the point where the
threading might be more complex than the rest of the code), while in
practice it might be used very little. So the question is: is this
really worth the effort?
The "simple" cancellation on the other hand is something that would be
really nice to have. Some operations can block for a significant amount
of time (ranging from only a few milliseconds to some minutes), and can
only be canceled at specific points in the transfer.
> I can't imagine my library needs to scale to those numbers (read below
> for more info). But imagine for a moment it does. How would having an
> async api solve this issue when the implementation would internally
> create a thread for every operation. In that case, the number of threads
> would be identical. Unless some sort of thread pool would be used, but
> isn't that possible in the application code too?
No, it's impossible in the application code. If I have to call a
synchronous function, then I need one thread for every operation
that's in progress, period. If I have 100 operations in progress, that
means 100 threads. If you don't provide an asynchronous API, there's
not a damn thing I can do about it.
If your library does not provide an asynchronous API, you force the
application to dedicate a thread to each pending operation for as long
as the application wants to keep that operation pending. That's pretty
awful, IMO.
You may think it's not a big deal, but the problem is, that would be
your library's quirk. Now imagine I'm working with another library
with a different quirk -- say it provides an asynchronous API but
requires me to cancel or complete operations in the same thread that
started them. Think about it -- how do I use both your library and
this library with the same threads?
In practice, application programmers will curse you under their breath
and then quarantine your library from their own threads so they don't
have to deal with your quirks.
> > How many operations might need to be pending at once, worst case? And
> > are you okay with a situation where you cannot increase that without a
> > complete redesign?
> It is extremely rare if an end user owns more than one device. If I had
> to come up with a number I would say less than 5%, which is probably
> already too optimistic. Since we are dealing with real hardware, the
> number of devices that can be attached to a PC is also limited by
> physical factors, such as available serial/usb ports
Okay, if the operations are always associated with a local physical
device, and only one operation can pend per device, then I think
you're okay.
> I think the best analogy I can come up with is a digital camera. Most
> end users only own one camera, and those that do own more have likely no
> need to transfer pictures from multiple cameras at the same time. I
> don't know a photo application that support this, even when the
> underlying transfer protocol or library could support it.
Then I think you're okay.
> > The net effect of dealing with libraries that have weird threading
> > dependencies (as yours would) is that the people using them typically
> > wind up having to quarantine the library with 'service' threads so
> > they don't have to deal with these limitations. Your library would be
> > yet another weird one I'd have to quarantine, being unable to call it
> > with critical threads because I'll lose them, and having to dispatch
> > it because you didn't.
> How would that be different from any other library with a sync api
> (which I think is still the majority)?
It wouldn't. We curse and avoid them all. ;)
> But my main concern with a full featured async api is that the async
> stuff would complicate the library a lot (to the point where the
> threading might be more complex than the rest of the code), while in
> practice it might be used very little. So the question is: is this
> really worth the effort?
It sounds like not.
> The "simple" cancellation on the other hand is something that would be
> really nice to have. Some operations can block for a significant amount
> of time (ranging from only a few milliseconds to some minutes), and can
> only be canceled at specific points in the transfer.
Is there some benefit to canceling the operation? Would it allow a new
operation to be started sooner? If it's just to keep the thread from
being stuck, I wouldn't bother. But if it's to stop a hardware
resource from being consumed and allow a new operation to start sooner
if a user is no longer interested in the previous operation, then it's
worth doing.
DS
I understand that when doing the threading in the application, you need
one thread for each operation. But assume now for a moment I want to
implement an async api. If I start a new thread whenever an async call
is started, I also end up with one thread for each operation. Or is
there some other way to implement the async api? Maybe some technique
that I don't know about?
I am aware of stuff like select() to watch multiple (file) descriptors
at once, which avoids the need to have a thread for watching each
descriptor. But I have no idea how something like that could be used for
my library. I even think a single thread won't work for me, because some
devices have very strict timing rules, such as the delay between
receiving a packet and requesting the next one should not exceed a
certain value (e.g. 50ms window). So if some other events can be
processed in between, they may interfere with the transfer and cause
trouble. When the entire operation runs in its own thread, nothing else
can slip in between (except when the OS schedules some other
thread/process of course).
Note: If my async api would use a fixed number of worker threads to
process the requested async api calls one by one, I think you could also
do something similar in the application code. That was what I was trying
to see in my original comment.
>> The "simple" cancellation on the other hand is something that would be
>> really nice to have. Some operations can block for a significant amount
>> of time (ranging from only a few milliseconds to some minutes), and can
>> only be canceled at specific points in the transfer.
>
> Is there some benefit to canceling the operation? Would it allow a new
> operation to be started sooner? If it's just to keep the thread from
> being stuck, I wouldn't bother. But if it's to stop a hardware
> resource from being consumed and allow a new operation to start sooner
> if a user is no longer interested in the previous operation, then it's
> worth doing.
Some devices may require an action from the user once the operation is
finished. For instance some devices use considerably more battery power
when connected to the host PC. Thus to preserve battery life as much as
possible, the device should be disconnected once the operation is
finished. But if the operation remains active in the background, the
device keeps consuming more power than necessary.
Also from a user perspective, keeping the task running is also rather
strange. When you start the operation in a GUI based application, you
typically get some dialog with a progress indicator and a cancel button.
But when you click the cancel button, most users will expect the
operation to be finished when the dialog disappears (to be able to
inform the user that the device can safely be disconnected for
instance). But that would mean having to wait for the thread to finish,
which would be equivalent to ignoring the cancellation request. Not
exactly what I would expect from a cancel button.
The main point is probably that I only want to achieve a sync operation
(e.g. the app is waiting until the operation is completely finished or
canceled), while the GUI must remain responsive (e.g. widgets are still
painted and events are handled). Thus it's not really some background
task that can execute more or less independent from the main thread.
> I understand that when doing the threading in the application, you need
> one thread for each operation. But assume now for a moment I want to
> implement an async api. If I start a new thread whenever an async call
> is started, I also end up with one thread for each operation. Or is
> there some other way to implement the async api? Maybe some technique
> that I don't know about?
Yes, don't use one thread for each operation.
> I am aware of stuff like select() to watch multiple (file) descriptors
> at once, which avoids the need to have a thread for watching each
> descriptor. But I have no idea how something like that could be used for
> my library. I even think a single thread won't work for me, because some
> devices have very strict timing rules, such as the delay between
> receiving a packet and requesting the next one should not exceed a
> certain value (e.g. 50ms window). So if some other events can be
> processed in between, they may interfere with the transfer and cause
> trouble. When the entire operation runs in its own thread, nothing else
> can slip in between (except when the OS schedules some other
> thread/process of course).
In general, using more threads makes it harder to meet timing requires
because the 'wrong' thread can be running. With one thread, you never
have to worry about the 'wrong' thread running.
> Also from a user perspective, keeping the task running is also rather
> strange. When you start the operation in a GUI based application, you
> typically get some dialog with a progress indicator and a cancel button.
> But when you click the cancel button, most users will expect the
> operation to be finished when the dialog disappears (to be able to
> inform the user that the device can safely be disconnected for
> instance). But that would mean having to wait for the thread to finish,
> which would be equivalent to ignoring the cancellation request. Not
> exactly what I would expect from a cancel button.
Generally, users expect the 'cancel' button to stop them from having
to wait for an operation to complete. If they expect that it assures
that the operation will not complete, their expectation is unrealistic
(since they may simply be waiting for an acknowledgment). Whether is
matters if the 'cancel' button actually stops the operation or only
stops the user from having to wait depends on what the operation is.
> The main point is probably that I only want to achieve a sync operation
> (e.g. the app is waiting until the operation is completely finished or
> canceled), while the GUI must remain responsive (e.g. widgets are still
> painted and events are handled). Thus it's not really some background
> task that can execute more or less independent from the main thread.
Then you're probably fine providing only blocking operations and a
cancel function.
DS
Can you tell me about some alternatives? Or just point me in the right
direction, so I know where I should start looking?
>> I am aware of stuff like select() to watch multiple (file) descriptors
>> at once, which avoids the need to have a thread for watching each
>> descriptor. But I have no idea how something like that could be used for
>> my library. I even think a single thread won't work for me, because some
>> devices have very strict timing rules, such as the delay between
>> receiving a packet and requesting the next one should not exceed a
>> certain value (e.g. 50ms window). So if some other events can be
>> processed in between, they may interfere with the transfer and cause
>> trouble. When the entire operation runs in its own thread, nothing else
>> can slip in between (except when the OS schedules some other
>> thread/process of course).
>
> In general, using more threads makes it harder to meet timing requires
> because the 'wrong' thread can be running. With one thread, you never
> have to worry about the 'wrong' thread running.
But if I'm doing everything in a single thread, I could still be running
the 'wrong' code in that thread. For instance when I receive a packet
that needs to be acknowledged within a certain time window, another
operation could be ready too. And if that other operation is processed
first and it takes longer than the timing window for the first packet, I
still have a problem.
>> Also from a user perspective, keeping the task running is also rather
>> strange. When you start the operation in a GUI based application, you
>> typically get some dialog with a progress indicator and a cancel button.
>> But when you click the cancel button, most users will expect the
>> operation to be finished when the dialog disappears (to be able to
>> inform the user that the device can safely be disconnected for
>> instance). But that would mean having to wait for the thread to finish,
>> which would be equivalent to ignoring the cancellation request. Not
>> exactly what I would expect from a cancel button.
>
> Generally, users expect the 'cancel' button to stop them from having
> to wait for an operation to complete. If they expect that it assures
> that the operation will not complete, their expectation is unrealistic
> (since they may simply be waiting for an acknowledgment). Whether is
> matters if the 'cancel' button actually stops the operation or only
> stops the user from having to wait depends on what the operation is.
In my case, I can't cancel at arbitrary points. For instance only
between two data packets, but not in the middle of a packet. Thus
immediate cancellation is not possible anyway, but canceling within a
"reasonable" time (which could be up to a few seconds) is fine. And if
the operation completes before the cancellation was noticed, that's fine
too.
Imagine a user starts an operation that would normally take 10 minutes,
and during that transfer he realizes he made a mistake, cancels it and
wants to try again. That wouldn't be possible because the first
operation is still active and he'll have to wait the full 10 minutes.
That's not really ideal.
> > Yes, don't use one thread for each operation.
> Can you tell me about some alternatives? Or just point me in the right
> direction, so I know where I should start looking?
Use a pool of threads, say with a fixed number. Only assign a thread
from that pool to an operation when it's possible to make forward
progress on that operation.
> > In general, using more threads makes it harder to meet timing requires
> > because the 'wrong' thread can be running. With one thread, you never
> > have to worry about the 'wrong' thread running.
> But if I'm doing everything in a single thread, I could still be running
> the 'wrong' code in that thread. For instance when I receive a packet
> that needs to be acknowledged within a certain time window, another
> operation could be ready too. And if that other operation is processed
> first and it takes longer than the timing window for the first packet, I
> still have a problem.
Right, but it's a problem that's completely under your control. Don't
let the other operation take longer than your timing window. If it
does, put it down and pick it back up again later.
> Imagine a user starts an operation that would normally take 10 minutes,
> and during that transfer he realizes he made a mistake, cancels it and
> wants to try again. That wouldn't be possible because the first
> operation is still active and he'll have to wait the full 10 minutes.
> That's not really ideal.
If you have operations that take that long, cancellation is almost
mandatory. But I can't imagine any operation that takes 10 minutes
that could conceivably require a dedicated thread. Do you really need
10 minutes of CPU time to transfer stuff? It's way sub-optimal design
to tie up a CPU when it's not actually doing CPU work.
DS
I have no idea how I would have to implement something like this.
Especially because my library consist of many backends, one for each
type of supported device. Each type of device has it own transfer
protocol and my library provides a common api to access all devices in a
similar way, thus hiding the underlying protocol.
So when requesting data from a device, some devices send all there data
as a single data stream, some split the data in multiple packets where
you need to ack/nak each packet to get the next packet, and some support
random access to the internal memory pages and you have to request the
pages one by one:
/* Single data stream */
write_request
read_response
/* Multiple packets */
write_request
while (nbytes < size) {
read_packet
if (error)
write_nak_and_try_again
else
write_ack
}
/* Random access to memory pages */
while (nbytes < size) {
write_page_request
read_page_response
}
Also the timing requirements are very different. Some devices allow only
a very short time (e.g. 50ms) between two bytes/packets/pages, while
others have a very long timeout (e.g. a few minutes) before the devices
aborts the transmission itself. Also the timeouts that I'm using for
reading (and thus the maximum time a read can block) varies depending on
the backend.
>> Imagine a user starts an operation that would normally take 10 minutes,
>> and during that transfer he realizes he made a mistake, cancels it and
>> wants to try again. That wouldn't be possible because the first
>> operation is still active and he'll have to wait the full 10 minutes.
>> That's not really ideal.
>
> If you have operations that take that long, cancellation is almost
> mandatory. But I can't imagine any operation that takes 10 minutes
> that could conceivably require a dedicated thread. Do you really need
> 10 minutes of CPU time to transfer stuff? It's way sub-optimal design
> to tie up a CPU when it's not actually doing CPU work.
My operations are not very CPU intensive. Most of the time is spend
waiting for I/O. I'm using blocking I/O, implemented with a typical
read()/select() loop on Linux and a sync ReadFile() call on Windows. And
I'm using a timeout (in the order of a few seconds) to ensure that the
I/O doesn't block forever in case something went wrong (e.g. no data
ever arrives).
Transfers are slow because the physical link is slow (e.g. a serial
port, or an usb port with an internal usb-to-serial chip), and some
devices are just slow in processing request and sending their replies.
> > Right, but it's a problem that's completely under your control. Don't
> > let the other operation take longer than your timing window. If it
> > does, put it down and pick it back up again later.
> I have no idea how I would have to implement something like this.
By doing the work you need to do with asynchronous functions.
> Especially because my library consist of many backends, one for each
> type of supported device. Each type of device has it own transfer
> protocol and my library provides a common api to access all devices in a
> similar way, thus hiding the underlying protocol.
Presumably the common API is purely synchronous. See how this causes
your problems -- now you can't meet other timing requirements because
you have to call these synchronous functions that can block you for
arbitrarily long.
> > If you have operations that take that long, cancellation is almost
> > mandatory. But I can't imagine any operation that takes 10 minutes
> > that could conceivably require a dedicated thread. Do you really need
> > 10 minutes of CPU time to transfer stuff? It's way sub-optimal design
> > to tie up a CPU when it's not actually doing CPU work.
> My operations are not very CPU intensive. Most of the time is spend
> waiting for I/O. I'm using blocking I/O, implemented with a typical
> read()/select() loop on Linux and a sync ReadFile() call on Windows. And
> I'm using a timeout (in the order of a few seconds) to ensure that the
> I/O doesn't block forever in case something went wrong (e.g. no data
> ever arrives).
> Transfers are slow because the physical link is slow (e.g. a serial
> port, or an usb port with an internal usb-to-serial chip), and some
> devices are just slow in processing request and sending their replies.
This is really sub-optimal, but I'm not sure how much it matters. You
are forcing the operating system to do context switches for no reason.
You have work to do, you have a thread running that could do that
work, but instead you block that thread on I/O that's not going to
complete for some time. This forces the scheduler to switch threads to
get work done.
On the bright side, it's usually not to bad with slow I/O and much
worse with fast I/O. If the I/O takes, say, 100ms, then at worst you
can force 10 extra context switches per second by blocking on the I/O.
That's negligible. It's much worse for fast I/O, since the number of
extra context switches per second you force by senseless blocking can
be much higher.
DS
Do you mean using async I/O functions internally, like aio_* on unix or
overlapped I/O on Windows instead of their sync counterparts?
>> Especially because my library consist of many backends, one for each
>> type of supported device. Each type of device has it own transfer
>> protocol and my library provides a common api to access all devices in a
>> similar way, thus hiding the underlying protocol.
>
> Presumably the common API is purely synchronous. See how this causes
> your problems -- now you can't meet other timing requirements because
> you have to call these synchronous functions that can block you for
> arbitrarily long.
Both the public api and its internal implementation is indeed entirely
synchronous.
> David Schwartz wrote:
> >>> Right, but it's a problem that's completely under your control. Don't
> >>> let the other operation take longer than your timing window. If it
> >>> does, put it down and pick it back up again later.
> >> I have no idea how I would have to implement something like this.
> > By doing the work you need to do with asynchronous functions.
> Do you mean using async I/O functions internally, like aio_* on unix or
> overlapped I/O on Windows instead of their sync counterparts?
Yes. You simply don't call any functions that can block for
arbitrarily long amounts of time. This is the same design that single-
threaded servers use. They can't allow work they're doing for a client
to tie them up for long periods of time because that would mean the
server as a whole would stall.
> > Presumably the common API is purely synchronous. See how this causes
> > your problems -- now you can't meet other timing requirements because
> > you have to call these synchronous functions that can block you for
> > arbitrarily long.
> Both the public api and its internal implementation is indeed entirely
> synchronous.
So what will probably happen is that someone somewhere will wind up
having to 'quarantine' the synchronous functions. This is sub-optimal,
but you probably don't need optimal behavior. (And it is definitely
reasonable to pick an implementation that is sub-optimal from a
performance standpoint in the areas where performance isn't important.
Ease of implementation, maintainability, and other things are often
worth trading performance for.)
DS
I think that in my case an async api is simply not worth the effort and
I'll stick to a sync api, but with a thread-safe cancellation support.
Without the thread-safe requirement, the implementation of a
cancellation object would look similar to the code below (all error
checking is omitted).
typedef struct {
int refcount;
int cancelled;
} cancel_t;
int
cancel_create (cancel_t **out)
{
cancel_t *cancel = (cancel_t *) malloc (sizeof (cancel_t));
if (cancel == NULL)
return -1;
cancel->refcount = 1;
cancel->cancelled = 0;
*out = cancel;
return 0;
}
void
cancel_set_cancelled (cancel_t *cancel, int cancelled)
{
cancel->cancelled = cancelled;
}
int
cancel_get_cancelled (cancel_t *cancel)
{
return cancel->cancelled;
}
void
cancel_ref (cancel_t *cancel)
{
++cancel->refcount;
}
void
cancel_unref (cancel_t *cancel)
{
if (--cancel->refcount != 0)
return;
free (cancel);
}
To make this object thread-safe, I'll have to add a mutex to the object
and lock/unlock it whenever I access one of the refcount or cancelled
members. Except in the create function I think, because at that point no
other thread can access the object.
But I have also been reading on atomic operations. Since I only need to
read/write/increment/decrement simple flags, I think that could be done
with atomic operations as well. Win32 has the Interlocked family of
functions [1] and recent GCC versions have built-in atomic functions
[2]. So I won't have to resort to assembler, which is not an option in
my case because I have almost no assembler knowledge. But none of them
have functions for reading/writing the variables. How should that be
done? The Win32 documentation mentions they are always atomic if
properly aligned, so I think I only need to declare the struct members
as volatile, to prevent them from being cached.
http://msdn.microsoft.com/en-us/library/ms684122(VS.85).aspx
http://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html
Are there any benefits over using a normal mutex? I think atomic
operations are safe to use in a signal handler, while mutexes are
definitely not. In a single threaded application a SIGINT handler would
be about the only way to cancel a running operation.
An alternative idea I have been thinking about, is simply letting the
application supply a callback function that is called by the library
(instead of checking the status of a cancel object) whenever it can
safely abort the running operation. With this approach, the thread
synchronization code is kept entirely outside of the library, and the
application will have to take care of that.
Which approach is preferred? Is there maybe some other approach that I
didn't think of?
You can use a fetch-and-add operation with an addend of zero to mimic an
atomic load, and a swap to emulate an atomic store. On Windows it would be
something like:
_______________________________________________________________
extern LONG InterlockedLoad(LONG volatile*);
extern VOID InterlockedStore(LONG volatile*, LONG);
#define InterlockedLoad(mp_self) ( \
InterlockedExchangeAdd((mp_self), 0) \
)
#define InterlockedStore(mp_self, mp_value) ( \
(VOID)InterlockedExchange((mp_self), (mp_value)) \
)
_______________________________________________________________
> [...]
> An alternative idea I have been thinking about, is simply letting the
> application supply a callback function that is called by the library
> (instead of checking the status of a cancel object) whenever it can safely
> abort the running operation. With this approach, the thread
> synchronization code is kept entirely outside of the library, and the
> application will have to take care of that.
>
> Which approach is preferred? Is there maybe some other approach that I
> didn't think of?
I really need to read this thread to give proper answers. However, the
callback scheme does seem like it could turn out to be a fairly workable
approach. You should also probably allow the application to pass state
information to the device object at the time of creation which would in turn
be passed into the cancel callback.
Using an "add" instruction to read a value feels wrong :-)
>> An alternative idea I have been thinking about, is simply letting the
>> application supply a callback function that is called by the library
>> (instead of checking the status of a cancel object) whenever it can safely
>> abort the running operation. With this approach, the thread
>> synchronization code is kept entirely outside of the library, and the
>> application will have to take care of that.
>>
>> Which approach is preferred? Is there maybe some other approach that I
>> didn't think of?
>
> I really need to read this thread to give proper answers. However, the
> callback scheme does seem like it could turn out to be a fairly workable
> approach. You should also probably allow the application to pass state
> information to the device object at the time of creation which would in turn
> be passed into the cancel callback.
Of course the callback function would get a "userdata" parameter. I'm
already using such a callback scheme in my library for delivering event
notifications (progress, etc) to the caller.
I think one of the disadvantages of a callback scheme is that the
application could easily do more than just checking for cancellation.
Nothing prevents the application from doing something else in the
callback function. Since I have to meet certain timing requirements for
some devices, that could cause problems when too much time is spend
inside the callback function. (Note this is also an issue for the event
callbacks, but in this case I don't think there is an alternative
approach.) With a cancellation object the library remains in control.
Also, if my library would be used from another framework (.NET, python
or whatever) invoking a callback function has some extra overhead in the
glue code.
I could also (ab)use the existing event callbacks as opportunities to
cancel the running operation, but events are not necessary emitted at
points where I can abort safely. And in some cases I could also abort at
points where no event is ever emitted. For instance when a corrupt
packet is received, the packet is retried a few times before giving up.
In this example I could abort after each packet, but the progress event
would only be emitted after the packet is received correctly.
Monitoring events would also become mandatory to be able to cancel a
running operation. On the other hand adding a dedicated callback
function to support cancellation, would significantly increase the
number of callback invocations when both cancellation and events are
enabled. Because in many cases (but not all) they would happen almost at
the same time.
> Are there any benefits over using a normal mutex? I think atomic
> operations are safe to use in a signal handler, while mutexes are
> definitely not. In a single threaded application a SIGINT handler would
> be about the only way to cancel a running operation.
For example:
1) I create an operation and block on it, since that's all I can do.
2) I decide to cancel the operation.
3) Right as I go to cancel the operation, it completes. I now call a
function on an operation that no longer exists.
There are many ways to solve this, but simplest is to protect both the
operation itself and the ability to find the operation with a mutex.
The thread that decides to cancel the operation somehow has to find
out what operation it's supposed to cancel, so synchronization will be
required anyway. To add an atomic operation to code that already has
pure synchronization is a pure pessimization.
You have to somehow make sure that when a thread makes an irrevocable
decision to cancel an operation, the operation exists long enough for
that thread to cancel it. And if the operation has already complete,
the cancelling thread must clean up the operation structure.
It is at least an order of magnitude harder to get all this right with
atomic primitives. With a lock, you can perform whatever operations
you need -- increment this, check that, delete this, change this
pointer to NULL, whatever.
DS
Isn't the reference counting supposed to solve this problem?
int
main ()
{
cancel_t *cancel;
cancel_create (&cancel, ...);
/* Increase the refcount for the worker thread */
cancel_ref (cancel);
pthread_t thread;
pthread_create (&thread, NULL, worker, cancel);
/* This could be a GUI event loop. */
while (worker_thread_is_running) {
if (user_requested_cancel)
cancel_set_cancelled (cancel, 1)
}
cancel_unref (cancel);
}
void *
worker (void *arg)
{
cancel_t *cancel= (cancel_t *) arg;
device_t *device = NULL;
device_open (&device, ...);
/* Attach the cancel object to the device handle,
which increases the refcount again, and decreases
automatically when the device handle is destroyed.*/
device_set_cancel (device, cancel);
/* We don't need the cancel object any longer */
cancel_unref (cancel);
/* Call a slow library function that
support cancellation through the
attached cancel object. */
device_read (device, ...);
device_close (device);
}
In this case, the cancel object remains alive for as long as someone
holds a reference to it, regardless of whether that's the main thread,
the worker thread or the device handle. Even if I wouldn't wait for the
worker thread to finish, the cancel object remains valid.
Sure, but you're over-simplifying.
> int
> main ()
> {
> cancel_t *cancel;
> cancel_create (&cancel, ...);
>
> /* Increase the refcount for the worker thread */
> cancel_ref (cancel);
>
> pthread_t thread;
> pthread_create (&thread, NULL, worker, cancel);
So, right here you passed the 'cancel' object to another thread at
super-high cost (creating a thread). If you're going to create a
thread for every event, avoiding a fast lock is silly.
Now, if you're not going to create a thread, you're going to dispatch
to an existing thread. That's going to require taking some lock. While
you have that lock, you can just bump the cancel count without an
atomic operation.
> /* This could be a GUI event loop. */
> while (worker_thread_is_running) {
> if (user_requested_cancel)
> cancel_set_cancelled (cancel, 1)
> }
>
> cancel_unref (cancel);
Your 'while (worker_thread_is_running)' loop is going to have to
acquire some lock to check on the status of the worker thread, right?
So what your code does is it *releases* a lock and the does an atomic
operation. Do you see how silly that is? Why not just do the cancel
decrement while you hold the lock, then the overhead of an atomic lock
is not needed.
If you are choosing an atomic operation to avoid the overhead of a
lock, you are losing on all fronts. You get the overhead of the lock
with operations like checking on the status of another thread and then
you do the atomic operation *anyway*.
> }
>
> void *
> worker (void *arg)
> {
> cancel_t *cancel= (cancel_t *) arg;
>
> device_t *device = NULL;
> device_open (&device, ...);
>
> /* Attach the cancel object to the device handle,
> which increases the refcount again, and decreases
> automatically when the device handle is destroyed.*/
> device_set_cancel (device, cancel);
At some point, this function must make the cancellation object visible
to another thread, so it's going to have to take some lock either here
or later.
> /* We don't need the cancel object any longer */
> cancel_unref (cancel);
So why waste the overhead of an atomic operation here? We just took a
lock.
> /* Call a slow library function that
> support cancellation through the
> attached cancel object. */
> device_read (device, ...);
>
> device_close (device);
>
> }
>
> In this case, the cancel object remains alive for as long as someone
> holds a reference to it, regardless of whether that's the main thread,
> the worker thread or the device handle. Even if I wouldn't wait for the
> worker thread to finish, the cancel object remains valid.
Right, but the whole point of the atomic operations was to avoid
having to take a lock. As your code shows, you have to take a lock
*anyway* to coordinate the thread that calls the function with the
thread that cancels it. So the atomic operation is pure overhead. You
have to hold a lock anyway, so just do a normal increment/decrement
while you hold the lock.
DS
I won't create a thread for every single sub-operation, but rather one
thread for the entire series of operations (downloading, processing,
etc). But your comment remains valid of course.
Since downloading data is typically a one time operation (cfr a digital
camera), I won't have an existing thread available for doing the work.
At least that's how I would do it. But users of my library should be
free to do things differently of course.
>> /* This could be a GUI event loop. */
>> while (worker_thread_is_running) {
>> if (user_requested_cancel)
>> cancel_set_cancelled (cancel, 1)
>> }
>>
>> cancel_unref (cancel);
>
> Your 'while (worker_thread_is_running)' loop is going to have to
> acquire some lock to check on the status of the worker thread, right?
> So what your code does is it *releases* a lock and the does an atomic
> operation. Do you see how silly that is? Why not just do the cancel
> decrement while you hold the lock, then the overhead of an atomic lock
> is not needed.
>
> If you are choosing an atomic operation to avoid the overhead of a
> lock, you are losing on all fronts. You get the overhead of the lock
> with operations like checking on the status of another thread and then
> you do the atomic operation *anyway*.
True, but this is not the place where I want to avoid the lock overhead.
BTW, acquiring the lock for checking the status of the worker thread
will probably never block because the worker thread will only have to
acquire that lock once to set the is_running flag to false.
>> void *
>> worker (void *arg)
>> {
>> cancel_t *cancel= (cancel_t *) arg;
>>
>> device_t *device = NULL;
>> device_open (&device, ...);
>>
>> /* Attach the cancel object to the device handle,
>> which increases the refcount again, and decreases
>> automatically when the device handle is destroyed.*/
>> device_set_cancel (device, cancel);
>
> At some point, this function must make the cancellation object visible
> to another thread, so it's going to have to take some lock either here
> or later.
I'm not really sure I understand what you are saying here. The
device_set_cancel function would simply increase the refcount and store
the pointer in the device handle. The main thread still has a reference
to the object, to be able to set it to the canceled state.
>> /* We don't need the cancel object any longer */
>> cancel_unref (cancel);
>
> So why waste the overhead of an atomic operation here? We just took a
> lock.
Increasing the refcount in the previous step would lock, increase the
refcount and immediately unlock again. So I won't have the lock anymore.
>> /* Call a slow library function that
>> support cancellation through the
>> attached cancel object. */
>> device_read (device, ...);
>>
>> device_close (device);
>>
>> }
>>
>> In this case, the cancel object remains alive for as long as someone
>> holds a reference to it, regardless of whether that's the main thread,
>> the worker thread or the device handle. Even if I wouldn't wait for the
>> worker thread to finish, the cancel object remains valid.
>
> Right, but the whole point of the atomic operations was to avoid
> having to take a lock. As your code shows, you have to take a lock
> *anyway* to coordinate the thread that calls the function with the
> thread that cancels it. So the atomic operation is pure overhead. You
> have to hold a lock anyway, so just do a normal increment/decrement
> while you hold the lock.
All true, but the point where the atomic operation would matter most in
my code is inside the device_read() function. This is the function that
does the slow I/O and thus will need to check the attached cancel object
repeatedly. This is also where the timing requirements are located. The
ref/unref'ing needs to be done only a few times.
device_read (device_t *device, ...)
{
while (!cancel_get_cancelled (device->cancel)) {
/* do some slow stuff */
}
}
But at that point I don't have the lock already. So I'll have to take
the lock or use an atomic operation.
I think that the main issue is that you seem to assume that the same
mutex can be accessed both by the application (for monitoring the worker
thread) and the library (for the cancel object). But that's not the
case. The cancel object would be implemented as an opaque object, which
means an application won't be able to access its internals, including
the internal mutex. But on the other hand, if the application already
uses a mutex, my library doesn't know anything about that one. How would
it know whether the application has a lock or not?. Locking the app
mutex for the entire duration is obviously not a good idea either. So
how can this work?
I think that even when I implement the cancel object with a mutex
instead of atomic operations, the application will still need to have
its own mutex to monitor the worker thread. Or am I missing something?
> BTW, acquiring the lock for checking the status of the worker thread
> will probably never block because the worker thread will only have to
> acquire that lock once to set the is_running flag to false.
And you think holding a lock while you increment/decrement a single
integer will?
> > At some point, this function must make the cancellation object visible
> > to another thread, so it's going to have to take some lock either here
> > or later.
> I'm not really sure I understand what you are saying here.
If thread A creates the operation and thread B cancels it, then
somehow the operation must have been communicated from thread A to
thread B. Whatever mechanism this was, it will have to involve a lock
somewhere. While holding that lock, you can manipulate the cancel
count.
> >> /* We don't need the cancel object any longer */
> >> cancel_unref (cancel);
> > So why waste the overhead of an atomic operation here? We just took a
> > lock.
> Increasing the refcount in the previous step would lock, increase the
> refcount and immediately unlock again. So I won't have the lock anymore.
Okay, think this through:
1) You created an operation.
2) You arranged for another thread to find it should it need to be
cancelled.
3) The operation completed.
At this point, you need to undo whatever you did in step 2. There is
no longer any need to keep the operation information where the other
thread can find it. So there is some structure you need to update. To
update that structure, you will need to hold a lock or you could race
with the cancelling thread.
So you will have to acquire some lock anyway. Why not decrement the
cancel count while holding that lock?
How will the GUI thread know what the pointer to the operation to be
cancelled is? How will it know the operation completed? All of these
things will have to be communicated, and that will be done with locks.
These locks protect the pointer to the operation and the global
knowledge of its state. (The GUI must know whether the operation is in
progress, right? Otherwise how could it even offer to let you cancel
it?)
So logically, these same locks should protect the reference count.
> > Right, but the whole point of the atomic operations was to avoid
> > having to take a lock. As your code shows, you have to take a lock
> > *anyway* to coordinate the thread that calls the function with the
> > thread that cancels it. So the atomic operation is pure overhead. You
> > have to hold a lock anyway, so just do a normal increment/decrement
> > while you hold the lock.
> All true, but the point where the atomic operation would matter most in
> my code is inside the device_read() function. This is the function that
> does the slow I/O and thus will need to check the attached cancel object
> repeatedly. This is also where the timing requirements are located. The
> ref/unref'ing needs to be done only a few times.
Okay, so we're not talking about the reference count, we're talking
about the 'is_cancelled' flag? IMO, a volatile boolean is sufficient
for this purpose, however it is technically a platform-specific
optimization.
> I think that the main issue is that you seem to assume that the same
> mutex can be accessed both by the application (for monitoring the worker
> thread) and the library (for the cancel object). But that's not the
> case. The cancel object would be implemented as an opaque object, which
> means an application won't be able to access its internals, including
> the internal mutex. But on the other hand, if the application already
> uses a mutex, my library doesn't know anything about that one. How would
> it know whether the application has a lock or not?. Locking the app
> mutex for the entire duration is obviously not a good idea either. So
> how can this work?
You simply require the application to synchronize the use of a single
operation. For example, you prohibit the application from calling
'cancel' while it is calling 'release' on the same operation. You
require the application to serialize operations on the operation, just
as it would serialize operations to anything else.
> I think that even when I implement the cancel object with a mutex
> instead of atomic operations, the application will still need to have
> its own mutex to monitor the worker thread. Or am I missing something?
Your operations are going to be synchronous. The application has no
need to monitor the worker thread. It will know the operation is done
when the calling thread returns, right?
DS
;^)
AFAICT, its the only portable method to get an atomic load with full memory
barrier semantics on Windows.
[...]
AFAICT, a cancel object like the one presented in the following pseudo-code
should work:
_______________________________________________________________________
struct cancel {
atomic_word refs;
atomic_word volatile cancelled;
};
struct device {
struct cancel* cancel;
/* [...] */
};
int
cancel_create(
struct cancel** pself,
atomic_word refs
) {
struct cancel* const self = malloc(sizeof(*self));
if (! self) return ENOMEM;
self->refs = refs;
self->cancelled = 0;
*pself = self;
return 0;
}
void
cancel_set(
struct cancel* const self
) {
self->cancelled = 1;
}
void
cancel_refadjust(
struct cancel* const self
atomic_word val
) {
/* uses atomic fetch-and-add with full membar */
atomic_uword old_val = ATOMIC_FAA_FENCE(&self->refs, val);
if ((old_val + val) < 1) {
free(self);
}
}
int
device_create(
struct device** pself,
struct cancel* cancel,
/* [...] */
) {
struct device* const self = malloc(sizeof(*self));
if (! self) return ENOMEM;
self->cancel = cancel;
/* [...] */
*pself = self;
return 0;
}
int
device_read/write(
struct device* self,
/* [...] */
) {
for (;;) {
if (self->cancel) {
if (self->cancel->cancelled) {
/* [...] */
return -1;
}
}
/* [...] */
}
return 0;
}
int
device_destroy(
struct device* self
) {
/* [...] */
free(self);
return 0;
}
_______________________________________________________________________
Here is some sample application pseudo-code which cancels a "device thread":
_______________________________________________________________________
void*
app_device_thread(
void* state
) {
struct device* device;
struct cancel* const cancel = state;
device_create(&device, cancel);
device_read(&device, /* [...] */);
device_destroy(&device);
cancel_refadjust(cancel, -1);
return NULL;
}
int main(void) {
pthread_t tid;
struct cancel* cancel;
cancel_create(&cancel, 2);
pthread_create(&tid, NULL, app_device_thread, cancel);
sleep(5); /* sleep for a while */
cancel_set(cancel);
cancel_refadjust(cancel, -1);
pthread_join(tid, NULL);
return 0;
}
_______________________________________________________________________
Would that work for you? AFAICT, it should work fine. Just document that a
`device' object is not thread-safe in anyway shape or form. Then document
that a `cancel' object provides only basic/normal thread-safety, NOT strong!
Keep in mind that even Boost shared_ptr only provides basic/normal
thread-safety level. Here is the rule that MUST be followed:
An application thread can only increment the reference count IF it already
owns a previous reference!!!
Here is some more info on basic/normal vs. strong thread safety:
http://groups.google.com/group/comp.programming.threads/browse_thread/thread/e5167941d32340c6
If applications do not follow the rule, well, they will experience a very
nasty race-condition!
;^o
Also, remember to document that a `device' object does not take ownership of
a `cancel' object in any way shape or form. Its the applications
responsibility to maintain a correct reference count for a `device' object!
Any thoughts?
Ummm. That should be:
device_create(&device, cancel);
device_read(device, /* [...] */);
device_destroy(device);
> cancel_refadjust(cancel, -1);
> return NULL;
> }
>
>
> int main(void) {
> pthread_t tid;
> struct cancel* cancel;
>
> cancel_create(&cancel, 2);
> pthread_create(&tid, NULL, app_device_thread, cancel);
> sleep(5); /* sleep for a while */
> cancel_set(cancel);
> cancel_refadjust(cancel, -1);
> pthread_join(tid, NULL);
> return 0;
> }
> _______________________________________________________________________
> [...]
Anyway, I read your sample cancel object impl, and its associating the
`cancel' object with the `device' object in a separate function. That's
fine. Just make sure to document that this is not a thread-safe action.
AFAICT, you already document that `device' objects are not thread-safe in
any way shape or form and that its the application responsibility to provide
proper synchronization.
One point, using a separate `cancel' object means that the application can
use it to cancel out multiple `device' objects in a single shot. Perhaps you
might want to document that aspect as well.
Something like this is indeed what I'm looking for. The refcount
definitely needs a lock (or atomic operations). That part is already
clear to me. Is the cancelled guaranteed to work correctly with the
volatile modifier, but no locks. Or is this something that happens to
work on all/most typical platforms?
With regards to the thread safety of the device object, I know it's not
thread safe in the sense that it can't be accessed safely my multiple
threads at the same time. But it's implementation is still safe in the
sense that you can safely use different device object in different
threads (e.g. reentrant functions like localtime_r are used internally,
and all state is stored in the device object itself). Otherwise it
wouldn't work correctly in a multithreaded application at all. Is there
a specific term for this type of "thread safety"?
> Here is the rule that MUST be followed:
> An application thread can only increment the reference count IF it already
> owns a previous reference!!!
Sounds very reasonable.
> Also, remember to document that a `device' object does not take ownership of
> a `cancel' object in any way shape or form. Its the applications
> responsibility to maintain a correct reference count for a `device' object!
Is there a reason for not taking ownership by the device object? I think
this would prevent errors, because it will be quite easy for an
application to forget to keep a reference to the cancel object.
With a separate registration function, I would have the possibility to
detach/replace the cancel object again. When the cancel object is
attached during the construction of the device object, it is stuck forever.
On the other hand, it would also allow to cancel the construction of the
device object, which could be useful. Maybe even more useful than
allowing to detach/replace the cancel object, because some devices
already perform some slow I/O during the construction phase.
> One point, using a separate `cancel' object means that the application can
> use it to cancel out multiple `device' objects in a single shot. Perhaps you
> might want to document that aspect as well.
This "feature" is not really a requirement in my case. But like you say,
it's something that should be documented.
I didn't say that :-)
> If thread A creates the operation and thread B cancels it, then
> somehow the operation must have been communicated from thread A to
> thread B. Whatever mechanism this was, it will have to involve a lock
> somewhere. While holding that lock, you can manipulate the cancel
> count.
In my example, the main thread created the cancel object and passed it
to the worker thread. So they can access it both and the refcount makes
sure the object remains alive for as long as one of the threads needs
the object.
> Okay, think this through:
>
> 1) You created an operation.
> 2) You arranged for another thread to find it should it need to be
> cancelled.
> 3) The operation completed.
>
> At this point, you need to undo whatever you did in step 2. There is
> no longer any need to keep the operation information where the other
> thread can find it. So there is some structure you need to update. To
> update that structure, you will need to hold a lock or you could race
> with the cancelling thread.
>
> So you will have to acquire some lock anyway. Why not decrement the
> cancel count while holding that lock?
>
> How will the GUI thread know what the pointer to the operation to be
> cancelled is? How will it know the operation completed? All of these
> things will have to be communicated, and that will be done with locks.
> These locks protect the pointer to the operation and the global
> knowledge of its state. (The GUI must know whether the operation is in
> progress, right? Otherwise how could it even offer to let you cancel
> it?)
>
> So logically, these same locks should protect the reference count.
Now I'm really getting confused. Are you saying that I should keep the
mutex outside the cancel object (thus making the object not thread-safe
like the rest of my library), and let the application handle all the
locking? Thus allowing to use the same mutex for both the cancel object
and all other main<->worker thread communication?
In that case, how about my library functions (like my device_read
function) that need to access the cancel object to check its status?
These functions are not aware of the application mutex, so they won't be
able to lock it. Or would you implement this without any locking at all
(with a volatile modifier I suppose)? And how about the
device_set_cancel function to attach the cancel object to the device
object? It doesn't have access to the mutex either, or would you simply
not increment the refcount in this case?
>>> Right, but the whole point of the atomic operations was to avoid
>>> having to take a lock. As your code shows, you have to take a lock
>>> *anyway* to coordinate the thread that calls the function with the
>>> thread that cancels it. So the atomic operation is pure overhead. You
>>> have to hold a lock anyway, so just do a normal increment/decrement
>>> while you hold the lock.
>
>> All true, but the point where the atomic operation would matter most in
>> my code is inside the device_read() function. This is the function that
>> does the slow I/O and thus will need to check the attached cancel object
>> repeatedly. This is also where the timing requirements are located. The
>> ref/unref'ing needs to be done only a few times.
>
> Okay, so we're not talking about the reference count, we're talking
> about the 'is_cancelled' flag? IMO, a volatile boolean is sufficient
> for this purpose, however it is technically a platform-specific
> optimization.
Are there any architectures or compilers where a volatile wouldn't work?
>> I think that the main issue is that you seem to assume that the same
>> mutex can be accessed both by the application (for monitoring the worker
>> thread) and the library (for the cancel object). But that's not the
>> case. The cancel object would be implemented as an opaque object, which
>> means an application won't be able to access its internals, including
>> the internal mutex. But on the other hand, if the application already
>> uses a mutex, my library doesn't know anything about that one. How would
>> it know whether the application has a lock or not?. Locking the app
>> mutex for the entire duration is obviously not a good idea either. So
>> how can this work?
>
> You simply require the application to synchronize the use of a single
> operation. For example, you prohibit the application from calling
> 'cancel' while it is calling 'release' on the same operation. You
> require the application to serialize operations on the operation, just
> as it would serialize operations to anything else.
Right, but an application won't be able to prevent my library code
(inside the device_read function) from accessing the cancel object with
only an application mutex.
>> I think that even when I implement the cancel object with a mutex
>> instead of atomic operations, the application will still need to have
>> its own mutex to monitor the worker thread. Or am I missing something?
>
> Your operations are going to be synchronous. The application has no
> need to monitor the worker thread. It will know the operation is done
> when the calling thread returns, right?
Yes. When the operation is done (or cancelled of course) the worker
thread will return too. At least for my simple approach with one worker
thread for each operation.
> Now I'm really getting confused. Are you saying that I should keep the
> mutex outside the cancel object (thus making the object not thread-safe
> like the rest of my library), and let the application handle all the
> locking? Thus allowing to use the same mutex for both the cancel object
> and all other main<->worker thread communication?
Yes.
> In that case, how about my library functions (like my device_read
> function) that need to access the cancel object to check its status?
You don't have to touch the reference count to the cancel object. You
have a synchronous function that is called and eventually returns. The
cancellation object will not be destroyed until the function returns,
so you need have nothing to do with the synchronization that protects
the reference count to the cancellation object.
I think the confusion is coming from you not separating the
cancellation flag from the reference count. Your library does not need
to do anything with reference count. In fact, you could just have an
API that take a pointer to a function that returns an 'is cancelled'
boolean.
> These functions are not aware of the application mutex, so they won't be
> able to lock it. Or would you implement this without any locking at all
> (with a volatile modifier I suppose)? And how about the
> device_set_cancel function to attach the cancel object to the device
> object? It doesn't have access to the mutex either, or would you simply
> not increment the refcount in this case?
Your library has interest in the lifetime of the cancellation object.
So there's no reason for you to care how the application manages its
lifetime.
> > Okay, so we're not talking about the reference count, we're talking
> > about the 'is_cancelled' flag? IMO, a volatile boolean is sufficient
> > for this purpose, however it is technically a platform-specific
> > optimization.
> Are there any architectures or compilers where a volatile wouldn't work?
It depends what "work" means. There is no specific guarantee for how
quickly it will be noticed. Technically, you should consider this a
platform-specific optimization. It's your call, it will work either
way. You can also acquire a mutex while you check -- trust me, the
mutex won't be a bottleneck unless you check it ridiculously often.
DS
There is a very simple reason why I'm using a separate cancel object,
instead of just storing the cancel flag inside the device handle. With
an internal flag there is a serious problem when you attempt to cancel
an operation while the device object is being destroyed.
struct device {
/* some more stuff here */
int cancelled;
}
void
device_close (device_t *device)
{
/* some more cleanup here */
free (device);
}
void
device_cancel (device_t *device)
{
/* This will break if another thread did
free the device object already. */
device->cancelled = 0;
}
With a separate object, I can easily make sure the lifetime of the
cancel object is longer than that of the device object, without any
significant resource consumption. Keeping the device object alive for
longer than what is strictly necessary would be a very bad idea
(consuming hardware resources, etc). The cancel object is only a very
lightweight object, so it's not really an issue when it's kept a little
longer than necessary.
> Is there a reason for not taking ownership by the device object? I think
> this would prevent errors, because it will be quite easy for an
> application to forget to keep a reference to the cancel object.
If the application doesn't have a reference, then the object may go
away, causing a failure. There's nothing you can do to fix this.
Somehow, if the application is going to hold a pointer to something,
it must also hold a reference to stop that something from going away.
DS
Okay, that makes it a lot clearer.
Note: In my case the cancel object would not be passed to the function
that performs the slow i/o, but rather attached to device object itself
(to be able to re-use the cancel object for all operations, without
having to pass it to every single function). So the cancel object should
be kept alive for as long as the device object exists. But the principle
remains exactly the same of course.
> I think the confusion is coming from you not separating the
> cancellation flag from the reference count. Your library does not need
> to do anything with reference count. In fact, you could just have an
> API that take a pointer to a function that returns an 'is cancelled'
> boolean.
I even considered this callback design. It has the advantage of not
having to deal with any locking at all (because that would have to be
done by the application), but on the other hand calling back into
application code might be more expensive when my library would be used
in a non C/C++ application like .NET. Just checking a flag is probably a
lot cheaper because it doesn't have to cross the language border. But to
be honest, I'm not really sure if that would really be a problem in
practice.
My library also supports progress reporting through callback functions.
Usually (but not always) events and cancellation points are located at
exactly the same place, and thus I would have to to call two callback
functions. Not exactly optimal.
>> These functions are not aware of the application mutex, so they won't be
>> able to lock it. Or would you implement this without any locking at all
>> (with a volatile modifier I suppose)? And how about the
>> device_set_cancel function to attach the cancel object to the device
>> object? It doesn't have access to the mutex either, or would you simply
>> not increment the refcount in this case?
>
> Your library has interest in the lifetime of the cancellation object.
> So there's no reason for you to care how the application manages its
> lifetime.
So I would end up with a very simple cancel object:
struct cancel_t {
int cancelled;
}
And either declare the cancelled flag as volatile, or extend the
structure with a mutex and lock/unlock everytime I access the flag. The
reference counting (or object lifetime in general) would be left to the
application code.
>>> Okay, so we're not talking about the reference count, we're talking
>>> about the 'is_cancelled' flag? IMO, a volatile boolean is sufficient
>>> for this purpose, however it is technically a platform-specific
>>> optimization.
>
>> Are there any architectures or compilers where a volatile wouldn't work?
>
> It depends what "work" means. There is no specific guarantee for how
> quickly it will be noticed. Technically, you should consider this a
> platform-specific optimization. It's your call, it will work either
> way. You can also acquire a mutex while you check -- trust me, the
> mutex won't be a bottleneck unless you check it ridiculously often.
Define "work" as it should never cause any unexpected behavior, such as
crashing the application, corrupting memory, etc. It's no problem if a
change by another thread is not noticed immediately. My cancellation
support is not immediately anyway. But the change should be noticed in a
reasonable time, from an end-user point of view (e.g. when a user click
cancel, it's no problem if the operation last for one more second, but
it shouldn't take ages).
Yes; NUMA systems that do not honor cache-coherency wrt inter-node
communication.
>> It depends what "work" means. There is no specific guarantee for how
>> quickly it will be noticed. Technically, you should consider this a
>> platform-specific optimization. It's your call, it will work either
>> way. You can also acquire a mutex while you check -- trust me, the
>> mutex won't be a bottleneck unless you check it ridiculously often.
>
> Define "work" as it should never cause any unexpected behavior, such as
> crashing the application, corrupting memory, etc. It's no problem if a
> change by another thread is not noticed immediately. My cancellation
> support is not immediately anyway. But the change should be noticed in a
> reasonable time, from an end-user point of view (e.g. when a user click
> cancel, it's no problem if the operation last for one more second, but it
> shouldn't take ages).
Read this whole thread:
http://groups.google.com/group/comp.arch/browse_thread/thread/df6f520f7af13ea5
You can say the API's are reentrant such that two threads A and B cannot
call into the API using the same `device' object. However, two threads A and
B can call into the API if they are using different `device' objects.
>> Here is the rule that MUST be followed:
>> An application thread can only increment the reference count IF it
>> already owns a previous reference!!!
>
> Sounds very reasonable.
>
>> Also, remember to document that a `device' object does not take ownership
>> of a `cancel' object in any way shape or form. Its the applications
>> responsibility to maintain a correct reference count for a `device'
>> object!
>
> Is there a reason for not taking ownership by the device object? I think
> this would prevent errors, because it will be quite easy for an
> application to forget to keep a reference to the cancel object.
David already answered. BTW, I think you should get rid of the reference
count from the cancel object as it would remove the need for your library to
contain any synchronization primitives. Also, it would allow for an
application to choose exactly what type of reference counting algorithm to
use. With a cancel object like:
struct cancel {
atomic_word volatile cancelled;
};
I could eaisly use atomic counting:
struct my_cancel {
struct cancel cancel;
atomic_word refs;
};
Or use a custom mutex:
struct my_cancel {
struct cancel cancel;
custom_mutex_t mutex;
int refs;
};
Whatever.
As for synchronization for the `cancel::cancelled' member, I think volatile
should be sufficient unless you expect your devices to be used on non-cache
coherent systems. Then you would need to use a standard thread API, or arch
specific instructions. If you do not want your library to contain any
dependency to a threading API, then volatile may be the way to go. Also,
checking a volatile should be as friendly as you can get wrt your strict
timing issues. Although, using a mutex should not screw up your timing
requirements either because its never really going to ever be contended
with. Unless a really crappy application likes setting the cancelled state
over and over again in a tight loop.