I think that would be difficult.
As soon as you use any packages for image conversion or estimation you have to assume that they use dynamic memory allocation.
The garbage collector of Julia is fast, but not suitable for hard real-time requirements. Implementing a garbage collector for hard real-time
applications is possible, but a lot of work and will probably not happen in the near future.
There was an issue on this topic that was closed as "won't fix":
https://github.com/JuliaLang/julia/issues/8543
If you are prepared to make your code not perform any heap allocations, I don't see a reason why there should be any issue. When I worked on a very first multi-threading version of Julia, I wrote exactly such functions that won't trigger the GC, since the latter was not thread safe. This can be hard work, but I would assume that it's at least no more work than implementing the application in C/C++ (assuming that you have some Julia experience).
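For illustration, a minimal sketch of such an allocation-free function (illustrative code, not Tobias's; it writes into a pre-allocated output array, and @allocated checks the steady state once a warm-up call has excluded compilation):

# In-place kernel: writes into a pre-allocated output, allocates nothing.
function scale!(out::Vector{Float32}, x::Vector{Float32}, c::Float32)
    @inbounds for i in 1:length(x)
        out[i] = c * x[i]
    end
    return out
end

x = rand(Float32, 1024)
out = similar(x)
scale!(out, x, 2.0f0)                          # first call compiles
println(@allocated scale!(out, x, 2.0f0))      # expect 0 in steady state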
So for now the best is to build a toy that is equivalent in processing time to the original and see for myself what I'm able to get.
We have many ideas, many theories due to the nature of the GC, so the best is to try.
Páll -> Thanks for the links.
Páll: don't worry about the project failing because of YOUUUUUU ;) In any case we wanted to try Julia and see if we could get help/tips from the community.
About the @nogc: I wonder if activating it will also prevent the core of Julia from being garbage collected? If so, for a long run it's a bad idea to disable it for too long.
For now the only options are C/C++ and Julia, sorry, no D or Lisp :) Why would you not recommend C for this kind of task?
And I said 1000 images/sec, but the camera may be able to go up to 10 000 images/sec, which leaves roughly 1 ms (down to 100 µs) per frame for the whole loop, so I think we can define it as hard real time.
Linus Torvalds, on his own Linux kernel (this may be outdated; there is a real-time kernel available now, but it's not the default, just read the fine print there):
"Can we make the whole kernel truly hard-RT? Sure, possible in theory. In practice? No way, José. It's just not mainline enough."
Note what he says about CPUs with caches (all modern CPUs, even some microcontrollers; those without wouldn't be fast enough anyway). Silicon Graphics had real-time I/O capabilities in their filesystem. And on Java:
"Learn why Java SE is a good choice for implementing real-time systems, especially those that are large, complex, and dynamic.
Published August 2014
[..]The use of Java SE APIs in the implementation of real-time systems is
most appropriate for soft real-time development. Using Java SE for hard
real-time development is also possible, but generally requires the use
of more specialized techniques such as the use of NoHeapRealtimeThread
abstractions, as described in the Real-Time Specification for Java (JSR 1), or the use of the somewhat simpler ManagedSchedulable
abstractions of the Safety Critical Java Technology specification (JSR 302).
[..]
Projects that can be implemented entirely by one or two developers in a year's time are more likely to be implemented in a less powerful language such as C or C++
[..]
As Chief Technology Officer over Java at Atego Systems—a mission- and safety-critical solutions provider—Dr. Kelvin Nilsen oversees the design and implementation of the Perc Ultra virtual machine and other Atego embedded and real-time oriented products. Prior to joining Atego, Dr. Nilsen served on the faculty of Iowa State University where he performed seminal research on real-time Java that led to the Perc family of virtual machine products."
Since it seems you have a good overview of this domain, I will give more details:
We are working in signal processing, especially image processing. The goal here is just the adaptive optics: we only want to stabilize the image, not produce the final image.
The consequence is that we will not store anything on the hard drive: we read an image, process it and destroy it. We stay in RAM all the time.
The processing is done using our own algorithms, so for now there is no need for any external library (and I don't see any reason for that to change).
First I would like to apologize: just after posting my answer I went to Wikipedia to look up the difference between soft and hard real time.
I should have done that before, so that you didn't have to spend more time explaining.
In the end I still don't know if I am hard real time or soft real time: the timing is given by the camera speed, and the processing should be done between the acquisition of two images.
We don't want to miss an image or delay the processing; I still need to clarify the consequences of a delay or of a missed image.
For now let's just say that we can miss some images, so we want soft real time.
I'm making a benchmark that should match the system in terms of complexity; these are my first remarks:
When you say that one allocation is unacceptable, I say it's shockingly true: in my case I had 2 allocations done by
A += 1, where A is an array,
and in 7 seconds I had 600k allocations.
Moral: in a closed loop you cannot accept a single allocation, so you have to write out all loops explicitly.
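For reference, a small sketch of the fix (an illustration, not the benchmark code): A += 1 lowers to A = A + 1, which builds a brand-new array on every execution, while the explicit loop updates the elements in place and allocates nothing.

A = zeros(Float32, 512)

# allocating: builds a new array and rebinds A to it
A = A .+ 1.0f0

# allocation-free: update each element in place
@inbounds for i in 1:length(A)
    A[i] += 1.0f0
end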
I have two problems now:
1/ Many times, the first run, which includes compilation, was the fastest, and then every other run was slower by a factor of 2.
2/ If I relaunch the main function (which is in a module) many times, some runs are very different (slower) from the previous ones.
About 1/: although I find it strange, I don't really care.
2/ is far more problematic: once the code is compiled I want it to behave the same no matter how many times it is launched.
I have some ideas why, but no certainty. What bothers me the most is that all the runs in the benchmark will be slower: it's not a temporary slowdown, the whole current benchmark will be slower.
If I launch it again it will be back to the best performance.
Thank you for the links, they are very interesting and I will keep them in mind.
Note: I disabled hyperthreading and overclocking, so it should not be the CPU doing funky things.
Hi John,
I am currently pursuing a similar effort. I got a GPIO pin on the BeagleBone Black embedded board toggling in hard real time and verified the jitter with an oscilloscope. For that, I used a vanilla Linux 4.4.11 kernel with the PREEMPT_RT patch applied. I also released an initial version of a Julia package that wraps the clock_nanosleep() and clock_gettime() functions from the POSIX real-time extensions. Please see this other thread:
https://groups.google.com/forum/#!topic/julia-users/0Vr2rCRwJY4
I tested that package both on an Intel-based laptop and on the BeagleBone Black. I give some of the relevant details below.
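As a taste of what the wrapper does, here is a rough sketch of calling clock_gettime() via ccall (an illustration, not the package code; the two 64-bit timespec fields and CLOCK_MONOTONic == 1 are assumptions valid on 64-bit Linux):

# timespec: time_t seconds plus long nanoseconds on 64-bit Linux
immutable TimeSpec          # `struct` in later Julia versions
    sec::Int64
    nsec::Int64
end

function monotonic_ns()
    ts = Ref(TimeSpec(0, 0))
    # CLOCK_MONOTONIC is 1 on Linux; the C call fills in ts
    rc = ccall(:clock_gettime, Cint, (Cint, Ref{TimeSpec}), 1, ts)
    rc == 0 || error("clock_gettime failed")
    return ts[].sec * 1_000_000_000 + ts[].nsec
end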
On Monday, June 6, 2016 at 5:41:29 AM UTC-4, John leger wrote:
Since it seems you have a good overview of this domain, I will give more details:
[...]
In the end I still don't know if I am hard real time or soft real time: the timing is given by the camera speed, and the processing should be done between the acquisition of two images.
We don't want to miss an image or delay the processing; I still need to clarify the consequences of a delay or of a missed image.
For now let's just say that we can miss some images, so we want soft real time.
The real-time performance you are after could be 95% hard real-time. See e.g. here: https://www.osadl.org/fileadmin/dam/rtlws/12/Brown.pdf
I'm making a benchmark that should match the system in terms of complexity; these are my first remarks:
When you say that one allocation is unacceptable, I say it's shockingly true: in my case I had 2 allocations done by
A += 1, where A is an array,
and in 7 seconds I had 600k allocations.
Moral: in a closed loop you cannot accept a single allocation, so you have to write out all loops explicitly.
Yes, try to completely avoid memory allocations while developing your own algorithms in Julia. Pre-allocations and in-place operations are your friends! The example script available in the POSIXClock package is one way to do this (https://github.com/ibadr/POSIXClock.jl/blob/master/examples/rt_histogram.jl). The real-time section of the code is marked by a ccall to mlockall() in order to cause immediate failure upon memory allocation in the real-time section. You can also use the --track-allocation option to hunt down memory allocations while developing your algorithm. See e.g. http://docs.julialang.org/en/release-0.4/manual/profile/#man-track-allocation
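To make that concrete, a hedged sketch of the mlockall() guard (constants assumed for Linux; rt_histogram.jl in the package is the authoritative example):

# Lock all current and future pages in RAM (Linux: MCL_CURRENT=1, MCL_FUTURE=2).
# With a locked-memory limit in place, fresh heap allocations in the
# real-time section then fail loudly instead of adding silent latency.
const MCL_CURRENT = Cint(1)
const MCL_FUTURE  = Cint(2)
rc = ccall(:mlockall, Cint, (Cint,), MCL_CURRENT | MCL_FUTURE)
rc == 0 || error("mlockall failed")

The allocation tracker itself is enabled from the shell, e.g. julia --track-allocation=user myscript.jl (myscript.jl being whatever you run), which writes a .mem file next to each source file when Julia exits.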
I used --track-allocation not so long ago and it is a good tool. For now I think I will rely on tracking allocations manually. I am a little afraid of using mlockall(): in soft or hard real time, crashing (failure) is not a good option for me...
About --track-allocation I have a question:
- function deflat(v::globalVar)
0 @simd for i in 1:v.len_sub
0 @inbounds v.sub_imagef[i] = v.flat[i]*v.image[i]
- end
-
0 @simd for i in 1:v.len_ref
0 @inbounds v.ref_imagef[i] = v.flat[i]*v.image[i]
- end
0 return
- end
-
- # get min max
- # apply norm_coef
- # MORE TO DO HERE
- function normalization(v::globalVar)
0 min::Float32 = Float32(4095)
0 max::Float32 = Float32(0)
0 tmp::Float32 = Float32(0)
0 norm_fact::Float32 = Float32(0)
0 norm_coef::Float32 = Float32(0)
- # find min max
0 @simd for i in 1:v.nb_mat
0 # Doing something with no allocs
0 end
0 end
0
1226415 # SAD[70] 16x16 of Ref_Image over Sub_Image[60]
- function correlation_SAD(v::globalVar)
0
- end
-
In the .mem output file I have this information: at the end of normalization I have no allocations, yet in front of the SAD comment, just before the empty correlation function, I have 1226415 allocations.
Logically these allocations happened in normalization, but why are they reported here, between two functions?
I have two problems now:
1/ Many times, the first run, which includes compilation, was the fastest, and then every other run was slower by a factor of 2.
2/ If I relaunch the main function (which is in a module) many times, some runs are very different (slower) from the previous ones.
[...]
Note: I disabled hyperthreading and overclocking, so it should not be the CPU doing funky things.
Regarding these two issues, I encountered similar ones. Are you running on an Intel-based computer? I had to do many tweaks to get to acceptable real-time performance with Intel processors. Many factors could be at play. As you said, you have to make sure hyper-threading is disabled and not overclock the processor. Also, monitor the kernel dmesg log for any errors or warnings regarding RT throttling or local_softirq_pending.
Additionally, I had to use the following options in the Linux command line (pass them from the bootloader):
intel_idle.max_cstate=0 processor.max_cstate=0 idle=poll
This goes together with removing the intel_powerclamp kernel module (sudo rmmod intel_powerclamp). Caution: be extremely careful with such a configuration, as it disables many power-saving features in the processor and can potentially overheat it. Keep an eye on the kernel dmesg log and try to monitor the CPU temperature.
I also found it useful to isolate one CPU core using the isolcpus=1 kernel command-line option and then set the affinity of the real-time Julia process to run on that isolated CPU (using the taskset command). This way, you can almost guarantee that the Linux kernel and all other user-space processes will not run on that isolated CPU, so it becomes wholly dedicated to running the real-time Julia process. I am planning to post more details to the POSIXClock package in the near future.
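Putting those pieces together, the launch can look something like this (an illustrative command line, not from my actual setup; chrt needs the appropriate privileges, and myscript.jl is a made-up name):
sudo taskset -c 1 chrt -f 80 julia myscript.jl
Here taskset -c 1 pins the process to the isolated core, and chrt -f 80 runs it under the SCHED_FIFO real-time scheduling policy.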
Best,
Islam
I have an Intel processor indeed, and thanks for all the tips. I will first try to isolate a CPU, then disable the Intel power options.
Again thanks a lot for all the help.
On Monday, May 30, 2016 at 8:19:34 PM UTC, Tobias Knopp wrote:
[...] This can be hard work but I would assume that it's at least no more work than implementing the application in C/C++ [...]
I would really like to know why the work is hard: is it getting rid of the allocations, or being sure there are no more hidden ones in your code? I would also like to know whether you can do the same as in the D language, that is, would it be possible to make a macro @nogc and mark functions in a similar way?
The @nogc macro was made a long time ago, I now see:
https://groups.google.com/forum/?fromgroups=#!searchin/julia-users/Suspending$20Garbage$20Collection$20for$20Performance...good$20idea$20or$20bad$20idea$3F/julia-users/6_XvoLBzN60/nkB30SwmdHQJ
I'm not saying disabling the GC is preferred, just that the macro to do it had already been done.
Karpinski has his own exception-handling variant a little down the thread, with "you really want to put a try-catch around it". I just changed that variant so it can be called recursively (and disabled the try-catch, as it was broken):
macro nogc(ex)
    quote
        #try
            local pref = gc_enable(false)   # remember whether GC was on, so nesting works
            local val = $(esc(ex))          # run the user's expression with GC off
        #finally
            gc_enable(pref)                 # restore the previous GC state
        #end
        val
    end
end
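A quick usage sketch (the buffer and loop are made-up examples; note gc_enable became GC.enable in later Julia): pre-allocate everything outside, then run the hot section with the collector off:

buf = zeros(Float32, 1024)        # all allocation happens up front
@nogc for i in 1:length(buf)      # GC stays disabled for the whole loop
    buf[i] = 2.0f0 * buf[i]
end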
On Monday, June 6, 2016 at 9:41:29 AM UTC, John leger wrote:
Since it seems you have a good overview of this domain, I will give more details:
We are working in signal processing, especially image processing. The goal here is just the adaptive optics: we only want to stabilize the image, not produce the final image.
The consequence is that we will not store anything on the hard drive: we read an image, process it and destroy it. We stay in RAM all the time.
The processing is done using our own algorithms, so for now there is no need for any external library (and I don't see any reason for that to change).
I completely misread/missed point 3) about the "deformable mirror"; I see now it's a down-to-earth project, literally.. :)
Still, glad to help, even if it doesn't get Julia into space. :)
[...] In the end I still don't know if I am hard real time or soft real time: the timing is given by the camera speed, and the processing should be done between the acquisition of two images.
From: https://en.wikipedia.org/wiki/Real-time_computing#Criteria_for_real-time_computing
- Hard – missing a deadline is a total system failure.
- Firm – infrequent deadline misses are tolerable, but may degrade the system's quality of service. The usefulness of a result is zero after its deadline.
- Soft – the usefulness of a result degrades after its deadline, thereby degrading the system's quality of service.
[Note also, real-time applies to doing stuff too early as well, not only to doing stuff too late. In some cases, say in games, that is not a [big] problem; getting a frame ready earlier isn't a big concern.]
Are you sure "the processing should be done between the acquisition of two images" is a strict requirement? I assume the "atmospheric turbulence" does not change extremely quickly, so you could tolerate some latency, with your calculation applying for at least a few/many frames afterwards; then your project seems not hard real-time at all. Maybe soft or firm, a category I had forgotten.
At least your timescale is much longer than the camera's time to capture each frame of a video?
You also said "1000 images/sec but the camera may be able to go up to 10 000 images/sec". I'm aware of very high-speed photography, such as capturing a picture of a bullet from a gun, or seeing light literally spreading across a room. Still, do you need that many frames per second for capturing video (that seems not to be your job) or for the correction? Did you mix up camera speed with exposure time? Ordinary cameras go up to 1/1000 s shutter speed, but might only take video at up to 30, 60 or say 120 fps.
> I like the definition of 95% hard real time; it suits my needs. Thanks for this good paper.
The term/title sounds like firm real-time..
We don't want to miss an image or delay the processing; I still need to clarify the consequences of a delay or of a missed image.
For now let's just say that we can miss some images, so we want soft real time.
You could store with each frame a) how long since the mirror was corrected, based on b) a measurement from how long ago. Also, can't you [easily] see from a picture whether the mirror is maladjusted? Does it then look blurred, with high-frequency content missing?
How many "mirrors" are adjusted, or points in the mirror[s]?
I'm making a benchmark that should match the system in terms of complexity; these are my first remarks:
When you say that one allocation is unacceptable, I say it's shockingly true: in my case I had 2 allocations done by
A += 1, where A is an array,
and in 7 seconds I had 600k allocations.
Moral: in a closed loop you cannot accept a single allocation, so you have to write out all loops explicitly.
I think you mean two (or even one) allocations are bad because they are in a loop, and that loop runs for each adjustment.
I meant that even just one allocation (per adjustment, or frame if you will) can be a problem. Well, not strictly: say there have been many in the past; then it's only the last one, the one that finally triggers a collection, that is the problem.
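One way to see this (a sketch, not code from the thread): time each iteration of a loop that makes one small allocation per "frame"; most iterations are quick, but the iteration on which the collector finally runs pays for all the garbage accumulated before it.

function worst_iteration_ms(n)
    worst = 0.0
    for i in 1:n
        t0 = time_ns()
        tmp = zeros(Float64, 1000)    # one small allocation per "frame"
        tmp[1] = Float64(i)           # keep the array briefly live
        dt = (time_ns() - t0) / 1e6   # iteration time in milliseconds
        worst = max(worst, dt)
    end
    return worst
end

println(worst_iteration_ms(10^6))     # the max is dominated by GC pauses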
[...]
Note: I disabled hyperthreading and overclocking, so it should not be the CPU doing funky things.
Keep at least possible thermal throttling in mind; the other guy, Islam, had something on that. I had my mind set on the coldness or hotness of space.. and radiation hardening.
--
Palli.