I am working on file processing (a large number of files), which is a
totally independent task for each file. I want to speed up processing
using Java threads, but I do not want to overwhelm my processor (a Core 2 Duo)
with all the threads at the same time.
I want to run 2 threads in parallel, wait until they
finish, then run the next 2 threads. Furthermore, each thread allocates
a large amount of memory, so I need to de-allocate the thread after
collecting the result from it.
I would appreciate it if anyone could direct me to a similar example,
or give me guidelines and help.
Thanks,
--
Motaz K. Saad
As far as the memory issues go, you need to leave memory management up to
the JVM. If you allocate a large amount of memory, but then no longer
have any way to reach the memory, the JVM will free the memory if and when
needed, if not before. "De-allocating" the thread (whatever it is you
mean by that) will have no effect at all.
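To illustrate the point, here is a minimal sketch (not from the original thread) in which each task allocates a large working buffer as a local variable and hands back only a small result. The class FileSummaryTask, the 16 MB buffer size and the byte-summing "processing" are hypothetical placeholders. Once call() returns, nothing refers to the buffer any more, so the collector may reclaim it whenever it needs the space; there is no thread to "de-allocate".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical task: the large working buffer exists only inside call().
public class FileSummaryTask implements Callable<Long> {
    private static final int BUFFER_SIZE = 16 * 1024 * 1024; // assumed per-file working memory
    private final long seed; // stand-in for "which file to process"

    public FileSummaryTask(long seed) {
        this.seed = seed;
    }

    public Long call() {
        byte[] buffer = new byte[BUFFER_SIZE]; // large, task-local allocation
        long sum = 0;
        // Placeholder for real processing: touch the buffer and compute a small summary.
        for (int i = 0; i < buffer.length; i += 4096) {
            buffer[i] = (byte) (seed + i);
            sum += buffer[i];
        }
        return sum; // only this small value escapes; the buffer is now unreachable
    }

    // Runs the tasks two at a time and adds up their results.
    public static long runAll(int tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Long>> futures = new ArrayList<Future<Long>>();
            for (int t = 0; t < tasks; t++) {
                futures.add(pool.submit(new FileSummaryTask(t)));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // collecting a result keeps only a Long alive
            }
            return total;
        } finally {
            pool.shutdown(); // lets the two worker threads exit once the queue is empty
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(8));
    }
}
```

At most two buffers are reachable at any moment, because only two tasks run at a time; the buffers of finished tasks are garbage even while later tasks are still queued.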
As far as the threading goes, your proposed solution is inefficient.
Rather than creating pairs of threads, waiting for them to run to
completion, and then letting them exit and creating another pair, you
should simply create as many threads initially as you want to have
running, and then let them pull tasks to execute from a shared,
synchronized queue.
In that way, you avoid having threads constantly being created and
destroyed over and over, and you also ensure that both threads are always
busy until there is no more work.
You can either implement the above explicitly, or use a
java.util.concurrent.ThreadPoolExecutor to manage the threads and tasks
for you.
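Using ThreadPoolExecutor directly, the "create the threads once and let them pull tasks from a shared queue" design might look like the following sketch. TwoWorkerPool, runJobs and the 5 ms sleep standing in for real file processing are hypothetical; the constructor arguments shown are the same ones Executors.newFixedThreadPool(2) uses.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class TwoWorkerPool {
    // Runs 'jobCount' independent jobs on two long-lived worker threads and
    // returns the largest number of jobs that were ever running at the same time.
    public static int runJobs(int jobCount) throws InterruptedException {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                2, 2,                                  // core and maximum size: exactly two threads
                0L, TimeUnit.MILLISECONDS,             // keep-alive; unused when core == max
                new LinkedBlockingQueue<Runnable>());  // shared, thread-safe, unbounded job queue
        final AtomicInteger running = new AtomicInteger();
        final AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < jobCount; i++) {
            executor.execute(new Runnable() {
                public void run() {
                    int now = running.incrementAndGet();
                    // Record the highest concurrency observed so far.
                    int seen = peak.get();
                    while (now > seen && !peak.compareAndSet(seen, now)) {
                        seen = peak.get();
                    }
                    try {
                        Thread.sleep(5); // hypothetical stand-in for processing one file
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                }
            });
        }
        executor.shutdown();                           // no new jobs; queued jobs still run
        executor.awaitTermination(1, TimeUnit.HOURS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak concurrency: " + runJobs(20));
    }
}
```

runJobs reports the peak number of jobs seen running together, which should never exceed two no matter how many jobs are queued.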
Pete
> In article
> <d471ab5a-5e18-4626...@d32g2000yqh.googlegroups.com>,
> "Motaz K. Saad" <motaz...@gmail.com> wrote:
>
> > I am working on file processing (a large number of files), which is a
> > totally independent task for each file. I want to speed up processing
> > using Java threads, but I do not want to overwhelm my processor (a Core 2
> > Duo) with all the threads at the same time.
> >
> > I want to run 2 threads in parallel, wait until they
> > finish, then run the next 2 threads. Furthermore, each thread allocates
> > a large amount of memory, so I need to de-allocate the thread after
> > collecting the result from it.
> >
> > I would appreciate it if anyone could direct me to a similar example,
> > or give me guidelines and help.
>
> A BlockingQueue with one producer and two consumers might do:
>
> <http://java.sun.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html>
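A minimal sketch of the one-producer, two-consumer arrangement with a BlockingQueue, assuming plain threads and a "poison pill" sentinel that tells each consumer to stop. ProducerConsumerDemo, the queue capacity of 10 and the string "processing" are illustrative assumptions, not part of the linked API documentation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerDemo {
    // Sentinel ("poison pill"): a distinct object, compared by identity, meaning "no more work".
    private static final String POISON = new String("POISON");

    public static List<String> process(List<String> fileNames, int consumers)
            throws InterruptedException {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(10); // bounded buffer
        final List<String> results = Collections.synchronizedList(new ArrayList<String>());
        List<Thread> workers = new ArrayList<Thread>();
        for (int c = 0; c < consumers; c++) {
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    try {
                        while (true) {
                            String name = queue.take(); // blocks until work arrives
                            if (name == POISON) {
                                return;                 // the producer is finished
                            }
                            results.add("processed " + name); // placeholder for real work
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        // The producer (the calling thread) fills the queue; put() blocks while it is full.
        for (String name : fileNames) {
            queue.put(name);
        }
        for (int c = 0; c < consumers; c++) {
            queue.put(POISON); // one pill per consumer
        }
        for (Thread worker : workers) {
            worker.join(); // join() also makes the consumers' results visible here
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> names = new ArrayList<String>();
        for (int i = 0; i < 25; i++) {
            names.add("file" + i + ".txt");
        }
        System.out.println(process(names, 2).size() + " files processed"); // prints "25 files processed"
    }
}
```

The bounded ArrayBlockingQueue also limits how far the producer can run ahead of the consumers, which may matter if each queued item carries a lot of memory.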
Also, please do not multi-post.
[Followup-To: comp.lang.java.programmer]
--
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>
> I am working on file processing (a large number of files), which is a totally
> independent task for each file. I want to speed up processing using Java
> threads, but I do not want to overwhelm my processor (a Core 2 Duo)
> with all the threads at the same time.
You'd have to have an awful lot of threads to 'overwhelm' the processor -
hundreds, or perhaps even thousands. There is a chance that it might
overwhelm the disk, though, depending on things like how big your files
are and how the threads access them.
However, running your jobs in 1000 threads won't be any faster than
running them in 10 (although 10 might be faster than 2, because it lets
you saturate the disk: two threads on two CPUs will leave either the disk
or a CPU underutilised at some point, unless readahead and GC keep them
busy), so your conclusion is right: you want to use fewer threads than you
have tasks.
> I want to run 2 threads in parallel, wait until they
> finish, then run the next 2 threads.
No, you want to run two tasks in parallel, wait until they finish, then
run the next two tasks. You don't need one thread per task.
> Furthermore, each thread allocates a large amount of memory, so I need to
> de-allocate the thread after collecting the result from it.
As long as there are no references to the allocated objects after the file
is processed, the garbage collector will reclaim them.
> I would appreciate it if anyone could direct me to a similar example, or
> give me guidelines and help.
You want:
http://java.sun.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html
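Your code would look something like this:

// FileProcessingTask.java
import java.io.File;

public class FileProcessingTask implements Runnable {
    private final File file;

    public FileProcessingTask(File file) {
        this.file = file;
    }

    public void run() {
        try {
            // process file
        }
        catch (Exception e) {
            // log the exception
        }
        finally {
            // close any open files
        }
    }
}

// FileProcessingApp.java
import java.io.File;
import java.util.ArrayList;
import java.util.Collection;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FileProcessingApp {
    public static void main(String... args) throws InterruptedException {
        Collection<File> filesToProcess = new ArrayList<File>();
        for (String name : args) { // initialise this however you like
            filesToProcess.add(new File(name));
        }
        int numThreads = Runtime.getRuntime().availableProcessors() * 2;
        ExecutorService executor = Executors.newFixedThreadPool(numThreads);
        for (File file : filesToProcess) {
            executor.execute(new FileProcessingTask(file));
        }
        // Accept no new tasks; without this, awaitTermination effectively never returns.
        executor.shutdown();
        executor.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
    }
}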
tom
--
Brace yourself for an engulfing, cowardly autotroph! I want your
photosynthetic apparatii!
Thinking about "threads" is probably not what you want. You probably
want to think about "Parallelizable Tasks". Look into the
java.util.concurrent classes. Especially the Executors class. You can
create a fixed thread pool executor.
I suggest creating more threads than cores if your tasks are mostly
I/O bound.
Hope this helps,
Daniel.
--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
Depending on exactly what you are doing, it seems to me that running only
two threads at a time will not maximize throughput. They could be
spending most of their time waiting for disk I/O. On modern
processors, one thread per core is generally not optimal; I seem to recall
hearing somewhere that the right number is about 1.5-2 per core.
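One commonly cited heuristic for sizing a pool (from Goetz et al., Java Concurrency in Practice) is threads = cores × target CPU utilisation × (1 + wait time / compute time). A sketch of it in Java; PoolSizing is a hypothetical name, and the wait and compute times are assumptions you would have to measure for your own files.

```java
public class PoolSizing {
    // Heuristic pool size: cores * targetUtilisation * (1 + waitTime / computeTime),
    // rounded to the nearest whole thread and never less than one.
    public static int poolSize(int cores, double targetUtilisation,
                               double waitMillis, double computeMillis) {
        double n = cores * targetUtilisation * (1.0 + waitMillis / computeMillis);
        return Math.max(1, (int) Math.round(n));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Assumed measurements: each file needs 5 ms of disk wait per 10 ms of CPU work.
        System.out.println(poolSize(cores, 1.0, 5, 10));
    }
}
```

With 2 cores, full utilisation and 5 ms of waiting per 10 ms of computation, this gives 3 threads, i.e. 1.5 per core; equal wait and compute times give 4 threads, i.e. 2 per core. It is a starting point for measurement, not an answer.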
> I want to run 2 threads in parallel, wait until they
> finish, then run the next 2 threads. Furthermore, each thread allocates
> a large amount of memory, so I need to de-allocate the thread after
> collecting the result from it.
As long as you watch who holds references to what, the GC will clean up
memory by itself. I think it would generally be sufficient to make sure
that you don't keep references to the threads longer than necessary; good
programming practices should keep any leaks contained.
> I would appreciate it if anyone could direct me to a similar example,
> or give me guidelines and help.
The first thing I'd recommend is getting a grasp of how to do
concurrent programming. If your tasks do not communicate with each other
or with the main program, you probably don't have any thread-safety issues.
The best way to actually implement this is probably with the ExecutorService
API, new in Java 5, as others have stated.
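If the tasks do need to report back to the main program, returning values rather than sharing mutable state keeps things simple. A minimal sketch using ExecutorCompletionService, which hands results to the main thread in the order the tasks finish; CollectResults, totalLength and the use of name lengths as "results" are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CollectResults {
    // Submits one task per name and sums their results as they complete.
    public static long totalLength(String[] names, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CompletionService<Integer> done = new ExecutorCompletionService<Integer>(pool);
        try {
            for (final String name : names) {
                done.submit(new Callable<Integer>() {
                    public Integer call() {
                        return name.length(); // placeholder for processing one file
                    }
                });
            }
            long total = 0;
            for (int i = 0; i < names.length; i++) {
                total += done.take().get(); // next finished result, in completion order
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(totalLength(new String[] {"a.txt", "bb.txt", "ccc.txt"}, 2)); // prints 18
    }
}
```

The only data crossing threads is each task's return value, which the CompletionService delivers safely, so no explicit locking is needed.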
--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
>
>However, running your jobs in 1000 threads won't be any faster than
>running them in 10 (although that might be faster than running them in 2,
The main case where a large number of threads buys you something is web
scraping, where your threads mostly sit waiting for socket I/O to
complete.
--
Roedy Green Canadian Mind Products
http://mindprod.com
http://thecovemovie.com : The Cove: a documentary about Japan's secret atrocities against dolphins.
By the way, my tests with multiple threads show no measurable difference
between 10 and 2 threads on a dual-core machine.
Christian
> Roedy Green wrote:
>> On Mon, 17 Aug 2009 19:16:06 +0100, Tom Anderson
>> <tw...@urchin.earth.li> wrote, quoted or indirectly quoted someone who
>> said :
>>
>>> However, running your jobs in 1000 threads won't be any faster than
>>> running them in 10 (although that might be faster than running them in
>>> 2,
>>
>> The main time large numbers of threads buys you something is web
>> scraping. Your threads are mostly sitting waiting for socket i/o to
>> complete.
>
> Isn't that what NIO is for, so that you need only one thread to read, which
> hands off the processing to an ExecutorService?
That's fine if the processing is "read all data, then crunch all data".
But if it's more complicated than that, the code might be simpler if the
reading and processing are encapsulated in the same task.
NIO also isn't as efficient, at least for sockets. Assuming the same is
true for file I/O, that's another reason to use threads (judiciously, of
course).
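A sketch of the "read all data, then crunch all data" split: one thread does all the reading and a pool does the CPU work. It assumes Java 7 or later for java.nio.file.Files; ReadThenCrunch and the newline counting are hypothetical. For local files, java.nio offers no selector-based non-blocking reads (FileChannel is not a SelectableChannel), so the single reader here simply does blocking reads, one file at a time.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ReadThenCrunch {
    // The calling thread does all the reading; the pool only does CPU work.
    public static long countNewlines(List<Path> files, int threads)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> pending = new ArrayList<Future<Long>>();
            for (Path file : files) {
                final byte[] data = Files.readAllBytes(file); // blocking read on one thread
                pending.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        long n = 0;
                        for (byte b : data) {
                            if (b == '\n') n++; // placeholder "crunching"
                        }
                        return n;
                    }
                }));
            }
            long total = 0;
            for (Future<Long> f : pending) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Path> files = new ArrayList<Path>();
        for (int i = 0; i < 3; i++) {
            Path p = Files.createTempFile("demo", ".txt");
            Files.write(p, "one\ntwo\n".getBytes(StandardCharsets.UTF_8));
            p.toFile().deleteOnExit();
            files.add(p);
        }
        System.out.println(countNewlines(files, 2)); // prints 6 (3 files x 2 lines)
    }
}
```

Because each file's bytes are held until its task runs, this design can use much more memory than reading inside each task when the pool falls behind; a bounded hand-off queue would cap that.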
> By the way, my tests with multiple threads show no measurable difference
> between 10 and 2 threads on a dual-core machine.
And what was the test? Not all benchmarks are equal, and they may or may
not apply to the specific solution being implemented. Threading in
particular is sensitive to both hardware and software configuration:
a large number of threads might be fine on one computer and suffer
significant overhead on another.
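A crude timing harness along the lines of Christian's test might look like this. PoolSizeTiming and its purely CPU-bound work() are hypothetical, and there is only a rough warm-up and no repetition, so the numbers say something only about this workload on your machine. In particular, a CPU-only loop says nothing about disk-bound file processing, which is exactly the point about benchmarks above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolSizeTiming {
    // CPU-bound stand-in for "processing one file"; returns a value so the JIT cannot drop it.
    static long work(int seed) {
        long x = seed;
        for (int i = 0; i < 2000000; i++) {
            x = x * 6364136223846793005L + 1442695040888963407L;
        }
        return x;
    }

    // Runs 'tasks' jobs on a pool of 'threads' threads; returns an XOR checksum of all results.
    public static long run(int threads, int tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<Future<Long>>();
            for (int t = 0; t < tasks; t++) {
                final int seed = t;
                futures.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        return work(seed);
                    }
                }));
            }
            long checksum = 0;
            for (Future<Long> f : futures) {
                checksum ^= f.get();
            }
            return checksum;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        run(2, 20); // rough warm-up so work() is likely compiled before timing starts
        for (int threads : new int[] {1, 2, 10}) {
            long start = System.nanoTime();
            long checksum = run(threads, 40);
            long millis = (System.nanoTime() - start) / 1000000;
            System.out.println(threads + " threads: " + millis + " ms (checksum " + checksum + ")");
        }
    }
}
```

The checksum is the same for every pool size, which confirms each run did the same work; only the timings should differ.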
Pete
>Isn't that what NIO is for, so that you need only one thread to read, which
>hands off the processing to an ExecutorService?
Yes. I've been told that is a less flexible, more efficient way to do
it.
So I assume that if you really have no interaction with the OS and
nothing that could block you, then number of threads = number of CPU
cores should be fine, even perfect.
> So I assume that if you really have no interaction with the OS and
> nothing that could block you, then number of threads = number of CPU
> cores should be fine, even perfect.
Close.
If you're making garbage, you actually want slightly fewer threads than
cores, because the collector will need some CPU time every so often. If
you have 4 cores, you want to run about 3.8 threads. Of course, this is
not possible in practice.
Also, bear in mind that 'no interaction with the OS' includes 'not using
any virtual memory' (in the sense of 'memory which is paged out') - a
thread which accesses a memory location which is paged out will trigger
disk I/O, which will cause it to block. If your Java process fits
entirely in physical memory, this is not an issue.
tom
--
Information is not knowledge. -- Albert Einstein