
Java threads for Core 2 Duo processors


Motaz K. Saad

Aug 17, 2009, 12:56:37 PM
Hello,

I am processing a large number of files, and the work on each file is
completely independent. I want to speed up the processing with Java
threads, but I do not want to overwhelm my processor (a Core 2 Duo) by
running all the threads at the same time.

I want to run 2 threads in parallel, wait until they finish, and then
run the next 2 threads. Furthermore, each thread allocates a large
amount of memory, so I need to de-allocate the thread after collecting
the result from it.

I would appreciate it if anyone could direct me to a similar example,
or offer guidelines and help.


Thanks,
--
Motaz K. Saad

Peter Duniho

Aug 17, 2009, 1:07:34 PM
On Mon, 17 Aug 2009 09:56:37 -0700, Motaz K. Saad <motaz...@gmail.com>
wrote:

As far as the memory issues go, you need to leave memory management up to
the JVM. If you allocate a large amount of memory, but then no longer
have any way to reach the memory, the JVM will free the memory if and when
needed, if not before. "De-allocating" the thread (whatever it is you
mean by that) will have no effect at all.

As far as the threading goes, your proposed solution is inefficient.
Rather than creating pairs of threads, waiting for them to run to
completion, and then letting them exit and creating another pair, you
should simply create as many threads initially as you want to have
running, and then let them pull tasks to execute from a shared,
synchronized queue.

In that way, you avoid having threads constantly being created and
destroyed over and over, and you also ensure that both threads are always
busy until there is no more work.

You can either implement the above explicitly, or use a
java.util.concurrent.ThreadPoolExecutor to manage the threads and tasks
for you.
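
A minimal sketch of the second option, assuming a pool of exactly two
threads. The class name TwoWorkerPool, the method processAll, and the task
body (which only counts completed tasks) are hypothetical stand-ins for
real file processing, not code from this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class, for illustration only.
final class TwoWorkerPool {

    // Runs one task per name on a pool of two threads and returns
    // how many tasks completed.
    static int processAll(List<String> names) throws InterruptedException {
        final AtomicInteger done = new AtomicInteger();
        // Core size == maximum size == 2, so at most two tasks run at once;
        // the unbounded queue holds the tasks that are still waiting.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
        for (final String name : names) {
            pool.execute(new Runnable() {
                public void run() {
                    // Real code would process the file called `name` here.
                    done.incrementAndGet();
                }
            });
        }
        pool.shutdown();                          // accept no further tasks
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for queued tasks
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(processAll(Arrays.asList(args)) + " tasks completed");
    }
}
```

The same two threads are reused for every task, so nothing is created or
destroyed per file, and neither thread sits idle while work remains.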

Pete

John B. Matthews

Aug 17, 2009, 2:06:29 PM
In article <nospam-EB4B8C....@news.aioe.org>,
"John B. Matthews" <nos...@nospam.invalid> wrote:

> In article
> <d471ab5a-5e18-4626...@d32g2000yqh.googlegroups.com>,
> "Motaz K. Saad" <motaz...@gmail.com> wrote:
>
> > I am working on file processing (large number of files) which is
> > totally independent task for each file. I want to speedup processing
> > using java thread but I do not want overwhelm my processor (core 2
> > due processor) with all threads at the same time.
> >
> > I want to run 2 parallel threads simultaneously and wait until they
> > finish then run the next 2 threads. furthermore, each thread allocate
> > large amount of memory so I need to de-allocate the thread after
> > collecting the result from it.
> >
> > I would appreciate if anyone direct my to similar set example
> > I would appreciate guideline and help
>

> A BlockingQueue with one producer and two consumers might do:
>
> <http://java.sun.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html>
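
One way the producer/consumer arrangement could look, as a sketch only.
The class QueueWorkers, the method consumeAll, and the STOP sentinel are
hypothetical names; the "work" is just a counter, and the calling thread
plays the role of the single producer.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class, for illustration only.
final class QueueWorkers {

    // Sentinel that tells a consumer to stop. It is compared by identity,
    // so a real item that happens to read "STOP" is not mistaken for it.
    private static final String STOP = new String("STOP");

    // Feeds every name through the queue to the given number of consumer
    // threads and returns how many items were processed.
    static int consumeAll(List<String> names, int consumers)
            throws InterruptedException {
        final BlockingQueue<String> queue = new LinkedBlockingQueue<String>();
        final AtomicInteger processed = new AtomicInteger();
        Thread[] workers = new Thread[consumers];
        for (int i = 0; i < consumers; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        // take() blocks until an item is available.
                        for (String name = queue.take(); name != STOP;
                                name = queue.take()) {
                            processed.incrementAndGet(); // placeholder work
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            workers[i].start();
        }
        for (String name : names) {
            queue.put(name);
        }
        for (int i = 0; i < consumers; i++) {
            queue.put(STOP); // one sentinel per consumer
        }
        for (Thread t : workers) {
            t.join();
        }
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(consumeAll(java.util.Arrays.asList(args), 2) + " items processed");
    }
}
```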

Also, please do not multi-post.

[Followup-To: comp.lang.java.programmer]

--
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>

Tom Anderson

Aug 17, 2009, 2:16:06 PM
On Mon, 17 Aug 2009, Motaz K. Saad wrote:

> I am working on file processing (large number of files) which is totally
> independent task for each file. I want to speedup processing using java
> thread but I do not want overwhelm my processor (core 2 due processor)
> with all threads at the same time.

You'd have to have an awful lot of threads to 'overwhelm' the processor -
hundreds, or perhaps even thousands. There is a chance that it might
overwhelm the disk, though, depending on things like how big your files
are and how the threads access them.

However, running your jobs in 1000 threads won't be any faster than
running them in 10 (although that might be faster than running them in 2,
because it lets you saturate the disk - two threads on two CPUs will leave
either the disk or CPU underutilised at some point, unless readahead and
GC keep them busy), so your conclusion is right - you want to use fewer
threads than you have tasks.

> I want to run 2 parallel threads simultaneously and wait until they
> finish then run the next 2 threads.

No, you want to run two parallel tasks simultaneously and wait until they
finish then run the next two tasks. You don't need to have one thread per
task.

> furthermore, each thread allocate large amount of memory so I need to
> de-allocate the thread after collecting the result from it.

As long as there are no references to the allocated objects after the file
is processed, the garbage collector will reclaim them.

> I would appreciate if anyone direct my to similar set example I would
> appreciate guideline and help

You want:

http://java.sun.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html

Your code looks like:

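// FileProcessingTask.java
import java.io.File;

public class FileProcessingTask implements Runnable {
    private final File file;

    public FileProcessingTask(File file) {
        this.file = file;
    }

    public void run() {
        try {
            // process file
        }
        catch (Exception e) {
            // log the exception
        }
        finally {
            // close any open files
        }
    }
}

// FileProcessingApp.java
import java.io.File;
import java.util.ArrayList;
import java.util.Collection;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FileProcessingApp {
    public static void main(String... args) throws InterruptedException {
        // initialise this however you like; here, one file per argument
        Collection<File> filesToProcess = new ArrayList<File>();
        for (String name: args) {
            filesToProcess.add(new File(name));
        }
        int numThreads = Runtime.getRuntime().availableProcessors() * 2;
        ExecutorService executor = Executors.newFixedThreadPool(numThreads);
        for (File file: filesToProcess) {
            executor.execute(new FileProcessingTask(file));
        }
        // stop accepting tasks; otherwise awaitTermination just waits out its timeout
        executor.shutdown();
        executor.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
    }
}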

tom

--
Brace yourself for an engulfing, cowardly autotroph! I want your
photosynthetic apparatii!

Daniel Pitts

Aug 17, 2009, 2:44:44 PM
Motaz K. Saad wrote:
> Hello,
>
> I am working on file processing (large number of files) which is
> totally independent task for each file. I want to speedup processing
> using java thread but I do not want overwhelm my processor (core 2 due
> processor) with all threads at the same time.
>
> I want to run 2 parallel threads simultaneously and wait until they
> finish then run the next 2 threads. furthermore, each thread allocate
> large amount of memory so I need to de-allocate the thread after
> collecting the result from it.
>
> I would appreciate if anyone direct my to similar set example
> I would appreciate guideline and help
>
>
> Thanks,
> --
> Motaz K. Saad
<soapbox>Please don't multi-post, cross-post instead.</soapbox>

Thinking about "threads" is probably not what you want. You probably
want to think about "Parallelizable Tasks". Look into the
java.util.concurrent classes. Especially the Executors class. You can
create a fixed thread pool executor.

I suggest creating more threads than cores if your tasks are mostly
I/O bound.

Hope this helps,
Daniel.

--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>

Joshua Cranmer

Aug 17, 2009, 3:52:58 PM
Motaz K. Saad wrote:
> I am working on file processing (large number of files) which is
> totally independent task for each file. I want to speedup processing
> using java thread but I do not want overwhelm my processor (core 2 due
> processor) with all threads at the same time.

Depending on exactly what you are doing, it seems to me that running two
threads at a time may not maximize throughput. They could be spending
most of their time waiting around for disk I/O. On modern processors,
one thread per core is generally not optimal; ISTR hearing somewhere
that the correct number is about 1.5-2 per core.
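
As a rough illustration of that rule of thumb, the pool size can be derived
from the core count instead of being hard-coded. The class PoolSizing, the
method poolSize, and the threadsPerCore parameter are hypothetical names,
and the factor itself is a tuning guess that should be measured, not a
fixed law.

```java
// Hypothetical example class, for illustration only.
final class PoolSizing {

    // threadsPerCore is a tuning guess: around 1.0 for purely CPU-bound
    // work, perhaps 1.5 to 2.0 when tasks often wait on disk I/O.
    // The result is never smaller than 1.
    static int poolSize(int cores, double threadsPerCore) {
        return Math.max(1, (int) Math.round(cores * threadsPerCore));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores: " + cores
                + ", pool size at 1.5 per core: " + poolSize(cores, 1.5));
    }
}
```

The value returned could be passed straight to
Executors.newFixedThreadPool.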

> I want to run 2 parallel threads simultaneously and wait until they
> finish then run the next 2 threads. furthermore, each thread allocate
> large amount of memory so I need to de-allocate the thread after
> collecting the result from it.

As long as you watch who holds references to what, the GC will clear
memory up itself. I think it would generally be sufficient to make sure
that you don't keep references to the threads unless necessary--good
programming practices will likely confine the leaks.
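
A sketch of how results might be collected without holding on to the large
working memory. It assumes each task can return a small summary value; the
class ResultCollector, the methods task and runAll, and the scratch buffer
are hypothetical and stand in for real file processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class, for illustration only.
final class ResultCollector {

    // Stand-in for real work: fills a large scratch buffer but returns
    // only a small summary (the sum of the buffer's contents).
    static Callable<Long> task(final int seed) {
        return new Callable<Long>() {
            public Long call() {
                int[] scratch = new int[1 << 20]; // large, but local to call()
                long sum = 0;
                for (int i = 0; i < scratch.length; i++) {
                    scratch[i] = seed;
                    sum += scratch[i];
                }
                // Nothing references scratch after this return, so the
                // garbage collector is free to reclaim it.
                return sum;
            }
        };
    }

    // Runs the given number of tasks on a fixed pool and returns their
    // results in submission order.
    static List<Long> runAll(int tasks, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<Future<Long>>();
            for (int i = 0; i < tasks; i++) {
                futures.add(pool.submit(task(i)));
            }
            List<Long> results = new ArrayList<Long>();
            for (Future<Long> f : futures) {
                results.add(f.get()); // blocks until that task has finished
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(4, 2));
    }
}
```

Only the Future objects and the small Long results stay reachable from the
caller; there is no thread to "de-allocate".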

> I would appreciate if anyone direct my to similar set example
> I would appreciate guideline and help

The first thing to recommend is that you get a grasp of how to do
concurrent programming. If your tasks do not communicate with each other
or with the main program, you probably don't have any thread-safety issues.

The best way to actually implement this is probably with the
ExecutorService API, new in Java 5, as others have stated.

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth

Roedy Green

Aug 17, 2009, 8:14:22 PM
On Mon, 17 Aug 2009 19:16:06 +0100, Tom Anderson
<tw...@urchin.earth.li> wrote, quoted or indirectly quoted someone who
said :

>
>However, running your jobs in 1000 threads won't be any faster than
>running them in 10 (although that might be faster than running them in 2,

The main time large numbers of threads buys you something is web
scraping. Your threads are mostly sitting waiting for socket i/o to
complete.
--
Roedy Green Canadian Mind Products
http://mindprod.com

http://thecovemovie.com : The Cove: a documentary about Japan's secret atrocities against dolphins.

Christian

Aug 18, 2009, 9:42:03 AM
Roedy Green wrote:

> On Mon, 17 Aug 2009 19:16:06 +0100, Tom Anderson
> <tw...@urchin.earth.li> wrote, quoted or indirectly quoted someone who
> said :
>
>> However, running your jobs in 1000 threads won't be any faster than
>> running them in 10 (although that might be faster than running them in 2,
>
> The main time large numbers of threads buys you something is web
> scraping. Your threads are mostly sitting waiting for socket i/o to
> complete.

Isn't that what NIO is for? With it you need only one thread to do the
reading, and that thread hands the processing off to an ExecutorService.

By the way, my tests with multiple threads show no measurable difference
between 10 threads and 2 threads on a dual-core machine.

Christian

Peter Duniho

Aug 18, 2009, 12:54:40 PM
On Tue, 18 Aug 2009 06:42:03 -0700, Christian <fake...@xyz.de> wrote:

> Roedy Green wrote:
>> On Mon, 17 Aug 2009 19:16:06 +0100, Tom Anderson
>> <tw...@urchin.earth.li> wrote, quoted or indirectly quoted someone who
>> said :
>>
>>> However, running your jobs in 1000 threads won't be any faster than
>>> running them in 10 (although that might be faster than running them in
>>> 2,
>>
>> The main time large numbers of threads buys you something is web
>> scraping. Your threads are mostly sitting waiting for socket i/o to
>> complete.
>
> isn't that what NIO is for so you need only one thread to read.. and he
> gives off the processing to an ExecutorService..

That's fine if the processing is "read all data, then crunch all data".
But if it's more complicated than that, it might be that the code is
simpler if the reading and processing is all encapsulated in the same task.

NIO also isn't as efficient, at least for sockets. Assuming the same is
true for file i/o, that's another reason to use threads (judiciously, of
course).

> my tests with Multiple threads by the way show that 10 or 2 threads on a
> dualcore machine are a not measureable difference..

And what was the test? Not all benchmarks are equal, and they may or may
not apply to a specific solution that would be implemented. Threading in
particular is sensitive to both hardware and software configuration, and
where a large number of threads might be fine on one computer, it might
suffer some significant overhead on another.
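
For anyone wanting to run such a comparison on their own machine, a crude
timing harness might look like the sketch below. Everything in it is an
assumption: the class PoolBenchmark, the spin workload (pure CPU, no I/O),
and the task counts. It is not the test Christian ran, and a purely
CPU-bound loop says nothing about disk-bound file processing.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class, for illustration only.
final class PoolBenchmark {

    // Accumulates task results so the work has an observable effect.
    static final AtomicLong SINK = new AtomicLong();

    // CPU-bound placeholder workload: sums 0 .. iterations-1.
    static long spin(int iterations) {
        long x = 0;
        for (int i = 0; i < iterations; i++) {
            x += i;
        }
        return x;
    }

    // Runs `tasks` copies of the workload on a pool of `threads` threads
    // and returns the elapsed wall-clock time in nanoseconds. The figure
    // includes the cost of starting the pool's threads.
    static long timeNanos(int threads, int tasks, final int iterations)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < tasks; t++) {
            pool.execute(new Runnable() {
                public void run() {
                    SINK.addAndGet(spin(iterations));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] sizes = {1, 2, 4, 10};
        for (int threads : sizes) {
            long ns = timeNanos(threads, 200, 5000000);
            System.out.println(threads + " threads: " + ns / 1000000 + " ms");
        }
    }
}
```

Results from a harness like this are only meaningful after several warm-up
runs, and only for workloads that resemble the placeholder.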

Pete

Roedy Green

Aug 20, 2009, 9:06:23 PM
On Tue, 18 Aug 2009 15:42:03 +0200, Christian <fake...@xyz.de> wrote,
quoted or indirectly quoted someone who said :

>isn't that what NIO is for so you need only one thread to read.. and he
>gives off the processing to an ExecutorService..

Yes. I've been told that is a less flexible, more efficient way to do
it.

Christian

Aug 21, 2009, 9:42:50 PM
Peter Duniho wrote:

The test was basically number crunching: one thread was there solely for
reading from disk and putting the data, in 64 KiB packets, into a
BlockingQueue. A varying number of worker threads pulled the packets off
the queue and hashed them, which was done to build a Merkle tree. The
hash function used was Tiger.
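
A loose sketch of that kind of setup, under several stated assumptions and
not Christian's actual code. SHA-256 stands in for Tiger, which is not
among the standard JDK MessageDigest algorithms. Chunks are submitted to an
ExecutorService (which queues them internally) instead of an explicit
BlockingQueue, so that the leaf hashes come back in file order. Building
the Merkle tree from the leaves is left out. The class ChunkHasher and its
methods are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class, for illustration only.
final class ChunkHasher {

    static final int CHUNK = 64 * 1024; // 64 KiB packets

    // The calling thread reads the stream chunk by chunk; pool threads
    // hash the chunks. Returns one hash per chunk, in stream order.
    // Note: every pending chunk stays in memory until it has been hashed.
    static List<byte[]> leafHashes(InputStream in, int workers)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<byte[]>> pending = new ArrayList<Future<byte[]>>();
            byte[] buf = new byte[CHUNK];
            int n;
            while ((n = fill(in, buf)) > 0) {
                final byte[] chunk = Arrays.copyOf(buf, n);
                pending.add(pool.submit(new Callable<byte[]>() {
                    public byte[] call() throws NoSuchAlgorithmException {
                        // SHA-256 is a stand-in for Tiger.
                        return MessageDigest.getInstance("SHA-256").digest(chunk);
                    }
                }));
            }
            List<byte[]> hashes = new ArrayList<byte[]>();
            for (Future<byte[]> f : pending) {
                hashes.add(f.get());
            }
            return hashes;
        } finally {
            pool.shutdown();
        }
    }

    // Reads until buf is full or the stream ends; returns the bytes read.
    private static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int r = in.read(buf, total, buf.length - total);
            if (r < 0) {
                break;
            }
            total += r;
        }
        return total;
    }
}
```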

So I assume that if you really have no interaction with the OS and
nothing that could block you, then number of threads = CPU cores
should be fine/perfect.

Tom Anderson

Aug 25, 2009, 10:22:01 AM
On Sat, 22 Aug 2009, Christian wrote:

> so I assume that if you really have no interaction with the os and
> nothing that could block you. Then number of Threads = CPU cores should
> be fine/perfect.

Close.

If you're making garbage, you actually want slightly fewer threads than
cores, because the collector will need some CPU time every so often. If
you have 4 cores, you want to run about 3.8 threads. Of course, this is
not possible in practice.

Also, bear in mind that 'no interaction with the OS' includes 'not using
any virtual memory' (in the sense of 'memory which is paged out') - a
thread which accesses a memory location which is paged out will trigger
disk IO, which will cause it to block. If your Java process fits
entirely in physical memory, this is not an issue.

tom

--
Information is not knowledge. -- Albert Einstein
