bittorrent for archer


P. Oscar Boykin

Aug 9, 2010, 10:22:54 AM
to acisp2...@googlegroups.com
I'm not a big bittorrent user, but I thought some of you might be, so
I thought I'd ask my question before doing any googling (forgive me).

It would be nice to be able to use bittorrent in the grid-appliance.
If there was an easy command-line bittorrent client, we could easily
distribute the torrent file, then have a script run the job by
starting a bittorrent client, downloading the input file, then running
the job.

This should alleviate the problem of the submit node having to
transfer a large file to all the worker nodes, and it should be pretty
easy to make an example recipe so people can do this for their jobs.
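
A minimal sketch of such a wrapper script, in Python. The function name
fetch_then_run and its arguments are made up for illustration, and the
sketch assumes the chosen client can be started from a command line and
exits once the download is complete, which not every client does by
default.

```python
import subprocess
import sys
from pathlib import Path


def fetch_then_run(fetch_cmd, input_path, job_cmd):
    """Download the input, check that it arrived, then run the job.

    fetch_cmd and job_cmd are argument lists for subprocess.run; what
    goes in fetch_cmd depends entirely on the BitTorrent client used.
    """
    # Step 1: start the download client and wait for it to exit.
    subprocess.run(fetch_cmd, check=True)

    # Step 2: refuse to start the job if the input file is missing.
    if not Path(input_path).is_file():
        raise FileNotFoundError(f"download finished but {input_path} is missing")

    # Step 3: run the job on the downloaded input and return its output.
    result = subprocess.run(job_cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

The job submitted to the workers would then be this wrapper plus the
small torrent file, rather than the large input itself.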

I have jobs that would benefit from this kind of thing: large inputs,
but very small outputs (basically big searches through unstructured
data that are unlikely to find many hits, if any).

Has anyone seen a how-to for accomplishing this?
--
P. Oscar Boykin                            http://boykin.acis.ufl.edu
Assistant Professor, Department of Electrical and Computer Engineering
University of Florida

Dreamcat4

Aug 9, 2010, 3:22:12 PM
to acisp2...@googlegroups.com
I would suggest:
Install Transmission, which has a command-line mode.
Learn Chef Solo to do the scripting part.


P. Oscar Boykin

Aug 9, 2010, 9:12:11 PM
to acisp2...@googlegroups.com

Looks like Transmission's client is just a front end that connects to another running daemon.  Did I misunderstand that?  The standard BitTorrent client also seems to ship with a command-line version.

It would be great to have a how-to on this kind of job on the wiki.

I saw this:

http://osdir.com/ml/distributed.condor.user/2006-08/msg00262.html

And I don't agree with the argument at all.  My machine seems to get hammered on submit, since I have to transfer maybe 100MB (or more) of input to all the clients, each of which will read all of the data.  So the submit node is quite a bottleneck.  The response is small, so there is no bottleneck there.  This seems like an ideal case for BitTorrent.

Can anyone think of why this might not be a good method to handle the job?
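
A rough back-of-the-envelope version of the bottleneck argument, as a
Python sketch. Only the 100 MB input size comes from this thread; the
worker count and the uplink speed are assumed numbers, and the swarm
figure is an idealized lower bound in which the submit node uploads
about one copy and the workers exchange the rest among themselves.

```python
def submit_node_upload_mb(input_mb, workers):
    """Data the submit node must send when it serves every worker directly."""
    return input_mb * workers


def transfer_seconds(total_mb, uplink_mbit_per_s):
    """Time to push total_mb through an uplink, ignoring protocol overhead."""
    return total_mb * 8 / uplink_mbit_per_s


input_mb = 100   # input size mentioned in the thread
workers = 50     # assumed number of worker nodes
uplink = 100     # assumed submit-node uplink in Mbit/s

# Direct transfer: one full copy per worker leaves the submit node.
direct_mb = submit_node_upload_mb(input_mb, workers)
print(direct_mb, "MB sent directly, about", transfer_seconds(direct_mb, uplink), "s")

# Idealized swarm: the submit node seeds roughly one copy in total.
swarm_mb = input_mb
print(swarm_mb, "MB sent by the seed, about", transfer_seconds(swarm_mb, uplink), "s")
```

With these assumed numbers the submit node sends 5000 MB directly
versus about 100 MB as a seed. A real swarm would fall somewhere in
between, since the seed usually uploads more than one copy.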


Dreamcat4

Aug 10, 2010, 1:29:22 AM
to acisp2...@googlegroups.com
On Tue, Aug 10, 2010 at 2:12 AM, P. Oscar Boykin <boy...@pobox.com> wrote:
> Looks like transmission's client is just a front end that connects to
> another running daemon.  Did I misunderstand that?

Perhaps you mean "it doesn't have a tracker integrated into it"?

Well, there are public BitTorrent trackers you can use to coordinate
the download. The downside is that they make your data publicly
available. But not having to set up a tracker yourself is probably
a lot easier.

You ought to read up on Magnet links. These provide a way to "mix and
match" multiple data sources. So if the torrent failed, the download
could fall back to an HTTP server.

http://en.wikipedia.org/wiki/Magnet_URI_scheme
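
As an illustration of mixing sources in one link, here is a small
Python sketch that assembles a magnet URI with a BitTorrent info hash
(xt), a display name (dn), tracker URLs (tr) and plain HTTP fallback
URLs (as, "acceptable source"). The helper name build_magnet, the hash
and the example.org URLs are all made up, and whether a given client
honours the "as" parameter is something to check rather than assume.

```python
from urllib.parse import quote


def build_magnet(info_hash, name, trackers=(), http_sources=()):
    """Assemble a magnet URI from a torrent info hash and optional sources."""
    # xt carries the BitTorrent info hash as a URN; dn is a display name.
    parts = ["xt=urn:btih:" + info_hash, "dn=" + quote(name, safe="")]
    # tr entries name trackers; each URL is percent-encoded in full.
    parts += ["tr=" + quote(url, safe="") for url in trackers]
    # as entries name ordinary web URLs that can serve as a fallback.
    parts += ["as=" + quote(url, safe="") for url in http_sources]
    return "magnet:?" + "&".join(parts)


link = build_magnet(
    "0123456789abcdef0123456789abcdef01234567",          # placeholder hash
    "input.dat",
    trackers=["http://tracker.example.org/announce"],    # hypothetical tracker
    http_sources=["http://files.example.org/input.dat"], # hypothetical web server
)
print(link)
```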

Another system supported by magnet links is Direct Connect (DC++).
It might be a good alternative to BitTorrent (for your purposes).

http://en.wikipedia.org/wiki/DirectConnect

> The standard bt client
> seems to also ship with a command line version.

The standard BitTorrent client is probably lacking many new features.
That's why I recommended you consider a client like Transmission.
uTorrent is probably also worth evaluating.

They include a newer bittorrent feature called Local Peer Discovery (LPD).

http://en.wikipedia.org/wiki/Local_Peer_Discovery

... which would help if using a public (or any other non-local) tracker.


> It would be great to have a how-to on this kind of job on the wiki.
>
> I saw this:
>
> http://osdir.com/ml/distributed.condor.user/2006-08/msg00262.html
>
> And I don't agree with the argument at all.  My machine seems to get
> hammered on submit since I have to transfer maybe 100MB (or more) of input
> to all the clients, who will read all of the data.  So, the submit node is
> quite a bottleneck.  The response is small, so there is no bottleneck there.
> This seems exactly like an ideal case for Bittorrent.

Bear in mind that BitTorrent can take a while to get going. At the
start, no other node in your swarm has any blocks, so every block
still has to come from the submit node first. The network load will be
easier to control, however. You can set the number of concurrent
peers, and cap the bandwidth used for download and upload.
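
For Transmission, limits like these live in its settings.json file. The
Python sketch below generates such a fragment. The key names are the
ones I understand Transmission to use (speeds in KB/s), so check them
against the version you install; the helper name throttle_settings and
the numbers are made up for illustration.

```python
import json


def throttle_settings(up_kb_per_s, down_kb_per_s, peers_per_torrent, peers_global):
    """Build a settings fragment that caps bandwidth and peer counts."""
    return {
        # Upload and download caps; each needs its "-enabled" switch set.
        "speed-limit-up": up_kb_per_s,
        "speed-limit-up-enabled": True,
        "speed-limit-down": down_kb_per_s,
        "speed-limit-down-enabled": True,
        # Limits on the number of concurrent peers.
        "peer-limit-per-torrent": peers_per_torrent,
        "peer-limit-global": peers_global,
        # Local Peer Discovery, so workers on one LAN can find each other.
        "lpd-enabled": True,
    }


# Assumed example values: 500 KB/s up, 2000 KB/s down, 20 and 100 peers.
settings = throttle_settings(500, 2000, 20, 100)
print(json.dumps(settings, indent=2))
```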

If you modify your software to pass around your source file as a
magnet link first, that would make things more flexible. You would
then be free to choose HTTP (web server), Direct Connect, or BitTorrent,
and could more easily move to a newer download mechanism later on.
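
A hedged sketch of what that flexibility could look like on the worker
side: parse the magnet link and list the transports it offers, in the
order they would be tried. The function name download_plan and the
example link are hypothetical, and only BitTorrent (xt=urn:btih:...)
and HTTP fallback (as=...) entries are recognized here.

```python
from urllib.parse import urlsplit, parse_qs


def download_plan(magnet):
    """Return (transport, target) pairs found in a magnet link, in try order."""
    parts = urlsplit(magnet)
    if parts.scheme != "magnet":
        raise ValueError("not a magnet link")
    query = parse_qs(parts.query)

    plan = []
    # Prefer BitTorrent when the link carries an info hash.
    for xt in query.get("xt", []):
        if xt.startswith("urn:btih:"):
            plan.append(("bittorrent", xt[len("urn:btih:"):]))
    # Fall back to any plain web sources listed in the link.
    for url in query.get("as", []):
        plan.append(("http", url))
    return plan


example = (
    "magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567"
    "&as=http%3A%2F%2Ffiles.example.org%2Finput.dat"
)
for transport, target in download_plan(example):
    print(transport, target)
```

Supporting another mechanism later would mean adding one more branch
here, without changing how jobs refer to their input.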
