Re: Quick question regarding LSF adapter Bigjob


Ole Weidner

Apr 22, 2014, 1:03:02 PM
to Dinesh Prasanth Ganapathi, bigjob...@googlegroups.com
[moving to mailing list]

Hi Dinesh,

In RADICAL-Pilot your assumption is correct. However, the code below looks like BigJob code? While RADICAL-Pilot and BigJob should behave the same way here (i.e., both allocate at the core level, not the node level), there has been some confusion about how the PilotComputeDescription attributes should be used.

As far as I remember, pilot_description.number_of_processes=2048 in BigJob does the same thing as pilot_description.cores=2048 in RADICAL-Pilot, so you should be fine.
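
For reference, the RADICAL-Pilot side would look roughly like the snippet below. This is an untested sketch from memory: the attribute names follow the ComputePilotDescription in the current RADICAL-Pilot API, and the resource label is only a placeholder for whatever key your configuration uses for Yellowstone.

import radical.pilot as rp

session = rp.Session()
pmgr = rp.PilotManager(session=session)

pdesc = rp.ComputePilotDescription()
pdesc.resource = "ncar.yellowstone"   # placeholder -- use the resource key from your config
pdesc.cores    = 2048                 # total cores requested (128 x 16-core nodes)
pdesc.runtime  = 720                  # walltime in minutes
pdesc.project  = "URTG0003"
pdesc.queue    = "regular"

pilotjob = pmgr.submit_pilots(pdesc)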

Maybe others have additional input?

Thanks,
Ole

On Apr 22, 2014, at 12:17 PM, Dinesh Prasanth Ganapathi <dinesh....@gmail.com> wrote:

> Hi Ole,
> I have a small question about the terminology of 'core' vs. 'node' for a BigJob script I want to run on Yellowstone. My aim is to run one task (an RMSD calculation on a coordinate file) per core, across many cores. Yellowstone has 4596 compute nodes with 16 cores each, and usage is charged per complete node (all 16 cores) even if only, say, 8 cores are used, so I decided to submit tasks in multiples of 16. I just want to confirm that the highlighted lines in the snippet below do what I intend: that setting pilot_description.number_of_processes to 2048 reserves 2048 cores (i.e., 128 full 16-core nodes), not 2048 nodes, and that setting task_desc.number_of_processes to 1 runs one task per core.
>
> # excerpt from the full script; WORKDIR, COORD and HOSTNAME are defined earlier (not shown)
> import time
> import pilot
>
> NUMBER_JOBS = 2048
>
> def main():
>     try:
>         # this describes the parameters and requirements for our pilot job
>         pilot_description = pilot.PilotComputeDescription()
>         pilot_description.service_url = "lsf://localhost"
>         pilot_description.number_of_processes = 2048
>         pilot_description.working_directory = WORKDIR
>         pilot_description.walltime = 720
>         pilot_description.project = "URTG0003"
>         pilot_description.queue = "regular"
>
>         # create a new pilot job
>         pilot_compute_service = pilot.PilotComputeService(coordination_url=COORD)
>         pilotjob = pilot_compute_service.create_pilot(pilot_description)
>
>         # submit tasks to the pilot job
>         tasks = list()
>         for i in range(NUMBER_JOBS):
>             task_desc = pilot.ComputeUnitDescription()
>             task_desc.executable = 'time bash'  # 'bash try_rmsd_calling_script.sh'
>             task_desc.arguments = ['/glade/p/work/dinesh/md_traj_expts/try_rmsd_calling_script.sh']
>             #task_desc.environment = {'TASK_NO': i}
>             task_desc.number_of_processes = 1
>             task_desc.output = 'out.txt'
>             task_desc.error = 'err.txt'
>
>             task = pilotjob.submit_compute_unit(task_desc)
>             print "* Submitted task '%s' with id '%s' to %s" % (i, task.get_id(), HOSTNAME)
>             tasks.append(task)
>
>         print "Waiting for tasks to finish..."
>         start = time.time()
>         pilotjob.wait()
>
> Thanks and regards,
> Dinesh

Dinesh Prasanth Ganapathi

Apr 23, 2014, 10:57:35 AM
to bigjob...@googlegroups.com
Thanks Ole!


Regards,
Dinesh