[moving to mailing list]
Hi Dinesh,
In RADICAL-Pilot, your assumption is correct. However, the code below appears to be BigJob code. RADICAL-Pilot and BigJob presumably behave the same way (i.e., both allocate at the core level), but there has been some confusion about how the PilotComputeDescription attributes should be used.
I assume, and seem to remember, that pilot_description.number_of_processes=2048 in BigJob does the same as pilot_description.cores=2048 in RADICAL-Pilot, so you should be fine.
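Here is the core/node arithmetic worked through. Assume the pilot's number_of_processes (BigJob) or cores (RADICAL-Pilot) counts cores, and assume Yellowstone bills every node it touches as a full 16-core node, as Dinesh describes. Under those assumptions, the nodes billed for a request can be sketched as follows. `nodes_charged` and `idle_cores` are hypothetical helpers and are not part of either API.

```python
CORES_PER_NODE = 16  # Yellowstone node size, as stated in the question


def nodes_charged(cores_requested, cores_per_node=CORES_PER_NODE):
    """Whole nodes billed for a core-level request, assuming a partially
    used node is billed in full."""
    return (cores_requested + cores_per_node - 1) // cores_per_node  # ceiling division


def idle_cores(cores_requested, cores_per_node=CORES_PER_NODE):
    """Cores that are paid for but not requested."""
    return nodes_charged(cores_requested, cores_per_node) * cores_per_node - cores_requested


print(nodes_charged(2048), idle_cores(2048))  # 128 0
print(nodes_charged(2040), idle_cores(2040))  # 128 8
```

A request that is a multiple of 16 leaves no paid-for core idle. That is why 2048 cores (128 full nodes) is a sensible pilot size.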
Maybe others have additional input?
Thanks,
Ole
On Apr 22, 2014, at 12:17 PM, Dinesh Prasanth Ganapathi <dinesh....@gmail.com> wrote:
> Hi Ole,
> I was a little confused about the terminology for 'core' and 'node' in a BigJob script I want to run on Yellowstone. My aim is to run one task (computing the RMSD of a coordinate file) per core, across many cores. Yellowstone has 4596 compute nodes with 16 cores each. Usage is charged per complete node (all 16 cores) even if we use only, say, 8 of them, so I decided to request cores in multiples of 16.
> I just wanted to confirm that the two number_of_processes settings in the following snippet do the correct thing. Specifically, setting pilot_description.number_of_processes to 2048 should reserve 2048 cores, not 2048 nodes. Similarly, setting task_desc.number_of_processes to 1 should run one task per core.
>
> import time
>
> import pilot  # BigJob's Pilot API (assumed import; the original snippet omits its imports)
>
> # WORKDIR, COORD and HOSTNAME are assumed to be defined elsewhere in the script.
> NUMBER_JOBS = 2048
>
> def main():
>     try:
>         # this describes the parameters and requirements for our pilot job
>         pilot_description = pilot.PilotComputeDescription()
>         pilot_description.service_url = "lsf://localhost"
>         pilot_description.number_of_processes = 2048  # 2048 cores (= 128 nodes x 16 cores)
>         pilot_description.working_directory = WORKDIR
>         pilot_description.walltime = 720
>         pilot_description.project = "URTG0003"
>         pilot_description.queue = "regular"
>
>         # create a new pilot job
>         pilot_compute_service = pilot.PilotComputeService(coordination_url=COORD)
>         pilotjob = pilot_compute_service.create_pilot(pilot_description)
>
>         # submit tasks to pilot job
>         tasks = list()
>         for i in range(NUMBER_JOBS):
>             task_desc = pilot.ComputeUnitDescription()
>             task_desc.executable = 'time bash'  # 'bash try_rmsd_calling_script.sh'
>             task_desc.arguments = ['/glade/p/work/dinesh/md_traj_expts/try_rmsd_calling_script.sh']
>             #task_desc.environment = {'TASK_NO': i}
>             task_desc.number_of_processes = 1  # one core per task
>             task_desc.output = 'out.txt'
>             task_desc.error = 'err.txt'
>
>             task = pilotjob.submit_compute_unit(task_desc)
>             print "* Submitted task '%s' with id '%s' to %s" % (i, task.get_id(), HOSTNAME)
>             tasks.append(task)
>
>         print "Waiting for tasks to finish..."
>         start = time.time()
>         pilotjob.wait()
>         print "All tasks finished after %.1f seconds" % (time.time() - start)
>     except Exception as e:
>         print "Error while running the pilot job: %s" % e
>         raise
>
> if __name__ == "__main__":
>     main()
>
>
> Thanks and regards,
> Dinesh
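This follows up on the one-task-per-core question. With a 2048-core pilot and single-core tasks, all NUMBER_JOBS = 2048 tasks can in principle run concurrently. If there are more tasks than cores, they run in successive rounds. The sketch below estimates the number of rounds ("waves") under an idealized model: every task has the same runtime, and a queued task starts as soon as enough cores are free. This is an assumption, not a description of BigJob's actual scheduler. `execution_waves` is a hypothetical helper, not a BigJob or RADICAL-Pilot call.

```python
def execution_waves(num_tasks, cores_per_task, pilot_cores):
    """Number of rounds ("waves") needed to run num_tasks tasks on a pilot.

    Idealized model (an assumption, not BigJob's actual scheduler): every task
    has the same runtime and a queued task starts as soon as enough cores are free.
    """
    if cores_per_task < 1 or cores_per_task > pilot_cores:
        raise ValueError("cores_per_task must be between 1 and pilot_cores")
    concurrent = pilot_cores // cores_per_task  # tasks that fit on the pilot at once
    return (num_tasks + concurrent - 1) // concurrent  # ceiling division


# 2048 single-core tasks on a 2048-core pilot: everything runs at once.
print(execution_waves(2048, 1, 2048))  # 1
# Twice as many tasks as cores: two rounds, so roughly double the runtime.
print(execution_waves(4096, 1, 2048))  # 2
```

Under this model, doubling number_of_processes per task halves how many tasks fit at once. That has the same effect on total runtime as doubling the number of tasks.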