Hi,

I am trying to set up my workflow with sumatra. I start a lot of parallel jobs at once. My Python script prepares the job scripts and then starts the jobs. What I want to do is the following:

1. Prepare the jobs in my script.
2. Register each record in the recordstore.
3. Submit parallel jobs to the queue of my clusters.
4. After executing the job, each job should register its runtime and output files in the recordstore.

For the last step, I want to retrieve the record from the database, add the runtime and output files, and then save it again. Like so:

project = load_project()
record = project.get_record(label)
record.duration = endtime - starttime
record.output_data = record.datastore.find_new_data(record.timestamp)
project.add_record(record)
project.save()

However, I get the following error in most cases. It is not 100% reproducible, though: sometimes it runs through, sometimes not.

Traceback (most recent call last):
  File "param_single.py", line 106, in <module>
    project.add_record(record)
  File "/home/schmidt/.local/lib/python3.5/site-packages/sumatra/projects.py", line 261, in add_record
    self.record_store.save(self.name, record)
  File "/home/schmidt/.local/lib/python3.5/site-packages/sumatra/recordstore/django_store/__init__.py", line 227, in save
    db_record.launch_mode = self._get_db_obj('LaunchMode', record.launch_mode)
  File "/home/schmidt/.local/lib/python3.5/site-packages/sumatra/recordstore/django_store/__init__.py", line 206, in _get_db_obj
    db_obj, created = cls.objects.get_or_create_from_sumatra_object(obj, using=self._db_label)
  File "/home/schmidt/.local/lib/python3.5/site-packages/sumatra/recordstore/django_store/models.py", line 53, in get_or_create_from_sumatra_object
    return self.using(using).get_or_create(**attributes)
  File "/home/schmidt/.local/lib/python3.5/site-packages/django/db/models/query.py", line 406, in get_or_create
    return self.get(**lookup), False
  File "/home/schmidt/.local/lib/python3.5/site-packages/django/db/models/query.py", line 339, in get
    (self.model._meta.object_name, num)
sumatra.recordstore.django_store.models.MultipleObjectsReturned: get() returned more than one LaunchMode -- it returned 2!

I do not understand this error because:

1. I do not touch the LaunchMode entry.
2. I manually printed the launch_mode entry of the record, and it does not contain 2 entries.

Does someone know the source of the error, or can someone tell me a more correct way to achieve my goal within sumatra?

Thanks a lot and best regards,
Maximilian
-- Dr. Maximilian Schmidt Institute of Neuroscience and Medicine (INM-6) Institute for Advanced Simulation (IAS-6) JARA BRAIN Institute I Juelich Research Centre Juelich, Germany tel.: +49 2461 61-9468 max.s...@fz-juelich.de
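One possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: add_record re-saves the related LaunchMode object through get_or_create, and a get-or-create that is not protected by a uniqueness constraint can create duplicate rows when two jobs run it at the same moment. After that, every later lookup finds two rows. The example below reproduces that interleaving with plain sqlite3; the table name launch_mode and the value "serial" are made up for illustration and are not taken from sumatra's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: no UNIQUE constraint on name, so duplicate rows are allowed.
conn.execute("CREATE TABLE launch_mode (id INTEGER PRIMARY KEY, name TEXT)")


def lookup(name):
    """Return all rows matching name (the 'get' half of get-or-create)."""
    return conn.execute(
        "SELECT id FROM launch_mode WHERE name = ?", (name,)
    ).fetchall()


def insert(name):
    """Insert a new row (the 'create' half of get-or-create)."""
    conn.execute("INSERT INTO launch_mode (name) VALUES (?)", (name,))


# Two jobs interleave: both look up before either one has inserted.
found_by_job_a = lookup("serial")
found_by_job_b = lookup("serial")
if not found_by_job_a:
    insert("serial")
if not found_by_job_b:
    insert("serial")

# A later lookup that expects exactly one row now finds two.
rows = lookup("serial")
print(len(rows))  # 2
```

This would also be consistent with the error being intermittent, since it depends on the timing of the parallel jobs.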
1. Write a single Python script as described in the tutorial, including the registration at the end.
2. Write my job submission script. I'm using PBS, but any scheduler can do this job.
3. Submit parallel jobs to the queue of my clusters.
Thank you for your reply. Sorry for my unclear explanation of the workflow. I am basically doing the same thing as you: I have one serial Python script that creates the job scripts, registers each record, and then submits the parallel jobs to the queue. Each parallel job is one simulation.

The problematic part is that, at the end of each simulation, each of the parallel jobs has recorded its individual runtime and I want to store this runtime in the database. So, after the simulation, each parallel job loads the record associated with its simulation and adds the runtime (or other information) to the record. The label is a unique hash that identifies each simulation run, so to my understanding there should be only one record for each label in the database. But I have no clue about Django and databases in general, so I might be wrong, as you pointed out.

The easier way to do all this would be to let each job do the initial registration of the record in the first place. However, I ran into some strange errors when executing 'smtweb' and concluded that the database might be corrupt because many parallel jobs were writing to it at the same time. This risk is not completely avoided by the procedure I am trying to implement, but at least the probability is reduced, because each simulation has a slightly different runtime and the time needed to modify a record should be relatively short.

I hope I made my problem clearer.

Thanks for your help,
Max
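One way to remove the remaining risk of simultaneous writes, rather than only reducing it, would be to serialise the update step with a lock file. The following is a minimal sketch, assuming all jobs can see the same lock file and that the file system honours flock across nodes, which depends on the cluster's shared file system. The names exclusive_lock and locked_update are hypothetical helpers, not part of sumatra.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    with open(lock_path, "w") as lock_file:
        # Blocks until no other process holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def locked_update(lock_path, update):
    """Run update() while holding the lock and return its result."""
    with exclusive_lock(lock_path):
        return update()
```

Here update would be a function wrapping the load_project / get_record / add_record / save sequence from the first message, so that only one job at a time touches the record store.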
You received this message because you are subscribed to the Google Groups "sumatra-users" group. To unsubscribe from this group and stop receiving emails from it, send an email to sumatra-user...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
project = load_project()
record = project.get_record(label)
record.duration = endtime - starttime
record.output_data = record.datastore.find_new_data(record.timestamp)
record.label = mynewlabel
You can also change the label only if it exists in the database, like:
record.label = mynewlabel
This solution is for Django experts, I would say :)