Questions about "session"

Az

unread,

Jun 3, 2010, 1:24:37 AM6/3/10

to sqlalchemy

In my code, I am currently adding to the "session" in various modules
(this is the same session since I'm importing it from my most
prominent module).

Some sample code would be:

###### BEGIN CODE 1 #######

Session = sessionmaker(bind=engine)
session = Session()

def addToTable():
"""Very simple SQLAlchemy function that populates the Student,
Project
and Supervisor tables."""

for student in students.itervalues():
session.add(student)
session.flush()

for project in projects.itervalues():
session.add(project)
session.flush()

for supervisor in supervisors.itervalues():
session.add(supervisor)
session.flush()

session.commit()

And then again in a function in the same module:

def monteCarloBasic(trials):
"""The Monte-Carlo simulation will generate allocations for the
list of students by randomly arranging the order for each trial.

In the case of a student having more than one project for a given
rank,
the algorithm with randomly select one of them since it is given that
all such projects are equally desireable to the student."""

session_id = 1
ident = 1
for trial in xrange(trials):

for id in randomiseStudentKeys(True):
stud_id = id
student = students[id]

if student.preferences:
temp_alloc = SimAllocation(ident, session_id, stud_id)

ranks = sorted(student.preferences.keys())

for rank in ranks:
# Now we begin to try giving him/her a project
proj = random.choice(list(student.preferences[rank]))

if not (proj.allocated or proj.blocked or proj.own_project):

student.allocated_project = proj
student.allocated_proj_ref = proj.proj_id
student.allocated_rank = rank
allocSuccessActions(proj)

temp_alloc.alloc_proj = proj.proj_id # ... we can set the
allocated project details
temp_alloc.alloc_proj_rank = rank

session.add(temp_alloc)

break

ident += 1 # Increment the primary key
session.add(temp_alloc)

session.flush()
session_id += 1

resetData() # Resets all data

session.commit()

####### END CODE 1 #######

Later on I'm using this session to run some calculations on my data.
For example:

####### BEGIN CODE 2 ########

sid = 4545
project_id_list = list(students[sid].preferences)
for project_id in project_id_list
gotcha =
session.query(SimAllocation).filter(SimAllocation.student_id ==
sid).filter(PP.SimAllocation.alloc_proj == project_id).count()

###### END CODE 2 #######

Simply, this line counts how many times a certain student was
allocated each project from his list when using the Monte-Carlo
simulation from ### CODE 1 ### above.

+++ Questions +++

1. Is this the correct way to use sessions or am I sort of abusing
them?
2. When should I close a session?
3. I got the following error after trying to use copy.deepcopy() on
one of my dictionaries.

File "Main.py", line 106, in <module>
OPTM.preoptAlloc(some_best)
File "XXXXXXXX/Optimisation.py", line 48, in preoptAlloc
sid = projs_preopt[sim.alloc_proj_ref].proj_sup
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
attributes.py", line 158, in __get__
return self.impl.get(instance_state(instance),
instance_dict(instance))
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
attributes.py", line 377, in get
value = callable_()
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
state.py", line 185, in __call__
attr.impl.key in unmodified
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
mapper.py", line 1864, in _load_scalar_attributes
"attribute refresh operation cannot proceed" % (state_str(state)))
sqlalchemy.exc.UnboundExecutionError: Instance <Project at 0x24c5c50>
is not bound to a Session; attribute refresh operation cannot proceed

Is this something to do with the way I've been using the sessions?

---

Thanks,

Az

Az

unread,

Jun 3, 2010, 1:44:32 AM6/3/10

to sqlalchemy

Also:

I'm using [Python 2.6.5] and [SQLAlchemy 0.5.8]

Previously I was just shallow copying my dictionaries, and there were
no issues then with my simulations.
My dictionaries contain objects such that my students dictionary is
basically:

students[stud_id] = Student(stud_id, name, preferences,...)

Student is mapped to an SQLAlchemy table.

This is similar for many of my objects.

I was trying to run some code to optimise my allocations and there was
no "real" involvement with SQLAlchemy -- in that I wasn't actually
dealing with any SQLAlchemy code.

I understand that the shallow copy in Python just copies the
references whereas deepcopy copies the entire object. Does that mean
the deepcopied object is "outside" the session or something?

Some help would be much appreciated. I have a feeling that the answer
lies somewhere within the way deepcopy and session work but my head
just can't put two-and-two together right now :(

Michael Bayer

unread,

Jun 3, 2010, 11:57:05 AM6/3/10

to sqlal...@googlegroups.com

On Jun 3, 2010, at 1:24 AM, Az wrote:

> +++ Questions +++
>
> 1. Is this the correct way to use sessions or am I sort of abusing
> them?

I dont see any poor patterns of use above.

> 2. When should I close a session?

when you no longer need the usage of any of the objects associated with it, or any remaining objects are in a state which you will re-merge them into a new session before you next use them. The session in its default state of autocommit=False is just like going to your database and starting a transaction, doing some work - when you're done with the work, you close the transaction, and all the data associated with that trans (i.e. your ORM objects) is essentially "invalid"; other transactions can be modifying that data. Your objects are an extension of the Session, which should be considered as an object-oriented window onto a database transaction.

> 3. I got the following error after trying to use copy.deepcopy() on
> one of my dictionaries.
>

> "attribute refresh operation cannot proceed" % (state_str(state)))
> sqlalchemy.exc.UnboundExecutionError: Instance <Project at 0x24c5c50>
> is not bound to a Session; attribute refresh operation cannot proceed

don't do deepcopy() on a structure that contains ORM objects if their owning session has been closed. deepcopy on ORM objects probably has issues that prevent it from working as you'd expect. You'd be better off building copy constructors, i.e. def copy(self): return FooBar(....).

Az

unread,

Jun 3, 2010, 1:58:20 PM6/3/10

to sqlalchemy

"Owning session has been closed"? Can I still use deepcopy if the
session has not been closed? How can I stop it from closing the
sessions? The problem is that if I change my shallow copied
dictionary, the objects are changed.

Basically, I'm trying to do this state change thing where I'll take a
dictionary (let's call it Node 1), make changes to it (thereby making
changes to the objects it references) and then save those changes as
Node 2. Then I'll take Node 2 and make some changes to that. So on and
so forth for a certain number of changes. Everytime I do so, I want to
retain the information from the previous Node as well as a "best node"
which can be any of the Nodes. If my operations change the objects, is
that even possible?

That was my motivation to use deepcopy but I don't want to stop using
SQLAlchemy because of it :(

Michael Bayer

unread,

Jun 3, 2010, 2:41:08 PM6/3/10

to sqlal...@googlegroups.com

On Jun 3, 2010, at 1:58 PM, Az wrote:

> "Owning session has been closed"? Can I still use deepcopy if the
> session has not been closed?

deepcopy has issues because SQLAlchemy places extra information on your objects, i.e. an _sa_instance_state attribute, that you dont want in your copy. You *do* however need one to exist on your object. Therefore deepcopy is not supported right now by SQLAlchemy ORM objects. There are ways to manually blow away the old _sa_instance_state and put a new one on the object, but the most straightforward is to make a new object with __init__() and set up the attributes that are significant, instead of doing a full deep copy.

if you do really want to use deepcopy, you'd have to implement __deepcopy__() on your objects and ensure that a new _sa_instance_state is set up, there are functions in sqlalchemy.orm.attributes which can help with that. This *should* be made an official SQLA recipe, but we haven't gotten around to it.

> How can I stop it from closing the
> sessions?

nothing in SQLA closes sessions. Your program is doing that.

> --
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To post to this group, send email to sqlal...@googlegroups.com.
> To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
>

Az

unread,

Jun 3, 2010, 3:22:09 PM6/3/10

to sqlalchemy

I think I get what you mean. In the mean time, another error popped
up:

Traceback (most recent call last):
File "Main.py", line 32, in <module>
MCS.addToTable()
File "XXXXXXX/MonteCarloSimulation.py", line 138, in addToTable
####### <------- This function is described in the first post
session.flush()
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
session.py", line 1354, in flush
self._flush(objects)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
session.py", line 1432, in _flush
flush_context.execute()
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
unitofwork.py", line 261, in execute
UOWExecutor().execute(self, tasks)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
unitofwork.py", line 753, in execute
self.execute_save_steps(trans, task)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
unitofwork.py", line 768, in execute_save_steps
self.save_objects(trans, task)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
unitofwork.py", line 759, in save_objects
task.mapper._save_obj(task.polymorphic_tosave_objects, trans)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
mapper.py", line 1303, in _save_obj
(state_str(state), instance_key, state_str(existing)))
sqlalchemy.orm.exc.FlushError: New instance <Student at 0x24fe7b0>
with identity key (<class 'ProjectParties.Student'>, (7713,))
conflicts with persistent instance <Student at 0x140f090>

What's going on here with the flush error?

I've gone back to using just copy.copy(some_dictionary) just >>after<<
I read into my dictionaries and >>before<< I start doing stuff like
monteCarloSimulation (above) and addToTable (above). Even if I stick
the copy.copy() after addToTable(), it still manages to give me the
same error.

Message has been deleted

Az

unread,

Jun 4, 2010, 11:33:34 AM6/4/10

to sqlalchemy

Sorry, meant to reply to you Michael... ended up replying to myself!
---

Firstly, apologies if I'm demanding too much but basically I'm quite a
beginner at Python programming and this is for a University project,
which is why I'm keen to get this done (due in a few days!). So I hope
you won't mind me asking some questions that may seem really basic.

> deepcopy has issues because SQLAlchemy places extra information on your objects, i.e. an _sa_instance_state attribute, that you dont want in your
> copy. You *do* however need one to exist on your object. Therefore deepcopy is not supported right now by SQLAlchemy ORM objects.
> There are ways to manually blow away the old _sa_instance_state and put a new one on the object, but the most straightforward is to make a new
> object with __init__() and set up the attributes that are significant, instead of doing a full deep copy.

Could you explain what you mean by creating a new object with
__init__() and setting up the attributes? Would this be a new class
that isn't mapped using SQLA?

> if you do really want to use deepcopy, you'd have to implement __deepcopy__() on your objects and ensure that a new _sa_instance_state is set up,
> there are functions in sqlalchemy.orm.attributes which can help with that. This *should* be made an official SQLA recipe, but we haven't gotten
> around to it.

Could you please explain what you mean by that? Would it be possible
to give me an idea or an example of how such would work?

>> How can I stop it from closing the
>> sessions?
> nothing in SQLA closes sessions. Your program is doing that.

I'm not issuing a session.close() anywhere (I checked). Are there any
other ways of closing a session besides that? (If the answer is
"Plenty", don't worry about it... I'll try to track it down then)

Conor

unread,

Jun 4, 2010, 5:17:01 PM6/4/10

to sqlal...@googlegroups.com

On 06/03/2010 02:33 PM, Az wrote:
> Firstly, apologies if I'm demanding too much but basically I'm quite a
> beginner at Python programming and this is for a University project,
> which is why I'm keen to get this done (due in a few days!). So I hope
> you won't mind me asking some questions that may seem really basic.
>
>

>> deepcopy has issues because SQLAlchemy places extra information on your objects, i.e. an _sa_instance_state attribute, that you dont want in your
>> copy. You *do* however need one to exist on your object. Therefore deepcopy is not supported right now by SQLAlchemy ORM objects.
>>
>
>> There are ways to manually blow away the old _sa_instance_state and put a new one on the object, but the most straightforward is to make a new
>> object with __init__() and set up the attributes that are significant, instead of doing a full deep copy.
>>

> Could you explain what you mean by creating a new object with

> __init__() and setting up the attributes? Would this be a new class
> that isn't mapped using SQLA?
>
>

He just means creating a new instance of your mapped class and settings
its attributes manually, e.g.:

def copy(self):
copy = MyMappedClass()
copy.attr1 = self.attr1
copy.attr2 = self.attr2
return copy

>> if you do really want to use deepcopy, you'd have to implement __deepcopy__() on your objects and ensure that a new _sa_instance_state is set up,
>> there are functions in sqlalchemy.orm.attributes which can help with that. This *should* be made an official SQLA recipe, but we haven't gotten
>> around to it.
>>

> Could you please explain what you mean by that? Would it be possible
> to give me an idea or an example of how such would work?
>
>

In theory you can use a generic __deepcopy__ implementation for ORM
classes. A very simple version might be:

def orm_deepcopy(self, memo):
mapper = class_mapper(self.__class__)
result = self.__class__()
memo[id(self)] = result
for prop in mapper.iterate_properties():
value = getattr(self, prop.key)
setattr(result, prop.key, deepcopy(value, memo))
return result

class MyMappedClass(...):
__deepcopy__ = orm_deepcopy

Beware that this implementation does not handle overlapping properties
well (e.g. relations and their corresponding foreign key columns),
lazy-loading properties, read-only properties, clearing out
auto-incrementing primary keys, etc. I would not recommend this
approach, as a use-case-specific copy() method will be much easier to
tailor to your needs.

>>> How can I stop it from closing the
>>> sessions?
>>>
>
>> nothing in SQLA closes sessions. Your program is doing that.
>>

> I'm not issuing a session.close() anywhere (I checked). Are there any
> other ways of closing a session besides that? (If the answer is
> "Plenty", don't worry about it... I'll try to track it down then)
>

If you are in a web framework, it may be closing the session for you
(usually by calling Session.remove() on a ScopedSession). Additionally,
are you sure that your object-to-copy is not transient when you make
your deepcopy?

-Conor

Az

unread,

Jun 5, 2010, 9:06:57 PM6/5/10

to sqlalchemy

Cheers!

Creating a new instance of my mapped class and settings, manually.
Gotcha. I think this will be an easier solution for me.

Nah, I'm not in a web framework.

Additional Q:

+++

Currently, my database is being stored in memory and it's fine like
that since a) my data isn't very expansive and b) I'm running the
program (python Main.py) from a command line where I can just comment
out various functions in my Main file. However, I'm now designing a
GUI for my code where I want to be able to call each function
manually. Now, all functions that reference the session will reference
"session" I defined as:

Session = sessionmaker(bind=engine)
session = Session()

Thus after I run my finish my monteCarloBasic (defined way up at the
top), I'd probably hit another button that would do what I did for
CODE 2, i.e.

def checkFor4545(trials):
sid = 4545
print sid

project_id_list = list(students[sid].preferences)
for project_id in project_id_list
gotcha =
session.query(SimAllocation).filter(SimAllocation.student_id ==
sid).filter(PP.SimAllocation.alloc_proj == project_id).count()

print project_id, gotcha/trials

This basically counts the number of times student 4545 got each of his
projects for entire Monte-Carlo simulation and prints the average over
the trials.

So basically when I'm in command line mode and go "python Main.py" my
Main looks like:

trials = 100
monteCarloBasic(trials)
checkFor4545(trials)

So those will run one after the other.

Now when I've got a GUI, I'll have the Monte-Carlo run separately and
then afterwards, hit a button that corresponds to
'checkFor4545(trials)'. Now, the reason I used SQLAlchemy in the first
place (besides the general usefulness of SQL) is for data persistence.
Will 'checkFor4545(trials)' display the same output as it would if it
were run serially from Main.py? Will it reference the same "session"?
(Probably a question you'll want to slap your forehead over, but I
just want to verify I've got my understanding correct).

Additionally, when I save to a physical database file, what happens
everytime I run monteCarloBasic(trials) (since it writes to the
database). Will it rewrite it every time? Or will it keep appending to
it?

++++++

Az

Conor

unread,

Jun 5, 2010, 11:39:57 PM6/5/10

to sqlal...@googlegroups.com

I see nothing that indicates that they would NOT see the same session, but I do have some comments:

GUIs usually run long tasks in background threads to keep the UI responsive. If you were to do this, you would not want to use a single global session, because sharing a session between threads is a big no-no.
I'm concerned what the call to resetData does. If it resets student-project associations, then could it end up deleting the temp_allocs you just added in that trial?
What should happen if you run monteCarloBasic multiple times? It seems like you would get duplicate primary keys on SimAllocation rows after the 1st call.

Additionally, when I save to a physical database file, what happens
everytime I run monteCarloBasic(trials) (since it writes to the
database). Will it rewrite it every time? Or will it keep appending to
it?

I don't see anything that would indicate rewriting the database in the code that you have shown (except maybe as a side-effect of your resetData function that I noted above). Also, you may get duplicate primary key errors like I mentioned above.

-Conor

Az

unread,

Jun 6, 2010, 12:43:46 AM6/6/10

to sqlalchemy

This will probably help:

def addToTable():
"""Very simple SQLAlchemy function that populates the Student,
Project
and Supervisor tables."""
for student in students.itervalues():
session.add(student)
session.flush()

for project in projects.itervalues():
session.add(project)
session.flush()

for supervisor in supervisors.itervalues():
session.add(supervisor)
session.flush()

session.commit()

def resetData():
"""An essential helper function to help reset all student, project
and
supervisor data to their pre-allocation values"""

for student in students.itervalues():
student.allocated_project = None
student.allocated_proj_ref = None
student.allocated_rank = penalty

for supervisor in supervisors.itervalues():
supervisor.loading_limit = supervisor.original_quota -
supervisor.predecr_quota

for project in projects.itervalues():
project.allocated = False
project.blocked = False

So yes, my resetData function will revert _certain_ attributes for
only the Student, Supervisor and Project classes back to the state
they were in when I parsed them in from the text files. My students,
supervisors and projects dictionaries -- at the end of the entire M-C
-- will basically look like that. But if there are other attributes I
have not reverted, they won't change. For example my student has an
attribute called "overall_proby" which I get after calculating the
statistics from the M-C simulation. This won't change until I run the
entire simulation from start to end or terminate. However, his
"allocated_project" will remain None.

> If it resets
> student-project associations, then could it end up deleting the
> temp_allocs you just added in that trial?

temp_alloc belongs to the SimAllocation class so resetData() shouldn't
affect it.

> If you were to do this, you would not want to use a
> single global session, because sharing a session between threads
> is a big no-no.

Why is that a bad thing? And how can work around this single global
session? Basically, say in my GUI, I hit "RUN" for the Monte-Carlo
simulation and it does its work. And then I accidentally hit "RUN"
again. It'll recalculate everything right? Now when I click "STATS" to
get my statistics, I'll want to get the statistics for the second time
I ran the simulation. Thus how can I work around the "single global
session" problem?

> It seems like you would get duplicate primary keys on SimAllocation
> rows after the 1st call.

Once again any ways of avoiding this? I some sort of unique key for
the Monte-Carlo runs and have the incrementing row number was the only
thing I can think of since student ID's will repeat after every
individual trial is done.

Oh also: a BIG thank you for taking the time to help me out! I'm
almost at my deadline :)

> * GUIs usually run long tasks in background threads to keep the UI

> responsive. If you were to do this, you would not want to use a
> single global session, because sharing a session between threads
> is a big no-no.

> * I'm concerned what the call to resetData does. If it resets

> student-project associations, then could it end up deleting the
> temp_allocs you just added in that trial?

> * What should happen if you run monteCarloBasic multiple times? It

Az

unread,

Jun 6, 2010, 1:49:20 AM6/6/10

to sqlalchemy

Also adding a bit here:

My Student (this one is mapped) class looks like this:

class Student(object):
def __init__(self, ee_id, name, stream_id, overall_proby):
self.ee_id = ee_id
self.name = name
self.stream_id = stream_id
self.preferences = collections.defaultdict(set)
self.allocated_project = None
self.allocated_proj_ref = None
self.allocated_rank = None
self.own_project_id = None
self.own_project_sup = None
self.overall_proby = overall_proby

students_table = Table('studs', metadata,
Column('ee_id', Integer, primary_key=True),
Column('name', String),
Column('stream_id', String),
Column('allocated_rank', Integer),
Column('allocated_proj_ref', Integer, ForeignKey('projs.proj_id')),
Column('overall_proby', Float)
)

mapper(Student, students_table)

Now the reason I wanted to use copy was because I was going to be
changing certain data around -- specifically, self.allocated_project,
self.allocated_proj_ref and self.allocated_rank.

You suggested defining, or rather redefining copy such that:

> def copy(self):
> copy = MyMappedClass()
> copy.attr1 = self.attr1
> copy.attr2 = self.attr2
> return copy

Now does this look okay?

def copied(self):
copied = Student()
copied.allocated_project = self.allocated_project
copied.allocated_proj_ref = self.allocated_proj_ref
copied.allocated_rank = self.allocated_rank
return copied

Basically what I'm copying are dictionaries.

Thus the "students" dictionary looks like this:

students[ee_id] = Student(ee_id, name...)

So BEFORE I was just doing
students_copied = copy.copy(students)

How would I use the copy function I created?

students_copied = students.copied()?

And so if I used this version... my objects won't change (which was
why I wanted to use deepcopy) when I make changes to the copied
dictionary?

Thanks

Az

On Jun 6, 4:39 am, Conor <conor.edward.da...@gmail.com> wrote:

> * GUIs usually run long tasks in background threads to keep the UI

> responsive. If you were to do this, you would not want to use a
> single global session, because sharing a session between threads
> is a big no-no.

> * I'm concerned what the call to resetData does. If it resets

> student-project associations, then could it end up deleting the
> temp_allocs you just added in that trial?

> * What should happen if you run monteCarloBasic multiple times? It

Az

unread,

Jun 6, 2010, 3:58:04 PM6/6/10

to sqlalchemy

Hi Conor,

Basically I sat down and made some decisions and changes.

I've created an actual copy of the Student class as in I've now got
two classes, Student and StudentUnmapped. The Unmapped one has the
same attributes as the mapped one, except for being... well, unmapped.
Now I can a) use deepcopy and b) change the objects without worry.
resetData() will act on the unmapped dictionary as well so the mapped
object remains safe and unchanged.

Sorry for beating around the bush with questions that were a bit non-
SQLA.

Let's get back to some SQLA questions:

1. The only changes I'd push onto the mapped object would be... after
running my MC, I get a bunch of probabilities -- those I want to
persist. How do I modify the field in a table I've already
"session.commit()"-ed using the following function. This happens
pretty much after I've finished reading in the dictionaries
completely. After that I just add each thing to the relevant table.
But I'd want to update some attributes of student because I want to be
able to have in the database for access later.

def addToTable():
"""Very simple SQLAlchemy function that populates the Student,
Project
and Supervisor tables."""

for student in students.itervalues():
session.add(student)
session.flush()

for project in projects.itervalues():
session.add(project)
session.flush()

for supervisor in supervisors.itervalues():
session.add(supervisor)
session.flush()

session.commit()

2. Say I've now got a physical database and I've run my Monte-Carlo
multiple times. I think I'd either want to a) have the original M-C
sessions be overwritten or b) create another set of data, perhaps
using the data to differentiate the two. How can I do this? Can I
query each one separately? Or am I better off just with an overwrite?

3. Finally, regarding the GUI. If each function indicates a separate
"thread", then in that case, yes with my GUI I'd be passing the
session from thread to thread since I'm no longer just running Main.py
but rather, the constituent functions one by one. How do I deal with
this? The reason I used the database was because of persistence and I
definitely want my data to persist between threads (and after I've
closed my program) so I can use them for all manner of useful
calculations, queries and output.

If I can sort these three things out -- this entire project is wrapped
up :)

Thanks in advance!

Az

On Jun 6, 4:39 am, Conor <conor.edward.da...@gmail.com> wrote:

> * GUIs usually run long tasks in background threads to keep the UI

> responsive. If you were to do this, you would not want to use a
> single global session, because sharing a session between threads
> is a big no-no.

> * I'm concerned what the call to resetData does. If it resets

> student-project associations, then could it end up deleting the
> temp_allocs you just added in that trial?

> * What should happen if you run monteCarloBasic multiple times? It

Conor

unread,

Jun 7, 2010, 11:01:27 AM6/7/10

to sqlal...@googlegroups.com

On 06/06/2010 02:58 PM, Az wrote:

Hi Conor,

Basically I sat down and made some decisions and changes.

I've created an actual copy of the Student class as in I've now got
two classes, Student and StudentUnmapped. The Unmapped one has the
same attributes as the mapped one, except for being... well, unmapped.
Now I can a) use deepcopy and b) change the objects without worry.
resetData() will act on the unmapped dictionary as well so the mapped
object remains safe and unchanged.

Sounds good. Just beware that deepcopy will try to make copies of all the objects referenced by your StudentUnmapped objects (assuming you didn't define __deepcopy__), so you may end up copying projects, supervisors, etc.

It sounds like you want to a) INSERT students/projects/supervisors that don't yet exist in the database, and b) UPDATE students/projects/supervisors that do exist in the database. If so, I think you want to use session.merge instead of session.add.

2. Say I've now got a physical database and I've run my Monte-Carlo
multiple times. I think I'd either want to a) have the original M-C
sessions be overwritten or b) create another set of data, perhaps
using the data to differentiate the two. How can I do this? Can I
query each one separately? Or am I better off just with an overwrite?

You can indeed append the new set of data to the existing data. You would just need another column in SimAllocation to distinguish between different calls to monteCarloBasic. I would recommend using a database sequence or GUIDs to ensure that each call to monteCarloBasic gets a unique value for this column.

3. Finally, regarding the GUI. If each function indicates a separate
"thread", then in that case, yes with my GUI I'd be passing the
session from thread to thread since I'm no longer just running Main.py
but rather, the constituent functions one by one. How do I deal with
this? The reason I used the database was because of persistence and I
definitely want my data to persist between threads (and after I've
closed my program) so I can use them for all manner of useful
calculations, queries and output.

Just to be clear, by "thread" I mean actual system threads spawned by the the thread or threading module. If this is indeed what you want, then you probably have a UI thread and a worker thread that runs monteCarloBasic. Since you should not share a single session object between threads, you can:

Change monteCarloBasic to not rely on sessions (including their persistent objects) at all (you would have to make copies of your students, projects, and supervisors before hading them over to monteCarloBasic). You are already sort-of on this track by using StudentUnmapped objects. In this way, monteCarloBasic returns its results as a set of objects that are not attached to any session (either because they are unmapped or are transient instances), which the UI thread uses to update the database. How you pass data from worker threads to the UI thread is dependent on your GUI toolkit.
Change monteCarloBasic to create its own session from the sessionmaker object. This will let monteCarloBasic read and write from/to the database, but you will have to arrange for your UI thread session to expire_all or close itself appropriately so it can see the new data.

Again, this thread business is probably overkill for your project, so you may want to avoid it altogether.

-Conor

Az

unread,

Jun 7, 2010, 3:56:45 PM6/7/10

to sqlalchemy

> Sounds good. Just beware that deepcopy will try to make copies of all
> the objects referenced by your StudentUnmapped objects (assuming you
> didn't define __deepcopy__), so you may end up copying projects,
> supervisors, etc.

Good point. I'm deepcopying my students, projects and supervisors
dictionaries. But yes you're right, all of them have a reference to
other objects.

§§[Q1:] How will deepcopying the objects referenced by my
StudentUnmapped object affect me?

I also tried another structure for elegance...

class Student(object):
def __init__(self, ee_id, name, stream_id, overall_proby):
self.ee_id = ee_id
self.name = name
self.stream_id = stream_id
self.preferences = collections.defaultdict(set)
self.allocated_project = None
self.allocated_proj_ref = None
self.allocated_rank = None
self.own_project_id = None
self.own_project_sup = None
self.overall_proby = overall_proby

def __repr__(self):
return str(self)

def __str__(self):
return "%s %s %s: %s (OP: %s)" %(self.ee_id, self.name,
self.allocated_rank, self.allocated_project, self.overall_proby)

class StudentDBRecord(Student):
def __init__(self, student):
super(StudentDBRecord, self).__init__(student.ee_id,
student.name,
student.stream_id,
student.preferences,
student.allocated_project,
student.allocated_proj_ref,
student.allocated_rank,
student.own_project_id,
student.own_project_sup,
student.overall_proby)

mapper(StudentDBRecord, students_table, properties={'proj_id' :
relation(Project)})

Basically, the theory was I'd do all my algorithm stuff on the Student
objects and then after I've found an optimal solution I'll push those
onto the StudentDBRecord table for persistence...

However I ended up getting the following error:

#####

File "Main.py", line 25, in <module>
prefsTableFile = 'Database/prefs-table.txt')
File "/Users/Azfar/Dropbox/Final Year Project/SPAllocation/
DataReader.py", line 158, in readData
readProjectsFile(projectsFile)
File "/Users/Azfar/Dropbox/Final Year Project/SPAllocation/
DataReader.py", line 66, in readProjectsFile
supervisors[ee_id] = Supervisor(ee_id, name, original_quota,
loading_limit)
File "<string>", line 4, in __init__
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
state.py", line 71, in initialize_instance
fn(self, instance, args, kwargs)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
mapper.py", line 1829, in _event_on_init
instrumenting_mapper.compile()
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
mapper.py", line 687, in compile
mapper._post_configure_properties()
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
mapper.py", line 716, in _post_configure_properties
prop.init()
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
interfaces.py", line 408, in init
self.do_init()
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
properties.py", line 714, in do_init
self._get_target()
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
properties.py", line 726, in _get_target
self.mapper = mapper.class_mapper(self.argument, compile=False)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
util.py", line 564, in class_mapper
raise exc.UnmappedClassError(class_)
sqlalchemy.orm.exc.UnmappedClassError: Class 'ProjectParties.Student'
is not mapped

#####

§§[Q2:] What's that all about? Something wrong with the inheritence?

+++

> I would recommend using a database
> sequence or GUIDs to ensure that each call to monteCarloBasic gets a
> unique value for this column.

As another key sequence different from the simple "ident ==
row_number" I'm currently using right? I'll look into that.

+++

The thread business is indeed going over my head :S.

> In this way, monteCarloBasic returns its
> results as a set of objects that are not attached to any session
> (either because they are unmapped or are transient

> <http://www.sqlalchemy.org/docs/reference/orm/sessions.html#sqlalchemy...>

> instances), which the UI thread uses to update the database. How
> you pass data from worker threads to the UI thread is dependent on
> your GUI toolkit.

My GUI toolkit is Tkinter?

> 1. Change monteCarloBasic to not rely on sessions (including their
> persistent
> <http://www.sqlalchemy.org/docs/reference/orm/sessions.html#sqlalchemy...>

> objects) at all (you would have to make copies of your students,
> projects, and supervisors before hading them over to
> monteCarloBasic). You are already sort-of on this track by using
> StudentUnmapped objects. In this way, monteCarloBasic returns its
> results as a set of objects that are not attached to any session
> (either because they are unmapped or are transient

> <http://www.sqlalchemy.org/docs/reference/orm/sessions.html#sqlalchemy...>

> instances), which the UI thread uses to update the database. How
> you pass data from worker threads to the UI thread is dependent on
> your GUI toolkit.

> 2. Change monteCarloBasic to create its own session from the

Conor

unread,

Jun 7, 2010, 5:50:26 PM6/7/10

to sqlal...@googlegroups.com

On 06/07/2010 02:56 PM, Az wrote:

Sounds good. Just beware that deepcopy will try to make copies of all
the objects referenced by your StudentUnmapped objects (assuming you
didn't define __deepcopy__), so you may end up copying projects,
supervisors, etc.

Good point. I'm deepcopying my students, projects and supervisors
dictionaries. But yes you're right, all of them have a reference to
other objects.

§§[Q1:] How will deepcopying the objects referenced by my
StudentUnmapped object affect me?

By default, deepcopy will make one copy of everything in the object graph reachable by the object you feed it. The scary part is that, unless you also pass in a memo argument to each call to deepcopy, it will copy the entire graph every single call. So if you deepcopy the students dictionary and then deepcopy the projects dictionary, each student's allocated_proj attribute will not match any instance in the projects dictionary. This is why a use-case-specific copy function is recommended: it is a lot easier to predict which objects will get copied and which objects will be shared.

I don't see any benefit to making StudentDBRecord inherit from Student. Try this:

class Student(object):
    [existing definitions]

    def create_db_record(self):
        result = StudentDBRecord()
        result.ee_id = self.ee_id
        [copy over other attributes]
        return result

class StudentDBRecord(object):
    pass

I don't know if there is a way to get the inheritance to work they way you want it, but not using inheritance like I did above sidesteps the issue.

I would recommend using a database
sequence or GUIDs to ensure that each call to monteCarloBasic gets a
unique value for this column.

As another key sequence different from the simple "ident ==
row_number" I'm currently using right? I'll look into that.

The problem is that your ident always starts at 1 for each call to monteCarloBasic. So, assuming your primary key for SimAllocation consists of some combination of (session_id, ident, stud_id), you will be reusing the same primary keys for each call to monteCarloBasic. If you want to overwrite the rows with the primary keys, then you should either DELETE the old rows first or maybe use session.merge(temp_alloc) to get the "find or create" behavior. If you do NOT want to overwrite the rows, then you need to ensure that some set of columns in SimAllocation is globally unique, regardless of how many times monteCarloBasic has been called. An easy way to do this is to change ident to use a database sequence or GUID, but there are many other solutions. You probably want to group together SimAllocations from a particular call to monteCarloBasic together, in which case you would add a run_id column to SimAllocation, where rows with the same run_id were created in the same call to monteCarloBasic. I think a primary key of (run_id, session_id/trial_id, stud_id) would be good.

The thread business is indeed going over my head :S.

In this way, monteCarloBasic returns its
     results as a set of objects that are not attached to any session
     (either because they are unmapped or are transient
     <http://www.sqlalchemy.org/docs/reference/orm/sessions.html#sqlalchemy...>
     instances), which the UI thread uses to update the database. How
     you pass data from worker threads to the UI thread is dependent on
     your GUI toolkit.

My GUI toolkit is Tkinter?

Never used it, sorry. In general, every UI toolkit has a message/event queue to which you can post messages from any thread. So you could do something like:

result = monteCarloBasic(...)

def runs_in_ui_thread():
    update_database(result)

ui_toolkit.post_callback(runs_in_ui_thread)

-Conor

Az

unread,

Jun 7, 2010, 8:27:58 PM6/7/10

to sqlalchemy

> By default, deepcopy will make one copy of everything in the object
> graph reachable by the object you feed it. The scary part is that,

> unless you also pass in a /memo/ argument to each call to deepcopy, it
> will copy the entire graph /every single call/. So if you deepcopy the

> students dictionary and then deepcopy the projects dictionary, each
> student's allocated_proj attribute will not match any instance in the
> projects dictionary. This is why a use-case-specific copy function is
> recommended: it is a lot easier to predict which objects will get copied
> and which objects will be shared.

Shouldn't it match? I mean the student can only get allocated a
project if it exists in the projects dictionary... or is that not the
point?

By use-case-specific, you mean I'll have to redefine deepcopy inside
each class like this: def __deepcopy__(self): something, something?

The only two places where this is an issue is for Supervisor's
"offered_proj" attribute (a set) where, naturally, each project is an
object and in Project where "proj_sup" is naturally a supervisor
object :D

The usefulness of my data structures comes back to bite me now...

> class Student(object):
> [existing definitions]
>
> def create_db_record(self):
> result = StudentDBRecord()
> result.ee_id = self.ee_id
> [copy over other attributes]
> return result
>
> class StudentDBRecord(object):
> pass

The create_db_record function... does it have to called explicitly
somewhere or does it automatically run?

If I now had to commit my Student data to the database... what would I
do?

> I think a primary key of
> (run_id, session_id/trial_id, stud_id) would be good

If I make them all primary keys I get a composite key right? Within an
entire M-C simulation the stud_id's would repeat in groups -- so if
there are 100 simulations, each stud_id appears 100 times in that
commit.

Run_id is a fantastic idea! I'd probably have it be the date and time?
Given that the simulation takes a while to run... the time will have
changed sufficiently for uniqueness. However, then querying becomes a
pain because of whatever format the date and time data will be in...
so in that case, what is a GUID and is that something we could give to
the Monte-Carlo ourselves before the run as some sort of argument? It
would be the same for an entire run but different from run to run (so
not unique from row to row, but unique from one run set to the other).
Any thoughts on this?

> Never used it, sorry. In general, every UI toolkit has a message/event
> queue to which you can post messages from any thread. So you could do
> something like:
>
> result = monteCarloBasic(...)
>
> def runs_in_ui_thread():
> update_database(result)
>
> ui_toolkit.post_callback(runs_in_ui_thread)

Thanks for that. Now I know what to search for (message, event queue,
callback) :)

On Jun 7, 10:50 pm, Conor <conor.edward.da...@gmail.com> wrote:
> On 06/07/2010 02:56 PM, Az wrote:
>
> >> Sounds good. Just beware that deepcopy will try to make copies of all
> >> the objects referenced by your StudentUnmapped objects (assuming you
> >> didn't define __deepcopy__), so you may end up copying projects,
> >> supervisors, etc.
>
> > Good point. I'm deepcopying my students, projects and supervisors
> > dictionaries. But yes you're right, all of them have a reference to
> > other objects.
>

> > ï¿½ï¿½[Q1:] How will deepcopying the objects referenced by my

> > StudentUnmapped object affect me?
>
> By default, deepcopy will make one copy of everything in the object
> graph reachable by the object you feed it. The scary part is that,

> unless you also pass in a /memo/ argument to each call to deepcopy, it
> will copy the entire graph /every single call/. So if you deepcopy the

> > ï¿½ï¿½[Q2:] What's that all about? Something wrong with the inheritence?

Az

unread,

Jun 7, 2010, 8:50:40 PM6/7/10

to sqlalchemy

Additionally, before I tried out the create_db_record...

However whenever I try to commit:

#[§§0§]

student_records = best_node
for rec in student_records.itervalues():
MCS.session.add(rec)
MCS.session.commit()

I get:

sqlalchemy.orm.exc.UnmappedInstanceError: Class
'ProjectParties.Student' is not mapped

What is best_node?

Well it starts from here:

students_preopt = copy.deepcopy(DR.students)

This is passed to a function that spits out something called preopt
(this is where I take one of my best sessions and replicate what the
students dictionary looked like at the time)

And then I pass this to my simulated annealing algorithm which:
best_node = preopt
studOld = best_node
studNew = copy.deepcopy(studOld)

Now if studNew is better that studOld (by some weighting function):

Then best_node = studNew and we find ourselves at #[§§01]

But in short, studNew is the same as students which is mapped to
Student...

I need to get studNew into something that is mapped to
StudentDBRecord... is that where I use the function "create_db_record"
comes in?

> ...
>
> read more »

Conor

unread,

Jun 8, 2010, 10:07:30 AM6/8/10

to sqlal...@googlegroups.com

On 06/07/2010 07:27 PM, Az wrote:

By default, deepcopy will make one copy of everything in the object
graph reachable by the object you feed it. The scary part is that,
unless you also pass in a /memo/ argument to each call to deepcopy, it
will copy the entire graph /every single call/. So if you deepcopy the
students dictionary and then deepcopy the projects dictionary, each
student's allocated_proj attribute will not match any instance in the
projects dictionary. This is why a use-case-specific copy function is
recommended: it is a lot easier to predict which objects will get copied
and which objects will be shared.

Shouldn't it match? I mean the student can only get allocated a
project if it exists in the projects dictionary... or is that not the
point?

By use-case-specific, you mean I'll have to redefine deepcopy inside
each class like this: def __deepcopy__(self): something, something?

The only two places where this is an issue is for Supervisor's
"offered_proj" attribute (a set) where, naturally, each project is an
object and in Project where "proj_sup" is naturally a supervisor
object :D

The usefulness of my data structures comes back to bite me now...

In theory, the following will work, ignoring ORM deepcopy issues discussed at the beginning of this thread:

memo = {}
copied_students = copy.deepcopy(students, memo)
copied_supervisors = copy.deepcopy(supervisors, memo)
copied_projects = copy.deepcopy(projects, memo)

After you do this, memo will contain a record of all copied objects. You should examine memo.values() to see if it is copying more than you expected. If it did copy just what you expected, then my worries were unfounded.

By use-case-specific, I meant define your own copy_objects function that explicitly specifies what is copied:

def copy_objects(students, supervisors, projects):
    memo = {}
    copied_students = {}
    copied_supervisors = {}
    copied_projects = {}

    def copy_student(student):
        student_id = id(student)
        if student_id in memo:
            return memo[student_id]

        copied_student = Student()
        memo[student_id] = copied_student
        copied_student.attr1 = student.attr1
        [copy rest of student's attributes]
        if you_need_to_copy_students_project:
            copied_student.allocated_proj = copy_project(student.allocated_proj)
        return copied_student

    [define copy_supervisor]
    [define copy_project]

    copied_students = dict((key, copy_student(student)) for (key, student) in students.iteritems())
    copied_supervisors = dict((key, copy_supervisor(supervisor)) for (key, supervisor) in supervisors.iteritems())
    copied_projects = dict((key, copy_project(project)) for (key, project) in projects.iteritems())
    return (copied_students, copied_supervisors, copied_projects)

As you can see, this makes it clear which objects are copied and which are shared. In retrospect, I think I assumed you didn't want to make copies of your supervisors or projects when I recommended the use-case-specific approach, which kind of violates the spirit of deepcopy. Oh well, my bad.

class Student(object):
    [existing definitions]

    def create_db_record(self):
        result = StudentDBRecord()
        result.ee_id = self.ee_id
        [copy over other attributes]
        return result

class StudentDBRecord(object):
    pass

The create_db_record function... does it have to called explicitly
somewhere or does it automatically run?

You have to call it explicitly, e.g.:

for unmapped_student in unmapped_students:
    mapped_student = unmapped_student.create_db_record()
    # I assume you want "find or create" behavior,
    # so use session.merge instead of session.add.
    mapped_student = session.merge(mapped_student)

[...]

I think a primary key of
(run_id, session_id/trial_id, stud_id) would be good

If I make them all primary keys I get a composite key right? Within an
entire M-C simulation the stud_id's would repeat in groups -- so if
there are 100 simulations, each stud_id appears 100 times in that
commit.

Run_id is a fantastic idea! I'd probably have it be the date and time?
Given that the simulation takes a while to run... the time will have
changed sufficiently for uniqueness. However, then querying becomes a
pain because of whatever format the date and time data will be in...
so in that case, what is a GUID and is that something we could give to
the Monte-Carlo ourselves before the run as some sort of argument? It
would be the same for an entire run but different from run to run (so
not unique from row to row, but unique from one run set to the other).
Any thoughts on this?

Yes, session_id/trial_id and stud_id can repeat, and you can still group things together by run_id. Alternatively, you could add an autoincrementing primary key to SimAllocation, but I believe it is redundant since the combination (run_id, session_id/trial_id, stud_id) should be unique anyway. run_id can definitely be a datetime, but I'm not sure how well sqlite (it sounds like you're using sqlite) supports datetimes in queries (see http://www.sqlalchemy.org/docs/reference/dialects/sqlite.html#date-and-time-types). A GUID (or UUID) is just a 128-bit value (usually random); the benefit here is you can generate it on the client side and be confident that it will be unique on the server (to avoid duplicate primary key errors). Using datetimes or database sequences would also work. You can definitely pass the run_id as an argument to monteCarloBasic, or to each object's create_db_record method.

-Conor

Az

unread,

Jun 8, 2010, 11:54:59 PM6/8/10

to sqlalchemy

> memo = {}
> copied_students = copy.deepcopy(students, memo)
> copied_supervisors = copy.deepcopy(supervisors, memo)
> copied_projects = copy.deepcopy(projects, memo)
>
> After you do this, memo will contain a record of all copied objects. You
> should examine memo.values() to see if it is copying more than you
> expected. If it did copy just what you expected, then my worries were
> unfounded.

I'll let you know how that turns out soonish. While I know it's my
data, is there anything you can suggest from your experience that you
consider to be "unexpected"?

> Yes, session_id/trial_id and stud_id can repeat, and you can still group
> things together by run_id. Alternatively, you could add an
> autoincrementing primary key to SimAllocation, but I believe it is
> redundant since the combination (run_id, session_id/trial_id, stud_id)
> should be unique anyway. run_id can definitely be a datetime, but I'm
> not sure how well sqlite (it sounds like you're using sqlite) supports

> datetimes in queries (seehttp://www.sqlalchemy.org/docs/reference/dialects/sqlite.html#date-an...).

> A GUID (or UUID) is just a 128-bit value (usually random); the benefit
> here is you can generate it on the client side and be confident that it
> will be unique on the server (to avoid duplicate primary key errors).
> Using datetimes or database sequences would also work. You can
> definitely pass the run_id as an argument to monteCarloBasic, or to each
> object's create_db_record method.

Also I get why you mention three keys: run_id/guid/uuid and session_id/
trial_id alone won't suffice... but since we know there are unique
students (within each single allocation run, etc. So I can get rid of
the "ident" then? It serves no other purpose really if I can get a key
combo that's unique and works for.

I am indeed using SQLite3. I take it take my physical database has to
something like:
engine = create_engine('sqlite:///Database/spalloc.sqlite3',
echo=False)?

Also I take it I should generate the UUID (http://docs.python.org/
library/uuid.html) when I call the MonteCarloBasic function right?
Since it should be the same for each "call", I take I'll have to
generate it before the loop. Additionally, how would I actually query
a 128-bit value? Say I have a bit in my GUI where the supervisor can
put in a UUID to pull the data off the Database. How would he actually
know which UUID to put in? Any ideas?

Also once I've got my stuff in the physical database and after my
program is done, I'd call session.close() right? How do I access the
DB data then? Would I have to write some separate functions that allow
me to access the data without using (for example)
'session.query(Student)...`? This way the user (i.e. my supervisor)
won't have to keep running the readData, monteCarloBasic, etc
functions just to access the DB (that would be poor indeed!).

> datetimes in queries (seehttp://www.sqlalchemy.org/docs/reference/dialects/sqlite.html#date-an...).

Message has been deleted

Az

unread,

Jun 9, 2010, 1:44:13 AM6/9/10

to sqlalchemy

Traceback (most recent call last):

File "Main.py", line 39, in <module>
MCS.monteCarloBasic(trials)
File "/XXXX/MonteCarloSimulation.py", line 163, in monteCarloBasic
session.merge(temp_alloc)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
session.py", line 1158, in merge
self._autoflush()
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
session.py", line 897, in _autoflush
self.flush()

File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
session.py", line 1354, in flush
self._flush(objects)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
session.py", line 1432, in _flush
flush_context.execute()
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
unitofwork.py", line 261, in execute
UOWExecutor().execute(self, tasks)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
unitofwork.py", line 753, in execute
self.execute_save_steps(trans, task)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
unitofwork.py", line 768, in execute_save_steps
self.save_objects(trans, task)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
unitofwork.py", line 759, in save_objects
task.mapper._save_obj(task.polymorphic_tosave_objects, trans)

File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/
mapper.py", line 1428, in _save_obj
c = connection.execute(statement.values(value_params), params)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/
engine/base.py", line 824, in execute
return Connection.executors[c](self, object, multiparams, params)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/
engine/base.py", line 874, in _execute_clauseelement
return self.__execute_context(context)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/
engine/base.py", line 896, in __execute_context
self._cursor_execute(context.cursor, context.statement,
context.parameters[0], context=context)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/
engine/base.py", line 950, in _cursor_execute
self._handle_dbapi_exception(e, statement, parameters, cursor,
context)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/
lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/
engine/base.py", line 931, in _handle_dbapi_exception
raise exc.DBAPIError.instance(statement, parameters, e,
connection_invalidated=is_disconnect)
sqlalchemy.exc.IntegrityError: (IntegrityError) columns uid,
session_id, stud_id are not unique u'INSERT INTO sim_alloc (ident,
uid, session_id, stud_id, alloc_proj, alloc_proj_rank) VALUES
(?, ?, ?, ?, ?, ?)' [1, '1d295f48-7386-11df-8e87-00264a052efc', 1,
5796, 1100009, 1]

########

Good news: Got the UUID working in a snap.
Bad news: See error :(

Note: This happened when I started using 'session.merge(temp_alloc)'
instead of 'session.add'

########

On Jun 9, 4:54 am, Az <azfarul.is...@gmail.com> wrote:

Conor

unread,

Jun 9, 2010, 10:40:10 AM6/9/10

to sqlal...@googlegroups.com

On 06/08/2010 10:54 PM, Az wrote:

memo = {}
copied_students = copy.deepcopy(students, memo)
copied_supervisors = copy.deepcopy(supervisors, memo)
copied_projects = copy.deepcopy(projects, memo)

After you do this, memo will contain a record of all copied objects. You
should examine memo.values() to see if it is copying more than you
expected. If it did copy just what you expected, then my worries were
unfounded.

I'll let you know how that turns out soonish. While I know it's my
data, is there anything you can suggest from your experience that you
consider to be "unexpected"?

Expected: students, supervisors, projects, dictionaries of said objects, and other attribute values (strings, ints, lists, etc.). Unexpected: anything else, especially sessions, InstanceState objects, or other ORM support objects.

Yes, session_id/trial_id and stud_id can repeat, and you can still group
things together by run_id. Alternatively, you could add an
autoincrementing primary key to SimAllocation, but I believe it is
redundant since the combination (run_id, session_id/trial_id, stud_id)
should be unique anyway. run_id can definitely be a datetime, but I'm
not sure how well sqlite (it sounds like you're using sqlite) supports
datetimes in queries (seehttp://www.sqlalchemy.org/docs/reference/dialects/sqlite.html#date-an...).
A GUID (or UUID) is just a 128-bit value (usually random); the benefit
here is you can generate it on the client side and be confident that it
will be unique on the server (to avoid duplicate primary key errors).
Using datetimes or database sequences would also work. You can
definitely pass the run_id as an argument to monteCarloBasic, or to each
object's create_db_record method.

Also I get why you mention three keys: run_id/guid/uuid and session_id/
trial_id alone won't suffice... but since we know there are unique
students (within each single allocation run, etc. So I can get rid of
the "ident" then? It serves no other purpose really if I can get a key
combo that's unique and works for.

Yes, ident is redundant if you have those three columns.

I am indeed using SQLite3. I take it take my physical database has to
something like:
engine = create_engine('sqlite:///Database/spalloc.sqlite3',
echo=False)?

Looks good.

Also I take it I should generate the UUID (http://docs.python.org/
library/uuid.html) when I call the MonteCarloBasic function right?
Since it should be the same for each "call", I take I'll have to
generate it before the loop. Additionally, how would I actually query
a 128-bit value? Say I have a bit in my GUI where the supervisor can
put in a UUID to pull the data off the Database. How would he actually
know which UUID to put in? Any ideas?

Yes, one UUID generation per call to monteCarloBasic. As for knowing which UUID to query on, you can always query distinct values of the run_id column, e.g. session.query(SimAllocation.run_id).distinct().all(), and present them as a list to the user. However that doesn't really help people know which UUID to use. Using timestamps (i.e. columns of type sqlalchemy.DateTime) instead of UUIDs for SimAllocation.run_id may improve that situation.

Also once I've got my stuff in the physical database and after my
program is done, I'd call session.close() right? How do I access the
DB data then? Would I have to write some separate functions that allow
me to access the data without using (for example)
'session.query(Student)...`? This way the user (i.e. my supervisor)
won't have to keep running the readData, monteCarloBasic, etc
functions just to access the DB (that would be poor indeed!).

My impression is that readData is only used to import/migrate data into the database, and that you wouldn't call it very often.

Calling session.close() is not necessary if you have a single global session like you do. You only need it if you are worried that the database might get modified concurrently by another transaction (from a different process, session, etc.). Having said this, session.close() does not prevent you from using the session later on: it just closes out any pending transaction and expunges all object instances (including any student, supervisor, and project instances you may have added/loaded). This ensures that it sees fresh data for any future queries.

In conclusion, using session.query(Student)... should work whether you have run monteCarloBasic or not.

-Conor

Conor

unread,

Jun 9, 2010, 11:25:09 AM6/9/10

to sqlal...@googlegroups.com

The most likely cause is if you call session.add(temp_alloc) after calling session.merge(temp_alloc) for the same temp_alloc object. I noticed your original monteCarloBasic had two calls to session.add(temp_alloc); did both get changed to session.merge(temp_alloc)? If that doesn't work, can you verify that SQLAlchemy's primary key for SimAllocation matches the database's primary key for sim_alloc? What column type are you using for uid? Which call to session.merge is failing (line 163 according to your traceback), the one inside your "for rank in ranks" loop or the one outside?

Also, since you know you are creating new sim_alloc rows in the database (instead of overwriting existing ones), you can use session.add instead of session.merge. This will prevent unnecessary SELECTs to your database.

-Conor

Az

unread,

Jun 9, 2010, 3:45:30 PM6/9/10

to sqlalchemy

> Expected: students, supervisors, projects, dictionaries of said objects,
> and other attribute values (strings, ints, lists, etc.). Unexpected:
> anything else, especially sessions, InstanceState objects, or other ORM
> support objects.

Actually got some stuff like the following (copy-pasting bits from my
print output):

(<class 'sqlalchemy.orm.state.InstanceState'>,)
{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at
0x2d5beb0>, 'proj_id': 1100034, 'postsim_probs': [], 'proj_sup': 1291,
'presim_pop': 0, 'own_project': False, 'allocated': False,
'proj_name': 'MPC on a Chip', 'blocked': False}

Stuff like that :S

> Calling session.close() is not necessary if you have a single global
> session like you do. You only need it if you are worried that the
> database might get modified concurrently by another transaction (from a
> different process, session, etc.). Having said this, session.close()
> does not prevent you from using the session later on: it just closes out
> any pending transaction and expunges all object instances (including any
> student, supervisor, and project instances you may have added/loaded).
> This ensures that it sees fresh data for any future queries.
>
> In conclusion, using session.query(Student)... should work whether you
> have run monteCarloBasic or not.

Excellent :)

> The most likely cause is if you call session.add(temp_alloc) after
> calling session.merge(temp_alloc) for the same temp_alloc object. I
> noticed your original monteCarloBasic had two calls to
> session.add(temp_alloc); did both get changed to
> session.merge(temp_alloc)? If that doesn't work, can you verify that
> SQLAlchemy's primary key for SimAllocation matches the database's
> primary key for sim_alloc? What column type are you using for uid? Which
> call to session.merge is failing (line 163 according to your traceback),
> the one inside your "for rank in ranks" loop or the one outside?

Oh yeah good point, they're separate calls. Basically for the one in
"for rank in ranks"
adds for a student getting a project, the other adds if a student
doesn't get a project since we want
to track all students (allocated or not, since the state of being
unallocated is what gives
us motivation to optimise the results).

Anyway, session.merge() is for overwriting previously existing values
right? Now thanks to the UUID I can add multiple calls to
monteCarloBasic() to my physical database :)

I basically wrote a small function that, for everytime the
monteCarloBasic() is called, will append the UUID, the number of
trials ran and the date-time to a text file. My supervisor would have
to copy paste that into a GUI text field or the command line but it's
not that much of a hassle, given the usefulness of the database.

Conor

unread,

Jun 9, 2010, 4:46:59 PM6/9/10

to sqlal...@googlegroups.com

On 06/09/2010 02:45 PM, Az wrote:

>> Expected: students, supervisors, projects, dictionaries of said objects,
>> and other attribute values (strings, ints, lists, etc.). Unexpected:
>> anything else, especially sessions, InstanceState objects, or other ORM
>> support objects.
>>
> Actually got some stuff like the following (copy-pasting bits from my
> print output):
>
> (<class 'sqlalchemy.orm.state.InstanceState'>,)
> {'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at
> 0x2d5beb0>, 'proj_id': 1100034, 'postsim_probs': [], 'proj_sup': 1291,
> 'presim_pop': 0, 'own_project': False, 'allocated': False,
> 'proj_name': 'MPC on a Chip', 'blocked': False}
>
> Stuff like that :S
>

I'm not sure what that printout indicates. Try this as your debug printout:

def get_memo_type_count(memo):
retval = {}
for obj in memo.itervalues():
type_ = obj.__class__
retval[type_] = retval.get(type_, 0) + 1
return retval

[perform deep copies]
type_count = get_memo_type_count(memo)
import pprint
pprint.pprint(type_count)

This will tell you, e.g. how may Student objects were copied, how many
InstanceState objects were copied, etc. Remember that you will have to
override __deepcopy__ on your mapped classes or use the
use-case-specific copy function to prevent ORM attributes (such as
_sa_instance_state) from being copied.

> [...]

>> The most likely cause is if you call session.add(temp_alloc) after
>> calling session.merge(temp_alloc) for the same temp_alloc object. I
>> noticed your original monteCarloBasic had two calls to
>> session.add(temp_alloc); did both get changed to
>> session.merge(temp_alloc)? If that doesn't work, can you verify that
>> SQLAlchemy's primary key for SimAllocation matches the database's
>> primary key for sim_alloc? What column type are you using for uid? Which
>> call to session.merge is failing (line 163 according to your traceback),
>> the one inside your "for rank in ranks" loop or the one outside?
>>
> Oh yeah good point, they're separate calls. Basically for the one in
> "for rank in ranks"
> adds for a student getting a project, the other adds if a student
> doesn't get a project since we want
> to track all students (allocated or not, since the state of being
> unallocated is what gives
> us motivation to optimise the results).
>

Your original monteCarloBasic definition had this:

for rank in ranks:

proj = random.choice(list(student.preferences[rank]))
if not (proj.allocated or proj.blocked or proj.own_project):

[...]
session.add(temp_alloc) # #1
break

ident += 1
session.add(temp_alloc) # #2

session.add #1 is redundant since #2 gets called regardless of whether
the student gets allocated a project or not (ignoring exceptions). Just
a minor nitpick.

> Anyway, session.merge() is for overwriting previously existing values
> right? Now thanks to the UUID I can add multiple calls to
> monteCarloBasic() to my physical database :)
>

session.merge gives you "find or create" behavior: look for an existing
object in the database, or create a new one if no existing object is
found. Note that session.merge requires you to completely fill in the
object's primary key whereas session.add does not.

> I basically wrote a small function that, for everytime the
> monteCarloBasic() is called, will append the UUID, the number of
> trials ran and the date-time to a text file. My supervisor would have
> to copy paste that into a GUI text field or the command line but it's
> not that much of a hassle, given the usefulness of the database.
>

Sounds pretty ugly. What if you add extra tables to represent runs
and/or trials?

class Run(Base):
# Having a separate table here gives you nice auto-incrementing run ids
# and lets you attach additional information to a run, such as timestamp,
# human-supplied comment, etc.
__tablename__ = 'run'
id = Column(Integer, primary_key=True)
timestamp = Column(DateTime, nullable=False)
# comment = Column(UnicodeText(100), nullable=False)

trials = relationship('Trial',
back_populates='run',
order_by=lambda: Trial.id.asc())

class Trial(Base):
# Having a separate table here is of dubious value, but hey it makes the
# relationships a bit nicer!
__tablename__ = 'trial'
__table_args__ = (PrimaryKeyConstraint('run_id', 'id'), {})
run_id = Column(Integer, ForeignKey('run.id'))
id = Column(Integer)

run = relationship('Run', back_populates='trials')
sim_allocs = relationship('SimAllocation', back_populates='trial')

class SimAllocation(Base):
...
__table_args__ = (PrimaryKeyConstraint('run_id', 'trial_id', 'stud_id'),
ForeignKeyConstraint(['run_id', 'trial_id'],
['trial.run_id', 'trial.id']),
{})

run_id = Column(Integer)
trial_id = Column(Integer)
stud_id = Column(Integer)

trial = relationship('Trial', back_populates='sim_allocs')

-Conor

Az

unread,

Jun 10, 2010, 1:33:28 PM6/10/10

to sqlalchemy

The pprintout was:

{<type 'collections.defaultdict'>: 156,
<type 'bool'>: 2,
<type 'float'>: 1,
<type 'int'>: 538,
<type 'list'>: 1130,
<type 'dict'>: 867,
<type 'NoneType'>: 1,
<type 'set'>: 932,
<type 'str'>: 577,
<type 'tuple'>: 1717,
<type 'type'>: 5,
<class 'sqlalchemy.util.symbol'>: 1,
<class 'sqlalchemy.orm.state.InstanceState'>: 236,
<class 'ProjectParties.Student'>: 156,
<class 'ProjectParties.Supervisor'>: 39,
<class 'ProjectParties.Project'>: 197}

I think the InstanceStates come from the Supervisor and Project
classes (197+39 = 236)

Ah true, my solution was rather hacky and not very elegant.

Your class definitions... are you defining both table and Class in one
go? Would I have to change the way my monteCarloBasic creates
instances of SimAllocation?

Az

unread,

Jun 10, 2010, 1:56:26 PM6/10/10

to sqlalchemy

So I laid them out like this:

class Run(Base):
# For autoincrementing run IDs
# Allows addition of more information to a run

__tablename__ = 'run'
id = Column(Integer, primary_key=True)
timestamp = Column(DateTime, nullable=False)
# comment = Column(UnicodeText(100), nullable=False)

trials = relationship('Trial',
back_populates='run',
order_by=lambda: Trial.id.asc())

class Trial(Base):
# Having a separate table here is of dubious value, but hey it
makes the
# relationships a bit nicer!
__tablename__ = 'trial'
__table_args__ = (PrimaryKeyConstraint('run_id', 'id'), {})
run_id = Column(Integer, ForeignKey('run.id'))
id = Column(Integer)

run = relationship('Run', back_populates='trials')
sim_allocs = relationship('SimAllocation', back_populates='trial')

class SimAllocation(Base):

#
__tablename__ = 'sim_alloc'

__table_args__ = (PrimaryKeyConstraint('run_id', 'trial_id',
'stud_id'),
ForeignKeyConstraint(['run_id', 'trial_id'],
['trial.run_id',
'trial.id']),
{})

run_id = Column(Integer)
trial_id = Column(Integer)
stud_id = Column(Integer)

trial = relationship('Trial', back_populates='sim_allocs')

def __init__(self, ident, uid, session_id, stud_id, alloc_proj_rank):
self.ident = ident
self.uid = uid
self.session_id = session_id
self.stud_id = stud_id
self.alloc_proj = None
self.alloc_proj_ref = None
self.alloc_proj_rank = alloc_proj_rank

def __repr__(self):
return str(self)

def __str__(self):

return "Row: %s UID: %s - %s: Student: %s (Project: %s - Rank: %s)" %
(self.ident, self.uid, self.session_id, self.stud_id, self.alloc_proj,
self.alloc_proj_rank)

#####

The original mapping was:

simulation_allocation = Table('sim_alloc', metadata,
Column('ident', Integer),
Column('uid', String, primary_key=True),
Column('session_id', Integer, primary_key=True),
Column('stud_id', Integer, ForeignKey('studs.ee_id'),
primary_key=True),
Column('alloc_proj', Integer, ForeignKey('projs.proj_id')),
Column('alloc_proj_rank', Integer)
)

mapper(SimAllocation, simulation_allocation, properties={'stud' :
relation(StudentDBRecord), 'proj' : relation(Project)})

Of course, I'd get rid of the project relationship since an
allocated_project and allocated_proj_ref *can* be a NoneType....
(realised that right now!)

Additionally, I'd like to maintain the ForeignKey relationship with
the StudentDRRecord table for pulling in info about a student.

Also, I've not got rid of ident because I don't know how else to map
SimAllocation to a dictionary as well. The only thing I could use for
keys was the IDENT before but now that we have a composite key, what
happens to the dictionary? However, the dictionary will just hold
information for the current run really.

Az

unread,

Jun 10, 2010, 5:32:53 PM6/10/10

to sqlalchemy

Let me take a guess:

class Supervisor(object):
def __init__(self, ee_id, name, original_quota, loading_limit):

self.ee_id = ee_id
self.name = name

self.original_quota = original_quota
self.loading_limit = loading_limit
self.predecr_quota = 0
self.offered_proj = set()
self.total_prealloc_pop = 0
self.total_postalloc_pop = 0

def __repr__(self):
return str(self)

def __str__(self):

return self.name
return "%s %s %s (Offered projects: %s)" %(self.ee_id, self.name,
self.predecr_quota, self.offered_proj)

So *inside* the Supervisor class would I define it like this (trying
to have a go at it)?

def __deepcopy__(self, memo):
dc = type(self)()
dc.__dict__.update(self.__dict__)
for attr in dir(supervisor):
if not attr.startswight('__'):
self.attr = deepcopy(self.attr, memo)

So this only overrides __deepcopy__ when I call it for a Supervisor
and not for any of the other classes right?

> ...
>
> read more »

Az

unread,

Jun 12, 2010, 8:45:12 PM6/12/10

to sqlalchemy

Hi Conor,

Many apologies for being pushy but since I'm pretty much in the
processing of finishing up my code (due in two days), I wonder if you
could just take a look at the last three posts of mine---these
constitute the final hurdle and I'll be done :)

Cheers,

Az

On Jun 9, 9:46 pm, Conor <conor.edward.da...@gmail.com> wrote:

Conor

unread,

Jun 28, 2010, 6:22:15 PM6/28/10

to sqlal...@googlegroups.com

The location of the __deepcopy__ method is correct, but there are several problems with the implementation:

You are modifying self, when you want to modify dc.
You are not returning dc.
dir(supervisor) (I assume you meant dir(self)) will include attributes from the class object itself, but you don't want to make copies of those attributes. I would recommend iterating over self.__dict__ instead or manually specifying the attributes to be copied.
attr.startswith('__') will not catch '_sa_instance_state': you probably want attr.startswith('_') or attr.startswith('_sa_') instead.

So this only overrides __deepcopy__ when I call it for a Supervisor
and not for any of the other classes right?

Correct.

-Conor

Conor

unread,

Jun 28, 2010, 6:27:09 PM6/28/10

to sqlal...@googlegroups.com

On 06/10/2010 12:33 PM, Az wrote:

The pprintout was:

{<type 'collections.defaultdict'>: 156,
 <type 'bool'>: 2,
 <type 'float'>: 1,
 <type 'int'>: 538,
 <type 'list'>: 1130,
 <type 'dict'>: 867,
 <type 'NoneType'>: 1,
 <type 'set'>: 932,
 <type 'str'>: 577,
 <type 'tuple'>: 1717,
 <type 'type'>: 5,
 <class 'sqlalchemy.util.symbol'>: 1,
 <class 'sqlalchemy.orm.state.InstanceState'>: 236,
 <class 'ProjectParties.Student'>: 156,
 <class 'ProjectParties.Supervisor'>: 39,
 <class 'ProjectParties.Project'>: 197}

I think the InstanceStates come from the Supervisor and Project
classes (197+39 = 236)

Sounds right. You will need to override __deepcopy__ on those classes as well.

I assumed you were using the declarative extension (sqlalchemy.ext.declarative) to generate the table, class, and mapper in one go. It's not at all necessary: you can define the tables, classes, and mappers separately. Just use what you are most comfortable with.

-Conor

Reply all

Reply to author

Forward