> +++ Questions +++
>
> 1. Is this the correct way to use sessions or am I sort of abusing
> them?
I don't see any poor patterns of use above.
> 2. When should I close a session?
When you no longer need any of the objects associated with it, or when any remaining objects are in a state such that you will re-merge them into a new session before you next use them. The Session in its default state of autocommit=False is just like going to your database and starting a transaction: you do some work, and when you're done you close the transaction, at which point all the data associated with that transaction (i.e. your ORM objects) is essentially "invalid"; other transactions may be modifying that data. Your objects are an extension of the Session, which should be considered an object-oriented window onto a database transaction.
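A minimal sketch of that transaction-window lifecycle, assuming the usual sessionmaker setup (the in-memory SQLite URL and the trivial query are purely illustrative):

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

# Illustrative in-memory database; a real application would point
# the engine at its actual database URL.
engine = create_engine('sqlite://')
Session = sessionmaker(bind=engine)

session = Session()   # opens a window onto a new transaction
value = session.execute(text('SELECT 1')).scalar()
session.commit()      # ends the transaction; loaded objects are expired
session.close()       # done with the session entirely
```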
> 3. I got the following error after trying to use copy.deepcopy() on
> one of my dictionaries.
>
> "attribute refresh operation cannot proceed" % (state_str(state)))
> sqlalchemy.exc.UnboundExecutionError: Instance <Project at 0x24c5c50>
> is not bound to a Session; attribute refresh operation cannot proceed
Don't call deepcopy() on a structure that contains ORM objects if their owning session has been closed. deepcopy on ORM objects has issues that prevent it from working as you'd expect. You'd be better off building copy constructors, i.e. def copy(self): return FooBar(...).
> "Owning session has been closed"? Can I still use deepcopy if the
> session has not been closed?
deepcopy has issues because SQLAlchemy places extra information on your objects, i.e. an _sa_instance_state attribute, that you don't want in your copy. You *do*, however, need one to exist on your object. Therefore deepcopy is not supported right now by SQLAlchemy ORM objects. There are ways to manually blow away the old _sa_instance_state and put a new one on the object, but the most straightforward approach is to make a new object with __init__() and set up the attributes that are significant, instead of doing a full deep copy.
If you really do want to use deepcopy, you'd have to implement __deepcopy__() on your objects and ensure that a new _sa_instance_state is set up; there are functions in sqlalchemy.orm.attributes which can help with that. This *should* be made an official SQLAlchemy recipe, but we haven't gotten around to it.
> How can I stop it from closing the
> sessions?
Nothing in SQLAlchemy closes sessions. Your program is doing that.
He just means creating a new instance of your mapped class and setting
its attributes manually, e.g.:

def copy(self):
    copy = MyMappedClass()
    copy.attr1 = self.attr1
    copy.attr2 = self.attr2
    return copy
>> if you do really want to use deepcopy, you'd have to implement __deepcopy__() on your objects and ensure that a new _sa_instance_state is set up,
>> there are functions in sqlalchemy.orm.attributes which can help with that. This *should* be made an official SQLA recipe, but we haven't gotten
>> around to it.
>>
> Could you please explain what you mean by that? Would it be possible
> to give me an idea or an example of how such would work?
>
>
In theory you can use a generic __deepcopy__ implementation for ORM
classes. A very simple version might be:

from copy import deepcopy
from sqlalchemy.orm import class_mapper

def orm_deepcopy(self, memo):
    mapper = class_mapper(self.__class__)
    result = self.__class__()
    memo[id(self)] = result
    for prop in mapper.iterate_properties:
        value = getattr(self, prop.key)
        setattr(result, prop.key, deepcopy(value, memo))
    return result

class MyMappedClass(...):
    __deepcopy__ = orm_deepcopy
Beware that this implementation does not handle overlapping properties
well (e.g. relations and their corresponding foreign key columns),
lazy-loading properties, read-only properties, clearing out
auto-incrementing primary keys, etc. I would not recommend this
approach, as a use-case-specific copy() method will be much easier to
tailor to your needs.
>>> How can I stop it from closing the
>>> sessions?
>>>
>
>> nothing in SQLA closes sessions. Your program is doing that.
>>
> I'm not issuing a session.close() anywhere (I checked). Are there any
> other ways of closing a session besides that? (If the answer is
> "Plenty", don't worry about it... I'll try to track it down then)
>
If you are in a web framework, it may be closing the session for you
(usually by calling Session.remove() on a ScopedSession). Additionally,
are you sure that your object-to-copy is not transient when you make
your deepcopy?
-Conor
I see nothing that indicates that they would NOT see the same session,
but I do have some comments:
> Additionally, when I save to a physical database file, what happens every time I run monteCarloBasic(trials), since it writes to the database? Will it rewrite it every time? Or will it keep appending to it?
I don't see anything that would indicate rewriting the database in
the code that you have shown (except maybe as a side-effect of your resetData
function that I noted above). Also, you may get duplicate primary key
errors like I mentioned above.
> Hi Conor, basically I sat down and made some decisions and changes. I've created an actual copy of the Student class, as in I've now got two classes, Student and StudentUnmapped. The unmapped one has the same attributes as the mapped one, except for being... well, unmapped. Now I can a) use deepcopy and b) change the objects without worry. resetData() will act on the unmapped dictionary as well, so the mapped object remains safe and unchanged.
Sounds good. Just beware that deepcopy will try to make copies of
all the objects referenced by your StudentUnmapped objects
(assuming you didn't define __deepcopy__), so you may end up
copying projects, supervisors, etc.
It sounds like you want to a) INSERT students/projects/supervisors that
don't yet exist in the database, and b) UPDATE
students/projects/supervisors that do exist in the database. If so, I
think you want to use session.merge instead of session.add.
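A sketch of the difference, using a toy Student mapping rather than the poster's actual schema: session.merge looks up the primary key first, so running it twice with the same key UPDATEs the existing row, where a second session.add would try to INSERT a duplicate.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Student(Base):
    __tablename__ = 'student'
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine('sqlite://')       # illustrative in-memory database
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# No row with id=1 yet, so merge behaves like an INSERT.
session.merge(Student(id=1, name='Alice'))
session.commit()

# Same primary key again: merge finds the row and UPDATEs it.
session.merge(Student(id=1, name='Alicia'))
session.commit()

names = [s.name for s in session.query(Student).all()]  # ['Alicia']
```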
> 2. Say I've now got a physical database and I've run my Monte-Carlo multiple times. I think I'd either want to a) have the original M-C sessions be overwritten or b) create another set of data, perhaps using the date to differentiate the two. How can I do this? Can I query each one separately? Or am I better off just with an overwrite?
You can indeed append the new set of data to the existing data. You
would just need another column in SimAllocation to
distinguish between different calls to monteCarloBasic. I
would recommend using a database sequence or GUIDs to ensure that each
call to monteCarloBasic gets a unique value for this column.
> 3. Finally, regarding the GUI. If each function indicates a separate "thread", then in that case, yes, with my GUI I'd be passing the session from thread to thread, since I'm no longer just running Main.py but rather the constituent functions one by one. How do I deal with this? The reason I used the database was persistence: I definitely want my data to persist between threads (and after I've closed my program) so I can use it for all manner of useful calculations, queries and output.
Just to be clear, by "thread" I mean actual system threads spawned
by the thread or threading module. If this is
indeed what you want, then you probably have a UI thread and a worker
thread that runs monteCarloBasic. Since you should not share
a single session object between threads, you can:
Again, this thread business is probably overkill for your project,
so you may want to avoid it altogether.
-Conor
>> Sounds good. Just beware that deepcopy will try to make copies of all the objects referenced by your StudentUnmapped objects (assuming you didn't define __deepcopy__), so you may end up copying projects, supervisors, etc.
>
> Good point. I'm deepcopying my students, projects and supervisors dictionaries. But yes, you're right, all of them have a reference to other objects. How will deepcopying the objects referenced by my StudentUnmapped object affect me?
By default, deepcopy will make one copy of everything in the object
graph reachable by the object you feed it. The scary part is that,
unless you also pass in a memo argument to each call to
deepcopy, it will copy the entire graph every single call. So
if you deepcopy the students dictionary and then deepcopy the projects
dictionary, each student's allocated_proj attribute will not
match any instance in the projects dictionary. This is why a
use-case-specific copy function is recommended: it is a lot easier to
predict which objects will get copied and which objects will be shared.
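The every-single-call behavior is easy to demonstrate with plain, non-ORM objects; Student, Project, and allocated_proj here are stand-ins for the thread's mapped classes:

```python
from copy import deepcopy

# Plain stand-ins for the mapped classes discussed in the thread.
class Project(object):
    pass

class Student(object):
    def __init__(self, proj):
        self.allocated_proj = proj

proj = Project()
students = {'s1': Student(proj)}
projects = {'p1': proj}

# Two independent deepcopy calls copy the shared project twice.
s_copy = deepcopy(students)
p_copy = deepcopy(projects)
broken = s_copy['s1'].allocated_proj is p_copy['p1']    # False

# A shared memo dict makes the second call reuse the first copy.
memo = {}
s_copy2 = deepcopy(students, memo)
p_copy2 = deepcopy(projects, memo)
shared = s_copy2['s1'].allocated_proj is p_copy2['p1']  # True
```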
I don't see any benefit to making StudentDBRecord inherit
from Student. Try this:

class Student(object):
    [existing definitions]

    def create_db_record(self):
        result = StudentDBRecord()
        result.ee_id = self.ee_id
        [copy over other attributes]
        return result

class StudentDBRecord(object):
    pass

I don't know if there is a way to get the inheritance to work the way you want it, but not using inheritance like I did above sidesteps the issue.
>> I would recommend using a database sequence or GUIDs to ensure that each call to monteCarloBasic gets a unique value for this column.
>
> As another key sequence, different from the simple "ident == row_number" I'm currently using, right? I'll look into that.
The problem is that your ident always starts at 1 for each
call to monteCarloBasic. So, assuming your primary key for
SimAllocation consists of some combination of (session_id, ident,
stud_id), you will be reusing the same primary keys for each call to monteCarloBasic.
If you want to overwrite the rows with the same primary keys, then you
should either DELETE the old rows first or maybe use session.merge(temp_alloc)
to get the "find or create" behavior. If you do NOT want to overwrite
the rows, then you need to ensure that some set of columns in SimAllocation
is globally unique, regardless of how many times monteCarloBasic
has been called. An easy way to do this is to change ident to
use a database sequence or GUID, but there are many other solutions.
You probably want to group SimAllocations from a
particular call to monteCarloBasic together, in which case
you would add a run_id column to SimAllocation, where rows
with the same run_id were created in the same call to monteCarloBasic.
I think a primary key of (run_id, session_id/trial_id, stud_id) would
be good.
> The thread business is indeed going over my head :S.
>
>> In this way, monteCarloBasic returns its results as a set of objects that are not attached to any session (either because they are unmapped or are transient <http://www.sqlalchemy.org/docs/reference/orm/sessions.html#sqlalchemy...> instances), which the UI thread uses to update the database. How you pass data from worker threads to the UI thread is dependent on your GUI toolkit.
>
> My GUI toolkit is Tkinter?
Never used it, sorry. In general, every UI toolkit has a
message/event queue to which you can post messages from any thread. So
you could do something like:
result = monteCarloBasic(...)

def runs_in_ui_thread():
    update_database(result)

ui_toolkit.post_callback(runs_in_ui_thread)
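For Tkinter in particular, a common pattern is for the worker to push its result onto a queue.Queue and for the UI thread to drain that queue, typically rescheduling itself with widget.after(). The handoff itself is toolkit-agnostic; all names below are illustrative and monteCarloBasic is simulated:

```python
import queue
import threading

results = queue.Queue()

def worker():
    # Stand-in for: results.put(monteCarloBasic(trials))
    results.put({'stud_1': 'proj_42'})

t = threading.Thread(target=worker)
t.start()
t.join()

def poll_results():
    # In a Tkinter app this would run in the UI thread, rescheduled
    # periodically with something like root.after(100, poll_results).
    try:
        return results.get_nowait()
    except queue.Empty:
        return None

allocation = poll_results()
```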
-Conor
>> By default, deepcopy will make one copy of everything in the object graph reachable by the object you feed it. The scary part is that, unless you also pass in a memo argument to each call to deepcopy, it will copy the entire graph every single call. So if you deepcopy the students dictionary and then deepcopy the projects dictionary, each student's allocated_proj attribute will not match any instance in the projects dictionary. This is why a use-case-specific copy function is recommended: it is a lot easier to predict which objects will get copied and which objects will be shared.
>
> Shouldn't it match? I mean, the student can only get allocated a project if it exists in the projects dictionary... or is that not the point? By use-case-specific, do you mean I'll have to redefine deepcopy inside each class, like def __deepcopy__(self): something, something? The only two places where this is an issue are Supervisor's "offered_proj" attribute (a set) where, naturally, each project is an object, and Project, where "proj_sup" is naturally a supervisor object :D The usefulness of my data structures comes back to bite me now...
In theory, the following will work, ignoring ORM deepcopy issues
discussed at the beginning of this thread:
memo = {}
copied_students = copy.deepcopy(students, memo)
copied_supervisors = copy.deepcopy(supervisors, memo)
copied_projects = copy.deepcopy(projects, memo)
After you do this, memo will contain a record of all
copied objects. You should examine memo.values() to see if it
is copying more than you expected. If it did copy just what you
expected, then my worries were unfounded.
By use-case-specific, I meant define your own copy_objects function
that explicitly specifies what is copied:
def copy_objects(students, supervisors, projects):
    memo = {}

    def copy_student(student):
        student_id = id(student)
        if student_id in memo:
            return memo[student_id]
        copied_student = Student()
        memo[student_id] = copied_student
        copied_student.attr1 = student.attr1
        [copy rest of student's attributes]
        if you_need_to_copy_students_project:
            copied_student.allocated_proj = copy_project(student.allocated_proj)
        return copied_student

    [define copy_supervisor]
    [define copy_project]

    copied_students = dict((key, copy_student(student)) for (key, student) in students.iteritems())
    copied_supervisors = dict((key, copy_supervisor(supervisor)) for (key, supervisor) in supervisors.iteritems())
    copied_projects = dict((key, copy_project(project)) for (key, project) in projects.iteritems())
    return (copied_students, copied_supervisors, copied_projects)
As you can see, this makes it clear which objects are copied and
which are shared. In retrospect, I think I assumed you didn't want to
make copies of your supervisors or projects when I recommended the
use-case-specific approach, which kind of violates the spirit of
deepcopy. Oh well, my bad.
>> class Student(object):
>>     [existing definitions]
>>
>>     def create_db_record(self):
>>         result = StudentDBRecord()
>>         result.ee_id = self.ee_id
>>         [copy over other attributes]
>>         return result
>>
>> class StudentDBRecord(object):
>>     pass
>
> The create_db_record function... does it have to be called explicitly somewhere or does it run automatically?
You have to call it explicitly, e.g.:
for unmapped_student in unmapped_students:
    mapped_student = unmapped_student.create_db_record()
    # I assume you want "find or create" behavior,
    # so use session.merge instead of session.add.
    mapped_student = session.merge(mapped_student)
    [...]
>> I think a primary key of (run_id, session_id/trial_id, stud_id) would be good.
>
> If I make them all primary keys I get a composite key, right? Within an entire M-C simulation the stud_ids would repeat in groups -- so if there are 100 simulations, each stud_id appears 100 times in that commit. run_id is a fantastic idea! I'd probably have it be the date and time? Given that the simulation takes a while to run, the time will have changed sufficiently for uniqueness. However, querying then becomes a pain because of whatever format the date and time data will be in... so in that case, what is a GUID, and is that something we could give to the Monte-Carlo ourselves before the run as some sort of argument? It would be the same for an entire run but different from run to run (so not unique from row to row, but unique from one run set to the other). Any thoughts on this?
Yes, session_id/trial_id and stud_id
can repeat, and you can still group things together by run_id.
Alternatively, you could add an autoincrementing primary key to
SimAllocation, but I believe it is redundant since the combination (run_id,
session_id/trial_id, stud_id) should be
unique anyway. run_id can definitely be a datetime, but I'm
not sure how well sqlite (it sounds like you're using sqlite) supports
datetimes in queries (see http://www.sqlalchemy.org/docs/reference/dialects/sqlite.html#date-and-time-types).
A GUID (or UUID) is just a 128-bit value (usually random); the benefit
here is you can generate it on the client side and be confident that it
will be unique on the server (to avoid duplicate primary key errors).
Using datetimes or database sequences would also work. You can
definitely pass the run_id as an argument to monteCarloBasic,
or to each object's create_db_record method.
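A sketch of the client-side generation, with monte_carlo_basic standing in for the real monteCarloBasic (one UUID generated per call, shared by every row that call produces):

```python
import uuid

def monte_carlo_basic(trials, run_id=None):
    # Stand-in for the real simulation: each trial yields one row
    # tagged with the run-wide identifier.
    if run_id is None:
        run_id = uuid.uuid4()          # one 128-bit value per call
    return [(str(run_id), trial) for trial in range(trials)]

rows = monte_carlo_basic(3)
run_ids = set(run_id for (run_id, _) in rows)   # exactly one distinct id
```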
-Conor
>> memo = {}
>> copied_students = copy.deepcopy(students, memo)
>> copied_supervisors = copy.deepcopy(supervisors, memo)
>> copied_projects = copy.deepcopy(projects, memo)
>>
>> After you do this, memo will contain a record of all copied objects. You should examine memo.values() to see if it is copying more than you expected. If it did copy just what you expected, then my worries were unfounded.
>
> I'll let you know how that turns out soonish. While I know it's my data, is there anything you can suggest from your experience that you consider to be "unexpected"?
Expected: students, supervisors, projects, dictionaries of said objects, and other attribute values (strings, ints, lists, etc.). Unexpected: anything else, especially sessions, InstanceState objects, or other ORM support objects.
>> Yes, session_id/trial_id and stud_id can repeat, and you can still group things together by run_id. [...] You can definitely pass the run_id as an argument to monteCarloBasic, or to each object's create_db_record method.
>
> Also, I get why you mention three keys: run_id/guid/uuid and session_id/trial_id alone won't suffice... but since we know the students are unique within each single allocation run, can I get rid of the "ident" then? It serves no other purpose really if I can get a key combo that's unique and works.
Yes, ident is redundant if you have those three columns.
> I am indeed using SQLite3. I take it my physical database has to be something like:
>
>     engine = create_engine('sqlite:///Database/spalloc.sqlite3', echo=False)?
Looks good.
> Also I take it I should generate the UUID (http://docs.python.org/library/uuid.html) when I call the monteCarloBasic function, right? Since it should be the same for each "call", I take it I'll have to generate it before the loop. Additionally, how would I actually query a 128-bit value? Say I have a bit in my GUI where the supervisor can put in a UUID to pull the data off the database. How would he actually know which UUID to put in? Any ideas?
Yes, one UUID generation per call to monteCarloBasic. As
for knowing which UUID to query on, you can always query distinct
values of the run_id column, e.g. session.query(SimAllocation.run_id).distinct().all(),
and present them as a list to the user. However that doesn't really
help people know which UUID to use. Using timestamps (i.e. columns of
type sqlalchemy.DateTime) instead of UUIDs for SimAllocation.run_id
may improve that situation.
> Also, once I've got my stuff in the physical database and after my program is done, I'd call session.close(), right? How do I access the DB data then? Would I have to write some separate functions that allow me to access the data without using (for example) session.query(Student)...? This way the user (i.e. my supervisor) won't have to keep running the readData, monteCarloBasic, etc. functions just to access the DB (that would be poor indeed!).
My impression is that readData is only used to
import/migrate data into the database, and that you wouldn't call it
very often.
Calling session.close() is not necessary if you have a
single global session like you do. You only need it if you are worried
that the database might get modified concurrently by another
transaction (from a different process, session, etc.). Having said
this, session.close() does not prevent you from using the
session later on: it just closes out any pending transaction and
expunges all object instances (including any student, supervisor, and
project instances you may have added/loaded). This ensures that it sees
fresh data for any future queries.
In conclusion, using session.query(Student)... should work
whether you have run monteCarloBasic or not.
-Conor
The most likely cause is if you call session.add(temp_alloc)
after calling session.merge(temp_alloc) for the same temp_alloc
object. I noticed your original monteCarloBasic had two calls
to session.add(temp_alloc); did both get changed to session.merge(temp_alloc)?
If that doesn't work, can you verify that SQLAlchemy's primary key for SimAllocation
matches the database's primary key for sim_alloc? What column
type are you using for uid? Which call to session.merge
is failing (line 163 according to your traceback), the one inside your "for
rank in ranks" loop or the one outside?
Also, since you know you are creating new sim_alloc rows
in the database (instead of overwriting existing ones), you can use session.add
instead of session.merge. This will prevent unnecessary
SELECTs to your database.
-Conor
>> Expected: students, supervisors, projects, dictionaries of said objects,
>> and other attribute values (strings, ints, lists, etc.). Unexpected:
>> anything else, especially sessions, InstanceState objects, or other ORM
>> support objects.
>>
> Actually got some stuff like the following (copy-pasting bits from my
> print output):
>
> (<class 'sqlalchemy.orm.state.InstanceState'>,)
> {'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at
> 0x2d5beb0>, 'proj_id': 1100034, 'postsim_probs': [], 'proj_sup': 1291,
> 'presim_pop': 0, 'own_project': False, 'allocated': False,
> 'proj_name': 'MPC on a Chip', 'blocked': False}
>
> Stuff like that :S
>
I'm not sure what that printout indicates. Try this as your debug printout:
def get_memo_type_count(memo):
    retval = {}
    for obj in memo.itervalues():
        type_ = obj.__class__
        retval[type_] = retval.get(type_, 0) + 1
    return retval

[perform deep copies]
type_count = get_memo_type_count(memo)

import pprint
pprint.pprint(type_count)
This will tell you, e.g., how many Student objects were copied, how many
InstanceState objects were copied, etc. Remember that you will have to
override __deepcopy__ on your mapped classes or use the
use-case-specific copy function to prevent ORM attributes (such as
_sa_instance_state) from being copied.
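A minimal sketch of such an override (the class and attributes here are illustrative): it copies everything in __dict__ except the _sa_instance_state bookkeeping, so the result comes back as a plain unmapped copy, consistent with the caveats at the start of the thread:

```python
from copy import deepcopy

class Supervisor(object):
    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)      # bypass __init__
        memo[id(self)] = result        # handles reference cycles
        for key, value in self.__dict__.items():
            if key == '_sa_instance_state':
                continue               # skip SQLAlchemy bookkeeping
            setattr(result, key, deepcopy(value, memo))
        return result

sup = Supervisor()
sup.name = 'Dr. Smith'
sup._sa_instance_state = object()      # stands in for the real InstanceState
sup_copy = deepcopy(sup)               # plain copy, no _sa_instance_state
```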
> [...]
>> The most likely cause is if you call session.add(temp_alloc) after
>> calling session.merge(temp_alloc) for the same temp_alloc object. I
>> noticed your original monteCarloBasic had two calls to
>> session.add(temp_alloc); did both get changed to
>> session.merge(temp_alloc)? If that doesn't work, can you verify that
>> SQLAlchemy's primary key for SimAllocation matches the database's
>> primary key for sim_alloc? What column type are you using for uid? Which
>> call to session.merge is failing (line 163 according to your traceback),
>> the one inside your "for rank in ranks" loop or the one outside?
>>
> Oh yeah good point, they're separate calls. Basically for the one in
> "for rank in ranks"
> adds for a student getting a project, the other adds if a student
> doesn't get a project since we want
> to track all students (allocated or not, since the state of being
> unallocated is what gives
> us motivation to optimise the results).
>
Your original monteCarloBasic definition had this:
for rank in ranks:
    proj = random.choice(list(student.preferences[rank]))
    if not (proj.allocated or proj.blocked or proj.own_project):
        [...]
        session.add(temp_alloc)  # #1
        break
ident += 1
session.add(temp_alloc)  # #2
session.add #1 is redundant since #2 gets called regardless of whether
the student gets allocated a project or not (ignoring exceptions). Just
a minor nitpick.
> Anyway, session.merge() is for overwriting previously existing values
> right? Now thanks to the UUID I can add multiple calls to
> monteCarloBasic() to my physical database :)
>
session.merge gives you "find or create" behavior: look for an existing
object in the database, or create a new one if no existing object is
found. Note that session.merge requires you to completely fill in the
object's primary key whereas session.add does not.
> I basically wrote a small function that, for everytime the
> monteCarloBasic() is called, will append the UUID, the number of
> trials ran and the date-time to a text file. My supervisor would have
> to copy paste that into a GUI text field or the command line but it's
> not that much of a hassle, given the usefulness of the database.
>
Sounds pretty ugly. What if you add extra tables to represent runs
and/or trials?
class Run(Base):
    # Having a separate table here gives you nice auto-incrementing run ids
    # and lets you attach additional information to a run, such as timestamp,
    # human-supplied comment, etc.
    __tablename__ = 'run'
    id = Column(Integer, primary_key=True)
    timestamp = Column(DateTime, nullable=False)
    # comment = Column(UnicodeText(100), nullable=False)
    trials = relationship('Trial',
                          back_populates='run',
                          order_by=lambda: Trial.id.asc())

class Trial(Base):
    # Having a separate table here is of dubious value, but hey it makes the
    # relationships a bit nicer!
    __tablename__ = 'trial'
    __table_args__ = (PrimaryKeyConstraint('run_id', 'id'), {})
    run_id = Column(Integer, ForeignKey('run.id'))
    id = Column(Integer)
    run = relationship('Run', back_populates='trials')
    sim_allocs = relationship('SimAllocation', back_populates='trial')

class SimAllocation(Base):
    ...
    __table_args__ = (PrimaryKeyConstraint('run_id', 'trial_id', 'stud_id'),
                      ForeignKeyConstraint(['run_id', 'trial_id'],
                                           ['trial.run_id', 'trial.id']),
                      {})
    run_id = Column(Integer)
    trial_id = Column(Integer)
    stud_id = Column(Integer)
    trial = relationship('Trial', back_populates='sim_allocs')
-Conor
The location of the __deepcopy__ method is correct, but there are several problems with the implementation:
> So this only overrides __deepcopy__ when I call it for a Supervisor, and not for any of the other classes, right?
Correct.
-Conor
The pprintout was:
{<type 'collections.defaultdict'>: 156,
<type 'bool'>: 2,
<type 'float'>: 1,
<type 'int'>: 538,
<type 'list'>: 1130,
<type 'dict'>: 867,
<type 'NoneType'>: 1,
<type 'set'>: 932,
<type 'str'>: 577,
<type 'tuple'>: 1717,
<type 'type'>: 5,
<class 'sqlalchemy.util.symbol'>: 1,
<class 'sqlalchemy.orm.state.InstanceState'>: 236,
<class 'ProjectParties.Student'>: 156,
<class 'ProjectParties.Supervisor'>: 39,
<class 'ProjectParties.Project'>: 197}
> I think the InstanceStates come from the Supervisor and Project classes (197 + 39 = 236).
Sounds right. You will need to override __deepcopy__ on those classes as well.
I assumed you were using the declarative extension
(sqlalchemy.ext.declarative) to generate the table, class, and mapper
in one go. It's not at all necessary: you can define the tables,
classes, and mappers separately. Just use what you are most comfortable
with.
-Conor