partitions project's database

33 views
Skip to first unread message

Asi Sudai

unread,
Sep 29, 2014, 8:52:37 PM9/29/14
to python_in...@googlegroups.com
hey :)

Question what you think/know about Horizontal Partitioning of each project data into a "shard" database?
distributed project database in a sense that all the project have the same scheme But they have separated database.

for example projectA have tables (Shot, Asset) and projectB have the same tables with the same fields,
but projectA and projectB are on different database or even servers.

And using a common lookup project table, you can find which project points to which database, for you code to work with

Why we're thinking about it?
having each project individual database have some advantages:
- easier to backup and restore ( much smaller set of data ) 
- easier to archive together with the entire project files ( cause that DB is isolated from the rest of the projects data )
- less table locking wait time, cause each project have different database and tables

badness coming from this?
- complexity ... each item needs a unique_id across all the tables, something like ( projectID_ID )
- complexity ... where do you store common tables ( Users for example )
- you can't do "join" across databases... do get all artists Tasks across shows become multiple queries, one per project

any thoughts?

Justin Israel

unread,
Sep 29, 2014, 9:47:56 PM9/29/14
to python_in...@googlegroups.com

You are referring to this generic wiki about partitioned data, but are you specifically talking about MySQL as your target? The reason I ask is because some databases have built in concepts of sharding and routing. And also some databases may use row locking instead of table locking. Maybe it is best to investigate the right database for your needs before looking at trying to implement a manual mysql partitioning abstraction?

I agree about some of the advantages, but I wonder if they outweigh the disadvantages. It would mean that each time you create a new project, a new database must be created before applications will be able to function. And you have to maintain your own routing logic, which may or may not be a big deal at all.

Has it proven difficult to extract a single projects data set for backups, just using queries? And is your project dataset so large that it needs to be off-lined with each final project?

--
You received this message because you are subscribed to the Google Groups "Python Programming for Autodesk Maya" group.
To unsubscribe from this group and stop receiving emails from it, send an email to python_inside_m...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/python_inside_maya/6dbde745-cfdf-4609-ab09-714f99d37ba4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Asi Sudai

unread,
Oct 1, 2014, 5:09:46 PM10/1/14
to python_in...@googlegroups.com
I'm using a recent release of MySql with sqlAlchemy as the ORM... but could move to a different sql if there's a better one if needed.

Since I never done this before, we started prototyping the database and the code side, to see how complex sharding would be.
the first issue that came up is the ID, it's not unique, there are many Asset with ID 1 for example.
So the way around that is to create a truly unique_id across all the databases. so the Asset ID is 1 but the uniqueID is: somethinglong001101 

the next step would be the code side, how can we hide the complexity of the sharding from the tools and pipeline, we don't want them to "know" about it...

I hope this prototype will show if it's worth it or not... 
worth it or not, I guess it comes down to is the sharding gives us flexible but at the same time the complexity is hidden from our day to day tools/code :)










To unsubscribe from this group and stop receiving emails from it, send an email to python_inside_maya+unsub...@googlegroups.com.

Justin Israel

unread,
Oct 1, 2014, 5:36:38 PM10/1/14
to python_in...@googlegroups.com
But what kind of problems were you actually facing, if I might ask? Did you encounter problems with table locking? If so, is your usage profile write heavy and are you using MyISAM (table locking) vs InnoDB (row locking)? Were your major pain points the archive process? Is your dataset unbelievably large and causing slow queries? 

Just curious what is prompting the investigation into partitioning. Also, it looks like Postgres supports table partitioning:



To unsubscribe from this group and stop receiving emails from it, send an email to python_inside_m...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/python_inside_maya/f3a798ff-1d8f-41da-bd91-9017c522f4bf%40googlegroups.com.

Asi Sudai

unread,
Oct 27, 2014, 12:34:24 AM10/27/14
to python_in...@googlegroups.com
But what kind of problems were you actually facing?
I'll admit it's driven more by curiosity and "philosophy" of separating components more than actual need to overcome a problem.

since sqlalchemy gives an easy framework to do so, it's not very complicated and all the sharding logic is stored in the backend, so the front end tools aren't aware of this at all.

by sharding the database I hope we could:
- easier to backup and restore ( much smaller set of data ) 
- easier to archive together with the entire project files ( cause that DB is isolated from the rest of the projects data )
- set permissions based on a project/shard
- scalability where each shard could live on a different server or location (if needed)

did anyone try/needed that in a production environment?
To unsubscribe from this group and stop receiving emails from it, send an email to python_inside_maya+unsubscribe@googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "Python Programming for Autodesk Maya" group.
To unsubscribe from this group and stop receiving emails from it, send an email to python_inside_maya+unsub...@googlegroups.com.

Justin Israel

unread,
Oct 27, 2014, 2:25:12 AM10/27/14
to python_in...@googlegroups.com


On 27/10/2014 5:34 PM, "Asi Sudai" <asis...@gmail.com> wrote:
>>
>> But what kind of problems were you actually facing?
>
> I'll admit it's driven more by curiosity and "philosophy" of separating components more than actual need to overcome a problem.
>
> since sqlalchemy gives an easy framework to do so, it's not very complicated and all the sharding logic is stored in the backend, so the front end tools aren't aware of this at all.
>
> by sharding the database I hope we could:
> - easier to backup and restore ( much smaller set of data ) 
> - easier to archive together with the entire project files ( cause that DB is isolated from the rest of the projects data )

I would think this is only a concern if you are generating massive amounts of data and find a performance benefit in being able to remove chunks of data from being queried. Then again, if your project fields are indexed anyways, wouldn't queries still be fast when they include a filter on the project?

> - set permissions based on a project/shard

Permissions could be part of the application layer. I'm not sure how sharding the database makes this any easier. If user X is not allowed to access project Y, then you can fail that request at the application level.

> - scalability where each shard could live on a different server or location (if needed)

Or you can just try to improve performance with replicated slaves? Unless your bottleneck is writes, but even then you should be able to accomplish something with replicas.

>
> did anyone try/needed that in a production environment?
>

As mentioned above, I'm pretty sure these problems can either be addressed without partitioning across multiple databases, at the application layer. Or, some of them aren't actually a problem. Like you said, this is currently driven by "curiosity and a philosophy", so it doesn't sound like you are specifically trying to solve a current problem you are facing.

>>>> To unsubscribe from this group and stop receiving emails from it, send an email to python_inside_m...@googlegroups.com.


>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/python_inside_maya/6dbde745-cfdf-4609-ab09-714f99d37ba4%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "Python Programming for Autodesk Maya" group.

>>> To unsubscribe from this group and stop receiving emails from it, send an email to python_inside_m...@googlegroups.com.

> --
> You received this message because you are subscribed to the Google Groups "Python Programming for Autodesk Maya" group.

> To unsubscribe from this group and stop receiving emails from it, send an email to python_inside_m...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/python_inside_maya/eea91bfb-ca17-404a-839b-ab29d1f6319c%40googlegroups.com.

Reply all
Reply to author
Forward
0 new messages