i18n support & database design issues

24 views
Skip to first unread message

D Miedema

unread,
Dec 26, 2011, 2:31:42 PM12/26/11
to ptahp...@googlegroups.com
I am quite enthousiastic about the idea and scope of ptah as a framework for further app development in pyramid.
However, as I was trying to map my use cases on the functionality of ptah, I came across some issues on which I would like to know what the design decisions were, in order to gain a better understanding.

i18n:
I was disappointed that ptah does not provide an opinion on this, since the best location for a 'locale' column would be in the table 'ptah_nodes' or in 'ptah_content' and it is quite an elementary feature. For user-friendliness and seo I would like to have localized urls, so /about-us with the canonicals /nosotros (es) and /uber-uns (de) for instance.  An additional table to link the node with the default locale to the translated ones might also be included in the ptah-scope?
If it was deliberately left out, I would like to know why? (at least include a 'locale' column in 'ptah_nodes'?)

Database structure:
Since ptah has chosen for a relational database, I am suprised that data is not always stored in a normalized way but often in a URN notation.
For instance, I've got the following value in column 'path' in table 'ptah_content' for a record:
"/cms-ptah_minicms-app:75f41c3748c948db821c2fa361d9205a/about-us/"
I think this is flawed and it would be better if it only contained value 'about-us' for the following reasons:
* it already links to the parent node / relative path via the primary key (=foreign key to ptah_node implementing the 1:1 relationship)
* a uuid is already unique, so why prefixing it with the type that is already in ptah_nodes.type in the linked record? Why not use the uuid as the primary key for ptah_nodes?
* now, /about-us gives a 404 and only resolves /about-us/
and why is ptah_nodes.parent not referencing to ptah_nodes.id, but having again a value like 'cms-ptah_minicms-app:75f41c3748c948db821c2fa361d9205a'?
Me thinks it is best to use the strengths of a relational database to the max, including maintaining referential integrity. Resolving the '/about-us/' path now seems to me not the most straight-forward query and involves a substition which could have been avoided.

If anyone of the ptah team could elaborate a bit more on the mentioned design decisions I would appreciate it a lot.

thx in advance, kind regards,

Douwe

Nikolay Kim

unread,
Dec 27, 2011, 11:45:34 AM12/27/11
to ptahp...@googlegroups.com
On Mon, 2011-12-26 at 16:31 -0300, D Miedema wrote:

> i18n:
> I was disappointed that ptah does not provide an opinion on this,
> since the best location for a 'locale' column would be in the table
> 'ptah_nodes' or in 'ptah_content' and it is quite an elementary
> feature. For user-friendliness and seo I would like to have localized
> urls, so /about-us with the canonicals /nosotros (es) and /uber-uns
> (de) for instance. An additional table to link the node with the
> default locale to the translated ones might also be included in the
> ptah-scope?
> If it was deliberately left out, I would like to know why? (at least
> include a 'locale' column in 'ptah_nodes'?)

seems you are talking about i10n rather than i18n. we are planning to
work on i10n. we just didnt have enough time during 0.2 dev cycle.
its not clear yet how to implement i10n, and it probably should be
implemented as external package. any ideas would be useful.


> Database structure:
> Since ptah has chosen for a relational database, I am suprised that
> data is not always stored in a normalized way but often in a URN
> notation.
> For instance, I've got the following value in column 'path' in table
> 'ptah_content' for a record:

> "/cms-ptah_minicms-app:75f41c3748c948db821c2fa361d9205a/about-us/"
> I think this is flawed and it would be better if it only contained
> value 'about-us' for the following reasons:

in ptah you can create any number of application objects
"/cms-ptah_minicms-app:75f41c3748c948db821c2fa361d9205a/",
so you would need new 'root' column.

right now we need only one sql query to traverse paths like
'/folder1/folder2/page.html'

and remember '/folder1' can start with any route_name,
for example you can have cms site at '/', forums at '/forums', and this
apps absolutely independent

> * it already links to the parent node / relative path via the primary
> key (=foreign key to ptah_node implementing the 1:1 relationship)

> * a uuid is already unique, so why prefixing it with the type that is
> already in ptah_nodes.type in the linked record? Why not use the uuid
> as the primary key for ptah_nodes?

uuid could be anything and it can be resolved from different storages,
like mango or zodb.

> * now, /about-us gives a 404 and only resolves /about-us/

thats strange, what is 'about-us' (page or folder)?

> and why is ptah_nodes.parent not referencing to ptah_nodes.id, but
> having again a value like
> 'cms-ptah_minicms-app:75f41c3748c948db821c2fa361d9205a'?

your parent can be anything, uri is application specific, it can be
resolved to any application specific object

> Me thinks it is best to use the strengths of a relational database to
> the max, including maintaining referential integrity. Resolving the
> '/about-us/' path now seems to me not the most straight-forward query
> and involves a substition which could have been avoided.

could you explain how you would do traversing?

here is some feature traversing should support:

* we have multiple application root
* root can be mounted to different urls as route factories.
for example '/' or '/forum' and we need ability to move it to
different route.
* right now we need one sql query to load all objects from path


> If anyone of the ptah team could elaborate a bit more on the mentioned
> design decisions I would appreciate it a lot.
>
> thx in advance, kind regards,
>
> Douwe
>

thanks for your questions, thats what we need :)

contribution in any form are welcome!


Alan Runyan

unread,
Dec 28, 2011, 8:22:04 AM12/28/11
to ptahp...@googlegroups.com
Nikolay replied but I wanted to also add my 2 cents.  We talk about i10n very frequently.  

On Mon, Dec 26, 2011 at 1:31 PM, D Miedema <miedem...@gmail.com> wrote:
I am quite enthousiastic about the idea and scope of ptah as a framework for further app development in pyramid.
However, as I was trying to map my use cases on the functionality of ptah, I came across some issues on which I would like to know what the design decisions were, in order to gain a better understanding.

Great.  The more feedback you provide the better.
 

i18n:
I was disappointed that ptah does not provide an opinion on this, since the best location for a 'locale' column would be in the table 'ptah_nodes' or in 'ptah_content' and it is quite an elementary feature. For user-friendliness and seo I would like to have localized urls, so /about-us with the canonicals /nosotros (es) and /uber-uns (de) for instance.  An additional table to link the node with the default locale to the translated ones might also be included in the ptah-scope?

I can imagine this being in `ptah_content` but not in `ptah_nodes`.  Data support and object model to facilitate content translation, I do not believe, will be in the core of ptah.  I believe `language` column being left out of `ptah_content` is certainly an oversight.  Please provide us your thoughts.  Some i10n issues, that I am aware of:

  * language agnostic fields on content models

  * URL policies:

  * How i10n content relates to versioning & staging (although it is outside of what you bring up. something else we have been thinking about.)

If it was deliberately left out, I would like to know why? (at least include a 'locale' column in 'ptah_nodes'?)

What about `ptah_content`?  Does anyone have any other thoughts?  I was hoping to keep "content-ish" concerns out of the nodes table but locale isnt really content-ish.  

Database structure:
Since ptah has chosen for a relational database, I am suprised that data is not always stored in a normalized way but often in a URN notation.

Good observation they are URNs not URIs (something I wanted to change).

I think this is because:

  a) we come from nosql background and have tendency to push problems into apps

  b) we are worried about tight coupling; especially since we *do* use other data sources in our daily work

  c) honestly we are unsure of the data model; we feel more comfortable as the days go on.  I have flashbacks from my previous life of having to drop/re-enable constraints simply to do data maintenance.  
 
For instance, I've got the following value in column 'path' in table 'ptah_content' for a record:
"/cms-ptah_minicms-app:75f41c3748c948db821c2fa361d9205a/about-us/"
I think this is flawed and it would be better if it only contained value 'about-us' for the following reasons:
* it already links to the parent node / relative path via the primary key (=foreign key to ptah_node implementing the 1:1 relationship)
* a uuid is already unique, so why prefixing it with the type that is already in ptah_nodes.type in the linked record? Why not use the uuid as the primary key for ptah_nodes?
* now, /about-us gives a 404 and only resolves /about-us/

Do you understand what Nikolay said about having multiple applications inside a url space?  You may not agree but do you understand it?  I can document it in the FAQ.
 
Can you confirm  /about-us vs. /about-us/ ?

and why is ptah_nodes.parent not referencing to ptah_nodes.id, but having again a value like 'cms-ptah_minicms-app:75f41c3748c948db821c2fa361d9205a'?
Me thinks it is best to use the strengths of a relational database to the max, including maintaining referential integrity. Resolving the '/about-us/' path now seems to me not the most straight-forward query and involves a substition which could have been avoided.

Coming from ZODB land and having an eye towards Mongo.  We have been feeling that URI are a good way to decouple the  relationship.  The downside of this is that we cannot use referential integrity.  

What are people's thoughts?  Does it make them puke inside their mouths when they see this?

If anyone of the ptah team could elaborate a bit more on the mentioned design decisions I would appreciate it a lot.

thx in advance, kind regards,

Douwe

cheers!


--
Alan Runyan

Skype/Twitter:: runyaga
http://ploud.com/  Plone site in less than 10 seconds

dm73

unread,
Dec 28, 2011, 10:49:38 AM12/28/11
to ptahproject
Nikolay & Alan, thanks for responding.

First, I would like to let you know that I can't reproduce the 404 on /
about-us (folder) anymore, now it responds with 200 on /about-us as
well as /about-us/. No idea what may caused it.

I understand that the datamodel supports multiple roots, and I
appreciate having the possibility to support multiple sites or
applications in one database.

The question which datastore to use for ptah? I am not very strong
opiniated about it. The advantages of a rel. database are that via
sqlalchemy you can give users a broad range of databases to choose
from, and some flavors scale very well and can be tuned for
performance in a lot of ways. Above, it can enforce data integrity and
you can access your data by sql. However, if you choose for a
relational database, I like seeing it used the way you should use
them: a normalized database (may be with some redundant columns for
some specific queries/performance reasons), with the appropriate
constraints to enforce the integrity of the data. And not worry
anymore about portability of the application to nosql databases. It's
like marriage: once you choose for it, stick with it and make the best
out of it.

Regards,
Douwe

Alan Runyan

unread,
Dec 28, 2011, 11:30:46 AM12/28/11
to ptahp...@googlegroups.com
First, I would like to let you know that I can't reproduce the 404 on /
about-us (folder) anymore, now it responds with 200 on /about-us as
well as /about-us/. No idea what may caused it.

ok
 
I understand that the datamodel supports multiple roots, and I
appreciate having the possibility to support multiple sites or
applications in one database.

ok
 
The question which datastore to use for ptah? I am not very strong
opiniated about it. The advantages of a rel. database are that via
sqlalchemy you can give users a broad range of databases to choose
from, and some flavors scale very well and can be tuned for
performance in a lot of ways. Above, it can enforce data integrity and
you can access your data by sql. However, if you choose for a
relational database, I like seeing it used the way you should use
them: a normalized database (may be with some redundant columns for
some specific queries/performance reasons), with the appropriate
constraints to enforce the integrity of the data. And not worry
anymore about portability of the application to nosql databases. It's
like marriage: once you choose for it, stick with it and make the best
out of it.

Maybe we are not clear.  The data model isnt for "ease of porting".
It really is for being able to mix application data between mongo,
postgresql, hadoop, etc.  Our RDBMS of choice is PostgreSQL but
seem to use MySQL quite a bit (for its ease of replication).

So I believe the big bugaboo you have is the parent_id being a URI
not an ID.  is that the main concern?

I think we cant help re: path attribute.  We need them to be in that
format.

What else regarding data model gives you pause?  
There really isnt much there. 

cheers

dm73

unread,
Dec 28, 2011, 12:26:03 PM12/28/11
to ptahproject
Hi Alan,

The additional critique I have on the database design is the URN
content in several columns. You're losing sql power here if you want
to query the data I think, because normally you would split the
contained info in seperate columns (type, parent_uuid, rel_path for
instance).
But hey, I am not the normalization-gestapo, I was just trying to
figure out to what extend it might be useful to me and trying to
understand the underlying assumptions/design decisions. I like the
concept behind ptah.

Cheers too,
Douwe
Reply all
Reply to author
Forward
0 new messages