Collections vs Relationships

74 views
Skip to first unread message

Matthew

unread,
Apr 21, 2008, 11:26:49 PM4/21/08
to intersystems.public.cache
I have a situation where ClassA contains several members of ClassB,
and the members of ClassB are sequentially ordered. This can be
modelled as either a list of ClassB or as an array of ClassB using
integers as the key, or as a one-many relationship with a key property
in ClassB.

What are the relative strengths and weaknesses of these various
possibilities?

I regularly need to traverse ClassB sequentially, and to do SQL
searches that pick a ClassA with ClassB elements with particular
orders and property values.

What do you suggest?

Most of my logic is server side Cache Basic scripts.
Thanks,
Matt

dawn

unread,
Apr 22, 2008, 2:05:32 PM4/22/08
to intersystems.public.cache
This might be irrelevant from your perspective, Matt, but given that
the large MultiValue implementation within the Cache' platform is
based on "list of" properties within the Cache' classes, the MV folks
will surely want excellent support for such properties across the
board -- from Zen to XML to SQL... Our persistent classes (ones that
subclass %MV.Adaptor) implement these lists differently within the
globals, but when working through the class structure, the feature set
for MV and for you should be identical.

So from my perspective, I like to see non-MV folks using lists too.
That way any features we have for MV will be available to non-MV and
vice versa.

Again, this is just "for what it's worth" -- it might be neither here
nor there from your perspective. cheers! --dawn

Eric

unread,
Apr 23, 2008, 7:41:17 AM4/23/08
to intersystems.public.cache


> What are the relative strengths and weaknesses of these various
> possibilities?

From an object perspective, the API is the same.
From a SQL perspective, Relationships are projected as true relations
but Lists and Arrays are not.
AFAIK, lists are also plagues with the 32k limit but Relationships are
not.

I am interested in others experience also...

Benjamin Spead

unread,
Apr 23, 2008, 8:07:17 AM4/23/08
to intersystems...@googlegroups.com
I believe that arrays are projected as a child table which can be projected as SQL. List contents are not projected for SQL.

Also, Lists have a 32K limit (unless you have turned on long string support).

dawn

unread,
Apr 23, 2008, 9:31:53 AM4/23/08
to intersystems.public.cache
On Apr 23, 7:07 am, "Benjamin Spead" <Benjamin.Sp...@intersystems.com>
wrote:
> I believe that arrays are projected as a child table which can be projected as SQL.  List contents are not projected for SQL.
>
> Also, Lists have a 32K limit (unless you have turned on long string support).  

Ahhh, bummer. I project all lists as SQL "child tables" but all MV
sites have long strings turned on. Is there are a reason that sites
would not want to turn on long strings? --dawn

Benjamin Spead

unread,
Apr 23, 2008, 10:14:31 AM4/23/08
to intersystems...@googlegroups.com
Dawn,
My guess is that most sites that are not turning on long string support are not doing so because it is a new feature turned off by default. If they find that they need it, I am guessing that they turn it on at that point.
- Ben

Wolf Koelling

unread,
Apr 23, 2008, 10:34:22 AM4/23/08
to intersystems.public.cache
Arrays are automatically projected to Sql as child tables, lists
aren't. A list collection is visible as a single field only in Sql BUT
you can search for individual items in the list.

Simply define an index on the individual elements of the list property

Index MyIndex On MyProperty(ELEMENTS)

and then use the following syntax in your sql statement:

SELECT...
FROM MyTable
WHERE FOR SOME %ELEMENT(MyProperty) (%VALUE = :somevalue)

(This would generally be more useful for a list of datatypes then a
list of objects, unless the pointed-to class uses a "real data"
primary key rather than an auto-generated id).

A list collection provides you with one value per element, whereas an
array is a collection of key/value pairs.

As was pointed out before a list collection is not suitable where the
number of elements is potentially very large or unknown (because the
entire list is stored in a single global node, whereas the individual
array elements occupy a global node of their own in a subtree to the
main global node.

Relationships are bi-directionial, you can access the one from the
many and vice versa. They provide for referential integrity for both
objects and sql: CASCADE update/delete in parent/child relationship
whereas deletion of the one item is prohibited in a one/many
relationship (the latter may not always be what you want but is not
negotiable. If you use a list or array collection on the other hand
you can define a DELETE Trigger and set it to what you want BUT this
is sql only and you'd have to code the corresponding object logic
separately)

Wolf

On 23 Apr, 13:07, "Benjamin Spead" <Benjamin.Sp...@intersystems.com>
wrote:

dawn

unread,
Apr 23, 2008, 2:46:09 PM4/23/08
to intersystems.public.cache
Makes sense. One thing I would like to see (for various reasons) is
that at least some of the work done for the MV implementation be
useful to other Cache' users too, rather than just a world unto
itself. So another way to write:

"List contents are not projected for SQL Also, Lists have a 32K limit
(unless you have turned on long string support)."

that might help integrate the two worlds (pick and mumps) a bit more
might be "Both arrays and lists are projected for SQL. There is no
longer a 32K limit on lists. You simply need to turn on long string
support as of version xyz."

ISC will also be implementing more support for lists with arity > 1
(an MV/Pick way of handling one type of child table) too, which might
be of interest to others in the future (that is, if it is implemented
so that more than just MV folks can benefit).

cheers! --dawn

On Apr 23, 9:14 am, "Benjamin Spead" <Benjamin.Sp...@intersystems.com>
> > I am interested in others experience also...- Hide quoted text -
>
> - Show quoted text -

Matthew

unread,
Apr 23, 2008, 7:15:51 PM4/23/08
to intersystems.public.cache
So, to sum up, the choice seems to be between arrays and
relationships. What about my need though for the sequential ordering
of the children in 1:many? If I use a relationship, do I simply add an
order property as %Integer to each child, and then retrieve using this
field?

dawn

unread,
Apr 24, 2008, 9:03:41 AM4/24/08
to intersystems.public.cache
Matthew -- could you clue me in to what it is that makes you take
lists off your list? That sets off a big red flag to me since my
entire application currently under construction is set on a platform
of Cache' properties that are either "single-valued" or lists. If
typical Cache users are running away from lists for some reason I
really want to understand that. Thanks. --dawn
> > > - Show quoted text -- Hide quoted text -

Mark Sires

unread,
Apr 24, 2008, 12:12:34 PM4/24/08
to intersystems...@info2.kinich.com
Dawn,

You won't like my answer, and Intersystems doesn't agree with me, but..

LISTS ARE EVIL

Ok, boy do I feel better. I've been biting my tongue trying not to spout
off, but, hey, you asked!

The only possible reason I can see that they even came up with lists is that
they needed this abominable thing to pass data back and forth through the
interfaces to non-native development environment, ie .Net, Java, etc.

Arrays are superior in every way I can think of to lists.

Even with large strings turned on, lists are still more limited in size than
an arrays. Arrays can grow to the entire local storage if needed (up to
16m), lists can only get as big as the current long string support (around
3m). If they implemented local access to arrays as a process private global
instead of a local storage array, this limitation would go away also.

You can dynamically and randomly add/remove items from an array. Unless a
later versions have added new features, lists have to be rebuilt to add,
change, or remove an item. On large lists this can have significant
overhead, and is just generally a pain in the neck.

Arrays automatically project as an SQL child table. Lists don't project at
all.

Detecting if a value is a list is difficult, there is no $ISLIST function,
unless they've added it in 2008.x, and attempting to do list functions on a
non-list variable returns an error. Determining if a variable has array
elements is a simple $Data call.

And I'm even more non-standard in my thinking in that a relationship is
better than an array, and a child relationship is a better option than
either a list or an array property, if the total number of elements is not
subject to limits and likely to be a large number of elements. The reason I
say this is that a relationship property will not be loaded into local
memory until referenced (either explicitly or through swizzling), but a list
or child will be loaded into memory every time the object is opened. If you
have an invoice object, for example, with an array or list of the detail
line items, and the invoice has a thousand details, opening the object will
bring all thousand lines into local storage. If the detail line items are
children, then they would only be 'swizzled' and brought into local storage
if you reference the detail lines. If you frequently reference just the
'header' information, especially if you have included an 'invoice total' and
'invoice balance' property in the 'header', this can save a lot of overhead.

If someone else can explain why Intersystems has decided that lists are so
wonderful, I would be happy to revise my opinion.

Mark

"dawn" <dawnwo...@gmail.com> wrote in message
news:46452b01-fd21-4645...@y38g2000hsy.googlegroups.com...

Julius Kavay

unread,
Apr 24, 2008, 12:51:16 PM4/24/08
to intersystems...@info2.kinich.com
Mark Sires wrote:

>
> LISTS ARE EVIL

it depends...

>
> Detecting if a value is a list is difficult, there is no $ISLIST function,
> unless they've added it in 2008.x, and attempting to do list functions on a
> non-list variable returns an error. Determining if a variable has array
> elements is a simple $Data call.

first, version 2008.1 introduced a $listvalid(string) also as $lv(string)
function
second, with just one (or two ) line of code you can have the same function in earlier versions too. Put the following line into %ZLANGF00 routine:

// ListValid
ZZLV(x) Public { try { s ok='$lf(x,x) } catch { s ok=0 } q ok }

and voila, you have your $ZZLV(string) or call it listcheck... function.

hth and
have a nice day

Julius
--
///

Matthew

unread,
Apr 24, 2008, 10:51:54 PM4/24/08
to intersystems.public.cache
See Mark's answer below. In my own prototyping using the pythonbind,
we also found lists to be less user-friendly than arrays or
relationships, particularly once you get large amounts of data.

It is the lack of SQL projection that really clinched it.

Matt

unread,
Apr 24, 2008, 11:40:12 PM4/24/08
to intersystems.public.cache
Well, it looks like everyone is talking about different aspects of
lists here !

1. There is the primitive $list structure.

I personally use this everywhere I would have used a delimited string
in the past. It is quicker (much quicker) than using $piece and is
easier to read. It also protects against people wanting to use my
standard delimiter in the data !

Typical uses are for non-default storage global data, and as return
values from methods / functions if I want to return multiple values -
I don't like passing by reference (personal thing).

I see no disadvantage to using the list structure here at all.

If I was expecting a list and it isn't one, causing my code to fail -
good, I have a bug somewhere, I now know I need to find it !


2. There is the list property type

If I want to store small amounts of unordered data or object
references in a property, then I prefer lists to anything else - less
code and less building of classes.

But, if order is important, I will use an array,

or the volumes are expected to be anything except very small (up to 10
items), I use relationships - controlling order is harder, but
possible with a manually defined IDKEY property.

The way lists of datatypes are projected via ODBC is fine for me, and
if I'm using them, then I'm not using SQL within Cache as a major part
of my application (typically searches only which works as Wolf
explained).

So you have a series of building blocks that can do similar things,
each with advantages and disadvantages - choose the right one for your
needs. I have used lists, arrays and relationships in each application
I have built in the last few years !


Matt

dawn

unread,
Apr 25, 2008, 11:52:02 AM4/25/08
to intersystems.public.cache
On Apr 24, 9:51 pm, Matthew <mans...@csu.edu.au> wrote:
> See Mark's answer below. In my own prototyping using the pythonbind,
> we also found lists to be less user-friendly than arrays or
> relationships, particularly once you get large amounts of data.

Although we are specifying a lot of lists in our persistent classes,
they are stored differently in the globals than regular Cache' lists.
They have the same features within the Cache' class system, however.
Some of the less user-friendliness might be mitigated for us by using
MVBASIC enhanced for objects instead of COS. MVBASIC has good
features to handle multivalues, at least when doing standard MV reads
and writes. However, to get concurrency handled in web applications we
are using object %OpenId() and %Save() syntax for reads and writes. I
haven't dived too far into the Cache' object means of working with
lists much beyond GetAt() so far, but I would anticipate a ripple
delete (remove list item in the 5th position and the ones after it
move up), and easy inserts (everything after pushes back). Clearly
I'll have to dig in a bit further.

> It is the lack of SQL projection that really clinched it.

All of our lists are projected for SQL, so I think this is not a
current issue with Cache'. We project them two ways, in fact -- as a
child table and as a delimited list (using a calculated property).

This helps me understand that if there are features missing for lists
within Cache' it might be because non-MV folks are more inclined to
use Arrays than Lists. So, it sounds like it is primarily the
(relatively new to Cache') MV developers who care the most about lists
working great.

cheers! --dawn

sukesh_hoogan

unread,
Apr 26, 2008, 10:28:50 AM4/26/08
to intersystems...@info2.kinich.com
My $0.02

I stopped using lists and arrays long time ago. Did not find any
advantage in using them.

Instead using another class to hold the data and simple object reference
to that class is the preferred approach. The code (COS & SQL) then is so
much more simpler = no messy SetAts, GetAts, RemoveAts and BlahBlahAts.

Regards
Sukesh Hoogan
Bombay, India
[Enterprise Resource Planning & Business Intelligence]

Matthew

unread,
Apr 28, 2008, 3:08:54 AM4/28/08
to intersystems.public.cache
But how do you manage one to many? Do you use relationships?

Jack Heath

unread,
Apr 29, 2008, 6:47:59 PM4/29/08
to intersystems...@info2.kinich.com
Dawn,

Could you provide an example of projecting a list as a child table?
I'm a newbie trying to develop classes for MV without using
CREATE.CLASS, and didn't think that projecting lists as child tables
was possible.

Thanks - Jack


On Fri, 25 Apr 2008 08:52:02 -0700 (PDT), dawn
<dawnwo...@gmail.com> wrote:

>
>All of our lists are projected for SQL, so I think this is not a
>current issue with Cache'. We project them two ways, in fact -- as a
>child table and as a delimited list (using a calculated property).
>
>

>cheers! --dawn
>

sukesh_hoogan

unread,
Apr 28, 2008, 8:56:49 AM4/28/08
to intersystems...@info2.kinich.com
Mathew

For One-to-Many relationship, the object reference is reversed.
The [Many] class references the [One] class.

This may surprise you and others. However, this is only small part of
the overall application logic, which has checks and balances to take
care of the other related issues.

Regards
Sukesh Hoogan
Bombay, India
[Enterprise Resource Planning & Business Intelligence]

http://sukeshhoogan.blogspot.com

Mark Sires

unread,
Apr 28, 2008, 10:50:31 AM4/28/08
to intersystems...@info2.kinich.com
This is why I tend to use a lot of parent/child relationships. Since I have
a property in both the parent and child that point to each other, I get the
benefits that Sukesh gets using this method, but when I need to get to the
'children' from the parent, I have a property that gets me there easily.
Since child table is a separate table, it always gets projected the way I
expect it to, not the way the compiler decides to...
Mark

"sukesh_hoogan" <"sukesh_hoogan <at> wrote in message
news:4815c...@info2.kinich.com...

sukesh_hoogan

unread,
Apr 28, 2008, 12:13:13 PM4/28/08
to intersystems...@info2.kinich.com
IMO, the data structure must depend upon the status of the object(s) -
an independent existence or dependent on the existence of another.
For example, a Company/Enterprise is independent object. So is a Person,
he/she may or not be an employee of a Company or can change Company
within a group. If a person is employed by a Company, a property in
Person class references the fact. Moreover, a person/object does not
cease to exist, when he/she is not on the payroll of any of the group
companies ( temporarily or permanently).

A Company may decide to have details of the employee's family (for
whatever reasons). A family member may not be a employee of the Company
to start with. What happens when the member is employed by the Company
or was employed to start with and decides to part ways (in both cases
the/she is still related to an employee)? In my design, there is no
separate class for the family members, if it were so, how does one deal
with the above situations? Delete details from family class and put them
one in employee class or the other way around! Now that is surely going
to break up a lot of references and we would have a royal mess. Another
case - a contact person from a vendor (his/her details - maybe minimal -
are there as part of the VRM/CRM) is hired by the company. Again shift
his details from a "contact/vendor??" class to employee class? Still
another example - an employee retires and is hired as a consultant or
becomes a director of the Company?

Hence, all family members and also persons having dealings with the
group are created in Person class and their roles or any other
references are marked/edited as such (which can change over a period of
time) and the rest of the references remain intact.

All elements are treated as objects, including many many property
attributes and create them only once.

Based on this principle, the schema I use has been designed as such. It
may be very different from the traditional database designs.

To get all the objects in the [many] class based upon the selection
criteria in [one] class. For example - to retrieve the list of the
employees of a particular company.

"SELECT whatever FROM EmployeesClass WHERE CompanyRef->ID=?"

This works just as fine. 99% of my queries are just this type (no
parameters or at most one or two parameters). No extrinsic joins, no
foreign keys. I am no great fan of relational parent-child relationships.

Regards
Sukesh Hoogan
Bombay, India
[Enterprise Resource Planning & Business Intelligence]

http://sukeshhoogan.blogspot.com

Benjamin Spead

unread,
Apr 30, 2008, 5:44:58 AM4/30/08
to intersystems...@googlegroups.com, intersystems...@intersystems.com
Mark,
FYI... the child table is actually subtable stored within the parent table. If you want a truly separate table (with independent storage), then a One/Many should be used.
Cheers :)
Ben

-----Original Message-----
From: intersystems...@googlegroups.com [mailto:intersystems...@googlegroups.com] On Behalf Of Mark Sires
Sent: Monday, April 28, 2008 10:51 AM
To: intersystems...@intersystems.com
Subject: [Caché] Re: Collections vs Relationships

This is why I tend to use a lot of parent/child relationships. Since I have
a property in both the parent and child that point to each other, I get the
benefits that Sukesh gets using this method, but when I need to get to the
'children' from the parent, I have a property that gets me there easily.
Since child table is a separate table, it always gets projected the way I
expect it to, not the way the compiler decides to...
Mark

"sukesh_hoogan" <"sukesh_hoogan <at> wrote in message
news:4815c...@info2.kinich.com...

Matthew

unread,
Apr 30, 2008, 6:11:25 AM4/30/08
to intersystems.public.cache
I find this idea intriguing. Have others structured their objects in
this way? It brings a certain simplicity that I find appealing. So
what then are the drawbacks? And can many-many relationships, of which
I have many, be simply two classes referencing each other directly?
And then let the user sort out the implcations of search results?

And Sukesh, you mentioned other "checks and balances" - could you
please elaborate? Thanks.

Enrico Parisi

unread,
May 2, 2008, 4:28:25 AM5/2/08
to intersystems...@info2.kinich.com
Projecting a list as subtable has always been possible but was a bit
tricky, with recent versions (at least from 5.2, maybe before) it's easier:

Property MyList As list Of %String(STORAGEDEFAULT = "array");

One caveat: if you modify a class where storage has already been
generated for the list property, adding the parameter does not recreate
the storage definition where this behavior is implemented.

Is this documented? Well....kinda, not in DocBook, it's documented in
the class documentation of %Library.CacheCollection, so....if you do
know it exists you can find the documentation of it, otherwise....well,
you may step into it playing with Studio inspector! :)

It's a pity that InterSystems implement (very little, let me say)
features customers have been demanding since ever and not even
mentioning it in the release notes, let alone the documentation.
This thread is an excellent example of how much/many users need this and
how little is known about the fact it is supported since a few versions!
Anyone in the ISC doc group listening?

Here is what the class documentation says:

parameter STORAGEDEFAULT;

STORAGEDEFAULT is the type of storage used for the collection if it is
not already allocated in a DATA definition in the active STORAGE
definition. LIST collections default to list storage and ARRAY
collections default to array storage (SUBNODE structure).


HTH
Enrico

Mark Sires

unread,
May 2, 2008, 12:50:29 AM5/2/08
to intersystems...@info2.kinich.com
Benjamin,

That is one of the reasons I like parent/child. It is not uncommon for me
to want to reference the children once I have referenced the parent. Since
the 'child' rows are likely to be in the same database page as the parent,
referencing the children (unless there are a lot of them) is less likely to
require a page fetch. The overhead I am avoiding if I don't reference the
children is the overhead of opening each of the child objects.

Sukesh's method also works fine. It doesn't have the benefit of the
children being in the same page (not guaranteed, but is frequently true),
but does have other advantages. However, I would caution him against using
the -> join syntax in the WHERE clause of SQL statements. The SQL
deoptimizer, er, optimizer does NOT like this syntax, and will frequently
(about 99.9% of the time as far as I can tell) fall back to reading the
entire table to find matching rows. If you have an index on a property, it
will not be used if referenced by the -> syntax.

Mark


"Benjamin Spead" <Benjami...@intersystems.com> wrote in message
news:81F3EC83B5424847B6D2EF5C4D406705045D9313@Exchange1_backup...

sukesh_hoogan

unread,
May 2, 2008, 3:50:07 AM5/2/08
to intersystems...@info2.kinich.com
Mark

see in-line

Mark Sires wrote:
> Benjamin,
>
> That is one of the reasons I like parent/child. It is not uncommon for me
> to want to reference the children once I have referenced the parent. Since
> the 'child' rows are likely to be in the same database page as the parent,
> referencing the children (unless there are a lot of them) is less likely to
> require a page fetch. The overhead I am avoiding if I don't reference the
> children is the overhead of opening each of the child objects.
>
> Sukesh's method also works fine. It doesn't have the benefit of the
> children being in the same page (not guaranteed, but is frequently true),
> but does have other advantages. However, I would caution him against using
> the -> join syntax in the WHERE clause of SQL statements. The SQL
> deoptimizer, er, optimizer does NOT like this syntax, and will frequently
> (about 99.9% of the time as far as I can tell) fall back to reading the
> entire table to find matching rows. If you have an index on a property, it
> will not be used if referenced by the -> syntax.

Thanks for your comments and the cautionary notice, I really appreciate
it. Did not know that SQL optimizer does not like the -> syntax.
However, I must state that I have not found any 'delays' in fetching
instances using the -> syntax. Almost all my queries are coded and
compiled in the relevant classes - no dynamic queries.

Matthew

unread,
May 6, 2008, 5:59:09 AM5/6/08
to intersystems.public.cache
This entire post has demonstrated at least one thing to me, that the
documentation has lots abuot *how* to structure data but very little
on *why* or data modelling. I have learned of new ways to structure my
existing data, but I still don't enough to make the optimal decision
as to which way to go. I can spend time trying things out, but the
recoding and benchmarking is time-consuming. Anyone willing to
populate a giant database with varieties of relationships and then
test out various ways of manipulation?
> > "Benjamin Spead" <Benjamin.Sp...@intersystems.com> wrote in message
Reply all
Reply to author
Forward
0 new messages