Is OrientDB a good choice for a financial system?


TomaszK

Mar 7, 2016, 8:38:05 AM
to orient-...@googlegroups.com

I have a few questions about OrientDB. Can you please dispel my doubts?
1) Is a document-oriented database like OrientDB a good choice for a financial system? All the financial systems I know use relational databases.
2) Is using a LINKSET a good idea if the set of links is really big? It's all in one JSON document like {"name":"ABC","links":[ref1,ref2,...ref29234]}
Will adding or removing a link still be fast when the set is that big?
3) Assuming a schema-full model, is aggregation (SUM) as fast as in a good relational database?

Best Regards
Tomek

Hung Tran

Mar 7, 2016, 10:43:05 AM
to orient-...@googlegroups.com
Hi Tomek,

Just my opinion:

1/ That's just because they are old systems, and relational databases are the traditional choice. They have a lot of limitations and are unusable in Big Data scenarios.
2/ If the set of links is really big, I believe that is a mistake in your database modelling. References should go n->1, not 1->n. Even if the sets are huge, OrientDB is very good at handling them. Yes, it's extremely fast.
3/ OrientDB is now pretty fast, much faster than traditional databases, especially when doing aggregation.
4/ OrientDB SQL plus fetching strategies also give you a big advantage over other databases when querying: you greatly reduce the number of database hits.

Hope it helps.

My Best,
Hung Tran

TomaszK

Mar 8, 2016, 3:02:06 AM
to orient-...@googlegroups.com
Thank you for your reply Hung Tran.

This is a model of a situation where a customer has documents. They can be very numerous: for example, a document may be a receipt from the cash register at a grocery store, so there can be many even in a single day.

I need aggregation, for example total sales in 2015. So I have a 1-n relationship with a very large n.
Hence in OrientDB I have this JSON for a client: {"name": ... ,"Documents": [ref_to_doc_1, ..., ref_to_doc_12234]}

Is there any other way that does not involve adding a client id field to the "document" table, as in a typical relational database?

Best Regards
Tomek

Hung Tran

Mar 8, 2016, 4:39:35 AM
to orient-...@googlegroups.com
Hi Tomek,

In your situation, I would not keep a Documents property on the Client class. That means there is only a single uni-directional relationship from Document to Client instead of a bi-directional relationship. If your logic needs the set of documents for a client, you can easily write a query instead, so you can fetch as many documents as you want and also count by some criteria. It's extremely fast, and you can browse your result set page by page, so there is no limit on the number of rows your query can cover.
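
A minimal sketch of that idea in plain Python (this is a model of the data shape, not OrientDB API; the `Document` fields and the `documents_of` and `page` helpers are made-up names). Each document holds the only link, pointing at its client, and the client's documents are found by querying and paging instead of by reading a big list stored on the client:

```python
from dataclasses import dataclass
from datetime import date
from itertools import islice
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Document:
    doc_id: int
    client_id: int      # the only link: Document -> Client
    issued: date
    amount_cents: int


def documents_of(docs: Iterable[Document], client_id: int) -> Iterator[Document]:
    """Lazily yield the documents that link to the given client."""
    return (d for d in docs if d.client_id == client_id)


def page(docs: Iterable[Document], client_id: int,
         page_no: int, page_size: int) -> List[Document]:
    """Return one page of a client's documents without materialising the rest."""
    start = page_no * page_size
    return list(islice(documents_of(docs, client_id), start, start + page_size))


# Ten sample documents, alternating between client 0 and client 1.
docs = [Document(i, i % 2, date(2015, 1, 1 + i), 100 * i) for i in range(10)]

first_page = page(docs, client_id=0, page_no=0, page_size=3)
second_page = page(docs, client_id=0, page_no=1, page_size=3)
count_for_client_1 = sum(1 for _ in documents_of(docs, 1))
```

The client record itself never grows, no matter how many documents point at it. The linear scan in `documents_of` stands in for what would be an indexed lookup in a real database.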

If you do want to keep the Documents property, you will need to be careful in all related code, because all the Documents of a single Client could be fetched into RAM at once by mistake, even when the current code does not need that data.

A remark: in a typical relational database, a LINK is a logical concept that is resolved through a foreign key value at runtime (that is the root cause of why a query with a JOIN is extremely slow on large tables). In OrientDB it's different: it is a physical LINK, and the database engine takes practically no time to identify it and navigate through it to other entities. On the other hand, a LINK can be added on both sides of the classes; I call that a bi-directional relationship. Otherwise, it's a uni-directional relationship by default.

My Best,
Hung Tran

TomaszK

Mar 8, 2016, 6:30:00 AM
to OrientDB
Thank you

So in the "documents" class I should keep a foreign key, or rather a link to a record of the Client class, probably with an index on it.

If data aggregation is needed, I should use:
"SELECT ... FROM Documents WHERE client = ..."?
Exactly the same as in relational databases.
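
To make the "link plus index, then aggregate" idea concrete, here is a rough sketch in plain Python. It is not OrientDB code: `DocumentStore`, `add`, and `total_sales` are hypothetical names, and the dictionary only stands in for an index on the document's client link. Amounts use `Decimal`, a common choice for money.

```python
from collections import defaultdict
from decimal import Decimal


class DocumentStore:
    """Toy store: documents are grouped by client, like an index on Document.client."""

    def __init__(self) -> None:
        # client id -> list of (year, amount); stand-in for the index
        self._by_client = defaultdict(list)

    def add(self, client_id: str, year: int, amount: str) -> None:
        self._by_client[client_id].append((year, Decimal(amount)))

    def total_sales(self, client_id: str, year: int) -> Decimal:
        """Rough equivalent of summing the amounts WHERE client = ... AND year = ..."""
        rows = self._by_client.get(client_id, [])
        return sum((amt for y, amt in rows if y == year), Decimal("0"))


store = DocumentStore()
store.add("client-1", 2015, "19.99")
store.add("client-1", 2015, "5.01")
store.add("client-1", 2016, "100.00")
store.add("client-2", 2015, "7.50")

total_2015 = store.total_sales("client-1", 2015)
```

Only the rows for one client are touched, which is the effect an index on the link is meant to give.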

Regards
Tomek

scott molinari

Mar 8, 2016, 8:18:49 AM
to OrientDB
I am going to go out on a limb, and state some things from my understanding. If I am wrong, please do correct me.

There is an advantage with ODB. With your current query in a relational database, you'd receive the data from the Documents class, but only with the foreign key ID to the client. That doesn't help much, if you need the data from the client in the results too. You'd have to do a join to get the client data.

Instead of "joining" to get the client data in ODB, you'd simply add the projections for the client data to your select.

SELECT client.name, client.billingAddress.street, client.billingAddress.city, etc., etc.

The same goes, if you wanted to filter on data from the client. For instance, "WHERE client.billingAddress.city = 'New York'". With ODB, you don't need the join. You just need the linked reference in the document.
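
As a sketch of what such a dotted path does, here is a small plain-Python model (not OrientDB code; the record ids, field values, and the `resolve` helper are invented for illustration). Each step reads a field and, if the value is a link to another record, follows it:

```python
from typing import Any, Dict

# Toy record store keyed by record id; link fields hold the id of the target record.
records: Dict[str, Dict[str, Any]] = {
    "#13:0": {"street": "5th Avenue", "city": "New York"},
    "#12:0": {"name": "ABC", "billingAddress": "#13:0"},
    "#14:0": {"total": 42, "client": "#12:0"},
}


def resolve(record: Dict[str, Any], path: str) -> Any:
    """Walk a dotted path such as 'client.billingAddress.city', following links."""
    value: Any = record
    for field in path.split("."):
        value = value[field]
        # If the field holds a record id, hop to the linked record.
        if isinstance(value, str) and value in records:
            value = records[value]
    return value


document = records["#14:0"]
client_name = resolve(document, "client.name")
client_city = resolve(document, "client.billingAddress.city")

# Filtering works the same way: the condition is evaluated through the links.
in_new_york = [rid for rid, rec in records.items()
               if "client" in rec
               and resolve(rec, "client.billingAddress.city") == "New York"]
```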

The only disadvantage you have with ODB is that there isn't any referential integrity in all of this (as Hung mentioned). You could stick an ID from a totally different class in the link to the client, and ODB won't complain. But if you query on it, you'll probably run into trouble. In other words, your code is responsible for maintaining referential integrity.

If you don't want to do that, then you can go with edges. The only thing with edges, though, is that you are back to bi-directional links, which means the Client class also links back to the documents, and this can get troublesome with very large numbers of links, as Hung mentioned.

This is one of the reasons why I wish there were uni-directional links in ODB.   

Scott

Eric24

Apr 5, 2016, 12:36:34 PM
to OrientDB
Just a thought (and I'm relatively new to ODB, so I invite anyone's input if my thinking is wrong)...

When designing a data model for ODB, if you think about it in the same terms as a relational database, you can end up with a model that's pretty close to what you'd come up with there (as you say, "same as in a relational database"). And it will "work", but probably won't be very efficient. My experience is that having a better understanding of how you'll be using/querying the data is much more important with a graph database (not that you wouldn't take this into account with a relational database, but since relational query results are essentially "built on the fly" when you run them, I think it's natural to think less about the relationships between the data and instead focus on the individual tables, "fixing" performance by adding secondary indexes as needed to speed up the joins).

Let's take your example. If you'll be using the data in such a way that you'll need to start with a client and retrieve a list of documents (rather than just being able to query the documents based on the client key), the question I would have is whether a single list of every document would really be all that useful (to my point, I see this as trying to use a LINKSET as a "documents table", thinking in terms of relational database design). But maybe that query (i.e. "return every document that belongs to client X") isn't one that you'd ever really do. Instead, maybe there are different kinds of documents (invoices, receipts, credits, etc.), which might benefit from having a separate LINKSET for each kind of document. Or maybe (more likely since this is a financial application) time is your "grouping vector", so that all documents created on a certain day are related by virtue of when they were created. And in that case, creating some meaningful "layers" might be the answer (i.e. client->years->days->documents or maybe client->years->months->days->documents). Look in the ODB documentation on Time Series for some additional details here. This approach lets you quickly retrieve a document based on a particular day, accounting year, or range of days, and is very efficient, because any given LINKSET will never be particularly large. Of course, this could be done using edges as well, but from your example, a LINKSET might be simpler.
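
A rough plain-Python sketch of the client->years->months->days->documents layering (a model of the shape only, not OrientDB code; `ClientTimeline` and its methods are hypothetical names). Each layer fans out to a small, bounded number of children, and only the day buckets hold document references:

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import List


class ClientTimeline:
    """One client's documents bucketed as year -> month -> day -> [doc ids]."""

    def __init__(self) -> None:
        self.years = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def add(self, day: date, doc_id: str) -> None:
        self.years[day.year][day.month][day.day].append(doc_id)

    def on(self, day: date) -> List[str]:
        """Documents for one day; .get avoids creating empty buckets on reads."""
        months = self.years.get(day.year, {})
        days = months.get(day.month, {})
        return list(days.get(day.day, []))

    def between(self, start: date, end: date) -> List[str]:
        """Documents for an inclusive range of days, in date order."""
        found: List[str] = []
        current = start
        while current <= end:
            found.extend(self.on(current))
            current += timedelta(days=1)
        return found


timeline = ClientTimeline()
timeline.add(date(2015, 3, 7), "r1")
timeline.add(date(2015, 3, 7), "r2")
timeline.add(date(2015, 3, 8), "r3")
timeline.add(date(2015, 4, 1), "r4")

same_day = timeline.on(date(2015, 3, 7))
two_days = timeline.between(date(2015, 3, 7), date(2015, 3, 8))
```

If a single day still collects too many documents, another layer (for example hours) can be added in the same way.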

scott molinari

Apr 5, 2016, 2:51:12 PM
to OrientDB
Just remember though, if you do use LINKSETs, it is up to you to always keep up the referential integrity of the links between the linked documents within your application. This is where using edges simplifies upkeep of relationships. 

I'd also say, there will also be a performance issue with vertexes with a lot of links/ relationships (10000s). This is known as the supernode problem within graph databases and this issue hasn't yet been addressed in ODB, AFAIK. 

If someone from ODB is reading, it would be interesting to hear, if supernodes can be considered a possible performance issue or not.

Scott 

Eric Lenington

Apr 5, 2016, 3:17:49 PM
to OrientDB
Yes, that's a good point. Since the OP's application was described as a financial application, I assumed that these documents were created once and essentially never deleted or "moved around", so I didn't figure there was any need to worry about referential integrity. That said, using edges may still be worthwhile in this application for other reasons.

Regarding the supernode problem, unless the number of documents generated on a per-day basis was in the 1000s, that's what I was trying to address by using the "layers of time blocks" (i.e. a year can have no more than 366 days, etc.), but if one day is too large a "lowest level" grouping, then maybe adding an hour layer would solve this (back to needing to know more about the real-world details of the application).



pabloa

Apr 8, 2016, 9:36:16 PM
to OrientDB
It depends on the latency you are looking for.

If you are OK with a relational database, OrientDB is probably OK too.
If you are looking for microsecond performance and high-frequency trading, then you put everything in RAM. No ODB, no SQL.

Your question is too broad. Could you narrow it down?

Pablo

Hung Tran

Apr 10, 2016, 5:07:34 AM
to orient-...@googlegroups.com
Hi Pablo,

If you look at the OrientDB settings, you will see there are a lot of them that save you from having to look at other NoSQL database engines :)

Microsecond performance is possible, but only at the engine level. Including output serialization and network latency, the fastest timing is 0.1 millisecond.

My Best,
Hung Tran