--
You received this message because you are subscribed to
the "Pick and MultiValue Databases" group.
To post, email to: mvd...@googlegroups.com
To unsubscribe, email to: mvdbms+un...@googlegroups.com
For more options, visit http://groups.google.com/group/mvdbms
---
You received this message because you are subscribed to the Google Groups "Pick and MultiValue Databases" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mvdbms+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mvdbms/CAPZXEvPtSh95pHvVsxb9CGe6mWkP3w8-t5E4DbRX3AWRWq1viw%40mail.gmail.com.
Hi Alberto,
OpenQM should be able to handle files of this size or even larger. There is nothing in your posting to suggest that the process has died. It is just taking a long time.
It is useful to understand what happens when populating a large file. In your example, the data is coming from a CSV file and I assume is being read line by line using READSEQ or similar. Performance of each READSEQ will not be affected by the total size of the data being imported.
The imported data then has to be written to the target file. The position of a record in the file is determined by the hashing algorithm and it is likely that this varies widely from one record to the next. For a small file, the group level caching of the system means that the target group is often already in memory and can be updated very quickly. For a large file, there is a high probability that the group must be read, updated and then written (though the write may not immediately go to disk).
The impact of this is that loading a file that is smaller than the memory space available for caching group buffers is fast but, as the file grows beyond the cache size, we reach the situation where we do a read/write pair for each record. Performance ultimately becomes dependent on the speed of the disk.
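To make the cache effect above concrete, here is a rough simulation. It is only a sketch under my own assumptions: the group cache is modelled as a simple least-recently-used cache, and a random number stands in for the hashing algorithm. None of the names or figures come from OpenQM itself.

```python
import random
from collections import OrderedDict

def simulate_load(num_records, num_groups, cache_groups, seed=0):
    """Return the fraction of writes whose target group was already cached.

    Assumption: the cache is a plain LRU over group numbers. The real
    caching policy may differ; this only illustrates the trend.
    """
    rng = random.Random(seed)
    cache = OrderedDict()  # group number -> True, ordered by recency
    hits = 0
    for _ in range(num_records):
        group = rng.randrange(num_groups)  # stand-in for hashing the record id
        if group in cache:
            hits += 1
            cache.move_to_end(group)       # mark as most recently used
        else:
            cache[group] = True            # group had to be read first
            if len(cache) > cache_groups:
                cache.popitem(last=False)  # evict least recently used group
    return hits / num_records

# Same cache size, two hypothetical file sizes (measured in groups).
small = simulate_load(20_000, num_groups=500, cache_groups=1_000)
large = simulate_load(20_000, num_groups=100_000, cache_groups=1_000)
print(f"file fits in cache: hit rate {small:.2f}")
print(f"file exceeds cache: hit rate {large:.2f}")
```

When the file fits in the cache, almost every write finds its group in memory. Once the file is far larger than the cache, nearly every write needs the group to be read first, which is the read/write pair per record described above.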
I think that if you leave it running, your data import will eventually finish. However, we need to consider whether the resulting file will actually be usable from a performance point of view. A query processor command with a selection clause must read the entire file, checking whether each record meets the selection criteria. This is going to take a while with an 80GB file.
If the file has indices, the select is much faster as it simply reads the index record. However, updating the index on a write or delete requires us to read, update and write the index record. It is likely that some of the index records will themselves be enormous, impacting performance.
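The trade-off between a full scan and an index can be sketched as follows. This is an illustration only, using in-memory Python dictionaries rather than anything from OpenQM. The field names (`year`, `amount`) and the function names are made up for the example.

```python
from collections import defaultdict

# Hypothetical data: record id -> record.
records = {
    "1001": {"year": 2020, "amount": 10},
    "1002": {"year": 2021, "amount": 25},
    "1003": {"year": 2021, "amount": 40},
}

def select_by_scan(data, year):
    """Unindexed select: every record is read and tested."""
    return sorted(k for k, rec in data.items() if rec["year"] == year)

def build_index(data):
    """One index entry per distinct year, holding all matching ids."""
    index = defaultdict(set)
    for key, rec in data.items():
        index[rec["year"]].add(key)
    return index

def write_record(data, index, key, rec):
    """Every write also has to read, update and write an index entry."""
    old = data.get(key)
    if old is not None:
        index[old["year"]].discard(key)  # remove id from the old entry
    data[key] = rec
    index[rec["year"]].add(key)          # add id to the new entry

year_index = build_index(records)
write_record(records, year_index, "1004", {"year": 2021, "amount": 5})
print(select_by_scan(records, 2021))  # reads every record
print(sorted(year_index[2021]))       # reads one index entry
```

Note that with billions of records and only a handful of distinct years, each index entry would hold an enormous list of ids, which is the concern raised above.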
You do not give any indication of what this data represents or how it is used but it may be helpful for me to give a simple example of how distributed files can give a big improvement in some selection operations. For the purpose of this explanation, I am going to assume that each record has a date field and that you have historic data going back many years.
Instead of storing all the data in a single file, you could split it up so that each year, or perhaps each month, has a separate data file. Now, if you want to select records from a specific year (and maybe apply other criteria), you can do your selection against just the relevant year's file. This can give substantial performance benefits.
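As a rough sketch of the per-year split described above, the following routes each record to a part chosen from its date field. It uses plain Python dictionaries as stand-ins for data files. The `SALES.2021` style of naming, the ISO date format and the function names are all assumptions of mine, not anything OpenQM prescribes.

```python
from collections import defaultdict

# Stand-in for a set of separate data files: part name -> {id: record}.
parts = defaultdict(dict)

def part_name(base, rec):
    """Pick the part from the year in the record's date (assumed YYYY-MM-DD)."""
    return f"{base}.{rec['date'][:4]}"

def write_record(base, key, rec):
    parts[part_name(base, rec)][key] = rec

def select_year(base, year, predicate):
    """Scan only the one part that can contain matching records."""
    part = parts.get(f"{base}.{year}", {})
    return sorted(k for k, rec in part.items() if predicate(rec))

write_record("SALES", "A1", {"date": "2020-03-14", "amount": 10})
write_record("SALES", "A2", {"date": "2021-07-01", "amount": 25})
write_record("SALES", "A3", {"date": "2021-11-30", "amount": 40})

print(sorted(parts))
print(select_year("SALES", 2021, lambda rec: rec["amount"] > 30))
```

The selection for 2021 never touches the 2020 part, so the cost of the scan depends on the size of one year's data rather than the whole history.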
But, of course, there are parts of your application that rely on it all being in one data file. No problem. You can define a "distributed file" that links all of the data files together and can be processed exactly as though it really were one file. A distributed file holds no data. It is simply a set of pointers to the individual part files that are to be linked as though they are one file. If the partitioning went to the one file per month level, you could define multiple distributed files giving different views of subsets of the data. There is an example of this in the OpenQM documentation.
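The idea of a file that holds no data, only pointers to its parts, can be sketched like this. It is a conceptual model in Python, not the OpenQM mechanism. The `DistributedView` class, the record id layout (`2021*0007`) and the rule that takes the part from the first four characters of the id are all hypothetical.

```python
class DistributedView:
    """Holds no data itself, only references to the part files."""

    def __init__(self, parts, part_for_key):
        self.parts = parts                # part name -> dict of records
        self.part_for_key = part_for_key  # rule mapping a record id to a part

    def read(self, key):
        return self.parts[self.part_for_key(key)].get(key)

    def write(self, key, rec):
        self.parts[self.part_for_key(key)][key] = rec

    def keys(self):
        # Processed as though it were one file: walk each part in turn.
        for part in self.parts.values():
            yield from part

# Stand-ins for three separate data files, one per year.
parts = {
    "2020": {"2020*0001": {"amount": 10}},
    "2021": {"2021*0007": {"amount": 25}},
    "2022": {},
}
by_year = lambda key: key[:4]  # assumed: the id starts with the year

all_years = DistributedView(parts, by_year)
recent = DistributedView({y: parts[y] for y in ("2021", "2022")}, by_year)

recent.write("2022*0001", {"amount": 40})  # lands in the 2022 part
print(sorted(all_years.keys()))            # the other view sees it too
print(sorted(recent.keys()))
```

Because both views point at the same underlying parts, a write through one is visible through the other, and a view over fewer parts gives a cheaper scan of just that subset.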
On Jul 1, 2022 at 3:41:01 AM, euobeto <ees....@gmail.com> wrote:
Hello everyone,
I'm starting to build a big file in OpenQM. It's a dynamic file that so far has 3 GB and over 5,990,000 records (every time I try a select to find out how many records it holds, my computer freezes, haha). I'm importing some CSV data; it's over 80 GB and billions of items (with 32 attributes each). I read something about split files or multifiles, but I didn't understand how to use them. Does anyone have any tips on this?
Thanks
--
Alberto Leal
T.I Campo Grande
LPI ID: LPI000191272
E-mail: alb...@tecwebcg.com
Gmail: ees....@gmail.com