Maximum amount of data that ApSIC Xbench can handle in a single project?


Michael Beijer

Jul 19, 2013, 6:27:20 PM
to cafetra...@googlegroups.com
Does anyone here happen to know the maximum amount of data that Xbench can handle in a single project? I'm asking about both the new and old versions.

I remember hearing that it runs in RAM. Would this preclude using it to index 30 GB of TMXs?

Michael

Selcuk Akyuz

Jul 20, 2013, 10:13:36 AM
to cafetra...@googlegroups.com
Hi Michael,

I have a one-year subscription for the new version of Xbench, but I rarely use it. It is not a database program, so loading (and searching) 30 GB of data will take considerable time.

ApSIC Xbench is not an indexer. It reads the contents of the files each time you load a project and tries to respond to terminology queries very fast. Therefore, loading all of the Microsoft glossaries for a major language can take several minutes and requires a significant amount of memory. If you plan to load huge amounts of reference material, in the range of tens of millions of words, it is strongly recommended to have at least 1 GB of memory. If you don't have a very powerful machine, we do not recommend loading all the Microsoft glossaries, but rather a more focused selection covering the subject areas that pertain to your current translation project.
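For comparison with the load-everything-into-RAM approach, a streaming reader can scan a TMX of any size in near-constant memory. A minimal Python sketch, purely illustrative and not how Xbench actually works:

```python
# Illustrative sketch only, not Xbench's implementation. Streams a TMX
# file with iterparse, so memory use stays flat regardless of file size.
import xml.etree.ElementTree as ET

def search_tmx(path, term):
    """Yield (source, target) segment pairs whose source contains term."""
    term = term.lower()
    for _event, elem in ET.iterparse(path, events=("end",)):
        # TMX tags may or may not be namespaced; match on the local name.
        if elem.tag.rsplit("}", 1)[-1] == "tu":
            segs = ["".join(s.itertext()) for s in elem.iter()
                    if s.tag.rsplit("}", 1)[-1] == "seg"]
            if len(segs) >= 2 and term in segs[0].lower():
                yield segs[0], segs[1]
            elem.clear()  # discard the processed unit to keep RAM flat
```

The trade-off is the one described above: every query rescans the file, so for a 30 GB resource, repeated searches would be slow without an index on disk.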

I would use the External Database feature in CafeTran, or even the Rendezvous Memory Server, for such a large resource. External Databases have limitations for termbases (glossaries), as discussed in one of my previous posts, but they can be good for TMX files. What about the API for memoQ?
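To illustrate what a disk-backed database buys over loading everything into RAM (a hypothetical sketch, not CafeTran's actual External Database implementation): segment pairs are indexed once, and later lookups hit the disk index instead of a huge in-memory structure.

```python
# Hypothetical sketch, not CafeTran's External Database code. Segment
# pairs go into SQLite's full-text engine once; queries then use the
# on-disk index rather than a 30 GB in-memory structure.
import sqlite3

def build_index(db_path, pairs):
    """Create (or reuse) an FTS index and load (source, target) pairs."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS tm USING fts5(source, target)")
    con.executemany("INSERT INTO tm (source, target) VALUES (?, ?)", pairs)
    con.commit()
    return con

def lookup(con, term):
    """Full-text match across both columns."""
    return con.execute(
        "SELECT source, target FROM tm WHERE tm MATCH ?", (term,)
    ).fetchall()
```

Building the index is slow once, but after that queries are near-instant and memory use stays low, which is exactly the difference between a database program and a load-on-every-start tool.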

Selcuk

Hans van den Broek

Jul 20, 2013, 10:33:01 AM
to cafetra...@googlegroups.com

On Jul 20, 2013, at 9:13 PM, Selcuk Akyuz wrote:

I would use External Database feature in CafeTran and even Rendezvous Memory Server for such a large resource. External Databases have limitations for termbases (glossaries) as discussed in one of my previous  posts but it can be good for TMX files.

I must have missed that posting, Selçuk. It's a pity, because it may explain some things I noticed yesterday. Earlier, I loaded a 1.5 GB TMX file into the external database, and everything was fine: loading didn't take long, and searching wasn't slower than with the regular (RAM) approach. But yesterday I loaded a 500 MB text file into the external database, and that took about half an hour. Since it was only a test, I checked RAM use while loading, and it topped out at 6.5 GB, more than the 1.5 GB TMX file (plus some other, smaller TMX files) uses in the regular mode. And after loading, Activity Monitor still indicated that 6.5 GB was in use. In which thread can I find your posting on this matter?

Cheers,

Hans

-- 

Hans van den Broek
Schrijf-, vertaal- en redigeerwerk
Peleman Rejowinangun KG1/513
RT029 RW009
Yogyakarta 55171
Indonesia
SKYPE: hanstranslations
Ah, but I was so much older then
I'm younger than that now



Hans van den Broek

Jul 20, 2013, 10:38:27 AM
to cafetra...@googlegroups.com
And while I'm at it: another posting of mine on the subject, on ProZ, still hasn't been approved, most likely because it contains the word "bastards" (as in "big bastards", meaning big TMX files): http://www.proz.com/forum/apple_mac_operating_systems/253039-how_to_search_very_large_tmx_files_on_a_mac.html


[quote]John Moran wrote:
Assuming you have more than 4 GB RAM, OmegaT has no problems with 3 GB TMs[/quote]
I'm pretty sure <i>der Wilhelm</i> is well aware of that solution. Like me, he uses CafeTran. Loading and searching large files in CT is no problem, and you can even run two instances of CT at the same time. And if that isn't enough, you can load a huge TMX file as an "external" database, in which case it uses very little RAM. Let me rephrase Wilhelm's question:

<i>How can I search large TMX (and other) files on a Mac, <i>outside</i> my CAT tool?</i>

There are two problems with that:
- You can't open documents (not files) exceeding around 350 MB on a Mac with apps that don't run under Java (I don't know if there are other solutions, but I doubt it).
- Spotlight/SpotInside cannot search TMX files.

So to search those beasts, you need a Java application, or you still need a Java application to open the TMX file, convert it to TXT, and split it into files OS X can handle, i.e. smaller than 300 MB.

I still don't know how to do it. I tried Martin's solution (above), but a 1.5 GB TMX file didn't open in MacVim. I tried to increase the Java heap for MacVim, to no avail, mainly because MacVim isn't a Java app.

Der Wilhelm suggested UltraEdit (Java). It seems the new beta can split files, so that could be a solution, but the latest build I downloaded can't split.

I have spent so many hours trying to solve this issue that I could have learned the contents of those databases by heart. I'm sick of it. But I'm sure everybody knows we're talking about the EU files (DGT and Eurobook), and I happen to translate EU notifications. What's worse, from two source languages, ENG and GER, into DUT. I need those big bastards.

Cheers,

Hans



Hans van den Broek

Jul 20, 2013, 10:53:55 AM
to cafetra...@googlegroups.com

On Jul 20, 2013, at 9:33 PM, Hans van den Broek wrote:

and after loading, the Activity Monitor still indicated that 6.5 GB was in use.

My educated (though not in this matter) guess is that the TXT file is loaded into RAM rather than kept on the HDD, because OS X won't handle files larger than around 350 MB.

Selcuk Akyuz

Jul 20, 2013, 11:14:10 AM
to cafetra...@googlegroups.com
Hi Hans,

See my questions and Igor's explanations here https://groups.google.com/forum/#!topic/cafetranslators/yMuz8OmLG54

Selcuk



Igor Kmitowski

Jul 20, 2013, 4:23:36 PM
to cafetra...@googlegroups.com
Hello Hans,

> On Jul 20, 2013, at 9:13 PM, Selcuk Akyuz wrote:
>
>> I would use External Database feature in CafeTran and even Rendezvous
>> Memory Server for such a large resource. External Databases have
>> limitations for termbases (glossaries) as discussed in one of my
>> previous posts but it can be good for TMX files.
>
> I must have missed that posting, Selçuk. It's a pity, because it may
> explain some things I noticed yesterday. Earlier, I loaded a 1.5 GB TMX
> file in the external database, and everything was fine. Loading didn't
> take long, searching wasn't slower than when using the regular - RAM -
> approach. But yesterday, I loaded a 500 MB text file into the external
> database, and that took about half an hour. Since it was only a test, I
> checked RAM use when loading, and it topped out at 6.5 GB, more than a
> 1.5 GB TMX file (plus some other, smaller TMX files) uses in the regular mode,
> and after loading, the Activity Monitor still indicated that 6.5 GB was
> in use. In which thread can I find your posting on this matter?

What kind of text file did you try loading into the External DB? It is
not meant for documents, although the idea sounds interesting.

Igor

--
Igor Kmitowski
Translator and Java developer
CafeTran website: http://www.cafetran.com
CafeTran support: cafetran...@gmail.com

Hans van den Broek

Jul 20, 2013, 7:21:45 PM
to cafetra...@googlegroups.com

On Jul 20, 2013, at 10:14 PM, Selcuk Akyuz wrote:

See my questions and Igor's explanations here https://groups.google.com/forum/#!topic/cafetranslators/yMuz8OmLG54

Thank you, Selçuk. Now I remember why I forgot it... Too difficult for mere mortals.

Hans van den Broek

Jul 20, 2013, 7:24:29 PM
to cafetra...@googlegroups.com

On Jul 21, 2013, at 3:23 AM, Igor Kmitowski wrote:

> What kind of a text file did your try loading into the External DB? It is not meant for documents although the idea sounds interesting.

A tab-delimited version of a DGT. Entries too long?
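That conversion can also be done as a stream, for what it's worth; a sketch (assuming the usual tu/tuv/seg layout and a bilingual file) that writes one source-TAB-target line per translation unit, so a DGT-sized TMX never has to fit in memory:

```python
# Sketch: stream a TMX into tab-delimited "source<TAB>target" lines.
# Tabs and newlines inside segments are flattened to spaces, since
# they would otherwise break the row structure.
import xml.etree.ElementTree as ET

def tmx_to_tab(tmx_path, out_path):
    """Return the number of translation units written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for _event, elem in ET.iterparse(tmx_path, events=("end",)):
            if elem.tag.rsplit("}", 1)[-1] == "tu":
                segs = ["".join(s.itertext()) for s in elem.iter()
                        if s.tag.rsplit("}", 1)[-1] == "seg"]
                if len(segs) >= 2:
                    row = [s.replace("\t", " ").replace("\n", " ")
                           for s in segs[:2]]
                    out.write("\t".join(row) + "\n")
                    count += 1
                elem.clear()  # keep memory use flat
    return count
```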