large strings and JPL

Eric Kow

Sep 29, 2014, 9:12:11 AM
to swi-p...@googlegroups.com
Hi all,

I've been having some trouble lately with some Prolog code that wraps a Java library via JPL. I have a workaround for it, but I wanted to check with others to see whether I have the right understanding and solution.

I have a program that uses Java to tokenise a large text file, and which retrieves the text fragment that corresponds to each token found. If the text is large (e.g. 4788 words), the JVM runs out of heap space and dies. The code looks something like this:

get_snippets(LargeString, Js, Snippets) :-
     maplist(get_snippet(LargeString), Js, Snippets).

get_snippet(LargeString, J, Snippet) :-
     jpl_call(J, getSnippet, [LargeString], Snippet).

After poking around with VisualVM, I came to believe that making a lot of jpl_call invocations with the same large string results in many duplicate java.lang.String instances (and underlying char[] arrays) for that string. I'm not entirely clear on why this is the case, or what happens to these strings (I notice that we still reach heap exhaustion if we force the JVM to GC on each token, just much more slowly), but for now I've worked around the issue by wrapping the string in an object so that I can pass a shared reference to it.

get_snippets(Text, Js, Snippets) :-
     jpl_new('java.lang.StringBuffer', [Text], Text2),
     maplist(get_snippet(Text2), Js, Snippets).

That seems to solve my problem (actual code below):


There's a slight wrinkle to my solution, though. It seems like there's a JPL special case which means I can't create a java.lang.String object, and instead have to use some other type. Fortunately, the underlying library accepts any java.lang.CharSequence, so I was able to substitute java.lang.StringBuffer in its place (solely by virtue of its not being java.lang.String).
 
Does anybody have any insight into how the JPL behaves with strings?
Have I understood the problem correctly, or am I missing something interesting?

Thanks!

Eric

Paul Singleton

Sep 30, 2014, 6:29:55 AM
to swi-p...@googlegroups.com
Hi Eric, and anyone else interested...

java.lang.String instances are immutable, and Java compilers treat them like atoms, interning them in a string pool.

JPL maps Prolog (classic) atoms to & from String instances.

Unfortunately (for performance and efficiency, not for correctness) JVMs do not intern Strings created dynamically via the JNI.


You can override the default JVM heap sizes/limits by calling jpl_set_default_jvm_opts/1 before the first real JPL call initialises the JVM (option syntax, e.g. '-Xmx200m', may be JVM-specific).
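
For anyone wanting to try this, here is a minimal sketch of setting a JVM option from Prolog before the JVM starts. It assumes library(jpl) provides jpl_get_default_jvm_opts/1 and jpl_set_default_jvm_opts/1 taking a list of atoms; add_jvm_opt/1 is a hypothetical helper name, and '-Xmx512m' is only an illustrative heap limit, not a recommended value. Appending to the existing defaults rather than replacing them is a cautious choice, in case the defaults already carry options JPL relies on.

```prolog
:- use_module(library(jpl)).
:- use_module(library(lists)).

% add_jvm_opt(+Opt): hypothetical helper that appends one option atom
% to the default JVM options. It only has an effect if it runs before
% the first JPL call that actually starts the JVM.
add_jvm_opt(Opt) :-
    jpl_get_default_jvm_opts(Opts0),
    append(Opts0, [Opt], Opts),
    jpl_set_default_jvm_opts(Opts).

% Assumed example value; the option syntax may be JVM-specific.
:- add_jvm_opt('-Xmx512m').
```

Note that a larger heap only postpones exhaustion if every call still copies the large string, so this complements rather than replaces a shared-reference workaround.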

It seems some modern JVMs permit explicit interning of dynamically created String instances, but JPL by design relies only on the JNI contract which all conforming JVMs must implement.

You need a workaround which doesn't involve surgery on JPL ;-)

later - Paul Singleton

Eric Kow

Sep 30, 2014, 6:57:20 AM
to Paul Singleton, swi-p...@googlegroups.com
Thanks very much for the background information, Paul!

Assuming I'm reading correctly, I feel a bit reassured that my
wrap-it-in-StringBuffer solution is just about the best I can do for
now.

Cheers,

Eric

--
Eric Kow <http://erickow.com>