Memory corruption when Asian characters are encountered while importing into the db


Frank Smith

Mar 9, 2016, 10:04:27 AM
to Sparksee
Hi,
Following my two recent posts, and after dozens of trials importing different kinds of characters into a Sparksee db, I am fairly sure there is something I do not understand about how Asian characters are handled.

  type_t nodeTypeId = graph->FindType(L"CNODE");
  type_t CNode;
  type_t relType;
  attr_t CId;
  attr_t CName;
  attr_t RSText;
  attr_t RDataset;
  AttributeList relAttrs;

  if (nodeTypeId==Type::InvalidType) {
    std::wcout << "CNODE has to be created from scratch" << std::endl;
    CNode = graph->NewNodeType(L"CNODE");
    CId = graph->NewAttribute(CNode, L"ID", Long, Unique);
    CName = graph->NewAttribute(CNode, L"NAME", String, Indexed);
  }
  else {
    std::wcout << "CNODE already present" << std::endl;
    CNode = nodeTypeId;
  }

      attr_t nameAttrIdH = graph->FindAttribute(CNode, L"NAME");
      value->SetString(HeadNodeName);

The line that triggers the memory corruption and the subsequent crash is:
oid_t HeadObject = graph->FindOrCreateObject(nameAttrIdH, *value);
This happens only when Asian characters are encountered during the import.

Trying FindOrCreateObject with attribute: 9 and value: ランニングシューズ and line: 1578
*** Error in `./Graph': malloc(): memory corruption: 0x0000000001afc960 ***

If further memory errors occur, apparently triggered by the presence of Asian characters, the program crashes:


*** Error in `./ConceptNetGraph': free(): invalid next size (normal): 0x0000000005261f20 ***

**** CRITICAL ERROR (SIGNAL NUM 6)

------- Begin of call stack ------

/home/frank/sparkseecpp-5.2.0/lib/linux64/libsparksee.so(_ZN13sparksee_core21CallStackTraceHandler13SignalHandlerEi+0x28) [0x7f9c4a9f59a8]

/lib/x86_64-linux-gnu/libc.so.6(+0x36d40) [0x7f9c4a008d40]

/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39) [0x7f9c4a008cc9]

/lib/x86_64-linux-gnu/libc.so.6(abort+0x148) [0x7f9c4a00c0d8]

/lib/x86_64-linux-gnu/libc.so.6(+0x73394) [0x7f9c4a045394]

/lib/x86_64-linux-gnu/libc.so.6(+0x7f66e) [0x7f9c4a05166e]

/home/frank/sparkseecpp-5.2.0/lib/linux64/libsparksee.so(_ZN13sparksee_core5Value5ClearENS_8DataTypeE+0x143) [0x7f9c4a9fec73]

/home/frank/sparkseecpp-5.2.0/lib/linux64/libsparksee.so(_ZN13sparksee_core4Link6ExistsERKNS_5ValueERyPy+0x171) [0x7f9c4aa36231]

/home/frank/sparkseecpp-5.2.0/lib/linux64/libsparksee.so(_ZN13sparksee_core9GraphImpl10FindObjectEPNS_7TxGraphEyRNS_5ValueEbRy+0x2b5) [0x7f9c4ab561e5]

/home/frank/sparkseecpp-5.2.0/lib/linux64/libsparksee.so(_ZN13sparksee_core7TxGraph15GraphFindObjectEPNS_5GraphEyRNS_5ValueERy+0x76) [0x7f9c4aa58626]

/home/frank/sparkseecpp-5.2.0/lib/linux64/libsparksee.so(_ZN8sparksee3gdb5Graph10FindObjectEiRNS0_5ValueE+0x23) [0x7f9c4a9ae9d3]

./ConceptNetGraph() [0x403e3c]

/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f9c49ff3ec5]

./ConceptNetGraph() [0x405177]

-------- End of call stack -------


So my question is:
how can I handle Asian characters seamlessly in Sparksee?

Frank

c3po.ac

Mar 9, 2016, 10:20:01 AM
to Sparksee
Hi,

This memory corruption makes me think that the real problem is a mismatch between standard library versions.
Could you please try compiling your application without C++11?

Thanks.


Frank Smith

Mar 9, 2016, 12:14:54 PM
to Sparksee
Hi,
Everything I do is based on C++11, which is very stable and already fairly old (though not for gcc). So why should I use a different compiler just to make Sparksee work with Asian characters?
Any other ideas?

Frank

c3po.ac

Mar 10, 2016, 3:05:40 AM
to Sparksee

Hi,

It's not just about Asian characters.

The Sparksee API needs to receive or return Unicode strings in several methods. To make it easier to use on very different platforms, and to make it easier to build interfaces for other languages (Java, .NET, Objective-C, Python), std::wstring was chosen as the type for string arguments instead of something like arrays of wchar_t.

In most languages, that decision is irrelevant to the user, because the API of the language-specific layer uses the language's native String class instead of std::wstring, and it is Sparksee's language-specific layer that does the conversion.

But in C++ you are directly using the base API methods, which expect to receive a std::wstring. The problem is that there are several different implementations of the standard library. In the other languages that's not a problem, because both the C++ kernel and the language-specific layers are compiled with the same compiler options.
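In C++ the application therefore has to build the std::wstring itself. As a minimal sketch, assuming the imported data is UTF-8 and that wchar_t is wide enough to hold a full code point (true on Linux, not on Windows), a conversion could look like this. `Utf8ToWide` is a hypothetical helper, not part of the Sparksee API, and it does not reject overlong encodings or surrogate code points.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Sparksee): decodes UTF-8 bytes into a
// std::wstring, storing one code point per wchar_t.
// Assumes wchar_t can hold any code point (32-bit wchar_t, as on Linux).
std::wstring Utf8ToWide(const std::string& utf8) {
    std::wstring out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        unsigned long cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        // The sequence must not run past the end of the input.
        if (extra > utf8.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(utf8[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        out.push_back(static_cast<wchar_t>(cp));
        i += extra + 1;
    }
    return out;
}
```

The function uses no C++11 features, so it compiles in either mode. The resulting std::wstring is still only safe to pass to the library when both sides use the same standard library implementation.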

But with C++, if you set your compiler to use a specific implementation of the standard library (in this case a C++11 library) and Sparksee was compiled with a completely different standard library, the implementation of the std::wstring class that you pass as an argument to a Sparksee API method can be completely different from the implementation that the Sparksee API expects to receive.

Imagine that the C++11 implementation of the wstring class was an array of UTF-8 encoded bytes, and the stdlib implementation of wstring was a doubly linked list of 4-byte wchar_t characters. If you pass a pointer to the first implementation to a function expecting a pointer to the second, a crash is unavoidable. In reality the two implementations will not be that different, but different enough to be a problem. On some platforms the problem even exists between release and debug versions of the library.
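To make the layout problem concrete, here is a minimal sketch. The two structs are simplified, hypothetical layouts chosen for illustration, not the actual definitions from any standard library. The point is only that code compiled against one layout reads fields at offsets that do not exist in an object built with the other.

```cpp
#include <cstddef>

// Hypothetical layout A: the string object is a single pointer. The length
// would live in a header stored next to the characters on the heap.
struct WStringLayoutA {
    wchar_t* data;
};

// Hypothetical layout B: the string object itself stores the pointer, the
// length, and either a small inline buffer or the heap capacity.
struct WStringLayoutB {
    wchar_t*    data;
    std::size_t length;
    union {
        wchar_t     local[4];
        std::size_t capacity;
    };
};

// True when an object of layout A is too small to contain the `length`
// field that code compiled against layout B would read. In that case the
// read lands in whatever memory follows the object.
bool ReadWouldRunPastObject() {
    return offsetof(WStringLayoutB, length) + sizeof(std::size_t)
           > sizeof(WStringLayoutA);
}
```

With these layouts `ReadWouldRunPastObject()` returns true. A library built for layout B that receives a layout A object would read, and later possibly free or overwrite, memory it does not own, which is the kind of heap damage that malloc() and free() report.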

The only way it can work is to pass the arguments in the expected format. You could transform a C++11 wstring into a stdlib wstring, but using both libraries in your application and doing the conversion would be very difficult. It's a lot easier to just use the same standard library in both the application and the Sparksee library.

To use the same versions, if you don't need the C++11 features, you could just use the old standard library, so that your application uses the same library as Sparksee. That was the solution I suggested.

If you want to use C++11, then you need a version of Sparksee compiled with the same standard library version as your application.
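One way to see which standard library your application is actually built against is to print the macros that the library itself defines. This is a minimal sketch: `DescribeStdLib` is a hypothetical helper, and it only reports on the application side. How the Sparksee binary was built still has to be taken from its documentation. The macros used are `__GLIBCXX__` and `_GLIBCXX_USE_CXX11_ABI` (libstdc++) and `_LIBCPP_VERSION` (libc++); each use is guarded because they only exist with the respective library.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns a one-line description of the standard
// library this translation unit was compiled against.
std::string DescribeStdLib() {
    std::ostringstream out;
#if defined(_LIBCPP_VERSION)
    out << "libc++ " << _LIBCPP_VERSION;
#elif defined(__GLIBCXX__)
    out << "libstdc++ " << __GLIBCXX__;
#if defined(_GLIBCXX_USE_CXX11_ABI)
    // Only defined by libstdc++ releases that ship two string ABIs.
    out << " (_GLIBCXX_USE_CXX11_ABI=" << _GLIBCXX_USE_CXX11_ABI << ")";
#endif
#else
    out << "unknown standard library";
#endif
    // Size differences between builds hint at incompatible layouts.
    out << ", sizeof(std::wstring)=" << sizeof(std::wstring)
        << ", sizeof(wchar_t)=" << sizeof(wchar_t);
    return out.str();
}
```

Printing the result at startup, for example with `std::cout << DescribeStdLib() << std::endl;`, and comparing it across two builds of the application shows whether a change of compiler flags altered any of these values.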

Best regards,

