How to use UTF-8 mode with QEWD and YottaDB?

Annop kobhirun

Jan 19, 2018, 7:52:10 AM
to Enterprise Web Developer Community

Hi everyone

I have a problem with my QEWD app and the Thai language.
When I insert a value in Thai into the DB via QEWD, the result is as shown in the picture below.

When I fetch the value to show it in the front end, the result is as shown in the picture below.

So I googled and found a solution that suggests installing ICU and exporting these variables:

export gtm_chset=utf-8 gtm_icu_version=5.5

Then I looked in the DB again, and the result was correct.

But when I run "node qewd" to start my application, I get the error shown in the picture below.

Then I tried to run the nodem example to see the result, and got the error.

I am a newbie with QEWD and GT.M. Maybe this problem is easy, but I don't know the basics well enough.

Please suggest how I can solve this problem.

PS. I used this link to install QEWD and YottaDB.

Sorry for my poor English.

Regards


Annop

Screen Shot 2561-01-19 at 18.38.33.png

Sam Habiel

Jan 19, 2018, 8:28:25 AM
to enterprise-web-de...@googlegroups.com
Good morning.

Search this group for issues with Arabic with NodeM. You should find some hints.

--Sam

> --
> You received this message because you are subscribed to the Google Groups
> "Enterprise Web Developer Community" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to enterprise-web-develope...@googlegroups.com.
> To post to this group, send email to
> enterprise-web-de...@googlegroups.com.
> Visit this group at
> https://groups.google.com/group/enterprise-web-developer-community.
> For more options, visit https://groups.google.com/d/optout.

Rob Tweed

Jan 19, 2018, 8:30:28 AM
to Enterprise Web Developer Community
Sam - I think the issue is in ewd-document-store

If someone wants to figure out how to fix it, that would be good.

Whatever the fix is, it needs to not break ewd-document-store for non UTF-8 users

Rob






--
Rob Tweed
Director, M/Gateway Developments Ltd
http://www.mgateway.com

Sam Habiel

Jan 19, 2018, 8:33:17 AM
to enterprise-web-de...@googlegroups.com
Thanks Rob.

Annop, can you try to come up with a simple test case illustrating the
problem? I don't see an issue above except with how your terminal is
displaying data (which has nothing to do with how the Database stores
it).

--Sam

DL Wicksell

Jan 19, 2018, 11:10:43 AM
to Enterprise Web Developer Community
Hi Annop,

 The same error is happening when you start qewd.js, and when you test using set.js.
It is happening because you had already compiled the M language file that Nodem uses
to interface with YottaDB/GT.M while you were in the 'M' character set mode, and therefore
that object file won't work while you are in 'utf-8' character set mode. Thankfully, the solution
is simple: remove the v4wNode.o file, and recompile it while in 'utf-8' mode. You will want to
run these commands after ensuring you are in 'utf-8' mode:

cd ~/qewd/node_modules/nodem/src/
rm v4wNode.o
mumps v4wNode.m
cd -

 That should fix your issue. If not, let me know, and I can try to help you further.

David Wicksell
Owner/CEO
Fourth Watch Software, LC

Rob Tweed

Jan 19, 2018, 11:19:18 AM
to Enterprise Web Developer Community
That's interesting, David

One question: what's the command(s) to use to "ensure you are in 'utf-8' mode" before running those commands?

Cheers

Rob



Annop kobhirun

Jan 19, 2018, 11:30:36 AM
to Enterprise Web Developer Community
Thank you for your help. My test case flow is shown below.

1. This is my form and my input data ( annop584, password, อรรรนพ, กอบหิรัญ )

2. I insert the data into the DB with this code:

   var usersGlobal =  new this.documentStore.DocumentNode('USERS');
   usersGlobal.$([id, 1]).value=field1;

3. I open the DB shell to see my result, but the value that I highlighted appears as "$C(25,30)"^"$C(1)"-"$C(26)"+4#1"$C(13)"
instead of "อรรนพ"^"กอบหิรัญ"

4. I tried to solve the problem by installing ICU and changing gtm_chset from m to utf-8:

export gtm_chset=utf-8 gtm_icu_version=5.5
source /usr/local/lib/yottadb/r110/gtmprofile

5. I tried to restart the qewd app again with "node qewd.js", but I got the error "SyntaxError: Need to supply a 'data' property"



--Annop

Annop kobhirun

Jan 19, 2018, 11:42:31 AM
to Enterprise Web Developer Community
That worked, David!!


I was stuck on this problem all day, but now it works!

I really appreciate it. Thank you so much ^^


--Annop

DL Wicksell

Jan 19, 2018, 11:56:28 AM
to Enterprise Web Developer Community
Rob,

 The easiest way to tell is to go in to programmer mode and look at the $zchset intrinsic variable. Here is an example on my laptop, though I'm not running UTF-8 at the moment.

dlw@gondor:~$ mumps -dir

WV>w $zch
M
WV>


--

David Wicksell
Owner/CEO
Fourth Watch Software, LC



DL Wicksell

Jan 19, 2018, 12:05:04 PM
to Enterprise Web Developer Community
Hi Annop,

 You are welcome, I'm glad that was your only issue and you got it all sorted out. By the way, your English is fine, I understood
everything you were saying without difficulty. Good luck and enjoy QEWD, NodeM, and YottaDB!


--

David Wicksell
Owner/CEO
Fourth Watch Software, LC


K.S. Bhaskar

Jan 19, 2018, 12:06:29 PM
to Enterprise Web Developer Community
The mode (M mode vs. UTF-8 mode) must be set before starting the process. $zchset only tells you what mode a process is in. The environment variable gtm_chset can be set to "UTF-8" for UTF-8 mode (anything else, including "M", starts it in M mode).

Regards
– Bhaskar

Rob Tweed

Jan 19, 2018, 12:15:27 PM
to Enterprise Web Developer Community
Ok I think this resolves an issue that a number of QEWD users have identified in the past few months.  I think I have the information needed to explain how to set up / reconfigure QEWD for UTF-8 use

Cheers all

Rob


Sam Habiel

Jan 19, 2018, 12:19:22 PM
to enterprise-web-de...@googlegroups.com
Rob,

While the problem is solved, I didn't understand how it got solved! I
didn't think GTM/YDB changed how they handled a bytestream from a C
API based on the value of $ZCHSET.

--Sam

DL Wicksell

Jan 19, 2018, 12:53:24 PM
to Enterprise Web Developer Community
Hi Sam,

  This will be the over-simplified version, but it is because the current YottaDB/GT.M call-in interface does not call
directly in to the data engine, like some other C APIs for databases might. It actually calls in to the M language
execution engine, and uses a call-in table to map C data structures to M data structures, so data can be passed
back and forth between the different language environments. All of the accessing of the M database is done in the
M code in v4wNode.m, while the C++ code in mumps.cc does all the mapping between the V8 API in C++
(that Node.js uses)and the GT.M call-in API in C. Therefore the character set that affects the encoding of data in
YottaDB and GT.M via the M language, also affects NodeM, and therefore, QEWD when using NodeM, at least as
far as the back end database is concerned.

 However, the YottaDB team is working on a full C DAL (Data Access Layer), which they are calling SimpleAPI I believe.
Once that is completed, there will then exist a C API that accesses the data storage engine directly, bypassing the M
execution engine, and providing a much simpler, and much faster way to interface something like Node.js with YottaDB.
Though FIS may not take that code back in to GT.M, so it may only be accessible from YottaDB, and I'm not sure if it
will include function and procedure APIs, which may work best with the current call-in interface.

 I think part of the confusion is that some vendors will use the term call-in interface, when they are referring to what is
more of a DAL. I hope that clears up the confusion for you. Thanks.



--
David Wicksell
Owner/CEO
Fourth Watch Software, LC


Sam Habiel

Jan 19, 2018, 12:56:31 PM
to enterprise-web-de...@googlegroups.com
David,

I don't think the actual byte stream that gets saved to the database
should change at all from changing the YDB/GTM UTF-8 mode. It's not
like we are switching the interpretation of data from big endian to
little endian. The UTF-8 encoded string should be saved into GTM/YDB
the same way no matter what the UTF-8 mode is. I am missing something
here.

--Sam

K.S. Bhaskar

Jan 19, 2018, 1:55:55 PM
to Enterprise Web Developer Community
As we are attempting to develop YottaDB as a free / open source software (FOSS) project, rather than just putting out periodic releases under a FOSS license, and we are trying to write software to match the documentation, you can see the C API we are building (https://docs.yottadb.com/MultiLangProgGuide/index.html). Caveat: as the software is still under construction, there will be times that we need to change the documentation to match the software – so consider what is there to be a blueprint rather than a map documenting what exists.

As values in a database are just bytes without any meaning or interpretation assigned to them, a database file has no notion of M mode or UTF-8 mode, and the bytes of data that get stored in a database are indeed the same whether a process accessing them is in M mode or UTF-8 mode. Indeed, two processes, one in M mode and another in UTF-8 mode, can access the same database at the same time, and see the same bytes therein.

A process is in either M mode or UTF-8 mode. This means that functions like $length(), $piece(), etc. can return different results on the same input, depending on the mode of the process.
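
As a rough illustration of that point outside of M, here is a small Node.js sketch. It is only an analogy, not YottaDB code: decoding as 'latin1' stands in for M mode (one character per byte) and decoding as 'utf8' stands in for UTF-8 mode. The variable names are made up for the example.

```javascript
// Sketch: the same three bytes measured two ways, loosely analogous to
// $length() in UTF-8 mode versus M mode.
const bytes = Buffer.from('€', 'utf8');      // the bytes e2 82 ac

const asUtf8 = bytes.toString('utf8');       // decoded as UTF-8 characters
const asBytes = bytes.toString('latin1');    // decoded as one character per byte

console.log(bytes.length);         // 3 (bytes stored)
console.log([...asUtf8].length);   // 1 (characters when read as UTF-8)
console.log(asBytes.length);       // 3 (characters when read byte by byte)
```

The bytes never change; only the answer to "how many characters is this?" does.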

The bytes sent to a terminal are often the same, regardless of mode, but are not required to be. For example, control characters and escape sequences may be processed differently.

The actual bytes sent to a terminal device are rendered according to the terminal settings. Thus, even if a process is in UTF-8 mode, but a terminal emulator is set to a different mode, writing a legitimate UTF-8 string to the terminal device can result in the terminal emulator rendering it as something different. Conversely, a process may be in M mode, but if the terminal emulator is in UTF-8 mode, the process may write binary data to the terminal device, and have it show up as UTF-8 characters.
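
A minimal Node.js sketch of that rendering mismatch, assuming the stored value is the UTF-8 encoding of one of the Thai strings from earlier in this thread. Decoding with 'latin1' is used here as a stand-in for a terminal set to a single-byte character set; the variable names are illustrative only.

```javascript
// Sketch: identical bytes, two renderings. Nothing in the "database" changes.
const original = 'กอบหิรัญ';
const stored = Buffer.from(original, 'utf8');   // bytes as they would be stored

const renderedAsUtf8 = stored.toString('utf8');       // terminal in UTF-8 mode
const renderedAsLatin1 = stored.toString('latin1');   // terminal in a single-byte mode

console.log(renderedAsUtf8 === original);     // true
console.log(renderedAsLatin1 === original);   // false, it shows up garbled

// Re-encoding the garbled rendering byte for byte gives back the same bytes.
console.log(Buffer.from(renderedAsLatin1, 'latin1').equals(stored));   // true
```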

The object code generated is different between M mode and UTF-8 mode. A mismatch between the mode of the process attempting to execute an object file and that of the object file results in an INVOBJFILE error.

If you want to use both M mode and UTF-8 mode, look at https://docs.yottadb.com/ProgrammersGuide/langext.html#extensions-for-unicode-support

Regards
– Bhaskar

DL Wicksell

Jan 19, 2018, 2:17:30 PM
to Enterprise Web Developer Community
Hi Sam,

 I understand your point, and it is not unreasonable. However, there
are a few reasons why it works this way. The data coming through the C
call-in API, is bound to arguments to M functions, so is treated the
same as if you passed those strings of data in regular M code. If you
were in M character set mode, and you passed in a UTF-8 character, it
might be encoded incorrectly in the database. The UTF-8 character
might be considered one character, encoded with more than one byte,
while YottaDB/GT.M might encode it as separate characters. And of
course, there are characters that can be saved in M mode, which are
illegal in UTF-8 mode, and vice versa.

 Maybe this example of just using one UTF-8 character might help. My
terminal is in UTF-8 mode, but GT.M starts off in M mode, and then I
change it to UTF-8 mode. I just copied a UTF-8 character from a web
page here. See the differences?



dlw@gondor:~$ mumps -dir

WV>w $zch
M
WV>s dlw="���"

WV>w dlw

WV>zwr dlw
dlw="�"_$C(130)_"�"

WV>w $l(dlw)
3
WV>w $zl(dlw)
3
WV>h
dlw@gondor:~$ export gtm_chset=utf-8 gtm_icu_version=5.7

dlw@gondor:~$ mumps -dir

WV>w $zch
UTF-8
WV>s dlw="€"

WV>w dlw

WV>zwr dlw
dlw="€"

WV>w $l(dlw)
1
WV>w $zl(dlw)
3
WV>h
dlw@gondor:~$


 I hope I'm explaining this well enough. Anyway, you can see that it
treats bytes of data differently while in UTF-8 mode, and people use
NodeM and QEWD to access a lot of M APIs, where character lengths need
to be correct, as well as other differences. This is not my area of
expertise, but I know that if you simply pass a byte stream all the
way through from QEWD to NodeM to YottaDB, with the web page in UTF-8
mode and YottaDB in M mode, you can have issues. For one thing, the V8
API really wants to work in UTF-8, and I had to write special code to
even be able to pass other characters, from older character sets,
which use the code points between 128 and 255 (different characters in
UTF-8 than in other extended upper-bit character sets, or even illegal
code points used for shifting in UTF-8). Many users of NodeM require
the ability to use it with those older extended single-byte character
sets.

 This is a complex topic, as you know, and we are dealing here with
multiple technologies, passing data back and forth, in complex ways.
So at the end of the day, I can say that QEWD and NodeM will work the
way most people want, if they make sure to set up YottaDB or GT.M with
the character encoding (M or UTF-8) that they need to use. Thanks.



--
David Wicksell
Owner/CEO
Fourth Watch Software, LC


DL Wicksell

Jan 19, 2018, 2:24:12 PM
to Enterprise Web Developer Community
Ok, after reading Bhaskar's reply, which I had missed, I can see that I
was wrong in this first part, "If you were in M character set mode, and
you passed in a UTF-8 character, it might be encoded incorrectly in the
database. The UTF-8 character might be considered one character,
encoded with more than one byte, while YottaDB/GT.M might encode
it as separate characters." It won't be encoded wrong, but interpreted
wrong. But I think the rest of what I wrote is still correct. Thank you.



--
David Wicksell
Owner/CEO
Fourth Watch Software, LC


DL Wicksell

Jan 19, 2018, 3:00:49 PM
to Enterprise Web Developer Community
 Thank you Sam and Bhaskar. Now that I think about it more, I think I should be able
to decouple the encoding in NodeM from the YottaDB/GT.M database character set
mode, such that I can add an 'encoding' parameter to the open API, that can be set
to 'utf-8' or 'UTF-8' on the one hand, or 'm' or 'M' or 'byte' on the other hand, which
will default to whatever the local YottaDB/GT.M database environment is set to, if you
don't specify an 'encoding' parameter. I am working on NodeM right now, so maybe I
can get that working, and you can set the encoding in NodeM directly, and not have to
worry about which character set mode the YottaDB/GT.M object files were compiled
against. I'll let you know if I can get that going, or run in to any issues with it. Thanks again.

Sam Habiel

Jan 22, 2018, 10:30:13 AM
to enterprise-web-de...@googlegroups.com
"For one thing, the V8 API really wants to work in UTF-8, and I had to
write special code to
even be able to pass other characters, from older character sets,
which use the code points between 128 and 255 (different characters in
UTF-8 than in other extended upper-bit character sets, or even illegal
code points used for shifting in UTF-8)."

Maybe that's the issue? Your example in GT.M/YDB above proved my
point. The storage of what you put in doesn't change whether you are
in M mode or UTF-8 mode.

Can you point to the code in NodeM that does sets?

--Sam

David Wicksell

Mar 12, 2018, 6:50:55 PM
to Enterprise Web Developer Community
Sam and everyone,

 So, I'm about to release NodeM version 0.10.0, which will decouple the character encoding that NodeM
uses from YottaDB/GT.M. This means that you will be able to use NodeM in UTF-8 mode (which will now be
the default), or in M mode. You will be able to set that in the call to the open API, as {charset: 'm'}; ascii
or binary will also work instead of m, in keeping with the language of YottaDB/GT.M (which uses m) and
Node.js (which uses binary and, for backwards compatibility, ascii). Then NodeM will work correctly regardless
of which encoding is configured in the YottaDB/GT.M environment itself. This will also make it much easier to
support other languages encoded in single bytes directly, such as in the iso-8859-*/Windows-*/CP* standards.
Previously I had described how to convert an application from one of those character sets to UTF-8 and back
again, to support foreign language character sets with NodeM, but that might not work with VistA, as VistA uses
some characters that are not valid UTF-8 characters in some of its globals, specifically 4 that I know of. But
now you will be able to simply use NodeM with:

    db.open({charset: 'm'});

 Then set what character set you are using in your web page, or terminal emulator, or whatever you are using
with NodeM, and directly store those characters in the database, without having to do any conversions.
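
 For anyone who wants to check a round trip for themselves, here is a minimal sketch. It assumes the database object offers NodeM-style set() and get() calls that take {global, subscripts, data} and return an object with a data property; treat that shape as an assumption and check the README. The roundTrip and makeFakeDb names are hypothetical helpers for this example, and the fake in-memory object is there only so the sketch runs without YottaDB installed.

```javascript
// Hypothetical helper: store a value and read it back through any object
// that offers NodeM-style set() and get() calls (call shape assumed).
function roundTrip(db, global, subscripts, value) {
  db.set({ global, subscripts, data: value });
  return db.get({ global, subscripts }).data;
}

// In-memory stand-in so the sketch runs on its own; it is not part of NodeM.
function makeFakeDb() {
  const store = new Map();
  const key = (node) => JSON.stringify([node.global, node.subscripts]);
  return {
    set(node) { store.set(key(node), node.data); return { ok: true }; },
    get(node) { return { ok: true, data: store.get(key(node)) }; },
  };
}

const thaiName = 'กอบหิรัญ';
const readBack = roundTrip(makeFakeDb(), 'USERS', ['annop584', 1], thaiName);
console.log(readBack === thaiName);   // true
```

 With a real connection, the db argument would be the NodeM object after its open call, and the comparison should still come out true if the charset is set consistently.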

 The reason it was coupled before was because of the v4wNode.m source file, which is compiled and run in
the same mode that YottaDB/GT.M is configured for in that environment, and I didn't have the insight I gained
from this thread that the character set/encoding was only applicable to the commands and functions in YottaDB/
GT.M source code, and doesn't change the storage of those characters in the database. I probably should have
realized that, but I have fixed it now. This will also mean that you do not even have to configure YottaDB/GT.M
for Unicode, in order to use Unicode with NodeM. You will still have to configure YottaDB/GT.M for Unicode, for
any other use of YottaDB/GT.M with UTF-8 data, as per the documentation.

 I'm really hoping to release NodeM 0.10.0 by the end of this week, or sometime next week, so watch for that.
Another thing that it will support is reverse $query for YottaDB version 1.10 or newer, as the previous_node API.

Thank you.
David Wicksell
Fourth Watch Software LC

David Wicksell

Mar 12, 2018, 7:20:13 PM
to Enterprise Web Developer Community
Everyone,

 I wanted to clarify one thing. My previous post made it sound like there was a big change in how NodeM
handles character set encodings, but really, that is only partly true. There are two changes introduced in
the forthcoming NodeM version 0.10.0 release; the default encoding is now UTF-8, rather than M, and
that encoding system is now decoupled and independent from the underlying YottaDB/GT.M environment
settings with regard to character set encoding.

 Before this change, you could work with single-byte character sets just fine, and in fact, that was the default.
You didn't have to convert single-byte characters to UTF-8 and back again. So the really important and nice
change is that you don't have to care about whether you have installed and configured YottaDB/GT.M with
Unicode support, in order to support Unicode (via UTF-8) while using them with NodeM. NodeM will properly
interpret (store and retrieve) characters, depending upon the charset property in the open API (defaulting to
UTF-8), and not dependent upon what the gtm_chset environment variable is set to. You also don't have to
worry about which version of ICU is installed on your system (while using NodeM).

 However, keep in mind that Node.js will always default to try to interpret your characters as UTF-8, even when
you have charset set to m/binary/ascii. I will probably write up a more detailed post on how to easily use Node.js
properly with NodeM and single-byte non-ASCII character sets.
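
 In the meantime, here is a small sketch of the underlying Node.js behaviour. It assumes that a value read with charset set to m arrives as a string in which each character stands for one byte (code points 0 to 255); that is my reading of the above, not a documented guarantee. The variable names are only for illustration.

```javascript
// Assumption: with charset 'm', each character of the string is one byte.
const value = '\u00e9';   // the byte 0xE9, which is "é" in ISO-8859-1

// By default Node.js encodes strings as UTF-8 on output, so one byte becomes two.
const defaultBytes = Buffer.from(value, 'utf8');

// Encoding as 'latin1' (Node.js also accepts the alias 'binary') keeps the byte.
const rawBytes = Buffer.from(value, 'latin1');

console.log(defaultBytes);   // <Buffer c3 a9>
console.log(rawBytes);       // <Buffer e9>

// Writing rawBytes with process.stdout.write(rawBytes) sends 0xE9 unchanged.
```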


Thank you.
David Wicksell
Fourth Watch Software LC



Sam Habiel

Mar 14, 2018, 4:36:08 PM
to enterprise-web-de...@googlegroups.com
David,

Thank you for your work. This is difficult to digest. A table may help (maybe?) but I don't understand it enough.

--Sam


David Wicksell

Mar 14, 2018, 5:23:55 PM
to Enterprise Web Developer Community
Hi Sam,

 Hmmm, well I'm not sure what you didn't understand, but you can always ask me any question you like, and I'll do my best to answer. That
being said, once I release 0.10.0 I plan to write up something a bit more on how to use NodeM to work with character sets, and maybe that
will help? Anyways, thanks for the kind words, and I'll try to be as clear as possible in the future.

- David Wicksell
Fourth Watch Software LC

Dileep V S

Oct 8, 2018, 3:55:56 AM
to Enterprise Web Developer Community
Hi Rob,

I am using redis and am still facing this problem. Do you have any thoughts on fixing this for Redis?

regards
Dileep

Sam Habiel

Oct 16, 2018, 4:22:44 PM
to enterprise-web-de...@googlegroups.com
Hello all,

I don't know what happened, but I saw this with my own eyes:

I upgraded from nodem 0.11.2 to 0.12.1 and qewd 2.36.1 to 2.39.2 and
UTF-8 handling in Panorama broke! Everything used to work well before.

I tried nodem from the command line, and it seems to get the right
data without any garbling, and so I think something changed on the
QEWD end.

I haven't investigated it yet... I am putting this out there just in
case somebody else sees this.

Sam Habiel

Oct 17, 2018, 11:25:12 AM
to enterprise-web-de...@googlegroups.com
Rob, you did this:

https://github.com/robtweed/ewd-qoper8-gtm/commit/e7aaeb66d99fc1ffa61a60d21b1a9d752e8c3a86

Why???

Once I undid this, everything went back to working.

I thought we would try by default to inherit what the environment
tells us (in my case, gtm_chset is set to utf-8).

I now see the problem: ./setEnvironment does not check gtm_chset,
which it SHOULD!

David, care to comment? How should we resolve this?

--Sam

DL Wicksell

Oct 17, 2018, 1:56:53 PM
to enterprise-web-de...@googlegroups.com
Hi Sam,

 So, here is my comment. A while back, I decoupled the handling of
character set encodings between NodeM and the underlying YottaDB or GT.M
database. So it doesn't really matter what character set the underlying
MUMPS environment is set to, and NodeM doesn't care what $gtm_chset is
set to. This allows flexibility, and ease of installation and
configuration for NodeM.

 NodeM will default to using a UTF-8 character encoding, regardless of
what the underlying MUMPS language environment is configured to use. The
only time you need to think about what $gtm_chset is set to when working
with NodeM, is when using the function or procedure/routine APIs, as
they might end up calling MUMPS code that might do things that would be
interpreted based on $gtm_chset. I'm here talking about things like
$length or $extract in the MUMPS code, that might work differently,
depending upon $gtm_chset. But that is also one of the reasons I
decoupled the encoding configuration like that. You might want to
interpret everything in your application as UTF-8, and then call in to
an old MUMPS API, which needs to interpret that data as a simple byte
stream, in a particular locale.

 Another reason for defaulting to UTF-8, is that Node.js will interpret
any data you write out with something like console.log as UTF-8 by
default. You have to change Node.js's default character encoding to
binary in order to have it properly write out your m or binary data with
the correct glyphs of your chosen non-unicode single byte character set.
The web is also typically written to encode data as UTF-8 these days. So
it just makes intuitive sense.

 That being said, as you know, you can easily set the character set to
be other than UTF-8, with the charset property in the open call to
NodeM. That will then interpret everything as a single byte, based on
the current locale. But remember, if you do that, you will need to set
charset: 'binary' in NodeM (either binary, ascii, or m will work here),
call process.stdout.setDefaultEncoding('binary') in Node.js (either
binary or ascii will work here), and set charset=<character encoding> in
the appropriate HTML tag of your web page or application, to make sure
everything is handled correctly all the way through the application you
are developing. Also, if you are using Node.js in a terminal emulator,
you would also need to set the character encoding to the locale you want
in the terminal emulator, which is usually a drop down box in the
terminal's GUI.

 So now I want to address the other questions inherent in your last
post. I could have NodeM default to what is in $gtm_chset (defaulting to
M mode if it is unset), rather than UTF-8, but I don't think that is the
right way to go, because a lot of people don't really understand that
aspect of configuring YottaDB and GT.M (and the extra work of installing
the ICU library, and figuring out what version to point to, et al), and
most of the time, they simply want to use NodeM directly, with a
minimally configured YottaDB/GT.M database, and have it just work with
UTF-8 across their application. And for those that want something more
particular, like dealing with older VistA data and applications, in M
mode, they can easily set the charset to 'm' in the open call to NodeM.

 Also, as far as Rob looking in the current environment to see if
$gtm_chset is already set, that wouldn't matter. Basically,
setEnvironment.js is setting environment variables to configure YottaDB
or GT.M to work right with QEWD. If you already have an environment
variable set to something, that setEnvironment.js doesn't override, it
will still be there when YottaDB or GT.M initializes. And $gtm_chset
doesn't get overridden in setEnvironment.js, so I don't think that matters.

 Now, Rob is definitely setting charset: 'm' in his open call to NodeM
(which by the way, would override any default I add to NodeM - whether
UTF-8 like it is now, or even if I defaulted to whatever is in the
YottaDB/GT.M environment configuration, in $gtm_chset). However, it
looks as though Rob provides an override for NodeM configuration
options. In your startup configuration file, you should be able to
override Rob's default charset for NodeM. From a quick look at the code,
I think this will do the trick:

    var config = {
        managementPassword: 'keepThisSecret!',
        serverName: 'New QEWD Server',
        port: 8080,
        poolSize: 1,
        database: {
            type: 'gtm',
            params: {
                open_params: {
                    charset: 'utf-8'
                }
            }
        }
    };

So, I'd give that a try, as I think that should sort your issues.
Thanks, and if you have any more questions, let me know. I will try to
write up more thorough, and hopefully clearer, documentation about this,
but there is currently some information in NodeM's README.md file
already. Thank you.
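
For readers wondering how such an override could take effect, here is a minimal sketch of merging user-supplied open parameters over defaults. This is an assumption about the general pattern, not QEWD's actual code; `mergeOpenParams` is a hypothetical name.

```javascript
// Hypothetical merge step; QEWD's real override logic may differ.
function mergeOpenParams(defaults, overrides) {
    // Later sources win, so a user-supplied charset replaces the default.
    return Object.assign({}, defaults, overrides || {});
}

// Assumed default, mirroring the charset: 'm' setting described above.
var defaults = { charset: 'm' };

// What a startup file might supply under database.params.open_params.
var fromConfig = { charset: 'utf-8' };

console.log(mergeOpenParams(defaults, fromConfig).charset);
console.log(mergeOpenParams(defaults, undefined).charset);
```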


On 10/17/18 9:24 AM, Sam Habiel wrote:
> Rob, you did this:
>
> https://github.com/robtweed/ewd-qoper8-gtm/commit/e7aaeb66d99fc1ffa61a60d21b1a9d752e8c3a86
>
> Why???
>
> Once I undid this, everything went back to working.
>
> I thought we would try by default to inherit what the environment
> tells us (in my case, gtm_chset is set to utf-8).
>
> I now see the problem: ./setEnvironment does not check gtm_chset,
> which it SHOULD!
>
> David, care to comment? How should we resolve this?
>
> --Sam
> On Tue, Oct 16, 2018 at 4:22 PM Sam Habiel <sam.h...@gmail.com> wrote:

--

DL Wicksell

Oct 17, 2018, 2:48:23 PM
to enterprise-web-de...@googlegroups.com
Hi Sam and Rob,

After considering this a bit more, I think that maybe Rob could change
gtm.js in ewd-qoper8-gtm to pick up the $gtm_chset environment variable,
or default to 'm' if it isn't set. I think that might work both for what
you want, Sam, and for what Rob needs, e.g.

    var openParams = {
        mode: 'strict',
        charset: process.env.gtm_chset || 'm'
    };

 Maybe that would work for you guys?
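
A slightly more defensive variant of the same idea, as a sketch only. It assumes $gtm_chset may be written in upper case (for example UTF-8 or M) and that any value other than UTF-8 should be treated as M mode; `charsetFromEnv` is a hypothetical helper, not part of ewd-qoper8-gtm.

```javascript
// Hypothetical helper: map the gtm_chset environment variable onto the
// lowercase charset names used in the open call.
function charsetFromEnv(env) {
    var value = (env.gtm_chset || '').trim().toLowerCase();
    if (value === 'utf-8' || value === 'utf8') {
        return 'utf-8';
    }
    // Unset, empty, 'm', or anything unrecognized falls back to M mode.
    return 'm';
}

var openParams = {
    mode: 'strict',
    charset: charsetFromEnv(process.env)
};

console.log(openParams);
```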

Sam Habiel

Oct 17, 2018, 3:04:39 PM
to enterprise-web-de...@googlegroups.com
That sounds good David. I like this solution.

--Sam
> --
> You received this message because you are subscribed to the Google Groups "Enterprise Web Developer Community" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to enterprise-web-develope...@googlegroups.com.
> To post to this group, send an email to enterprise-web-de...@googlegroups.com.

rtweed

Oct 22, 2018, 5:00:11 AM
to Enterprise Web Developer Community
 Looks like an interesting and sensible solution which I'll take a look at.

BTW the reason for the "m" charset was that things broke a while ago with one of the versions of GT.M. It related to how the ewd-document-store "forEachChild" function worked when specifying a "from..to" range, where a high-ASCII-value character is appended to force the collation sequence start/end points. The "m" charset fixed that.

You might like to check whether ranges in the forEachChild() function work correctly in UTF-8 mode, and/or see if you can come up with an alternative workable solution to the range logic that avoids the issue in the first place.

Rob
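
One way an appended high-value character could misbehave in UTF-8 mode can be sketched as follows. This is an illustration under assumptions, not a reproduction of the original failure: it assumes subscripts sort by their bytes, and that the appended character is code 255, which is a single byte in M mode but two bytes (0xC3 0xBF) in UTF-8.

```javascript
// Compare two byte sequences; assumed to approximate default collation.
function compareBytes(a, b) {
    return Buffer.compare(a, b);
}

var prefix = Buffer.from('구', 'utf8');      // bytes EA B5 AC
var candidate = Buffer.from('구가', 'utf8');  // prefix followed by EA B0 80

// M mode: the appended character is the single byte 0xFF, the highest
// possible byte, so the bound sorts after the candidate.
var boundM = Buffer.concat([prefix, Buffer.from([0xFF])]);
console.log(compareBytes(candidate, boundM) < 0);    // true

// UTF-8 mode: character 255 becomes bytes C3 BF, which sort before the
// lead byte EA of a Hangul syllable, so the bound no longer covers it.
var boundUtf8 = Buffer.from('구' + String.fromCharCode(255), 'utf8');
console.log(compareBytes(candidate, boundUtf8) < 0); // false
```

Names whose next character is ASCII, such as a comma, would still sort below the UTF-8 bound, so whether this effect shows up depends on the data.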

Sam Habiel

Oct 23, 2018, 1:59:30 PM
to enterprise-web-de...@googlegroups.com
I wrote this tiny program to see if there is an issue with UTF-8 collation, and I don't see any. The test lines and data are in RED; the QEWD output to compare against the RED data is in ORANGE. The output exactly matches the B index.

    var Minterface = require('nodem');
    var DocumentStore = require('ewd-document-store');
    var sessions = require('ewd-session');

    // Open the database without passing any options to open()
    this.db = new Minterface.Gtm();
    this.db.open();
    this.documentStore = new DocumentStore(this.db);

    console.log(this.documentStore.db.version());

    sessions.addTo(this.documentStore);
    this.db.symbolTable = sessions.symbolTable(this.db);
    var session = sessions.create('bbApp');

    var patients = new this.documentStore.DocumentNode('DPT');
    var bIndex   = new this.documentStore.DocumentNode('DPT', ['B']);

    // Print each patient record: the IEN followed by its zero node
    patients.forEachChild({range: {from: 1, to: ' '}}, (ien, node) => {
        console.log(ien + ': ' + node.$(0).value);
    });

    // Print every name in the B index
    bIndex.forEachChild((name) => console.log(name));

    console.log('***');

    // Print only the names in the range under test
    bIndex.forEachChild({range: {from: '곽', to: '구'}}, (name) => {
        console.log(name);
    });

    this.db.close();

Output:

[ov6@7f2a5816fac1 node_modules]$ node sam.js
Node.js Adaptor for YottaDB: Version: 0.12.1 (FWS); GT.M version: 6.3-004; YottaDB version: 1.22
1: 가,민준^M^2560708^^^^^^444678924^^Boston^25^^^1^3181017^^^^1
2: 간,서준^M^2450502^^^^^^656771234^^Boston^25^^^1^3181017^^^^1
3: 갈,하준^M^2591003^^^^^^543236666^^Flower Mound^48^^^1^3181017^^^^1
4: 감,서윤^M^2660101^^^^^^323123456^^Providence^44^^^1^3181017^^^^1
5: 강,서연^M^2330501^^^^^^354623902^^London^25^^^1^3181017^^^^1
6: 견,지우^M^2410501^^^^^^656451234^^Canton^25^^^1^3181017^^^^1
7: 경,서현^M^2440706^^^^^^323556789^^Monmoth^34^^^1^3181017^^^^1
8: 계,다은^M^2330401^^2^^^^345238901^^Manchester^33^^^1^3181017^^^^1
9: 고,시우^M^2540330^^^^^^333224444^^Ripley^25^^^1^3181017^^^^1
10: 곡,현우^M^2320226^^^^^^888776666^^kingsland^4^^^1^3181017^^^^1
11: 공,예준^M^2570102^^^^^^411555432^^Concord^33^^^1^3181017^^^^1
12: 곽,지민^M^2480302^^^^^^323554567^^Trenton^34^^^1^3181017^^^^1
13: 관,민서^M^2460605^^^^^^123455678^^Redbank^34^^^1^3181017^^^^1
14: 교,현우^M^2550910^^^^^^311776543^^Weare^33^^^1^3181017^^^^1
15: 구,현준^M^2670607^^^^^^212118765^^Beverly^25^^^1^3181017^^^^1
16: 국,민재^M^2660506^^^^^^222559876^^Wakefield^44^^^1^3181017^^^^1
17: 궁,우진^M^2330401^^^^^^655447777^^Boston^25^^^1^3181017^^^^1
18: 궉,민지^M^2450414^^^^^^301444321^^Boston^25^^^1^3181017^^^^1
19: 권,슬기^M^2330601^^^^^^345678233^^Liverpool^25^^^1^3181017^^^^1
20: 근,수진^M^2220501^^^^^^656454321^^Springfield^25^^^1^3181017^^^^1
21: 금,현정^M^2590304^^^^^^711667890^^New London^9^^^1^3181017^^^^1
22: 기,성민^M^2690708^^^^^^111257654^^Lincoln^33^^^1^3181017^^^^1
23: Samúelsson,Ólafur Jóhann^M^2540707^^^^^^323678904^^Boston^25^^^1^3181017^^^^1
24: Indriðason,Þórarinn^M^2550401^^^^^^666551234^^Providence^44^^^1^3181017^^^^1
25: CARTER,DAVID^M^2810302^^^^^^000000113^^Santa Monica^6^^^1^3181017^^^^1^1
CARTER,DAVID
Indriðason,Þórarinn
Samúelsson,Ólafur Jóhann
가,민준
간,서준
갈,하준
감,서윤
강,서연
견,지우
경,서현
계,다은
고,시우
곡,현우
공,예준
곽,지민
관,민서
교,현우
구,현준
국,민재
궁,우진
궉,민지
권,슬기
근,수진
금,현정
기,성민
***
곽,지민
관,민서
교,현우
구,현준

[ov6@7f2a5816fac1 node_modules]$ mumps -r %XCMD 'ZWRITE ^DPT("B",*)'
^DPT("B","CARTER,DAVID",25)=""
^DPT("B","Indriðason,Þórarinn",24)=""
^DPT("B","Samúelsson,Ólafur Jóhann",23)=""
^DPT("B","가,민준",1)=""
^DPT("B","간,서준",2)=""
^DPT("B","갈,하준",3)=""
^DPT("B","감,서윤",4)=""
^DPT("B","강,서연",5)=""
^DPT("B","견,지우",6)=""
^DPT("B","경,서현",7)=""
^DPT("B","계,다은",8)=""
^DPT("B","고,시우",9)=""
^DPT("B","곡,현우",10)=""
^DPT("B","공,예준",11)=""
^DPT("B","곽,지민",12)=""
^DPT("B","관,민서",13)=""
^DPT("B","교,현우",14)=""
^DPT("B","구,현준",15)=""
^DPT("B","국,민재",16)=""
^DPT("B","궁,우진",17)=""
^DPT("B","궉,민지",18)=""
^DPT("B","권,슬기",19)=""
^DPT("B","근,수진",20)=""
^DPT("B","금,현정",21)=""
^DPT("B","기,성민",22)=""
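
The range result above includes 구,현준 even though that name sorts after the bare end value 구, so the upper bound appears to match by prefix. A minimal sketch of one way to get that behavior without appending a sentinel character follows. It is an assumption about a possible approach, not how ewd-document-store implements ranges; `inRange` and `compareUtf8` are hypothetical helpers.

```javascript
// Compare two strings by their UTF-8 bytes.
function compareUtf8(a, b) {
    return Buffer.compare(Buffer.from(a, 'utf8'), Buffer.from(b, 'utf8'));
}

// Inclusive range test where the upper bound also matches as a prefix.
function inRange(name, from, to) {
    var afterFrom = compareUtf8(name, from) >= 0;
    var beforeTo = compareUtf8(name, to) <= 0 || name.startsWith(to);
    return afterFrom && beforeTo;
}

// A few names from the B index shown above.
var names = ['곡,현우', '공,예준', '곽,지민', '관,민서', '교,현우', '구,현준', '국,민재'];

var selected = names.filter(function (name) {
    return inRange(name, '곽', '구');
});

console.log(selected);
```

For these names the filter keeps the same four entries that the range run printed after the `***` line.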



K.S. Bhaskar

Oct 23, 2018, 3:07:15 PM
to Enterprise Web Developer Community
Rob –

While I no longer have access to the GT.M issues list (since I don't work at FIS any more), I don't remember any such issue in a GT.M version (the code base relies on libicu). But remember that UTF-8 mode has two illegal characters at the end of every plane – e.g., $char(65534) and $char(65535) are illegal but $char(65536) is legal, whereas in M mode all characters from $char(0) through $char(255) are legal. UTF-8 mode also has the concept of illegal strings because not all sequences of bytes are legal characters. Could that be what you are thinking of?

Regards
– Bhaskar
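
The two kinds of illegality mentioned above can be sketched in Node.js as follows. Both helpers are hypothetical and only illustrate the concepts: the first flags the last two code points of each plane, and the second uses the standard `TextDecoder` with `fatal: true` to reject byte sequences that are not well-formed UTF-8. Neither is the validation that YottaDB or GT.M performs internally.

```javascript
// True for the last two code points of each plane (U+xxFFFE and U+xxFFFF).
function isPlaneEndNoncharacter(codePoint) {
    return (codePoint & 0xFFFE) === 0xFFFE;
}

// True when the bytes decode as well-formed UTF-8.
function isWellFormedUtf8(bytes) {
    try {
        new TextDecoder('utf-8', { fatal: true }).decode(bytes);
        return true;
    } catch (err) {
        return false;
    }
}

console.log(isPlaneEndNoncharacter(65534)); // true
console.log(isPlaneEndNoncharacter(65535)); // true
console.log(isPlaneEndNoncharacter(65536)); // false

console.log(isWellFormedUtf8(Buffer.from([0xEA, 0xB5, 0xAC]))); // true
console.log(isWellFormedUtf8(Buffer.from([0xFF])));             // false
```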

Sam Habiel

Dec 3, 2018, 9:28:53 AM
to enterprise-web-de...@googlegroups.com
Rob,

I just tried this again today, and you haven't fixed it yet. Would you
like me to make a pull request?

--Sam

David Wicksell

Jan 6, 2019, 3:51:14 PM
to Enterprise Web Developer Community
Hello all,

 I noticed that I made a mistake in the highlighted section of this message. It should
have read:

But remember, if you do that, you will need to set {charset: 'binary'} in
NodeM's open call (either binary, ascii, or m will work here). You also should set
process.stdout.setDefaultEncoding('binary'); and process.stdin.setEncoding('binary');
in Node.js (either binary or ascii will work here), and set charset=<character encoding>
in the appropriate HTML tag of your web page or application, to make sure everything
is handled correctly all the way through the application you are developing.

 Sorry for the confusion. Those are functions, not data properties. Thank you.

David Wicksell
Fourth Watch Software LC
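
A small sketch of what the 'binary' encoding does on the Node.js side, using a Thai character as the example. The `useBinaryStdio` wrapper is hypothetical and is defined but not called here; the two calls inside it are the ones named in the correction above.

```javascript
// The Thai character 'ก' (U+0E01) is three bytes in UTF-8.
var utf8Bytes = Buffer.from('ก', 'utf8');

// Read as 'binary', each byte becomes one JavaScript character.
var asBinaryString = utf8Bytes.toString('binary');

// Written back as 'binary', the original bytes are restored unchanged.
var restored = Buffer.from(asBinaryString, 'binary');

console.log(asBinaryString.length);       // 3
console.log(restored.equals(utf8Bytes));  // true
console.log(restored.toString('utf8'));

// Both settings are method calls on the stream objects.
function useBinaryStdio() {
    process.stdout.setDefaultEncoding('binary');
    process.stdin.setEncoding('binary');
}
```

Calling `useBinaryStdio()` at startup would apply the same byte-per-character treatment to standard input and output.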

