On Feb 14, 4:34 am, happytoday <ehabaziz2...@gmail.com> wrote:
> I am trying to sort a file on a Unicode field (position, length)
> under Berkeley Unix (Windows version). I tried the msort3.exe
> utility but could not get it to work for me.
> Is there a command-line utility or Perl/sed/awk program that sorts a
> file on a UTF-8 column given a start position and a length?
You should try GNU sort, which does run under Windows. Note the need
to set some environment variables. From the info pages:
(1) If you use a non-POSIX locale (e.g., by setting `LC_ALL' to
`en_US'), then `sort' may produce output that is sorted differently
than you're accustomed to. In that case, set the `LC_ALL' environment
variable to `C'. Note that setting only `LC_COLLATE' has two problems.
First, it is ineffective if `LC_ALL' is also set. Second, it has
undefined behavior if `LC_CTYPE' (or `LANG', if `LC_CTYPE' is unset) is
set to an incompatible value. For example, you get undefined behavior
if `LC_CTYPE' is `ja_JP.PCK' but `LC_COLLATE' is `en_US.UTF-8'.
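As a rough illustration of what the `C' locale implies: with `LC_ALL' set to `C', sort compares raw byte values, and for UTF-8 text that gives the same order as comparing Unicode code points, which is not a dictionary order for any particular language. The Python sketch below only mimics that byte-wise comparison. It does not call sort, and the word list is invented for the example.

```python
# Sample words, invented for illustration.
words = ["zebra", "Zulu", "apple", "éclair", "Ärger"]

# Byte-wise order of the UTF-8 encodings, which is how a C-locale
# sort compares lines.
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

# Python's default string order compares code points; for UTF-8 text
# the two orders agree.
by_code_points = sorted(words)

print(by_bytes)
print(by_bytes == by_code_points)
```

Uppercase ASCII sorts before lowercase ASCII, and accented letters sort after all of ASCII, which is why a language-specific UTF-8 locale may be preferable when a dictionary order is wanted.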
A comment on stackoverflow.com says this:
... keep in mind that GNU sort depends on a correct locale setting
(the LC_* environment variables, and specifically the LC_COLLATE one).
LC_COLLATE (or LC_ALL) should be set to a locale with UTF-8 support
(e.g. en_US.UTF-8 or el_GR.UTF-8), preferably in the language that you
are interested in.
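To sort on a start position and an end position, do this:
sort -t x -k 1.M,1.N
where 'x' is a character known to exist nowhere in the file (so that
each line is treated as a single field), M is the start column number,
and N is the end column number. Given a start position and a length,
N = M + length - 1.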
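If sort is not available, the same start/length idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sort_by_columns` and the sample lines are made up, columns are counted in Unicode code points (so a combining accent counts as its own column), and keys are compared by code point rather than by any locale's collation rules.

```python
def sort_by_columns(lines, start, length):
    """Sort lines on the characters in columns start..start+length-1.

    Columns are 1-based and counted in Unicode code points.
    """
    def key(line):
        text = line.rstrip("\n")
        # Python strings index by code point, not by UTF-8 byte.
        return text[start - 1 : start - 1 + length]

    return sorted(lines, key=key)


# Invented sample data: the sort field occupies columns 4 through 8.
sample = [
    "01 γάμμα x",
    "02 άλφα  y",
    "03 βήτα  z",
]

result = sort_by_columns(sample, start=4, length=5)
for line in result:
    print(line)
```

To use this on a real file, read the lines with `open(path, encoding="utf-8")` so that the bytes are decoded to characters before the columns are counted.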
Eric