CAtlREMatchContext and special characters

smalolepszy

Dec 20, 2004, 4:06:15 AM
I have a problem with national-language characters whose character codes
are above 127.

This is my source code:

CAtlRegExp<> re;
REParseError status =
    re.Parse("{[ąĄćĆęĘłŁńŃśŚżŻźŹóÓ]+}");
ATLASSERT(status == REPARSE_ERROR_OK);

CAtlREMatchContext<> mc;
const CAtlREMatchContext<>::RECHAR* szStart = "1234 ą 1234 ć 1234";
const CAtlREMatchContext<>::RECHAR* szEnd = 0;

while (re.Match( szStart, &mc, &szEnd ) )
{
    mc.GetMatch( 0, &szStart, &szEnd);
    ptrdiff_t nLength = szEnd - szStart;
    CString a;
    a.Format("%.*s", nLength, szStart );
    szStart = szEnd; // continue searching after this match
}

When the debugger reaches re.Match( szStart, &mc, &szEnd ) I get an error:
Access violation. Why can't I use regular expressions with
national-language characters?

Thanks

Igor Tandetnik

Dec 20, 2004, 11:00:38 AM
"smalolepszy" <smalo...@poczta.onet.pl> wrote in message
news:21669c6c.04122...@posting.google.com

> I have a problem with national-language characters whose character codes
> are above 127.

Known problem. See

http://groups-beta.google.com/group/microsoft.public.vc.atl/browse_frm/thread/8eba20965be51c2b/58a0eb917936d4d8

No workaround that I know of.

I found CAtlRegExp to be rather buggy. Also, it uses an unusual regex
dialect. Consider using Greta [1] or Boost regex [2] instead. The latter
is very close to becoming part of the C++ standard (or, to be exact, the C++
Library Technical Report). I've seen claims that Greta is noticeably
faster, but have not verified them myself either way.

[1] http://research.microsoft.com/projects/greta/
[2] http://boost.org/libs/regex/doc/index.html
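
For illustration, here is a minimal sketch of the same search written against std::wregex, the standard-library facility (C++11 and later) that grew out of Boost regex. It is not CAtlRegExp and not Boost itself. The helper name find_polish_runs is hypothetical, and the letters are spelled as \u escapes so the example does not depend on the source file's code page. The ATL-style { } group braces are dropped because std::wregex uses ECMAScript syntax by default.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every maximal run of Polish diacritic
// letters found in `text`, in order of appearance.
std::vector<std::wstring> find_polish_runs(const std::wstring& text) {
    // Same character class as in the question, written with \u escapes:
    // a A-ogonek, c C-acute, e E-ogonek, l L-stroke, n N-acute,
    // s S-acute, z Z-dot, z Z-acute, o O-acute.
    static const std::wregex re(
        L"[\u0105\u0104\u0107\u0106\u0119\u0118\u0142\u0141\u0144\u0143"
        L"\u015B\u015A\u017C\u017B\u017A\u0179\u00F3\u00D3]+");

    std::vector<std::wstring> runs;
    // The iterator advances past each match on its own, so no manual
    // pointer bookkeeping is needed.
    for (std::wsregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        runs.push_back(it->str());
    }
    return runs;
}
```

Called on the test string from the question, L"1234 \u0105 1234 \u0107 1234", this should return two one-character runs.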

--
With best wishes,
Igor Tandetnik

With sufficient thrust, pigs fly just fine. However, this is not
necessarily a good idea. It is hard to be sure where they are going to
land, and it could be dangerous sitting under them as they fly
overhead. -- RFC 1925


Unknown

Dec 28, 2004, 3:19:07 PM
Hi,

I ran into the same problem with German umlauts.

The algorithm seems to be buggy. To work around that, copy the file

C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\atlmfc\include\atlrx.h

to your project directory, make the following changes, and include the copy in your .cpp file:

Line 637

old:  size_t u = (size_t) *sz;
new:  unsigned char* usz = (unsigned char *) sz;
      size_t u = (size_t) *usz;

and Line 1181

old:  for (int i=chStart; i<=chEnd; i++)
new:  unsigned char uchStart = chStart;
      unsigned char uchEnd = chEnd;
      for (int i=uchStart; i<=uchEnd; i++)
          pBits[i >> 3] |= 1 << (i & 0x7);

With these changes everything should work fine.
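
The two changes above suggest the crash comes from sign extension. Here is a standalone sketch of that effect, under the assumption that RECHAR is a plain char and that char is signed (the MSVC default). The function names are hypothetical and nothing here is taken from atlrx.h beyond the bit-index expression quoted in the patch. The sample bytes are the Latin-1 / Windows-1252 codes for ä (0xE4), ö (0xF6) and ü (0xFC), which are -28, -10 and -4 as signed chars.

```cpp
#include <cassert>
#include <cstddef>

// Index as the unpatched line computes it. signed char stands in for a
// signed plain char: negative values wrap around to huge size_t values.
std::size_t index_unpatched(signed char ch) {
    return static_cast<std::size_t>(ch);
}

// Index as the patched line computes it: reinterpret the byte as
// unsigned first, so the result is always in 0..255.
std::size_t index_patched(signed char ch) {
    return static_cast<std::size_t>(static_cast<unsigned char>(ch));
}

// Builds a 256-bit table for the range chStart..chEnd using the patched
// loop (iterating over unsigned values), then tests whether ch is set.
bool range_contains(signed char chStart, signed char chEnd, signed char ch) {
    unsigned char bits[32] = {};  // one bit per possible byte value
    const unsigned char uchStart = static_cast<unsigned char>(chStart);
    const unsigned char uchEnd = static_cast<unsigned char>(chEnd);
    for (int i = uchStart; i <= uchEnd; ++i) {
        bits[i >> 3] |= static_cast<unsigned char>(1u << (i & 0x7));
    }
    const std::size_t u = index_patched(ch);
    assert(u < 256);  // guaranteed by the unsigned char conversion
    return (bits[u >> 3] & (1u << (u & 0x7))) != 0;
}
```

With the unpatched conversion, a byte such as 0xFC becomes an index far beyond a 32-byte table, which would be consistent with the access violation reported in the question. The patched conversion keeps the same byte at index 252.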

Good luck!
Michael

