Hi list members,
(and a few others with whom I have corresponded about the Xtra --
sorry for any duplicate messages),
The long-awaited (by some) PRegEx 2.0 for Director 11 is finally here
as a public Beta:
http://openxtras.org/pregex/
The Read Me contains a "What's New" section, but I'll save you the
clicks by pasting it at the end.
Quality level:
This build has all the features of 1.0, but is now Director
11-friendly (a couple of things relating to non-ASCII characters work
slightly differently). At least four bugs from version 1.0 have been
fixed, but one or two known bugs from that release remain and will,
we hope, be fixed before the final release. Again, all of these are
detailed in the Read Me.
So, bottom line: this is *probably* as ready for production use as
version 1.0 was, or more so, but we're still calling it a beta.
Please let us know if it works with your project.
Enjoy,
-c
=========================================
What's New in 2.0
=========================================
Mac OS X Universal Binary
-------------------------
PRegEx is now a Universal binary. That means it runs natively on
Intel-based Macs, and also on older PowerPC (PPC) Macs, without
emulation. However, it is now Mac OS X only. (In fact, it supports
only Mac OS X 10.4 "Tiger" and later, the same as Director 11.)
Director 11+ Only
-----------------
PRegEx 1.0 supported Director versions 7-10. That older version
does NOT work on Director 11 or later (even if it seems to work on
Windows).
Unicode
-------
In Director 11, Adobe changed the internal string format to UTF-8
(Unicode). This is great news, but it completely changes the way
PRegEx needs to work. Here is a summary of the changes:
Reading/Writing files:
Reading files into memory and writing them back out again now
requires careful attention to text encodings. (In Director 7-10,
all files were simply assumed to be MacRoman or Windows-1252,
whether they were or not, and this was OK.) The great news is that
PRegEx now supports essentially *all* known text file encodings
(by fully incorporating the open-source iconv library), plus some
additional custom formats that will be helpful to PRegEx users.
See ReadFileToString and WriteStringToFile for the details.
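To illustrate why the encoding matters, here is a minimal Python
sketch (not Lingo, and not the PRegEx API). The helpers
write_string_to_file and read_file_to_string are hypothetical
stand-ins that encode and decode explicitly; the last lines show that
one legacy 8-bit byte means different characters in Windows-1252 and
MacRoman, which is why "just assume the platform encoding" no longer
works.

```python
import os
import tempfile


def write_string_to_file(text, path, encoding="utf-8"):
    """Hypothetical helper: encode text explicitly and return bytes written."""
    data = text.encode(encoding)
    with open(path, "wb") as f:
        f.write(data)
    return len(data)


def read_file_to_string(path, encoding="utf-8"):
    """Hypothetical helper: decode a file's bytes with an explicit encoding."""
    with open(path, "rb") as f:
        return f.read().decode(encoding)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    print(write_string_to_file("café", path))            # 5 (bytes in UTF-8)
    print(write_string_to_file("café", path, "cp1252"))  # 4 (bytes in Windows-1252)
    print(read_file_to_string(path, "cp1252"))           # café

# The same legacy byte decodes to different characters:
print(b"\x80".decode("cp1252"))     # € (Windows-1252)
print(b"\x80".decode("mac_roman"))  # Ä (MacRoman)
```

Reading a file with the wrong encoding does not necessarily fail; it
can silently produce the wrong characters.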
Escape Codes:
PRegEx supports "interpolation" of special escape codes to
generate special characters in strings. Interpolation is used in
three places: Replace (in the replacement string), Translate (in
the input and output mapping strings), and Interpolate. In Director
7-10, any 8-bit value was legal in strings. In Director 11, every
string must be valid UTF-8, or Director could crash. So the
meanings of the following escapes have changed:
  \200-\377 (octal escapes): formerly inserted an 8-bit
      character/byte; now insert Unicode code points 128-255.
  \x80-\xFF (hex escapes): formerly inserted an 8-bit
      character/byte; now insert Unicode code points 128-255.
And these new escapes have been added:
  \400-\777 (new octal escapes): Unicode code points 256-511.
  \x{0}-\x{7FFFFFFF} (new hex escapes): *any* valid Unicode code
      point.
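The escape semantics listed above can be modelled in a few lines.
This is a simplified Python sketch, not PRegEx's implementation: it
assumes octal escapes are exactly three digits and hex escapes are
either \xHH or \x{H...}, and it ignores PCRE's rules for shorter
octal escapes and backreferences. Every escape is treated as a
Unicode code point, never as a raw byte.

```python
import re

# Simplified model (assumption): \NNN is a three-digit octal escape,
# \xHH a two-digit hex escape, and \x{H...} a braced hex escape.
_ESCAPE = re.compile(r"\\(?:([0-7]{3})|x\{([0-9A-Fa-f]+)\}|x([0-9A-Fa-f]{2}))")


def code_point_to_char(cp):
    """Return the character for a code point, rejecting invalid values."""
    # Stricter than \x{0}-\x{7FFFFFFF}: only real Unicode scalar values pass.
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a valid Unicode code point: %#x" % cp)
    return chr(cp)


def interpolate(text):
    """Replace each escape code with the character whose code point it names."""
    def repl(match):
        octal, braced_hex, short_hex = match.groups()
        if octal is not None:
            return code_point_to_char(int(octal, 8))
        return code_point_to_char(int(braced_hex or short_hex, 16))
    return _ESCAPE.sub(repl, text)


print(interpolate(r"5\xA2 = 5\x{00A2} = 5\242"))  # 5¢ = 5¢ = 5¢
print(interpolate(r"\400 \777"))                   # code points 256 and 511
```

Rejecting values above 10FFFF and the surrogate range is a deliberate
choice in this sketch, in line with the advice that follows, rather
than accepting the full \x{0}-\x{7FFFFFFF} range.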
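Please note that not every number between 0 and 7FFFFFFF is a valid
Unicode code point! Valid code points run from U+0000 to U+10FFFF,
and the surrogate range U+D800-U+DFFF is reserved for UTF-16 and
cannot be encoded in UTF-8. Restrict yourself to valid code points
as defined in the latest Unicode specification.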
Also note that the hexadecimal bytes of a character's UTF-8 encoding
are NOT the same as its Unicode code point number. For example, the
Unicode code point for the cent sign is U+00A2, which can be written
as \x{A2} or \x{00A2}. Its UTF-8 encoding is the two bytes C2 A2
(hex), but the escape code \x{C2A2} does NOT insert those bytes: it
names a different character, code point U+C2A2. PRegEx provides no
way to specify the UTF-8 representation of a character directly;
Director, PRegEx, PCRE, and iconv always work out the UTF-8 encodings
for you.
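The difference between a code point and its UTF-8 bytes is easy to
check in Python (used here purely for illustration; nothing below is
PRegEx-specific):

```python
cent = "\u00a2"                             # U+00A2 CENT SIGN, i.e. \x{A2}
print(hex(ord(cent)))                       # 0xa2: the code point
print(cent.encode("utf-8").hex().upper())   # C2A2: the UTF-8 bytes
print(chr(0xC2A2) == cent)                  # False: \x{C2A2} is another character
print(len(chr(0xC2A2).encode("utf-8")))     # 3: and it needs three UTF-8 bytes
```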
These escape codes are the same as PCRE's octal and hexadecimal
escape codes, so you can use the same encodings in both the Search
and Replace strings of any PRegEx function.
Translate Function:
Because of how Unicode works, the Translate function can no longer
work with non-ASCII characters. Specifically:
  - Any non-ASCII characters in the InputTable and OutputTable
    will simply be ignored, as if they were not present at all.
  - If used in a "range specifier", non-ASCII characters will
    prevent the range from being recognized as a range.
  - Any non-ASCII characters in the SrchStrL (string being
    modified) will be untouched. I.e. they will never be
    modified by the Translate function.
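The three rules above can be modelled with a short Python sketch.
The names expand_table and translate are hypothetical; how the
leftover "-" and ASCII endpoint of an unrecognised range are treated,
and what happens when the two tables differ in length, are assumptions
(here: kept literally, and extra characters ignored), not documented
PRegEx behaviour.

```python
def expand_table(spec):
    """Expand 'a-z' style ranges into a list of ASCII characters."""
    chars = []
    i = 0
    while i < len(spec):
        c = spec[i]
        is_range = (i + 2 < len(spec) and spec[i + 1] == "-"
                    and c.isascii() and spec[i + 2].isascii())
        if is_range:
            # Only a range whose endpoints are both ASCII is recognised.
            chars.extend(chr(cp) for cp in range(ord(c), ord(spec[i + 2]) + 1))
            i += 3
            continue
        if c.isascii():
            chars.append(c)
        # Non-ASCII characters are skipped, as if they were not present.
        i += 1
    return chars


def translate(text, input_table, output_table):
    """Map ASCII characters of text; non-ASCII characters pass through."""
    mapping = {}
    # Assumption: extra characters in the longer table are ignored, and the
    # first mapping given for a character wins (as in Perl's tr///).
    for src, dst in zip(expand_table(input_table), expand_table(output_table)):
        mapping.setdefault(ord(src), dst)
    return text.translate(mapping)


print(translate("Crème brûlée", "a-z", "A-Z"))  # CRèME BRûLéE
```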
Quotemeta function:
Quotemeta formerly would put a backslash in front of non-ASCII
characters. Now, it will not. (Those characters are always
literal in PCRE.)
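A minimal Python sketch of the new behaviour, assuming that for
ASCII input Quotemeta follows Perl's rule (backslash every character
other than A-Z, a-z, 0-9 and underscore); quotemeta here is a
hypothetical stand-in, not the PRegEx function itself.

```python
def quotemeta(text):
    """Backslash-escape ASCII characters other than [A-Za-z0-9_]."""
    out = []
    for c in text:
        if c.isascii() and not (c.isalnum() or c == "_"):
            out.append("\\" + c)
        else:
            out.append(c)  # word characters and all non-ASCII stay literal
    return "".join(out)


print(quotemeta("5.00¢ (approx)"))  # 5\.00¢\ \(approx\)
```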
String lengths:
As in Lingo, string lengths returned by PRegEx functions and
accepted as arguments are always in terms of character length,
never byte length. (Prior to Director 11 and Unicode/UTF-8, these
concepts were the same.) For strings that are 100% ASCII, the
lengths are the same. For non-ASCII strings, the length in bytes
is dependent on the UTF-8 representation.
The exception is writing a string to a file: the return value is
the size of the file on disk, in bytes. It depends on the character
encoding chosen and on the content of the string, and may be higher
or lower than the number of characters written.
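Character length versus byte length can be seen in Python, where
len() on a string counts code points much as Lingo counts characters
for text like this (an illustration only; byte_length is a
hypothetical helper, not a PRegEx function):

```python
def byte_length(text, encoding):
    """Number of bytes text occupies when encoded with the given encoding."""
    return len(text.encode(encoding))


text = "café ¢"
print(len(text))  # 6 characters
for encoding in ("utf-8", "utf-16-le", "latin-1"):
    print(encoding, byte_length(text, encoding))
# utf-8 8, utf-16-le 12, latin-1 6
```

The same six characters occupy 6, 8, or 12 bytes depending on the
encoding, which is why a file-writing function reports bytes rather
than characters.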
Bug Fixes
---------
Fixed in 2.0:
- Calling join() with an empty list crashed the Mac (and maybe
Windows)
- Could not write file names longer than 31 characters
- The "s" option did not always function correctly
- An error message said "...with setting" rather than "...without
setting".
New build methodology (for building the Xtras from source)
----------------------------------------------------------
- Better supports source control techniques
- Uses modern development tools (Xcode, VC++ 2003 as patched)
- No longer has to deal with Mac resource forks (new Mac OS X
binary format)
- Uses .zip format instead of .sit for distribution
- See ReadMe.txt files in make_mac and make_win directories for
details