I can provide some information about what is happening -- though I'd be curious to know a bit more about the workflow you are using to process the data.
I posted about this a while ago, but MarcEdit does embed 0x202e, or characters marking the start and end of right-to-left processing. This is done specifically because of instances like this:
http://blog.reeset.net/archives/2103. Within right-to-left languages, there is a range of characters in the general Latin set that will render as right to left if placed alongside a right-to-left value. One of the combinations causing specific problems right now is $0. The blog post shows how this data renders, which is incorrect for the reader but is also correct in terms of rendering logic. The 0x202e character forces correct rendering. To do this, MarcEdit uses the following function to read the bits of each character and determine whether it has a right-to-left marker:
private string Right2Left(string toLoad)
{
    char cLMS = (char)0x202A;//(char)0x200E;
    char cRMS = (char)0x202B; //(char)0x200F;
    bool bRight2Left = false;
    System.Text.StringBuilder ctoLoad = new System.Text.StringBuilder();
    //foreach (char c in toLoad)
    //{
    // Step by code point so that a surrogate pair is read as a single value.
    for (var i = 0; i < toLoad.Length; i += char.IsSurrogatePair(toLoad, i) ? 2 : 1)
    {
        var c = char.ConvertToUtf32(toLoad, i);
        int hasRandALCat = 0;
        if (c >= 0x5BE && c <= 0x10B7F)
        {
            if (c <= 0x85E)
            {
                if (c == 0x5BE) hasRandALCat = 1;
                else if (c == 0x5C0) hasRandALCat = 1;
                else if (c == 0x5C3) hasRandALCat = 1;
                else if (c == 0x5C6) hasRandALCat = 1;
                else if (0x5D0 <= c && c <= 0x5EA) hasRandALCat = 1;
                else if (0x5F0 <= c && c <= 0x5F4) hasRandALCat = 1;
                else if (c == 0x608) hasRandALCat = 1;
                else if (c == 0x60B) hasRandALCat = 1;
                else if (c == 0x60D) hasRandALCat = 1;
                else if (c == 0x61B) hasRandALCat = 1;
                else if (0x61E <= c && c <= 0x64A) hasRandALCat = 1;
                else if (0x66D <= c && c <= 0x66F) hasRandALCat = 1;
                else if (0x671 <= c && c <= 0x6D5) hasRandALCat = 1;
                else if (0x6E5 <= c && c <= 0x6E6) hasRandALCat = 1;
                else if (0x6EE <= c && c <= 0x6EF) hasRandALCat = 1;
                else if (0x6FA <= c && c <= 0x70D) hasRandALCat = 1;
                else if (c == 0x710) hasRandALCat = 1;
                else if (0x712 <= c && c <= 0x72F) hasRandALCat = 1;
                else if (0x74D <= c && c <= 0x7A5) hasRandALCat = 1;
                else if (c == 0x7B1) hasRandALCat = 1;
                else if (0x7C0 <= c && c <= 0x7EA) hasRandALCat = 1;
                else if (0x7F4 <= c && c <= 0x7F5) hasRandALCat = 1;
                else if (c == 0x7FA) hasRandALCat = 1;
                else if (0x800 <= c && c <= 0x815) hasRandALCat = 1;
                else if (c == 0x81A) hasRandALCat = 1;
                else if (c == 0x824) hasRandALCat = 1;
                else if (c == 0x828) hasRandALCat = 1;
                else if (0x830 <= c && c <= 0x83E) hasRandALCat = 1;
                else if (0x840 <= c && c <= 0x858) hasRandALCat = 1;
                else if (c == 0x85E) hasRandALCat = 1;
            }
            else if (c == 0x200F) hasRandALCat = 1;
            else if (c >= 0xFB1D)
            {
                if (c == 0xFB1D) hasRandALCat = 1;
                else if (0xFB1F <= c && c <= 0xFB28) hasRandALCat = 1;
                else if (0xFB2A <= c && c <= 0xFB36) hasRandALCat = 1;
                else if (0xFB38 <= c && c <= 0xFB3C) hasRandALCat = 1;
                else if (c == 0xFB3E) hasRandALCat = 1;
                else if (0xFB40 <= c && c <= 0xFB41) hasRandALCat = 1;
                else if (0xFB43 <= c && c <= 0xFB44) hasRandALCat = 1;
                else if (0xFB46 <= c && c <= 0xFBC1) hasRandALCat = 1;
                else if (0xFBD3 <= c && c <= 0xFD3D) hasRandALCat = 1;
                else if (0xFD50 <= c && c <= 0xFD8F) hasRandALCat = 1;
                else if (0xFD92 <= c && c <= 0xFDC7) hasRandALCat = 1;
                else if (0xFDF0 <= c && c <= 0xFDFC) hasRandALCat = 1;
                else if (0xFE70 <= c && c <= 0xFE74) hasRandALCat = 1;
                else if (0xFE76 <= c && c <= 0xFEFC) hasRandALCat = 1;
                else if (0x10800 <= c && c <= 0x10805) hasRandALCat = 1;
                else if (c == 0x10808) hasRandALCat = 1;
                else if (0x1080A <= c && c <= 0x10835) hasRandALCat = 1;
                else if (0x10837 <= c && c <= 0x10838) hasRandALCat = 1;
                else if (c == 0x1083C) hasRandALCat = 1;
                else if (0x1083F <= c && c <= 0x10855) hasRandALCat = 1;
                else if (0x10857 <= c && c <= 0x1085F) hasRandALCat = 1;
                else if (0x10900 <= c && c <= 0x1091B) hasRandALCat = 1;
                else if (0x10920 <= c && c <= 0x10939) hasRandALCat = 1;
                else if (c == 0x1093F) hasRandALCat = 1;
                else if (c == 0x10A00) hasRandALCat = 1;
                else if (0x10A10 <= c && c <= 0x10A13) hasRandALCat = 1;
                else if (0x10A15 <= c && c <= 0x10A17) hasRandALCat = 1;
                else if (0x10A19 <= c && c <= 0x10A33) hasRandALCat = 1;
                else if (0x10A40 <= c && c <= 0x10A47) hasRandALCat = 1;
                else if (0x10A50 <= c && c <= 0x10A58) hasRandALCat = 1;
                else if (0x10A60 <= c && c <= 0x10A7F) hasRandALCat = 1;
                else if (0x10B00 <= c && c <= 0x10B35) hasRandALCat = 1;
                else if (0x10B40 <= c && c <= 0x10B55) hasRandALCat = 1;
                else if (0x10B58 <= c && c <= 0x10B72) hasRandALCat = 1;
                else if (0x10B78 <= c && c <= 0x10B7F) hasRandALCat = 1;
            }
        }
        if (hasRandALCat == 1)
        {
            bRight2Left = true;
        }
        // Once right-to-left data has been seen, insert the marker at the
        // next subfield delimiter or line break.
        if ((toLoad[i] == '$' ||
             toLoad[i] == '\r' ||
             toLoad[i] == '\n') && bRight2Left == true)
        {
            ctoLoad.Append(cLMS);
            bRight2Left = false;
            pbHasRight2Left = true;
        }
        // Append the whole code point; appending toLoad[i] alone would drop
        // the low surrogate whenever the loop steps over a surrogate pair.
        ctoLoad.Append(char.ConvertFromUtf32(c));
    }
    return ctoLoad.ToString();
}
This function tells MarcEdit when to insert explicit markers to render data as left to right. Previously, the tool would look only for the "$" character (as it is a special value), but that was causing the validator some trouble. To correct this, I've updated the function to look at new lines as well (I can do this because I know what the mnemonic format should look like).
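Because these markers are invisible in most editors, it can help to make them visible when checking where they ended up. What follows is only a minimal sketch and not part of MarcEdit: `BidiDebug`, `IsBidiControl`, and `ShowControls` are hypothetical names, and the set of code points flagged here is an assumption about which directional characters are worth looking for.

```csharp
using System;
using System.Text;

public static class BidiDebug
{
    // True for the directional marks (U+200E, U+200F) and the
    // embedding/override controls (U+202A through U+202E).
    public static bool IsBidiControl(char c)
    {
        return c == '\u200E' || c == '\u200F' ||
               (c >= '\u202A' && c <= '\u202E');
    }

    // Replaces each invisible control with a visible tag such as <U+202A>.
    public static string ShowControls(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (IsBidiControl(c))
                sb.Append("<U+").Append(((int)c).ToString("X4")).Append('>');
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Passing a line from the editor through `BidiDebug.ShowControls` would then show a tag such as `<U+202A>` at each position where a marker was inserted.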
This data will only show up when working with the editor. When you save the mnemonic file, the data is filtered out. Previously, this happened like this:
foreach (string tmp_line in objRich.Lines)
{
    if (pbHasRight2Left == true)
    {
        //clear the clms
        string ptmp_line = tmp_line.Replace(((char)0x200E).ToString(), "").Replace(((char)0x200F).ToString(), "");
        writer_buffer_string.AppendLine(ptmp_line);
    }
    else
    {
        writer_buffer_string.AppendLine(tmp_line);
    }
}
The function would look for the pbHasRight2Left flag (which is set by the right-to-left function) to identify when to remove the embedded value. I've updated the code (which I'm posting this afternoon) to add an additional check:
foreach (string tmp_line in objRich.Lines)
{
    if (pbHasRight2Left == true ||
        (tmp_line.IndexOf((char)0x202e) > -1))
    {
        //clear the clms
        string ptmp_line = tmp_line.Replace(((char)0x200E).ToString(), "").Replace(((char)0x200F).ToString(), "");
        writer_buffer_string.AppendLine(ptmp_line);
    }
    else
    {
        writer_buffer_string.AppendLine(tmp_line);
    }
}
The additional check is there to try to make sure that this data is filtered. I've never seen a case where this data hasn't been filtered when working with MarcEdit's processing. If you are finding a case where it has slipped by, that would be a surprise -- but the updated filter should make sure that doesn't occur in the future.
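The snippets above touch several different code points (0x202A is inserted, 0x202e is tested for, and 0x200E/0x200F are removed), so if you want to clean a line yourself it may be simpler to strip the whole family of directional characters in one pass. This is a minimal sketch under the assumption that none of these characters are wanted in the saved data; `BidiFilter` and `StripBidiControls` are hypothetical names, not MarcEdit functions.

```csharp
using System;
using System.Text;

public static class BidiFilter
{
    // Removes U+200E, U+200F and U+202A through U+202E from one line.
    // All other characters, including right-to-left letters, are kept.
    public static string StripBidiControls(string line)
    {
        var sb = new StringBuilder(line.Length);
        foreach (char c in line)
        {
            bool isControl = c == '\u200E' || c == '\u200F' ||
                             (c >= '\u202A' && c <= '\u202E');
            if (!isControl) sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Since the function only drops those seven code points, it can be applied to every line unconditionally rather than depending on a flag having been set earlier.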
So, that was all a long way of saying that these bytes are embedded in the mnemonic format (and have to be) per the notes above. However, this data should never be in the binary MARC file, because any time the data is saved in MarcEdit, the data is filtered away. The byte embedding only occurs, and is only visible, in the data file that is rendered into the editor of the tool.
I've not added this filter into the binary MARC processor (the data shouldn't be there), so here is what I'd recommend. The updated flavors of MarcEdit with the changes noted above will show up in a couple of hours (or sooner). You should reprocess your MARC data (tasks, etc.) and see if you are still having problems. You may be able to open the mod mrc file in the editor and then resave/compile it -- but I'd do it from scratch because, as I say, there should be no way for this data to slip into the binary MARC file format given that the data is stripped when saving (which happens prior to processing).
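If you want to confirm whether a binary file actually contains these characters, one option is to scan it for their UTF-8 byte sequences. This is a rough sketch that assumes the records are UTF-8 encoded (it would not apply to MARC-8 data); `MarcScan` and `FindBidiBytes` are hypothetical names, and the file name in the note below is only a placeholder.

```csharp
using System;
using System.Collections.Generic;

public static class MarcScan
{
    // Returns the byte offsets where a UTF-8 encoded U+200E, U+200F,
    // or U+202A..U+202E begins. All of these encode as E2 80 xx.
    public static List<int> FindBidiBytes(byte[] data)
    {
        var offsets = new List<int>();
        for (int i = 0; i + 2 < data.Length; i++)
        {
            if (data[i] != 0xE2 || data[i + 1] != 0x80) continue;
            byte b = data[i + 2];
            if (b == 0x8E || b == 0x8F || (b >= 0xAA && b <= 0xAE))
                offsets.Add(i);
        }
        return offsets;
    }
}
```

Something like `MarcScan.FindBidiBytes(System.IO.File.ReadAllBytes("records.mrc"))` returning an empty list would suggest that none of the markers made it into the file.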
--tr