here is my (simplified) code:
string html = Clipboard.GetDataObject().GetData(DataFormats.Html).ToString();
the problem is that the html string gets lots of substituted strange
characters, for example:
a dash - character from the word document gets converted into â€"
a line break gets converted into Â
an apostrophe gets converted into â€˜
this doesn't happen when i just paste as normal into my html editor.
the characters import normally.
is there a way to read from the clipboard without screwing up the
characters? i tried Encoding.ASCII.GetString() but it needs a byte[],
which i don't know how to get from the DataObject.
many thanks for any help.
tim
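If the HTML has already come back from GetData as a garbled string, the original bytes can often be recovered. The approach is to re-encode the string with the code page it was wrongly decoded with, then decode the result as UTF-8 (CF_HTML data is UTF-8). The sketch below makes two assumptions. First, the wrong code page is Windows-1252, which matches the â€ prefix reported for the dash above. Second, it targets .NET Core 3.0 or later, where code page 1252 has to be registered through CodePagesEncodingProvider. ClipboardHtmlRepair and RepairUtf8ReadAs1252 are hypothetical names.

```csharp
using System.Text;

// Hypothetical helper: reverses "UTF-8 bytes decoded as Windows-1252" damage.
public static class ClipboardHtmlRepair
{
    // Assumption: .NET Core 3.0+ / .NET 5+, where code page 1252 is only
    // available after registering CodePagesEncodingProvider.
    private static readonly Encoding Cp1252 = CreateCp1252();

    private static Encoding CreateCp1252()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    public static string RepairUtf8ReadAs1252(string garbled)
    {
        // Turn each garbled character back into the byte it was decoded from...
        byte[] originalBytes = Cp1252.GetBytes(garbled);
        // ...then decode those bytes as what they really are: UTF-8.
        return Encoding.UTF8.GetString(originalBytes);
    }
}
```

Usage would be along the lines of string fixedHtml = ClipboardHtmlRepair.RepairUtf8ReadAs1252(html);. On a system whose ANSI code page is not 1252, a different code page would be needed. In either case, bytes with no mapping in that code page (such as 0x9D in the UTF-8 form of a close double quote) may already have been lost, so this is a best-effort repair rather than a guaranteed fix.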
string html = Clipboard.GetDataObject().GetData(DataFormats.Html).ToString();
// Create a UTF-8 encoding.
UTF8Encoding utf8 = new UTF8Encoding();
// Get the encoded html string.
byte[] encodedBytes = utf8.GetBytes(html);
// Decode bytes back to string.
String decodedString = utf8.GetString(encodedBytes);
Console.WriteLine();
Console.WriteLine("Decoded bytes:");
Console.WriteLine(decodedString);
"Tim_Mac" <t...@mackey.ie> wrote in message
news:1125081696.6...@g43g2000cwa.googlegroups.com...
hi,
i am accessing some html (originating from MS Word) in the clipboard in
my winforms app. i catch it before the paste, clean up the html, set
the clipboard with the cleaned Html, and then paste.
I can't see how that would help - it's just encoding and decoding with
the same encoding. As UTF-8 can encode any string, I can't envisage any
situation where html wouldn't be equal to decodedString - could you
give an example of such a situation?
--
Jon Skeet - <sk...@pobox.com>
http://www.pobox.com/~skeet
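Jon's point is easy to check. With Encoding.UTF8, encoding a string and decoding the resulting bytes gives back the same string, so the snippet above leaves html exactly as it was, garbled characters included. A small sketch follows (Utf8RoundTrip is a hypothetical name). The only strings that come back different are ones containing an unpaired surrogate, which the encoder replaces with U+FFFD.

```csharp
using System.Text;

// Hypothetical helper illustrating why UTF-8 encode + decode changes nothing.
public static class Utf8RoundTrip
{
    public static bool RoundTripsUnchanged(string s)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(s);        // string -> UTF-8 bytes
        string decoded = Encoding.UTF8.GetString(bytes); // UTF-8 bytes -> string
        // True for any well-formed string; an unpaired surrogate is replaced
        // by U+FFFD during encoding, so only such strings come back different.
        return decoded == s;
    }
}
```

For example, a garbled â€˜ survives the round trip unchanged, which is why the snippet cannot repair anything.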
My code snippet certainly doesn't solve anything.
"Jon Skeet [C# MVP]" <sk...@pobox.com> wrote in message
news:MPG.1d7991eff...@msnews.microsoft.com...
Thanks for your post.
Yes, I can reproduce your issue on my side. It seems that this issue
only occurs for localized characters, not for standard English characters.
Also, this issue only occurs with DataFormats.Html, not with
DataFormats.Text etc.
After doing some research, I found that this issue is documented in
our internal database as a known issue. This is not a WinForms-side problem.
When asked for the HTML format, GetData returns an ANSI string, which
does not have enough information to render Chinese script. Currently, I
cannot think of a better workaround for this issue.
Hope this helps.
Best regards,
Jeffrey Tan
Microsoft Online Partner Support
<!--StartFragment-->\r\n\r\n<p class=MsoNormal>‘â€â™
“€ </p>\r\n\r\n<p class=MsoNormal>`…
</p>\r\n\r\n<!--EndFragment-->
as you can see, there are garbage characters in the middle
corresponding to the characters in the word doc.
interestingly, when i paste the content into WordPad, it preserves the
open/close quote characters etc., but when i then copy and paste from
WordPad, the html string is read correctly in my application. the
open/close apostrophes get demoted back to the normal apostrophe
character, and the ellipsis character gets demoted back to 3 period
characters.
what's a little bit annoying is that this problem only arises when i
attempt to intercept the html in the clipboard before it is pasted.
i'm using the Comzept HtmlEditor control for win-forms (a wrapper for
MSHTML), and it has its own Paste() method, which does not produce
such character problems as i am experiencing. i presume it just calls
the MSHTML Paste() method.
looking forward to your reply
tim
But it *does* have characters which aren't in the ANSI code page.
That's what Jeffrey meant by "not for standard English characters" I
believe.
Thanks for your feedback.
Yes, I just tested '-' in English, which has no problem. However, with the '"'
character, I can reproduce this problem on my side.
After doing some further research, I found that this issue only occurs with
the Word application: if we copy '"' characters from IE, a WinForms application
gets the characters without any problem. Even with Excel, they are retrieved
correctly. So it seems that this issue is on the Word application side.
Because the WinForms Clipboard class is just a wrapper around the underlying
Windows clipboard operations, there seems to be little that can be done on the
WinForms side.
so far i can identify the following mappings:
â€˜  open single quote (‘)
â€™  close single quote (’)
â€œ  open double quote (“)
â€   close double quote (”)
â€¦  ellipsis (…)
Â    two space characters after a period (as used by some formatting
conventions)
thanks
tim
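The garbled sequences reported in this thread follow one rule: each character's UTF-8 bytes are read one byte at a time as Windows-1252 characters. The sketch below reproduces this. It assumes .NET Core 3.0 or later (with code page 1252 registered through CodePagesEncodingProvider) and Windows-1252 as the ANSI code page. MojibakeTable and Garble are hypothetical names.

```csharp
using System.Text;

// Hypothetical helper that reproduces the clipboard garbling on purpose.
public static class MojibakeTable
{
    // Assumption: .NET Core 3.0+ / .NET 5+, where code page 1252 must be
    // registered through CodePagesEncodingProvider before use.
    private static readonly Encoding Cp1252 = CreateCp1252();

    private static Encoding CreateCp1252()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    // Encodes the text as UTF-8 (the CF_HTML encoding), then decodes those
    // bytes as Windows-1252 (the assumed ANSI code page of the reader).
    public static string Garble(string original) =>
        Cp1252.GetString(Encoding.UTF8.GetBytes(original));
}
```

Take the open single quote U+2018 as an example. It is E2 80 98 in UTF-8, and those three bytes are â, € and ˜ in Windows-1252, so Garble("\u2018") returns "â€˜". Likewise, Garble("\u2026") returns "â€¦", and a non-breaking space (C2 A0) becomes Â followed by a non-breaking space.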
Thanks for your post.
Yes, after doing some more research into this issue, it seems that it is a
WinForms problem after all: I created a Win32 application that uses the Win32
API to read the clipboard's CF_HTML format, and it gets the text without
garbling. I then converted this Win32 code into managed code using
P/Invoke:
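// Needed at the top of the form's source file:
using System;
using System.Runtime.InteropServices;
using System.Text;

[DllImport("user32.dll", SetLastError = true)]
static extern IntPtr GetClipboardData(uint uFormat);
[DllImport("user32.dll", SetLastError = true)]
static extern bool OpenClipboard(IntPtr hWndNewOwner);
[DllImport("user32.dll", SetLastError = true)]
static extern bool CloseClipboard();
[DllImport("user32.dll", SetLastError = true)]
static extern uint RegisterClipboardFormatA(string lpszFormat);
[DllImport("user32.dll", SetLastError = true)]
static extern bool IsClipboardFormatAvailable(uint format);
[DllImport("kernel32.dll", SetLastError = true)]
static extern IntPtr GlobalLock(IntPtr hMem);
// GlobalSize returns a SIZE_T, which is pointer-sized on 64-bit Windows.
[DllImport("kernel32.dll", SetLastError = true)]
static extern UIntPtr GlobalSize(IntPtr hMem);
// GlobalUnlock returns a BOOL, not a handle.
[DllImport("kernel32.dll", SetLastError = true)]
static extern bool GlobalUnlock(IntPtr hMem);

private void button1_Click(object sender, System.EventArgs e)
{
    // "HTML Format" is the registered name of the CF_HTML clipboard format.
    uint CF_HTML = RegisterClipboardFormatA("HTML Format");
    if (!IsClipboardFormatAvailable(CF_HTML) || !OpenClipboard(this.Handle))
        return;
    try
    {
        IntPtr hGMem = GetClipboardData(CF_HTML);
        if (hGMem == IntPtr.Zero)
            return;
        IntPtr pMFP = GlobalLock(hGMem);
        if (pMFP == IntPtr.Zero)
            return;
        try
        {
            int len = (int)GlobalSize(hGMem).ToUInt64();
            byte[] bytes = new byte[len];
            Marshal.Copy(pMFP, bytes, 0, len);
            // The memory block may be larger than the data; stop at the first NUL.
            int end = Array.IndexOf(bytes, (byte)0);
            if (end < 0)
                end = len;
            // CF_HTML data is UTF-8, so decode it as UTF-8 rather than ANSI.
            this.textBox1.Text = Encoding.UTF8.GetString(bytes, 0, end);
        }
        finally
        {
            GlobalUnlock(hGMem);
        }
    }
    finally
    {
        // Always release the clipboard, even on the early returns above.
        CloseClipboard();
    }
}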
This works well on my side. Hope this helps.
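What ends up in textBox1 is the whole CF_HTML payload: a short ASCII header (Version, StartHTML, EndHTML, StartFragment, EndFragment) followed by the HTML itself. The StartFragment and EndFragment values are byte offsets into the UTF-8 data, so the fragment is best cut out of the byte array before decoding. The sketch below assumes both offsets are present in the header. CfHtml and ExtractFragment are illustrative names, not a framework API.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper for the CF_HTML ("HTML Format") clipboard payload.
public static class CfHtml
{
    // Returns the HTML between the StartFragment and EndFragment byte offsets,
    // or null if the header lacks them or they are out of range.
    public static string? ExtractFragment(byte[] data)
    {
        // The header is ASCII, so decoding everything as UTF-8 is safe for reading it.
        string text = Encoding.UTF8.GetString(data);
        int start = ReadOffset(text, "StartFragment:");
        int end = ReadOffset(text, "EndFragment:");
        if (start < 0 || end < start || end > data.Length)
            return null;

        // The offsets count bytes, not characters, so slice the bytes before decoding.
        return Encoding.UTF8.GetString(data, start, end - start);
    }

    // Reads the decimal number that follows "key" in the header, or -1 if absent.
    private static int ReadOffset(string header, string key)
    {
        int i = header.IndexOf(key, StringComparison.Ordinal);
        if (i < 0)
            return -1;
        i += key.Length;
        int j = i;
        while (j < header.Length && header[j] >= '0' && header[j] <= '9')
            j++;
        return int.TryParse(header.Substring(i, j - i), NumberStyles.None,
                            CultureInfo.InvariantCulture, out int value)
            ? value
            : -1;
    }
}
```

Inside the click handler, after Marshal.Copy, something like string fragment = CfHtml.ExtractFragment(bytes); would return just the copied HTML between the <!--StartFragment--> and <!--EndFragment--> comments.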
Thank you for your patience and cooperation. If you have any questions or
concerns, please feel free to post them in the group. I am standing by to be
of assistance.
I am glad my reply makes sense to you.
Yes, I don't think it will break on any Win32 version of the OS. Because we are
only using the Win32 API, which is guaranteed to behave consistently on all
Win32 operating systems, our solution should be safe.
Thanks