Batch convert to utf8

Andrew Brown

Oct 20, 2012, 9:49:22 AM
to bbe...@googlegroups.com
I have thousands of files to convert from Latin-1 to UTF-8. Is there any way of doing this? I don't use Unix except under severe pressure, and I need to retain the directory structure.

The AppleScript by Nobumi Iyanaga no longer works.

Thanks for any help!

AB

François Schiettecatte

Oct 20, 2012, 10:07:08 AM
to bbe...@googlegroups.com
You could use iconv, which is included in Mac OS X, though you will need to wrap it in some sort of script.
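
For anyone who would rather script it, here is a minimal Python sketch of such a wrapper. It does not call iconv; it swaps in Python's built-in codecs to do the same Latin-1 to UTF-8 re-encoding, and it rewrites each file in place so the directory structure is untouched. The function name `convert_tree`, the suffix list, and the skip-if-already-UTF-8 check are assumptions made for illustration, not part of iconv or BBEdit.

```python
from pathlib import Path


def convert_tree(root, src_encoding="latin-1", dst_encoding="utf-8",
                 suffixes=(".html", ".htm", ".txt")):
    """Re-encode matching files under root in place.

    Returns the list of paths that were actually rewritten.
    The suffix filter is an assumption; adjust it to your files.
    """
    converted = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        raw = path.read_bytes()
        try:
            # Files that already decode as UTF-8 (pure ASCII included)
            # are left alone, so running the script twice is harmless.
            raw.decode(dst_encoding)
            continue
        except UnicodeDecodeError:
            pass
        text = raw.decode(src_encoding)
        # Bytes in, bytes out: line endings are not translated.
        path.write_bytes(text.encode(dst_encoding))
        converted.append(path)
    return converted
```

Run it on a copy of the folder first, for example `convert_tree("/path/to/copy")` (a hypothetical path). Note that it only changes the bytes; any charset declaration inside the files is left as it was.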

Alternatively you could take a look at this:

https://itunes.apple.com/us/app/text-encoding-converter/id414626292?mt=12

Searching Google for 'mac OS X text encoding converter' also returned a number of solutions.


Best regards

François

Rich Siegel

Oct 20, 2012, 10:23:20 AM
to bbe...@googlegroups.com
On Saturday, October 20, 2012, Andrew Brown <li...@c18.net> wrote:

> I have thousands of files to convert from latin1 to utf8, is there any way of
> doing this ?

A text factory with a single "Change Text Encoding" action should do the job.

R.
--
Rich Siegel Bare Bones Software, Inc.
<sie...@barebones.com> <http://www.barebones.com/>

Someday I'll look back on all this and laugh... until they sedate me.

Andrew Brown

Oct 20, 2012, 11:52:51 AM
to bbe...@googlegroups.com
Thanks, Rich. It took me a while to find them, but they are just what's needed. -- AB

Andrew Brown

Oct 20, 2012, 1:14:05 PM
to bbe...@googlegroups.com
But... when I open a file so modified, and try to save it, I get a message

> Document encoding mismatch
> This document contains data which describes its encoding as Western (ISO Latin 1), but the encoding has been set to Unicode (UTF-8).
> Saving this document as-is will likely cause unexpected display of its contents and may prevent BBEdit from opening it in the future.

Two questions:

1. Why on earth do I have to retype this message in order to send it to the list? Why cannot such messages be copied? Among the lamentable absurdities of human existence, this one takes the carrot.

2. How can I eliminate this problem?

AB

Rich Siegel

Oct 20, 2012, 2:34:33 PM
to bbe...@googlegroups.com
On Saturday, October 20, 2012, Andrew Brown <li...@c18.net> wrote:

>But... when I open a file so modified, and try to save it, I get a message
>
>>Document encoding mismatch This document contains data which
>>describes its encoding as Western (ISO Latin 1), but the encoding
>>has been set to Unicode (UTF-8). Saving this document as-is will
>>likely cause unexpected display of its contents and may prevent
>>BBEdit from opening it in the future.

WAYRTTD? This error occurs when the document has an explicit
character set specified in its contents (as is the norm for HTML
documents), but you've changed the text encoding setting and
tried to save it.

>2. How can I eliminate this problem?

Add a Replace All action to your text factory, which searches
for "charset=iso-8859-1" (or whatever specification it is that
occurs in your documents; check first) and changes it to specify UTF-8.

Andrew Brown

Oct 21, 2012, 2:00:03 AM
to bbe...@googlegroups.com
On 20 oct. 2012, at 20:34, Rich Siegel wrote:

> Add a Replace All action to your text factory, which searches for "charset=iso-8859-1" (or whatever specification it is that occurs in your documents; check first) and changes it to specify UTF-8.

Tried two Replace All in one Factory

<!doctype(.+?)>
<meta(.+?)>

and got "Insufficient memory to complete this operation".

Worked ok one at a time in Multi-File Search.

I still don't see why error messages cannot be copied, but I suppose that all developers have a team devoted to keeping the user in his place.

AB

Robert A. Rosenberg

Oct 21, 2012, 3:48:02 AM
to bbe...@googlegroups.com, Andrew Brown
At 08:00 +0200 on 10/21/2012, Andrew Brown wrote about Re: Batch convert to utf8:

Here are the two statements that need to be altered:

<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<?xml version="1.0" encoding="utf-8"?>

DOCTYPE does not indicate what character set the file uses, so I do
not know why you were checking it. If you do a multi-file search with
the SAVE and DO-NOT-PROMPT options, you will not run into an
insufficient memory error (it only handles one file at a time, as
opposed to the LEAVE OPEN option).
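
Outside BBEdit, the same two declarations could be updated by a script. What follows is a minimal Python sketch, assuming the files declare their encoding as `iso-8859-1` or `latin1`; the function name `declare_utf8` and both patterns are illustrative assumptions, so check what your documents actually contain first.

```python
import re

# Matches charset=iso-8859-1 inside a <meta> tag, with or without a quote
# directly after the equals sign.
META_CHARSET = re.compile(
    r'(charset\s*=\s*["\']?)(?:iso-8859-1|latin1)', re.IGNORECASE)

# Matches the encoding attribute of an XML declaration.
XML_ENCODING = re.compile(
    r'(<\?xml[^>]*?encoding\s*=\s*["\'])(?:iso-8859-1|latin1)', re.IGNORECASE)


def declare_utf8(text):
    """Return text with Latin-1 charset declarations changed to utf-8."""
    text = META_CHARSET.sub(r"\g<1>utf-8", text)
    text = XML_ENCODING.sub(r"\g<1>utf-8", text)
    return text
```

This only edits the declaration strings; the file contents still have to be re-encoded separately, otherwise the declaration and the actual bytes will disagree in the other direction.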

Andrew Brown

Oct 21, 2012, 4:41:41 AM
to bbe...@googlegroups.com
On 21 oct. 2012, at 09:48, Robert A. Rosenberg wrote:

> At 08:00 +0200 on 10/21/2012, Andrew Brown wrote about Re: Batch convert to utf8:
>
>> On 20 oct. 2012, at 20:34, Rich Siegel wrote:
>>
>>> Add a Replace All action to your text factory, which searches for "charset=iso-8859-1" (or whatever specification it is that occurs in your documents; check first) and changes it to specify UTF-8.
>>
>> Tried two Replace All in one Factory
>>
>> <!doctype(.+?)>
>> <meta(.+?)>
>>
>> and got "Insufficient memory to complete this operation".
>>
>> Worked ok one at a time in Multi-File Search.
>>
>> I still don't see why error messages cannot be copied, but I suppose that all developers have a team devoted to keeping the user in his place.
>>
>> AB
>
> Here are the two statements that need to be altered:
>
> <meta http-equiv="content-type" content="text/html; charset=utf-8" />
> <?xml version="1.0" encoding="utf-8"?>
>
> DOCTYPE does not have any indication of what character set the file uses so I do not know why you were checking it.

I was taking that away because I don't need any of the HTML etc. headers. I would have deleted them all in one go, except that the files are hopelessly inconsistent and no single search could zap all the various combinations of headers and extraneous code of one sort and another.

> If you do multi-file search but code SAVE and DO-NOT-PROMPT you will not run into an insufficient memory error (it only is handling one file at a time as opposed to the LEAVE OPEN option).

Yes, I know. Multi-file search works fine; it is the factory that falls over with more than one replace, even with save and do-not-prompt. That is a great shame, because I have multiple search-and-replaces to run and will have to find another solution.

AB

Christopher Stone

Oct 21, 2012, 2:20:13 PM
to bbe...@googlegroups.com
On Oct 21, 2012, at 01:00, Andrew Brown <li...@c18.net> wrote:
> I still don't see why error messages cannot be copied, but I suppose that all developers have a team devoted to keeping the user in his place.
______________________________________________________________________

Hey Andrew,

In fact I agree with you that software error messages should be copyable, but expressing sour grapes on the users list is not especially productive.

I usually take a screenshot of error dialogs and send that when possible.  (Both this list and the Bare Bones Support address will take graphics attachments.)

Have you actually made a feature request to support?

  Bare Bones Software <sup...@barebones.com>

--
Best Regards,
Chris
