Chinese OCR for Mac?


Andrew Main

Apr 6, 2010, 10:46:05 AM
to chine...@googlegroups.com
"PDF OCR is a simple drag-and-drop utility for Mac OS X, that converts
your PDFs and images into text documents. It uses advanced OCR
(optical character recognition) technology to extract the text of the
PDF even if that text is contained in an image. This is particularly
useful for dealing with PDFs that were created via a Scan-to-PDF
function in a scanner or photo copier."
<http://solutions.weblite.ca/pdfocrx>

It also offers downloadable Chinese (and other) language packs:
<http://solutions.weblite.ca/pdfocrx/languages.php>

The free "Community Version" is limited to single-page images and
PDFs. Might be worth a try.

Andrew Main

Kerim Friedman

Apr 6, 2010, 7:44:44 PM
to chine...@googlegroups.com
Hmm. I ran a test on the following page, after installing the Traditional Chinese input method, and it gave me nothing. It ran the conversion, but the end result was a blank page of text.

It also seems this program just gives you raw text: it doesn't preserve formatting or create a merged PDF with both the text and the original image.

I don't like ReadIris, but I've yet to find anything to replace it with...

Kerim




test.pdf

Pascale

Apr 6, 2010, 8:04:43 PM
to chine...@googlegroups.com
Same for me, with two documents of four or five lines each and no images. I thought I must have missed a step, but now I'm afraid it's "normal".
I also found the process really slow, only to end up with an entirely blank page…
And, like you, I'm still looking for something to replace this bad and expensive Readiris.

Pascale.


Thomas Howell

Apr 7, 2010, 4:16:09 PM
to chine...@googlegroups.com
I was interested in this and e-mailed the company; they located a bug in the Chinese conversion and fixed it. They sent me a beta version to try, and it does work, with the following caveats. The example I tried had both text and images (pictures). The conversion could not handle lines of text over background images, and it tried to convert pictures to text. But if you have a PDF with plain Chinese text against a plain white background, the new version should work. They plan to post the new version on their website by this Friday or Saturday.

Tom Howell


Steve Hannah

Apr 7, 2010, 4:39:16 PM
to Chinese Mac
Thanks to Tom for alerting us to this bug in working with Chinese.
The new version that should work with Chinese has now been posted for
download on our website: <http://solutions.weblite.ca/pdfocrx>
You will also need the Chinese language file.

Here is a sample output using the test.pdf file that was posted by
Kerim above. Prior to running the PDF through PDF OCR X, I had to rotate
it 90 degrees clockwise. I ran it through with the following
settings:
1. Language: Chinese Simplified
2. Layout: Multi-column
3. Text Wrap: Hard wrap.

新嗣荸研究口 第九十六期 2008年7月


逐渐增刀口的离佳女昏率、堕目台、同性癌、核1口冢庭、罩耜家庭、罩身家庭、'
燕子女冢庭、曼薪冢庭 ’都撼勤了傅呆充的冢庭僵{直舆冢庭唇冒{系(
喻碓欣,2003) 。而逭些冢庭意羲的改夔不只存在日常生活的寅跷中7
也影兽了大罡媒介的再琨。 』
依撞 ]ha11y (1989〉 的看法7人癌肃土畲工棠化之後畲崖生雨桓危
横>一槿是厂工棠化的危樵」 -指商品大量生崖遇剩»另一桓具u是「意
羲的危椴」 7手旨工棠4匕2後傅硫意羲 系的瓦解。ma11y 韶焉»遣雨桓
危横都要依靠匿告声斤激诿的大量浦黄手壬焉>才能加以解除。因篇大量的
治黄一方面焉大量的商品提f共出路舆市玉易7另一方面也篇人俩舆而土畲提
供了存在的雨傈舆意羲。在 21 世杞的初始-台 汽草匮告所使用的靓
服策略’即以Ad恤S (1983>所害胃的 「冢庭形象」 (fami1ia1 imag6) 焉
基磺=重椿了冢庭的僵值舆理想的家庭型熊=亚且舆消黄做微妙的速
腊7遣不僵焉汽草岫售匿胴市±易>同峙也将汽草概念化焉理想冢庭胴
{系的再琨。
本昔命文剑封台湾近年柬在霓而旯中播放的汽草匿告,棣嗣匿告如何舆
家庭的文化建槿 (cu1tura1c0nsm1ct) 造行速佶。本研究硼逵用家庭形
象在其内容中的汽草匮告>造行符虢璺分析»以瞪解冢庭做焉一桓符
虢7是姻可在流行媒介的再壬克系航舆政治中逞作。具 而
探索在汽草匿告中-家庭形象如何被禾u用篇一桓铡良的工具7将汽草建
椿成一棰爱舆幸福的剩原7也就是汽草女口何被等值1匕篇冢庭。尤其是一
桓中鏖锴极、具性癌的核心冢庭:如何在鏖告中被定羲篇理想的「幸
福」 家庭-成篇一桓主流的冢庭再琨。最後7本输她将探尉7如此罩
一的、被自然化的家庭意戬型熊封於舆冢庭有胴的寅跷及韶同可能有的
幸福冢庭的房卓 : 汽卓盾告 中所再现的理想家庭



一 、 匿告 、 文化 的嗣í系
在琨{畲中 ’ 唐告造太我侗的日常生活 ’ 己;湮到了熙所不在的地
步, 它 我侗的四周 , 充斥在 、 街道 、 以及每一倜人的家
中 ° 正如 LeiSS 、 Kline 和 Jha11y ( 1990二 45〉 所言 ’ 中的重要
性即在於它 「可以 费者 置物品峙所作的泱定」 口 十的主
要目的 了吸弓 I受罡的注意 ’ 他侗容易理解 、 改 的熊
度 ’ 最後便他俩有所行勤 (P011ay, 1986) 。 晦倜 告都能逵至H
遣些自的 ’ 每倜鏖告商都 注大量的金罐去徙事匿告的行销 。
然而 ’ 匮告做篇一 的形式 ’ 不只是一槿行蹭的工具而已 =
' 、 傅遗了言午多文化的僵值 (Dyer, l982; FTith, 1995; Leíss, K1ine
& Iha11y, 1990〉 ° 篇了 品的噩胄售 ’ 匿告必须逞用大量的符琥 、 意
象 、 情感 、 直 ° 在 Rotzo11 、 Haefi1er 和 Sandage (1990) 告
禾口肃土畲之尸丑猎特的嗣{系H寺 ’ 他侗将 告幌焉一桓反映T畲琨况的文化鏖
品 ° LeiSS 等人 (1990〉 具甘韶篇 ’ 匿告是一槿强大的甭土畲勤力 7 它可以
舆思想 ’ 遗可以将涸人舆崖品的形 在一起 。
揽判理言命的颧黠韶篇 7 匿告除了 「告 (inf0nnatiVe) 的明
颢功能外 ’ 它 隐然但影窖溧速的功能-将人俩蒂仄消黄的文化
中 。 Ke1iner (l995〉 韶焉 ’ 壬旯代肃土 中少有鏖告只是罩钝的告知萱瓿 (
目 前逭可能僵旯於宰艮祗的 告而已〉 ’ 大多数的匮告都包含
圃片 7 以剧造高品和某些人俩所欲求的情境 (例如 = 成 之周
_ ° G0ldman (l992〉 也韶篇>’ 堂匿告的梨作技循愈 ’ 以及


Best regards
Steve Hannah
Web Lite Solutions


Robert Smitheram

Apr 7, 2010, 5:29:03 PM
to chine...@googlegroups.com
What version of the software did you use with successful results? I downloaded 1.7.1 (their latest available) and still get a blank page... (Yes, I did download and install the appropriate language packs.)

Robert


Robert Smitheram

Apr 7, 2010, 5:41:08 PM
to chine...@googlegroups.com
I am replying to myself... I took a closer look at the earlier post and saw that the sample text was Simplified Chinese. I ran a test using Simplified Chinese text and it worked. It seems the Traditional Chinese language pack has not been similarly updated?

Robert
--
Robert H. Smitheram
Santa Barbara, CA
sai...@californiadream.com
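
Since the choice between the Simplified and Traditional language packs turns out to matter, here is a minimal Python sketch for guessing which script a short sample of text uses. It assumes you can type or paste a few characters from the document as a string. The function name `guess_script` and the two marker lists are hypothetical: a small hand-picked sample of characters that differ between the scripts, not a complete table, so treat the result as a hint only.

```python
# Characters that typically appear in only one of the two scripts.
# These short lists are a hand-picked illustration (an assumption),
# not a complete Simplified/Traditional mapping.
SIMPLIFIED_MARKERS = set("们这为说对时会车义识与国学问题业")
TRADITIONAL_MARKERS = set("們這為說對時會車義識與國學問題業")


def guess_script(text):
    """Return 'simplified', 'traditional', 'mixed', or 'unknown'."""
    simplified_hits = sum(ch in SIMPLIFIED_MARKERS for ch in text)
    traditional_hits = sum(ch in TRADITIONAL_MARKERS for ch in text)
    if simplified_hits == 0 and traditional_hits == 0:
        # No marker characters found, so there is nothing to go on.
        return "unknown"
    if simplified_hits > traditional_hits:
        return "simplified"
    if traditional_hits > simplified_hits:
        return "traditional"
    return "mixed"


if __name__ == "__main__":
    print(guess_script("汽車廣告與家庭的意義"))
    print(guess_script("汽车广告与家庭的意义"))
```

A sample with no marker characters comes back as "unknown", in which case a longer sample or a fuller character table would be needed.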

Steve Hannah

Apr 7, 2010, 5:51:07 PM
to Chinese Mac
Arggh... Yes. You appear to be correct. I'm getting blank pages with
Traditional on some documents also. We'll have to add this to the
list. Hopefully it's a simple fix. For now, I suppose PDF OCR X is
limited to Simplified Chinese. I'll post here again when Traditional
Chinese support is fixed.

-Steve


Steve Hannah

Apr 7, 2010, 5:33:34 PM
to Chinese Mac
Version 1.7.1 should work. Can you post the error log that comes out
when you try? (You can view the error log in the Console app,
/Applications/Utilities/Console.)
