I have to calculate a readability index for PDF documents, like the one
Microsoft Word provides.
--------------------------------------------------------------------------------
**Microsoft Readability Statistics**
Follow these instructions to view this feature in MS Word.
To display readability statistics:
- On the Tools menu, click Options, and then click the Spelling &
Grammar tab.
- Select the Check grammar with spelling check box.
- Select the Show readability statistics check box, and then click
OK.
- On the Standard toolbar, click Spelling and Grammar. When Microsoft
Word finishes checking spelling and grammar, it displays information
about the reading level of the document.
It displays the following:
Words - 128
Characters - 763
Paragraphs - 7
Sentences - 7
Sentences per Paragraph - 2.3
Words per Sentence - 17.2
Characters per Word - 4.9
Passive Sentences - 0
Flesch Reading Ease - 50.8
Flesch-Kincaid Grade Level - 10.4
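
For reference, the two Flesch scores in the list above come from the standard published formulas, which need only word, sentence, and syllable counts. The sketch below is a minimal illustration, not Word's own implementation: the function names are made up for this example, and the syllable counter is a rough vowel-group heuristic (an assumption), so results will not match Word exactly.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not Word's method):
    count runs of consecutive vowels as syllables."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" as non-syllabic, e.g. "make" -> 1,
    # but keep endings like "table" -> 2.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
```

Both functions take raw counts, so they can be fed from any text source once the words and sentences have been extracted.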
--------------------------------------------------------------------------------
I have to do the same thing with PDF documents. How can I do it?
Apart from this, I have to process the PDF according to the following
rules to ensure readability:
- Select the areas of the submission to be processed (usually 3 content
blocks)
- Identify what cannot be pre-processed (i.e., images or other
parts of the content)
- Font type (it has to be sans serif)
- Font size larger than 12 pt (footnotes can use a 10 pt font; verify
w/L. Dieter)
- Bulleted lists are only counted when they are full sentences
- No italics
- No words in all caps, except the product
- All caps or reverse type/knockouts are allowed in the headline
- Contrast of foreground words on a background: potentially check the
RGB values of both and issue a warning
- Warn if there is too little white space?
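
For the contrast rule, one possible way to turn two RGB values into a warning is the WCAG 2.x contrast ratio. The post does not name a metric, so both the choice of WCAG and the 4.5:1 threshold below are assumptions, and the function names are hypothetical. A minimal sketch:

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb) -> float:
    """Relative luminance of an (R, G, B) tuple with values 0..255."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg) -> float:
    """WCAG contrast ratio, from 1.0 (identical) up to 21.0 (black on white)."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)

def low_contrast(fg, bg, minimum: float = 4.5) -> bool:
    """True if the pair should trigger a warning.
    The 4.5 default is an assumed threshold, not one given in the post."""
    return contrast_ratio(fg, bg) < minimum
```

For example, `low_contrast((200, 200, 200), (255, 255, 255))` flags light gray text on a white background, while black on white passes.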
Has anyone faced the same problem?
Thanks,
Utkarsh
Nagarro Software Pvt Ltd
B. Tech, IIT- Delhi
09810355322
Your task is difficult because a PDF does not lend itself to that type of
analysis. Generally you would not end up with anything to analyze per the
instructions you listed. You may want to look in more detail at just what
a PDF is and why it would not fit. You could possibly analyze the
original (if you had it) and then work on a histogram of the whole document,
treating it as an image, but I am not sure that is what you are looking for.
Hope this is of some help.
Larry T.
Try converting the PDF to Word, and then using your current procedure.
Automatic PDF to Word conversion doesn't generally look good, but you
only need the statistics. Acrobat 6 has a 'Microsoft Word Document'
option on the 'Save As...' dialog.
Sid Steward
http://www.AccessPDF.com/pdftk/
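
Whichever way the text is pulled out of the PDF, the simple counts in the Word report (words, characters, paragraphs, sentences, and the averages) can be approximated from plain text. The sketch below assumes the text has already been extracted into a string with blank lines between paragraphs. `text_statistics` is a hypothetical helper, and its sentence rule (text ending in `.`, `!` or `?`) is a simplification that will miscount abbreviations such as "e.g.".

```python
import re

def text_statistics(text: str) -> dict:
    """Approximate Word-style counts from plain text (a rough sketch)."""
    # Paragraphs: blocks separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Sentences: runs of text terminated by ., ! or ?; unterminated
    # fragments (e.g. bullet items that are not full sentences) are skipped.
    sentences = re.findall(r"[^.!?]+[.!?]+", text)
    # Words: letters/digits, allowing internal apostrophes and hyphens.
    words = re.findall(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*", text)
    characters = sum(len(w) for w in words)
    return {
        "words": len(words),
        "characters": characters,
        "paragraphs": len(paragraphs),
        "sentences": len(sentences),
        "sentences_per_paragraph": len(sentences) / max(len(paragraphs), 1),
        "words_per_sentence": len(words) / max(len(sentences), 1),
        "characters_per_word": characters / max(len(words), 1),
    }

if __name__ == "__main__":
    sample = "Hello world. This is a test!\n\nSecond paragraph here."
    for name, value in text_statistics(sample).items():
        print(f"{name}: {value}")
```

Because unterminated fragments are not counted as sentences, this roughly matches the rule that bulleted lists only count when they are full sentences, though Word's own counting rules may differ.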