Docx Page Count?


joshk...@gmail.com

Jan 19, 2015, 9:16:42 PM
to pytho...@googlegroups.com
Is there any way using python-docx to get an accurate page count of a particular docx file? I've scoured the docs but can't seem to find any function to perform this task. Thanks,

Josh

Steve Canny

Jan 20, 2015, 1:25:18 AM
to pytho...@googlegroups.com
No. The layout onto specific pages is not specified in the .docx file; it's determined by the printing application at runtime.

This SO answer has a bit more on the details:

joshk...@gmail.com

Jan 20, 2015, 11:37:52 AM
to pytho...@googlegroups.com
Thanks Steve - I found that I can unzip the .docx file and extract docProps/app.xml, then parse the XML with ElementTree to get the <Pages></Pages> element. Most of the time that number is accurate, but I've seen a few instances where the number in that element is not correct. Just curious if you know the circumstances that would lead to that element being incorrect?
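
For anyone following along, here is a minimal sketch of that approach using only the standard library. The function name `stored_page_count` is made up for illustration. The value it returns is whatever the last application to save the file wrote into `<Pages>`, so it is a cached number, not a computed one, and it may be stale or missing.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the extended-properties part (docProps/app.xml).
EP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"


def stored_page_count(docx_file):
    """Return the cached <Pages> value from docProps/app.xml, or None.

    `docx_file` may be a path or a binary file-like object.
    Hypothetical helper: not part of python-docx.
    """
    with zipfile.ZipFile(docx_file) as archive:
        try:
            xml_bytes = archive.read("docProps/app.xml")
        except KeyError:
            # The package has no extended-properties part at all.
            return None

    root = ET.fromstring(xml_bytes)
    pages = root.find(f"{{{EP_NS}}}Pages")
    if pages is None or not (pages.text or "").strip():
        return None
    return int(pages.text)
```

Calling `stored_page_count("report.docx")` (the filename is just a placeholder) gives an int, or None when the element is absent.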

Steve Canny

Jan 21, 2015, 2:59:46 PM
to pytho...@googlegroups.com
Hi Josh, good point! This question comes up so frequently in one of its variants (detecting page break locations, getting page numbers for table of contents etc.) that I can see I didn't give it any fresh thinking for this particular use case :)

As I understand it, Word writes that attribute on each save. So the first thing I would look for is a document that Word hadn't touched yet, like perhaps one converted from another format or maybe one generated by Google Docs or something. The other thought that occurs to me is that Word might only update this when a document is paginated, and that might only happen in page layout view or when the document is printed; so perhaps if the document was modified significantly in Draft mode it might not get updated.

All these are speculation though, I don't have much insight into Word internals as far as rendering is concerned. I would consult my friend Google on this one I suppose, it's possible that someone who uses Word a lot has come across it somewhere along the line :)

How far off are the numbers you're finding? Like 1 instead of 20 or more like 22 instead of 20? It occurs to me that locale/printer setup, especially Letter vs A4 could figure in as well. LibreOffice vs. Word could also be a factor.

Would also be interesting to know whether the discrepancy persists once the file is opened and then saved under a different name without making any changes.
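
One way to narrow down the "which application wrote this file" question is to look at the `<Application>` and `<AppVersion>` elements that sit next to `<Pages>` in docProps/app.xml. The sketch below is only a diagnostic aid. The helper name `producer_info` is hypothetical, and it assumes the producing application actually fills those elements in, which not every tool does.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the extended-properties part (docProps/app.xml).
EP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"


def producer_info(docx_file):
    """Return (application, app_version) strings from docProps/app.xml.

    Either value is None when the element (or the whole part) is missing.
    Hypothetical helper: not part of python-docx.
    """
    with zipfile.ZipFile(docx_file) as archive:
        try:
            xml_bytes = archive.read("docProps/app.xml")
        except KeyError:
            return (None, None)

    root = ET.fromstring(xml_bytes)
    # findtext returns None when the element is absent.
    application = root.findtext(f"{{{EP_NS}}}Application")
    app_version = root.findtext(f"{{{EP_NS}}}AppVersion")
    return (application, app_version)
```

Comparing this against the files with wrong page counts would show whether they share a producer other than Word.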

--Steve

Alexey padenko

Mar 5, 2015, 5:44:50 PM
to pytho...@googlegroups.com

I checked docProps/app.xml after saving the document in Microsoft Word, and the <Pages></Pages> element holds the true number of pages in the document. But if the document is created with python-docx, the page number is always 1. Maybe this will help in solving the problem.

Steve Canny

Mar 6, 2015, 12:53:03 AM
to pytho...@googlegroups.com
A document created with python-docx will always have the same page count (<Pages>nnn</Pages>) as the document you opened; python-docx does not recalculate it. The default starting document (built into python-docx) has a page count of 1, which is why you observe this behavior.

--Steve

Alexey padenko

Mar 6, 2015, 2:57:46 AM
to pytho...@googlegroups.com
Then I have a question: can python-docx update docProps/app.xml with the actual number of pages when it saves the file?
If this is possible, could you please write code that automatically retrieves that value from the file and stores it in a variable?