Office files previews (DOCX, XLSX, PPTX)


DrakoWhole

Apr 5, 2011, 5:21:30 AM
to ResourceSpace
Hi, I would like to know if there are any solutions for Office
previews for RS installed on a Windows platform.
Something equivalent to unoconv or qlpreview, which work on
Linux and Mac OS. Does anyone else have this need?

Thanks.

Dan Huby

Apr 5, 2011, 6:16:09 AM
to ResourceSpace
Those newer formats can include an embedded preview which will be
automatically used without any third party code (I think a 'zip'
utility is all that is needed). Have you tried uploading them?

Dan

DrakoWhole

Apr 5, 2011, 6:43:12 AM
to ResourceSpace
I have tried uploading a DOCX, an XLSX, and a PPTX file. What I get is a nice
preview icon, but no online preview of the file is available in the
system.
I do not have OpenOffice installed on the server yet. Do you think it
is worth giving it a try to see what happens?
Thanks,

TopSolid

Apr 20, 2011, 9:13:06 AM
to resour...@googlegroups.com
Hi,

I am very interested in finding a way to generate thumbnails of Office 2003 and 2007 documents (.doc, .ppt, .docx, etc.) in a Windows environment (WAMP). Does this work correctly on LAMP?
I have installed OpenOffice on the server and it made no difference. Any clue about that? You mention a zip utility; how do I install and test it?

Thanks for your help

Rgds

David

Jeff Harmon

Apr 20, 2011, 8:32:06 PM
to ResourceSpace
Colorhythm funded work so that ExifTool can also extract these
previews.

- Jeff

Tom Gleason

Apr 20, 2011, 11:26:46 PM
to resour...@googlegroups.com
OpenOffice is the best way to get good multipage previews of Office
documents, but you need to set up the 'unoconv' utility. It has only
been tested on Linux.


--
Tom Gleason, PHP Developer

ResourceSpace Support Services
https://www.buildadam.com/muse2

Exploring ResourceSpace at:
http://resourcespace.blogspot.com

TopSolid

Apr 21, 2011, 6:15:20 AM
to ResourceSpace
Hi!

Thanks for your answers...
I am trying to get unoconv running on Windows for ResourceSpace
DAM and I have hit a problem.
It does not find soffice.exe (which exists in the correct path), but it
apparently finds pyuno.pyd (no error):

Command line:
C:\xampp\unoconv>"C:\Program Files (x86)\OpenOffice.org 3\program\python.exe" unoconv.py --listener &

unoconv: Cannot find the soffice binary in sys.path and known paths.
ERROR: Please locate this binary and send your feedback to: <to...@lists.rpmforge.net>.

I have changed the paths in the script like this (just added spaces),
but it does not solve the problem:


if 'Program Files (x86)' in os.environ.keys():
    extrapaths += glob.glob(os.environ['Program Files (x86)']+'\\OpenOffice.org*\\URE\bin') + \
                  glob.glob(os.environ['Program Files (x86)']+'\\OpenOffice.org*\\program') + \
                  glob.glob(os.environ['Program Files (x86)']+'\\OpenOffice.org*\\Basis*\\program')

binaries = ( 'soffice.bin', 'soffice', 'soffice.exe' )
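
Two details in the snippet above may be worth checking. On 64-bit Windows the environment variable is normally named `ProgramFiles(x86)` (no spaces), so a key test for 'Program Files (x86)' would never match, and in a plain Python string the `\b` in `URE\bin` is a backspace escape. A minimal sketch of the directory lookup under those assumptions follows; the function name `candidate_office_dirs` and the `SUBDIR_PATTERNS` tuple are illustrative and not part of unoconv.

```python
import glob
import os

# Sub-directories where the OpenOffice.org binaries are expected (same three
# locations as in the snippet above, written without backslash escapes).
SUBDIR_PATTERNS = (
    ("OpenOffice.org*", "URE", "bin"),
    ("OpenOffice.org*", "program"),
    ("OpenOffice.org*", "Basis*", "program"),
)


def candidate_office_dirs(environ=None):
    """Return existing OpenOffice.org directories under the Program Files roots."""
    environ = os.environ if environ is None else environ
    found = []
    # Assumption: these are the standard Windows variable names.
    for var in ("ProgramFiles", "ProgramFiles(x86)"):
        base = environ.get(var)
        if not base:
            continue
        for parts in SUBDIR_PATTERNS:
            found.extend(glob.glob(os.path.join(base, *parts)))
    return found
```

Printing the result of `candidate_office_dirs()` on the server would show whether the directory that holds soffice.exe is found at all before touching the rest of the script.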

Any ideas/hints?

Thanks !

David




Jeff Harmon

Apr 21, 2011, 4:50:20 PM
to ResourceSpace
I suggest asking the unoconv developer and community.

- J

Tom Gleason

Apr 21, 2011, 4:52:56 PM
to resour...@googlegroups.com
I agree, you need to make unoconv work on the command line first, and
then if it doesn't work with ResourceSpace, report back to us.


TopSolid

Apr 26, 2011, 10:21:10 AM
to resour...@googlegroups.com
Hi Jeff,

Could you share the ExifTool-based preview extraction with those in the ResourceSpace community who host on Windows?
I have contacted the unoconv developer, without success so far...

Thanks and Best Regards

David

Rudy

Apr 26, 2011, 10:57:33 AM
to ResourceSpace
Hmm, not sure this helps, but you might want to give this link a go:

Sample VB.NET subroutine to extract a preview image (thanks Claus
Beckmann)
http://owl.phy.queensu.ca/~phil/exiftool/vb_sample.html

(taken from http://www.sno.phy.queensu.ca/~phil/exiftool/)

If you succeed, you might want to give feedback and share your
experience!

Good Luck!
Rudy

Tom Gleason

Apr 26, 2011, 10:59:09 AM
to resour...@googlegroups.com
Have you checked whether there is an embedded preview in your Office files?
(Change the extension to .zip and unzip the file.)

I think it's an option that has to be turned on when saving the files.
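
The same check can be scripted instead of renaming and unzipping by hand. This is only a sketch using Python's standard zipfile module; the helper name `embedded_thumbnails` is made up for illustration, and it assumes the preview is stored under docProps/ as a file named thumbnail with some image extension.

```python
import zipfile


def embedded_thumbnails(office_file):
    """Return the names of docProps/thumbnail.* members in an Office 2007+ file."""
    with zipfile.ZipFile(office_file) as package:
        return [
            name
            for name in package.namelist()
            if name.lower().startswith("docprops/thumbnail.")
        ]
```

An empty list would suggest the file was saved without the thumbnail option turned on.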

Rudy

Apr 26, 2011, 11:04:19 AM
to ResourceSpace
It is indeed an option that needs to be checked (I vaguely remember
that it is unchecked by default). ExifToolGUI for Windows will
report whether or not a thumbnail is embedded in the EXIF data.

Cheers,
Rudy

David ARNOULT - Edition & Internet Operations

Apr 26, 2011, 12:33:11 PM
to resour...@googlegroups.com
Hi

Thank you for your feedback and links. The VB.NET example does not seem
portable to PHP...

Yes, the embedded preview is there after unzipping: it is in
\docProps\thumbnail.wmf.

Is this supposed to work out of the box, without OpenOffice and with
ExifTool, under Windows? If not, it does not look very complex to develop
this feature natively in ResourceSpace using ExifTool.

I will look into the ExifTool option, but I am not a developer. If someone
with good PHP skills could have a look at this, it would be awesome!

I'll keep you posted

David



Jeff Harmon

Apr 26, 2011, 12:42:37 PM
to resour...@googlegroups.com, <resourcespace@googlegroups.com>
I have it and will share it but am indisposed for a while.

Jeff

David ARNOULT - Edition & Internet Operations

Apr 26, 2011, 12:50:14 PM
to resour...@googlegroups.com
Hi!

Just to confirm that with the ExifToolGUI program, all .doc, .docx, .pptx,
.ppt... thumbnails are displayed correctly :-) but not in RS :-(.

It seems that RS just needs to execute this command:
exiftool.exe -b -previewimage "FileName"
to get the thumbnail and link it to RS!

Is anyone interested in integrating this into the PHP source code?

Cheers

David


David ARNOULT - Edition & Internet Operations

Apr 26, 2011, 12:52:22 PM
to resour...@googlegroups.com
Thank you Jeff! Looking forward to testing it on my Windows instance :-)

Have a nice evening

Rgds

David



Tom Gleason

Apr 26, 2011, 12:55:23 PM
to resour...@googlegroups.com
The current method unzips the preview manually via PHP. Now that
ExifTool supports the extraction better, this needs to be changed in
include/preview_preprocessing.php.
I don't have time to do this right now, but if you send me a sample
Office file with an embedded preview, I can try to fix it when I have a
chance.



David ARNOULT - Edition & Internet Operations

Apr 26, 2011, 7:40:04 PM
to resour...@googlegroups.com
Just to let you know my progress...

I am trying to use ExifTool to extract the thumbnail from a DOCX for RS, but
with no results.

I tried modifying preview_preprocessing.php directly, like
this:

if ((($extension=="docx") || ($extension=="xlsx") || ($extension=="pptx") || ($extension=="xps")) && !isset($newfile))
    {
    #shell_exec("unzip -p $file \"docProps/thumbnail.jpeg\" > $target");$newfile = $target;
    #shell_exec("C:/xampp/unzip/bin/unzip.exe -p $file \"/docProps/thumbnail.wmf\" > $target");$newfile = $target;
    #shell_exec("C:/xampp/7z/7za e -so $file \"docProps/thumbnail.wmf\" > $target");$newfile = $target;
    shell_exec("c:/xampp/exiftool/exiftool.exe -b -previewimage $file > $target");$newfile = $target;
    }

No result.

Then I tested on the command line.

1/ This works on the command line but not in RS:
c:/xampp/7z/7za e -so c:\travail\temp\2.docx "docProps/thumbnail.wmf" > c:\travail\temp\dam\image_dam.wmf

2/ No thumbnail with the docx, but it works with a jpg!
c:/xampp/exiftool/exiftool.exe -b -previewimage c:\travail\temp\2.docx > c:\travail\temp\dam\image_exif.jpg

I have found this interesting page; it seems an unzip feature or parameter is missing
somewhere on my command line.
http://cpan.uwinnipeg.ca/htdocs/Image-ExifTool/Image/ExifTool/ZIP.pm.html#ProcessZIP-


Rgds

David

Rudy

Apr 27, 2011, 2:23:29 AM
to ResourceSpace
Hmm, maybe you can ask Phil Harvey over at his ExifTool forum:
http://u88.n24.queensu.ca/exiftool/forum/

He usually answers questions on the same day they are posted.

Cheers,
Rudy


Jeff Harmon

Apr 27, 2011, 3:28:39 AM
to ResourceSpace
It is perplexing when I announce to the group that I have this code,
and then people bang their heads against the wall anyway! So
impatient!



/* -----------------------------------------------
   Try Office Docs preview extraction via ExifTool
   -----------------------------------------------
*/

if ((($extension=="thmx") || ($extension=="docm") || ($extension=="ppt") || ($extension=="pptm") || ($extension=="xls") || ($extension=="docx") || ($extension=="xlsx") || ($extension=="xltx") || ($extension=="dotm") || ($extension=="dotx")) && !isset($newfile))
    {
    global $exiftool_path;
    if (isset($exiftool_path))
        {
        shell_exec($exiftool_path.'/exiftool -b -previewimage '.$file.' > '.$target);
        }
    if (file_exists($target))
        {
        # If the file contains an image, use it; if it's blank, it needs to be erased
        # because it will cause an error in ffmpeg_processing.php
        if (filesize($target)>0){$newfile = $target;}else{unlink($target);}
        }
    }


You can change the extensions list to include the ones you want.
This list reflects how we balance quality against the previews from
qlpreview, which most of you are not using.
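
For anyone who wants to try the same logic outside ResourceSpace first, here is a rough standalone sketch in Python. It is not the RS code: the function names are invented for illustration, the ExifTool location is passed in by the caller, and the tag name is a parameter because the embedded preview may be exposed under a different tag depending on the file type. The important part is the same as in the PHP above: an empty output file is treated as "no preview" and deleted.

```python
import os
import subprocess


def keep_if_nonempty(path):
    """Return path if it holds data; delete an empty file and return None."""
    if not os.path.exists(path):
        return None
    if os.path.getsize(path) > 0:
        return path
    os.remove(path)
    return None


def extract_preview(exiftool, source, target, tag="PreviewImage"):
    """Write the embedded preview of source to target using ExifTool's -b option."""
    with open(target, "wb") as out:
        # An argument list avoids quoting problems with spaces in paths.
        subprocess.run([exiftool, "-b", "-" + tag, source], stdout=out, check=False)
    return keep_if_nonempty(target)
```

If `extract_preview` returns None for a file that ExifToolGUI shows a thumbnail for, the tag name or the thumbnail format is the first thing to look at.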

- Jeff

David ARNOULT - Edition & Internet Operations

Apr 27, 2011, 9:03:38 AM
to resour...@googlegroups.com
Hi,

Sorry for my impatience and for banging my head (not so hard!).
Thanks for sharing your code, but it does not seem to work on my
Windows machine.

When I run the command from your code on the command line:
exiftool.exe -b -previewimage 2.docx > 2.jpg

the JPG file is always empty, even though the docx file has an embedded thumbnail. It works
fine with jpg or psd.

I have posted this issue on the ExifTool forum:
http://u88.n24.queensu.ca/exiftool/forum/index.php/topic,3266.0.html

I'll keep you posted

Rgds


David

David ARNOULT

May 11, 2011, 6:30:13 AM
to resour...@googlegroups.com
Hi All!

After contacting the ExifTool developer, I learned that Office document (.doc) thumbnails are stored in WMF format, which ExifTool does not handle (yet). So Jeff's code simply does not work for me...
I have therefore explored another way, using this code in preview_preprocessing.php:
/* ----------------------------------------
    Try Microsoft OfficeOpenXML Format
    It simply extracts thumbnail.wmf of zip file
   ----------------------------------------
*/

if ((($extension=="docx") || ($extension=="xlsx") || ($extension=="pptx") || ($extension=="xps")) && !isset($newfile))
    {
    shell_exec("C:/xampp/7z/7za e -so $file \"docProps/thumbnail.wmf\" > $target");$newfile = $target;
    }

It works, but not for all .docx files... When it fails, the thumbnail in RS is white or blank, even though the WMF file is correct and correctly saved within the docx. It seems that in some cases Word generates thumbnails with a different size and resolution. I attached two Word files, OK and NotOK... The only difference is that I added a photo to the second one. It looks like IM does not handle WMF correctly when it contains a photo or is larger. Any ideas, guys?

Thanks!

David
NotOK.docx
OK.docx

DrakoWhole

Jun 4, 2011, 5:17:56 PM
to ResourceSpace
Hi group.
After all these efforts, have you made any progress?
Thanks

DrakoWhole

Jun 4, 2011, 5:21:01 PM
to ResourceSpace
By the way, I found this link to a tool that converts these file types
from the command line...

http://www.oooninja.com/2008/02/batch-command-line-file-conversion-with.html

but I do not know if it is feasible.

DrakoWhole

Jun 4, 2011, 5:39:11 PM
to ResourceSpace
This link also indicates that the Java tool is used by
Alfresco and Nuxeo:

http://www.artofsolving.com/opensource/jodconverter/adoption

And this Python script does the same:

http://www.artofsolving.com/opensource/pyodconverter

But is it possible to integrate these tools into RS?



David ARNOULT - Edition & Internet Operations

Jun 6, 2011, 4:44:01 AM
to resour...@googlegroups.com
Hi!

Thanks for the links and for bringing this topic back to the top of the pile. I haven't done any
further development on it so far.
Today this solution works with all ppt and pptx files (their JPEG thumbnails are fine for IM).
The problem appears with Word files, which use WMF thumbnails that are not supported
by IM or ExifTool (why not JPEG?).

- ExifTool could be a good option, but I have no news from the author about the
possibility of reading WMF.
- IM could be fine, but its WMF library does not work fully with all files,
which is a shame. I have tested an external library, without success.
- On Windows, the Python script did not work for me with OpenOffice because of path
problems.

I will investigate the new tools you sent.

By the way, has anyone tested the Office 2010 format? Is it the same (WMF
for Word, JPEG for PowerPoint)?

I'll keep you posted!
And keep us posted if you find a way to display Office
2007/2010 document thumbnails within RS in a Windows environment.

David



Jeff Harmon

Jun 11, 2011, 10:28:08 PM
to ResourceSpace
ExifTool 8.59 just released with

*Extract PreviewWMF from DOCX files
*Recognize WMF images

- Jeff


DrakoWhole

Jun 13, 2011, 8:34:25 AM
to ResourceSpace
Thank you Jeff.
Which code should be tweaked to test it now?
thanks.

David ARNOULT - Edition & Internet Operations

Jun 13, 2011, 8:49:11 AM
to resour...@googlegroups.com
Hi,

I have tested the latest version of ExifTool and it can now extract the WMF file (as
7-Zip can, like this: shell_exec("C:/Tools/7-zip/7za.exe e -so $file \"docProps/thumbnail.wmf\" > $target");$newfile = $target;).
But the result can't be converted to JPG by IM, which can't read WMF, even with libwmf
installed. A white image is always created.

See my post here:
http://u88.n24.queensu.ca/exiftool/forum/index.php/topic,3266.0.html

In the meantime I have tested a Word 2010 (MS Office 2010) docx, and
the thumbnail format has changed to .emf.
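
Since the thumbnail can apparently be .jpeg, .wmf or .emf depending on the application and Office version, one option is to take whichever one is present rather than hard-coding a name per extension. The sketch below shows the idea with Python's zipfile module instead of 7za; the function name, the `target_stem` argument and the extension list are illustrative assumptions based only on the formats seen in this thread.

```python
import os
import zipfile

# Thumbnail formats reported so far in this thread.
THUMBNAIL_EXTENSIONS = (".jpeg", ".wmf", ".emf")


def extract_thumbnail(office_file, target_stem):
    """Extract docProps/thumbnail.* and return the written path, or None."""
    with zipfile.ZipFile(office_file) as package:
        for name in package.namelist():
            lower = name.lower()
            if not lower.startswith("docprops/thumbnail"):
                continue
            ext = os.path.splitext(lower)[1]
            if ext in THUMBNAIL_EXTENSIONS:
                # Keep the real extension so the image tool can pick a decoder.
                target = target_stem + ext
                with open(target, "wb") as out:
                    out.write(package.read(name))
                return target
    return None
```

Keeping the original extension on the extracted file matters here, because writing WMF or EMF bytes into a file named .jpg gives the converter no hint about the actual format.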

That is better: after extracting the EMF, IM can process it, except for the full preview,
where I get this error:

-Zip (A) 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18
Processing archive:
***\include\..\filestore\9\0_fdb2ce845612f27\90_79aec515ccb5709.docx
Extracting docProps\thumbnail.emf
Everything is Ok
Size: 6483696
Compressed: 3690050
Magick: no stream handler is defined
`****\include/../filestore/9/0_fdb2ce845612f27/90_79aec515ccb5709.jpg' @
error/stream.c/QueueAuthenticPixelsStream/843.


Here is my code:
/* ----------------------------------------
   Try extract thumbnail Microsoft Office 2007
   ----------------------------------------
*/
if ((($extension=="xlsx") || ($extension=="xps")) && !isset($newfile))
    {
    shell_exec("C:/Tools/7-zip/7za.exe e -so $file \"docProps/thumbnail.wmf\" > $target");$newfile = $target;
    #shell_exec("c:/xampp/exiftool/exiftool.pl -b -previewimage $file > $target");$newfile = $target;
    }

if ((($extension=="pptx") || ($extension=="potx")) && !isset($newfile))
    {
    shell_exec("C:/Tools/7-zip/7za.exe e -so $file \"docProps/thumbnail.jpeg\" > $target");$newfile = $target;
    }

/* ----------------------------------------
   Try extract thumbnail Microsoft Office 2010
   ----------------------------------------
*/
if ((($extension=="docx")) && !isset($newfile))
    {
    shell_exec("C:/Tools/7-zip/7za.exe e -so $file \"docProps/thumbnail.emf\" > $target");$newfile = $target;
    #shell_exec("c:/xampp/exiftool/exiftool.pl -b -previewimage $file > $target");$newfile = $target;
    }

Any idea is welcome... I have latest IM version on Windows 2003 server x86.

To be continued...

Rgds

David


David ARNOULT - Edition & Internet Operations

Jun 13, 2011, 10:16:29 AM
to resour...@googlegroups.com
Hi again,

Out of curiosity, I tested inserting the
extracted thumbnail.emf image directly as a resource in RS.
All previews work, even the full-screen preview, but with this error in
the Apache logs:


Magick: no stream handler is defined `***\include/../filestore/9/3_3518d9a134260e0/93_c755664de522454.emf' @ error/stream.c/QueueAuthenticPixelsStream/843.

So I don't understand why the full-screen preview does not work when inserting
the docx file instead...

I have attached the .emf file if you want to have a go on your systems!

David



thumbnail.emf.rar

DrakoWhole

Jun 21, 2011, 9:39:56 AM
to ResourceSpace
Hi.
I am working on the DocumentConverter.py which is something very
similar to unoconv, and is able to convert any file to a PDF file.

When I run the Python script from the DOS command line, it works:

"C:\Program Files (x86)\OpenOffice.org 3\program\python" C:\scripts\DocumentConverter.py input.xlsx c:\scripts\output.pdf

Now the question is... How do I integrate this into
preview_preprocessing.php?

This is what I have tried, but I am getting a "The regexp string
"SUCCESS$" was not found in the response body" error

$unocommand=$unoconv_path . "/python.exe c:/scripts/documentconverter.py ";
if (!file_exists($unocommand)) {exit("Unoconv executable not found at '$unoconv_path'");}

shell_exec($unocommand . $file ." " . $target);
$path_parts=pathinfo($target);

$basename_minus_extension=remove_extension($path_parts['basename']);
$pdffile=$path_parts['dirname']."/".$basename_minus_extension.".pdf";
if (file_exists($pdffile))
    {
    # Attach this PDF file as an alternative download.
    sql_query("delete from resource_alt_files where resource = '".$ref."' and unoconv='1'");
    $alt_ref=add_alternative_file($ref,"PDF version");
    $alt_path=get_resource_path($ref,true,"",false,"pdf",-1,1,false,"",$alt_ref);
    copy($pdffile,$alt_path);unlink($pdffile);
    sql_query("update resource_alt_files set file_name='$ref-converted.pdf',description='generated by Open Office',file_extension='pdf',file_size='".filesize($alt_path)."',unoconv='1' where resource='$ref' and ref='$alt_ref'");

    # Set vars so we continue generating thumbs/previews as if this is a PDF file
    $extension="pdf";
    $file=$alt_path;
    }
}


David ARNOULT - Edition & Internet Operations

Jun 21, 2011, 9:55:55 AM
to resour...@googlegroups.com
Hi,

Save your PHP file as UTF-8 WITHOUT a BOM (plain UTF-8). That should
resolve the response body error.
Keep me posted if any progress, looks good to me!
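
Some background on why the BOM matters: any bytes that sit before the opening PHP tag are sent to the client ahead of the script's own output, so an uploader that expects a response body matching SUCCESS would see three extra bytes first. If your editor does not show the encoding clearly, a small check like the one below can tell you. It is only a sketch; the helper name `strip_utf8_bom` is made up, and you would read the file in binary mode and pass its bytes in.

```python
import codecs


def strip_utf8_bom(data):
    """Return (data without a leading UTF-8 BOM, True if a BOM was present)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):], True
    return data, False
```

For example, reading preview_preprocessing.php with `open(path, "rb").read()` and passing the result to `strip_utf8_bom` shows whether the file starts with the bytes EF BB BF.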

David


DrakoWhole

Jun 22, 2011, 9:25:05 AM
to ResourceSpace
Hi, the error is the same. I was already saving the file as UTF-8.
I think the problem I am having is with this line:
shell_exec($unocommand . $file ." " . $target);

On the DOS command line you must specify an extension (output.pdf).
However, I do not know what $target contains, or which directory it is
supposed to point to.
You probably have to give the Python script the proper directory and
file extension.
Any ideas?


David ARNOULT - Edition & Internet Operations

Jun 23, 2011, 3:24:20 AM
to resour...@googlegroups.com
Hi,

Are you sure that your file is encoded without a BOM?
A BOM is a typical cause of this error.

David


DrakoWhole

Jun 24, 2011, 7:18:34 AM
to ResourceSpace
Yes, I am.
Unicode UTF-8 without BOM.

The complete error says:

"The upload stopped with errors

wjhk.jupload2.policies.DefaultUploadPolicy.checkUploadSuccess(): The
regexp string "^SUCCESS$" was not found in the response body"

As I said, I think I must give the command a proper directory and file
extension. How do I do that in this line?

shell_exec($unocommand . $file ." " . $target);

I do not know what value $target holds.





DrakoWhole

Jun 25, 2011, 11:02:29 AM
to ResourceSpace
I just managed it! Any PPTX, PPT, XLSX, XLS, DOCX or DOC file is now
converted into a PDF, with multipage previews possible, in a Windows
environment.

The Python script can be downloaded here:

http://www.artofsolving.com/opensource/pyodconverter

Place DocumentConverter.py in a folder. For my code to work as written,
it should be in c:\docConv\

PyODConverter requires OpenOffice.org to be running as a service and
listening on a port (8100 by default); the simplest way to start
OpenOffice.org as a service is from the command line:

"C:\Program Files\OpenOffice.org 3.x\program\soffice" -accept="socket,port=8100;urp;"
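
Before uploading anything, it can help to confirm that the listener is really up, since the conversion cannot work if nothing accepts connections on that port. A minimal sketch of such a check, assuming the default port 8100 on the local machine (the function name `is_listening` is only illustrative):

```python
import socket


def is_listening(host="127.0.0.1", port=8100, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

This only shows that some process is listening on the port, not that it is OpenOffice.org, so treat a True result as a first sanity check rather than proof.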

... And here is the tweaked code of preview_preprocessing.php

/* ----------------------------------------
   Try Microsoft OfficeOpenXML Format
   Also try Microsoft XPS... the sample document I've seen uses the same
   path for the preview, so it will likely work in most cases, but I think
   the specs allow it to go anywhere.
   ----------------------------------------
*/
if ((($extension=="docx") || ($extension=="xlsx") || ($extension=="pptx") || ($extension=="doc") || ($extension=="xls") || ($extension=="ppt") || ($extension=="xps")) && !isset($newfile))
    {
    # Convert the Office file to PDF with PyODConverter (the soffice listener must be running).
    $pythoncommand="\"C:/Program Files (x86)/OpenOffice.org 3/program/python.exe\" c:/docConv/documentconverter.py ";

    shell_exec($pythoncommand . "\"" . $file . "\" " . "\"" . $target . ".pdf" . "\"");

    $path_parts=pathinfo($target. ".pdf");
    $basename_minus_extension=remove_extension($path_parts['basename']);
    $pdffile=$path_parts['dirname']."/".$basename_minus_extension.".pdf";

    if (file_exists($pdffile))
        {
        # Attach this PDF file as an alternative download.
        sql_query("delete from resource_alt_files where resource = '".$ref."' and unoconv='1'");
        $alt_ref=add_alternative_file($ref,"PDF version");
        $alt_path=get_resource_path($ref,true,"",false,"pdf",-1,1,false,"",$alt_ref);
        copy($pdffile,$alt_path);unlink($pdffile);
        sql_query("update resource_alt_files set file_name='$ref-converted.pdf',description='generated by Open Office',file_extension='pdf',file_size='".filesize($alt_path)."',unoconv='1' where resource='$ref' and ref='$alt_ref'");

        # Set vars so we continue generating thumbs/previews as if this is a PDF file
        $extension="pdf";
        $file=$alt_path;
        }
    }
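
To test the conversion step on its own, outside RS, the same call can be reproduced in a few lines of Python. This is a sketch and not part of ResourceSpace or PyODConverter: the function names are invented, and the interpreter and script paths are whatever you pass in. It mirrors the two things that made the PHP version work: the output name gets an explicit .pdf extension, and each path is passed as one argument so spaces do not need manual quoting.

```python
import subprocess


def build_convert_command(python_exe, converter_script, source, target):
    """Return (argument list, pdf path) for one DocumentConverter.py run."""
    # The converter picks the output format from the extension, so add .pdf.
    pdf_path = target + ".pdf"
    return [python_exe, converter_script, source, pdf_path], pdf_path


def convert_to_pdf(python_exe, converter_script, source, target):
    """Run the conversion; return the PDF path on exit code 0, else None."""
    command, pdf_path = build_convert_command(
        python_exe, converter_script, source, target
    )
    completed = subprocess.run(command, check=False)
    return pdf_path if completed.returncode == 0 else None
```

Running `convert_to_pdf` from the same Windows account that the web server uses is a quick way to see whether the problem is in the command itself or in the environment PHP runs under.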

Hope this helps for RS alignment in Windows.
Best regards.
Drakowhole.

David ARNOULT - Edition & Internet Operations

Jun 27, 2011, 11:57:53 AM
to resour...@googlegroups.com
Hi Drakowhole,

This is very good news, excellent! I will try your solution ASAP. Will keep
you posted ;-)

Regards

David


David ARNOULT - Edition & Internet Operations

Jul 8, 2011, 10:26:08 AM
to resour...@googlegroups.com
Hi DrakoWhole,

Thank you for your feedback. I have tested your solution on my local Windows 7
machine and it rocks!

But when I try to install it on my live RS server, also on Windows, it does not
work. It works fine from the command line but not from the PHP script :-(
Did you do anything in particular on the Windows server (rights, paths, an extra
Python install)?

It seems to be a path problem.
Could you attach your code as a separate txt file, to be sure?
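
One way to narrow down a path problem like this is to check which external tools are visible on the PATH of the account the web server runs under, which can differ from the PATH of an interactive login. The sketch below uses Python's shutil.which; the helper name `missing_tools` and the example tool names are illustrative, not a list of what RS actually requires.

```python
import shutil


def missing_tools(names):
    """Return the subset of names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]
```

For instance, `missing_tools(["unzip", "convert", "identify", "python"])` run under the service account would list the commands that a PHP shell_exec call from that account is unlikely to find without a full path.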

RS Debug says:
Starting preview preprocessing. File extension is docx.
create_previews_using_im(ref=1456,thumbonly=,extension=docx,previewonly=,previewbased=,alternative=-1)
SQL: select * from preview_size order by width desc, height desc
Contemplating hpr (sw=, tw=9999999, sh=, th=9999999, extension=docx)
Generating preview size hpr to G:\Sites\library\include/../filestore/1/4/5/6_59b2de397cf33bc/1456hpr_0a322f2837b9d6c.jpg
Contemplating lpr (sw=, tw=2000, sh=, th=2000, extension=docx)
Contemplating scr (sw=, tw=850, sh=, th=850, extension=docx)
Generating preview size scr to G:\Sites\library\include/../filestore/1/4/5/6_59b2de397cf33bc/1456scr_48458cd5d39fb71.jpg
Contemplating pre (sw=, tw=350, sh=, th=350, extension=docx)
Generating preview size pre to G:\Sites\library\include/../filestore/1/4/5/6_59b2de397cf33bc/1456pre_f1bcf17dc8cb6a9.jpg
Contemplating thm (sw=, tw=150, sh=, th=150, extension=docx)
Generating preview size thm to G:\Sites\library\include/../filestore/1/4/5/6_59b2de397cf33bc/1456thm_83db3ff5f06d264.jpg
Contemplating col (sw=, tw=75, sh=, th=75, extension=docx)
Generating preview size col to G:\Sites\library\include/../filestore/1/4/5/6_59b2de397cf33bc/1456col_996189ebff427c9.jpg

Apache log says:
'unzip' is not recognized as an internal or external command,
operable program or batch file.
The filename, directory name, or volume label syntax is incorrect.
identify.exe: no decode delegate for this image format `G:\Sites\library\include/../filestore/1/4/5/6_59b2de397cf33bc/1456_fa4a7e5eef4e9cb.docx' @ error/constitute.c/ReadImage/532.
convert.exe: no decode delegate for this image format `G:\Sites\library\include/../filestore/1/4/5/6_59b2de397cf33bc/1456_fa4a7e5eef4e9cb.docx' @ error/constitute.c/ReadImage/532.
convert.exe: missing an image filename `G:\Sites\library\include/../filestore/1/4/5/6_59b2de397cf33bc/1456hpr_0a322f2837b9d6c.jpg' @ error/convert.c/ConvertImageCommand/3015.
convert.exe: no decode delegate for this image format `G:\Sites\library\include/../filestore/1/4/5/6_59b2de397cf33bc/1456_fa4a7e5eef4e9cb.docx' @ error/constitute.c/ReadImage/532.
convert.exe: missing an image filename `G:\Sites\library\include/../filestore/1/4/5/6_59b2de397cf33bc/1456scr_48458cd5d39fb71.jpg' @ error/convert.c/ConvertImageCommand/3015.
convert.exe: no decode delegate for this image format `G:\Sites\library\include/../filestore/1/4/5/6_59b2de397cf33bc/1456_fa4a7e5eef4e9cb.docx' @ error/constitute.c/ReadImage/532.
convert.exe: missing an image filename `G:\Sites\library\include/../filestore/1/4/5/6_59b2de397cf33bc/1456pre_f1bcf17dc8cb6a9.jpg' @ error/convert.c/ConvertImageCommand/3015.
convert.exe: no decode delegate for this image format `G:\Sites\library\include/../filestore/1/4/5/6_59b2de397cf33bc/1456_fa4a7e5eef4e9cb.docx' @ error/constitute.c/ReadImage/532.
convert.exe: missing an image filename `G:\Sites\library\include/../filestore/1/4/5/6_59b2de397cf33bc/1456thm_83db3ff5f06d264.jpg' @ error/convert.c/ConvertImageCommand/3015.
convert.exe: no decode delegate for this image format `G:\Sites\library\include/../filestore/1/4/5/6_59b2de397cf33bc/1456_fa4a7e5eef4e9cb.docx' @ error/constitute.c/ReadImage/532.
convert.exe: missing an image filename `G:\Sites\library\include/../filestore/1/4/5/6_59b2de397cf33bc/1456col_996189ebff427c9.jpg' @ error/convert.c/ConvertImageCommand/3015.
[Fri Jul 08 16:19:58 2011] [error] [client 192.168.9.108] (20024)The given path is misformatted or contained invalid characters: Cannot map GET /pages/%22c:/tools/openoffice/program/python.exe%22%20c:/tools/conv/filestore/1/4/5/6_59b2de397cf33bc/filestore/1/4/5/6_59b2de397cf33bc/1456_fa4a7e5eef4e9cb.jpg.pdf%22../gfx/type1_col.gif HTTP/1.1 to file, referer: http://library.topsolid.com/pages/upload_swf.php?resource_type=2&collection_add=&entercolname=&replace=&no_exif=&autorotate=
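
When a command works from the command line but not from PHP, it can help to capture what the command itself prints, since shell_exec() on its own does not return stderr. A minimal sketch, assuming exec() is enabled on the server; run_and_capture is a made-up helper name, not a ResourceSpace function, and the echo command is only a stand-in for the real converter call:

```php
<?php
// Hypothetical helper (not part of ResourceSpace): run a command and
// return its combined stdout/stderr together with the exit code.
function run_and_capture($command)
{
    $lines = array();
    $exit_code = -1;
    // "2>&1" merges stderr into stdout; cmd.exe and POSIX shells both accept it.
    exec($command . ' 2>&1', $lines, $exit_code);
    return array(
        'output'    => implode("\n", $lines),
        'exit_code' => $exit_code,
    );
}

// Stand-in command; substitute the real python/DocumentConverter.py call.
$result = run_and_capture('echo hello');
error_log('exit code ' . $result['exit_code'] . ': ' . $result['output']);
```

A non-zero exit code, or a "not recognized" message in the output, would point to a path or permissions issue for the account the web server runs under.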

Thanks!

David

--

David ARNOULT - Edition & Internet Operations

Jul 12, 2011, 4:13:56 AM7/12/11
to resour...@googlegroups.com
Hi,

Just to let you know that my problem is now solved and DrakoWhole's solution
rocks on Windows.
It was a bracket syntax problem: I had to simplify the pythoncommand path, and
now it works like a charm! It would be worth integrating this into the
standard distribution for Windows users, I suppose...

/* ----------------------------------------
   Try Microsoft OfficeOpenXML Format
   Also try Microsoft XPS... the sample document I've seen uses the same
   path for the preview, so it will likely work in most cases, but I think
   the specs allow it to go anywhere.
   ----------------------------------------
*/
if ((($extension=="docx") || ($extension=="xlsx") || ($extension=="pptx") ||
     ($extension=="doc") || ($extension=="xls") || ($extension=="ppt") ||
     ($extension=="xps") || ($extension=="dot") || ($extension=="dotx") ||
     ($extension=="pot") || ($extension=="potx")) && !isset($newfile))
    {
    # Convert the office file to PDF with OpenOffice's bundled Python and
    # PyODConverter. The command must be a single line with no embedded quotes.
    $pythoncommand = "C:/Tools/OpenOffice/program/python.exe C:/Tools/DocConv/DocumentConverter.py ";
    shell_exec($pythoncommand . $file . " " . $target . ".pdf");

    $path_parts=pathinfo($target . ".pdf");
    $basename_minus_extension=remove_extension($path_parts['basename']);
    $pdffile=$path_parts['dirname']."/".$basename_minus_extension.".pdf";

    if (file_exists($pdffile))
        {
        # Attach this PDF file as an alternative download.
        sql_query("delete from resource_alt_files where resource = '".$ref."' and unoconv='1'");
        $alt_ref=add_alternative_file($ref,"PDF version");
        $alt_path=get_resource_path($ref,true,"",false,"pdf",-1,1,false,"",$alt_ref);
        copy($pdffile,$alt_path);unlink($pdffile);
        sql_query("update resource_alt_files set file_name='$ref-converted.pdf',description='Generated by OpenOffice',file_extension='pdf',file_size='".filesize($alt_path)."',unoconv='1' where resource='$ref' and ref='$alt_ref'");

        # Set vars so we continue generating thumbs/previews as if
        # this is a PDF file
        $extension="pdf";
        $file=$alt_path;
        }
    }
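
The command above is built by plain string concatenation, so it would presumably break if a file path ever contained a space. One possible variation, untested against this setup, is to leave the python/converter prefix unquoted (as in the working fix) and quote only the two file arguments with PHP's escapeshellarg(). build_converter_command is a made-up helper name and the "My Report" paths are invented for illustration:

```php
<?php
// Hypothetical helper: append the input and output paths to an unquoted
// command prefix, quoting each path so that spaces do not split it.
function build_converter_command($prefix, $input, $output)
{
    // escapeshellarg() uses double quotes on Windows and single quotes elsewhere.
    return $prefix . escapeshellarg($input) . ' ' . escapeshellarg($output);
}

// Same prefix as in the post, kept as one line with a trailing space.
$pythoncommand = 'C:/Tools/OpenOffice/program/python.exe C:/Tools/DocConv/DocumentConverter.py ';

$cmd = build_converter_command(
    $pythoncommand,
    'C:/data/My Report.docx', // invented path containing a space
    'C:/data/My Report.pdf'
);
echo $cmd, "\n";
```

Quoting rules under cmd.exe are fiddly, so it is worth echoing the resulting command and trying it by hand before relying on it.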

Rgds

David


--

Robert Damrau

Jan 25, 2013, 6:36:13 AM1/25/13
to resour...@googlegroups.com, d.ar...@topsolid.com
Hi,

I tried to get this working as described here, but no luck. The errors I get are similar to yours above:

Starting preview preprocessing. File extension is docx.
create_previews_using_im(ref=56,thumbonly=,extension=docx,previewonly=,previewbased=,alternative=-1)
CLI command: "e:/ResourceSpace/imagemagic\identify.exe" -format %wx%h "C:\inetpub\wwwroot\resourcespace\include/../filestore/5/6_9215ad36892c5a7/56_8b024c23b69530f.docx"[0]
CLI output: 
CLI errors: identify.exe: no decode delegate for this image format `C:\inetpub\wwwroot\resourcespace\include/../filestore/5/6_9215ad36892c5a7/56_8b024c23b69530f.docx' @ error/constitute.c/ReadImage/550.
SQL: select * from preview_size  order by width desc, height desc
Contemplating hpr (sw=, tw=999999, sh=, th=999999, extension=docx)
Generating preview size hpr to C:\inetpub\wwwroot\resourcespace\include/../filestore/5/6_9215ad36892c5a7/56hpr_455701aafd89330.jpg
CLI command: "e:/ResourceSpace/imagemagic\convert.exe" "C:\inetpub\wwwroot\resourcespace\include/../filestore/5/6_9215ad36892c5a7/56_8b024c23b69530f.docx"[0] +matte -flatten -quality 90 +matte +profile "*" -colorspace sRGB -resize 999999x999999">" "C:\inetpub\wwwroot\resourcespace\include/../filestore/5/6_9215ad36892c5a7/56hpr_455701aafd89330.jpg"
CLI output: 
CLI errors: convert.exe: no decode delegate for this image format `C:\inetpub\wwwroot\resourcespace\include/../filestore/5/6_9215ad36892c5a7/56_8b024c23b69530f.docx' @ error/constitute.c/ReadImage/550.
convert.exe: no images defined `C:\inetpub\wwwroot\resourcespace\include/../filestore/5/6_9215ad36892c5a7/56hpr_455701aafd89330.jpg' @ error/convert.c/ConvertImageCommand/3066.
Contemplating lpr (sw=, tw=2000, sh=, th=2000, extension=docx)
Contemplating scr (sw=, tw=600, sh=, th=9999, extension=docx)
Generating preview size scr to C:\inetpub\wwwroot\resourcespace\include/../filestore/5/6_9215ad36892c5a7/56scr_146cd351885d3cf.jpg
CLI command: "e:/ResourceSpace/imagemagic\convert.exe" "C:\inetpub\wwwroot\resourcespace\include/../filestore/5/6_9215ad36892c5a7/56_8b024c23b69530f.docx"[0] +matte -flatten -quality 90 +matte +profile "*" -colorspace sRGB -resize 600x9999">" "C:\inetpub\wwwroot\resourcespace\include/../filestore/5/6_9215ad36892c5a7/56scr_146cd351885d3cf.jpg"
CLI output:
...

So I assume the pythoncommand isn't executed? Could you attach a copy of your modified preview_preprocessing.php?
Also, how exactly do I load soffice as a service? I tried the above command but it doesn't seem to run in the background.
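
One way to narrow this down is to check from PHP whether soffice is actually listening before the converter runs. As far as I know, DocumentConverter.py talks to a running soffice over a local socket, so a closed port would explain a silent failure. A minimal sketch; port 8100 is an assumption here (use whatever port soffice was started with), and is_listener_up is a made-up helper name, not a ResourceSpace function:

```php
<?php
// Hypothetical helper: returns true if something accepts TCP connections
// on the given host and port within the timeout.
function is_listener_up($host, $port, $timeout_seconds = 2.0)
{
    $errno = 0;
    $errstr = '';
    // @ suppresses the warning fsockopen() raises when the connection fails.
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout_seconds);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

// 8100 is assumed; replace it with the port soffice was told to listen on.
var_dump(is_listener_up('127.0.0.1', 8100));
```

If this prints false while soffice appears in the task list, soffice was probably started without a socket listener or under a different user session.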

Thanks for any help!

Steven Swart

Mar 17, 2022, 4:23:38 PM3/17/22
to ResourceSpace
Where would one put this code block?