In 2007, Microsoft began using the docx file format, which is built on Office Open XML. A docx file is a zip archive that contains the text as XML, together with graphics and other data that can be translated into a sequence of bits using patent-protected binary formats. At first it was assumed that this format would replace doc, but both formats are still in use today.
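Because a docx file is a zip archive with XML inside, it can be inspected with nothing but a standard library. The following is a minimal Python sketch, not a full parser: it assumes the main text lives in `word/document.xml` and only collects the text runs of each paragraph, ignoring tables, headers, footnotes and formatting. The function names `list_parts` and `extract_paragraphs` are made up for this illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_parts(docx):
    """Return the names of all parts stored in the docx archive."""
    with zipfile.ZipFile(docx) as archive:
        return archive.namelist()


def extract_paragraphs(docx):
    """Return the plain text of each paragraph in word/document.xml."""
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph is split into runs; join the text of every run.
        runs = [t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return paragraphs
```

Both functions accept a path or an open binary file object, so they should work on an uploaded file as well as on a file on disk.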
I am new to web development and am trying to create a language translation app in Django that translates uploaded documents. It relies on a series of conversions back and forth between pdf and docx. When my code outputs the translated document, it cannot be opened.
I am trying to convert docx files to pdf on my Ubuntu server using the command line, but none of the converters I have tried so far seems to convert Word 2007/2010/2013 files correctly.
Apparently online converters can manage it without any problems, but web services are not an option because the files contain sensitive data. For tests I use this Word 2007 file because it contains some important elements (formulas, vector graphics, images, lists, etc.). I tested the following tools (partly from this post):
Is there any way to convert docx files to PDF on Linux correctly? It would also help me to know whether any of the programs I already mentioned works for someone. I will start a bounty as soon as SE lets me.
I had to conclude that, at least for me and at least for now, there is no reliable tool on Ubuntu that handles the new MS Word formats with all their elements and produces a one-to-one copy of a docx file. None of the tools I tested could convert the sample file properly. Since I will be facing many different document versions and contents, and output quality is one of my highest priorities, I will end up performing the conversions with VB macros in Word on a Windows server connected to my Linux machine.
It seems that libreoffice and unoconv have problems correctly rendering the flow chart in the .docx file, probably because it was made using smart art in Microsoft Office. This is a bug that is also discussed on this thread. The textual and visual information is present in the pdf resulting from the above method, as you can see (I had to select the text, though).
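For reference, a headless LibreOffice conversion can be scripted roughly as follows. This is only a sketch under assumptions: the binary is taken to be `soffice` (some distributions ship it as `libreoffice`), the helper names and the 120 second timeout are arbitrary choices for this example, and it does nothing to fix the smart art rendering problem described above.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir, binary="soffice"):
    """Build the argument list for a headless LibreOffice PDF conversion."""
    return [
        binary,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_to_pdf(docx_path, out_dir, binary="soffice", timeout=120):
    """Run the conversion and return the path where the PDF is expected."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found on PATH")
    command = build_convert_command(docx_path, out_dir, binary)
    # check=True raises CalledProcessError if LibreOffice exits with an error.
    subprocess.run(command, check=True, timeout=timeout)
    return Path(out_dir) / (Path(docx_path).stem + ".pdf")
```

Keeping the command construction separate from the call makes it easy to log or test the exact command line without LibreOffice being installed.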
In short, what you are doing is really hard, and at present there are no solutions that will fully satisfy you. The Achilles' heel of docx-to-pdf conversion is the smart art. If you can live without that, or if you can find a way to spot smart art and convert it somehow into an image, you can reach your goal.
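Spotting smart art can at least be approximated without opening Word. The sketch below assumes that Word stores the SmartArt parts under `word/diagrams/` inside the archive, which is where they usually end up but is not guaranteed (the real location is defined by the document's relationships). The names `smart_art_parts` and `has_smart_art` are invented for this example.

```python
import zipfile

# Assumption: SmartArt data, layout and drawing parts live in this folder.
DIAGRAM_PREFIX = "word/diagrams/"


def smart_art_parts(docx):
    """Return the sorted names of archive members that look like SmartArt parts."""
    with zipfile.ZipFile(docx) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.startswith(DIAGRAM_PREFIX)
        )


def has_smart_art(docx):
    """Return True if the archive contains at least one suspected SmartArt part."""
    return bool(smart_art_parts(docx))
```

Files flagged this way could be routed to a different conversion path (for example Word on a Windows machine), while everything else goes through the Linux converter.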
If the flow charts are often very similar, and depending on how good a developer you are, you could try to convert the smart art separately. You could extract the drawing1.xml file from the .docx cluster of documents and then use natural language processing and some crazy hacks to rebuild the smart art. For instance, you'd have to mess with this type of xml:
I have done some more research over the past few days and I have found a service that does the conversion perfectly: zamzar. Zamzar allows you to upload a docx file and then emails you a link. They also have a (paid?) service where you can send any file to [email protected] and then get the converted file back in your inbox. You could easily build a system around this where you automatically send the file and parse the result from the email. This is not much work, and the end result is the best.
So to be sure you need to process your .docx files with a Microsoft Word installation (and yes, I think it's their option and it's fair. If you do not want to use Word, don't use it --- I go with LaTeX for my work, but it's difficult to convince the rest of the world around...).
Dear @ajlittoz, I meant that regardless of the Font/Font size setting for the equation the conversion to .docx defaults to Cambria/12 pt. Unfortunately the real world is not ideal and many official publishers such as IEEE do not accept .odt, only .tex/.doc/.docx. I do love the LO program though, and wish that there would be a way to fix these minor issues.
Until now I've been using an online service to achieve this, but this has several disadvantages.
Therefore, I was wondering whether there is a package for Arch that can convert docx to pdf.
I've been doing some research online and it looks like I'll need a PDF to Docx converter to change the Precepts Symbols document that I have into a form that I will need to add them to Highlights - New Palette. There are several online that are listed as being "Free." Can anyone here recommend a particular one?
With a suite of other easy-to-use tools for merging and splitting PDFs, compressing and rotating PDFs, and deleting PDF pages, our PDF converter breaks you free from the typical constraints of PDF files.
Try our PDF to Word converter free with a free trial, or sign up for a monthly, annual, or lifetime membership to get unlimited access to all our tools, including unlimited document sizes and the ability to convert multiple documents at once.
Mammoth is designed to convert .docx documents, such as those created by Microsoft Word, Google Docs and LibreOffice, and convert them to HTML. Mammoth aims to produce simple and clean HTML by using semantic information in the document, and ignoring other details. For instance, Mammoth converts any paragraph with the style Heading1 to h1 elements, rather than attempting to exactly copy the styling (font, text size, colour, etc.) of the heading. This allows you to paste from Word documents without the usual mess.
By default, Mammoth maps some common .docx styles to HTML elements. For instance, a paragraph with the style name Heading 1 is converted to a h1 element. If you have a document with your own custom styles, you can use an embedded style map to tell Mammoth how those styles should be mapped. For instance, you could convert paragraphs with the style named WarningHeading to h1 elements with class="warning" with the style mapping:
Convert your SubRip (.srt) subtitle files to Word (.docx) quickly and easily with Ebby's free online SRT subtitle converter.
No need to download and install any third-party software on your computer - all in your browser and it works on Windows, Mac (Apple), Linux and any mobile device. Simply upload your SubRip file and hit the Convert SRT button.
Most files are converted correctly using unoconv as the document converter on our Moodle instance. But we recently had a complaint from a teacher at our institution about a bunch of .docx files submitted to an assignment that were not being converted correctly, with a blank page showing instead. I was able to confirm that. Those files are not converted anywhere, not even on demo.moodle.net.
I tried unzipping the docx documents to find similarities between them. Maybe it is because they have jpeg files embedded? But some with png files were also problematic... ( -58272) Maybe it is the Word version... but that doesn't seem to be it...
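Comparing what is embedded in the working and the failing files can be automated instead of unzipping each one by hand. This is a rough sketch that assumes images sit in the usual `word/media/` folder; `media_extensions` is a name made up for this example.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def media_extensions(docx):
    """Count embedded media files in word/media/ by lowercase extension."""
    with zipfile.ZipFile(docx) as archive:
        names = [
            name for name in archive.namelist()
            if name.startswith("word/media/") and not name.endswith("/")
        ]
    return Counter(PurePosixPath(name).suffix.lower() for name in names)
```

Running this over both groups of files would show quickly whether the failing ones share an image type that the others lack.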
NOTE: These two sites are just for development and are very lightly loaded, which could possibly make a difference. I have not had any problems with unoconv working correctly, as long as the file types are listed in the "Supported document conversions" list under Site administration > Plugins > Document converters > Manage document converters.
Unfortunately, unoconv seems to break pretty often... and it is broken right now, so I would expect you to have no success. You can check this yourself. When you go to that site, it says you can log in as admin. Do so, and go to Site administration > Plugins > Document converters > Manage document converters. You should see that Google Drive is NOT enabled, and that there is NOTHING listed under "Supported document conversions" for Unoconv. If you click the Settings link for Unoconv, you will see a red X, which indicates that the path to the document converter is wrong.
You are right, it is broken. And not even moodle cloud can be used to check unoconv...
I was trying to find a way to check if the problem is with our unoconv instance or if there is some problem with those .docx files in particular. Because I get trouble submitting only a few of the files I consider the problem being on the files and not on our unoconv instance configuration...
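One way to narrow this down without unoconv is a basic structural check of the files themselves. The sketch below only tests a few things (valid zip, two parts that a Word document is expected to contain, well-formed main XML), so a file that passes can still fail in the converter. The function name `check_docx` and the choice of required parts are assumptions for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Parts assumed to be present in any Word-produced docx file.
REQUIRED_PARTS = ("[Content_Types].xml", "word/document.xml")


def check_docx(docx):
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        archive = zipfile.ZipFile(docx)
    except zipfile.BadZipFile:
        return ["not a valid zip archive"]
    problems = []
    with archive:
        # testzip() returns the first member with a bad checksum, or None.
        bad_member = archive.testzip()
        if bad_member is not None:
            problems.append(f"corrupt member: {bad_member}")
        names = set(archive.namelist())
        for part in REQUIRED_PARTS:
            if part not in names:
                problems.append(f"missing part: {part}")
        if "word/document.xml" in names:
            try:
                ET.fromstring(archive.read("word/document.xml"))
            except (ET.ParseError, zipfile.BadZipFile) as exc:
                problems.append(f"word/document.xml is not readable XML: {exc}")
    return problems
```

If the failing files pass these checks just like the working ones, the difference is more likely in their content (or in the converter) than in a damaged container.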
You told me you were able to submit the files I attached, right? But did you open and save them before trying to submit them? I mean, did you change anything in the files before submitting?
I would really like for EN to be able to export directly to a Word file... I would love to use EN for structuring and writing my books, but I need to be able to convert it to an rtf or plain text or docx for sending to publishers...
I agree. Evernote should have a simple "export to docx." There's no excuse for lacking this. When I export to html, most of the time Word cannot open it. Says the document must be corrupted. C'mon guys and gals, this is a much-needed addition to Evernote.
Definitely need a simple "Export to docx" feature for notes. I'm using Evernote to write a book, and it's excellent for organizing chapters and main points within the chapters. Ultimately, however, I need to put that all into a Word format. Someone suggested saving as HTML and that Word could read that, but as with a previous poster in this thread, Word couldn't open it. Are we stuck with copying and pasting each individual note?
I suggested that, and I've had pretty good luck the few times I've done it; just tried my weekly work journal just now -- it's a big table, with an image in the header -- and it did fine. Evernote for Windows can also print to PDF, and Word can read those as well. If and until Evernote ever provides an export to .docx, those are the options.