Re: PDF to Word Converter


Adrian Rocher


Jul 14, 2024, 1:21:56 AM

to raiketcasi



Here is a modification of a program that worked for me. It uses Word 2007 with the Save As PDF add-in installed. It searches a directory for .doc files, opens each one in Word, and saves it as a PDF. Note that you'll need to add a reference to Microsoft.Office.Interop.Word to the solution.
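The program itself is not reproduced in this post. As a rough illustration of the same steps (find the .doc files, open each in Word, save as PDF), here is a minimal Python sketch. It is not the original C# program: it assumes Windows, an installed copy of Word that can save as PDF, and the third-party pywin32 package. The function names `plan_conversions` and `convert_with_word` are made up for this example.

```python
from pathlib import Path

WD_FORMAT_PDF = 17  # value of Word's wdFormatPDF save-format constant


def plan_conversions(folder):
    """Return (source .doc, target .pdf) path pairs for every .doc file in folder."""
    folder = Path(folder)
    return [(doc, doc.with_suffix(".pdf")) for doc in sorted(folder.glob("*.doc"))]


def convert_with_word(folder):
    """Open each .doc in Word and save it as PDF (Windows + Word + pywin32 only)."""
    import win32com.client  # assumption: pywin32 is installed

    word = win32com.client.Dispatch("Word.Application")
    try:
        for source, target in plan_conversions(folder):
            # Word expects absolute paths when driven through COM.
            document = word.Documents.Open(str(source.resolve()))
            try:
                document.SaveAs(str(target.resolve()), FileFormat=WD_FORMAT_PDF)
            finally:
                document.Close(False)  # close without saving changes to the .doc
    finally:
        word.Quit()
```

Keeping the file discovery in `plan_conversions` separate from the Word automation makes it possible to check which files would be converted without starting Word at all.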

I went through the Word to PDF pain when someone dumped 10,000 Word files on me to convert to PDF. I did it in C# with Word interop, but it was slow and crashed if I tried to use the PC at all. Very frustrating.

This led me to discover that I could drop the interops and their slowness. For Excel I use EPPlus, and then I found a free tool called Spire that can convert to PDF, with limitations!

Also, since Office 2007 has publish-to-PDF functionality, I guess you could use Office automation to open the *.DOC file in Word 2007 and save it as PDF. I'm not too keen on Office automation, as it's slow and prone to hanging, but I'm just throwing that out there...

The Microsoft PDF add-in for Word seems to be the best solution for now, but keep in mind that it does not convert all Word documents to PDF correctly, and in some cases you will see a huge difference between the Word document and the output PDF. Unfortunately, I couldn't find any API that converts all Word documents correctly.

The only way I found to ensure the conversion was 100% correct was to convert the documents through a printer driver. The downside is that documents are queued and converted one by one, but you can be sure the resulting PDF has exactly the same layout as the Word document. I personally preferred UDC (Universal Document Converter). I also installed Foxit Reader (free version) on the server, then printed the documents by starting a "Process" with its Verb property set to "print". You can also use FileSystemWatcher to signal when the conversion has completed.
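Because the printer driver converts documents asynchronously, the caller needs some way to know when the output PDF is ready. FileSystemWatcher is a .NET class; the Python sketch below is a simple polling stand-in for the same idea, not that API. The function name `wait_for_file` and the default timings are assumptions made for this example.

```python
import os
import time


def wait_for_file(path, timeout=60.0, poll_interval=0.5):
    """Return True once path exists, is non-empty, and its size has stopped changing.

    Returns False if that does not happen before the timeout expires.
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        if os.path.exists(path):
            size = os.path.getsize(path)
            # Two consecutive polls with the same non-zero size suggest
            # that the writer has finished.
            if size > 0 and size == last_size:
                return True
            last_size = size
        time.sleep(poll_interval)
    return False
```

A stable file size is only a heuristic. A converter that pauses mid-write for longer than `poll_interval` could fool it, so a longer interval is safer for large documents.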

I know this can be done using Microsoft.Office.Interop.Word, but my application is .NET Core and does not have access to Office interop. It could be running on Azure, but it could also be running in a Docker container on anything else.

Bad news: at the moment there isn't much choice in PDF generation libraries for .NET Core. Since it doesn't look like you want to pay for one, and you can't legally use a third-party service, we have little choice except to roll our own.

The main problem is getting the Word document content transformed to PDF. One popular way is to convert the Docx to HTML and export that to PDF. It was hard to find, but there is a .NET Core version of the OpenXMLSDK-PowerTools that supports transforming Docx to HTML. The pull request is "about to be accepted"; you can get it from here:

Now that we can extract the document content to HTML, we need to convert it to PDF. There are a few libraries that convert HTML to PDF; for example, DinkToPdf is a cross-platform wrapper around libwkhtmltox, the WebKit-based HTML to PDF library.

If you only want to show Word .docx files in a web browser, it's better not to convert the HTML to PDF, as that will significantly increase bandwidth. You could store the HTML in a file system, in the cloud, or in a database using a VPP technology.

The next thing we need to do is pass the HTML to DinkToPdf. Download the DinkToPdf (90 MB) solution. Build the solution; it will take a while for all the packages to be restored and for the solution to compile.

The DinkToPdf library requires the libwkhtmltox.so and libwkhtmltox.dll files in the root of your project if you want to run on Linux and Windows. There's also a libwkhtmltox.dylib file for Mac if you need it.
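To make the platform-to-file mapping concrete, here is a small Python sketch that picks the native library name for the current operating system. Only the three file names come from the text above; the helper `native_library_name` is hypothetical and is not part of DinkToPdf.

```python
import platform

# File names as listed above; keys are the values platform.system() returns.
NATIVE_LIBRARIES = {
    "Linux": "libwkhtmltox.so",
    "Windows": "libwkhtmltox.dll",
    "Darwin": "libwkhtmltox.dylib",  # macOS
}


def native_library_name(system=None):
    """Return the libwkhtmltox file name expected on the given (or current) OS."""
    system = system or platform.system()
    try:
        return NATIVE_LIBRARIES[system]
    except KeyError:
        raise RuntimeError(f"no libwkhtmltox build known for {system!r}") from None
```

A deployment script could use a check like this to confirm that the right file sits in the project root before the application starts.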

P.S. I realise you wanted to convert both .doc and .docx to PDF. I'd suggest building a service yourself to convert .doc to .docx using a specific non-server Windows/Microsoft technology. The .doc format is binary and is not intended for server-side automation of Office.

The LibreOffice project is an open-source, cross-platform alternative to MS Office. We can use its capabilities to export doc and docx files to PDF. Currently, LibreOffice has no official API for .NET, so we will talk directly to the soffice binary.

It is a somewhat "hacky" solution, but I think it is the one with the fewest bugs and the lowest maintenance cost. Another advantage of this method is that you are not restricted to converting from doc and docx: you can convert from every format LibreOffice supports (e.g. odt, html, spreadsheets, and more).

I wrote a simple C# program that uses the soffice binary. This is just a proof of concept (and my first program in C#). It supports Windows out of the box, and Linux only if the LibreOffice package has been installed.
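The C# program is not included in this post. As a hedged illustration of what "talking directly to the soffice binary" can look like, the Python sketch below builds and runs a headless conversion command. The `--headless`, `--convert-to`, and `--outdir` options are LibreOffice command-line switches; the function names, the timeout, and the output check are assumptions made for this example, not the author's code.

```python
import subprocess
from pathlib import Path


def build_convert_command(soffice, source, out_dir, target_format="pdf"):
    """Return the argument list for a headless LibreOffice conversion."""
    return [
        str(soffice),
        "--headless",            # run without opening a window
        "--convert-to", target_format,
        "--outdir", str(out_dir),
        str(source),
    ]


def convert(soffice, source, out_dir, timeout=120):
    """Run the conversion and return the path of the PDF that was produced."""
    subprocess.run(build_convert_command(soffice, source, out_dir),
                   check=True, timeout=timeout)
    target = Path(out_dir) / (Path(source).stem + ".pdf")
    # Do not rely on the exit code alone; confirm the output file exists.
    if not target.exists():
        raise RuntimeError(f"LibreOffice did not produce {target}")
    return target
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.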

I don't know if this suits your use case, as you haven't specified the size of the documents you're trying to write, but if they're < 3 pages or you can manipulate them to be less than 3 pages, it will allow you to convert them into PDFs.

After struggling for some hours, I found that the copy of test.docx in the bin folder was only 1 KB. To fix this, right-click test.docx > Properties and set Copy to Output Directory to Copy always.

For converting DOCX to PDF, even with placeholders, I have created a free "Report-From-DocX-HTML-To-PDF-Converter" library for .NET Core under the MIT license, because I was so unnerved that no simple solution existed and all the commercial solutions were super expensive. You can find it here with an extensive description and an example project:

You only need the free LibreOffice. I recommend using the LibreOffice portable edition, so that it does not change anything in your server settings. Check where the file "soffice.exe" is located (the name differs on Linux), because you need that path to fill the variable "locationOfLibreOfficeSoffice".

As you can see, you can also convert from DOCX to HTML. You can also put placeholders into the Word document and then "fill" them with values. That is outside the scope of your question, but you can read about it on GitHub (README).
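Locating the binary can be automated. The Python sketch below checks a list of candidate paths first and then falls back to the system PATH. The helper `find_soffice` is hypothetical and is not part of the library described here; any candidate paths you pass in depend on where you unpacked LibreOffice.

```python
import os
import shutil


def find_soffice(candidates=()):
    """Return the first existing candidate path, else whatever PATH resolves.

    Returns None when no soffice binary can be found.
    """
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    # shutil.which also resolves soffice.exe on Windows through PATHEXT.
    return shutil.which("soffice")
```

The result could then be assigned to "locationOfLibreOfficeSoffice", with an explicit error raised when it is None so that a missing installation fails early.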

This adds to Jeremy Thompson's very helpful answer. In addition to the Word document body, I wanted the header (and footer) of the Word document converted to HTML. I didn't want to modify Open-Xml-PowerTools, so I modified Main() and ParseDOCX() from Jeremy's example and added two new functions. ParseDOCX now accepts a byte array, so the original Word Docx isn't modified.

In my case, I then convert the HTML files to images (using Net-Core-Html-To-Image, also based on wkHtmlToX). I combine the header and body images together (using Magick.NET-Q16-AnyCpu), placing the header image at the top of the body image.

Here is my implementation of Shmuel H.'s method using the LibreOffice binary on Windows; maybe it will help someone. It works pretty well. Just make sure you install LibreOffice; I used the portable version ( -versions/) and copied it to my C drive. Performance-wise it is not too bad: most of the time goes to loading LibreOffice into memory. Apparently you can have it running as a service somehow, which should speed things up, but I haven't been able to do so yet.

I have MS WORD 2000 documents which I wish to combine with jpegs into 1 pdf. The WORD documents do not convert: the error says that it is not a supported file type or the file is damaged (the latter is not true). I have read some old posts on this subject but would like to know how to fix this for my particular version of WORD. I know it is ancient, but.... I am currently using a free trial version of Adobe, and I will definitely not be purchasing it if this problem persists. Thanks for any help, Andy

The suggestion to install the 32-bit version was great - this version worked for me - thanks for the idea. I must say that actually finding out how to do this was very time consuming - fortunately someone had done this previously and I was able to follow their suggestions. Andy

Does this happen with one particular Word file or with every file that you try to convert to PDF? Please try a different Word file and check. If the Word file is stored on a shared network drive, please download it to your computer first and then try again.

Also, please try to create the PDF from the Acrobat File menu > Create > PDF from File and check. You may also go through the help page -to/create-pdf-files-word-excel-website.html and see if that works.

Thanks for the reply. All WORD files fail in the same way. I have tried converting a few to docx files, but they fail in exactly the same way. To create the pdf I use Adobe Acrobat DC (64 bit). On the home screen I click the Combine Files tab, top rightish. I drag and drop some WORD files and click the Combine button, top right of the screen. The watch symbol appears, so I know it is processing, and then I get the result shown in the attached screenshot jpeg. They are 4 Word documents.

I can also go to Tools, then Create PDF, multiple files, and select combine files. I then drag & drop 4 files, 2 Word and 2 jpeg, as it is this type of pdf with mixed file-type input that I wish to create. I then click the Combine button and get the attached screenshot 2. The 2 jpegs are OK and the 2 Word documents have failed. My trial of this product continues to run, but I am unable to use the product. Help appreciated! Andy

Ok - I will try this, but how? Should I uninstall the 64-bit version first? I was not asked if I wanted 32bit when I started the trial (presumably because I have a 64-bit system!), so how do I select 32-bit version? Andy

I am a final year graduate student and I have my thesis (about 350 pages) in Microsoft Word format. I would like to convert the document into a LaTeX "camera-ready" PDF. Is there any easy way to do this? I am very new to LaTeX.