Why won't Convert.ToPDF convert my XML document as plain text?


Aaron

Oct 14, 2014, 7:05:58 PM
Q:

I'm trying to convert a text file, containing XML, to PDF on Windows.  But I'm getting an error.  Why?

A:

Convert.ToPDF's txt-to-PDF conversion is, on Windows, performed using Office interop.  If you use the Convert add-on to convert the text file to PDF, you will need to make sure Word displays the document as desired.

In the case of one such document, Office appears to give an error when opening it:

The document sbml_xml_as_txt.txt cannot be opened because there are problems with the contents.

Details:
A text/xml declaration may occur only at the very beginning of input.

Location: Line: 1, Column: 40

Office seems to detect that the file is XML and therefore refuses to open it as plain text.

If you'd like to continue using Office to lay out the text, you would need to find some way to configure Office to open this file without complaint.

An alternative would be to give the file a new extension (".plaintext") which is associated with a program able to print text documents (such as a text editor).
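
To decide which text files need the alternate route, a pre-check along the following lines might help. This is only a sketch: the class and method names are hypothetical, the 512-character window is an arbitrary choice, and looking for "<?xml" is a guess at what triggers Office's XML handling, not a documented rule.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class XmlSniffer {
    // Assumption: only the start of the file matters for the check.
    static final int SNIFF_CHARS = 512;

    // Returns true if an XML declaration appears near the start of the content.
    static boolean looksLikeXml(String content) {
        String head = content.substring(0, Math.min(content.length(), SNIFF_CHARS));
        return head.contains("<?xml");
    }

    // Convenience wrapper that reads the file as UTF-8 (an assumption about the encoding).
    static boolean fileLooksLikeXml(Path file) throws IOException {
        return looksLikeXml(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Made-up sample strings, for illustration only.
        System.out.println(looksLikeXml("<!-- exported --> <?xml version=\"1.0\"?><sbml/>")); // true
        System.out.println(looksLikeXml("plain notes"));                                      // false
    }
}
```

Files for which the check returns true could then be renamed or converted by other means before being handed to Convert.ToPDF.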

There are, of course, many other ways to use the PDFNet SDK to add text to a PDF page.

To start with, you may want to take a look at this sample:

http://www.pdftron.com/pdfnet/samplecode/data/Text2PDF.zip

It shows how to write lines of text, starting a new line at a specific width.
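
The wrapping idea can be illustrated without PDFNet. The sketch below is not the code from that sample: it is a generic greedy word wrap, and it measures width in characters, which only corresponds to page width for a monospaced font such as Courier. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class WrapSketch {
    // Greedy wrap: start a new line when the next word would exceed maxChars.
    // A single word longer than maxChars is placed on its own line and overflows.
    static List<String> wrap(String text, int maxChars) {
        List<String> lines = new ArrayList<>();
        // Treat \r\n, \n and \r as hard line breaks; -1 keeps trailing empty lines.
        for (String paragraph : text.split("\\r?\\n|\\r", -1)) {
            StringBuilder line = new StringBuilder();
            for (String word : paragraph.split(" +")) {
                if (word.isEmpty()) {
                    continue;
                }
                if (line.length() > 0 && line.length() + 1 + word.length() > maxChars) {
                    lines.add(line.toString());
                    line.setLength(0);
                }
                if (line.length() > 0) {
                    line.append(' ');
                }
                line.append(word);
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : wrap("the quick brown fox", 10)) {
            System.out.println(line);
        }
    }
}
```

With a proportional font, the character count would have to be replaced by a measured width for each word.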

To fill text to an exact width, the following KB article might be helpful:

https://groups.google.com/d/msg/pdfnet-sdk/MdEMEkr6wxs/qv7ezDu2QSkJ
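
For a monospaced font, one simple way to make a line span an exact width is to solve for the font size. The sketch below is not taken from the linked article; it assumes Courier, whose glyphs all advance 0.6 of the font size, and uses hypothetical names.

```java
class FitWidthSketch {
    // Every Courier glyph is 600/1000 em wide.
    static final double COURIER_ADVANCE_EM = 0.6;

    // Width in points of a line set in Courier at the given font size.
    static double naturalWidth(String line, double fontSize) {
        return line.length() * COURIER_ADVANCE_EM * fontSize;
    }

    // Font size at which the line is exactly targetWidth points wide.
    static double fontSizeToFill(String line, double targetWidth) {
        if (line.isEmpty()) {
            throw new IllegalArgumentException("line must not be empty");
        }
        return targetWidth / (line.length() * COURIER_ADVANCE_EM);
    }

    public static void main(String[] args) {
        String line = "0123456789";
        double size = fontSizeToFill(line, 60.0);
        System.out.println("font size: " + size);
        System.out.println("resulting width: " + naturalWidth(line, size));
    }
}
```

For proportional fonts the width would have to be measured instead, for example with element.getTextLength() as in the text-to-PDF listing later in this thread.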

To scale and place arbitrary text on a PDF page, you might want to look at the stamper sample:

http://www.pdftron.com/pdfnet/samplecode.html#Stamper


Aaron

Oct 16, 2014, 3:11:08 PM
to pdfne...@googlegroups.com


Here are discrete steps for setting up such a file association on Windows:
  1. Rename the existing .txt file to .mytext.
  2. Associate .mytext files with Notepad, so that they open with notepad.exe.
  3. In the Windows registry, under HKEY_CLASSES_ROOT, find .mytext.  It should have a default value; on my system, this was mytext_auto_file.
  4. In the Windows registry, under HKEY_CLASSES_ROOT, find the key with that name (mytext_auto_file on my system).
  5. Under its shell key, there should be "edit" and "open" commands.  Create a print\command key with a (default) value of notepad /p %1.
  6. You should now be able to right-click on the .mytext file in a Windows Explorer window and select "Print".  (You may need to temporarily set your printer to something that will actually print, to test that this works.)
  7. Some versions of Notepad allow a "/pt" option for "printto".  (Mine does not.)  If yours does, you can add a "printto" command of "notepad /pt %1 %2".  Otherwise, you would need to set your default printer to the "PDFTron PDFNet" driver.
  8. The conversion should now work correctly for .mytext files.
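
Step 1 can be automated. The following is a minimal sketch using only the Java standard library; the class and method names are hypothetical, and it copies the file rather than renaming it so that the original .txt is left untouched.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class ExtensionSketch {
    // Returns a sibling path with the same base name and the given extension.
    static Path withExtension(Path file, String newExt) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return file.resolveSibling(base + "." + newExt);
    }

    // Copies the file next to itself under the new extension and returns the copy's path.
    static Path copyWithExtension(Path file, String newExt) throws IOException {
        Path target = withExtension(file, newExt);
        return Files.copy(file, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) {
        System.out.println(withExtension(Paths.get("input.txt"), "mytext")); // input.mytext
    }
}
```

The copy can then be passed to the converter in place of the original file.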

Aaron

Oct 17, 2014, 6:35:13 PM
to pdfne...@googlegroups.com
An alternative solution is to run the following code to convert text to PDF:

//---------------------------------------------------------------------------------------
// Copyright (c) 2001-2014 by PDFTron Systems Inc. All Rights Reserved.
// Consult legal.txt regarding legal and license information.
//---------------------------------------------------------------------------------------

import pdftron.Common.PDFNetException;
import pdftron.PDF.*;

import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ConvertTest {

    // Returns the lowest index, at or after startIdx, at which any of the
    // search strings occurs in str, or -1 if none of them occurs.
    public static int indexOfAny(String str, String[] searchStrs, int startIdx) {
        if ((str == null) || (searchStrs == null)) {
            return -1;
        }
        int sz = searchStrs.length;

        // Strings can't have a MAX_VALUEth index.
        int ret = Integer.MAX_VALUE;

        int tmp = 0;
        for (int i = 0; i < sz; i++) {
            String search = searchStrs[i];
            if (search == null) {
                continue;
            }
            tmp = str.indexOf(search, startIdx);
            if (tmp == -1) {
                continue;
            }

            if (tmp < ret) {
                ret = tmp;
            }
        }

        return (ret == Integer.MAX_VALUE) ? -1 : ret;
    }


    // Writes the contents of a text file onto a single PDF page, wrapping at
    // the column width. Text that does not fit on the page is not continued
    // on a second page.
    public static PDFDoc textToPDF(String text_file_path)
    {
        PDFDoc doc = null;
        try
        {
            doc = new PDFDoc();
            ElementBuilder eb = new ElementBuilder();
            ElementWriter writer = new ElementWriter();
            Element element;

            // Start a new page ------------------------------------
            Page page = doc.pageCreate();
            writer.begin(page);    // begin writing to this page

            // Begin writing a block of text
            element = eb.createTextBegin(Font.create(doc, Font.e_courier), 8);

            // Position the text on the page...
            double col_start_x = 60;
            double col_start_y = page.getPageHeight() - 20;
            element.setTextMatrix(1, 0, 0, 1, col_start_x, col_start_y);
            element.getGState().setLeading(15);    // Set the spacing between lines
            writer.writeElement(element);

            // Read the whole file using the platform default charset
            String para = new String(Files.readAllBytes(Paths.get(text_file_path)), Charset.defaultCharset());

            int para_end = para.length();
            int text_run = 0;
            int text_run_end;

            // Set text column width
            double para_width = page.getPageWidth() - col_start_x - 20;

            // Draw the text on the page...
            double cur_width = 0;
            String[] line_break = {" ", "\r", "\n"};

            while (text_run < para_end)
            {
                // Find the end of the next word or line
                text_run_end = indexOfAny(para, line_break, text_run);
                if (text_run_end < 0) text_run_end = para_end - 1;

                boolean new_line = false;
                if (para.charAt(text_run_end) == '\r' || para.charAt(text_run_end) == '\n')
                {   // If new line character ...
                    new_line = true;
                }

                // Keep a trailing space in the run, but drop a trailing line break
                int num_chars = text_run_end - text_run;
                String text = para.substring(text_run, text_run + (new_line ? num_chars : num_chars + 1));
                element = eb.createTextRun(text);
                if (cur_width + element.getTextLength() < para_width)
                {
                    writer.writeElement(element);
                    cur_width += element.getTextLength();
                }
                else
                {
                    writer.writeElement(eb.createTextNewLine());  // New line
                    // Re-create the run after the line break, again without
                    // any trailing line break character
                    element = eb.createTextRun(text);
                    cur_width = element.getTextLength();
                    writer.writeElement(element);
                }

                if (new_line)
                {
                    writer.writeElement(eb.createTextNewLine());  // New line
                    cur_width = 0;  // the next run starts at the left edge of the column
                    if (para.charAt(text_run_end) == '\r' && text_run_end + 1 < para_end && para.charAt(text_run_end + 1) == '\n')
                    {   // treat carriage return / linefeed pair as a single new line character.
                        ++text_run_end;
                    }
                }

                text_run = text_run_end + 1;
            }

            // Finish the block of text
            writer.writeElement(eb.createTextEnd());

            writer.end();  // save changes to the current page
            doc.pagePushBack(page);
        }
        catch (IOException e)
        {
            System.out.print("Unable to convert text file '");
            System.out.print(text_file_path);
            System.out.print("' to PDF, IOException:");
            System.out.println(e);
        }
        catch (PDFNetException e)
        {
            System.out.print("Unable to convert text file '");
            System.out.print(text_file_path);
            System.out.print("' to PDF, PDFNetException:");
            System.out.println(e);
        }

        return doc;
    }

    public static void main(String[] args)
    {
        PDFNet.initialize();

        // Relative path to the folder containing test files.
        String input_path =  "../../TestFiles/";
        String output_path = "../../TestFiles/Output/";
        String outputFile;

        // Convert a TXT document to PDF
        try
        {
            System.out.println("Converting TXT to PDF");
            PDFDoc doc = textToPDF(input_path + "input.txt");
            outputFile = output_path + "output.pdf";
            doc.save(outputFile, 0, null);
            System.out.println("Result saved in " + outputFile);
        }
        catch (PDFNetException e)
        {
            System.out.println("Unable to convert TXT document to PDF, error:");
            System.out.println(e);
        }

        System.out.println("Done.");
        PDFNet.terminate();
    }
}



Jörg B.

Nov 11, 2014, 6:46:38 PM
to pdfne...@googlegroups.com
Aaron,

I am currently revisiting this topic in one of our applications, too: converting plaintext files to PDF (we already use the PDFNet SDK elsewhere in the application). I have a question about the sample code: from reading it, it seems that only one page is ever created. Is that correct?

How would this work with large plaintext files, in terms of both the number of lines and the line length? For example, a one-row text file with no newlines, just one (very) long string; a file with many, many rows (hundreds or thousands of potential pages); or a combination of both.

While I can conveniently read files line by line (e.g. via the StreamReader.ReadLine() method), how would that translate into properly aligned text on the PDF page? And if the current line overflows the current page, how would I flow the text onto the next page?

Thanks,
-Jörg

Aaron

Nov 12, 2014, 8:26:54 PM
to pdfne...@googlegroups.com
Hello Jörg,

You are correct that this sample is quite limited in its layout functionality.  If you're interested, we're readying an alpha release of improved text layout functionality in PDFNet, and we have a demo of better text-to-PDF functionality we could send you.  Please send an email to support at pdftron.com if you're interested.  Thanks!
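
In the meantime, one way to extend the sample to several pages is to wrap the text into lines first and then split those lines into page-sized chunks. The sketch below covers only that bookkeeping, in plain Java with hypothetical names. It assumes a 792-point page height and a 20-point bottom margin, neither of which comes from the sample; the 20-point top offset and 15-point leading match the listing above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PaginateSketch {
    // Number of baselines that fit when the first one sits topMargin below the
    // top edge and the last one must stay at or above bottomMargin.
    static int linesPerPage(double pageHeight, double topMargin, double bottomMargin, double leading) {
        double usable = pageHeight - topMargin - bottomMargin;
        if (usable < 0 || leading <= 0) {
            return 1;
        }
        return (int) Math.floor(usable / leading) + 1;
    }

    // Splits the lines into consecutive chunks of at most perPage lines.
    static List<List<String>> paginate(List<String> lines, int perPage) {
        if (perPage < 1) {
            throw new IllegalArgumentException("perPage must be at least 1");
        }
        List<List<String>> pages = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += perPage) {
            pages.add(new ArrayList<>(lines.subList(i, Math.min(i + perPage, lines.size()))));
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(linesPerPage(792, 20, 20, 15)); // 51
        List<String> lines = Arrays.asList("one", "two", "three", "four", "five");
        System.out.println(paginate(lines, 2).size());     // 3
    }
}
```

Each chunk would then get its own doc.pageCreate(), writer.begin(page), text block, writer.end() and doc.pagePushBack(page) cycle, as in the single-page listing.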

Support

Nov 14, 2014, 1:53:36 AM
to pdfne...@googlegroups.com
Hello Jörg,

We will shortly provide you with a preview version of the new API that will allow for quick and simple text layout, but in the meantime, I wanted to point out a couple of alternative ways to accomplish the same task with the current API:

Option A) Use pdftron.PDF.HTML2PDF (perhaps by surrounding bits of text in HTML string tags)

   https://www.pdftron.com/pdfnet/samplecode.html#HTML2PDF

Option B) Load the text with the help of the .NET Flow API, then serialize the content to PDF. This is shown in the Xaml2Pdf sample:

   https://www.pdftron.com/pdfnet/samplecode.html#Xaml2Pdf
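
For Option A, the text has to be escaped before it is wrapped in HTML, especially when it contains XML as in the original question. The sketch below shows only that string handling in plain Java. The class and method names are hypothetical, the choice of a <pre> element is an assumption made to preserve line breaks and spacing, and the call into HTML2PDF is left out.

```java
class HtmlWrapSketch {
    // Escapes the characters that would otherwise be parsed as markup.
    static String escapeHtml(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wraps the escaped text in a minimal HTML document.
    static String toHtmlDocument(String text) {
        return "<html><body><pre>" + escapeHtml(text) + "</pre></body></html>";
    }

    public static void main(String[] args) {
        System.out.println(toHtmlDocument("<?xml version=\"1.0\"?>"));
    }
}
```

The resulting string could then be handed to HTML2PDF as described in the linked sample.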
