AcroForm

Jarosław Bober

Jan 21, 2013, 2:37:31 PM

to pdfhummus-in...@googlegroups.com

Hello, first of all thank you for sharing this library. Great work.
I'd like to use this library to dynamically fill forms from a database or other data sources.
Could you help me get started? What do I need to do to accomplish this? How do I find the AcroForm on a page and then get its data?

I understand that to modify an element in a document, what really happens is that everything except that element is copied to a new PDF document, and then a new element with the modified content is added.

P.S. Working in Visual Studio, I had a hard time getting it to compile. I had to set the header paths manually and copy some libraries, and something that did not compile (bugs/AmitsTest.cpp) had to be removed. Sorry for complaining, but it seems like it could be easier. :)

Regards,
Jarosław

Gal Kahana

Jan 22, 2013, 3:12:07 AM

to pdfhummus-in...@googlegroups.com

Hi Jaroslaw,

As for AcroForm: since there's no high-level support for this, you need to use the more basic building blocks. You need to parse it from the Catalog: get the /AcroForm entry in the catalog object, then get to the list of fields through its /Fields entry. Each field has a /V entry which contains the value. I'm not too familiar with the details (this is just from basic reading of the PDF reference), but we can probably dig further to find answers to particular questions.
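
The lookup path described above (Catalog, then /AcroForm, then /Fields, then each field's /V) can be sketched without the library. This is a minimal standalone C++ sketch over a toy in-memory model. ToyDict, ToyDocument, findFieldValue and makeSampleDocument are hypothetical names invented for illustration and are not part of PDFHummus. It also assumes a flat field list, while real forms may nest fields under /Kids.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed PDF: object ID -> dictionary.
// This is NOT the PDFHummus object model, only an illustration of the lookup path.
struct ToyDict {
    std::map<std::string, std::string> strings;         // e.g. "T" -> "text_1"
    std::map<std::string, int> refs;                    // e.g. "AcroForm" -> object ID
    std::map<std::string, std::vector<int> > refArrays; // e.g. "Fields" -> object IDs
};
typedef std::map<int, ToyDict> ToyDocument;

// Walk Catalog -> /AcroForm -> /Fields and fetch the /V of the field whose /T matches.
// Returns false if any step of the path is missing.
bool findFieldValue(const ToyDocument& doc, int catalogId,
                    const std::string& fieldName, std::string& outValue) {
    ToyDocument::const_iterator catalog = doc.find(catalogId);
    if (catalog == doc.end()) return false;
    std::map<std::string, int>::const_iterator acroRef = catalog->second.refs.find("AcroForm");
    if (acroRef == catalog->second.refs.end()) return false;  // no form in this file
    ToyDocument::const_iterator acroForm = doc.find(acroRef->second);
    if (acroForm == doc.end()) return false;
    std::map<std::string, std::vector<int> >::const_iterator fields =
        acroForm->second.refArrays.find("Fields");
    if (fields == acroForm->second.refArrays.end()) return false;

    for (std::size_t i = 0; i < fields->second.size(); ++i) {
        ToyDocument::const_iterator field = doc.find(fields->second[i]);
        if (field == doc.end()) continue;
        std::map<std::string, std::string>::const_iterator name = field->second.strings.find("T");
        if (name == field->second.strings.end() || name->second != fieldName) continue;
        std::map<std::string, std::string>::const_iterator value = field->second.strings.find("V");
        if (value == field->second.strings.end()) return false;  // field has no value yet
        outValue = value->second;
        return true;
    }
    return false;  // no field with that name
}

// Small sample document: object 1 is the catalog, 2 the AcroForm, 3 and 4 the fields.
ToyDocument makeSampleDocument() {
    ToyDocument doc;
    doc[1].refs["AcroForm"] = 2;
    doc[2].refArrays["Fields"] = {3, 4};
    doc[3].strings["T"] = "text_0";
    doc[3].strings["V"] = "first";
    doc[4].strings["T"] = "text_1";
    doc[4].strings["V"] = "hello";
    return doc;
}
```

Calling findFieldValue(makeSampleDocument(), 1, "text_1", value) fills value with the field's /V and returns true; an unknown name returns false instead of crashing.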

Now, you correctly state that you will need to modify the document to fill the forms. However, it is not required to copy everything to a new PDF. In the latest version I introduced the ability to modify an existing document. You can read about it here: https://github.com/galkahana/PDF-Writer/wiki/Modification (in the library documentation). There's a usage example here: https://github.com/galkahana/PDF-Writer/blob/master/PDFWriterTestPlayground/ModifyingExistingFileContent.cpp

(You should also have it in the materials that you downloaded, in the PDFWriterTestPlayground project.) Then you just need to recreate either the value or the field dictionary, changing only that entry, and use the new object instead. This is similar to how I'm adding comments in the sample, and therefore changing the page object.
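
The idea of recreating a dictionary with a single entry changed can be sketched independently of the library. This is a minimal standalone C++ sketch in which a dictionary is modeled as an ordered list of key/value pairs whose values are already serialized PDF tokens. Entry, Entries, withEntryReplaced and serialize are hypothetical names for illustration only, not PDFHummus API.

```cpp
#include <string>
#include <utility>
#include <vector>

// One dictionary entry: key name (without the slash) and its already-serialized value.
typedef std::pair<std::string, std::string> Entry;
typedef std::vector<Entry> Entries;

// Copy every entry except `key`, then append `key` with the new value.
// Writing the override after the loop means it is emitted even when the
// original dictionary had no such key.
Entries withEntryReplaced(const Entries& original, const std::string& key,
                          const std::string& newValue) {
    Entries result;
    for (std::size_t i = 0; i < original.size(); ++i) {
        if (original[i].first != key) result.push_back(original[i]);
    }
    result.push_back(Entry(key, newValue));
    return result;
}

// Serialize the entries in PDF dictionary syntax.
std::string serialize(const Entries& entries) {
    std::string out = "<<";
    for (std::size_t i = 0; i < entries.size(); ++i) {
        out += " /" + entries[i].first + " " + entries[i].second;
    }
    out += " >>";
    return out;
}
```

For a field holding /FT /Tx, /T (text_1) and /V (old), replacing V with (modifiedValue) keeps the other entries untouched and moves V to the end. Appending the new entry after the copy loop, rather than replacing it inside the loop, also covers fields that had no /V to begin with.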

i'll look into the issues with projects. sorry for that.

Gal.

Jarosław Bober

Jan 29, 2013, 4:57:03 PM

to pdfhummus-in...@googlegroups.com

Hello,
I'm still working on editing AcroForms. My problem is that the content of a form field shows up only while the field is active. If I click anywhere else in the document, the field's value disappears. I suspect that I somehow broke the structure of the PDF.
Let me show you what I have tried so far.
I have added a new method to the parser, which returns the ID of the form field whose "T" key equals fieldName:
ObjectIDType PDFParser::FindAcroFormID(std::string fieldName)

int main()
{
    std::string testFileName = "text_fields.pdf";
    PDFWriter pdfWriter;
    EStatusCode status = pdfWriter.ModifyPDF(testFileName, ePDFVersion16, std::string("mod") + testFileName);
    if(status != eSuccess)
        return 1;

    PDFParser* inParser = &pdfWriter.GetModifiedFileParser();
    PDFDocumentCopyingContext* copyingContext = pdfWriter.CreatePDFCopyingContextForModifiedFile();

    // Get the object ID of the form field labeled "text_1"
    ObjectIDType modifiedAcroFormID = inParser->FindAcroFormID("text_1");

    PDFObjectCastPtr<PDFDictionary> AcroFormFieldObject(inParser->ParseNewObject(modifiedAcroFormID));
    MapIterator<PDFNameToPDFObjectMap> objectContentIterator = AcroFormFieldObject->GetIterator();

    // Rewrite the field dictionary under the same object ID
    pdfWriter.GetObjectsContext().StartModifiedIndirectObject(modifiedAcroFormID);
    DictionaryContext* modifiedPageObject = pdfWriter.GetObjectsContext().StartDictionary();

    // Copy every entry except V as-is
    while(objectContentIterator.MoveNext())
    {
        if(objectContentIterator.GetKey()->GetValue() != "V")
        {
            modifiedPageObject->WriteKey(objectContentIterator.GetKey()->GetValue());
            copyingContext->CopyDirectObjectAsIs(objectContentIterator.GetValue());
        }
    }

    // Write the new value
    modifiedPageObject->WriteKey("V");
    modifiedPageObject->WriteLiteralStringValue("modifiedValue");
    pdfWriter.GetObjectsContext().EndDictionary(modifiedPageObject);
    pdfWriter.GetObjectsContext().EndIndirectObject();
    pdfWriter.EndPDF();
    return 0;
}

Can you tell if there is something wrong with the code above? Or how do you debug a document to find out what's wrong?

Regards,
Jaroslaw

Gal Kahana

Jan 30, 2013, 1:35:15 AM

to pdfhummus-in...@googlegroups.com

Hi Jarosław,

Normally, to debug things, I create a PDF file with the feature I want, say using InDesign or Acrobat, and then imitate it.

In your case, take the original form file, open it in Acrobat, fill the form, save it as another file, and see what changed (try an incremental save if you can, so it will be in the style of a modified PDF).

Judging by reading the specs now, it sounds like you are doing one thing right (and the code is perfect), but there might be another thing to do, and that is to update the appearance stream.

Check out page 677 in the PDF reference (the latest), about "Variable Text", and also page 691, "Text fields". It seems that there's something else to do, and there are instructions there. It's something to try. I haven't done this myself, so I don't know if it will resolve things, but now that I've read the specs I'm fairly sure that this part is missing from the solution.

Regardless, I think it's good if you first see how an existing application does this.

Regards,

Gal.

Jarosław Bober

Feb 1, 2013, 8:27:52 AM

to pdfhummus-in...@googlegroups.com

Hello,
Thanks for help.

You are right, fields have an "AP" key, which holds the appearance stream.

So I tried this:
I wanted to construct a new object with a stream which defines the appearance, and then modify the "AP" key to point at this new object.
Is this the correct approach? What do you think?
I get mixed results. In X-Change Viewer it works fine :), but in Acrobat and Foxit the field still needs to be active to show the value.
Also, in the stream, after EMC, I can see some strange characters.
This is the code:

ObjectIDType modifiedAcroFormID = inParser->FindAcroFormID("text_1");

ObjectIDType appearanceObject = pdfWriter.GetObjectsContext().StartNewIndirectObject();
pdfWriter.GetObjectsContext().EndIndirectObject();

PDFStream* pdfStream = pdfWriter.GetObjectsContext().StartPDFStream();
pdfWriter.GetObjectsContext().WriteKeyword("/Tx BMC");
pdfWriter.GetObjectsContext().WriteKeyword("q");
pdfWriter.GetObjectsContext().WriteKeyword("BT");
//DA (/Helv 0 Tf 0 g)
pdfWriter.GetObjectsContext().WriteKeyword("DA (/Helv 0 Tf 0 g)");
pdfWriter.GetObjectsContext().WriteKeyword("/Helv 0 Tf 0 g Tf");
pdfWriter.GetObjectsContext().WriteKeyword("2 3 Td");
pdfWriter.GetObjectsContext().WriteKeyword("(asdfasdf)Tj");
pdfWriter.GetObjectsContext().WriteKeyword("ET");
pdfWriter.GetObjectsContext().WriteKeyword("Q");
pdfWriter.GetObjectsContext().WriteKeyword("EMC");
pdfWriter.GetObjectsContext().EndPDFStream(pdfStream);

PDFObjectCastPtr<PDFDictionary> AcroFormFieldObject(inParser->ParseNewObject(modifiedAcroFormID));
MapIterator<PDFNameToPDFObjectMap> objectContentIterator = AcroFormFieldObject->GetIterator();
pdfWriter.GetObjectsContext().StartModifiedIndirectObject(modifiedAcroFormID);
DictionaryContext* modifiedPageObject = pdfWriter.GetObjectsContext().StartDictionary();

while(objectContentIterator.MoveNext())
{
    if(objectContentIterator.GetKey()->GetValue() == "V")
    {
        modifiedPageObject->WriteKey("V");
        modifiedPageObject->WriteLiteralStringValue("notworking");
    }

    if(objectContentIterator.GetKey()->GetValue() == "AP")
    {
        modifiedPageObject->WriteKey("AP");
        DictionaryContext* APDictionary = pdfWriter.GetObjectsContext().StartDictionary();
        APDictionary->WriteKey("N");
        APDictionary->WriteNewObjectReferenceValue(appearanceObject);
        pdfWriter.GetObjectsContext().EndDictionary(APDictionary);
    }

    //just copy other keys
    if(objectContentIterator.GetKey()->GetValue() != "V" && objectContentIterator.GetKey()->GetValue() != "AP")
    {
        modifiedPageObject->WriteKey(objectContentIterator.GetKey()->GetValue());
        copyingContext->CopyDirectObjectAsIs(objectContentIterator.GetValue());
    }
}

Regards,
Jaroslaw

Gal Kahana

Feb 1, 2013, 11:22:47 AM

to pdfhummus-in...@googlegroups.com

I'm reading the specs (again... I've never done this). This seems mostly right, apart from a couple of things which are probably critical:

1. In the stream itself you need to remove these lines:

//DA (/Helv 0 Tf 0 g)
pdfWriter.GetObjectsContext().WriteKeyword("DA (/Helv 0 Tf 0 g)");

You wrote the content of the DA itself, and that's good enough (you should copy it from the text field, by the way, but I guess just for testing it's OK to write it directly).

2. This shouldn't be just a stream, but rather a form XObject. So create one, and its stream should be what you wrote. The XObject should also have a resources dictionary, copied from the DR entry of the form (where you got the fields from). Make sure the name used for setting the font (Tf) is the same as one of the fonts in the resources dictionary (it will be, because it's coming from the DA, which is probably synced).

One more thing. I'm not sure it's critical (you can see if the stream comes out right), but you should be using WriteKeyword just for writing keywords. If you want to write other things, use other methods. But if it looks OK then just leave it.

When done, it'll probably be good to write some more sophisticated code that modifies the AP entry in case the original is more sophisticated.
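
The stream content discussed above can be assembled as a plain string before it is handed to any PDF library. This is a minimal standalone C++ sketch, assuming an ASCII value and a default appearance (DA) string already read from the field. escapeLiteral and buildTextFieldAppearance are hypothetical helper names, not PDFHummus API, and the sketch covers only the stream content, not the surrounding form XObject dictionary or its resources.

```cpp
#include <string>

// Escape the characters that are special inside a PDF literal string: \ ( )
std::string escapeLiteral(const std::string& text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (c == '\\' || c == '(' || c == ')') out += '\\';
        out += c;
    }
    return out;
}

// Build the content of a text-field appearance stream.
// `da` is the field's default appearance string, e.g. "/Helv 12 Tf 0 g".
// Note: a font size of 0 in DA means "auto-size", so a concrete size is
// assumed to have been substituted before calling this.
std::string buildTextFieldAppearance(const std::string& da, const std::string& value,
                                     int x, int y) {
    std::string content;
    content += "/Tx BMC\n";                                            // begin marked content
    content += "q\n";                                                  // save graphics state
    content += "BT\n";                                                 // begin text object
    content += da + "\n";                                              // font, size and color from DA
    content += std::to_string(x) + " " + std::to_string(y) + " Td\n";  // text position
    content += "(" + escapeLiteral(value) + ") Tj\n";                  // show the value
    content += "ET\n";                                                 // end text object
    content += "Q\n";                                                  // restore graphics state
    content += "EMC\n";                                                // end marked content
    return content;
}
```

Building the string first makes it easy to compare against what Acrobat writes for the same field, which is the debugging approach suggested earlier in the thread.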

good luck,

Gal.

Jarosław Bober

Feb 4, 2013, 12:22:48 PM

to pdfhummus-in...@googlegroups.com

Hello Gal,
Thanks for your help regarding AcroForms.
I managed to make it work, somehow ;)
Like you said, I looked at how Acrobat does this and imitated it, and finally it clicked.

I still have one question :) When I try to modify a field with some characters from my native language, which is Polish, I get question marks.
These are the characters "ąęśćżźńół" :)
How do you write such characters to a PDF document?
I need to support other languages as well.

Regards.

Jaroslaw


Gal Kahana

Feb 4, 2013, 1:56:25 PM

to pdfhummus-in...@googlegroups.com

Hi, great to hear that you succeeded. Regarding Polish (and other non-ANSI text): you should be passing the strings to Tj encoded in UTF-8. In fact, all strings should be encoded in UTF-8. You can use the UnicodeString class if you need a helper.

Jarosław Bober

Feb 8, 2013, 7:35:10 AM

to pdfhummus-in...@googlegroups.com

Hello once again :)
I'm having trouble getting Polish characters to show up.
The strings are encoded in UTF-8.
Going under the hood, I noticed that the 'Ą' character has a glyph index of 34:
glyphIndex = FT_Get_Char_Index(mFace,*it);
So the font does have a glyph for it.
However, in the PDF document I get only '?'.
Do you know why that is?

Regards,
Jaroslaw

P.S. Thank you for mentioning my name on your main site :)

Gal Kahana

Feb 8, 2013, 8:45:03 AM

to pdfhummus-in...@googlegroups.com

It might mean that this is what the font has assigned as glyph. Can you send me the font name so i can try it out?

Gal.

Jarosław Bober

Feb 8, 2013, 8:49:44 AM

to pdfhummus-in...@googlegroups.com

I used arial.ttf

Gal Kahana

Feb 9, 2013, 5:44:36 AM

to pdfhummus-in...@googlegroups.com

Hi Jaroslaw,

I tested on both my Mac and my PC, and it seems to work fine. I cut and pasted the letter into a file, saved it with UTF-8 encoding (no BOM), loaded the file, wrote the string... perfect.

Any chance of getting a code sample (preferably in an ITestUnit test class, but that's not critical, just something from start to end) and the font you are using? I'll see if it works here, and try to figure this out.


Jarosław Bober

Feb 9, 2013, 7:08:55 AM

to pdfhummus-in...@googlegroups.com

Aurę i'kl push it to git in the evening or tomorrow morning.

Jarosław Bober

Feb 10, 2013, 8:45:59 AM

to pdfhummus-in...@googlegroups.com

Sorry, my Android device knows better what I want to say;)

I pushed my code to master branch:
https://github.com/gerronimo/PDF-Writer.git
Including the document that I want to modify.
Sorry for the mess ;) I tried to make it more clean. I promise to refactor this ;)
As for the font file. I used the one that you provided in testMaterials/fonts.

Regards.
Jaroslaw

Gal Kahana

Feb 11, 2013, 3:46:19 PM

to pdfhummus-in...@googlegroups.com

OK. a couple of things that i see:

1. You can't place literal text that is not ANSI in code. That does not make it Unicode or UTF-8; it is simply ANSI. This means that my library won't recognize it as your intended Unicode values. To make this text Unicode, you must use the actual Unicode character values. When I'm testing these things, normally what I do is write the text in an external file, make sure it's encoded the way I want it to be (this time UTF-8), and load it into a string. If you want to write the strings from code, you should use their Unicode values or their UTF-8 encoding instead.

So this: EStatusCode encodingStatus = xobjectContentContext->Tj("Ą");

is wrong.

Check the attached files, which have my example of how it should be done (well, from a file).

If you want to write it literally, you need to use the UTF-8 encoding. Something like this:

(hint: this character's encoding in UTF-8 is C4 84)

string aString;
aString.push_back(0xc4);
aString.push_back(0x84);
EStatusCode encodingStatus = xobjectContentContext->Tj(aString);

2. I noticed that for writing V, your code (though not the sample) uses:

modifiedPageObject->WriteLiteralStringValue("asdf");

For anything non-Latin this probably won't do. What you need to do is go through the PDFTextString object, so that the UTF-8 encoded string (which, yes, you should provide) will go through the PDF special encoding for Unicode. So, using the previous example, this is what should be done:

string aString;
aString.push_back(0xc4);
aString.push_back(0x84);

PDFTextString pdfTextString;
pdfTextString.FromUTF8(aString);

form->WriteKey("V");
form->WriteLiteralStringValue(pdfTextString.ToString());

3. Once you get past this problem, if we want truly generic code, we must somehow refer to the DA string and the original resources... but one thing at a time.
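
The "PDF special encoding for Unicode" mentioned in point 2 is, per the PDF reference, UTF-16BE prefixed with the byte order mark FE FF. This is a minimal standalone C++ sketch of that conversion starting from UTF-8. decodeUtf8, appendUnitBE and toUtf16BEWithBom are hypothetical helper names, this is not the implementation of PDFTextString, and the decoder assumes well-formed input.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 into code points. Assumes well-formed input; no validation is done.
std::u32string decodeUtf8(const std::string& utf8) {
    std::u32string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i++]);
        char32_t cp;
        int extra;  // number of continuation bytes that follow the lead byte
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        for (int k = 0; k < extra && i < utf8.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i++]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}

// Append one 16-bit unit in big-endian byte order.
void appendUnitBE(std::string& out, unsigned unit) {
    out.push_back(static_cast<char>((unit >> 8) & 0xFF));
    out.push_back(static_cast<char>(unit & 0xFF));
}

// Encode code points as UTF-16BE, prefixed with the FE FF byte order mark
// that marks a PDF text string as Unicode.
std::string toUtf16BEWithBom(const std::u32string& codePoints) {
    std::string out;
    appendUnitBE(out, 0xFEFF);
    for (std::size_t i = 0; i < codePoints.size(); ++i) {
        char32_t cp = codePoints[i];
        if (cp >= 0x10000) {
            // Outside the Basic Multilingual Plane: split into a surrogate pair.
            char32_t v = cp - 0x10000;
            appendUnitBE(out, 0xD800 + static_cast<unsigned>(v >> 10));
            appendUnitBE(out, 0xDC00 + static_cast<unsigned>(v & 0x3FF));
        } else {
            appendUnitBE(out, static_cast<unsigned>(cp));
        }
    }
    return out;
}
```

For the letter from the example, the UTF-8 bytes C4 84 decode to U+0104 and come out as FE FF 01 04. The resulting bytes can contain parentheses, backslashes or zero bytes, so they still need escaping when written as a literal string, or they can be written as a hex string instead.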

Regards,

Gal.

PolishTest.cpp

PolishTest.h

PolishText.txt

Jarosław Bober

Feb 12, 2013, 4:42:16 AM

to pdfhummus-in...@googlegroups.com

Hello, I knew I was doing something trivial wrong. I never really had to work with strings ;) I thought that if I set the code page in the Visual Studio IDE, those strings would be encoded in UTF-8.

OK, so your suggestions work.
However, I don't think I can read the strings from a file.
I need to experiment a bit. I'm thinking about using wstring or wchar_t*,
because I want to create a shared library that will be consumed by another piece of software, and the strings will come from there.

Regards,
Jaroslaw

Gal Kahana

Feb 12, 2013, 5:16:16 AM

to pdfhummus-in...@googlegroups.com

Yeah... it's OK to just pass the strings encoded in UTF-8. No need to read them from a file.

If you are using wstring or wchar_t, check out my UnicodeString class. It can convert from 2-byte strings (UTF-16) to UTF-8, so it should be rather trivial to work with wstring/wchar_t. Note though, in case you are going cross-platform, that wchar_t on the Mac is 4 bytes and is intended for UTF-32. If that's a consideration, let's talk further.
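
One way to sidestep the differing size of wchar_t across platforms is to use the fixed-width char16_t type for UTF-16 input. This is a minimal standalone C++11 sketch of a UTF-16 to UTF-8 conversion that combines surrogate pairs. appendUtf8 and utf16ToUtf8 are hypothetical names; this is not the library's UnicodeString class, and the input is not validated.

```cpp
#include <cstddef>
#include <string>

// Append one code point to `out` as UTF-8 (1 to 4 bytes).
void appendUtf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Convert UTF-16 to UTF-8, combining surrogate pairs into one code point.
// An unpaired surrogate is encoded as-is, which is not valid UTF-8;
// a production converter should reject it instead.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            char32_t low = in[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

Because char16_t is 16 bits wide on every platform, the same code behaves identically on Windows, Mac and Linux; a wchar_t-based caller would first have to convert according to its platform's wchar_t size.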

Jarosław Bober

Feb 12, 2013, 2:04:43 PM

to pdfhummus-in...@googlegroups.com

I'm not going cross-platform any time soon, but I might be interested in going to Linux, and there, AFAIK, wchar_t is also 4 bytes.

Jarosław Bober

Feb 13, 2013, 3:45:23 AM

to pdfhummus-in...@googlegroups.com

Hello,
after reading a bit maybe wchar_t is not a good option.
I'm thinking about using something from boost or c++11 to convert between string and its representation.

Regards,
Jaroslaw.

Jarosław Bober

Feb 14, 2013, 5:24:27 AM

to pdfhummus-in...@googlegroups.com

Hello,
This is what I have used so far:
const wchar_t* asdf = L"béęą";
std::wstring_convert<std::codecvt_utf8<wchar_t>> conv; // needs <codecvt> and <locale>

EStatusCode encodingStatus = xobjectContentContext->Tj(conv.to_bytes(asdf));

The new C++11 standard defines new types like char16_t and char32_t.
I may still need to experiment with those. However, this works nicely for me at the moment.

Regards,
Jaroslaw

Gal Kahana

Feb 14, 2013, 6:11:57 AM

to pdfhummus-in...@googlegroups.com

Thanks,

Good to know :)

BiB1

Feb 15, 2013, 9:36:28 AM

to pdfhummus-in...@googlegroups.com

Hi,

@Jaroslaw: I have made an update to your FindAcroFormID function, because if the fieldName does not exist, the app crashes.

The function now takes another parameter, an ObjectIDType (the value previously returned), and returns an EStatusCode indicating whether the field has been found.

The declaration looks like this:

PDFHummus::EStatusCode FindAcroFormID(std::string fieldName, ObjectIDType &outObjectID);

And the implementation:

EStatusCode PDFParser::FindAcroFormID(std::string fieldName, ObjectIDType &outObjectID)
{
    PDFObjectCastPtr<PDFIndirectObjectReference> catalogReference(mTrailer->QueryDirectObject("Root"));
    if(!catalogReference)
    {
        TRACE_LOG("PDFParser::FindAcroFormID, failed to read catalog reference in trailer");
        return PDFHummus::eFailure;
    }

    PDFObjectCastPtr<PDFDictionary> catalog(ParseNewObject(catalogReference->mObjectID));
    if(!catalog)
    {
        TRACE_LOG("PDFParser::FindAcroFormID, failed to read catalog");
        return PDFHummus::eFailure;
    }

    // get AcroForm, verify indirect reference
    PDFObjectCastPtr<PDFIndirectObjectReference> AcroFormReference(catalog->QueryDirectObject("AcroForm"));
    if(!AcroFormReference)
    {
        TRACE_LOG("PDFParser::FindAcroFormID, failed to read AcroForm reference in catalog");
        return PDFHummus::eFailure;
    }

    PDFObjectCastPtr<PDFDictionary> AcroForm(ParseNewObject(AcroFormReference->mObjectID));
    if(!AcroForm)
    {
        TRACE_LOG("PDFParser::FindAcroFormID, failed to read AcroForm");
        return PDFHummus::eFailure;
    }

    PDFObjectCastPtr<PDFArray> AcroFormFields(AcroForm->QueryDirectObject("Fields"));
    if(!AcroFormFields)
    {
        TRACE_LOG("PDFParser::FindAcroFormID, failed to read AcroFormFields");
        return PDFHummus::eFailure;
    }

    // stays eFailure unless a field with a matching name is found
    EStatusCode status = PDFHummus::eFailure;
    SingleValueContainerIterator<PDFObjectVector> it = AcroFormFields->GetIterator();

    while(it.MoveNext())
    {
        if(it.GetItem()->GetType() != PDFObject::ePDFObjectIndirectObjectReference)
        {
            TRACE_LOG1("PDFParser::FindAcroFormID, unexpected type for a AcroForm array object, type = %s",PDFObject::scPDFObjectTypeLabel[it.GetItem()->GetType()]);
            status = PDFHummus::eFailure;
            break;
        }

        PDFIndirectObjectReference* pdfIndobjRef = (PDFIndirectObjectReference*)it.GetItem();
        PDFObjectCastPtr<PDFDictionary> AcroFormFieldObject(ParseNewObject(pdfIndobjRef->mObjectID));
        if(!AcroFormFieldObject)
        {
            TRACE_LOG("PDFParser::FindAcroFormID, unable to parse field object from AcroFormFields reference");
            status = PDFHummus::eFailure;
            break;
        }

        // the field name is stored under the T key
        PDFObjectCastPtr<PDFLiteralString> FieldValue(AcroFormFieldObject->QueryDirectObject("T"));
        if(!FieldValue)
        {
            TRACE_LOG("PDFParser::FindAcroFormID, failed to read fieldName");
            status = PDFHummus::eFailure;
            break;
        }

        if(FieldValue->GetValue() == fieldName)
        {
            status = PDFHummus::eSuccess;
            outObjectID = pdfIndobjRef->mObjectID;
            break;
        }
    }

    return status;
}

Jarosław Bober

Feb 16, 2013, 8:31:55 AM

to pdfhummus-in...@googlegroups.com

Hello, I replied yesterday but now I couldn't find my reply.

Anyway,

Thanks for your contribution.

I didn't handle errors because I'm just prototyping this :) So it's still work in progress.

Regards,

Jaroslaw

Gal Kahana

Feb 17, 2013, 2:42:52 AM

to pdfhummus-in...@googlegroups.com

Jaroslaw, when you are done, if that is OK with you, I'd like to incorporate your solution into the next release of the library.

There are some structural changes I'd like to make to the prototype (I would like the handling of forms to be in a new class, rather than in the parser, and maybe surround it with some conveniences),

but other than that, the work is already brilliant, and obviously I would like others to enjoy it... but obviously, it's your decision.

No obligation here.

Know that I do realize now that filling forms is becoming interesting (from reviewing Stack Overflow, and several requests that I got), and I would like to add support for it.

Regards,

Gal.

Jarosław Bober

Feb 17, 2013, 5:28:53 AM

to pdfhummus-in...@googlegroups.com

I would be really happy :)

Regards,

Jaroslaw

BiB1

May 30, 2013, 4:47:31 AM

to pdfhummus-in...@googlegroups.com

Hi,

I'm back to ask whether Jaroslaw's solution has been added to the library.

And also (maybe the most important part) whether pdfhummus contains a solution for flattening an editable PDF.

I have done a lot of research without success.

Regards

Ben

Gal Kahana

May 30, 2013, 12:44:45 PM

to pdfhummus-in...@googlegroups.com

Hi :)

I'm afraid that Jaroslaw never got back to me, and I was working on other things, so I didn't quite get to it.

As for flattening an editable PDF, one could do this, again using the lower-level interfaces. If I'm not mistaken, flattening an editable PDF only means removing the interactive fields. Using code similar to the modification above, I suggest simply recreating the catalog without the AcroForm key. Does this sound reasonable?

Gal.


BiB1

May 31, 2013, 10:09:57 AM

to pdfhummus-in...@googlegroups.com

Hmm,

Well, I don't really understand how to do that, but I will try. However, if I remove the AcroForm from the catalog, all the field values will be removed as well.

I think I need to get all the fields, create a simple label from each of them, and then remove the AcroForm.

Ben


Jarosław Bober

Jun 5, 2013, 7:37:36 AM

to pdfhummus-in...@googlegroups.com

Hello guys sorry for not answering for so long, I admit that I had to drop this project. I had too many things to do. But I will certainly get back to it really soon.

Regards.
Jaroslaw
