CURRENT TAG: Span
Traversing the marked content properties dictionary
Key: MCID
Text: 5.0
CURRENT TAG: Figure
Traversing the marked content properties dictionary
Key: BBox
Key: MCID
Text: 53.0
Key: Type
CURRENT TAG: Caption
Traversing the marked content properties dictionary
Key: MCID
Text: 44.0
itr = mcProp.GetDictIterator
puts "Traversing the marked content properties dictionary"
while itr.HasNext do
key = itr.Key
puts "Key: " + key.GetName.to_s
value = itr.Value
##quick probe: if i can find the alt tag this way, can figure out a better way to
##extract them
##check the object's type first instead of calling each getter and rescuing
if value.IsNumber
puts "Text: " + value.GetNumber.to_s
elsif value.IsString
puts "Text: " + value.GetAsPDFText.to_s
elsif value.IsArray
puts "Warn: Is array"
end
itr.Next
end
elsif element.GetType == Element::E_marked_content_end
puts "MC End"
end
puts "\n"
end
element = reader.Next
end
It is seeing everything except the Alt Tags.
I have checked the PDF source. The tags are there as prescribed by Adobe, so I have no idea what I am missing.
Thanks for your help in advance.
zonker
Vitalsource (Ingram)
It looks like you are able to extract the MCID (Marked Content Identifier), so the remaining question is how to get the relevant ‘Structure Element’, which is where the alternate text (Alt) is typically stored. This is shown in the LogicalStructure sample project:
https://www.pdftron.com/pdfnet/samplecode/LogicalStructureTest.cs.html
https://www.pdftron.com/pdfnet/samplecode.html#LogicalStructure
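As a rough illustration of the idea (not a copy of the sample), the walk over structure elements could look something like the sketch below. The method names (IsValid, HasAlt, GetAlt, GetType, GetNumKids, IsContentItem, GetAsStructElem, GetStructTree, GetKid) are assumed to match the SElement/STree API used in the LogicalStructure sample, and collect_alt_text is a hypothetical helper name, so please check everything against the sample before relying on it.

```ruby
# Recursively walk a structure element and collect alternate text.
# `element` is expected to respond to the SElement-style methods used
# below (an assumption; verify the names against the LogicalStructure sample).
def collect_alt_text(element, found = [])
  return found unless element.IsValid

  # Record the element's tag type together with its Alt text, if present.
  if element.HasAlt
    found << { type: element.GetType, alt: element.GetAlt }
  end

  element.GetNumKids.times do |i|
    # Content items (MCIDs, object references) are leaves; only child
    # structure elements are descended into.
    next if element.IsContentItem(i)
    collect_alt_text(element.GetAsStructElem(i), found)
  end
  found
end

# Hypothetical usage, with PDFNet loaded and `doc` an open PDFDoc:
#   tree = doc.GetStructTree
#   if tree.IsValid
#     tree.GetNumKids.times do |i|
#       collect_alt_text(tree.GetKid(i)).each do |hit|
#         puts "#{hit[:type]}: #{hit[:alt]}"
#       end
#     end
#   end
```

The helper only relies on duck typing, so it can be tried out with stand-in objects before wiring it up to a real document. The sample also shows how content items under each structure element expose their MCID, which is how the structure element can be tied back to the marked content you are already reading.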