Speed-up patch

42 views
Skip to first unread message

jcwojdel

unread,
Apr 20, 2012, 5:14:12 AM4/20/12
to FoX-discuss
Hello everyone,
My name is Jacek Wojdeł, I am actively using FoX for my research at
the moment. I stumbled upon an annoying deficiency of the code: it
does not like large tokens. That is, it gets really slow on them
because of constant alloc/copy/dealloc when appending a single
character. The issue is that I have files with large data-blobs
(basically matrices spelled out in txt) within XML structure; see e.g.
https://www.dropbox.com/sh/y1f9mf1f3in403h/16rHVk0WMO
Reading these files gives following timings:

1.5M.xml - 0m6.075s
14M.xml - slightly over 1/2h

I wrote a patch which replaces the following idiom:
tempString => fx%token
fx%token => vs_str_alloc(str_vs(fx%token)//c)
deallocate(tempString)
with a subroutine which puts new character in a table, which is
reallocated with 1024b steps (new datatype which takes care of storing
the actual length with the table). The resulting timings are:

1.5M.xml - 0m0.561s
14M.xml - 0m4.754s

The patch is not quite ready for submission to git, it's just to show
its potential (for myself - the code reads my files... I'm done,
unless there is more interest in it).
If you're interested, I'll put some effort in the following:
1) The typedef (expvs) and subroutines should be placed in some
reasonable place rather than in m_sax_types module (any suggestions?)
2) Appending should be done for arbitrary length string (parser spits
at most len=3 changes so it was fast to do just 3 consecutive appends,
but it's ugly)
3) Not all of the subroutine names look nice (same for the typedef...
any suggestions?)
4) Other 'vs' data could be moved to use 'expvs', which would allow
for speeding up the following idiom:
fx%content => fx%token
fx%token => vs_str_alloc("")
which could be done without a single alloc and no copying of data
(just exchange the pointers and make one length zero)
Cheers,
Jacek

---------------------------------
diff -ru FoX-4.1.2_orig/sax/m_sax_parser.F90 FoX-4.1.2/sax/
m_sax_parser.F90
--- FoX-4.1.2_orig/sax/m_sax_parser.F90 2012-01-05 11:04:10.000000000
+0100
+++ FoX-4.1.2/sax/m_sax_parser.F90 2012-04-20 09:55:19.344742726 +0200
@@ -76,7 +76,7 @@
nullURI => null()
#endif

- allocate(fx%token(0))
+ call init_expvs(fx%exptoken)

call init_error_stack(fx%error_stack)
call init_elstack(fx%elstack)
@@ -123,7 +123,7 @@
fx%context = CTXT_NULL
fx%state = ST_NULL

- if (associated(fx%token)) deallocate(fx%token)
+ if (associated(fx%exptoken%data)) deallocate(fx%exptoken%data)
if (associated(fx%root_element)) deallocate(fx%root_element)

call destroy_error_stack(fx%error_stack)
@@ -140,7 +140,7 @@
call destroy_entity_list(fx%forbidden_pe_list)
call destroy_entity_list(fx%predefined_e_list)

- if (associated(fx%token)) deallocate(fx%token)
+ if (associated(fx%exptoken%data)) deallocate(fx%exptoken%data)
if (associated(fx%content)) deallocate(fx%content)
if (associated(fx%name)) deallocate(fx%name)
if (associated(fx%attname)) deallocate(fx%attname)
@@ -521,8 +521,8 @@
if (fx%state_dtd==ST_DTD_ATTLIST_CONTENTS &
.or.fx%state_dtd==ST_DTD_ELEMENT_CONTENTS) then
! stick the token back in contents ...
- fx%content => fx%token
- fx%token => vs_str_alloc("")
+ fx%content => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
endif
if (reading_main_file(fb)) &
fx%inIntSubset = .true.
@@ -565,7 +565,7 @@
select case (fx%state)

case (ST_MISC)
- !write(*,*) 'ST_MISC', str_vs(fx%token)
+ !write(*,*) 'ST_MISC', str_expvs(fx%exptoken)
select case (fx%tokenType)
case (TOK_PI_TAG)
wf_stack(1) = wf_stack(1) + 1
@@ -587,7 +587,7 @@
case (TOK_OPEN_COMMENT)
nextState = ST_START_COMMENT
case (TOK_NAME)
- if (str_vs(fx%token)=='DOCTYPE') then
+ if (str_expvs(fx%exptoken)=='DOCTYPE') then
fx%context = CTXT_IN_DTD
nextState = ST_IN_DOCTYPE
endif
@@ -599,19 +599,19 @@
select case (fx%tokenType)
case (TOK_NAME)
if (namespaces_) then
- nameOk = checkNCName(str_vs(fx%token), fx%xds
%xml_version)
+ nameOk = checkNCName(str_expvs(fx%exptoken), fx%xds
%xml_version)
else
- nameOk = checkName(str_vs(fx%token), fx%xds%xml_version)
+ nameOk = checkName(str_expvs(fx%exptoken), fx%xds
%xml_version)
endif
if (nameOk) then
- if (str_vs(fx%token)=='xml') then
+ if (str_expvs(fx%exptoken)=='xml') then
call add_error(fx%error_stack, "XML declaration must be
at start of document")
goto 100
- elseif (checkPITarget(str_vs(fx%token), fx%xds
%xml_version)) then
+ elseif (checkPITarget(str_expvs(fx%exptoken), fx%xds
%xml_version)) then
nextState = ST_PI_CONTENTS
- fx%name => fx%token
- fx%token => null()
- else
+ fx%name => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
+ else
call add_error(fx%error_stack, "Invalid PI target
name")
goto 100
endif
@@ -631,7 +631,7 @@
select case(fx%tokenType)
case (TOK_CHAR)
if (present(processingInstruction_handler)) then
- call processingInstruction_handler(str_vs(fx%name),
str_vs(fx%token))
+ call processingInstruction_handler(str_vs(fx%name),
str_expvs(fx%exptoken))
if (fx%state==ST_STOP) goto 100
endif
deallocate(fx%name)
@@ -664,8 +664,8 @@
!write(*,*)'ST_START_COMMENT'
select case (fx%tokenType)
case (TOK_CHAR)
- fx%name => fx%token
- nullify(fx%token)
+ fx%name => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
nextState = ST_COMMENT_END
end select

@@ -701,16 +701,16 @@
.or. fx%context==CTXT_BEFORE_CONTENT &
.or. fx%context==CTXT_IN_CONTENT) then
if (namespaces_) then
- nameOk = checkQName(str_vs(fx%token), fx%xds
%xml_version)
+ nameOk = checkQName(str_expvs(fx%exptoken), fx%xds
%xml_version)
else
- nameOk = checkName(str_vs(fx%token), fx%xds
%xml_version)
+ nameOk = checkName(str_expvs(fx%exptoken), fx%xds
%xml_version)
endif
if (.not.nameOk) then
call add_error(fx%error_stack, "Illegal element name")
goto 100
endif
- fx%name => fx%token
- nullify(fx%token)
+ fx%name => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
nextState = ST_IN_TAG
elseif (fx%context == CTXT_AFTER_CONTENT) then
call add_error(fx%error_stack, "Cannot open second root
element")
@@ -725,7 +725,7 @@
!write(*,*) "ST_START_CDATA_DECLARATION"
select case (fx%tokenType)
case (TOK_NAME)
- if (str_vs(fx%token)=="CDATA") then
+ if (str_expvs(fx%exptoken)=="CDATA") then
if (fx%context/=CTXT_IN_CONTENT) then
call add_error(fx%error_stack, "CDATA section only
allowed in text content.")
goto 100
@@ -749,8 +749,8 @@
!write(*,*)'ST_CDATA_CONTENTS'
select case (fx%tokenType)
case (TOK_CHAR)
- fx%name => fx%token
- nullify(fx%token)
+ fx%name => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
nextState = ST_CDATA_END
end select

@@ -855,21 +855,21 @@

case (TOK_NAME)
if (namespaces_) then
- nameOk = checkQName(str_vs(fx%token), fx%xds%xml_version)
+ nameOk = checkQName(str_expvs(fx%exptoken), fx%xds
%xml_version)
else
- nameOk = checkName(str_vs(fx%token), fx%xds%xml_version)
+ nameOk = checkName(str_expvs(fx%exptoken), fx%xds
%xml_version)
endif
if (.not.nameOk) then
call add_error(fx%error_stack, "Illegal attribute name")
goto 100
endif
!Have we already had this dictionary item?
- if (has_key(fx%attributes, str_vs(fx%token))) then
+ if (has_key(fx%attributes, str_expvs(fx%exptoken))) then
call add_error(fx%error_stack, "Duplicate attribute
name")
goto 100
endif
- fx%attname => fx%token
- nullify(fx%token)
+ fx%attname => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
if (associated(elem)) then
attDecl => get_attribute_declaration(elem, str_vs(fx
%attname))
else
@@ -892,9 +892,8 @@
select case (fx%tokenType)
case (TOK_CHAR)
!First, expand all entities:
- tempString => normalize_attribute_text(fx, fx%token)
- deallocate(fx%token)
- fx%token => tempString
+ tempString => normalize_attribute_text(fx, vs_expvs(fx
%exptoken))
+ call expvs_from_vs( fx%exptoken, tempString )
tempString => null()
!If this attribute is not CDATA, we must process further;
if (associated(attDecl)) then
@@ -904,11 +903,11 @@
endif
if (temp_i==ATT_CDATA) then
call add_item_to_dict(fx%attributes, str_vs(fx%attname),
&
- str_vs(fx%token), itype=ATT_CDATA,
declared=associated(attDecl))
+ str_expvs(fx%exptoken), itype=ATT_CDATA,
declared=associated(attDecl))
else
if (validCheck) then
if (fx%xds%standalone.and..not.attDecl%internal &
- .and.(str_vs(fx%token)//"x"/
=att_value_normalize(str_vs(fx%token))//"x")) then
+ .and.(str_expvs(fx%exptoken)//"x"/
=att_value_normalize(str_expvs(fx%exptoken))//"x")) then
call add_error(fx%error_stack, &
"Externally-declared attribute value normalization
results in changed value "// &
"in standalone document")
@@ -916,7 +915,7 @@
endif
endif
call add_item_to_dict(fx%attributes, str_vs(fx%attname),
&
- att_value_normalize(str_vs(fx%token)), itype=temp_i, &
+ att_value_normalize(str_expvs(fx%exptoken)),
itype=temp_i, &
declared=.true.)
endif
deallocate(fx%attname)
@@ -927,17 +926,17 @@
!write(*,*)'ST_CHAR_IN_CONTENT'
select case (fx%tokenType)
case (TOK_CHAR)
- if (size(fx%token)>0) then
+ if (fx%exptoken%real_length>0) then
if (validCheck) then
if (elementContent(fx%elstack)) then
- if (verify(str_vs(fx%token), XML_WHITESPACE)==0) then
+ if (verify(str_expvs(fx%exptoken),
XML_WHITESPACE)==0) then
if (fx%xds%standalone.and..not.elem%internal) then
call add_error(fx%error_stack, &
"Externally-specified ignorable whitespace used
in standalone document")
goto 100
endif
if (present(ignorableWhitespace_handler)) then
- call ignorableWhitespace_handler(str_vs(fx
%token))
+ call ignorableWhitespace_handler(str_expvs(fx
%exptoken))
if (fx%state==ST_STOP) goto 100
endif
else
@@ -949,13 +948,13 @@
goto 100
else
if (present(characters_handler)) then
- call characters_handler(str_vs(fx%token))
+ call characters_handler(str_expvs(fx%exptoken))
if (fx%state==ST_STOP) goto 100
endif
endif
else
if (present(characters_handler)) then
- call characters_handler(str_vs(fx%token))
+ call characters_handler(str_expvs(fx%exptoken))
if (fx%state==ST_STOP) goto 100
endif
endif
@@ -998,15 +997,15 @@
'Encountered reference to undeclared entity')
endif
endif
- ent => getEntityByName(fx%forbidden_ge_list, str_vs(fx
%token))
+ ent => getEntityByName(fx%forbidden_ge_list, str_expvs(fx
%exptoken))
if (associated(ent)) then
call add_error(fx%error_stack, 'Recursive entity
reference')
goto 100
endif
- ent => getEntityByName(fx%predefined_e_list, str_vs(fx
%token))
+ ent => getEntityByName(fx%predefined_e_list, str_expvs(fx
%exptoken))
if (associated(ent)) then
if (present(startEntity_handler)) then
- call startEntity_handler(str_vs(fx%token))
+ call startEntity_handler(str_expvs(fx%exptoken))
if (fx%state==ST_STOP) goto 100
endif
if (validCheck) then
@@ -1019,15 +1018,15 @@
endif
endif
if (present(characters_handler)) then
- call characters_handler(expand_entity(fx
%predefined_e_list, str_vs(fx%token)))
+ call characters_handler(expand_entity(fx
%predefined_e_list, str_expvs(fx%exptoken)))
if (fx%state==ST_STOP) goto 100
endif
if (present(endEntity_handler)) then
- call endEntity_handler(str_vs(fx%token))
+ call endEntity_handler(str_expvs(fx%exptoken))
if (fx%state==ST_STOP) goto 100
endif
- elseif (likeCharacterEntityReference(str_vs(fx%token)))
then
- if (checkRepCharEntityReference(str_vs(fx%token), fx%xds
%xml_version)) then
+ elseif (likeCharacterEntityReference(str_expvs(fx
%exptoken))) then
+ if (checkRepCharEntityReference(str_expvs(fx%exptoken), fx
%xds%xml_version)) then
if (validCheck) then
if (associated(elem)) then
if (.not.elem%mixed.and..not.elem%any) then
@@ -1038,18 +1037,18 @@
endif
endif
if (present(characters_handler)) then
- call characters_handler(expand_char_entity(str_vs(fx
%token)))
+ call
characters_handler(expand_char_entity(str_expvs(fx%exptoken)))
if (fx%state==ST_STOP) goto 100
endif
- elseif (checkCharacterEntityReference(str_vs(fx%token), fx
%xds%xml_version)) then
+ elseif (checkCharacterEntityReference(str_expvs(fx
%exptoken), fx%xds%xml_version)) then
call add_error(fx%error_stack, "Unable to digest
character entity reference in content, sorry.")
goto 100
else
call add_error(fx%error_stack, "Illegal character
reference")
goto 100
endif
- elseif (existing_entity(fx%xds%entityList, str_vs(fx
%token))) then
- ent => getEntityByName(fx%xds%entityList, str_vs(fx
%token))
+ elseif (existing_entity(fx%xds%entityList, str_expvs(fx
%exptoken))) then
+ ent => getEntityByName(fx%xds%entityList, str_expvs(fx
%exptoken))
if (ent%wfc.and.fx%xds%standalone) then
call add_error(fx%error_stack, &
'Externally declared entity referenced in standalone
document')
@@ -1063,22 +1062,22 @@
if (iostat/=0) then
if (validCheck) then
call add_error(fx%error_stack, &
- "Unable to retrieve external entity "//str_vs(fx
%token))
+ "Unable to retrieve external entity "//
str_expvs(fx%exptoken))
goto 100
endif
if (present(skippedEntity_handler)) then
- call skippedEntity_handler(str_vs(fx%token))
+ call skippedEntity_handler(str_expvs(fx%exptoken))
if (fx%state==ST_STOP) goto 100
endif
else
if (present(startEntity_handler)) then
- call startEntity_handler(str_vs(fx%token))
+ call startEntity_handler(str_expvs(fx%exptoken))
if (fx%state==ST_STOP) goto 100
endif
#ifdef PGF90
- call add_internal_entity(fx%forbidden_ge_list,
str_vs(fx%token), "", nullURI, .false.)
+ call add_internal_entity(fx%forbidden_ge_list,
str_expvs(fx%exptoken), "", nullURI, .false.)
#else
- call add_internal_entity(fx%forbidden_ge_list,
str_vs(fx%token), "", null(), .false.)
+ call add_internal_entity(fx%forbidden_ge_list,
str_expvs(fx%exptoken), "", null(), .false.)
#endif
temp_wf_stack => wf_stack
allocate(wf_stack(size(temp_wf_stack)+1))
@@ -1100,15 +1099,16 @@
endif
endif
if (present(startEntity_handler)) then
- call startEntity_handler(str_vs(fx%token))
+ call startEntity_handler(str_expvs(fx%exptoken))
if (fx%state==ST_STOP) goto 100
endif
#ifdef PGF90
- call add_internal_entity(fx%forbidden_ge_list, str_vs(fx
%token), "", nullURI, .false.)
+ call add_internal_entity(fx%forbidden_ge_list,
str_expvs(fx%exptoken), "", nullURI, .false.)
#else
- call add_internal_entity(fx%forbidden_ge_list, str_vs(fx
%token), "", null(), .false.)
+ call add_internal_entity(fx%forbidden_ge_list,
str_expvs(fx%exptoken), "", null(), .false.)
#endif
- call open_new_string(fb, expand_entity(fx%xds
%entityList, str_vs(fx%token)), str_vs(fx%token), baseURI=ent%baseURI)
+ call open_new_string(fb, expand_entity(fx%xds
%entityList, str_expvs(fx%exptoken)), &
+ str_expvs(fx%exptoken), baseURI=ent%baseURI)
temp_wf_stack => wf_stack
allocate(wf_stack(size(temp_wf_stack)+1))
wf_stack = (/0, temp_wf_stack/)
@@ -1118,7 +1118,7 @@
! Unknown entity check standalone etc
if (fx%skippedExternal.and..not.fx%xds%standalone) then
if (present(skippedEntity_handler)) then
- call skippedEntity_handler(str_vs(fx%token))
+ call skippedEntity_handler(str_expvs(fx%exptoken))
if (fx%state==ST_STOP) goto 100
endif
else
@@ -1133,9 +1133,9 @@
!write(*,*)'ST_CLOSING_TAG'
select case (fx%tokenType)
case (TOK_NAME)
- if (checkName(str_vs(fx%token), fx%xds%xml_version)) then
- fx%name => fx%token
- nullify(fx%token)
+ if (checkName(str_expvs(fx%exptoken), fx%xds%xml_version))
then
+ fx%name => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
nextState = ST_IN_CLOSING_TAG
else
call add_error(fx%error_stack, "Closing tag: expecting a
Name")
@@ -1175,26 +1175,26 @@
select case (fx%tokenType)
case (TOK_NAME)
if (namespaces_) then
- nameOk = checkQName(str_vs(fx%token), fx%xds%xml_version)
+ nameOk = checkQName(str_expvs(fx%exptoken), fx%xds
%xml_version)
else
- nameOk = checkName(str_vs(fx%token), fx%xds%xml_version)
+ nameOk = checkName(str_expvs(fx%exptoken), fx%xds
%xml_version)
endif
if (.not.nameOk) then
call add_error(fx%error_stack, "Invalid document name")
goto 100
endif
- fx%root_element => fx%token
- nullify(fx%token)
+ fx%root_element => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
nextState = ST_DOC_NAME
end select

case (ST_DOC_NAME)
- !write(*,*) 'ST_DOC_NAME ', str_vs(fx%token)
+ !write(*,*) 'ST_DOC_NAME ', str_expvs(fx%exptoken)
select case (fx%tokenType)
case (TOK_NAME)
- if (str_vs(fx%token)=='SYSTEM') then
+ if (str_expvs(fx%exptoken)=='SYSTEM') then
nextState = ST_DOC_SYSTEM
- elseif (str_vs(fx%token)=='PUBLIC') then
+ elseif (str_expvs(fx%exptoken)=='PUBLIC') then
nextState = ST_DOC_PUBLIC
endif
case (TOK_OPEN_SB)
@@ -1225,9 +1225,9 @@
!write(*,*) 'ST_DOC_PUBLIC'
select case (fx%tokenType)
case (TOK_CHAR)
- if (checkPublicId(str_vs(fx%token))) then
- fx%publicId => fx%token
- fx%token => null()
+ if (checkPublicId(str_expvs(fx%exptoken))) then
+ fx%publicId => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
nextState = ST_DOC_SYSTEM
else
call add_error(fx%error_stack, "Invalid document public
id")
@@ -1239,8 +1239,8 @@
!write(*,*) 'ST_DOC_SYSTEM'
select case (fx%tokenType)
case (TOK_CHAR)
- fx%systemId => fx%token
- fx%token => null()
+ fx%systemId => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
nextState = ST_DOC_DECL
end select

@@ -1385,12 +1385,12 @@
!write(*,*) 'ST_START_PE'
select case (fx%tokenType)
case (TOK_NAME)
- if (existing_entity(fx%forbidden_pe_list, str_vs(fx
%token))) then
+ if (existing_entity(fx%forbidden_pe_list, str_expvs(fx
%exptoken))) then
call add_error(fx%error_stack, &
'Recursive entity reference')
goto 100
endif
- ent => getEntityByName(fx%xds%PEList, str_vs(fx%token))
+ ent => getEntityByName(fx%xds%PEList, str_expvs(fx
%exptoken))
if (associated(ent)) then
if (ent%wfc.and.fx%xds%standalone) then
call add_error(fx%error_stack, &
@@ -1398,25 +1398,25 @@
goto 100
elseif (ent%external) then
if (present(startEntity_handler)) then
- call startEntity_handler('%'//str_vs(fx%token))
+ call startEntity_handler('%'//str_expvs(fx%exptoken))
if (fx%state==ST_STOP) goto 100
endif
#ifdef PGF90
call add_internal_entity(fx%forbidden_pe_list, &
- str_vs(fx%token), "", nullURI, .false.)
+ str_expvs(fx%exptoken), "", nullURI, .false.)
#else
call add_internal_entity(fx%forbidden_pe_list, &
- str_vs(fx%token), "", null(), .false.)
+ str_expvs(fx%exptoken), "", null(), .false.)
#endif
call open_new_file(fb, ent%baseURI, iostat, pe=.true.)
if (iostat/=0) then
if (validCheck) then
call add_error(fx%error_stack, &
- "Unable to retrieve external parameter entity "//
str_vs(fx%token))
+ "Unable to retrieve external parameter entity "//
str_expvs(fx%exptoken))
goto 100
endif
if (present(skippedEntity_handler)) then
- call skippedEntity_handler('%'//str_vs(fx%token))
+ call skippedEntity_handler('%'//str_expvs(fx
%exptoken))
if (fx%state==ST_STOP) goto 100
endif
! having skipped a PE, we must now not process
@@ -1427,15 +1427,15 @@
else
fx%inIntSubset = .false.
if (present(startEntity_handler)) then
- call startEntity_handler('%'//str_vs(fx%token))
+ call startEntity_handler('%'//str_expvs(fx
%exptoken))
if (fx%state==ST_STOP) goto 100
endif
#ifdef PGF90
call add_internal_entity(fx%forbidden_pe_list, &
- str_vs(fx%token), "", nullURI, .false.)
+ str_expvs(fx%exptoken), "", nullURI, .false.)
#else
call add_internal_entity(fx%forbidden_pe_list, &
- str_vs(fx%token), "", null(), .false.)
+ str_expvs(fx%exptoken), "", null(), .false.)
#endif
call parse_text_declaration(fb, fx%error_stack)
if (in_error(fx%error_stack)) goto 100
@@ -1449,19 +1449,19 @@
else
! Expand the entity,
if (present(startEntity_handler)) then
- call startEntity_handler('%'//str_vs(fx%token))
+ call startEntity_handler('%'//str_expvs(fx%exptoken))
if (fx%state==ST_STOP) goto 100
endif
#ifdef PGF90
call add_internal_entity(fx%forbidden_pe_list, &
- str_vs(fx%token), "", nullURI, .false.)
+ str_expvs(fx%exptoken), "", nullURI, .false.)
call open_new_string(fb, &
- expand_entity(fx%xds%PEList, str_vs(fx%token)),
str_vs(fx%token), baseURI=nullURI, pe=.true.)
+ expand_entity(fx%xds%PEList, str_expvs(fx%exptoken)),
str_expvs(fx%exptoken), baseURI=nullURI, pe=.true.)
#else
call add_internal_entity(fx%forbidden_pe_list, &
- str_vs(fx%token), "", null(), .false.)
+ str_expvs(fx%exptoken), "", null(), .false.)
call open_new_string(fb, &
- expand_entity(fx%xds%PEList, str_vs(fx%token)),
str_vs(fx%token), baseURI=null(), pe=.true.)
+ expand_entity(fx%xds%PEList, str_expvs(fx%exptoken)),
str_expvs(fx%exptoken), baseURI=null(), pe=.true.)
#endif
! NB because we are just expanding a string here,
anything
! evaluated as a result of this string is evaluated in
the
@@ -1480,7 +1480,7 @@
if (fx%skippedExternal.and..not.fx%xds%standalone) then
if (processDTD) then
if (present(skippedEntity_handler)) then
- call skippedEntity_handler('%'//str_vs(fx%token))
+ call skippedEntity_handler('%'//str_expvs(fx
%exptoken))
if (fx%state==ST_STOP) goto 100
endif
endif
@@ -1530,9 +1530,9 @@
! EOF of main file
if (startInChardata_) then
if (fx%well_formed) then
- if (fx%state==ST_CHAR_IN_CONTENT.and.associated(fx%token))
then
- if (size(fx%token)>0.and.present(characters_handler)) &
- call characters_handler(str_vs(fx%token))
+ if (fx%state==ST_CHAR_IN_CONTENT.and.associated(fx%exptoken
%data)) then
+ if (fx%exptoken
%real_length>0.and.present(characters_handler)) &
+ call characters_handler(str_expvs(fx%exptoken))
endif
else
if (present(fatalError_handler)) &
@@ -1601,13 +1601,13 @@
case (TOK_OPEN_COMMENT)
nextDTDState = ST_DTD_START_COMMENT
case (TOK_NAME)
- if (str_vs(fx%token)=='ATTLIST') then
+ if (str_expvs(fx%exptoken)=='ATTLIST') then
nextDTDState = ST_DTD_ATTLIST
- elseif (str_vs(fx%token)=='ELEMENT') then
+ elseif (str_expvs(fx%exptoken)=='ELEMENT') then
nextDTDState = ST_DTD_ELEMENT
- elseif (str_vs(fx%token)=='ENTITY') then
+ elseif (str_expvs(fx%exptoken)=='ENTITY') then
nextDTDState = ST_DTD_ENTITY
- elseif (str_vs(fx%token)=='NOTATION') then
+ elseif (str_expvs(fx%exptoken)=='NOTATION') then
nextDTDState = ST_DTD_NOTATION
endif
end select
@@ -1616,7 +1616,7 @@
!write(*,*) "ST_DTD_START_SECTION_DECL"
select case (fx%tokenType)
case (TOK_NAME)
- if (str_vs(fx%token)=="IGNORE") then
+ if (str_expvs(fx%exptoken)=="IGNORE") then
if (fx%context/=CTXT_IN_DTD.or.reading_main_file(fb))
then
call add_error(fx%error_stack, "IGNORE section only
allowed in external subset.")
return
@@ -1625,7 +1625,7 @@
fx%context = CTXT_IGNORE
nextDTDState = ST_DTD_FINISH_SECTION_DECL
endif
- elseif (str_vs(fx%token)=="INCLUDE") then
+ elseif (str_expvs(fx%exptoken)=="INCLUDE") then
if (fx%context/=CTXT_IN_DTD.or.reading_main_file(fb))
then
call add_error(fx%error_stack, "INCLUDE section only
allowed in external subset.")
return
@@ -1675,18 +1675,18 @@
select case (fx%tokenType)
case (TOK_NAME)
if (namespaces_) then
- nameOk = checkNCName(str_vs(fx%token), fx%xds
%xml_version)
+ nameOk = checkNCName(str_expvs(fx%exptoken), fx%xds
%xml_version)
else
- nameOk = checkName(str_vs(fx%token), fx%xds%xml_version)
+ nameOk = checkName(str_expvs(fx%exptoken), fx%xds
%xml_version)
endif
if (nameOk) then
- if (str_vs(fx%token)=='xml') then
+ if (str_expvs(fx%exptoken)=='xml') then
call add_error(fx%error_stack, "XML declaration must be
at start of document")
return
- elseif (checkPITarget(str_vs(fx%token), fx%xds
%xml_version)) then
+ elseif (checkPITarget(str_expvs(fx%exptoken), fx%xds
%xml_version)) then
nextDTDState = ST_DTD_PI_CONTENTS
- fx%name => fx%token
- fx%token => null()
+ fx%name => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
else
call add_error(fx%error_stack, "Invalid PI target
name")
return
@@ -1717,7 +1717,7 @@
select case(fx%tokenType)
case (TOK_CHAR)
if (present(processingInstruction_handler)) then
- call processingInstruction_handler(str_vs(fx%name),
str_vs(fx%token))
+ call processingInstruction_handler(str_vs(fx%name),
str_expvs(fx%exptoken))
if (fx%state==ST_STOP) return
endif
deallocate(fx%name)
@@ -1742,8 +1742,8 @@
!write(*,*)'ST_DTD_START_COMMENT'
select case (fx%tokenType)
case (TOK_CHAR)
- fx%name => fx%token
- nullify(fx%token)
+ fx%name => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
nextDTDState = ST_DTD_COMMENT_END
end select

@@ -1782,18 +1782,18 @@
select case (fx%tokenType)
case (TOK_NAME)
if (namespaces_) then
- nameOk = checkQName(str_vs(fx%token), fx%xds%xml_version)
+ nameOk = checkQName(str_expvs(fx%exptoken), fx%xds
%xml_version)
else
- nameOk = checkName(str_vs(fx%token), fx%xds%xml_version)
+ nameOk = checkName(str_expvs(fx%exptoken), fx%xds
%xml_version)
endif
if (.not.nameOk) then
call add_error(fx%error_stack, "Invalid element name for
ATTLIST")
return
endif
- if (existing_element(fx%xds%element_list, str_vs(fx
%token))) then
- elem => get_element(fx%xds%element_list, str_vs(fx
%token))
+ if (existing_element(fx%xds%element_list, str_expvs(fx
%exptoken))) then
+ elem => get_element(fx%xds%element_list, str_expvs(fx
%exptoken))
else
- elem => add_element(fx%xds%element_list, str_vs(fx
%token))
+ elem => add_element(fx%xds%element_list, str_expvs(fx
%exptoken))
endif
nextDTDState = ST_DTD_ATTLIST_CONTENTS
end select
@@ -1807,16 +1807,16 @@
nextState = ST_START_PE
case (TOK_DTD_CONTENTS)
if (processDTD) then
- call parse_dtd_attlist(str_vs(fx%token), fx%xds
%xml_version, &
+ call parse_dtd_attlist(str_expvs(fx%exptoken), fx%xds
%xml_version, &
namespaces_, validCheck, fx%error_stack, elem, &
internal=reading_main_file(fb))
else
#ifdef PGF90
- call parse_dtd_attlist(str_vs(fx%token), fx%xds
%xml_version, &
+ call parse_dtd_attlist(str_expvs(fx%exptoken), fx%xds
%xml_version, &
namespaces_, validCheck, fx%error_stack, nullElement, &
internal=reading_main_file(fb))
#else
- call parse_dtd_attlist(str_vs(fx%token), fx%xds
%xml_version, &
+ call parse_dtd_attlist(str_expvs(fx%exptoken), fx%xds
%xml_version, &
namespaces_, validCheck, fx%error_stack, null(), &
internal=reading_main_file(fb))
#endif
@@ -1894,16 +1894,16 @@
select case (fx%tokenType)
case (TOK_NAME)
if (namespaces_) then
- nameOk = checkQName(str_vs(fx%token), fx%xds%xml_version)
+ nameOk = checkQName(str_expvs(fx%exptoken), fx%xds
%xml_version)
else
- nameOk = checkName(str_vs(fx%token), fx%xds%xml_version)
+ nameOk = checkName(str_expvs(fx%exptoken), fx%xds
%xml_version)
endif
if (.not.nameOk) then
call add_error(fx%error_stack, "Invalid name for
ELEMENT")
return
endif
- fx%name => fx%token
- fx%token => null()
+ fx%name => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
nextDTDState = ST_DTD_ELEMENT_CONTENTS
end select

@@ -1940,7 +1940,7 @@
else
elem => null()
endif
- call parse_dtd_element(str_vs(fx%token), fx%xds
%xml_version, fx%error_stack, &
+ call parse_dtd_element(str_expvs(fx%exptoken), fx%xds
%xml_version, fx%error_stack, &
elem, reading_main_file(fb))
if (in_error(fx%error_stack)) return
nextDTDState = ST_DTD_ELEMENT_END
@@ -1972,24 +1972,24 @@
!write(*,*) 'ST_DTD_ENTITY'
select case (fx%tokenType)
case (TOK_NAME)
- if (str_vs(fx%token)=="%") then
+ if (str_expvs(fx%exptoken)=="%") then
pe = .true.
! this will be a PE
nextDTDState = ST_DTD_ENTITY_PE
else
pe = .false.
if (namespaces_) then
- nameOk = checkNCName(str_vs(fx%token), fx%xds
%xml_version)
+ nameOk = checkNCName(str_expvs(fx%exptoken), fx%xds
%xml_version)
else
- nameOk = checkName(str_vs(fx%token), fx%xds
%xml_version)
+ nameOk = checkName(str_expvs(fx%exptoken), fx%xds
%xml_version)
endif
if (.not.nameOk) then
call add_error(fx%error_stack, &
"Illegal name for general entity")
return
endif
- fx%name => fx%token
- fx%token => null()
+ fx%name => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
nextDTDState = ST_DTD_ENTITY_ID
endif
end select
@@ -1999,17 +1999,17 @@
select case (fx%tokenType)
case (TOK_NAME)
if (namespaces_) then
- nameOk = checkNCName(str_vs(fx%token), fx%xds
%xml_version)
+ nameOk = checkNCName(str_expvs(fx%exptoken), fx%xds
%xml_version)
else
- nameOk = checkName(str_vs(fx%token), fx%xds%xml_version)
+ nameOk = checkName(str_expvs(fx%exptoken), fx%xds
%xml_version)
endif
if (.not.nameOk) then
call add_error(fx%error_stack, &
"Illegal name for parameter entity")
return
endif
- fx%name => fx%token
- fx%token => null()
+ fx%name => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
nextDTDState = ST_DTD_ENTITY_ID
end select

@@ -2017,9 +2017,9 @@
!write(*,*) 'ST_DTD_ENTITY_ID'
select case (fx%tokenType)
case (TOK_NAME)
- if (str_vs(fx%token) == "PUBLIC") then
+ if (str_expvs(fx%exptoken) == "PUBLIC") then
nextDTDState = ST_DTD_ENTITY_PUBLIC
- elseif (str_vs(fx%token) == "SYSTEM") then
+ elseif (str_expvs(fx%exptoken) == "SYSTEM") then
nextDTDState = ST_DTD_ENTITY_SYSTEM
else
call add_error(fx%error_stack, "Unexpected token in
ENTITY")
@@ -2027,9 +2027,9 @@
endif
case (TOK_CHAR)
if (reading_main_file(fb)) then
- tempString => fx%token
+ tempString => vs_expvs(fx%exptoken)
else
- tempString => expand_pe_text(fx, fx%token, fb)
+ tempString => expand_pe_text(fx, vs_expvs(fx%exptoken),
fb)
endif
fx%attname => expand_entity_value_alloc(tempString, fx%xds,
fx%error_stack)
if (reading_main_file(fb)) then
@@ -2048,9 +2048,9 @@
!write(*,*) 'ST_DTD_ENTITY_PUBLIC'
select case (fx%tokenType)
case (TOK_CHAR)
- if (checkPublicId(str_vs(fx%token))) then
- fx%publicId => fx%token
- fx%token => null()
+ if (checkPublicId(str_expvs(fx%exptoken))) then
+ fx%publicId => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
nextDTDState = ST_DTD_ENTITY_SYSTEM
else
call add_error(fx%error_stack, "Invalid PUBLIC id in
ENTITY")
@@ -2065,8 +2065,8 @@
!write(*,*) 'ST_DTD_ENTITY_SYSTEM'
select case (fx%tokenType)
case (TOK_CHAR)
- fx%systemId => fx%token
- fx%token => null()
+ fx%systemId => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
nextDTDState = ST_DTD_ENTITY_NDATA
case default
call add_error(fx%error_stack, "Unexpected token in
ENTITY")
@@ -2097,7 +2097,7 @@
if (associated(fx%Ndata)) deallocate(fx%Ndata)
nextDTDState = ST_DTD_SUBSET
case (TOK_NAME)
- if (str_vs(fx%token)=='NDATA') then
+ if (str_expvs(fx%exptoken)=='NDATA') then
if (pe) then
call add_error(fx%error_stack, "Parameter entity cannot
have NDATA declaration")
return
@@ -2118,16 +2118,16 @@
select case (fx%tokenType)
case (TOK_NAME)
if (namespaces_) then
- nameOk = checkNCName(str_vs(fx%token), fx%xds
%xml_version)
+ nameOk = checkNCName(str_expvs(fx%exptoken), fx%xds
%xml_version)
else
- nameOk = checkName(str_vs(fx%token), fx%xds%xml_version)
+ nameOk = checkName(str_expvs(fx%exptoken), fx%xds
%xml_version)
endif
if (.not.nameOk) then
call add_error(fx%error_stack, "Invalid name for
Notation")
return
endif
- fx%Ndata => fx%token
- fx%token => null()
+ fx%Ndata => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
nextDTDState = ST_DTD_ENTITY_END
case default
call add_error(fx%error_stack, "Unexpected token in
ENTITY")
@@ -2167,16 +2167,16 @@
select case (fx%tokenType)
case (TOK_NAME)
if (namespaces_) then
- nameOk = checkNCName(str_vs(fx%token), fx%xds
%xml_version)
+ nameOk = checkNCName(str_expvs(fx%exptoken), fx%xds
%xml_version)
else
- nameOk = checkName(str_vs(fx%token), fx%xds%xml_version)
+ nameOk = checkName(str_expvs(fx%exptoken), fx%xds
%xml_version)
endif
if (.not.nameOk) then
call add_error(fx%error_stack, "Invalid name for
Notation")
return
endif
- fx%name => fx%token
- fx%token => null()
+ fx%name => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
nextDTDState = ST_DTD_NOTATION_ID
case default
call add_error(fx%error_stack, "Unexpected token in
NOTATION")
@@ -2187,9 +2187,9 @@
!write(*,*)'ST_DTD_NOTATION_ID'
select case (fx%tokenType)
case (TOK_NAME)
- if (str_vs(fx%token)=='SYSTEM') then
+ if (str_expvs(fx%exptoken)=='SYSTEM') then
nextDTDState = ST_DTD_NOTATION_SYSTEM
- elseif (str_vs(fx%token)=='PUBLIC') then
+ elseif (str_expvs(fx%exptoken)=='PUBLIC') then
nextDTDState = ST_DTD_NOTATION_PUBLIC
else
call add_error(fx%error_stack, "Unexpected token after
NOTATION")
@@ -2204,8 +2204,8 @@
!write(*,*)'ST_DTD_NOTATION_SYSTEM'
select case (fx%tokenType)
case (TOK_CHAR)
- fx%systemId => fx%token
- fx%token => null()
+ fx%systemId => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
nextDTDState = ST_DTD_NOTATION_END
case default
call add_error(fx%error_stack, "Unexpected token in
NOTATION")
@@ -2216,9 +2216,9 @@
!write(*,*)'ST_DTD_NOTATION_PUBLIC'
select case (fx%tokenType)
case (TOK_CHAR)
- if (checkPublicId(str_vs(fx%token))) then
- fx%publicId => fx%token
- fx%token => null()
+ if (checkPublicId(str_expvs(fx%exptoken))) then
+ fx%publicId => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
nextDTDState = ST_DTD_NOTATION_PUBLIC_2
else
call add_error(fx%error_stack, "Invalid PUBLIC id in
NOTATION")
@@ -2256,8 +2256,8 @@
deallocate(fx%publicId)
nextDTDState = ST_DTD_SUBSET
case (TOK_CHAR)
- fx%systemId => fx%token
- fx%token => null()
+ fx%systemId => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
nextDTDState = ST_DTD_NOTATION_END
end select

diff -ru FoX-4.1.2_orig/sax/m_sax_tokenizer.F90 FoX-4.1.2/sax/
m_sax_tokenizer.F90
--- FoX-4.1.2_orig/sax/m_sax_tokenizer.F90 2012-01-05
11:04:10.000000000 +0100
+++ FoX-4.1.2/sax/m_sax_tokenizer.F90 2012-04-19 17:39:56.512220646
+0200
@@ -39,8 +39,7 @@

xv = fx%xds%xml_version

- if (associated(fx%token)) deallocate(fx%token)
- fx%token => vs_str_alloc("")
+ fx%exptoken%real_length = 0
if (fx%nextTokenType/=TOK_NULL) then
eof = .false.
fx%tokenType = fx%nextTokenType
@@ -58,13 +57,10 @@
if (eof) then
if (fx%state==ST_CHAR_IN_CONTENT) then
if (phrase==1) then
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(fx%token)//"]")
- deallocate(tempString)
+ call append_expvs(fx%exptoken,']')
elseif (phrase==2) then
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(fx%token)//"]]")
- deallocate(tempString)
+ call append_expvs(fx%exptoken,']')
+ call append_expvs(fx%exptoken,']')
endif
fx%tokenType = TOK_CHAR
endif
@@ -112,8 +108,7 @@
elseif (c=="[") then
fx%tokenType = TOK_OPEN_SB
elseif (verify(c,upperCase)==0) then
- deallocate(fx%token)
- fx%token => vs_str_alloc(c)
+ call expvs_from_char(fx%exptoken,c)
else
call add_error(fx%error_stack, "Unexpected character
after <!")
endif
@@ -124,9 +119,7 @@
call add_error(fx%error_stack, "Unexpected character
after <!-")
endif
elseif (verify(c,XML_WHITESPACE)>0) then
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs(fx%exptoken,c)
else
call push_chars(fb, c)
fx%tokenType = TOK_NAME
@@ -135,9 +128,7 @@
case (ST_START_PI)
! grab until whitespace or ?
if (verify(c, XML_WHITESPACE//"?")>0) then
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
else
fx%tokenType = TOK_NAME
if (c=="?") call push_chars(fb, c)
@@ -158,21 +149,16 @@
fx%nextTokenType = TOK_PI_END
elseif (c=="?") then
! The last ? didn't mean anything, but this one might.
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//"?")
- deallocate(tempString)
+ call append_expvs( fx%exptoken, '?' )
else
phrase = 0
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//"?"//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, '?' )
+ call append_expvs( fx%exptoken, c )
endif
elseif (c=="?") then
phrase = 1
else
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
endif

case (ST_START_COMMENT)
@@ -181,17 +167,14 @@
if (c=="-") then
phrase = 1
else
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
endif
case (1)
if (c=="-") then
phrase = 2
else
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//"-"//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, '-' )
+ call append_expvs( fx%exptoken, c )
phrase = 0
endif
case (2)
@@ -208,9 +191,7 @@
case (ST_START_TAG)
! grab until whitespace or /, >
if (verify(c, XML_WHITESPACE//"/>")>0) then
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
else
fx%tokenType = TOK_NAME
if (c==">") then
@@ -226,8 +207,7 @@
call add_error(fx%error_stack, &
"Whitespace not allowed around CDATA in section
declaration")
else
- deallocate(fx%token)
- fx%token => vs_str_alloc(c)
+ call expvs_from_char(fx%exptoken, c)
ws_discard = .false.
endif
else
@@ -238,9 +218,7 @@
fx%tokenType = TOK_NAME
if (c=="[") fx%nextTokenType = TOK_OPEN_SB
else
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
endif
endif

@@ -251,17 +229,14 @@
if (c=="]") then
phrase = 1
else
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
endif
case (1)
if (c=="]") then
phrase = 2
else
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//"]"//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, ']' )
+ call append_expvs( fx%exptoken, c )
phrase = 0
endif
case (2)
@@ -269,13 +244,11 @@
fx%tokenType = TOK_CHAR
fx%nextTokenType = TOK_SECTION_END
elseif (c=="]") then
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//"]")
- deallocate(tempString)
- else
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//"]]"//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, ']' )
+ else
+ call append_expvs( fx%exptoken, ']' )
+ call append_expvs( fx%exptoken, ']' )
+ call append_expvs( fx%exptoken, c )
phrase = 0
endif
end select
@@ -297,8 +270,7 @@
phrase = 1
ws_discard = .false.
else
- deallocate(fx%token)
- fx%token => vs_str_alloc(c)
+ call expvs_from_char(fx%exptoken,c)
ws_discard = .false.
endif
endif
@@ -322,9 +294,7 @@
call push_chars(fb, c)
endif
else
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
endif
endif
endif
@@ -357,22 +327,17 @@
if (c==q) then
fx%tokenType = TOK_CHAR
else
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
endif
endif

case (ST_CHAR_IN_CONTENT)
if (c=="<".or.c=="&") then
if (phrase==1) then
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//"]")
- deallocate(tempString)
+ call append_expvs( fx%exptoken, ']' )
elseif (phrase==2) then
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//"]]")
- deallocate(tempString)
+ call append_expvs( fx%exptoken, ']' )
+ call append_expvs( fx%exptoken, ']' )
endif
fx%tokenType = TOK_CHAR
if (c=="<") then
@@ -386,35 +351,27 @@
elseif (phrase==1) then
phrase = 2
else
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//"]")
- deallocate(tempString)
+ call append_expvs( fx%exptoken, ']' )
endif
elseif (c==">") then
if (phrase==1) then
phrase = 0
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//"]>")
- deallocate(tempString)
+ call append_expvs( fx%exptoken, ']' )
+ call append_expvs( fx%exptoken, '>' )
elseif (phrase==2) then
call add_error(fx%error_stack, "]]> forbidden in
character context")
else
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//">")
- deallocate(tempString)
+ call append_expvs( fx%exptoken, '>' )
endif
elseif (phrase==1) then
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//"]"//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, ']' )
+ call append_expvs( fx%exptoken, c )
elseif (phrase==2) then
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//"]]"//c)
- deallocate(tempString)
- else
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, ']' )
+ call append_expvs( fx%exptoken, ']' )
+ call append_expvs( fx%exptoken, c )
+ else
+ call append_expvs( fx%exptoken, c )
endif

case (ST_TAG_IN_CONTENT)
@@ -444,9 +401,7 @@

case (ST_START_ENTITY)
if (verify(c,XML_WHITESPACE//";")>0) then
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
elseif (c==";") then
fx%tokenType = TOK_NAME
else
@@ -455,9 +410,7 @@

case (ST_CLOSING_TAG)
if (verify(c,XML_WHITESPACE//">")>0) then
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
else
fx%tokenType = TOK_NAME
if (c==">") fx%nextTokenType = TOK_END_TAG
@@ -492,16 +445,14 @@
if (verify(c, XML_WHITESPACE)>0) then
if (verify(c, "'""")==0) then
q = c
- deallocate(fx%token)
- fx%token => vs_str_alloc("")
+ fx%exptoken%real_length = 0
ws_discard = .false.
elseif (c=="[") then
fx%tokenType = TOK_OPEN_SB
elseif (c==">") then
fx%tokenType = TOK_END_TAG
else
- deallocate(fx%token)
- fx%token => vs_str_alloc(c)
+ call expvs_from_char(fx%exptoken, c)
ws_discard = .false.
endif
endif
@@ -515,9 +466,7 @@
call push_chars(fb, c)
fx%tokenType = TOK_NAME
else
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
endif
endif

@@ -526,9 +475,7 @@

case (ST_START_PE)
if (verify(c,XML_WHITESPACE//";")>0) then
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
elseif (c==";") then
fx%tokenType = TOK_NAME
else
@@ -580,8 +527,8 @@
if (.not.associated(fx%content)) then
! content will not always be empty here;
! if we have two PErefs bang next to each other.
- fx%content => fx%token
- fx%token => vs_str_alloc("")
+ fx%content => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
endif
fx%tokenType = TOK_ENTITY
return
@@ -658,8 +605,7 @@
elseif (c=="[") then
fx%tokenType = TOK_OPEN_SB
elseif (verify(c,upperCase)==0) then
- deallocate(fx%token)
- fx%token => vs_str_alloc(c)
+ call expvs_from_char(fx%exptoken,c)
else
call add_error(fx%error_stack, "Unexpected character
after <!")
endif
@@ -670,9 +616,7 @@
call add_error(fx%error_stack, "Unexpected character
after <!-")
endif
elseif (verify(c,XML_WHITESPACE)>0) then
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
else
call push_chars(fb, c)
fx%tokenType = TOK_NAME
@@ -685,15 +629,12 @@
endif
if (ws_discard) then
if (verify(c, XML_WHITESPACE)/=0) then
- deallocate(fx%token)
- fx%token => vs_str_alloc(c)
+ call expvs_from_char(fx%exptoken, c)
ws_discard = .false.
endif
else
if (verify(c, XML_WHITESPACE//"[")>0) then
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
else
fx%tokenType = TOK_NAME
if (c=="[") fx%nextTokenType = TOK_OPEN_SB
@@ -737,9 +678,7 @@
case (ST_DTD_START_PI)
! grab until whitespace or ?
if (verify(c, XML_WHITESPACE//"?")>0) then
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
else
fx%tokenType = TOK_NAME
if (c=="?") call push_chars(fb, c)
@@ -760,21 +699,16 @@
fx%nextTokenType = TOK_PI_END
elseif (c=="?") then
! The last ? didn't mean anything, but this one might.
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//"?")
- deallocate(tempString)
+ call append_expvs( fx%exptoken, '?' )
else
phrase = 0
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//"?"//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, '?' )
+ call append_expvs( fx%exptoken, c )
endif
elseif (c=="?") then
phrase = 1
else
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
endif

case (ST_DTD_START_COMMENT)
@@ -783,25 +717,20 @@
if (c=="-") then
phrase = 1
else
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
endif
case (1)
if (c=="-") then
phrase = 2
else
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//"-"//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, '-' )
+ call append_expvs( fx%exptoken, c )
phrase = 0
endif
case (2)
if (c==">") then
fx%tokenType = TOK_CHAR
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
fx%nextTokenType = TOK_COMMENT_END
else
call add_error(fx%error_stack, &
@@ -814,14 +743,11 @@
if (firstChar) ws_discard = .true.
if (ws_discard) then
if (verify(c,XML_WHITESPACE)>0) then
- deallocate(fx%token)
- fx%token => vs_str_alloc(c)
+ call expvs_from_char(fx%exptoken, c)
ws_discard = .false.
endif
elseif (verify(c,XML_WHITESPACE//">")>0) then
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
else
if (c==">") then
fx%nextTokenType = TOK_END_TAG
@@ -834,30 +760,27 @@
case (ST_DTD_ELEMENT_CONTENTS)
if (c==">") then
if (associated(fx%content)) then
- deallocate(fx%token)
- fx%token => fx%content
+ call expvs_from_vs(fx%exptoken,fx%content)
fx%content => null()
endif
fx%tokenType = TOK_DTD_CONTENTS
fx%nextTokenType = TOK_END_TAG
else
if (associated(fx%content)) then
- deallocate(fx%token)
- fx%token => vs_str_alloc(str_vs(fx%content)//c)
+ call expvs_from_vs(fx%exptoken,fx%content)
+ call append_expvs(fx%exptoken, c)
deallocate(fx%content)
else
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs(fx%exptoken, c)
endif
if (c=="(") then
fx%tokenType = TOK_OPEN_PAR
- fx%content => fx%token
- fx%token => vs_str_alloc("")
+ fx%content => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
elseif (c==")") then
fx%tokenType = TOK_CLOSE_PAR
- fx%content => fx%token
- fx%token => vs_str_alloc("")
+ fx%content => vs_expvs(fx%exptoken)
+ fx%exptoken%real_length = 0
endif
endif

@@ -866,13 +789,11 @@
fx%tokenType = TOK_DTD_CONTENTS
fx%nextTokenType = TOK_END_TAG
elseif (associated(fx%content)) then
- deallocate(fx%token)
- fx%token => vs_str_alloc(str_vs(fx%content)//c)
+ call expvs_from_vs(fx%exptoken,fx%content)
+ call append_expvs(fx%exptoken, c)
deallocate(fx%content)
else
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
endif
if (c=="'".or.c=="""") then
if (q==c) then
@@ -900,14 +821,12 @@
if (verify(c, XML_WHITESPACE)>0) then
if (verify(c, "'""")==0) then
q = c
- deallocate(fx%token)
- fx%token => vs_str_alloc("")
+ fx%exptoken%real_length = 0
ws_discard = .false.
elseif (c==">") then
fx%tokenType = TOK_END_TAG
else
- deallocate(fx%token)
- fx%token => vs_str_alloc(c)
+ call expvs_from_char( fx%exptoken, c )
ws_discard = .false.
endif
endif
@@ -922,9 +841,7 @@
call push_chars(fb, c)
endif
else
- tempString => fx%token
- fx%token => vs_str_alloc(str_vs(tempString)//c)
- deallocate(tempString)
+ call append_expvs( fx%exptoken, c )
endif
endif

diff -ru FoX-4.1.2_orig/sax/m_sax_types.F90 FoX-4.1.2/sax/
m_sax_types.F90
--- FoX-4.1.2_orig/sax/m_sax_types.F90 2012-01-05 11:04:10.000000000
+0100
+++ FoX-4.1.2/sax/m_sax_types.F90 2012-04-19 17:34:42.171290156 +0200
@@ -115,6 +115,16 @@
integer, parameter :: TOK_OPEN_PAR = 19 ! (
integer, parameter :: TOK_CLOSE_PAR = 20 ! )

+ ! Expandible vs type.
+ ! It is used for token field, so that it can be grown
+ ! by one character at a time without incurring constant
+ ! penalty on allocating/deallocating vs data.
+ ! See functions and subroutines at the end of this module.
+ type expvs
+ character, dimension(:), pointer :: data
+ integer :: real_length
+ end type expvs
+
type sax_parser_t
type(xml_doc_state), pointer :: xds
logical :: xds_used = .false. ! is the xds used by DOM? If so, we
must
@@ -124,7 +134,7 @@
integer :: state_dtd = ST_DTD_SUBSET
logical :: well_formed = .false.
logical :: skippedExternal = .false.
- character, dimension(:), pointer :: token => null()
+ type(expvs) :: exptoken
character, dimension(:), pointer :: content => null()
integer :: tokenType = TOK_NULL
integer :: nextTokenType = TOK_NULL
@@ -158,4 +168,70 @@
#endif
end type xml_t

+contains
+! Routines used by expandible vs type.
+subroutine init_expvs(evs)
+ type(expvs), intent(inout) :: evs
+ allocate(evs%data(1024))
+ evs%real_length = 0
+end subroutine init_expvs
+
+function vs_expvs(evs) result(vs)
+ type(expvs) :: evs
+ character, dimension(:), pointer :: vs
+
+ allocate(vs(evs%real_length))
+ vs = evs%data(1:evs%real_length)
+end function vs_expvs
+
+function str_expvs(evs) result(s)
+ type(expvs), intent(in) :: evs
+ character(len=evs%real_length) :: s
+#ifdef PGF90
+!PGI crashes on this use of transfer. Knob-ends.
+ integer :: i
+ do i = 1, evs%real_length
+ s(i:i) = evs%data(i)
+ enddo
+#else
+ s = transfer(evs%data(1:evs%real_length), s)
+#endif
+end function str_expvs
+
+subroutine append_expvs(evs,c)
+ type(expvs), intent(inout) :: evs
+ character, intent(in) :: c
+ character, dimension(:), pointer :: tmp
+
+ if (evs%real_length+1 > size(evs%data)) then
+ allocate(tmp(size(evs%data)+1024))
+ tmp(1:evs%real_length) = evs%data(1:evs%real_length)
+ deallocate(evs%data)
+ evs%data => tmp
+ end if
+
+ evs%data(evs%real_length+1) = c
+ evs%real_length = evs%real_length+1
+end subroutine append_expvs
+
+subroutine expvs_from_char(evs,c)
+ type(expvs), intent(inout) :: evs
+ character, intent(in) :: c
+ evs%real_length = 1
+ evs%data(1) = c
+end subroutine expvs_from_char
+
+subroutine expvs_from_vs(evs,vs)
+ type(expvs), intent(inout) :: evs
+ character, dimension(:), intent(in) :: vs
+
+ if (size(vs) > size(evs%data)) then
+ deallocate(evs%data)
+ allocate(evs%data(size(vs)))
+ end if
+
+ evs%real_length = size(vs)
+ evs%data(1:size(vs)) = vs
+end subroutine expvs_from_vs
+
end module m_sax_types

Andrew Walker

unread,
Apr 21, 2012, 7:43:13 AM4/21/12
to fox-d...@googlegroups.com, j.c.w...@gmail.com
Dear Jacek,

First, thank you very much for the observation and draft patch to speed up
the FoX SAX parser. It's always good the hear about how people are using
(and modifying) the library.

The performance issue around long strings is something that has come up
before (but not for a long time: see the thread "rts subroutine very slow"
in the archive of this list). As you show, the problem is the alloc and
dealloc each time a character is added to an array representing a variable
length string. This is a significant design flaw (and not just because of
the performance hit - see below).

Once upon a time (c. 2009) I had a shot at fixing this. The approach was
similar to your patch. The (out of date) result which I never merged can
be seen on the 'vstr' branch on github (see
https://github.com/andreww/fox/tree/vstr ). This gave a similar speed up
to what you have found (factor of ~10) but was an attempt to remove all of
the arrays of characters (and so caused much more code churn). I didn't
merge this because somewhere along the way I introduced some had-to-fix
bugs (segfaults from the DOM layer, if memory serves) and never got around
to debugging the problem.

Another reason I would be keen to see a change along the lines of the
patch you propose is that this would be an essential first step in
allowing FoX to handle documents encoded in something other than ASCII (or
low code-point utf-8). The trick to do this would be, I think, to store
code-points in an integer array and hide things like character comparison
away in a derived type access via a set of functions. (Note that there
would need to be a bunch of other changes, but I think these would be less
intrusive.)

Anyway, I've had a quick look at your patch and have put some comments
inline.

Thanks again,

Andrew

On Fri, April 20, 2012 10:14 am, jcwojdel wrote:
> Hello everyone,

> My name is Jacek WojdeÅ , I am actively using FoX for my research at


> the moment. I stumbled upon an annoying deficiency of the code: it
> does not like large tokens. That is, it gets really slow on them
> because of constant alloc/copy/dealloc when appending a single
> character. The issue is that I have files with large data-blobs
> (basically matrices spelled out in txt) within XML structure; see e.g.
> https://www.dropbox.com/sh/y1f9mf1f3in403h/16rHVk0WMO
> Reading these files gives following timings:
>
> 1.5M.xml - 0m6.075s
> 14M.xml - slightly over 1/2h
>
> I wrote a patch which replaces the following idiom:
> tempString => fx%token
> fx%token => vs_str_alloc(str_vs(fx%token)//c)
> deallocate(tempString)
> with a subroutine which puts new character in a table, which is
> reallocated with 1024b steps (new datatype which takes care of storing
> the actual length with the table). The resulting timings are:
>
> 1.5M.xml - 0m0.561s
> 14M.xml - 0m4.754s
>
> The patch is not quite ready for submission to git, it's just to show
> its potential (for myself - the code reads my files... I'm done,
> unless there is more interest in it).

Yes - I'm interested in merging this.

> If you're interested, I'll put some effort in the following:
> 1) The typedef (expvs) and subroutines should be placed in some
> reasonable place rather than in m_sax_types module (any suggestions?)

I would create a new file and module in fox/fsys/ - something like
fox_m_fsys_expvs.F90.

> 2) Appending should be done for arbitrary length string (parser spits
> at most len=3 changes so it was fast to do just 3 consecutive appends,
> but it's ugly)

I'm not sure what you mean here. Could you clarify?

> 3) Not all of the subroutine names look nice (same for the typedef...
> any suggestions?)

What about varstr and XXX_varstr?

> 4) Other 'vs' data could be moved to use 'expvs', which would allow
> for speeding up the following idiom:
> fx%content => fx%token
> fx%token => vs_str_alloc("")
> which could be done without a single alloc and no copying of data
> (just exchange the pointers and make one length zero)

I would recommend finishing and merging the bits you've done first. It's
well worth trying to merge small patches rather than make a large change
(that's where I think I went wrong).

Would it be possible for vs_expvs to sort out real_length? The content of
the derived type should probably be private (eventually).

> endif
> if (reading_main_file(fb)) &
> fx%inIntSubset = .true.
> @@ -565,7 +565,7 @@
> select case (fx%state)
>
> case (ST_MISC)
> - !write(*,*) 'ST_MISC', str_vs(fx%token)
> + !write(*,*) 'ST_MISC', str_expvs(fx%exptoken)
> select case (fx%tokenType)
> case (TOK_PI_TAG)
> wf_stack(1) = wf_stack(1) + 1
> @@ -587,7 +587,7 @@
> case (TOK_OPEN_COMMENT)
> nextState = ST_START_COMMENT
> case (TOK_NAME)
> - if (str_vs(fx%token)=='DOCTYPE') then
> + if (str_expvs(fx%exptoken)=='DOCTYPE') then

The alternative would be to overload the '==' operator. But not in this
patch.

Does this not end up leaking memory?

Could this be private (inside the type)?

I would make the 1024 here a module parameter, say EXPVS_INIT_SIZE. This
means we could easily play around with the memory-use / speed trade-off.
(Eventually we may want to make this an optional argument - as, e.g.
character data can be quite long but element names are typically short.)

character(len=1), intent(in) :: c would be more explicit

> + character, dimension(:), pointer :: tmp
> +
> + if (evs%real_length+1 > size(evs%data)) then
> + allocate(tmp(size(evs%data)+1024))

Can the 1024 here be another (different) parameter?

> + tmp(1:evs%real_length) = evs%data(1:evs%real_length)
> + deallocate(evs%data)
> + evs%data => tmp
> + end if
> +
> + evs%data(evs%real_length+1) = c
> + evs%real_length = evs%real_length+1
> +end subroutine append_expvs
> +
> +subroutine expvs_from_char(evs,c)
> + type(expvs), intent(inout) :: evs
> + character, intent(in) :: c

character(len=1), intent(in) :: c would be more explicit

> + evs%real_length = 1
> + evs%data(1) = c
> +end subroutine expvs_from_char
> +
> +subroutine expvs_from_vs(evs,vs)
> + type(expvs), intent(inout) :: evs
> + character, dimension(:), intent(in) :: vs
> +
> + if (size(vs) > size(evs%data)) then
> + deallocate(evs%data)
> + allocate(evs%data(size(vs)))
> + end if
> +
> + evs%real_length = size(vs)
> + evs%data(1:size(vs)) = vs
> +end subroutine expvs_from_vs
> +
> end module m_sax_types
>

> --
> You received this message because you are subscribed to the Google Groups
> "FoX-discuss" group.
> To post to this group, send email to fox-d...@googlegroups.com.
> To unsubscribe from this group, send email to
> fox-discuss...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/fox-discuss?hl=en.
>
>


--


Andrew Walker

unread,
May 3, 2012, 4:32:48 AM5/3/12
to fox-d...@googlegroups.com
Dear all,

I just wanted to note in this thread that I've now merged a modified version of Jacek's SAX speed-up patch into my master branch on github. Parsing large documents should be substantially faster in the next FoX release (or, if you are using the development tree, now). My thanks go to Jacek for working up this modification.

Let me know if you find problems with this, or any other change.

Best wishes,

Andrew
Andrew Walker <andrew...@bris.ac.uk>
http://www1.gly.bris.ac.uk/~walker/
phone: 0117 9545698

School of Earth Sciences,
University of Bristol,
Wills Memorial Building,
Queen’s Road,
Bristol, BS8 1RJ, UK







Shane Clauson

unread,
May 3, 2012, 9:56:39 AM5/3/12
to fox-d...@googlegroups.com, Andrew Walker
Jacek, Andrew

Thanks for the sax speedup mods, they are working well here.
Even though my application uses significantly smaller
files than seem typical for other fox users,
the load times are still showing up to a 50% improvement.

As an additional benefit, a crash problem with parsing lengthy comments
has also disappeared with the introduction of varstr.

Cheers,
Shane.
>>> My name is Jacek Wojde� , I am actively using FoX for my research at
>>> the moment. I stumbled upon an annoying deficiency of the code: it
>>> does not like large tokens. That is, it gets really slow on them
>>> because of constant alloc/copy/dealloc when appending a single
>>> character. The issue is that I have files with large data-blobs
>>> (basically matrices spelled out in txt) within XML structure; see e.g.
>>> https://www.dropbox.com/sh/y1f9mf1f3in403h/16rHVk0WMO
>>> Reading these files gives following timings:
>>>
... snip
Reply all
Reply to author
Forward
0 new messages