fixed, was Re: noisy DTDs and Re: [FoX] Re: ...


Shane Clauson

Jan 14, 2009, 8:36:52 PM
to fox-d...@googlegroups.com
Toby and Group.

Thank you for looking at this.
I've reverted fox_m_utils_uri.f90 and added Toby's change.
Re-running my earlier tests shows that the problem is now fixed
for relative paths. Many Thanks !

On a semi-related issue, I'm unable to get a file scheme URI to work for opening a file
with open_xml_file().
For example, supplying file:///C:/activity.xml
will not open a file in that location.
My goal is to be able to specify a file located anywhere on my (Windows) system,
but the code appears intransigent here and I'm obviously missing something!
Any help or suggestions would be greatly appreciated.

Cheers and Thanks,
Shane.

P.S. I'll leave the #ifdef DEBUG directive around the DTD dump!






2009/1/15 Toby White <toby.o....@googlemail.com>

As a quick check, can you see if this fixes the copyURI issue for you?

--- a/utils/fox_m_utils_uri.F90
+++ b/utils/fox_m_utils_uri.F90
@@ -365,6 +365,7 @@ contains
     p = checkNonOpaquePath(path, segments)
     if (.not.p) then
       p = checkOpaquePart(path)
+      if (p) allocate(segments(0))
     endif

   end function checkPath

As for the dumpCPtree thing - that's a debug call; it shouldn't have
been left in the release version, sorry about that!

Toby

On 14 Jan 2009, at 23:39, Shane Clauson wrote:

> Thank you people,
> Perhaps it's a Windows/ifort 10 issue.
> I'll attempt to produce a succinct demonstration of the issue when
> time permits.
> However, right now the hack makes the symptoms go away,
> (and arguably makes the URI code more robust) but there is a
> deadline on my project that won't go away, so the extra
> investigation will need to wait,
> apologies.
>
> Also on an unrelated issue, I found the dumping to stdout of a loaded DTD by
> parse_dtd_element was useful when first testing the DTD, but it proved
> somewhat noisy for my particular application.
> To silence it, I made its inclusion optional via a preprocessor directive.
> In case this would be useful to others, I've included the relevant snippet below
> so you can consider it for the next release. No doubt there will be an established
> idiom for coding this sort of thing in FoX, but I am not familiar enough with the
> codebase to make that assessment.
>
> Cheers,
> Shane
>
>
> m_common_element.F90
> ...
> subroutine parse_dtd_element(contents, xv, stack, element, internal)
> ...
>     if (associated(element)) then
>       element%any = any
>       element%empty = empty
>       element%mixed = mixed
>       element%model => vs_str_alloc(trim(strip_spaces(contents)))
>       element%cp => top
>       element%internal = internal
> #ifdef DEBUG
>       call dumpCPtree(top)
> #endif
>     else
>       if (associated(top)) call destroyCPtree(top)
>     endif
>     return
>
>
>
> Toby White wrote:
>>
>> On 14 Jan 2009, at 09:54, Andrew Walker wrote:
>>
>>> Your case should not have a null pointer to segments, so I suspect
>>> something is going wrong before copyURI is called. Having said that,
>>> from my quick look at the code in parseURI it's not completely clear
>>> if segments is never allowed to be null. If it can be then your fix
>>> looks useful anyway. Perhaps Toby can comment on this?
>>>
>> segments should never be null. The initialization code ultimately lives
>> in checkNonOpaquePath (which, ironically, is probably rather an opaque
>> place for it to live), where segments should be guaranteed to be
>> initialized, albeit sometimes to a zero-length array.
>>
>> On quick inspection, I think there *may* be a bug when an opaque path
>> is used in the URI, where segments will not be allocated correctly. This
>> should be fixed, but it's an edge case, and I don't think it's causing the
>> issue here.
>>
>> (from RFC 2396: 'URIs that do not make use of the slash "/" character
>> for separating hierarchical components are considered opaque by the
>> generic URI parser.')
>>
>> If parseURI ever generates a URI object with segments being null, it's
>> a bug.
>>
>> Toby
>>
>>
>>
>>
>
>
> >






--
Regards,
Shane

Andrew Walker

Jan 15, 2009, 5:16:47 AM
to fox-d...@googlegroups.com
Hi all,

I've just committed the two fixes (silenced the DTDs and allocated segments for opaque paths) and added some testing infrastructure to /utils (but there is only one test case at the moment). 

Having said that, I suspect there is still a problem with the URI stuff on your system, Shane. I'm not sure why your relative path case is being treated as opaque (it contains backslashes) and so triggers that bug. Also, as far as I can see, your file scheme case should eventually be translated as:

open(... file='/C:/activity.xml' ...)

on line 156 of m_sax_reader.f90. Would you expect this to work on windows? The little Fortran program below should help diagnose what's going on. Do you get a path other than 'path: /C:/activity.xml'?

One possibility is that backslashes are being mangled somewhere. I've had a quick look at the intel documentation and that mentions a -nbs flag, but as this is supposed to be on by default I don't see how this could be the problem.

Cheers,

Andrew



program testURI

  use fox_m_utils_uri

  type(URI), pointer :: u

  u => parseURI("file:///C:/activity.xml")
  call check

  contains
    subroutine check
      character(len=100) :: us
      if (associated(u))  then
        call dumpURI(u)
        us = expressURI(u)
        print*, us
        print*
        call destroyURI(u)
      else
        print*, "parsing failed"
      endif
    end subroutine check

end program testURI


For reference, I get the following output:
 scheme: file
 authority: 
 userinfo UNDEFINED
 host UNDEFINED
 port UNDEFINED
 path: /C:/activity.xml
     segment: /
     segment: C:/
     segment: activity.xml
 query UNDEFINED
 fragment UNDEFINED
 file:///C:/activity.xml 

Shane Clauson

Jan 16, 2009, 7:42:06 AM
to fox-d...@googlegroups.com
Hello all,
Thanks for making the updates Andrew, and for looking at the windoze pathing issue.

I checked the compiler switches and the -nbs flag wasn't explicitly set.
Turning it on explicitly seemed to have no effect, as you'd expect.
One probable red herring is that I am compiling with default integer and real kind set to 8,
not the defaults of 4, but I can't see how that should be an issue.

I've run your test program, and it's helped shed a little light on the situation.
The output I get is


 scheme: file
 authority:
 userinfo UNDEFINED
 host UNDEFINED
 port:    8971170457722028032

 path: /C:/activity.xml
     segment: /
     segment: C:/
     segment: activity.xml
 query UNDEFINED
 fragment UNDEFINED
 ↨           ctivity.xml

The expressURI  line above seems to indicate the leading characters in the string
are being corrupted or interpreted as metachars.

I've extended the test program to do a simple fortran open(file=...) call and dump the file. (see below)
The test for open(file=...) works fine when supplying 'c:/activity.xml', and also with "/activity.xml"
and also for a different drive "f:/activity.xml".

However, calling with '/C:/activity.xml' will result in the following error
"forrtl: severe (43): file name specification error, unit 10, file C:\c:\activity.xml"

It would appear from the message above that the OS is switching the "/" for "\"
and then catching the leading "\" and prepending "C:" automagically.

The next step will be to delve into the URI code to work out what to do with this
'"/C:/activity.xml" becomes "C:\c:\activity.xml" in the hands of open() on windows'
behaviour, and I will start on this. However, meanwhile people who are more familiar
with the codebase may be able to work out the answer more quickly !

Cheers,
Shane.


------------------------- slightly modified testURI ---------------
program testURI

  use fox_m_utils_uri

  type(URI), pointer :: u

  u => parseURI("file:///C:/activity.xml")
  call check
  pause 'press <Enter> to continue'
  call dumpFile
  pause 'press <Enter> to continue'
  stop 0
 
  contains
    subroutine check
      character(len=64) :: us

      if (associated(u))  then
        call dumpURI(u)
        us = expressURI(u)
        print*, us
        print*
        call destroyURI(u)
      else
        print*, "parsing failed"
      endif
    end subroutine check

    subroutine dumpFile
      character*80 ::aLine
      open (unit=10, file='c:/activity.xml')
20      read (10,'(A80)',end=30) aLine
        print*, trim(aLine)
        goto 20
30    close (10)
    end subroutine dumpFile

end program testURI

Andrew Walker

Jan 17, 2009, 7:41:04 AM
to fox-d...@googlegroups.com
Hi,

It looks to me that there are several things going on here. First, there appears to be a (probably always harmless) bug when the URI does not include a port specification. As far as I can see, the patch below should fix this.

Second, it does look like the print / expressURI combination is producing corrupted output. I don't think this can be related to the port issue, as expressURI doesn't actually need to use the port number (it stays encoded in the authority string). It may be instructive to see what:

print*, "file:///C:/activity.xml" 

outputs, and what happens when you put the string in a variable, or have it returned from a function, and write that out. In any case, I don't think this is the root of your problem either, as the sax parser uses the path and scheme and these don't seem to be corrupted.
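
A minimal sketch of those three checks, independent of FoX: the same string is written as a literal, from a fixed-length variable, and from a function result. The program and function names (print_checks, get_uri) are made up for illustration. If all three lines print cleanly, the corruption is more likely to come from the expressURI result than from print itself.

```fortran
program print_checks
  implicit none
  character(len=100) :: us

  ! 1. The literal, written directly.
  print *, "file:///C:/activity.xml"

  ! 2. The same string held in a fixed-length variable.
  us = "file:///C:/activity.xml"
  print *, trim(us)

  ! 3. The same string returned from a function.
  us = get_uri()
  print *, trim(us)

contains

  ! Hypothetical stand-in for a string-valued function such as expressURI.
  function get_uri() result(s)
    character(len=23) :: s
    s = "file:///C:/activity.xml"
  end function get_uri

end program print_checks
```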

The third problem seems to be the (to my non-windows aware eyes) odd way that c:\ gets prepended if you call open(file='C:/'... but not open(file='c:/'... Are drive letters even supposed to be case sensitive under windows? The easy solution may be to special case filenames beginning with 'C:/' or 'file:///C:/' either in your application before starting up the sax parser or in the sax parser before the open call. If you do this in the parser I would put the check in the open_actual_file subroutine in sax/m_sax_reader.F90. 
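
A minimal sketch of such a special case, assuming the failing form is the one Shane reported ('/C:/activity.xml' fails in open(), while 'c:/activity.xml' works): drop the leading "/" when the path begins with "/<letter>:". The helper name windows_path is hypothetical and is not part of FoX; where it would be called from (for instance before the open in open_actual_file) is left open.

```fortran
program strip_drive_slash
  implicit none

  ! The path produced from file:///C:/activity.xml in this thread.
  print *, trim(windows_path('/C:/activity.xml'))
  ! A path without a drive specifier is passed through unchanged.
  print *, trim(windows_path('/folderAtRoot/activity.xml'))

contains

  ! Hypothetical helper (not part of FoX): if the path starts with
  ! "/<letter>:", remove the leading slash so that open() sees "C:/...".
  ! The result keeps the input length and is blank-padded on the right.
  function windows_path(path) result(fixed)
    character(len=*), intent(in) :: path
    character(len=len(path)) :: fixed
    character(len=*), parameter :: letters = &
      'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

    fixed = path
    ! Nested test: Fortran does not guarantee short-circuit evaluation,
    ! so the length is checked before any substring is referenced.
    if (len(path) >= 3) then
      if (path(1:1) == '/' .and. index(letters, path(2:2)) > 0 &
          .and. path(3:3) == ':') then
        fixed = path(2:)
      end if
    end if
  end function windows_path

end program strip_drive_slash
```

This only makes sense on Windows; on other systems a directory really could be named "C:", so any real change would presumably need a platform guard.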

I hope this is useful,

Cheers,

Andrew



From 2c12e1677998616d35cd2312dcb8a2137f492709 Mon Sep 17 00:00:00 2001
From: Andrew Walker <and...@diopside.local>
Date: Sat, 17 Jan 2009 12:04:35 +0000
Subject: [PATCH] Make sure port is always defined by parseURI

If parseURI does not find a port number it overwrote
the default value (-1) with an undefined local variable
so define the local port number to -1 before parsing.
---
 utils/fox_m_utils_uri.F90 |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/utils/fox_m_utils_uri.F90 b/utils/fox_m_utils_uri.F90
index 283e259..e90cb84 100644
--- a/utils/fox_m_utils_uri.F90
+++ b/utils/fox_m_utils_uri.F90
@@ -403,6 +403,7 @@ contains
     authority => null()
     userinfo => null()
     host => null()
+    port = -1
     path => null()
     segments => null()
     query => null()
-- 

Shane Clauson

Jan 22, 2009, 5:09:36 AM
to fox-d...@googlegroups.com
Hello Andrew and All,
Thanks for the recommendations, and our follow-ups off-list Andrew.
I've tested the port mod (below), and it fixes the spurious display issue.

I've also had a chance to look at the Windows absolute path issue, and have one partial fix
that should be applied to the distribution. It would appear that for ifort10/windows,
blanks at the unused end (if any) of the URIstring supplied to parseURI will cause paths like
"/folderAtRoot/activity.xml" to fail.

A simple fix for this sort of path specifier on Windows is to trim the string supplied
to parseURI, as below.

-------------------------------------------------------------------------------
--- FoX/utils/fox_m_utils_uri.F90
***************
*** 385,393 ****
end function checkFragment
#endif

! function parseURI(URIstring) result(u)
! character(len=*), intent(in) :: URIstring
type(URI), pointer :: u
#ifndef DUMMYLIB
character, pointer, dimension(:) :: scheme, authority, &
userinfo, host, path, query, fragment
--- 385,394 ----
end function checkFragment
#endif

! function parseURI(inURIstring) result(u)
! character(len=*), intent(in) :: inURIstring
type(URI), pointer :: u
+ character(len=len_trim(inURIstring)) :: URIstring
#ifndef DUMMYLIB
character, pointer, dimension(:) :: scheme, authority, &
userinfo, host, path, query, fragment
***************
*** 398,403 ****
--- 399,405 ----

#endif
u => null()
+ URIstring = trim(inURIString)
#ifndef DUMMYLIB

scheme => null()
-------------------------------------------------------------------------------

This fix also has the benefit of being a little more efficient with the storage used by the
"path" component of the URI.

I'll try next to see what can be done about resolving the issue of Windows paths containing
drive specifiers like "C:".

Regards,
Shane.


Andrew Walker

Jan 22, 2009, 10:20:49 AM
to fox-d...@googlegroups.com
Hi Shane,

Thanks for confirming that the port number fix works.

Your patch opens a rather unpleasant can of worms. Briefly, we make
use of the same URI parsing code to parse XML namespace names (which
should be URIs) and raise an error if the parse fails. I think FoX is
doing the right thing here in rejecting a document with, for example,
the following namespace declaration (note the space):

<employee xmlns="http://www.nist.gov ">

(But there has been some debate about how an XML processor should
deal with such names. According to Namespaces in XML 1.0 namespace
names are URIs, they are compared as opaque strings and "a processor
MUST report violations of namespace well-formedness, with the
exception that it is not REQUIRED to check that namespace names are
legal URIs". Apparently there are checks in the XML test suite to
check that errors are reported for illegal URIs.) So, I think we
actually want parseURI to fail in the presence of trailing whitespace.

Having said that, it would be nice to avoid crashing and be lenient
with options passed into open_xml_file. I think the patch below will
have the desired effect and keep the worms firmly locked in their
can. (For file names with embedded or trailing whitespace, it'll be
necessary to URI encode the space as %20, but this was always the case.)
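
As a rough illustration of preparing such a name on the caller's side before it reaches open_xml_file, the sketch below trims the trailing padding and replaces each remaining space with %20. The helper encode_uri_spaces is hypothetical, not a FoX routine, and it handles only spaces; it also relies on Fortran 2003 deferred-length character variables, which older compilers may not support.

```fortran
program encode_spaces
  implicit none
  character(len=40) :: name

  ! A fixed-length variable, so the value is padded with trailing blanks.
  name = 'my data/activity file.xml'

  ! Brackets make any leftover padding visible in the output.
  print *, '[' // encode_uri_spaces(name) // ']'

contains

  ! Hypothetical helper (not part of FoX): ignore trailing blanks, then
  ! replace every remaining space with "%20". Other characters that a URI
  ! would require to be escaped are left untouched.
  function encode_uri_spaces(raw) result(enc)
    character(len=*), intent(in) :: raw
    character(len=:), allocatable :: enc
    integer :: i

    enc = ''
    do i = 1, len_trim(raw)
      if (raw(i:i) == ' ') then
        enc = enc // '%20'
      else
        enc = enc // raw(i:i)
      end if
    end do
  end function encode_uri_spaces

end program encode_spaces
```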

I also suspect that there is a need to check the error stack in
open_xml_file before sax_parser_init gets called, but that's another
issue.

Cheers,

Andrew




From 3c9310090127ab41bcfa700fe3b03fd4e461ace1 Mon Sep 17 00:00:00 2001
From: Andrew Walker <a.wa...@ucl.ac.uk>
Date: Thu, 22 Jan 2009 14:20:40 +0000
Subject: [PATCH] Trim paths passed into open_xml_file

Paths with trailing whitespace can cause problems within
parseURI. Early removal of this whitespace avoids this.
---
sax/m_sax_operate.F90 | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/sax/m_sax_operate.F90 b/sax/m_sax_operate.F90
index 7813c62..c428302 100644
--- a/sax/m_sax_operate.F90
+++ b/sax/m_sax_operate.F90
@@ -31,7 +31,7 @@ contains
#else
integer :: i

- call open_file(xt%fb, file=file, iostat=i, lun=lun, es=xt%fx%error_stack)
+ call open_file(xt%fb, file=trim(file), iostat=i, lun=lun, es=xt%fx%error_stack)
if (present(iostat)) then
iostat = i
if (i/=0) return
--
1.5.6.5