
XML Schema?


Harry George

Feb 13, 2001, 10:21:41 PM
Anyone have a python XML Schema parser/validator? I thought I saw
comments that it wasn't being done yet as part of xml-sig. Of course,
we don't actually need an XML Schema validator in python (java or C++
renditions would do fine), but there is a social cachet to it, so
maybe worth the effort.

Assuming it is an open task, here is an approach. Anyone see holes in
this, besides it being a humongous task?

1. Get the specs from OASIS-->W3C.

2. Get test cases (for schemas and for instances). There are a few
cases at xml-conf, but I think a lot more will be needed. So I'll
need to generate them, and that suggests a case generator, plus of
course a test driver. I have the testcase generator and driver
done.

3. XML Schema is basically a regular expression problem, with nodes as
the "characters". So we can use classical lexer algorithms:
regexpr --> NFA --> DFA. The hassles may be at the leaf nodes,
where XML Schema has lots of special cases. I don't know if there
are non-re constraints in the specs, but if so I'd apply them after
the initial pass.

4. Given that state machine, run schemas through the parser until it can
build machines from valid schemas and detect invalid ones.

5. Given a sound state machine, run instance test cases through the
package until it is passing valid instances and detecting invalid
ones.

6. This would probably be an iterative enhancement exercise, once the
state machine engine was in place.
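
The regular-expression view in step 3 can be sketched in a few lines. This is only an illustration, not the proposed implementation: the `<order>` content model, the `ORDER_MODEL` pattern and the `children_match` helper are hypothetical names, and Python's `re` module is a backtracking matcher standing in for the regexpr --> NFA --> DFA pipeline described above.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model for <order>:
# exactly one <customer>, then one or more <item>, then an optional <note>.
# Each child name is followed by a space so names cannot run together.
ORDER_MODEL = re.compile(r"(customer )(item )+(note )?")


def children_match(element, model):
    """Return True if the sequence of child tag names matches the model."""
    names = "".join(child.tag + " " for child in element)
    return model.fullmatch(names) is not None


good = ET.fromstring("<order><customer/><item/><item/><note/></order>")
bad = ET.fromstring("<order><item/><customer/></order>")

print(children_match(good, ORDER_MODEL))
print(children_match(bad, ORDER_MODEL))
```

Here each child element name acts as one "character" of the input. Data types and other leaf-level rules are not covered by this check.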

I have a lex-workalike I wrote in Modula-2, which I'll use as the
start point. Probably could use a SAX input approach ("next node"
instead of "next char"), maybe with 1 lookahead.
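
A minimal sketch of the "next node instead of next char" idea, assuming a hand-written transition table rather than one generated from a schema. The `TRANSITIONS` table, the state names and the `OrderChecker` class are all hypothetical; only the `xml.sax` calls come from the Python standard library.

```python
import xml.sax

# Hypothetical DFA for the children of <order>: customer, item+, note?
TRANSITIONS = {
    ("start", "customer"): "need_item",
    ("need_item", "item"): "items",
    ("items", "item"): "items",
    ("items", "note"): "done",
}
ACCEPTING = {"items", "done"}


class OrderChecker(xml.sax.ContentHandler):
    """Feed each direct child of the root element to the DFA."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.state = "start"
        self.valid = False

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth == 2:
            # An element name plays the role of the next input character.
            # Unknown transitions fall into a sticky "error" state.
            self.state = TRANSITIONS.get((self.state, name), "error")

    def endElement(self, name):
        self.depth -= 1
        if self.depth == 0:
            # Root element closed: accept only if the DFA is in a final state.
            self.valid = self.state in ACCEPTING


def is_valid_order(xml_text):
    handler = OrderChecker()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.valid
```

Because the handler sees one event at a time, no lookahead is needed for this particular model; a model that is not deterministic on the next element name would need more.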

--
Harry George
hgg...@seanet.com

Uche Ogbuji

Feb 14, 2001, 9:00:40 AM
Harry George wrote:
>
> Anyone have a python XML Schema parser/validator? I thought I saw
> comments that it wasn't being done yet as part of xml-sig. Of course,
> we don't actually need an XML Schema validator in python (java or C++
> renditions would do fine), but there is a social cachet to it, so
> maybe worth the effort.

I'm not personally a fan of XML Schemas, but I think this would be a
very worth-while project. You'd probably get plenty of help as well.

> Assuming it is an open task, here is an approach. Anyone see holes in
> this, besides it being a humongous task?
>
> 1. Get the specs from OASIS-->W3C.
>
> 2. Get test cases (for schemas and for instances). There are a few
> cases at xml-conf, but I think a lot more will be needed. So I'll
> need to generate them, and that suggests a case generator, plus of
> course a test driver. I have the testcase generator and driver
> done.
>
> 3. XML Schema is basically a regular expression problem, with nodes as
> the "characters".

Hmm. I wouldn't go this far. The most basic parts of the content model
are so, but the entire data-type system and parts of the content model
need a different approach than regular grammar.
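
To illustrate why leaf values need their own mechanism, here is a minimal sketch assuming two heavily simplified checks. The `is_integer`, `is_date`, `CHECKERS` and `check_leaf` names are hypothetical, and the rules are far narrower than the real XML Schema data types (no time zones, no facets, no derived types).

```python
import re
from datetime import date


def is_integer(text):
    """Simplified integer check: optional sign followed by digits."""
    return re.fullmatch(r"[+-]?[0-9]+", text.strip()) is not None


def is_date(text):
    """Simplified date check: YYYY-MM-DD that names a real calendar day."""
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", text.strip())
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


# Hypothetical lookup from a type name to its checker function.
CHECKERS = {"integer": is_integer, "date": is_date}


def check_leaf(type_name, text):
    """Validate the text of a leaf node against a named simple type."""
    return CHECKERS[type_name](text)
```

The date check shows the point: a pattern accepts the shape `2001-02-30`, but rejecting it takes calendar arithmetic rather than a state machine over element names.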

> So we can use classical lexer algorithms:
> regexpr --> NFA --> DFA. The hassles may be at the leaf nodes,
> where XML Schema has lots of special cases. I don't know if there
> are non-re constraints in the specs, but if so I'd apply them after
> the initial pass.

Interesting approach.

> 4. Given that state machine, run schemas through the parser until it can
> build machines from valid schemas and detect invalid ones.
>
> 5. Given a sound state machine, run instance test cases through the
> package until it is passing valid instances and detecting invalid
> ones.
>
> 6. This would probably be an iterative enhancement exercise, once the
> state machine engine was in place.
>
> I have a lex-workalike I wrote in Modula-2, which I'll use as the
> start point. Probably could use a SAX input approach ("next node"
> instead of "next char"), maybe with 1 lookahead.

Just to note: LT-XML supposedly has a Python interface and an XSchemas
validator. I still think your effort would be worth-while, especially
given your fresh approach.

http://www.ltg.ed.ac.uk/software/xml/


--
Uche Ogbuji
Personal: uc...@ogbuji.net http://uche.ogbuji.net
Work: uche....@fourthought.com http://Fourthought.com

Harry George

Feb 14, 2001, 12:58:53 PM
Thanks for the pointer. I didn't find LTXML in my initial literature
search. Given that it exists, I don't see much reason to continue on
my effort. Maybe the xml-conf site could use the testcase generator.

Another possibility I considered was to do a python binding to the
apache Xerces "C++". Do you know if anyone has done that? That would
hook into IBM's significant C++/Java XML-oriented releases.

I'm not a fan of Schema either, but it sure is being hyped to the
local decision makers -- so I need a python treatment. The whole XML
world has migrated from "It is deliberately simple so all languages can
play" to "Let's complexify it so those pesky GPL guys can't keep up."

Uche Ogbuji <uc...@ogbuji.net> writes:

--
Harry George E-mail: harry.g...@boeing.com
The Boeing Company Renton: (425) 237-6915
P. O. Box 3707 02-CA Everett: (425) 266-3868
Seattle, WA 98124-2207 Page: (425) 631-8803

Romuald Texier

Feb 15, 2001, 6:24:38 AM
Did you take a look at http://4suite.org/ ?

Regards.

Romuald Texier.

Harry George wrote:

--
Romuald Texier

Adam Logghe

Feb 14, 2001, 2:00:31 AM
I haven't seen a Python schema validator yet.

I would think that if anyone would know it would be the XML-SIG mailing list
and the 4suite.org people.

4suite has the most complete standard XML toolset in Python I've seen yet,
I've been VERY pleased.

I suspect that Schema is not far enough along yet for serious
implementation??

Adam
adam in the domain devtty.net


"Harry George" <hgg...@seanet.com> wrote in message
news:m3ofw62...@wilma.localdomain...

Martin von Loewis

Feb 19, 2001, 1:52:02 PM
Harry George <hgg...@cola.ca.boeing.com> writes:

> Another possibility I considered was to do a python binding to the
> apache Xerces "C++". Do you know if anyone has done that?

People have considered that, but nobody has done it, AFAICT.
Note that you could choose to expose either SAX or DOM or both.
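
For readers unfamiliar with the two API styles, here is a small comparison using the parsers in Python's standard library, not a Xerces binding. The `DOC` sample and the `ItemCounter`, `count_items_sax` and `count_items_dom` names are hypothetical.

```python
import xml.sax
from xml.dom import minidom

DOC = b"<order><item/><item/></order>"


class ItemCounter(xml.sax.ContentHandler):
    """SAX style: the parser pushes events, nothing is kept in memory."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1


def count_items_sax(data):
    handler = ItemCounter()
    xml.sax.parseString(data, handler)
    return handler.count


def count_items_dom(data):
    """DOM style: build the whole tree first, then query it."""
    document = minidom.parseString(data)
    return len(document.getElementsByTagName("item"))
```

A binding that exposes only SAX is smaller to write, while one that exposes DOM has to map the library's whole tree API into Python objects.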

> That would hook into IBM's significant C++/Java XML-oriented
> releases.

I guess Xerces C++ would not hook very much into Java releases...

It's also open for debate what you'd gain from using that XML parser,
compared to, say, Expat.

Regards,
Martin

Paul Prescod

Feb 19, 2001, 3:28:28 PM
to pytho...@python.org
Harry George wrote:
>
> ...

> Another possibility I considered was to do a python binding to the
> apache Xerces "C++". Do you know if anyone has done that? That would
> hook into IBM's significant C++/Java XML-oriented releases.

Xerces C++ does not have a schema engine yet.

> I'm not a fan of Schema either, but it sure is being hyped to the
> local decision makers -- so I need a python treatment. The whole XML
> world has migrated from "It is deliberately simple so all languages can
> play" to "Let's complexify it so those pesky GPL guys can't keep up."

That's one cynical reading of the situation. I think it is more a case
of getting more competing interests into a room than we had in the early
days. You need a much more disciplined moderator/editor in that
situation than you do when nobody cares.

--
Vote for Your Favorite Python & Perl Programming
Accomplishments in the first Active Awards!
http://www.ActiveState.com/Awards

Harry George

Feb 19, 2001, 3:28:41 PM

Martin von Loewis <loe...@informatik.hu-berlin.de> writes:

> Harry George <hgg...@cola.ca.boeing.com> writes:

> > That would hook into IBM's significant C++/Java XML-oriented
> > releases.
>
> I guess Xerces C++ would not hook very much into Java releases...
>

What I meant was that IBM is pouring money into XML, using C++ and
Java as the proof-of-concept languages. It would be easy for python
to fall off the leading edge if they got too far out there. Binding
directly to the C++ library keeps that connection.

Not that I approve of all this churning of standards. XML was meant
to be simple -- and it was until business people figured out they
could get lockin with complex standards.



> It's also open for debate what you'd gain from using that XML parser,
> compared to, say, Expat.
>

Expat has XML Schema and UDDI?

> Regards,
> Martin

Uche Ogbuji

Feb 19, 2001, 7:53:54 PM
Adam Logghe wrote:
>
> I haven't seen a Python schema validator yet.
>
> I would think that if anyone would know it would be the XML-SIG mailing list
> and the 4suite.org people.
>
> 4suite has the most complete standard XML toolset in Python I've seen yet,
> I've been VERY pleased.
>
> I suspect that Schema is not far enough along yet for serious
> implementation??

We've just been talking about this on XML-SIG.

The main problem seems to be that it's pretty hard to find someone who
likes XML Schemas enough to begin implementing it. I personally dislike
the spec. I prefer Schematron, TREX, RELAX, HOOK, etc.

However, given the number of other specs beginning to depend on XSchema,
I don't suppose we'll have much choice. Probably post 4Suite 1.0.

For now Henry Thompson's XSV package is the only option for Pythoneers
looking for XSchemas. (No wonder he's an XSchema fan: he's on the
working group).

Thanks for the kind words about 4Suite, though. Don't miss the 0.10.2
release we just put out. *Many* bug-fixes.

Martin von Loewis

Feb 20, 2001, 8:02:21 AM
Harry George <hgg...@cola.ca.boeing.com> writes:

> > It's also open for debate what you'd gain from using that XML parser,
> > compared to, say, Expat.
> >
>
> Expat has XML Schema and UDDI?

As Paul points out: Xerces C++ does not have schema support,
either. Not sure how UDDI fits into the parsing business, though.

Regards,
Martin

Harry George

Feb 20, 2001, 9:47:23 AM
My concern was not about parsers per se, but a general body of work
rapidly developing and moving to either legal or de facto standard
status. If that work is focused on one or a few languages, then other
languages have some difficulty staying in the game.

So I'm looking for a low cost way to keep up. One way is to bind to
libraries generated by others -- that's easier done against C/C++
libraries. Another is to do idiomatic code conversion -- that's
probably easier done from Java to Python.

Martin von Loewis <loe...@informatik.hu-berlin.de> writes:

--

Alex Martelli

Feb 20, 2001, 11:38:30 AM
"Harry George" <hgg...@cola.ca.boeing.com> wrote in message
news:xqxg0h9...@cola.ca.boeing.com...

> My concern was not about parsers per se, but a general body of work
> rapidly developing and moving to either legal or de facto standard
> status. If that work is focused on one or a few languages, then other
> languages have some difficulty staying in the game.

...unless the "other languages" are particularly apt at "fitting into
diverse ecological niches" -- Python does that well:-).

> So I'm looking for a low cost way to keep up. One way is to bind to
> libraries generated by others -- that's easier done against C/C++
> libraries.

Easier than what...? Definitely not easier than Java/Python
integration (with Jython), so I must be misreading you...?

> Another is to do idiomatic code conversion -- that's
> probably easier done from Java to Python.

Among the easiest things in the world (definitely easier than
stealing candy from a baby, even not considering the moral
implications of this latter act) is using Java classes from
the Jython version of Python. If, as I originally read your
message, such use is what you desire, then I don't understand
your concern.

If, OTOH, XML use from Python is actually your main thrust,
I'd stay with 4Suite. But that's a personal choice -- there's
just SO much current/modern/leading-edge stuff out there for
Java, that Jython may be the best choice for cross-platform
work, just as Python + win32com probably is for Windows-only
work for similar reasons.

Anyway, my key point is that the ease of "extending and
embedding" Jython with Java is *astounding* -- a completely
different order of magnitude from 'extending and embedding'
CPython with C or C++. You don't have to write ONE LINE of
Java code in the knowledge that it will ever be used from
Python -- you can do all that's needed on the Python side
of things. .NET may give us all that and more besides one
(perhaps not-too-far-off) day, but Java and Jython give a
LOT today, within the JVM's limitations (speed, possible
security issues, etc) which you'd have from Java itself as
well to some extent.


Alex


Martin von Loewis

Feb 20, 2001, 1:08:48 PM
Harry George <hgg...@cola.ca.boeing.com> writes:

> So I'm looking for a low cost way to keep up. One way is to bind to
> libraries generated by others -- that's easier done against C/C++
> libraries. Another is to do idiomatic code conversion -- that's
> probably easier done from Java to Python.

Integrating C libraries always was one of the major strengths of
Python. I'm sure that somebody will integrate any library which would
be useful to enough people - I just doubt that Xerces is such a
library.
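
As a present-day illustration of binding to a C library from Python, here is a minimal sketch using `ctypes`, which joined the standard library well after this thread. It assumes a POSIX system where the C runtime can be loaded; the `c_strlen` wrapper is a hypothetical name. Wrapping a C++ library such as Xerces would need considerably more than this.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library("c") locates the C runtime;
# if it returns None, CDLL(None) still exposes symbols of the running process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data):
    """Call the C library's strlen on a bytes object."""
    return libc.strlen(data)


print(c_strlen(b"hello"))
```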

Python is particularly good at integrating things that come from
different origins. So if it turns out that IBM UDDI support is great,
then somebody will wrap it. You could still ignore the SAX part if you
think it sucks; I'd personally favour the small Expat library over a
heavy C++ parser any time.

Regards,
Martin

Harry George

Feb 20, 2001, 7:15:38 PM

"Alex Martelli" <ale...@yahoo.com> writes:

> "Harry George" <hgg...@cola.ca.boeing.com> wrote in message
> news:xqxg0h9...@cola.ca.boeing.com...
> > My concern was not about parsers per se, but a general body of work
> > rapidly developing and moving to either legal or de facto standard
> > status. If that work is focused on one or a few languages, then other
> > languages have some difficulty staying in the game.
>
> ...unless the "other languages" are particularly apt at "fitting into
> diverse ecological niches" -- Python does that well:-).

Yes, python is extraordinary in that niche. But we wouldn't be having
this discussion if it were on top of all the modules that Java is
covering.

>
> > So I'm looking for a low cost way to keep up. One way is to bind to
> > libraries generated by others -- that's easier done against C/C++
> > libraries.
>
> Easier than what...? Definitely not easier than Java/Python
> integration (with Jython), so I must be misreading you...?
>

Jython is great -- if you want to use Java. Personally, I don't want
to lock in to that world. So I was looking for ways to escape the
java mindset, not support it.

> > Another is to do idiomatic code conversion -- that's
> > probably easier done from Java to Python.
>
> Among the easiest things in the world (definitely easier than
> stealing candy from a baby, even not considering the moral
> implications of this latter act) is using Java classes from
> the Jython version of Python. If, as I originally read your
> message, such use is what you desire, then I don't understand
> your concern.

Again, I don't want to use or directly link to java. I want to link
to other libraries.

>
> If, OTOH, XML use from Python is actually your main thrust,
> I'd stay with 4Suite. But that's a personal choice -- there's
> just SO much current/modern/leading-edge stuff out there for
> Java, that Jython may be the best choice for cross-platform
> work, just as Python + win32com probably is for Windows-only
> work for similar reasons.
>

I already use 4Suite. As noted elsewhere, it doesn't do XSchema,
which is why I started this thread.


> Anyway, my key point is that the ease of "extending and
> embedding" Jython with Java is *astounding* -- a completely
> different order of magnitude from 'extending and embedding'
> CPython with C or C++. You don't have to write ONE LINE of
> Java code in the knowledge that it will ever be used from
> Python -- you can do all that's needed on the Python side
> of things. .NET may give us all that and more besides one
> (perhaps not-too-far-off) day, but Java and Jython give a
> LOT today, within the JVM's limitations (speed, possible
> security issues, etc) which you'd have from Java itself as
> well to some extent.
>

I may have to eventually use .NET at work, but it will be a cold
day in Tuxland before I use it at home...

>
> Alex
>
>
>
>

--
Harry George
hgg...@seanet.com

Alex Martelli

Feb 21, 2001, 4:18:36 AM
"Harry George" <hgg...@seanet.com> wrote in message
news:m34rxoa...@wilma.localdomain...
[snip]

> > > status. If that work is focused on one or a few languages, then other
> > > languages have some difficulty staying in the game.
> >
> > ...unless the "other languages" are particularly apt at "fitting into
> > diverse ecological niches" -- Python does that well:-).
>
> Yes, python is extraordinary in that niche. But we wouldn't be having
> this discussion if it were on top of all the modules that Java is
> covering.

But it *IS* -- just use Jython, hey presto, you're there.


> > > So I'm looking for a low cost way to keep up. One way is to bind to
> > > libraries generated by others -- that's easier done against C/C++
> > > libraries.
> >
> > Easier than what...? Definitely not easier than Java/Python
> > integration (with Jython), so I must be misreading you...?
>
> Jython is great -- if you want to use Java.

This is like saying that "CPython is great -- if you want to use C".

A total non-sequitur! If you want to use C, use C; if you want to
use Java, use Java; Python is great if you want to AVOID using C, or
Java, as much as feasible, while still extracting benefits from
existing libraries that are written in/for those languages.

The correct statements may thus be more like:
J) Jython is great if you want to use libraries written in/for
Java (while eschewing use of Java itself as much as possible)
just like:
C) CPython is great if you want to use libraries written in/for
C/C++ (while eschewing use of C or C++ as much as possible)

The fact is that, while both statements (C) and (J) are true, (J)
is *far TRUER* -- reusing ANY library written in/for Java (while
eschewing use of Java itself) is a _snap_ with Jython, while the
analogous situation in case (C) often requires *SOME* amount of
C-level work to "wrap" the library specifically for Python use.

> Personally, I don't want
> to lock in to that world. So I was looking for ways to escape the
> java mindset, not support it.

The Java mindset seems to me to be "one language to bind them
all", etc. Using a language that is definitely NOT Java thus
escapes "the java mindset" (as I see it) quite effectively,
and Jython is that "language that is definitely NOT Java".

Jython runs only within the *JVM*, of course -- *THAT* is the
unavoidable pre-req for reusing libraries that also run only
within the JVM. If/when a fully-JVM-specs-compatible JIT or
AOT (ahead-of-time) compiler for JVM-bytecode emerges, then
the libraries in question will be freed from running "within"
the JVM, and so, of course, will Jython be (without any special
effort on the part of anybody EXCEPT the authors of the
hypothetical JVM-bytecode-to-something-else compilers:-).

> Again, I don't want to use or directly link to java. I want to link
> to other libraries.

You can wrap C-coded or C++-coded libraries with lots of
existing tools such as SWIG, although much manual work
is typically needed. But that buys you squat, when
"that work is focused on one .. language", given that said
language, in practice, *IS* Java.

With Jython, you will NOT "use or directly link to" *Java* --
you WILL use the bytecodes of the JVM, rather than those
of the dedicated Python VM that's bundled with CPython,
and that's all.

> I already use 4Suite. As noted elsewhere, it doesn't do XSchema,
> which is why I started this thread.

But neither does the C++ version of Xerces, etc, so, why
DO you at all care about 'linking' to it?!

The (perhaps-sad) current reality is that much research and
development effort regarding bleeding-edge tools is going to
build libraries that will only run within the JVM; the silver
lining in this situation is that Jython rescues you from
having to use Java as the entry-ticket to that world of
rich, bleeding-edge libraries (you do have to be able to
*read* "just enough Java to get by", just like, to use COM
Automation effectively, you have to be able to read just
enough VB to get by, and for a similar reason -- the _docs
and examples_ you will find around will be totally slanted
to Java/VB respectively; fortunately, Java and VB, as used
in docs and examples, tend to be somewhat readable -- it's
NOT as bad as if the examples and docs were slanted to C++
or Perl!-).

If you don't NEED to use certain available-for-JVM-only
bleeding-edge libraries, then CPython appears to me to be
as well-placed as any other language to use other, non-JVM
libraries. 4Suite appears to offer, roughly, about as
much XML functionality, or more, than any other XML suite
of tools that is not JVM-connected. Easy alternatives
include using COM, which is about as effortless with CPython
(and win32all) as using for-JVM tools with Jython; if a
for-JVM-only tool IS compatible with Microsoft's JVM, you
can probably take advantage of the latter's ability to
expose any Java class as a COM/Automation object to use
it that way (I haven't tried that, and it does not seem
very strategic, given that MS's JVM is not state-of-the-
art any more, and is slowly or not-so-slowly dying).

Near-future possibilities will include XPCOM (a COM work-
alike that was developed within Mozilla and is now well
usable from CPython, thanks to ActiveState's efforts; it
remains to be seen how many bleeding-edge components &c
will in fact be available for it!) and .NET, of which
you say...:

> I may have to eventually use .NET at work, but it will be a cold
> day in Tuxland before I use it at home...

...even if/when it becomes available for, say, Linux as
well as Windows platform, and bleeding-edge tools you
need are only available for it? Why?

Judging from this assertion, and your total reluctance
to use the JVM (which you appear to be expressing as
a reluctance to use *Java*, which would in fact be well
justified, AND a completely different thing), you would
appear to be allergic to bytecode-based execution
environments. In which case, it's hard to understand
why you use Python, since, in any of its forms (CPython
with the special-purpose VM, Jython with JVM, Python.NET
with MSIL...), it *DOES* rely on just such bytecode ideas!-)


Alex
