Validating XML against schema (XSD) possible with TCL?

ben.c...@gmail.com

8 Sep 2009, 8:05:50
I have been reading every bit of documentation I can find and am still
not clear whether this is doable with TclXML or tDOM.
I only need to read an XML file and validate it (and use the data, of
course), so a SAX-style implementation is enough.
Can anyone help?

tom.rmadilo

8 Sep 2009, 11:23:16

If the schema was hand-generated and not spit out by a Java or .NET
service, it could be easy to reconstruct the schema in tWSDL. If that
is possible, all the data will be available, as well as a series of
procedures to consume the XML document. (You don't need to use a WSDL
service to use this feature.) Validation is based on XSD types and
structures. Post or email me the schema and I can quickly rule in or out
whether tWSDL can help.

Gerald W. Lester

8 Sep 2009, 12:10:17

Do you really want/need to validate it - or just to process it?

The Web Services for Tcl package allows you to take an XSD and an XML
message and transform it into a dictionary that you can process.

It does not validate that the XML meets the XSD -- it just pulls out the
parts that match the elements and makes a dictionary, i.e. no range checks
and no error if additional elements are present or required elements are
missing.

Are you actually attempting to make a Web Service call with a WSDL, or what?

--
+------------------------------------------------------------------------+
| Gerald W. Lester |
|"The man who fights for his ideals is the man who is alive." - Cervantes|
+------------------------------------------------------------------------+

Cameron Laird

8 Sep 2009, 12:51:11
In article <8f161d3b-89d9-4864...@v37g2000prg.googlegroups.com>,

tom, I read ben as asking a different question than I believe you've
answered. I think he is NOT asking about Web Services-WSDL-SOAP, but
more general use of XML. I believe he's asking for a function that
receives one XSD instance, and one XML instance, and returns either:
*) OK
*) "invalid construct at $SOMEWHERE; does not match
$SOMETHING expected from XSD"
"Valid" here is interpreted in the sense of the "XML Schema" definition,
*not* WSDL.
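For illustration, here is a minimal pure-Tcl sketch (Tcl 8.5+) of the function described above: one schema and one instance go in, and either OK or an "invalid construct at ..." message comes out. It is not an XSD validator. As an assumption, both the schema and the instance are plain Tcl dicts, and the rule names (enum, pattern, optional, children) are hypothetical.

```tcl
# Minimal sketch of the OK / "invalid construct at <path>; <reason>" interface.
# The schema is a hypothetical Tcl dict (NOT XSD):
#   element name -> rules (enum, pattern, optional, children)
proc validate {schema instance {path ""}} {
    # Every element in the schema must be present (unless optional) and valid.
    dict for {name rules} $schema {
        set here "$path/$name"
        if {![dict exists $instance $name]} {
            if {[dict exists $rules optional] && [dict get $rules optional]} {
                continue
            }
            return "invalid construct at $here; required element missing"
        }
        set value [dict get $instance $name]
        if {[dict exists $rules enum] && $value ni [dict get $rules enum]} {
            return "invalid construct at $here; '$value' is not one of {[dict get $rules enum]}"
        }
        if {[dict exists $rules pattern]
                && ![regexp -- "^(?:[dict get $rules pattern])\$" $value]} {
            return "invalid construct at $here; '$value' does not match [dict get $rules pattern]"
        }
        if {[dict exists $rules children]} {
            if {[catch {dict size $value}]} {
                return "invalid construct at $here; expected child elements"
            }
            # Recurse into the child elements, extending the path.
            set result [validate [dict get $rules children] $value $here]
            if {$result ne "OK"} {
                return $result
            }
        }
    }
    # Elements that the schema does not mention are rejected.
    foreach name [dict keys $instance] {
        if {![dict exists $schema $name]} {
            return "invalid construct at $path/$name; element not allowed here"
        }
    }
    return OK
}

set schema {
    StockRequest {children {
        Symbol  {enum {MSFT WMT XOM GM F GE}}
        Code    {pattern {[0-9]{4}} optional 1}
        Verbose {enum {0 1} optional 1}
    }}
}
puts [validate $schema {StockRequest {Symbol MSFT Verbose 1}}]  ;# -> OK
puts [validate $schema {StockRequest {Symbol MSFTT}}]
```

The second call reports the path /StockRequest/Symbol together with the reason. A real solution would read the .xsd file itself; the sketch only shows the shape of the answer: a path to the failing element plus a reason.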

It is my understanding that tDOM does NOT currently build in this kind
of XSD validation, but version 3.0 of TclDOM does.

I've started to collect notes <URL:
http://phaseit.net/claird/comp.text.xml/XSD.html#validation > on this
subject.

tom.rmadilo

8 Sep 2009, 14:13:16
On Sep 8, 9:51 am, cla...@lairds.us (Cameron Laird) wrote:
> In article <8f161d3b-89d9-4864-acce-a8ddc0f65...@v37g2000prg.googlegroups.com>,

Yes, and tWSDL contains completely separate validation procs,
generated during type definition. You can actually use it to generate
an XSD schema, although it would currently require you to copy-n-paste
it from the WSDL. Maybe there is a proc which allows you to just print
the schema.

You can't do everything allowed with XSD, but you can create derived
simple types, and create and use abstract types. If you define a procedure
with <ws>proc, you get an API which takes a reference to the document
(not the SOAP message). Of course the proc you define gets either
values or references to more complex structures.

Anyway, as far as validation goes, if you can build the schema with
tWSDL (actually the TWiST API), you can validate documents against it.
If validation fails, you can print out the point and reason of
failure.

tom.rmadilo

8 Sep 2009, 20:39:29
On Sep 8, 9:51 am, cla...@lairds.us (Cameron Laird) wrote:
> In article <8f161d3b-89d9-4864-acce-a8ddc0f65...@v37g2000prg.googlegroups.com>,
>

Here's a (somewhat long) example of using tWSDL/TWiST at a command
line to validate an XML document:


source init.tcl ;# Sources tWSDL/TWiST

<ws>namespace init ::stock

<ws>namespace schema ::stock "http://junom.com/stockquoter"


<ws>type enum stock::symbol {MSFT WMT XOM GM F GE}
<ws>type pattern stock::Code {[0-9]{4}} xsd::integer
<ws>type simple stock::verbose xsd::boolean
<ws>type simple stock::quote xsd::float
<ws>type enum stock::trend {-1 0 1} xsd::integer
<ws>type simple stock::dailyMove xsd::float
<ws>type simple stock::lastMove xsd::float
<ws>type simple stock::name
<ws>type simple stock::dateOfChange xsd::dateTime

# Example use of documentation proc

<ws>doc type ::stock symbol "NYSE Trading Symbol"

<ws>element sequence stock::StockResponse {
{Symbol:stock::symbol }
{Quote:stock::quote }
{DateOfChange:stock::dateOfChange {minOccurs 0}}
{Name:stock::name {minOccurs 0 nillable true}}
{Trend:stock::trend {minOccurs 0}}
{DailyMove:stock::dailyMove {minOccurs 0}}
{LastMove:stock::lastMove {minOccurs 0}}
}

<ws>element sequence stock::StockRequest {
{Symbol:stock::symbol}
{Verbose:stock::verbose {minOccurs 0 default "1"}}
}

<ws>doc element stock StockRequest {Defines StockRequest type.
User supplies NYSE symbol and a verbose flag for additional data.}

<ws>element sequence stock::StocksToQuote {
{Symbol:stock::symbol {maxOccurs 8 default "MSFT"}}
{Verbose:stock::verbose {minOccurs 0 default "1"}}
}

<ws>doc element stock StocksRequest {Multiple StockRequest in one
document.}

<ws>element sequence stock::StocksQuoted {
{StockResponse:elements::stock::StockResponse {maxOccurs 8}}
}

<ws>proc ::stock::PutsData {StockRequest} {
    puts "Symbol = $Symbol Verbose = $Verbose\n"
    set StockValue [format %0.2f [expr 25.00 + [ns_rand 4].[format %0.2d [ns_rand 99]]]]
    if {$Verbose} {
        return [list $Symbol $StockValue 2006-04-11T00:00:00Z "SomeName Corp. " 1 0.75 0.10]
    } else {
        return [list $Symbol $StockValue]
    }
} returns StockResponse

<ws>namespace set ::stock hostHeader "filesystem"
<ws>namespace finalize ::stock
<ws>namespace freeze ::stock

# Process XML Document:

dom parse {<StockRequest>
<Symbol>MSFT</Symbol>
<Verbose>1</Verbose>
</StockRequest>} requestDoc

$requestDoc documentElement requestRoot

set instanceNS ::request::input
set responseNS ::request::output

namespace eval $instanceNS {}

::xml::instance::newXMLNS $instanceNS [$requestRoot asList] "1"

set ::request::input::documentElement StockRequest

set is_valid [$::wsdb::elements::stock::StockRequest::validate \
    ::request::input::StockRequest]

if {$is_valid} {
    puts "Document is valid:"
    puts [::xml::document::print $instanceNS]\n
    set responseXML [::wsdb::operations::stock::PutsDataOperation::Invoke \
        ::request::input::StockRequest $responseNS]
    set ${responseNS}::documentElement StockResponse
    puts [::xml::document::print $responseNS]\n
} else {
    puts "Document is invalid:"
    puts [::xml::instance::printErrors ::request::input::StockRequest 5]
}


This prints out (when valid):

Document is valid:
<?xml version="1.0" encoding="utf-8"?>
<StockRequest>
<Symbol>MSFT</Symbol>
<Verbose>1</Verbose>
</StockRequest>

Symbol = MSFT Verbose = 1

<?xml version="1.0" encoding="utf-8"?>
<StockResponse>
<Symbol>MSFT</Symbol>
<Quote>27.70</Quote>
<DateOfChange>2006-04-11T00:00:00Z</DateOfChange>
<Name>SomeName Corp. </Name>
<Trend>1</Trend>
<DailyMove>0.75</DailyMove>
<LastMove>0.10</LastMove>
</StockResponse>

When invalid (change MSFT to MSFTT):

Document is invalid:

StockRequest
Invalid Child:
Symbol
Invalid Value for Symbol
Element = Symbol
Value = MSFTT

niobe

9 Sep 2009, 7:57:57
Ah, no, this is not in a web context at all. I am using XML as a kind
of API to my TCL program. Some of the program's behaviour is
customizable, and this needs to be done with some kind of
configuration file rather than arguments, since the data is very
structured and hierarchical.
I think XML is a good choice for this, but it's essential that it's
validated so the program can ignore the customisation if there is any
kind of error. Cameron is right in this respect: it's just a yes or
no, a decision on whether to trust the entire set of data or not.
Importing badly formed data could be ugly indeed.
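The all-or-nothing policy described above can be sketched in a few lines of Tcl (8.5+). This is a hypothetical illustration, not part of any package discussed in this thread: the validator is assumed to be any command prefix that returns OK or an error description, and checkConfig is a toy stand-in for real schema validation.

```tcl
# Minimal sketch of the all-or-nothing policy: the customisation is used only
# if the validator answers "OK"; on any failure the built-in defaults are kept.
# The validator is any command prefix returning "OK" or an error description
# (a hypothetical convention, not a TclXML/tDOM API).
proc loadCustomisation {defaults custom validator} {
    if {[catch {{*}$validator $custom} verdict]} {
        set verdict "validator raised an error: $verdict"
    }
    if {$verdict ne "OK"} {
        puts stderr "Ignoring customisation: $verdict"
        return $defaults
    }
    # Trusted: custom values override the defaults key by key.
    return [dict merge $defaults $custom]
}

# Hypothetical validator for a flat configuration dict.
proc checkConfig {cfg} {
    if {[dict exists $cfg Verbose] && [dict get $cfg Verbose] ni {0 1}} {
        return "invalid construct at /Verbose"
    }
    return OK
}

set defaults {Symbol MSFT Verbose 0}
puts [loadCustomisation $defaults {Verbose 1} checkConfig]    ;# -> Symbol MSFT Verbose 1
puts [loadCustomisation $defaults {Verbose yes} checkConfig]  ;# -> Symbol MSFT Verbose 0
```

Treating an exception from the validator the same way as an explicit rejection keeps the decision a strict yes or no.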

As it happens, I received a nice response on another forum. I'll
include it here since, for such an important task, good information
seems very hard to find. Maybe a lot of people are still using DTDs?
Coming fresh to XML, though, schemas look far more up to the task, since
they can validate character data.

I haven't tried this yet, but it looks very simple.

---------

Hello,

I managed to validate a document using TclDOM along with TclXML.

Here is some sample code:

#####################################
package require xml
package require xml::libxml2
package require dom

# Reads a whole file and closes the channel afterwards
proc readFile {path} {
    set chan [open $path]
    set text [read $chan]
    close $chan
    return $text
}

# Reads the document to validate
set doc [dom::parse [readFile /input/atmoa4_atmos_smioc.xml]]

# Reads the validating schema
set schema_doc [dom::parse [readFile /xmlfiles/smioc.xsd]]

# Compile the schema
$schema_doc schema compile

# Do the actual validation
$schema_doc schema validate $doc

tom.rmadilo

9 Sep 2009, 11:30:53
On Sep 9, 4:57 am, niobe <ben.carb...@gmail.com> wrote:
> Ah, no, this is not in a web context at all. I am using XML as a kind
> of API to my TCL program. Some of the program's behaviour is
> customizable, and this needs to be done with some kind of
> configuration file rather than arguments since the data is very
> structured and hierarchical.
> I think XML is a good choice for this but it's essential that it's
> validated so the program can ignore the customisation if there is any
> kind of error. Cameron is right in this respect, it's just a yes or
> no, a decision on whether to trust the entire set of data or not.
> Importing badly formed data could be ugly indeed.

This is exactly what the TWiST API provides: it turns a Tcl procedure
into an XML document consumer. Validation is done inline, but the data
is also instantly available to the procedure you define. However, you
have to define your own schema using the API.

But it does several other very useful things during the invoke step:
defaults are added for missing elements and the data is filtered to
remove additional elements. Also, some non-essential structural
requirements on the input XML are relaxed (besides ignoring extra
elements): elements don't have to be in any particular order, or
grouped together if there is more than one of the same element in a
group. These ordering requirements enhance processing speed for
optimized code, but are unimportant in the Tcl environment. The invoke
API reorders and groups the elements to match the definition. This is
necessary because the data loses the element container, although deep
structures usually require element references to hand off processing to
another API.
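A minimal sketch (Tcl 8.5+) of the normalisation described above: defaults for missing elements, extra elements dropped, repeated elements grouped and put in definition order. It is not TWiST's code. As assumptions, the definition is an ordered list of name/default pairs (an empty string meaning "no default") and the parsed document is a flat list of name/value pairs.

```tcl
# Minimal sketch of the "invoke step" normalisation, on a hypothetical flat
# representation:
#   definition = ordered list of {name default} pairs ("" = no default)
#   elements   = name/value pairs in any order; names may repeat
proc normalize {definition elements} {
    set result {}
    foreach {name default} $definition {
        # Collect every occurrence of this element, wherever it appeared.
        set values {}
        foreach {n v} $elements {
            if {$n eq $name} {
                lappend values $v
            }
        }
        # Supply the default for a missing element.
        if {[llength $values] == 0 && $default ne ""} {
            lappend values $default
        }
        # Emit the whole group in definition order.
        foreach v $values {
            lappend result $name $v
        }
    }
    # Names not in the definition are never copied: extra elements are dropped.
    return $result
}

puts [normalize {Symbol "" Verbose 1} {Extra x Verbose 0 Symbol MSFT Symbol GE}] ;# -> Symbol MSFT Symbol GE Verbose 0
puts [normalize {Symbol "" Verbose 1} {Symbol WMT}] ;# -> Symbol WMT Verbose 1
```

Because repeated names are allowed, the result is a flat name/value list rather than a dict.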

In general there are two issues with XML: validating the data/document
and actually getting at the data. This is true regardless of
programming language. Generic validators spare you from constructing
your own schema, because they consume an existing schema along with the
instance document, but you still have to understand the structure of
the incoming documents to get at and use the data. You might also need
to add or subtract elements or change defaults to match your
requirements.

If you read Cameron's article (linked from his notes on schema
validation; great article, read it!), he talks about the difficulty of
validation beyond what can be easily expressed in a schema: for
instance, does an invoice ID match up with an actual ID in a database?
It is easy to write this into a validation procedure in tWSDL/TWiST.
And all simpleType validation is done up the chain used to derive the
type (and decimal types are converted to canonical form for
validation).
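To make "validation up the derivation chain" concrete, here is a minimal sketch (Tcl 8.5+) of derived simple types with one check per derivation step, including a check that is ordinary Tcl code (the invoice lookup). It is not how tWSDL/TWiST implements this; the typeParent/typeCheck arrays, deftype, and the hard-coded list standing in for a database are all hypothetical.

```tcl
# Minimal sketch, not TWiST's implementation: each simple type has a parent
# type and a check (a command prefix that receives the value and returns a
# boolean). A value is checked against the whole derivation chain, base first.
array set typeParent {}
array set typeCheck {}

proc deftype {name parent check} {
    set ::typeParent($name) $parent
    set ::typeCheck($name) $check
}

proc checkType {type value} {
    # Walk up from the derived type to the base type.
    set chain {}
    for {set t $type} {$t ne ""} {set t $::typeParent($t)} {
        lappend chain $t
    }
    # Check the base type first, then each restriction derived from it.
    foreach t [lreverse $chain] {
        if {![{*}$::typeCheck($t) $value]} {
            return "value '$value' fails $t"
        }
    }
    return OK
}

# Hypothetical stand-in for "does this invoice ID exist in the database?"
set knownInvoices {1001 1002}
proc invoiceExists {id} {
    expr {$id in $::knownInvoices}
}

deftype integer   ""      {string is integer -strict}
deftype stockCode integer {regexp {^[0-9]{4}$}}
deftype invoiceId integer invoiceExists

puts [checkType stockCode 1234]   ;# -> OK
puts [checkType stockCode abcd]   ;# -> value 'abcd' fails integer
puts [checkType invoiceId 1003]   ;# -> value '1003' fails invoiceId
```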

It may be that tWSDL/TWiST is poorly named. The SOAP/WSDL protocol is
handled by exactly one procedure, which validates and strips away the
envelope and maps the document to the invoke procedure. The lower half
of the script above is a full substitute for that procedure.

Nick Hounsome

9 Sep 2009, 11:42:21
On 9 Sep, 12:57, niobe <ben.carb...@gmail.com> wrote:
> Ah, no, this is not in a web context at all. I am using XML as a kind
> of API to my TCL program. Some of the program's behaviour is
> customizable, and this needs to be done with some kind of
> configuration file rather than arguments since the data is very
> structured and hierarchical.
> I think XML is a good choice for this but it's essential that it's
> validated so the program can ignore the customisation if there is any
> kind of error. Cameron is right in this respect, it's just a yes or
> no, a decision on whether to trust the entire set of data or not.
> Importing badly formed data could be ugly indeed.
>

This is a very strange use of Tcl since one of its main advantages is
that it is easy to use as a powerful configuration language directly -
even complex structured hierarchies can now be easily handled using
nested dictionaries.
You can make it more friendly by using a safe interpreter and having
commands to match what would have been your XML elements:


<AAA> <BBB x="42"> <CCC> hello </CCC> </BBB> </AAA>

becomes either the nested dict (requires later validation)

AAA {
BBB {
{x 42}
{CCC hello}
}
}

or the command sequence (commands can validate as you go - each
command returns a dict to ultimately produce the same as above)

AAA [BBB {x 42} [CCC hello]]

or even use an OO package and build it piecemeal
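A minimal sketch (Tcl 8.5+) of the safe-interpreter approach, reusing the AAA/BBB/CCC names from the example above. The helper procs node, leaf and loadConfig are hypothetical, and the alnum check on CCC only illustrates validating as you go. Each element command returns a dict fragment, so the configuration script evaluates to a nested dict.

```tcl
# Container element: merges its attribute/child dicts into one dict.
proc node {name args} {
    set body {}
    foreach part $args {
        set body [dict merge $body $part]
    }
    return [list $name $body]
}

# Leaf element: validates its text as it goes (here with a "string is" class).
proc leaf {name class text} {
    if {![string is $class -strict $text]} {
        error "$name: '$text' is not a valid $class value"
    }
    return [list $name $text]
}

# Evaluates a configuration script in a safe interpreter in which the
# element commands are the only additions (aliases to the procs above).
proc loadConfig {script} {
    set safe [interp create -safe]
    interp alias $safe AAA {} node AAA
    interp alias $safe BBB {} node BBB
    interp alias $safe CCC {} leaf CCC alnum
    set failed [catch {interp eval $safe $script} result]
    interp delete $safe
    if {$failed} {
        error "bad configuration: $result"
    }
    return $result
}

set cfg [loadConfig {AAA [BBB {x 42} [CCC hello]]}]
puts $cfg                       ;# -> AAA {BBB {x 42 CCC hello}}
puts [dict get $cfg AAA BBB x]  ;# -> 42
```

Because the script runs in a safe interpreter, commands such as exec and open are not available to the configuration file, and any validation error aborts the whole load.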

Hai Vu

15 Sep 2009, 1:16:28
On Sep 8, 5:05 am, "b...@spam.com" <ben.carb...@gmail.com> wrote:

I ran into a similar situation and had to get the job done fast, so I
cheated by calling an external program called xmllint. You can look it
up.
Hai
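A minimal sketch (Tcl 8.5+) of the xmllint route, assuming the xmllint command-line tool (part of libxml2) is installed and on the PATH. The proc name xsdValidate and its return convention (a valid/invalid flag plus xmllint's own report) are illustrative, not an existing API.

```tcl
# Minimal sketch, assuming the xmllint program (shipped with libxml2) is on
# the PATH. Returns a two-element list: 1 or 0 (valid?) and xmllint's text.
proc xsdValidate {xsdFile xmlFile} {
    # --noout: don't echo the document; --schema: validate against the XSD.
    # 2>@1 folds xmllint's messages (written to stderr) into the result, so
    # exec only raises an error when xmllint exits with a non-zero status.
    if {[catch {exec xmllint --noout --schema $xsdFile $xmlFile 2>@1} msg opts]} {
        if {[lindex [dict get $opts -errorcode] 0] eq "CHILDSTATUS"} {
            return [list 0 $msg]
        }
        # Anything else (e.g. xmllint not installed) is re-raised unchanged.
        return -options $opts $msg
    }
    return [list 1 $msg]
}

# Usage (file names are illustrative):
#   lassign [xsdValidate smioc.xsd atmoa4_atmos_smioc.xml] ok report
#   if {!$ok} { puts stderr $report }
```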

niobe

15 Sep 2009, 3:03:51
>On Sep 10, 1:42 am, Nick Hounsome <nick.houns...@googlemail.com> wrote:
> This is a very strange use of Tcl since one of its main advantages is
> that it is easy to use as a powerful configuration language directly -
> even complex structured hierarchies can now be easily handled using
> nested dictionaries.
> You can make it more friendly by using a safe interpreter and having
> commands to match what would have been your XML elements:

Hi Nick,

Not so strange, I think, since by definition an API is not for me, i.e.
the TCL programmer, to use; otherwise it would be in the main code.
XML provides a relatively intuitive way for a non-TCL person to
program the application, whilst providing more structure than *nix-style
config files.

N
