Transformers Java


Bradley Zweig

Aug 4, 2024, 7:13:39 PM
to selmatira
An instance of this class can be obtained with the TransformerFactory.newTransformer method. This instance may then be used to process XML from a variety of sources and write the transformation output to a variety of sinks.
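As a minimal sketch of this flow, the JDK's built-in identity transformer (obtained without a stylesheet) copies an in-memory Source to a Result:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class IdentityTransform {
    // Copies an XML string through the identity transformer and returns the result.
    static String transformToString(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The output is a serialized copy of the input document.
        System.out.println(transformToString("<greeting>hello</greeting>"));
    }
}
```

The same Transformer instance could then be reused (after reset()) for other sources and sinks, as described below.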

Transformer is reset to the same state as when it was created with TransformerFactory.newTransformer(), TransformerFactory.newTransformer(Source source) or Templates.newTransformer(). reset() is designed to allow the reuse of existing Transformers thus saving resources associated with the creation of new Transformers.


The reset Transformer is not guaranteed to have the same URIResolver or ErrorListener Objects, e.g. Object.equals(Object obj). It is guaranteed to have a functionally equal URIResolver and ErrorListener.


Transform the XML Source to a Result. Specific transformation behavior is determined by the settings of the TransformerFactory in effect when the Transformer was instantiated and any modifications made to the Transformer instance.


An empty Source is represented as an empty document as constructed by DocumentBuilder.newDocument(). The result of transforming an empty Source depends on the transformation behavior; it is not always an empty Result.


Pass a qualified name as a two-part string: the namespace URI enclosed in curly braces ({}), followed by the local name. If the name has a null URI, the String contains only the local name. An application can safely check for a non-null URI by testing whether the first character of the name is a '{' character.
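For illustration, a short sketch of the two-part naming convention; the namespace URI and parameter name below are made up:

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;

public class QualifiedNames {
    // Stores a value under a (possibly namespace-qualified) parameter name
    // and reads it back with getParameter.
    static Object roundTrip(String name, Object value) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setParameter(name, value);
        return transformer.getParameter(name);
    }

    public static void main(String[] args) throws Exception {
        // "{namespaceURI}localName" form; the URI is made up for illustration.
        String qualified = "{http://example.com/ns}threshold";
        System.out.println(roundTrip(qualified, "42"));
        // A leading '{' signals that a namespace URI is present.
        System.out.println(qualified.charAt(0) == '{');   // true
    }
}
```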


This method does not return a default parameter value, which cannot be determined until the node context is evaluated during the transformation process.

Parameters: name - the name of the Object to get
Returns: a parameter that has been set with setParameter

clearParameters

public abstract void clearParameters()

Clear all parameters set with setParameter.

setURIResolver

public abstract void setURIResolver(URIResolver resolver)

Set an object that will be used to resolve URIs used in document(). If the resolver argument is null, the URIResolver value will be cleared and the transformer will no longer have a resolver.
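A short sketch of setting and clearing a resolver; the in-memory document returned by the resolver is invented for illustration:

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

public class ResolverExample {
    // Sets a resolver, verifies the transformer hands it back, then clears it.
    static boolean setAndCheck(URIResolver resolver) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setURIResolver(resolver);
        boolean same = transformer.getURIResolver() == resolver;
        transformer.setURIResolver(null); // null clears the resolver
        return same && transformer.getURIResolver() == null;
    }

    public static void main(String[] args) throws Exception {
        // This resolver maps every document() URI to a small in-memory document.
        URIResolver resolver = (href, base) ->
                new StreamSource(new StringReader("<resolved/>"));
        System.out.println(setAndCheck(resolver)); // true
    }
}
```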


Pass a qualified property key name as a two-part string: the namespace URI enclosed in curly braces ({}), followed by the local name. If the name has a null URI, the String contains only the local name. An application can safely check for a non-null URI by testing whether the first character of the name is a '{' character.


The properties returned should contain properties set by the user and properties set by the stylesheet, backed by the default properties specified by section 16 of the XSL Transformations (XSLT) W3C Recommendation. The properties specifically set by the user or the stylesheet should be in the base Properties list, while the XSLT default properties that were not specifically set should be in the default Properties list. Thus, getOutputProperties().getProperty(String key) will obtain any property that was set by setOutputProperty(java.lang.String, java.lang.String), setOutputProperties(java.util.Properties), in the stylesheet, or by the default properties, while getOutputProperties().get(String key) will only retrieve properties that were explicitly set by setOutputProperty(java.lang.String, java.lang.String), setOutputProperties(java.util.Properties), or in the stylesheet.


Pass a qualified property name as a two-part string: the namespace URI enclosed in curly braces ({}), followed by the local name. If the name has a null URI, the String contains only the local name. An application can safely check for a non-null URI by testing whether the first character of the name is a '{' character.


If a property has been set using setOutputProperty(java.lang.String, java.lang.String), that value will be returned. Otherwise, if a property is explicitly specified in the stylesheet, that value will be returned. If the value of the property has been defaulted, that is, if no value has been set explicitly either with setOutputProperty(java.lang.String, java.lang.String) or in the stylesheet, the result may vary depending on implementation and input stylesheet.
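This layering can be observed directly. The sketch below explicitly sets only the indent property and then inspects the returned Properties; the behavior of the defaulted properties through get() versus getProperty() follows the contract described above:

```java
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;

public class OutputProps {
    // Returns the output-properties view after explicitly setting only indent.
    static Properties withIndent() throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        return transformer.getOutputProperties();
    }

    public static void main(String[] args) throws Exception {
        Properties props = withIndent();
        // Explicitly set: visible through both get() and getProperty().
        System.out.println(props.get(OutputKeys.INDENT));         // yes
        System.out.println(props.getProperty(OutputKeys.INDENT)); // yes
        // Defaulted properties (never set explicitly), such as the output
        // method, are expected to show up only through getProperty().
        System.out.println(props.get(OutputKeys.METHOD));
        System.out.println(props.getProperty(OutputKeys.METHOD));
    }
}
```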


Apache OpenNLP is an open source machine learning library for natural language processing (NLP) in Java, used by many popular open source applications, including Apache Solr, Apache UIMA, and Apache Lucene, as well as many commercial and research applications. Its development goes back to the early 2000s, and it provides NLP capabilities for sentence detection, tokenization, part-of-speech tagging, lemmatization, language detection, and named-entity recognition using maximum entropy and perceptron-based algorithms. These rich capabilities have led to adoption by a variety of academic and commercial projects.


In this blog post we show how you can train a sequence classification model using PyTorch and Hugging Face transformers, convert the trained model to ONNX, and use it for inference from Apache OpenNLP. A sequence classification model predicts a class label for an object. In our case, the object is a movie review and the predicted classes are negative and positive sentiment.


Our motivation is a use-case to classify documents being indexed into Apache Solr. By assigning a classification to each document, a search powered by Apache Solr can leverage those classifications to provide an improved search experience with more relevant search results. In the world of improving search relevance, having this inference available in the document indexing pipeline can have significant positive impacts!


In this section we will train a sequence classification model based on positive and negative labeled movie reviews. If you want to skip this step and get a pre-trained model ready to export to ONNX, you can use the pre-trained model jzonthemtn/distilbert-imdb from the Hugging Face hub. This model was trained using the steps described below.


We will now train a sequence classification model that predicts a category for each document. Given a document, the model will assign it a value of either 0 or 1, where 0 is negative sentiment and 1 is positive sentiment. We will use the Large Movie Review Dataset via the Hugging Face datasets library.


Once the model training is complete you will have some important files under the distilbert-imdb output directory. This directory will contain the model (pytorch_model.bin), the vocabulary file (vocab.txt), and other files.


The result of this command will be a directory called onnx that contains the model. (It is important to make sure you select the appropriate feature when converting the model. Refer to the Hugging Face documentation for a list of valid feature values to use when training other types of models.)


The model variable is the ONNX model file and the vocab variable is the vocab.txt file. Both of these files are in the distilbert-imdb directory that was created after training the model. We need these files for inference.


Next, we will define the classifications that the model can assign to each document during inference. The categories map is derived from the config.json file. This map assigns label names to each of the potential categories that can be predicted by the model. (If you are using a model from the Hugging Face Hub, the model should have a config.json file. Open that file and look at the labels in the configuration to determine how to make the map. It will be straightforward!)
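Putting these pieces together, the inference from Java can be sketched as below. This assumes OpenNLP's opennlp-dl module and ONNX Runtime are on the classpath, and that the DocumentCategorizerDL constructor takes the model, vocab, categories map, a scoring strategy, and inference options as in OpenNLP 2.x; the file paths and review text are placeholders:

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

import opennlp.dl.InferenceOptions;
import opennlp.dl.doccat.DocumentCategorizerDL;
import opennlp.dl.doccat.scoring.AverageClassificationScoringStrategy;

public class SentimentInference {
    public static void main(String[] args) throws Exception {
        // Placeholder paths: the exported ONNX model and the vocab.txt
        // produced during training.
        final File model = new File("distilbert-imdb/onnx/model.onnx");
        final File vocab = new File("distilbert-imdb/vocab.txt");

        // Derived from id2label in config.json: 0 = negative, 1 = positive.
        final Map<Integer, String> categories = new HashMap<>();
        categories.put(0, "negative");
        categories.put(1, "positive");

        final DocumentCategorizerDL categorizer = new DocumentCategorizerDL(
                model, vocab, categories,
                new AverageClassificationScoringStrategy(),
                new InferenceOptions());

        // Scores per category for one movie review, then the winning label.
        final double[] scores = categorizer.categorize(
                new String[]{"I loved this movie!"});
        System.out.println(categorizer.getBestCategory(scores));
    }
}
```

This sketch is not self-contained (it needs the trained model files and the opennlp-dl dependency), so treat it as an outline of the API shape rather than a drop-in program.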


You will likely note that we did not perform any model evaluation after training our model. Of course, you should! But this blog post is focused on how to use an ONNX model from Apache OpenNLP. You would evaluate your model here just as you would any other model. Apache OpenNLP does not provide any model training or evaluation functions - all training and evaluation is done outside of Apache OpenNLP using your favorite tools and frameworks. Apache OpenNLP does not require you to use any specific tools or frameworks when creating and evaluating your models.


As mentioned in the introduction, this capability was recently leveraged in efforts to improve search relevance by assigning a classification to documents as they were indexed into Apache Solr. But there are many other possibilities! Document classification is a very useful tool to have available in an NLP pipeline. It can be used to route documents based on their content, perform sentiment analysis, filter spam, and support many other use-cases.


In addition to document classification, Apache OpenNLP also currently supports ONNX Runtime for token classification models (named-entity recognition). You can use a pre-trained named-entity recognition model or train your own model just as we did here and use it from your Java applications with Apache OpenNLP. Extracting entities from documents prior to indexing in Apache Solr can also have a dramatic improvement on search relevance and findability.


You can learn more about Apache OpenNLP on the project website, and new contributors to the project are always welcome. If you are using Apache OpenNLP I would love to hear about your use-case and to learn how OpenNLP can keep improving!


Transformers are used to increase or decrease AC voltages and currents in circuits. The operation of transformers is based on the principle of mutual inductance. A transformer usually consists of two coils of wire wound on the same core. The primary coil is the input coil of the transformer and the secondary coil is the output coil. Mutual induction causes voltage to be induced in the secondary coil.


VO / VI = NS / NP

where VI is the input voltage of the primary coil, VO is the output voltage of the secondary coil, NP is the number of windings in the primary coil, and NS is the number of windings in the secondary coil. If the output voltage of a transformer is greater than the input voltage, it is called a step-up transformer. If the output voltage of a transformer is less than the input voltage, it is called a step-down transformer.
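As a worked example of the turns-ratio relation (the voltages and winding counts below are arbitrary):

```java
public class TransformerRatio {
    // Ideal transformer relation: VO / VI = NS / NP, so VO = VI * NS / NP.
    static double outputVoltage(double vi, double np, double ns) {
        return vi * ns / np;
    }

    public static void main(String[] args) {
        // Step-up: more secondary turns than primary turns.
        System.out.println(outputVoltage(120.0, 100.0, 500.0)); // 600.0
        // Step-down: fewer secondary turns than primary turns.
        System.out.println(outputVoltage(120.0, 500.0, 100.0)); // 24.0
    }
}
```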


-mainframe-software/traditional-management/ca-spool/14-0/customizing-java-transformers.html



With the Java Transformers there is a separate started task that performs the conversions.

The connection between CA Spool and the Java Transformer started task is defined in the CA Spool parameters (ESFPARM).


When you use the Java Transformers there is an additional statement, X2YYDEF, in the ESFPARM:


X2YYDEF x2yyname, Transformation FSS name

PROC=CAIQD2E, FSS JCL procedure name

MAXTASK=10, Max concurrent transformations

DEFAULT=YES Implicit X2YYDEF


The name x2yyname can be anything. Note that the PROC keyword points to the Java Transformer JCL procedure CAIQD2E (which can be any name).


In the NODE statement you point to the FSS name in the keyword X2YY:


NODE SAMPX2YY, Sample transform printer

defnodename, DEFNODE name

GROUP=1, Network group

TRANSFRM=x2yy, Transformation type and options

X2YY=x2yyname Transformation FSS name


The example above can be found in the member CAIQPARM of the CA Spool 14.0 library CBQ4PARM.


When a file is queued for transformation, CA Spool determines whether the specified Java FSS task is started. If not, CA Spool starts the task. It remains active unless the Transformer interface is halted or manually stopped. Do not start the CAIQD2E task manually; CA Spool must start it.

You can halt the transformer task with the /HT command on the CA Spool menu or F caspoolstc,HT from a z/OS console. To activate the Transformers interface again, restart the interface with the /ST command on the CA Spool menu or F caspoolstc,ST from a z/OS console. The Java task is not started automatically after the ST command; it is started when a file is queued for transformation.


You can stop the transformer task with the command /PFSS,x2yyname on the CA Spool menu or F caspoolstc,PFSS,x2yyname from a z/OS console. The Java task will be started again when the next file is queued for transformation.


- Most of the parameters used by the Java FSS task are in USS files

- The parameters used in the transformations, corresponding to the ones in the DDs A2PCPARM and A2PDPARM, are in project files in a USS folder named /apps, for example A2PC.d2eproj and A2PD.d2eproj. You can create any number of project files to accommodate the requirements of different reports.


See the product documentation for how the project files are selected for each file.
