node-red html node


Andreas Fuchs

Dec 18, 2015, 10:48:40 AM
to Node-RED
Hi, I am trying to get the image source out of an HTML tag.

The html node is configured with this selector:

.observ .temp .wind-container

The HTML I get back that way is:

<img alt="direction" src="/fileadmin/styles/img/wind/wind-arrow_small-white.png" style="transform:rotate(-160deg); -ms-transform:rotate(-160deg); -moz-transform:rotate(-160deg); -webkit-transform:rotate(-160deg); -o-transform:rotate(-160deg);" class="wdir">

I have tried lots of things, such as:

.observ .temp .wind-container img.attr('src')

But no matter what I try, I get unmatched selector error messages.
How do I have to write the selector so that I get the individual values of src and style back?

The other issue I have: if I forward the current result to a node-red-contrib-ui template node, the < and > signs are replaced by &lt; and &gt; and are then not rendered as HTML by the browser. I am unsure whether this is an error in the html node or in the template node.

Example flow:
[{"id":"88f5ae81.3d8df8","type":"ui_tab","name":"Home","icon":"dashboard","order":"1"},{"id":"9c132289.f50338","type":"inject","z":"2035bcaf.433e24","name":"Every minute","topic":"","payload":"test2","payloadType":"date","repeat":"60","crontab":"","once":true,"x":275,"y":216,"wires":[["a9b51b44.f67088","1761fdd9.eab442"]]},{"id":"1761fdd9.eab442","type":"http request","z":"2035bcaf.433e24","name":"Wetter24 Interlaken","method":"GET","ret":"txt","url":"http://www.wetter24.de/vorhersage/schweiz/interlaken/18129469/","x":474,"y":371,"wires":[["93bc2fd3.9a5408","dff72f5.816f15","cb2aac8d.fd89e8","df4d242d.c5b888","cadd745.1179688"]]},{"id":"cadd745.1179688","type":"html","z":"2035bcaf.433e24","name":"Winddir","tag":".observ .temp .wind-container","ret":"html","as":"multi","x":690,"y":573,"wires":[["7291fc8b.e18784","7bd760fd.fa9ad"]]},{"id":"7291fc8b.e18784","type":"ui_template","z":"2035bcaf.433e24","tab":"88f5ae81.3d8df8","name":"","group":"Test","order":1,"format":"<span>{{msg.payload}}</span>","x":887,"y":574,"wires":[[]]},{"id":"7bd760fd.fa9ad","type":"debug","z":"2035bcaf.433e24","name":"","active":true,"console":"false","complete":"payload","x":884,"y":623,"wires":[]}]

Dave C-J

Dec 18, 2015, 12:17:31 PM
to node...@googlegroups.com
Right now the HTML node only really lets you return the text or the HTML of the selected container, i.e. not individual attributes. So the simplest way would be to do a string split in a following function node. Here I am forcing it into XML so that all the attributes are then available... ugly, but...

[{"id":"4fcc45f3.b033bc","type":"inject","z":"9e538f88.61ac7","name":"Every minute","topic":"","payload":"test2","payloadType":"date","repeat":"","crontab":"","once":true,"x":206,"y":1907,"wires":[["a4f6c3ff.5b094"]]},{"id":"a4f6c3ff.5b094","type":"http request","z":"9e538f88.61ac7","name":"Wetter24 Interlaken","method":"GET","ret":"txt","url":"http://www.wetter24.de/vorhersage/schweiz/interlaken/18129469/","x":286,"y":2011,"wires":[["9edf2254.6120e","b6cfc5fe.493038"]]},{"id":"9edf2254.6120e","type":"html","z":"9e538f88.61ac7","name":"Winddir","tag":".observ .temp .wind-container","ret":"html","as":"multi","x":374,"y":1952,"wires":[["3964c3cf.c69b3c"]]},{"id":"b6cfc5fe.493038","type":"debug","z":"9e538f88.61ac7","name":"","active":true,"console":"false","complete":"payload","x":560,"y":2048,"wires":[]},{"id":"3964c3cf.c69b3c","type":"function","z":"9e538f88.61ac7","name":"","func":"msg.payload = msg.payload + \"</img>\";\nreturn msg;","outputs":1,"noerr":0,"x":473,"y":1877,"wires":[["96ffb058.69005"]]},{"id":"96ffb058.69005","type":"xml","z":"9e538f88.61ac7","name":"","attr":"","chr":"","x":592,"y":1959,"wires":[["ee9337ed.116cc8"]]},{"id":"ee9337ed.116cc8","type":"function","z":"9e538f88.61ac7","name":"","func":"msg.payload = msg.payload.img[\"$\"].src;\nreturn msg;","outputs":1,"noerr":0,"x":657,"y":1891,"wires":[["67fd570e.9802a8"]]},{"id":"67fd570e.9802a8","type":"debug","z":"9e538f88.61ac7","name":"","active":true,"console":"false","complete":"payload","x":789,"y":1980,"wires":[]}]
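For comparison with the XML route in the flow above, the string-split idea could look roughly like the sketch below. It is not taken from the thread: `getAttr` is a hypothetical helper name, and it assumes the attribute values are double-quoted, as in the HTML quoted in the original question.

```javascript
// Minimal sketch of the "string split" idea, written as a plain function so it
// can be run outside Node-RED. getAttr is a hypothetical name, not a Node-RED API.
function getAttr(html, name) {
    // Split on ` name="`; the leading space avoids matching e.g. data-src.
    const parts = String(html).split(' ' + name + '="');
    if (parts.length < 2) {
        return undefined; // attribute not present
    }
    // The value runs up to the next double quote.
    return parts[1].split('"')[0];
}

// The markup quoted in the original question (style shortened).
const sampleImg = '<img alt="direction" src="/fileadmin/styles/img/wind/wind-arrow_small-white.png" style="transform:rotate(-160deg);" class="wdir">';

console.log(getAttr(sampleImg, 'src'));
console.log(getAttr(sampleImg, 'style'));
```

Inside a function node, the same helper would presumably be called as `msg.payload = getAttr(msg.payload, 'src'); return msg;`.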

Edward Vielmetti

Dec 18, 2015, 12:28:41 PM
to node...@googlegroups.com
When I've tried to extract text out of a bunch of arbitrary well-formed and predictable HTML pages, my easiest pipeline has been this:

1. Normalize the HTML and convert to XML using "tidy"
2. Translate the XML to JSON using one of the "xml2json" tools out there
3. Extract the elements from the JSON, where it's easy using "jq".

The writeup I did when I had this all fresh in my brain is here


it does not explicitly mention Node-RED, but the principles should be useful. Last time I did this from within Node-RED I used an HTTP request node to feed an "exec" node that returned the fully parsed data. The exec could be as simple as

#!/bin/sh
tidy -q -asxml 2>/dev/null | xml2json | jq -r .html.body.img.src

which spits out 

/fileadmin/styles/img/wind/wind-arrow_small-white.png

when I feed it the data that you provide.


--
http://nodered.org
---
You received this message because you are subscribed to the Google Groups "Node-RED" group.
To unsubscribe from this group and stop receiving emails from it, send an email to node-red+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



Julian Knight

Dec 18, 2015, 5:36:03 PM
to Node-RED
If the text from the HTML is consistent, you might be better off extracting what you want using a regex.
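
A regex-based extraction along these lines might look like the following sketch. The function name `parseWindImg` and the decision to return the rotation as a number are assumptions for illustration; the patterns only fit markup shaped like the `<img>` tag quoted in the original question.

```javascript
// Sketch only: parseWindImg is a hypothetical helper, tuned to the <img> tag
// from the question (double-quoted attributes, rotate(<n>deg) in the style).
function parseWindImg(html) {
    // Capture everything between the quotes of the src attribute.
    const srcMatch = /\ssrc="([^"]*)"/.exec(html);
    // Capture the first (optionally negative, optionally decimal) angle.
    const rotMatch = /rotate\((-?\d+(?:\.\d+)?)deg\)/.exec(html);
    return {
        src: srcMatch ? srcMatch[1] : null,
        rotation: rotMatch ? Number(rotMatch[1]) : null
    };
}

const imgHtml = '<img alt="direction" src="/fileadmin/styles/img/wind/wind-arrow_small-white.png" style="transform:rotate(-160deg); -ms-transform:rotate(-160deg);" class="wdir">';

console.log(parseWindImg(imgHtml));
```

This stays fragile if the site changes its markup, which is the usual trade-off of scraping with regular expressions.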

Dave C-J

Dec 19, 2015, 7:44:12 AM
to node...@googlegroups.com
I've pushed an enhancement to the HTML node that adds an option to pull out the attributes.
In your case it would still be best done as two instances of the same node: one (as you have it) to select down to the html, and then a second (new) html node set to extract the attributes of any selected tags within that html (in this case img), so that it now outputs
{ "alt": "direction", "src": "/fileadmin/styles/img/wind/wind-arrow_small-white.png", "style": "transform:rotate(10deg); -ms-transform:rotate(10deg); -moz-transform:rotate(10deg); -webkit-transform:rotate(10deg); -o-transform:rotate(10deg);", "class": "wdir" }

Dave C-J

Dec 19, 2015, 8:05:38 AM
to node...@googlegroups.com
Actually it works better than I thought ;-) ... selecting .observ .temp .wind-container img
with the new attribute capability does it in one hit. :-) Yay me!
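
Once the attributes arrive as an object like the one quoted in the earlier post, the individual src and style values can be picked apart in a following function node. The sketch below assumes exactly that object shape; `parseStyle` is a hypothetical helper, and the style string is shortened to two declarations.

```javascript
// Hypothetical helper: turn a style string into a property -> value map.
function parseStyle(style) {
    const result = {};
    for (const decl of String(style || '').split(';')) {
        const idx = decl.indexOf(':');
        if (idx === -1) continue; // skip empty or malformed pieces
        result[decl.slice(0, idx).trim()] = decl.slice(idx + 1).trim();
    }
    return result;
}

// Attribute object in the shape quoted in the post above (assumed, shortened).
const attrs = {
    alt: 'direction',
    src: '/fileadmin/styles/img/wind/wind-arrow_small-white.png',
    style: 'transform:rotate(10deg); -ms-transform:rotate(10deg);',
    class: 'wdir'
};

const styleMap = parseStyle(attrs.style);
console.log(attrs.src);
console.log(styleMap.transform);
```

In a function node, `attrs` would presumably be `msg.payload`, or one element of it if the node is set to return an array.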

Julian Knight

Dec 19, 2015, 12:05:20 PM
to Node-RED
Damn, you're good! :)

Andreas Fuchs

Dec 20, 2015, 2:20:35 PM
to node...@googlegroups.com
Hi guys, thanks a lot for all your tips. I'm extracting with a regex at the moment, but am looking forward to testing Dave's enhancement. That all really rocks!


Dave C-J

Dec 21, 2015, 1:38:18 PM
to Node-RED