Parsing HTML table


Tony

Apr 30, 2018, 12:14:35 AM
to Node-RED
Hello, 

I hope everyone is doing fine. I would like your help with something: I am trying to parse an HTML table from a website using Node-RED in order to obtain two arrays, one for the cities and one for the values. My flow is below:

[{"id":"e5b1403a.9c7c9","type":"inject","z":"93368df8.956a7","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":320,"y":340,"wires":[["b92c199b.dc9a88"]]},{"id":"b92c199b.dc9a88","type":"http request","z":"93368df8.956a7","name":"","method":"GET","ret":"txt","url":"https://stat.epa.gov.tw/","tls":"","x":490,"y":340,"wires":[["fbc646f8.3ead88"]]},{"id":"6f962955.749268","type":"debug","z":"93368df8.956a7","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":1010,"y":400,"wires":[]},{"id":"cef098be.3fc4f8","type":"debug","z":"93368df8.956a7","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","x":990,"y":260,"wires":[]},{"id":"fbc646f8.3ead88","type":"html","z":"93368df8.956a7","name":"","property":"","tag":"table.tableAQI","ret":"text","as":"single","x":720,"y":340,"wires":[["cef098be.3fc4f8","6f962955.749268"]]}]

I am using the html node to define the selector. I have tried many different ones, but most of them return an empty array. The website I am trying to parse is https://stat.epa.gov.tw/. If you inspect it, you will see that it actually has two tables; the one I am trying to parse is the big one on the right. If I use tr or td as selectors, they match the other table.

I am kind of new to Node-RED and JavaScript, so I apologize if my question is not that good. I really appreciate your help.


[Attached image: Capture.PNG]

Zenofmud

Apr 30, 2018, 7:49:37 AM
to node...@googlegroups.com
You have two issues:
1) The format of your selector is improper. See the example at https://github.com/node-red/cookbook.nodered.org/wiki/Extracting-data-from-an-HTML-page
2) (The big issue) The table you are looking at is generated by a script. The HTML node can’t access elements that are generated by a script on the page; it only sees the markup returned by the http request node.

Paul
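
A rough way to confirm the second point is to look at the raw text returned by the http request node and see whether the table's class appears in it at all. The sketch below is only an illustration: `hasClassInMarkup` is a made-up helper name, not part of Node-RED, and it assumes the class name `tableAQI` taken from the selector in the flow above. It is a plain text check, not a real HTML parser, so it can be fooled (for example, if the class name appears inside script source).

```javascript
// Hypothetical helper (not a Node-RED API): reports whether the raw HTML text
// contains a class attribute that includes the given class name.
function hasClassInMarkup(html, className) {
    // Escape regex metacharacters so the class name is matched literally.
    const escaped = className.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // Match class="... className ..." using either single or double quotes.
    // The class name must be a whole whitespace-separated token.
    const pattern = new RegExp(
        "class\\s*=\\s*([\"'])(?:[^\"']*\\s)?" + escaped + "(?:\\s[^\"']*)?\\1"
    );
    return pattern.test(html);
}

// Inside a function node placed after the http request node, this could be
// used roughly as:
//   msg.payload = hasClassInMarkup(msg.payload, "tableAQI");
//   return msg;
```

If this returns false for the page text, the element is not in the static markup, which is consistent with the table being built by a script after the page loads.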

Tony

Apr 30, 2018, 8:10:25 AM
to Node-RED
Thanks, Paul. Indeed, I figured out a few hours ago that it is generated by a script. I was just about to watch some videos on how to use Selenium and Python to extract the content. Do you know if there is a way to parse this information in Node-RED? Thank you.

Zenofmud

Apr 30, 2018, 8:53:51 AM
to node...@googlegroups.com
If you mean parse it using the HTML node, no, I don’t. Sorry.
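
For the original goal of ending up with two arrays, the splitting step itself is simple once the cell text is available by whatever means. The following is a minimal sketch under a loud assumption: the cell texts arrive as one flat array that alternates city, value, city, value. The real table layout on the site was not confirmed in this thread, and `splitCityValuePairs` is a hypothetical name chosen for illustration.

```javascript
// Hypothetical helper: cells is a flat array of cell texts in row order,
// ASSUMED to alternate city, value, city, value, ...
function splitCityValuePairs(cells) {
    const cities = [];
    const values = [];
    // Step two cells at a time; a trailing unpaired cell is ignored.
    for (let i = 0; i + 1 < cells.length; i += 2) {
        cities.push(String(cells[i]).trim());
        const text = String(cells[i + 1]).trim();
        const num = Number(text);
        // Keep numeric text as a number; leave anything else as the raw text.
        values.push(text !== "" && !Number.isNaN(num) ? num : text);
    }
    return { cities, values };
}

// Inside a function node this could be used roughly as:
//   msg.payload = splitCityValuePairs(msg.payload);
//   return msg;
```

Returning both arrays in one object keeps them in a single message; if the table has more than two columns per row, the step size and indexes would need to change accordingly.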
 
--
http://nodered.org
 
Join us on Slack to continue the conversation: http://nodered.org/slack
---
You received this message because you are subscribed to the Google Groups "Node-RED" group.
To unsubscribe from this group and stop receiving emails from it, send an email to node-red+u...@googlegroups.com.
To post to this group, send email to node...@googlegroups.com.
Visit this group at https://groups.google.com/group/node-red.
To view this discussion on the web, visit https://groups.google.com/d/msgid/node-red/3f8f18a4-7e93-4c36-9c4a-cd2b44b1a5a1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
