Another way to process HTML in GAS server side?


ChiefChippy2 is awesome

Apr 15, 2019, 1:27:13 PM
to Google Apps Script Community
I often have HTML to process in GAS, but I can't always do it on the client side.
On the GAS server side, the only (and therefore best) way I know to process HTML is XmlService, but it often throws an error when parsing the HTML.
A specific example:
1. I have a piece of HTML, e.g.
<h1 class="title">Title</h1>
<div>
<span class="text">Lorem Ipsum...</span>
<img src="/img/lorem.png">
<span class="more text">Why Ipsum?</span>
</div>
<h1 class="title">Another title</h1>
<div>
<span class="text">Not Lorem Ipsum...but Ipsum Lorem</span>
<img src="/img/ipsum.png">
<span class="more text">Why not Lorem?</span>
</div>
2. I want to convert this to JSON, something like:

{"body":[{"title":"Title","text":["Lorem Ipsum...","Why Ipsum?"]},{"title":"Another title","text":["Not Lorem Ipsum...but Ipsum Lorem","Why not Lorem?"]}]}

In client-side JavaScript I would do something like this (I know it is not optimized):

var result = {"body": []}
var headings = document.getElementsByTagName("h1")
for (var i = 0; i < headings.length; i++) {
  var entry = {"title": headings[i].innerText, "text": []}
  // The spans live in the <div> that follows each <h1>, not inside the <h1> itself
  var spans = headings[i].nextElementSibling.getElementsByClassName("text")
  Array.from(spans).forEach(function (span) { entry.text.push(span.innerText) })
  result.body.push(entry)
}

But I can't use this on the GAS server side, since document only exists on the client side.
If I am right, it is possible with search and split, but that is going to be painfully complicated for me.
Thanks for any thoughts or advice.

Adam Morris

Apr 15, 2019, 9:29:40 PM
to google-apps-sc...@googlegroups.com
Client-side there is probably a library that will help you parse the
HTML to get what you want. Probably even jQuery.
But yes, don't use regular expressions or string manipulations; you
are intending to parse a tree and so use a solution that handles that
correctly.
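
A minimal sketch of what a tree-based approach could look like on the server side, assuming the HTML has already been parsed successfully into an XmlService document with a single root element. It relies only on the Element methods getChildren(), getName(), getText() and getAttribute(). The function names getElementsByClassName and rootToJson are made up for this illustration, and the code assumes each <h1> is directly followed by the <div> holding its spans, as in the question.

```javascript
// Hypothetical helper: collect every descendant of `element` whose class
// attribute contains `className` (depth-first, document order).
function getElementsByClassName(element, className) {
  var found = [];
  element.getChildren().forEach(function (child) {
    var attr = child.getAttribute('class'); // null when the attribute is absent
    var classes = attr ? attr.getValue().split(/\s+/) : [];
    if (classes.indexOf(className) !== -1) found.push(child);
    found = found.concat(getElementsByClassName(child, className));
  });
  return found;
}

// Hypothetical helper: pair each <h1> under `root` with the "text" spans
// found in the sibling element that immediately follows it.
function rootToJson(root) {
  var body = [];
  var children = root.getChildren();
  children.forEach(function (child, i) {
    if (child.getName() !== 'h1') return;
    var next = children[i + 1];
    var spans = next ? getElementsByClassName(next, 'text') : [];
    body.push({
      title: child.getText(),
      text: spans.map(function (span) { return span.getText(); })
    });
  });
  return { body: body };
}
```

In Apps Script this would be called as rootToJson(XmlService.parse(xml).getRootElement()), where xml is a well-formed string.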

Romain Vialard

Apr 16, 2019, 4:24:50 AM
to Google Apps Script Community
Here's an example of functions getElementById(), getElementsByClassName() & getElementsByTagName() that work well with the XML Service:

But indeed, if the XmlService returns an error when parsing your HTML, it won't be helpful.
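
Two likely reasons a fragment like the one in the question fails to parse are that <img> is never closed and that there is no single root element, both of which XML requires. A rough sketch of a pre-processing step, assuming those are the only problems (it does not handle HTML entities such as &nbsp;, unquoted attributes, or other non-XML markup). The name toWellFormedXml and the <root> wrapper are invented for this example.

```javascript
// Elements that HTML allows without a closing tag.
var VOID_TAGS = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
                 'input', 'link', 'meta', 'source', 'track', 'wbr'];

// Hypothetical helper: self-close void elements and wrap the fragment
// in a single <root> element so an XML parser can accept it.
function toWellFormedXml(html) {
  var voidPattern = new RegExp(
    '<(' + VOID_TAGS.join('|') + ')\\b([^>]*?)\\s*/?>', 'gi');
  // <img src="a.png"> and <img src="a.png" /> both become <img src="a.png"/>
  var closed = html.replace(voidPattern, '<$1$2/>');
  return '<root>' + closed + '</root>';
}
```

In Apps Script the result could then be passed to XmlService.parse(toWellFormedXml(html)) and read through getRootElement().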
You could try the XmlService on a substring of your HTML, or indeed fall back to regex or search and split if you are not able to use the XmlService at all.
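
For the regex / search-and-split fallback, here is a sketch that only works under the exact layout from the question: each <h1> is followed by one <div>, and the wanted spans have a double-quoted class attribute containing "text". The function name htmlToJson is made up, and nested or differently formatted markup would break it.

```javascript
// Hypothetical fallback: split the HTML into one chunk per <h1>, then pull
// the title and the "text" spans out of each chunk with regular expressions.
function htmlToJson(html) {
  var body = [];
  // The lookahead keeps each <h1 at the start of its own chunk.
  var sections = html.split(/(?=<h1\b)/i);
  sections.forEach(function (section) {
    var title = /<h1\b[^>]*>([\s\S]*?)<\/h1>/i.exec(section);
    if (!title) return; // skip anything before the first <h1>
    var entry = { title: title[1].trim(), text: [] };
    var spanPattern =
      /<span\b[^>]*class="[^"]*\btext\b[^"]*"[^>]*>([\s\S]*?)<\/span>/gi;
    var match;
    while ((match = spanPattern.exec(section)) !== null) {
      entry.text.push(match[1].trim());
    }
    body.push(entry);
  });
  return { body: body };
}
```

JSON.stringify(htmlToJson(html)) then gives the string form the question asks for.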