HTTP Get to scrape data from a site


Duckredbeard

Jul 21, 2021, 9:37:47 PM
to Tasker
I am using this task to scrape the website shown in A2 for the status of a local mountain biking trail. The desired end product is a variable that reflects the trail status. The wrench in the works is that the person who updates the trail status on this site does not use a consistent format, and it seems that more than one person can change the site. If you visit the site, you will see a statement about the status in large print. Under that there is a note in which the words "open" and "closed" each occur a few times, so simply matching those words in A3 would cause problems, since both occur. The site also has a clickable "next" link that shows past trail status updates, and those confirm there is no consistent message format: sometimes the two trail names are reversed, sometimes the word "parks" is added, sometimes there are spelling errors, and sometimes the closure is justified (wet, race, tree).

Besides asking the site maintainers to use standard words or phrases, how can I use step A3 to reliably scrape the trail status? Will I have to list every possible match in the regex in A3? Or could I count the occurrences of the words "open" and "closed" and let the higher count win?
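
For illustration only, the "higher count wins" idea can be sketched outside Tasker in a few lines of Python. The function name `count_vote` and the sample text are made up for this sketch and are not taken from the real site. It counts whole-word occurrences case-insensitively and reports a tie as unknown.

```python
import re


def count_vote(page_text: str) -> str:
    """Return 'OPEN', 'CLOSED', or 'UNKNOWN' by comparing whole-word counts."""
    # Lowercase first so "CLOSED" and "closed" are counted together.
    words = re.findall(r"[a-z]+", page_text.lower())
    n_open = words.count("open")
    n_closed = words.count("closed")
    if n_open > n_closed:
        return "OPEN"
    if n_closed > n_open:
        return "CLOSED"
    return "UNKNOWN"  # tie, or neither word present


# Hypothetical sample text, not copied from the actual page.
sample = (
    "Blankets Creek and Rope Mill are CLOSED. "
    "Trails stay closed until dry; we will post here when they are open."
)
print(count_vote(sample))
```

Because the note under the headline mentions both words, the vote can be swayed by text that has nothing to do with the current status, so this is probably only a fallback.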


```
Task: Blankets

A1: Vibrate [
     Time: 46 ]

A2: HTTP Request [
     Method: GET
     Timeout (Seconds): 30
     Structure Output (JSON, etc): On ]

A3: Simple Match/Regex [
     Type: Regex
     Text: %http_data
     Regex: (Blankets Creek are open|Rope Mill are CLOSED) ]

A4: Run Shell [
     Command: date +"%H:%M"
     Timeout (Seconds): 0
     Store Output In: %BLANKETSTRAILTIME ]

A5: If [ %mt_match ~ Blankets Creek are open ]

    A6: Variable Set [
         Name: %BLANKETSSTATUS
         To: OPEN
         Max Rounding Digits: 3
         Structure Output (JSON, etc): On ]

A7: Else

    A8: Variable Set [
         Name: %BLANKETSSTATUS
         To: CLOSED
         Max Rounding Digits: 3
         Structure Output (JSON, etc): On ]

A9: End If

A10: Minimalistic Text [
      Configuration: %BLANKETS = %BLANKETSSTATUS (%BLANKETSTRAILTIME)
      Timeout (Seconds): 0
      Structure Output (JSON, etc): On ]

A11: Notify [
      Title: Blankets Creek
      Text: Blankets Creek trail system is now open
      Icon: mw_maps_directions_bike
      Number: 0
      Priority: 3
      LED Colour: Red
      LED Rate: 0 ]
    If  [ %BLANKETSSTATUS ~ OPEN ]

A12: Vibrate [
      Time: 46 ]

A13: Flash [
      Text: %BLANKETSSTATUS (%BLANKETSTRAILTIME)
      Long: On ]


```

jmjc...@gmail.com

Jul 21, 2021, 9:59:55 PM
to Tasker
%http_data holds the content of the whole web page. If we extract only the "statement" from %http_data, we can then check whether that "statement" contains "open" or "closed". Would that work in all scenarios?

Duckredbeard

Jul 21, 2021, 10:17:52 PM
to Tasker
Yes. But how do we isolate this "statement"?

jmjc...@gmail.com

Jul 21, 2021, 11:45:42 PM
to Tasker
You can use the Variable Search Replace or Simple Match/Regex action to achieve that. I took a quick look at the web page, and this seems to work:

Simple Match/Regex action:
Type: Simple
Text: %http_data
Match Pattern: data-title="$status"

After this action, the statement will be stored in %status.

So:

if %status ~ *open ...
if %status ~ *closed ...
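
The same two steps (pull out the statement, then test it) can be sketched in Python for anyone who wants to try the logic outside Tasker. This is only a rough equivalent: the function name `status_from_title` and the sample HTML are hypothetical, and the regex assumes the statement sits in a double-quoted `data-title` attribute, as the Simple Match pattern above does.

```python
import re


def status_from_title(html: str) -> str:
    """Extract the first data-title attribute and classify it."""
    # Rough regex equivalent of the Simple Match pattern data-title="$status".
    m = re.search(r'data-title="([^"]*)"', html)
    if m is None:
        return "UNKNOWN"  # page layout changed or attribute missing
    title = m.group(1).lower()
    has_open = re.search(r"\bopen\b", title) is not None
    has_closed = re.search(r"\bclosed\b", title) is not None
    if has_open and not has_closed:
        return "OPEN"
    if has_closed and not has_open:
        return "CLOSED"
    return "UNKNOWN"  # both or neither word found


# Hypothetical markup, not copied from the real site.
sample = '<article data-title="Blankets Creek and Rope Mill are OPEN">...</article>'
print(status_from_title(sample))
```

Unlike a pattern anchored to the end of the text, this checks for the word anywhere in the statement, so a trailing reason such as "(wet)" would not prevent a match.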

joaomgcd

Jul 22, 2021, 4:18:48 AM
to Tasker
What if you simply use

%http_data[.entry-title]

? 😊
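
As I read it, the bracket syntax selects elements by CSS query, here everything with the class `entry-title`. A rough Python sketch of that selection, using only the standard library, might look like the following. The class `EntryTitleParser`, the helper `entry_titles`, and the sample markup are made up for illustration; the real page's structure is an assumption.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}


class EntryTitleParser(HTMLParser):
    """Collect the text of every element whose class list includes 'entry-title'."""

    def __init__(self):
        super().__init__()
        self.titles = []   # finished title strings, in page order
        self._depth = 0    # > 0 while inside an entry-title element
        self._buf = []     # text fragments of the title being read

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1  # nested element inside the title
        elif "entry-title" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace before storing the title.
            self.titles.append(" ".join("".join(self._buf).split()))

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def entry_titles(html: str) -> list:
    parser = EntryTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


# Hypothetical markup, not copied from the real site.
sample = (
    '<h2 class="entry-title"><a href="#">Blankets Creek and Rope Mill are OPEN</a></h2>'
    '<h2 class="entry-title">Older update</h2>'
)
print(entry_titles(sample)[0])
```

If the page lists several updates, the first match is presumably the newest one, but that ordering is an assumption worth checking against the live page.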

jmjc...@gmail.com

Jul 22, 2021, 5:07:33 AM
to Tasker
Yep, that will do as well!