Get Text from Website


kunj thakker

May 19, 2019, 6:02:54 AM
to MIT App Inventor Forum
I just want to grab text (the current IST time) from https://www.timeanddate.com/time/zones/ist
Could I possibly do that?
There are some posts about this,
but I couldn't find any scraping examples.

TimAI2

May 19, 2019, 7:46:11 AM
to MIT App Inventor Forum
Use the Web component, then use text split blocks or parsing to get the data you want.
In the blocks below I scraped the page, then extracted the IST time.

blockswebscrape.png


kunj thakker

May 19, 2019, 8:19:24 AM
to MIT App Inventor Forum
How did you get the ones marked in the image?
blockswebscrape.png

TimAI2

May 19, 2019, 8:45:05 AM
to MIT App Inventor Forum
I looked at the full web scrape and identified the split points from there.

The first one was easy "IST time now" as I could see that on the rendered page...
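
For anyone who thinks in text code rather than blocks, the split idea can be sketched roughly like this in Python. Only the "IST time now" marker comes from this post. The sample page text, the `</span>` end marker and the helper name `text_between` are made up for illustration, so the real page will need its own split points.

```python
def text_between(page, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker.

    This mirrors two "split at first" steps: keep what follows start_marker,
    then keep what precedes end_marker. Returns None if a marker is missing.
    """
    _, found, rest = page.partition(start_marker)
    if not found:
        return None
    value, found, _ = rest.partition(end_marker)
    if not found:
        return None
    return value.strip()


# Hypothetical scrape result; the real page source will look different.
sample_page = "<h2>IST time now</h2><span id='clk'> 4:15:09 pm </span><p>more</p>"

# Step 1: cut from the visible heading up to the tag that closes the value.
chunk = text_between(sample_page, "IST time now", "</span>")

# Step 2: drop the leftover markup in front of the value.
time_text = chunk.rpartition(">")[2].strip()
```

The split points are picked by reading the full scrape, exactly as described above, so they stop working as soon as the page layout changes.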

kunj thakker

May 19, 2019, 9:13:42 AM
to MIT App Inventor Forum
What if I want to learn this kind of thing?

TimAI2

May 19, 2019, 9:40:04 AM
to MIT App Inventor Forum
You just have !!

Seriously, my example more or less tells all there is to know. (ABG will no doubt be along with more...)

As mentioned you can do more detailed parsing of the web scrape using Taifun's parsing method procedure

Regular success relies on things remaining the same on the web page you are scraping

ABG

May 19, 2019, 12:47:51 PM
to MIT App Inventor Forum

TimAI2

May 19, 2019, 1:23:25 PM
to MIT App Inventor Forum
:)

kunj thakker

May 21, 2019, 7:09:38 AM
to MIT App Inventor Forum
Not Working
Please Help
IMG-2210144563321.png

TimAI2

May 21, 2019, 10:03:05 AM
to MIT App Inventor Forum
Doesn't appear to be anything wrong with the url, works in a browser, and worked in my example.
Network connectivity issue ?

ABG

May 21, 2019, 12:10:30 PM
to MIT App Inventor Forum
What happens if you change the https: to http:
in the URL?

ABG

kunj thakker

May 22, 2019, 7:16:44 AM
to MIT App Inventor Forum
Changed the URL to http:, but look at the result now:
IMG-2210144563322.png

ABG

May 22, 2019, 10:00:35 AM
to MIT App Inventor Forum
It looks like timeanddate.com noticed that you asked for
http:// and redirected you back to https:// ,
judging from the error message it gave you.

So much for that idea.

ABG

ABG

May 22, 2019, 10:18:23 AM
to MIT App Inventor Forum
Some more ideas ...

That particular web site looks like it hates to be scraped.
It might be using redirects to defend itself against that.
Research this search term:   how to scrape a redirect

Also, if all you want is the time in a particular country, maybe there is a governmental web site
more scrape friendly?

Also, look at the Clock component in the Sensors drawer, if you haven't yet done so.

ABG

TimAI2

May 22, 2019, 11:42:02 AM
to MIT App Inventor Forum
I can scrape the website just fine (after all we are simply capturing the html as found in view source in our response content)

Are you trying to do anything else at the same time? (Show more blocks above Web1.get...)

kunj thakker

May 23, 2019, 5:09:08 AM
to MIT App Inventor Forum
@ABG there Is One Govt. Website https://www.irctc.co.in/nget/train-search

kunj thakker

May 23, 2019, 5:14:14 AM
to MIT App Inventor Forum
there you go @TimAI2
If possible could you please send me .aia file of the application you made where it worked fine?
IMG-2210144563323.png

TimAI2

May 23, 2019, 5:46:08 AM
to MIT App Inventor Forum
Can't see anything there that would affect things....

See attached aia, tested in Companion 2.52u Android 9 and as installed app on Android 9

I added a button toggle so you can switch between returning just the time or all the html from the page
getTime.aia

ABG

May 23, 2019, 9:11:23 AM
to MIT App Inventor Forum
@ABG there Is One Govt. Website https://www.irctc.co.in/nget/train-search

A quick way to see if a web page can be scraped easily:

In Chrome, view the web page and find something you want to scrape, like '2019'.
Type Ctrl-U to see the source of the page
(on other browsers, this might be a different keyboard code).

Use the browser's Ctrl-F (Find) facility to search for the desired text ('2019'). 
If you found it, you can probably scrape it.
If you couldn't find it, it is probably generated by JavaScript on the fly, and you can't scrape it easily.
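
The same check can be written as a tiny Python sketch: search the raw page source for the text you want. The function name and both sample pages are invented for illustration. In practice the page source would come from an HTTP GET (the Web component's responseContent in App Inventor).

```python
def probably_scrapable(page_source, wanted_text):
    """True if wanted_text occurs literally in the raw page source.

    This is the Ctrl-U then Ctrl-F test: text that only appears after
    JavaScript has run will not be found in the raw source.
    """
    return wanted_text.lower() in page_source.lower()


# Hypothetical page where the date is part of the delivered HTML.
static_page = "<div id='clock'><p>Saturday, May 25, 2019</p></div>"

# Hypothetical page where a script fills in the date after loading.
dynamic_page = "<div id='clock'></div><script src='clock.js'></script>"

static_ok = probably_scrapable(static_page, "2019")
dynamic_ok = probably_scrapable(dynamic_page, "2019")
```

A hit only means the text is present in the source. You still have to find stable split points around it.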

ABG

kunj thakker

May 24, 2019, 4:06:24 AM
to mitappinv...@googlegroups.com
TimAI2 it still doesn't work
Screenshot_2019-05-24-13-33-32.png

TimAI2

May 24, 2019, 4:56:33 AM
to MIT App Inventor Forum
From what I can see in your blocks image, that URL will not work. It looks like you have it in there twice?

https://.....train-searchhttps:///.....   <<<< ???

kunj thakker

May 24, 2019, 4:58:22 AM
to MIT App Inventor Forum
Wait Tim I am trying alternate websites

kunj thakker

May 24, 2019, 6:07:26 AM
to MIT App Inventor Forum
I found a website that meets the criteria ABG mentioned, but it still isn't working.
Have a look at this and tell me what is wrong.
I am also attaching an .aia file to help sort out the problem.
Is the phone's Android version a problem?
because I still use android KitKat 4.4.3
2.png
blocks.png
Untitled.png
getTime (1).aia

TimAI2

May 24, 2019, 7:17:08 AM
to MIT App Inventor Forum
"because I still use android KitKat 4.4.3"

Could be ? I can't test as my earliest version is 5.x.  Do you have another device to try, or an emulator ?

Also just test with putting all the responseContent into a label (as in my example), once you get that working you can figure out the text manipulation to get the data you want.

Ghica

May 24, 2019, 8:51:48 AM
to MIT App Inventor Forum
The really bad thing about your implementation was that you set the timer to 1, which means 1 millisecond. No website is able to respond that fast.
Also, I think your parsing was not right. This implementation works (I put hours, minutes and seconds in separate labels, for easier coding):

blocks (10).png



Have fun with it. 
Cheers, Ghica

getTime_2.aia

kunj thakker

May 25, 2019, 5:09:55 AM
to MIT App Inventor Forum
So I should set the timer to 1000 milliseconds?
Chrome is not letting me download the .aia file you sent, Ghica.

Ghica

May 25, 2019, 10:24:37 AM
to MIT App Inventor Forum
Kunj,
I could not download your .aia either, using Chrome, but with Firefox you can. Or, I put up a link now: https://drive.google.com/file/d/1jiDtWYfB5a6WeMYhwMkbcbp42yUuXjlJ/view?usp=sharing

Actually, I do not think you should set a timer at all. Think of it: you are going to access this website every second, or 60 times a minute, or 3600 times an hour ... 
But a timer event every millisecond is crazy.
Sooner rather than later this website will block you, because you are using up its precious traffic.

What you could do is find the time once from this website, and then at the GotText event, enable the timer and update your clock display at every Clock timer event.
Even better would be to not use this website at all and use the Clock exclusively. If you are running your app in the timezone you want to see, it is easy, see the documentation: http://ai2.appinventor.mit.edu/reference/components/sensors.html#Clock
Otherwise you have to do some block juggling and look here for the time zone constants you can use: https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html

But this thread was about reading text from a web page. A good way to do this is to find the URL you want to look at, display the website in a browser, click with your right mouse button on the text, choose "view page source" (or similar), which will give you a page of HTML.
It is not always possible or easy to find what you are looking for, but in your case it was: I found this piece of text:
    <h1>Current IST time</h1>
    <div class="clock_block">
        <div class="next-to-banner">
            <div class="current-time">
                <span class="fullscreen-ico"></span>
                <p>IST</p>
                <div id="cityClock">
                    <div class="time">
                        <span class="hours">7</span><span class="minutes">51</span><span class="seconds">27</span><sup>pm</sup>
                    </div>
                    <p>Saturday, May 25, 2019</p>
                </div>
            </div>
        </div>

So, you only need to look at class="hours">find whatever is here</ ... etc.
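
As a rough illustration of that "look at class="hours"> ... </" idea outside of blocks, here is a minimal Python sketch. The helper name `span_text` is made up, and the snippet is just the relevant part of the HTML quoted above. A regular expression like this is as brittle as a text split: it only works while the page keeps these exact class names.

```python
import re

# The relevant part of the page source quoted above.
SNIPPET = '''<div class="time">
    <span class="hours">7</span><span class="minutes">51</span><span class="seconds">27</span><sup>pm</sup>
</div>'''


def span_text(html, css_class):
    """Return the text that follows class="css_class"> up to the next tag, or None."""
    match = re.search(r'class="%s">([^<]*)</' % re.escape(css_class), html)
    return match.group(1).strip() if match else None


hours = span_text(SNIPPET, "hours")
minutes = span_text(SNIPPET, "minutes")
seconds = span_text(SNIPPET, "seconds")
```

Each value comes back as text, so it still has to be converted to a number before doing any clock arithmetic with it.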

Cheers, Ghica.


kunj thakker

May 28, 2019, 5:15:46 AM
to MIT App Inventor Forum
Ghica, so you mean get the time once and then use
  Clock1.MakeInstantFromParts
and put in the values scraped from the website?
  When Screen1.Initialize
    Set Clock2.TimerInterval = 1000
And then use another timer:
  When Clock2.Timer
    Call Clock1.AddSeconds
      Instant = ? (tell me what should be here)
      Quantity = 1

TimAI2

May 28, 2019, 5:23:08 AM
to mitappinv...@googlegroups.com
If you are going to do this, there is actually no need to visit and scrape the website at all, just use the device's internal clock?
If you need to be always working with "IST" (Indian Standard Time) you can always program this into your blocks

I will refer you to a previous thread you posted about the same matter:

kunj thakker

May 28, 2019, 5:48:17 AM
to MIT App Inventor Forum
The phone's time can be changed by the user at any time.

Won't the internal time of every phone vary?

TimAI2

May 28, 2019, 8:34:22 AM
to MIT App Inventor Forum
You have everything you need in AI2 for this. All you need is a clock timer and some blocks to do the conversion
The instant generated by the device issues a zone offset and a daylight saving time offset
I found I needed to add 3000ms to the generated time output due to calculation time
I have assumed that negative offsets will have a minus sign in front of them!
aia project attached

Of course, by adjusting the value in IST, and the content of a few labels/titles, this can be used for any time zone


Here are the blocks:

blocksIST.png


getIST.aia

Ghica

May 28, 2019, 11:08:06 AM
to MIT App Inventor Forum
@Tim, this is a rather brutal way of doing this calculation, which also requires a lot of non AI knowledge.
Here is an alternative:

Snap23.png


It is based on the knowledge that IST is 5:30 hrs = 330 minutes ahead of UTC. I can get my local difference with UTC by asking for the Z parameter, which gives me +0200 where I am at.
Then, you find now and add the difference between your time and IST time. Try it. I attached an .aia which also still can get the time from the website. I used global variables to be able to use DoIt for debugging.
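
The arithmetic behind this can be sketched in Python as follows. The function names and the example time are invented for illustration. The 330 minutes and the "+0200" format of the Z parameter are taken from the explanation above, and the sketch assumes a negative offset starts with a minus sign.

```python
from datetime import datetime, timedelta

IST_OFFSET_MINUTES = 330  # IST is 5:30 hrs ahead of UTC


def parse_z_offset(z):
    """Convert a Z pattern result such as '+0200' or '-0530' to minutes."""
    sign = -1 if z[0] == "-" else 1
    digits = z.lstrip("+-")
    return sign * (int(digits[:2]) * 60 + int(digits[2:4]))


def local_to_ist(local_now, z):
    """Shift a local wall-clock time by the gap between the local offset and IST."""
    return local_now + timedelta(minutes=IST_OFFSET_MINUTES - parse_z_offset(z))


# Hypothetical local time on a device whose Z parameter is +0200.
example = local_to_ist(datetime(2019, 5, 28, 17, 8, 6), "+0200")
```

With a local offset of +0200 the gap is 330 - 120 = 210 minutes, so 17:08:06 local time becomes 20:38:06 IST.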

@Kunj
It is time for you to say what you actually want to do with this IST calculation.
Keep in mind that the time on your phone usually is the time set by your phone provider, although you can set it to something different if you wish. If you do that, I am not really sure how that influences any of the calculations above.

If you want something like the duration of usage of your app, you need to do something very different. And when do you need to know the IST time? At startup? 

Cheers, Ghica.
getTime_3.aia

TimAI2

May 28, 2019, 11:44:46 AM
to MIT App Inventor Forum
I like the "Z" parameter :)

kunj thakker

May 30, 2019, 7:10:06 AM
to MIT App Inventor Forum
Ghica, I have made a banking app, and I need the time for its transaction history.
If users change the time on the phone, it could be used for fraud.
For example, a transaction was made on 30-May-19 4:33:46 PM and the user could change the phone time to 03-May-19 4:33:46 PM.
That is the main reason I insist on using the time from a website.

NOTE - The Banking app has no access to the user's bank account and thereby to his/her money . The term fraud is limited to the virtual currency used in the app which may or may not be changed anytime by the owner of  the app.

kunj thakker

May 30, 2019, 8:01:16 AM
to MIT App Inventor Forum
Ghica so you mean get the time once from website and then use 
Ghica Please help 

TimAI2

May 30, 2019, 9:04:14 AM
to MIT App Inventor Forum
Try World Clock api


This returns a json of current UTC, which you can then use to align with IST

{"$id":"1","currentDateTime":"2019-05-30T13:01Z","utcOffset":"00:00:00","isDayLightSavingsTime":false,"dayOfTheWeek":"Thursday","timeZoneName":"UTC","currentFileTime":132036948684164917,"ordinalDate":"2019-150","serviceResponse":null}

You may have to account for DST...
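
A minimal Python sketch of "align with IST", using the JSON shown above as input. The function name is made up, and the sketch assumes that `currentDateTime` always has the `2019-05-30T13:01Z` shape. As far as I know IST is a fixed UTC+5:30 with no daylight saving time, so the sketch applies no DST adjustment. Another time zone may need one.

```python
import json
from datetime import datetime, timedelta

# The response body quoted above.
RESPONSE = (
    '{"$id":"1","currentDateTime":"2019-05-30T13:01Z","utcOffset":"00:00:00",'
    '"isDayLightSavingsTime":false,"dayOfTheWeek":"Thursday","timeZoneName":"UTC",'
    '"currentFileTime":132036948684164917,"ordinalDate":"2019-150","serviceResponse":null}'
)


def utc_json_to_ist(body):
    """Read currentDateTime (UTC, minute precision) and shift it by +5:30."""
    data = json.loads(body)
    utc_time = datetime.strptime(data["currentDateTime"], "%Y-%m-%dT%H:%MZ")
    return utc_time + timedelta(hours=5, minutes=30)


ist_time = utc_json_to_ist(RESPONSE)
```

Note that this field carries no seconds, so the result is only accurate to the minute.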

Ghica

May 30, 2019, 12:38:35 PM
to MIT App Inventor Forum
Anything that gets information from the web you should use sparingly, be it worldclockapi.com or 24timezones.com, otherwise you will be thrown out sooner or later and your user will not be happy because of all the data traffic he has to pay for.
Adding a second to your initial value every second is not a good idea either, because, when your app stays in the background for a while, the clock will stop counting.
The best way is to store the local "now" time at app start-up, together with the IST time from the web,  and look at the duration your app was running every second.

Even that is not totally fool proof, because the user may tamper with the local time on the phone while the app is running and then the duration could still be off.
Some blocks:
From Web1.GotText:

Snap23.png

And the clock timer event (the new IST time is shown in label6):

Snap24.png


I also attached the updated .aia
Try to understand how this works, and think about places where you could verify that you still have the right time, without going to the web every second.
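
For readers without the block images, the idea can be sketched in Python like this. It is not a transcription of the blocks. The class name and the example times are invented, and it only illustrates "store the local now together with the IST time from the web, then add the elapsed duration".

```python
from datetime import datetime


class IstTracker:
    """Keeps the IST time fetched once from the web plus the local time of that fetch."""

    def __init__(self, ist_from_web, local_at_fetch):
        self.ist_from_web = ist_from_web
        self.local_at_fetch = local_at_fetch

    def current_ist(self, local_now):
        # Elapsed local time since the fetch, added to the fetched IST time.
        # The result stays correct when timer ticks are skipped while paused.
        return self.ist_from_web + (local_now - self.local_at_fetch)


# Hypothetical values: IST was scraped at 15:01:00 local time.
tracker = IstTracker(
    ist_from_web=datetime(2019, 5, 30, 18, 31, 0),
    local_at_fetch=datetime(2019, 5, 30, 15, 1, 0),
)

# Five and a half minutes later, even if no timer event fired in between.
after_pause = tracker.current_ist(datetime(2019, 5, 30, 15, 6, 30))
```

Because the result depends on the difference between two local readings, it is still thrown off if the user changes the phone time while the app is running, as noted above.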

Cheers, Ghica.

getTime_4.aia

kunj thakker

May 31, 2019, 7:08:27 AM
to mitappinv...@googlegroups.com
, when your app stays in the background for a while, the clock will stop counting.
That won't matter: when the app starts, the time will be scraped from the website just once, and then 1 second will be added every second for as long as the application is running.

Ghica

May 31, 2019, 8:24:36 AM
to MIT App Inventor Forum
It does matter. If your app is in use and the phone screen switches off after a while, or the person answers a phone call in the meantime, or uses any other app, then the app will be paused and the clock will stop.
Anyway, the blocks I gave you are just as easy to implement and do not have this problem. As you see the time is scraped once and then, every second or longer if the app was paused, the time will be adapted correctly using the duration. And that does not depend on the timezone of the user.
Cheers, Ghica.

kunj thakker

Jun 2, 2019, 8:27:07 AM
to MIT App Inventor Forum
Ghica I will Check and Let You Know