Data of Indian Railways


srinivas kodali

Sep 24, 2015, 8:33:56 AM
to datameet
Hi guys,

I have been writing a series of posts to highlight the data available within Indian Railways. I am not sharing any data as of now, but listing how and where you can find the datasets.


Regards,
Srinivas Kodali

Anand Chitipothu

Sep 24, 2015, 8:48:18 AM
to data...@googlegroups.com
Very interesting. Looking forward to more posts.

Anand

srinivas kodali

Oct 6, 2015, 7:35:13 AM
to datameet
The new version of Trains at a Glance (TAG) is out. This post is basically about how to mine TAG to get train numbers.


Also, OGD (data.gov.in) released a set of train details a month or so ago, 2810 trains to be specific, which can be found here: https://data.gov.in/catalog/indian-railways-train-time-table-0#web_catalog_tabs_block_10

Regards,
Srinivas Kodali


Nikhil VJ

Dec 25, 2018, 12:02:03 AM
to datameet
Hi folks,

There's a project afoot in the OpenStreetMap and Wikidata communities to get together Indian Railways data.

One major part of it: Properly mapping all the railway stations of India, and ensuring they have wikidata entries.

Here's a wiki page set up for it: 

I'm cross-posting from OpenStreetMap India Telegram group:

(Arun Ganesh): There seem to be around 7000 stations located. There are still ~1.5k missing. A lot more need names, refs and wikidata links. Overpass: http://overpass-turbo.eu/s/EC4


(Srihari Thalla) : Last year I created two MapRoulette Challenges to tag station codes and add Wiki tags

--------

The Overpass query above covers the whole country and may be slow or time out. I adapted the query to work only on the map area being viewed, so you can zoom into smaller regions. I also changed a few things and included a legend in the comments to explain.

https://overpass-turbo.eu/s/EL9


Want to get involved? Engage here.



Regards

Nikhil VJ

Pune, India


PS: Posting on an older thread from '15 that had the perfect subject line; I didn't want to create yet another new thread. Pro tip: Use Datameet from Google Groups. It's more fun and you can find stuff that was posted long before you joined.


Arun Ganesh

Jan 3, 2019, 3:25:19 AM
to datameet
Thanks Nikhil for compiling all the links and improving the overpass query!

Thinking about the best way to scale this exercise so others can join. Having a task list like the MapRoulette challenge is convenient for a newcomer. The only issue is that not every station has an entry on Wikidata.

Maybe a preparatory task can be to import all the stations, codes and operating division into Wikidata. The most updated and official source for this seems to be the Rates Branch System site: http://rbs.indianrail.gov.in/ShortPath/StationFront.jsp

What's the best way to scrape the data from here?

Rajaram K

Jan 3, 2019, 10:13:22 PM
to data...@googlegroups.com
Hi I can help with scraping. Let me know. 

Jasvinder Singh

Jan 3, 2019, 11:31:07 PM
to data...@googlegroups.com
Dear All,

Not all the members are familiar with intricacies of the data collection for such projects. Since this seems to be a crowd sourcing endeavour, I suggest that the basic data collection protocol be enumerated for newbies so that they can also contribute data which can then be put in proper format by professionals. 

Regards,

Jasvinder Singh

Arun Ganesh

Jan 4, 2019, 1:23:38 AM
to datameet
The beauty of the internet: Srihari has already built the crawler: https://twitter.com/sriharithalla/status/1080801313707896837


I'm in the process of doing a little bit of cleanup using openrefine and will share on a spreadsheet.

Arun Ganesh

Jan 4, 2019, 1:41:37 AM
to datameet

There are 16,770 station entries, of which 11,660 seem to be currently operational, judging by an expiry date of 2999.

Filtering out goods stations, there are 9835 entries. This still seems to include a few yards and cabins that are not legitimate stations. Also noticed quite a few spelling and formatting issues in the names. The station codes look correct. Some amount of manual cleanup is needed on this list. 

The official count according to IR is 7349 stations (as of 2017) and 1817 halts/block huts (2013).
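
A minimal sketch of the two filters described above (keep entries whose expiry date is in 2999, then drop goods stations). The column names `expiry_date` and `is_goods` and the sample rows are made up for illustration; the scraped sheet may label and encode these fields differently.

```python
import csv
import io

# Hypothetical columns and made-up rows; only the shape matters here.
SAMPLE = """code,name,expiry_date,is_goods
NDLS,NEW DELHI,31-12-2999,N
XYZ,OLD SIDING,31-03-2005,N
ABCG,ABC GOODS SHED,31-12-2999,Y
"""


def operational_passenger_stations(rows):
    """Keep rows that expire in 2999 and are not flagged as goods stations."""
    kept = []
    for row in rows:
        still_open = row["expiry_date"].strip().endswith("2999")
        is_goods = row["is_goods"].strip().upper() == "Y"
        if still_open and not is_goods:
            kept.append(row)
    return kept


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
stations = operational_passenger_stations(rows)
print([r["code"] for r in stations])
```

Yards and cabins would still need a manual pass, since nothing in this sketch can tell them apart from stations.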

Jasvinder Singh

Jan 4, 2019, 2:13:17 AM
to data...@googlegroups.com
Dear Arun,
Exactly the type of simple data sheet that newbies can understand. However, how is location (coordinates) linked in this file?

Nikhil VJ

Jan 4, 2019, 5:21:04 AM
to datameet
Hi, 

If there's interest I can set up a mapping interface to crowdsource lat-longs. Along the lines of this: https://fuzzymapper.herokuapp.com/ 
But it would take me a week to set up, so let me know. Though it would be best to get existing lat-long sets with the official station code and import them in, this can help cover the laggards.


Also, if collective work on OpenRefine is required then I can help set it up on cloud and keep protected edit access. That won't take more time.


Regards
Nikhil VJ, Pune

Sajjad Anwar

Jan 4, 2019, 5:28:34 AM
to datameet
Hi! 

There are some ~8900 station coordinates that we scraped a while ago here https://github.com/datameet/railways
Most of these have station codes so if we want to match with the list Arun generated we could do it. And then run another round of manual geocoding. 
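
A rough sketch of that matching step, assuming both lists can be reduced to a station code plus a few fields. The dict layout and the sample values are made up; the real repo and spreadsheet columns will differ.

```python
def match_by_code(stations, coords):
    """Attach lat/lon to each station whose code appears in coords.

    stations: list of dicts with a "code" key (hypothetical layout).
    coords: dict mapping station code -> (lat, lon).
    Returns (matched, unmatched) so the leftovers can go to manual geocoding.
    """
    matched, unmatched = [], []
    for st in stations:
        key = st["code"].strip().upper()  # normalise before comparing
        if key in coords:
            lat, lon = coords[key]
            matched.append({**st, "lat": lat, "lon": lon})
        else:
            unmatched.append(st)
    return matched, unmatched


# Made-up sample values, only to show the shape of the join.
stations = [
    {"code": "AAA ", "name": "Alpha"},
    {"code": "bbb", "name": "Beta"},
    {"code": "CCC", "name": "Gamma"},
]
coords = {"AAA": (12.5, 77.5), "BBB": (18.5, 73.8)}

matched, unmatched = match_by_code(stations, coords)
print(len(matched), "matched;", len(unmatched), "left for manual geocoding")
```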

Cheers,
Sajjad

Srihari Thalla

Jan 4, 2019, 10:52:05 AM
to datameet
Hi Sajjad,

I recently noticed that station codes from the Datameet railways repo are used to update Wikidata pages for the stations. Would it be possible to update the repo in reverse as well, with Arun's spreadsheet?

-- Srihari

Srihari Thalla

Jan 4, 2019, 10:52:12 AM
to datameet
Thanks for the mention Arun!

I have now updated the crawler - removing tabs, unwanted newlines, leading and trailing spaces in the data columns.
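
For anyone cleaning a similar scrape, here is a small sketch of that kind of whitespace cleanup. It is not the crawler's actual code, and it goes slightly further by also collapsing repeated spaces inside a value; the sample row is made up.

```python
import re


def clean_cell(value):
    """Turn tabs/newlines/runs of spaces into single spaces, then trim."""
    return re.sub(r"\s+", " ", value).strip()


def clean_row(row):
    """Apply clean_cell to every value of a dict-shaped row."""
    return {key: clean_cell(val) for key, val in row.items()}


row = {"code": " NDLS\t", "name": "NEW\n DELHI  "}
cleaned = clean_row(row)
print(cleaned)
```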

Here are the latest links to download:

Hope this helps!

@Jasvinder I think one way to extract locations for the stations is via Overpass, using the station codes and joining the results to the spreadsheet.
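
A sketch of how such an Overpass query could be built for a batch of station codes. It assumes the codes are stored in the OSM `ref` tag on `railway=station` nodes, which may not hold for every station; the function name, timeout and bounding box are illustrative.

```python
def station_query(codes, bbox):
    """Build an Overpass QL query for station nodes with the given codes.

    codes: iterable of station codes (assumed to live in the "ref" tag).
    bbox: (south, west, north, east) in degrees, to keep the query small.
    """
    pattern = "|".join(sorted({c.strip().upper() for c in codes}))
    south, west, north, east = bbox
    return (
        "[out:json][timeout:60];\n"
        f'node["railway"="station"]["ref"~"^({pattern})$"]'
        f"({south},{west},{north},{east});\n"
        "out body;"
    )


# Illustrative bounding box; adjust to the region being checked.
query = station_query(["pune", "NDLS"], (18.4, 73.7, 18.7, 74.0))
print(query)
```

The resulting text can be pasted into overpass-turbo; the returned lat/lon values can then be joined to the spreadsheet on the code.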

-- Srihari

Nikhil VJ

Jan 6, 2019, 12:48:58 AM
to datameet
Hi, 

We might be able to get some data from Train On Map site of Indian Railways?

https://enquiry.indianrail.gov.in/ntes/trainOnMap.jsp

(good to see the map is working again.. last few months it had been down after google api changes)


The browser console is yielding quite some stuff.

One JS file there is holding some data:

https://enquiry.indianrail.gov.in/ntes/js/stnCodesWithNamesArrayStr.js


(Note: you may see weird characters in the browser. That's Hindi in Unicode; save the file locally and the characters will display properly.)


----------

Another dataset we could get from here: Train (routes) data (unique code and name), with names in Hindi too.
See this API URL, I tried with wget and am getting results without having to do any cookie sessions etc:

Change last arg for different results.. It gives max 30 results.

I changed the output to CSV by this process; it can be scripted: 
1. Ran it through https://www.freeformatter.com/json-formatter.html, which put quotes around all the keys. (But one could script this too?)
2. There is a "function(){..." line which needs to be taken care of. 
3. In advanced text editor, did Find+Replace-All on the following terms:
[search string] >> [replace with] 
function(){return _LANG==="en-us"?" >> "
":" >> ", "trainNameHindi": "
"}, >> ",
(don't remove quotes)

4. Now it becomes valid JSON. OpenRefine recognizes it and converts it to tabular form, and even this site converts to CSV: https://konklone.io/json/
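
The steps above could be scripted roughly as follows. The sample text is made up in the shape implied by the find-and-replace terms; key names other than `trainNameHindi` are hypothetical, and the real response may need extra handling (for example, a string value that itself contains something like `,word:` would confuse the key-quoting regex).

```python
import csv
import io
import json
import re

# Made-up sample; only the function(){...} pattern is taken from the steps above.
RAW = (
    '[{trainNo:"00001",'
    'trainName:function(){return _LANG==="en-us"?"SAMPLE EXPRESS":"नमूना एक्सप्रेस"},'
    'trainType:"X"}]'
)

# Matches the language-switch function and captures both names.
FUNC = re.compile(r'function\(\)\{return _LANG==="en-us"\?"([^"]*)":"([^"]*)"\}')
# Matches an unquoted key that follows "{" or ",".
BARE_KEY = re.compile(r'([{,]\s*)([A-Za-z_]\w*)\s*:')


def to_valid_json(text):
    # Steps 2-3: replace the function with two plain string fields.
    text = FUNC.sub(r'"\1", "trainNameHindi": "\2"', text)
    # Step 1: put double quotes around the unquoted keys.
    return BARE_KEY.sub(r'\1"\2":', text)


def to_csv(records):
    # Step 4: flatten the list of objects into CSV text.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


records = json.loads(to_valid_json(RAW))
csv_text = to_csv(records)
print(csv_text)
```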



Regards
Nikhil VJ Pune, India

Arun Ganesh

Jan 6, 2019, 2:16:23 PM
to datameet
This is amazing, wrote a simple script and converted the station list into a CSV: https://docs.google.com/spreadsheets/d/1AFwl_5cB9qD39VWNox1LoeL3tGaGB22f7p4vc7IyMqY/edit?usp=sharing

There are 10,482 entries and 9,240 have a coordinate. The list includes many block cabins and freight sidings which are not stations.

Did a quick quality check: Yesvantpur is spelled Yasvantpur, and KSR Bengaluru still has the old name in Hindi, बेंगलोर सिटी जं.

Overall quite good, but sadly not perfect.


Jasvinder Singh

Jan 6, 2019, 11:35:03 PM
to data...@googlegroups.com
Dear Nikhil,

Saw the interface. It's quite simple and easy to use, even for newbies. However, is an Android version of the same available?
It would be easy to update the databank whenever one is traveling and comes across an unfamiliar station.

Regards,

Jasvinder Singh



Nikhil VJ

Jan 7, 2019, 12:30:56 AM
to datameet
Heads-up for folks in Pune : Mark your calendars for Jan 12th, DMers are organizing a meetup to get together and work on this data. We'll post details here when finalized.

-----------
Hi Arun,

Good is great, and I tell this to my interns : Never chase perfection, target 80% and take a break; then shoot for 80% in the next round and so on and we'll reach perfection iteratively. ;)

That explains why the site didn't seem to be fetching any lat-long data.. it was all in that js. 
I still can't make out how the map is loading the tracks, though.

------
More on the train (ie, route) info:
Two API calls are made by the TrainOnMap site to fetch route info:

Taking example of train no. 12438:

From this we get the following data:
- train (route) code, name
- schedule with each stop code, arrival, departure times. But this is main stops only.
- but lat-long of station codes not given.
- validity dates (at least from which date this schedule is active)
- days of the week it runs on (as a 7 digit binary num like 1001010 )
- train type
- journey duration
- latest journey actual times, delays, timestamp of last update
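
The 7-digit running-days value can be decoded with a few lines. Which weekday the first digit stands for is not stated in the response, so the Sunday-first order below is an assumption to verify against a train whose schedule is known.

```python
# Assumed order: first digit = Sunday. Swap this list if a known train says otherwise.
DAY_NAMES = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def running_days(mask, day_names=DAY_NAMES):
    """Return the day names whose position in the 7-character mask is '1'."""
    if len(mask) != 7 or set(mask) - {"0", "1"}:
        raise ValueError(f"expected 7 binary digits, got {mask!r}")
    return [name for bit, name in zip(mask, day_names) if bit == "1"]


days = running_days("1001010")
print(days)
```

Keeping the mask as a string (not an integer) preserves leading zeros, which matter for trains that do not run on the first day of the sequence.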

- This gives a flat json array (equivalent to flat table) of 205 rows,
- many station codes. Presumably, these are all the stations the train passes, not just the main ones. Hence, intermediate stops can be obtained from here.
- but no station names or lat-longs (That "Geo" makes the heart skip a beat but alas, no dice.)
- other timing info is all zero'd so N/A.
- day-count tells if it's the next day.
- the date parameter in the URL can presumably be taken from the response of previous query


---------------

Hi Jasvinder, 
Glad you liked it! We got some students together and mapped around 60 routes, 3000+ stops this past weekend with a next version of that. I'll start work on a version for railway station data.
By Android I assume you mean mobile-friendly. I've been trying for that, but it's a bit difficult to rearrange the various parts and make all the functionality work. I really doubt the precision needed can be obtained without a mouse. But you're welcome to have a go at it; we can talk it over.


Regards
Nikhil VJ
Pune, India

