Open Data: Choking India’s Innovation Pipe

Anchal Agarwal

Jun 22, 2017, 10:39:59 AM

to datameet

Hello Group, I have written an article in 'the Ken' on the state of open data in the country and on how the government is restricting access to the country's most important datasets. Please give your feedback on the article. Also, are there other datasets that are very important for the economy but sit behind restricted access? Please let me know of any such datasets; we will try to highlight them in the media and to the government.

Here is the link to my article: https://the-ken.com/choking-innovation-pipe/

Know Your Defaulter, or KYD, launched in mid-2016. The startup's software spiders crawled through dozens, even hundreds, of data sources each day: court cases, company filings, credit ratings and defaulter lists put out by banks and financial institutions. It allowed anyone to instantly conduct online due diligence on a company before entering into any sort of agreement with it. In less than a year, KYD had saved its users tens, possibly hundreds, of millions of dollars by helping them spot unscrupulous companies.

Best Trip, a trip-planning app that showed real-time road traffic conditions in India's top six metro cities, was credited with saving millions of commuter hours and a great deal of vehicle fuel, and with generating hundreds of millions of dollars in productivity and healthcare cost savings in just the 17 months since its launch in late 2015.

There's also the story of Compliance Scanner, which started in 2014 as Data Watch. After a first year spent meandering in search of a business purpose, it pivoted into a useful tool that allowed anyone to instantly identify the statutory compliance status of Indian firms under various laws. Regulators, lawyers, consultants, civic-minded citizens and NGOs all relied on it to spot gaps between what a company claimed in public and what it practised in private. It was single-handedly responsible for dozens of well-known companies being found to have fallen short of their provident fund (EPFO) obligations.

This is where we must apologize and tell you that none of the three examples above is true.

Know Your Defaulter, Best Trip and Compliance Scanner do not exist.

In fact, these startups cannot exist (and neither can the efficiency they generate) because of the way various Indian governments and regulators have either ignored or actively stymied the “Open Data” initiative.

What is open data?

“Open data and content can be freely used, modified, and shared by anyone for any purpose.”

Data = Innovation Fuel

A 2013 McKinsey research report showed how governments around the world could unlock an additional $3 trillion (yes, that’s right) in economic value merely by enabling open data across seven domains.

To quote from the report: “An estimated $3 trillion in annual economic potential could be unlocked across seven domains. These benefits include increased efficiency, development of new products and services, and consumer surplus (cost savings, convenience, better-quality products). We consider societal benefits, but these are not quantified. For example, we estimate the economic impact of improved education (higher wages), but not the benefits that society derives from having well-educated citizens. We estimate that the potential value would be divided roughly between the United States ($1.1 trillion), Europe ($900 billion) and the rest of the world ($1.7 trillion).”

The seven domains McKinsey identified were education, transportation, consumer products, electricity, oil & gas, healthcare and consumer finance.

Needless to say, these are some of the sectors where India needs innovation and transformation at scale. Today. If only.

Even though there were early indicators that the Indian government was interested in furthering open data and transparency initiatives, the last few years have been a big let-down. After finding the Right to Information (RTI) Act "somewhat wanting", the current government laid special emphasis on the need to release raw data in a machine-readable format. The Open Government Data platform, data.gov.in, was supposed to do exactly that. Even earlier, in 2011, India was among the countries involved in forming the Open Government Partnership (which has since been joined by 75 countries), but it withdrew just before the launch. With the benefit of hindsight, these pronouncements seem pretentious at best.

Higher-value datasets still remain behind access walls. In fact, in some cases access has become more difficult since the government started driving its open data movement. Much of this data used to be publicly accessible, but access is increasingly being restricted. The government is opening up less useful datasets while blocking access to richer ones. This runs entirely contrary to the government's own open data policy, which says that data collected by the government with public money shall be in the open.

Data death by a thousand CAPTCHAs

Here, an important question must be asked: what does it mean for data to be truly open, in a way that makes it an innovation multiplier for a country's economy? Here's how McKinsey described it:

McKinsey (How Government Can Promote Open Data And Unleash Over $3 Trillion In Economic Value)

For any activity that needs to be done at scale, at speed and with few errors, machine readability is crucial. Scanned documents or badly formatted PDFs force diligence and other processes into a manual mode that is slow, error-prone and costly, and sometimes altogether impossible.

Diligence (at both the individual and the sector level) is increasingly an ongoing activity rather than a one-time exercise carried out at the start of a relationship. Without crawlability, ongoing diligence might as well be written off.

Unfortunately, in India we're at the "closed" end of the spectrum on almost all counts. CAPTCHAs, paywalls and secretiveness are increasingly what we encounter while collating most government data.

Case statuses and details in district courts were originally crawlable (meaning they could be read and indexed by software spiders) but have now been put behind CAPTCHAs. This means human intervention is required to download every single unit of data. The argument is made that if you need to search for a case, you can simply solve a CAPTCHA on the government website and search there. However, unless service providers can download all of the data in advance, matching and tracking algorithms cannot be run across it. Further, unless these databases can be continuously updated, early warning systems (of the type recommended by the RBI, for instance) cannot possibly be built.
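
The difference between crawlable and CAPTCHA-gated data is easiest to see in what a continuously updated early warning system actually does: it downloads a full snapshot of case statuses on a schedule and compares it with the previous one. The Python sketch below shows only that comparison step. The case numbers, status strings and snapshot format are hypothetical, and the sketch assumes the snapshots could be fetched in bulk, which is exactly what the CAPTCHAs now prevent.

```python
from typing import Dict, List, Tuple

# A snapshot maps a case number to its last-seen status (hypothetical format).
Snapshot = Dict[str, str]


def diff_snapshots(old: Snapshot, new: Snapshot) -> List[Tuple[str, str, str]]:
    """Return (case_no, old_status, new_status) for every case that is new or changed.

    Cases missing from `old` are reported with an empty old_status.
    Cases that disappear from `new` are ignored in this sketch.
    """
    changes = []
    for case_no in sorted(new):
        previous = old.get(case_no, "")
        if previous != new[case_no]:
            changes.append((case_no, previous, new[case_no]))
    return changes


if __name__ == "__main__":
    # Hypothetical snapshots from two consecutive daily crawls.
    yesterday = {"CS/101/2016": "Pending", "CS/202/2017": "Pending"}
    today = {"CS/101/2016": "Decree passed", "CS/202/2017": "Pending", "CS/303/2017": "Filed"}
    for case_no, before, after in diff_snapshots(yesterday, today):
        print(case_no, repr(before), "->", repr(after))
```

Run daily, a loop like this turns a bulk feed into alerts. With a CAPTCHA on every lookup, each entry in `today` costs a human action, so the snapshot itself cannot be built at scale.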

All courts in India put out "cause lists", that is, lists of cases scheduled for hearing. These lists matter to a variety of stakeholders, from lawyers to lenders to the parties actually involved in the suits. However, they are rarely machine-readable or even searchable (they are often scanned copies). The system seems to expect every interested stakeholder to download these lists manually each day and go through them to see if anything affects them. The scale of this task seems to elude the authorities.

For instance, a bank like State Bank of India (SBI) has tens of thousands of corporate borrowers. By what mechanism is it expected to comb through these lists manually for cases that concern any of them?
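
The matching a lender would want to automate can be sketched in a few lines of Python, assuming the cause list is available as text (not a scanned image) with one "Party A vs. Party B" entry per line. The normalisation rules, the suffix list and the sample names are all hypothetical, and exact matching after normalisation is a deliberately crude stand-in for the fuzzy matching and identifier lookups (such as CIN) a real service would need.

```python
import re
from typing import Dict, List, Tuple

# Legal-form words dropped before comparing names (illustrative, not exhaustive).
SUFFIXES = {"ltd", "limited", "pvt", "private", "llp"}

# Separator between parties in an entry like "A vs. B" (assumed cause-list format).
VERSUS = re.compile(r"\s+(?:vs?\.?|versus)\s+", re.IGNORECASE)


def normalize(name: str) -> str:
    """Lower-case a party name, turn punctuation into spaces and drop legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def parties(entry: str) -> List[str]:
    """Split one cause-list entry into its party names."""
    return [p.strip() for p in VERSUS.split(entry)]


def match_borrowers(borrowers: List[str], cause_list: List[str]) -> List[Tuple[str, str]]:
    """Return (borrower, entry) pairs where a party's normalized name equals a borrower's."""
    index: Dict[str, str] = {normalize(b): b for b in borrowers}
    hits = []
    for entry in cause_list:
        for party in parties(entry):
            borrower = index.get(normalize(party))
            if borrower is not None:
                hits.append((borrower, entry))
    return hits


if __name__ == "__main__":
    book = ["ACME STEELS PRIVATE LIMITED", "Zenith Textiles Ltd"]  # hypothetical borrowers
    todays_list = ["Acme Steels Pvt. Ltd. vs. Union of India", "Ravi Kumar v. Sunrise Builders"]
    for borrower, entry in match_borrowers(book, todays_list):
        print(f"{borrower}: listed in '{entry}'")
```

Across tens of thousands of borrowers and every court's daily cause list, this is a routine batch job if the lists are machine-readable and bulk-downloadable, and an impossible one if they are scanned images behind CAPTCHAs.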

It is fair to say that today there is no systematic mechanism for accessing this information at scale, and that this is by design.

The state-run Employees' Provident Fund Organisation (EPFO) used to have a facility for searching all payers in the EPFO system. It was routinely used to estimate the size of an organization and to determine whether it was paying its EPFO dues. This too has now been put behind a CAPTCHA. If a bank or a manufacturer wants to track its borrowers' or vendors' compliance on this front, it is now close to impossible.

A lot of private company information, such as CIN (Corporate Identification Number) changes, is either behind CAPTCHAs or available only on paper (because the online services are broken).

The same is the case with trademark information and filings related to them. This means it’s impossible to create a comprehensive database of trademarks in India. In 2017.

Compare this to the US, where the government proactively publishes patent and trademark data free of charge, in association with private companies.

Information related to consignment-wise exports and imports used to be open for years, but was suddenly taken offline in December 2016. It was brought back online, but at a charge of Re 1 per record (that’s almost Rs 2 lakh per day if you want all the records).
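
To make the cost concrete, here is the arithmetic as a short Python sketch. It assumes about 2 lakh records a day, which is what the "almost Rs 2 lakh per day" figure implies at Re 1 per record, and that the full feed is bought every day of the year; both are simplifying assumptions rather than figures published by customs.

```python
LAKH = 100_000        # 1 lakh = 100,000
CRORE = 10_000_000    # 1 crore = 100 lakh

RATE_PER_RECORD_INR = 1  # Re 1 per record, as quoted above


def daily_cost_inr(records_per_day: int, rate: int = RATE_PER_RECORD_INR) -> int:
    """Cost in rupees of buying every record published in one day."""
    return records_per_day * rate


def annual_cost_inr(records_per_day: int, days: int = 365, rate: int = RATE_PER_RECORD_INR) -> int:
    """Cost in rupees of buying the full feed every day for `days` days."""
    return daily_cost_inr(records_per_day, rate) * days


if __name__ == "__main__":
    records = 2 * LAKH  # assumed daily volume, implied by the Rs 2 lakh/day figure
    print(f"Daily:  Rs {daily_cost_inr(records) / LAKH:.1f} lakh")
    print(f"Annual: Rs {annual_cost_inr(records) / CRORE:.1f} crore")
```

Under these assumptions, a single year of complete data comes to roughly Rs 7.3 crore, for records that used to be open.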

Then there's information from companies' Profit & Loss statements. Under the Companies Act, 2013, companies' P&L statements are no longer private. For a few months after that, old P&L statements were made available on the Ministry of Corporate Affairs (MCA) website on payment of the requisite fee. But now, all of a sudden, that too has been withdrawn. Compare this to Companies House in the UK, where such documents are available free of charge.

Even publicly listed companies have been rendered partly opaque. In the US, the SEC actively (even aggressively) disseminates filings by public entities, covering not only their financials but also all material events. In India, however, our securities regulator SEBI has abdicated this responsibility in favour of the respective exchanges. The exchanges in turn hoard quarterly filings in XBRL format (a structured format in which companies report quarterly data); they make corporate announcements available for retail use but require a paid subscription for commercial use.

Do note, these are “public filings”. It is incredible that SEBI has allowed private monopolies to exist and extort the public for what is rightfully theirs to begin with.

This list is endless.

Even papers laid on the table of Parliament, which are technically public documents and whose titles are listed on the Lok Sabha's website, remain unavailable. Presumably most, if not all, of these documents are now prepared digitally, so there can't be much friction in making them available online.

Given the daily news coverage of bank defaults, one would think that at least the Reserve Bank of India (RBI) would have taken the lead in making this data easily available. But the RBI seems to have abdicated its responsibility with regard to such data and has asked the credit bureaus to disseminate it. The bureaus, in turn, while making it available for general use, impose conditions forbidding commercial use of these lists. Again, this is public data, and yet the government has created a private monopoly over its commercial use.

India's hidden or locked data sources

Over and over again, it's the same story. Either the government hands public data off to private monopolies, or it prevents open use of that data itself.

Choking Innovation

It's tempting to think of open data as a "good to have", something India isn't ready for yet and that therefore isn't a pressing need. But the reality is that data is the fuel that powers today's global economy and most of its innovative companies. Starved of it, Indian entrepreneurs and businesses will never be able to become world-class.

Banks could use open data to conduct due diligence on businesses and reduce their non-performing assets (NPAs). In 2016 the RBI laid down guidelines for the early detection of fraud or stress in a lender's loan book. But for banks to do this systematically across their entire breadth of borrowers, external service providers need to be able to monitor borrowers' vital signs: litigation, defaulter or watch-list status, VAT/TAN defaults, and ongoing registration and compliance status with various bodies. This information, however, is either impossible to come by or increasingly sits behind CAPTCHAs or in formats that cannot be processed.

Conversely, open data could help well-run businesses borrow cheaply; without it, banks are reluctant to lend to firms that lack credit histories.

According to the 2016-17 Global Fraud & Risk Report by Kroll, 27% of surveyed Indian businesses said they had faced fraud from their vendors or suppliers. Around 87% said that they try (or would like to try) some kind of due diligence on the businesses they deal with. But how do you do that when the data is hidden or locked up?

Stockbrokers need open data to conduct due diligence on their clients under the Prevention of Money Laundering Act (PMLA), to ensure that clients are not using them to transact laundered money.

Exporters and importers were heavy users of EXIM data for understanding emerging trade patterns, product lines and so on. The government's own dashboard, while good-looking, is no substitute for the raw data that allowed far richer analysis.

Indian and international companies need open data to conduct due diligence on potential customer tie-ups or joint ventures. We're seeing a few large players, particularly in the automotive industry where supply chains are very lean, trying to do this in a structured manner. Other big corporate groups we contacted said that they have no such process currently but would like to introduce one if such a service were available.

Fast-moving consumer goods (FMCG) and market research companies use Census data to understand their markets, particularly rural areas. But this data is provided in very poor shape, spread across innumerable Excel sheets, which makes it very difficult to consolidate or analyze. The sheets do not even bear proper names, so it is hard to tell what each one contains.

Regulatory authorities themselves use this data to identify companies involved in fraudulent or suspicious activities, or to study company managements. Interestingly, they find it easier to get this data from private data providers than from the government's own ministries. Fraud investigation authorities like the Serious Fraud Investigation Office use this data to understand a company's management history and how companies and directors are connected to each other. This often helps them uncover a group or web of companies that could be acting in tandem.

Lastly, apart from institutions, even individuals use this data for due diligence on potential employers and on third-party job consultants who offer them jobs. They also use it when dealing with businesses in transactions involving real estate, lending and investments.

This list, too, is endless. Suffice it to say that a closed and tight-fisted approach to data is costing Indian businesses and citizens billions of dollars in lost productivity, lost profits and lost opportunities.

Anchal Agarwal and Parijat Garg are the co-founders of Tofler, which is working on business transparency and visibility into the Indian business ecosystem.

srinivas kodali

Jun 22, 2017, 11:56:38 AM

to datameet

Dear Anchal,

Thank you for sharing this within the group. I read this on the day it appeared in the Ken and shared it with the group. I largely agree with your article, but I have to say you might be unaware of the existence of certain databases. There is a lot of data out there, and the situation is actually not that bad. For example:

1. EXIM data was being published by the customs authorities until Nov 26th, 2016. https://factordaily.com/customs-stops-trade-data/ They shut it down after a few businesses complained to the PMO, the commerce ministry and the customs department. https://www.documentcloud.org/documents/3701588-Customs-Data-RTI-Response.html There are businesses that were affected by this move, but they still have the old data.

2. The idea that real-time data on buses and trains is not available is a bit of a myth. Within datameet, multiple APIs and datasets related to this data have been shared. A few apps, like ixigo and travel yatri, actually use real-time railway data published by the railways, but accessing it might not be straightforward. ixigo, for example, used to break encrypted railway data to profit from it; they don't do that anymore. https://www.youtube.com/watch?v=0kaUz_F3Eo0

3. You can buy company data from the MCA. This data exists in an open standard called XBRL. OpenCorporates, a startup, actually lists company data from around the world, including India's: https://opencorporates.com/companies/in

You stress that this is important data which businesses can use to improve efficiency in trade and other sectors. I would like to point out, however, that the government is not obligated to release this information so that private parties can profit, but rather to increase transparency and accountability for everyone. As much as I want this data too, we cannot demand it for the wrong reasons. Indian Railways and BMTC are planning to sell real-time data, which I think is their right. They want to do it to earn revenue; every department has different incentives to open or close data.

It is not really that straightforward to ask them to open all the data. But we can ask them what they can open and how they can implement NDSAP (the National Data Sharing and Accessibility Policy) the right way. That said, I would request you to write more on open data in the future.

Regards,

Srinivas Kodali

www.lostprogrammer.com

--
Datameet is a community of Data Science enthusiasts in India. Know more about us by visiting http://datameet.org
---
You received this message because you are subscribed to the Google Groups "datameet" group.
To unsubscribe from this group and stop receiving emails from it, send an email to datameet+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Srinivas Karuturi

Jun 23, 2017, 12:32:31 PM

to data...@googlegroups.com

Srini & members

What is needed is a strong review mechanism for classifying datasets as open. Such classification needs to be critically reviewed, since opening up data and bringing transparency might hurt certain businesses if their practices are not fair.

I strongly feel that there need to be federal-level standards and policies, which states, cities and municipalities can then adopt and enhance.

Driving standards that are consistent across all levels is very important.

Regards

Srinivas
