Know Your Defaulter, or KYD, launched in mid-2016. The startup’s software spiders crawled through dozens, even hundreds, of data sources each day – court cases, company filings, credit ratings and defaulter lists put out by banks and financial institutions. It allowed anyone to instantly conduct online due diligence on a company before entering into any sort of agreement with it. In less than a year, KYD had saved its users tens, possibly hundreds, of millions of dollars by helping them spot unscrupulous companies.
Best Trip, a trip planning app that showed real-time road traffic conditions in the top six Indian metro cities, was credited with saving millions of commuter hours and litres of vehicle fuel, and with generating hundreds of millions of dollars in productivity gains and healthcare cost savings, in just the 17 months since its launch in late 2015.
There’s also the story of Compliance Scanner, which started in 2014 as Data Watch. After a first year spent meandering in search of a business purpose, it pivoted into a useful tool that allowed anyone to instantly check the statutory compliance status of Indian firms under various laws. Regulators, lawyers, consultants, civic-minded citizens and NGOs all relied on it to spot variances between what a company claimed in public and what it practiced in private. It was single-handedly responsible for dozens of well-known companies being found to have fallen short of their provident fund (EPFO) obligations.
This is where we must apologize and tell you that none of the above three examples are true.
Know Your Defaulter, Best Trip and Compliance Scanner do not exist.
In fact, these startups cannot exist (and neither can the efficiency they generate) because of the way various Indian governments and regulators have either ignored or actively stymied the “Open Data” initiative.
What is open data?
“Open data and content can be freely used, modified, and shared by anyone for any purpose.”
A 2013 McKinsey research report showed how governments around the world could unlock an additional $3 trillion (yes, that’s right) in economic value merely by enabling open data across seven domains.
To quote from the report: “An estimated $3 trillion in annual economic potential could be unlocked across seven domains. These benefits include increased efficiency, development of new products and services, and consumer surplus (cost savings, convenience, better-quality products). We consider societal benefits, but these are not quantified. For example, we estimate the economic impact of improved education (higher wages), but not the benefits that society derives from having well-educated citizens. We estimate that the potential value would be divided roughly between the United States ($1.1 trillion), Europe ($900 billion) and the rest of the world ($1.7 trillion).”
The seven domains McKinsey identified were education, transportation, consumer products, electricity, oil & gas, healthcare and consumer finance.
Needless to say, these are some of the sectors where India needs innovation and transformation at scale. Today. If only.
Even though there were early indicators that the Indian government was interested in furthering open data and transparency initiatives, the last few years have been a big let-down. After finding the Right to Information (RTI) Act “somewhat wanting”, the current government laid special emphasis on the need to release raw data in a machine-readable format. The Open Government Data platform data.gov.in was supposed to do exactly that. Even before the current government, back in 2011, India was a formative member of the Open Government Partnership (which has since been joined by 75 countries) but withdrew just before the kickoff. With the benefit of hindsight, these pronouncements seem to have been a pretence at best.
Higher-value data sets still remain behind access walls. In some cases, access has become more difficult since the government started driving its open data movement: much of this data was previously available to the public, but access has increasingly been restricted. The government is opening up less useful data sets while blocking access to richer ones. This is entirely contrary to the government’s own open data policy, which says that data collected by the government with public money shall be in the open.
Here, an important question must be asked. What does it mean for data to be truly open, in a way that makes it an innovation multiplier for a country’s economy? Here’s how McKinsey described it:
For any activity that needs to be done at scale, at speed and with few errors, machine readability is crucial. Scanned documents or badly formatted PDFs force diligence and other processes into manual mode, which is slow, error-prone and costly – and at sufficient scale, altogether impossible.
Diligence (both at individual level as well as sector level) is increasingly an on-going activity rather than a one-time, at-initiation activity. Without crawlability, on-going diligence might as well be written off.
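To make the point about on-going diligence concrete, here is a minimal sketch of what a crawler-fed monitoring step could look like if the data were crawlable. Everything in it is an assumption made for illustration: the company identifiers, the status labels and the `diff_snapshots` helper are hypothetical and do not correspond to any real government data source.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot format: company identifier -> status on a given day.
Snapshot = Dict[str, str]


def diff_snapshots(previous: Snapshot, current: Snapshot) -> List[Tuple[str, str, str]]:
    """Return (company, old_status, new_status) for every change between two crawls."""
    changes = []
    # Companies present in the latest crawl whose status is new or different.
    for company, new_status in sorted(current.items()):
        old_status = previous.get(company, "not listed")
        if old_status != new_status:
            changes.append((company, old_status, new_status))
    # Companies that disappeared from the latest crawl are flagged as well.
    for company in sorted(previous.keys() - current.keys()):
        changes.append((company, previous[company], "not listed"))
    return changes


# Made-up example data standing in for two consecutive daily crawls.
yesterday = {"ACME-001": "compliant", "BETA-002": "compliant", "GAMA-003": "defaulter"}
today = {"ACME-001": "compliant", "BETA-002": "defaulter", "DELT-004": "compliant"}

for company, old, new in diff_snapshots(yesterday, today):
    print(f"{company}: {old} -> {new}")
```

The comparison itself is trivial. What it depends on is having yesterday’s and today’s data in bulk, which is exactly what a CAPTCHA on every lookup rules out.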
Unfortunately, in India we’re on the “closed” end of the spectrum on almost all counts. CAPTCHAs, paywalls and secrecy are increasingly what we encounter while collating most government data.
Case statuses and details in district courts were originally crawlable (meaning they could be read and indexed by software spiders) but have now been put behind CAPTCHAs. This means human intervention is required to download every single unit of data. The argument is made that if you need to search for a case, you can simply solve a CAPTCHA on the government website and search there. However, unless service providers can pre-download all of the data, matching and tracking algorithms cannot be run. Further, unless these databases can be continuously updated, early warning systems (of the type recommended by the RBI, for instance) cannot possibly be built.
All courts in India put out “cause lists” – or lists of cases scheduled for hearings. These are important lists for a variety of stakeholders – from lawyers to lenders to parties actually involved in the suits. However, these lists are rarely machine readable or even searchable (often being scanned copies). The system seems to expect all interested stakeholders to manually download these lists daily and go through them to see if there is anything that affects them. The scale of this task seems to elude the authorities.
For instance, a bank like State Bank of India (SBI) has tens of thousands of corporate borrowers. What mechanism is it expected to have to manually look through these lists for cases that affect its interests?
It is fair to say that today, there is no systematic mechanism for accessing this information on a large scale, and that this is by design.
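If cause lists were published as structured records rather than scanned pages, the screening a lender needs would be a few lines of code. The sketch below is only an illustration under stated assumptions: the record fields (`case_no`, `petitioner`, `respondent`), the company names, the case numbers and the `normalise` and `find_hits` helpers are all hypothetical, and real-world name matching would need to be far more forgiving than this exact comparison.

```python
import re
from typing import Dict, List

# Common legal suffixes dropped before comparing names (an illustrative list).
LEGAL_SUFFIXES = {"pvt", "private", "ltd", "limited"}


def normalise(name: str) -> str:
    """Lower-case a party name and strip punctuation and legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(word for word in cleaned.split() if word not in LEGAL_SUFFIXES)


def find_hits(borrowers: List[str], cause_list: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the cause-list entries in which any borrower appears as a party."""
    wanted = {normalise(b) for b in borrowers}
    hits = []
    for entry in cause_list:
        parties = (entry["petitioner"], entry["respondent"])
        if any(normalise(p) in wanted for p in parties):
            hits.append(entry)
    return hits


# Made-up borrowers and a made-up, machine-readable cause list.
borrowers = ["Acme Textiles Pvt. Ltd.", "Beta Steel Limited"]
cause_list = [
    {"case_no": "CS/101/2017", "petitioner": "Some Bank",
     "respondent": "ACME TEXTILES PRIVATE LIMITED"},
    {"case_no": "CS/102/2017", "petitioner": "Gamma Traders",
     "respondent": "Delta Foods Ltd"},
]

for hit in find_hits(borrowers, cause_list):
    print(hit["case_no"], hit["petitioner"], "vs", hit["respondent"])
```

Run daily against every court’s list, a check like this could cover tens of thousands of borrowers in seconds. Against scanned copies, it cannot be run at all.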
The state-run Employees’ Provident Fund Organisation (EPFO) used to have a system to search for all payers in the EPFO system. This was routinely used to estimate the size of an organization and to determine whether it was paying its EPFO dues. This too has now been put behind a CAPTCHA. If a bank or a manufacturer wants to track a borrower’s or vendor’s compliance on this front, it is now close to impossible.
A lot of private company information, like CIN (Corporate Identification Number) changes, is either behind CAPTCHAs or available only on paper (because the services are broken).
The same is the case with trademark information and filings related to them. This means it’s impossible to create a comprehensive database of trademarks in India. In 2017.
Compare this to the US where the government proactively pushes out patents and trademarks data freely in association with private companies.
Information related to consignment-wise exports and imports used to be open for years, but was suddenly taken offline in December 2016. It was brought back online, but at a charge of Re 1 per record (that’s almost Rs 2 lakh per day if you want all the records).
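The arithmetic behind that figure can be spelled out. The two inputs below come from the paragraph above; the annual total is an extrapolation that assumes the same volume of records every day of the year, which is an assumption and not a reported number.

```python
PRICE_PER_RECORD_RS = 1      # stated above: Re 1 per record
DAILY_COST_RS = 200_000      # stated above: "almost Rs 2 lakh per day"

# Number of records per day implied by the two figures above.
implied_records_per_day = DAILY_COST_RS // PRICE_PER_RECORD_RS

# Assumption: the same volume is published on all 365 days of the year.
annual_cost_rs = DAILY_COST_RS * 365

print(f"Implied records per day: {implied_records_per_day:,}")
print(f"Approximate annual cost: Rs {annual_cost_rs / 1e7:.1f} crore")
```

On these assumptions, keeping a complete copy of the data would cost roughly Rs 7.3 crore a year.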
Then there’s information related to the Profit & Loss statements of companies. As per the Companies Act, 2013, P&L statements of companies are no longer private. For a few months after that, old P&L statements were available on the MCA website on payment of the requisite fee. But now, all of a sudden, that access too has been withdrawn. Compare this to Companies House in the UK, where such documents are available free of charge.
Even publicly listed companies have been rendered partly opaque. In the US, the SEC actively (even aggressively) disseminates filings by public entities, relating not only to their financials but also to all material events. But in India, our securities regulator SEBI has abdicated this responsibility in favour of the respective exchanges. The exchanges in turn hoard quarterly filings in the XBRL format (a structured format in which companies report quarterly data), and make corporate announcements available for retail use while requiring a paid subscription for commercial use.
Do note, these are “public filings”. It is incredible that SEBI has allowed private monopolies to exist and extort the public for what is rightfully theirs to begin with.
This list is endless.
Even papers laid on the table of Parliament, which are technically public documents and whose titles are made available on the Lok Sabha’s website, continue to be unavailable. Presumably, most if not all of these documents are now prepared digitally, so there can’t be much friction in making them available online.
Given the daily news coverage around bank defaults, one would think that at least the Reserve Bank of India (RBI) might have taken the lead in making this data easily available. But the RBI seems to have abdicated its responsibility with regard to such data and has asked the credit bureaus to disseminate it. The bureaus, while making these lists available for general use, impose conditions forbidding their commercial use. Again, this is public data, and yet the government has created a private monopoly over its commercial use.
Over and over again, it’s the same story. Either the government hands off public data to private monopolies, or it prevents open use of the data itself.
It’s tempting to think of open data as a “good to have”, something India isn’t ready for yet and thus isn’t a pressing need. But the reality is that data is the fuel that powers today’s global economy and its most innovative companies. Starved of it, Indian entrepreneurs and businesses will never be able to become world class.
This list too, is endless. Suffice it to say, a closed and tight-fisted approach to data is costing Indian businesses and citizens billions of dollars in lost productivity, lost profits or lost opportunities.
Anchal Agarwal and Parijat Garg are the co-founders of Tofler, which is solving for business transparency and visibility into the Indian business ecosystem.
--
Datameet is a community of Data Science enthusiasts in India. Know more about us by visiting http://datameet.org
---
You received this message because you are subscribed to the Google Groups "datameet" group.
To unsubscribe from this group and stop receiving emails from it, send an email to datameet+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.