Re-assessing India's Non-Personal Data

On 12 July 2020, a committee constituted by the Ministry of Electronics and Information Technology and led by Kris Gopalakrishnan[1] released a report (the Report) on non-personal data, the first of its kind. The Report attempts to decode non-personal data by defining it and delineating the ecosystem in which it will operate, while also assessing how it can be monetised.

With this Report, India has taken a frontal and early approach to the governance of non-personal data. But is this assessment premature, given the absence of a global digital governance framework for a fast-evolving technology market?

An analysis of the Report reveals some challenges the Committee has yet to address. These are:

a) It attempts to distinguish between personal and non-personal data without examining the problem of de-anonymisation, which makes such a clear distinction difficult to sustain;

b) A significant part of the Report deals with ‘community non-personal data’, which has been given a broad and vague definition;

c) It envisages multiple stakeholders without much clarity on how they will operate;

d) It suggests a vague mechanism of data-sharing and creates a new business category of ‘data business’.

These are examined in detail below.

First, on personal and non-personal data, the fundamental question is whether non-personal data can be segregated from personal data at all. The specific problem is anonymised personal data, i.e. data from which personally identifiable information has been removed. Examples include a person’s travel history without their name, age and address, or their purchase history and preferences on an e-commerce website without their name, gender and location. However, anonymisation may not be permanent: emerging technologies already offer multiple ways to de-anonymise data, for instance by cross-referencing an anonymised dataset with other datasets that still carry identifying details. Thus, it becomes nearly impossible to classify data into strict categories of personal and non-personal data.
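To make the de-anonymisation risk concrete, the following is a minimal Python sketch of a linkage (re-identification) attack. It assumes a hypothetical "anonymised" travel dataset from which names have been stripped but which still carries quasi-identifiers such as PIN code, birth year and gender, plus a hypothetical auxiliary dataset that still carries names. All records, field names and the `link_records` helper are illustrative assumptions, not drawn from the Report.

```python
# Minimal linkage (re-identification) sketch. All records, field names and the
# link_records helper are hypothetical illustrations, not taken from the Report.

# "Anonymised" travel records: names removed, but quasi-identifiers retained.
anonymised_trips = [
    {"pin_code": "400001", "birth_year": 1985, "gender": "F", "route": "Mumbai-Delhi"},
    {"pin_code": "400001", "birth_year": 1990, "gender": "M", "route": "Mumbai-Pune"},
    {"pin_code": "560001", "birth_year": 1985, "gender": "F", "route": "Bengaluru-Chennai"},
]

# Auxiliary dataset that still carries names (e.g. a hypothetical public directory).
public_directory = [
    {"name": "Person A", "pin_code": "400001", "birth_year": 1985, "gender": "F"},
    {"name": "Person B", "pin_code": "560001", "birth_year": 1992, "gender": "M"},
]

QUASI_IDENTIFIERS = ("pin_code", "birth_year", "gender")


def link_records(anonymised, auxiliary, keys=QUASI_IDENTIFIERS):
    """Return (name, route) pairs for anonymised rows that match exactly one named row."""
    matches = []
    for row in anonymised:
        signature = tuple(row[k] for k in keys)
        candidates = [p for p in auxiliary if tuple(p[k] for k in keys) == signature]
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((candidates[0]["name"], row["route"]))
    return matches


if __name__ == "__main__":
    for name, route in link_records(anonymised_trips, public_directory):
        print(f"{name} travelled {route}")  # prints: Person A travelled Mumbai-Delhi
```

Any anonymised row whose combination of quasi-identifiers is unique in the auxiliary data is re-identified; here the first trip is attributed to "Person A". This kind of linkage is a well-documented re-identification technique, which is why removing names alone does not yield a stable category of non-personal data.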

Second, community non-personal data. This class of data brings the following issues to the fore:

a) Community non-personal data is defined as including anonymised personal data, and non-personal data about inanimate and animate things or phenomena, whether natural, social or artefactual, whose source or subject pertains to a community of natural persons. This definition is too broad and vague, and there are no rules on how such data will operate in the social ecosystem. The concept of community data was equally vague when it was first introduced in the draft e-Commerce policy of 2019.

b) Determining community association is itself problematic. For instance, can a person belong to more than one community? What happens to datasets that fall within the ambit of multiple communities?

c) The Report fails to consider individual privacy in the context of community privacy. For example, what will be the consent pattern or ownership rights of a person whose data has been anonymised and classified as community data?

d) The Report says that raw data will be classified as community data. However, it does not explain how a clear distinction can be drawn between raw and processed data, nor does it account for IP-protected data. This is a serious lacuna, as businesses in sectors ranging from automobiles and retail to entertainment and the creative industries often base their business models on IP-protected data, which needs to be fully protected. Any mandatory open access to, or sharing of, such data will endanger these businesses.

Third, multiple stakeholders. The Report envisages an ecosystem with multiple players, including a data trustee, a data custodian and a data regulator. The government could be the data trustee, exercising control over the data on behalf of a ‘community’. The Report is thin on details of how these bodies will be constituted, operated and made accountable, how they will remain independent of each other, and why the government is being treated as a superseding stakeholder.

The Report also suggests the creation of a separate NPD (non-personal data) regulatory authority, in addition to the regulatory bodies envisaged under the Personal Data Protection Bill and the draft e-commerce policy. An additional regulator runs counter to the government’s flagship initiative to improve the ease of doing business in India. Further, the Report sees this NPD authority as adjudicating disputes on data-sharing, effectively negating the right of parties to rely on private contracts in which they can set their own dispute resolution mechanisms. This will undermine the basic fabric of private contracts and discourage businesses from achieving scale.

Fourth, data businesses. The Report endorses the sharing of data among businesses, and from businesses and communities to governments, for purposes ranging from public policy development and public service delivery to national security and sovereign purposes.

This is problematic for several reasons:

a) The Committee has not defined the scope of national security or sovereign purposes, potentially giving the state unfettered power over non-personal data and scope for its abuse.

b) The concept of data sharing by a ‘data business’[2] does not distinguish between IP-protected and unprotected data. A company collects data over years, spending money and resources; requiring it to then part with that data arbitrarily may be seen as unfair and unreasonable.

c) The creation of a separate, broadly defined category of data businesses, bound by several legal obligations and liabilities, will be a matter of concern for industry and investor groups.

d) The critical aspect of data privacy has not been considered. This is particularly striking given the ongoing debates on personal data privacy under the Personal Data Protection (PDP) Bill.

e) As for monetization, how will data be priced, and who will determine this pricing? A dataset could be far more important to Company A than to Company B, so Company A may be willing to pay ten times the price that Company B offers the data seller (which could be any entity). For instance, traffic data will be critical for Uber but not for Coca-Cola.

Finally, the Report largely focuses on Indian companies and seeks to derive benefits for domestic industry. However, it is unclear whether this includes Indian-registered subsidiaries of foreign companies, or companies with foreign investors. This is critical considering FDI equity inflows of $49,977 million in FY 2019-2020.[3] The Report also discourages cross-border data flows, which affects trusted foreign partners who are critical to India’s growth as a strong regional technology leader in Asia.

Recommendations:

Some recommendations for the Committee to consider:

a) Establish detailed rules on anonymisation:

i) Designate an authority to determine whether a dataset has been permanently anonymised;

ii) Set parameters for anonymisation, i.e. how much information must be removed from a dataset before it can be labelled anonymised;

iii) Clarify which data framework applies to mixed datasets, i.e. those containing both personal and non-personal data;

iv) Discuss the inclusion or exclusion of pseudonymised datasets, i.e. datasets in which real identifiers are replaced with artificial ones; for example, if Amazon replaces the actual name and address of a person purchasing from its website with a different name and address;

b) Clarify the consent framework, including ownership of data, especially in the case of anonymised datasets;

c) The Committee should consider whether there is a need for a separate subset of community non-personal data at all;

d) Create separate sets of rules for sharing IP-protected and unprotected datasets and databases. A future non-personal data policy must be in line with existing intellectual property and competition law;

e) Data pricing should ideally be market-based and left to the discretion of the contracting parties, as in any other commercial contract;

f) Re-evaluate the creation of a new category of ‘data business’ and the creation of yet another data regulator.
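Recommendation (a)(ii) asks how much information must be removed before a dataset counts as anonymised. One widely discussed, though by no means sufficient, yardstick is k-anonymity: every record must share its quasi-identifier values with at least k-1 other records. The Python sketch below shows how such a parameter could be measured, using hypothetical purchase records and a hypothetical generalisation rule (PIN codes truncated to three digits, birth years bucketed by decade). It is an illustration under these assumptions, not a method proposed in the Report.

```python
from collections import Counter

# Hypothetical records and generalisation rule, for illustration only.


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing identical quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def generalise(record):
    """Assumed rule: keep only the first 3 PIN digits and bucket birth year by decade."""
    return {
        **record,
        "pin_code": record["pin_code"][:3] + "***",
        "birth_year": record["birth_year"] // 10 * 10,
    }


records = [
    {"pin_code": "400001", "birth_year": 1985, "purchase": "books"},
    {"pin_code": "400012", "birth_year": 1988, "purchase": "phone"},
    {"pin_code": "560001", "birth_year": 1991, "purchase": "shoes"},
    {"pin_code": "560034", "birth_year": 1997, "purchase": "laptop"},
]
QI = ("pin_code", "birth_year")

if __name__ == "__main__":
    print(k_anonymity(records, QI))                           # 1: every record is unique
    print(k_anonymity([generalise(r) for r in records], QI))  # 2: two groups of two
```

Generalising the fields raises k from 1 to 2 here. k-anonymity alone does not prevent every attack (for example, when all records in a group share the same sensitive value), so any threshold a regulator sets would be a policy choice rather than a guarantee of permanent anonymisation.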

It is critical that a future non-personal data policy act as an enabler. India, one of the world’s largest data markets, has the opportunity to play a leading role during its upcoming G20 presidency (2022), when it can help develop a data-sharing regime focused on interoperability and trust among nations. To lead this effort, India must first establish robust domestic ground rules on data flows.

Effective implementation of a data policy requires greater debate, discussion and consultation among all relevant stakeholders. In today’s multi-stakeholder ecosystem, which includes individuals, businesses, communities and government bodies, both domestic and foreign, data has immense value for all stakeholders and is central to innovation. Regulations must encourage innovation while also creating a level playing field for domestic and foreign businesses alike.

Ambika Khanna is Senior Researcher, International Law Studies Programme, Gateway House.

Gateway House and the Centre for Internet and Society hosted a roundtable discussion on ‘Non-Personal Data and Policy’ on 20 July 2020.

This article was exclusively written by Gateway House: Indian Council on Global Relations.

For interview requests with the author, please contact outreach@gatewayhouse.in.

References

[1] Constituted by the Ministry of Electronics and Information Technology, Government of India, in September 2019.

[2] Defined as any entity, including a government entity, that processes or manages data beyond a certain data-related threshold.

[3] Quarterly Fact Sheet: Fact Sheet on Foreign Direct Investment (FDI) from April 2000 to March 2020, https://dipp.gov.in/sites/default/files/FDI_Factsheet_March20_28May_2020.pdf