We have commented before in The Sensor on the general privacy and data protection implications for the connected and autonomous vehicles (CAVs) ecosystem. In this issue, we return to consider those implications in relation to recent developments in the world of CAV big data.
The Automotive Edge Computing Consortium (AECC) researches how edge network architectures could help CAVs process data effectively within the timeframes needed to support current and next-generation CAV technologies and enterprise applications. These technologies and applications include High Definition Mapping, Intelligent Driving, Mobility Services, and Finance and Insurance services.1
In its most recent publications and presentations, the AECC concluded that earlier studies greatly underestimated the volume of data these services will require. When the technologies mentioned above are taken into account, current estimates suggest that each vehicle could generate up to 100 GB of data per month. Across an estimated 100 million CAVs in use worldwide by 2025, that could mean as much as 10 exabytes of data per month.2
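The arithmetic behind that figure is easy to verify. A minimal sketch in Python, using the decimal (SI) definitions of gigabyte and exabyte and the estimates cited above:

```python
# Back-of-the-envelope check of the estimates cited above,
# using decimal (SI) units: 1 GB = 10**9 bytes, 1 EB = 10**18 bytes.
GB = 10**9
EB = 10**18

per_vehicle_per_month = 100 * GB   # upper-bound estimate per CAV
fleet_size = 100_000_000           # estimated CAVs in use worldwide by 2025

total_bytes = per_vehicle_per_month * fleet_size
print(f"{total_bytes / EB:.0f} EB per month")  # -> 10 EB per month
```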
In our previous article, we referred to the amount of data generated by CAVs as a torrent. These recent studies suggest that a tsunami or deluge is the more apt metaphor. As a significant quantity of this information will be personal information, the sheer volume of data and the complexity of the multitier information ecosystems needed to process it will create challenges for privacy and data protection.
Privacy challenge one: Managing contractual relationships
Edge computing provides an intelligent solution to the basic problem of managing and processing large quantities of data in a timely manner. However, with so much information circulating through multitier network architectures, organizations providing services that rely on those architectures need to be aware of where the data is flowing and who ultimately has access. Important considerations include such questions as:
- What are the relationships between the entities processing data, from the edge to the centre? For example, is there a clear hierarchical service provider relationship between them, or are they partners or joint ventures? Do they have certain independent rights around the use of the data flowing through these systems?
- What is the legal relationship between my organization and the entities involved in the CAV ecosystem?
- If my organization is contractually bound to only one entity at the edge, how do all the relevant legal obligations around data security, obtaining consent and liability flow up and down through the contractual chain? For example, how are auditing rights and access to information requests managed down the contractual chain?
Understanding these relationships will be crucial, both for properly drafting the contracts between an organization offering CAV services and the entities operating the information processing systems that will govern the use of personal information, and for understanding the organization’s own responsibilities with respect to such matters as obtaining consent from individuals whose personal information is used.
Privacy challenge two: Meaningful consent
Canadian private sector privacy laws are consent-based: subject to certain exceptions, organizations must obtain consent for the collection, use and disclosure of personal information, and regulators expect them to meet the challenge of obtaining meaningful consent.3
Privacy policies are typically used as a way to consolidate information about privacy and data handling practices, but regulators increasingly expect organizations to find ways to communicate salient information in a form that can be easily understood, particularly by those who do not have the time to read and digest lengthy privacy policies.
Regulators now expect organizations to be, among other things, creative in their approaches, for example by using just-in-time notices, interactive tools or infographics. Organizations are also expected to allow individuals to control the level of detail they get, and when they get it, and to emphasize key elements, such as what personal information is being collected; with which parties personal information is being shared; for what purposes personal information is collected, used or disclosed; and the residual risk of harm and other consequences.4
Satisfying these obligations is a significant challenge, and all the more so when one considers complex information ecosystems such as those examined by the AECC. We can hope that, in the future, privacy law reforms in Canada will introduce legal bases other than consent for the collection, use and disclosure of personal information, as the EU has done.5 In the meantime, CAV organizations in Canada should pay attention to the obligation to provide control over the level of detail provided to individuals, for example by offering layers of elucidation that span the interpretability/explainability divide.6
Organizations that need to explain how information moves within a system in order to explain how it is collected, used and disclosed should consider investing time in the layered approach. For example, organizations can offer a top-level summary that gives just enough information for individuals in a hurry to get a sense that there are many participants and directions in which information is flowing; a second level that goes into some detail, perhaps with a summary black-box diagram; and a third layer that provides a closer view and more detailed infographics.
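By way of illustration only, the layering can be thought of as a simple recursive structure that renders only as much detail as the individual asks for. The following Python sketch uses entirely hypothetical names and content; it is not drawn from any regulator’s guidance:

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class NoticeLayer:
    """One layer of a layered privacy notice (hypothetical model)."""
    title: str
    summary: str
    details: list[NoticeLayer] = field(default_factory=list)

# Hypothetical three-layer notice for a CAV service.
notice = NoticeLayer(
    title="How your vehicle data is used",
    summary="Your vehicle shares driving data with several parties "
            "to provide mapping, mobility and insurance services.",
    details=[
        NoticeLayer(
            title="Who receives your data",
            summary="Summary diagram: vehicle -> edge provider -> service platform.",
            details=[
                NoticeLayer(
                    title="Data flows in detail",
                    summary="Detailed infographic of each entity, purpose, "
                            "retention period and sharing arrangement.",
                ),
            ],
        ),
    ],
)

def render(layer: NoticeLayer, depth: int = 1, max_depth: int = 1) -> None:
    """Print the notice down to the level of detail the reader requests."""
    print("  " * (depth - 1) + f"{layer.title}: {layer.summary}")
    if depth < max_depth:
        for sub in layer.details:
            render(sub, depth + 1, max_depth)

render(notice, max_depth=1)  # hurried reader: top-level summary only
render(notice, max_depth=3)  # interested reader: full drill-down
```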
Privacy challenge three: Privacy by design
Between the requirements to obtain meaningful consent and the challenges of ensuring contractual protections, organizations might feel that the complexities of the edge computing model for CAV services create privacy law risks that are difficult to quantify or control. For this reason, organizations may want to consider Privacy by Design (PbD).7 While always good practice, adopting a PbD approach might be particularly useful in the multitier information processing contexts of CAVs.
PbD calls for privacy to be taken into account throughout the engineering process. In the CAV edge computing context, this could translate into determining what the organization really needs to collect, use or disclose in order to provide the service; engineering the system to strip out as much personal information as possible, as early as possible, in order to reduce the attack surface of the system; and using advanced techniques – such as homomorphic encryption,8 private set intersection,9 and others – to minimize exposure of personal information as it makes its way from CAVs to edge to centre.
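To make the first two of those points concrete, here is a minimal sketch of edge-side data minimization. The field names and record layout are hypothetical, and the keyed-hash pseudonymization stands in for whatever technique a real deployment would choose after a proper re-identification risk analysis:

```python
import hashlib

# Hypothetical telemetry record produced by a vehicle (illustrative only).
record = {
    "vin": "1HGCM82633A004352",     # directly identifying
    "driver_name": "A. Example",    # directly identifying
    "gps": (45.4215, -75.6972),     # potentially identifying trace data
    "road_friction": 0.72,          # needed for the HD mapping service
    "timestamp": "2025-03-01T12:00:00Z",
}

def minimize_at_edge(rec: dict, salt: bytes) -> dict:
    """Drop direct identifiers and pseudonymize the VIN before the
    record leaves the edge node (a privacy-by-design sketch)."""
    out = {k: v for k, v in rec.items() if k not in ("vin", "driver_name")}
    # A keyed hash lets downstream systems correlate records from the
    # same vehicle without ever seeing the VIN itself.
    out["vehicle_pseudo_id"] = hashlib.sha256(
        salt + rec["vin"].encode()
    ).hexdigest()
    # Coarsen the location to reduce the identifiability of the trace.
    out["gps"] = tuple(round(coord, 2) for coord in rec["gps"])
    return out

print(minimize_at_edge(record, salt=b"per-deployment-secret"))
```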
The principles of PbD are sometimes criticized as vague and not easily translated into engineering practice.10 In fairness to the champions of PbD, however, it should be mentioned that PbD is a set of principles, not specific practices. By having engineers, product owners and other stakeholders absorb those principles, organizations can put themselves in a position to fashion PbD solutions specific to a particular service, product, or line of business. Moreover, while the merits of specific PbD principles can be debated, as a whole they represent a set of regulative ideals that can assist organizations in reducing the risk of handling and transacting personal information.
Final word
Organizations confronting the CAV data deluge will need to arm themselves with a variety of solutions. Just as edge computing can help meet the processing challenges, consideration of the matters discussed here can help organizations meet privacy law challenges.
1 See e.g., Automotive Edge Computing Consortium, “AECC Technical Report v2.0: Driving Data to the Edge: The Challenge of Data Traffic Distribution,” July 2020; Automotive Edge Computing Consortium, “Operational Behavior of a High Definition Map Application (White Paper),” May 26, 2020; Prashant Tiwari, “Managing the Connected Car Data Tsunami” presented at Edge Computing World 2020, Oct. 14, 2020; Ken-ichi Murata, “Edge Computing for the Connected & Autonomous Car,” presented at Edge Computing World 2020, Oct. 14, 2020.
2 Prashant Tiwari, “Managing the Connected Car Data Tsunami,” presented at Edge Computing World 2020, Oct. 14, 2020. To better appreciate the scale, consider: 1000 terabytes = 1 petabyte; 1000 petabytes = 1 exabyte. While accurate estimates are hard to come by, Google is thought to process approximately 200 petabytes per day.
3 See e.g., Office of the Privacy Commissioner of Canada, “Guidelines for obtaining meaningful consent,” May 18, 2020.
4 Ibid.
5 The European Union’s General Data Protection Regulation (GDPR) provides six legal bases: consent, performance of a contract, a legitimate interest, a vital interest, a legal requirement, and public interest: see article 6, GDPR.
6 Often raised in the context of AI, but applicable generally. Many systems that we interact with every day are interpretable to us, but not necessarily explainable, such as mobile phones, cars and elevators. For most people, such systems are black boxes. Through experience, individuals will come to associate a variety of inputs with outputs, reactions or responses, and can also make successful predictions about how one of these systems would react to a certain input. Most of us typically rely on interpretability: we skip the (many) details that we would not understand anyway, or have no interest in taking the time to learn. See e.g. Leilani H. Gilpin et al., “Explaining Explanations: An Overview of Interpretability of Machine Learning,” Feb. 3, 2019.
7 See e.g., Ann Cavoukian, “Privacy by design: The 7 foundational principles,” Information and Privacy Commissioner of Ontario, 2009.
8 Bernard Marr, “What Is Homomorphic Encryption? And Why Is It So Transformative?” Forbes, November 2019.
9 Hao Chen, Kim Laine and Peter Rindal, “Fast Private Set Intersection from Homomorphic Encryption,” CCS '17 Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, October 2017.
10 Jeroen van Rest et al., “Designing Privacy-by-Design,” Lecture Notes in Computer Science (2014), pp. 55–72; Seda Gurses, Carmela Troncoso, and Claudia Diaz, “Engineering Privacy by Design.”