With an ever-expanding ocean of consumer data at their disposal, firms must balance efforts to map out customer journeys and profiles with building trust that consumer information is safe with them. Ensuring airtight data security and privacy is no longer a choice for companies; it is a public mandate and an industry benchmark. As online activity becomes increasingly complex and personal, how can firms personalize customer experiences and build digital identities without compromising consumer data?
As part of the continuing series addressing our annual Grand Challenge theme of building business resilience, the Kenan Institute sits down with Longxiu Tian, UNC Kenan-Flagler Business School assistant professor of marketing, to mine his expertise in customer strategies and get his perspective on the current, crucial moment for firms trying to build trust and profitability with innovative consumer data management strategies. Tian’s responses and supporting research were jointly developed with Nikhita Bhoopati (MBA ’24), Forte Fellow.
In your research, you find a recent “privacy paradigm shift” regarding consumer data. How do you define this shift, and what role have consumers played in driving the changes we’re seeing?
Longxiu Tian: Consumers have benefited tremendously from the digital transformation of goods and services, an arena that has become overwhelmingly data-driven. Many aspects of today’s digital commerce were hardly imaginable just 25 years ago. One can now compare hundreds of products with a single click, and algorithms create real-time personalized product offerings and recommendations. Customers today provide valuable insights to brands at many different touchpoints – from landing pages and points of sale to chatbot conversations and, of course, post-purchase product reviews that help fellow consumers make better-informed purchases. It is hard to imagine a consumer today not reading or watching an online review before making a purchase, whether they’re buying a toothbrush or booking a vacation. Contemporary consumers have, in many ways, become self-taught data scientists.
This model of digital commerce relies on fine-grained user-level data that need to be collected, shared, and combined. These data include demographics, behaviors, and locations, as well as information about preferences, which consumers disclose in what they explicitly say (e.g., via online reviews and customer journey surveys) or reveal through their actions (e.g., via purchases and engagement patterns).
The “privacy paradigm shift,” which I examine in my research, refers to the transformation of values, norms, and regulatory frameworks governing the use of personal data, as well as to calls for safeguarding individual rights to data privacy. There is growing awareness of and concern over how our personal data are handled, and increased scrutiny of the stakeholders deciding the policies and strategies that govern data privacy and protection. Consumers today are demanding – rightfully – a seat at the table, and those demands have catalyzed a disruption and brought us to a crossroads in marketing practices and technologies.
You describe this crossroads in marketing, where companies strive to personalize consumer experiences while safeguarding consumer data. What regulatory and industrywide factors have set the stage for the current conundrum?
Longxiu Tian: From the early days of ecommerce until about a decade ago, the conventional wisdom was that the collection and management of customer data were, by and large, engineering and operational processes. Since then, the scope of data stakeholders has broadened to include data scientists, marketers, and business partners, reflecting customer data’s central role in driving business innovation and growth. Exploiting the growing complexity of these stakeholder relationships, malicious actors have proliferated over the past decade, their impact evident in waves of data breaches (see Figure 1). These large-scale cybercrimes show that data security demands technological innovation as well as new regulatory and organizational paradigms. As a result, business stakeholders and policymakers have come to see that data privacy and data security, while closely linked, are distinct facets of data technology and management that each need to be addressed.
One emblematic case study is the consumer credit reporting agency Equifax’s handling of its 2017 data breach, in which the credit data of nearly 150 million consumers around the world were compromised. Evidence strongly indicates that this was a cybercrime committed by hackers backed by foreign state actors. Talk about a David vs. Goliath battle: the engineering resources of a single American enterprise were pitted against the cyberwarfare capabilities of entire nations.
Around the time of the Equifax breach, the European Union’s General Data Protection Regulation and California’s Consumer Privacy Act were enacted, providing comprehensive guidelines on prioritizing user privacy and data protection. These regulations set standards for how companies handle user data, obtain consent, and provide transparency about data storage and usage. The frameworks were broadly welcomed by the business community because they provided common, actionable benchmarks for data ecosystems, echoing the banking industry reforms and standards established decades earlier. GDPR, for instance, requires certain organizations to appoint a data protection officer (DPO) responsible for overseeing data protection strategies and GDPR compliance and acting as the point of contact with supervisory authorities.
Beyond requirements and standards, these changes also created opportunities for innovation. Instituting DPOs, for instance, has been a catalyst for many firms to modernize their internal data administration and shed the data silos that had grown out of vertical organizational structures. Equifax itself became one of the biggest success stories of ground-up cloud transformation in the aftermath of its breach. In addition to fortifying the security of its vast data ecosystem, cloud migration enabled the company to commit to net-zero greenhouse gas emissions. The move also positioned Equifax as an aggressive early mover in leveraging the explosive growth of generative AI to improve its anomaly detection, identity protection, and credit scoring accuracy, building the business’s resilience to both malicious actors and shifting customer expectations.
It seems that nearly every company today puts data – particularly consumer information – at the core of its operations. How can companies build consumer trust while they collect and manage consumer data?
Longxiu Tian: Building the requisite trust with consumers so that customers “opt in” to sharing their data is the cornerstone of customer relationship management in this era of data privacy and protection. To understand the basis of that trust, we must recognize how data security and data privacy go hand in hand and how they differ. I’ll speak to these challenges by citing two highly promising emerging technologies: data clean rooms and differential privacy.
Now offered by many cloud and data management vendors, data clean rooms are a data security technology that establishes secure cloud environments for sharing and combining data and machine learning insights between business partners. Clean rooms allow “walled gardens” like Google and Facebook, as well as retail media platforms like Amazon and Walmart, to share their customer-level insights with advertisers while still exerting strict controls – such as multifactor authentication, data encryption, and so-called “data contracts” – that determine what can leave the clean room. Even if malicious actors obtained the physical hardware where data are being shared (a nearly impossible scenario that would require bypassing all digital authentication), they would find only an encrypted, unintelligible mess. This level of security assures both publishers and advertisers that their data can be securely joined in the same space to produce shareable insights, in accordance with the data contract. Clean rooms are the most effective solution developed so far for maintaining data security when fusing data across sources.
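To make the clean-room idea concrete, here is a minimal, hypothetical sketch in Python – not any vendor’s actual clean-room API. It assumes each party pseudonymizes its join key before contributing rows and that the “data contract” is enforced by releasing only aggregates over sufficiently large groups; the names pseudonymize and permitted_query, the salt, and the sample records are all illustrative.

```python
import hashlib
import statistics

def pseudonymize(email: str, salt: str = "shared-clean-room-salt") -> str:
    """Hash the identifier so raw emails never enter the shared environment."""
    return hashlib.sha256((salt + email.lower()).encode()).hexdigest()

# Publisher contributes ad-exposure data; advertiser contributes purchase data.
publisher_rows = {pseudonymize("ana@example.com"): {"ad_impressions": 12},
                  pseudonymize("bo@example.com"): {"ad_impressions": 3}}
advertiser_rows = {pseudonymize("ana@example.com"): {"spend": 120.0},
                   pseudonymize("bo@example.com"): {"spend": 45.0}}

def permitted_query(min_group_size: int = 2) -> dict:
    """Only aggregates that satisfy the contract's minimum group size may leave."""
    joined = [(publisher_rows[k]["ad_impressions"], advertiser_rows[k]["spend"])
              for k in publisher_rows.keys() & advertiser_rows.keys()]
    if len(joined) < min_group_size:
        raise PermissionError("Data contract: group too small to release.")
    return {"avg_impressions": statistics.mean(i for i, _ in joined),
            "avg_spend": statistics.mean(s for _, s in joined)}

print(permitted_query())  # {'avg_impressions': 7.5, 'avg_spend': 82.5}
```

The point of the sketch is the division of labor: individual-level records stay inside the controlled environment, and only the contractually permitted aggregates cross its boundary.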
This process of data fusion is the bread and butter for marketers when it comes to customer identity resolution, which entails stitching together user activities across separate channels and platforms (i.e., data sources) to unify customer journeys. Chief marketing officers with whom I’ve spoken have told me there can be upwards of six times as many distinct channel-level user IDs as known users. As much as we’d like to interpret this statistic as a sign of untapped customers, the more realistic (and troubling) interpretation is that, for many brands, the average user’s customer journey is broken into six fragments that are not correctly fused. This identity fragmentation is among the most pressing marketing challenges to arise with the privacy paradigm shift.
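As a simplified illustration of what identity resolution involves, the sketch below (with made-up channel IDs) stitches channel-level records into a single profile only when they share a deterministic key such as a hashed email; records without one remain unresolved fragments, which is exactly the fragmentation problem described above.

```python
from collections import defaultdict

# Hypothetical channel-level events; "hashed_email" stands in for any shared key.
channel_events = [
    {"channel": "web",   "channel_id": "w-123", "hashed_email": "abc"},
    {"channel": "app",   "channel_id": "a-987", "hashed_email": "abc"},
    {"channel": "email", "channel_id": "e-555", "hashed_email": "abc"},
    {"channel": "store", "channel_id": "s-742", "hashed_email": None},  # no shared key
]

profiles = defaultdict(list)   # unified customer journeys
fragments = []                 # journey pieces that cannot yet be fused

for event in channel_events:
    if event["hashed_email"]:
        profiles[event["hashed_email"]].append(event["channel_id"])
    else:
        fragments.append(event["channel_id"])

print(dict(profiles))  # {'abc': ['w-123', 'a-987', 'e-555']} -> one unified journey
print(fragments)       # ['s-742'] -> a fragment awaiting resolution
```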
To address this challenge, companies need to prioritize transparency and consent in their data collection, incentivize customers to “opt in,” and communicate to customers what they are doing with their data. The importance of this communication cannot be stressed enough: it needs to be clearer, simpler, and delivered across more touchpoints than brands manage today. This is where data security hands the baton to data privacy. Data security refers to the infrastructure and technology used to protect against malicious actors. Data privacy, meanwhile, is first and foremost about trust between brands and consumers – specifically, trust that customers’ data aren’t being misused or abused in ways that run counter to their welfare and basic right to privacy. The brands that practice this ethos and successfully communicate it to their customers will be the brands that own the market’s most competitively advantageous data.
The second technology I want to highlight is differential privacy. A “by the algorithm, for the algorithms” technology, differential privacy mitigates the risk of revealing personally identifying information when training the AI models used for targeting and personalization. It is a way of architecting AI models that algorithmically guarantees that an individual’s presence in a dataset cannot be discerned or revealed. For data stakeholders, differentially private algorithms are also a way for their organization to both realize and communicate its commitment to data privacy.
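A minimal sketch of the core idea follows, using the classic Laplace mechanism on a counting query rather than full model training; the function name private_count and the toy data are illustrative, and epsilon is the tunable privacy budget. Differentially private model training (e.g., DP-SGD) extends the same noise-adding principle to gradient updates.

```python
import numpy as np

def private_count(records, predicate, epsilon: float = 1.0) -> float:
    """Release a count with Laplace noise calibrated to the privacy budget epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # adding or removing one person changes a count by at most 1
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

customers = [{"churned": True}, {"churned": False}, {"churned": True}]
print(private_count(customers, lambda r: r["churned"], epsilon=0.5))
# The released figure is useful in aggregate, but no single customer's
# presence or absence can be confidently inferred from it.
```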
Having data security doesn’t automatically ensure data privacy and customer trust – the two must be interlinked for brands to earn consumers’ trust and, with it, their data. By combining transparency, consent, secure data handling practices, and innovative privacy-preserving technologies, brands can successfully navigate the current crossroads while reaping the potential windfalls of a closer customer-brand relationship.
You have proposed a Privacy Preserving Data Fusion methodology to help companies tackle “identity fragmentation” in a way that adheres to data privacy standards. What problems is PPDF aiming to solve?
Longxiu Tian: Third-party cookies have been widely criticized and curtailed in recent years, and their decline threatens businesses’ ability to fully map out customer journeys. To counteract the fragmentation of the customer journey, marketers are pushing to make better use of various first-party data sources in a more unified and holistic way. Toward this end, my colleagues Dana Turjeman of Reichman University in Israel and Samuel Levy of the University of Virginia Darden School of Business and I developed Privacy Preserving Data Fusion, a generative AI approach for combining consumer data from multiple sources with quantifiable and tunable privacy guarantees.
We teamed up with a major wireless telecom carrier that ran large-scale brand perception surveys each quarter. These surveys hold extraordinary strategic value as they can be used to identify service pain points, provide perception measures on customer touchpoints and initiatives, and quantify long-term brand marketing efforts. To garner candid responses, the survey instructions clearly communicate that the identities of responding customers are anonymized.
It has become increasingly clear, however, that anonymity – the removal of personally identifiable information – does not in and of itself guarantee privacy, a fact that is particularly salient when combining and fusing multiple data sources. In 1997, for instance, computer scientist Latanya Sweeney was able to uniquely identify then-Governor of Massachusetts Bill Weld within an anonymized state employee medical records database using just three pieces of publicly available voter registration information: birth date, gender, and ZIP code. In the privacy domain, this is referred to as a data linkage attack.
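The mechanics of a linkage attack are easy to demonstrate. The toy sketch below uses made-up tables (and assumes the pandas library) to show how an “anonymized” record becomes identifiable the moment its quasi-identifiers match a single row in a public dataset.

```python
import pandas as pd

# "Anonymized" records: no names, but quasi-identifiers remain (illustrative data).
anonymized_records = pd.DataFrame({
    "birth_date": ["1945-07-31", "1962-03-14"],
    "gender":     ["M", "F"],
    "zip":        ["02138", "27514"],
    "diagnosis":  ["condition_a", "condition_b"],  # sensitive attribute
})

# Public voter-roll-style data with the same quasi-identifiers plus names.
public_voter_roll = pd.DataFrame({
    "name":       ["W. Weld", "J. Doe"],
    "birth_date": ["1945-07-31", "1962-03-14"],
    "gender":     ["M", "F"],
    "zip":        ["02138", "27514"],
})

# If a quasi-identifier combination is unique, the join re-identifies the person.
reidentified = public_voter_roll.merge(
    anonymized_records, on=["birth_date", "gender", "zip"]
)
print(reidentified[["name", "diagnosis"]])
```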
In our application of PPDF, we sought to fuse the anonymized surveys with the telecom firm’s internal customer relationship management database to better understand how internal customer metrics, such as churn and spend, correspond to the survey’s brand perception measures. Because PPDF is based on a class of probabilistic generative models called variational autoencoders and is trained using differential privacy, it prevents unintentional reidentification of individual customers, such as what happened to Gov. Weld.
For end users of PPDF, the strength of the differential privacy guarantee can be tuned to the desired tolerance for reidentification risk relative to model accuracy. PPDF is scalable and capable of fusing the wide array of tabular customer data sources that marketers are accustomed to. And because differential privacy is immune to post-processing, any downstream tasks or queries based on the fused data also “inherit” the privacy guarantees of PPDF.
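For readers who want a sense of what such a model might look like in code, here is a heavily simplified sketch – not the authors’ actual implementation – of a small tabular variational autoencoder trained with differentially private SGD. It assumes PyTorch and the open-source Opacus library for the DP training loop, uses synthetic data, and the epsilon/delta targets are illustrative stand-ins for PPDF’s tunable privacy budget.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine  # assumed dependency for DP-SGD

class TabularVAE(nn.Module):
    """Toy variational autoencoder for a small tabular dataset."""
    def __init__(self, n_features: int, latent_dim: int = 4):
        super().__init__()
        self.encoder = nn.Linear(n_features, 2 * latent_dim)  # outputs mu and log-variance
        self.decoder = nn.Linear(latent_dim, n_features)

    def forward(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(z), mu, logvar

# Synthetic stand-in for a fused table of CRM metrics and survey measures.
data = torch.randn(512, 6)
loader = DataLoader(TensorDataset(data), batch_size=64)

model = TabularVAE(n_features=6)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Illustrative privacy budget: smaller epsilon means stronger privacy, noisier training.
model, optimizer, loader = PrivacyEngine().make_private_with_epsilon(
    module=model, optimizer=optimizer, data_loader=loader,
    epochs=5, target_epsilon=3.0, target_delta=1e-5, max_grad_norm=1.0,
)

for _ in range(5):
    for (batch,) in loader:
        if batch.shape[0] == 0:  # Poisson sampling can yield empty batches
            continue
        optimizer.zero_grad()
        recon, mu, logvar = model(batch)
        recon_loss = nn.functional.mse_loss(recon, batch)
        kl_loss = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        (recon_loss + kl_loss).backward()
        optimizer.step()
```

Because the privacy accounting happens during training, anything subsequently generated or queried from the fitted model carries the same guarantee, which is the post-processing property the interview describes.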
This article is part of our Grand Challenge series on business resilience.
Marketing at a Crossroads, Part I: The Privacy Paradigm Shift