#Metaphors are not mere slogans used to communicate a profound idea; they also have transformative power, bringing into view the invisible process by which such "slogans" come into being. Take the example of "Data is the new oil" and how it has transformed AI. The AI revolution is, by and large, built on the pillars of Algorithms, Compute, Communication, and Data. Crucially, data is the "fuel" that propels AI into new areas at warp speed, and so data was aptly coined "the new oil".

As data is harvested/mined/siphoned, we need to brace for and mitigate the impacts that may follow. These "impacts" can be large and are still largely unknown, yet nations need to build strategies for them. TV series like The Capture and Black Mirror are dramatizations of some of the impacts of such a "far-off dystopia". But are these future scenarios or current predicaments? Take the example of a "digital twin", a digital replica of a person (audio, video, behavioural quirks). Often, these twins are AI models trained on vast amounts of video, audio, and images of the person. We have, in the recent past, seen AI systems produce deepfakes of remarkable quality:

  1. [Obama] https://www.youtube.com/watch?v=cQ54GDm1eL0
  2. [Leonardo DiCaprio] https://www.youtube.com/watch?v=17_xLsqny9E

This has implications for national security as well. Take the killing of Mahmoud al-Mabhouh: Dubai's police used digital forensics to identify the perpetrators, and "Toka" is now creating technology to counter exactly that kind of forensic analysis, turning BBC's The Capture into something of a reality.

How should we leverage data, then? One example is #TONOMOUS #Neom. Their webpage puts it like this: "A cognitive world is built on trust, using more than 90% of consented data rather than the 1% used by today's smart cities. Hyper-connected cognitive cities will use that data to seamlessly interact with community members, interpreting and mapping future needs and adding unprecedented value to humanity". The idea is appealing, but I don't know what they mean by "90% consented": 90% of what? And what about the remaining 10%, is it un-consented? Besides, consent can be tricky: it may be deliberately buried under a dark UI pattern, switched "on/accepted" by default, granted transitively, or wrapped in incomprehensible legal language, all of which makes genuinely informed consent difficult and perhaps impossible. Take the example of Apple using audio data to train its AI, courtesy of an agreement with Spotify (https://www.wired.com/story/apple-spotify-audiobook-narrators-ai-contract/), and the concerns of voice artists about whether they ever gave "consent".
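To make the "accepted by default" problem concrete, here is a minimal, purely illustrative Python sketch (the class and field names are hypothetical, not taken from any real system) contrasting a pre-checked, dark-pattern style consent record with what explicit, revocable opt-in consent would at least have to look like:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical sketch: a "dark pattern" consent record versus informed,
# opt-in consent. All names and fields are illustrative only.

@dataclass
class DarkPatternConsent:
    # Pre-checked defaults: the user "consents" without ever acting.
    analytics_sharing: bool = True
    third_party_sharing: bool = True  # transitive acceptance, buried in legalese

@dataclass
class InformedConsent:
    purpose: str                           # plain-language description of the use
    granted: bool = False                  # off until the user explicitly opts in
    granted_at: Optional[datetime] = None  # when the opt-in happened
    revocable: bool = True                 # can be withdrawn at any time

    def grant(self) -> None:
        """Record an explicit, timestamped opt-in."""
        self.granted = True
        self.granted_at = datetime.now(timezone.utc)

# A user who never opens the settings screen still "shares" everything:
default_user = DarkPatternConsent()
print(default_user.third_party_sharing)  # True, although nobody actually agreed
```

The contrast is only meant to show that a default value can do the "consenting" on the user's behalf, which is exactly what makes such patterns hard to call informed consent.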

On the flip side of data usage and collection, we have China and its near-blanket data collection. As Kai-Fu Lee writes in his book "AI Superpowers: China, Silicon Valley, and the New World Order": "…By immersing themselves in the messy details of food delivery, car repair, shared bike, and purchases at the corner store, these companies are turning China into the Saudi Arabia of data: a country that suddenly finds itself sitting atop stockpiles of the key source that powers this technological era." In an interview with Forbes, when asked about the investment philosophy of Sinovation Ventures, Kai-Fu Lee said: "…Instead of seeking rocket science AI companies, we seek companies that have data. Data that is well structured, large in quantity, connects to a business metric and ideally, is proprietary…". Perhaps such investment philosophies fueled and incentivized data collection by Chinese tech companies, on the assumption that "…Chinese people are willing to give up data privacy for convenience…". However, the Personal Information Protection Law (PIPL), in force since November 2021, may give Chinese data subjects new rights, as it seeks to prevent the misuse of personal data.

The attitude of Big Tech toward data hasn't been much different either; take the examples of Cambridge Analytica & Facebook (Meta), or Google's Wi-Fi data collection while gathering Street View imagery.

What am I getting at?

The AI engine needs data to train and improve itself. For the groups building these models, ever more data is a must for achieving AI supremacy. But data is quite different from "oil". You can't have two copies of the same litre of oil; ownership follows custody, and it transfers once physical custody of the oil transfers. Data, on the other hand, can be copied endlessly. Perhaps we should follow something analogous to "the smart grid approach", in which surplus energy produced by a household is sold back to the grid: that is, incentivize consent-based data sharing (a toy sketch of the idea follows below). I would love to hear your thoughts, ideas, and feedback.
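To make the smart-grid analogy a bit more concrete, here is a small, purely hypothetical Python sketch of a consent-based "data grid": a data subject opts specific datasets in, and every licensed use credits them, much as a household is credited for surplus energy fed into the grid. The class names, the flat per-use price, and the mechanics are all illustrative assumptions, not a real design.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DataOffer:
    dataset: str          # e.g. "step_counts_2023" (hypothetical dataset name)
    consented: bool       # explicit opt-in; nothing is shared unless the subject says yes
    price_per_use: float  # credit the subject earns per licensed use

@dataclass
class DataGrid:
    offers: Dict[str, List[DataOffer]] = field(default_factory=dict)
    balances: Dict[str, float] = field(default_factory=dict)

    def publish(self, subject: str, offer: DataOffer) -> None:
        """A data subject lists a dataset, with its consent flag and price."""
        self.offers.setdefault(subject, []).append(offer)

    def license(self, subject: str, dataset: str) -> bool:
        """A buyer licenses one use of a dataset; only consented data is usable."""
        for offer in self.offers.get(subject, []):
            if offer.dataset == dataset and offer.consented:
                self.balances[subject] = self.balances.get(subject, 0.0) + offer.price_per_use
                return True
        return False  # no consent on record, no access

grid = DataGrid()
grid.publish("alice", DataOffer("step_counts_2023", consented=True, price_per_use=0.05))
grid.publish("alice", DataOffer("location_history", consented=False, price_per_use=0.50))

print(grid.license("alice", "step_counts_2023"))  # True: consented, and alice is credited
print(grid.license("alice", "location_history"))  # False: never consented, never used
print(grid.balances["alice"])                     # 0.05
```

The hard design questions (pricing, revocation, provenance, and preventing onward copying once data leaves the "grid") are exactly where data stops behaving like oil, which is the point of the paragraph above.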

John Oliver did a great job of explaining some of these nuances in a fun way: Artificial Intelligence: Last Week Tonight with John Oliver (HBO).