GenAI holds a lot of potential for speeding up pharmaceutical research, yes! But that's not what we're going to delve into now!
First things first – Elon Musk and Mark Zuckerberg made common cause quite recently. And no, it has nothing to do with their aborted cage-fight challenge! Nor have they agreed on what to make of AI! Their legendary rivalry endures on those fronts.
But they seem to have found common ground by way of the legal challenges they have mounted against OpenAI (and its backer, Microsoft). Why?
OpenAI's attempt to restructure itself, giving more control to its for-profit entity, is the event that triggered this unlikely 'bonhomie'. While Elon Musk already had an ongoing duel with OpenAI on this matter, Meta's move to back Musk's suit has brought to the fore a critical issue – what it takes to build and scale an LLM.
Huge training datasets and pricey computing resources.
OpenAI's response to the original suit by Elon Musk was to make public his e-mails indicating as much – he had emailed (OpenAI CEO) Sam Altman and other executives that OpenAI would not be relevant "without a dramatic change in execution and resources. This needs billions per year immediately or forget it," Musk emailed. "I really hope I'm wrong."
Considering that OpenAI is also facing lawsuits by several leading news and content producers for using their copyrighted content to train its GPT models, what this really brings to the fore is the centrality of content, and of training datasets, in building out the LLMs that we have been using, experiencing and experimenting with. The last aspect – experimenting – is the real keyword here. Enterprises are still exploring these LLMs and running pilots, but are not really committed to going the distance with the popular cloud-based GenAI models, apprehending challenges on the intellectual-property front as well as the costs involved in building and training such models in-house.
As McKinsey puts it, “while enterprises recognize the transformative potential of AI, scaling efforts often falter without clear strategies to address data provenance, bias mitigation, and cost efficiency.”
Moreover, the awakening of content creators to how their IP has been leveraged has put a question mark over the future availability of content to build the kind of large, unique datasets that had been available to the likes of OpenAI. The reality is that relying on limited publicly available data could render models generic, unable to deliver the much-touted competitive advantages. And that is before even considering the need for vast computational resources, a very costly proposition in itself.
Long story short, enterprises looking to take advantage of GenAI will not find it easy to train LLMs in-house.
A pragmatic possibility seems to be Small Language Models (SLMs), which come with fewer parameters (a few billion, as against the hundreds of billions or trillions in LLMs), trained on smaller datasets that enterprises can assemble from their own historical operations. Though lacking the creative breadth of LLMs, SLMs make up for it in efficiency, accuracy, security, and traceability. SLMs are proving that relatively modest compute can yield more reliable and accurate responses in specialized contexts, particularly in industries where precision and domain-specific knowledge are paramount.
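To see why parameter count dominates the cost equation, a back-of-the-envelope sketch helps: just storing model weights in 16-bit precision takes two bytes per parameter, before accounting for activations, optimizer state, or serving overhead. The function name and the illustrative 7-billion and 1-trillion figures below are assumptions for the sake of the arithmetic, not figures from any specific model.

```python
def model_memory_gb(num_params, bytes_per_param=2):
    """Rough memory footprint of model weights alone.

    Assumes fp16/bf16 storage (2 bytes per parameter) by default;
    ignores activations, KV caches, and optimizer state, which add more.
    """
    return num_params * bytes_per_param / 1e9

# Illustrative sizes: a ~7B-parameter SLM vs a ~1T-parameter LLM
slm_gb = model_memory_gb(7e9)    # 14 GB  -> fits on a single high-end GPU
llm_gb = model_memory_gb(1e12)   # 2000 GB -> needs a multi-node GPU cluster

print(f"SLM weights: ~{slm_gb:,.0f} GB")
print(f"LLM weights: ~{llm_gb:,.0f} GB")
```

Even this crude estimate shows a gap of two orders of magnitude in hardware footprint, which is the economic argument behind enterprises fine-tuning SLMs on their own data rather than training frontier-scale models.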
But the elephant in the room, intellectual property, seems a much more compelling reason why enterprises should focus on leveraging SLMs. GenAI as we experience it today is not so much 'generated' content as 'regenerated' content: the models modify, reuse and repurpose the contents of their training datasets (which, in turn, have largely turned out to be copyrighted materials).
Imitation, as they say, is the sincerest form of flattery. However, your average enterprise is not in the business of stand-up comedy, eh! And so, now we know why fierce opponents Musk and Zuckerberg are suddenly comrades-in-arms.
But what if the outcome of the lawsuits forces all these GenAI systems to become open to everyone? Could humanity redeem itself from the risky precipice it finds itself on?
The story of how Indian pharma major Cipla (Chemical, Industrial & Pharmaceutical Laboratories) exploited a loophole in Indian patent law to take on Big Pharma, and give the 'Third World' a way to treat AIDS, is an excellent case in point.
The Indian Patents Act of 1970 states that, in matters of food and health, a product cannot be patented – only the process can be. Leveraging this provision, Cipla reverse-engineered AIDS medicines that were being sold at about $10,000 per year and made them available at $350 a year, that is, less than $1 per day. Variously referred to as a buccaneer and a pirate, Cipla produces a third of the AIDS medicines used in Africa, delivered through Doctors Without Borders (Médecins Sans Frontières), as well as anti-malarial drugs. Big Pharma had to follow suit, in the face of massive protests across the world and the moral compulsion of putting humanity above profits. Big AI has some lessons here indeed!
The demise of Suchir Balaji, the Indian-American OpenAI whistleblower, though tragic, has also opened a window for debate on the ethical and legal implications of building and deploying AI technologies.
Considering that the trajectory of GenAI goes beyond ethics, impacting anthropological and sociological issues including human agency, cognition, and the capacity to innovate, it does seem that SLMs matter – for now, at least!