Artificial Intelligence Good Reads - Dec 23

A list of Artificial Intelligence Good Reads, trying to make sense of the fastest moving space in tech since the advent of the Internet. This is the smallest tip possible of an unmanageable iceberg. Here are a handful of initial articles. I will update this over the next few days. Enjoy! (Image courtesy of DALL-E-2)

December ’23

EU agrees new AI legislation

The EU Commission and European Parliament this week agreed the outline of an EU-wide AI Act that aims to provide safeguards on the use of advanced AI models. The proposed legislation forbids the use of AI algorithms in a number of application, including the use of tracking people through facial recognition (except for law enforcement) or for the purposes of ‘social scoring’. Companies will also have transparency obligations on the inner-workings of advanced models and the data that was used for training, and will be required to comply with a number of safety mechanisms including risk assessments, benchmarking and adversarial testing. It is notable that companies such as OpenAI and Google’s DeepMind have so far resisted calls for this sort of disclosure.

EU agrees landmark rules on artificial intelligence
Legislation lays out restrictive regime for emerging technology

Google announces its Gemini family of multimodal AI models

Just last week, I wrote about Mirasol3B, a multimodal AI model, and this week that news is already old hat, as Google announced their flagship generative AI model, Gemini. This will be available in three flavours. The most performant of the three, known as Ultra, is said to outperform GPT-4 on most benchmarks. It is however not available for public use, and the version that is currently integrated into Bard is based on a less performant Pro version. (See here for the full technical report, and a critique here). Having played around with Gemini Pro on Bard, the experience is fairly similar to ChatGPT (based on GPT-4), but it clearly has a more up-to-date feature set. Looking forward to the general availability og Gemini Ultra.

Introducing Gemini: our largest and most capable AI model
Gemini is our most capable and general model, built to be multimodal and optimized for three different sizes: Ultra, Pro and Nano.

MLOps – A primer

Whilst everyone involved in tech will be familiar with DevOps, the set of software engineering practices that span coding through to operation, its equivalent in the AI space, MLOps (Machine Learning Ops) is not as well known. The behaviour of machine learning and AI models is less predictable than traditional software, as it depends on the data used for training, the model structure and its parameters, the inputs used in production, as well as the software that hosts and interacts with the model. As such productising machine learning in a way that is safe, reliable and repeatable requires a structured approach to how data, models and software are managed, based very much on DevOps principles. Databricks provides a nice summary of the key elements.

What is MLOps?
MLOps is a core function of Machine Learning engineering, focused on streamlining the process of taking ML models to production, and then maintaining and monitoring them.

Older posts…

Google’s new multimodal AI model

As AI models become ever more sophisticated, one of most challenging problems is how to combine different media types together. Video, audio and text data all have very different characteristics in terms of how they are represented in data as well as the AI models used to process them. This means that creating an AI model that can manipulate all forms of media is proving to be a big challenge. A couple of weeks ago, Google DeepMind announced a new model, called Mirasol3B that implements multimodal learning across audio, video and text in an efficient way. The draw for Google is obvious – how can it combine its vast YouTube catalogue in a meaningful way with its enormous largely text-based search engine. Although benchmarking indicates that this model may have broken new ground, researchers have criticised it for the opaqueness on how it works.

Google DeepMind breaks new ground with ‘Mirasol3B’ for advanced video analysis
Google DeepMind announces Mirasol3B, a new multimodal AI system for understanding long videos, but questions remain about real-world applicability.

Using the Human Brain as a template for more efficient AI models

Although it is often claimed that neural netwrorks are modelled on human brains, the vast amounts of material that generative language or image models consume during their training bears little resemblance to how humans learn. Humans are clearly much more constrained in terms of the energy they consume when learning or problem solving. In a recent paper in Nature Machine Intelligence magazine, scientists from the University of Cambridge have sought to model an artificial neural network that contained constraints similar to those found in a human brain. The research showed that these constraints influenced the model to seek more efficient ways of solving problems. This obviously has very interesting implications for the development of AI models, particularly in designing systems that are both adaptable as well as efficient.

Physical Constraints Drive Evolution of Brain-Like AI – Unite.AI
In a groundbreaking study, Cambridge scientists have taken a novel approach to artificial intelligence, demonstrating how physical constraints can profoundly influence the development of an AI system. This research, reminiscent of the developmental and operational constraints of the human brain,

Anthropic’s ChatGPT rival sets an important benchmark

Anthropic, the startup backed by Amazon and Google, announced that Claud 2.1, its Large Language Model can process inputs with up to 200,000 tokens at once, equivalent to 500 pages of text. For comparison GPT-4 supports a token length of 8,000 or 32,000, depending on the model used. Token length, also known as the context window, is important as it represents the quantitiy of input information that it can consider when generating text. For example, this sets the upper limit on text it can summarise, or sets a limit before it can no longer ‘remember’ the previous context.

OpenAI rival Anthropic makes its Claude chatbot even more useful
Claude can now handle double the number of tokens with half the hallucinations and gets new API tools in Anthropic’s latest chatbot update.

The hidden manual effort in creating AI models

Many generative AI systems, such as OpenAI’s GPT family of Large Language Models make use of human labelling to fine-tune and improve the prediction models, in a technique called “Reinforcement Learning from Human Feedback” (RLHF). A Wired article last month explored who carries out this data labelling. The article described how workers in places such as Venezuela, Colombia, east Africa, the Philippines and Kenya manually label images, outputs from large language models as part of their training process.

Millions of Workers Are Training AI Models for Pennies
From the Philippines to Colombia, low-paid workers label training data for AI models used by the likes of Amazon, Facebook, Google, and Microsoft.

The Open Source vs Proprietary AI Models Faultlines

A row has escalated in the past few weeks over the relative threats to public safety of open-source and proprietary AI models. Meta, who famously released the inner workings of its Llama 2 models has come under criticism by some safety advocates who claim this lowers the bar for malicious third parties to use LLMs for nefarious purposes such as cybercrime or developing harmful biological or chemical agents. Unsuprisingly, OpenAI and Meta are on opposite sides of these faultlines, with Sam Altman, OpenAI’s CEO claiming that its closed proprietary model provides the best safeguards against exploitation. Yann LeCun, Meta’s head of AI and one of the godfathers of AI development, strongly makes counter-arguments that the very nature of closed AI systems makes their risks unknowable and can create a monopoly that concentrates humanity’s knowledge into black boxes is a threat to democracy and diversity of opinion.

Protesters Decry Meta’s “Irreversible Proliferation” of AI
But others say open source is the only way to make AI trustworthy

AI one-percenters seizing power forever is the real doomsday scenario, warns AI godfather
The real risk of AI isn’t that it’ll kill you. It’s that a small group of billionaires will control the tech forever.

AI’s Regulatory Outlook

As big tech argues about the merits of open-sourced vs proprietary models, governments around the world are trying to figure out the best way to keep their citizens safe. Some AI luminaries such as Geoff Hinton, Elon Musk and Sam Altman are warning against the existential risks of artificial general intelligence (AGI), while others, including Meta, are more concerned about the more prosaic risk to competition of ‘winner takes all’ economics and the cost of regulatory compliance on open source models. Last week, the US Government issued an Executive Order which places requirements on the establishment of guidelines and best practices, carrying red teaming exercises on large models and requiring disclosure on how large-scale computing clusters are used. This was folllowed by the ‘Bletchley Declaration’ at the AI Safety Summit hosted in the UK that outlines an international consensus on the need for scientific and policy collaboration in the face of AI risk, but was somewhat short on practical measures.

Global leaders scramble to regulate the future of AI
AI is now an issue of global import. Why the next few years will be crucial to balancing its promise with ethical and societal safeguards.

Joe Biden’s Sweeping New Executive Order Aims to Drag the US Government Into the Age of ChatGPT
President Joe Biden issued a wide-ranging executive order on artificial intelligence with measures to boost US tech talent and prevent AI from being used to threaten national security.

Dealing with Prompt Hacking

Despite their size and sophistication, large language models (LLMs) are particularly sensitive to the instructions, or ‘prompt’ used to generate an outcome. In its benign form, this is sometimes called ‘prompt engineering’ and a quick scour of the web throws up prompt templates for anything from creating a CV to answering a high school essay question. The darker side of prompt engineering is ‘prompt hackingexploits’ which uses carefully constructed prompts to work around safeguards and bypass protections in the model. This includes ‘indirect prompt injection’ which exploit the ability of many LLMs’ to ingest data as part of the query. The combination of LLMs’ inherent opaqueness of their inner workings, plus their ability to make use of data of potentially concerning origins, means that they should always be treated with caution, following the cybersec principle of least privileges.

Generative AI’s Biggest Security Flaw Is Not Easy to Fix
Chatbots like OpenAI’s ChatGPT and Google’s Bard are vulnerable to indirect prompt injection attacks. Security researchers say the holes can be plugged-sort of.

State of AI 2023

Nathan Benaich, of Air Street Capital, a VC focused on AI ventures, issues an annual “State of AI report”. The 2023 issue was published a couple of weeks ago and highlights that although GPT-4 currently sets the benchmark in terms of large language performance (it is actually a multi-model model, as it was trained on text and images), efforts are growing to develop open source models that match the performance of proprietary models. Interestingly, the largest models are running out of open human-generated data that can be used to train them. The report provides a nice summary of the key research highlights of the year, where industry is investing, the political and safety implications and predictions for 2024. At 163 slides, it is a fairly hefty read.

State of AI Report 2023
The State of AI Report seeks to trigger informed conversation about the state of AI and its implication for the future.

October ’23

Towards AI Safety (1) – On Mechastic Interpretability

One of the (many) challenges in understanding how neural networks really work is that there is no easily-observable logic to how they actually work. They are not governed by simple mathematical relationships, making them very difficult to diagnose problems, or explain why they predict certain outcomes. The same is true for neuroscientists, who struggle to understand the function of individual neurons in the brain. Anthropic recently published a paper that describes how a large language model can be decomposed into more coherent features, e.g. describing themes such as legal language or DNA sequences, and then use another LLM to generate short descriptions of the features, through a process called “autointerpretability”. This work is aimed to provide a mechanistic model of how neural networks work in order to overcome concerns about their use in safety or ethically-sensitive applications.

Decomposing Language Models Into Understandable Components

Towards AI Safety (2) – Watermarking AI Images

In July, the White House announced that the main large tech companies had agreed to deploy watermarks to determine in a robust way whether they have been generated by their generative AI models. Watermarking involves adding patterns that are difficult to remove from the image that attest to their origin. The Register however reports that a team at the University of Maryland demonstrated that they were able to attack such schemes, both degrading the watermark (i.e. allowing an AI-generated image to skip detection) as well as making non-watermarked images appear as though they are AI-generated. It seems that there is way to go until we have robust systems to identify deepfakes.

Academics blast holes in AI-made images’ watermark security
Basically, it’s ‘not going to work’

Towards AI Safety (3) – Tackling Algorithmic Bias

As AI models, including generative AI models such as ChatGPT, Stable Diffusion, as well as other tools such as classifiers, fraud detectors, credit scoring algorithms and so on are trained on large sets of data, often from the Internet and other public datasets, they contain and reflect the biases contained within them. So, we have seen how a recruitment tool used by Amazon a few years ago showed bias against women as it was trained on the male-dominated applicant pool, while more recently AI-generated barbies displayed all the national stereotypes you might fear (including a German SS officer Barbie!) . An Article in Vox recently provided an overview on why fixing AI bias is so difficult, why it is everywhere, who is most negatively impacted (no prizes for guessing!) and some of the initiatives underway to tackle the problem.

AI automated discrimination. Here’s how to spot it.
The next generation of AI comes with a familiar bias problem.

AI’s Environmental Impact

A recent article in the NY Times quoted an analysis that predicted that by 2027 AI servers would use between 85 to 135 TWh annually, equal to the entire electricity consumption of Sweden, and 0.5% of the world’s electricity. For context, all data centres in 2022 consumed between 1 and 1.3% of the world’s electricity (excluding crypto-mining), and in 2021 was already responsible for 10-15% of Google’s electricity consumption. As many AI training techniques are at their infancy, as AI scales further, a paper by MIT predicts that the algorithms will be optimsed for efficiency and not only for accuracy, for example by stopping underperforming models or tracks early. There is definitely a sense that so far, in the race for AI headlines, all secondary considerations have been put aside. Even if companies and cloud providers are not driven by environmental concerns, the likely constraints on GPU supply will be sufficient motivation to push for algorithmic efficiency.

The environmental impact of the AI revolution is starting to come into focus
Google’s use of AI alone could use up as much electricity as a small country – but that’s an unlikely worst-case scenario, a new analysis finds.

Generative AI copyright gets thorny

While Hollywood actors and writers are on strike, partly trying to protect themselves against AI-created derivatives of them and their work, a US federal court has rules that generative AI work will not be protected under US copyright law. In the ruling, the judge stated that although “copyright is designed to adapt with the times… (it) has never stretched so far, however, as to protect works generated by new forms of technology operating absent any guiding human hand.” In a related development, Microsoft have decided that they will underwrite and defend any customer of its Copilot AI services (which includes its generative coding tools) from any copyright infringement suits.

AI-Generated Art Is Not Copyrightable, Judge Rules
A federal court ruled on August 18 that AI-generated artwork cannot be copyrighted because copyright law only extends to human beings, per The Hollywood Reporter.

A closer look at Meta’s Open-Source alternative to Chat GPT

We have already looked at the impact of open-source large language models as an alternative to ChatGPT. These have the clear advantage that they can be used and trained on domain-specific data and thus be optimised to solve specific tasks, as exemplified by Bloomberg’s financial analysis model. Moreover, as the underlying model and training approaches are public, their strengths, limitations, biases and vulnerabilities are open to inspection. This article provides an overview of Lllama 2 as well explaining how to get your hands on the model and have a play with it yourself. The open source nature of LlaMa2 has stimulated an industry’s worth of activity tailoring it for specific applications, such as Colossal-AI’s large-scale training solution.

Llama 2: A Deep Dive into the Open-Source Challenger to ChatGPT – Unite.AI
Large Language Models (LLMs) capable of complex reasoning tasks have shown promise in specialized domains like programming and creative writing. However, the world of LLMs isn’t simply a plug-and-play paradise; there are challenges in usability, safety, and computational demands. In this article, we

Exploring an LLM software stack

An article describing Colossal AI’s large-scale LlaMa 2 model training solution also provides a pretty clear outline of a typical software stack for training and operating large language models. Worth reading just for this.

Artificial Intelligence at the Edge – Looking at TinyML

Much of the discussion relating to the future of machine learning and generative AI models focuses on how future applications will require larger, more computationally-expensive models, with more parameters and much larger training sets. There is however an alternative approach, which instead considers how best to distribute algorithms on cheap low-power edge devices. The advantages are obvious. If AI algorithms can be run on end devices, be they cars, home sensors, health monitors, agricultural sensors and so on, they do not need to be hosted and processed on a server somewhere. For large systems, this has a significant impact on system resilience, responsiveness and privacy, as devices can be intelligent themselves, rather than sending all data for processing elsewhere. Light-weight models such as TensorFlow lite are designed to operate on the smallest, most power-efficient chipsets to bring AI processing out of the data centre to the real world.

TinyML: Applications, Limitations, and It’s Use in IoT & Edge Devices – Unite.AI
In the past few years, Artificial Intelligence (AI) and Machine Learning (ML) have witnessed a meteoric rise in popularity and applications, not only in the industry but also in academia. However, today’s ML and AI models have one major limitation: they require an immense amount of computing and pro

Updated 2 July

A dissenting view. Why AI may not be as revolutionary to the world economy as we are assuming.

This essay takes a number of steps back and frames AI in the context of its likely long-term societal impact. In other words, where will it sit compared to the inventions, say, of agriculture or the steam engine? The authors, including The Economist‘s economics correspondent, take a refreshingly dissenting view. Challenges that will limit its impact include: drowning in information and content generated by AI, legal constraints of getting computers to make decisions impacting humans, the not insignificant issue of physical world interactions, and the challenge of mimicking human expertise. I don’t agree with everything that is said here, but as a counterpoint to AI hype, this is a fantastic read.

Why transformative artificial intelligence is really, really hard to achieve
A collection of the best technical, social, and economic arguments Humans have a good track record of innovation. The mechanization of agriculture, steam engines, electricity, modern medicine, computers, and the internet-these technologies radically changed the world. Still, the trend growth rate of GDP per capita in the world’s frontier

Will we survive the flood of AI content?

In a previous release of this list, we referenced a paper that touched on the value of human work where AI-generated content is assessed by other AI models. An essay this week by Alberto Romano asks what happens when AI-generated content increases to the point where most content is created by statistical models designed to mimic human coherence. Only this week an author of fiction observed that 81 out in a top 100 chart of self-published Kindle content were AI-generated. Search engines and social media algorithms were tuned to identify an promote content people will find more useful. How will this work when most content is computer generated?

How the Great AI Flood Could Kill the Internet
The web has become a place of passive consumption that AI hustlers won’t hesitate to exploit

Meta’s Transparency Quest

Meta, is in many ways a surprising AI pioneer. No only is it making some of the more interesting forays into open-source models, (see links below), but it is also in explaining how all the relevant recommender and ranking mechanisms work across Facebook and Instagram. While the algorithms themselves are not described in any detail (assume Meta consider these to be their ‘secret sauce’), you can find all the input signals that are used to rank content. This is a lot more than we have learnt from the likes of OpenAI. See here for an interview on Nick Clegg, Meta’s VP of global affairs (and former UK deputy PM).

Our approach to explaining ranking
At Meta, we define AI as the capability of computer systems to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages. As part of our ongoing commitment to transparency, we provide tools and information to help you understand how AI at Meta works.

Not so Massive. Device-optimised generative AI

Large language models and large diffusion models for image generation, are, well, large, and are typically limited to server-based deployments. This week Google published a paper that describes how they have optimised the memory needs for large diffusion models, such that they can support image creation from text input in under 12 seconds on a high-end smartphone GPU. While it still requires 2GB of RAM to hold the model parameters. Whilst this paper speaks to the optimisations required for generative adversarial networks, I suspect, we will see a greater focus in reducing the size of inference models to allow for deployment on devices.

Speed is all you need: On-device acceleration of large diffusion models via GPU-aware optimizations
The proliferation of large diffusion models for image generation has led to a significant increase in model size and inference workloads. On-device ML inference in mobile environments requires meticulous performance optimization and consideration of trade-offs due to resource constraints.

Operationalising Machine Learning

With most of the discussions online focusing on AI models, their applications and real-world impact, it is easy to ignore the engineering discipline required to keep an AI product maintainable, reliable and secure. Just as DevOps principles allow software engineers to reliably and frequently release code that works, MLOps (Machine Learning Operations) are essential for creating an auditable, testable, end-to-end ML/AI pipeline from data ingestion, through to model training and tuning, and deployment and management. The AWS blog site provides an overview of how AWS supports MLOps, while Google describes how to implement an automated AI pipeline on their Google Cloud platform.

What is MLOps and Why Does It Matter?
Discover the game-changing approach to managing machine learning workflows with MLOps, and learn why it’s essential for scalable, reliable, and reproducible AI.

AI Security Models

The OWASP (Open Worldwide Application Security Project) Foundation is a community producing open-source tools and best-practice for application software security. Recently it has published a guide on AI security and privacy, tying into broader software good practice and then tying into AI-specific attack surface areas and vectors. A good place to start exploring privacy and security considerations relating to AI models.

OWASP AI Security and Privacy Guide
Guidance on designing, creating, testing, and procuring secure and privacy-preserving AI systems

How self-learning models can outperform much larger LLMs

LLMs are often characterised by the size of the model (parameters) and their training data, with the presumption that bigger is better. Large models however come with the obvious disadvantages of computational cost and privacy protection. However a paper by an MIT professor describes how implementing self-training model, where the AI model uses its own prediction (via a process called textual entailment) to teach itself without human intervention. The resulting 350m parameter model outperformed models such as Google’s LaMDA and GPT models.

MIT researchers make language models scalable self-learners
MIT CSAIL researchers used a natural language-based logical inference dataset to create smaller language models that outperformed much larger counterparts.

The secret inside GPT-4

OpenAI has kept the inner workings of GPT-4 well under wraps, maintaining their aura as leaders in generative language models, with much speculation as to how GPT-4 is able to outperform its predecessors. A blog post this week says that there isn’t an underlying algorithmic or model breakthrough or a much larger model. Instead, GPT-4 connects 8 different models together in an unspecified way. It feels to me that OpenAI is now moving towards keeping the fundamental model structures to itself, presumably to maintain the edge in real-world performance it appears to have over its competitors.

GPT-4’s Secret Has Been Revealed
Unraveling OpenAI’s masterful ploy

The thorny issue of copyright and generative AI

The implications of the use of intellectual property being used in training sets of generative AI models, first arose with image-generating AI and has spawned a number of lawsuits, including one by Getty Images against Stable AI for copyright. It is however virtually impossible to tell with certainty whether a work was including in the training of a Large Language Model. For this reason, the EU are proposing to require companies to disclose the data used when training their models, so as to protect the creators of intellectual property, thereby creating licensing models for AI. We are beginning to see the beginnings of these commercial relationships, with Microsoft, Google, OpenAI and Adobe negotiating with media and information organisations such as News Corp, Axel Springer, New York Times and the Guardian.

Reported EU legislation to disclose AI training data could trigger copyright lawsuits
The EU is reportedly considering a provision in its upcoming AI Act that would force companies to disclose the sources of their training data. This could lead to lawsuits by owners of copyrighted data used for training.

GPT-4 is pretty good at Maths too (or Math, if you are American)

In a recent paper called “Let’s Verify Step-by-Step”, researchers at OpenAI describe how GPT-4 has improved the underlying model to solve maths problems. They applied a supervision model during training called “process supervision” in which feedback is provided on each step of the reasoning, and not simply on the outcome. The paper (see here) shows that this optimised model, based on GPT-4 can successfully solve 78% of problems in a test problem set, though does not provide information for vanilla GPT-4. Nevertheless, having experimented a bit myself, it is clear that GPT-4 currently is a lot more reliable at maths problems than GPT-3.5.

OpenAI improves GPT-4’s mathematical reasoning with a new form of supervision
OpenAI shows an AI model that reaches the state of the art in solving some mathematical problems. Is the method used the future of GPT-4?

70% of developers aim to use AI

A survey by the developer chat site Stack Overflow released a couple of days ago showed that 44% of developers already use AI tools when coding, and a further 26% plan to do so soon. GitHub Copilot, a code completion tool is by far the most popular AI-powered developer tool, while ChatGPT is unsurprisingly the most used AI-enabled search tool. This clearly has implications for AI usage policies for companies, which are largely reticent to use such tools do the risk of IPR leakage.

70% of Devs Using or Will Use AI, Says Stack Overflow Survey
ChatGPT is most popular among AI search tools. The global study of more than 90,000 developers also noted declines in cloud provider usage.

Google introduces a framework for Secure AI.

Unimaginatively titled Secure AI Framework (SAIF), this initiative by Google brings good infosec practice into the artificial intelligence space. Its six pillars include building on established cloud and infrastructure security principles, implementing perimeter monitoring of inputs and outputs of AI models to detect anomalies, building the ability to scale to deal with automated attacks and creating fast feedback loops for vulnerability detection and mitigation.

Introducing Google’s Secure AI Framework
Today Google released released the Secure AI Framework to help collaboratively secure AI technology.

GPT-4’s understanding of the world’s geography

There have been several studies exploring large language models’ ability to understand different categories of information, including software, exam curricula and literature. In a paper published on arXiv, scientists at a number of universities discover the remarkable geographic understanding within GPT-4, which is able to return basic geographic data such as socio-economic indicators, and physical geography such as topography. More impressively, it can carry out route planning and figure out routes, such as key transport routes (maritime, rail and air). Whilst subject to hallucinations and missing data, it is really quite impressive.

Click to access 2306.00020.pdf

Bloomberg’s purpose-built finance large language model

We have already seen many papers and articles on how generalist LLMs such as GPT-3 can be applied to solve problems across a broad range of domains. A couple of months ago, Bloomberg announced BloombergGPT^TM , an LLM based on BLOOM with a 363 billion token dataset created from Bloomberg’s private archive of financial data, and augmented with a 345 billion token public data training set. In a paper published on arXiv, Bloomberg claims that this model outperforms larger, more-general purpose LLMs for tasks such as financial sentiment analysis, named entity recognition, and conversational reasoning of financial data. This is a remarkable case study for anyone considering creating a domain-specific LLM.

Introducing BloombergGPT, Bloomberg’s 50-billion parameter large language model, purpose-built from scratch for finance

Will incumbents’ moats see off the waves of AI start-ups?

A few weeks ago, I explored the much-publicised clarion call of a Google researcher who claimed that Open Source will eat Google’s AI breakfast. Alberto Romero, a researcher at Cambrian AI takes a dissenting view, arguing that open-source models are tuned against closed-source models, through a process called self-instructing and consequently without incurring the prohibitive cost of training an LLM from scratch, open-source models will struggle to compete. Secondly, incumbents have access to millions of customers, which gives them an unparalleled route to market. We have already seen how Microsoft is integrating GPT into GitHub and its Office suite of productivity software, while Adobe’s FireFly has been wowing Photoshop users. Watch this space.

Open Source AI Is Not Winning-Incumbents Are
Progress in open-source (OS) generative AI (particularly language models, LMs) has exploded in recent months. As a consequence-and with the help of desperate internal confessions -people believe it has become a threat to incumbent companies like Google and Microsoft, and leading labs like OpenAI and Anthropic.

AI-assisted Writing and the Meaning of Work

Ethan Mollick, a Wharton professor, is one of the most insightful observers of the implications of AI. As Microsoft prepares to embed GPT-4 within its Office suite, it is surely only a matter of a few weeks before AI-assisted ‘Word’smithing becomes commonplace. What meaning and value do we assign to work that has only taken a few prompts to a generative AI system to create? The efficiency gains will be amazing, but expect it to be a bumpy ride.

Setting time on fire and the temptation of The Button
I saw a bit more of the future of AI at work this week, and it shows every sign of vastly boosting productivity, while also causing a crisis of meaning in many organizations. For such a dramatic statement, the actual bit of AI technology I got to experience this week is incredibly minor.

Meta’s Multilingual Model supports over 1000 different languages.

We have already seen below the strides Meta is making, including on multi-modal foundation models. The rate of innovation shows no signs of abating, having recently open-sourced a multilingual model that was trained on 1,162 languages. To overcome the dearth of labelled datasets for many of the world’s languages, the researches used textual and audio data from religious texts, including the Bible, before unsupervised learning was applied to a further 4000 languages. The model was made available through extensions to Facebook AI’s popular PyTorch library.

Meta AI Launches Massively Multilingual Speech (MMS) Project: Introducing Speech-To-Text, Text-To-Speech, And More For 1,000+ Languages
Significant advancements in speech technology have been made over the past decade, allowing it to be incorporated into various consumer items. It takes a lot of labeled data, in this case, many thousands of hours of audio with transcriptions, to train a good machine learning model for such jobs.

The genesis of ChatGPT

The MIT Technology Review has a great story on how ChatGPT was released. What really stands out is how surprised the team at OpenAI were by how it became a viral sensation, and how the importance of accuracy has increased now that it is acting effectively as a search engine.

The inside story of how ChatGPT was built from the people who made it
Skip to Content Exclusive conversations that take us behind the scenes of a cultural phenomenon. When OpenAI launched ChatGPT, with zero fanfare, in late November 2022, the San Francisco-based artificial-intelligence company had few expectations. Certainly, nobody inside OpenAI was prepared for a viral mega-hit.

Will AI veer towards open or closed-sourced models?

Although Meta (i.e. Facebook) has not quite been hitting the AI headlines, its team, led by YannLeCun, is very active. One of its most significant contributions has been the release of LLaMA, a large language model trained on 1.4 trillion parameters. By sharing the code, the Meta team is hoping to drive faster innovation, particularly in adapting it to different use cases. Likewise, Stability AI has open-sourced its text-to-image model in the hope of benefitting from innovation amongst developers. Although a researcher at Google claimed that fighting Open Source is a “losing battle”, the size of these models means that they are trained by large tech companies, and it is unclear for how much longer this openness will persist.

The open-source AI boom is built on Big Tech’s handouts. How long will it last?
Skip to Content Greater access to the code behind generative models is fueling innovation. But if top companies get spooked, they could close up shop. Last week a leaked memo reported to have been written by Luke Sernau, a senior engineer at Google, said out loud what many in Silicon Valley must have been whispering for weeks: an open-source free-for-all is threatening Big Tech’s grip on AI.

Updated 23 May

Meta’s Multimodal AI Models

Multimodal AI models link different content types (e.g. text, audio, video) into a single index or ’embedding space’ and are increasingly a subject of much research. For example, AI image generators such as Midjourney and DALL-E link text decoders with image inference models. (See here for a good overview). Meta has announced an open-sourced AI model called ImageBind that brings together text, image, video, audio and sensor data (including depth, thermal and inertial measurements). For example, Meta claim that the model could create an image of a rainy scene from the sound of rain, or conversely add appropriate audio to a video sequence.

ImageBind: Holistic AI learning across six modalities
When humans absorb information from the world, we innately use multiple senses, such as seeing a busy street and hearing the sounds of car engines. Today, we’re introducing an approach that brings machines one step closer to humans’ ability to learn simultaneously, holistically, and directly from many different forms of information – without the need for explicit supervision (the process of organizing and labeling raw data).

Updated 22 May

Large Language Model Landscape

With all the attention being heaped upon OpenAI’s ChatGPT, you’d be forgiven for thinking that GPT-3.5/4 was the only large language model in town. However, as a blog site maintained by Alan Hardman makes clear, this is an increasingly crowded space. The author helpfully also provides a list of LLMs, datasets , benchmarking and labs. An essential reference.

Inside language models (from GPT-4 to PaLM)
👋 Hi, I’m Alan. I advise government and enterprise on post-2020 AI like OpenAI GPT-n and Google DeepMind Gemini. You definitely want to keep up with the AI revolution this year. My paid subscribers (NASA, Microsoft, Google…) receive bleeding-edge and exclusive insights on AI as it happens. Get The Memo.

Updated 18 May

Annual Stanford AI Index Report

Not exactly a quick read, and I certainly have not yet been through it all yet, but if nothing else the key takeouts provide a quick snapshot of the state of AI. Points of interest are that industry has firmly taken over from academia in creating new AI models (not coincidentally the exponential increase in training compute continues), Chinese universities lead the world in AI publications and AI models performance continues to improve, with some categories out-performing the human baseline (such as language inference). A good long read.

AI Index Report 2023 – Artificial Intelligence Index
The AI Index is an independent initiative at the Stanford Institute for Human-Centered Artificial Intelligence (HAI), led by the AI Index Steering Committee, an interdisciplinary group of experts from across academia and industry. The annual report tracks, collates, distills, and visualizes data relating to artificial intelligence, enabling decision-makers to take meaningful action to advance AI responsibly and ethically with humans in mind.

Microsoft researchers make claims on Artificial General Intelligence

It is said that one way to destroy your credibility in the AI space is to claim that you have built a system capable of Artificial General Intelligence. This is generally taken as being an AI system that can be applied to a number of unrelated fields, achieving a level of capability similar or exceeding human capabilities. Cade Metz, who recently broke the news of Geoffrey Hinton’s AI concerns reports that researchers at Microsoft recently published a paper stating that GPT-4 is demonstrating ‘sparks’ of AGI that surprised the researchers. They describe a number of tasks that it was able to carry out including (rather incredibly) write a rhyming proof that there are an infinite number of prime numbers.

Microsoft Says New A.I. Shows Signs of Human Reasoning
A provocative paper from researchers at Microsoft claims A.I. technology shows the ability to understand the way people do. Critics say those scientists are kidding themselves. Send any friend a story As a subscriber, you have 10 gift articles to give each month. Anyone can read what you share.

Updated 15 May

OpenAI Improves ChatGPT Privacy

Addressing concerns in many jurisdictions about the privacy impact of ChatGPT (such as Italy which temporarily banned its use), OpenAI has introduced features to allow individuals and organisations to request that they do not appear in answers via a Personal Data Removal Request Form. This is however only the opening gambit and is unlikely to satisfy regulators, as it has no bearing on whether data (correct or otherwise) can be restricted from training, and the risk of your chat history influencing the answers it gives other users.

How To Delete Your Data From ChatGPT
OpenAI has now introduced a Personal Data Removal Request form that allows people-primarily in Europe, although also in Japan-to ask that information about them be removed from OpenAI’s systems. It is described in an OpenAI blog post about how the company develops its language models.

Updated 13 May

How do Transformers Work?

Hugging Face, a company that provides AI models and datasets to developers offers a free online course on natural language processing (NLP) which contains a nice overview of the workings of the transformer, the AI architecture that underpins most modern large language models. For those of a more technical bent, you can read the original paper where Google scientists first described the transformer model and its attention mechanism. [Updated 22 -May] For further, more accessible, descriptions of the transformer model, see Transformers from Scratch by Peter Bloem, who also offers code in github and a few video lectures.

Updated 12 May

How do Transformers work? – Hugging Face NLP Course
In this section, we will take a high-level look at the architecture of Transformer models. Here are some reference points in the (short) history of Transformer models: The Transformer architecture was introduced in June 2017. The focus of the original research was on translation tasks.

Size does not always matter

An article by the IEEE argues that there isn’t an inexorable correlation between size and model performance and that smaller models trained on larger datasets can outperform larger models. The cost of training between these two scenarios is unclear, but this may be the start away from a ‘number of parameters’ arms race, or “my model is bigger than yours” debate. Sam Altman, CEO of OpenAI made similar point at an MIT conference last month.

When AI’s Large Language Models Shrink
Building ever larger language models has led to groundbreaking jumps in performance. But it’s also pushing state-of-the-art AI beyond the reach of all but the most well-resourced AI labs. That makes efforts to shrink models down to more manageable sizes more important than ever, say researchers.

Updated 11 May

A look at ChatGPT’s Code Interpreter – A program that creates programs

The dystopian scenario that keeps AI pessimists alive is a future where AI systems are able to generate new AI systems, getting into a runaway loop of ever-improving capability which humans are powerless to stop. I am quite sceptical of this scenario, but GPT-4’s code interpreter, a sandboxed environment where GPT-4 can create, run and improve Python code is quite amazing, a preview of a world to come, and one that is particularly well-suited for complex data analysis.

It is starting to get strange.
OpenAI may be very good at many things, but it is terrible at naming stuff. I would have hoped that the most powerful AI on the planet would have had a cool name (Bing suggested EVE or Zenon), but instead it is called GPT-4 . We need to talk about GPT-4.

Stanford University study on the impact of AI assistants on productivity

It has often been claimed that Artificial Intelligence can do for white-collar work what automation has already done for manufacturing. This is a viewpoint I subscribe to, and Stanford University’s Human-Centered Artificial Intelligence (HAI) centre has shown that call centre workers at a Fortune 500 software companies did indeed see an average of 13.8% productivity increase. Of particular note, the AI assistant was able to accelerate the up-skilling of workers, reaching productivity levels in two months that would previously have taken six months.

Updated 9 May

Will Generative AI Make You More Productive at Work? Yes, But Only If You’re Not Already Great at Your Job.
Scholars examining the impact of an AI assistant at a call center find gains for less experienced workers.

Google and OpenAI struggling to keep up with open-source AI

A Google researcher claims that the current generation of Large Language Models do not have any intrinsic insurmountable defenses, and that the threat/opportunity (depending on your vantage point) of open-sourced AI models was being overlooked. Following the leaking to the public of Meta’s open-sourced LLaMA model, a flurry of innovation has resulted in models being trained for as little as $100 worth of cloud compute.

Google and OpenAI struggling to keep up with open-source AI, senior engineer warns – SiliconANGLE
Google LLC and ChatGPT developer OpenAI LP face increasing competition from open-source developers in the field of generative artificial intelligence, which may threaten to overcome them, a senior Google engineer warned in a leaked document.

WIRED, The Hacking of ChatGPT is Just Getting Started

An insight into what it means for a large language model to be compromised, the techniques being used to bypass ChatGPT’s safeguards, and how the field of Generative AI vulnerability and security research is still in its infancy.

The Hacking of ChatGPT Is Just Getting Started
It took Alex Polyakov just a couple of hours to break GPT-4. When OpenAI released the latest version of its text-generating chatbot in March, Polyakov sat down in front of his keyboard and started entering prompts designed to bypass OpenAI’s safety systems. Soon, the CEO of security firm Adversa AI had GPT-4 spouting homophobic statements, creating phishing emails, and supporting violence.

Cade Metz, Genius Makers

For a fast-moving history of how a collection of doctoral students, researchers and academics persevered for years in relative obscurity before being suddenly cast into the spotlight as they became the most sought-after talent in tech. Charting the genesis of companies that are now household names such as DeepMind and OpenAI, Genius Makers tells the story of the multi-million tussle between Silicon Valley giants as they raced to assemble the best AI teams to build systems that can lay claim to human or super-human intelligence. Fascinating reading, especially as this book was written before generative systems exploded into the public consciousness.

Genius Makers
‘This colourful page-turner puts artificial intelligence into a human perspective . . . Metz explains this transformative technology and makes the quest thrilling.’ Walter Isaacson, author of Steve Jobs ____________________________________________________ This is the inside story of a small group of mavericks, eccentrics and geniuses who turned Artificial Intelligence from a fringe enthusiasm into a transformative technology.

The Economist – Special AI Edition

The Economist starts its special edition on artificial intelligence with an essay that takes the long view on the impact AI may have on humans’ sense of self and exceptionalism. Comparing it to the invention of printing, the dawn of the world wide web, and psychoanalysis, the essay posits that advances in AI will also lead to a reassessment of how humans understand the world. Are LLMs simply sequencing words, or is something more fundamental emerging? Less controversial but equally enlightening articles on how ChatGPT works (self-attention models), and whether they provide societal-level risks.

How to worry wisely about AI | Apr 22nd 2023 | The Economist
How to worry wisely about AI – Weekly edition of The Economist for Apr 22nd 2023. You’ve seen the news, now discover the story.

A paid-for

1 thought on “Artificial Intelligence Good Reads – Dec 23”

Donald Fager

October 11, 2023 at 1:14 pm | Reply

Thank you for sharing this curated list of AI articles, Simon! It’s clear that AI is rapidly evolving and influencing various aspects of our lives. The diversity of topics in your list reflects the breadth of AI’s impact, and it’s great to see that you’re continuously updating it.