ChatGPT and Large Language Models

In this episode, the focus is on ChatGPT-4 and large language models – how they have revolutionized our interaction with machines, opening a world of possibilities and raising intriguing questions about our technological future.

About The Guest

Daniel Whitenack is a data scientist at SIL International and also co-hosts an AI podcast called Practical AI. At SIL International, they specialize in language-based work, including literacy, education, and translation, along with mapping language populations worldwide. Daniel is also building Prediction Guard, a tool designed for integrating AI into applications.

What Is ChatGPT-4?

On the front end, ChatGPT-4 is a user-friendly chat interface that answers queries in natural-language text. On the back end, it is a causal language model: it predicts what comes next in a sequence of words, tokens, or characters based on what came before. The system takes a user’s text and attempts to predict what would naturally follow, essentially completing the sequence.
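
As a rough illustration of "predicting what comes next", the toy model below counts which word follows which in a tiny made-up corpus and then completes a prompt one word at a time. The corpus, the function names, and the word-counting approach are all assumptions for illustration; ChatGPT itself uses a large neural network, not a lookup table of counts.

```python
from collections import Counter, defaultdict

def train_bigram(text: str) -> dict:
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def complete(follows: dict, prompt: str, max_new_words: int = 3) -> str:
    """Extend the prompt word by word, always taking the most frequent follower."""
    words = prompt.split()
    for _ in range(max_new_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # nothing ever followed this word in the training text
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# Made-up training text (hypothetical data, only for this sketch).
corpus = "the cat sat on the mat . the cat sat on the sofa . the cat slept on the mat ."
model = train_bigram(corpus)
completion = complete(model, "the cat", max_new_words=3)
print(completion)
```

The same loop shape (look at what came before, pick a likely continuation, append it, repeat) is what "completing the sequence" means, only with far richer statistics behind each choice.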

How Do Predictive Language Models Work?

At their core, language models are powered by neural networks. These neural networks can be viewed as a pipeline of data transformations: you input a set of data, which undergoes a series of transformations through a myriad of functions, and you receive a different set of data at the other end. Often, this data is a sequence of numbers that can be decoded into text.
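
The "pipeline of data transformations" can be sketched in a few lines. The example below is a hypothetical two-layer network with hand-picked weights and a three-word vocabulary; none of these numbers come from a real model. It only shows numbers going in, passing through parameterized functions, and coming out as numbers that are decoded into a word.

```python
import math

def linear(x, weights, bias):
    """One parameterized transformation: a weighted sum per output unit plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    """Non-linearity applied between the linear steps."""
    return [max(0.0, v) for v in x]

def softmax(x):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and parameters, chosen by hand for illustration.
VOCAB = ["cat", "dog", "mat"]
W1 = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]   # 2 inputs -> 3 hidden units
B1 = [0.0, 0.1, 0.0]
W2 = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.5], [0.5, -0.5, 1.0]]  # 3 hidden -> 3 scores
B2 = [0.0, 0.0, 0.0]

def forward(x):
    """Input vector -> hidden transformation -> output scores -> probabilities."""
    hidden = relu(linear(x, W1, B1))
    return softmax(linear(hidden, W2, B2))

probs = forward([1.0, 2.0])
predicted = VOCAB[probs.index(max(probs))]  # decode the numbers back into text
print(predicted, [round(p, 3) for p in probs])
```

Real language models follow the same pattern with billions of parameters instead of the handful written out here.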

Training Predictive Language Models

A vital aspect of predictive language models is the parameters that help shape the data transformations. These parameters can number in the billions, but the challenge is in finding the right parameters that create the desired transformation. A model has to learn to find the right parameters and gets better at it with repeated training and human feedback.

Typically, a model’s process of finding the right parameters is akin to trial and error. You give the model a large array of text examples and set it a task of predicting the next word in the sequence. By doing this repeatedly with varied sequences, the model learns to adjust its parameters, just like a child learns to speak by mimicking and repeating words. This simple yet labor-intensive learning happens on a massive scale, with large models training on texts from the entire internet or vast collections of books. The scale is the secret ingredient that gives these models their immense predictive power.
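
To make the trial-and-error loop concrete, here is a minimal sketch that trains a table of scores to predict the next word in a made-up sentence. The corpus, learning rate, and the choice of one row of scores per previous word are assumptions for illustration; real models adjust billions of neural-network parameters, but the loop (predict, compare with the actual next word, nudge the parameters) has the same shape.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def average_loss(logits, pairs):
    """Mean cross-entropy: how surprised the model is by the true next word."""
    return sum(-math.log(softmax(logits[p])[n]) for p, n in pairs) / len(pairs)

def train(pairs, vocab_size, epochs=200, lr=0.5):
    """Repeatedly predict the next word and nudge the parameters toward the truth."""
    logits = [[0.0] * vocab_size for _ in range(vocab_size)]  # the parameters
    for _ in range(epochs):
        for prev, nxt in pairs:
            probs = softmax(logits[prev])
            for j in range(vocab_size):
                # Gradient of cross-entropy w.r.t. each score: prob - one_hot(target)
                grad = probs[j] - (1.0 if j == nxt else 0.0)
                logits[prev][j] -= lr * grad
    return logits

# Hypothetical training text.
words = "the cat sat on the mat and the cat sat on the rug".split()
vocab = sorted(set(words))
index = {w: i for i, w in enumerate(vocab)}
pairs = [(index[a], index[b]) for a, b in zip(words, words[1:])]

untrained = [[0.0] * len(vocab) for _ in vocab]
loss_before = average_loss(untrained, pairs)
trained = train(pairs, len(vocab))
loss_after = average_loss(trained, pairs)

def predict_next(word):
    row = trained[index[word]]
    return vocab[row.index(max(row))]

print(round(loss_before, 3), round(loss_after, 3), predict_next("cat"))
```

The loss falls as training repeats, which is the numeric version of the model "getting better at it".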

Why Human Input Is Needed in The AI Training Process

Despite the considerable strides made in AI, these models remain, at their core, advanced pattern matchers. They derive probable combinations of text based on the patterns they have observed and learned. However, without human input, this autonomous learning can lead to perplexing, and sometimes disturbing, results.

Take the instance of the AI model Galactica. It was trained extensively on scientific literature, and so, when asked an absurd question like the number of giraffes that have visited the moon, it provided a detailed, scientific response complete with citations. Clearly, this is not the response a human would ideally want. By integrating human judgment and intuition, these models are refined and fine-tuned to generate more accurate and contextually appropriate predictions.

What Does Temperature Mean In AI?

The balance between predictability and creativity in AI models presents a paradox. The ‘sameness’ induced by pattern-matching contrasts starkly with the creativity these models exhibit. This is made possible by an aspect called ‘temperature’.

In AI, temperature is a hyperparameter that controls how much variation the model shows when choosing the next word. At a low temperature the model almost always picks the most probable word; at a higher temperature it more often picks, say, the third or fifth most probable one. It essentially introduces an element of creativity in the otherwise structured world of AI, mimicking the human ability to express the same idea in various ways.

Having this variability opens up possibilities for AI to act as a creative muse, prompting new ways to approach challenges and tasks. However, high temperature can lead to unpredictable and inconsistent results, highlighting the crucial role of human involvement in configuring the parameters according to specific needs.
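
One common way to implement temperature is to divide the model's raw scores by the temperature before turning them into probabilities. The sketch below assumes that approach; the candidate words and their scores are invented for illustration.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Scale scores by 1/temperature, then normalize into probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next(words, logits, temperature, rng):
    """Draw the next word at random, weighted by the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(words, weights=probs, k=1)[0]

# Hypothetical candidates for the next word and their raw scores.
words = ["mat", "sofa", "moon"]
logits = [2.0, 1.0, 0.0]

low = softmax_with_temperature(logits, 0.1)   # nearly always the top word
high = softmax_with_temperature(logits, 5.0)  # choices spread much more evenly

rng = random.Random(0)
picks = [sample_next(words, logits, 5.0, rng) for _ in range(10)]
print([round(p, 3) for p in low], [round(p, 3) for p in high], picks)
```

At a low temperature nearly all of the probability sits on the top-scoring word, so the output is repeatable; at a high temperature the lower-ranked words are picked often, which is where both the variety and the inconsistency come from.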

How AI Hallucinations Occur

AI hallucinations occur when an AI model produces coherent yet factually incorrect information. This phenomenon can occur even with low ‘temperature’ settings, as it is fundamentally tied to the data the model has been trained on. If the model is asked about something that is poorly represented, represented inconsistently, or not represented at all in its training data, it might fabricate an answer based on the patterns it has learned. To mitigate this, it is advisable to practice AI grounding.

Countering AI Hallucinations with AI Grounding

AI grounding refers to the practice of infusing external, factual knowledge into the prompts. This can be done by searching a reliable knowledge base or a set of documentation and inserting the relevant passages into the prompt. Because the model is asked to answer based on this grounded context, it is effectively anchored to factual reality, reducing the risk of hallucinations.
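
A minimal sketch of grounding, assuming a tiny in-memory knowledge base and a naive word-overlap search (both hypothetical; a production system would more likely use semantic search over real documentation). It finds the most relevant passage and places it in the prompt ahead of the question.

```python
import re

def tokenize(text: str) -> set:
    """Lower-case the text and split it into a set of words, dropping punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, documents: list, top_k: int = 1) -> list:
    """Rank documents by how many words they share with the question."""
    q_words = tokenize(question)
    ranked = sorted(documents, key=lambda doc: len(q_words & tokenize(doc)), reverse=True)
    return ranked[:top_k]

def ground_prompt(question: str, documents: list, top_k: int = 1) -> str:
    """Insert the retrieved passages into the prompt as context."""
    context = "\n".join(retrieve(question, documents, top_k))
    return (
        "Answer this question based on this context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical knowledge base for this sketch.
KNOWLEDGE_BASE = [
    "Giraffes are the tallest living land animals.",
    "Twelve astronauts walked on the Moon during the Apollo program.",
]

question = "How many people have walked on the Moon?"
prompt = ground_prompt(question, KNOWLEDGE_BASE)
print(prompt)
```

The resulting string is what would be sent to the model, so its answer can draw on the supplied passage rather than only on whatever it absorbed in training.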

A Surprising Creativity in AI

The initial expectation for AI was that it would excel in logic but lack creativity. However, the reality has turned out to be the complete opposite. AI models today demonstrate a surprising degree of creativity, capable of generating images from text or even creating a unique rap song. Yet, they often fall short when it comes to logical consistency, requiring human intervention to ensure factual consistency and grounding in reality.

Prompt Engineering In AI

Prompt engineering, as it applies to artificial intelligence (AI), refers to the structuring of prompts or requests to elicit the most accurate, relevant and appropriate responses from AI models. As with a Google search, you input a query and expect an answer, but there’s more to it when dealing with AI.

In AI, a good prompt is grounded in reality and context. This grounding differentiates it from a simple Google search as it involves injecting external knowledge and setting parameters around which the AI should generate a response.

Another dimension of prompt engineering is controlling the AI’s output. You can design the prompt to instruct the AI on what to do when it doesn’t find an answer in the given context. For example, the prompt could instruct the AI to respond with an apology if it cannot find an answer.
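
The fallback instruction can be built into a prompt template. The sketch below is one hypothetical way to do it: the template wording follows the episode's example, while the function names and the `is_fallback` check are assumptions added so an application can detect when the model declined to answer.

```python
FALLBACK = "Sorry, I wasn't able to get the answer."

def build_prompt(question: str, context: str, fallback: str = FALLBACK) -> str:
    """Wrap the question in a template that limits the model to the given context."""
    return (
        "Answer the following question based on the following context.\n"
        "If the answer is not explicitly stated in the context, "
        f'respond with "{fallback}"\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

def is_fallback(response: str, fallback: str = FALLBACK) -> bool:
    """Check whether a model response is the agreed 'no answer' message."""
    return response.strip().strip('"').lower() == fallback.lower()

prompt = build_prompt(
    question="What does SIL International work on?",
    context="SIL International does language-related work such as literacy and translation.",
)
print(prompt)
```

Checking for the fallback string lets the surrounding application react, for example by searching another source, instead of passing a guess on to the user.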

AI As an Evolutionary Tool in Coding

In the coding domain, AI has the potential to serve as a digital pair programming assistant. This is already evident in the usage of tools like GitHub’s Copilot, which augments code writing. These tools are good at generating predictable code, automating repetitive tasks like pulling data from SQL databases.

However, the issue of ‘hallucination’ remains a significant challenge. Even though AI can generate near-perfect code, human review and editing are often required to ensure accuracy and usability. This makes AI more of a code-generation aid that saves considerable time and effort than a fully autonomous code-writing entity.

A Future with AI Technology: A Tool for Evolution or Elimination?

The critical question is whether we should fear AI or embrace it. The concerns typically revolve around job loss due to automation or a dystopian future where AI gains sentience and seizes control. However, a more balanced view acknowledges that while AI systems have their risks, they are tools that transform the way we work rather than eradicate employment.

AI systems, much like the evolution from typewriters to computer word processors, introduce changes to job roles rather than eliminating them entirely. While certain roles may become obsolete, new ones emerge that require an evolved set of skills. For instance, data entry jobs are more abundant now than during the era of typewriters, albeit in a different format.

Concerns with AI should focus more on the over-reliance on AI systems without adequate understanding of their limitations and without robust engineering to manage edge cases and potential system failures.

This is perhaps the quote from the episode that I have spent the most time thinking about: “We always thought AI would be logical and lack creativity – but it is almost the exact opposite.” It reframes being wrong as being creative, and which of the two it is, I think you could argue, really depends on the context.


In Conversation

Meet Daniel Whitenack

Daniel: Welcome back to the podcast. For those who didn’t hear the last episode, who are you and what do you do?

Daniel Whitenack: I’m a data scientist with SIL International, an international NGO that does language-related work around the world — literacy, education, translation, and even some mapping of language populations. I’m also building a product called Prediction Guard, a tool for those integrating AI into their applications, and I’m the co-host of a podcast focused on AI called Practical AI.

What ChatGPT Is

Daniel: ChatGPT-4 — what is it?

Daniel Whitenack: Most people have seen the basic chat interface where you ask a question. On the back end, these systems take that natural language input and try to predict a completion of it. These are mostly what are called causal-based language models — it’s a fancy term for saying it’s predicting what comes next in the sequence of words or tokens or characters, based on what came prior in that sequence.

How Predictive Language Models Work

Daniel: How is it making these predictions?

Daniel Whitenack: Under the hood is a neural network — you can think of a neural network like a series of data transformations. There’s no fairy dust or magic. You put in one set of data, usually a vector of numbers, it’s transformed through a series of parameterized functions, and out the other end comes another sequence of numbers, decoded into text. When people hear about models with 60 billion or 200 billion parameters, those are the parameters of those transformations. The way you find those parameters is essentially trial and error: you give the model a sequence of words, ask it to predict the next word, and do that repeatedly. It’s simple but labor-intensive, and it happens on a massive scale — training on texts from the entire internet. The scale is the secret ingredient that gives these models their predictive power.

Human Feedback in Training

Daniel: It sounds like it’s not enough just to have a lot of data — you need guidance from humans.

Daniel Whitenack: At the end of the day, these models are pattern matchers — extremely complicated and advanced pattern matchers, but the output probabilities are such that the system is making probable combinations of text it’s seen before. Without human input that can produce disturbing results — a model called Galactica, trained on scientific literature, would answer “how many giraffes have visited the moon” with a detailed scientific response complete with citations. ChatGPT integrated something called reinforcement learning from human feedback: you gather a data set of human preference — how a human would rate responses — and use that preference data to train a second model paired with the language model, which fine-tunes it to human preferences. That’s what stunned a lot of people — prior models gave good text, but it was weird; ChatGPT immediately felt like how you wanted it to answer.
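
The preference-data step can be illustrated with a toy. The sketch below assumes a Bradley-Terry style model, a common choice for learning from pairwise preferences, and assigns one score per canned response instead of training a neural reward model. The response labels and preference pairs are invented for illustration and are not how ChatGPT's training data looks.

```python
import math

def train_reward_scores(preferences, epochs=500, lr=0.1):
    """Learn one score per response from (preferred, rejected) pairs."""
    scores = {}
    for preferred, rejected in preferences:
        scores.setdefault(preferred, 0.0)
        scores.setdefault(rejected, 0.0)
    for _ in range(epochs):
        for preferred, rejected in preferences:
            # Modelled probability that the human picks `preferred` over `rejected`.
            p = 1.0 / (1.0 + math.exp(-(scores[preferred] - scores[rejected])))
            # Gradient ascent on the log-likelihood of the observed preference.
            scores[preferred] += lr * (1.0 - p)
            scores[rejected] -= lr * (1.0 - p)
    return scores

# Hypothetical human ratings: each pair is (response the rater preferred, the other one).
preferences = [
    ("helpful", "weird"),
    ("helpful", "rude"),
    ("weird", "rude"),
]
scores = train_reward_scores(preferences)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In the full technique, scores like these come from a second model that can rate any response, and the language model is then fine-tuned to produce responses that score highly.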

Temperature and Creativity

Daniel: Can you help us understand temperature, and how it leads to creativity?

Daniel Whitenack: When you predict the next word in a sequence, there’s a most probable prediction. If you always choose the most probable word, you’d get good text — but you’d get the same thing every time. So developers built in a concept called temperature: a control parameter for how often the model chooses the strict next probable thing versus, maybe, the third or fifth most probable. Human language has a lot of variability — we rephrase things all the time. If you set temperature too high and expect reliable, consistent results, you may not get them. But where it’s cool is using it as a muse — a way to infuse creativity when you have writer’s block or want a new way to do something.

Hallucinations and Grounding

Daniel: How does that tie in with the idea of hallucinations?

Daniel Whitenack: If you ask about something not well represented in the training data, or where the answers conflict, the model has no problem outputting coherent text that responds to your prompt — it might just be completely factually incorrect. That’s the giraffes-on-the-moon thing: there’s no connection to reality, no intent, no understanding — it’s computing a sequence of tokens. So how developers should think about using these models is by pairing them with some grounding — external knowledge infused into the prompts. I could have a knowledge base, search it, and insert factually correct documentation into my prompt, saying “answer this question based on this context.” Now I’ve grounded the output in reality rather than assuming the model responds well based on its training.

Prompt Engineering

Daniel: How is prompt engineering different from just being very good at Googling things?

Daniel Whitenack: How I think about it is: how can I construct a prompt that reduces the risk of the output as much as possible? Version 0.1 is just asking the question. The next version injects external knowledge — maybe a semantic search of Wikipedia — into a template that says “answer the following question based on the following context.” Then I can engineer further: “if the answer is not explicitly stated in the context, respond with ‘sorry, I wasn’t able to get the answer.'” The second level is that you don’t have to use a single prompt — there’s a concept of chaining, or Chain of Thought, where you do a series of prompts: classify the question, get data from a database, answer it, rephrase it, find a related image, and combine all of that into a rich output.
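
The chaining idea can be sketched as a function that calls a model several times, feeding each result into the next prompt. Everything named here is hypothetical: `run_chain`, the prompt wording, the small `database`, and especially `stub_llm`, which returns canned text so the example runs offline and stands in for a real model call.

```python
from typing import Callable

def run_chain(question: str, llm: Callable[[str], str], database: dict) -> str:
    """Classify the question, fetch matching data, answer from it, then rephrase."""
    # Step 1: ask the model which category the question belongs to.
    categories = ", ".join(sorted(database))
    category = llm(
        f"Classify this question as one of: {categories}.\n"
        f"Question: {question}\nCategory:"
    ).strip().lower()
    # Step 2: ordinary code, not the model, fetches the grounding data.
    context = database.get(category, "")
    # Step 3: answer using only the retrieved context.
    answer = llm(
        "Answer the following question based on the following context.\n"
        f"Context: {context}\nQuestion: {question}\nAnswer:"
    )
    # Step 4: a final prompt polishes the wording.
    return llm(f"Rephrase this answer as one friendly sentence.\nAnswer: {answer}\nRephrased:")

def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns canned text for each step."""
    if prompt.startswith("Classify"):
        return " Geography "
    if prompt.startswith("Answer"):
        return "Wellington" if "Wellington" in prompt else "Sorry, I wasn't able to get the answer."
    answer = prompt.split("Answer: ")[1].split("\n")[0]
    return f"The capital is {answer}."

# Hypothetical lookup table keyed by question category.
database = {
    "geography": "Wellington is the capital of New Zealand.",
    "history": "The Apollo 11 landing took place in 1969.",
}

result = run_chain("What is the capital of New Zealand?", stub_llm, database)
print(result)
```

Keeping each step as its own prompt makes the pipeline easier to inspect: every intermediate output can be logged and checked before it is passed along.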

Daniel: What about using this to write code?

Daniel Whitenack: Even before ChatGPT I was using GitHub’s Copilot to augment my code writing. Code has similarities with language but is a bit more predictable. If I ask ChatGPT to write a function that pulls data out of a PostgreSQL database, it will do that almost perfectly — but there’s still that element of hallucination, so the human needs to look at it and post-edit. Right before this call I asked it to write D3 JavaScript code to visualize a GeoJSON file, and it output code referencing some file that’s not the file I want. It’s easy for me to modify, and that’s how I’ve been using these tools — for very predictable things that have been done a million times. You should always assume there could be hidden nuggets of badness in the output.

AI and the Future of Work

Daniel: Who should be worried about this?

Daniel Whitenack: I very seldom have as my worry the sentience and singularity scenario, nor the worry that these systems will ruin all our jobs. What’s been demonstrated in the past is that these systems change the way we work, and the people that latch onto them will move on to the next things. Think of people who adopted computer word processors over typewriters — some pure typists lost their jobs, but there are still people doing data entry, just differently, and in fact there are more of them now. What I think about in terms of risk is over-reliance on these systems without enough engineering thought about edge cases and how they might fail.

Daniel: You gave a great example before about radiologists. Could you resurface that?

Daniel Whitenack: There was a lot of concern because computer vision systems are actually better at identifying certain diseases in medical imagery than human doctors. But there are always edge cases — rare diseases, special conditions — and doctors are adept at knowing those. What’s been found over time is that the power comes when you combine the human radiologist with the computer vision system, producing a hybrid that gives more thorough diagnosis than either alone. There’s more demand for radiologists now than there’s ever been — specifically for radiologists who are trained on how to use these AI systems.

Daniel: Are you more excited or less excited about the future of this?

Daniel Whitenack: I’m very hopeful and very excited. The tooling and interfaces are getting better, so it’s not just AI practitioners who can integrate these tools — in a low-code way, many more people are exploring integration into their applications. That brings a wider perspective and a wider diversity of eyes onto these systems, and I think the systems will continue to be made better because of that.

About the Author
I'm Daniel O'Donohue, the voice and creator behind The MapScaping Podcast (a podcast for the geospatial community). With a professional background as a geospatial specialist, I've spent years harnessing the power of spatial to unravel the complexities of our world, one layer at a time.