Artificial Intelligence In Simpler Terms
The guest on this episode is Daniel Whitenack. He is a data scientist at SIL International, a teacher of AI, and a co-host of the Practical AI podcast. With his background in computational physics and his day-to-day role as a data scientist, Daniel has developed expert-level modelling and math skills. He is currently working on AI technology that benefits local language communities (e.g. speech recognition for local languages, machine translation, and other natural language processing techniques). On today’s show, he helps to demystify questions around Artificial Intelligence (AI), machine learning, and deep learning.
What Is AI?
Put simply, Artificial Intelligence can be thought of as a kind of function in software. In programming, a function is the logic or algorithm that performs a certain transformation on an input and produces an output. Usually, a function is created by a developer who writes the logic associated with it: the developer specifies exactly what happens to the input, how it is transformed, and how the output is returned. In the case of AI, instead of specifying all of the logic, the developer creates a function whose internal parameters are not set. These parameters are left to be set by the computer itself later on. This is the main distinction between AI functions and conventional functions in software engineering: the computer is trained to select its own parameters to achieve a goal.
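To make the distinction concrete, here is a minimal Python sketch (the function and class names are illustrative, not from the episode). The first function has all of its logic written by hand; the second fixes only the structure, a straight line `weight * x + bias`, and leaves `weight` and `bias` unset until something learns them.

```python
from typing import Optional


# Conventional function: the developer writes every step of the logic.
def celsius_to_fahrenheit(celsius: float) -> float:
    return celsius * 9 / 5 + 32


# AI-style function: the developer fixes the structure (a straight line) but
# leaves the parameters unset, to be filled in later by a training process.
class LinearModel:
    def __init__(self) -> None:
        self.weight: Optional[float] = None  # not chosen by the developer
        self.bias: Optional[float] = None    # not chosen by the developer

    def __call__(self, x: float) -> float:
        if self.weight is None or self.bias is None:
            raise RuntimeError("parameters have not been learned yet")
        return self.weight * x + self.bias
```

If training later settled on `weight = 1.8` and `bias = 32`, `LinearModel` would behave like `celsius_to_fahrenheit` without anyone having written that formula into it.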
What Is Algorithm Training?
When an AI model is created, its functions are un-parameterized. This means that although the structure for handling the desired data transformation is defined, the values of its parameters are not specified. The computer has to learn the parameters that will transform the input into the desired output, so that it can fill them in itself and reuse them on future inputs.
In algorithm training, the computer learns to set the optimal parameters for an AI function through a trial-and-error cycle. The training data provided to the algorithm includes samples of the expected input to the model, as well as samples of the output that the model is expected to produce. Using the sample data, the computer sets initial values for the parameters and compares the model’s outputs against the sample outputs to see how well those parameters perform. The parameters are then refined through an iterative process until the model’s outputs match the sample outputs closely enough.
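The episode doesn’t name a particular training algorithm. One widely used way to run this refinement loop is gradient descent, sketched below in plain Python for the straight-line model `weight * x + bias`. Each pass compares the model’s outputs with the sample outputs (mean squared error), works out which direction to nudge each parameter to shrink that error, and takes a small step. The toy samples, learning rate, and number of passes are assumptions chosen for illustration.

```python
def train_linear(samples, learning_rate=0.05, epochs=2000):
    """Fit y ≈ weight * x + bias to (input, expected_output) pairs by gradient descent."""
    weight, bias = 0.0, 0.0  # arbitrary starting parameters
    n = len(samples)
    for _ in range(epochs):
        # Compare current outputs with the expected outputs and work out the
        # gradient of the mean squared error with respect to each parameter.
        grad_w = 0.0
        grad_b = 0.0
        for x, y in samples:
            error = (weight * x + bias) - y
            grad_w += 2 * error * x / n
            grad_b += 2 * error / n
        # Nudge both parameters a small step in the direction that reduces the error.
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


# Toy (input, expected output) samples, generated here from y = 3x + 2.
samples = [(x, 3 * x + 2) for x in range(5)]
weight, bias = train_linear(samples)
print(weight, bias)  # close to 3 and 2
```

Deep learning frameworks automate the gradient calculation, but the loop has the same shape: predict, measure the error, adjust, repeat.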
What Are Transferable Models?
In the world of AI models, transferability means that an existing model that solves a certain problem can be used to solve a different but very similar problem. Tweaking its parameters a little bit, a process often called fine-tuning, can make the model useful for another use case. For instance, a model trained to recognize dogs in images could be adapted to recognize cats in images just by tweaking its parameters. Transferable models are less computationally expensive to adapt, since training does not start from scratch but instead builds on the knowledge already captured in the parameters learned from the original training data.
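A toy illustration of why reusing parameters saves work, under heavy simplifying assumptions: instead of images, the “original” and “similar” problems below are two nearly identical straight lines, and the model is fit by plain gradient descent. Starting the second fit from the first fit’s parameters, rather than from zero, should reach the same error threshold in fewer update steps. The tasks, learning rate, and tolerance are all illustrative choices.

```python
def fit(samples, weight=0.0, bias=0.0, learning_rate=0.05, tolerance=1e-6, max_steps=100_000):
    """Gradient descent on y ≈ weight * x + bias, starting from the given parameters.

    Returns the fitted (weight, bias) and how many update steps it took to
    bring the mean squared error below `tolerance`.
    """
    n = len(samples)
    for step in range(max_steps):
        errors = [(weight * x + bias) - y for x, y in samples]
        if sum(e * e for e in errors) / n < tolerance:
            return weight, bias, step
        grad_w = sum(2 * e * x for e, (x, _) in zip(errors, samples)) / n
        grad_b = sum(2 * e for e in errors) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias, max_steps


xs = range(5)
task_a = [(x, 3.0 * x + 2.0) for x in xs]  # the original problem
task_b = [(x, 3.1 * x + 2.2) for x in xs]  # a different but very similar problem

w_a, b_a, _ = fit(task_a)                                    # train once on task A
_, _, steps_from_scratch = fit(task_b)                       # task B starting from zeros
_, _, steps_transferred = fit(task_b, weight=w_a, bias=b_a)  # task B starting from A's parameters
print(steps_from_scratch, steps_transferred)
```

With only two parameters the savings are modest; the same idea matters far more when a model has millions of parameters and each training step is expensive.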
What Are Model Architectures?
A model architecture is the configuration of a neural network: different layers bolted together. Different types of layers are better suited than others to processing particular types of data. Natural Language Processing (NLP), for example, often uses recurrent layers to process sequences of text. In NLP, text is treated as a sequence of words and characters. These layers take into account the order of the sequence and the relationships between its parts, combining the meanings of individual words into the meaning of a sentence and capturing its context.
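A minimal sketch of layers bolted together for text, assuming a made-up one-number “embedding” per word (real models learn vectors with many dimensions) and hand-picked weights in place of learned ones. The recurrent layer reads the words one at a time and carries a running state forward, so two sentences with exactly the same words in a different order produce different outputs.

```python
import math

# Hypothetical one-number "embeddings" per word; real NLP models learn vectors
# with hundreds of dimensions for each token.
EMBEDDING = {"the": 0.1, "dog": 0.9, "chased": -0.4, "cat": 0.7}


def embed(words):
    """Embedding layer: look up a number for each word."""
    return [EMBEDDING[word] for word in words]


def recurrent_layer(inputs, w_in=0.8, w_state=0.5, bias=0.0):
    """Single-unit recurrent layer: each step mixes the current input with a
    running state that summarizes everything read so far."""
    state = 0.0
    for x in inputs:
        state = math.tanh(w_in * x + w_state * state + bias)
    return state


def output_layer(state, weight=2.0, bias=-0.5):
    """Dense layer stacked on top of the recurrent layer."""
    return weight * state + bias


def model(words):
    # Layers bolted together: embedding -> recurrent -> dense.
    return output_layer(recurrent_layer(embed(words)))


print(model("the dog chased the cat".split()))
print(model("the cat chased the dog".split()))
```

A model that only summed the word embeddings would score both sentences identically; carrying state across the sequence is what lets the recurrent layer take word order into account.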
When Should AI Be Used?
There are typically two signs that AI would be beneficial for a certain application. The first is that the data transformation that needs to happen is so complicated that a human cannot do it reliably every time. An example of this is detecting mental health issues from a person’s voice. While this is nearly impossible for a human to do consistently because the transformation is so complicated, an AI model trained on suitable data can potentially learn to do it efficiently.
The second is the scale of the task. When a task is very large and repetitive, a human may not be the most efficient option for doing it. For instance, it would be tedious for a human to classify two million aerial images by land-use type. At such large scales, it is best to automate the task with a model.
What Is the Difference Between Machine Learning and Deep Learning?
The main distinction between machine learning and deep learning is the scale at which the models operate. Machine learning models, in the classical sense used here, operate at a smaller scale and do not usually use neural network architectures (e.g. decision trees, random forests, naive Bayes, among many more). Deep learning, on the other hand, operates at a larger scale and requires less human intervention than machine learning, for example less hand-crafting of the input features.
Deep learning usually involves neural network architectures with millions or even billions of parameters. Significantly more data is required to properly fit the parameters in deep learning than in machine learning models, and more computing power is needed as well.
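Counting parameters shows how quickly neural networks grow. The sketch below counts weights and biases in a stack of fully connected layers; the layer sizes are illustrative assumptions (28×28-pixel grayscale images flattened to 784 inputs, 10 output classes), not figures from the episode.

```python
def dense_parameter_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# A single linear layer: a multinomial logistic regression, a classical model.
print(dense_parameter_count([784, 10]))            # 7850
# A small neural network with two hidden layers of 512 units each.
print(dense_parameter_count([784, 512, 512, 10]))  # 669706
```

Widening the two hidden layers to 4,096 units pushes the count past 20 million, and every one of those parameters has to be fit from data.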
Making the Choice Between Machine Learning and Deep Learning
There are some situations where deep learning, though applicable, may not be the best option for your use case. One is when the decisions a model makes need to be clearly interpretable. In certain industries, such as finance or health care, there could be a high burden of proof or government regulation that requires you to be able to explain and audit the decisions you are making, which means clear documentation of every step taken in a process. In cases such as this, a simpler model may be more appropriate, since it is possible to tell how a decision was made rather than plugging everything into a black box.
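As an illustration of what interpretability buys you, here is a sketch of a simple linear scoring model whose decision breaks down exactly into per-feature contributions that can be documented and audited. The feature names, weights, and threshold are hypothetical, standing in for parameters a simple model might learn from historical data; they are not a real scoring policy.

```python
# Hypothetical parameters, standing in for values a simple model has learned.
WEIGHTS = {"income_thousands": 0.05, "debt_ratio": -3.0, "late_payments": -0.75}
BIAS = -1.0
THRESHOLD = 0.0


def score_with_explanation(applicant):
    """Return a decision, its score, and how much each feature contributed."""
    contributions = {name: weight * applicant[name] for name, weight in WEIGHTS.items()}
    score = BIAS + sum(contributions.values())
    decision = "approve" if score >= THRESHOLD else "decline"
    return decision, score, contributions


decision, score, contributions = score_with_explanation(
    {"income_thousands": 60, "debt_ratio": 0.3, "late_payments": 1}
)
print(decision, round(score, 2))
# List the contributions from most to least influential.
for name, value in sorted(contributions.items(), key=lambda item: -abs(item[1])):
    print(f"  {name}: {value:+.2f}")
```

Each line of that output is a reason that can be written into an audit record. A deep network’s decision depends on many interacting parameters and does not decompose this simply.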
The other element is cost. A deep learning project may require a lot of specialized, expensive hardware that you do not already have access to. If a simpler machine learning model that can be trained on a laptop solves the same problem, there is no need to spend thousands of dollars on specialized AI hardware clusters for deep learning.
Do You Need Specialized Hardware to Use AI?
The phases of working with AI models can be broadly split into two categories: training and inference. Training is usually the more time- and resource-intensive phase and the one most likely to require specialized hardware, especially for deep learning. Setting the parameters of a model requires a lot of computation, and additional Graphics Processing Units (GPUs) may be necessary depending on the scale of the data.
Inference is when the model makes predictions after it has already been trained. Specialized hardware is not usually required, except in certain use cases: for real-time processing of data feeds, where results are needed instantly, specialized inference hardware might be required. Otherwise, many AI models can run on CPUs, since inference is far less computationally intensive than training.
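A back-of-the-envelope sketch of why training dominates the compute bill. It relies on two commonly cited rules of thumb rather than anything from the episode: a forward pass through a neural network costs roughly 2 floating-point operations (FLOPs) per parameter, and a training step (forward plus backward pass) costs roughly three times that. The model size, dataset size, and epoch count are illustrative assumptions.

```python
def inference_flops(num_parameters, num_predictions):
    # Rule of thumb: ~2 FLOPs (one multiply, one add) per parameter per forward pass.
    return 2 * num_parameters * num_predictions


def training_flops(num_parameters, num_samples, epochs):
    # Rule of thumb: forward + backward pass ~3x a forward pass (~6 FLOPs per
    # parameter), repeated for every training sample in every epoch.
    return 6 * num_parameters * num_samples * epochs


params = 1_000_000  # an illustrative one-million-parameter model
print(f"training run:   {training_flops(params, 60_000, 10):.1e} FLOPs")
print(f"one prediction: {inference_flops(params, 1):.1e} FLOPs")
```

Under these assumptions a single prediction costs less than a millionth of the training run, which is why a CPU is often enough for inference. The real-time case differs because many predictions must be served within a tight latency budget.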
Demystifying AI
For quite some time, the AI industry has faced two extremes: hype that creates unwarranted expectations on one end, and outright dismissal of AI’s applicability in certain cases on the other. Daniel Whitenack captures this well:
“If you are on the side of things where you are thinking AI is the solution to every problem, then you are overestimating its utility and where it is at the moment. On the other hand, if you are running a business of any size, operating at least some type of technology, and you think AI cannot solve any of your problems, then you might be mistaken as well.”