Basic concepts of Machine Learning and Artificial Intelligence.

Feb 13, 2023

Introduction

Much has been said lately about Artificial Intelligence, predictive models and natural language processing. The topic has become more and more prominent with the rise of applications such as ChatGPT, autonomous cars, and customer service bots.

The subject is quite complex, but do you know at least the basics of how it all works? No… it is not alien technology! Come with me in this article to understand the fundamental principles of AI.

What is the definition of artificial intelligence? The answer to this question is somewhat abstract, and varies according to the view of each scientist. In general terms, artificial intelligence is a form of computationally created intelligence, based on large amounts of data and algorithms that use statistics, computation, and other mathematical tools to generate some conclusion, prediction or action.

Some kinds of AI are:

Computer vision
Natural language processing
Sentiment Analysis
Text-to-speech and speech-to-text conversion
Data predictions, e.g. temperature, credit score and delivery time
Data classifications, e.g. returning the species of a plant based on its characteristics or defining consumer classes in e-commerce

Of course, we must combine different AI techniques to be able to build a robust solution. To develop a telemarketing or answering bot, for example, we need to combine natural language processing, sentiment analysis, speech-to-text conversion, and a few other pieces to give a really satisfying user experience.

The good news is that although all of this is quite complex, there are platforms and tools that abstract the hard part and provide interfaces so that we can build solutions faster. Imagine if we always had to build these solutions from scratch: we would be reinventing the wheel all the time.

This is what major cloud platforms like AWS, IBM Watson, Azure and Google Cloud have been investing in over the last few years. With these tools we can simply send a text via API and receive a blob with the synthesized audio, without worrying about how it was done.

Okay, now that you understand the basics of the subject, let’s dig a little deeper into machine learning models, which are what I currently work with.

Some kinds of Machine Learning Models:

Binary classification
Multi-class classification
Regression

Binary classification

Binary classification models are used to classify data into two possible classes, hence the name “binary”. Several algorithms can be used for this, such as logistic regression and decision trees.

Examples of binary classification problems:

“Is this product a shoe?” (Yes/No)
“Will it rain tomorrow?” (Yes/No)
“Is this vehicle a car or a motorcycle?” (Car/Motorcycle)
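To make the "Will it rain tomorrow?" question concrete, here is a minimal sketch of logistic regression written from scratch. The humidity numbers, the function names and the learning-rate settings are all made up for illustration; a real project would normally use a library instead of hand-written gradient descent.

```python
import math
from statistics import mean, pstdev


def sigmoid(z: float) -> float:
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.5, epochs=1000):
    """Fit a one-feature logistic regression with batch gradient descent.

    The feature is standardized first so gradient descent behaves well.
    Returns a function that maps a raw feature value to class 0 or 1.
    """
    mu, sigma = mean(xs), pstdev(xs)
    zs = [(x - mu) / sigma for x in xs]
    w, b = 0.0, 0.0
    n = len(zs)
    for _ in range(epochs):
        # Error = predicted probability minus the known label.
        errors = [sigmoid(w * z + b) - y for z, y in zip(zs, ys)]
        w -= lr * sum(e * z for e, z in zip(errors, zs)) / n
        b -= lr * sum(errors) / n

    def predict(x: float) -> int:
        probability = sigmoid(w * (x - mu) / sigma + b)
        return 1 if probability >= 0.5 else 0

    return predict


# Made-up data: humidity today (%) and whether it rained the next day (1 = yes).
humidity = [30, 35, 40, 45, 70, 75, 80, 90]
rained = [0, 0, 0, 0, 1, 1, 1, 1]

will_rain = train_logistic(humidity, rained)
print(will_rain(85), will_rain(32))
```

The model only ever answers 0 or 1, which is exactly what makes it a binary classifier.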

Multi-class classification

Basically the same idea as above, but with more than two possible classes as a result. Some commonly used algorithms for this kind of model are k-nearest neighbors (k-NN), Gaussian Naive Bayes and decision trees.

Examples of multi-class classification problems:

Classify product reviews as Positive/Neutral/Negative
Classify customer profile into different consumer personas
Classify animal species based on physical characteristics
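As a rough illustration of the last example, the sketch below classifies an animal with k-nearest neighbors: it looks at the k most similar known animals and takes a vote. The measurements, the species and the `knn_predict` name are invented for this example.

```python
import math
from collections import Counter


def knn_predict(training_rows, features, k=3):
    """Return the most common label among the k rows nearest to `features`."""
    nearest = sorted(
        training_rows, key=lambda row: math.dist(row[0], features)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up measurements: (weight in kg, height in cm) -> species.
animals = [
    ((4.0, 25.0), "cat"),
    ((5.0, 23.0), "cat"),
    ((3.5, 24.0), "cat"),
    ((20.0, 55.0), "dog"),
    ((25.0, 60.0), "dog"),
    ((30.0, 58.0), "dog"),
    ((450.0, 160.0), "horse"),
    ((500.0, 165.0), "horse"),
    ((480.0, 158.0), "horse"),
]

print(knn_predict(animals, (22.0, 57.0)))
```

Note that nothing here is limited to two classes: adding a fourth species is just a matter of adding more labeled rows.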

Regression

Regression models are used to predict a continuous numeric value, rather than one class out of a fixed set like the previous models. A number of algorithms can be used here, such as regression trees, linear regression and polynomial regression.

Examples of regression problems:

Temperature prediction in degrees
Revenue prediction based on sales
Stock value prediction based on history and correlations
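Here is a small sketch of the simplest regression algorithm, linear regression with a single input, fitted with the classic least-squares formulas. The units-sold and revenue figures are invented, and `fit_line` is just a name chosen for this example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_mean, y_mean = mean(xs), mean(ys)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Made-up history: units sold (thousands) and revenue (thousands).
units_sold = [1, 2, 3, 4, 5]
revenue = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(units_sold, revenue)

# Predict revenue for a month with 6 thousand units sold.
predicted = slope * 6 + intercept
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

Unlike the classifiers above, the output is a number on a continuous scale, not a label.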

How does it work?

Cool, now that you know some of the most used ML models, you must be wondering: how do these classifications and predictions work? How do they improve, and how can we know that the result is really reliable?

Very well, there are supervised and unsupervised models.

Supervised: In these models, the data is already labeled with the correct answer, and we use a portion of the dataset, with both the inputs and the expected output, to train the model. For example, we could train our model with a spreadsheet where each row is a set of characteristics of a plant, and one of the columns is the species. After training the model with a suitable classification algorithm, we could insert a new row for a newly discovered plant and receive the closest matching class for the given values.
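The plant example can be sketched as follows. This is only an illustration under assumptions: the measurements and species names are made up, and the "suitable algorithm" is a very simple nearest-centroid classifier picked because it fits in a few lines. Part of the labeled rows is used for training and the rest is held back to check the result.

```python
import math
from collections import defaultdict


def fit_centroids(rows):
    """Training step: average the feature vectors of each labeled class."""
    grouped = defaultdict(list)
    for features, label in rows:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def predict(centroids, features):
    """Prediction step: return the label whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Made-up spreadsheet: (petal length cm, petal width cm) -> species column.
plants = [
    ((1.4, 0.2), "species_a"),
    ((1.3, 0.2), "species_a"),
    ((1.5, 0.3), "species_a"),
    ((1.6, 0.2), "species_a"),
    ((4.5, 1.5), "species_b"),
    ((4.7, 1.4), "species_b"),
    ((4.1, 1.3), "species_b"),
    ((4.4, 1.2), "species_b"),
]

# Hold back every fourth row for testing; train on the rest.
train = [row for i, row in enumerate(plants) if i % 4 != 3]
test = [row for i, row in enumerate(plants) if i % 4 == 3]

model = fit_centroids(train)
accuracy = sum(predict(model, f) == label for f, label in test) / len(test)
new_plant = predict(model, (4.6, 1.5))
print(accuracy, new_plant)
```

The held-back rows matter because the model has never seen them, so they give a fairer picture than checking against the rows it was trained on.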

Unsupervised: In this case, the data has no labels, and the algorithm is responsible for understanding the relationships between the variables and finding hidden patterns that can be used to cluster the data (divide it into groups by similarity). For example, we can have a huge data lake of images, which are themselves numeric matrices, and cluster photos by similarity to detect duplicate or similar images, or even group them by content such as landscapes, people, cars, etc.
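A minimal sketch of clustering, using k-means as one possible algorithm. Real images would first be turned into feature vectors; here a handful of made-up 2D points stand in for them, and the starting centroids are picked by hand to keep the example deterministic.

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids.

    `centroids` is the initial guess; returns (labels, centroids).
    """
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Made-up, unlabeled points that visibly form two groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]

labels, centers = k_means(points, centroids=[points[0], points[3]])
print(labels)
```

Notice that no species, category or any other label was given: the groups come only from how close the points are to each other.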

How do we know if our model is giving us the correct predictions?

For this there are several ways to extract some metrics from the models.

Some metrics used are:

Accuracy (hit rate of our model: the number of correct predictions divided by the total number of predictions)
Precision (quality of the positive predictions, calculated as the number of true positives divided by the sum of true positives and false positives)
Recall (share of the actual positive cases that were predicted correctly, calculated as the number of true positives divided by the sum of true positives and false negatives)
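Written as formulas, with TP, TN, FP and FN standing for the counts of true positives, true negatives, false positives and false negatives, these three metrics are:

```latex
\[
\begin{aligned}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN}
\end{aligned}
\]
```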

False positive and true positive: what do they mean?

Our binary classification model can misclassify data. During training we already know the real result, so we can compare the predictions of the model with it. For this there is the confusion matrix, which expresses the classifications as:

True positive: When the model correctly predicts 1
True negative: When the model correctly predicts 0
False positive: When the model predicts 1 for a data point that is actually 0
False negative: When the model predicts 0 for a data point that is actually 1

With these counts we can draw a matrix to visualize the results, and calculate the metrics described above.
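The four counts above can be computed in a few lines. This is a small sketch with made-up labels and predictions; the function names are chosen for this example only.

```python
def confusion_counts(y_true, y_pred):
    """Count the four confusion-matrix cells for labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def metrics(counts):
    """Turn confusion-matrix counts into accuracy, precision and recall."""
    tp, tn, fp, fn = (counts[k] for k in ("tp", "tn", "fp", "fn"))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # Guard against division by zero when a class is never predicted/present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up example: the real answers and what a model predicted.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
scores = metrics(counts)
print(counts, scores)
```

Here the model made one false negative and one false positive out of ten predictions, so all three metrics happen to land on the same value.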

In the case of multi-class models it gets a little more complex. We won’t go into depth on this, but it follows the same idea; because the matrix is larger, the calculation is more extensive.
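Without going deep, one common way to apply the same idea is to treat each class in turn as the "positive" one and everything else as "negative", then average the per-class results (often called a macro average). The sketch below assumes that approach and uses made-up review labels.

```python
def per_class_scores(y_true, y_pred):
    """Compute precision and recall for each class, one class versus the rest."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        scores[cls] = {
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return scores


def macro_average(scores, metric):
    """Plain average of one metric over all classes."""
    return sum(s[metric] for s in scores.values()) / len(scores)


# Made-up product review labels and predictions.
y_true = ["positive", "positive", "neutral", "negative",
          "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "neutral", "neutral", "negative",
          "positive", "neutral", "positive", "negative"]

scores = per_class_scores(y_true, y_pred)
print(scores)
print(macro_average(scores, "precision"), macro_average(scores, "recall"))
```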

It is also important to note that our data sets can be very large and be composed of thousands of variables used to make a prediction, and this data is ingested in different ways depending on the application. If the ingestion goes wrong, we can end up sending erroneous data to our models. This is why it is very important to MONITOR this data.

And that is exactly what we are building here at WhyLabs.

We are building very effective ways to make sure your models are performing as expected, enabling you to create alerts and monitors that specialize in data types, format, value range, and more. We are also working on ways to trace data history and understand the root cause of your problems related to machine learning models and datasets.
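To give a feeling for what a data type and value range check looks like, here is a tiny hand-written sketch. To be clear, this is not the WhyLabs product or any of its APIs: the schema format, the column names and the `validate_row` function are all hypothetical and exist only for this illustration.

```python
def validate_row(row, schema):
    """Return a list of human-readable problems found in one input row.

    `schema` maps a column name to (expected_type, minimum, maximum).
    """
    problems = []
    for column, (expected_type, minimum, maximum) in schema.items():
        if column not in row:
            problems.append(f"{column}: missing")
            continue
        value = row[column]
        if not isinstance(value, expected_type):
            problems.append(
                f"{column}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        elif not minimum <= value <= maximum:
            problems.append(f"{column}: {value} outside [{minimum}, {maximum}]")
    return problems


# Hypothetical schema for the inputs of a credit score model.
schema = {"age": (int, 0, 120), "income": (float, 0.0, 1_000_000.0)}

good_row = {"age": 34, "income": 52000.0}
bad_row = {"age": -5, "income": "n/a"}  # e.g. the result of a broken ingestion

print(validate_row(good_row, schema))
print(validate_row(bad_row, schema))
```

A check like this could run before every prediction, and any non-empty result could trigger an alert instead of silently feeding bad data to the model.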

It is important to remember that the market and people change over time. We can’t just create a model and believe that it will work forever; that way of thinking is a trap.
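One simple way to notice that the world has changed is to compare recent input data with the data the model was trained on. The sketch below is only one possible approach, under assumptions: it measures how far the recent average moved, in units of the training standard deviation, and the threshold of 3 is an arbitrary choice for this example. The order values are made up.

```python
from statistics import mean, stdev


def mean_shift(baseline, recent):
    """How far the recent mean moved, in baseline standard deviations."""
    return abs(mean(recent) - mean(baseline)) / stdev(baseline)


def has_drifted(baseline, recent, threshold=3.0):
    """Flag the feature when the shift exceeds an (arbitrary) threshold."""
    return mean_shift(baseline, recent) > threshold


# Made-up average order values seen during training and later in production.
training_values = [50, 52, 48, 51, 49, 50]
same_market = [49, 51, 50, 52]
changed_market = [70, 72, 68, 71]

print(has_drifted(training_values, same_market))
print(has_drifted(training_values, changed_market))
```

When a check like this fires, it is usually a sign that the model should be reviewed or retrained with fresher data.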

Conclusion

Well… Now you know the minimum about AI and ML to be able to go deeper. Here are some links that helped in the writing of this mini-article and that can be useful for a better understanding.

https://c3.ai/glossary/data-science/mean-absolute-error/

https://www.ibm.com/cloud/blog/supervised-vs-unsupervised-learning

https://medium.com/data-science-in-your-pocket/calculating-precision-recall-for-multi-class-classification-9055931ee229

https://docs.aws.amazon.com/machine-learning/latest/dg/types-of-ml-models.html

Matheus Mendes — Linkedin

Frontend Software Engineer