The art of data scientists is explained by Plato's Cave

This article expands on an earlier article about the parallel between Machine Learning and the Allegory of the Cave.

Every data scientist knows that a Machine Learning (ML) model is typically trained on a single input table. When the data is spread over several tables, we have to consolidate them into one table, whose columns are called data analytics. These columns carry the information that the ML model uses to predict, so the quality of the prediction depends on the quality of the data analytics.
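As a minimal sketch of this consolidation step, the pandas snippet below aggregates one table and joins the result onto another, so that each aggregate becomes one data-analytics column. The tables `customers` and `orders`, their columns, and all values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical raw tables; every name and value here is made up.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "country": ["IT", "ES", "IT"],
})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount": [20.0, 30.0, 5.0],
})

# Each aggregate of the orders table becomes one data-analytics column.
order_analytics = (
    orders.groupby("customer_id")["amount"]
    .agg(n_orders="count", total_amount="sum")
    .reset_index()
)

# Consolidate into one table with one row per customer.
training_table = customers.merge(order_analytics, on="customer_id", how="left")

# Customers without any order have no aggregate; treat that as zero activity.
analytics_columns = ["n_orders", "total_amount"]
training_table[analytics_columns] = training_table[analytics_columns].fillna(0)

print(training_table)
```

The choice of aggregates (a count and a sum) is exactly the ad-hoc part: nothing in the raw tables dictates which columns to build.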

The creation of data analytics is the art of the data scientist. There is no general rule for creating analytics, since each one is created ad hoc.

Where do the ad-hoc columns come from?

The explanation can be found in Plato's Cave. Plato, in the voice of Socrates, explained that we do not perceive reality directly through our senses. He compared us to a group of people who have lived all their lives chained to the wall of a cave, facing a blank wall. The people watch shadows projected on the wall by objects passing in front of a fire behind them, and they give names to these shadows. The shadows are the prisoners' reality, but they are not accurate representations of the real world (Plato's ideals).

In a machine learning project, every data set can be seen as a shadow of the real objects that we want to model. Each real object is incommensurable, since it contains every piece of information about itself. We can see each real object as an infinite vector (v1, v2, v3, ...) of values. For example, consider the object 'customer'. We can create potentially infinitely many data analytics to capture customer behavior, but only a small subset of them represents that behavior well.

Since there are potentially infinitely many customers, we can see the real concept of 'customer' as an infinite table living in the Platonic World. The left side of the above figure shows an infinite table representing one real object. Thus the Platonic World can be seen as a database with infinitely many infinite tables, where each infinite table corresponds to one real object.

The art of the data scientist is to navigate the Platonic World using her intuition, knowledge, or experience in order to select the columns (data analytics) that capture the behavior we want to model. The selection of columns is a projection from the infinite table onto a finite table.

For example, suppose that we are building a machine learning model to predict the number of deaths per country caused by the spread of a new virus. The objects to model are therefore countries and their deaths from the virus. Countries are complex objects that can be measured from potentially infinitely many perspectives. The World Development Indicators data set on Kaggle contains more than 1,500 country indicators. Among them we can find indicators that would be useful in our project, such as GNP per capita, the unemployment rate, and the number of medical doctors per 100 thousand inhabitants. We can also create potentially infinitely many more indicators that we believe are good predictors of the spread of the virus.

From a mathematical point of view, the creation of data analytics can be seen as the embedding of the raw tables we have into an infinite table, and the selection of data analytics as the projection of the infinite table onto a finite table. This process is shown on the right side of the above figure.
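One way to make the embedding and the projection concrete is to describe a family of columns by a rule rather than by a list. In the sketch below, the hypothetical function `spend_last_k_days(k)` defines one candidate column for every positive integer `k`, which stands in for the infinite table, and `project` keeps only the finitely many columns that were selected. The `purchases` table, the function names, and the window sizes are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical raw table: one row per purchase, with its age in days.
purchases = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "days_ago": [2, 10, 40, 5, 90],
    "amount": [10.0, 20.0, 30.0, 5.0, 50.0],
})

def spend_last_k_days(k: int) -> pd.Series:
    """Column k of the conceptual infinite table: spend within the last k days."""
    recent = purchases[purchases["days_ago"] <= k]
    totals = recent.groupby("customer_id")["amount"].sum()
    all_ids = purchases["customer_id"].unique()
    # Customers with no purchase in the window get a spend of zero.
    return totals.reindex(all_ids, fill_value=0.0).rename(f"spend_last_{k}d")

def project(window_sizes: list[int]) -> pd.DataFrame:
    """Projection: materialize only the finitely many selected columns."""
    return pd.concat([spend_last_k_days(k) for k in window_sizes], axis=1)

# The data scientist's choice: two columns out of infinitely many candidates.
finite_table = project([7, 30])
print(finite_table)
```

The infinite table is never materialized. Only the rule that generates its columns exists, and the selection of window sizes is where intuition and experience come in.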

The projected table that feeds the ML algorithm is a facet of the real object that we did not know before. This facet can be seen as the result of rotating the infinite table and projecting it onto a table with a finite number of columns. Thus the creation of new data analytics from several data sets can be seen as the rotation and projection of Platonic Objects.
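A finite-dimensional sketch of rotation followed by projection, using NumPy, might look as follows. The table values, the 45-degree angle, and the choice to keep two columns are arbitrary assumptions, and a three-column table only stands in for the infinite one.

```python
import numpy as np

# Hypothetical table: 4 rows (objects) described by 3 existing columns.
table = np.array([
    [1.0, 2.0, 0.5],
    [2.0, 0.0, 1.5],
    [0.0, 1.0, 1.0],
    [3.0, 3.0, 0.0],
])

# Rotation by 45 degrees in the plane spanned by the first two columns.
theta = np.pi / 4
rotation = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])

# Rotate: the first two new columns are combinations of the first two old ones.
rotated = table @ rotation

# Project: keep only the first two rotated columns as the new data analytics.
facet = rotated[:, :2]
print(facet)
```

With this particular rotation, the first new column is proportional to the sum of the first two original columns and the second to their difference, so each new analytic is a view of the same object from a different angle.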