
CatBoost instead of simple regression

|No pictures due to NDA|

This is one of the popular tasks solved at Deloitte, and it showcases the CatBoost library applied to a multiclass classification problem. The case is common among data-driven retail businesses that forecast possible points of sale. In this version of the task, several datasets were acquired. The first one describes customers shopping at existing points of sale (PoS) in major airports across the EU. It consists of more than 110k rows, each representing an interviewed person, their personal data, and information about the purchase they made. For this post the data has been depersonalized, so most of the features can be shown.
The second dataset consists of ~120k rows with almost the same interview questions, but without the category of the purchase made. This is because these interviews were conducted in several airports of interest where the client is considering opening a point of sale. In other words, it has the same structure as the first dataset but carries no information about purchases.

The Task

The task is simple: use the data to help the client predict which of the airports under consideration outside the EU are suitable for opening a new PoS.

The Approach

At first sight, the task was treated as a multiclass classification problem, in which every purchase category can be viewed as an average amount spent on a certain purchase. Under this view, the closest approach to an optimal solution is to train a model on the features of the first dataset, apply it to the second one, and from its predictions identify the most profitable airport.
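To make the last step concrete, here is a minimal sketch of how predicted category probabilities could be turned into an airport ranking. It is an illustration under stated assumptions, not the project's actual code: the category names, average spends, airport codes and probabilities are all made up, and averaging expected spend per respondent is only one possible way to aggregate. In practice the probabilities would come from a multiclass model, for example the `predict_proba` output of a `CatBoostClassifier` trained with `loss_function="MultiClass"` on the first dataset.

```python
from collections import defaultdict

# Hypothetical average spend per purchase category (EUR); in a real project
# these would be estimated from the first (labelled) dataset.
AVG_SPEND = {"perfume": 60.0, "alcohol": 35.0, "sweets": 12.0}


def expected_spend(class_probs, avg_spend):
    """Expected spend of one respondent: sum of P(category) * average spend."""
    return sum(p * avg_spend[cat] for cat, p in class_probs.items())


def rank_airports(predictions, avg_spend):
    """Rank airports by mean expected spend per respondent, highest first.

    predictions: iterable of (airport, {category: probability}) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for airport, probs in predictions:
        totals[airport] += expected_spend(probs, avg_spend)
        counts[airport] += 1
    means = {airport: totals[airport] / counts[airport] for airport in totals}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Made-up probabilities standing in for a multiclass model's output
# on respondents from the second (unlabelled) dataset.
predictions = [
    ("AAA", {"perfume": 0.5, "alcohol": 0.25, "sweets": 0.25}),
    ("AAA", {"perfume": 0.0, "alcohol": 0.5, "sweets": 0.5}),
    ("BBB", {"perfume": 0.0, "alcohol": 0.0, "sweets": 1.0}),
]

ranking = rank_airports(predictions, AVG_SPEND)
print(ranking)
```

Using the full probability vector rather than only the most likely category keeps some of the model's uncertainty in the ranking. Whether the mean or the total per airport is the better profitability proxy depends on how comparable the interview samples are across airports.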