Introduction
We are excited to sit down with Alexey Khoroshilov, Senior Data Scientist at UnaFinancial. With over six years of experience in banking and fintech, Alexey has a proven track record in developing and implementing credit risk models that significantly enhance business outcomes. From leading the migration from SAS to Python environments in major banks to boosting company performance through innovative model development, his expertise in risk analytics, modeling, and validation has made a notable impact in the industry. Join us as we dive into his journey and explore his insights on data science in the financial sector.
Given your extensive experience in banking risk analytics, how do you approach developing PD (Probability of Default) models for different segments like individual entrepreneurs and small businesses? What are the key challenges you’ve encountered?
Developing models is a complex and multi-stage process. First, every development starts with understanding the business objectives. We need to clearly define the business segment the model will serve. Second, we need to understand the desired impact, which allows us to formulate the final goal. The goal could be to increase the approval rate while maintaining the default rate, maintain the approval rate while decreasing the default rate, reduce the volatility of predictions compared to the old model, comply with new regulatory requirements, and so on.
Let’s consider the process of working with a PD model. Once we’ve defined the goal, we need to review the regulations and determine how the law defines the default of an entity or individual. In most countries, the definition consists of two parts. The first part is quite straightforward—typically, it’s a delay of more than 90 days. The second part is more complicated, as the description is often vague and might state something like: “due to the inability to fulfill obligations under the loan,” listing some reasons with a note that “credit organizations may independently establish a definition of default, provided it does not contradict the law.”
There are two scenarios here. The first is that the financial institution already has a comprehensive regulatory document for each customer segment, approved by the country’s central bank. There are also validated mechanisms for calculating delinquencies, entering defaults, and customer recovery. The second scenario is that all these processes are still in the formation stage within the organization. If we consider the more complex scenario (the second one), the task of modeling PD should begin with coordinating the logic for assembling model targets with the relevant business and risk units, then moving on to documentation and approval of the main operational algorithms.
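Once the default definition is agreed, the target itself is usually just a flag over delinquency records. As a purely illustrative sketch (the DataFrame schema, column names, and horizon below are assumptions, not the actual systems described here), the 90+ days-past-due part of such a target might be assembled like this:

```python
# Hypothetical sketch of assembling a PD target flag from delinquency records.
# Column names (client_id, months_on_book, days_past_due) are assumed for illustration.
import pandas as pd

def build_default_flag(delinquency: pd.DataFrame,
                       horizon_months: int = 12,
                       dpd_threshold: int = 90) -> pd.DataFrame:
    """Flag a client as defaulted if days past due exceed the threshold
    at any point within the observation horizon."""
    window = delinquency[delinquency["months_on_book"] <= horizon_months]
    return (window.groupby("client_id")["days_past_due"]
                  .max()
                  .gt(dpd_threshold)
                  .astype(int)
                  .rename("default_flag")
                  .reset_index())
```

The second, "inability to fulfill obligations" part of the definition would then be layered on top of this flag according to whatever internal logic the business and risk units approve.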
Now let’s talk about the differences between segments. For individual entrepreneurs and small businesses, the internal definition of default may differ. For example, the approach to default for individual entrepreneurs is similar to that for retail customers (ordinary individuals): more emphasis is placed on the borrower’s individual problems rather than on issues related to associated entities.
For small businesses, in addition to individual problems, the issues of related entities are also significant. This introduces the concept of cross-default, which is a situation where the default of related companies causes the primary entity to default.
What is another fundamental difference between segments? The format of reporting. For example, in Russia, individual entrepreneurs do not submit financial statements in the same format as legal entities. Because of this, in models for individual entrepreneurs, we shift the focus from financial components to credit history or mobile operator data. Meanwhile, for the micro-segment, financial analysis carries a weight comparable to credit history.
You’ve implemented monitoring processes for model performance. Can you elaborate on your approach to model validation and the key metrics you use to ensure model reliability over time?
Previously, many financial institutions did not have daily monitoring of models for either retail or corporate segments. In retail, this was mainly because there weren’t many data sources, and internal systems underwent rigorous quality control.
Instead of daily monitoring, there was a validation report after development, a validation report after implementation, and then quarterly and annual performance reports for the models. However, over time, more and more additional sources were connected, variables became more complex and combined (part of a variable came from one source, and another part from another), and new risks of poor-quality data began to emerge due to these processes.
This raised the issue of data quality control and, at the same time, made it possible to monitor model performance on a continuous daily basis. Unlike the quarterly validation report, which runs to several dozen pages, daily monitoring includes only the main triggers that indicate the current state of the model. This means monitoring for missing sources and checking that predictions agree between the production and test environments (for this, we deployed a test environment to simulate the models without loading the production systems). In addition to standard metrics such as Gini and stability indicators like PSI for the model, external scores, and variables, we tracked how default rates changed over time within model, external-score, and variable bins, monitored calibration over time, and, of course, watched early triggers that could predict delinquencies.
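As a rough illustration of such daily checks, the two workhorse metrics, Gini and PSI, can be computed in a few lines; the binning scheme and thresholds below are common rules of thumb rather than the specific triggers used in that monitoring.

```python
# Illustrative monitoring helpers: Gini from ROC AUC and the Population
# Stability Index (PSI) for score/variable drift. Not the production code
# described in the interview.
import numpy as np
from sklearn.metrics import roc_auc_score

def gini(y_true, y_score):
    """Gini coefficient = 2 * AUC - 1."""
    return 2 * roc_auc_score(y_true, y_score) - 1

def psi(expected, actual, bins=10):
    """PSI between a reference (development) sample and the current sample."""
    edges = np.unique(np.percentile(expected, np.linspace(0, 100, bins + 1)))
    # Clip current values into the reference range so every value falls in a bin.
    actual = np.clip(actual, edges[0], edges[-1])
    e_pct = np.histogram(expected, edges)[0] / len(expected)
    a_pct = np.histogram(actual, edges)[0] / len(actual)
    e_pct, a_pct = np.clip(e_pct, 1e-6, None), np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

# Common rule of thumb: PSI < 0.1 stable, 0.1-0.25 watch closely, > 0.25 investigate.
```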
Having led the development of Application Fraud models, what machine learning techniques have you found most effective for fraud detection in the banking sector?
Application fraud, which I dealt with, is a broader concept than typical classic fraud. Classic fraud usually involves passport fraud, identity theft to take out loans, or manipulation with dummy clients. Application fraud, in this context, refers to clients who enter delinquency earlier than social defaulters do. For predicting social defaults, the difference in results between advanced machine learning methods (gradient boosting, random forests, neural networks) and classic logistic regression is usually small on the test sample. If the difference is significant, it is almost always possible to adjust the features for logistic regression to reduce the quality gap to an acceptable level.
However, for application fraud, which also includes classic fraud, this is not quite the case. When using neural networks and appropriate data preprocessing, I achieved a substantial uplift (from 3 to 5 points in Gini on the test) compared to logistic regressions.
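A hedged sketch of that kind of comparison is shown below: a logistic regression and a small feed-forward network scored on the same preprocessed features. The data here is a random placeholder, so it will not reproduce the uplift described; it only demonstrates the mechanics of the comparison.

```python
# Sketch: comparing logistic regression with a small neural network by test Gini.
# The data is a synthetic placeholder, not the application-fraud data discussed above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
X = rng.random((5000, 30))                       # placeholder features
y = rng.binomial(1, 0.05, 5000)                  # placeholder fraud labels (~5% rate)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)

models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "neural network": make_pipeline(StandardScaler(),
                                    MLPClassifier(hidden_layer_sizes=(64, 32),
                                                  max_iter=500, random_state=42)),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    gini = 2 * roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]) - 1
    print(f"{name}: test Gini = {gini:.3f}")
```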
Could you explain your role in constructing the Credit Conveyor for corporate clients? What were the main objectives and challenges in this project?
In building the credit conveyor, I was responsible for the decision-making block, the part of the conveyor where customer rating models are embedded. The main task was to bring the entire process to the highest possible level of automation. For large businesses, this can be considered an unachievable goal; for small and medium-sized enterprises, however, a high level of automation in rating is quite achievable, and I proved this in my project. Naturally, there were many difficulties. To begin with, we had to redo all the rating methodologies for most customer segments. Manual processes, such as rating adjustments and checks for which data sources could not be connected, had to be replaced.
Problems also affected reporting: the formats of reports provided by external sources differed from those adopted in pre-conveyor methods. These changes had to be considered in calculating financial variables during modeling. A separate discussion could be dedicated to transitioning to new formats for corporate credit histories to automate the process of determining credit quality. To achieve all this, we worked 100 hours a week for a year. In the end, we achieved the desired results, and we can confidently say that the outcome was worth it.
You led the migration from SAS to the Python modeling environment. What were the main benefits and challenges of this transition? How did you manage the process to ensure smooth adoption?
During the transition to the credit conveyor for corporate clients, we faced a choice: stay on the SAS infrastructure or look for a more flexible modeling tool. As alternatives, we considered R and Python. I had previously worked with Python when building income prediction models in retail, and many of my team members were already familiar with it, so we chose Python.
Then the process of transitioning from SEMMA (the methodology implemented in the SAS Data Mining Solution) to CRISP-DM (Cross-Industry Standard Process for Data Mining) began. By that time, Python already had reliable open-source packages for machine learning, so we rarely had to implement algorithms ourselves. We also needed to build pipelines for modeling, validation, and monitoring tasks. A challenging part was rewriting all the internal data processing scripts from one language to the other: the scripts analyzing credit history used non-standard algorithms for working with payment history strings. Overall, the transition was smooth. For a while, we even built models in SAS and Python in parallel to visually compare the results under the supervision of the validation department.
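For context, a typical CRISP-DM-style modeling pipeline on the Python side might look like the sketch below; the feature names and the final estimator are illustrative assumptions, not the pipelines built during that migration.

```python
# Illustrative scikit-learn pipeline of the kind that replaces SEMMA-era flows:
# imputation and encoding feed a single estimator that can be reused for
# validation and monitoring. Feature names are placeholders.
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression

numeric = ["revenue", "debt_to_income", "months_on_book"]      # assumed features
categorical = ["industry", "region"]                           # assumed features

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])

model = Pipeline([("prep", preprocess),
                  ("clf", LogisticRegression(max_iter=1000))])
# model.fit(train_df[numeric + categorical], train_df["default_flag"])
```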
Can you describe your approach to macroeconomic modeling for credit risk, particularly in the context of stress testing credit portfolios?
First, linear risk models are built on monthly portfolio metrics by customer segment as a function of the country's macroeconomic indicators. It sounds complicated, but these are some of the simplest models in credit risk. For example, if we need a macro model to predict portfolio LGD, we take the historical average monthly LGD values for the portfolio, normalized to the range from 0 to 1 (or percentages from 0 to 100%), and the country's monthly macroeconomic indicators, such as GDP, unemployment, the exchange rate, the cost of the grocery basket, and so on. The model's target is the historical LGD series, and the candidate variables are all sorts of transformations, ratios, and combinations of the available macroeconomic indicators. Typically, up to three variables that most accurately describe the portfolio LGD dependence are selected during model training.
After developing the model, as part of stress testing, the financial institution makes its own forecasts for macro parameters or requests them from the regulator. In any case, three forecast scenarios are formulated: favorable, baseline, and adverse. Then, all three forecasts are run through the developed models. The final metrics are used for further calculations in stress testing.
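A minimal sketch of this two-step flow, fitting a small linear model on monthly history and then running three macro scenarios through it, is shown below. All numbers are synthetic placeholders chosen only to make the example runnable.

```python
# Sketch: linear macro model for portfolio LGD plus three stress-test scenarios.
# The historical series and scenario values are synthetic placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
hist = pd.DataFrame({
    "gdp_growth":   rng.normal(1.5, 1.0, 60),    # 60 months of placeholder macro data
    "unemployment": rng.normal(5.0, 0.8, 60),
    "fx_rate":      rng.normal(70.0, 5.0, 60),
})
# Placeholder portfolio LGD series (normalized to [0, 1]).
hist["lgd"] = np.clip(0.40 - 0.02 * hist["gdp_growth"]
                      + 0.03 * hist["unemployment"]
                      + rng.normal(0, 0.02, 60), 0, 1)

features = ["gdp_growth", "unemployment", "fx_rate"]
model = LinearRegression().fit(hist[features], hist["lgd"])

scenarios = pd.DataFrame({
    "gdp_growth":   [2.5, 1.0, -3.0],
    "unemployment": [4.5, 5.5, 9.0],
    "fx_rate":      [68.0, 72.0, 95.0],
}, index=["favorable", "baseline", "adverse"])

scenarios["lgd_forecast"] = np.clip(model.predict(scenarios[features]), 0, 1)
print(scenarios[["lgd_forecast"]])
```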
At UnaFinancial, you mentioned achieving a 10% increase in AR without increasing portfolio risk. How do you balance risk management with business growth objectives in your modeling approaches?
In this case, achieving a significant uplift in the approval rate without exceeding risk indicators was made possible by a new, higher-quality PD model. I advocate for sophisticated logic when developing model variables, especially when using data from credit reports. Generating such variables takes much more time: you need to dive deep into the data, study the source documentation, build logic around the specifics of the segment, and often review cases individually rather than rely on general ideas about classic scoring variables.
In addition to building a quality model, you need to solve an optimization problem—determine the upper threshold (cutoff) for the predicted PD for the incoming flow. This can be done in several ways. In fintech companies, unlike banks, a business approach is mainly used—optimizing the profit function. The cutoff is set at the boundary between two buckets: profitable and unprofitable. Banks, however, are more conservative, and planned risk indicators are important to them. The cutoff is chosen based on these considerations, even though clients beyond the cutoff can also be profitable. The difference in options again brings us to the point that the main thing is to decide what policy the organization pursues: conservative or focused on business growth, and act according to this plan.
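As an illustration of the profit-function approach, the sketch below sweeps candidate PD cutoffs over an incoming flow and picks the one with the highest expected profit; the per-client revenue and loss figures are hypothetical assumptions.

```python
# Sketch of profit-based cutoff selection. The per-client economics
# (revenue on a good loan, loss on a default) are hypothetical assumptions.
import numpy as np

def choose_cutoff(pd_scores: np.ndarray,
                  revenue_per_good: float = 150.0,
                  loss_per_default: float = 1000.0) -> float:
    """Pick the PD cutoff that maximizes expected profit on the incoming flow."""
    cutoffs = np.linspace(0.01, 0.99, 99)
    profits = []
    for c in cutoffs:
        approved = pd_scores[pd_scores <= c]
        # Expected profit: non-defaulters earn revenue, expected defaults cost the loss.
        profits.append(np.sum((1 - approved) * revenue_per_good
                              - approved * loss_per_default))
    return float(cutoffs[int(np.argmax(profits))])

# Example on a simulated incoming flow of PD predictions (skewed toward low PD).
rng = np.random.default_rng(0)
scores = rng.beta(2, 10, size=10_000)
print("Profit-maximizing cutoff:", round(choose_cutoff(scores), 2))
```

A conservative bank would instead cap the cutoff wherever the planned portfolio risk indicators are reached, even if clients beyond that point are still profitable.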
Given your experience with various modeling techniques, including neural networks, how do you see the role of advanced machine learning evolving in financial risk assessment over the next 5 years?
Many processes in the world are becoming more complex, and financial systems are no exception. A few years ago, neural networks and boosting were not used so extensively in banks. I do not rule out that in the foreseeable future, regulators will allow the use of advanced machine learning methods, including for building regulatory models. Currently, the problem is that neural networks are considered a black box with intricate dependencies that are difficult for regulators to interpret. However, there are already tools for partial interpretation and visualization, such as SHAP and LIME. But even they do not establish fully transparent relationships between the variables and the predictions. It is quite possible that additional tools will come to their aid, allowing the results of advanced machine learning in client classification tasks to be explained and translated into language understandable to regulators.
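For example, assuming the shap package is available, a partial explanation of a boosted model can be produced in a few lines; the data and model here are placeholders, meant only to show the mechanics.

```python
# Illustrative use of SHAP for partial interpretation of a tree-based model.
# Data, features, and the model are placeholders, not a regulatory model.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
X = rng.random((2000, 10))                                  # placeholder features
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(0, 0.3, 2000) > 1.0).astype(int)

model = GradientBoostingClassifier().fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:100])                # per-feature contributions
print("Mean |SHAP| per feature:", np.abs(shap_values).mean(axis=0).round(3))
```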
In your experience across different banks and fintech companies, how do you approach data quality issues and preprocessing for model development? What are some common pitfalls you’ve encountered?
When a company has a full process set up for receiving, validating, and organizing all client data into relational databases, data processing for model development happens quickly. However, even in large companies, the process of organizing and validating the complete data set is often not streamlined. Sometimes, it simply doesn’t make sense to maintain hundreds of tables. So usually, only a small part is processed and maintained—just what is currently used in production, while the rest is stored in an unstructured format. However, in the modeling process, the more data processed, the higher the chance of finding the necessary variables for a better-quality model.
Often, it's necessary to dig into the incoming text files yourself. Usually these are XML files that need to be parsed, structured, keyed across tables, and brought to the correct format. If there is a lot of data, parsing will take a long time, so it's better to randomly select a small batch and perform all the procedures on that part first. After setting aside this chunk, checking it for missing data, and manually analyzing a dozen cases, it's advisable to write unit tests. With a large amount of data, the potential pitfalls include corrupted XML files, errors within fields, incorrect data, incorrect formats, wrong signs, and so on. Only after a thorough check of the sub-sample does it make sense to run the calculation on the entire sample; otherwise, you will have to recalculate many times, which takes a lot of time.
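The subsample-first idea might look like the sketch below: parse a small random batch of XML files, collect the broken ones, and only then decide whether to run the full calculation. The XML tag names and directory layout are assumptions for illustration.

```python
# Sketch: parse a random subsample of XML credit reports before the full run.
# The directory layout and XML tag names are assumed for illustration.
import random
import xml.etree.ElementTree as ET
from pathlib import Path

def parse_sample(xml_dir: str, sample_size: int = 200, seed: int = 42):
    files = list(Path(xml_dir).glob("*.xml"))
    random.Random(seed).shuffle(files)
    records, broken = [], []
    for path in files[:sample_size]:
        try:
            root = ET.parse(path).getroot()
            records.append({
                "client_id": root.findtext("client/id"),       # assumed tag path
                "report_date": root.findtext("report/date"),   # assumed tag path
            })
        except ET.ParseError:
            broken.append(path.name)                           # corrupted XML file
    return records, broken

# records, broken = parse_sample("/path/to/credit_reports")
# A simple unit check before the full run, e.g. a tolerable share of broken files:
# assert len(broken) <= 0.01 * (len(records) + len(broken))
```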
The next step is data preprocessing for building model variables. This involves handling missing data, inconsistent values, outliers, checking distributions, deduplication of data, and calculating statistics. Errors are investigated and corrected. If errors cannot be corrected, this fact is recorded in the development document, and such data is not used in building model features.
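A small, hedged example of such checks is given below: missing-value shares, percentile-based outlier shares, and basic statistics summarized per column. The column names and the 1%/99% bounds are illustrative.

```python
# Illustrative data-quality summary for candidate model variables.
# Column names and the 1%/99% outlier bounds are assumptions for illustration.
import pandas as pd

def quality_report(df: pd.DataFrame, numeric_cols: list) -> pd.DataFrame:
    report = pd.DataFrame({
        "missing_share": df[numeric_cols].isna().mean(),
        "n_unique": df[numeric_cols].nunique(),
        "p1": df[numeric_cols].quantile(0.01),
        "p99": df[numeric_cols].quantile(0.99),
    })
    outside = df[numeric_cols].lt(report["p1"]) | df[numeric_cols].gt(report["p99"])
    report["outlier_share"] = outside.mean()
    stats = df[numeric_cols].describe().T
    return report.join(stats[["mean", "std", "min", "max"]])

# Deduplication is typically done on the natural key before feature calculation:
# df = df.drop_duplicates(subset=["client_id", "report_date"])
```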
As AI and machine learning become more prevalent in financial decision-making, how do you address potential biases and ensure ethical considerations are incorporated into your models?
I believe that AI and machine learning enable the creation of higher-quality products. They do not have subjectivity and cannot discriminate against borrowers, unlike people who are tasked with making decisions. They can make the process more transparent. Transparency, in turn, attracts customers, which is beneficial for the financial organization in a highly competitive environment.
What else is an advantage? Imagine you are a client who has been denied a loan. You have the right to ask the financial institution for the reason for the denial. AI and machine learning help accurately determine the so-called "pain point" on which the decision was based. There are even financial services built for exactly this purpose, helping borrowers improve their creditworthiness and their chances of getting subsequent loans.
So AI and ML can definitely be considered a key not only to the economic stability of organizations but also to a fair assessment of borrowers. Hopefully, these tools will be implemented in all financial companies sooner or later.