Data categorization with open banking: how does it work?

Jordi Soler, Data Science Engineer



Understanding users’ spending habits through open banking is key to building successful financial apps. But to turn raw data into actionable insights, transactions need to be properly cleaned, organized, and enriched.


Digital banks, PFMs, and other fintechs building financial apps all have one clear need in common: understanding their users’ spending and saving habits to give them proactive advice to improve their financial well-being. 

Open banking technologies make this possible by giving companies a way to extract and interpret their users’ banking and financial information in a simple and secure manner. 

But access to raw financial data is not enough. 

Turn raw data into actionable insights

If you’ve ever looked at the details of your banking transactions, you’ll know that they don’t usually come in a standard, understandable format. Banks, users, and retailers name transactions using different conventions, with irregular wording and incomplete information.

Faced with this situation, developers have to interpret and classify the information in the transaction labels manually, which takes a large amount of time and resources.

To solve this problem, Belvo has built its own data categorization engine from scratch as part of its enrichment solutions. This product provides a layer of intelligence on top of raw data to help companies create features such as spend monitoring and budgeting tools.

How does data categorization work?

Data categorization is the process by which we organize financial transactions into a set of defined groups, such as personal shopping or bills and utilities. 

We’ve established 14 of these categories based on the most common types of transactions that users perform and on the needs of companies building personal finance tools.

To do this, our model goes through the raw data contained in the transaction descriptions and searches for a series of patterns that we’ve previously identified using natural language processing (NLP) techniques. Every time a call is made to our API to retrieve account data, each transaction is assigned to one of these categories based on the patterns that match its description.

These categories are also given a specific priority based on our rules. This helps us determine the correct category of a transaction in case the description matches more than one pattern. For example, when a transaction contains the words “Uber Eats”, it could automatically be included under “Transport & Travel”, but we’ve created a set of priority rules that helps us tag it correctly under “Food & Groceries”. 
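The priority idea can be illustrated with a minimal sketch. The rules, regular expressions, and priority values below are hypothetical and chosen only to reproduce the “Uber Eats” example; they are not Belvo’s actual patterns or implementation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    pattern: re.Pattern
    category: str
    priority: int  # higher value wins when several rules match


# Hypothetical rules for illustration only.
RULES = [
    Rule(re.compile(r"\buber\b", re.IGNORECASE), "Transport & Travel", 10),
    Rule(re.compile(r"\buber\s*eats\b", re.IGNORECASE), "Food & Groceries", 20),
]


def categorize(description, rules=RULES, default=None):
    """Return the category of the highest-priority rule that matches."""
    matches = [rule for rule in rules if rule.pattern.search(description)]
    if not matches:
        return default
    return max(matches, key=lambda rule: rule.priority).category


print(categorize("UBER EATS *ORDER 1234"))
print(categorize("Uber trip"))
```

Here “UBER EATS *ORDER 1234” matches both rules, and the higher-priority one tags it as “Food & Groceries”, while a plain “Uber trip” stays under “Transport & Travel”.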

Finally, the category of the transaction is returned to the customer as an additional field called ‘category’ for each transaction in the response.
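As a rough sketch of how a client might consume that field, the snippet below parses a sample payload and counts transactions per category. Only the `category` field name comes from this article; the other fields, the sample values, and the possibility of a missing category are assumptions made for illustration.

```python
import json
from collections import Counter

# Hypothetical payload: only the `category` field name is taken from the text.
SAMPLE_RESPONSE = """
[
  {"description": "UBER EATS 1234", "amount": "18.50", "category": "Food & Groceries"},
  {"description": "UBER TRIP", "amount": "7.20", "category": "Transport & Travel"},
  {"description": "PAGO 000123", "amount": "40.00", "category": null}
]
"""


def count_by_category(transactions):
    """Count transactions per category, bucketing missing values together."""
    return Counter(tx.get("category") or "Uncategorized" for tx in transactions)


transactions = json.loads(SAMPLE_RESPONSE)
counts = count_by_category(transactions)
print(counts)
```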

Intelligence from millions of transactions

The key differentiator of a data categorization engine powered by open banking, as compared to an engine built in-house, is the amount and range of data that we use to feed our model.

By using Belvo, companies can access the intelligence that we’ve gathered from analyzing millions of users’ transactions from dozens of companies across different industries and countries.

And our model is in constant evolution: every day we keep feeding and training our engine with the intelligence we gather from analyzing new account data coming from our growing customer base across Latin America. 

Thanks to this, the accuracy and coverage of our engine are constantly improving. Additionally, we periodically analyze our database to look for new patterns that help us improve our prediction capabilities.

What can you build?

Our categorization product was built for a specific need of PFMs: grouping their customers’ spending into categories so that it’s easier to build useful analytics on top of it.


By grouping expenses and incomes into categories, these companies are able to help their users with budgeting and offer proactive saving advice. 
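A budgeting feature built on categorized data could look roughly like the sketch below. It assumes each transaction is a dictionary with a `category` and a string `amount`, and that budgets are defined per category; the field layout, function names, and figures are all hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


def spend_by_category(transactions):
    """Sum transaction amounts per category."""
    totals = defaultdict(Decimal)
    for tx in transactions:
        totals[tx["category"]] += Decimal(tx["amount"])
    return dict(totals)


def over_budget(totals, budgets):
    """Return the overspent amount for each category that exceeds its budget."""
    return {
        category: totals[category] - limit
        for category, limit in budgets.items()
        if totals.get(category, Decimal(0)) > limit
    }


# Hypothetical sample data.
expenses = [
    {"category": "Food & Groceries", "amount": "18.50"},
    {"category": "Food & Groceries", "amount": "45.00"},
    {"category": "Transport & Travel", "amount": "7.20"},
]
budgets = {"Food & Groceries": Decimal("50"), "Transport & Travel": Decimal("30")}

totals = spend_by_category(expenses)
alerts = over_budget(totals, budgets)
print(alerts)
```

Amounts are handled as `Decimal` rather than floats to avoid rounding drift when summing money.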

One example is Mobills, a Brazilian personal finance app that is using Belvo to collect and process its users’ credit card financial information. Thanks to our engine, they are able to receive already categorized data through a simple and fluid integration with their app, making transactions ready to be displayed to their customers. 

This service is also particularly helpful for other fintech companies that are building more engaging and automated financial services, such as:  

  • Credit providers: to identify customers’ income and common expenses to assess the risk within their underwriting decisions. 
  • Accounting companies: to group expenses by categories to automate bookkeeping processes. 

If you want to learn more about our data categorization engine, get in touch with us! And if you’re building a PFM app, take a look at our dedicated guide for developers to help your users with their finances.

✍️ Jordi Soler is a Data Science Engineer at Belvo.


