How to Process 10K Bank Transactions Through an LLM
April 14, 2024 • Matthew Duong • Self Hosting • 3 min read
LLMs like GPT have changed how we think about AI, providing insights and generating text based on the context they are given together with the knowledge from their extensive training data. However, these models have a limitation known as the "context window", which can be thought of as their working memory. This article explores what context windows are, why they matter, and strategies to work around their constraints.
The context window of an LLM is the amount of text it can "remember", or consider, when generating a response. This limit is measured in "tokens"; in English, a token corresponds to roughly four characters, or about three quarters of an average word.
The latest GPT-4 can consider significantly more tokens at once than its predecessors. Detailed context window sizes for various models are available on platforms like OpenAI's model guide.
Imagine you want to analyze a document of 500 words, but your model's context window can only accommodate 250 words at a time. In that case, only the latter half of the document remains in the model's context, meaning any analysis or response generated will only reflect knowledge of the second half. This can result in incomplete understanding and responses that seem out of context.
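To make the limit concrete, here is a minimal sketch of measuring a document against a token budget, assuming the tiktoken library is available; the 250-token budget and the "keep only the most recent tokens" behaviour mirror the toy example above rather than any particular API.

```python
import tiktoken

def fit_to_window(text: str, budget: int = 250, model: str = "gpt-4") -> str:
    """Keep only the most recent `budget` tokens, mimicking how earlier
    text effectively falls out of the model's working memory."""
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(text)
    print(f"Document is {len(tokens)} tokens; the budget is {budget}.")
    if len(tokens) <= budget:
        return text
    # Everything before the final `budget` tokens is silently dropped.
    return enc.decode(tokens[-budget:])
```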
Imagine you're cooking a complex recipe but can only remember a few steps and ingredients at a time. Like a limited context window, this might cause you to forget key spices or miss important steps, potentially leading to a less successful dish.
Another analogy for the effect of a small context window is trying to compose a coherent text message on your phone while under the influence of alcohol. When you revisit what you've written the next morning, you often find the sentences disjointed and lacking cohesion.
At a high level, the goal of overcoming context window limitations is to reduce the number of tokens required to convey the exact same message and meaning. There are two benefits: the information is far more likely to fit inside the window in the first place, and because providers bill per token, each request also becomes cheaper and faster.
In my day job, we are developing an assistant for a client that interfaces with a database of financial records. This assistant can access and "call" tools, which, in this scenario, involve database queries to fetch data. A typical query might involve asking for the "evolution of accounts in 2023", where the assistant would display the financial data in a graph format.
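As an illustration, such a tool can be declared to the model as a callable function. The sketch below uses the OpenAI Chat Completions tools format; the tool name, description, and parameter schema are hypothetical stand-ins, not the client's actual schema.

```python
# Hypothetical tool definition in the OpenAI Chat Completions "tools" format.
# Name, description, and parameters are illustrative only.
get_transactions_tool = {
    "type": "function",
    "function": {
        "name": "get_account_transactions",
        "description": "Fetch raw transactions for an account over a date range.",
        "parameters": {
            "type": "object",
            "properties": {
                "account_id": {"type": "string"},
                "start_date": {"type": "string", "description": "ISO date, e.g. 2023-01-01"},
                "end_date": {"type": "string", "description": "ISO date, e.g. 2023-12-31"},
            },
            "required": ["account_id", "start_date", "end_date"],
        },
    },
}
```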
However, when the assistant calls such a tool, the data comes back in the form of raw transactions like this:
ID | Date | Description | Debit | Credit |
---|---|---|---|---|
0001 | 2024-04-01 | Opening Balance | 0.00 | 0.00 |
0002 | 2024-04-02 | Purchase of office supplies | 150.00 | 0.00 |
0003 | 2024-04-03 | Client Invoice #123 | 0.00 | 2,000.00 |
0004 | 2024-04-04 | Bank Fee | 15.00 | 0.00 |
0005 | 2024-04-05 | Rent Payment | 1,200.00 | 0.00 |
0006 | 2024-04-06 | Sale of Product XYZ | 0.00 | 1,500.00 |
0007 | 2024-04-07 | Coffee for office | 45.00 | 0.00 |
0008 | 2024-04-08 | Internet bill | 100.00 | 0.00 |
0009 | 2024-04-09 | Transfer to savings | 500.00 | 0.00 |
... | ... | ... | ... | ... |
9999 | 2024-12-28 | Client Invoice #124 | 0.00 | 750.00 |
You can imagine this table extending to hundreds or thousands of transactions, well over any context window limit. To manage the data efficiently, the assistant uses a data aggregation pipeline that collapses account balances into monthly summaries with month-over-month growth rates, significantly reducing the volume of data processed. The result of such an aggregation looks like the following (a sketch of the aggregation step follows the table):
Month | Balance (USD) | MoM Growth Rate (%) |
---|---|---|
January | 10,000.00 | - |
February | 12,345.67 | 23.46 |
March | 9,876.54 | -19.99 |
April | 13,210.78 | 33.79 |
May | 10,005.89 | -24.27 |
June | 14,567.12 | 45.59 |
July | 11,234.56 | -22.85 |
August | 15,000.33 | 33.61 |
September | 12,250.00 | -18.34 |
October | 16,789.45 | 37.06 |
November | 13,555.55 | -19.29 |
December | 18,000.99 | 32.85 |
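Here is a minimal sketch of that aggregation step, assuming pandas and a DataFrame whose columns match the transaction table above; the column names, opening balance, and parsing details are assumptions, not the production pipeline.

```python
import pandas as pd

def summarise_by_month(transactions: pd.DataFrame, opening_balance: float = 0.0) -> pd.DataFrame:
    """Collapse raw transactions into monthly balances with MoM growth."""
    df = transactions.copy()
    df["Date"] = pd.to_datetime(df["Date"])
    df["Net"] = df["Credit"] - df["Debit"]

    # Net movement per calendar month, then a running balance.
    monthly_net = df.groupby(df["Date"].dt.to_period("M"))["Net"].sum()
    balance = monthly_net.cumsum() + opening_balance

    return pd.DataFrame({
        "Month": balance.index.strftime("%B"),
        "Balance (USD)": balance.round(2).values,
        "MoM Growth Rate (%)": balance.pct_change().mul(100).round(2).values,
    })
```

The resulting dozen rows, rendered back to text with something like `to_csv` or `to_markdown`, are what the assistant actually places in the prompt instead of the thousands of underlying transactions.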
In my opinion, the issue of context window limits in large language models is a temporary technical limitation. Drawing a parallel to the early days of computing where memory was scarce, I see a similar trajectory with context windows. As technology advances, these limits are likely to expand significantly, making current optimization concerns for context windows less relevant over time.
Similarly, the evolution of programming practices offers a relevant analogy. Early programmers had to optimize code to fit within strict CPU and memory limits imposed by the hardware. Today, with substantial improvements in hardware, those constraints are far less pressing. Likewise, as context windows in LLMs expand, the need to optimize within these limits will likely fade, freeing engineers to focus on the business problem itself.