How to Process 10K Bank Transactions Through an LLM
April 14, 2024 • Matthew Duong • Self Hosting • 3 min read
LLMs like GPT have changed how we think about AI, providing insights and generating text based on the context they are given together with the knowledge from their extensive training data. However, these models have a limitation known as the "context window", which can be thought of as their working memory. This article explores what context windows are, why they matter, and strategies to work around their constraints.
The context window of an LLM is the amount of text it can "remember", or consider, when generating a response. This limit is measured in "tokens"; in English, a token corresponds to roughly four characters, or about three quarters of an average word.
The latest GPT-4 can consider significantly more tokens at once than its predecessors. Detailed context window sizes for various models are available on platforms like OpenAI's model guide.
Imagine you want to analyze a document of 500 words, but your model's context window can only accommodate 250 words at a time. In that case, only the latter half of the document remains in the model's context, meaning any analysis or response generated will only reflect knowledge of the second half. This can result in incomplete understanding and responses that seem out of context.
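To make the limit concrete, here is a minimal sketch of measuring a document against a token budget, assuming the tiktoken library is available; the 250-token budget and the "keep only the most recent tokens" behaviour mirror the toy example above rather than any particular API.

```python
import tiktoken

def fit_to_window(text: str, budget: int = 250, model: str = "gpt-4") -> str:
    """Keep only the most recent `budget` tokens, mimicking how earlier
    text effectively falls out of the model's working memory."""
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(text)
    print(f"Document is {len(tokens)} tokens; the budget is {budget}.")
    if len(tokens) <= budget:
        return text
    # Everything before the final `budget` tokens is silently dropped.
    return enc.decode(tokens[-budget:])
```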
Imagine you're cooking a complex recipe but can only remember a few steps and ingredients at a time. Like a limited context window, this might cause you to forget key spices or miss important steps, potentially leading to a less successful dish.
Another analogy for the effect of a small context window is trying to compose a coherent text message on your phone while under the influence of alcohol. When you revisit what you've written the next morning, you often find the sentences disjointed and lacking cohesion.
At a high level, the goal of overcoming context window limitations is to reduce the number of tokens required to convey the exact same message and meaning. There are two benefits: the information is far more likely to fit inside the window in the first place, and because providers bill per token, each request also becomes cheaper and faster.
In my day job, we are developing an assistant for a client that interfaces with a database of financial records. This assistant can access and "call" tools, which, in this scenario, involve database queries to fetch data. A typical query might involve asking for the "evolution of accounts in 2023", where the assistant would display the financial data in a graph format.
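As an illustration, such a tool can be declared to the model as a callable function. The sketch below uses the OpenAI Chat Completions tools format; the tool name, description, and parameter schema are hypothetical stand-ins, not the client's actual schema.

```python
# Hypothetical tool definition in the OpenAI Chat Completions "tools" format.
# Name, description, and parameters are illustrative only.
get_transactions_tool = {
    "type": "function",
    "function": {
        "name": "get_account_transactions",
        "description": "Fetch raw transactions for an account over a date range.",
        "parameters": {
            "type": "object",
            "properties": {
                "account_id": {"type": "string"},
                "start_date": {"type": "string", "description": "ISO date, e.g. 2023-01-01"},
                "end_date": {"type": "string", "description": "ISO date, e.g. 2023-12-31"},
            },
            "required": ["account_id", "start_date", "end_date"],
        },
    },
}
```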
However, when the assistant calls such a tool, the data comes back in the form of raw transactions like this:
ID | Date | Description | Debit | Credit |
---|---|---|---|---|
0001 | 2024-04-01 | Opening Balance | 0.00 | 0.00 |
0002 | 2024-04-02 | Purchase of office supplies | 150.00 | 0.00 |
0003 | 2024-04-03 | Client Invoice #123 | 0.00 | 2,000.00 |
0004 | 2024-04-04 | Bank Fee | 15.00 | 0.00 |
0005 | 2024-04-05 | Rent Payment | 1,200.00 | 0.00 |
0006 | 2024-04-06 | Sale of Product XYZ | 0.00 | 1,500.00 |
0007 | 2024-04-07 | Coffee for office | 45.00 | 0.00 |
0008 | 2024-04-08 | Internet bill | 100.00 | 0.00 |
0009 | 2024-04-09 | Transfer to savings | 500.00 | 0.00 |
... | ... | ... | ... | ... |
9999 | 2024-12-28 | Client Invoice #124 | 0.00 | 750.00 |
You can imagine this table extending to hundreds or thousands of transactions, well over any context window limit. To manage the data efficiently, the assistant uses a data aggregation pipeline that collapses account balances into monthly summaries with month-over-month growth rates, significantly reducing the volume of data processed. The result of such an aggregation looks like the following (a sketch of the aggregation step follows the table):
Month | Balance (USD) | MoM Growth Rate (%) |
---|---|---|
January | 10,000.00 | - |
February | 12,345.67 | 23.46 |
March | 9,876.54 | -19.99 |
April | 13,210.78 | 33.79 |
May | 10,005.89 | -24.27 |
June | 14,567.12 | 45.59 |
July | 11,234.56 | -22.85 |
August | 15,000.33 | 33.61 |
September | 12,250.00 | -18.34 |
October | 16,789.45 | 37.06 |
November | 13,555.55 | -19.29 |
December | 18,000.99 | 32.85 |
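Here is a minimal sketch of that aggregation step, assuming pandas and a DataFrame whose columns match the transaction table above; the column names, opening balance, and parsing details are assumptions, not the production pipeline.

```python
import pandas as pd

def summarise_by_month(transactions: pd.DataFrame, opening_balance: float = 0.0) -> pd.DataFrame:
    """Collapse raw transactions into monthly balances with MoM growth."""
    df = transactions.copy()
    df["Date"] = pd.to_datetime(df["Date"])
    df["Net"] = df["Credit"] - df["Debit"]

    # Net movement per calendar month, then a running balance.
    monthly_net = df.groupby(df["Date"].dt.to_period("M"))["Net"].sum()
    balance = monthly_net.cumsum() + opening_balance

    return pd.DataFrame({
        "Month": balance.index.strftime("%B"),
        "Balance (USD)": balance.round(2).values,
        "MoM Growth Rate (%)": balance.pct_change().mul(100).round(2).values,
    })
```

The resulting dozen rows, rendered back to text with something like `to_csv` or `to_markdown`, are what the assistant actually places in the prompt instead of the thousands of underlying transactions.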
In my opinion, the issue of context window limits in large language models is a temporary technical limitation. Drawing a parallel to the early days of computing where memory was scarce, I see a similar trajectory with context windows. As technology advances, these limits are likely to expand significantly, making current optimization concerns for context windows less relevant over time.
Similarly, the evolution of programming practices offers a relevant analogy. Early programmers had to optimize code to fit within strict CPU and memory limits imposed by the hardware. Today, with substantial improvements in hardware, those constraints are far less pressing. Likewise, as context windows in LLMs expand, the need to optimize within these limits will likely fade, freeing engineers to focus on the business problem itself.