How to Prevent Mangled URLs in OpenAI's LLM Chat Completions API

March 23, 2024 • Matthew Duong • AI, LLM, Startups • 2 min read

Introduction: The Task

As a founding engineer at Truewind (YC 23), an AI-powered accounting software company, I was tasked with developing a proof of concept (POC) for a public company. The project involved creating a chat assistant that leverages a collection of callable tools, with a backbone powered by an LLM akin to OpenAI's ChatGPT API. The strength of this application lies in its ability to process a user's query, like “what is my liquidity ratio in 2023,” by sequentially executing several preset queries or tools and carrying out the necessary calculation. One such tool retrieves data from a database, saves it to an Excel file, and generates a signed URL (similar to an S3 signed URL) that gives the user access to the file.

The Problem: Mangled URLs

However, I came across a recurring issue: the signed URLs came back mangled. For example, a raw input of:

https://s3.amazonaws.com/your-bucket-name/data.xlsx?AWSAccessKeyId=AKIAIOSFODNN7EXAMPLE&Expires=1715105758&Signature=%2Fv%3D1.0%2Fs%3D

would become:

https://example-s3amazonawscom/yourbucketname/dataxlsx

The URLs returned were consistently mangled in three ways: the query parameters were stripped (including the signed data crucial for accessing the file), hyphens were removed, and "example-" was prefixed to the domain. In some cases, all three happened at once.

To address this, I experimented with several strategies:

Strategy 1

My initial approach was to add a description to the tool definition, stating that the response would be a signed URL and should not be altered. However, this failed to influence the LLM's output as intended.
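For illustration, a tool definition along these lines might look like the sketch below. It uses the `tools` schema of the Chat Completions API; the tool name `export_to_excel`, its parameter, and the wording of the description are hypothetical and not taken from the original project. As noted above, an instruction like this did not stop the mangling on its own.

```python
# Hypothetical tool definition in the Chat Completions "tools" format.
# The name, parameter, and description text are illustrative assumptions.
export_tool = {
    "type": "function",
    "function": {
        "name": "export_to_excel",
        "description": (
            "Runs a preset query, saves the result to an Excel file, and "
            "returns a signed URL. The URL is signed: return it to the user "
            "verbatim and do not alter, shorten, or reformat it."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query_name": {
                    "type": "string",
                    "description": "Name of the preset query to run.",
                },
            },
            "required": ["query_name"],
        },
    },
}

# This list would be passed as the `tools` argument of a chat completion call.
tools = [export_tool]
```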

Strategy 2

Next, I turned to URL shorteners, hoping that shorter, simpler URLs would be less likely to be altered by the LLM. This method saw moderate success: the path remained intact, but the LLM still prepended "example-" to the domain and removed hyphens from domains that included them.

Strategy 3

Combining a URL shortener with post-processing became my final strategy. Recognizing the consistent pattern in how URLs were mangled (especially the prefixing of "example-" to domains), I served files from one specific domain and performed a find-and-replace on the LLM's response. This approach proved effective, finally allowing me to circumvent the issue of mangled URLs.
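A minimal sketch of such a find-and-replace step is shown below. It assumes the short links are all served from a single known domain (`go.my-files.test` here is a placeholder, not the domain used in the project) and that the only damage left after shortening is the "example-" prefix and the dropped hyphens. The function names are hypothetical.

```python
import re

# Placeholder for the one domain that serves the short links (assumption).
SHORT_DOMAIN = "go.my-files.test"


def build_mangled_domain_pattern(domain: str) -> re.Pattern:
    """Match the domain with an optional 'example-' prefix and optional hyphens."""
    # Split on hyphens so that each one can be made optional in the pattern.
    parts = [re.escape(part) for part in domain.split("-")]
    host = "-?".join(parts)
    # Require a following "/" so that only full short links are rewritten.
    return re.compile(r"(https?://)(?:example-)?" + host + r"(?=/)", re.IGNORECASE)


def repair_urls(text: str, domain: str = SHORT_DOMAIN) -> str:
    """Rewrite any mangled variant of the domain in text back to its canonical form."""
    pattern = build_mangled_domain_pattern(domain)
    return pattern.sub(lambda match: match.group(1) + domain, text)


llm_response = "Your report is ready: https://example-go.myfiles.test/abc12"
print(repair_urls(llm_response))
```

Because the pattern is anchored to one known domain, other URLs in the response are left alone. This step can only restore the domain, so it relies on the shortener to keep the signed query parameters out of the text the LLM sees.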

The URL shortener I used is a self-hosted, open-source solution called Shlink.
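As a rough sketch, shortening the signed URL before handing it to the LLM could look like the following. It assumes a Shlink instance whose REST API accepts `POST /rest/v3/short-urls` with an `X-Api-Key` header and a `longUrl` field, and returns a `shortUrl` field; check the API documentation for your Shlink version, since the path and fields may differ. The function names, base URL, and API key are placeholders.

```python
import json
import urllib.request


def build_shlink_request(long_url: str, base_url: str, api_key: str) -> urllib.request.Request:
    """Build the request that asks Shlink to create a short URL (endpoint is an assumption)."""
    payload = json.dumps({"longUrl": long_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/rest/v3/short-urls",
        data=payload,
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )


def shorten(long_url: str, base_url: str, api_key: str) -> str:
    """Send the request and return the short URL from the JSON response."""
    request = build_shlink_request(long_url, base_url, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["shortUrl"]
```

The tool would then return the result of `shorten(signed_url, ...)` instead of the signed URL itself, so the long query string never passes through the model.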

Conclusion

Next time you encounter URL mangling, a mix of URL shortening and targeted post-processing might just be the key. Look for patterns in the alterations, like unwanted prefixes or missing hyphens, and apply specific corrections to the LLM's responses. This approach worked for me, and it may help you preserve URL integrity in similar situations.

© 2023-2024 Matthew Duong