Beyond Prompt Engineering: How CTOs Can Optimize LLMs For Maximum Impact

Rushil Nagarsheth is a serial entrepreneur and Co-Founder/CTO of Hypercard, an AI-powered expense dashboard and credit card for businesses.

As the CTO of Hypercard, I've learned firsthand that truly optimizing large language models (LLMs) goes far beyond basic prompt engineering. Deeply integrating LLMs into your business means tackling performance at multiple levels—from token efficiency to infrastructure resilience.
Here are some lessons we've learned along the way that have significantly enhanced the performance of our AI agents.

The first is token optimization. Tokens directly affect both cost and response speed, yet many engineering teams overlook them, which makes this one of the quickest ways to see performance gains.
Initially, we used lengthy prompts filled with extensive examples, believing this would ensure accuracy. However, we quickly realized that shorter, more direct prompts significantly reduced latency and hallucinations. We implemented token-trimming techniques, systematically removing redundant or irrelevant prompt data.
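As a rough sketch of this kind of trimming (the 4-characters-per-token estimate is a crude stand-in for a real tokenizer, and every name here is illustrative rather than production code):

```python
# Illustrative prompt-trimming sketch. A real system would count tokens with
# the model's own tokenizer; ~4 characters/token is a rough English estimate.

def approx_tokens(text: str) -> int:
    """Rough token estimate (about 4 characters per token for English)."""
    return max(1, len(text) // 4)

def trim_prompt(instructions: str, examples: list[str], budget: int) -> str:
    """Keep the core instructions; add few-shot examples only while the
    running token estimate stays within budget."""
    kept = []
    used = approx_tokens(instructions)
    for example in examples:
        cost = approx_tokens(example)
        if used + cost > budget:
            break  # stop before the prompt exceeds its token budget
        kept.append(example)
        used += cost
    return "\n\n".join([instructions, *kept])

# Redundant examples beyond the budget are silently dropped.
prompt = trim_prompt(
    "Classify the expense category for the transaction below.",
    ["Example: 'UBER TRIP 123' -> Travel"] * 20,
    budget=60,
)
```

In practice you would also rank examples by usefulness before trimming, so the least informative ones are dropped first.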
This led to substantial cost savings, faster response times and more accurate outputs. Always ask: "Can the prompt be shorter without sacrificing quality?" Our iterative testing consistently proved that concise, precise prompts outperform their verbose counterparts.

As a B2B SaaS provider serving enterprise customers, we treat uptime as critical.
To ensure near-100% availability, we've adopted a strategy of integrating multiple fallback models powered by a unified API platform. If our primary LLM encounters downtime or latency issues, our system automatically and seamlessly switches to alternative models available through the platform. This redundancy is essential for consistently meeting enterprise SLAs and maintaining customer trust.
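A minimal sketch of that fallback pattern, assuming a hypothetical `call_model` client in place of whatever unified-API platform is actually used (model names and error types here are invented):

```python
# Hedged sketch of model fallback. `call_model` stands in for a real
# unified-API client; this version simulates primary-model downtime.

class ModelUnavailable(Exception):
    """Raised when a model times out or returns a server-side error."""

def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real API call; only the backup succeeds here."""
    if model == "backup-model":
        return f"[{model}] ok"
    raise ModelUnavailable(model)

def complete_with_fallback(prompt: str, models: list[str]) -> str:
    """Try each model in priority order, falling through on failure."""
    failures = []
    for model in models:
        try:
            return call_model(model, prompt)
        except ModelUnavailable as exc:
            failures.append(str(exc))  # record the outage, try the next model
    raise RuntimeError(f"All models failed: {failures}")

# The primary fails in this sketch, so the call transparently falls back.
result = complete_with_fallback("Summarize Q3 spend.", ["primary-model", "backup-model"])
```

The key design choice is that callers never see which model answered; the routing layer absorbs outages so SLAs are met upstream.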
The platform also makes it easy for us to swap in different models for different tasks. Staying nimble and regularly evaluating emerging models has allowed us to continuously leverage improvements in model performance and capabilities.

Early on, we also realized that LLMs handle structured data far better than unstructured text.
By shifting from a plain-text context to structured JSON, our models interpreted prompts more accurately, drastically reducing errors and improving consistency. This transition significantly elevated our internal workflows and user satisfaction by providing clearer and more precise responses.

We also built an internal suite of comprehensive test cases and tools designed specifically to measure and compare prompt performance iteratively.
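The plain-text-to-JSON shift described above can be sketched like this; the schema and field names are invented for illustration, not our actual format:

```python
import json

# Sketch of replacing free-form prose context with structured JSON.
# The schema below is hypothetical.

def build_context(merchant: str, amount_cents: int, category: str) -> str:
    """Serialize transaction context as JSON instead of free-form prose."""
    payload = {
        "transaction": {
            "merchant": merchant,
            "amount_cents": amount_cents,  # integer cents avoid float ambiguity
            "category": category,
        }
    }
    return json.dumps(payload, indent=2)

# Before: "The user spent $42.50 at Acme Tools on supplies."
# After: an unambiguous structure the model can interpret consistently.
context = build_context("Acme Tools", 4250, "supplies")
```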
By systematically analyzing false positives and negatives, we could clearly identify improvements or regressions after each prompt adjustment.

Given our domain—expense management—accuracy is crucial. Our AI prioritizes avoiding false positives (incorrect approvals or payments) even if it means accepting more false negatives, which is essential for financial integrity and user trust.
A critical insight we've gained at Hypercard is the necessity of intentional AI bias, especially in sensitive workflows involving financial transactions; we purposefully engineered our prompts around exactly this trade-off. Every CTO must understand and explicitly define these trade-offs early, ensuring your AI aligns precisely with your business's strategic priorities.
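A deliberately simplified version of such an evaluation harness, with a made-up confidence threshold standing in for the real prompt-level bias:

```python
# Simplified approve/deny evaluation harness. The threshold and sample data
# are invented; in the approach described above, the bias lives in the prompts.

def evaluate(predictions: list[bool], labels: list[bool]) -> dict[str, int]:
    """Count the two error types separately, since they are not equally costly."""
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    return {"false_positives": fp, "false_negatives": fn}

def approve(confidence: float, threshold: float = 0.9) -> bool:
    """Bias against false positives: approve only at high confidence."""
    return confidence >= threshold

# A high threshold turns borderline cases into denials (false negatives)
# rather than risky approvals (false positives).
scores = [0.95, 0.80, 0.40, 0.92]
labels = [True, True, False, True]
report = evaluate([approve(s) for s in scores], labels)
```

Running each prompt revision through a fixed labeled set like this makes regressions visible immediately, and tracking the two error counts separately enforces the asymmetry the business requires.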
For CTOs serious about leveraging LLMs, prompt engineering is merely the first step. Real optimization involves concise token management, strategic model redundancy, structured data integration, rigorous iterative testing and clearly defined biases that match your business objectives. At Hypercard, these principles turned generative AI into a competitive advantage, transforming our technology into a strategic asset that delivers measurable, ongoing business value.
It took us several months to perfect this process, but today, we handle over a million agentic workflows in production. Continuous learning from real-world client interactions has been invaluable, allowing us to refine our strategies based on actual usage patterns and outcomes.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.