AI & Engineering · 5 min read

The ROI of an AI Token Factory

Why owning an on-premise AI Token Factory enables 7.7x more token generation than public APIs, cutting AI costs by 87%

Own your inference

Heidi Health approached Maincode with a pressing question: how could it scale its use of public AI APIs without sacrificing profit? Heidi Health could see the dilemma unfolding; unless it found an alternative, the cost of ramping up token consumption would rapidly erode any gains from growth.

Maincode partnered with Heidi Health to benchmark alternatives to the unsustainable economics of cloud-based AI inference. We found that by transitioning from public APIs to Maincode’s AI Token Factory - an on-premise architecture powered by Maincode MC-X software on AMD MI355X GPUs - Heidi Health would see inference costs fall by 87% while expanding AI coverage to 100% of its daily data load. As a result, for the same cost, Heidi Health could produce 7.7x more tokens than with cloud API inference, enabling rapid experimentation with sustainable economies of scale.

The Heidi Health Story

Founded in Melbourne, Heidi Health was born out of a desire to eliminate the administrative burden that leads to clinician burnout. The team recognised that while medicine had advanced, the administrative workflow had not, forcing doctors to spend up to half their day on paperwork rather than patient care.

Heidi Health’s flagship product is an AI-powered medical scribe. It listens to consults, transcribes the dialogue, and autonomously generates compliant medical notes, referral letters, and documents. It distinguishes itself with "Heidi for Clinicians" - a tool that doesn't just transcribe, but leverages AI to reason over the data and provide clinical context. Heidi Health has experienced explosive growth, raising significant capital (including a recent Series B of $100m AUD) to fuel global expansion. The platform is now active in over 50 countries, with major adoption in Australia, the UK (NHS partnerships), and the US. Today, Heidi Health supports millions of patient consultations per week, positioning it as one of the fastest-growing generative AI companies in healthcare globally.

Challenge: The API Cost Barrier

As Heidi Health scaled to processing 500,000 doctors' notes per day, the unit economics of public AI APIs were fast becoming unsustainable.

Heidi Health needed to explore alternative solutions that would allow them to run high-reasoning AI models on every consult without incurring prohibitive ongoing costs.

Solution: An On-Premise AI Token Factory

An on-premise AI Token Factory means full ownership and operational control of the hardware and software stack used to run AI models. This shift moves model inference from an OpEx (operating expense) rental model to a CapEx (capital expense) ownership model, dramatically lowering the cost per token in the process.

We compared three deployment methods for a representative workload of 8,000 input tokens and 1,000 output tokens per request, simulating a typical example of summarisation or reasoning over a multi-page transcript:

  1. Public AI APIs (Azure, AWS, Fireworks, Mistral Platform)
  2. NVIDIA DGX (On-premise, powered by NVIDIA B200 GPUs)
  3. Maincode MC-X Software (On-premise, powered by AMD MI355X GPUs)
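To put the benchmark workload in perspective, a quick back-of-the-envelope sketch combining the per-request figures above with Heidi Health's stated daily volume (all figures from the text):

```python
# Rough daily token volume for the representative workload:
# 8,000 input + 1,000 output tokens per request, at Heidi Health's
# stated volume of 500,000 notes per day.
INPUT_TOKENS = 8_000
OUTPUT_TOKENS = 1_000
REQUESTS_PER_DAY = 500_000

daily_tokens = (INPUT_TOKENS + OUTPUT_TOKENS) * REQUESTS_PER_DAY
print(f"{daily_tokens / 1e9:.1f}B tokens/day")  # 4.5B tokens/day
```

At this scale, even small differences in cost per million tokens compound into large monthly sums, which is what motivates the comparison below.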

Result: Cost Per Million Tokens Reduced by Up to 87%

The resulting numbers paint a clear picture: cost per token drops by up to 87% using an on-premise AI Token Factory. This calculation amortises the on-premise solution over three years and includes datacenter costs to ensure a realistic comparison.
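The headline 7.7x figure follows directly from the 87% cost reduction: at a fixed budget, token output scales with the inverse of the remaining cost fraction.

```python
# How an 87% lower cost per token translates into the 7.7x figure:
# at the same spend, token output scales by 1 / (1 - reduction).
cost_reduction = 0.87
token_multiplier = 1 / (1 - cost_reduction)
print(f"{token_multiplier:.1f}x more tokens for the same budget")  # 7.7x
```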

In addition, we compared the NVIDIA DGX cluster against our on-premise MC-X software layer running on AMD GPUs. The two solutions perform on par for dense models (e.g. Mistral Large). However, for Mixture of Experts (MoE) models like DeepSeek R1 and GPT-OSS, MC-X reduces cost by up to a further 44% compared to the NVIDIA DGX cluster. In other words, the MC-X cluster produces almost twice as many tokens for the same cost as NVIDIA DGX.

The ROI of an AI Token Factory

Running AI workloads on the Maincode MC-X solution fundamentally transforms Heidi Health’s unit economics. When deploying complex, reasoning-heavy workloads, a single MC-X cluster unlocks the capacity to process 100,000 multi-page documents per day.

This shift delivers immediate financial impact: monthly operating costs for this particular workload plummet from $77,254 AUD (via public API) to just $26,706 AUD in amortised hardware costs on MC-X, allowing the hardware investment to fully pay for itself within just 9 months.
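The payback arithmetic can be sketched from the monthly figures above. Note that the article does not state the upfront cluster cost, so `CLUSTER_CAPEX_AUD` below is a hypothetical placeholder chosen only to illustrate the formula (and happens to be consistent with the stated 9-month payback):

```python
# Payback sketch using the monthly cost figures from the text.
API_MONTHLY_AUD = 77_254   # public API cost for this workload
MCX_MONTHLY_AUD = 26_706   # amortised MC-X cost for this workload
monthly_savings = API_MONTHLY_AUD - MCX_MONTHLY_AUD  # 50,548 AUD/month

# Hypothetical upfront cost, for illustration only (not from the article).
CLUSTER_CAPEX_AUD = 450_000
payback_months = CLUSTER_CAPEX_AUD / monthly_savings
print(f"Savings: ${monthly_savings:,} AUD/month, payback ~{payback_months:.1f} months")
```

Any capex figure divided by the ~$50,548 AUD monthly savings gives the break-even point; under the placeholder above it lands just under 9 months.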

The efficiency gains are even more pronounced for less reasoning-heavy generative tasks. For lighter workloads, such as those running the GPT-OSS model, MC-X can process the entirety of Heidi Health’s daily volume (500,000 multi-page documents) at only 17.8% system utilisation.

Crucially, this creates a massive "Free Capacity" advantage. Because the hardware is owned rather than rented, the remaining ~82% of compute power is available at no marginal cost. This surplus allows Heidi Health to experiment with new agents, fine-tune models, run deep background analytics, or scale to 2.5 million daily notes without spending an extra dollar on inference. Faster and cheaper experimentation is ultimately a key success driver for businesses exploring AI automation that actually makes a difference to the bottom line.
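The headroom claim follows from the utilisation figure. A minimal sketch, using only the 500,000-document volume and 17.8% utilisation stated above:

```python
# Free-capacity sketch: if 500,000 documents/day consumes 17.8% of the
# cluster, the remaining capacity is available at no marginal cost.
DOCS_PER_DAY = 500_000
UTILISATION = 0.178

idle_fraction = 1 - UTILISATION          # ~82% of the cluster sits idle
ceiling_docs = DOCS_PER_DAY / UTILISATION  # theoretical daily maximum
print(f"Idle capacity: {idle_fraction:.0%}, ceiling ~{ceiling_docs / 1e6:.1f}M docs/day")
```

The theoretical ceiling works out to roughly 2.8 million documents per day, so the article's 2.5 million figure sits comfortably inside it.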

Beyond Hardware: The MC-X OS Advantage

MC-X is a turnkey software layer designed for high-throughput AI factories.

Deploying Maincode’s MC-X powered token factory breaks the unsustainable relationship between user growth and operating costs for Heidi Health. With a single MC-X cluster, Heidi Health can move from processing 1% of its daily data load to 100% of it, while reducing its core inference cost by 87%. This means 7.7 times more token output, and thus experimentation, at the same cost for the business.

For high-volume AI companies, this case study proves that owning a token factory can dramatically expand an organisation’s capacity for automation, experimentation and innovation.
