From bare metal leasing to Token Factory: How SOLOSTM is revolutionizing AI computing operations

Maximize Token output from each GPU

In November 2022, with the release of ChatGPT, the global artificial intelligence industry officially entered the era of large-scale models.

Over the past two years, large-scale intelligent computing centers have been rapidly established from China to the Middle East, and from North America to Southeast Asia. Millions of GPUs have been deployed in data centers, with trillions of yuan invested in AI infrastructure development.

However, a new issue is gradually emerging:

There is ample computing power available, but truly stable, efficient, operable, and profitable computing resources remain scarce.

Under the traditional model, intelligent computing centers primarily operate on a “bare metal leasing” basis: customers rent GPU cards, server nodes, or cabinet resources, and operators generate revenue by leasing out the hardware.

This approach proved effective in the era of cloud computing, but has increasingly revealed limitations in the age of large-scale models.

For end users, what they truly need is not the GPU itself, but the model training results and inference capabilities; for intelligent computing centers, the real value lies not in the GPU itself, but in the Tokens it ultimately generates.

The AI industry is undergoing a profound transformation:

From selling GPUs to selling Tokens; from computing power leasing to operating Token Factories.

The computing power industry is entering the era of the token economy.

If the GPU is the machine in the factory, then the Token is the product it produces.

Today’s large-model inference service is essentially a Token generation process:

  • The user submits a prompt;
  • The model begins computing;
  • The GPU performs inference;
  • Tokens are returned as the final output.

Therefore, the key indicators for measuring the operational performance of an intelligent computing center are no longer:

  • Number of GPUs;
  • Number of PFLOPS deployed;
  • Number of cabinets built.

Instead, they are:

  • Tokens generated per second (TPS);
  • Tokens generated per minute (TPM);
  • Tokens sold annually;
  • Cost per million Tokens;
  • Daily Token value generated per GPU.
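The indicators listed above are linked by simple arithmetic. The following is a minimal sketch of those relationships, assuming a steady average throughput and a single selling price. The function names and every input figure in the usage lines are hypothetical and are not taken from SOLOSTM.

```python
SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 86_400


def tokens_per_minute(tps: float) -> float:
    """Convert average Tokens per second (TPS) into Tokens per minute (TPM)."""
    return tps * SECONDS_PER_MINUTE


def tokens_per_day(tps: float) -> float:
    """Tokens produced in one day, assuming the average TPS holds all day."""
    return tps * SECONDS_PER_DAY


def cost_per_million_tokens(daily_cost: float, daily_tokens: float) -> float:
    """Operating cost attributed to each million Tokens produced."""
    if daily_tokens <= 0:
        raise ValueError("daily_tokens must be positive")
    return daily_cost / (daily_tokens / 1_000_000)


def daily_token_value_per_gpu(
    daily_tokens: float, price_per_million: float, gpu_count: int
) -> float:
    """Daily Token revenue per GPU, assuming every Token sells at one price."""
    if gpu_count <= 0:
        raise ValueError("gpu_count must be positive")
    return daily_tokens / 1_000_000 * price_per_million / gpu_count


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    daily = tokens_per_day(tps=5_000)
    print(tokens_per_minute(5_000))
    print(cost_per_million_tokens(daily_cost=2_000.0, daily_tokens=daily))
    print(daily_token_value_per_gpu(daily, price_per_million=6.0, gpu_count=100))
```

Keeping cost and value in the same unit (per million Tokens) makes it straightforward to compare the two and see whether a cluster is earning more per Token than it spends.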

Against this backdrop, leading global AI companies have begun progressively adopting Tokens as their operational and billing units.

From OpenAI and Anthropic to Google Gemini, and further to major model providers like DeepSeek and Qwen, their business models are shifting from GPU resource leasing to Token-based operations.

The token has become a new “digital commodity” in the era of AI.

The intelligent computing center is likewise evolving into a Token Factory.

New Challenges in the Era of Token Factories

When computing power centers become Token factories, new problems arise.

Take the same cluster of 1,000 GPUs:

Why can some clusters generate 300 million Tokens daily, while others produce only 100 million?

Why, with the same H100 or B200 hardware, do some companies turn a profit quickly, while others operate at a loss for a long time?

The root cause of the problem is:

The number of GPUs does not equate to token production capacity.

The factors influencing Token production extend far beyond the hardware itself.

Together, these factors determine how much Token value a single GPU can ultimately generate.

Therefore, AI computing power investors require a new business methodology.

SOLOSTM Proposes TEF: Token Efficiency Factor

To quantify Token production efficiency, SOLOSTM was the first to propose the TEF (Token Efficiency Factor), defined as:

TEF = Actual Token Output ÷ Theoretical Token Output

In simple terms, TEF measures the efficiency with which GPU resources are converted into Token output.

For instance, suppose a cluster can theoretically achieve:

700,000 TPS

but in actual operation achieves only:

210,000 TPS

Then:

TEF = 210,000 ÷ 700,000 = 30%

This means that 70% of the potential production capacity has not been converted into tangible business value.
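The TEF definition and the worked example above can be sketched in a few lines. This is only an illustration of the ratio as the article states it; the function name is hypothetical, and how the theoretical output is measured in practice is not specified in the source.

```python
def token_efficiency_factor(actual_tps: float, theoretical_tps: float) -> float:
    """TEF = actual Token output / theoretical Token output."""
    if theoretical_tps <= 0:
        raise ValueError("theoretical_tps must be positive")
    return actual_tps / theoretical_tps


if __name__ == "__main__":
    # Figures from the example in the text.
    tef = token_efficiency_factor(actual_tps=210_000, theoretical_tps=700_000)
    unconverted = 1 - tef
    print(f"TEF = {tef:.0%}, unconverted capacity = {unconverted:.0%}")
```

With the article's figures, the script reports a TEF of 30% and an unconverted capacity of 70%.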

Through extensive industry practice, SOLOSTM has found that the TEF (Token Efficiency Factor) of many intelligent computing centers ranges only between 20% and 35%.

This means that a large share of GPU resources sit idle, waiting, blocked, or operating inefficiently.

The value of SOLOSTM lies precisely in helping customers win back that lost production capacity.

Through lean scheduling, intelligent operation and maintenance, and end-to-end optimization, TEF can often be raised to over 60%.

This means not only improved performance but also increased revenue.

Some articles report that GPU efficiency can be improved to over 90%, but such figures typically refer to test results on a single server with one or more SXM GPUs. Because of the training and inference characteristics of LLMs, in a GPU cluster constrained by network and storage bandwidth, the average MFU (Model FLOPs Utilization) per GPU tends to fall as the cluster grows. For detailed reasons, refer to ByteDance’s 2024 paper “MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs,” which analyzes in depth how MFU declines as cluster size increases and how to maintain cluster stability.

From TEF to TFI: The Performance Indicator System for Token Factories

In the manufacturing industry, OEE (Overall Equipment Effectiveness) is used to measure factory efficiency.

For the Token Factory era, SOLOSTM has further proposed the TFI (Token Factory Index), calculated as:

TFI = TEF × SLA × Sell Through

where:

TEF: Token production efficiency;

SLA: service availability rate;

Sell Through: the share of available Token capacity that is actually sold.

TFI reflects the ultimate ability of a Token factory to convert theoretical production capacity into actual revenue.

For instance:

The operational efficiency improvement factors above are only estimates; actual performance depends on the specific operating conditions of the GPU cluster.
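The TFI formula given earlier (TFI = TEF × SLA × Sell Through) can be sketched as follows. This is a minimal illustration that assumes all three factors are expressed as fractions between 0 and 1; the function name and the SLA and Sell Through values in the usage lines are hypothetical, not figures from SOLOSTM.

```python
def token_factory_index(tef: float, sla: float, sell_through: float) -> float:
    """TFI = TEF x SLA x Sell Through, each factor given as a fraction in [0, 1]."""
    for name, value in (("tef", tef), ("sla", sla), ("sell_through", sell_through)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return tef * sla * sell_through


if __name__ == "__main__":
    # Hypothetical inputs: the SLA and Sell Through values are illustrative only.
    before = token_factory_index(tef=0.30, sla=0.99, sell_through=0.80)
    after = token_factory_index(tef=0.65, sla=0.99, sell_through=0.80)
    print(f"TFI before: {before:.1%}, after: {after:.1%}")
```

Because the three factors are multiplied, a weakness in any one of them caps the whole index, so a high TEF does not compensate for poor availability or unsold capacity.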

TFOM: Token Factory Operating Model

To help clients transition from GPU hardware management to Token management, SOLOSTM has further developed the TFOM (Token Factory Operating Model).

TFOM treats the intelligent computing center as a digital factory.

The core model link is:

Through TFOM, customers can see not only the number of GPUs, but also how much revenue and profit those GPUs have actually generated.

For the first time, TFOM integrates technical indicators with financial indicators, establishing a direct link between TTFT (time to first token), TPS, and TPM on one side and revenue, EBITDA, and ROI on the other.

This is the primary concern of financial investors in computing power centers: a well-run TFOM operating model not only gives investors confidence but also makes them more willing to commit additional capital.

Make 1,000 GPUs deliver the performance of 2,167 GPUs

In actual projects, HaHa frequently employs the concept of the “Equivalent GPU”: the number of GPUs at the original efficiency that would be needed to match the optimized cluster’s output.

For instance, take a cluster of 1,000 B200 GPUs whose TEF has increased from 30% to 65%. Then:

Equivalent GPUs = 1,000 × 65% ÷ 30% ≈ 2,167 GPUs

In other words, customers obtain the same Token production capacity without purchasing an additional 1,167 GPUs.

This means capital expenditure savings amounting to hundreds of millions of yuan.

SOLOSTM defines this as Avoided GPU CapEx: the additional GPU investment that no longer needs to be made.
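The Equivalent GPU and Avoided GPU CapEx arithmetic above can be sketched as follows. This is an illustration under the assumption that Token output scales in direct proportion to TEF. The function names are hypothetical, and `unit_price` is a placeholder parameter, since the article does not state a per-GPU price.

```python
def equivalent_gpus(gpu_count: int, tef_before: float, tef_after: float) -> float:
    """GPUs at the old TEF needed to match gpu_count GPUs running at the new TEF."""
    if gpu_count <= 0 or tef_before <= 0 or tef_after <= 0:
        raise ValueError("gpu_count and both TEF values must be positive")
    return gpu_count * tef_after / tef_before


def avoided_gpus(gpu_count: int, tef_before: float, tef_after: float) -> int:
    """Extra GPUs that no longer need to be purchased, rounded to the nearest unit."""
    return round(equivalent_gpus(gpu_count, tef_before, tef_after)) - gpu_count


def avoided_gpu_capex(
    gpu_count: int, tef_before: float, tef_after: float, unit_price: float
) -> float:
    """Avoided GPU CapEx, given a caller-supplied (hypothetical) price per GPU."""
    return avoided_gpus(gpu_count, tef_before, tef_after) * unit_price


if __name__ == "__main__":
    # Figures from the example in the text: 1,000 GPUs, TEF raised from 30% to 65%.
    print(round(equivalent_gpus(1_000, 0.30, 0.65)))
    print(avoided_gpus(1_000, 0.30, 0.65))
```

With the article's figures, this gives 2,167 equivalent GPUs and 1,167 avoided GPUs.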

For investors in large-scale intelligent computing centers in the era of Token operations, this not only translates into a higher ROI than competitors achieve with comparable capital, but also gives them greater flexibility in investment decisions within the global Token market. Given the current scarcity of high-end GPU servers in particular, they can time and scale their investments more precisely.

From computing power centers to Token factories

The development of the AI industry is entering a new phase.

The core of future competition is no longer who owns the most GPUs, but who can produce the most Tokens and create the most profit at the lowest cost.

This means that the operational philosophy of intelligent computing centers is shifting from an infrastructure mindset to a manufacturing mindset.

The GPU is no longer just a piece of equipment; it is a means of production.

Tokens are no longer just model outputs; they are digital products.

To this end, SOLOSTM is developing a new generation of business methodologies for the Token Factory era through:

  • TEF (Token Efficiency Factor) ← The optimization capability of SOLOSTM
  • TFI (Token Factory Index) ← The operational capability of intelligent computing centers
  • TFOM (Token Factory Operating Model) ← The operational capability of a Token economy

The goals are to maximize Token output from every GPU, to let every Token generate real returns, and to drive the global intelligent computing industry from the “era of bulk computing power” to the “era of refined Token management”.

The above content is sourced from SOLOSTM – AI Engineering Research Center.

All illustrations were generated by Gemini.

TechBullion

FinTech News and Information

Copyright © 2026 TechBullion. All Rights Reserved.
