Tokenization
Tokenization has two distinct but converging meanings in modern technology: in AI, breaking text into sub-word units that language models process; in finance and blockchain, representing real-world assets as digital tokens on a distributed ledger.
In AI, tokenization is the first step in how language models process text. A tokenizer splits input into tokens, typically sub-word fragments, which the model treats as discrete units. GPT-4's tokenizer draws on a vocabulary of roughly 100,000 tokens covering most written languages. Token count directly determines both processing cost (API pricing is typically quoted per million tokens) and context-window consumption. As AI inference costs have dropped 92% in three years, from $30 to $0.10-2.50 per million tokens, the economics of tokenization have become central to AI business models.
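To make the mechanics concrete, here is a toy sketch of sub-word tokenization and per-token pricing. The greedy longest-match strategy and the hand-picked vocabulary are illustrative assumptions only; production tokenizers such as GPT-4's use byte-pair encoding over a vocabulary of roughly 100,000 entries.

```python
# Toy greedy longest-match sub-word tokenizer. The vocabulary below is a
# hand-picked illustration, not a real model vocabulary.
TOY_VOCAB = {"token", "tok", "iz", "ation", "a", "t", "i", "o", "n"}

def tokenize(text: str, vocab=TOY_VOCAB) -> list[str]:
    """Split text into the longest matching vocabulary entries, left to right."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):   # try the longest candidate first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])          # unknown character becomes its own token
            i += 1
    return tokens

def cost_usd(n_tokens: int, price_per_million: float) -> float:
    """Per-token pricing: cost scales linearly with token count."""
    return n_tokens * price_per_million / 1_000_000

print(tokenize("tokenization"))   # ['token', 'iz', 'ation']
print(cost_usd(1_000, 2.50))      # 0.0025
```

The example shows why token count, not character count, is the unit of cost: "tokenization" is 12 characters but only 3 tokens, and at $2.50 per million tokens a 1,000-token request costs a quarter of a cent.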
In finance, tokenization represents the conversion of real-world assets (real estate, securities, commodities, intellectual property) into digital tokens on blockchain networks. The real-world asset (RWA) tokenization market has crossed $15 billion and is growing rapidly. BlackRock's BUIDL fund tokenized US Treasury holdings on Ethereum. JPMorgan's Onyx platform processes billions in tokenized transactions daily. The Boston Consulting Group projects the tokenized asset market could reach $16 trillion by 2030.
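The core idea of asset tokenization, dividing one asset into fungible, transferable units of ownership, can be sketched as a minimal ledger. All names and numbers below are hypothetical, and real RWA platforms add custody, compliance, and on-chain settlement layers omitted here.

```python
# Illustrative sketch of asset tokenization as fractional ownership on a
# ledger. Hypothetical example only; not any real platform's data model.
from dataclasses import dataclass, field

@dataclass
class TokenizedAsset:
    name: str
    total_value_usd: float
    total_supply: int                              # number of tokens minted
    holdings: dict = field(default_factory=dict)   # holder -> token balance

    def token_price(self) -> float:
        return self.total_value_usd / self.total_supply

    def issue(self, holder: str, amount: int) -> None:
        if sum(self.holdings.values()) + amount > self.total_supply:
            raise ValueError("cannot issue more than total supply")
        self.holdings[holder] = self.holdings.get(holder, 0) + amount

    def transfer(self, sender: str, receiver: str, amount: int) -> None:
        if self.holdings.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.holdings[sender] -= amount
        self.holdings[receiver] = self.holdings.get(receiver, 0) + amount

# A hypothetical $10M property split into 1M tokens of $10 each:
asset = TokenizedAsset("123 Main St", 10_000_000, 1_000_000)
asset.issue("alice", 50_000)
asset.transfer("alice", "bob", 20_000)
print(asset.token_price())   # 10.0
print(asset.holdings)        # {'alice': 30000, 'bob': 20000}
```

Once ownership is represented this way, transfers become ledger updates rather than legal paperwork, which is what makes tokenized assets programmable and fractional.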
The convergence is subtle but significant. Both forms of tokenization are about making complex things composable: AI tokenization makes language computable; financial tokenization makes assets programmable. Both enable composability at their respective layers—AI tokens compose into reasoning, financial tokens compose into complex instruments. In the agentic web, these converge: AI agents that understand language tokens can also transact in financial tokens, creating unified digital systems that reason and trade.