Layer 5: Knowledge & Substrate — The Agentic Economy

Layer 5 of 7 — The Agentic Economy

Knowledge & Substrate

Training data and ML engineering tools

Layers: L1 Agents, L2 Creation, L3 Platforms, L4 Models, L5 Knowledge, L6 Compute, L7 Physical

The knowledge repositories, datasets, and engineering toolchains that foundation models are built from.

Companies in this layer

Knowledge / Data Assets

arXiv, Common Crawl, Getty Images, GitHub, Reddit, Shutterstock, Stack Overflow, Wikipedia, X, YouTube

Model-Building Toolchain

Anyscale (Ray), CUDA, Inferact (vLLM), JAX, PyTorch, W&B

Layer 5: Knowledge & Substrate is the data and tooling layer of the Agentic Economy — the knowledge repositories that models learn from and the engineering tools used to build them.

Knowledge & Data Assets

Foundation models are only as good as the data they're trained on. Common Crawl provides a web-scale text corpus that many language models draw on. Wikipedia contributes curated encyclopedic knowledge. GitHub and Stack Overflow are major sources of code and programming Q&A. arXiv provides open-access scientific preprints. Reddit, X, and YouTube contribute conversational data, real-time information, and multimodal content.

Getty Images and Shutterstock are key players in the data licensing economy: both negotiate deals with AI companies for licensed visual training data, which is reshaping how creative content feeds into the agentic stack.

Model-Building Toolchain

Building foundation models requires specialized engineering infrastructure. PyTorch and JAX are the dominant ML frameworks. CUDA, NVIDIA's GPU programming platform, provides the low-level compute layer beneath them. vLLM (Inferact) is an open-source engine for high-throughput inference serving. Anyscale (Ray) handles distributed training and other cluster-scale workloads. Weights & Biases (W&B) provides experiment tracking and model management.
