Layer 5: Knowledge & Substrate — The Agentic Economy

Layer 5 of 7 — The Agentic Economy
Knowledge & Substrate
Training data and ML engineering tools
L1 Agents · L2 Creation · L3 Platforms · L4 Models · L5 Knowledge · L6 Compute · L7 Physical
The knowledge repositories, datasets, and engineering toolchains that foundation models are built from.
16 companies · 2 subcategories
Data Assets → Toolchain
Companies in this layer
Knowledge / Data Assets
arXiv, Common Crawl, Getty Images, GitHub, Reddit, Shutterstock, Stack Overflow, Wikipedia, X, YouTube
Model-Building Toolchain
Anyscale (Ray), CUDA, Inferact (vLLM), JAX, PyTorch, W&B

Layer 5: Knowledge & Substrate is the data and tooling layer of the Agentic Economy — the knowledge repositories that models learn from and the engineering tools used to build them.

Knowledge & Data Assets

Foundation models are only as good as the data they're trained on. Common Crawl provides the web-scale text corpus that many language models draw on for pretraining. Wikipedia contributes structured factual knowledge. GitHub and Stack Overflow are primary sources of code and programming Q&A. arXiv provides scientific literature. Reddit, X, and YouTube contribute conversational data, real-time information, and multimodal content, respectively.

Getty Images and Shutterstock are key players in the data licensing economy — negotiating deals with AI companies for licensed visual training data, reshaping how creative content feeds into the agentic stack.

Model-Building Toolchain

Building foundation models requires specialized engineering infrastructure. PyTorch and JAX are the dominant ML frameworks. CUDA provides the programming layer for NVIDIA GPUs. vLLM (Inferact) optimizes inference serving. Ray (Anyscale) distributes training and other workloads across clusters. Weights & Biases (W&B) provides experiment tracking and model management.
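
For readers who want to work with this layer programmatically, the two subcategories and their companies can be written down as a small lookup table. The following is a minimal sketch in Python, not an official schema: the names `LAYER_5`, `company_count`, and `role_of` are hypothetical, and the one-line role attached to each company is a paraphrase of the descriptions on this page (the roles for Reddit, X, and YouTube assume the page lists them in matching order).

```python
# Illustrative only: a plain-Python encoding of Layer 5 as described on this page.
# LAYER_5, company_count and role_of are hypothetical names, not part of any API.

LAYER_5 = {
    "Knowledge / Data Assets": {
        "arXiv": "scientific literature",
        "Common Crawl": "web-scale text corpus",
        "Getty Images": "licensed visual training data",
        "GitHub": "code",
        "Reddit": "conversational data",        # assumed from list order in the text
        "Shutterstock": "licensed visual training data",
        "Stack Overflow": "code",
        "Wikipedia": "structured factual knowledge",
        "X": "real-time information",           # assumed from list order in the text
        "YouTube": "multimodal content",        # assumed from list order in the text
    },
    "Model-Building Toolchain": {
        "Anyscale (Ray)": "distributed training",
        "CUDA": "GPU programming layer",
        "Inferact (vLLM)": "inference serving",
        "JAX": "ML framework",
        "PyTorch": "ML framework",
        "W&B": "experiment tracking and model management",
    },
}


def company_count(layer):
    """Total number of companies across all subcategories."""
    return sum(len(companies) for companies in layer.values())


def role_of(layer, company):
    """Return (subcategory, role) for a company, or raise KeyError if absent."""
    for subcategory, companies in layer.items():
        if company in companies:
            return subcategory, companies[company]
    raise KeyError(company)


if __name__ == "__main__":
    print(len(LAYER_5), "subcategories,", company_count(LAYER_5), "companies")
    print(role_of(LAYER_5, "PyTorch"))
```

Keeping the role text next to each company name makes it easy to check the table against the page's stated totals of 16 companies in 2 subcategories.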