Groq vs Tenstorrent
Groq and Tenstorrent represent two radically different bets on the future of AI silicon beyond NVIDIA's GPU monopoly. Groq built a deterministic Language Processing Unit (LPU) optimized purely for inference speed, a bet so compelling that NVIDIA acquired the company for $20 billion in December 2025. Tenstorrent, led by legendary chip architect Jim Keller, is building an open RISC-V-based architecture that spans training and inference while offering IP licensing and chiplet-level customization. One prioritizes raw token throughput; the other prioritizes architectural openness and cost efficiency. Together, they illustrate the branching paths of post-GPU AI compute.
Feature Comparison
| Dimension | Groq | Tenstorrent |
|---|---|---|
| Architecture | Language Processing Unit (LPU) with deterministic static scheduling and SRAM-only memory | RISC-V mesh-based Tensix cores with conditional execution and GDDR6 memory |
| Primary Focus | Inference-only, optimized for maximum token throughput | Training and inference, with emphasis on flexibility and openness |
| Current Flagship | Groq 3 LPU (shipping Q3 2026 under NVIDIA), 150 TB/s bandwidth, 1,500 tokens/sec target | Blackhole (6nm TSMC), 120 Tensix++ cores, 774 TFLOPS FP8 |
| Memory Architecture | 230MB on-die SRAM, no HBM—delivers up to 80 TB/s on-die bandwidth | GDDR6 DRAM (no HBM), distributed software architecture for memory management |
| Ownership | Acquired by NVIDIA (Dec 2025) for ~$20B; now NVIDIA's Real-Time Inference division | Independent startup; $1.18B total funding, ~$2.6B+ valuation |
| Leadership | Co-founder Jonathan Ross now leads NVIDIA inference division | CEO Jim Keller (former AMD Zen, Apple A-series, Tesla HW3 architect) |
| Business Model | Cloud inference API (pre-acquisition); NVIDIA hardware product (post-acquisition) | IP licensing, chiplet sales, dev kits ($12K+), complete systems |
| Open vs Proprietary | Proprietary architecture, now within NVIDIA's closed ecosystem | Open RISC-V ISA, open-source software stack (TT-Metalium) |
| Manufacturing | Groq 3 LPU fabricated via NVIDIA's supply chain | Samsung Foundry SF4X (cost-optimized), TSMC 6nm for Blackhole |
| Cost Strategy | Premium performance-per-watt: 35x throughput/MW vs Blackwell NVL72 | Deep cost undercut: no HBM, cheap process nodes, targeting 60%+ gross margins |
| Software Ecosystem | Integrated into NVIDIA CUDA/TensorRT ecosystem post-acquisition | TT-Metalium open-source SDK, growing community but early-stage tooling |
| Key Metric | 276–1,665 tokens/sec on Llama 70B (standard to speculative decoding) | 774 TFLOPS FP8 raw compute; emphasis on cost-per-inference over raw speed |
Detailed Analysis
Architectural Philosophy: Determinism vs Flexibility
Groq's LPU feeds tokens through a single wide pipeline of functional units executing in lock-step—no kernel switching, no cache misses, every clock cycle doing useful work. This deterministic approach eliminates the scheduling overhead that plagues GPU and TPU architectures, delivering predictable latency that matters enormously for agentic AI applications where multiple LLM calls chain within a single interaction. Tenstorrent's Tensix cores take the opposite approach: a mesh-based architecture with conditional execution that can skip unnecessary computation dynamically. This flexibility lets the same hardware handle training and inference workloads, and the RISC-V instruction set means the architecture can be extended and customized by licensees without proprietary lock-in.
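To make the point about chained LLM calls concrete, the sketch below estimates end-to-end decode time for a sequence of dependent calls at the throughput figures quoted in the comparison table. It is a rough model under stated assumptions, not a benchmark: the function name, the five-call chain, and the 300 tokens per call are hypothetical, and prompt processing, network time, and tool execution are ignored.

```python
def chain_latency_seconds(num_calls, tokens_per_call, tokens_per_sec, overhead_s=0.0):
    """Wall-clock time for sequential LLM calls, counting decode time only.

    Each call must finish before the next starts, so per-call times add up.
    overhead_s is an optional fixed per-call cost (hypothetical parameter).
    """
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    per_call = tokens_per_call / tokens_per_sec + overhead_s
    return num_calls * per_call


# Rates taken from the article: 276 and 1,665 tokens/sec on Llama 70B,
# plus the 1,500 tokens/sec Groq 3 target. Chain shape is an assumption.
for rate in (276, 1500, 1665):
    total = chain_latency_seconds(num_calls=5, tokens_per_call=300, tokens_per_sec=rate)
    print(f"{rate:>5} tok/s -> {total:.2f} s for 5 chained calls")
```

Because the calls are sequential, any per-call latency variance also accumulates across the chain, which is why predictable per-call timing matters as much as the average rate.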
The NVIDIA Factor
NVIDIA's $20 billion acquisition of Groq in December 2025 fundamentally changed the competitive landscape. The Groq 3 LPU, unveiled at GTC 2026, claims 35x higher throughput per megawatt than NVIDIA's own Blackwell NVL72 for trillion-parameter models—NVIDIA essentially acquired the technology that would obsolete its own inference approach. Jonathan Ross and Groq's senior engineering team now lead NVIDIA's Real-Time Inference division. For Tenstorrent, this consolidation is a double-edged sword: it removes an independent competitor but also validates the thesis that specialized inference hardware is critical. Jim Keller's strategy of building the anti-NVIDIA—open architecture, no HBM dependency, IP licensing—becomes more differentiated as NVIDIA absorbs Groq's technology into its walled garden.
Memory and Cost Architecture
Both companies reject HBM (High Bandwidth Memory), but for different reasons. Groq uses only on-die SRAM to achieve extreme bandwidth (150 TB/s on Groq 3) at the cost of limited capacity: the architecture works well for inference, but the small per-chip memory means large models must be spread across multi-chip configurations. Tenstorrent uses commodity GDDR6 DRAM paired with a distributed software memory architecture, sacrificing peak bandwidth for dramatically lower bill-of-materials cost. Manufacturing on TSMC 6nm and Samsung's SF4X rather than leading-edge nodes further reduces Tenstorrent's cost basis. If Groq 3 represents the performance ceiling of AI inference silicon, Tenstorrent's Blackhole represents the cost floor, and in an inference economy where margins matter, both positions are viable.
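The capacity trade-off of SRAM-only memory can be illustrated with simple arithmetic. The sketch below counts how many chips are needed just to hold a model's weights, using the 230 MB on-die SRAM figure from the comparison table and a 70B-parameter model. The assumptions are loud ones: MB is taken as 10^6 bytes, only weights are counted (no KV cache or activations), the table does not say which LPU generation the 230 MB figure describes, and the function name is hypothetical.

```python
import math


def chips_to_hold_weights(num_params, bytes_per_param, sram_bytes_per_chip):
    """Minimum chip count whose combined SRAM fits the model weights alone."""
    if sram_bytes_per_chip <= 0:
        raise ValueError("sram_bytes_per_chip must be positive")
    total_weight_bytes = num_params * bytes_per_param
    return math.ceil(total_weight_bytes / sram_bytes_per_chip)


SRAM_PER_CHIP = 230 * 10**6   # 230 MB per chip, from the article's table
PARAMS_70B = 70 * 10**9       # a 70B-parameter model such as Llama 70B

# Assumed precisions: 1 byte per parameter (8-bit) and 2 bytes (16-bit).
for label, width in (("8-bit", 1), ("16-bit", 2)):
    chips = chips_to_hold_weights(PARAMS_70B, width, SRAM_PER_CHIP)
    print(f"{label} weights: at least {chips} chips")
```

A real deployment would need more than this lower bound, but the order of magnitude shows why SRAM-only designs are inherently multi-chip for large models, while a DRAM-backed design trades bandwidth for far more capacity per device.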
Software Ecosystem Maturity
Post-acquisition, Groq benefits from NVIDIA's CUDA ecosystem—the largest and most mature GPU software stack in the world. Developers already using TensorRT for inference can potentially target Groq 3 LPUs with minimal code changes. Tenstorrent's TT-Metalium is open-source and growing, but remains early-stage compared to CUDA's decade-plus head start. Jim Keller has acknowledged this gap, noting that Tenstorrent needs 18–36 months of software stack maturation to achieve meaningful market traction. The open-source approach could ultimately be an advantage—much as Linux eventually displaced proprietary Unix—but the near-term developer experience favors NVIDIA/Groq.
Target Markets and Deployment Models
Groq (now under NVIDIA) targets hyperscale cloud providers and enterprises demanding the absolute fastest inference for real-time agentic web applications—the 1,500 tokens-per-second target enables multi-agent systems communicating in real time. Tenstorrent targets a broader market including edge deployment, sovereign AI initiatives, and organizations that want to customize their AI silicon without NVIDIA dependency. Tenstorrent's developer workstations starting at $12,000 and IP licensing model enable a long tail of hardware customization that Groq/NVIDIA's vertically integrated approach cannot serve. The Taiwan office expansion signals Tenstorrent's ambition to embed in the global semiconductor supply chain.
The Open Architecture Bet
Tenstorrent's use of RISC-V aligns with a broader industry shift toward open-source foundations in AI infrastructure. Just as open-source models from Meta and Mistral challenged proprietary LLMs, open-source hardware architectures challenge NVIDIA's proprietary GPU stack. Tenstorrent's licensing model—selling IP and chiplets that others can integrate—mirrors ARM's approach to mobile computing. If RISC-V AI accelerators achieve even a fraction of ARM's mobile success, Tenstorrent's early positioning becomes enormously valuable. Groq's absorption into NVIDIA forecloses this path entirely: its technology is now proprietary NVIDIA IP, accessible only through NVIDIA's product stack and pricing.
Best For
Real-Time Multi-Agent Systems
Groq (NVIDIA): Groq 3's 1,500 tokens/sec target and deterministic latency make it unmatched for applications where multiple AI agents must communicate in real time. The lock-step execution eliminates latency spikes that break agentic workflows.
Cost-Optimized Inference at Scale
Tenstorrent: Tenstorrent's no-HBM GDDR6 architecture and cheap manufacturing process enable dramatically lower cost-per-inference. For high-volume workloads where cost matters more than peak latency, Blackhole offers compelling economics.
Edge and On-Premise AI Deployment
Tenstorrent: Tenstorrent's developer workstations, chiplet sales model, and customizable RISC-V architecture serve edge deployment scenarios where NVIDIA's hyperscale-focused Groq 3 LPU is impractical or unavailable.
Enterprise Cloud Inference
Groq (NVIDIA): NVIDIA's ecosystem integration means Groq 3 will be available through major cloud providers with mature tooling, support contracts, and CUDA compatibility, which is exactly what enterprise procurement requires.
Sovereign AI and Export-Restricted Markets
Tenstorrent: Nations building domestic AI capability without NVIDIA dependency can license Tenstorrent's RISC-V IP and manufacture locally. The open architecture may reduce exposure to the US export control chokepoints that affect NVIDIA products.
Custom AI Silicon (IP Licensing)
Tenstorrent: Companies wanting to build custom AI chips can license Tenstorrent's Tensix cores and RISC-V IP. Groq's technology is now locked inside NVIDIA with no licensing path for third parties.
Latency-Sensitive Consumer AI Products
Groq (NVIDIA): Products like AI coding assistants, real-time translation, and conversational interfaces benefit most from Groq's sub-second response times and consistent latency profile.
AI Training Workloads
Tenstorrent: Groq is inference-only by design. Tenstorrent's architecture handles both training and inference; a 192-chip Blackhole training cluster is already operational, with larger clusters planned.
The Bottom Line
Groq and Tenstorrent no longer compete directly—they represent different futures for AI silicon. Groq, now inside NVIDIA, will define the performance ceiling for inference: the Groq 3 LPU shipping Q3 2026 promises to be the fastest inference chip ever built, backed by NVIDIA's manufacturing scale and software ecosystem. Tenstorrent, independent under Jim Keller, is building the open alternative: cheaper to manufacture, customizable via IP licensing, and free from NVIDIA's proprietary ecosystem. If you need maximum inference speed and operate within NVIDIA's ecosystem, Groq 3 is the clear choice. If you need cost efficiency, architectural customization, sovereignty from NVIDIA, or the ability to handle both training and inference, Tenstorrent offers a credible and increasingly mature path. The AI compute market is large enough for both approaches—and the industry is healthier for having them.