©AllTopicsToday 2026. All Rights Reserved.

StreamTensor: A PyTorch-to-Accelerator Compiler that Streams LLM Intermediates Across FPGA Dataflows

AllTopicsToday
Last updated: October 6, 2025 9:39 am
Published: October 6, 2025

Why treat LLM inference as a series of batched kernels that round-trip through DRAM when a dataflow compiler can pipe tiles through on-chip FIFOs and stream converters? StreamTensor is a compiler that lowers PyTorch LLM graphs (GPT-2, Llama, Qwen, Gemma) into stream-scheduled dataflow accelerators on AMD's Alveo U55C FPGA accelerator card. The system introduces an iterative tensor ("itensor") type that encodes the tiling and order of streams, which enables safe inter-kernel streaming and the automated insertion and sizing of DMA engines, FIFOs, and layout converters. On LLM decoding workloads, the research team reports latency as low as 0.64× that of a GPU baseline and up to 1.99× higher energy efficiency.

https://arxiv.org/pdf/2509.13694

What is StreamTensor?

StreamTensor compiles PyTorch graphs into a stream-oriented dataflow design in which intermediate tiles largely avoid off-chip DRAM round trips through on-chip streaming and fusion. DMAs are inserted only when necessary; otherwise tiles are forwarded to downstream kernels through on-chip FIFOs. The compiler's central abstraction, the iterative tensor (itensor), records iteration order, tiling, and layout. The framework searches hierarchically over tiling, fusion, and resource allocation, and uses a linear program to size FIFOs so that stalls and deadlocks are avoided while on-chip memory is minimized.
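
To make the itensor idea concrete, here is a minimal Python sketch of a type that records tensor shape, tile shape, and loop order, and derives the order in which tiles would appear on a stream. The class name `ITensor`, its fields, and the `needs_converter` helper are hypothetical illustrations, not the compiler's actual data structures or API.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass(frozen=True)
class ITensor:
    """Hypothetical stand-in for an iterative tensor type."""
    shape: Tuple[int, ...]       # full tensor shape
    tile: Tuple[int, ...]        # tile shape; each entry must divide the shape
    loop_order: Tuple[int, ...]  # permutation of dims, outermost loop first

    def stream_order(self) -> List[Tuple[int, ...]]:
        """Return tile origins in the order they appear on the stream."""
        counts = [s // t for s, t in zip(self.shape, self.tile)]
        origins = []
        # The last entry of loop_order varies fastest, like an innermost loop.
        for idx in product(*(range(counts[d]) for d in self.loop_order)):
            origin = [0] * len(self.shape)
            for dim, i in zip(self.loop_order, idx):
                origin[dim] = i * self.tile[dim]
            origins.append(tuple(origin))
        return origins


def needs_converter(producer: ITensor, consumer: ITensor) -> bool:
    """Two kernels can be connected by a plain FIFO only if the tiles and
    their stream order agree; otherwise some reordering buffer is needed."""
    return (producer.tile != consumer.tile
            or producer.stream_order() != consumer.stream_order())


row_major = ITensor(shape=(4, 4), tile=(2, 2), loop_order=(0, 1))
col_major = ITensor(shape=(4, 4), tile=(2, 2), loop_order=(1, 0))

print(row_major.stream_order())  # [(0, 0), (0, 2), (2, 0), (2, 2)]
print(col_major.stream_order())  # [(0, 0), (2, 0), (0, 2), (2, 2)]
print(needs_converter(row_major, row_major))  # False
print(needs_converter(row_major, col_major))  # True
```

Because the stream order is part of the type, a mismatch between a producer and a consumer is visible before any hardware is generated, which is the property that lets a compiler decide where a converter is required.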


What’s actually new?

Hierarchical design space exploration (DSE). The compiler explores three design spaces: (i) tiling, unrolling, vectorization, and permutation at the Linalg level, (ii) fusion under memory and resource constraints, and (iii) resource allocation, optimizing for sustained throughput under bandwidth limits.

End-to-end PyTorch → device flow. Models enter via Torch-MLIR, are converted to MLIR Linalg, and are then lowered to a dataflow IR whose nodes become hardware kernels with explicit streams and host/runtime glue. No manual RTL assembly is required.

Iterative tensor (itensor) type system. A first-class tensor type expresses iteration order, tiling, and affine maps. This makes stream order explicit, allows safe kernel fusion, and lets the compiler synthesize minimal buffers and format converters when a producer and a consumer disagree.

Formal FIFO sizing. Inter-kernel buffering is solved with a linear-programming formulation that avoids stalls and deadlocks while minimizing on-chip memory usage (BRAM/URAM).
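
To illustrate why FIFO sizing matters for correctness and not only for speed, the following Python sketch simulates a small fork-join dataflow: a source broadcasts each token to a row-wise reducer and to a bypass FIFO, and a join kernel drains one full row from the bypass FIFO for every reducer result (a softmax-like pattern). This is a brute-force toy simulation under assumed one-token-per-cycle behavior, not the linear-programming formulation described above; the function names and the model are hypothetical.

```python
from collections import deque


def completes(depth_bypass: int, row_len: int, rows: int,
              max_cycles: int = 10_000) -> bool:
    """Return True if every token reaches the join, False on deadlock."""
    bypass = deque()       # bounded FIFO on the short path
    seen_in_row = 0        # tokens the reducer has accumulated for this row
    ready_results = 0      # finished reducer results waiting at the join
    to_drain = 0           # bypass tokens the join may still pull for a result
    produced = consumed = 0
    total = rows * row_len

    for _ in range(max_cycles):
        progressed = False

        # Join: start a new row once a reducer result is available,
        # then pull at most one bypass token per cycle.
        if to_drain == 0 and ready_results > 0:
            ready_results -= 1
            to_drain = row_len
        if to_drain > 0 and bypass:
            bypass.popleft()
            to_drain -= 1
            consumed += 1
            progressed = True

        # Source: broadcast one token per cycle, but only if the bypass
        # FIFO has space (the reducer itself always accepts).
        if produced < total and len(bypass) < depth_bypass:
            bypass.append(produced)
            produced += 1
            seen_in_row += 1
            progressed = True
            if seen_in_row == row_len:  # reducer has seen a full row
                seen_in_row = 0
                ready_results += 1

        if consumed == total:
            return True
        if not progressed:  # nothing can move: the design is deadlocked
            return False
    return False


def min_bypass_depth(row_len: int, rows: int) -> int:
    """Smallest bypass FIFO depth for which the toy design finishes."""
    for depth in range(1, row_len * rows + 1):
        if completes(depth, row_len, rows):
            return depth
    raise RuntimeError("no feasible depth found")


print(completes(depth_bypass=3, row_len=4, rows=3))  # False (deadlock)
print(min_bypass_depth(row_len=4, rows=3))           # 4
```

In this toy model the bypass FIFO must hold a full row, because the reducer cannot emit a result until it has seen the whole row while the source cannot push once the bypass FIFO is full. An analytical formulation such as a linear program aims to derive bounds of this kind directly from the dataflow graph instead of searching by simulation.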

Results

Latency: as low as 0.76× that of a prior FPGA LLM accelerator and 0.64× that of a GPU baseline.

Energy efficiency: up to 1.99× versus an A100 on newer LLMs (model dependent).

Platform context: Alveo U55C (16 GB HBM2 at 460 GB/s, PCIe Gen3 x16 or dual Gen4 x8, 2x QSFP28).
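
The latency figures are ratios relative to a baseline, so a value below 1 means faster. As a quick worked conversion (the helper name is hypothetical, and only the two ratios quoted above are used), a latency ratio r corresponds to a speedup of 1/r:

```python
def speedup_from_latency_ratio(ratio: float) -> float:
    """A latency ratio r (new / baseline) corresponds to a 1/r speedup."""
    if ratio <= 0:
        raise ValueError("latency ratio must be positive")
    return 1.0 / ratio


for label, ratio in [("vs. prior FPGA accelerator", 0.76),
                     ("vs. GPU baseline", 0.64)]:
    speedup = speedup_from_latency_ratio(ratio)
    print(f"{label}: {ratio:.2f}x latency -> {speedup:.2f}x speedup")
# vs. prior FPGA accelerator: 0.76x latency -> 1.32x speedup
# vs. GPU baseline: 0.64x latency -> 1.56x speedup
```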


The useful contribution here is a PyTorch → Torch-MLIR → dataflow compiler that emits kernels and the host/runtime for AMD's Alveo U55C. The iterative tensor type and linear-programming-based FIFO sizing enable safe inter-kernel streaming rather than DRAM round trips. On the reported LLM decoding benchmarks across GPT-2, Llama, Qwen, and Gemma, the research team shows geometric-mean latency as low as 0.64× that of a GPU baseline and energy efficiency up to 1.99× higher. The hardware context is clear: the Alveo U55C provides 16 GB of HBM2 at 460 GB/s, dual QSFP28, and PCIe Gen3 x16 or dual Gen4 x8.

Check out the paper. For tutorials, code, and notebooks, visit our GitHub page. Also, feel free to follow us on Twitter, and don't forget to join our 100K+ ML SubReddit and subscribe to our newsletter.

Michal Sutter is a data science professional with a Master's degree in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.

