In case anyone on here hasn’t spotted it in the wild yet, the Metis M.2 Max was just announced.

It basically packs PCIe card-level performance into an M.2 stick. So that’s better performance for LLMs and vision transformers on much smaller host devices (RPi, Orange Pi, etc), all while keeping power draw super low (around 6.5W). Memory has been bumped up to 16GB, there are slimmer design options, and it adds security features for tougher environments.
Feels like a big step forward for anyone wanting to run GenAI, computer vision, or LLM workloads locally on devices with an impressively small footprint, without needing a data centre to do it.
Shipping starts later this year.
https://axelera.ai/news/axelera-ai-boosts-llms-at-the-edge-by-2x-with-metis-m.2-max-introduction