Question

inference.py always deploys model

  • April 1, 2026

I am using a Metis dev board with an Arduino X8. When I run
./inference_llm.py llama-3-2-1b-1024-4core-static --prompt "Give me a joke"
it always deploys the model first. Even when I run the same command several times in a row, it redeploys the model each time, even though it was just deployed.

Is there a way to deploy the model once and then just run inference against it?
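In general, the way around this is to keep the model resident in one long-lived process and loop over prompts there, instead of invoking the script once per prompt (each fresh process has to set the model up again). A minimal sketch of that pattern, assuming nothing about the actual Axelera SDK API (`DummyModel` and its methods are placeholders for whatever the real deploy/generate calls are):

```python
import time


class DummyModel:
    """Placeholder for the real model wrapper.

    The expensive deploy/compile step runs once, in __init__,
    instead of once per prompt.
    """

    def __init__(self):
        time.sleep(0.01)  # stands in for the slow deploy step
        self.deploy_count = 1  # deployed exactly once

    def generate(self, prompt):
        # Stands in for the real inference call; reuses the deployed model.
        return f"echo: {prompt}"


def run_prompts(prompts):
    model = DummyModel()  # deploy once ...
    # ... then reuse the same model object for every prompt
    return [model.generate(p) for p in prompts]


if __name__ == "__main__":
    for answer in run_prompts(["Give me a joke", "Another one"]):
        print(answer)
```

If the SDK exposes a separate deploy step and an inference entry point, the same idea applies: do the deploy once at startup (or cache its output), then call only the inference path in the loop, or run the script as a small server that reads prompts from stdin or a socket.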