I am using a Metis dev board with an Arduino X8. When I run ./inference_llm.py llama-3-2-1b-1024-4core-static --prompt "Give me a joke"
it always deploys the model first. Even if I run the same command several times in a row, it redeploys the model each time, even though it was just deployed.
Is there a way to deploy the model once and then just run inference against it?
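To make the ask concrete, here is a minimal sketch of the behavior I am hoping for: a deployment step that is cached and skipped on subsequent runs. All names and paths here are made up for illustration; this is not the real SDK API.

```python
import os
import tempfile

MODEL = "llama-3-2-1b-1024-4core-static"

# Hypothetical cache location; the real tooling may store artifacts elsewhere.
CACHE_ROOT = tempfile.mkdtemp(prefix="metis_deploys_")

deploy_count = 0  # counts how many times deployment actually runs


def deploy_once(model: str) -> str:
    """Deploy the model only if no cached artifact exists yet."""
    global deploy_count
    target = os.path.join(CACHE_ROOT, model)
    if not os.path.isdir(target):
        os.makedirs(target)
        deploy_count += 1  # the real deployment step would happen here
    return target


def infer(model: str, prompt: str) -> str:
    """Reuse the cached deployment instead of redeploying every call."""
    deploy_once(model)
    return f"[{model}] response to: {prompt}"  # placeholder inference


for _ in range(3):
    infer(MODEL, "Give me a joke")

print(deploy_count)  # deployment ran only once despite three inference calls
```

In other words: run the expensive deployment a single time, then have every following invocation detect the existing deployment and go straight to inference.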
