Question

inference.py always deploys model

  • April 1, 2026
  • 1 reply
  • 15 views

I am using Metis dev board with Arduino x8. When running
./inference_llm.py llama-3-2-1b-1024-4core-static --prompt "Give me a joke"
it always deploys the model first. Even when I run this command several times in a row, it deploys the model again each time, even though it was just deployed.

Is there a way to deploy the model once and then just run inference against it?

1 reply

Spanner
Axelera Team
  • April 1, 2026

Yo @Dominik!

I think that's expected behaviour when using --prompt: each command invocation is a separate process that loads the model, runs your prompt, and then exits.
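In other words, the deployment cost is tied to process lifetime. A rough sketch of the difference (using hypothetical stand-in functions, not the real SDK API):

```python
# Stand-ins for illustration only -- the real deployment and inference
# calls belong to the Axelera SDK and are not shown here.

def deploy_model(name):
    # In reality this is the slow step that happens on every
    # `./inference_llm.py ... --prompt ...` invocation.
    return {"name": name}

def generate(model, prompt):
    # Stand-in for running inference on an already-deployed model.
    return f"[{model['name']}] response to: {prompt}"

# One-shot mode: deploy + single prompt per process (what you're seeing).
# Interactive mode: deploy once, then loop over many prompts.
model = deploy_model("llama-3-2-1b-1024-4core-static")  # happens once
for prompt in ["Give me a joke", "Another one"]:
    print(generate(model, prompt))
```

The interactive session below follows the second pattern: the process stays alive, so the model stays deployed between prompts.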

If you want to keep the model loaded and send multiple prompts without redeploying each time, give axllm a shot in interactive mode:

axllm llama-3-2-1b-1024-4core-static

This drops you into a chat session where the model stays loaded and you can keep sending prompts. Type exit when you're done. The LLM tutorial has more details on the available options like --temperature and --system-prompt. Let me know how it goes!