I just created a vLLM service on 2 A10 GPUs and use cuda-checkpoint & CRIU to dump/restore it. The model I use is Qwen2.5-72B. I start the server with: python3 -m vllm.entrypoints.openai.api_server ...
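For context, this is roughly the flow I'm following (a minimal sketch, not my exact commands: the model path, port, checkpoint directory, and extra CRIU flags here are assumptions, and the real startup command line is longer than shown above):

```bash
# 1. Start the vLLM OpenAI-compatible server across both A10s (assumed flags/paths).
python3 -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2.5-72B \
    --tensor-parallel-size 2 \
    --port 8000 &
PID=$!

# 2. Toggle the process into a CUDA-checkpointed state so it no longer
#    holds GPU resources (cuda-checkpoint acts on a single PID).
cuda-checkpoint --toggle --pid "$PID"

# 3. Dump the now CPU-only process tree with CRIU.
criu dump --tree "$PID" --images-dir ./ckpt --shell-job --tcp-established

# 4. Later: restore the process (CRIU keeps the original PID) and
#    toggle CUDA state back onto the GPUs.
criu restore --images-dir ./ckpt --shell-job --tcp-established --restore-detached
cuda-checkpoint --toggle --pid "$PID"
```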