You may need to use the gpu_memory_limit and/or lora_on_cpu config options to avoid running out of memory. If you still run out of CUDA memory, you can try to merge in system RAM.
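
As a rough sketch of how these two options might look in a YAML training config: the text does not name the tool, so this assumes an Axolotl-style config file, and the 20GiB figure is only a placeholder to adjust for your hardware.

```yaml
# Assumed Axolotl-style YAML config; values are illustrative placeholders.

# Cap how much GPU memory may be used while loading the model for the merge.
gpu_memory_limit: 20GiB

# Keep the LoRA adapter weights on the CPU instead of the GPU during the merge.
lora_on_cpu: true
```

Either option can be tried on its own first; setting both trades merge speed for a lower peak GPU memory footprint.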