- Support for multiple LLMs (currently LLAMA, BLOOM, OPT) at various model sizes (up to 170B)
- Support for a wide range of consumer-grade Nvidia GPUs
- Tiny and easy-to-use codebase mostly in Python (<500 ...