Hi all, I have never touched any tools for local inference and barely know anything about the landscape. Additionally, the only hardware I have available is an 8C/16T Zen 3 CPU and 48GB of RAM. I have many years of experience running Linux as a daily driver and as a small-network sysadmin.
I am well aware this is extreme challenge mode, but it’s what I have to work with for now, and my main goal is learning the ecosystem rather than getting highly usable results.
I decided for various reasons that my first project would be to run a model that I can feed an image and have it output a caption.
If I have to quantize a model to make it fit into my available RAM then I am willing to learn that too.
I am looking for basic pointers of where to get started, such as “read this guide,” “watch this video,” “look into this software package.”
I am not looking for solutions that rely on an API where inference happens on a machine that is not my own.
Chiming in to say this is a very reasonable starting place. I also wanted to highlight to OP that this solution is 100% self-hosted.
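On the quantization question, here is a rough back-of-the-envelope sketch for checking whether a model's weights would fit in 48GB of RAM. It only uses the arithmetic that weight size is roughly parameter count times bits per weight divided by 8. The function names, the 8 GiB headroom figure (for the OS, activations, and context), and the example model sizes are my own assumptions for illustration, not numbers from any particular tool or model.

```python
# Rough estimate only: real model files carry extra metadata, and runtime
# memory also depends on context length and the inference software used.

def weights_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_ram(params_billions: float, bits_per_weight: float,
                ram_gib: float = 48.0, headroom_gib: float = 8.0) -> bool:
    """True if the weights plus an assumed headroom fit in ram_gib.

    headroom_gib is a guess (hypothetical), not a measured value.
    """
    return weights_gib(params_billions, bits_per_weight) + headroom_gib <= ram_gib


if __name__ == "__main__":
    # Hypothetical model sizes, in billions of parameters.
    for size in (7, 34):
        for bits in (16, 8, 4):
            print(f"{size}B params at {bits}-bit: "
                  f"{weights_gib(size, bits):.1f} GiB of weights, "
                  f"fits: {fits_in_ram(size, bits)}")
```

The takeaway is that dropping from 16-bit to 4-bit weights cuts the footprint to about a quarter, which is why quantization matters so much on a RAM-only setup.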