* llama.cpp: fetch the version from GitHub instead of hardcoding it (see the `versions` sketch after this list)
* llama.cpp: if a model is specified, run it; otherwise, download a model first (see the entrypoint sketch after this list)
* Use an entrypoint for custom llama.cpp invocation
* `llama.cpp` is now just the raw executable; I think this is our new pattern.
* To run the chat, use the entrypoint: `pkgx +brewkit -- run llama.cpp`
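
A minimal sketch of the version change, assuming a pantry-style `package.yml`; the exact stanza syntax may differ from what the pantry actually uses:

```yaml
versions:
  # track upstream tags/releases instead of pinning a hardcoded version
  github: ggerganov/llama.cpp
```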
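And a sketch of the entrypoint logic as a POSIX shell script; the binary name `llama.cpp` is taken from the bullet above, while the default model path and download URL are placeholders, not the package's actual defaults:

```sh
#!/bin/sh
set -e

# hypothetical default model location and source; placeholders for illustration
DEFAULT_MODEL="$HOME/.local/share/llama.cpp/default.gguf"
DEFAULT_MODEL_URL="https://example.com/path/to/default.gguf"

if [ $# -gt 0 ]; then
  # a model (or other arguments) was specified: pass everything straight through
  exec llama.cpp "$@"
fi

# no model specified: download the default model first, then run it
if [ ! -f "$DEFAULT_MODEL" ]; then
  mkdir -p "$(dirname "$DEFAULT_MODEL")"
  curl -fL -o "$DEFAULT_MODEL" "$DEFAULT_MODEL_URL"
fi

exec llama.cpp -m "$DEFAULT_MODEL"
```

With something like this in place, `pkgx +brewkit -- run llama.cpp` invokes the entrypoint, which downloads the default model only on first run.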
Co-authored-by: James Reynolds <magnsuviri@me.com>
Co-authored-by: Max Howell <mxcl@me.com>