Take your coding to the next level with AI while keeping privacy and confidentiality under your control.
You can run any LLM supported by llama.cpp using the Docker container that this extension supports, or build and use your own wrapper, which can run locally or remotely.
Use open models, or your own private models, fine-tuned on your individual or team data.
It won't compete with ChatGPT or GitHub Copilot anytime soon, but it offsets that with the advantage of ownership.
For a quick start, use the supported container server. Make sure you have the binary of the LLM you want to use, and edit the settings accordingly. Then run PairCoder: Start LLM Server Container from the Command Palette.
Select any text and use the contextual menu or the Command Palette for code generation. The prompt is the selected text.
Starting a new generation will interrupt the current one. You can also interrupt a generation at any time using PairCoder: Stop PairCoding from the Command Palette.
- Docker, if you choose to use the encapsulated container functionality (recommended).
- Binaries of the model(s) that will be used for code generation.
You can also run your own server (locally or remotely). The extension simply expects a REST API with a /predict POST endpoint. See the GitHub repository for details.
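As a rough illustration of what talking to such a server could look like, the sketch below builds a POST request for the /predict endpoint using only the Python standard library. The JSON body and its field names (prompt, n_predict) are assumptions made for this example, not the documented contract; the repository remains the reference for the actual request and response format.

```python
import json
import urllib.request

# Default server URL, as listed in the extension settings.
DEFAULT_SERVER_URL = "http://localhost:8399/predict"


def build_predict_request(prompt, url=DEFAULT_SERVER_URL, n_predict=128):
    """Build (but do not send) a POST request for the /predict endpoint.

    ASSUMPTION: the JSON body and the field names "prompt" and "n_predict"
    are hypothetical; check the repository for the real request schema.
    """
    payload = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_predict_request("def fibonacci(n):")
    # Sending the request needs a running server; uncomment to try it.
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
    print(request.get_method(), request.full_url)
```

Building the request separately from sending it makes it easy to inspect what would go over the wire before pointing the sketch at a real server.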
This extension contributes the following settings:
- youcefrahal.paircoder.server.url: URL of the LLM server. Default is http://localhost:8399/predict. Make sure to update the port number if you edit the youcefrahal.paircoder.docker.port setting.
- youcefrahal.paircoder.docker.image: The Docker image to use. Default is yrahal/paircoder-server:<release>.
- youcefrahal.paircoder.docker.port: Port to map the LLM server running in the container. Default is 8399.
- youcefrahal.paircoder.model.modelsPath: Absolute path to a local folder containing one or more LLMs that can be ingested by llama.cpp.
- youcefrahal.paircoder.model.filename: Filename of the LLM to load. If unspecified, the first file found in the models folder will be used.
- youcefrahal.paircoder.model.n_predict: Number of tokens to predict (default: 128, infinity: -1).
You should only have to edit youcefrahal.paircoder.model.modelsPath with your own value, and possibly youcefrahal.paircoder.model.filename if you have more than one model in that folder.
You might have to edit youcefrahal.paircoder.docker.port if you have a port conflict (and therefore youcefrahal.paircoder.server.url too).
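The dependency between the port and the server URL is easy to get wrong, so here is a small sketch of it. The setting names and default values come from the list above; the helper functions (server_url_for_port, settings_are_consistent) are hypothetical and not part of the extension, and the check only makes sense when the extension talks to the local container rather than to a remote server.

```python
from urllib.parse import urlsplit, urlunsplit

# Defaults taken from the settings documented above.
DEFAULT_PORT = 8399
DEFAULT_SERVER_URL = "http://localhost:8399/predict"


def server_url_for_port(port, base_url=DEFAULT_SERVER_URL):
    """Return base_url with its port replaced by `port` (hypothetical helper)."""
    parts = urlsplit(base_url)
    netloc = f"{parts.hostname}:{port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def settings_are_consistent(settings):
    """Check that server.url points at the port configured in docker.port."""
    port = settings.get("youcefrahal.paircoder.docker.port", DEFAULT_PORT)
    url = settings.get("youcefrahal.paircoder.server.url", DEFAULT_SERVER_URL)
    return urlsplit(url).port == port


if __name__ == "__main__":
    # Moving the container to port 8400 means the URL has to follow.
    print(server_url_for_port(8400))  # prints http://localhost:8400/predict
    # Changing only the port leaves the default URL pointing at 8399.
    print(settings_are_consistent({"youcefrahal.paircoder.docker.port": 8400}))  # prints False
```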
- Parallel queries are unsupported, but nothing prevents you from attempting them.
- The speed and quality of the generation depend on the hardware and the model used. The above screenshot uses the base GPT4All model. Vicuna 13B will generally give better results, but is slower.
- Ability to generate code or text using a local or a remote server.
- Integration of a container that serves llama.cpp, configurable to work with any of its supported models.
- A new command to interrupt prediction. Starting a new prediction will also interrupt the current one.
- A new option to customize the number of predicted tokens.
- By default, Docker images are now tied to the same release number as the extension.