Jan is an open-source, offline-first desktop application that provides a ChatGPT-like experience running entirely on your local machine. It uses llama.cpp under the hood to run models such as Llama 3, Mistral, and Phi with full privacy, exposes an OpenAI-compatible local API server at localhost:1337, and supports Mac, Windows, and Linux, turning any modern laptop into a private AI assistant with zero cloud dependency.
What Is Jan?
- Definition: An open-source desktop application (AGPLv3 license) that packages local LLM inference into a polished, user-friendly interface — handling model downloading, GPU detection, memory management, and API serving so users can chat with AI models without any technical setup.
- Offline-First: Jan is designed to work completely without internet after initial model download — no telemetry, no cloud calls, no data leaving your machine. The application and all inference run locally.
- OpenAI-Compatible API: Jan exposes a local server at localhost:1337 that implements the OpenAI chat completions API — any application using the OpenAI SDK can point to Jan as a drop-in local replacement by changing the base URL.
- Extension System: Jan supports extensions for additional functionality — remote API connections (e.g., OpenAI and Anthropic as cloud fallbacks), TensorRT-LLM acceleration, and community-built plugins.
- llama.cpp Backend: Uses llama.cpp for inference — supporting GGUF quantized models with automatic GPU offloading on NVIDIA (CUDA), AMD (Vulkan), and Apple Silicon (Metal).
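Because Jan's local server follows the OpenAI chat-completions format, a request can be built with nothing but the standard library. The sketch below constructs (but does not send) such a request; the model id `llama3-8b-instruct` is a hypothetical example, and the `/v1/chat/completions` path is assumed from the OpenAI-compatible convention.

```python
import json
import urllib.request

JAN_BASE_URL = "http://localhost:1337/v1"  # Jan's default local server

def build_chat_request(model, messages, stream=False):
    """Build an OpenAI-style chat-completions request for Jan's local server."""
    payload = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        f"{JAN_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(
    "llama3-8b-instruct",  # hypothetical model id; use one installed in Jan
    [{"role": "user", "content": "Hello!"}],
)
# With Jan running, urllib.request.urlopen(req) would return an
# OpenAI-style JSON response; here we only construct the request.
```

The same drop-in idea applies to any OpenAI SDK: point its base URL at `http://localhost:1337/v1` and existing client code talks to Jan instead of the cloud.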
Key Features
- Model Hub: Built-in model browser with recommended models for different hardware configurations — shows RAM requirements, download sizes, and performance expectations before downloading.
- Conversation Management: Multiple chat threads with conversation history, system prompt customization, and model switching mid-conversation.
- Local RAG: Upload documents and chat with them locally — Jan indexes files for retrieval-augmented generation without sending documents to any cloud service.
- Thread-Level Model Selection: Different conversations can use different models — use a small fast model for quick questions and a large model for complex reasoning.
- Resource Monitoring: Real-time display of RAM usage, GPU utilization, and tokens per second during inference.
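The local RAG feature above is handled inside the app, but the retrieval step can be illustrated conceptually. The sketch below uses a crude bag-of-words overlap score as a stand-in for Jan's internal indexing, which this document does not specify.

```python
# Conceptual sketch of the retrieval step in local RAG, using a simple
# bag-of-words overlap score. Jan's actual indexing is internal to the
# app; this only illustrates retrieving document chunks locally.
def score(query: str, chunk: str) -> int:
    """Count query words that appear in a chunk (crude relevance score)."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the top-k chunks ranked by overlap with the query."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

docs = [
    "Jan runs models locally with llama.cpp.",
    "The API server listens on localhost:1337.",
    "Quantized GGUF models reduce memory usage.",
]
top = retrieve("which port does the API server use", docs, k=1)
```

The retrieved chunks would then be prepended to the prompt before inference, all without any document leaving the machine.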
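Thread-level model selection maps naturally onto the API's per-request `model` field. A minimal sketch of that pattern, with hypothetical model ids standing in for whatever is installed in Jan:

```python
# Sketch: mirror Jan's thread-level model selection over the API by
# choosing a model id per request. Model ids below are hypothetical.
FAST_MODEL = "phi-3-mini"    # small, fast model for quick questions
LARGE_MODEL = "llama3-70b"   # large model for complex reasoning

def pick_model(prompt: str, threshold: int = 200) -> str:
    """Route short prompts to the fast model, long ones to the large one."""
    return FAST_MODEL if len(prompt) < threshold else LARGE_MODEL
```

In Jan's UI this routing is a per-thread dropdown; over the API it is just a different `model` string in each request body.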
Jan vs Alternatives
| Feature | Jan | Ollama | LM Studio | GPT4All |
|---------|-----|--------|----------|---------|
| Interface | Desktop GUI | CLI + API | Desktop GUI | Desktop GUI |
| Open source | Yes (AGPL) | Partial | No | Yes |
| API server | OpenAI-compatible | OpenAI-compatible | OpenAI-compatible | REST API |
| Extensions | Yes (plugin system) | No | No | No |
| Local RAG | Yes | No (needs app) | No | Yes (LocalDocs) |
| Platform | Mac, Win, Linux | Mac, Win, Linux | Mac, Win, Linux | Mac, Win, Linux |
Jan is the open-source desktop AI application that prioritizes privacy and extensibility. With a polished ChatGPT-like interface, local inference, an OpenAI-compatible API, and a plugin system, it works both as a standalone AI assistant and as a local inference server for developers building privacy-preserving AI applications.