HALT
Local Large Language Model Interface
Run powerful language models on your own hardware, with an intuitive interface for local inference, multi-agent conversations, and customizable AI behavior.
Key Features
Local LLM Support
Run models locally without internet connectivity, preserving privacy and security.
Model Management
Download, cache, and manage large language models directly from the interface.
Multi-Agent Mode
Create conversations between multiple AI agents with different specialized roles.
System Instructions
Customize AI behavior with predefined or custom instructions to guide responses.
GPU Acceleration
Optimized for CUDA with GPU selection and memory management for faster inference.
Session Management
Save and load conversations to continue your work across sessions.
System Requirements
Operating System
Windows or Linux with Python 3.8+
Graphics
NVIDIA GPU with CUDA support (8GB VRAM or more recommended)
Memory
At least 16GB system RAM
Installation
```bash
# Clone the repository
git clone https://github.com/Sevsai/HALT.git
cd HALT

# Install dependencies
pip install -r requirements.txt

# Run the application
python HALT.py
```
Quick Start
- Download a Model:
  - Go to the "Tools" tab
  - Click "Pre-Download Model" and enter a HuggingFace model ID
  - For example: NousResearch/Hermes-2-Pro-Mistral-7B
- Enable Offline Mode:
  - Check "Offline Mode" on the Tools tab
- Check the Model:
  - Go back to the "Chat" tab
  - Click "Check Model" to load the model
- Start Chatting:
  - Type a message in the input area
  - Click "Generate Response"
Advanced Usage
Model Settings
Fine-tune your model's responses with various parameters:
- Temperature: Controls randomness (0.0-1.0; lower values give more deterministic output)
- Top-K/Top-P: Restrict sampling to the K most likely tokens, or to the smallest set of tokens whose cumulative probability reaches P
- Max Length: Maximum number of tokens generated per response
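To see how these three settings interact, here is a toy sampler over a raw logits vector. This is purely illustrative: HALT's actual sampling is performed by the underlying model library, and the function below is an assumption-free sketch of the standard technique, not HALT's code.

```python
import math
import random

def sample_next_token(logits, temperature=0.8, top_k=50, top_p=0.95, rng=None):
    """Toy temperature / top-k / top-p sampler (illustrative only)."""
    rng = rng or random.Random()
    # Temperature: rescale logits; lower values sharpen the distribution.
    scaled = [l / max(temperature, 1e-6) for l in logits]
    # Softmax to probabilities (subtracting the max for numerical stability).
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    ranked = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    # Top-K: keep only the K most likely tokens.
    ranked = ranked[:top_k]
    # Top-P: keep the smallest prefix whose cumulative probability reaches P.
    kept, cum = [], 0.0
    for p, i in ranked:
        kept.append((p, i))
        cum += p
        if cum >= top_p:
            break
    # Renormalise over the surviving tokens and draw one.
    norm = sum(p for p, _ in kept)
    r, acc = rng.random() * norm, 0.0
    for p, i in kept:
        acc += p
        if acc >= r:
            return i
    return kept[-1][1]
```

With a very low temperature (or `top_k=1`) the sampler collapses to picking the single most likely token, which is why low-temperature output feels repetitive but reliable.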
System Instructions
The application uses a system instructions manager to provide context to the AI. You can:
- Select from predefined instruction presets
- Create custom instruction presets
- Save and load instruction configurations
Multi-Agent Mode
HALT supports conversations between multiple AI agents:
- Go to the "Agents" tab
- Enable "Multi-Agent Mode"
- Configure number of agents and roles
- Return to the chat and enter a prompt
- Agents will discuss the topic from their assigned perspectives
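The flow above can be sketched as a round-robin loop in which each agent sees the running transcript with its role prepended. HALT's actual agent scheduling may differ; `generate` here stands in for any callable that maps a full prompt string to a reply.

```python
def run_agents(prompt, roles, generate, rounds=2):
    """Round-robin multi-agent conversation (a sketch, not HALT's code)."""
    transcript = [("User", prompt)]
    for _ in range(rounds):
        for role in roles:
            # Each agent sees the whole conversation so far, framed by its role.
            history = "\n".join(f"{who}: {text}" for who, text in transcript)
            reply = generate(f"You are the {role}.\n{history}\n{role}:")
            transcript.append((role, reply))
    return transcript
```

Swapping `generate` for a real model call (or one call per agent, each with different settings) turns this into a working multi-agent loop.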
Advanced Model Downloading
For manual model downloads:

```bash
python model_downloader.py --model MODEL_ID --dir OUTPUT_DIR
```

For specific models like DeepHermes:

```bash
python deephermes_setup.py
```
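A script with that command-line shape can be built on `huggingface_hub.snapshot_download`, which fetches every file in a model repository to a local directory. This is a hedged sketch of how such a downloader might look, not the contents of `model_downloader.py` itself.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Download a model for offline use")
    parser.add_argument("--model", required=True,
                        help="HuggingFace model ID, e.g. NousResearch/Hermes-2-Pro-Mistral-7B")
    parser.add_argument("--dir", required=True,
                        help="Directory to store the downloaded snapshot")
    return parser

def download(model_id, out_dir):
    # Imported lazily so the CLI can be inspected without the library installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=model_id, local_dir=out_dir)

if __name__ == "__main__":
    args = build_parser().parse_args()
    download(args.model, args.dir)
```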
Troubleshooting
Out of Memory Errors
Try a smaller model or enable 4-bit quantization in the model settings to reduce memory usage.
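A quick back-of-the-envelope estimate shows why quantization helps so much. The helper below only counts the weights themselves; the 1.2 overhead factor is a loose assumption, and the KV cache and activations add more on top.

```python
def vram_gb(n_params_billion, bits_per_weight, overhead=1.2):
    """Rough VRAM needed just to hold the weights, in GiB (weights only)."""
    bytes_needed = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_needed / 1024**3 * overhead

# A 7B model: roughly 15.6 GiB at fp16 vs roughly 3.9 GiB at 4-bit,
# which is the difference between overflowing and fitting an 8GB card.
```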
Model Not Found
Check that the model path is correct and the model is fully downloaded. Make sure you've selected the proper directory in the application settings.
CUDA Errors
Ensure you have compatible NVIDIA drivers installed and that your GPU has enough VRAM for the selected model.
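When diagnosing CUDA problems, a short script can confirm what PyTorch (which HALT's inference stack is assumed to use) actually sees. All the `torch.cuda` calls below are standard PyTorch APIs.

```python
def cuda_report():
    """Summarise the CUDA devices visible to PyTorch, or explain why none are."""
    try:
        import torch
    except ImportError:
        return "PyTorch not installed"
    if not torch.cuda.is_available():
        return "CUDA not available: check the NVIDIA driver and your torch build"
    lines = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        lines.append(f"GPU {i}: {props.name}, "
                     f"{props.total_memory / 1024**3:.1f} GiB VRAM")
    return "\n".join(lines)

if __name__ == "__main__":
    print(cuda_report())
```

If the report lists your GPU but loading still fails, the VRAM figure shown is the number to compare against the model's memory estimate.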