HALT
Local Large Language Model Interface
Run powerful language models on your own hardware, with an intuitive interface for local inference, multi-agent conversations, and customizable AI behavior.
Key Features
Local LLM Support
Run models locally without internet connectivity, preserving privacy and security.
Model Management
Download, cache, and manage large language models directly from the interface.
Multi-Agent Mode
Create conversations between multiple AI agents with different specialized roles.
System Instructions
Customize AI behavior with predefined or custom instructions to guide responses.
GPU Acceleration
Optimized for CUDA with GPU selection and memory management for faster inference.
Session Management
Save and load conversations to continue your work across sessions.
System Requirements
Operating System
Windows or Linux with Python 3.8+
Graphics
NVIDIA GPU with CUDA support (8GB VRAM or more recommended)
Memory
At least 16GB system RAM
Installation
```bash
# Clone the repository
git clone https://github.com/Sevsai/HALT.git
cd HALT

# Install dependencies
pip install -r requirements.txt

# Run the application
python HALT.py
```
Quick Start
- Download a Model:
  - Go to the "Tools" tab
  - Click "Pre-Download Model" and enter a HuggingFace model ID
  - For example: NousResearch/Hermes-2-Pro-Mistral-7B
- Enable Offline Mode:
  - Check "Offline Mode" on the Tools tab
- Check the Model:
  - Go back to the "Chat" tab
  - Click "Check Model" to load the model
- Start Chatting:
  - Type a message in the input area
  - Click "Generate Response"
Advanced Usage
Model Settings
Fine-tune your model's responses with various parameters:
- Temperature: Controls randomness (0.0-1.0; lower values give more deterministic output)
- Top-K/Top-P: Restrict sampling to the K most likely tokens, or to the smallest set of tokens whose cumulative probability reaches P
- Max Length: Maximum number of tokens generated per response
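To see how these three settings interact, here is a toy sampler over a raw logits vector. This is purely illustrative: HALT's actual sampling is performed by the underlying model library, and the function below is an assumption-free sketch of the standard technique, not HALT's code.

```python
import math
import random

def sample_next_token(logits, temperature=0.8, top_k=50, top_p=0.95, rng=None):
    """Toy temperature / top-k / top-p sampler (illustrative only)."""
    rng = rng or random.Random()
    # Temperature: rescale logits; lower values sharpen the distribution.
    scaled = [l / max(temperature, 1e-6) for l in logits]
    # Softmax to probabilities (subtracting the max for numerical stability).
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    ranked = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    # Top-K: keep only the K most likely tokens.
    ranked = ranked[:top_k]
    # Top-P: keep the smallest prefix whose cumulative probability reaches P.
    kept, cum = [], 0.0
    for p, i in ranked:
        kept.append((p, i))
        cum += p
        if cum >= top_p:
            break
    # Renormalise over the surviving tokens and draw one.
    norm = sum(p for p, _ in kept)
    r, acc = rng.random() * norm, 0.0
    for p, i in kept:
        acc += p
        if acc >= r:
            return i
    return kept[-1][1]
```

With a very low temperature (or `top_k=1`) the sampler collapses to picking the single most likely token, which is why low-temperature output feels repetitive but reliable.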
System Instructions
The application uses a system instructions manager to provide context to the AI. You can:
- Select from predefined instruction presets
- Create custom instruction presets
- Save and load instruction configurations
Multi-Agent Mode
HALT supports conversations between multiple AI agents:
- Go to the "Agents" tab
- Enable "Multi-Agent Mode"
- Configure number of agents and roles
- Return to the chat and enter a prompt
- Agents will discuss the topic from their assigned perspectives
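The flow above can be sketched as a round-robin loop in which each agent sees the running transcript with its role prepended. HALT's actual agent scheduling may differ; `generate` here stands in for any callable that maps a full prompt string to a reply.

```python
def run_agents(prompt, roles, generate, rounds=2):
    """Round-robin multi-agent conversation (a sketch, not HALT's code)."""
    transcript = [("User", prompt)]
    for _ in range(rounds):
        for role in roles:
            # Each agent sees the whole conversation so far, framed by its role.
            history = "\n".join(f"{who}: {text}" for who, text in transcript)
            reply = generate(f"You are the {role}.\n{history}\n{role}:")
            transcript.append((role, reply))
    return transcript
```

Swapping `generate` for a real model call (or one call per agent, each with different settings) turns this into a working multi-agent loop.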
Advanced Model Downloading
For manual model downloads:

```bash
python model_downloader.py --model MODEL_ID --dir OUTPUT_DIR
```

For specific models like DeepHermes:

```bash
python deephermes_setup.py
```
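A script with that command-line shape can be built on `huggingface_hub.snapshot_download`, which fetches every file in a model repository to a local directory. This is a hedged sketch of how such a downloader might look, not the contents of `model_downloader.py` itself.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Download a model for offline use")
    parser.add_argument("--model", required=True,
                        help="HuggingFace model ID, e.g. NousResearch/Hermes-2-Pro-Mistral-7B")
    parser.add_argument("--dir", required=True,
                        help="Directory to store the downloaded snapshot")
    return parser

def download(model_id, out_dir):
    # Imported lazily so the CLI can be inspected without the library installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=model_id, local_dir=out_dir)

if __name__ == "__main__":
    args = build_parser().parse_args()
    download(args.model, args.dir)
```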
Troubleshooting
Out of Memory Errors
Try a smaller model or enable 4-bit quantization in the model settings to reduce memory usage.
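A quick back-of-the-envelope estimate shows why quantization helps so much. The helper below only counts the weights themselves; the 1.2 overhead factor is a loose assumption, and the KV cache and activations add more on top.

```python
def vram_gb(n_params_billion, bits_per_weight, overhead=1.2):
    """Rough VRAM needed just to hold the weights, in GiB (weights only)."""
    bytes_needed = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_needed / 1024**3 * overhead

# A 7B model: roughly 15.6 GiB at fp16 vs roughly 3.9 GiB at 4-bit,
# which is the difference between overflowing and fitting an 8GB card.
```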
Model Not Found
Check that the model path is correct and the model is fully downloaded. Make sure you've selected the proper directory in the application settings.
CUDA Errors
Ensure you have compatible NVIDIA drivers installed and that your GPU has enough VRAM for the selected model.
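When diagnosing CUDA problems, a short script can confirm what PyTorch (which HALT's inference stack is assumed to use) actually sees. All the `torch.cuda` calls below are standard PyTorch APIs.

```python
def cuda_report():
    """Summarise the CUDA devices visible to PyTorch, or explain why none are."""
    try:
        import torch
    except ImportError:
        return "PyTorch not installed"
    if not torch.cuda.is_available():
        return "CUDA not available: check the NVIDIA driver and your torch build"
    lines = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        lines.append(f"GPU {i}: {props.name}, "
                     f"{props.total_memory / 1024**3:.1f} GiB VRAM")
    return "\n".join(lines)

if __name__ == "__main__":
    print(cuda_report())
```

If the report lists your GPU but loading still fails, the VRAM figure shown is the number to compare against the model's memory estimate.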