HALT

Local Large Language Model Interface

Run powerful language models on your own hardware, with an intuitive interface for local inference, multi-agent conversations, and customizable AI behavior.

Python · CUDA · Local LLMs · GPU Acceleration · Multi-Agent

Key Features

  • 💻 Local LLM Support: Run models locally without an internet connection, preserving privacy and security.
  • 🔄 Model Management: Download, cache, and manage large language models directly from the interface.
  • 👥 Multi-Agent Mode: Create conversations between multiple AI agents with different specialized roles.
  • ⚙️ System Instructions: Customize AI behavior with predefined or custom instructions that guide responses.
  • 🚀 GPU Acceleration: Optimized for CUDA, with GPU selection and memory management for faster inference.
  • 💾 Session Management: Save and load conversations to continue work where you left off.

System Requirements

  • 💻 Operating System: Windows or Linux with Python 3.8+
  • 🖥️ Graphics: NVIDIA GPU with CUDA support (8GB+ VRAM recommended)
  • 🧠 Memory: At least 16GB of system RAM

Installation

# Clone the repository
git clone https://github.com/Sevsai/HALT.git
cd HALT

# Install dependencies
pip install -r requirements.txt

# Run the application
python HALT.py

Quick Start

  1. Download a Model:
    • Go to the "Tools" tab
    • Click "Pre-Download Model" and enter a HuggingFace model ID
    • For example: NousResearch/Hermes-2-Pro-Mistral-7B
  2. Enable Offline Mode:
    • Check "Offline Mode" on the Tools tab
  3. Check the Model:
    • Go back to the "Chat" tab
    • Click "Check Model" to load the model
  4. Start Chatting:
    • Type a message in the input area
    • Click "Generate Response"

Advanced Usage

Model Settings

Adjust generation behavior with several parameters:

  • Temperature: Controls randomness (0.0-1.0); lower values give more deterministic output
  • Top-K/Top-P: Sampling parameters that restrict which tokens are considered
  • Max Length: Maximum number of tokens to generate per response
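To see how these parameters interact, here is a pure-Python sketch of temperature plus top-k plus top-p (nucleus) sampling over raw logits; this illustrates the standard technique, not HALT's internal implementation:

```python
import math
import random

def sample_next_token(logits, temperature=0.8, top_k=50, top_p=0.95, rng=None):
    """Sketch of temperature + top-k + top-p sampling over raw logits."""
    rng = rng or random.Random(0)
    # Temperature: divide logits before softmax; lower values sharpen the distribution.
    scaled = [l / max(temperature, 1e-6) for l in logits]
    # Softmax to probabilities (shifted by the max for numerical stability).
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    # Top-K: keep only the K most likely tokens.
    probs = probs[:top_k]
    # Top-P: keep the smallest prefix whose cumulative probability reaches p.
    kept, cum = [], 0.0
    for p, i in probs:
        kept.append((p, i))
        cum += p
        if cum >= top_p:
            break
    # Renormalize over the kept tokens and draw one.
    z = sum(p for p, _ in kept)
    r, acc = rng.random() * z, 0.0
    for p, i in kept:
        acc += p
        if acc >= r:
            return i
    return kept[-1][1]

token = sample_next_token([2.0, 1.0, 0.1, -1.0], temperature=0.7)
```

At very low temperature the distribution collapses onto the most likely token, which is why low settings feel deterministic.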

System Instructions

The application uses a system instructions manager to provide context to the AI. You can:

  • Select from predefined instruction presets
  • Create custom instruction presets
  • Save and load instruction configurations
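One plausible way to persist instruction presets is as a JSON mapping of preset name to instruction text; a minimal sketch (the preset names, file name, and structure here are illustrative, not HALT's actual format):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical presets; real presets would hold full system instructions.
PRESETS = {
    "default": "You are a helpful assistant.",
    "concise": "Answer as briefly as possible.",
}

def save_presets(path, presets):
    """Write the preset mapping to disk as JSON."""
    Path(path).write_text(json.dumps(presets, indent=2), encoding="utf-8")

def load_presets(path):
    """Read the preset mapping back from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))

path = Path(tempfile.gettempdir()) / "halt_presets_example.json"
save_presets(path, PRESETS)
restored = load_presets(path)
```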

Multi-Agent Mode

HALT supports conversations between multiple AI agents:

  • Go to the "Agents" tab
  • Enable "Multi-Agent Mode"
  • Configure number of agents and roles
  • Return to the chat and enter a prompt
  • Agents will discuss the topic from their assigned perspectives
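The flow above can be sketched as a round-robin loop in which each agent responds to the previous turn; `stub_generate` below stands in for a real model call, and the role names are illustrative:

```python
def stub_generate(role, prompt):
    """Stand-in for a real model call; returns a canned reply tagged with the role."""
    return f"[{role}] thoughts on: {prompt}"

def run_agents(prompt, roles, turns=1):
    """Run a round-robin multi-agent conversation and return the transcript."""
    transcript = []
    context = prompt
    for _ in range(turns):
        for role in roles:
            reply = stub_generate(role, context)
            transcript.append(reply)
            context = reply  # each agent responds to the previous turn
    return transcript

log = run_agents("Should we cache models locally?", ["Researcher", "Critic"])
```

In a real run, each agent would also carry its own system instructions so that its replies stay in character.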

Advanced Model Downloading

For manual model downloads:

python model_downloader.py --model MODEL_ID --dir OUTPUT_DIR

For specific models like DeepHermes:

python deephermes_setup.py
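A command-line interface with these flags could be parsed with `argparse`; the sketch below mirrors the documented `--model`/`--dir` options and is illustrative, not the project's actual source:

```python
import argparse

def build_parser():
    """Build a parser matching the documented model_downloader flags."""
    parser = argparse.ArgumentParser(
        description="Download a model from the Hugging Face Hub.")
    parser.add_argument("--model", required=True,
                        help="HuggingFace model ID, e.g. org/name")
    parser.add_argument("--dir", required=True,
                        help="Output directory for downloaded files")
    return parser

args = build_parser().parse_args(
    ["--model", "NousResearch/Hermes-2-Pro-Mistral-7B", "--dir", "models"])
```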

Troubleshooting

Out of Memory Errors

Try a smaller model or enable 4-bit quantization in the model settings to reduce memory usage.
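A back-of-envelope estimate shows why quantization helps: the weights alone need roughly parameters × bits-per-parameter ÷ 8 bytes, so dropping from 16-bit to 4-bit cuts weight memory about fourfold (this ignores the KV cache and activations, which add more):

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate VRAM for model weights alone (excludes KV cache and activations)."""
    return n_params * bits_per_param / 8 / 1024**3

# A 7B-parameter model:
fp16 = weight_memory_gb(7e9, 16)      # roughly 13 GB, over budget on an 8GB card
four_bit = weight_memory_gb(7e9, 4)   # roughly 3.3 GB, comfortably within 8GB
```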

Model Not Found

Check that the model path is correct and the model is fully downloaded. Make sure you've selected the proper directory in the application settings.

CUDA Errors

Ensure you have compatible NVIDIA drivers installed and that your GPU has enough VRAM for the selected model.
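A quick way to confirm the driver is installed and a GPU is visible is to query `nvidia-smi`; a sketch that degrades gracefully when no NVIDIA tooling is present (the helper name is illustrative):

```python
import shutil
import subprocess

def detect_gpu():
    """Return the first line of `nvidia-smi -L` if an NVIDIA GPU is visible, else None."""
    if shutil.which("nvidia-smi") is None:
        return None  # driver/tooling not installed
    try:
        out = subprocess.run(["nvidia-smi", "-L"],
                             capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return None
    if out.returncode != 0 or not out.stdout.strip():
        return None
    return out.stdout.strip().splitlines()[0]

gpu = detect_gpu()
```

If this returns None on a machine with an NVIDIA card, reinstalling the driver is usually the first step.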