Initial commit: llama.cpp Server GUI

A professional PyQt6-based GUI for managing llama.cpp server instances.

Features:
- Server binary and model file selection
- Comprehensive server options (host, port, context, GPU layers, etc.)
- Start/Stop controls with non-blocking operations
- Real-time server log viewer
- Profile management (save/load/delete configurations)
- Configuration persistence
- System tray support
- Auto-start option

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-12 19:00:43 -05:00
commit 0356871946
4 changed files with 923 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,119 @@
+# llama.cpp Server GUI
+
+A professional PyQt6-based graphical interface for managing llama.cpp server instances.
+
+## Features
+
+- **Server Binary Selection**: Browse and select your llama.cpp server binary
+- **Model Selection**: Easy selection of GGUF model files
+- **Comprehensive Options**: Configure host, port, context length, GPU layers, threads, batch size, and more
+- **Start/Stop Controls**: Simple buttons to start and stop the server
+- **Real-time Logs**: View server output and errors in real-time
+- **Profile Management**: Save and load different configurations for different models/use cases
+- **Configuration Persistence**: All settings are saved between sessions
+- **System Tray Icon**: Minimize to tray to keep the server running in the background
+- **Auto-start**: Option to automatically start the server when the GUI launches
+
+## Requirements
+
+- Python 3
+- PyQt6
+- llama.cpp server binary
+
+## Installation
+
+1. Install PyQt6:
+```bash
+sudo apt install python3-pyqt6
+```
+
+2. Make sure you have llama.cpp compiled with the server binary
+
+## Usage
+
+Run the application directly (the script must be executable, e.g. after `chmod +x llama_server_gui.py`):
+```bash
+./llama_server_gui.py
+```
+
+Or:
+```bash
+python3 llama_server_gui.py
+```
+
+## Quick Start
+
+1. **Select Server Binary**: Click "Browse..." in the "Server Binary" section and navigate to your llama.cpp server binary (e.g., `/home/xero110/dev/llama.cpp/build/bin/llama-server`)
+
+2. **Select Model**: Click "Browse..." in the "Model Selection" section and choose your GGUF model file
+
+3. **Configure Options**: Adjust the server options as needed:
+   - Host: IP address to bind to (default: 127.0.0.1)
+   - Port: Port number (default: 8080)
+   - Context Length: Maximum context size (default: 2048)
+   - GPU Layers (ngl): Number of layers to offload to GPU (default: 33)
+   - Threads: CPU threads to use (default: 8)
+   - Batch Size: Batch size for processing (default: 512)
+   - Additional Arguments: Any extra command-line arguments
+
+4. **Start Server**: Click "Start Server"
+
+5. **Save Profile**: Once you have a configuration you like, click "Save Profile" to save it for later use
+
+## Profile Management
+
+- **Save Profile**: Saves the current configuration with a custom name
+- **Load Profile**: Select a profile from the dropdown and click "Load" to apply its settings (a profile also loads automatically when it is selected from the dropdown)
+- **Delete Profile**: Removes the selected profile
+- **Auto-start**: Check this option to automatically start the server when the GUI launches
+
+The log viewer at the bottom of the window records when profiles are saved or loaded and which settings are applied.
+
+## System Tray
+
+The application includes a system tray icon that allows you to:
+- Show/hide the main window
+- Start/stop the server from the tray menu
+- Quit the application
+
+When you close the window while the server is running, you can choose to:
+- Minimize to tray (server keeps running)
+- Stop server and quit
+- Cancel the close operation
+
+## Configuration File
+
+Settings are stored in `~/.llama_server_gui_config.json`
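+
+The layout of this file is not documented here. As a rough sketch only, assuming a hypothetical top-level `profiles` mapping from profile name to settings, reading and updating such a file from Python could look like the following. The helper names `load_config` and `save_profile` are illustrative and are not taken from the application.
+
+```python
+import json
+from pathlib import Path
+
+# Path taken from this README; the JSON schema below is an assumption.
+CONFIG_PATH = Path.home() / ".llama_server_gui_config.json"
+
+
+def load_config(path=CONFIG_PATH):
+    """Return the stored config, or an empty one if the file is missing or invalid."""
+    try:
+        return json.loads(Path(path).read_text(encoding="utf-8"))
+    except (FileNotFoundError, json.JSONDecodeError):
+        return {"profiles": {}}
+
+
+def save_profile(name, settings, path=CONFIG_PATH):
+    """Add or overwrite one named profile and write the file back."""
+    config = load_config(path)
+    config.setdefault("profiles", {})[name] = settings
+    Path(path).write_text(json.dumps(config, indent=2), encoding="utf-8")
+```
+
+Falling back to an empty config on a missing or corrupt file keeps a first launch from failing; the real application may handle this differently.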
+
+## Common Server Options Explained
+
+- **Context Length (-c)**: Maximum number of tokens the model can process at once. Larger values use more RAM/VRAM.
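+- **GPU Layers (-ngl)**: Number of model layers to offload to GPU. Higher = faster but uses more VRAM. To offload every layer, use a value at least as large as the model's layer count (e.g., 99); whether -1 is treated as "all layers" depends on the llama.cpp build, so check `llama-server --help`.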
+- **Threads (-t)**: Number of CPU threads for processing. Usually set to your CPU core count or less.
+- **Batch Size (-b)**: Number of tokens processed in parallel. Larger = faster but uses more memory.
+- **Host**: Network interface to bind to. Use 127.0.0.1 for local-only access, or 0.0.0.0 to allow network access.
+- **Port**: Network port for the server API.
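+
+To show how the options above map onto a command line, here is a minimal sketch that assembles an argument list from them. The function name `build_server_command` and its parameter names are hypothetical, the defaults mirror the ones listed in Quick Start, and the GUI's actual implementation may differ.
+
+```python
+import shlex
+
+
+def build_server_command(binary, model, host="127.0.0.1", port=8080,
+                         ctx=2048, ngl=33, threads=8, batch=512, extra=""):
+    """Return the argv list for launching the server with the given options."""
+    cmd = [
+        binary,
+        "-m", model,
+        "--host", host,
+        "--port", str(port),
+        "-c", str(ctx),
+        "-ngl", str(ngl),
+        "-t", str(threads),
+        "-b", str(batch),
+    ]
+    # Split the free-form "Additional Arguments" string the way a POSIX shell would.
+    cmd += shlex.split(extra)
+    return cmd
+```
+
+Passing a list like this to a process launcher, rather than a single shell string, avoids quoting problems with paths that contain spaces.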
+
+## Tips
+
+- For RTX 4070 (8GB VRAM): Start with ngl=33 and adjust based on your model size
+- With 96GB RAM and i9 CPU: You can use high thread counts (16-24) and large context sizes
+- Create different profiles for different models (e.g., "Llama-3-8B", "Mistral-7B", etc.)
+- Use the system tray to keep the server running while working on other tasks
+
+## Troubleshooting
+
+**Server won't start:**
+- Check that the binary path is correct and the file is executable
+- Verify the model path is correct
+- Check the logs for error messages
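+
+The first two checks can also be scripted. The following is a small sketch using only the Python standard library; `preflight_problems` is a hypothetical helper name, not part of the application.
+
+```python
+import os
+
+
+def preflight_problems(binary_path, model_path):
+    """Return a list of human-readable problems; an empty list means both paths look usable."""
+    problems = []
+    if not os.path.isfile(binary_path):
+        problems.append(f"server binary not found: {binary_path}")
+    elif not os.access(binary_path, os.X_OK):
+        problems.append(f"server binary is not executable: {binary_path}")
+    if not os.path.isfile(model_path):
+        problems.append(f"model file not found: {model_path}")
+    return problems
+```
+
+An empty result only means the paths exist and the binary is executable; the server can still fail for other reasons, so the logs remain the main source of detail.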
+
+**Out of memory errors:**
+- Reduce context length
+- Reduce GPU layers (ngl)
+- Use a smaller model
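+
+To see why reducing context length helps, a rough estimate of the key/value cache size is useful. The sketch below uses the standard formula (keys and values, per layer, per token) and assumes a 16-bit cache and the shape of an 8B Llama-3-style model (32 layers, 8 KV heads, head dimension 128). These figures are illustrative assumptions, and the estimate leaves out model weights and compute buffers.
+
+```python
+def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
+    """Approximate KV cache size in bytes for a given context length."""
+    # Factor 2: one key tensor and one value tensor per layer.
+    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem
+
+
+for n_ctx in (2048, 8192):
+    mib = kv_cache_bytes(n_ctx, n_layers=32, n_kv_heads=8, head_dim=128) / 2**20
+    print(f"{n_ctx} tokens -> {mib:.0f} MiB")
+```
+
+Under these assumptions the cache grows linearly with context length: 256 MiB at 2048 tokens and 1024 MiB at 8192 tokens.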
+
+**Slow performance:**
+- Increase GPU layers if you have VRAM available
+- Adjust thread count
+- Increase batch size (if you have memory available)