Tether Unveils Cross-Platform Local LLM Retraining System

Date: 2025-12-05 Author: Gabriel Deangelo Categories: BUSINESS
Tether Data AI, the company's research division, announced QVAC-fabric-llm, a framework for Low-Rank Adaptation (LoRA) retraining of large language models (LLMs) inside llama.cpp. The company describes it as the first solution to provide parameter-efficient training across a wide range of hardware, from mobile GPUs to desktop graphics cards.
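LoRA itself is a general technique rather than anything specific to this framework: the base weight matrix is frozen and only a low-rank update is trained, which is later folded back in. A minimal NumPy sketch of the idea follows; all names here are illustrative and are not the QVAC-fabric-llm API.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4   # r << d_in: the low-rank bottleneck
alpha = 8.0                  # LoRA scaling hyperparameter

W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

def lora_forward(x):
    # base path plus scaled low-rank update (alpha / r is the usual scaling)
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# with B zero-initialized, the adapted model starts out identical to the base
assert np.allclose(lora_forward(x), W @ x)

# "adapter merging" for deployment: fold the update back into the base weight
W_merged = W + (alpha / r) * (B @ A)
```

Only `A` and `B` would receive gradients during training, which is why the approach fits in the memory budget of a phone: the frozen base weights never need optimizer state.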

Tether says the technology makes LLM training hardware-independent, eliminating vendor lock-in. According to the developers, retraining can now run on any modern device, from mobile phones to server systems. A key capability is training on mobile GPUs, something previously considered impossible. Supported backends cover Adreno, Mali, and Apple mobile GPUs, as well as AMD, Intel, NVIDIA, and Apple desktop hardware.

QVAC-fabric-llm delivers, for the first time, cross-platform LoRA training for the Qwen3 and Gemma3 models, extending the functionality of llama.cpp. Tether Data AI is releasing multi-platform binaries, LoRA adapters retrained on-device, and the source code for the new modules, while preserving llama.cpp's public APIs. The code is distributed under the Apache 2.0 license.

For Tether, the largest issuer of the USDT stablecoin, local retraining matters for several reasons: building autonomous systems for regions with unstable infrastructure, protecting user privacy, and scaling without dependence on any specific GPU vendor. The company emphasizes that the ability to personalize LLM training on a user's own data is critical for the adoption of AI technologies.

Technically, QVAC-fabric-llm wraps the complete LoRA workflow inside llama.cpp behind an API for adapter initialization, training, and merging. It uses Vulkan for universal GPU support, Metal on Apple devices, and instruction-style retraining with masked losses. The system supports modern model architectures and dynamic sharding to work around the limitations of mobile GPU drivers.
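"Masked losses" in instruction tuning conventionally means that cross-entropy is computed only over the assistant's response tokens, with the prompt tokens masked out, so the model is penalized only on what it should have generated. A small NumPy sketch of that standard technique (not Tether's actual implementation):

```python
import numpy as np

def masked_ce_loss(logits, targets, loss_mask):
    """Cross-entropy averaged only over positions where loss_mask is 1.

    For instruction tuning, the mask is 0 over the prompt and 1 over the
    response, so prompt tokens contribute nothing to the gradient.
    """
    # numerically stable log-softmax over the vocabulary axis
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    token_nll = -log_probs[np.arange(len(targets)), targets]
    return float((token_nll * loss_mask).sum() / loss_mask.sum())

# toy example: 5 tokens, vocab of 10; the first 3 tokens are the prompt
rng = np.random.default_rng(1)
logits = rng.standard_normal((5, 10))
targets = np.array([2, 7, 1, 4, 9])
mask = np.array([0, 0, 0, 1, 1])   # loss only on the last two tokens
loss = masked_ce_loss(logits, targets, mask)
```

Because the masked positions are zeroed out, changing a prompt token's target label leaves the loss unchanged, which is exactly the property the mask is there to provide.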

Tests showed that retraining quality on mobile and desktop devices is comparable to a PyTorch/Hugging Face baseline. The resulting models demonstrate high accuracy on biomedical tasks, and their outputs show high cosine similarity to the PyTorch reference. Tether notes that the technology is particularly useful in medicine, science, and regulated financial services.
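Cosine similarity is a common way to check that two implementations of the same model produce matching activations or adapter weights. A generic sketch of such a comparison (illustrative only; this is not Tether's evaluation code):

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two tensors, flattened to vectors."""
    a, b = np.ravel(a), np.ravel(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

# toy check: a reference tensor vs. a slightly perturbed copy, standing in
# for e.g. a PyTorch baseline output vs. the on-device result
rng = np.random.default_rng(2)
ref = rng.standard_normal(1024)
test = ref + 1e-3 * rng.standard_normal(1024)

assert cosine_similarity(ref, ref) > 0.999   # identical tensors: ~1.0
sim = cosine_similarity(ref, test)           # small perturbation stays near 1.0
```

A similarity near 1.0 across layers suggests the two training stacks are numerically equivalent, even if individual low-level kernels differ between backends.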

The company's future plans include expanding support for the GPTQ-INT8 and Q5_K_M formats, optimizing GPU operators, and improving memory management. QVAC-fabric-llm brings data center capabilities to user devices, paving the way for on-premises and secure AI solutions. Tether also recently signed a lease agreement for 20,000 GPUs to support its AI business through a partnership with Rumble and Northern Data.
