Galaxy Digital Inc.


Decentralized AI Training: Architectures, Opportunities, and Challenges

Introduction

Last year, Galaxy Research published its first piece on the intersection of crypto and AI. It examined ways cryptocurrency's trustless and permissionless infrastructure could serve as a foundation for AI innovation. Among them were the emergence of decentralized marketplaces for processing power, or compute, in response to shortages of graphics processing units (GPUs); the early promise of zero-knowledge machine learning (zkML) for verifiable onchain inference; and the potential for autonomous AI agents to streamline complex interactions and use crypto as a native medium of exchange.

At the time, many of these initiatives were nascent, compelling proofs of concept that hinted at practical advantages over centralized offerings but not yet scaled enough to reshape the AI landscape. In the year since, however, meaningful progress has been made toward making decentralized AI a reality. To capture this momentum and surface the most promising developments, over the coming year, Galaxy Research will publish a series of pieces that dive into specific verticals at the crypto x AI frontier.

This first piece focuses on decentralized training, highlighting projects working to enable permissionless training of foundation models on a global scale. Their motivation is twofold. Practically, they recognize that vast amounts of unused GPUs around the world can be tapped for model training, opening up an otherwise unaffordable process to AI engineers everywhere and making open-source AI development a reality. Philosophically, these teams are motivated by the tight control that the leading centralized AI labs wield over one of the most important technological revolutions of our time and the urgent need to create open alternatives.

For the crypto space more broadly, enabling decentralized training and post-training of foundational models is a critical step toward building a fully onchain AI stack that is permissionless and accessible at every layer. GPU marketplaces can plug into models to provide needed hardware for training and inference. zkML providers can be used to verify model outputs and protect privacy. AI agents can serve as composable building blocks, stitching together models, data sources, and protocols into higher-order applications.

This report explores the underlying architectures of decentralized AI protocols, the technical problems they aim to solve, and the prospects for decentralized training. The underlying premise for crypto and AI remains unchanged from a year ago. Crypto provides AI with a permissionless, trustless, and composable settlement layer for transferring value. The challenge now is to prove that decentralized approaches can deliver practical advantages over their centralized counterparts.

Model Training Basics

Before diving into the latest developments in decentralized training, it's important to gain a basic understanding of LLMs and their underlying architecture. This will help readers understand how these projects work, as well as the major problems they are trying to solve.

Transformers

Large language models (LLMs) such as ChatGPT are powered by an architecture known as the transformer. Transformers were introduced in a 2017 Google paper entitled "Attention Is All You Need" and are one of the most important innovations in AI development. Simply put, transformers ingest data (referred to as tokens) and apply a variety of mechanisms to learn how those tokens relate to one another.

The relation between the tokens is modeled using weights. Weights can be thought of as millions to trillions of knobs that make up the model and are constantly being dialed until it can consistently predict the next token in a sequence. Once training is completed, the model can essentially capture the patterns and meanings that underlie human language.

Key components of transformer training include:

  • Forward Pass: In the first step of the training process, the transformer is fed a batch of tokens from a larger dataset. Based on these inputs, the model then tries to predict what the next token should be. At the start of a training run, the model's weights are random.

  • Loss Computation: Forward pass predictions are then used to calculate a loss score that measures how far those predictions are from the actual tokens in the original data batch fed to the model. In other words, how do the predictions that the model produced during the forward passes compare to the actual tokens in the larger dataset being used to train it? During training, the goal is to reduce this loss score to make the model more accurate.

  • Backward Pass: The loss score is then used to compute a gradient for each weight. These gradients tell the model how to adjust its weights to reduce loss before the next forward pass.

  • Optimizer Update: An optimizer algorithm reads those gradients and adjusts each weight to reduce the loss.

  • Repeat: The above steps are repeated until all the data has been consumed and the model begins to reach convergence - in other words, when further optimizations no longer produce significant reductions in loss or improvements in performance.
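
For readers who prefer code, the loop above can be compressed into a short sketch. This is a minimal illustration using PyTorch with a toy model and random placeholder data, not a description of any production training run.

```python
import torch
import torch.nn as nn

# Toy next-token predictor; a real LLM would be a transformer with billions of weights.
vocab_size = 1000
model = nn.Sequential(nn.Embedding(vocab_size, 64), nn.Flatten(), nn.Linear(64 * 16, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

for step in range(1000):
    # Forward pass: feed a batch of 16-token sequences and predict the next token.
    tokens = torch.randint(0, vocab_size, (32, 16))    # placeholder batch of data
    targets = torch.randint(0, vocab_size, (32,))      # the "actual" next tokens
    predictions = model(tokens)

    # Loss computation: how far are the predictions from the actual tokens?
    loss = loss_fn(predictions, targets)

    # Backward pass: compute a gradient for every weight.
    optimizer.zero_grad()
    loss.backward()

    # Optimizer update: nudge each weight to reduce the loss, then repeat.
    optimizer.step()
```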

Training (pre- and post-training)

The full model training process includes two discrete steps: pre-training and post-training. The steps described above are the core components of the pre-training process. Once completed, they produce a pre-trained base model, often referred to as a foundational model.

However, models often require further refinement after pre-training, a stage called post-training. Post-training is used to further improve the base model in a variety of ways, including making it more accurate or tailoring it for specific use cases (such as translation or medical diagnosis).

Post-training is a critical step in making LLMs the powerful tools they are today. There are several different methods used in post-training. Two of the most prevalent are:

  • Supervised Fine-tuning (SFT): SFT largely resembles the pre-training process described above. The primary difference is that the base model is trained on a more curated set of data or prompts and answers, so it learns to follow specific instructions or specialize in a domain.

  • Reinforcement Learning (RL): RL improves a model not by feeding it new data, but by scoring the model's outputs with a reward and having the model update its weights to maximize that reward. More recently, RL has been used by reasoning models (covered below) to improve their outputs. As questions over pre-training scaling have arisen in recent years, RL and reasoning models have become a major advancement in post-training because they can meaningfully enhance a model's performance without requiring additional data or large amounts of compute.

RL post-training, specifically, is well-suited for addressing obstacles faced in decentralized training (discussed below). That's because most of the time in RL, the model is generating lots of outputs using forward passes (where the model makes predictions but doesn't change itself yet). These forward passes don't require coordination or communication between machines and can be done asynchronously. They are also parallelizable, meaning they can be broken down into independent sub-tasks that can be performed simultaneously across multiple GPUs. That's because each rollout can be computed in isolation, letting training runs scale up throughput by simply adding compute. Only once the best answers are picked does the model update its internal weights, reducing the frequency at which machines need to sync.
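
To illustrate why RL rollouts parallelize so naturally, the sketch below generates candidate answers independently (in practice, on separate GPUs or machines) and only performs a single, infrequent weight update after the best ones are scored. The model, reward function, and update step are placeholders, not any specific project's implementation.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def generate_rollout(prompt, seed):
    # Forward passes only: produce an answer without touching the model's weights.
    random.seed(seed)
    return {"prompt": prompt, "answer": f"candidate-{seed}", "reward": random.random()}

prompts = ["What is 17 * 24?"] * 8

# Each rollout is computed in isolation, so throughput scales by simply adding workers.
with ThreadPoolExecutor(max_workers=8) as pool:
    rollouts = list(pool.map(lambda i: generate_rollout(prompts[i], i), range(8)))

# Only after scoring are the best answers selected for a single weight update,
# so machines need to synchronize far less often than in pre-training.
best = sorted(rollouts, key=lambda r: r["reward"], reverse=True)[:2]
# update_weights(best)  # placeholder for the infrequent synchronization step
```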

Once a model has been trained, the process of using it to generate outputs is called inference. Unlike training, which involves adjusting millions or billions of weights, inference keeps those weights fixed and simply applies them to new inputs. For an LLM, inference means taking a prompt, running it through the model's layers, and predicting the most likely next tokens one step at a time. Because inference doesn't require backpropagation (the process of adjusting the model's weights based on its errors) or weight updates, it is far less computationally demanding than training, but still resource-intensive due to the sheer size of modern models.
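
A minimal sketch of inference, again with a placeholder model (assumed to return one set of prediction scores per input position): weights stay frozen, and the model predicts the most likely next token one step at a time.

```python
import torch

@torch.no_grad()  # no backpropagation or weight updates during inference
def generate(model, prompt_tokens, num_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(num_new_tokens):
        logits = model(torch.tensor([tokens]))         # run the prompt through the model's layers
        next_token = int(torch.argmax(logits[0, -1]))  # pick the most likely next token
        tokens.append(next_token)                      # feed it back in and repeat
    return tokens
```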

Simply put: Inference is what powers applications like chatbots, code assistants, and translation tools. It's the stage where the model puts its "learned knowledge" into action.

Training Overhead

Facilitating the above training processes is resource-intensive and requires highly specialized software and hardware to conduct at scale. The world's leading AI labs are already spending at unprecedented levels, ranging from hundreds of millions to several billion dollars. OpenAI CEO Sam Altman has stated that GPT-4 cost over $100 million to train, while Anthropic's CEO Dario Amodei has indicated that training runs exceeding $1 billion are already underway.

A major share of these costs stems from GPUs. Top-tier GPUs such as NVIDIA's H100 or B200 can cost upwards of $30,000 per unit, and OpenAI reportedly plans to deploy over one million GPUs by the end of 2025. However, raw GPU power isn't enough. These systems must be deployed in high-performance data centers equipped with ultra-fast communication infrastructure. Technologies like NVIDIA NVLink enable rapid data exchange between GPUs within a server, while InfiniBand connects clusters of servers, allowing them to function as a single, unified compute fabric.

Sample DGX H100 architecture: NVLink connects GPUs (light green rectangles) inside the system, while InfiniBand (green lines) links servers into a unified fabric. (Source.)

As a result, the majority of foundational models have been developed by centralized AI labs such as OpenAI, Anthropic, Meta, Google, and xAI. Only such behemoths have the copious resources needed for training. While this has led to major breakthroughs in model training and performance, it has also consolidated control of the development of leading foundational models to just a few entities. Additionally, evidence is emerging that pre-training scaling may be hitting diminishing returns, limiting the effectiveness of simply adding compute or data to enhance the intelligence of a pre-trained model.

In response, over the past few years, a cohort of AI engineers has begun developing new ways to train models by trying to solve these technical complexities and reduce the enormous resource requirements. For the purposes of this paper, this effort will be referred to as decentralized training.

Decentralized and Distributed Training

Bitcoin's success has proven that computation and capital can be coordinated in a decentralized manner to secure a large economic network. Decentralized training aims to leverage features of cryptocurrency, including permissionlessness, trustlessness, and incentive mechanisms, to build decentralized networks that can train powerful foundational models on par with centralized providers.

In decentralized training, nodes in separate locations across the world work on a permissionless, incentivized network to contribute to the training of AI models. This is in contrast to distributed training, which refers to models trained over different geographies, but by single or multiple entities that have been permissioned (i.e., through a whitelisting process). For decentralized training to exist, however, distributed training has to be viable. Many centralized labs, recognizing hard constraints on their training setups, have begun to explore ways to pursue distributed training that yields results on par with their existing setups.

There are a few practical roadblocks that have prevented decentralized training from becoming a reality:

  • Communication Overhead: When nodes are geographically separated, they do not have access to the communication infrastructure described above. Decentralized training needs to account for standard internet speeds, the frequent transfer of large amounts of data, and the synchronization of GPUs during the training process.

  • Verification: Permissionless by nature, decentralized training networks are designed to let anyone contribute compute. As a result, they must develop verification mechanisms that prevent contributors from trying to derail the network with incorrect or malicious inputs or game the system to earn incentives without contributing productive work.

  • Compute: Decentralized networks must also aggregate enough compute to train the models regardless of scale. While in some ways this plays to the strengths of decentralized networks because these networks are designed to let anyone with a GPU contribute to the training process, it also introduces complexities because these networks must coordinate heterogeneous compute.

  • Incentives/Funding/Ownership and Monetization: Decentralized training networks must devise incentive mechanisms and ownership/monetization models that effectively ensure the network's integrity and reward compute providers, validators, and model designers for their contributions. This is in direct contrast to centralized labs, where a sole company builds and monetizes the model.

Despite these constraints, a number of projects are pursuing decentralized training on the belief that the control of foundational models shouldn't rest with a handful of corporations. They aim to tackle risks posed by centralized training, such as single points of failure due to reliance on a few centralized offerings; data privacy and censorship; scalability; and AI alignment and bias. More broadly, they view open-source AI development as a necessity, not a nice-to-have. Without open, verifiable infrastructure, innovation will be stifled, access will be restricted to a privileged few, and society will inherit AI systems shaped by narrow corporate incentives. Decentralized training, in this view, is not only about building competitive models but also about creating a resilient, transparent, and participatory ecosystem that reflects collective rather than proprietary interests.

Projects Overview

Below, we provide an in-depth overview of the underlying mechanics powering several decentralized training projects.

Nous Research 

Background

Nous Research was founded in 2022 as an open-source AI research organization. The team started as an informal group of open-source AI researchers and developers working on finding solutions to limitations in open-source AI code. The mission is "creating and serving the best models out in the open."

Early on, the team identified decentralized training as a primary roadblock. Specifically, they recognized that access to GPUs and tooling for coordinating communication between GPUs was primarily developed to suit large, centralized AI companies, leaving little space for resource-constrained organizations to participate in meaningful development. NVIDIA's latest Blackwell GPUs (such as the B200), for example, can communicate with each other at speeds up to 1.8 terabytes per second using the NVLink Switch System. This rivals the total bandwidth of major internet infrastructure and is only possible in centralized, data center-scale deployments, making it nearly impossible for smaller or distributed networks to match the performance of large AI labs without rethinking communication strategies.

Prior to tackling decentralized training, Nous had made significant contributions to the AI space generally. In August 2023, Nous published "YaRN: Efficient Context Window Extension of Large Language Models." This paper tackled a simple but important problem: most AI models can only remember and work with a fixed amount of text at once (their "context window"). For example, a model trained with a 2,000-word limit quickly starts forgetting or losing track of information if given longer documents. YaRN introduced a way to stretch this limit much further without retraining the model from scratch. It adjusts how the model keeps track of word positions (like bookmarks in a book), so that it can still follow the flow of information even when the text is tens of thousands of words long. The method lets models handle sequences up to 128,000 tokens - about the length of Mark Twain's "The Adventures of Huckleberry Finn" - while using much less computing power and training data than older approaches. In plain terms, YaRN made it possible for AI models to "read" and understand much longer documents, conversations, or datasets in one go. This was a major step forward for scaling AI capabilities and has since been picked up by the wider research community, including OpenAI and China's DeepSeek.

DeMo and DisTrO

In March 2024, Nous published a breakthrough in distributed training entitled Decoupled Momentum Optimization (DeMo). DeMo was developed by Nous researchers Bowen Peng and Jeffrey Quesnelle, in collaboration with Diederik P. Kingma (a founding member of OpenAI and co-inventor of the Adam optimizer). It is a primary building block in the Nous decentralized training stack that reduces communication overhead in distributed data-parallel model training setups by cutting down on the data exchanged between GPUs. In data-parallel training, each node keeps a complete copy of the model's weights, but the dataset is split into chunks processed by different nodes.

AdamW is one of the most commonly used optimizers in model training. A key function of AdamW is to smooth something called momentum, a running average of past changes to a model's weights. Essentially, AdamW helps remove noise introduced in the data parallelism training process to increase training efficiency. Nous Research built on AdamW with DeMo, creating a new optimizer that splits the momentum into local and shared parts across different trainers. This reduces the amount of communication required between nodes by limiting the amount of data they have to share with each other.

DeMo selectively focuses on the parameters that are changing the most rapidly across iterations per GPU. The logic is simple: parameters undergoing large changes are the most critical to learning and should be synchronized across workers with higher priority. Meanwhile, slower-moving parameters can lag temporarily without significantly hurting convergence. In effect, this filters out noisy updates while incorporating the most meaningful ones. Nous also incorporated compression techniques, including the Discrete Cosine Transform (DCT), an approach similar to how JPEGs shrink images, to further reduce the amount of data sent. By synchronizing only the most important updates, DeMo reduced communication overhead by 10x to 1,000x depending on the model size. For a full technical overview of the underlying optimization, refer to this blog post by Nous Research.
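
A heavily simplified sketch of the idea follows. It is illustrative only: each worker keeps its own momentum, shares just the largest (fastest-moving) components with peers, and holds the remainder back as a local residual. The actual DeMo identifies fast components in DCT (frequency) space and includes additional machinery described in the paper.

```python
import numpy as np

def demo_style_sync(local_momentum, k):
    """Split momentum into a small shared part and a local residual (simplified)."""
    flat = local_momentum.ravel()
    idx = np.argsort(np.abs(flat))[-k:]     # the k fastest-moving components
    shared = np.zeros_like(flat)
    shared[idx] = flat[idx]                 # only these values are sent to other workers
    residual = flat - shared                # the rest stays local and keeps accumulating
    return shared.reshape(local_momentum.shape), residual.reshape(local_momentum.shape)

momentum = np.random.randn(1024, 1024)      # placeholder momentum tensor
shared, residual = demo_style_sync(momentum, k=1024)
# Communication drops from ~1M values to ~1K values (plus indices) per sync.
```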

In June 2024, the Nous team introduced its second major innovation, Distributed Training Over-the-Internet (DisTrO). While DeMo provides the core optimizer innovation, DisTrO incorporates it into a broader framework of optimizers that further compress the information shared between GPUs and address issues like GPU synchronization, fault tolerance, and load balancing. In December 2024, Nous demonstrated the viability of this approach by using DisTrO to train a 15 billion-parameter model on a Llama-style architecture.

Psyche

In May, Nous unveiled Psyche, a framework for coordinating decentralized training that further innovates on the DeMo and DisTrO optimizer architectures. Notable technical upgrades in Psyche include improving asynchronous training by enabling GPUs to send model updates while they begin training on the next step. This minimizes idle time and brings GPU utilization closer to that of centralized, tightly coupled systems. Psyche also further improves on compression techniques introduced with DisTrO, shrinking communication payloads by an additional 3x.

Psyche can be implemented using either a fully onchain (via Solana) or offchain setup. It has three main actors: coordinator, clients, and data providers. The coordinator houses all the information necessary to facilitate the training run, including the latest state of the model, the clients participating, and the assignment of data and output verification. Clients are the actual GPU providers that execute the training tasks during a training run. In addition to model training, they participate in the witness process (covered below). Data providers, a role clients can also fill themselves, supply the data needed for training.

Nous Psyche Training Architecture

Psyche divides training into two distinct intervals, epochs and steps. This creates natural entry and exit points for clients, allowing them to participate without committing to a full training run. This structure helps minimize opportunity costs for GPU providers who may not be able to commit their resources for the full duration of the run.

At the start of an epoch, the coordinator defines key parameters: the model architecture, the dataset to be used, and the number of clients required. A short warmup phase follows, where clients sync to the most recent model checkpoint, either from a public source or peer-to-peer from other clients.  Once training begins, each client is assigned a portion of the data and conducts a training step locally. After computing an update, the client broadcasts its result to the rest of the network, along with a cryptographic commitment (a SHA-256 hash proving the work was done correctly).

A subset of clients is randomly selected to act as witnesses during each round and serve as the primary verification mechanism for Psyche. These witnesses train as usual, but also verify which client updates were received and valid. They submit Bloom filters, which are lightweight data structures that efficiently summarize this participation, to the coordinator. While Nous itself admits that this method is imperfect because it can yield false positives, the researchers are willing to accept that tradeoff for increased efficiency. Once a quorum of witness confirmations is reached for a given update, the coordinator applies the update to the global model and allows all clients to synchronize their models before proceeding to the next round.
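
As a toy illustration of this flow (not Psyche's actual wire format), a client can commit to its update with a SHA-256 hash, and a witness can summarize the commitments it observed in a small Bloom filter:

```python
import hashlib

def commit(update_bytes: bytes) -> str:
    # Client side: a SHA-256 commitment proving which update was produced.
    return hashlib.sha256(update_bytes).hexdigest()

class TinyBloomFilter:
    # Witness side: a compact summary of observed commitments (false positives possible).
    def __init__(self, num_bits=1024, num_hashes=3):
        self.bits = [0] * num_bits
        self.num_bits, self.num_hashes = num_bits, num_hashes

    def _positions(self, item: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def probably_contains(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))

commitment = commit(b"client-42 update for step 1037")   # hypothetical client update
witness_filter = TinyBloomFilter()
witness_filter.add(commitment)                            # witness records the update it saw
assert witness_filter.probably_contains(commitment)
```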

Crucially, Psyche is designed to allow training and verification to overlap. As soon as a client submits its update, it can immediately begin training the next batch rather than having to wait for the coordinator or other clients to finalize their previous round. This overlapping design, along with DisTrO's compression, ensures that communication overhead remains minimal and that GPUs are not left idle.

Client Interaction Workflow During Training Process

In May 2025, Nous Research launched its largest training run yet: Consilience, a 40-billion-parameter transformer being pretrained on roughly 20 trillion tokens across Psyche's decentralized training network. Training is still underway. So far, the run has been largely smooth, but a handful of loss spikes have appeared, signaling that the optimization trajectory briefly drifted away from convergence. In response, the team rolled back to the last healthy checkpoint and wrapped the optimizer with OLMo's Skip-Step safeguard, which automatically skips any update whose loss or gradient norm is several standard deviations away from the mean, reducing the risk of future loss spikes.

Solana's Role 

While Psyche can operate in an offchain environment, it's intended to be used on the Solana blockchain. Solana serves as the trust and accountability layer for the training network, recording client commitments, witness attestations, and training metadata onchain. This creates an immutable audit trail for each training round, enabling transparent verification of who contributed, what work was done, and whether it passed validation. 

Nous also plans to use Solana to facilitate training reward distribution. Although the project has not yet released formal tokenomics, Psyche documentation outlines a system in which the coordinator would track client compute contributions and assign points based on verified work. These points could then be redeemed for tokens using a treasurer smart contract, which functions as an onchain escrow. Clients who complete valid training steps could claim rewards directly from this contract according to their contribution. Psyche has yet to use the reward mechanism in training runs, but the system is expected to play a central role in distributing the Nous crypto token once it is officially launched.

Hermes Model Series

Alongside these research contributions, Nous has established itself as a leading open-source model developer through its Hermes series of instruction-tuned LLMs. In August 2024, the team launched Hermes-3, a suite of full-parameter fine-tunes of Llama 3.1 that achieved competitive results on open leaderboards, placing it alongside much larger proprietary models despite its comparatively modest scale.

Most recently, in August 2025, Nous unveiled Hermes-4, its most advanced family of models yet. Hermes-4 focused on making models better at step-by-step reasoning while still excelling at general instruction-following. It showed strong results across math, coding, comprehension, and general knowledge tests. True to Nous' open-source mission, the team publicly released all Hermes-4 model weights for anyone to use and build on. Additionally, Nous released an accessible interface for the model called Nous Chat, providing free access for the first week following its launch.

The Hermes model releases not only cement Nous' credibility as a model-building organization but also serve as practical validation of its broader research agenda. Each Hermes release provided evidence that cutting-edge capabilities could be achieved in the open, laying the groundwork for the team's decentralized training breakthroughs (DeMo, DisTrO, and Psyche) and culminating in the ambitious Consilience 40B run.

Atropos

As discussed above, RL is playing an increasingly important role in post-training due to advancements in reasoning models and scaling limitations with pre-training. Atropos is Nous' solution to RL in a decentralized setting. It is a plug-and-play modular RL framework for LLMs that is adaptable to different inference backends, training methods, datasets, and RL environments.

When conducting RL post-training in a decentralized manner with a large number of GPUs, the prompt outputs that the models generate during the training process will have different completion times. Atropos serves as a rollout handler, i.e., a central orchestrator, for coordinating task generation and completion across devices, enabling asynchronous RL training.

An initial release of Atropos launched in April, but for now only includes an environments framework that coordinates the tasks for RL. Nous plans to release complementary training and inference frameworks in the coming months.

Prime Intellect 

Background

Founded in 2024, Prime Intellect is a company dedicated to building infrastructure for decentralized AI development at scale. The team, co-founded by Vincent Weisser and Johannes Hagemann, began by focusing on aggregating compute resources from centralized and decentralized providers to support collaborative, distributed training of advanced AI models. The mission is to democratize AI development, enabling researchers and developers worldwide to access scalable compute and collectively own open AI innovation.

OpenDiLoCo, Intellect-1, and PRIME

In July 2024, Prime Intellect released OpenDiLoCo, an open-source version of DiLoCo, a low-communication model training method developed by Google DeepMind for data-parallel training. Google developed the method based on the view that "at modern scale, training via standard backpropagation poses unprecedented engineering and infrastructure challenges…it is difficult to collocate and tightly synchronize a large number of accelerators." While this statement focuses on the practicality of large-scale training, rather than the ethos of open-source development, it is a tacit acknowledgement of the limitations of centralized training over the long term and the need for distributed alternatives.

DiLoCo reduces the frequency and amount of information shared between GPUs training a model. In a centralized setting, GPUs share all the updated gradients with each other after each step of training. In DiLoCo, updated gradients are shared less frequently to reduce communication overhead. This creates a dual-optimization architecture, where individual GPUs (or clusters of GPUs) run an inner optimization that updates weights on their own models after every step, and an outer optimization where the inner optimizations are shared among GPUs, which then all update with the aggregate of the changes made.
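
A rough sketch of this dual-optimizer structure is below. It is illustrative, not DiLoCo's exact recipe: each worker runs many local AdamW steps, and only the accumulated weight change is shared and averaged in the outer step (the published method applies Nesterov momentum to that outer update).

```python
import copy
import torch
import torch.nn.functional as F

def diloco_style_round(global_model, worker_shards, inner_steps=500, outer_lr=0.7):
    """One outer step: workers train locally, then their weight deltas are averaged."""
    deltas = []
    for shard in worker_shards:                          # each shard is a list of (batch, targets)
        local_model = copy.deepcopy(global_model)
        inner_opt = torch.optim.AdamW(local_model.parameters(), lr=3e-4)
        for batch, targets in shard[:inner_steps]:       # inner optimization, no communication
            loss = F.cross_entropy(local_model(batch), targets)
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()
        # Only the accumulated change in weights is communicated, not per-step gradients.
        deltas.append([lp.data - gp.data for lp, gp in
                       zip(local_model.parameters(), global_model.parameters())])

    # Outer optimization: apply the average delta to the shared global model.
    for i, gp in enumerate(global_model.parameters()):
        avg_delta = torch.stack([d[i] for d in deltas]).mean(dim=0)
        gp.data.add_(outer_lr * avg_delta)
```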

In its initial release, OpenDiLoCo demonstrated 90% to 95% GPU utilization, meaning almost none of the machines were idle despite being sprinkled across two continents and three countries. OpenDiLoCo reproduced training results and performance comparable to its centralized counterparts while relying on 500x less communication (evidenced in the chart below by the purple line catching up with the blue one). For a visualization of the training process, watch this video released by the Prime Intellect team demonstrating OpenDiLoCo in action.

The vertical axis represents perplexity, a measure of how well a model predicts the next token in a sequence. Lower perplexity means the model is more confident and accurate in its predictions. Source: Prime Intellect.

In October 2024, Prime Intellect began training INTELLECT-1, the first 10 billion-parameter language model trained in a distributed manner. The run was conducted across five countries on three continents and took 42 days, after which the model was open-sourced. The training run demonstrated incremental improvements in distributed training, with 83% compute utilization across all nodes and 96% utilization among the subset of nodes located in the United States. GPUs for the project were sourced from Web2 and Web3 providers, including crypto GPU marketplaces such as Akash, Hyperbolic, and Olas.

INTELLECT-1 used Prime Intellect's new training framework, PRIME, which allows the Prime Intellect training system to adapt when compute unexpectedly enters and leaves ongoing training runs. It introduces innovations such as ElasticDeviceMesh that let contributors drop in or out on the fly.

Active training nodes over training steps, demonstrating training architecture's ability to handle dynamic node participation. Source: Prime Intellect

INTELLECT-1 was a major validation of Prime Intellect's approach to decentralized training and received praise from AI thought leaders like Jack Clark (co-founder of Anthropic) as a viable demonstration of decentralized training.

Protocol

In February, Prime Intellect added another layer to its stack with the introduction of Protocol. Protocol ties together all Prime Intellect's training tooling to create a peer-to-peer network for decentralized model training. This includes:

  • Compute exchange for GPUs to contribute to training runs.

  • The PRIME training framework, which reduces communication overhead and improves fault tolerance.

  • An open-source library called GENESYS for synthetic data generation and verification useful in RL fine-tuning.

  • A lightweight verification system called TOPLOC for validating model execution and output from participating nodes.

Protocol plays a similar role to Nous' Psyche and has four main actors: 

  • Workers: Software that enables users to contribute their compute resources for training or other Prime Intellect AI-related products.

  • Validators: Verify the contribution of compute and work to prevent malicious behavior. Prime Intellect is working to adapt TOPLOC, a state-of-the-art inference verification algorithm, to decentralized training.

  • Orchestrator: A way for compute pool creators to manage workers. This fulfills a similar role to Psyche's coordinator in the Nous stack. 

  • Smart Contracts: Track who supplied compute, slash bad actors' stakes, and pay out rewards autonomously. Currently running on Ethereum L2 Base's Sepolia testnet, but Prime Intellect has stated its intent to eventually transition to its own chain. 

Protocol training, step by step. Source: Prime Intellect

Protocol is designed to eventually let contributors own a stake in the model or earn rewards for their work while giving open-source AI projects new ways to fund and manage development through smart contracts and collective incentives.

INTELLECT 2 and Reinforcement Learning

In April, Prime Intellect began training a 32 billion parameter model called INTELLECT-2. While INTELLECT-1 focused on training a foundational model, INTELLECT-2 used RL on another open-source model (Alibaba's QwQ-32B) to train a reasoning model.

The team introduced two key infrastructure components to make this kind of decentralized RL training practical: 

  • PRIME-RL, a fully asynchronous reinforcement learning framework that splits the process into three independent stages: generating candidate answers; training on selected ones; and broadcasting updated model weights. This decoupling allows the system to work across unreliable, slow, or geographically dispersed networks. The training used another Prime Intellect innovation, GENESYS, to generate thousands of math, logic, and coding problems, along with automatic checkers that could instantly grade whether an answer was right or wrong. 

  • SHARDCAST, a new system for distributing large files (such as updated model weights) quickly across the network. Instead of each machine downloading updates from a central server, SHARDCAST uses a structure where machines share updates with one another. This keeps the network efficient, fast, and resilient. 
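
A toy sketch of the decoupling idea is shown below (not Prime Intellect's code): generation, training, and weight broadcasting communicate only through queues, so slow or unreliable workers never block the others.

```python
import queue
import threading
import time

rollout_queue, weights_queue = queue.Queue(), queue.Queue()

def generator(worker_id):
    # Stage 1: produce candidate answers asynchronously, whenever this worker is available.
    for step in range(3):
        rollout_queue.put({"worker": worker_id, "answer": f"rollout-{worker_id}-{step}"})
        time.sleep(0.01 * worker_id)   # workers run at different speeds

def trainer():
    # Stage 2: train on whichever rollouts have arrived, then publish new weights.
    for version in range(3):
        batch = [rollout_queue.get() for _ in range(4)]
        weights_queue.put({"version": version, "trained_on": len(batch)})

threads = [threading.Thread(target=generator, args=(i,)) for i in range(4)]
threads.append(threading.Thread(target=trainer))
for t in threads:
    t.start()
for t in threads:
    t.join()

# Stage 3: broadcast each published weight version back to the generators
# (SHARDCAST-style peer-to-peer distribution is approximated by draining the queue).
while not weights_queue.empty():
    print("broadcast", weights_queue.get())
```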

Intellect-2 Distributed RL Training Infrastructure. (Source: Prime Intellect)

For INTELLECT-2, contributors were also required to stake testnet crypto tokens to take part in the training run. If they contributed valid work, they were rewarded automatically. If not, their stake could be slashed. While no real money was involved during this test run, it highlights early forms of crypto-economic experimentation. Significantly more experimentation is required in this area, and we expect further changes in how crypto economics are used for security and incentivization. Beyond INTELLECT-2, Prime Intellect is continuing work on several major initiatives not covered in this report, including:

  • SYNTHETIC-2, a next-generation framework for generating and verifying reasoning tasks;

  • Prime Collective Communications Library, which implements efficient, fault-tolerant collective communications operations (such as reductions over IP) and provides shared-state synchronization mechanisms that keep peers in sync, allow peers to join and leave dynamically at any point during training, and automatically optimize topology based on available bandwidth;

  • Ongoing enhancements to TOPLOC for scalable, low-cost proof-of-inference to verify model outputs; and

  • The improvement of Prime Intellect's protocol and crypto-economic layers based on lessons from INTELLECT-2 and SYNTHETIC-1.

Pluralis Research

Alexander Long, an Australian machine learning researcher with a PhD from the University of New South Wales, viewed open-source model training as overly reliant on the leading AI labs providing foundational models for others to train. In April 2023, he set up Pluralis Research to chart a different path.

Pluralis Research tackles decentralized training with an approach called Protocol Learning, which it describes as "low-bandwidth, heterogeneous multi-participant, model-parallel training and inference." A major distinguishing feature of Pluralis is its economic model, which gives contributors to a model's training equity-like upside to incentivize compute contributions and attract top-tier open-source researchers. The economic model is premised on a core property of "unextractability": that no single participant can obtain the full set of weights, which in turn is tied to the training methodology and the use of model parallelism.

Model Parallelism

Pluralis' training architecture leverages model parallelism, different from the data parallel approach implemented by Nous Research and Prime Intellect in their initial training runs. As model size grows, even an H100 rack (one of the most advanced GPU setups) becomes insufficient to hold the full model. Model parallelism introduces one solution to this problem by splitting individual components of a single model across multiple GPUs.

There are three primary approaches to model parallelism.

  • Pipeline parallelism: The model's layers are divided across different GPUs. Each mini-batch of data flows through these GPUs like an assembly line during training.

  • Tensor (intra-layer) parallelism: Instead of giving each GPU whole layers, the heavy math inside each layer is split up so several GPUs share the work for a single layer at once.

  • Mixed parallelism: In practice, large models mix approaches, using pipeline and tensor parallelism together, often alongside data parallelism.
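
As a simple illustration of the pipeline approach (layer counts and devices are placeholders), a model's layers can be split into stages that each live on a different GPU, with activations handed from one stage to the next:

```python
import torch
import torch.nn as nn

# A 12-layer toy model split into three pipeline stages.
layers = [nn.Linear(256, 256) for _ in range(12)]
stages = [nn.Sequential(*layers[0:4]), nn.Sequential(*layers[4:8]), nn.Sequential(*layers[8:12])]
devices = ["cuda:0", "cuda:1", "cuda:2"] if torch.cuda.device_count() >= 3 else ["cpu"] * 3
stages = [stage.to(device) for stage, device in zip(stages, devices)]

def pipeline_forward(x):
    # Each micro-batch flows through the stages like an assembly line; real systems keep
    # several micro-batches in flight at once so that no GPU sits idle.
    for stage, device in zip(stages, devices):
        x = stage(x.to(device))
    return x

out = pipeline_forward(torch.randn(8, 256))
```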

Model parallelism is an important advancement for distributed training because it allows for frontier-scale models to be trained, enables lower-tier hardware to participate, and ensures no one participant has access to the full set of model weights.

Protocol Learning and Protocol Models

Protocol Learning is Pluralis' framework for model ownership and monetization in a decentralized training context. Pluralis highlights three key principles that make up the Protocol Learning framework - decentralization, incentivization, and trustlessness.

A primary differentiator of Pluralis from other projects is its focus on model ownership. Given that models primarily derive their value from their weights, Protocol Models (PM) is an attempt to split up the weights of a model in a way that no single participant in a model training process ever has the full set of weights. In its final form, this would give each contributor to the trained model an ownership stake and thus a share of the revenue the model produces.

Positioning of different language models by training setup (open vs. closed data) and model weight availability (open vs. closed). Source: Pluralis

This is a fundamentally different approach to decentralized model economics than prior examples. Other projects incentivize contributions by providing a pool of funds that is distributed to contributors during the training cycle according to a specific measure (usually time or compute power contributed). Contributors to Pluralis are incentivized to devote their resources only to those models they believe have the best chance of being successful. Training a model that underperforms would be a waste of compute, energy, and time because a poorly performing model would produce no revenue.

This differs from prior approaches in two ways. First, it doesn't require an individual who wants to train a model to raise initial funding to pay contributors, lowering barriers to entry for model training and development. Second, it arguably better aligns incentives between model designers and compute providers because both will want the final version of the model to be the best possible version to ensure its success. It also opens the possibility for model training specializations to emerge. For example, there may be more risk-tolerant trainers that provide compute to earlier-stage/experimental models in search of a larger return (akin to a VC investor) versus compute providers that only target proven models with a higher chance of adoption (akin to a private equity investor).

While PM may represent a major unlock in decentralized training monetization and incentivization, Pluralis has yet to lay out how it will be implemented in detail. Given the highly complex nature of the approach, outstanding questions include how to assign ownership of the model, distribute revenues, and even govern the model's future upgrades or use cases.

Decentralized Training Innovations

Beyond economic considerations, Protocol Learning also faces the same core challenge as other decentralized training projects in training large AI models using a network of heterogeneous GPUs with communication constraints.

In June, Pluralis announced the successful training of an 8 billion parameter LLM based on Meta's Llama 3 architecture and published its Protocol Models paper. In it, Pluralis demonstrates how it can shrink the communication overhead between GPUs doing model parallel training. It does this by confining the signals that flow through each transformer layer to a tiny, pre-chosen sub-space, compressing the forward and backward passes by up to 99% for a 100x decrease in network traffic, without hurting accuracy or adding noticeable overhead. In plain terms, Pluralis has found a way to squeeze the same learning information into a fraction of the bandwidth required by earlier methods.
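
A stylized sketch of the compression idea, not Pluralis' implementation: signals passed between pipeline stages are projected onto a small, pre-chosen subspace before being sent, and expanded again on the receiving side. The dimensions below are placeholders.

```python
import torch

hidden_dim, subspace_dim = 4096, 64          # ~98% fewer values cross the network per activation
projection = torch.linalg.qr(torch.randn(hidden_dim, subspace_dim)).Q   # fixed, pre-chosen basis

def send_between_stages(activations):
    # Sender: confine the signal to the shared subspace before it crosses the network.
    return activations @ projection           # (batch, 4096) -> (batch, 64)

def receive_on_next_stage(compressed):
    # Receiver: lift the compressed signal back into the full hidden dimension.
    return compressed @ projection.T          # (batch, 64) -> (batch, 4096)

activations = torch.randn(32, hidden_dim)
restored = receive_on_next_stage(send_between_stages(activations))
```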

This was the first decentralized training run where the model itself was split across nodes connected via low bandwidth, rather than being replicated. The team successfully trained an 8-billion-parameter Llama model on low-end consumer GPUs spread over four continents, connected only by everyday 80 megabyte per second home internet links. In the paper, Pluralis proves the model converged just as well as when run on a 100 Gb/s data center cluster. In practice, this means large-scale model-parallel decentralized training is now feasible.

Finally, in July, a paper by Pluralis on asynchronous pipeline-parallel training was accepted by ICML (one of the leading AI conferences). Pipeline parallel training, when conducted over the internet instead of in high-speed datacenters, also faces communication bottlenecks because nodes essentially operate like an assembly line, with each consecutive node waiting on the prior one for updates to the model. This leads to stale gradients and late transfer of information. SWARM, the decentralized training framework demonstrated in the paper, removes the two classic bottlenecks that normally exclude everyday GPUs from participating in training: memory capacity and tight synchronization. Their removal leads to better utilization of all available GPUs, faster training times, and lower costs, all critical for scaling large models using distributed, volunteer-based infrastructure. For a short explanation of this process, watch this video by Pluralis.

Looking ahead, Pluralis has stated it plans to soon launch a live training run to which anyone can contribute, but no date has been given. This launch will provide greater insights into aspects of the protocol not yet published, notably the economic model and crypto infrastructure.

Templar

Background

Templar launched in November 2024 as an incentive-driven marketplace for decentralized AI tasks on a subnet of the Bittensor protocol. It started as an experimental framework to pool global GPU resources for permissionless AI pre-training and aims to redefine AI development by making large-scale model training accessible, secure, and resilient through Bittensor's tokenized incentives.

From the outset, Templar took on the challenge of coordinating decentralized training over the internet for LLM pre-training. This is a tall order because latency, bandwidth constraints, and heterogeneous hardware make it difficult for distributed participants to match the efficiency of centralized clusters, where seamless GPU communication enables rapid iterations on massive models.

Most critically, Templar prioritizes truly permissionless participation, allowing anyone with computational resources to contribute to AI training without approval, registration, or gatekeeping. This permissionless approach is fundamental to Templar's mission of democratizing AI development, because it ensures that breakthrough AI capabilities aren't controlled by a few centralized entities but can emerge from global, open collaboration.

Training on Templar

Templar uses data parallelism for training, with two main actors:

  • Miners: These participants perform training tasks. Each miner synchronizes with the latest global model, fetches a unique shard of data, trains locally using forward and backward passes, compresses gradients using the custom CCLoco optimizer (covered below), and submits gradient updates.

  • Validators: Validators download and decompress submitted updates from miners, apply them to a local copy of the model, and compute loss deltas, a measure of improvement to the model. These deltas are used to score miner contributions via Templar's Gauntlet system.

To handle communication overhead, Templar's research team first developed Chunk Compressed DiLoCo (CCLoco). Similar to Nous, CCLoco improves on communication-efficient training techniques such as Google's DiLoCo framework, leading to orders of magnitude less inter-node communication cost while reducing the loss degradation often incurred by such methods. Instead of sending full updates every step, CCLoco shares only the most important changes at set intervals and maintains a small running tally so nothing meaningful is lost. The system operates on a competition-based model where miners are incentivized to provide low-latency updates to earn rewards. To be rewarded, miners must keep up with the network's pace through deployment of efficient hardware. This competitive structure is designed to ensure that only participants capable of maintaining adequate performance contribute to the training process, while lightweight sanity checks filter out obviously bad or malformed updates. In August, Templar formally published the updated training architecture and renamed it SparseLoCo.
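
A simplified sketch of the "send only the most important changes and keep a running tally" pattern (often called error feedback) is shown below. It is illustrative only; the actual CCLoco/SparseLoCo design operates on chunks and is described in Templar's publication.

```python
import torch

class CompressedSync:
    def __init__(self, param_shape, keep_fraction=0.01):
        self.residual = torch.zeros(param_shape)    # running tally of changes not yet sent
        self.keep_fraction = keep_fraction

    def compress(self, accumulated_update):
        total = accumulated_update + self.residual
        k = max(1, int(total.numel() * self.keep_fraction))
        # Keep only the largest-magnitude entries; everything else waits in the residual.
        threshold = total.abs().flatten().kthvalue(total.numel() - k).values
        mask = total.abs() > threshold
        to_send = torch.where(mask, total, torch.zeros_like(total))
        self.residual = total - to_send             # nothing meaningful is lost, only deferred
        return to_send

sync = CompressedSync((1024, 1024))
update = torch.randn(1024, 1024) * 0.01
payload = sync.compress(update)                     # only ~1% of values leave the machine
```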

Validators use Templar's Gauntlet system to track and update each miner's skill rating based on the observed contribution to model loss reduction. Using a technique called OpenSkill, high-quality miners with consistent, impactful updates gain higher skill ratings, increasing their influence on model aggregation and earning more TAO, the Bittensor network's native token. Lower-rated miners are discarded during aggregation. After scoring, the highest staked validator aggregates updates from the top-ranked miners, signs the new global model, and publishes it to storage. This version of the model is then used by miners to catch up if they fall out of sync.

Templar decentralized training architecture. Source: Templar Team.

Templar has initiated three training runs to date: Templar I, Templar II, and Templar III. Templar I was a 1.2 billion parameter model and involved nearly 200 GPUs across the globe. Templar II is in progress and is training an 8 billion parameter model, with plans to launch a larger training run soon. Templar's focus on training smaller parameter models at this stage is a deliberate choice to ensure the decentralized training architecture upgrades (discussed above) work before scaling to larger model sizes. From optimization strategies and scheduling to research iterations and incentive structures, validating these ideas on a smaller, 8 billion parameter model enables the team to iterate rapidly and more cost-effectively. Following recent progress and the formal publication of the training architecture, in September, the team launched Templar III, a 70 billion parameter model and the largest pre-training run in the decentralized space to date.

TAO and Incentive Mechanisms

A key distinguishing feature of Templar is its incentive model tied to TAO. Rewards are distributed based on skill-weighted contribution to model training. Where most protocols (e.g., Pluralis, Nous, Prime Intellect) have built permissioned runs or prototypes, Templar is fully operational on Bittensor's live network. This makes Templar the only protocol with a live, permissionless economic layer already integrated into its decentralized training framework. This real-time, in-production deployment allows Templar to iterate on its infrastructure in live training-run scenarios.

Each Bittensor subnet operates with its own "alpha" token that serves as a reward mechanism and a market signal for the subnet's perceived value. Templar's alpha token is called gamma. Alpha tokens cannot be freely traded on external markets; they can only be exchanged for TAO through their subnet's dedicated liquidity pool using an automated market maker (AMM). Users can stake TAO to receive gamma, or unstake gamma back to TAO, but cannot directly exchange gamma for another subnet's alpha tokens. Bittensor's dynamic TAO (dTAO) system uses the market price of alpha tokens to determine emission allocation across subnets. When gamma's price rises relative to other alpha tokens, it signals stronger market confidence in Templar's decentralized training capabilities, leading to higher emissions of TAO for the subnet. As of early September, Templar received ~4% of daily emissions, putting it in the top six of the TAO network's 128 subnets.

The subnet emissions work more specifically as follows. In each 12-second block, the Bittensor chain emits TAO and alpha tokens to a subnet's liquidity pool in proportion to the price of its alpha token relative to other subnets. Up to one whole alpha token (initial emission rate, subject to halvings) is emitted per block to subnets and then used to incentivize subnet contributors, split 41% to miners, 41% to validators (plus their stakers), and 18% to the subnet owner.
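
Using the split described above, a back-of-the-envelope example (the one-alpha-per-block figure is the stated maximum and is subject to halvings):

```python
BLOCK_SECONDS = 12
BLOCKS_PER_DAY = 24 * 60 * 60 // BLOCK_SECONDS      # 7,200 blocks per day

alpha_per_block = 1.0                                # up to one alpha token per block, pre-halving
daily_alpha = alpha_per_block * BLOCKS_PER_DAY

split = {"miners": 0.41, "validators_and_stakers": 0.41, "subnet_owner": 0.18}
daily_rewards = {role: share * daily_alpha for role, share in split.items()}
# -> miners: 2,952; validators and stakers: 2,952; subnet owner: 1,296 alpha tokens per day
```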

This incentive structure drives contributions to the Bittensor network by aligning economic rewards with the value provided by participants. Miners are motivated to deliver high-quality AI outputs, such as model training or inference tasks, in order to earn higher scores from validators and thus a larger share of emissions. Validators (and their stakers) are rewarded for accurately evaluating and maintaining network integrity.

The market valuation of alpha tokens, determined by staking activity, ensures that subnets demonstrating greater utility attract more TAO inflows and emissions, creating a competitive environment that encourages innovation, specialization, and sustained development. Subnet owners, receiving a dedicated portion of rewards, are incentivized to design effective mechanisms and attract contributors, ultimately fostering a permissionless, decentralized AI ecosystem where global participation advances collective intelligence.

The mechanism also introduces new incentive challenges, such as keeping validators honest, resisting Sybil attacks, and mitigating collusion. Bittensor subnets are often afflicted by cat-and-mouse games between validators or miners trying to game the system and subnet creators trying to thwart them. Over the long run, these struggles should make the system one of the most robust to gaming as subnet owners learn how to outfox the bad actors.

Gensyn

Gensyn published its first litepaper laying out a framework for decentralized training in February 2022 (Gensyn was the only decentralized training protocol covered in our initial piece on Understanding the Intersection of Crypto and AI last year). At the time, the protocol was focused primarily on verification for AI-related workloads, enabling users to submit training requests to the network to be fulfilled by compute providers and ensure those requests were executed as promised.

The original vision also highlighted the need to identify ways to accelerate applied machine learning (ML) research. In 2023, Gensyn built on this vision and articulated a broader need for access to machine learning compute at a global scale that could service AI-specific applications. It introduced the GHOSTLY principles as a framework for what such a protocol must satisfy: Generalizability, Heterogeneity, Overhead, Scalability, Trustlessness, and Latency. Gensyn had always been focused on building the compute infrastructure, and this marked their formal expansion to encompass other critical resources beyond compute.

At the core, Gensyn divides its training tech stack into four distinct components - execution, verification, communication, and coordination. Execution handles machine learning operations on any device in the world capable of performing them. Communication and coordination enable devices to send each other information in a standardized way. And verification ensures everything can be computed without requiring trust.

Execution - RL Swarm

Gensyn's first implementation in this stack is a training system called RL Swarm, a decentralized coordination mechanism for post-training reinforcement learning.

RL Swarm is designed to allow multiple compute providers to contribute to the training of a single model in a permissionless, trust-minimized environment. The protocol is structured around a three-step loop: Answer, Critique, and Resolve. First, each participant generates a model output in response to a prompt (Answer). Then, other participants evaluate that output using a shared reward function and submit feedback (Critique). Finally, these critiques are used to select the best responses, which are incorporated into the next version of the model (Resolve). This entire process happens peer-to-peer, without relying on a central server or a trusted authority.
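
A toy sketch of the three-step loop follows; the peers, reward function, and selection rule are placeholders rather than Gensyn's implementation.

```python
import random

def answer(peer_id, prompt):
    # Answer: each participant generates a model output in response to the prompt.
    return {"peer": peer_id, "output": f"peer {peer_id}'s answer to '{prompt}'"}

def critique(candidate, reward_fn):
    # Critique: other participants score the output with a shared reward function.
    return reward_fn(candidate["output"])

def resolve(candidates, scores, top_n=2):
    # Resolve: the best responses are selected and folded into the next model version.
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [candidate for candidate, _ in ranked[:top_n]]

prompt = "Prove that the sum of two even numbers is even."
reward_fn = lambda output: random.random()           # placeholder shared reward function

candidates = [answer(peer_id, prompt) for peer_id in range(5)]
scores = [critique(candidate, reward_fn) for candidate in candidates]
selected = resolve(candidates, scores)
```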

RL Swarm Training Loop. Source: Gensyn

RL Swarm builds on the emerging importance of reinforcement learning in model post-training. As models hit scale ceilings in pre-training, RL offers a mechanism to improve reasoning, instruction-following, and factuality without retraining on massive datasets. Gensyn's system enables this improvement in a decentralized setting by breaking down the RL loop into distinct roles, each of which can be verified independently. Critically, it introduces fault-tolerant, asynchronous execution, meaning contributors don't always need to be online or in perfect sync to participate.

It's also inherently modular. The system doesn't require the use of a specific model architecture, data type, or reward structure, allowing developers to customize the training loop for their specific use case. Whether training a coding model, a reasoning agent, or a model with a specific instruction set, RL Swarm provides the scaffolding for decentralized RL workflows to run reliably at scale.

Verification - Verde

One of the least discussed aspects of decentralized training so far in this report is verification. Enter Verde, Gensyn's trust layer for its GPU marketplace. With Verde, Gensyn introduces a new verification mechanism so that users of the protocol can trust that those on the other end are doing the work they claim to be doing.

Every training or inference task is dispatched to a set number of independent providers determined by the application. If their outputs match exactly, the job is accepted. If they differ, a referee protocol locates the first step where the two traces diverge and re-computes only that single operation. The party whose numbers match the referee's keeps its payment, while the other forfeits its stake.

What makes this feasible is RepOps, a library of "reproducible operators" that forces common neural-network math (matrix multiplies, activations, and so on) to run in a fixed, deterministic order on any GPU. The deterministic aspect is most critical here; otherwise, verifiers might produce differing results despite both being correct. Honest providers, therefore, produce bit-for-bit identical results, letting Verde treat a match as proof of correctness. Because the referee replays just one micro-step, the added cost is a few percent, not the 10,000x overhead of full cryptographic proofs typically used in these processes.
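
A toy illustration of the referee idea (not Verde's protocol): two providers submit step-by-step traces, and if they diverge, the referee finds the first mismatching step and recomputes only that operation to decide who was honest.

```python
def first_divergence(trace_a, trace_b):
    # Traces are lists of intermediate results; with RepOps-style deterministic kernels,
    # honest providers produce bit-for-bit identical entries.
    for step, (a, b) in enumerate(zip(trace_a, trace_b)):
        if a != b:
            return step
    return None

def referee(trace_a, trace_b, recompute_step):
    step = first_divergence(trace_a, trace_b)
    if step is None:
        return "accept"                      # outputs match exactly: the job is accepted
    truth = recompute_step(step)             # re-run just this one operation
    return "provider_a" if trace_a[step] == truth else "provider_b"

# Example: provider B made an error (or cheated) at step 2.
honest = [1, 4, 9, 16]
faulty = [1, 4, 8, 15]
winner = referee(honest, faulty, recompute_step=lambda s: (s + 1) ** 2)
# -> "provider_a" keeps its payment; the other forfeits its stake.
```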

Verde Verification Protocol Architecture. (Source: Gensyn)

In August, Gensyn released Judge, a verifiable AI evaluation system with two core components: Verde and a reproducible runtime that guarantees bitwise-identical results across hardware. To showcase it, Gensyn introduced a "progressive reveal game," where AI models bet on answers to complex problems as information is revealed, with Judge deterministically verifying outcomes and rewarding accurate early predictions.

Judge is significant because it addresses trust and scalability in AI/ML. It enables reliable model comparison, fosters transparency in high-stakes contexts, and reduces risks of bias or manipulation by allowing independent verification. Beyond reasoning tasks, Judge could support other use cases, like decentralized dispute resolution and prediction markets, aligning with Gensyn's mission to build infrastructure for trustworthy, distributed AI compute. Ultimately, tools like Judge could strengthen reproducibility and accountability, which are critical as AI becomes more central to society.

Communication and Coordination: Skip-Pipe and Diverse Expert Ensembles

Skip-Pipe is Gensyn's solution to the bandwidth bottleneck that appears when a single giant model is sliced across many machines. As previously discussed, traditional pipeline training forces every micro-batch to traverse all layers in order, so any slow node stalls the line. Skip-Pipe's scheduler dynamically skips or re-orders layers that would create a delay, cutting iteration time by up to 55% and staying usable even if half the nodes fail. By slashing inter-node traffic and allowing layers to be dropped as needed, it lets trainers stretch very large models across a patchwork of geographically scattered, lower-bandwidth GPUs.
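
A highly simplified sketch of the scheduling idea follows: stages hosted on failed or slow nodes are skipped for a given micro-batch rather than stalling the whole pipeline. The latency budget, health checks, and guardrails are illustrative assumptions, not Gensyn's implementation.

```python
# Simplified Skip-Pipe-style routing sketch (not Gensyn's code). Stages on
# failed or slow nodes are skipped for this micro-batch instead of stalling
# the whole pipeline.
from dataclasses import dataclass
from typing import List

@dataclass
class Stage:
    name: str
    node_latency_ms: float     # measured latency of the node hosting the stage
    healthy: bool = True

def plan_route(stages: List[Stage], latency_budget_ms: float = 50.0,
               min_fraction: float = 0.5) -> List[str]:
    route = [s for s in stages if s.healthy and s.node_latency_ms <= latency_budget_ms]
    # Guardrail: require a minimum fraction of stages so model quality doesn't collapse.
    if len(route) < int(min_fraction * len(stages)):
        raise RuntimeError("too few healthy stages to run this micro-batch")
    return [s.name for s in route]

stages = [
    Stage("layers_0_7", 12.0),
    Stage("layers_8_15", 300.0),                  # straggler: skipped this iteration
    Stage("layers_16_23", 18.0, healthy=False),   # failed node: skipped
    Stage("layers_24_31", 20.0),
]
print(plan_route(stages))  # ['layers_0_7', 'layers_24_31']
```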

Diverse Expert Ensembles solve a different coordination challenge: how to build a strong "mixture-of-experts" system without constant crosstalk. Gensyn's Heterogeneous Domain Expert Ensemble (HDEE) trains each expert model completely independently and merges them only at the end. Counterintuitively, the resulting ensemble beat a uniform baseline on 20 of 21 test domains with the same overall compute budget. Because no gradients or activations flow between machines during training, any idle GPU can contribute.
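
The pattern can be sketched as follows: each expert is trained in isolation on its own domain, and the only coordination happens at the final merge and routing step. The helper names below are hypothetical stand-ins, not the HDEE codebase.

```python
# Toy sketch of the train-independently-then-merge pattern (not the HDEE code).
# No gradients or activations cross machines; experts only meet at merge time.
def train_expert(corpus):
    """Stands in for a full local training run on one contributor's GPU."""
    return {"domain": corpus["domain"], "weights": f"ckpt_{corpus['domain']}"}

def build_ensemble(experts):
    """Merge step: the first moment any coordination between nodes is needed."""
    return {e["domain"]: e for e in experts}

def ensemble_answer(ensemble, query_domain, prompt):
    # Simple routing: use the expert whose domain matches the query.
    expert = ensemble.get(query_domain, next(iter(ensemble.values())))
    return f"[{expert['weights']}] answer to: {prompt}"

corpora = [{"domain": d} for d in ("code", "law", "medicine")]
experts = [train_expert(c) for c in corpora]      # fully independent, fully parallel
ensemble = build_ensemble(experts)
print(ensemble_answer(ensemble, "law", "Summarize this contract clause."))
```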

Together, Skip-Pipe and HDEE give Gensyn a communication-efficient playbook. The protocol can shard within a single model when necessary or train many small specialists in parallel when independence is cheaper, and do both without the traditional need for flawless, low-latency networking.

Testnet

In March, Gensyn deployed its testnet on a custom Ethereum rollup. The team plans to roll out updates to the testnet piecemeal. For now, users can participate in three of Gensyn's offerings: RL Swarm, BlockAssist, and Judge. RL Swarm, as described above, enables users to contribute to the RL post-training process. In August, the team launched BlockAssist, "the first high-scale demo of assistance learning, a method for training agents directly from human actions, without manual labeling or RLHF." Users can download Minecraft and play the game with BlockAssist to train a Minecraft model.

Other Notable Projects

The above sections provide an overview of prominent architectures being implemented to achieve decentralized training. However, new projects enter the fray regularly. Here are some of the newer entrants in the decentralized training space:

Fortytwo: Fortytwo is built on the Monad blockchain and specializes in swarm inference, where multiple small language models (SLMs) collaborate across a network of nodes to process queries and generate peer-reviewed outputs for enhanced accuracy and efficiency. The system leverages consumer-grade hardware, such as idle laptops, bypassing the need for costly GPU clusters typical in centralized AI. The architecture includes decentralized execution of inference and aspects of training, like synthetic dataset generation for specialized models. The project is live on the Monad devnet.

Ambient: Ambient is an upcoming "proof of useful work" layer-1 blockchain that aims to power always-on, autonomous AI agents onchain, enabling continuous task execution, learning, and evolution in a permissionless ecosystem without centralized oversight. It will feature a single, open-source model collaboratively trained and improved by network miners, with contributors earning rewards for contributions to training, building, and utilizing AI models. While Ambient emphasizes decentralized inference, especially for agents, miners on the network will also be responsible for continually updating the base model that powers the network. Ambient uses a novel proof-of-logits mechanism (a system where validators can verify that miners correctly ran model computations by checking their raw output values, known as logits). The project is being built using a fork of Solana and has not launched yet.
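
As a rough sketch of the general idea behind logit-based verification (hypothetical code, not Ambient's actual protocol), a validator can ask a miner to commit to its logits and then recompute a few randomly sampled positions itself, rejecting the work if any position disagrees.

```python
# Hypothetical logit spot-check (not Ambient's actual proof-of-logits).
# The miner commits to its logits; a validator recomputes a few randomly
# sampled positions and rejects the work if any of them disagree.
import hashlib
import json
import random
from typing import Callable, List, Sequence

def commit(logits: List[Sequence[float]]) -> str:
    """Miner publishes a compact commitment to its full per-position logits."""
    return hashlib.sha256(json.dumps([list(l) for l in logits]).encode()).hexdigest()

def spot_check(claimed_logits: List[Sequence[float]],
               recompute_fn: Callable[[int], Sequence[float]],
               num_samples: int = 4, tol: float = 0.0) -> bool:
    """Validator re-runs the model at sampled positions. With a deterministic
    runtime, tol can be 0; otherwise a small tolerance would be needed."""
    positions = random.sample(range(len(claimed_logits)), k=num_samples)
    for pos in positions:
        actual = recompute_fn(pos)
        if any(abs(a - b) > tol for a, b in zip(claimed_logits[pos], actual)):
            return False  # mismatch: the proof fails
    return True
```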

Flower Labs: Flower Labs is developing Flower, an open-source framework for federated learning that enables collaborative AI model training across decentralized data sources without sharing raw data, thereby preserving privacy while aggregating model updates. Founded to address data centralization issues, Flower allows institutions and individuals to train models on local data, such as in healthcare or finance, while contributing to global improvements through secure parameter sharing. Unlike crypto-native protocols emphasizing token rewards and verifiable compute, Flower prioritizes privacy-preserving collaboration for real-world applications, making it ideal for regulated industries without requiring a blockchain.

Macrocosmos: Macrocosmos operates on the Bittensor network and is developing a complete pipeline for AI model creation across five subnets focused on areas including pretraining, fine-tuning, data collection, and decentralized science. It introduces the Incentivized Orchestration Training Architecture (IOTA) framework for pretraining large language models on heterogeneous, unreliable, and permissionless hardware and has initiated 1B+ parameter training runs with plans to scale to larger models soon.

Flock.io: Flock is a decentralized AI training ecosystem integrating federated learning with blockchain infrastructure to enable privacy-preserving, collaborative model development across a modular, token-incentivized network. Participants can contribute models, data, or compute resources and receive onchain rewards proportionate to their contributions. To protect data privacy, the protocol uses federated learning, which enables participants to train a global model using local data that is never shared with anyone else. While the setup necessitates additional verification steps to keep malicious or corrupted data (a risk often referred to as data poisoning) out of model training, it has been an effective pitch for use cases like healthcare, where multiple providers can train a global model without revealing highly sensitive medical data.
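
A minimal federated-averaging sketch below illustrates the general pattern Flock describes: each participant trains on data that never leaves its machine, and only model updates are aggregated. This is generic FedAvg on a toy regression problem, not Flock's actual protocol.

```python
# Minimal federated-averaging (FedAvg) sketch: raw data never leaves a
# participant, only model updates are shared and aggregated (weighted by
# dataset size). Not Flock's actual protocol.
import numpy as np

def local_update(global_weights, local_data, lr=0.01, epochs=1):
    """Each participant trains on its own data; here, linear regression steps."""
    w = global_weights.copy()
    X, y = local_data
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # MSE gradient on *local* data only
        w -= lr * grad
    return w

def federated_round(global_weights, participants):
    # Weight each client's update by its dataset size, then average.
    updates = [(local_update(global_weights, data), len(data[1])) for data in participants]
    total = sum(n for _, n in updates)
    return sum(w * (n / total) for w, n in updates)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):  # e.g., three hospitals, each holding private data
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(200):
    w = federated_round(w, clients)
print(w)  # approaches [2.0, -1.0] without any client sharing raw data
```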

Outlook and Risks

Over the past two years, decentralized training has shifted from an interesting concept to functioning networks operating in the wild. While these projects are still far from their intended end state, meaningful progress is being made toward making decentralized training a reality. A review of the existing decentralized training landscape reveals several emerging trends:

Live proofs of concept are no longer hypothetical. Over the past year, early proofs like Nous's Consilience and Prime Intellect's INTELLECT-2 have crossed into production-scale runs. Breakthroughs like OpenDiLoCo and Protocol Models are enabling high-performance AI on distributed networks, fostering cost-effective, resilient, and transparent model development. These networks are coordinating dozens, sometimes hundreds, of GPUs to pre-train and fine-tune mid-sized models in real time, proving that decentralized training can move beyond closed demos and ad-hoc hackathons. While most of these networks are not yet permissionless, Templar stands out as an exception; its success reinforces the view that decentralized training is advancing from simply proving the underlying technology works to scaling to match the performance of centralized models and attracting the GPU resources needed to produce foundational models at scale.

Model sizes are climbing, but a gap remains. From 2024 to 2025, decentralized projects jumped from single-digit-billion to 30 billion- to 40 billion-parameter models. Yet leading AI labs are already releasing multi-trillion-parameter systems and continue to innovate rapidly thanks to their vertically integrated datacenters and state-of-the-art hardware. Decentralized training can help bridge that gap by tapping training hardware from around the world, especially as centralized approaches face growing constraints from the need for ever more hyperscaler datacenters. But closing the delta will depend on further breakthroughs in communication-efficient optimizers and gradient compression that allow for global scale, as well as incentivization and verification layers that cannot be gamed.

Post-training workflows are a growing area of focus. Supervised fine-tuning, RLHF, and domain-specific reinforcement learning demand far less synchronous bandwidth than full-scale pre-training. Frameworks like PRIME-RL and RL Swarm already work on erratic consumer-grade nodes, letting contributors monetize spare cycles while projects commercialize bespoke models quickly. Given how well-suited RL is to decentralized settings, it is likely only to grow in importance as an area of focus for these projects. RL training is therefore where decentralized training may first find product-market fit at scale, as evidenced by the growing number of teams rolling out RL-specific training frameworks.

Incentives and verification lag technical innovation. Only a handful of networks, most notably Templar, deliver real-time token rewards and onchain slashing that meaningfully discourage misbehavior and have been tested in live environments. While other projects are experimenting with reputation scores, witness attestations, or proof-of-training schemes, these systems remain unproven. Even if technological barriers are overcome, governance will pose an equally difficult challenge because decentralized networks must find ways to set rules, enforce them, and resolve disputes without replicating the inefficiencies seen in crypto DAOs. Solving technological hurdles is only the first step; long-term viability depends on pairing them with credible verification, effective governance, and compelling monetization and ownership structures that ensure trust in the work being done and attract the talent and resources needed to scale.

Stacks are converging into end-to-end pipelines. Most leading teams now combine bandwidth-aware optimizers (DeMo, DisTrO), decentralized compute exchanges (Prime Compute, Basilica), and onchain coordination layers (Psyche, PM, PRIME). The result is a modular, open pipeline that mirrors a centralized lab's data-to-deployment workflow, only without a single point of control. Even where teams do not build every component themselves, they can plug into other crypto projects that specialize in verticals needed for decentralized training, such as data-provisioning protocols, GPU and inference marketplaces, and decentralized storage backbones. This surrounding infrastructure gives decentralized training projects plug-and-play components that can be leveraged to enhance their offerings and better compete with centralized peers.

Risks

Hardware and software optimization is a moving target, and centralized labs keep stretching the field. Nvidia's Blackwell B200 chips just posted 2.2-2.6x faster training throughput than the previous generation on MLPerf benchmarks for both 405 billion-parameter pre-training and 70 billion-parameter LoRA fine-tuning, collapsing the time and energy cost for the biggest players. On the software side, compiler stacks such as PyTorch's torch.compile and TensorFlow's XLA deliver graph fusion and dynamic-shape kernels that squeeze still more performance out of the same silicon. As hardware and software optimizations improve, or new training architectures are discovered, decentralized training networks will have to keep pace, continually updating their stacks to accommodate the fastest and most advanced training methods in order to attract talent and incentivize meaningful model development. This will require teams to develop software that ensures continued high performance regardless of the underlying hardware, as well as software stacks that adapt to shifts in underlying training architecture.

Incumbents have open-sourced models, blurring the lines between decentralized and centralized training. Centralized AI labs have mostly kept their models closed, reinforcing the case for decentralized training as a way to guarantee openness, transparency, and community governance. While recent releases like DeepSeek, OpenAI's open-weight GPT variants, and Llama show a shift toward greater openness, it's unclear whether this trend will last amid rising competition, regulation, and safety concerns. Even when weights are released, they still reflect the values and choices of the originating labs, making the ability to train independently critical for adaptability, alignment with diverse priorities, and ensuring access isn't bottlenecked by a few incumbents.

Talent acquisition remains a struggle. Numerous teams told us as much. While the quality of talent joining decentralized training efforts has improved, these projects lack the vast resources of the leading AI labs (e.g., OpenAI's recent multi-million-dollar "special award" per employee or Meta's $250 million offer to poach a researcher). For now, decentralized projects attract mission-driven researchers who value openness and independence, while also drawing from a broader global pool and vibrant open-source communities. To compete at scale, however, they must prove themselves by training models on par with incumbents and by refining incentive and monetization mechanisms that create meaningful upside for contributors. While permissionless networks and cryptoeconomic incentives offer unique value, failure to gain distribution and build sustainable revenue streams could stymie the long-term growth of the space.

Regulatory headwinds are real, especially for uncensored models. Decentralized training faces a unique regulatory challenge: by design, anyone can train any type of model. While this openness is a strength, it also raises safety concerns, particularly around misuse in areas like biosecurity, misinformation, or other sensitive domains. Policymakers in the EU and U.S. are already signaling tighter scrutiny: the EU AI Act introduces extra obligations for high-risk foundation models, while U.S. agencies are weighing restrictions on open systems and potential export-style controls. A single incident involving a decentralized model used for harmful purposes could prompt sweeping regulations, threatening the very premise of permissionless training.

Distribution and monetization remain major challenges. Leading labs, including OpenAI, Anthropic, and Google, wield massive distribution advantages through brand recognition, enterprise contracts, integration into cloud platforms, and direct consumer reach. Decentralized training projects, by contrast, lack these built-in channels and must work much harder to get models adopted, trusted, and embedded into real-world workflows. This is likely to be even more challenging given crypto's still-nascent integration beyond crypto-to-crypto applications (although that is changing quickly). A very important and still unresolved question is who will actually use these decentralized training models. High-quality open-source models already exist, and once new state-of-the-art models are released, it is not particularly difficult for others to distill or adapt them. Over time, the open-source nature of decentralized training projects should create network effects that address the distribution problem. Even if teams solve distribution, however, they will then face the challenge of monetizing their products. For now, Pluralis' PM appears to deal most directly with these monetization challenges. This is not just a crypto x AI problem, but a much broader crypto problem, underscoring the challenge ahead.

Conclusion

Decentralized training has quickly evolved from an abstract concept into functioning networks coordinating real training runs across the globe. Over the past year, projects including Nous, Prime Intellect, Pluralis, Templar, and Gensyn have shown that it is possible to stitch together decentralized GPUs, compress communication efficiently, and even begin experimenting with incentive mechanisms in live environments. These early demonstrations prove that decentralized training can move beyond theory, though the path to competing with centralized labs at frontier scale remains steep.

Even if decentralized projects eventually train foundation models that rival today's leading AI labs, their toughest test still lies ahead: proving real-world advantages beyond philosophical appeal. Those advantages might emerge endogenously, through architectures that outperform centralized alternatives or through novel ownership and monetization schemes that reward contributors. Alternatively, they may emerge exogenously if centralized incumbents try to stifle innovation by keeping weights closed or injecting unwelcome alignment biases.

Beyond technical progress, attitudes toward the space are beginning to shift. One founder described the change in sentiment at leading AI conferences over the past year this way: a year ago, there was virtually no interest in decentralized training, especially when paired with crypto; six months ago, attendees began acknowledging the underlying problems but doubted feasibility at scale; and in recent months, recognition has grown that ongoing advancements could make scalable decentralized training possible. This evolution of perception suggests momentum is building not just in technology, but in legitimacy.

The risks are real: incumbents maintain hardware, talent, and distribution advantages; regulatory scrutiny looms; and incentive and governance mechanisms remain untested at scale. Yet the upside is equally compelling. Decentralized training represents not just an alternative technical architecture, but a fundamentally different philosophy of how AI should be built: permissionless, globally owned, and aligned to diverse communities rather than a handful of corporations. If even one project can show that openness translates into faster iteration, novel architectures, or more inclusive governance, it would mark a breakthrough moment for both crypto and AI. The road ahead will be long, but the core ingredients for success are now firmly on the table.

Legal Disclosure:
This document, and the information contained herein, has been provided to you by Galaxy Digital Inc. and its affiliates ("Galaxy Digital") solely for informational purposes. This document may not be reproduced or redistributed in whole or in part, in any format, without the express written approval of Galaxy Digital. Neither the information, nor any opinion contained in this document, constitutes an offer to buy or sell, or a solicitation of an offer to buy or sell, any advisory services, securities, futures, options or other financial instruments or to participate in any advisory services or trading strategy. Nothing contained in this document constitutes investment, legal or tax advice or is an endorsement of any of the stablecoins mentioned herein. You should make your own investigations and evaluations of the information herein. Any decisions based on information contained in this document are the sole responsibility of the reader. Certain statements in this document reflect Galaxy Digital's views, estimates, opinions or predictions (which may be based on proprietary models and assumptions, including, in particular, Galaxy Digital's views on the current and future market for certain digital assets), and there is no guarantee that these views, estimates, opinions or predictions are currently accurate or that they will be ultimately realized. To the extent these assumptions or models are not correct or circumstances change, the actual performance may vary substantially from, and be less than, the estimates included herein. None of Galaxy Digital nor any of its affiliates, shareholders, partners, members, directors, officers, management, employees or representatives makes any representation or warranty, express or implied, as to the accuracy or completeness of any of the information or any other information (whether communicated in written or oral form) transmitted or made available to you. Each of the aforementioned parties expressly disclaims any and all liability relating to or resulting from the use of this information. Certain information contained herein (including financial information) has been obtained from published and non-published sources. Such information has not been independently verified by Galaxy Digital and, Galaxy Digital, does not assume responsibility for the accuracy of such information. Affiliates of Galaxy Digital may have owned, hedged and sold or may own, hedge and sell investments in some of the digital assets, protocols, equities, or other financial instruments discussed in this document. Affiliates of Galaxy Digital may also lend to some of the protocols discussed in this document, the underlying collateral of which could be the native token subject to liquidation in the event of a margin call or closeout. The economic result of closing out the protocol loan could directly conflict with other Galaxy affiliates that hold investments in, and support, such token. Except where otherwise indicated, the information in this document is based on matters as they exist as of the date of preparation and not as of any future date, and will not be updated or otherwise revised to reflect information that subsequently becomes available, or circumstances existing or changes occurring after the date hereof. This document provides links to other Websites that we think might be of interest to you. Please note that when you click on one of these links, you may be moving to a provider's website that is not associated with Galaxy Digital. 
These linked sites and their providers are not controlled by us, and we are not responsible for the contents or the proper operation of any linked site. The inclusion of any link does not imply our endorsement or our adoption of the statements therein. We encourage you to read the terms of use and privacy statements of these linked sites as their policies may differ from ours. The foregoing does not constitute a "research report" as defined by FINRA Rule 2241 or a "debt research report" as defined by FINRA Rule 2242 and was not prepared by Galaxy Digital Partners LLC. Similarly, the foregoing does not constitute a "research report" as defined by CFTC Regulation 23.605(a)(9) and was not prepared by Galaxy Derivatives LLC. For all inquiries, please email [email protected]. ©Copyright Galaxy Digital Inc. 2025. All rights reserved.
