Earlier this month, Zyphra released ZUNA, a new foundation model purpose-built for EEG data. We read the paper, dug into the architecture, and came away genuinely excited. This is the kind of infrastructure-level work that could reshape how the entire field handles noninvasive brain signals, and it aligns closely with what we’re building at BrainAccess. Here’s a summary of what ZUNA does, how it works, and why we think it matters.
For decades, electroencephalography (EEG) has been one of the most accessible windows into the working brain. It’s portable, noninvasive, and affordable — which is exactly why it sits at the heart of what we do at BrainAccess. But anyone who has worked with EEG data knows the reality: it’s noisy, it varies wildly across devices and subjects, and a model trained on one channel layout rarely transfers to another.
That’s what makes the release of ZUNA worth paying attention to.
ZUNA is a 380-million-parameter masked diffusion autoencoder, developed by Zyphra, that performs masked channel infilling and super-resolution for EEG signals across arbitrary electrode numbers and positions. It was trained on a harmonized corpus of 208 public datasets (roughly 2 million channel-hours of data) and released under an Apache 2.0 license with full inference code and an MNE-compatible pip package. For the BCI and neuroscience community, this is a meaningful step toward generalizable EEG processing that actually works outside the lab it was built in.
The Problem ZUNA Is Solving
Anyone working with EEG faces a familiar set of headaches. For instance:
- Electrodes go bad mid-recording.
- Consumer headsets have far fewer channels than research-grade systems.
- Datasets from different labs use different montages, different hardware, different channel counts.
- The preprocessing pipelines needed to clean up raw EEG — removing eye blinks, muscle artifacts, line noise — are labor-intensive and require real expertise.
Existing EEG foundation models have tried to address some of these issues, but most are trained on a fixed set of channel inputs and positions. They work well on the montage they were trained on, and poorly on everything else. This is a fundamental limitation: the field needs models that can generalize across the messy diversity of real-world EEG recordings.
ZUNA tackles this directly. Its core architectural innovation is a 4D rotary positional encoding (4D-RoPE) over the x, y, z scalp coordinates and time index of each electrode. Rather than requiring a fixed channel layout, the model learns to represent electrodes by their spatial position on the head. This means it can accept input from a 256-channel research cap, a 32-channel clinical setup, or an 8-channel consumer headband — and produce meaningful outputs from any of them without retraining.
How It Works
ZUNA is a transformer-based encoder-decoder built on diffusion modeling — the same family of generative techniques that powers modern image and audio synthesis. Here’s the pipeline in plain terms.
The model takes multichannel EEG and tokenizes it into short 0.125-second windows (32 samples each). Each channel-window pair becomes a token, and these tokens are serialized in a raster-scan order: all channels for time step 1, then all channels for time step 2, and so on. A 5-second sample with C channels produces 40×C tokens.
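The windowing and raster-scan serialization above can be sketched in a few lines of NumPy. The 0.125-second (32-sample) windows and time-major token order come from the paper; the function name and array layout are illustrative assumptions.

```python
import numpy as np

def tokenize_eeg(eeg, win=32):
    """Slice multichannel EEG into non-overlapping channel-window tokens,
    serialized in raster-scan order: all channels for window 0, then all
    channels for window 1, and so on. `eeg` has shape (channels, samples)."""
    n_channels, n_samples = eeg.shape
    n_windows = n_samples // win                    # 0.125 s windows at 256 Hz
    eeg = eeg[:, : n_windows * win]                 # drop any trailing partial window
    tokens = eeg.reshape(n_channels, n_windows, win)
    # reorder to time-major (window, channel, samples), then flatten
    tokens = tokens.transpose(1, 0, 2).reshape(n_windows * n_channels, win)
    return tokens

# a 5-second, 8-channel recording at 256 Hz -> 40 * 8 = 320 tokens of 32 samples
x = np.random.randn(8, 5 * 256)
print(tokenize_eeg(x).shape)  # (320, 32)
```

With C = 8 channels this reproduces the paper's 40×C token count for a 5-second sample.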
Each token carries a 4D coordinate: the electrode's 3D scalp position (x, y, z) plus its time index. Both are injected via the rotary positional encoding.
This is what gives ZUNA its flexibility: the model doesn’t care which channels you have or how many, because it reasons about electrode positions in continuous space rather than relying on a fixed ordering.
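To make the 4D rotary encoding concrete, here is a minimal NumPy sketch: the feature dimension is split into four groups, and each group is rotated by one coordinate using standard 1-D RoPE. The equal group split and the frequency base are assumptions for illustration; the paper's exact parameterization may differ.

```python
import numpy as np

def rope_rotate(vec, pos, base=10000.0):
    """Standard 1-D rotary encoding: rotate consecutive dimension pairs of
    `vec` by angles proportional to the scalar position `pos`."""
    d = vec.shape[-1]
    freqs = base ** (-np.arange(0, d, 2) / d)   # one frequency per dim pair
    theta = pos * freqs
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = vec[..., 0::2], vec[..., 1::2]
    out = np.empty_like(vec)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_4d(token_vec, coord):
    """Apply rotary encoding over a 4D coordinate (x, y, z, time index) by
    rotating one quarter of the feature dimension per coordinate."""
    groups = np.split(token_vec, 4, axis=-1)
    return np.concatenate(
        [rope_rotate(g, c) for g, c in zip(groups, coord)], axis=-1
    )
```

The useful property carries over from ordinary RoPE: dot products between rotated query/key vectors depend only on the *relative* offset between coordinates, which is what lets attention reason about electrode geometry in continuous space.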
The encoder (16 transformer layers) compresses the observed channels into a latent representation. The decoder (another 16 transformer layers) takes this latent plus noised versions of the target signals and iteratively denoises them through the diffusion process, using cross-attention to incorporate the encoder’s output. The model was trained using rectified flow loss with adaptive loss-weighting, and the encoder latent was regularized with an auxiliary MMD loss.
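The rectified-flow objective at the heart of that training loop is simple to state: interpolate linearly between clean data and Gaussian noise, and train the model to predict the constant velocity along that straight path. The sketch below shows only this core loss; the adaptive loss weighting, the encoder conditioning via cross-attention, and the auxiliary MMD term mentioned above are omitted, and the function names are illustrative.

```python
import numpy as np

def rectified_flow_loss(model, x0, rng):
    """One rectified-flow training step. `model(x_t, t)` should predict the
    velocity (noise - data) along the straight path from data to noise."""
    t = rng.random()                          # random time in [0, 1]
    noise = rng.standard_normal(x0.shape)
    x_t = (1.0 - t) * x0 + t * noise          # linear interpolation path
    v_target = noise - x0                     # constant velocity along the path
    v_pred = model(x_t, t)
    return np.mean((v_pred - v_target) ** 2)
```

At inference, the decoder integrates the predicted velocity field from noise back to signal over a small number of steps, which is the iterative denoising described above.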
During training, ZUNA used an aggressive channel dropout scheme — 90% of training steps included dropout, with randomly selected channels zeroed out. This forced the model to learn robust cross-channel correlations and to reconstruct missing signals from whatever subset of electrodes remained available.
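A minimal version of that corruption step might look as follows. The 90% per-step probability is from the paper; the range of how many channels get dropped is an assumption for illustration.

```python
import numpy as np

def channel_dropout(eeg, rng, p_step=0.9, max_frac=0.75):
    """Zero out a random subset of channels, mimicking aggressive
    training-time channel dropout. `eeg` has shape (channels, samples).
    Returns the corrupted signal and a boolean mask of kept channels."""
    n_channels = eeg.shape[0]
    keep = np.ones(n_channels, dtype=bool)
    if rng.random() < p_step:                 # dropout on 90% of steps
        n_drop = rng.integers(1, max(2, int(n_channels * max_frac)))
        dropped = rng.choice(n_channels, size=n_drop, replace=False)
        keep[dropped] = False
    return eeg * keep[:, None], keep
```

The model then only sees the surviving channels and is asked to reconstruct the zeroed ones, which is exactly the inference-time scenario of a bad or absent electrode.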
What ZUNA Can Do
This architecture enables three practical capabilities within a single unified model.
- Channel reconstruction. When electrodes go bad or drop out during a recording, ZUNA can infer the missing signal from the remaining channels and the known spatial position of the missing electrode. This is a constant pain point in both clinical and research EEG — bad channels are inevitable, and current solutions like spherical-spline interpolation are limited, especially when many channels are affected.
- Denoising. Because the model is trained as a diffusion autoencoder with a reconstruction bottleneck, it inherently separates signal from noise. The encoder must compress the meaningful structure of the EEG into its latent space, discarding artifacts in the process. This offers a learned alternative to traditional preprocessing steps like ICA that require manual inspection and domain expertise.
- Spatial upsampling. Given a sparse set of electrodes, ZUNA can generate plausible signals for electrode positions that were never recorded. A consumer headset with 8 channels can be “upsampled” to approximate what a denser montage would capture. The model leverages learned priors about how brain signals propagate spatially across the scalp — going well beyond what geometric interpolation can achieve.
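For a sense of what "geometric interpolation" means as a baseline, here is a toy inverse-distance-weighted interpolator, deliberately simpler than the spherical-spline method used in practice. It estimates a missing electrode's signal purely from neighbour proximity, with no learned prior about how activity actually propagates across the scalp, which is the gap a model like ZUNA is meant to close.

```python
import numpy as np

def idw_interpolate(signals, positions, target_pos, power=2.0):
    """Inverse-distance-weighted estimate of a missing electrode's signal.
    `signals`: (n_electrodes, n_samples); `positions`: (n_electrodes, 3)
    scalp coordinates; `target_pos`: (3,) position of the missing electrode."""
    d = np.linalg.norm(positions - target_pos, axis=1)
    w = 1.0 / np.maximum(d, 1e-9) ** power     # closer electrodes weigh more
    w /= w.sum()
    return w @ signals                          # (n_samples,)
```

Any purely geometric scheme like this is blind to the physiology: it averages neighbours regardless of whether the missing site sits over a different source. A learned model can, in principle, exploit cross-channel structure that distance alone cannot capture.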
The Training Data
Building a foundation model for EEG requires solving a data problem that doesn’t exist in text or image domains: there is no “internet-scale” EEG corpus. ZUNA’s training set was assembled from the Temple University Hospital EEG Corpus and a large collection of publicly available datasets on OpenNeuro — 208 datasets in total, after filtering for those with reliable 3D electrode coordinate metadata.
The preprocessing pipeline standardized everything to 256 Hz sampling, applied high-pass filtering at 0.5 Hz, adaptive line-noise removal, and conservative artifact rejection. The final corpus comprised over 24 million non-overlapping 5-second epochs, with channel counts per recording ranging from 2 to 256.
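The first two standardization steps, resampling to 256 Hz and high-pass filtering at 0.5 Hz, can be sketched with SciPy. Only the target rate and cutoff come from the paper; the filter design here (4th-order Butterworth, applied zero-phase) is an assumption, and the line-noise removal and artifact rejection stages are omitted.

```python
import numpy as np
from scipy import signal

def standardize(eeg, sfreq, target_sfreq=256.0, hp_cutoff=0.5):
    """Resample multichannel EEG (channels, samples) to a common rate and
    remove slow drifts with a zero-phase high-pass filter."""
    n_out = int(round(eeg.shape[-1] * target_sfreq / sfreq))
    eeg = signal.resample(eeg, n_out, axis=-1)   # Fourier-domain resampling
    sos = signal.butter(4, hp_cutoff, btype="highpass",
                        fs=target_sfreq, output="sos")
    return signal.sosfiltfilt(sos, eeg, axis=-1)  # zero-phase: no time shift
```

In practice one would do this through MNE (which the released package is compatible with), but the sketch shows how little is needed to put heterogeneous recordings on a common footing before epoching.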
This scale is still small compared to other modalities — the authors are candid about this — but it represents the most comprehensive harmonized EEG training corpus assembled to date.
Why This Matters for BrainAccess
At BrainAccess, we build noninvasive EEG hardware designed for real-world use — wearable, comfortable, practical. The persistent challenge for wearable BCI isn’t just hardware; it’s that the signal processing and modeling layers haven’t kept up. A lightweight headband with a handful of electrodes captures real neural information, but historically there’s been no good way to close the gap between what those few channels can provide and what dense research-grade systems offer.
ZUNA changes that equation. A foundation model that can take sparse, real-world EEG data and produce richer spatial representations — reconstructing missing channels, cleaning artifacts, upsampling resolution — makes lightweight wearable devices substantially more capable.
For developers building on the BrainAccess platform, this means better signal quality and more robust feature extraction without requiring users to wear bulky equipment.
The longer-term implication is even more significant. The authors frame ZUNA explicitly as a step toward noninvasive thought-to-text: if foundation models can learn to extract and represent the useful information within scalp EEG, that representation becomes the foundation for downstream decoding of language, imagery, and motor intention.
We’re not there yet, but having a model that handles the messiness of real-world EEG — across devices, across subjects, across recording conditions — is exactly the kind of infrastructure that makes that goal plausible.
Open Source and Practical
One detail worth highlighting: ZUNA is fully open. The model weights are available on Hugging Face under Apache 2.0, the inference and preprocessing code is on GitHub, and there’s a pip-installable package.
At 380M parameters, the model runs comfortably on consumer GPUs and reasonably on CPU.
This matters because reproducibility and accessibility have been persistent problems in EEG deep learning. Many prior models in this space were published without releasing weights or complete inference code, making fair comparisons effectively impossible. ZUNA’s open release sets a better standard and invites the community to build on top of it.
Looking Ahead
The authors identify several clear directions for improvement: scaling both model and data size (early signs of scaling-law behavior are already visible), extending context length beyond 5 seconds to capture longer temporal dynamics, incorporating intracranial EEG data for transfer learning, and architectural refinements tailored to the specific characteristics of brain signals.
For the broader field, ZUNA represents the kind of inflection point that other modalities have already experienced — the moment when general-purpose pretrained models become the starting point rather than each task requiring its own pipeline from scratch. As these models scale and the community builds shared infrastructure around them, the possibility space for what noninvasive EEG can do expands considerably.
We’re paying close attention and building accordingly.
Reference
Warner, C., Mago, J., Huml, J. R., Osman, M., & Millidge, B. (2026). ZUNA: Flexible EEG Superresolution with Position-Aware Diffusion Autoencoders. arXiv preprint arXiv:2602.18478. https://doi.org/10.48550/arXiv.2602.18478

Martina Berto, PhD
Research Engineer & Neuroscientist @ Neurotechnology.
