From Literature to Live Data: LLMs as Full-Cycle Research Companions - BrainAccess

From Literature to Live Data: LLMs as Full-Cycle Research Companions

For decades, the challenge in neuroscience has been not only collecting brain data but also making sense of it. EEG signals are noisy, high-dimensional, and shaped by dozens of interacting variables, and interpreting them well has traditionally required years of specialist training. But a growing body of evidence suggests that large language models (LLMs) may be quietly changing that equation: not just as tools for text generation, but as genuine partners in scientific reasoning.

Written by Martina Berto.

LLMS AS SCIENTIFIC ADVISORS

Curated Knowledge Makes the Difference

LLMs trained on general text already carry a surprisingly broad understanding of neuroscience. But a study published in Nature Human Behaviour (Luo et al., 2025) goes further, asking whether giving an LLM dedicated neuroscience knowledge makes it meaningfully better at reasoning about experiments.

To test this, the authors developed BrainBench, a benchmark that presents two versions of a real neuroscience abstract: one reporting the actual result, and an expertly altered alternative that sounds plausible but is empirically wrong. The task is to identify which is which. General-purpose LLMs averaged 81.4% accuracy on this task. Human neuroscience experts (drawn from doctoral students, postdocs, and faculty members) averaged 63.4%, and they remained below the LLMs even when the analysis was restricted to those with the highest self-reported expertise in the relevant subfield.
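To make the task concrete: in the paper, an LLM's choice is read off from perplexity, a measure of how surprising the model finds a text, with the less surprising version counted as its answer and the size of the gap treated as its confidence. The sketch below assumes per-token log-probabilities have already been obtained from some model. The function names and the numbers are illustrative only and are not taken from the BrainBench code.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def choose_abstract(logprobs_a, logprobs_b):
    """Pick the version the model finds less surprising.

    Returns (choice, confidence), where confidence is the absolute
    perplexity gap between the two versions.
    """
    ppl_a = perplexity(logprobs_a)
    ppl_b = perplexity(logprobs_b)
    choice = "A" if ppl_a < ppl_b else "B"
    return choice, abs(ppl_a - ppl_b)


# Made-up natural-log token probabilities for two versions of an abstract.
original = [-0.5, -0.5, -0.5, -0.5]
altered = [-0.5, -1.5, -1.0, -1.0]

choice, confidence = choose_abstract(original, altered)
print(choice, round(confidence, 3))
```

A forced choice of this kind needs no prompt engineering: the model never has to "answer" anything, it only has to assign probabilities to text.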

The authors then created BrainGPT: a version of Mistral-7B fine-tuned on over 1.3 billion tokens drawn from neuroscience publications spanning 100 journals between 2002 and 2022. The result was a further 3% improvement in accuracy over a baseline that already exceeded the human experts. The mechanism matters: fine-tuning measurably shifted the perplexity (how surprising the model finds a text) assigned to correct answers, which points to genuine specialization rather than surface-level pattern matching.

Crucially, the authors found that LLM confidence is well calibrated: when a model was more confident in its choice, it was more likely to be correct. Moreover, LLMs and human experts found different items difficult, which suggests that human-LLM teams could outperform either alone. The picture that emerges is of a system that doesn’t replace expert judgment, but complements it in precisely the areas where human cognition struggles: synthesizing thousands of noisy, interrelated findings into reliable predictions.
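One simple way to check calibration of this kind is to rank items by confidence, split them into equal-sized bins, and compare accuracy across bins. The sketch below is a minimal illustration with made-up data. The function name and the choice of three bins are assumptions, not the procedure used in the paper.

```python
def accuracy_by_confidence(confidences, correct, n_bins=3):
    """Return per-bin accuracy, lowest-confidence bin first."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if len(confidences) < n_bins:
        raise ValueError("need at least one item per bin")

    # Rank items from least to most confident.
    ranked = sorted(zip(confidences, correct), key=lambda pair: pair[0])
    bin_size = len(ranked) // n_bins

    accuracies = []
    for i in range(n_bins):
        start = i * bin_size
        # The last bin absorbs any remainder.
        end = (i + 1) * bin_size if i < n_bins - 1 else len(ranked)
        chunk = ranked[start:end]
        accuracies.append(sum(1 for _, ok in chunk if ok) / len(chunk))
    return accuracies


# Made-up confidences and outcomes for nine test items.
confidences = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
correct = [False, True, False, True, False, True, True, True, True]

print(accuracy_by_confidence(confidences, correct))
```

If accuracy rises from the first bin to the last, confidence carries usable information about when to trust the model.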

FROM THEORY TO PRACTICE

Reading Brain Signals in Real Time

Theoretical knowledge of neuroscience is one layer. The other is the ability to work directly with physiological data as it is collected. And here too, LLMs are proving more capable than expected.

A survey in Frontiers in Neuroinformatics (Chandrasekharan & Jacob, 2025) documents a growing range of cases in which LLMs have been applied to EEG signal analysis with meaningful results. A few examples illustrate the practical scope.

  1. EEG-GPT fine-tuned a GPT-3 base model on quantitative EEG features to classify signals as normal or abnormal, achieving performance comparable to deep learning models trained from scratch while using roughly 50 times less labeled data. Importantly, it could also generate a human-readable reasoning chain explaining why a signal was flagged, something conventional classifiers typically do not provide.
  2. In a similar vein, work on sleep and attention estimation used LLMs to interpret EEG and behavioral data, estimating sleep quality and attention states and generating suggestions for improvement, an early demonstration of LLMs as adaptive, interpretive companions rather than static classifiers.
  3. At the clinical end of the spectrum, a study on neurocognitive disorders used a LLaMA2-based encoding model to quantify the relationship between brain activity during language tasks and cognitive scores in older adults at risk of decline. Brain scores derived from the model correlated significantly with cognitive performance, with the strongest signals emerging in the middle temporal and superior frontal gyri, regions well established in language processing research. The authors point to this as a promising avenue for early, interpretable detection of neurocognitive changes from language-related brain activity.

What these applications share is a common architecture: an LLM that combines broad prior knowledge with task-specific grounding, either through fine-tuning or structured prompting, to produce outputs that are not just accurate but explainable. This interpretability is not incidental. In research and clinical contexts, a system that can articulate its reasoning is far more useful than one that cannot.
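As a rough illustration of structured prompting, the sketch below turns one window of EEG samples into relative band powers and renders them as text a model could reason over. It is a minimal sketch under stated assumptions: the band edges are common conventions that vary between labs, the naive DFT is only suitable for short windows, and the function names and prompt wording are hypothetical rather than taken from any of the cited systems.

```python
import cmath
import math

# Conventional band edges in Hz (an assumption; definitions vary).
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}


def band_powers(samples, fs):
    """Relative power per band, computed with a naive DFT."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]  # remove the DC offset

    powers = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2):
        freq = k * fs / n
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        for name, (lo, hi) in BANDS.items():
            if lo <= freq < hi:
                powers[name] += abs(coeff) ** 2

    total = sum(powers.values()) or 1.0  # avoid dividing by zero
    return {name: p / total for name, p in powers.items()}


def features_to_prompt(powers):
    """Render the features as plain text for a language model."""
    lines = [
        f"- {name}: {100 * share:.1f}% of 1-30 Hz power"
        for name, share in powers.items()
    ]
    return (
        "Quantitative EEG features for this window:\n"
        + "\n".join(lines)
        + "\nDescribe what these features suggest and explain your reasoning."
    )


# Synthetic two-second window: a pure 10 Hz sine sampled at 128 Hz.
fs = 128
window = [math.sin(2 * math.pi * 10 * t / fs) for t in range(2 * fs)]
print(features_to_prompt(band_powers(window, fs)))
```

Passing named, human-readable features rather than raw samples is what lets the model's explanation refer back to quantities a researcher can verify.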

A System That Combines Both

The BrainGPT results show that curating neuroscience literature makes LLMs better scientific advisors. The same logic extends naturally to other knowledge domains relevant to a researcher’s actual workflow: specific experimental protocols, device specifications, software behaviour, troubleshooting procedures. An LLM grounded in this kind of layered, curated theoretical and practical knowledge could function as a genuinely expert companion throughout an experiment, not just a general-purpose assistant that happens to know some neuroscience. Add signal processing capability on top of that, and the system could also read and interpret EEG data in real time: guiding data acquisition, flagging quality issues, and translating raw signals into meaningful insights about cognitive workload, fatigue, stress, attention, or early markers of clinical conditions.

The most compelling near-term possibility is a system that operates on both levels simultaneously: holding the full theoretical context of a field while watching the data come in, and reasoning across both in real time.

Such a system wouldn’t replace the researcher or clinician, but would act as a tireless assistant: catching problems early, continuously monitoring data quality, supporting faster experiment piloting, and enabling quick signal investigations, all the way from spectral analysis down to whether a device is connected and the battery charged.

BrainAccess Chatbot: A Step in This Direction

At BrainAccess, we are building toward exactly this kind of layered companion. Our next release will include an EEG-aware chatbot that integrates directly with BrainAccess Board software, grounding the model in live device state (connected hardware, recording status, signal quality, impedance readings, artifact flags) alongside curated knowledge about the equipment itself. It is deliberately constrained: it speaks to what it actually knows, and knows what your setup looks like right now.
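To illustrate what grounding in live device state can mean in general, here is a minimal sketch that renders a state snapshot as plain text to place ahead of a user's question. Everything in it is hypothetical: the `DeviceState` fields, the `build_context` helper, and the 50 kOhm threshold are assumptions made for illustration and do not describe the BrainAccess Board API or the chatbot's actual design.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceState:
    """Hypothetical snapshot of a recording setup (not a BrainAccess API)."""
    connected: bool
    recording: bool
    battery_percent: int
    impedance_kohm: dict = field(default_factory=dict)


def build_context(state, max_impedance_kohm=50.0):
    """Render the snapshot as plain text a language model can be grounded in.

    The default threshold is an arbitrary illustrative value.
    """
    lines = [
        f"Device connected: {'yes' if state.connected else 'no'}",
        f"Recording: {'yes' if state.recording else 'no'}",
        f"Battery: {state.battery_percent}%",
    ]
    for channel, value in sorted(state.impedance_kohm.items()):
        flag = " (above threshold)" if value > max_impedance_kohm else ""
        lines.append(f"Impedance {channel}: {value:.0f} kOhm{flag}")
    # Constrain the model to the facts it was given.
    lines.append("Answer only from the facts above and the device documentation.")
    return "\n".join(lines)


state = DeviceState(
    connected=True,
    recording=False,
    battery_percent=80,
    impedance_kohm={"Fp1": 12.0, "O1": 75.0},
)
print(build_context(state))
```

Rebuilding this text on every turn is one straightforward way to keep answers tied to the current setup rather than to a stale snapshot.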

This is a first layer. We’ll be sharing more about what comes next in a dedicated post soon.

References

Chandrasekharan, S., & Jacob, J. E. (2025). Bridging neuroscience and AI: a survey on large language models for neurological signal interpretation. Frontiers in Neuroinformatics, 19, 1561401.

Luo, X., Rechardt, A., Sun, G., et al. (2025). Large language models surpass human experts in predicting neuroscience results. Nature Human Behaviour, 9, 305–315.

Martina Berto, PhD

Research Engineer & Neuroscientist @ Neurotechnology.
