
Some smart cookies have implemented a brain-computer interface that can synthesize speech from thought in near real-time. Described in a paper published in Nature Neuroscience this week, the neuroprosthesis is intended to allow patients with severe paralysis and anarthria – loss of speech – to communicate by turning brain signals into synthesized words. "Our streaming approach brings the same rapid speech decoding capacity of devices like Alexa and Siri to neuroprostheses," said Gopala Anumanchipalli – assistant professor of electrical engineering and computer sciences at the University of California, Berkeley and co-principal investigator of the study, done in conjunction with UC San Francisco – in a statement.
"Using a similar type of algorithm, we found that we could decode neural data and, for the first time, enable near-synchronous voice streaming. The result is more naturalistic, fluent speech synthesis." The project improves on work published in 2023 by cutting the latency between thought and synthesized speech; the earlier system took about eight seconds to produce a sentence.
As demonstrated in a video accompanying the research, the new process works roughly eight times faster, operating in near real-time. It begins by reading the patient's electrical brain signals after the intent to speak has been formed but before the thought has produced a vocal muscle response. "We are essentially intercepting signals where the thought is translated into articulation and in the middle of that motor control," said co-lead author Cheol Jun Cho, a UC Berkeley PhD student in electrical engineering and computer sciences, in a statement.
"So what we’re decoding is after a thought has happened, after we’ve decided what to say, after we’ve decided what words to use and how to move our vocal-tract muscles." The neuroprosthesis works by passing 80ms chunks of electrocorticogram (ECoG) data through a neural encoder and then using a deep learning recurrent neural network transducer model to convert brain signals to sounds. The researchers used a recording of the patient's pre-injury voice to make the model's output sound more like natural speech.
While this particular neuroprosthesis requires a direct electrical connection to the brain, the researchers believe their approach is generalizable to other interfaces, including surgically implanted microelectrode arrays (MEAs) and non-invasive surface electromyography (SEMG). The work builds on research funded by Facebook that the social media biz abandoned four years ago to pursue more market-friendly SEMG wrist sensors. Edward Chang, chair of neurosurgery at UCSF, who oversaw the Facebook-funded project, is the senior co-principal investigator of this latest study.
Code for the Streaming Brain2Speech Decoder has been posted to GitHub, in case anyone is looking to reproduce the researchers' results. ®