Publish Date | Title | Authors | URL | Abstract |
---|---|---|---|---|
2024-11-18 | Equivariant spatio-hemispherical networks for diffusion MRI deconvolution | Axel Elaldi, Guido Gerig, Neel Dey | Link | Each voxel in a diffusion MRI (dMRI) image contains a spherical signal corresponding to the direction and strength of water diffusion in the brain. This paper advances the analysis of such spatio-spherical data by developing convolutional network layers that are equivariant to the |
2024-11-18 | Edge-Enhanced Dilated Residual Attention Network for Multimodal Medical Image Fusion | Meng Zhou, Yuxuan Zhang, Xiaolan Xu, Jiayi Wang, Farzad Khalvati | Link | Multimodal medical image fusion is a crucial task that combines complementary information from different imaging modalities into a unified representation, thereby enhancing diagnostic accuracy and treatment planning. While deep learning methods, particularly Convolutional Neural Networks (CNNs) and Transformers, have significantly advanced fusion performance, some of the existing CNN-based methods fall short in capturing fine-grained multiscale and edge features, leading to suboptimal feature integration. Transformer-based models, on the other hand, are computationally intensive in both the training and fusion stages, making them impractical for real-time clinical use. Moreover, the clinical application of fused images remains unexplored. In this paper, we propose a novel CNN-based architecture that addresses these limitations by introducing a Dilated Residual Attention Network Module for effective multiscale feature extraction, coupled with a gradient operator to enhance edge detail learning. To ensure fast and efficient fusion, we present a parameter-free fusion strategy based on the weighted nuclear norm of softmax, which requires no additional computations during training or inference. Extensive experiments, including a downstream brain tumor classification task, demonstrate that our approach outperforms various baseline methods in terms of visual quality, texture preservation, and fusion speed, making it a possible practical solution for real-world clinical applications. The code will be released at https://github.com/simonZhou86/en_dran. |
2024-11-18 | Temporal and Spatial Reservoir Ensembling Techniques for Liquid State Machines | Anmol Biswas, Sharvari Ashok Medhe, Raghav Singhal, Udayan Ganguly | Link | Reservoir computing (RC), is a class of computational methods such as Echo State Networks (ESN) and Liquid State Machines (LSM) describe a generic method to perform pattern recognition and temporal analysis with any non-linear system. This is enabled by Reservoir Computing being a shallow network model with only Input, Reservoir, and Readout layers where input and reservoir weights are not learned (only the readout layer is trained). LSM is a special case of Reservoir computing inspired by the organization of neurons in the brain and generally refers to spike-based Reservoir computing approaches. LSMs have been successfully used to showcase decent performance on some neuromorphic vision and speech datasets but a common problem associated with LSMs is that since the model is more-or-less fixed, the main way to improve the performance is by scaling up the Reservoir size, but that only gives diminishing rewards despite a tremendous increase in model size and computation. In this paper, we propose two approaches for effectively ensembling LSM models - Multi-Length Scale Reservoir Ensemble (MuLRE) and Temporal Excitation Partitioned Reservoir Ensemble (TEPRE) and benchmark them on Neuromorphic-MNIST (N-MNIST), Spiking Heidelberg Digits (SHD), and DVSGesture datasets, which are standard neuromorphic benchmarks. We achieve 98.1% test accuracy on N-MNIST with a 3600-neuron LSM model which is higher than any prior LSM-based approach and 77.8% test accuracy on the SHD dataset which is on par with a standard Recurrent Spiking Neural Network trained by Backprop Through Time (BPTT). We also propose receptive field-based input weights to the Reservoir to work alongside the Multi-Length Scale Reservoir ensemble model for vision tasks. Thus, we introduce effective means of scaling up the performance of LSM models and evaluate them against relevant neuromorphic benchmarks |
2024-11-18 | Towards Personalized Brain-Computer Interface Application Based on Endogenous EEG Paradigms | Heon-Gyu Kwak, Gi-Hwan Shin, Yeon-Woo Choi, Dong-Hoon Lee, Yoo-In Jeon, Jun-Su Kang, Seong-Whan Lee | Link | In this paper, we propose a conceptual framework for personalized brain-computer interface (BCI) applications, which can offer an enhanced user experience by customizing services to individual preferences and needs, based on endogenous electroencephalography (EEG) paradigms including motor imagery (MI), speech imagery (SI), and visual imagery. The framework includes two essential components: user identification and intention classification, which enable personalized services by identifying individual users and recognizing their intended actions through EEG signals. We validate the feasibility of our framework using a private EEG dataset collected from eight subjects, employing the ShallowConvNet architecture to decode EEG features. The experimental results demonstrate that user identification achieved an average classification accuracy of 0.995, while intention classification achieved 0.47 accuracy across all paradigms, with MI demonstrating the best performance. These findings indicate that EEG signals can effectively support personalized BCI applications, offering robust identification and reliable intention decoding, especially for MI and SI. |
2024-11-17 | Integrated Ising Model with global inhibition for decision making | Olga Tapinova, Tal Finkelman, Tamar Reitich-Stolero, Rony Paz, Assaf Tal, Nir S. Gov | Link | Humans and other organisms make decisions choosing between different options, with the aim to maximize the reward and minimize the cost. The main theoretical framework for modeling the decision-making process has been based on the highly successful drift-diffusion model, which is a simple tool for explaining many aspects of this process. However, new observations challenge this model. Recently, it was found that inhibitory tone increases during high cognitive load and situations of uncertainty, but the origin of this phenomenon is not understood. Motivated by this observation, we extend a recently developed model for decision making while animals move towards targets in real space. We introduce an integrated Ising-type model, that includes global inhibition, and use it to explore its role in decision-making. This model can explain how the brain may utilize inhibition to improve its decision-making accuracy. Compared to experimental results, this model suggests that the regime of the brain's decision-making activity is in proximity to a critical transition line between the ordered and disordered. Within the model, the critical region near the transition line has the advantageous property of enabling a significant decrease in error with a small increase in inhibition and also exhibits unique properties with respect to learning and memory decay. |
2024-11-17 | Retinal Vessel Segmentation via Neuron Programming | Tingting Wu, Ruyi Min, Peixuan Song, Hengtao Guo, Tieyong Zeng, Feng-Lei Fan | Link | The accurate segmentation of retinal blood vessels plays a crucial role in the early diagnosis and treatment of various ophthalmic diseases. Designing a network model for this task requires meticulous tuning and extensive experimentation to handle the tiny and intertwined morphology of retinal blood vessels. To tackle this challenge, Neural Architecture Search (NAS) methods are developed to fully explore the space of potential network architectures and go after the most powerful one. Inspired by neuronal diversity which is the biological foundation of all kinds of intelligent behaviors in our brain, this paper introduces a novel and foundational approach to neural network design, termed ``neuron programming'', to automatically search neuronal types into a network to enhance a network's representation ability at the neuronal level, which is complementary to architecture-level enhancement done by NAS. Additionally, to mitigate the time and computational intensity of neuron programming, we develop a hypernetwork that leverages the search-derived architectural information to predict optimal neuronal configurations. Comprehensive experiments validate that neuron programming can achieve competitive performance in retinal blood segmentation, demonstrating the strong potential of neuronal diversity in medical image analysis. |
2024-11-17 | STOP: Spatiotemporal Orthogonal Propagation for Weight-Threshold-Leakage Synergistic Training of Deep Spiking Neural Networks | Haoran Gao, Xichuan Zhou, Yingcheng Lin, Min Tian, Liyuan Liu, Cong Shi | Link | The prevailing of artificial intelligence-of-things calls for higher energy-efficient edge computing paradigms, such as neuromorphic agents leveraging brain-inspired spiking neural network (SNN) models based on spatiotemporally sparse binary activations. However, the lack of efficient and high-accuracy deep SNN learning algorithms prevents them from practical edge deployments with a strictly bounded cost. In this paper, we propose a spatiotemporal orthogonal propagation (STOP) algorithm to tack this challenge. Our algorithm enables fully synergistic learning of synaptic weights as well as firing thresholds and leakage factors in spiking neurons to improve SNN accuracy, while under a unified temporally-forward trace-based framework to mitigate the huge memory requirement for storing neural states of all time-steps in the forward pass. Characteristically, the spatially-backward neuronal errors and temporally-forward traces propagate orthogonally to and independently of each other, substantially reducing computational overhead. Our STOP algorithm obtained high recognition accuracies of 99.53%, 94.84%, 74.92%, 98.26% and 77.10% on the MNIST, CIFAR-10, CIFAR-100, DVS-Gesture and DVS-CIFAR10 datasets with adequate SNNs of intermediate scales from LeNet-5 to ResNet-18. Compared with other deep SNN training works, our method is more plausible for edge intelligent scenarios where resources are limited but high-accuracy in-situ learning is desired. |
2024-11-16 | In silico discovery of representational relationships across visual cortex | Alessandro T. Gifford, Maya A. Jastrzębowska, Johannes J. D. Singer, Radoslaw M. Cichy | Link | Human vision is mediated by a complex interconnected network of cortical brain areas jointly representing visual information. While these areas are increasingly understood in isolation, their representational relationships remain elusive. Here we developed relational neural control (RNC), and used it to investigate the representational relationships for univariate and multivariate fMRI responses of early- and mid-level visual areas. RNC generated and explored in silico fMRI responses for large amounts of images, discovering controlling images that align or disentangle responses across areas, thus indicating their shared or unique representational content. A large portion of representational content was shared across areas, unique representational content increased with cortical distance, and we isolated the visual features determining these effects. Closing the empirical cycle, we validated the in silico discoveries on in vivo fMRI responses from independent subjects. Together, this reveals how visual areas jointly represent the world as an interconnected network. |
2024-11-16 | Demonstrating Remote Synchronization: An Experimental Approach with Nonlinear Oscillators | Sanjeev Kumar Pandey, Neetish Patel | Link | This study investigates remote synchronization in arbitrary network clusters of coupled nonlinear oscillators, a phenomenon inspired by neural synchronization in the brain. Employing a multi-faceted approach encompassing analytical, numerical, and experimental methodologies, we leverage the Master Stability Function (MSF) to analyze network stability. We provide experimental evidence of remote synchronization between two clusters of nonlinear oscillators, where oscillators within each cluster are also remotely connected. This observation parallels the thalamus-mediated synchronization of neuronal populations in the brain. An electronic circuit testbed, supported by nonlinear ODE modeling and LT Spice simulation, was developed to validate our theoretical predictions. Future work will extend this investigation to encompass diverse network topologies and explore potential applications in neuroscience, communication networks, and power systems. |
2024-11-16 | Multi Scale Graph Neural Network for Alzheimer's Disease | Anya Chauhan, Ayush Noori, Zhaozhi Li, Yingnan He, Michelle M Li, Marinka Zitnik, Sudeshna Das | Link | Alzheimer's disease (AD) is a complex, progressive neurodegenerative disorder characterized by extracellular A\b{eta} plaques, neurofibrillary tau tangles, glial activation, and neuronal degeneration, involving multiple cell types and pathways. Current models often overlook the cellular context of these pathways. To address this, we developed a multiscale graph neural network (GNN) model, ALZ PINNACLE, using brain omics data from donors spanning the entire aging to AD spectrum. ALZ PINNACLE is based on the PINNACLE GNN framework, which learns context-aware protein, cell type, and tissue representations within a unified latent space. ALZ PINNACLE was trained on 14,951 proteins, 206,850 protein interactions, 7 cell types, and 48 cell subtypes or states. After pretraining, we investigated the learned embedding of APOE, the largest genetic risk factor for AD, across different cell types. Notably, APOE embeddings showed high similarity in microglial, neuronal, and CD8 cells, suggesting a similar role of APOE in these cell types. Fine tuning the model on AD risk genes revealed cell type contexts predictive of the role of APOE in AD. Our results suggest that ALZ PINNACLE may provide a valuable framework for uncovering novel insights into AD neurobiology. |
Publish Date | Title | Authors | URL | Abstract |
---|---|---|---|---|
2024-11-18 | Uncovering the role of semantic and acoustic cues in normal and dichotic listening | Akshara Soman, Sai Samrat Kankanala, Sriram Ganapathy | Link | Despite extensive research, the precise role of acoustic and semantic cues in complex speech perception tasks remains unclear. In this study, we propose a paradigm to understand the encoding of these cues in electroencephalogram (EEG) data, using match-mismatch (MM) classification task. The MM task involves determining whether the stimulus and response correspond to each other or not. We design a multi-modal sequence model, based on long short term memory (LSTM) architecture, to perform the MM task. The model is input with acoustic stimulus (derived from the speech envelope), semantic stimulus (derived from textual representations of the speech content), and neural response (derived from the EEG data). Our experiments are performed on two separate conditions, i) natural passive listening condition and, ii) an auditory attention based dichotic listening condition. Using the MM task as the analysis framework, we observe that - a) speech perception is fragmented based on word boundaries, b) acoustic and semantic cues offer similar levels of MM task performance in natural listening conditions, and c) semantic cues offer significantly improved MM classification over acoustic cues in dichotic listening task. Further, the study provides evidence of right ear advantage in dichotic listening conditions. |
2024-11-18 | Towards Personalized Brain-Computer Interface Application Based on Endogenous EEG Paradigms | Heon-Gyu Kwak, Gi-Hwan Shin, Yeon-Woo Choi, Dong-Hoon Lee, Yoo-In Jeon, Jun-Su Kang, Seong-Whan Lee | Link | In this paper, we propose a conceptual framework for personalized brain-computer interface (BCI) applications, which can offer an enhanced user experience by customizing services to individual preferences and needs, based on endogenous electroencephalography (EEG) paradigms including motor imagery (MI), speech imagery (SI), and visual imagery. The framework includes two essential components: user identification and intention classification, which enable personalized services by identifying individual users and recognizing their intended actions through EEG signals. We validate the feasibility of our framework using a private EEG dataset collected from eight subjects, employing the ShallowConvNet architecture to decode EEG features. The experimental results demonstrate that user identification achieved an average classification accuracy of 0.995, while intention classification achieved 0.47 accuracy across all paradigms, with MI demonstrating the best performance. These findings indicate that EEG signals can effectively support personalized BCI applications, offering robust identification and reliable intention decoding, especially for MI and SI. |
2024-11-15 | PFML: Self-Supervised Learning of Time-Series Data Without Representation Collapse | Einari Vaaras, Manu Airaksinen, Okko Räsänen | Link | Self-supervised learning (SSL) is a data-driven learning approach that utilizes the innate structure of the data to guide the learning process. In contrast to supervised learning, which depends on external labels, SSL utilizes the inherent characteristics of the data to produce its own supervisory signal. However, one frequent issue with SSL methods is representation collapse, where the model outputs a constant input-invariant feature representation. This issue hinders the potential application of SSL methods to new data modalities, as trying to avoid representation collapse wastes researchers' time and effort. This paper introduces a novel SSL algorithm for time-series data called Prediction of Functionals from Masked Latents (PFML). Instead of predicting masked input signals or their latent representations directly, PFML operates by predicting statistical functionals of the input signal corresponding to masked embeddings, given a sequence of unmasked embeddings. The algorithm is designed to avoid representation collapse, rendering it straightforwardly applicable to different time-series data domains, such as novel sensor modalities in clinical data. We demonstrate the effectiveness of PFML through complex, real-life classification tasks across three different data modalities: infant posture and movement classification from multi-sensor inertial measurement unit data, emotion recognition from speech data, and sleep stage classification from EEG data. The results show that PFML is superior to a conceptually similar pre-existing SSL method and competitive against the current state-of-the-art SSL method, while also being conceptually simpler and without suffering from representation collapse. |
2024-11-15 | A Multi-Label EEG Dataset for Mental Attention State Classification in Online Learning | Huan Liu, Yuzhe Zhang, Guanjian Liu, Xinxin Du, Haochong Wang, Dalin Zhang | Link | Attention is a vital cognitive process in the learning and memory environment, particularly in the context of online learning. Traditional methods for classifying attention states of online learners based on behavioral signals are prone to distortion, leading to increased interest in using electroencephalography (EEG) signals for authentic and accurate assessment. However, the field of attention state classification based on EEG signals in online learning faces challenges, including the scarcity of publicly available datasets, the lack of standardized data collection paradigms, and the requirement to consider the interplay between attention and other psychological states. In light of this, we present the Multi-label EEG dataset for classifying Mental Attention states (MEMA) in online learning. We meticulously designed a reliable and standard experimental paradigm with three attention states: neutral, relaxing, and concentrating, considering human physiological and psychological characteristics. This paradigm collected EEG signals from 20 subjects, each participating in 12 trials, resulting in 1,060 minutes of data. Emotional state labels, basic personal information, and personality traits were also collected to investigate the relationship between attention and other psychological states. Extensive quantitative and qualitative analysis, including a multi-label correlation study, validated the quality of the EEG attention data. The MEMA dataset and analysis provide valuable insights for advancing research on attention in online learning. The dataset is publicly available at \url{https://github.com/GuanjianLiu/MEMA}. |
2024-11-15 | EEG Spectral Analysis in Gray Zone Between Healthy and Insomnia | Ha-Na Jo, Young-Seok Kweon, Seo-Hyun Lee | Link | This study investigates the sleep characteristics and brain activity of individuals in the gray zone of insomnia, a population that experiences sleep disturbances yet does not fully meet the clinical criteria for chronic insomnia. Thirteen healthy participants and thirteen individuals from the gray zone were assessed using polysomnography and electroencephalogram to analyze both sleep architecture and neural activity. Although no significant differences in objective sleep quality or structure were found between the groups, gray zone individuals reported higher insomnia severity index scores, indicating subjective sleep difficulties. Electroencephalogram analysis revealed increased delta and alpha activity during the wake stage, suggesting lingering sleep inertia, while non-rapid eye movement stages 1 and 2 exhibited elevated beta and gamma activity, often associated with chronic insomnia. However, these high-frequency patterns were not observed in non-rapid eye movement stage 3 or rapid eye movement sleep, suggesting less severe disruptions compared to chronic insomnia. This study emphasizes that despite normal polysomnography findings, EEG patterns in gray zone individuals suggest a potential risk for chronic insomnia, highlighting the need for early identification and tailored intervention strategies to prevent progression. |
2024-11-15 | A Hybrid Artificial Intelligence System for Automated EEG Background Analysis and Report Generation | Chin-Sung Tung, Sheng-Fu Liang, Shu-Feng Chang, Chung-Ping Young | Link | Electroencephalography (EEG) plays a crucial role in the diagnosis of various neurological disorders. However, small hospitals and clinics often lack advanced EEG signal analysis systems and are prone to misinterpretation in manual EEG reading. This study proposes an innovative hybrid artificial intelligence (AI) system for automatic interpretation of EEG background activity and report generation. The system combines deep learning models for posterior dominant rhythm (PDR) prediction, unsupervised artifact removal, and expert-designed algorithms for abnormality detection. For PDR prediction, 1530 labeled EEGs were used, and the best ensemble model achieved a mean absolute error (MAE) of 0.237, a root mean square error (RMSE) of 0.359, an accuracy of 91.8% within a 0.6Hz error, and an accuracy of 99% within a 1.2Hz error. The AI system significantly outperformed neurologists in detecting generalized background slowing (p = 0.02; F1: AI 0.93, neurologists 0.82) and demonstrated improved focal abnormality detection, although not statistically significant (p = 0.79; F1: AI 0.71, neurologists 0.55). Validation on both an internal dataset and the Temple University Abnormal EEG Corpus showed consistent performance (F1: 0.884 and 0.835, respectively; p = 0.66), demonstrating generalizability. The use of large language models (LLMs) for report generation demonstrated 100% accuracy, verified by three other independent LLMs. This hybrid AI system provides an easily scalable and accurate solution for EEG interpretation in resource-limited settings, assisting neurologists in improving diagnostic accuracy and reducing misdiagnosis rates. |
2024-11-14 | Towards Neural Foundation Models for Vision: Aligning EEG, MEG, and fMRI Representations for Decoding, Encoding, and Modality Conversion | Matteo Ferrante, Tommaso Boccato, Grigorii Rashkov, Nicola Toschi | Link | This paper presents a novel approach towards creating a foundational model for aligning neural data and visual stimuli across multimodal representationsof brain activity by leveraging contrastive learning. We used electroencephalography (EEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI) data. Our framework's capabilities are demonstrated through three key experiments: decoding visual information from neural data, encoding images into neural representations, and converting between neural modalities. The results highlight the model's ability to accurately capture semantic information across different brain imaging techniques, illustrating its potential in decoding, encoding, and modality conversion tasks. |
2024-11-14 | EEG-Based Speech Decoding: A Novel Approach Using Multi-Kernel Ensemble Diffusion Models | Soowon Kim, Ha-Na Jo, Eunyeong Ko | Link | In this study, we propose an ensemble learning framework for electroencephalogram-based overt speech classification, leveraging denoising diffusion probabilistic models with varying convolutional kernel sizes. The ensemble comprises three models with kernel sizes of 51, 101, and 201, effectively capturing multi-scale temporal features inherent in signals. This approach improves the robustness and accuracy of speech decoding by accommodating the rich temporal complexity of neural signals. The ensemble models work in conjunction with conditional autoencoders that refine the reconstructed signals and maximize the useful information for downstream classification tasks. The results indicate that the proposed ensemble-based approach significantly outperforms individual models and existing state-of-the-art techniques. These findings demonstrate the potential of ensemble methods in advancing brain signal decoding, offering new possibilities for non-verbal communication applications, particularly in brain-computer interface systems aimed at aiding individuals with speech impairments. |
2024-11-14 | Towards Unified Neural Decoding of Perceived, Spoken and Imagined Speech from EEG Signals | Jung-Sun Lee, Ha-Na Jo, Seo-Hyun Lee | Link | Brain signals accompany various information relevant to human actions and mental imagery, making them crucial to interpreting and understanding human intentions. Brain-computer interface technology leverages this brain activity to generate external commands for controlling the environment, offering critical advantages to individuals with paralysis or locked-in syndrome. Within the brain-computer interface domain, brain-to-speech research has gained attention, focusing on the direct synthesis of audible speech from brain signals. Most current studies decode speech from brain activity using invasive techniques and emphasize spoken speech data. However, humans express various speech states, and distinguishing these states through non-invasive approaches remains a significant yet challenging task. This research investigated the effectiveness of deep learning models for non-invasive-based neural signal decoding, with an emphasis on distinguishing between different speech paradigms, including perceived, overt, whispered, and imagined speech, across multiple frequency bands. The model utilizing the spatial conventional neural network module demonstrated superior performance compared to other models, especially in the gamma band. Additionally, imagined speech in the theta frequency band, where deep learning also showed strong effects, exhibited statistically significant differences compared to the other speech paradigms. |
2024-11-14 | Towards Scalable Handwriting Communication via EEG Decoding and Latent Embedding Integration | Jun-Young Kim, Deok-Seon Kim, Seo-Hyun Lee | Link | In recent years, brain-computer interfaces have made advances in decoding various motor-related tasks, including gesture recognition and movement classification, utilizing electroencephalogram (EEG) data. These developments are fundamental in exploring how neural signals can be interpreted to recognize specific physical actions. This study centers on a written alphabet classification task, where we aim to decode EEG signals associated with handwriting. To achieve this, we incorporate hand kinematics to guide the extraction of the consistent embeddings from high-dimensional neural recordings using auxiliary variables (CEBRA). These CEBRA embeddings, along with the EEG, are processed by a parallel convolutional neural network model that extracts features from both data sources simultaneously. The model classifies nine different handwritten characters, including symbols such as exclamation marks and commas, within the alphabet. We evaluate the model using a quantitative five-fold cross-validation approach and explore the structure of the embedding space through visualizations. Our approach achieves a classification accuracy of 91 % for the nine-class task, demonstrating the feasibility of fine-grained handwriting decoding from EEG. |
Publish Date | Title | Authors | URL | Abstract |
---|---|---|---|---|
2024-11-18 | Towards Personalized Brain-Computer Interface Application Based on Endogenous EEG Paradigms | Heon-Gyu Kwak, Gi-Hwan Shin, Yeon-Woo Choi, Dong-Hoon Lee, Yoo-In Jeon, Jun-Su Kang, Seong-Whan Lee | Link | In this paper, we propose a conceptual framework for personalized brain-computer interface (BCI) applications, which can offer an enhanced user experience by customizing services to individual preferences and needs, based on endogenous electroencephalography (EEG) paradigms including motor imagery (MI), speech imagery (SI), and visual imagery. The framework includes two essential components: user identification and intention classification, which enable personalized services by identifying individual users and recognizing their intended actions through EEG signals. We validate the feasibility of our framework using a private EEG dataset collected from eight subjects, employing the ShallowConvNet architecture to decode EEG features. The experimental results demonstrate that user identification achieved an average classification accuracy of 0.995, while intention classification achieved 0.47 accuracy across all paradigms, with MI demonstrating the best performance. These findings indicate that EEG signals can effectively support personalized BCI applications, offering robust identification and reliable intention decoding, especially for MI and SI. |
2024-11-14 | Imagined Speech and Visual Imagery as Intuitive Paradigms for Brain-Computer Interfaces | Seo-Hyun Lee, Ji-Ha Park, Deok-Seon Kim | Link | Recent advancements in brain-computer interface (BCI) technology have emphasized the promise of imagined speech and visual imagery as effective paradigms for intuitive communication. This study investigates the classification performance and brain connectivity patterns associated with these paradigms, focusing on decoding accuracy across selected word classes. Sixteen participants engaged in tasks involving thirteen imagined speech and visual imagery classes, revealing above-chance classification accuracy for both paradigms. Variability in classification accuracy across individual classes highlights the influence of sensory and motor associations in imagined speech and vivid visual associations in visual imagery. Connectivity analysis further demonstrated increased functional connectivity in language-related and sensory regions for imagined speech, whereas visual imagery activated spatial and visual processing networks. These findings suggest the potential of imagined speech and visual imagery as an intuitive and scalable paradigm for BCI communication when selecting optimal word classes. Further exploration of the decoding outcomes for these two paradigms could provide insights for practical BCI communication. |
2024-11-12 | On the BCI Problem | Ted Dobson, Gregory Robson | Link | Let |
2024-11-04 | User-wise Perturbations for User Identity Protection in EEG-Based BCIs | Xiaoqing Chen, Siyang Li, Yunlu Tu, Ziwei Wang, Dongrui Wu | Link | Objective: An electroencephalogram (EEG)-based brain-computer interface (BCI) is a direct communication pathway between the human brain and a computer. Most research so far studied more accurate BCIs, but much less attention has been paid to the ethics of BCIs. Aside from task-specific information, EEG signals also contain rich private information, e.g., user identity, emotion, disorders, etc., which should be protected. Approach: We show for the first time that adding user-wise perturbations can make identity information in EEG unlearnable. We propose four types of user-wise privacy-preserving perturbations, i.e., random noise, synthetic noise, error minimization noise, and error maximization noise. After adding the proposed perturbations to EEG training data, the user identity information in the data becomes unlearnable, while the BCI task information remains unaffected. Main results: Experiments on six EEG datasets using three neural network classifiers and various traditional machine learning models demonstrated the robustness and practicability of the proposed perturbations. Significance: Our research shows the feasibility of hiding user identity information in EEG data without impacting the primary BCI task information. |
2024-11-04 | Alignment-Based Adversarial Training (ABAT) for Improving the Robustness and Accuracy of EEG-Based BCIs | Xiaoqing Chen, Ziwei Wang, Dongrui Wu | Link | Machine learning has achieved great success in electroencephalogram (EEG) based brain-computer interfaces (BCIs). Most existing BCI studies focused on improving the decoding accuracy, with only a few considering the adversarial security. Although many adversarial defense approaches have been proposed in other application domains such as computer vision, previous research showed that their direct extensions to BCIs degrade the classification accuracy on benign samples. This phenomenon greatly affects the applicability of adversarial defense approaches to EEG-based BCIs. To mitigate this problem, we propose alignment-based adversarial training (ABAT), which performs EEG data alignment before adversarial training. Data alignment aligns EEG trials from different domains to reduce their distribution discrepancies, and adversarial training further robustifies the classification boundary. The integration of data alignment and adversarial training can make the trained EEG classifiers simultaneously more accurate and more robust. Experiments on five EEG datasets from two different BCI paradigms (motor imagery classification, and event related potential recognition), three convolutional neural network classifiers (EEGNet, ShallowCNN and DeepCNN) and three different experimental settings (offline within-subject cross-block/-session classification, online cross-session classification, and pre-trained classifiers) demonstrated its effectiveness. It is very intriguing that adversarial attacks, which are usually used to damage BCI systems, can be used in ABAT to simultaneously improve the model accuracy and robustness. |
2024-10-31 | Biologically-Inspired Technologies: Integrating Brain-Computer Interface and Neuromorphic Computing for Human Digital Twins | Chen Shang, Jiadong Yu, Dinh Thai Hoang | Link | The integration of the Metaverse into a human-centric ecosystem has intensified the need for sophisticated Human Digital Twins (HDTs) that are driven by the multifaceted human data. However, the effective construction of HDTs faces significant challenges due to the heterogeneity of data collection devices, the high energy demands associated with processing intricate data, and concerns over the privacy of sensitive information. This work introduces a novel biologically-inspired (bio-inspired) HDT framework that leverages Brain-Computer Interface (BCI) sensor technology to capture brain signals as the data source for constructing HDT. By collecting and analyzing these signals, the framework not only minimizes device heterogeneity and enhances data collection efficiency, but also provides richer and more nuanced physiological and psychological data for constructing personalized HDTs. To this end, we further propose a bio-inspired neuromorphic computing learning model based on the Spiking Neural Network (SNN). This model utilizes discrete neural spikes to emulate the way of human brain processes information, thereby enhancing the system's ability to process data effectively while reducing energy consumption. Additionally, we integrate a Federated Learning (FL) strategy within the model to strengthen data privacy. We then conduct a case study to demonstrate the performance of our proposed twofold bio-inspired scheme. Finally, we present several challenges and promising directions for future research of HDTs driven by bio-inspired technologies. |
2024-10-31 | Feature Selection via Dynamic Graph-based Attention Block in MI-based EEG Signals | Hyeon-Taek Han, Dae-Hyeok Lee, Heon-Gyu Kwak | Link | Brain-computer interface (BCI) technology enables direct interaction between humans and computers by analyzing brain signals. Electroencephalogram (EEG) is one of the non-invasive tools used in BCI systems, providing high temporal resolution for real-time applications. However, EEG signals are often affected by a low signal-to-noise ratio, physiological artifacts, and individual variability, representing challenges in extracting distinct features. Also, motor imagery (MI)-based EEG signals could contain features with low correlation to MI characteristics, which might cause the weights of the deep model to become biased towards those features. To address these problems, we proposed the end-to-end deep preprocessing method that effectively enhances MI characteristics while attenuating features with low correlation to MI characteristics. The proposed method consisted of the temporal, spatial, graph, and similarity blocks to preprocess MI-based EEG signals, aiming to extract more discriminative features and improve the robustness. We evaluated the proposed method using the public dataset 2a of BCI Competition IV to compare the performances when integrating the proposed method into the conventional models, including the DeepConvNet, the M-ShallowConvNet, and the EEGNet. The experimental results showed that the proposed method could achieve the improved performances and lead to more clustered feature distributions of MI tasks. Hence, we demonstrated that our proposed method could enhance discriminative features related to MI characteristics. |
2024-10-31 | Neurophysiological Analysis in Motor and Sensory Cortices for Improving Motor Imagination | Si-Hyun Kim, Sung-Jin Kim, Dae-Hyeok Lee | Link | Brain-computer interface (BCI) enables direct communication between the brain and external devices by decoding neural signals, offering potential solutions for individuals with motor impairments. This study explores the neural signatures of motor execution (ME) and motor imagery (MI) tasks using EEG signals, focusing on four conditions categorized as sense-related (hot and cold) and motor-related (pull and push) conditions. We conducted scalp topography analysis to examine activation patterns in the sensorimotor cortex, revealing distinct regional differences: sense--related conditions primarily activated the posterior region of the sensorimotor cortex, while motor--related conditions activated the anterior region of the sensorimotor cortex. These spatial distinctions align with neurophysiological principles, suggesting condition-specific functional subdivisions within the sensorimotor cortex. We further evaluated the performances of three neural network models-EEGNet, ShallowConvNet, and DeepConvNet-demonstrating that ME tasks achieved higher classification accuracies compared to MI tasks. Specifically, in sense-related conditions, the highest accuracy was observed in the cold condition. In motor-related conditions, the pull condition showed the highest performance, with DeepConvNet yielding the highest results. These findings provide insights into optimizing BCI applications by leveraging specific condition-induced neural activations. |
2024-10-29 | Neurofeedback-Driven 6-DOF Robotic Arm: Integration of Brain-Computer Interface with Arduino for Advanced Control | Ihab A. Satam, Róbert Szabolcsi | Link | Brain computer interface (BCI) applications in robotics are becoming more famous and famous. People with disabilities are facing a real-time problem of doing simple activities such as grasping, handshaking etc. in order to aid with this problem, the use of brain signals to control actuators is showing a great importance. The Emotive Insight, a Brain-Computer Interface (BCI) device, is utilized in this project to collect brain signals and transform them into commands for controlling a robotic arm using an Arduino controller. The Emotive Insight captures brain signals, which are subsequently analyzed using Emotive software and connected with Arduino code. The HITI Brain software integrates these devices, allowing for smooth communication between brain activity and the robotic arm. This system demonstrates how brain impulses may be utilized to control external devices directly. The results showed that the system is applicable efficiently to robotic arms and also for prosthetic arms with Multi Degree of Freedom. In addition to that, the system can be used for other actuators such as bikes, mobile robots, wheelchairs etc. |
2024-10-28 | Can EEG resting state data benefit data-driven approaches for motor-imagery decoding? | Rishan Mehta, Param Rajpura, Yogesh Kumar Meena | Link | Resting-state EEG data in neuroscience research serve as reliable markers for user identification and reveal individual-specific traits. Despite this, the use of resting-state data in EEG classification models is limited. In this work, we propose a feature concatenation approach to enhance decoding models' generalization by integrating resting-state EEG, aiming to improve motor imagery BCI performance and develop a user-generalized model. Using feature concatenation, we combine the EEGNet model, a standard convolutional neural network for EEG signal classification, with functional connectivity measures derived from resting-state EEG data. The findings suggest that although grounded in neuroscience with data-driven learning, the concatenation approach has limited benefits for generalizing models in within-user and across-user scenarios. While an improvement in mean accuracy for within-user scenarios is observed on two datasets, concatenation doesn't benefit across-user scenarios when compared with random data concatenation. The findings indicate the necessity of further investigation on the model interpretability and the effect of random data concatenation on model robustness. |
Publish Date | Title | Authors | URL | Abstract |
---|---|---|---|---|
2024-11-16 | In silico discovery of representational relationships across visual cortex | Alessandro T. Gifford, Maya A. Jastrzębowska, Johannes J. D. Singer, Radoslaw M. Cichy | Link | Human vision is mediated by a complex interconnected network of cortical brain areas jointly representing visual information. While these areas are increasingly understood in isolation, their representational relationships remain elusive. Here we developed relational neural control (RNC), and used it to investigate the representational relationships for univariate and multivariate fMRI responses of early- and mid-level visual areas. RNC generated and explored in silico fMRI responses for large amounts of images, discovering controlling images that align or disentangle responses across areas, thus indicating their shared or unique representational content. A large portion of representational content was shared across areas, unique representational content increased with cortical distance, and we isolated the visual features determining these effects. Closing the empirical cycle, we validated the in silico discoveries on in vivo fMRI responses from independent subjects. Together, this reveals how visual areas jointly represent the world as an interconnected network. |
2024-11-14 | Towards Neural Foundation Models for Vision: Aligning EEG, MEG, and fMRI Representations for Decoding, Encoding, and Modality Conversion | Matteo Ferrante, Tommaso Boccato, Grigorii Rashkov, Nicola Toschi | Link | This paper presents a novel approach towards creating a foundational model for aligning neural data and visual stimuli across multimodal representationsof brain activity by leveraging contrastive learning. We used electroencephalography (EEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI) data. Our framework's capabilities are demonstrated through three key experiments: decoding visual information from neural data, encoding images into neural representations, and converting between neural modalities. The results highlight the model's ability to accurately capture semantic information across different brain imaging techniques, illustrating its potential in decoding, encoding, and modality conversion tasks. |
2024-11-14 | Interdependent scaling exponents in the human brain | Daniel M. Castro, Ernesto P. Raposo, Mauro Copelli, Fernando A. N. Santos | Link | We apply the phenomenological renormalization group to resting-state fMRI time series of brain activity in a large population. By recursively coarse-graining the data, we compute scaling exponents for the series variance, log probability of silence, and largest covariance eigenvalue. The exponents clearly exhibit linear interdependencies, which we derive analytically in a mean-field approach. We find a significant correlation of exponent values with the gray matter volume and cognitive performance. Akin to scaling relations near critical points in thermodynamics, our findings suggest scaling interdependencies are intrinsic to brain organization and may also exist in other complex systems. |
2024-11-13 | Somatosensory and motor contributions to emotion representation | Marianne C. Reddan, Luke Chang, Philip Kragel, Tor D. Wager | Link | Emotion is often described as something people 'feel' in their bodies. Embodied emotion theorists propose that this connection is not purely linguistic; perceiving an emotion may require somatosensory and motor re-experiencing. However, it remains unclear whether self-reports of emotion-related bodily sensations (i.e., 'lump in my throat') are related to neural simulations of bodily action and sensation or whether they can be explained by cognitive appraisals or the visual features of socioemotional signals. To investigate this, participants (N = 21) were shown arousing emotional images that varied in valence, complexity, and content while undergoing fMRI scans. Participants then rated the images on a set of emotion appraisal scales and indicated where, on a body map, they experienced sensation in response to the image. To derive normative models of responses on these scales, a separate larger online sample online (N = 56 - 128) also rated these images. Representational similarity analysis (RSA) was used to compare the emotional content in the body maps with appraisals and visual features. A pairwise distance matrix between the body maps generated for each stimulus was then used in a whole brain voxel-wise searchlight analysis to identify brain regions which reflect the representational geometry of embodied emotion. This analysis revealed a network including bilateral primary somatosensory and motor cortices, precuneus, insula, and medial prefrontal cortex. The results of this study suggest that the relationship between emotion and the body is not purely conceptual: It is supported by sensorimotor cortical activations. |
2024-11-13 | A Heterogeneous Graph Neural Network Fusing Functional and Structural Connectivity for MCI Diagnosis | Feiyu Yin, Yu Lei, Siyuan Dai, Wenwen Zeng, Guoqing Wu, Liang Zhan, Jinhua Yu | Link | Brain connectivity alternations associated with brain disorders have been widely reported in resting-state functional imaging (rs-fMRI) and diffusion tensor imaging (DTI). While many dual-modal fusion methods based on graph neural networks (GNNs) have been proposed, they generally follow homogenous fusion ways ignoring rich heterogeneity of dual-modal information. To address this issue, we propose a novel method that integrates functional and structural connectivity based on heterogeneous graph neural networks (HGNNs) to better leverage the rich heterogeneity in dual-modal images. We firstly use blood oxygen level dependency and whiter matter structure information provided by rs-fMRI and DTI to establish homo-meta-path, capturing node relationships within the same modality. At the same time, we propose to establish hetero-meta-path based on structure-function coupling and brain community searching to capture relations among cross-modal nodes. Secondly, we further introduce a heterogeneous graph pooling strategy that automatically balances homo- and hetero-meta-path, effectively leveraging heterogeneous information and preventing feature confusion after pooling. Thirdly, based on the flexibility of heterogeneous graphs, we propose a heterogeneous graph data augmentation approach that can conveniently address the sample imbalance issue commonly seen in clinical diagnosis. We evaluate our method on ADNI-3 dataset for mild cognitive impairment (MCI) diagnosis. Experimental results indicate the proposed method is effective and superior to other algorithms, with a mean classification accuracy of 93.3%. |
2024-11-12 | FM-TS: Flow Matching for Time Series Generation | Yang Hu, Xiao Wang, Lirong Wu, Huatian Zhang, Stan Z. Li, Sheng Wang, Tianlong Chen | Link | Time series generation has emerged as an essential tool for analyzing temporal data across numerous fields. While diffusion models have recently gained significant attention in generating high-quality time series, they tend to be computationally demanding and reliant on complex stochastic processes. To address these limitations, we introduce FM-TS, a rectified Flow Matching-based framework for Time Series generation, which simplifies the time series generation process by directly optimizing continuous trajectories. This approach avoids the need for iterative sampling or complex noise schedules typically required in diffusion-based models. FM-TS is more efficient in terms of training and inference. Moreover, FM-TS is highly adaptive, supporting both conditional and unconditional time series generation. Notably, through our novel inference design, the model trained in an unconditional setting can seamlessly generalize to conditional tasks without the need for retraining. Extensive benchmarking across both settings demonstrates that FM-TS consistently delivers superior performance compared to existing approaches while being more efficient in terms of training and inference. For instance, in terms of discriminative score, FM-TS achieves 0.005, 0.019, 0.011, 0.005, 0.053, and 0.106 on the Sines, Stocks, ETTh, MuJoCo, Energy, and fMRI unconditional time series datasets, respectively, significantly outperforming the second-best method which achieves 0.006, 0.067, 0.061, 0.008, 0.122, and 0.167 on the same datasets. We have achieved superior performance in solar forecasting and MuJoCo imputation tasks, significantly enhanced by our innovative |
2024-11-17 | Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models | Yanchen Wang, Adam Turnbull, Tiange Xiang, Yunlong Xu, Sa Zhou, Adnan Masoud, Shekoofeh Azizi, Feng Vankee Lin, Ehsan Adeli | Link | Neural decoding, the process of understanding how brain activity corresponds to different stimuli, has been a primary objective in cognitive sciences. Over the past three decades, advancements in functional Magnetic Resonance Imaging and machine learning have greatly improved our ability to map visual stimuli to brain activity, especially in the visual cortex. Concurrently, research has expanded into decoding more complex processes like language and memory across the whole brain, utilizing techniques to handle greater variability and improve signal accuracy. We argue that "seeing" involves more than just mapping visual stimuli onto the visual cortex; it engages the entire brain, as various emotions and cognitive states can emerge from observing different scenes. In this paper, we develop algorithms to enhance our understanding of visual processes by incorporating whole-brain activation maps while individuals are exposed to visual stimuli. We utilize large-scale fMRI encoders and Image generative models pre-trained on large public datasets, which are then fine-tuned through Image-fMRI contrastive learning. Our models hence can decode visual experience across the entire cerebral cortex, surpassing the traditional confines of the visual cortex. We first compare our method with state-of-the-art approaches to decoding visual processing and show improved predictive semantic accuracy by 43%. A network ablation analysis suggests that beyond the visual cortex, the default mode network contributes most to decoding stimuli, in line with the proposed role of this network in sense-making and semantic processing. Additionally, we implemented zero-shot imagination decoding on an extra validation dataset, achieving a p-value of 0.0206 for mapping the reconstructed images and ground-truth text stimuli, which substantiates the model's capability to capture semantic meanings across various scenarios. |
2024-11-05 | A scalable generative model for dynamical system reconstruction from neuroimaging data | Eric Volkmann, Alena Brändle, Daniel Durstewitz, Georgia Koppe | Link | Data-driven inference of the generative dynamics underlying a set of observed time series is of growing interest in machine learning and the natural sciences. In neuroscience, such methods promise to alleviate the need to handcraft models based on biophysical principles and allow to automatize the inference of inter-individual differences in brain dynamics. Recent breakthroughs in training techniques for state space models (SSMs) specifically geared toward dynamical systems (DS) reconstruction (DSR) enable to recover the underlying system including its geometrical (attractor) and long-term statistical invariants from even short time series. These techniques are based on control-theoretic ideas, like modern variants of teacher forcing (TF), to ensure stable loss gradient propagation while training. However, as it currently stands, these techniques are not directly applicable to data modalities where current observations depend on an entire history of previous states due to a signal's filtering properties, as common in neuroscience (and physiology more generally). Prominent examples are the blood oxygenation level dependent (BOLD) signal in functional magnetic resonance imaging (fMRI) or Ca$^{2+}$ imaging data. Such types of signals render the SSM's decoder model non-invertible, a requirement for previous TF-based methods. Here, exploiting the recent success of control techniques for training SSMs, we propose a novel algorithm that solves this problem and scales exceptionally well with model dimensionality and filter length. We demonstrate its efficiency in reconstructing dynamical systems, including their state space geometry and long-term temporal properties, from just short BOLD time series. |
2024-11-04 | A Novel Deep Learning Tractography Fiber Clustering Framework for Functionally Consistent White Matter Parcellation Using Multimodal Diffusion MRI and Functional MRI | Jin Wang, Bocheng Guo, Yijie Li, Junyi Wang, Yuqian Chen, Jarrett Rushmore, Nikos Makris, Yogesh Rathi, Lauren J O'Donnell, Fan Zhang | Link | Tractography fiber clustering using diffusion MRI (dMRI) is a crucial strategy for white matter (WM) parcellation. Current methods primarily use the geometric information of fibers (i.e., the spatial trajectories) to group similar fibers into clusters, overlooking the important functional signals present along the fiber tracts. There is increasing evidence that neural activity in the WM can be measured using functional MRI (fMRI), offering potentially valuable multimodal information for fiber clustering. In this paper, we develop a novel deep learning fiber clustering framework, namely Deep Multi-view Fiber Clustering (DMVFC), that uses joint dMRI and fMRI data to enable functionally consistent WM parcellation. DMVFC can effectively integrate the geometric characteristics of the WM fibers with the fMRI BOLD signals along the fiber tracts. It includes two major components: 1) a multi-view pretraining module to compute embedding features from fiber geometric information and functional signals separately, and 2) a collaborative fine-tuning module to simultaneously refine the two kinds of embeddings. In the experiments, we compare DMVFC with two state-of-the-art fiber clustering methods and demonstrate superior performance in achieving functionally meaningful and consistent WM parcellation results. |
2024-11-02 | Cost efficiency of fMRI studies using resting-state vs task-based functional connectivity | Xinzhi Zhang, Leslie A Hulvershorn, Todd Constable, Yize Zhao, Selena Wang | Link | We investigate whether and how we can improve the cost efficiency of neuroimaging studies with well-tailored fMRI tasks. The comparative study is conducted using a novel network science-driven Bayesian connectome-based predictive method, which incorporates network theories in model building and substantially improves precision and robustness in imaging biomarker detection. The robustness of the method lays the foundation for identifying predictive power differential across fMRI task conditions if such difference exists. When applied to a clinically heterogeneous transdiagnostic cohort, we found shared and distinct functional fingerprints of neuropsychological outcomes across seven fMRI conditions. For example, emotional N-back memory task was found to be less optimal for negative emotion outcomes, and gradual-onset continuous performance task was found to have stronger links with sensitivity and sociability outcomes than with cognitive control outcomes. Together, our results show that there are unique optimal pairings of task-based fMRI conditions and neuropsychological outcomes that should not be ignored when designing well-powered neuroimaging studies. |
Publish Date | Title | Authors | URL | Abstract |
---|---|---|---|---|
2024-11-14 | Towards Neural Foundation Models for Vision: Aligning EEG, MEG, and fMRI Representations for Decoding, Encoding, and Modality Conversion | Matteo Ferrante, Tommaso Boccato, Grigorii Rashkov, Nicola Toschi | Link | This paper presents a novel approach towards creating a foundational model for aligning neural data and visual stimuli across multimodal representationsof brain activity by leveraging contrastive learning. We used electroencephalography (EEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI) data. Our framework's capabilities are demonstrated through three key experiments: decoding visual information from neural data, encoding images into neural representations, and converting between neural modalities. The results highlight the model's ability to accurately capture semantic information across different brain imaging techniques, illustrating its potential in decoding, encoding, and modality conversion tasks. |
2024-11-12 | Search for the X17 particle in |
The MEG II collaboration, K. Afanaciev, A. M. Baldini, S. Ban, H. Benmansour, G. Boca, P. W. Cattaneo, G. Cavoto, F. Cei, M. Chiappini, A. Corvaglia, G. Dal Maso, A. De Bari, M. De Gerone, L. Ferrari Barusso, M. Francesconi, L. Galli, G. Gallucci, F. Gatti, L. Gerritzen, F. Grancagnolo, E. G. Grandoni, M. Grassi, D. N. Grigoriev, M. Hildebrandt, F. Ignatov, F. Ikeda, T. Iwamoto, S. Karpov, P. -R. Kettle, N. Khomutov, A. Kolesnikov, N. Kravchuk, V. Krylov, N. Kuchinskiy, F. Leonetti, W. Li, V. Malyshev, A. Matsushita, M. Meucci, S. Mihara, W. Molzon, T. Mori, D. Nicolò, H. Nishiguchi, A. Ochi, W. Ootani, A. Oya, D. Palo, M. Panareo, A. Papa, V. Pettinacci, A. Popov, F. Renga, S. Ritt, M. Rossella, A. Rozhdestvensky. S. Scarpellini, P. Schwendimann, G. Signorelli, M. Takahashi, Y. Uchiyama, A. Venturini, B. Vitali, C. Voena, K. Yamamoto, R. Yokota, T. Yonemoto | Link | The observation of a resonance structure in the opening angle of the electron-positron pairs in the $^{7}$Li(p,\ee) $^{8}$Be reaction was claimed and interpreted as the production and subsequent decay of a hypothetical particle (X17). Similar excesses, consistent with this particle, were later observed in processes involving $^{4}$He and $^{12}$C nuclei with the same experimental technique. The MEG II apparatus at PSI, designed to search for the |
2024-11-07 | MEG: Medical Knowledge-Augmented Large Language Models for Question Answering | Laura Cabello, Carmen Martin-Turrero, Uchenna Akujuobi, Anders Søgaard, Carlos Bobed | Link | Question answering is a natural language understanding task that involves reasoning over both explicit context and unstated, relevant domain knowledge. Large language models (LLMs), which underpin most contemporary question answering systems, struggle to induce how concepts relate in specialized domains such as medicine. Existing medical LLMs are also costly to train. In this work, we present MEG, a parameter-efficient approach for medical knowledge-augmented LLMs. MEG uses a lightweight mapping network to integrate graph embeddings into the LLM, enabling it to leverage external knowledge in a cost-effective way. We evaluate our method on four popular medical multiple-choice datasets and show that LLMs greatly benefit from the factual grounding provided by knowledge graph embeddings. MEG attains an average of +10.2% accuracy over the Mistral-Instruct baseline, and +6.7% over specialized models like BioMistral. We also show results based on Llama-3. Finally, we show that MEG's performance remains robust to the choice of graph encoder. |
2024-10-30 | STIED: A deep learning model for the SpatioTemporal detection of focal Interictal Epileptiform Discharges with MEG | Raquel Fernández-Martín, Alfonso Gijón, Odile Feys, Elodie Juvené, Alec Aeby, Charline Urbain, Xavier De Tiège, Vincent Wens | Link | Magnetoencephalography (MEG) allows the non-invasive detection of interictal epileptiform discharges (IEDs). Clinical MEG analysis in epileptic patients traditionally relies on the visual identification of IEDs, which is time consuming and partially subjective. Automatic, data-driven detection methods exist but show limited performance. Still, the rise of deep learning (DL)-with its ability to reproduce human-like abilities-could revolutionize clinical MEG practice. Here, we developed and validated STIED, a simple yet powerful supervised DL algorithm combining two convolutional neural networks with temporal (1D time-course) and spatial (2D topography) features of MEG signals inspired from current clinical guidelines. Our DL model enabled both temporal and spatial localization of IEDs in patients suffering from focal epilepsy with frequent and high amplitude spikes (FE group), with high-performance metrics-accuracy, specificity, and sensitivity all exceeding 85%-when learning from spatiotemporal features of IEDs. This performance can be attributed to our handling of input data, which mimics established clinical MEG practice. Reverse engineering further revealed that STIED encodes fine spatiotemporal features of IEDs rather than their mere amplitude. The model trained on the FE group also showed promising results when applied to a separate group of presurgical patients with different types of refractory focal epilepsy, though further work is needed to distinguish IEDs from physiological transients. This study paves the way of incorporating STIED and DL algorithms into the routine clinical MEG evaluation of epilepsy. |
2024-10-28 | NeuGPT: Unified multi-modal Neural GPT | Yiqian Yang, Yiqun Duan, Hyejeong Jo, Qiang Zhang, Renjing Xu, Oiwi Parker Jones, Xuming Hu, Chin-teng Lin, Hui Xiong | Link | This paper introduces NeuGPT, a groundbreaking multi-modal language generation model designed to harmonize the fragmented landscape of neural recording research. Traditionally, studies in the field have been compartmentalized by signal type, with EEG, MEG, ECoG, SEEG, fMRI, and fNIRS data being analyzed in isolation. Recognizing the untapped potential for cross-pollination and the adaptability of neural signals across varying experimental conditions, we set out to develop a unified model capable of interfacing with multiple modalities. Drawing inspiration from the success of pre-trained large models in NLP, computer vision, and speech processing, NeuGPT is architected to process a diverse array of neural recordings and interact with speech and text data. Our model mainly focus on brain-to-text decoding, improving SOTA from 6.94 to 12.92 on BLEU-1 and 6.93 to 13.06 on ROUGE-1F. It can also simulate brain signals, thereby serving as a novel neural interface. Code is available at \href{https://github.com/NeuSpeech/NeuGPT}{NeuSpeech/NeuGPT (https://github.com/NeuSpeech/NeuGPT) .} |
2024-10-25 | Resolving Domain Shift For Representations Of Speech In Non-Invasive Brain Recordings | Jeremiah Ridge, Oiwi Parker Jones | Link | Machine learning techniques have enabled researchers to leverage neuroimaging data to decode speech from brain activity, with some amazing recent successes achieved by applications built using invasive devices. However, research requiring surgical implants has a number of practical limitations. Non-invasive neuroimaging techniques provide an alternative but come with their own set of challenges, the limited scale of individual studies being among them. Without the ability to pool the recordings from different non-invasive studies, data on the order of magnitude needed to leverage deep learning techniques to their full potential remains out of reach. In this work, we focus on non-invasive data collected using magnetoencephalography (MEG). We leverage two different, leading speech decoding models to investigate how an adversarial domain adaptation framework augments their ability to generalize across datasets. We successfully improve the performance of both models when training across multiple datasets. To the best of our knowledge, this study is the first ever application of feature-level, deep learning based harmonization for MEG neuroimaging data. Our analysis additionally offers further evidence of the impact of demographic features on neuroimaging data, demonstrating that participant age strongly affects how machine learning models solve speech decoding tasks using MEG data. Lastly, in the course of this study we produce a new open-source implementation of one of these models to the benefit of the broader scientific community. |
2024-10-20 | Non-invasive Neural Decoding in Source Reconstructed Brain Space | Yonatan Gideoni, Ryan Charles Timms, Oiwi Parker Jones | Link | Non-invasive brainwave decoding is usually done using Magneto/Electroencephalography (MEG/EEG) sensor measurements as inputs. This makes combining datasets and building models with inductive biases difficult as most datasets use different scanners and the sensor arrays have a nonintuitive spatial structure. In contrast, fMRI scans are acquired directly in brain space, a voxel grid with a typical structured input representation. By using established techniques to reconstruct the sensors' sources' neural activity it is possible to decode from voxels for MEG data as well. We show that this enables spatial inductive biases, spatial data augmentations, better interpretability, zero-shot generalisation between datasets, and data harmonisation. |
2024-10-19 | BrainECHO: Semantic Brain Signal Decoding through Vector-Quantized Spectrogram Reconstruction for Whisper-Enhanced Text Generation | Jilong Li, Zhenxi Song, Jiaqi Wang, Min Zhang, Zhiguo Zhang | Link | Recent advances in decoding language from brain signals (EEG and MEG) have been significantly driven by pre-trained language models, leading to remarkable progress on publicly available non-invasive EEG/MEG datasets. However, previous works predominantly utilize teacher forcing during text generation, leading to significant performance drops without its use. A fundamental issue is the inability to establish a unified feature space correlating textual data with the corresponding evoked brain signals. Although some recent studies attempt to mitigate this gap using an audio-text pre-trained model, Whisper, which is favored for its signal input modality, they still largely overlook the inherent differences between audio signals and brain signals in directly applying Whisper to decode brain signals. To address these limitations, we propose a new multi-stage strategy for semantic brain signal decoding via vEctor-quantized speCtrogram reconstruction for WHisper-enhanced text generatiOn, termed BrainECHO. Specifically, BrainECHO successively conducts: 1) Discrete autoencoding of the audio spectrogram; 2) Brain-audio latent space alignment; and 3) Semantic text generation via Whisper finetuning. Through this autoencoding--alignment--finetuning process, BrainECHO outperforms state-of-the-art methods under the same data split settings on two widely accepted resources: the EEG dataset (Brennan) and the MEG dataset (GWilliams). The innovation of BrainECHO, coupled with its robustness and superiority at the sentence, session, and subject-independent levels across public datasets, underscores its significance for language-based brain-computer interfaces. |
2024-10-11 | Determining sensor geometry and gain in a wearable MEG system | Ryan M. Hill, Gonzalo Reina Rivero, Ashley J. Tyler, Holly Schofield, Cody Doyle, James Osborne, David Bobela, Lukas Rier, Joseph Gibson, Zoe Tanner, Elena Boto, Richard Bowtell, Matthew J. Brookes, Vishal Shah, Niall Holmes | Link | Optically pumped magnetometers (OPMs) are compact and lightweight sensors that can measure magnetic fields generated by current flow in neuronal assemblies in the brain. Such sensors enable construction of magnetoencephalography (MEG) instrumentation, with significant advantages over conventional MEG devices including adaptability to head size, enhanced movement tolerance, lower complexity and improved data quality. However, realising the potential of OPMs depends on our ability to perform system calibration, which means finding sensor locations, orientations, and the relationship between the sensor output and magnetic field (termed sensor gain). Such calibration is complex in OPMMEG since, for example, OPM placement can change from subject to subject (unlike in conventional MEG where sensor locations or orientations are fixed). Here, we present two methods for calibration, both based on generating well-characterised magnetic fields across a sensor array. Our first device (the HALO) is a head mounted system that generates dipole like fields from a set of coils. Our second (the matrix coil (MC)) generates fields using coils embedded in the walls of a magnetically shielded room. Our results show that both methods offer an accurate means to calibrate an OPM array (e.g. sensor locations within 2 mm of the ground truth) and that the calibrations produced by the two methods agree strongly with each other. When applied to data from human MEG experiments, both methods offer improved signal to noise ratio after beamforming suggesting that they give calibration parameters closer to the ground truth than factory settings and presumed physical sensor coordinates and orientations. Both techniques are practical and easy to integrate into real world MEG applications. This advances the field significantly closer to the routine use of OPMs for MEG recording. |
2024-10-09 | Nested Deep Learning Model Towards A Foundation Model for Brain Signal Data | Fangyi Wei, Jiajie Mo, Kai Zhang, Haipeng Shen, Srikantan Nagarajan, Fei Jiang | Link | Epilepsy affects over 50 million people globally, with EEG/MEG-based spike detection playing a crucial role in diagnosis and treatment. Manual spike identification is time-consuming and requires specialized training, limiting the number of professionals available to analyze EEG/MEG data. To address this, various algorithmic approaches have been developed. However, current methods face challenges in handling varying channel configurations and in identifying the specific channels where spikes originate. This paper introduces a novel Nested Deep Learning (NDL) framework designed to overcome these limitations. NDL applies a weighted combination of signals across all channels, ensuring adaptability to different channel setups, and allows clinicians to identify key channels more accurately. Through theoretical analysis and empirical validation on real EEG/MEG datasets, NDL demonstrates superior accuracy in spike detection and channel localization compared to traditional methods. The results show that NDL improves prediction accuracy, supports cross-modality data integration, and can be fine-tuned for various neurophysiological applications. |
Publish Date | Title | Authors | URL | Abstract |
---|---|---|---|---|
2024-10-25 | A prescriptive theory for brain-like inference | Hadi Vafaii, Dekel Galor, Jacob L. Yates | Link | The Evidence Lower Bound (ELBO) is a widely used objective for training deep generative models, such as Variational Autoencoders (VAEs). In the neuroscience literature, an identical objective is known as the variational free energy, hinting at a potential unified framework for brain function and machine learning. Despite its utility in interpreting generative models, including diffusion models, ELBO maximization is often seen as too broad to offer prescriptive guidance for specific architectures in neuroscience or machine learning. In this work, we show that maximizing ELBO under Poisson assumptions for general sequence data leads to a spiking neural network that performs Bayesian posterior inference through its membrane potential dynamics. The resulting model, the iterative Poisson VAE (iP-VAE), has a closer connection to biological neurons than previous brain-inspired predictive coding models based on Gaussian assumptions. Compared to amortized and iterative VAEs, iP-VAElearns sparser representations and exhibits superior generalization to out-of-distribution samples. These findings suggest that optimizing ELBO, combined with Poisson assumptions, provides a solid foundation for developing prescriptive theories in NeuroAI. |
2024-09-09 | Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models | Emily Cheng, Richard J. Antonello | Link | Research has repeatedly demonstrated that intermediate hidden states extracted from large language models are able to predict measured brain response to natural language stimuli. Yet, very little is known about the representation properties that enable this high prediction performance. Why is it the intermediate layers, and not the output layers, that are most capable for this unique and highly general transfer task? In this work, we show that evidence from language encoding models in fMRI supports the existence of a two-phase abstraction process within LLMs. We use manifold learning methods to show that this abstraction process naturally arises over the course of training a language model and that the first "composition" phase of this abstraction process is compressed into fewer layers as training continues. Finally, we demonstrate a strong correspondence between layerwise encoding performance and the intrinsic dimensionality of representations from LLMs. We give initial evidence that this correspondence primarily derives from the inherent compositionality of LLMs and not their next-word prediction properties. |
2024-07-22 | Predictive Coding Networks and Inference Learning: Tutorial and Survey | Björn van Zwol, Ro Jefferson, Egon L. van den Broek | Link | Recent years have witnessed a growing call for renewed emphasis on neuroscience-inspired approaches in artificial intelligence research, under the banner of NeuroAI. A prime example of this is predictive coding networks (PCNs), based on the neuroscientific framework of predictive coding. This framework views the brain as a hierarchical Bayesian inference model that minimizes prediction errors through feedback connections. Unlike traditional neural networks trained with backpropagation (BP), PCNs utilize inference learning (IL), a more biologically plausible algorithm that explains patterns of neural activity that BP cannot. Historically, IL has been more computationally intensive, but recent advancements have demonstrated that it can achieve higher efficiency than BP with sufficient parallelization. Furthermore, PCNs can be mathematically considered a superset of traditional feedforward neural networks (FNNs), significantly extending the range of trainable architectures. As inherently probabilistic (graphical) latent variable models, PCNs provide a versatile framework for both supervised learning and unsupervised (generative) modeling that goes beyond traditional artificial neural networks. This work provides a comprehensive review and detailed formal specification of PCNs, particularly situating them within the context of modern ML methods. Additionally, we introduce a Python library (PRECO) for practical implementation. This positions PC as a promising framework for future ML innovations. |
2023-10-29 | Beyond Geometry: Comparing the Temporal Structure of Computation in Neural Circuits with Dynamical Similarity Analysis | Mitchell Ostrow, Adam Eisen, Leo Kozachkov, Ila Fiete | Link | How can we tell whether two neural networks utilize the same internal processes for a particular computation? This question is pertinent for multiple subfields of neuroscience and machine learning, including neuroAI, mechanistic interpretability, and brain-machine interfaces. Standard approaches for comparing neural networks focus on the spatial geometry of latent states. Yet in recurrent networks, computations are implemented at the level of dynamics, and two networks performing the same computation with equivalent dynamics need not exhibit the same geometry. To bridge this gap, we introduce a novel similarity metric that compares two systems at the level of their dynamics, called Dynamical Similarity Analysis (DSA). Our method incorporates two components: Using recent advances in data-driven dynamical systems theory, we learn a high-dimensional linear system that accurately captures core features of the original nonlinear dynamics. Next, we compare different systems passed through this embedding using a novel extension of Procrustes Analysis that accounts for how vector fields change under orthogonal transformation. In four case studies, we demonstrate that our method disentangles conjugate and non-conjugate recurrent neural networks (RNNs), while geometric methods fall short. We additionally show that our method can distinguish learning rules in an unsupervised manner. Our method opens the door to comparative analyses of the essential temporal structure of computation in neural circuits. |
2023-05-25 | Explaining V1 Properties with a Biologically Constrained Deep Learning Architecture | Galen Pogoncheff, Jacob Granley, Michael Beyeler | Link | Convolutional neural networks (CNNs) have recently emerged as promising models of the ventral visual stream, despite their lack of biological specificity. While current state-of-the-art models of the primary visual cortex (V1) have surfaced from training with adversarial examples and extensively augmented data, these models are still unable to explain key neural properties observed in V1 that arise from biological circuitry. To address this gap, we systematically incorporated neuroscience-derived architectural components into CNNs to identify a set of mechanisms and architectures that comprehensively explain neural activity in V1. We show drastic improvements in model-V1 alignment driven by the integration of architectural components that simulate center-surround antagonism, local receptive fields, tuned normalization, and cortical magnification. Upon enhancing task-driven CNNs with a collection of these specialized components, we uncover models with latent representations that yield state-of-the-art explanation of V1 neural activity and tuning properties. Our results highlight an important advancement in the field of NeuroAI, as we systematically establish a set of architectural components that contribute to unprecedented explanation of V1. The neuroscience insights that could be gleaned from increasingly accurate in-silico models of the brain have the potential to greatly advance the fields of both neuroscience and artificial intelligence. |
2024-11-09 | A Deep Probabilistic Spatiotemporal Framework for Dynamic Graph Representation Learning with Application to Brain Disorder Identification | Sin-Yee Yap, Junn Yong Loo, Chee-Ming Ting, Fuad Noman, Raphael C. -W. Phan, Adeel Razi, David L. Dowe | Link | Recent applications of pattern recognition techniques on brain connectome classification using functional connectivity (FC) are shifting towards acknowledging the non-Euclidean topology and dynamic aspects of brain connectivity across time. In this paper, a deep spatiotemporal variational Bayes (DSVB) framework is proposed to learn time-varying topological structures in dynamic FC networks for identifying autism spectrum disorder (ASD) in human participants. The framework incorporates a spatial-aware recurrent neural network with an attention-based message passing scheme to capture rich spatiotemporal patterns across dynamic FC networks. To overcome model overfitting on limited training datasets, an adversarial training strategy is introduced to learn graph embedding models that generalize well to unseen brain networks. Evaluation on the ABIDE resting-state functional magnetic resonance imaging dataset shows that our proposed framework substantially outperforms state-of-the-art methods in identifying patients with ASD. Dynamic FC analyses with DSVB-learned embeddings reveal apparent group differences between ASD and healthy controls in brain network connectivity patterns and switching dynamics of brain states. The code is available at https://github.com/Monash-NeuroAI/Deep-Spatiotemporal-Variational-Bayes. |
2023-03-11 | Towards NeuroAI: Introducing Neuronal Diversity into Artificial Neural Networks | Feng-Lei Fan, Yingxin Li, Hanchuan Peng, Tieyong Zeng, Fei Wang | Link | Throughout history, the development of artificial intelligence, particularly artificial neural networks, has been open to and constantly inspired by the increasingly deepened understanding of the brain, such as the inspiration of neocognitron, which is the pioneering work of convolutional neural networks. Per the motives of the emerging field: NeuroAI, a great amount of neuroscience knowledge can help catalyze the next generation of AI by endowing a network with more powerful capabilities. As we know, the human brain has numerous morphologically and functionally different neurons, while artificial neural networks are almost exclusively built on a single neuron type. In the human brain, neuronal diversity is an enabling factor for all kinds of biological intelligent behaviors. Since an artificial network is a miniature of the human brain, introducing neuronal diversity should be valuable in terms of addressing those essential problems of artificial networks such as efficiency, interpretability, and memory. In this Primer, we first discuss the preliminaries of biological neuronal diversity and the characteristics of information transmission and processing in a biological neuron. Then, we review studies of designing new neurons for artificial networks. Next, we discuss what gains can neuronal diversity bring into artificial networks and exemplary applications in several important fields. Lastly, we discuss the challenges and future directions of neuronal diversity to explore the potential of NeuroAI. |
2022-12-08 | A Rubric for Human-like Agents and NeuroAI | Ida Momennejad | Link | Researchers across cognitive, neuro-, and computer sciences increasingly reference human-like artificial intelligence and neuroAI. However, the scope and use of the terms are often inconsistent. Contributed research ranges widely from mimicking behaviour, to testing machine learning methods as neurally plausible hypotheses at the cellular or functional levels, or solving engineering problems. However, it cannot be assumed nor expected that progress on one of these three goals will automatically translate to progress in others. Here a simple rubric is proposed to clarify the scope of individual contributions, grounded in their commitments to human-like behaviour, neural plausibility, or benchmark/engineering goals. This is clarified using examples of weak and strong neuroAI and human-like agents, and discussing the generative, corroborate, and corrective ways in which the three dimensions interact with one another. The author maintains that future progress in artificial intelligence will need strong interactions across the disciplines, with iterative feedback loops and meticulous validity tests, leading to both known and yet-unknown advances that may span decades to come. |
2023-02-22 | Toward Next-Generation Artificial Intelligence: Catalyzing the NeuroAI Revolution | Anthony Zador, Sean Escola, Blake Richards, Bence Ölveczky, Yoshua Bengio, Kwabena Boahen, Matthew Botvinick, Dmitri Chklovskii, Anne Churchland, Claudia Clopath, James DiCarlo, Surya Ganguli, Jeff Hawkins, Konrad Koerding, Alexei Koulakov, Yann LeCun, Timothy Lillicrap, Adam Marblestone, Bruno Olshausen, Alexandre Pouget, Cristina Savin, Terrence Sejnowski, Eero Simoncelli, Sara Solla, David Sussillo, Andreas S. Tolias, Doris Tsao | Link | Neuroscience has long been an essential driver of progress in artificial intelligence (AI). We propose that to accelerate progress in AI, we must invest in fundamental research in NeuroAI. A core component of this is the embodied Turing test, which challenges AI animal models to interact with the sensorimotor world at skill levels akin to their living counterparts. The embodied Turing test shifts the focus from those capabilities like game playing and language that are especially well-developed or uniquely human to those capabilities, inherited from over 500 million years of evolution, that are shared with all animals. Building models that can pass the embodied Turing test will provide a roadmap for the next generation of AI. |
2022-04-11 | Social Neuro AI: Social Interaction as the "dark matter" of AI | Samuele Bolotta, Guillaume Dumas | Link | This article introduces a three-axis framework indicating how AI can be informed by biological examples of social learning mechanisms. We argue that the complex human cognitive architecture owes a large portion of its expressive power to its ability to engage in social and cultural learning. However, the field of AI has mostly embraced a solipsistic perspective on intelligence. We thus argue that social interactions not only are largely unexplored in this field but also are an essential element of advanced cognitive ability, and therefore constitute metaphorically the dark matter of AI. In the first section, we discuss how social learning plays a key role in the development of intelligence. We do so by discussing social and cultural learning theories and empirical findings from social neuroscience. Then, we discuss three lines of research that fall under the umbrella of Social NeuroAI and can contribute to developing socially intelligent embodied agents in complex environments. First, neuroscientific theories of cognitive architecture, such as the global workspace theory and the attention schema theory, can enhance biological plausibility and help us understand how we could bridge individual and social theories of intelligence. Second, intelligence occurs in time as opposed to over time, and this is naturally incorporated by dynamical systems. Third, embodiment has been demonstrated to provide more sophisticated array of communicative signals. To conclude, we discuss the example of active inference, which offers powerful insights for developing agents that possess biological realism, can self-organize in time, and are socially embodied. |
Publish Date | Title | Authors | URL | Abstract |
---|---|---|---|---|
2024-11-18 | Edge-Enhanced Dilated Residual Attention Network for Multimodal Medical Image Fusion | Meng Zhou, Yuxuan Zhang, Xiaolan Xu, Jiayi Wang, Farzad Khalvati | Link | Multimodal medical image fusion is a crucial task that combines complementary information from different imaging modalities into a unified representation, thereby enhancing diagnostic accuracy and treatment planning. While deep learning methods, particularly Convolutional Neural Networks (CNNs) and Transformers, have significantly advanced fusion performance, some of the existing CNN-based methods fall short in capturing fine-grained multiscale and edge features, leading to suboptimal feature integration. Transformer-based models, on the other hand, are computationally intensive in both the training and fusion stages, making them impractical for real-time clinical use. Moreover, the clinical application of fused images remains unexplored. In this paper, we propose a novel CNN-based architecture that addresses these limitations by introducing a Dilated Residual Attention Network Module for effective multiscale feature extraction, coupled with a gradient operator to enhance edge detail learning. To ensure fast and efficient fusion, we present a parameter-free fusion strategy based on the weighted nuclear norm of softmax, which requires no additional computations during training or inference. Extensive experiments, including a downstream brain tumor classification task, demonstrate that our approach outperforms various baseline methods in terms of visual quality, texture preservation, and fusion speed, making it a possible practical solution for real-world clinical applications. The code will be released at https://github.com/simonZhou86/en_dran. |
2024-11-18 | SP${ }^3$ : Superpixel-propagated pseudo-label learning for weakly semi-supervised medical image segmentation | Shiman Li, Jiayue Zhao, Shaolei Liu, Xiaokun Dai, Chenxi Zhang, Zhijian Song | Link | Deep learning-based medical image segmentation helps assist diagnosis and accelerate the treatment process while the model training usually requires large-scale dense annotation datasets. Weakly semi-supervised medical image segmentation is an essential application because it only requires a small amount of scribbles and a large number of unlabeled data to train the model, which greatly reduces the clinician's effort to fully annotate images. To handle the inadequate supervisory information challenge in weakly semi-supervised segmentation (WSSS), a SuperPixel-Propagated Pseudo-label (SP${}^3$) learning method is proposed, using the structural information contained in superpixel for supplemental information. Specifically, the annotation of scribbles is propagated to superpixels and thus obtains a dense annotation for supervised training. Since the quality of pseudo-labels is limited by the low-quality annotation, the beneficial superpixels selected by dynamic thresholding are used to refine pseudo-labels. Furthermore, aiming to alleviate the negative impact of noise in pseudo-label, superpixel-level uncertainty is incorporated to guide the pseudo-label supervision for stable learning. Our method achieves state-of-the-art performance on both tumor and organ segmentation datasets under the WSSS setting, using only 3\% of the annotation workload compared to fully supervised methods and attaining approximately 80\% Dice score. Additionally, our method outperforms eight weakly and semi-supervised methods under both weakly supervised and semi-supervised settings. Results of extensive experiments validate the effectiveness and annotation efficiency of our weakly semi-supervised segmentation, which can assist clinicians in achieving automated segmentation for organs or tumors quickly and ultimately benefit patients. |
2024-11-18 | Review on vortex dynamics in the left ventricle as an early diagnosis marker for heart diseases and its treatment outcomes | Mahesh S. Nagargoje, Eneko Lazpita, Jesús Garicano-Mena, Soledad Le Clainche | Link | The heart is the central part of the cardiovascular network. Its role is to pump blood to various body organs. Many cardiovascular diseases occur due to an abnormal functioning of the heart. A diseased heart leads to severe complications and in some cases death of an individual. The medical community believes that early diagnosis and treatment of heart diseases can be controlled by referring to numerical simulations of image-based heart models. Computational Fluid Dynamics (CFD) is a commonly used tool for patient-specific simulations in the cardiac flows, and it can be equipped to allow a better understanding of flow patterns. In this paper, we review the progress of CFD tools to understand the flow patterns in healthy and dilated cardiomyopathic (DCM) left ventricles (LV). The formation of an asymmetric vortex in a healthy LV shows an efficient way of blood transport. The vortex pattern changes before any change in the geometry of LV is noticeable. This flow change can be used as a marker of DCM progression. We can conclude that understanding vortex dynamics in LV using various vortex indexes coupled with data-driven approaches can be used as an early diagnosis tool and improvement in DCM treatment. |
2024-11-18 | Membership Inference Attack against Long-Context Large Language Models | Zixiong Wang, Gaoyang Liu, Yang Yang, Chen Wang | Link | Recent advances in Large Language Models (LLMs) have enabled them to overcome their context window limitations, and demonstrate exceptional retrieval and reasoning capacities on longer context. Quesion-answering systems augmented with Long-Context Language Models (LCLMs) can automatically search massive external data and incorporate it into their contexts, enabling faithful predictions and reducing issues such as hallucinations and knowledge staleness. Existing studies targeting LCLMs mainly concentrate on addressing the so-called lost-in-the-middle problem or improving the inference effiencicy, leaving their privacy risks largely unexplored. In this paper, we aim to bridge this gap and argue that integrating all information into the long context makes it a repository of sensitive information, which often contains private data such as medical records or personal identities. We further investigate the membership privacy within LCLMs external context, with the aim of determining whether a given document or sequence is included in the LCLMs context. Our basic idea is that if a document lies in the context, it will exhibit a low generation loss or a high degree of semantic similarity to the contents generated by LCLMs. We for the first time propose six membership inference attack (MIA) strategies tailored for LCLMs and conduct extensive experiments on various popular models. Empirical results demonstrate that our attacks can accurately infer membership status in most cases, e.g., 90.66% attack F1-score on Multi-document QA datasets with LongChat-7b-v1.5-32k, highlighting significant risks of membership leakage within LCLMs input contexts. Furthermore, we examine the underlying reasons why LCLMs are susceptible to revealing such membership information. |
2024-11-18 | Lung Disease Detection with Vision Transformers: A Comparative Study of Machine Learning Methods | Baljinnyam Dayan | Link | Recent advancements in medical image analysis have predominantly relied on Convolutional Neural Networks (CNNs), achieving impressive performance in chest X-ray classification tasks, such as the 92% AUC reported by AutoThorax-Net and the 88% AUC achieved by ChexNet in classifcation tasks. However, in the medical field, even small improvements in accuracy can have significant clinical implications. This study explores the application of Vision Transformers (ViT), a state-of-the-art architecture in machine learning, to chest X-ray analysis, aiming to push the boundaries of diagnostic accuracy. I present a comparative analysis of two ViT-based approaches: one utilizing full chest X-ray images and another focusing on segmented lung regions. Experiments demonstrate that both methods surpass the performance of traditional CNN-based models, with the full-image ViT achieving up to 97.83% accuracy and the lung-segmented ViT reaching 96.58% accuracy in classifcation of diseases on three label and AUC of 94.54% when label numbers are increased to eight. Notably, the full-image approach showed superior performance across all metrics, including precision, recall, F1 score, and AUC-ROC. These findings suggest that Vision Transformers can effectively capture relevant features from chest X-rays without the need for explicit lung segmentation, potentially simplifying the preprocessing pipeline while maintaining high accuracy. This research contributes to the growing body of evidence supporting the efficacy of transformer-based architectures in medical image analysis and highlights their potential to enhance diagnostic precision in clinical settings. |
2024-11-18 | ezyMRI: How to build an MRI machine from scratch -- Experience from a four-day hackathon | Shaoying Huang, José Miguel Algarín, Joseba Alonso, Anieyrudh R, Jose Borreguero, Fabian Bschorr, Paul Cassidy, Wei Ming Choo, David Corcos, Teresa Guallart-Naval, Heng Jing Han, Kay Chioma Igwe, Jacob Kang, Joe Li, Sebastian Littin, Jie Liu, Gonzalo Gabriel Rodriguez, Eddy Solomon, Li-Kuo Tan, Rui Tian, Andrew Webb, Susanna Weber, Dan Xiao, Minxuan Xu, Wenwei Yu, Zhiyong Zhang, Isabelle Zinghini, Bernhard Blümich | Link | Nuclear magnetic resonance instruments are becoming available to the do-it-yourself community. The challenges encountered in the endeavor to build a magnetic resonance imaging instrument from scratch were confronted in a four-day hackathon at Singapore University of Technology and Design in spring 2024. One day was devoted to educational lectures and three days to system construction and testing. Seventy young researchers from all parts of the world formed six teams focusing on magnet, gradient coil, RF coil, console, system integration, and design, which together produced a working MRI instrument in three days. The different steps, encountered challenges, and their solutions are reported. |
2024-11-18 | TP-UNet: Temporal Prompt Guided UNet for Medical Image Segmentation | Ranmin Wang, Limin Zhuang, Hongkun Chen, Boyan Xu, Ruichu Cai | Link | The advancement of medical image segmentation techniques has been propelled by the adoption of deep learning techniques, particularly UNet-based approaches, which exploit semantic information to improve the accuracy of segmentations. However, the order of organs in scanned images has been disregarded by current medical image segmentation approaches based on UNet. Furthermore, the inherent network structure of UNet does not provide direct capabilities for integrating temporal information. To efficiently integrate temporal information, we propose TP-UNet that utilizes temporal prompts, encompassing organ-construction relationships, to guide the segmentation UNet model. Specifically, our framework is featured with cross-attention and semantic alignment based on unsupervised contrastive learning to combine temporal prompts and image features effectively. Extensive evaluations on two medical image segmentation datasets demonstrate the state-of-the-art performance of TP-UNet. Our implementation will be open-sourced after acceptance. |
2024-11-18 | Progressive Generalization Risk Reduction for Data-Efficient Causal Effect Estimation | Hechuan Wen, Tong Chen, Guanhua Ye, Li Kheng Chai, Shazia Sadiq, Hongzhi Yin | Link | Causal effect estimation (CEE) provides a crucial tool for predicting the unobserved counterfactual outcome for an entity. As CEE relaxes the requirement for ``perfect'' counterfactual samples (e.g., patients with identical attributes and only differ in treatments received) that are impractical to obtain and can instead operate on observational data, it is usually used in high-stake domains like medical treatment effect prediction. Nevertheless, in those high-stake domains, gathering a decently sized, fully labelled observational dataset remains challenging due to hurdles associated with costs, ethics, expertise and time needed, etc., of which medical treatment surveys are a typical example. Consequently, if the training dataset is small in scale, low generalization risks can hardly be achieved on any CEE algorithms. Unlike existing CEE methods that assume the constant availability of a dataset with abundant samples, in this paper, we study a more realistic CEE setting where the labelled data samples are scarce at the beginning, while more can be gradually acquired over the course of training -- assuredly under a limited budget considering their expensive nature. Then, the problem naturally comes down to actively selecting the best possible samples to be labelled, e.g., identifying the next subset of patients to conduct the treatment survey. However, acquiring quality data for reducing the CEE risk under limited labelling budgets remains under-explored until now. To fill the gap, we theoretically analyse the generalization risk from an intriguing perspective of progressively shrinking its upper bound, and develop a principled label acquisition pipeline exclusively for CEE tasks. With our analysis, we propose the Model Agnostic Causal Active Learning (MACAL) algorithm for batch-wise label acquisition, which aims to reduce both the CEE model's uncertainty and the post-acquisition ... |
2024-11-17 | DeepSPV: An Interpretable Deep Learning Pipeline for 3D Spleen Volume Estimation from 2D Ultrasound Images | Zhen Yuan, David Stojanovski, Lei Li, Alberto Gomez, Haran Jogeesvaran, Esther Puyol-Antón, Baba Inusa, Andrew P. King | Link | Splenomegaly, the enlargement of the spleen, is an important clinical indicator for various associated medical conditions, such as sickle cell disease (SCD). Spleen length measured from 2D ultrasound is the most widely used metric for characterising spleen size. However, it is still considered a surrogate measure, and spleen volume remains the gold standard for assessing spleen size. Accurate spleen volume measurement typically requires 3D imaging modalities, such as computed tomography or magnetic resonance imaging, but these are not widely available, especially in the Global South which has a high prevalence of SCD. In this work, we introduce a deep learning pipeline, DeepSPV, for precise spleen volume estimation from single or dual 2D ultrasound images. The pipeline involves a segmentation network and a variational autoencoder for learning low-dimensional representations from the estimated segmentations. We investigate three approaches for spleen volume estimation and our best model achieves 86.62%/92.5% mean relative volume accuracy (MRVA) under single-view/dual-view settings, surpassing the performance of human experts. In addition, the pipeline can provide confidence intervals for the volume estimates as well as offering benefits in terms of interpretability, which further support clinicians in decision-making when identifying splenomegaly. We evaluate the full pipeline using a highly realistic synthetic dataset generated by a diffusion model, achieving an overall MRVA of 83.0% from a single 2D ultrasound image. Our proposed DeepSPV is the first work to use deep learning to estimate 3D spleen volume from 2D ultrasound images and can be seamlessly integrated into the current clinical workflow for spleen assessment. |
2024-11-17 | MPLite: Multi-Aspect Pretraining for Mining Clinical Health Records | Eric Yang, Pengfei Hu, Xiaoxue Han, Yue Ning | Link | The adoption of digital systems in healthcare has resulted in the accumulation of vast electronic health records (EHRs), offering valuable data for machine learning methods to predict patient health outcomes. However, single-visit records of patients are often neglected in the training process due to the lack of annotations of next-visit information, thereby limiting the predictive and expressive power of machine learning models. In this paper, we present a novel framework MPLite that utilizes Multi-aspect Pretraining with Lab results through a light-weight neural network to enhance medical concept representation and predict future health outcomes of individuals. By incorporating both structured medical data and additional information from lab results, our approach fully leverages patient admission records. We design a pretraining module that predicts medical codes based on lab results, ensuring robust prediction by fusing multiple aspects of features. Our experimental evaluation using both MIMIC-III and MIMIC-IV datasets demonstrates improvements over existing models in diagnosis prediction and heart failure prediction tasks, achieving a higher weighted-F1 and recall with MPLite. This work reveals the potential of integrating diverse aspects of data to advance predictive modeling in healthcare. |