Summary: This paper proposes a method to directly align language models using human preference data without reinforcement learning. It simplifies the optimization process by utilizing a binary classification loss based on human preferences, eliminating the need for explicit reward model training. This approach is simpler, more stable, and more efficient than RLHF.
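The binary classification loss mentioned above can be illustrated with a minimal sketch. It assumes the loss takes the form used in Direct Preference Optimization, which this summary appears to describe: a logistic loss on the difference between the policy-versus-reference log-probability ratios of the preferred and rejected responses. The function names, argument names, and the default `beta` are illustrative assumptions, not taken from the paper.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def preference_loss(policy_logp_chosen, policy_logp_rejected,
                    ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Logistic loss on one preference pair (assumed DPO-style form).

    Each argument is the summed log-probability of a full response under
    either the trainable policy or the frozen reference model.
    """
    # Implicit "reward" of each response: how much more likely the policy
    # makes it than the reference model does.
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


# When the policy equals the reference, the margin is 0 and the loss is log(2).
baseline = preference_loss(-5.0, -7.0, -5.0, -7.0)
# Raising the chosen response and lowering the rejected one reduces the loss.
improved = preference_loss(-4.0, -8.0, -5.0, -7.0)
```

No separate reward model appears anywhere in the computation; the preference signal enters only through which response is labeled as chosen.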
Summary: ReST is a reinforcement learning algorithm that improves language model policies by sampling outputs from an initial model and refining them using offline RL. It enhances data reusability, reduces computational costs, and aligns outputs with human preferences.
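The sample-then-refine loop can be sketched on a toy problem. Everything here is a simplifying assumption made for illustration: the "policy" is a dictionary of output probabilities, the reward function is hypothetical, and the offline refinement is reduced to refitting the policy on samples whose reward clears a rising threshold. The point is only to show how one batch of samples is reused across several refinement passes.

```python
import random
from collections import Counter


def grow(policy, n_samples, rng):
    """Sample a fixed dataset of outputs from the current policy."""
    outputs = list(policy)
    weights = [policy[o] for o in outputs]
    return rng.choices(outputs, weights=weights, k=n_samples)


def improve(samples, reward_fn, threshold):
    """Refit the policy on samples whose reward clears the threshold.

    This toy stand-in for the offline update is a maximum-likelihood fit
    over the filtered samples.
    """
    kept = [s for s in samples if reward_fn(s) >= threshold]
    if not kept:
        return None
    counts = Counter(kept)
    total = sum(counts.values())
    return {output: count / total for output, count in counts.items()}


def rest_sketch(policy, reward_fn, thresholds, n_samples=1000, seed=0):
    """One sampling pass followed by several refinement passes."""
    rng = random.Random(seed)
    samples = grow(policy, n_samples, rng)  # sampled once, reused below
    for threshold in thresholds:
        refined = improve(samples, reward_fn, threshold)
        if refined is None:  # nothing cleared the threshold; keep the old policy
            break
        policy = refined
    return policy


# Hypothetical toy setup: three candidate outputs with fixed rewards.
toy_rewards = {"a": 0.1, "b": 0.5, "c": 0.9}
initial_policy = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
final_policy = rest_sketch(initial_policy, toy_rewards.get, thresholds=[0.4, 0.8])
```

Because the samples are drawn once and filtered repeatedly, the expensive generation step is not repeated for every refinement pass, which is the data-reuse benefit the summary mentions.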
Summary: RLCD aligns language models using contrastive distillation by generating preference data from positive and negative prompts. A reward model trained on these preferences refines the model via reinforcement learning. This approach reduces noise in preference data and achieves superior alignment compared to RLHF.
Summary: This study explored how the hierarchical structure of scene grammar is reflected in object recognition. Scenes are divided into several “phrases,” each consisting of a central “anchor” object and its surrounding “local objects.” Participants consistently judged object pairs within the same phrase to be more similar, and this tendency was observed across both images and words. This demonstrates that the hierarchical structure of the visual environment is integrated into our abstract mental representations. Consequently, this study is expected to provide insights into how stimuli can be extracted from natural scene data, such as COCO images.
Summary: The first study reviews the principles and techniques of blood pressure (BP) measurement, emphasizing the importance of accurate methods for clinical and research applications. It discusses various BP measurement techniques, including auscultatory, oscillometric, and ambulatory methods, and highlights potential sources of error and strategies to minimize them.
Summary: The second study examines the physiological basis of PPG features for BP estimation by analyzing 65 features from 12 healthy subjects during cold stimuli and exercise recovery.
Summary: The third study demonstrates a cuffless BP measurement method combining PTT and the novel PPG intensity ratio (PIR).
Summary: The first study investigates the hierarchical organization of social action features along the lateral visual stream, revealing that the brain processes increasingly complex features, from low-level motion in early visual areas to high-level communicative actions in the superior temporal sulcus (STS). The second study demonstrates a shared neural code for representing both human actions and object events, suggesting that the brain uses a common neural mechanism to interpret the physics of interactions, independent of animacy. Together, these findings provide new insights into the neural architecture underlying social perception and event understanding, highlighting both specialized and generalized processes in the human brain.
Summary: The authors develop and validate an accessible and affordable probe of neural activation related to reward processing in the ventral striatum (VS). Using an fMRI-informed EEG approach, they identified a particular spatial-temporal-spectral EEG representation that is predictive of the concurrently acquired fMRI activity in the ventral striatum during responses to rewarding stimuli. They found that the VS-electrical fingerprint model correlated significantly with the BOLD signal in the VS and associated regions across individuals.
Summary: This paper provides a comprehensive overview of how psychological stress can be detected through various biosignals. It discusses the physiological processes triggered by stress, which are measurable through signals such as EEG, ECG, EDA, and seven other biosignals. The paper aims to establish reliable biosignal indices that can effectively indicate stress levels, emphasizing the need for consistency and robustness in biosignal data features.
Summary: The article introduces Eye-LRCN, a new method for eye blink detection that also evaluates blink completeness using a Long-Term Recurrent Convolutional Network (LRCN). This approach combines a CNN for feature extraction with a bidirectional RNN for sequence learning, and employs a Siamese architecture to handle class imbalance and limited data. Eye-LRCN demonstrates superior performance in blink detection and completeness assessment, and achieves notable results in eye state detection.
Summary: The author thoroughly reviews the organization of the default mode network (DMN) and its cognitive roles (i.e., self-reference, social cognition, memory, mind wandering). Finally, he proposes a new perspective on DMN function in human cognition, in which the DMN integrates and “broadcasts” various representations to create a coherent “internal narrative”.
Menon, V. (2023). 20 years of the default mode network: A review and synthesis. Neuron.
Summary: Transformers have recently been compared to the brain. Usually, the internal representations (“embeddings”) are used for these comparisons. However, the authors focused on the “transformations” that integrate contextual information across words, and found that they are more layer-specific than the embeddings. The work differs from existing research in that it focuses on attention-related transformations instead of embeddings, which has been one of our recent interests.
Summary: This study introduces a cognitive model and task controller to enhance human causal inference abilities through controlled learning strategies, including one-shot and incremental learning. It aims to optimize the efficiency of learning causal relationships by manipulating the presentation sequence of stimulus-outcome pairs, with potential applications in cognitive training.
Summary: This thesis investigates methods to enhance the adaptivity of reinforcement learning agents based on the prefrontal cortex meta-control theory of the human brain. The proposed Meta-Dyna algorithm is designed to adapt flexibly to changes in the environment and has demonstrated optimal performance in various settings.