Seminar Papers

[Article] Direct preference optimization: Your language model is secretly a reward model & Reinforced self-training (rest) for language modeling & Rlcd: Reinforcement learning from contrast distillation for language model alignment

Direct preference optimization: Your language model is secretly a reward model

Summary: This paper proposes a method to directly align language models using human preference data without reinforcement learning. It simplifies the optimization process by utilizing a binary classification loss based on human preferences, eliminating the need for explicit reward model training. This approach is simpler, more stable, and more efficient than RLHF.

Rafailov, Rafael, et al. “Direct preference optimization: Your language model is secretly a reward model.” Advances in Neural Information Processing Systems 36 (2024).
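
The "binary classification loss" in the summary above can be sketched for a single preference pair as follows. This is a minimal illustration, not the authors' implementation: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this example. The inputs are summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO-style loss: -log(sigmoid(beta * margin)).

    All names and the default beta are illustrative assumptions.
    """
    # Implicit reward of each response: log-ratio of policy to reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # Numerically stable form of -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference model, the margin is zero and the loss is log 2. The loss falls as the policy assigns relatively more probability to the chosen response than the reference does, so no separate reward model is needed.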

Reinforced self-training (rest) for language modeling

Summary: ReST is a reinforcement learning algorithm that improves language model policies by sampling outputs from an initial model and refining them using offline RL. It enhances data reusability, reduces computational costs, and aligns outputs with human preferences.

Gulcehre, Caglar, et al. “Reinforced self-training (rest) for language modeling.” arXiv preprint arXiv:2308.08998 (2023).

Rlcd: Reinforcement learning from contrast distillation for language model alignment

Summary: RLCD aligns language models using contrastive distillation by generating preference data from positive and negative prompts. A reward model trained on these preferences refines the model via reinforcement learning. This approach reduces noise in preference data, and the authors report better alignment than RLAIF and context-distillation baselines.

Yang, Kevin, et al. “Rlcd: Reinforcement learning from contrast distillation for language model alignment.” arXiv preprint arXiv:2307.12950 (2023).

[Article] Hierarchical organization of objects in scenes is reflected in mental representations of objects

Summary: This study explored how the hierarchical structure of scene grammar is reflected in mental representations of objects. Scenes are divided into several “phrases,” each consisting of a central “anchor” object and its surrounding “local objects.” Participants consistently judged object pairs within the same phrase to be more similar, and this tendency was observed across both images and words. This demonstrates that the hierarchical structure of the visual environment is integrated into our abstract mental representations. The study may therefore offer guidance on how to extract stimuli from natural scene data, such as COCO images.

Turini, Jacopo, and Melissa Le-Hoa Võ. “Hierarchical organization of objects in scenes is reflected in mental representations of objects.” Scientific Reports 12.1 (2022): 20068.

[Article] Principles and Techniques of Blood Pressure Measurement & Investigating the physiological mechanisms of the photoplethysmogram features for blood pressure estimation & Pulse Transit Time Based Continuous Cuffless Blood Pressure Estimation: A New Extension and A Comprehensive Evaluation.

Principles and Techniques of Blood Pressure Measurement

Summary: The first study reviews the principles and techniques of blood pressure (BP) measurement, emphasizing the importance of accurate methods for clinical and research applications. It discusses various BP measurement techniques, including auscultatory, oscillometric, and ambulatory methods, and highlights potential sources of error and strategies to minimize them.

Ogedegbe, G., & Pickering, T. (2010). Principles and techniques of blood pressure measurement. Cardiology clinics, 28(4), 571-586.

Investigating the physiological mechanisms of the photoplethysmogram features for blood pressure estimation

Summary: The second study examines the physiological basis of PPG features for BP estimation by analyzing 65 features from 12 healthy subjects during cold stimuli and exercise recovery.

Lin, W. H., Li, X., Li, Y., Li, G., & Chen, F. (2020). Investigating the physiological mechanisms of the photoplethysmogram features for blood pressure estimation. Physiological measurement, 41(4), 044003.

Pulse Transit Time Based Continuous Cuffless Blood Pressure Estimation: A New Extension and A Comprehensive Evaluation

Summary: The third study demonstrates a cuffless BP measurement method combining PTT and the novel PPG intensity ratio (PIR).

Ding, X., Yan, B. P., Zhang, Y. T., Liu, J., Zhao, N., & Tsang, H. K. (2017). Pulse transit time based continuous cuffless blood pressure estimation: A new extension and a comprehensive evaluation. Scientific reports, 7(1), 1-11.
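
As a rough illustration of the PTT quantity mentioned above, the sketch below computes a beat-by-beat pulse transit time as the delay from each ECG R-peak to the next PPG peak within the same beat. It assumes that peak times have already been detected. The function name `pulse_transit_times` and the choice of the PPG peak as the reference point are assumptions for this example. It does not reproduce the paper's PIR feature or its BP model.

```python
from bisect import bisect_right


def pulse_transit_times(r_peak_times, ppg_peak_times):
    """Return one PTT (seconds) per beat that has a matching PPG peak.

    Hypothetical helper: pairs each R-peak with the first PPG peak that
    follows it and precedes the next R-peak.
    """
    r_peaks = sorted(r_peak_times)
    ppg_peaks = sorted(ppg_peak_times)
    ptts = []
    for i, t_r in enumerate(r_peaks):
        j = bisect_right(ppg_peaks, t_r)  # first PPG peak strictly after t_r
        if j == len(ppg_peaks):
            break  # no PPG peaks left
        next_r = r_peaks[i + 1] if i + 1 < len(r_peaks) else float("inf")
        if ppg_peaks[j] < next_r:
            ptts.append(ppg_peaks[j] - t_r)
    return ptts
```

Beats without a PPG peak before the next R-peak are skipped rather than paired with a later pulse, which keeps a missed detection from inflating the PTT value.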

[Article] Hierarchical organization of social action features along the lateral visual pathway & A shared neural code for the physics of actions and object events

Summary: The first study investigates the hierarchical organization of social action features along the lateral visual stream, revealing that the brain processes increasingly complex features, from low-level motion in early visual areas to high-level communicative actions in the superior temporal sulcus (STS). The second study demonstrates a shared neural code for representing both human actions and object events, suggesting that the brain uses a common neural mechanism to interpret the physics of interactions, independent of animacy. Together, these findings provide new insights into the neural architecture underlying social perception and event understanding, highlighting both specialized and generalized processes in the human brain.

Karakose-Akbiyik, Seda, Alfonso Caramazza, and Moritz F. Wurm. “A shared neural code for the physics of actions and object events.” Nature Communications 14.1 (2023): 3316.

McMahon, Emalie, Michael F. Bonner, and Leyla Isik. “Hierarchical organization of social action features along the lateral visual pathway.” Current Biology 33.23 (2023): 5035-5047.

[Article] Development and validation of an fMRI-informed EEG model of reward-related ventral striatum activation. Neuroimage.

Summary: The authors develop and validate an accessible and affordable probe of neural activation related to reward processing in the ventral striatum (VS). Using an fMRI-informed EEG approach, they identified a spatial-temporal-spectral EEG representation that predicts concurrently acquired fMRI activity in the ventral striatum while participants respond to rewarding stimuli. They found that the VS electrical fingerprint model correlated significantly with the BOLD signal in the VS and associated regions across individuals.

Singer, Neomi, et al. “Development and validation of an fMRI-informed EEG model of reward-related ventral striatum activation.” Neuroimage 276 (2023): 120183.

[Article] Review on Psychological Stress Detection Using Biosignals. IEEE Transactions on Affective Computing.

Summary: This paper provides a comprehensive overview of how psychological stress can be detected through various biosignals. It discusses the physiological processes triggered by stress, which are measurable through signals such as EEG, ECG, EDA, and seven other biosignals. The paper aims to establish reliable biosignal indices that can effectively indicate stress levels, emphasizing the need for consistency and robustness in biosignal data features.

Giannakakis, Giorgos, et al. “Review on psychological stress detection using biosignals.” IEEE transactions on affective computing 13.1 (2019): 440-460.

[Article] Eye-lrcn: A long-term recurrent convolutional network for eye blink completeness detection. IEEE Transactions on Neural Networks and Learning Systems.

Summary: The article introduces Eye-LRCN, a new method for eye blink detection that also evaluates blink completeness using a Long-Term Recurrent Convolutional Network (LRCN). This approach combines a CNN for feature extraction with a bidirectional RNN for sequence learning, and employs a Siamese architecture to handle class imbalance and limited data. Eye-LRCN demonstrates superior performance in blink detection and completeness assessment, and achieves notable results in eye state detection.

de la Cruz, Gonzalo, et al. “Eye-lrcn: A long-term recurrent convolutional network for eye blink completeness detection.” IEEE Transactions on Neural Networks and Learning Systems 35.4 (2022): 5130-5140.

[Article] 20 years of the default mode network: A review and synthesis. Neuron.

Summary: The author thoroughly reviews the organization of the default mode network (DMN) and its cognitive roles (i.e., self-reference, social cognition, memory, mind wandering). Finally, he proposes a new perspective on DMN function in human cognition, in which the DMN integrates and “broadcasts” various representations to create a coherent “internal narrative”.

Menon, V. (2023). 20 years of the default mode network: A review and synthesis. Neuron.

[Article] Shared functional specialization in transformer-based language models and the human brain.

Summary: Transformers have recently been compared to the brain, usually through their internal representations (“embeddings”). The authors instead focus on the “transformations” that integrate contextual information across words, and find that these are more layer-specific than the embeddings. The paper differs from existing research in that it focuses on attention-related transformations rather than embeddings, which has been one of our recent interests.

Kumar, S., Sumers, T. R., Yamakoshi, T., Goldstein, A., Hasson, U., Norman, K. A., … & Nastase, S. A. (2024). Shared functional specialization in transformer-based language models and the human brain. Nature Communications, 15(1), 5523.

[not published] 1. Protocol to investigate the strategic manipulation of human causal inference through in-silico task design 2. Improving the Adaptivity of Reinforcement Learning Agent Based on the Prefrontal Cortex Meta-Control Theory of the Human Brain

Summary (1): This study introduces a cognitive model and task controller to enhance human causal inference abilities through controlled learning strategies, including one-shot and incremental learning. It aims to optimize the efficiency of learning causal relationships by manipulating the presentation sequence of stimulus-outcome pairs, with potential applications in cognitive training.
Summary (2): This thesis investigates methods to enhance the adaptivity of reinforcement learning agents based on the prefrontal cortex meta-control theory of the human brain. The proposed Meta-Dyna algorithm is designed to adapt flexibly to changes in the environment and has demonstrated optimal performance in various settings.

KJH_lab_seminar_24Jul17.pdf