[Article] Multi-modal Self-supervised Pre-training for Regulatory Genome Across Cell Types.

Summary: Current deep learning methods often model genome sequences for a fixed set of cell types and do not account for interactions among multiple regulatory elements. The authors propose a simple yet effective approach for pre-training on genome data in a multi-modal, self-supervised manner, which they call GeneBERT. They pre-train GeneBERT and evaluate it on regulatory downstream tasks across different cell types, including promoter classification, transcription factor binding site prediction, disease risk estimation, and splice site prediction.

Mo, Shentong, et al. “Multi-modal Self-supervised Pre-training for Regulatory Genome Across Cell Types.” arXiv preprint arXiv:2110.05231 (2021).