[Article] Multi-modal Self-supervised Pre-training for Regulatory Genome Across Cell Types.
Summary: Current deep learning methods often model genome sequences from only a fixed set of cell types and do not account for interactions among multiple regulatory elements. The authors propose GeneBERT, a simple yet effective approach for pre-training on genome data in a multi-modal, self-supervised manner. They pre-train the GeneBERT model and evaluate it on regulatory downstream tasks across different cell types, including promoter classification, transcription factor binding site prediction, disease risk estimation, and splicing site prediction.
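To make "self-supervised" concrete, here is a minimal sketch of a generic BERT-style masked-token setup on DNA. The sequence is tokenized into overlapping k-mers, and a random subset of tokens is hidden so that a model would be trained to recover them. This is an illustration under stated assumptions, not GeneBERT's actual pipeline. The helper names `kmer_tokenize` and `mask_tokens`, the k-mer size of 6, and the 15% mask rate are hypothetical choices. The multi-modal part of the method is not shown, and the masking is simplified: masked positions are always replaced with `[MASK]`, without BERT's 80/10/10 replacement rule.

```python
import random


def kmer_tokenize(seq: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1).

    Hypothetical helper; k=6 is an assumed default, not a documented GeneBERT setting.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Randomly hide tokens for a masked-token prediction objective.

    Returns (masked_tokens, labels). labels holds the original token at each
    masked position and None elsewhere, so a loss would only be computed on
    masked positions. Simplified: every selected token becomes mask_token.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)  # target the model must predict
        else:
            masked.append(tok)
            labels.append(None)  # position ignored by the loss
    return masked, labels


if __name__ == "__main__":
    toks = kmer_tokenize("ACGTACGTAC", k=6)
    print(toks)
    print(mask_tokens(toks, mask_prob=0.15, seed=0))
```

Overlapping k-mers keep every base visible in several tokens. Returning `None` labels for unmasked positions mirrors the common practice of scoring only the hidden tokens.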