Next One: Cross-Modal self-supervised vision language pre-training with multiple objectives for medical visual question answering