Learning dynamic multimodal networks