Master's Thesis Defense

Published: February 5, 2026. Author: aiycxz.cn

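Committee List

Thesis title: Research on Deep Learning-Based Multimodal Image Fusion Methods
Candidate: Zhang Yue (张悦)
Major: Computer Science and Technology

Defense committee:
Chair: Prof. Wang Wei (王伟), Beihang University (北京航空航天大学)
Members: Prof. Li Chao (李超), Beihang University; Prof. Li Jianxin (李建欣), Beihang University
Secretary: Assoc. Prof. Wang Jing (王静), Beihang University

May 2022

# Beihang University Master's Thesis

Title: Research on Deep Learning-Based Multimodal Image Fusion Methods
Author: Zhang Yue (张悦)
Supervisor: Assoc. Prof. Wang Jing (王静)
Major: Computer Science and Technology

May 2022

# 摘要 (Chinese Abstract)

Image fusion combines multiple images captured by different sensors into a single image that carries richer information, and it is widely used in military reconnaissance, remote sensing, medical imaging, and related fields. In recent years, deep learning-based image fusion methods have achieved notable results, but existing methods still have three shortcomings:

1) The quality of the fused image is bounded by the quality of the source images; when the source images are poor, the fused image degrades significantly.
2) Most existing methods handle only one specific combination of modalities and lack generality.
3) Most existing methods focus only on the quality of the fused image and neglect the interpretability of the fusion process.

To address these problems, this thesis studies deep learning-based multimodal image fusion. The main contributions are as follows:

1. To address the drop in fusion quality when source images are poor, an image fusion method based on a generative adversarial network (GAN) is proposed. The method uses the generative capacity of the GAN to enhance the source images during fusion, which improves the quality of the fused image. Specifically, the network consists of a generator and a discriminator: the generator fuses multiple source images into one image, and the discriminator judges whether the generated image is real. Through adversarial training, the generator learns the complementary information among the source images and produces high-quality fused images. Experiments show that the method outperforms existing methods on infrared and visible image fusion, multi-focus image fusion, and other tasks.

2. To address the lack of generality, a general-purpose image fusion method based on the Transformer is proposed. The method relies on the Transformer's feature extraction and fusion capacity to handle fusion tasks across different modalities. Specifically, the network is a Transformer-based encoder-decoder: the encoder extracts features from the source images, and the decoder fuses these features and reconstructs the fused image. Through self-attention, the Transformer captures long-range dependencies among the source images and therefore fuses complementary information more effectively. Experiments show that the method performs well on infrared and visible image fusion, multi-focus image fusion, medical image fusion, and other tasks, which demonstrates its generality.

3. To address the lack of interpretability, an interpretable image fusion method based on the attention mechanism is proposed. The fusion network includes an attention mechanism that lets the network adaptively learn a weight for each region of each source image. Visualizing these attention weights shows directly which regions contribute more to the fused image, which makes the fusion process more interpretable. Experiments show that the method not only produces high-quality fused images but also offers a reasonable explanation of the fusion process, helping users understand the fusion result.

In summary, this thesis proposes a solution to each of the three problems above and validates each proposed method experimentally. The results offer new ideas and methods for the field of image fusion and have both theoretical and practical value.

Keywords: image fusion, deep learning, generative adversarial network, Transformer, attention mechanism, interpretability

# Abstract

Image fusion aims to fuse multiple images from different sensors into one image containing richer information, and it has wide applications in military reconnaissance, remote sensing, medical imaging, and other fields. In recent years, with the development of deep learning, image fusion methods based on deep learning have achieved remarkable results, but existing methods still have some shortcomings: 1) The quality of fused images is limited by the quality of source images. When the quality of source images is poor, the quality of fused images decreases significantly; 2) Most existing methods can only handle fusion tasks for specific modalities and lack generality; 3) Most existing methods focus only on the quality of fused images and ignore the interpretability of the fusion process.

To address these problems, this thesis studies multimodal image fusion methods based on deep learning. The main work is as follows:

1. To address the problem of degraded fused image quality when source image quality is poor, an image fusion method based on a Generative Adversarial Network (GAN) is proposed. This method uses the generative ability of the GAN to enhance source images during the fusion process, thereby improving the quality of fused images. Specifically, a GAN containing a generator and a discriminator is designed. The generator is responsible for fusing multiple source images into one image, and the discriminator is responsible for judging the authenticity of the generated image. Through adversarial training, the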
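
The attention-based contribution described in the abstract (per-region weights that both drive the fusion and can be visualized) can be illustrated with a minimal sketch. This is not the thesis's network: the thesis learns its weights with a neural network, whereas this sketch substitutes a hand-crafted local-contrast score as a stand-in for learned attention logits. The function names `local_contrast` and `fuse_with_attention`, the `temperature` parameter, and the toy 3x3 images are all assumptions made for illustration.

```python
import math


def softmax_pair(a, b):
    """Numerically stable softmax over two scores; returns two weights summing to 1."""
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    total = ea + eb
    return ea / total, eb / total


def local_contrast(img, r, c):
    """Hypothetical saliency score (stand-in for a learned attention logit):
    absolute deviation of a pixel from the mean of its 3x3 neighbourhood."""
    rows, cols = len(img), len(img[0])
    neighbours = [
        img[i][j]
        for i in range(max(0, r - 1), min(rows, r + 2))
        for j in range(max(0, c - 1), min(cols, c + 2))
    ]
    return abs(img[r][c] - sum(neighbours) / len(neighbours))


def fuse_with_attention(img_a, img_b, temperature=0.1):
    """Fuse two same-sized grayscale images pixel by pixel.

    Returns (fused, weights_a), where weights_a[r][c] is the attention weight
    given to img_a at that pixel (img_b receives 1 - weights_a[r][c]).
    The weight map is what one would visualize to explain the fusion."""
    if len(img_a) != len(img_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(img_a, img_b)
    ):
        raise ValueError("source images must have the same shape")

    fused, weights_a = [], []
    for r in range(len(img_a)):
        fused_row, weight_row = [], []
        for c in range(len(img_a[0])):
            score_a = local_contrast(img_a, r, c) / temperature
            score_b = local_contrast(img_b, r, c) / temperature
            w_a, w_b = softmax_pair(score_a, score_b)
            # Convex combination: the fused pixel lies between the two sources.
            fused_row.append(w_a * img_a[r][c] + w_b * img_b[r][c])
            weight_row.append(w_a)
        fused.append(fused_row)
        weights_a.append(weight_row)
    return fused, weights_a


# Toy inputs (made up for illustration): an "infrared" image with one hot spot
# and a flat "visible" image.
infrared = [[0.1, 0.1, 0.1], [0.1, 0.9, 0.1], [0.1, 0.1, 0.1]]
visible = [[0.5, 0.5, 0.5] for _ in range(3)]

fused, attention = fuse_with_attention(infrared, visible)
for row in attention:
    print(" ".join(f"{w:.2f}" for w in row))  # weight map for the infrared image
```

In this toy case the weight map assigns almost all weight to the infrared image at the hot spot, which is the kind of evidence a visualized attention map is meant to provide. A learned attention module would replace `local_contrast` with scores predicted by the network.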
