AI has by now achieved excellent results on all kinds of tasks across many fields, but it still draws plenty of criticism, and the issue raised most often is poor interpretability. Models trained via data, architectures, and probabilities clearly fall short of people's idealized vision of AI; the approach is not elegant. Meanwhile, in fields with extremely strict requirements on results, such as medicine, poor interpretability further limits application (though it also limits some meaningless SOTA-chasing and paper-padding). So interpretability is quite meaningful and promising.
Recently I read some material on visualizing convolutional neural networks and found it quite interesting, so here are some brief notes.
Template matching? Weights?
Traditional image processing mainly covers filtering, transforms, connected-component analysis, edge detection, and so on, and relies heavily on understanding the image and choosing the right processing method. SIFT, which became popular later, is the culmination of this line of work and a very successful algorithm. These operations match people's intuition and are largely hand-designed, so the process is easy to understand and widely accepted, but performance gradually hit a bottleneck.
Then neural networks revived and grew ever stronger; starting with AlexNet they gradually swept the leaderboards. As everyone knows, the beauty of convolutional neural networks is that you only need to define a suitable model structure, and through data and training the model gradually converges to stable parameters. The hidden-layer weights are in fact analogous to filters: fundamentally, a window scans across the image, candidate patches are matched against the filter, and a response is computed.
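As a toy illustration of this "window scan + filter match" view, here is a minimal NumPy sketch (the image, template, and sizes are all made up for illustration) computing the raw cross-correlation response that a convolution layer produces:

```python
import numpy as np

def match_response(image, template):
    """Slide `template` over `image` and record the inner product at each
    position -- the same per-window computation a conv layer performs."""
    ih, iw = image.shape
    th, tw = template.shape
    out = np.zeros((ih - th + 1, iw - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            window = image[y:y + th, x:x + tw]
            out[y, x] = np.sum(window * template)  # filter response
    return out

# A bright 2x2 blob as the "particle" template
template = np.ones((2, 2))
image = np.zeros((5, 5))
image[1:3, 1:3] = 1.0  # the particle sits at (1, 1)

resp = match_response(image, template)
peak = tuple(int(i) for i in np.unravel_index(resp.argmax(), resp.shape))
print(peak)  # the peak sits where the template fully overlaps the blob
```

The response map peaks exactly where the template best overlaps the pattern, which is the sense in which a trained filter "matches templates".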
While studying neural networks for cryo-EM particle picking, I came across similar investigations:
They initially trained a simple three-layer, pyramid-style NN, and the results were already quite good; it could distinguish particles from non-particles fairly well:
(2001-NN-An Automatic Particle Pickup Method Using a Neural Network Applicable to Low-Contrast Electron Micrographs)
In their discussion they analyzed exactly this: the hidden-layer weights act like template-matching filters (early cryo-EM particle picking was in fact done by template matching), and the connection weights between the hidden and output layers act more like a mixer of each template's matching output, so very few nodes suffice to cover what was previously thought to require a large template space.
In the NN method the connection weights assigned to the particle featured during training work as template-matching filters, allowing particle recognition.
The better the signs of the pixel densities match those of the connection weights, the larger the input into the hidden unit becomes. In the present study, the NN makes 81 template-matching filters (groups of connection weights) since 81 hidden units were adopted. The connection weight between the hidden layer and the output layer acts as a mixer of each output created by the template filter matching. Thereby, a limited number of template-matching filters of 81 seems to work well even on a particle projection of the intermediate Euler angle from that of the filters. For the number of the unit in the hidden layer of the NN, 81 seems accurate enough to pickup the present pseudo-fourfold symmetric sodium channel. Thus, the net weight of the layers defines image fields as particles or noise and indicates the precision of the assignment; the closer the value is to 1, the better the match to the particle templates.
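The "filter bank + mixer" structure described above can be sketched as a one-hidden-layer network in NumPy. The dimensions, activations, and random weights below are my own illustrative assumptions, not the paper's exact setup; only the 81 hidden units follow the quoted text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: flattened 8x8 candidate windows, 81 hidden units as in the paper
patch_dim, n_hidden = 64, 81

W1 = rng.normal(size=(n_hidden, patch_dim))  # each row acts as one "template filter"
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(1, n_hidden))          # mixer of the 81 filter responses
b2 = np.zeros(1)

def forward(patch):
    """Hidden activations are template-matching scores; the output mixes them."""
    h = np.tanh(W1 @ patch + b1)              # 81 matched-filter responses
    y = 1 / (1 + np.exp(-(W2 @ h + b2)))      # closer to 1 => better particle match
    return y

patch = rng.normal(size=patch_dim)
score = forward(patch)
print(score.shape)
```

Each row of `W1` is dotted with the candidate window, exactly the matched-filter computation above, and `W2` blends those 81 scores into a single particle/non-particle decision.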
They then continued this investigation in 2004:
(2004-NN-Automatic particle pickup method using a neural network has high accuracy by applying an initial weight derived from eigenimages a new reference free )
They applied PCA to the particle template images, used the resulting eigenimages as the network's initial weights, and trained from there, achieving better results. Their further analysis: the original template matching matches against complete particle patterns, which is easy to get right but very limited, whereas the PCA-initialized NN matches decomposed partial patterns and then combines them, which makes recognition far more flexible and robust.
Fig. 9. A schematic representation of the particle recognition mechanism in the NN as revealed in the present study. A trained NN starting from the PCA weights was compared to one starting from the random weights. (A) A NN trained from the random weights. A few recognition filters which correspond to large parts of a particle feature recognize a particle by a positive output. (B) A NN trained from the PCA weights. More than several recognition filters which correspond to small parts of a particle feature recognize a particle. In both cases, the combination of the recognition filters highly depended on the input.
Thereby, in the NN from the PCA, a huge number of combinations of the recognition filters was created to recognize a particle, leading to robust recognition of particle projections at various Euler angles.
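The PCA-initialization idea can be sketched with a plain NumPy SVD. The reference stack below is fake random data with made-up sizes (the paper works on real particle template images), but the mechanics are the same: the top eigenimages become the initial hidden-layer weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake stack of 200 reference "particle images", each flattened to 64 pixels
refs = rng.normal(size=(200, 64))

# PCA via SVD of the mean-centered stack: the rows of Vt are the eigenimages
centered = refs - refs.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)

n_hidden = 16
W1_init = Vt[:n_hidden]  # top eigenimages seed the hidden-layer weights
print(W1_init.shape)
```

Because eigenimages capture partial modes of variation rather than whole particles, each hidden unit starts as a "part detector", and training from this seed tends to preserve that decomposition.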
I was honestly quite surprised when I read this, since I had never looked into these early investigations, but you can see that their attempts to probe the mechanism behind such networks were quite forward-looking.
Low-level features → high-level features?
In a convolutional neural network, the shallow layers extract simple things like pixel values and color blobs, while the deeper layers, building on those simple shallow features, gradually extract higher-level features that ultimately express the response to a classification pattern.
So, as cleverer CNN models like ResNet emerged, deep learning pushed neural networks to new heights, precisely because they can extract much higher-level features. That is exactly where traditional image algorithms struggle to get traction: we cannot manually design such apt high-level features. This is probably one reason DCNNs are so effective today.
Papers on DCNN-based particle picking have become too numerous to list in recent years, and they are very interesting (hopefully I can add a link here once my review is published, haha).
This means there is plenty of interesting work on the interpretability and visualization of deep convolutional neural networks, which is the original motivation for this post.
Deep Dream
For details, see my earlier (hastily organized) blog post on Deep Dream.
Guided-Backpropagation
I haven't looked into this one; it feels like it might be the same idea as Deep Dream. I'll analyze it when I find time.
CAM (Class Activation Map)
Learning Deep Features for Discriminative Localization, original paper link
In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for regularizing training, we find that it actually builds a generic localizable deep representation that can be applied to a variety of tasks. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014, which is remarkably close to the 34.2% top-5 error achieved by a fully supervised CNN approach. We demonstrate that our network is able to localize the discriminative image regions on a variety of tasks despite not being trained for them
浅谈Class Activation Mapping (CAM), a general overview; this one alone is enough to get the idea
[论文笔记] CAM: Class Activation Maps, a detailed paper walkthrough
CAM Class Activation Mapping - pytorch, test code; verified working, very nice
Class Activation Mapping In PyTorch, not yet tested
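For reference, the core of CAM itself is just a weighted sum of the final conv feature maps, using the fully-connected weights of the chosen class (the layer that follows global average pooling). A minimal NumPy sketch with made-up shapes:

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """CAM: weight the final conv feature maps by one class's FC weights.

    features:   (C, H, W) activations from the last conv layer
    fc_weights: (num_classes, C) weights of the FC layer after global average pooling
    """
    w = fc_weights[class_idx]                         # (C,) per-channel weights
    cam = np.tensordot(w, features, axes=([0], [0]))  # (H, W) weighted sum of maps
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()  # normalize to [0, 1] for visualization
    return cam

rng = np.random.default_rng(0)
features = rng.random((8, 7, 7))    # e.g. 8 channels of 7x7 maps
fc_weights = rng.random((10, 8))    # e.g. 10 classes
cam = class_activation_map(features, fc_weights, class_idx=3)
print(cam.shape)
```

In practice the resulting map is upsampled to the input resolution and overlaid on the image; the linked PyTorch repos above do exactly that.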
Grad-CAM
To address CAM's drawbacks of requiring model modification and re-training, Grad-CAM was proposed.
We propose a technique for producing “visual explanations” for decisions from a large class of CNN-based models, making them more transparent. Our approach - Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept, flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept. Unlike previous approaches, GradCAM is applicable to a wide variety of CNN model-families: (1) CNNs with fully-connected layers (e.g. VGG), (2) CNNs used for structured outputs (e.g. captioning), (3) CNNs used in tasks with multimodal inputs (e.g. VQA) or reinforcement learning, without any architectural changes or re-training. We combine GradCAM with fine-grained visualizations to create a high-resolution class-discriminative visualization and apply it to off-the-shelf image classification, captioning, and visual question answering (VQA) models, including ResNet-based architectures. In the context of image classification models, our visualizations (a) lend insights into their failure modes (showing that seemingly unreasonable predictions have reasonable explanations), (b) are robust to adversarial images, (c) outperform previous methods on weakly-supervised localization, (d) are more faithful to the underlying model and (e) help achieve generalization by identifying dataset bias. For captioning and VQA, our visualizations show that even non-attention based models can localize inputs. Finally, we conduct human studies to measure if GradCAM explanations help users establish trust in predictions from deep networks and show that GradCAM helps untrained users successfully discern a “stronger” deep network from a “weaker” one. Our code is available at this https URL A demo and a video of the demo can be found at this http URL and youtu.be/COjUB9Izk6E.
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization, original paper link
“卷积可视化”: Grad-CAM, walkthrough and code implementation
Grad-CAM implementation in Pytorch
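The Grad-CAM computation itself is equally compact: global-average-pool the gradients to get per-channel weights, take the weighted sum of the activation maps, then ReLU. Here is a NumPy sketch that assumes the activations and gradients have already been captured (in PyTorch you would grab them with forward/backward hooks on the last conv layer):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM localization map from captured conv activations and gradients.

    activations: (C, H, W) feature maps of the last conv layer
    gradients:   (C, H, W) gradient of the class score w.r.t. those maps
    """
    alpha = gradients.mean(axis=(1, 2))                      # (C,) GAP of gradients
    cam = np.tensordot(alpha, activations, axes=([0], [0]))  # (H, W) weighted sum
    return np.maximum(cam, 0)  # ReLU keeps only positively influencing regions

rng = np.random.default_rng(0)
acts = rng.random((8, 7, 7))            # fake captured activations
grads = rng.standard_normal((8, 7, 7))  # fake captured gradients
heatmap = grad_cam(acts, grads)
print(heatmap.shape)
```

When the model is a plain GAP + FC classifier, these gradient-derived weights reduce to the FC weights, so Grad-CAM generalizes CAM without any architectural change or re-training.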
Others
While reading blogs I came across an amazing GitHub repo: an author has compiled Convolutional Neural Network Visualizations, very comprehensive, with source code. Superb! (Though I haven't tried it yet, hehe.)
Let's go!
There are actually many more interesting explorations and findings that I haven't had the chance to dig into. If I do get to pursue a PhD, I would really like to explore combining this area with medical imaging. Recently my group has been working on bone-age assessment with both traditional and neural-network methods, but these seem to lack support from clinical knowledge, so we hope to draw on our medical background and incorporate the way clinicians actually make judgments into the processing and classification, which involves comparing local regions. I then came across a paper that, although it takes the traditional route, exported a CAM of the bone-age decision after training and found that the strongly activated regions really are the regions that deserve focused attention. So even without injecting such prior knowledge, the model may still learn it (feels a bit like reinforcement learning?), and CAM makes this visible. Although that paper only discusses this and does not go back and apply special processing to those regions (perhaps they couldn't get it to work?), it is already very interesting.
So exploring visualization and interpretability for medical images could not only improve performance and help us understand the images better, but also help explain why the models work, making AI more widely accepted.