Grad-CAM uses the gradient information flowing into the last convolutional layer of the CNN to assign importance values to each neuron for a particular decision of interest.
To obtain the class-discriminative localization map $L^c_{Grad-CAM}$ of width u and height v for any class c, we first compute the gradient of the score for class c, $y^c$ (before the softmax), with respect to the feature map activations $A^k$ of a convolutional layer, i.e. ${\partial y^c \over \partial A^k}$.
These gradients flowing back are global-average-pooled over the width and height dimensions (indexed by i and j respectively) to obtain the neuron importance weights $\alpha^c_k$, where Z is the number of spatial locations in the feature map:
$\alpha^c_k = {1\over Z}\sum_i \sum_j {\partial y^c \over \partial A^k_{ij}}$
This weight $\alpha^c_k$ represents a partial linearization of the deep network downstream from A, and captures the importance of feature map k for a target class c. We then perform a weighted combination of the forward activation maps and follow it by a ReLU to obtain:
$L^c_{Grad-CAM} = ReLU(\sum_k \alpha^c_k A^k)$
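
The two formulas above can be sketched in a few lines of NumPy. This is only a minimal illustration, assuming the activations $A^k$ and the gradients ${\partial y^c \over \partial A^k_{ij}}$ have already been extracted from the network (in practice an autograd framework would supply them). The function name `grad_cam_map`, the `(K, u, v)` array layout, and the toy numbers are assumptions made for this example, not something taken from the paper.

```python
import numpy as np


def grad_cam_map(activations, gradients):
    """Combine feature maps and their gradients into a Grad-CAM map.

    activations: array of shape (K, u, v), the feature maps A^k (assumed layout).
    gradients:   array of shape (K, u, v), d y^c / d A^k_ij for the target class c.
    Returns an array of shape (u, v).
    """
    activations = np.asarray(activations, dtype=float)
    gradients = np.asarray(gradients, dtype=float)

    # alpha^c_k: global average pooling of the gradients over i and j,
    # i.e. (1/Z) * sum_i sum_j, giving one weight per feature map.
    alpha = gradients.mean(axis=(1, 2))  # shape (K,)

    # Weighted combination over the feature maps k: sum_k alpha_k * A^k.
    weighted = np.tensordot(alpha, activations, axes=1)  # shape (u, v)

    # ReLU keeps only the locations with a positive influence on class c.
    return np.maximum(weighted, 0.0)


# Toy input with K = 2 feature maps of size 2 x 2 (made-up numbers).
A = np.array([[[1.0, 2.0], [3.0, 4.0]],
              [[1.0, 0.0], [0.0, 1.0]]])
dY = np.array([[[0.5, 0.5], [0.5, 0.5]],
               [[-2.0, -2.0], [-2.0, -2.0]]])

cam = grad_cam_map(A, dY)
print(cam)
```

In this toy case the weights are $\alpha^c_1 = 0.5$ and $\alpha^c_2 = -2$, so the second feature map counts against class c, and the ReLU zeroes out the locations where it dominates.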
Research Question:
Potential Problem: