The main claims of the paper [1] A certain level of localization labels are inevitable for WSOL. In fact, prior works that claim to be weakly supervised use strong supervision implicitly. Therefore, let’s standardize a protocol where the models are allowed to use pixel-level masks or bounding boxes to a limited degree. According to their proposed evaluation method, they have not observed any improvement in WSOL performances since CAM (2016) in this protocol.