A Discussion of ‘Adversarial Examples Are Not Bugs, They Are Features’: Robust Feature Leakage
Ilyas et al. report a surprising result: a model trained on adversarial examples is effective on clean data. They suggest this transfer is driven by adversarial examples containing genuinely useful non-robust cues. But an alternate mechanism for the transfer could be a kind of “robust feature leakage” where the model picks up on faint robust […]
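To make the transfer experiment concrete, here is a minimal sketch (not the authors' code) of the setup described above: a classifier is trained only on adversarial examples, each labeled with the attack's target class, and then evaluated on clean data. The tensor names, shapes, and the small network below are placeholder assumptions; in the real experiment the training pairs would come from the datasets released by Ilyas et al. and the evaluation would use the clean test set.

```python
# Hypothetical sketch of the "train on adversarial examples, test on clean data"
# protocol. All data below is random placeholder data, NOT the released datasets.
import torch
import torch.nn as nn

n_train, n_test, n_classes = 512, 256, 10

# Placeholder tensors (assumed 3x32x32 images with integer class labels).
x_adv = torch.randn(n_train, 3, 32, 32)             # adversarial examples
y_target = torch.randint(0, n_classes, (n_train,))  # attack-target labels
x_clean = torch.randn(n_test, 3, 32, 32)            # clean test images
y_clean = torch.randint(0, n_classes, (n_test,))    # true labels

# A tiny convolutional classifier stands in for the model used in the paper.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(16, n_classes),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Train only on (adversarial example, target label) pairs.
for _ in range(10):
    opt.zero_grad()
    loss = loss_fn(model(x_adv), y_target)
    loss.backward()
    opt.step()

# Evaluate on clean data: nontrivial accuracy here is the surprising transfer.
with torch.no_grad():
    acc = (model(x_clean).argmax(dim=1) == y_clean).float().mean()
print(f"clean accuracy: {acc:.3f}")
```

The question raised in this section is what drives any accuracy observed in that final evaluation: genuinely useful non-robust cues, or faint robust features that leak into the adversarial examples.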