TY - GEN
T1 - Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark
T2 - Findings of the Association for Computational Linguistics, ACL 2023
AU - Hoelscher-Obermaier, Jason
AU - Persson, Julia H.
AU - Kran, Esben
AU - Konstas, Ioannis
AU - Barez, Fazl
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
N2 - Recent model editing techniques promise to mitigate the problem of memorizing false or outdated associations during large language model (LLM) training. However, we show that these techniques can introduce large unwanted side effects that are not detected by existing specificity benchmarks. We extend the existing COUNTERFACT benchmark to include a dynamic component and dub our benchmark COUNTERFACT+. Additionally, we extend the metrics used for measuring specificity with a principled KL divergence-based metric. We use this improved benchmark to evaluate recent model editing techniques and find that they suffer from low specificity. Our findings highlight the need for improved specificity benchmarks that identify and prevent unwanted side effects.
AB - Recent model editing techniques promise to mitigate the problem of memorizing false or outdated associations during large language model (LLM) training. However, we show that these techniques can introduce large unwanted side effects that are not detected by existing specificity benchmarks. We extend the existing COUNTERFACT benchmark to include a dynamic component and dub our benchmark COUNTERFACT+. Additionally, we extend the metrics used for measuring specificity with a principled KL divergence-based metric. We use this improved benchmark to evaluate recent model editing techniques and find that they suffer from low specificity. Our findings highlight the need for improved specificity benchmarks that identify and prevent unwanted side effects.
UR - http://www.scopus.com/inward/record.url?scp=85166307252&partnerID=8YFLogxK
U2 - 10.18653/v1/2023.findings-acl.733
DO - 10.18653/v1/2023.findings-acl.733
M3 - Conference article
AN - SCOPUS:85166307252
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 11548
EP - 11559
BT - Findings of the Association for Computational Linguistics, ACL 2023
PB - Association for Computational Linguistics (ACL)
Y2 - 9 July 2023 through 14 July 2023
ER -