Animesh Nighojkar

Ph.D. Student





Advancing Machine and Human Reasoning (AMHR) Lab

University of South Florida



No strong feelings one way or another: Re-operationalizing Neutrality in Natural Language Inference


Conference paper


Animesh Nighojkar, Antonio Laverghetta Jr., John Licato
Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII), Association for Computational Linguistics, Toronto, Canada, 2023 Jul, pp. 199–210


Cite


APA
Nighojkar, A., Laverghetta, A., Jr., & Licato, J. (2023). No strong feelings one way or another: Re-operationalizing Neutrality in Natural Language Inference. In Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII) (pp. 199–210). Toronto, Canada: Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.law-1.20


Chicago/Turabian
Nighojkar, Animesh, Antonio Laverghetta Jr., and John Licato. “No Strong Feelings One Way or Another: Re-Operationalizing Neutrality in Natural Language Inference.” In Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII), 199–210. Toronto, Canada: Association for Computational Linguistics, 2023.


MLA
Nighojkar, Animesh, et al. “No Strong Feelings One Way or Another: Re-Operationalizing Neutrality in Natural Language Inference.” Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII), Association for Computational Linguistics, 2023, pp. 199–210, doi:10.18653/v1/2023.law-1.20.


BibTeX

@inproceedings{animesh2023a,
  title = {No strong feelings one way or another: Re-operationalizing Neutrality in Natural Language Inference},
  year = {2023},
  month = jul,
  address = {Toronto, Canada},
  pages = {199--210},
  publisher = {Association for Computational Linguistics},
  booktitle = {Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII)},
  doi = {10.18653/v1/2023.law-1.20},
  author = {Nighojkar, Animesh and Laverghetta, Jr., Antonio and Licato, John},
  month_numeric = {7}
}

Abstract

Natural Language Inference (NLI) has been a cornerstone task in evaluating language models’ inferential reasoning capabilities. However, the standard three-way classification scheme used in NLI has well-known shortcomings in evaluating models’ ability to capture the nuances of natural human reasoning. In this paper, we argue that the operationalization of the neutral label in current NLI datasets has low validity and is interpreted inconsistently, and that at least one important sense of neutrality is often ignored. We uncover the detrimental impact of these shortcomings, which in some cases leads to annotation datasets that actually decrease performance on downstream tasks. We compare approaches to handling annotator disagreement and identify flaws in a recent NLI dataset that designs an annotator study based on a problematic operationalization. Our findings highlight the need for a more refined evaluation framework for NLI, and we hope to spark further discussion and action in the NLP community.
