Interpreting Language Models Through Concept Descriptions: A Survey • Paper • 2510.01048 • Published Oct 1 • 2
Infherno: End-to-end Agent-based FHIR Resource Synthesis from Free-form Clinical Notes • Paper • 2507.12261 • Published Jul 16 • 1
Table Understanding and (Multimodal) LLMs: A Cross-Domain Case Study on Scientific vs. Non-Scientific Data • Paper • 2507.00152 • Published Jun 30 • 1
Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework • Paper • 2506.15538 • Published Jun 18 • 1
Truth or Twist? Optimal Model Selection for Reliable Label Flipping Evaluation in LLM-based Counterfactuals • Paper • 2505.13972 • Published May 20 • 1
Through a Compressed Lens: Investigating the Impact of Quantization on LLM Explainability and Interpretability • Paper • 2505.13963 • Published May 20 • 1
Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods • Paper • 2505.01198 • Published May 2 • 2
Inseq: An Interpretability Toolkit for Sequence Generation Models • Paper • 2302.13942 • Published Feb 27, 2023 • 1
LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools • Paper • 2401.12576 • Published Jan 23, 2024 • 2
Free-text Rationale Generation under Readability Level Control • Paper • 2407.01384 • Published Jul 1, 2024
Thermostat: A Large Collection of NLP Model Explanations and Analysis Tools • Paper • 2108.13961 • Published Aug 31, 2021
Saliency Map Verbalization: Comparing Feature Importance Representations from Model-free and Instruction-based Methods • Paper • 2210.07222 • Published Oct 13, 2022
InterroLang: Exploring NLP Models and Datasets through Dialogue-based Explanations • Paper • 2310.05592 • Published Oct 9, 2023