Chain-of-thought (CoT) prompting has been empirically shown to improve the accuracy of large language models (LLMs) on various question answering tasks. Understanding why CoT prompting is effective is crucial.
By analyzing the plausibility and faithfulness of attribution scores extracted from prompt-based models and comparing them with scores extracted from fine-tuned models and large language models, we find that the prompt-based paradigm (whether with encoder-based or decoder-based models) yields more plausible explanations than fine-tuning in low-resource settings, and that Shapley Value Sampling consistently outperforms attention and Integrated Gradients in producing more plausible and faithful explanations.
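To make the comparison between attribution methods concrete, the sketch below shows how Shapley Value Sampling and Integrated Gradients attributions could be computed side by side. It uses the Captum library and a toy classifier as stand-ins; the paper does not specify this tooling or model, so the model, input, baseline, and hyperparameters here are purely illustrative assumptions.

```python
# A minimal sketch of comparing feature attributions with Captum's
# ShapleyValueSampling and IntegratedGradients. The toy model and
# inputs are illustrative placeholders, not the models studied here.
import torch
import torch.nn as nn
from captum.attr import ShapleyValueSampling, IntegratedGradients

torch.manual_seed(0)

# Toy stand-in for a sentence classifier: maps a vector of
# per-token feature scores to two class logits.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
model.eval()

inputs = torch.randn(1, 8)            # one example with 8 "token" features
baseline = torch.zeros_like(inputs)   # all-zero reference input

# Shapley Value Sampling: estimates each feature's Shapley value by
# averaging its marginal contribution over sampled feature orderings.
svs = ShapleyValueSampling(model)
svs_attr = svs.attribute(inputs, baselines=baseline, target=1, n_samples=50)

# Integrated Gradients: integrates gradients along the straight-line
# path from the baseline to the input.
ig = IntegratedGradients(model)
ig_attr = ig.attribute(inputs, baselines=baseline, target=1, n_steps=50)

print("Shapley Value Sampling:", svs_attr)
print("Integrated Gradients:  ", ig_attr)
```

In practice, plausibility would be assessed by comparing such per-feature scores against human rationales, and faithfulness by perturbing the highest-scoring features and measuring the change in model output.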