kp1197 17 hours ago
Does performing gradient descent on token input embeddings lead to interpretable results? And if not, why?
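For concreteness, here is a minimal sketch of the procedure being asked about, under heavy assumptions: a toy frozen "model" made of a four-token embedding table and a linear softmax classifier, with the gradient written out by hand. Every name and number in it (`EMBEDDINGS`, `W`, `optimize_embedding`, `nearest_token`) is made up for illustration; a real language model would be much larger and would use an autodiff framework rather than a hand-derived gradient.

```python
import math

# Toy frozen "model". All values are hypothetical, chosen only for illustration.
EMBEDDINGS = {
    "cat": [1.0, 0.0],
    "dog": [0.8, 0.3],
    "car": [-1.0, 0.2],
    "sky": [0.0, 1.0],
}
W = [[1.0, -0.5], [-1.0, 0.5], [0.2, 1.0]]  # 3 classes x 2 embedding dims


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def logits(e):
    return [sum(w * x for w, x in zip(row, e)) for row in W]


def loss_and_grad(e, target):
    """Cross-entropy loss for `target` and its gradient w.r.t. the input e."""
    p = softmax(logits(e))
    loss = -math.log(p[target])
    d = [p_k - (1.0 if k == target else 0.0) for k, p_k in enumerate(p)]
    grad = [sum(d[k] * W[k][j] for k in range(len(W))) for j in range(len(e))]
    return loss, grad


def optimize_embedding(start, target, lr=0.5, steps=200):
    """Plain gradient descent on the input vector; the model weights stay fixed."""
    e = list(start)
    for _ in range(steps):
        _, g = loss_and_grad(e, target)
        e = [x - lr * gx for x, gx in zip(e, g)]
    return e


def nearest_token(e):
    """Project a continuous vector back to the closest row of the table."""
    return min(EMBEDDINGS, key=lambda t: math.dist(e, EMBEDDINGS[t]))


start = EMBEDDINGS["cat"]
e_opt = optimize_embedding(start, target=1)
print("loss before:", loss_and_grad(start, 1)[0])
print("loss after: ", loss_and_grad(e_opt, 1)[0])
print("optimized vector:", e_opt)
print("nearest token:   ", nearest_token(e_opt))
```

Note that nothing in the update constrains the optimized vector to stay on a row of the embedding table, so reading it back as a token needs a projection step such as the nearest-neighbour lookup above. Whether that projection means anything is essentially what the question is asking.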