SALT: Steering Activations towards Leakage-free Thinking in Chain of Thought
SALT (Steering Activations towards Leakage-free Thinking) has been introduced to address a pressing privacy issue in Large Language Models (LLMs). While these models are increasingly used as personal assistants, they have been found to leak sensitive information through their internal reasoning traces, violating users' privacy expectations. SALT mitigates this leakage by injecting targeted steering vectors into the model's hidden states, reducing the exposure of sensitive details during chain-of-thought reasoning. Experimental results show notable reductions in privacy leakage as measured by CPL: an 18.2% decrease on QwQ-32B, 17.9% on Llama-3.1-8B, and 31.2% on DeepSeek, all while maintaining comparable task performance. Striking this balance between privacy and utility matters as LLMs continue to integrate into daily life, where safeguarding user data against inadvertent exposure is essential.
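To make the core idea concrete, here is a minimal sketch of activation steering in PyTorch: a fixed steering vector is added to one layer's hidden states via a forward hook, shifting the representation away from a "leakage" direction. Everything in this sketch (the toy layer, the random stand-in vector, the strength ALPHA) is a hypothetical illustration of the general technique, not SALT's actual implementation or released code.

```python
# Minimal sketch of activation steering via a forward hook.
# All names are hypothetical; this is NOT the SALT authors' code.
import torch
import torch.nn as nn

HIDDEN = 64   # toy hidden size
ALPHA = 4.0   # steering strength (a tunable hyperparameter)

# A steering vector would normally be derived from model activations;
# here a random unit vector stands in purely for illustration.
steer = torch.randn(HIDDEN)
steer = steer / steer.norm()

class ToyBlock(nn.Module):
    """Stand-in for one transformer layer."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(HIDDEN, HIDDEN)

    def forward(self, x):
        return torch.relu(self.proj(x))

block = ToyBlock()

def steering_hook(module, inputs, output):
    # Returning a tensor from a forward hook replaces the layer's output:
    # shift every token's hidden state away from the leakage direction.
    return output - ALPHA * steer

handle = block.register_forward_hook(steering_hook)
x = torch.randn(2, 10, HIDDEN)        # (batch, seq_len, hidden)
steered = block(x)                    # forward pass with steering applied
handle.remove()
plain = block(x)                      # forward pass without steering

# Average displacement along the steering direction (approx. -ALPHA here).
print("mean shift along steering dir:",
      ((steered - plain) @ steer).mean().item())
```

In practice, a steering vector of this kind is often derived from the model's own activations, for example as the difference between mean hidden states on leaky versus leakage-free reasoning traces, and is injected only at chosen layers; those specifics are assumptions here rather than details reported in the summary above.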
— via World Pulse Now AI Editorial System
