Automatic Detection of LLM-Generated Code: A Comparative Case Study of Contemporary Models Across Function and Class Granularities
Neutral · Artificial Intelligence
- A comparative case study examines the automatic detection of code generated by Large Language Models (LLMs), analyzing outputs from GPT-3.5, Claude 3 Haiku, Claude Haiku 4.5, and GPT-OSS. The study evaluated 14,485 Python functions and 11,913 classes drawn from the CodeSearchNet dataset and found significant differences in detection performance depending on code granularity.
- The work matters because it exposes the limitations of existing detection methods, which often operate as 'black boxes' and lack systematic validation across models. The findings indicate that understanding the structural signatures of code at the function and class levels is essential for improving detection accuracy (a hypothetical sketch of such structural features follows this summary).
- The study also underscores ongoing concerns about the safety and reliability of LLM-generated code, particularly as these models are increasingly integrated into software systems. Parallel methodological developments, such as Graph-Regularized Sparse Autoencoders and continual-learning approaches, reflect a broader push to improve LLM safety and performance across applications.
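The paper's detector is not described here, so the following is only a minimal, hypothetical sketch of what function- versus class-level structural features might look like, using Python's standard ast module. The feature set, the sample snippets, and the idea of feeding these counts into a downstream classifier are illustrative assumptions, not the study's actual method.

```python
# Hypothetical sketch (not the study's method): extract coarse structural
# features from a Python snippet at function or class granularity using the
# standard-library ast module. Such features could feed a downstream
# classifier that separates human-written from LLM-generated code.
import ast
from collections import Counter

def structural_features(source: str) -> dict:
    """Return simple AST-based counts that may differ across granularities."""
    tree = ast.parse(source)
    node_counts = Counter(type(node).__name__ for node in ast.walk(tree))
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    return {
        "num_nodes": sum(node_counts.values()),
        "num_functions": len(funcs),
        "num_classes": node_counts.get("ClassDef", 0),
        "num_docstrings": sum(1 for f in funcs if ast.get_docstring(f)),
        "avg_func_name_len": (
            sum(len(f.name) for f in funcs) / len(funcs) if funcs else 0.0
        ),
    }

# A class-level sample exposes more structure (methods, nesting, docstrings)
# than a lone function, which is one plausible reason detection accuracy
# could differ between the two granularities.
class_sample = '''
class Greeter:
    """Simple example class."""
    def greet(self, name):
        """Return a greeting string."""
        return "Hello, " + name
'''
function_sample = '''
def add(a, b):
    return a + b
'''

print(structural_features(class_sample))
print(structural_features(function_sample))
```

In practice a detector would likely combine many more lexical and stylistic signals, but even these coarse counts illustrate why class-level samples can carry richer structural signatures than single functions.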
— via World Pulse Now AI Editorial System
