In a recent study presented at the ACM CHI conference on human-computer interaction, a Purdue University team examined how reliable AI, particularly OpenAI's ChatGPT, is at answering the kind of programming questions developers have traditionally taken to Stack Overflow. The researchers analyzed ChatGPT's answers to 517 Stack Overflow questions.
AI continues to evolve as an auxiliary tool for many tasks, including coding assistance. However, the study's findings underscore the need for ongoing evaluation of AI's accuracy and for awareness of its limitations. Using these tools well in everyday work requires a balanced understanding of what they can and cannot do. The study's key findings were:
- Correctness: 52% of ChatGPT’s answers contained incorrect information.
- Verbosity: 77% of ChatGPT’s answers were more verbose than necessary.
- Consistency: 78% of ChatGPT's answers were inconsistent, to varying degrees, with the human-written answers.
- Linguistic Analysis: Compared with human answers, ChatGPT's answers tend to use more formal and analytical language and express less negative sentiment.
- User Study: Despite the incorrect information, participants preferred ChatGPT answers 35% of the time due to their comprehensiveness and well-articulated language style. However, they also overlooked the misinformation in ChatGPT answers 39% of the time.
The study suggests that while ChatGPT can be remarkably helpful in many cases, it frequently makes errors and pads its responses with unnecessary detail. The richer linguistic style of ChatGPT's answers leads some users to prefer them over human answers, sometimes without noticing the incorrectness and inconsistencies underneath.
According to the study, the inaccuracies in ChatGPT's coding answers can be attributed to conceptual errors rather than hallucinations: the model fails to understand the underlying programming concepts or to reason about program semantics. These findings matter for the programming community because they highlight the risk of misinformation in AI-generated content and the importance of verifying such answers before relying on them.
You can read the detailed discussion in the original study article for more insights.