Conceptual Limitations of Current AI Safety Approaches and Virtue Ethics as an Alternative

Masaharu Mizumoto, Rujuta Karekar, Mads Udengaard, Mayank Goel, Daan Henselmans, Nurshafira Noh, Saptadip Saha, Pranshul Bohra

Abstract

In our project iVAIS: Ideally Virtuous AI System with Virtue as its Deep Character, we attempt to build an ideally virtuous AI system as a contribution to AI Safety research. The project is still a preliminary pilot study, and in this short paper we focus instead on its philosophical justification, demonstrating why our approach is necessary for AI Safety to prevent the ultimate X-risks, and pointing out the fundamental conceptual limitations of the rule-based approaches currently dominant among frontier AI companies. We argue that philosophy has already demonstrated the limitations of rule-based or principle-based approaches, limitations closely related to the advantage of virtue ethics over deontology and consequentialism in moral theory. Likewise, widely shared views about meaning, understanding, and knowledge in philosophy demonstrate the limitations of mechanistic interpretability. Although we do not deny the value of such approaches, we argue that for the purpose of AI Safety they are inefficient and roundabout routes to the same goal, whereas the approach based on virtue ethics is simple and robust, and therefore much more efficient.
