The AIPSI Framework | Pursuit of Happiness

AI Psychological Safety Index (AIPSI)

A Proposal

Anthony Arciero (George Mason University, Ed Psych)

Siddhartha Chattopadhyay (Adobe, AI)

Paul Desan (Yale, Psychiatry)

Paul Min (Mayo Clinic, Neuroscience)

Mark Setton (CEO, Pursuit-of-Happiness.org)

Purpose of the Index

There is a growing need to systematically assess the potential psychological risks associated with human-AI interactions. These risks include the erosion of individual autonomy, susceptibility to emotional manipulation, and reinforcement of cognitive and social biases.

Such effects may compromise mental health and contribute to broader societal harms, including flawed decision-making and threats to existential stability through the adoption of misguided policies.

While existing frameworks like FLI’s AI Safety Index focus on existential risks, our proposed index specifically targets psychological risks, providing an objective assessment of how AI systems affect human wellbeing.

By establishing an industry standard that measures directly observable impacts on human flourishing, this index would create market incentives through metrics that appeal to AI providers as well as consumers.

The most immediate need for AIPSI may lie in educational institutions, where the rapid adoption of commercial AI systems directly impacts the affective and intellectual development of young people. Educational contexts would serve as proving grounds for AI systems targeting broader populations, as success in protecting young users and demonstrating psychological safety would establish trust and competitive advantage in all market segments.

Analytical Basis for Categorization

To systematically assess psychological risks, we have established five categories: autonomy, emotional bias, cognitive bias, identity and self-perception, and real-world engagement. These categories are derived from three critical dimensions of risk impact:

Significant: Impact on mental health (e.g., anxiety, self-esteem) and society (e.g., polarization).

Distinct: Effect on unique mechanisms (affect, cognition, self-concept, social bonds).

Exploitable: Manipulation by commercial (e.g., engagement) and political (e.g., propaganda) interests.

The following five categories form the basis for assessing AI system performance. Assessment will combine structured prompt testing with expert evaluation, and results will be quantified on standardized scales to support a comparative analysis of models.
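As a rough illustration of how ratings from structured prompt testing and expert evaluation might be recorded and quantified, the sketch below averages expert ratings per model and category. The 1-to-5 scale (with 5 meaning lowest psychological risk), the `Rating` record, and the `category_means` function are assumptions made for illustration only; the proposal does not specify a scale or an aggregation rule.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# The five categories named in the proposal.
CATEGORIES = (
    "autonomy",
    "emotional_bias",
    "cognitive_bias",
    "self_perception",
    "real_world_engagement",
)

# Assumed scale: 1 (high psychological risk) to 5 (low psychological risk).
SCALE_MIN, SCALE_MAX = 1, 5


@dataclass(frozen=True)
class Rating:
    """One expert's rating of one model response to one test prompt (hypothetical record)."""
    model: str
    category: str
    prompt_id: str
    rater_id: str
    score: int


def category_means(ratings):
    """Return {model: {category: mean score}} across all prompts and raters."""
    buckets = defaultdict(lambda: defaultdict(list))
    for r in ratings:
        if r.category not in CATEGORIES:
            raise ValueError(f"unknown category: {r.category}")
        if not SCALE_MIN <= r.score <= SCALE_MAX:
            raise ValueError(f"score out of range: {r.score}")
        buckets[r.model][r.category].append(r.score)
    return {
        model: {cat: mean(scores) for cat, scores in by_cat.items()}
        for model, by_cat in buckets.items()
    }


# Made-up example data: two raters score two models on one autonomy prompt.
ratings = [
    Rating("model_a", "autonomy", "p1", "rater_1", 4),
    Rating("model_a", "autonomy", "p1", "rater_2", 2),
    Rating("model_b", "autonomy", "p1", "rater_1", 5),
    Rating("model_b", "autonomy", "p1", "rater_2", 5),
]
print(category_means(ratings))
```

A plain mean is used here only because it is the simplest choice; a real protocol would also need to decide how to handle rater disagreement and how many prompts each category requires.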

Five Domains of Psychological Risk

1. Autonomy

Risk Description: Erosion of autonomy through AI systems that undermine self-determination, independent thinking, and critical analysis.

Evaluation: To what extent does the AI system preserve or undermine user autonomy?

Critical Components of Personal Autonomy:

Self-determination and Self-efficacy: User’s ability to make independent choices aligned with their values and goals, and confidence in their ability to do so.

Independent thinking: User’s capacity to come up with original ideas free from external manipulation.

Critical thinking: User’s ability to evaluate information sources, claims, and recommendations based on evidence and reasoning.

Risks:

Erosion of self-determination and self-efficacy: AI systems presenting ready-made solutions rather than supporting user-driven problem-solving, or undermining users’ confidence in their ability to accomplish tasks without AI assistance.

Erosion of independent thinking: AI systems using authority-based conditioning, overloading cognitive capacity with excessive information, or misrepresenting source credibility to shape user beliefs.

Erosion of critical thinking: AI systems presenting uncertain information with false certainty, creating information environments that limit exposure to diverse perspectives, or exploiting cognitive fatigue to reduce users’ analytical scrutiny.

2. Emotional Bias

Risk Description: Magnification of emotional states through intentional manipulation or through misalignment (e.g., arising from reinforcement learning from human feedback, RLHF).

Evaluation: To what extent does the AI system amplify or manipulate users’ emotional states?

Critical Components of Emotional Well-Being:

Emotional regulation: User’s ability to modulate emotional responses proportionately to stimuli.

Peace of mind: User’s state of mental tranquility and emotional balance.

Emotional agency: User’s capacity to experience and process emotions authentically without external direction or quantification.

Risks:

Affective intensification: AI systems strategically amplifying emotional responses based on user vulnerability profiling to drive engagement, purchasing behavior, or political actions.

Emotional Manipulation: AI systems redirecting users from authentic emotional processes toward commercially or politically advantageous emotional states through content curation and framing.

Emotional Quantification and Comparison: AI systems establishing artificial emotional standards through metrics, benchmarks, and social comparisons, creating pressure to conform to system-defined emotional norms.

3. Cognitive Bias

Risk Description: Reinforcement or distortion of existing perspectives, particularly through RLHF and disinformation.

Evaluation: To what extent does the AI system reinforce biases, use disinformation, or encourage balanced cognition?

Critical Components of Balanced Cognition:

Diverse Perspectives: User’s exposure to and integration of varied viewpoints and conceptual frameworks.

Confirmation Bias Awareness: User’s ability to recognize and counteract the tendency to seek information that validates existing beliefs.

Evidential Reasoning: User’s ability to evaluate claims based on the quality and relevance of supporting evidence rather than source authority alone.

Risks:

Flawed decision-making: Decisions biased by AI systems that exploit user trust by promoting false certainty, or uninformed decisions based on parochial worldviews that go unchallenged.

Echo Chamber Reinforcement: AI systems selectively exposing users to information that confirms existing beliefs while filtering contradictory evidence, amplifying confirmation bias.

Source Credibility Manipulation: AI systems inconsistently applying standards of evidence to selectively enhance or undermine trust in information sources based on alignment with system objectives.

4. Identity and Self-Perception

Risk Description: Harm to self-esteem and a coherent sense of identity through biased or idealized content.

Evaluation: To what extent does the AI system support or undermine healthy self-perception?

Critical Components of Healthy Self-Perception:

Self-Esteem: Self-assurance in one’s own value, rooted in attributes as well as congruence between behavior and internalized values.

Identity Coherence: Ability to maintain a unified sense of self by integrating roles, experiences, and core beliefs into a meaningful self-narrative.

Realistic Self-Appraisal: Cognitive ability to evaluate one’s strengths and weaknesses while resisting external pressures such as stereotypes or idealizations.

Risks:

Loss of Self-Esteem: AI systems promoting idealized standards (e.g., beauty filters, social comparisons) or value-misaligned behaviors.

Disruption of Identity Coherence: AI systems offering manipulative narratives (e.g., virtual influencers) or prescriptive norms that conflict with authentic self-discovery.

Distortion of Realistic Self-Appraisal: AI systems reinforcing stereotypes, presenting curated realities leading to false comparisons, or providing uncritical reinforcement that erodes accurate self-understanding.

5. Real-World Engagement

Risk Description: Social and physical isolation through AI-mediated digital immersion.

Evaluation: To what extent does the AI system promote or hinder real-world social and biophilic engagement?

Critical Components of Engagement:

Social interaction: Prioritization of human-to-human interactions over AI-mediated or purely digital ones.

Digital-physical balance: A healthy equilibrium between time spent in digital environments and engagement with the physical world, including nature and physical activities.

Community participation: Active engagement in civic, local, or social community activities, fostering a sense of belonging and collective action.

Risks:

Social isolation: Over-reliance on interaction with AI systems, weakening social bonds and mutual support.

Reduced biophilic engagement: Excessive digital immersion leading to decreased connection with natural environments and disruptions to circadian rhythms.

Weakened community involvement: AI systems facilitating passive consumption or individualized experiences that detract from participation in physical communities.

Publication Strategy

We propose a clear, visually impactful publication strategy:

Scorecard Format: A color-coded grade system (A-F) for major AI models based on the five psychological risk categories, with an overall composite score.

Transparency: Full methodology documentation published alongside scores (with appropriate safeguards to prevent gaming by AI developers).

Regular Assessment Cycle: Regular evaluations of major models to track changes over time (with special assessments before major elections or significant AI model releases).

Collaborative Evaluation: Involve diverse stakeholders in the assessment process, including psychologists, ethicists, AI safety researchers, and representatives from vulnerable communities.

Impact Tracking: Monitor and report on company responses to published assessments, tracking improvements in subsequent model versions.

This approach aims to create sufficient public awareness and constructive incentivization for AI developers to prioritize psychological safety alongside other alignment goals, while providing clear, actionable information to policymakers and the public.
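To make the scorecard and impact-tracking ideas above more concrete, here is a minimal sketch that maps per-category scores to letter grades, computes a composite, and lists grade changes between two assessment cycles. The 0-to-100 scale, the grade cutoffs, the equal weighting of categories, and all function names are assumptions for illustration; the proposal does not define any of them.

```python
from statistics import mean

# Assumed cutoffs on a 0-100 scale; the proposal does not define them.
GRADE_CUTOFFS = ((90, "A"), (80, "B"), (70, "C"), (60, "D"))


def letter_grade(score):
    """Map a 0-100 score to a letter grade using the assumed cutoffs."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return grade
    return "F"


def scorecard(category_scores):
    """Return per-category grades plus an equal-weight composite (an assumption)."""
    composite = mean(list(category_scores.values()))
    return {
        "categories": {c: letter_grade(s) for c, s in category_scores.items()},
        "composite_score": composite,
        "composite_grade": letter_grade(composite),
    }


def grade_changes(previous, current):
    """Return {category: (old grade, new grade)} for grades that changed between cycles."""
    return {
        c: (previous["categories"].get(c), grade)
        for c, grade in current["categories"].items()
        if previous["categories"].get(c) != grade
    }


# Made-up scores for two versions of one hypothetical model.
v1 = scorecard({
    "autonomy": 72,
    "emotional_bias": 85,
    "cognitive_bias": 64,
    "self_perception": 91,
    "real_world_engagement": 58,
})
v2 = scorecard({
    "autonomy": 81,
    "emotional_bias": 85,
    "cognitive_bias": 64,
    "self_perception": 91,
    "real_world_engagement": 58,
})
print(v1["categories"], v1["composite_grade"])
print(grade_changes(v1, v2))
```

Equal weighting keeps the composite easy to explain, but it can hide a failing grade in one category behind strong grades elsewhere, which is one reason to publish the per-category grades alongside the composite.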

Bibliography

Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete problems in AI safety. arXiv preprint arXiv:1606.06565.

Barocas, S., Hardt, M., & Narayanan, A. (2019). Fairness and machine learning: Limitations and opportunities. MIT Press.

Baum, S. D. (2017). On the promotion of safe and socially beneficial artificial intelligence. AI & Society, 32(4), 543-551.

Binns, R. (2018). Fairness in machine learning: Lessons from political philosophy. Proceedings of Machine Learning Research, 81, 149-159.

Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford University Press.

Deci, E. L., & Ryan, R. M. (2008). Self-determination theory: A macro theory of human motivation, development, and health. Canadian Psychology, 49(3), 182-185.

Floridi, L., Cowls, J., Beltrametti, M., Chatila, R., Chazerand, P., Dignum, V., … & Vayena, E. (2018). AI4People—an ethical framework for a good AI society: Opportunities, risks, principles, and recommendations. Minds and Machines, 28(4), 689-707.

Future of Life Institute. (2024). FLI AI Safety Index. Retrieved from futureoflife.org/document/fli-ai-safety-index-2024.

Glikson, E., & Woolley, A. W. (2020). Human trust in artificial intelligence: Review of empirical research. Academy of Management Annals, 14(2), 627-660.

Huang, M. H., & Rust, R. T. (2018). Artificial intelligence in service. Journal of Service Research, 21(2), 155-172.

Jobin, A., Ienca, M., & Vayena, E. (2019). The global landscape of AI ethics guidelines. Nature Machine Intelligence, 1(9), 389-399.

Kaya, F., Aydin, F., Schepman, A., Rodway, P., Yetişensoy, O., & Demir Kaya, M. (2024). The roles of personality traits, AI anxiety, and demographic factors in attitudes toward artificial intelligence. International Journal of Human-Computer Interaction, 40(2), 497-514.

Kim, B. J., & Lee, J. (2024). The mental health implications of artificial intelligence adoption: The crucial role of self-efficacy. Humanities and Social Sciences Communications, 11(1), Article 1561.

Laitinen, A., & Sahlgren, O. (2021). AI systems and respect for human autonomy. Frontiers in Artificial Intelligence, 4, Article 705164.

Ohagi, M. (2024). Polarization of autonomous generative AI agents under echo chambers. In Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis (pp. 112-124). Association for Computational Linguistics.

Oswald, F. L., Tippins, N. T., & McPhail, S. M. (2022). Scientific, legal, and ethical concerns about AI-based personnel selection tools: A call to action. Personnel Assessment and Decisions, 7(2), Article 1.

Rogers, C. R. (1959). A theory of therapy, personality, and interpersonal relationships as developed in the client-centered framework. In S. Koch (Ed.), Psychology: A study of a science: Vol. 3. Formulations of the person and the social context (pp. 184-256). McGraw-Hill.

Russell, S., Dewey, D., & Tegmark, M. (2015). Research priorities for robust and beneficial artificial intelligence. AI Magazine, 36(4), 105-114.

Russell, S. (2017). Provably beneficial artificial intelligence. In Exponential Life (pp. 23-36). BBVA.

Russell, S. (2019). Human compatible: Artificial intelligence and the problem of control. Viking.

Russell, S. (2021). The Reith Lectures: Living with artificial intelligence. BBC.

Russell, S., & Norvig, P. (2020). Artificial intelligence: A modern approach (4th ed.). Pearson.

Ryan, W. S., & Ryan, R. M. (2019). Toward a social psychology of authenticity: Exploring within-person variation in autonomy, congruence, and genuineness using self-determination theory. Review of General Psychology, 23(1), 99-112.

Twenge, J. M., & Campbell, W. K. (2018). Associations between screen time and lower psychological well-being among children and adolescents: Evidence from a population-based study. Preventive Medicine Reports, 12, 271-283.

Del Vicario, M., Vivaldo, G., Bessi, A., Zollo, F., Scala, A., Caldarelli, G., & Quattrociocchi, W. (2016). Echo chambers: Emotional contagion and group polarization on Facebook. Scientific Reports, 6, Article 37825.

Zhang, S., Zhao, X., Zhou, T., Li, Y., & Liu, Q. (2024). Do you have AI dependency? The roles of academic self-efficacy, academic stress, and performance expectations on problematic AI usage behavior. International Journal of Educational Technology in Higher Education, 21, Article 34.