#ai

In the field of artificial intelligence (AI), alignment aims to steer AI systems toward a person's or group's intended goals, preferences, or ethical principles. An AI system is considered aligned if it advances the intended objectives. A misaligned AI system pursues unintended objectives.
It is often challenging for AI designers to align an AI system because it is difficult for them to specify the full range of desired and undesired behaviors. Therefore, AI designers often use simpler proxy goals, such as gaining human approval. But proxy goals can overlook necessary constraints or reward the AI system for merely appearing aligned. AI systems may also find loopholes that allow them to accomplish their proxy goals efficiently but in unintended, sometimes harmful, ways (reward hacking).
Advanced AI systems may develop unwanted instrumental strategies, such as seeking power or survival because such strategies help them achieve their assigned final goals. Furthermore, they might develop undesirable emergent goals that could be hard to detect before the system is deployed and encounters new situations and data distributions. Empirical research showed in 2024 that advanced large language models (LLMs) such as OpenAI o1 or Claude 3 sometimes engage in strategic deception to achieve their goals or prevent them from being changed.
Today, some of these issues affect existing commercial systems such as LLMs, robots, autonomous vehicles, and social media recommendation engines. Some AI researchers argue that more capable future systems will be more severely affected because these problems partially result from high capabilities.
Many prominent AI researchers and the leadership of major AI companies have argued or asserted that AI is approaching human-like (AGI) and superhuman cognitive capabilities (ASI), and could endanger human civilization if misaligned. These include "AI Godfathers" Geoffrey Hinton and Yoshua Bengio and the CEOs of OpenAI, Anthropic, and Google DeepMind. These risks remain debated.
AI alignment is a subfield of AI safety, the study of how to build safe AI systems. Other subfields of AI safety include robustness, monitoring, and capability control. Research challenges in alignment include instilling complex values in AI, developing honest AI, scalable oversight, auditing and interpreting AI models, and preventing emergent AI behaviors like power-seeking. Alignment research has connections to interpretability research, (adversarial) robustness, anomaly detection, calibrated uncertainty, formal verification, preference learning, safety-critical engineering, game theory, algorithmic fairness, and social sciences.

View More On Wikipedia.org
  1. Mwl.RCT

    Reasoning AI vs GPT: Choose the RIGHT Model!

    Choosing the Right AI Model for Strategic Tasks Businesses today face the challenge of selecting the optimal AI model for tasks ranging from simple automation to complex strategic decision-making. OpenAI offers a powerful toolkit with two distinct model families: reasoning models (like o1 and...
Back
Top Bottom