Paul Christiano's AI control blog
This is a blog by Paul Christiano.
Speculations on the design of safe, efficient AI systems. Supported by a grant from the Future of Life Institute.
Original source: https://medium.com/ai-control
Children:
- Stable self-improvement as an AI safety problem
- Approval directed agents
- Approval-directed bootstrapping
- Optimization and goals
- Adversarial collaboration
- Implementing our considered judgment
- Human in counterfactual loop
- Automated assistants
- Delegating to a mixed crowd
- Learning and logic
- The steering problem
- Humans consulting HCH
- Implicit consequentialism
- Scalable AI control
- Reinforcement learning and linguistic convention
- On heterogeneous objectives
- Safe AI from question-answering
- Problem: safe AI from episodic RL
- Challenges for safe AI from RL
- The state of the steering problem
- The easy goal inference problem is still hard
- Reward engineering
- Handling adversarial errors
- Learn policies or goals?
- In defense of maximization
- Against mimicry
- Mimicry and meeting halfway
- Apprenticeship learning and mimicry
- Human arguments and AI control
- How common is imitation?
- Efficient feedback
- A possible stance for AI control research
- Unsupervised learning and AI control
- Act based agents
- Abstract approval-direction
- Synthesizing training data
- Indirect decision theory
- Optimizing with comparisons
- Modeling AI control with humans
- Of simulations and inductive definitions
- The absentee billionaire