AF - [Aspiration-based designs] 2. Formal framework, basic algorithm by Jobst Heitzig

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Aspiration-based designs] 2. Formal framework, basic algorithm, published by Jobst Heitzig on April 28, 2024 on The AI Alignment Forum.
Summary. In this post, we present the formal framework we adopt throughout the sequence, and the simplest form of the type of aspiration-based algorithms we study. We do this for a simple form of aspiration-type goals: making the expectation of some variable equal to some given target value. The algorithm is based on the idea of propagating aspirations along time, and we prove that the algorithm gives a performance guarantee if the goal is feasible.
Later posts discuss safety criteria, other types of goals, and variants of the basic algorithm.
Assumptions
In line with the working hypotheses stated in the previous post, we assume more specifically the following in this post:
The agent is a general-purpose AI system that is given a potentially long sequence of tasks, one by one, which it does not know in advance. Most aspects of what we discuss focus on the current task only, but some aspects relate to the fact that there will be further, unknown tasks later (e.g., the question of how much power the agent shall aim to retain at the end of the task).
It possesses an overall world model that represents a good enough general understanding of how the world works.
Whenever the agent is given a task, an episode begins and its overall world model provides it with a (potentially much simpler) task-specific world model. That task-specific model represents everything that is relevant for the time period until the agent gets a different task or is deactivated, and it can be used to predict the potentially stochastic consequences of taking certain actions in certain world states.
That task-specific world model has the form of a (fully observed) Markov Decision Process (MDP) that, however, does not contain a reward function R but instead contains what we call an evaluation function related to the task (see the later bullet point on the evaluation function).
As a consequence of a state transition, i.e., of taking a certain action a in a certain state s and finding itself in a certain successor state s', a certain task-relevant evaluation metric changes by some amount. Importantly, we do not assume that the evaluation metric inherently encodes things of which more is better. E.g., the evaluation metric could be the global mean temperature, a client's body mass, the x coordinate of the agent's right thumb, etc.
We call the step-wise change in the evaluation metric the received Delta in that time step, denoted δ. We call its cumulative sum over all time steps of the episode the Total, denoted τ. Formally, Delta and Total play a similar role for our aspiration-based approach as the concepts of "reward" and "return" play for maximization-based approaches.
The crucial difference is that our agent is not tasked to maximize Total (since the evaluation metric does not have the interpretation of "more is better") but to aim for some specific value of the Total.
The evaluation function contained in the MDP specifies the expected value of δ for all possible transitions: Eδ(s,a,s').[1]
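To make this setup concrete, here is a minimal Python sketch of a task-specific world model of this kind and of an episode roll-out that accumulates the received Deltas into the Total. The names (TaskMDP, expected_delta, roll_out) are illustrative rather than taken from the post, and the sketch simplifies by treating the expected Delta of each sampled transition as the received Delta.

```python
# Minimal sketch of the setup described above (illustrative names, not from the
# post): a task-specific world model in MDP form that carries an evaluation
# function E[delta | s, a, s'] instead of a reward function, plus an episode
# roll-out that sums the received Deltas into the Total.
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = str
Action = str

@dataclass
class TaskMDP:
    # transition(s, a) -> list of (successor state s', probability) pairs
    transition: Callable[[State, Action], List[Tuple[State, float]]]
    # expected_delta(s, a, s') -> expected step-wise change of the evaluation
    # metric; this is not a reward, and "more" is not assumed to be "better"
    expected_delta: Callable[[State, Action, State], float]
    actions: Callable[[State], List[Action]]
    is_terminal: Callable[[State], bool]

def roll_out(mdp: TaskMDP, policy: Callable[[State], Action], s: State) -> float:
    """Run one episode and return the Total (cumulative sum of received Deltas).

    For simplicity, the expected Delta of the sampled transition is used as the
    received Delta; in general the realized Delta may be stochastic with that
    expectation.
    """
    total = 0.0
    while not mdp.is_terminal(s):
        a = policy(s)
        outcomes = mdp.transition(s, a)
        successors = [s2 for s2, _ in outcomes]
        probabilities = [p for _, p in outcomes]
        s_next = random.choices(successors, weights=probabilities)[0]
        total += mdp.expected_delta(s, a, s_next)  # received Delta this step
        s = s_next
    return total
```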
First challenge: guaranteeing the fulfillment of expectation-type goals
The challenge in this post is to design a decision algorithm for tasks where the agent's goal is to make the expected (!) Total equal (!) a certain value E, which we call the aspiration value.[2] This is a crucial difference from a "satisficing" approach that would aim to make the expected Total at least as large as E and would thus still be happy to maximize the Total. Later we consider other types of tasks, both less restrictive ones (including those related to satisficing) and more specific ones that also care about other aspects of the resulting distribution of the Total or of states.
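As a toy illustration of why such a goal can be met exactly rather than only bounded from below, suppose the agent faces a single choice between two candidate actions whose attainable expected Totals bracket the aspiration value E. Randomizing between them with a suitable probability makes the expected Total equal E exactly. The sketch below shows only this mixing step; it is not the algorithm developed in the post, and the function name is illustrative.

```python
# Toy illustration of hitting an expectation-type goal exactly (only the
# underlying idea, not the post's actual algorithm): if the aspiration E lies
# between the expected Totals attainable by a "low" and a "high" action,
# randomizing between them with the right probability makes the expected Total
# equal E exactly.
import random

def low_action_probability(v_low: float, v_high: float, aspiration: float) -> float:
    """Probability p of taking the low action such that
    p * v_low + (1 - p) * v_high == aspiration (requires v_low <= aspiration <= v_high)."""
    if not (v_low <= aspiration <= v_high):
        raise ValueError("aspiration is not feasible for this pair of actions")
    if v_high == v_low:
        return 1.0  # both actions already meet the aspiration exactly
    return (v_high - aspiration) / (v_high - v_low)

# Example: expected Totals 2.0 and 10.0, aspiration 5.0 -> take the low action
# with probability 5/8, giving an expected Total of exactly 5.0.
p = low_action_probability(2.0, 10.0, 5.0)
chosen = "low" if random.random() < p else "high"
```

In the multi-step setting, the summary's idea of propagating aspirations along time enters: after each step, the aspiration for the remainder of the episode is updated (for instance, in the simplest case, by subtracting the received Delta) so that the overall expectation stays on target.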
It turns out that we can guarantee the fulfillment of this type of goal under some weak condit...