AF - [Aspiration-based designs] 3. Performance and safety criteria, and aspiration intervals by Jobst Heitzig

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Aspiration-based designs] 3. Performance and safety criteria, and aspiration intervals, published by Jobst Heitzig on April 28, 2024 on The AI Alignment Forum.
Summary. In this post, we extend the basic algorithm by adding criteria for choosing the two candidate actions the algorithm mixes, and by generalizing the goal from making the expected Total equal a particular value to making it fall into a particular interval. We only use simple illustrative examples of performance and safety criteria and reserve the discussion of more useful criteria for later posts.
Introduction: using the gained freedom to increase safety
Having introduced the basic structure of our decision algorithms in the last post, we will now focus on the core question: How shall we make use of the freedom gained from having aspiration-type goals rather than maximization goals?
After all, while there is typically only a single policy that maximizes some objective function (or very few, more or less equivalent policies), there is typically a much larger set of policies that fulfill some constraints (such as the aspiration to make the expected Total equal some desired value).
More formally: Let us think of the space of all (probabilistic) policies, Π, as a compact convex subset of a high-dimensional vector space with dimension d ≫ 1 and Lebesgue measure μ. Let us call a policy π ∈ Π successful iff it fulfills the specified goal, G, and let Π_G ⊆ Π be the set of successful policies. Then this set typically has zero measure, μ(Π_G) = 0, and low dimension, dim(Π_G) ≪ d, if the goal is a maximization goal, but it has large dimension, dim(Π_G) ≈ d, for most aspiration-type goals.
E.g., if the goal is to make expected Total equal an aspiration value, E[τ] = E, we typically have dim(Π_G) = d − 1 but still μ(Π_G) = 0. At the end of this post, we discuss how the set of successful policies can be further enlarged by switching from aspiration values to aspiration intervals to encode goals, which makes the set have full dimension, dim(Π_G) = d, and positive measure, μ(Π_G) > 0.
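To make the dimension-and-measure claim concrete, here is a minimal numeric sketch (our own illustration, not from the original post). It assumes a single state with three actions, so a policy is a point in the 2-simplex (d = 2), and the expected Totals in q are made-up values: an exact aspiration value carves out a (d − 1)-dimensional slice that uniformly sampled policies essentially never hit, while an aspiration interval captures a positive fraction of them.

```python
import numpy as np

rng = np.random.default_rng(0)
q = np.array([0.0, 1.0, 3.0])  # hypothetical expected Total of each action

# Sample policies uniformly from the 2-simplex via a symmetric Dirichlet.
policies = rng.dirichlet(np.ones(3), size=100_000)
expected_total = policies @ q  # E[tau] under each sampled policy

# Exact aspiration E[tau] = 1.5: a 1-dimensional slice of the 2-simplex,
# hence measure zero -- essentially no random sample hits it exactly.
exact_hits = np.isclose(expected_total, 1.5, atol=1e-12).mean()

# Aspiration interval E[tau] in [1, 2]: full-dimensional, positive measure.
interval_hits = ((expected_total >= 1.0) & (expected_total <= 2.0)).mean()

print(f"fraction meeting E[tau] = 1.5 exactly: {exact_hits:.4f}")  # ~0.0
print(f"fraction meeting E[tau] in [1, 2]:    {interval_hits:.4f}")  # ~0.5
```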
What does that mean? It means we have a lot of freedom to choose the actual policy π ∈ Π_G that the agent should use to fulfill an aspiration-type goal. We can try to use this freedom to choose policies that promise to be safe rather than unsafe according to some generic safety metric, similar to the impact metrics used in reward function regularization for maximizers.
Depending on the type of goal, we might also want to use this freedom to choose policies that fulfill the goal in a desirable rather than undesirable way according to some goal-related performance metric.
In this post, we will illustrate this with only a few "toy" safety metrics and one rather simple goal-related performance metric, to exemplify how such metrics might be used in our framework. In a later post, we will then discuss more sophisticated and hopefully more useful safety metrics.
Let us begin with a simple goal-related performance metric since that is the most straightforward.
Simple example of a goal-related performance metric
Recall that in step 2 of the basic algorithm, we could make the agent pick any action a⁻ whose action-aspiration is at most as large as the current state-aspiration, E(s, a⁻) ≤ E(s), and it can also pick any other action, a⁺, whose action-aspiration is at least as large as the current state-aspiration, E(s, a⁺) ≥ E(s).
This flexibility is because in steps 3 and 4 of the algorithm, the agent is still able to randomize between these two actions a⁻, a⁺ in a way that makes the expected Total, E[τ], become exactly E(s).
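As a sketch of this randomization step (our own illustration, not code from the post), the probability p of picking the under-shooting action a⁻ follows from solving p · E(s, a⁻) + (1 − p) · E(s, a⁺) = E(s) for p; the function name and the example numbers below are hypothetical.

```python
def mixing_probability(E_s: float, E_minus: float, E_plus: float) -> float:
    """Probability of picking the under-shooting action a-.

    Solves p * E_minus + (1 - p) * E_plus = E_s for p, assuming
    E_minus <= E_s <= E_plus as required by step 2 of the algorithm.
    """
    if E_plus == E_minus:  # both action-aspirations already equal E(s)
        return 1.0
    return (E_plus - E_s) / (E_plus - E_minus)

# Example: state-aspiration 4.0, candidate action-aspirations 3.0 and 6.0.
p = mixing_probability(4.0, 3.0, 6.0)  # = 2/3
assert abs(p * 3.0 + (1 - p) * 6.0 - 4.0) < 1e-12  # mix hits E(s) exactly
```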
If one had an optimization mindset, one might immediately get the idea to not only match the desired expectation for the Total, but also to minimize the variability of the Total, as measured by some suitable statistic such as its variance. In a sequential decision makin...