import tensorflow as tf

x = tf.ones((2, 2))
with tf.GradientTape() as t:
    t.watch(x)
    y = tf.reduce_sum(x)
    z = tf.multiply(y, y)

# Derivative of z with respect to the original input tensor x
dz_dx = t.gradient(z, x)
dz_dx_0_0 = dz_dx[0][0].numpy()  # 8.0 (z = y^2, so dz/dx_0_0 = 2*y = 8 at y = 4)
dz_dx_0_1 = dz_dx[0][1].numpy()  # 8.0
dz_dx_1_0 = dz_dx[1][0].numpy()  # 8.0
dz_dx_1_1 = dz_dx[1][1].numpy()  # 8.0
12 Advanced TensorFlow for Reinforcement Learning
12.1 Eager Execution in TensorFlow 2.x
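The notes give no example here; the following is a minimal sketch (the tensor values are illustrative) of eager execution, which is enabled by default in TensorFlow 2.x: operations are evaluated immediately and return concrete values instead of building a static graph first.

import tensorflow as tf

print(tf.executing_eagerly())  # True by default in TensorFlow 2.x

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.matmul(a, a)            # evaluated immediately, no session needed
print(b.numpy())               # [[ 7. 10.] [15. 22.]]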
12.2 Computing (Customised) Gradients
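As an illustration of a customised gradient (not taken from the notes), the sketch below uses \(\texttt{tf.custom\_gradient}\): the forward pass computes \(x^2\), while the backward pass clips the analytical gradient \(2x\) to \([-1, 1]\), a purely hypothetical choice.

import tensorflow as tf

@tf.custom_gradient
def clipped_square(x):
    def grad(dy):
        # Custom backward pass: clip the analytical gradient 2*x to [-1, 1].
        return dy * tf.clip_by_value(2.0 * x, -1.0, 1.0)
    return tf.square(x), grad

x = tf.constant(3.0)
with tf.GradientTape() as t:
    t.watch(x)
    y = clipped_square(x)
print(t.gradient(y, x).numpy())  # 1.0 instead of the unclipped 6.0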
12.3 Gradient Tapes
TensorFlow provides the \(\texttt{tf.GradientTape}\) API for automatic differentiation, i.e., computing the gradient of a computation with respect to its input variables. TensorFlow records all operations executed inside the context of a \(\texttt{tf.GradientTape}\) onto a tape. TensorFlow then uses that tape, together with the gradients associated with each recorded operation, to compute the gradients of the outputs with respect to the inputs and intermediate values computed during the recorded \(\texttt{tf.GradientTape}\) context.
By default, the resources held by a \(\texttt{GradientTape}\) are released as soon as the \(\texttt{GradientTape.gradient()}\) method is called. To compute multiple gradients over the same computation, it is necessary to create a persistent gradient tape. This allows multiple calls to the \(\texttt{gradient()}\) method; resources are released when the tape object is garbage collected.
x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as t:
    t.watch(x)
    y = x * x
    z = y * y

dz_dx = t.gradient(z, x)  # 108.0 (4*x^3 at x = 3)
dy_dx = t.gradient(y, x)  # 6.0 (2*x at x = 3)
del t  # remove the reference to the tape and allow garbage collection
12.4 \(\texttt{with... as}\) construct in Python
When the \(\texttt{with}\) statement is executed, Python evaluates the expression, calls the \(\texttt{\_\_enter\_\_}\) method on the resulting value (which is called a context guard), and assigns the object returned by \(\texttt{\_\_enter\_\_}\) to the variable given by \(\texttt{as}\). Python then executes the body of the code. In any case, even if an exception is raised in the body, the \(\texttt{\_\_exit\_\_}\) method of the guard object is executed.
class guarded_execution:
    def __enter__(self):
        # <initialisation>
        return p  # p is the object created in <initialisation>

    def __exit__(self, type, value, traceback):
        # <free resources and manage exceptions>
        ...

with guarded_execution() as p:
    # <some instructions>
    ...
with open("textfile.txt") as f:
= f.read()
data <work with data>
\(\blacktriangleright\) See: https://effbot.org/zone/python-with-statement.html
12.5 \(\texttt{tf.function}\)
\(\texttt{tf.function}\) makes it possible to transform a subset of Python syntax into portable, high-performance TensorFlow graphs, the components that run “under the hood” of TensorFlow. It is possible to write “graph code” using natural Python syntax.
- This topic is outside the scope of this module. You can find the definition of the language and more details in the TensorFlow documentation.
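Although the details are out of scope, the following is a minimal sketch of typical usage (the function name and its inputs are illustrative assumptions): decorating a Python function with \(\texttt{@tf.function}\) traces it into a TensorFlow graph the first time it is called.

import tensorflow as tf

@tf.function
def squared_sum(a, b):
    # Traced into a graph on the first call with these input types.
    return tf.reduce_sum(a * a + b * b)

x = tf.constant([1.0, 2.0])
y = tf.constant([3.0, 4.0])
print(squared_sum(x, y).numpy())  # 30.0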
12.6 Custom Training Loops with Keras Models
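The notes leave this section empty; the following is a minimal sketch of a custom training loop, assuming a toy Keras regression model and synthetic data (both are illustrative, not the module's own example). The loop combines \(\texttt{tf.GradientTape}\) with a manual application of the gradients by the optimizer.

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
loss_fn = tf.keras.losses.MeanSquaredError()

# Toy data for y = 2x (illustrative only).
xs = tf.constant([[0.0], [1.0], [2.0], [3.0]])
ys = tf.constant([[0.0], [2.0], [4.0], [6.0]])

for step in range(200):
    with tf.GradientTape() as tape:
        predictions = model(xs, training=True)
        loss = loss_fn(ys, predictions)
    # Gradients of the loss w.r.t. the trainable weights, applied manually.
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))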
12.7 TensorFlow \(\texttt{tf.data}\) API
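No example is given in the notes; the sketch below builds a simple \(\texttt{tf.data}\) input pipeline over randomly generated in-memory data (shapes, batch size, and buffer size are arbitrary assumptions).

import tensorflow as tf

features = tf.random.uniform((100, 4))
labels = tf.random.uniform((100,), maxval=2, dtype=tf.int32)

dataset = (tf.data.Dataset.from_tensor_slices((features, labels))
           .shuffle(buffer_size=100)     # shuffle the 100 examples
           .batch(16)                    # group them into batches of 16
           .prefetch(tf.data.AUTOTUNE))  # overlap preprocessing and training

for batch_features, batch_labels in dataset.take(1):
    print(batch_features.shape, batch_labels.shape)  # (16, 4) (16,)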
12.8 Lambda Layers
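Again no example is given; a minimal sketch follows, in which a \(\texttt{Lambda}\) layer wraps an arbitrary expression (here a hypothetical pixel-scaling operation) as a Keras layer.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Lambda(lambda x: x / 255.0),  # scale pixels to [0, 1]
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

images = tf.random.uniform((8, 28, 28), maxval=255.0)
print(model(images).shape)  # (8, 10)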
12.9 TF-Agents Library
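The notes give no code here; the sketch below follows the style of the official TF-Agents DQN tutorial (environment name, layer sizes, and learning rate are assumptions) and shows how an environment, a Q-network, and a DQN agent fit together.

import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import q_network

# Load a Gym environment and wrap it as a TensorFlow environment.
py_env = suite_gym.load("CartPole-v0")
tf_env = tf_py_environment.TFPyEnvironment(py_env)

# A small Q-network and a DQN agent built on top of it.
q_net = q_network.QNetwork(
    tf_env.observation_spec(),
    tf_env.action_spec(),
    fc_layer_params=(64, 64))

agent = dqn_agent.DqnAgent(
    tf_env.time_step_spec(),
    tf_env.action_spec(),
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))
agent.initialize()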
12.10 Training Architecture
Note: TF-Agents is not always a good fit. Scaling up is important.
12.11 OpenAI Gym Environment
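A minimal sketch of the Gym interaction loop with a random policy (using the pre-0.26 API, where step() returns a 4-tuple; newer Gym versions return five values and a different reset() signature):

import gym

env = gym.make("CartPole-v1")
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()          # random policy
    obs, reward, done, info = env.step(action)  # old-style 4-tuple
    total_reward += reward
env.close()
print(total_reward)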
12.13 Implementing Atari Games
- Frame skipping: the agent only sees (and acts on) one out of every few frames, typically one in four; the selected action is repeated on the skipped frames (see the sketch below).
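As an illustration of frame skipping, here is a hypothetical wrapper (not the one shipped with Gym or TF-Agents, and again using the pre-0.26 Gym API) that repeats each action and accumulates the reward.

import gym

class FrameSkip(gym.Wrapper):
    """Repeat the chosen action for `skip` frames and sum the rewards."""
    def __init__(self, env, skip=4):
        super().__init__(env)
        self.skip = skip

    def step(self, action):
        total_reward, done, info = 0.0, False, {}
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info

env = FrameSkip(gym.make("CartPole-v1"), skip=4)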
12.14 Collect Driver
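A sketch in the TF-Agents tutorial style, continuing the hypothetical tf_env and agent from the sketch in Section 12.9; the observer here is just an environment-step counter, whereas in practice the replay buffer's add_batch method (next section) is also passed.

from tf_agents.drivers import dynamic_step_driver
from tf_agents.metrics import tf_metrics

env_steps = tf_metrics.EnvironmentSteps()

collect_driver = dynamic_step_driver.DynamicStepDriver(
    tf_env,                   # from the Section 12.9 sketch
    agent.collect_policy,     # exploration policy of the DQN agent
    observers=[env_steps],    # callables that receive each collected trajectory
    num_steps=1)              # environment steps collected per run()

# Each call collects num_steps steps and feeds them to the observers.
collect_driver.run()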
12.15 Initialising the Replay Buffer
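A sketch in the same tutorial style, again continuing the hypothetical agent and tf_env from Section 12.9 (buffer size and sampling parameters are assumptions).

from tf_agents.replay_buffers import tf_uniform_replay_buffer

replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=agent.collect_data_spec,  # spec of the collected trajectories
    batch_size=tf_env.batch_size,
    max_length=100000)

# replay_buffer.add_batch can be passed to the collect driver's observers;
# stored trajectories are later sampled as a tf.data dataset for training.
dataset = replay_buffer.as_dataset(
    sample_batch_size=64,
    num_steps=2,
    num_parallel_calls=3).prefetch(3)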
12.16 Extensions
Inside these agents there are many predefined components, such as:
- Implementation of policy networks (e.g., DQN) and Actor-Critic methods
- Advantage Actor-Critic (A2C)
- Asynchronous Advantage Actor-Critic (A3C)
- Soft Actor-Critic (SAC)
- Trust Region Policy Optimization (TRPO, available in TF-Agents)
- Proximal Policy Optimization (PPO, available in TF-Agents)
12.17 References
Géron, A. (2022). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd edition. O'Reilly Media.