
Programme of the 22nd Euro AD Workshop
Monday, July 1, 2019
 8:30–9:00 Coffee
 9:00–12:00 Morning Session
 Andrea Walther (Universität Paderborn)
AD for nonsmooth functions
If AD is applied in a pure black-box fashion, the user might not notice
that the function to be differentiated is nonsmooth. In this talk,
the handling of nonsmooth functions by AD tools like ADOL-C will be discussed. Furthermore, we will present an approach to generate a local model for the nonsmooth function based on abs-linearization. Numerical results illustrate the benefits of this technique.
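As a rough illustration of the idea (a sketch, not the ADOL-C implementation; all names are hypothetical), abs-linearization keeps every absolute value call exact while replacing the smooth operations around it by their tangent linearization at the reference point. A minimal operator-overloading version in Python:

```python
class PL:
    """Pair of (value at x0, piecewise-linear increment model dx -> delta)."""
    def __init__(self, val, model):
        self.val = val        # value at the reference point x0
        self.model = model    # callable dx -> piecewise-linear increment

    def __add__(self, other):
        other = lift(other)
        return PL(self.val + other.val,
                  lambda dx, a=self, b=other: a.model(dx) + b.model(dx))
    __radd__ = __add__

    def __sub__(self, other):
        other = lift(other)
        return PL(self.val - other.val,
                  lambda dx, a=self, b=other: a.model(dx) - b.model(dx))

    def __mul__(self, other):
        other = lift(other)
        # smooth operation: keep only the first-order (tangent) increment
        return PL(self.val * other.val,
                  lambda dx, a=self, b=other:
                      a.val * b.model(dx) + b.val * a.model(dx))
    __rmul__ = __mul__

def lift(v):
    return v if isinstance(v, PL) else PL(v, lambda dx: 0.0)

def pl_abs(a):
    # nonsmooth operation: abs is kept exact on the linearized argument
    return PL(abs(a.val), lambda dx, a=a: abs(a.val + a.model(dx)) - abs(a.val))

def abs_linearize(f, x0):
    """Return the local piecewise-linear model dx -> f(x0) + Delta f(dx)."""
    out = f(PL(x0, lambda dx: dx))
    return lambda dx: out.val + out.model(dx)
```

For f(x) = |x*x - 1| at x0 = 0.5, this yields the local model dx -> |dx - 0.75|, which matches f at the reference point and retains the kink that a plain linearization would lose.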
 Timo Kreimeier (Universität Paderborn)
Using abs-linearization to determine local minima of piecewise linear constrained programs
The development of algorithms to optimize functions of a certain class is an essential part of applied mathematics and of central importance for industrial applications.
When using gradient-based optimization methods, derivative information is used to find a local optimum.
However, piecewise linear functions have areas where the function is not differentiable. The non-differentiability is often caused by an operator like the absolute value function.
To solve a piecewise linear constrained optimization problem, the function and the constraints are rewritten in a matrix-vector based representation, the so-called abs-normal form.
Certain elements of the AbsNormal form are computed using algorithmic differentiation.
In the talk, the algorithm is introduced and first numerical results are presented.
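For illustration only (a toy sketch with hypothetical names; the talk's exact formulation may differ): in an abs-normal form, switching variables z and outputs y satisfy z = c + Z x + L |z| and y = b + J x + Y |z|, where L is strictly lower triangular, so z can be computed by forward substitution:

```python
def eval_abs_normal_form(x, c, Z, L, b, J, Y):
    """Evaluate an abs-normal form:
         z = c + Z x + L |z|,   y = b + J x + Y |z|.
       L is strictly lower triangular, so each z[i] depends only on
       |z[0..i-1]| and is obtained by forward substitution."""
    dot = lambda row, v: sum(r * w for r, w in zip(row, v))
    z = []
    for i in range(len(c)):
        # zip truncates L[i] to the i entries of z computed so far;
        # the remaining entries are zero by strict lower triangularity
        z.append(c[i] + dot(Z[i], x) + dot(L[i], [abs(t) for t in z]))
    az = [abs(t) for t in z]
    y = [b[k] + dot(J[k], x) + dot(Y[k], az) for k in range(len(b))]
    return y, z
```

For example, max(x1, x2) = 0.5*x1 + 0.5*x2 + 0.5*|x1 - x2| is an abs-normal form with one switching variable z = x1 - x2.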
 Torsten Bosse (Friedrich-Schiller-Universität Jena)
Localization of ANFs
It was observed that the size of the algebraic representation for piecewise linear approximations is proportional to the number of absolute value function calls in the evaluation procedure of the underlying function. Hence, the algebraic representation, called ANF, can easily become very large, even for functions with a very simple structure. To reduce the size and, thus, the numerical effort, we propose the idea of computing a local piecewise linearization. These local piecewise linearizations coincide with the original piecewise linear approximation only over a neighborhood but can be significantly smaller.
In detail, we propose two methods: first, a simple heuristic to identify which parts of the ANF can be considered affine/linear over the neighborhood and which ones require a piecewise linear representation; second, a method that uses this information to compute a local approximation from the original ANF.
We report the results on several test problems that not only show the benefits but also the limitations of the proposed approach.
 Navjot Kukreja (Imperial College London)
Automatic Differentiation in the Devito Domain-Specific Language
Devito is a domain-specific language (DSL) and code generation framework for the design of highly optimised finite difference kernels for use in inversion methods. Devito utilises SymPy to allow the definition of operators from high-level symbolic equations and generates optimised, automatically tuned, parallelised and vectorised code specific to a given target architecture.
We are integrating Automatic Differentiation and checkpointing into the Devito compiler to enable automated generation of discretely consistent adjoint PDE solvers that match the computational efficiency of the primal solvers.
 12:00–13:30 Lunch Break
 13:30–17:15 Afternoon Session
 Mike Innes (Julia Computing)
Zygote: Bridging ML and Scientific Computing
AD has a split personality. While scientific applications of AD have had decades to mature, the techniques were recently rediscovered in machine learning, which has built its own parallel universe of tooling with a very different set of design considerations and constraints. Can these two worlds be combined?
We present Zygote, an AD system which attempts to meet the needs of both the ML and scientific communities in one tool, building on work by the AD community. We'll assess Zygote's current design and implementation, future challenges in building general-purpose AD, and some of the exciting applications enabled by the merging of ML with traditional scientific modelling.
 Max Sagebaum (Chair for Scientific Computing, TU Kaiserslautern, Bldg/Geb 34, Paul-Ehrlich-Strasse, 67663 Kaiserslautern, Germany)
Validation techniques for operator overloading AD
This talk will explore validation techniques for programs that are differentiated with operator overloading AD tools. Common problems that can lead to faulty tapes are analyzed, and detection methods are presented.
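One standard consistency check in this area (shown here generically, not as the talk's specific methods) is the dot-product test: for tangent and adjoint models of the same function, the inner product of the output adjoint with the tangent output must equal the inner product of the input adjoint with the tangent input, up to round-off. A hedged sketch with hand-written models of a toy function:

```python
import math

def f_tangent(x, xdot):
    # hand-written tangent (forward) model of f(x) = (x0*x1, sin(x0))
    y = (x[0] * x[1], math.sin(x[0]))
    ydot = (xdot[0] * x[1] + x[0] * xdot[1], math.cos(x[0]) * xdot[0])
    return y, ydot

def f_adjoint(x, ybar):
    # hand-written adjoint (reverse) model of the same function
    xbar = (ybar[0] * x[1] + ybar[1] * math.cos(x[0]), ybar[0] * x[0])
    return xbar

def dot_product_test(x, xdot, ybar, tol=1e-12):
    # consistency: <ybar, J xdot> must equal <J^T ybar, xdot>
    _, ydot = f_tangent(x, xdot)
    xbar = f_adjoint(x, ybar)
    lhs = sum(a * b for a, b in zip(ybar, ydot))
    rhs = sum(a * b for a, b in zip(xbar, xdot))
    return abs(lhs - rhs) <= tol * max(1.0, abs(lhs))
```

A faulty tape (e.g. a missing adjoint statement) typically breaks this identity for almost any choice of seed vectors, which makes the test a cheap detector.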
 Patrick Peltzer (STCE, RWTH Aachen University, Germany)
Nesting Template Libraries: Notes on Coupling Eigen and dco/c++
The C++ template header library Eigen offers a wide range of features for linear algebra computations and is used in numerous scientific fields. While it comes with basic support for forward AD, dedicated AD tools offer more features and advantages, but also pose challenges in combination with Eigen.
In this talk, the potential of incorporating an AD tool into Eigen is presented, using dco/c++ as an example tool. Besides general performance improvements, details on the implementation of symbolic linear algebra operations are given for the matrix-matrix product as well as for dense linear system solvers. The impact on performance is presented based on benchmark results.
 Simon Lukas Maertens (RWTH Aachen University (STCE), Germany)
Derivative Code by Overloading in MATLAB
We present our attempt to provide Algorithmic Differentiation (AD) in MATLAB. By using the MATLAB MEX API, we created an interface between the overloading tool dco/c++ and MATLAB, which enables us to calculate derivatives of arbitrary order in forward as well as reverse mode.
We support all numeric types that are available in MATLAB including complex numbers as well as all casts between those types. The derivatives calculated with our tool were validated against finite differences for over 200 common MATLAB functions.
As a first case study, we used dco/MATLAB in combination with the MATLAB Optimization Toolbox to create an AD-enabled optimization framework.
 Jens-Dominik Mueller (Queen Mary University of London)
Lessons from differentiating barely stable PDE solvers
Many fields in scientific computing have established iterative algorithms that trade efficiency for robustness. A well-known example is the segregated pressure-correction scheme SIMPLE, the workhorse method for discretising the incompressible Navier-Stokes equations. The approach is very memory-efficient, solving a linear equation for one system variable at a time, but is known to converge very poorly for large grids or complex flows. Similar problems arise where strongly coupled multiphysics systems are solved one discipline at a time.
The presentation will tell the story of applying source-transformation AD to a commercial incompressible flow solver, taking a slightly historical detour to show how far we have come, from cumbersome scripting with earlier versions of Tapenade to quite robust use today, and will demonstrate the exactness of the produced derivative code.
The talk will then focus on the stability of the adjoint code,
demonstrating how the skeletons in the SIMPLE closet are coming out in
reverse, but that all is not lost.
 Dominic Jones (University of Buckingham)
Compile time differentiation in C++
This technology was presented in Oxford in 2016 through work done at NAG and RWTH Aachen. However, details of an implementation were omitted (for good reasons).
Here, I wish to present one way of approaching compile-time adjoint differentiation, working through the essential details of my own implementation. The approach lacks many of the qualities of the NAG implementation, though in one detail it goes further: the capacity to have non-scalars as primitives is supported, unlike, as far as I know, in the NAG implementation.
Presenting such a tool is aimed more at showing what is possible than at showing what is generally advisable. Implementing such an algorithm requires expertise in a narrow area of programming and considerable effort in moderating compilation time. A possible use of such a tool might be as part of a larger mesh of cooperating differentiation implementations, varying in compile-time and run-time flexibility.
 17:15–18:00 Break
 18:00 Pub Dinner

Tuesday, July 2, 2019
 8:30–9:00 Coffee
 9:00–12:00 Preliminary: Morning Session talks
 William Moses (MIT)
Efficient Cross-Platform AD by Synthesizing LLVM
In this talk, I discuss ongoing work to leverage LLVM as a tool for synthesizing efficient derivatives from arbitrary functions without modification. I present an algorithm that achieves state-of-the-art performance by creating a representation that is able to build on existing LLVM optimizations to great benefit (optimizing away tapes, etc.).
 José Cardesa (Institut de Mécanique des Fluides de Toulouse (IMFT))
Automatic differentiation of a parallel CFD code
The quest for performance improvements in aerodynamics software
dictates many coding choices such as the mathematical discretization,
the code structure or its programming language. Simple integration
with AD tools, however, is seldom considered at the design stage of
computational fluid dynamics (CFD) codes. The developer facing the
task of implementing a discrete adjoint of a "finished" code is
left with little margin to iron out incompatibilities between the code
and the AD tool. For this reason, we test AD on a modern MPI-parallel
code that is still under development, and issue recommendations
that feed back into the code design process. Our aim is to obtain a
code that is easy to differentiate each time a CFD problem arises
where new/different sensitivity computations are needed. We
illustrate some of our findings in this talk.
 Mladen Banovic (Paderborn University)
Gradient-based shape optimization of a compressor stator blade with assembly constraints using a differentiated CAD kernel
Computer-Aided Design (CAD) tools and systems are considered essential in the industrial design workflow. They serve to construct and manipulate the geometric representation of a model with an arbitrary set of design parameters. To integrate a CAD library into gradient-based shape optimization, one requires calculation of the so-called shape sensitivities, i.e. derivatives of surface points w.r.t. the design parameters of the model to be optimized. Typically in commercial CAD systems, this information is calculated using inaccurate finite differences. Here, algorithmic differentiation (AD) is applied to the open-source CAD kernel Open CASCADE Technology (OCCT) using the AD tool ADOL-C. The differentiated OCCT is coupled with a discrete adjoint CFD solver also produced by AD, therefore providing a complete differentiated design chain. This design chain is utilized to perform aerodynamic shape optimization of a compressor stator blade, namely the TU Berlin TurboLab stator, to minimize the total pressure loss. Moreover, the test case imposes several assembly constraints that have to be respected during the optimization. The most challenging requirement is that the blade has to accommodate four cylinders (bolts) that serve to mount the blade to its casing. Their optimal position in the blade is calculated using the derivative information from the differentiated OCCT.
 Johannes Blühdorn (Chair for Scientific Computing, Technische Universität Kaiserslautern)
Automatic Differentiation for Material Law Evaluation on GPUs
Often, automatic differentiation is applied globally to a large simulation. However, it can also be convenient to base a simulation on local algorithmic derivatives. Specifically, we outline how the evaluation of generalized standard materials can be automatized so that the user needs only to provide code for two potentials. This greatly simplifies the material law implementation process. The resulting solver uses automatic differentiation in the form of a special-purpose AD tool. In terms of performance, material law evaluation is critical and the automatization is costly. Therefore, we move our solver as well as the AD tool to the GPU. We show that the performance gap to classical material law implementations can be fully closed and demonstrate that selected techniques of AD admit applications in GPU code.
 12:00–13:30 Lunch Break
 13:30–17:15 Preliminary: Afternoon Session talks
 Adam Paszke (University of Warsaw)
Building a multidimensional-array-focused hybrid automatic differentiation system
One of the features critical to the wide adoption of PyTorch was its AD system. Because it aimed to differentiate through arbitrary Python programs, it was infeasible to implement it in any way other than one based on operator overloading. However, the development of TorchScript, an array-oriented programming language embedded in Python, changes the landscape significantly. It allows us to perform source-to-source differentiation on parts of user programs, while tightly integrating with the existing operator-overloading-based system, yielding a hybrid AD implementation. In this talk, I will discuss the challenges we have faced while integrating the two techniques, and delve a bit into some surprising performance advantages that operator overloading might have over ahead-of-time compiler transforms.
 Tom Minka (Microsoft Research)
From automatic differentiation to message passing
Automatic differentiation (AD) is widely used by machine learning languages. However, machine learning languages typically compute approximate gradients, and they achieve this by approximating the input program for AD. This is both inefficient and cumbersome for the user of AD. I argue that the next generation of AD tools should be aware of these approximations and facilitate them. I will show that a class of approximations known as "message-passing algorithms" can be mechanized in a similar manner to AD.
 Bruce Christianson (University of Hertfordshire)
Ensuring Descent (and Global Convergence) for One-Shot Optimization
We wish to find u so as to minimize z* = f (x*) where x* is the exact solution
to the implicit equations psi (x,u)=0.
For values of u that are far from this optimum we don’t want to waste time
calculating accurate values for x and z. We’d rather use the cpu time to
update u to a “sufficiently better” value u + alpha s, based on approximate
values for x and an approximation g for the exact gradient g* = grad_u z.
Usually we either take g as the new search direction s, or combine
g with a previous s in some way. In order to make progress,
we need to ensure that s is an acceptable descent direction
(has a sufficiently positive inner product with the actual gradient g* )
and that the new value for u satisfies some global convergence
conditions such as Goldstein or Wolfe.
We can calculate the approximate gradient g by setting barx = f’(x), solving
the implicit equations xi psi'_x + barx = 0 for xi and then taking
g = baru = xi psi'_u
Often, we compute x by an iterative scheme such as x := phi (x,u),
in which case we can compute xi by the adjoint iterative scheme
xi := xi phi'_x + barx.
So, our problem is to find a way to check that we have a suitable
direction s and step length alpha for the _exact_ problem z*
(with solution x* and gradient g*) but that uses only the _approximate_
values x and g, along with _exact_ values for the residuals such as
w = psi (x,u).
We can do this by using a more accurate estimate hatz for z*, where
hatz = z + xi w. Comparing hatz before and after taking the step
alpha s in u lets us check that we actually did get appropriate descent.
We can also check that we are tightening w “sufficiently” at each step,
by ensuring that the size of the correction term xi w doesn’t outweigh
the decrease in hatz. [Indeed we can use the components of xi to
weight the norm we use on w for converging x.]
Finally, calculation of a single forward derivative of hatz (in the direction
of s) verifies that s lies in an appropriate descent cone, and guarantees
convergence.
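As a toy scalar illustration of the corrected estimate hatz = z + xi w (a sketch under simplified assumptions, not the talk's algorithm; the problem data is invented): take psi(x,u) = x^3 + x - u and f(x) = x^2, so that at u = 10 the exact solution is x* = 2 with z* = 4.

```python
def psi(x, u):
    # implicit equation psi(x, u) = 0 defining x*(u)
    return x**3 + x - u

def f(x):
    # objective z = f(x), evaluated at the (inexact) solve
    return x**2

def corrected_objective(x, u):
    """hatz = z + xi*w, where xi solves xi*psi'_x + barx = 0, barx = f'(x)."""
    w = psi(x, u)                 # residual of the inexact solve
    barx = 2.0 * x                # f'(x)
    psi_x = 3.0 * x**2 + 1.0      # partial derivative of psi w.r.t. x
    xi = -barx / psi_x            # scalar adjoint variable
    return f(x) + xi * w
```

With the inexact iterate x = 1.9, the raw value z = 3.61 is off by 0.39, while hatz is off by less than 0.01: the correction term xi w absorbs the first-order effect of the residual, which is what makes hatz a trustworthy quantity for descent and step-acceptance checks.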
 Julien Herrmann (INRIA)
H-Revolve: A Framework for Adjoint Computation on Synchronous Hierarchical Platforms
In this work, we study the problem of checkpointing strategies for adjoint computation on synchronous hierarchical platforms. Specifically, we consider computational platforms with several levels of storage with different writing and reading costs. When reversing a large adjoint chain, choosing which data to checkpoint and where is a critical decision for the overall performance of the computation. We introduce H-Revolve, an optimal algorithm for this problem. We make it available in a public Python library along with the implementation of several state-of-the-art algorithms for the variant of the problem with two levels of storage.
 Trond Steihaug (Department of Informatics, University of Bergen)
A little on Newton, Raphson and Halley, a little more on Simpson. Why we should or should not care.
This is an overview of examples and problems posed from the late 1600s up to the mid-1700s for the purpose of testing or explaining the two different implementations of the Newton-Raphson method (Newton's method as described by Wallis in 1685 and Raphson's method from 1690), and Halley's method from 1694, for solving nonlinear equations. It is demonstrated that already in 1745 it was shown that the methods of Newton and Raphson were the same but implemented in different ways.
 Sri Hari Krishna Narayanan (Argonne National Laboratory)
Efficient Quantum Optimal Control
Quantum computing is computing using quantum-mechanical phenomena, such as superposition and entanglement. It holds the promise of being able to efficiently solve problems that classical computers practically cannot. In quantum computing, a quantum algorithm is an algorithm that runs on a realistic model of quantum computation, the most commonly used model being the quantum circuit model of computation. A quantum circuit is a model for quantum computation in which a computation is a sequence of quantum gates. Quantum gates are the building blocks of quantum circuits and operate on a small number of qubits, similar to how classical logic gates operate on a small number of bits in conventional digital circuits.
Practitioners of quantum computing must map the logical quantum gates in their algorithms onto the physical quantum devices that implement quantum gates, through a process called quantum control. The general goal of quantum control is to actively manipulate dynamical processes at the atomic or molecular scale, typically by means of external electromagnetic fields or forces. The objective of quantum optimal control is to devise and implement shapes of pulses of external fields, or sequences of such pulses, that achieve a given task in a quantum system in the best way possible. I will present ongoing work to improve the performance of quantum control.
 17:15 End

