If applied in a pure black box fashoin, the user might not notice
that the function to be differentiated is nonsmooth. In this talk,
the handling of nonsmooth function by AD tools like ADOL-C will be discussed. Furthermore, we will present an approach to generate a local model for the nonsmooth function based on abs-linearization. Numerical results illustrate the benefits of this technique.
The development of algorithms to optimize functions of a certain class is an essential part of applied mathematics and of central importance for industrial applications.
When using gradient-based optimization methods, derivative information is used to find a local optimum.
However, piecewise linear functions have areas where the function is not differentiable. The non-differentiabilty is often caused by an operator like the absolute value function.
To solve a piecewise linear constrained optimization problem, the function and the constraints are rewritten as a matrix-vector based representation - the so-called Abs-Normal form.
Certain elements of the Abs-Normal form are computed using algorithmic differentiation.
In the talk the algorithm is introduced and first numerical results are presented.
It was observed that the size of the algebraic representation for piece-wise linear approximations is proportional to the number of absolute function value calls in the evaluation procedure of the underlying function. Hence, the algebraic representation, called ANF, can easily become very large - even for functions with a very simple structure. To reduce the size and, thus, the numerical effort, we propose the idea to compute a local piece-wise linearization. These local piece-wise linearization coincide with the original piece-wise linear approximation only over a neighborhood but can be significantly smaller.
In detail, we propose two methods: a simple heuristic to identify which parts of the ANF can be considered as affine/linear over the neighborhood and which ones require a piece-wise linear representation.
The second method uses this information and computes a local approximation from the original ANF.
We report the results on several test problems that not only show the benefits but also the limitations of the proposed approach.
Devito is a Domain-specific Language (DSL) and code generation framework for the design of highly optimised finite difference kernels for use in inversion methods. Devito utilises SymPy to allow the definition of operators from high-level symbolic equations and generates optimised and automatically tuned, parallelised and vectorised code specific to a given target architecture.
We are integrating Automatic Differentiation and checkpointing into the Devito compiler to enable automated generation of discretely consistent adjoint PDE solvers that match the computational efficiency of the primal solvers.
AD has a split personality. While scientific applications of AD have had decades to mature, the techniques were recently rediscovered in machine learning, which has built its own parallel universe of tooling with very different set of design considerations and constraints. Can these two worlds be combined?
We present Zygote, and AD system which attempts to meet the needs of both the ML and scientific communities in one tool, building on work by the AD community. We'll assess Zygote's current design and implementation, future challenges to building general-purpose AD, and some of the exciting applications enabled by the merging of ML with traditional scientific modelling.
This talk will explore validation techniques for programs that are differentiated with operator overloading AD tools. Common problems, that can lead to faulty tapes, are analyzed and detection methods are presented.
Patrick Peltzer (STCE, RWTH Aachen University, Germany)
Nesting Template Libraries: Notes on Coupling Eigen and dco/c++
The C++ template header library Eigen offers a wide range of features for linear algebra computations and is used in numerous scientific fields. While it comes with basic support for forward AD, dedicated AD tools offer more features and advantages, but also pose challenges in combination with Eigen.
In this talk the potential of incorporating an AD tool into Eigen is presented, using dco/c++ as an example tool. Besides general performance improvements, details on the implementation of symbolic linear algebra operations are given for the matrix-matrix product as well as for dense linear system solvers. Impact on performance is presented based on benchmark results.
Simon Lukas Maertens (RWTH Aachen University (STCE), Germany)
Derivative Code by Overloading in MATLAB
We present our attempt to provide Algorithmic Differentiation (AD) in MATLAB. By using the MATLAB MEX API we created an interface between the overloading tool dco/C++ and MATLAB, which enables us to calculate derivatives of arbitrary order in forward as well as reverse mode.
We support all numeric types that are available in MATLAB including complex numbers as well as all casts between those types. The derivatives calculated with our tool were validated against finite differences for over 200 common MATLAB functions.
As a first case study we used dco/MATLAB in combination with the MATLAB Optimization Toolbox in order to create an AD enabled optimization framework.
Many fields in scientific computing have established iterative
algorithms that trade efficiency for robustness. A well-known example is
the segregated pressure-correction scheme SIMPLE that is the workhorse
method to discretise the incompressible Navier-Stokes Eq. The approach
is very memory-efficient, solving a linear equation for one system
variable at at a time, but is known to converge very poorly for large
grids or complex flows. Similar problems arise where strongly coupled
multi-physics systems are solved one discipline at a time.
The presentation will tell the story of applying source-transformation
AD to a commercial incompressible flow solver, taking a slightly
historical detour to show how far we have come from cumbersome scripting
with earlier versions of Tapenade, to a quite robust use today, and will
demonstrate the exactness of the produced derivative code.
The talk will then focus on the stability of the adjoint code,
demonstrating how the skeletons in the SIMPLE closet are coming out in
reverse, but that all is not lost.
This technology was presented in Oxford in 2016 through work done at NAG and RWTH Aachen. However, details of an implementation were omitted (for good reasons).
Here, I wish to present one way of approaching compile time adjoint differentation, working through the essential details of my own implementation. The approach lacks many of the qualities of the NAG implementation, though in one detail, the capacity to have non-scalars as primitives, is supported, unlike in the NAG implementation, as far as I know.
Presenting such a tool is more aimed at showing what is possible rather than showing what is generally advisable. Implementing such an algorithm requires expertise in a narrow area of programming and considerable effort in moderating compilation time. Possible use of such a tool might be as part of a larger mesh of cooperating differentiation implemenations, varying in compile-time and run-time flexibility.
1715 –1800 Break
1800 Pub Dinner
Tuesday, July 2, 2019
830 –900 Coffee
900 –1200 Preliminary: Morning Session talks
William Moses (MIT)
Efficient Cross-Platform AD by Synthesizing LLVM
In this talk I discuss some ongoing work to leverage LLVM as a tool for synthesizing efficient derivatives from arbitrary functions without modification. I present an algorithm that achieves state-of-the art performance by creating a representation that is able to build off existing LLVM optimizations to great benefit (optimizing away tapes, etc).
José CARDESA (Institut de Mécanique des Fluides de Toulouse (IMFT))
Automatic differentiation of a parallel CFD code
The quest for performance improvements in aerodynamics software
dictates many coding choices such as the mathematical discretization,
the code structure or its programming language. Simple integration
with AD tools, however, is seldom considered at the design stage of
computational fluid dynamics (CFD) codes. The developer facing the
task of implementing a discrete adjoint of a "finished" code is
left with little margin to iron out incompatibilities between the code
and the AD tool. For this reason, we test AD on a modern MPI-parallel
code that is still under development, and issue recommendations
that feed back into the code design process. Our aim is to obtain a
code that is easy to differentiate each time a CFD problem arises
where new/different sensitivity computations are needed. We
illustrate some of our findings in this talk.
Mladen Banovic (Paderborn University)
Gradient-based shape optimization of a compressor stator blade with assembly constraints using a differentiated CAD kernel
Computer-Aided Design (CAD) tools and systems are considered essential in industrial design workflow. They serve to construct and manipulate the geometric representation of a model with an arbitrary set of design parameters. To integrate a CAD library into the gradient-based shape optimization, one requires calculation of the so-called shape sensitivities, i.e. derivatives of surface points w.r.t. the design parameters of the model to be optimized. Typically in commercial CAD systems, this information is calculated using inaccurate finite differences. Here, algorithmic differentiation (AD) is applied to the open-source CAD kernel Open CASCADE Technology (OCCT) using the AD tool ADOL-C. The differentiated OCCT is coupled with a discrete adjoint CFD solver also produced by AD, therefore providing a complete differentiated design chain at hand. This design chain is utilized to perform aerodynamic shape optimization of a compressor stator blade, namely the TU Berlin TurboLab stator, to minimize the total pressure loss. Moreover, the test-case imposes several assembly constraints that have to be respected during the optimization. The most challenging requirement is that the blade has to accommodate four cylinders (bolts) that serve to mount the blade to its casing. Their optimal position in the blade is calculated using the derivative information from the differentiated OCCT.
Johannes Blühdorn (Chair for Scientific Computing, Technische Universität Kaiserslautern)
Automatic Differentiation for Material Law Evaluation on GPUs
Often, automatic differentiation is applied globally to a large simulation. However, it can also be convenient to base a simulation on local algorithmic derivatives. Specifically, we outline how the evaluation of generalized standard materials can be automatized so that the user must only provide code for two potentials. This greatly simplifies the material law implementation process. The resulting solver uses automatic differentiation in the shape of a special purpose AD tool. In terms of performance, material law evaluation is critical and the automatization is costly. Therefore, we move our solver as well as the AD tool to the GPU. We show that the performance gap to classical material law implementations can be fully closed and demonstrate that selected techniques of AD admit applications in GPU code.
One of the features critical to the wide adoption of PyTorch was its AD system. Aimed with a goal of differentiating through arbitrary Python programs it was infeasible to implement it in any way other than based on operator-overloading. However, the development of TorchScript - an array-oriented programming language embedded in Python - changes the landscape significantly. It allows us to perform source-to-source differentiation on parts of user programs, while tightly integrating with the existing operator-overloading based system, yielding a hybrid AD implementation. In this talk I will discuss the challenges we have faced while integrating the two techniques, and delve a bit into some surprising performance advantages that the operator-overloading might have over ahead of time compiler transforms.
Automatic differentiation (AD) is widely used by machine learning languages. However, machine learning languages typically compute approximate gradients, and they achieve this by approximating the input program for AD. This is both inefficient and cumbersome for the user of AD. I argue that the next generation of AD tools should be aware of these approximations and faciliate them. I will show that a class of approximations known as "message-passing algorithms" can be mechanized in a similar manner as AD.
Bruce Christianson (University of Hertfordshire)
Ensuring Descent - and Global Convergence - for One-Shot Optimization
We wish to find u so as to minimize z* = f (x*) where x* is the exact solution
to the implicit equations psi (x,u)=0.
For values of u that are far from this optimum we don’t want to waste time
calculating accurate values for x and z. We’d rather use the cpu time to
update u to a “sufficiently better” value u + alpha s, based on approximate
values for x and an approximation g for the exact gradient g* = grad_u z.
Usually we either take g as the new search direction s, or combine
g with a previous s in some way. In order to make progress,
we need to ensure that s is an acceptable descent direction
(has a sufficiently positive inner product with the actual gradient g* )
and that the new value for u satisfies some global convergence
conditions such as Goldstein or Wolfe.
We can calculate the gradient g of f by setting bar-x = f’(x), solving
the implicit equations xi psi'_x + bar-x = 0 for xi and then taking
g = bar-u = xi psi'_u
Often, we compute x by an iterative schema such as x := phi (x,u),
in which case we can compute xi by the adjoint iterative schema
xi := xi phi'_x + bar-x.
So, our problem is to find a way to check that we have a suitable
direction s and step length alpha for the _exact_ problem z*
(with solution x* and gradient g*) but that uses only the _approximate_
values x and g, along with _exact_ values for the residuals such as
w = psi (x,u).
We can do this by using a more accurate estimate hat-z for z*, where
hat-z = z + xi w. Comparing hat-z before and after taking the step
alpha s in u lets us check that we actually did get appropriate descent.
We can also check that we are tightening w “sufficiently” at each step,
by ensuring that the size of the correction term xi w doesn’t outweigh
the decrease in hat z. [Indeed we can use the components of xi to
weight the norm we use on w for converging x.]
Finally, calculation of a single forward derivative of hat-z (in the direction
of s) verifies that s lies in an appropriate descent cone, and guarantees
Julien Herrmann (INRIA)
H-Revolve: A Framework for Adjoint Computation on Synchrone Hierarchical Platforms
In this work, we study the problem of checkpointing strategies for adjoint computation on synchrone
hierarchical platforms. Specifically we consider computational platforms with several levels of storage with different
writing and reading costs. When reversing a large adjoint chain, choosing which data to checkpoint and where
is a critical decision for the overall performance of the computation. We introduce H-Revolve, an optimal
algorithm for this problem. We make it available in a public Python library along with the implementation
of several state-of-the-art algorithms for the variant of the problem with two levels of storage.
This is an overview of examples and problems posed late 1600 up to mid 1700 for the purpose of testing or explaining the two different implementations of the Newton-Raphson method, Newton's method as described by Wallis in 1685, Raphson's method from 1690 and Halley's method from 1694 for solving nonlinear equations. It is demonstrated that already in 1745 it was shown that the methods of Newton and Raphson were the same but implemented in different ways.
Quantum computing is computing using quantum-mechanical phenomena, such as superposition and entanglement. It holds the promise of being able to efficiently solve problems that classical computers practically cannot. In quantum computing, a quantum algorithm is an algorithm
that runs on a realistic model of quantum computation, the most commonly used model being the quantum circuit model of computation. A quantum circuit is a model for quantum computation
in which a computation is a sequence of quantum gates.Quantum gates are the building blocks of quantum circuits and operate on small number of qubits, similar to how classical logic gates operate on a small number of bits in conventional digital circuits.
Practitioners of quantum computing must map the logical quantum gates in their algorithms onto the physical quantum devices that implement quantum gates through a process call quantum control. The general goal of quantum control is to actively manipulate dynamical processes at the atomic or molecular scale, typically by means of external electromagnetic fields or forces. The objective of quantum optimal control is to devise and implement shapes of pulses of external fields or
sequences of such pulses that reach a given task in a quantum system in the best way possible. I will present ongoing work to improve the performance of quantum control.