ADiJaC is a source-transformation automatic differentiation tool for Java class files that can compute derivatives in both forward and reverse mode. The purpose of this talk is to present the implementation details of the Parallel Forward Vector Mode. The architecture of the target machine and the computational intensity of the code to be differentiated are taken into consideration in deciding whether the derivative computations are parallelized. A series of tests is run to attest to the correctness and efficiency of the proposed solution.
Johannes Blühdorn (TU Kaiserslautern)
Event-Based Automatic Differentiation of OpenMP with OpDiLib
We present OpDiLib, a universal add-on for operator overloading AD tools that enables the automatic differentiation of OpenMP parallel codes. Previously, support for OpenMP in reverse mode operator overloading AD tools was limited by the fact that pragma directives are outside the scope of the programming language and hence inaccessible with overloading techniques. We show how OMPT, a modern OpenMP feature, can be used to achieve fully automatic differentiation while retaining the original pragma directives; alternatively, a set of replacement macros can be used. Both approaches are supported by OpDiLib's event-based differentiation logic. As there are no a priori restrictions on data access patterns and the AD workflow remains unchanged, OpDiLib is very easy to apply. Additionally, fine-grained optimizations can be applied, including elimination of atomic updates on adjoint variables. We demonstrate that very good parallel performance can be achieved with OpDiLib in an OpenMP-MPI hybrid parallel environment.
The effective design of instruments that rely on the interaction of radiation with matter for their operation is a complex task. Furthermore, the underlying physics processes are intrinsically stochastic in nature and open a vast space of possible choices for the physical characteristics of the instrument. While even large-scale detectors, such as those at the LHC, are built using surrogates for the ultimate physics objective, the MODE Collaboration (an acronym for Machine-learning Optimized Design of Experiments) aims to develop tools, based in part on deep learning techniques, to achieve end-to-end optimization of the design of instruments via a fully differentiable pipeline capable of exploring the Pareto-optimal frontier of the utility function for future particle collider experiments and related detectors. The construction of such a differentiable model requires the inclusion of information-extraction procedures, including data collection, detector response, and pattern recognition, as well as existing constraints such as cost. This talk will give an introduction to the goals of the newly founded MODE Collaboration and highlight some of the ingredients that already exist.
I will present joint work with Laurent, which introduced support for differentiating OpenMP parallel loops with Tapenade in both forward and reverse mode. Besides a recap of the differentiation rules for OpenMP scopes, I will discuss the run-time support library that we developed to handle dynamic schedules. If time allows, I would also like to talk about Spray (joint work with Johannes Doerfert), a library that could help Tapenade and other AD tools safely reverse-differentiate through parallel non-exclusive read accesses.
1630–1700 Break, Breakouts
1700–1800 Session II (Chair: Krishna Narayanan, Argonne National Laboratory)
The simulation of gas transmission and distribution networks is a driving motivation for the development of new approaches to the numerical approximation of solutions to non-smooth systems of differential-algebraic equations (DAEs) with an underlying flow network structure.
In this talk we will derive a class of generalized Taylor-based integrators for such systems, based on higher-order spline expansions of piecewise differentiable functions.
We will generalize Taylor's theorem and the formula of Faà di Bruno, and discuss numerical experiments.
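As a minimal illustration of the Taylor-coefficient arithmetic on which such integrators build (a classical univariate sketch, not the generalized spline expansions of the talk; function names are hypothetical), the recurrences below propagate truncated Taylor series:

```python
import math

def taylor_mul(u, v):
    """Coefficients of the product of two truncated Taylor series (Leibniz rule)."""
    n = min(len(u), len(v))
    return [sum(u[j] * v[k - j] for j in range(k + 1)) for k in range(n)]

def taylor_exp(u):
    """Coefficients of exp(u(t)) via the standard ODE-based recurrence."""
    n = len(u)
    v = [0.0] * n
    v[0] = math.exp(u[0])
    for k in range(1, n):
        v[k] = sum(j * u[j] * v[k - j] for j in range(1, k + 1)) / k
    return v

# exp(t) expanded around t = 0: coefficients 1, 1, 1/2, 1/6, ...
coeffs = taylor_exp([0.0, 1.0, 0.0, 0.0])
```

The k-th derivative at the expansion point is k! times the k-th coefficient; higher-order integrators advance the solution by evaluating exactly such series.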
Deterministic methods for global optimization typically proceed by generating upper and lower bounds on the unknown objective value at a global minimum, and progressively improving these bounds. Lower bounds in global minimization are typically obtained by minimizing convex relaxations of the original objective function using a local optimization solver. However, in certain cases, this minimization may be difficult: some convex relaxations are significantly nonsmooth or noisy, and for other convex relaxations, crucial gradient/subgradient information may be unavailable. This presentation illustrates the first tractable method to construct a guaranteed linear underestimator by sampling a convex function of n variables (2n+1) times. This is compatible with established forward and reverse AD modes for subgradient evaluation by Mitsos et al. (2009) and Beckers and Naumann (AD2012), and is compatible with the established McCormick convex relaxation procedure that uses the same computational graphs and operator overloading as AD.
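As a hedged one-dimensional sketch of the sampling idea (n = 1, hence 2n+1 = 3 samples; the offset rule below follows from an elementary concavity argument and is not necessarily the exact estimator of the talk):

```python
def affine_underestimator(f, x0, h):
    """For convex f, build an affine L with L(x) <= f(x) for all x,
    using only the three samples f(x0 - h), f(x0), f(x0 + h)."""
    fm, f0, fp = f(x0 - h), f(x0), f(x0 + h)
    b = (fp - fm) / (2.0 * h)        # secant slope over [x0 - h, x0 + h]
    gap = (fp + fm) / 2.0 - f0       # secant minus f at x0; >= 0 by convexity
    # the secant line can overestimate f inside the interval by at most 2*gap,
    # so shifting it down by 2*gap gives a guaranteed global underestimator
    return lambda x: (f0 - gap) + b * (x - x0)

# nonsmooth convex example: f(x) = |x| + x^2 / 2
f = lambda x: abs(x) + 0.5 * x * x
L = affine_underestimator(f, 0.3, 1.0)
```

Note that no gradient or subgradient of f is ever evaluated; only function samples are used, which is the point when subgradient information is unavailable or noisy.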
The chain rule of differentiation is the fundamental prerequisite for computing accurate derivatives of composite functions that perform a potentially very large number of elemental function evaluations. Data-flow dependences among the elemental functions give rise to a combinatorial optimization problem.
We formulate the Chain Rule Differentiation problem and prove it to be NP-complete. The new proof holds for derivatives of arbitrary order.
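The combinatorial flavor of the problem can already be seen on a simple chain of Jacobians, where the bracketing order alone changes the operation count (a toy sketch with made-up dimensions):

```python
def flops(m, k, n):
    """Multiplications needed to form the product of an (m x k) and a (k x n) matrix."""
    return m * k * n

# composite F = f3 . f2 . f1 with intermediate dimensions n0 -> n1 -> n2 -> n3,
# so the Jacobian is the product J = J3 * J2 * J1
n0, n1, n2, n3 = 1000, 10, 10, 1

reverse_like = flops(n3, n2, n1) + flops(n3, n1, n0)   # (J3 * J2) * J1
forward_like = flops(n2, n1, n0) + flops(n3, n2, n0)   # J3 * (J2 * J1)
```

For a chain, the optimal bracketing is classical dynamic programming; for general data-flow DAGs, choosing the accumulation order is the combinatorial optimization problem whose hardness the talk establishes.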
1800 End of day 1
Wednesday, November 3, 2021
1500–1630 Session III (Chair: Bruce Christianson, University of Hertfordshire)
Thomas Oberbichler (TU Munich)
Algorithmic Differentiation for interactive CAD-integrated Isogeometric Analysis
In this talk, we discuss the application of algorithmic differentiation at the interface between architecture and civil engineering. The integration of analysis tools into computer-aided design (CAD) enables structures to be generated and explored intuitively. To achieve a high degree of interactivity, the use of natural CAD geometric parametrizations – for example NURBS – is also desirable at the analysis stage. Beyond NURBS, modern CAD systems provide other descriptions of free-form geometries, such as discrete meshes or subdivision surfaces. To perform various types of analysis with different geometric descriptions, it is necessary to generalize the process of CAD-integrated isogeometric analysis (IGA) while also increasing the computational speed. To address this issue, we present a new, efficient, and modular approach for implementing CAD-integrated analysis based on algorithmic differentiation. A feature-rich digital toolbox can be derived from a set of highly optimized mechanical and geometric building blocks. We present this concept for a range of mechanical element types and geometric parametrizations. The method can be employed for classic structural analysis as well as for form-finding and the constraint-driven design of free-form geometries.
We present a new framework for the ice sheet model SICOPOLIS that enables adjoint and tangent linear code generation via source transformation using the open-source AD tool Tapenade. This framework has several advantages over earlier work using OpenAD: (1) it is up-to-date with the latest SICOPOLIS code; (2) the AD tool Tapenade is open-source and actively maintained; (3) a new tangent linear code generation capability is introduced; (4) we are now able to deal with inputs in the NetCDF format; (5) we leverage continuous integration in order to track changes in the trunk that "break" the AD-based code generation; (6) we have now correctly incorporated the LIS solver, its tangent linear code, and its adjoint, which improves the simulation of Antarctic ice shelves and Greenland outlet glaciers.
The adjoint and tangent linear results are validated using a finite-difference check, increasing confidence in the validity of the code produced by Tapenade. This new framework will be freely available.
The development of AD tools focuses mostly on handling floating-point types in the target language, and taping optimizations in these tools mostly target specific operations such as matrix-vector products.
Aggregated types like std::complex are usually handled by specifying the AD type as a template argument.
This approach provides exact results but prevents the use of expression templates.
If AD tools are extended and specialized such that aggregated types can be added to the expression framework, memory utilization is reduced and timings improve for applications that use aggregated types such as complex numbers or matrix-vector operations. Such an integration requires a reformulation of the data stored per expression and a rework of the tape evaluation process. In this talk we demonstrate the overhead of unhandled aggregated types in expression templates and provide the basic ingredients for a tape implementation that supports arbitrary aggregated types for which the user has implemented a few type traits. Finally, we demonstrate the advantages of aggregated type handling on a synthetic benchmark case.
The MODE introductory article*, published earlier this year, proposed an end-to-end differentiable pipeline for the optimisation of detector designs directly with respect to the end goal of the experiment, rather than intermediate proxy targets. The TomOpt Python package is the first concrete endeavour to realise such a pipeline, and aims to allow the optimisation of detectors for the purpose of muon tomography with respect to both imaging performance and detector budget.
This modular and customisable package can simulate detectors that scan unknown volumes by muon radiography, using cosmic-ray muons to infer the density of the material. The full simulation and reconstruction chain is made differentiable, and an objective function covering the goal of the apparatus as well as its cost and other factors can be specified. The derivatives of such a loss function can be back-propagated to each detector parameter, which can be updated via gradient descent until an optimal configuration is reached.
In 2001, Griewank and Mitev introduced a technique for efficiently determining the sparsity pattern of Jacobian matrices via Bayesian Probing. Twenty years later, we describe some extensions to that pioneering work and some alternative probing strategies.
Jean Utke (Allstate, USA)
Remembering Andreas Griewank
An eclectic talk touching on a few topics related to AD in C++:
- reversing a call stack without RAII
- obtaining full and partial derivatives from a single function
- the influence of AD on the C++ SG7 reflection study group
Rodrigo Alejandro Vargas Hernandez (University of Toronto)
Applications of automatic differentiation for wavefunction-based quantum chemistry methodologies
The central task of theoretical or quantum chemistry is the simulation of molecular properties using the laws of quantum mechanics.
During the last decade, a great variety of methods have been developed, forming three main branches: density functional theory, wavefunction-based methods, and semi-empirical methods. For the last two, the quantum state that describes the distribution of electrons in a molecule, the wave function, is parametrized by a set of linearly combined atomic basis functions.
Here, we show that automatic differentiation makes it possible to optimize the internal parameters of quantum chemistry methodologies, increasing the accuracy of the wavefunction.
In this talk, we present two applications of automatic differentiation in quantum chemistry: Diffiqult, a software package for ab-initio quantum chemistry calculations, and Huxel, a software package for fully differentiable semi-empirical methods. Both packages were implemented in the JAX ecosystem.
Diffiqult is designed to optimize any parameter of the atomic basis set, e.g., Gaussian widths, centers, and contraction coefficients, and to compute other higher-order derivatives. The optimization procedure is highly non-trivial, and as our results indicate, automatic differentiation engines can improve the accuracy of the wavefunction.
Huxel is an implementation of the Hückel method, a semi-empirical model capable of qualitatively describing large chemical systems and of providing initial guesses for ab-initio methodologies. Our implementation permits us to optimize the model parameters, improving the screening procedure for materials science.
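A toy sketch of the basis-optimization idea (not Diffiqult's actual code, which uses JAX): a minimal forward-mode dual-number class optimizes the exponent of a single Gaussian trial function for the hydrogen atom, whose textbook energy expression is E(α) = 3α/2 − 2√(2α/π) in atomic units:

```python
import math

class Dual:
    """Minimal forward-mode AD value x + eps * dx."""
    def __init__(self, x, dx=0.0):
        self.x, self.dx = x, dx
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.x * o.x, self.dx * o.x + self.x * o.dx)
    __rmul__ = __mul__
    def __sub__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.x - o.x, self.dx - o.dx)

def dsqrt(u):
    r = math.sqrt(u.x)
    return Dual(r, u.dx / (2.0 * r))

def energy(alpha):
    # Rayleigh quotient of the single-Gaussian trial function exp(-alpha r^2)
    return 1.5 * alpha - 2.0 * dsqrt((2.0 / math.pi) * alpha)

# gradient descent on the exponent; the derivative comes from the dual part
alpha = 1.0
for _ in range(200):
    alpha -= 0.5 * energy(Dual(alpha, 1.0)).dx
```

This converges to the classic single-Gaussian optimum α = 8/(9π) ≈ 0.283 with E ≈ −0.424 Hartree; Diffiqult performs the analogous optimization for full basis sets with many parameters.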
Nuclear fusion has great potential to become a clean and stable source of energy. However, for reliable energy production, several challenges still have to be solved, one of the most critical being power and particle exhaust in the so-called divertor component. In fact, due to the fast and strongly anisotropic transport in the plasma edge, the divertor has to withstand heat fluxes of several tens of megawatts concentrated on an area of a few square meters, thus reaching the engineering limits of state-of-the-art cooling concepts. Moreover, divertor surface damage induced by sputtering from highly energetic plasma particles significantly reduces the component's lifetime. It is therefore of foremost importance to adequately model and predict the plasma edge behaviour when designing divertor concepts for future reactors.
To this end, plasma edge codes such as SOLPS-ITER [1] are commonly employed, which solve transport equations for the plasma particles, together with transport equations for, and interactions with, neutral particles. Within these codes, the effect of plasma turbulence is only approximated with a diffusion model adopting ad-hoc coefficients. In current practice, these coefficients are estimated by visually comparing simulation results to experimental data, with manual tuning performed by the modeler in large and computationally expensive parameter scans. Moreover, such coefficients are spatially dependent and differ for each operational regime and reactor, thus hampering the code's predictive and interpretive capability. Parameter estimation with gradient-based optimization makes it possible to automate the procedure, with algorithmic differentiation (AD) providing efficient and accurate gradient calculations [2].
This talk gives an overview of the advances in applying AD to SOLPS-ITER, following the first demonstration in Ref. [3]. The TAPENADE tool [4] is again employed, and adjoint derivative calculation is now available. Comparison of sensitivities obtained through finite differences, tangent AD, and adjoint AD proves the correctness of the differentiation. A preliminary analysis of adjoint AD performance shows that a better checkpointing strategy is needed. Finally, AD is employed in a recently developed parameter estimation framework implementing both regression and Bayesian MAP estimation [5].
[1] X. Bonnin, W. Dekeyser, R. Pitts, et al., Plasma and Fusion Research, Vol. 11:1403102, 2016
[2] A. Griewank and A. Walther, Evaluating Derivatives, SIAM, 2nd edition, 2008
[3] S. Carli, M. Blommaert, W. Dekeyser, and M. Baelmans, Nuclear Materials and Energy 18, 6-11, 2019
[4] L. Hascoet and V. Pascual, ACM Trans. Math. Softw. 39, 3, Article 20 (May 2013)
[5] S. Carli, W. Dekeyser, M. Blommaert, R. Coosemans, W. Van Uytven, and M. Baelmans, submitted to Contributions to Plasma Physics
Quantum optimal control problems are commonly solved with the GRAPE algorithm, whose storage requirements grow exponentially with the number of qubits and linearly with the number of timesteps. These memory requirements are a barrier to simulating larger models or longer time spans. We have created a non-standard automatic differentiation technique that computes the gradients needed by GRAPE by exploiting the fact that the inverse of a unitary matrix is its conjugate transpose. Our approach significantly reduces the memory requirements of GRAPE, at the cost of a reasonable amount of recomputation. We present our implementation in JAX, as well as benchmark results.
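A hypothetical minimal sketch of the storage trick (plain 2×2 matrices instead of our JAX implementation, with single-qubit rotations standing in for the timestep propagators):

```python
import math

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(2)) for i in range(2)]

def dagger(A):
    # conjugate transpose; for a unitary A this is also its inverse
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def rx(theta):
    """Single-qubit rotation exp(-i * theta * X / 2), a unitary propagator."""
    c, s = math.cos(theta / 2.0), math.sin(theta / 2.0)
    return [[complex(c), -1j * s], [-1j * s, complex(c)]]

controls = [0.3, 1.1, -0.7, 0.4]

# forward sweep: a store-everything tape would keep every intermediate state ...
states = [[1.0 + 0j, 0j]]
for th in controls:
    states.append(matvec(rx(th), states[-1]))

# ... but the reverse sweep can rebuild them from the final state alone,
# applying each step's inverse U^dagger, at the price of extra matrix products
recovered = states[-1]
for th, stored in zip(reversed(controls), reversed(states[:-1])):
    recovered = matvec(dagger(rx(th)), recovered)
    assert all(abs(a - b) < 1e-12 for a, b in zip(recovered, stored))
```

Only a constant number of states need to be held in memory at once instead of one per timestep, which is what removes the linear memory growth in the number of timesteps.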
Additionally, we will discuss the so-called parameter-shift rule for computing partial derivatives of a variational circuit. For many quantum functions implemented as quantum circuits in hardware, the same circuit can be used to compute both the function value and its gradient. We will explain some of the concepts in this area.
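A self-contained sketch of the rule for a single RY rotation (the simplest textbook case; the hardware circuit evaluation is replaced here by its closed-form expectation value):

```python
import math

def expval_z(theta):
    """<psi|Z|psi> for |psi> = RY(theta)|0> = (cos(theta/2), sin(theta/2)).
    This evaluates to cos(theta)."""
    c, s = math.cos(theta / 2.0), math.sin(theta / 2.0)
    return c * c - s * s

def parameter_shift(f, theta):
    # the same circuit, run twice at macroscopically shifted parameter values,
    # yields the exact derivative (no small-step finite differencing)
    return (f(theta + math.pi / 2.0) - f(theta - math.pi / 2.0)) / 2.0

grad = parameter_shift(expval_z, 0.37)   # exact derivative of cos at 0.37
```

Because the shift is a fixed π/2 rather than an infinitesimal step, the rule is exact and far more robust to hardware shot noise than finite differences.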
1630–1700 Break, Breakouts
1700–1800 Session V (Chair: Jan Hückelheim, Argonne National Laboratory)
William Moses (MIT)
Language-Independent Automatic Differentiation and Optimization of GPU Programs with Enzyme
Derivatives are fundamental to a variety of algorithms in scientific computing and machine learning, such as back-propagation in neural networks, uncertainty quantification, sensitivity analysis, and Bayesian inference. Enzyme is an LLVM compiler plugin for reverse-mode automatic differentiation (AD) that generates fast gradients of programs in a variety of languages, including C/C++, Fortran, Julia, and Rust. Our talk will present a combination of novel techniques that make Enzyme the first automatic reverse-mode AD tool to generate gradients of GPU kernels. Because Enzyme differentiates within a general-purpose compiler, we are able to introduce novel GPU- and AD-specific optimizations. We differentiate five GPU-based HPC applications, executed on NVIDIA and AMD GPUs. All benchmarks run within an order of magnitude of the original program's runtime. Without GPU- and AD-specific optimizations, gradients of GPU kernels either fail to run for lack of resources or have infeasible overhead.
Clad enables automatic differentiation (AD) for C++ algorithms through source-to-source transformation. It is built on the LLVM compiler infrastructure and implemented as a Clang compiler plugin. Unlike other tools, Clad manipulates the high-level code representation (the AST) rather than implementing its own C++ parser, and it does not require modifications to existing code bases. This methodology is both easier to adopt and potentially more performant than other approaches. Having full access to the Clang compiler's internals means that Clad can follow the high-level semantics of algorithms and perform domain-specific optimisations; automatically generate code (re-targeting C++) for accelerator hardware with appropriate scheduling; and connect directly to the compiler's diagnostics engine, producing precise and expressive diagnostics positioned at the desired source locations.
In this talk, we showcase the above-mentioned advantages through examples and outline Clad's features, applications, and supported extensions. We describe the challenges of supporting automatic differentiation of broader C++ and present how Clad can compute derivatives of functions, member functions, functors, and lambda expressions. We show the newly added support for array differentiation, which provides the basic utility for CUDA support and parallelisation of gradient computation. Moreover, we will demo different interactive use cases of Clad, either within a Jupyter environment as a kernel extension based on xeus-cling, or within a GPU-CPU environment where the gradient computation can be accelerated through GPU code produced by Clad and run with the help of the Cling interpreter.
The ChainRules project is a suite of JuliaLang packages that define custom primitives (i.e. rules) for doing AD in JuliaLang.
Importantly, it is AD-system agnostic, and it has proved successful in this goal: at present it works with about half a dozen different JuliaLang AD systems.
It has been a long journey, but as of August 2021, the core packages have now hit version 1.0.
This talk will go through why this is useful, the particular objectives the project had, and the challenges that had to be solved.
This talk is not intended as an educational guide for users (for that, see our 2021 JuliaCon talk, Everything you need to know about ChainRules 1.0: https://live.juliacon.org/talk/LWVB39).
Rather this talk is to share the insights we have had, and likely (inadvertently) the mistakes we have made, with the wider autodiff community.
We believe these insights can be informative and useful to efforts in other languages and ecosystems.