# Mila Computer Calculus RG

### News: If you enjoyed COMPCALC, be sure to check out AIPLANS, a new workshop at NeurIPS ‘21!

“The derivative, as this notion appears in the elementary differential calculus, is a familiar mathematical example of a function for which both [the domain and the range] consist of functions. Or, turning to the integral calculus, if in the expression we take the function as independent variable, we are led to a function for which the range of arguments consists of functions and the range of values, of numbers.”

In ancient times, our ancestors used small pebbles to encode and manipulate abstract quantities according to a small set of rules. In effect, they were trading mental for mechanical labor. With the advent of pen and paper, those same rules could be translated into written symbols and more recently, electrical impulses and transistors.

Today we make the same trade when implementing mathematical algorithms: one can either spend a great deal of effort manipulating an expression with pen and paper, then implement the result as a computer program, or we could implement the rules, then let the computer manipulate the expression on our behalf. We call this exchange calculus.

Modern calculus is a topic with a variety of applications in classical and statistical computing. Broadly, it encompasses any formal system for manipulating symbolic expressions according to a fixed set of rules. We are specifically interested in its applications to computer programming. Each week, we will read a paper and hold a short discussion. All are welcome to attend.

## Schedule

### Winter/Spring 2021

Date Talk Presenter
Feb. 5, 9:30 am Automatic Differentiation in PyTorch
Breandan Considine
Feb. 12, 1:00 pm Programming with Probabilities
-Recording, notes
Breandan Considine
Feb. 19, 12:00 pm Introduction to Sequent Calculus
-Recording, notes
Breandan Considine
Feb. 26, 10:00 am Quantitative Equational Reasoning
-Recording, slides, textbook #1, textbook #2
We develop a quantitative analogue of equational reasoning which we call quantitative algebra. We define an equality relation indexed by rationals: a =ε b which we think of as saying that “a is approximately equal to b up to an error of ε”. We have 4 interesting examples where we have a quantitative equational theory whose free algebras correspond to well known structures. The most interesting example comes from the case of the Kantorovich metric. I will present this as a tutorial on equational reasoning.
Mar. 1 - Mar. 5 Study Break / Période d’activités libres N/A
Mar. 12, 11:00 am Drasil: Generate all the Things!
-Recording, slides
Generative techniques are useful for more than just code generation: many of the artifacts that make up ‘software’ can be generated too. The question then becomes, which ones, what is an adequate ‘source’ language for describing them, and when is such an approach effective. This talk will meander through these questions, giving answers that have emerged from our work on the Drasil system. While far from a silver bullet, an interesting subset of software does emerge that can be effectively attacked in this way.
Jacques Carette
Mar. 19, 9:30 am Automatic Reparameterisation of Probabilistic Programs
-Recording, slides, Stan tutorial
Markov chain Monte Carlo (MCMC) algorithms can be used to approximate a probability distribution by continuously sampling from it. Some MCMC strategies, such as Hamiltonian Monte Carlo (HMC), use the gradient of the unnormalised density function to increase the quality of the obtained samples and the speed of convergence of the algorithm. But such algorithms can fail, and often silently, if the curvature of this target density function varies. One way to work around this problem is to reparameterise the distribution of interest, meaning to express it in terms of different parameters.
In this talk, I will describe the practical challenges of finding a suitable reparameterisation, and demonstrate how we can use mechanisms available in recent probabilistic programming languages to implement a family of parameterisations often used in practice. In particular, we will look at a continuous relaxation of the question of what parameterisation to use and combine it with variational inference to obtain robust and efficient MCMC samplers.
Maria Gorinova
Mar. 26, 12:00 pm Inside Every Calculus Is A Little Algebra Waiting To Get Out
-Recording, textbook #1, textbook #2
Because of deep learning, there has been a surge in interest in automatic differentiation, especially from the functional programming community. As a result there are many recent papers that look at AD from a Category Theory perspective. However, Category Theorists have already been looking at differentiation and calculus in general since the late 60’s in the context of Synthetic Differential Geometry, but it seems that this work is largely ignored by those interested in AD. In this talk, we will provide a gentle introduction to the ideas of SDG, by relating them to dual numbers, and show how it provides a simple and purely algebraic approach to (automatic) differentiation.
Erik Meijer
Apr. 1, 9:30 am Tensor Networks, Probabilistic Modeling, and Formal Grammar
-Recording, slides, FGs, Codes on graphs (thanks to Tristan)
Tensor networks (TNs) are a general formalism for efficiently modeling complex higher-order tensors, whose broad applications within quantum many-body physics have led them to be described as the natural framework for modeling quantum states of matter. Spurred on by their independent rediscovery in applied mathematics, TNs have increasingly been used for machine learning, where they have unlocked a growing number of novel theoretical and practical results. In this talk, we discuss one such result, concerning the use of TN models for modeling probabilistic sequence data. We show how a simple recurrent TN model enables a novel algorithm for conditional sampling from collections of strings defined by formal grammars, a significant generalization of the autoregressive sampling of typical language models. By developing this unexpected link between quantum physics and language, we expect our results to contribute to the search for mathematically principled methods for unsupervised grammatical inference in NLP.
Jacob Miller
Apr. 9, 10:00 am Computing derivatives of matrix and tensor expressions
-Recording, slides
Expressions involving matrix and tensors are often encountered in the area of machine learning. For such expressions, it is often necessary to obtain first and second-order derivatives. In my talk, I will present two algorithmic approaches for computing such matrix and tensor derivatives. Both approaches can be used for symbolic as well as for forward and reverse mode algorithmic differentiation. Experiments show a speedup between two and three orders of magnitude over state-of-the-art frameworks like TensorFlow or PyTorch when evaluating higher-order derivatives.
An online interface to our approach can be found at http://www.MatrixCalculus.org.
Sören Laue
Apr. 16, 3:00 pm Harnessing second order optimizers from deep learning frameworks
-Recording, slides, GitHub, PyPI, Read the Docs
Have you ever wanted to use a second order optimizer on code written in TensorFlow or PyTorch? What about optimizing a dictionary of tensors using SciPy minimize? If so, a lot of troublesome glue code was probably needed. For another way, look no further than the dict-minimize package, which takes care of everything and lets users easily optimize objectives implemented in TensorFlow, PyTorch, or JAX.
Ryan Turner
Apr. 22, 1:30 pm Factor Graph Grammars for Probabilistic Modeling
-Recording, slides, fggs / HRG survey, Graph & AMR parsing, MT
Probabilistic graphical models (PGMs) are a powerful method for specifying probability distributions. However, they are unable to represent some of the most popular models used in NLP, such as HMMs and PCFGs (since PGMs assume a fixed number of random variables, while HMMs and PCFGs can generate sentences of arbitrary length). In this talk, I will present factor graph grammars (FGGs), an extension of PGMs that is able to capture these models and many others. A FGG is a type of grammar, called a hyperedge replacement graph grammar, which generates a (possibly infinite) set of factor graphs. Conveniently, inference can be done over the FGG without enumerating all the factor graphs in the set. For finite variable domains (but possibly infinite sets of graphs), a generalization of variable elimination to FGGs allows exact and tractable inference in many situations. This talk will introduce the FGG formalism and present that inference algorithm.
Darcey Riley
Torsten Scholak

If you would like to sign up to present or wish to receive updates from the mailing list, please reach out to the organizer via the following address: breandan dot considine at mail dot mcgill dot ca.

A few recent papers on differential, integral and logical calculi are listed for illustrative purposes below, following the general theme of topics we plan to discuss. Other suggested reading materials are also welcome!

### Differential

Differentiable programming concerns the calculus of infinitesimal change and change propagation through a computation graph. This subject is the original and primary focus of this reading group.

### Integral

We will also discuss some topics in automatic integration or probabilistic programming, i.e. the study of volume, density and how to derive their exact or approximate quantity.

### Logical

Finally, we plan to discuss some related topics in other logical calculi, including combinatory logic, sequent calculus and other formal systems. This material will be primarily introductory and no prior experience is expected or required.