Mila Computer Calculus RG

Reading about differential, integral and logical calculi.

News: If you enjoyed COMPCALC, be sure to check out AIPLANS, a new workshop at NeurIPS ‘21!

“The derivative, as this notion appears in the elementary differential calculus, is a familiar mathematical example of a function for which both [the domain and the range] consist of functions. Or, turning to the integral calculus, if in the expression $\int_0^1 (f\,x)\,dx$ we take the function $f$ as independent variable, we are led to a function for which the range of arguments consists of functions and the range of values, of numbers.” —Alonzo Church, The Calculi of Lambda-Conversion
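To make the quote concrete, here is a small illustrative sketch (not drawn from any of the readings) in which the derivative and the definite integral appear as ordinary higher-order functions, with finite differences and Riemann sums standing in for their exact counterparts:

```python
# The derivative maps a function to a function; the definite integral maps a
# function to a number. Numerical stand-ins keep the example self-contained.
def derivative(f, h=1e-6):
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)     # function -> function

def integral_0_1(f, n=10_000):
    # Midpoint rule on [0, 1]: a function goes in, a number comes out.
    return sum(f((i + 0.5) / n) for i in range(n)) / n   # function -> number

square = lambda x: x * x
print(derivative(square)(3.0))   # ~ 6.0
print(integral_0_1(square))      # ~ 1/3
```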

In ancient times, our ancestors used small pebbles to encode and manipulate abstract quantities according to a small set of rules. In effect, they were trading mental for mechanical labor. With the advent of pen and paper, those same rules could be translated into written symbols and more recently, electrical impulses and transistors.

Today we make the same trade when implementing mathematical algorithms: we can either spend a great deal of effort manipulating an expression with pen and paper and then implement the result as a computer program, or implement the rules and let the computer manipulate the expression on our behalf. We call this exchange calculus.

Modern calculus is a topic with a variety of applications in classical and statistical computing. Broadly, it encompasses any formal system for manipulating symbolic expressions according to a fixed set of rules. We are specifically interested in its applications to computer programming. Each week, we will read a paper and hold a short discussion. All are welcome to attend.

Schedule

Winter/Spring 2021

Date Talk Presenter
Feb. 5, 9:30 am Automatic Differentiation in PyTorch
-Slides, notebook, picograd
Breandan Considine
Feb. 12, 1:00 pm Programming with Probabilities
-Recording, notes
Breandan Considine
Feb. 19, 12:00 pm Introduction to Sequent Calculus
-Recording, notes
Breandan Considine
Feb. 26, 10:00 am Quantitative Equational Reasoning
-Recording, slides, textbook #1, textbook #2
We develop a quantitative analogue of equational reasoning which we call quantitative algebra. We define an equality relation indexed by rationals, $a =_\varepsilon b$, which we think of as saying that “a is approximately equal to b up to an error of ε”. We have four interesting examples where we have a quantitative equational theory whose free algebras correspond to well-known structures. The most interesting example comes from the case of the Kantorovich metric. I will present this as a tutorial on equational reasoning.
Prakash Panangaden
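To give a flavour of the formalism, a few of the basic deduction rules of quantitative equational logic can be sketched as follows (recalled from the quantitative algebra literature; see the linked textbooks for the precise system):

$$
t =_0 t, \qquad
\frac{t =_\varepsilon s}{s =_\varepsilon t}, \qquad
\frac{t =_\varepsilon s \quad s =_{\varepsilon'} u}{t =_{\varepsilon + \varepsilon'} u}, \qquad
\frac{t =_\varepsilon s \quad \varepsilon \le \varepsilon'}{t =_{\varepsilon'} s}
$$

read, respectively, as reflexivity, symmetry, a triangle-inequality rule, and weakening of the error bound.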
Mar. 1 - Mar. 5 Study Break / Période d’activités libres N/A
Mar. 12, 11:00 am Drasil: Generate all the Things!
-Recording, slides
Generative techniques are useful for more than just code generation: many of the artifacts that make up ‘software’ can be generated too. The question then becomes: which ones, what is an adequate ‘source’ language for describing them, and when is such an approach effective? This talk will meander through these questions, giving answers that have emerged from our work on the Drasil system. While far from a silver bullet, an interesting subset of software does emerge that can be effectively attacked in this way.
Jacques Carette
Mar. 19, 9:30 am Automatic Reparameterisation of Probabilistic Programs
-Recording, slides, Stan tutorial
Markov chain Monte Carlo (MCMC) algorithms can be used to approximate a probability distribution by repeatedly drawing samples from it. Some MCMC strategies, such as Hamiltonian Monte Carlo (HMC), use the gradient of the unnormalised density function to increase the quality of the obtained samples and the speed of convergence of the algorithm. But such algorithms can fail, often silently, if the curvature of this target density function varies. One way to work around this problem is to reparameterise the distribution of interest, meaning to express it in terms of different parameters.
In this talk, I will describe the practical challenges of finding a suitable reparameterisation, and demonstrate how we can use mechanisms available in recent probabilistic programming languages to implement a family of parameterisations often used in practice. In particular, we will look at a continuous relaxation of the question of what parameterisation to use and combine it with variational inference to obtain robust and efficient MCMC samplers.
Maria Gorinova
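As a concrete, purely illustrative sketch of the kind of reparameterisation discussed in this talk (plain NumPy, not the talk's code), compare the centered and non-centered parameterisations of a hierarchical Gaussian:

```python
# A minimal sketch of the classic "non-centered" reparameterisation of a
# hierarchical Gaussian model.
import numpy as np

rng = np.random.default_rng(0)

def sample_centered(mu, log_sigma):
    # Centered: sample z directly from N(mu, exp(log_sigma)). Samplers that
    # explore (mu, log_sigma, z) jointly struggle when the scale varies.
    return rng.normal(mu, np.exp(log_sigma))

def sample_noncentered(mu, log_sigma):
    # Non-centered: sample a standard normal and rescale. The auxiliary eps
    # is decoupled from (mu, log_sigma), which often removes the pathological
    # curvature that HMC would otherwise encounter.
    eps = rng.normal(0.0, 1.0)
    return mu + np.exp(log_sigma) * eps

print(sample_centered(0.0, -2.0), sample_noncentered(0.0, -2.0))
```

Both parameterisations describe the same distribution; only the geometry presented to the sampler changes.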
Mar. 26, 12:00 pm Inside Every Calculus Is A Little Algebra Waiting To Get Out
-Recording, textbook #1, textbook #2
Because of deep learning, there has been a surge in interest in automatic differentiation, especially from the functional programming community. As a result, there are many recent papers that look at AD from a category-theoretic perspective. However, category theorists have been studying differentiation, and calculus in general, since the late 1960s in the context of Synthetic Differential Geometry (SDG), but it seems that this work is largely ignored by those interested in AD. In this talk, we will provide a gentle introduction to the ideas of SDG, by relating them to dual numbers, and show how it provides a simple and purely algebraic approach to (automatic) differentiation.
Erik Meijer
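The algebraic kernel of the dual-number view mentioned above fits in one line: adjoin a nilpotent infinitesimal $\varepsilon$ with $\varepsilon^2 = 0$, so that

$$ f(x + \varepsilon) = f(x) + f'(x)\,\varepsilon, $$

and the derivative can be read off as the coefficient of $\varepsilon$.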
Apr. 1, 9:30 am Tensor Networks, Probabilistic Modeling, and Formal Grammar
-Recording, slides, FGs, Codes on graphs (thanks to Tristan)
Tensor networks (TNs) are a general formalism for efficiently modeling complex higher-order tensors, whose broad applications within quantum many-body physics have led them to be described as the natural framework for modeling quantum states of matter. Spurred on by their independent rediscovery in applied mathematics, TNs have increasingly been used for machine learning, where they have unlocked a growing number of novel theoretical and practical results. In this talk, we discuss one such result, concerning the use of TN models for modeling probabilistic sequence data. We show how a simple recurrent TN model enables a novel algorithm for conditional sampling from collections of strings defined by formal grammars, a significant generalization of the autoregressive sampling of typical language models. By developing this unexpected link between quantum physics and language, we expect our results to contribute to the search for mathematically principled methods for unsupervised grammatical inference in NLP.
Jacob Miller
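For readers new to tensor networks, here is a small illustrative sketch (a toy in NumPy, not the model from the talk) of scoring a fixed-length sequence with a matrix-product-state-style chain of cores:

```python
# An illustrative tensor-network (matrix-product-state-style) scorer for
# fixed-length sequences.
import numpy as np

rng = np.random.default_rng(0)
alphabet, bond = 4, 3          # symbols per position, bond dimension

# One order-3 core per position: (left bond, symbol, right bond), with
# trivial boundary bonds of size 1 at either end of the chain.
cores = [rng.random((1, alphabet, bond)),
         rng.random((bond, alphabet, bond)),
         rng.random((bond, alphabet, 1))]

def score(seq):
    """Contract the chain along the chosen symbols, yielding a scalar."""
    v = np.ones(1)
    for core, s in zip(cores, seq):
        v = v @ core[:, s, :]          # absorb one core at a time
    return float(v[0])

# Unnormalised scores; normalising over all length-3 sequences gives a
# distribution, which the talk extends to grammar-constrained sampling.
print(score([0, 1, 2]))
```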
Apr. 9, 10:00 am Computing derivatives of matrix and tensor expressions
-Recording, slides
Expressions involving matrices and tensors are often encountered in machine learning, and it is often necessary to obtain their first- and second-order derivatives. In my talk, I will present two algorithmic approaches for computing such matrix and tensor derivatives. Both approaches can be used for symbolic as well as for forward- and reverse-mode algorithmic differentiation. Experiments show a speedup between two and three orders of magnitude over state-of-the-art frameworks like TensorFlow or PyTorch when evaluating higher-order derivatives.
An online interface to our approach can be found at http://www.MatrixCalculus.org.
Sören Laue
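As a tiny worked example of the kind of derivative in question (not taken from the talk): for $f(x) = x^\top A x$ with $x \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n \times n}$,

$$ \nabla f(x) = (A + A^\top)\,x, \qquad \nabla^2 f(x) = A + A^\top, $$

identities which systems like the one presented here derive automatically for much larger expressions.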
Apr. 16, 3:00 pm Harnessing second order optimizers from deep learning frameworks
-Recording, slides, GitHub, PyPI, Read the Docs
Have you ever wanted to use a second-order optimizer on code written in TensorFlow or PyTorch? What about optimizing a dictionary of tensors using SciPy minimize? If so, you probably needed a lot of troublesome glue code. Look no further than the dict-minimize package, which takes care of everything and lets users easily optimize objectives implemented in TensorFlow, PyTorch, or JAX.
Ryan Turner
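To see the sort of glue code the package is meant to eliminate, here is an illustrative sketch (plain NumPy and SciPy, not the dict-minimize API) of manually flattening a dictionary of parameters so that scipy.optimize.minimize can drive them:

```python
# Manually marshalling a dict of parameters into the flat vector that
# scipy.optimize.minimize expects, and back again.
import numpy as np
from scipy.optimize import minimize

params = {"w": np.zeros(3), "b": np.zeros(1)}
shapes = {k: v.shape for k, v in params.items()}

def flatten(d):
    return np.concatenate([d[k].ravel() for k in sorted(d)])

def unflatten(x):
    out, i = {}, 0
    for k in sorted(shapes):
        n = int(np.prod(shapes[k]))
        out[k] = x[i:i + n].reshape(shapes[k])
        i += n
    return out

def objective(x):
    d = unflatten(x)
    return np.sum((d["w"] - 1.0) ** 2) + np.sum((d["b"] + 2.0) ** 2)

res = minimize(objective, flatten(params), method="L-BFGS-B")
print(unflatten(res.x))   # close to {"b": [-2.], "w": [1., 1., 1.]}
```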
Apr. 22, 1:30 pm Factor Graph Grammars for Probabilistic Modeling
-Recording, slides, fggs / HRG survey, Graph & AMR parsing, MT
Probabilistic graphical models (PGMs) are a powerful method for specifying probability distributions. However, they are unable to represent some of the most popular models used in NLP, such as HMMs and PCFGs (since PGMs assume a fixed number of random variables, while HMMs and PCFGs can generate sentences of arbitrary length). In this talk, I will present factor graph grammars (FGGs), an extension of PGMs that is able to capture these models and many others. An FGG is a type of grammar, called a hyperedge replacement graph grammar, which generates a (possibly infinite) set of factor graphs. Conveniently, inference can be done over the FGG without enumerating all the factor graphs in the set. For finite variable domains (but possibly infinite sets of graphs), a generalization of variable elimination to FGGs allows exact and tractable inference in many situations. This talk will introduce the FGG formalism and present that inference algorithm.
Darcey Riley
Apr. 30, 1:00 pm Gradual tensor typing for practical machine learning in Haskell with Hasktorch
-Recording, Haskell [Mask], Twitch / Lambda Montreal, Chris Olah’s blog
Hasktorch is a library for neural networks and tensor math in Haskell. It leverages the C++ backend of PyTorch for fast numerical computation with GPU support. Hasktorch’s goal is to provide a platform for machine learning using typed functional programming. Gradually typed Hasktorch is a new API for tensors and neural networks that interpolates between the existing unchecked (that is, untyped) and checked (typed) Hasktorch APIs. Until now, users have had to commit fully to either typed or untyped tensors and models. The new gradually typed API relaxes this black-and-white tradeoff and makes the decision more granular. In gradually typed Hasktorch, users can choose whether or not they want type checking for every individual type variable, such as a tensor’s compute device (e.g. the CPU or a GPU), its precision (Bool, Int64, Float, etc.), or its shape (the names and sizes of its dimensions). Users can thus enjoy the flexibility of an unchecked API for rapid prototyping, and add as much type checking as they want later on. Alternatively, they can start with fully checked tensor and model types and relax them when and where they get in the way. In this way, gradually typed Hasktorch combines the best of both worlds, checked and unchecked. The talk will use simple examples to demonstrate the benefits of gradual typing for tensor math and machine learning.
Torsten Scholak
Apr. 26, 2022* Quiver Geometry
-Recording (*Presented one year later, for the ML4Code RG.)
Continuous geometry serves as a substrate for a deeper understanding of many topics in science and engineering. Understanding how families of objects form an abstract space can allow us to exploit familiar intuitions as well as powerful mathematical tools. An incomplete list of such abstract spaces includes the phase spaces of physical systems, the parameter spaces of learned models, the state spaces of systems under active control, spaces of continuous distributions in information geometry, and configuration spaces of mechanical robots. Continuous geometry is also the oldest playground of rational thought and argument, dating back to Euclid. Over the last two hundred years, sophisticated mathematical frameworks have been developed to organize in a unified way the geometrical phenomena that occur in multiple different contexts. However, the same maturity of effort does not exist for discrete spaces, for which the underlying mathematical descriptions, the graph and the hypergraph, are relatively recent inventions. This missing body of theory hobbles efforts to develop a deeper understanding of discrete spaces where they occur, and to bring standardized tools to bear on them. I’ll try to present some of my efforts in this direction, using the very flexible formalism of rewriting systems to show how causality and geometry relate, and how seemingly continuous notions like curvature can have a completely discrete origin and interpretation. I’ll also outline where I think machine learning, program synthesis, and other disciplines might benefit from these kinds of tools as they mature.
Tali Benyon

If you would like to sign up to present or wish to receive updates from the mailing list, please reach out to the organizer via the following address: breandan dot considine at mail dot mcgill dot ca.

Reading Materials

A few recent papers on differential, integral and logical calculi are listed below for illustrative purposes, following the general theme of topics we plan to discuss. Other suggested reading materials are also welcome!

Differential

Differentiable programming concerns the calculus of infinitesimal change and change propagation through a computation graph. This subject is the original and primary focus of this reading group.
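As a minimal sketch of what “change propagation through a computation graph” means in code (a toy in the spirit of the picograd example linked in the schedule, not an excerpt from it):

```python
# A toy reverse-mode autodiff sketch: each Var records how its value was
# computed, and backward() propagates sensitivities along the graph edges.
class Var:
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self, seed=1.0):
        self.grad += seed
        for parent, local in self.parents:   # chain rule along each edge
            parent.backward(seed * local)

x, y = Var(2.0), Var(3.0)
z = x * y + x                                # z = xy + x
z.backward()
print(x.grad, y.grad)                        # 4.0 2.0
```

Real systems topologically sort the graph instead of recursing naively, but the chain-rule bookkeeping is the same idea.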

Integral

We will also discuss some topics in automatic integration, or probabilistic programming, i.e. the study of volume and density and how to compute their exact or approximate values.
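For a flavour of the approximate side of this topic, here is a minimal, purely illustrative Monte Carlo sketch that estimates the mass of a density over an interval:

```python
# Approximate integration by Monte Carlo: average random evaluations of the
# integrand and rescale by the length of the interval.
import numpy as np

rng = np.random.default_rng(0)

def mc_integrate(f, a, b, n=100_000):
    """Estimate the integral of f over [a, b] by uniform sampling."""
    xs = rng.uniform(a, b, size=n)
    return (b - a) * np.mean(f(xs))

# Example: the standard normal density has mass ~0.9545 on [-2, 2].
density = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
print(mc_integrate(density, -2.0, 2.0))
```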

Presentations

Books

Logical

Finally, we plan to discuss some related topics in other logical calculi, including combinatory logic, sequent calculus and other formal systems. This material will be primarily introductory and no prior experience is expected or required.
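As a small taste of combinatory logic, here is an illustrative sketch (not drawn from the readings) that normalises S/K/I terms by leftmost-outermost reduction:

```python
# A tiny combinatory-logic normaliser: atoms are the strings "S", "K", "I"
# (or free variables like "a"), and application is a pair (f, x).
def reduce_once(t):
    """Perform one leftmost-outermost reduction step; None if t is normal."""
    if not isinstance(t, tuple):
        return None
    f, x = t
    if f == "I":                                   # I x -> x
        return x
    if isinstance(f, tuple) and f[0] == "K":       # K x y -> x
        return f[1]
    if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == "S":
        g, h = f[0][1], f[1]                       # S g h x -> g x (h x)
        return ((g, x), (h, x))
    r = reduce_once(f)
    if r is not None:
        return (r, x)
    r = reduce_once(x)
    return None if r is None else (f, r)

def normalize(t):
    while (r := reduce_once(t)) is not None:
        t = r
    return t

# S K K behaves like the identity: S K K a -> K a (K a) -> a.
print(normalize(((("S", "K"), "K"), "a")))         # 'a'
```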

Books