The Discipline of Compassion & De Finetti's Theorem

by Karthik Dinakar.

Resplendent and nourishing clouds of compassion gather around the work of stellar figures in an ancient lineage of wisdom spanning almost four thousand years. Sanskrit, a language of studying the mind, mathematics, a language of science, and music, a language of the human spirit, congregate at a threefold focal point that embodies profound meaning and indivisible bliss. A focal point above which stands the choicest of all virtues with bolts of brilliant lightning that signify a shattering of illusion, now receiving the quenching downpour of compassion. A sanguine replenishment of energies, soothing and moving, in stark contrast to the drought of the mindless decadence of ego.

A discriminative examination of compassion reveals it to be free of discrimination itself; free of selective erudition or preferential umbilical cords attached to self-indulgence. It is quite easy to be compassionate in the service of ego, easier still when it is in the service of self-delusion, but rather arduous to be compassionate with uniformity and without bias. In a world that often seems to operate in an acerbic environment starved of trust, where cunning machinations rule the roost in the direction of deceiving others, it could be argued that unblinkered compassion towards recalcitrant strains of tyranny and destructive evil is tantamount to their tacit approval. How, then, can we reconcile two seemingly paradoxical ideas: that true compassion is independent and identically distributed, free of ego and self-delusion, and that compassion towards grotesquely revolting malice is dangerous complacency at best?

A way of thinking about this conundrum is to look at the history of statistics, particularly the evolution of probabilistic decision theory. The structural similarities between ideas in probabilistic decision theory and four thousand years of rich studies of the human mind in Vedic and Buddhist philosophy form a mystical intersection of complementary energies that has not received attention beyond the oracles of early relativity and quantum physics. The 18th-century Reverend Thomas Bayes and Pierre-Simon Laplace developed Bayes' Theorem at a time when there was no distinction between descriptive and inferential probability. Let x \in X be an observable item in a sample space X, and let \theta \in \Theta be a parameter in a parameter space \Theta, with p(\theta) the prior distribution the Bayesian places on it. Then Bayes' Theorem says:

p(\theta|x) = \frac{p(x|\theta)p(\theta)}{p(x)} \propto p(x|\theta)p(\theta), where

  • p(\theta|x) is the posterior,
  • p(x|\theta) is the likelihood,
  • p(\theta) is the prior.
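The proportionality in Bayes' Theorem can be made concrete with a small sketch. The coin-flip setting, the uniform prior, the grid over \theta and the function name `grid_posterior` are all assumptions of mine for illustration; nothing here is specific to the history discussed in this essay.

```python
from math import comb

def grid_posterior(heads, flips, grid_size=101):
    """Posterior over a coin's bias theta on an evenly spaced grid.

    Assumes a uniform prior over the grid points (an illustrative choice).
    """
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0 / grid_size] * grid_size                      # p(theta)
    likelihood = [
        comb(flips, heads) * t**heads * (1 - t) ** (flips - heads)
        for t in thetas
    ]                                                          # p(x | theta)
    unnormalized = [l * p for l, p in zip(likelihood, prior)]  # p(x | theta) p(theta)
    evidence = sum(unnormalized)                               # p(x)
    posterior = [u / evidence for u in unnormalized]           # p(theta | x)
    return thetas, posterior

thetas, posterior = grid_posterior(heads=7, flips=10)
# The grid point with the highest posterior mass.
map_theta = thetas[max(range(len(posterior)), key=posterior.__getitem__)]
print(map_theta)
```

Dividing by the evidence p(x) only rescales the product of likelihood and prior so that it sums to one, which is why the posterior is said to be proportional to that product.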

The prior p(\theta) at this time was often viewed with a great deal of suspicion, as it entailed prior knowledge of the distribution at hand, heavily influencing inference. Both Bayes and Laplace distrusted the prior, questioning the rational basis for selecting one prior over another, and in retrospect they are often counted among the 'objective Bayesians'. Frequentists like Fisher, Neyman and Wald, who were in favor of repeating a procedure many times and seeing how the inference changes with each iteration, detested the prior and circumvented it by avoiding the Bayesian route completely. It was not until the late 1940s that 'subjective Bayesians' emerged who embraced the prior, viewing its subjectivity as its strength and deriving it in practice in consultation with a domain expert. The pendulum in contemporary mainstream Bayesian statistics has now swung back toward objective Bayes, where the prior is often derived from frequentist analytical tools.

Bruno de Finetti

But the work of Bruno de Finetti, the Italian statistician who promoted the emphasis on the subjective prior, is my inspiration, for his work on the consultation of domain experts is palpably underestimated. His theorem of infinite exchangeability of random variables is the precursor to families of latent variable models and the door to human-in-the-loop Bayesian machine learning. It is also a forceful clarion call for inspecting the biases laden in data, in human beings and indeed in the inference routines for Bayesian estimation. To understand how de Finetti's theorem can help us with the conundrum of disciplined compassion, let us first look at this theorem:

A sequence of random variables (x_{1}, x_{2},...) is an infinitely exchangeable sequence of random variables if for any n, the joint distribution p(x_{1}, x_{2},..,x_{n}) is invariant to permutations of the indices, that is, for any permutation \pi of the indices 1,...,n,

p(x_{1}, x_{2},..,x_{n}) = p(x_{\pi_{1}}, x_{\pi_{2}},...,x_{\pi_{n}})
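The permutation invariance can be checked by hand on a small example. The sketch below uses a Pólya urn, which is my own choice of illustration and not part of de Finetti's statement: the urn starts with one red and one black ball, and each drawn ball is returned together with one extra ball of the same colour. The function name `polya_joint` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def polya_joint(seq, red=1, black=1):
    """Exact probability of a draw sequence from a Polya urn (1 = red, 0 = black).

    After each draw the ball is returned with one extra ball of the same
    colour, so later draws depend on earlier ones.
    """
    prob = Fraction(1)
    for x in seq:
        total = red + black
        if x == 1:
            prob *= Fraction(red, total)
            red += 1
        else:
            prob *= Fraction(black, total)
            black += 1
    return prob

# Exchangeability: every reordering of (1, 1, 0) has the same joint probability.
orderings = set(permutations((1, 1, 0)))
joint_probs = {polya_joint(order) for order in orderings}
print(joint_probs)  # a set containing a single value

# Dependence: p(x1 = 1, x2 = 1) differs from p(x1 = 1) * p(x2 = 1).
p_first = polya_joint((1,))
p_second = polya_joint((1, 1)) + polya_joint((0, 1))
p_both = polya_joint((1, 1))
print(p_both, p_first * p_second)  # prints 1/3 1/4
```

The joint probability depends only on how many red balls were drawn, not on the order in which they appeared, yet the draws are clearly not independent of one another.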

The key lesson here is that while independent and identically distributed random variables are always infinitely exchangeable, the converse is not necessarily true. A sequence of infinitely exchangeable variables is not necessarily iid. Here lies the strength of de Finetti's theorem, which states that:

p(x_{1}, x_{2},..,x_{n}) = \int \prod_{i=1}^{n} p(x_{i} | \theta) P(d\theta)

An infinitely exchangeable sequence behaves as if a random parameter \theta were drawn from a distribution P and, conditioned on this random parameter, the random variables in question were independent and identically distributed. More often than not, the random parameter \theta represents a hidden or latent parameter in graphical models. For example, in a collection of documents, each document is a distribution over some hidden topics (with the topic proportions coming from a Dirichlet distribution). So topics are exchangeable and shared across every document, and while no document has zero presence of a topic, not every document gives a topic the same probability mass as the document before it.
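The integral representation can be checked numerically in its simplest case. The sketch below assumes binary variables with p(x_{i} = 1 | \theta) = \theta and a uniform mixing distribution on [0, 1]; both are illustrative assumptions of mine, as are the function names. In this case the integral has the closed form k!(n-k)!/(n+1)!, where k is the number of ones among the n variables.

```python
from math import factorial

def mixture_joint(seq, steps=10_000):
    """Midpoint-rule approximation of the integral over [0, 1] of
    prod_i p(x_i | theta), with p(x_i = 1 | theta) = theta and a
    uniform mixing distribution (an illustrative assumption)."""
    k, n = sum(seq), len(seq)
    total = 0.0
    for j in range(steps):
        theta = (j + 0.5) / steps                     # midpoint of the j-th cell
        total += theta**k * (1 - theta) ** (n - k)    # iid likelihood given theta
    return total / steps

def closed_form(seq):
    """Exact value of the same integral: k! (n - k)! / (n + 1)!."""
    k, n = sum(seq), len(seq)
    return factorial(k) * factorial(n - k) / factorial(n + 1)

# The joint depends on the sequence only through the count of ones,
# so reordering the sequence leaves it unchanged (exchangeability).
a = mixture_joint((1, 1, 0))
b = mixture_joint((0, 1, 1))
print(a, b, closed_form((1, 1, 0)))

# Marginally the variables are not independent:
# p(x1 = 1, x2 = 1) is about 1/3, while p(x1 = 1) * p(x2 = 1) is about 1/4.
print(mixture_joint((1, 1)), mixture_joint((1,)) ** 2)
```

Conditioned on \theta the variables are iid by construction; the dependence appears only once \theta is integrated out, which is the sense in which \theta acts as a latent parameter.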

De Finetti's theorem of infinite exchangeability offers one of the most direct insights into the conundrum of disciplined compassion. First, compassion is a joint distribution of a sequence of many variables, and these are infinitely exchangeable. Second, it is important to recognize that while the mind must not hold the elements of compassion hostage to ego and self-indulgence, it does not mean that those very elements are to be dispensed without deliberation to all scenarios without any conditions. The conditions here are an act of discernment: helping acute suffering with deep compassion is not the same as compassion towards a tyrant.

The elements of compassion as a joint distribution are an instrument for recognizing pain and suffering and staring them in the face. One becomes mindful of not getting sucked into the pain and suffering of the self and others on realizing that the instrument of compassion in the self is unbound to ego and self-delusion. And in this moment of some clarity is a realization that when compassion is not tied to ego or self-delusion, it uses discernment in how it is dispensed towards others. Sometimes true compassion towards an obstinate child throwing incorrigible tantrums is to let the child fall and learn a lesson. Sometimes, true compassion towards tyranny means just and peaceful ways of finding change.

If Sanskrit is a language of studying the mind, and mathematics is a language of science, and music a language of the human spirit, then the ancient and glorious tradition of Carnātic music is a focal point amalgamation of all three, a flintstone that ignites the subtle consciousness and mentors the soul, like the rāgam kīravāni falling as compassionate rain on a lush garden filled with opulent green.




[1] Shankara, 780CE. Shankara's Crest Jewel of Discrimination. 2nd Edition. Vedanta Press & Bookshop

[2] Dalai Lama, XIV; Geshe Thupten Jinpa (trans & ed) (2004), Practicing Wisdom: The Perfection of Shantideva's Bodhisattva Way, Wisdom Publications

[3] B. de Finetti. Theory of probability. Vol. 1-2. John Wiley & Sons Ltd., Chichester, 1990. Reprint of the 1975 translation

[4] Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet allocation." Journal of Machine Learning Research 3 (2003): 993-1022.