The challenge itself was an agent-based model with rewards, i.e., it was closely connected to partially observable Markov decision processes and game theory. In my solution (which earned me an honorable mention), I took neither approach but simply played around with strategies I chose ad hoc. Despite this, I was amazed by the rich diversity of behavior I observed. It took me some 40 work hours, but it was totally worth the effort. Thanks heaps to the organizers of the challenge, and congrats to the winners -- I hope I get the chance to participate again some time!

# Entropy of Random Talks

A collection of mathematical curiosities, chosen purely based on my own interest. Be prepared for a little information theory, combinatorics, and probability! (This blog is trackable: 5ZVNC2)

## Thursday, August 16, 2018

### Santa Fe Institute Spring 2018 Complexity Challenge

At the end of my Schroedinger fellowship I decided to treat myself by participating in the Santa Fe Institute Spring 2018 Complexity Challenge. I've been a fan of the SFI for as long as I've known what it does, and I've seen some of the most impressive papers published by authors affiliated with it.

## Monday, July 9, 2018

### Data Science 101: Radar Charts

I actually wanted to write down what I don't like about radar charts, but then I found out that this has already been done. So I won't. But, adding to "Misreading 1: Area" in Ghosts on the Radar, I came up with the following puzzle:

Suppose you have $n$ real-valued, non-negative measurements $x_1$ through $x_n$. You are a super-smart data scientist and you want to use a radar chart to trick your customer into believing that the measurement result is "good" or "bad" -- i.e., the area of the polygon obtained by connecting the points corresponding to these measurements should be large/small. Since the angle between adjacent axes is fixed to $360/n$ degrees, you can influence the polygonal area only by selecting the order in which you assign the measurements to the axes. What is the optimal assignment of measurement values to axes such that the area of the polygon is maximized/minimized?

Edit on July 12th, 2018: Here is the solution to the problem of maximizing the area.
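A quick way to build intuition is a brute-force search. The polygon's area is $\frac{1}{2}\sin(2\pi/n)\sum_i x_i x_{i+1}$ (indices cyclic), so only the cyclic sum of adjacent products matters. A minimal sketch (the helper names `radar_area` and `best_arrangement` are mine, not from the solution post); for the maximum it finds a unimodal "up then down" order of the sorted values:

```python
import itertools
import math

def radar_area(values):
    # area of the radar polygon when values sit on n equally spaced axes:
    # 0.5 * sin(2*pi/n) * (cyclic sum of adjacent products)
    n = len(values)
    return 0.5 * math.sin(2 * math.pi / n) * sum(
        values[i] * values[(i + 1) % n] for i in range(n))

def best_arrangement(values, maximize=True):
    # brute force over circular orders; fixing the first element
    # removes rotations of the same arrangement
    first, rest = values[0], tuple(values[1:])
    cands = ((first,) + p for p in itertools.permutations(rest))
    return (max if maximize else min)(cands, key=radar_area)

print(best_arrangement([1, 2, 3, 4, 5, 6]))         # a unimodal order
print(best_arrangement([1, 2, 3, 4, 5, 6], False))  # large/small values alternate
```

For small $n$ this exhaustive search is cheap ($(n-1)!$ candidates) and lets you check any conjectured ordering rule directly.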

## Tuesday, April 10, 2018

### Data Science 101: Affine Transforms Prior to Linear Regression

At work, we were recently discussing whether it is OK to affinely transform features prior to least-squares linear regression. That is: do the common practices of min-max scaling, or unit-variance scaling and centering, yield regression coefficients that are equivalent in the sense that there exists an affine transform mapping them to the coefficients obtained without transforming the features?

The short answer: any affine transform is OK if you fit the intercept term in linear regression, and any linear transform is OK even if you don't. However, if you fit a linear regression model **without** intercept to an affine transform of your features, then the coefficients you get are not equivalent to those you get when fitting the model to the original features.

Indeed, suppose that $y$ is the target vector and $X$ the matrix of features. The optimal coefficients for linear regression without intercept are given by $\beta_o = X^+y$, where $X^+$ is the pseudo-inverse of $X$. Now let $W$ be an invertible linear transform (e.g., scaling each feature to unit variance). Then the optimal coefficients for the scaled features are $\gamma_o=(XW)^+y=W^{-1}X^+y=W^{-1}\beta_o$; the identity $(XW)^+=W^{-1}X^+$ holds here because $X$ has full column rank and $W$ is invertible (in general, the pseudo-inverse of a product does not simply factor into the reversed product of pseudo-inverses). Thus, $\gamma_o$ and $\beta_o$ are equivalent in the above sense. The situation changes, of course, if we transform $X$ affinely, which amounts to a rank-1 update of $XW$. Unfortunately, the relation between the pseudo-inverse of $XW$ and its rank-1 update is not obvious (though it may be obtainable via the Woodbury matrix identity?). In any case, as the experiments below show, the coefficients are not equivalent in the above sense.
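The claim $\gamma_o = W^{-1}\beta_o$ for an invertible linear transform is easy to check numerically. A minimal sketch with NumPy (random data and a diagonal $W$, as in per-feature scaling; the specific numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # random features; full column rank almost surely
y = rng.normal(size=100)
W = np.diag([2.0, 0.5, 10.0])   # invertible linear transform (per-feature scaling)

beta = np.linalg.pinv(X) @ y        # coefficients for the original features
gamma = np.linalg.pinv(X @ W) @ y   # coefficients for the linearly transformed features

# the two solutions are related by gamma = W^{-1} beta
assert np.allclose(gamma, np.linalg.inv(W) @ beta)
```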

If you fit the intercept term, then you choose a coefficient vector $\beta_o$ and a scalar coefficient $\beta_{o,0}$ such that $y$ is well approximated by $X\beta_o + 1\beta_{o,0}$, where $1$ is a vector of ones. Transforming $X$ affinely generates a feature matrix $XW-1\lambda^T$, where $\lambda$ is the vector of values subtracted from each linearly transformed feature. For this feature matrix, we want to find coefficients $\gamma_o$ and $\gamma_{o,0}$ such that $y$ is well approximated by $XW\gamma_o -1\lambda^T\gamma_o + 1\gamma_{o,0}$. One can see that $\gamma_{o,0}$ provides enough degrees of freedom to choose $\gamma_o$ freely, which can thus be done by computing the pseudo-inverse of $XW$; indeed, we obtain $\gamma_o=W^{-1}\beta_o$. With $\gamma_o$ now fixed, the intercept term becomes $\gamma_{o,0}=\beta_{o,0}+\lambda^T\gamma_o$.
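The intercept relation can also be checked directly before running the notebook experiments. A minimal sketch, again on random data; `fit_with_intercept` is a hypothetical helper (not part of the original post) that appends a column of ones and solves least squares:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.5, -2.0]) + 0.3 + rng.normal(scale=0.1, size=200)

def fit_with_intercept(A, y):
    # append a column of ones and solve the least-squares problem
    A1 = np.column_stack([A, np.ones(len(A))])
    coef = np.linalg.lstsq(A1, y, rcond=None)[0]
    return coef[:-1], coef[-1]

W = np.diag([3.0, 0.25])        # invertible linear transform
lam = np.array([1.0, -2.0])     # subtracted after the linear transform

beta, beta0 = fit_with_intercept(X, y)
gamma, gamma0 = fit_with_intercept(X @ W - lam, y)

# gamma = W^{-1} beta and gamma_0 = beta_0 + lambda^T gamma
assert np.allclose(gamma, np.linalg.inv(W) @ beta)
assert np.isclose(gamma0, beta0 + lam @ gamma)
```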

In [1]:

```
## Setting up the model
import numpy as np
import matplotlib.pyplot as plt

# generate N feature samples (mean 500, std 10) and noise samples (mean 0, std 0.05)
N = 1000
mu_x, sigma_x = 500, 10
mu_n, sigma_n = 0, 0.05
x = np.random.normal(mu_x, sigma_x, N)
n = np.random.normal(mu_n, sigma_n, N)

# generate the linear+noise model y = a*x + n
a = 0.01
y = a*x + n
```

In [2]:

```
## Learning a linear regression model with intercept
from sklearn import linear_model

# coefficients of the linear regression model w/o scaling and centering
regr = linear_model.LinearRegression()
regr.fit(x.reshape(-1, 1), y)

# coefficients of the linear regression model with scaling and centering
std = 50
mu = 200
regr_centered = linear_model.LinearRegression()
x_centered = (x - mu)/std
regr_centered.fit(x_centered.reshape(-1, 1), y)

# plot both fits in the original feature coordinates; the two lines coincide
x_list = np.linspace(460, 540, 10)
plt.plot(x, y, 'c.')
plt.plot(x_list, x_list*regr.coef_ + regr.intercept_, 'b')
plt.plot(x_list, x_list*(1/std)*regr_centered.coef_ + regr_centered.intercept_ - regr_centered.coef_*mu/std, 'b-o')
plt.show()
```

In [3]:

```
## Learning a linear regression model without intercept
# coefficients of the linear regression model w/o scaling and centering
regr = linear_model.LinearRegression(fit_intercept=False)
regr.fit(x.reshape(-1, 1), y)

# coefficients of the linear regression model with scaling and centering
std = 50
mu = 200
regr_centered = linear_model.LinearRegression(fit_intercept=False)
x_centered = (x - mu)/std
regr_centered.fit(x_centered.reshape(-1, 1), y)

# plot both fits in the original feature coordinates; the two lines now differ
x_list = np.linspace(460, 540, 10)
plt.plot(x, y, 'c.')
plt.plot(x_list, x_list*regr.coef_ + regr.intercept_, 'r')
plt.plot(x_list, x_list*(1/std)*regr_centered.coef_ + regr_centered.intercept_ - regr_centered.coef_*mu/std, 'b-o')
plt.show()
```
