Guest talk on BigBird, a sparse attention mechanism
Posted by Natarajan Vaidyanathan on February 24, 2021
We had an amazing talk by Dr. Guru Guruganesh from the Google Research team on their recent NeurIPS paper, "Big Bird: Transformers for Longer Sequences".
It was a great opportunity to learn about the novel optimization strategies and techniques they are exploring to train and scale up transformer models, making them bigger, better, and more efficient.