May 27, 2020

Illustrated Guide to Transformer

Illustrated breakdown of the Transformer architecture: self-attention, multi-head attention, positional encoding, and why it outperforms RNNs for NLP tasks.

Tags: machine learning, research