In the heart of modern neural networks capable of understanding the sentence semantics and generating thousands of words per second, Transformers, as we call them, lie 2 core mathematical operations that often go unnoticed, namely the Softmax and Neg...