Hopfield Networks is All You Need

Main contributions 

We introduce a new energy function and a corresponding new update rule which is guaranteed to converge to a local minimum of the energy function.

The new energy function is a generalization (discrete states -> continuous states) of the modern Hopfield networks introduced by Krotov & Hopfield and Demircigil et al.

The new Hopfield network with continuous states keeps the characteristics of its discrete counterparts: exponential storage capacity and extremely fast convergence.

Surprisingly, the new update rule is the attention mechanism of transformer networks, see the "Attention Is All You Need" paper by Vaswani et al.

We use these new insights to analyze transformer models. We find that they have different operating modes and prefer to operate in higher-energy minima, which correspond to metastable states.

We therefore chose the title "Hopfield Networks is All You Need".

A new energy function and a new update rule

We introduce a new energy function using the log-sum-exp function (lse)

\(\displaystyle \text{E} = - \text{lse}\left( \beta, \boldsymbol{X}^T \xi \right) + \frac{1}{2} \xi^T \xi + \beta^{-1} \log N + \frac{1}{2} M^2 \ , \)

which is constructed from the \(N\) continuous patterns collected in the matrix \(\boldsymbol{X} = (\boldsymbol{x}_1, \ldots, \boldsymbol{x}_N)\), where \(M\) is the largest norm of all patterns.
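A minimal sketch of how this energy can be evaluated (assuming PyTorch, with the \(N\) patterns stored as columns of \(\boldsymbol{X}\); the function name is our own):

```python
import torch

def hopfield_energy(X: torch.Tensor, xi: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """Energy E of state xi given patterns X = (x_1, ..., x_N) stored column-wise, shape (d, N)."""
    N = X.shape[1]
    M = X.norm(dim=0).max()                                # largest norm of all patterns
    lse = torch.logsumexp(beta * X.T @ xi, dim=0) / beta   # lse(beta, X^T xi)
    return -lse + 0.5 * xi @ xi + torch.log(torch.tensor(float(N))) / beta + 0.5 * M ** 2
```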

The state \(\xi\) is updated by the following new update rule:

\(\xi^{\text{new}} = \boldsymbol{X} \text{softmax} (\beta \boldsymbol{X}^T \xi) \).
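Under the same assumptions as the sketch above, one update step is a softmax over pattern similarities followed by a weighted sum of the patterns; the energy never increases under this update (this is the convergence statement of Theorem 2 below):

```python
def hopfield_update(X: torch.Tensor, xi: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """One update step: xi_new = X softmax(beta * X^T xi)."""
    return X @ torch.softmax(beta * X.T @ xi, dim=0)

# Quick check with the energy function sketched above.
X, xi = torch.randn(32, 6), torch.randn(32)
print(hopfield_energy(X, hopfield_update(X, xi)) <= hopfield_energy(X, xi))  # tensor(True)
```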

We can now compare our new energy function to the discrete counterparts of Krotov & Hopfield and Demircigil et al., which are also composed of a sum of a function of the dot product of a pattern \(\boldsymbol{x}_i\) and a state \(\xi\):

\(\displaystyle \text{E} = - \sum_{i=1}^{N} F\left(\boldsymbol{x}_i^T \xi\right) \quad \text{and} \quad \text{E} = -\exp\left(\text{lse}\left(1, \boldsymbol{X}^T \xi\right)\right) \ .\)

The most important properties of our new energy function are:

  1. Global convergence to a local minimum (Theorem 2)
  2. Exponential storage capacity (Theorem 3)
  3. Convergence after one update step (Theorem 4)

Exponential storage capacity and convergence after one update are inherited from Demircigil et al.
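As a toy illustration of the one-step retrieval (our own quick check using the `hopfield_update` sketch above, not an experiment from the paper), well-separated random patterns are recovered from a noisy state after a single update:

```python
torch.manual_seed(0)
d, N, beta = 64, 10, 8.0
X = torch.randn(d, N)                               # stored patterns as columns
xi = X[:, 0] + 0.3 * torch.randn(d)                 # noisy version of stored pattern x_1
xi_new = hopfield_update(X, xi, beta)
print(torch.allclose(xi_new, X[:, 0], atol=1e-2))   # typically True: retrieved after one update
```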

If we now (i) generalize the new update rule to multiple updates at once (\(\xi\) is replaced by the query matrix \(\boldsymbol{Q}\)), (ii) denote \(\boldsymbol{X}\) by \(\boldsymbol{K}\), and (iii) multiply the result by \(\boldsymbol{W}_V\), setting \(\boldsymbol{V} = \boldsymbol{W}_V \boldsymbol{K}\), we arrive at the self-attention of transformer networks.
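A minimal sketch of this correspondence (our own toy check; the projection matrices and shapes are illustrative assumptions, written in the row-vector convention common in transformer code, where \(\boldsymbol{V} = \boldsymbol{K} \boldsymbol{W}_V\)):

```python
import torch

torch.manual_seed(0)
L, d_model, d_k = 5, 16, 8
R = torch.randn(L, d_model)        # raw state (query) patterns, one per row
Y = torch.randn(L, d_model)        # raw stored (key) patterns, one per row
W_q, W_k = torch.randn(d_model, d_k), torch.randn(d_model, d_k)
W_v = torch.randn(d_k, d_k)

Q, K = R @ W_q, Y @ W_k            # projected queries and keys
V = K @ W_v                        # values obtained from the keys, step (iii)
beta = 1.0 / d_k ** 0.5            # the softmax temperature used by self-attention

# Generalized Hopfield update for all queries at once, projected by W_v ...
hopfield_out = torch.softmax(beta * Q @ K.T, dim=-1) @ K @ W_v
# ... equals standard transformer self-attention with V = K W_v.
attention_out = torch.softmax(Q @ K.T / d_k ** 0.5, dim=-1) @ V
print(torch.allclose(hopfield_out, attention_out))   # True
```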

Versatile Hopfield layer (beyond self-attention)

The new insights allow us to introduce a new PyTorch Hopfield layer which can be used as a plug-in replacement for existing layers as well as for applications like multiple instance learning, set-based and permutation-invariant learning, associative learning, and many more.

Additional functionalities of the new Hopfield layer compared to the transformer self-attention layer are (see the minimal sketch after this list):

  1. Association of two sets
  2. Variable Beta that determines the kind of fixed points
  3. Multiple Updates for precise fixed points
  4. Dimension of the associative space for controlling the storage capacity
  5. Static Patterns for fixed pattern search
  6. Pattern Normalization to control the fixed point dynamics by norm and shift of the patterns
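A minimal sketch of such an association layer (our own illustration, not the actual implementation from the repository; class and parameter names are made up). It exposes the association of two sets, a tunable beta, multiple update steps, and the dimension of the associative space:

```python
import torch
import torch.nn as nn

class MiniHopfieldAssociation(nn.Module):
    """Minimal Hopfield association sketch: associates state (query) patterns R
    with stored patterns Y in a learned associative space."""

    def __init__(self, d_model: int, d_assoc: int, beta: float = 1.0, update_steps: int = 1):
        super().__init__()
        self.W_q = nn.Linear(d_model, d_assoc, bias=False)  # state patterns -> associative space
        self.W_k = nn.Linear(d_model, d_assoc, bias=False)  # stored patterns -> associative space
        self.beta = beta                                     # determines the kind of fixed points
        self.update_steps = update_steps                     # multiple updates for precise fixed points

    def forward(self, R: torch.Tensor, Y: torch.Tensor) -> torch.Tensor:
        # R: (batch, n_state, d_model), Y: (batch, n_stored, d_model)
        q, k = self.W_q(R), self.W_k(Y)
        for _ in range(self.update_steps):
            q = torch.softmax(self.beta * q @ k.transpose(-2, -1), dim=-1) @ k
        return q

# Associate 10 query patterns with 50 stored patterns in a 16-dimensional associative space.
layer = MiniHopfieldAssociation(d_model=32, d_assoc=16, beta=2.0, update_steps=3)
out = layer(torch.randn(4, 10, 32), torch.randn(4, 50, 32))  # shape: (4, 10, 16)
```

Here the dimension of the associative space (d_assoc in the sketch) controls the storage capacity; the full-featured layer is provided in the repository linked below.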

If you want to test all these new functionalities in transformer models, you can pass the Hopfield encoder layer and Hopfield decoder layer to the transformer encoder and transformer decoder modules.

For more information see Appendix C in our paper Hopfield Networks is All You Need and our GitHub repo https://github.com/ml-jku/hopfield-layers.