Stack the mention scorer

The reference implementations usually have a function that builds an FFNN of arbitrary depth, consisting of a stack of Linear >> Relu >> Dropout blocks. In practice the depth is always 1 in coref-hoi, but in earlier iterations of the model, which are more similar to our model here (since we aren't using attention or even necessarily BERT), a small depth like 2 was common. This commit hard-codes a stack of 2. In brief tests this allows similar performance to the unstacked version with much smaller embedding sizes. The depth of the stack could be made into a hyperparameter.
2025-07-18 20:22:25 +03:00 · 2021-08-09 18:04:42 +09:00 · 00d481dd12
commit 00d481dd12
parent 56803d3909
1 changed file with 3 additions and 0 deletions
--- a/spacy/ml/models/coref.py
+++ b/spacy/ml/models/coref.py
@@ -36,6 +36,9 @@ def build_coref(
            Linear(nI=dim, nO=hidden)
            >> Relu(nI=hidden, nO=hidden)
            >> Dropout(dropout)
+            >> Linear(nI=hidden, nO=hidden)
+            >> Relu(nI=hidden, nO=hidden)
+            >> Dropout(dropout)
            >> Linear(nI=hidden, nO=1)
        )
        mention_scorer.initialize()
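
The commit message notes that the stack depth could become a hyperparameter. A minimal sketch of what that might look like follows. The names `mention_scorer_specs`, `build_mention_scorer` and the `depth` argument are hypothetical and not part of this commit; the thinc layers are the same ones used in the diff above. The layer plan is computed separately so the size bookkeeping (only the first Linear takes `dim` as input) can be checked without constructing a model.

```python
from typing import List


def mention_scorer_specs(dim: int, hidden: int, dropout: float, depth: int = 2) -> List[tuple]:
    """Return the layer plan for a mention scorer with `depth` hidden blocks.

    Each block is Linear >> Relu >> Dropout; a final Linear maps to one score.
    Hypothetical helper: this commit hard-codes the depth=2 case instead.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    specs: List[tuple] = []
    n_in = dim  # only the first block sees the input width
    for _ in range(depth):
        specs.append(("Linear", n_in, hidden))
        specs.append(("Relu", hidden, hidden))
        specs.append(("Dropout", dropout))
        n_in = hidden  # later blocks are hidden -> hidden
    specs.append(("Linear", hidden, 1))  # output layer: one score per mention
    return specs


def build_mention_scorer(dim: int, hidden: int, dropout: float, depth: int = 2):
    """Build the thinc model described by mention_scorer_specs (hypothetical)."""
    # Imported lazily so the layer plan above can be inspected without thinc.
    from thinc.api import Dropout, Linear, Relu, chain

    layers = []
    for spec in mention_scorer_specs(dim, hidden, dropout, depth):
        kind = spec[0]
        if kind == "Dropout":
            layers.append(Dropout(spec[1]))
        elif kind == "Linear":
            layers.append(Linear(nI=spec[1], nO=spec[2]))
        else:
            layers.append(Relu(nI=spec[1], nO=spec[2]))
    # chain composes the layers left to right, like the >> operator in the diff.
    return chain(*layers)
```

With `depth=2` the plan matches the seven layers in the patched `build_coref`; with `depth=1` it reduces to the previous, unstacked scorer.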