From 00d481dd12c1fc6ed5a9ef865f775159b76ac2c4 Mon Sep 17 00:00:00 2001
From: Paul O'Leary McCann <polm@dampfkraft.com>
Date: Mon, 9 Aug 2021 18:04:42 +0900
Subject: [PATCH] Stack the mention scorer

The reference implementations usually include a function that builds a
feed-forward network (FFNN) of arbitrary depth as a stack of
Linear >> Relu >> Dropout blocks. In coref-hoi the depth is always 1 in
practice, but earlier iterations of the model, which are closer to ours
(we aren't using attention, or even necessarily BERT), commonly used a
small depth such as 2. This change hard-codes a depth of 2.

In brief tests, the stacked scorer performs about as well as the
unstacked version while using much smaller embedding sizes.

The depth of the stack could be made into a hyperparameter.
---
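Note on making the depth a hyperparameter: the sketch below is not part
of this patch. It only computes the layer plan for a given depth, so it
runs without thinc installed. The function name `mention_scorer_spec`,
the `depth` argument, and the dict keys are hypothetical; mapping each
entry onto the Linear, Relu and Dropout layers used in coref.py is left
as an assumption.

```python
from typing import Dict, List, Tuple

# One entry per layer: (layer name, keyword arguments).
LayerSpec = Tuple[str, Dict[str, float]]


def mention_scorer_spec(
    dim: int, hidden: int, depth: int, dropout: float
) -> List[LayerSpec]:
    """Plan a stack of `depth` Linear >> Relu >> Dropout blocks
    followed by a final Linear layer that outputs one score."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    layers: List[LayerSpec] = []
    n_in = dim  # only the first block sees the input width
    for _ in range(depth):
        layers.append(("Linear", {"nI": n_in, "nO": hidden}))
        layers.append(("Relu", {"nI": hidden, "nO": hidden}))
        layers.append(("Dropout", {"rate": dropout}))
        n_in = hidden  # later blocks map hidden -> hidden
    layers.append(("Linear", {"nI": hidden, "nO": 1}))
    return layers


if __name__ == "__main__":
    # depth=2 mirrors the stack hard-coded by this patch.
    for name, kwargs in mention_scorer_spec(64, 32, 2, 0.3):
        print(name, kwargs)
```

With depth=1 the plan matches the scorer before this patch, and with
depth=2 it matches the scorer after it, so a config value could switch
between the two without changing the surrounding code.
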
 spacy/ml/models/coref.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/spacy/ml/models/coref.py b/spacy/ml/models/coref.py
index 3b14e6ecb..511e44476 100644
--- a/spacy/ml/models/coref.py
+++ b/spacy/ml/models/coref.py
@@ -36,6 +36,9 @@ def build_coref(
             Linear(nI=dim, nO=hidden)
             >> Relu(nI=hidden, nO=hidden)
             >> Dropout(dropout)
+            >> Linear(nI=hidden, nO=hidden)
+            >> Relu(nI=hidden, nO=hidden)
+            >> Dropout(dropout)
             >> Linear(nI=hidden, nO=1)
         )
         mention_scorer.initialize()