Fix spacing after token_match

The boolean flag indicating a space after the token was being set incorrectly after the token_match regex was applied. Fixes #859.
2025-07-15 18:52:29 +03:00 · 2017-03-08 14:33:32 +01:00 · 2017-03-08 14:33:32 +01:00 · 0a6d7ca200
commit 0a6d7ca200
parent cd33b39a04
1 changed files with 4 additions and 1 deletions
--- a/spacy/tokenizer.pyx
+++ b/spacy/tokenizer.pyx
@ -275,7 +275,10 @@ cdef class Tokenizer:
            if cache_hit:
                pass
            elif self.token_match and self.token_match(string): 
-                tokens.push_back(self.vocab.get(tokens.mem, string), not suffixes.size())
+                # We're always saying 'no' to spaces here -- the caller will
+                # fix up the outermost one, with reference to the original.
+                # See Issue #859
+                tokens.push_back(self.vocab.get(tokens.mem, string), False)
            else:
                matches = self.find_infix(string)
                if not matches: