spaCy/language_data.py at b5247c49ebbb9cd3541767828d1e197c05cc760e

mirror of https://github.com/explosion/spaCy.git synced 2025-07-06 04:43:17 +03:00

Matthew Honnibal 26446aa728 Avoid loading all French exceptions on import

Move exceptions loading behind a get_tokenizer_exceptions() function
for French, instead of loading into the top-level namespace. This
cuts import times from 0.6s to 0.2s, at the expense of making the
French data a little different from the others (there's no top-level
TOKENIZER_EXCEPTIONS variable). The current solution feels somewhat
unsatisfying.

2017-02-25 11:55:00 +01:00

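# encoding: utf8
from __future__ import unicode_literals

from .stop_words import STOP_WORDS
# French exceptions are built on demand via get_tokenizer_exceptions()
# rather than exposed as a top-level TOKENIZER_EXCEPTIONS variable.
from .tokenizer_exceptions import get_tokenizer_exceptions, TOKEN_MATCH

STOP_WORDS = set(STOP_WORDS)

__all__ = ["STOP_WORDS", "get_tokenizer_exceptions", "TOKEN_MATCH"]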