spaCy/spacy
Adriane Boyd 357fdd4871 Load exceptions last in Tokenizer.from_bytes (#12553)
In `Tokenizer.from_bytes`, the exceptions should be loaded last so that
they are only processed once as part of loading the model.

The exceptions are tokenized as phrase matcher patterns in the
background and the internal tokenization needs to be synced with all the
remaining tokenizer settings. If the exceptions are not loaded last,
there are speed regressions for `Tokenizer.from_bytes/disk` vs.
`Tokenizer.add_special_case` as the caches are reloaded more than
necessary during deserialization.
2023-05-12 09:55:22 +02:00
..
cli fix typo (#12543) 2023-05-12 09:55:22 +02:00
displacy Allow passing a Span to displacy.parse_deps (#12477) 2023-04-03 15:28:52 +02:00
kb rely on is_empty property instead of __len__ (#12347) 2023-03-01 17:33:31 +01:00
lang Update stop_words.py (#11997) 2022-12-19 16:17:49 +01:00
matcher perf(REL_OP): Replace some token.children with token.rights or token.lefts (#12528) 2023-05-12 09:55:22 +02:00
ml Support floret for PretrainVectors (#12435) 2023-04-03 15:28:52 +02:00
pipeline Fix pickle for ngram suggester (#12486) 2023-04-03 15:28:52 +02:00
tests Add model-last saving mechanism to pretraining (#12459) 2023-04-03 15:28:52 +02:00
tokens Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set (#12493) 2023-04-03 15:28:52 +02:00
training Add model-last saving mechanism to pretraining (#12459) 2023-04-03 15:28:52 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Simplify and clarify enable/disable behavior of spacy.load() (#11459) 2022-09-27 14:22:36 +02:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Set version to v3.5.2 (#12508) 2023-04-06 17:30:39 +02:00
attrs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
attrs.pyx Intify IOB (#9738) 2022-01-20 13:19:38 +01:00
compat.py Custom component types in spacy.ty (#9469) 2021-10-21 15:31:06 +02:00
default_config_pretraining.cfg Add new parameter for saving every n epoch in pretraining (#8912) 2021-08-12 11:14:48 +02:00
default_config.cfg Add training.before_update callback (#11739) 2022-11-23 17:54:58 +01:00
errors.py Support floret for PretrainVectors (#12435) 2023-04-03 15:28:52 +02:00
glossary.py Add glossary entry for root (#10821) 2022-05-20 09:56:32 +02:00
language.py Have logging calls use string formatting types (#12215) 2023-02-02 11:15:22 +01:00
lexeme.pxd Fix Lexeme.from_ptr 2020-08-10 16:43:37 +02:00
lexeme.pyi fix types (#12365) 2023-03-09 10:32:33 +01:00
lexeme.pyx fix types (#12365) 2023-03-09 10:32:33 +01:00
lookups.py Fix issues for Mypy 0.950 and Pydantic 1.9.0 (#10786) 2022-05-25 09:33:54 +02:00
morphology.pxd Clean up Morphology imports and definitions (#7441) 2021-04-26 16:54:23 +02:00
morphology.pyx Clean up Morphology imports and definitions (#7441) 2021-04-26 16:54:23 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
pipe_analysis.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
py.typed Add py.typed 2021-03-16 09:48:31 +01:00
schemas.py Auto-format code with black (#12100) 2023-01-13 10:12:10 +01:00
scorer.py Restore v2 token_acc score implementation (#12073) 2023-01-11 08:01:47 +01:00
strings.pxd StringStore-related optimizations (#10938) 2022-07-04 15:04:03 +02:00
strings.pyi Fix StringStore.__getitem__ return type depending on parameter types (#10741) 2022-05-03 17:57:07 +02:00
strings.pyx StringStore-related optimizations (#10938) 2022-07-04 15:04:03 +02:00
structs.pxd Add SpanGroup and Graph container types to represent arbitrary annotations (#6696) 2021-01-14 17:30:41 +11:00
symbols.pxd introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
symbols.pyx introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
tokenizer.pxd Add tokenizer option to allow Matcher handling for all rules (#10452) 2022-03-24 13:21:32 +01:00
tokenizer.pyx Load exceptions last in Tokenizer.from_bytes (#12553) 2023-05-12 09:55:22 +02:00
ty.py Custom component types in spacy.ty (#9469) 2021-10-21 15:31:06 +02:00
typedefs.pxd Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master 2020-11-25 11:49:34 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Add tests for projects to master (#12303) 2023-02-23 10:22:57 +01:00
vectors.pyx Add equality definition for vectors (#11806) 2022-11-16 09:44:42 +01:00
vocab.pxd Add support for floret vectors (#8909) 2021-10-27 14:08:31 +02:00
vocab.pyi Add vector deduplication (#10551) 2022-03-30 08:54:23 +02:00
vocab.pyx fix comparison of constants (#11834) 2022-11-21 08:12:03 +01:00