spaCy/spacy
Matthew Honnibal 4b2e5e59ed Add flush_cache method to tokenizer, to fix #1061
The tokenizer caches output for common chunks, for efficiency. This
cache must be invalidated when the tokenizer rules change, e.g. when a new
special-case rule is introduced. That's what was causing #1061.

When the cache is flushed, we free the intermediate token chunks.
I *think* this is safe --- but if we start getting segfaults, this patch
is to blame. The resolution would be to simply not free those bits of
memory. They'll be freed when the tokenizer exits anyway.
2017-07-22 15:06:50 +02:00
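
The caching behaviour described in the commit message above can be illustrated with a small pure-Python sketch. This is not spaCy's Cython implementation: the `ToyTokenizer` class, its whitespace-and-punctuation rule, and the `_cache` dictionary are all hypothetical and exist only for illustration. Only the idea of a `flush_cache` method that runs when a special-case rule is added comes from the commit.

```python
class ToyTokenizer:
    """Hypothetical tokenizer with a per-chunk cache (illustration only)."""

    def __init__(self):
        self.special_cases = {}  # chunk -> explicit list of tokens
        self._cache = {}         # chunk -> previously computed tokens

    def add_special_case(self, chunk, tokens):
        self.special_cases[chunk] = list(tokens)
        # The rules changed, so cached output may now be stale.
        self.flush_cache()

    def flush_cache(self):
        # Drop every cached chunk; entries are rebuilt lazily on the next call.
        self._cache.clear()

    def _tokenize_chunk(self, chunk):
        if chunk in self.special_cases:
            return list(self.special_cases[chunk])
        # Assumed default rule: split off one trailing punctuation mark.
        if len(chunk) > 1 and chunk[-1] in ".,!?":
            return [chunk[:-1], chunk[-1]]
        return [chunk]

    def __call__(self, text):
        tokens = []
        for chunk in text.split():
            if chunk not in self._cache:
                self._cache[chunk] = self._tokenize_chunk(chunk)
            tokens.extend(self._cache[chunk])
        return tokens


tokenizer = ToyTokenizer()
before = tokenizer("I dunno.")  # caches the chunks "I" and "dunno."
tokenizer.add_special_case("dunno.", ["du", "n", "no", "."])
after = tokenizer("I dunno.")   # recomputed, because the cache was flushed
```

Without the `flush_cache()` call inside `add_special_case`, the second call would return the stale cached tokens for `"dunno."` and ignore the new rule, which is the kind of symptom the commit message attributes to #1061.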
bn Merge pull request #885 from PySUST/master 2017-03-12 13:20:59 +01:00
cli Fixed typo in cli/package.py 2017-06-07 16:19:08 +02:00
data Make spacy/data a package 2017-03-18 20:04:22 +01:00
de Handle deprecated language-specific model downloading 2017-03-15 17:37:55 +01:00
en Fix typo in English tokenizer exceptions (resolves #1071) 2017-05-23 12:18:00 +02:00
es Update tokenizer_exceptions.py 2017-06-02 19:00:01 +02:00
fi Remove duplicate keys in [en|fi] data dicts 2017-03-19 11:40:29 +01:00
fr French NUM_WORDS and ORDINAL_WORDS 2017-06-28 14:11:20 +02:00
he add hebrew tokenizer 2017-03-24 18:27:44 +03:00
hu Use regex instead of re 2017-04-20 02:22:52 +03:00
it Use consistent unicode declarations 2017-03-12 13:07:28 +01:00
ja Make create_tokenizer work with Japanese 2017-06-28 01:18:05 +09:00
language_data Add missing SP symbol to tag map, re #1052 2017-07-22 13:44:17 +02:00
munge
nb Add newline 2017-04-27 11:15:41 +02:00
nl fix import of stop words in language data 2017-07-05 14:08:04 +02:00
pt Import and combine Portuguese tokenizer exceptions (see #943) 2017-04-01 10:37:42 +02:00
serialize
sv Use consistent unicode declarations 2017-03-12 13:07:28 +01:00
syntax Default to English noun chunks iterator if no lang set 2017-07-22 14:15:25 +02:00
tests Add flush_cache method to tokenizer, to fix #1061 2017-07-22 15:06:50 +02:00
tokens Fix Span.noun_chunks. Closes #1207 2017-07-22 14:14:57 +02:00
zh Update __init__.py 2017-07-01 13:12:00 +08:00
__init__.pxd
__init__.py Add __version__ symbol in __init__.py 2017-07-22 13:45:21 +02:00
__main__.py Add more options to read in meta data in package command 2017-04-16 13:06:02 +02:00
about.py Rename about.__docs__ to about.__docs_models__ 2017-05-13 13:09:00 +02:00
attrs.pxd
attrs.pyx Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
cfile.pxd Add hacky support for StringCFile, to make pickling easier. 2017-03-07 20:24:37 +01:00
cfile.pyx Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
compat.py Simplify compat.fix_text 2017-04-23 21:06:50 +02:00
deprecated.py Rename about.__docs__ to about.__docs_models__ 2017-05-13 13:09:00 +02:00
glossary.py Fix formatting 2017-05-03 20:11:02 +02:00
gold.pxd
gold.pyx Fix training methods 2017-04-16 13:00:37 -05:00
language.py Create directory if missing in save_to_directory 2017-04-23 21:24:43 +02:00
lemmatizer.py Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
lexeme.pxd
lexeme.pyx Fix gaps in Lexeme API. Closes #1031 2017-07-22 13:53:48 +02:00
matcher.pyx Fix json imports and use ujson 2017-04-15 12:13:34 +02:00
morphology.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
morphology.pyx Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
orth.pxd
orth.pyx Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
pipeline.pxd Add classes for beam parser and beam NER 2017-03-11 12:45:37 -06:00
pipeline.pyx Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
scorer.py Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
strings.pxd
strings.pyx Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
structs.pxd
symbols.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
symbols.pyx Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
tagger.pxd
tagger.pyx Fix json imports and use ujson 2017-04-15 12:13:34 +02:00
tokenizer.pxd
tokenizer.pyx Add flush_cache method to tokenizer, to fix #1061 2017-07-22 15:06:50 +02:00
train.py Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
typedefs.pxd
typedefs.pyx
util.py Use regex instead of re 2017-04-20 02:22:52 +03:00
vocab.pxd
vocab.pyx Fix json imports and use ujson 2017-04-15 12:13:34 +02:00