Commit Graph

1 Commit

Author SHA1 Message Date
Matthew Honnibal
4b2e5e59ed Add flush_cache method to tokenizer, to fix #1061
The tokenizer caches output for common chunks, for efficiency. This
cache must be invalidated when the tokenizer rules change, e.g. when a new
special-case rule is introduced. Failing to do so is what was causing #1061.

When the cache is flushed, we free the intermediate token chunks.
I *think* this is safe --- but if we start getting segfaults, this patch
is to blame. The resolution would be to simply not free those bits of
memory. They'll be freed when the tokenizer exits anyway.
2017-07-22 15:06:50 +02:00
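
Below is a minimal sketch, in plain Python, of the kind of cache invalidation the commit describes. It is not spaCy's actual Cython implementation: the ToySpecialCaseTokenizer class, its add_special_case and tokenize methods, and the dict-based cache are illustrative assumptions; only the idea of a flush_cache step mirrors the commit.

    class ToySpecialCaseTokenizer:
        # Illustrative only; not the real spaCy tokenizer.
        def __init__(self):
            self._special_cases = {}  # chunk -> list of tokens
            self._cache = {}          # chunk -> cached tokenization

        def add_special_case(self, chunk, tokens):
            self._special_cases[chunk] = list(tokens)
            # Without this flush, chunks tokenized before the rule was
            # added would keep returning stale cached output (cf. #1061).
            self.flush_cache()

        def flush_cache(self):
            # Drop every cached tokenization; entries are recomputed
            # lazily the next time each chunk is seen.
            self._cache.clear()

        def tokenize(self, text):
            tokens = []
            for chunk in text.split():
                if chunk not in self._cache:
                    self._cache[chunk] = self._special_cases.get(chunk, [chunk])
                tokens.extend(self._cache[chunk])
            return tokens

For instance, with this toy class, a chunk tokenized before a special-case rule is added would stay cached as a single token; because add_special_case calls flush_cache, the new rule takes effect on the next call instead of being masked by the stale entry. The memory-safety concern the commit mentions (freeing the intermediate token chunks) does not arise in this Python sketch, since the cleared dict entries are simply garbage-collected.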