* Fix inconsistency
This makes the failing test pass, so that behavior is consistent whether
patterns are added in one call or two.
The issue is that the hash for patterns depended on the index of the
pattern in the list of current patterns, not the list of total patterns,
so a second call would get identical match ids.
* Add illustrative test case
* Add failing test for remove case
Patterns are not removed from the internal matcher on calls to remove,
which causes spurious weird matches (or misses).
* Fix removal issue
Remove patterns from the internal matcher.
* Check that the single add call also gets no matches
Since a component may reference anything in the vocab, share the full
vocab when loading source components and vectors (which will include
`strings` as of #8909).
When loading a source component from a config, save and restore the
vocab state after loading source pipelines, in particular to preserve
the original state without vectors, since `[initialize.vectors]
= null` skips rather than resets the vectors.
The vocab references are not synced for components loaded with
`Language.add_pipe(source=)` because the pipelines are already loaded
and not necessarily with the same vocab. A warning could be added in
`Language.create_pipe_from_source` that it may be necessary to save and
reload before training, but it's a rare enough case that this kind of
warning may be too noisy overall.
* Add link to Discussions FAQ
* Remove old FAQ entries
I think these are no longer relevant.
- no-cache-dir: affected pip versions are *very* old now
- narrow unicode: not an issue from py3.3+
- utf-8 osx: upstream bug closed in 2019
Some of the other issues are also maybe not frequent.
* Validate pos values when creating Doc
* Add clear error when setting invalid pos
This also changes the error language slightly.
* Fix variable name
* Update spacy/tokens/doc.pyx
* Test that setting invalid pos raises an error
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* First take at StringStore/Vocab docs
Things to check:
1. The mysterious vocab members
2. How to make table of contents? Is it autogenerated?
3. Anything I missed / needs more detail?
* Update docs
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Updates based on review feedback
* Minor fix
* Move example code down
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Handle spacy-legacy in package CLI for dependencies
* Implement legacy backoff in spacy registry.find
* Remove unused import
* Update and format test
* pass alignments to callbacks
* refactor for single callback loop
* Update spacy/matcher/matcher.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>