* Clarify Span.ents documentation
Ref: #10135
Retain current behaviour. Span.ents will only include entities within
said span. You can't get tokens outside of the original span.
* Reword docstrings
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update API docs in the website
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
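The retained behaviour can be sketched in plain Python, without spaCy. The `(start, end, label)` triples and the helper `ents_within` are illustrative assumptions, not the library's implementation:

```python
from typing import List, Tuple

# Hypothetical stand-in for entity spans: (start_token, end_token, label),
# with an exclusive end index, as in token slices.
Ent = Tuple[int, int, str]


def ents_within(doc_ents: List[Ent], start: int, end: int) -> List[Ent]:
    """Return only the entities that lie entirely inside [start, end)."""
    return [e for e in doc_ents if e[0] >= start and e[1] <= end]


doc_ents = [(0, 2, "PERSON"), (4, 6, "ORG"), (7, 8, "GPE")]
# A span covering tokens 3..6 fully contains only the ORG entity.
inside = ents_within(doc_ents, 3, 7)
# The ORG entity crosses the left boundary of this span, so it is left out.
partial = ents_within(doc_ents, 5, 8)
```

Entities that only partly overlap the span are dropped rather than truncated, which matches the "only entities within said span" wording above.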
* This comma was most probably left out unintentionally, leading to string concatenation between the two consecutive lines. This issue was found automatically using a regular expression.
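A minimal illustration of this class of bug in Python. The list contents are made up for the example and are not taken from the affected file:

```python
# Adjacent string literals are concatenated at compile time, so a missing
# comma silently merges two list items into one.
without_comma = [
    "first",
    "second"  # <- missing comma
    "third",
]
with_comma = [
    "first",
    "second",
    "third",
]
```

The first list ends up with two items instead of three, and no error or warning is raised.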
* Fix infix as prefix in Tokenizer.explain
Update `Tokenizer.explain` to align with the `Tokenizer` algorithm:
* skip infix matches that are prefixes in the current substring
* Update tokenizer pseudocode in docs
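A toy sketch of the infix step with the skip rule applied. `split_on_infixes` and the single-hyphen pattern are assumptions made for illustration; the real tokenizer also handles prefixes, suffixes, special cases and URL matching:

```python
import re
from typing import List, Tuple


def split_on_infixes(substring: str, infix_re: re.Pattern) -> List[Tuple[str, str]]:
    """Toy infix step: split `substring` on infix matches, ignoring an
    infix match that sits at the very start (it would act as a prefix)."""
    tokens = []
    offset = 0
    for match in infix_re.finditer(substring):
        # Skip a match at position 0 while nothing has been consumed yet.
        if offset == 0 and match.start() == 0:
            continue
        if substring[offset:match.start()]:
            tokens.append(("TOKEN", substring[offset:match.start()]))
        if match.group():
            tokens.append(("INFIX", match.group()))
        offset = match.end()
    if substring[offset:]:
        tokens.append(("TOKEN", substring[offset:]))
    return tokens


hyphen = re.compile(r"-")
result = split_on_infixes("-a-b", hyphen)
```

With the skip in place, the leading hyphen stays attached to the first token instead of being reported as an infix.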
* Improve typing hints for Matcher.__call__
* Add typing hints for DependencyMatcher
* Add typing hints to underscore extensions
* Update Doc.tensor type (requires numpy 1.21)
* Fix typing hints for Language.component decorator
* Use generic np.ndarray type in Doc to avoid numpy version update
* Fix mypy errors
* Fix cyclic import caused by Underscore typing hints
* Use Literal type from spacy.compat
* Update matcher.pyi import format
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Instead of running the actual suggester, which may require
annotation from annotating components that is not necessarily present in
the reference docs, use the built-in 1-gram suggester.
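For context, a 1-gram suggester proposes every single token as a candidate span, so it needs no annotation from other components. A plain-Python sketch, where `one_gram_spans` is an illustrative helper and not spaCy's registered suggester:

```python
from typing import List, Tuple


def one_gram_spans(n_tokens: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: every single token as a (start, end) candidate."""
    return [(i, i + 1) for i in range(n_tokens)]


# A three-token doc yields three one-token candidate spans.
candidates = one_gram_spans(3)
```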
Two changes to speed up masking by ~10%:
- Use a bool array rather than an array of float32.
- Let the mask indicate whether a label was seen, rather than
unseen. The mask is most frequently used to index scores for
seen labels. However, since the mask marked unseen labels,
this required computing an intermediate flipped mask.
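The second change can be illustrated with standard-library Python. This is only an analogy: the actual code operates on arrays, and `itertools.compress` stands in here for boolean-mask indexing. The scores and masks are made-up values:

```python
from itertools import compress

scores = [0.9, 0.1, 0.4, 0.7]

# Old layout: the mask marks UNSEEN labels, so selecting scores for seen
# labels needs an extra flipped copy of the mask first.
unseen = [False, True, False, True]
flipped = [not m for m in unseen]
seen_scores_old = list(compress(scores, flipped))

# New layout: the mask marks SEEN labels and can be used directly.
seen = [True, False, True, False]
seen_scores_new = list(compress(scores, seen))
```

Both layouts select the same scores; the new one simply avoids building the flipped copy on every use.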
* Support version tags in universe and add note about reporting
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* added iob to int
* added tests
* added iob strings
* added error
* blacked attrs
* Update spacy/tests/lang/test_attrs.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update spacy/attrs.pyx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* added iob strings as global
* minor refinement with iob
* removed iob strings from token
* changed to uppercase
* cleaned and went back to master version
* imported iob from attrs
* Update and format errors
* Support and test both str and int ENT_IOB key
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
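A rough sketch of the string-to-int conversion described in this list. The helper name `iob_to_int` and the error text are hypothetical, and the tuple ordering is an assumption (index equals integer code, with the empty string meaning no tag is set):

```python
from typing import Union

# Assumed ordering: the index of each string is its integer code,
# with "" standing for "no entity tag set".
IOB_STRINGS = ("", "I", "O", "B")


def iob_to_int(value: Union[str, int]) -> int:
    """Hypothetical helper: accept an IOB string or its integer code."""
    if isinstance(value, int):
        if 0 <= value < len(IOB_STRINGS):
            return value
        raise ValueError(f"Invalid IOB value: {value!r}")
    if value not in IOB_STRINGS:
        raise ValueError(f"Invalid IOB value: {value!r}")
    return IOB_STRINGS.index(value)
```

Accepting both forms mirrors the "Support and test both str and int ENT_IOB key" item, while unknown strings fail loudly instead of being mapped silently.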
* added new field
* added exception for IOB strings
* minor refinement to schema
* removed field
* fixed typo
* imported numerical val
* changed the code bit
* cosmetics
* added test for matcher
* set ents of mock docs
* added invalid pattern
* minor update to documentation
* blacked matcher
* added pattern validation
* add IOB vals to schema
* changed into test
* mypy compat
* cleaned leftovers
* added compat import
* changed type
* added compat import
* changed literal a bit
* went back to old
* made explicit type
* Update spacy/schemas.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update spacy/schemas.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update spacy/schemas.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Determine labels by factory name in debug data
For each component type, return labels for all components with the
corresponding factory name, rather than only for the component with
the default name.
For `spancat`, return labels as a dict keyed by `spans_key`.
* Refactor for typing
* Add test
* Use assert instead of cast, removed unneeded arg
* Mark test as slow
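The lookup by factory name can be sketched as follows. The pipeline description and `labels_by_factory` are hypothetical stand-ins for illustration, not the `debug data` implementation, and the `spancat` dict case is left out:

```python
from typing import List, Set, Tuple

# Hypothetical pipeline description: (component_name, factory_name, labels).
pipeline: List[Tuple[str, str, List[str]]] = [
    ("ner", "ner", ["PERSON", "ORG"]),
    ("ner_custom", "ner", ["PRODUCT"]),
    ("tagger", "tagger", ["NN", "VB"]),
]


def labels_by_factory(pipeline, factory_name: str) -> Set[str]:
    """Collect labels from every component built by `factory_name`,
    regardless of the name the component was added under."""
    labels: Set[str] = set()
    for _name, factory, component_labels in pipeline:
        if factory == factory_name:
            labels.update(component_labels)
    return labels


ner_labels = labels_by_factory(pipeline, "ner")
```

Matching on the component name `"ner"` alone would miss the labels of `ner_custom`; matching on the factory name picks up both.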
* Add link to pattern file info in EntityRuler.initialize docs
* Update website/docs/api/entityruler.md
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Use Vectors.shape rather than Vectors.data.shape
* Use Vectors.size rather than Vectors.data.size
* Add Vectors.to_ops to move data between different ops
* Add documentation for Vectors.to_ops
By @polm, redone from #9917 after incorrect (reverted) rebase.
`sudachipy>=0.5.2` is needed for newer dictionaries. `sudachipy<0.6.0`
is kept for users who might still prefer the older version, in
particular to be able to compile it without rust.
* Corrected Span's __richcmp__ implementation to take end, label and kb_id into consideration
* Updated test
* Updated test
* Removed formatting from a test for readability's sake
* Use same tuples for all comparisons
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
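The idea behind "use same tuples for all comparisons" can be sketched in pure Python. `SpanKey` is a hypothetical stand-in, and plain strings are assumed for `label` and `kb_id` to keep the example readable; this is not the Cython implementation:

```python
from dataclasses import dataclass
from functools import total_ordering


@total_ordering
@dataclass(eq=False)
class SpanKey:
    """Hypothetical stand-in for a span's comparable fields."""
    start: int
    end: int
    label: str = ""
    kb_id: str = ""

    def _key(self):
        # One tuple drives every comparison, so ==, < and > stay consistent.
        return (self.start, self.end, self.label, self.kb_id)

    def __eq__(self, other):
        if not isinstance(other, SpanKey):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, SpanKey):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        return hash(self._key())


a = SpanKey(0, 2, "ORG")
b = SpanKey(0, 3, "ORG")
c = SpanKey(0, 2, "PERSON")
```

Two spans that share a start but differ in end, label or `kb_id` no longer compare as equal, because every operator looks at the same full tuple.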
* add entry for Applied Language Technology under "Courses"
Added the following entry into `universe.json`:
```
{
    "type": "education",
    "id": "applt-course",
    "title": "Applied Language Technology",
    "slogan": "NLP for newcomers using spaCy and Stanza",
    "description": "These learning materials provide an introduction to applied language technology for audiences who are unfamiliar with language technology and programming. The learning materials assume no previous knowledge of the Python programming language.",
    "url": "https://applied-language-technology.readthedocs.io/",
    "image": "https://www.mv.helsinki.fi/home/thiippal/images/applt-preview.jpg",
    "thumb": "https://applied-language-technology.readthedocs.io/en/latest/_static/logo.png",
    "author": "Tuomo Hiippala",
    "author_links": {
        "twitter": "tuomo_h",
        "github": "thiippal",
        "website": "https://www.mv.helsinki.fi/home/thiippal/"
    },
    "category": ["courses"]
},
```
* Update the entry for "Applied Language Technology"