* added crosslingual coreference to spacy universe
* Updated example to introduce batching example.
Co-authored-by: David Berenstein <david.berenstein@pandoraintelligence.com>
* Add edit tree lemmatizer
Co-authored-by: Daniël de Kok <me@danieldk.eu>
* Hide edit tree lemmatizer labels
* Use relative imports
* Switch to single quotes in error message
* Type annotation fixes
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Reformat edit_tree_lemmatizer with black
* EditTreeLemmatizer.predict: take Iterable
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Validate edit trees during deserialization
This change also changes the serialized representation. Rather than
mirroring the deep C structure, we use a simple flat union of the match
and substitution node types.
* Move edit_trees to _edit_tree_internals
* Fix invalid edit tree format error message
* edit_tree_lemmatizer: remove outdated TODO comment
* Rename factory name to trainable_lemmatizer
* Ignore type instead of casting truths to List[Union[Ints1d, Floats2d, List[int], List[str]]] for thinc v8.0.14
* Switch to Tagger.v2
* Add documentation for EditTreeLemmatizer
* docs: Fix 3.2 -> 3.3 somewhere
* trainable_lemmatizer documentation fixes
* docs: EditTreeLemmatizer is in edit_tree_lemmatizer.py
Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Daniël de Kok <me@github.danieldk.eu>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update universe.json
added classy-classification to Spacy universe
* Update universe.json
added classy-classification to the spacy universe resources
* Update universe.json
corrected a small typo in json
* Update website/meta/universe.json
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/meta/universe.json
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/meta/universe.json
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update universe.json
processed merge feedback
* Update universe.json
* updated information for Classy Classificaiton
Made a more comprehensible and easy description for Classy Classification based on feedback of Philip Vollet to prepare for sharing.
* added note about examples
* corrected for wrong formatting changes
* Update website/meta/universe.json with small typo correction
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* resolved another typo
* Update website/meta/universe.json
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* added Concise Concepts package to spaCy universe.
* updated example code Concise Concepts
* updated description for Concise Concepts
* updated PR with more visually appealing examples
SO to koaning for the suggestions.
* corrected for small json typo's in concise concepts
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update universe.json
added classy-classification to Spacy universe
* Update universe.json
added classy-classification to the spacy universe resources
* Update universe.json
corrected a small typo in json
* Update website/meta/universe.json
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/meta/universe.json
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/meta/universe.json
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update universe.json
processed merge feedback
* Update universe.json
* updated information for Classy Classificaiton
Made a more comprehensible and easy description for Classy Classification based on feedback of Philip Vollet to prepare for sharing.
* added note about examples
* corrected for wrong formatting changes
* Update website/meta/universe.json with small typo correction
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* resolved another typo
* Update website/meta/universe.json
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Added spacy-wrap to universe
Added spacy-wrap to universe a small package for wrapping fine-tuned huggingface transformers to a spacy pipeline following the same API as spacy-transformers. (Currently limited to classification models)
* Update website/meta/universe.json
* Update website/meta/universe.json
* Update website/meta/universe.json
* Update website/meta/universe.json
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Support version tags in universe and add note about reporting
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* add entry for Applied Language Technology under "Courses"
Added the following entry into `universe.json`:
```
{
"type": "education",
"id": "applt-course",
"title": "Applied Language Technology",
"slogan": "NLP for newcomers using spaCy and Stanza",
"description": "These learning materials provide an introduction to applied language technology for audiences who are unfamiliar with language technology and programming. The learning materials assume no previous knowledge of the Python programming language.",
"url": "https://applied-language-technology.readthedocs.io/",
"image": "https://www.mv.helsinki.fi/home/thiippal/images/applt-preview.jpg",
"thumb": "https://applied-language-technology.readthedocs.io/en/latest/_static/logo.png",
"author": "Tuomo Hiippala",
"author_links": {
"twitter": "tuomo_h",
"github": "thiippal",
"website": "https://www.mv.helsinki.fi/home/thiippal/"
},
"category": ["courses"]
},
```
* Update the entry for "Applied Language Technology"
* Update universe plugins
* Adjust azure trigger
* Add init to tests/universe
* deliberatly trying to break the universe to see if the CI catches it
* revert
Co-authored-by: svlandeg <svlandeg@github.com>
* Draft spancat model
* Add spancat model
* Add test for extract_spans
* Add extract_spans layer
* Upd extract_spans
* Add spancat model
* Add test for spancat model
* Upd spancat model
* Update spancat component
* Upd spancat
* Update spancat model
* Add quick spancat test
* Import SpanCategorizer
* Fix SpanCategorizer component
* Import SpanGroup
* Fix span extraction
* Fix import
* Fix import
* Upd model
* Update spancat models
* Add scoring, update defaults
* Update and add docs
* Fix type
* Update spacy/ml/extract_spans.py
* Auto-format and fix import
* Fix comment
* Fix type
* Fix type
* Update website/docs/api/spancategorizer.md
* Fix comment
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Better defense
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fix labels list
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/ml/extract_spans.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/pipeline/spancat.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Set annotations during update
* Set annotations in spancat
* fix imports in test
* Update spacy/pipeline/spancat.py
* replace MaxoutLogistic with LinearLogistic
* fix config
* various small fixes
* remove set_annotations parameter in update
* use our beloved tupley format with recent support for doc.spans
* bugfix to allow renaming the default span_key (scores weren't showing up)
* use different key in docs example
* change defaults to better-working parameters from project (WIP)
* register spacy.extract_spans.v1 for legacy purposes
* Upd dev version so can build wheel
* layers instead of architectures for smaller building blocks
* Update website/docs/api/spancategorizer.md
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update website/docs/api/spancategorizer.md
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Include additional scores from overrides in combined score weights
* Parameterize spans key in scoring
Parameterize the `SpanCategorizer` `spans_key` for scoring purposes so
that it's possible to evaluate multiple `spancat` components in the same
pipeline.
* Use the (intentionally very short) default spans key `sc` in the
`SpanCategorizer`
* Adjust the default score weights to include the default key
* Adjust the scorer to use `spans_{spans_key}` as the prefix for the
returned score
* Revert addition of `attr_name` argument to `score_spans` and adjust
the key in the `getter` instead.
Note that for `spancat` components with a custom `span_key`, the score
weights currently need to be modified manually in
`[training.score_weights]` for them to be available during training. To
suppress the default score weights `spans_sc_p/r/f` during training, set
them to `null` in `[training.score_weights]`.
* Update website/docs/api/scorer.md
* Fix scorer for spans key containing underscore
* Increment version
* Add Spans to Evaluate CLI (#8439)
* Add Spans to Evaluate CLI
* Change to spans_key
* Add spans per_type output
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Fix spancat GPU issues (#8455)
* Fix GPU issues
* Require thinc >=8.0.6
* Switch to glorot_uniform_init
* Fix and test ngram suggester
* Include final ngram in doc for all sizes
* Fix ngrams for docs of the same length as ngram size
* Handle batches of docs that result in no ngrams
* Add tests
Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Nirant <NirantK@users.noreply.github.com>
* Update cats score names in Scorer API docs
* Refer to performance in meta
* Update package naming/versions, lemmatizer details
* Minor formatting fixes
* Provide more explanation for cats_score_desc
* Provide language-specific lemmatizer defaults in API docs
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
* Draft out initial Spans data structure
* Initial span group commit
* Basic span group support on Doc
* Basic test for span group
* Compile span_group.pyx
* Draft addition of SpanGroup to DocBin
* Add deserialization for SpanGroup
* Add tests for serializing SpanGroup
* Fix serialization of SpanGroup
* Add EdgeC and GraphC structs
* Add draft Graph data structure
* Compile graph
* More work on Graph
* Update GraphC
* Upd graph
* Fix walk functions
* Let Graph take nodes and edges on construction
* Fix walking and getting
* Add graph tests
* Fix import
* Add module with the SpanGroups dict thingy
* Update test
* Rename 'span_groups' attribute
* Try to fix c++11 compilation
* Fix test
* Update DocBin
* Try to fix compilation
* Try to fix graph
* Improve SpanGroup docstrings
* Add doc.spans to documentation
* Fix serialization
* Tidy up and add docs
* Update docs [ci skip]
* Add SpanGroup.has_overlap
* WIP updated Graph API
* Start testing new Graph API
* Update Graph tests
* Update Graph
* Add docstring
Co-authored-by: Ines Montani <ines@ines.io>
* Avoid a SyntaxError in self-attentive-parser
Fix a usage of quotation marks in the example of spaCy Universe self-attentive-parser
* Create forest1988.md
Fill in the spaCy contributor agreement
* Adding Mindmeld to Universe JSON
Mindmeld is a conversational AI platform for deep-domain voice interfaces and chatbots. https://www.mindmeld.com/
* Signing contribution agreement.
Co-authored-by: kunshar2 <kunshar2@cisco.com>