Commit Graph

363 Commits

Author SHA1 Message Date
Adriane Boyd
81874265e9 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5-1 2022-08-24 12:47:42 +02:00
Tobius Saul
c09d2fa25b
luganda language extension (#10847)
* luganda language extension

* __init__.py changes

* New enhancements

* Lexical attribute changed

* punctuaction and sentence additions

* Remove comment header

* Fix typos, reformat

* reformated version

* Add tokenizer test

* Remove contractions from stop words

* Format

* Add Luganda to website

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-23 13:09:36 +02:00
Adriane Boyd
5fa8f4faca
Switch ru and uk lemmatizers to pymorphy3 (#11345)
* Switch ru and uk lemmatizers to pymorphy3

* Switch to pymorphy3 in tests
2022-08-22 11:27:14 +02:00
Adriane Boyd
09b3118b26
Add uk pipelines to website (#11332) 2022-08-18 14:04:57 +02:00
Jules Belveze
cd09614ab2
chore: add 'concepCy' to spacy universe (#11255)
* chore: add 'concepCy' to spacy universe

* docs: add 'slogan' to concepCy
2022-08-04 15:42:38 +09:00
Adriane Boyd
7a99fe3c65
Move sent-patterns to correct section of universe.json (#11192) 2022-07-25 09:14:50 +02:00
0xpeIpeI
93960dc4b5
[universe project] create English interpretation project (#11184)
* [add] my universe  project setting

* [modify] A few adjustments

* [Modify] change package description
2022-07-24 19:01:04 +09:00
Lucas Terriel
7ff52c02a1
Update meta for spacyfishing in spaCy Universe (#11185)
* add new logo for spacyfishing to update spacy universe

* change logo location
2022-07-24 17:10:29 +09:00
Maarten Grootendorst
1caa2d1d16
Added BERTopic to Spacy Universe (#11159)
* Added BERTopic to Spacy Universe

* Fix no render of visualization
2022-07-19 19:37:18 +09:00
Adriane Boyd
2235e3520c
Update binder version in docs (#11124) 2022-07-12 15:20:33 +02:00
Adriane Boyd
11f859c132
Docs for v3.4 (#11057)
* Add draft of v3.4 usage

* Add Croatian models

* Add Matcher min/max

* Update release notes

* Minor edits

* Add updates, tables

* Update pydantic/mypy versions

* Update version in README

* Fix sidebar
2022-07-11 15:36:31 +02:00
Richard Hudson
dc38a0f079
Change demo URL (#11102) 2022-07-08 19:19:48 +02:00
Nipun Sadvilkar
bb3e11b9a1
Github Action for spaCy universe project alert (#11090) 2022-07-07 17:50:30 +05:30
Kenneth Enevoldsen
7b220afc29
Added asent to spacy universe (#11078)
* Added asent to spacy universe

* Update addition of asent following correction
2022-07-07 13:25:25 +09:00
schaeran
b3165db41b remove universe object: spacy-langdetect 2022-07-04 16:07:18 +02:00
schaeran
4e8a5994df remove universe object: NLPre 2022-07-04 16:06:58 +02:00
schaeran
0e4a835468 remove universe object: num_fh 2022-07-04 16:06:38 +02:00
schaeran
5000a08a20 remove universe object: adam_qas 2022-07-04 16:06:20 +02:00
schaeran
60a35a2bb2 remove universe object: spacy_kenlm 2022-07-04 16:06:02 +02:00
schaeran
224f30c563 remove universe object: spacy-raspberry 2022-07-04 16:05:34 +02:00
schaeran
a9062ebf17 remove universe object: spacy-lookup 2022-07-04 16:05:11 +02:00
schaeran
9b823fc9e9 remove universe object: NeuroNER 2022-07-04 16:04:50 +02:00
schaeran
b94bcaa62f remove universe object: spacy-vis 2022-07-04 16:04:29 +02:00
schaeran
880e7db44e remove universe object: spacy_grammar 2022-07-04 16:04:06 +02:00
schaeran
6c036d1e25 remove universe object: spacy_hunspell 2022-07-04 16:03:30 +02:00
Dmytro Sadovnychyi
4cd8b4cc22
Fix some of the broken links on universe pages (#11011)
Currently some of the "AUTHOR INFO" links (e.g. here[0]) are broken:

```
https://github.com/https://github.com/explosion
```

[0] https://spacy.io/universe/project/spacy-experimental


Also one remains broken with `https://szegedai.github.io/`.
2022-06-23 17:53:00 +02:00
Adriane Boyd
f1197d9175
Add API docs for token attribute symbols (#10836)
* Add API docs for token attribute symbols

* Remove NBSP's

* Fix typo

* Rephrase

Co-authored-by: svlandeg <svlandeg@github.com>
2022-06-23 08:16:38 +02:00
Lucaterre
2820d7dd8d correct typo in universe.json for 'code_example' key : pipe name 'entityfishing' 2022-06-20 15:26:23 +02:00
Lucaterre
cdad815c68 updated spacy universe for spacyfishing 2022-06-20 14:28:49 +02:00
Gor Arakelyan
605f84938b
Add "Aim-spaCy" to spaCy Universe (#10943)
* Add Aim-spaCy to spaCy universe

* Update Aim thumbnail

* Fix author links

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-06-10 18:33:17 +09:00
vincent d warmerdam
e7d2b26966
Add spacy-report to universe (#10910)
* Add spacy-report to universe

* Remove extra comma

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-06-05 18:57:58 +09:00
Adriane Boyd
a322d6d5f2
Add SpanRuler component (#9880)
* Add SpanRuler component

Add a `SpanRuler` component similar to `EntityRuler` that saves a list
of matched spans to `Doc.spans[spans_key]`. The matches from the token
and phrase matchers are deduplicated and sorted before assignment but
are not otherwise filtered.

* Update spacy/pipeline/span_ruler.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix cast

* Add self.key property

* Use number of patterns as length

* Remove patterns kwarg from init

* Update spacy/tests/pipeline/test_span_ruler.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Add options for spans filter and setting to ents

* Add `spans_filter` option as a registered function'
* Make `spans_key` optional and if `None`, set to `doc.ents` instead of
`doc.spans[spans_key]`.

* Update and generalize tests

* Add test for setting doc.ents, fix key property type

* Fix typing

* Allow independent doc.spans and doc.ents

* If `spans_key` is set, set `doc.spans` with `spans_filter`.
* If `annotate_ents` is set, set `doc.ents` with `ents_fitler`.
  * Use `util.filter_spans` by default as `ents_filter`.
  * Use a custom warning if the filter does not work for `doc.ents`.

* Enable use of SpanC.id in Span

* Support id in SpanRuler as Span.id

* Update types

* `id` can only be provided as string (already by `PatternType`
definition)

* Update all uses of Span.id/ent_id in Doc

* Rename Span id kwarg to span_id

* Update types and docs

* Add ents filter to mimic EntityRuler overwrite_ents

* Refactor `ents_filter` to take `entities, spans` args for more
  filtering options
* Give registered filters more descriptive names
* Allow registered `filter_spans` filter
  (`spacy.first_longest_spans_filter.v1`) to take any number of
  `Iterable[Span]` objects as args so it can be used for spans filter
  or ents filter

* Implement future entity ruler as span ruler

Implement a compatible `entity_ruler` as `future_entity_ruler` using
`SpanRuler` as the underlying component:
* Add `sort_key` and `sort_reverse` to allow the sorting behavior to be
  customized. (Necessary for the same sorting/filtering as in
  `EntityRuler`.)
* Implement `overwrite_overlapping_ents_filter` and
  `preserve_existing_ents_filter` to support
  `EntityRuler.overwrite_ents` settings.
* Add `remove_by_id` to support `EntityRuler.remove` functionality.
* Refactor `entity_ruler` tests to parametrize all tests to test both
  `entity_ruler` and `future_entity_ruler`
* Implement `SpanRuler.token_patterns` and `SpanRuler.phrase_patterns`
  properties.

Additional changes:

* Move all config settings to top-level attributes to avoid duplicating
  settings in the config vs. `span_ruler/cfg`. (Also avoids a lot of
  casting.)

* Format

* Fix filter make method name

* Refactor to use same error for removing by label or ID

* Also provide existing spans to spans filter

* Support ids property

* Remove token_patterns and phrase_patterns

* Update docstrings

* Add span ruler docs

* Fix types

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Move sorting into filters

* Check for all tokens in seen tokens in entity ruler filters

* Remove registered sort key

* Set Token.ent_id in a backwards-compatible way in Doc.set_ents

* Remove sort options from API docs

* Update docstrings

* Rename entity ruler filters

* Fix and parameterize scoring

* Add id to Span API docs

* Fix typo in API docs

* Include explicit labeled=True for scorer

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-06-02 13:12:53 +02:00
richardpaulhudson
d4218366c5 Update Holmes entry in universe.json 2022-05-30 18:05:26 +02:00
schaeran
f5952c0851 update spaCy Universe: spacytextblob (code example) 2022-05-12 18:23:00 +02:00
Richard Hudson
c32e1a0079
Updated Coreferee Universe entry (#10763) 2022-05-06 13:21:39 +02:00
vincent d warmerdam
f3de976513
Update universe.json to Include spaCy video #6 (#10723)
* Update universe.json

I noticed that episode 6 was missing, so I added it.

* Update universe.json

* Update universe.json
2022-05-02 13:35:14 +02:00
Adriane Boyd
497a708c71
Docs for v3.3 (#10628)
* Temporarily disable CI tests

* Start v3.3 website updates

* Add trainable lemmatizer to pipeline design

* Fix Vectors.most_similar

* Add floret vector info to pipeline design

* Add Lower and Upper Sorbian

* Add span to sidebar

* Work on release notes

* Copy from release notes

* Update pipeline design graphic

* Upgrading note about Doc.from_docs

* Add tables and details

* Update website/docs/models/index.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix da lemma acc

* Add minimal intro, various updates

* Round lemma acc

* Add section on floret / word lists

* Add new pipelines table, minor edits

* Fix displacy spans example title

* Clarify adding non-trainable lemmatizer

* Update adding-languages URLs

* Revert "Temporarily disable CI tests"

This reverts commit 1dee505920.

* Spell out words/sec

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-04-28 14:09:35 +02:00
Mike
3b208197c3
Fixed example for spacy_syllables (#10705)
There was a typo in the example for the spacy_syllables project.
2022-04-25 16:40:54 +02:00
Schero1994
d622883a42
Adding and updating content in the spacy universe (#10493)
* signing contributor agreement

* adding new content to the spaCy universe

* updating outdated example codes

* resolving issues for the PR

* resolve review for klayers

* remove contributor-agreement file from the PR

* Update code example of spaCySentiWS

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy-sentiws code example

Co-authored-by: schaeran <schaeran1994@gmail.com>
Co-authored-by: schaeran <schaeran@explosion.ai>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-04-15 15:36:54 +02:00
Philip Vollet
e63a5d4888
Update newsletter id (#10655) 2022-04-14 13:34:01 +02:00
Schero1994
caf8528af7
Batch #1 | spaCy universe cleanup (#10642)
* delete universe object: wmd-relax

* delete universe object: spaCy.jl

* delete universe object: saber

* delete universe object: languagecrunch

* delete universe object: gracyql

* delete universe object: ExcelCy

* delete universe object: EpiTator

Co-authored-by: schaeran <schaeran1994@gmail.com>
2022-04-14 10:08:19 +02:00
David Berenstein
d4196a62f1
added crosslingual coreference to spacy universe without additional commits (#10580)
* added crosslingual coreference to spacy universe

* Updated example to introduce batching example.

Co-authored-by: David Berenstein <david.berenstein@pandoraintelligence.com>
2022-04-08 08:23:58 +02:00
Bram Vanroy
f966bf6a15
Update to spacy_conll in universe (#10617)
* update to spacy_conll

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-04-04 17:57:52 +02:00
Adriane Boyd
85778dfcf4
Add edit tree lemmatizer (#10231)
* Add edit tree lemmatizer

Co-authored-by: Daniël de Kok <me@danieldk.eu>

* Hide edit tree lemmatizer labels

* Use relative imports

* Switch to single quotes in error message

* Type annotation fixes

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Reformat edit_tree_lemmatizer with black

* EditTreeLemmatizer.predict: take Iterable

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Validate edit trees during deserialization

This change also changes the serialized representation. Rather than
mirroring the deep C structure, we use a simple flat union of the match
and substitution node types.

* Move edit_trees to _edit_tree_internals

* Fix invalid edit tree format error message

* edit_tree_lemmatizer: remove outdated TODO comment

* Rename factory name to trainable_lemmatizer

* Ignore type instead of casting truths to List[Union[Ints1d, Floats2d, List[int], List[str]]] for thinc v8.0.14

* Switch to Tagger.v2

* Add documentation for EditTreeLemmatizer

* docs: Fix 3.2 -> 3.3 somewhere

* trainable_lemmatizer documentation fixes

* docs: EditTreeLemmatizer is in edit_tree_lemmatizer.py

Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Daniël de Kok <me@github.danieldk.eu>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-03-28 11:13:50 +02:00
David Berenstein
ed2ac34a8a
added Concise Concepts to spaCy universe (#10499)
* Update universe.json

added classy-classification to Spacy universe

* Update universe.json

added classy-classification to the spacy universe resources

* Update universe.json

corrected a small typo in json

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update universe.json

processed merge feedback

* Update universe.json

* updated information for Classy Classificaiton 

Made a more comprehensible and easy description for Classy Classification based on feedback of Philip Vollet to prepare for sharing.

* added note about examples

* corrected for wrong formatting changes

* Update website/meta/universe.json with small typo correction

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* resolved another typo

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* added Concise Concepts package to spaCy universe.

* updated example code Concise Concepts

* updated description for Concise Concepts

* updated PR with more visually appealing examples

SO to koaning for the suggestions.

* corrected for small json typo's in concise concepts

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-03-24 18:00:12 +01:00
Basile Dura
107bab56b5
docs: add EDS-NLP to spaCy universe (#10489)
* docs: add EDS-NLP to spaCy universe

* fix: remove "standalone" tag for EDS-NLP

Co-authored-by: Basile Dura <basile.dura-ext@aphp.fr>
2022-03-21 11:03:39 +01:00
Lj Miranda
0b02dc4c57
Fix mixed-up parameters for spacy-conll (#10516) 2022-03-18 08:56:21 +01:00
David Berenstein
e021dc6279
Updated explenation for for classy classification (#10484)
* Update universe.json

added classy-classification to Spacy universe

* Update universe.json

added classy-classification to the spacy universe resources

* Update universe.json

corrected a small typo in json

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update universe.json

processed merge feedback

* Update universe.json

* updated information for Classy Classificaiton 

Made a more comprehensible and easy description for Classy Classification based on feedback of Philip Vollet to prepare for sharing.

* added note about examples

* corrected for wrong formatting changes

* Update website/meta/universe.json with small typo correction

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* resolved another typo

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-03-15 16:42:33 +01:00
vincent d warmerdam
610001e8c7
Update universe.json (#10490)
The project moved away from Rasa and into my personal GitHub account.
2022-03-15 11:12:04 +01:00
Adriane Boyd
b2bbefd0b5
Add Finnish, Korean, and Swedish models and Korean support notes (#10355)
* Add Finnish, Korean, and Swedish models to website

* Add Korean language support notes
2022-03-07 17:03:45 +01:00