Commit Graph

15223 Commits

Author SHA1 Message Date
Paul O'Leary McCann
de5bc8a0e1 Update subset/superset docs (#8795)
* Update subset/superset docs

* Update website/docs/usage/rule-based-matching.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-27 13:15:27 +02:00
Adriane Boyd
8547514aa4
Remove labels from textcat component config example (#8815) 2021-07-27 13:14:38 +02:00
Paul O'Leary McCann
76ac95923a Add note to migration guide about lexeme tables (fix #7290)
This just adds the resolution from #6388 to the docs.
2021-07-27 19:19:25 +09:00
Paul O'Leary McCann
67ecdcc3ac
Update subset/superset docs (#8795)
* Update subset/superset docs

* Update website/docs/usage/rule-based-matching.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-27 12:08:46 +02:00
Adriane Boyd
81d3a1edb1
Use tokenizer URL_MATCH pattern in LIKE_URL (#8765) 2021-07-27 12:07:01 +02:00
Adriane Boyd
4f28190afe
Merge pull request #8813 from adrianeboyd/chore/develop-v3.2
Update develop for v3.2
2021-07-27 11:26:18 +02:00
Ines Montani
7f21c7dfa2
Merge pull request #8794 from explosion/autoblack
Auto-format code with black
2021-07-27 12:17:15 +10:00
Ines Montani
34c401f04f
Merge pull request #8801 from polm/fix/respect-no-skip (fixes #8796)
Respect the no_skip value
2021-07-27 12:16:47 +10:00
Ines Montani
cf3855ae05 Merge pull request #8806 from Ledenel/master [ci skip]
fix typo
2021-07-27 12:15:44 +10:00
Ines Montani
5c762e08d7 Merge pull request #8808 from kevinlu1248/master [ci skip]
Changed a CLI command in data-formats.md due to erroneous information
2021-07-27 12:15:35 +10:00
Ines Montani
134cb06af3
Merge pull request #8808 from kevinlu1248/master [ci skip]
Changed a CLI command in data-formats.md due to erroneous information
2021-07-27 12:15:16 +10:00
Ines Montani
9bf0d6f2fd
Merge pull request #8806 from Ledenel/master [ci skip]
fix typo
2021-07-27 12:14:22 +10:00
Kevin Lu
4a8e9e4e4e
Update data-formats.md 2021-07-25 22:58:53 -07:00
Ledenel
413f745c68 fix broken example in spaCy universe Chatterbot 2021-07-25 15:53:32 +00:00
Paul O'Leary McCann
284b530c63 Respect the no_skip value
Seems like the logic for this was just left out. See #8796.
2021-07-24 15:31:17 +09:00
explosion-bot
a58ab6ea22 Auto-format code with black 2021-07-23 08:04:09 +00:00
Adriane Boyd
6bbc2b1956
Reload train corpus in debug data after initialize (#8776) 2021-07-21 22:38:40 +02:00
svlandeg
f4f270940c Merge remote-tracking branch 'upstream/master' into spacy.io 2021-07-20 16:14:16 +02:00
Adriane Boyd
d48c01a6f7
Remove extraneous grc test file (#8768) 2021-07-20 15:51:15 +02:00
Sofie Van Landeghem
ffaead8fe0
bump to 3.1.1 2021-07-19 14:48:27 +02:00
Sofie Van Landeghem
83e27d262e
negative tag annotation (#8731)
* unit test to unlearn tag via negative annotation

* bump thinc to 8.0.8
2021-07-19 14:39:11 +02:00
Adriane Boyd
0e4b96c97e
Update lexeme ranks for loaded vectors (#8640)
Update the ranks for any lexemes that have been added to the vocab
before the vectors are added to the model.
2021-07-19 18:25:54 +10:00
Adriane Boyd
e532c69475
Update Language.replace_pipe for disabled components (#8729)
* Fix the index where the replacement is inserted to account for
disabled components
* Allow `Language.replace_pipe` to replace disabled components
2021-07-19 18:06:12 +10:00
Kenneth Enevoldsen
2880ae70b0 removed outdated spacy version for spacymoji
From the documentation of spacymoji (and the requirements.txt) it seems like it is not only for version 2.
2021-07-18 19:19:55 +09:00
Kenneth Enevoldsen
812746464b fixed GitHub link and thumbnail
Sorry, I seem to have misunderstood that the GitHub reference shouldn't be a link.
2021-07-18 19:19:37 +09:00
Paul O'Leary McCann
d717593eb7
Merge pull request #8754 from KennethEnevoldsen/patch-1
[minor] removed outdated spacy version for spacymoji
2021-07-18 19:17:33 +09:00
Paul O'Leary McCann
ac67639eaf
Merge pull request #8755 from KennethEnevoldsen/patch-2
fixed GitHub link and thumbnail
2021-07-18 19:14:57 +09:00
Kenneth Enevoldsen
5d6aed0773
fixed GitHub link and thumbnail
Sorry, I seem to have misunderstood that the GitHub reference shouldn't be a link.
2021-07-18 10:22:00 +02:00
Ines Montani
f90482d077 Tidy up and auto-format 2021-07-18 15:44:56 +10:00
Ines Montani
98cf872e11 Fix JSON [ci skip] 2021-07-18 13:21:43 +10:00
Ines Montani
313f55e560 Fix JSON [ci skip] 2021-07-18 13:21:33 +10:00
Ines Montani
a792e1119f Merge pull request #8702 from KennethEnevoldsen/master [ci skip] 2021-07-18 13:19:09 +10:00
Ines Montani
51e5903d6f
Merge pull request #8702 from KennethEnevoldsen/master [ci skip] 2021-07-18 13:18:42 +10:00
Kenneth Enevoldsen
8546948fba
removed outdated spacy version for spacymoji
From the documentation of spacymoji (and the requirements.txt) it seems like it is not only for version 2.
2021-07-17 15:19:43 +02:00
Kenneth Enevoldsen
a0e0ccdb46
Update website/meta/universe.json
Co-authored-by: Ines Montani <ines@ines.io>
2021-07-17 07:14:46 +02:00
Ines Montani
c0f436efbc
Merge pull request #8735 from explosion/autoblack 2021-07-17 13:46:17 +10:00
Ines Montani
483f3175cb Tidy up [ci skip] 2021-07-17 13:43:15 +10:00
Ines Montani
15e6578f7d
Adjust formatting 2021-07-17 10:49:13 +10:00
Mario Šaško
47c5a63a83 Add TakeLab/spacy-udpipe to Universe (#8698)
* Add TakeLab/spacy-udpipe to universe

* Add SCA

* Sign SCA
2021-07-16 11:18:09 +02:00
Mario Šaško
1ba2e8a646
Add TakeLab/spacy-udpipe to Universe (#8698)
* Add TakeLab/spacy-udpipe to universe

* Add SCA

* Sign SCA
2021-07-16 11:15:52 +02:00
explosion-bot
eff3d1088b Auto-format code with black 2021-07-16 08:03:36 +00:00
Adriane Boyd
e76e2addd1 Remove TrainablePipe as base class for Lemmatizer in API docs (#8725) 2021-07-15 16:42:14 +02:00
Adriane Boyd
f5acc48111
Remove TrainablePipe as base class for Lemmatizer in API docs (#8725) 2021-07-15 16:41:36 +02:00
Adriane Boyd
ac45c7c045
Add pre-commit to ignored requirements (#8728) 2021-07-15 16:41:15 +02:00
jmyerston
993b0fab0e
Added ancient Greek language support (#8606)
* Add ancient Greek language support

Initial commit

* Contributor Agreement

* grc tokenizer test added and files formatted with black, unnecessary import removed

Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Commas in lists fixed. __init__.py added to test

* Update lex_attrs.py

* Update stop_words.py

* Update stop_words.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-07-15 10:27:17 +02:00
Sofie Van Landeghem
77859beb99
spacy.ngram_range_suggester.v1 (#8699) 2021-07-15 10:01:22 +02:00
Julien Rossi
e117573822
Adding noun_chunks to the DUTCH language model (nl) (#8529)
* implement noun_chunks for dutch language

* copy/paste FR and SV syntax iterators to accommodate UD tags
* added tests with dutch text
* signed contributor agreement

* 🐛 fix noun chunks generator

* built from scratch
* define noun chunk as a single Noun-Phrase
* includes some corner cases debugging (incorrect POS tagging)
* test with provided annotated sample (POS, DEP)

* fix failing test

* CI pipeline did not like the added sample file
* add the sample as a pytest fixture

* Update spacy/lang/nl/syntax_iterators.py

* Update spacy/lang/nl/syntax_iterators.py

Code readability

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/tests/lang/nl/test_noun_chunks.py

correct comment

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* finalize code

* change "if next_word" into "if next_word is not None"

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-07-14 14:01:02 +02:00
Ines Montani
8ca6c58625 Merge pull request #8703 from thomashacker/update/spacy-stanza [ci skip]
Update spacy-stanza universe.json
2021-07-13 19:03:56 +10:00
Ines Montani
2a8eeed5da
Merge pull request #8703 from thomashacker/update/spacy-stanza [ci skip]
Update spacy-stanza universe.json
2021-07-13 19:03:42 +10:00
thomashacker
aafb89df78 Update universe.json code_example 2021-07-13 10:22:49 +02:00