Commit Graph

14871 Commits

Author SHA1 Message Date
Ines Montani
f2d19e6dc2 Merge pull request #9003 from bbieniek/add-spacy-api-v3 [ci skip] 2021-08-20 11:23:50 +10:00
Paul O'Leary McCann
4ed5d9ad5a Add notes on preparing training data to docs (#8964)
* Add training data section

Not entirely sure this is in the right location on the page - maybe it
should be after quickstart?

* Add pointer from binary format to training data section

* Minor cleanup

* Add to ToC, fix filename

* Update website/docs/usage/training.md

Co-authored-by: Ines Montani <ines@ines.io>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Move the training data section further down the page

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Run prettier

Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-16 17:39:19 +02:00
Ines Montani
d65e03adae Merge pull request #8951 from HLasse/master 2021-08-16 11:41:53 +10:00
Ines Montani
647abe186c Merge pull request #8938 from explosion/docs/prodigy-v1-11-project [ci skip]
Update Prodigy project template for v1.11
2021-08-12 21:17:14 +10:00
Ines Montani
c581848cbb Merge pull request #8910 from DuyguA/patch-1 [ci skip]
updated unv json for new book
2021-08-09 23:13:17 +10:00
Paul O'Leary McCann
35255786a1 Fix #8902 (bad link in docs)
typo fix
2021-08-09 13:59:59 +02:00
Adriane Boyd
c1caa47aa7 Support list values and INTERSECTS in Matcher (#8784)
* Support list values and IS_INTERSECT in Matcher

* Support list values as token attributes for set operators, not just as
pattern values.

* Add `IS_INTERSECT` operator.

* Fix incorrect `ISSUBSET` and `ISSUPERSET` in schema and docs.

* Rename IS_INTERSECT to INTERSECTS
2021-08-02 19:41:10 +02:00
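
The set operators named in this commit can be illustrated with a minimal plain-Python sketch. It is not spaCy's implementation: the helper names `as_set` and `check_set_operator` are hypothetical, and the semantics (subset, superset, non-empty intersection between a token's value or list of values and the pattern's list) are assumed from the operator names in the commit message.

```python
from typing import Any, Iterable


def as_set(value: Any) -> set:
    """Wrap a single value in a set, or convert a list-like value to a set."""
    if isinstance(value, (list, tuple, set, frozenset)):
        return set(value)
    return {value}


def check_set_operator(operator: str, token_value: Any, pattern_values: Iterable) -> bool:
    """Hypothetical helper: compare a token attribute with a pattern's list of values."""
    token_set = as_set(token_value)
    pattern_set = set(pattern_values)
    if operator == "IS_SUBSET":
        # every token value appears in the pattern list
        return token_set <= pattern_set
    if operator == "IS_SUPERSET":
        # every pattern value appears among the token values
        return token_set >= pattern_set
    if operator == "INTERSECTS":
        # at least one value is shared
        return bool(token_set & pattern_set)
    raise ValueError(f"unknown operator: {operator}")
```

Under these assumptions, a token attribute holding `["Number=Sing", "Case=Nom"]` intersects with a pattern list `["Case=Nom", "Case=Acc"]` but is neither a subset nor a superset of it.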
Ines Montani
d79dbd0624 Merge pull request #8844 from thomashacker/bugfix/fix-doc-transformer-typo [ci skip]
Fix typo in Tok2VecTransformer example config
2021-07-30 09:11:24 +10:00
Ines Montani
4ddee5e84c Merge pull request #8841 from adrianeboyd/docs/ent-id-sep [ci skip]
Fix formatting of ent_id_sep in EntityRuler API docs
2021-07-30 09:11:15 +10:00
Ines Montani
cf9b671566 Merge pull request #8840 from polm/docs/evaluate-speed [ci skip] 2021-07-30 09:11:05 +10:00
Ines Montani
03a742f332 Merge pull request #8814 from polm/docs/migrate-lexeme-tables [ci skip] 2021-07-29 17:19:44 +10:00
Adriane Boyd
9e9611233f Remove labels from textcat component config example (#8815) 2021-07-27 13:15:33 +02:00
Paul O'Leary McCann
de5bc8a0e1 Update subset/superset docs (#8795)
* Update subset/superset docs

* Update website/docs/usage/rule-based-matching.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-27 13:15:27 +02:00
Ines Montani
cf3855ae05 Merge pull request #8806 from Ledenel/master [ci skip]
fix typo
2021-07-27 12:15:44 +10:00
Ines Montani
5c762e08d7 Merge pull request #8808 from kevinlu1248/master [ci skip]
Changed a CLI command in data-formats.md due to erroneous information
2021-07-27 12:15:35 +10:00
svlandeg
f4f270940c Merge remote-tracking branch 'upstream/master' into spacy.io 2021-07-20 16:14:16 +02:00
Adriane Boyd
d48c01a6f7 Remove extraneous grc test file (#8768) 2021-07-20 15:51:15 +02:00
Sofie Van Landeghem
ffaead8fe0 bump to 3.1.1 2021-07-19 14:48:27 +02:00
Sofie Van Landeghem
83e27d262e negative tag annotation (#8731)
* unit test to unlearn tag via negative annotation

* bump thinc to 8.0.8
2021-07-19 14:39:11 +02:00
Adriane Boyd
0e4b96c97e Update lexeme ranks for loaded vectors (#8640)
Update the ranks for any lexemes that have been added to the vocab
before the vectors are added to the model.
2021-07-19 18:25:54 +10:00
Adriane Boyd
e532c69475 Update Language.replace_pipe for disabled components (#8729)
* Fix the index where the replacement is inserted to account for
disabled components
* Allow `Language.replace_pipe` to replace disabled components
2021-07-19 18:06:12 +10:00
Kenneth Enevoldsen
2880ae70b0 removed outdated spacy version for spacymoji
From the documentation of spacymoji (and the requirements.txt) it seems like it is not only for version 2.
2021-07-18 19:19:55 +09:00
Kenneth Enevoldsen
812746464b fixed GitHub link and thumbnail
Sorry, I seem to have misunderstood that the GitHub reference shouldn't be a link.
2021-07-18 19:19:37 +09:00
Paul O'Leary McCann
d717593eb7 Merge pull request #8754 from KennethEnevoldsen/patch-1
[minor] removed outdated spacy version for spacymoji
2021-07-18 19:17:33 +09:00
Paul O'Leary McCann
ac67639eaf Merge pull request #8755 from KennethEnevoldsen/patch-2
fixed GitHub link and thumbnail
2021-07-18 19:14:57 +09:00
Kenneth Enevoldsen
5d6aed0773 fixed GitHub link and thumbnail
Sorry, I seem to have misunderstood that the GitHub reference shouldn't be a link.
2021-07-18 10:22:00 +02:00
Ines Montani
f90482d077 Tidy up and auto-format 2021-07-18 15:44:56 +10:00
Ines Montani
98cf872e11 Fix JSON [ci skip] 2021-07-18 13:21:43 +10:00
Ines Montani
313f55e560 Fix JSON [ci skip] 2021-07-18 13:21:33 +10:00
Ines Montani
a792e1119f Merge pull request #8702 from KennethEnevoldsen/master [ci skip] 2021-07-18 13:19:09 +10:00
Ines Montani
51e5903d6f Merge pull request #8702 from KennethEnevoldsen/master [ci skip] 2021-07-18 13:18:42 +10:00
Kenneth Enevoldsen
8546948fba removed outdated spacy version for spacymoji
From the documentation of spacymoji (and the requirements.txt) it seems like it is not only for version 2.
2021-07-17 15:19:43 +02:00
Kenneth Enevoldsen
a0e0ccdb46 Update website/meta/universe.json
Co-authored-by: Ines Montani <ines@ines.io>
2021-07-17 07:14:46 +02:00
Ines Montani
c0f436efbc Merge pull request #8735 from explosion/autoblack 2021-07-17 13:46:17 +10:00
Ines Montani
483f3175cb Tidy up [ci skip] 2021-07-17 13:43:15 +10:00
Ines Montani
15e6578f7d Adjust formatting 2021-07-17 10:49:13 +10:00
Mario Šaško
47c5a63a83 Add TakeLab/spacy-udpipe to Universe (#8698)
* Add TakeLab/spacy-udpipe to universe

* Add SCA

* Sign SCA
2021-07-16 11:18:09 +02:00
Mario Šaško
1ba2e8a646 Add TakeLab/spacy-udpipe to Universe (#8698)
* Add TakeLab/spacy-udpipe to universe

* Add SCA

* Sign SCA
2021-07-16 11:15:52 +02:00
explosion-bot
eff3d1088b Auto-format code with black 2021-07-16 08:03:36 +00:00
Adriane Boyd
e76e2addd1 Remove TrainablePipe as base class for Lemmatizer in API docs (#8725) 2021-07-15 16:42:14 +02:00
Adriane Boyd
f5acc48111 Remove TrainablePipe as base class for Lemmatizer in API docs (#8725) 2021-07-15 16:41:36 +02:00
Adriane Boyd
ac45c7c045 Add pre-commit to ignored requirements (#8728) 2021-07-15 16:41:15 +02:00
jmyerston
993b0fab0e Added ancient Greek language support (#8606)
* Add ancient Greek language support

Initial commit

* Contributor Agreement

* grc tokenizer test added and files formatted with black, unnecessary import removed

Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Commas in lists fixed. __init__.py added to test

* Update lex_attrs.py

* Update stop_words.py

* Update stop_words.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-07-15 10:27:17 +02:00
Sofie Van Landeghem
77859beb99 spacy.ngram_range_suggester.v1 (#8699) 2021-07-15 10:01:22 +02:00
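
As a rough illustration of what an n-gram range suggester does, the sketch below enumerates every span of token offsets whose length falls in a given range. It is plain Python rather than the registered spaCy function: the name `ngram_range_spans` and its parameters are hypothetical, and the real suggester operates on documents rather than on a bare token count.

```python
from typing import List, Tuple


def ngram_range_spans(n_tokens: int, min_size: int, max_size: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: list (start, end) offsets for all n-grams in a size range."""
    spans = []
    for size in range(min_size, max_size + 1):
        # an n-gram of this size can start at any offset that leaves room for it
        for start in range(0, n_tokens - size + 1):
            spans.append((start, start + size))
    return spans
```

For a three-token text with sizes 1 to 2, this yields the three unigrams followed by the two bigrams; sizes larger than the text produce no spans.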
Julien Rossi
e117573822 Adding noun_chunks to the DUTCH language model (nl) (#8529)
* implement noun_chunks for dutch language

* copy/paste FR and SV syntax iterators to accommodate UD tags
* added tests with dutch text
* signed contributor agreement

* 🐛 fix noun chunks generator

* built from scratch
* define noun chunk as a single Noun-Phrase
* includes some corner cases debugging (incorrect POS tagging)
* test with provided annotated sample (POS, DEP)

* fix failing test

* CI pipeline did not like the added sample file
* add the sample as a pytest fixture

* Update spacy/lang/nl/syntax_iterators.py

* Update spacy/lang/nl/syntax_iterators.py

Code readability

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/tests/lang/nl/test_noun_chunks.py

correct comment

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* finalize code

* change "if next_word" into "if next_word is not None"

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-07-14 14:01:02 +02:00
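
The last bullet of this commit replaces a truthiness check with an explicit `is not None` check. The commit message does not say why, so the sketch below only shows the general Python behaviour that makes the two checks differ: an object defining `__len__` is falsy when its length is zero. The `Word` class and both helper functions are hypothetical stand-ins, not spaCy code.

```python
from typing import Optional


class Word:
    """Hypothetical token-like object whose length is its text length."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __len__(self) -> int:
        return len(self.text)


def exists_by_truthiness(next_word: Optional[Word]) -> bool:
    # "if next_word": consults __len__, so a zero-length word counts as missing
    return bool(next_word)


def exists_by_identity(next_word: Optional[Word]) -> bool:
    # "if next_word is not None": only a genuinely absent word counts as missing
    return next_word is not None
```

With a zero-length `Word("")`, the truthiness check reports the word as absent while the identity check reports it as present; both agree for `None` and for non-empty words.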
Ines Montani
8ca6c58625 Merge pull request #8703 from thomashacker/update/spacy-stanza [ci skip]
Update spacy-stanza universe.json
2021-07-13 19:03:56 +10:00
Ines Montani
2a8eeed5da Merge pull request #8703 from thomashacker/update/spacy-stanza [ci skip]
Update spacy-stanza universe.json
2021-07-13 19:03:42 +10:00
thomashacker
aafb89df78 Update universe.json code_example 2021-07-13 10:22:49 +02:00
KennethEnevoldsen
e5127992a0 added agreement 2021-07-13 10:11:02 +02:00
Kenneth Enevoldsen
94ce904e10 added missing comma 2021-07-13 09:59:34 +02:00