Commit Graph

15332 Commits

Author SHA1 Message Date
Adriane Boyd
6bbc2b1956
Reload train corpus in debug data after initialize (#8776) 2021-07-21 22:38:40 +02:00
Paul O'Leary McCann
1d1679d431 Minor speedup
This continue should be a break. The current form doesn't cause errors
but using a break will be a bit faster.
2021-07-21 19:50:10 +09:00
svlandeg
f4f270940c Merge remote-tracking branch 'upstream/master' into spacy.io 2021-07-20 16:14:16 +02:00
Adriane Boyd
d48c01a6f7
Remove extraneous grc test file (#8768) 2021-07-20 15:51:15 +02:00
Sofie Van Landeghem
ffaead8fe0
bump to 3.1.1 2021-07-19 14:48:27 +02:00
Sofie Van Landeghem
83e27d262e
negative tag annotation (#8731)
* unit test to unlearn tag via negative annotation

* bump thinc to 8.0.8
2021-07-19 14:39:11 +02:00
Adriane Boyd
0e4b96c97e
Update lexeme ranks for loaded vectors (#8640)
Update the ranks for any lexemes that have been added to the vocab
before the vectors are added to the model.
2021-07-19 18:25:54 +10:00
Adriane Boyd
e532c69475
Update Language.replace_pipe for disabled components (#8729)
* Fix the index where the replacement in inserted to account for
disabled components
* Allow `Language.replace_pipe` to replace disabled components
2021-07-19 18:06:12 +10:00
Paul O'Leary McCann
a151c62d13 Add sentence map test 2021-07-19 13:05:26 +09:00
Paul O'Leary McCann
3ed0fae671 Add multi-sentence mention test
Also formatting.
2021-07-19 13:00:16 +09:00
Paul O'Leary McCann
8bd0474730 Run black 2021-07-18 20:20:22 +09:00
Paul O'Leary McCann
bc081c24fa Add full traditional scoring
This calculates scores as an average of three metrics. As noted in the
code, these metrics all have issues, but we want to use them to match up
with prior work.

This should be replaced with some simpler default scoring and the scorer
here should be moved to an external project to be passed in just for
generating the traditional scores.
2021-07-18 20:13:10 +09:00
Kenneth Enevoldsen
2880ae70b0 removed outdated spacy version for spacymoji
From the documentation of spacymoji (and the requirements.txt) it seems like it is not only for version 2.
2021-07-18 19:19:55 +09:00
Kenneth Enevoldsen
812746464b fixed GitHub link and thumbnail
Sorry, I seem to have misunderstood that the GitHub reference shouldn't be a link.
2021-07-18 19:19:37 +09:00
Paul O'Leary McCann
d717593eb7
Merge pull request #8754 from KennethEnevoldsen/patch-1
[minor] removed outdated spacy version for spacymoji
2021-07-18 19:17:33 +09:00
Paul O'Leary McCann
a4531be099 Add simple mention test 2021-07-18 19:15:32 +09:00
Paul O'Leary McCann
ac67639eaf
Merge pull request #8755 from KennethEnevoldsen/patch-2
fixed GitHub link and thumbnail
2021-07-18 19:14:57 +09:00
Kenneth Enevoldsen
5d6aed0773
fixed GitHub link and thumbnail
Sorry, I seem to have misunderstood that the GitHub reference shouldn't be a link.
2021-07-18 10:22:00 +02:00
Ines Montani
f90482d077 Tidy up and auto-format 2021-07-18 15:44:56 +10:00
Ines Montani
98cf872e11 Fix JSON [ci skip] 2021-07-18 13:21:43 +10:00
Ines Montani
313f55e560 Fix JSON [ci skip] 2021-07-18 13:21:33 +10:00
Ines Montani
a792e1119f Merge pull request #8702 from KennethEnevoldsen/master [ci skip] 2021-07-18 13:19:09 +10:00
Ines Montani
51e5903d6f
Merge pull request #8702 from KennethEnevoldsen/master [ci skip] 2021-07-18 13:18:42 +10:00
Kenneth Enevoldsen
8546948fba
removed outdated spacy version for spacymoji
From the documentation of spacymoji (and the requirements.txt) it seems like it is not only for version 2.
2021-07-17 15:19:43 +02:00
Kenneth Enevoldsen
a0e0ccdb46
Update website/meta/universe.json
Co-authored-by: Ines Montani <ines@ines.io>
2021-07-17 07:14:46 +02:00
Ines Montani
c0f436efbc
Merge pull request #8735 from explosion/autoblack 2021-07-17 13:46:17 +10:00
Ines Montani
483f3175cb Tidy up [ci skip] 2021-07-17 13:43:15 +10:00
Ines Montani
15e6578f7d
Adjust formatting 2021-07-17 10:49:13 +10:00
Mario Šaško
47c5a63a83 Add TakeLab/spacy-udpipe to Universe (#8698)
* Add TakeLab/spacy-udpipe to universe

* Add SCA

* Sign SCA
2021-07-16 11:18:09 +02:00
Mario Šaško
1ba2e8a646
Add TakeLab/spacy-udpipe to Universe (#8698)
* Add TakeLab/spacy-udpipe to universe

* Add SCA

* Sign SCA
2021-07-16 11:15:52 +02:00
explosion-bot
eff3d1088b Auto-format code with black 2021-07-16 08:03:36 +00:00
Adriane Boyd
e76e2addd1 Remove TrainablePipe as base class for Lemmatizer in API docs (#8725) 2021-07-15 16:42:14 +02:00
Adriane Boyd
f5acc48111
Remove TrainablePipe as base class for Lemmatizer in API docs (#8725) 2021-07-15 16:41:36 +02:00
Adriane Boyd
ac45c7c045
Add pre-commit to ignored requirements (#8728) 2021-07-15 16:41:15 +02:00
Paul O'Leary McCann
9b63cbb775 Add extract spans import 2021-07-15 18:16:53 +09:00
jmyerston
993b0fab0e
Added ancient Greek language support (#8606)
* Add ancient Greek language support

Initial commit

* Contributor Agreement

* grc tokenizer test added  and files formatted with black, unnecessary import removed

Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Commas in lists fixed. __init__py added to test

* Update lex_attrs.py

* Update stop_words.py

* Update stop_words.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-07-15 10:27:17 +02:00
Sofie Van Landeghem
77859beb99
spacy.ngram_range_suggester.v1 (#8699) 2021-07-15 10:01:22 +02:00
Julien Rossi
e117573822
Adding noun_chunks to the DUTCH language model (nl) (#8529)
*  implement noun_chunks for dutch language

* copy/paste FR and SV syntax iterators to accomodate UD tags
* added tests with dutch text
* signed contributor agreement

* 🐛 fix noun chunks generator

* built from scratch
* define noun chunk as a single Noun-Phrase
* includes some corner cases debugging (incorrect POS tagging)
* test with provided annotated sample (POS, DEP)

*  fix failing test

* CI pipeline did not like the added sample file
* add the sample as a pytest fixture

* Update spacy/lang/nl/syntax_iterators.py

* Update spacy/lang/nl/syntax_iterators.py

Code readability

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/tests/lang/nl/test_noun_chunks.py

correct comment

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* finalize code

* change "if next_word" into "if next_word is not None"

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-07-14 14:01:02 +02:00
Paul O'Leary McCann
e9626e38c1 Fix serialization test
This test was failing not because the thing it was testing wasn't
working, but because of the way span equality works. Span equality
relies on doc equality, and doc equality is object identity, so spans
from different docs will never be equal.
2021-07-14 18:37:34 +09:00
Paul O'Leary McCann
4a9dc00d86 Use relative indices for mentions
Was using batch absolute indices to manage mentions, but extract_spans
expects doc-relative ones.
2021-07-14 18:36:18 +09:00
Paul O'Leary McCann
3684f7fdfd Remove comment from fixed test 2021-07-14 18:22:14 +09:00
Paul O'Leary McCann
f1796e4af7 Fix mention list bug
There was an off-by-one error in how mentions are generated that would
affect mentions at the end of a sentence. This was pretty nasty.
2021-07-14 18:19:00 +09:00
Ines Montani
8ca6c58625 Merge pull request #8703 from thomashacker/update/spacy-stanza [ci skip]
Update spacy-stanza universe.json
2021-07-13 19:03:56 +10:00
Ines Montani
2a8eeed5da
Merge pull request #8703 from thomashacker/update/spacy-stanza [ci skip]
Update spacy-stanza universe.json
2021-07-13 19:03:42 +10:00
thomashacker
aafb89df78 Update universe.json code_example 2021-07-13 10:22:49 +02:00
KennethEnevoldsen
e5127992a0 added agreement 2021-07-13 10:11:02 +02:00
Kenneth Enevoldsen
94ce904e10
added missing comma 2021-07-13 09:59:34 +02:00
Kenneth Enevoldsen
a81fcc81b0
added dacy to universe 2021-07-13 09:54:08 +02:00
Adriane Boyd
f9fd2889b7
Use 0-vector for OOV lexemes (#8639) 2021-07-13 14:48:12 +10:00
Edward
8233359225
Fix preservation of spacy package meta (#8663)
* update package meta with existing_meta and nlp_meta

* Add spaCy contributor agreement

* Added more info when creating readme
2021-07-12 11:18:52 +02:00