spaCy/spacy/tests/lang
Julien Rossi e117573822
Adding noun_chunks to the DUTCH language model (nl) (#8529)
* implement noun_chunks for Dutch language

* copy/paste FR and SV syntax iterators to accommodate UD tags
* added tests with Dutch text
* signed contributor agreement

* 🐛 fix noun chunks generator

* built from scratch
* define noun chunk as a single Noun-Phrase
* includes debugging of some corner cases (incorrect POS tagging)
* test with provided annotated sample (POS, DEP)

* fix failing test

* CI pipeline did not like the added sample file
* add the sample as a pytest fixture

* Update spacy/lang/nl/syntax_iterators.py

* Update spacy/lang/nl/syntax_iterators.py

Code readability

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/tests/lang/nl/test_noun_chunks.py

correct comment

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* finalize code

* change "if next_word" into "if next_word is not None"

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-07-14 14:01:02 +02:00
am Tidy up and auto-format 2021-01-15 11:57:36 +11:00
ar Remove POS, TAG and LEMMA from tokenizer exceptions 2020-07-22 23:09:01 +02:00
bg Tidy up with flake8: imports, comparisons, etc. 2021-06-28 12:08:15 +02:00
bn Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
ca Auto-format code with black 2021-07-02 07:48:26 +00:00
cs Remove unicode declarations and update language data 2020-09-04 13:19:16 +02:00
da Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3 2021-01-14 11:49:58 +01:00
de Tidy up and auto-format 2020-09-29 21:39:28 +02:00
el Tidy up and auto-format 2020-09-29 21:39:28 +02:00
en Fix/fix en ordinals (#8028) 2021-05-07 10:26:42 +02:00
es Tidy up and auto-format 2020-09-29 21:39:28 +02:00
eu Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
fa Tidy up and auto-format 2020-09-29 21:39:28 +02:00
fi Tidy up code 2021-06-28 12:08:15 +02:00
fr Tidy up and auto-format 2020-09-29 21:39:28 +02:00
ga Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
gu Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
he Merge branch 'develop' into master-tmp 2020-09-04 13:15:36 +02:00
hi Auto-format [ci skip] 2020-10-15 10:08:53 +02:00
hu Tidy up and auto-format 2020-03-25 12:28:12 +01:00
hy Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
id Tidy up and auto-format 2020-09-29 21:39:28 +02:00
it Auto-format code with black 2021-07-02 07:48:26 +00:00
ja Tidy up and auto-format 2020-09-29 21:39:28 +02:00
ko Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
ky Tidy up and auto-format 2021-01-30 12:52:33 +11:00
lb Remove POS, TAG and LEMMA from tokenizer exceptions 2020-07-22 23:09:01 +02:00
lt Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
mk Tidy up and auto-format 2021-01-05 13:41:53 +11:00
ml Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
nb Tidy up and auto-format 2020-09-29 21:39:28 +02:00
ne Tidy up and auto-format 2020-09-29 21:39:28 +02:00
nl Adding noun_chunks to the DUTCH language model (nl) (#8529) 2021-07-14 14:01:02 +02:00
pl Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
pt Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
ro Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
ru Tidy up tests and docs 2020-09-21 20:43:54 +02:00
sa Tidy up and auto-format 2020-09-29 21:39:28 +02:00
sr Un-xfail passing tests 2019-12-25 18:02:20 +01:00
sv Tidy up and auto-format 2020-09-29 21:39:28 +02:00
th Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
ti Tidy up and auto-format 2021-01-15 11:57:36 +11:00
tr Tidy up and auto-format 2021-01-05 13:41:53 +11:00
tt Merge branch 'master' into develop 2020-02-18 14:47:23 +01:00
uk Tidy up with flake8: imports, comparisons, etc. 2021-06-28 12:08:15 +02:00
ur Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
vi Tidy up with flake8: imports, comparisons, etc. 2021-06-28 12:08:15 +02:00
yo Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
zh Tidy up and auto-format 2020-10-03 17:20:18 +02:00
__init__.py Revert #4334 2019-09-29 17:32:12 +02:00
test_attrs.py Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
test_initialize.py Fix Azerbaijani init, extend lang init tests (#8656) 2021-07-09 15:36:35 +02:00
test_lemmatizers.py Update Catalan language data (#8308) 2021-06-11 10:21:22 +02:00