spaCy/spacy/lang/nl
Julien Rossi e117573822
Adding noun_chunks to the DUTCH language model (nl) (#8529)
* implement noun_chunks for the Dutch language

* copy/paste FR and SV syntax iterators to accommodate UD tags
* added tests with Dutch text
* signed contributor agreement

* 🐛 fix noun chunks generator

* built from scratch
* define a noun chunk as a single noun phrase
* includes debugging of some corner cases (incorrect POS tagging)
* test with provided annotated sample (POS, DEP)

* fix failing test

* CI pipeline did not like the added sample file
* add the sample as a pytest fixture

* Update spacy/lang/nl/syntax_iterators.py

* Update spacy/lang/nl/syntax_iterators.py

Code readability

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/tests/lang/nl/test_noun_chunks.py

correct comment

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* finalize code

* change "if next_word" into "if next_word is not None"

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
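
The last change in the message above ("if next_word" into "if next_word is not None") can be illustrated with a generic Python sketch. This is not the code from the pull request: `SizedToken` is a hypothetical stand-in class, assumed here only to show how an object that defines `__len__` can be falsy even though it exists, which is the usual reason to prefer an explicit `None` check.

```python
class SizedToken:
    """Hypothetical token-like object; NOT spaCy's Token class."""

    def __init__(self, text):
        self.text = text

    def __len__(self):
        # Python falls back to __len__ for truth testing when
        # __bool__ is not defined, so an empty text makes this falsy.
        return len(self.text)


def describe_truthy(next_word):
    # Truthiness check: treats None AND zero-length objects as missing.
    if next_word:
        return "has next word"
    return "no next word"


def describe_explicit(next_word):
    # Identity check: only None counts as missing.
    if next_word is not None:
        return "has next word"
    return "no next word"


empty = SizedToken("")
print(describe_truthy(empty))    # zero-length object is treated as missing
print(describe_explicit(empty))  # object exists, so it is kept
print(describe_explicit(None))   # None is the only "missing" value
```

With the truthiness check, a present but zero-length object is silently skipped; the explicit comparison only skips a genuinely absent value.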
2021-07-14 14:01:02 +02:00
__init__.py Adding noun_chunks to the DUTCH language model (nl) (#8529) 2021-07-14 14:01:02 +02:00
examples.py Tidy up and auto-format 2020-02-18 15:38:18 +01:00
lemmatizer.py Fix Lemmatizer.get_lookups_config 2020-10-03 17:16:10 +02:00
lex_attrs.py Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
punctuation.py Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
stop_words.py Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
syntax_iterators.py Adding noun_chunks to the DUTCH language model (nl) (#8529) 2021-07-14 14:01:02 +02:00
tokenizer_exceptions.py Tidy up and move noun_chunks, token_match, url_match 2020-07-22 22:18:46 +02:00