spaCy/spacy/lang/ca
Adriane Boyd b98d216205
Update Catalan language data (#8308)
* Update Catalan language data

Update Catalan language data based on contributions from the Text Mining
Unit at the Barcelona Supercomputing Center:

https://github.com/TeMU-BSC/spacy4release/tree/main/lang_data

* Update tokenizer settings for UD Catalan AnCora

Update for UD Catalan AnCora v2.7 with merged multi-word tokens.

* Update test

* Move prefix pattern to more generic infix pattern

* Clean up
2021-06-11 10:21:22 +02:00
__init__.py              Update Catalan language data (#8308)  2021-06-11 10:21:22 +02:00
examples.py              Tidy up and auto-format               2020-02-18 15:38:18 +01:00
lemmatizer.py            Update Catalan language data (#8308)  2021-06-11 10:21:22 +02:00
lex_attrs.py             Drop Python 2.7 and 3.5 (#4828)       2019-12-22 01:53:56 +01:00
punctuation.py           Update Catalan language data (#8308)  2021-06-11 10:21:22 +02:00
stop_words.py            Drop Python 2.7 and 3.5 (#4828)       2019-12-22 01:53:56 +01:00
syntax_iterators.py      Update Catalan language data (#8308)  2021-06-11 10:21:22 +02:00
tokenizer_exceptions.py  Update Catalan language data (#8308)  2021-06-11 10:21:22 +02:00