spaCy/spacy/lang
Adriane Boyd fe5f5d6ac6
Update Catalan tokenizer (#9297)
* Update Makefile

For more recent python version

* updated for bsc changes

New tokenization changes

* Update test_text.py

* updating tests and requirements

* changed failed test in test/lang/ca

changed failed test in test/lang/ca

* Update .gitignore

deleted stashed changes line

* back to python 3.6 and remove transformer requirements

As per request

* Update test_exception.py

Change the test

* Update test_exception.py

Remove test print

* Update Makefile

For more recent python version

* updated for bsc changes

New tokenization changes

* updating tests and requirements

* Update requirements.txt

Removed spacy-transformers from requirements

* Update test_exception.py

Added final punctuation to ensure consistency

* Update Makefile

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Format

* Update test to check all tokens

Co-authored-by: cayorodriguez <crodriguezp@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-27 14:42:30 +02:00
af Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
am Update Tigrinya ትግርኛ language support (#8900) 2021-08-10 13:55:08 +02:00
ar Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
az Fix Azerbaijani init, extend lang init tests (#8656) 2021-07-09 15:36:35 +02:00
bg Improve the stop words and the tokenizer exceptions in Bulgarian language. (#8862) 2021-08-10 13:44:23 +02:00
bn Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
ca Update Catalan tokenizer (#9297) 2021-09-27 14:42:30 +02:00
cs Tidy up and auto-format 2021-01-05 13:41:53 +11:00
da Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3 2021-01-14 11:49:58 +01:00
de Merge branch 'develop' into master-tmp 2020-10-04 14:52:20 +02:00
el Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
en Update Cython string types (#9143) 2021-09-13 17:02:17 +02:00
es Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
et Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
eu Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
fa Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
fi Tidy up code 2021-06-28 12:08:15 +02:00
fr Merge remote-tracking branch 'upstream/master' into develop 2021-09-27 09:10:45 +02:00
ga Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
grc Remove extraneous grc test file (#8768) 2021-07-20 15:51:15 +02:00
gu Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
he raise NotImplementedError when noun_chunks iterator is not implemented (#6711) 2021-01-17 19:56:05 +08:00
hi Auto-format [ci skip] 2020-10-15 10:08:53 +02:00
hr Remove tag map 2020-12-09 11:13:49 +11:00
hu Fix Hungarian % tokenization (#6013) 2020-09-02 13:06:16 +02:00
hy Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
id Merge branch 'develop' into master-tmp 2020-10-04 14:52:20 +02:00
is Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
it Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
ja Update custom tokenizer APIs and pickling (#8972) 2021-08-19 14:37:47 +02:00
kn Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
ko Update custom tokenizer APIs and pickling (#8972) 2021-08-19 14:37:47 +02:00
ky Tidy up and auto-format 2021-01-30 12:52:33 +11:00
lb Remove default initialize lookups 2020-10-01 21:54:33 +02:00
lij Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
lt Fix escape sequence 2021-01-30 12:39:58 +11:00
lv Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
mk Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
ml Add missing lex_attr_getters (resolves #5806 ) 2020-07-25 12:55:18 +02:00
mr Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
nb Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
ne Remove unicode declarations and update language data 2020-09-04 13:19:16 +02:00
nl Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
pl Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
pt Tidy up and auto-format 2021-01-15 11:57:36 +11:00
ro Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3 2021-01-14 11:49:58 +01:00
ru Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
sa Tidy up and auto-format 2020-09-29 21:39:28 +02:00
si Updating the stop word list for Sinhala language (#9270) 2021-09-22 20:43:42 +02:00
sk Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
sl Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
sq Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
sr Remove default initialize lookups 2020-10-01 21:54:33 +02:00
sv Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
ta Merge branch 'develop' into master-tmp 2020-10-15 09:06:03 +02:00
te Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
th Update custom tokenizer APIs and pickling (#8972) 2021-08-19 14:37:47 +02:00
ti Update Tigrinya ትግርኛ language support (#8900) 2021-08-10 13:55:08 +02:00
tl Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
tn Tidy up and auto-format 2021-02-13 12:55:56 +11:00
tr Tidy up and auto-format 2021-01-05 13:41:53 +11:00
tt Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
uk Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
ur Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
vi Update custom tokenizer APIs and pickling (#8972) 2021-08-19 14:37:47 +02:00
xx Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
yo Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
zh Update custom tokenizer APIs and pickling (#8972) 2021-08-19 14:37:47 +02:00
__init__.py Remove imports in /lang/__init__.py 2017-05-08 23:58:07 +02:00
char_classes.py Add all symbols in Unicode Currency Symbols block (#8212) 2021-05-31 18:03:40 +10:00
lex_attrs.py Use tokenizer URL_MATCH pattern in LIKE_URL (#8765) 2021-07-27 12:07:01 +02:00
norm_exceptions.py Tidy up and auto-format 2020-02-18 15:38:18 +01:00
punctuation.py Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
tokenizer_exceptions.py Tidy up with flake8: imports, comparisons, etc. 2021-06-28 12:08:15 +02:00