af
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
am
Update Tigrinya ትግርኛ language support (#8900)
2021-08-10 13:55:08 +02:00
ar
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
az
Fix Azerbaijani init, extend lang init tests (#8656)
2021-07-09 15:36:35 +02:00
bg
Improve the stop words and the tokenizer exceptions in Bulgarian language. (#8862)
2021-08-10 13:44:23 +02:00
bn
Refactor scoring methods to use registered functions (#8766)
2021-08-10 15:13:39 +02:00
ca
Update Catalan tokenizer (#9297)
2021-09-27 14:42:30 +02:00
cs
Tidy up and auto-format
2021-01-05 13:41:53 +11:00
da
Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3
2021-01-14 11:49:58 +01:00
de
Merge branch 'develop' into master-tmp
2020-10-04 14:52:20 +02:00
el
Refactor scoring methods to use registered functions (#8766)
2021-08-10 15:13:39 +02:00
en
Update Cython string types (#9143)
2021-09-13 17:02:17 +02:00
es
Refactor scoring methods to use registered functions (#8766)
2021-08-10 15:13:39 +02:00
et
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
eu
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
fa
Refactor scoring methods to use registered functions (#8766)
2021-08-10 15:13:39 +02:00
fi
Tidy up code
2021-06-28 12:08:15 +02:00
fr
Merge remote-tracking branch 'upstream/master' into develop
2021-09-27 09:10:45 +02:00
ga
Add an Irish lemmatiser, based on BuNaMo (#9102)
2021-09-30 14:18:47 +02:00
grc
Remove extraneous grc test file (#8768)
2021-07-20 15:51:15 +02:00
gu
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
he
raise NotImplementedError when noun_chunks iterator is not implemented (#6711)
2021-01-17 19:56:05 +08:00
hi
Auto-format [ci skip]
2020-10-15 10:08:53 +02:00
hr
Remove tag map
2020-12-09 11:13:49 +11:00
hu
Fix Hungarian % tokenization (#6013)
2020-09-02 13:06:16 +02:00
hy
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
id
Merge branch 'develop' into master-tmp
2020-10-04 14:52:20 +02:00
is
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
it
Refactor scoring methods to use registered functions (#8766)
2021-08-10 15:13:39 +02:00
ja
Change JA inflection separator to semicolon
2021-10-07 17:28:15 +09:00
kn
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
ko
Update custom tokenizer APIs and pickling (#8972)
2021-08-19 14:37:47 +02:00
ky
Tidy up and auto-format
2021-01-30 12:52:33 +11:00
lb
Remove default initialize lookups
2020-10-01 21:54:33 +02:00
lij
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
lt
Fix escape sequence
2021-01-30 12:39:58 +11:00
lv
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
mk
Refactor scoring methods to use registered functions (#8766)
2021-08-10 15:13:39 +02:00
ml
Add missing lex_attr_getters (resolves #5806)
2020-07-25 12:55:18 +02:00
mr
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
nb
Refactor scoring methods to use registered functions (#8766)
2021-08-10 15:13:39 +02:00
ne
Remove unicode declarations and update language data
2020-09-04 13:19:16 +02:00
nl
Refactor scoring methods to use registered functions (#8766)
2021-08-10 15:13:39 +02:00
pl
Refactor scoring methods to use registered functions (#8766)
2021-08-10 15:13:39 +02:00
pt
Tidy up and auto-format
2021-01-15 11:57:36 +11:00
ro
Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3
2021-01-14 11:49:58 +01:00
ru
Refactor scoring methods to use registered functions (#8766)
2021-08-10 15:13:39 +02:00
sa
Tidy up and auto-format
2020-09-29 21:39:28 +02:00
si
Updating the stop word list for Sinhala language (#9270)
2021-09-22 20:43:42 +02:00
sk
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
sl
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
sq
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
sr
Remove default initialize lookups
2020-10-01 21:54:33 +02:00
sv
Refactor scoring methods to use registered functions (#8766)
2021-08-10 15:13:39 +02:00
ta
Merge branch 'develop' into master-tmp
2020-10-15 09:06:03 +02:00
te
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
th
Update custom tokenizer APIs and pickling (#8972)
2021-08-19 14:37:47 +02:00
ti
Update Tigrinya ትግርኛ language support (#8900)
2021-08-10 13:55:08 +02:00
tl
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
tn
Tidy up and auto-format
2021-02-13 12:55:56 +11:00
tr
Tidy up and auto-format
2021-01-05 13:41:53 +11:00
tt
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
uk
Refactor scoring methods to use registered functions (#8766)
2021-08-10 15:13:39 +02:00
ur
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
vi
Update custom tokenizer APIs and pickling (#8972)
2021-08-19 14:37:47 +02:00
xx
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
yo
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
zh
Update custom tokenizer APIs and pickling (#8972)
2021-08-19 14:37:47 +02:00
__init__.py
Remove imports in /lang/__init__.py
2017-05-08 23:58:07 +02:00
char_classes.py
Add all symbols in Unicode Currency Symbols block (#8212)
2021-05-31 18:03:40 +10:00
lex_attrs.py
Use tokenizer URL_MATCH pattern in LIKE_URL (#8765)
2021-07-27 12:07:01 +02:00
norm_exceptions.py
Tidy up and auto-format
2020-02-18 15:38:18 +01:00
punctuation.py
Simplify language data and revert detailed configs
2020-07-24 14:50:26 +02:00
tokenizer_exceptions.py
Tidy up with flake8: imports, comparisons, etc.
2021-06-28 12:08:15 +02:00