spaCy/spacy/lang
Jacobo Myerston 3e8bc1272f
add punctuation to grc (#11426)
* add punctuation to grc

Add support for the special editorial punctuation that is common in Ancient Greek texts. Most Ancient Greek texts, in both digital and print form, have been edited by scholars. Restorations and improvements are normally marked with special characters that the tokenizer needs to handle properly.

* add unit tests

* simplify regex

* move generic quotes to char classes

* rename unit test

* fix regex

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: svlandeg <svlandeg@github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-09-27 11:38:56 +02:00
af
am Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
ar
az
bg Handle Cyrillic combining diacritics (#10837) 2022-06-28 15:35:32 +02:00
bn
ca Fix lookup usage in French/Catalan (fix #11347) (#11382) 2022-08-29 10:32:38 +02:00
cs
da
de
dsb Auto-format code with black (#10479) 2022-03-11 12:20:23 +01:00
el
en Remove English exceptions with mismatched features (#10873) 2022-06-03 09:44:04 +02:00
es Fix some issues in Spanish examples 2022-04-18 22:12:57 +02:00
et
eu
fa Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
fi Add a noun chunker for Finnish (#10214) 2022-02-08 08:44:11 +01:00
fr Fix lookup usage in French/Catalan (fix #11347) (#11382) 2022-08-29 10:32:38 +02:00
ga
grc add punctuation to grc (#11426) 2022-09-27 11:38:56 +02:00
gu
he
hi
hr 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
hsb Auto-format code with black (#10479) 2022-03-11 12:20:23 +01:00
hu
hy 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
id
is
it added ellided forms (#9878) 2021-12-23 13:41:01 +01:00
ja
kn
ko Fix regex invalid escape sequences (#11276) 2022-08-09 10:59:36 +02:00
ky
la Auto-format code with black (#11427) 2022-09-02 11:43:20 +02:00
lb
lg luganda language extension (#10847) 2022-08-23 13:09:36 +02:00
lij 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
lt 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
lv
mk
ml
mr
nb Remove NER words from stop words in Norwegian (#9820) 2021-12-07 09:45:10 +01:00
ne 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
nl Fix Dutch noun chunks to skip overlapping spans (#11275) 2022-08-10 09:49:08 +02:00
pl
pt
ro
ru Switch ru and uk lemmatizers to pymorphy3 (#11345) 2022-08-22 11:27:14 +02:00
sa
si
sk
sl Updates to Slovenian language (#11162) 2022-08-05 10:10:18 +02:00
sq
sr
sv Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
ta
te
th
ti
tl
tn
tr Fixed typo in Turkish lang. (#10582) 2022-03-30 13:16:08 +02:00
tt
uk Switch ru and uk lemmatizers to pymorphy3 (#11345) 2022-08-22 11:27:14 +02:00
ur
vi Auto-format code with black (#10687) 2022-04-22 11:24:53 +02:00
xx fix: Add missing comma to examples.py (#10167) 2022-01-30 16:43:29 +09:00
yo
zh
__init__.py
char_classes.py add punctuation to grc (#11426) 2022-09-27 11:38:56 +02:00
lex_attrs.py
norm_exceptions.py
punctuation.py Handle Cyrillic combining diacritics (#10837) 2022-06-28 15:35:32 +02:00
tokenizer_exceptions.py