spaCy/spacy
Jacobo Myerston 3e8bc1272f
add punctuation to grc (#11426)
* add punctuation to grc

Add support for special editorial punctuation that is common in ancient Greek texts.  Ancient Greek texts, as found in digital and print form, have been largely edited by scholars. Restorations and improvements are normally marked with special characters that need to be handled properly by the tokenizer.

* add unit tests

* simplify regex

* move generic quotes to char classes

* rename unit test

* fix regex

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: svlandeg <svlandeg@github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-09-27 11:38:56 +02:00
..
cli Add a way to get the URL to download a pipeline to the CLI (#11175) 2022-09-02 11:58:21 +02:00
displacy Docs: displaCy documentation - data types, parse_{deps,ents,spans}, spans example (#10950) 2022-08-16 11:23:34 -04:00
kb Refactor KB for easier customization (#11268) 2022-09-08 10:38:07 +02:00
lang add punctuation to grc (#11426) 2022-09-27 11:38:56 +02:00
matcher Better handling of unexpected types in SetPredicate (#11312) 2022-09-02 09:09:48 +02:00
ml Refactor KB for easier customization (#11268) 2022-09-08 10:38:07 +02:00
pipeline Prevent tok2vec to broadcast to listeners when predicting (#11385) 2022-09-12 15:36:48 +02:00
tests add punctuation to grc (#11426) 2022-09-27 11:38:56 +02:00
tokens SpanGroup(s)-related optimizations (#11380) 2022-08-31 09:03:20 +02:00
training Add ConsoleLogger.v2 (#11214) 2022-08-29 10:23:05 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Allow string argument for disable/enable/exclude (#11406) 2022-08-31 09:02:34 +02:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Set version to v3.4.1 (#11209) 2022-07-26 12:52:38 +02:00
attrs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
attrs.pyx Intify IOB (#9738) 2022-01-20 13:19:38 +01:00
compat.py Custom component types in spacy.ty (#9469) 2021-10-21 15:31:06 +02:00
default_config_pretraining.cfg Add new parameter for saving every n epoch in pretraining (#8912) 2021-08-12 11:14:48 +02:00
default_config.cfg Add a few docs to the default_config.cfg (#9981) 2022-01-05 09:16:40 +01:00
errors.py Prevent tok2vec to broadcast to listeners when predicting (#11385) 2022-09-12 15:36:48 +02:00
glossary.py Add glossary entry for root (#10821) 2022-05-20 09:56:32 +02:00
language.py Check for any non-Doc returned value for components (#11424) 2022-09-01 19:37:23 +02:00
lexeme.pxd Fix Lexeme.from_ptr 2020-08-10 16:43:37 +02:00
lexeme.pyi fix type of lexeme.rank (#9979) 2022-01-04 13:15:25 +01:00
lexeme.pyx Bugfix for similarity return types (#10051) 2022-01-20 11:40:46 +01:00
lookups.py Fix issues for Mypy 0.950 and Pydantic 1.9.0 (#10786) 2022-05-25 09:33:54 +02:00
morphology.pxd Clean up Morphology imports and definitions (#7441) 2021-04-26 16:54:23 +02:00
morphology.pyx Clean up Morphology imports and definitions (#7441) 2021-04-26 16:54:23 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
pipe_analysis.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
py.typed Add py.typed 2021-03-16 09:48:31 +01:00
schemas.py Support custom attributes for tokens and spans in json conversion (#11125) 2022-08-23 10:05:02 +02:00
scorer.py Alignment: use a simplified ragged type for performance (#10319) 2022-04-01 09:02:06 +02:00
strings.pxd StringStore-related optimizations (#10938) 2022-07-04 15:04:03 +02:00
strings.pyi Fix StringStore.__getitem__ return type depending on parameter types (#10741) 2022-05-03 17:57:07 +02:00
strings.pyx StringStore-related optimizations (#10938) 2022-07-04 15:04:03 +02:00
structs.pxd Add SpanGroup and Graph container types to represent arbitrary annotations (#6696) 2021-01-14 17:30:41 +11:00
symbols.pxd introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
symbols.pyx introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
tokenizer.pxd Add tokenizer option to allow Matcher handling for all rules (#10452) 2022-03-24 13:21:32 +01:00
tokenizer.pyx Add tokenizer option to allow Matcher handling for all rules (#10452) 2022-03-24 13:21:32 +01:00
ty.py Custom component types in spacy.ty (#9469) 2021-10-21 15:31:06 +02:00
typedefs.pxd Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master 2020-11-25 11:49:34 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Allow string argument for disable/enable/exclude (#11406) 2022-08-31 09:02:34 +02:00
vectors.pyx vectors: avoid expensive comparisons between numpy ints and Python ints (#10992) 2022-06-29 12:58:31 +02:00
vocab.pxd Add support for floret vectors (#8909) 2021-10-27 14:08:31 +02:00
vocab.pyi Add vector deduplication (#10551) 2022-03-30 08:54:23 +02:00
vocab.pyx Add vector deduplication (#10551) 2022-03-30 08:54:23 +02:00