spaCy/spacy
Paul O'Leary McCann d1a221a374
Add all symbols in Unicode Currency Symbols block (#8212)
* Add all symbols in Unicode Currency Symbols block

In #8102 it came up that the rupee symbol was treated different from
dollar / euro / yen symbols. This adds many symbols not already
included.

* Fix test

* Fix training test
2021-05-31 18:03:40 +10:00
..
cli Update debug data for textcat (#8066) 2021-05-17 13:27:04 +02:00
displacy Also exclude user hooks in displacy conversion (#7419) 2021-03-12 09:41:59 +01:00
lang Add all symbols in Unicode Currency Symbols block (#8212) 2021-05-31 18:03:40 +10:00
matcher Fix span offsets for Matcher(as_spans) on spans (#7992) 2021-05-06 18:42:44 +10:00
ml make EntityLinker robust for nO=None (#7930) 2021-05-06 18:14:47 +10:00
pipeline KB & NEL to/from bytes (#8113) 2021-05-20 18:11:30 +10:00
tests Add all symbols in Unicode Currency Symbols block (#8212) 2021-05-31 18:03:40 +10:00
tokens Fix offsets in Span.get_lca_matrix (#8116) 2021-05-17 16:54:23 +02:00
training fix docs (#8200) 2021-05-27 10:48:59 +02:00
__init__.pxd
__init__.py Add vocab kwarg back to spacy.load 2021-03-11 10:58:59 +01:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Set version to v3.0.6 (#7854) 2021-04-22 16:33:26 +02:00
attrs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
attrs.pyx Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
compat.py Use Literal type for nr_feature_tokens 2020-09-23 16:00:03 +02:00
default_config_pretraining.cfg pretrain architectures (#6451) 2020-12-08 14:41:03 +08:00
default_config.cfg Support large/infinite training corpora (#7208) 2021-04-08 18:08:04 +10:00
errors.py Custom warning if the doc_bin is too large (#8069) 2021-05-17 15:48:40 +02:00
glossary.py Add Chinese PTB tags to glossary (#7993) 2021-05-06 18:43:03 +10:00
kb.pxd Revert added_strings change (#6236) 2020-10-10 18:55:07 +02:00
kb.pyx KB & NEL to/from bytes (#8113) 2021-05-20 18:11:30 +10:00
language.py Merge pull request #8112 from svlandeg/bugfix/replace-trf 2021-05-28 11:35:17 +10:00
lexeme.pxd Fix Lexeme.from_ptr 2020-08-10 16:43:37 +02:00
lexeme.pyx reduce memory load when reading all vectors from file (#6945) 2021-02-07 08:05:43 +08:00
lookups.py Update load_lookups return type and docstring (#7907) 2021-04-27 09:13:39 +02:00
morphology.pxd Add Lemmatizer and simplify related components (#5848) 2020-08-07 15:27:13 +02:00
morphology.pyx Prevent 0-length mem alloc (#6653) 2021-01-06 12:50:17 +11:00
parts_of_speech.pxd
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
pipe_analysis.py Tidy up and auto-format 2020-09-29 21:39:28 +02:00
py.typed Add py.typed 2021-03-16 09:48:31 +01:00
schemas.py Support env vars and CLI overrides for project.yml 2021-02-10 13:45:27 +11:00
scorer.py Extend score_spans for overlapping & non-labeled spans (#7209) 2021-04-08 12:19:17 +02:00
strings.pxd Remove 'cleanup' of strings (#6007) 2020-09-01 16:12:15 +02:00
strings.pyx Make vocab update in get_docs deterministic (#7603) 2021-04-09 11:53:13 +02:00
structs.pxd Add SpanGroup and Graph container types to represent arbitrary annotations (#6696) 2021-01-14 17:30:41 +11:00
symbols.pxd introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
symbols.pyx introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
tokenizer.pxd Fix tokenizer cache flushing (#7836) 2021-04-22 18:14:57 +10:00
tokenizer.pyx Fix tokenizer cache flushing (#7836) 2021-04-22 18:14:57 +10:00
typedefs.pxd Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master 2020-11-25 11:49:34 +01:00
typedefs.pyx
util.py Refactor util.to_ternary_int (#7944) 2021-04-29 16:58:54 +02:00
vectors.pyx Fix vectors data on GPU (#7626) 2021-04-19 18:30:03 +10:00
vocab.pxd Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master 2020-11-25 11:49:34 +01:00
vocab.pyx Skip vector ngram backoff if minn is not set (#7925) 2021-05-06 18:34:35 +10:00