spaCy/spacy
adrianeboyd 84e06f9fb7
Improve GoldParse NER alignment (#5335)
Improve GoldParse NER alignment by including all cases where the start
and end of the NER span can be aligned, regardless of internal
tokenization differences.

To do this, convert BILUO tags to character offsets, check start/end
alignment with `doc.char_span()`, and assign the BILUO tags for the
aligned spans. Alignment for `O/-` tags is handled through the
one-to-one and multi alignments.
2020-04-23 16:58:23 +02:00
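The approach described in the commit message above can be sketched in plain Python. This is an illustrative sketch only, not the code in `gold.pyx`: the helper names `biluo_to_offsets` and `offsets_to_biluo` are hypothetical, tokens are represented as bare `(start_char, end_char)` tuples, and a dictionary lookup on token boundaries stands in for the `doc.char_span()` check. Tokens outside an aligned span are left as `-` here, since the commit handles `O/-` tags through a separate alignment path that is not shown.

```python
def biluo_to_offsets(token_spans, tags):
    """Convert BILUO tags into (start_char, end_char, label) entity spans.

    token_spans: list of (start_char, end_char), one per token.
    tags: list of BILUO tags, one per token.
    """
    spans = []
    start = None
    for (tok_start, tok_end), tag in zip(token_spans, tags):
        if tag in ("O", "-"):
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "U":
            spans.append((tok_start, tok_end, label))
        elif prefix == "B":
            start = tok_start
        elif prefix == "L" and start is not None:
            spans.append((start, tok_end, label))
            start = None
        # "I" tags only continue an open span, so nothing to record.
    return spans


def offsets_to_biluo(token_spans, spans):
    """Assign BILUO tags in a second tokenization.

    A span is used only if its start coincides with a token start and its
    end coincides with a token end; tokenization inside the span is ignored.
    """
    starts = {s: i for i, (s, _) in enumerate(token_spans)}
    ends = {e: i for i, (_, e) in enumerate(token_spans)}
    tags = ["-"] * len(token_spans)
    for start_char, end_char, label in spans:
        first = starts.get(start_char)
        last = ends.get(end_char)
        if first is None or last is None:
            continue  # span boundary falls inside a token: not alignable
        if first == last:
            tags[first] = "U-" + label
        else:
            tags[first] = "B-" + label
            for i in range(first + 1, last):
                tags[i] = "I-" + label
            tags[last] = "L-" + label
    return tags


# Text: "in New York City"
gold_tokens = [(0, 2), (3, 6), (7, 11), (12, 16)]   # in | New | York | City
gold_tags = ["O", "B-GPE", "I-GPE", "L-GPE"]
entity_spans = biluo_to_offsets(gold_tokens, gold_tags)

# Different internal tokenization, same span boundaries: in | New York | City
aligned = offsets_to_biluo([(0, 2), (3, 11), (12, 16)], entity_spans)

# Span start falls inside a token: in New | York | City
misaligned = offsets_to_biluo([(0, 6), (7, 11), (12, 16)], entity_spans)
```

In the first case the entity is kept even though the token counts inside the span differ, because only the outer boundaries are checked. In the second case the span is dropped because its start offset does not match any token start.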
..
cli Use max(uint64) for OOV lexeme rank (#5303) 2020-04-15 13:49:47 +02:00
data Make spacy/data a package 2017-03-18 20:04:22 +01:00
displacy Tidy up and auto-format 2020-03-25 12:28:12 +01:00
lang Modify jieba install message (#5328) 2020-04-20 22:06:53 +02:00
matcher Matcher support for Span as well as Doc (#5113) 2020-04-15 13:51:33 +02:00
ml Replace function registries with catalogue (#4584) 2019-11-07 11:45:22 +01:00
pipeline Add ideographic stops to sentencizer (#5263) 2020-04-08 12:58:39 +02:00
syntax prevent updating cfg if the Model was already defined (#5078) 2020-03-03 13:58:56 +01:00
tests Improve GoldParse NER alignment (#5335) 2020-04-23 16:58:23 +02:00
tokens additional information if doc is empty 2020-03-09 18:08:18 +01:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Replace function registries with catalogue (#4584) 2019-11-07 11:45:22 +01:00
__main__.py Use latest wasabi 2019-11-04 02:38:45 +01:00
_ml.py Use max(uint64) for OOV lexeme rank (#5303) 2020-04-15 13:49:47 +02:00
about.py Set version to v2.2.4 2020-03-12 11:30:41 +01:00
analysis.py Support span._. in component decorator attrs (#4555) 2019-10-30 17:19:36 +01:00
attrs.pxd make idx available via to_array (#5030) 2020-02-22 14:13:06 +01:00
attrs.pyx make idx available via to_array (#5030) 2020-02-22 14:13:06 +01:00
compat.py Replace function registries with catalogue (#4584) 2019-11-07 11:45:22 +01:00
errors.py Improve GoldParse NER alignment (#5335) 2020-04-23 16:58:23 +02:00
glossary.py Update tag maps and docs for English and German (#4501) 2019-10-24 12:56:05 +02:00
gold.pxd Merge changes from master 2019-08-21 14:18:52 +02:00
gold.pyx Improve GoldParse NER alignment (#5335) 2020-04-23 16:58:23 +02:00
kb.pxd rename entity frequency 2019-07-19 17:40:28 +02:00
kb.pyx More robust set entities method in KB (#4794) 2019-12-13 10:45:29 +01:00
language.py Add pkuseg and serialization support for Chinese (#5308) 2020-04-18 17:01:53 +02:00
lemmatizer.py Remove duplicated branch in if/else-if statement (#5234) 2020-04-02 14:47:42 +02:00
lexeme.pxd Use max(uint64) for OOV lexeme rank (#5303) 2020-04-15 13:49:47 +02:00
lexeme.pyx Use max(uint64) for OOV lexeme rank (#5303) 2020-04-15 13:49:47 +02:00
lookups.py Refactor lemmatizer and data table integration (#4353) 2019-10-01 21:36:03 +02:00
morphology.pxd annotate kb_id through ents in doc 2019-03-22 11:36:44 +01:00
morphology.pyx Improve Morphology errors (#4314) 2019-09-21 14:37:06 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
scorer.py Fix GoldParse init when token count differs (#5191) 2020-03-26 10:46:23 +01:00
strings.pxd Try to fix StringStore clean up (see #1506) 2017-11-11 03:11:27 +03:00
strings.pyx Merge branch 'master' into feature/lemmatizer 2019-03-16 13:44:22 +01:00
structs.pxd Replace Entity/MatchStruct with SpanC (#4459) 2019-10-18 11:01:47 +02:00
symbols.pxd make idx available via to_array (#5030) 2020-02-22 14:13:06 +01:00
symbols.pyx make idx available via to_array (#5030) 2020-02-22 14:13:06 +01:00
tokenizer.pxd Flush tokenizer cache when necessary (#4258) 2019-09-08 20:52:46 +02:00
tokenizer.pyx Use inline flags in token_match patterns (#5257) 2020-04-06 13:19:04 +02:00
typedefs.pxd Work on changing StringStore to return hashes. 2017-05-28 12:36:27 +02:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Improve GoldParse NER alignment (#5335) 2020-04-23 16:58:23 +02:00
vectors.pyx Raise error for inplace resize with new vector dim (#5228) 2020-04-02 10:43:13 +02:00
vocab.pxd 💫 WIP: Basic lookup class scaffolding and JSON for all lemmati… (#4167) 2019-08-22 14:21:32 +02:00
vocab.pyx Use max(uint64) for OOV lexeme rank (#5303) 2020-04-15 13:49:47 +02:00