spaCy/spacy/cli
adrianeboyd b841d3fe75 Add a tagger-based SentenceRecognizer (#4713)
* Add sent_starts to GoldParse

* Add SentTagger pipeline component

Add `SentTagger` pipeline component as a subclass of `Tagger`.

* Model uses smaller default parameters than `Tagger`, to keep it small and fast
* Hard-coded set of two labels:
  * S (1): token at beginning of sentence
  * I (0): all other sentence positions
* Sets `token.sent_start` values
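
The two-label scheme above can be illustrated with a minimal plain-Python sketch. It is not the component's actual code: the function names `labels_to_sent_starts` and `split_sentences` are hypothetical, and the sketch assumes that the first token always opens a sentence, even if it is labelled `I`.

```python
# Illustrative sketch only; these helper names are hypothetical and
# are not part of spaCy. Labels follow the commit message:
# "S" = token at beginning of sentence, "I" = any other position.

def labels_to_sent_starts(labels):
    """Map per-token S/I labels to boolean sentence-start flags."""
    return [label == "S" for label in labels]


def split_sentences(tokens, sent_starts):
    """Group tokens into sentences using the sentence-start flags.

    Assumption: the first token always opens a sentence, even if
    its flag is False.
    """
    sentences = []
    for token, is_start in zip(tokens, sent_starts):
        if is_start or not sentences:
            sentences.append([])
        sentences[-1].append(token)
    return sentences


tokens = ["Hello", "world", ".", "Bye", "."]
labels = ["S", "I", "I", "S", "I"]
flags = labels_to_sent_starts(labels)
sentences = split_sentences(tokens, flags)
```

In the real component the flags would be written to `token.sent_start` rather than returned as a list.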

* Add sentence segmentation to Scorer

Report `sent_p/r/f` for sentence boundaries, which may be provided by
various pipeline components.
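
As a rough illustration of what `sent_p/r/f` measure, the sketch below computes precision, recall and F-score over sets of sentence-start token indices. This is an assumption about the general idea, not the `Scorer` implementation: the function name `sentence_prf` is hypothetical, and the actual scorer may compare sentence spans rather than start indices.

```python
# Illustrative sketch only; `sentence_prf` is a hypothetical helper.
# Assumption: boundaries are compared as sets of token indices at
# which a sentence starts.

def sentence_prf(gold_starts, pred_starts):
    """Return (precision, recall, f_score) for sentence starts."""
    gold = set(gold_starts)
    pred = set(pred_starts)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Gold sentences start at tokens 0, 3 and 7; the prediction gets two
# of the three right and adds one spurious boundary at token 5.
p, r, f = sentence_prf([0, 3, 7], [0, 3, 5])
```

Here two of the three predicted starts are correct and two of the three gold starts are found, so all three scores equal 2/3.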

* Add sentence segmentation to CLI evaluate

* Add senttagger metrics/scoring to train CLI

* Rename SentTagger to SentenceRecognizer

* Add SentenceRecognizer to spacy.pipes imports

* Add SentenceRecognizer serialization test

* Shorten component name to sentrec

* Remove duplicates from train CLI output metrics
2019-11-28 11:10:07 +01:00
Name           Last commit                                             Date
converters     Update conllu2json MISC column handling (#4715)         2019-11-26 16:10:08 +01:00
__init__.py    Move UD scripts to bin                                  2019-03-20 01:19:34 +01:00
_schemas.py    Store JSON schemas in Python and tidy up (#3235)        2019-02-07 19:44:31 +11:00
convert.py     Auto-format [ci skip]                                   2019-10-24 16:21:08 +02:00
debug_data.py  Fix minor issues in debug-data (#4636)                  2019-11-13 15:25:03 +01:00
download.py    Generalize handling of tokenizer special cases (#4259)  2019-11-13 21:24:35 +01:00
evaluate.py    Add a tagger-based SentenceRecognizer (#4713)           2019-11-28 11:10:07 +01:00
info.py        Generalize handling of tokenizer special cases (#4259)  2019-11-13 21:24:35 +01:00
init_model.py  Generalize handling of tokenizer special cases (#4259)  2019-11-13 21:24:35 +01:00
link.py        Generalize handling of tokenizer special cases (#4259)  2019-11-13 21:24:35 +01:00
package.py     Generalize handling of tokenizer special cases (#4259)  2019-11-13 21:24:35 +01:00
pretrain.py    Generalize handling of tokenizer special cases (#4259)  2019-11-13 21:24:35 +01:00
profile.py     Generalize handling of tokenizer special cases (#4259)  2019-11-13 21:24:35 +01:00
train.py       Add a tagger-based SentenceRecognizer (#4713)           2019-11-28 11:10:07 +01:00
validate.py    Generalize handling of tokenizer special cases (#4259)  2019-11-13 21:24:35 +01:00