mirror of https://github.com/explosion/spaCy.git synced 2025-04-26 20:03:40 +03:00
spaCy/spacy
Adriane Boyd cd6bd91c3a
Switch default train corpus max_length to 0 in quickstart ()
The behavior of `spacy.Corpus.v1` is unexpected enough for `max_length
!= 0` that `0` is a better default for users creating a new config with
the quickstart.

With a non-zero `max_length`, documents are skipped, sometimes the entire
corpus is skipped, and sometimes documents are (quite unexpectedly for
your average user) split into sentences.
2021-05-20 14:48:09 +02:00
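
For reference, the setting this commit refers to sits in the train corpus block of a training config. The fragment below is a minimal sketch, assuming the usual quickstart layout; the `path` value is the conventional variable reference and is an assumption, not something taken from this commit.

```ini
# Train corpus reader block of a spaCy v3 training config (sketch).
[corpora.train]
@readers = "spacy.Corpus.v1"
# Assumed placeholder: resolved from the [paths] section of the same config.
path = ${paths.train}
# 0 disables the length limit, so documents are neither skipped nor split.
max_length = 0
```

Setting a positive value instead re-enables the length limit whose skipping and sentence-splitting behavior the commit message describes.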
cli Switch default train corpus max_length to 0 in quickstart () 2021-05-20 14:48:09 +02:00
displacy Also exclude user hooks in displacy conversion () 2021-03-12 09:41:59 +01:00
lang Update Vietnamese tokenizer () 2021-05-17 18:16:20 +10:00
matcher Support match alignments () 2021-04-08 18:10:14 +10:00
ml Set up GPU CI testing () 2021-04-22 14:58:29 +02:00
pipeline Replace negative rows with 0 in StaticVectors () 2021-04-22 18:04:15 +10:00
tests Update Vietnamese tokenizer () 2021-05-17 18:16:20 +10:00
tokens Fix/update extension copying in Span.as_doc and Doc.from_docs () 2021-03-30 09:49:12 +02:00
training Add training option to set annotations on update () 2021-04-26 16:53:53 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Add vocab kwarg back to spacy.load 2021-03-11 10:58:59 +01:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Set version to v3.0.6 () 2021-04-22 16:33:26 +02:00
attrs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
attrs.pyx Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
compat.py Use Literal type for nr_feature_tokens 2020-09-23 16:00:03 +02:00
default_config_pretraining.cfg pretrain architectures () 2020-12-08 14:41:03 +08:00
default_config.cfg Add training option to set annotations on update () 2021-04-26 16:53:53 +02:00
errors.py Add callback to copy vocab/tokenizer from model () 2021-04-22 12:36:50 +02:00
glossary.py unicode -> str consistency 2020-05-24 17:20:58 +02:00
kb.pxd Replace cpdef variables with cdef () 2021-04-26 16:54:02 +02:00
kb.pyx Replace links to nightly docs [ci skip] 2021-01-30 20:09:38 +11:00
language.py Add training option to set annotations on update () 2021-04-26 16:53:53 +02:00
lexeme.pxd Fix Lexeme.from_ptr 2020-08-10 16:43:37 +02:00
lexeme.pyx reduce memory load when reading all vectors from file () 2021-02-07 08:05:43 +08:00
lookups.py Replace links to nightly docs [ci skip] 2021-01-30 20:09:38 +11:00
morphology.pxd Clean up Morphology imports and definitions () 2021-04-26 16:54:23 +02:00
morphology.pyx Clean up Morphology imports and definitions () 2021-04-26 16:54:23 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 () 2019-12-22 01:53:56 +01:00
pipe_analysis.py Tidy up and auto-format 2020-09-29 21:39:28 +02:00
py.typed Add py.typed 2021-03-16 09:48:31 +01:00
schemas.py Add training option to set annotations on update () 2021-04-26 16:53:53 +02:00
scorer.py Extend score_spans for overlapping & non-labeled spans () 2021-04-08 12:19:17 +02:00
strings.pxd Remove 'cleanup' of strings () 2020-09-01 16:12:15 +02:00
strings.pyx Make vocab update in get_docs deterministic () 2021-04-09 11:53:13 +02:00
structs.pxd Add SpanGroup and Graph container types to represent arbitrary annotations () 2021-01-14 17:30:41 +11:00
symbols.pxd introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
symbols.pyx introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
tokenizer.pxd Replace cpdef variables with cdef () 2021-04-26 16:54:02 +02:00
tokenizer.pyx Fix tokenizer cache flushing () 2021-04-22 18:14:57 +10:00
typedefs.pxd Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master 2020-11-25 11:49:34 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Fix scoring normalization () 2021-04-26 16:53:38 +02:00
vectors.pyx Fix vectors data on GPU () 2021-04-19 18:30:03 +10:00
vocab.pxd Replace cpdef variables with cdef () 2021-04-26 16:54:02 +02:00
vocab.pyx Fix vectors data on GPU () 2021-04-19 18:30:03 +10:00