Switch default train corpus max_length to 0 in quickstart (#8142)

The behavior of `spacy.Corpus.v1` is unexpected enough for `max_length
!= 0` that `0` is a better default for users creating a new config with
the quickstart.

When `max_length` is nonzero, overly long documents are skipped,
sometimes the entire corpus is skipped, and sometimes documents are
(quite unexpectedly for your average user) split into sentences.
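
The three outcomes described above can be illustrated with a small toy model. This is a sketch only, not spaCy's implementation: `ToyDoc` and `read_corpus` are hypothetical names, and the sketch assumes that a document counts as too long when its token count is at or above the limit, and that sentence splitting happens only when sentence boundaries are available.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class ToyDoc:
    # Hypothetical stand-in for a document: tokens plus optional sentences.
    tokens: List[str]
    sents: Optional[List[List[str]]] = None  # None: no sentence boundaries


def read_corpus(docs: List[ToyDoc], max_length: int = 0) -> Iterator[List[str]]:
    """Yield token lists the way a max_length-limited reader might (assumed rules)."""
    for doc in docs:
        if not doc.tokens:
            continue
        if max_length == 0 or len(doc.tokens) < max_length:
            # No limit, or the document fits: keep it whole.
            yield doc.tokens
        elif doc.sents is not None:
            # Too long, but sentence boundaries exist: fall back to sentences
            # and keep only those that fit.
            for sent in doc.sents:
                if sent and len(sent) < max_length:
                    yield sent
        # Too long and no sentence boundaries: the document is silently skipped.


docs = [
    ToyDoc("a b c".split()),
    ToyDoc("a b c d e f".split()),
    ToyDoc("a b c d e f".split(), sents=[["a", "b", "c"], ["d", "e", "f"]]),
]

whole = list(read_corpus(docs, max_length=0))    # every document kept intact
limited = list(read_corpus(docs, max_length=5))  # one skipped, one split in two
empty = list(read_corpus(docs, max_length=2))    # the entire corpus is skipped
print(len(whole), len(limited), len(empty))
```

With `max_length=0` every document passes through unchanged, which is the behavior the new quickstart default selects.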
Adriane Boyd 2021-05-20 14:48:09 +02:00 committed by GitHub
parent 4e69fcaa50
commit cd6bd91c3a


@@ -372,7 +372,7 @@ factory = "{{ pipe }}"
 [corpora.train]
 @readers = "spacy.Corpus.v1"
 path = ${paths.train}
-max_length = {{ 500 if hardware == "gpu" else 2000 }}
+max_length = 0
 [corpora.dev]
 @readers = "spacy.Corpus.v1"