Adriane Boyd
ad43cbb042
Sync missing and misaligned values in Tagger loss ( #6689 )
Use `None` for both missing and misaligned annotation in
`Tagger.get_loss`, reverting to the default missing value in the loss
function.
2021-01-10 11:30:37 +11:00
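Here is a minimal pure-Python sketch of the idea. It assumes gold tags arrive as a list in which `None` marks either a missing annotation or a token that could not be aligned one-to-one. The helper name `masked_tag_loss` and the per-token probability dicts are hypothetical and are not the signature of spaCy's `Tagger.get_loss`. Tokens whose gold value is `None` add nothing to the loss or to the token count.

```python
import math
from typing import Dict, List, Optional, Tuple


def masked_tag_loss(
    predicted: List[Dict[str, float]],
    gold: List[Optional[str]],
) -> Tuple[float, int]:
    """Mean negative log-likelihood over tokens whose gold tag is known.

    Hypothetical helper: `None` in `gold` stands for both a missing and a
    misaligned annotation, so those tokens are skipped entirely.
    """
    total = 0.0
    n_scored = 0
    for probs, tag in zip(predicted, gold):
        if tag is None:
            continue  # missing or misaligned: no loss, no gradient signal
        total += -math.log(probs.get(tag, 1e-12))
        n_scored += 1
    mean = total / n_scored if n_scored else 0.0
    return mean, n_scored


preds = [
    {"NOUN": 0.5, "VERB": 0.5},
    {"NOUN": 0.9, "VERB": 0.1},
    {"NOUN": 0.2, "VERB": 0.8},
]
gold = ["NOUN", None, "VERB"]  # second token is missing or misaligned
loss, n = masked_tag_loss(preds, gold)
```

Treating both cases as the same value means a single masking rule handles them, which is why falling back to the loss function's default missing value is enough.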
Matthew Honnibal
c04bab6bae
Fix train loop to avoid swallowing tracebacks ( #6693 )
* Avoid swallowing tracebacks in train loop
* Format
* Handle first
2021-01-09 08:25:47 +08:00
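A common way to lose a traceback is to catch a broad exception in a loop and then either ignore it or raise a new error that hides where the failure happened. The general pattern sketched below may not be the exact change made in #6693. It adds context and then re-raises with a bare `raise`, which keeps the original exception and its frames. The functions `train_loop` and `failing_step` are hypothetical stand-ins.

```python
import traceback


def failing_step(batch):
    # Stand-in for a training step that hits a bug deep inside the model.
    return 1 / len(batch)


def train_loop(batches):
    """Hypothetical loop: add context, then re-raise with a bare `raise`."""
    for i, batch in enumerate(batches):
        try:
            failing_step(batch)
        except Exception:
            print(f"Error in batch {i}")
            raise  # keeps the original exception type and full traceback


try:
    train_loop([[1, 2], []])
except ZeroDivisionError as err:
    # The innermost frame (failing_step) is still visible to the caller.
    frames = [frame.name for frame in traceback.extract_tb(err.__traceback__)]
```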
Sofie Van Landeghem
a612a5ba3f
fix small typos ( #6698 )
2021-01-08 09:39:47 +01:00
Yohei Tamura
411c842a71
convert tuple to list because of a type mismatch ( #6625 )
2021-01-07 16:42:12 +11:00
Sofie Van Landeghem
75d9019343
Fix types of Tok2Vec encoding architectures ( #6442 )
* fix TorchBiLSTMEncoder documentation
* ensure the types of the encoding Tok2vec layers are correct
* update references from v1 to v2 for the new architectures
2021-01-07 16:39:27 +11:00
Sofie Van Landeghem
8c1a23209f
Getting scores out of beam_parser ( #6684 )
* clean up of ner tests
* beam_parser tests
* implement get_beam_parses and scored_parses for the dep parser
* we don't have to add the parse if there are no arcs
2021-01-07 16:28:27 +11:00
Sofie Van Landeghem
3983bc6b1e
Fix Transformer width in TextCatEnsemble ( #6431 )
* add convenience method to determine tok2vec width in a model
* fix transformer tok2vec dimensions in TextCatEnsemble architecture
* init function should not be nested to avoid pickle issues
2021-01-06 12:44:04 +01:00
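The last bullet reflects a general Python constraint. `pickle` stores a function by its module and qualified name, so a module-level function round-trips, while a function defined inside another function cannot be looked up again and fails to pickle. The names `init_width` and `make_model_nested` in this sketch are hypothetical and do not reproduce the TextCatEnsemble code.

```python
import pickle


def init_width(model, X=None, Y=None):
    # Module-level function: pickle stores it by qualified name.
    return model


def make_model_nested():
    def init_width_nested(model, X=None, Y=None):
        # Local function: pickle cannot look it up by name later.
        return model

    return init_width_nested


# Round-trips to the very same function object.
restored = pickle.loads(pickle.dumps(init_width))

try:
    pickle.dumps(make_model_nested())
    nested_ok = True
except (AttributeError, pickle.PicklingError):
    # The exception type for local objects differs across Python versions.
    nested_ok = False
```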
Sofie Van Landeghem
402dbc5bae
Getting scores out of beam_ner ( #6575 )
* small fixes and formatting
* bring test_issue4313 up-to-date, currently fails
* formatting
* add get_beam_parses method back
* add scored_ents function
* delete tag map
2021-01-06 12:02:32 +01:00
Sofie Van Landeghem
82ae95267a
Docs for pretrain architectures ( #6605 )
* document pretraining architectures
* formatting
* bit more info
* small fixes
2021-01-06 16:12:30 +11:00
Adriane Boyd
bf9096437e
Set default lemmas in retokenizer ( #6667 )
Instead of unsetting lemmas on retokenized tokens, set the default
lemmas to:
* merge: concatenate any existing lemmas with `SPACY` preserved
* split: use the new `ORTH` values if lemmas were previously set,
otherwise leave unset
2021-01-06 12:29:44 +08:00
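The two default rules above can be sketched in plain Python. Here `SPACY`, the flag for whether a token is followed by whitespace, is modelled as a boolean. The `Tok` dataclass and both helpers are hypothetical and are not spaCy's `Retokenizer` internals. One further assumption is labelled in the code: when only some merged tokens carry a lemma, the others contribute their ORTH.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tok:
    orth: str
    lemma: Optional[str]  # None means "lemma not set"
    spacy: bool  # True if the token is followed by whitespace


def merged_lemma(tokens: List[Tok]) -> Optional[str]:
    """Default lemma for a merged span (hypothetical helper)."""
    if all(t.lemma is None for t in tokens):
        return None  # no existing lemmas: leave unset
    parts = []
    for i, t in enumerate(tokens):
        # Assumption: a token without a lemma contributes its ORTH.
        parts.append(t.lemma if t.lemma is not None else t.orth)
        # Preserve inter-token whitespace, but not the last token's.
        if t.spacy and i < len(tokens) - 1:
            parts.append(" ")
    return "".join(parts)


def split_lemmas(original: Tok, new_orths: List[str]) -> List[Optional[str]]:
    """Default lemmas for the pieces of a split token (hypothetical helper)."""
    if original.lemma is None:
        return [None] * len(new_orths)  # lemmas were unset: stay unset
    return list(new_orths)  # lemmas were set: use the new ORTH values


span = [Tok("New", "new", True), Tok("York", "York", False)]
```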
Adriane Boyd
0041dfbc7f
Use special matcher for exceptions with spaces ( #6668 )
Use the special cases phrase matcher for exceptions that include space
characters so that exceptions including spaces are supported.
2021-01-06 12:05:10 +08:00
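The reason for the change is that a tokenizer which splits on whitespace first never sees an exception containing a space as a single chunk. Matching such exceptions against the raw text as phrases avoids this. The toy tokenizer below illustrates the principle only. It is a hypothetical sketch, and its special cases map to lists of strings rather than the attribute dicts that spaCy uses.

```python
import re
from typing import Dict, List


def tokenize(text: str, special_cases: Dict[str, List[str]]) -> List[str]:
    """Toy tokenizer (hypothetical): special cases are matched on the raw
    text before whitespace splitting, so their keys may contain spaces."""
    if not special_cases:
        return text.split()
    # Longest keys first, so a longer key wins over a shorter key that
    # starts at the same position.
    keys = sorted(special_cases, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in keys)
    # Only match whole whitespace-delimited stretches of text.
    pattern = rf"(?<!\S)(?:{alternatives})(?!\S)"
    tokens: List[str] = []
    pos = 0
    for match in re.finditer(pattern, text):
        tokens.extend(text[pos:match.start()].split())
        tokens.extend(special_cases[match.group()])
        pos = match.end()
    tokens.extend(text[pos:].split())
    return tokens


special_cases = {"a. m.": ["a. m."], "e. g.": ["e. g."]}
```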
Sofie Van Landeghem
afc5714d32
multi-label textcat component ( #6474 )
* multi-label textcat component
* formatting
* fix comment
* cleanup
* fix from #6481
* random edit to push the tests
* add explicit error when textcat is called with multi-label gold data
* fix error nr
* small fix
2021-01-06 13:07:14 +11:00
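The explicit error for multi-label gold data can be pictured as a validation step that runs before training the exclusive-classes component. The sketch below is a hypothetical check and not spaCy's actual validation or error code. It rejects any document that has more than one positive category and points the user to the multi-label component.

```python
from typing import Dict


def validate_exclusive_cats(cats: Dict[str, float]) -> None:
    """Hypothetical check for an exclusive-classes text classifier:
    more than one positive label means the gold data is multi-label."""
    n_positive = sum(1 for score in cats.values() if score == 1.0)
    if n_positive > 1:
        raise ValueError(
            f"Found {n_positive} positive labels in {cats}, but this "
            "classifier expects mutually exclusive categories. Use a "
            "multi-label text classifier for this data."
        )
```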
Bruno
1a77607036
spaCy v3 is not saving the best version in training loop ( #6629 )
* Save best only if it is the best, and also respect the average config
* Create bratao.md
* Update loop.py
* Remove average check
* Keep before_to_disk
2021-01-06 12:51:30 +11:00
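The intended behaviour is that the "best" model is written only when the current evaluation beats every earlier one. A minimal sketch of that logic follows. It assumes one dev score per evaluation step and a caller-supplied save callback. Both `train_with_checkpoints` and `save_best` are hypothetical names and are not spaCy's training loop API.

```python
from typing import Callable, List, Optional


def train_with_checkpoints(
    scores: List[float],
    save_best: Callable[[int], None],
) -> Optional[int]:
    """Hypothetical loop: `scores[i]` is the dev score after evaluation i."""
    best_score: Optional[float] = None
    best_step: Optional[int] = None
    for step, score in enumerate(scores):
        # Only save when this score beats every earlier one.
        if best_score is None or score > best_score:
            best_score, best_step = score, step
            save_best(step)
    return best_step


saved: List[int] = []
best = train_with_checkpoints([0.61, 0.72, 0.70, 0.75, 0.74], saved.append)
```

In this run, the saves happen after steps 0, 1 and 3. The later dip to 0.74 does not overwrite the model from step 3.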
Sofie Van Landeghem
29b59086f9
Prevent 0-length mem alloc ( #6653 )
* prevent 0-length mem alloc by adding asserts
* fix lexeme mem allocation
2021-01-06 12:50:17 +11:00
Ines Montani
6f83abb971
Merge pull request #6647 from svlandeg/feature/init_config_overwrite
2021-01-05 14:59:04 +11:00
Ines Montani
81f018fb67
Merge pull request #6671 from explosion/chore/tidy-autoformat
Tidy up and auto-format
2021-01-05 14:45:31 +11:00
Ines Montani
224a3590e9
Merge pull request #6654 from svlandeg/chore/tests-cleanup
Unskipping tests
2021-01-05 13:53:40 +11:00
Ines Montani
3614472e29
Merge pull request #6646 from svlandeg/feature/cli-docs [ci skip]
2021-01-05 13:52:49 +11:00
Ines Montani
9c078a5885
Update formatting for consistency [ci skip]
2021-01-05 13:52:28 +11:00
Ines Montani
a9e845426f
Use --force for consistency and add docs
2021-01-05 13:49:59 +11:00
Ines Montani
c4993f16d0
Merge pull request #6651 from svlandeg/bugfix/cli_info
2021-01-05 13:44:26 +11:00
Ines Montani
991669c934
Tidy up and auto-format
2021-01-05 13:41:53 +11:00
Adriane Boyd
b57be94c78
Fix memory issues in Language.evaluate ( #6386 )
* Fix memory issues in Language.evaluate
Reset annotation in predicted docs before evaluating and store all data
in `examples`.
* Minor refactor to docs generator init
* Fix generator expression
* Fix final generator check
* Refactor pipeline loop
* Handle examples generator in Language.evaluate
* Add test with generator
* Use make_doc
2020-12-31 10:45:50 +11:00
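Handling a generator of examples comes down to the fact that a generator can be consumed only once and has no `len()`. The hypothetical `evaluate` below computes plain accuracy over (predicted, gold) label pairs and is not the signature of `Language.evaluate`. It materializes its input once, so a list and a generator give the same result.

```python
from typing import Iterable, List, Tuple


def evaluate(examples: Iterable[Tuple[str, str]]) -> float:
    """Hypothetical evaluate: accuracy over (predicted, gold) label pairs.

    The input is materialized first, because a generator can be consumed
    only once and supports neither len() nor a second pass.
    """
    materialized: List[Tuple[str, str]] = list(examples)
    if not materialized:
        return 0.0
    correct = sum(1 for pred, gold in materialized if pred == gold)
    return correct / len(materialized)


pairs = [("NOUN", "NOUN"), ("VERB", "NOUN"), ("ADJ", "ADJ")]
score_from_list = evaluate(pairs)
score_from_generator = evaluate(pair for pair in pairs)
```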
svlandeg
a6a68da673
unskipping tests with python >= 3.6
2020-12-30 18:46:43 +01:00
svlandeg
d5ff0fecf8
add docs
2020-12-30 14:01:13 +01:00
svlandeg
c74ab6a313
fix imports
2020-12-30 12:40:12 +01:00
svlandeg
712a78b74a
add simple unit test
2020-12-30 12:35:26 +01:00
svlandeg
4347e6d39b
fixes for CLI info command
2020-12-30 12:05:58 +01:00
svlandeg
62b4fe118f
prevent overwriting existing config file
2020-12-29 15:40:22 +01:00
svlandeg
2fa23b0304
fix capitalization for link
2020-12-29 15:01:22 +01:00
svlandeg
43cc6aea93
remove non-existing link
2020-12-29 14:59:39 +01:00
svlandeg
543073bf9d
add pretrain example
2020-12-29 14:51:23 +01:00
svlandeg
1d0ef98873
move example
2020-12-29 14:46:03 +01:00
svlandeg
20113b8063
add train CLI example
2020-12-29 14:44:56 +01:00
Adriane Boyd
5ca57d8221
Add logger warning when serializing user hooks ( #6595 )
Add a warning that user hooks are lost on serialization.
Add a `user_hooks` exclude to skip the warning with pickle.
2020-12-29 11:54:32 +01:00
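The warning and its opt-out can be sketched as follows. Serializing warns when user hooks are present, and passing `"user_hooks"` in `exclude` suppresses the warning, which is what a pickling path can do. The `Doc` class here is a hypothetical stand-in, not spaCy's `Doc`. The sketch also uses `logger.warning`, in line with #6596 below.

```python
import logging
import pickle
from typing import Any, Callable, Dict, Iterable, List

logger = logging.getLogger("doc_sketch")


class Doc:
    """Hypothetical stand-in for an object with non-serializable user hooks."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.user_hooks: Dict[str, Callable[..., Any]] = {}

    def to_bytes(self, exclude: Iterable[str] = ()) -> bytes:
        exclude = set(exclude)
        if self.user_hooks and "user_hooks" not in exclude:
            logger.warning("User hooks can't be serialized and will be lost.")
        # Hooks are never written; only the plain data is serialized.
        return pickle.dumps({"text": self.text})


class ListHandler(logging.Handler):
    """Collects log messages in memory so the warning can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
doc = Doc("hello")
doc.user_hooks["similarity"] = lambda a, b: 1.0
doc.to_bytes()                        # warns once
doc.to_bytes(exclude=["user_hooks"])  # silent
```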
Adriane Boyd
cabd4ae5b1
Use logger.warning instead of logger.warn ( #6596 )
Use `logger.warning` instead of deprecated `logger.warn`.
2020-12-21 08:25:10 +08:00
Sofie Van Landeghem
282a3b49ea
Fix parser resizing when there is no upper layer ( #6460 )
* allow resizing of the parser model even when upper=False
* update from spacy.TransitionBasedParser.v1 to v2
* bugfix
2020-12-18 18:56:57 +08:00
Sofie Van Landeghem
0a923a7915
Tagger robustness ( #6580 )
* require labels in taggers
* ensure tagger works with incomplete data
2020-12-18 18:51:47 +08:00
Adriane Boyd
e10295c9fd
Fix memory leak when adding empty morph ( #6581 )
Fix lookup of empty morph in the morphology table, which fixes a memory
leak where a new morphology tag was allocated each time the empty morph
tag was added.
2020-12-18 18:51:01 +08:00
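The leak described above belongs to a familiar bug class: an interning table whose lookup misses one particular key, so each call allocates a new entry. The pure-Python sketch below shows the corrected behaviour, in which the empty morph string is looked up like any other key. `MorphTable` is hypothetical, and spaCy's actual morphology table is implemented in Cython.

```python
from typing import Dict, List


class MorphTable:
    """Hypothetical interning table: each distinct feature string is stored once."""

    def __init__(self) -> None:
        self.key_by_features: Dict[str, int] = {}
        self.entries: List[str] = []

    def add(self, features: str) -> int:
        # Look up every string, including the empty morph "". A guard such as
        # `if features and features in self.key_by_features` would miss "" and
        # append a fresh entry on every call, so memory would keep growing.
        if features in self.key_by_features:
            return self.key_by_features[features]
        key = len(self.entries)
        self.entries.append(features)
        self.key_by_features[features] = key
        return key


table = MorphTable()
keys = [table.add("") for _ in range(1000)]
sing_key = table.add("Number=Sing")
```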
Ines Montani
fd640afcd8
Add comment on CI strategy [ci skip]
2020-12-17 22:13:05 +11:00
Ines Montani
e9b0963827
Merge pull request #6333 from adrianeboyd/chore/python39
2020-12-17 22:11:57 +11:00
Adriane Boyd
51820180ba
Reduce CI builds
2020-12-17 08:55:05 +01:00
Adriane Boyd
2df1ab8a1f
Remove detailed numpy constraints from pyproject.toml
2020-12-17 08:54:20 +01:00
Ines Montani
e99cd82367
Update version pins
2020-12-17 10:21:08 +11:00
Ines Montani
47c1ec678b
Merge branch 'develop' into pr/6333
2020-12-17 10:19:28 +11:00
Ines Montani
3f90bffa27
Merge pull request #6571 from adrianeboyd/bugfix/debug-data-missing-vectors
Fix alignment and vector checks in debug data
2020-12-17 10:10:47 +11:00
Ines Montani
546af3966a
Merge pull request #6577 from LeapBeyond/bug/root_logger
Prevent root logger from initialising
2020-12-16 16:42:54 +11:00
Thomas Bird
cbb8c66da3
prevent the root logger from initialising
2020-12-15 19:50:34 +00:00
Adriane Boyd
1ddf2f39c7
Switch converters to generator functions ( #6547 )
* Switch converters to generator functions
To reduce the memory usage when converting large corpora, refactor the
convert methods to be generator functions.
* Update tests
2020-12-15 16:47:16 +08:00
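The memory saving comes from yielding one unit at a time instead of building the whole converted corpus in a list. The sketch below uses a hypothetical converter, `conll_sentences`, and a simplified tab-separated input format in which blank lines separate sentences. It is not one of spaCy's actual converters or their signatures.

```python
from typing import Iterable, Iterator, List


def conll_sentences(lines: Iterable[str]) -> Iterator[List[str]]:
    """Hypothetical converter: yield one sentence (list of forms) at a time.

    Blank lines separate sentences, and the first tab-separated column is
    the token form. As a generator, it holds only the current sentence in
    memory rather than the whole corpus.
    """
    sentence: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(line.split("\t")[0])
    if sentence:
        yield sentence


corpus = ["The\tDET", "cat\tNOUN", "", "It\tPRON", "sat\tVERB", ""]
sentences = conll_sentences(corpus)
first = next(sentences)  # only the first sentence has been built so far
```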
Adriane Boyd
20e18cc246
Fix alignment and vector checks in debug data
* Update token alignment check to use Example alignment
* Update missing vector check further related to changes in v3
2020-12-15 09:43:14 +01:00