Sofie Van Landeghem
8c1a23209f
Getting scores out of beam_parser ( #6684 )
...
* clean up of ner tests
* beam_parser tests
* implement get_beam_parses and scored_parses for the dep parser
* we don't have to add the parse if there are no arcs
2021-01-07 16:28:27 +11:00
Sofie Van Landeghem
3983bc6b1e
Fix Transformer width in TextCatEnsemble ( #6431 )
...
* add convenience method to determine tok2vec width in a model
* fix transformer tok2vec dimensions in TextCatEnsemble architecture
* init function should not be nested to avoid pickle issues
2021-01-06 12:44:04 +01:00
Sofie Van Landeghem
402dbc5bae
Getting scores out of beam_ner ( #6575 )
...
* small fixes and formatting
* bring test_issue4313 up-to-date, currently fails
* formatting
* add get_beam_parses method back
* add scored_ents function
* delete tag map
2021-01-06 12:02:32 +01:00
Sofie Van Landeghem
82ae95267a
Docs for pretrain architectures ( #6605 )
...
* document pretraining architectures
* formatting
* bit more info
* small fixes
2021-01-06 16:12:30 +11:00
Sofie Van Landeghem
6f7e7d88b9
remove cause without apostrophe from norm exceptions ( #6636 )
2021-01-06 12:30:30 +08:00
Adriane Boyd
bf9096437e
Set default lemmas in retokenizer ( #6667 )
...
Instead of unsetting lemmas on retokenized tokens, set the default
lemmas to:
* merge: concatenate any existing lemmas with `SPACY` preserved
* split: use the new `ORTH` values if lemmas were previously set,
otherwise leave unset
2021-01-06 12:29:44 +08:00
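A minimal sketch of the default-lemma rule described in the commit message above, written in plain Python rather than against the spaCy API. The helper names `merged_lemma` and `split_lemmas` and the `(lemma, trailing_space)` token model are hypothetical, chosen only for illustration; dropping whitespace after the last merged token is an assumption, not something the commit message states.

```python
from typing import List, Tuple

# Hypothetical token model: (lemma, has_trailing_space).
# An empty lemma string stands for "lemma not set".
Token = Tuple[str, bool]


def merged_lemma(tokens: List[Token]) -> str:
    """Merge: concatenate the existing lemmas, keeping the whitespace
    that separated the original tokens (assumption: whitespace after
    the final token is not included)."""
    parts = []
    for i, (lemma, has_space) in enumerate(tokens):
        parts.append(lemma)
        if has_space and i < len(tokens) - 1:
            parts.append(" ")
    return "".join(parts)


def split_lemmas(old_lemma: str, new_orths: List[str]) -> List[str]:
    """Split: if the original token had a lemma, default each new
    lemma to the new token text; otherwise leave every lemma unset."""
    if old_lemma:
        return list(new_orths)
    return ["" for _ in new_orths]


merged = merged_lemma([("New", True), ("York", False)])
split_set = split_lemmas("newyork", ["New", "York"])
split_unset = split_lemmas("", ["New", "York"])
```

Here `merged` keeps the space between the two lemmas, `split_set` falls back to the new token texts, and `split_unset` leaves both lemmas empty.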
Adriane Boyd
0041dfbc7f
Use special matcher for exceptions with spaces ( #6668 )
...
Use the special cases phrase matcher for exceptions that include space
characters so that exceptions including spaces are supported.
2021-01-06 12:05:10 +08:00
Sofie Van Landeghem
afc5714d32
multi-label textcat component ( #6474 )
...
* multi-label textcat component
* formatting
* fix comment
* cleanup
* fix from #6481
* random edit to push the tests
* add explicit error when textcat is called with multi-label gold data
* fix error nr
* small fix
2021-01-06 13:07:14 +11:00
Bruno
1a77607036
spaCy v3 is not saving the best version in training loop ( #6629 )
...
* Save best only if it is the best and also respect the average config
* Create bratao.md
* Update loop.py
* Remove average check
* Keep before_to_disk
2021-01-06 12:51:30 +11:00
Sofie Van Landeghem
29b59086f9
Prevent 0-length mem alloc ( #6653 )
...
* prevent 0-length mem alloc by adding asserts
* fix lexeme mem allocation
2021-01-06 12:50:17 +11:00
Ines Montani
6f83abb971
Merge pull request #6647 from svlandeg/feature/init_config_overwrite
2021-01-05 14:59:04 +11:00
Ines Montani
81f018fb67
Merge pull request #6671 from explosion/chore/tidy-autoformat
...
Tidy up and auto-format
2021-01-05 14:45:31 +11:00
Ines Montani
224a3590e9
Merge pull request #6654 from svlandeg/chore/tests-cleanup
...
Unskipping tests
2021-01-05 13:53:40 +11:00
Ines Montani
3614472e29
Merge pull request #6646 from svlandeg/feature/cli-docs [ci skip]
2021-01-05 13:52:49 +11:00
Ines Montani
9c078a5885
Update formatting for consistency [ci skip]
2021-01-05 13:52:28 +11:00
Ines Montani
a9e845426f
Use --force for consistency and add docs
2021-01-05 13:49:59 +11:00
Ines Montani
c4993f16d0
Merge pull request #6651 from svlandeg/bugfix/cli_info
2021-01-05 13:44:26 +11:00
Ines Montani
991669c934
Tidy up and auto-format
2021-01-05 13:41:53 +11:00
Adriane Boyd
b57be94c78
Fix memory issues in Language.evaluate ( #6386 )
...
* Fix memory issues in Language.evaluate
Reset annotation in predicted docs before evaluating and store all data
in `examples`.
* Minor refactor to docs generator init
* Fix generator expression
* Fix final generator check
* Refactor pipeline loop
* Handle examples generator in Language.evaluate
* Add test with generator
* Use make_doc
2020-12-31 10:45:50 +11:00
svlandeg
a6a68da673
unskipping tests with python >= 3.6
2020-12-30 18:46:43 +01:00
svlandeg
d5ff0fecf8
add docs
2020-12-30 14:01:13 +01:00
svlandeg
c74ab6a313
fix imports
2020-12-30 12:40:12 +01:00
svlandeg
712a78b74a
add simple unit test
2020-12-30 12:35:26 +01:00
svlandeg
4347e6d39b
fixes for CLI info command
2020-12-30 12:05:58 +01:00
svlandeg
62b4fe118f
prevent overwriting existing config file
2020-12-29 15:40:22 +01:00
svlandeg
2fa23b0304
fix capitalization for link
2020-12-29 15:01:22 +01:00
svlandeg
43cc6aea93
remove non-existing link
2020-12-29 14:59:39 +01:00
svlandeg
543073bf9d
add pretrain example
2020-12-29 14:51:23 +01:00
svlandeg
1d0ef98873
move example
2020-12-29 14:46:03 +01:00
svlandeg
20113b8063
add train CLI example
2020-12-29 14:44:56 +01:00
Adam Bittlingmayer
f2fe60bacf
Update tokenizer_exceptions.py
...
See https://github.com/explosion/spaCy/pull/6643
2020-12-29 16:05:11 +04:00
Adriane Boyd
5ca57d8221
Add logger warning when serializing user hooks ( #6595 )
...
Add a warning that user hooks are lost on serialization.
Add a `user_hooks` exclude to skip the warning with pickle.
2020-12-29 11:54:32 +01:00
Sofie Van Landeghem
87562e470d
fix backticks in docs ( #6635 )
2020-12-27 22:12:37 +01:00
Sofie Van Landeghem
8df5b7f513
fix documentation of 'path' in tokenizer.to_disk ( #6634 )
2020-12-27 22:01:06 +01:00
Yosi
cf52510631
Add Amharic አማርኛ Language support ( #6583 )
...
* Add Amharic to spaCy
* clean up
* Add some PRON_LEMMA
* add Tigrinya support
* remove text_noun_chunks
* Tigrinya Support
* added some more details for ti
* fix unit test
* add amharic char range
* changes from review
* amharic and tigrinya share same unicode block
* get rid of _amharic/_tigrinya in char_classes
Co-authored-by: Josiah Solomon <jsolomon@meteorcomm.com>
2020-12-22 16:50:34 +01:00
Tim Gates
292c1d6a73
docs: fix simple typo, speficied -> specified ( #6611 )
...
There is a small typo in spacy/cli/info.py.
Should read `specified` rather than `speficied`.
2020-12-22 09:14:10 +01:00
Adriane Boyd
cabd4ae5b1
Use logger.warning instead of logger.warn ( #6596 )
...
Use `logger.warning` instead of deprecated `logger.warn`.
2020-12-21 08:25:10 +08:00
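For context, `Logger.warn` in Python's standard `logging` module is a deprecated alias of `Logger.warning`. The sketch below shows the preferred call using only the standard library; the logger name `example` and the message text are made up for illustration and are not taken from the spaCy code base.

```python
import io
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("example")
logger.setLevel(logging.WARNING)
logger.propagate = False  # keep the output confined to our own handler

# Capture the log output in memory so it can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logger.addHandler(handler)

# Preferred spelling: warning(), not the deprecated warn() alias.
logger.warning("something needs attention")

output = stream.getvalue()
```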
Sofie Van Landeghem
282a3b49ea
Fix parser resizing when there is no upper layer ( #6460 )
...
* allow resizing of the parser model even when upper=False
* update from spacy.TransitionBasedParser.v1 to v2
* bugfix
2020-12-18 18:56:57 +08:00
Sofie Van Landeghem
0a923a7915
Tagger robustness ( #6580 )
...
* require labels in taggers
* ensure tagger works with incomplete data
2020-12-18 18:51:47 +08:00
Adriane Boyd
e10295c9fd
Fix memory leak when adding empty morph ( #6581 )
...
Fix lookup of empty morph in the morphology table, which fixes a memory
leak where a new morphology tag was allocated each time the empty morph
tag was added.
2020-12-18 18:51:01 +08:00
Gareth Sparks
efc229c3f4
Doc.char_span arg: alignment_mode ( #6591 )
...
The argument is currently labeled "mode"; its actual name is "alignment_mode".
2020-12-18 09:54:56 +01:00
Ines Montani
fd640afcd8
Add comment on CI strategy [ci skip]
2020-12-17 22:13:05 +11:00
Ines Montani
e9b0963827
Merge pull request #6333 from adrianeboyd/chore/python39
2020-12-17 22:11:57 +11:00
Adriane Boyd
51820180ba
Reduce CI builds
2020-12-17 08:55:05 +01:00
Adriane Boyd
2df1ab8a1f
Remove detailed numpy constraints from pyproject.toml
2020-12-17 08:54:20 +01:00
Ines Montani
e99cd82367
Update version pins
2020-12-17 10:21:08 +11:00
Ines Montani
47c1ec678b
Merge branch 'develop' into pr/6333
2020-12-17 10:19:28 +11:00
Ines Montani
3f90bffa27
Merge pull request #6571 from adrianeboyd/bugfix/debug-data-missing-vectors
...
Fix alignment and vector checks in debug data
2020-12-17 10:10:47 +11:00
Ines Montani
7c9a2f298c
Merge pull request #6578 from jenojp/master [ci skip]
2020-12-16 17:31:55 +11:00
Ines Montani
546af3966a
Merge pull request #6577 from LeapBeyond/bug/root_logger
...
Prevent root logger from initialising
2020-12-16 16:42:54 +11:00