Ines Montani
e624fcd5d9
Merge branch 'nightly.spacy.io' into develop
2020-07-09 23:26:26 +02:00
Ines Montani
52e9b5b472
Fix formatting
2020-07-09 23:25:58 +02:00
Ines Montani
28cdae898a
Update projects.md
2020-07-09 22:35:54 +02:00
Adriane Boyd
0a62098c5f
Fix lemmatizer is_base_form for python2.7 (#5734)
...
* Fix lemmatizer init args for python2.7
* Move English is_base_form to a class method
* Skip test pickling PhraseMatcher for python2
2020-07-09 22:11:24 +02:00
Adriane Boyd
923affd091
Remove is_base_form from French lemmatizer (#5733)
...
Remove English-specific is_base_form from French lemmatizer.
2020-07-09 22:11:13 +02:00
Ines Montani
7bcf9f7cfb
Document new features
2020-07-09 21:10:36 +02:00
Ines Montani
797ca6f3dd
Merge branch 'develop' into nightly.spacy.io
2020-07-09 20:48:24 +02:00
Matthew Honnibal
552d1ad226
Hack at tests
2020-07-09 20:25:51 +02:00
Matthew Honnibal
eb064c59cd
Try to fix textcat test
2020-07-09 20:24:53 +02:00
Ines Montani
018319a640
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-09 19:44:41 +02:00
Ines Montani
05e182e421
Update CLI args and docstrings
2020-07-09 19:44:28 +02:00
Sofie Van Landeghem
dd207a28be
cleanup components API (#5726)
...
* add keyword separator for update functions and drop unused "state"
* few more Example tests and various small fixes
* consistently return losses after update call
* eliminate unused tensors field across pipe components
* fix name
* fix arg name
2020-07-09 19:43:39 +02:00
Ines Montani
ea01831f6a
Update projects docs etc.
2020-07-09 19:43:25 +02:00
Adriane Boyd
ac4297ee39
Minor refactor to conversion of output docs (#5718)
...
Minor refactor of conversion of docs to output format to avoid
duplicate conversion steps.
2020-07-09 19:42:32 +02:00
Sofie Van Landeghem
c1ea55307b
Fixing reproducible training (#5735)
...
* Add initial reproducibility tests
* failing test for default_text_classifier (WIP)
* track trouble to underlying tok2vec layer
* add regression test for Issue 5551
* tests go green with https://github.com/explosion/thinc/pull/359
* update test
* adding fixed seeds to HashEmbed layers, seems to fix the reproducibility issue
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-07-09 19:39:31 +02:00
Matthew Honnibal
1827f22f56
Set version to v3.0.0a3
2020-07-09 19:38:04 +02:00
Matthw Honnibal
7010f1a2be
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-09 19:34:11 +02:00
Matthw Honnibal
0becc5954b
Update NER config
2020-07-09 19:33:54 +02:00
Matthw Honnibal
77af0a6bb4
Offer option of padding-sensitive batching
2020-07-09 14:50:20 +02:00
Matthw Honnibal
3a7f275c02
Add extra batch util
2020-07-09 14:38:41 +02:00
Matthw Honnibal
eb0798c421
Add __len__ method for Example
2020-07-09 14:38:26 +02:00
Ines Montani
175d34d8f9
Update sidebar menu
2020-07-09 11:44:09 +02:00
Ines Montani
9ee5b71412
Update cli.md
2020-07-09 11:44:00 +02:00
Ines Montani
028f8210e8
Merge branch 'develop' into nightly.spacy.io
2020-07-09 11:43:57 +02:00
Ines Montani
8f9552d9e7
Refactor project CLI (#5732)
...
* Make project command a submodule
* Update with WIP
* Add helper for joining commands
* Update docstrings, formatting and types
* Update assets and add support for copying local files
* Fix type
* Update success messages
2020-07-09 01:42:51 +02:00
Adriane Boyd
ad15499b3b
Fix get_loss for values outside of labels in senter (#5730)
...
* Fix get_loss for None alignments in senter
When converting the `sent_start` values back to `SentenceRecognizer`
labels, handle `None` alignments.
* Handle SENT_START as -1
Handle SENT_START as -1 (or -1 converted to uint64) by treating any
values other than 1 the same as 0 in `SentenceRecognizer.get_loss`.
2020-07-09 01:41:58 +02:00
Matthw Honnibal
9b49787f35
Update NER config. Getting 84.8
2020-07-08 21:38:01 +02:00
Matthw Honnibal
1b20ffac38
batch_by_words by default
2020-07-08 21:37:06 +02:00
Matthw Honnibal
93e50da46a
Remove auto 'set_annotation' in training to address GPU memory
2020-07-08 21:36:51 +02:00
Matthw Honnibal
fb8a5967c1
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-08 15:27:50 +02:00
Ines Montani
0a3d41bb1d
Deprecate model shortcuts and simplify download (#5722)
2020-07-08 14:00:07 +02:00
Adriane Boyd
c9f0f75778
Update get_loss for senter and morphologizer (#5724)
...
* Update get_loss for senter
Update `SentenceRecognizer.get_loss` to keep it similar to `Tagger`.
* Update get_loss for morphologizer
Update `Morphologizer.get_loss` to keep it similar to `Tagger`.
2020-07-08 13:59:28 +02:00
Ines Montani
9ae4040183
Update API docs
2020-07-08 13:34:35 +02:00
svlandeg
c94279ac1b
remove tensors, fix predict, get_loss and set_annotations
2020-07-08 13:11:54 +02:00
svlandeg
90b100c39f
remove component.Model, update constructor, losses is return value of update
2020-07-08 12:14:30 +02:00
Ines Montani
3d83721551
Merge pull request #5723 from gandersen101/fix-spaczz-universe-typo
2020-07-08 11:35:40 +02:00
Matthw Honnibal
ca989f4cc4
Improve cutting logic in parser
2020-07-08 11:27:54 +02:00
Matthw Honnibal
42e1109def
Support option to not batch by number of words
2020-07-08 11:26:54 +02:00
gandersen101
893133873d
Fix quote issue in spaczz universe.json
2020-07-07 19:16:28 -05:00
Ines Montani
109849bd31
Fix and update universe.json [ci skip]
2020-07-07 21:12:28 +02:00
gandersen101
9097549227
Adding spaczz package to universe.json (#5717)
...
* Adding spaczz package to universe.json
* Adding contributor agreement.
2020-07-07 20:55:24 +02:00
Jonathan Besomi
546f3d10d4
Add texthero to universe.json (#5716)
...
* Add texthero to universe.json
* Add spaCy contributor Agreement
2020-07-07 20:54:22 +02:00
Ines Montani
8cb7f9ccff
Improve assets and DVC handling (#5719)
...
* Improve assets and DVC handling
* Remove outdated comment [ci skip]
2020-07-07 20:51:50 +02:00
Ines Montani
2298e129e6
Update example and training docs
2020-07-07 20:30:12 +02:00
svlandeg
2b60e894cb
fix component constructors, update, begin_training, reference to GoldParse
2020-07-07 19:17:19 +02:00
Sofie Van Landeghem
a39a110c4e
Few more Example unit tests (#5720)
...
* small fixes in Example, UX
* add gold tests for aligned_spans and get_aligned_parse
* sentencizer unnecessary
2020-07-07 18:46:00 +02:00
Matthw Honnibal
433dc3c9c9
Simplify PrecomputableAffine slightly
2020-07-07 17:22:47 +02:00
Matthw Honnibal
a4164f67ca
Don't normalize gradients
2020-07-07 17:21:58 +02:00
Matthw Honnibal
8177f25b6c
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-07 17:21:10 +02:00
svlandeg
14a796e3f9
add Example API with examples of Example usage
2020-07-07 14:46:41 +02:00