Ines Montani
634ae609b4
Adjust formatting [ci skip]
2021-01-27 13:08:00 +11:00
Ines Montani
d5ef245bb1
Merge pull request #6822 from jganseman/master [ci skip]
2021-01-27 13:04:30 +11:00
Ines Montani
5d79d1af50
Merge pull request #6796 from svlandeg/docs/benchmarks [ci skip]
2021-01-27 13:01:23 +11:00
Ines Montani
1ed7029d47
Update website for v3 launch
2021-01-27 12:39:47 +11:00
Adriane Boyd
c447aa2b98
Update --code arg in evaluate CLI docs
2021-01-26 15:30:46 +01:00
jganseman
907bce7a78
Merge pull request #1 from jganseman/patch-1
...
Patch 1
2021-01-26 11:12:30 +01:00
jganseman
8bc57ec372
also update is_oov in lexeme docs
2021-01-26 11:09:16 +01:00
jganseman
1f2b0ec168
proposing a more concise explanation for is_oov
2021-01-26 10:53:39 +01:00
Matthew Honnibal
f049df1715
Revert "Set annotations in update" ( #6810 )
...
* Revert "Set annotations in update (#6767 )"
This reverts commit e680efc7cc.
* Fix version
* Update spacy/pipeline/entity_linker.py
* Update spacy/pipeline/entity_linker.py
* Update spacy/pipeline/tagger.pyx
* Update spacy/pipeline/tok2vec.py
* Update spacy/pipeline/tok2vec.py
* Update spacy/pipeline/transition_parser.pyx
* Update spacy/pipeline/transition_parser.pyx
* Update website/docs/api/multilabel_textcategorizer.md
* Update website/docs/api/tok2vec.md
* Update website/docs/usage/layers-architectures.md
* Update website/docs/usage/layers-architectures.md
* Update website/docs/api/transformer.md
* Update website/docs/api/textcategorizer.md
* Update website/docs/api/tagger.md
* Update spacy/pipeline/entity_linker.py
* Update website/docs/api/sentencerecognizer.md
* Update website/docs/api/pipe.md
* Update website/docs/api/morphologizer.md
* Update website/docs/api/entityrecognizer.md
* Update spacy/pipeline/entity_linker.py
* Update spacy/pipeline/multitask.pyx
* Update spacy/pipeline/tagger.pyx
* Update spacy/pipeline/tagger.pyx
* Update spacy/pipeline/textcat.py
* Update spacy/pipeline/textcat.py
* Update spacy/pipeline/textcat.py
* Update spacy/pipeline/tok2vec.py
* Update spacy/pipeline/trainable_pipe.pyx
* Update spacy/pipeline/trainable_pipe.pyx
* Update spacy/pipeline/transition_parser.pyx
* Update spacy/pipeline/transition_parser.pyx
* Update website/docs/api/entitylinker.md
* Update website/docs/api/dependencyparser.md
* Update spacy/pipeline/trainable_pipe.pyx
2021-01-25 22:18:45 +08:00
Adriane Boyd
61c9f8bf24
Remove transformers model max length section ( #6807 )
2021-01-25 19:59:34 +08:00
muratjumashev
7d0154a36e
Added language meta data
2021-01-25 00:42:19 +06:00
svlandeg
56064faed9
update caption
2021-01-23 00:57:00 +01:00
svlandeg
d7c0f40a96
update comment
2021-01-22 18:55:18 +01:00
svlandeg
a071279bc7
add speed comparison to docs
2021-01-22 18:46:35 +01:00
svlandeg
b132cb3036
update accuracies for new a1 models
2021-01-21 20:24:05 +01:00
Adriane Boyd
d0236136a2
Fix default config init in Transformer API docs ( #6781 )
2021-01-21 23:18:03 +08:00
Sofie Van Landeghem
e680efc7cc
Set annotations in update ( #6767 )
...
* bump to 3.0.0rc4
* do set_annotations in component update calls
* update docs and remove set_annotations flag
* fix EL test
2021-01-20 11:49:25 +11:00
Sofie Van Landeghem
57640aa838
warn when frozen components break listener pattern ( #6766 )
...
* warn when frozen components break listener pattern
* few notes in the documentation
* update arg name
* formatting
* cleanup
* specify listeners return type
2021-01-20 11:12:35 +11:00
Ines Montani
4a1029a9b6
Add infobox [ci skip]
2021-01-19 19:18:39 +11:00
Adriane Boyd
7cd5c9e098
Add xx_sent_ud_sm model to website
2021-01-19 09:02:35 +01:00
Ines Montani
76e25afcd7
Merge pull request #6757 from adrianeboyd/docs/mk-ru-langs [ci skip]
...
Update languages for website
2021-01-19 11:10:48 +11:00
Ines Montani
f50502dad7
Update docs [ci skip]
2021-01-19 00:22:47 +11:00
Adriane Boyd
e8f6400923
Update languages for website
...
* Add Macedonian
* Add Russian dependencies
* Switch Chinese dependency to spacy-pkuseg
2021-01-18 14:09:34 +01:00
Ines Montani
2ae8dfbb93
Fix website [ci skip]
2021-01-18 22:31:32 +11:00
Ines Montani
09cacbb7ee
Fix website [ci skip]
2021-01-18 11:37:04 +11:00
Sofie Van Landeghem
fed8f48965
raise NotImplementedError when noun_chunks iterator is not implemented ( #6711 )
...
* raise NotImplementedError when noun_chunks iterator is not implemented
* bring back, fix and document span.noun_chunks
* formatting
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2021-01-17 19:56:05 +08:00
Adriane Boyd
bf0cdae8d4
Add token_splitter component ( #6726 )
...
* Add long_token_splitter component
Add a `long_token_splitter` component for use with transformer
pipelines. This component splits up long tokens like URLs into smaller
tokens. This is particularly relevant for pretrained pipelines with
`strided_spans`, since the user can't change the length of the span
`window` and may not wish to preprocess the input texts.
The `long_token_splitter` splits tokens that are at least
`long_token_length` characters long into smaller tokens of `split_length`
size.
Notes:
* Since this is intended for use as the first component in a pipeline,
the token splitter does not try to preserve any token annotation.
* API docs to come when the API is stable.
* Adjust API, add test
* Fix name in factory
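
The splitting rule described in this commit can be sketched in plain Python. This is only an illustration, not the component's actual code: the function name `split_long_token` is hypothetical, the parameter names are borrowed from the commit message, and lengths are assumed to be measured in characters.

```python
from typing import List


def split_long_token(text: str, long_token_length: int, split_length: int) -> List[str]:
    """Split `text` into chunks of `split_length` characters if it is long enough.

    Hypothetical helper: texts shorter than `long_token_length` are returned
    unchanged as a single piece.
    """
    if len(text) < long_token_length:
        return [text]
    # Walk over the text in steps of `split_length`; the last chunk may be shorter.
    return [text[i : i + split_length] for i in range(0, len(text), split_length)]


url = "https://example.com/a/very/long/path"
pieces = split_long_token(url, long_token_length=25, split_length=10)
```

Joining the pieces reproduces the original text, which is why a splitter like this can run first in a pipeline without losing any characters, even though it does not preserve token annotation.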
2021-01-17 19:54:41 +08:00
Adriane Boyd
9328dd5625
Handle unset token.morph in Morphologizer ( #6704 )
...
* Handle unset token.morph in Morphologizer
Handle unset `token.morph` in `Morphologizer.initialize` and
`Morphologizer.get_loss`. If both `token.morph` and `token.pos` are
unset, treat the annotation as missing rather than empty.
* Add token.has_morph()
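
The distinction between missing and empty annotation can be illustrated with a small sketch. It is an assumption-laden toy, not the `Morphologizer` implementation: `morph_label` is a hypothetical helper, `None` stands in for an unset value, and `"_"` is used as the label for an explicitly empty analysis.

```python
from typing import Optional


def morph_label(morph: Optional[str], pos: Optional[str]) -> Optional[str]:
    """Build a combined morph+POS label, or return None if annotation is missing.

    Hypothetical convention: None means "unset", "" means "annotated as empty".
    """
    if morph is None and pos is None:
        # Nothing was annotated: treat as missing so it can be skipped in the loss.
        return None
    parts = []
    if morph:
        parts.append(morph)
    if pos:
        parts.append(f"POS={pos}")
    # An annotated but empty analysis gets an explicit placeholder label.
    return "|".join(parts) if parts else "_"
```

With this convention, a token with no annotation at all contributes no training signal, while a token annotated with an empty analysis is still a valid example.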
2021-01-15 17:20:10 +01:00
Adriane Boyd
0c936004d1
Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3
2021-01-14 11:49:58 +01:00
Matthew Honnibal
f277bfdf0f
Add SpanGroup and Graph container types to represent arbitrary annotations ( #6696 )
...
* Draft out initial Spans data structure
* Initial span group commit
* Basic span group support on Doc
* Basic test for span group
* Compile span_group.pyx
* Draft addition of SpanGroup to DocBin
* Add deserialization for SpanGroup
* Add tests for serializing SpanGroup
* Fix serialization of SpanGroup
* Add EdgeC and GraphC structs
* Add draft Graph data structure
* Compile graph
* More work on Graph
* Update GraphC
* Upd graph
* Fix walk functions
* Let Graph take nodes and edges on construction
* Fix walking and getting
* Add graph tests
* Fix import
* Add module with the SpanGroups dict thingy
* Update test
* Rename 'span_groups' attribute
* Try to fix c++11 compilation
* Fix test
* Update DocBin
* Try to fix compilation
* Try to fix graph
* Improve SpanGroup docstrings
* Add doc.spans to documentation
* Fix serialization
* Tidy up and add docs
* Update docs [ci skip]
* Add SpanGroup.has_overlap
* WIP updated Graph API
* Start testing new Graph API
* Update Graph tests
* Update Graph
* Add docstring
Co-authored-by: Ines Montani <ines@ines.io>
2021-01-14 17:30:41 +11:00
Ines Montani
29c3ca7e34
Fix SVG integration [ci skip]
2021-01-14 13:33:41 +11:00
Antonio Miras
b4bd8f347a
spaCy Universe: New project; SpacyDotNet ( #6702 )
...
* Universe: SpacyDotNet a .NET Core spaCy wrapper
* Signed contributor agreement
Co-authored-by: Antonio Miras <antonio@amiras.net>
2021-01-13 12:47:30 +11:00
Adriane Boyd
a45d89f09a
Add initialize.before_init and after_init callbacks
...
Add `initialize.before_init` and `initialize.after_init` callbacks to
the config. The `initialize.before_init` callback is a place to
implement one-time tokenizer customizations that are then saved with the
model.
2021-01-12 13:07:44 +01:00
Sofie Van Landeghem
a612a5ba3f
fix small typos ( #6698 )
2021-01-08 09:39:47 +01:00
Sofie Van Landeghem
75d9019343
Fix types of Tok2Vec encoding architectures ( #6442 )
...
* fix TorchBiLSTMEncoder documentation
* ensure the types of the encoding Tok2vec layers are correct
* update references from v1 to v2 for the new architectures
2021-01-07 16:39:27 +11:00
Sofie Van Landeghem
82ae95267a
Docs for pretrain architectures ( #6605 )
...
* document pretraining architectures
* formatting
* bit more info
* small fixes
2021-01-06 16:12:30 +11:00
Sofie Van Landeghem
afc5714d32
multi-label textcat component ( #6474 )
...
* multi-label textcat component
* formatting
* fix comment
* cleanup
* fix from #6481
* random edit to push the tests
* add explicit error when textcat is called with multi-label gold data
* fix error nr
* small fix
2021-01-06 13:07:14 +11:00
Ines Montani
6f83abb971
Merge pull request #6647 from svlandeg/feature/init_config_overwrite
2021-01-05 14:59:04 +11:00
Ines Montani
3614472e29
Merge pull request #6646 from svlandeg/feature/cli-docs [ci skip]
2021-01-05 13:52:49 +11:00
Ines Montani
9c078a5885
Update formatting for consistency [ci skip]
2021-01-05 13:52:28 +11:00
Ines Montani
a9e845426f
Use --force for consistency and add docs
2021-01-05 13:49:59 +11:00
svlandeg
d5ff0fecf8
add docs
2020-12-30 14:01:13 +01:00
svlandeg
2fa23b0304
fix capitalization for link
2020-12-29 15:01:22 +01:00
svlandeg
43cc6aea93
remove non-existing link
2020-12-29 14:59:39 +01:00
svlandeg
543073bf9d
add pretrain example
2020-12-29 14:51:23 +01:00
svlandeg
1d0ef98873
move example
2020-12-29 14:46:03 +01:00
svlandeg
20113b8063
add train CLI example
2020-12-29 14:44:56 +01:00
Sofie Van Landeghem
87562e470d
fix backticks in docs ( #6635 )
2020-12-27 22:12:37 +01:00
Sofie Van Landeghem
8df5b7f513
fix documentation of 'path' in tokenizer.to_disk ( #6634 )
2020-12-27 22:01:06 +01:00
Sofie Van Landeghem
282a3b49ea
Fix parser resizing when there is no upper layer ( #6460 )
...
* allow resizing of the parser model even when upper=False
* update from spacy.TransitionBasedParser.v1 to v2
* bugfix
2020-12-18 18:56:57 +08:00
Gareth Sparks
efc229c3f4
Doc.char_span arg: alignment_mode ( #6591 )
...
Currently labeled "mode", actually "alignment_mode"
2020-12-18 09:54:56 +01:00
Jeno Pizarro
a6fe35a0f9
Update universe.json
2020-12-15 21:53:20 -05:00
Jeno Pizarro
343a44abe9
Merge branch 'master' of https://github.com/explosion/spaCy
2020-12-15 21:49:46 -05:00
Ines Montani
85ca8c2bdd
Merge branch 'master' into develop
2020-12-11 13:44:41 +11:00
Ines Montani
fb43a30a71
Merge pull request #6545 from svlandeg/feature/discussions [ci skip]
2020-12-11 10:20:35 +11:00
Ines Montani
76cfd89dea
Update site.json
2020-12-11 10:19:42 +11:00
Ines Montani
43a69eecb7
Update site.json
2020-12-11 10:05:21 +11:00
svlandeg
d156b423ae
remove gitter and reddit links
2020-12-10 20:41:02 +01:00
svlandeg
5afa567767
replace gitter with discussions in 101
2020-12-10 20:17:36 +01:00
svlandeg
ae1ccf2b04
update link to discussion forum
2020-12-10 20:02:49 +01:00
Adriane Boyd
27bb75e2a0
Docs and extras updates for v2.3.5
...
* Update install instructions for updated packages
* Add `cuda110` and `cuda111` extras, remove upper `cupy` pins (only
compatible with `thinc>=7.4.4`)
2020-12-10 15:34:34 +01:00
Ines Montani
513c4e332a
Include custom code via spacy package command ( #6531 )
2020-12-10 20:36:46 +08:00
Ines Montani
2a6043fabb
Merge pull request #6530 from explosion/feature/init-config-cpu-gpu
2020-12-10 09:38:46 +11:00
Ines Montani
9d32e839d3
Merge branch 'develop' into feature/init-config-cpu-gpu
2020-12-10 08:50:53 +11:00
Adriane Boyd
972820e2b3
Add batch_size to data formats docs
2020-12-09 12:44:04 +01:00
Adriane Boyd
80ac8af1bf
Format
2020-12-09 12:44:01 +01:00
Adriane Boyd
795b5bd049
Update website/docs/api/language.md
...
Co-authored-by: Ines Montani <ines@ines.io>
2020-12-09 12:23:32 +01:00
Adriane Boyd
fa8fa474a3
Add nlp.batch_size setting
...
Add a default `batch_size` setting for `Language.pipe` and
`Language.evaluate` as `nlp.batch_size`.
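
The fallback pattern behind an object-level default batch size might look roughly like the sketch below. `TinyPipeline` is a hypothetical stand-in rather than the `Language` class, and the default of 1000 is an assumption made only for this example.

```python
from typing import Iterable, Iterator, List, Optional


class TinyPipeline:
    """Hypothetical stand-in for a pipeline object with a default batch size."""

    def __init__(self, batch_size: int = 1000) -> None:
        self.batch_size = batch_size

    def pipe(
        self, texts: Iterable[str], batch_size: Optional[int] = None
    ) -> Iterator[List[str]]:
        # Fall back to the object-level default when no size is passed explicitly.
        size = self.batch_size if batch_size is None else batch_size
        batch: List[str] = []
        for text in texts:
            batch.append(text)
            if len(batch) == size:
                yield batch
                batch = []
        if batch:
            # Emit the final, possibly smaller, batch.
            yield batch


pipeline = TinyPipeline(batch_size=2)
default_batches = list(pipeline.pipe(["a", "b", "c"]))
explicit_batches = list(pipeline.pipe(["a", "b", "c"], batch_size=3))
```

An explicit argument still overrides the default, so existing calls that pass `batch_size` keep their behavior.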
2020-12-09 09:13:26 +01:00
Ines Montani
04b3068747
Revert landing [ci skip]
2020-12-09 11:20:45 +11:00
Ines Montani
34449b66fd
Update matcher.md
2020-12-09 11:09:45 +11:00
Ines Montani
1980203229
Merge branch 'master' into pr/6444
2020-12-09 11:09:40 +11:00
Ines Montani
05a2812ae0
Merge branch 'develop' into pr/6444
2020-12-09 11:04:03 +11:00
Ines Montani
758ad6c3cd
Make CPU the default for init config
2020-12-09 11:00:51 +11:00
Ines Montani
8921364579
Merge pull request #6521 from explosion/feature/config-stdin
...
Allow reading config from stdin in spacy train
2020-12-08 22:07:43 +11:00
Ines Montani
94a5a9814f
Update argument handling and documentation
2020-12-08 20:41:18 +11:00
Adriane Boyd
5ceac425ee
Remove non-working --use-chars from train CLI
...
Remove the non-working `--use-chars` option from the train CLI. The
implementation of the option across component types and the CLI settings
could be fixed, but the `CharacterEmbed` model does not work on GPU in
v2 so it's better to remove it.
2020-12-08 08:30:00 +01:00
Ines Montani
ef59ce783b
Adjust install instructions [ci skip]
2020-12-08 18:06:50 +11:00
Sofie Van Landeghem
2c27093c5f
require_cpu functionality ( #6336 )
...
* add require_cpu from Thinc 8.0.0rc2
* add docs
* fix test if cupy is not installed
2020-12-08 14:42:40 +08:00
Ines Montani
d8e01ca931
Merge pull request #6391 from adrianeboyd/docs/install-guide
2020-12-08 07:42:16 +01:00
Ines Montani
ee2ec52f48
Merge pull request #6409 from svlandeg/feature/trf-docs
2020-12-08 06:32:10 +01:00
Ines Montani
c2b196c2c1
Merge pull request #6419 from svlandeg/feature/rel-docs
2020-12-08 06:30:41 +01:00
Ines Montani
82e88f0e3b
Merge pull request #6379 from svlandeg/fix/labels-constructor
2020-12-08 06:29:56 +01:00
Adriane Boyd
1442d2f213
Improve simple training example in v3 migration ( #6438 )
...
* Create the examples once
* Use the examples in the initialization
* Provide the batch size
* Fix `begin_training` migration example
2020-11-30 09:39:45 +08:00
Adriane Boyd
03ae77e603
Add SPACY as a Matcher attribute ( #6463 )
2020-11-30 09:34:50 +08:00
Ines Montani
d21d2c2e59
Don't multiply accuracy by 100
2020-11-27 15:15:51 +08:00
Adriane Boyd
724831b066
Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master
...
* Update Macedonian for v3
* Update Turkish for v3
2020-11-25 11:49:34 +01:00
Jacob Bortell
fe9009911a
Update rule-based-matching.md ( #6421 )
...
* Update rule-based-matching.md
Clarified case-sensitivity of dictionary-referencing attributes (POS/TAG/DEP/etc).
Clarified "Type" column header to "Value Type"
* Update rule-based-matching.md
Improved clarity of wording
2020-11-24 16:20:19 +01:00
Adriane Boyd
6f133877aa
Update source install instructions
...
* Don't recommend an editable install in the default source
instructions.
* Use `pip install --no-build-isolation` for editable installs.
* Remove reference to `virtualenv`.
2020-11-24 14:44:13 +01:00
Yusuke Mori
e3ac90b035
Avoid a SyntaxError in self-attentive-parser ( #6428 )
...
* Avoid a SyntaxError in self-attentive-parser
Fix a usage of quotation marks in the example of spaCy Universe self-attentive-parser
* Create forest1988.md
Fill in the spaCy contributor agreement
2020-11-22 21:59:37 +01:00
svlandeg
218abaa69a
typo
2020-11-20 22:36:49 +01:00
svlandeg
e861e928df
more small corrections
2020-11-20 22:29:58 +01:00
svlandeg
5ac0867427
final fixes
2020-11-20 22:18:53 +01:00
svlandeg
331ec83493
edits and updates to implementing REL component docs
2020-11-20 21:41:52 +01:00
svlandeg
4a3e611abc
small fixes and formatting
2020-11-20 15:55:05 +01:00
svlandeg
124f49feb6
update REL model code
2020-11-20 15:25:20 +01:00
svlandeg
636be3c791
Merge remote-tracking branch 'upstream/develop' into feature/trf-docs
2020-11-19 14:15:35 +01:00
Sofie Van Landeghem
165993d8e5
fix typo in transformer docs ( #6404 )
2020-11-19 14:11:38 +01:00
M. Revuelta Espinosa
51232ffb9e
Update universe.json (include PatternOmatic) ( #6399 )
...
Request to include PatternOmatic in spaCy Universe
Adds @revuel to contributors
2020-11-19 13:15:50 +01:00
Adriane Boyd
3cf6479467
Fix JSON in #6395
2020-11-17 15:25:41 +01:00
Sam Edwardes
78913a4f95
Added spaCyTextBlob to universe.json ( #6395 )
2020-11-17 14:38:34 +01:00