56 Commits

Author SHA1 Message Date
Adriane Boyd
bf0cdae8d4
Add token_splitter component (#6726)
* Add long_token_splitter component

Add a `long_token_splitter` component for use with transformer
pipelines. This component splits up long tokens like URLs into smaller
tokens. This is particularly relevant for pretrained pipelines with
`strided_spans`, since the user can't change the length of the span
`window` and may not wish to preprocess the input texts.

The `long_token_splitter` splits tokens that are at least
`long_token_length` characters long into smaller tokens of `split_length`
characters each.

Notes:

* Since this is intended for use as the first component in a pipeline,
the token splitter does not try to preserve any token annotation.
* API docs to come when the API is stable.
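
The splitting rule described above can be sketched in plain Python. This is only an illustration, not the component's actual implementation: the function name `split_long_tokens` and the default values are hypothetical, the sketch assumes both lengths are measured in characters, and it works on plain strings rather than on a `Doc`.

```python
from typing import List


def split_long_tokens(
    tokens: List[str],
    long_token_length: int = 25,  # illustrative default, not confirmed by the commit
    split_length: int = 5,  # illustrative default, not confirmed by the commit
) -> List[str]:
    """Split every token of at least `long_token_length` characters
    into consecutive pieces of at most `split_length` characters."""
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    result: List[str] = []
    for token in tokens:
        if len(token) >= long_token_length:
            # Cut the long token into fixed-size chunks; the last chunk may be shorter.
            result.extend(
                token[i : i + split_length]
                for i in range(0, len(token), split_length)
            )
        else:
            # Short tokens pass through unchanged.
            result.append(token)
    return result


pieces = split_long_tokens(
    ["see", "https://example.com/a/b"], long_token_length=10, split_length=8
)
print(pieces)
```

Like the component, the sketch keeps no per-token annotation, which matches the note that the splitter is meant to run first in the pipeline.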

* Adjust API, add test

* Fix name in factory
2021-01-17 19:54:41 +08:00
Sofie Van Landeghem
75d9019343
Fix types of Tok2Vec encoding architectures (#6442)
* fix TorchBiLSTMEncoder documentation

* ensure the types of the encoding Tok2Vec layers are correct

* update references from v1 to v2 for the new architectures
2021-01-07 16:39:27 +11:00
Sofie Van Landeghem
82ae95267a
Docs for pretrain architectures (#6605)
* document pretraining architectures

* formatting

* bit more info

* small fixes
2021-01-06 16:12:30 +11:00
Ines Montani
c968d1560f Fix docs example [ci skip] 2020-10-16 11:33:20 +02:00
Ines Montani
ba1e004049 Fix typo [ci skip] 2020-10-15 23:39:04 +02:00
svlandeg
08cb085f6c Merge remote-tracking branch 'upstream/develop' into fix/various 2020-10-09 17:01:27 +02:00
Ines Montani
9fb3244672
Merge pull request #6231 from adrianeboyd/feature/include-static-vectors 2020-10-09 15:54:52 +02:00
Adriane Boyd
2dd79454af Update docs 2020-10-09 14:42:07 +02:00
svlandeg
853edace37 fix MultiHashEmbed example in documentation 2020-10-09 14:11:06 +02:00
Ines Montani
e50dc2c1c9 Update docs [ci skip] 2020-10-09 12:04:52 +02:00
Ines Montani
d1602e1ece Update docs [ci skip] 2020-10-08 11:56:50 +02:00
Ines Montani
43e59bb22a Update docs and install extras [ci skip] 2020-10-08 10:58:50 +02:00
Ines Montani
01c1538c72 Integrate file readers 2020-10-02 01:36:06 +02:00
Sofie Van Landeghem
a22215f427
Add FeatureExtractor from Thinc (#6170)
* move featureextractor from Thinc

* Update website/docs/api/architectures.md

Co-authored-by: Ines Montani <ines@ines.io>

* Update website/docs/api/architectures.md

Co-authored-by: Ines Montani <ines@ines.io>

Co-authored-by: Ines Montani <ines@ines.io>
2020-10-01 16:22:48 +02:00
Ines Montani
0a8a124a6e Update docs [ci skip] 2020-10-01 12:15:53 +02:00
Ines Montani
361f91e286
Merge pull request #6135 from walterhenry/develop-proof 2020-09-29 20:49:06 +02:00
walterhenry
1d80b3dc1b Proofreading
Finished with the API docs and started on the Usage, but Embedding & Transformers
2020-09-29 12:39:10 +02:00
Sofie Van Landeghem
009ba14aaf
Fix pretraining in train script (#6143)
* update pretraining API in train CLI

* bump thinc to 8.0.0a35

* bump to 3.0.0a26

* doc fixes

* small doc fix
2020-09-25 15:47:10 +02:00
Ines Montani
6836b66433 Update docs and resolve todos [ci skip] 2020-09-24 13:41:25 +02:00
svlandeg
6c85fab316 state_type and extra_state_tokens instead of nr_feature_tokens 2020-09-23 13:35:09 +02:00
Ines Montani
012b3a7096 Update docs [ci skip] 2020-09-20 17:44:58 +02:00
Ines Montani
554c9a2497 Update docs [ci skip] 2020-09-20 12:30:53 +02:00
Ines Montani
a0b4389a38 Update docs [ci skip] 2020-09-17 19:24:48 +02:00
Matthew Honnibal
6efb7688a6 Draft pretrain usage 2020-09-17 18:17:03 +02:00
Ines Montani
a2c8cda26f Update docs [ci skip] 2020-09-17 17:12:51 +02:00
Matthew Honnibal
ec751068f3 Draft text for static vectors intro 2020-09-17 16:42:53 +02:00
Ines Montani
8b0dabe987 Update docs [ci skip] 2020-09-12 17:05:10 +02:00
Sofie Van Landeghem
8e7557656f
Renaming gold & annotation_setter (#6042)
* version bump to 3.0.0a16

* rename "gold" folder to "training"

* rename 'annotation_setter' to 'set_extra_annotations'

* formatting
2020-09-09 10:31:03 +02:00
Ines Montani
23b7d9cfa3 Prefix span getters 2020-09-03 17:37:06 +02:00
Ines Montani
690bd77669 Add todos [ci skip] 2020-09-01 14:04:36 +02:00
svlandeg
e47ea88aeb revert annotations refactor 2020-08-31 14:40:55 +02:00
svlandeg
c18eb63483 Merge remote-tracking branch 'upstream/develop' into feature/vectors-docs
# Conflicts:
#	website/docs/usage/embeddings-transformers.md
2020-08-31 13:21:36 +02:00
Sofie Van Landeghem
ec14744ee4
Rename Transformer listener (#6001)
* rename to spacy-transformers.TransformerListener

* add some more tok2vec tests

* use select_pipes

* fix docs - annotation setter was not changed in the end
2020-08-31 12:41:39 +02:00
Ines Montani
bc0730be3f Update docs [ci skip] 2020-08-29 12:53:14 +02:00
svlandeg
9f00a20ce4 proofreading and custom examples 2020-08-28 21:50:42 +02:00
svlandeg
556e975a30 various fixes 2020-08-27 19:24:44 +02:00
svlandeg
329e490560 small import fixes 2020-08-27 14:50:43 +02:00
svlandeg
28e4ba7270 fix references to TransformerListener 2020-08-27 14:33:28 +02:00
svlandeg
4d37ac3f33 configure_custom_sent_spans example 2020-08-27 14:14:16 +02:00
svlandeg
c68169f83f fix link 2020-08-27 10:19:43 +02:00
svlandeg
acc794c975 example of writing to other custom attribute 2020-08-27 10:10:10 +02:00
svlandeg
559b65f2e0 adjust references to null_annotation_setter to trfdata_setter 2020-08-27 09:43:32 +02:00
svlandeg
ec069627fe rename to TransformerListener 2020-08-26 13:31:01 +02:00
svlandeg
15902c5aa2 fix link 2020-08-26 11:51:57 +02:00
Ines Montani
8ac5ef1284 Update docs 2020-08-25 11:54:37 +02:00
Ines Montani
98a9e063b6 Update docs [ci skip] 2020-08-22 17:15:05 +02:00
Matthew Honnibal
048de64d4c Suggest edits 2020-08-22 17:11:28 +02:00
Ines Montani
37ebff6997 Update docs [ci skip] 2020-08-22 16:47:03 +02:00
Matthew Honnibal
d97695d09d Update embeddings-transformers.md 2020-08-22 15:41:35 +02:00
Ines Montani
aa6a7cd6e7 Update docs and consistency [ci skip] 2020-08-21 13:49:18 +02:00