svlandeg
556f3e4652
add pooling to NEL's TransformerListener
2020-09-23 09:24:28 +02:00
Sofie Van Landeghem
86a08f819d
tok2vec.update instead of predict ( #6113 )
2020-09-22 21:54:52 +02:00
Ines Montani
5e3b796b12
Validate section refs in debug config
2020-09-22 12:24:39 +02:00
svlandeg
085a1c8e2b
add no_output_layer to TextCatBOW config
2020-09-22 12:06:40 +02:00
svlandeg
b556a10808
rename converts in_to_out
2020-09-22 11:50:19 +02:00
svlandeg
e931f4d757
add textcat score
2020-09-22 10:56:43 +02:00
svlandeg
396b33257f
add entity_linker to jinja template
2020-09-22 10:40:05 +02:00
svlandeg
135de82a2d
add textcat to quickstart
2020-09-22 10:22:06 +02:00
Ines Montani
6316d5f398
Improve messages in project CLI [ci skip]
2020-09-22 09:45:34 +02:00
Ines Montani
81606b29bd
Merge pull request #6104 from svlandeg/fix/debug_model [ci skip]
2020-09-22 09:31:23 +02:00
svlandeg
45b29c4a5b
cleanup
2020-09-21 23:17:23 +02:00
svlandeg
fa5c416db6
initialize through nlp object and with train_corpus
2020-09-21 23:09:22 +02:00
svlandeg
447b3e5787
Merge remote-tracking branch 'upstream/develop' into fix/debug_model
# Conflicts:
# spacy/cli/debug_model.py
2020-09-21 16:58:40 +02:00
Ines Montani
e8bcaa44f1
Don't auto-decompress archives with smart_open [ci skip]
2020-09-21 16:01:46 +02:00
svlandeg
eb9b447960
Merge remote-tracking branch 'upstream/develop' into fix/debug_model
# Conflicts:
# spacy/cli/debug_model.py
2020-09-21 14:05:16 +02:00
Ines Montani
758ead8a47
Sync overrides with CLI overrides
2020-09-21 12:50:13 +02:00
Ines Montani
5497acf49a
Support config overrides via environment variables
2020-09-21 11:25:10 +02:00
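A minimal sketch of parsing config overrides from an environment variable. It assumes the variable holds CLI-style `--section.key value` pairs in dot notation. The variable name `SPACY_CONFIG_OVERRIDES` and the helpers `parse_overrides` and `overrides_from_env` are assumptions used for illustration. Values stay as raw strings and are not parsed as config values.

```python
import os
import shlex
from typing import Dict


def parse_overrides(raw: str) -> Dict[str, str]:
    """Parse '--section.key value' or '--section.key=value' pairs (hypothetical helper)."""
    tokens = shlex.split(raw)
    overrides: Dict[str, str] = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if not token.startswith("--"):
            raise ValueError(f"Expected an option starting with '--', got {token!r}")
        key = token[2:]
        if "=" in key:
            # '--key=value' form: key and value share one token
            key, value = key.split("=", 1)
            i += 1
        else:
            # '--key value' form: the value is the next token
            if i + 1 >= len(tokens):
                raise ValueError(f"Missing value for option {token!r}")
            value = tokens[i + 1]
            i += 2
        if "." not in key:
            raise ValueError(f"Override {key!r} must use dot notation, e.g. section.key")
        overrides[key] = value
    return overrides


def overrides_from_env(var: str = "SPACY_CONFIG_OVERRIDES") -> Dict[str, str]:
    # An unset or empty variable means no overrides
    return parse_overrides(os.environ.get(var, ""))
```

Reusing the same `--section.key value` syntax as the command line lets a single parser serve both sources.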
Ines Montani
1114219ae3
Tidy up and auto-format
2020-09-21 10:59:07 +02:00
Ines Montani
b2302c0a1c
Improve error for missing dependency
2020-09-20 17:44:51 +02:00
Matthew Honnibal
8fb59d958c
Format
2020-09-20 16:31:48 +02:00
Matthew Honnibal
dc22771f87
Fix sparse checkout
2020-09-20 16:30:05 +02:00
Matthew Honnibal
a0fb5e50db
Use simple git clone call if not sparse
2020-09-20 16:22:04 +02:00
Matthew Honnibal
2c24d633d0
Use updated run_command
2020-09-20 16:21:43 +02:00
Ines Montani
554c9a2497
Update docs [ci skip]
2020-09-20 12:30:53 +02:00
svlandeg
6db1d5dc0d
trying some stuff
2020-09-19 19:11:30 +02:00
Ines Montani
e863b3dc14
Merge pull request #6092 from adrianeboyd/bugfix/load-vocab-lookups-2
2020-09-19 12:33:38 +02:00
Sofie Van Landeghem
39872de1f6
Introducing the gpu_allocator ( #6091 )
* rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator'
* --code instead of --code-path
* update documentation
* avoid querying the "system" section directly
* add explanation of gpu_allocator to TF/PyTorch section in docs
* fix typo
* fix typo 2
* use set_gpu_allocator from thinc 8.0.0a34
* default null instead of empty string
2020-09-19 01:17:02 +02:00
svlandeg
73ff52b9ec
hack for tok2vec listener
2020-09-18 16:43:15 +02:00
Adriane Boyd
eed4b785f5
Load vocab lookups tables at beginning of training
Similar to how vectors are handled, move the vocab lookups to be loaded
at the start of training rather than when the vocab is initialized,
since the vocab doesn't have access to the full config when it's
created.
The option moves from `nlp.load_vocab_data` to `training.lookups`.
Typically these tables will come from `spacy-lookups-data`, but any
`Lookups` object can be provided.
The loading from `spacy-lookups-data` is now strict, so configs for each
language should specify the exact tables required. This also makes it
easier to control whether the larger clusters and probs tables are
included.
To load `lexeme_norm` from `spacy-lookups-data`:
```
[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
```
2020-09-18 15:59:16 +02:00
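The strict table selection described above can be illustrated without spaCy. The idea is that only the tables named in `tables` are kept, and a requested table that is missing raises an error rather than being skipped silently. This is a hypothetical, dictionary-based sketch. The names `select_tables` and `MissingTableError` are made up, the table contents are toy data, and none of it is the `spacy.LoadLookupsData.v1` implementation.

```python
from typing import Dict, List


class MissingTableError(KeyError):
    """Raised when a requested lookups table is not available (hypothetical)."""


def select_tables(
    available: Dict[str, dict], tables: List[str], strict: bool = True
) -> Dict[str, dict]:
    """Keep only the requested tables; in strict mode, a missing table is an error."""
    selected: Dict[str, dict] = {}
    for name in tables:
        if name in available:
            selected[name] = available[name]
        elif strict:
            raise MissingTableError(
                f"Table {name!r} not found; available: {sorted(available)}"
            )
    return selected
```

Because the config lists exactly which tables are loaded, large optional tables such as clusters or probs are left out unless they are named explicitly.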
Ines Montani
a127fa475e
Merge pull request #6078 from svlandeg/fix/corpus
2020-09-18 14:44:21 +02:00
svlandeg
e4fc7e0222
fixing output sample to proper 2D array
2020-09-17 22:34:36 +02:00
Ines Montani
3865214343
Use consistent shortcut
2020-09-17 16:57:02 +02:00
svlandeg
35a3931064
fix typo
2020-09-17 16:36:27 +02:00
svlandeg
ddfc1fc146
add pretraining option to init config
2020-09-17 16:05:40 +02:00
svlandeg
427dbecdd6
cleanup and formatting
2020-09-17 11:48:04 +02:00
svlandeg
0c35885751
generalize corpora, dot notation for dev and train corpus
2020-09-17 11:38:59 +02:00
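Dot notation for the corpora means a setting can point at another config section by path, for example `"corpora.train"`. The sketch below shows a minimal way to resolve such a path in a nested dict. The setting names `train_corpus` and `dev_corpus` and the helper `resolve_dot_path` are assumptions for illustration. This is not spaCy's config resolver.

```python
from typing import Any, Dict


def resolve_dot_path(config: Dict[str, Any], path: str) -> Any:
    """Look up a value such as 'corpora.train' in a nested config dict (hypothetical)."""
    current: Any = config
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            raise KeyError(f"Can't resolve {path!r}: no section or key {part!r}")
        current = current[part]
    return current


config = {
    "corpora": {
        "train": {"path": "train.spacy"},
        "dev": {"path": "dev.spacy"},
    },
    # Assumed setting names that refer to corpora by dotted path
    "training": {"train_corpus": "corpora.train", "dev_corpus": "corpora.dev"},
}
train_corpus = resolve_dot_path(config, config["training"]["train_corpus"])
```

Because corpora live in their own generic section, other consumers, such as a pretraining corpus, can refer to them the same way.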
svlandeg
51fa929f47
rewrite train_corpus to corpus.train in config
2020-09-15 21:58:04 +02:00
Ines Montani
9cc304c194
Merge pull request #6064 from explosion/fix/sparse-checkout-ux
Fix sparse checkout and error handling
2020-09-15 00:32:20 +02:00
Sofie Van Landeghem
3216a33149
positive_label config for textcat ( #6062 )
* hook up positive_label in textcat
* unit tests
* documentation
* formatting
* tests
* fix typo
* move verify_config to after begin_training
* revert accidental commit
2020-09-14 17:08:00 +02:00
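One way a `positive_label` setting is useful in a binary text classification task is to report precision, recall and F-score for that single class. The sketch below assumes that reading, plus a 0.5 threshold. The function `positive_label_prf` and the label names are hypothetical, and this is not the textcat scorer.

```python
from typing import Dict, List


def positive_label_prf(
    golds: List[Dict[str, float]],
    preds: List[Dict[str, float]],
    positive_label: str,
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Precision/recall/F for one designated positive class (hypothetical sketch)."""
    tp = fp = fn = 0
    for gold, pred in zip(golds, preds):
        gold_pos = gold.get(positive_label, 0.0) >= threshold
        pred_pos = pred.get(positive_label, 0.0) >= threshold
        if gold_pos and pred_pos:
            tp += 1
        elif pred_pos:
            fp += 1  # predicted positive, gold negative
        elif gold_pos:
            fn += 1  # gold positive that was missed
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"p": p, "r": r, "f": f}
```

True negatives do not contribute, so the score reflects only how well the positive class is found.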
Ines Montani
c052017025
Fix sparse checkout and error handling
2020-09-14 14:12:58 +02:00
Matthew Honnibal
54c40223a1
Improve v3 pretrain command ( #6040 )
* Starts to run
* Update pretrain script
* Update corpus
* Update pretrain schema
* Remove outdated test
* Make JsonlTexts produce Example objects.
2020-09-13 14:05:05 +02:00
Ines Montani
febb99916d
Tidy up and auto-format [ci skip]
2020-09-13 10:55:36 +02:00
Ines Montani
a5633b205f
Fix handling of errors around git [ci skip]
2020-09-13 10:52:28 +02:00
Ines Montani
f8846c198d
Update types and docstrings
2020-09-13 10:52:02 +02:00
Matthew Honnibal
37347830d4
Fix reading in GloVe vectors
2020-09-12 17:31:18 +02:00
Ines Montani
b41be87213
Merge pull request #6051 from svlandeg/feature/cli-config
2020-09-12 17:12:35 +02:00
Ines Montani
eedaaaec75
Fix handling of existing asset without checksum [ci skip]
2020-09-12 17:02:53 +02:00
svlandeg
a75cfe0da6
Merge remote-tracking branch 'upstream/develop' into feature/cli-config
2020-09-12 14:44:40 +02:00
svlandeg
115147804a
string_to_list to parse comma-separated string into a list
2020-09-12 14:43:22 +02:00
Ines Montani
f886f5bbc8
Merge pull request #6048 from explosion/fix/clone-compat
2020-09-12 10:30:49 +02:00
Ines Montani
0b2e07215d
Support overwriting name on spacy package
2020-09-11 11:38:28 +02:00
svlandeg
5b94aeece9
support pipeline as "list in string"
2020-09-11 11:08:46 +02:00
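A CLI argument such as a pipeline can be passed either as a comma-separated string (`tagger,parser`) or as a list written inside a string (`"['tagger', 'ner']"`). The sketch below shows what a helper like `string_to_list` might do with both forms. It is written from the two commit messages above and is not the actual spaCy code. The quote-stripping behavior is an assumption.

```python
from typing import List


def string_to_list(value: str) -> List[str]:
    """Parse 'a,b', '[a, b]' or "['a', 'b']" into ['a', 'b'] (hypothetical sketch)."""
    value = value.strip()
    # Accept a list written inside a string by dropping the brackets
    if value.startswith("[") and value.endswith("]"):
        value = value[1:-1]
    result: List[str] = []
    for item in value.split(","):
        item = item.strip()
        # Drop matching surrounding quotes, e.g. 'ner' or "ner"
        if len(item) >= 2 and item[0] == item[-1] and item[0] in ("'", '"'):
            item = item[1:-1]
        if item:
            result.append(item)
    return result
```

Empty items are skipped, so trailing commas and an empty string both behave sensibly.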
Ines Montani
1bce432b4a
Adjust message [ci skip]
2020-09-11 10:00:49 +02:00
Ines Montani
5acd4fbcd8
Merge branch 'develop' into fix/clone-compat
2020-09-11 09:58:30 +02:00
Ines Montani
761bd60d43
Adjust info message
2020-09-11 09:57:00 +02:00
Ines Montani
6831161bfa
Resolve path to be extra sure
2020-09-11 09:56:49 +02:00
svlandeg
1723fb73c4
remove clutter
2020-09-10 17:44:59 +02:00
svlandeg
08a831ce83
process trailing slash if any
2020-09-10 17:39:52 +02:00
Ines Montani
3e83a509bb
WIP: fix project clone compatibility
2020-09-10 15:49:13 +02:00
svlandeg
f1bc09c1e9
restore partly
2020-09-10 14:53:02 +02:00
svlandeg
3889747119
asset fix & UX
2020-09-10 14:36:53 +02:00
svlandeg
a36766d153
hookup branch
2020-09-10 12:00:34 +02:00
svlandeg
97d99f7efa
Merge remote-tracking branch 'upstream/develop' into feature/doc-fixes
2020-09-10 11:51:34 +02:00
Ines Montani
908f3a4494
Update default projects repo [ci skip]
2020-09-10 11:42:14 +02:00
svlandeg
92f9d2f406
small UX fixes
2020-09-10 11:35:50 +02:00
svlandeg
1fc5486792
more fine-grained errors for git_sparse_checkout
2020-09-10 11:31:32 +02:00
Ines Montani
15bc3a37b4
Add --branch to project clone
2020-09-10 11:08:15 +02:00
Sofie Van Landeghem
8e7557656f
Renaming gold & annotation_setter ( #6042 )
* version bump to 3.0.0a16
* rename "gold" folder to "training"
* rename 'annotation_setter' to 'set_extra_annotations'
* formatting
2020-09-09 10:31:03 +02:00
Sofie Van Landeghem
60f22e1800
Pipe API ( #6034 )
* ensure Language passes on valid examples for initialization
* fix tagger model initialization
* check for valid get_examples across components
* assume labels were added before begin_training
* fix senter initialization
* fix morphologizer initialization
* use methods to check arguments
* test textcat init, requires thinc>=8.0.0a31
* fix tok2vec init
* fix entity linker init
* use islice
* fix simple NER
* cleanup debug model
* fix assert statements
* fix tests
* throw error when adding a label if the output layer can't be resized anymore
* fix test
* add failing test for simple_ner
* UX improvements
* morphologizer UX
* assume begin_training gets a representative set and processes the labels
* remove assumptions for output of untrained NER model
* restore test for original purpose
2020-09-08 22:44:25 +02:00
Matthew Honnibal
ba5f4c9b32
Add words and seconds to train info
2020-09-08 15:24:47 +02:00
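Reporting words and seconds together gives a throughput figure (words per second) for training. The minimal sketch below assumes whitespace-separated tokens approximate the word count and that the caller passes an `update` callback for each batch. The function `run_with_stats` and its return keys are hypothetical.

```python
import time
from typing import Callable, Dict, Iterable, List


def run_with_stats(
    batches: Iterable[List[str]], update: Callable[[List[str]], None]
) -> Dict[str, float]:
    """Run `update` on each batch and report words, seconds and words/second (hypothetical)."""
    start = time.perf_counter()
    words = 0
    for batch in batches:
        update(batch)  # one training step on this batch
        # Whitespace tokens stand in for the real token count
        words += sum(len(text.split()) for text in batch)
    seconds = time.perf_counter() - start
    wps = words / seconds if seconds > 0 else 0.0
    return {"words": words, "seconds": seconds, "wps": wps}
```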
Matthew Honnibal
b470062153
Add CLI registry ( #6037 )
2020-09-08 15:23:34 +02:00
Matthew Honnibal
4b7abaafdb
Fix learn rate for non-transformer
2020-09-04 21:22:50 +02:00
Matthew Honnibal
465785a672
Fix project pull and push
2020-09-04 21:15:55 +02:00
Ines Montani
ab1bb421ed
Update docs links in codebase
2020-09-04 12:58:50 +02:00
Ines Montani
2189046869
Merge pull request #6024 from explosion/chore/registry-renaming
2020-09-04 10:54:10 +02:00
Matthew Honnibal
1c07820681
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-09-03 18:54:21 +02:00
Matthew Honnibal
7be8a0516a
Fix project pull
2020-09-03 18:54:03 +02:00
Ines Montani
23b7d9cfa3
Prefix span getters
2020-09-03 17:37:06 +02:00
Ines Montani
c063e55eb7
Add prefix to batchers
2020-09-03 17:30:41 +02:00
Ines Montani
c53b1433b9
Adjust more arguments [ci skip]
2020-09-03 17:12:24 +02:00
Ines Montani
b5a0657fd6
"model" terminology consistency in docs
2020-09-03 13:13:03 +02:00
Matthew Honnibal
122cb02001
Fix averages
2020-09-02 19:37:43 +02:00
Sofie Van Landeghem
6bfb1b3a29
Fix sparse checkout for 'spacy project' ( #6008 )
* exit if cloning fails
* UX
* rewrite http link to git protocol, don't use stdin
* fixes to sparse checkout
* formatting
2020-09-01 19:49:01 +02:00
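Rewriting an HTTP link to git form, and handling a trailing slash on the repo URL, can be sketched as a small string transformation. This sketch assumes the target is the SSH-style `git@host:owner/repo.git` form. The helper name `http_to_git` is hypothetical. Anything that is not an HTTP(S) URL is returned unchanged.

```python
def http_to_git(repo: str) -> str:
    """Rewrite an HTTP(S) repo URL to SSH-style git form (hypothetical sketch)."""
    # Ignore surrounding whitespace and any trailing slash
    repo = repo.strip().rstrip("/")
    for prefix in ("https://", "http://"):
        if repo.startswith(prefix):
            host, _, path = repo[len(prefix):].partition("/")
            if not path.endswith(".git"):
                path += ".git"
            return f"git@{host}:{path}"
    # Already a git/SSH URL or a local path: leave it unchanged
    return repo
```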
Ines Montani
70b226f69d
Support ignore marker in project document [ci skip]
2020-09-01 12:49:04 +02:00
Ines Montani
a4c51f0f18
Add v3 info to project docs [ci skip]
2020-09-01 12:36:21 +02:00
Ines Montani
ef9005273b
Update fill-config command and add silent mode [ci skip]
2020-09-01 12:07:04 +02:00
Matthew Honnibal
ec660e3131
Fix use_pytorch_for_gpu_memory
2020-09-01 00:41:38 +02:00
Matthew Honnibal
c38298b8fa
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-08-31 19:55:55 +02:00
Matthew Honnibal
fe298fa50a
Shuffle on first epoch of train
2020-08-31 19:55:22 +02:00
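Shuffling on the first epoch as well means training never sees the corpus in its original file order. The minimal sketch below assumes the examples fit in memory and uses a seeded `random.Random` so runs are reproducible. The generator `epochs` is hypothetical and is not spaCy's training loop.

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def epochs(
    examples: Sequence[T], n_epochs: int, seed: int = 0
) -> Iterator[Tuple[int, List[T]]]:
    """Yield (epoch, examples) with a fresh shuffle every epoch, including epoch 0."""
    rng = random.Random(seed)
    for epoch in range(n_epochs):
        order = list(examples)
        rng.shuffle(order)  # shuffle before the first pass too
        yield epoch, order
```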
svlandeg
13ee742fb4
example of custom logger
2020-08-31 14:24:41 +02:00
svlandeg
c18eb63483
Merge remote-tracking branch 'upstream/develop' into feature/vectors-docs
# Conflicts:
# website/docs/usage/embeddings-transformers.md
2020-08-31 13:21:36 +02:00
Sofie Van Landeghem
ec14744ee4
Rename Transformer listener ( #6001 )
* rename to spacy-transformers.TransformerListener
* add some more tok2vec tests
* use select_pipes
* fix docs - annotation setter was not changed in the end
2020-08-31 12:41:39 +02:00
Ines Montani
45f46a5c85
Merge pull request #5993 from explosion/feature/disabled-components
2020-08-29 15:58:41 +02:00
Ines Montani
34146750d4
Use frozen list with custom errors
We don't want to break backwards compatibility too much but we also want to provide the best possible UX
2020-08-29 15:20:11 +02:00
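A frozen list with custom errors can still be read like a normal list, which preserves backwards compatibility. Any attempt to mutate it, however, raises a descriptive error. The sketch below implements this as a `list` subclass. `FrozenList` and its default message are hypothetical. The source only says that a frozen list with custom errors is used.

```python
from typing import Iterable, Optional


class FrozenList(list):
    """A read-only list that raises a custom error on mutation (hypothetical sketch)."""

    def __init__(self, items: Iterable = (), error: Optional[str] = None) -> None:
        super().__init__(items)
        self.error = error or "This list is read-only and can't be modified."

    def _frozen(self, *args, **kwargs):
        # Every mutating method is routed here and raises the custom message
        raise NotImplementedError(self.error)

    append = extend = insert = pop = remove = clear = sort = reverse = _frozen
    __setitem__ = __delitem__ = __iadd__ = __imul__ = _frozen
```

Reads such as iteration, indexing, `in` and equality still work as they do for `list`, so existing code that only inspects the value keeps working.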
Ines Montani
2bc31e15c9
Tidy up and auto-format [ci skip]
2020-08-29 13:01:10 +02:00
svlandeg
5230529de2
add loggers registry & logger docs sections
2020-08-28 21:44:04 +02:00
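A loggers registry lets a config refer to a logger factory by name, and it lets users register their own custom logger. The minimal sketch below uses a plain dict with a decorator. It is an assumption for illustration: spaCy's real registries are built on the `catalogue` package, and real logger factories have a different shape. The names `register_logger`, `get_logger` and `"my_custom_logger.v1"` are hypothetical.

```python
from typing import Callable, Dict

_LOGGERS: Dict[str, Callable] = {}


def register_logger(name: str) -> Callable[[Callable], Callable]:
    """Decorator that stores a logger factory under a string name (hypothetical)."""
    def decorator(func: Callable) -> Callable:
        if name in _LOGGERS:
            raise ValueError(f"Logger {name!r} is already registered")
        _LOGGERS[name] = func
        return func
    return decorator


def get_logger(name: str) -> Callable:
    try:
        return _LOGGERS[name]
    except KeyError:
        raise KeyError(f"Unknown logger {name!r}; registered: {sorted(_LOGGERS)}") from None


@register_logger("my_custom_logger.v1")
def make_print_logger(prefix: str = "[train]") -> Callable[[dict], str]:
    # The factory returns a callback that formats and prints one training step
    def log_step(info: dict) -> str:
        line = f"{prefix} step={info['step']} score={info['score']:.3f}"
        print(line)
        return line
    return log_step
```

Usage: `get_logger("my_custom_logger.v1")(prefix=">>")` returns a callback. The name string is what a config file would store.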
Ines Montani
4ca2698f85
Merge branch 'develop' into feature/debug-config
2020-08-28 11:19:17 +02:00
Ines Montani
d1780db6a4
Tidy up and use different error [ci skip]
2020-08-27 18:56:55 +02:00
Ines Montani
ff4175e839
Add more info to debug config
2020-08-27 18:17:58 +02:00
Ines Montani
8692d176f6
Merge pull request #5978 from explosion/feature/update-wasabi
Update wasabi: new diff_strings and MarkdownRenderer
2020-08-26 19:02:52 +02:00