Adriane Boyd
d83e3c44c5
Remove corpus-specific morph rules
...
* Remove corpus-specific morph rules
* Add options similar to tag maps to provide them in the `train` and
`debug-data` CLIs
2020-07-15 19:44:18 +02:00
Adriane Boyd
2f981d5af1
Remove corpus-specific tag maps
...
Remove corpus-specific tag maps from the language data for languages
without custom tokenizers. For languages with custom word segmenters
that also provide tags (Japanese and Korean), the tag maps for the
custom tokenizers are kept as the default.
The default tag maps for languages without custom tokenizers are now the
default tag map from `lang/tag_map.py`, UPOS -> UPOS.
2020-07-15 15:58:29 +02:00
Adriane Boyd
5228920e2f
Clarify warning W030 for misaligned BILUO tags ( #5761 )
2020-07-14 14:09:48 +02:00
Adriane Boyd
a7a7e0d2a6
Add morph to morphology in Doc.from_array ( #5762 )
...
* Add morph to morphology in Doc.from_array
Add morphological analyses to morphology table in `Doc.from_array`.
* Use separate vocab in DocBin roundtrip test
2020-07-14 14:07:35 +02:00
Ines Montani
872938ec76
Merge pull request #5747 from explosion/feature/refactor-config-args
2020-07-14 00:00:22 +02:00
Sofie Van Landeghem
6f3bb6f77c
fix doc.to_utf8 on GPU ( #5757 )
2020-07-13 23:05:33 +02:00
Adriane Boyd
7ea2cc7650
Set version to 2.3.2 ( #5756 )
2020-07-13 14:55:56 +02:00
Mark Neumann
27a1cd3c63
fix meta serialization in train ( #5751 )
...
Co-authored-by: Mark Neumann <markng@allenai.org>
2020-07-12 22:06:46 +02:00
Ines Montani
dcfa910e4e
Merge pull request #5752 from explosion/compat/remove-object-subclass
2020-07-12 16:37:04 +02:00
Ines Montani
ed55143c0d
Merge branch 'develop' into compat/remove-object-subclass
2020-07-12 14:28:52 +02:00
Ines Montani
7906ddd56c
Fix test
2020-07-12 14:28:34 +02:00
Ines Montani
5f6f4ff594
Remove object subclassing
2020-07-12 14:03:23 +02:00
Ines Montani
c96535e338
Update command docstrings and docs
2020-07-12 13:53:49 +02:00
Ines Montani
0ab483037c
Make debug commands subcommands of spacy debug
...
Also handle backwards-compatibility so the old commands don't break
2020-07-12 13:53:41 +02:00
Ines Montani
3f948b9c74
Update docs
2020-07-12 12:32:28 +02:00
Ines Montani
8a67ddd6f1
Remove unused import
2020-07-12 12:32:24 +02:00
Ines Montani
d1d7fd5f5d
Don't use file paths in schemas
...
It should be possible to validate top-level config with file paths that don't exist
2020-07-12 12:32:08 +02:00
Ines Montani
79346853aa
Add debug-config command
2020-07-12 12:31:17 +02:00
Ines Montani
3a8632c3fb
Hide command from public --help for now
...
Not sure we want this to be officially documented yet?
2020-07-11 19:21:22 +02:00
Ines Montani
5e683d03fe
Allow extra args on pretrain and debug_data
2020-07-11 19:17:59 +02:00
Ines Montani
70abcca60e
Update Thinc pin
2020-07-11 17:02:54 +02:00
Ines Montani
b7111da1d7
Update config and commands
2020-07-11 13:03:53 +02:00
Ines Montani
11bbc82c24
Update cli.md [ci skip]
2020-07-10 23:37:52 +02:00
Ines Montani
9e48ea48a1
Update Thinc pin
2020-07-10 23:34:57 +02:00
Ines Montani
f99ce7fbfb
Make validation errors more elegant
2020-07-10 23:34:17 +02:00
Ines Montani
9455b060d2
Update cli.md
2020-07-10 22:57:22 +02:00
Ines Montani
7b5717cac3
Merge branch 'develop' into feature/refactor-config-args
2020-07-10 22:50:07 +02:00
Ines Montani
e6a6587a9a
Update projects.md [ci skip]
2020-07-10 22:41:27 +02:00
Matthew Honnibal
743f7fb73a
Set version to v3.0.0a4
2020-07-10 22:40:12 +02:00
Matthew Honnibal
b68216e263
Explicitly delete objects after parser.update to free GPU memory ( #5748 )
...
* Try explicitly deleting objects
* Refactor parser model backprop slightly
* Free parser data explicitly after rehearse and update
2020-07-10 22:35:20 +02:00
Ines Montani
f2cd982e7b
Update training.md
2020-07-10 22:34:27 +02:00
Ines Montani
fb6f6f584e
Replace - with _ in command names
...
We might as well be nice if the user accidentally types --training.use-gpu
2020-07-10 22:34:22 +02:00
Ines Montani
bfa8e11ffa
Update and auto-format
2020-07-10 20:52:00 +02:00
Ines Montani
0389c34b81
Merge branch 'develop' into feature/refactor-config-args
2020-07-10 20:51:52 +02:00
Ines Montani
931250e1f5
Fix pipeline component schema
2020-07-10 20:32:53 +02:00
Ines Montani
9fe1fa88ad
Fix typo
2020-07-10 20:32:37 +02:00
Ines Montani
459c6aa8f0
Merge branch 'feature/refactor-config-args' of https://github.com/explosion/spaCy into feature/refactor-config-args
2020-07-10 20:01:28 +02:00
Ines Montani
defe1e7213
Pretty-print config validation errors
2020-07-10 20:01:20 +02:00
Matthew Honnibal
894f31226b
Update config
2020-07-10 19:59:12 +02:00
Sofie Van Landeghem
de6a32315c
debug-model script ( #5749 )
...
* add debug-model to print the internals for debugging purposes
* expand debug-model script with 4 stages: before, init, train, predict
* avoid requiring a seed in the train script
* small fixes
2020-07-10 19:47:53 +02:00
Ines Montani
a3667394b4
Integrate with latest Thinc and config overrides
2020-07-10 19:47:05 +02:00
Ines Montani
5cfc3edcaa
Update CLI tests
2020-07-10 18:21:01 +02:00
Ines Montani
3583ea84d8
Update arg parsing
2020-07-10 18:20:52 +02:00
Ines Montani
73332ddb67
Update CLI commands to use one shared util file
2020-07-10 17:57:40 +02:00
Ines Montani
240e0a62ca
Update with WIP
2020-07-10 13:31:27 +02:00
Ines Montani
a60562f208
Update project CLI hashes, directories, skipping ( #5741 )
...
* Update project CLI hashes, directories, skipping
* Improve clone success message
* Remove unused context args
* Move project-specific utils to project utils
The hashing/checksum functions may not end up being general-purpose functions and are more designed for the projects, so they shouldn't live in spacy.util
* Improve run help and add workflows
* Add note re: directory checksum speed
* Fix cloning from subdirectories and output messages
* Remove hard-coded dirs
2020-07-09 23:51:18 +02:00
Ines Montani
e624fcd5d9
Merge branch 'nightly.spacy.io' into develop
2020-07-09 23:26:26 +02:00
Ines Montani
52e9b5b472
Fix formatting
2020-07-09 23:25:58 +02:00
Ines Montani
28cdae898a
Update projects.md
2020-07-09 22:35:54 +02:00
Adriane Boyd
0a62098c5f
Fix lemmatizer is_base_form for python2.7 ( #5734 )
...
* Fix lemmatizer init args for python2.7
* Move English is_base_form to a class method
* Skip test pickling PhraseMatcher for python2
2020-07-09 22:11:24 +02:00