Commit Graph

12461 Commits

Author SHA1 Message Date
Ines Montani
6f4e4aceb3 Add Plausible [ci skip] 2020-07-18 23:50:29 +02:00
Adriane Boyd
50db3f0cdb Serialize morph rules with tagger
Serialize `morph_rules` with the tagger alongside the `tag_map`.

Use `Morphology.load_tag_map` and `Morphology.load_morph_exceptions` to
load these settings rather than reinitializing the morphology each time
they are changed.
2020-07-17 08:22:21 +02:00
Adriane Boyd
d106cf66dd Update Morphology to load exceptions as MORPH_RULES
Update `Morphology` to load exceptions in `Morphology.__init__` and
`Morphology.load_morph_exceptions` from the format used in `MORPH_RULES`
rather than the internal format with tuple keys.

* Rename `Morphology.exc` to `Morphology._exc` for internal use with
tuple keys
* Add `Morphology.exc` as a property that converts the internal `_exc`
back to `MORPH_RULES` format, primarily for serialization
2020-07-16 21:16:49 +02:00
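The `exc` property described above converts between a tuple-keyed internal layout and the nested `MORPH_RULES` layout. The sketch below is a minimal illustration under two assumptions: the internal `_exc` is keyed by `(tag, orth)` string tuples, and `MORPH_RULES` nests as `{tag: {orth: attrs}}`. The function names `exc_from_morph_rules` and `exc_to_morph_rules` are hypothetical and are not methods of spaCy's `Morphology` class.

```python
from typing import Any, Dict, Tuple

# Hypothetical type aliases for this sketch only.
MorphRules = Dict[str, Dict[str, Dict[str, Any]]]
InternalExc = Dict[Tuple[str, str], Dict[str, Any]]


def exc_from_morph_rules(morph_rules: MorphRules) -> InternalExc:
    """Flatten nested {tag: {orth: attrs}} rules into (tag, orth)-keyed entries."""
    exc: InternalExc = {}
    for tag, orths in morph_rules.items():
        for orth, attrs in orths.items():
            exc[(tag, orth)] = dict(attrs)
    return exc


def exc_to_morph_rules(exc: InternalExc) -> MorphRules:
    """Rebuild the nested MORPH_RULES layout, e.g. for serialization."""
    morph_rules: MorphRules = {}
    for (tag, orth), attrs in exc.items():
        morph_rules.setdefault(tag, {})[orth] = dict(attrs)
    return morph_rules


rules = {"PRP": {"I": {"PronType": "Prs", "Person": "1"}}}
internal = exc_from_morph_rules(rules)
print(internal)  # {('PRP', 'I'): {'PronType': 'Prs', 'Person': '1'}}
```

Keeping the tuple keys internal makes lookups by `(tag, orth)` a single dict access, while the property gives serialization the same nested format that language data already uses.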
Adriane Boyd
d83e3c44c5 Remove corpus-specific morph rules
* Remove corpus-specific morph rules
* Add options similar to tag maps to provide them in the `train` and
`debug-data` CLIs
2020-07-15 19:44:18 +02:00
Adriane Boyd
2f981d5af1 Remove corpus-specific tag maps
Remove corpus-specific tag maps from the language data for languages
without custom tokenizers. For languages with custom word segmenters
that also provide tags (Japanese and Korean), the tag maps for the
custom tokenizers are kept as the default.

The default tag maps for languages without custom tokenizers are now the
default tag map from `lang/tag_map.py`, UPOS -> UPOS.
2020-07-15 15:58:29 +02:00
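A UPOS -> UPOS tag map maps each fine-grained tag to the coarse part of speech with the same name. The sketch below assumes the 17 Universal Dependencies UPOS tags and uses the plain string key `"pos"` in place of spaCy's `POS` symbol. It does not reproduce the exact contents of `lang/tag_map.py`, and `build_identity_tag_map` is a hypothetical name.

```python
from typing import Dict, List

# The 17 Universal Dependencies UPOS tags (assumed tag set for this sketch).
UPOS_TAGS: List[str] = [
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
]


def build_identity_tag_map(tags: List[str] = UPOS_TAGS) -> Dict[str, Dict[str, str]]:
    """Map every tag to a coarse POS with the same name (identity mapping)."""
    return {tag: {"pos": tag} for tag in tags}


tag_map = build_identity_tag_map()
print(tag_map["NOUN"])  # {'pos': 'NOUN'}
```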
Adriane Boyd
5228920e2f Clarify warning W030 for misaligned BILUO tags (#5761) 2020-07-14 14:09:48 +02:00
Adriane Boyd
a7a7e0d2a6 Add morph to morphology in Doc.from_array (#5762)
* Add morph to morphology in Doc.from_array

Add morphological analyses to morphology table in `Doc.from_array`.

* Use separate vocab in DocBin roundtrip test
2020-07-14 14:07:35 +02:00
Ines Montani
872938ec76 Merge pull request #5747 from explosion/feature/refactor-config-args 2020-07-14 00:00:22 +02:00
Sofie Van Landeghem
6f3bb6f77c fix doc.to_utf8 on GPU (#5757) 2020-07-13 23:05:33 +02:00
Adriane Boyd
7ea2cc7650 Set version to 2.3.2 (#5756) 2020-07-13 14:55:56 +02:00
Mark Neumann
27a1cd3c63 fix meta serialization in train (#5751)
Co-authored-by: Mark Neumann <markng@allenai.org>
2020-07-12 22:06:46 +02:00
Ines Montani
dcfa910e4e Merge pull request #5752 from explosion/compat/remove-object-subclass 2020-07-12 16:37:04 +02:00
Ines Montani
ed55143c0d Merge branch 'develop' into compat/remove-object-subclass 2020-07-12 14:28:52 +02:00
Ines Montani
7906ddd56c Fix test 2020-07-12 14:28:34 +02:00
Ines Montani
5f6f4ff594 Remove object subclassing 2020-07-12 14:03:23 +02:00
Ines Montani
c96535e338 Update command docstrings and docs 2020-07-12 13:53:49 +02:00
Ines Montani
0ab483037c Make debug commands subcommands of spacy debug
Also handle backwards-compatibility so the old commands don't break
2020-07-12 13:53:41 +02:00
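One way to keep old top-level command names working after moving them under `spacy debug` is to rewrite the legacy name before parsing. The sketch below uses the standard-library `argparse` purely for illustration and does not reflect how spaCy's CLI is actually implemented. `LEGACY_ALIASES`, `normalize_argv`, `build_parser`, and the single `path` argument are all hypothetical.

```python
import argparse
from typing import List

# Hypothetical mapping from legacy top-level names to new "debug" subcommands.
LEGACY_ALIASES = {
    "debug-data": ["debug", "data"],
    "debug-model": ["debug", "model"],
}


def normalize_argv(argv: List[str]) -> List[str]:
    """Rewrite a legacy command such as `debug-data` into `debug data`."""
    if argv and argv[0] in LEGACY_ALIASES:
        return LEGACY_ALIASES[argv[0]] + argv[1:]
    return argv


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where config, data and model live under `debug`."""
    parser = argparse.ArgumentParser(prog="spacy")
    commands = parser.add_subparsers(dest="command", required=True)
    debug = commands.add_parser("debug")
    debug_commands = debug.add_subparsers(dest="debug_command", required=True)
    for name in ("config", "data", "model"):
        sub = debug_commands.add_parser(name)
        sub.add_argument("path")  # hypothetical positional argument
    return parser


args = build_parser().parse_args(normalize_argv(["debug-data", "config.cfg"]))
print(args.command, args.debug_command, args.path)  # debug data config.cfg
```

Rewriting `argv` up front keeps the parser itself free of duplicate command definitions; the legacy names never appear in `--help`.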
Ines Montani
3f948b9c74 Update docs 2020-07-12 12:32:28 +02:00
Ines Montani
8a67ddd6f1 Remove unused import 2020-07-12 12:32:24 +02:00
Ines Montani
d1d7fd5f5d Don't use file paths in schemas
It should be possible to validate top-level config with file paths that don't exist
2020-07-12 12:32:08 +02:00
Ines Montani
79346853aa Add debug-config command 2020-07-12 12:31:17 +02:00
Ines Montani
3a8632c3fb Hide command from public --help for now
Not sure we want this to be officially documented yet?
2020-07-11 19:21:22 +02:00
Ines Montani
5e683d03fe Allow extra args on pretrain and debug_data 2020-07-11 19:17:59 +02:00
Ines Montani
70abcca60e Update Thinc pin 2020-07-11 17:02:54 +02:00
Ines Montani
b7111da1d7 Update config and commands 2020-07-11 13:03:53 +02:00
Ines Montani
11bbc82c24 Update cli.md [ci skip] 2020-07-10 23:37:52 +02:00
Ines Montani
9e48ea48a1 Update Thinc pin 2020-07-10 23:34:57 +02:00
Ines Montani
f99ce7fbfb Make validation errors more elegant 2020-07-10 23:34:17 +02:00
Ines Montani
9455b060d2 Update cli.md 2020-07-10 22:57:22 +02:00
Ines Montani
7b5717cac3 Merge branch 'develop' into feature/refactor-config-args 2020-07-10 22:50:07 +02:00
Ines Montani
e6a6587a9a Update projects.md [ci skip] 2020-07-10 22:41:27 +02:00
Matthew Honnibal
743f7fb73a Set version to v3.0.0a4 2020-07-10 22:40:12 +02:00
Matthew Honnibal
b68216e263 Explicitly delete objects after parser.update to free GPU memory (#5748)
* Try explicitly deleting objects

* Refactor parser model backprop slightly

* Free parser data explicitly after rehearse and update
2020-07-10 22:35:20 +02:00
Ines Montani
f2cd982e7b Update training.md 2020-07-10 22:34:27 +02:00
Ines Montani
fb6f6f584e Replace - with _ in command names
We might as well be nice if a user accidentally types --training.use-gpu
2020-07-10 22:34:22 +02:00
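The normalization above means `--training.use-gpu` and `--training.use_gpu` resolve to the same setting. The sketch below is a minimal illustration, assuming overrides arrive as `--section.key value` or `--section.key=value` pairs; `parse_overrides` is a hypothetical helper, not spaCy's actual parser. Only the key is normalized, so values that contain `-` are left untouched.

```python
from typing import Dict, List


def parse_overrides(args: List[str]) -> Dict[str, str]:
    """Turn ["--training.use-gpu", "0"] into {"training.use_gpu": "0"}."""
    overrides: Dict[str, str] = {}
    i = 0
    while i < len(args):
        arg = args[i]
        if not arg.startswith("--"):
            raise ValueError(f"Unexpected argument: {arg}")
        key = arg[2:]
        if "=" in key:
            # Form: --section.key=value
            key, value = key.split("=", 1)
            i += 1
        else:
            # Form: --section.key value
            if i + 1 >= len(args):
                raise ValueError(f"Missing value for {arg}")
            value = args[i + 1]
            i += 2
        # Replace "-" with "_" in the key so both spellings match.
        overrides[key.replace("-", "_")] = value
    return overrides


print(parse_overrides(["--training.use-gpu", "0", "--training.max-epochs=5"]))
# {'training.use_gpu': '0', 'training.max_epochs': '5'}
```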
Ines Montani
bfa8e11ffa Update and auto-format 2020-07-10 20:52:00 +02:00
Ines Montani
0389c34b81 Merge branch 'develop' into feature/refactor-config-args 2020-07-10 20:51:52 +02:00
Ines Montani
931250e1f5 Fix pipeline component schema 2020-07-10 20:32:53 +02:00
Ines Montani
9fe1fa88ad Fix typo 2020-07-10 20:32:37 +02:00
Ines Montani
459c6aa8f0 Merge branch 'feature/refactor-config-args' of https://github.com/explosion/spaCy into feature/refactor-config-args 2020-07-10 20:01:28 +02:00
Ines Montani
defe1e7213 Pretty-print config validation errors 2020-07-10 20:01:20 +02:00
Matthew Honnibal
894f31226b Update config 2020-07-10 19:59:12 +02:00
Sofie Van Landeghem
de6a32315c debug-model script (#5749)
* adding debug-model to print the internals for debugging purposes

* extend debug-model script with 4 stages: before, init, train, predict

* avoid requiring a seed in the train script

* small fixes
2020-07-10 19:47:53 +02:00
Ines Montani
a3667394b4 Integrate with latest Thinc and config overrides 2020-07-10 19:47:05 +02:00
Ines Montani
5cfc3edcaa Update CLI tests 2020-07-10 18:21:01 +02:00
Ines Montani
3583ea84d8 Update arg parsing 2020-07-10 18:20:52 +02:00
Ines Montani
73332ddb67 Update CLI commands to use one shared util file 2020-07-10 17:57:40 +02:00
Ines Montani
240e0a62ca Update with WIP 2020-07-10 13:31:27 +02:00
Ines Montani
a60562f208 Update project CLI hashes, directories, skipping (#5741)
* Update project CLI hashes, directories, skipping

* Improve clone success message

* Remove unused context args

* Move project-specific utils to project utils

The hashing/checksum functions may not end up being general-purpose functions and are more designed for the projects, so they shouldn't live in spacy.util

* Improve run help and add workflows

* Add note re: directory checksum speed

* Fix cloning from subdirectories and output messages

* Remove hard-coded dirs
2020-07-09 23:51:18 +02:00
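The project utils mentioned above include hashing and checksum helpers, and one of the notes concerns directory checksum speed. The sketch below shows one common way to checksum a directory, assuming MD5 over each file's relative path and content. `dir_checksum` is a hypothetical name, and spaCy's own helpers may use a different layout or hash. Every file has to be read in full, which is why directory checksums get slow on large directories.

```python
import hashlib
import tempfile
from pathlib import Path


def dir_checksum(path: Path) -> str:
    """Return an MD5 hex digest for a single file or a whole directory."""
    path = Path(path)
    if path.is_file():
        return hashlib.md5(path.read_bytes()).hexdigest()
    if path.is_dir():
        digest = hashlib.md5()
        # Sorted order keeps the result deterministic across platforms.
        for sub in sorted(p for p in path.rglob("*") if p.is_file()):
            file_hash = hashlib.md5(sub.read_bytes()).hexdigest()
            rel = sub.relative_to(path).as_posix()
            # Include the relative path so renames also change the checksum.
            digest.update(f"{rel}:{file_hash}\n".encode("utf8"))
        return digest.hexdigest()
    raise ValueError(f"Can't get checksum for {path}: not a file or directory")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "corpus.txt").write_text("hello")
    before = dir_checksum(root)
    (root / "corpus.txt").write_text("hello world")
    print(before != dir_checksum(root))  # True
```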
Ines Montani
e624fcd5d9 Merge branch 'nightly.spacy.io' into develop 2020-07-09 23:26:26 +02:00