Adriane Boyd
b81a89f0a9
Update morphologizer (#5766)
...
* update `Morphologizer.begin_training` for use with `Example`
* make init and begin_training more consistent
* add `Morphology.normalize_features` to normalize outside of
`Morphology.add`
* make sure `get_loss` doesn't create unknown labels when the POS and
morph alignments differ
2020-07-19 11:10:51 +02:00
Sofie Van Landeghem
38b59d728d
Upgrade of UD eval script (#5776)
...
* new morph feature format
* add new languages with tokenization
* update with all new pretrained models
2020-07-19 11:10:31 +02:00
Adriane Boyd
7e14272096
Lower upper pin for cupy to 8.0.0 (#5773)
2020-07-19 11:10:11 +02:00
Adriane Boyd
cd5af72c9a
Update pkuseg version (#5774)
...
* Update pkuseg version in Chinese tokenizer warnings
* Update pkuseg version in `Makefile`
* Remove warning about python3.8 wheels in docs
2020-07-19 11:09:49 +02:00
Ines Montani
68fade8f76
Add Plausible [ci skip]
2020-07-19 00:02:29 +02:00
Ines Montani
73098dbaf6
Add Plausible
2020-07-18 23:53:27 +02:00
Ines Montani
6f4e4aceb3
Add Plausible [ci skip]
2020-07-18 23:50:29 +02:00
Adriane Boyd
50db3f0cdb
Serialize morph rules with tagger
...
Serialize `morph_rules` with the tagger alongside the `tag_map`.
Use `Morphology.load_tag_map` and `Morphology.load_morph_exceptions` to
load these settings rather than reinitializing the morphology each time
they are changed.
2020-07-17 08:22:21 +02:00
Adriane Boyd
d106cf66dd
Update Morphology to load exceptions as MORPH_RULES
...
Update `Morphology` to load exceptions in `Morphology.__init__` and
`Morphology.load_morph_exceptions` from the format used in `MORPH_RULES`
rather than the internal format with tuple keys.
* Rename `Morphology.exc` to `Morphology._exc` for internal use with
tuple keys
* Add `Morphology.exc` as a property that converts the internal `_exc`
back to `MORPH_RULES` format, primarily for serialization
2020-07-16 21:16:49 +02:00
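
The conversion described in the commit above, between a nested rules format and an internal format with tuple keys, can be sketched roughly as follows. This is only an illustration and not the actual `Morphology` code: the function names `flatten_exceptions` and `nest_exceptions` are hypothetical, and the assumed layouts (`{tag: {orth: attrs}}` for the nested form, `{(tag, orth): attrs}` for the internal form) are assumptions, not something the commit message confirms.

```python
def flatten_exceptions(morph_rules):
    """Hypothetical helper: {tag: {orth: attrs}} -> {(tag, orth): attrs}."""
    flat = {}
    for tag, rules in morph_rules.items():
        for orth, attrs in rules.items():
            # Copy the attrs so later edits don't mutate the input.
            flat[(tag, orth)] = dict(attrs)
    return flat


def nest_exceptions(flat_exc):
    """Hypothetical helper: {(tag, orth): attrs} -> {tag: {orth: attrs}}."""
    nested = {}
    for (tag, orth), attrs in flat_exc.items():
        nested.setdefault(tag, {})[orth] = dict(attrs)
    return nested


# Example data in the assumed nested layout.
rules = {"PRP": {"I": {"PronType": "Prs", "Person": "1"}}}
internal = flatten_exceptions(rules)
restored = nest_exceptions(internal)
```

Only string keys survive serialization formats such as JSON, which is one plausible reason to expose the nested form for serialization while keeping tuple keys for internal lookups.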
Adriane Boyd
d83e3c44c5
Remove corpus-specific morph rules
...
* Remove corpus-specific morph rules
* Add options similar to tag maps to provide them in the `train` and
`debug-data` CLIs
2020-07-15 19:44:18 +02:00
Adriane Boyd
2f981d5af1
Remove corpus-specific tag maps
...
Remove corpus-specific tag maps from the language data for languages
without custom tokenizers. For languages with custom word segmenters
that also provide tags (Japanese and Korean), the tag maps for the
custom tokenizers are kept as the default.
The default tag maps for languages without custom tokenizers are now the
default tag map from `lang/tag_map.py`, UPOS -> UPOS.
2020-07-15 15:58:29 +02:00
Adriane Boyd
5228920e2f
Clarify warning W030 for misaligned BILUO tags (#5761)
2020-07-14 14:09:48 +02:00
Adriane Boyd
a7a7e0d2a6
Add morph to morphology in Doc.from_array (#5762)
...
* Add morph to morphology in Doc.from_array
Add morphological analyses to morphology table in `Doc.from_array`.
* Use separate vocab in DocBin roundtrip test
2020-07-14 14:07:35 +02:00
Ines Montani
872938ec76
Merge pull request #5747 from explosion/feature/refactor-config-args
2020-07-14 00:00:22 +02:00
Sofie Van Landeghem
6f3bb6f77c
fix doc.to_utf8 on GPU (#5757)
2020-07-13 23:05:33 +02:00
Adriane Boyd
7ea2cc7650
Set version to 2.3.2 (#5756)
2020-07-13 14:55:56 +02:00
Mark Neumann
27a1cd3c63
fix meta serialization in train (#5751)
...
Co-authored-by: Mark Neumann <markng@allenai.org>
2020-07-12 22:06:46 +02:00
Ines Montani
dcfa910e4e
Merge pull request #5752 from explosion/compat/remove-object-subclass
2020-07-12 16:37:04 +02:00
Ines Montani
ed55143c0d
Merge branch 'develop' into compat/remove-object-subclass
2020-07-12 14:28:52 +02:00
Ines Montani
7906ddd56c
Fix test
2020-07-12 14:28:34 +02:00
Ines Montani
5f6f4ff594
Remove object subclassing
2020-07-12 14:03:23 +02:00
Ines Montani
c96535e338
Update command docstrings and docs
2020-07-12 13:53:49 +02:00
Ines Montani
0ab483037c
Make debug commands subcommands of spacy debug
...
Also handle backwards-compatibility so the old commands don't break
2020-07-12 13:53:41 +02:00
Ines Montani
3f948b9c74
Update docs
2020-07-12 12:32:28 +02:00
Ines Montani
8a67ddd6f1
Remove unused import
2020-07-12 12:32:24 +02:00
Ines Montani
d1d7fd5f5d
Don't use file paths in schemas
...
It should be possible to validate top-level config with file paths that don't exist
2020-07-12 12:32:08 +02:00
Ines Montani
79346853aa
Add debug-config command
2020-07-12 12:31:17 +02:00
Ines Montani
3a8632c3fb
Hide command from public --help for now
...
Not sure we want this to be officially documented yet?
2020-07-11 19:21:22 +02:00
Ines Montani
5e683d03fe
Allow extra args on pretrain and debug_data
2020-07-11 19:17:59 +02:00
Ines Montani
70abcca60e
Update Thinc pin
2020-07-11 17:02:54 +02:00
Ines Montani
b7111da1d7
Update config and commands
2020-07-11 13:03:53 +02:00
Ines Montani
11bbc82c24
Update cli.md [ci skip]
2020-07-10 23:37:52 +02:00
Ines Montani
9e48ea48a1
Update Thinc pin
2020-07-10 23:34:57 +02:00
Ines Montani
f99ce7fbfb
Make validation errors more elegant
2020-07-10 23:34:17 +02:00
Ines Montani
9455b060d2
Update cli.md
2020-07-10 22:57:22 +02:00
Ines Montani
7b5717cac3
Merge branch 'develop' into feature/refactor-config-args
2020-07-10 22:50:07 +02:00
Ines Montani
e6a6587a9a
Update projects.md [ci skip]
2020-07-10 22:41:27 +02:00
Matthew Honnibal
743f7fb73a
Set version to v3.0.0a4
2020-07-10 22:40:12 +02:00
Matthew Honnibal
b68216e263
Explicitly delete objects after parser.update to free GPU memory (#5748)
...
* Try explicitly deleting objects
* Refactor parser model backprop slightly
* Free parser data explicitly after rehearse and update
2020-07-10 22:35:20 +02:00
Ines Montani
f2cd982e7b
Update training.md
2020-07-10 22:34:27 +02:00
Ines Montani
fb6f6f584e
Replace - with _ in command names
...
We might as well be nice if the user accidentally types `--training.use-gpu`
2020-07-10 22:34:22 +02:00
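
The idea in the commit above, accepting `--training.use-gpu` as if it were `--training.use_gpu`, can be sketched like this. It is a minimal illustration, not the spaCy CLI code: the function name `normalize_override` is hypothetical, and the assumption is that only the option name (the part before any `=`) should be rewritten, never the value.

```python
def normalize_override(arg):
    """Hypothetical helper: replace '-' with '_' in a '--name' style option.

    Only the option name is touched; anything after '=' is left as-is,
    and arguments that don't start with '--' are returned unchanged.
    """
    if not arg.startswith("--"):
        return arg
    name, sep, value = arg[2:].partition("=")
    return "--" + name.replace("-", "_") + sep + value


# Example: a mistyped override and a plain value.
fixed = normalize_override("--training.use-gpu")
untouched = normalize_override("-1")
```

Leaving non-option arguments alone matters because values such as `-1` or paths containing hyphens would otherwise be corrupted.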
Ines Montani
bfa8e11ffa
Update and auto-format
2020-07-10 20:52:00 +02:00
Ines Montani
0389c34b81
Merge branch 'develop' into feature/refactor-config-args
2020-07-10 20:51:52 +02:00
Ines Montani
931250e1f5
Fix pipeline component schema
2020-07-10 20:32:53 +02:00
Ines Montani
9fe1fa88ad
Fix typo
2020-07-10 20:32:37 +02:00
Ines Montani
459c6aa8f0
Merge branch 'feature/refactor-config-args' of https://github.com/explosion/spaCy into feature/refactor-config-args
2020-07-10 20:01:28 +02:00
Ines Montani
defe1e7213
Pretty-print config validation errors
2020-07-10 20:01:20 +02:00
Matthew Honnibal
894f31226b
Update config
2020-07-10 19:59:12 +02:00
Sofie Van Landeghem
de6a32315c
debug-model script (#5749)
...
* add debug-model to print the internals for debugging purposes
* expand debug-model script with 4 stages: before, init, train, predict
* avoid requiring a seed in the train script
* small fixes
2020-07-10 19:47:53 +02:00
Ines Montani
a3667394b4
Integrate with latest Thinc and config overrides
2020-07-10 19:47:05 +02:00