Commit Graph

7658 Commits

Author SHA1 Message Date
Ines Montani
34146750d4 Use frozen list with custom errors
We don't want to break backwards compatibility too much, but we also want to provide the best possible UX
2020-08-29 15:20:11 +02:00
Ines Montani
744f432420 Merge pull request #5994 from explosion/feature/idempotent-component-decorator 2020-08-29 13:17:13 +02:00
Ines Montani
5de3f8604d Update spacy/util.py
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-08-29 13:17:06 +02:00
Ines Montani
091a9b522a Remove unused variable [ci skip] 2020-08-29 13:11:26 +02:00
Ines Montani
2bc31e15c9 Tidy up and auto-format [ci skip] 2020-08-29 13:01:10 +02:00
Ines Montani
6520d1a1df Work around set order in Language.disabled 2020-08-29 12:58:22 +02:00
Ines Montani
f45095a666 Merge pull request #5995 from adrianeboyd/bugfix/attribute-ruler-bugfixes 2020-08-29 12:38:30 +02:00
Ines Montani
e0b4984aa4 Make deprecated disable_pipes call into select_pipes 2020-08-29 12:08:46 +02:00
Ines Montani
15d73f4dc3 Make user-facing Language.disabled return list
More consistent with all the other properties
2020-08-29 12:08:33 +02:00
Matthew Honnibal
58f19421b1 Return empty batch from tok2vec listener if no doc.tensor 2020-08-29 03:46:50 +02:00
svlandeg
5230529de2 add loggers registry & logger docs sections 2020-08-28 21:44:04 +02:00
Ines Montani
0687d7148e Rename user-facing API 2020-08-28 21:04:02 +02:00
Adriane Boyd
0104bd1600 Sort the AttributeRuler matches by rule order
Sort the returned matches by rule order (the `match_id`) so that the
rules are applied in the order they were added. This is necessary, for
instance, if the `AttributeRuler` is used for the tag map and later
rules require POS tags.
2020-08-28 21:01:06 +02:00
Ines Montani
6a999c9303 Remove outdated component attr check 2020-08-28 20:59:19 +02:00
Adriane Boyd
8674b17651 Serialize AttributeRuler.patterns
Serialize `AttributeRuler.patterns` instead of the individual lists to
simplify the serialized data and so that patterns are reloaded exactly as
they were originally provided (preserving `_attrs_unnormed`).
2020-08-28 20:44:45 +02:00
Ines Montani
10da74382f Raise if disabled components are removed before DisabledPipes.restore 2020-08-28 20:35:26 +02:00
Ines Montani
1e0363290e Remove todos and update docstrings 2020-08-28 20:34:46 +02:00
Ines Montani
cad988da7f Allow component decorators to re-run with same function 2020-08-28 16:27:22 +02:00
Ines Montani
3ce5be4b76 Allow loaded but disabled components 2020-08-28 15:20:14 +02:00
Ines Montani
89f692bc8a Merge pull request #5992 from svlandeg/feature/wandb-restrict-config 2020-08-28 15:05:29 +02:00
Ines Montani
9c4049b57f Merge pull request #5986 from explosion/fix/language-config-interpolate-disk-bytes 2020-08-28 15:03:52 +02:00
Ines Montani
adc050cdc5 Fix code style in test [ci skip] 2020-08-28 15:03:21 +02:00
svlandeg
05a1bafa15 fix type 2020-08-28 14:08:33 +02:00
svlandeg
33883aa764 rename field 2020-08-28 14:06:23 +02:00
svlandeg
1d8c4070aa add disable_fields to wandb_logger 2020-08-28 13:55:32 +02:00
Ines Montani
a51b4f3a19 Merge branch 'develop' into fix/language-config-interpolate-disk-bytes 2020-08-28 13:21:17 +02:00
Ines Montani
03dde511b4 Merge pull request #5987 from explosion/feature/debug-config [ci skip] 2020-08-28 11:30:18 +02:00
Ines Montani
62e9967228 Merge branch 'develop' into fix/language-config-interpolate-disk-bytes 2020-08-28 11:19:36 +02:00
Ines Montani
4ca2698f85 Merge branch 'develop' into feature/debug-config 2020-08-28 11:19:17 +02:00
svlandeg
9a8255ffd5 two tests because of different exit type 2020-08-28 10:50:26 +02:00
svlandeg
73baaf330a update error type 2020-08-28 10:46:21 +02:00
Matthew Honnibal
c558ca4485 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-27 19:47:26 +02:00
Matthew Honnibal
d3ffe4ca63 Fix error when tagger was initialized with no labels 2020-08-27 18:56:58 +02:00
Ines Montani
d1780db6a4 Tidy up and use different error [ci skip] 2020-08-27 18:56:55 +02:00
Ines Montani
ff4175e839 Add more info to debug config 2020-08-27 18:17:58 +02:00
Ines Montani
daac8ebacd Don't interpolate config on Language deserialization 2020-08-27 16:44:36 +02:00
Matthew Honnibal
e1e1760fd6 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-27 03:22:11 +02:00
Matthew Honnibal
95adb58f15 Force tagger to pass batch of docs into model in begin_training 2020-08-27 03:21:03 +02:00
Ines Montani
cdc114e212 Merge pull request #5977 from explosion/refactor/vector-names 2020-08-26 19:03:16 +02:00
Ines Montani
8692d176f6 Merge pull request #5978 from explosion/feature/update-wasabi
Update wasabi: new diff_strings and MarkdownRenderer
2020-08-26 19:02:52 +02:00
Matthew Honnibal
9b22714a4e Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-26 15:48:45 +02:00
Matthew Honnibal
172af24f95 Fix upload and download 2020-08-26 15:48:23 +02:00
Ines Montani
a5fff1df51 Remove outdated non-empty output dir warning [ci skip] 2020-08-26 15:45:51 +02:00
Matthew Honnibal
2d520d3b45 Remove unused error 2020-08-26 15:41:14 +02:00
Adriane Boyd
90d88729e0 Add AttributeRuler.score (#5963)
* Add AttributeRuler.score

Add scoring for TAG / POS / MORPH / LEMMA if these are present in the
assigned token attributes.

Add default score weights (that don't really make a lot of sense) so
that the scores are in the default config in some form.

* Update docs
2020-08-26 15:39:30 +02:00
Ines Montani
3aec98ca38 Update wasabi: new diff_strings and MarkdownRenderer 2020-08-26 15:33:11 +02:00
Sofie Van Landeghem
79d460e3a2 Weights & Biases logger for train CLI (#5971)
* quick test as part of train script

* train_logger in config, default ConsoleLogger in loggers catalogue

* entity typo

* add wandb_logger

* cleanup

* Update spacy/cli/train_logger.py

Co-authored-by: Ines Montani <ines@ines.io>

* move loggers to gold.loggers

Co-authored-by: Ines Montani <ines@ines.io>
2020-08-26 15:24:33 +02:00
Ines Montani
0997c30b9e Merge pull request #5974 from explosion/feature/project-document 2020-08-26 15:14:13 +02:00
Matthew Honnibal
191fb4144f Merge branch 'develop' into refactor/vector-names 2020-08-26 14:26:45 +02:00
Ines Montani
627617a079 Tidy up and add docs [ci skip] 2020-08-26 13:24:55 +02:00
Adriane Boyd
43c61da209 Set macro AUC score in Scorer.score_cats 2020-08-26 10:49:30 +02:00
Ines Montani
aeebc6678d Small cleanup and adjustments 2020-08-26 10:26:57 +02:00
Ines Montani
31567d1e42 Link project.yml 2020-08-26 10:26:32 +02:00
Ines Montani
6c2a5ff53b Auto-link local sources 2020-08-26 10:26:06 +02:00
Matthew Honnibal
77852d2428 Fix run_command for python 3.6 2020-08-26 05:02:43 +02:00
Matthew Honnibal
884cac5fb5 Make run_command backwards compatible 2020-08-26 04:33:42 +02:00
Matthew Honnibal
6547472347 Set version to v3.0.0a12 2020-08-26 04:02:34 +02:00
Adriane Boyd
7d7b65ffd4 Fix raw strings in URL pattern (#5972)
Add missing raw string specifiers.
2020-08-26 04:00:49 +02:00
Matthew Honnibal
2771e4f2b3 Fix the git "sparse checkout" functionality (#5973)
* Fix the git sparse checkout functionality

* Format
2020-08-26 04:00:14 +02:00
Ines Montani
1c958a76c1 Add comment markers to only replace auto-generated docs 2020-08-26 00:03:06 +02:00
Ines Montani
f10989e8c4 Add "project document" and more project.yml meta fields 2020-08-25 17:14:27 +02:00
Ines Montani
fdcaf86c54 Adjust docstring
End sentence earlier so it's shown as a full sentence in --help
2020-08-25 17:13:50 +02:00
Ines Montani
b89f6fa011 Fix meta defaults and error in package command 2020-08-25 17:13:33 +02:00
Ines Montani
94705c21c8 Allow reuse on validators to prevent reload error
Otherwise this will cause an error if spaCy is live reloaded, e.g. in Streamlit
2020-08-25 17:13:11 +02:00
Matthew Honnibal
4f82a02b70 Remove 'fix_pretrained_vectors_name' hack 2020-08-25 14:37:45 +02:00
Adriane Boyd
0bab7c8b91 Remove PRON_LEMMA symbol (#5968) 2020-08-25 14:21:29 +02:00
Hiroshi Matsuda
332803eda9 fix ja leading spaces (#5969)
* change condition for space after

* add NAUGHTY_STRINGS test example
2020-08-25 14:16:24 +02:00
Ines Montani
dd84577a98 Update CLI utils, project.yml schema and add test 2020-08-25 11:54:53 +02:00
Shashank
450720aca2 Added support for Sanskrit language (#5956)
* Added support for Sanskrit language

* Added tests for lexical attribute like_num
2020-08-25 10:56:29 +02:00
Matthew Honnibal
ef43152af4 Update scorer 2020-08-25 02:42:47 +02:00
Matthew Honnibal
8d6e1ce306 Update v3.0.0a11 2020-08-25 00:32:08 +02:00
Matthew Honnibal
8038b87f04 Various small tweaks to project CLI (#5965)
* Fix up/download of http and local paths

* Support git_sparse_checkout for assets

* Fix scorer

* Handle already-present directories for git assets

* Improve convert command

* Fix support for existing files in git assets

* Support branches in git sparse checkout

* Format

* Fix git assets

* Document git block in assets

* Fix test

* Fix test

* Revert "Fix test"

This reverts commit cf3097260f.

* Revert "Fix test"

This reverts commit 964d636e27.

* Don't multiply p/r/f by 100

* Display scores * 100 during training
2020-08-25 00:30:52 +02:00
Adriane Boyd
abd3f2b65a Rename Polish lemmatizer method (#5960)
Rename Polish lemmatizer method to `pos_lookup` to distinguish it from
pure token-based lookup methods.
2020-08-25 00:22:27 +02:00
Ines Montani
e12b03358b Support removing extra values in fill-config (#5966)
* Support removing extra values in fill-config

* Fix test
2020-08-24 22:53:47 +02:00
Matthew Honnibal
f232d8db96 Report p/r/f out of 100 2020-08-24 17:17:23 +02:00
Ines Montani
0e7f99da58 Fix handling of optional [pretraining] block (#5954)
* Fix handling of optional [pretraining] block

* Remove pretraining from default config

* Fix test

* Add schema option for empty pretrain block
2020-08-24 15:56:03 +02:00
idoshr
b10c7bc56e Hebrew like num (#5952)
* Update stop_words.py

Hebrew STOP WORDS

* Update stop_words.py

* contributor

* contributor

* add some common domain extensions
support human number 1K/1M....

* support human number 1K/1M....

* Hebrew number tokenization
1K/1M implemented in EN

* test human tokenize fix

* test

* heb like num
revert human number change

* heb like num
2020-08-24 14:30:05 +02:00
Matthew Honnibal
64df37643f Update lockfile after project pull 2020-08-24 03:27:09 +02:00
Matthew Honnibal
588c28fe45 Fix project pull when deps missing 2020-08-24 01:23:36 +02:00
Matthew Honnibal
001546c19e Set version to v3.0.0a10 2020-08-23 21:15:38 +02:00
Matthew Honnibal
160a855246 Format 2020-08-23 21:15:12 +02:00
Matthew Honnibal
89f5b8abb3 Fix project push 2020-08-23 21:14:44 +02:00
Matthew Honnibal
3828bc3ed0 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-23 18:32:24 +02:00
Matthew Honnibal
e559867605 Allow spacy project to push and pull to/from remote storage (#5949)
* Add utils for working with remote storage

* WIP add remote_cache for project

* WIP add push and pull commands

* Use pathy in remote_cache

* Update util

* Update remote_cache

* Update util

* Update project assets

* Update pull script

* Update push script

* Fix type annotation in util

* Work on remote storage

* Remove site and env hash

* Fix imports

* Fix type annotation

* Require pathy

* Require pathy

* Fix import

* Add a util to handle project variable substitution

* Import push and pull commands

* Fix pull command

* Fix push command

* Fix tarfile in remote_storage

* Improve printing

* Fiddle with status messages

* Set version to v3.0.0a9

* Draft docs for spacy project remote storages

* Update docs [ci skip]

* Use Thinc config to simplify and unify template variables

* Auto-format

* Don't import Pathy globally for now

Causes a slow and annoying Google Cloud warning

* Tidy up test

* Tidy up and update tests

* Update to latest Thinc

* Update docs

* variables -> vars

* Update docs [ci skip]

* Update docs [ci skip]

Co-authored-by: Ines Montani <ines@ines.io>
2020-08-23 18:32:09 +02:00
Matthew Honnibal
fe1cf7e124 Allow score_weights to list extra scores 2020-08-23 18:31:30 +02:00
Ines Montani
9bdc9e81f5 Fix error message [ci skip] 2020-08-23 12:14:02 +02:00
Sofie Van Landeghem
56eabcb2f2 Adding num_like test for Czech (#5946)
* Create lex_attrs.py

Hello,

spaCy is missing the Czech language, so I would like to help push it along a little. This file is based on the other lex_attrs.py files, just translated into Czech.

* Update __init__.py

Updated for use with the new Czech lex_attrs file

* Update stop_words.py

* Create test_text.py

* add like_num testing for Czech

Co-authored-by: holubvl3 <47881982+holubvl3@users.noreply.github.com>
Co-authored-by: holubvl3 <vilemrousi@gmail.com>
Co-authored-by: Vladimír Holubec <vholubec@arcdata.cz>
2020-08-21 17:06:33 +02:00
holubvl3
a341b4ef09 Adding support for Czech language (#5826)
* Create lex_attrs.py

Hello,

spaCy is missing the Czech language, so I would like to help push it along a little. This file is based on the other lex_attrs.py files, just translated into Czech.

* Update __init__.py

Updated for use with the new Czech lex_attrs file

* Update stop_words.py

* Create test_text.py

Co-authored-by: Vladimír Holubec <vholubec@arcdata.cz>
2020-08-21 16:17:53 +02:00
svlandeg
af36d77d01 fix typo in docstring 2020-08-21 15:56:03 +02:00
svlandeg
3060e4ae65 Merge remote-tracking branch 'upstream/develop' into feature/docs-docs-docs
# Conflicts:
#	website/src/widgets/quickstart-training-generator.js
2020-08-21 15:16:30 +02:00
svlandeg
cc926267f8 small fixes 2020-08-21 15:05:40 +02:00
Ines Montani
aa6a7cd6e7 Update docs and consistency [ci skip] 2020-08-21 13:49:18 +02:00
Ines Montani
3826cfb8fe Merge pull request #5930 from svlandeg/feature/init-config-fix
UX for init config
2020-08-21 12:06:33 +02:00
Ines Montani
79af7dcd6d Small wording adjustments [ci skip] 2020-08-21 12:06:19 +02:00
Ines Montani
e60442d83a Adjust label casing in displaCy NER visualizer (resolves #4866)
- Accept any case for label names in the ents and colors options, even if the actual predicted label uses different casing
- Don't apply `text-transform: uppercase` visually, in case it's important to users that the label is represented as-is in the UI
2020-08-21 11:51:31 +02:00
Matthew Honnibal
c356e62908 Minor adjustments to quickstart template 2020-08-21 00:10:21 +02:00
Ines Montani
6ad59d59fe Merge branch 'develop' of https://github.com/explosion/spaCy into develop [ci skip] 2020-08-20 11:20:58 +02:00
Sofie Van Landeghem
071c09ff35 add coding (#5942) 2020-08-20 11:08:38 +02:00
Ines Montani
ea6640ea72 Merge pull request #5939 from explosion/feature/thinc-v8.0.0a28
Update Thinc and config variables
2020-08-19 21:14:36 +02:00
Ines Montani
3dd390b1a1 Update Thinc and config variables 2020-08-19 19:46:12 +02:00