Commit Graph

13124 Commits

Author SHA1 Message Date
Adriane Boyd
a119667a36
Clean up spacy.tokens (#6046)
* Clean up spacy.tokens

* Update `set_children_from_heads`:
  * Don't check `dep` when setting lr_* or sentence starts
  * Set all non-sentence starts to `False`

* Use `set_children_from_heads` in `Token.head` setter
  * Reduce similar/duplicate code (admittedly adds a bit of overhead)
  * Update sentence starts consistently

* Remove unused `Doc.set_parse`

* Minor changes:
  * Declare cython variables (to avoid cython warnings)
  * Clean up imports

* Modify set_children_from_heads to set token range

Modify `set_children_from_heads` so that it adjusts tokens within a
specified range rather than the whole document.

Modify the `Token.head` setter to adjust only the tokens affected by the
new head assignment.
2020-09-16 20:32:38 +02:00
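
The range-limited update described in a119667a36 can be illustrated with a minimal pure-Python sketch. This is not spaCy's Cython implementation: the function name `count_children_in_range`, the use of absolute head indices, and the choice to skip heads outside the range are all assumptions made for illustration.

```python
from typing import List, Tuple


def count_children_in_range(
    heads: List[int], start: int, end: int
) -> Tuple[List[int], List[int]]:
    """Count left and right children for tokens in [start, end).

    Hypothetical helper: `heads[i]` is the absolute index of token i's
    head, and a root token points at itself. Tokens outside the range
    are not visited, which is the point of a range-limited update.
    """
    n_left = [0] * len(heads)
    n_right = [0] * len(heads)
    for child in range(start, end):
        head = heads[child]
        if head == child:
            continue  # root token: it has no head to update
        if not (start <= head < end):
            continue  # assumption: heads outside the range are left alone
        if child < head:
            n_left[head] += 1  # child precedes its head
        else:
            n_right[head] += 1  # child follows its head
    return n_left, n_right


# Two "sentences": tokens 0-2 are headed by token 1, tokens 3-4 by token 4.
heads = [1, 1, 1, 4, 4]
left_partial, right_partial = count_children_in_range(heads, 0, 3)
left_full, right_full = count_children_in_range(heads, 0, len(heads))
```

Restricting the loop to `range(start, end)` keeps the cost proportional to the size of the affected span instead of the whole document, which is the motivation the commit message gives for the `Token.head` setter change.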
Matthew Honnibal
c776594ab1 Fix 2020-09-16 18:15:14 +02:00
Matthew Honnibal
4a573d18b3 Add comment 2020-09-16 17:51:29 +02:00
Matthew Honnibal
d31afc8334 Fix Language.link_components when model is None 2020-09-16 17:49:48 +02:00
Adriane Boyd
f3db3f6fe0
Add vectors option to CharacterEmbed (#6069)
* Add vectors option to CharacterEmbed

* Update spacy/pipeline/morphologizer.pyx

* Adjust default morphologizer config

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-09-16 17:45:04 +02:00
Adriane Boyd
d722a439aa
Remove unneeded methods in senter and morphologizer (#6074)
Now that the tagger doesn't manage the tag map, the child classes senter
and morphologizer don't need to override the serialization methods.
2020-09-16 17:39:41 +02:00
Adriane Boyd
87c329c711
Set rule-based lemmatizers as default (#6076)
For languages without provided models and with lemmatizer rules in
`spacy-lookups-data`, make the rule-based lemmatizer the default:
Bengali, Persian, Norwegian, Swedish
2020-09-16 17:37:29 +02:00
svlandeg
0dc914b667 bump thinc to 8.0.0a33 2020-09-16 16:42:58 +02:00
svlandeg
1040e250d8 actual commit with test for custom readers with ml_datasets >= 0.2 2020-09-16 16:41:28 +02:00
svlandeg
714a5a05c6 test for custom readers with ml_datasets >= 0.2 2020-09-16 16:39:55 +02:00
svlandeg
0d1392340f Merge remote-tracking branch 'upstream/develop' into fix/corpus 2020-09-15 23:17:08 +02:00
Ines Montani
4d75040546
Merge pull request #6072 from svlandeg/bugfix/ExceptionInfo
Fix unit test with ExceptionInfo
2020-09-15 22:52:48 +02:00
svlandeg
f420aa1138 use e.value to get to the ExceptionInfo value 2020-09-15 22:30:09 +02:00
svlandeg
55f8d5478e fix example output 2020-09-15 22:09:30 +02:00
svlandeg
7336657662 corpus is a Dict 2020-09-15 22:07:16 +02:00
svlandeg
51fa929f47 rewrite train_corpus to corpus.train in config 2020-09-15 21:58:04 +02:00
svlandeg
bd87e8686e move tests to correct subdir 2020-09-15 21:40:38 +02:00
Ines Montani
aaf01689a1 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-15 14:24:42 +02:00
Ines Montani
91a6637f74 Remove extra pipe config values before merging 2020-09-15 14:24:17 +02:00
Ines Montani
d3d7f92f05 Fix lang check and error handling in Language.from_config 2020-09-15 14:24:06 +02:00
Ines Montani
2ed6e2a218 Auto-format 2020-09-15 14:20:04 +02:00
Ines Montani
2214d1bb7b
Merge pull request #6067 from explosion/feature/spacy-blank-from-config 2020-09-15 14:18:33 +02:00
Ines Montani
a3d24b02db
Merge pull request #6068 from svlandeg/fix/wandb
fix W&B logger
2020-09-15 14:00:58 +02:00
Matthew Honnibal
46e04d12db Fix make 2020-09-15 13:36:26 +02:00
Ines Montani
253ba5ef14 Raise for bad Vocab values 2020-09-15 13:25:34 +02:00
svlandeg
7677e5c0e2 fix wandb logger when calling multiple times from same script 2020-09-15 12:56:33 +02:00
Ines Montani
b7faa38960 Update docs [ci skip] 2020-09-15 12:44:03 +02:00
Matthew Honnibal
0f0870d45e Avoid baking '-m spacy' into the pex by default 2020-09-15 12:35:33 +02:00
Ines Montani
0edd695bf6 Update docs 2020-09-15 11:41:49 +02:00
Ines Montani
eff9406718 Support vocab arg in spacy.blank 2020-09-15 11:39:36 +02:00
Ines Montani
99549a5ace Fix consistency and update docs 2020-09-15 11:37:37 +02:00
Ines Montani
7dfc4bc062 Allow overriding meta from spacy.blank 2020-09-15 11:12:12 +02:00
Ines Montani
0f943157af Delegate to Language.from_config in spacy.blank 2020-09-15 11:07:55 +02:00
Ines Montani
e977086a9a Update default pretraining config [ci skip] 2020-09-15 01:12:02 +02:00
Ines Montani
154752f9c2 Update docs and consistency [ci skip] 2020-09-15 00:32:49 +02:00
Ines Montani
9cc304c194
Merge pull request #6064 from explosion/fix/sparse-checkout-ux
Fix sparse checkout and error handling
2020-09-15 00:32:20 +02:00
Matthew Honnibal
475323cd36 Set version to v3.0.0a18 2020-09-14 22:05:43 +02:00
Matthew Honnibal
e8378b57bc Fix test 2020-09-14 21:21:13 +02:00
Matthew Honnibal
adf0bab23a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-14 21:04:49 +02:00
Matthew Honnibal
ae15fa9688 Fix iob converter 2020-09-14 21:02:18 +02:00
Sofie Van Landeghem
3216a33149
positive_label config for textcat (#6062)
* hook up positive_label in textcat

* unit tests

* documentation

* formatting

* tests

* fix typo

* move verify_config to after begin_training

* revert accidental commit
2020-09-14 17:08:00 +02:00
Ines Montani
c052017025 Fix sparse checkout and error handling 2020-09-14 14:12:58 +02:00
Ines Montani
b854e0bef9 Update styleguide [ci skip] 2020-09-14 11:25:57 +02:00
Ines Montani
9afb1d9965
Merge pull request #6063 from svlandeg/feature/doc_cleanup [ci skip] 2020-09-14 10:35:43 +02:00
Ines Montani
35156429c4 Update docs [ci skip] 2020-09-14 10:34:50 +02:00
Ines Montani
80754d7065 Update README.md [ci skip] 2020-09-14 10:29:06 +02:00
Matthew Honnibal
fdd2340f6c Set version to v3.0.0a17 2020-09-13 23:52:03 +02:00
Ines Montani
081413f210 Update docs [ci skip] 2020-09-13 23:46:51 +02:00
Ines Montani
85e5910102 Update docs [ci skip] 2020-09-13 23:09:19 +02:00
Ines Montani
5ebb2a2ac8 Update docs [ci skip] 2020-09-13 22:36:20 +02:00