Commit Graph

5762 Commits

Author SHA1 Message Date
Matthew Honnibal
b0b990e405 Fix token.conjuncts (closes #795) (#3392)
* Implement conjuncts method

* Add span.conjuncts property

* Un-xfail token.conjuncts tests

* Update docs for token.conjuncts and span.conjuncts

* Fix merge error in token.conjuncts
2019-03-11 17:05:45 +01:00
Matthew Honnibal
e2b9b523ce Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2019-03-11 15:59:28 +01:00
Ines Montani
47e9c274ef Tidy up property code style (#3391)
Use the decorator syntax for properties that only have a getter, and the existing syntax for properties that have both a getter and a setter
2019-03-11 15:59:09 +01:00
Matthew Honnibal
db79a704bf Add xfail tests for token.conjuncts 2019-03-11 15:46:52 +01:00
Ines Montani
c3df4d1108 Move displaCy tests to own file 2019-03-11 15:28:34 +01:00
Ines Montani
c5a407e95a Fix code style 2019-03-11 15:28:22 +01:00
Matthew Honnibal
39a4741e26 Add support for vocab.writing_system property (#3390)
* Add xfail test for vocab.writing_system

* Add vocab.writing_system property

* Set Language.Defaults.writing_system

* Set default writing system

* Remove xfail on test_vocab_writing_system
2019-03-11 15:23:20 +01:00
Matthew Honnibal
05ef0a5abb Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2019-03-11 14:33:15 +01:00
Ines Montani
ee4f312e89 Add writing_system to ArabicDefaults (experimental) 2019-03-11 14:22:23 +01:00
Ines Montani
ebcf2bb1c3 Add Doc.lang and Doc.lang_ 2019-03-11 14:21:40 +01:00
Ines Montani
ef80cfde6f Fix pickling of Japanese (closes #3191) 2019-03-11 13:34:23 +01:00
Ines Montani
c399162a82 Tidy up 2019-03-11 13:34:14 +01:00
Ines Montani
7c05ca01e8 💫 Support mutable default values for extension attributes (#3389)
* Support mutable default values in extensions

* Update documentation
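
The commit does not say how mutable defaults are supported. As a minimal sketch of one possible approach, assuming the default is copied for each object on first access, the hypothetical `ExtensionRegistry` below (not spaCy's actual extension machinery) keeps a list default from being shared between objects:

```python
import copy


class ExtensionRegistry:
    """Hypothetical registry used only for illustration."""

    def __init__(self):
        self._defaults = {}  # attribute name -> default value
        self._values = {}    # (object id, attribute name) -> per-object value

    def set_extension(self, name, default):
        self._defaults[name] = default

    def get(self, obj_id, name):
        key = (obj_id, name)
        if key not in self._values:
            # Copy the default so that mutating one object's value
            # never changes the shared default or another object's value.
            self._values[key] = copy.deepcopy(self._defaults[name])
        return self._values[key]


registry = ExtensionRegistry()
registry.set_extension("notes", default=[])
registry.get("doc1", "notes").append("checked")
```

In this sketch, `doc1` ends up with its own list, while a later `registry.get("doc2", "notes")` still starts from an empty one.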
2019-03-11 12:50:44 +01:00
Matthew Honnibal
4e8a07c7d3 Set version to v2.1.0a11 2019-03-11 10:45:06 +01:00
Matthew Honnibal
80b94313b6 💫 Fix interaction of lemmatizer and tokenizer exceptions (#3388)
Closes #2203. Closes #3268.

Lemmas set from outside the `Morphology` class were being overwritten. The result was especially confusing when deserialising, as it meant some lemmas could change when storing and retrieving a `Doc` object.

This PR applies two fixes:

1) When we go to set the lemma in the `Morphology` class, first check whether a lemma is already set. If so, don't overwrite.
2) When we load with `doc.from_array()`, take care to apply the `TAG` field first. This allows other fields to overwrite the `TAG`-implied properties if they're provided explicitly (e.g. the `LEMMA`).
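
The two fixes can be sketched in plain Python. This is an illustration only: `TAG_LEMMAS`, `assign_tag_lemma` and `from_fields` are hypothetical names, tokens are plain dicts, and none of it mirrors the real `Morphology` or `doc.from_array()` code.

```python
# Hypothetical table of lemmas implied by a (tag, text) pair.
TAG_LEMMAS = {"VBD": {"was": "be"}}


def assign_tag_lemma(token):
    """Fix 1: only set the tag-implied lemma if no lemma is set yet."""
    if token.get("lemma"):
        return
    implied = TAG_LEMMAS.get(token.get("tag"), {}).get(token["text"])
    if implied is not None:
        token["lemma"] = implied


def from_fields(text, fields):
    """Fix 2: apply the tag first, then let explicit fields overwrite."""
    token = {"text": text}
    if "tag" in fields:
        token["tag"] = fields["tag"]
        assign_tag_lemma(token)
    for name, value in fields.items():
        if name != "tag":
            token[name] = value
    return token
```

With this ordering, an explicitly provided lemma survives a round trip regardless of where it appears among the fields, and the tag-implied lemma is used only as a fallback.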

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-11 01:31:21 +01:00
Matthew Honnibal
04ca710da7 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2019-03-11 01:07:34 +01:00
Matthew Honnibal
5d25ee52fb Fix English tag map 2019-03-11 01:06:02 +01:00
Ines Montani
8f45ff3dc2 Adjust formatting [ci skip] 2019-03-11 00:47:41 +01:00
Matthew Honnibal
7503e1e505 Improve English tag map. Re #593, #3311 2019-03-10 23:50:00 +01:00
Matthew Honnibal
98acf5ffe4 💫 Allow passing of config parameters to specific pipeline components (#3386)
* Add component_cfg kwarg to begin_training

* Document component_cfg arg to begin_training

* Update docs and auto-format

* Support component_cfg across Language

* Format

* Update docs and docstrings [ci skip]

* Fix begin_training
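
As a rough sketch of the idea, assuming `component_cfg` maps a component name to the keyword arguments meant for that component only, the routing could look like the following. The `begin_training` function and the initializers here are hypothetical stand-ins, not spaCy's `Language.begin_training`.

```python
def begin_training(pipeline, component_cfg=None):
    """Call each component's initializer with its own config, if any."""
    component_cfg = component_cfg or {}
    results = {}
    for name, init in pipeline:
        # Components without an entry receive no extra keyword arguments.
        results[name] = init(**component_cfg.get(name, {}))
    return results


def init_tagger(width=64):
    return {"width": width}


def init_parser(beam_width=1):
    return {"beam_width": beam_width}


pipeline = [("tagger", init_tagger), ("parser", init_parser)]
settings = begin_training(pipeline, component_cfg={"parser": {"beam_width": 8}})
```

Only the parser receives `beam_width=8`; the tagger keeps its own defaults.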
2019-03-10 23:36:47 +01:00
Ines Montani
c998cde7e2 Auto-format [ci skip] 2019-03-10 19:22:59 +01:00
Ines Montani
7ba3a5d95c 💫 Make serialization methods consistent (#3385)
* Make serialization methods consistent

Use an `exclude` keyword argument instead of arbitrarily named keyword arguments, and add deprecation handling

* Update docs and add section on serialization fields
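
A minimal sketch of the `exclude` pattern, assuming the argument is a list of field names to leave out. The `serialize` helper and the field names are hypothetical, and JSON is used only to keep the example self-contained.

```python
import json


def serialize(fields, exclude=()):
    """Serialize every field whose name is not listed in `exclude`."""
    kept = {name: value for name, value in fields.items() if name not in exclude}
    return json.dumps(kept, sort_keys=True)


data = {"vocab": ["apple", "pear"], "meta": {"lang": "en"}}
without_vocab = serialize(data, exclude=["vocab"])
```

A single `exclude` list keeps the call signature identical across every object that can be serialized, which is the consistency this change aims for.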
2019-03-10 19:16:45 +01:00
Ines Montani
67e38690d4 Un-xfail passing tests and tidy up 2019-03-10 18:42:16 +01:00
Matthew Honnibal
27dd820753 Fix vocab deserialization when loading already present lexemes (#3383)
* Fix vocab deserialization bug. Closes #2153

* Un-xfail test for #2153
2019-03-10 17:21:19 +01:00
Matthew Honnibal
d6eaa71afc Handle scalar values in doc.from_array() 2019-03-10 16:54:03 +01:00
Matthew Honnibal
61e5ce02a4 Add xfailing test for #2153 2019-03-10 16:36:29 +01:00
Matthew Honnibal
7461e5e055 Fix batch bug in issue #3344 2019-03-10 16:01:34 +01:00
Matthew Honnibal
8a6272f842 Un-xfail test 2019-03-10 15:51:15 +01:00
Matthew Honnibal
4e80fc41ad Make doc.from_array() consistent with doc.to_array(). Closes #3382 2019-03-10 15:50:48 +01:00
Ines Montani
0426689db8 💫 Improve Doc.to_json and add Doc.is_nered (#3381)
* Use default return instead of else

* Add Doc.is_nered to indicate if entities have been set

* Add properties in Doc.to_json if they were set, not if they're available

This way, if a processed Doc exports "pos": None, it means that the tag was explicitly unset. If it exports "ents": [], it means that entity annotations are available but that this document doesn't contain any entities. Before, this would have been unclear and problematic for training.
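
The distinction between "set" and "available" can be sketched as follows, assuming annotations are tracked in a dict that only contains keys that were explicitly set. The `to_json` function here is a hypothetical simplification, not the real `Doc.to_json`.

```python
def to_json(text, annotations):
    """Export a key whenever it was set, even if its value is None or empty."""
    data = {"text": text}
    for key in ("pos", "ents"):
        if key in annotations:
            data[key] = annotations[key]
    return data


never_annotated = to_json("Hello world", {})
no_entities_found = to_json("Hello world", {"ents": []})
```

In this sketch, the first result has no `"ents"` key at all, while the second carries `"ents": []`, so a consumer can tell "not annotated" apart from "annotated, but empty".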
2019-03-10 15:24:34 +01:00
Ines Montani
7984543953 Add xfailing test for to_array/from_array string attrs 2019-03-10 15:08:15 +01:00
Ines Montani
6bbf4ea309 Simplify tests and avoid tokenizing 2019-03-10 15:05:56 +01:00
Matthew Honnibal
a5b1f6dcec Fix NER when preset entities cross sentence boundaries (#3379)
💫 Fix NER when preset entities cross sentence boundaries
2019-03-10 14:53:03 +01:00
Ines Montani
3fe5811fa7 Only link model after download if shortcut link (#3378) 2019-03-10 13:02:24 +01:00
Matthew Honnibal
231bc7bb7b Add xfailing test for #3345 2019-03-10 13:00:15 +01:00
Matthew Honnibal
bdc77848f5 Add helper method to apply a transition in parser/NER 2019-03-10 13:00:00 +01:00
Matthew Honnibal
ce1fe8a510 Add comment 2019-03-09 17:51:17 +00:00
Matthew Honnibal
28c26e212d Fix textcat model for GPU 2019-03-09 17:50:08 +00:00
Ines Montani
610fb306bd Revert hyphens 2019-03-09 12:51:53 +01:00
Ines Montani
bbabb6aaae Escape more hyphens 2019-03-09 12:41:05 +01:00
Ines Montani
b8db219850 Auto-format 2019-03-09 12:40:58 +01:00
Ines Montani
a145bfe627 Try escaping hyphens again 2019-03-09 03:06:50 +01:00
Ines Montani
b9c71fc0f0 Fix flags 2019-03-09 02:46:04 +01:00
Ines Montani
ae09b6a6cf Try fixing unicode inconsistencies on Python 2 2019-03-09 02:37:50 +01:00
Ines Montani
d957d7a697 Auto-format 2019-03-09 02:37:41 +01:00
Ines Montani
65402c3d02 Revert "Experiment with escaping hyphens"
This reverts commit 9b42e2d5dd.
2019-03-09 02:13:00 +01:00
Ines Montani
9b42e2d5dd Experiment with escaping hyphens 2019-03-09 02:05:26 +01:00
Ines Montani
76764fcf59 💫 Improve converters and training data file formats (#3374)
* Populate converter argument info automatically

* Add conversion option for msgpack

* Update docs

* Allow reading training data from JSONL
2019-03-08 23:15:23 +01:00
Ines Montani
296446a1c8 Tidy up and improve docs and docstrings (#3370)

## Description
* tidy up and adjust Cython code to code style
* improve docstrings and make calling `help()` nicer
* add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects
* fix various typos and inconsistencies in docs

### Types of change
enhancement, docs

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-08 11:42:26 +01:00
Ines Montani
daaeeb7a2b Merge branch 'master' into develop 2019-03-07 22:07:31 +01:00