Ines Montani
dfbed07d3b
Remove unused temp errors
2019-02-24 22:26:08 +01:00
Ines Montani
d0b3af9222
Fix remaining inaccuracies in API docs (closes #2329)
2019-02-24 22:21:25 +01:00
Ines Montani
49d0938038
Update version [ci skip]
2019-02-24 22:01:47 +01:00
Ines Montani
62b558ab72
💫 Support lexical attributes in retokenizer attrs (closes #2390) (#3325)
...
* Fix formatting and whitespace
* Add support for lexical attributes (closes #2390)
* Document lexical attribute setting during retokenization
* Assign variable outside of nested loop
2019-02-24 21:13:51 +01:00
Ines Montani
a48deb4081
Merge regression tests
2019-02-24 21:03:39 +01:00
Ines Montani
8f6c193a4d
Delete _test_issue1622.py
2019-02-24 20:33:31 +01:00
Ines Montani
c8e967c78d
Try including previously segfaulting test
2019-02-24 20:32:46 +01:00
Ines Montani
328b589deb
Merge regression tests
2019-02-24 20:31:38 +01:00
Ines Montani
3bc53905cc
Remove print statements from test
2019-02-24 20:31:15 +01:00
Ines Montani
1ae0df3da9
Un-x-fail passing test
2019-02-24 20:24:15 +01:00
Ines Montani
399a5803d0
Tidy up tests [ci skip]
2019-02-24 19:02:16 +01:00
Ines Montani
aa52305461
Improve pipeline model and meta example [ci skip]
2019-02-24 18:45:39 +01:00
Ines Montani
2011563c51
Update docstrings [ci skip]
2019-02-24 18:39:59 +01:00
Ines Montani
df19e2bff6
💫 Allow setting of custom attributes during retokenization (closes #3314) (#3324)
...
## Description
This PR adds the ability to override custom extension attributes during merging. This only works for attributes that are writable, i.e. attributes registered with a default value like `default=False`, or attributes that have both a getter *and* a setter implemented.
```python
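import spacy
from spacy.tokens import Token

nlp = spacy.blank("en")  # any pipeline works; a blank English one keeps this small
# Register a writable custom attribute (it has a default value)
Token.set_extension("is_musician", default=False)
doc = nlp("I like David Bowie.")
with doc.retokenize() as retokenizer:
    # Custom extension attributes go under the "_" key
    attrs = {"LEMMA": "David Bowie", "_": {"is_musician": True}}
    retokenizer.merge(doc[2:4], attrs=attrs)
# The merge is applied when the context manager exits
assert doc[2].text == "David Bowie"
assert doc[2].lemma_ == "David Bowie"
assert doc[2]._.is_musician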
```
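The "writable" rule described above can be sketched as a small standalone check. This is only a toy model of the rule as stated in this description, not spaCy's actual validation code; the `Extension` class, the `_UNSET` sentinel and the `is_writable` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical sentinel meaning "no default value was registered"
_UNSET = object()


@dataclass
class Extension:
    """Toy stand-in for a registered extension attribute (not spaCy's class)."""

    default: Any = _UNSET
    getter: Optional[Callable] = None
    setter: Optional[Callable] = None


def is_writable(ext: Extension) -> bool:
    """Return True if the attribute could be overridden during a merge."""
    has_default = ext.default is not _UNSET
    has_getter_and_setter = ext.getter is not None and ext.setter is not None
    return has_default or has_getter_and_setter


default_ext = Extension(default=False)
getter_only_ext = Extension(getter=lambda token: True)
getter_setter_ext = Extension(
    getter=lambda token: True,
    setter=lambda token, value: None,
)
```

Under this model, `default_ext` and `getter_setter_ext` are writable, while `getter_only_ext` is not, which mirrors the restriction stated above.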
### Types of change
enhancement
## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-24 18:38:47 +01:00
Ines Montani
403b9cd58b
Add docs on adding to existing tokenizer rules [ci skip]
2019-02-24 18:35:19 +01:00
Ines Montani
1ea1bc98e7
Document regex utilities [ci skip]
2019-02-24 18:34:10 +01:00
Ines Montani
cd4bc6757b
Update README.md [ci skip]
2019-02-24 17:40:01 +01:00
Matthew Honnibal
1f7c56cd93
Fix parser.add_label()
2019-02-24 16:53:22 +01:00
Matthew Honnibal
893aa40d73
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2019-02-24 16:43:01 +01:00
Matthew Honnibal
5882d82915
Set version to v2.1.0a9.dev2
2019-02-24 16:42:06 +01:00
Matthew Honnibal
0367f864fe
Fix handling of added labels. Resolves #3189
2019-02-24 16:41:41 +01:00
Matthew Honnibal
4dc57d9e15
Update train_new_entity_type example
2019-02-24 16:41:03 +01:00
Matthew Honnibal
d74dbde828
Fix order of actions when labels added to parser
...
When labels were added to the parser or NER, we weren't loading back the
classes in the correct order. Re issue #3189
2019-02-24 16:36:29 +01:00
Matthew Honnibal
7ac0f9626c
Update rehearsal example
2019-02-24 16:17:41 +01:00
Ines Montani
6de81ae310
Fix formatting of errors
2019-02-24 15:11:28 +01:00
Ines Montani
d8f69d592f
Tidy up retokenizer tests
2019-02-24 14:14:11 +01:00
Ines Montani
723e27cb8c
Tidy up tests
2019-02-24 14:11:23 +01:00
Ines Montani
2982f82934
Auto-format
2019-02-24 14:09:15 +01:00
Ines Montani
09bf08b3c3
Update redirects [ci skip]
2019-02-24 13:37:50 +01:00
Ines Montani
dceca3264d
Tidy up package.json [ci skip]
2019-02-24 13:37:41 +01:00
Ines Montani
3ef4da3503
Update and auto-format README [ci skip]
2019-02-24 13:12:13 +01:00
Ines Montani
46ec5cdccc
Update TextCategorizer docs
2019-02-24 13:11:57 +01:00
Ines Montani
c03cb1cc63
Improve built-in component API docs
2019-02-24 13:11:49 +01:00
Ines Montani
235a0e948e
Tidy up CI config
2019-02-24 12:07:33 +01:00
Ines Montani
b570a1e203
Exclude website branch from CI
2019-02-24 11:52:16 +01:00
Ines Montani
383e2e1f12
Update Python versions [ci skip]
2019-02-24 11:49:45 +01:00
Ines Montani
b624cb4b89
Update v2-1.md
2019-02-24 11:49:27 +01:00
Matthew Honnibal
909a9d9932
Set version to v2.1.0a9.dev1
2019-02-23 13:10:42 +01:00
Matthew Honnibal
55bb3cc482
Require thinc 7.0.2
2019-02-23 13:10:09 +01:00
Matthew Honnibal
981cb89194
Fix f-score calculation if zero
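The commit title does not say what the fix was. As a rough sketch of the kind of guard it suggests, assuming the problem was a division by zero when precision and recall are both zero, an F1 helper could look like the following. The function name `f_score` is hypothetical and this is not the code from the commit.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, defined as 0.0 when both are zero."""
    denominator = precision + recall
    if denominator == 0:
        # Without this guard, the division below would raise ZeroDivisionError
        return 0.0
    return 2 * precision * recall / denominator
```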
2019-02-23 12:45:41 +01:00
Matthew Honnibal
6b0008afc6
Clean up TextCategorizer slightly
2019-02-23 12:28:06 +01:00
Matthew Honnibal
d13b9373bf
Improve initialization for mutually exclusive textcat
2019-02-23 12:27:45 +01:00
Matthew Honnibal
5063d999e5
Set architecture in textcat example
2019-02-23 11:57:59 +01:00
Matthew Honnibal
e9dd5943b9
Support exclusive_classes setting for textcat models
2019-02-23 11:57:16 +01:00
Matthew Honnibal
ce1e4eace2
Default to former TextCategorizer model
...
* Keep TextCategorizer default model same as v2.0
* Add option 'architecture'; setting it to "simple_cnn" switches to a
simpler model.
* Add option 'exclusive_classes', defaulting to False. If set to True,
the model treats classes as mutually exclusive, i.e. only one class can
be true per instance.
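
To illustrate what treating classes as mutually exclusive means for the output scores, here is a minimal pure-Python sketch. It assumes one common approach: a softmax over the classes in the exclusive case and an independent sigmoid per class otherwise. The function `score_classes` is a hypothetical name, and this is not the TextCategorizer implementation.

```python
import math
from typing import List


def score_classes(logits: List[float], exclusive_classes: bool = False) -> List[float]:
    """Turn raw class scores into probabilities."""
    if exclusive_classes:
        # Softmax: scores compete with each other and sum to 1.0
        largest = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - largest) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]
    # Independent sigmoids: each class is scored on its own, so several
    # classes (or none) can be likely for the same instance
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


exclusive_scores = score_classes([2.0, 1.0, 0.1], exclusive_classes=True)
independent_scores = score_classes([2.0, 1.0, 0.1])
```

With `exclusive_classes=True` the scores sum to 1.0, so at most one class can dominate; with the default setting each score lies between 0 and 1 independently of the others.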
2019-02-23 11:55:16 +01:00
Matthew Honnibal
829c9091a4
Set version to v2.1.0a9.dev0
2019-02-21 17:13:34 +01:00
Matthew Honnibal
d396a69c7b
More fixes for issue #3112
2019-02-21 17:12:23 +01:00
Ines Montani
80bdcb99c5
Fix escaping of HTML in displacy ENT (closes #2728)
2019-02-21 14:30:39 +01:00
Ines Montani
250e88ef55
Fix docs example (see #2728)
2019-02-21 14:22:06 +01:00
Ines Montani
0fc908d7a5
Add note on merging speed in v2.1 (see #3300) [ci skip]
2019-02-21 12:34:18 +01:00