spaCy/spacy/tokens
Ines Montani df19e2bff6
💫 Allow setting of custom attributes during retokenization (closes #3314) (#3324)
<!--- Provide a general summary of your changes in the title. -->

## Description

This PR adds the abilility to override custom extension attributes during merging. This will only work for attributes that are writable, i.e. attributes registered with a default value like `default=False` or attribute that have both a getter *and* a setter implemented.

```python
Token.set_extension('is_musician', default=False)

doc = nlp("I like David Bowie.")
with doc.retokenize() as retokenizer:
    attrs = {"LEMMA": "David Bowie", "_": {"is_musician": True}}
    retokenizer.merge(doc[2:4], attrs=attrs)

assert doc[2].text == "David Bowie"
assert doc[2].lemma_ == "David Bowie"
assert doc[2]._.is_musician
```

### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-24 18:38:47 +01:00
..
__init__.pxd * Break up tokens.pyx into tokens/doc.pyx, tokens/token.pyx, tokens/spans.pyx 2015-07-13 20:20:58 +02:00
__init__.py Tidy up and document Doc, Token and Span 2017-10-27 15:41:45 +02:00
_retokenize.pyx 💫 Allow setting of custom attributes during retokenization (closes #3314) (#3324) 2019-02-24 18:38:47 +01:00
_serialize.py 💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003) 2018-12-03 01:28:22 +01:00
doc.pxd Fix issue 2396 (#3089) 2018-12-29 18:05:52 +01:00
doc.pyx 💫 Replace {Doc,Span}.merge with Doc.retokenize (#3280) 2019-02-15 10:29:44 +01:00
span.pxd Add Span.to_array method 2017-08-19 12:20:45 +02:00
span.pyx 💫 Replace {Doc,Span}.merge with Doc.retokenize (#3280) 2019-02-15 10:29:44 +01:00
token.pxd Make NORM a token attribute (#3029) 2018-12-08 10:49:10 +01:00
token.pyx Raise better error if token is pickled (resolves #2833) (#3267) 2019-02-13 11:27:04 +01:00
underscore.py 💫 Allow setting of custom attributes during retokenization (closes #3314) (#3324) 2019-02-24 18:38:47 +01:00