adrianeboyd
a6e521cd79
Add is_sent_end token property (#5375)
Reconstruction of the original PR #4697 by @MiniLau.
Removes unused `SENT_END` symbol and `IS_SENT_END` from `Matcher` schema
because the Matcher is only going to be able to support `IS_SENT_START`.
2020-04-29 12:53:16 +02:00
Tom Keefe
ddf63b97a8
make idx available via to_array (#5030)
2020-02-22 14:13:06 +01:00
Sofie Van Landeghem
a1b22e90cd
serialize ENT_ID (#4852)
* expand serialization test for custom token attribute
* add failing test for issue 4849
* define ENT_ID as attr and use in doc serialization
* fix few typos
2020-01-06 14:57:34 +01:00
Sofie Van Landeghem
4e7259c6cf
Bugfix initializing DocBin with attributes (#4368)
* docbin init fix + documentation fix + unit tests
* newline
* try with zlib instead of gzip (python 2 incompatibilities)
2019-10-03 14:48:45 +02:00
Matthew Honnibal
bcd08f20af
Merge changes from master
2019-08-21 14:18:52 +02:00
svlandeg
8608685543
ensure Span.as_doc keeps the entity links + unit test
2019-06-25 15:28:51 +02:00
Matthew Honnibal
dd9ea478c5
Fix intify_attrs function for obsolete data
2019-03-07 21:59:03 +01:00
Matthew Honnibal
1f7229f40f
Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit c9ba3d3c2d, reversing changes made to 92c26a35d4.
2018-03-27 19:23:02 +02:00
4altinok
94fb0b75e3
code for is_currency
2018-02-11 18:51:32 +01:00
Vadim Mazaev
cacd859dcd
Added tag map, fixed tests fails, added more exceptions
2017-11-26 20:54:48 +03:00
ines
d96e72f656
Tidy up rest
2017-10-27 21:07:59 +02:00
Matthew Honnibal
16122f566e
Fix cpdef enum in attrs.pyx
2017-09-17 12:28:53 -05:00
Matthew Honnibal
fe11564b8e
Finish stringstore change. Also xfail vectors tests
2017-05-28 15:10:22 +02:00
Matthew Honnibal
84e66ca6d4
WIP on stringstore change. 27 failures
2017-05-28 14:06:40 +02:00
Matthew Honnibal
d68dd1f251
Add SENT_START attribute, for custom sentence boundary detection
2017-05-23 18:37:58 +02:00
ines
d24589aa72
Clean up imports, unused code, whitespace, docstrings
2017-04-15 12:05:47 +02:00
ines
561f2a3eb4
Use consistent formatting for docstrings
2017-04-15 11:59:21 +02:00
Matthew Honnibal
d864708072
Add more morphology names in attrs.pyx
2017-03-15 09:26:16 -05:00
Roman Inflianskas
66e1109b53
Add support for Universal Dependencies v2.0
2017-03-03 13:17:34 +01:00
Matthew Honnibal
3980f1b0cb
Ignore more morphology attributes in deprecated mode of intify_attrs
2016-12-18 17:33:46 +01:00
Matthew Honnibal
d58187ffa7
Filter out morphology keys in deprecated attrs
2016-12-18 16:50:26 +01:00
Matthew Honnibal
6dd3b94fa6
Filter out deprecated attributes when reading special-case tokenization rules.
2016-11-25 09:57:18 -06:00
Matthew Honnibal
a335c6dcc2
Exclude morphs from deprecated token attributes for now
2016-11-25 16:17:32 +01:00
Matthew Honnibal
846e80f2f4
Exclude morphs from deprecated token attributes for now
2016-11-25 16:14:54 +01:00
Matthew Honnibal
53d8ca8f51
Add spacy.attrs.intify_attrs function, to normalize strings in token attribute dictionaries.
2016-11-25 11:34:30 +01:00
Wolfgang Seeker
03fb498dbe
introduce lang field for LexemeC to hold language id
put noun_chunk logic into iterators.py for each language separately
2016-03-10 13:01:34 +01:00
Matthew Honnibal
c4017a06d9
* Add placeholders for the new flags in attrs and symbols
2016-02-04 15:49:45 +01:00
Matthew Honnibal
22bd0095f5
* Map empty string to NULL_ATTR in attrs
2015-10-10 22:10:19 +11:00
Matthew Honnibal
94bafc1417
* Rename ATTR_IDS to attrs.IDS. Rename ATTR_NAMES to attrs.NAMES. Rename UNIV_POS_IDS to parts_of_speech.IDS
2015-10-10 17:57:29 +11:00
Matthew Honnibal
064bd69ad0
* Refactor symbols, so that frequency rank can be derived from the orth id of a word.
2015-10-10 16:03:48 +11:00
Matthew Honnibal
44f39a876f
* Add a blank attrs.pyx
2015-07-17 16:40:42 +02:00