Commit Graph

320 Commits

Author SHA1 Message Date
Matthew Honnibal
56164ab688 Set l_edge and r_edge correctly for non-projective parses. Fixes #1799 2018-01-22 20:18:04 +01:00
Matthew Honnibal
ccb51a9f36 Make .similarity() return 1.0 if all orth attrs match 2018-01-15 16:29:48 +01:00
Matthew Honnibal
b904d81e9a Fix rich comparison against None objects. Closes #1757 2018-01-15 15:51:25 +01:00
Matthew Honnibal
ab7c45b12d Fix error message and handling of doc.sents 2018-01-15 15:21:11 +01:00
Matthew Honnibal
465a6f6452 Add missing Span.vocab property. Closes #1633 2018-01-14 15:06:30 +01:00
Matthew Honnibal
0cb090e526 Fix infinite recursion in token.sent_start. Closes #1640 2018-01-14 15:02:15 +01:00
Matthew Honnibal
5cbe913b6f Don't raise deprecation warning in property. Closes #1813, #1712 2018-01-14 14:55:58 +01:00
Matthew Honnibal
e10e9ad2c5 Improve efficiency of Doc.to_array 2017-11-23 12:33:27 +00:00
Matthew Honnibal
fa62427300 Remove lookup-based lemmatization 2017-11-23 12:32:22 +00:00
Matthew Honnibal
fb26b2cb12 Use lookup lemmatizer if lemma unset 2017-11-23 12:31:58 +00:00
Burton DeWilde
a5c6869b2d Fix bug where span.orth_ != span.text (see #1612) 2017-11-20 12:05:43 -06:00
Motoki Wu
a52e195a0a Fixes Issue #1207 where noun_chunks of Span gives an error.
Make sure to reference `self.doc` when getting the noun chunks.

Same fix as 9750a0128c
2017-11-17 17:16:20 -08:00
ines
1c218397f6 Ensure path in Doc.to_disk/from_disk (resolves ##1521)
Also add Doc serialization tests with both Path and string path options
2017-11-09 02:29:03 +01:00
Matthew Honnibal
144a93c2a5 Back-off to tensor for similarity if no vectors 2017-11-03 20:56:33 +01:00
Matthew Honnibal
62ed58935a Add Doc.extend_tensor() method 2017-11-03 11:20:31 +01:00
ines
9659391944 Update deprecated methods and add warnings 2017-11-01 16:49:42 +01:00
ines
705a4e3e4a Fix formatting 2017-11-01 16:44:08 +01:00
Matthew Honnibal
9e0ebee81c Add Token.is_sent_start property, so can deprecate Token.sent_start 2017-11-01 13:27:14 +01:00
Matthew Honnibal
7e7116cdf7 Fix Doc.to_array when only one string attr provided 2017-11-01 13:26:43 +01:00
Matthew Honnibal
301fb2bb60 Implement Span.n_lefts and Span.n_rights 2017-11-01 13:25:12 +01:00
Matthew Honnibal
86eba61fae Fix token.vector when vectors are missing 2017-11-01 00:47:35 +01:00
ines
d96e72f656 Tidy up rest 2017-10-27 21:07:59 +02:00
ines
d2df81d907 Fix not implemented Span getters 2017-10-27 18:09:28 +02:00
ines
544a407b93 Tidy up Doc, Token and Span and add missing docs 2017-10-27 17:07:26 +02:00
ines
6a0483b7aa Tidy up and document Doc, Token and Span 2017-10-27 15:41:45 +02:00
ines
1a559d4c95 Remove old, unused file 2017-10-27 15:34:35 +02:00
ines
ea4a41c8fb Tidy up util and helpers 2017-10-27 14:39:09 +02:00
Matthew Honnibal
b66b8f028b Fix #1375 -- out-of-bounds on token.nbor() 2017-10-24 12:10:39 +02:00
Matthew Honnibal
ccd2ab1a62 Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix
Add LCA matrix for spans and docs
2017-10-24 11:22:46 +02:00
Matthew Honnibal
fdf25d10ba Merge pull request #1440 from ramananbalakrishnan/develop
Support single value for attribute list in doc.to_array
2017-10-24 10:23:12 +02:00
ines
a31f048b4d Fix formatting 2017-10-23 10:38:06 +02:00
Ramanan Balakrishnan
d2fe56a577
Add LCA matrix for spans and docs 2017-10-20 23:58:00 +05:30
Ramanan Balakrishnan
0726946563
cleanup to_array implementation using fixes on master 2017-10-20 17:09:37 +05:30
Ramanan Balakrishnan
b3ab124fc5
Support strings for attribute list in doc.to_array 2017-10-20 11:46:57 +05:30
Ramanan Balakrishnan
7b9b1be44c
Support single value for attribute list in doc.to_array 2017-10-19 17:00:41 +05:30
Matthew Honnibal
394633efce Make doc pickling support hooks 2017-10-17 19:44:09 +02:00
Matthew Honnibal
cdb0c426d8 Improve deserialization of user_data, esp. for Underscore 2017-10-17 19:29:20 +02:00
Matthew Honnibal
32a8564c79 Fix doc pickling 2017-10-17 18:20:24 +02:00
Matthew Honnibal
92c1eb2d6f Fix Doc pickling. This also removes need for Binder class 2017-10-17 16:11:13 +02:00
Matthew Honnibal
a002264fec Remove caching of Token in Doc, as caused cycle. 2017-10-16 19:34:21 +02:00
Matthew Honnibal
59c216196c Allow weakrefs on Doc objects 2017-10-16 19:22:11 +02:00
ines
e0ff145a8b Merge branch 'develop' into feature/dot-underscore 2017-10-11 11:57:05 +02:00
Matthew Honnibal
3b527fa52b Call morphology.assign_untagged when pushing token to Doc 2017-10-11 03:23:57 +02:00
Matthew Honnibal
e0a9b02b67 Merge Span._ and Span.as_doc methods 2017-10-09 22:00:15 -05:00
ines
3fc4fe61d2 Fix typo 2017-10-10 04:15:14 +02:00
ines
59c4f27499 Add get, set and has methods to Underscore 2017-10-10 04:14:35 +02:00
Matthew Honnibal
51d18937af Partially apply doc/span/token into method
We want methods to act like they're "bound" to the object, so that you can make your method conditional on the `doc`, `span` or `token` instance --- like, well, a method. We therefore partially apply the function, which works like this:

```
def partial(unbound_method, constant_arg):
    def bound_method(*args, **kwargs):
        return unbound_method(constant_arg, *args, **kwargs)
    return bound_method
2017-10-10 02:21:28 +02:00
Matthew Honnibal
e938bce320 Adjust parsing transition system to allow preset sentence segments. 2017-10-08 23:53:34 +02:00
Matthew Honnibal
080afd4924 Add ternary value setting to Token.sent_start 2017-10-08 23:51:58 +02:00
Matthew Honnibal
7ae67ec6a1 Add Span.as_doc method 2017-10-08 23:50:20 +02:00