Matthew Honnibal
|
11664b9f20
|
Fix variable error in token
|
2016-11-01 13:28:00 +01:00 |
|
Matthew Honnibal
|
8c4d1b46ce
|
Fix variable error in Span
|
2016-11-01 13:27:44 +01:00 |
|
Matthew Honnibal
|
e7af6b937f
|
Fix syntax error while fixing doc strings
|
2016-11-01 13:27:32 +01:00 |
|
Matthew Honnibal
|
b86f8af0c1
|
Fix doc strings
|
2016-11-01 12:25:36 +01:00 |
|
Matthew Honnibal
|
4ca31b4d87
|
Fix clobbering of 'missing' named ent values after assigning ents.
|
2016-10-26 13:13:56 +02:00 |
|
Matthew Honnibal
|
15c9b59f0e
|
Fix Issue #461: O tag was being clobbered by doc.ents.__set__
|
2016-10-23 15:50:26 +02:00 |
|
Matthew Honnibal
|
2c3a67b693
|
Fix calculation of vector norm, re Issue #522. Need to consolidate the calculations into a helper function.
|
2016-10-23 14:49:31 +02:00 |
|
Matthew Honnibal
|
e80944276f
|
Fix Span.vector_norm
|
2016-10-20 21:58:56 +02:00 |
|
Matthew Honnibal
|
3588a18fb8
|
Fix hook names in doc
|
2016-10-19 21:15:16 +02:00 |
|
Matthew Honnibal
|
5d5742b773
|
Add sentiment field to doc, rename getters_for_tokens and getters_for_spans, add user_hooks field to Doc.
|
2016-10-19 20:54:22 +02:00 |
|
Matthew Honnibal
|
9b60186266
|
Fix doc class
|
2016-10-17 15:23:47 +02:00 |
|
Matthew Honnibal
|
7fd98fc91c
|
Remove deprecation shim around str/bytes in Token.
|
2016-10-17 14:02:47 +02:00 |
|
Matthew Honnibal
|
b67697a97b
|
Improve API for doc.merge() and span.merge(), to use keyword arguments.
|
2016-10-17 14:02:13 +02:00 |
|
Matthew Honnibal
|
fbb7f3f15c
|
Add user_data attribute to Doc object.
|
2016-10-17 11:43:22 +02:00 |
|
Matthew Honnibal
|
c1abc8f6ed
|
Fix deprecation stuff in Token: Remove the shim for the str/unicode semantics, and raise for has_repvec and repvec
|
2016-10-17 11:18:41 +02:00 |
|
Matthew Honnibal
|
09ab447a18
|
Remove tensor property from token.
|
2016-10-17 02:45:09 +02:00 |
|
Matthew Honnibal
|
5d10e2005c
|
Defer some attributes to Doc, via getters_for_tokens attribute.
|
2016-10-17 02:44:49 +02:00 |
|
Matthew Honnibal
|
8829984efb
|
Remove tensor attribute from Span and Token.
|
2016-10-17 02:44:04 +02:00 |
|
Matthew Honnibal
|
d15a88c66a
|
Defer some attributes to Doc via getters_for_spans
|
2016-10-17 02:43:35 +02:00 |
|
Matthew Honnibal
|
62230dd13a
|
Add getters_for_spans and getters_for_tokens attributes to Doc. Fix docstring
|
2016-10-17 02:42:51 +02:00 |
|
Matthew Honnibal
|
ae11ea8240
|
Add getters_for_tokens and getters_for_spans attributes to Doc object.
|
2016-10-17 02:42:05 +02:00 |
|
Matthew Honnibal
|
311a985fe0
|
Add input error handling in Doc
|
2016-10-16 18:16:42 +02:00 |
|
Matthew Honnibal
|
06322ba99d
|
Add words and spaces keyword arguments to Doc.
|
2016-10-16 18:13:03 +02:00 |
|
Matthew Honnibal
|
f3be9d0a9a
|
Add tensor field to Lexeme, Token, Doc and Span, so that users have a place to hang neural network outputs
|
2016-10-14 03:24:13 +02:00 |
|
Matthew Honnibal
|
ca32a1ab01
|
Revert "Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good."
This reverts commit 8423e8627f.
|
2016-09-30 20:20:22 +02:00 |
|
Matthew Honnibal
|
6736977d82
|
Revert "Changes to Doc and Token for new string store scheme"
This reverts commit 99de44d864.
|
2016-09-30 20:11:15 +02:00 |
|
Matthew Honnibal
|
99de44d864
|
Changes to Doc and Token for new string store scheme
|
2016-09-30 20:00:21 +02:00 |
|
Matthew Honnibal
|
8423e8627f
|
Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good.
|
2016-09-30 10:14:47 +02:00 |
|
Matthew Honnibal
|
d3dc5718b2
|
Fix syntax error in Doc
|
2016-09-28 11:39:49 +02:00 |
|
Matthew Honnibal
|
1b520e7bab
|
Improve docstrings for Doc object
|
2016-09-28 11:15:13 +02:00 |
|
Matthew Honnibal
|
fc4a7ad794
|
Test and fix Issue #411: IndexError when .sents property is used on empty string.
|
2016-09-27 18:49:14 +02:00 |
|
Matthew Honnibal
|
15e42a1ba9
|
Allow entities to be set by Span, or by 4-tuple (with entity ID)
|
2016-09-24 01:17:43 +02:00 |
|
Matthew Honnibal
|
e48df859b5
|
Fix typedef import in span.pyx
|
2016-09-23 16:02:28 +02:00 |
|
Matthew Honnibal
|
4de13606fd
|
Fix token.pyx
|
2016-09-23 15:07:07 +02:00 |
|
Matthew Honnibal
|
b4de419e19
|
Import hash_t typedef in token.pyx
|
2016-09-23 14:22:06 +02:00 |
|
Matthew Honnibal
|
c1a2e96604
|
Clean up notes at end of token.pyx
|
2016-09-21 20:45:51 +02:00 |
|
Matthew Honnibal
|
58e83fe34b
|
Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match.
|
2016-09-21 14:54:55 +02:00 |
|
Matthew Honnibal
|
2735b6247b
|
Fix orths_and_spaces in Doc.__init__
|
2016-09-21 14:52:05 +02:00 |
|
Matthew Honnibal
|
cdc10e9a1c
|
* Fix Issue #375: noun phrase iteration results in index error if noun phrases are merged during the loop. Fix by accumulating the spans inside the noun_chunks property, allowing the Span index tricks to work.
|
2016-05-20 10:14:06 +02:00 |
|
Matthew Honnibal
|
5d86c30f0b
|
* Fix Issue #367: Missing has_vector property on Doc and Span objects
|
2016-05-09 12:36:14 +02:00 |
|
Matthew Honnibal
|
8c0888d6cb
|
* Fix error in span.sent
|
2016-05-06 00:28:05 +02:00 |
|
Matthew Honnibal
|
26095f9722
|
* Add span.sent property, re Issue #366
|
2016-05-06 00:17:38 +02:00 |
|
Matthew Honnibal
|
76021cb853
|
* Fix bug in Doc.text, introduced by a862edc
|
2016-05-04 11:02:16 +02:00 |
|
Matthew Honnibal
|
29a114e645
|
* Don't assign 0-valued tags in Doc.from_array
|
2016-05-02 16:07:50 +02:00 |
|
Matthew Honnibal
|
276fbe9996
|
* Fix assignment of iterator on Doc object
|
2016-05-02 15:26:24 +02:00 |
|
Matthew Honnibal
|
508fd1f6dc
|
* Refactor noun chunk iterators, so that they're simple functions. Install the iterator when the Doc is created, but allow users to write to the noun_chunk_iterator attribute. The iterator functions accept an object and yield (int start, int end, int label) triples.
|
2016-05-02 14:25:10 +02:00 |
|
Matthew Honnibal
|
6df3858dbc
|
* Fix Issue #323: Incorrect semantics of Token.__str__ built-in. Add flag to allow users to switch the old semantics back on, to ease transition.
|
2016-04-12 13:17:59 +10:00 |
|
Matthew Honnibal
|
872695759d
|
Merge pull request #306 from wbwseeker/german_noun_chunks
add German noun chunk functionality
|
2016-04-08 00:54:24 +10:00 |
|
Matthew Honnibal
|
26622f0ffc
|
Merge branch 'master' of ssh://github.com/honnibal/spaCy
|
2016-03-29 14:31:52 +11:00 |
|
Matthew Honnibal
|
ad119c074f
|
* Fix incorrect whitespacing in Doc.text. This change is potentially breaking, to anyone who was relying on the previous incorrect semantics.
|
2016-03-29 13:02:42 +11:00 |
|