Matthew Honnibal
f3be9d0a9a
Add tensor field to Lexeme, Token, Doc and Span, so that users have a place to hang neural network outputs
2016-10-14 03:24:13 +02:00
Matthew Honnibal
ca32a1ab01
Revert "Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good."
This reverts commit 8423e8627f.
2016-09-30 20:20:22 +02:00
Matthew Honnibal
6736977d82
Revert "Changes to Doc and Token for new string store scheme"
This reverts commit 99de44d864.
2016-09-30 20:11:15 +02:00
Matthew Honnibal
99de44d864
Changes to Doc and Token for new string store scheme
2016-09-30 20:00:21 +02:00
Matthew Honnibal
8423e8627f
Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good.
2016-09-30 10:14:47 +02:00
Matthew Honnibal
d3dc5718b2
Fix syntax error in Doc
2016-09-28 11:39:49 +02:00
Matthew Honnibal
1b520e7bab
Improve docstrings for Doc object
2016-09-28 11:15:13 +02:00
Matthew Honnibal
fc4a7ad794
Test and fix Issue #411: IndexError when .sents property is used on empty string.
2016-09-27 18:49:14 +02:00
Matthew Honnibal
15e42a1ba9
Allow entities to be set by Span, or by 4-tuple (with entity ID)
2016-09-24 01:17:43 +02:00
Matthew Honnibal
e48df859b5
Fix typedef import in span.pyx
2016-09-23 16:02:28 +02:00
Matthew Honnibal
4de13606fd
Fix token.pyx
2016-09-23 15:07:07 +02:00
Matthew Honnibal
b4de419e19
Import hash_t typedef in token.pyx
2016-09-23 14:22:06 +02:00
Matthew Honnibal
c1a2e96604
Clean up notes at end of token.pyx
2016-09-21 20:45:51 +02:00
Matthew Honnibal
58e83fe34b
Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match.
2016-09-21 14:54:55 +02:00
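The known problem in the entry above (a quantifier and the tail of the pattern competing for the same token) can be illustrated with a toy matcher. This is a minimal sketch under stated assumptions: the (value, op) pattern format, greedy_match and backtracking_match are hypothetical and are not spaCy's Matcher API, the toy matches a whole token list rather than searching a Doc, and greedy consumption without backtracking is only one plausible way such a failure can arise, not a description of the actual implementation.

```python
from typing import List, Tuple

# Hypothetical toy pattern format, not spaCy's: each element is
# (value, op), where op is "1" (exactly one), "*" (zero or more)
# or "+" (one or more). Tokens are plain strings.
Pattern = List[Tuple[str, str]]


def greedy_match(pattern: Pattern, tokens: List[str]) -> bool:
    """Match the whole token list; each quantifier takes as many tokens
    as it can and never gives any back."""
    i = 0
    for value, op in pattern:
        if op == "1":
            if i < len(tokens) and tokens[i] == value:
                i += 1
            else:
                return False
        else:
            start = i
            while i < len(tokens) and tokens[i] == value:
                i += 1
            if op == "+" and i == start:
                return False
    return i == len(tokens)


def backtracking_match(pattern: Pattern, tokens: List[str], p: int = 0, i: int = 0) -> bool:
    """Same semantics, but a quantifier may hand tokens back to the tail."""
    if p == len(pattern):
        return i == len(tokens)
    value, op = pattern[p]
    if op == "1":
        return (i < len(tokens) and tokens[i] == value
                and backtracking_match(pattern, tokens, p + 1, i + 1))
    # Furthest position the quantifier could reach.
    k = i
    while k < len(tokens) and tokens[k] == value:
        k += 1
    lowest = i + 1 if op == "+" else i
    # Try the longest run first, then progressively shorter ones.
    for j in range(k, lowest - 1, -1):
        if backtracking_match(pattern, tokens, p + 1, j):
            return True
    return False


if __name__ == "__main__":
    # The "+" element and the tail element can both match "A".
    pat = [("A", "+"), ("A", "1")]
    print(greedy_match(pat, ["A", "A"]))        # False
    print(backtracking_match(pat, ["A", "A"]))  # True
```

The greedy version fails because the one-plus element swallows the token the tail needs; letting the quantifier give tokens back, longest run first, recovers the match at the cost of extra search.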
Matthew Honnibal
2735b6247b
Fix orths_and_spaces in Doc.__init__
2016-09-21 14:52:05 +02:00
Matthew Honnibal
cdc10e9a1c
* Fix Issue #375: noun phrase iteration results in index error if noun phrases are merged during the loop. Fix by accumulating the spans inside the noun_chunks property, allowing the Span index tricks to work.
2016-05-20 10:14:06 +02:00
Matthew Honnibal
5d86c30f0b
* Fix Issue #367: Missing has_vector property on Doc and Span objects
2016-05-09 12:36:14 +02:00
Matthew Honnibal
8c0888d6cb
* Fix error in span.sent
2016-05-06 00:28:05 +02:00
Matthew Honnibal
26095f9722
* Add span.sent property, re Issue #366
2016-05-06 00:17:38 +02:00
Matthew Honnibal
76021cb853
* Fix bug in Doc.text, introduced by a862edc
2016-05-04 11:02:16 +02:00
Matthew Honnibal
29a114e645
* Don't assign 0-valued tags in Doc.from_array
2016-05-02 16:07:50 +02:00
Matthew Honnibal
276fbe9996
* Fix assignment of iterator on Doc object
2016-05-02 15:26:24 +02:00
Matthew Honnibal
508fd1f6dc
* Refactor noun chunk iterators, so that they're simple functions. Install the iterator when the Doc is created, but allow users to write to the noun_chunk_iterator attribute. The iterator functions accept an object and yield (int start, int end, int label) triples.
2016-05-02 14:25:10 +02:00
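The iterator protocol described in the entry above (a plain function that accepts an object and yields (start, end, label) triples) can be sketched as follows. Everything here is hypothetical: Tok, NP_LABEL and simple_noun_chunks are illustrative names, the "doc" is a plain list of (text, pos) tuples rather than a spaCy Doc, end is assumed to be exclusive, and the DET/ADJ/NOUN heuristic is a stand-in for the real language-specific iterators, which depend on the parser's annotation scheme.

```python
from typing import Iterator, NamedTuple, Sequence, Tuple


class Tok(NamedTuple):
    # Hypothetical token stand-in: surface text plus a coarse POS tag.
    text: str
    pos: str


NP_LABEL = 1  # hypothetical integer ID for the "NP" label
NP_TAGS = ("DET", "ADJ", "NOUN")


def simple_noun_chunks(doc: Sequence[Tok]) -> Iterator[Tuple[int, int, int]]:
    """Yield (start, end, label) triples, end exclusive, for maximal runs of
    DET/ADJ/NOUN tokens, trimmed so each chunk ends on a NOUN."""
    i, n = 0, len(doc)
    while i < n:
        if doc[i].pos not in NP_TAGS:
            i += 1
            continue
        j = i
        while j < n and doc[j].pos in NP_TAGS:
            j += 1
        end = j
        # Drop trailing determiners/adjectives that have no noun after them.
        while end > i and doc[end - 1].pos != "NOUN":
            end -= 1
        if end > i:
            yield (i, end, NP_LABEL)
        i = j


if __name__ == "__main__":
    doc = [Tok("the", "DET"), Tok("big", "ADJ"), Tok("dog", "NOUN"),
           Tok("chased", "VERB"), Tok("a", "DET"), Tok("cat", "NOUN")]
    for start, end, label in simple_noun_chunks(doc):
        print(start, end, label, " ".join(t.text for t in doc[start:end]))
    # 0 3 1 the big dog
    # 4 6 1 a cat
```

Because the iterator is just a generator function over token offsets, swapping in a different one per language only means assigning a different function; the caller turns the triples into spans.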
Matthew Honnibal
6df3858dbc
* Fix Issue #323: Incorrect semantics of Token.__str__ built-in. Add flag to allow users to switch the old semantics back on, to ease transition.
2016-04-12 13:17:59 +10:00
Matthew Honnibal
872695759d
Merge pull request #306 from wbwseeker/german_noun_chunks
add German noun chunk functionality
2016-04-08 00:54:24 +10:00
Matthew Honnibal
26622f0ffc
Merge branch 'master' of ssh://github.com/honnibal/spaCy
2016-03-29 14:31:52 +11:00
Matthew Honnibal
ad119c074f
* Fix incorrect whitespacing in Doc.text. This change is potentially breaking, to anyone who was relying on the previous incorrect semantics.
2016-03-29 13:02:42 +11:00
Wolfgang Seeker
d65ef41d08
make error messages language independent
2016-03-24 11:47:09 +01:00
Wolfgang Seeker
5080077097
revert init_model.py back to pre-german state (because it makes more sense)
simplify token.n_rights and token.n_lefts
2016-03-21 16:10:25 +01:00
Wolfgang Seeker
5e2e8e951a
add baseclass DocIterator for iterators over documents
add classes for English and German noun chunks
the respective iterators are set for the document when created by the parser
as they depend on the annotation scheme of the parsing model
2016-03-16 15:53:35 +01:00
Wolfgang Seeker
2ae253ef5b
changed head.__set__ to make it simpler
2016-03-14 13:43:48 +01:00
Wolfgang Seeker
46e3f979f1
add function for setting head and label to token
change PseudoProjectivity.deprojectivize to use these functions
2016-03-11 17:31:06 +01:00
Wolfgang Seeker
03fb498dbe
introduce lang field for LexemeC to hold language id
put noun_chunk logic into iterators.py for each language separately
2016-03-10 13:01:34 +01:00
Wolfgang Seeker
d9312bc9ea
add new files npchunks.{pyx,pxd} to hold noun phrase chunk generators
2016-03-09 16:18:48 +01:00
Wolfgang Seeker
3448cb40a4
integrated pseudo-projective parsing into parser
- nonproj.pyx holds a class PseudoProjectivity which currently holds
all functionality to implement Nivre & Nilsson 2005's pseudo-projective
parsing using the HEAD decoration scheme
- changed lefts/rights in Token to account for possible non-projective
structures
2016-03-01 10:09:08 +01:00
Matthew Honnibal
af8514cb0c
* Refine the way the is_parsed attribute is set by from_array
2016-02-06 14:44:35 +01:00
Matthew Honnibal
e66d45bf66
* Restore previous patch to Span.root, as it seems it wasn't the cause of the problem.
2016-02-06 13:37:41 +01:00
Matthew Honnibal
031b00cb91
* Fix Span.root calculation
2016-02-05 20:12:09 +01:00
Matthew Honnibal
e5c447e237
* Questionable fix to problem in Span.root
2016-02-05 19:18:35 +01:00
Matthew Honnibal
1ef84a0557
* Merge master into rethinc2
2016-02-05 12:55:59 +01:00
Matthew Honnibal
6aa92b70f1
* Fix merge problem in span
2016-02-05 12:46:11 +01:00
Matthew Honnibal
419edfab50
* Use generic flags for the new attributes until they're added
2016-02-04 15:50:54 +01:00
Matthew Honnibal
11810be33e
* Add Python hooks for is_bracket/is_quote/is_left_punct/is_right_punct
2016-02-04 13:04:16 +01:00
Matthew Honnibal
4cbad510ff
* Fix calculation of head for spans with punctuation.
2016-02-03 02:32:21 +01:00
Matthew Honnibal
6bb007d16e
* Make set_parse nogil
2016-01-30 20:27:52 +01:00
Matthew Honnibal
87172a15c6
* Fix runtime error bug that arose from updated Span.root function.
2016-01-25 15:22:42 +01:00
Matthew Honnibal
334c4b2b57
* Disprefer punctuation and spaces as heads of spans
2016-01-18 18:14:09 +01:00
Matthew Honnibal
c107da9738
* Bug fix to _count_words_to_root
2016-01-18 16:59:38 +01:00
Matthew Honnibal
f24833d607
* Fix merge for coordinations
2016-01-18 16:03:19 +01:00
Matthew Honnibal
14534958a9
* Fix bug in Span.root
2016-01-18 15:40:28 +01:00