Matthew Honnibal
26622f0ffc
Merge branch 'master' of ssh://github.com/honnibal/spaCy
2016-03-29 14:31:52 +11:00
Matthew Honnibal
ad119c074f
* Fix incorrect whitespacing in Doc.text. This change is potentially breaking, to anyone who was relying on the previous incorrect semantics.
2016-03-29 13:02:42 +11:00
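For context, here is a minimal sketch of the round-trip property that correct whitespacing in `Doc.text` should satisfy. It assumes each token stores its own trailing whitespace, and that the document text is the plain concatenation of each token's text plus that whitespace. `SimpleToken` and `doc_text` are hypothetical stand-ins, not spaCy's classes, and this is not a description of what the commit changed internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SimpleToken:
    """Hypothetical stand-in for a token: its text plus its trailing whitespace."""
    text: str
    whitespace: str  # "" or " " in this sketch

    @property
    def text_with_ws(self) -> str:
        return self.text + self.whitespace


def doc_text(tokens: List[SimpleToken]) -> str:
    # Join each token with its own trailing whitespace. No separator is added
    # between tokens, so punctuation stays attached to the preceding word.
    return "".join(t.text_with_ws for t in tokens)


tokens = [
    SimpleToken("Hello", ""),
    SimpleToken(",", " "),
    SimpleToken("world", ""),
    SimpleToken("!", ""),
]
print(doc_text(tokens))  # Hello, world!
```

Joining token texts with a single space would instead give `Hello , world !`, which no longer matches the input string.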
Wolfgang Seeker
2ae253ef5b
changed head.__set__ to make it simpler
2016-03-14 13:43:48 +01:00
Wolfgang Seeker
46e3f979f1
add function for setting head and label to token
change PseudoProjectivity.deprojectivize to use these functions
2016-03-11 17:31:06 +01:00
Wolfgang Seeker
3448cb40a4
integrated pseudo-projective parsing into parser
- nonproj.pyx holds a class PseudoProjectivity which currently holds
all functionality to implement Nivre & Nilsson 2005's pseudo-projective
parsing using the HEAD decoration scheme
- changed lefts/rights in Token to account for possible non-projective
structures
2016-03-01 10:09:08 +01:00
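As an illustration of why lefts/rights need care once non-projective trees are allowed, the minimal sketch below finds each token's children by scanning for tokens whose head it is, rather than assuming the children occupy a contiguous block. It assumes heads are stored as absolute indices and that the root is its own head. The helpers are hypothetical and do not reproduce spaCy's `Token.lefts`/`Token.rights` implementation.

```python
from typing import List, Sequence


def lefts(heads: Sequence[int], i: int) -> List[int]:
    """Children of token i that precede it (hypothetical helper).

    Scans every earlier token instead of assuming the children form a
    contiguous block, so crossing (non-projective) arcs are handled.
    """
    return [j for j in range(i) if heads[j] == i]


def rights(heads: Sequence[int], i: int) -> List[int]:
    """Children of token i that follow it (hypothetical helper)."""
    return [j for j in range(i + 1, len(heads)) if heads[j] == i]


# heads[k] is the index of token k's head; the root (token 2) is its own head.
# The arcs 2->0 and 3->1 cross, so this tree is non-projective.
heads = [2, 3, 2, 0]
print(lefts(heads, 2), rights(heads, 0))  # [0] [3]
```

The linear scan costs O(n) per call. That is acceptable for a sketch, but a real implementation would likely cache child counts or edges.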
Matthew Honnibal
af8514cb0c
* Refine the way the is_parsed attribute is set by from_array
2016-02-06 14:44:35 +01:00
Matthew Honnibal
e66d45bf66
* Restore previous patch to Span.root, as it seems it wasn't the cause of the problem.
2016-02-06 13:37:41 +01:00
Matthew Honnibal
031b00cb91
* Fix Span.root calculation
2016-02-05 20:12:09 +01:00
Matthew Honnibal
e5c447e237
* Questionable fix to problem in Span.root
2016-02-05 19:18:35 +01:00
Matthew Honnibal
1ef84a0557
* Merge master into rethinc2
2016-02-05 12:55:59 +01:00
Matthew Honnibal
6aa92b70f1
* Fix merge problem in span
2016-02-05 12:46:11 +01:00
Matthew Honnibal
419edfab50
* Use generic flags for the new attributes until they're added
2016-02-04 15:50:54 +01:00
Matthew Honnibal
11810be33e
* Add Python hooks for is_bracket/is_quote/is_left_punct/is_right_punct
2016-02-04 13:04:16 +01:00
Matthew Honnibal
4cbad510ff
* Fix calculation of head for spans with punctuation.
2016-02-03 02:32:21 +01:00
Matthew Honnibal
6bb007d16e
* Make set_parse nogil
2016-01-30 20:27:52 +01:00
Matthew Honnibal
87172a15c6
* Fix runtime error bug that arose from updated Span.root function.
2016-01-25 15:22:42 +01:00
Matthew Honnibal
334c4b2b57
* Disprefer punctuation and spaces as heads of spans
2016-01-18 18:14:09 +01:00
Matthew Honnibal
c107da9738
* Bug fix to _count_words_to_root
2016-01-18 16:59:38 +01:00
Matthew Honnibal
f24833d607
* Fix merge for coordinations
2016-01-18 16:03:19 +01:00
Matthew Honnibal
14534958a9
* Fix bug in Span.root
2016-01-18 15:40:28 +01:00
Matthew Honnibal
fc8f26584a
* Don't consider NPs connected to parse via conj relation as noun chunks. Change motivated by the nested noun chunks identified in Issue #203, but might be problematic. Also allow root NPs to be considered noun chunks.
2016-01-16 17:52:40 +01:00
Matthew Honnibal
995b2d18fd
* Route token.string via token.text_with_ws, to deprecate token.string in future
2016-01-16 17:14:34 +01:00
Matthew Honnibal
54a98eaf19
* Fix typo text_wth_ws --> text_with_ws. Reroute .string attribute to text_with_ws, to deprecate .string in future
2016-01-16 17:13:50 +01:00
Matthew Honnibal
03e8a4293d
* Add loop guard to Token.lefts and Token.rights properties
2016-01-16 16:18:17 +01:00
Matthew Honnibal
304339985e
* Add a linear scan to Span.root method, to help with long sentences
2016-01-16 16:17:28 +01:00
Matthew Honnibal
8cbcc3a799
* Fix calculation of root token in Span. Now take root to be word with shortest tree path. Avoids parse trees ending up in inconsistent state, as had occurred in Issue #214.
2016-01-16 15:38:50 +01:00
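The rule in this commit picks the span token with the shortest path to the tree root. The minimal sketch below combines that rule with a cycle guard of the kind described in 42a9f29b40. It assumes heads are stored as absolute indices and that the root is its own head. The function names are illustrative only. The log mentions a `_count_words_to_root` helper, but this sketch does not reproduce its actual signature or behaviour, and it ignores the later punctuation and space handling.

```python
from typing import Sequence


def count_words_to_root(heads: Sequence[int], i: int) -> int:
    """Number of arcs from token i up to the root (hypothetical helper).

    A tree path can never be longer than the number of tokens, so exceeding
    that bound means the parse contains a cycle; raise instead of looping forever.
    """
    n = 0
    while heads[i] != i:
        i = heads[i]
        n += 1
        if n > len(heads):
            raise ValueError("Cycle detected in dependency parse")
    return n


def span_root(heads: Sequence[int], start: int, end: int) -> int:
    """Token in [start, end) closest to the root; min() keeps the leftmost on ties."""
    if start >= end:
        raise ValueError("Empty span has no root")
    return min(range(start, end), key=lambda i: count_words_to_root(heads, i))


# "The quick fox jumped": The -> fox, quick -> fox, fox -> jumped (root).
heads = [2, 2, 3, 3]
print(span_root(heads, 0, 3))  # 2, i.e. "fox"
```

Choosing by depth rather than by "the token whose head lies outside the span" still yields a single root when the parse is in an inconsistent state. In such a state, several span tokens may have heads outside the span.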
Matthew Honnibal
42a9f29b40
* Add loop guard in Span.root, to raise errors if there is a cycle in the dependency parse, instead of entering an infinite loop. Re Issue #214
2016-01-16 11:53:37 +01:00
Matthew Honnibal
ab5aac5b2f
* Add .rank property to Token and Lexeme, for frequency rank
2015-11-08 16:18:25 +01:00
Matthew Honnibal
7663970d5f
* Removed unused i variable from Span, and set attributes to read-only
2015-11-07 17:06:15 +11:00
Matthew Honnibal
4b3c96d76d
* Fix zero-length spans
2015-11-07 17:05:16 +11:00
Matthew Honnibal
cc8febcbe1
* Fix Span comparison
2015-11-07 09:54:14 +11:00
Matthew Honnibal
a9b612abdf
* Rework the Span-merge patch, to avoid extending the interface of Doc, and avoid virtualizing the Span.start and Span.end indices, to keep Span usage efficient
2015-11-07 09:01:12 +11:00
Matthew Honnibal
56499d89ef
* Rework the Span-merge patch, to avoid extending the interface of Doc, and avoid virtualizing the Span.start and Span.end indices, to keep Span usage efficient
2015-11-07 08:55:34 +11:00
Andreas Grivas
4be7fda453
* span start, end -> properties. autoupdate after merge
2015-11-07 07:57:04 +11:00
Andreas Grivas
562db6d2d0
* merge add lex last - add index finder funcs
2015-11-07 07:57:04 +11:00
Matthew Honnibal
68f479e821
* Rename Doc.data to Doc.c
2015-11-04 00:15:14 +11:00
Matthew Honnibal
3ddea19b2b
* Rename spans.pyx to span.pyx
2015-11-04 00:14:40 +11:00
Matthew Honnibal
9482d616bc
* Rename spans.pyx to span.pyx
2015-11-03 23:51:05 +11:00
Matthew Honnibal
116da5990a
* Clean up setting of tag in doc.from_bytes
2015-11-03 23:48:57 +11:00
Matthew Honnibal
1e99fcd413
* Rename .repvec to .vector in C API
2015-11-03 23:47:59 +11:00
Matthew Honnibal
9e37437ba8
* Fix assign_tag in doc.merge
2015-11-03 19:07:02 +11:00
Matthew Honnibal
833eb35c57
* Fix tag assignment in doc.from_array
2015-11-03 18:45:54 +11:00
Matthew Honnibal
09664177d7
* Fix tag handling in doc.merge, and assign sent_start when setting heads.
2015-11-03 18:15:52 +11:00
Matthew Honnibal
604ceac4c6
* Fix morphological assignment in doc.merge()
2015-11-03 17:57:51 +11:00
Matthew Honnibal
5e040855a5
* Ensure morphological features and lemmas are loaded in from_array, re Issue #152
2015-11-03 17:56:50 +11:00
Matthew Honnibal
6161d2529a
Merge branch 'master' of ssh://github.com/honnibal/spaCy
2015-11-03 13:36:30 +11:00
Matthew Honnibal
f7dd377575
* Adjust conjuncts iterator in Token
2015-11-03 13:23:22 +11:00
Andreas Grivas
d418f00eb1
fixed error when printing unicode
2015-11-02 20:23:18 +02:00
Matthew Honnibal
52fc338001
* Set is_parsed and is_tagged attrs when loading annotations into Doc, re Issue #152
2015-10-28 10:43:22 +11:00
Andreas Grivas
93ada458e2
added __repr__ that prints text in ipython for doc, token, and span objects
2015-10-21 14:11:46 +03:00