spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-11-17 00:06:04 +03:00

Author	SHA1	Message	Date
Wolfgang Seeker	46e3f979f1	add function for setting head and label to token; change PseudoProjectivity.deprojectivize to use these functions	2016-03-11 17:31:06 +01:00
Wolfgang Seeker	3448cb40a4	integrated pseudo-projective parsing into parser; nonproj.pyx holds a class PseudoProjectivity which currently holds all functionality to implement Nivre & Nilsson 2005's pseudo-projective parsing using the HEAD decoration scheme; changed lefts/rights in Token to account for possible non-projective structures	2016-03-01 10:09:08 +01:00
Matthew Honnibal	af8514cb0c	* Refine the way the is_parsed attribute is set by from_array	2016-02-06 14:44:35 +01:00
Matthew Honnibal	e66d45bf66	* Restore previous patch to Span.root, as it seems it wasn't the cause of the problem.	2016-02-06 13:37:41 +01:00
Matthew Honnibal	031b00cb91	* Fix Span.root calculation	2016-02-05 20:12:09 +01:00
Matthew Honnibal	e5c447e237	* Questionable fix to problem in Span.root	2016-02-05 19:18:35 +01:00
Matthew Honnibal	1ef84a0557	* Merge master into rethinc2	2016-02-05 12:55:59 +01:00
Matthew Honnibal	6aa92b70f1	* Fix merge problem in span	2016-02-05 12:46:11 +01:00
Matthew Honnibal	419edfab50	* Use generic flags for the new attributes until they're added	2016-02-04 15:50:54 +01:00
Matthew Honnibal	11810be33e	* Add Python hooks for is_bracket/is_quote/is_left_punct/is_right_punct	2016-02-04 13:04:16 +01:00
Matthew Honnibal	4cbad510ff	* Fix calculation of head for spans with punctuation.	2016-02-03 02:32:21 +01:00
Matthew Honnibal	6bb007d16e	* Make set_parse nogil	2016-01-30 20:27:52 +01:00
Matthew Honnibal	87172a15c6	* Fix runtime error bug that arose from updated Span.root function.	2016-01-25 15:22:42 +01:00
Matthew Honnibal	334c4b2b57	* Disprefer punctuation and spaces as heads of spans	2016-01-18 18:14:09 +01:00
Matthew Honnibal	c107da9738	* Bug fix to _count_words_to_root	2016-01-18 16:59:38 +01:00
Matthew Honnibal	f24833d607	* Fix merge for coordinations	2016-01-18 16:03:19 +01:00
Matthew Honnibal	14534958a9	* Fix bug in Span.root	2016-01-18 15:40:28 +01:00
Matthew Honnibal	fc8f26584a	* Don't consider NPs connected to parse via conj relation as noun chunks. Change motivated by the nested noun chunks identified in Issue #203, but might be problematic. Also allow root NPs to be considered noun chunks.	2016-01-16 17:52:40 +01:00
Matthew Honnibal	995b2d18fd	* Route token.string via token.txt_with_ws, to deprecate token.string in future	2016-01-16 17:14:34 +01:00
Matthew Honnibal	54a98eaf19	* Fix typo text_wth_ws --> text_with_ws. Reroute .string attribute to text_with_ws, to deprecate .string in future	2016-01-16 17:13:50 +01:00
Matthew Honnibal	03e8a4293d	* Add loop guard to Token.lefts and Token.rights properties	2016-01-16 16:18:17 +01:00
Matthew Honnibal	304339985e	* Add a linear scan to Span.root method, to help with long sentences	2016-01-16 16:17:28 +01:00
Matthew Honnibal	8cbcc3a799	* Fix calculation of root token in Span. Now take root to be word with shortest tree path. Avoids parse trees ending up in inconsistent state, as had occurred in Issue #214.	2016-01-16 15:38:50 +01:00
Matthew Honnibal	42a9f29b40	* Add loop guard in Span.root, to raise errors if there is a cycle in the dependency parse, instead of entering an infinite loop. Re Issue #214	2016-01-16 11:53:37 +01:00
Matthew Honnibal	ab5aac5b2f	* Add .rank property to Token and Lexeme, for frequency rank	2015-11-08 16:18:25 +01:00
Matthew Honnibal	7663970d5f	* Removed unused i variable from Span, and set attributes to read-only	2015-11-07 17:06:15 +11:00
Matthew Honnibal	4b3c96d76d	* Fix zero-length spans	2015-11-07 17:05:16 +11:00
Matthew Honnibal	cc8febcbe1	* Fix Span comparison	2015-11-07 09:54:14 +11:00
Matthew Honnibal	a9b612abdf	* Rework the Span-merge patch, to avoid extending the interface of Doc, and avoid virtualizing the Span.start and Span.end indices, to keep Span usage efficient	2015-11-07 09:01:12 +11:00
Matthew Honnibal	56499d89ef	* Rework the Span-merge patch, to avoid extending the interface of Doc, and avoid virtualizing the Span.start and Span.end indices, to keep Span usage efficient	2015-11-07 08:55:34 +11:00
Andreas Grivas	4be7fda453	* span start, end -> properties. autoupdate after merge	2015-11-07 07:57:04 +11:00
Andreas Grivas	562db6d2d0	* merge add lex last - add index finder funcs	2015-11-07 07:57:04 +11:00
Matthew Honnibal	68f479e821	* Rename Doc.data to Doc.c	2015-11-04 00:15:14 +11:00
Matthew Honnibal	3ddea19b2b	* Rename spans.pyx to span.pyx	2015-11-04 00:14:40 +11:00
Matthew Honnibal	9482d616bc	* Rename spans.pyx to span.pyx	2015-11-03 23:51:05 +11:00
Matthew Honnibal	116da5990a	* Clean up setting of tag in doc.from_bytes	2015-11-03 23:48:57 +11:00
Matthew Honnibal	1e99fcd413	* Rename .repvec to .vector in C API	2015-11-03 23:47:59 +11:00
Matthew Honnibal	9e37437ba8	* Fix assign_tag in doc.merge	2015-11-03 19:07:02 +11:00
Matthew Honnibal	833eb35c57	* Fix tag assignment in doc.from_array	2015-11-03 18:45:54 +11:00
Matthew Honnibal	09664177d7	* Fix tag handling in doc.merge, and assign sent_start when setting heads.	2015-11-03 18:15:52 +11:00
Matthew Honnibal	604ceac4c6	* Fix morphological assignment in doc.merge()	2015-11-03 17:57:51 +11:00
Matthew Honnibal	5e040855a5	* Ensure morphological features and lemmas are loaded in from_array, re Issue #152	2015-11-03 17:56:50 +11:00
Matthew Honnibal	6161d2529a	Merge branch 'master' of ssh://github.com/honnibal/spaCy	2015-11-03 13:36:30 +11:00
Matthew Honnibal	f7dd377575	* Adjust conjuncts iterator in Token	2015-11-03 13:23:22 +11:00
Andreas Grivas	d418f00eb1	fixed error when printing unicode	2015-11-02 20:23:18 +02:00
Matthew Honnibal	52fc338001	* Set is_parsed and is_tagged attrs when loading annotations into Doc, re Issue #152	2015-10-28 10:43:22 +11:00
Andreas Grivas	93ada458e2	added __repr__ that prints text in ipython for doc, token, and span objects	2015-10-21 14:11:46 +03:00
Matthew Honnibal	135062d23c	* Fix error with merged text when merged region did not have trailing whitespace	2015-10-19 15:47:04 +11:00
Matthew Honnibal	9839cd2c0b	* Fix whitespace_ calculation in Token	2015-10-18 17:21:11 +11:00
Matthew Honnibal	a7e6c5ac8f	* Fix Issue #122: Incorrect calculation of children after Doc.merge()	2015-10-18 17:17:27 +11:00


126 Commits
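
Commits 8cbcc3a799 and 42a9f29b40 describe taking the root of a Span to be the word with the shortest tree path, with a loop guard that raises an error on a cyclic parse. The following is a minimal pure-Python sketch of that idea, not the code from the repository: the `heads` list (absolute head indices, with a root pointing to itself) and the function names `count_steps_to_root` and `span_root` are assumptions made for illustration.

```python
def count_steps_to_root(heads, i):
    """Count arcs from token i up to the sentence root.

    heads[i] is the index of token i's head; a root points to itself
    (an assumed convention for this sketch).
    """
    steps = 0
    while heads[i] != i:
        i = heads[i]
        steps += 1
        # Loop guard: a well-formed tree never needs more steps than tokens.
        if steps > len(heads):
            raise RuntimeError("cycle detected in dependency parse")
    return steps


def span_root(heads, start, end):
    """Return the index in [start, end) with the shortest path to the root."""
    if start >= end:
        raise ValueError("empty span")
    return min(range(start, end), key=lambda i: count_steps_to_root(heads, i))


# "The quick fox jumped": The -> fox, quick -> fox, fox -> jumped, jumped is root.
heads = [2, 2, 3, 3]
print(span_root(heads, 0, 3))  # prints 2 ("fox")
```

Ties are broken in favour of the leftmost token here, which is a choice of this sketch rather than something the commit messages specify.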