spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-12-28 02:46:35 +03:00

Author	SHA1	Message	Date
Wolfgang Seeker	03fb498dbe	introduce lang field for LexemeC to hold language id; put noun_chunk logic into iterators.py for each language separately	2016-03-10 13:01:34 +01:00
Wolfgang Seeker	d9312bc9ea	add new files npchunks.{pyx,pxd} to hold noun phrase chunk generators	2016-03-09 16:18:48 +01:00
Wolfgang Seeker	3448cb40a4	integrated pseudo-projective parsing into parser - nonproj.pyx holds a class PseudoProjectivity which currently holds all functionality to implement Nivre & Nilsson 2005's pseudo-projective parsing using the HEAD decoration scheme - changed lefts/rights in Token to account for possible non-projective structures	2016-03-01 10:09:08 +01:00
Matthew Honnibal	af8514cb0c	* Refine the way the is_parsed attribute is set by from_array	2016-02-06 14:44:35 +01:00
Matthew Honnibal	e66d45bf66	* Restore previous patch to Span.root, as it seems it wasn't the cause of the problem.	2016-02-06 13:37:41 +01:00
Matthew Honnibal	031b00cb91	* Fix Span.root calculation	2016-02-05 20:12:09 +01:00
Matthew Honnibal	e5c447e237	* Questionable fix to problem in Span.root	2016-02-05 19:18:35 +01:00
Matthew Honnibal	1ef84a0557	* Merge master into rethinc2	2016-02-05 12:55:59 +01:00
Matthew Honnibal	6aa92b70f1	* Fix merge problem in span	2016-02-05 12:46:11 +01:00
Matthew Honnibal	419edfab50	* Use generic flags for the new attributes until they're added	2016-02-04 15:50:54 +01:00
Matthew Honnibal	11810be33e	* Add Python hooks for is_bracket/is_quote/is_left_punct/is_right_punct	2016-02-04 13:04:16 +01:00
Matthew Honnibal	4cbad510ff	* Fix calculation of head for spans with punctuation.	2016-02-03 02:32:21 +01:00
Matthew Honnibal	6bb007d16e	* Make set_parse nogil	2016-01-30 20:27:52 +01:00
Matthew Honnibal	87172a15c6	* Fix runtime error bug that arose from updated Span.root function.	2016-01-25 15:22:42 +01:00
Matthew Honnibal	334c4b2b57	* Disprefer punctuation and spaces as heads of spans	2016-01-18 18:14:09 +01:00
Matthew Honnibal	c107da9738	* Bug fix to _count_words_to_root	2016-01-18 16:59:38 +01:00
Matthew Honnibal	f24833d607	* Fix merge for coordinations	2016-01-18 16:03:19 +01:00
Matthew Honnibal	14534958a9	* Fix bug in Span.root	2016-01-18 15:40:28 +01:00
Matthew Honnibal	fc8f26584a	* Don't consider NPs connected to parse via conj relation as noun chunks. Change motivated by the nested noun chunks identified in Issue #203, but might be problematic. Also allow root NPs to be considered noun chunks.	2016-01-16 17:52:40 +01:00
Matthew Honnibal	995b2d18fd	* Route token.string via token.txt_with_ws, to deprecate token.string in future	2016-01-16 17:14:34 +01:00
Matthew Honnibal	54a98eaf19	* Fix typo text_wth_ws --> text_with_ws. Reroute .string attribute to text_with_ws, to deprecate .string in future	2016-01-16 17:13:50 +01:00
Matthew Honnibal	03e8a4293d	* Add loop guard to Token.lefts and Token.rights properties	2016-01-16 16:18:17 +01:00
Matthew Honnibal	304339985e	* Add a linear scan to Span.root method, to help with long sentences	2016-01-16 16:17:28 +01:00
Matthew Honnibal	8cbcc3a799	* Fix calculation of root token in Span. Now take root to be word with shortest tree path. Avoids parse trees ending up in inconsistent state, as had occurred in Issue #214.	2016-01-16 15:38:50 +01:00
Matthew Honnibal	42a9f29b40	* Add loop guard in Span.root, to raise errors if there is a cycle in the dependency parse, instead of entering an infinite loop. Re Issue #214	2016-01-16 11:53:37 +01:00
Matthew Honnibal	ab5aac5b2f	* Add .rank property to Token and Lexeme, for frequency rank	2015-11-08 16:18:25 +01:00
Matthew Honnibal	7663970d5f	* Removed unused i variable from Span, and set attributes to read-only	2015-11-07 17:06:15 +11:00
Matthew Honnibal	4b3c96d76d	* Fix zero-length spans	2015-11-07 17:05:16 +11:00
Matthew Honnibal	cc8febcbe1	* Fix Span comparison	2015-11-07 09:54:14 +11:00
Matthew Honnibal	a9b612abdf	* Rework the Span-merge patch, to avoid extending the interface of Doc, and avoid virtualizing the Span.start and Span.end indices, to keep Span usage efficient	2015-11-07 09:01:12 +11:00
Matthew Honnibal	56499d89ef	* Rework the Span-merge patch, to avoid extending the interface of Doc, and avoid virtualizing the Span.start and Span.end indices, to keep Span usage efficient	2015-11-07 08:55:34 +11:00
Andreas Grivas	4be7fda453	* span start, end -> properties. autoupdate after merge	2015-11-07 07:57:04 +11:00
Andreas Grivas	562db6d2d0	* merge add lex last - add index finder funcs	2015-11-07 07:57:04 +11:00
Matthew Honnibal	68f479e821	* Rename Doc.data to Doc.c	2015-11-04 00:15:14 +11:00
Matthew Honnibal	3ddea19b2b	* Rename spans.pyx to span.pyx	2015-11-04 00:14:40 +11:00
Matthew Honnibal	9482d616bc	* Rename spans.pyx to span.pyx	2015-11-03 23:51:05 +11:00
Matthew Honnibal	116da5990a	* Clean up setting of tag in doc.from_bytes	2015-11-03 23:48:57 +11:00
Matthew Honnibal	1e99fcd413	* Rename .repvec to .vector in C API	2015-11-03 23:47:59 +11:00
Matthew Honnibal	9e37437ba8	* Fix assign_tag in doc.merge	2015-11-03 19:07:02 +11:00
Matthew Honnibal	833eb35c57	* Fix tag assignment in doc.from_array	2015-11-03 18:45:54 +11:00
Matthew Honnibal	09664177d7	* Fix tag handling in doc.merge, and assign sent_start when setting heads.	2015-11-03 18:15:52 +11:00
Matthew Honnibal	604ceac4c6	* Fix morphological assignment in doc.merge()	2015-11-03 17:57:51 +11:00
Matthew Honnibal	5e040855a5	* Ensure morphological features and lemmas are loaded in from_array, re Issue #152	2015-11-03 17:56:50 +11:00
Matthew Honnibal	6161d2529a	Merge branch 'master' of ssh://github.com/honnibal/spaCy	2015-11-03 13:36:30 +11:00
Matthew Honnibal	f7dd377575	* Adjust conjuncts iterator in Token	2015-11-03 13:23:22 +11:00
Andreas Grivas	d418f00eb1	fixed error when printing unicode	2015-11-02 20:23:18 +02:00
Matthew Honnibal	52fc338001	* Set is_parsed and is_tagged attrs when loading annotations into Doc, re Issue #152	2015-10-28 10:43:22 +11:00
Andreas Grivas	93ada458e2	added __repr__ that prints text in ipython for doc, token, and span objects	2015-10-21 14:11:46 +03:00
Matthew Honnibal	135062d23c	* Fix error with merged text when merged region did not have trailing whitespace	2015-10-19 15:47:04 +11:00
Matthew Honnibal	9839cd2c0b	* Fix whitespace_ calculation in Token	2015-10-18 17:21:11 +11:00
Matthew Honnibal	a7e6c5ac8f	* Fix Issue #122: Incorrect calculation of children after Doc.merge()	2015-10-18 17:17:27 +11:00
Matthew Honnibal	6e0f985afc	* Fix token.conjuncts	2015-10-15 03:49:45 +11:00
Matthew Honnibal	2e0104ac81	* Fix token.conjuncts	2015-10-15 03:47:45 +11:00
Matthew Honnibal	b8f3345a82	* Fix token.conjuncts method	2015-10-15 03:36:01 +11:00
Matthew Honnibal	23818f89b8	* Fix token.conjuncts method	2015-10-15 03:34:57 +11:00
Matthew Honnibal	94bafc1417	* Rename ATTR_IDS to attrs.IDS. Rename ATTR_NAMES to attrs.NAMES. Rename UNIV_POS_IDS to parts_of_speech.IDS	2015-10-10 17:57:29 +11:00
Yubing (Tom) Dong	0f601b8b75	Update docstring of Doc.__getitem__	2015-10-07 01:27:28 -07:00
Yubing (Tom) Dong	3fd3bc79aa	Refactor to remove duplicate slicing logic	2015-10-07 01:25:35 -07:00
Yubing (Tom) Dong	97685aecb7	Add slicing support to Span	2015-10-06 02:45:49 -07:00
Yubing (Tom) Dong	ef2af20cd3	Make Doc's slicing behavior conform to Python conventions	2015-10-06 02:41:28 -07:00
Yubing (Tom) Dong	2fc33e8024	Allow step=1 when slicing a Doc	2015-10-06 00:57:05 -07:00
Matthew Honnibal	87e6186828	* Rename _seq to doc attribute in Span	2015-09-29 23:03:55 +10:00
Matthew Honnibal	ab694b0364	* Fix open-bounded slice indices.	2015-09-29 23:03:09 +10:00
Matthew Honnibal	f7283a5067	* Fix vectors bugs for OOV words	2015-09-22 02:10:25 +02:00
Matthew Honnibal	44aecba701	* Fix Token.has_vector and Lexeme.has_vector	2015-09-22 01:43:16 +02:00
Matthew Honnibal	596fde8daa	* Add has_vector attribute to Token and Lexeme	2015-09-21 19:52:43 +10:00
Matthew Honnibal	f32927efbf	* Raise exceptions if attempt to access parse, but data is not installed. This partly but not fully addresses Issue #97. Still need exceptions on the various Token attributes that access the parse tree, e.g. token.head, token.lefts, token.rights, etc. Exceptions should be centralized, too.	2015-09-21 18:35:40 +10:00
Matthew Honnibal	388062ae01	* Fix repvec_length problem	2015-09-21 18:10:51 +10:00
Matthew Honnibal	d00fe2bbc6	* Don't allow Span objects to be written to, as it introduces subtle bugs because they're created afresh from Doc.sents, Doc.ents etc.	2015-09-21 17:59:39 +10:00
Matthew Honnibal	77856c4fcd	* Try giving Doc and Span objects vector and vector_norm attributes, and .similarity functions. Turns out to be bad idea.	2015-09-17 11:50:11 +10:00
Matthew Honnibal	60c26b2dfa	* Fix slicing when start or stop is None	2015-09-15 14:43:10 +10:00
Matthew Honnibal	193f127f81	* Fix ugly py_check_flag and py_set_flag functions in Lexeme	2015-09-15 13:06:18 +10:00
Matthew Honnibal	65dc0d1dfb	* Extend word vectors support, with .similarity() function, vector_norm property, and rename repvec to vector. Keep repvec name as well for now for backwards compatibility.	2015-09-14 17:49:58 +10:00
Matthew Honnibal	c08f10083c	* Add test and test_with_ws attributes.	2015-09-13 10:27:42 +10:00
Matthew Honnibal	9e7bfe8449	* Fix space at end of merged token	2015-09-10 14:45:17 +02:00
Matthew Honnibal	31ccf494e6	Merge branch 'develop' of https://github.com/honnibal/spaCy into develop	2015-09-09 14:33:38 +02:00
Matthew Honnibal	07686470a9	* Don't consider a coordinated NP a base chunk	2015-09-09 14:32:28 +02:00
Matthew Honnibal	0e24d099a1	* Fix L/R edge bug, by ensuring l_edge and r_edge are preset, and fixing the way the edge update in del_arc. Bugs keep arising here because the edges are absolute positions, where everything else is relative. I'm also not 100% convinced that del_arc is handled correctly. Do we need to update the parents?	2015-09-09 03:40:44 +02:00
Matthew Honnibal	86c888667f	* Merge in changes from de branch	2015-09-06 19:49:28 +02:00
Matthew Honnibal	d2fc104a26	* Begin merge of Gazetteer and DE branches	2015-09-06 19:45:15 +02:00
Matthew Honnibal	7e4fea67d3	* Fix bug in token subtree, introduced by duplication of L/R code in Stateclass. Need to consolidate the two methods.	2015-09-06 10:48:36 +02:00
Matthew Honnibal	fd1eeb3102	* Add POS attribute support in get_attr	2015-09-06 04:13:03 +02:00
Matthew Honnibal	c2307fa9ee	* More work on language-generic parsing	2015-08-28 02:02:33 +02:00
Matthew Honnibal	d30029979e	* Avoid import of morphology in spans	2015-08-26 19:20:46 +02:00
Matthew Honnibal	6f1743692a	* Work on language-independent refactoring	2015-08-23 20:49:18 +02:00
Matthew Honnibal	01be34d55a	* Whitespace	2015-08-08 23:37:44 +02:00
Matthew Honnibal	b0f5c39084	* Fix handling of exclusion entities	2015-08-06 17:28:43 +02:00
Matthew Honnibal	10d869d102	* Don't allow conjunction between NPs in base NP chunks	2015-08-06 16:31:53 +02:00
Matthew Honnibal	9c1724ecae	* Gazetteer stuff working, now need to wire up to API	2015-08-06 00:35:40 +02:00
Matthew Honnibal	eb7138c761	* Add attr relation in base NP detection	2015-08-01 00:34:40 +02:00
Matthew Honnibal	4988356cf0	* Fix dependency type bug from merged tokens	2015-08-01 00:33:24 +02:00
Matthew Honnibal	78a9068319	* Fix spacy attr on merged tokens	2015-07-30 04:25:58 +02:00
Matthew Honnibal	430e2edb96	* Fix noun_chunks issue	2015-07-30 03:51:50 +02:00
Matthew Honnibal	9590968fc1	* Fix negative indices in Span	2015-07-30 02:30:24 +02:00
Matthew Honnibal	74d8cb3980	* Add noun_chunks iterator, and fix left/right child setting in Doc.merge	2015-07-30 02:29:49 +02:00
Matthew Honnibal	d153f18969	* Fix negative indices on spans	2015-07-29 22:36:03 +02:00
Matthew Honnibal	b5132bed7d	* Set left and right children when loading parse from byte string	2015-07-28 21:03:18 +02:00
Matthew Honnibal	6609fcf4b2	* Make mem and vocab python-visible in Doc	2015-07-28 20:46:59 +02:00
Matthew Honnibal	aa7a964a4f	* Add a type declaration for doc.from_array	2015-07-27 22:57:22 +02:00
Matthew Honnibal	8e4c69ee8c	* Add is_oov property, and fix up handling of attributes	2015-07-27 01:50:06 +02:00

177 Commits