spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-10-04 02:46:40 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	76f1d871da	Merge branch 'master' of ssh://github.com/spacy-io/spaCy	2016-05-04 15:54:00 +02:00
Matthew Honnibal	b4bfc6ae55	* Add test for Issue #351 : Indices off when leading whitespace	2016-05-04 15:53:17 +02:00
Wolfgang Seeker	a06fca9fdf	German noun chunk iterator now doesn't return tokens more than once	2016-05-03 16:58:59 +02:00
Wolfgang Seeker	7825b75548	add tests for German noun chunker	2016-05-03 15:01:28 +02:00
Wolfgang Seeker	7b246c13cb	reformulate noun chunk tests for English	2016-05-03 14:24:35 +02:00
Wolfgang Seeker	1786331cd8	add model sanity test	2016-05-03 12:51:47 +02:00
Matthew Honnibal	308a28c26c	* Whitespace	2016-05-02 16:08:11 +02:00
Matthew Honnibal	c1c11a8ae0	* Fix formatting on serializer tests	2016-05-02 16:07:21 +02:00
Matthew Honnibal	902a389d85	* Fix merge conflict in test_parse	2016-05-02 15:28:07 +02:00
Matthew Honnibal	02c23cc1d0	* Fix sentence boundary test	2016-05-02 15:26:07 +02:00
Matthew Honnibal	d2f469b809	* Fix parsing tests, so that labels are added if they're missing, and so that the branching test values are correct	2016-05-02 15:25:27 +02:00
Wolfgang Seeker	b11cbb06c6	remove old tests for sentence boundary detection	2016-05-02 14:36:35 +02:00
Matthew Honnibal	508fd1f6dc	* Refactor noun chunk iterators, so that they're simple functions. Install the iterator when the Doc is created, but allow users to write to the noun_chunk_iterator attribute. The iterator functions accept an object and yield (int start, int end, int label) triples.	2016-05-02 14:25:10 +02:00
Wolfgang Seeker	fa961ea694	add tests for serialization bug	2016-05-02 11:01:56 +02:00
Wolfgang Seeker	1003e7ccec	remove debug output from tests	2016-04-25 12:12:40 +02:00
Wolfgang Seeker	f57f843e85	fix bug in updating tree structure when introducing additional roots	2016-04-25 12:01:19 +02:00
Wolfgang Seeker	b6477fc4f4	adjusted tests to Travis Setup	2016-04-21 17:15:10 +02:00
Wolfgang Seeker	736ffcb9a2	remove whitespace	2016-04-21 16:55:55 +02:00
Wolfgang Seeker	6c7301cc6d	the parser now introduces sentence boundaries properly when predicting dependents with root labels	2016-04-21 16:50:53 +02:00
Wolfgang Seeker	12024b0b0a	bugfix: introducing multiple roots now updates original head's properties adjust tests to rely less on statistical model	2016-04-20 16:42:41 +02:00
Matthew Honnibal	2add5206aa	* Fix description of matcher test	2016-04-17 15:40:21 +02:00
Matthew Honnibal	2b419d5b8c	* Update test for Issue #242	2016-04-17 15:34:23 +02:00
Matthew Honnibal	f12b043308	* Add test for Issue #242 : Overlapping matches not well recognised.	2016-04-17 15:19:17 +02:00
Matthew Honnibal	c0909afe22	Merge pull request #312 from wbwseeker/space_head_bug add restrictions to L-arc and R-arc to prevent space heads	2016-04-15 20:36:03 +10:00
Matthew Honnibal	6f82065761	* Fix infixed commas in tokenizer, re Issue #326 . Need to benchmark on empirical data, to make sure this doesn't break other cases.	2016-04-14 11:36:03 +02:00
Matthew Honnibal	0f957dd586	Merge branch 'master' of ssh://github.com/honnibal/spaCy	2016-04-14 10:37:56 +02:00
Wolfgang Seeker	d99a9cbce9	different handling of space tokens space tokens are now always attached to the previous non-space token there are two exceptions: leading space tokens are attached to the first following non-space token in input that consists exclusively of space tokens, the last space token is the head of all others.	2016-04-13 15:28:28 +02:00
Matthew Honnibal	04d0209be9	* Recognise multiple infixes in a token.	2016-04-13 18:38:26 +10:00
Henning Peters	a473d6e937	fix tests (use english model)	2016-04-12 16:41:57 +02:00
Matthew Honnibal	6df3858dbc	* Fix Issue #323 : Incorrect semantics of Token.__str__ built-in. Add flag to allow users to switch the old semantics back on, to ease transition.	2016-04-12 13:17:59 +10:00
Wolfgang Seeker	80bea62842	bugfix in unit test	2016-04-08 16:46:44 +02:00
Matthew Honnibal	26622f0ffc	Merge branch 'master' of ssh://github.com/honnibal/spaCy	2016-03-29 14:31:52 +11:00
Matthew Honnibal	b1fe41b45d	* Extend infix test, commenting on limitation of tokenizer w.r.t. infixes at the moment.	2016-03-29 14:31:05 +11:00
Matthew Honnibal	9c73983bdd	* Add test for hyphenation problem in Issue #302	2016-03-29 14:27:13 +11:00
Matthew Honnibal	4a37fdcee1	Merge pull request #287 from wbwseeker/deproj_sentbnd_bug add function to Token for setting head and dep (and dep_)	2016-03-25 09:47:45 +11:00
Henning Peters	c12d3dd200	add __init__.py to empty package dirs	2016-03-14 11:28:03 +01:00
Wolfgang Seeker	46e3f979f1	add function for setting head and label to token change PseudoProjectivity.deprojectivize to use these functions	2016-03-11 17:31:06 +01:00
Matthew Honnibal	963fe5258e	* Add missing __contains__ method to vocab	2016-03-08 15:49:10 +00:00
Wolfgang Seeker	9d1e6de4a0	make a proper list from zip iterator	2016-03-03 19:51:01 +01:00
Wolfgang Seeker	49f9d1c085	change test_nonproj.py to not use zip inside numpy.asarray	2016-03-03 19:42:09 +01:00
Matthew Honnibal	fcaa0ad7ce	Merge pull request #280 from wbwseeker/german_parser German parser	2016-03-04 03:27:42 +11:00
Wolfgang Seeker	690c5acabf	adjust train.py to train both english and german models	2016-03-03 15:21:00 +01:00
Wolfgang Seeker	3448cb40a4	integrated pseudo-projective parsing into parser - nonproj.pyx holds a class PseudoProjectivity which currently holds all functionality to implement Nivre & Nilsson 2005's pseudo-projective parsing using the HEAD decoration scheme - changed lefts/rights in Token to account for possible non-projective structures	2016-03-01 10:09:08 +01:00
Henning Peters	f3df736e0a	remove unidecode-related test	2016-02-24 18:22:22 +01:00
Wolfgang Seeker	4b2297d5d4	add class PseudoProjective for pseudo-projective parsing PseudoProjective() implements the algorithm from Nivre & Nilsson 2005 using their HEAD decoration scheme.	2016-02-24 11:26:25 +01:00
Wolfgang Seeker	8d531c958b	replace tests for non-projectivity - add functions to find non-projective edges - add test file for non-projectivity functions	2016-02-22 14:40:40 +01:00
Henning Peters	9d8966a2c0	Update test_tokenizer.py	2016-02-10 19:24:37 +01:00
Henning Peters	3b5f1e753b	py26 compatibility	2016-02-10 14:32:54 +01:00
Henning Peters	ee1f1ac300	mark test_sentence_space() as model test	2016-02-10 07:49:11 +01:00
Matthew Honnibal	c6623889c1	* Add test for Issue #251 : Incorrect right edges, caused by bad update to r_edge in del_arc, triggered from non-monotonic left-arc	2016-02-06 23:47:51 +01:00

1 2 3

133 Commits