spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-12-27 10:26:35 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	6f82065761	* Fix infixed commas in tokenizer, re Issue #326 . Need to benchmark on empirical data, to make sure this doesn't break other cases.	2016-04-14 11:36:03 +02:00
Matthew Honnibal	0f957dd586	Merge branch 'master' of ssh://github.com/honnibal/spaCy	2016-04-14 10:37:56 +02:00
Matthew Honnibal	108aca0e50	* Make Matcher use attrs from the attrs.pyx file, rather than having an incomplete function doing the mapping.	2016-04-14 10:37:39 +02:00
Matthew Honnibal	61d20de35d	* Fix language.py docstring	2016-04-14 10:36:57 +02:00
Wolfgang Seeker	d99a9cbce9	different handling of space tokens: space tokens are now always attached to the previous non-space token. There are two exceptions: leading space tokens are attached to the first following non-space token; in input that consists exclusively of space tokens, the last space token is the head of all others.	2016-04-13 15:28:28 +02:00
Matthew Honnibal	04d0209be9	* Recognise multiple infixes in a token.	2016-04-13 18:38:26 +10:00
Henning Peters	a473d6e937	fix tests (use english model)	2016-04-12 16:41:57 +02:00
Henning Peters	f2d011c034	avoid polluting spacy namespace with lang classes	2016-04-12 16:31:16 +02:00
Henning Peters	ff690f76ba	fix loading non-german models	2016-04-12 16:00:56 +02:00
Henning Peters	6215272786	remove ujson as default non-dev dependency (still works as fallback if installed), because ujson doesn't ship wheels	2016-04-12 11:28:07 +02:00
Matthew Honnibal	6df3858dbc	* Fix Issue #323 : Incorrect semantics of Token.__str__ built-in. Add flag to allow users to switch the old semantics back on, to ease transition.	2016-04-12 13:17:59 +10:00
Wolfgang Seeker	d328e0b4a8	Merge branch 'master' into space_head_bug	2016-04-11 12:11:01 +02:00
Wolfgang Seeker	80bea62842	bugfix in unit test	2016-04-08 16:46:44 +02:00
Wolfgang Seeker	be4903a1b2	update version numbers	2016-04-08 13:54:05 +02:00
Wolfgang Seeker	1fe911cdb0	bigfix	2016-04-07 18:19:51 +02:00
Matthew Honnibal	872695759d	Merge pull request #306 from wbwseeker/german_noun_chunks: add German noun chunk functionality	2016-04-08 00:54:24 +10:00
Henning Peters	470cdf5bf9	remove deprecated LOCAL_DATA_DIR	2016-04-05 11:25:54 +02:00
Matthew Honnibal	26622f0ffc	Merge branch 'master' of ssh://github.com/honnibal/spaCy	2016-03-29 14:31:52 +11:00
Matthew Honnibal	b1fe41b45d	* Extend infix test, commenting on limitation of tokenizer w.r.t. infixes at the moment.	2016-03-29 14:31:05 +11:00
Matthew Honnibal	9c73983bdd	* Add test for hyphenation problem in Issue #302	2016-03-29 14:27:13 +11:00
Matthew Honnibal	ad119c074f	* Fix incorrect whitespacing in Doc.text. This change is potentially breaking, to anyone who was relying on the previous incorrect semantics.	2016-03-29 13:02:42 +11:00
Matthew Honnibal	8c7a1908ee	Merge pull request #307 from scoder/faster_string_store: remove internal redundancy and overhead from StringStore	2016-03-29 12:59:52 +11:00
Wolfgang Seeker	7195b6742d	add restrictions to L-arc and R-arc to prevent space heads	2016-03-28 10:40:52 +02:00
Matthew Honnibal	8c77a994c6	Merge pull request #305 from henningpeters/master: multiple langs in download script	2016-03-26 21:54:59 +11:00
Henning Peters	c90d4a6f17	relative imports in __init__.py	2016-03-26 11:44:53 +01:00
Henning Peters	db095a162c	fix	2016-03-25 18:59:47 +01:00
Henning Peters	b8f63071eb	add lang registration facility	2016-03-25 18:54:45 +01:00
Matthew Honnibal	4a37fdcee1	Merge pull request #287 from wbwseeker/deproj_sentbnd_bug: add function to Token for setting head and dep (and dep_)	2016-03-25 09:47:45 +11:00
Stefan Behnel	f18805ee1c	make StringStore.__contains__() return True for the empty string (which is also contained in iteration)	2016-03-24 15:42:12 +01:00
Stefan Behnel	f2cfbfc412	remove internal redundancy and overhead from StringStore	2016-03-24 15:25:27 +01:00
Wolfgang Seeker	d65ef41d08	make error messages language independent	2016-03-24 11:47:09 +01:00
Henning Peters	963570aa49	Merge branch 'master' of github.com:spacy-io/spaCy	2016-03-24 11:19:47 +01:00
Henning Peters	a7d7ea3afa	first idea for supporting multiple langs in download script	2016-03-24 11:19:43 +01:00
Wolfgang Seeker	5080077097	revert init_model.py back to pre-german state (because it makes more sense); simplify token.n_rights and token.n_lefts	2016-03-21 16:10:25 +01:00
Wolfgang Seeker	5e2e8e951a	add baseclass DocIterator for iterators over documents; add classes for English and German noun chunks; the respective iterators are set for the document when created by the parser, as they depend on the annotation scheme of the parsing model	2016-03-16 15:53:35 +01:00
Matthew Honnibal	80134eb12d	Merge branch 'master' of https://github.com/spacy-io/spaCy	2016-03-15 19:14:50 +00:00
Wolfgang Seeker	2ae253ef5b	changed head.__set__ to make it simpler	2016-03-14 13:43:48 +01:00
Henning Peters	c12d3dd200	add __init__.py to empty package dirs	2016-03-14 11:28:03 +01:00
Henning Peters	54f3447b5f	cleanup	2016-03-14 01:46:33 +01:00
Wolfgang Seeker	46e3f979f1	add function for setting head and label to token; change PseudoProjectivity.deprojectivize to use these functions	2016-03-11 17:31:06 +01:00
Wolfgang Seeker	03fb498dbe	introduce lang field for LexemeC to hold language id; put noun_chunk logic into iterators.py for each language separately	2016-03-10 13:01:34 +01:00
Wolfgang Seeker	bc9c62e279	replace Language functions with corresponding orth functions; implement punctuation functions in orth	2016-03-09 18:07:37 +01:00
Wolfgang Seeker	d9312bc9ea	add new files npchunks.{pyx,pxd} to hold noun phrase chunk generators	2016-03-09 16:18:48 +01:00
Matthew Honnibal	1508528c8c	* Increment version	2016-03-08 15:58:45 +00:00
Matthew Honnibal	963fe5258e	* Add missing __contains__ method to vocab	2016-03-08 15:49:10 +00:00
Matthew Honnibal	478aa21cb0	* Remove broken __reduce__ method on vocab	2016-03-08 15:48:21 +00:00
Matthew Honnibal	20235bde00	Merge pull request #282 from henningpeters/switch_vectors: initial proposal for ability to switch vectors	2016-03-09 01:39:41 +11:00
Henning Peters	eb7ae61b1c	cleanup api	2016-03-08 12:59:18 +01:00
Henning Peters	b740f20191	hash_string() should not depend on python's internal unicode representation, also fixes https://github.com/spacy-io/sense2vec/issues/5 for py2	2016-03-06 09:19:27 +01:00
Henning Peters	aa4d964c14	cleanup api	2016-03-05 17:51:32 +01:00
Henning Peters	931c07a609	initial proposal for separate vector package	2016-03-04 11:09:06 +01:00
Wolfgang Seeker	7adbd7a785	replace Counter with normal dict	2016-03-03 21:36:27 +01:00
Wolfgang Seeker	1ae487a4f6	add backwards compatibility with python 2.6	2016-03-03 21:18:12 +01:00
Wolfgang Seeker	9d1e6de4a0	make a proper list from zip iterator	2016-03-03 19:51:01 +01:00
Wolfgang Seeker	49f9d1c085	change test_nonproj.py to not use zip inside numpy.asarray	2016-03-03 19:42:09 +01:00
Wolfgang Seeker	72b8df0684	turned PseudoProjectivity into a normal python class	2016-03-03 19:05:08 +01:00
Matthew Honnibal	fcaa0ad7ce	Merge pull request #280 from wbwseeker/german_parser: German parser	2016-03-04 03:27:42 +11:00
Wolfgang Seeker	690c5acabf	adjust train.py to train both english and german models	2016-03-03 15:21:00 +01:00
Wolfgang Seeker	3448cb40a4	integrated pseudo-projective parsing into parser - nonproj.pyx holds a class PseudoProjectivity which currently holds all functionality to implement Nivre & Nilsson 2005's pseudo-projective parsing using the HEAD decoration scheme - changed lefts/rights in Token to account for possible non-projective structures	2016-03-01 10:09:08 +01:00
Wolfgang Seeker	56b7210e82	moved nonproj.py to syntax/nonproj.pyx	2016-02-25 15:08:49 +01:00
Henning Peters	f3df736e0a	remove unidecode-related test	2016-02-24 18:22:22 +01:00
Wolfgang Seeker	4b2297d5d4	add class PseudoProjective for pseudo-projective parsing: PseudoProjective() implements the algorithm from Nivre & Nilsson 2005 using their HEAD decoration scheme.	2016-02-24 11:26:25 +01:00
Henning Peters	12d58a7099	remove text-unidecode dependency	2016-02-24 08:01:59 +01:00
Wolfgang Seeker	8d531c958b	replace tests for non-projectivity - add functions to find non-projective edges - add test file for non-projectivity functions	2016-02-22 14:40:40 +01:00
Matthew Honnibal	141639ea3a	* Fix bug in tokenizer that caused new tokens to be added for affixes	2016-02-21 23:17:47 +00:00
Wolfgang Seeker	eae35e9b27	add tokenizer files for German, add/change code to train German pos tagger - add files to specify rules for German tokenization - change generate_specials.py to generate from an external file (abbrev.de.tab) - copy gazetteer.json from lang_data/en/ - init_model.py: change doc freq threshold to 0 - add train_german_tagger.py: expects conll09-formatted input	2016-02-18 13:24:20 +01:00
Henning Peters	9cc4f8d5b3	avoid shadowing __name__	2016-02-15 01:33:39 +01:00
Henning Peters	4c9e3c7911	upgrade spuntik, enforce data api via model version constraints	2016-02-14 16:03:17 +01:00
Henning Peters	9d8966a2c0	Update test_tokenizer.py	2016-02-10 19:24:37 +01:00
Henning Peters	3b5f1e753b	py26 compatibility	2016-02-10 14:32:54 +01:00
Henning Peters	ee1f1ac300	mark test_sentence_space() as model test	2016-02-10 07:49:11 +01:00
Matthew Honnibal	5d96b3ef4f	* Increment version	2016-02-07 13:48:58 +01:00
Matthew Honnibal	1b83cb9dfa	* Fix Issue #251 : Incorrect right edge calculation on left-clobber low in the tree	2016-02-07 00:00:42 +01:00
Matthew Honnibal	c6623889c1	* Add test for Issue #251 : Incorrect right edges, caused by bad update to r_edge in del_arc, triggered from non-monotonic left-arc	2016-02-06 23:47:51 +01:00
Matthew Honnibal	a95974ad3f	* Fix oov probability	2016-02-06 15:13:55 +01:00
Matthew Honnibal	af8514cb0c	* Refine the way the is_parsed attribute is set by from_array	2016-02-06 14:44:35 +01:00
Matthew Honnibal	161b01d4c0	* Tweak usage example for multi-processing	2016-02-06 14:44:11 +01:00
Matthew Honnibal	7f24229f10	* Don't try to pickle the tokenizer	2016-02-06 14:09:05 +01:00
Matthew Honnibal	dcb401f3e1	* Remove broken Vocab pickling	2016-02-06 14:08:47 +01:00
Matthew Honnibal	e66d45bf66	* Restore previous patch to Span.root, as it seems it wasn't the cause of the problem.	2016-02-06 13:37:41 +01:00
Matthew Honnibal	4412a70dc5	* Initialize StateC._empty_token to 0, to avoid undefined behaviour.	2016-02-06 13:34:38 +01:00
Matthew Honnibal	1b41f868d2	* Check for errors in parser, and parallelise the left-over batch	2016-02-06 10:06:30 +01:00
Matthew Honnibal	031b00cb91	* Fix Span.root calculation	2016-02-05 20:12:09 +01:00
Matthew Honnibal	165ca28b80	* Set is_parsed flag in Parser.pipe	2016-02-05 19:51:44 +01:00
Matthew Honnibal	bdd579db0a	* Set is_parsed flag in Parser.pipe	2016-02-05 19:50:11 +01:00
Matthew Honnibal	7119e77fb6	* Fix Matcher.pipe	2016-02-05 19:46:02 +01:00
Matthew Honnibal	1cf0100bf6	* Add test for multithreading	2016-02-05 19:38:22 +01:00
Matthew Honnibal	b04c9aad71	* Fix off-by-one in Parser.pipe	2016-02-05 19:37:50 +01:00
Matthew Honnibal	e5c447e237	* Questionable fix to problem in Span.root	2016-02-05 19:18:35 +01:00
Matthew Honnibal	1ef84a0557	* Merge master into rethinc2	2016-02-05 12:55:59 +01:00
Matthew Honnibal	4cf34fc170	Merge branch 'rethinc2' of ssh://github.com/honnibal/spaCy into rethinc2	2016-02-05 12:48:28 +01:00
Matthew Honnibal	249dccbe95	* Fix Language.pipe	2016-02-05 12:47:57 +01:00
Matthew Honnibal	c0e63feccc	* xfail pickle tests	2016-02-05 12:46:58 +01:00
Matthew Honnibal	6aa92b70f1	* Fix merge problem in span	2016-02-05 12:46:11 +01:00
Matthew Honnibal	048dfe35aa	* cimport cython.parallel	2016-02-05 12:20:42 +01:00
Matthew Honnibal	af58f273b3	* Fix spacy.language.pipe	2016-02-05 12:20:29 +01:00
Matthew Honnibal	8a13cebdcc	* Update for modified thinc interface	2016-02-05 11:44:39 +01:00
Matthew Honnibal	48ce09687d	* Skip pickling the vocab in the tests	2016-02-04 15:51:19 +01:00
Matthew Honnibal	419edfab50	* Use generic flags for the new attributes until they're added	2016-02-04 15:50:54 +01:00
Matthew Honnibal	c4017a06d9	* Add placeholders for the new flags in attrs and symbols	2016-02-04 15:49:45 +01:00

1568 Commits