spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-14 05:37:03 +03:00

Author	SHA1	Message	Date
Stefan Behnel	f18805ee1c	make StringStore.__contains__() return True for the empty string (which is also contained in iteration)	2016-03-24 15:42:12 +01:00
Stefan Behnel	f2cfbfc412	remove internal redundancy and overhead from StringStore	2016-03-24 15:25:27 +01:00
Wolfgang Seeker	d65ef41d08	make error messages language independent	2016-03-24 11:47:09 +01:00
Henning Peters	963570aa49	Merge branch 'master' of github.com:spacy-io/spaCy	2016-03-24 11:19:47 +01:00
Henning Peters	a7d7ea3afa	first idea for supporting multiple langs in download script	2016-03-24 11:19:43 +01:00
Wolfgang Seeker	5080077097	revert init_model.py back to pre-german state (because it makes more sense); simplify token.n_rights and token.n_lefts	2016-03-21 16:10:25 +01:00
Wolfgang Seeker	5e2e8e951a	add baseclass DocIterator for iterators over documents; add classes for English and German noun chunks; the respective iterators are set for the document when created by the parser as they depend on the annotation scheme of the parsing model	2016-03-16 15:53:35 +01:00
Matthew Honnibal	80134eb12d	Merge branch 'master' of https://github.com/spacy-io/spaCy	2016-03-15 19:14:50 +00:00
Wolfgang Seeker	2ae253ef5b	changed head.__set__ to make it simpler	2016-03-14 13:43:48 +01:00
Henning Peters	c12d3dd200	add __init__.py to empty package dirs	2016-03-14 11:28:03 +01:00
Henning Peters	54f3447b5f	cleanup	2016-03-14 01:46:33 +01:00
Wolfgang Seeker	46e3f979f1	add function for setting head and label to token; change PseudoProjectivity.deprojectivize to use these functions	2016-03-11 17:31:06 +01:00
Wolfgang Seeker	03fb498dbe	introduce lang field for LexemeC to hold language id; put noun_chunk logic into iterators.py for each language separately	2016-03-10 13:01:34 +01:00
Wolfgang Seeker	bc9c62e279	replace Language functions with corresponding orth functions; implement punctuation functions in orth	2016-03-09 18:07:37 +01:00
Wolfgang Seeker	d9312bc9ea	add new files npchunks.{pyx,pxd} to hold noun phrase chunk generators	2016-03-09 16:18:48 +01:00
Matthew Honnibal	1508528c8c	* Increment version	2016-03-08 15:58:45 +00:00
Matthew Honnibal	963fe5258e	* Add missing __contains__ method to vocab	2016-03-08 15:49:10 +00:00
Matthew Honnibal	478aa21cb0	* Remove broken __reduce__ method on vocab	2016-03-08 15:48:21 +00:00
Matthew Honnibal	20235bde00	Merge pull request #282 from henningpeters/switch_vectors (initial proposal for ability to switch vectors)	2016-03-09 01:39:41 +11:00
Henning Peters	eb7ae61b1c	cleanup api	2016-03-08 12:59:18 +01:00
Henning Peters	b740f20191	hash_string() should not depend on python's internal unicode representation, also fixes https://github.com/spacy-io/sense2vec/issues/5 for py2	2016-03-06 09:19:27 +01:00
Henning Peters	aa4d964c14	cleanup api	2016-03-05 17:51:32 +01:00
Henning Peters	931c07a609	initial proposal for separate vector package	2016-03-04 11:09:06 +01:00
Wolfgang Seeker	7adbd7a785	replace Counter with normal dict	2016-03-03 21:36:27 +01:00
Wolfgang Seeker	1ae487a4f6	add backwards compatibility with python 2.6	2016-03-03 21:18:12 +01:00
Wolfgang Seeker	9d1e6de4a0	make a proper list from zip iterator	2016-03-03 19:51:01 +01:00
Wolfgang Seeker	49f9d1c085	change test_nonproj.py to not use zip inside numpy.asarray	2016-03-03 19:42:09 +01:00
Wolfgang Seeker	72b8df0684	turned PseudoProjectivity into a normal python class	2016-03-03 19:05:08 +01:00
Matthew Honnibal	fcaa0ad7ce	Merge pull request #280 from wbwseeker/german_parser (German parser)	2016-03-04 03:27:42 +11:00
Wolfgang Seeker	690c5acabf	adjust train.py to train both english and german models	2016-03-03 15:21:00 +01:00
Wolfgang Seeker	3448cb40a4	integrated pseudo-projective parsing into parser - nonproj.pyx holds a class PseudoProjectivity which currently holds all functionality to implement Nivre & Nilsson 2005's pseudo-projective parsing using the HEAD decoration scheme - changed lefts/rights in Token to account for possible non-projective structures	2016-03-01 10:09:08 +01:00
Wolfgang Seeker	56b7210e82	moved nonproj.py to syntax/nonproj.pyx	2016-02-25 15:08:49 +01:00
Henning Peters	f3df736e0a	remove unidecode-related test	2016-02-24 18:22:22 +01:00
Wolfgang Seeker	4b2297d5d4	add class PseudoProjective for pseudo-projective parsing; PseudoProjective() implements the algorithm from Nivre & Nilsson 2005 using their HEAD decoration scheme.	2016-02-24 11:26:25 +01:00
Henning Peters	12d58a7099	remove text-unidecode dependency	2016-02-24 08:01:59 +01:00
Wolfgang Seeker	8d531c958b	replace tests for non-projectivity - add functions to find non-projective edges - add test file for non-projectivity functions	2016-02-22 14:40:40 +01:00
Matthew Honnibal	141639ea3a	* Fix bug in tokenizer that caused new tokens to be added for affixes	2016-02-21 23:17:47 +00:00
Wolfgang Seeker	eae35e9b27	add tokenizer files for German, add/change code to train German pos tagger - add files to specify rules for German tokenization - change generate_specials.py to generate from an external file (abbrev.de.tab) - copy gazetteer.json from lang_data/en/ - init_model.py - change doc freq threshold to 0 - add train_german_tagger.py - expects conll09-formatted input	2016-02-18 13:24:20 +01:00
Henning Peters	9cc4f8d5b3	avoid shadowing __name__	2016-02-15 01:33:39 +01:00
Henning Peters	4c9e3c7911	upgrade sputnik, enforce data api via model version constraints	2016-02-14 16:03:17 +01:00
Henning Peters	9d8966a2c0	Update test_tokenizer.py	2016-02-10 19:24:37 +01:00
Henning Peters	3b5f1e753b	py26 compatibility	2016-02-10 14:32:54 +01:00
Henning Peters	ee1f1ac300	mark test_sentence_space() as model test	2016-02-10 07:49:11 +01:00
Matthew Honnibal	5d96b3ef4f	* Increment version	2016-02-07 13:48:58 +01:00
Matthew Honnibal	1b83cb9dfa	* Fix Issue #251 : Incorrect right edge calculation on left-clobber low in the tree	2016-02-07 00:00:42 +01:00
Matthew Honnibal	c6623889c1	* Add test for Issue #251 : Incorrect right edges, caused by bad update to r_edge in del_arc, triggered from non-monotonic left-arc	2016-02-06 23:47:51 +01:00
Matthew Honnibal	a95974ad3f	* Fix oov probability	2016-02-06 15:13:55 +01:00
Matthew Honnibal	af8514cb0c	* Refine the way the is_parsed attribute is set by from_array	2016-02-06 14:44:35 +01:00
Matthew Honnibal	161b01d4c0	* Tweak usage example for multi-processing	2016-02-06 14:44:11 +01:00
Matthew Honnibal	7f24229f10	* Don't try to pickle the tokenizer	2016-02-06 14:09:05 +01:00
Matthew Honnibal	dcb401f3e1	* Remove broken Vocab pickling	2016-02-06 14:08:47 +01:00
Matthew Honnibal	e66d45bf66	* Restore previous patch to Span.root, as it seems it wasn't the cause of the problem.	2016-02-06 13:37:41 +01:00
Matthew Honnibal	4412a70dc5	* Initialize StateC._empty_token to 0, to avoid undefined behaviour.	2016-02-06 13:34:38 +01:00
Matthew Honnibal	1b41f868d2	* Check for errors in parser, and parallelise the left-over batch	2016-02-06 10:06:30 +01:00
Matthew Honnibal	031b00cb91	* Fix Span.root calculation	2016-02-05 20:12:09 +01:00
Matthew Honnibal	165ca28b80	* Set is_parsed flag in Parser.pipe	2016-02-05 19:51:44 +01:00
Matthew Honnibal	bdd579db0a	* Set is_parsed flag in Parser.pipe	2016-02-05 19:50:11 +01:00
Matthew Honnibal	7119e77fb6	* Fix Matcher.pipe	2016-02-05 19:46:02 +01:00
Matthew Honnibal	1cf0100bf6	* Add test for multithreading	2016-02-05 19:38:22 +01:00
Matthew Honnibal	b04c9aad71	* Fix off-by-one in Parser.pipe	2016-02-05 19:37:50 +01:00
Matthew Honnibal	e5c447e237	* Questionable fix to problem in Span.root	2016-02-05 19:18:35 +01:00
Matthew Honnibal	1ef84a0557	* Merge master into rethinc2	2016-02-05 12:55:59 +01:00
Matthew Honnibal	4cf34fc170	Merge branch 'rethinc2' of ssh://github.com/honnibal/spaCy into rethinc2	2016-02-05 12:48:28 +01:00
Matthew Honnibal	249dccbe95	* Fix Language.pipe	2016-02-05 12:47:57 +01:00
Matthew Honnibal	c0e63feccc	* xfail pickle tests	2016-02-05 12:46:58 +01:00
Matthew Honnibal	6aa92b70f1	* Fix merge problem in span	2016-02-05 12:46:11 +01:00
Matthew Honnibal	048dfe35aa	* cimport cython.parallel	2016-02-05 12:20:42 +01:00
Matthew Honnibal	af58f273b3	* Fix spacy.language.pipe	2016-02-05 12:20:29 +01:00
Matthew Honnibal	8a13cebdcc	* Update for modified thinc interface	2016-02-05 11:44:39 +01:00
Matthew Honnibal	48ce09687d	* Skip pickling the vocab in the tests	2016-02-04 15:51:19 +01:00
Matthew Honnibal	419edfab50	* Use generic flags for the new attributes until they're added	2016-02-04 15:50:54 +01:00
Matthew Honnibal	c4017a06d9	* Add placeholders for the new flags in attrs and symbols	2016-02-04 15:49:45 +01:00
Matthew Honnibal	e5c96c969f	* Wire up new attributes	2016-02-04 13:04:58 +01:00
Matthew Honnibal	9703ccc3de	* Remove unused import	2016-02-04 13:04:33 +01:00
Matthew Honnibal	11810be33e	* Add Python hooks for is_bracket/is_quote/is_left_punct/is_right_punct	2016-02-04 13:04:16 +01:00
Matthew Honnibal	fe611132f0	* Add stubs for is_bracket/is_quote/is_left_punct/is_right_punct functions	2016-02-04 13:03:04 +01:00
Matthew Honnibal	ee975d36d0	* Add stubs to test is_bracket/is_quote/is_left_punct/is_right_punct functions	2016-02-04 13:02:25 +01:00
Matthew Honnibal	f9e765cae7	* Add pipe() method to tokenizer	2016-02-03 02:32:37 +01:00
Matthew Honnibal	4cbad510ff	* Fix calculation of head for spans with punctuation.	2016-02-03 02:32:21 +01:00
Matthew Honnibal	84b247ef83	* Add a .pipe method, that takes a stream of input, operates on it, and streams the output. Internally, the stream may be buffered, to allow multi-threading.	2016-02-03 02:10:58 +01:00
Matthew Honnibal	fcfc17a164	Merge branch 'master' into rethinc2	2016-02-02 23:05:34 +01:00
Matthew Honnibal	f204daf27b	* Add error warning that a gold tag is unrecognised	2016-02-02 22:59:59 +01:00
Matthew Honnibal	99b8906100	* Accept punct_labels as an argument to the scorer	2016-02-02 22:59:06 +01:00
Matthew Honnibal	59123443e2	* Check for presence/absence of the different models in Language.end_training	2016-02-02 22:49:55 +01:00
Matthew Honnibal	9e9d4c8706	* Fix stupid error in Language.batch	2016-02-01 09:49:32 +01:00
Matthew Honnibal	e3db39dd21	* Fix compiler warning about signed/unsigned comparison	2016-02-01 09:08:07 +01:00
Matthew Honnibal	98fbdf2856	* Add Language.batch() method, to support multi-threaded jobs	2016-02-01 09:01:13 +01:00
Matthew Honnibal	b3802562d6	Merge branch 'rethinc2' of https://github.com/honnibal/spaCy into rethinc2	2016-02-01 08:59:24 +01:00
Matthew Honnibal	4b08a3fafd	* Fix merge conflict	2016-02-01 08:58:18 +01:00
Matthew Honnibal	5188f6d9d8	* Fix parseC function	2016-02-01 08:48:48 +01:00
Matthew Honnibal	bcf8f7ba40	* Add a parse_batch method to Parser, that releases the GIL around a batch of documents.	2016-02-01 08:34:55 +01:00
Matthew Honnibal	d5579cd0d8	Merge branch 'rethinc2' of https://github.com/honnibal/spaCy into rethinc2	2016-02-01 03:08:49 +01:00
Matthew Honnibal	490ba65398	* Use openmp in parser	2016-02-01 03:08:42 +01:00
Matthew Honnibal	cb78d91ec5	* Fix ArcEager.set_valid	2016-02-01 03:07:37 +01:00
Matthew Honnibal	28e5ad62bc	* Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents	2016-02-01 03:00:15 +01:00
Matthew Honnibal	a47f00901b	* Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents	2016-02-01 02:58:14 +01:00
Matthew Honnibal	daaad66448	* Now fully proxied	2016-02-01 02:37:08 +01:00
Matthew Honnibal	7a0e3bb9c1	* Continue proxying. Some problem currently	2016-02-01 02:22:21 +01:00
Matthew Honnibal	2169bbb7ea	* Shadow StateClass with StateC, to start proxying	2016-02-01 01:16:14 +01:00
Matthew Honnibal	2fa228458e	* Add _state file, which StateClass will proxy to	2016-02-01 01:09:21 +01:00

1540 Commits