spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-12-11 12:14:30 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	e5c447e237	* Questionable fix to problem in Span.root	2016-02-05 19:18:35 +01:00
Matthew Honnibal	1ef84a0557	* Merge master into rethinc2	2016-02-05 12:55:59 +01:00
Matthew Honnibal	4cf34fc170	Merge branch 'rethinc2' of ssh://github.com/honnibal/spaCy into rethinc2	2016-02-05 12:48:28 +01:00
Matthew Honnibal	249dccbe95	* Fix Language.pipe	2016-02-05 12:47:57 +01:00
Matthew Honnibal	c0e63feccc	* xfail pickle tests	2016-02-05 12:46:58 +01:00
Matthew Honnibal	6aa92b70f1	* Fix merge problem in span	2016-02-05 12:46:11 +01:00
Matthew Honnibal	048dfe35aa	* cimport cython.parallel	2016-02-05 12:20:42 +01:00
Matthew Honnibal	af58f273b3	* Fix spacy.language.pipe	2016-02-05 12:20:29 +01:00
Matthew Honnibal	8a13cebdcc	* Update for modified thinc interface	2016-02-05 11:44:39 +01:00
Matthew Honnibal	48ce09687d	* Skip pickling the vocab in the tests	2016-02-04 15:51:19 +01:00
Matthew Honnibal	419edfab50	* Use generic flags for the new attributes until they're added	2016-02-04 15:50:54 +01:00
Matthew Honnibal	c4017a06d9	* Add placeholders for the new flags in attrs and symbols	2016-02-04 15:49:45 +01:00
Matthew Honnibal	e5c96c969f	* Wire up new attributes	2016-02-04 13:04:58 +01:00
Matthew Honnibal	9703ccc3de	* Remove unused import	2016-02-04 13:04:33 +01:00
Matthew Honnibal	11810be33e	* Add Python hooks for is_bracket/is_quote/is_left_punct/is_right_punct	2016-02-04 13:04:16 +01:00
Matthew Honnibal	fe611132f0	* Add stubs for is_bracket/is_quote/is_left_punct/is_right_punct functions	2016-02-04 13:03:04 +01:00
Matthew Honnibal	ee975d36d0	* Add stubs to test is_bracket/is_quote/is_left_punct/is_right_punct functions	2016-02-04 13:02:25 +01:00
Matthew Honnibal	f9e765cae7	* Add pipe() method to tokenizer	2016-02-03 02:32:37 +01:00
Matthew Honnibal	4cbad510ff	* Fix calculation of head for spans with punctuation.	2016-02-03 02:32:21 +01:00
Matthew Honnibal	84b247ef83	* Add a .pipe method, that takes a stream of input, operates on it, and streams the output. Internally, the stream may be buffered, to allow multi-threading.	2016-02-03 02:10:58 +01:00
Matthew Honnibal	fcfc17a164	Merge branch 'master' into rethinc2	2016-02-02 23:05:34 +01:00
Matthew Honnibal	f204daf27b	* Add error warning that a gold tag is unrecognised	2016-02-02 22:59:59 +01:00
Matthew Honnibal	99b8906100	* Accept punct_labels as an argument to the scorer	2016-02-02 22:59:06 +01:00
Matthew Honnibal	59123443e2	* Check for presence/absence of the different models in Language.end_training	2016-02-02 22:49:55 +01:00
Matthew Honnibal	9e9d4c8706	* Fix stupid error in Language.batch	2016-02-01 09:49:32 +01:00
Matthew Honnibal	e3db39dd21	* Fix compiler warning about signed/unsigned comparison	2016-02-01 09:08:07 +01:00
Matthew Honnibal	98fbdf2856	* Add Language.batch() method, to support multi-threaded jobs	2016-02-01 09:01:13 +01:00
Matthew Honnibal	b3802562d6	Merge branch 'rethinc2' of https://github.com/honnibal/spaCy into rethinc2	2016-02-01 08:59:24 +01:00
Matthew Honnibal	4b08a3fafd	* Fix merge conflict	2016-02-01 08:58:18 +01:00
Matthew Honnibal	5188f6d9d8	* Fix parseC function	2016-02-01 08:48:48 +01:00
Matthew Honnibal	bcf8f7ba40	* Add a parse_batch method to Parser, that releases the GIL around a batch of documents.	2016-02-01 08:34:55 +01:00
Matthew Honnibal	d5579cd0d8	Merge branch 'rethinc2' of https://github.com/honnibal/spaCy into rethinc2	2016-02-01 03:08:49 +01:00
Matthew Honnibal	490ba65398	* Use openmp in parser	2016-02-01 03:08:42 +01:00
Matthew Honnibal	cb78d91ec5	* Fix ArcEager.set_valid	2016-02-01 03:07:37 +01:00
Matthew Honnibal	28e5ad62bc	* Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents	2016-02-01 03:00:15 +01:00
Matthew Honnibal	a47f00901b	* Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents	2016-02-01 02:58:14 +01:00
Matthew Honnibal	daaad66448	* Now fully proxied	2016-02-01 02:37:08 +01:00
Matthew Honnibal	7a0e3bb9c1	* Continue proxying. Some problem currently	2016-02-01 02:22:21 +01:00
Matthew Honnibal	2169bbb7ea	* Shadow StateClass with StateC, to start proxying	2016-02-01 01:16:14 +01:00
Matthew Honnibal	2fa228458e	* Add _state file, which StateClass will proxy to	2016-02-01 01:09:21 +01:00
Matthew Honnibal	6bb007d16e	* Make set_parse nogil	2016-01-30 20:27:52 +01:00
Matthew Honnibal	9410e74c92	* Switch parser to use nogil functions	2016-01-30 20:27:07 +01:00
Matthew Honnibal	10877a7791	* Update for thinc 5.0, including changing cost from int to weight_t, and updating the tagger and parser	2016-01-30 14:31:36 +01:00
Matthew Honnibal	ea4ff94cde	* Whitespace	2016-01-29 03:59:22 +01:00
Matthew Honnibal	b0718b6ee1	* Move to thinc 5.0	2016-01-29 03:58:55 +01:00
Matthew Honnibal	9721502c81	* Update version	2016-01-25 15:52:59 +01:00
Matthew Honnibal	907e8cf07d	* Add u prefix to string in web example	2016-01-25 15:51:38 +01:00
Matthew Honnibal	eba03695ef	* Comment out pickle tests	2016-01-25 15:51:13 +01:00
Matthew Honnibal	de94e6c525	* Mark pickle tests as xfail, due to temp files problem	2016-01-25 15:24:17 +01:00
Matthew Honnibal	87172a15c6	* Fix runtime error bug that arose from updated Span.root function.	2016-01-25 15:22:42 +01:00
Matthew Honnibal	2c8dd91785	* Fix first code example on the website	2016-01-23 18:09:19 +01:00
Matthew Honnibal	3af84cfd6e	* Increment version	2016-01-21 17:49:27 +01:00
Henning Peters	65aeac24cb	remove package version constraint	2016-01-21 17:40:51 +01:00
Matthew Honnibal	792c98a438	* Increment version for OSX-fixed release of v0.100	2016-01-21 00:23:04 +01:00
Matthew Honnibal	82d011ac43	* Fix test for whitespace	2016-01-19 20:38:26 +01:00
Matthew Honnibal	e89069dcae	* Fix matcher test	2016-01-19 20:24:01 +01:00
Matthew Honnibal	63e3d4e27f	* Add comment on Vocab.__reduce__	2016-01-19 20:11:25 +01:00
Matthew Honnibal	e1282b7f2f	* Require user-custom NER classes to work without adding the label.	2016-01-19 20:11:03 +01:00
Matthew Honnibal	84c5dfbfc3	* Clean up debugging python list	2016-01-19 20:10:32 +01:00
Matthew Honnibal	04d0686b26	* Make TransitionSystem.add_action idempotent, i.e. ignore duplicate added actions.	2016-01-19 20:10:04 +01:00
Matthew Honnibal	c4a89d56bd	* Automatically register any entity types pre-set on the tokens, so that the NER works with user-given entity types.	2016-01-19 20:09:26 +01:00
Matthew Honnibal	f0f92793f6	* Add test for user NER classes in matcher blocking the NER model. Re Issue #178 and Issue #217	2016-01-19 19:23:16 +01:00
Matthew Honnibal	65c5bc4988	* Add add_label method, to allow users to register new entity types and dependency labels.	2016-01-19 19:11:02 +01:00
Matthew Honnibal	151aa0b0e2	* Allow users to add_label, in order to extend the entity recogniser to new classes. Does not by itself add a class to the model	2016-01-19 19:09:33 +01:00
Matthew Honnibal	c8e0011ebc	* Add iterators to the NER and parser transition systems, to get the action types	2016-01-19 19:07:43 +01:00
Matthew Honnibal	515493c675	* Add xfail test for Issue #225 : tokenization with non-whitespace delimiters	2016-01-19 13:20:14 +01:00
Matthew Honnibal	7abe653223	* Fix imports	2016-01-19 03:36:51 +01:00
Matthew Honnibal	590f38bdb2	* Add hacky solution to Issue #220 . Currently specials.json only supports literal patterns, which doesn't allow us to pre-tag whitespace with the correct token, SP, as a rule. The data-driven approach should be easy but for some reason fails here. Adding a hard code in Morphology isn't a good solution, but we do want to fix the behaviour right away, and don't want to wait for an architecturally better solution.	2016-01-19 03:35:20 +01:00
Matthew Honnibal	445164d5b4	* Restore the LOCAL_DATA_DIR global in spacy/en/__init__.py, although this is now deprecated	2016-01-19 02:54:56 +01:00
Matthew Honnibal	04177debd0	* Unwind limit to sentence boundary detection that prevents it from inserting boundaries on whitespace. Replace it with a check for whitespace in StateClass.fast_forward, so that whitespace is LeftArced when it's on the stack. This should prevent the previous problem of whitespace-only sentences. Should fix Issue #184 , but may cause further problems. Needs testing.	2016-01-19 02:54:15 +01:00
Matthew Honnibal	7893de3203	* Add test for Issue #184 : Whitespace at sentence boundary causes sentence boundary error.	2016-01-18 23:04:38 +01:00
Matthew Honnibal	bba0a5e078	* Handle string paths in default_vocab, default_parser, default_entity in Language class	2016-01-18 22:37:24 +01:00
Matthew Honnibal	e825fd9554	* Make some of the website tests work without models	2016-01-18 18:14:44 +01:00
Matthew Honnibal	334c4b2b57	* Disprefer punctuation and spaces as heads of spans	2016-01-18 18:14:09 +01:00
Matthew Honnibal	bed36ab0ff	* Fix import of HEAD attribute	2016-01-18 17:34:43 +01:00
Matthew Honnibal	28c659c1fe	* Fix import for numpy	2016-01-18 17:25:04 +01:00
Matthew Honnibal	fc36bcf458	* Fix import for English	2016-01-18 17:14:40 +01:00
Matthew Honnibal	cc4c335e14	* Set heads for test_merge_tokens, to make the test run without models	2016-01-18 17:00:11 +01:00
Matthew Honnibal	c107da9738	* Bug fix to _count_words_to_root	2016-01-18 16:59:38 +01:00
Matthew Honnibal	f24833d607	* Fix merge for coordinations	2016-01-18 16:03:19 +01:00
Matthew Honnibal	14534958a9	* Fix bug in Span.root	2016-01-18 15:40:28 +01:00
Matthew Honnibal	714cbc03d5	* Add test for Issue #203 : nested noun chunks.	2016-01-16 18:02:30 +01:00
Matthew Honnibal	4e2253170c	* Move test for doc.merge to tokens_api file, to avoid name conflicts which upset pytest	2016-01-16 18:01:36 +01:00
Matthew Honnibal	34a157511f	* Move test_merge_hang to test_tokens_api	2016-01-16 18:00:26 +01:00
Matthew Honnibal	fc8f26584a	* Don't consider NPs connected to parse via conj relation as noun chunks. Change motivated by the nested noun chunks identified in Issue #203 , but might be problematic. Also allow root NPs to be considered noun chunks.	2016-01-16 17:52:40 +01:00
Matthew Honnibal	4a16dbfeca	* Add test for Issue #203 : noun chunks should be flat, but sometimes are nested	2016-01-16 17:41:25 +01:00
Matthew Honnibal	995b2d18fd	* Route token.string via token.txt_with_ws, to deprecate token.string in future	2016-01-16 17:14:34 +01:00
Matthew Honnibal	54a98eaf19	* Fix typo text_wth_ws --> text_with_ws. Reroute .string attribute to text_with_ws, to deprecate .string in future	2016-01-16 17:13:50 +01:00
Matthew Honnibal	3e9961d2c4	* If final token is whitespace, don't mark it as owning a trailing space. Fixes Issue #154	2016-01-16 17:08:59 +01:00
Matthew Honnibal	223d2b3484	* Add test for Issue #154 : Additional whitespace introduced when string ends with a whitespace token.	2016-01-16 17:08:07 +01:00
Matthew Honnibal	3dc398b727	* Fix merge conflict in requirements.txt	2016-01-16 16:20:49 +01:00
Matthew Honnibal	fc5962a77d	* Improve test for root token in Span	2016-01-16 16:19:09 +01:00
Matthew Honnibal	c025a0c64b	* Check for KeyboardInerrupt in parser.__call__	2016-01-16 16:18:44 +01:00
Matthew Honnibal	03e8a4293d	* Add loop guard to Token.lefts and Token.rights properties	2016-01-16 16:18:17 +01:00
Matthew Honnibal	304339985e	* Add a linear scan to Span.root method, to help with long sentences	2016-01-16 16:17:28 +01:00
Matthew Honnibal	aa0dd79f52	* Delete test_token_references, which checked a flakey strategy for preventing orphan tokens from a while ago. Now orphan tokens simply hold a reference to Pool, preventing the memory from being freed underneath them. This means that we don't need to run this slow test.	2016-01-16 16:03:35 +01:00
Matthew Honnibal	8cbcc3a799	* Fix calculation of root token in Span. Now take root to be word with shortest tree path. Avoids parse trees ending up in inconsistent state, as had occurred in Issue #214 .	2016-01-16 15:38:50 +01:00
Matthew Honnibal	c1039fa4b4	* Add test for Issue #214 . Resolved in change to Span.root	2016-01-16 15:37:47 +01:00
Henning Peters	41ea14a56f	fix pickling	2016-01-16 13:23:11 +01:00
Henning Peters	5551052840	fix py2/3 issue	2016-01-16 12:44:53 +01:00

1480 Commits