spaCy

Mirror of https://github.com/explosion/spaCy.git, synced 2025-04-15 14:42:00 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	8c883cef58	* Refactored transition system code now compiling. Still need to hook up label oracle, and test	2015-03-26 16:44:43 +01:00
Matthew Honnibal	f0159ab4b6	* Add file to hold GoldParse class	2015-03-26 16:44:42 +01:00
Matthew Honnibal	8eadb984cb	* Refactor arc_eager to use new TransitionSystem base class. Need to fix oracle	2015-03-26 16:44:42 +01:00
Matthew Honnibal	b063001596	* Add base TransitionSystem class. Still need to rethink how non-monotonic labelling will work for best_valid	2015-03-26 16:44:42 +01:00
Matthew Honnibal	01bc4d6815	* Add set_parse method, to assign parse to tokens in a less hacky way.	2015-03-26 16:44:42 +01:00
Matthew Honnibal	dc986dbc0b	* Work on refactored parser, where TransitionSystem can be easily subclassed	2015-03-26 16:44:42 +01:00
Matthew Honnibal	1cc6329b18	* Add base class to do transitions	2015-03-26 16:44:42 +01:00
Matthew Honnibal	135756ac3d	* Tmp commit of NER refactoring	2015-03-26 16:44:42 +01:00
Matthew Honnibal	23c1f6fc04	* Merge changes from stash	2015-03-26 16:44:41 +01:00
Matthew Honnibal	0ff078876a	* Commit some work on ner.yx done on the plane	2015-03-26 16:44:41 +01:00
Matthew Honnibal	d81b7be6a2	* Merge train.py	2015-03-26 16:44:41 +01:00
Matthew Honnibal	2e3dc3dfe2	* Merge changes in tokens.pyx	2015-03-26 16:44:41 +01:00
Matthew Honnibal	8cc3524dc9	* Ws	2015-03-26 16:44:41 +01:00
Matthew Honnibal	3d0570685c	* Add NER transition system	2015-03-26 16:44:41 +01:00
Matthew Honnibal	043b758cf4	* Resurrect old NER code. This version won't be the one that runs; we want to re-use the parser code. But for now this is a useful reference.	2015-03-26 16:44:41 +01:00
Matthew Honnibal	b139aa92ba	* Start setting out how NER will be implemented in the data model	2015-03-26 16:44:41 +01:00
Matthew Honnibal	0962ffc095	* Fix issue #37 : missing check_flag attribute from Token class	2015-03-26 15:06:26 +01:00
Matthew Honnibal	2e8d0e5d45	* Upd download script	2015-03-03 05:47:16 -05:00
Matthew Honnibal	dbe26f5793	* Add children and subtree methods to Token, which are generators to assist parse-tree navigation.	2015-03-03 04:18:41 -05:00
Matthew Honnibal	ea90d136e8	* Fix bug in labelled parsing, that caused an 8% drop in labelled accuracy.	2015-02-27 03:56:10 -05:00
Matthew Honnibal	caf046b220	* Hastily add method to apply tags from a list of strings, instead of predicting the tags.	2015-02-23 15:40:17 -05:00
Matthew Honnibal	cae077b583	* Work on fixing orphaned Token objects bug	2015-02-16 15:20:31 -05:00
Matthew Honnibal	7572e31f5e	* Pass ownership of C data to Token instances if Tokens object is being garbage-collected, but Token instances are staying alive.	2015-02-11 18:05:06 -05:00
Matthew Honnibal	64645a1c2f	* Improve docstring on English	2015-02-11 15:13:20 -05:00
Matthew Honnibal	594e50bd45	* Add option to download speech-parsing data set.	2015-02-11 14:20:29 -05:00
Matthew Honnibal	0b7e769211	* Add POS tags to support SWBD tag set	2015-02-11 14:08:28 -05:00
Matthew Honnibal	312b3a45f3	* Fix issue #19 : Allow parsing/pos tagging of empty strings	2015-02-10 10:15:58 -05:00
Matthew Honnibal	2a0615104b	* Upd download script	2015-02-09 10:22:59 -05:00
Matthew Honnibal	5c3513583d	* Clear buffered python tokens when modifying the Tokens object. Need to clean this up, and modify via a method on Tokens.	2015-02-09 03:57:10 -05:00
Matthew Honnibal	be5536d239	* Fix Issue #22 : PRP and PRP$ were mapped to NOUN. Should be PRON.	2015-02-08 18:36:18 -05:00
Matthew Honnibal	0492cee8b4	* Fix Issue #24 : Lemmas are empty when the L field is missing for special-cased tokens	2015-02-08 18:30:30 -05:00
Matthew Honnibal	d229fbd228	* Give better error on out-of-bounds array access	2015-02-07 12:59:12 -05:00
Matthew Honnibal	ab8bb047d0	* Fix negative index for __getitem__	2015-02-07 12:58:46 -05:00
Matthew Honnibal	44c7eafe44	* Fix download.py	2015-02-07 12:00:36 -05:00
Matthew Honnibal	6ca7f2eedc	* Upd download script	2015-02-07 11:32:33 -05:00
Matthew Honnibal	f0e0588833	* Fill L2 norm attribute on LexemeC struct	2015-02-07 08:44:42 -05:00
Matthew Honnibal	75f9b7d6bf	* Add L2 norm field to LexemeC struct	2015-02-07 08:43:17 -05:00
Matthew Honnibal	51b618d646	* Add a has_repvec property to Lexeme, and a check function to check flags	2015-02-07 08:42:44 -05:00
Matthew Honnibal	321b402739	* Store the l2 norm of the word's vector	2015-02-07 08:42:16 -05:00
Matthew Honnibal	c7d8644149	* Fix regression on 'prob' attr of Token.	2015-02-03 03:32:18 +11:00
Matthew Honnibal	c55a33d045	* Catch oracle errors	2015-02-02 23:02:04 +11:00
Matthew Honnibal	de772088e6	* Use parse tree for sbd in Tokens.sents	2015-02-02 12:17:32 +11:00
Matthew Honnibal	56c2ef2982	* Tweak POS features for web text	2015-02-02 11:59:36 +11:00
Matthew Honnibal	d68678a93e	* Add Exception class, OracleError	2015-02-02 11:57:32 +11:00
Matthew Honnibal	a20fdbd8ee	* Upd download script	2015-02-01 13:22:23 +11:00
Matthew Honnibal	76d9394cb4	* Fix vocab.pyx for Python3	2015-02-01 13:14:04 +11:00
Matthew Honnibal	63abdf154c	* Hastily hack download file	2015-01-31 22:48:32 +11:00
Matthew Honnibal	7de00c5a79	* Try not holding a reference to Pool, since that seems to confuse the GC	2015-01-31 22:10:22 +11:00
Matthew Honnibal	ce3ae8b5d9	* Fix platform-specific lexicon bug.	2015-01-31 16:38:58 +11:00
Matthew Honnibal	a1ed574b7b	* Fix default model path for English	2015-01-31 16:38:27 +11:00

472 Commits