spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-13 09:42:26 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	670959f40c	* Fix iteration order on Tokens.rights	2015-03-26 16:44:46 +01:00
Matthew Honnibal	231ce2dae5	* Assign ROOT label by default. May be papering over another bug.	2015-03-26 16:44:46 +01:00
Matthew Honnibal	9f4ad8fdfb	* Assign root words the ROOT label via the Break transition. Something is still wrong here...	2015-03-26 16:44:46 +01:00
Matthew Honnibal	f729164c01	* Fix bug in label assignment: ensure null-label transitions receive the label 0	2015-03-26 16:44:46 +01:00
Matthew Honnibal	7237c805c7	* Load tag for specials.json token	2015-03-26 16:44:46 +01:00
Matthew Honnibal	567388e38d	* Use values encoded by StringStore in POS tagging, rather than indices into a list of tags	2015-03-26 16:44:45 +01:00
Matthew Honnibal	3105c7f8ba	* Don't pass label_ids dict to Tokens, since we now use the StringStore to manage string-to-int mapping for labels	2015-03-26 16:44:45 +01:00
Matthew Honnibal	801bf14f4f	* Clean up handling of dep_strings and ent_strings, using StringStore to encode the label names.	2015-03-26 16:44:45 +01:00
Matthew Honnibal	31fad99518	* Use StringStore to encode label names, instead of label_ids	2015-03-26 16:44:45 +01:00
Matthew Honnibal	64db61bff1	* Add Span class to Python API	2015-03-26 16:44:45 +01:00
Matthew Honnibal	b9b695fb1b	* Remove debug word list	2015-03-26 16:44:45 +01:00
Matthew Honnibal	f21ab2d7fb	* Fix bug in ugly ent_strings hack on English class	2015-03-26 16:44:45 +01:00
Matthew Honnibal	1c843934be	* Fix oracle bug in NER. Now getting 77% F on ontonotes	2015-03-26 16:44:44 +01:00
Matthew Honnibal	903f196b3f	* Fix verbose printing for scorer	2015-03-26 16:44:44 +01:00
Matthew Honnibal	e181c051d5	* Improve features for NER	2015-03-26 16:44:44 +01:00
Matthew Honnibal	7ecb52c0ed	* Add scorer script	2015-03-26 16:44:44 +01:00
Matthew Honnibal	8057a95f20	* NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring.	2015-03-26 16:44:44 +01:00
Matthew Honnibal	ae235e07b9	* Refactoring working for parser, but now need to rig up features for NER, and then debug oracle etc.	2015-03-26 16:44:44 +01:00
Matthew Honnibal	b3eda03c9c	* Tmp	2015-03-26 16:44:44 +01:00
Matthew Honnibal	220ce8bfed	* Prepare English class for NER	2015-03-26 16:44:44 +01:00
Matthew Honnibal	f5830dc1c1	* Remove _transitions.pyx	2015-03-26 16:44:44 +01:00
Matthew Honnibal	6865c2fb4d	* Fix assignment of dep strings in tokens.pyx	2015-03-26 16:44:43 +01:00
Matthew Honnibal	6b6bce9e7a	* Fix label loading for transition system	2015-03-26 16:44:43 +01:00
Matthew Honnibal	5278c7504b	* Hacks to conll.pyx. Should clean these up.	2015-03-26 16:44:43 +01:00
Matthew Honnibal	f321b2b2eb	* Remove TODO comment	2015-03-26 16:44:43 +01:00
Matthew Honnibal	fdabd93bfb	* Ensure high loss for invalid moves, and fix label reading for arc-eager	2015-03-26 16:44:43 +01:00
Matthew Honnibal	10ed738df2	* Tmp commit	2015-03-26 16:44:43 +01:00
Matthew Honnibal	4f83c9b3d5	* Make costs label-sensitive	2015-03-26 16:44:43 +01:00
Matthew Honnibal	179b7eb0a7	* Specify parser transition system in language	2015-03-26 16:44:43 +01:00
Matthew Honnibal	8c883cef58	* Refactored transition system code now compiling. Still need to hook up label oracle, and test	2015-03-26 16:44:43 +01:00
Matthew Honnibal	f0159ab4b6	* Add file to hold GoldParse class	2015-03-26 16:44:42 +01:00
Matthew Honnibal	8eadb984cb	* Refactor arc_eager to use new TransitionSystem base class. Need to fix oracle	2015-03-26 16:44:42 +01:00
Matthew Honnibal	b063001596	* Add base TransitionSystem class. Still need to rethink how non-monotonic labelling will work for best_valid	2015-03-26 16:44:42 +01:00
Matthew Honnibal	01bc4d6815	* Add set_parse method, to assign parse to tokens in a less hacky way.	2015-03-26 16:44:42 +01:00
Matthew Honnibal	dc986dbc0b	* Work on refactored parser, where TransitionSystem can be easily subclassed	2015-03-26 16:44:42 +01:00
Matthew Honnibal	1cc6329b18	* Add base class to do transitions	2015-03-26 16:44:42 +01:00
Matthew Honnibal	135756ac3d	* Tmp commit of NER refactoring	2015-03-26 16:44:42 +01:00
Matthew Honnibal	23c1f6fc04	* Merge changes from stash	2015-03-26 16:44:41 +01:00
Matthew Honnibal	0ff078876a	* Commit some work on ner.yx done on the plane	2015-03-26 16:44:41 +01:00
Matthew Honnibal	d81b7be6a2	* Merge train.py	2015-03-26 16:44:41 +01:00
Matthew Honnibal	2e3dc3dfe2	* Merge changes in tokens.pyx	2015-03-26 16:44:41 +01:00
Matthew Honnibal	8cc3524dc9	* Ws	2015-03-26 16:44:41 +01:00
Matthew Honnibal	3d0570685c	* Add NER transition system	2015-03-26 16:44:41 +01:00
Matthew Honnibal	043b758cf4	* Resurrect old NER code. This version won't be the one that runs; we want to re-use the parser code. But for now this is a useful reference.	2015-03-26 16:44:41 +01:00
Matthew Honnibal	b139aa92ba	* Start setting out how NER will be implemented in the data model	2015-03-26 16:44:41 +01:00
Matthew Honnibal	0962ffc095	* Fix issue #37 : missing check_flag attribute from Token class	2015-03-26 15:06:26 +01:00
Matthew Honnibal	2e8d0e5d45	* Upd download script	2015-03-03 05:47:16 -05:00
Matthew Honnibal	dbe26f5793	* Add children and subtree methods to Token, which are generators to assist parse-tree navigation.	2015-03-03 04:18:41 -05:00
Matthew Honnibal	ea90d136e8	* Fix bug in labelled parsing, that caused an 8% drop in labelled accuracy.	2015-02-27 03:56:10 -05:00
Matthew Honnibal	caf046b220	* Hastily add method to apply tags from a list of strings, instead of predicting the tags.	2015-02-23 15:40:17 -05:00
Matthew Honnibal	cae077b583	* Work on fixing orphaned Token objects bug	2015-02-16 15:20:31 -05:00
Matthew Honnibal	7572e31f5e	* Pass ownership of C data to Token instances if Tokens object is being garbage-collected, but Token instances are staying alive.	2015-02-11 18:05:06 -05:00
Matthew Honnibal	64645a1c2f	* Improve docstring on English	2015-02-11 15:13:20 -05:00
Matthew Honnibal	594e50bd45	* Add option to download speech-parsing data set.	2015-02-11 14:20:29 -05:00
Matthew Honnibal	0b7e769211	* Add POS tags to support SWBD tag set	2015-02-11 14:08:28 -05:00
Matthew Honnibal	312b3a45f3	* Fix issue #19 : Allow parsing/pos tagging of empty strings	2015-02-10 10:15:58 -05:00
Matthew Honnibal	2a0615104b	* Upd download script	2015-02-09 10:22:59 -05:00
Matthew Honnibal	5c3513583d	* Clear buffered python tokens when modifying the Tokens object. Need to clean this up, and modify via a method on Tokens.	2015-02-09 03:57:10 -05:00
Matthew Honnibal	be5536d239	* Fix Issue #22 : PRP and PRP$ were mapped to NOUN. Should be PRON.	2015-02-08 18:36:18 -05:00
Matthew Honnibal	0492cee8b4	* Fix Issue #24 : Lemmas are empty when the L field is missing for special-cased tokens	2015-02-08 18:30:30 -05:00
Matthew Honnibal	d229fbd228	* Give better error on out-of-bounds array access	2015-02-07 12:59:12 -05:00
Matthew Honnibal	ab8bb047d0	* Fix negative index for __getitem__	2015-02-07 12:58:46 -05:00
Matthew Honnibal	44c7eafe44	* Fix download.py	2015-02-07 12:00:36 -05:00
Matthew Honnibal	6ca7f2eedc	* Upd download script	2015-02-07 11:32:33 -05:00
Matthew Honnibal	f0e0588833	* Fill L2 norm attribute on LexemeC struct	2015-02-07 08:44:42 -05:00
Matthew Honnibal	75f9b7d6bf	* Add L2 norm field to LexemeC struct	2015-02-07 08:43:17 -05:00
Matthew Honnibal	51b618d646	* Add a has_repvec property to Lexeme, and a check function to check flags	2015-02-07 08:42:44 -05:00
Matthew Honnibal	321b402739	* Store the l2 norm of the word's vector	2015-02-07 08:42:16 -05:00
Matthew Honnibal	c7d8644149	* Fix regression on 'prob' attr of Token.	2015-02-03 03:32:18 +11:00
Matthew Honnibal	c55a33d045	* Catch oracle errors	2015-02-02 23:02:04 +11:00
Matthew Honnibal	de772088e6	* Use parse tree for sbd in Tokens.sents	2015-02-02 12:17:32 +11:00
Matthew Honnibal	56c2ef2982	* Tweak POS features for web text	2015-02-02 11:59:36 +11:00
Matthew Honnibal	d68678a93e	* Add Exception class, OracleError	2015-02-02 11:57:32 +11:00
Matthew Honnibal	a20fdbd8ee	* Upd download script	2015-02-01 13:22:23 +11:00
Matthew Honnibal	76d9394cb4	* Fix vocab.pyx for Python3	2015-02-01 13:14:04 +11:00
Matthew Honnibal	63abdf154c	* Hastily hack download file	2015-01-31 22:48:32 +11:00
Matthew Honnibal	7de00c5a79	* Try not holding a reference to Pool, since that seems to confuse the GC	2015-01-31 22:10:22 +11:00
Matthew Honnibal	ce3ae8b5d9	* Fix platform-specific lexicon bug.	2015-01-31 16:38:58 +11:00
Matthew Honnibal	a1ed574b7b	* Fix default model path for English	2015-01-31 16:38:27 +11:00
Matthew Honnibal	018e0bfa24	* Bug fixes to parse navigation	2015-01-31 16:37:13 +11:00
Matthew Honnibal	e013555b25	* Add option to download script	2015-01-31 13:51:56 +11:00
Matthew Honnibal	08ca5c8970	* Add sent_end flag to TokenC struct	2015-01-31 13:44:16 +11:00
Matthew Honnibal	024cfd485c	* Pass tag_strings as a tuple, to support new Tokens API	2015-01-31 13:43:37 +11:00
Matthew Honnibal	77d62d0179	* Large refactor of Token objects, making them much thinner. This is to support fast parse-tree navigation.	2015-01-31 13:42:58 +11:00
Matthew Honnibal	88170e6295	* Supply dep_strings as a tuple, for the changed API on Tokens	2015-01-31 13:42:09 +11:00
Matthew Honnibal	0981d68022	* Set a sent_end flag during parsing, for later use	2015-01-31 13:41:46 +11:00
Matthew Honnibal	251dbf24d7	* Fix uninitialised variable error	2015-01-30 20:46:34 +11:00
Matthew Honnibal	83a4df5a1a	* Fix download script	2015-01-30 20:40:42 +11:00
Matthew Honnibal	6f9ebc2f34	* Fix download script	2015-01-30 20:33:19 +11:00
Matthew Honnibal	8b85d0bb8a	* Only download small data if no data dir exists	2015-01-30 20:27:14 +11:00
Matthew Honnibal	1a7a1c2771	* Fix Issue #16 : tokens recurse when printing	2015-01-30 19:47:50 +11:00
Matthew Honnibal	cb95ef6934	* Fix download script	2015-01-30 19:28:43 +11:00
Matthew Honnibal	e578bd37bd	* Fix download script	2015-01-30 18:59:31 +11:00
Matthew Honnibal	df52014d12	* Fix download script	2015-01-30 18:36:24 +11:00
Matthew Honnibal	0f95712189	* Improve accuracy reporting during training	2015-01-30 18:05:06 +11:00
Matthew Honnibal	b68f563c2f	* Fix Issue #14 : Improve parsing API	2015-01-30 18:04:41 +11:00
Matthew Honnibal	998b607f65	* Upd download script, having it download all data if there's no data/ directory, allowing easier compilation from source	2015-01-30 18:04:01 +11:00
Matthew Honnibal	67d6e53a69	* Ensure parser and tagger function correctly when training from missing values, indicated by -1	2015-01-30 14:08:56 +11:00
Matthew Honnibal	4ff180db74	* Fix off-by-one error in commit `0a7fceb`	2015-01-30 12:49:33 +11:00
Matthew Honnibal	0a7fcebdf7	* Fix Issue #12 : Incorrect token.idx calculations for some punctuation, in the presence of token cache	2015-01-30 12:33:38 +11:00

551 Commits