spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-12-28 10:56:31 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	6640386b25	* Fix Issue #43 : TAG attr not supported. Also add DEP attr, while I'm at it. Need better way of ensuring future changes don't break in similar way.	2015-04-07 06:00:57 +02:00
Matthew Honnibal	b64b2bd910	* Fix Issue #43 : TAG attr not supported. Also add DEP attr, while I'm at it. Need better way of ensuring future changes don't break in similar way.	2015-04-07 06:00:30 +02:00
Matthew Honnibal	f9e510a893	* Whitespace	2015-04-07 04:53:59 +02:00
Matthew Honnibal	66c7ccf6cc	* Fix Spans.orth_	2015-04-07 04:53:40 +02:00
Matthew Honnibal	b8d34531c4	* Add support for units to English.__init__, by loading and applying regular expressions	2015-04-07 04:02:32 +02:00
Matthew Honnibal	0ea5af88b6	* Add multi-word expression RegexMatcher	2015-04-07 03:45:40 +02:00
Matthew Honnibal	2fee67cfa3	* Add regular expressions for English multi-word expressions	2015-04-07 03:45:18 +02:00
Matthew Honnibal	5a075ea3fc	* Ensure NER moves are available for single-word tokens	2015-04-05 22:30:58 +02:00
Matthew Honnibal	a60a366b2c	* Support 'punct' dep label in conll.pyx	2015-04-05 22:30:19 +02:00
Matthew Honnibal	021c972137	* Print parse if verbose in scorer	2015-04-05 22:29:30 +02:00
Matthew Honnibal	fbf19049cf	* Add ent_type_ property	2015-03-31 02:01:29 +02:00
Matthew Honnibal	e70b87efeb	* Add merge() method to Tokens, with fairly brittle/hacky implementation, but quite easy to test. Passing minimal tests. Still need to fix left/right deps in C data	2015-03-30 01:37:41 +02:00
Matthew Honnibal	557856e84c	* Allow regular expressions to specify labels for merged spans	2015-03-27 17:40:52 +01:00
Matthew Honnibal	a3af6b7c3d	* Left-Arc from Root, to allow non-monotonic reduce to compete with left-arc when the stack is not empty.	2015-03-27 17:39:16 +01:00
Matthew Honnibal	db5a43318c	* Improve print_state debug printer	2015-03-27 17:29:58 +01:00
Matthew Honnibal	1705eccbbe	* Remove whitespace	2015-03-27 15:22:39 +01:00
Matthew Honnibal	3feb52374c	* Break apart a condition, for ease of debug printing	2015-03-27 15:21:38 +01:00
Matthew Honnibal	b32f581acb	* Fix bug in ArcEager.get_labels	2015-03-27 15:21:06 +01:00
Matthew Honnibal	5f2a4ff36d	* Fix spans.lemma_	2015-03-26 16:45:38 +01:00
Matthew Honnibal	f4cc222ec3	* Fix NER scoring	2015-03-26 16:45:38 +01:00
Matthew Honnibal	1320bd19db	* Move Span class to own file	2015-03-26 16:45:38 +01:00
Matthew Honnibal	6f47a667cf	* Move Span class to own file	2015-03-26 16:45:38 +01:00
Matthew Honnibal	f02c39dfaf	* Compare to is not None, for more robustness	2015-03-26 16:44:48 +01:00
Matthew Honnibal	8f68b864c4	* Move Span/Spans to separate files. Currently duplicates lots of Tokens functionality. Should probably be integrated into Tokens	2015-03-26 16:44:48 +01:00
Matthew Honnibal	e854ba0a13	* Remove support for force_gold flag from GreedyParser, since it's not so useful, and it's clutter	2015-03-26 16:44:47 +01:00
Matthew Honnibal	6a6085f8b9	* Clean up GreedyParser.train function a bit	2015-03-26 16:44:47 +01:00
Matthew Honnibal	b3157927e6	* Clean up unused feature templates	2015-03-26 16:44:47 +01:00
Matthew Honnibal	411bf377d4	* Remove dependency on ner_util module	2015-03-26 16:44:47 +01:00
Matthew Honnibal	01c892f583	* Add comment to fill_context	2015-03-26 16:44:47 +01:00
Matthew Honnibal	2741179aff	* Important bug fix: Fill token N2w, which was being unfilled, after a bad edit while writing the NER features.	2015-03-26 16:44:47 +01:00
Matthew Honnibal	2b2dec95d3	* Add comment to set_parse	2015-03-26 16:44:47 +01:00
Matthew Honnibal	e770fade1e	* Don't set dependency labels in set_parse, as this may be used by the Entity recogniser instead. Need to clean this method up...	2015-03-26 16:44:47 +01:00
Matthew Honnibal	71648205d9	* Add support for debug feature set. Just use unigrams for this.	2015-03-26 16:44:47 +01:00
Matthew Honnibal	3b70b304b2	* Add words to gold_tuples from gold conll file	2015-03-26 16:44:47 +01:00
Matthew Honnibal	2e12dec76e	* Adjust scorer to account for tokenization mistakes	2015-03-26 16:44:47 +01:00
Matthew Honnibal	05d6065e2e	* Add assertion	2015-03-26 16:44:46 +01:00
Matthew Honnibal	377e9b29b1	* Whitespace	2015-03-26 16:44:46 +01:00
Matthew Honnibal	670959f40c	* Fix iteration order on Tokens.rights	2015-03-26 16:44:46 +01:00
Matthew Honnibal	231ce2dae5	* Assign ROOT label by default. May be papering over another bug.	2015-03-26 16:44:46 +01:00
Matthew Honnibal	9f4ad8fdfb	* Assign root words the ROOT label via the Break transition. Something is still wrong here...	2015-03-26 16:44:46 +01:00
Matthew Honnibal	f729164c01	* Fix bug in label assignment: ensure null-label transitions receive the label 0	2015-03-26 16:44:46 +01:00
Matthew Honnibal	7237c805c7	* Load tag for specials.json token	2015-03-26 16:44:46 +01:00
Matthew Honnibal	567388e38d	* Use values encoded by StringStore in POS tagging, rather than indices into a list of tags	2015-03-26 16:44:45 +01:00
Matthew Honnibal	3105c7f8ba	* Don't pass label_ids dict to Tokens, since we now use the StringStore to manage string-to-int mapping for labels	2015-03-26 16:44:45 +01:00
Matthew Honnibal	801bf14f4f	* Clean up handling of dep_strings and ent_strings, using StringStore to encode the label names.	2015-03-26 16:44:45 +01:00
Matthew Honnibal	31fad99518	* Use StringStore to encode label names, instead of label_ids	2015-03-26 16:44:45 +01:00
Matthew Honnibal	64db61bff1	* Add Span class to Python API	2015-03-26 16:44:45 +01:00
Matthew Honnibal	b9b695fb1b	* Remove debug word list	2015-03-26 16:44:45 +01:00
Matthew Honnibal	f21ab2d7fb	* Fix bug in ugly ent_strings hack on English class	2015-03-26 16:44:45 +01:00
Matthew Honnibal	1c843934be	* Fix oracle bug in NER. Now getting 77% F on ontonotes	2015-03-26 16:44:44 +01:00
Matthew Honnibal	903f196b3f	* Fix verbose printing for scorer	2015-03-26 16:44:44 +01:00
Matthew Honnibal	e181c051d5	* Improve features for NER	2015-03-26 16:44:44 +01:00
Matthew Honnibal	7ecb52c0ed	* Add scorer script	2015-03-26 16:44:44 +01:00
Matthew Honnibal	8057a95f20	* NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring.	2015-03-26 16:44:44 +01:00
Matthew Honnibal	ae235e07b9	* Refactoring working for parser, but now need to rig up features for NER, and then debug oracle etc.	2015-03-26 16:44:44 +01:00
Matthew Honnibal	b3eda03c9c	* Tmp	2015-03-26 16:44:44 +01:00
Matthew Honnibal	220ce8bfed	* Prepare English class for NER	2015-03-26 16:44:44 +01:00
Matthew Honnibal	f5830dc1c1	* Remove _transitions.pyx	2015-03-26 16:44:44 +01:00
Matthew Honnibal	6865c2fb4d	* Fix assignment of dep strings in tokens.pyx	2015-03-26 16:44:43 +01:00
Matthew Honnibal	6b6bce9e7a	* Fix label loading for transition system	2015-03-26 16:44:43 +01:00
Matthew Honnibal	5278c7504b	* Hacks to conll.pyx. Should clean these up.	2015-03-26 16:44:43 +01:00
Matthew Honnibal	f321b2b2eb	* Remove TODO comment	2015-03-26 16:44:43 +01:00
Matthew Honnibal	fdabd93bfb	* Ensure high loss for invalid moves, and fix label reading for arc-eager	2015-03-26 16:44:43 +01:00
Matthew Honnibal	10ed738df2	* Tmp commit	2015-03-26 16:44:43 +01:00
Matthew Honnibal	4f83c9b3d5	* Make costs label-sensitive	2015-03-26 16:44:43 +01:00
Matthew Honnibal	179b7eb0a7	* Specify parser transition system in language	2015-03-26 16:44:43 +01:00
Matthew Honnibal	8c883cef58	* Refactored transition system code now compiling. Still need to hook up label oracle, and test	2015-03-26 16:44:43 +01:00
Matthew Honnibal	f0159ab4b6	* Add file to hold GoldParse class	2015-03-26 16:44:42 +01:00
Matthew Honnibal	8eadb984cb	* Refactor arc_eager to use new TransitionSystem base class. Need to fix oracle	2015-03-26 16:44:42 +01:00
Matthew Honnibal	b063001596	* Add base TransitionSystem class. Still need to rethink how non-monotonic labelling will work for best_valid	2015-03-26 16:44:42 +01:00
Matthew Honnibal	01bc4d6815	* Add set_parse method, to assign parse to tokens in a less hacky way.	2015-03-26 16:44:42 +01:00
Matthew Honnibal	dc986dbc0b	* Work on refactored parser, where TransitionSystem can be easily subclassed	2015-03-26 16:44:42 +01:00
Matthew Honnibal	1cc6329b18	* Add base class to do transitions	2015-03-26 16:44:42 +01:00
Matthew Honnibal	135756ac3d	* Tmp commit of NER refactoring	2015-03-26 16:44:42 +01:00
Matthew Honnibal	23c1f6fc04	* Merge changes from stash	2015-03-26 16:44:41 +01:00
Matthew Honnibal	0ff078876a	* Commit some work on ner.yx done on the plane	2015-03-26 16:44:41 +01:00
Matthew Honnibal	d81b7be6a2	* Merge train.py	2015-03-26 16:44:41 +01:00
Matthew Honnibal	2e3dc3dfe2	* Merge changes in tokens.pyx	2015-03-26 16:44:41 +01:00
Matthew Honnibal	8cc3524dc9	* Ws	2015-03-26 16:44:41 +01:00
Matthew Honnibal	3d0570685c	* Add NER transition system	2015-03-26 16:44:41 +01:00
Matthew Honnibal	043b758cf4	* Resurrect old NER code. This version won't be the one that runs; we want to re-use the parser code. But for now this is a useful reference.	2015-03-26 16:44:41 +01:00
Matthew Honnibal	b139aa92ba	* Start setting out how NER will be implemented in the data model	2015-03-26 16:44:41 +01:00
Matthew Honnibal	0962ffc095	* Fix issue #37 : missing check_flag attribute from Token class	2015-03-26 15:06:26 +01:00
Matthew Honnibal	2e8d0e5d45	* Upd download script	2015-03-03 05:47:16 -05:00
Matthew Honnibal	dbe26f5793	* Add children and subtree methods to Token, which are generators to assist parse-tree navigation.	2015-03-03 04:18:41 -05:00
Matthew Honnibal	ea90d136e8	* Fix bug in labelled parsing, that caused an 8% drop in labelled accuracy.	2015-02-27 03:56:10 -05:00
Matthew Honnibal	caf046b220	* Hastily add method to apply tags from a list of strings, instead of predicting the tags.	2015-02-23 15:40:17 -05:00
Matthew Honnibal	cae077b583	* Work on fixing orphaned Token objects bug	2015-02-16 15:20:31 -05:00
Matthew Honnibal	7572e31f5e	* Pass ownership of C data to Token instances if Tokens object is being garbage-collected, but Token instances are staying alive.	2015-02-11 18:05:06 -05:00
Matthew Honnibal	64645a1c2f	* Improve docstring on English	2015-02-11 15:13:20 -05:00
Matthew Honnibal	594e50bd45	* Add option to download speech-parsing data set.	2015-02-11 14:20:29 -05:00
Matthew Honnibal	0b7e769211	* Add POS tags to support SWBD tag set	2015-02-11 14:08:28 -05:00
Matthew Honnibal	312b3a45f3	* Fix issue #19 : Allow parsing/pos tagging of empty strings	2015-02-10 10:15:58 -05:00
Matthew Honnibal	2a0615104b	* Upd download script	2015-02-09 10:22:59 -05:00
Matthew Honnibal	5c3513583d	* Clear buffered python tokens when modifying the Tokens object. Need to clean this up, and modify via a method on Tokens.	2015-02-09 03:57:10 -05:00
Matthew Honnibal	be5536d239	* Fix Issue #22 : PRP and PRP$ were mapped to NOUN. Should be PRON.	2015-02-08 18:36:18 -05:00
Matthew Honnibal	0492cee8b4	* Fix Issue #24 : Lemmas are empty when the L field is missing for special-cased tokens	2015-02-08 18:30:30 -05:00
Matthew Honnibal	d229fbd228	* Give better error on out-of-bounds array access	2015-02-07 12:59:12 -05:00
Matthew Honnibal	ab8bb047d0	* Fix negative index for __getitem__	2015-02-07 12:58:46 -05:00
Matthew Honnibal	44c7eafe44	* Fix download.py	2015-02-07 12:00:36 -05:00

588 Commits