spaCy

Mirror of https://github.com/explosion/spaCy.git, synced 2025-09-23 20:46:44 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	8f68b864c4	* Move Span/Spans to separate files. Currently duplicates lots of Tokens functionality. Should probably be integrated into Tokens	2015-03-26 16:44:48 +01:00
Matthew Honnibal	056c672caf	* Bug fixes to tokenization, and support for times	2015-03-26 16:44:48 +01:00
Matthew Honnibal	ee385b439a	* Ensure StringStore is dumped during training	2015-03-26 16:44:47 +01:00
Matthew Honnibal	e854ba0a13	* Remove support for force_gold flag from GreedyParser, since it's not so useful, and it's clutter	2015-03-26 16:44:47 +01:00
Matthew Honnibal	6a6085f8b9	* Clean up GreedyParser.train function a bit	2015-03-26 16:44:47 +01:00
Matthew Honnibal	b3157927e6	* Clean up unused feature templates	2015-03-26 16:44:47 +01:00
Matthew Honnibal	411bf377d4	* Remove dependency on ner_util module	2015-03-26 16:44:47 +01:00
Matthew Honnibal	01c892f583	* Add comment to fill_context	2015-03-26 16:44:47 +01:00
Matthew Honnibal	2741179aff	* Important bug fix: Fill token N2w, which was being unfilled, after a bad edit while writing the NER features.	2015-03-26 16:44:47 +01:00
Matthew Honnibal	2b2dec95d3	* Add comment to set_parse	2015-03-26 16:44:47 +01:00
Matthew Honnibal	e770fade1e	* Don't set dependency labels in set_parse, as this may be used by the Entity recogniser instead. Need to clean this method up...	2015-03-26 16:44:47 +01:00
Matthew Honnibal	71648205d9	* Add support for debug feature set. Just use unigrams for this.	2015-03-26 16:44:47 +01:00
Matthew Honnibal	3b70b304b2	* Add words to gold_tuples from gold conll file	2015-03-26 16:44:47 +01:00
Matthew Honnibal	2e12dec76e	* Adjust scorer to account for tokenization mistakes	2015-03-26 16:44:47 +01:00
Matthew Honnibal	221f43c370	* Ensure better separation between score printing and training in train.py	2015-03-26 16:44:46 +01:00
Matthew Honnibal	6d49f8717b	* Move scoring away from training. Does not support scoring on gold preproc.	2015-03-26 16:44:46 +01:00
Matthew Honnibal	05d6065e2e	* Add assertion	2015-03-26 16:44:46 +01:00
Matthew Honnibal	377e9b29b1	* Whitespace	2015-03-26 16:44:46 +01:00
Matthew Honnibal	670959f40c	* Fix iteration order on Tokens.rights	2015-03-26 16:44:46 +01:00
Matthew Honnibal	231ce2dae5	* Assign ROOT label by default. May be papering over another bug.	2015-03-26 16:44:46 +01:00
Matthew Honnibal	9f4ad8fdfb	* Assign root words the ROOT label via the Break transition. Something is still wrong here...	2015-03-26 16:44:46 +01:00
Matthew Honnibal	52429625f0	* Add write_parses function	2015-03-26 16:44:46 +01:00
Matthew Honnibal	0c91dd9e15	* Re-enable entity training	2015-03-26 16:44:46 +01:00
Matthew Honnibal	f729164c01	* Fix bug in label assignment: ensure null-label transitions receive the label 0	2015-03-26 16:44:46 +01:00
Matthew Honnibal	ee927fbbb4	* Fix test_morph_exceptions	2015-03-26 16:44:46 +01:00
Matthew Honnibal	7237c805c7	* Load tag for specials.json token	2015-03-26 16:44:46 +01:00
Matthew Honnibal	13520e6cf0	* Add i.e. to specials.json	2015-03-26 16:44:45 +01:00
Matthew Honnibal	567388e38d	* Use values encoded by StringStore in POS tagging, rather than indices into a list of tags	2015-03-26 16:44:45 +01:00
Matthew Honnibal	3105c7f8ba	* Don't pass label_ids dict to Tokens, since we now use the StringStore to manage string-to-int mapping for labels	2015-03-26 16:44:45 +01:00
Matthew Honnibal	27d9df49e7	* Upd sbd tests	2015-03-26 16:44:45 +01:00
Matthew Honnibal	801bf14f4f	* Clean up handling of dep_strings and ent_strings, using StringStore to encode the label names.	2015-03-26 16:44:45 +01:00
Matthew Honnibal	9061bbaf61	* Move to fixing up ent_strings and dep_strings passing	2015-03-26 16:44:45 +01:00
Matthew Honnibal	31fad99518	* Use StringStore to encode label names, instead of label_ids	2015-03-26 16:44:45 +01:00
Matthew Honnibal	64db61bff1	* Add Span class to Python API	2015-03-26 16:44:45 +01:00
Matthew Honnibal	b9b695fb1b	* Remove debug word list	2015-03-26 16:44:45 +01:00
Matthew Honnibal	8f7eeb1c2d	* Add verbose flag for Scorer, for debugging, and fix ent_strings bug	2015-03-26 16:44:45 +01:00
Matthew Honnibal	f21ab2d7fb	* Fix bug in ugly ent_strings hack on English class	2015-03-26 16:44:45 +01:00
Matthew Honnibal	1c843934be	* Fix oracle bug in NER. Now getting 77% F on ontonotes	2015-03-26 16:44:44 +01:00
Matthew Honnibal	903f196b3f	* Fix verbose printing for scorer	2015-03-26 16:44:44 +01:00
Matthew Honnibal	e181c051d5	* Improve features for NER	2015-03-26 16:44:44 +01:00
Matthew Honnibal	7ecb52c0ed	* Add scorer script	2015-03-26 16:44:44 +01:00
Matthew Honnibal	8057a95f20	* NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring.	2015-03-26 16:44:44 +01:00
Matthew Honnibal	e99f19dd6c	* Fix clean function	2015-03-26 16:44:44 +01:00
Matthew Honnibal	ae235e07b9	* Refactoring working for parser, but now need to rig up features for NER, and then debug oracle etc.	2015-03-26 16:44:44 +01:00
Matthew Honnibal	4539c70542	* Work on updating train script for named entity recognition	2015-03-26 16:44:44 +01:00
Matthew Honnibal	357dcdcc01	* Fix clean function	2015-03-26 16:44:44 +01:00
Matthew Honnibal	b3eda03c9c	* Tmp	2015-03-26 16:44:44 +01:00
Matthew Honnibal	220ce8bfed	* Prepare English class for NER	2015-03-26 16:44:44 +01:00
Matthew Honnibal	f5830dc1c1	* Remove _transitions.pyx	2015-03-26 16:44:44 +01:00
Matthew Honnibal	7a1a333f04	* Allow gold tokenization training, for debugging	2015-03-26 16:44:43 +01:00

927 Commits