| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Robert | 8711b64860 | Force SSL for downloading English language data. It would also be nice to have a checksum for this. | 2015-09-21 17:26:01 -07:00 |
| Matthew Honnibal | e13e47e9e5 | * Add English stop words | 2015-09-14 17:48:51 +10:00 |
| Matthew Honnibal | 0b7d2a6c62 | * Inc version | 2015-09-13 01:26:29 +02:00 |
| Matthew Honnibal | e2ef78b29c | * Gut pos.pyx module, since functionality moved to spacy/tagger.pyx | 2015-08-26 19:15:42 +02:00 |
| Matthew Honnibal | c4d8754385 | * Specify LOCAL_DATA_DIR global in spacy.en.__init__.py | 2015-08-26 19:15:07 +02:00 |
| Matthew Honnibal | c5a27d1821 | * Move lemmatizer to spacy | 2015-08-25 15:47:08 +02:00 |
| Matthew Honnibal | 82217c6ec6 | * Generalize lemmatizer | 2015-08-25 15:46:19 +02:00 |
| Matthew Honnibal | 8083a07c3e | * Use language base class | 2015-08-25 15:37:30 +02:00 |
| Matthew Honnibal | 5dd76be446 | * Split EnPosTagger up into base class and subclass | 2015-08-24 05:25:55 +02:00 |
| Matthew Honnibal | 6f1743692a | * Work on language-independent refactoring | 2015-08-23 20:49:18 +02:00 |
| Matthew Honnibal | cad0cca4e3 | * Tmp | 2015-08-22 22:04:34 +02:00 |
| Matthew Honnibal | 5737115e1e | * Work on gazetteer matching | 2015-08-06 14:33:21 +02:00 |
| Matthew Honnibal | c609ea18f0 | * Increment version in download script | 2015-07-28 15:22:17 +02:00 |
| Matthew Honnibal | ddc1a5cfe5 | * Fix training under python3 | 2015-07-28 14:09:30 +02:00 |
| Matthew Honnibal | a296d72b54 | * Fix en/attrs | 2015-07-27 21:16:33 +02:00 |
| Matthew Honnibal | 8535d872e8 | * Set is_oov property in get_flags | 2015-07-27 01:51:24 +02:00 |
| Matthew Honnibal | 8e4c69ee8c | * Add is_oov property, and fix up handling of attributes | 2015-07-27 01:50:06 +02:00 |
| Matthew Honnibal | 6bb96c122d | * Host IS_ flags in attrs.pxd, and add properties for them on Token and Lexeme objects | 2015-07-26 16:37:16 +02:00 |
| Matthew Honnibal | eeaea25f0c | * Check oov_prob file is present | 2015-07-26 16:36:38 +02:00 |
| Matthew Honnibal | 1b5d1da2a7 | * Allow an OOV probability to be specified in get_lex_props | 2015-07-26 00:03:43 +02:00 |
| Matthew Honnibal | cd6e25132b | * Allow an OOV probability to be specified in get_lex_props | 2015-07-26 00:01:46 +02:00 |
| Matthew Honnibal | 5b41744270 | * Check for directory presence before loading annotators | 2015-07-23 09:27:37 +02:00 |
| Matthew Honnibal | 12699a1152 | * Set initial freqs, to avoid missing values in serializer | 2015-07-23 01:16:27 +02:00 |
| Matthew Honnibal | 680bb47b55 | * Write serializer freqs to single file, vocab/serializer.json | 2015-07-23 01:15:25 +02:00 |
| Matthew Honnibal | 38ef986b29 | * Update spacy/en/attrs.pxd | 2015-07-23 01:10:58 +02:00 |
| Matthew Honnibal | c86dbe4944 | * Update English.save_models for new Packer save/load stuff | 2015-07-22 13:40:23 +02:00 |
| Matthew Honnibal | 317cbbc015 | * Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time. | 2015-07-19 15:18:17 +02:00 |
| Matthew Honnibal | 4dddc8a69b | * Fix type declarations for attr_t. Remove unused id_t. | 2015-07-18 22:39:57 +02:00 |
| Matthew Honnibal | 95e57c2780 | * Remove unnecessary key and id properties from Utf8String. | 2015-07-17 01:40:18 +02:00 |
| Matthew Honnibal | db9dfd2e23 | * Major refactor of serialization. Nearly complete now. | 2015-07-17 01:27:54 +02:00 |
| Matthew Honnibal | 897de2d438 | * Add 'bitter' property for serializer in English class | 2015-07-16 17:47:53 +02:00 |
| Matthew Honnibal | 6eef0bf9ab | * Break up tokens.pyx into tokens/doc.pyx, tokens/token.pyx, tokens/spans.pyx | 2015-07-13 20:20:58 +02:00 |
| Matthew Honnibal | ff9ff6f3fa | * Ensure unseen words are given low log probability | 2015-07-12 01:31:09 +02:00 |
| Matthew Honnibal | 89a91ad726 | * Add SPACE part-of-speech tag, and train tagger to assign it. Also train tagger not to make whitespace an entity | 2015-07-09 13:30:41 +02:00 |
| Matthew Honnibal | 6ddb2f5e45 | * Restore merge_mwe in English class | 2015-07-08 19:35:30 +02:00 |
| Matthew Honnibal | 6859f6adac | * Restore merge_mwe in English class | 2015-07-08 19:34:55 +02:00 |
| Matthew Honnibal | e3c53f5ecd | * Fix mention of Tokens in docstring | 2015-07-08 18:56:27 +02:00 |
| Matthew Honnibal | bb522496dd | * Rename Tokens to Doc | 2015-07-08 18:53:00 +02:00 |
| Matthew Honnibal | 4e4fac452b | * Refactor __init__ for simplicity. Allow parse=True, tag=True etc flags to be passed at top-level. Do not lazy-load parser. | 2015-07-08 12:35:29 +02:00 |
| Matthew Honnibal | 1d2deb4616 | * Work on refactoring default arguments to English.__init__ | 2015-07-07 15:53:25 +02:00 |
| Matthew Honnibal | 6788c86b2f | * Begin refactor | 2015-07-07 14:00:07 +02:00 |
| Matthew Honnibal | 9af86b0b0b | * Fix attrs.pxd | 2015-06-30 18:16:30 +02:00 |
| Matthew Honnibal | 5d595b5a8c | * Inc versions | 2015-06-30 18:11:06 +02:00 |
| Matthew Honnibal | d2eeba6667 | * Start wiring up color and emotion lexicons. Hopefully we get to use them. | 2015-06-30 16:22:23 +02:00 |
| Matthew Honnibal | b266a63f2c | * Inc version of downloadble data | 2015-06-24 04:53:08 +02:00 |
| Matthew Honnibal | 7d265a9c62 | * Revert to wget in spacy.en.download | 2015-06-08 00:48:56 +02:00 |
| Matthew Honnibal | 1515862861 | * Fix download.py | 2015-06-08 00:08:05 +02:00 |
| Matthew Honnibal | 7e9e8f654a | * Use urllib in spacy.en.download | 2015-06-07 23:51:38 +02:00 |
| Matthew Honnibal | 80cff41a9c | * Upd download.py | 2015-06-07 19:13:28 +02:00 |
| Matthew Honnibal | 58d5ac0944 | * Add beam search capabilities to Parser. Rename GreedyParser to Parser. | 2015-06-02 00:28:02 +02:00 |
| Matthew Honnibal | 62424e6c76 | * Remove unused regularize argument from _ml.Model | 2015-06-02 00:27:07 +02:00 |
| Matthew Honnibal | 04bda8648d | * Pass parameter for regularization to model | 2015-05-27 03:16:58 +02:00 |
| Matthew Honnibal | eba7b34f66 | * Add flag to disable loading of word vectors | 2015-05-25 01:02:42 +02:00 |
| Matthew Honnibal | 03ebf70a66 | * Inc version to 0.84 | 2015-05-12 02:38:51 +02:00 |
| Matthew Honnibal | fb8d50b3d5 | Merge branch 'master' of ssh://github.com/honnibal/spaCy | 2015-04-30 12:45:15 +02:00 |
| Matthew Honnibal | 378c2a6435 | * Fix POS model: make it use tag instead of pos in history features | 2015-04-29 00:02:53 +02:00 |
| Jordan Suchow | 3a8d9b37a6 | Remove trailing whitespace | 2015-04-19 13:01:38 -07:00 |
| Matthew Honnibal | cc4e395927 | * Add some ad hoc regexes, for multi-word location prepositions | 2015-04-17 04:44:24 +02:00 |
| Matthew Honnibal | 684d0e5e85 | * Download updated data | 2015-04-16 04:29:15 +02:00 |
| Matthew Honnibal | 42617548af | * Disable merge_mwes by default | 2015-04-16 04:20:31 +02:00 |
| Matthew Honnibal | 77d0700caf | * Add on X way regexes | 2015-04-16 01:35:46 +02:00 |
| Matthew Honnibal | c6707778dd | * Fix Issue #51: Handle non-ascii lemmas correctly | 2015-04-13 22:28:59 +02:00 |
| Matthew Honnibal | 761a19113a | * Fix /tmp moving thing in download.py | 2015-04-12 07:04:10 +02:00 |
| Matthew Honnibal | b64b2bd910 | * Fix Issue #43: TAG attr not supported. Also add DEP attr, while I'm at it. Need better way of ensuring future changes don't break in similar way. | 2015-04-07 06:00:30 +02:00 |
| Matthew Honnibal | b8d34531c4 | * Add support for units to English.__init__, by loading and applying regular expressions | 2015-04-07 04:02:32 +02:00 |
| Matthew Honnibal | 2fee67cfa3 | * Add regular expressions for English multi-word expressions | 2015-04-07 03:45:18 +02:00 |
| Matthew Honnibal | 567388e38d | * Use values encoded by StringStore in POS tagging, rather than indices into a list of tags | 2015-03-26 16:44:45 +01:00 |
| Matthew Honnibal | 801bf14f4f | * Clean up handling of dep_strings and ent_strings, using StringStore to encode the label names. | 2015-03-26 16:44:45 +01:00 |
| Matthew Honnibal | f21ab2d7fb | * Fix bug in ugly ent_strings hack on English class | 2015-03-26 16:44:45 +01:00 |
| Matthew Honnibal | 8057a95f20 | * NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring. | 2015-03-26 16:44:44 +01:00 |
| Matthew Honnibal | 220ce8bfed | * Prepare English class for NER | 2015-03-26 16:44:44 +01:00 |
| Matthew Honnibal | 179b7eb0a7 | * Specify parser transition system in language | 2015-03-26 16:44:43 +01:00 |
| Matthew Honnibal | 8cc3524dc9 | * Ws | 2015-03-26 16:44:41 +01:00 |
| Matthew Honnibal | 2e8d0e5d45 | * Upd download script | 2015-03-03 05:47:16 -05:00 |
| Matthew Honnibal | caf046b220 | * Hastily add method to apply tags from a list of strings, instead of predicting the tags. | 2015-02-23 15:40:17 -05:00 |
| Matthew Honnibal | 64645a1c2f | * Improve docstring on English | 2015-02-11 15:13:20 -05:00 |
| Matthew Honnibal | 594e50bd45 | * Add option to download speech-parsing data set. | 2015-02-11 14:20:29 -05:00 |
| Matthew Honnibal | 0b7e769211 | * Add POS tags to support SWBD tag set | 2015-02-11 14:08:28 -05:00 |
| Matthew Honnibal | 312b3a45f3 | * Fix issue #19: Allow parsing/pos tagging of empty strings | 2015-02-10 10:15:58 -05:00 |
| Matthew Honnibal | 2a0615104b | * Upd download script | 2015-02-09 10:22:59 -05:00 |
| Matthew Honnibal | 5c3513583d | * Clear buffered python tokens when modifying the Tokens object. Need to clean this up, and modify via a method on Tokens. | 2015-02-09 03:57:10 -05:00 |
| Matthew Honnibal | be5536d239 | * Fix Issue #22: PRP and PRP$ were mapped to NOUN. Should be PRON. | 2015-02-08 18:36:18 -05:00 |
| Matthew Honnibal | 44c7eafe44 | * Fix download.py | 2015-02-07 12:00:36 -05:00 |
| Matthew Honnibal | 6ca7f2eedc | * Upd download script | 2015-02-07 11:32:33 -05:00 |
| Matthew Honnibal | 56c2ef2982 | * Tweak POS features for web text | 2015-02-02 11:59:36 +11:00 |
| Matthew Honnibal | a20fdbd8ee | * Upd download script | 2015-02-01 13:22:23 +11:00 |
| Matthew Honnibal | 63abdf154c | * Hastily hack download file | 2015-01-31 22:48:32 +11:00 |
| Matthew Honnibal | a1ed574b7b | * Fix default model path for English | 2015-01-31 16:38:27 +11:00 |
| Matthew Honnibal | e013555b25 | * Add option to download script | 2015-01-31 13:51:56 +11:00 |
| Matthew Honnibal | 024cfd485c | * Pass tag_strings as a tuple, to support new Tokens API | 2015-01-31 13:43:37 +11:00 |
| Matthew Honnibal | 83a4df5a1a | * Fix download script | 2015-01-30 20:40:42 +11:00 |
| Matthew Honnibal | 6f9ebc2f34 | * Fix download script | 2015-01-30 20:33:19 +11:00 |
| Matthew Honnibal | 8b85d0bb8a | * Only download small data if no data dir exists | 2015-01-30 20:27:14 +11:00 |
| Matthew Honnibal | cb95ef6934 | * Fix download script | 2015-01-30 19:28:43 +11:00 |
| Matthew Honnibal | e578bd37bd | * Fix download script | 2015-01-30 18:59:31 +11:00 |
| Matthew Honnibal | df52014d12 | * Fix download script | 2015-01-30 18:36:24 +11:00 |
| Matthew Honnibal | 998b607f65 | * Upd download script, having it download all data if there's no data/ directory, allowing easier compilation from source | 2015-01-30 18:04:01 +11:00 |
| Matthew Honnibal | 67d6e53a69 | * Ensure parser and tagger function correctly when training from missing values, indicated by -1 | 2015-01-30 14:08:56 +11:00 |
| Matthew Honnibal | c38c62d4a3 | * Add docstring to English class | 2015-01-27 02:45:21 +11:00 |
| Matthew Honnibal | 7f87716cf7 | * Fix download script | 2015-01-25 23:01:10 +11:00 |