| 
							
							
								 Ines Montani | bd20ec0a6a | Add get_cosine util function | 2017-01-12 16:51:13 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 51ef75f629 | Fix regression test for #615 and remove unnecessary imports | 2017-01-12 16:51:12 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | aeb747e10c | Adjust formatting | 2017-01-12 16:51:12 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 8e3e58a7e6 | Modernise and merge lexeme vocab tests | 2017-01-12 16:51:12 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | c3d4516fc2 | Move test for #361 to regression tests | 2017-01-12 16:51:12 +01:00 |  | 
			
				
					| 
							
							
								 Daniel Hershcovich | 99eb494a82 | Fix #737: support loading word vectors with " " as a word | 2017-01-12 17:00:14 +02:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 7cb3d74426 | Modernise span tests and don't depend on models | 2017-01-12 15:30:49 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 92e3d8b3ee | Modernise vocab API tests and remove old xfailing tests | 2017-01-12 15:27:46 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 7ea87684cd | Rename test_vocab.py to test_vocab_api.py | 2017-01-12 15:12:21 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 0da2ee5c68 | Merge flag features tests into orth tests in tests root | 2017-01-12 15:12:00 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 03c136cfd3 | Remove StringStore tests from vocab tests | 2017-01-12 15:11:15 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | d7bd57abdf | Modernise add vectors vocab test | 2017-01-12 15:09:49 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 89525ef345 | Use consistent test names | 2017-01-12 15:09:21 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | f8803808ce | Remove old unused tests and conftest files | 2017-01-12 15:09:05 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 4d0bfebcd9 | Move Pragmatic Segmenter test cases (currently unused) to parser tests | 2017-01-12 15:08:02 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 26d018d874 | Add tests for StringStore | 2017-01-12 15:07:31 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 9b6784bab5 | Add fixture for StringStore | 2017-01-12 15:05:40 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 99d66d613a | Modernise tests for merging spans and don't depend on models | 2017-01-12 12:26:26 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | fa8f67596d | Remove unused old test | 2017-01-12 12:26:08 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 359f73a96b | Move test for #54 to regression tests | 2017-01-12 12:25:51 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 3f3a46722c | Remove unused conftest | 2017-01-12 12:25:24 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | c2406e92bc | Allow setting ents in get_doc | 2017-01-12 12:25:10 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | c5914c6fe5 | Fix and pass regression test for #736 | 2017-01-12 11:48:56 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 4e48862fa8 | Remove print statement | 2017-01-12 11:25:39 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d1d8214767 | Increment version | 2017-01-12 11:21:57 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | fba67fa342 | Fix Issue #736: Times were being tokenized with incorrect string values. | 2017-01-12 11:21:01 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | a6790b6694 | Rename tags to pos in get_doc and allow adding tags to tokens | 2017-01-12 11:18:36 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 1add8ace67 | Merge lemmatizer tests | 2017-01-12 11:16:53 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 3bc082abdf | Modernise morph exceptions test and don't depend on models | 2017-01-12 11:14:29 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | ec7739b76e | Add regression test for #736 | 2017-01-12 11:12:44 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 6c1c564891 | Move language-specific tests out of redundant tokenizer directories | 2017-01-12 02:17:18 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 8fecedac3a | Tidy up | 2017-01-12 02:16:37 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | ae7edd30e7 | Move text file back to tokenizer tests directory | 2017-01-12 02:10:23 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | ffcaba9017 | Remove old and/or redundant tests | 2017-01-12 02:10:18 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 19c4132097 | Modernise space attachment parser tests and don't depend on models | 2017-01-12 01:54:44 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 69778924c8 | Modernise and merge parser tests and don't depend on models | 2017-01-12 01:07:29 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 178c147612 | Modernise nonprojectivity tests and don't depend on models | 2017-01-12 01:06:36 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 1a3984742c | Modernise sentence boundary detection tests and don't depend on models (where possible) | 2017-01-11 23:53:08 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 0cdb6ea61d | Remove old unused pickle test | 2017-01-11 23:52:28 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | c9671329dc | Move test for #309 to regression tests | 2017-01-11 23:52:13 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | d0e37b5670 | Modernise parser tests and don't depend on models | 2017-01-11 21:30:27 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 342cb41782 | Add apply_transition_sequence util function to utils | 2017-01-11 21:30:14 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 09807addff | Add en_parser fixture | 2017-01-11 21:29:59 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 55d151aa61 | Modernise Doc parse tree navigation tests and don't depend on models | 2017-01-11 21:14:15 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 7262421bb2 | Use consistent test names | 2017-01-11 19:00:52 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 33800c9367 | Rename "tokens" tests to "doc" | 2017-01-11 18:59:01 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 3a9c6a9563 | Remove old unused files | 2017-01-11 18:58:38 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 8e962de39f | Remove old word vector tests | 2017-01-11 18:55:08 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | e027936920 | Modernise Doc noun chunks tests | 2017-01-11 18:54:56 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 439f396acd | Modernise Doc array tests and don't depend on models | 2017-01-11 18:54:46 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 05447be884 | Modernise test for adding entities | 2017-01-11 18:54:24 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 6e883f4c00 | Modernise Doc API tests and don't depend on models | 2017-01-11 18:05:36 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 8bf3bb5c44 | Make words optional for get_doc | 2017-01-11 18:05:10 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 928db7e419 | Fix StringIO import for Python 3 | 2017-01-11 14:07:48 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 69998f216b | Rename test_tokens_api.py to test_doc_api.py | 2017-01-11 13:58:56 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | d94dea1b18 | Merge token tests into token API tests | 2017-01-11 13:57:02 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | eb23424ab0 | Modernise token API tests and don't depend on loading models | 2017-01-11 13:56:54 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | c682b8ca90 | Merge conftests into one cohesive file | 2017-01-11 13:56:32 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 909f24d7df | Add test utils and get_doc helper function. Create Doc object from given vocab, words and annotations to allow tests not to depend on loading the models. | 2017-01-11 13:55:33 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e12c90e03f | Merge branch 'master' of ssh://github.com/explosion/spaCy | 2017-01-11 13:03:51 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 12cd27b821 | Amend 8ae8b443f: Handle comparison with None tokens. | 2017-01-11 13:03:32 +01:00 |  | 
			
				
					| 
							
							
								 Daniel Hershcovich | 8e603cc917 | Avoid "True if ... else False" | 2017-01-11 11:18:22 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 44e2b0100d | Support TAG attribute in doc.from_array | 2017-01-10 22:47:07 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 3e6e1f0251 | Tidy up regression tests | 2017-01-10 19:24:10 +01:00 |  | 
			
				
					| 
							
							
								 Magnus Burton | aad23ab0b4 | Supplemented with capitalized Swedish exceptions | 2017-01-10 16:07:20 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 869963c3c4 | Mark extensive prefix/suffix tests as slow | 2017-01-10 15:57:35 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 487e020ebe | Add simple test for surrounding brackets | 2017-01-10 15:57:26 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 0ba5cf51d2 | Assert length first | 2017-01-10 15:57:00 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 2185d31907 | Adjust names and formatting | 2017-01-10 15:56:35 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | e10d4ca964 | Remove semi-redundant URLs and punctuation for faster testing | 2017-01-10 15:54:25 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 3a3cb2c90c | Add unicode declaration | 2017-01-10 15:53:15 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 0f9b8a00a5 | Unbreak data download | 2017-01-09 23:40:26 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 8ae8b443f1 | Add richcmp method to Token. Closes #631 | 2017-01-09 19:30:31 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 64f747cb65 | Token comparison test | 2017-01-09 19:12:00 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 18c3c2d05c | Add tests for token comparison, re Issue #631 | 2017-01-09 19:09:59 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 97a1286129 | Revert changes to tagger and parser for thinc 6 | 2017-01-09 10:08:34 -06:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 95a52005df | Revert "Fix Issue #683: Add 'SP' to tag_map, if it's not there already, within the Morphology class." This reverts commit 40e71586d6. | 2017-01-09 09:55:55 -06:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 363f09e68c | Merge pull request #726 from magnusburton/master: Added Swedish abbreviations as token exceptions | 2017-01-09 14:58:15 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 42cd598f57 | Use correct fixtures in URL tokenizer | 2017-01-09 14:10:40 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d9a77ddf14 | Return None for data path if it doesn't exist | 2017-01-09 14:10:05 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e4862d1dab | Merge branch 'develop' | 2017-01-09 13:36:01 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | aa876884f0 | Revert "Revert "Merge remote-tracking branch 'origin/master'"" This reverts commit fb9d3bb022. | 2017-01-09 13:28:13 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | d5c72c40eb | Remove old tests for old website example code | 2017-01-08 22:28:53 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | eef94e3ee2 | Split off period after two or more uppercase letters (fixes #483) | 2017-01-08 22:28:25 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | a89a6000e5 | Remove unused import | 2017-01-08 22:17:37 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 5d28664fc5 | Don't test Hungarian for numbers and hyphens for now. Reinvestigate behaviour of case affixes given reorganised tokenizer patterns. | 2017-01-08 20:45:40 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 53362b6b93 | Reorganise Hungarian prefixes/suffixes/infixes. Use global prefixes and suffixes for non-language-specific rules, import list of alpha unicode characters and adjust regexes. | 2017-01-08 20:40:33 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 347c4a2d06 | Reorganise and reformat global tokenizer prefixes, suffixes and infixes | 2017-01-08 20:37:39 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 0dec90e9f7 | Use global abbreviation data languages and remove duplicates | 2017-01-08 20:36:00 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 7c3cb2a652 | Add global abbreviations data | 2017-01-08 20:34:03 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | de5aa92bc2 | Handle deprecated tokenizer prefix data | 2017-01-08 20:33:28 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | abb09782f9 | Move sun.txt to original location and fix path to not break parser tests | 2017-01-08 20:32:54 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | cab39c59c5 | Add missing contractions to English tokenizer exceptions. Inspired by https://github.com/kootenpv/contractions/blob/master/contractions/__init__.py | 2017-01-05 19:59:06 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | a23504fe07 | Move abbreviations below other exceptions | 2017-01-05 19:58:07 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 7d2cf934b9 | Generate he/she/it correctly with 's instead of 've | 2017-01-05 19:57:00 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 8328925e1f | Add newlines to long German text | 2017-01-05 18:13:30 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 55b46d7cf6 | Add tokenizer tests for German | 2017-01-05 18:11:25 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 5bb4081f52 | Remove redundant test_tokenizer.py for English | 2017-01-05 18:11:11 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 8216ba599b | Add tests for longer and mixed English texts | 2017-01-05 18:11:04 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 65f937d5c6 | Move basic contraction tests to test_contractions.py | 2017-01-05 18:09:53 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | bbe7cab3a1 | Move non-English-specific tests back to general tokenizer tests | 2017-01-05 18:09:29 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 038002d616 | Reformat HU tokenizer tests and adapt to general style. Improve readability of test cases and add conftest.py with fixture | 2017-01-05 18:06:44 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | bc911322b3 | Move ") to emoticons (see Tweebo challenge test) | 2017-01-05 18:05:38 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 637f785036 | Add general sanity tests for all tokenizers | 2017-01-05 16:25:38 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | c5f2dc15de | Move English tokenizer tests to directory /en | 2017-01-05 16:25:04 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 8b45363b4d | Modernize and merge general tokenizer tests | 2017-01-05 13:17:05 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 02cfda48c9 | Modernize and merge tokenizer tests for string loading | 2017-01-05 13:16:55 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | a11f684822 | Modernize and merge tokenizer tests for whitespace | 2017-01-05 13:16:33 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 8b284fc6f1 | Modernize and merge tokenizer tests for text from file | 2017-01-05 13:15:52 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 2c2e878653 | Modernize and merge tokenizer tests for punctuation | 2017-01-05 13:14:16 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 8a74129cdf | Modernize and merge tokenizer tests for prefixes/suffixes/infixes | 2017-01-05 13:13:12 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 0e65dca9a5 | Modernize and merge tokenizer tests for exception and emoticons | 2017-01-05 13:11:31 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 34c47bb20d | Fix formatting | 2017-01-05 13:10:51 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 2e72683baa | Add missing docstrings | 2017-01-05 13:10:21 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | da10a049a6 | Add unicode declarations | 2017-01-05 13:09:48 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 58adae8774 | Remove unused file | 2017-01-05 13:09:22 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | c6e5a5349d | Move regression test for #360 into own file | 2017-01-04 00:49:31 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 8279993a6f | Modernize and merge tokenizer tests for punctuation | 2017-01-04 00:49:20 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 550630df73 | Update tokenizer tests for contractions | 2017-01-04 00:48:42 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 109f202e8f | Update conftest fixture | 2017-01-04 00:48:21 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | ee6b49b293 | Modernize tokenizer tests for emoticons | 2017-01-04 00:47:59 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | f09b5a5dfd | Modernize tokenizer tests for infixes | 2017-01-04 00:47:42 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 59059fed27 | Move regression test for #351 to own file | 2017-01-04 00:47:11 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 667051375d | Modernize tokenizer tests for whitespace | 2017-01-04 00:46:35 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | aafc894285 | Modernize tokenizer tests for contractions. Use @pytest.mark.parametrize. | 2017-01-03 23:02:21 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 1d237664af | Add lowercase lemma to tokenizer exceptions | 2017-01-03 23:02:21 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 84a87951eb | Fix typos | 2017-01-03 18:27:43 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 35b39f53c3 | Reorganise English tokenizer exceptions (as discussed in #718). Add logic to generate exceptions that follow a consistent pattern (like verbs and pronouns) and allow certain tokens to be excluded explicitly. | 2017-01-03 18:26:09 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | fb9d3bb022 | Revert "Merge remote-tracking branch 'origin/master'" This reverts commit d3b181cdf1, reversing changes made to b19cfcc144. | 2017-01-03 18:21:36 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 461cbb99d8 | Revert "Reorganise English tokenizer exceptions (as discussed in #718)" This reverts commit b19cfcc144. | 2017-01-03 18:21:29 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | d3b181cdf1 | Merge remote-tracking branch 'origin/master' (conflicts: spacy/en/tokenizer_exceptions.py) | 2017-01-03 18:20:01 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | b19cfcc144 | Reorganise English tokenizer exceptions (as discussed in #718). Add logic to generate exceptions that follow a consistent pattern (like verbs and pronouns) and allow certain tokens to be excluded explicitly. | 2017-01-03 18:17:57 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 1bd53bbf89 | Fix typos (resolves #718) | 2017-01-03 11:26:21 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | fde53be3b4 | Move whole token match inside _split_affixes. | 2016-12-30 17:11:50 -06:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3ba7c167a8 | Fix URL tests | 2016-12-30 17:10:08 -06:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9936a1b9b5 | Merge branch 'tokenization_w_exception_patterns' of https://github.com/oroszgy/spaCy.hu into oroszgy-tokenization_w_exception_patterns | 2016-12-30 14:53:40 -06:00 |  | 
			
				
					| 
							
							
								 Magnus Burton | 56e2219b65 | Added Swedish city abbreviations | 2016-12-30 21:17:34 +01:00 |  | 
			
				
					| 
							
							
								 Magnus Burton | e935c950d8 | Added months and days as abbreviations for Swedish | 2016-12-30 21:08:44 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3e8d9c772e | Test interaction of token_match and punctuation. Check that the new token_match function applies after punctuation is split off. | 2016-12-31 00:52:17 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 74b921f394 | Merge branch 'master' of ssh://github.com/explosion/spaCy into develop | 2016-12-30 14:38:27 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 623d94e14f | Whitespace | 2016-12-31 00:30:28 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | af81ac8bb0 | Use thinc 6.0 | 2016-12-29 11:58:42 +01:00 |  | 
			
				
					| 
							
							
								 Petter Hohle | f112e7754e | Add PART to tag map. 16 of the 17 PoS tags in the UD tag set are added; PART is missing. | 2016-12-28 18:39:01 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f62db78dc3 | Increment version | 2016-12-27 21:11:22 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | cade536d1e | Merge branch 'master' of ssh://github.com/explosion/spaCy | 2016-12-27 21:04:10 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ce4539dafd | Allow the vocabulary to grow to 10,000, to prevent cold-start problem. | 2016-12-27 21:03:45 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | ad3669cef5 | Merge pull request #703 from magnusburton/master: Added Swedish abbreviations | 2016-12-27 01:01:49 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 78f754dd9a | Merge pull request #705 from oroszgy/hu_tokenizer: Initial support for Hungarian | 2016-12-27 00:48:13 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 8785706039 | Reformat stop words for better readability | 2016-12-24 00:58:40 +01:00 |  | 
			
				
					| 
							
							
								 Gyorgy Orosz | 45e045a87b | Unicode/UTF8 compatibility for Python2 | 2016-12-24 00:21:00 +01:00 |  | 
			
				
					| 
							
							
								 Gyorgy Orosz | 72b61b6d03 | Typo fix. | 2016-12-24 00:10:29 +01:00 |  | 
			
				
					| 
							
							
								 Gyorgy Orosz | 3a9be4d485 | Updated token exception handling mechanism to allow the usage of arbitrary functions as token exception matchers. | 2016-12-23 23:49:34 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 1436b9f15a | Fix formatting and consistency | 2016-12-23 21:36:01 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 1d64527727 | Update Spanish tokenizer. Remove reflexive pronouns as they're part of an open class, fix mistakes and add exceptions | 2016-12-23 21:36:01 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 7f411fd01c | Remove exceptions containing whitespace / no special chars | 2016-12-23 14:30:06 +01:00 |  | 
			
				
					| 
							
							
								 Magnus Burton | fdf4776262 | Added Swedish abbreviations | 2016-12-22 22:45:18 +01:00 |  | 
			
				
					| 
							
							
								 Gyorgy Orosz | d9c59c4751 | Maintaining backward compatibility. | 2016-12-21 23:30:49 +01:00 |  | 
			
				
					| 
							
							
								 Gyorgy Orosz | 1748549aeb | Added exception pattern mechanism to the tokenizer. | 2016-12-21 23:16:19 +01:00 |  | 
			
				
					| 
							
							
								 Gyorgy Orosz | 35aa54765d | Hungarian module is exposed in spacy. | 2016-12-21 20:45:36 +01:00 |  | 
			
				
					| 
							
							
								 Gyorgy Orosz | ab2f6ea46c | Removed data files from tests. | 2016-12-21 20:22:09 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 3c87c71d43 | Add tokenizer exceptions for a.m. and p.m. in Spanish | 2016-12-21 18:19:10 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 78e63dc7d0 | Update tokenizer exceptions for English | 2016-12-21 18:06:34 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 702d1eed93 | Update tokenizer exceptions for German | 2016-12-21 18:06:27 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | d60380418e | Update tokenizer exceptions for Spanish | 2016-12-21 18:06:17 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 920fa0fed2 | Add DET_LEMMA constant | 2016-12-21 18:05:41 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 8978806ea6 | Allow Vocab to load without serializer_freqs | 2016-12-21 18:05:23 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | be8ed811f6 | Remove trailing whitespace | 2016-12-21 18:04:41 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 926e19184a | Merge pull request #695 from magnusburton/master: Added Swedish morph rules | 2016-12-21 01:06:00 +01:00 |  | 
			
				
					| 
							
							
								 Gyorgy Orosz | 3d5306acb9 | Added further testcases. | 2016-12-20 23:49:35 +01:00 |  | 
			
				
					| 
							
							
								 Gyorgy Orosz | 23956e72ff | Improved partial support for tokenizing Hungarian numbers | 2016-12-20 23:36:59 +01:00 |  | 
			
				
					| 
							
							
								 Gyorgy Orosz | 6add156075 | Refactored language data structure | 2016-12-20 22:28:20 +01:00 |  | 
			
				
					| 
							
							
								 Gyorgy Orosz | 366b3f8685 | Merge branch 'master' into hu_tokenizer | 2016-12-20 20:53:31 +01:00 |  | 
			
				
					| 
							
							
								 Gyorgy Orosz | c035928156 | Partial Hungarian number tokenization is added. | 2016-12-20 20:46:20 +01:00 |  | 
			
				
					| 
							
							
								 JM | 70ff0639b5 | Fixed missing vec_path declaration that was failing if 'add_vectors' was set. Added vec_path variable declaration to avoid accessing it before assignment in case 'add_vectors' is in overrides. | 2016-12-20 18:21:05 +01:00 |  | 
			
				
					| 
							
							
								 Magnus Burton | 48dcc9f647 | Added morph rules | 2016-12-20 13:18:41 +01:00 |  | 
			
				
					| 
							
							
								 Magnus Burton | db5a077d2b | Initial commit for Swedish | 2016-12-20 11:05:06 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3f5747a9b2 | Merge branch 'master' of ssh://github.com/explosion/spaCy | 2016-12-18 23:44:22 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 40e71586d6 | Fix Issue #683: Add 'SP' to tag_map, if it's not there already, within the Morphology class. | 2016-12-18 23:44:05 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | fa1d23e10d | Merge branch 'master' of https://github.com/explosion/spaCy | 2016-12-18 23:32:03 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f38eb25fe1 | Fix test for word vector | 2016-12-18 23:31:55 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 4e68abebc4 | Merge branch 'master' of ssh://github.com/explosion/spaCy | 2016-12-18 23:19:45 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5a6328a5a4 | Increment version | 2016-12-18 23:19:19 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 13a0b31279 | Another tweak to GloVe path hackery. | 2016-12-18 23:12:49 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 2c6228565e | Fix vector loading re glove hack | 2016-12-18 23:06:44 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 618b50a064 | Fix issue #684: GloVe vectors not loaded in spacy.en.English. | 2016-12-18 22:46:31 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 404019ad2f | Fix issue #672: ent_iob_ was a string, not unicode, due to missing unicode_literals statement. | 2016-12-18 22:33:53 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 2ef9d53117 | Untested fix for issue #684: GloVe vectors hack should be inserted in English, not in spacy.load. | 2016-12-18 22:29:31 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | c065359459 | Fix path-override bug in spacy.load | 2016-12-18 22:15:29 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 813249f826 | Work on morphology class. Still not fully consistent with rest of library. | 2016-12-18 17:35:22 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3679fb43a3 | Fix loading of lemmatizer | 2016-12-18 17:34:09 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3980f1b0cb | Ignore more morphology attributes in deprecated mode of intify_attrs | 2016-12-18 17:33:46 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 7a98ee5e5a | Merge language data change | 2016-12-18 17:03:52 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e4c951c153 | Merge branch 'organize-language-data' of ssh://github.com/explosion/spaCy into organize-language-data | 2016-12-18 17:01:08 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | b99d683a93 | Fix formatting | 2016-12-18 16:58:28 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | b11d8cd3db | Merge remote-tracking branch 'origin/organize-language-data' into organize-language-data | 2016-12-18 16:57:12 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | d1c1d3f9cd | Fix tokenizer test | 2016-12-18 16:55:32 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 753068f1d5 | Use base language data as default | 2016-12-18 16:55:25 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | bcc1d50d09 | Remove trailing whitespace | 2016-12-18 16:54:52 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 4e95737c6c | Add base tag map | 2016-12-18 16:54:28 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 2b2ea8ca11 | Reorganise language data | 2016-12-18 16:54:19 +01:00 |  |