| Ines Montani | 959c46eabe | Merge pull request #1365 from wannaphongcom/develop: Add Thai language for spaCy v2 | 2017-09-26 23:43:05 +02:00 |
| Wannaphong Phatthiyaphaibun | 3d5046c499 | fix import in th | 2017-09-26 22:41:20 +07:00 |
| Wannaphong Phatthiyaphaibun | a63f790b8c | fix thai tag_map | 2017-09-26 22:28:57 +07:00 |
| Wannaphong Phatthiyaphaibun | 2ea27d07f4 | fix tokenizer_exceptions in thai | 2017-09-26 22:14:47 +07:00 |
| Wannaphong Phatthiyaphaibun | a2bf4cc7bf | fix newline in file | 2017-09-26 21:49:43 +07:00 |
| ines | bb5c631402 | Implement like_num getter for French (via #1161) | 2017-09-26 16:47:45 +02:00 |
| ines | 15479b3bae | Add comment to like_num re: future work | 2017-09-26 16:43:28 +02:00 |
| ines | adda08fe14 | Implement like_num getter for Dutch (via #1177) | 2017-09-26 16:39:15 +02:00 |
| ines | 5ee10379db | Port over changes from #1340 | 2017-09-26 16:38:08 +02:00 |
| Wannaphong Phatthiyaphaibun | 5cba67146c | add thai in spacy2 | 2017-09-26 21:36:27 +07:00 |
| ines | 10d291f129 | Port over change from #1351 | 2017-09-26 16:11:41 +02:00 |
| ines | ece30c28a8 | Don't split hyphenated words in German. This way, the tokenizer matches the tokenization in German treebanks | 2017-09-16 20:40:15 +02:00 |
| Ines Montani | bd3da3d6fb | Port over change from #1323 and tidy up | 2017-09-14 19:23:13 +02:00 |
| Matthew Honnibal | b29e6bff46 | Improve lemmatization rule for am|VBP | 2017-09-04 15:18:10 +02:00 |
| Matthew Honnibal | 2e28982e28 | Merge pull request #1288 from geovedi/indonesian: Indonesian language support | 2017-08-26 21:31:13 +02:00 |
| Matthew Honnibal | cfc055734e | Split % in units, for compatibility with corpus | 2017-08-25 20:03:37 -05:00 |
| Jim Geovedi | 58d8078971 | Merge remote-tracking branch 'upstream/develop' into indonesian | 2017-08-25 09:21:49 +08:00 |
| Matthew Honnibal | bb2541ffd3 | Fix PROB attr for OOV words | 2017-08-23 12:11:52 +02:00 |
| ines | a68dc891ea | Port over changes from #1281 | 2017-08-21 23:19:18 +02:00 |
| Jim Geovedi | f77443ab68 | reworked | 2017-08-20 13:43:21 +07:00 |
| Jim Geovedi | b7d83f37c8 | indonesian abbr. | 2017-08-20 12:16:50 +07:00 |
| Jim Geovedi | 7193c47f0b | direct lookup | 2017-08-20 11:57:52 +07:00 |
| Jim Geovedi | fdf802d505 | added examples | 2017-08-20 11:57:10 +07:00 |
| Jim Geovedi | fa544e6c9a | Merge remote-tracking branch 'upstream/develop' into indonesian | 2017-08-20 11:49:40 +07:00 |
| ines | 1fe5e1a4d1 | Add language example sentences (see #1107): da, de, en, es, fr, he, it, nb, pl, pt, sv | 2017-08-19 12:22:29 +02:00 |
| Jim Geovedi | 37f19f5ed2 | added more currencies based on corpus data | 2017-08-03 13:03:25 +07:00 |
| Jim Geovedi | 30fd068d42 | hashtag prefix should be handled somewhere else | 2017-08-03 13:03:02 +07:00 |
| Jim Geovedi | ba07e23c87 | added USD in currency rules | 2017-08-02 22:42:47 +07:00 |
| Jim Geovedi | bb08d696f9 | added hashtag rule and fixed currency rules | 2017-07-30 21:23:28 +07:00 |
| Jim Geovedi | e9af79a803 | added u-\d+ rules (sports team) | 2017-07-30 21:23:01 +07:00 |
| Jim Geovedi | e5adc26c72 | simplified rules | 2017-07-29 18:21:32 +07:00 |
| Jim Geovedi | 4d04898dea | updated regexp | 2017-07-29 17:44:57 +07:00 |
| Jim Geovedi | 7d96d477ea | updated like_num | 2017-07-29 17:44:46 +07:00 |
| Jim Geovedi | 3cca4ed798 | added lex attrs rules | 2017-07-29 17:22:21 +07:00 |
| Jim Geovedi | 8b814c63f1 | more exceptions | 2017-07-27 19:46:30 +07:00 |
| Jim Geovedi | 6c725e8dcf | updated lemma | 2017-07-27 19:46:21 +07:00 |
| Jim Geovedi | 547973b92a | wip syntax iterators | 2017-07-27 10:51:34 +07:00 |
| Jim Geovedi | bbc75da38d | enable syntax iterator and lemma lookup | 2017-07-27 10:51:15 +07:00 |
| Jim Geovedi | 24a8c8bf28 | added wip lemma dict | 2017-07-26 21:39:54 +07:00 |
| Jim Geovedi | 63f14ba46b | added hyphen-suffix rules | 2017-07-26 19:28:57 +07:00 |
| Jim Geovedi | f288964441 | removed -el from suffix rules | 2017-07-26 19:28:38 +07:00 |
| Jim Geovedi | 6eee7a7411 | updated tokenizer exceptions | 2017-07-26 19:13:47 +07:00 |
| Jim Geovedi | edec51b1b1 | update punctuation rules | 2017-07-26 19:13:36 +07:00 |
| Jim Geovedi | 62443d495a | enable token match | 2017-07-26 19:13:14 +07:00 |
| Jim Geovedi | c97f5ae0bb | updated tokenizer exceptions | 2017-07-26 19:12:52 +07:00 |
| Jim Geovedi | 73f6ac9d9b | added hyhen | 2017-07-24 15:56:31 +07:00 |
| Jim Geovedi | 68454c40bf | added missing import | 2017-07-24 14:12:34 +07:00 |
| Jim Geovedi | eaf9cbd708 | cursed of copy & paste | 2017-07-24 14:11:51 +07:00 |
| Jim Geovedi | 7aad6718bc | enable tokenizer exceptions | 2017-07-24 14:11:10 +07:00 |
| Jim Geovedi | ad56c9179a | added tokenizer exceptions list | 2017-07-24 14:10:16 +07:00 |