| 
							
							
								 ines | 66c1f194f9 | Use consistent unicode declarations | 2017-03-12 13:07:28 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ea2592879f | Merge branch 'master' of https://github.com/explosion/spaCy | 2017-03-11 11:13:37 -06:00 |  | 
			
				
					| 
							
							
								 ines | b04893a059 | Make regex locale-independent for Python 2 | 2017-03-10 14:21:57 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ea53647362 | Merge branch 'develop' | 2017-03-10 02:49:39 -06:00 |  | 
			
				
					| 
							
							
								 Dan Rapp | 3b1df3808d | Issue #840 - URL pattenr too broad | 2017-03-09 11:39:39 -07:00 |  | 
			
				
					| 
							
							
								 Roman Inflianskas | 66e1109b53 | Add support for Universal Dependencies v2.0 | 2017-03-03 13:17:34 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 012f4820cb | Keep infixes of punctuation + hyphens as one token (see #801) | 2017-02-02 16:22:40 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 1219a5f513 | Add = to tokenizer prefixes | 2017-02-02 16:21:11 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | ff04748eb6 | Add missing emoticon | 2017-02-02 16:21:00 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 116c675c3c | Merge pull request #742 from oroszgy/hu_tokenizer_fix Improved Hungarian tokenizer | 2017-01-14 23:52:44 +01:00 |  | 
			
				
					| 
							
							
								 Gyorgy Orosz | 63037e79af | Fixed hyphen handling in the Hungarian tokenizer. | 2017-01-14 16:30:11 +01:00 |  | 
			
				
					| 
							
							
								 Gyorgy Orosz | be7a7aeb1a | Reversed accidental changes. | 2017-01-14 15:59:36 +01:00 |  | 
			
				
					| 
							
							
								 Gyorgy Orosz | 1be5da1ac6 | Fixed Hungarian tokenizer for numbers | 2017-01-14 15:51:59 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 0894b8c0ef | Don't split tokens with digits and "/" infixes (resolves #740) | 2017-01-12 22:58:26 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | fba67fa342 | Fix Issue #736: Times were being tokenized with incorrect string values. | 2017-01-12 11:21:01 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | aa876884f0 | Revert "Revert "Merge remote-tracking branch 'origin/master'"" This reverts commit fb9d3bb022. | 2017-01-09 13:28:13 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | eef94e3ee2 | Split off period after two or more uppercase letters (fixes #483) | 2017-01-08 22:28:25 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 347c4a2d06 | Reorganise and reformat global tokenizer prefixes, suffixes and infixes | 2017-01-08 20:37:39 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 7c3cb2a652 | Add global abbreviations data | 2017-01-08 20:34:03 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | bc911322b3 | Move ") to emoticons (see Tweebo challenge test) | 2017-01-05 18:05:38 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | fb9d3bb022 | Revert "Merge remote-tracking branch 'origin/master'" This reverts commit d3b181cdf1, reversing
changes made tob19cfcc144. | 2017-01-03 18:21:36 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9936a1b9b5 | Merge branch 'tokenization_w_exception_patterns' of https://github.com/oroszgy/spaCy.hu into oroszgy-tokenization_w_exception_patterns | 2016-12-30 14:53:40 -06:00 |  | 
			
				
					| 
							
							
								 Petter Hohle | f112e7754e | Add PART to tag map 16 of the 17 PoS tags in the UD tag set is added; PART is missing. | 2016-12-28 18:39:01 +01:00 |  | 
			
				
					| 
							
							
								 Gyorgy Orosz | 3a9be4d485 | Updated token exception handling mechanism to allow the usage of arbitrary functions as token exception matchers. | 2016-12-23 23:49:34 +01:00 |  | 
			
				
					| 
							
							
								 Gyorgy Orosz | 1748549aeb | Added exception pattern mechanism to the tokenizer. | 2016-12-21 23:16:19 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 920fa0fed2 | Add DET_LEMMA constant | 2016-12-21 18:05:41 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 4e95737c6c | Add base tag map | 2016-12-18 16:54:28 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 2b2ea8ca11 | Reorganise language data | 2016-12-18 16:54:19 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | bc40dad7d9 | Add entity rules | 2016-12-18 15:36:53 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | eaa3b1319d | Fix formatting | 2016-12-18 15:36:53 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 62655fd36f | Add ENT_ID constant | 2016-12-18 15:36:53 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | f324311249 | Add global language data utils | 2016-12-17 12:27:41 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | e47ee94761 | Split punctuation into its own file | 2016-12-08 19:46:43 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | e8ae588be9 | Add emoticons | 2016-12-08 19:45:18 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 5908c0ed9f | Fix formatting | 2016-12-08 19:45:11 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 0d07d7fc80 | Apply emoticon exceptions to tokenizer | 2016-12-07 21:11:59 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 9413bcd9ee | Declare encoding and unicode literals | 2016-12-07 21:10:34 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | a280ff2657 | Fix __all__ | 2016-12-07 21:10:12 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | ba8721953c | Add missing emoticons | 2016-12-07 21:09:44 +01:00 |  | 
			
				
					| 
							
							
								 Ines Montani | 79dce0aabe | Add emoticons | 2016-12-07 20:33:28 +01:00 |  |