Author | Commit | Message | Date
Matthew Honnibal | dc393a5f1d | Merge pull request #126 from tomtung/master: Improve slicing support for both Doc and Span | 2015-10-10 14:14:57 +11:00
Matthew Honnibal | 83dccf0fd7 | * Use io module insteads of deprecated codecs module | 2015-10-10 14:13:01 +11:00
Yubing (Tom) Dong | 3fd3bc79aa | Refactor to remove duplicate slicing logic | 2015-10-07 01:25:35 -07:00
alvations | 8199012d26 | changing deprecated codecs.open to io.open =) | 2015-09-30 20:10:15 +02:00
Matthew Honnibal | 6ab1696b15 | * Remove read_encoding_freqs from util.py | 2015-07-23 01:17:32 +02:00
Matthew Honnibal | 317cbbc015 | * Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time. | 2015-07-19 15:18:17 +02:00
Jordan Suchow | 3a8d9b37a6 | Remove trailing whitespace | 2015-04-19 13:01:38 -07:00
Jordan Suchow | 5f0f940a1f | Remove unused imports | 2015-04-19 01:05:22 -07:00
Matthew Honnibal | 3f1944d688 | * Make PyPy work | 2015-01-05 17:54:38 +11:00
Matthew Honnibal | f5d41028b5 | * Move around data files for test release | 2015-01-03 01:59:22 +11:00
Matthew Honnibal | e1c1a4b868 | * Tmp | 2014-12-21 05:36:29 +11:00
Matthew Honnibal | b962fe73d7 | * Make suffixes file use full-power regex, so that we can handle periods properly | 2014-12-09 19:04:27 +11:00
Matthew Honnibal | 302e09018b | * Work on fixing special-cases, reading them in as JSON objects so that they can specify lemmas | 2014-12-09 14:48:01 +11:00
Matthew Honnibal | ea8f1e7053 | * Tighten interfaces | 2014-10-30 18:14:42 +11:00
Matthew Honnibal | 67c8c8019f | * Update lexeme serialization, using a binary file format | 2014-10-30 01:01:00 +11:00
Matthew Honnibal | 43d5964e13 | * Add function to read detokenization rules | 2014-10-22 12:54:59 +11:00
Matthew Honnibal | 12742f4f83 | * Add detokenize method and test | 2014-10-18 18:07:29 +11:00
Matthew Honnibal | 6fb42c4919 | * Add offsets to Tokens class. Some changes to interfaces, and reorganization of spacy.Lang | 2014-10-14 16:17:45 +11:00
Matthew Honnibal | e40caae51f | * Update Lexicon class to expect a list of lexeme dict descriptions | 2014-10-09 14:51:35 +11:00
Matthew Honnibal | 2e44fa7179 | * Add util.py | 2014-09-25 18:26:22 +02:00
Matthew Honnibal | e9a62b6eba | * Refactoring with Lexeme as a class now compiles. Basic design seems to work | 2014-08-27 17:15:39 +02:00
Matthew Honnibal | d10993f41a | * More docs work | 2014-08-21 16:37:13 +02:00
Matthew Honnibal | 3379d7a571 | * Reforming data model for lexemes | 2014-08-19 02:40:37 +02:00
Matthew Honnibal | 01469b0888 | * Refactor spacy so that chunks return arrays of lexemes, so that there is properly one lexeme per word. | 2014-08-18 19:14:00 +02:00
Matthew Honnibal | ff1869ff07 | * Fixed major efficiency problem, from not quite grokking pass by reference in cython c++ | 2014-07-07 07:36:43 +02:00
Matthew Honnibal | 25849fc926 | * Generalize tokenization rules to capitals | 2014-07-07 05:07:21 +02:00
Matthew Honnibal | 4e79446dc2 | * Reading in tokenization rules correctly. Passing tests. | 2014-07-07 00:02:55 +02:00
Matthew Honnibal | 556f6a18ca | * Initial commit. Tests passing for punctuation handling. Need contractions, file transport, tokenize function, etc. | 2014-07-05 20:51:42 +02:00