Matthew Honnibal | 18063803de | Make TokenC.sent_tart an int, to allow ternary value | 2017-10-08 19:58:54 +02:00
Matthew Honnibal | 84e66ca6d4 | WIP on stringstore change. 27 failures | 2017-05-28 14:06:40 +02:00
Matthew Honnibal | f51e6a6c16 | Adjust lexeme sizing for attr_t being 64 bit | 2017-05-28 12:51:09 +02:00
Matthew Honnibal | 3ea98e2043 | Remove vector member from lexeme | 2017-05-28 11:46:24 +02:00
Matthew Honnibal | 793430aa7a | Get spaCy train command working with neural network; * Integrate models into pipeline; * Add basic serialization (maybe incorrect); * Fix pickle on vocab | 2017-05-17 12:04:50 +02:00
Matthew Honnibal | 58e83fe34b | Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match. | 2016-09-21 14:54:55 +02:00
Wolfgang Seeker | 03fb498dbe | introduce lang field for LexemeC to hold language id; put noun_chunk logic into iterators.py for each language separately | 2016-03-10 13:01:34 +01:00
Matthew Honnibal | 9ec7b9c454 | * Clean up unused Constituent struct. | 2015-11-03 23:48:21 +11:00
Matthew Honnibal | 1e99fcd413 | * Rename .repvec to .vector in C API | 2015-11-03 23:47:59 +11:00
Matthew Honnibal | 7ac6cacc26 | * Remove const qualifier on LexemeC.repvec | 2015-09-15 14:42:51 +10:00
Matthew Honnibal | c2307fa9ee | * More work on language-generic parsing | 2015-08-28 02:02:33 +02:00
Matthew Honnibal | 1d7f2d3abc | * Hack on morphology structs | 2015-08-26 19:18:36 +02:00
Matthew Honnibal | 815bda201d | * Remove UniStr struct | 2015-07-22 13:39:17 +02:00
Matthew Honnibal | 128b6d9714 | * Move Utf8Str struct to strings module, as that's the only place it's relevant | 2015-07-20 12:06:41 +02:00
Matthew Honnibal | 4dddc8a69b | * Fix type declarations for attr_t. Remove unused id_t. | 2015-07-18 22:39:57 +02:00
Matthew Honnibal | 95e57c2780 | * Remove unnecessary key and id properties from Utf8String. | 2015-07-17 01:40:18 +02:00
Matthew Honnibal | aa82caf8f5 | * Add TokenC.spacy attr | 2015-07-13 19:48:07 +02:00
Matthew Honnibal | 1d3a592edf | * Remove the senses attr from LexemeC, to keep data compatibility | 2015-07-08 19:24:44 +02:00
Matthew Honnibal | e23d1582a2 | * Add supersense data to Lexeme objects. Add simple has_sense method to check the flag. | 2015-07-01 18:50:37 +02:00
Matthew Honnibal | a7bf7b0626 | * Rename sent_start to sent_end, to reflect its new usage in the Break transition | 2015-06-23 05:39:43 +02:00
Matthew Honnibal | 8ee7c541f1 | * Update Constituent definition | 2015-05-20 16:03:26 +02:00
Matthew Honnibal | 03a6626545 | * Tmp commit | 2015-05-12 20:27:56 +02:00
Matthew Honnibal | d2ac8d8007 | * Add ctnt field to State, in preparation for constituency parsing | 2015-05-12 20:27:56 +02:00
Matthew Honnibal | d634038eb6 | * Add l_edge and r_edge props in TokenC for tracking the parse-yield of the token | 2015-05-12 20:26:41 +02:00
Jordan Suchow | 3a8d9b37a6 | Remove trailing whitespace | 2015-04-19 13:01:38 -07:00
Matthew Honnibal | 8057a95f20 | * NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring. | 2015-03-26 16:44:44 +01:00
Matthew Honnibal | b3eda03c9c | * Tmp | 2015-03-26 16:44:44 +01:00
Matthew Honnibal | 135756ac3d | * Tmp commit of NER refactoring | 2015-03-26 16:44:42 +01:00
Matthew Honnibal | b139aa92ba | * Start setting out how NER will be implemented in the data model | 2015-03-26 16:44:41 +01:00
Matthew Honnibal | 75f9b7d6bf | * Add L2 norm field to LexemeC struct | 2015-02-07 08:43:17 -05:00
Matthew Honnibal | 08ca5c8970 | * Add sent_end flag to TokenC struct | 2015-01-31 13:44:16 +11:00
Matthew Honnibal | 12b034e3ef | * Move POS tag definitions to parts_of_speech.pxd | 2015-01-25 16:31:07 +11:00
Matthew Honnibal | fda94271af | * Rename NORM1 and NORM2 attrs to lower and norm | 2015-01-24 06:17:03 +11:00
Matthew Honnibal | 5ed8b2b98f | * Rename sic to orth | 2015-01-23 02:08:25 +11:00
Matthew Honnibal | 45264e356b | * Rename vec to repvec | 2015-01-22 02:04:24 +11:00
Matthew Honnibal | 6c7e44140b | * Work on word vectors, and other stuff | 2015-01-17 16:21:17 +11:00
Matthew Honnibal | 46da3d74d2 | * Tmp. Refactoring, introducing a Lexeme PyObject. | 2015-01-12 11:23:44 +11:00
Matthew Honnibal | ce2edd6312 | * Tmp commit. Refactoring to create a Python Lexeme class. | 2015-01-12 10:26:22 +11:00
Matthew Honnibal | b8b65903fc | * Tmp | 2014-12-24 17:42:00 +11:00
Matthew Honnibal | e1c1a4b868 | * Tmp | 2014-12-21 05:36:29 +11:00
Matthew Honnibal | 780cbd68b1 | * Move all struct definitions to structs.pxd, to avoid circular dependencies | 2014-12-20 06:51:33 +11:00