| Author | Commit | Message | Date |
|---|---|---|---|
| ines | 9e83513004 | Add position of invalid token to error message | 2018-03-27 23:56:59 +02:00 |
| ines | 693971dd8f | Improve error message if token text is empty string (see #2101) | 2018-03-27 22:25:40 +02:00 |
| ines | 0c829e6605 | Fix whitespace | 2018-03-27 22:20:59 +02:00 |
| Thomas Opsomer | 515e25910e | fix sent_start in serialization | 2018-01-28 19:50:42 +01:00 |
| Matthew Honnibal | 56164ab688 | Set l_edge and r_edge correctly for non-projective parses. Fixes #1799 | 2018-01-22 20:18:04 +01:00 |
| Matthew Honnibal | ccb51a9f36 | Make .similarity() return 1.0 if all orth attrs match | 2018-01-15 16:29:48 +01:00 |
| Matthew Honnibal | ab7c45b12d | Fix error message and handling of doc.sents | 2018-01-15 15:21:11 +01:00 |
| Matthew Honnibal | e10e9ad2c5 | Improve efficiency of Doc.to_array | 2017-11-23 12:33:27 +00:00 |
| Matthew Honnibal | fa62427300 | Remove lookup-based lemmatization | 2017-11-23 12:32:22 +00:00 |
| ines | 1c218397f6 | Ensure path in Doc.to_disk/from_disk (resolves #1521). Also add Doc serialization tests with both Path and string path options | 2017-11-09 02:29:03 +01:00 |
| Matthew Honnibal | 144a93c2a5 | Back-off to tensor for similarity if no vectors | 2017-11-03 20:56:33 +01:00 |
| Matthew Honnibal | 62ed58935a | Add Doc.extend_tensor() method | 2017-11-03 11:20:31 +01:00 |
| ines | 9659391944 | Update deprecated methods and add warnings | 2017-11-01 16:49:42 +01:00 |
| ines | 705a4e3e4a | Fix formatting | 2017-11-01 16:44:08 +01:00 |
| Matthew Honnibal | 7e7116cdf7 | Fix Doc.to_array when only one string attr provided | 2017-11-01 13:26:43 +01:00 |
| ines | 544a407b93 | Tidy up Doc, Token and Span and add missing docs | 2017-10-27 17:07:26 +02:00 |
| ines | 6a0483b7aa | Tidy up and document Doc, Token and Span | 2017-10-27 15:41:45 +02:00 |
| Matthew Honnibal | ccd2ab1a62 | Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix: Add LCA matrix for spans and docs | 2017-10-24 11:22:46 +02:00 |
| Ramanan Balakrishnan | d2fe56a577 | Add LCA matrix for spans and docs | 2017-10-20 23:58:00 +05:30 |
| Ramanan Balakrishnan | 0726946563 | cleanup to_array implementation using fixes on master | 2017-10-20 17:09:37 +05:30 |
| Ramanan Balakrishnan | b3ab124fc5 | Support strings for attribute list in doc.to_array | 2017-10-20 11:46:57 +05:30 |
| Ramanan Balakrishnan | 7b9b1be44c | Support single value for attribute list in doc.to_array | 2017-10-19 17:00:41 +05:30 |
| Matthew Honnibal | 394633efce | Make doc pickling support hooks | 2017-10-17 19:44:09 +02:00 |
| Matthew Honnibal | cdb0c426d8 | Improve deserialization of user_data, esp. for Underscore | 2017-10-17 19:29:20 +02:00 |
| Matthew Honnibal | 32a8564c79 | Fix doc pickling | 2017-10-17 18:20:24 +02:00 |
| Matthew Honnibal | 92c1eb2d6f | Fix Doc pickling. This also removes need for Binder class | 2017-10-17 16:11:13 +02:00 |
| Matthew Honnibal | a002264fec | Remove caching of Token in Doc, as caused cycle. | 2017-10-16 19:34:21 +02:00 |
| ines | e0ff145a8b | Merge branch 'develop' into feature/dot-underscore | 2017-10-11 11:57:05 +02:00 |
| Matthew Honnibal | 3b527fa52b | Call morphology.assign_untagged when pushing token to Doc | 2017-10-11 03:23:57 +02:00 |
| Matthew Honnibal | e0a9b02b67 | Merge Span._ and Span.as_doc methods | 2017-10-09 22:00:15 -05:00 |
| Matthew Honnibal | e938bce320 | Adjust parsing transition system to allow preset sentence segments. | 2017-10-08 23:53:34 +02:00 |
| Matthew Honnibal | 668a0ea640 | Pass extensions into Underscore class | 2017-10-07 18:56:01 +02:00 |
| ines | 2480f8f521 | Add missing return in Doc.from_disk() (closes #1330) | 2017-09-18 15:32:00 +02:00 |
| Matthew Honnibal | 03b5b9727a | Fix Doc.vector for empty doc objects | 2017-08-22 19:52:19 +02:00 |
| Matthew Honnibal | 0551b7b03a | Fix doc.vector | 2017-08-22 19:46:52 +02:00 |
| Matthew Honnibal | 8b7ac77c23 | Allow span label to be string in Doc.char_span | 2017-08-19 16:18:09 +02:00 |
| Matthew Honnibal | 80236116a6 | Add Doc.char_span method, to get a span by character offset | 2017-08-19 12:21:09 +02:00 |
| Matthew Honnibal | a6a2159969 | Add slot for text categories to Doc | 2017-07-22 00:34:15 +02:00 |
| Matthew Honnibal | 2a3bd5ee90 | Fix fetching of noun chunk iterator | 2017-06-04 15:53:05 -05:00 |
| Matthew Honnibal | 92ae36f84e | Improve way noun chunks iterator is looked up | 2017-06-04 21:53:39 +02:00 |
| Matthew Honnibal | 675f448313 | Fix vector linkage on Doc | 2017-06-04 14:25:30 -05:00 |
| ines | 459a1e8470 | Fix whitespace | 2017-06-03 11:31:18 +02:00 |
| ines | 5109bba910 | Port over fix from #1070 | 2017-06-03 11:31:11 +02:00 |
| Matthew Honnibal | 498ad85309 | Try using tensor for vector/similarity methods | 2017-05-30 23:35:17 +02:00 |
| Matthew Honnibal | 4ddff020c3 | Fix compile error | 2017-05-28 23:30:40 +02:00 |
| Matthew Honnibal | 6d3caeadd2 | Fix type check for long | 2017-05-28 23:22:45 +02:00 |
| Matthew Honnibal | 7996d21717 | Fixes for new StringStore | 2017-05-28 11:09:27 -05:00 |
| Matthew Honnibal | fe11564b8e | Finish stringstore change. Also xfail vectors tests | 2017-05-28 15:10:22 +02:00 |
| Matthew Honnibal | 84e66ca6d4 | WIP on stringstore change. 27 failures | 2017-05-28 14:06:40 +02:00 |
| ines | 66088851dc | Add Doc.to_disk() and Doc.from_disk() methods | 2017-05-24 11:58:17 +02:00 |
| Matthew Honnibal | d44b1eafc4 | Fix conflict artefacts | 2017-05-23 18:47:11 +02:00 |
| Matthew Honnibal | d68dd1f251 | Add SENT_START attribute, for custom sentence boundary detection | 2017-05-23 18:37:58 +02:00 |
| ines | 23f9a3ccc8 | Update docstrings and API docs for Doc | 2017-05-19 18:47:39 +02:00 |
| ines | 8455cb1327 | Update docstring for Doc.__getitem__ | 2017-05-19 00:30:51 +02:00 |
| ines | b687ad109d | Update docstrings and API docs for Doc class | 2017-05-18 23:59:44 +02:00 |
| ines | b87066ff10 | Update docstrings and API docs for Doc class | 2017-05-18 22:17:41 +02:00 |
| ines | 9d85cda8e4 | Fix models error message and use about.__docs_models__ (see #1051) | 2017-05-13 13:05:47 +02:00 |
| ines | 6b942763f0 | Tidy up imports | 2017-05-13 13:04:40 +02:00 |
| ines | b9dea345e5 | Remove old import | 2017-05-13 12:32:11 +02:00 |
| ines | 293ee359c5 | Fix formatting | 2017-05-13 12:32:06 +02:00 |
| Matthew Honnibal | ee1d35bdb0 | Fix merge conflict | 2017-05-13 03:20:19 +02:00 |
| Matthew Honnibal | b2540d2379 | Merge Kengz's tree_print patch | 2017-05-13 03:18:49 +02:00 |
| Matthew Honnibal | 4efb391994 | Fix serializer | 2017-05-09 18:45:18 +02:00 |
| Matthew Honnibal | 1166b0c491 | Implement Doc.to_bytes and Doc.from_bytes methods | 2017-05-09 18:11:34 +02:00 |
| Matthew Honnibal | 9e167b7bb6 | Strip serializer from code | 2017-05-09 17:28:50 +02:00 |
| ines | 0739ae7b76 | Tidy up and fix formatting and imports | 2017-04-15 13:05:15 +02:00 |
| ines | e71a1f4bd0 | Fix download commands in error messages (see #946) | 2017-04-01 10:20:57 +02:00 |
| Matthew Honnibal | 51882ee2b8 | Fix check for setting ent_id in merge | 2017-03-31 19:32:01 +02:00 |
| Matthew Honnibal | 9720103428 | Improve attribute handling in doc.merge(). Still unsatisfying | 2017-03-31 13:59:58 +02:00 |
| Matthew Honnibal | 0fefdfcbda | Merge pull request #935 from ericzhao28/master: Add option to use label=ent_type in doc.merge arguments (Bug fix for issue #862) | 2017-03-30 02:51:24 +02:00 |
| Eric Zhao | aafdf6ffb8 | Add option to use label karg to determine ent_type in doc.merge | 2017-03-28 23:35:03 -07:00 |
| Roman Inflianskas | 66e1109b53 | Add support for Universal Dependencies v2.0 | 2017-03-03 13:17:34 +01:00 |
| Matvey Ezhov | 32a22291bc | Small Doc.count_by documentation update. Current example doesn't work | 2017-01-31 19:18:45 +03:00 |
| Matthew Honnibal | 6c665b81df | Fix redundant == TAG in from_array conditional | 2017-01-31 00:46:21 +11:00 |
| Matthew Honnibal | 44e2b0100d | Support TAG attribute in doc.from_array | 2017-01-10 22:47:07 +01:00 |
| kengz | 73a38bd4d1 | Merge remote-tracking branch 'upstream/master' | 2016-12-30 12:19:59 -05:00 |
| kengz | da44183ae1 | move parse_tree logic to a new tokens/printers.py file | 2016-12-30 12:19:18 -05:00 |
| Pokey Rule | 3e3bda142d | Add noun_chunks to Span | 2016-11-24 10:47:20 +00:00 |
| Matthew Honnibal | 1fb09c3dc1 | Fix morphology tagger | 2016-11-04 19:19:09 +01:00 |
| Matthew Honnibal | f292f7f0e6 | Fix Issue #599, by considering empty documents to be parsed and tagged. Implementation is a bit dodgy. | 2016-11-02 23:48:43 +01:00 |
| Matthew Honnibal | e7af6b937f | Fix syntax error while fixing doc strings | 2016-11-01 13:27:32 +01:00 |
| Matthew Honnibal | b86f8af0c1 | Fix doc strings | 2016-11-01 12:25:36 +01:00 |
| Matthew Honnibal | 4ca31b4d87 | Fix clobbering of 'missing' named ent values after assigning ents. | 2016-10-26 13:13:56 +02:00 |
| Matthew Honnibal | 15c9b59f0e | Fix Issue #461: O tag was being clobbered by doc.ents.__set__ | 2016-10-23 15:50:26 +02:00 |
| Matthew Honnibal | 2c3a67b693 | Fix calculation of vector norm, re Issue #522. Need to consolidate the calculations into a helper function. | 2016-10-23 14:49:31 +02:00 |
| Matthew Honnibal | 3588a18fb8 | Fix hook names in doc | 2016-10-19 21:15:16 +02:00 |
| Matthew Honnibal | 5d5742b773 | Add sentiment field to doc, rename getters_for_tokens and getters_for_spans, add user_hooks field to Doc. | 2016-10-19 20:54:22 +02:00 |
| Matthew Honnibal | 9b60186266 | Fix doc class | 2016-10-17 15:23:47 +02:00 |
| Matthew Honnibal | b67697a97b | Improve API for doc.merge() and span.merge(), to use keyword arguments. | 2016-10-17 14:02:13 +02:00 |
| Matthew Honnibal | fbb7f3f15c | Add user_data attribute to Doc object. | 2016-10-17 11:43:22 +02:00 |
| Matthew Honnibal | 62230dd13a | Add getters_for_spans and getters_for_tokens attributes to Doc. Fix docstring | 2016-10-17 02:42:51 +02:00 |
| Matthew Honnibal | 311a985fe0 | Add input error handling in Doc | 2016-10-16 18:16:42 +02:00 |
| Matthew Honnibal | 06322ba99d | Add words and spaces keyword arguments to Doc. | 2016-10-16 18:13:03 +02:00 |
| Matthew Honnibal | 6736977d82 | Revert "Changes to Doc and Token for new string store scheme". This reverts commit 99de44d864. | 2016-09-30 20:11:15 +02:00 |
| Matthew Honnibal | 99de44d864 | Changes to Doc and Token for new string store scheme | 2016-09-30 20:00:21 +02:00 |
| Matthew Honnibal | d3dc5718b2 | Fix syntax error in Doc | 2016-09-28 11:39:49 +02:00 |
| Matthew Honnibal | 1b520e7bab | Improve docstrings for Doc object | 2016-09-28 11:15:13 +02:00 |
| Matthew Honnibal | fc4a7ad794 | Test and fix Issue #411: IndexError when .sents property is used on empty string. | 2016-09-27 18:49:14 +02:00 |
| Matthew Honnibal | 15e42a1ba9 | Allow entities to be set by Span, or by 4-tuple (with entity ID) | 2016-09-24 01:17:43 +02:00 |
| Matthew Honnibal | 2735b6247b | Fix orths_and_spaces in Doc.__init__ | 2016-09-21 14:52:05 +02:00 |