spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-02-03 21:24:11 +03:00

Author	SHA1	Message	Date
Jim O'Regan	08b0bfd153	merge	2017-10-31 22:55:59 +00:00
Jim O'Regan	00ecfa5417	Ó, not O	2017-10-31 22:54:42 +00:00
ines	ba2e6c8c6f	Update docstrings and formatting	2017-10-31 23:23:34 +01:00
Matthew Honnibal	0de8d213a3	Merge pull request #1475 from explosion/feature/sm-vectors Improve and simplify Vectors class	2017-10-31 22:59:50 +01:00
Ines Montani	25b1d6cd91	Fix syntax error	2017-10-31 22:36:03 +01:00
Matthew Honnibal	92dc127569	Fix test for Python 3	2017-10-31 22:21:55 +01:00
Jim O'Regan	fe4b10346a	replace example sentence until I get around to adding a punctuation.py	2017-10-31 20:24:53 +00:00
Matthew Honnibal	c5799ecc7b	Remove print statement	2017-10-31 21:12:33 +01:00
ines	7e424a1804	Don't copy exception dicts if not necessary and tidy up	2017-10-31 21:05:29 +01:00
Matthew Honnibal	c390f2d745	Make it easier to pass explicit no-pruning to vocab	2017-10-31 20:14:47 +01:00
Ines Montani	06c25a8882	Remove comma that caused list to wrap in tuple! Also removed extra dict wrappings for performance (we used to have them in there, but they should only really exist if copying the dict is absolutely necessary)	2017-10-31 20:13:16 +01:00
Matthew Honnibal	d90a22afe6	Fix loading previous vectors models	2017-10-31 19:58:35 +01:00
Ines Montani	147448b65b	Add missing symbols	2017-10-31 19:34:45 +01:00
Matthew Honnibal	997a61557a	Add vectors.n_keys property	2017-10-31 19:30:52 +01:00
Matthew Honnibal	8075726838	Restore vector usage in models	2017-10-31 19:21:17 +01:00
Matthew Honnibal	3659a807b0	Remove vector pruning arg from train CLI	2017-10-31 19:21:05 +01:00
Ines Montani	9b0de9fb43	Fix import of symbols (now nested one level lower)	2017-10-31 19:17:58 +01:00
Matthew Honnibal	59203a2e8a	Move vector pruning command into spacy vocab cli tool	2017-10-31 19:10:01 +01:00
Matthew Honnibal	77d8f5de9a	Revise and simplify Vectors class	2017-10-31 18:25:08 +01:00
Jim O'Regan	d4a8160c36	change quotes	2017-10-31 15:15:44 +00:00
Jim O'Regan	34ca59691b	no idea what is wrong here	2017-10-31 14:50:13 +00:00
Jim O'Regan	41dd29e48e	merge	2017-10-31 14:07:45 +00:00
Matthew Honnibal	cb5217012f	Fix vector remapping	2017-10-31 11:40:46 +01:00
Matthew Honnibal	9c11ee4a1c	WIP on vectors fixes	2017-10-31 11:22:56 +01:00
Matthew Honnibal	ce876c551e	Fix GPU usage	2017-10-31 02:33:34 +01:00
Matthew Honnibal	7698903617	Fix GPU usage	2017-10-31 02:33:16 +01:00
Matthew Honnibal	368fdb389a	WIP on refactoring and fixing vectors	2017-10-31 02:00:26 +01:00
Matthew Honnibal	4e3006cec7	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-30 19:44:58 +01:00
Matthew Honnibal	4112a991ec	Fix vector pruning	2017-10-30 19:44:40 +01:00
ines	ec657c1ddc	Update vocab docs and document Vocab.prune_vectors	2017-10-30 19:35:41 +01:00
ines	803e41bc66	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-30 18:39:51 +01:00
ines	8e02294241	Add vectors to Language.meta	2017-10-30 18:39:48 +01:00
ines	abf8aa05d3	Populate --create-meta defaults from file if available If meta.json is found in directory and user chooses to overwrite it, show existing data as defaults.	2017-10-30 18:39:38 +01:00
ines	ce98fa7934	Fix formatting	2017-10-30 18:38:55 +01:00
ines	98c35d2585	Fix spacy vocab command	2017-10-30 18:38:41 +01:00
Matthew Honnibal	e98451b5f7	Add -prune-vectors argument to spacy.cly.train	2017-10-30 18:00:10 +01:00
Matthew Honnibal	e026b29ea9	Add prune_vectors method to Vocab	2017-10-30 17:59:43 +01:00
Explosion Bot	d0cf12c8c7	Fix off-by-one error in vectors	2017-10-30 16:22:03 +01:00
Explosion Bot	05a1dd570e	Fix vocab script	2017-10-30 16:19:22 +01:00
Explosion Bot	b46bdce8d2	Add missing import	2017-10-30 16:18:10 +01:00
Explosion Bot	2d2cc294b4	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-30 16:15:05 +01:00
Explosion Bot	0fc1209421	Wire up new vocab command	2017-10-30 16:14:50 +01:00
Explosion Bot	aa64031751	Fix clear_vectors() method on Vocab	2017-10-30 16:09:04 +01:00
Explosion Bot	7b56b2f04b	Add Vocab.cfg attr, to hold stuff like oov probs	2017-10-30 16:08:50 +01:00
Explosion Bot	ab5d5ed880	Fix vectors.add()	2017-10-30 16:08:09 +01:00
Explosion Bot	41d0f1665a	Fix add_attrs for cluster	2017-10-30 16:07:50 +01:00
ines	5453821a9f	Update NER annotation scheme Add note on training data sources and include coarse-grained Wikipedia scheme	2017-10-30 13:53:49 +01:00
Explosion Bot	5ede7cec9b	Improve Lexeme.set_attrs method	2017-10-30 11:49:11 +01:00
Explosion Bot	72aea8f105	Update vectors.add() to allow setting keys to rows	2017-10-30 10:03:08 +01:00
Matthew Honnibal	c43cc5361d	Merge pull request #1467 from explosion/feature/better-parser 💫 Bug fixes to parser model (requires retraining)	2017-10-29 02:05:22 +02:00
ines	6c2d8d3b2a	Use shortcuts-nightly.json to resolve model shortcuts	2017-10-29 01:28:31 +02:00
Matthew Honnibal	a0c7dabb72	Fix bug in 8-token parser features	2017-10-28 23:01:35 +00:00
Matthew Honnibal	b713d10d97	Switch to 13 features in parser	2017-10-28 23:01:14 +00:00
Matthew Honnibal	3b91097321	Whitespace	2017-10-28 17:05:11 +00:00
Matthew Honnibal	6ef72864fa	Improve initialization for hidden layers	2017-10-28 17:05:01 +00:00
Matthew Honnibal	5414e2f14b	Use missing features in parser	2017-10-28 16:45:54 +00:00
Matthew Honnibal	df4803cc6d	Add learned missing values for parser	2017-10-28 16:45:14 +00:00
Matthew Honnibal	64e4ff7c4b	Merge 'tidy-up' changes into branch. Resolve conflicts	2017-10-28 13:16:06 +02:00
Explosion Bot	fb0c96f39a	Fix optimizer loading	2017-10-28 11:58:16 +02:00
Explosion Bot	b22e42af7f	Merge changes to parser and _ml	2017-10-28 11:52:10 +02:00
ines	d96e72f656	Tidy up rest	2017-10-27 21:07:59 +02:00
ines	a8e10f94e4	Tidy up Lexeme and update docs	2017-10-27 21:07:50 +02:00
ines	ba5e646219	Tidy up pipeline	2017-10-27 20:29:08 +02:00
ines	b4d226a3f1	Tidy up syntax	2017-10-27 19:45:57 +02:00
ines	5167a0cce2	Tidy up Vectors and docs	2017-10-27 19:45:19 +02:00
ines	7946464742	Remove spacy.tagger (now in pipeline)	2017-10-27 19:45:04 +02:00
ines	9c89e2cdef	Remove unused syntax iterators (now in language data)	2017-10-27 18:09:53 +02:00
ines	d2df81d907	Fix not implemented Span getters	2017-10-27 18:09:28 +02:00
ines	544a407b93	Tidy up Doc, Token and Span and add missing docs	2017-10-27 17:07:26 +02:00
ines	a6135336f5	Tidy up gold	2017-10-27 17:02:55 +02:00
ines	6a0483b7aa	Tidy up and document Doc, Token and Span	2017-10-27 15:41:45 +02:00
ines	1a559d4c95	Remove old, unused file	2017-10-27 15:34:35 +02:00
ines	91899d337b	Tidy up language, lemmatizer and scorer	2017-10-27 14:40:14 +02:00
ines	778212efea	Tidy up init and main	2017-10-27 14:39:51 +02:00
ines	e33b7e0b3c	Tidy up parser and ML	2017-10-27 14:39:30 +02:00
ines	e3265998c0	Tidy up displaCy	2017-10-27 14:39:19 +02:00
ines	ea4a41c8fb	Tidy up util and helpers	2017-10-27 14:39:09 +02:00
ines	d941fc3667	Tidy up CLI	2017-10-27 14:38:39 +02:00
Matthew Honnibal	531142a933	Merge remote-tracking branch 'origin/develop' into feature/better-parser	2017-10-27 12:34:48 +00:00
Matthew Honnibal	19a2b9bf27	Fix import of Optimizer	2017-10-27 12:33:42 +00:00
Matthew Honnibal	4d048e94d3	Add compat for thinc.neural.optimizers.Optimizer	2017-10-27 10:23:49 +00:00
Ines Montani	4033e70c71	Merge pull request #1461 from explosion/feature/disable-pipes 💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples	2017-10-27 12:21:40 +02:00
Matthew Honnibal	75a637fa43	Remove redundant imports from _ml	2017-10-27 10:19:56 +00:00
Matthew Honnibal	c9987cf131	Avoid use of numpy.tensordot	2017-10-27 10:18:36 +00:00
Matthew Honnibal	f6fef30adc	Remove dead code from spacy._ml	2017-10-27 10:16:41 +00:00
Matthew Honnibal	b9616419e1	Add try/except around bz2 import	2017-10-27 01:18:05 +00:00
Matthew Honnibal	783c0c8795	Remove unnecessary bz2 import	2017-10-27 01:17:54 +00:00
Matthew Honnibal	bb25bdcd92	Adjust call to scatter_add for the new version	2017-10-27 01:16:55 +00:00
Ines Montani	287a3ca256	Merge pull request #1466 from explosion/feature/rename-pipeline 💫 Clean up dead linear model code	2017-10-27 02:03:28 +02:00
ines	4eb5bd02e7	Update textcat pre-processing after to_array change	2017-10-27 00:32:12 +02:00
ines	2d6ec99884	Set 'model' as default model name to prevent meta.json errors	2017-10-26 16:12:23 +02:00
ines	9e372913e0	Remove old 'SP' condition in tag map	2017-10-26 16:11:57 +02:00
Matthew Honnibal	c52671420c	Remove old cfile import	2017-10-26 13:28:19 +02:00
Matthew Honnibal	ea03f1ef64	Remove obsolete cfile code	2017-10-26 13:23:36 +02:00
Matthew Honnibal	90d1d9b230	Remove obsolete parser code	2017-10-26 13:22:45 +02:00
ines	6f78e29bed	Add LAW entity label to glossary	2017-10-26 13:04:35 +02:00
ines	9bf78d5fb3	Update spacy.explain docs	2017-10-26 13:04:25 +02:00
Matthew Honnibal	33f8c58782	Remove obsolete parser.pyx	2017-10-26 12:42:05 +02:00
Matthew Honnibal	a8abc47811	Rename BaseThincComponent --> Pipe	2017-10-26 12:40:40 +02:00
Matthew Honnibal	b0f3ea2200	Fix names of pipeline components NeuralDependencyParser --> DependencyParser NeuralEntityRecognizer --> EntityRecognizer TokenVectorEncoder --> Tensorizer NeuralLabeller --> MultitaskObjective	2017-10-26 12:38:23 +02:00
Matthew Honnibal	b6b4f1aaf7	Merge pull request #1462 from explosion/feature/vector-meta-data 💫 Add vector meta data to model meta.json on train/package and show in docs	2017-10-26 11:39:41 +02:00
Matthew Honnibal	35977bdbb9	Update better-parser branch with develop	2017-10-26 00:55:53 +00:00
Ines Montani	090bd00369	Merge pull request #1464 from mayukh18/develop_bengali_pronouns added the bengali pronouns for v2.0	2017-10-25 21:55:25 +02:00
mayukh18	1bc07758fa	added few bengali pronouns	2017-10-25 22:24:40 +05:30
ines	de1e5f35d5	Merge branch 'develop' into feature/disable-pipes	2017-10-25 16:33:12 +02:00
ines	728b609bf9	Merge branch 'develop' into feature/vector-meta-data	2017-10-25 16:32:22 +02:00
ines	c0b55ebdac	Fix PhraseMatcher.__contains__ and add more tests	2017-10-25 16:31:11 +02:00
ines	91beacf5e3	Fix Matcher.__contains__	2017-10-25 16:19:38 +02:00
ines	11e3f19764	Fix vectors data added after training (see #1457 )	2017-10-25 16:08:26 +02:00
ines	057954695b	Read pipeline and vector data off model in --generate-meta	2017-10-25 16:03:26 +02:00
ines	273e638183	Add vector data to model meta after training (see #1457 )	2017-10-25 16:03:05 +02:00
ines	18aae423fb	Remove import of non-existing function	2017-10-25 15:54:10 +02:00
ines	5117a7d24d	Fix whitespace	2017-10-25 15:54:02 +02:00
ines	657a4d91bc	Merge branch 'develop' into feature/disable-pipes	2017-10-25 15:19:05 +02:00
ines	1a722dac31	Merge branch 'develop' into feature/disable-pipes	2017-10-25 15:18:18 +02:00
ines	6a00de4f77	Fix check of unexpected pipe names in restore()	2017-10-25 14:56:35 +02:00
ines	7f03932477	Return self on __enter__	2017-10-25 14:56:16 +02:00
Matthew Honnibal	b5de768852	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-25 14:44:16 +02:00
Matthew Honnibal	094512fd47	Fix model-mark on regression test.	2017-10-25 14:44:00 +02:00
Matthew Honnibal	e70f80f29e	Add Language.disable_pipes()	2017-10-25 13:46:41 +02:00
Matthew Honnibal	075e8118ea	Update from develop	2017-10-25 12:45:21 +02:00
ines	72497c8cb2	Remove comments and add TODO	2017-10-25 12:15:43 +02:00
ines	4d97efc3b5	Add missing docstrings	2017-10-25 12:10:16 +02:00
ines	1262aa0bf9	Implement PhraseMatcher.__contains__	2017-10-25 12:10:04 +02:00
ines	9c733a8849	Implement PhraseMatcher.__len__	2017-10-25 12:09:56 +02:00
ines	7eebeeaf85	Fix Matcher.__contains__	2017-10-25 12:09:47 +02:00
ines	7bcec57462	Remove unused attribute	2017-10-25 12:08:54 +02:00
ines	0b1dcbac14	Remove unused function	2017-10-25 12:08:46 +02:00
ines	3484174e48	Add Language.path	2017-10-25 11:57:43 +02:00
Ines Montani	d3bf488e16	Merge pull request #1171 from mollerhoj/support-danish Improve basic support for Danish	2017-10-24 20:29:57 +02:00
Matthew Honnibal	d9bb1e5de8	Increment version	2017-10-24 17:06:19 +02:00
Matthew Honnibal	908809d488	Update tests	2017-10-24 17:05:15 +02:00
Matthew Honnibal	66766c1454	Restore SP tag to English tag_map, until models migrate	2017-10-24 17:05:00 +02:00
Matthew Honnibal	30e67fa808	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-24 16:08:23 +02:00
Matthew Honnibal	b0f6fd3f1d	Disable tokenizer cache for special-cases. Fixes #1250	2017-10-24 16:08:05 +02:00
Matthew Honnibal	63f0bde749	Add test for #1250 : Tokenizer cache clobbered special-case attrs	2017-10-24 16:07:18 +02:00
ines	8492d5be6d	Always make lemmatizer return a list of lemmas, not a set	2017-10-24 16:00:56 +02:00
ines	95f866f99f	Add lookup argument to Lemmatizer.load	2017-10-24 16:00:56 +02:00
ines	95f6174516	Remove tensorizer from model pipeline example in spacy package	2017-10-24 16:00:56 +02:00
ines	090aed940a	Add test for currently failing span.as_doc case	2017-10-24 16:00:56 +02:00
ines	4ef81a9ebc	Fix whitespace	2017-10-24 16:00:56 +02:00
Matthew Honnibal	18f1c1d0ba	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-24 14:29:43 +02:00
Matthew Honnibal	4bea65a1a8	Fix Issue #1450 : Off-by-1 in * and ? matches Patterns that end in variable-length operators e.g. * and ? now end on the correct token. Previously, they were off by 1: the next token was pulled into the match, even if that's where the pattern failed.	2017-10-24 14:26:27 +02:00
Matthew Honnibal	391d5ef0d1	Normalize imports in regression test	2017-10-24 14:25:49 +02:00
ines	c55db0a4a1	Add example sentences for Japanese and Chinese (see #1107 )	2017-10-24 13:02:24 +02:00
ines	66f8f9d4a0	Fix Japanese tokenizer JapaneseTokenizer now returns a Doc, not individual words	2017-10-24 13:02:19 +02:00
Matthew Honnibal	dd5b2d8fa3	Check for out-of-memory when calling calloc. Closes #1446	2017-10-24 12:40:47 +02:00
Matthew Honnibal	b66b8f028b	Fix #1375 -- out-of-bounds on token.nbor()	2017-10-24 12:10:39 +02:00
Matthew Honnibal	a68d89a4f3	Add failing test for bug #1375 -- no out-of-bounds error for token.nbor()	2017-10-24 12:05:25 +02:00
Ines Montani	facf77e541	Merge branch 'develop' into support-danish	2017-10-24 11:53:19 +02:00
Matthew Honnibal	ccd2ab1a62	Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix Add LCA matrix for spans and docs	2017-10-24 11:22:46 +02:00
Matthew Honnibal	ef3e5a361b	Merge pull request #1442 from explosion/feature/fix-sp 💫Fix SP tag, tweak Vectors.__init__, fix Morphology	2017-10-24 10:24:07 +02:00
Matthew Honnibal	fdf25d10ba	Merge pull request #1440 from ramananbalakrishnan/develop Support single value for attribute list in doc.to_array	2017-10-24 10:23:12 +02:00
Matthew Honnibal	e7556ff048	Fix non-maxout parser	2017-10-23 18:16:23 +02:00
ines	a31f048b4d	Fix formatting	2017-10-23 10:38:06 +02:00
Matthew Honnibal	490ad3eaf0	Check that empty strings are handled. Closes #1242	2017-10-21 00:52:14 +02:00
Matthew Honnibal	8f8bccecb9	Patch deserialisation for invalid loads, to avoid model failure	2017-10-21 00:51:42 +02:00
Ramanan Balakrishnan	d2fe56a577	Add LCA matrix for spans and docs	2017-10-20 23:58:00 +05:30
Matthew Honnibal	d8391b1c4d	Fix #1434 : Matcher failed on ending ? if no token	2017-10-20 16:49:36 +02:00
Matthew Honnibal	fec53f09f7	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-20 16:28:34 +02:00
Matthew Honnibal	f111b228e0	Fix re-parsing of previously parsed text If a Doc object had been previously parsed, it was possible for invalid parses to be added. There were two problems: 1) The parse was only being partially erased 2) The RightArc action was able to create a 1-cycle. This patch fixes both errors, and avoids resetting the parse if one is present. In theory this might allow a better parse to be predicted by running the parser twice. Closes #1253.	2017-10-20 16:27:36 +02:00
Matthew Honnibal	1036798155	Make parser consistent if maxout==1	2017-10-20 16:24:16 +02:00
Matthew Honnibal	3faf9189a2	Make parser hidden shape consistent even if maxout==1	2017-10-20 16:23:31 +02:00
Matthew Honnibal	9010a1a060	Create vectors correctly	2017-10-20 14:19:46 +02:00
Matthew Honnibal	33229b1c9e	Remove print statement	2017-10-20 14:19:29 +02:00
Matthew Honnibal	cfae54c507	Make change to Vectors.__init__	2017-10-20 14:19:04 +02:00
Matthew Honnibal	ebecaddb76	Make 'data_or_width' two keyword args in Vectors.__init__ Previously the data and width options were one argument in Vectors, which meant you couldn't say vectors = Vectors(strings, width=300). It's better to have two keywords.	2017-10-20 14:17:15 +02:00
Matthew Honnibal	49895fbef6	Rename 'SP' special tag to '_SP' Renaming the tag with an underscore lets us add it to the tag map without worrying that we'll change the sequence of tags, which throws off the tag-to-ID mapping. For instance, if we inserted a 'SP' tag, the "VERB" tag is pushed to a different class ID, and the model is all messed up.	2017-10-20 14:01:12 +02:00
Matthew Honnibal	506cf2eb13	Remove cpdef enum, to avoid too much code generation	2017-10-20 14:00:23 +02:00
Matthew Honnibal	6218af0105	Remove cpdef enum, to avoid too much code generation	2017-10-20 13:59:57 +02:00
Matthew Honnibal	92ac9316b5	Fix initialization of vectors, to address serialization problem	2017-10-20 13:59:24 +02:00
Ramanan Balakrishnan	0726946563	cleanup to_array implementation using fixes on master	2017-10-20 17:09:37 +05:30
ines	108f1f786e	Update symbols and document missing token attributes (see #1439 )	2017-10-20 13:08:44 +02:00
ines	4acab77a8a	Add missing symbol for LAW entities (resolves #1427 )	2017-10-20 13:07:57 +02:00
Matthew Honnibal	b101736555	Fix precomputed layer	2017-10-20 12:14:52 +02:00
Ramanan Balakrishnan	b3ab124fc5	Support strings for attribute list in doc.to_array	2017-10-20 11:46:57 +05:30
Matthew Honnibal	64658e02e5	Implement fancier initialisation for precomputed layer	2017-10-20 03:07:45 +02:00
Matthew Honnibal	827cd8a883	Fix support of maxout pieces in parser	2017-10-20 03:07:17 +02:00
Matthew Honnibal	a8850b4282	Remove redundant PrecomputableMaxouts class	2017-10-19 20:27:34 +02:00
Matthew Honnibal	a17a1b60c7	Clean up redundant PrecomputableMaxouts class	2017-10-19 20:26:37 +02:00
Matthew Honnibal	b00d0a2c97	Fix bias in parser	2017-10-19 18:42:11 +02:00
Matthew Honnibal	b54b4b8a97	Make parser_maxout_pieces hyper-param work	2017-10-19 13:45:18 +02:00
Matthew Honnibal	03a215c5fd	Make PrecomputableAffines work	2017-10-19 13:44:49 +02:00
Ramanan Balakrishnan	7b9b1be44c	Support single value for attribute list in doc.to_array	2017-10-19 17:00:41 +05:30
Matthew Honnibal	61bc203f3f	Merge pull request #1438 from explosion/feature/fast-parser 💫 Improve runtime CPU efficiency of parser/NER	2017-10-19 02:42:21 +02:00
Matthew Honnibal	15e5a04a8d	Clean up more depth=0 conditional code	2017-10-19 01:48:43 +02:00
Matthew Honnibal	906c50ac59	Fix loop typing, that caused error on windows	2017-10-19 01:48:39 +02:00
ines	24512420b1	Show error if data_path does not exist or is None (see #1102 )	2017-10-19 00:53:49 +02:00
ines	bf415fd778	Add test for serializing extension attrs (see #1085 )	2017-10-19 00:53:08 +02:00
Matthew Honnibal	960788aaa2	Eliminate dead code in parser, and raise errors for obsolete options	2017-10-19 00:42:34 +02:00
Matthew Honnibal	bbfd7d8d5d	Clean up parser multi-threading	2017-10-19 00:25:21 +02:00
Matthew Honnibal	f018f2030c	Try optimized parser forward loop	2017-10-18 21:48:00 +02:00
Matthew Honnibal	65bf5e85bd	Improve piping in language.pipe	2017-10-18 21:46:12 +02:00
Matthew Honnibal	633a75c7e0	Break parser batches into sub-batches, sorted by length.	2017-10-18 21:45:01 +02:00
Ines Montani	f0d577e460	Merge pull request #1425 from explosion/feature/hindi-tokenizer 💫 Basic Hindi tokenization support	2017-10-18 13:34:52 +02:00
Matthew Honnibal	394633efce	Make doc pickling support hooks	2017-10-17 19:44:09 +02:00
Matthew Honnibal	fe844148f6	Test pickling hooks	2017-10-17 19:43:52 +02:00
Matthew Honnibal	cdb0c426d8	Improve deserialization of user_data, esp. for Underscore	2017-10-17 19:29:20 +02:00
Matthew Honnibal	374819edf8	Test user_data deserialization, re #1085	2017-10-17 19:28:54 +02:00
Matthew Honnibal	e35a83d142	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-17 18:22:06 +02:00
Matthew Honnibal	f45973848c	Rename 'tokens' variable 'doc' in tokenizer	2017-10-17 18:21:41 +02:00
Matthew Honnibal	839de87ca9	Make lambda func a named function, for pickling	2017-10-17 18:21:20 +02:00
Matthew Honnibal	9baa8fe7ec	Convert closure to functools.partial, to promote pickling	2017-10-17 18:20:52 +02:00
Matthew Honnibal	32a8564c79	Fix doc pickling	2017-10-17 18:20:24 +02:00
Matthew Honnibal	8ca97f32a3	Fix doc pickling test	2017-10-17 18:19:57 +02:00
Matthew Honnibal	9ce7d6af87	Make lex attr functions top-level functions, to promote pickling	2017-10-17 18:19:18 +02:00
Matthew Honnibal	1cc85a89ef	Allow reasonably efficient pickling of Language class, using to_bytes() and from_bytes().	2017-10-17 18:18:49 +02:00
Matthew Honnibal	0d57b9748a	Serialize lex_attr_getters with dill, for better pickle support	2017-10-17 18:17:45 +02:00
Matthew Honnibal	45d1dd90b1	Add tests for pickling doc	2017-10-17 17:20:58 +02:00
Ines Montani	afa67de7ee	Merge pull request #1428 from roanuz/develop Fix trailing whitespace and Language.from_disk overwrites	2017-10-17 16:29:15 +02:00
Matthew Honnibal	92c1eb2d6f	Fix Doc pickling. This also removes need for Binder class	2017-10-17 16:11:13 +02:00
Matthew Honnibal	ed8da9b11f	Add missing return statement in SentenceSegmenter	2017-10-17 15:32:56 +02:00
Ines Montani	aab299c8ae	Merge pull request #1429 from vishnunekkanti/develop fix syntax error in zh	2017-10-17 14:45:02 +02:00
Anto Binish Kaspar	534240648e	Fix trailing whitespace on morphology features	2017-10-17 17:15:58 +05:30
Anto Binish Kaspar	8f5b60c168	Fix Language.from_disk overwrites the meta.json file.	2017-10-17 17:15:32 +05:30
ines	8ca344712d	Add Language.has_pipe method	2017-10-17 11:20:07 +02:00
ines	485c4f6df5	Add Hungarian examples (see #1107 )	2017-10-17 02:37:45 +02:00
Matthew Honnibal	19531bad4c	Merge branch 'develop' into feature/streaming-data-memory-growth	2017-10-16 21:44:11 +02:00
Matthew Honnibal	df488274b1	Fix deserialization of vectors	2017-10-16 20:55:00 +02:00
Matthew Honnibal	4018486d31	Merge remote-tracking branch 'origin/develop' into feature/streaming-data-memory-growth	2017-10-16 20:49:48 +02:00
Matthew Honnibal	4174477161	Fix equality check in test	2017-10-16 19:50:35 +02:00
Matthew Honnibal	2bc06e4b22	Bump rolling buffer size to 10k	2017-10-16 19:38:29 +02:00
Matthew Honnibal	66e2eb8f39	Clean up remnant of frozen in StringStore	2017-10-16 19:34:41 +02:00
Matthew Honnibal	a002264fec	Remove caching of Token in Doc, as caused cycle.	2017-10-16 19:34:21 +02:00
Matthew Honnibal	3e037054c8	Remove obsolete is_frozen functionality from StringStore	2017-10-16 19:23:10 +02:00
Matthew Honnibal	5c14f3f033	Create a rolling buffer for the StringStore in Language.pipe()	2017-10-16 19:22:40 +02:00
Matthew Honnibal	59c216196c	Allow weakrefs on Doc objects	2017-10-16 19:22:11 +02:00
ines	d5418553eb	Fix whitespace	2017-10-16 18:30:04 +02:00
ines	6ceadcdb5c	Make sure from_disk passes string to numpy (see #1421 ) If path is a WindowsPath, numpy does not recognise it as a path and as a result, doesn't open the file. https://github.com/numpy/numpy/blob/master/numpy/lib/npyio.py#L369	2017-10-16 18:29:56 +02:00
Matthew Honnibal	010a7309ff	Merge pull request #1402 from explosion/feature/fix-matcher-operators 💫 Fix Matcher variable-length operators	2017-10-16 17:53:19 +02:00
Matthew Honnibal	c29927d2e7	Fix matcher test	2017-10-16 17:22:18 +02:00
Vishnu Kumar Nekkanti	d3c54cf39a	fixed SyntaxError while checking for jieba	2017-10-16 18:51:33 +05:30
Matthew Honnibal	a928ae2f35	Merge branch 'develop' into feature/fix-matcher-operators	2017-10-16 13:38:36 +02:00
Matthew Honnibal	56aa42cc5d	Fix and document matcher operator 'shadowing' behaviour	2017-10-16 13:38:20 +02:00
Matthew Honnibal	748d525801	Add more matcher operator tests	2017-10-16 13:38:01 +02:00
Matthew Honnibal	0433181658	Document operator semantics in Matcher docstring	2017-10-16 12:06:33 +02:00
ines	266e7180a7	Add Language class, stop words and basic stemmer that sets NORM	2017-10-14 14:59:52 +02:00
ines	e85e1d571b	Update base punctuation	2017-10-14 14:59:23 +02:00
ines	9d6c8eaa49	Update base norm exceptions with more unicode characters e.g. unicode variations of punctuation used in Chinese	2017-10-14 14:58:52 +02:00
ines	3516aa0cea	Port over changes from #1389	2017-10-14 13:32:55 +02:00
ines	cd6a29dce7	Port over changes from #1294	2017-10-14 13:28:46 +02:00
ines	38c756fd85	Port over changes from #1287	2017-10-14 13:16:21 +02:00
ines	612224c10d	Port over changes from #1157	2017-10-14 13:11:39 +02:00
ines	9b3f8f9ec3	Fix formatting and add comment on languages	2017-10-14 13:11:18 +02:00
ines	a4d974d97b	Port over URL pattern changes from #1411	2017-10-14 12:58:07 +02:00
ines	09aed58140	Port over changes from #1333 and add comments	2017-10-14 12:52:59 +02:00
Matthew Honnibal	cf6da9301a	Update lemmatizer test	2017-10-12 22:50:52 +02:00
Matthew Honnibal	9b90d235d1	Fix tag check in lemmatizer	2017-10-12 22:50:43 +02:00
Matthew Honnibal	dc01acd821	Escape encoding in validate function	2017-10-12 22:23:21 +02:00
Matthew Honnibal	27b927259a	Add locale_escape compat function	2017-10-12 22:22:04 +02:00

... 3 4 5 6 7 ...

4498 Commits