spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-27 06:26:46 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	db50ac524e	Support zipped vector files in init-model	2018-04-10 21:21:00 +02:00
ines	270fcfd925	Fix typo in package command message (closes #2200)	2018-04-10 19:14:31 +02:00
ines	24d8bf348d	Revert "Add support for .zip to init_model" This reverts commit `7ee880a0ad`.	2018-04-10 19:08:06 +02:00
Matthew Honnibal	7ee880a0ad	Add support for .zip to init_model	2018-04-10 14:30:04 +00:00
ines	5ecb274764	Fix indentation error and set Doc.is_tagged correctly	2018-04-10 16:14:52 +02:00
ines	0e847d7fe5	Fix typo	2018-04-09 14:51:14 +02:00
ines	987ee27af7	Return Doc in noun chunks merger component if Doc is not parsed	2018-04-09 14:51:02 +02:00
Xiaoquan Kong	e2f13ec722	bugfix: `Doc.noun_chunks` calls `Doc.noun_chunks_iterator` without checking (closes #2194)	2018-04-08 23:44:05 +02:00
Jens Dahl Møllerhøj	e5055e3cf6	Add Danish lemmatizer (#2184) * add danish lemmatizer * fill contributor agreement	2018-04-07 19:07:28 +02:00
ines	f86e79aa85	Update README section on tests (resolves #2191)	2018-04-06 16:32:36 +02:00
ines	bccbf538ef	Revert "Check if spaCy has compiled correctly and show error message" This reverts commit `3463ded7cf`.	2018-04-06 15:49:44 +02:00
ines	fb4eda6616	Merge branch 'master' of https://github.com/explosion/spaCy	2018-04-06 00:38:48 +02:00
Matthew Honnibal	0c7fab4443	Set version to 2.0.11	2018-04-04 11:19:11 +02:00
Matthew Honnibal	a350be0601	Fix vector-name loading fix	2018-04-04 01:31:25 +02:00
Matthew Honnibal	21047bde52	Fix syntax error in italian lemmatizer	2018-04-03 23:13:22 +02:00
Matthew Honnibal	81f4005f3d	Fix loading models with pretrained vectors	2018-04-03 23:11:48 +02:00
ines	3463ded7cf	Check if spaCy has compiled correctly and show error message	2018-04-03 22:18:47 +02:00
Matthew Honnibal	96b612873b	Add hyper-parameter to control whether parser makes a beam update	2018-04-03 22:02:56 +02:00
ines	e5f47cd82d	Update errors	2018-04-03 21:40:29 +02:00
Matthew Honnibal	f7e6313b43	Increment version to v2.0.11.dev0	2018-04-03 20:58:47 +02:00
ines	10462816bc	Fix tests for Python 2	2018-04-03 18:51:31 +02:00
ines	62b4b527d7	Don't raise error if set_extension has getter and setter (closes #2177) Improve error messages, raise error if setter is specified without a getter and compare against _unset to allow default=None. Also add more tests.	2018-04-03 18:30:17 +02:00
ines	ee3082ad29	Fix whitespace	2018-04-03 18:29:53 +02:00
ines	de137fba84	Add TensorBoard examples to examples overview [ci skip]	2018-04-03 16:01:52 +02:00
ines	6d87b28f15	Add Vietnamese to language overview [ci skip]	2018-04-03 16:01:36 +02:00
Ines Montani	3141e04822	💫 New system for error messages and warnings (#2163) * Add spacy.errors module * Update deprecation and user warnings * Replace errors and asserts with new error message system * Remove redundant asserts * Fix whitespace * Add messages for print/util.prints statements * Fix typo * Fix typos * Move CLI messages to spacy.cli._messages * Add decorator to display error code with message An implementation like this is nice because it only modifies the string when it's retrieved from the containing class – so we don't have to worry about manipulating tracebacks etc. * Remove unused link in spacy.about * Update errors for invalid pipeline components * Improve error for unknown factories * Add displaCy warnings * Update formatting consistency * Move error message to spacy.errors * Update errors and check if doc returned by component is None	2018-04-03 15:50:31 +02:00
Matthew Honnibal	abf8b16d71	Add doc.retokenize() context manager (#2172) This patch takes a step towards #1487 by introducing the doc.retokenize() context manager, to handle merging spans, and soon splitting tokens. The idea is to do merging and splitting like this: with doc.retokenize() as retokenizer: for start, end, label in matches: retokenizer.merge(doc[start : end], attrs={'ent_type': label}) The retokenizer accumulates the merge requests, and applies them together at the end of the block. This will allow retokenization to be more efficient, and much less error prone. A retokenizer.split() function will then be added, to handle splitting a single token into multiple tokens. These methods take `Span` and `Token` objects; if the user wants to go directly from offsets, they can append to the .merges and .splits lists on the retokenizer. The doc.merge() method's behaviour remains unchanged, so this patch should be 100% backwards compatible (modulo bugs). Internally, doc.merge() fixes up the arguments (to handle the various deprecated styles), opens the retokenizer, and makes the single merge. We can later start making deprecation warnings on direct calls to doc.merge(), to migrate people to use of the retokenize context manager.	2018-04-03 14:10:35 +02:00
Matthew Honnibal	8a120fb455	Disable batch size compounding in ud-train	2018-04-01 08:45:00 +00:00
Matthew Honnibal	98165e43a7	Sometimes update beam with greedy oracle	2018-04-01 08:44:35 +00:00
ines	638068ec6c	Restore contributor agreement	2018-03-31 14:06:37 +02:00
Suraj Rajan	1cdbb7c97c	[2032] - Changed python set to cpp stl set (#2170) Changed python set to cpp stl set #2032 ## Description Changed python set to cpp stl set. CPP stl set works better due to the logarithmic run time of its methods. Finding minimum in the cpp set is done in constant time as opposed to the worst case linear runtime of python set. Operations such as find, count, insert, delete are also done in either constant or logarithmic time, thus making cpp set a better option to manage vectors. Reference: http://www.cplusplus.com/reference/set/set/ ### Types of change Enhancement for `Vectors` for faster initialising of word vectors (fasttext)	2018-03-31 13:28:25 +02:00
Katrin Leinweber	6f84e32253	Formalise citation info (#2167) * Create CITATION file * Add Katrinleinweber contributor agreement	2018-03-30 10:34:14 +02:00
Matthew Honnibal	f3b7c5e537	Fix syntax error	2018-03-29 21:50:32 +02:00
Matthew Honnibal	23afa6429f	Add input length error, to address #1826	2018-03-29 21:45:26 +02:00
Matthew Honnibal	cca7e7ad11	Merge branch 'master' of https://github.com/explosion/spaCy	2018-03-29 20:27:06 +02:00
Matthew Honnibal	68ad366935	Improve train_new_entity_type example	2018-03-29 20:26:41 +02:00
Ines Montani	a609a1ca29	Merge pull request #2152 from explosion/feature/tidy-up-dependencies 💫 Tidy up dependencies	2018-03-29 14:35:09 +02:00
Viet Trung Tran	ea2af94cd9	Add support for Vietnamese in spaCy by leveraging Pyvi, an external Vietnamese tokenizer (#2155) * support for Vietnamese * Contributor Agreement for adding Vietnamese support on spaCy	2018-03-29 12:19:51 +02:00
Matthew Honnibal	6efb76bb3f	Require next thinc	2018-03-28 23:30:32 +00:00
ines	e6979bdbbd	Merge branch 'feature/tidy-up-dependencies' of https://github.com/explosion/spaCy into feature/tidy-up-dependencies	2018-03-29 00:19:37 +02:00
ines	83146458a2	Fix urllib for Python 3	2018-03-29 00:19:33 +02:00
Matthew Honnibal	8308bbc617	Get msgpack and msgpack_numpy via Thinc, to avoid potential version conflicts	2018-03-29 00:14:55 +02:00
Matthew Honnibal	b5098079d8	Fix error on urllib	2018-03-29 00:08:16 +02:00
Ines Montani	0de599b16b	Merge pull request #2159 from explosion/feature/fix-merged-entity-iob (resolves #1554, resolves #1752) 💫 Fix token.ent_iob after doc.merge(), and ensure consistency in doc.ents	2018-03-28 23:10:00 +02:00
Ines Montani	98e9cda677	Merge pull request #2158 from explosion/feature/fix-multiple-vectors (resolves #1660) 💫 Fix loading of multiple vector models	2018-03-28 23:08:24 +02:00
Matthew Honnibal	a7c5ae2beb	Avoid forcing a name on empty vectors, and remove print statement	2018-03-28 21:08:58 +02:00
ines	3eb67bbe4b	Allow entity types with dashes (resolves #1967)	2018-03-28 20:51:26 +02:00
Matthew Honnibal	cf5fcf0546	Update serialization test	2018-03-28 20:12:53 +02:00
Matthew Honnibal	4555e3e251	Don't assume pretrained_vectors cfg set in build_tagger	2018-03-28 20:12:45 +02:00
ines	9615ed5ed7	Update emoji/hashtag matcher example (resolves #2156) [ci skip]	2018-03-28 18:41:28 +02:00
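
Commit abf8b16d71 describes a retokenizer that accumulates merge requests inside a `with` block and applies them together when the block ends. The following is a minimal pure-Python sketch of that pattern only, not spaCy's implementation. `ToyRetokenizer`, the module-level `retokenize()` function, and the `(text, label)` token tuples are all hypothetical names invented for illustration, and the sketch assumes the requested spans do not overlap.

```python
from contextlib import contextmanager


class ToyRetokenizer:
    """Hypothetical collector of merge requests (not spaCy's class)."""

    def __init__(self):
        # Each request is (start, end, label), using token offsets.
        self.merges = []

    def merge(self, start, end, label=None):
        # Only record the request; nothing is modified yet.
        self.merges.append((start, end, label))


@contextmanager
def retokenize(tokens):
    """Yield a collector, then apply all queued merges when the block exits.

    `tokens` is a list of (text, label) tuples and is modified in place.
    Assumes the queued spans do not overlap.
    """
    retokenizer = ToyRetokenizer()
    yield retokenizer
    # Apply the rightmost span first so earlier offsets stay valid.
    for start, end, label in sorted(
        retokenizer.merges, key=lambda m: m[0], reverse=True
    ):
        text = " ".join(t for t, _ in tokens[start:end])
        tokens[start:end] = [(text, label)]


tokens = [(w, None) for w in "New York is in the United States".split()]
matches = [(0, 2, "GPE"), (5, 7, "GPE")]

with retokenize(tokens) as retokenizer:
    for start, end, label in matches:
        retokenizer.merge(start, end, label)

print(tokens)
```

Deferring the work to the end of the block means every offset in `matches` refers to the original tokenization, which is the property the commit message calls "much less error prone".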

8750 Commits