spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-12-25 17:36:30 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	0c7fab4443	Set version to 2.0.11	2018-04-04 11:19:11 +02:00
Matthew Honnibal	a350be0601	Fix vector-name loading fix	2018-04-04 01:31:25 +02:00
Matthew Honnibal	21047bde52	Fix syntax error in italian lemmatizer	2018-04-03 23:13:22 +02:00
Matthew Honnibal	81f4005f3d	Fix loading models with pretrained vectors	2018-04-03 23:11:48 +02:00
ines	3463ded7cf	Check if spaCy has compiled correctly and show error message	2018-04-03 22:18:47 +02:00
Matthew Honnibal	96b612873b	Add hyper-parameter to control whether parser makes a beam update	2018-04-03 22:02:56 +02:00
ines	e5f47cd82d	Update errors	2018-04-03 21:40:29 +02:00
Matthew Honnibal	f7e6313b43	Increment version to v2.0.11.dev0	2018-04-03 20:58:47 +02:00
ines	10462816bc	Fix tests for Python 2	2018-04-03 18:51:31 +02:00
ines	62b4b527d7	Don't raise error if set_extension has getter and setter (closes #2177) Improve error messages, raise error if setter is specified without a getter and compare against _unset to allow default=None. Also add more tests.	2018-04-03 18:30:17 +02:00
ines	ee3082ad29	Fix whitespace	2018-04-03 18:29:53 +02:00
Ines Montani	3141e04822	💫 New system for error messages and warnings (#2163) * Add spacy.errors module * Update deprecation and user warnings * Replace errors and asserts with new error message system * Remove redundant asserts * Fix whitespace * Add messages for print/util.prints statements * Fix typo * Fix typos * Move CLI messages to spacy.cli._messages * Add decorator to display error code with message An implementation like this is nice because it only modifies the string when it's retrieved from the containing class – so we don't have to worry about manipulating tracebacks etc. * Remove unused link in spacy.about * Update errors for invalid pipeline components * Improve error for unknown factories * Add displaCy warnings * Update formatting consistency * Move error message to spacy.errors * Update errors and check if doc returned by component is None	2018-04-03 15:50:31 +02:00
Matthew Honnibal	abf8b16d71	Add doc.retokenize() context manager (#2172) This patch takes a step towards #1487 by introducing the doc.retokenize() context manager, to handle merging spans, and soon splitting tokens. The idea is to do merging and splitting like this: `with doc.retokenize() as retokenizer: for start, end, label in matches: retokenizer.merge(doc[start : end], attrs={'ent_type': label})` The retokenizer accumulates the merge requests, and applies them together at the end of the block. This will allow retokenization to be more efficient, and much less error prone. A retokenizer.split() function will then be added, to handle splitting a single token into multiple tokens. These methods take `Span` and `Token` objects; if the user wants to go directly from offsets, they can append to the .merges and .splits lists on the retokenizer. The doc.merge() method's behaviour remains unchanged, so this patch should be 100% backwards compatible (modulo bugs). Internally, doc.merge() fixes up the arguments (to handle the various deprecated styles), opens the retokenizer, and makes the single merge. We can later start making deprecation warnings on direct calls to doc.merge(), to migrate people to use of the retokenize context manager.	2018-04-03 14:10:35 +02:00
Suraj Rajan	1cdbb7c97c	[2032] - Changed python set to cpp stl set (#2170) Changed python set to cpp stl set #2032 ## Description Changed python set to cpp stl set. CPP stl set works better due to the logarithmic run time of its methods. Finding minimum in the cpp set is done in constant time as opposed to the worst case linear runtime of python set. Operations such as find, count, insert and delete are also done in either constant or logarithmic time, thus making cpp set a better option to manage vectors. Reference: http://www.cplusplus.com/reference/set/set/ ### Types of change Enhancement for `Vectors` for faster initialising of word vectors (fasttext)	2018-03-31 13:28:25 +02:00
Matthew Honnibal	f3b7c5e537	Fix syntax error	2018-03-29 21:50:32 +02:00
Matthew Honnibal	23afa6429f	Add input length error, to address #1826	2018-03-29 21:45:26 +02:00
Ines Montani	a609a1ca29	Merge pull request #2152 from explosion/feature/tidy-up-dependencies 💫 Tidy up dependencies	2018-03-29 14:35:09 +02:00
Viet Trung Tran	ea2af94cd9	Add support for Vietnamese in spaCy by leveraging Pyvi, an external Vietnamese tokenizer (#2155) * support for Vietnamese * Contributor Agreement for adding Vietnamese support on spaCy	2018-03-29 12:19:51 +02:00
ines	e6979bdbbd	Merge branch 'feature/tidy-up-dependencies' of https://github.com/explosion/spaCy into feature/tidy-up-dependencies	2018-03-29 00:19:37 +02:00
ines	83146458a2	Fix urllib for Python 3	2018-03-29 00:19:33 +02:00
Matthew Honnibal	8308bbc617	Get msgpack and msgpack_numpy via Thinc, to avoid potential version conflicts	2018-03-29 00:14:55 +02:00
Matthew Honnibal	b5098079d8	Fix error on urllib	2018-03-29 00:08:16 +02:00
Ines Montani	0de599b16b	Merge pull request #2159 from explosion/feature/fix-merged-entity-iob (resolves #1554, resolves #1752) 💫 Fix token.ent_iob after doc.merge(), and ensure consistency in doc.ents	2018-03-28 23:10:00 +02:00
Ines Montani	98e9cda677	Merge pull request #2158 from explosion/feature/fix-multiple-vectors (resolves #1660) 💫 Fix loading of multiple vector models	2018-03-28 23:08:24 +02:00
Matthew Honnibal	a7c5ae2beb	Avoid forcing a name on empty vectors, and remove print statement	2018-03-28 21:08:58 +02:00
ines	3eb67bbe4b	Allow entity types with dashes (resolves #1967)	2018-03-28 20:51:26 +02:00
Matthew Honnibal	cf5fcf0546	Update serialization test	2018-03-28 20:12:53 +02:00
Matthew Honnibal	4555e3e251	Don't assume pretrained_vectors cfg set in build_tagger	2018-03-28 20:12:45 +02:00
Matthew Honnibal	0b375d50c8	Fix ent_iob tags in doc.merge to avoid inconsistent sequences	2018-03-28 18:39:03 +02:00
Matthew Honnibal	95fa89c4b8	Update doc.ents test	2018-03-28 18:39:03 +02:00
Matthew Honnibal	e807f88410	Resolve merge when cherry-picking ent iob patches from develop	2018-03-28 18:38:13 +02:00
Matthew Honnibal	99fbc7db33	Improve error message when entity sequence is inconsistent	2018-03-28 18:36:53 +02:00
Matthew Honnibal	cbd2794be0	Add test for ent_iob during span merge	2018-03-28 18:36:53 +02:00
Matthew Honnibal	f8dd905a24	Warn and fallback if vectors have no name	2018-03-28 18:24:53 +02:00
Matthew Honnibal	fd9e259414	Add test for #1660	2018-03-28 18:22:51 +02:00
Matthew Honnibal	bc4afa9881	Remove print statement	2018-03-28 17:48:37 +02:00
Matthew Honnibal	79dc241caa	Set pretrained_vectors in parser cfg	2018-03-28 17:35:07 +02:00
Matthew Honnibal	17c3e7efa2	Add message noting vectors	2018-03-28 16:33:43 +02:00
Matthew Honnibal	9bf6e93b3e	Set pretrained_vectors in begin_training	2018-03-28 16:32:41 +02:00
Matthew Honnibal	95a9615221	Fix loading of multiple pre-trained vectors This patch addresses #1660, which was caused by keying all pre-trained vectors with the same ID when telling Thinc how to refer to them. This meant that if multiple models were loaded that had pre-trained vectors, errors or incorrect behaviour resulted. The vectors class now includes a .name attribute, which defaults to: {nlp.meta['lang']_nlp.meta['name']}.vectors The vectors name is set in the cfg of the pipeline components under the key pretrained_vectors. This replaces the previous cfg key pretrained_dims. In order to make existing models compatible with this change, we check for the pretrained_dims key when loading models in from_disk and from_bytes, and add the cfg key pretrained_vectors if we find it.	2018-03-28 16:02:59 +02:00
ines	7fbc9e5874	Replace requests with urllib	2018-03-28 12:46:07 +02:00
ines	da1f200362	Add compat helpers for urllib	2018-03-28 12:45:53 +02:00
ines	ac88c72c9a	Fix ftfy workaround and remove old import	2018-03-28 12:14:28 +02:00
ines	ce6071ca89	Remove ftfy dependency and update docs	2018-03-28 12:09:42 +02:00
Matthew Honnibal	070b6c6495	Remove dependency on ftfy	2018-03-28 12:07:02 +02:00
ines	6d2c85f428	Drop six and related hacks as a dependency	2018-03-28 10:45:25 +02:00
ines	9e83513004	Add position of invalid token to error message	2018-03-27 23:56:59 +02:00
ines	11c4735ccf	Fix issue in Italian lemmatizer data (resolves #2050)	2018-03-27 23:55:22 +02:00
ines	693971dd8f	Improve error message if token text is empty string (see #2101)	2018-03-27 22:25:40 +02:00
ines	0c829e6605	Fix whitespace	2018-03-27 22:20:59 +02:00
Matthew Honnibal	d4680e4d83	Merge branch 'master' of https://github.com/explosion/spaCy	2018-03-27 13:36:37 +02:00
Matthew Honnibal	63a267b34d	Fix #2073: Token.set_extension not working	2018-03-27 13:36:20 +02:00
Ines Montani	68226109f4	Merge pull request #2142 from jimregan/polish-more-tokens more exceptions	2018-03-24 19:06:44 +01:00
Matthew Honnibal	d566e673bf	Set version to v2.0.10	2018-03-24 18:09:03 +01:00
Matthew Honnibal	0d3bf0d4eb	Merge branch 'master' of https://github.com/explosion/spaCy	2018-03-24 17:31:49 +01:00
dejanmarich	ccd1c04c63	Update stop_words.py Added more words	2018-03-24 17:31:24 +01:00
ines	f1446b0257	Port over Turkish changes	2018-03-24 17:31:07 +01:00
DuyguA	cd604878a4	quick typo fix	2018-03-24 17:26:35 +01:00
Matthew Honnibal	406548b976	Support .gz and .tar.gz files in spacy init-model	2018-03-24 17:18:32 +01:00
Jim O'Regan	efe037e8be	more exceptions	2018-03-24 00:05:27 +00:00
Matthew Honnibal	e3be3d65b3	Version as v2.0.10.dev0	2018-03-15 17:31:22 +01:00
ines	f3f8bfc367	Add built-in factories for merge_entities and merge_noun_chunks Allows adding those components to the pipeline out-of-the-box if they're defined in a model's meta.json. Also allows usage as nlp.add_pipe(nlp.create_pipe('merge_entities')).	2018-03-15 17:16:54 +01:00
alldefector	f4e5904fc2	Fix Spanish noun_chunks failure caused by typo	2018-03-14 17:03:17 +01:00
Thomas Opsomer	fbf48b3f9f	lemma property to return hash instead of unicode	2018-03-14 17:03:00 +01:00
Matthew Honnibal	8cefc58abc	Fix Vectors pickling	2018-03-14 16:59:37 +01:00
Matthew Honnibal	307aefe131	Increment version to v2.0.9	2018-02-22 17:07:53 +01:00
Ines Montani	14e7e0f12a	Merge pull request #2000 from jimregan/polish-tag-map Polish tag map	2018-02-18 19:05:58 +01:00
Jim O'Regan	664407de5d	missing PrepCase attribute	2018-02-18 14:46:12 +00:00
Jim O'Regan	95f0673fbc	fix typo/missing here too	2018-02-18 14:38:27 +00:00
Matthew Honnibal	cf0e320f2b	Add doc.is_sentenced attribute, re #1959	2018-02-18 14:16:55 +01:00
Matthew Honnibal	1e5aeb4eec	Merge pull request #1987 from thomasopsomer/span-sent Make span.sent work when only manual / custom sbd	2018-02-18 14:05:37 +01:00
Matthew Honnibal	1cf774bdc1	Add output options return_matches and as_tuples to Matcher	2018-02-18 14:00:45 +01:00
Matthew Honnibal	dd9b0945af	Fix inconsistencies in the symbols table	2018-02-18 13:51:31 +01:00
Matthew Honnibal	66496ac8e1	Set version to v2.1.0.dev0	2018-02-18 13:48:39 +01:00
Matthew Honnibal	eb3040ce46	Merge pull request #1891 from fucking-signup/master Fix issue #1889	2018-02-18 13:47:47 +01:00
ines	6bba1db4cc	Drop six and related hacks as a dependency	2018-02-18 13:29:56 +01:00
Matthew Honnibal	b30b09192a	Merge pull request #1665 from jimregan/animacy typo in "inan", add "nhum"	2018-02-18 13:26:53 +01:00
Matthew Honnibal	1b3c98e01b	Set version to v2.0.8	2018-02-18 12:16:31 +01:00
Matthew Honnibal	f9f46e5a07	Revert matcher fixes from GregDubbin	2018-02-18 10:59:28 +01:00
Matthew Honnibal	86405e4ad1	Fix CLI for multitask objectives	2018-02-18 10:59:11 +01:00
Matthew Honnibal	a34749b2bf	Add multitask objectives options to train CLI	2018-02-17 22:03:54 +01:00
Matthew Honnibal	8f06903e09	Fix multitask objectives	2018-02-17 18:41:36 +01:00
Matthew Honnibal	d1246c95fb	Fix model loading when using multitask objectives	2018-02-17 18:11:36 +01:00
Matthew Honnibal	262d0a3148	Fix overwriting of lexical attributes when loading vectors during training	2018-02-17 18:11:11 +01:00
Matthew Honnibal	c0caf7cf27	Fix LANG symbol	2018-02-17 18:10:50 +01:00
Matthew Honnibal	0bf2f6be29	Add missing symbol for LANG attr. Fixes inconsistent numeric ID	2018-02-17 17:37:02 +01:00
Matthew Honnibal	97a228a4ce	Increment to v2.0.8.dev0	2018-02-17 16:54:36 +01:00
Aaron Marquez	ea571e8325	Merge branch 'master' into issue-1959	2018-02-16 15:14:09 -08:00
Matthew Honnibal	7d5c720fc3	Fix multitask objective when no pipeline provided	2018-02-15 23:50:21 +01:00
Aaron Marquez	f0d3672e17	Changed loading EN model	2018-02-15 14:28:38 -08:00
Aaron Marquez	3765d84d57	Fix issue #1959	2018-02-15 12:51:49 -08:00
Aaron Marquez	7ba4111554	Add test for issue-1959	2018-02-15 12:46:22 -08:00
Matthew Honnibal	59b7cf9db8	Add get_beam_parse method in ArcEager, for Prodigy	2018-02-15 21:03:16 +01:00
Matthew Honnibal	3e541de440	Merge branch 'master' of https://github.com/explosion/spaCy	2018-02-15 21:02:55 +01:00
Thomas Opsomer	5d24a81c0b	add test for span.sent when doc not parsed	2018-02-15 16:59:16 +01:00
Thomas Opsomer	deab391cbf	correct check on sent_start & raise if no boundaries	2018-02-15 16:58:30 +01:00
Matthew Honnibal	4cb861e080	Merge pull request #1968 from DuyguA/is_currency New lexical feature is_currency	2018-02-15 12:13:36 +01:00
Thomas Opsomer	b902731313	Find span sentence when only sentence boundaries (no parser)	2018-02-14 22:18:54 +01:00
Claudiu-Vlad Ursache	e28de12cbd	Ensure files opened in `from_disk` are closed Fixes [issue 1706](https://github.com/explosion/spaCy/issues/1706).	2018-02-13 20:49:43 +01:00
Johannes Dollinger	012e874d09	Add contributor agreement for emulbreh	2018-02-13 13:40:33 +01:00
Johannes Dollinger	bf94c13382	Don't fix random seeds on import	2018-02-13 12:42:23 +01:00
Matthew Honnibal	d7c9b53120	Pass kwargs into pipeline components during begin_training	2018-02-12 10:18:39 +01:00
4altinok	ca8728035d	added new lex feat to token	2018-02-11 18:55:48 +01:00
4altinok	edd7202a06	added new symbol	2018-02-11 18:55:32 +01:00
4altinok	ed1ac2969e	added new lexical feat to lexeme	2018-02-11 18:51:48 +01:00
4altinok	94fb0b75e3	code for is_currency	2018-02-11 18:51:32 +01:00
4altinok	3deef1497a	removed 18 and replaced 18 with is_currency	2018-02-11 18:51:09 +01:00
4altinok	471d3c9e23	added lex test for is_currency	2018-02-11 18:50:50 +01:00
ines	c63e99da8a	Fix typo in glossary (resolves #1964) Co-Authored-By: SThomasP <sthomasp@users.noreply.github.com>	2018-02-10 11:58:41 +01:00
Lyndon White	6ee5dff51c	Make python 3.4 compat module loading (fix #1733)	2018-02-09 23:03:35 +08:00
Matthew Honnibal	e361b4f82b	Fix #1929: Incorrect NER when pre-set sentence boundaries.	2018-02-08 15:25:41 +01:00
Matthew Honnibal	fd9fd275c5	Make test for #1945 more precise	2018-02-07 02:06:11 +01:00
Matthew Honnibal	c087a14380	Merge branch 'master' of https://github.com/explosion/spaCy	2018-02-07 01:29:39 +01:00
Matthew Honnibal	76d89b2180	Add test for #1945: PhraseMatcher regression	2018-02-07 01:29:23 +01:00
Ines Montani	0954e15dda	Merge pull request #1913 from ohenrik/nb_syntax_iterator Norwegian Language (nb) - Added french syntax iterator with explanation	2018-02-06 04:59:07 +01:00
Ole Henrik Skogstrøm	251a7805fe	Copied French syntax iterator to simplify future changes	2018-02-05 14:45:05 +01:00
Matthew Honnibal	2e7391e627	Merge pull request #1916 from tokestermw/bug/fix-not-passing-in-model-cfg-in-nlp Bug/fix not passing in model cfg in nlp	2018-02-05 01:19:40 +01:00
Ali Zarezade	9df9da34a3	Fix init_model issue Fixing issue #1928	2018-02-03 17:21:34 +03:30
Matthew Honnibal	ebe84e45e5	Increment version to 2.0.7	2018-02-02 03:39:16 +01:00
Matthew Honnibal	e4b1f57599	Increment version	2018-02-02 02:33:23 +01:00
Matthew Honnibal	069531c351	Merge branch 'master' of https://github.com/explosion/spaCy	2018-02-02 02:32:58 +01:00
Matthew Honnibal	f74a802d09	Test and fix #1919: Error resuming training	2018-02-02 02:32:40 +01:00
ines	f1d3deffac	Add Russian example sentences (see #1107)	2018-02-01 20:09:40 +01:00
Matthew Honnibal	6b1126c312	Merge branch 'master' of https://github.com/explosion/spaCy	2018-02-01 02:57:52 +01:00
ines	3c1fb9d02d	Make validate command fail more gracefully if version not found Mostly relevant during development when working with .dev versions	2018-01-31 22:06:28 +01:00
Motoki Wu	54062b7326	added tests for issue #1915	2018-01-30 18:30:19 -08:00
Motoki Wu	f4a7d1a423	make sure to pass in **cfg to each component when training	2018-01-30 18:29:54 -08:00
ines	4046823699	Only check component in factories if string (see #1911)	2018-01-30 16:29:07 +01:00
ines	ce10d320c4	Fix component check in self.factories (see #1911)	2018-01-30 16:09:37 +01:00
Ole Henrik Skogstrøm	e40465487c	Added French syntax iterator with explanation	2018-01-30 15:44:29 +01:00
ines	8901814248	Improve error handling if pipeline component is not callable (resolves #1911) Also add help message if user accidentally calls nlp.add_pipe() with a string of a built-in component name.	2018-01-30 15:43:03 +01:00
Matthew Honnibal	a437ba87a3	Set release=True	2018-01-29 21:26:04 +01:00
Adam Binford	9238749aaf	Removed test to avoid network requests	2018-01-29 14:48:20 -05:00
Adam Binford	1a2c2f7d7f	Fixed auto linking after download and added simple test to check	2018-01-29 14:25:21 -05:00
Matthew Honnibal	cb7110c22e	Merge pull request #1882 from ohenrik/nb_lemma_and_tag_map Add norwegian bokmål ('nb') lemmatizer and tag_map	2018-01-29 18:18:50 +01:00
Matthew Honnibal	0c1e7f0c86	Merge pull request #1893 from azarezade/master Add Persian language	2018-01-29 18:18:33 +01:00
Matthew Honnibal	cbdab75b36	Increment version	2018-01-28 23:46:22 +01:00
Matthew Honnibal	512e6adb08	Merge pull request #1896 from thomasopsomer/fix-sent Fix sentence boundaries serialization (issue #1834)	2018-01-28 21:18:51 +01:00
Matthew Honnibal	f5b1ad4100	Limit parser model size, to hopefully reduce memory during CI tests	2018-01-28 21:00:32 +01:00
Thomas Opsomer	515e25910e	fix sent_start in serialization	2018-01-28 19:50:42 +01:00
Thomas Opsomer	45d62561f7	add test for the issue	2018-01-28 19:49:56 +01:00
ines	6d978e5c35	Don't use deprecated Doc.merge call in displaCy As reported here: https://stackoverflow.com/a/48464412/6400719	2018-01-27 11:25:05 +01:00
Ali Zarezade	bb6bd3d8ae	add persian language	2018-01-27 13:27:26 +03:30
Ali Zarezade	d195675db5	add persian language	2018-01-27 13:21:38 +03:30
Kit	4b42267ba3	Fix issue #1889	2018-01-25 23:17:22 +01:00
Kit	52ef51f36e	Add test for issue #1889	2018-01-25 22:56:48 +01:00
Ole Henrik Skogstrøm	8e2c9f2475	Cleaned up nb tag_map comments	2018-01-25 11:09:28 +01:00
Ole Henrik Skogstrøm	1107e89fcf	Updated doc string on nb tag_map module	2018-01-25 11:08:28 +01:00
Matthew Honnibal	6a8cb905aa	Merge pull request #1876 from GregDubbin/master Pattern matcher fixes	2018-01-24 16:38:11 +01:00
Matthew Honnibal	38b260e0c3	Merge pull request #1879 from azarezade/master Add Persian character and symbols	2018-01-24 16:34:22 +01:00
Matthew Honnibal	edb71a280e	Add test for #1883: Unpickling Matcher	2018-01-24 15:42:33 +01:00
Matthew Honnibal	2ad050e668	Fix unpickling of Matcher. Also store correct data in matcher._patterns	2018-01-24 15:42:11 +01:00
Ole Henrik Skogstrøm	4058a7d579	Fix æøå characters in lemmatizer	2018-01-24 14:03:14 +01:00
Ole Henrik Skogstrøm	42248f423f	Updated tag map	2018-01-24 13:50:33 +01:00
Ole Henrik Skogstrøm	74b430b49a	Correct Lemmatizer	2018-01-24 13:26:33 +01:00
Ole Henrik Skogstrøm	b9b3a40c78	Add norwegian lemmatizer and tag_map	2018-01-24 12:28:29 +01:00
Matthew Honnibal	42a18ef903	Add test for #1868: Vocab.__contains__ with ints	2018-01-23 23:27:05 +01:00
Matthew Honnibal	43f381ce36	Make Vocab.__contains__ work with ints. Fixes #1868	2018-01-23 23:26:47 +01:00
greg	85ab99e692	Correct test examples	2018-01-23 15:00:14 -05:00
greg	f50bb1aafc	Restructure StateC to eliminate dependency on unordered_map	2018-01-23 14:40:03 -05:00
Matthew Honnibal	f3753c2453	Further model deserialization fixes re #1727	2018-01-23 19:16:05 +01:00
Matthew Honnibal	91e916cb67	Add comment to new test	2018-01-23 19:11:53 +01:00
Matthew Honnibal	fd187d71ad	Add test for #1727	2018-01-23 19:11:01 +01:00
Matthew Honnibal	85c942a6e3	Don't overwrite pretrained_dims setting from cfg. Fixes #1727	2018-01-23 19:10:49 +01:00
Ali Zarezade	42349471bc	add ٪ as punctuation	2018-01-23 18:11:33 +03:30
Ali Zarezade	2bda582135	Add Persian character and symbols Add Persian characters and the following: - ٪ used instead of % - ؟ used instead of ? - ﷼ used instead of $ - ، used instead of , - ؛ used instead of ;	2018-01-23 13:20:36 +03:30
Matthew Honnibal	7e6dc283db	Fix unicode import in test	2018-01-22 23:55:44 +01:00
greg	686735b94e	Fix matcher import	2018-01-22 16:53:05 -05:00
greg	3a491093ee	Import libcpp.map if libcpp.unordered_map doesn't exist	2018-01-22 16:46:25 -05:00
greg	d55992bdf0	Switch match dictionary to use final state pointer rather than ID	2018-01-22 15:36:47 -05:00
Matthew Honnibal	4ce7d24fd5	Add test for #1799: Set left and right edges (and thus sentences) in non-projective parses.	2018-01-22 20:18:38 +01:00
Matthew Honnibal	56164ab688	Set l_edge and r_edge correctly for non-projective parses. Fixes #1799	2018-01-22 20:18:04 +01:00
Matthew Honnibal	964aa1b384	Merge branch 'master' of https://github.com/explosion/spaCy	2018-01-22 19:18:46 +01:00
Matthew Honnibal	29897ed1b3	Allow vector loading to work on 1d data files. Fixes #1831	2018-01-22 19:18:26 +01:00
greg	490bc82c27	Add comments clarifying matcher logic for '*'	2018-01-22 10:03:12 -05:00
Matthew Honnibal	fe4748fc38	Merge pull request #1870 from avadhpatel/master Model Load Performance Improvement by more than 5x	2018-01-22 00:05:15 +01:00
Avadh Patel	a517df55c8	Small fix Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-21 15:20:45 -06:00
Avadh Patel	5b5029890d	Merge branch 'perfTuning' into perfTuningMaster Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-21 15:20:00 -06:00
Matthew Honnibal	203d2ea830	Allow multitask objectives to be added to the parser and NER more easily	2018-01-21 19:37:02 +01:00
Matthew Honnibal	4a7d524efb	Merge branch 'master' of https://github.com/explosion/spaCy	2018-01-21 19:22:03 +01:00
Matthew Honnibal	61a051f2c0	Fix MultitaskObjective	2018-01-21 19:21:34 +01:00
Avadh Patel	75903949da	Updated model building after suggestion from Matthew Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-18 06:51:57 -06:00
Avadh Patel	fe879da2a1	Do not train model if it's going to be loaded from disk This saves significant time in loading a model from disk. Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-17 06:16:07 -06:00
Avadh Patel	2146faffee	Do not train model if it's going to be loaded from disk This saves significant time in loading a model from disk. Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-17 06:04:22 -06:00
greg	7072b395c9	Add greedy matcher tests	2018-01-16 15:46:13 -05:00
greg	441f490c1c	Merge branch 'master' of github.com:GregDubbin/spaCy	2018-01-16 13:31:10 -05:00
greg	8bea62f26e	Correct bugs for greedy matching and introduce ADVANCE_PLUS action	2018-01-16 13:21:43 -05:00
Matthew Honnibal	ccb51a9f36	Make .similarity() return 1.0 if all orth attrs match	2018-01-15 16:29:48 +01:00
Matthew Honnibal	82135d85b7	Fix test	2018-01-15 15:55:15 +01:00
Matthew Honnibal	4b09616b58	Add test for #1757: Comparison against None	2018-01-15 15:55:01 +01:00
Matthew Honnibal	b904d81e9a	Fix rich comparison against None objects. Closes #1757	2018-01-15 15:51:25 +01:00
Matthew Honnibal	9e413449f6	Fix unicode error in new test	2018-01-15 15:39:00 +01:00
Matthew Honnibal	ab7c45b12d	Fix error message and handling of doc.sents	2018-01-15 15:21:11 +01:00
Matthew Honnibal	6b215d2dd3	Add test for Issue #1537	2018-01-15 15:20:56 +01:00
ines	5babb7d6f6	Merge branch 'master' of https://github.com/explosion/spaCy	2018-01-14 17:31:09 +01:00
ines	793890cb4d	Remove test for removed deprecation warning	2018-01-14 17:31:06 +01:00
Matthew Honnibal	465a6f6452	Add missing Span.vocab property. Closes #1633	2018-01-14 15:06:30 +01:00
Matthew Honnibal	0cb090e526	Fix infinite recursion in token.sent_start. Closes #1640	2018-01-14 15:02:15 +01:00
Matthew Honnibal	5cbe913b6f	Don't raise deprecation warning in property. Closes #1813, #1712	2018-01-14 14:55:58 +01:00
Matthew Honnibal	1a1cca6052	Fix vectors.resize() on Py3. Closes #1539	2018-01-14 14:48:51 +01:00
Matthew Honnibal	0153220304	Make set_vector add word to vocab. Fixes #1807	2018-01-14 13:57:57 +01:00
Ines Montani	55754f0cee	Merge pull request #1836 from fucking-signup/master Add tests for issue #1769	2018-01-13 00:23:35 +00:00
Kit	4ee97f20a0	Mark like_num tests as slow	2018-01-13 00:44:15 +01:00
Kit	855531537e	Rewrite tests for issue #1769	2018-01-12 23:49:51 +01:00
Kit	5b541cb5ec	Simplify tests for issue #1769	2018-01-12 23:34:27 +01:00
Kit	7a2adc4633	Remove some tests to see build status changes	2018-01-12 22:49:16 +01:00
Kit	0e62809a43	Rewrite tests for issue #1769	2018-01-12 22:26:06 +01:00
Ines Montani	36f426fe0a	Merge pull request #1808 from fucking-signup/master Fix issue #1769	2018-01-12 21:12:02 +00:00
Kit	76f4eeca44	Remove tests to see build changes on Windows (Python 2.7)	2018-01-12 20:30:51 +01:00
Matthew Honnibal	7ca49c2061	Merge branch 'master' into feature-improve-model-download	2018-01-10 18:21:55 +01:00
Kit	7ec0956e8d	Add regression test (issue #1769)	2018-01-08 03:42:04 +01:00
Kit	701e7cc6aa	Rename variable to keep code consistent	2018-01-08 03:38:44 +01:00
Kit	ed0db95183	Find lowercased forms of ordinal words, where possible	2018-01-08 03:28:50 +01:00
Kit	9bc524982e	Find lowercased forms of numeric words	2018-01-08 03:25:08 +01:00
Søren Lind Kristiansen	62de5da1ff	Remove unused dummy variable	2018-01-05 09:57:24 +01:00
Søren Lind Kristiansen	10dab8eef8	Remove dummy variable from function calls	2018-01-05 09:37:05 +01:00
Søren Lind Kristiansen	7f0ab145e9	Don't pass CLI command name as dummy argument	2018-01-04 21:33:47 +01:00
Ines Montani	6a008233b5	Merge pull request #1795 from textioHQ/issue1758 (resolves #1758) english tokenizer: handle "would've"	2018-01-04 02:43:39 +00:00
Kevin Humphreys	597df5bf83	add test	2018-01-03 13:00:05 -08:00
Kevin Humphreys	7918fa4ef9	handle would've	2018-01-03 12:25:48 -08:00
ines	2c656f90fb	Exit with 1 if incompatible models found (see #1714)	2018-01-03 21:20:35 +01:00
ines	dacfaa2ca4	Ensure that download command exits properly (resolves #1714)	2018-01-03 21:03:36 +01:00
Søren Lind Kristiansen	a9ff6eadc9	Prefix dummy argument names with underscore	2018-01-03 20:48:12 +01:00
ines	1081e08efb	Fix formatting	2018-01-03 20:14:50 +01:00
ines	d8109964d6	Use --no-deps on model install In general, it's nice for models to specify spaCy as a dependency. However, this tends to cause problems in conda environments, as pip will re-install spaCy and its dependencies (especially Thinc)	2018-01-03 17:40:37 +01:00
ines	319d754309	Fix overwriting of existing symlinks Check for is_symlink() to also overwrite invalid and outdated symlinks. Also show better error message if link path exists but is not symlink (i.e. file or directory).	2018-01-03 17:39:36 +01:00
ines	8ba0dfd017	Make message on failed linking more clear	2018-01-03 17:38:09 +01:00
Søren Lind Kristiansen	d6327e8495	Fix handling case when vectors not specified	2018-01-03 12:20:49 +01:00
Søren Lind Kristiansen	bcc51d7d8b	Fix shifted positional arguments	2018-01-03 12:19:47 +01:00
zqhZY	f27859fa99	add ChineseDefaults class for pickling	2017-12-28 17:13:58 +08:00
Ines Montani	ff9fc945ab	Merge pull request #1749 from sorenlind/da_ud_tokenization Tune Danish tokenizer to more closely match Universal Dependencies	2017-12-22 16:00:49 +00:00
ines	26f313dabc	Fix missing import	2017-12-22 16:21:44 +01:00
ines	8dc1c27841	Merge branch 'master' of https://github.com/explosion/spaCy	2017-12-22 16:01:00 +01:00
ines	b10ba848b8	xfail test that causes MemoryError on Python 2 on Windows Need to investigate this further!	2017-12-22 16:00:58 +01:00
Søren Lind Kristiansen	bef735aef7	Fix Danish abbreviation 'm.h.t.'	2017-12-21 09:24:31 +01:00
Ines Montani	a3dd167d7f	Merge branch 'master' into da_ud_tokenization	2017-12-20 21:05:34 +00:00
Ines Montani	97f100f69f	Merge pull request #1742 from kimfalk/master Two corrections in the da lan.	2017-12-20 21:02:00 +00:00
Ines Montani	d682a8803e	Merge pull request #1672 from cbilgili/master Adds Turkish Lemmatization	2017-12-20 21:01:00 +00:00
Benjamin Peterson	9452134cd1	remove no-break spaces from Hindi example (fixes #1750 )	2017-12-20 11:35:30 -08:00
Søren Lind Kristiansen	7a2f2f6f94	Fix formatting.	2017-12-20 18:37:37 +01:00
Søren Lind Kristiansen	15d13efafd	Tune Danish tokenizer to more closely match tokenization in Universal Dependencies.	2017-12-20 17:36:52 +01:00
Kim FalkJørgensen	648dc60755	Remove the incorrect exception 'm.h.t'	2017-12-20 10:02:39 +01:00
Kim FalkJørgensen	9c9f4ef84a	Fixing a translation error in examples.py Adding an exception in the tokenizer_exceptions.py	2017-12-19 15:26:50 +01:00
ines	22dc744b48	Fix check for '@' in like_url (see #1715)	2017-12-16 13:48:43 +01:00
Ines Montani	9c1ee65268	Add regression test for #1698	2017-12-12 10:36:11 +01:00
Ines Montani	6455b574fc	Check for email address first	2017-12-12 10:25:13 +01:00
Bri-Will	d77361d76c	Update lex_attrs.py. Fix like_url from matching on e-mail	2017-12-11 14:13:28 -08:00
Søren Lind Kristiansen	5a9d377580	Remove abbreviation for positional plac argument	2017-12-11 11:08:29 +01:00
Isaac Sijaranamual	38021fbb00	Switch from python 3 only TemporaryDirectory to pytest's tmpdir	2017-12-11 00:16:04 +01:00
Isaac Sijaranamual	20ae0c459a	Fixes "Error saving model" #1622	2017-12-10 23:07:13 +01:00


5045 Commits