spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-09-23 12:29:18 +03:00

Author	SHA1	Message	Date
Thomas Opsomer	515e25910e	fix sent_start in serialization	2018-01-28 19:50:42 +01:00
Thomas Opsomer	45d62561f7	add test for the issue	2018-01-28 19:49:56 +01:00
ines	6d978e5c35	Don't use deprecated Doc.merge call in displaCy As reported here: https://stackoverflow.com/a/48464412/6400719	2018-01-27 11:25:05 +01:00
Ali Zarezade	bb6bd3d8ae	add persian language	2018-01-27 13:27:26 +03:30
Ali Zarezade	d195675db5	add persian language	2018-01-27 13:21:38 +03:30
Kit	4b42267ba3	Fix issue #1889	2018-01-25 23:17:22 +01:00
Kit	52ef51f36e	Add test for issue #1889	2018-01-25 22:56:48 +01:00
Ole Henrik Skogstrøm	8e2c9f2475	Cleaned up nb tag_map comments	2018-01-25 11:09:28 +01:00
Ole Henrik Skogstrøm	1107e89fcf	Updated doc string on nb tag_map module	2018-01-25 11:08:28 +01:00
Matthew Honnibal	6a8cb905aa	Merge pull request #1876 from GregDubbin/master Pattern matcher fixes	2018-01-24 16:38:11 +01:00
Matthew Honnibal	38b260e0c3	Merge pull request #1879 from azarezade/master Add Persian character and symbols	2018-01-24 16:34:22 +01:00
Matthew Honnibal	edb71a280e	Add test for #1883 : Unpickling Matcher	2018-01-24 15:42:33 +01:00
Matthew Honnibal	2ad050e668	Fix unpickling of Matcher. Also store correct data in matcher._patterns	2018-01-24 15:42:11 +01:00
Ole Henrik Skogstrøm	4058a7d579	Fix æøå characters in lemmatizer	2018-01-24 14:03:14 +01:00
Ole Henrik Skogstrøm	42248f423f	Updated tag map	2018-01-24 13:50:33 +01:00
Ole Henrik Skogstrøm	74b430b49a	Correct Lemmatizer	2018-01-24 13:26:33 +01:00
Ole Henrik Skogstrøm	b9b3a40c78	Add norwegian lemmatizer and tag_map	2018-01-24 12:28:29 +01:00
Matthew Honnibal	42a18ef903	Add test for #1868 : Vocab.__contains__ with ints	2018-01-23 23:27:05 +01:00
Matthew Honnibal	43f381ce36	Make Vocab.__contains__ work with ints. Fixes #1868	2018-01-23 23:26:47 +01:00
greg	85ab99e692	Correct test examples	2018-01-23 15:00:14 -05:00
greg	f50bb1aafc	Restructure StateC to eliminate dependency on unordered_map	2018-01-23 14:40:03 -05:00
Matthew Honnibal	f3753c2453	Further model deserialization fixes re #1727	2018-01-23 19:16:05 +01:00
Matthew Honnibal	91e916cb67	Add comment to new test	2018-01-23 19:11:53 +01:00
Matthew Honnibal	fd187d71ad	Add test for #1727	2018-01-23 19:11:01 +01:00
Matthew Honnibal	85c942a6e3	Dont overwrite pretrained_dims setting from cfg. Fixes #1727	2018-01-23 19:10:49 +01:00
Ali Zarezade	42349471bc	add ٪ as punctuation	2018-01-23 18:11:33 +03:30
Ali Zarezade	2bda582135	Add Persian character and symbols Add Persian characters and the following: - ٪ used instead of % - ؟ used instead of ? - ﷼ used instead of $ - ، used instead of , - ؛ used instead of ;	2018-01-23 13:20:36 +03:30
Matthew Honnibal	7e6dc283db	Fix unicode import in test	2018-01-22 23:55:44 +01:00
greg	686735b94e	Fix matcher import	2018-01-22 16:53:05 -05:00
greg	3a491093ee	Import libcpp.map if libcpp.unordered_map doesn't exist	2018-01-22 16:46:25 -05:00
greg	d55992bdf0	Switch match dictionary to use final state pointer rather than ID	2018-01-22 15:36:47 -05:00
Matthew Honnibal	4ce7d24fd5	Add test for #1799 : Set left and right edges (and thus sentences) in non-projective parses.	2018-01-22 20:18:38 +01:00
Matthew Honnibal	56164ab688	Set l_edge and r_edge correctly for non-projective parses. Fixes #1799	2018-01-22 20:18:04 +01:00
Matthew Honnibal	964aa1b384	Merge branch 'master' of https://github.com/explosion/spaCy	2018-01-22 19:18:46 +01:00
Matthew Honnibal	29897ed1b3	Allow vector loading to work on 1d data files. Fixes #1831	2018-01-22 19:18:26 +01:00
greg	490bc82c27	Add comments clarifying matcher logic for '*'	2018-01-22 10:03:12 -05:00
Matthew Honnibal	fe4748fc38	Merge pull request #1870 from avadhpatel/master Model Load Performance Improvement by more than 5x	2018-01-22 00:05:15 +01:00
Avadh Patel	a517df55c8	Small fix Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-21 15:20:45 -06:00
Avadh Patel	5b5029890d	Merge branch 'perfTuning' into perfTuningMaster Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-21 15:20:00 -06:00
Matthew Honnibal	203d2ea830	Allow multitask objectives to be added to the parser and NER more easily	2018-01-21 19:37:02 +01:00
Matthew Honnibal	4a7d524efb	Merge branch 'master' of https://github.com/explosion/spaCy	2018-01-21 19:22:03 +01:00
Matthew Honnibal	61a051f2c0	Fix MultitaskObjective	2018-01-21 19:21:34 +01:00
Avadh Patel	75903949da	Updated model building after suggestion from Matthew Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-18 06:51:57 -06:00
Avadh Patel	fe879da2a1	Do not train model if its going to be loaded from disk This saves significant time in loading a model from disk. Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-17 06:16:07 -06:00
Avadh Patel	2146faffee	Do not train model if its going to be loaded from disk This saves significant time in loading a model from disk. Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-17 06:04:22 -06:00
greg	7072b395c9	Add greedy matcher tests	2018-01-16 15:46:13 -05:00
greg	441f490c1c	Merge branch 'master' of github.com:GregDubbin/spaCy	2018-01-16 13:31:10 -05:00
greg	8bea62f26e	Correct bugs for greedy matching and introduce ADVANCE_PLUS action	2018-01-16 13:21:43 -05:00
Matthew Honnibal	ccb51a9f36	Make .similarity() return 1.0 if all orth attrs match	2018-01-15 16:29:48 +01:00
Matthew Honnibal	82135d85b7	Fix test	2018-01-15 15:55:15 +01:00
Matthew Honnibal	4b09616b58	Add test for #1757 : Comparison against None	2018-01-15 15:55:01 +01:00
Matthew Honnibal	b904d81e9a	Fix rich comparison against None objects. Closes #1757	2018-01-15 15:51:25 +01:00
Matthew Honnibal	9e413449f6	Fix unicode error in new test	2018-01-15 15:39:00 +01:00
Matthew Honnibal	ab7c45b12d	Fix error message and handling of doc.sents	2018-01-15 15:21:11 +01:00
Matthew Honnibal	6b215d2dd3	Add test for Issue #1537	2018-01-15 15:20:56 +01:00
ines	5babb7d6f6	Merge branch 'master' of https://github.com/explosion/spaCy	2018-01-14 17:31:09 +01:00
ines	793890cb4d	Remove test for removed deprecation warning	2018-01-14 17:31:06 +01:00
Matthew Honnibal	465a6f6452	Add missing Span.vocab property. Closes #1633	2018-01-14 15:06:30 +01:00
Matthew Honnibal	0cb090e526	Fix infinite recursion in token.sent_start. Closes #1640	2018-01-14 15:02:15 +01:00
Matthew Honnibal	5cbe913b6f	Don't raise deprecation warning in property. Closes #1813 , #1712	2018-01-14 14:55:58 +01:00
Matthew Honnibal	1a1cca6052	Fix vectors.resize() on Py3. Closes #1539	2018-01-14 14:48:51 +01:00
Matthew Honnibal	0153220304	Make set_vector add word to vocab. Fixes #1807	2018-01-14 13:57:57 +01:00
Ines Montani	55754f0cee	Merge pull request #1836 from fucking-signup/master Add tests for issue #1769	2018-01-13 00:23:35 +00:00
Kit	4ee97f20a0	Mark like_num tests as slow	2018-01-13 00:44:15 +01:00
Kit	855531537e	Rewrite tests for issue #1769	2018-01-12 23:49:51 +01:00
Kit	5b541cb5ec	Simplify tests for issue #1769	2018-01-12 23:34:27 +01:00
Kit	7a2adc4633	Remove some tests to see build status changes	2018-01-12 22:49:16 +01:00
Kit	0e62809a43	Rewrite tests for issue #1769	2018-01-12 22:26:06 +01:00
Ines Montani	36f426fe0a	Merge pull request #1808 from fucking-signup/master Fix issue #1769	2018-01-12 21:12:02 +00:00
Kit	76f4eeca44	Remove tests to see build changes on Windows (Python 2.7)	2018-01-12 20:30:51 +01:00
Matthew Honnibal	7ca49c2061	Merge branch 'master' into feature-improve-model-download	2018-01-10 18:21:55 +01:00
Kit	7ec0956e8d	Add regression test (issue #1769 )	2018-01-08 03:42:04 +01:00
Kit	701e7cc6aa	Rename variable to keep code consistent	2018-01-08 03:38:44 +01:00
Kit	ed0db95183	Find lowercased forms of ordinal words, where possible	2018-01-08 03:28:50 +01:00
Kit	9bc524982e	Find lowercased forms of numeric words	2018-01-08 03:25:08 +01:00
Søren Lind Kristiansen	62de5da1ff	Remove unsused dummy variable	2018-01-05 09:57:24 +01:00
Søren Lind Kristiansen	10dab8eef8	Remove dummy variable from function calls	2018-01-05 09:37:05 +01:00
Søren Lind Kristiansen	7f0ab145e9	Don't pass CLI command name as dummy argument	2018-01-04 21:33:47 +01:00
Ines Montani	6a008233b5	Merge pull request #1795 from textioHQ/issue1758 (resolves #1758 ) english tokenizer: handle "would've"	2018-01-04 02:43:39 +00:00
Kevin Humphreys	597df5bf83	add test	2018-01-03 13:00:05 -08:00
Kevin Humphreys	7918fa4ef9	handle would've	2018-01-03 12:25:48 -08:00
ines	2c656f90fb	Exit with 1 if incompatible models found (see #1714 )	2018-01-03 21:20:35 +01:00
ines	dacfaa2ca4	Ensure that download command exits properly (resolves #1714 )	2018-01-03 21:03:36 +01:00
Søren Lind Kristiansen	a9ff6eadc9	Prefix dummy argument names with underscore	2018-01-03 20:48:12 +01:00
ines	1081e08efb	Fix formatting	2018-01-03 20:14:50 +01:00
ines	d8109964d6	Use --no-deps on model install In general, it's nice for models to specify spaCy as a dependency. However, this tends to cause problems in conda environments, as pip will re-install spaCy and its dependencies (especially Thinc)	2018-01-03 17:40:37 +01:00
ines	319d754309	Fix overwriting of existing symlinks Check for is_symlink() to also overwrite invalid and outdated symlinks. Also show better error message if link path exists but is not symlink (i.e. file or directory).	2018-01-03 17:39:36 +01:00
ines	8ba0dfd017	Make message on failed linking more clear	2018-01-03 17:38:09 +01:00
Søren Lind Kristiansen	d6327e8495	Fix handling case when vectors not specified	2018-01-03 12:20:49 +01:00
Søren Lind Kristiansen	bcc51d7d8b	Fix shifted positional arguments	2018-01-03 12:19:47 +01:00
zqhZY	f27859fa99	add ChineseDefaults class for pickling	2017-12-28 17:13:58 +08:00
Ines Montani	ff9fc945ab	Merge pull request #1749 from sorenlind/da_ud_tokenization Tune Danish tokenizer to more closely match Universal Dependencies	2017-12-22 16:00:49 +00:00
ines	26f313dabc	Fix missing import	2017-12-22 16:21:44 +01:00
ines	8dc1c27841	Merge branch 'master' of https://github.com/explosion/spaCy	2017-12-22 16:01:00 +01:00
ines	b10ba848b8	xfail test that causes MemoryError on Python 2 on Windows Need to investigate this further!	2017-12-22 16:00:58 +01:00
Søren Lind Kristiansen	bef735aef7	Fix Danish abbreviation 'm.h.t.'	2017-12-21 09:24:31 +01:00
Ines Montani	a3dd167d7f	Merge branch 'master' into da_ud_tokenization	2017-12-20 21:05:34 +00:00
Ines Montani	97f100f69f	Merge pull request #1742 from kimfalk/master Two corrections in the da lan.	2017-12-20 21:02:00 +00:00
Ines Montani	d682a8803e	Merge pull request #1672 from cbilgili/master Adds Turkish Lemmatization	2017-12-20 21:01:00 +00:00
Benjamin Peterson	9452134cd1	remove no-break spaces from Hindi example (fixes #1750 )	2017-12-20 11:35:30 -08:00
Søren Lind Kristiansen	7a2f2f6f94	Fix formatting.	2017-12-20 18:37:37 +01:00
Søren Lind Kristiansen	15d13efafd	Tune Danish tokenizer to more closely match tokenization in Universal Dependencies.	2017-12-20 17:36:52 +01:00
Kim FalkJørgensen	648dc60755	Remove the incorrect exception 'm.h.t'	2017-12-20 10:02:39 +01:00
Kim FalkJørgensen	9c9f4ef84a	Fixing a translation error in examples.py Adding an exception in the tokenizer_exceptions.py	2017-12-19 15:26:50 +01:00
ines	22dc744b48	Fix check for '@' in like_url (see #1715 )	2017-12-16 13:48:43 +01:00
Ines Montani	9c1ee65268	Add regression test for #1698	2017-12-12 10:36:11 +01:00
Ines Montani	6455b574fc	Check for email address first	2017-12-12 10:25:13 +01:00
Bri-Will	d77361d76c	Update lex_attrs.py. Fix like_url from matching on e-mail	2017-12-11 14:13:28 -08:00
Søren Lind Kristiansen	5a9d377580	Remove abbreviation for positional plac argument	2017-12-11 11:08:29 +01:00
Isaac Sijaranamual	38021fbb00	Switch from python 3 only TemporaryDirectory to pytest's tmpdir	2017-12-11 00:16:04 +01:00
Isaac Sijaranamual	20ae0c459a	Fixes "Error saving model" #1622	2017-12-10 23:07:13 +01:00
Isaac Sijaranamual	568130ce7c	Adds regression test_issue1622	2017-12-10 23:00:48 +01:00
Isaac Sijaranamual	e188b61960	Make cli/train.py not eat exception	2017-12-10 22:53:08 +01:00
ines	020a7e5d52	Allow 'fine_grained' option in displaCy (see #1703 ) Shows token.tag_ instead of token.pos_. Disabled by default, to not cause rendering issues for models with long fine-grained tags (e.g. merged morphological features).	2017-12-09 15:11:12 +01:00
Matthew Honnibal	3b17eb7c49	Merge branch 'master' of https://github.com/explosion/spaCy	2017-12-07 10:39:32 +01:00
Matthew Honnibal	a6b43729c6	Set version to v2.0.5	2017-12-07 10:39:14 +01:00
ines	5eaa61c2b8	Fix formatting	2017-12-07 10:23:09 +01:00
ines	24e80c51b8	Document init-model command	2017-12-07 10:14:37 +01:00
Matthew Honnibal	c91f451b0f	Fix imports and CLI in init-model	2017-12-07 10:03:07 +01:00
ines	82e80ff928	Rename model command to init_model and fix formatting	2017-12-07 09:59:23 +01:00
Ines Montani	2feeb428d6	Merge pull request #1646 from GreenRiverRUS/master Added model command to create models from raw data	2017-12-07 08:54:26 +00:00
Matthew Honnibal	6373d2580d	Increment version to v2.0.5.dev0	2017-12-07 09:53:59 +01:00
Matthew Honnibal	36b47e3fa6	Fix (and test) vector pickling	2017-12-07 09:53:30 +01:00
Matthew Honnibal	05f41ff587	Set version to 2.0.4	2017-12-06 13:24:02 +01:00
Matthew Honnibal	04c38f7e87	Merge branch 'master' of https://github.com/explosion/spaCy	2017-12-06 12:15:52 +01:00
Matthew Honnibal	361944e512	If no rules are set, lemmatize by lookup	2017-12-06 12:12:11 +01:00
Matthew Honnibal	2ab0f2d186	Merge pull request #1664 from jimregan/italian-lemmatizer BOM in Italian lemmatiser	2017-12-06 11:09:04 +01:00
Matthew Honnibal	3f247119d3	Merge pull request #1668 from sorenlind/da_morph Add more Danish morph rules and clean up existing ones	2017-12-06 11:08:09 +01:00
Matthew Honnibal	b712de774e	Fix vectors pickling	2017-12-05 12:45:24 +01:00
Matthew Honnibal	04650e38c7	Set version to 2.0.4.dev0	2017-12-05 10:52:31 +01:00
Matthew Honnibal	07acb43a85	Merge branch 'master' of https://github.com/explosion/spaCy	2017-12-04 14:42:52 +01:00
Thomas Werkmeister	94eac75b7c	fix setup.py spacy req string for packaging Requirement should be `spacy>=2.0.2` instead of `spacy2.0.2`	2017-12-03 04:16:28 -06:00
ines	f2ea6d4713	Add Dutch example sentences (see #1107 )	2017-12-01 23:36:05 +01:00
Canbey Bilgili	abe098b255	Adds Turkish Lemmatization	2017-12-01 17:04:32 +03:00
Søren Lind Kristiansen	d86b537a38	Enable morph rules for Danish	2017-11-30 15:58:02 +01:00
Søren Lind Kristiansen	13a988adc3	Remove 'Number[psor]'	2017-11-30 15:55:04 +01:00
Søren Lind Kristiansen	dd6fde18a9	Add more Danish morph rules and clean up existing ones	2017-11-30 11:17:19 +01:00
Vadim Mazaev	495eacf470	Merge branch 'model_command'	2017-11-30 12:30:26 +03:00
Vadim Mazaev	4ba7ddf651	Bugfixies	2017-11-30 12:29:38 +03:00
Jim O'Regan	a4ecdeadd4	aha	2017-11-29 23:43:25 +00:00
Jim O'Regan	2c7a9215d7	Merge branch 'master' into animacy	2017-11-29 23:31:12 +00:00
Jim O'Regan	c3e6cee17a	use inan in polimorf tagset conversion	2017-11-29 23:15:47 +00:00
Jim O'Regan	b32575e78c	imports	2017-11-29 23:03:41 +00:00
Jim O'Regan	3696ce6a7b	add UD mapping	2017-11-29 22:59:19 +00:00
Jim O'Regan	f8e7082fe4	typo in "inan", add "nhum"	2017-11-29 22:40:47 +00:00
Matthew Honnibal	6bc0f4d29f	Merge pull request #1611 from fsonntag/master Solving #1494	2017-11-29 23:11:23 +01:00
Matthew Honnibal	f9ed9ea529	Merge pull request #1624 from GreenRiverRUS/russian Add support for Russian	2017-11-29 23:10:01 +01:00
Jim O'Regan	076a6fc60a	symbols	2017-11-29 20:11:20 +00:00
Jim O'Regan	834ba3c69a	(semi generated) Polimorf mapping	2017-11-29 20:08:24 +00:00
Jim O'Regan	ba6a23fd11	BOM in Italian lemmatiser	2017-11-29 17:40:07 +00:00
ines	a31506e060	Fix off-by-one error in nlp.add_pipe(after=name) (fixes #1654 )	2017-11-28 20:37:55 +01:00
ines	b62739fbfe	Add regression test for #1654	2017-11-28 20:27:54 +01:00
ines	2e50dbb9d7	Simplify test	2017-11-28 20:27:27 +01:00
Felix Sonntag	724ae7dc55	Fixed issue of infix capturing prefixes	2017-11-28 17:17:12 +01:00
Ines Montani	9052643e2c	Merge pull request #1653 from sorenlind/da_example_typo Fix typo	2017-11-27 14:47:42 +00:00
Søren Lind Kristiansen	5fe58b885b	Fix typo	2017-11-27 15:36:18 +01:00
Ines Montani	d52b1ab245	Add unicode_literals (hopefully fixes test failure on Python 2)	2017-11-27 15:16:54 +01:00
Søren Lind Kristiansen	0ffd27b0f6	Add several Danish alternative spellings	2017-11-27 13:35:41 +01:00
Ines Montani	6362024cf8	Merge pull request #1645 from GreenRiverRUS/fix_default_meta Fixed spaCy version string in default meta	2017-11-27 11:58:02 +00:00
Vadim Mazaev	c332ffdde1	Added model command to create model from raw data: words counts, brown clusters and vectors	2017-11-27 01:21:47 +03:00
Vadim Mazaev	59f03ab1d7	Fixed spacy version string in default meta	2017-11-26 23:02:07 +03:00
Vadim Mazaev	53e7c38637	Fixed tests depends on pymorphy2	2017-11-26 21:04:44 +03:00
Vadim Mazaev	cacd859dcd	Added tag map, fixed tests fails, added more exceptions	2017-11-26 20:54:48 +03:00
Ines Montani	a7bb8f1b42	Merge pull request #1637 from sorenlind/da_tokenization Improve Danish tokenization	2017-11-26 15:41:38 +00:00
ines	c699aec089	Add offsets_from_biluo_tags helper and tests (see #1626 )	2017-11-26 16:38:01 +01:00
Søren Lind Kristiansen	ef03e9ea53	Remove unused import.	2017-11-25 13:04:02 +01:00
Søren Lind Kristiansen	6aa241bcec	Add day of month tokenizer exceptions for Danish.	2017-11-24 15:03:24 +01:00
Søren Lind Kristiansen	0c276ed020	Add weekday abbreviations and remove abiguous month abbreviations for Danish.	2017-11-24 14:43:29 +01:00
Søren Lind Kristiansen	056547e989	Add multiple tokenizer exceptions for Danish.	2017-11-24 11:51:26 +01:00
Søren Lind Kristiansen	8dc265ac0c	Add test for tokenization of 'i.' for Danish.	2017-11-24 11:29:37 +01:00
Søren Lind Kristiansen	ac8116510d	Fix tokenization of 'i.' for Danish.	2017-11-24 11:16:53 +01:00
Matthew Honnibal	79f11d4f85	Pickle vectors with vocab	2017-11-23 17:19:50 +01:00
Matthew Honnibal	f29c3925ee	Fix more efficient nonproj	2017-11-23 12:48:00 +00:00
Matthew Honnibal	e10e9ad2c5	Improve efficiency of Doc.to_array	2017-11-23 12:33:27 +00:00
Matthew Honnibal	2acc907d55	Improve profiling	2017-11-23 12:33:03 +00:00
Matthew Honnibal	fa62427300	Remove lookup-based lemmatization	2017-11-23 12:32:22 +00:00
Matthew Honnibal	fb26b2cb12	Use lookup lemmatizer if lemma unset	2017-11-23 12:31:58 +00:00
Matthew Honnibal	db5c714ad2	Improve efficiency of deprojectivization	2017-11-23 12:31:34 +00:00
Matthew Honnibal	8fec7268eb	Move string cleanup under a setting flag	2017-11-23 12:19:18 +00:00
Matthew Honnibal	5949777b12	Fix misleading multi-threading docstring	2017-11-23 12:18:59 +00:00
Matthew Honnibal	542e6fd4ea	Don't remove entries from specials	2017-11-23 12:17:42 +00:00
Matthew Honnibal	30ba81f881	Merge pull request #1576 from ligser/master Actually reset caches in pipe [wip]	2017-11-23 12:54:48 +01:00
ines	c90fe92e15	Fix displaCy test	2017-11-22 05:04:39 +01:00
ines	a6f33ac27d	Fix displaCy test	2017-11-22 04:19:28 +01:00
ines	93b0be611a	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-22 00:28:55 +01:00
ines	60b4915569	Use .pos_ instead of .tags_ in displaCy by default (see #1006 )	2017-11-22 00:28:52 +01:00
Vadim Mazaev	81314f8659	Fixed tokenizer: added char classes; added first lemmatizer and tokenizer tests	2017-11-21 22:23:59 +03:00
Vadim Mazaev	52ee1f9bf9	Updated Russian Language, added lemmatizer, norm exceptions and lex attrs	2017-11-21 11:44:46 +03:00
Burton DeWilde	a5c6869b2d	Fix bug where span.orth_ != span.text (see #1612 )	2017-11-20 12:05:43 -06:00
Burton DeWilde	635792997c	Add regression test for #1612	2017-11-20 12:05:35 -06:00
ines	9a63e32f21	Add noqa to Python 2 compat variables of built-ins (see #1617 )	2017-11-20 14:03:42 +01:00
ines	d70a64d78b	Fix syntax error and formatting in test (see #1617 )	2017-11-20 14:01:25 +01:00
ines	17849dee4b	Fix French test (see #1617 )	2017-11-20 13:59:59 +01:00
Felix Sonntag	33b0f86de3	Changed tokenizer to add infix when infix_start is offset	2017-11-19 16:32:10 +01:00
Felix Sonntag	8be3392302	Added regression text for 1494	2017-11-19 16:30:35 +01:00
Motoki Wu	a52e195a0a	Fixes Issue #1207 where `noun_chunks` of `Span` gives an error. Make sure to reference `self.doc` when getting the noun chunks. Same fix as `9750a0128c`	2017-11-17 17:16:20 -08:00
Motoki Wu	b818afaa0e	Added failing test for Issue #1207 . The noun chunk iterator should work for `Doc` but not for `Span`.	2017-11-17 17:04:27 -08:00
Vadim Mazaev	a0739a06d4	Returned russian support from v1.10 branch	2017-11-17 17:06:15 +03:00
yuukos	7401152289	updated Russian tokenizer moved the trying to import pymorph into __init__	2017-11-17 17:04:50 +03:00
yuukos	3aad66cf00	added russian language support	2017-11-17 17:04:22 +03:00
ines	a3d4dd1a5d	Test adding of lots of pipeline components (see #1585 ) Just to make sure that there's no error now or in the future with adding a large number of pipeline components.	2017-11-15 17:28:06 +01:00
Roman Domrachev	61d28d03e4	Try again to do selective remove cache	2017-11-15 19:11:12 +03:00
Roman Domrachev	b3311100c7	Merge branch 'master' of github.com:explosion/spaCy	2017-11-15 18:30:04 +03:00
Matthew Honnibal	b60d92aca8	Increment version	2017-11-15 16:14:46 +01:00
Roman Domrachev	505c6a2f2f	Completely cleanup tokenizer cache Tokenizer cache can have be different keys than string That modification can slow down tokenizer and need to be measured	2017-11-15 17:55:48 +03:00
Matthew Honnibal	cf0be62096	Increment version	2017-11-15 15:00:18 +01:00
ines	97a4f9362b	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-15 14:24:00 +01:00
ines	8e65247886	Fix lex.id if vectors is None	2017-11-15 14:23:58 +01:00
Matthew Honnibal	437ad1a852	Merge pull request #1570 from explosion/feature/fix-beam-leak Fix memory leak in beam parser	2017-11-15 14:15:05 +01:00
Matthew Honnibal	2f169fdb0a	Set lex ID correctly for new tokens in Vocab	2017-11-15 13:58:03 +01:00
Matthew Honnibal	fe3c42a06b	Fix caching in tokenizer	2017-11-15 13:55:46 +01:00
Matthew Honnibal	8d692771f6	Improve profiling	2017-11-15 13:51:25 +01:00
Matthew Honnibal	b797dca977	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-15 13:11:43 +01:00
ines	c9d72de0fb	Add dummy serialization methods for Japanese and missing lang getter (resolves #1557 )	2017-11-15 12:44:02 +01:00
Matthew Honnibal	d274d3a3b9	Let beam forward use minibatches	2017-11-15 00:51:42 +01:00
Matthew Honnibal	855872f872	Remove state hashing	2017-11-14 23:36:46 +01:00
Roman Domrachev	3e21680814	Use safer method to get string without hit	2017-11-14 22:58:46 +03:00
Roman Domrachev	a33d5a068d	Try to hold origin data instead of restore it	2017-11-14 22:40:03 +03:00
Roman Domrachev	91e2fa6561	Clean all caches	2017-11-14 21:15:04 +03:00
Roman Domrachev	4e378dc4a4	Remove all obsolete code and test only initial problem	2017-11-14 20:45:04 +03:00
Roman	47ce2347b0	Create test that fails when actual cleanup caused	2017-11-14 20:28:13 +03:00
Roman	caae77f72d	Update strings.pyx	2017-11-14 19:44:40 +03:00
Roman Domrachev	3d247d2bb8	Get back previous testcase	2017-11-14 18:01:37 +03:00
Roman Domrachev	870defa815	Swap keys in proper place Remove unnecessary clear of the hits	2017-11-14 17:56:30 +03:00
Roman Domrachev	86ca434c93	Merge github.com:explosion/spaCy	2017-11-14 17:46:22 +03:00
Roman Domrachev	a2745b0e84	StringStore now actually cleaned Do not lose docs in ref tracking	2017-11-14 17:45:50 +03:00
Matthew Honnibal	2512ea9eeb	Fix memory leak in beam parser	2017-11-14 02:11:40 +01:00
Matthew Honnibal	86ddf692a1	Fix bug in limit calculation on dev data	2017-11-14 01:37:10 +01:00
Ines Montani	ea6c85c67a	Merge pull request #1566 from MathiasDesch/master (resolves #1248 ) Add exceptions to tokenizer and norm	2017-11-13 19:05:22 +01:00
Matthew Honnibal	1b348389bb	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-13 18:18:48 +01:00
Matthew Honnibal	ca73d0d8fe	Cleanup states after beam parsing, explicitly	2017-11-13 18:18:26 +01:00
Matthew Honnibal	63ef9a2e73	Remove __dealloc__ from ParserBeam	2017-11-13 18:18:08 +01:00
Mathias Deschamps	c0691b2ab4	Add tokenizer exceptions for ing verbs Extend list of tokenizing exceptions introduced in `123810b`	2017-11-13 17:46:05 +01:00
Mathias Deschamps	288298ead9	Add norm exception for ing verbs Some ing verbs are sometimes written in or in'. Make the NORM form correct	2017-11-13 17:46:05 +01:00
Abhinav Sharma	59f5740ede	improved upon the list of included stop_words	2017-11-13 17:13:49 +05:30
Matthew Honnibal	6e641f46d4	Create a preprocess function that gets bigrams	2017-11-12 00:43:41 +01:00
Matthew Honnibal	c9251d79e3	Edit comment	2017-11-11 18:38:32 +01:00
Matthew Honnibal	dd1678eab3	Edit comment	2017-11-11 18:37:08 +01:00
Roman Domrachev	ee60a52ee7	Fix test imports and last batch cleanup	2017-11-11 11:32:16 +03:00
Roman Domrachev	4a6b094e09	Remove unused import	2017-11-11 03:13:05 +03:00
Roman Domrachev	3c600adf23	Try to fix StringStore clean up (see #1506 )	2017-11-11 03:11:27 +03:00
ines	ee97fd3cb4	Add regression test for #1547	2017-11-11 00:14:03 +01:00
ines	2df27db671	Add unicode declaration	2017-11-11 00:13:56 +01:00
ines	35653bef3a	Add missing import (fixes #1546 )	2017-11-10 19:05:18 +01:00
ines	4c5d2c80d5	Re-add python -m to commands, too brittle :( (see #1536 )	2017-11-10 02:30:55 +01:00
ines	123810b6de	Add "lovin'" to tokenizer exceptions (see #1248 )	2017-11-09 17:09:30 +01:00
ines	1c218397f6	Ensure path in Doc.to_disk/from_disk (resolves ##1521) Also add Doc serialization tests with both Path and string path options	2017-11-09 02:29:03 +01:00
Matthew Honnibal	49fd5a646f	Set version for 2.0.2 release	2017-11-08 22:39:39 +01:00
Matthew Honnibal	fba2dbddf7	Increment version	2017-11-08 22:19:08 +01:00
Matthew Honnibal	a5ea0fdf5a	Fix #1518 : vocab.vectors.resize() didn't work	2017-11-08 22:18:37 +01:00
Matthew Honnibal	de45702bbe	Strip dev suffixes from version for compatibility check	2017-11-08 18:40:21 +01:00
Matthew Honnibal	51639214a1	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-08 18:04:33 +01:00
Matthew Honnibal	a2f980de4e	Exclude .devN versioning from compatibility check	2017-11-08 18:03:52 +01:00
Daniel Hershcovich	d7ae54ff44	Fix typo in message	2017-11-08 16:06:28 +02:00
Matthew Honnibal	4194bc5744	Xfail flakey serialization test	2017-11-08 13:55:13 +01:00
Matthew Honnibal	d5537e5516	Work on Windows test failure	2017-11-08 13:25:18 +01:00
Matthew Honnibal	c27c82d5f9	Fix serialization	2017-11-08 13:08:48 +01:00
Matthew Honnibal	1d5599cd28	Fix dtype	2017-11-08 12:18:32 +01:00
Matthew Honnibal	fa7fdd0d9b	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-08 12:11:31 +01:00
Matthew Honnibal	072ff38a01	Try to fix python3.5 serialization	2017-11-08 12:10:49 +01:00
Ines Montani	3a0f34d567	Merge pull request #1509 from abhi18av/patch-1 Create examples.py for Hindi language	2017-11-08 11:37:19 +01:00
Ines Montani	42b241ccd0	Update language code in usage example in comment	2017-11-08 11:36:38 +01:00
Matthew Honnibal	e262e8d942	Increment version to v2.0.2.dev0	2017-11-08 11:25:47 +01:00
Matthew Honnibal	a8b592783b	Make a dtype more specific, to fix a windows build	2017-11-08 11:24:35 +01:00
Abhinav Sharma	84edade82d	Create examples.py Populated the file with the translations of English example sentences	2017-11-08 13:23:08 +05:30
Matthew Honnibal	d725aee4e2	Increment version to 2.0.1	2017-11-08 02:14:47 +01:00
Matthew Honnibal	8d6f68f1df	Increment version	2017-11-08 01:12:34 +01:00
ines	bcf42b8846	Fix typo	2017-11-08 01:06:37 +01:00
Matthew Honnibal	bbd2a3dee1	Fix title in about.py	2017-11-07 14:02:58 +01:00
Matthew Honnibal	4efaf9306c	Set version to spacy-nightly rc2	2017-11-07 13:27:26 +01:00
Matthew Honnibal	bf1ec2965f	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-07 13:20:29 +01:00
Matthew Honnibal	726f689da4	Fix missing import	2017-11-07 13:20:12 +01:00
ines	834f9c1aab	Update about.py	2017-11-07 13:11:33 +01:00
ines	a4662a31a9	Move model package templates to cli.package and update docs	2017-11-07 12:15:35 +01:00
ines	a09c096d3c	Get docs ready for v2.0.0	2017-11-07 12:00:43 +01:00
Matthew Honnibal	9a88e66103	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-07 02:00:06 +01:00
Matthew Honnibal	174abe4677	Increment to 2.0.0rc1	2017-11-07 01:59:46 +01:00
ines	42a0fbf291	Fix textcat simple train example	2017-11-07 01:25:54 +01:00
ines	8fb48b9b91	Update and document new util functions	2017-11-07 00:22:43 +01:00
Matthew Honnibal	1cab703bba	Move minibatch function to util	2017-11-06 23:45:36 +01:00
ines	5f43953536	Move test	2017-11-06 23:14:10 +01:00
Matthew Honnibal	dd90fe09f5	Remove extraneous label from textcat class	2017-11-06 22:09:02 +01:00
Matthew Honnibal	45e0617e61	Allow Language.update to take unicode text and dict objects	2017-11-06 22:07:38 +01:00
Matthew Honnibal	1831dbd065	Add test of simple textcat workflow	2017-11-06 22:04:29 +01:00
Matthew Honnibal	ffb9101f3f	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-06 19:20:41 +01:00
Matthew Honnibal	8fea512ac8	Don't set tensor in textcat	2017-11-06 19:20:14 +01:00
ines	acb9bdb852	Fix PRON_LEMMA imports	2017-11-06 17:41:53 +01:00
Matthew Honnibal	7d46793dd7	Add PRON_LEMMA to spacy.symbols	2017-11-06 17:38:25 +01:00
Matthew Honnibal	2f7e9f390d	Make test less flakey	2017-11-06 17:34:50 +01:00
Matthew Honnibal	407b08017e	Make test less flakey	2017-11-06 17:31:40 +01:00
Matthew Honnibal	102f797933	Fix lemma ordering in test	2017-11-06 17:02:17 +01:00
Matthew Honnibal	75e1618ec3	Fix lemma clobbering	2017-11-06 16:56:19 +01:00
Matthew Honnibal	6fdffd7246	Merge pull request #1497 from explosion/feature/improve-optimizer-handling 💫 Improve optimizer handling	2017-11-06 16:41:15 +01:00
Matthew Honnibal	8e6795437b	Set release=True	2017-11-06 16:39:32 +01:00
Matthew Honnibal	5c85bf3791	Fix missing import	2017-11-06 15:06:27 +01:00
Matthew Honnibal	25859dbb48	Return optimizer from begin_training, creating if necessary	2017-11-06 14:26:49 +01:00
Matthew Honnibal	465adfee94	Remove unused resume_training method, and pass optimizer through	2017-11-06 14:26:00 +01:00
Matthew Honnibal	13336a6197	Fix Adam import	2017-11-06 14:25:37 +01:00
Matthew Honnibal	2eb11d60f2	Add function create_default_optimizer to spacy._ml	2017-11-06 14:11:59 +01:00
Matthew Honnibal	31babe3c3f	Fix non-clobbering lemmatization	2017-11-06 12:36:05 +01:00
Matthew Honnibal	63c6ae4191	Fix lemmatizer test	2017-11-06 11:57:06 +01:00
Matthew Honnibal	a86a0181b5	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-05 22:19:10 +01:00
Matthew Honnibal	134d3b8143	Fix morphology	2017-11-05 22:18:22 +01:00
ines	08d1cf850a	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-05 21:41:58 +01:00
ines	baa231745c	Fix Dutch tag map	2017-11-05 21:41:50 +01:00
Matthew Honnibal	46e62ad747	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-05 19:40:00 +01:00
Matthew Honnibal	bb25cb0f76	Avoid clobbering preset lemmas	2017-11-05 19:39:38 +01:00
ines	507ecb67af	Fix Spanish tag map	2017-11-05 19:23:34 +01:00
Matthew Honnibal	320008352b	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-05 18:46:15 +01:00
Matthew Honnibal	38109a0e4a	Register SentenceSegmenter in Language.factories	2017-11-05 18:45:57 +01:00
ines	975e1042ff	Fix Italian tag map	2017-11-05 18:34:09 +01:00
ines	6b2d6e4937	Fix Portuguese tag map	2017-11-05 18:31:00 +01:00
ines	fa2687fded	Fix Dutch tag map	2017-11-05 17:57:59 +01:00
ines	fb8990d916	Fix Spanish tag map	2017-11-05 17:48:46 +01:00
ines	9d13288f73	Fix French tag map	2017-11-05 17:47:59 +01:00
ines	54579805c5	Fix French tag map	2017-11-05 17:44:05 +01:00
Matthew Honnibal	2b35bb76ad	Fix tensorizer on GPU	2017-11-05 15:34:40 +01:00
Matthew Honnibal	6e5181bbaa	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-05 15:33:56 +01:00
Matthew Honnibal	6f438b17c1	Increment version to v2.0.0a19	2017-11-05 14:43:36 +01:00
Matthew Honnibal	225cc249c9	Pass string path to numpy, to fix #1479	2017-11-05 14:42:46 +01:00
Matthew Honnibal	00435d8f0c	Add extra beam parsing test	2017-11-05 14:39:57 +01:00
Matthew Honnibal	e777ea25bb	Merge pull request #1492 from uwol/develop TextCategorizer return parameter fix	2017-11-05 14:13:04 +01:00
Matthew Honnibal	0d4bd6414e	Fix Italian tag map	2017-11-05 14:11:03 +01:00
ines	ef597622a6	Add Portuguese tag map	2017-11-05 13:58:34 +01:00
ines	793c62dfda	Add Dutch tag map	2017-11-05 13:48:07 +01:00
ines	f7485a09c8	Fix Italian tag map	2017-11-05 13:12:58 +01:00
uwol	a2162b8908	tensorizer return parameter fix	2017-11-05 12:25:10 +01:00
ines	0a27afbf86	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-04 23:32:52 +01:00
ines	3cef901834	Add tag map for French and Italian	2017-11-04 23:32:51 +01:00
Matthew Honnibal	cfb83c231c	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-04 23:08:19 +01:00
Matthew Honnibal	d185927998	Undo harmful pickling hacks on Language class	2017-11-04 23:07:03 +01:00
ines	6c15aafebd	Fix formatting	2017-11-04 23:07:02 +01:00
Matthew Honnibal	3ca16ddbd4	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-04 00:25:02 +01:00
Matthew Honnibal	e4ec4be948	Fix parser test	2017-11-04 00:23:45 +01:00
Matthew Honnibal	98c29b7912	Add padding vector in parser, to make gradient more correct	2017-11-04 00:23:23 +01:00
ines	5e7d98f72a	Remove test for #1491	2017-11-03 22:10:57 +01:00
ines	718f1c50fb	Add regression test for #1491	2017-11-03 21:11:20 +01:00
Matthew Honnibal	144a93c2a5	Back-off to tensor for similarity if no vectors	2017-11-03 20:56:33 +01:00
Matthew Honnibal	1e9634691a	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-03 20:21:15 +01:00
Matthew Honnibal	13c8881d2f	Expose parser's tok2vec model component	2017-11-03 20:20:59 +01:00
Matthew Honnibal	17c63906f9	Update tensorizer component	2017-11-03 20:20:26 +01:00
Matthew Honnibal	2bf21cbe29	Update model after optimising it instead of waiting	2017-11-03 20:20:01 +01:00
Matthew Honnibal	d6e831bf89	Fix lemmatizer tests	2017-11-03 19:46:34 +01:00
ines	eef930c73e	Assert instead of print	2017-11-03 18:50:57 +01:00
ines	f0986df94b	Add test for #1488 (passes on v2.0.0a18?)	2017-11-03 14:44:36 +01:00
Matthew Honnibal	711278b667	Make test less flakey	2017-11-03 14:36:08 +01:00
Matthew Honnibal	7fea845374	Remove print statement	2017-11-03 14:04:51 +01:00
Matthew Honnibal	0a534ae96a	Fix test for backprop d_pad	2017-11-03 14:04:16 +01:00
Matthew Honnibal	33bd2428db	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-03 13:29:56 +01:00
Matthew Honnibal	6681058abd	Fix tensor extending in tagger	2017-11-03 13:29:36 +01:00
Matthew Honnibal	bd2cbdfa85	Make Morphology not fail on unknown tags	2017-11-03 13:29:09 +01:00
Matthew Honnibal	c9b118a7e9	Set softmax attr in tagger model	2017-11-03 11:22:01 +01:00
Matthew Honnibal	a5b05f85f0	Set Doc.tensor attribute in parser	2017-11-03 11:21:00 +01:00
Matthew Honnibal	62ed58935a	Add Doc.extend_tensor() method	2017-11-03 11:20:31 +01:00
Matthew Honnibal	d6fc39c8a6	Set Doc.tensor from Tagger	2017-11-03 11:20:05 +01:00
Matthew Honnibal	b3264aa5f0	Expose the softmax layer in the tagger model, to allow setting tensors	2017-11-03 11:19:51 +01:00
Matthew Honnibal	c2bbf076a4	Add document length cap for training	2017-11-03 01:54:54 +01:00
Matthew Honnibal	6771780d3f	Fix backprop of padding variable	2017-11-03 01:54:34 +01:00
Matthew Honnibal	54a716f2ec	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-03 00:55:20 +01:00
Matthew Honnibal	260e6ee3fb	Improve efficiency of backprop of padding variable	2017-11-03 00:49:11 +01:00
Matthew Honnibal	a22f96c3f1	Add test for backpropagating padding	2017-11-03 00:48:54 +01:00
ines	9baab241b4	Add skeleton language data for Turkish	2017-11-02 16:32:24 +01:00
ines	c6fea3e5f6	Add Romanian and Croatian skeletons (experimental) Add language data templates to make it easier for others to contribute to the language support	2017-11-01 23:04:28 +01:00
ines	18c859500b	Add missing imports	2017-11-01 23:02:51 +01:00
ines	819e30a26e	Tidy up tokenizer exceptions	2017-11-01 23:02:45 +01:00
ines	3af281a334	Update test model name	2017-11-01 23:02:00 +01:00
Matthew Honnibal	b30dd36179	Allow Tagger.add_label() before training	2017-11-01 21:49:24 +01:00
Matthew Honnibal	eca41f0cf6	Fix filename conversion for conllu	2017-11-01 21:26:49 +01:00
Matthew Honnibal	e237472cdc	Fix tag and filename conversion for conllu	2017-11-01 21:25:33 +01:00
Matthew Honnibal	b84d99b281	Revert tagger.add_label() changes, to fix model	2017-11-01 21:10:45 +01:00
Matthew Honnibal	f5855e539b	Fix tagger model loading	2017-11-01 20:42:36 +01:00
Matthew Honnibal	624644adfe	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 20:26:41 +01:00
ines	5f661a1b3a	Remove tensorizer from pre-set pipe_names	2017-11-01 19:48:33 +01:00
Matthew Honnibal	190522efd3	Fix tagger when some tags aren't in Morphology	2017-11-01 19:27:49 +01:00
Matthew Honnibal	e85e31cfbd	Fix backprop of d_pad	2017-11-01 19:27:26 +01:00
Matthew Honnibal	759cc79185	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 19:00:19 +01:00
Matthew Honnibal	1ae40b50b4	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 17:07:02 +01:00
Matthew Honnibal	7ae1aacdb8	Fix add_label methods	2017-11-01 17:06:43 +01:00
ines	8c2260e18c	Move span tests to /doc	2017-11-01 16:56:35 +01:00
Matthew Honnibal	2ef7b59eb0	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 16:51:41 +01:00
ines	1d1f91a041	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 16:49:44 +01:00
ines	9659391944	Update deprecated methods and add warnings	2017-11-01 16:49:42 +01:00
ines	260cb37224	Catch deprecation warning	2017-11-01 16:49:18 +01:00
ines	5914faafbb	Fix .merge tests to not use deprecated API	2017-11-01 16:49:11 +01:00
ines	705a4e3e4a	Fix formatting	2017-11-01 16:44:08 +01:00
Matthew Honnibal	d17a12c71d	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 16:38:26 +01:00
Matthew Honnibal	9f9439667b	Don't create low-data text classifier if no vectors	2017-11-01 16:34:09 +01:00
Matthew Honnibal	e7a9174877	Add add_label methods to Tagger and TextCategorizer	2017-11-01 16:32:44 +01:00
ines	39e0586192	Add deprecated helper Uses warning to show DeprecationWarning and custom stack trace	2017-11-01 16:32:36 +01:00
Matthew Honnibal	a7bf38bf31	Remove misleading comment on util.get_cuda_stream()	2017-11-01 13:57:25 +01:00
Matthew Honnibal	273e96b63f	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 13:27:35 +01:00
Matthew Honnibal	9e0ebee81c	Add Token.is_sent_start property, so can deprecate Token.sent_start	2017-11-01 13:27:14 +01:00
Matthew Honnibal	7e7116cdf7	Fix Doc.to_array when only one string attr provided	2017-11-01 13:26:43 +01:00
Matthew Honnibal	301fb2bb60	Implement Span.n_lefts and Span.n_rights	2017-11-01 13:25:12 +01:00
Matthew Honnibal	c047498f87	Fix vectors test	2017-11-01 13:24:47 +01:00
ines	9a5e7c6fe2	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 13:14:45 +01:00
ines	bfe17b7df1	Fix begin_training if get_gold_tuples is None	2017-11-01 13:14:31 +01:00
ines	affd3404ab	Remove old model command (now "vocab")	2017-11-01 13:14:03 +01:00
Matthew Honnibal	fdb4b8e456	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 02:07:17 +01:00
Matthew Honnibal	c48dd0e1d3	Fix vector pruning	2017-11-01 02:06:58 +01:00
ines	37e62ab0e2	Update vector meta in meta.json	2017-11-01 01:25:09 +01:00
ines	96b4aef0bf	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 01:10:53 +01:00
Matthew Honnibal	86eba61fae	Fix token.vector when vectors are missing	2017-11-01 00:47:35 +01:00
ines	5683fd65ed	Update docstrings	2017-11-01 00:42:39 +01:00
Matthew Honnibal	44bce8e53f	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 00:35:16 +01:00
Matthew Honnibal	c16310d156	Update vectors with find method	2017-11-01 00:34:55 +01:00
Ines Montani	d11659463b	Merge pull request #1152 from jimregan/develop-irish [WIP] attempt a port from #1147	2017-11-01 00:23:43 +01:00
ines	2ad2f09d12	Update docstrings and simplify most_similar	2017-11-01 00:18:08 +01:00
Jim O'Regan	08b0bfd153	merge	2017-10-31 22:55:59 +00:00
Jim O'Regan	00ecfa5417	Ó, not O	2017-10-31 22:54:42 +00:00
ines	ba2e6c8c6f	Update docstrings and formatting	2017-10-31 23:23:34 +01:00
Matthew Honnibal	0de8d213a3	Merge pull request #1475 from explosion/feature/sm-vectors Improve and simplify Vectors class	2017-10-31 22:59:50 +01:00
Ines Montani	25b1d6cd91	Fix syntax error	2017-10-31 22:36:03 +01:00
Matthew Honnibal	92dc127569	Fix test for Python 3	2017-10-31 22:21:55 +01:00
Jim O'Regan	fe4b10346a	replace example sentence until I get around to adding a punctuation.py	2017-10-31 20:24:53 +00:00
Matthew Honnibal	c5799ecc7b	Remove print statement	2017-10-31 21:12:33 +01:00
ines	7e424a1804	Don't copy exception dicts if not necessary and tidy up	2017-10-31 21:05:29 +01:00
Matthew Honnibal	c390f2d745	Make it easier to pass explicit no-pruning to vocab	2017-10-31 20:14:47 +01:00
Ines Montani	06c25a8882	Remove comma that caused list to wrap in tuple! Also removed extra dict wrappings for performance (we used to have them in there, but they should only really exist if copying the dict is absolutely necessary)	2017-10-31 20:13:16 +01:00
Matthew Honnibal	d90a22afe6	Fix loading previous vectors models	2017-10-31 19:58:35 +01:00
Ines Montani	147448b65b	Add missing symbols	2017-10-31 19:34:45 +01:00
Matthew Honnibal	997a61557a	Add vectors.n_keys property	2017-10-31 19:30:52 +01:00
Matthew Honnibal	8075726838	Restore vector usage in models	2017-10-31 19:21:17 +01:00
Matthew Honnibal	3659a807b0	Remove vector pruning arg from train CLI	2017-10-31 19:21:05 +01:00
Ines Montani	9b0de9fb43	Fix import of symbols (now nested one level lower)	2017-10-31 19:17:58 +01:00
Matthew Honnibal	59203a2e8a	Move vector pruning command into spacy vocab cli tool	2017-10-31 19:10:01 +01:00
Matthew Honnibal	77d8f5de9a	Revise and simplify Vectors class	2017-10-31 18:25:08 +01:00
Jim O'Regan	d4a8160c36	change quotes	2017-10-31 15:15:44 +00:00
Jim O'Regan	34ca59691b	no idea what is wrong here	2017-10-31 14:50:13 +00:00
Jim O'Regan	41dd29e48e	merge	2017-10-31 14:07:45 +00:00
Matthew Honnibal	cb5217012f	Fix vector remapping	2017-10-31 11:40:46 +01:00
Matthew Honnibal	9c11ee4a1c	WIP on vectors fixes	2017-10-31 11:22:56 +01:00
Matthew Honnibal	ce876c551e	Fix GPU usage	2017-10-31 02:33:34 +01:00
Matthew Honnibal	7698903617	Fix GPU usage	2017-10-31 02:33:16 +01:00
Matthew Honnibal	368fdb389a	WIP on refactoring and fixing vectors	2017-10-31 02:00:26 +01:00
Matthew Honnibal	4e3006cec7	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-30 19:44:58 +01:00
Matthew Honnibal	4112a991ec	Fix vector pruning	2017-10-30 19:44:40 +01:00
ines	ec657c1ddc	Update vocab docs and document Vocab.prune_vectors	2017-10-30 19:35:41 +01:00
ines	803e41bc66	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-30 18:39:51 +01:00
ines	8e02294241	Add vectors to Language.meta	2017-10-30 18:39:48 +01:00
ines	abf8aa05d3	Populate --create-meta defaults from file if available If meta.json is found in directory and user chooses to overwrite it, show existing data as defaults.	2017-10-30 18:39:38 +01:00
ines	ce98fa7934	Fix formatting	2017-10-30 18:38:55 +01:00
ines	98c35d2585	Fix spacy vocab command	2017-10-30 18:38:41 +01:00
Matthew Honnibal	e98451b5f7	Add -prune-vectors argument to spacy.cly.train	2017-10-30 18:00:10 +01:00
Matthew Honnibal	e026b29ea9	Add prune_vectors method to Vocab	2017-10-30 17:59:43 +01:00
Explosion Bot	d0cf12c8c7	Fix off-by-one error in vectors	2017-10-30 16:22:03 +01:00
Explosion Bot	05a1dd570e	Fix vocab script	2017-10-30 16:19:22 +01:00
Explosion Bot	b46bdce8d2	Add missing import	2017-10-30 16:18:10 +01:00
Explosion Bot	2d2cc294b4	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-30 16:15:05 +01:00
Explosion Bot	0fc1209421	Wire up new vocab command	2017-10-30 16:14:50 +01:00
Explosion Bot	aa64031751	Fix clear_vectors() method on Vocab	2017-10-30 16:09:04 +01:00
Explosion Bot	7b56b2f04b	Add Vocab.cfg attr, to hold stuff like oov probs	2017-10-30 16:08:50 +01:00
Explosion Bot	ab5d5ed880	Fix vectors.add()	2017-10-30 16:08:09 +01:00
Explosion Bot	41d0f1665a	Fix add_attrs for cluster	2017-10-30 16:07:50 +01:00
ines	5453821a9f	Update NER annotation scheme Add note on training data sources and include coarse-grained Wikipedia scheme	2017-10-30 13:53:49 +01:00
Explosion Bot	5ede7cec9b	Improve Lexeme.set_attrs method	2017-10-30 11:49:11 +01:00
Explosion Bot	72aea8f105	Update vectors.add() to allow setting keys to rows	2017-10-30 10:03:08 +01:00
Matthew Honnibal	c43cc5361d	Merge pull request #1467 from explosion/feature/better-parser 💫 Bug fixes to parser model (requires retraining)	2017-10-29 02:05:22 +02:00
ines	6c2d8d3b2a	Use shortcuts-nightly.json to resolve model shortcuts	2017-10-29 01:28:31 +02:00
Matthew Honnibal	a0c7dabb72	Fix bug in 8-token parser features	2017-10-28 23:01:35 +00:00
Matthew Honnibal	b713d10d97	Switch to 13 features in parser	2017-10-28 23:01:14 +00:00
Matthew Honnibal	3b91097321	Whitespace	2017-10-28 17:05:11 +00:00
Matthew Honnibal	6ef72864fa	Improve initialization for hidden layers	2017-10-28 17:05:01 +00:00
Matthew Honnibal	5414e2f14b	Use missing features in parser	2017-10-28 16:45:54 +00:00
Matthew Honnibal	df4803cc6d	Add learned missing values for parser	2017-10-28 16:45:14 +00:00
Matthew Honnibal	64e4ff7c4b	Merge 'tidy-up' changes into branch. Resolve conflicts	2017-10-28 13:16:06 +02:00
Explosion Bot	fb0c96f39a	Fix optimizer loading	2017-10-28 11:58:16 +02:00
Explosion Bot	b22e42af7f	Merge changes to parser and _ml	2017-10-28 11:52:10 +02:00
ines	d96e72f656	Tidy up rest	2017-10-27 21:07:59 +02:00
ines	a8e10f94e4	Tidy up Lexeme and update docs	2017-10-27 21:07:50 +02:00
ines	ba5e646219	Tidy up pipeline	2017-10-27 20:29:08 +02:00
ines	b4d226a3f1	Tidy up syntax	2017-10-27 19:45:57 +02:00
ines	5167a0cce2	Tidy up Vectors and docs	2017-10-27 19:45:19 +02:00
ines	7946464742	Remove spacy.tagger (now in pipeline)	2017-10-27 19:45:04 +02:00
ines	9c89e2cdef	Remove unused syntax iterators (now in language data)	2017-10-27 18:09:53 +02:00
ines	d2df81d907	Fix not implemented Span getters	2017-10-27 18:09:28 +02:00
ines	544a407b93	Tidy up Doc, Token and Span and add missing docs	2017-10-27 17:07:26 +02:00
ines	a6135336f5	Tidy up gold	2017-10-27 17:02:55 +02:00
ines	6a0483b7aa	Tidy up and document Doc, Token and Span	2017-10-27 15:41:45 +02:00
ines	1a559d4c95	Remove old, unused file	2017-10-27 15:34:35 +02:00
ines	91899d337b	Tidy up language, lemmatizer and scorer	2017-10-27 14:40:14 +02:00
ines	778212efea	Tidy up init and main	2017-10-27 14:39:51 +02:00
ines	e33b7e0b3c	Tidy up parser and ML	2017-10-27 14:39:30 +02:00
ines	e3265998c0	Tidy up displaCy	2017-10-27 14:39:19 +02:00
ines	ea4a41c8fb	Tidy up util and helpers	2017-10-27 14:39:09 +02:00
ines	d941fc3667	Tidy up CLI	2017-10-27 14:38:39 +02:00
Matthew Honnibal	531142a933	Merge remote-tracking branch 'origin/develop' into feature/better-parser	2017-10-27 12:34:48 +00:00
Matthew Honnibal	19a2b9bf27	Fix import of Optimizer	2017-10-27 12:33:42 +00:00
Matthew Honnibal	4d048e94d3	Add compat for thinc.neural.optimizers.Optimizer	2017-10-27 10:23:49 +00:00
Ines Montani	4033e70c71	Merge pull request #1461 from explosion/feature/disable-pipes 💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples	2017-10-27 12:21:40 +02:00
Matthew Honnibal	75a637fa43	Remove redundant imports from _ml	2017-10-27 10:19:56 +00:00
Matthew Honnibal	c9987cf131	Avoid use of numpy.tensordot	2017-10-27 10:18:36 +00:00
Matthew Honnibal	f6fef30adc	Remove dead code from spacy._ml	2017-10-27 10:16:41 +00:00
Matthew Honnibal	b9616419e1	Add try/except around bz2 import	2017-10-27 01:18:05 +00:00
Matthew Honnibal	783c0c8795	Remove unnecessary bz2 import	2017-10-27 01:17:54 +00:00
Matthew Honnibal	bb25bdcd92	Adjust call to scatter_add for the new version	2017-10-27 01:16:55 +00:00
Ines Montani	287a3ca256	Merge pull request #1466 from explosion/feature/rename-pipeline 💫 Clean up dead linear model code	2017-10-27 02:03:28 +02:00
ines	4eb5bd02e7	Update textcat pre-processing after to_array change	2017-10-27 00:32:12 +02:00
ines	2d6ec99884	Set 'model' as default model name to prevent meta.json errors	2017-10-26 16:12:23 +02:00
ines	9e372913e0	Remove old 'SP' condition in tag map	2017-10-26 16:11:57 +02:00
Matthew Honnibal	c52671420c	Remove old cfile import	2017-10-26 13:28:19 +02:00
Matthew Honnibal	ea03f1ef64	Remove obsolete cfile code	2017-10-26 13:23:36 +02:00
Matthew Honnibal	90d1d9b230	Remove obsolete parser code	2017-10-26 13:22:45 +02:00
ines	6f78e29bed	Add LAW entity label to glossary	2017-10-26 13:04:35 +02:00
ines	9bf78d5fb3	Update spacy.explain docs	2017-10-26 13:04:25 +02:00
Matthew Honnibal	33f8c58782	Remove obsolete parser.pyx	2017-10-26 12:42:05 +02:00
Matthew Honnibal	a8abc47811	Rename BaseThincComponent --> Pipe	2017-10-26 12:40:40 +02:00
Matthew Honnibal	b0f3ea2200	Fix names of pipeline components NeuralDependencyParser --> DependencyParser NeuralEntityRecognizer --> EntityRecognizer TokenVectorEncoder --> Tensorizer NeuralLabeller --> MultitaskObjective	2017-10-26 12:38:23 +02:00
Matthew Honnibal	b6b4f1aaf7	Merge pull request #1462 from explosion/feature/vector-meta-data 💫 Add vector meta data to model meta.json on train/package and show in docs	2017-10-26 11:39:41 +02:00
Matthew Honnibal	35977bdbb9	Update better-parser branch with develop	2017-10-26 00:55:53 +00:00
Ines Montani	090bd00369	Merge pull request #1464 from mayukh18/develop_bengali_pronouns added the bengali pronouns for v2.0	2017-10-25 21:55:25 +02:00
mayukh18	1bc07758fa	added few bengali pronouns	2017-10-25 22:24:40 +05:30
ines	de1e5f35d5	Merge branch 'develop' into feature/disable-pipes	2017-10-25 16:33:12 +02:00
ines	728b609bf9	Merge branch 'develop' into feature/vector-meta-data	2017-10-25 16:32:22 +02:00
ines	c0b55ebdac	Fix PhraseMatcher.__contains__ and add more tests	2017-10-25 16:31:11 +02:00
ines	91beacf5e3	Fix Matcher.__contains__	2017-10-25 16:19:38 +02:00
ines	11e3f19764	Fix vectors data added after training (see #1457)	2017-10-25 16:08:26 +02:00
ines	057954695b	Read pipeline and vector data off model in --generate-meta	2017-10-25 16:03:26 +02:00
ines	273e638183	Add vector data to model meta after training (see #1457)	2017-10-25 16:03:05 +02:00
ines	18aae423fb	Remove import of non-existing function	2017-10-25 15:54:10 +02:00
ines	5117a7d24d	Fix whitespace	2017-10-25 15:54:02 +02:00
ines	657a4d91bc	Merge branch 'develop' into feature/disable-pipes	2017-10-25 15:19:05 +02:00
ines	1a722dac31	Merge branch 'develop' into feature/disable-pipes	2017-10-25 15:18:18 +02:00
ines	6a00de4f77	Fix check of unexpected pipe names in restore()	2017-10-25 14:56:35 +02:00
ines	7f03932477	Return self on __enter__	2017-10-25 14:56:16 +02:00
Matthew Honnibal	b5de768852	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-25 14:44:16 +02:00
Matthew Honnibal	094512fd47	Fix model-mark on regression test.	2017-10-25 14:44:00 +02:00
Matthew Honnibal	e70f80f29e	Add Language.disable_pipes()	2017-10-25 13:46:41 +02:00
Matthew Honnibal	075e8118ea	Update from develop	2017-10-25 12:45:21 +02:00
ines	72497c8cb2	Remove comments and add TODO	2017-10-25 12:15:43 +02:00
ines	4d97efc3b5	Add missing docstrings	2017-10-25 12:10:16 +02:00
ines	1262aa0bf9	Implement PhraseMatcher.__contains__	2017-10-25 12:10:04 +02:00
ines	9c733a8849	Implement PhraseMatcher.__len__	2017-10-25 12:09:56 +02:00
ines	7eebeeaf85	Fix Matcher.__contains__	2017-10-25 12:09:47 +02:00
ines	7bcec57462	Remove unused attribute	2017-10-25 12:08:54 +02:00
ines	0b1dcbac14	Remove unused function	2017-10-25 12:08:46 +02:00
ines	3484174e48	Add Language.path	2017-10-25 11:57:43 +02:00
Ines Montani	d3bf488e16	Merge pull request #1171 from mollerhoj/support-danish Improve basic support for Danish	2017-10-24 20:29:57 +02:00
Matthew Honnibal	d9bb1e5de8	Increment version	2017-10-24 17:06:19 +02:00
Matthew Honnibal	908809d488	Update tests	2017-10-24 17:05:15 +02:00
Matthew Honnibal	66766c1454	Restore SP tag to English tag_map, until models migrate	2017-10-24 17:05:00 +02:00
Matthew Honnibal	30e67fa808	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-24 16:08:23 +02:00
Matthew Honnibal	b0f6fd3f1d	Disable tokenizer cache for special-cases. Fixes #1250	2017-10-24 16:08:05 +02:00
Matthew Honnibal	63f0bde749	Add test for #1250: Tokenizer cache clobbered special-case attrs	2017-10-24 16:07:18 +02:00
ines	8492d5be6d	Always make lemmatizer return a list of lemmas, not a set	2017-10-24 16:00:56 +02:00
ines	95f866f99f	Add lookup argument to Lemmatizer.load	2017-10-24 16:00:56 +02:00
ines	95f6174516	Remove tensorizer from model pipeline example in spacy package	2017-10-24 16:00:56 +02:00
ines	090aed940a	Add test for currently failing span.as_doc case	2017-10-24 16:00:56 +02:00
ines	4ef81a9ebc	Fix whitespace	2017-10-24 16:00:56 +02:00
Matthew Honnibal	18f1c1d0ba	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-24 14:29:43 +02:00
Matthew Honnibal	4bea65a1a8	Fix Issue #1450: Off-by-1 in * and ? matches Patterns that end in variable-length operators e.g. * and ? now end on the correct token. Previously, they were off by 1: the next token was pulled into the match, even if that's where the pattern failed.	2017-10-24 14:26:27 +02:00
Matthew Honnibal	391d5ef0d1	Normalize imports in regression test	2017-10-24 14:25:49 +02:00
ines	c55db0a4a1	Add example sentences for Japanese and Chinese (see #1107)	2017-10-24 13:02:24 +02:00
ines	66f8f9d4a0	Fix Japanese tokenizer JapaneseTokenizer now returns a Doc, not individual words	2017-10-24 13:02:19 +02:00
Matthew Honnibal	dd5b2d8fa3	Check for out-of-memory when calling calloc. Closes #1446	2017-10-24 12:40:47 +02:00
Matthew Honnibal	b66b8f028b	Fix #1375 -- out-of-bounds on token.nbor()	2017-10-24 12:10:39 +02:00
Matthew Honnibal	a68d89a4f3	Add failing test for bug #1375 -- no out-of-bounds error for token.nbor()	2017-10-24 12:05:25 +02:00
Ines Montani	facf77e541	Merge branch 'develop' into support-danish	2017-10-24 11:53:19 +02:00
Matthew Honnibal	ccd2ab1a62	Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix Add LCA matrix for spans and docs	2017-10-24 11:22:46 +02:00
Matthew Honnibal	ef3e5a361b	Merge pull request #1442 from explosion/feature/fix-sp 💫Fix SP tag, tweak Vectors.__init__, fix Morphology	2017-10-24 10:24:07 +02:00
Matthew Honnibal	fdf25d10ba	Merge pull request #1440 from ramananbalakrishnan/develop Support single value for attribute list in doc.to_array	2017-10-24 10:23:12 +02:00
Matthew Honnibal	e7556ff048	Fix non-maxout parser	2017-10-23 18:16:23 +02:00
ines	a31f048b4d	Fix formatting	2017-10-23 10:38:06 +02:00
Matthew Honnibal	490ad3eaf0	Check that empty strings are handled. Closes #1242	2017-10-21 00:52:14 +02:00
Matthew Honnibal	8f8bccecb9	Patch deserialisation for invalid loads, to avoid model failure	2017-10-21 00:51:42 +02:00
Ramanan Balakrishnan	d2fe56a577	Add LCA matrix for spans and docs	2017-10-20 23:58:00 +05:30
Matthew Honnibal	d8391b1c4d	Fix #1434: Matcher failed on ending ? if no token	2017-10-20 16:49:36 +02:00
Matthew Honnibal	fec53f09f7	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-20 16:28:34 +02:00
Matthew Honnibal	f111b228e0	Fix re-parsing of previously parsed text If a Doc object had been previously parsed, it was possible for invalid parses to be added. There were two problems: 1) The parse was only being partially erased 2) The RightArc action was able to create a 1-cycle. This patch fixes both errors, and avoids resetting the parse if one is present. In theory this might allow a better parse to be predicted by running the parser twice. Closes #1253.	2017-10-20 16:27:36 +02:00
Matthew Honnibal	1036798155	Make parser consistent if maxout==1	2017-10-20 16:24:16 +02:00
Matthew Honnibal	3faf9189a2	Make parser hidden shape consistent even if maxout==1	2017-10-20 16:23:31 +02:00
Matthew Honnibal	9010a1a060	Create vectors correctly	2017-10-20 14:19:46 +02:00
Matthew Honnibal	33229b1c9e	Remove print statement	2017-10-20 14:19:29 +02:00
Matthew Honnibal	cfae54c507	Make change to Vectors.__init__	2017-10-20 14:19:04 +02:00
Matthew Honnibal	ebecaddb76	Make 'data_or_width' two keyword args in Vectors.__init__ Previously the data and width options were one argument in Vectors, which meant you couldn't say vectors = Vectors(strings, width=300). It's better to have two keywords.	2017-10-20 14:17:15 +02:00
Matthew Honnibal	49895fbef6	Rename 'SP' special tag to '_SP' Renaming the tag with an underscore lets us add it to the tag map without worrying that we'll change the sequence of tags, which throws off the tag-to-ID mapping. For instance, if we inserted a 'SP' tag, the "VERB" tag is pushed to a different class ID, and the model is all messed up.	2017-10-20 14:01:12 +02:00
Matthew Honnibal	506cf2eb13	Remove cpdef enum, to avoid too much code generation	2017-10-20 14:00:23 +02:00
Matthew Honnibal	6218af0105	Remove cpdef enum, to avoid too much code generation	2017-10-20 13:59:57 +02:00
Matthew Honnibal	92ac9316b5	Fix initialization of vectors, to address serialization problem	2017-10-20 13:59:24 +02:00
Ramanan Balakrishnan	0726946563	cleanup to_array implementation using fixes on master	2017-10-20 17:09:37 +05:30
ines	108f1f786e	Update symbols and document missing token attributes (see #1439)	2017-10-20 13:08:44 +02:00
ines	4acab77a8a	Add missing symbol for LAW entities (resolves #1427)	2017-10-20 13:07:57 +02:00
Matthew Honnibal	b101736555	Fix precomputed layer	2017-10-20 12:14:52 +02:00
Ramanan Balakrishnan	b3ab124fc5	Support strings for attribute list in doc.to_array	2017-10-20 11:46:57 +05:30
Matthew Honnibal	64658e02e5	Implement fancier initialisation for precomputed layer	2017-10-20 03:07:45 +02:00
Matthew Honnibal	827cd8a883	Fix support of maxout pieces in parser	2017-10-20 03:07:17 +02:00
Matthew Honnibal	a8850b4282	Remove redundant PrecomputableMaxouts class	2017-10-19 20:27:34 +02:00
Matthew Honnibal	a17a1b60c7	Clean up redundant PrecomputableMaxouts class	2017-10-19 20:26:37 +02:00
Matthew Honnibal	b00d0a2c97	Fix bias in parser	2017-10-19 18:42:11 +02:00
Matthew Honnibal	b54b4b8a97	Make parser_maxout_pieces hyper-param work	2017-10-19 13:45:18 +02:00
Matthew Honnibal	03a215c5fd	Make PrecomputableAffines work	2017-10-19 13:44:49 +02:00
Ramanan Balakrishnan	7b9b1be44c	Support single value for attribute list in doc.to_array	2017-10-19 17:00:41 +05:30
Matthew Honnibal	61bc203f3f	Merge pull request #1438 from explosion/feature/fast-parser 💫 Improve runtime CPU efficiency of parser/NER	2017-10-19 02:42:21 +02:00
Matthew Honnibal	15e5a04a8d	Clean up more depth=0 conditional code	2017-10-19 01:48:43 +02:00
Matthew Honnibal	906c50ac59	Fix loop typing, that caused error on windows	2017-10-19 01:48:39 +02:00
ines	24512420b1	Show error if data_path does not exist or is None (see #1102)	2017-10-19 00:53:49 +02:00
ines	bf415fd778	Add test for serializing extension attrs (see #1085)	2017-10-19 00:53:08 +02:00
Matthew Honnibal	960788aaa2	Eliminate dead code in parser, and raise errors for obsolete options	2017-10-19 00:42:34 +02:00
Matthew Honnibal	bbfd7d8d5d	Clean up parser multi-threading	2017-10-19 00:25:21 +02:00
Matthew Honnibal	f018f2030c	Try optimized parser forward loop	2017-10-18 21:48:00 +02:00
Matthew Honnibal	65bf5e85bd	Improve piping in language.pipe	2017-10-18 21:46:12 +02:00
Matthew Honnibal	633a75c7e0	Break parser batches into sub-batches, sorted by length.	2017-10-18 21:45:01 +02:00
Ines Montani	f0d577e460	Merge pull request #1425 from explosion/feature/hindi-tokenizer 💫 Basic Hindi tokenization support	2017-10-18 13:34:52 +02:00
Matthew Honnibal	394633efce	Make doc pickling support hooks	2017-10-17 19:44:09 +02:00
Matthew Honnibal	fe844148f6	Test pickling hooks	2017-10-17 19:43:52 +02:00
Matthew Honnibal	cdb0c426d8	Improve deserialization of user_data, esp. for Underscore	2017-10-17 19:29:20 +02:00
Matthew Honnibal	374819edf8	Test user_data deserialization, re #1085	2017-10-17 19:28:54 +02:00
Matthew Honnibal	e35a83d142	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-17 18:22:06 +02:00
Matthew Honnibal	f45973848c	Rename 'tokens' variable 'doc' in tokenizer	2017-10-17 18:21:41 +02:00
Matthew Honnibal	839de87ca9	Make lambda func a named function, for pickling	2017-10-17 18:21:20 +02:00
Matthew Honnibal	9baa8fe7ec	Convert closure to functools.partial, to promote pickling	2017-10-17 18:20:52 +02:00
Matthew Honnibal	32a8564c79	Fix doc pickling	2017-10-17 18:20:24 +02:00
Matthew Honnibal	8ca97f32a3	Fix doc pickling test	2017-10-17 18:19:57 +02:00
Matthew Honnibal	9ce7d6af87	Make lex attr functions top-level functions, to promote pickling	2017-10-17 18:19:18 +02:00
Matthew Honnibal	1cc85a89ef	Allow reasonably efficient pickling of Language class, using to_bytes() and from_bytes().	2017-10-17 18:18:49 +02:00
Matthew Honnibal	0d57b9748a	Serialize lex_attr_getters with dill, for better pickle support	2017-10-17 18:17:45 +02:00
Matthew Honnibal	45d1dd90b1	Add tests for pickling doc	2017-10-17 17:20:58 +02:00
Ines Montani	afa67de7ee	Merge pull request #1428 from roanuz/develop Fix trailing whitespace and Language.from_disk overwrites	2017-10-17 16:29:15 +02:00
Matthew Honnibal	92c1eb2d6f	Fix Doc pickling. This also removes need for Binder class	2017-10-17 16:11:13 +02:00
Matthew Honnibal	ed8da9b11f	Add missing return statement in SentenceSegmenter	2017-10-17 15:32:56 +02:00
Ines Montani	aab299c8ae	Merge pull request #1429 from vishnunekkanti/develop fix syntax error in zh	2017-10-17 14:45:02 +02:00
Anto Binish Kaspar	534240648e	Fix trailing whitespace on morphology features	2017-10-17 17:15:58 +05:30
Anto Binish Kaspar	8f5b60c168	Fix Language.from_disk overwrites the meta.json file.	2017-10-17 17:15:32 +05:30
ines	8ca344712d	Add Language.has_pipe method	2017-10-17 11:20:07 +02:00
ines	485c4f6df5	Add Hungarian examples (see #1107)	2017-10-17 02:37:45 +02:00
Matthew Honnibal	19531bad4c	Merge branch 'develop' into feature/streaming-data-memory-growth	2017-10-16 21:44:11 +02:00
Matthew Honnibal	df488274b1	Fix deserialization of vectors	2017-10-16 20:55:00 +02:00
Matthew Honnibal	4018486d31	Merge remote-tracking branch 'origin/develop' into feature/streaming-data-memory-growth	2017-10-16 20:49:48 +02:00
Matthew Honnibal	4174477161	Fix equality check in test	2017-10-16 19:50:35 +02:00
Matthew Honnibal	2bc06e4b22	Bump rolling buffer size to 10k	2017-10-16 19:38:29 +02:00
Matthew Honnibal	66e2eb8f39	Clean up remnant of frozen in StringStore	2017-10-16 19:34:41 +02:00
Matthew Honnibal	a002264fec	Remove caching of Token in Doc, as caused cycle.	2017-10-16 19:34:21 +02:00
Matthew Honnibal	3e037054c8	Remove obsolete is_frozen functionality from StringStore	2017-10-16 19:23:10 +02:00
Matthew Honnibal	5c14f3f033	Create a rolling buffer for the StringStore in Language.pipe()	2017-10-16 19:22:40 +02:00
Matthew Honnibal	59c216196c	Allow weakrefs on Doc objects	2017-10-16 19:22:11 +02:00
ines	d5418553eb	Fix whitespace	2017-10-16 18:30:04 +02:00
ines	6ceadcdb5c	Make sure from_disk passes string to numpy (see #1421) If path is a WindowsPath, numpy does not recognise it as a path and as a result, doesn't open the file. https://github.com/numpy/numpy/blob/master/numpy/lib/npyio.py#L369	2017-10-16 18:29:56 +02:00
Matthew Honnibal	010a7309ff	Merge pull request #1402 from explosion/feature/fix-matcher-operators 💫 Fix Matcher variable-length operators	2017-10-16 17:53:19 +02:00
Matthew Honnibal	c29927d2e7	Fix matcher test	2017-10-16 17:22:18 +02:00
Vishnu Kumar Nekkanti	d3c54cf39a	fixed SyntaxError while checking for jieba	2017-10-16 18:51:33 +05:30
Matthew Honnibal	a928ae2f35	Merge branch 'develop' into feature/fix-matcher-operators	2017-10-16 13:38:36 +02:00
Matthew Honnibal	56aa42cc5d	Fix and document matcher operator 'shadowing' behaviour	2017-10-16 13:38:20 +02:00
Matthew Honnibal	748d525801	Add more matcher operator tests	2017-10-16 13:38:01 +02:00
Matthew Honnibal	0433181658	Document operator semantics in Matcher docstring	2017-10-16 12:06:33 +02:00
ines	266e7180a7	Add Language class, stop words and basic stemmer that sets NORM	2017-10-14 14:59:52 +02:00
ines	e85e1d571b	Update base punctuation	2017-10-14 14:59:23 +02:00
ines	9d6c8eaa49	Update base norm exceptions with more unicode characters e.g. unicode variations of punctuation used in Chinese	2017-10-14 14:58:52 +02:00
ines	3516aa0cea	Port over changes from #1389	2017-10-14 13:32:55 +02:00
ines	cd6a29dce7	Port over changes from #1294	2017-10-14 13:28:46 +02:00
ines	38c756fd85	Port over changes from #1287	2017-10-14 13:16:21 +02:00
ines	612224c10d	Port over changes from #1157	2017-10-14 13:11:39 +02:00
ines	9b3f8f9ec3	Fix formatting and add comment on languages	2017-10-14 13:11:18 +02:00
ines	a4d974d97b	Port over URL pattern changes from #1411	2017-10-14 12:58:07 +02:00
ines	09aed58140	Port over changes from #1333 and add comments	2017-10-14 12:52:59 +02:00
Matthew Honnibal	cf6da9301a	Update lemmatizer test	2017-10-12 22:50:52 +02:00
Matthew Honnibal	9b90d235d1	Fix tag check in lemmatizer	2017-10-12 22:50:43 +02:00
Matthew Honnibal	dc01acd821	Escape encoding in validate function	2017-10-12 22:23:21 +02:00
Matthew Honnibal	27b927259a	Add locale_escape compat function	2017-10-12 22:22:04 +02:00
ines	9c6de3dcfa	Merge branch 'develop' into feature/cli-validate	2017-10-12 21:44:28 +02:00
Matthew Honnibal	462caf835a	Fix SBD test	2017-10-12 21:18:22 +02:00
ines	fff1028391	Add validate CLI command	2017-10-12 20:05:06 +02:00
Matthew Honnibal	908f44c3fe	Disable history features by default	2017-10-12 14:56:11 +02:00
Matthew Honnibal	a955843684	Increase default number of epochs	2017-10-12 13:13:01 +02:00
Matthew Honnibal	cecfcc7711	Set default hyper params back to 'slow' settings	2017-10-12 13:12:26 +02:00
Ines Montani	37aa523a8e	Merge pull request #1408 from explosion/feature/dot-underscore 💫 Custom attributes via Doc._, Token._ and Span._	2017-10-11 18:35:56 +02:00
ines	8ce6f96180	Don't make copies of language data components	2017-10-11 15:34:55 +02:00
ines	51519251c2	Fix underscore method test	2017-10-11 13:34:19 +02:00
ines	c6ae49e8bf	Fix formatting	2017-10-11 13:34:11 +02:00
ines	453c47ca24	Add German lemmatizer tests	2017-10-11 13:27:26 +02:00
ines	15fe0fd82d	Fix tests	2017-10-11 13:27:18 +02:00
ines	6dd14dc342	Add lookup lemmas to tokens without POS tags	2017-10-11 13:27:10 +02:00
ines	9620c1a640	Add lemma_lookup to Language defaults	2017-10-11 13:26:05 +02:00
ines	9fd471372a	Add lookup lemmatizer to lemmatizer as lookup() method	2017-10-11 13:25:51 +02:00
ines	e0ff145a8b	Merge branch 'develop' into feature/dot-underscore	2017-10-11 11:57:05 +02:00
ines	c1d6d43c83	Merge branch 'develop' into feature/lemmatizer	2017-10-11 11:56:35 +02:00
Matthew Honnibal	17c467e0ab	Avoid clobbering existing lemmas	2017-10-11 03:33:06 -05:00
Matthew Honnibal	807e109f2b	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-11 02:47:59 -05:00
Matthew Honnibal	6e552c9d83	Prune number of non-projective labels more aggressively	2017-10-11 02:46:44 -05:00
Matthew Honnibal	76fe24f44d	Improve embedding defaults	2017-10-11 09:44:17 +02:00
Matthew Honnibal	188f620046	Improve parser defaults	2017-10-11 09:43:48 +02:00
Matthew Honnibal	acba2e1051	Fix metadata in training	2017-10-11 08:55:52 +02:00
Matthew Honnibal	74c2c6a58c	Add default name and lang to meta	2017-10-11 08:49:12 +02:00
Matthew Honnibal	3814a161e6	Avoid clobbering preset lemmas	2017-10-11 08:41:03 +02:00
Matthew Honnibal	fd47f8e89f	Fix failing test	2017-10-11 08:38:34 +02:00
Matthew Honnibal	462b2e26b4	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-11 08:23:04 +02:00
Matthew Honnibal	a6ac4699eb	Allow Morphology class to setup tokens Add Morphology.assign_untagged() C-method, and call it from Doc.push_back() when a token is created. This gives a place to allow the Morphology class to initialize token data.	2017-10-11 03:24:14 +02:00
Matthew Honnibal	3b527fa52b	Call morphology.assign_untagged when pushing token to Doc	2017-10-11 03:23:57 +02:00
Matthew Honnibal	c15d8278cb	Avoid lemmatizing inappropriate tags in English lemmatizer	2017-10-11 03:23:23 +02:00
Matthew Honnibal	d528b6e36d	Add assign_untagged method in Morphology	2017-10-11 03:22:49 +02:00
Matthew Honnibal	2c118ab3a6	Add tests for Doc creation	2017-10-11 03:21:23 +02:00
ines	820bf85075	Move LookupLemmatizer to spacy.lemmatizer	2017-10-11 02:25:13 +02:00
ines	417d45f5d0	Add lemmatizer data as variable on language data Don't create lookup lemmatizer within Language class and just pass in the data so it can be set on Token creation	2017-10-11 02:24:58 +02:00
ines	0c2343d73a	Tidy up language data	2017-10-11 02:22:49 +02:00
Matthew Honnibal	d84136b4a9	Update add label test	2017-10-10 22:57:41 +02:00
Matthew Honnibal	3065f12ef2	Make add parser label work for hidden_depth=0	2017-10-10 22:57:31 +02:00
ines	bfd58dd0fc	Merge branch 'develop' into feature/dot-underscore	2017-10-10 22:03:51 +02:00
Matthew Honnibal	73bca3d382	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-10 12:51:37 -05:00
Matthew Honnibal	5156074df1	Make loading code more consistent in train command	2017-10-10 12:51:20 -05:00
Matthew Honnibal	d70fba6807	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-10 19:33:10 +02:00
Matthew Honnibal	8143618497	Set prefix length back to 1	2017-10-10 19:32:54 +02:00
Matthew Honnibal	97c9b5db8b	Patch spacy.train for new pipeline management	2017-10-09 23:41:16 -05:00
Matthew Honnibal	a635240398	Add conll_ner2json converter	2017-10-09 22:03:26 -05:00
Matthew Honnibal	e0a9b02b67	Merge Span._ and Span.as_doc methods	2017-10-09 22:00:15 -05:00
Matthew Honnibal	dce8afb9cf	Set prefix length to 3	2017-10-09 21:55:55 -05:00
Matthew Honnibal	8265b90c83	Update parser defaults	2017-10-09 21:55:20 -05:00
Matthew Honnibal	dd2b0601d1	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-09 21:30:46 -05:00
Matthew Honnibal	09d61ada5e	Merge pull request #1396 from explosion/feature/pipeline-management 💫 Improve pipeline and factory management	2017-10-10 04:29:54 +02:00
ines	67350fa496	Use better logic for auto-generating component name Instances don't have __name__, so we try __class__.__name__ as well, before giving up and defaulting to repr(component).	2017-10-10 04:23:05 +02:00
ines	3fc4fe61d2	Fix typo	2017-10-10 04:15:14 +02:00
ines	59c4f27499	Add get, set and has methods to Underscore	2017-10-10 04:14:35 +02:00
Matthew Honnibal	19136fd155	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-10 03:58:30 +02:00
Matthew Honnibal	8978212ee5	Patch serialization bug raised in #1105	2017-10-10 03:58:12 +02:00
Matthew Honnibal	f0f2739ae3	Add test for serialization issue raised in #1105	2017-10-10 03:57:58 +02:00
Matthew Honnibal	735d18654d	Add NER converter for CoNLL 2003 data	2017-10-09 20:06:28 -05:00
Matthew Honnibal	51d18937af	Partially apply doc/span/token into method We want methods to act like they're "bound" to the object, so that you can make your method conditional on the `doc`, `span` or `token` instance --- like, well, a method. We therefore partially apply the function, which works like this: ``` def partial(unbound_method, constant_arg): def bound_method(*args, **kwargs): return unbound_method(constant_arg, *args, **kwargs) return bound_method ```	2017-10-10 02:21:28 +02:00
Matthew Honnibal	808d8740d6	Remove print statement	2017-10-09 08:45:20 -05:00
Matthew Honnibal	0f41b25f60	Add speed benchmarks to metadata	2017-10-09 08:05:37 -05:00
ines	de374dc72a	Merge branch 'feature/pipeline-management' into feature/dot-underscore	2017-10-09 14:37:51 +02:00
Matthew Honnibal	2534cd57d7	Add bandaid solution to the 'shadowing' problem in #864	2017-10-09 08:59:35 +02:00
Matthew Honnibal	d8a2506023	Merge pull request #1401 from explosion/feature/add-parser-action 💫 Allow labels to be added to pre-trained parser and NER modes	2017-10-09 04:57:51 +02:00
Matthew Honnibal	689349e32f	Merge pull request #1400 from explosion/feature/sentence-parsing 💫 Force parser to respect preset sentence boundaries	2017-10-09 04:31:43 +02:00
Matthew Honnibal	e79fc41ff8	Merge pull request #1391 from explosion/feature/multilabel-textcat 💫 Fix multi-label support for text classification	2017-10-09 04:22:31 +02:00
Matthew Honnibal	fad2b8315f	Merge branch 'develop' into feature/add-parser-action	2017-10-09 04:13:04 +02:00
Matthew Honnibal	6c79841c0d	Fix tests for history features	2017-10-09 04:12:24 +02:00
Matthew Honnibal	dde87e6b0d	Add tests for adding parser actions	2017-10-09 03:42:35 +02:00
Matthew Honnibal	b2b8506f2c	Remove whitespace	2017-10-09 03:35:57 +02:00
Matthew Honnibal	d43a83e37a	Allow parser.add_label for pretrained models	2017-10-09 03:35:40 +02:00
Matthew Honnibal	81a64119db	Fix string-to-unicode problem	2017-10-09 00:59:49 +02:00
Matthew Honnibal	02c2af7119	Fix test	2017-10-09 00:29:37 +02:00
Matthew Honnibal	4cc84b0234	Prohibit Break when sent_start < 0	2017-10-09 00:02:45 +02:00
Matthew Honnibal	5a67efeccc	Add tests for sentence segmentation presetting	2017-10-09 00:02:23 +02:00
Matthew Honnibal	e938bce320	Adjust parsing transition system to allow preset sentence segments.	2017-10-08 23:53:34 +02:00
Matthew Honnibal	080afd4924	Add ternary value setting to Token.sent_start	2017-10-08 23:51:58 +02:00
Matthew Honnibal	7ae67ec6a1	Add Span.as_doc method	2017-10-08 23:50:20 +02:00
Matthew Honnibal	20309fb9db	Make history features default to zero	2017-10-08 20:32:14 +02:00
Matthew Honnibal	e74c8d2fad	Merge remote-tracking branch 'origin/develop' into feature/sentence-parsing	2017-10-08 20:20:41 +02:00
Matthew Honnibal	18063803de	Make TokenC.sent_start an int, to allow ternary value	2017-10-08 19:58:54 +02:00
Matthew Honnibal	be4f0b6460	Update defaults	2017-10-08 02:08:12 -05:00
Matthew Honnibal	42b401d08b	Change default hidden depth to 1	2017-10-07 21:05:21 -05:00
Matthew Honnibal	9d66a915da	Update training defaults	2017-10-07 21:02:38 -05:00
Matthew Honnibal	d163115e91	Add non-linearity after history features	2017-10-07 21:00:43 -05:00
Matthew Honnibal	92c5d78b42	Unhack NER.add_action	2017-10-07 19:02:40 +02:00
Matthew Honnibal	f2b590f672	Increment version	2017-10-07 19:01:01 +02:00
Matthew Honnibal	9bd8191739	Add tests for Underscore	2017-10-07 18:56:19 +02:00
Matthew Honnibal	668a0ea640	Pass extensions into Underscore class	2017-10-07 18:56:01 +02:00
Matthew Honnibal	1289129fd9	Add Underscore class	2017-10-07 18:00:14 +02:00
Matthew Honnibal	eb0595bea9	Merge pull request #1392 from explosion/feature/parser-history-model 💫 Parser history features	2017-10-07 15:07:02 +02:00
Matthew Honnibal	3d22ccf495	Update default hyper-parameters	2017-10-07 07:16:41 -05:00
Matthew Honnibal	09442d25ec	Merge remote-tracking branch 'origin/develop' into feature/parser-history-model	2017-10-07 07:05:04 -05:00
Matthew Honnibal	3b67eabfea	Allow empty dictionaries to match any token in Matcher Often patterns need to match "any token". A clean way to denote this is with the empty dict {}: this sets no constraints on the token, so should always match. The problem was that having attributes length==0 was used as an end-of-array signal, so the matcher didn't handle this case correctly. This patch compiles empty token spec dicts into a constraint NULL_ATTR==0. The NULL_ATTR attribute, 0, is always set to 0 on the lexeme -- so this always matches.	2017-10-07 03:36:15 +02:00
ines	0adadcb3f0	Fix beam parse model test	2017-10-07 02:15:15 +02:00
ines	b38a8f4a94	Fix and update pipe methods tests	2017-10-07 02:06:23 +02:00
Matthew Honnibal	0384f08218	Trigger nonproj.deprojectivize as a postprocess	2017-10-07 02:00:47 +02:00
Matthew Honnibal	3a65a0c970	Start adding tests for new pipeline management	2017-10-07 01:48:23 +02:00
ines	e43530269c	Update docstrings	2017-10-07 01:04:50 +02:00
ines	61a503a611	Fix parser test	2017-10-07 00:38:51 +02:00
ines	b39409173e	Add disable option and True/False/None values for pipeline	2017-10-07 00:29:08 +02:00
ines	2586b61b15	Fix formatting, tidy up and remove unused imports	2017-10-07 00:26:05 +02:00
ines	212c8f0711	Implement new Language methods and pipeline API	2017-10-07 00:25:54 +02:00
Matthew Honnibal	8be46d766e	Remove print statement	2017-10-06 16:19:02 -05:00
Matthew Honnibal	8e731009fe	Fix parser config serialization	2017-10-06 13:50:52 -05:00
Matthew Honnibal	f4c9a98166	Fix spacy evaluate command on non-GPU	2017-10-06 13:17:47 -05:00
Matthew Honnibal	16ba6aa8a6	Fix parser config serialization	2017-10-06 13:17:31 -05:00
Matthew Honnibal	c66399d8ae	Fix depth definition with history features	2017-10-06 06:20:05 -05:00
Matthew Honnibal	5c750a9c2f	Reserve 0 for 'missing' in history features	2017-10-06 06:10:13 -05:00
Matthew Honnibal	fbba7c517e	Pass dropout through to embed tables	2017-10-06 06:09:18 -05:00
Matthew Honnibal	21d11936fe	Fix significant train/test skew error in history feats	2017-10-06 06:08:50 -05:00
Matthew Honnibal	555d8c8bff	Fix beam history features	2017-10-05 22:21:50 -05:00
Matthew Honnibal	3db0a32fd6	Fix dropout for history features	2017-10-05 22:21:30 -05:00
Matthew Honnibal	b0618def8d	Add support for 2-token state option	2017-10-05 21:54:12 -05:00
Matthew Honnibal	363aa47b40	Clean up dead parsing code	2017-10-05 21:53:49 -05:00
Matthew Honnibal	ca12764772	Enable history features for beam parser	2017-10-05 21:53:29 -05:00
Matthew Honnibal	fc06b0a333	Fix training when hist_size==0	2017-10-05 21:52:28 -05:00
Matthew Honnibal	e25ffcb11f	Move history size under feature flags	2017-10-05 19:38:13 -05:00
Matthew Honnibal	563f46f026	Fix multi-label support for text classification The TextCategorizer class is supposed to support multi-label text classification, and allow training data to contain missing values. For this to work, the gradient of the loss should be 0 when labels are missing. Instead, there was no way to actually denote "missing" in the GoldParse class, and so the TextCategorizer class treated the label set within gold.cats as complete. To fix this, we change GoldParse.cats to be a dict instead of a list. The GoldParse.cats dict should map to floats, with 1.0 denoting 'present' and 0.0 denoting 'absent'. Gradients are zeroed for categories absent from the gold.cats dict. A nice bonus is that you can also set values between 0 and 1 for partial membership. You can also set other numeric values if you're using a text classification model that uses an appropriate loss function. Unfortunately this is a breaking change, although the functionality was only recently introduced and hasn't been properly documented yet. I've updated the example script accordingly.	2017-10-05 18:43:02 -05:00
Matthew Honnibal	c6cd81f192	Wrap try/except around model saving	2017-10-05 08:14:24 -05:00
Matthew Honnibal	5743b06e36	Wrap model saving in try/except	2017-10-05 08:12:50 -05:00
Matthew Honnibal	fd4baff475	Update tests	2017-10-05 08:12:27 -05:00
Matthew Honnibal	dcdfa071aa	Disable LayerNorm hack	2017-10-04 20:06:52 -05:00
Matthew Honnibal	943af4423a	Make depth setting in parser work again	2017-10-04 20:06:05 -05:00
Matthew Honnibal	bfabc333be	Merge remote-tracking branch 'origin/develop' into feature/parser-history-model	2017-10-04 20:00:36 -05:00
Matthew Honnibal	92066b04d6	Fix Embed and HistoryFeatures	2017-10-04 19:55:34 -05:00
Matthew Honnibal	d903986439	Increment version	2017-10-04 17:14:26 +02:00
Matthew Honnibal	40edb65ee7	Make test work for Python 2.7	2017-10-04 16:36:50 +02:00
Matthew Honnibal	bd8e84998a	Add nO attribute to TextCategorizer model	2017-10-04 16:07:30 +02:00
Matthew Honnibal	f8a0614527	Improve textcat model slightly	2017-10-04 15:15:53 +02:00
Matthew Honnibal	39798b0172	Uncomment layernorm adjustment hack	2017-10-04 15:12:09 +02:00
Matthew Honnibal	b3a7082bf8	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-04 14:56:46 +02:00
Matthew Honnibal	db05d4d582	Add test for #1380 . Passes without fix?	2017-10-04 14:56:31 +02:00
Matthew Honnibal	774f5732bd	Fix dimensionality of textcat when no vectors available	2017-10-04 14:55:15 +02:00
Ines Montani	28ba0b9b51	Merge pull request #1385 from explosion/feature/new-website 💫 New spaCy website	2017-10-04 14:35:52 +02:00
Matthew Honnibal	af75b74208	Unset LayerNorm backwards compat hack	2017-10-03 20:47:10 -05:00
ines	73ac0aa0b5	Update spacy evaluate and add displaCy option	2017-10-04 00:03:15 +02:00
Matthew Honnibal	246612cb53	Merge remote-tracking branch 'origin/develop' into feature/parser-history-model	2017-10-03 16:56:42 -05:00
Matthew Honnibal	f24c2e3a8a	Fix evaluate for non-GPU	2017-10-03 22:47:31 +02:00
Matthew Honnibal	5cbefcba17	Set backwards compatibility flag	2017-10-03 20:29:58 +02:00
Matthew Honnibal	5454b20cd7	Update thinc imports for 6.9	2017-10-03 20:07:17 +02:00
Matthew Honnibal	4a59f6358c	Fix thinc imports	2017-10-03 19:21:26 +02:00
Matthew Honnibal	e514d6aa0a	Import thinc modules more explicitly, to avoid cycles	2017-10-03 18:49:25 +02:00
Matthew Honnibal	338e1fda0e	Unbreak merge artefact	2017-10-03 09:41:05 -05:00
Matthew Honnibal	1289187279	Fix circular import	2017-10-03 09:33:21 -05:00
Matthew Honnibal	a44c4c3a5b	Add timer to evaluate	2017-10-03 09:15:35 -05:00
Matthew Honnibal	96da86b3e5	Add support for verbose flag to Language	2017-10-03 09:14:57 -05:00
Matthew Honnibal	02586a5243	Add timing to spacy evaluate command	2017-10-03 09:14:34 -05:00
ines	e49cd7aeaf	Move import into load to avoid circular imports	2017-10-03 15:22:19 +02:00
ines	b0dfa059db	Update docs link in about.py	2017-10-03 15:19:55 +02:00
Matthew Honnibal	dc3c791947	Fix history size option	2017-10-03 13:41:23 +02:00
Matthew Honnibal	278a4c17c6	Fix history features	2017-10-03 13:27:10 +02:00
Matthew Honnibal	b770f4e108	Fix embed class in history features	2017-10-03 13:26:55 +02:00
Matthew Honnibal	b50a359e11	Add support for history features in parsing models	2017-10-03 12:44:01 +02:00
Matthew Honnibal	ee41e4fea7	Support history features in stateclass	2017-10-03 12:43:48 +02:00
Matthew Honnibal	6aa6a5bc25	Add a layer type for history features	2017-10-03 12:43:09 +02:00
Matthew Honnibal	8902df44de	Fix component disabling during training	2017-10-02 21:07:23 +02:00
Matthew Honnibal	c617d288d8	Update pipeline component names in spaCy train	2017-10-02 17:20:19 +02:00
Matthew Honnibal	f942903429	Improve sentence merging in iob2json	2017-10-02 17:02:10 +02:00
Matthew Honnibal	31681d20e0	Fix concatenation in iob2json converter	2017-10-02 16:50:26 +02:00
Matthew Honnibal	4896ce3320	Remove misleading comment	2017-10-02 00:09:14 +02:00
Matthew Honnibal	d90cc917fa	Merge vectors.pyx doc strings	2017-10-01 17:05:54 -05:00
Matthew Honnibal	b2a8b9be77	Fix inconsistency of Vectors class API	2017-10-01 17:00:34 -05:00
Matthew Honnibal	e38089d598	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-01 22:10:54 +02:00
Matthew Honnibal	97c409b602	Add docstrings for spacy.vectors	2017-10-01 22:10:33 +02:00
ines	b776f48e58	Fix typo	2017-10-01 21:58:45 +02:00
Matthew Honnibal	94df115a81	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-01 14:06:23 -05:00
Matthew Honnibal	2cf0f4622f	Fix loading of models with pre-trained vectors	2017-10-01 14:05:32 -05:00
Matthew Honnibal	69c7c642c2	Add spacy evaluate	2017-10-01 14:05:04 -05:00
ines	8dbe49ecb8	Always compare lowercase package names Otherwise, is_package will return False if model name contains uppercase characters. See this issue: https://support.prodi.gy/t/saving-a-trained-ner-model-as-a-loadable-module/46/6	2017-09-29 20:55:17 +02:00
ines	153c2589d4	Revert "Always compare lowercase package names" This reverts commit `7d77dc490f`.	2017-09-29 20:53:36 +02:00
ines	fd1a9225d8	Handle conversion of pipeline components correctly Allow both comma and comma + whitespace as separators	2017-09-29 20:52:56 +02:00
ines	7d77dc490f	Always compare lowercase package names Otherwise, is_package will return False if model name contains uppercase characters. See this issue: https://support.prodi.gy/t/saving-a-trained-ner-model-as-a-loadable-module/46/6	2017-09-29 20:52:28 +02:00
Matthew Honnibal	cdb2d83e16	Pass dropout in parser	2017-09-28 18:47:13 -05:00
Matthew Honnibal	158e177cae	Fix default embed size	2017-09-28 08:25:23 -05:00
Matthew Honnibal	f6330d69e6	Default embed size to 7000	2017-09-28 08:07:41 -05:00
Matthew Honnibal	ac8481a7b0	Print NER loss	2017-09-28 08:05:31 -05:00
Matthew Honnibal	542ebfa498	Improve defaults	2017-09-27 18:54:37 -05:00
Matthew Honnibal	dcb86bdc43	Default batch size to 32	2017-09-27 11:48:19 -05:00
Matthew Honnibal	1a37a2c0a0	Update training defaults	2017-09-27 11:48:07 -05:00
Matthew Honnibal	13d7a97f3a	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-09-27 11:44:37 -05:00
Matthew Honnibal	66c388ee01	Remove unhelpful multitask objectives	2017-09-27 11:44:16 -05:00
Matthew Honnibal	983201a83a	Fix hard-coded vector width	2017-09-27 11:43:58 -05:00
Ines Montani	959c46eabe	Merge pull request #1365 from wannaphongcom/develop Add Thai language for spaCy v2	2017-09-26 23:43:05 +02:00
Matthew Honnibal	1ef4236f8e	Merge pull request #1343 from explosion/feature/phrasematcher Update PhraseMatcher for spaCy 2	2017-09-26 20:44:23 +02:00
Wannaphong Phatthiyaphaibun	7b5263ffa4	fix thai test	2017-09-26 23:54:15 +07:00
ines	1ff62eaee7	Fix option shortcut to avoid conflict	2017-09-26 17:59:34 +02:00
Wannaphong Phatthiyaphaibun	3d5046c499	fix import in th	2017-09-26 22:41:20 +07:00
ines	7fdfb78141	Add version option to cli.train	2017-09-26 17:34:52 +02:00
Wannaphong Phatthiyaphaibun	a63f790b8c	fix thai tag_map	2017-09-26 22:28:57 +07:00
Wannaphong Phatthiyaphaibun	2ea27d07f4	fix tokenizer_exceptions in thai	2017-09-26 22:14:47 +07:00
Matthew Honnibal	41cc5c4c17	Merge branch 'develop' into feature/phrasematcher	2017-09-26 09:59:17 -05:00
Matthew Honnibal	c2e2f81773	Merge pull request #1355 from explosion/feature/noshare Make pipeline components independent	2017-09-26 16:58:09 +02:00
Wannaphong Phatthiyaphaibun	a2bf4cc7bf	fix newline in file	2017-09-26 21:49:43 +07:00
ines	bb5c631402	Implement like_num getter for French (via #1161 )	2017-09-26 16:47:45 +02:00
ines	15479b3bae	Add comment to like_num re: future work	2017-09-26 16:43:28 +02:00
ines	adda08fe14	Implement like_num getter for Dutch (via #1177 )	2017-09-26 16:39:15 +02:00
ines	5ee10379db	Port over changes from #1340	2017-09-26 16:38:08 +02:00
Wannaphong Phatthiyaphaibun	5cba67146c	add thai in spacy2	2017-09-26 21:36:27 +07:00
ines	10d291f129	Port over change from #1351	2017-09-26 16:11:41 +02:00
Matthew Honnibal	3274b46a0d	Try to fix compile error on Windows	2017-09-26 09:05:53 -05:00
Matthew Honnibal	19c7c09bf7	Fix PhraseMatcher.__contains__	2017-09-26 08:35:53 -05:00
Matthew Honnibal	d02a41a8c9	Merge remote-tracking branch 'origin/develop' into feature/phrasematcher	2017-09-26 08:32:55 -05:00
Matthew Honnibal	698fc0d016	Remove merge artefact	2017-09-26 08:31:37 -05:00
Matthew Honnibal	defb68e94f	Update feature/noshare with recent develop changes	2017-09-26 08:15:14 -05:00
Matthew Honnibal	ca28590ddd	Use dep and ent multi-task objectives for parser	2017-09-26 08:13:52 -05:00
Matthew Honnibal	9bfd585a11	Fix parameter name in .pxd file	2017-09-26 07:28:50 -05:00
Matthew Honnibal	74f08e1ad5	Update test	2017-09-26 06:45:56 -05:00
Matthew Honnibal	5aaef3e7b8	Don't link vectors in vocab deserialize	2017-09-26 06:45:47 -05:00
Matthew Honnibal	18a27c7579	Fix typo in tensorizer serialization	2017-09-26 06:45:14 -05:00
Matthew Honnibal	5056743ad5	Fix parser serialization	2017-09-26 06:44:56 -05:00
Ines Montani	7123139b2b	Add __contains__ to PhraseMatcher	2017-09-26 13:13:27 +02:00
Ines Montani	50ad50f96a	Update matcher.pyx	2017-09-26 13:11:17 +02:00
Matthew Honnibal	e34e70673f	Allow tagger models to be built with pre-defined tok2vec layer	2017-09-26 05:51:52 -05:00
Matthew Honnibal	bf917225ab	Allow multi-task objectives during training	2017-09-26 05:42:52 -05:00
Matthew Honnibal	4ae9ea7684	Remove unused argument in Language	2017-09-26 05:41:35 -05:00
ines	edf7e4881d	Add meta.json option to cli.train and add relevant properties Add accuracy scores to meta.json instead of accuracy.json and replace all relevant properties like lang, pipeline, spacy_version in existing meta.json. If not present, also add name and version placeholders to make it packagable.	2017-09-25 19:00:47 +02:00
ines	d2d35b63b7	Fix formatting	2017-09-25 18:37:13 +02:00
Matthew Honnibal	8eb0b7b779	Add docstrings for Pipe API	2017-09-25 16:22:07 +02:00
Matthew Honnibal	39f390dba7	Add docstrings for Pipe API	2017-09-25 16:20:49 +02:00
Matthew Honnibal	8716ffe57d	Serialize vocab last	2017-09-24 05:01:45 -05:00
Matthew Honnibal	72bbcc0871	Handle lemmatization for unknown string IDs	2017-09-24 05:01:31 -05:00
Matthew Honnibal	204b58c864	Fix evaluation during training	2017-09-24 05:01:03 -05:00
Matthew Honnibal	dc3a623d00	Remove unused update_shared argument	2017-09-24 05:00:37 -05:00
Matthew Honnibal	63bd87508d	Don't use iterated convolutions	2017-09-23 04:39:17 -05:00
Matthew Honnibal	5a7fd0fd36	Fix vector linkage	2017-09-22 20:11:52 -05:00
Matthew Honnibal	4348c479fc	Merge pre-trained vectors and noshare patches	2017-09-22 20:07:28 -05:00
Matthew Honnibal	7dc61b3f43	Whitespace	2017-09-22 20:00:50 -05:00
Matthew Honnibal	e93d43a43a	Fix training with preset vectors	2017-09-22 20:00:40 -05:00
Matthew Honnibal	0795857dcb	Fix beam parsing	2017-09-23 02:59:53 +02:00
Matthew Honnibal	4bd6a12b1f	Fix Tok2Vec	2017-09-23 02:58:54 +02:00
Matthew Honnibal	386c1a5bd8	Fix tagger training	2017-09-23 02:58:06 +02:00
Matthew Honnibal	a2357cce3f	Set random seed in train script	2017-09-23 02:57:31 +02:00
Matthew Honnibal	05596159bf	Fix serialization when pre-trained vectors	2017-09-22 15:33:27 -05:00
Matthew Honnibal	980fb6e854	Refactor Tok2Vec	2017-09-22 09:38:36 -05:00
Matthew Honnibal	d9124f1aa3	Add link_vectors_to_models function	2017-09-22 09:38:22 -05:00
Matthew Honnibal	a186596307	Add 'reapply' combinator, for iterated CNN	2017-09-22 09:37:03 -05:00
Matthew Honnibal	40a4873b70	Fix serialization of model options	2017-09-21 13:07:26 -05:00
Matthew Honnibal	0a9016cade	Fix serialization during training	2017-09-21 13:06:45 -05:00
Matthew Honnibal	20193371f5	Don't share CNN, to reduce complexities	2017-09-21 14:59:48 +02:00
Matthew Honnibal	1d73dec8b1	Refactor train script	2017-09-20 19:17:10 -05:00
Matthew Honnibal	ffda38356a	Add util function to enable GPU	2017-09-20 19:16:35 -05:00
Matthew Honnibal	24e85c2048	Pass values for CNN maxout pieces option	2017-09-20 19:16:12 -05:00
Matthew Honnibal	b832f89ff8	Add resume_training function	2017-09-20 19:15:20 -05:00
Matthew Honnibal	f5144f04be	Add argument for CNN maxout pieces	2017-09-20 19:14:41 -05:00
Matthew Honnibal	842e21de9f	Fix int type error for Python 2	2017-09-20 23:55:30 +02:00
Matthew Honnibal	0c93c73e49	Add __reduce__ method for PhraseMatcher	2017-09-20 22:26:40 +02:00
Matthew Honnibal	cc408fc189	Make PhraseMatcher API like Matcher API	2017-09-20 22:20:35 +02:00
Matthew Honnibal	43ad250dd5	Update matcher tests	2017-09-20 21:54:49 +02:00
Matthew Honnibal	828cc91545	Fix PhraseMatcher for spaCy 2	2017-09-20 21:54:31 +02:00
Matthew Honnibal	78301b2d29	Avoid comparison to None in Tok2Vec	2017-09-20 00:19:34 +02:00
Matthew Honnibal	b36a38f63d	Fix serialization of pretrained_dims property	2017-09-19 23:42:27 +02:00
Matthew Honnibal	2489dcaccf	Fix serialization of parser	2017-09-19 23:42:12 +02:00
Matthew Honnibal	40837b275d	Fix tensorizer with pretrained vectors	2017-09-18 18:05:38 -05:00
Matthew Honnibal	a0c4b33d03	Support resuming a model during spacy train	2017-09-18 18:04:47 -05:00
Matthew Honnibal	c858927271	Copy vectors to GPU on begin training	2017-09-18 18:04:16 -05:00
Matthew Honnibal	3fa76c17d1	Refactor Tok2Vec	2017-09-18 15:00:05 -05:00
Matthew Honnibal	217e7891cd	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-09-18 11:36:21 -05:00
Matthew Honnibal	7b3f391f80	Try dropping the Affine layer, conditionally	2017-09-18 11:35:59 -05:00
ines	2480f8f521	Add missing return in Doc.from_disk() (closes #1330 )	2017-09-18 15:32:00 +02:00
Matthew Honnibal	2148ae605b	Don't use iterated convolutions	2017-09-17 17:36:04 -05:00
Matthew Honnibal	c013e5996f	Fix parser test	2017-09-17 13:13:20 -05:00
Matthew Honnibal	8f42f8d305	Remove unused 'preprocess' argument in Tok2Vec	2017-09-17 12:30:16 -05:00
Matthew Honnibal	039d609362	Remove hard-coded default vectors width	2017-09-17 12:29:39 -05:00
Matthew Honnibal	4f38a67a89	Make width default to 0 in vectors.pyx	2017-09-17 12:29:14 -05:00
Matthew Honnibal	16122f566e	Fix cpdef enum in attrs.pyx	2017-09-17 12:28:53 -05:00
Matthew Honnibal	b159e0eb50	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-09-17 05:47:50 -05:00
Matthew Honnibal	2b0efc77ae	Fix wiring of pre-trained vectors in parser loading	2017-09-17 05:47:34 -05:00
Matthew Honnibal	31c2e91c35	Fix wiring of pre-trained vectors in parser loading	2017-09-17 05:46:55 -05:00
Matthew Honnibal	8f913a74ca	Fix defaults and args to build_tagger_model	2017-09-17 05:46:36 -05:00
Matthew Honnibal	c003c561c3	Revert NER action loading change, for model compatibility	2017-09-17 05:46:03 -05:00
Matthew Honnibal	43210abacc	Resolve fine-tuning conflict	2017-09-17 05:30:04 -05:00
ines	ece30c28a8	Don't split hyphenated words in German This way, the tokenizer matches the tokenization in German treebanks	2017-09-16 20:40:15 +02:00
ines	68f66aebf8	Use pkg_resources instead of pip for is_package (resolves #1293 )	2017-09-16 20:27:59 +02:00
Matthew Honnibal	5ff2491f24	Pass option for pre-trained vectors in parser	2017-09-16 12:47:21 -05:00
Matthew Honnibal	8665a77f48	Fix feature error in NER	2017-09-16 12:46:57 -05:00
Matthew Honnibal	e37a50a436	Pass documents to tensorizer, not 'features'	2017-09-16 12:46:36 -05:00
Matthew Honnibal	84e637e2e6	Pass option for pretrained vectors in pipeline	2017-09-16 12:46:02 -05:00
Matthew Honnibal	2a93404da6	Support optional pre-trained vectors in tensorizer model	2017-09-16 12:45:37 -05:00
Matthew Honnibal	e0a2aa9289	Support having word vectors data on GPU	2017-09-16 12:45:09 -05:00
Matthew Honnibal	ebf8942564	Fix test for Python3	2017-09-16 16:22:38 +02:00
Matthew Honnibal	8c945310fb	Excuse emoji failure on narrow unicode builds	2017-09-16 16:21:13 +02:00
Matthew Honnibal	11f2a05ede	Fix code explosion from long enum in Python 3, Cython 0.24+	2017-09-16 12:20:04 +02:00
Matthew Honnibal	3fa5b40b5c	Add test for hash consistency	2017-09-16 11:21:35 +02:00
Matthew Honnibal	f730d07e4e	Fix prange error for Windows	2017-09-16 00:25:33 +02:00
Matthew Honnibal	4b2065430e	Merge branch 'feature/parser-history' into develop	2017-09-15 10:42:20 +02:00
Matthew Honnibal	2f08489694	Remove AddHistory layer -- didn't work as planned	2017-09-15 10:41:40 +02:00
Matthew Honnibal	8b481e0465	Remove redundant brackets	2017-09-15 10:38:08 +02:00
Matthew Honnibal	d84607f6bb	Vectorize update in AddHistory	2017-09-14 20:34:40 +02:00
Ines Montani	bd3da3d6fb	Port over change from #1323 and tidy up	2017-09-14 19:23:13 +02:00
Matthew Honnibal	18347ab69c	Implement AddHistory layer wrapper	2017-09-14 19:07:35 +02:00
Matthew Honnibal	d4ca6cef9e	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-09-14 17:00:07 +02:00
Matthew Honnibal	8c503487af	Fix lookup of missing NER actions	2017-09-14 16:59:45 +02:00
Matthew Honnibal	664c5af745	Revert padding in parser	2017-09-14 16:59:25 +02:00
Matthew Honnibal	8496d76224	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-09-14 09:21:20 -05:00
Matthew Honnibal	d1518027a9	Increment version	2017-09-14 16:18:46 +02:00
Matthew Honnibal	70da88a3a7	Update comment on Language.begin_training	2017-09-14 16:18:30 +02:00
Matthew Honnibal	c6395b057a	Improve parser feature extraction, for missing values	2017-09-14 16:18:02 +02:00
Matthew Honnibal	daf869ab3b	Fix add_action for NER, so labelled 'O' actions aren't added	2017-09-14 16:16:41 +02:00
Matthew Honnibal	9cb2aef587	Remove print statement	2017-09-14 13:38:28 +02:00
Matthew Honnibal	ba23d63c35	Fix minibatch function, for fixed batch size	2017-09-14 13:37:41 +02:00
Jim O'Regan	7de709483b	missed adding here	2017-09-11 10:51:21 +01:00
Jim O'Regan	b1b6123867	add ga_tokenizer	2017-09-11 10:31:41 +01:00
Jim O'Regan	9dfd301962	rearrange	2017-09-11 10:14:18 +01:00
Jim O'Regan	187be6d372	copy/paste error	2017-09-11 09:33:17 +01:00
Jim O'Regan	c283e9edfe	first stab at test	2017-09-11 08:57:48 +01:00
Jim O'Regan	1ee75ae337	Merge remote-tracking branch 'origin/develop' into develop-irish	2017-09-11 08:40:11 +01:00
Matthew Honnibal	456bb8a74c	Unxfail and close #1305	2017-09-06 19:14:17 +02:00
Matthew Honnibal	99e44fbdbb	Update regression test	2017-09-06 19:13:51 +02:00
Matthew Honnibal	5c3ff06924	Fix lemmatizer rules	2017-09-06 19:13:24 +02:00
Matthew Honnibal	dd9cab0faf	Fix type-check for int/long	2017-09-06 19:03:05 +02:00
Matthew Honnibal	497a9308a8	Xfail new lemmatizer test	2017-09-06 18:41:22 +02:00
Matthew Honnibal	dcbf866970	Merge parser changes	2017-09-06 18:41:05 +02:00
Matthew Honnibal	5384fff5ce	Add test for 1305: Incorrect lemmatization of VBZ for English	2017-09-06 18:40:18 +02:00
Matthew Honnibal	24ff6b0ad9	Fix parsing and tok2vec models	2017-09-06 05:50:58 -05:00
Matthew Honnibal	1b65115bc2	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-09-04 20:02:53 -05:00
Matthew Honnibal	33fa91feb7	Restore correctness of parser model	2017-09-04 21:19:30 +02:00
Matthew Honnibal	e88a42e460	Increment version	2017-09-04 21:14:39 +02:00
Matthew Honnibal	9d65d67985	Preserve model compatibility in parser, for now	2017-09-04 16:46:22 +02:00
Matthew Honnibal	d5fbf27335	Fix test	2017-09-04 16:45:11 +02:00
Matthew Honnibal	7fdafcc4c4	Fix config loading in tagger	2017-09-04 16:38:49 +02:00
Matthew Honnibal	058372d120	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-09-04 16:27:53 +02:00
Matthew Honnibal	16e25ce3b5	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-09-04 09:26:53 -05:00
Matthew Honnibal	9f512e657a	Fix drop_layer calculation	2017-09-04 09:26:38 -05:00
Matthew Honnibal	cb4839033c	Fix loader for EN tests	2017-09-04 15:19:18 +02:00
Matthew Honnibal	382ce566eb	Fix deserialization bug	2017-09-04 15:19:01 +02:00
Matthew Honnibal	bfddf50081	Fix #1296: Incorrect lemmatization of base form verbs	2017-09-04 15:18:41 +02:00
Matthew Honnibal	b29e6bff46	Improve lemmatization rule for am|VBP	2017-09-04 15:18:10 +02:00
Matthew Honnibal	644d6c9e1a	Improve lemmatization tests, re #1296	2017-09-04 15:17:44 +02:00
Matthew Honnibal	3cf3fa1704	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-09-02 12:46:11 -05:00
Matthew Honnibal	e920885676	Fix pickle during train	2017-09-02 12:46:01 -05:00
Matthew Honnibal	c0eaba8b28	Fix low-data textcat	2017-09-02 15:17:32 +02:00
Matthew Honnibal	9e378bdac5	Fix textcat serialization	2017-09-02 15:17:20 +02:00
Matthew Honnibal	e3ea6ee02b	Increment version	2017-09-02 15:17:01 +02:00
Matthew Honnibal	a3b69bcb3d	Add low_data mode in textcat	2017-09-02 14:56:30 +02:00
Matthew Honnibal	ead78c7b9b	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-09-02 12:55:25 +02:00
Matthew Honnibal	5e6a9e7dcc	Add rule-based SBD	2017-09-02 12:53:38 +02:00
Matthew Honnibal	a824cf8f9a	Adjust text classification model	2017-09-02 11:41:00 +02:00
Matthew Honnibal	ac040b99bb	Add support for pre-trained vectors in text classifier	2017-09-01 16:39:55 +02:00
Matthew Honnibal	7742a6d559	Add GloVe vectors reader	2017-09-01 16:39:22 +02:00
Matthew Honnibal	789e1a3980	Use 13 parser features, not 8	2017-08-31 14:13:00 -05:00
Matthew Honnibal	30e35d9666	Fix syntax error	2017-08-30 17:35:39 -05:00
Matthew Honnibal	4ceebde523	Fix gradient bug in parser	2017-08-30 17:32:56 -05:00
ines	173089a45a	Add more validation for model meta	2017-08-29 11:21:46 +02:00
Matthew Honnibal	2e28982e28	Merge pull request #1288 from geovedi/indonesian Indonesian language support	2017-08-26 21:31:13 +02:00
ines	7e04b7f89c	Fix info text on pipeline in package cli	2017-08-26 18:30:59 +02:00
ines	40afa13a8a	Increment version	2017-08-26 18:30:49 +02:00
Matthew Honnibal	876f38c548	Merge pull request #1279 from oroszgy/model_cli_v2 Added vector loading to model cli	2017-08-26 15:57:50 +02:00
Matthew Honnibal	cfc055734e	Split % in units, for compatibility with corpus	2017-08-25 20:03:37 -05:00
Matthew Honnibal	4bb6bc3f9e	Add support for sent_start to GoldParse	2017-08-25 20:03:14 -05:00
Matthew Honnibal	44589fb38c	Fix Break oracle	2017-08-25 19:50:55 -05:00
Matthew Honnibal	6d4e8e14ca	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-08-25 12:37:16 -05:00
Matthew Honnibal	4ce5531389	Use layer norm instead of batch norm	2017-08-25 12:37:10 -05:00
Matthew Honnibal	20dd66ddc2	Constrain sentence boundaries to IS_PUNCT and IS_SPACE tokens	2017-08-25 19:35:47 +02:00
Jim Geovedi	58d8078971	Merge remote-tracking branch 'upstream/develop' into indonesian	2017-08-25 09:21:49 +08:00
Matthew Honnibal	6ceb0f0518	Allow Lexeme.rank to be set	2017-08-24 21:43:00 +02:00
Matthew Honnibal	44a1fa80d3	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-08-23 13:02:16 +02:00
ines	bb1abbeba5	Only link model if download was successful	2017-08-23 12:36:31 +02:00
Matthew Honnibal	bb2541ffd3	Fix PROB attr for OOV words	2017-08-23 12:11:52 +02:00
Matthew Honnibal	1c5c256e58	Fix fine_tune when optimizer is None	2017-08-23 10:51:33 +02:00
Matthew Honnibal	9c580ad28a	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-08-22 17:02:04 -05:00
Matthew Honnibal	a4633fff6f	Restore use of batch norm in model	2017-08-22 17:01:58 -05:00
Matthew Honnibal	03b5b9727a	Fix Doc.vector for empty doc objects	2017-08-22 19:52:19 +02:00
Matthew Honnibal	0551b7b03a	Fix doc.vector	2017-08-22 19:46:52 +02:00
Matthew Honnibal	83f8e98450	Fix retrieval of OOV vectors	2017-08-22 19:46:35 +02:00
Matthew Honnibal	df2745eb08	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-08-22 19:00:43 +02:00
Matthew Honnibal	5b329acbf2	Fix vectors_length property in vocab	2017-08-22 19:00:27 +02:00
Matthew Honnibal	1fe605dfe5	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-08-21 19:18:31 -05:00
Matthew Honnibal	18b64e79ec	Fix fine tuning	2017-08-21 19:18:26 -05:00
Matthew Honnibal	682346dd66	Restore optimized hidden_depth=0 for parser	2017-08-21 19:18:04 -05:00
Matthew Honnibal	a21d8f3f0b	Add predict paths to _ml models	2017-08-21 23:23:45 +02:00
Matthew Honnibal	cec76801dc	Add profile command to CLI	2017-08-21 23:23:05 +02:00
Matthew Honnibal	7be5f30f17	Add profile function	2017-08-21 23:22:49 +02:00
ines	a68dc891ea	Port over changes from #1281	2017-08-21 23:19:18 +02:00
Matthew Honnibal	5e50a65252	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-08-21 14:15:46 -05:00
Matthew Honnibal	80acbc5f1f	Fix fine-tune weight mixture	2017-08-21 14:15:29 -05:00
ines	d15775c3ad	Fix typos and commands in alpha docs	2017-08-21 13:40:11 +02:00
Gyorgy Orosz	b3576bfc86	Added vector loading to model cli	2017-08-20 23:16:12 +02:00
Matthew Honnibal	c10f63bf10	Initialize fine tuning to 0.5	2017-08-20 15:59:48 -05:00
Matthew Honnibal	62878e50db	Fix misalignment caused by filtering inputs at wrong point in parser	2017-08-20 15:59:28 -05:00
Matthew Honnibal	78a5f842e9	Fix update when update_shared=False	2017-08-20 15:58:34 -05:00
Matthew Honnibal	7a6edeea68	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-08-20 12:55:39 -05:00
Matthew Honnibal	f2f9229964	Fix name of update_shared flag	2017-08-20 18:19:06 +02:00
Matthew Honnibal	8a59718fd6	Fix fine-tuning	2017-08-20 18:17:35 +02:00
Matthew Honnibal	80a5146ec2	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-08-20 11:07:08 -05:00
Matthew Honnibal	84bb543e4d	Add gold_preproc flag to cli/train	2017-08-20 11:07:00 -05:00
Matthew Honnibal	3fe0d76e6d	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-08-20 14:50:01 +02:00
Matthew Honnibal	c1d3ff517a	Track loss in tagger	2017-08-20 14:42:23 +02:00
Matthew Honnibal	8875590081	Add optimizer in Language.update if sgd=None	2017-08-20 14:42:07 +02:00
Matthew Honnibal	84b7ed49e4	Ensure updates aren't made if no gold available	2017-08-20 14:41:38 +02:00
Ines Montani	c2bbd393af	Merge pull request #1276 from oroszgy/model_cli_v2 Ported model cli from v1	2017-08-20 11:52:59 +02:00
Jim Geovedi	f77443ab68	reworked	2017-08-20 13:43:21 +07:00
Jim Geovedi	fbc62a09c7	added {pre,suf,in}fix tests	2017-08-20 13:43:00 +07:00
Jim Geovedi	713d7c0aa0	added indonesian lang test	2017-08-20 12:17:14 +07:00

5706 Commits