spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-12-29 19:36:31 +03:00

Author	SHA1	Message	Date
DuyguA	26ee0590a3	added some commonly used cases	2018-03-08 12:43:58 +01:00
DuyguA	ae6473e4d5	removed some words with negation particle.	2018-03-08 12:20:32 +01:00
DuyguA	6ed59a2198	removed number words to be carried to the lexical	2018-03-08 12:19:23 +01:00
DuyguA	04784a44a6	made alphabetical order for Turkish characters	2018-03-08 12:11:32 +01:00
DuyguA	af33e022a5	added example sentences for Turkish	2018-03-08 12:06:03 +01:00
Matthew Honnibal	a1be01185c	Fix array out of bounds error in Span	2018-02-28 12:27:09 +01:00
Thomas Opsomer	8df9e52829	lemma property to return hash instead of unicode	2018-02-27 19:50:01 +01:00
Ines Montani	35634352fe	Merge pull request #2025 from dejanmarich/patch-1 Update stop_words.py for Croatian language	2018-02-26 18:22:32 +01:00
Matthew Honnibal	14f729c72a	Add subtok label to parser	2018-02-26 12:26:35 +01:00
Matthew Honnibal	7137ad8b0b	Make label filtering clearer for projectivisation	2018-02-26 12:02:01 +01:00
Matthew Honnibal	b8d52cb285	Fix inconsistent label freq cutoff for projectivisation	2018-02-26 12:01:44 +01:00
Matthew Honnibal	7b66ec896a	Revert "Revert "Improve parser oracle around sentence breaks."" This reverts commit `36e481c584`.	2018-02-26 10:57:37 +01:00
Matthew Honnibal	36e481c584	Revert "Improve parser oracle around sentence breaks." This reverts commit `50817dc9ad`.	2018-02-26 10:53:55 +01:00
Matthew Honnibal	5faae803c6	Add option to not use Janome for Japanese tokenization	2018-02-26 09:39:46 +01:00
Matthew Honnibal	9b406181cd	Add Chinese.Defaults.use_jieba setting, for UD	2018-02-25 15:12:38 +01:00
Matthew Honnibal	9ccd0c643b	Add Vietnamese	2018-02-25 15:00:46 +01:00
Matthew Honnibal	d4fdb97c87	Fix alignment for words with spaces	2018-02-25 14:55:00 +01:00
Matthew Honnibal	6d2c1ef52c	Fix SP tag in generic tag map	2018-02-24 16:04:56 +01:00
Matthew Honnibal	5cc3bd1c1d	Update alignment tests	2018-02-24 16:03:58 +01:00
Matthew Honnibal	6138439469	Fix many-to-one alignment	2018-02-24 16:03:50 +01:00
Matthew Honnibal	4890ee1732	Fix scoring of tokenization for punct	2018-02-24 10:32:32 +01:00
Matthew Honnibal	12b39f87da	Move cython declarations in matcher.pyx	2018-02-24 10:32:18 +01:00
Matthew Honnibal	01d1b7abdf	Support many-to-one alignment in GoldParse	2018-02-24 10:17:01 +01:00
Matthew Honnibal	7865746574	Support many-to-one alignment	2018-02-24 02:09:53 +01:00
Matthew Honnibal	458710b831	Poke matcher test for appveyor	2018-02-23 23:53:48 +01:00
Matthew Honnibal	968dabdde4	Fix bug in multi-task objective	2018-02-23 23:48:09 +01:00
Matthew Honnibal	2c9c8b8d72	Try commenting out emoji test in matcher	2018-02-23 23:34:35 +01:00
Matthew Honnibal	980ad68cbe	Try to find test that fails on appveyor	2018-02-23 21:27:53 +01:00
Matthew Honnibal	39de8cd4d3	Try to find test failing on appveyor	2018-02-23 20:59:21 +01:00
Matthew Honnibal	4492a33a9d	Fix sent_start multi-task objective when alignment fails	2018-02-23 16:50:59 +01:00
Matthew Honnibal	5fa44e93f1	Set unicode_literals in matcher	2018-02-23 16:48:54 +01:00
Matthew Honnibal	12264f9296	Add multi-task objective for sentence segmentation	2018-02-23 16:25:57 +01:00
Matthew Honnibal	e7deadb519	Set version to 2.1.0.dev1	2018-02-23 16:22:24 +01:00
Matthew Honnibal	7b575a119e	Try to reduce memory usage of test_matcher	2018-02-23 15:34:37 +01:00
Matthew Honnibal	24563f4026	Fix data typing in align	2018-02-23 15:08:06 +01:00
Matthew Honnibal	7a5ba20692	Fix integer typing in _align	2018-02-23 14:51:24 +01:00
Matthew Honnibal	875411b875	Set unicode types in _align.pyx and test	2018-02-23 14:35:38 +01:00
Matthew Honnibal	51d9679aa3	Fix broken span.as_doc test	2018-02-23 14:22:24 +01:00
dejanmarich	71c261d58b	Update stop_words.py Added more words	2018-02-23 10:31:01 +01:00
Matthew Honnibal	3e6c1111b7	Remove obsolete test	2018-02-23 03:22:07 +01:00
Matthew Honnibal	a4fdec524a	Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-gold	2018-02-22 21:44:28 +01:00
Matthew Honnibal	50817dc9ad	Improve parser oracle around sentence breaks.	2018-02-22 19:22:26 +01:00
Matthew Honnibal	307aefe131	Increment version to v2.0.9	2018-02-22 17:07:53 +01:00
Feng Niu	1c60384bed	return on empty doc	2018-02-21 15:39:04 -08:00
Feng Niu	7eb1cd100b	unbound doc var	2018-02-21 15:05:37 -08:00
Feng Niu	8df75b229c	fix unbound vars in es.syntax_iterators	2018-02-21 13:11:17 -08:00
alldefector	4244e285c2	Fix Spanish noun_chunks failure caused by typo	2018-02-21 12:43:21 -08:00
Matthew Honnibal	661873ee4c	Randomize the rebatch size in parser	2018-02-21 21:02:07 +01:00
Matthew Honnibal	0872cf611d	Don't lower-case lemmas of proper nouns	2018-02-21 16:01:16 +01:00
Matthew Honnibal	a0ddb803fd	Make error when no label found more helpful	2018-02-21 16:00:59 +01:00
Matthew Honnibal	ea2fc5d45f	Improve length and freq cutoffs in parser	2018-02-21 16:00:38 +01:00
Matthew Honnibal	e5757d4bf0	Add labels property to parser	2018-02-21 16:00:00 +01:00
Matthew Honnibal	eff4ae809a	Fix nonproj label filter	2018-02-21 15:59:04 +01:00
Matthew Honnibal	e624405cda	Temporarily remove cutoff when filtering labels in nonproj	2018-02-21 13:53:40 +01:00
Matthew Honnibal	f466f0186e	Use new alignment implementation in GoldParse	2018-02-20 21:16:35 +01:00
Matthew Honnibal	c0734ba526	Make alignment work with strings	2018-02-20 17:51:49 +01:00
Matthew Honnibal	8180c84a98	Add tests for new Levenshtein alignment	2018-02-20 17:32:25 +01:00
Matthew Honnibal	930c980570	Add improved Levenshtein alignment implementation	2018-02-20 17:31:56 +01:00
Ines Montani	14e7e0f12a	Merge pull request #2000 from jimregan/polish-tag-map Polish tag map	2018-02-18 19:05:58 +01:00
Jim O'Regan	664407de5d	missing PrepCase attribute	2018-02-18 14:46:12 +00:00
Jim O'Regan	95f0673fbc	fix typo/missing here too	2018-02-18 14:38:27 +00:00
Matthew Honnibal	2bccad8815	Fix incorrect matcher test	2018-02-18 14:56:12 +01:00
Matthew Honnibal	530172d57a	Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-faster-matcher	2018-02-18 14:40:42 +01:00
Matthew Honnibal	cf0e320f2b	Add doc.is_sentenced attribute, re #1959	2018-02-18 14:16:55 +01:00
Matthew Honnibal	1e5aeb4eec	Merge pull request #1987 from thomasopsomer/span-sent Make span.sent work when only manual / custom sbd	2018-02-18 14:05:37 +01:00
Matthew Honnibal	1cf774bdc1	Add output options return_matches and as_tuples to Matcher	2018-02-18 14:00:45 +01:00
Matthew Honnibal	dd9b0945af	Fix inconsistencies in the symbols table	2018-02-18 13:51:31 +01:00
Matthew Honnibal	66496ac8e1	Set version to v2.1.0.dev0	2018-02-18 13:48:39 +01:00
Matthew Honnibal	eb3040ce46	Merge pull request #1891 from fucking-signup/master Fix issue #1889	2018-02-18 13:47:47 +01:00
Matthew Honnibal	3d7285870b	Update matcher branch with v2.0.8 master	2018-02-18 13:42:58 +01:00
ines	6bba1db4cc	Drop six and related hacks as a dependency	2018-02-18 13:29:56 +01:00
Matthew Honnibal	b30b09192a	Merge pull request #1665 from jimregan/animacy typo in "inan", add "nhum"	2018-02-18 13:26:53 +01:00
Matthew Honnibal	1b3c98e01b	Set version to v2.0.8	2018-02-18 12:16:31 +01:00
Matthew Honnibal	f9f46e5a07	Revert matcher fixes from GregDubbin	2018-02-18 10:59:28 +01:00
Matthew Honnibal	86405e4ad1	Fix CLI for multitask objectives	2018-02-18 10:59:11 +01:00
Matthew Honnibal	a34749b2bf	Add multitask objectives options to train CLI	2018-02-17 22:03:54 +01:00
Matthew Honnibal	8f06903e09	Fix multitask objectives	2018-02-17 18:41:36 +01:00
Matthew Honnibal	d1246c95fb	Fix model loading when using multitask objectives	2018-02-17 18:11:36 +01:00
Matthew Honnibal	262d0a3148	Fix overwriting of lexical attributes when loading vectors during training	2018-02-17 18:11:11 +01:00
Matthew Honnibal	c0caf7cf27	Fix LANG symbol	2018-02-17 18:10:50 +01:00
Matthew Honnibal	0bf2f6be29	Add missing symbol for LANG attr. Fixes inconsistent numeric ID	2018-02-17 17:37:02 +01:00
Matthew Honnibal	97a228a4ce	Increment to v2.0.8.dev0	2018-02-17 16:54:36 +01:00
Matthew Honnibal	f7dc64d2a3	Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-faster-matcher	2018-02-17 16:47:35 +01:00
Aaron Marquez	ea571e8325	Merge branch 'master' into issue-1959	2018-02-16 15:14:09 -08:00
Matthew Honnibal	7d5c720fc3	Fix multitask objective when no pipeline provided	2018-02-15 23:50:21 +01:00
Aaron Marquez	f0d3672e17	Changed loading EN model	2018-02-15 14:28:38 -08:00
Aaron Marquez	3765d84d57	Fix issue #1959	2018-02-15 12:51:49 -08:00
Aaron Marquez	7ba4111554	Add test for issue-1959	2018-02-15 12:46:22 -08:00
Matthew Honnibal	59b7cf9db8	Add get_beam_parse method in ArcEager, for Prodigy	2018-02-15 21:03:16 +01:00
Matthew Honnibal	3e541de440	Merge branch 'master' of https://github.com/explosion/spaCy	2018-02-15 21:02:55 +01:00
Thomas Opsomer	5d24a81c0b	add test for span.sent when doc not parsed	2018-02-15 16:59:16 +01:00
Thomas Opsomer	deab391cbf	correct check on sent_start & raise if no boundaries	2018-02-15 16:58:30 +01:00
Matthew Honnibal	afbd46adfb	Remove length cap in PhraseMatcher	2018-02-15 16:10:54 +01:00
Matthew Honnibal	4533c7408d	Update matcher tests	2018-02-15 15:39:47 +01:00
Matthew Honnibal	1c19605426	Move matcher2.pyx to matcher.pyx	2018-02-15 15:27:03 +01:00
Matthew Honnibal	9ebf2fe7c3	Make helper function to get longest matches	2018-02-15 15:26:15 +01:00
Matthew Honnibal	4cb861e080	Merge pull request #1968 from DuyguA/is_currency New lexical feature is_currency	2018-02-15 12:13:36 +01:00
Thomas Opsomer	b902731313	Find span sentence when only sentence boundaries (no parser)	2018-02-14 22:18:54 +01:00
Matthew Honnibal	d19dc67886	Make get_action nogil, for efficiency	2018-02-14 12:16:36 +01:00
Matthew Honnibal	7885b92b45	Refactor matcher2, hopefully making it faster	2018-02-14 12:11:17 +01:00
Matthew Honnibal	00261eea27	Make tests refer to matcher2	2018-02-14 12:10:51 +01:00
Claudiu-Vlad Ursache	e28de12cbd	Ensure files opened in `from_disk` are closed Fixes [issue 1706](https://github.com/explosion/spaCy/issues/1706).	2018-02-13 20:49:43 +01:00
Matthew Honnibal	262cbe356e	Remove caching, as doesn't seem to help for now.	2018-02-13 17:15:20 +01:00
Matthew Honnibal	f43d53f2c5	Remove print statement	2018-02-13 17:15:07 +01:00
Matthew Honnibal	dcd8d89aef	Update test for 850, making it work with matcher2	2018-02-13 16:35:20 +01:00
Matthew Honnibal	9bdfa5cd4f	Remove re comparisons tests, as matcher behaves differently	2018-02-13 16:28:52 +01:00
Matthew Honnibal	6d7986b0f1	Fix matcher test	2018-02-13 16:28:06 +01:00
Matthew Honnibal	9efda9e9ab	Add PhraseMatcher in matcher2.pyx	2018-02-13 16:27:46 +01:00
Johannes Dollinger	012e874d09	Add contributor agreement for emulbreh	2018-02-13 13:40:33 +01:00
Johannes Dollinger	bf94c13382	Don't fix random seeds on import	2018-02-13 12:42:23 +01:00
Matthew Honnibal	0004331895	Update notes on matcher2	2018-02-13 11:45:45 +01:00
Matthew Honnibal	b4cc39eb74	Fix zero-width quantifiers. Passes test_matcher	2018-02-13 11:45:32 +01:00
Matthew Honnibal	1b01685f47	Fix ZERO_PLUS operator	2018-02-12 12:28:03 +01:00
Matthew Honnibal	9115c3ba0a	Add TODO in notes	2018-02-12 12:06:48 +01:00
Matthew Honnibal	b00326a7fe	Move pattern_id out of TokenPattern	2018-02-12 12:05:54 +01:00
Matthew Honnibal	d34c732635	Add Python notes for rethinking matcher	2018-02-12 10:19:29 +01:00
Matthew Honnibal	d7c9b53120	Pass kwargs into pipeline components during begin_training	2018-02-12 10:18:39 +01:00
Matthew Honnibal	fae5c0dc18	Work on matcher2	2018-02-12 10:17:43 +01:00
4altinok	ca8728035d	added new lex feat to token	2018-02-11 18:55:48 +01:00
4altinok	edd7202a06	added new symbol	2018-02-11 18:55:32 +01:00
4altinok	ed1ac2969e	added new lexical feat to lexeme	2018-02-11 18:51:48 +01:00
4altinok	94fb0b75e3	code for is_currency	2018-02-11 18:51:32 +01:00
4altinok	3deef1497a	removed 18 and replaced 18 with is_currency	2018-02-11 18:51:09 +01:00
4altinok	471d3c9e23	added lex test for is_currency	2018-02-11 18:50:50 +01:00
ines	c63e99da8a	Fix typo in glossary (resolves #1964 ) Co-Authored-By: SThomasP <sthomasp@users.noreply.github.com>	2018-02-10 11:58:41 +01:00
Lyndon White	6ee5dff51c	Make python 3.4 compat module loading (fix #1733 )	2018-02-09 23:03:35 +08:00
Matthew Honnibal	e361b4f82b	Fix #1929 : Incorrect NER when pre-set sentence boundaries.	2018-02-08 15:25:41 +01:00
Matthew Honnibal	fd9fd275c5	Make test for #1945 more precise	2018-02-07 02:06:11 +01:00
Matthew Honnibal	c087a14380	Merge branch 'master' of https://github.com/explosion/spaCy	2018-02-07 01:29:39 +01:00
Matthew Honnibal	76d89b2180	Add test for #1945 : PhraseMatcher regression	2018-02-07 01:29:23 +01:00
Ines Montani	0954e15dda	Merge pull request #1913 from ohenrik/nb_syntax_iterator Norwegian Language (nb) - Added french syntax iterator with explanation	2018-02-06 04:59:07 +01:00
Ole Henrik Skogstrøm	251a7805fe	Copied French syntax iterator to simplify future changes	2018-02-05 14:45:05 +01:00
Matthew Honnibal	2e7391e627	Merge pull request #1916 from tokestermw/bug/fix-not-passing-in-model-cfg-in-nlp Bug/fix not passing in model cfg in nlp	2018-02-05 01:19:40 +01:00
Ali Zarezade	9df9da34a3	Fix init_model issue Fixing issue #1928	2018-02-03 17:21:34 +03:30
Matthew Honnibal	ebe84e45e5	Increment version to 2.0.7	2018-02-02 03:39:16 +01:00
Matthew Honnibal	e4b1f57599	Increment version	2018-02-02 02:33:23 +01:00
Matthew Honnibal	069531c351	Merge branch 'master' of https://github.com/explosion/spaCy	2018-02-02 02:32:58 +01:00
Matthew Honnibal	f74a802d09	Test and fix #1919 : Error resuming training	2018-02-02 02:32:40 +01:00
ines	f1d3deffac	Add Russian example sentences (see #1107 )	2018-02-01 20:09:40 +01:00
Matthew Honnibal	6b1126c312	Merge branch 'master' of https://github.com/explosion/spaCy	2018-02-01 02:57:52 +01:00
ines	3c1fb9d02d	Make validate command fail more gracefully if version not found Mostly relevant during development when working with .dev versions	2018-01-31 22:06:28 +01:00
Motoki Wu	54062b7326	added tests for issue #1915	2018-01-30 18:30:19 -08:00
Motoki Wu	f4a7d1a423	make sure to pass in **cfg to each component when training	2018-01-30 18:29:54 -08:00
ines	4046823699	Only check component in factories if string (see #1911 )	2018-01-30 16:29:07 +01:00
ines	ce10d320c4	Fix component check in self.factories (see #1911 )	2018-01-30 16:09:37 +01:00
Ole Henrik Skogstrøm	e40465487c	Added french syntax iterator with explanation	2018-01-30 15:44:29 +01:00
ines	8901814248	Improve error handling if pipeline component is not callable (resolves #1911 ) Also add help message if user accidentally calls nlp.add_pipe() with a string of a built-in component name.	2018-01-30 15:43:03 +01:00
Matthew Honnibal	a437ba87a3	Set release=True	2018-01-29 21:26:04 +01:00
Adam Binford	9238749aaf	Removed test to avoid network requests	2018-01-29 14:48:20 -05:00
Adam Binford	1a2c2f7d7f	Fixed auto linking after download and added simple test to check	2018-01-29 14:25:21 -05:00
Matthew Honnibal	cb7110c22e	Merge pull request #1882 from ohenrik/nb_lemma_and_tag_map Add norwegian bokmål ('nb') lemmatizer and tag_map	2018-01-29 18:18:50 +01:00
Matthew Honnibal	0c1e7f0c86	Merge pull request #1893 from azarezade/master Add Persian language	2018-01-29 18:18:33 +01:00
Matthew Honnibal	cbdab75b36	Increment version	2018-01-28 23:46:22 +01:00
Matthew Honnibal	512e6adb08	Merge pull request #1896 from thomasopsomer/fix-sent Fix sentence boundaries serialization (issue #1834)	2018-01-28 21:18:51 +01:00
Matthew Honnibal	f5b1ad4100	Limit parser model size, to hopefully reduce memory during CI tests	2018-01-28 21:00:32 +01:00
Thomas Opsomer	515e25910e	fix sent_start in serialization	2018-01-28 19:50:42 +01:00
Thomas Opsomer	45d62561f7	add test for the issue	2018-01-28 19:49:56 +01:00
ines	6d978e5c35	Don't use deprecated Doc.merge call in displaCy As reported here: https://stackoverflow.com/a/48464412/6400719	2018-01-27 11:25:05 +01:00
Ali Zarezade	bb6bd3d8ae	add persian language	2018-01-27 13:27:26 +03:30
Ali Zarezade	d195675db5	add persian language	2018-01-27 13:21:38 +03:30
Kit	4b42267ba3	Fix issue #1889	2018-01-25 23:17:22 +01:00
Kit	52ef51f36e	Add test for issue #1889	2018-01-25 22:56:48 +01:00
Ole Henrik Skogstrøm	8e2c9f2475	Cleaned up nb tag_map comments	2018-01-25 11:09:28 +01:00
Ole Henrik Skogstrøm	1107e89fcf	Updated doc string on nb tag_map module	2018-01-25 11:08:28 +01:00
Matthew Honnibal	6a8cb905aa	Merge pull request #1876 from GregDubbin/master Pattern matcher fixes	2018-01-24 16:38:11 +01:00
Matthew Honnibal	38b260e0c3	Merge pull request #1879 from azarezade/master Add Persian character and symbols	2018-01-24 16:34:22 +01:00
Matthew Honnibal	edb71a280e	Add test for #1883 : Unpickling Matcher	2018-01-24 15:42:33 +01:00
Matthew Honnibal	2ad050e668	Fix unpickling of Matcher. Also store correct data in matcher._patterns	2018-01-24 15:42:11 +01:00
Ole Henrik Skogstrøm	4058a7d579	Fix æøå characters in lemmatizer	2018-01-24 14:03:14 +01:00
Ole Henrik Skogstrøm	42248f423f	Updated tag map	2018-01-24 13:50:33 +01:00
Ole Henrik Skogstrøm	74b430b49a	Correct Lemmatizer	2018-01-24 13:26:33 +01:00
Ole Henrik Skogstrøm	b9b3a40c78	Add norwegian lemmatizer and tag_map	2018-01-24 12:28:29 +01:00
Matthew Honnibal	42a18ef903	Add test for #1868 : Vocab.__contains__ with ints	2018-01-23 23:27:05 +01:00
Matthew Honnibal	43f381ce36	Make Vocab.__contains__ work with ints. Fixes #1868	2018-01-23 23:26:47 +01:00
greg	85ab99e692	Correct test examples	2018-01-23 15:00:14 -05:00
greg	f50bb1aafc	Restructure StateC to eliminate dependency on unordered_map	2018-01-23 14:40:03 -05:00
Matthew Honnibal	f3753c2453	Further model deserialization fixes re #1727	2018-01-23 19:16:05 +01:00
Matthew Honnibal	91e916cb67	Add comment to new test	2018-01-23 19:11:53 +01:00
Matthew Honnibal	fd187d71ad	Add test for #1727	2018-01-23 19:11:01 +01:00
Matthew Honnibal	85c942a6e3	Dont overwrite pretrained_dims setting from cfg. Fixes #1727	2018-01-23 19:10:49 +01:00
Ali Zarezade	42349471bc	add ٪ as punctuation	2018-01-23 18:11:33 +03:30
Ali Zarezade	2bda582135	Add Persian character and symbols Add Persian characters and the following: - ٪ used instead of % - ؟ used instead of ? - ﷼ used instead of $ - ، used instead of , - ؛ used instead of ;	2018-01-23 13:20:36 +03:30
Matthew Honnibal	7e6dc283db	Fix unicode import in test	2018-01-22 23:55:44 +01:00
greg	686735b94e	Fix matcher import	2018-01-22 16:53:05 -05:00
greg	3a491093ee	Import libcpp.map if libcpp.unordered_map doesn't exist	2018-01-22 16:46:25 -05:00
greg	d55992bdf0	Switch match dictionary to use final state pointer rather than ID	2018-01-22 15:36:47 -05:00
Matthew Honnibal	4ce7d24fd5	Add test for #1799 : Set left and right edges (and thus sentences) in non-projective parses.	2018-01-22 20:18:38 +01:00
Matthew Honnibal	56164ab688	Set l_edge and r_edge correctly for non-projective parses. Fixes #1799	2018-01-22 20:18:04 +01:00
Matthew Honnibal	964aa1b384	Merge branch 'master' of https://github.com/explosion/spaCy	2018-01-22 19:18:46 +01:00
Matthew Honnibal	29897ed1b3	Allow vector loading to work on 1d data files. Fixes #1831	2018-01-22 19:18:26 +01:00
greg	490bc82c27	Add comments clarifying matcher logic for '*'	2018-01-22 10:03:12 -05:00
Matthew Honnibal	fe4748fc38	Merge pull request #1870 from avadhpatel/master Model Load Performance Improvement by more than 5x	2018-01-22 00:05:15 +01:00
Avadh Patel	a517df55c8	Small fix Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-21 15:20:45 -06:00
Avadh Patel	5b5029890d	Merge branch 'perfTuning' into perfTuningMaster Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-21 15:20:00 -06:00
Matthew Honnibal	203d2ea830	Allow multitask objectives to be added to the parser and NER more easily	2018-01-21 19:37:02 +01:00
Matthew Honnibal	4a7d524efb	Merge branch 'master' of https://github.com/explosion/spaCy	2018-01-21 19:22:03 +01:00
Matthew Honnibal	61a051f2c0	Fix MultitaskObjective	2018-01-21 19:21:34 +01:00
Avadh Patel	75903949da	Updated model building after suggestion from Matthew Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-18 06:51:57 -06:00
Avadh Patel	fe879da2a1	Do not train model if it's going to be loaded from disk This saves significant time in loading a model from disk. Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-17 06:16:07 -06:00
Avadh Patel	2146faffee	Do not train model if it's going to be loaded from disk This saves significant time in loading a model from disk. Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-17 06:04:22 -06:00
greg	7072b395c9	Add greedy matcher tests	2018-01-16 15:46:13 -05:00
greg	441f490c1c	Merge branch 'master' of github.com:GregDubbin/spaCy	2018-01-16 13:31:10 -05:00
greg	8bea62f26e	Correct bugs for greedy matching and introduce ADVANCE_PLUS action	2018-01-16 13:21:43 -05:00
Matthew Honnibal	ccb51a9f36	Make .similarity() return 1.0 if all orth attrs match	2018-01-15 16:29:48 +01:00
Matthew Honnibal	82135d85b7	Fix test	2018-01-15 15:55:15 +01:00
Matthew Honnibal	4b09616b58	Add test for #1757 : Comparison against None	2018-01-15 15:55:01 +01:00
Matthew Honnibal	b904d81e9a	Fix rich comparison against None objects. Closes #1757	2018-01-15 15:51:25 +01:00
Matthew Honnibal	9e413449f6	Fix unicode error in new test	2018-01-15 15:39:00 +01:00
Matthew Honnibal	ab7c45b12d	Fix error message and handling of doc.sents	2018-01-15 15:21:11 +01:00
Matthew Honnibal	6b215d2dd3	Add test for Issue #1537	2018-01-15 15:20:56 +01:00
ines	5babb7d6f6	Merge branch 'master' of https://github.com/explosion/spaCy	2018-01-14 17:31:09 +01:00
ines	793890cb4d	Remove test for removed deprecation warning	2018-01-14 17:31:06 +01:00
Matthew Honnibal	465a6f6452	Add missing Span.vocab property. Closes #1633	2018-01-14 15:06:30 +01:00
Matthew Honnibal	0cb090e526	Fix infinite recursion in token.sent_start. Closes #1640	2018-01-14 15:02:15 +01:00
Matthew Honnibal	5cbe913b6f	Don't raise deprecation warning in property. Closes #1813 , #1712	2018-01-14 14:55:58 +01:00
Matthew Honnibal	1a1cca6052	Fix vectors.resize() on Py3. Closes #1539	2018-01-14 14:48:51 +01:00
Matthew Honnibal	0153220304	Make set_vector add word to vocab. Fixes #1807	2018-01-14 13:57:57 +01:00
Ines Montani	55754f0cee	Merge pull request #1836 from fucking-signup/master Add tests for issue #1769	2018-01-13 00:23:35 +00:00
Kit	4ee97f20a0	Mark like_num tests as slow	2018-01-13 00:44:15 +01:00
Kit	855531537e	Rewrite tests for issue #1769	2018-01-12 23:49:51 +01:00
Kit	5b541cb5ec	Simplify tests for issue #1769	2018-01-12 23:34:27 +01:00
Kit	7a2adc4633	Remove some tests to see build status changes	2018-01-12 22:49:16 +01:00
Kit	0e62809a43	Rewrite tests for issue #1769	2018-01-12 22:26:06 +01:00
Ines Montani	36f426fe0a	Merge pull request #1808 from fucking-signup/master Fix issue #1769	2018-01-12 21:12:02 +00:00
Kit	76f4eeca44	Remove tests to see build changes on Windows (Python 2.7)	2018-01-12 20:30:51 +01:00
Matthew Honnibal	7ca49c2061	Merge branch 'master' into feature-improve-model-download	2018-01-10 18:21:55 +01:00
Kit	7ec0956e8d	Add regression test (issue #1769 )	2018-01-08 03:42:04 +01:00
Kit	701e7cc6aa	Rename variable to keep code consistent	2018-01-08 03:38:44 +01:00
Kit	ed0db95183	Find lowercased forms of ordinal words, where possible	2018-01-08 03:28:50 +01:00
Kit	9bc524982e	Find lowercased forms of numeric words	2018-01-08 03:25:08 +01:00
Søren Lind Kristiansen	62de5da1ff	Remove unused dummy variable	2018-01-05 09:57:24 +01:00
Søren Lind Kristiansen	10dab8eef8	Remove dummy variable from function calls	2018-01-05 09:37:05 +01:00
Søren Lind Kristiansen	7f0ab145e9	Don't pass CLI command name as dummy argument	2018-01-04 21:33:47 +01:00
Ines Montani	6a008233b5	Merge pull request #1795 from textioHQ/issue1758 (resolves #1758 ) english tokenizer: handle "would've"	2018-01-04 02:43:39 +00:00
Kevin Humphreys	597df5bf83	add test	2018-01-03 13:00:05 -08:00
Kevin Humphreys	7918fa4ef9	handle would've	2018-01-03 12:25:48 -08:00
ines	2c656f90fb	Exit with 1 if incompatible models found (see #1714 )	2018-01-03 21:20:35 +01:00
ines	dacfaa2ca4	Ensure that download command exits properly (resolves #1714 )	2018-01-03 21:03:36 +01:00
Søren Lind Kristiansen	a9ff6eadc9	Prefix dummy argument names with underscore	2018-01-03 20:48:12 +01:00
ines	1081e08efb	Fix formatting	2018-01-03 20:14:50 +01:00
ines	d8109964d6	Use --no-deps on model install In general, it's nice for models to specify spaCy as a dependency. However, this tends to cause problems in conda environments, as pip will re-install spaCy and its dependencies (especially Thinc)	2018-01-03 17:40:37 +01:00
ines	319d754309	Fix overwriting of existing symlinks Check for is_symlink() to also overwrite invalid and outdated symlinks. Also show better error message if link path exists but is not symlink (i.e. file or directory).	2018-01-03 17:39:36 +01:00
ines	8ba0dfd017	Make message on failed linking more clear	2018-01-03 17:38:09 +01:00
Søren Lind Kristiansen	d6327e8495	Fix handling case when vectors not specified	2018-01-03 12:20:49 +01:00
Søren Lind Kristiansen	bcc51d7d8b	Fix shifted positional arguments	2018-01-03 12:19:47 +01:00
zqhZY	f27859fa99	add ChineseDefaults class for pickling	2017-12-28 17:13:58 +08:00
Ines Montani	ff9fc945ab	Merge pull request #1749 from sorenlind/da_ud_tokenization Tune Danish tokenizer to more closely match Universal Dependencies	2017-12-22 16:00:49 +00:00
ines	26f313dabc	Fix missing import	2017-12-22 16:21:44 +01:00
ines	8dc1c27841	Merge branch 'master' of https://github.com/explosion/spaCy	2017-12-22 16:01:00 +01:00
ines	b10ba848b8	xfail test that causes MemoryError on Python 2 on Windows Need to investigate this further!	2017-12-22 16:00:58 +01:00
Søren Lind Kristiansen	bef735aef7	Fix Danish abbreviation 'm.h.t.'	2017-12-21 09:24:31 +01:00
Ines Montani	a3dd167d7f	Merge branch 'master' into da_ud_tokenization	2017-12-20 21:05:34 +00:00
Ines Montani	97f100f69f	Merge pull request #1742 from kimfalk/master Two corrections in the da lan.	2017-12-20 21:02:00 +00:00
Ines Montani	d682a8803e	Merge pull request #1672 from cbilgili/master Adds Turkish Lemmatization	2017-12-20 21:01:00 +00:00
Benjamin Peterson	9452134cd1	remove no-break spaces from Hindi example (fixes #1750 )	2017-12-20 11:35:30 -08:00
Søren Lind Kristiansen	7a2f2f6f94	Fix formatting.	2017-12-20 18:37:37 +01:00
Søren Lind Kristiansen	15d13efafd	Tune Danish tokenizer to more closely match tokenization in Universal Dependencies.	2017-12-20 17:36:52 +01:00
Kim FalkJørgensen	648dc60755	Remove the incorrect exception 'm.h.t'	2017-12-20 10:02:39 +01:00
Kim FalkJørgensen	9c9f4ef84a	Fixing a translation error in examples.py Adding an exception in the tokenizer_exceptions.py	2017-12-19 15:26:50 +01:00
ines	22dc744b48	Fix check for '@' in like_url (see #1715 )	2017-12-16 13:48:43 +01:00
Ines Montani	9c1ee65268	Add regression test for #1698	2017-12-12 10:36:11 +01:00
Ines Montani	6455b574fc	Check for email address first	2017-12-12 10:25:13 +01:00
Bri-Will	d77361d76c	Update lex_attrs.py. Fix like_url from matching on e-mail	2017-12-11 14:13:28 -08:00
Søren Lind Kristiansen	5a9d377580	Remove abbreviation for positional plac argument	2017-12-11 11:08:29 +01:00
Isaac Sijaranamual	38021fbb00	Switch from python 3 only TemporaryDirectory to pytest's tmpdir	2017-12-11 00:16:04 +01:00
Isaac Sijaranamual	20ae0c459a	Fixes "Error saving model" #1622	2017-12-10 23:07:13 +01:00
Isaac Sijaranamual	568130ce7c	Adds regression test_issue1622	2017-12-10 23:00:48 +01:00
Isaac Sijaranamual	e188b61960	Make cli/train.py not eat exception	2017-12-10 22:53:08 +01:00
ines	020a7e5d52	Allow 'fine_grained' option in displaCy (see #1703 ) Shows token.tag_ instead of token.pos_. Disabled by default, to not cause rendering issues for models with long fine-grained tags (e.g. merged morphological features).	2017-12-09 15:11:12 +01:00
Matthew Honnibal	3b17eb7c49	Merge branch 'master' of https://github.com/explosion/spaCy	2017-12-07 10:39:32 +01:00
Matthew Honnibal	a6b43729c6	Set version to v2.0.5	2017-12-07 10:39:14 +01:00
ines	5eaa61c2b8	Fix formatting	2017-12-07 10:23:09 +01:00
ines	24e80c51b8	Document init-model command	2017-12-07 10:14:37 +01:00
Matthew Honnibal	c91f451b0f	Fix imports and CLI in init-model	2017-12-07 10:03:07 +01:00
ines	82e80ff928	Rename model command to init_model and fix formatting	2017-12-07 09:59:23 +01:00
Ines Montani	2feeb428d6	Merge pull request #1646 from GreenRiverRUS/master Added model command to create models from raw data	2017-12-07 08:54:26 +00:00
Matthew Honnibal	6373d2580d	Increment version to v2.0.5.dev0	2017-12-07 09:53:59 +01:00
Matthew Honnibal	36b47e3fa6	Fix (and test) vector pickling	2017-12-07 09:53:30 +01:00
Matthew Honnibal	05f41ff587	Set version to 2.0.4	2017-12-06 13:24:02 +01:00
Matthew Honnibal	04c38f7e87	Merge branch 'master' of https://github.com/explosion/spaCy	2017-12-06 12:15:52 +01:00
Matthew Honnibal	361944e512	If no rules are set, lemmatize by lookup	2017-12-06 12:12:11 +01:00
Matthew Honnibal	2ab0f2d186	Merge pull request #1664 from jimregan/italian-lemmatizer BOM in Italian lemmatiser	2017-12-06 11:09:04 +01:00
Matthew Honnibal	3f247119d3	Merge pull request #1668 from sorenlind/da_morph Add more Danish morph rules and clean up existing ones	2017-12-06 11:08:09 +01:00
Matthew Honnibal	b712de774e	Fix vectors pickling	2017-12-05 12:45:24 +01:00
Matthew Honnibal	04650e38c7	Set version to 2.0.4.dev0	2017-12-05 10:52:31 +01:00
Matthew Honnibal	07acb43a85	Merge branch 'master' of https://github.com/explosion/spaCy	2017-12-04 14:42:52 +01:00
Thomas Werkmeister	94eac75b7c	fix setup.py spacy req string for packaging Requirement should be `spacy>=2.0.2` instead of `spacy2.0.2`	2017-12-03 04:16:28 -06:00
ines	f2ea6d4713	Add Dutch example sentences (see #1107 )	2017-12-01 23:36:05 +01:00
Canbey Bilgili	abe098b255	Adds Turkish Lemmatization	2017-12-01 17:04:32 +03:00
Søren Lind Kristiansen	d86b537a38	Enable morph rules for Danish	2017-11-30 15:58:02 +01:00
Søren Lind Kristiansen	13a988adc3	Remove 'Number[psor]'	2017-11-30 15:55:04 +01:00
Søren Lind Kristiansen	dd6fde18a9	Add more Danish morph rules and clean up existing ones	2017-11-30 11:17:19 +01:00
Vadim Mazaev	495eacf470	Merge branch 'model_command'	2017-11-30 12:30:26 +03:00
Vadim Mazaev	4ba7ddf651	Bugfixes	2017-11-30 12:29:38 +03:00
Jim O'Regan	a4ecdeadd4	aha	2017-11-29 23:43:25 +00:00
Jim O'Regan	2c7a9215d7	Merge branch 'master' into animacy	2017-11-29 23:31:12 +00:00
Jim O'Regan	c3e6cee17a	use inan in polimorf tagset conversion	2017-11-29 23:15:47 +00:00
Jim O'Regan	b32575e78c	imports	2017-11-29 23:03:41 +00:00
Jim O'Regan	3696ce6a7b	add UD mapping	2017-11-29 22:59:19 +00:00
Jim O'Regan	f8e7082fe4	typo in "inan", add "nhum"	2017-11-29 22:40:47 +00:00
Matthew Honnibal	6bc0f4d29f	Merge pull request #1611 from fsonntag/master Solving #1494	2017-11-29 23:11:23 +01:00
Matthew Honnibal	f9ed9ea529	Merge pull request #1624 from GreenRiverRUS/russian Add support for Russian	2017-11-29 23:10:01 +01:00
Jim O'Regan	076a6fc60a	symbols	2017-11-29 20:11:20 +00:00
Jim O'Regan	834ba3c69a	(semi generated) Polimorf mapping	2017-11-29 20:08:24 +00:00
Jim O'Regan	ba6a23fd11	BOM in Italian lemmatiser	2017-11-29 17:40:07 +00:00
ines	a31506e060	Fix off-by-one error in nlp.add_pipe(after=name) (fixes #1654 )	2017-11-28 20:37:55 +01:00
ines	b62739fbfe	Add regression test for #1654	2017-11-28 20:27:54 +01:00
ines	2e50dbb9d7	Simplify test	2017-11-28 20:27:27 +01:00
Felix Sonntag	724ae7dc55	Fixed issue of infix capturing prefixes	2017-11-28 17:17:12 +01:00
Ines Montani	9052643e2c	Merge pull request #1653 from sorenlind/da_example_typo Fix typo	2017-11-27 14:47:42 +00:00
Søren Lind Kristiansen	5fe58b885b	Fix typo	2017-11-27 15:36:18 +01:00
Ines Montani	d52b1ab245	Add unicode_literals (hopefully fixes test failure on Python 2)	2017-11-27 15:16:54 +01:00
Søren Lind Kristiansen	0ffd27b0f6	Add several Danish alternative spellings	2017-11-27 13:35:41 +01:00
Ines Montani	6362024cf8	Merge pull request #1645 from GreenRiverRUS/fix_default_meta Fixed spaCy version string in default meta	2017-11-27 11:58:02 +00:00
Vadim Mazaev	c332ffdde1	Added model command to create model from raw data: words counts, brown clusters and vectors	2017-11-27 01:21:47 +03:00
Vadim Mazaev	59f03ab1d7	Fixed spacy version string in default meta	2017-11-26 23:02:07 +03:00
Vadim Mazaev	53e7c38637	Fixed tests depends on pymorphy2	2017-11-26 21:04:44 +03:00
Vadim Mazaev	cacd859dcd	Added tag map, fixed tests fails, added more exceptions	2017-11-26 20:54:48 +03:00
Ines Montani	a7bb8f1b42	Merge pull request #1637 from sorenlind/da_tokenization Improve Danish tokenization	2017-11-26 15:41:38 +00:00
ines	c699aec089	Add offsets_from_biluo_tags helper and tests (see #1626)	2017-11-26 16:38:01 +01:00
Søren Lind Kristiansen	ef03e9ea53	Remove unused import.	2017-11-25 13:04:02 +01:00
Søren Lind Kristiansen	6aa241bcec	Add day of month tokenizer exceptions for Danish.	2017-11-24 15:03:24 +01:00
Søren Lind Kristiansen	0c276ed020	Add weekday abbreviations and remove abiguous month abbreviations for Danish.	2017-11-24 14:43:29 +01:00
Søren Lind Kristiansen	056547e989	Add multiple tokenizer exceptions for Danish.	2017-11-24 11:51:26 +01:00
Søren Lind Kristiansen	8dc265ac0c	Add test for tokenization of 'i.' for Danish.	2017-11-24 11:29:37 +01:00
Søren Lind Kristiansen	ac8116510d	Fix tokenization of 'i.' for Danish.	2017-11-24 11:16:53 +01:00
Matthew Honnibal	79f11d4f85	Pickle vectors with vocab	2017-11-23 17:19:50 +01:00
Matthew Honnibal	f29c3925ee	Fix more efficient nonproj	2017-11-23 12:48:00 +00:00
Matthew Honnibal	e10e9ad2c5	Improve efficiency of Doc.to_array	2017-11-23 12:33:27 +00:00
Matthew Honnibal	2acc907d55	Improve profiling	2017-11-23 12:33:03 +00:00
Matthew Honnibal	fa62427300	Remove lookup-based lemmatization	2017-11-23 12:32:22 +00:00
Matthew Honnibal	fb26b2cb12	Use lookup lemmatizer if lemma unset	2017-11-23 12:31:58 +00:00
Matthew Honnibal	db5c714ad2	Improve efficiency of deprojectivization	2017-11-23 12:31:34 +00:00
Matthew Honnibal	8fec7268eb	Move string cleanup under a setting flag	2017-11-23 12:19:18 +00:00
Matthew Honnibal	5949777b12	Fix misleading multi-threading docstring	2017-11-23 12:18:59 +00:00
Matthew Honnibal	542e6fd4ea	Don't remove entries from specials	2017-11-23 12:17:42 +00:00
Matthew Honnibal	30ba81f881	Merge pull request #1576 from ligser/master Actually reset caches in pipe [wip]	2017-11-23 12:54:48 +01:00
ines	c90fe92e15	Fix displaCy test	2017-11-22 05:04:39 +01:00
ines	a6f33ac27d	Fix displaCy test	2017-11-22 04:19:28 +01:00
ines	93b0be611a	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-22 00:28:55 +01:00
ines	60b4915569	Use .pos_ instead of .tags_ in displaCy by default (see #1006)	2017-11-22 00:28:52 +01:00
Vadim Mazaev	81314f8659	Fixed tokenizer: added char classes; added first lemmatizer and tokenizer tests	2017-11-21 22:23:59 +03:00
Vadim Mazaev	52ee1f9bf9	Updated Russian Language, added lemmatizer, norm exceptions and lex attrs	2017-11-21 11:44:46 +03:00
Burton DeWilde	a5c6869b2d	Fix bug where span.orth_ != span.text (see #1612)	2017-11-20 12:05:43 -06:00
Burton DeWilde	635792997c	Add regression test for #1612	2017-11-20 12:05:35 -06:00
ines	9a63e32f21	Add noqa to Python 2 compat variables of built-ins (see #1617)	2017-11-20 14:03:42 +01:00
ines	d70a64d78b	Fix syntax error and formatting in test (see #1617)	2017-11-20 14:01:25 +01:00
ines	17849dee4b	Fix French test (see #1617)	2017-11-20 13:59:59 +01:00
Felix Sonntag	33b0f86de3	Changed tokenizer to add infix when infix_start is offset	2017-11-19 16:32:10 +01:00
Felix Sonntag	8be3392302	Added regression text for 1494	2017-11-19 16:30:35 +01:00
Motoki Wu	a52e195a0a	Fixes Issue #1207 where `noun_chunks` of `Span` gives an error. Make sure to reference `self.doc` when getting the noun chunks. Same fix as `9750a0128c`	2017-11-17 17:16:20 -08:00
Motoki Wu	b818afaa0e	Added failing test for Issue #1207. The noun chunk iterator should work for `Doc` but not for `Span`.	2017-11-17 17:04:27 -08:00
Vadim Mazaev	a0739a06d4	Returned russian support from v1.10 branch	2017-11-17 17:06:15 +03:00
yuukos	7401152289	updated Russian tokenizer moved the trying to import pymorph into __init__	2017-11-17 17:04:50 +03:00
yuukos	3aad66cf00	added russian language support	2017-11-17 17:04:22 +03:00
ines	a3d4dd1a5d	Test adding of lots of pipeline components (see #1585) Just to make sure that there's no error now or in the future with adding a large number of pipeline components.	2017-11-15 17:28:06 +01:00
Roman Domrachev	61d28d03e4	Try again to do selective remove cache	2017-11-15 19:11:12 +03:00
Roman Domrachev	b3311100c7	Merge branch 'master' of github.com:explosion/spaCy	2017-11-15 18:30:04 +03:00
Matthew Honnibal	b60d92aca8	Increment version	2017-11-15 16:14:46 +01:00
Roman Domrachev	505c6a2f2f	Completely cleanup tokenizer cache Tokenizer cache can have be different keys than string That modification can slow down tokenizer and need to be measured	2017-11-15 17:55:48 +03:00
Matthew Honnibal	cf0be62096	Increment version	2017-11-15 15:00:18 +01:00
ines	97a4f9362b	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-15 14:24:00 +01:00
ines	8e65247886	Fix lex.id if vectors is None	2017-11-15 14:23:58 +01:00
Matthew Honnibal	437ad1a852	Merge pull request #1570 from explosion/feature/fix-beam-leak Fix memory leak in beam parser	2017-11-15 14:15:05 +01:00
Matthew Honnibal	2f169fdb0a	Set lex ID correctly for new tokens in Vocab	2017-11-15 13:58:03 +01:00
Matthew Honnibal	fe3c42a06b	Fix caching in tokenizer	2017-11-15 13:55:46 +01:00
Matthew Honnibal	8d692771f6	Improve profiling	2017-11-15 13:51:25 +01:00
Matthew Honnibal	b797dca977	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-15 13:11:43 +01:00
ines	c9d72de0fb	Add dummy serialization methods for Japanese and missing lang getter (resolves #1557)	2017-11-15 12:44:02 +01:00
Matthew Honnibal	d274d3a3b9	Let beam forward use minibatches	2017-11-15 00:51:42 +01:00
Matthew Honnibal	855872f872	Remove state hashing	2017-11-14 23:36:46 +01:00
Roman Domrachev	3e21680814	Use safer method to get string without hit	2017-11-14 22:58:46 +03:00
Roman Domrachev	a33d5a068d	Try to hold origin data instead of restore it	2017-11-14 22:40:03 +03:00
Roman Domrachev	91e2fa6561	Clean all caches	2017-11-14 21:15:04 +03:00
Roman Domrachev	4e378dc4a4	Remove all obsolete code and test only initial problem	2017-11-14 20:45:04 +03:00
Roman	47ce2347b0	Create test that fails when actual cleanup caused	2017-11-14 20:28:13 +03:00
Roman	caae77f72d	Update strings.pyx	2017-11-14 19:44:40 +03:00
Roman Domrachev	3d247d2bb8	Get back previous testcase	2017-11-14 18:01:37 +03:00
Roman Domrachev	870defa815	Swap keys in proper place Remove unnecessary clear of the hits	2017-11-14 17:56:30 +03:00
Roman Domrachev	86ca434c93	Merge github.com:explosion/spaCy	2017-11-14 17:46:22 +03:00
Roman Domrachev	a2745b0e84	StringStore now actually cleaned Do not lose docs in ref tracking	2017-11-14 17:45:50 +03:00
Matthew Honnibal	2512ea9eeb	Fix memory leak in beam parser	2017-11-14 02:11:40 +01:00
Matthew Honnibal	86ddf692a1	Fix bug in limit calculation on dev data	2017-11-14 01:37:10 +01:00
Ines Montani	ea6c85c67a	Merge pull request #1566 from MathiasDesch/master (resolves #1248) Add exceptions to tokenizer and norm	2017-11-13 19:05:22 +01:00
Matthew Honnibal	1b348389bb	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-13 18:18:48 +01:00
Matthew Honnibal	ca73d0d8fe	Cleanup states after beam parsing, explicitly	2017-11-13 18:18:26 +01:00
Matthew Honnibal	63ef9a2e73	Remove __dealloc__ from ParserBeam	2017-11-13 18:18:08 +01:00
Mathias Deschamps	c0691b2ab4	Add tokenizer exceptions for ing verbs Extend list of tokenizing exceptions introduced in `123810b`	2017-11-13 17:46:05 +01:00
Mathias Deschamps	288298ead9	Add norm exception for ing verbs Some ing verbs are sometimes written in or in'. Make the NORM form correct	2017-11-13 17:46:05 +01:00
Abhinav Sharma	59f5740ede	improved upon the list of included stop_words	2017-11-13 17:13:49 +05:30
Matthew Honnibal	6e641f46d4	Create a preprocess function that gets bigrams	2017-11-12 00:43:41 +01:00
Matthew Honnibal	c9251d79e3	Edit comment	2017-11-11 18:38:32 +01:00
Matthew Honnibal	dd1678eab3	Edit comment	2017-11-11 18:37:08 +01:00
Roman Domrachev	ee60a52ee7	Fix test imports and last batch cleanup	2017-11-11 11:32:16 +03:00
Roman Domrachev	4a6b094e09	Remove unused import	2017-11-11 03:13:05 +03:00
Roman Domrachev	3c600adf23	Try to fix StringStore clean up (see #1506)	2017-11-11 03:11:27 +03:00
ines	ee97fd3cb4	Add regression test for #1547	2017-11-11 00:14:03 +01:00
ines	2df27db671	Add unicode declaration	2017-11-11 00:13:56 +01:00
ines	35653bef3a	Add missing import (fixes #1546)	2017-11-10 19:05:18 +01:00
ines	4c5d2c80d5	Re-add python -m to commands, too brittle :( (see #1536)	2017-11-10 02:30:55 +01:00
ines	123810b6de	Add "lovin'" to tokenizer exceptions (see #1248)	2017-11-09 17:09:30 +01:00
ines	1c218397f6	Ensure path in Doc.to_disk/from_disk (resolves ##1521) Also add Doc serialization tests with both Path and string path options	2017-11-09 02:29:03 +01:00
Matthew Honnibal	49fd5a646f	Set version for 2.0.2 release	2017-11-08 22:39:39 +01:00
Matthew Honnibal	fba2dbddf7	Increment version	2017-11-08 22:19:08 +01:00
Matthew Honnibal	a5ea0fdf5a	Fix #1518: vocab.vectors.resize() didn't work	2017-11-08 22:18:37 +01:00
Matthew Honnibal	de45702bbe	Strip dev suffixes from version for compatibility check	2017-11-08 18:40:21 +01:00
Matthew Honnibal	51639214a1	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-08 18:04:33 +01:00
Matthew Honnibal	a2f980de4e	Exclude .devN versioning from compatibility check	2017-11-08 18:03:52 +01:00
Daniel Hershcovich	d7ae54ff44	Fix typo in message	2017-11-08 16:06:28 +02:00
Matthew Honnibal	4194bc5744	Xfail flakey serialization test	2017-11-08 13:55:13 +01:00
Matthew Honnibal	d5537e5516	Work on Windows test failure	2017-11-08 13:25:18 +01:00
Matthew Honnibal	c27c82d5f9	Fix serialization	2017-11-08 13:08:48 +01:00
Matthew Honnibal	1d5599cd28	Fix dtype	2017-11-08 12:18:32 +01:00
Matthew Honnibal	fa7fdd0d9b	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-08 12:11:31 +01:00
Matthew Honnibal	072ff38a01	Try to fix python3.5 serialization	2017-11-08 12:10:49 +01:00
Ines Montani	3a0f34d567	Merge pull request #1509 from abhi18av/patch-1 Create examples.py for Hindi language	2017-11-08 11:37:19 +01:00
Ines Montani	42b241ccd0	Update language code in usage example in comment	2017-11-08 11:36:38 +01:00
Matthew Honnibal	e262e8d942	Increment version to v2.0.2.dev0	2017-11-08 11:25:47 +01:00
Matthew Honnibal	a8b592783b	Make a dtype more specific, to fix a windows build	2017-11-08 11:24:35 +01:00
Abhinav Sharma	84edade82d	Create examples.py Populated the file with the translations of English example sentences	2017-11-08 13:23:08 +05:30
Matthew Honnibal	d725aee4e2	Increment version to 2.0.1	2017-11-08 02:14:47 +01:00
Matthew Honnibal	8d6f68f1df	Increment version	2017-11-08 01:12:34 +01:00
ines	bcf42b8846	Fix typo	2017-11-08 01:06:37 +01:00
Matthew Honnibal	bbd2a3dee1	Fix title in about.py	2017-11-07 14:02:58 +01:00
Matthew Honnibal	4efaf9306c	Set version to spacy-nightly rc2	2017-11-07 13:27:26 +01:00
Matthew Honnibal	bf1ec2965f	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-07 13:20:29 +01:00
Matthew Honnibal	726f689da4	Fix missing import	2017-11-07 13:20:12 +01:00
ines	834f9c1aab	Update about.py	2017-11-07 13:11:33 +01:00
ines	a4662a31a9	Move model package templates to cli.package and update docs	2017-11-07 12:15:35 +01:00
ines	a09c096d3c	Get docs ready for v2.0.0	2017-11-07 12:00:43 +01:00
Matthew Honnibal	9a88e66103	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-07 02:00:06 +01:00
Matthew Honnibal	174abe4677	Increment to 2.0.0rc1	2017-11-07 01:59:46 +01:00
ines	42a0fbf291	Fix textcat simple train example	2017-11-07 01:25:54 +01:00
ines	8fb48b9b91	Update and document new util functions	2017-11-07 00:22:43 +01:00
Matthew Honnibal	1cab703bba	Move minibatch function to util	2017-11-06 23:45:36 +01:00
ines	5f43953536	Move test	2017-11-06 23:14:10 +01:00
Matthew Honnibal	dd90fe09f5	Remove extraneous label from textcat class	2017-11-06 22:09:02 +01:00
Matthew Honnibal	45e0617e61	Allow Language.update to take unicode text and dict objects	2017-11-06 22:07:38 +01:00
Matthew Honnibal	1831dbd065	Add test of simple textcat workflow	2017-11-06 22:04:29 +01:00
Matthew Honnibal	ffb9101f3f	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-06 19:20:41 +01:00
Matthew Honnibal	8fea512ac8	Don't set tensor in textcat	2017-11-06 19:20:14 +01:00
ines	acb9bdb852	Fix PRON_LEMMA imports	2017-11-06 17:41:53 +01:00
Matthew Honnibal	7d46793dd7	Add PRON_LEMMA to spacy.symbols	2017-11-06 17:38:25 +01:00
Matthew Honnibal	2f7e9f390d	Make test less flakey	2017-11-06 17:34:50 +01:00
Matthew Honnibal	407b08017e	Make test less flakey	2017-11-06 17:31:40 +01:00
Matthew Honnibal	102f797933	Fix lemma ordering in test	2017-11-06 17:02:17 +01:00
Matthew Honnibal	75e1618ec3	Fix lemma clobbering	2017-11-06 16:56:19 +01:00
Matthew Honnibal	6fdffd7246	Merge pull request #1497 from explosion/feature/improve-optimizer-handling 💫 Improve optimizer handling	2017-11-06 16:41:15 +01:00
Matthew Honnibal	8e6795437b	Set release=True	2017-11-06 16:39:32 +01:00
Matthew Honnibal	5c85bf3791	Fix missing import	2017-11-06 15:06:27 +01:00
Matthew Honnibal	25859dbb48	Return optimizer from begin_training, creating if necessary	2017-11-06 14:26:49 +01:00
Matthew Honnibal	465adfee94	Remove unused resume_training method, and pass optimizer through	2017-11-06 14:26:00 +01:00
Matthew Honnibal	13336a6197	Fix Adam import	2017-11-06 14:25:37 +01:00
Matthew Honnibal	2eb11d60f2	Add function create_default_optimizer to spacy._ml	2017-11-06 14:11:59 +01:00
Matthew Honnibal	31babe3c3f	Fix non-clobbering lemmatization	2017-11-06 12:36:05 +01:00
Matthew Honnibal	63c6ae4191	Fix lemmatizer test	2017-11-06 11:57:06 +01:00
Matthew Honnibal	a86a0181b5	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-05 22:19:10 +01:00
Matthew Honnibal	134d3b8143	Fix morphology	2017-11-05 22:18:22 +01:00
ines	08d1cf850a	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-05 21:41:58 +01:00
ines	baa231745c	Fix Dutch tag map	2017-11-05 21:41:50 +01:00
Matthew Honnibal	46e62ad747	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-05 19:40:00 +01:00
Matthew Honnibal	bb25cb0f76	Avoid clobbering preset lemmas	2017-11-05 19:39:38 +01:00
ines	507ecb67af	Fix Spanish tag map	2017-11-05 19:23:34 +01:00
Matthew Honnibal	320008352b	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-05 18:46:15 +01:00
Matthew Honnibal	38109a0e4a	Register SentenceSegmenter in Language.factories	2017-11-05 18:45:57 +01:00
ines	975e1042ff	Fix Italian tag map	2017-11-05 18:34:09 +01:00
ines	6b2d6e4937	Fix Portuguese tag map	2017-11-05 18:31:00 +01:00
ines	fa2687fded	Fix Dutch tag map	2017-11-05 17:57:59 +01:00
ines	fb8990d916	Fix Spanish tag map	2017-11-05 17:48:46 +01:00
ines	9d13288f73	Fix French tag map	2017-11-05 17:47:59 +01:00
ines	54579805c5	Fix French tag map	2017-11-05 17:44:05 +01:00
Matthew Honnibal	2b35bb76ad	Fix tensorizer on GPU	2017-11-05 15:34:40 +01:00
Matthew Honnibal	6e5181bbaa	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-05 15:33:56 +01:00
Matthew Honnibal	6f438b17c1	Increment version to v2.0.0a19	2017-11-05 14:43:36 +01:00
Matthew Honnibal	225cc249c9	Pass string path to numpy, to fix #1479	2017-11-05 14:42:46 +01:00
Matthew Honnibal	00435d8f0c	Add extra beam parsing test	2017-11-05 14:39:57 +01:00
Matthew Honnibal	e777ea25bb	Merge pull request #1492 from uwol/develop TextCategorizer return parameter fix	2017-11-05 14:13:04 +01:00
Matthew Honnibal	0d4bd6414e	Fix Italian tag map	2017-11-05 14:11:03 +01:00
ines	ef597622a6	Add Portuguese tag map	2017-11-05 13:58:34 +01:00
ines	793c62dfda	Add Dutch tag map	2017-11-05 13:48:07 +01:00
ines	f7485a09c8	Fix Italian tag map	2017-11-05 13:12:58 +01:00
uwol	a2162b8908	tensorizer return parameter fix	2017-11-05 12:25:10 +01:00
ines	0a27afbf86	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-04 23:32:52 +01:00
ines	3cef901834	Add tag map for French and Italian	2017-11-04 23:32:51 +01:00
Matthew Honnibal	cfb83c231c	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-04 23:08:19 +01:00
Matthew Honnibal	d185927998	Undo harmful pickling hacks on Language class	2017-11-04 23:07:03 +01:00
ines	6c15aafebd	Fix formatting	2017-11-04 23:07:02 +01:00
Matthew Honnibal	3ca16ddbd4	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-04 00:25:02 +01:00
Matthew Honnibal	e4ec4be948	Fix parser test	2017-11-04 00:23:45 +01:00
Matthew Honnibal	98c29b7912	Add padding vector in parser, to make gradient more correct	2017-11-04 00:23:23 +01:00
ines	5e7d98f72a	Remove test for #1491	2017-11-03 22:10:57 +01:00
ines	718f1c50fb	Add regression test for #1491	2017-11-03 21:11:20 +01:00
Matthew Honnibal	144a93c2a5	Back-off to tensor for similarity if no vectors	2017-11-03 20:56:33 +01:00
Matthew Honnibal	1e9634691a	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-03 20:21:15 +01:00
Matthew Honnibal	13c8881d2f	Expose parser's tok2vec model component	2017-11-03 20:20:59 +01:00
Matthew Honnibal	17c63906f9	Update tensorizer component	2017-11-03 20:20:26 +01:00
Matthew Honnibal	2bf21cbe29	Update model after optimising it instead of waiting	2017-11-03 20:20:01 +01:00
Matthew Honnibal	d6e831bf89	Fix lemmatizer tests	2017-11-03 19:46:34 +01:00
ines	eef930c73e	Assert instead of print	2017-11-03 18:50:57 +01:00
ines	f0986df94b	Add test for #1488 (passes on v2.0.0a18?)	2017-11-03 14:44:36 +01:00

5311 Commits