spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-09-23 04:19:11 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	3cdee79a0c	Add depth argument for text classifier	2018-03-16 12:37:31 +01:00
Matthew Honnibal	13067095a1	Disable broken add-after-train in textcat	2018-03-16 12:33:33 +01:00
Matthew Honnibal	565ef8c4d8	Improve argument passing in textcat	2018-03-16 12:30:51 +01:00
Matthew Honnibal	eb2a3c5971	Remove unused function	2018-03-16 12:30:33 +01:00
Matthew Honnibal	307d6bf6d3	Fix parser for Thinc 6.11	2018-03-16 10:59:31 +01:00
Matthew Honnibal	9a389c4490	Fix parser for Thinc 6.11	2018-03-16 10:38:13 +01:00
Matthew Honnibal	648532d647	Don't assume blas methods are present	2018-03-16 02:48:20 +01:00
Matthew Honnibal	e85dd038fe	Merge remote-tracking branch 'origin/master' into feature/single-thread	2018-03-16 02:41:11 +01:00
Matthew Honnibal	e3be3d65b3	Version as v2.0.10.dev0	2018-03-15 17:31:22 +01:00
ines	f3f8bfc367	Add built-in factories for merge_entities and merge_noun_chunks Allows adding those components to the pipeline out-of-the-box if they're defined in a model's meta.json. Also allows usage as nlp.add_pipe(nlp.create_pipe('merge_entities')).	2018-03-15 17:16:54 +01:00
Ines Montani	0d17377e8b	Merge pull request #2095 from DuyguA/quick-typo-fix (resolves #2063) Quick typo fix	2018-03-15 00:29:56 +01:00
ines	d854f69fe3	Add built-in factories for merge_entities and merge_noun_chunks Allows adding those components to the pipeline out-of-the-box if they're defined in a model's meta.json. Also allows usage as nlp.add_pipe(nlp.create_pipe('merge_entities')).	2018-03-15 00:18:51 +01:00
ines	9ad5df41fe	Fix whitespace	2018-03-15 00:11:18 +01:00
Matthew Honnibal	d7ce6527fb	Use increasing batch sizes in ud-train	2018-03-14 20:15:28 +01:00
alldefector	f4e5904fc2	Fix Spanish noun_chunks failure caused by typo	2018-03-14 17:03:17 +01:00
Thomas Opsomer	fbf48b3f9f	lemma property to return hash instead of unicode	2018-03-14 17:03:00 +01:00
Matthew Honnibal	8cefc58abc	Fix Vectors pickling	2018-03-14 16:59:37 +01:00
DuyguA	be4f6da16b	maybe not a good idea to remove also	2018-03-14 14:47:24 +01:00
DuyguA	1a513f71e3	removed also from lookup	2018-03-14 11:57:15 +01:00
DuyguA	cca66abf1e	quick typo fix	2018-03-14 11:34:22 +01:00
Matthew Honnibal	7b755414eb	Update call into thinc	2018-03-13 13:59:59 +01:00
Matthew Honnibal	e101f10ef0	Fix header	2018-03-13 02:12:16 +01:00
Matthew Honnibal	952c87409e	Use openblas.sgemm in parser	2018-03-13 02:12:01 +01:00
Matthew Honnibal	d55620041b	Switch parser to gemm from thinc.openblas	2018-03-13 02:10:58 +01:00
Matthew Honnibal	c2f4759257	Fix test for Python 2	2018-03-12 23:03:05 +01:00
Matthew Honnibal	9aeec9c242	Increment dev version	2018-03-11 01:58:21 +01:00
Matthew Honnibal	f49d71fa7c	Merge branch 'master' of https://github.com/explosion/spaCy	2018-03-11 01:27:17 +01:00
Matthew Honnibal	5dddb30e5b	Fix ud-train script	2018-03-11 01:26:45 +01:00
Matthew Honnibal	e42960bd14	Merge pull request #2012 from alldefector/patch-1 Fix Spanish noun_chunks failure caused by typo	2018-03-11 01:05:19 +01:00
Matthew Honnibal	2cab4d6517	Remove use of attr module in ud_train	2018-03-11 00:59:39 +01:00
Matthew Honnibal	fa9fd21620	Increment dev version	2018-03-11 00:41:54 +01:00
Matthew Honnibal	53b3249e06	Add tests for arc eager oracle	2018-03-10 23:42:56 +01:00
Matthew Honnibal	754ea1b2f7	Link in spaCy CoNLL commands	2018-03-10 23:42:15 +01:00
Matthew Honnibal	3478ea76d1	Add ud_train and ud_evaluate CLI commands	2018-03-10 23:41:55 +01:00
Matthew Honnibal	4b72c38556	Fix dropout bug in beam parser	2018-03-10 23:16:40 +01:00
Matthew Honnibal	9cc202d670	Fix Vectors pickling	2018-03-10 22:53:42 +01:00
Matthew Honnibal	3d6487c734	Support dropout in beam parse	2018-03-10 22:41:55 +01:00
Matthew Honnibal	31b156d60b	Fix itershuffle	2018-03-10 22:32:59 +01:00
Matthew Honnibal	b59765ca9f	Stream gold during spacy train	2018-03-10 22:32:45 +01:00
Matthew Honnibal	c3d168509a	Stream the gold data during training, to reduce memory	2018-03-10 22:32:32 +01:00
DuyguA	cba63196f9	fixed typo	2018-03-09 10:54:18 +01:00
DuyguA	7a780476af	added more abbreviations	2018-03-09 10:13:00 +01:00
DuyguA	cca87756d7	added Sti	2018-03-08 18:07:52 +01:00
DuyguA	3c994311c5	added abbrevs	2018-03-08 18:03:27 +01:00
DuyguA	56d6fb180e	added like_num to lex	2018-03-08 15:25:25 +01:00
DuyguA	26ee0590a3	added some commonly used cases	2018-03-08 12:43:58 +01:00
DuyguA	ae6473e4d5	removed some words with negation particle.	2018-03-08 12:20:32 +01:00
DuyguA	6ed59a2198	removed number words to be carried to the lexical	2018-03-08 12:19:23 +01:00
DuyguA	04784a44a6	made alphabetical order for Turkish characters	2018-03-08 12:11:32 +01:00
DuyguA	af33e022a5	added example sentences for Turkish	2018-03-08 12:06:03 +01:00
Matthew Honnibal	a1be01185c	Fix array out of bounds error in Span	2018-02-28 12:27:09 +01:00
Thomas Opsomer	8df9e52829	lemma property to return hash instead of unicode	2018-02-27 19:50:01 +01:00
Ines Montani	35634352fe	Merge pull request #2025 from dejanmarich/patch-1 Update stop_words.py for Croatian language	2018-02-26 18:22:32 +01:00
Matthew Honnibal	14f729c72a	Add subtok label to parser	2018-02-26 12:26:35 +01:00
Matthew Honnibal	7137ad8b0b	Make label filtering clearer for projectivisation	2018-02-26 12:02:01 +01:00
Matthew Honnibal	b8d52cb285	Fix inconsistent label freq cutoff for projectivisation	2018-02-26 12:01:44 +01:00
Matthew Honnibal	7b66ec896a	Revert "Revert "Improve parser oracle around sentence breaks."" This reverts commit `36e481c584`.	2018-02-26 10:57:37 +01:00
Matthew Honnibal	36e481c584	Revert "Improve parser oracle around sentence breaks." This reverts commit `50817dc9ad`.	2018-02-26 10:53:55 +01:00
Matthew Honnibal	5faae803c6	Add option to not use Janome for Japanese tokenization	2018-02-26 09:39:46 +01:00
Matthew Honnibal	9b406181cd	Add Chinese.Defaults.use_jieba setting, for UD	2018-02-25 15:12:38 +01:00
Matthew Honnibal	9ccd0c643b	Add Vietnamese	2018-02-25 15:00:46 +01:00
Matthew Honnibal	d4fdb97c87	Fix alignment for words with spaces	2018-02-25 14:55:00 +01:00
Matthew Honnibal	6d2c1ef52c	Fix SP tag in generic tag map	2018-02-24 16:04:56 +01:00
Matthew Honnibal	5cc3bd1c1d	Update alignment tests	2018-02-24 16:03:58 +01:00
Matthew Honnibal	6138439469	Fix many-to-one alignment	2018-02-24 16:03:50 +01:00
Matthew Honnibal	4890ee1732	Fix scoring of tokenization for punct	2018-02-24 10:32:32 +01:00
Matthew Honnibal	12b39f87da	Move cython declarations in matcher.pyx	2018-02-24 10:32:18 +01:00
Matthew Honnibal	01d1b7abdf	Support many-to-one alignment in GoldParse	2018-02-24 10:17:01 +01:00
Matthew Honnibal	7865746574	Support many-to-one alignment	2018-02-24 02:09:53 +01:00
Matthew Honnibal	458710b831	Poke matcher test for appveyor	2018-02-23 23:53:48 +01:00
Matthew Honnibal	968dabdde4	Fix bug in multi-task objective	2018-02-23 23:48:09 +01:00
Matthew Honnibal	2c9c8b8d72	Try commenting out emoji test in matcher	2018-02-23 23:34:35 +01:00
Matthew Honnibal	980ad68cbe	Try to find test that fails on appveyor	2018-02-23 21:27:53 +01:00
Matthew Honnibal	39de8cd4d3	Try to find test failing on appveyor	2018-02-23 20:59:21 +01:00
Matthew Honnibal	4492a33a9d	Fix sent_start multi-task objective when alignment fails	2018-02-23 16:50:59 +01:00
Matthew Honnibal	5fa44e93f1	Set unicode_literals in matcher	2018-02-23 16:48:54 +01:00
Matthew Honnibal	12264f9296	Add multi-task objective for sentence segmentation	2018-02-23 16:25:57 +01:00
Matthew Honnibal	e7deadb519	Set version to 2.1.0.dev1	2018-02-23 16:22:24 +01:00
Matthew Honnibal	7b575a119e	Try to reduce memory usage of test_matcher	2018-02-23 15:34:37 +01:00
Matthew Honnibal	24563f4026	Fix data typing in align	2018-02-23 15:08:06 +01:00
Matthew Honnibal	7a5ba20692	Fix integer typing in _align	2018-02-23 14:51:24 +01:00
Matthew Honnibal	875411b875	Set unicode types in _align.pyx and test	2018-02-23 14:35:38 +01:00
Matthew Honnibal	51d9679aa3	Fix broken span.as_doc test	2018-02-23 14:22:24 +01:00
dejanmarich	71c261d58b	Update stop_words.py Added more words	2018-02-23 10:31:01 +01:00
Matthew Honnibal	3e6c1111b7	Remove obsolete test	2018-02-23 03:22:07 +01:00
Matthew Honnibal	a4fdec524a	Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-gold	2018-02-22 21:44:28 +01:00
Matthew Honnibal	50817dc9ad	Improve parser oracle around sentence breaks.	2018-02-22 19:22:26 +01:00
Matthew Honnibal	307aefe131	Increment version to v2.0.9	2018-02-22 17:07:53 +01:00
Feng Niu	1c60384bed	return on empty doc	2018-02-21 15:39:04 -08:00
Feng Niu	7eb1cd100b	unbound doc var	2018-02-21 15:05:37 -08:00
Feng Niu	8df75b229c	fix unbound vars in es.syntax_iterators	2018-02-21 13:11:17 -08:00
alldefector	4244e285c2	Fix Spanish noun_chunks failure caused by typo	2018-02-21 12:43:21 -08:00
Matthew Honnibal	661873ee4c	Randomize the rebatch size in parser	2018-02-21 21:02:07 +01:00
Matthew Honnibal	0872cf611d	Don't lower-case lemmas of proper nouns	2018-02-21 16:01:16 +01:00
Matthew Honnibal	a0ddb803fd	Make error when no label found more helpful	2018-02-21 16:00:59 +01:00
Matthew Honnibal	ea2fc5d45f	Improve length and freq cutoffs in parser	2018-02-21 16:00:38 +01:00
Matthew Honnibal	e5757d4bf0	Add labels property to parser	2018-02-21 16:00:00 +01:00
Matthew Honnibal	eff4ae809a	Fix nonproj label filter	2018-02-21 15:59:04 +01:00
Matthew Honnibal	e624405cda	Temporarily remove cutoff when filtering labels in nonproj	2018-02-21 13:53:40 +01:00
Matthew Honnibal	f466f0186e	Use new alignment implementation in GoldParse	2018-02-20 21:16:35 +01:00
Matthew Honnibal	c0734ba526	Make alignment work with strings	2018-02-20 17:51:49 +01:00
Matthew Honnibal	8180c84a98	Add tests for new Levenshtein alignment	2018-02-20 17:32:25 +01:00
Matthew Honnibal	930c980570	Add improved Levenshtein alignment implementation	2018-02-20 17:31:56 +01:00
Ines Montani	14e7e0f12a	Merge pull request #2000 from jimregan/polish-tag-map Polish tag map	2018-02-18 19:05:58 +01:00
Jim O'Regan	664407de5d	missing PrepCase attribute	2018-02-18 14:46:12 +00:00
Jim O'Regan	95f0673fbc	fix typo/missing here too	2018-02-18 14:38:27 +00:00
Matthew Honnibal	2bccad8815	Fix incorrect matcher test	2018-02-18 14:56:12 +01:00
Matthew Honnibal	530172d57a	Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-faster-matcher	2018-02-18 14:40:42 +01:00
Matthew Honnibal	cf0e320f2b	Add doc.is_sentenced attribute, re #1959	2018-02-18 14:16:55 +01:00
Matthew Honnibal	1e5aeb4eec	Merge pull request #1987 from thomasopsomer/span-sent Make span.sent work when only manual / custom sbd	2018-02-18 14:05:37 +01:00
Matthew Honnibal	1cf774bdc1	Add output options return_matches and as_tuples to Matcher	2018-02-18 14:00:45 +01:00
Matthew Honnibal	dd9b0945af	Fix inconsistencies in the symbols table	2018-02-18 13:51:31 +01:00
Matthew Honnibal	66496ac8e1	Set version to v2.1.0.dev0	2018-02-18 13:48:39 +01:00
Matthew Honnibal	eb3040ce46	Merge pull request #1891 from fucking-signup/master Fix issue #1889	2018-02-18 13:47:47 +01:00
Matthew Honnibal	3d7285870b	Update matcher branch with v2.0.8 master	2018-02-18 13:42:58 +01:00
ines	6bba1db4cc	Drop six and related hacks as a dependency	2018-02-18 13:29:56 +01:00
Matthew Honnibal	b30b09192a	Merge pull request #1665 from jimregan/animacy typo in "inan", add "nhum"	2018-02-18 13:26:53 +01:00
Matthew Honnibal	1b3c98e01b	Set version to v2.0.8	2018-02-18 12:16:31 +01:00
Matthew Honnibal	f9f46e5a07	Revert matcher fixes from GregDubbin	2018-02-18 10:59:28 +01:00
Matthew Honnibal	86405e4ad1	Fix CLI for multitask objectives	2018-02-18 10:59:11 +01:00
Matthew Honnibal	a34749b2bf	Add multitask objectives options to train CLI	2018-02-17 22:03:54 +01:00
Matthew Honnibal	8f06903e09	Fix multitask objectives	2018-02-17 18:41:36 +01:00
Matthew Honnibal	d1246c95fb	Fix model loading when using multitask objectives	2018-02-17 18:11:36 +01:00
Matthew Honnibal	262d0a3148	Fix overwriting of lexical attributes when loading vectors during training	2018-02-17 18:11:11 +01:00
Matthew Honnibal	c0caf7cf27	Fix LANG symbol	2018-02-17 18:10:50 +01:00
Matthew Honnibal	0bf2f6be29	Add missing symbol for LANG attr. Fixes inconsistent numeric ID	2018-02-17 17:37:02 +01:00
Matthew Honnibal	97a228a4ce	Increment to v2.0.8.dev0	2018-02-17 16:54:36 +01:00
Matthew Honnibal	f7dc64d2a3	Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-faster-matcher	2018-02-17 16:47:35 +01:00
Aaron Marquez	ea571e8325	Merge branch 'master' into issue-1959	2018-02-16 15:14:09 -08:00
Matthew Honnibal	7d5c720fc3	Fix multitask objective when no pipeline provided	2018-02-15 23:50:21 +01:00
Aaron Marquez	f0d3672e17	Changed loading EN model	2018-02-15 14:28:38 -08:00
Aaron Marquez	3765d84d57	Fix issue #1959	2018-02-15 12:51:49 -08:00
Aaron Marquez	7ba4111554	Add test for issue-1959	2018-02-15 12:46:22 -08:00
Matthew Honnibal	59b7cf9db8	Add get_beam_parse method in ArcEager, for Prodigy	2018-02-15 21:03:16 +01:00
Matthew Honnibal	3e541de440	Merge branch 'master' of https://github.com/explosion/spaCy	2018-02-15 21:02:55 +01:00
Thomas Opsomer	5d24a81c0b	add test for span.sent when doc not parsed	2018-02-15 16:59:16 +01:00
Thomas Opsomer	deab391cbf	correct check on sent_start & raise if no boundaries	2018-02-15 16:58:30 +01:00
Matthew Honnibal	afbd46adfb	Remove length cap in PhraseMatcher	2018-02-15 16:10:54 +01:00
Matthew Honnibal	4533c7408d	Update matcher tests	2018-02-15 15:39:47 +01:00
Matthew Honnibal	1c19605426	Move matcher2.pyx to matcher.pyx	2018-02-15 15:27:03 +01:00
Matthew Honnibal	9ebf2fe7c3	Make helper function to get longest matches	2018-02-15 15:26:15 +01:00
Matthew Honnibal	4cb861e080	Merge pull request #1968 from DuyguA/is_currency New lexical feature is_currency	2018-02-15 12:13:36 +01:00
Thomas Opsomer	b902731313	Find span sentence when only sentence boundaries (no parser)	2018-02-14 22:18:54 +01:00
Matthew Honnibal	d19dc67886	Make get_action nogil, for efficiency	2018-02-14 12:16:36 +01:00
Matthew Honnibal	7885b92b45	Refactor matcher2, hopefully making it faster	2018-02-14 12:11:17 +01:00
Matthew Honnibal	00261eea27	Make tests refer to matcher2	2018-02-14 12:10:51 +01:00
Claudiu-Vlad Ursache	e28de12cbd	Ensure files opened in `from_disk` are closed Fixes [issue 1706](https://github.com/explosion/spaCy/issues/1706).	2018-02-13 20:49:43 +01:00
Matthew Honnibal	262cbe356e	Remove caching, as doesn't seem to help for now.	2018-02-13 17:15:20 +01:00
Matthew Honnibal	f43d53f2c5	Remove print statement	2018-02-13 17:15:07 +01:00
Matthew Honnibal	dcd8d89aef	Update test for 850, making it work with matcher2	2018-02-13 16:35:20 +01:00
Matthew Honnibal	9bdfa5cd4f	Remove re comparisons tests, as matcher behaves differently	2018-02-13 16:28:52 +01:00
Matthew Honnibal	6d7986b0f1	Fix matcher test	2018-02-13 16:28:06 +01:00
Matthew Honnibal	9efda9e9ab	Add PhraseMatcher in matcher2.pyx	2018-02-13 16:27:46 +01:00
Johannes Dollinger	012e874d09	Add contributor agreement for emulbreh	2018-02-13 13:40:33 +01:00
Johannes Dollinger	bf94c13382	Don't fix random seeds on import	2018-02-13 12:42:23 +01:00
Matthew Honnibal	0004331895	Update notes on matcher2	2018-02-13 11:45:45 +01:00
Matthew Honnibal	b4cc39eb74	Fix zero-width quantifiers. Passes test_matcher	2018-02-13 11:45:32 +01:00
Matthew Honnibal	1b01685f47	Fix ZERO_PLUS operator	2018-02-12 12:28:03 +01:00
Matthew Honnibal	9115c3ba0a	Add TODO in notes	2018-02-12 12:06:48 +01:00
Matthew Honnibal	b00326a7fe	Move pattern_id out of TokenPattern	2018-02-12 12:05:54 +01:00
Matthew Honnibal	d34c732635	Add Python notes for rethinking matcher	2018-02-12 10:19:29 +01:00
Matthew Honnibal	d7c9b53120	Pass kwargs into pipeline components during begin_training	2018-02-12 10:18:39 +01:00
Matthew Honnibal	fae5c0dc18	Work on matcher2	2018-02-12 10:17:43 +01:00
4altinok	ca8728035d	added new lex feat to token	2018-02-11 18:55:48 +01:00
4altinok	edd7202a06	added new symbol	2018-02-11 18:55:32 +01:00
4altinok	ed1ac2969e	added new lexical feat to lexeme	2018-02-11 18:51:48 +01:00
4altinok	94fb0b75e3	code for is_currency	2018-02-11 18:51:32 +01:00
4altinok	3deef1497a	removed 18 and replaced 18 with is_currency	2018-02-11 18:51:09 +01:00
4altinok	471d3c9e23	added lex test for is_currency	2018-02-11 18:50:50 +01:00
ines	c63e99da8a	Fix typo in glossary (resolves #1964) Co-Authored-By: SThomasP <sthomasp@users.noreply.github.com>	2018-02-10 11:58:41 +01:00
Lyndon White	6ee5dff51c	Make python 3.4 compat module loading (fix #1733)	2018-02-09 23:03:35 +08:00
Matthew Honnibal	e361b4f82b	Fix #1929: Incorrect NER when pre-set sentence boundaries.	2018-02-08 15:25:41 +01:00
Matthew Honnibal	fd9fd275c5	Make test for #1945 more precise	2018-02-07 02:06:11 +01:00
Matthew Honnibal	c087a14380	Merge branch 'master' of https://github.com/explosion/spaCy	2018-02-07 01:29:39 +01:00
Matthew Honnibal	76d89b2180	Add test for #1945: PhraseMatcher regression	2018-02-07 01:29:23 +01:00
Ines Montani	0954e15dda	Merge pull request #1913 from ohenrik/nb_syntax_iterator Norwegian Language (nb) - Added french syntax iterator with explanation	2018-02-06 04:59:07 +01:00
Ole Henrik Skogstrøm	251a7805fe	Copied French syntax iterator to simplify future changes	2018-02-05 14:45:05 +01:00
Matthew Honnibal	2e7391e627	Merge pull request #1916 from tokestermw/bug/fix-not-passing-in-model-cfg-in-nlp Bug/fix not passing in model cfg in nlp	2018-02-05 01:19:40 +01:00
Ali Zarezade	9df9da34a3	Fix init_model issue Fixing issue #1928	2018-02-03 17:21:34 +03:30
Matthew Honnibal	ebe84e45e5	Increment version to 2.0.7	2018-02-02 03:39:16 +01:00
Matthew Honnibal	e4b1f57599	Increment version	2018-02-02 02:33:23 +01:00
Matthew Honnibal	069531c351	Merge branch 'master' of https://github.com/explosion/spaCy	2018-02-02 02:32:58 +01:00
Matthew Honnibal	f74a802d09	Test and fix #1919: Error resuming training	2018-02-02 02:32:40 +01:00
ines	f1d3deffac	Add Russian example sentences (see #1107)	2018-02-01 20:09:40 +01:00
Matthew Honnibal	6b1126c312	Merge branch 'master' of https://github.com/explosion/spaCy	2018-02-01 02:57:52 +01:00
ines	3c1fb9d02d	Make validate command fail more gracefully if version not found Mostly relevant during development when working with .dev versions	2018-01-31 22:06:28 +01:00
Motoki Wu	54062b7326	added tests for issue #1915	2018-01-30 18:30:19 -08:00
Motoki Wu	f4a7d1a423	make sure to pass in **cfg to each component when training	2018-01-30 18:29:54 -08:00
ines	4046823699	Only check component in factories if string (see #1911)	2018-01-30 16:29:07 +01:00
ines	ce10d320c4	Fix component check in self.factories (see #1911)	2018-01-30 16:09:37 +01:00
Ole Henrik Skogstrøm	e40465487c	Added french syntax iterator with explanation	2018-01-30 15:44:29 +01:00
ines	8901814248	Improve error handling if pipeline component is not callable (resolves #1911) Also add help message if user accidentally calls nlp.add_pipe() with a string of a built-in component name.	2018-01-30 15:43:03 +01:00
Matthew Honnibal	a437ba87a3	Set release=True	2018-01-29 21:26:04 +01:00
Adam Binford	9238749aaf	Removed test to avoid network requests	2018-01-29 14:48:20 -05:00
Adam Binford	1a2c2f7d7f	Fixed auto linking after download and added simple test to check	2018-01-29 14:25:21 -05:00
Matthew Honnibal	cb7110c22e	Merge pull request #1882 from ohenrik/nb_lemma_and_tag_map Add norwegian bokmål ('nb') lemmatizer and tag_map	2018-01-29 18:18:50 +01:00
Matthew Honnibal	0c1e7f0c86	Merge pull request #1893 from azarezade/master Add Persian language	2018-01-29 18:18:33 +01:00
Matthew Honnibal	cbdab75b36	Increment version	2018-01-28 23:46:22 +01:00
Matthew Honnibal	512e6adb08	Merge pull request #1896 from thomasopsomer/fix-sent Fix sentence boundaries serialization (issue #1834)	2018-01-28 21:18:51 +01:00
Matthew Honnibal	f5b1ad4100	Limit parser model size, to hopefully reduce memory during CI tests	2018-01-28 21:00:32 +01:00
Thomas Opsomer	515e25910e	fix sent_start in serialization	2018-01-28 19:50:42 +01:00
Thomas Opsomer	45d62561f7	add test for the issue	2018-01-28 19:49:56 +01:00
ines	6d978e5c35	Don't use deprecated Doc.merge call in displaCy As reported here: https://stackoverflow.com/a/48464412/6400719	2018-01-27 11:25:05 +01:00
Ali Zarezade	bb6bd3d8ae	add persian language	2018-01-27 13:27:26 +03:30
Ali Zarezade	d195675db5	add persian language	2018-01-27 13:21:38 +03:30
Kit	4b42267ba3	Fix issue #1889	2018-01-25 23:17:22 +01:00
Kit	52ef51f36e	Add test for issue #1889	2018-01-25 22:56:48 +01:00
Ole Henrik Skogstrøm	8e2c9f2475	Cleaned up nb tag_map comments	2018-01-25 11:09:28 +01:00
Ole Henrik Skogstrøm	1107e89fcf	Updated doc string on nb tag_map module	2018-01-25 11:08:28 +01:00
Matthew Honnibal	6a8cb905aa	Merge pull request #1876 from GregDubbin/master Pattern matcher fixes	2018-01-24 16:38:11 +01:00
Matthew Honnibal	38b260e0c3	Merge pull request #1879 from azarezade/master Add Persian character and symbols	2018-01-24 16:34:22 +01:00
Matthew Honnibal	edb71a280e	Add test for #1883: Unpickling Matcher	2018-01-24 15:42:33 +01:00
Matthew Honnibal	2ad050e668	Fix unpickling of Matcher. Also store correct data in matcher._patterns	2018-01-24 15:42:11 +01:00
Ole Henrik Skogstrøm	4058a7d579	Fix æøå characters in lemmatizer	2018-01-24 14:03:14 +01:00
Ole Henrik Skogstrøm	42248f423f	Updated tag map	2018-01-24 13:50:33 +01:00
Ole Henrik Skogstrøm	74b430b49a	Correct Lemmatizer	2018-01-24 13:26:33 +01:00
Ole Henrik Skogstrøm	b9b3a40c78	Add norwegian lemmatizer and tag_map	2018-01-24 12:28:29 +01:00
Matthew Honnibal	42a18ef903	Add test for #1868: Vocab.__contains__ with ints	2018-01-23 23:27:05 +01:00
Matthew Honnibal	43f381ce36	Make Vocab.__contains__ work with ints. Fixes #1868	2018-01-23 23:26:47 +01:00
greg	85ab99e692	Correct test examples	2018-01-23 15:00:14 -05:00
greg	f50bb1aafc	Restructure StateC to eliminate dependency on unordered_map	2018-01-23 14:40:03 -05:00
Matthew Honnibal	f3753c2453	Further model deserialization fixes re #1727	2018-01-23 19:16:05 +01:00
Matthew Honnibal	91e916cb67	Add comment to new test	2018-01-23 19:11:53 +01:00
Matthew Honnibal	fd187d71ad	Add test for #1727	2018-01-23 19:11:01 +01:00
Matthew Honnibal	85c942a6e3	Dont overwrite pretrained_dims setting from cfg. Fixes #1727	2018-01-23 19:10:49 +01:00
Ali Zarezade	42349471bc	add ٪ as punctuation	2018-01-23 18:11:33 +03:30
Ali Zarezade	2bda582135	Add Persian character and symbols Add Persian characters and the following: - ٪ used instead of % - ؟ used instead of ? - ﷼ used instead of $ - ، used instead of , - ؛ used instead of ;	2018-01-23 13:20:36 +03:30
Matthew Honnibal	7e6dc283db	Fix unicode import in test	2018-01-22 23:55:44 +01:00
greg	686735b94e	Fix matcher import	2018-01-22 16:53:05 -05:00
greg	3a491093ee	Import libcpp.map if libcpp.unordered_map doesn't exist	2018-01-22 16:46:25 -05:00
greg	d55992bdf0	Switch match dictionary to use final state pointer rather than ID	2018-01-22 15:36:47 -05:00
Matthew Honnibal	4ce7d24fd5	Add test for #1799: Set left and right edges (and thus sentences) in non-projective parses.	2018-01-22 20:18:38 +01:00
Matthew Honnibal	56164ab688	Set l_edge and r_edge correctly for non-projective parses. Fixes #1799	2018-01-22 20:18:04 +01:00
Matthew Honnibal	964aa1b384	Merge branch 'master' of https://github.com/explosion/spaCy	2018-01-22 19:18:46 +01:00
Matthew Honnibal	29897ed1b3	Allow vector loading to work on 1d data files. Fixes #1831	2018-01-22 19:18:26 +01:00
greg	490bc82c27	Add comments clarifying matcher logic for '*'	2018-01-22 10:03:12 -05:00
Matthew Honnibal	fe4748fc38	Merge pull request #1870 from avadhpatel/master Model Load Performance Improvement by more than 5x	2018-01-22 00:05:15 +01:00
Avadh Patel	a517df55c8	Small fix Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-21 15:20:45 -06:00
Avadh Patel	5b5029890d	Merge branch 'perfTuning' into perfTuningMaster Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-21 15:20:00 -06:00
Matthew Honnibal	203d2ea830	Allow multitask objectives to be added to the parser and NER more easily	2018-01-21 19:37:02 +01:00
Matthew Honnibal	4a7d524efb	Merge branch 'master' of https://github.com/explosion/spaCy	2018-01-21 19:22:03 +01:00
Matthew Honnibal	61a051f2c0	Fix MultitaskObjective	2018-01-21 19:21:34 +01:00
Avadh Patel	75903949da	Updated model building after suggestion from Matthew Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-18 06:51:57 -06:00
Avadh Patel	fe879da2a1	Do not train model if it's going to be loaded from disk This saves significant time in loading a model from disk. Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-17 06:16:07 -06:00
Avadh Patel	2146faffee	Do not train model if it's going to be loaded from disk This saves significant time in loading a model from disk. Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-17 06:04:22 -06:00
greg	7072b395c9	Add greedy matcher tests	2018-01-16 15:46:13 -05:00
greg	441f490c1c	Merge branch 'master' of github.com:GregDubbin/spaCy	2018-01-16 13:31:10 -05:00
greg	8bea62f26e	Correct bugs for greedy matching and introduce ADVANCE_PLUS action	2018-01-16 13:21:43 -05:00
Matthew Honnibal	ccb51a9f36	Make .similarity() return 1.0 if all orth attrs match	2018-01-15 16:29:48 +01:00
Matthew Honnibal	82135d85b7	Fix test	2018-01-15 15:55:15 +01:00
Matthew Honnibal	4b09616b58	Add test for #1757: Comparison against None	2018-01-15 15:55:01 +01:00
Matthew Honnibal	b904d81e9a	Fix rich comparison against None objects. Closes #1757	2018-01-15 15:51:25 +01:00
Matthew Honnibal	9e413449f6	Fix unicode error in new test	2018-01-15 15:39:00 +01:00
Matthew Honnibal	ab7c45b12d	Fix error message and handling of doc.sents	2018-01-15 15:21:11 +01:00
Matthew Honnibal	6b215d2dd3	Add test for Issue #1537	2018-01-15 15:20:56 +01:00
ines	5babb7d6f6	Merge branch 'master' of https://github.com/explosion/spaCy	2018-01-14 17:31:09 +01:00
ines	793890cb4d	Remove test for removed deprecation warning	2018-01-14 17:31:06 +01:00
Matthew Honnibal	465a6f6452	Add missing Span.vocab property. Closes #1633	2018-01-14 15:06:30 +01:00
Matthew Honnibal	0cb090e526	Fix infinite recursion in token.sent_start. Closes #1640	2018-01-14 15:02:15 +01:00
Matthew Honnibal	5cbe913b6f	Don't raise deprecation warning in property. Closes #1813, #1712	2018-01-14 14:55:58 +01:00
Matthew Honnibal	1a1cca6052	Fix vectors.resize() on Py3. Closes #1539	2018-01-14 14:48:51 +01:00
Matthew Honnibal	0153220304	Make set_vector add word to vocab. Fixes #1807	2018-01-14 13:57:57 +01:00
Ines Montani	55754f0cee	Merge pull request #1836 from fucking-signup/master Add tests for issue #1769	2018-01-13 00:23:35 +00:00
Kit	4ee97f20a0	Mark like_num tests as slow	2018-01-13 00:44:15 +01:00
Kit	855531537e	Rewrite tests for issue #1769	2018-01-12 23:49:51 +01:00
Kit	5b541cb5ec	Simplify tests for issue #1769	2018-01-12 23:34:27 +01:00
Kit	7a2adc4633	Remove some tests to see build status changes	2018-01-12 22:49:16 +01:00
Kit	0e62809a43	Rewrite tests for issue #1769	2018-01-12 22:26:06 +01:00
Ines Montani	36f426fe0a	Merge pull request #1808 from fucking-signup/master Fix issue #1769	2018-01-12 21:12:02 +00:00
Kit	76f4eeca44	Remove tests to see build changes on Windows (Python 2.7)	2018-01-12 20:30:51 +01:00
Matthew Honnibal	7ca49c2061	Merge branch 'master' into feature-improve-model-download	2018-01-10 18:21:55 +01:00
Kit	7ec0956e8d	Add regression test (issue #1769)	2018-01-08 03:42:04 +01:00
Kit	701e7cc6aa	Rename variable to keep code consistent	2018-01-08 03:38:44 +01:00
Kit	ed0db95183	Find lowercased forms of ordinal words, where possible	2018-01-08 03:28:50 +01:00
Kit	9bc524982e	Find lowercased forms of numeric words	2018-01-08 03:25:08 +01:00
Søren Lind Kristiansen	62de5da1ff	Remove unused dummy variable	2018-01-05 09:57:24 +01:00
Søren Lind Kristiansen	10dab8eef8	Remove dummy variable from function calls	2018-01-05 09:37:05 +01:00
Søren Lind Kristiansen	7f0ab145e9	Don't pass CLI command name as dummy argument	2018-01-04 21:33:47 +01:00
Ines Montani	6a008233b5	Merge pull request #1795 from textioHQ/issue1758 (resolves #1758) english tokenizer: handle "would've"	2018-01-04 02:43:39 +00:00
Kevin Humphreys	597df5bf83	add test	2018-01-03 13:00:05 -08:00
Kevin Humphreys	7918fa4ef9	handle would've	2018-01-03 12:25:48 -08:00
ines	2c656f90fb	Exit with 1 if incompatible models found (see #1714)	2018-01-03 21:20:35 +01:00
ines	dacfaa2ca4	Ensure that download command exits properly (resolves #1714)	2018-01-03 21:03:36 +01:00
Søren Lind Kristiansen	a9ff6eadc9	Prefix dummy argument names with underscore	2018-01-03 20:48:12 +01:00
ines	1081e08efb	Fix formatting	2018-01-03 20:14:50 +01:00
ines	d8109964d6	Use --no-deps on model install In general, it's nice for models to specify spaCy as a dependency. However, this tends to cause problems in conda environments, as pip will re-install spaCy and its dependencies (especially Thinc)	2018-01-03 17:40:37 +01:00
ines	319d754309	Fix overwriting of existing symlinks Check for is_symlink() to also overwrite invalid and outdated symlinks. Also show better error message if link path exists but is not symlink (i.e. file or directory).	2018-01-03 17:39:36 +01:00
ines	8ba0dfd017	Make message on failed linking more clear	2018-01-03 17:38:09 +01:00
Søren Lind Kristiansen	d6327e8495	Fix handling case when vectors not specified	2018-01-03 12:20:49 +01:00
Søren Lind Kristiansen	bcc51d7d8b	Fix shifted positional arguments	2018-01-03 12:19:47 +01:00
zqhZY	f27859fa99	add ChineseDefaults class for pickling	2017-12-28 17:13:58 +08:00
Ines Montani	ff9fc945ab	Merge pull request #1749 from sorenlind/da_ud_tokenization Tune Danish tokenizer to more closely match Universal Dependencies	2017-12-22 16:00:49 +00:00
ines	26f313dabc	Fix missing import	2017-12-22 16:21:44 +01:00
ines	8dc1c27841	Merge branch 'master' of https://github.com/explosion/spaCy	2017-12-22 16:01:00 +01:00
ines	b10ba848b8	xfail test that causes MemoryError on Python 2 on Windows Need to investigate this further!	2017-12-22 16:00:58 +01:00
Søren Lind Kristiansen	bef735aef7	Fix Danish abbreviation 'm.h.t.'	2017-12-21 09:24:31 +01:00
Ines Montani	a3dd167d7f	Merge branch 'master' into da_ud_tokenization	2017-12-20 21:05:34 +00:00
Ines Montani	97f100f69f	Merge pull request #1742 from kimfalk/master Two corrections in the da lan.	2017-12-20 21:02:00 +00:00
Ines Montani	d682a8803e	Merge pull request #1672 from cbilgili/master Adds Turkish Lemmatization	2017-12-20 21:01:00 +00:00
Benjamin Peterson	9452134cd1	remove no-break spaces from Hindi example (fixes #1750)	2017-12-20 11:35:30 -08:00
Søren Lind Kristiansen	7a2f2f6f94	Fix formatting.	2017-12-20 18:37:37 +01:00
Søren Lind Kristiansen	15d13efafd	Tune Danish tokenizer to more closely match tokenization in Universal Dependencies.	2017-12-20 17:36:52 +01:00
Kim FalkJørgensen	648dc60755	Remove the incorrect exception 'm.h.t'	2017-12-20 10:02:39 +01:00
Kim FalkJørgensen	9c9f4ef84a	Fixing a translation error in examples.py Adding an exception in the tokenizer_exceptions.py	2017-12-19 15:26:50 +01:00
ines	22dc744b48	Fix check for '@' in like_url (see #1715)	2017-12-16 13:48:43 +01:00
Ines Montani	9c1ee65268	Add regression test for #1698	2017-12-12 10:36:11 +01:00
Ines Montani	6455b574fc	Check for email address first	2017-12-12 10:25:13 +01:00
Bri-Will	d77361d76c	Update lex_attrs.py. Fix like_url from matching on e-mail	2017-12-11 14:13:28 -08:00
Søren Lind Kristiansen	5a9d377580	Remove abbreviation for positional plac argument	2017-12-11 11:08:29 +01:00
Isaac Sijaranamual	38021fbb00	Switch from python 3 only TemporaryDirectory to pytest's tmpdir	2017-12-11 00:16:04 +01:00
Isaac Sijaranamual	20ae0c459a	Fixes "Error saving model" #1622	2017-12-10 23:07:13 +01:00
Isaac Sijaranamual	568130ce7c	Adds regression test_issue1622	2017-12-10 23:00:48 +01:00
Isaac Sijaranamual	e188b61960	Make cli/train.py not eat exception	2017-12-10 22:53:08 +01:00
ines	020a7e5d52	Allow 'fine_grained' option in displaCy (see #1703) Shows token.tag_ instead of token.pos_. Disabled by default, to not cause rendering issues for models with long fine-grained tags (e.g. merged morphological features).	2017-12-09 15:11:12 +01:00
Matthew Honnibal	3b17eb7c49	Merge branch 'master' of https://github.com/explosion/spaCy	2017-12-07 10:39:32 +01:00
Matthew Honnibal	a6b43729c6	Set version to v2.0.5	2017-12-07 10:39:14 +01:00
ines	5eaa61c2b8	Fix formatting	2017-12-07 10:23:09 +01:00
ines	24e80c51b8	Document init-model command	2017-12-07 10:14:37 +01:00
Matthew Honnibal	c91f451b0f	Fix imports and CLI in init-model	2017-12-07 10:03:07 +01:00
ines	82e80ff928	Rename model command to init_model and fix formatting	2017-12-07 09:59:23 +01:00
Ines Montani	2feeb428d6	Merge pull request #1646 from GreenRiverRUS/master Added model command to create models from raw data	2017-12-07 08:54:26 +00:00
Matthew Honnibal	6373d2580d	Increment version to v2.0.5.dev0	2017-12-07 09:53:59 +01:00
Matthew Honnibal	36b47e3fa6	Fix (and test) vector pickling	2017-12-07 09:53:30 +01:00
Matthew Honnibal	05f41ff587	Set version to 2.0.4	2017-12-06 13:24:02 +01:00
Matthew Honnibal	04c38f7e87	Merge branch 'master' of https://github.com/explosion/spaCy	2017-12-06 12:15:52 +01:00
Matthew Honnibal	361944e512	If no rules are set, lemmatize by lookup	2017-12-06 12:12:11 +01:00
Matthew Honnibal	2ab0f2d186	Merge pull request #1664 from jimregan/italian-lemmatizer BOM in Italian lemmatiser	2017-12-06 11:09:04 +01:00
Matthew Honnibal	3f247119d3	Merge pull request #1668 from sorenlind/da_morph Add more Danish morph rules and clean up existing ones	2017-12-06 11:08:09 +01:00
Matthew Honnibal	b712de774e	Fix vectors pickling	2017-12-05 12:45:24 +01:00
Matthew Honnibal	04650e38c7	Set version to 2.0.4.dev0	2017-12-05 10:52:31 +01:00
Matthew Honnibal	07acb43a85	Merge branch 'master' of https://github.com/explosion/spaCy	2017-12-04 14:42:52 +01:00
Thomas Werkmeister	94eac75b7c	fix setup.py spacy req string for packaging Requirement should be `spacy>=2.0.2` instead of `spacy2.0.2`	2017-12-03 04:16:28 -06:00
ines	f2ea6d4713	Add Dutch example sentences (see #1107 )	2017-12-01 23:36:05 +01:00
Canbey Bilgili	abe098b255	Adds Turkish Lemmatization	2017-12-01 17:04:32 +03:00
Søren Lind Kristiansen	d86b537a38	Enable morph rules for Danish	2017-11-30 15:58:02 +01:00
Søren Lind Kristiansen	13a988adc3	Remove 'Number[psor]'	2017-11-30 15:55:04 +01:00
Søren Lind Kristiansen	dd6fde18a9	Add more Danish morph rules and clean up existing ones	2017-11-30 11:17:19 +01:00
Vadim Mazaev	495eacf470	Merge branch 'model_command'	2017-11-30 12:30:26 +03:00
Vadim Mazaev	4ba7ddf651	Bugfixes	2017-11-30 12:29:38 +03:00
Jim O'Regan	a4ecdeadd4	aha	2017-11-29 23:43:25 +00:00
Jim O'Regan	2c7a9215d7	Merge branch 'master' into animacy	2017-11-29 23:31:12 +00:00
Jim O'Regan	c3e6cee17a	use inan in polimorf tagset conversion	2017-11-29 23:15:47 +00:00
Jim O'Regan	b32575e78c	imports	2017-11-29 23:03:41 +00:00
Jim O'Regan	3696ce6a7b	add UD mapping	2017-11-29 22:59:19 +00:00
Jim O'Regan	f8e7082fe4	typo in "inan", add "nhum"	2017-11-29 22:40:47 +00:00
Matthew Honnibal	6bc0f4d29f	Merge pull request #1611 from fsonntag/master Solving #1494	2017-11-29 23:11:23 +01:00
Matthew Honnibal	f9ed9ea529	Merge pull request #1624 from GreenRiverRUS/russian Add support for Russian	2017-11-29 23:10:01 +01:00
Jim O'Regan	076a6fc60a	symbols	2017-11-29 20:11:20 +00:00
Jim O'Regan	834ba3c69a	(semi generated) Polimorf mapping	2017-11-29 20:08:24 +00:00
Jim O'Regan	ba6a23fd11	BOM in Italian lemmatiser	2017-11-29 17:40:07 +00:00
ines	a31506e060	Fix off-by-one error in nlp.add_pipe(after=name) (fixes #1654 )	2017-11-28 20:37:55 +01:00
ines	b62739fbfe	Add regression test for #1654	2017-11-28 20:27:54 +01:00
ines	2e50dbb9d7	Simplify test	2017-11-28 20:27:27 +01:00
Felix Sonntag	724ae7dc55	Fixed issue of infix capturing prefixes	2017-11-28 17:17:12 +01:00
Ines Montani	9052643e2c	Merge pull request #1653 from sorenlind/da_example_typo Fix typo	2017-11-27 14:47:42 +00:00
Søren Lind Kristiansen	5fe58b885b	Fix typo	2017-11-27 15:36:18 +01:00
Ines Montani	d52b1ab245	Add unicode_literals (hopefully fixes test failure on Python 2)	2017-11-27 15:16:54 +01:00
Søren Lind Kristiansen	0ffd27b0f6	Add several Danish alternative spellings	2017-11-27 13:35:41 +01:00
Ines Montani	6362024cf8	Merge pull request #1645 from GreenRiverRUS/fix_default_meta Fixed spaCy version string in default meta	2017-11-27 11:58:02 +00:00
Vadim Mazaev	c332ffdde1	Added model command to create model from raw data: words counts, brown clusters and vectors	2017-11-27 01:21:47 +03:00
Vadim Mazaev	59f03ab1d7	Fixed spacy version string in default meta	2017-11-26 23:02:07 +03:00
Vadim Mazaev	53e7c38637	Fixed tests depends on pymorphy2	2017-11-26 21:04:44 +03:00
Vadim Mazaev	cacd859dcd	Added tag map, fixed tests fails, added more exceptions	2017-11-26 20:54:48 +03:00
Ines Montani	a7bb8f1b42	Merge pull request #1637 from sorenlind/da_tokenization Improve Danish tokenization	2017-11-26 15:41:38 +00:00
ines	c699aec089	Add offsets_from_biluo_tags helper and tests (see #1626 )	2017-11-26 16:38:01 +01:00
Søren Lind Kristiansen	ef03e9ea53	Remove unused import.	2017-11-25 13:04:02 +01:00
Søren Lind Kristiansen	6aa241bcec	Add day of month tokenizer exceptions for Danish.	2017-11-24 15:03:24 +01:00
Søren Lind Kristiansen	0c276ed020	Add weekday abbreviations and remove ambiguous month abbreviations for Danish.	2017-11-24 14:43:29 +01:00
Søren Lind Kristiansen	056547e989	Add multiple tokenizer exceptions for Danish.	2017-11-24 11:51:26 +01:00
Søren Lind Kristiansen	8dc265ac0c	Add test for tokenization of 'i.' for Danish.	2017-11-24 11:29:37 +01:00
Søren Lind Kristiansen	ac8116510d	Fix tokenization of 'i.' for Danish.	2017-11-24 11:16:53 +01:00
Matthew Honnibal	79f11d4f85	Pickle vectors with vocab	2017-11-23 17:19:50 +01:00
Matthew Honnibal	f29c3925ee	Fix more efficient nonproj	2017-11-23 12:48:00 +00:00
Matthew Honnibal	e10e9ad2c5	Improve efficiency of Doc.to_array	2017-11-23 12:33:27 +00:00
Matthew Honnibal	2acc907d55	Improve profiling	2017-11-23 12:33:03 +00:00
Matthew Honnibal	fa62427300	Remove lookup-based lemmatization	2017-11-23 12:32:22 +00:00
Matthew Honnibal	fb26b2cb12	Use lookup lemmatizer if lemma unset	2017-11-23 12:31:58 +00:00
Matthew Honnibal	db5c714ad2	Improve efficiency of deprojectivization	2017-11-23 12:31:34 +00:00
Matthew Honnibal	8fec7268eb	Move string cleanup under a setting flag	2017-11-23 12:19:18 +00:00
Matthew Honnibal	5949777b12	Fix misleading multi-threading docstring	2017-11-23 12:18:59 +00:00
Matthew Honnibal	542e6fd4ea	Don't remove entries from specials	2017-11-23 12:17:42 +00:00
Matthew Honnibal	30ba81f881	Merge pull request #1576 from ligser/master Actually reset caches in pipe [wip]	2017-11-23 12:54:48 +01:00
ines	c90fe92e15	Fix displaCy test	2017-11-22 05:04:39 +01:00
ines	a6f33ac27d	Fix displaCy test	2017-11-22 04:19:28 +01:00
ines	93b0be611a	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-22 00:28:55 +01:00
ines	60b4915569	Use .pos_ instead of .tags_ in displaCy by default (see #1006 )	2017-11-22 00:28:52 +01:00
Vadim Mazaev	81314f8659	Fixed tokenizer: added char classes; added first lemmatizer and tokenizer tests	2017-11-21 22:23:59 +03:00
Vadim Mazaev	52ee1f9bf9	Updated Russian Language, added lemmatizer, norm exceptions and lex attrs	2017-11-21 11:44:46 +03:00
Burton DeWilde	a5c6869b2d	Fix bug where span.orth_ != span.text (see #1612 )	2017-11-20 12:05:43 -06:00
Burton DeWilde	635792997c	Add regression test for #1612	2017-11-20 12:05:35 -06:00
ines	9a63e32f21	Add noqa to Python 2 compat variables of built-ins (see #1617 )	2017-11-20 14:03:42 +01:00
ines	d70a64d78b	Fix syntax error and formatting in test (see #1617 )	2017-11-20 14:01:25 +01:00
ines	17849dee4b	Fix French test (see #1617 )	2017-11-20 13:59:59 +01:00
Felix Sonntag	33b0f86de3	Changed tokenizer to add infix when infix_start is offset	2017-11-19 16:32:10 +01:00
Felix Sonntag	8be3392302	Added regression test for 1494	2017-11-19 16:30:35 +01:00
Motoki Wu	a52e195a0a	Fixes Issue #1207 where `noun_chunks` of `Span` gives an error. Make sure to reference `self.doc` when getting the noun chunks. Same fix as `9750a0128c`	2017-11-17 17:16:20 -08:00
Motoki Wu	b818afaa0e	Added failing test for Issue #1207 . The noun chunk iterator should work for `Doc` but not for `Span`.	2017-11-17 17:04:27 -08:00
Vadim Mazaev	a0739a06d4	Returned russian support from v1.10 branch	2017-11-17 17:06:15 +03:00
yuukos	7401152289	updated Russian tokenizer moved the trying to import pymorph into __init__	2017-11-17 17:04:50 +03:00
yuukos	3aad66cf00	added russian language support	2017-11-17 17:04:22 +03:00
ines	a3d4dd1a5d	Test adding of lots of pipeline components (see #1585 ) Just to make sure that there's no error now or in the future with adding a large number of pipeline components.	2017-11-15 17:28:06 +01:00
Roman Domrachev	61d28d03e4	Try again to do selective remove cache	2017-11-15 19:11:12 +03:00
Roman Domrachev	b3311100c7	Merge branch 'master' of github.com:explosion/spaCy	2017-11-15 18:30:04 +03:00
Matthew Honnibal	b60d92aca8	Increment version	2017-11-15 16:14:46 +01:00
Roman Domrachev	505c6a2f2f	Completely clean up tokenizer cache Tokenizer cache can have different keys than string. That modification can slow down the tokenizer and needs to be measured	2017-11-15 17:55:48 +03:00
Matthew Honnibal	cf0be62096	Increment version	2017-11-15 15:00:18 +01:00
ines	97a4f9362b	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-15 14:24:00 +01:00
ines	8e65247886	Fix lex.id if vectors is None	2017-11-15 14:23:58 +01:00
Matthew Honnibal	437ad1a852	Merge pull request #1570 from explosion/feature/fix-beam-leak Fix memory leak in beam parser	2017-11-15 14:15:05 +01:00
Matthew Honnibal	2f169fdb0a	Set lex ID correctly for new tokens in Vocab	2017-11-15 13:58:03 +01:00
Matthew Honnibal	fe3c42a06b	Fix caching in tokenizer	2017-11-15 13:55:46 +01:00
Matthew Honnibal	8d692771f6	Improve profiling	2017-11-15 13:51:25 +01:00
Matthew Honnibal	b797dca977	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-15 13:11:43 +01:00
ines	c9d72de0fb	Add dummy serialization methods for Japanese and missing lang getter (resolves #1557 )	2017-11-15 12:44:02 +01:00
Matthew Honnibal	d274d3a3b9	Let beam forward use minibatches	2017-11-15 00:51:42 +01:00
Matthew Honnibal	855872f872	Remove state hashing	2017-11-14 23:36:46 +01:00
Roman Domrachev	3e21680814	Use safer method to get string without hit	2017-11-14 22:58:46 +03:00
Roman Domrachev	a33d5a068d	Try to hold origin data instead of restore it	2017-11-14 22:40:03 +03:00
Roman Domrachev	91e2fa6561	Clean all caches	2017-11-14 21:15:04 +03:00
Roman Domrachev	4e378dc4a4	Remove all obsolete code and test only initial problem	2017-11-14 20:45:04 +03:00
Roman	47ce2347b0	Create test that fails when actual cleanup caused	2017-11-14 20:28:13 +03:00
Roman	caae77f72d	Update strings.pyx	2017-11-14 19:44:40 +03:00
Roman Domrachev	3d247d2bb8	Get back previous testcase	2017-11-14 18:01:37 +03:00
Roman Domrachev	870defa815	Swap keys in proper place Remove unnecessary clear of the hits	2017-11-14 17:56:30 +03:00
Roman Domrachev	86ca434c93	Merge github.com:explosion/spaCy	2017-11-14 17:46:22 +03:00
Roman Domrachev	a2745b0e84	StringStore now actually cleaned Do not lose docs in ref tracking	2017-11-14 17:45:50 +03:00
Matthew Honnibal	2512ea9eeb	Fix memory leak in beam parser	2017-11-14 02:11:40 +01:00
Matthew Honnibal	86ddf692a1	Fix bug in limit calculation on dev data	2017-11-14 01:37:10 +01:00
Ines Montani	ea6c85c67a	Merge pull request #1566 from MathiasDesch/master (resolves #1248 ) Add exceptions to tokenizer and norm	2017-11-13 19:05:22 +01:00
Matthew Honnibal	1b348389bb	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-13 18:18:48 +01:00
Matthew Honnibal	ca73d0d8fe	Cleanup states after beam parsing, explicitly	2017-11-13 18:18:26 +01:00
Matthew Honnibal	63ef9a2e73	Remove __dealloc__ from ParserBeam	2017-11-13 18:18:08 +01:00
Mathias Deschamps	c0691b2ab4	Add tokenizer exceptions for ing verbs Extend list of tokenizing exceptions introduced in `123810b`	2017-11-13 17:46:05 +01:00
Mathias Deschamps	288298ead9	Add norm exception for ing verbs Some ing verbs are sometimes written in or in'. Make the NORM form correct	2017-11-13 17:46:05 +01:00
Abhinav Sharma	59f5740ede	improved upon the list of included stop_words	2017-11-13 17:13:49 +05:30
Matthew Honnibal	6e641f46d4	Create a preprocess function that gets bigrams	2017-11-12 00:43:41 +01:00
Matthew Honnibal	c9251d79e3	Edit comment	2017-11-11 18:38:32 +01:00
Matthew Honnibal	dd1678eab3	Edit comment	2017-11-11 18:37:08 +01:00
Roman Domrachev	ee60a52ee7	Fix test imports and last batch cleanup	2017-11-11 11:32:16 +03:00
Roman Domrachev	4a6b094e09	Remove unused import	2017-11-11 03:13:05 +03:00
Roman Domrachev	3c600adf23	Try to fix StringStore clean up (see #1506 )	2017-11-11 03:11:27 +03:00
ines	ee97fd3cb4	Add regression test for #1547	2017-11-11 00:14:03 +01:00
ines	2df27db671	Add unicode declaration	2017-11-11 00:13:56 +01:00
ines	35653bef3a	Add missing import (fixes #1546 )	2017-11-10 19:05:18 +01:00
ines	4c5d2c80d5	Re-add python -m to commands, too brittle :( (see #1536 )	2017-11-10 02:30:55 +01:00
ines	123810b6de	Add "lovin'" to tokenizer exceptions (see #1248 )	2017-11-09 17:09:30 +01:00
ines	1c218397f6	Ensure path in Doc.to_disk/from_disk (resolves #1521) Also add Doc serialization tests with both Path and string path options	2017-11-09 02:29:03 +01:00
Matthew Honnibal	49fd5a646f	Set version for 2.0.2 release	2017-11-08 22:39:39 +01:00
Matthew Honnibal	fba2dbddf7	Increment version	2017-11-08 22:19:08 +01:00
Matthew Honnibal	a5ea0fdf5a	Fix #1518 : vocab.vectors.resize() didn't work	2017-11-08 22:18:37 +01:00
Matthew Honnibal	de45702bbe	Strip dev suffixes from version for compatibility check	2017-11-08 18:40:21 +01:00
Matthew Honnibal	51639214a1	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-08 18:04:33 +01:00
Matthew Honnibal	a2f980de4e	Exclude .devN versioning from compatibility check	2017-11-08 18:03:52 +01:00
Daniel Hershcovich	d7ae54ff44	Fix typo in message	2017-11-08 16:06:28 +02:00
Matthew Honnibal	4194bc5744	Xfail flakey serialization test	2017-11-08 13:55:13 +01:00
Matthew Honnibal	d5537e5516	Work on Windows test failure	2017-11-08 13:25:18 +01:00
Matthew Honnibal	c27c82d5f9	Fix serialization	2017-11-08 13:08:48 +01:00
Matthew Honnibal	1d5599cd28	Fix dtype	2017-11-08 12:18:32 +01:00
Matthew Honnibal	fa7fdd0d9b	Merge branch 'master' of https://github.com/explosion/spaCy	2017-11-08 12:11:31 +01:00
Matthew Honnibal	072ff38a01	Try to fix python3.5 serialization	2017-11-08 12:10:49 +01:00
Ines Montani	3a0f34d567	Merge pull request #1509 from abhi18av/patch-1 Create examples.py for Hindi language	2017-11-08 11:37:19 +01:00
Ines Montani	42b241ccd0	Update language code in usage example in comment	2017-11-08 11:36:38 +01:00
Matthew Honnibal	e262e8d942	Increment version to v2.0.2.dev0	2017-11-08 11:25:47 +01:00
Matthew Honnibal	a8b592783b	Make a dtype more specific, to fix a windows build	2017-11-08 11:24:35 +01:00
Abhinav Sharma	84edade82d	Create examples.py Populated the file with the translations of English example sentences	2017-11-08 13:23:08 +05:30
Matthew Honnibal	d725aee4e2	Increment version to 2.0.1	2017-11-08 02:14:47 +01:00
Matthew Honnibal	8d6f68f1df	Increment version	2017-11-08 01:12:34 +01:00
ines	bcf42b8846	Fix typo	2017-11-08 01:06:37 +01:00
Matthew Honnibal	bbd2a3dee1	Fix title in about.py	2017-11-07 14:02:58 +01:00
Matthew Honnibal	4efaf9306c	Set version to spacy-nightly rc2	2017-11-07 13:27:26 +01:00
Matthew Honnibal	bf1ec2965f	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-07 13:20:29 +01:00
Matthew Honnibal	726f689da4	Fix missing import	2017-11-07 13:20:12 +01:00
ines	834f9c1aab	Update about.py	2017-11-07 13:11:33 +01:00
ines	a4662a31a9	Move model package templates to cli.package and update docs	2017-11-07 12:15:35 +01:00
ines	a09c096d3c	Get docs ready for v2.0.0	2017-11-07 12:00:43 +01:00
Matthew Honnibal	9a88e66103	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-07 02:00:06 +01:00
Matthew Honnibal	174abe4677	Increment to 2.0.0rc1	2017-11-07 01:59:46 +01:00
ines	42a0fbf291	Fix textcat simple train example	2017-11-07 01:25:54 +01:00
ines	8fb48b9b91	Update and document new util functions	2017-11-07 00:22:43 +01:00
Matthew Honnibal	1cab703bba	Move minibatch function to util	2017-11-06 23:45:36 +01:00
ines	5f43953536	Move test	2017-11-06 23:14:10 +01:00
Matthew Honnibal	dd90fe09f5	Remove extraneous label from textcat class	2017-11-06 22:09:02 +01:00
Matthew Honnibal	45e0617e61	Allow Language.update to take unicode text and dict objects	2017-11-06 22:07:38 +01:00
Matthew Honnibal	1831dbd065	Add test of simple textcat workflow	2017-11-06 22:04:29 +01:00
Matthew Honnibal	ffb9101f3f	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-06 19:20:41 +01:00
Matthew Honnibal	8fea512ac8	Don't set tensor in textcat	2017-11-06 19:20:14 +01:00
ines	acb9bdb852	Fix PRON_LEMMA imports	2017-11-06 17:41:53 +01:00
Matthew Honnibal	7d46793dd7	Add PRON_LEMMA to spacy.symbols	2017-11-06 17:38:25 +01:00
Matthew Honnibal	2f7e9f390d	Make test less flakey	2017-11-06 17:34:50 +01:00
Matthew Honnibal	407b08017e	Make test less flakey	2017-11-06 17:31:40 +01:00
Matthew Honnibal	102f797933	Fix lemma ordering in test	2017-11-06 17:02:17 +01:00
Matthew Honnibal	75e1618ec3	Fix lemma clobbering	2017-11-06 16:56:19 +01:00
Matthew Honnibal	6fdffd7246	Merge pull request #1497 from explosion/feature/improve-optimizer-handling 💫 Improve optimizer handling	2017-11-06 16:41:15 +01:00
Matthew Honnibal	8e6795437b	Set release=True	2017-11-06 16:39:32 +01:00
Matthew Honnibal	5c85bf3791	Fix missing import	2017-11-06 15:06:27 +01:00
Matthew Honnibal	25859dbb48	Return optimizer from begin_training, creating if necessary	2017-11-06 14:26:49 +01:00
Matthew Honnibal	465adfee94	Remove unused resume_training method, and pass optimizer through	2017-11-06 14:26:00 +01:00
Matthew Honnibal	13336a6197	Fix Adam import	2017-11-06 14:25:37 +01:00
Matthew Honnibal	2eb11d60f2	Add function create_default_optimizer to spacy._ml	2017-11-06 14:11:59 +01:00
Matthew Honnibal	31babe3c3f	Fix non-clobbering lemmatization	2017-11-06 12:36:05 +01:00
Matthew Honnibal	63c6ae4191	Fix lemmatizer test	2017-11-06 11:57:06 +01:00
Matthew Honnibal	a86a0181b5	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-05 22:19:10 +01:00
Matthew Honnibal	134d3b8143	Fix morphology	2017-11-05 22:18:22 +01:00
ines	08d1cf850a	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-05 21:41:58 +01:00
ines	baa231745c	Fix Dutch tag map	2017-11-05 21:41:50 +01:00
Matthew Honnibal	46e62ad747	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-05 19:40:00 +01:00
Matthew Honnibal	bb25cb0f76	Avoid clobbering preset lemmas	2017-11-05 19:39:38 +01:00
ines	507ecb67af	Fix Spanish tag map	2017-11-05 19:23:34 +01:00
Matthew Honnibal	320008352b	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-05 18:46:15 +01:00
Matthew Honnibal	38109a0e4a	Register SentenceSegmenter in Language.factories	2017-11-05 18:45:57 +01:00
ines	975e1042ff	Fix Italian tag map	2017-11-05 18:34:09 +01:00
ines	6b2d6e4937	Fix Portuguese tag map	2017-11-05 18:31:00 +01:00
ines	fa2687fded	Fix Dutch tag map	2017-11-05 17:57:59 +01:00
ines	fb8990d916	Fix Spanish tag map	2017-11-05 17:48:46 +01:00
ines	9d13288f73	Fix French tag map	2017-11-05 17:47:59 +01:00
ines	54579805c5	Fix French tag map	2017-11-05 17:44:05 +01:00
Matthew Honnibal	2b35bb76ad	Fix tensorizer on GPU	2017-11-05 15:34:40 +01:00
Matthew Honnibal	6e5181bbaa	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-05 15:33:56 +01:00
Matthew Honnibal	6f438b17c1	Increment version to v2.0.0a19	2017-11-05 14:43:36 +01:00
Matthew Honnibal	225cc249c9	Pass string path to numpy, to fix #1479	2017-11-05 14:42:46 +01:00
Matthew Honnibal	00435d8f0c	Add extra beam parsing test	2017-11-05 14:39:57 +01:00
Matthew Honnibal	e777ea25bb	Merge pull request #1492 from uwol/develop TextCategorizer return parameter fix	2017-11-05 14:13:04 +01:00
Matthew Honnibal	0d4bd6414e	Fix Italian tag map	2017-11-05 14:11:03 +01:00
ines	ef597622a6	Add Portuguese tag map	2017-11-05 13:58:34 +01:00
ines	793c62dfda	Add Dutch tag map	2017-11-05 13:48:07 +01:00
ines	f7485a09c8	Fix Italian tag map	2017-11-05 13:12:58 +01:00
uwol	a2162b8908	tensorizer return parameter fix	2017-11-05 12:25:10 +01:00
ines	0a27afbf86	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-04 23:32:52 +01:00
ines	3cef901834	Add tag map for French and Italian	2017-11-04 23:32:51 +01:00
Matthew Honnibal	cfb83c231c	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-04 23:08:19 +01:00
Matthew Honnibal	d185927998	Undo harmful pickling hacks on Language class	2017-11-04 23:07:03 +01:00
ines	6c15aafebd	Fix formatting	2017-11-04 23:07:02 +01:00
Matthew Honnibal	3ca16ddbd4	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-04 00:25:02 +01:00
Matthew Honnibal	e4ec4be948	Fix parser test	2017-11-04 00:23:45 +01:00
Matthew Honnibal	98c29b7912	Add padding vector in parser, to make gradient more correct	2017-11-04 00:23:23 +01:00
ines	5e7d98f72a	Remove test for #1491	2017-11-03 22:10:57 +01:00
ines	718f1c50fb	Add regression test for #1491	2017-11-03 21:11:20 +01:00
Matthew Honnibal	144a93c2a5	Back-off to tensor for similarity if no vectors	2017-11-03 20:56:33 +01:00
Matthew Honnibal	1e9634691a	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-03 20:21:15 +01:00
Matthew Honnibal	13c8881d2f	Expose parser's tok2vec model component	2017-11-03 20:20:59 +01:00
Matthew Honnibal	17c63906f9	Update tensorizer component	2017-11-03 20:20:26 +01:00
Matthew Honnibal	2bf21cbe29	Update model after optimising it instead of waiting	2017-11-03 20:20:01 +01:00
Matthew Honnibal	d6e831bf89	Fix lemmatizer tests	2017-11-03 19:46:34 +01:00
ines	eef930c73e	Assert instead of print	2017-11-03 18:50:57 +01:00
ines	f0986df94b	Add test for #1488 (passes on v2.0.0a18?)	2017-11-03 14:44:36 +01:00
Matthew Honnibal	711278b667	Make test less flakey	2017-11-03 14:36:08 +01:00
Matthew Honnibal	7fea845374	Remove print statement	2017-11-03 14:04:51 +01:00
Matthew Honnibal	0a534ae96a	Fix test for backprop d_pad	2017-11-03 14:04:16 +01:00
Matthew Honnibal	33bd2428db	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-03 13:29:56 +01:00
Matthew Honnibal	6681058abd	Fix tensor extending in tagger	2017-11-03 13:29:36 +01:00
Matthew Honnibal	bd2cbdfa85	Make Morphology not fail on unknown tags	2017-11-03 13:29:09 +01:00
Matthew Honnibal	c9b118a7e9	Set softmax attr in tagger model	2017-11-03 11:22:01 +01:00
Matthew Honnibal	a5b05f85f0	Set Doc.tensor attribute in parser	2017-11-03 11:21:00 +01:00
Matthew Honnibal	62ed58935a	Add Doc.extend_tensor() method	2017-11-03 11:20:31 +01:00
Matthew Honnibal	d6fc39c8a6	Set Doc.tensor from Tagger	2017-11-03 11:20:05 +01:00
Matthew Honnibal	b3264aa5f0	Expose the softmax layer in the tagger model, to allow setting tensors	2017-11-03 11:19:51 +01:00
Matthew Honnibal	c2bbf076a4	Add document length cap for training	2017-11-03 01:54:54 +01:00
Matthew Honnibal	6771780d3f	Fix backprop of padding variable	2017-11-03 01:54:34 +01:00
Matthew Honnibal	54a716f2ec	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-03 00:55:20 +01:00
Matthew Honnibal	260e6ee3fb	Improve efficiency of backprop of padding variable	2017-11-03 00:49:11 +01:00
Matthew Honnibal	a22f96c3f1	Add test for backpropagating padding	2017-11-03 00:48:54 +01:00
ines	9baab241b4	Add skeleton language data for Turkish	2017-11-02 16:32:24 +01:00
ines	c6fea3e5f6	Add Romanian and Croatian skeletons (experimental) Add language data templates to make it easier for others to contribute to the language support	2017-11-01 23:04:28 +01:00
ines	18c859500b	Add missing imports	2017-11-01 23:02:51 +01:00
ines	819e30a26e	Tidy up tokenizer exceptions	2017-11-01 23:02:45 +01:00
ines	3af281a334	Update test model name	2017-11-01 23:02:00 +01:00
Matthew Honnibal	b30dd36179	Allow Tagger.add_label() before training	2017-11-01 21:49:24 +01:00
Matthew Honnibal	eca41f0cf6	Fix filename conversion for conllu	2017-11-01 21:26:49 +01:00
Matthew Honnibal	e237472cdc	Fix tag and filename conversion for conllu	2017-11-01 21:25:33 +01:00
Matthew Honnibal	b84d99b281	Revert tagger.add_label() changes, to fix model	2017-11-01 21:10:45 +01:00
Matthew Honnibal	f5855e539b	Fix tagger model loading	2017-11-01 20:42:36 +01:00
Matthew Honnibal	624644adfe	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 20:26:41 +01:00
ines	5f661a1b3a	Remove tensorizer from pre-set pipe_names	2017-11-01 19:48:33 +01:00
Matthew Honnibal	190522efd3	Fix tagger when some tags aren't in Morphology	2017-11-01 19:27:49 +01:00
Matthew Honnibal	e85e31cfbd	Fix backprop of d_pad	2017-11-01 19:27:26 +01:00
Matthew Honnibal	759cc79185	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 19:00:19 +01:00
Matthew Honnibal	1ae40b50b4	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 17:07:02 +01:00
Matthew Honnibal	7ae1aacdb8	Fix add_label methods	2017-11-01 17:06:43 +01:00
ines	8c2260e18c	Move span tests to /doc	2017-11-01 16:56:35 +01:00
Matthew Honnibal	2ef7b59eb0	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 16:51:41 +01:00
ines	1d1f91a041	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 16:49:44 +01:00
ines	9659391944	Update deprecated methods and add warnings	2017-11-01 16:49:42 +01:00
ines	260cb37224	Catch deprecation warning	2017-11-01 16:49:18 +01:00
ines	5914faafbb	Fix .merge tests to not use deprecated API	2017-11-01 16:49:11 +01:00
ines	705a4e3e4a	Fix formatting	2017-11-01 16:44:08 +01:00
Matthew Honnibal	d17a12c71d	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 16:38:26 +01:00
Matthew Honnibal	9f9439667b	Don't create low-data text classifier if no vectors	2017-11-01 16:34:09 +01:00
Matthew Honnibal	e7a9174877	Add add_label methods to Tagger and TextCategorizer	2017-11-01 16:32:44 +01:00
ines	39e0586192	Add deprecated helper Uses warning to show DeprecationWarning and custom stack trace	2017-11-01 16:32:36 +01:00
Matthew Honnibal	a7bf38bf31	Remove misleading comment on util.get_cuda_stream()	2017-11-01 13:57:25 +01:00
Matthew Honnibal	273e96b63f	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 13:27:35 +01:00
Matthew Honnibal	9e0ebee81c	Add Token.is_sent_start property, so can deprecate Token.sent_start	2017-11-01 13:27:14 +01:00
Matthew Honnibal	7e7116cdf7	Fix Doc.to_array when only one string attr provided	2017-11-01 13:26:43 +01:00
Matthew Honnibal	301fb2bb60	Implement Span.n_lefts and Span.n_rights	2017-11-01 13:25:12 +01:00
Matthew Honnibal	c047498f87	Fix vectors test	2017-11-01 13:24:47 +01:00
ines	9a5e7c6fe2	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 13:14:45 +01:00
ines	bfe17b7df1	Fix begin_training if get_gold_tuples is None	2017-11-01 13:14:31 +01:00
ines	affd3404ab	Remove old model command (now "vocab")	2017-11-01 13:14:03 +01:00
Matthew Honnibal	fdb4b8e456	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 02:07:17 +01:00
Matthew Honnibal	c48dd0e1d3	Fix vector pruning	2017-11-01 02:06:58 +01:00
ines	37e62ab0e2	Update vector meta in meta.json	2017-11-01 01:25:09 +01:00
ines	96b4aef0bf	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 01:10:53 +01:00
Matthew Honnibal	86eba61fae	Fix token.vector when vectors are missing	2017-11-01 00:47:35 +01:00
ines	5683fd65ed	Update docstrings	2017-11-01 00:42:39 +01:00
Matthew Honnibal	44bce8e53f	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 00:35:16 +01:00
Matthew Honnibal	c16310d156	Update vectors with find method	2017-11-01 00:34:55 +01:00
Ines Montani	d11659463b	Merge pull request #1152 from jimregan/develop-irish [WIP] attempt a port from #1147	2017-11-01 00:23:43 +01:00
ines	2ad2f09d12	Update docstrings and simplify most_similar	2017-11-01 00:18:08 +01:00
Jim O'Regan	08b0bfd153	merge	2017-10-31 22:55:59 +00:00
Jim O'Regan	00ecfa5417	Ó, not O	2017-10-31 22:54:42 +00:00
ines	ba2e6c8c6f	Update docstrings and formatting	2017-10-31 23:23:34 +01:00
Matthew Honnibal	0de8d213a3	Merge pull request #1475 from explosion/feature/sm-vectors Improve and simplify Vectors class	2017-10-31 22:59:50 +01:00
Ines Montani	25b1d6cd91	Fix syntax error	2017-10-31 22:36:03 +01:00
Matthew Honnibal	92dc127569	Fix test for Python 3	2017-10-31 22:21:55 +01:00
Jim O'Regan	fe4b10346a	replace example sentence until I get around to adding a punctuation.py	2017-10-31 20:24:53 +00:00
Matthew Honnibal	c5799ecc7b	Remove print statement	2017-10-31 21:12:33 +01:00
ines	7e424a1804	Don't copy exception dicts if not necessary and tidy up	2017-10-31 21:05:29 +01:00
Matthew Honnibal	c390f2d745	Make it easier to pass explicit no-pruning to vocab	2017-10-31 20:14:47 +01:00
Ines Montani	06c25a8882	Remove comma that caused list to wrap in tuple! Also removed extra dict wrappings for performance (we used to have them in there, but they should only really exist if copying the dict is absolutely necessary)	2017-10-31 20:13:16 +01:00
Matthew Honnibal	d90a22afe6	Fix loading previous vectors models	2017-10-31 19:58:35 +01:00
Ines Montani	147448b65b	Add missing symbols	2017-10-31 19:34:45 +01:00
Matthew Honnibal	997a61557a	Add vectors.n_keys property	2017-10-31 19:30:52 +01:00
Matthew Honnibal	8075726838	Restore vector usage in models	2017-10-31 19:21:17 +01:00
Matthew Honnibal	3659a807b0	Remove vector pruning arg from train CLI	2017-10-31 19:21:05 +01:00
Ines Montani	9b0de9fb43	Fix import of symbols (now nested one level lower)	2017-10-31 19:17:58 +01:00
Matthew Honnibal	59203a2e8a	Move vector pruning command into spacy vocab cli tool	2017-10-31 19:10:01 +01:00
Matthew Honnibal	77d8f5de9a	Revise and simplify Vectors class	2017-10-31 18:25:08 +01:00
Jim O'Regan	d4a8160c36	change quotes	2017-10-31 15:15:44 +00:00
Jim O'Regan	34ca59691b	no idea what is wrong here	2017-10-31 14:50:13 +00:00
Jim O'Regan	41dd29e48e	merge	2017-10-31 14:07:45 +00:00
Matthew Honnibal	cb5217012f	Fix vector remapping	2017-10-31 11:40:46 +01:00
Matthew Honnibal	9c11ee4a1c	WIP on vectors fixes	2017-10-31 11:22:56 +01:00
Matthew Honnibal	ce876c551e	Fix GPU usage	2017-10-31 02:33:34 +01:00
Matthew Honnibal	7698903617	Fix GPU usage	2017-10-31 02:33:16 +01:00
Matthew Honnibal	368fdb389a	WIP on refactoring and fixing vectors	2017-10-31 02:00:26 +01:00
Matthew Honnibal	4e3006cec7	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-30 19:44:58 +01:00
Matthew Honnibal	4112a991ec	Fix vector pruning	2017-10-30 19:44:40 +01:00
ines	ec657c1ddc	Update vocab docs and document Vocab.prune_vectors	2017-10-30 19:35:41 +01:00
ines	803e41bc66	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-30 18:39:51 +01:00
ines	8e02294241	Add vectors to Language.meta	2017-10-30 18:39:48 +01:00
ines	abf8aa05d3	Populate --create-meta defaults from file if available If meta.json is found in directory and user chooses to overwrite it, show existing data as defaults.	2017-10-30 18:39:38 +01:00
ines	ce98fa7934	Fix formatting	2017-10-30 18:38:55 +01:00
ines	98c35d2585	Fix spacy vocab command	2017-10-30 18:38:41 +01:00
Matthew Honnibal	e98451b5f7	Add -prune-vectors argument to spacy.cli.train	2017-10-30 18:00:10 +01:00
Matthew Honnibal	e026b29ea9	Add prune_vectors method to Vocab	2017-10-30 17:59:43 +01:00
Explosion Bot	d0cf12c8c7	Fix off-by-one error in vectors	2017-10-30 16:22:03 +01:00
Explosion Bot	05a1dd570e	Fix vocab script	2017-10-30 16:19:22 +01:00
Explosion Bot	b46bdce8d2	Add missing import	2017-10-30 16:18:10 +01:00
Explosion Bot	2d2cc294b4	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-30 16:15:05 +01:00
Explosion Bot	0fc1209421	Wire up new vocab command	2017-10-30 16:14:50 +01:00
Explosion Bot	aa64031751	Fix clear_vectors() method on Vocab	2017-10-30 16:09:04 +01:00
Explosion Bot	7b56b2f04b	Add Vocab.cfg attr, to hold stuff like oov probs	2017-10-30 16:08:50 +01:00
Explosion Bot	ab5d5ed880	Fix vectors.add()	2017-10-30 16:08:09 +01:00
Explosion Bot	41d0f1665a	Fix add_attrs for cluster	2017-10-30 16:07:50 +01:00
ines	5453821a9f	Update NER annotation scheme Add note on training data sources and include coarse-grained Wikipedia scheme	2017-10-30 13:53:49 +01:00
Explosion Bot	5ede7cec9b	Improve Lexeme.set_attrs method	2017-10-30 11:49:11 +01:00
Explosion Bot	72aea8f105	Update vectors.add() to allow setting keys to rows	2017-10-30 10:03:08 +01:00
Matthew Honnibal	c43cc5361d	Merge pull request #1467 from explosion/feature/better-parser 💫 Bug fixes to parser model (requires retraining)	2017-10-29 02:05:22 +02:00
ines	6c2d8d3b2a	Use shortcuts-nightly.json to resolve model shortcuts	2017-10-29 01:28:31 +02:00
Matthew Honnibal	a0c7dabb72	Fix bug in 8-token parser features	2017-10-28 23:01:35 +00:00
Matthew Honnibal	b713d10d97	Switch to 13 features in parser	2017-10-28 23:01:14 +00:00
Matthew Honnibal	3b91097321	Whitespace	2017-10-28 17:05:11 +00:00
Matthew Honnibal	6ef72864fa	Improve initialization for hidden layers	2017-10-28 17:05:01 +00:00
Matthew Honnibal	5414e2f14b	Use missing features in parser	2017-10-28 16:45:54 +00:00
Matthew Honnibal	df4803cc6d	Add learned missing values for parser	2017-10-28 16:45:14 +00:00
Matthew Honnibal	64e4ff7c4b	Merge 'tidy-up' changes into branch. Resolve conflicts	2017-10-28 13:16:06 +02:00
Explosion Bot	fb0c96f39a	Fix optimizer loading	2017-10-28 11:58:16 +02:00
Explosion Bot	b22e42af7f	Merge changes to parser and _ml	2017-10-28 11:52:10 +02:00
ines	d96e72f656	Tidy up rest	2017-10-27 21:07:59 +02:00
ines	a8e10f94e4	Tidy up Lexeme and update docs	2017-10-27 21:07:50 +02:00
ines	ba5e646219	Tidy up pipeline	2017-10-27 20:29:08 +02:00
ines	b4d226a3f1	Tidy up syntax	2017-10-27 19:45:57 +02:00
ines	5167a0cce2	Tidy up Vectors and docs	2017-10-27 19:45:19 +02:00
ines	7946464742	Remove spacy.tagger (now in pipeline)	2017-10-27 19:45:04 +02:00
ines	9c89e2cdef	Remove unused syntax iterators (now in language data)	2017-10-27 18:09:53 +02:00
ines	d2df81d907	Fix not implemented Span getters	2017-10-27 18:09:28 +02:00
ines	544a407b93	Tidy up Doc, Token and Span and add missing docs	2017-10-27 17:07:26 +02:00
ines	a6135336f5	Tidy up gold	2017-10-27 17:02:55 +02:00
ines	6a0483b7aa	Tidy up and document Doc, Token and Span	2017-10-27 15:41:45 +02:00
ines	1a559d4c95	Remove old, unused file	2017-10-27 15:34:35 +02:00
ines	91899d337b	Tidy up language, lemmatizer and scorer	2017-10-27 14:40:14 +02:00
ines	778212efea	Tidy up init and main	2017-10-27 14:39:51 +02:00
ines	e33b7e0b3c	Tidy up parser and ML	2017-10-27 14:39:30 +02:00
ines	e3265998c0	Tidy up displaCy	2017-10-27 14:39:19 +02:00
ines	ea4a41c8fb	Tidy up util and helpers	2017-10-27 14:39:09 +02:00
ines	d941fc3667	Tidy up CLI	2017-10-27 14:38:39 +02:00
Matthew Honnibal	531142a933	Merge remote-tracking branch 'origin/develop' into feature/better-parser	2017-10-27 12:34:48 +00:00
Matthew Honnibal	19a2b9bf27	Fix import of Optimizer	2017-10-27 12:33:42 +00:00
Matthew Honnibal	4d048e94d3	Add compat for thinc.neural.optimizers.Optimizer	2017-10-27 10:23:49 +00:00
Ines Montani	4033e70c71	Merge pull request #1461 from explosion/feature/disable-pipes 💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples	2017-10-27 12:21:40 +02:00
Matthew Honnibal	75a637fa43	Remove redundant imports from _ml	2017-10-27 10:19:56 +00:00
Matthew Honnibal	c9987cf131	Avoid use of numpy.tensordot	2017-10-27 10:18:36 +00:00
Matthew Honnibal	f6fef30adc	Remove dead code from spacy._ml	2017-10-27 10:16:41 +00:00
Matthew Honnibal	b9616419e1	Add try/except around bz2 import	2017-10-27 01:18:05 +00:00
Matthew Honnibal	783c0c8795	Remove unnecessary bz2 import	2017-10-27 01:17:54 +00:00
Matthew Honnibal	bb25bdcd92	Adjust call to scatter_add for the new version	2017-10-27 01:16:55 +00:00
Ines Montani	287a3ca256	Merge pull request #1466 from explosion/feature/rename-pipeline 💫 Clean up dead linear model code	2017-10-27 02:03:28 +02:00
ines	4eb5bd02e7	Update textcat pre-processing after to_array change	2017-10-27 00:32:12 +02:00
ines	2d6ec99884	Set 'model' as default model name to prevent meta.json errors	2017-10-26 16:12:23 +02:00
ines	9e372913e0	Remove old 'SP' condition in tag map	2017-10-26 16:11:57 +02:00
Matthew Honnibal	c52671420c	Remove old cfile import	2017-10-26 13:28:19 +02:00
Matthew Honnibal	ea03f1ef64	Remove obsolete cfile code	2017-10-26 13:23:36 +02:00
Matthew Honnibal	90d1d9b230	Remove obsolete parser code	2017-10-26 13:22:45 +02:00
ines	6f78e29bed	Add LAW entity label to glossary	2017-10-26 13:04:35 +02:00
ines	9bf78d5fb3	Update spacy.explain docs	2017-10-26 13:04:25 +02:00
Matthew Honnibal	33f8c58782	Remove obsolete parser.pyx	2017-10-26 12:42:05 +02:00
Matthew Honnibal	a8abc47811	Rename BaseThincComponent --> Pipe	2017-10-26 12:40:40 +02:00
Matthew Honnibal	b0f3ea2200	Fix names of pipeline components NeuralDependencyParser --> DependencyParser NeuralEntityRecognizer --> EntityRecognizer TokenVectorEncoder --> Tensorizer NeuralLabeller --> MultitaskObjective	2017-10-26 12:38:23 +02:00
Matthew Honnibal	b6b4f1aaf7	Merge pull request #1462 from explosion/feature/vector-meta-data 💫 Add vector meta data to model meta.json on train/package and show in docs	2017-10-26 11:39:41 +02:00
Matthew Honnibal	35977bdbb9	Update better-parser branch with develop	2017-10-26 00:55:53 +00:00
Ines Montani	090bd00369	Merge pull request #1464 from mayukh18/develop_bengali_pronouns added the bengali pronouns for v2.0	2017-10-25 21:55:25 +02:00
mayukh18	1bc07758fa	added few bengali pronouns	2017-10-25 22:24:40 +05:30
ines	de1e5f35d5	Merge branch 'develop' into feature/disable-pipes	2017-10-25 16:33:12 +02:00
ines	728b609bf9	Merge branch 'develop' into feature/vector-meta-data	2017-10-25 16:32:22 +02:00
ines	c0b55ebdac	Fix PhraseMatcher.__contains__ and add more tests	2017-10-25 16:31:11 +02:00
ines	91beacf5e3	Fix Matcher.__contains__	2017-10-25 16:19:38 +02:00
ines	11e3f19764	Fix vectors data added after training (see #1457 )	2017-10-25 16:08:26 +02:00
ines	057954695b	Read pipeline and vector data off model in --generate-meta	2017-10-25 16:03:26 +02:00
ines	273e638183	Add vector data to model meta after training (see #1457 )	2017-10-25 16:03:05 +02:00
ines	18aae423fb	Remove import of non-existing function	2017-10-25 15:54:10 +02:00
ines	5117a7d24d	Fix whitespace	2017-10-25 15:54:02 +02:00
ines	657a4d91bc	Merge branch 'develop' into feature/disable-pipes	2017-10-25 15:19:05 +02:00
ines	1a722dac31	Merge branch 'develop' into feature/disable-pipes	2017-10-25 15:18:18 +02:00
ines	6a00de4f77	Fix check of unexpected pipe names in restore()	2017-10-25 14:56:35 +02:00
ines	7f03932477	Return self on __enter__	2017-10-25 14:56:16 +02:00
Matthew Honnibal	b5de768852	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-25 14:44:16 +02:00
Matthew Honnibal	094512fd47	Fix model-mark on regression test.	2017-10-25 14:44:00 +02:00
Matthew Honnibal	e70f80f29e	Add Language.disable_pipes()	2017-10-25 13:46:41 +02:00
Matthew Honnibal	075e8118ea	Update from develop	2017-10-25 12:45:21 +02:00
ines	72497c8cb2	Remove comments and add TODO	2017-10-25 12:15:43 +02:00
ines	4d97efc3b5	Add missing docstrings	2017-10-25 12:10:16 +02:00
ines	1262aa0bf9	Implement PhraseMatcher.__contains__	2017-10-25 12:10:04 +02:00
ines	9c733a8849	Implement PhraseMatcher.__len__	2017-10-25 12:09:56 +02:00
ines	7eebeeaf85	Fix Matcher.__contains__	2017-10-25 12:09:47 +02:00
ines	7bcec57462	Remove unused attribute	2017-10-25 12:08:54 +02:00
ines	0b1dcbac14	Remove unused function	2017-10-25 12:08:46 +02:00
ines	3484174e48	Add Language.path	2017-10-25 11:57:43 +02:00
Ines Montani	d3bf488e16	Merge pull request #1171 from mollerhoj/support-danish Improve basic support for Danish	2017-10-24 20:29:57 +02:00
Matthew Honnibal	d9bb1e5de8	Increment version	2017-10-24 17:06:19 +02:00
Matthew Honnibal	908809d488	Update tests	2017-10-24 17:05:15 +02:00
Matthew Honnibal	66766c1454	Restore SP tag to English tag_map, until models migrate	2017-10-24 17:05:00 +02:00
Matthew Honnibal	30e67fa808	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-24 16:08:23 +02:00
Matthew Honnibal	b0f6fd3f1d	Disable tokenizer cache for special-cases. Fixes #1250	2017-10-24 16:08:05 +02:00
Matthew Honnibal	63f0bde749	Add test for #1250 : Tokenizer cache clobbered special-case attrs	2017-10-24 16:07:18 +02:00
ines	8492d5be6d	Always make lemmatizer return a list of lemmas, not a set	2017-10-24 16:00:56 +02:00
ines	95f866f99f	Add lookup argument to Lemmatizer.load	2017-10-24 16:00:56 +02:00
ines	95f6174516	Remove tensorizer from model pipeline example in spacy package	2017-10-24 16:00:56 +02:00
ines	090aed940a	Add test for currently failing span.as_doc case	2017-10-24 16:00:56 +02:00
ines	4ef81a9ebc	Fix whitespace	2017-10-24 16:00:56 +02:00
Matthew Honnibal	18f1c1d0ba	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-24 14:29:43 +02:00
Matthew Honnibal	4bea65a1a8	Fix Issue #1450 : Off-by-1 in * and ? matches Patterns that end in variable-length operators e.g. * and ? now end on the correct token. Previously, they were off by 1: the next token was pulled into the match, even if that's where the pattern failed.	2017-10-24 14:26:27 +02:00
Matthew Honnibal	391d5ef0d1	Normalize imports in regression test	2017-10-24 14:25:49 +02:00
ines	c55db0a4a1	Add example sentences for Japanese and Chinese (see #1107 )	2017-10-24 13:02:24 +02:00
ines	66f8f9d4a0	Fix Japanese tokenizer JapaneseTokenizer now returns a Doc, not individual words	2017-10-24 13:02:19 +02:00
Matthew Honnibal	dd5b2d8fa3	Check for out-of-memory when calling calloc. Closes #1446	2017-10-24 12:40:47 +02:00
Matthew Honnibal	b66b8f028b	Fix #1375 -- out-of-bounds on token.nbor()	2017-10-24 12:10:39 +02:00
Matthew Honnibal	a68d89a4f3	Add failing test for bug #1375 -- no out-of-bounds error for token.nbor()	2017-10-24 12:05:25 +02:00
Ines Montani	facf77e541	Merge branch 'develop' into support-danish	2017-10-24 11:53:19 +02:00
Matthew Honnibal	ccd2ab1a62	Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix Add LCA matrix for spans and docs	2017-10-24 11:22:46 +02:00
Matthew Honnibal	ef3e5a361b	Merge pull request #1442 from explosion/feature/fix-sp 💫 Fix SP tag, tweak Vectors.__init__, fix Morphology	2017-10-24 10:24:07 +02:00
Matthew Honnibal	fdf25d10ba	Merge pull request #1440 from ramananbalakrishnan/develop Support single value for attribute list in doc.to_array	2017-10-24 10:23:12 +02:00
Matthew Honnibal	e7556ff048	Fix non-maxout parser	2017-10-23 18:16:23 +02:00
ines	a31f048b4d	Fix formatting	2017-10-23 10:38:06 +02:00
Matthew Honnibal	490ad3eaf0	Check that empty strings are handled. Closes #1242	2017-10-21 00:52:14 +02:00
Matthew Honnibal	8f8bccecb9	Patch deserialisation for invalid loads, to avoid model failure	2017-10-21 00:51:42 +02:00
Ramanan Balakrishnan	d2fe56a577	Add LCA matrix for spans and docs	2017-10-20 23:58:00 +05:30
Matthew Honnibal	d8391b1c4d	Fix #1434 : Matcher failed on ending ? if no token	2017-10-20 16:49:36 +02:00
Matthew Honnibal	fec53f09f7	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-20 16:28:34 +02:00
Matthew Honnibal	f111b228e0	Fix re-parsing of previously parsed text If a Doc object had been previously parsed, it was possible for invalid parses to be added. There were two problems: 1) The parse was only being partially erased 2) The RightArc action was able to create a 1-cycle. This patch fixes both errors, and avoids resetting the parse if one is present. In theory this might allow a better parse to be predicted by running the parser twice. Closes #1253.	2017-10-20 16:27:36 +02:00
Matthew Honnibal	1036798155	Make parser consistent if maxout==1	2017-10-20 16:24:16 +02:00
Matthew Honnibal	3faf9189a2	Make parser hidden shape consistent even if maxout==1	2017-10-20 16:23:31 +02:00
Matthew Honnibal	9010a1a060	Create vectors correctly	2017-10-20 14:19:46 +02:00
Matthew Honnibal	33229b1c9e	Remove print statement	2017-10-20 14:19:29 +02:00
Matthew Honnibal	cfae54c507	Make change to Vectors.__init__	2017-10-20 14:19:04 +02:00
Matthew Honnibal	ebecaddb76	Make 'data_or_width' two keyword args in Vectors.__init__ Previously the data and width options were one argument in Vectors, which meant you couldn't say vectors = Vectors(strings, width=300). It's better to have two keywords.	2017-10-20 14:17:15 +02:00
Matthew Honnibal	49895fbef6	Rename 'SP' special tag to '_SP' Renaming the tag with an underscore lets us add it to the tag map without worrying that we'll change the sequence of tags, which throws off the tag-to-ID mapping. For instance, if we inserted a 'SP' tag, the "VERB" tag is pushed to a different class ID, and the model is all messed up.	2017-10-20 14:01:12 +02:00
Matthew Honnibal	506cf2eb13	Remove cpdef enum, to avoid too much code generation	2017-10-20 14:00:23 +02:00
Matthew Honnibal	6218af0105	Remove cpdef enum, to avoid too much code generation	2017-10-20 13:59:57 +02:00
Matthew Honnibal	92ac9316b5	Fix initialization of vectors, to address serialization problem	2017-10-20 13:59:24 +02:00
Ramanan Balakrishnan	0726946563	cleanup to_array implementation using fixes on master	2017-10-20 17:09:37 +05:30
ines	108f1f786e	Update symbols and document missing token attributes (see #1439 )	2017-10-20 13:08:44 +02:00
ines	4acab77a8a	Add missing symbol for LAW entities (resolves #1427 )	2017-10-20 13:07:57 +02:00
Matthew Honnibal	b101736555	Fix precomputed layer	2017-10-20 12:14:52 +02:00
Ramanan Balakrishnan	b3ab124fc5	Support strings for attribute list in doc.to_array	2017-10-20 11:46:57 +05:30
Matthew Honnibal	64658e02e5	Implement fancier initialisation for precomputed layer	2017-10-20 03:07:45 +02:00
Matthew Honnibal	827cd8a883	Fix support of maxout pieces in parser	2017-10-20 03:07:17 +02:00
Matthew Honnibal	a8850b4282	Remove redundant PrecomputableMaxouts class	2017-10-19 20:27:34 +02:00
Matthew Honnibal	a17a1b60c7	Clean up redundant PrecomputableMaxouts class	2017-10-19 20:26:37 +02:00
Matthew Honnibal	b00d0a2c97	Fix bias in parser	2017-10-19 18:42:11 +02:00
Matthew Honnibal	b54b4b8a97	Make parser_maxout_pieces hyper-param work	2017-10-19 13:45:18 +02:00
Matthew Honnibal	03a215c5fd	Make PrecomputableAffines work	2017-10-19 13:44:49 +02:00
Ramanan Balakrishnan	7b9b1be44c	Support single value for attribute list in doc.to_array	2017-10-19 17:00:41 +05:30
Matthew Honnibal	61bc203f3f	Merge pull request #1438 from explosion/feature/fast-parser 💫 Improve runtime CPU efficiency of parser/NER	2017-10-19 02:42:21 +02:00
Matthew Honnibal	15e5a04a8d	Clean up more depth=0 conditional code	2017-10-19 01:48:43 +02:00
Matthew Honnibal	906c50ac59	Fix loop typing, that caused error on windows	2017-10-19 01:48:39 +02:00
ines	24512420b1	Show error if data_path does not exist or is None (see #1102 )	2017-10-19 00:53:49 +02:00
ines	bf415fd778	Add test for serializing extension attrs (see #1085 )	2017-10-19 00:53:08 +02:00
Matthew Honnibal	960788aaa2	Eliminate dead code in parser, and raise errors for obsolete options	2017-10-19 00:42:34 +02:00
Matthew Honnibal	bbfd7d8d5d	Clean up parser multi-threading	2017-10-19 00:25:21 +02:00
Matthew Honnibal	f018f2030c	Try optimized parser forward loop	2017-10-18 21:48:00 +02:00
Matthew Honnibal	65bf5e85bd	Improve piping in language.pipe	2017-10-18 21:46:12 +02:00
Matthew Honnibal	633a75c7e0	Break parser batches into sub-batches, sorted by length.	2017-10-18 21:45:01 +02:00
Ines Montani	f0d577e460	Merge pull request #1425 from explosion/feature/hindi-tokenizer 💫 Basic Hindi tokenization support	2017-10-18 13:34:52 +02:00
Matthew Honnibal	394633efce	Make doc pickling support hooks	2017-10-17 19:44:09 +02:00
Matthew Honnibal	fe844148f6	Test pickling hooks	2017-10-17 19:43:52 +02:00
Matthew Honnibal	cdb0c426d8	Improve deserialization of user_data, esp. for Underscore	2017-10-17 19:29:20 +02:00
Matthew Honnibal	374819edf8	Test user_data deserialization, re #1085	2017-10-17 19:28:54 +02:00
Matthew Honnibal	e35a83d142	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-17 18:22:06 +02:00
Matthew Honnibal	f45973848c	Rename 'tokens' variable 'doc' in tokenizer	2017-10-17 18:21:41 +02:00
Matthew Honnibal	839de87ca9	Make lambda func a named function, for pickling	2017-10-17 18:21:20 +02:00
Matthew Honnibal	9baa8fe7ec	Convert closure to functools.partial, to promote pickling	2017-10-17 18:20:52 +02:00
Matthew Honnibal	32a8564c79	Fix doc pickling	2017-10-17 18:20:24 +02:00
Matthew Honnibal	8ca97f32a3	Fix doc pickling test	2017-10-17 18:19:57 +02:00
Matthew Honnibal	9ce7d6af87	Make lex attr functions top-level functions, to promote pickling	2017-10-17 18:19:18 +02:00
Matthew Honnibal	1cc85a89ef	Allow reasonably efficient pickling of Language class, using to_bytes() and from_bytes().	2017-10-17 18:18:49 +02:00
Matthew Honnibal	0d57b9748a	Serialize lex_attr_getters with dill, for better pickle support	2017-10-17 18:17:45 +02:00
Matthew Honnibal	45d1dd90b1	Add tests for pickling doc	2017-10-17 17:20:58 +02:00
Ines Montani	afa67de7ee	Merge pull request #1428 from roanuz/develop Fix trailing whitespace and Language.from_disk overwrites	2017-10-17 16:29:15 +02:00
Matthew Honnibal	92c1eb2d6f	Fix Doc pickling. This also removes need for Binder class	2017-10-17 16:11:13 +02:00
Matthew Honnibal	ed8da9b11f	Add missing return statement in SentenceSegmenter	2017-10-17 15:32:56 +02:00
Ines Montani	aab299c8ae	Merge pull request #1429 from vishnunekkanti/develop fix syntax error in zh	2017-10-17 14:45:02 +02:00
Anto Binish Kaspar	534240648e	Fix trailing whitespace on morphology features	2017-10-17 17:15:58 +05:30
Anto Binish Kaspar	8f5b60c168	Fix Language.from_disk overwrites the meta.json file.	2017-10-17 17:15:32 +05:30
ines	8ca344712d	Add Language.has_pipe method	2017-10-17 11:20:07 +02:00
ines	485c4f6df5	Add Hungarian examples (see #1107 )	2017-10-17 02:37:45 +02:00
Matthew Honnibal	19531bad4c	Merge branch 'develop' into feature/streaming-data-memory-growth	2017-10-16 21:44:11 +02:00
Matthew Honnibal	df488274b1	Fix deserialization of vectors	2017-10-16 20:55:00 +02:00
Matthew Honnibal	4018486d31	Merge remote-tracking branch 'origin/develop' into feature/streaming-data-memory-growth	2017-10-16 20:49:48 +02:00
Matthew Honnibal	4174477161	Fix equality check in test	2017-10-16 19:50:35 +02:00
Matthew Honnibal	2bc06e4b22	Bump rolling buffer size to 10k	2017-10-16 19:38:29 +02:00
Matthew Honnibal	66e2eb8f39	Clean up remnant of frozen in StringStore	2017-10-16 19:34:41 +02:00
Matthew Honnibal	a002264fec	Remove caching of Token in Doc, as caused cycle.	2017-10-16 19:34:21 +02:00
Matthew Honnibal	3e037054c8	Remove obsolete is_frozen functionality from StringStore	2017-10-16 19:23:10 +02:00
Matthew Honnibal	5c14f3f033	Create a rolling buffer for the StringStore in Language.pipe()	2017-10-16 19:22:40 +02:00
Matthew Honnibal	59c216196c	Allow weakrefs on Doc objects	2017-10-16 19:22:11 +02:00
ines	d5418553eb	Fix whitespace	2017-10-16 18:30:04 +02:00
ines	6ceadcdb5c	Make sure from_disk passes string to numpy (see #1421 ) If path is a WindowsPath, numpy does not recognise it as a path and as a result, doesn't open the file. https://github.com/numpy/numpy/blob/master/numpy/lib/npyio.py#L369	2017-10-16 18:29:56 +02:00
Matthew Honnibal	010a7309ff	Merge pull request #1402 from explosion/feature/fix-matcher-operators 💫 Fix Matcher variable-length operators	2017-10-16 17:53:19 +02:00
Matthew Honnibal	c29927d2e7	Fix matcher test	2017-10-16 17:22:18 +02:00
Vishnu Kumar Nekkanti	d3c54cf39a	fixed SyntaxError while checking for jieba	2017-10-16 18:51:33 +05:30
Matthew Honnibal	a928ae2f35	Merge branch 'develop' into feature/fix-matcher-operators	2017-10-16 13:38:36 +02:00
Matthew Honnibal	56aa42cc5d	Fix and document matcher operator 'shadowing' behaviour	2017-10-16 13:38:20 +02:00
Matthew Honnibal	748d525801	Add more matcher operator tests	2017-10-16 13:38:01 +02:00
Matthew Honnibal	0433181658	Document operator semantics in Matcher docstring	2017-10-16 12:06:33 +02:00
ines	266e7180a7	Add Language class, stop words and basic stemmer that sets NORM	2017-10-14 14:59:52 +02:00
ines	e85e1d571b	Update base punctuation	2017-10-14 14:59:23 +02:00
ines	9d6c8eaa49	Update base norm exceptions with more unicode characters e.g. unicode variations of punctuation used in Chinese	2017-10-14 14:58:52 +02:00
ines	3516aa0cea	Port over changes from #1389	2017-10-14 13:32:55 +02:00
ines	cd6a29dce7	Port over changes from #1294	2017-10-14 13:28:46 +02:00
ines	38c756fd85	Port over changes from #1287	2017-10-14 13:16:21 +02:00

5706 Commits