spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-12-29 11:26:28 +03:00

Author	SHA1	Message	Date
ines	842782c128	Move fix_deprecated_glove_vectors_loading to deprecated.py	2017-03-15 17:33:29 +01:00
Matthew Honnibal	4cab8ac136	Update morph exceptions test	2017-03-15 09:31:34 -05:00
Matthew Honnibal	d719f8e77e	Use nogil in parser, and set L1 to 0.0 by default	2017-03-15 09:31:01 -05:00
Matthew Honnibal	c61c501406	Update beam-parser to allow parser to maintain nogil	2017-03-15 09:30:22 -05:00
Matthew Honnibal	3d4e389d23	Whitespace	2017-03-15 09:29:42 -05:00
Matthew Honnibal	7769bc31e3	Add beam-search classes	2017-03-15 09:27:41 -05:00
Matthew Honnibal	c79b3129e3	Fix setting of empty lexeme in initial parse state	2017-03-15 09:26:53 -05:00
Matthew Honnibal	d864708072	Add more morphology names in attrs.pyx	2017-03-15 09:26:16 -05:00
Matthew Honnibal	b382dc902c	Add morph rules in Language	2017-03-15 09:24:40 -05:00
Matthew Honnibal	8dbff4f5f4	Wire up English lemma and morph rules.	2017-03-15 09:23:22 -05:00
Matthew Honnibal	f70be44746	Use lemmatizer in code, not from downloaded model.	2017-03-15 04:52:50 -05:00
ines	42ba740dde	Revert "Merge branch 'debug'" This reverts commit `89b79d1178`, reversing changes made to `02bdf490a1`.	2017-03-13 20:11:52 +01:00
ines	4c5f51e49e	Update regression test	2017-03-13 15:16:11 +01:00
ines	02bdf490a1	Remove regression test to see if it caused pytest Travis error	2017-03-13 13:00:22 +01:00
ines	17018750ac	Add regression test for #717	2017-03-13 12:58:22 +01:00
ines	2883ebfca2	Remove print statement	2017-03-13 12:30:42 +01:00
ines	98c13d8aa9	Add regression test for #401	2017-03-13 12:28:41 +01:00
ines	444d665f9d	Add regression test for #686	2017-03-13 12:23:35 +01:00
ines	46b17e5b51	Add regression test for #719	2017-03-13 12:17:35 +01:00
ines	c8ae682ff9	Add regression test for #636	2017-03-13 12:08:31 +01:00
ines	337f9601f2	Add missing unicode declaration	2017-03-13 12:08:19 +01:00
ines	d70386ec6e	Update docstring in #886 regression test	2017-03-13 12:00:38 +01:00
ines	51ba3ef0a8	Add regression test for #886	2017-03-13 11:44:58 +01:00
ines	eec3f21c50	Add WordNet license	2017-03-12 13:58:24 +01:00
ines	f9e603903b	Rename stop_words.py to word_sets.py and include more sets. NUM_WORDS and ORDINAL_WORDS are currently not used, but the hard-coded list should be removed from orth.pyx and replaced with language-specific functions. This will later allow other languages to use their own functions to set those flags. (In English, this is easier because a word only needs to be checked against a set; in German, for example, this requires a more complex function, as most number words are written as a single word.)	2017-03-12 13:58:22 +01:00
ines	f24f9b4b7b	Remove unused code	2017-03-12 13:58:22 +01:00
ines	1da29a7146	Use new Lemmatizer data and remove file import. Since there's currently only an English lemmatizer, the global Lemmatizer imports from spacy.en. This is not ideal and still needs to be fixed.	2017-03-12 13:58:22 +01:00
ines	0957737ee8	Add Python-formatted lemmatizer data and rules	2017-03-12 13:58:22 +01:00
ines	c89e30d1a3	Add test for English time exceptions ("1a.m." etc.)	2017-03-12 13:58:22 +01:00
ines	ce9568af84	Move English time exceptions ("1a.m." etc.) and refactor	2017-03-12 13:58:22 +01:00
ines	6b30541774	Fix formatting	2017-03-12 13:58:22 +01:00
Ines Montani	e97a30b99a	Merge pull request #885 from PySUST/master [Bengali] Spell checked and add new stop words	2017-03-12 13:20:59 +01:00
ines	66c1f194f9	Use consistent unicode declarations	2017-03-12 13:07:28 +01:00
shuvanon	91cb4cdb2b	Sort stop_words	2017-03-12 17:55:51 +06:00
shuvanon	784f6cfa49	Update stop_words	2017-03-12 17:41:01 +06:00
shuvanon	73cc17078e	Merge branch 'master' of https://github.com/PySUST/spaCy	2017-03-12 14:52:17 +06:00
shuvanon	35ec7135bb	Spell checked and add new stop words	2017-03-12 14:51:34 +06:00
Em	9c809efc25	Removed mapStr	2017-03-11 16:23:26 -08:00
Matthew Honnibal	fa23278ee3	Add classes for beam parser and beam NER	2017-03-11 12:45:37 -06:00
Matthew Honnibal	6c4108c073	Add header for beam parser	2017-03-11 12:45:12 -06:00
Matthew Honnibal	4382f175b3	Squelch compiler warnings	2017-03-11 12:44:43 -06:00
Matthew Honnibal	ea2592879f	Merge branch 'master' of https://github.com/explosion/spaCy	2017-03-11 11:13:37 -06:00
Matthew Honnibal	1224c4d3c6	Improve output on trainer	2017-03-11 11:12:48 -06:00
Matthew Honnibal	b438dfd3f3	Add itn argument to tagger.update	2017-03-11 11:12:21 -06:00
Matthew Honnibal	931feb3360	Allow beam parsing for NER	2017-03-11 11:12:01 -06:00
Matthew Honnibal	f77a5bb60a	Switch back to greedy parser	2017-03-11 11:11:30 -06:00
Matthew Honnibal	ca9c8c57c0	Add iteration argument to parser.update	2017-03-11 07:00:47 -06:00
Matthew Honnibal	dcce9ca3f3	Use beam parser	2017-03-11 07:00:20 -06:00
Matthew Honnibal	e30ffdd003	Use ftrl optimizer in tagger	2017-03-11 06:59:13 -06:00
Matthew Honnibal	d59c6926c1	I think this fixes the segfault	2017-03-11 06:58:34 -06:00
Matthew Honnibal	318b9e32ff	WIP on beam parser. Currently segfaults.	2017-03-11 06:19:52 -06:00
Em	426d17167f	Added string manipulation for spans	2017-03-10 16:50:02 -08:00
Matthew Honnibal	b0d80dc9ae	Update name of 'train' function in BeamParser	2017-03-10 14:35:43 -06:00
Matthew Honnibal	d11f1a4ddf	Record negative costs in non-monotonic arc eager oracle	2017-03-10 11:22:04 -06:00
Matthew Honnibal	ecf91a2dbb	Support beam parser	2017-03-10 11:21:21 -06:00
Ines Montani	a16aff17aa	Merge pull request #876 from PySUST/master [Bangla] Update "tokenizer_exceptions.py"	2017-03-10 14:46:00 +01:00
ines	10e29189ac	Adjust URL testcases and xfail problems (instead of comment)	2017-03-10 14:22:50 +01:00
ines	b04893a059	Make regex locale-independent for Python 2	2017-03-10 14:21:57 +01:00
Matthew Honnibal	ea53647362	Merge branch 'develop'	2017-03-10 02:49:39 -06:00
Ines Montani	1c40890321	Add missing comma. Should fix Travis build error.	2017-03-10 09:34:54 +01:00
Shuvanon Razik	c251703428	Update abbreviations	2017-03-10 10:45:01 +06:00
Matthew Honnibal	b5247c49eb	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-03-09 18:45:43 -06:00
Matthew Honnibal	798450136d	Set L1 penalty to 0 in tagger.	2017-03-09 18:43:47 -06:00
Matthew Honnibal	c62da02344	Use ftrl training, to learn compressed model.	2017-03-09 18:43:21 -06:00
Matthew Honnibal	f71eeef9bb	Pass path argument to end_training	2017-03-09 18:42:40 -06:00
Dan Rapp	123d3f2d38	Fix error in test case parameterization	2017-03-09 12:18:21 -07:00
Dan Rapp	b9307dfcd7	Merge branch 'master' into rappdw/tokenizer_exceptions_url_fix	2017-03-09 11:42:14 -07:00
Dan Rapp	3b1df3808d	Issue #840 - URL pattern too broad	2017-03-09 11:39:39 -07:00
Matthew Honnibal	5b0b968d13	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-03-08 15:03:10 +01:00
Matthew Honnibal	0ac3d27689	Fix handling of trailing whitespace. Fix off-by-one error that meant trailing spaces were being dropped. Closes #792.	2017-03-08 15:01:40 +01:00
ines	c2e3e651b8	Re-add regression test for #859	2017-03-08 14:36:09 +01:00
Matthew Honnibal	0a6d7ca200	Fix spacing after token_match. The boolean flag indicating a space after the token was being set incorrectly after the token_match regex was applied. Fixes #859.	2017-03-08 14:33:32 +01:00
shuvanon	85438aee1b	update tokenizertokenizer	2017-03-08 17:29:39 +06:00
shuvanon	45bc78461c	update tokenizertokenizer	2017-03-08 17:27:12 +06:00
Matthew Honnibal	cd33b39a04	Fix 2/3 problem for json save/load	2017-03-08 01:39:13 +01:00
Matthew Honnibal	40703988bc	Use FTRL training in parser	2017-03-08 01:38:51 +01:00
Matthew Honnibal	d108534dc2	Fix 2/3 problems for training	2017-03-08 01:37:52 +01:00
Matthew Honnibal	d03d6a13f1	Merge branch 'rominf-ud20' into develop	2017-03-07 21:48:56 +01:00
Matthew Honnibal	f7374d0b86	Merge branch 'ud20' of https://github.com/rominf/spaCy into rominf-ud20	2017-03-07 21:48:37 +01:00
Matthew Honnibal	16670d3251	Xfail the vocab pickling for now	2017-03-07 21:43:28 +01:00
Matthew Honnibal	a89c3500f6	Fixes to hacky vocab pickling	2017-03-07 20:58:55 +01:00
Matthew Honnibal	d814892805	Hackish pickle support for Vocab.	2017-03-07 20:25:12 +01:00
Matthew Honnibal	26614e028f	Add hacky support for StringCFile, to make pickling easier.	2017-03-07 20:24:37 +01:00
Matthew Honnibal	3edb8ae207	Whitespace	2017-03-07 17:16:26 +01:00
Matthew Honnibal	5de7e712b7	Add support for pickling StringStore.	2017-03-07 17:15:18 +01:00
Matthew Honnibal	4e75e74247	Update regression test for variable-length pattern problem in the matcher.	2017-03-07 16:08:32 +01:00
Matthew Honnibal	6d67213b80	Add test for 850: Matcher fails on zero-or-more.	2017-03-07 15:55:28 +01:00
Aniruddha Adhikary	696215a3fb	add tests for Bengali	2017-03-05 11:25:12 +06:00
Aniruddha Adhikary	8f3bfe9bfc	[Bengali] basic tag map, morph, lemma rules and exceptions	2017-03-04 12:36:59 +06:00
Roman Inflianskas	66e1109b53	Add support for Universal Dependencies v2.0	2017-03-03 13:17:34 +01:00
ines	8dff040032	Revert "Add regression test for #859" This reverts commit `c4f16c66d1`.	2017-03-01 21:56:20 +01:00
Juan Miguel Cejuela	25c29f072d	apply patch	2017-03-01 21:44:17 +01:00
Juan Miguel Cejuela	a8cfde46d3	#781 Fix test — colocalizes is lemmatized to colocaliz and colicalize	2017-03-01 21:43:08 +01:00
Juan Miguel Cejuela	a471114eb2	#781 add regression test, failing previous bug fix	2017-03-01 21:30:51 +01:00
ines	c4f16c66d1	Add regression test for #859	2017-03-01 16:07:27 +01:00
Aniruddha Adhikary	d91be7aed4	add punctuations for Bengali	2017-02-28 21:07:14 +06:00
Aniruddha Adhikary	5a4fc09576	add basic Bengali support	2017-02-28 07:48:37 +06:00
Matthew Honnibal	cc9b2b74e3	Merge branch 'french-tokenizer-exceptions'	2017-02-27 11:44:39 +01:00
Matthew Honnibal	bd4375a2e6	Remove comment	2017-02-27 11:44:26 +01:00
Matthew Honnibal	e7e22d8be6	Move import within get_exceptions() function, to speed import	2017-02-27 11:34:48 +01:00
Matthew Honnibal	34bcc8706d	Merge branch 'french-tokenizer-exceptions'	2017-02-27 11:21:21 +01:00
Matthew Honnibal	0aaa546435	Fix test after updating the French tokenizer stuff	2017-02-27 11:20:47 +01:00
Matthew Honnibal	26446aa728	Avoid loading all French exceptions on import. Move exceptions loading behind a get_tokenizer_exceptions() function for French, instead of loading into the top-level namespace. This cuts import times from 0.6s to 0.2s, at the expense of making the French data a little different from the others (there's no top-level TOKENIZER_EXCEPTIONS variable). The current solution feels somewhat unsatisfying.	2017-02-25 11:55:00 +01:00
ines	376c5813a7	Remove print statements from test	2017-02-24 18:26:32 +01:00
ines	7c1260e98c	Add regression test	2017-02-24 18:22:49 +01:00
ines	0e2e331b58	Convert exceptions to Python list	2017-02-24 18:22:40 +01:00
ines	51eb190ef4	Remove print statements from test	2017-02-24 17:41:12 +01:00
Matthew Honnibal	db5ada3995	Merge branch 'master' of https://github.com/explosion/spaCy	2017-02-24 14:28:12 +01:00
Matthew Honnibal	8f94897d07	Add 1 operator to matcher, and make sure open patterns are closed at end of document. Closes Issue #766	2017-02-24 14:27:02 +01:00
ines	67991b6e5f	Add more test cases to #775 regression test to cover #847	2017-02-18 14:10:44 +01:00
ines	30ce2a6793	Exclude "shed" and "Shed" from tokenizer exceptions (see #847)	2017-02-18 14:10:44 +01:00
Ines Montani	de997c1a33	Merge pull request #842 from magnusburton/master Added regular verb rules for Swedish	2017-02-17 11:18:20 +01:00
Magnus Burton	41fcfd06b8	Added regular verb rules for Swedish	2017-02-17 10:04:04 +01:00
ines	aa92d4e9b5	Fix unicode regex for Python 2 (see #834)	2017-02-16 23:49:54 +01:00
ines	44de3c7642	Reformat test and use text_file fixture	2017-02-16 23:49:19 +01:00
ines	3dd22e9c88	Mark vectors test as xfail (temporary)	2017-02-16 23:28:51 +01:00
ines	85d249d451	Revert "Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)"" This reverts commit `ea05f78660`.	2017-02-16 23:26:25 +01:00
ines	ea05f78660	Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)" This reverts commit `7d8c9eee7f`, reversing changes made to `f6b69babcc`.	2017-02-16 15:27:12 +01:00
Raphaël Bournhonesque	06a71d22df	Fix test failure by using unicode literals	2017-02-16 14:48:00 +01:00
Raphaël Bournhonesque	3ba109622c	Add regression test with non ' ' space character as token	2017-02-16 12:23:27 +01:00
Raphaël Bournhonesque	e17dc2db75	Remove useless import	2017-02-16 12:10:24 +01:00
Raphaël Bournhonesque	3fd2742649	load_vectors should accept arbitrary space characters as word tokens. Fix bug #834.	2017-02-16 12:08:30 +01:00
ines	f08e180a47	Make groups non-capturing. Prevents hitting the 100 named groups limit in Python.	2017-02-10 13:35:02 +01:00
ines	fa3b8512da	Use consistent imports and exports. Bundle everything in language_data to keep it consistent with other languages and make TOKENIZER_EXCEPTIONS importable from there.	2017-02-10 13:34:09 +01:00
ines	21f09d10d7	Revert "Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions"" This reverts commit `f02a2f9322`.	2017-02-10 13:17:05 +01:00
ines	f02a2f9322	Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions" This reverts commit `b95afdf39c`, reversing changes made to `b0ccf32378`.	2017-02-09 17:07:21 +01:00
Raphaël Bournhonesque	309da78bf0	Merge branch 'master' into tokenizer_exceptions	2017-02-09 16:32:12 +01:00
Raphaël Bournhonesque	4ce0bbc6b6	Update unit tests	2017-02-09 16:30:43 +01:00
Raphaël Bournhonesque	5d706ab95d	Merge tokenizer exceptions from PR #802	2017-02-09 16:30:28 +01:00
ines	654fe447b1	Add Swedish tokenizer tests (see #807)	2017-02-05 11:47:07 +01:00
ines	6715615d55	Add missing EXC variable and combine tokenizer exceptions	2017-02-05 11:42:52 +01:00
Ines Montani	30a52d576b	Merge pull request #807 from magnusburton/master Added swedish lemma rules and more verb contractions	2017-02-05 11:34:19 +01:00
Magnus Burton	19c0ce745a	Added swedish lemma rules	2017-02-04 17:53:32 +01:00
Michael Wallin	d25556bf80	[issue 805] Fix issue	2017-02-04 16:22:21 +02:00
Michael Wallin	35100c8bdd	[issue 805] Add regression test and the required fixture	2017-02-04 16:21:34 +02:00
ines	0ab353b0ca	Add line breaks to Finnish stop words for better readability	2017-02-04 13:40:25 +01:00
Michael Wallin	1a1952afa5	[finnish] Add initial tests for tokenizer	2017-02-04 13:54:10 +02:00
Michael Wallin	f9bb25d1cf	[finnish] Reformat and correct stop words	2017-02-04 13:54:10 +02:00
Michael Wallin	73f66ec570	Add preliminary support for Finnish	2017-02-04 13:54:10 +02:00
Ines Montani	65d6202107	Merge pull request #802 from Tpt/fr-tokenizer Adds more French tokenizer exceptions	2017-02-03 10:52:20 +01:00
Tpt	75a74857bb	Adds more French tokenizer exceptions	2017-02-03 13:45:18 +04:00
Ines Montani	afc6365388	Update regression test for #801 to match current expected behaviour	2017-02-02 16:23:05 +01:00
Ines Montani	012f4820cb	Keep infixes of punctuation + hyphens as one token (see #801)	2017-02-02 16:22:40 +01:00
Ines Montani	1219a5f513	Add = to tokenizer prefixes	2017-02-02 16:21:11 +01:00
Ines Montani	ff04748eb6	Add missing emoticon	2017-02-02 16:21:00 +01:00
Ines Montani	13a4ab37e0	Add regression test for #801	2017-02-02 15:33:52 +01:00
Raphaël Bournhonesque	85f951ca99	Add tokenizer exceptions for French	2017-02-02 08:36:16 +01:00
Matvey Ezhov	32a22291bc	Small `Doc.count_by` documentation update. Current example doesn't work.	2017-01-31 19:18:45 +03:00
Ines Montani	e4875834fe	Fix formatting	2017-01-31 15:19:33 +01:00
Ines Montani	c304834e45	Add missing import	2017-01-31 15:18:30 +01:00
Ines Montani	e6465b9ca3	Parametrize test cases and mark as xfail	2017-01-31 15:14:42 +01:00
latkins	e4c84321a5	Added regression test for Issue #792.	2017-01-31 13:47:42 +00:00
Matthew Honnibal	6c665b81df	Fix redundant == TAG in from_array conditional	2017-01-31 00:46:21 +11:00
Ines Montani	19501f3340	Add regression test for #775	2017-01-25 13:16:52 +01:00
Ines Montani	209c37bbcf	Exclude "shell" and "Shell" from English tokenizer exceptions (resolves #775)	2017-01-25 13:15:02 +01:00
Raphaël Bournhonesque	1be9c0e724	Add fr tokenization unit tests	2017-01-24 10:57:37 +01:00
Raphaël Bournhonesque	1faaf698ca	Add infixes and abbreviation exceptions (fr)	2017-01-24 10:57:37 +01:00
Raphaël Bournhonesque	cf8474401b	Remove unused import statement	2017-01-24 10:57:37 +01:00
Raphaël Bournhonesque	902f136f18	Add support for elision in French	2017-01-24 10:57:37 +01:00
Ines Montani	55c9c62abc	Use relative import	2017-01-23 21:27:49 +01:00
Ines Montani	0967eb07be	Add regression test for #768	2017-01-23 21:25:46 +01:00
Ines Montani	6baa98f774	Merge pull request #769 from raphael0202/spacy-768 Allow zero-width 'infix' token	2017-01-23 21:24:33 +01:00
Raphaël Bournhonesque	dce8f5515e	Allow zero-width 'infix' token	2017-01-23 18:28:01 +01:00
Ines Montani	5f6f48e734	Add regression test for #759	2017-01-20 15:11:48 +01:00
Ines Montani	09ecc39b4e	Fix multi-line string of NUM_WORDS (resolves #759)	2017-01-20 15:11:48 +01:00
Magnus Burton	69eab727d7	Added loops to handle contractions with verbs	2017-01-19 14:08:52 +01:00
Matthew Honnibal	be26085277	Fix missing import. Closes #755.	2017-01-19 22:03:52 +11:00
Ines Montani	7e36568d5b	Fix title to accommodate sputnik	2017-01-17 00:51:09 +01:00
Ines Montani	d704cfa60d	Fix typo	2017-01-16 21:30:33 +01:00
Ines Montani	64e142f460	Update about.py	2017-01-16 14:23:08 +01:00
Matthew Honnibal	e889cd698e	Increment version	2017-01-16 14:01:35 +01:00
Matthew Honnibal	e7f8e13cf3	Make Token hashable. Fixes #743	2017-01-16 13:27:57 +01:00
Matthew Honnibal	2c60d0cb1e	Test #743: Tokens unhashable.	2017-01-16 13:27:26 +01:00
Matthew Honnibal	48c712f1c1	Merge branch 'master' of ssh://github.com/explosion/spaCy	2017-01-16 13:18:06 +01:00
Matthew Honnibal	7ccf490c73	Increment version	2017-01-16 13:17:58 +01:00
Ines Montani	50878ef598	Exclude "were" and "Were" from tokenizer exceptions and add regression test (resolves #744)	2017-01-16 13:10:38 +01:00
Ines Montani	e053c7693b	Fix formatting	2017-01-16 13:09:52 +01:00
Ines Montani	116c675c3c	Merge pull request #742 from oroszgy/hu_tokenizer_fix Improved Hungarian tokenizer	2017-01-14 23:52:44 +01:00
Gyorgy Orosz	92345b6a41	Further numeric test.	2017-01-14 22:44:19 +01:00
Gyorgy Orosz	b4df202bfa	Better error handling	2017-01-14 22:24:58 +01:00
Gyorgy Orosz	b03a46792c	Better error handling	2017-01-14 22:09:29 +01:00
Gyorgy Orosz	a45f22913f	Added further abbreviations present in the Szeged corpus	2017-01-14 22:08:55 +01:00
Ines Montani	332ce2d758	Update README.md	2017-01-14 21:12:11 +01:00
Gyorgy Orosz	9505c6a72b	Passing all old tests.	2017-01-14 20:39:21 +01:00
Gyorgy Orosz	63037e79af	Fixed hyphen handling in the Hungarian tokenizer.	2017-01-14 16:30:11 +01:00
Gyorgy Orosz	f77c0284d6	Maintaining compatibility with other spacy tokenizers.	2017-01-14 16:19:15 +01:00
Gyorgy Orosz	be7a7aeb1a	Reversed accidental changes.	2017-01-14 15:59:36 +01:00
Gyorgy Orosz	1be5da1ac6	Fixed Hungarian tokenizer for numbers	2017-01-14 15:51:59 +01:00
Ines Montani	a89e269a5a	Fix test formatting and consistency	2017-01-14 13:41:19 +01:00
Ines Montani	3424e3a7e5	Update README.md	2017-01-13 15:54:54 +01:00
Ines Montani	49186b34a1	Mark lemmatizer tests as models since they use installed data	2017-01-13 15:12:07 +01:00
Ines Montani	138deb80a1	Modernise vector tests, use add_vecs_to_vocab and don't depend on models	2017-01-13 15:12:07 +01:00
Ines Montani	96f0caa28a	Fix test name for consistency	2017-01-13 15:12:07 +01:00
Ines Montani	dc2bb1259f	Add util function to add vectors to vocab	2017-01-13 15:12:07 +01:00
Ines Montani	db9b25663d	Reformat add_docs_equal and add docstring	2017-01-13 15:12:07 +01:00
Ines Montani	62ce0a0073	Add README.md to tests to explain organisation and conventions	2017-01-13 15:11:18 +01:00
Ines Montani	38d60f6b90	Modernise serializer I/O tests and don't depend on models where possible	2017-01-13 02:24:56 +01:00
Ines Montani	4bb5b89ee4	Add text_file_b fixture using BytesIO	2017-01-13 02:23:50 +01:00
Ines Montani	49febd8c62	Modernise noun chunks tests and don't depend on models	2017-01-13 02:01:00 +01:00
Ines Montani	3ee97b5686	Rename test_parser to test_noun_chunks	2017-01-13 01:36:33 +01:00
Ines Montani	a308703f47	Remove old tests	2017-01-13 01:34:48 +01:00
Ines Montani	12eb8edf26	Move parser tests from unit to parser	2017-01-13 01:34:38 +01:00
Ines Montani	138c53ff2e	Merge tokenizer tests	2017-01-13 01:34:14 +01:00
Ines Montani	01f36ca3ff	Move attrs tests from unit to root and modernise	2017-01-13 01:33:50 +01:00
Ines Montani	3610d27967	Move alignment tests from munge to gold and modernise	2017-01-13 01:33:31 +01:00
Ines Montani	094ff7396a	Reformat and rename Pragmatic Segmenter tests and mark xfails	2017-01-13 01:30:20 +01:00
Ines Montani	affcf1b19d	Modernise lemmatizer tests	2017-01-12 23:41:17 +01:00
Ines Montani	33d9cf87f9	Modernise tagger tests and fix xpassing test	2017-01-12 23:40:52 +01:00
Ines Montani	33e5f8dc2e	Create basic and extended test set for URLs	2017-01-12 23:40:02 +01:00
Ines Montani	5e4f5ebfc8	Modernise BILUO tests	2017-01-12 23:39:18 +01:00
Ines Montani	09acfbca01	Add Lemmatizer fixture	2017-01-12 23:38:55 +01:00
Ines Montani	514bfa2597	Add path fixture for spaCy data path	2017-01-12 23:38:47 +01:00
Ines Montani	0894b8c0ef	Don't split tokens with digits and "/" infixes (resolves #740)	2017-01-12 22:58:26 +01:00
Ines Montani	e9e99a5670	Add regression test for #740	2017-01-12 22:57:38 +01:00
Ines Montani	6935d55409	Fix formatting	2017-01-12 22:56:20 +01:00
Ines Montani	5f0d196a31	Modernise and merge matcher tests	2017-01-12 22:23:11 +01:00
Ines Montani	d5d774413a	Update comments on EN and DE fixtures	2017-01-12 22:03:07 +01:00
Ines Montani	9b4bea1df9	Tidy up and rename regression tests and remove unnecessary imports	2017-01-12 22:00:37 +01:00
Ines Montani	5e1b6178e3	Fix formatting and consistency	2017-01-12 22:00:06 +01:00
Ines Montani	a3fd32455e	Remove redundant language loading integration tests	2017-01-12 21:59:48 +01:00
Ines Montani	61f1ca09c2	Modernise serializer codecs tests	2017-01-12 21:58:55 +01:00
Ines Montani	5dbc6e59f6	Modernise Huffman tests	2017-01-12 21:58:40 +01:00
Ines Montani	edeeeccea5	Modernise packer tests and don't depend on models where possible	2017-01-12 21:58:07 +01:00
Ines Montani	d084676cd0	Modernise and merge serialization tests	2017-01-12 21:57:19 +01:00
Ines Montani	442237787c	Add assert_docs_equal util to compare two docs	2017-01-12 21:56:52 +01:00
Ines Montani	eac3f700fb	Add fixture for entity recognizer	2017-01-12 21:56:32 +01:00
Ines Montani	b438cfddbc	Modernise matcher tests and split into two files	2017-01-12 17:51:46 +01:00
Ines Montani	27482ebed8	Move matcher tests for #188 and #242 to regression tests. Modernise tests and remove unnecessary imports.	2017-01-12 17:33:57 +01:00
Ines Montani	0a4dc632bd	Update test to not create redundant Doc object	2017-01-12 17:33:18 +01:00
Ines Montani	a2526e66d8	Fix formatting, naming and unicode declaration	2017-01-12 16:51:13 +01:00
Ines Montani	052cdff07d	Modernise vector similarity tests	2017-01-12 16:51:13 +01:00
Ines Montani	bd20ec0a6a	Add get_cosine util function	2017-01-12 16:51:13 +01:00
Ines Montani	51ef75f629	Fix regression test for #615 and remove unnecessary imports	2017-01-12 16:51:12 +01:00
Ines Montani	aeb747e10c	Adjust formatting	2017-01-12 16:51:12 +01:00
Ines Montani	8e3e58a7e6	Modernise and merge lexeme vocab tests	2017-01-12 16:51:12 +01:00
Ines Montani	c3d4516fc2	Move test for #361 to regression tests	2017-01-12 16:51:12 +01:00
Daniel Hershcovich	99eb494a82	Fix #737: support loading word vectors with " " as a word	2017-01-12 17:00:14 +02:00
Ines Montani	7cb3d74426	Modernise span tests and don't depend on models	2017-01-12 15:30:49 +01:00
Ines Montani	92e3d8b3ee	Modernise vocab API tests and remove old xfailing tests	2017-01-12 15:27:46 +01:00
Ines Montani	7ea87684cd	Rename test_vocab.py to test_vocab_api.py	2017-01-12 15:12:21 +01:00
Ines Montani	0da2ee5c68	Merge flag features tests into orth tests in tests root	2017-01-12 15:12:00 +01:00
Ines Montani	03c136cfd3	Remove StringStore tests from vocab tests	2017-01-12 15:11:15 +01:00
Ines Montani	d7bd57abdf	Modernise add vectors vocab test	2017-01-12 15:09:49 +01:00
Ines Montani	89525ef345	Use consistent test names	2017-01-12 15:09:21 +01:00
Ines Montani	f8803808ce	Remove old unused tests and conftest files	2017-01-12 15:09:05 +01:00
Ines Montani	4d0bfebcd9	Move Pragmatic Segmenter test cases (currently unused) to parser tests	2017-01-12 15:08:02 +01:00
Ines Montani	26d018d874	Add tests for StringStore	2017-01-12 15:07:31 +01:00
Ines Montani	9b6784bab5	Add fixture for StringStore	2017-01-12 15:05:40 +01:00
Ines Montani	99d66d613a	Modernise tests for merging spans and don't depend on models	2017-01-12 12:26:26 +01:00
Ines Montani	fa8f67596d	Remove unused old test	2017-01-12 12:26:08 +01:00
Ines Montani	359f73a96b	Move test for #54 to regression tests	2017-01-12 12:25:51 +01:00
Ines Montani	3f3a46722c	Remove unused conftest	2017-01-12 12:25:24 +01:00
Ines Montani	c2406e92bc	Allow setting ents in get_doc	2017-01-12 12:25:10 +01:00
Ines Montani	c5914c6fe5	Fix and pass regression test for #736	2017-01-12 11:48:56 +01:00
Matthew Honnibal	4e48862fa8	Remove print statement	2017-01-12 11:25:39 +01:00
Matthew Honnibal	d1d8214767	Increment version	2017-01-12 11:21:57 +01:00
Matthew Honnibal	fba67fa342	Fix Issue #736: Times were being tokenized with incorrect string values.	2017-01-12 11:21:01 +01:00
Ines Montani	a6790b6694	Rename tags to pos in get_doc and allow adding tags to tokens	2017-01-12 11:18:36 +01:00
Ines Montani	1add8ace67	Merge lemmatizer tests	2017-01-12 11:16:53 +01:00
Ines Montani	3bc082abdf	Modernise morph exceptions test and don't depend on models	2017-01-12 11:14:29 +01:00
Ines Montani	ec7739b76e	Add regression test for #736	2017-01-12 11:12:44 +01:00
Ines Montani	6c1c564891	Move language-specific tests out of redundant tokenizer directories	2017-01-12 02:17:18 +01:00
Ines Montani	8fecedac3a	Tidy up	2017-01-12 02:16:37 +01:00
Ines Montani	ae7edd30e7	Move text file back to tokenizer tests directory	2017-01-12 02:10:23 +01:00
Ines Montani	ffcaba9017	Remove old and/or redundant tests	2017-01-12 02:10:18 +01:00
Ines Montani	19c4132097	Modernise space attachment parser tests and don't depend on models	2017-01-12 01:54:44 +01:00
Ines Montani	69778924c8	Modernise and merge parser tests and don't depend on models	2017-01-12 01:07:29 +01:00
Ines Montani	178c147612	Modernise nonprojectivity tests and don't depend on models	2017-01-12 01:06:36 +01:00
Ines Montani	1a3984742c	Modernise sentence boundary detection tests and don't depend on models (where possible)	2017-01-11 23:53:08 +01:00
Ines Montani	0cdb6ea61d	Remove old unused pickle test	2017-01-11 23:52:28 +01:00
Ines Montani	c9671329dc	Move test for #309 to regression tests	2017-01-11 23:52:13 +01:00
Ines Montani	d0e37b5670	Modernise parser tests and don't depend on models	2017-01-11 21:30:27 +01:00
Ines Montani	342cb41782	Add apply_transition_sequence util function to utils	2017-01-11 21:30:14 +01:00
Ines Montani	09807addff	Add en_parser fixture	2017-01-11 21:29:59 +01:00
Ines Montani	55d151aa61	Modernise Doc parse tree navigation tests and don't depend on models	2017-01-11 21:14:15 +01:00
Ines Montani	7262421bb2	Use consistent test names	2017-01-11 19:00:52 +01:00
Ines Montani	33800c9367	Rename "tokens" tests to "doc"	2017-01-11 18:59:01 +01:00
Ines Montani	3a9c6a9563	Remove old unused files	2017-01-11 18:58:38 +01:00
Ines Montani	8e962de39f	Remove old word vector tests	2017-01-11 18:55:08 +01:00
Ines Montani	e027936920	Modernise Doc noun chunks tests	2017-01-11 18:54:56 +01:00
Ines Montani	439f396acd	Modernise Doc array tests and don't depend on models	2017-01-11 18:54:46 +01:00
Ines Montani	05447be884	Modernise test for adding entities	2017-01-11 18:54:24 +01:00
Ines Montani	6e883f4c00	Modernise Doc API tests and don't depend on models	2017-01-11 18:05:36 +01:00
Ines Montani	8bf3bb5c44	Make words optional for get_doc	2017-01-11 18:05:10 +01:00
Ines Montani	928db7e419	Fix StringIO import for Python 3	2017-01-11 14:07:48 +01:00
Ines Montani	69998f216b	Rename test_tokens_api.py to test_doc_api.py	2017-01-11 13:58:56 +01:00
Ines Montani	d94dea1b18	Merge token tests into token API tests	2017-01-11 13:57:02 +01:00
Ines Montani	eb23424ab0	Modernise token API tests and don't depend on loading models	2017-01-11 13:56:54 +01:00
Ines Montani	c682b8ca90	Merge conftests into one cohesive file	2017-01-11 13:56:32 +01:00
Ines Montani	909f24d7df	Add test utils and get_doc helper function. Create Doc object from given vocab, words and annotations to allow tests not to depend on loading the models.	2017-01-11 13:55:33 +01:00
Matthew Honnibal	e12c90e03f	Merge branch 'master' of ssh://github.com/explosion/spaCy	2017-01-11 13:03:51 +01:00
Matthew Honnibal	12cd27b821	Amend 8ae8b443f: Handle comparison with None tokens.	2017-01-11 13:03:32 +01:00
Daniel Hershcovich	8e603cc917	Avoid "True if ... else False"	2017-01-11 11:18:22 +02:00
Matthew Honnibal	44e2b0100d	Support TAG attribute in doc.from_array	2017-01-10 22:47:07 +01:00
Ines Montani	3e6e1f0251	Tidy up regression tests	2017-01-10 19:24:10 +01:00
Magnus Burton	aad23ab0b4	Supplemented with capitalized Swedish exceptions	2017-01-10 16:07:20 +01:00
Ines Montani	869963c3c4	Mark extensive prefix/suffix tests as slow	2017-01-10 15:57:35 +01:00
Ines Montani	487e020ebe	Add simple test for surrounding brackets	2017-01-10 15:57:26 +01:00
Ines Montani	0ba5cf51d2	Assert length first	2017-01-10 15:57:00 +01:00
Ines Montani	2185d31907	Adjust names and formatting	2017-01-10 15:56:35 +01:00
Ines Montani	e10d4ca964	Remove semi-redundant URLs and punctuation for faster testing	2017-01-10 15:54:25 +01:00
Ines Montani	3a3cb2c90c	Add unicode declaration	2017-01-10 15:53:15 +01:00
Matthew Honnibal	0f9b8a00a5	Unbreak data download	2017-01-09 23:40:26 +01:00
Matthew Honnibal	8ae8b443f1	Add richcmp method to Token. Closes #631	2017-01-09 19:30:31 +01:00
Matthew Honnibal	64f747cb65	Token comparison test	2017-01-09 19:12:00 +01:00
Matthew Honnibal	18c3c2d05c	Add tests for token comparison, re Issue #631	2017-01-09 19:09:59 +01:00
Matthew Honnibal	97a1286129	Revert changes to tagger and parser for thinc 6	2017-01-09 10:08:34 -06:00
Matthew Honnibal	95a52005df	Revert "Fix Issue #683: Add 'SP' to tag_map, if it's not there already, within the Morphology class." This reverts commit `40e71586d6`.	2017-01-09 09:55:55 -06:00
Ines Montani	363f09e68c	Merge pull request #726 from magnusburton/master Added Swedish abbreviations as token exceptions	2017-01-09 14:58:15 +01:00
Matthew Honnibal	42cd598f57	Use correct fixtures in URL tokenizer	2017-01-09 14:10:40 +01:00
Matthew Honnibal	d9a77ddf14	Return None for data path if it doesn't exist	2017-01-09 14:10:05 +01:00
Matthew Honnibal	e4862d1dab	Merge branch 'develop'	2017-01-09 13:36:01 +01:00
Ines Montani	aa876884f0	Revert "Revert "Merge remote-tracking branch 'origin/master'"" This reverts commit `fb9d3bb022`.	2017-01-09 13:28:13 +01:00
Ines Montani	d5c72c40eb	Remove old tests for old website example code	2017-01-08 22:28:53 +01:00
Ines Montani	eef94e3ee2	Split off period after two or more uppercase letters (fixes #483)	2017-01-08 22:28:25 +01:00
Ines Montani	a89a6000e5	Remove unused import	2017-01-08 22:17:37 +01:00
Ines Montani	5d28664fc5	Don't test Hungarian for numbers and hyphens for now. Reinvestigate behaviour of case affixes given reorganised tokenizer patterns.	2017-01-08 20:45:40 +01:00
Ines Montani	53362b6b93	Reorganise Hungarian prefixes/suffixes/infixes. Use global prefixes and suffixes for non-language-specific rules, import list of alpha unicode characters and adjust regexes.	2017-01-08 20:40:33 +01:00
Ines Montani	347c4a2d06	Reorganise and reformat global tokenizer prefixes, suffixes and infixes	2017-01-08 20:37:39 +01:00
Ines Montani	0dec90e9f7	Use global abbreviation data languages and remove duplicates	2017-01-08 20:36:00 +01:00
Ines Montani	7c3cb2a652	Add global abbreviations data	2017-01-08 20:34:03 +01:00
Ines Montani	de5aa92bc2	Handle deprecated tokenizer prefix data	2017-01-08 20:33:28 +01:00
Ines Montani	abb09782f9	Move sun.txt to original location and fix path to not break parser tests	2017-01-08 20:32:54 +01:00
Ines Montani	cab39c59c5	Add missing contractions to English tokenizer exceptions Inspired by https://github.com/kootenpv/contractions/blob/master/contractions/__init__.py	2017-01-05 19:59:06 +01:00
Ines Montani	a23504fe07	Move abbreviations below other exceptions	2017-01-05 19:58:07 +01:00
Ines Montani	7d2cf934b9	Generate he/she/it correctly with 's instead of 've	2017-01-05 19:57:00 +01:00
Ines Montani	8328925e1f	Add newlines to long German text	2017-01-05 18:13:30 +01:00
Ines Montani	55b46d7cf6	Add tokenizer tests for German	2017-01-05 18:11:25 +01:00
Ines Montani	5bb4081f52	Remove redundant test_tokenizer.py for English	2017-01-05 18:11:11 +01:00
Ines Montani	8216ba599b	Add tests for longer and mixed English texts	2017-01-05 18:11:04 +01:00
Ines Montani	65f937d5c6	Move basic contraction tests to test_contractions.py	2017-01-05 18:09:53 +01:00
Ines Montani	bbe7cab3a1	Move non-English-specific tests back to general tokenizer tests	2017-01-05 18:09:29 +01:00
Ines Montani	038002d616	Reformat HU tokenizer tests and adapt to general style Improve readability of test cases and add conftest.py with fixture	2017-01-05 18:06:44 +01:00
Ines Montani	bc911322b3	Move ") to emoticons (see Tweebo challenge test)	2017-01-05 18:05:38 +01:00
Ines Montani	637f785036	Add general sanity tests for all tokenizers	2017-01-05 16:25:38 +01:00
Ines Montani	c5f2dc15de	Move English tokenizer tests to directory /en	2017-01-05 16:25:04 +01:00
Ines Montani	8b45363b4d	Modernize and merge general tokenizer tests	2017-01-05 13:17:05 +01:00
Ines Montani	02cfda48c9	Modernize and merge tokenizer tests for string loading	2017-01-05 13:16:55 +01:00
Ines Montani	a11f684822	Modernize and merge tokenizer tests for whitespace	2017-01-05 13:16:33 +01:00
Ines Montani	8b284fc6f1	Modernize and merge tokenizer tests for text from file	2017-01-05 13:15:52 +01:00
Ines Montani	2c2e878653	Modernize and merge tokenizer tests for punctuation	2017-01-05 13:14:16 +01:00
Ines Montani	8a74129cdf	Modernize and merge tokenizer tests for prefixes/suffixes/infixes	2017-01-05 13:13:12 +01:00
Ines Montani	0e65dca9a5	Modernize and merge tokenizer tests for exception and emoticons	2017-01-05 13:11:31 +01:00
Ines Montani	34c47bb20d	Fix formatting	2017-01-05 13:10:51 +01:00
Ines Montani	2e72683baa	Add missing docstrings	2017-01-05 13:10:21 +01:00
Ines Montani	da10a049a6	Add unicode declarations	2017-01-05 13:09:48 +01:00
Ines Montani	58adae8774	Remove unused file	2017-01-05 13:09:22 +01:00
Ines Montani	c6e5a5349d	Move regression test for #360 into own file	2017-01-04 00:49:31 +01:00
Ines Montani	8279993a6f	Modernize and merge tokenizer tests for punctuation	2017-01-04 00:49:20 +01:00
Ines Montani	550630df73	Update tokenizer tests for contractions	2017-01-04 00:48:42 +01:00
Ines Montani	109f202e8f	Update conftest fixture	2017-01-04 00:48:21 +01:00
Ines Montani	ee6b49b293	Modernize tokenizer tests for emoticons	2017-01-04 00:47:59 +01:00
Ines Montani	f09b5a5dfd	Modernize tokenizer tests for infixes	2017-01-04 00:47:42 +01:00
Ines Montani	59059fed27	Move regression test for #351 to own file	2017-01-04 00:47:11 +01:00
Ines Montani	667051375d	Modernize tokenizer tests for whitespace	2017-01-04 00:46:35 +01:00
Ines Montani	aafc894285	Modernize tokenizer tests for contractions Use @pytest.mark.parametrize.	2017-01-03 23:02:21 +01:00
Ines Montani	1d237664af	Add lowercase lemma to tokenizer exceptions	2017-01-03 23:02:21 +01:00
Ines Montani	84a87951eb	Fix typos	2017-01-03 18:27:43 +01:00
Ines Montani	35b39f53c3	Reorganise English tokenizer exceptions (as discussed in #718 ) Add logic to generate exceptions that follow a consistent pattern (like verbs and pronouns) and allow certain tokens to be excluded explicitly.	2017-01-03 18:26:09 +01:00
Ines Montani	fb9d3bb022	Revert "Merge remote-tracking branch 'origin/master'" This reverts commit `d3b181cdf1`, reversing changes made to `b19cfcc144`.	2017-01-03 18:21:36 +01:00
Ines Montani	461cbb99d8	Revert "Reorganise English tokenizer exceptions (as discussed in #718 )" This reverts commit `b19cfcc144`.	2017-01-03 18:21:29 +01:00
Ines Montani	d3b181cdf1	Merge remote-tracking branch 'origin/master' # Conflicts: # spacy/en/tokenizer_exceptions.py	2017-01-03 18:20:01 +01:00
Ines Montani	b19cfcc144	Reorganise English tokenizer exceptions (as discussed in #718 ) Add logic to generate exceptions that follow a consistent pattern (like verbs and pronouns) and allow certain tokens to be excluded explicitly.	2017-01-03 18:17:57 +01:00
Ines Montani	1bd53bbf89	Fix typos (resolves #718 )	2017-01-03 11:26:21 +01:00
Matthew Honnibal	fde53be3b4	Move whole token mach inside _split_affixes.	2016-12-30 17:11:50 -06:00
Matthew Honnibal	3ba7c167a8	Fix URL tests	2016-12-30 17:10:08 -06:00
Matthew Honnibal	9936a1b9b5	Merge branch 'tokenization_w_exception_patterns' of https://github.com/oroszgy/spaCy.hu into oroszgy-tokenization_w_exception_patterns	2016-12-30 14:53:40 -06:00
Magnus Burton	56e2219b65	Added Swedish city abbreviations	2016-12-30 21:17:34 +01:00
Magnus Burton	e935c950d8	Added months and days as abbreviations for Swedish	2016-12-30 21:08:44 +01:00
kengz	73a38bd4d1	Merge remote-tracking branch 'upstream/master'	2016-12-30 12:19:59 -05:00
kengz	da44183ae1	move parse_tree logic to a new tokens/printers.py file	2016-12-30 12:19:18 -05:00
Matthew Honnibal	3e8d9c772e	Test interaction of token_match and punctuation Check that the new token_match function applies after punctuation is split off.	2016-12-31 00:52:17 +11:00
Matthew Honnibal	74b921f394	Merge branch 'master' of ssh://github.com/explosion/spaCy into develop	2016-12-30 14:38:27 +01:00
Matthew Honnibal	623d94e14f	Whitespace	2016-12-31 00:30:28 +11:00
Matthew Honnibal	af81ac8bb0	Use thinc 6.0	2016-12-29 11:58:42 +01:00
Petter Hohle	f112e7754e	Add PART to tag map 16 of the 17 PoS tags in the UD tag set is added; PART is missing.	2016-12-28 18:39:01 +01:00
Matthew Honnibal	f62db78dc3	Increment version	2016-12-27 21:11:22 +01:00
Matthew Honnibal	cade536d1e	Merge branch 'master' of ssh://github.com/explosion/spaCy	2016-12-27 21:04:10 +01:00
Matthew Honnibal	ce4539dafd	Allow the vocabulary to grow to 10,000, to prevent cold-start problem.	2016-12-27 21:03:45 +01:00
Ines Montani	ad3669cef5	Merge pull request #703 from magnusburton/master Added Swedish abbreviations	2016-12-27 01:01:49 +01:00
Ines Montani	78f754dd9a	Merge pull request #705 from oroszgy/hu_tokenizer Initial support for Hungarian	2016-12-27 00:48:13 +01:00
Ines Montani	8785706039	Reformat stop words for better readability	2016-12-24 00:58:40 +01:00
Gyorgy Orosz	45e045a87b	Unicode/UTF8 compatibility for Python2	2016-12-24 00:21:00 +01:00
Gyorgy Orosz	72b61b6d03	Typo fix.	2016-12-24 00:10:29 +01:00
Gyorgy Orosz	3a9be4d485	Updated token exception handling mechanism to allow the usage of arbitrary functions as token exception matchers.	2016-12-23 23:49:34 +01:00
Ines Montani	1436b9f15a	Fix formatting and consistency	2016-12-23 21:36:01 +01:00
Ines Montani	1d64527727	Update Spanish tokenizer Remove reflexive pronouns as they're part of an open class, fix mistakes and add exceptions	2016-12-23 21:36:01 +01:00
Ines Montani	7f411fd01c	Remove exceptions containing whitespace / no special chars	2016-12-23 14:30:06 +01:00
Magnus Burton	fdf4776262	Added Swedish abbreviations	2016-12-22 22:45:18 +01:00
Gyorgy Orosz	d9c59c4751	Maintaining backward compatibility.	2016-12-21 23:30:49 +01:00
Gyorgy Orosz	1748549aeb	Added exception pattern mechanism to the tokenizer.	2016-12-21 23:16:19 +01:00
Gyorgy Orosz	35aa54765d	Hungarian module is exposed in spacy.	2016-12-21 20:45:36 +01:00
Gyorgy Orosz	ab2f6ea46c	Removed data files from tests..	2016-12-21 20:22:09 +01:00
Ines Montani	3c87c71d43	Add tokenizer exceptions for a.m. and p.m. in Spanish	2016-12-21 18:19:10 +01:00
Ines Montani	78e63dc7d0	Update tokenizer exceptions for English	2016-12-21 18:06:34 +01:00
Ines Montani	702d1eed93	Update tokenizer exceptions for German	2016-12-21 18:06:27 +01:00
Ines Montani	d60380418e	Update tokenizer exceptions for Spanish	2016-12-21 18:06:17 +01:00
Ines Montani	920fa0fed2	Add DET_LEMMA constant	2016-12-21 18:05:41 +01:00
Ines Montani	8978806ea6	Allow Vocab to load without serializer_freqs	2016-12-21 18:05:23 +01:00
Ines Montani	be8ed811f6	Remove trailing whitespace	2016-12-21 18:04:41 +01:00
Ines Montani	926e19184a	Merge pull request #695 from magnusburton/master Added Swedish morph rules	2016-12-21 01:06:00 +01:00
Gyorgy Orosz	3d5306acb9	Added further testcases.	2016-12-20 23:49:35 +01:00
Gyorgy Orosz	23956e72ff	Improved partial support for tokenzing Hungarian numbers	2016-12-20 23:36:59 +01:00
Gyorgy Orosz	6add156075	Refactored language data structure	2016-12-20 22:28:20 +01:00
Gyorgy Orosz	366b3f8685	Merge branch 'master' into hu_tokenizer	2016-12-20 20:53:31 +01:00
Gyorgy Orosz	c035928156	Partial Hungarian number tokenization is added.	2016-12-20 20:46:20 +01:00
JM	70ff0639b5	Fixed missing vec_path declaration that was failing if 'add_vectors' was set Added vec_path variable declaration to avoid accessing it before assignment in case 'add_vectors' is in overrides.	2016-12-20 18:21:05 +01:00
Magnus Burton	48dcc9f647	Added morph rules	2016-12-20 13:18:41 +01:00
Magnus Burton	db5a077d2b	Initial commit for Swedish	2016-12-20 11:05:06 +01:00
Matthew Honnibal	3f5747a9b2	Merge branch 'master' of ssh://github.com/explosion/spaCy	2016-12-18 23:44:22 +01:00
Matthew Honnibal	40e71586d6	Fix Issue #683 : Add 'SP' to tag_map, if it's not there already, within the Morphology class.	2016-12-18 23:44:05 +01:00
Matthew Honnibal	fa1d23e10d	Merge branch 'master' of https://github.com/explosion/spaCy	2016-12-18 23:32:03 +01:00
Matthew Honnibal	f38eb25fe1	Fix test for word vector	2016-12-18 23:31:55 +01:00
Matthew Honnibal	4e68abebc4	Merge branch 'master' of ssh://github.com/explosion/spaCy	2016-12-18 23:19:45 +01:00
Matthew Honnibal	5a6328a5a4	Increment version	2016-12-18 23:19:19 +01:00
Matthew Honnibal	13a0b31279	Another tweak to GloVe path hackery.	2016-12-18 23:12:49 +01:00
Matthew Honnibal	2c6228565e	Fix vector loading re glove hack	2016-12-18 23:06:44 +01:00
Matthew Honnibal	618b50a064	Fix issue #684 : GloVe vectors not loaded in spacy.en.English.	2016-12-18 22:46:31 +01:00
Matthew Honnibal	404019ad2f	Fix issue #672 : ent_iob_ was a string, not unicode, due to missing unicode_literals statement.	2016-12-18 22:33:53 +01:00
Matthew Honnibal	2ef9d53117	Untested fix for issue #684 : GloVe vectors hack should be inserted in English, not in spacy.load.	2016-12-18 22:29:31 +01:00
Matthew Honnibal	c065359459	Fix path-override bug in spacy.load	2016-12-18 22:15:29 +01:00
Matthew Honnibal	813249f826	Work on morphology class. Still not fully consistent with rest of library.	2016-12-18 17:35:22 +01:00
Matthew Honnibal	3679fb43a3	Fix loading of lemmatizer	2016-12-18 17:34:09 +01:00
Matthew Honnibal	3980f1b0cb	Ignore more morphology attributes in deprecated mode of intify_attrs	2016-12-18 17:33:46 +01:00
Matthew Honnibal	7a98ee5e5a	Merge language data change	2016-12-18 17:03:52 +01:00
Matthew Honnibal	e4c951c153	Merge branch 'organize-language-data' of ssh://github.com/explosion/spaCy into organize-language-data	2016-12-18 17:01:08 +01:00
Ines Montani	b99d683a93	Fix formatting	2016-12-18 16:58:28 +01:00
Ines Montani	b11d8cd3db	Merge remote-tracking branch 'origin/organize-language-data' into organize-language-data	2016-12-18 16:57:12 +01:00
Ines Montani	d1c1d3f9cd	Fix tokenizer test	2016-12-18 16:55:32 +01:00
Ines Montani	753068f1d5	Use base language data as default	2016-12-18 16:55:25 +01:00
Ines Montani	bcc1d50d09	Remove trailing whitespace	2016-12-18 16:54:52 +01:00
Ines Montani	4e95737c6c	Add base tag map	2016-12-18 16:54:28 +01:00
Ines Montani	2b2ea8ca11	Reorganise language data	2016-12-18 16:54:19 +01:00
Matthew Honnibal	1b31c05bf8	Whitespace	2016-12-18 16:51:40 +01:00
Matthew Honnibal	bdcecb3c96	Add import in regression test	2016-12-18 16:51:31 +01:00
Matthew Honnibal	6ee1df93c5	Set tag_map to None if it's not seen in the data by vocab	2016-12-18 16:51:10 +01:00
Matthew Honnibal	33996e770b	Update header for morphology class	2016-12-18 16:50:42 +01:00
Matthew Honnibal	d58187ffa7	Filter out morphology keys in deprecated attrs	2016-12-18 16:50:26 +01:00
Matthew Honnibal	837a5d4100	Update morphology class so that exceptions can be added one-by-one, and so that arbitrary attributes can be referenced.	2016-12-18 16:49:46 +01:00
Matthew Honnibal	44f4f008bd	Wire up lemmatizer rules for English	2016-12-18 15:50:09 +01:00
Matthew Honnibal	e6fc4afb04	Whitespace	2016-12-18 15:48:00 +01:00
Ines Montani	32b36c3882	Break language data components into their own files	2016-12-18 15:40:22 +01:00
Ines Montani	1bff59a8db	Update English language data	2016-12-18 15:36:53 +01:00
Ines Montani	2eb163c5dd	Add lemma rules	2016-12-18 15:36:53 +01:00
Ines Montani	29ad8143d8	Add morph rules	2016-12-18 15:36:53 +01:00
Ines Montani	bc40dad7d9	Add entity rules	2016-12-18 15:36:53 +01:00
Ines Montani	eaa3b1319d	Fix formatting	2016-12-18 15:36:53 +01:00
Ines Montani	704c7442e0	Break language data components into their own files	2016-12-18 15:36:53 +01:00
Ines Montani	62655fd36f	Add ENT_ID constant	2016-12-18 15:36:53 +01:00
Matthew Honnibal	fa272fdf12	Merge branch 'organize-language-data' of ssh://github.com/explosion/spaCy into organize-language-data	2016-12-18 15:00:21 +01:00
Matthew Honnibal	57c4341453	Refactor loading of morphology exceptions, adding a method add_special_case.	2016-12-18 14:59:44 +01:00
Ines Montani	77cf2fb0f6	Remove unnecessary argument in test	2016-12-18 14:06:27 +01:00
Ines Montani	121c310566	Remove trailing whitespace	2016-12-18 14:06:27 +01:00
Ines Montani	0fc4e45cb3	Fix tag map for German	2016-12-18 13:30:03 +01:00
Ines Montani	28326649f3	Fix typo	2016-12-18 13:30:03 +01:00
Matthew Honnibal	0595cc0635	Change test595 to mock data, instead of requiring model.	2016-12-18 13:28:51 +01:00
Matthew Honnibal	a4eb5c2bff	Check POS key in lemmatizer, to update it for new data format	2016-12-18 13:28:20 +01:00
Matthew Honnibal	28d63ec58e	Restore missing '' character in tokenizer exceptions.	2016-12-18 05:34:51 +01:00
Ines Montani	a9421652c9	Remove duplicates in tag map	2016-12-17 22:44:31 +01:00
Ines Montani	69baf1c9a8	Fix tag map	2016-12-17 22:44:22 +01:00
Ines Montani	577adad945	Fix formatting	2016-12-17 14:00:52 +01:00
Ines Montani	fc4ad17136	Fix typo	2016-12-17 14:00:47 +01:00
Ines Montani	bb94e784dc	Fix typo	2016-12-17 13:59:30 +01:00
Ines Montani	afda532595	Use symbols in tag map	2016-12-17 13:56:24 +01:00
Ines Montani	07249145c9	Fix formatting	2016-12-17 13:34:46 +01:00
Ines Montani	dd55d085b6	Reformat dutch language data to match new style	2016-12-17 13:26:01 +01:00
Ines Montani	f2c48ef504	Resolve stopwords conflict to merge Dutch	2016-12-17 13:08:16 +01:00
Matthew Honnibal	ff03ade08f	Merge pull request #688 from nlesc-sherlock/dutch Support for Dutch in SpaCy	2016-12-17 22:44:58 +11:00
Ines Montani	a22322187f	Add missing lemmas to tokenizer exceptions (fixes #674 )	2016-12-17 12:42:41 +01:00
Ines Montani	5445074cbd	Expand tokenizer exceptions with unicode apostrophe (fixes #685 )	2016-12-17 12:34:08 +01:00
Ines Montani	e0a7b5c612	Fix formatting	2016-12-17 12:33:09 +01:00
Ines Montani	08162dce67	Move shared functions and constants to global language data	2016-12-17 12:32:48 +01:00
Ines Montani	6a60a61086	Move update_exc to global language data utils	2016-12-17 12:29:02 +01:00
Ines Montani	f324311249	Add global language data utils	2016-12-17 12:27:41 +01:00
Ines Montani	487ce1e20a	Add encoding declaration	2016-12-17 12:25:44 +01:00
Ines Montani	d8d50a0334	Add tokenizer exception for "gonna" (fixes #691 )	2016-12-17 11:59:28 +01:00
Ines Montani	c69b77d8aa	Revert "Add exception for "gonna"" This reverts commit `280c03f67b`.	2016-12-17 11:56:44 +01:00
Ines Montani	280c03f67b	Add exception for "gonna"	2016-12-17 11:54:59 +01:00
Ines Montani	5031a015e2	Fix typo in stopwords (fixes #689 )	2016-12-15 17:57:06 +01:00
Janneke van der Zwaan	4a3fdcce8a	Merge github.com:explosion/spaCy into dutch	2016-12-13 09:25:23 +01:00
Matthew Honnibal	5965d3c2a7	Revert "Add acl to symbols.pyx"	2016-12-12 10:10:28 +11:00
Matthew Honnibal	6dee76dfed	Update symbols.pxd	2016-12-12 10:09:58 +11:00
Pokey Rule	18a15c0777	Add acl to symbols.pyx	2016-12-11 20:00:07 +00:00
Gyorgy Orosz	0cf2144d24	Adding partial hyphen and quote handling support.	2016-12-11 00:14:36 +01:00
Gyorgy Orosz	2051726fd3	Passing Hungatian abbrev tests.	2016-12-10 23:37:58 +01:00
Ines Montani	63024466a9	Add Portuguese stopwords	2016-12-08 20:45:07 +01:00
Ines Montani	7bfe2d4abc	Update Portuguese language data	2016-12-08 20:41:41 +01:00
Ines Montani	c0c5f31950	Remove unused data and download script	2016-12-08 20:39:49 +01:00
Ines Montani	0a6d529104	Remove unused data	2016-12-08 20:36:56 +01:00
Ines Montani	1b3b043660	Add French stopwords	2016-12-08 20:12:43 +01:00
Ines Montani	8863e504eb	Update French language data	2016-12-08 20:07:14 +01:00
Ines Montani	7cb9f51be6	Add Italian stopwords	2016-12-08 20:05:25 +01:00
Ines Montani	470a0e0bea	Update Italian language data	2016-12-08 19:52:18 +01:00
Ines Montani	1a284d342e	Add Spanish language data	2016-12-08 19:47:03 +01:00
Ines Montani	0c39654786	Remove unused import	2016-12-08 19:46:53 +01:00
Ines Montani	e47ee94761	Split punctuation into its own file	2016-12-08 19:46:43 +01:00
Ines Montani	70b51ed7c8	Remove time from German language data	2016-12-08 19:45:50 +01:00
Ines Montani	e8ae588be9	Add emoticons	2016-12-08 19:45:18 +01:00
Ines Montani	5908c0ed9f	Fix formatting	2016-12-08 19:45:11 +01:00
Ines Montani	311b30ab35	Reorganize exceptions for English and German	2016-12-08 13:58:32 +01:00
Ines Montani	66c7348cda	Add update_exc util function	2016-12-08 13:58:12 +01:00
Ines Montani	1256232fad	Fix formatting	2016-12-08 13:56:40 +01:00
Ines Montani	8e977cc71c	Fix formatting	2016-12-08 13:56:17 +01:00
Ines Montani	0176b99004	Fix formatting	2016-12-08 12:48:02 +01:00
Ines Montani	877f09218b	Add more custom rules for abbreviations	2016-12-08 12:47:01 +01:00
Gyorgy Orosz	0289b8ceaa	Additional abbreviation tests.	2016-12-08 12:17:44 +01:00
Gyorgy Orosz	90d22db023	Added Hungarian resource files.	2016-12-08 12:06:36 +01:00
Ines Montani	bfaa42636c	Update language data for German	2016-12-08 12:01:09 +01:00
Ines Montani	ec44bee321	Fix capitalization on morphological features	2016-12-08 12:00:54 +01:00
Gyorgy Orosz	5b00039955	First steps towards the Hungarian tokenizer code.	2016-12-07 23:07:43 +01:00
Ines Montani	ce979553df	Resolve conflict	2016-12-07 21:16:52 +01:00
Ines Montani	8350d65695	Change morphology and lemmatizer API Take morphology features as object instead of keyword arguments	2016-12-07 21:12:49 +01:00
Ines Montani	52e7d634df	Remove trailing whitespace	2016-12-07 21:12:19 +01:00
Ines Montani	0d07d7fc80	Apply emoticon exceptions to tokenizer	2016-12-07 21:11:59 +01:00
Ines Montani	71f0f34cb3	Fix formatting	2016-12-07 21:11:29 +01:00
Ines Montani	9413bcd9ee	Declare encoding and unicode literals	2016-12-07 21:10:34 +01:00
Ines Montani	a280ff2657	Fix __all__	2016-12-07 21:10:12 +01:00
Ines Montani	ba8721953c	Add missing emoticons	2016-12-07 21:09:44 +01:00
Ines Montani	1285c4ba93	Update English language data	2016-12-07 20:33:28 +01:00
Ines Montani	79dce0aabe	Add emoticons	2016-12-07 20:33:28 +01:00
Ines Montani	a662a95294	Add line breaks	2016-12-07 20:33:28 +01:00
Ines Montani	07f0efb102	Add test for tokenizer regular expressions	2016-12-07 20:33:28 +01:00
Ines Montani	e0712d1b32	Reformat language data	2016-12-07 20:33:28 +01:00
Matthew Honnibal	0c0f4c965d	Increment version	2016-12-03 11:16:52 +01:00
Matthew Honnibal	f6e356aada	Add (and test) Span.sentiment attribute. By default we average token.span, but can override with custom hook. Re Issue #667	2016-12-02 11:05:50 +01:00
Janneke van der Zwaan	88869e0e07	Merge github.com:explosion/spaCy into dutch	2016-11-30 17:13:39 +01:00
Janneke van der Zwaan	51ade86b86	Update language data with tag map from UD_Dutch	2016-11-30 14:41:23 +01:00
Janneke van der Zwaan	90f6ff12c9	Update Dutch language data - Use Dutch tag map - remove tokenizer exceptions	2016-11-30 11:59:39 +01:00
dafnevk	7b8f4c49f2	Added language Dutch to init file	2016-11-29 16:42:05 +01:00
Matthew Honnibal	296d33a4fc	Merge branch 'master' of ssh://github.com/explosion/spaCy	2016-11-26 12:36:18 +01:00
Matthew Honnibal	1f6c37c6f5	Fix create_tokenizer when nlp is None	2016-11-26 12:36:04 +01:00
Matthew Honnibal	c7889492f9	Fix model saving error for Python 3	2016-11-25 18:04:30 -06:00
Matthew Honnibal	bc0a202c9c	Fix unicode problem in nonproj module	2016-11-25 17:29:17 -06:00
Matthew Honnibal	6dd3b94fa6	Filter out deprecated attributes when reading special-case tokenization rules.	2016-11-25 09:57:18 -06:00
Matthew Honnibal	e879c79b8c	Merge branch 'master' of https://github.com/explosion/spaCy	2016-11-25 09:18:28 -06:00
Matthew Honnibal	a335c6dcc2	Exclude morphs from deprecated token attributes for now	2016-11-25 16:17:32 +01:00
Matthew Honnibal	f799a07f25	Merge branch 'master' of https://github.com/explosion/spaCy	2016-11-25 09:16:43 -06:00
Matthew Honnibal	159e8c46e1	Merge old training fixes with newer state	2016-11-25 09:16:36 -06:00
Matthew Honnibal	846e80f2f4	Exclude morphs from deprecated token attributes for now	2016-11-25 16:14:54 +01:00
Matthew Honnibal	664f2dd1c0	Allow dep to be None in scorer, for missing labels.	2016-11-25 09:02:49 -06:00
Matthew Honnibal	39341598bb	Fix NER label calculation	2016-11-25 09:02:22 -06:00
Matthew Honnibal	ca773a1f53	Tweak arc_eager n_gold to deal with negative costs, and improve error message.	2016-11-25 09:01:52 -06:00
Matthew Honnibal	a2f55e7015	Pass cfg through loading, for training.	2016-11-25 09:01:20 -06:00
Matthew Honnibal	608d8f5421	Pass cfg through parser, and have is_valid default to 1, not 0 when resetting state	2016-11-25 09:00:21 -06:00
Matthew Honnibal	cc7e607a8a	Fix gold.pyx for 1.0	2016-11-25 08:57:59 -06:00
root	080d29e092	Fix train.py for 1.0	2016-11-25 08:55:33 -06:00
Matthew Honnibal	6652f2a135	Test #656 , #624 : special case rules for tokenizer with attributes.	2016-11-25 12:44:13 +01:00
Matthew Honnibal	1e0f566d95	Fix #656 , #624 : Support arbitrary token attributes when adding special-case rules.	2016-11-25 12:43:24 +01:00
Matthew Honnibal	87613edf8f	Add set_struct_attr staticmethod to token	2016-11-25 12:41:47 +01:00
Matthew Honnibal	fb69aa648f	Merge branch 'master' of ssh://github.com/explosion/spaCy	2016-11-25 11:35:44 +01:00
Matthew Honnibal	9a03a3f85e	Add get_struct_attr staticmethod to Token, to match Lexeme.get_struct_attr.	2016-11-25 11:35:17 +01:00
Matthew Honnibal	53d8ca8f51	Add spacy.attrs.intify_attrs function, to normalize strings in token attribute dictionaries.	2016-11-25 11:34:30 +01:00
Ines Montani	d21ad01840	Add emoticons	2016-11-24 19:13:00 +01:00
dafnevk	d8c7ac203a	Added nl module for dutch	2016-11-24 16:39:49 +01:00
dafnevk	3db8b0d322	Added language class and some language data (with some TODOs) for Dutch	2016-11-24 15:56:38 +01:00
Ines Montani	4dcfafde02	Add line breaks	2016-11-24 14:57:37 +01:00
Ines Montani	6247c005a2	Add test for tokenizer regular expressions	2016-11-24 13:51:59 +01:00
Ines Montani	de747e39e7	Reformat language data	2016-11-24 13:51:32 +01:00
Matthew Honnibal	b8c4f5ea76	Allow German noun chunks to work on Span Update the German noun chunks iterator, so that it also works on Span objects.	2016-11-24 23:30:15 +11:00
Pokey Rule	3e3bda142d	Add noun_chunks to Span	2016-11-24 10:47:20 +00:00
Janneke van der Zwaan	83daade0e4	Add directory and initial (empty) files for language Dutch	2016-11-24 09:45:41 +01:00
Matthew Honnibal	09f68bc641	Fix Issue #639 : stop words in language class not used. This patch is messy, but it's better not to change too much until the language data loading can be properly refactored.	2016-11-24 00:13:55 +01:00
Matthew Honnibal	48e1dc29d4	Fix default path loading.	2016-11-23 23:48:55 +01:00
Matthew Honnibal	e01c1875ee	Work on test for #615	2016-11-23 23:48:41 +01:00
ExplodingCabbage	6c4f488e89	Fix syntax mistake	2016-11-23 15:12:45 +00:00
Matthew Honnibal	60eb2343ce	Only try to load vectors if they exist.	2016-11-23 13:50:24 +01:00
Matthew Honnibal	618ac36093	Fix use of path argument in Language.__init__. Needs to be keyword arg, not positional.	2016-11-23 13:26:34 +01:00
Mark Amery	fbe19680a6	Fix another bug related to Language.__init__'s path parameter	2016-11-20 20:31:34 +00:00
Mark Amery	b0a07c21a0	Fix `path` param of `Language.__init__` always being ignored There was an explicitly-declared `path` keyword argument, so 'path' would never be present in `**overrides`. This line just overwrote any manually-specified value the user might've passed to the `path` parameter.	2016-11-20 16:29:57 +00:00
Mark Amery	1988fce389	Merge remote-tracking branch 'origin/master' into specify-data-path	2016-11-20 16:07:14 +00:00
Mark Amery	3871007c72	Let --data-path be specified when running download.py scripts Resolves https://github.com/explosion/spaCy/issues/637	2016-11-20 15:48:04 +00:00
Ines Montani	dad2c6cae9	Strip trailing whitespace	2016-11-20 16:45:51 +01:00
Ines Montani	3082e49326	Update and reformat German stopwords	2016-11-20 16:45:26 +01:00
Sourav Singh	6745eac309	Update language_data.py	2016-11-20 19:52:02 +05:30
Sourav Singh	4d9aae7d6a	Add German Stopwords	2016-11-19 22:47:53 +05:30
Matthew Honnibal	7afb2544a7	Merge pull request #627 from sadovnychyi/patch-1 Remove duplicated line of vocab declaration	2016-11-16 06:09:18 +11:00
Yanhao	762169da29	Fixed bug: eg.guess is a tag id, rather than tag	2016-11-15 14:11:22 +08:00
Dmytro Sadovnychyi	e70a7050e1	Remove duplicated line of vocab declaration As already declared on line 211.	2016-11-13 18:52:49 +08:00
Matthew Honnibal	f123f92e0c	Fix #617 : Vocab.load() required Path. Should work with string as well.	2016-11-10 22:48:48 +01:00
Matthew Honnibal	e86f440ca6	Fix test for issue 617	2016-11-10 22:48:10 +01:00
Matthew Honnibal	faa7610c56	Merge branch 'master' of ssh://github.com/explosion/spaCy	2016-11-10 22:46:38 +01:00
Matthew Honnibal	a2c7de8329	spacy/tests/regression/test_issue617.py Test Issue #617	2016-11-10 22:46:23 +01:00
tiago	2a3e342c1f	Added a test case to cover the span.merge returning values	2016-11-09 18:57:50 +00:00
tiago	b38cfd0ef9	now span.merge returns token like it says on documentation	2016-11-09 14:58:19 +00:00
Dmitry Sadovnychyi	9488222e79	Fix PhraseMatcher to work with updated Matcher #613	2016-11-09 00:14:26 +08:00
Dmitry Sadovnychyi	86c056ba64	Add basic test for PhraseMatcher #613	2016-11-09 00:10:32 +08:00
Matthew Honnibal	3ea15b257f	Fix test for 605	2016-11-06 11:59:26 +01:00
Matthew Honnibal	efe7790439	Test #590 : Order dependence in Matcher rules.	2016-11-06 11:21:36 +01:00
Matthew Honnibal	5cd3acb265	Fix #605 : Acceptor now rejects matches as expected.	2016-11-06 10:50:42 +01:00
Matthew Honnibal	75805397dd	Test Issue #605	2016-11-06 10:42:32 +01:00
Matthew Honnibal	014b6936ac	Fix #608 -- __version__ should be available at the base of the package.	2016-11-04 21:21:02 +01:00
Matthew Honnibal	42b0736db7	Increment version	2016-11-04 20:04:21 +01:00
Matthew Honnibal	9f93386994	Update version	2016-11-04 19:28:16 +01:00
Matthew Honnibal	1fb09c3dc1	Fix morphology tagger	2016-11-04 19:19:09 +01:00
Matthew Honnibal	a36353df47	Temporarily put back the tokenize_from_strings method, while tests aren't updated yet.	2016-11-04 19:18:07 +01:00
Matthew Honnibal	f0917b6808	Fix Issue #376 : and/or was tagged as a noun.	2016-11-04 15:21:28 +01:00
Matthew Honnibal	737816e86e	Fix #368 : Tokenizer handled pattern 'unicode close quote, period' incorrectly.	2016-11-04 15:16:20 +01:00
Matthew Honnibal	ab952b4756	Fix #578 -- Sputnik had been purging all files on --force, not just the relevant one.	2016-11-04 10:44:11 +01:00
Matthew Honnibal	6e37ba1d82	Fix #602 , #603 --- Broken build	2016-11-04 09:54:24 +01:00
Matthew Honnibal	293c79c09a	Fix #595 : Lemmatization was incorrect for base forms, because morphological analyser wasn't adding morphology properly.	2016-11-04 00:29:07 +01:00

3026 Commits