spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-09-21 11:29:13 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	9750a0128c	Fix Span.noun_chunks. Closes #1207	2017-07-22 14:14:57 +02:00
Matthew Honnibal	d9b85675d7	Rename regression test	2017-07-22 14:14:35 +02:00
Matthew Honnibal	dfbc7e49de	Add test for Issue #1207	2017-07-22 14:14:01 +02:00
Matthew Honnibal	0ae3807d7d	Fix gaps in Lexeme API. Closes #1031	2017-07-22 13:53:48 +02:00
Matthew Honnibal	83e1b5f1e3	Merge branch 'master' of https://github.com/explosion/spaCy	2017-07-22 13:45:35 +02:00
Matthew Honnibal	45f6961ae0	Add __version__ symbol in __init__.py	2017-07-22 13:45:21 +02:00
Matthew Honnibal	8b9c4c5e1c	Add missing SP symbol to tag map, re #1052	2017-07-22 13:44:17 +02:00
Ines Montani	9af04ea11f	Merge pull request #1161 from AlexisEidelman/patch-1 French NUM_WORDS and ORDINAL_WORDS	2017-07-22 13:40:46 +02:00
Matthew Honnibal	44dd247e73	Merge branch 'master' of https://github.com/explosion/spaCy	2017-07-22 13:35:30 +02:00
Matthew Honnibal	94267ec50f	Fix merge conflit in printer	2017-07-22 13:35:15 +02:00
Ines Montani	c7708dc736	Merge pull request #1177 from swierh/master Dutch NUM_WORDS and ORDINAL_WORDS	2017-07-22 13:35:08 +02:00
Matthew Honnibal	5916d46ba8	Avoid use of deepcopy in printer	2017-07-22 13:34:01 +02:00
Ines Montani	9eca6503c1	Merge pull request #1157 from polm/master Add basic Japanese Tokenizer Test	2017-07-10 13:07:11 +02:00
Paul O'Leary McCann	bc87b815cc	Add comment clarifying what LANGUAGES does	2017-07-09 16:28:55 +09:00
Paul O'Leary McCann	04e6a65188	Remove Japanese from LANGUAGES. LANGUAGES is a list of languages whose tokenizers get run through a variety of generic tests. Since the generic tests don't check the JA fixture, it blows up when it can't find janome. -POLM	2017-07-09 16:23:26 +09:00
Swier	29720150f9	fix import of stop words in language data	2017-07-05 14:08:04 +02:00
Swier	f377c9c952	Rename stop_words.py to word_sets.py	2017-07-05 14:06:28 +02:00
Swier	5357874bf7	add Dutch numbers and ordinals	2017-07-05 14:03:30 +02:00
gispk47	669bd14213	Update __init__.py remove the empty string return from jieba.cut,this will cause the list of tokens cant be pushed assert error	2017-07-01 13:12:00 +08:00
Paul O'Leary McCann	c336193392	Parametrize and extend Japanese tokenizer tests	2017-06-29 00:09:40 +09:00
Paul O'Leary McCann	30a34ebb6e	Add importorskip for janome	2017-06-29 00:09:20 +09:00
Alexis	1b3a5d87ba	French NUM_WORDS and ORDINAL_WORDS	2017-06-28 14:11:20 +02:00
Paul O'Leary McCann	e56fea14eb	Add basic Japanese tokenizer test	2017-06-28 01:24:25 +09:00
Paul O'Leary McCann	84041a2bb5	Make create_tokenizer work with Japanese	2017-06-28 01:18:05 +09:00
György Orosz	fa26041da6	Fixed typo in cli/package.py	2017-06-07 16:19:08 +02:00
Ines Montani	e7ef51b382	Update tokenizer_exceptions.py	2017-06-02 19:00:01 +02:00
Ines Montani	81918155ef	Merge pull request #1096 from recognai/master Spanish model features	2017-06-02 11:07:27 +02:00
Francisco Aranda	70a2180199	fix(spanish sentence segmentation): remove tokenizer exceptions the break sentence segmentation. Aligned with training corpus	2017-06-02 08:19:57 +02:00
Francisco Aranda	5b385e7d78	feat(spanish model): add the spanish noun chunker	2017-06-02 08:14:06 +02:00
Ines Montani	7f6be41f21	Fix typo in English tokenizer exceptions (resolves #1071)	2017-05-23 12:18:00 +02:00
Raphaël Bournhonesque	6381ebfb14	Use yield from syntax	2017-05-18 10:42:35 +02:00
Raphaël Bournhonesque	f37d078d6a	Fix issue #1069 with custom hook `Doc.sents` definition	2017-05-18 09:59:38 +02:00
ines	9003fd25e5	Fix error messages if model is required (resolves #1051). Rename about.__docs__ to about.__docs_models__.	2017-05-13 13:14:02 +02:00
ines	24e973b17f	Rename about.__docs__ to about.__docs_models__	2017-05-13 13:09:00 +02:00
ines	6e1dbc608e	Fix parse_tree test	2017-05-13 12:34:20 +02:00
ines	573f0ba867	Replace deepcopy	2017-05-13 12:34:14 +02:00
ines	bd428c0a70	Set defaults for light and flat kwargs	2017-05-13 12:34:05 +02:00
ines	c5669450a0	Fix formatting	2017-05-13 12:33:57 +02:00
Matthew Honnibal	ad590feaa8	Fix test, which imported English incorrectly	2017-05-13 11:36:19 +02:00
Ines Montani	8d742ac8ff	Merge pull request #1055 from recognai/master Enable pruning out rare words from clusters data	2017-05-13 03:22:56 +02:00
Matthew Honnibal	b2540d2379	Merge Kengz's tree_print patch	2017-05-13 03:18:49 +02:00
oeg	cdaefae60a	feature(populate_vocab): Enable pruning out rare words from clusters data	2017-05-12 16:15:19 +02:00
ines	b1f22c5a10	Fix formatting	2017-05-03 20:11:02 +02:00
ines	a04b5be1b2	Add glossary for annotation scheme (closes #1034). Can be imported as explain from spacy.glossary, or called as spacy.explain(term)	2017-05-03 17:02:17 +02:00
Ines Montani	3ea23a3f4d	Fix formatting	2017-05-03 09:44:38 +02:00
Ines Montani	d730eb0c0d	Raise custom ImportError if importing janome fails	2017-05-03 09:43:29 +02:00
Ines Montani	949ad6594b	Add newline	2017-05-03 09:38:43 +02:00
Ines Montani	d12ca587ea	Add newline	2017-05-03 09:38:29 +02:00
Ines Montani	8676cd0135	Add newline	2017-05-03 09:38:07 +02:00
Yasuaki Uechi	c8f83aeb87	Add basic japanese support	2017-05-03 13:56:21 +09:00
Matthew Honnibal	31ec9e1371	Merge branch 'master' of https://github.com/explosion/spaCy	2017-04-27 13:21:39 +02:00
Matthew Honnibal	2da16adcc2	Add dropout optin for parser and NER. Dropout can now be specified in the `Parser.update()` method via the `drop` keyword argument, e.g. `nlp.entity.update(doc, gold, drop=0.4)`. This will randomly drop 40% of features, and multiply the value of the others by 1. / 0.4. This may be useful for generalising from small data sets. This commit also patches the examples/training/train_new_entity_type.py example, to use dropout and fix the output (previously it did not output the learned entity).	2017-04-27 13:18:39 +02:00
Ines Montani	7da9cefd25	Merge pull request #1022 from luvogels/master Initial support for Norwegian Bokmål	2017-04-27 11:16:06 +02:00
Ines Montani	c9e592ae6c	Add newline	2017-04-27 11:15:41 +02:00
Ines Montani	5942adccc2	Add newline	2017-04-27 11:15:19 +02:00
Ines Montani	4cd9269aef	Add newline	2017-04-27 11:15:04 +02:00
Ines Montani	ccf13ecc21	Add newline	2017-04-27 11:14:42 +02:00
Ines Montani	03d2b0cc05	Add newline	2017-04-27 11:14:26 +02:00
luvogels	d12a0b6431	Hooked up tokenizer tests	2017-04-26 23:21:41 +02:00
Matthew Honnibal	f0e1606d27	Increment version	2017-04-26 20:25:41 +02:00
luvogels	b331929a7e	Merge branch 'master' of https://github.com/luvogels/spaCy	2017-04-26 19:15:48 +02:00
luvogels	8de59ce3b9	Added tokenizer tests	2017-04-26 19:10:18 +02:00
Matthew Honnibal	4d98511db7	Make Span hashable. Closes #1019	2017-04-26 19:01:05 +02:00
Matthew Honnibal	24c4c51f13	Try to make test999 less flakey	2017-04-26 18:42:06 +02:00
Leif Uwe Vogelsang	460094bf09	Update __init__.py	2017-04-26 18:27:55 +02:00
ines	527d51ac9a	Fetch shortcuts from GitHub and improve error handling	2017-04-26 18:00:28 +02:00
Matthew Honnibal	c4be9c36fe	Fix unicode header in tests	2017-04-24 10:09:01 +02:00
Matthew Honnibal	65f10b53e5	Fix test	2017-04-24 00:25:55 +02:00
Matthew Honnibal	70a43858e1	Fix flakey test	2017-04-24 00:06:30 +02:00
Matthew Honnibal	3973af2d15	Make training test less flakey	2017-04-23 22:59:34 +02:00
Matthew Honnibal	4f9657b42b	Fix reporting if no dev data with train	2017-04-23 22:27:10 +02:00
Matthew Honnibal	df2ac8b843	Merge branch 'master' of https://github.com/explosion/spaCy	2017-04-23 21:25:07 +02:00
Matthew Honnibal	d0e19267e8	Create directory if missing in save_to_directory	2017-04-23 21:24:43 +02:00
ines	42305bc519	Remove unnecessary test	2017-04-23 21:21:41 +02:00
ines	012ea594d1	Add file for misc tests	2017-04-23 21:06:51 +02:00
ines	83f66947dc	Rename test_download to test_cli	2017-04-23 21:06:50 +02:00
ines	401045433c	Simplify compat.fix_text	2017-04-23 21:06:50 +02:00
Matthew Honnibal	e033c86a64	Increment version	2017-04-23 21:03:43 +02:00
Matthew Honnibal	d2436dc17b	Update fix for Issue #999	2017-04-23 18:14:37 +02:00
Matthew Honnibal	874a3cbb07	Add test for Issue #955	2017-04-23 17:57:01 +02:00
Matthew Honnibal	60703cede5	Ensure noun chunks can't be nested. Closes #955	2017-04-23 17:56:39 +02:00
Matthew Honnibal	c9ec24b257	Merge branch 'master' of https://github.com/explosion/spaCy	2017-04-23 17:07:46 +02:00
Matthew Honnibal	5d8af40445	Add test for Issue #999	2017-04-23 17:06:30 +02:00
Matthew Honnibal	4d2a659c52	Fix json dump for Python3	2017-04-23 17:05:53 +02:00
Matthew Honnibal	040751ad17	Remove xfail on Test #910	2017-04-23 16:28:55 +02:00
ines	3a9710f356	Pass dev_scores to print_progress correctly (resolves #1008). Only read scores attribute if command is used with dev_data, otherwise default dev_scores to empty dict.	2017-04-23 15:58:40 +02:00
Matthew Honnibal	1b12f342e4	Merge branch 'master' of https://github.com/explosion/spaCy	2017-04-20 17:03:11 +02:00
Matthew Honnibal	4eef200bab	Persist the actions within spacy.parser.cfg	2017-04-20 17:02:44 +02:00
ines	25c70b4cc5	Move fix_text to spacy.compat (see #1002)	2017-04-20 15:47:17 +02:00
Ines Montani	60b5243bee	Merge pull request #1002 from oroszgy/model_cli_fix Fixes for the `model` CLI	2017-04-20 15:41:03 +02:00
Gyorgy Orosz	4a06a2572c	Using ftfy for handling broken encoded strings.	2017-04-20 13:34:51 +02:00
Ines Montani	3800b29046	Merge pull request #1001 from recognai/master Add SPACE to es tag map	2017-04-20 12:16:34 +02:00
oeg	f0bcd0babb	fix(model): Add SPACE to es tag_map. Fixing error in morphology.pyx when SP tag is missing	2017-04-20 11:36:24 +02:00
Ben Eyal	e90e8a3f10	Enable test	2017-04-20 02:25:24 +03:00
Ben Eyal	33af52599e	Redefine alphabetic characters. For caseless languages (Hebrew, Bengali) all characters are both lowercase and uppercase.	2017-04-20 02:25:02 +03:00
Ben Eyal	d8098a8be2	Use `regex` instead of `re`	2017-04-20 02:22:52 +03:00
oeg	daaa42dd25	Merge remote-tracking branch 'upstream/master'	2017-04-19 23:30:36 +02:00
oeg	936a297241	fix(model): Fix tag map for fixing issues with tag SPACE	2017-04-19 23:30:21 +02:00
luvogels	c7cec7e5e2	Update __init__.py	2017-04-19 21:06:30 +02:00
luvogels	55e8cade36	Update __init__.py	2017-04-19 21:06:30 +02:00


2901 Commits