spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-04-04 09:14:12 +03:00

Author	SHA1	Message	Date
Paul O'Leary McCann	bc87b815cc	Add comment clarifying what LANGUAGES does	2017-07-09 16:28:55 +09:00
Paul O'Leary McCann	04e6a65188	Remove Japanese from LANGUAGES. LANGUAGES is a list of languages whose tokenizers get run through a variety of generic tests. Since the generic tests don't check the JA fixture, it blows up when it can't find janome. -POLM	2017-07-09 16:23:26 +09:00
Paul O'Leary McCann	c336193392	Parametrize and extend Japanese tokenizer tests	2017-06-29 00:09:40 +09:00
Paul O'Leary McCann	30a34ebb6e	Add importorskip for janome	2017-06-29 00:09:20 +09:00
Paul O'Leary McCann	e56fea14eb	Add basic Japanese tokenizer test	2017-06-28 01:24:25 +09:00
Paul O'Leary McCann	84041a2bb5	Make create_tokenizer work with Japanese	2017-06-28 01:18:05 +09:00
György Orosz	fa26041da6	Fixed typo in cli/package.py	2017-06-07 16:19:08 +02:00
Ines Montani	e7ef51b382	Update tokenizer_exceptions.py	2017-06-02 19:00:01 +02:00
Ines Montani	81918155ef	Merge pull request #1096 from recognai/master: Spanish model features	2017-06-02 11:07:27 +02:00
Francisco Aranda	70a2180199	fix(spanish sentence segmentation): remove tokenizer exceptions that break sentence segmentation. Aligned with training corpus	2017-06-02 08:19:57 +02:00
Francisco Aranda	5b385e7d78	feat(spanish model): add the spanish noun chunker	2017-06-02 08:14:06 +02:00
Ines Montani	7f6be41f21	Fix typo in English tokenizer exceptions (resolves #1071)	2017-05-23 12:18:00 +02:00
Raphaël Bournhonesque	6381ebfb14	Use yield from syntax	2017-05-18 10:42:35 +02:00
Raphaël Bournhonesque	f37d078d6a	Fix issue #1069 with custom hook `Doc.sents` definition	2017-05-18 09:59:38 +02:00
ines	9003fd25e5	Fix error messages if model is required (resolves #1051). Rename about.__docs__ to about.__docs_models__.	2017-05-13 13:14:02 +02:00
ines	24e973b17f	Rename about.__docs__ to about.__docs_models__	2017-05-13 13:09:00 +02:00
ines	6e1dbc608e	Fix parse_tree test	2017-05-13 12:34:20 +02:00
ines	573f0ba867	Replace deepcopy	2017-05-13 12:34:14 +02:00
ines	bd428c0a70	Set defaults for light and flat kwargs	2017-05-13 12:34:05 +02:00
ines	c5669450a0	Fix formatting	2017-05-13 12:33:57 +02:00
Matthew Honnibal	ad590feaa8	Fix test, which imported English incorrectly	2017-05-13 11:36:19 +02:00
Ines Montani	8d742ac8ff	Merge pull request #1055 from recognai/master: Enable pruning out rare words from clusters data	2017-05-13 03:22:56 +02:00
Matthew Honnibal	b2540d2379	Merge Kengz's tree_print patch	2017-05-13 03:18:49 +02:00
oeg	cdaefae60a	feature(populate_vocab): Enable pruning out rare words from clusters data	2017-05-12 16:15:19 +02:00
ines	b1f22c5a10	Fix formatting	2017-05-03 20:11:02 +02:00
ines	a04b5be1b2	Add glossary for annotation scheme (closes #1034). Can be imported as explain from spacy.glossary, or called as spacy.explain(term)	2017-05-03 17:02:17 +02:00
Ines Montani	3ea23a3f4d	Fix formatting	2017-05-03 09:44:38 +02:00
Ines Montani	d730eb0c0d	Raise custom ImportError if importing janome fails	2017-05-03 09:43:29 +02:00
Ines Montani	949ad6594b	Add newline	2017-05-03 09:38:43 +02:00
Ines Montani	d12ca587ea	Add newline	2017-05-03 09:38:29 +02:00
Ines Montani	8676cd0135	Add newline	2017-05-03 09:38:07 +02:00
Yasuaki Uechi	c8f83aeb87	Add basic Japanese support	2017-05-03 13:56:21 +09:00
Matthew Honnibal	31ec9e1371	Merge branch 'master' of https://github.com/explosion/spaCy	2017-04-27 13:21:39 +02:00
Matthew Honnibal	2da16adcc2	Add dropout option for parser and NER. Dropout can now be specified in the `Parser.update()` method via the `drop` keyword argument, e.g. `nlp.entity.update(doc, gold, drop=0.4)`. This will randomly drop 40% of features, and multiply the value of the others by 1. / 0.4. This may be useful for generalising from small data sets. This commit also patches the examples/training/train_new_entity_type.py example to use dropout and to fix the output (previously it did not output the learned entity).	2017-04-27 13:18:39 +02:00
Ines Montani	7da9cefd25	Merge pull request #1022 from luvogels/master: Initial support for Norwegian Bokmål	2017-04-27 11:16:06 +02:00
Ines Montani	c9e592ae6c	Add newline	2017-04-27 11:15:41 +02:00
Ines Montani	5942adccc2	Add newline	2017-04-27 11:15:19 +02:00
Ines Montani	4cd9269aef	Add newline	2017-04-27 11:15:04 +02:00
Ines Montani	ccf13ecc21	Add newline	2017-04-27 11:14:42 +02:00
Ines Montani	03d2b0cc05	Add newline	2017-04-27 11:14:26 +02:00
luvogels	d12a0b6431	Hooked up tokenizer tests	2017-04-26 23:21:41 +02:00
Matthew Honnibal	f0e1606d27	Increment version	2017-04-26 20:25:41 +02:00
luvogels	b331929a7e	Merge branch 'master' of https://github.com/luvogels/spaCy	2017-04-26 19:15:48 +02:00
luvogels	8de59ce3b9	Added tokenizer tests	2017-04-26 19:10:18 +02:00
Matthew Honnibal	4d98511db7	Make Span hashable. Closes #1019	2017-04-26 19:01:05 +02:00
Matthew Honnibal	24c4c51f13	Try to make test999 less flakey	2017-04-26 18:42:06 +02:00
Leif Uwe Vogelsang	460094bf09	Update __init__.py	2017-04-26 18:27:55 +02:00
ines	527d51ac9a	Fetch shortcuts from GitHub and improve error handling	2017-04-26 18:00:28 +02:00
Matthew Honnibal	c4be9c36fe	Fix unicode header in tests	2017-04-24 10:09:01 +02:00
Matthew Honnibal	65f10b53e5	Fix test	2017-04-24 00:25:55 +02:00
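
The dropout behaviour described in commit 2da16adcc2 can be illustrated with a small standalone sketch. This is not spaCy's implementation: the function name `inverted_dropout` and its `rng` parameter are hypothetical, and the sketch assumes the conventional "inverted dropout" scaling, in which surviving values are divided by the keep probability `1 - drop`. That differs from the `1. / 0.4` factor quoted in the commit message, so it may not match what the commit actually does.

```python
import random


def inverted_dropout(features, drop, rng=None):
    """Zero each feature with probability `drop` and rescale the survivors.

    Hypothetical helper for illustration only; not part of spaCy.
    """
    if not 0.0 <= drop < 1.0:
        raise ValueError("drop must be in [0, 1)")
    rng = rng or random.Random()
    keep = 1.0 - drop
    # Assumption: survivors are divided by the keep probability, so the
    # expected value of each feature is unchanged.
    return [x / keep if rng.random() < keep else 0.0 for x in features]


if __name__ == "__main__":
    print(inverted_dropout([1.0, 2.0, 3.0, 4.0], drop=0.4))
```

With `drop=0.4`, roughly 40% of the values become zero on average and the rest are scaled up by `1 / 0.6`; with `drop=0.0` the input is returned unchanged.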

2833 Commits