Commit Graph

2833 Commits

Author SHA1 Message Date
Paul O'Leary McCann
bc87b815cc Add comment clarifying what LANGUAGES does 2017-07-09 16:28:55 +09:00
Paul O'Leary McCann
04e6a65188 Remove Japanese from LANGUAGES
LANGUAGES is a list of languages whose tokenizers get run through a
variety of generic tests. Since the generic tests don't check the JA
fixture, it blows up when it can't find janome. -POLM
2017-07-09 16:23:26 +09:00
Paul O'Leary McCann
c336193392 Parametrize and extend Japanese tokenizer tests 2017-06-29 00:09:40 +09:00
Paul O'Leary McCann
30a34ebb6e Add importorskip for janome 2017-06-29 00:09:20 +09:00
Paul O'Leary McCann
e56fea14eb Add basic Japanese tokenizer test 2017-06-28 01:24:25 +09:00
Paul O'Leary McCann
84041a2bb5 Make create_tokenizer work with Japanese 2017-06-28 01:18:05 +09:00
György Orosz
fa26041da6 Fixed typo in cli/package.py 2017-06-07 16:19:08 +02:00
Ines Montani
e7ef51b382 Update tokenizer_exceptions.py 2017-06-02 19:00:01 +02:00
Ines Montani
81918155ef Merge pull request #1096 from recognai/master
Spanish model features
2017-06-02 11:07:27 +02:00
Francisco Aranda
70a2180199 fix(spanish sentence segmentation): remove tokenizer exceptions that break sentence segmentation. Aligned with training corpus 2017-06-02 08:19:57 +02:00
Francisco Aranda
5b385e7d78 feat(spanish model): add the spanish noun chunker 2017-06-02 08:14:06 +02:00
Ines Montani
7f6be41f21 Fix typo in English tokenizer exceptions (resolves #1071) 2017-05-23 12:18:00 +02:00
Raphaël Bournhonesque
6381ebfb14 Use yield from syntax 2017-05-18 10:42:35 +02:00
Raphaël Bournhonesque
f37d078d6a Fix issue #1069 with custom hook Doc.sents definition 2017-05-18 09:59:38 +02:00
ines
9003fd25e5 Fix error messages if model is required (resolves #1051)
Rename about.__docs__ to about.__docs_models__.
2017-05-13 13:14:02 +02:00
ines
24e973b17f Rename about.__docs__ to about.__docs_models__ 2017-05-13 13:09:00 +02:00
ines
6e1dbc608e Fix parse_tree test 2017-05-13 12:34:20 +02:00
ines
573f0ba867 Replace deepcopy 2017-05-13 12:34:14 +02:00
ines
bd428c0a70 Set defaults for light and flat kwargs 2017-05-13 12:34:05 +02:00
ines
c5669450a0 Fix formatting 2017-05-13 12:33:57 +02:00
Matthew Honnibal
ad590feaa8 Fix test, which imported English incorrectly 2017-05-13 11:36:19 +02:00
Ines Montani
8d742ac8ff Merge pull request #1055 from recognai/master
Enable pruning out rare words from clusters data
2017-05-13 03:22:56 +02:00
Matthew Honnibal
b2540d2379 Merge Kengz's tree_print patch 2017-05-13 03:18:49 +02:00
oeg
cdaefae60a feature(populate_vocab): Enable pruning out rare words from clusters data 2017-05-12 16:15:19 +02:00
ines
b1f22c5a10 Fix formatting 2017-05-03 20:11:02 +02:00
ines
a04b5be1b2 Add glossary for annotation scheme (closes #1034)
Can be imported as explain from spacy.glossary, or called as
spacy.explain(term)
2017-05-03 17:02:17 +02:00
Ines Montani
3ea23a3f4d Fix formatting 2017-05-03 09:44:38 +02:00
Ines Montani
d730eb0c0d Raise custom ImportError if importing janome fails 2017-05-03 09:43:29 +02:00
Ines Montani
949ad6594b Add newline 2017-05-03 09:38:43 +02:00
Ines Montani
d12ca587ea Add newline 2017-05-03 09:38:29 +02:00
Ines Montani
8676cd0135 Add newline 2017-05-03 09:38:07 +02:00
Yasuaki Uechi
c8f83aeb87 Add basic Japanese support 2017-05-03 13:56:21 +09:00
Matthew Honnibal
31ec9e1371 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-27 13:21:39 +02:00
Matthew Honnibal
2da16adcc2 Add dropout option for parser and NER
Dropout can now be specified in the `Parser.update()` method via
the `drop` keyword argument, e.g.

    nlp.entity.update(doc, gold, drop=0.4)

This will randomly drop 40% of features, and multiply the value of the
others by 1. / 0.4. This may be useful for generalising from small data
sets.

This commit also patches the examples/training/train_new_entity_type.py
example, to use dropout and fix the output (previously it did not output
the learned entity).
2017-04-27 13:18:39 +02:00
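The feature dropout described in commit 2da16adcc2 can be illustrated with a minimal sketch. This is not spaCy's implementation: `dropout_features` is a hypothetical helper, and the scaling factor of `1. / drop` is taken directly from the wording of the commit message. Many inverted-dropout implementations scale the surviving values by `1. / (1. - drop)` instead, so treat the factor here as an assumption.

```python
import random


def dropout_features(values, drop, rng=None):
    """Hypothetical helper, not part of spaCy.

    Zero each feature value with probability `drop` and multiply the
    surviving values by 1. / drop, as the commit message describes.
    """
    rng = rng or random.Random()
    if drop <= 0.0:
        # No dropout requested: return the values unchanged.
        return list(values)
    scale = 1.0 / drop
    return [0.0 if rng.random() < drop else v * scale for v in values]


if __name__ == "__main__":
    # With drop=0.4, roughly 40% of the values are expected to be zeroed.
    print(dropout_features([1.0, 2.0, 3.0, 4.0], drop=0.4, rng=random.Random(0)))
```

Passing a seeded `random.Random` instance makes the dropped positions reproducible, which is convenient when comparing training runs.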
Ines Montani
7da9cefd25 Merge pull request #1022 from luvogels/master
Initial support for Norwegian Bokmål
2017-04-27 11:16:06 +02:00
Ines Montani
c9e592ae6c Add newline 2017-04-27 11:15:41 +02:00
Ines Montani
5942adccc2 Add newline 2017-04-27 11:15:19 +02:00
Ines Montani
4cd9269aef Add newline 2017-04-27 11:15:04 +02:00
Ines Montani
ccf13ecc21 Add newline 2017-04-27 11:14:42 +02:00
Ines Montani
03d2b0cc05 Add newline 2017-04-27 11:14:26 +02:00
luvogels
d12a0b6431 Hooked up tokenizer tests 2017-04-26 23:21:41 +02:00
Matthew Honnibal
f0e1606d27 Increment version 2017-04-26 20:25:41 +02:00
luvogels
b331929a7e Merge branch 'master' of https://github.com/luvogels/spaCy 2017-04-26 19:15:48 +02:00
luvogels
8de59ce3b9 Added tokenizer tests 2017-04-26 19:10:18 +02:00
Matthew Honnibal
4d98511db7 Make Span hashable. Closes #1019 2017-04-26 19:01:05 +02:00
Matthew Honnibal
24c4c51f13 Try to make test999 less flakey 2017-04-26 18:42:06 +02:00
Leif Uwe Vogelsang
460094bf09 Update __init__.py 2017-04-26 18:27:55 +02:00
ines
527d51ac9a Fetch shortcuts from GitHub and improve error handling 2017-04-26 18:00:28 +02:00
Matthew Honnibal
c4be9c36fe Fix unicode header in tests 2017-04-24 10:09:01 +02:00
Matthew Honnibal
65f10b53e5 Fix test 2017-04-24 00:25:55 +02:00