Ines Montani
9af04ea11f
Merge pull request #1161 from AlexisEidelman/patch-1
...
French NUM_WORDS and ORDINAL_WORDS
2017-07-22 13:40:46 +02:00
Matthew Honnibal
44dd247e73
Merge branch 'master' of https://github.com/explosion/spaCy
2017-07-22 13:35:30 +02:00
Matthew Honnibal
94267ec50f
Fix merge conflict in printer
2017-07-22 13:35:15 +02:00
Ines Montani
c7708dc736
Merge pull request #1177 from swierh/master
...
Dutch NUM_WORDS and ORDINAL_WORDS
2017-07-22 13:35:08 +02:00
Matthew Honnibal
5916d46ba8
Avoid use of deepcopy in printer
2017-07-22 13:34:01 +02:00
Ines Montani
9eca6503c1
Merge pull request #1157 from polm/master
...
Add basic Japanese Tokenizer Test
2017-07-10 13:07:11 +02:00
Paul O'Leary McCann
bc87b815cc
Add comment clarifying what LANGUAGES does
2017-07-09 16:28:55 +09:00
Paul O'Leary McCann
04e6a65188
Remove Japanese from LANGUAGES
...
LANGUAGES is a list of languages whose tokenizers get run through a
variety of generic tests. Since the generic tests don't check the JA
fixture, it blows up when it can't find janome. -POLM
2017-07-09 16:23:26 +09:00
Swier
29720150f9
fix import of stop words in language data
2017-07-05 14:08:04 +02:00
Swier
f377c9c952
Rename stop_words.py to word_sets.py
2017-07-05 14:06:28 +02:00
Swier
5357874bf7
add Dutch numbers and ordinals
2017-07-05 14:03:30 +02:00
gispk47
669bd14213
Update __init__.py
...
Remove the empty strings returned by jieba.cut; otherwise the list of tokens can't be pushed and an assertion error is raised.
2017-07-01 13:12:00 +08:00
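Note on 669bd14213: the fix described above amounts to filtering empty strings out of the segmenter output before the tokens are used. A minimal sketch of that idea follows. It assumes only that the segmenter yields strings; `fake_cut` is a hypothetical stand-in for jieba.cut so the example runs without jieba installed, and `drop_empty_tokens` is an illustrative name, not a spaCy function.

```python
def fake_cut(text):
    """Hypothetical stand-in for a segmenter such as jieba.cut.

    It ignores its input and yields a fixed sequence that includes
    empty strings, which is the situation the commit describes.
    """
    yield from ["我", "", "爱", "北京", ""]


def drop_empty_tokens(segments):
    """Return the segments as a list, skipping empty strings."""
    return [segment for segment in segments if segment]


words = drop_empty_tokens(fake_cut("我爱北京"))
print(words)
```

Whitespace-only segments are kept in this sketch; only zero-length strings are dropped.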
Paul O'Leary McCann
c336193392
Parametrize and extend Japanese tokenizer tests
2017-06-29 00:09:40 +09:00
Paul O'Leary McCann
30a34ebb6e
Add importorskip for janome
2017-06-29 00:09:20 +09:00
Alexis
1b3a5d87ba
French NUM_WORDS and ORDINAL_WORDS
2017-06-28 14:11:20 +02:00
Paul O'Leary McCann
e56fea14eb
Add basic Japanese tokenizer test
2017-06-28 01:24:25 +09:00
Paul O'Leary McCann
84041a2bb5
Make create_tokenizer work with Japanese
2017-06-28 01:18:05 +09:00
György Orosz
fa26041da6
Fixed typo in cli/package.py
2017-06-07 16:19:08 +02:00
Ines Montani
e7ef51b382
Update tokenizer_exceptions.py
2017-06-02 19:00:01 +02:00
Ines Montani
81918155ef
Merge pull request #1096 from recognai/master
...
Spanish model features
2017-06-02 11:07:27 +02:00
Francisco Aranda
70a2180199
fix(spanish sentence segmentation): remove tokenizer exceptions that break sentence segmentation. Aligned with training corpus
2017-06-02 08:19:57 +02:00
Francisco Aranda
5b385e7d78
feat(spanish model): add the spanish noun chunker
2017-06-02 08:14:06 +02:00
Ines Montani
7f6be41f21
Fix typo in English tokenizer exceptions (resolves #1071)
2017-05-23 12:18:00 +02:00
Raphaël Bournhonesque
6381ebfb14
Use yield from syntax
2017-05-18 10:42:35 +02:00
Raphaël Bournhonesque
f37d078d6a
Fix issue #1069 with custom hook Doc.sents definition
2017-05-18 09:59:38 +02:00
ines
9003fd25e5
Fix error messages if model is required (resolves #1051)
...
Rename about.__docs__ to about.__docs_models__.
2017-05-13 13:14:02 +02:00
ines
24e973b17f
Rename about.__docs__ to about.__docs_models__
2017-05-13 13:09:00 +02:00
ines
6e1dbc608e
Fix parse_tree test
2017-05-13 12:34:20 +02:00
ines
573f0ba867
Replace deepcopy
2017-05-13 12:34:14 +02:00
ines
bd428c0a70
Set defaults for light and flat kwargs
2017-05-13 12:34:05 +02:00
ines
c5669450a0
Fix formatting
2017-05-13 12:33:57 +02:00
Matthew Honnibal
ad590feaa8
Fix test, which imported English incorrectly
2017-05-13 11:36:19 +02:00
Ines Montani
8d742ac8ff
Merge pull request #1055 from recognai/master
...
Enable pruning out rare words from clusters data
2017-05-13 03:22:56 +02:00
Matthew Honnibal
b2540d2379
Merge Kengz's tree_print patch
2017-05-13 03:18:49 +02:00
oeg
cdaefae60a
feature(populate_vocab): Enable pruning out rare words from clusters data
2017-05-12 16:15:19 +02:00
ines
b1f22c5a10
Fix formatting
2017-05-03 20:11:02 +02:00
ines
a04b5be1b2
Add glossary for annotation scheme (closes #1034)
...
Can be imported as explain from spacy.glossary, or called as
spacy.explain(term)
2017-05-03 17:02:17 +02:00
Ines Montani
3ea23a3f4d
Fix formatting
2017-05-03 09:44:38 +02:00
Ines Montani
d730eb0c0d
Raise custom ImportError if importing janome fails
2017-05-03 09:43:29 +02:00
Ines Montani
949ad6594b
Add newline
2017-05-03 09:38:43 +02:00
Ines Montani
d12ca587ea
Add newline
2017-05-03 09:38:29 +02:00
Ines Montani
8676cd0135
Add newline
2017-05-03 09:38:07 +02:00
Yasuaki Uechi
c8f83aeb87
Add basic Japanese support
2017-05-03 13:56:21 +09:00
Matthew Honnibal
31ec9e1371
Merge branch 'master' of https://github.com/explosion/spaCy
2017-04-27 13:21:39 +02:00
Matthew Honnibal
2da16adcc2
Add dropout option for parser and NER
...
Dropout can now be specified in the `Parser.update()` method via
the `drop` keyword argument, e.g.
nlp.entity.update(doc, gold, drop=0.4)
This will randomly drop 40% of features, and multiply the value of the
others by 1. / 0.4. This may be useful for generalising from small data
sets.
This commit also patches the examples/training/train_new_entity_type.py
example, to use dropout and fix the output (previously it did not output
the learned entity).
2017-04-27 13:18:39 +02:00
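Note on 2da16adcc2: the dropout behaviour described in that commit message can be illustrated with a small standalone sketch. This is not spaCy's implementation. `dropout_features` is a hypothetical helper that works on a plain list of feature values. It uses the scaling factor the message states (1. / drop, so 1. / 0.4 = 2.5 for drop=0.4); conventional inverted dropout would instead scale the surviving values by 1. / (1. - drop).

```python
import random


def dropout_features(values, drop, rng=None):
    """Zero each value with probability `drop` and rescale the survivors.

    Survivors are multiplied by 1. / drop, following the wording of the
    commit message. This is an illustrative assumption, not spaCy's code.
    """
    if not 0.0 < drop < 1.0:
        # Nothing sensible to do outside (0, 1); return the values unchanged.
        return list(values)
    rng = rng or random.Random()
    scale = 1.0 / drop
    result = []
    for value in values:
        if rng.random() < drop:
            result.append(0.0)  # feature dropped
        else:
            result.append(value * scale)  # feature kept and rescaled
    return result


features = [1.0] * 10
print(dropout_features(features, drop=0.4, rng=random.Random(0)))
```

Passing a seeded `random.Random` makes the sketch reproducible; in training, a fresh random mask would be drawn on every update.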
Ines Montani
7da9cefd25
Merge pull request #1022 from luvogels/master
...
Initial support for Norwegian Bokmål
2017-04-27 11:16:06 +02:00
Ines Montani
c9e592ae6c
Add newline
2017-04-27 11:15:41 +02:00
Ines Montani
5942adccc2
Add newline
2017-04-27 11:15:19 +02:00
Ines Montani
4cd9269aef
Add newline
2017-04-27 11:15:04 +02:00
Ines Montani
ccf13ecc21
Add newline
2017-04-27 11:14:42 +02:00