Raphaël Bournhonesque
8ff4f512a2
Check in PatternParser that the generated Pattern is valid
2017-06-11 18:28:36 +02:00
Raphaël Bournhonesque
e55199d454
Implementation of Pattern
2017-06-11 01:06:24 +02:00
György Orosz
fa26041da6
Fixed typo in cli/package.py
2017-06-07 16:19:08 +02:00
Ines Montani
e7ef51b382
Update tokenizer_exceptions.py
2017-06-02 19:00:01 +02:00
Ines Montani
81918155ef
Merge pull request #1096 from recognai/master
Spanish model features
2017-06-02 11:07:27 +02:00
Francisco Aranda
70a2180199
fix(spanish sentence segmentation): remove tokenizer exceptions that break sentence segmentation. Aligned with training corpus
2017-06-02 08:19:57 +02:00
Francisco Aranda
5b385e7d78
feat(spanish model): add the spanish noun chunker
2017-06-02 08:14:06 +02:00
Ines Montani
7f6be41f21
Fix typo in English tokenizer exceptions ( resolves #1071 )
2017-05-23 12:18:00 +02:00
Raphaël Bournhonesque
6381ebfb14
Use yield from syntax
2017-05-18 10:42:35 +02:00
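The commit title refers to Python 3.3+ `yield from` delegation. Below is a minimal illustration of that kind of refactor. Both function names are hypothetical and are not taken from the spaCy source.

```python
def sents_loop(items):
    # Older style: re-yield every item from the inner iterable by hand
    for item in items:
        yield item


def sents_delegate(items):
    # `yield from` hands iteration straight to the inner iterable
    yield from items


print(list(sents_loop(["A.", "B."])) == list(sents_delegate(["A.", "B."])))
```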
Raphaël Bournhonesque
f37d078d6a
Fix issue #1069 with custom hook Doc.sents definition
2017-05-18 09:59:38 +02:00
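The sketch below shows a `sents` property that defers to a user-supplied hook stored in a dictionary. `MiniDoc` and `one_sentence` are hypothetical and only loosely modelled on the idea of a `user_hooks` mapping. This is not spaCy's `Doc` implementation.

The sketch also illustrates a general Python pitfall. Inside a generator, `return hook(self)` does not yield the hook's result, so delegating with `yield from` is one way to forward it.

```python
class MiniDoc:
    """Hypothetical stand-in for a document with user-overridable hooks."""

    def __init__(self, text, user_hooks=None):
        self.text = text
        self.user_hooks = user_hooks or {}

    @property
    def sents(self):
        if "sents" in self.user_hooks:
            # This property is a generator: a plain `return value` here would not
            # be yielded, so delegate to the hook's iterable instead.
            yield from self.user_hooks["sents"](self)
            return
        # Default behaviour (illustrative only): naive split on ". "
        for chunk in self.text.split(". "):
            if chunk:
                yield chunk


def one_sentence(doc):
    # Example hook: treat the whole text as a single sentence
    return [doc.text]


print(list(MiniDoc("First one. Second one").sents))
print(list(MiniDoc("First one. Second one", {"sents": one_sentence}).sents))
```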
ines
9003fd25e5
Fix error messages if model is required ( resolves #1051 )
Rename about.__docs__ to about.__docs_models__.
2017-05-13 13:14:02 +02:00
ines
24e973b17f
Rename about.__docs__ to about.__docs_models__
2017-05-13 13:09:00 +02:00
ines
6e1dbc608e
Fix parse_tree test
2017-05-13 12:34:20 +02:00
ines
573f0ba867
Replace deepcopy
2017-05-13 12:34:14 +02:00
ines
bd428c0a70
Set defaults for light and flat kwargs
2017-05-13 12:34:05 +02:00
ines
c5669450a0
Fix formatting
2017-05-13 12:33:57 +02:00
Matthew Honnibal
ad590feaa8
Fix test, which imported English incorrectly
2017-05-13 11:36:19 +02:00
Ines Montani
8d742ac8ff
Merge pull request #1055 from recognai/master
Enable pruning out rare words from clusters data
2017-05-13 03:22:56 +02:00
Matthew Honnibal
b2540d2379
Merge Kengz's tree_print patch
2017-05-13 03:18:49 +02:00
oeg
cdaefae60a
feature(populate_vocab): Enable pruning out rare words from clusters data
2017-05-12 16:15:19 +02:00
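The sketch below shows one way to prune rare words from word-cluster data before loading it into a vocabulary. It assumes a tab-separated `<bitstring> <word> <count>` layout, as written by common Brown clustering tools. The function name `prune_clusters` and the `min_freq` threshold are hypothetical, not the `populate_vocab` code from this commit.

```python
def prune_clusters(cluster_lines, min_freq):
    """Map word -> cluster bitstring, dropping words seen fewer than min_freq times.

    Assumes each line is "<bitstring>\t<word>\t<count>" (an assumption, not
    the spaCy loader's documented format).
    """
    clusters = {}
    for line in cluster_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed lines
        bits, word, count = parts
        if int(count) >= min_freq:
            clusters[word] = bits
    return clusters


lines = ["0101\tthe\t5000", "0110\tzyzzyva\t1", "111\tcat\t40"]
print(prune_clusters(lines, min_freq=10))
```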
ines
b1f22c5a10
Fix formatting
2017-05-03 20:11:02 +02:00
ines
a04b5be1b2
Add glossary for annotation scheme ( closes #1034 )
Can be imported as explain from spacy.glossary, or called as
spacy.explain(term)
2017-05-03 17:02:17 +02:00
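A glossary like this is, at its core, a lookup from annotation labels to human-readable descriptions. The sketch below is a minimal version of that idea. The `GLOSSARY` entries are an illustrative subset chosen for this example, not the contents of `spacy.glossary`. Returning `None` for unknown labels is also an assumption of the sketch.

```python
# Illustrative subset only; the real glossary covers many more labels.
GLOSSARY = {
    "ADJ": "adjective",
    "NOUN": "noun",
    "nsubj": "nominal subject",
    "GPE": "countries, cities, states",
}


def explain(term):
    """Return a description of an annotation label, or None if it is unknown."""
    return GLOSSARY.get(term)


print(explain("nsubj"))
```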
Ines Montani
3ea23a3f4d
Fix formatting
2017-05-03 09:44:38 +02:00
Ines Montani
d730eb0c0d
Raise custom ImportError if importing janome fails
2017-05-03 09:43:29 +02:00
Ines Montani
949ad6594b
Add newline
2017-05-03 09:38:43 +02:00
Ines Montani
d12ca587ea
Add newline
2017-05-03 09:38:29 +02:00
Ines Montani
8676cd0135
Add newline
2017-05-03 09:38:07 +02:00
Yasuaki Uechi
c8f83aeb87
Add basic japanese support
2017-05-03 13:56:21 +09:00
Matthew Honnibal
31ec9e1371
Merge branch 'master' of https://github.com/explosion/spaCy
2017-04-27 13:21:39 +02:00
Matthew Honnibal
2da16adcc2
Add dropout option for parser and NER
Dropout can now be specified in the `Parser.update()` method via
the `drop` keyword argument, e.g.
nlp.entity.update(doc, gold, drop=0.4)
This will randomly drop 40% of features, and multiply the value of the
others by 1. / 0.4. This may be useful for generalising from small data
sets.
This commit also patches the examples/training/train_new_entity_type.py
example, to use dropout and fix the output (previously it did not output
the learned entity).
2017-04-27 13:18:39 +02:00
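The sketch below shows generic feature dropout on a list of values. It uses the common "inverted dropout" convention: each value is zeroed with probability `drop`, and survivors are scaled by `1 / (1 - drop)` so the expected value of each feature is unchanged. This textbook formulation is swapped in for illustration and is not the parser's actual code. The function name `dropout` is hypothetical.

```python
import random


def dropout(features, drop, rng=random):
    """Zero each value with probability `drop`; scale survivors by 1 / (1 - drop)."""
    if not 0.0 <= drop < 1.0:
        raise ValueError("drop must be in [0, 1)")
    scale = 1.0 / (1.0 - drop)
    return [0.0 if rng.random() < drop else value * scale for value in features]


rng = random.Random(0)  # seeded so the example is reproducible
print(dropout([1.0, 2.0, 3.0, 4.0], drop=0.4, rng=rng))
```

Dropout is applied only while training; at prediction time the features are used unchanged.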
Ines Montani
7da9cefd25
Merge pull request #1022 from luvogels/master
Initial support for Norwegian Bokmål
2017-04-27 11:16:06 +02:00
Ines Montani
c9e592ae6c
Add newline
2017-04-27 11:15:41 +02:00
Ines Montani
5942adccc2
Add newline
2017-04-27 11:15:19 +02:00
Ines Montani
4cd9269aef
Add newline
2017-04-27 11:15:04 +02:00
Ines Montani
ccf13ecc21
Add newline
2017-04-27 11:14:42 +02:00
Ines Montani
03d2b0cc05
Add newline
2017-04-27 11:14:26 +02:00
luvogels
d12a0b6431
Hooked up tokenizer tests
2017-04-26 23:21:41 +02:00
Matthew Honnibal
f0e1606d27
Increment version
2017-04-26 20:25:41 +02:00
luvogels
b331929a7e
Merge branch 'master' of https://github.com/luvogels/spaCy
2017-04-26 19:15:48 +02:00
luvogels
8de59ce3b9
Added tokenizer tests
2017-04-26 19:10:18 +02:00
Matthew Honnibal
4d98511db7
Make Span hashable. Closes #1019
2017-04-26 19:01:05 +02:00
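In Python 3, a class that defines `__eq__` without `__hash__` becomes unhashable, so its instances cannot go in sets or serve as dict keys. The sketch below shows the general fix: hash the same tuple that equality compares. `SpanSketch` and its fields are hypothetical, not spaCy's `Span`.

```python
class SpanSketch:
    """Hypothetical span: the slice [start, end) of a parent document."""

    def __init__(self, doc_id, start, end, label=""):
        self.doc_id = doc_id
        self.start = start
        self.end = end
        self.label = label

    def _key(self):
        return (self.doc_id, self.start, self.end, self.label)

    def __eq__(self, other):
        if not isinstance(other, SpanSketch):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Equal spans must hash equally, so hash the tuple __eq__ compares
        return hash(self._key())


spans = {SpanSketch(1, 0, 2), SpanSketch(1, 0, 2), SpanSketch(1, 2, 4)}
print(len(spans))
```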
Matthew Honnibal
24c4c51f13
Try to make test999 less flakey
2017-04-26 18:42:06 +02:00
Leif Uwe Vogelsang
460094bf09
Update __init__.py
2017-04-26 18:27:55 +02:00
ines
527d51ac9a
Fetch shortcuts from GitHub and improve error handling
2017-04-26 18:00:28 +02:00
Matthew Honnibal
c4be9c36fe
Fix unicode header in tests
2017-04-24 10:09:01 +02:00
Matthew Honnibal
65f10b53e5
Fix test
2017-04-24 00:25:55 +02:00
Matthew Honnibal
70a43858e1
Fix flakey test
2017-04-24 00:06:30 +02:00
Matthew Honnibal
3973af2d15
Make training test less flakey
2017-04-23 22:59:34 +02:00
Matthew Honnibal
4f9657b42b
Fix reporting if no dev data with train
2017-04-23 22:27:10 +02:00
Matthew Honnibal
df2ac8b843
Merge branch 'master' of https://github.com/explosion/spaCy
2017-04-23 21:25:07 +02:00