Matthew Honnibal
727481377e
Add text-classifier thinc models
2017-07-20 00:17:17 +02:00
Matthew Honnibal
f014138c11
Fix parser tests
2017-07-20 00:16:52 +02:00
Jorge Paredes
fadacd0d47
Fix broken URL
The link to **custom named entities** was broken.
2017-07-16 10:06:32 -05:00
Ines Montani
2d22b63e09
Merge pull request #1186 from lgenerknol/master
.../cli/#foo is a 404
2017-07-13 17:33:55 +02:00
lgenerknol
2b219caf0d
.../cli/#foo is a 404
https://spacy.io/docs/usage/cli/#package is a 404.
Changed to https://spacy.io/docs/usage/cli#package
A larger fix to deal with trailing slashes is definitely possible.
2017-07-12 13:12:24 -04:00
Ines Montani
d79fa8743a
Merge pull request #1185 from lgenerknol/master
Missing markup char
2017-07-12 17:27:42 +02:00
lgenerknol
6cf2690943
Missing markup char
Frontend displayed:
```
If start_idx and do not mark[...]
```
Note the missing "end_idx" after 'and'.
2017-07-12 11:06:16 -04:00
Ines Montani
9eca6503c1
Merge pull request #1157 from polm/master
Add basic Japanese Tokenizer Test
2017-07-10 13:07:11 +02:00
Paul O'Leary McCann
bc87b815cc
Add comment clarifying what LANGUAGES does
2017-07-09 16:28:55 +09:00
Paul O'Leary McCann
04e6a65188
Remove Japanese from LANGUAGES
LANGUAGES is a list of languages whose tokenizers get run through a
variety of generic tests. Since the generic tests don't use the JA
fixture, they blow up when janome can't be found. -POLM
2017-07-09 16:23:26 +09:00
Ines Montani
2b9411bb54
Merge pull request #1181 from val314159/patch-1
Make this work in Python 2.7
2017-07-08 00:15:47 +02:00
val314159
19d4706f69
Make this work in Python 2.7
2017-07-07 13:18:17 -07:00
Swier
29720150f9
fix import of stop words in language data
2017-07-05 14:08:04 +02:00
Swier
f377c9c952
Rename stop_words.py to word_sets.py
2017-07-05 14:06:28 +02:00
Swier
5357874bf7
add Dutch numbers and ordinals
2017-07-05 14:03:30 +02:00
mollerhoj
85144835da
Add Tag_map for Danish
2017-07-03 15:52:55 +02:00
mollerhoj
64c732918a
Add Morph_rules. (TODO: Not working?)
2017-07-03 15:52:55 +02:00
mollerhoj
3b2cb107a3
Add like_num functionality to Danish
2017-07-03 15:49:51 +02:00
mollerhoj
e8f40ceed8
Add short names of months to tokenizer_exceptions
2017-07-03 15:49:51 +02:00
mollerhoj
e840077601
Add some basic tests for Danish
2017-07-03 15:49:51 +02:00
mollerhoj
23025d3b05
Clean up a couple of strange English stopwords
2017-07-03 15:41:59 +02:00
mollerhoj
dc5be7d2f3
Clean up list of Danish stopwords
2017-07-03 15:40:58 +02:00
Raphaël Bournhonesque
8592f3de47
Fix fuzzy unit tests
2017-07-01 15:03:32 +02:00
Raphaël Bournhonesque
f4748834d9
Use spacy hash_string function instead of md5
2017-07-01 13:17:26 +02:00
Raphaël Bournhonesque
c3d722d66f
Add a disclaimer about classes copied from the Jinja2 project
2017-07-01 13:09:56 +02:00
Ines Montani
84eb9d6bd3
Merge pull request #1167 from callumkift/fix/docs-ner-training
Fix error in NER training documentation and example
2017-07-01 11:46:31 +02:00
Ines Montani
c91642efd5
Port over changes from #1168
2017-07-01 11:43:54 +02:00
Ines Montani
0c7f5af5ee
Merge pull request #1168 from gispk47/master
Update zh language error
2017-07-01 11:43:12 +02:00
gispk47
669bd14213
Update __init__.py
Remove the empty strings returned by jieba.cut; they cause an assertion error because the list of tokens can't be pushed.
2017-07-01 13:12:00 +08:00
Callum Kift
dfaeee1f37
Fix bug in NER training documentation and example
2017-06-30 09:56:33 +02:00
Paul O'Leary McCann
c336193392
Parametrize and extend Japanese tokenizer tests
2017-06-29 00:09:40 +09:00
Paul O'Leary McCann
30a34ebb6e
Add importorskip for janome
2017-06-29 00:09:20 +09:00
Alexis
1b3a5d87ba
French NUM_WORDS and ORDINAL_WORDS
2017-06-28 14:11:20 +02:00
Jim O'Regan
70f4d26c10
bounds checks
2017-06-28 10:59:46 +01:00
Jim O'Regan
1ba38b2036
Add some helpers; the Irish part of UD only has 2,500 sentences, so this will need a source of morphology
2017-06-28 00:42:00 +01:00
Jim O'Regan
559e03605a
b'
2017-06-27 22:42:16 +01:00
Paul O'Leary McCann
e56fea14eb
Add basic Japanese tokenizer test
2017-06-28 01:24:25 +09:00
Paul O'Leary McCann
84041a2bb5
Make create_tokenizer work with Japanese
2017-06-28 01:18:05 +09:00
Ines Montani
f69ff15089
Update CONTRIBUTORS.md
2017-06-27 14:49:02 +02:00
Ines Montani
e265e34e18
Merge pull request #1153 from jimregan/polish
add tokeniser exceptions for Polish
2017-06-27 14:48:00 +02:00
Jim Regan
d81ceb0cd5
Merge branch 'develop' into polish
2017-06-26 22:42:27 +01:00
Jim O'Regan
2f84c73585
a start
2017-06-26 22:40:04 +01:00
Jim O'Regan
28d7f0a672
reference
2017-06-26 22:38:28 +01:00
Jim O'Regan
e12defdd9c
missed a couple
2017-06-26 22:24:14 +01:00
Jim O'Regan
c1e4e0f3bf
just now discovered that you can do multiwords
2017-06-26 22:19:39 +01:00
Jim O'Regan
5e5f94c1c0
fix dup
2017-06-26 21:57:00 +01:00
Jim O'Regan
a8dff9133e
add POS
2017-06-26 21:53:41 +01:00
Jim O'Regan
3c4d83aa6e
CLA
2017-06-26 21:32:48 +01:00
Jim O'Regan
e9213f54de
missed one
2017-06-26 21:29:21 +01:00
Jim O'Regan
1eb7cc3017
attempt a port from #1147
2017-06-26 21:24:55 +01:00