5296 Commits

Author SHA1 Message Date
Ines Montani
c7708dc736 Merge pull request #1177 from swierh/master
Dutch NUM_WORDS and ORDINAL_WORDS
2017-07-22 13:35:08 +02:00
Matthew Honnibal
5916d46ba8 Avoid use of deepcopy in printer 2017-07-22 13:34:01 +02:00
Matthew Honnibal
a405660068 Add commit to tagger example 2017-07-22 13:32:48 +02:00
Matthew Honnibal
3fef5f642b Rename tagger training example 2017-07-22 13:29:15 +02:00
Matthew Honnibal
8bb443be4f Add standalone tagger training example 2017-07-22 13:28:51 +02:00
Ines Montani
7c66691790 Merge pull request #1197 from jsparedes/patch-1
Fix url broken
2017-07-21 14:05:26 +02:00
Jorge Paredes
fadacd0d47 Fix url broken
The related url to **custom named entities** was broken
2017-07-16 10:06:32 -05:00
Ines Montani
2d22b63e09 Merge pull request #1186 from lgenerknol/master
.../cli/#foo is 404
2017-07-13 17:33:55 +02:00
lgenerknol
2b219caf0d .../cli/#foo is 404
https://spacy.io/docs/usage/cli/#package is a 404.  
Changed to https://spacy.io/docs/usage/cli#package 

A larger fix that handles trailing slashes in general is definitely possible.
2017-07-12 13:12:24 -04:00
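
The commit above mentions that a more general fix for trailing slashes is possible. As a rough illustration only, and not the change actually made to the site, the helper below (the name `strip_trailing_slash` is hypothetical) removes the trailing slash from a URL's path while leaving the fragment intact, using only the standard library:

```python
from urllib.parse import urlsplit, urlunsplit


def strip_trailing_slash(url: str) -> str:
    """Drop trailing slashes from the path, keeping query and fragment."""
    parts = urlsplit(url)
    # Keep a bare "/" so the site root still has a valid path.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


# The broken link from the commit message maps to the working one.
print(strip_trailing_slash("https://spacy.io/docs/usage/cli/#package"))
```

Whether such normalization belongs in the link source, the templates, or the server routing is a design choice the commit leaves open.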
Ines Montani
d79fa8743a Merge pull request #1185 from lgenerknol/master
Missing markup char
2017-07-12 17:27:42 +02:00
lgenerknol
6cf2690943 Missing markup char
Frontend displayed: 
```
 If start_idx and do not mark[...]
```
Note the missing "end_idx" after 'and'.
2017-07-12 11:06:16 -04:00
Ines Montani
9eca6503c1 Merge pull request #1157 from polm/master
Add basic Japanese Tokenizer Test
2017-07-10 13:07:11 +02:00
Paul O'Leary McCann
bc87b815cc Add comment clarifying what LANGUAGES does 2017-07-09 16:28:55 +09:00
Paul O'Leary McCann
04e6a65188 Remove Japanese from LANGUAGES
LANGUAGES is a list of languages whose tokenizers get run through a
variety of generic tests. Since the generic tests don't check the JA
fixture, it blows up when it can't find janome. -POLM
2017-07-09 16:23:26 +09:00
Ines Montani
2b9411bb54 Merge pull request #1181 from val314159/patch-1
make this work in python2.7
2017-07-08 00:15:47 +02:00
val314159
19d4706f69 make this work in python2.7 2017-07-07 13:18:17 -07:00
Swier
29720150f9 fix import of stop words in language data 2017-07-05 14:08:04 +02:00
Swier
f377c9c952 Rename stop_words.py to word_sets.py 2017-07-05 14:06:28 +02:00
Swier
5357874bf7 add Dutch numbers and ordinals 2017-07-05 14:03:30 +02:00
Raphaël Bournhonesque
8592f3de47 Fix fuzzy unit tests 2017-07-01 15:03:32 +02:00
Raphaël Bournhonesque
f4748834d9 Use spacy hash_string function instead of md5 2017-07-01 13:17:26 +02:00
Raphaël Bournhonesque
c3d722d66f Add a disclaimer about classes copied from the Jinja2 project 2017-07-01 13:09:56 +02:00
Ines Montani
84eb9d6bd3 Merge pull request #1167 from callumkift/fix/docs-ner-training
Fixed error training NER documentation and example
2017-07-01 11:46:31 +02:00
Ines Montani
0c7f5af5ee Merge pull request #1168 from gispk47/master
Update zh language error
2017-07-01 11:43:12 +02:00
gispk47
669bd14213 Update __init__.py
Remove the empty strings returned by jieba.cut; they cause an assertion error when the list of tokens is pushed.
2017-07-01 13:12:00 +08:00
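
The fix described above amounts to discarding empty strings from the segmenter output before building tokens. A minimal sketch of that filtering step follows. It assumes the segments arrive as an iterable of strings, and the helper name `drop_empty` is hypothetical rather than taken from the actual `__init__.py`:

```python
def drop_empty(segments):
    """Return the segments as a list, skipping any empty strings."""
    return [segment for segment in segments if segment]


# Stand-in for segmenter output that contains an empty string.
print(drop_empty(["我", "", "爱", "北京"]))
```

The function accepts any iterable, so it would also work on a generator of segments.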
Callum Kift
dfaeee1f37 fixed bug in training ner documentation and example 2017-06-30 09:56:33 +02:00
Paul O'Leary McCann
c336193392 Parametrize and extend Japanese tokenizer tests 2017-06-29 00:09:40 +09:00
Paul O'Leary McCann
30a34ebb6e Add importorskip for janome 2017-06-29 00:09:20 +09:00
Alexis
1b3a5d87ba French NUM_WORDS and ORDINAL_WORDS 2017-06-28 14:11:20 +02:00
Paul O'Leary McCann
e56fea14eb Add basic Japanese tokenizer test 2017-06-28 01:24:25 +09:00
Paul O'Leary McCann
84041a2bb5 Make create_tokenizer work with Japanese 2017-06-28 01:18:05 +09:00
Ines Montani
f69ff15089 Update CONTRIBUTORS.md 2017-06-27 14:49:02 +02:00
Ines Montani
d6e08f2bf6 Merge pull request #1142 from garfieldnate/patch-1
fix confusing typo
2017-06-26 10:41:47 +02:00
Nathan Glenn
81166c3d56 fix confusing typo
This document describes the `Vocab` class, not the `Span` class.
2017-06-21 19:22:30 +02:00
Ines Montani
9335736c20 Merge pull request #1127 from bartbroere/master
Fixed a minor typo in the documentation
2017-06-13 13:15:20 +02:00
Bart Broere
e3be243e06 Merge pull request #1 from explosion/master
Update
2017-06-12 22:06:59 +02:00
Ines Montani
6b94c3cf00 Merge pull request #1126 from ianmobbs/master
Added html5lib==1.0b8 to requirements.txt
2017-06-12 21:18:24 +02:00
Ian Mobbs
d19ce29a23 Create requirements.txt 2017-06-12 13:21:44 -04:00
Bart Broere
e4a45ae55f Very minor documentation fix 2017-06-12 12:28:51 +02:00
Raphaël Bournhonesque
46637369aa Add basic unit tests for Pattern 2017-06-11 18:34:38 +02:00
Raphaël Bournhonesque
1849a110e3 Improve logging 2017-06-11 18:31:19 +02:00
Raphaël Bournhonesque
4289a21703 Add 'ent' to node matching key 2017-06-11 18:30:53 +02:00
Raphaël Bournhonesque
d010f5a123 Fix node matching bug caused by lower function 2017-06-11 18:30:28 +02:00
Raphaël Bournhonesque
4ca8a396a2 Do not add the root token to the adjacency map 2017-06-11 18:30:01 +02:00
Raphaël Bournhonesque
d9c567371f Move add_node and add_edge methods to the Tree base class 2017-06-11 18:29:28 +02:00
Raphaël Bournhonesque
8ff4f512a2 Check in PatternParser that the generated Pattern is valid 2017-06-11 18:28:36 +02:00
Raphaël Bournhonesque
e55199d454 Implementation of Pattern 2017-06-11 01:06:24 +02:00
Ines Montani
47aaecd974 Merge pull request #1109 from oroszgy/patch-2
Fixed typo in cli/package.py
2017-06-07 16:39:40 +02:00
György Orosz
fa26041da6 Fixed typo in cli/package.py 2017-06-07 16:19:08 +02:00
Ines Montani
3cceabbf32 Update README.rst 2017-06-06 14:39:54 +02:00