9021 Commits

Author SHA1 Message Date
mollerhoj
e840077601 Add some basic tests for Danish 2017-07-03 15:49:51 +02:00
mollerhoj
23025d3b05 Clean up a couple of strange English stopwords 2017-07-03 15:41:59 +02:00
mollerhoj
dc5be7d2f3 Cleanup list of Danish stopwords 2017-07-03 15:40:58 +02:00
Raphaël Bournhonesque
8592f3de47 Fix fuzzy unit tests 2017-07-01 15:03:32 +02:00
Raphaël Bournhonesque
f4748834d9 Use spacy hash_string function instead of md5 2017-07-01 13:17:26 +02:00
Raphaël Bournhonesque
c3d722d66f Add a disclaimer about classes copied from the Jinja2 project 2017-07-01 13:09:56 +02:00
Ines Montani
84eb9d6bd3 Merge pull request #1167 from callumkift/fix/docs-ner-training
Fixed error training NER documentation and example
2017-07-01 11:46:31 +02:00
Ines Montani
c91642efd5 Port over changes from #1168 2017-07-01 11:43:54 +02:00
Ines Montani
0c7f5af5ee Merge pull request #1168 from gispk47/master
Update zh language error
2017-07-01 11:43:12 +02:00
gispk47
669bd14213 Update __init__.py
remove the empty string return from jieba.cut,this will cause the list of tokens cant be pushed assert error
2017-07-01 13:12:00 +08:00
Callum Kift
dfaeee1f37 fixed bug in training ner documentation and example 2017-06-30 09:56:33 +02:00
Paul O'Leary McCann
c336193392 Parametrize and extend Japanese tokenizer tests 2017-06-29 00:09:40 +09:00
Paul O'Leary McCann
30a34ebb6e Add importorskip for janome 2017-06-29 00:09:20 +09:00
Alexis
1b3a5d87ba French NUM_WORDS and ORDINAL_WORDS 2017-06-28 14:11:20 +02:00
Jim O'Regan
70f4d26c10 bounds checks 2017-06-28 10:59:46 +01:00
Jim O'Regan
1ba38b2036 some helpers; the Irish part of UD only has 2500 sentences so this will need source of morphology 2017-06-28 00:42:00 +01:00
Jim O'Regan
559e03605a b' 2017-06-27 22:42:16 +01:00
Paul O'Leary McCann
e56fea14eb Add basic Japanese tokenizer test 2017-06-28 01:24:25 +09:00
Paul O'Leary McCann
84041a2bb5 Make create_tokenizer work with Japanese 2017-06-28 01:18:05 +09:00
Ines Montani
f69ff15089 Update CONTRIBUTORS.md 2017-06-27 14:49:02 +02:00
Ines Montani
e265e34e18 Merge pull request #1153 from jimregan/polish
add tokeniser exceptions for Polish
2017-06-27 14:48:00 +02:00
Jim Regan
d81ceb0cd5 Merge branch 'develop' into polish 2017-06-26 22:42:27 +01:00
Jim O'Regan
2f84c73585 a start 2017-06-26 22:40:04 +01:00
Jim O'Regan
28d7f0a672 reference 2017-06-26 22:38:28 +01:00
Jim O'Regan
e12defdd9c missed a couple 2017-06-26 22:24:14 +01:00
Jim O'Regan
c1e4e0f3bf just now discovered that you can do multiwords 2017-06-26 22:19:39 +01:00
Jim O'Regan
5e5f94c1c0 fix dup 2017-06-26 21:57:00 +01:00
Jim O'Regan
a8dff9133e add POS 2017-06-26 21:53:41 +01:00
Jim O'Regan
3c4d83aa6e CLA 2017-06-26 21:32:48 +01:00
Jim O'Regan
e9213f54de missed one 2017-06-26 21:29:21 +01:00
Jim O'Regan
1eb7cc3017 attempt a port from #1147 2017-06-26 21:24:55 +01:00
Ines Montani
d6e08f2bf6 Merge pull request #1142 from garfieldnate/patch-1
fix confusing typo
2017-06-26 10:41:47 +02:00
Ines Montani
01c7c09c7f Merge pull request #1146 from jarle/doc-patch
Fix small typo in the new spaCy 101 guide
2017-06-26 10:41:18 +02:00
Jarle Mathiesen
f20533ec0c fix small typo 2017-06-24 12:31:33 +02:00
Nathan Glenn
81166c3d56 fix confusing typo
This document describes the `Vocab` class, not the `Span` class.
2017-06-21 19:22:30 +02:00
Matthew Honnibal
91e52543ef Merge pull request #1118 from Gregory-Howard/patch-2
Update _tokenizer_exceptions_list (adding cities)
2017-06-20 11:16:07 +02:00
Matthew Honnibal
8ea785e01a Merge pull request #1119 from oroszgy/patch-3
Fixed conllu converter
2017-06-20 11:14:41 +02:00
Ines Montani
9335736c20 Merge pull request #1127 from bartbroere/master
Fixed a minor typo in the documentation
2017-06-13 13:15:20 +02:00
Ines Montani
f64e3efc76 Merge pull request #1128 from thinline72/patch-1
Changed the capital of Lithuania to Vilnius
2017-06-13 13:14:43 +02:00
Savva Kolbachev
800a8faff4 Changed the capital of Lithuania to Vilnius
Hi,
There is a typo about the capital of Lithuania.

Vilnius is the capital of Lithuania https://en.wikipedia.org/wiki/Vilnius
Ljubljana is the capital of Slovenia https://en.wikipedia.org/wiki/Ljubljana
2017-06-12 23:27:00 +03:00
Bart Broere
e3be243e06 Merge pull request #1 from explosion/master
Update
2017-06-12 22:06:59 +02:00
Ines Montani
6eae9f943a Merge pull request #1125 from Tpt/french_noun_chunks
Adds function to extract french noun chunks
2017-06-12 21:25:33 +02:00
Ines Montani
57f64b9e1c Merge pull request #1124 from v3t3a/patch-3
docs - Fix url error for Displacy Ent visualizer
2017-06-12 21:20:32 +02:00
Ines Montani
b2a28028cf Merge pull request #1115 from v3t3a/patch-2
docs - Add read() method when opening file (Lightning tour)
2017-06-12 21:19:25 +02:00
Ines Montani
fe8d136ae0 Merge pull request #1114 from v3t3a/patch-1
docs - Update doc.jade (Just remove a duplicate 'doc =')
2017-06-12 21:19:02 +02:00
Ines Montani
6b94c3cf00 Merge pull request #1126 from ianmobbs/master
Added html5lib==1.0b8 to requirements.txt
2017-06-12 21:18:24 +02:00
Ian Mobbs
d19ce29a23 Create requirements.txt 2017-06-12 13:21:44 -04:00
Tpt
7745b3ae04 Adds noun chunks to French syntax iterators 2017-06-12 15:29:58 +02:00
Tpt
57e8254f63 Adds function to extract french noun chunks 2017-06-12 15:20:49 +02:00
Vetea
eae1f7b19c Fix url error for Displacy Ent visualizer 2017-06-12 14:30:02 +02:00