Paul O'Leary McCann
|
c336193392
|
Parametrize and extend Japanese tokenizer tests
|
2017-06-29 00:09:40 +09:00 |
|
Paul O'Leary McCann
|
30a34ebb6e
|
Add importorskip for janome
|
2017-06-29 00:09:20 +09:00 |
|
Alexis
|
1b3a5d87ba
|
French NUM_WORDS and ORDINAL_WORDS
|
2017-06-28 14:11:20 +02:00 |
|
Jim O'Regan
|
70f4d26c10
|
bounds checks
|
2017-06-28 10:59:46 +01:00 |
|
Jim O'Regan
|
1ba38b2036
|
some helpers; the Irish part of UD only has 2500 sentences so this will need source of morphology
|
2017-06-28 00:42:00 +01:00 |
|
Jim O'Regan
|
559e03605a
|
b'
|
2017-06-27 22:42:16 +01:00 |
|
Paul O'Leary McCann
|
e56fea14eb
|
Add basic Japanese tokenizer test
|
2017-06-28 01:24:25 +09:00 |
|
Paul O'Leary McCann
|
84041a2bb5
|
Make create_tokenizer work with Japanese
|
2017-06-28 01:18:05 +09:00 |
|
Ines Montani
|
f69ff15089
|
Update CONTRIBUTORS.md
|
2017-06-27 14:49:02 +02:00 |
|
Ines Montani
|
e265e34e18
|
Merge pull request #1153 from jimregan/polish
add tokeniser exceptions for Polish
|
2017-06-27 14:48:00 +02:00 |
|
Jim Regan
|
d81ceb0cd5
|
Merge branch 'develop' into polish
|
2017-06-26 22:42:27 +01:00 |
|
Jim O'Regan
|
2f84c73585
|
a start
|
2017-06-26 22:40:04 +01:00 |
|
Jim O'Regan
|
28d7f0a672
|
reference
|
2017-06-26 22:38:28 +01:00 |
|
Jim O'Regan
|
e12defdd9c
|
missed a couple
|
2017-06-26 22:24:14 +01:00 |
|
Jim O'Regan
|
c1e4e0f3bf
|
just now discovered that you can do multiwords
|
2017-06-26 22:19:39 +01:00 |
|
Jim O'Regan
|
5e5f94c1c0
|
fix dup
|
2017-06-26 21:57:00 +01:00 |
|
Jim O'Regan
|
a8dff9133e
|
add POS
|
2017-06-26 21:53:41 +01:00 |
|
Jim O'Regan
|
3c4d83aa6e
|
CLA
|
2017-06-26 21:32:48 +01:00 |
|
Jim O'Regan
|
e9213f54de
|
missed one
|
2017-06-26 21:29:21 +01:00 |
|
Jim O'Regan
|
1eb7cc3017
|
attempt a port from #1147
|
2017-06-26 21:24:55 +01:00 |
|
Ines Montani
|
d6e08f2bf6
|
Merge pull request #1142 from garfieldnate/patch-1
fix confusing typo
|
2017-06-26 10:41:47 +02:00 |
|
Ines Montani
|
01c7c09c7f
|
Merge pull request #1146 from jarle/doc-patch
Fix small typo in the new spaCy 101 guide
|
2017-06-26 10:41:18 +02:00 |
|
Jarle Mathiesen
|
f20533ec0c
|
fix small typo
|
2017-06-24 12:31:33 +02:00 |
|
Nathan Glenn
|
81166c3d56
|
fix confusing typo
This document describes the `Vocab` class, not the `Span` class.
|
2017-06-21 19:22:30 +02:00 |
|
Matthew Honnibal
|
91e52543ef
|
Merge pull request #1118 from Gregory-Howard/patch-2
Update _tokenizer_exceptions_list (adding cities)
|
2017-06-20 11:16:07 +02:00 |
|
Matthew Honnibal
|
8ea785e01a
|
Merge pull request #1119 from oroszgy/patch-3
Fixed conllu converter
|
2017-06-20 11:14:41 +02:00 |
|
Ines Montani
|
9335736c20
|
Merge pull request #1127 from bartbroere/master
Fixed a minor typo in the documentation
|
2017-06-13 13:15:20 +02:00 |
|
Ines Montani
|
f64e3efc76
|
Merge pull request #1128 from thinline72/patch-1
Changed the capital of Lithuania to Vilnius
|
2017-06-13 13:14:43 +02:00 |
|
Savva Kolbachev
|
800a8faff4
|
Changed the capital of Lithuania to Vilnius
Hi,
There is a typo about the capital of Lithuania.
Vilnius is the capital of Lithuania https://en.wikipedia.org/wiki/Vilnius
Ljubljana is the capital of Slovenia https://en.wikipedia.org/wiki/Ljubljana
|
2017-06-12 23:27:00 +03:00 |
|
Bart Broere
|
e3be243e06
|
Merge pull request #1 from explosion/master
Update
|
2017-06-12 22:06:59 +02:00 |
|
Ines Montani
|
6eae9f943a
|
Merge pull request #1125 from Tpt/french_noun_chunks
Adds function to extract french noun chunks
|
2017-06-12 21:25:33 +02:00 |
|
Ines Montani
|
57f64b9e1c
|
Merge pull request #1124 from v3t3a/patch-3
docs - Fix url error for Displacy Ent visualizer
|
2017-06-12 21:20:32 +02:00 |
|
Ines Montani
|
b2a28028cf
|
Merge pull request #1115 from v3t3a/patch-2
docs - Add read() method when opening file (Lightning tour)
|
2017-06-12 21:19:25 +02:00 |
|
Ines Montani
|
fe8d136ae0
|
Merge pull request #1114 from v3t3a/patch-1
docs - Update doc.jade (Just remove a duplicate 'doc =')
|
2017-06-12 21:19:02 +02:00 |
|
Ines Montani
|
6b94c3cf00
|
Merge pull request #1126 from ianmobbs/master
Added html5lib==1.0b8 to requirements.txt
|
2017-06-12 21:18:24 +02:00 |
|
Ian Mobbs
|
d19ce29a23
|
Create requirements.txt
|
2017-06-12 13:21:44 -04:00 |
|
Tpt
|
7745b3ae04
|
Adds noun chunks to French syntax iterators
|
2017-06-12 15:29:58 +02:00 |
|
Tpt
|
57e8254f63
|
Adds function to extract french noun chunks
|
2017-06-12 15:20:49 +02:00 |
|
Vetea
|
eae1f7b19c
|
Fix url error for Displacy Ent visualizer
|
2017-06-12 14:30:02 +02:00 |
|
Bart Broere
|
e4a45ae55f
|
Very minor documentation fix
|
2017-06-12 12:28:51 +02:00 |
|
Raphaël Bournhonesque
|
46637369aa
|
Add basic unit tests for Pattern
|
2017-06-11 18:34:38 +02:00 |
|
Raphaël Bournhonesque
|
1849a110e3
|
Improve logging
|
2017-06-11 18:31:19 +02:00 |
|
Raphaël Bournhonesque
|
4289a21703
|
Add 'ent' to node matching key
|
2017-06-11 18:30:53 +02:00 |
|
Raphaël Bournhonesque
|
d010f5a123
|
Fix node matching bug caused by lower function
|
2017-06-11 18:30:28 +02:00 |
|
Raphaël Bournhonesque
|
4ca8a396a2
|
Do not add the root token to the adjacency map
|
2017-06-11 18:30:01 +02:00 |
|
Raphaël Bournhonesque
|
d9c567371f
|
Move add_node and add_edge methods to the Tree base class
|
2017-06-11 18:29:28 +02:00 |
|
Raphaël Bournhonesque
|
8ff4f512a2
|
Check in PatternParser that the generated Pattern is valid
|
2017-06-11 18:28:36 +02:00 |
|
Raphaël Bournhonesque
|
e55199d454
|
Implementation of Pattern
|
2017-06-11 01:06:24 +02:00 |
|
György Orosz
|
62dbf9025c
|
Fixed conllu converter
|
2017-06-09 22:53:56 +02:00 |
|
Grégory Howard
|
cd974b32b7
|
Update _tokenizer_exceptions_list (adding cities)
|
2017-06-09 17:58:18 +02:00 |
|