Matthew Honnibal
|
fde53be3b4
|
Move whole token match inside _split_affixes.
|
2016-12-30 17:11:50 -06:00 |
|
Matthew Honnibal
|
3ba7c167a8
|
Fix URL tests
|
2016-12-30 17:10:08 -06:00 |
|
Matthew Honnibal
|
9936a1b9b5
|
Merge branch 'tokenization_w_exception_patterns' of https://github.com/oroszgy/spaCy.hu into oroszgy-tokenization_w_exception_patterns
|
2016-12-30 14:53:40 -06:00 |
|
Magnus Burton
|
56e2219b65
|
Added Swedish city abbreviations
|
2016-12-30 21:17:34 +01:00 |
|
Magnus Burton
|
e935c950d8
|
Added months and days as abbreviations for Swedish
|
2016-12-30 21:08:44 +01:00 |
|
Matthew Honnibal
|
3e8d9c772e
|
Test interaction of token_match and punctuation
Check that the new token_match function applies after punctuation is split off.
|
2016-12-31 00:52:17 +11:00 |
|
Matthew Honnibal
|
74b921f394
|
Merge branch 'master' of ssh://github.com/explosion/spaCy into develop
|
2016-12-30 14:38:27 +01:00 |
|
Matthew Honnibal
|
623d94e14f
|
Whitespace
|
2016-12-31 00:30:28 +11:00 |
|
Matthew Honnibal
|
af81ac8bb0
|
Use thinc 6.0
|
2016-12-29 11:58:42 +01:00 |
|
Matthew Honnibal
|
9bac332688
|
Merge branch 'develop'
|
2016-12-29 11:43:38 +01:00 |
|
Ines Montani
|
9d39e7853a
|
Merge pull request #713 from petterhh/patch-1
Add PART to tag map
|
2016-12-28 18:51:09 +01:00 |
|
Petter Hohle
|
f112e7754e
|
Add PART to tag map
16 of the 17 PoS tags in the UD tag set are added; PART is missing.
|
2016-12-28 18:39:01 +01:00 |
|
Ines Montani
|
14295f9302
|
Update README.rst
|
2016-12-28 00:55:00 +01:00 |
|
Ines Montani
|
9f24eb3fd9
|
Update CONTRIBUTORS.md
|
2016-12-28 00:25:07 +01:00 |
|
Ines Montani
|
d1585959d9
|
Add Hungarian to alpha support overview
|
2016-12-27 22:31:41 +01:00 |
|
Ines Montani
|
decb7437ea
|
Update README.rst
|
2016-12-27 22:19:19 +01:00 |
|
Ines Montani
|
e80dad8616
|
Update version
|
2016-12-27 22:18:48 +01:00 |
|
Matthew Honnibal
|
f62db78dc3
|
Increment version
|
2016-12-27 21:11:22 +01:00 |
|
Matthew Honnibal
|
cade536d1e
|
Merge branch 'master' of ssh://github.com/explosion/spaCy
|
2016-12-27 21:04:10 +01:00 |
|
Matthew Honnibal
|
ce4539dafd
|
Allow the vocabulary to grow to 10,000, to prevent cold-start problem.
|
2016-12-27 21:03:45 +01:00 |
|
Ines Montani
|
ad3669cef5
|
Merge pull request #703 from magnusburton/master
Added Swedish abbreviations
|
2016-12-27 01:01:49 +01:00 |
|
Ines Montani
|
223142d3d3
|
Update CONTRIBUTORS.md
|
2016-12-27 00:49:26 +01:00 |
|
Ines Montani
|
78f754dd9a
|
Merge pull request #705 from oroszgy/hu_tokenizer
Initial support for Hungarian
|
2016-12-27 00:48:13 +01:00 |
|
Gyorgy Orosz
|
ef8f3103f2
|
Merge branch 'hu_tokenizer' of github.com:oroszgy/spaCy into hu_tokenizer
|
2016-12-26 22:39:17 +01:00 |
|
Gyorgy Orosz
|
ade7487ff8
|
Accepted contributor agreement.
|
2016-12-26 22:37:02 +01:00 |
|
Ines Montani
|
b7becaec85
|
Fix typo
|
2016-12-25 15:23:32 +01:00 |
|
Ines Montani
|
6dd8ae1b0d
|
Update README.md
|
2016-12-25 14:43:40 +01:00 |
|
Ines Montani
|
f6f6e028ea
|
Make links detect target automatically and replace false with null for no attribute
New version of Harp would render attribute=false as attribute="false",
while attribute=null renders element without attribute.
|
2016-12-24 12:24:04 +01:00 |
|
Ines Montani
|
b893126c12
|
Use link mixin instead of plain link markup
|
2016-12-24 12:22:52 +01:00 |
|
Ines Montani
|
8785706039
|
Reformat stop words for better readability
|
2016-12-24 00:58:40 +01:00 |
|
Gyorgy Orosz
|
45e045a87b
|
Unicode/UTF8 compatibility for Python2
|
2016-12-24 00:21:00 +01:00 |
|
Gyorgy Orosz
|
72b61b6d03
|
Typo fix.
|
2016-12-24 00:10:29 +01:00 |
|
Gyorgy Orosz
|
3a9be4d485
|
Updated token exception handling mechanism to allow the usage of arbitrary functions as token exception matchers.
|
2016-12-23 23:49:34 +01:00 |
|
Ines Montani
|
207555fae7
|
Fix spelling
|
2016-12-23 21:36:01 +01:00 |
|
Ines Montani
|
1436b9f15a
|
Fix formatting and consistency
|
2016-12-23 21:36:01 +01:00 |
|
Ines Montani
|
1d64527727
|
Update Spanish tokenizer
Remove reflexive pronouns as they're part of an open class, fix
mistakes and add exceptions
|
2016-12-23 21:36:01 +01:00 |
|
Ines Montani
|
12bb0aa3e3
|
Fix license formatting for GitHub's parser
|
2016-12-23 15:05:03 +01:00 |
|
Ines Montani
|
48b03b4001
|
Fix formatting and wording
|
2016-12-23 14:36:03 +01:00 |
|
Ines Montani
|
cc051ddc15
|
Add resources page to usage docs
|
2016-12-23 14:36:03 +01:00 |
|
Ines Montani
|
11ec02d5e3
|
Separate inline icon and help cursor classes
|
2016-12-23 14:36:03 +01:00 |
|
Ines Montani
|
7f411fd01c
|
Remove exceptions containing whitespace / no special chars
|
2016-12-23 14:30:06 +01:00 |
|
Magnus Burton
|
fdf4776262
|
Added Swedish abbreviations
|
2016-12-22 22:45:18 +01:00 |
|
Ines Montani
|
642803d533
|
Merge pull request #702 from fnorf/patch-1
fixed minor typo
|
2016-12-22 13:06:56 +01:00 |
|
Hannes
|
c5c0ed9af8
|
fixed minor typo
Peformance -> Performance
|
2016-12-22 13:02:56 +01:00 |
|
Gyorgy Orosz
|
d9c59c4751
|
Maintaining backward compatibility.
|
2016-12-21 23:30:49 +01:00 |
|
Gyorgy Orosz
|
1748549aeb
|
Added exception pattern mechanism to the tokenizer.
|
2016-12-21 23:16:19 +01:00 |
|
Gyorgy Orosz
|
35aa54765d
|
Hungarian module is exposed in spacy.
|
2016-12-21 20:45:36 +01:00 |
|
Gyorgy Orosz
|
ab2f6ea46c
|
Removed data files from tests.
|
2016-12-21 20:22:09 +01:00 |
|
Ines Montani
|
3c87c71d43
|
Add tokenizer exceptions for a.m. and p.m. in Spanish
|
2016-12-21 18:19:10 +01:00 |
|
Ines Montani
|
d1a2846750
|
Document DET_LEMMA
|
2016-12-21 18:18:35 +01:00 |
|