Commit Graph

3993 Commits

Author SHA1 Message Date
Matthew Honnibal
3e8d9c772e Test interaction of token_match and punctuation
Check that the new token_match function applies after punctuation is split off.
2016-12-31 00:52:17 +11:00
Matthew Honnibal
623d94e14f Whitespace 2016-12-31 00:30:28 +11:00
Ines Montani
9d39e7853a Merge pull request #713 from petterhh/patch-1
Add PART to tag map
2016-12-28 18:51:09 +01:00
Petter Hohle
f112e7754e Add PART to tag map
16 of the 17 PoS tags in the UD tag set are added; PART is missing.
2016-12-28 18:39:01 +01:00
Ines Montani
14295f9302 Update README.rst 2016-12-28 00:55:00 +01:00
Ines Montani
9f24eb3fd9 Update CONTRIBUTORS.md 2016-12-28 00:25:07 +01:00
Ines Montani
d1585959d9 Add Hungarian to alpha support overview 2016-12-27 22:31:41 +01:00
Ines Montani
decb7437ea Update README.rst 2016-12-27 22:19:19 +01:00
Ines Montani
e80dad8616 Update version 2016-12-27 22:18:48 +01:00
Matthew Honnibal
f62db78dc3 Increment version 2016-12-27 21:11:22 +01:00
Matthew Honnibal
cade536d1e Merge branch 'master' of ssh://github.com/explosion/spaCy 2016-12-27 21:04:10 +01:00
Matthew Honnibal
ce4539dafd Allow the vocabulary to grow to 10,000, to prevent cold-start problem. 2016-12-27 21:03:45 +01:00
Ines Montani
ad3669cef5 Merge pull request #703 from magnusburton/master
Added Swedish abbreviations
2016-12-27 01:01:49 +01:00
Ines Montani
223142d3d3 Update CONTRIBUTORS.md 2016-12-27 00:49:26 +01:00
Ines Montani
78f754dd9a Merge pull request #705 from oroszgy/hu_tokenizer
Initial support for Hungarian
2016-12-27 00:48:13 +01:00
Gyorgy Orosz
ef8f3103f2 Merge branch 'hu_tokenizer' of github.com:oroszgy/spaCy into hu_tokenizer 2016-12-26 22:39:17 +01:00
Gyorgy Orosz
ade7487ff8 Accepted contributor agreement. 2016-12-26 22:37:02 +01:00
Ines Montani
b7becaec85 Fix typo 2016-12-25 15:23:32 +01:00
Ines Montani
6dd8ae1b0d Update README.md 2016-12-25 14:43:40 +01:00
Ines Montani
f6f6e028ea Make links detect target automatically and replace false with null for no attribute
New version of Harp would render attribute=false as attribute="false",
while attribute=null renders element without attribute.
2016-12-24 12:24:04 +01:00
Ines Montani
b893126c12 Use link mixin instead of plain link markup 2016-12-24 12:22:52 +01:00
Ines Montani
8785706039 Reformat stop words for better readability 2016-12-24 00:58:40 +01:00
Gyorgy Orosz
45e045a87b Unicode/UTF8 compatibility for Python2 2016-12-24 00:21:00 +01:00
Gyorgy Orosz
72b61b6d03 Typo fix. 2016-12-24 00:10:29 +01:00
Gyorgy Orosz
3a9be4d485 Updated token exception handling mechanism to allow the usage of arbitrary functions as token exception matchers. 2016-12-23 23:49:34 +01:00
Ines Montani
207555fae7 Fix spelling 2016-12-23 21:36:01 +01:00
Ines Montani
1436b9f15a Fix formatting and consistency 2016-12-23 21:36:01 +01:00
Ines Montani
1d64527727 Update Spanish tokenizer
Remove reflexive pronouns as they're part of an open class, fix
mistakes and add exceptions
2016-12-23 21:36:01 +01:00
Ines Montani
12bb0aa3e3 Fix license formatting for GitHub's parser 2016-12-23 15:05:03 +01:00
Ines Montani
48b03b4001 Fix formatting and wording 2016-12-23 14:36:03 +01:00
Ines Montani
cc051ddc15 Add resources page to usage docs 2016-12-23 14:36:03 +01:00
Ines Montani
11ec02d5e3 Separate inline icon and help cursor classes 2016-12-23 14:36:03 +01:00
Ines Montani
7f411fd01c Remove exceptions containing whitespace / no special chars 2016-12-23 14:30:06 +01:00
Magnus Burton
fdf4776262 Added Swedish abbreviations 2016-12-22 22:45:18 +01:00
Ines Montani
642803d533 Merge pull request #702 from fnorf/patch-1
fixed minor typo
2016-12-22 13:06:56 +01:00
Hannes
c5c0ed9af8 fixed minor typo
Peformance -> Performance
2016-12-22 13:02:56 +01:00
Gyorgy Orosz
d9c59c4751 Maintaining backward compatibility. 2016-12-21 23:30:49 +01:00
Gyorgy Orosz
1748549aeb Added exception pattern mechanism to the tokenizer. 2016-12-21 23:16:19 +01:00
Gyorgy Orosz
35aa54765d Hungarian module is exposed in spacy. 2016-12-21 20:45:36 +01:00
Gyorgy Orosz
ab2f6ea46c Removed data files from tests. 2016-12-21 20:22:09 +01:00
Ines Montani
3c87c71d43 Add tokenizer exceptions for a.m. and p.m. in Spanish 2016-12-21 18:19:10 +01:00
Ines Montani
d1a2846750 Document DET_LEMMA 2016-12-21 18:18:35 +01:00
Ines Montani
78e63dc7d0 Update tokenizer exceptions for English 2016-12-21 18:06:34 +01:00
Ines Montani
702d1eed93 Update tokenizer exceptions for German 2016-12-21 18:06:27 +01:00
Ines Montani
d60380418e Update tokenizer exceptions for Spanish 2016-12-21 18:06:17 +01:00
Ines Montani
920fa0fed2 Add DET_LEMMA constant 2016-12-21 18:05:41 +01:00
Ines Montani
8978806ea6 Allow Vocab to load without serializer_freqs 2016-12-21 18:05:23 +01:00
Ines Montani
be8ed811f6 Remove trailing whitespace 2016-12-21 18:04:41 +01:00
Ines Montani
926e19184a Merge pull request #695 from magnusburton/master
Added Swedish morph rules
2016-12-21 01:06:00 +01:00
Ines Montani
71c00db8a5 Update language models page 2016-12-21 00:54:54 +01:00