Commit Graph

4351 Commits

Author SHA1 Message Date
Matthew Honnibal  f62db78dc3  Increment version  2016-12-27 21:11:22 +01:00
Matthew Honnibal  cade536d1e  Merge branch 'master' of ssh://github.com/explosion/spaCy  2016-12-27 21:04:10 +01:00
Matthew Honnibal  ce4539dafd  Allow the vocabulary to grow to 10,000, to prevent cold-start problem.  2016-12-27 21:03:45 +01:00
Ines Montani  ad3669cef5  Merge pull request #703 from magnusburton/master  2016-12-27 01:01:49 +01:00
    Added Swedish abbreviations
Ines Montani  223142d3d3  Update CONTRIBUTORS.md  2016-12-27 00:49:26 +01:00
Ines Montani  78f754dd9a  Merge pull request #705 from oroszgy/hu_tokenizer  2016-12-27 00:48:13 +01:00
    Initial support for Hungarian
Gyorgy Orosz  ef8f3103f2  Merge branch 'hu_tokenizer' of github.com:oroszgy/spaCy into hu_tokenizer  2016-12-26 22:39:17 +01:00
Gyorgy Orosz  ade7487ff8  Accepted contributor agreement.  2016-12-26 22:37:02 +01:00
Ines Montani  b7becaec85  Fix typo  2016-12-25 15:23:32 +01:00
Ines Montani  6dd8ae1b0d  Update README.md  2016-12-25 14:43:40 +01:00
Ines Montani  f6f6e028ea  Make links detect target automatically and replace false with null for no attribute  2016-12-24 12:24:04 +01:00
    New version of Harp would render attribute=false as attribute="false",
    while attribute=null renders element without attribute.
Ines Montani  b893126c12  Use link mixin instead of plain link markup  2016-12-24 12:22:52 +01:00
Ines Montani  8785706039  Reformat stop words for better readability  2016-12-24 00:58:40 +01:00
Gyorgy Orosz  45e045a87b  Unicode/UTF8 compatibility for Python2  2016-12-24 00:21:00 +01:00
Gyorgy Orosz  72b61b6d03  Typo fix.  2016-12-24 00:10:29 +01:00
Gyorgy Orosz  3a9be4d485  Updated token exception handling mechanism to allow the usage of arbitrary functions as token exception matchers.  2016-12-23 23:49:34 +01:00
Ines Montani  207555fae7  Fix spelling  2016-12-23 21:36:01 +01:00
Ines Montani  1436b9f15a  Fix formatting and consistency  2016-12-23 21:36:01 +01:00
Ines Montani  1d64527727  Update Spanish tokenizer  2016-12-23 21:36:01 +01:00
    Remove reflexive pronouns as they're part of an open class, fix
    mistakes and add exceptions
Ines Montani  12bb0aa3e3  Fix license formatting for GitHub's parser  2016-12-23 15:05:03 +01:00
Ines Montani  48b03b4001  Fix formatting and wording  2016-12-23 14:36:03 +01:00
Ines Montani  cc051ddc15  Add resources page to usage docs  2016-12-23 14:36:03 +01:00
Ines Montani  11ec02d5e3  Separate inline icon and help cursor classes  2016-12-23 14:36:03 +01:00
Ines Montani  7f411fd01c  Remove exceptions containing whitespace / no special chars  2016-12-23 14:30:06 +01:00
Magnus Burton  fdf4776262  Added Swedish abbreviations  2016-12-22 22:45:18 +01:00
Ines Montani  642803d533  Merge pull request #702 from fnorf/patch-1  2016-12-22 13:06:56 +01:00
    fixed minor typo
Hannes  c5c0ed9af8  fixed minor typo  2016-12-22 13:02:56 +01:00
    Peformance -> Performance
Gyorgy Orosz  d9c59c4751  Maintaining backward compatibility.  2016-12-21 23:30:49 +01:00
Gyorgy Orosz  1748549aeb  Added exception pattern mechanism to the tokenizer.  2016-12-21 23:16:19 +01:00
Gyorgy Orosz  35aa54765d  Hungarian module is exposed in spacy.  2016-12-21 20:45:36 +01:00
Gyorgy Orosz  ab2f6ea46c  Removed data files from tests..  2016-12-21 20:22:09 +01:00
Ines Montani  3c87c71d43  Add tokenizer exceptions for a.m. and p.m. in Spanish  2016-12-21 18:19:10 +01:00
Ines Montani  d1a2846750  Document DET_LEMMA  2016-12-21 18:18:35 +01:00
Ines Montani  78e63dc7d0  Update tokenizer exceptions for English  2016-12-21 18:06:34 +01:00
Ines Montani  702d1eed93  Update tokenizer exceptions for German  2016-12-21 18:06:27 +01:00
Ines Montani  d60380418e  Update tokenizer exceptions for Spanish  2016-12-21 18:06:17 +01:00
Ines Montani  920fa0fed2  Add DET_LEMMA constant  2016-12-21 18:05:41 +01:00
Ines Montani  8978806ea6  Allow Vocab to load without serializer_freqs  2016-12-21 18:05:23 +01:00
Ines Montani  be8ed811f6  Remove trailing whitespace  2016-12-21 18:04:41 +01:00
Ines Montani  926e19184a  Merge pull request #695 from magnusburton/master  2016-12-21 01:06:00 +01:00
    Added Swedish morph rules
Ines Montani  71c00db8a5  Update language models page  2016-12-21 00:54:54 +01:00
Gyorgy Orosz  3d5306acb9  Added further testcases.  2016-12-20 23:49:35 +01:00
Gyorgy Orosz  23956e72ff  Improved partial support for tokenzing Hungarian numbers  2016-12-20 23:36:59 +01:00
Matthew Honnibal  5a319060b9  Merge branch 'master' of https://github.com/explosion/spaCy  2016-12-20 16:26:57 -06:00
Matthew Honnibal  7793e2ad82  Fix use of dropout in sentiment analysis LSTM example  2016-12-20 16:26:38 -06:00
Gyorgy Orosz  6add156075  Refactored language data structure  2016-12-20 22:28:20 +01:00
Matthew Honnibal  6aed94a3b9  Merge pull request #698 from aikramer2/master  2016-12-21 07:46:51 +11:00
    update to training doc
aikramer2  349143faa2  update to training doc  2016-12-20 12:01:16 -08:00
Gyorgy Orosz  366b3f8685  Merge branch 'master' into hu_tokenizer  2016-12-20 20:53:31 +01:00
Gyorgy Orosz  c035928156  Partial Hungarian number tokenization is added.  2016-12-20 20:46:20 +01:00