Commit Graph

2076 Commits

Author SHA1 Message Date
Matthew Honnibal
b6047afe4c * Fix punctuation lemma rules, to resolve Issue #130 2015-10-09 10:25:37 +02:00
Matthew Honnibal
393a13d1af * Add unicode em dash to specials.json, so that we can control what POS tag it gets. This way we can prevent sentence boundary detection errors, to address Issue #130. 2015-10-09 19:24:33 +11:00
Matthew Honnibal
1490feda29 * Make generate_specials pretty-print the specials.json file 2015-10-09 19:23:47 +11:00
Matthew Honnibal
1842a53e73 * Lemmatize smart quotes as plain quotes 2015-10-09 19:09:36 +11:00
Matthew Honnibal
2d9e5bf566 * Allow punctuation to be lemmatized 2015-10-09 19:02:42 +11:00
Matthew Honnibal
5332c0b697 * Add support for punctuation lemmatization, to handle unicode characters. This should help in addressing Issue #130 2015-10-09 18:54:40 +11:00
Matthew Honnibal
b71ba2eed5 * Add tests for unicode punctuation character lemmatization 2015-10-09 18:43:14 +11:00
Yubing (Tom) Dong
9a6811acc4 Merge remote-tracking branch 'upstream/master' 2015-10-08 22:53:02 -07:00
Henning Peters
0e13f18ea4 remove compile warning noise 2015-10-09 07:23:39 +02:00
Matthew Honnibal
c5b2c4ead8 * Don't build old license page 2015-10-09 14:58:45 +11:00
Matthew Honnibal
4bae38128d * Remove license page from website in repo 2015-10-09 14:58:34 +11:00
Matthew Honnibal
00c1992503 * Mark tests that require models 2015-10-09 14:48:14 +11:00
Matthew Honnibal
dea40cfec3 * Mark tests that require models 2015-10-09 14:37:48 +11:00
Matthew Honnibal
5031440c35 * Mark tests that require models 2015-10-09 14:29:28 +11:00
Matthew Honnibal
76936a3456 * Mark tests that require models 2015-10-09 14:19:07 +11:00
Matthew Honnibal
7b340912d4 * Mark tests that require models 2015-10-09 14:09:26 +11:00
Matthew Honnibal
20b8c3e281 * Mark tests that require models 2015-10-09 13:58:01 +11:00
Matthew Honnibal
b125289f30 * Fix type declaration in asciied function 2015-10-09 13:46:57 +11:00
Matthew Honnibal
9ff288c7bb * Update tests, after removal of spacy.en.attrs 2015-10-09 13:37:25 +11:00
Matthew Honnibal
c64fd472b0 * Fix travis.yml 2015-10-09 12:58:08 +11:00
Matthew Honnibal
f2374ecfb6 Merge branch 'master' of ssh://github.com/honnibal/spaCy 2015-10-09 12:48:34 +11:00
Matthew Honnibal
5af4b62fe7 * Filter out phrases that consist of common, lower-case words. 2015-10-09 12:47:43 +11:00
Matthew Honnibal
4bbc8f45c6 * Fix multi word matcher 2015-10-09 02:02:37 +11:00
Matthew Honnibal
801d55a6d9 * Fix phrase matcher 2015-10-09 02:00:45 +11:00
Matthew Honnibal
7b23442543 Merge pull request #133 from pquentin/patch-2: Fix typo 2015-10-08 21:47:04 +11:00
Quentin Pradet
1a71706c05 Fix typo 2015-10-08 14:22:23 +04:00
Matthew Honnibal
b3a70e6375 * Clean up unnecessary try/except block 2015-10-08 14:34:11 +11:00
Matthew Honnibal
4513bed175 * Avoid compiling unused files 2015-10-08 14:00:34 +11:00
Matthew Honnibal
e3e8994368 * Patch italian tag map 2015-10-08 14:00:13 +11:00
Matthew Honnibal
2d68f75b6a * Fix identity tag map 2015-10-08 13:59:56 +11:00
Matthew Honnibal
5890682ed1 * Fix multi_word_matches script 2015-10-08 13:59:32 +11:00
Matthew Honnibal
a83253b455 Merge pull request #129 from chrisdubois/patch-1: Fix size of allocation when creating a pattern 2015-10-08 12:04:41 +11:00
Matthew Honnibal
6ea1601e93 * Add script to train models off the UD treebanks. Note that the UD data is restricted to research purposes only, and should only be used to train models for academic experiments. 2015-10-08 12:01:08 +11:00
Chris DuBois
e095faa785 Add contributor. 2015-10-07 17:55:46 -07:00
chrisdubois
cc47b8ad6a Fix size of allocation when creating a pattern. Each pattern object currently contains two AttrValues rather than just one. 2015-10-07 10:32:55 -07:00
Yubing (Tom) Dong
0f601b8b75 Update docstring of Doc.__getitem__ 2015-10-07 01:27:28 -07:00
Yubing (Tom) Dong
3fd3bc79aa Refactor to remove duplicate slicing logic 2015-10-07 01:25:35 -07:00
Yubing (Tom) Dong
97685aecb7 Add slicing support to Span 2015-10-06 02:45:49 -07:00
Yubing (Tom) Dong
5cc2f2b01a Test simple indexing for Span 2015-10-06 02:41:46 -07:00
Yubing (Tom) Dong
ef2af20cd3 Make Doc's slicing behavior conform to Python conventions 2015-10-06 02:41:28 -07:00
Yubing (Tom) Dong
2fc33e8024 Allow step=1 when slicing a Doc 2015-10-06 00:57:05 -07:00
Yubing (Tom) Dong
73566899bf Add Doc slicing tests 2015-10-06 00:57:01 -07:00
Matthew Honnibal
b228a8f4a6 * Remove spacy/en/attrs 2015-10-06 16:20:46 +11:00
Matthew Honnibal
693677fd8d * Prepare to remove en/attrx file, now that moving to symbols.pyx 2015-10-06 16:20:13 +11:00
Matthew Honnibal
63bd17135f * Whitespace 2015-10-06 10:37:07 +11:00
Matthew Honnibal
e7c31f7eae * Tweak information extraction example 2015-10-06 10:35:49 +11:00
Matthew Honnibal
c503654ec1 * Update bin/parser/train for printing output. 2015-10-06 10:35:22 +11:00
Matthew Honnibal
3d9f41c2c9 * Add LookupError for better error reporting in Vocab 2015-10-06 10:34:59 +11:00
Matthew Honnibal
ecc5281b36 * Remove en/pos.pyx, as the tagger code now lives in spacy/tagger.pyx 2015-10-06 10:12:08 +11:00
Matthew Honnibal
e4ba8a4b5a * Add multi word matching code 2015-10-06 09:06:52 +11:00