Commit Graph

5311 Commits

Author SHA1 Message Date
DuyguA
26ee0590a3 added some commonly used cases 2018-03-08 12:43:58 +01:00
DuyguA
ae6473e4d5 removed some words with negation particle. 2018-03-08 12:20:32 +01:00
DuyguA
6ed59a2198 removed number words to be caried to the lexical 2018-03-08 12:19:23 +01:00
DuyguA
04784a44a6 made alphabetical order for Turkish chaaracters 2018-03-08 12:11:32 +01:00
DuyguA
af33e022a5 added example sentences for Turkish 2018-03-08 12:06:03 +01:00
Matthew Honnibal
a1be01185c Fix array out of bounds error in Span 2018-02-28 12:27:09 +01:00
Thomas Opsomer
8df9e52829 lemma property to return hash instead of unicode 2018-02-27 19:50:01 +01:00
Ines Montani
35634352fe
Merge pull request #2025 from dejanmarich/patch-1
Update stop_words.py for Croatian language
2018-02-26 18:22:32 +01:00
Matthew Honnibal
14f729c72a Add subtok label to parser 2018-02-26 12:26:35 +01:00
Matthew Honnibal
7137ad8b0b Make label filtering clearer for projectivisation 2018-02-26 12:02:01 +01:00
Matthew Honnibal
b8d52cb285 Fix inconsistent label freq cutoff for projectivisation 2018-02-26 12:01:44 +01:00
Matthew Honnibal
7b66ec896a Revert "Revert "Improve parser oracle around sentence breaks.""
This reverts commit 36e481c584.
2018-02-26 10:57:37 +01:00
Matthew Honnibal
36e481c584 Revert "Improve parser oracle around sentence breaks."
This reverts commit 50817dc9ad.
2018-02-26 10:53:55 +01:00
Matthew Honnibal
5faae803c6 Add option to not use Janome for Japanese tokenization 2018-02-26 09:39:46 +01:00
Matthew Honnibal
9b406181cd Add Chinese.Defaults.use_jieba setting, for UD 2018-02-25 15:12:38 +01:00
Matthew Honnibal
9ccd0c643b Add Vietnamese 2018-02-25 15:00:46 +01:00
Matthew Honnibal
d4fdb97c87 Fix alignment for words with spaces 2018-02-25 14:55:00 +01:00
Matthew Honnibal
6d2c1ef52c Fix SP tag in generic tag map 2018-02-24 16:04:56 +01:00
Matthew Honnibal
5cc3bd1c1d Update alignment tests 2018-02-24 16:03:58 +01:00
Matthew Honnibal
6138439469 Fix many-to-one alignment 2018-02-24 16:03:50 +01:00
Matthew Honnibal
4890ee1732 Fix scoring of tokenization for punct 2018-02-24 10:32:32 +01:00
Matthew Honnibal
12b39f87da Move cython declarations in matcher.pyx 2018-02-24 10:32:18 +01:00
Matthew Honnibal
01d1b7abdf Support many-to-one alignment in GoldParse 2018-02-24 10:17:01 +01:00
Matthew Honnibal
7865746574 Support many-to-one alignment 2018-02-24 02:09:53 +01:00
Matthew Honnibal
458710b831 Poke matcher test for appveyor 2018-02-23 23:53:48 +01:00
Matthew Honnibal
968dabdde4 Fix bug in multi-task objective 2018-02-23 23:48:09 +01:00
Matthew Honnibal
2c9c8b8d72 Try comming out emoji test in matcher 2018-02-23 23:34:35 +01:00
Matthew Honnibal
980ad68cbe Try to find test that fails on appveyor 2018-02-23 21:27:53 +01:00
Matthew Honnibal
39de8cd4d3 Try to find test failing on appveyor 2018-02-23 20:59:21 +01:00
Matthew Honnibal
4492a33a9d Fix sent_start multi-task objective when alignment fails 2018-02-23 16:50:59 +01:00
Matthew Honnibal
5fa44e93f1 Set unicode_literals in matcher 2018-02-23 16:48:54 +01:00
Matthew Honnibal
12264f9296 Add multi-task objective for sentence segmentation 2018-02-23 16:25:57 +01:00
Matthew Honnibal
e7deadb519 Set version to 2.1.0.dev1 2018-02-23 16:22:24 +01:00
Matthew Honnibal
7b575a119e Try to reduce memory usage of test_matcher 2018-02-23 15:34:37 +01:00
Matthew Honnibal
24563f4026 Fix data typing in align 2018-02-23 15:08:06 +01:00
Matthew Honnibal
7a5ba20692 Fix integer typing in _align 2018-02-23 14:51:24 +01:00
Matthew Honnibal
875411b875 Set unicode types in _align.pyx and test 2018-02-23 14:35:38 +01:00
Matthew Honnibal
51d9679aa3 Fix broken span.as_doc test 2018-02-23 14:22:24 +01:00
dejanmarich
71c261d58b
Update stop_words.py
Added more words
2018-02-23 10:31:01 +01:00
Matthew Honnibal
3e6c1111b7 Remove obsolete test 2018-02-23 03:22:07 +01:00
Matthew Honnibal
a4fdec524a Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-gold 2018-02-22 21:44:28 +01:00
Matthew Honnibal
50817dc9ad Improve parser oracle around sentence breaks. 2018-02-22 19:22:26 +01:00
Matthew Honnibal
307aefe131 Increment version to v2.0.9 2018-02-22 17:07:53 +01:00
Feng Niu
1c60384bed return on empty doc 2018-02-21 15:39:04 -08:00
Feng Niu
7eb1cd100b unbound doc var 2018-02-21 15:05:37 -08:00
Feng Niu
8df75b229c fix unbound vars in es.syntax_iterators 2018-02-21 13:11:17 -08:00
alldefector
4244e285c2
Fix Spanish noun_chunks failure caused by typo 2018-02-21 12:43:21 -08:00
Matthew Honnibal
661873ee4c Randomize the rebatch size in parser 2018-02-21 21:02:07 +01:00
Matthew Honnibal
0872cf611d Don't lower-case lemmas of proper nouns 2018-02-21 16:01:16 +01:00
Matthew Honnibal
a0ddb803fd Make error when no label found more helpful 2018-02-21 16:00:59 +01:00
Matthew Honnibal
ea2fc5d45f Improve length and freq cutoffs in parser 2018-02-21 16:00:38 +01:00
Matthew Honnibal
e5757d4bf0 Add labels property to parser 2018-02-21 16:00:00 +01:00
Matthew Honnibal
eff4ae809a Fix nonproj label filter 2018-02-21 15:59:04 +01:00
Matthew Honnibal
e624405cda Temporarily remove cutoff when filtering labels in nonproj 2018-02-21 13:53:40 +01:00
Matthew Honnibal
f466f0186e Use new alignment implementation in GoldParse 2018-02-20 21:16:35 +01:00
Matthew Honnibal
c0734ba526 Make alignment work with strings 2018-02-20 17:51:49 +01:00
Matthew Honnibal
8180c84a98 Add tests for new Levenshtein alignment 2018-02-20 17:32:25 +01:00
Matthew Honnibal
930c980570 Add improved Levenshtein alignment implementation 2018-02-20 17:31:56 +01:00
Ines Montani
14e7e0f12a
Merge pull request #2000 from jimregan/polish-tag-map
Polish tag map
2018-02-18 19:05:58 +01:00
Jim O'Regan
664407de5d missing PrepCase attribute 2018-02-18 14:46:12 +00:00
Jim O'Regan
95f0673fbc fix typo/missing here too 2018-02-18 14:38:27 +00:00
Matthew Honnibal
2bccad8815 Fix incorrect matcher test 2018-02-18 14:56:12 +01:00
Matthew Honnibal
530172d57a Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-faster-matcher 2018-02-18 14:40:42 +01:00
Matthew Honnibal
cf0e320f2b Add doc.is_sentenced attribute, re #1959 2018-02-18 14:16:55 +01:00
Matthew Honnibal
1e5aeb4eec
Merge pull request #1987 from thomasopsomer/span-sent
Make span.sent work when only manual / custom sbd
2018-02-18 14:05:37 +01:00
Matthew Honnibal
1cf774bdc1 Add output options return_matches and as_tuples to Matcher 2018-02-18 14:00:45 +01:00
Matthew Honnibal
dd9b0945af Fix inconsistencies in the symbols table 2018-02-18 13:51:31 +01:00
Matthew Honnibal
66496ac8e1 Set version to v2.1.0.dev0 2018-02-18 13:48:39 +01:00
Matthew Honnibal
eb3040ce46
Merge pull request #1891 from fucking-signup/master
Fix issue #1889
2018-02-18 13:47:47 +01:00
Matthew Honnibal
3d7285870b Update matcher branch with v2.0.8 master 2018-02-18 13:42:58 +01:00
ines
6bba1db4cc Drop six and related hacks as a dependency 2018-02-18 13:29:56 +01:00
Matthew Honnibal
b30b09192a
Merge pull request #1665 from jimregan/animacy
typo in "inan", add "nhum"
2018-02-18 13:26:53 +01:00
Matthew Honnibal
1b3c98e01b Set version to v2.0.8 2018-02-18 12:16:31 +01:00
Matthew Honnibal
f9f46e5a07 Revert matcher fixes from GregDubbin 2018-02-18 10:59:28 +01:00
Matthew Honnibal
86405e4ad1 Fix CLI for multitask objectives 2018-02-18 10:59:11 +01:00
Matthew Honnibal
a34749b2bf Add multitask objectives options to train CLI 2018-02-17 22:03:54 +01:00
Matthew Honnibal
8f06903e09 Fix multitask objectives 2018-02-17 18:41:36 +01:00
Matthew Honnibal
d1246c95fb Fix model loading when using multitask objectives 2018-02-17 18:11:36 +01:00
Matthew Honnibal
262d0a3148 Fix overwriting of lexical attributes when loading vectors during training 2018-02-17 18:11:11 +01:00
Matthew Honnibal
c0caf7cf27 Fix LANG symbol 2018-02-17 18:10:50 +01:00
Matthew Honnibal
0bf2f6be29 Add missing symbol for LANG attr. Fixes inconsistent numeric ID 2018-02-17 17:37:02 +01:00
Matthew Honnibal
97a228a4ce Increment to v2.0.8.dev0 2018-02-17 16:54:36 +01:00
Matthew Honnibal
f7dc64d2a3 Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-faster-matcher 2018-02-17 16:47:35 +01:00
Aaron Marquez
ea571e8325 Merge branch 'master' into issue-1959 2018-02-16 15:14:09 -08:00
Matthew Honnibal
7d5c720fc3 Fix multitask objective when no pipeline provided 2018-02-15 23:50:21 +01:00
Aaron Marquez
f0d3672e17 Changed loading EN model 2018-02-15 14:28:38 -08:00
Aaron Marquez
3765d84d57 Fix issue #1959 2018-02-15 12:51:49 -08:00
Aaron Marquez
7ba4111554 Add test for issue-1959 2018-02-15 12:46:22 -08:00
Matthew Honnibal
59b7cf9db8 Add get_beam_parse method in ArcEager, for Prodigy 2018-02-15 21:03:16 +01:00
Matthew Honnibal
3e541de440 Merge branch 'master' of https://github.com/explosion/spaCy 2018-02-15 21:02:55 +01:00
Thomas Opsomer
5d24a81c0b add test for span.sent when doc not parsed 2018-02-15 16:59:16 +01:00
Thomas Opsomer
deab391cbf correct check on sent_start & raise if no boundaries 2018-02-15 16:58:30 +01:00
Matthew Honnibal
afbd46adfb Remove length cap in PhraseMatcher 2018-02-15 16:10:54 +01:00
Matthew Honnibal
4533c7408d Update matcher tests 2018-02-15 15:39:47 +01:00
Matthew Honnibal
1c19605426 Move matcher2.pyx to matcher.pyx 2018-02-15 15:27:03 +01:00
Matthew Honnibal
9ebf2fe7c3 Make helper function to get longest matches 2018-02-15 15:26:15 +01:00
Matthew Honnibal
4cb861e080
Merge pull request #1968 from DuyguA/is_currency
New lexical feature is_currency
2018-02-15 12:13:36 +01:00
Thomas Opsomer
b902731313 Find span sentence when only sentence boundaries (no parser) 2018-02-14 22:18:54 +01:00
Matthew Honnibal
d19dc67886 Make get_action nogil, for efficiency 2018-02-14 12:16:36 +01:00
Matthew Honnibal
7885b92b45 Refactor matcher2, hopefully making it faster 2018-02-14 12:11:17 +01:00
Matthew Honnibal
00261eea27 Make tests refer to matcher2 2018-02-14 12:10:51 +01:00
Claudiu-Vlad Ursache
e28de12cbd
Ensure files opened in from_disk are closed
Fixes [issue 1706](https://github.com/explosion/spaCy/issues/1706).
2018-02-13 20:49:43 +01:00
Matthew Honnibal
262cbe356e Remove caching, as doesn't seem to help for now. 2018-02-13 17:15:20 +01:00
Matthew Honnibal
f43d53f2c5 Remove print statement 2018-02-13 17:15:07 +01:00
Matthew Honnibal
dcd8d89aef Update test for 850, making it work with matcher2 2018-02-13 16:35:20 +01:00
Matthew Honnibal
9bdfa5cd4f Remove re comparisons tests, as matcher behaves differently 2018-02-13 16:28:52 +01:00
Matthew Honnibal
6d7986b0f1 Fix matcher test 2018-02-13 16:28:06 +01:00
Matthew Honnibal
9efda9e9ab Add PhraseMatcher in matcher2.pyx 2018-02-13 16:27:46 +01:00
Johannes Dollinger
012e874d09 Add contributor agreement for emulbreh 2018-02-13 13:40:33 +01:00
Johannes Dollinger
bf94c13382 Don't fix random seeds on import 2018-02-13 12:42:23 +01:00
Matthew Honnibal
0004331895 Update notes on matcher2 2018-02-13 11:45:45 +01:00
Matthew Honnibal
b4cc39eb74 Fix zero-width quantifiers. Passes test_matcher 2018-02-13 11:45:32 +01:00
Matthew Honnibal
1b01685f47 Fix ZERO_PLUS operator 2018-02-12 12:28:03 +01:00
Matthew Honnibal
9115c3ba0a Add TODO in notes 2018-02-12 12:06:48 +01:00
Matthew Honnibal
b00326a7fe Move pattern_id out of TokenPattern 2018-02-12 12:05:54 +01:00
Matthew Honnibal
d34c732635 Add Python notes for rethinking matcher 2018-02-12 10:19:29 +01:00
Matthew Honnibal
d7c9b53120 Pass kwargs into pipeline components during begin_training 2018-02-12 10:18:39 +01:00
Matthew Honnibal
fae5c0dc18 Work on matcher2 2018-02-12 10:17:43 +01:00
4altinok
ca8728035d added new lex feat to token 2018-02-11 18:55:48 +01:00
4altinok
edd7202a06 added new symbol 2018-02-11 18:55:32 +01:00
4altinok
ed1ac2969e added new lexical feat to lexeme 2018-02-11 18:51:48 +01:00
4altinok
94fb0b75e3 code for is_currency 2018-02-11 18:51:32 +01:00
4altinok
3deef1497a removed 18 and replaced 18 with is_currency 2018-02-11 18:51:09 +01:00
4altinok
471d3c9e23 added lex test for is_currency 2018-02-11 18:50:50 +01:00
ines
c63e99da8a Fix typo in glossary (resolves #1964)
Co-Authored-By: SThomasP <sthomasp@users.noreply.github.com>
2018-02-10 11:58:41 +01:00
Lyndon White
6ee5dff51c
Make python 3.4 compat module loading (fix #1733) 2018-02-09 23:03:35 +08:00
Matthew Honnibal
e361b4f82b Fix #1929: Incorrect NER when pre-set sentence boundaries. 2018-02-08 15:25:41 +01:00
Matthew Honnibal
fd9fd275c5 Make test for #1945 more precise 2018-02-07 02:06:11 +01:00
Matthew Honnibal
c087a14380 Merge branch 'master' of https://github.com/explosion/spaCy 2018-02-07 01:29:39 +01:00
Matthew Honnibal
76d89b2180 Add test for #1945: PhraseMatcher regression 2018-02-07 01:29:23 +01:00
Ines Montani
0954e15dda
Merge pull request #1913 from ohenrik/nb_syntax_iterator
Norwegian Language (nb) - Added french syntax iterator with explanation
2018-02-06 04:59:07 +01:00
Ole Henrik Skogstrøm
251a7805fe Copied French syntax iterator to simplify future changes 2018-02-05 14:45:05 +01:00
Matthew Honnibal
2e7391e627
Merge pull request #1916 from tokestermw/bug/fix-not-passing-in-model-cfg-in-nlp
Bug/fix not passing in model cfg in nlp
2018-02-05 01:19:40 +01:00
Ali Zarezade
9df9da34a3
Fix init_model issue
Fixing issue #1928
2018-02-03 17:21:34 +03:30
Matthew Honnibal
ebe84e45e5 Increment version to 2.0.7 2018-02-02 03:39:16 +01:00
Matthew Honnibal
e4b1f57599 Increment version 2018-02-02 02:33:23 +01:00
Matthew Honnibal
069531c351 Merge branch 'master' of https://github.com/explosion/spaCy 2018-02-02 02:32:58 +01:00
Matthew Honnibal
f74a802d09 Test and fix #1919: Error resuming training 2018-02-02 02:32:40 +01:00
ines
f1d3deffac Add Russian example sentences (see #1107) 2018-02-01 20:09:40 +01:00
Matthew Honnibal
6b1126c312 Merge branch 'master' of https://github.com/explosion/spaCy 2018-02-01 02:57:52 +01:00
ines
3c1fb9d02d Make validate command fail more gracefully if version not found
Mostly relevant during develoment when working with .dev versions
2018-01-31 22:06:28 +01:00
Motoki Wu
54062b7326 added tests for issue #1915 2018-01-30 18:30:19 -08:00
Motoki Wu
f4a7d1a423 make to sure pass in **cfg to each component when training 2018-01-30 18:29:54 -08:00
ines
4046823699 Only check component in factories if string (see #1911) 2018-01-30 16:29:07 +01:00
ines
ce10d320c4 Fix component check in self.factories (see #1911) 2018-01-30 16:09:37 +01:00
Ole Henrik Skogstrøm
e40465487c Added french syntax iterator with explenation 2018-01-30 15:44:29 +01:00
ines
8901814248 Improve error handling if pipeline component is not callable (resolves #1911)
Also add help message if user accidentally calls nlp.add_pipe() with a string of a built-in component name.
2018-01-30 15:43:03 +01:00
Matthew Honnibal
a437ba87a3 Set release=True 2018-01-29 21:26:04 +01:00
Adam Binford
9238749aaf Removed test to avoid network requests 2018-01-29 14:48:20 -05:00
Adam Binford
1a2c2f7d7f Fixed auto linking after download and added simple test to check 2018-01-29 14:25:21 -05:00
Matthew Honnibal
cb7110c22e
Merge pull request #1882 from ohenrik/nb_lemma_and_tag_map
Add norwegian bokmål ('nb') lemmatizer and tag_map
2018-01-29 18:18:50 +01:00
Matthew Honnibal
0c1e7f0c86
Merge pull request #1893 from azarezade/master
Add Persian language
2018-01-29 18:18:33 +01:00
Matthew Honnibal
cbdab75b36 Increment version 2018-01-28 23:46:22 +01:00
Matthew Honnibal
512e6adb08
Merge pull request #1896 from thomasopsomer/fix-sent
Fix sentence boundaries serialization (issue #1834)
2018-01-28 21:18:51 +01:00
Matthew Honnibal
f5b1ad4100 Limit parser model size, to hopefully reduce memory during CI tests 2018-01-28 21:00:32 +01:00
Thomas Opsomer
515e25910e fix sent_start in serialization 2018-01-28 19:50:42 +01:00
Thomas Opsomer
45d62561f7 add test for the issue 2018-01-28 19:49:56 +01:00
ines
6d978e5c35 Don't use deprecated Doc.merge call in displaCy
As reported here: https://stackoverflow.com/a/48464412/6400719
2018-01-27 11:25:05 +01:00
Ali Zarezade
bb6bd3d8ae add persian language 2018-01-27 13:27:26 +03:30
Ali Zarezade
d195675db5 add persian language 2018-01-27 13:21:38 +03:30
Kit
4b42267ba3
Fix issue #1889 2018-01-25 23:17:22 +01:00
Kit
52ef51f36e
Add test for issue #1889 2018-01-25 22:56:48 +01:00
Ole Henrik Skogstrøm
8e2c9f2475 Cleaned up nb tag_map comments 2018-01-25 11:09:28 +01:00
Ole Henrik Skogstrøm
1107e89fcf Updated doc string on nb tag_map module 2018-01-25 11:08:28 +01:00
Matthew Honnibal
6a8cb905aa
Merge pull request #1876 from GregDubbin/master
Pattern matcher fixes
2018-01-24 16:38:11 +01:00
Matthew Honnibal
38b260e0c3
Merge pull request #1879 from azarezade/master
Add Persian character and symbols
2018-01-24 16:34:22 +01:00
Matthew Honnibal
edb71a280e Add test for #1883: Unpickling Matcher 2018-01-24 15:42:33 +01:00
Matthew Honnibal
2ad050e668 Fix unpickling of Matcher. Also store correct data in matcher._patterns 2018-01-24 15:42:11 +01:00
Ole Henrik Skogstrøm
4058a7d579 Fix æøå characters in lemmatizer 2018-01-24 14:03:14 +01:00
Ole Henrik Skogstrøm
42248f423f Updated tag map 2018-01-24 13:50:33 +01:00
Ole Henrik Skogstrøm
74b430b49a Correct Lemmatizer 2018-01-24 13:26:33 +01:00
Ole Henrik Skogstrøm
b9b3a40c78 Add norwegian lemmatizer and tag_map 2018-01-24 12:28:29 +01:00
Matthew Honnibal
42a18ef903 Add test for #1868: Vocab.__contains__ with ints 2018-01-23 23:27:05 +01:00
Matthew Honnibal
43f381ce36 Make Vocab.__contains__ work with ints. Fixes #1868 2018-01-23 23:26:47 +01:00
greg
85ab99e692 Correct test examples 2018-01-23 15:00:14 -05:00
greg
f50bb1aafc Restructure StateC to eliminate dependency on unordered_map 2018-01-23 14:40:03 -05:00
Matthew Honnibal
f3753c2453 Further model deserialization fixes re #1727 2018-01-23 19:16:05 +01:00
Matthew Honnibal
91e916cb67 Add comment to new test 2018-01-23 19:11:53 +01:00
Matthew Honnibal
fd187d71ad Add test for #1727 2018-01-23 19:11:01 +01:00
Matthew Honnibal
85c942a6e3 Dont overwrite pretrained_dims setting from cfg. Fixes #1727 2018-01-23 19:10:49 +01:00
Ali Zarezade
42349471bc
add ٪ as punctuation 2018-01-23 18:11:33 +03:30
Ali Zarezade
2bda582135
Add Persian character and symbols
Add Persian characters and the following:
- ٪ used instead of %
- ؟ used instead of ?
- ﷼ used instead of $
- ، used instead of ,
- ؛ used instead of ;
2018-01-23 13:20:36 +03:30
Matthew Honnibal
7e6dc283db Fix unicode import in test 2018-01-22 23:55:44 +01:00
greg
686735b94e Fix matcher import 2018-01-22 16:53:05 -05:00
greg
3a491093ee Import libcpp.map if libcpp.unordered_map doesn't exist 2018-01-22 16:46:25 -05:00
greg
d55992bdf0 Switch match dictionary to use final state pointer rather than ID 2018-01-22 15:36:47 -05:00
Matthew Honnibal
4ce7d24fd5 Add test for #1799: Set left and right edges (and thus sentences) in non-projective parses. 2018-01-22 20:18:38 +01:00
Matthew Honnibal
56164ab688 Set l_edge and r_edge correctly for non-projective parses. Fixes #1799 2018-01-22 20:18:04 +01:00
Matthew Honnibal
964aa1b384 Merge branch 'master' of https://github.com/explosion/spaCy 2018-01-22 19:18:46 +01:00
Matthew Honnibal
29897ed1b3 Allow vector loading to work on 1d data files. Fixes #1831 2018-01-22 19:18:26 +01:00
greg
490bc82c27 Add comments clarifying matcher logic for '*' 2018-01-22 10:03:12 -05:00
Matthew Honnibal
fe4748fc38
Merge pull request #1870 from avadhpatel/master
Model Load Performance Improvement by more than 5x
2018-01-22 00:05:15 +01:00
Avadh Patel
a517df55c8 Small fix
Signed-off-by: Avadh Patel <avadh4all@gmail.com>
2018-01-21 15:20:45 -06:00
Avadh Patel
5b5029890d Merge branch 'perfTuning' into perfTuningMaster
Signed-off-by: Avadh Patel <avadh4all@gmail.com>
2018-01-21 15:20:00 -06:00
Matthew Honnibal
203d2ea830 Allow multitask objectives to be added to the parser and NER more easily 2018-01-21 19:37:02 +01:00
Matthew Honnibal
4a7d524efb Merge branch 'master' of https://github.com/explosion/spaCy 2018-01-21 19:22:03 +01:00
Matthew Honnibal
61a051f2c0 Fix MultitaskObjective 2018-01-21 19:21:34 +01:00
Avadh Patel
75903949da Updated model building after suggestion from Matthew
Signed-off-by: Avadh Patel <avadh4all@gmail.com>
2018-01-18 06:51:57 -06:00
Avadh Patel
fe879da2a1 Do not train model if its going to be loaded from disk
This saves significant time in loading a model from disk.

Signed-off-by: Avadh Patel <avadh4all@gmail.com>
2018-01-17 06:16:07 -06:00
Avadh Patel
2146faffee Do not train model if its going to be loaded from disk
This saves significant time in loading a model from disk.

Signed-off-by: Avadh Patel <avadh4all@gmail.com>
2018-01-17 06:04:22 -06:00
greg
7072b395c9 Add greedy matcher tests 2018-01-16 15:46:13 -05:00
greg
441f490c1c Merge branch 'master' of github.com:GregDubbin/spaCy 2018-01-16 13:31:10 -05:00
greg
8bea62f26e Correct bugs for greedy matching and introduce ADVANCE_PLUS action 2018-01-16 13:21:43 -05:00
Matthew Honnibal
ccb51a9f36 Make .similarity() return 1.0 if all orth attrs match 2018-01-15 16:29:48 +01:00
Matthew Honnibal
82135d85b7 Fix test 2018-01-15 15:55:15 +01:00
Matthew Honnibal
4b09616b58 Add test for #1757: Comparison against None 2018-01-15 15:55:01 +01:00
Matthew Honnibal
b904d81e9a Fix rich comparison against None objects. Closes #1757 2018-01-15 15:51:25 +01:00
Matthew Honnibal
9e413449f6 Fix unicode error in new test 2018-01-15 15:39:00 +01:00
Matthew Honnibal
ab7c45b12d Fix error message and handling of doc.sents 2018-01-15 15:21:11 +01:00
Matthew Honnibal
6b215d2dd3 Add test for Issue #1537 2018-01-15 15:20:56 +01:00
ines
5babb7d6f6 Merge branch 'master' of https://github.com/explosion/spaCy 2018-01-14 17:31:09 +01:00
ines
793890cb4d Remove test for removed deprecation warning 2018-01-14 17:31:06 +01:00
Matthew Honnibal
465a6f6452 Add missing Span.vocab property. Closes #1633 2018-01-14 15:06:30 +01:00
Matthew Honnibal
0cb090e526 Fix infinite recursion in token.sent_start. Closes #1640 2018-01-14 15:02:15 +01:00
Matthew Honnibal
5cbe913b6f Don't raise deprecation warning in property. Closes #1813, #1712 2018-01-14 14:55:58 +01:00
Matthew Honnibal
1a1cca6052 Fix vectors.resize() on Py3. Closes #1539 2018-01-14 14:48:51 +01:00
Matthew Honnibal
0153220304 Make set_vector add word to vocab. Fixes #1807 2018-01-14 13:57:57 +01:00
Ines Montani
55754f0cee
Merge pull request #1836 from fucking-signup/master
Add tests for issue #1769
2018-01-13 00:23:35 +00:00
Kit
4ee97f20a0
Mark like_num tests as slow 2018-01-13 00:44:15 +01:00
Kit
855531537e
Rewrite tests for issue #1769 2018-01-12 23:49:51 +01:00
Kit
5b541cb5ec
Simplify tests for issue #1769 2018-01-12 23:34:27 +01:00
Kit
7a2adc4633
Remove some tests to see build status changes 2018-01-12 22:49:16 +01:00
Kit
0e62809a43
Rewrite tests for issue #1769 2018-01-12 22:26:06 +01:00
Ines Montani
36f426fe0a
Merge pull request #1808 from fucking-signup/master
Fix issue #1769
2018-01-12 21:12:02 +00:00
Kit
76f4eeca44
Remove tests to see build changes on Windows (Python 2.7) 2018-01-12 20:30:51 +01:00
Matthew Honnibal
7ca49c2061
Merge branch 'master' into feature-improve-model-download 2018-01-10 18:21:55 +01:00
Kit
7ec0956e8d
Add regression test (issue #1769) 2018-01-08 03:42:04 +01:00
Kit
701e7cc6aa
Rename variable to keep code consistent 2018-01-08 03:38:44 +01:00
Kit
ed0db95183
Find lowercased forms of ordinal words, where possible 2018-01-08 03:28:50 +01:00
Kit
9bc524982e
Find lowercased forms of numeric words 2018-01-08 03:25:08 +01:00
Søren Lind Kristiansen
62de5da1ff Remove unsused dummy variable 2018-01-05 09:57:24 +01:00
Søren Lind Kristiansen
10dab8eef8 Remove dummy variable from function calls 2018-01-05 09:37:05 +01:00
Søren Lind Kristiansen
7f0ab145e9 Don't pass CLI command name as dummy argument 2018-01-04 21:33:47 +01:00
Ines Montani
6a008233b5
Merge pull request #1795 from textioHQ/issue1758 (resolves #1758)
english tokenizer: handle "would've"
2018-01-04 02:43:39 +00:00
Kevin Humphreys
597df5bf83 add test 2018-01-03 13:00:05 -08:00
Kevin Humphreys
7918fa4ef9 handle would've 2018-01-03 12:25:48 -08:00
ines
2c656f90fb Exit with 1 if incompatible models found (see #1714) 2018-01-03 21:20:35 +01:00
ines
dacfaa2ca4 Ensure that download command exits properly (resolves #1714) 2018-01-03 21:03:36 +01:00
Søren Lind Kristiansen
a9ff6eadc9 Prefix dummy argument names with underscore 2018-01-03 20:48:12 +01:00
ines
1081e08efb Fix formatting 2018-01-03 20:14:50 +01:00
ines
d8109964d6 Use --no-deps on model install
In general, it's nice for models to specify spaCy as a dependency. However, this tends to cause problems in conda environments, as pip will re-install spaCy and its dependencies (especially Thinc)
2018-01-03 17:40:37 +01:00
ines
319d754309 Fix overwriting of existing symlinks
Check for is_symlink() to also overwrite invalid and outdated symlinks. Also show better error message if link path exists but is not symlink (i.e. file or directory).
2018-01-03 17:39:36 +01:00
ines
8ba0dfd017 Make message on failed linking more clear 2018-01-03 17:38:09 +01:00
Søren Lind Kristiansen
d6327e8495 Fix handling case when vectors not specified 2018-01-03 12:20:49 +01:00
Søren Lind Kristiansen
bcc51d7d8b Fix shifted positional arguments 2018-01-03 12:19:47 +01:00
zqhZY
f27859fa99 add ChineseDefaults class for pickling 2017-12-28 17:13:58 +08:00
Ines Montani
ff9fc945ab
Merge pull request #1749 from sorenlind/da_ud_tokenization
Tune Danish tokenizer to more closely match Universal Dependencies
2017-12-22 16:00:49 +00:00
ines
26f313dabc Fix missing import 2017-12-22 16:21:44 +01:00
ines
8dc1c27841 Merge branch 'master' of https://github.com/explosion/spaCy 2017-12-22 16:01:00 +01:00
ines
b10ba848b8 xfail test that causes MemoryError on Python 2 on Windows
Need to investigate this further!
2017-12-22 16:00:58 +01:00
Søren Lind Kristiansen
bef735aef7 Fix Danish abbreviation 'm.h.t.' 2017-12-21 09:24:31 +01:00
Ines Montani
a3dd167d7f
Merge branch 'master' into da_ud_tokenization 2017-12-20 21:05:34 +00:00
Ines Montani
97f100f69f
Merge pull request #1742 from kimfalk/master
Two corrections in the da lan.
2017-12-20 21:02:00 +00:00
Ines Montani
d682a8803e
Merge pull request #1672 from cbilgili/master
Adds Turkish Lemmatization
2017-12-20 21:01:00 +00:00
Benjamin Peterson
9452134cd1 remove no-break spaces from Hindi example (fixes #1750) 2017-12-20 11:35:30 -08:00
Søren Lind Kristiansen
7a2f2f6f94 Fix formatting. 2017-12-20 18:37:37 +01:00
Søren Lind Kristiansen
15d13efafd Tune Danish tokenizer to more closely match tokenization in Universal Dependencies. 2017-12-20 17:36:52 +01:00
Kim FalkJørgensen
648dc60755 Remove the incorrect exception 'm.h.t' 2017-12-20 10:02:39 +01:00
Kim FalkJørgensen
9c9f4ef84a Fixing a translation error in examples.py
Adding an exception in the tokenizer_exceptions.py
2017-12-19 15:26:50 +01:00
ines
22dc744b48 Fix check for '@' in like_url (see #1715) 2017-12-16 13:48:43 +01:00
Ines Montani
9c1ee65268
Add regression test for #1698 2017-12-12 10:36:11 +01:00
Ines Montani
6455b574fc
Check for email address first 2017-12-12 10:25:13 +01:00
Bri-Will
d77361d76c
Update lex_attrs.py. Fix like_url from matching on e-mail 2017-12-11 14:13:28 -08:00
Søren Lind Kristiansen
5a9d377580 Remove abbreviation for positional plac argument 2017-12-11 11:08:29 +01:00
Isaac Sijaranamual
38021fbb00 Switch from python 3 only TemporaryDirectory to pytest's tmpdir 2017-12-11 00:16:04 +01:00
Isaac Sijaranamual
20ae0c459a Fixes "Error saving model" #1622 2017-12-10 23:07:13 +01:00
Isaac Sijaranamual
568130ce7c Adds regression test_issue1622 2017-12-10 23:00:48 +01:00
Isaac Sijaranamual
e188b61960 Make cli/train.py not eat exception 2017-12-10 22:53:08 +01:00
ines
020a7e5d52 Allow 'fine_grained' option in displaCy (see #1703)
Shows token.tag_ instead of token.pos_. Disabled by default, to not cause rendering issues for models with long fine-grained tags (e.g. merged morphological features).
2017-12-09 15:11:12 +01:00
Matthew Honnibal
3b17eb7c49 Merge branch 'master' of https://github.com/explosion/spaCy 2017-12-07 10:39:32 +01:00
Matthew Honnibal
a6b43729c6 Set version to v2.0.5 2017-12-07 10:39:14 +01:00
ines
5eaa61c2b8 Fix formatting 2017-12-07 10:23:09 +01:00
ines
24e80c51b8 Document init-model command 2017-12-07 10:14:37 +01:00
Matthew Honnibal
c91f451b0f Fix imports and CLI in init-model 2017-12-07 10:03:07 +01:00
ines
82e80ff928 Rename model command to init_model and fix formatting 2017-12-07 09:59:23 +01:00
Ines Montani
2feeb428d6
Merge pull request #1646 from GreenRiverRUS/master
Added model command to create models from raw data
2017-12-07 08:54:26 +00:00
Matthew Honnibal
6373d2580d Increment version to v2.0.5.dev0 2017-12-07 09:53:59 +01:00
Matthew Honnibal
36b47e3fa6 Fix (and test) vector pickling 2017-12-07 09:53:30 +01:00
Matthew Honnibal
05f41ff587 Set version to 2.0.4 2017-12-06 13:24:02 +01:00
Matthew Honnibal
04c38f7e87 Merge branch 'master' of https://github.com/explosion/spaCy 2017-12-06 12:15:52 +01:00
Matthew Honnibal
361944e512 If no rules are set, lemmatize by lookup 2017-12-06 12:12:11 +01:00
Matthew Honnibal
2ab0f2d186
Merge pull request #1664 from jimregan/italian-lemmatizer
BOM in Italian lemmatiser
2017-12-06 11:09:04 +01:00
Matthew Honnibal
3f247119d3
Merge pull request #1668 from sorenlind/da_morph
Add more Danish morph rules and clean up existing ones
2017-12-06 11:08:09 +01:00
Matthew Honnibal
b712de774e Fix vectors pickling 2017-12-05 12:45:24 +01:00
Matthew Honnibal
04650e38c7 Set version to 2.0.4.dev0 2017-12-05 10:52:31 +01:00
Matthew Honnibal
07acb43a85 Merge branch 'master' of https://github.com/explosion/spaCy 2017-12-04 14:42:52 +01:00
Thomas Werkmeister
94eac75b7c
fix setup.py spacy req string for packaging
Requirement should be `spacy>=2.0.2` instead of `spacy2.0.2`
2017-12-03 04:16:28 -06:00
ines
f2ea6d4713 Add Dutch example sentences (see #1107) 2017-12-01 23:36:05 +01:00
Canbey Bilgili
abe098b255 Adds Turkish Lemmatization 2017-12-01 17:04:32 +03:00
Søren Lind Kristiansen
d86b537a38 Enable morph rules for Danish 2017-11-30 15:58:02 +01:00
Søren Lind Kristiansen
13a988adc3 Remove 'Number[psor]' 2017-11-30 15:55:04 +01:00
Søren Lind Kristiansen
dd6fde18a9 Add more Danish morph rules and clean up existing ones 2017-11-30 11:17:19 +01:00
Vadim Mazaev
495eacf470 Merge branch 'model_command' 2017-11-30 12:30:26 +03:00
Vadim Mazaev
4ba7ddf651 Bugfixies 2017-11-30 12:29:38 +03:00
Jim O'Regan
a4ecdeadd4 aha 2017-11-29 23:43:25 +00:00
Jim O'Regan
2c7a9215d7 Merge branch 'master' into animacy 2017-11-29 23:31:12 +00:00
Jim O'Regan
c3e6cee17a use inan in polimorf tagset conversion 2017-11-29 23:15:47 +00:00
Jim O'Regan
b32575e78c imports 2017-11-29 23:03:41 +00:00
Jim O'Regan
3696ce6a7b add UD mapping 2017-11-29 22:59:19 +00:00
Jim O'Regan
f8e7082fe4 typo in "inan", add "nhum" 2017-11-29 22:40:47 +00:00
Matthew Honnibal
6bc0f4d29f
Merge pull request #1611 from fsonntag/master
Solving #1494
2017-11-29 23:11:23 +01:00
Matthew Honnibal
f9ed9ea529
Merge pull request #1624 from GreenRiverRUS/russian
Add support for Russian
2017-11-29 23:10:01 +01:00
Jim O'Regan
076a6fc60a symbols 2017-11-29 20:11:20 +00:00
Jim O'Regan
834ba3c69a (semi generated) Polimorf mapping 2017-11-29 20:08:24 +00:00
Jim O'Regan
ba6a23fd11 BOM in Italian lemmatiser 2017-11-29 17:40:07 +00:00
ines
a31506e060 Fix off-by-one error in nlp.add_pipe(after=name) (fixes #1654) 2017-11-28 20:37:55 +01:00
ines
b62739fbfe Add regression test for #1654 2017-11-28 20:27:54 +01:00
ines
2e50dbb9d7 Simplify test 2017-11-28 20:27:27 +01:00
Felix Sonntag
724ae7dc55 Fixed issue of infix capturing prefixes 2017-11-28 17:17:12 +01:00
Ines Montani
9052643e2c
Merge pull request #1653 from sorenlind/da_example_typo
Fix typo
2017-11-27 14:47:42 +00:00
Søren Lind Kristiansen
5fe58b885b Fix typo 2017-11-27 15:36:18 +01:00
Ines Montani
d52b1ab245
Add unicode_literals (hopefully fixes test failure on Python 2) 2017-11-27 15:16:54 +01:00
Søren Lind Kristiansen
0ffd27b0f6 Add several Danish alternative spellings 2017-11-27 13:35:41 +01:00
Ines Montani
6362024cf8
Merge pull request #1645 from GreenRiverRUS/fix_default_meta
Fixed spaCy version string in default meta
2017-11-27 11:58:02 +00:00
Vadim Mazaev
c332ffdde1 Added model command to create model from raw data:
words counts, brown clusters and vectors
2017-11-27 01:21:47 +03:00
Vadim Mazaev
59f03ab1d7 Fixed spacy version string in default meta 2017-11-26 23:02:07 +03:00
Vadim Mazaev
53e7c38637 Fixed tests depends on pymorphy2 2017-11-26 21:04:44 +03:00
Vadim Mazaev
cacd859dcd Added tag map, fixed tests fails, added more exceptions 2017-11-26 20:54:48 +03:00
Ines Montani
a7bb8f1b42
Merge pull request #1637 from sorenlind/da_tokenization
Improve Danish tokenization
2017-11-26 15:41:38 +00:00
ines
c699aec089 Add offsets_from_biluo_tags helper and tests (see #1626) 2017-11-26 16:38:01 +01:00
Søren Lind Kristiansen
ef03e9ea53 Remove unused import. 2017-11-25 13:04:02 +01:00
Søren Lind Kristiansen
6aa241bcec Add day of month tokenizer exceptions for Danish. 2017-11-24 15:03:24 +01:00
Søren Lind Kristiansen
0c276ed020 Add weekday abbreviations and remove abiguous month abbreviations for Danish. 2017-11-24 14:43:29 +01:00
Søren Lind Kristiansen
056547e989 Add multiple tokenizer exceptions for Danish. 2017-11-24 11:51:26 +01:00
Søren Lind Kristiansen
8dc265ac0c Add test for tokenization of 'i.' for Danish. 2017-11-24 11:29:37 +01:00
Søren Lind Kristiansen
ac8116510d Fix tokenization of 'i.' for Danish. 2017-11-24 11:16:53 +01:00
Matthew Honnibal
79f11d4f85 Pickle vectors with vocab 2017-11-23 17:19:50 +01:00
Matthew Honnibal
f29c3925ee Fix more efficient nonproj 2017-11-23 12:48:00 +00:00
Matthew Honnibal
e10e9ad2c5 Improve efficiency of Doc.to_array 2017-11-23 12:33:27 +00:00
Matthew Honnibal
2acc907d55 Improve profiling 2017-11-23 12:33:03 +00:00
Matthew Honnibal
fa62427300 Remove lookup-based lemmatization 2017-11-23 12:32:22 +00:00
Matthew Honnibal
fb26b2cb12 Use lookup lemmatizer if lemma unset 2017-11-23 12:31:58 +00:00
Matthew Honnibal
db5c714ad2 Improve efficiency of deprojectivization 2017-11-23 12:31:34 +00:00
Matthew Honnibal
8fec7268eb Move string cleanup under a setting flag 2017-11-23 12:19:18 +00:00
Matthew Honnibal
5949777b12 Fix misleading multi-threading docstring 2017-11-23 12:18:59 +00:00
Matthew Honnibal
542e6fd4ea Don't remove entries from specials 2017-11-23 12:17:42 +00:00
Matthew Honnibal
30ba81f881
Merge pull request #1576 from ligser/master
Actually reset caches in pipe [wip]
2017-11-23 12:54:48 +01:00
ines
c90fe92e15 Fix displaCy test 2017-11-22 05:04:39 +01:00
ines
a6f33ac27d Fix displaCy test 2017-11-22 04:19:28 +01:00
ines
93b0be611a Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-22 00:28:55 +01:00
ines
60b4915569 Use .pos_ instead of .tags_ in displaCy by default (see #1006) 2017-11-22 00:28:52 +01:00
Vadim Mazaev
81314f8659 Fixed tokenizer: added char classes; added first lemmatizer and
tokenizer tests
2017-11-21 22:23:59 +03:00
Vadim Mazaev
52ee1f9bf9 Updated Russian Language, added lemmatizer, norm exceptions and lex
attrs
2017-11-21 11:44:46 +03:00
Burton DeWilde
a5c6869b2d Fix bug where span.orth_ != span.text (see #1612) 2017-11-20 12:05:43 -06:00
Burton DeWilde
635792997c Add regression test for #1612 2017-11-20 12:05:35 -06:00
ines
9a63e32f21 Add noqa to Python 2 compat variables of built-ins (see #1617) 2017-11-20 14:03:42 +01:00
ines
d70a64d78b Fix syntax error and formatting in test (see #1617) 2017-11-20 14:01:25 +01:00
ines
17849dee4b Fix French test (see #1617) 2017-11-20 13:59:59 +01:00
Felix Sonntag
33b0f86de3 Changed tokenizer to add infix when infix_start is offset 2017-11-19 16:32:10 +01:00
Felix Sonntag
8be3392302 Added regression text for 1494 2017-11-19 16:30:35 +01:00
Motoki Wu
a52e195a0a Fixes Issue #1207 where noun_chunks of Span gives an error.
Make sure to reference `self.doc` when getting the noun chunks.

Same fix as 9750a0128c
2017-11-17 17:16:20 -08:00
Motoki Wu
b818afaa0e Added failing test for Issue #1207.
The noun chunk iterator should work for `Doc` but not for `Span`.
2017-11-17 17:04:27 -08:00
Vadim Mazaev
a0739a06d4 Returned russian support from v1.10 branch 2017-11-17 17:06:15 +03:00
yuukos
7401152289 updated Russian tokenizer
moved the trying to import pymorph into __init__
2017-11-17 17:04:50 +03:00
yuukos
3aad66cf00 added russian language support 2017-11-17 17:04:22 +03:00
ines
a3d4dd1a5d Test adding of lots of pipeline components (see #1585)
Just to make sure that there's no error now or in the future with adding a large number of pipeline components.
2017-11-15 17:28:06 +01:00
Roman Domrachev
61d28d03e4 Try again to do selective remove cache 2017-11-15 19:11:12 +03:00
Roman Domrachev
b3311100c7 Merge branch 'master' of github.com:explosion/spaCy 2017-11-15 18:30:04 +03:00
Matthew Honnibal
b60d92aca8 Increment version 2017-11-15 16:14:46 +01:00
Roman Domrachev
505c6a2f2f Completely cleanup tokenizer cache
Tokenizer cache can have be different keys than string

That modification can slow down tokenizer and need to be measured
2017-11-15 17:55:48 +03:00
Matthew Honnibal
cf0be62096 Increment version 2017-11-15 15:00:18 +01:00
ines
97a4f9362b Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-15 14:24:00 +01:00
ines
8e65247886 Fix lex.id if vectors is None 2017-11-15 14:23:58 +01:00
Matthew Honnibal
437ad1a852
Merge pull request #1570 from explosion/feature/fix-beam-leak
Fix memory leak in beam parser
2017-11-15 14:15:05 +01:00
Matthew Honnibal
2f169fdb0a Set lex ID correctly for new tokens in Vocab 2017-11-15 13:58:03 +01:00
Matthew Honnibal
fe3c42a06b Fix caching in tokenizer 2017-11-15 13:55:46 +01:00
Matthew Honnibal
8d692771f6 Improve profiling 2017-11-15 13:51:25 +01:00
Matthew Honnibal
b797dca977 Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-15 13:11:43 +01:00
ines
c9d72de0fb Add dummy serialization methods for Japanese and missing lang getter (resolves #1557) 2017-11-15 12:44:02 +01:00
Matthew Honnibal
d274d3a3b9 Let beam forward use minibatches 2017-11-15 00:51:42 +01:00
Matthew Honnibal
855872f872 Remove state hashing 2017-11-14 23:36:46 +01:00
Roman Domrachev
3e21680814 Use safer method to get string without hit 2017-11-14 22:58:46 +03:00
Roman Domrachev
a33d5a068d Try to hold origin data instead of restore it 2017-11-14 22:40:03 +03:00
Roman Domrachev
91e2fa6561 Clean all caches 2017-11-14 21:15:04 +03:00
Roman Domrachev
4e378dc4a4 Remove all obsolete code and test only initial problem 2017-11-14 20:45:04 +03:00
Roman
47ce2347b0
Create test that fails when actual cleanup caused 2017-11-14 20:28:13 +03:00
Roman
caae77f72d
Update strings.pyx 2017-11-14 19:44:40 +03:00
Roman Domrachev
3d247d2bb8 Get back previous testcase 2017-11-14 18:01:37 +03:00
Roman Domrachev
870defa815 Swap keys in proper place
Remove unnecessary clear of the hits
2017-11-14 17:56:30 +03:00
Roman Domrachev
86ca434c93 Merge github.com:explosion/spaCy 2017-11-14 17:46:22 +03:00
Roman Domrachev
a2745b0e84 StringStore now actually cleaned
Do not lose docs in ref tracking
2017-11-14 17:45:50 +03:00
Matthew Honnibal
2512ea9eeb Fix memory leak in beam parser 2017-11-14 02:11:40 +01:00
Matthew Honnibal
86ddf692a1 Fix bug in limit calculation on dev data 2017-11-14 01:37:10 +01:00
Ines Montani
ea6c85c67a
Merge pull request #1566 from MathiasDesch/master (resolves #1248)
Add exceptions to tokenizer and norm
2017-11-13 19:05:22 +01:00
Matthew Honnibal
1b348389bb Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-13 18:18:48 +01:00
Matthew Honnibal
ca73d0d8fe Cleanup states after beam parsing, explicitly 2017-11-13 18:18:26 +01:00
Matthew Honnibal
63ef9a2e73 Remove __dealloc__ from ParserBeam 2017-11-13 18:18:08 +01:00
Mathias Deschamps
c0691b2ab4 Add tokenizer exceptions for ing verbs
Extend list of tokenizing exceptions introduced in 123810b
2017-11-13 17:46:05 +01:00
Mathias Deschamps
288298ead9 Add norm exception for ing verbs
Some ing verbs are sometimes written in or in'. Make the NORM form correct
2017-11-13 17:46:05 +01:00
Abhinav Sharma
59f5740ede
improved upon the list of included stop_words 2017-11-13 17:13:49 +05:30
Matthew Honnibal
6e641f46d4 Create a preprocess function that gets bigrams 2017-11-12 00:43:41 +01:00
Matthew Honnibal
c9251d79e3
Edit comment 2017-11-11 18:38:32 +01:00
Matthew Honnibal
dd1678eab3
Edit comment 2017-11-11 18:37:08 +01:00
Roman Domrachev
ee60a52ee7 Fix test imports and last batch cleanup 2017-11-11 11:32:16 +03:00
Roman Domrachev
4a6b094e09 Remove unused import 2017-11-11 03:13:05 +03:00
Roman Domrachev
3c600adf23 Try to fix StringStore clean up (see #1506) 2017-11-11 03:11:27 +03:00
ines
ee97fd3cb4 Add regression test for #1547 2017-11-11 00:14:03 +01:00
ines
2df27db671 Add unicode declaration 2017-11-11 00:13:56 +01:00
ines
35653bef3a Add missing import (fixes #1546) 2017-11-10 19:05:18 +01:00
ines
4c5d2c80d5 Re-add python -m to commands, too brittle :( (see #1536) 2017-11-10 02:30:55 +01:00
ines
123810b6de Add "lovin'" to tokenizer exceptions (see #1248) 2017-11-09 17:09:30 +01:00
ines
1c218397f6 Ensure path in Doc.to_disk/from_disk (resolves ##1521)
Also add Doc serialization tests with both Path and string path options
2017-11-09 02:29:03 +01:00
Matthew Honnibal
49fd5a646f Set version for 2.0.2 release 2017-11-08 22:39:39 +01:00
Matthew Honnibal
fba2dbddf7 Increment version 2017-11-08 22:19:08 +01:00
Matthew Honnibal
a5ea0fdf5a Fix #1518: vocab.vectors.resize() didn't work 2017-11-08 22:18:37 +01:00
Matthew Honnibal
de45702bbe Strip dev suffixes from version for compatibility check 2017-11-08 18:40:21 +01:00
Matthew Honnibal
51639214a1 Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-08 18:04:33 +01:00
Matthew Honnibal
a2f980de4e Exclude .devN versioning from compatibility check 2017-11-08 18:03:52 +01:00
Daniel Hershcovich
d7ae54ff44
Fix typo in message 2017-11-08 16:06:28 +02:00
Matthew Honnibal
4194bc5744 Xfail flakey serialization test 2017-11-08 13:55:13 +01:00
Matthew Honnibal
d5537e5516 Work on Windows test failure 2017-11-08 13:25:18 +01:00
Matthew Honnibal
c27c82d5f9 Fix serialization 2017-11-08 13:08:48 +01:00
Matthew Honnibal
1d5599cd28 Fix dtype 2017-11-08 12:18:32 +01:00
Matthew Honnibal
fa7fdd0d9b Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-08 12:11:31 +01:00
Matthew Honnibal
072ff38a01 Try to fix python3.5 serialization 2017-11-08 12:10:49 +01:00
Ines Montani
3a0f34d567
Merge pull request #1509 from abhi18av/patch-1
Create examples.py for Hindi language
2017-11-08 11:37:19 +01:00
Ines Montani
42b241ccd0
Update language code in usage example in comment 2017-11-08 11:36:38 +01:00
Matthew Honnibal
e262e8d942 Increment version to v2.0.2.dev0 2017-11-08 11:25:47 +01:00
Matthew Honnibal
a8b592783b Make a dtype more specific, to fix a windows build 2017-11-08 11:24:35 +01:00
Abhinav Sharma
84edade82d
Create examples.py
Populated the file with the translations of English example sentences
2017-11-08 13:23:08 +05:30
Matthew Honnibal
d725aee4e2 Increment version to 2.0.1 2017-11-08 02:14:47 +01:00
Matthew Honnibal
8d6f68f1df Increment version 2017-11-08 01:12:34 +01:00
ines
bcf42b8846 Fix typo 2017-11-08 01:06:37 +01:00
Matthew Honnibal
bbd2a3dee1 Fix title in about.py 2017-11-07 14:02:58 +01:00
Matthew Honnibal
4efaf9306c Set version to spacy-nightly rc2 2017-11-07 13:27:26 +01:00
Matthew Honnibal
bf1ec2965f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-07 13:20:29 +01:00
Matthew Honnibal
726f689da4 Fix missing import 2017-11-07 13:20:12 +01:00
ines
834f9c1aab Update about.py 2017-11-07 13:11:33 +01:00
ines
a4662a31a9 Move model package templates to cli.package and update docs 2017-11-07 12:15:35 +01:00
ines
a09c096d3c Get docs ready for v2.0.0 2017-11-07 12:00:43 +01:00
Matthew Honnibal
9a88e66103 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-07 02:00:06 +01:00
Matthew Honnibal
174abe4677 Increment to 2.0.0rc1 2017-11-07 01:59:46 +01:00
ines
42a0fbf291 Fix textcat simple train example 2017-11-07 01:25:54 +01:00
ines
8fb48b9b91 Update and document new util functions 2017-11-07 00:22:43 +01:00
Matthew Honnibal
1cab703bba Move minibatch function to util 2017-11-06 23:45:36 +01:00
ines
5f43953536 Move test 2017-11-06 23:14:10 +01:00
Matthew Honnibal
dd90fe09f5 Remove extraneous label from textcat class 2017-11-06 22:09:02 +01:00
Matthew Honnibal
45e0617e61 Allow Language.update to take unicode text and dict objects 2017-11-06 22:07:38 +01:00
Matthew Honnibal
1831dbd065 Add test of simple textcat workflow 2017-11-06 22:04:29 +01:00
Matthew Honnibal
ffb9101f3f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-06 19:20:41 +01:00
Matthew Honnibal
8fea512ac8 Don't set tensor in textcat 2017-11-06 19:20:14 +01:00
ines
acb9bdb852 Fix PRON_LEMMA imports 2017-11-06 17:41:53 +01:00
Matthew Honnibal
7d46793dd7 Add PRON_LEMMA to spacy.symbols 2017-11-06 17:38:25 +01:00
Matthew Honnibal
2f7e9f390d Make test less flakey 2017-11-06 17:34:50 +01:00
Matthew Honnibal
407b08017e Make test less flakey 2017-11-06 17:31:40 +01:00
Matthew Honnibal
102f797933 Fix lemma ordering in test 2017-11-06 17:02:17 +01:00
Matthew Honnibal
75e1618ec3 Fix lemma clobbering 2017-11-06 16:56:19 +01:00
Matthew Honnibal
6fdffd7246
Merge pull request #1497 from explosion/feature/improve-optimizer-handling
💫 Improve optimizer handling
2017-11-06 16:41:15 +01:00
Matthew Honnibal
8e6795437b Set release=True 2017-11-06 16:39:32 +01:00
Matthew Honnibal
5c85bf3791 Fix missing import 2017-11-06 15:06:27 +01:00
Matthew Honnibal
25859dbb48 Return optimizer from begin_training, creating if necessary 2017-11-06 14:26:49 +01:00
Matthew Honnibal
465adfee94 Remove unused resume_training method, and pass optimizer through 2017-11-06 14:26:00 +01:00
Matthew Honnibal
13336a6197 Fix Adam import 2017-11-06 14:25:37 +01:00
Matthew Honnibal
2eb11d60f2 Add function create_default_optimizer to spacy._ml 2017-11-06 14:11:59 +01:00
Matthew Honnibal
31babe3c3f Fix non-clobbering lemmatization 2017-11-06 12:36:05 +01:00
Matthew Honnibal
63c6ae4191 Fix lemmatizer test 2017-11-06 11:57:06 +01:00
Matthew Honnibal
a86a0181b5 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-05 22:19:10 +01:00
Matthew Honnibal
134d3b8143 Fix morphology 2017-11-05 22:18:22 +01:00
ines
08d1cf850a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-05 21:41:58 +01:00
ines
baa231745c Fix Dutch tag map 2017-11-05 21:41:50 +01:00
Matthew Honnibal
46e62ad747 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-05 19:40:00 +01:00
Matthew Honnibal
bb25cb0f76 Avoid clobbering preset lemmas 2017-11-05 19:39:38 +01:00
ines
507ecb67af Fix Spanish tag map 2017-11-05 19:23:34 +01:00
Matthew Honnibal
320008352b Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-05 18:46:15 +01:00
Matthew Honnibal
38109a0e4a Register SentenceSegmenter in Language.factories 2017-11-05 18:45:57 +01:00
ines
975e1042ff Fix Italian tag map 2017-11-05 18:34:09 +01:00
ines
6b2d6e4937 Fix Portuguese tag map 2017-11-05 18:31:00 +01:00
ines
fa2687fded Fix Dutch tag map 2017-11-05 17:57:59 +01:00
ines
fb8990d916 Fix Spanish tag map 2017-11-05 17:48:46 +01:00
ines
9d13288f73 Fix French tag map 2017-11-05 17:47:59 +01:00
ines
54579805c5 Fix French tag map 2017-11-05 17:44:05 +01:00
Matthew Honnibal
2b35bb76ad Fix tensorizer on GPU 2017-11-05 15:34:40 +01:00
Matthew Honnibal
6e5181bbaa Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-05 15:33:56 +01:00
Matthew Honnibal
6f438b17c1 Increment version to v2.0.0a19 2017-11-05 14:43:36 +01:00
Matthew Honnibal
225cc249c9 Pass string path to numpy, to fix #1479 2017-11-05 14:42:46 +01:00
Matthew Honnibal
00435d8f0c Add extra beam parsing test 2017-11-05 14:39:57 +01:00
Matthew Honnibal
e777ea25bb
Merge pull request #1492 from uwol/develop
TextCategorizer return parameter fix
2017-11-05 14:13:04 +01:00
Matthew Honnibal
0d4bd6414e Fix Italian tag map 2017-11-05 14:11:03 +01:00
ines
ef597622a6 Add Portuguese tag map 2017-11-05 13:58:34 +01:00
ines
793c62dfda Add Dutch tag map 2017-11-05 13:48:07 +01:00
ines
f7485a09c8 Fix Italian tag map 2017-11-05 13:12:58 +01:00
uwol
a2162b8908 tensorizer return parameter fix 2017-11-05 12:25:10 +01:00
ines
0a27afbf86 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-04 23:32:52 +01:00
ines
3cef901834 Add tag map for French and Italian 2017-11-04 23:32:51 +01:00
Matthew Honnibal
cfb83c231c Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-04 23:08:19 +01:00
Matthew Honnibal
d185927998 Undo harmful pickling hacks on Language class 2017-11-04 23:07:03 +01:00
ines
6c15aafebd Fix formatting 2017-11-04 23:07:02 +01:00
Matthew Honnibal
3ca16ddbd4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-04 00:25:02 +01:00
Matthew Honnibal
e4ec4be948 Fix parser test 2017-11-04 00:23:45 +01:00
Matthew Honnibal
98c29b7912 Add padding vector in parser, to make gradient more correct 2017-11-04 00:23:23 +01:00
ines
5e7d98f72a Remove test for #1491 2017-11-03 22:10:57 +01:00
ines
718f1c50fb Add regression test for #1491 2017-11-03 21:11:20 +01:00
Matthew Honnibal
144a93c2a5 Back-off to tensor for similarity if no vectors 2017-11-03 20:56:33 +01:00
Matthew Honnibal
1e9634691a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-03 20:21:15 +01:00
Matthew Honnibal
13c8881d2f Expose parser's tok2vec model component 2017-11-03 20:20:59 +01:00
Matthew Honnibal
17c63906f9 Update tensorizer component 2017-11-03 20:20:26 +01:00
Matthew Honnibal
2bf21cbe29 Update model after optimising it instead of waiting 2017-11-03 20:20:01 +01:00
Matthew Honnibal
d6e831bf89 Fix lemmatizer tests 2017-11-03 19:46:34 +01:00
ines
eef930c73e Assert instead of print 2017-11-03 18:50:57 +01:00
ines
f0986df94b Add test for #1488 (passes on v2.0.0a18?) 2017-11-03 14:44:36 +01:00