Matthew Honnibal
|
36e481c584
|
Revert "Improve parser oracle around sentence breaks."
This reverts commit 50817dc9ad.
|
2018-02-26 10:53:55 +01:00 |
|
Matthew Honnibal
|
5faae803c6
|
Add option to not use Janome for Japanese tokenization
|
2018-02-26 09:39:46 +01:00 |
|
Matthew Honnibal
|
9b406181cd
|
Add Chinese.Defaults.use_jieba setting, for UD
|
2018-02-25 15:12:38 +01:00 |
|
Matthew Honnibal
|
9ccd0c643b
|
Add Vietnamese
|
2018-02-25 15:00:46 +01:00 |
|
Matthew Honnibal
|
d4fdb97c87
|
Fix alignment for words with spaces
|
2018-02-25 14:55:00 +01:00 |
|
Matthew Honnibal
|
6d2c1ef52c
|
Fix SP tag in generic tag map
|
2018-02-24 16:04:56 +01:00 |
|
Matthew Honnibal
|
5cc3bd1c1d
|
Update alignment tests
|
2018-02-24 16:03:58 +01:00 |
|
Matthew Honnibal
|
6138439469
|
Fix many-to-one alignment
|
2018-02-24 16:03:50 +01:00 |
|
Matthew Honnibal
|
4890ee1732
|
Fix scoring of tokenization for punct
|
2018-02-24 10:32:32 +01:00 |
|
Matthew Honnibal
|
12b39f87da
|
Move cython declarations in matcher.pyx
|
2018-02-24 10:32:18 +01:00 |
|
Matthew Honnibal
|
01d1b7abdf
|
Support many-to-one alignment in GoldParse
|
2018-02-24 10:17:01 +01:00 |
|
Matthew Honnibal
|
7865746574
|
Support many-to-one alignment
|
2018-02-24 02:09:53 +01:00 |
|
Matthew Honnibal
|
458710b831
|
Poke matcher test for appveyor
|
2018-02-23 23:53:48 +01:00 |
|
Matthew Honnibal
|
968dabdde4
|
Fix bug in multi-task objective
|
2018-02-23 23:48:09 +01:00 |
|
Matthew Honnibal
|
2c9c8b8d72
|
Try commenting out emoji test in matcher
|
2018-02-23 23:34:35 +01:00 |
|
Matthew Honnibal
|
980ad68cbe
|
Try to find test that fails on appveyor
|
2018-02-23 21:27:53 +01:00 |
|
Matthew Honnibal
|
39de8cd4d3
|
Try to find test failing on appveyor
|
2018-02-23 20:59:21 +01:00 |
|
Matthew Honnibal
|
4492a33a9d
|
Fix sent_start multi-task objective when alignment fails
|
2018-02-23 16:50:59 +01:00 |
|
Matthew Honnibal
|
5fa44e93f1
|
Set unicode_literals in matcher
|
2018-02-23 16:48:54 +01:00 |
|
Matthew Honnibal
|
12264f9296
|
Add multi-task objective for sentence segmentation
|
2018-02-23 16:25:57 +01:00 |
|
Matthew Honnibal
|
e7deadb519
|
Set version to 2.1.0.dev1
|
2018-02-23 16:22:24 +01:00 |
|
Matthew Honnibal
|
7b575a119e
|
Try to reduce memory usage of test_matcher
|
2018-02-23 15:34:37 +01:00 |
|
Matthew Honnibal
|
24563f4026
|
Fix data typing in align
|
2018-02-23 15:08:06 +01:00 |
|
Matthew Honnibal
|
7a5ba20692
|
Fix integer typing in _align
|
2018-02-23 14:51:24 +01:00 |
|
Matthew Honnibal
|
875411b875
|
Set unicode types in _align.pyx and test
|
2018-02-23 14:35:38 +01:00 |
|
Matthew Honnibal
|
51d9679aa3
|
Fix broken span.as_doc test
|
2018-02-23 14:22:24 +01:00 |
|
dejanmarich
|
71c261d58b
|
Update stop_words.py
Added more words
|
2018-02-23 10:31:01 +01:00 |
|
Matthew Honnibal
|
3e6c1111b7
|
Remove obsolete test
|
2018-02-23 03:22:07 +01:00 |
|
Matthew Honnibal
|
a4fdec524a
|
Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-gold
|
2018-02-22 21:44:28 +01:00 |
|
Matthew Honnibal
|
50817dc9ad
|
Improve parser oracle around sentence breaks.
|
2018-02-22 19:22:26 +01:00 |
|
Matthew Honnibal
|
307aefe131
|
Increment version to v2.0.9
|
2018-02-22 17:07:53 +01:00 |
|
Feng Niu
|
1c60384bed
|
return on empty doc
|
2018-02-21 15:39:04 -08:00 |
|
Feng Niu
|
7eb1cd100b
|
unbound doc var
|
2018-02-21 15:05:37 -08:00 |
|
Feng Niu
|
8df75b229c
|
fix unbound vars in es.syntax_iterators
|
2018-02-21 13:11:17 -08:00 |
|
alldefector
|
4244e285c2
|
Fix Spanish noun_chunks failure caused by typo
|
2018-02-21 12:43:21 -08:00 |
|
Matthew Honnibal
|
661873ee4c
|
Randomize the rebatch size in parser
|
2018-02-21 21:02:07 +01:00 |
|
Matthew Honnibal
|
0872cf611d
|
Don't lower-case lemmas of proper nouns
|
2018-02-21 16:01:16 +01:00 |
|
Matthew Honnibal
|
a0ddb803fd
|
Make error when no label found more helpful
|
2018-02-21 16:00:59 +01:00 |
|
Matthew Honnibal
|
ea2fc5d45f
|
Improve length and freq cutoffs in parser
|
2018-02-21 16:00:38 +01:00 |
|
Matthew Honnibal
|
e5757d4bf0
|
Add labels property to parser
|
2018-02-21 16:00:00 +01:00 |
|
Matthew Honnibal
|
eff4ae809a
|
Fix nonproj label filter
|
2018-02-21 15:59:04 +01:00 |
|
Matthew Honnibal
|
e624405cda
|
Temporarily remove cutoff when filtering labels in nonproj
|
2018-02-21 13:53:40 +01:00 |
|
Matthew Honnibal
|
f466f0186e
|
Use new alignment implementation in GoldParse
|
2018-02-20 21:16:35 +01:00 |
|
Matthew Honnibal
|
c0734ba526
|
Make alignment work with strings
|
2018-02-20 17:51:49 +01:00 |
|
Matthew Honnibal
|
8180c84a98
|
Add tests for new Levenshtein alignment
|
2018-02-20 17:32:25 +01:00 |
|
Matthew Honnibal
|
930c980570
|
Add improved Levenshtein alignment implementation
|
2018-02-20 17:31:56 +01:00 |
|
Ines Montani
|
14e7e0f12a
|
Merge pull request #2000 from jimregan/polish-tag-map
Polish tag map
|
2018-02-18 19:05:58 +01:00 |
|
Jim O'Regan
|
664407de5d
|
missing PrepCase attribute
|
2018-02-18 14:46:12 +00:00 |
|
Jim O'Regan
|
95f0673fbc
|
fix typo/missing here too
|
2018-02-18 14:38:27 +00:00 |
|
Matthew Honnibal
|
2bccad8815
|
Fix incorrect matcher test
|
2018-02-18 14:56:12 +01:00 |
|
Matthew Honnibal
|
530172d57a
|
Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-faster-matcher
|
2018-02-18 14:40:42 +01:00 |
|
Matthew Honnibal
|
cf0e320f2b
|
Add doc.is_sentenced attribute, re #1959
|
2018-02-18 14:16:55 +01:00 |
|
Matthew Honnibal
|
1e5aeb4eec
|
Merge pull request #1987 from thomasopsomer/span-sent
Make span.sent work when only manual / custom sbd
|
2018-02-18 14:05:37 +01:00 |
|
Matthew Honnibal
|
1cf774bdc1
|
Add output options return_matches and as_tuples to Matcher
|
2018-02-18 14:00:45 +01:00 |
|
Matthew Honnibal
|
dd9b0945af
|
Fix inconsistencies in the symbols table
|
2018-02-18 13:51:31 +01:00 |
|
Matthew Honnibal
|
66496ac8e1
|
Set version to v2.1.0.dev0
|
2018-02-18 13:48:39 +01:00 |
|
Matthew Honnibal
|
eb3040ce46
|
Merge pull request #1891 from fucking-signup/master
Fix issue #1889
|
2018-02-18 13:47:47 +01:00 |
|
Matthew Honnibal
|
3d7285870b
|
Update matcher branch with v2.0.8 master
|
2018-02-18 13:42:58 +01:00 |
|
ines
|
6bba1db4cc
|
Drop six and related hacks as a dependency
|
2018-02-18 13:29:56 +01:00 |
|
Matthew Honnibal
|
b30b09192a
|
Merge pull request #1665 from jimregan/animacy
typo in "inan", add "nhum"
|
2018-02-18 13:26:53 +01:00 |
|
Matthew Honnibal
|
1b3c98e01b
|
Set version to v2.0.8
|
2018-02-18 12:16:31 +01:00 |
|
Matthew Honnibal
|
f9f46e5a07
|
Revert matcher fixes from GregDubbin
|
2018-02-18 10:59:28 +01:00 |
|
Matthew Honnibal
|
86405e4ad1
|
Fix CLI for multitask objectives
|
2018-02-18 10:59:11 +01:00 |
|
Matthew Honnibal
|
a34749b2bf
|
Add multitask objectives options to train CLI
|
2018-02-17 22:03:54 +01:00 |
|
Matthew Honnibal
|
8f06903e09
|
Fix multitask objectives
|
2018-02-17 18:41:36 +01:00 |
|
Matthew Honnibal
|
d1246c95fb
|
Fix model loading when using multitask objectives
|
2018-02-17 18:11:36 +01:00 |
|
Matthew Honnibal
|
262d0a3148
|
Fix overwriting of lexical attributes when loading vectors during training
|
2018-02-17 18:11:11 +01:00 |
|
Matthew Honnibal
|
c0caf7cf27
|
Fix LANG symbol
|
2018-02-17 18:10:50 +01:00 |
|
Matthew Honnibal
|
0bf2f6be29
|
Add missing symbol for LANG attr. Fixes inconsistent numeric ID
|
2018-02-17 17:37:02 +01:00 |
|
Matthew Honnibal
|
97a228a4ce
|
Increment to v2.0.8.dev0
|
2018-02-17 16:54:36 +01:00 |
|
Matthew Honnibal
|
f7dc64d2a3
|
Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-faster-matcher
|
2018-02-17 16:47:35 +01:00 |
|
Aaron Marquez
|
ea571e8325
|
Merge branch 'master' into issue-1959
|
2018-02-16 15:14:09 -08:00 |
|
Matthew Honnibal
|
7d5c720fc3
|
Fix multitask objective when no pipeline provided
|
2018-02-15 23:50:21 +01:00 |
|
Aaron Marquez
|
f0d3672e17
|
Changed loading EN model
|
2018-02-15 14:28:38 -08:00 |
|
Aaron Marquez
|
3765d84d57
|
Fix issue #1959
|
2018-02-15 12:51:49 -08:00 |
|
Aaron Marquez
|
7ba4111554
|
Add test for issue-1959
|
2018-02-15 12:46:22 -08:00 |
|
Matthew Honnibal
|
59b7cf9db8
|
Add get_beam_parse method in ArcEager, for Prodigy
|
2018-02-15 21:03:16 +01:00 |
|
Matthew Honnibal
|
3e541de440
|
Merge branch 'master' of https://github.com/explosion/spaCy
|
2018-02-15 21:02:55 +01:00 |
|
Thomas Opsomer
|
5d24a81c0b
|
add test for span.sent when doc not parsed
|
2018-02-15 16:59:16 +01:00 |
|
Thomas Opsomer
|
deab391cbf
|
correct check on sent_start & raise if no boundaries
|
2018-02-15 16:58:30 +01:00 |
|
Matthew Honnibal
|
afbd46adfb
|
Remove length cap in PhraseMatcher
|
2018-02-15 16:10:54 +01:00 |
|
Matthew Honnibal
|
4533c7408d
|
Update matcher tests
|
2018-02-15 15:39:47 +01:00 |
|
Matthew Honnibal
|
1c19605426
|
Move matcher2.pyx to matcher.pyx
|
2018-02-15 15:27:03 +01:00 |
|
Matthew Honnibal
|
9ebf2fe7c3
|
Make helper function to get longest matches
|
2018-02-15 15:26:15 +01:00 |
|
Matthew Honnibal
|
4cb861e080
|
Merge pull request #1968 from DuyguA/is_currency
New lexical feature is_currency
|
2018-02-15 12:13:36 +01:00 |
|
Thomas Opsomer
|
b902731313
|
Find span sentence when only sentence boundaries (no parser)
|
2018-02-14 22:18:54 +01:00 |
|
Matthew Honnibal
|
d19dc67886
|
Make get_action nogil, for efficiency
|
2018-02-14 12:16:36 +01:00 |
|
Matthew Honnibal
|
7885b92b45
|
Refactor matcher2, hopefully making it faster
|
2018-02-14 12:11:17 +01:00 |
|
Matthew Honnibal
|
00261eea27
|
Make tests refer to matcher2
|
2018-02-14 12:10:51 +01:00 |
|
Claudiu-Vlad Ursache
|
e28de12cbd
|
Ensure files opened in from_disk are closed
Fixes [issue 1706](https://github.com/explosion/spaCy/issues/1706).
|
2018-02-13 20:49:43 +01:00 |
|
Matthew Honnibal
|
262cbe356e
|
Remove caching, as it doesn't seem to help for now.
|
2018-02-13 17:15:20 +01:00 |
|
Matthew Honnibal
|
f43d53f2c5
|
Remove print statement
|
2018-02-13 17:15:07 +01:00 |
|
Matthew Honnibal
|
dcd8d89aef
|
Update test for 850, making it work with matcher2
|
2018-02-13 16:35:20 +01:00 |
|
Matthew Honnibal
|
9bdfa5cd4f
|
Remove re comparisons tests, as matcher behaves differently
|
2018-02-13 16:28:52 +01:00 |
|
Matthew Honnibal
|
6d7986b0f1
|
Fix matcher test
|
2018-02-13 16:28:06 +01:00 |
|
Matthew Honnibal
|
9efda9e9ab
|
Add PhraseMatcher in matcher2.pyx
|
2018-02-13 16:27:46 +01:00 |
|
Johannes Dollinger
|
012e874d09
|
Add contributor agreement for emulbreh
|
2018-02-13 13:40:33 +01:00 |
|
Johannes Dollinger
|
bf94c13382
|
Don't fix random seeds on import
|
2018-02-13 12:42:23 +01:00 |
|
Matthew Honnibal
|
0004331895
|
Update notes on matcher2
|
2018-02-13 11:45:45 +01:00 |
|
Matthew Honnibal
|
b4cc39eb74
|
Fix zero-width quantifiers. Passes test_matcher
|
2018-02-13 11:45:32 +01:00 |
|
Matthew Honnibal
|
1b01685f47
|
Fix ZERO_PLUS operator
|
2018-02-12 12:28:03 +01:00 |
|
Matthew Honnibal
|
9115c3ba0a
|
Add TODO in notes
|
2018-02-12 12:06:48 +01:00 |
|
Matthew Honnibal
|
b00326a7fe
|
Move pattern_id out of TokenPattern
|
2018-02-12 12:05:54 +01:00 |
|
Matthew Honnibal
|
d34c732635
|
Add Python notes for rethinking matcher
|
2018-02-12 10:19:29 +01:00 |
|
Matthew Honnibal
|
d7c9b53120
|
Pass kwargs into pipeline components during begin_training
|
2018-02-12 10:18:39 +01:00 |
|
Matthew Honnibal
|
fae5c0dc18
|
Work on matcher2
|
2018-02-12 10:17:43 +01:00 |
|
4altinok
|
ca8728035d
|
added new lex feat to token
|
2018-02-11 18:55:48 +01:00 |
|
4altinok
|
edd7202a06
|
added new symbol
|
2018-02-11 18:55:32 +01:00 |
|
4altinok
|
ed1ac2969e
|
added new lexical feat to lexeme
|
2018-02-11 18:51:48 +01:00 |
|
4altinok
|
94fb0b75e3
|
code for is_currency
|
2018-02-11 18:51:32 +01:00 |
|
4altinok
|
3deef1497a
|
removed 18 and replaced 18 with is_currency
|
2018-02-11 18:51:09 +01:00 |
|
4altinok
|
471d3c9e23
|
added lex test for is_currency
|
2018-02-11 18:50:50 +01:00 |
|
ines
|
c63e99da8a
|
Fix typo in glossary (resolves #1964)
Co-Authored-By: SThomasP <sthomasp@users.noreply.github.com>
|
2018-02-10 11:58:41 +01:00 |
|
Lyndon White
|
6ee5dff51c
|
Make python 3.4 compat module loading (fix #1733)
|
2018-02-09 23:03:35 +08:00 |
|
Matthew Honnibal
|
e361b4f82b
|
Fix #1929: Incorrect NER when pre-set sentence boundaries.
|
2018-02-08 15:25:41 +01:00 |
|
Matthew Honnibal
|
fd9fd275c5
|
Make test for #1945 more precise
|
2018-02-07 02:06:11 +01:00 |
|
Matthew Honnibal
|
c087a14380
|
Merge branch 'master' of https://github.com/explosion/spaCy
|
2018-02-07 01:29:39 +01:00 |
|
Matthew Honnibal
|
76d89b2180
|
Add test for #1945: PhraseMatcher regression
|
2018-02-07 01:29:23 +01:00 |
|
Ines Montani
|
0954e15dda
|
Merge pull request #1913 from ohenrik/nb_syntax_iterator
Norwegian Language (nb) - Added French syntax iterator with explanation
|
2018-02-06 04:59:07 +01:00 |
|
Ole Henrik Skogstrøm
|
251a7805fe
|
Copied French syntax iterator to simplify future changes
|
2018-02-05 14:45:05 +01:00 |
|
Matthew Honnibal
|
2e7391e627
|
Merge pull request #1916 from tokestermw/bug/fix-not-passing-in-model-cfg-in-nlp
Bug/fix not passing in model cfg in nlp
|
2018-02-05 01:19:40 +01:00 |
|
Ali Zarezade
|
9df9da34a3
|
Fix init_model issue
Fixing issue #1928
|
2018-02-03 17:21:34 +03:30 |
|
Matthew Honnibal
|
ebe84e45e5
|
Increment version to 2.0.7
|
2018-02-02 03:39:16 +01:00 |
|
Matthew Honnibal
|
e4b1f57599
|
Increment version
|
2018-02-02 02:33:23 +01:00 |
|
Matthew Honnibal
|
069531c351
|
Merge branch 'master' of https://github.com/explosion/spaCy
|
2018-02-02 02:32:58 +01:00 |
|
Matthew Honnibal
|
f74a802d09
|
Test and fix #1919: Error resuming training
|
2018-02-02 02:32:40 +01:00 |
|
ines
|
f1d3deffac
|
Add Russian example sentences (see #1107)
|
2018-02-01 20:09:40 +01:00 |
|
Matthew Honnibal
|
6b1126c312
|
Merge branch 'master' of https://github.com/explosion/spaCy
|
2018-02-01 02:57:52 +01:00 |
|
ines
|
3c1fb9d02d
|
Make validate command fail more gracefully if version not found
Mostly relevant during development when working with .dev versions
|
2018-01-31 22:06:28 +01:00 |
|
Motoki Wu
|
54062b7326
|
added tests for issue #1915
|
2018-01-30 18:30:19 -08:00 |
|
Motoki Wu
|
f4a7d1a423
|
make sure to pass in **cfg to each component when training
|
2018-01-30 18:29:54 -08:00 |
|
ines
|
4046823699
|
Only check component in factories if string (see #1911)
|
2018-01-30 16:29:07 +01:00 |
|
ines
|
ce10d320c4
|
Fix component check in self.factories (see #1911)
|
2018-01-30 16:09:37 +01:00 |
|
Ole Henrik Skogstrøm
|
e40465487c
|
Added French syntax iterator with explanation
|
2018-01-30 15:44:29 +01:00 |
|
ines
|
8901814248
|
Improve error handling if pipeline component is not callable (resolves #1911)
Also add help message if user accidentally calls nlp.add_pipe() with a string of a built-in component name.
|
2018-01-30 15:43:03 +01:00 |
|
Matthew Honnibal
|
a437ba87a3
|
Set release=True
|
2018-01-29 21:26:04 +01:00 |
|
Adam Binford
|
9238749aaf
|
Removed test to avoid network requests
|
2018-01-29 14:48:20 -05:00 |
|
Adam Binford
|
1a2c2f7d7f
|
Fixed auto linking after download and added simple test to check
|
2018-01-29 14:25:21 -05:00 |
|
Matthew Honnibal
|
cb7110c22e
|
Merge pull request #1882 from ohenrik/nb_lemma_and_tag_map
Add Norwegian Bokmål ('nb') lemmatizer and tag_map
|
2018-01-29 18:18:50 +01:00 |
|
Matthew Honnibal
|
0c1e7f0c86
|
Merge pull request #1893 from azarezade/master
Add Persian language
|
2018-01-29 18:18:33 +01:00 |
|
Matthew Honnibal
|
cbdab75b36
|
Increment version
|
2018-01-28 23:46:22 +01:00 |
|
Matthew Honnibal
|
512e6adb08
|
Merge pull request #1896 from thomasopsomer/fix-sent
Fix sentence boundaries serialization (issue #1834)
|
2018-01-28 21:18:51 +01:00 |
|
Matthew Honnibal
|
f5b1ad4100
|
Limit parser model size, to hopefully reduce memory during CI tests
|
2018-01-28 21:00:32 +01:00 |
|
Thomas Opsomer
|
515e25910e
|
fix sent_start in serialization
|
2018-01-28 19:50:42 +01:00 |
|
Thomas Opsomer
|
45d62561f7
|
add test for the issue
|
2018-01-28 19:49:56 +01:00 |
|
ines
|
6d978e5c35
|
Don't use deprecated Doc.merge call in displaCy
As reported here: https://stackoverflow.com/a/48464412/6400719
|
2018-01-27 11:25:05 +01:00 |
|
Ali Zarezade
|
bb6bd3d8ae
|
add persian language
|
2018-01-27 13:27:26 +03:30 |
|
Ali Zarezade
|
d195675db5
|
add persian language
|
2018-01-27 13:21:38 +03:30 |
|
Kit
|
4b42267ba3
|
Fix issue #1889
|
2018-01-25 23:17:22 +01:00 |
|
Kit
|
52ef51f36e
|
Add test for issue #1889
|
2018-01-25 22:56:48 +01:00 |
|