Commit Graph

4762 Commits

Author SHA1 Message Date
Matthew Honnibal
fe4748fc38
Merge pull request #1870 from avadhpatel/master
Model Load Performance Improvement by more than 5x
2018-01-22 00:05:15 +01:00
Avadh Patel
a517df55c8 Small fix
Signed-off-by: Avadh Patel <avadh4all@gmail.com>
2018-01-21 15:20:45 -06:00
Avadh Patel
5b5029890d Merge branch 'perfTuning' into perfTuningMaster
Signed-off-by: Avadh Patel <avadh4all@gmail.com>
2018-01-21 15:20:00 -06:00
Matthew Honnibal
203d2ea830 Allow multitask objectives to be added to the parser and NER more easily 2018-01-21 19:37:02 +01:00
Matthew Honnibal
4a7d524efb Merge branch 'master' of https://github.com/explosion/spaCy 2018-01-21 19:22:03 +01:00
Matthew Honnibal
61a051f2c0 Fix MultitaskObjective 2018-01-21 19:21:34 +01:00
Avadh Patel
75903949da Updated model building after suggestion from Matthew
Signed-off-by: Avadh Patel <avadh4all@gmail.com>
2018-01-18 06:51:57 -06:00
Avadh Patel
fe879da2a1 Do not train model if its going to be loaded from disk
This saves significant time in loading a model from disk.

Signed-off-by: Avadh Patel <avadh4all@gmail.com>
2018-01-17 06:16:07 -06:00
Avadh Patel
2146faffee Do not train model if its going to be loaded from disk
This saves significant time in loading a model from disk.

Signed-off-by: Avadh Patel <avadh4all@gmail.com>
2018-01-17 06:04:22 -06:00
greg
7072b395c9 Add greedy matcher tests 2018-01-16 15:46:13 -05:00
greg
441f490c1c Merge branch 'master' of github.com:GregDubbin/spaCy 2018-01-16 13:31:10 -05:00
greg
8bea62f26e Correct bugs for greedy matching and introduce ADVANCE_PLUS action 2018-01-16 13:21:43 -05:00
Matthew Honnibal
ccb51a9f36 Make .similarity() return 1.0 if all orth attrs match 2018-01-15 16:29:48 +01:00
Matthew Honnibal
82135d85b7 Fix test 2018-01-15 15:55:15 +01:00
Matthew Honnibal
4b09616b58 Add test for #1757: Comparison against None 2018-01-15 15:55:01 +01:00
Matthew Honnibal
b904d81e9a Fix rich comparison against None objects. Closes #1757 2018-01-15 15:51:25 +01:00
Matthew Honnibal
9e413449f6 Fix unicode error in new test 2018-01-15 15:39:00 +01:00
Matthew Honnibal
ab7c45b12d Fix error message and handling of doc.sents 2018-01-15 15:21:11 +01:00
Matthew Honnibal
6b215d2dd3 Add test for Issue #1537 2018-01-15 15:20:56 +01:00
ines
5babb7d6f6 Merge branch 'master' of https://github.com/explosion/spaCy 2018-01-14 17:31:09 +01:00
ines
793890cb4d Remove test for removed deprecation warning 2018-01-14 17:31:06 +01:00
Matthew Honnibal
465a6f6452 Add missing Span.vocab property. Closes #1633 2018-01-14 15:06:30 +01:00
Matthew Honnibal
0cb090e526 Fix infinite recursion in token.sent_start. Closes #1640 2018-01-14 15:02:15 +01:00
Matthew Honnibal
5cbe913b6f Don't raise deprecation warning in property. Closes #1813, #1712 2018-01-14 14:55:58 +01:00
Matthew Honnibal
1a1cca6052 Fix vectors.resize() on Py3. Closes #1539 2018-01-14 14:48:51 +01:00
Matthew Honnibal
0153220304 Make set_vector add word to vocab. Fixes #1807 2018-01-14 13:57:57 +01:00
Ines Montani
55754f0cee
Merge pull request #1836 from fucking-signup/master
Add tests for issue #1769
2018-01-13 00:23:35 +00:00
Kit
4ee97f20a0
Mark like_num tests as slow 2018-01-13 00:44:15 +01:00
Kit
855531537e
Rewrite tests for issue #1769 2018-01-12 23:49:51 +01:00
Kit
5b541cb5ec
Simplify tests for issue #1769 2018-01-12 23:34:27 +01:00
Kit
7a2adc4633
Remove some tests to see build status changes 2018-01-12 22:49:16 +01:00
Kit
0e62809a43
Rewrite tests for issue #1769 2018-01-12 22:26:06 +01:00
Ines Montani
36f426fe0a
Merge pull request #1808 from fucking-signup/master
Fix issue #1769
2018-01-12 21:12:02 +00:00
Kit
76f4eeca44
Remove tests to see build changes on Windows (Python 2.7) 2018-01-12 20:30:51 +01:00
Matthew Honnibal
7ca49c2061
Merge branch 'master' into feature-improve-model-download 2018-01-10 18:21:55 +01:00
Kit
7ec0956e8d
Add regression test (issue #1769) 2018-01-08 03:42:04 +01:00
Kit
701e7cc6aa
Rename variable to keep code consistent 2018-01-08 03:38:44 +01:00
Kit
ed0db95183
Find lowercased forms of ordinal words, where possible 2018-01-08 03:28:50 +01:00
Kit
9bc524982e
Find lowercased forms of numeric words 2018-01-08 03:25:08 +01:00
Søren Lind Kristiansen
62de5da1ff Remove unsused dummy variable 2018-01-05 09:57:24 +01:00
Søren Lind Kristiansen
10dab8eef8 Remove dummy variable from function calls 2018-01-05 09:37:05 +01:00
Søren Lind Kristiansen
7f0ab145e9 Don't pass CLI command name as dummy argument 2018-01-04 21:33:47 +01:00
Ines Montani
6a008233b5
Merge pull request #1795 from textioHQ/issue1758 (resolves #1758)
english tokenizer: handle "would've"
2018-01-04 02:43:39 +00:00
Kevin Humphreys
597df5bf83 add test 2018-01-03 13:00:05 -08:00
Kevin Humphreys
7918fa4ef9 handle would've 2018-01-03 12:25:48 -08:00
ines
2c656f90fb Exit with 1 if incompatible models found (see #1714) 2018-01-03 21:20:35 +01:00
ines
dacfaa2ca4 Ensure that download command exits properly (resolves #1714) 2018-01-03 21:03:36 +01:00
Søren Lind Kristiansen
a9ff6eadc9 Prefix dummy argument names with underscore 2018-01-03 20:48:12 +01:00
ines
1081e08efb Fix formatting 2018-01-03 20:14:50 +01:00
ines
d8109964d6 Use --no-deps on model install
In general, it's nice for models to specify spaCy as a dependency. However, this tends to cause problems in conda environments, as pip will re-install spaCy and its dependencies (especially Thinc)
2018-01-03 17:40:37 +01:00
ines
319d754309 Fix overwriting of existing symlinks
Check for is_symlink() to also overwrite invalid and outdated symlinks. Also show better error message if link path exists but is not symlink (i.e. file or directory).
2018-01-03 17:39:36 +01:00
ines
8ba0dfd017 Make message on failed linking more clear 2018-01-03 17:38:09 +01:00
Søren Lind Kristiansen
d6327e8495 Fix handling case when vectors not specified 2018-01-03 12:20:49 +01:00
Søren Lind Kristiansen
bcc51d7d8b Fix shifted positional arguments 2018-01-03 12:19:47 +01:00
zqhZY
f27859fa99 add ChineseDefaults class for pickling 2017-12-28 17:13:58 +08:00
Ines Montani
ff9fc945ab
Merge pull request #1749 from sorenlind/da_ud_tokenization
Tune Danish tokenizer to more closely match Universal Dependencies
2017-12-22 16:00:49 +00:00
ines
26f313dabc Fix missing import 2017-12-22 16:21:44 +01:00
ines
8dc1c27841 Merge branch 'master' of https://github.com/explosion/spaCy 2017-12-22 16:01:00 +01:00
ines
b10ba848b8 xfail test that causes MemoryError on Python 2 on Windows
Need to investigate this further!
2017-12-22 16:00:58 +01:00
Søren Lind Kristiansen
bef735aef7 Fix Danish abbreviation 'm.h.t.' 2017-12-21 09:24:31 +01:00
Ines Montani
a3dd167d7f
Merge branch 'master' into da_ud_tokenization 2017-12-20 21:05:34 +00:00
Ines Montani
97f100f69f
Merge pull request #1742 from kimfalk/master
Two corrections in the da lan.
2017-12-20 21:02:00 +00:00
Ines Montani
d682a8803e
Merge pull request #1672 from cbilgili/master
Adds Turkish Lemmatization
2017-12-20 21:01:00 +00:00
Benjamin Peterson
9452134cd1 remove no-break spaces from Hindi example (fixes #1750) 2017-12-20 11:35:30 -08:00
Søren Lind Kristiansen
7a2f2f6f94 Fix formatting. 2017-12-20 18:37:37 +01:00
Søren Lind Kristiansen
15d13efafd Tune Danish tokenizer to more closely match tokenization in Universal Dependencies. 2017-12-20 17:36:52 +01:00
Kim FalkJørgensen
648dc60755 Remove the incorrect exception 'm.h.t' 2017-12-20 10:02:39 +01:00
Kim FalkJørgensen
9c9f4ef84a Fixing a translation error in examples.py
Adding an exception in the tokenizer_exceptions.py
2017-12-19 15:26:50 +01:00
ines
22dc744b48 Fix check for '@' in like_url (see #1715) 2017-12-16 13:48:43 +01:00
Ines Montani
9c1ee65268
Add regression test for #1698 2017-12-12 10:36:11 +01:00
Ines Montani
6455b574fc
Check for email address first 2017-12-12 10:25:13 +01:00
Bri-Will
d77361d76c
Update lex_attrs.py. Fix like_url from matching on e-mail 2017-12-11 14:13:28 -08:00
Søren Lind Kristiansen
5a9d377580 Remove abbreviation for positional plac argument 2017-12-11 11:08:29 +01:00
Isaac Sijaranamual
38021fbb00 Switch from python 3 only TemporaryDirectory to pytest's tmpdir 2017-12-11 00:16:04 +01:00
Isaac Sijaranamual
20ae0c459a Fixes "Error saving model" #1622 2017-12-10 23:07:13 +01:00
Isaac Sijaranamual
568130ce7c Adds regression test_issue1622 2017-12-10 23:00:48 +01:00
Isaac Sijaranamual
e188b61960 Make cli/train.py not eat exception 2017-12-10 22:53:08 +01:00
ines
020a7e5d52 Allow 'fine_grained' option in displaCy (see #1703)
Shows token.tag_ instead of token.pos_. Disabled by default, to not cause rendering issues for models with long fine-grained tags (e.g. merged morphological features).
2017-12-09 15:11:12 +01:00
Matthew Honnibal
3b17eb7c49 Merge branch 'master' of https://github.com/explosion/spaCy 2017-12-07 10:39:32 +01:00
Matthew Honnibal
a6b43729c6 Set version to v2.0.5 2017-12-07 10:39:14 +01:00
ines
5eaa61c2b8 Fix formatting 2017-12-07 10:23:09 +01:00
ines
24e80c51b8 Document init-model command 2017-12-07 10:14:37 +01:00
Matthew Honnibal
c91f451b0f Fix imports and CLI in init-model 2017-12-07 10:03:07 +01:00
ines
82e80ff928 Rename model command to init_model and fix formatting 2017-12-07 09:59:23 +01:00
Ines Montani
2feeb428d6
Merge pull request #1646 from GreenRiverRUS/master
Added model command to create models from raw data
2017-12-07 08:54:26 +00:00
Matthew Honnibal
6373d2580d Increment version to v2.0.5.dev0 2017-12-07 09:53:59 +01:00
Matthew Honnibal
36b47e3fa6 Fix (and test) vector pickling 2017-12-07 09:53:30 +01:00
Matthew Honnibal
05f41ff587 Set version to 2.0.4 2017-12-06 13:24:02 +01:00
Matthew Honnibal
04c38f7e87 Merge branch 'master' of https://github.com/explosion/spaCy 2017-12-06 12:15:52 +01:00
Matthew Honnibal
361944e512 If no rules are set, lemmatize by lookup 2017-12-06 12:12:11 +01:00
Matthew Honnibal
2ab0f2d186
Merge pull request #1664 from jimregan/italian-lemmatizer
BOM in Italian lemmatiser
2017-12-06 11:09:04 +01:00
Matthew Honnibal
3f247119d3
Merge pull request #1668 from sorenlind/da_morph
Add more Danish morph rules and clean up existing ones
2017-12-06 11:08:09 +01:00
Matthew Honnibal
b712de774e Fix vectors pickling 2017-12-05 12:45:24 +01:00
Matthew Honnibal
04650e38c7 Set version to 2.0.4.dev0 2017-12-05 10:52:31 +01:00
Matthew Honnibal
07acb43a85 Merge branch 'master' of https://github.com/explosion/spaCy 2017-12-04 14:42:52 +01:00
Thomas Werkmeister
94eac75b7c
fix setup.py spacy req string for packaging
Requirement should be `spacy>=2.0.2` instead of `spacy2.0.2`
2017-12-03 04:16:28 -06:00
ines
f2ea6d4713 Add Dutch example sentences (see #1107) 2017-12-01 23:36:05 +01:00
Canbey Bilgili
abe098b255 Adds Turkish Lemmatization 2017-12-01 17:04:32 +03:00
Søren Lind Kristiansen
d86b537a38 Enable morph rules for Danish 2017-11-30 15:58:02 +01:00
Søren Lind Kristiansen
13a988adc3 Remove 'Number[psor]' 2017-11-30 15:55:04 +01:00
Søren Lind Kristiansen
dd6fde18a9 Add more Danish morph rules and clean up existing ones 2017-11-30 11:17:19 +01:00
Vadim Mazaev
495eacf470 Merge branch 'model_command' 2017-11-30 12:30:26 +03:00
Vadim Mazaev
4ba7ddf651 Bugfixies 2017-11-30 12:29:38 +03:00
Matthew Honnibal
6bc0f4d29f
Merge pull request #1611 from fsonntag/master
Solving #1494
2017-11-29 23:11:23 +01:00
Matthew Honnibal
f9ed9ea529
Merge pull request #1624 from GreenRiverRUS/russian
Add support for Russian
2017-11-29 23:10:01 +01:00
Jim O'Regan
ba6a23fd11 BOM in Italian lemmatiser 2017-11-29 17:40:07 +00:00
ines
a31506e060 Fix off-by-one error in nlp.add_pipe(after=name) (fixes #1654) 2017-11-28 20:37:55 +01:00
ines
b62739fbfe Add regression test for #1654 2017-11-28 20:27:54 +01:00
ines
2e50dbb9d7 Simplify test 2017-11-28 20:27:27 +01:00
Felix Sonntag
724ae7dc55 Fixed issue of infix capturing prefixes 2017-11-28 17:17:12 +01:00
Ines Montani
9052643e2c
Merge pull request #1653 from sorenlind/da_example_typo
Fix typo
2017-11-27 14:47:42 +00:00
Søren Lind Kristiansen
5fe58b885b Fix typo 2017-11-27 15:36:18 +01:00
Ines Montani
d52b1ab245
Add unicode_literals (hopefully fixes test failure on Python 2) 2017-11-27 15:16:54 +01:00
Søren Lind Kristiansen
0ffd27b0f6 Add several Danish alternative spellings 2017-11-27 13:35:41 +01:00
Ines Montani
6362024cf8
Merge pull request #1645 from GreenRiverRUS/fix_default_meta
Fixed spaCy version string in default meta
2017-11-27 11:58:02 +00:00
Vadim Mazaev
c332ffdde1 Added model command to create model from raw data:
words counts, brown clusters and vectors
2017-11-27 01:21:47 +03:00
Vadim Mazaev
59f03ab1d7 Fixed spacy version string in default meta 2017-11-26 23:02:07 +03:00
Vadim Mazaev
53e7c38637 Fixed tests depends on pymorphy2 2017-11-26 21:04:44 +03:00
Vadim Mazaev
cacd859dcd Added tag map, fixed tests fails, added more exceptions 2017-11-26 20:54:48 +03:00
Ines Montani
a7bb8f1b42
Merge pull request #1637 from sorenlind/da_tokenization
Improve Danish tokenization
2017-11-26 15:41:38 +00:00
ines
c699aec089 Add offsets_from_biluo_tags helper and tests (see #1626) 2017-11-26 16:38:01 +01:00
Søren Lind Kristiansen
ef03e9ea53 Remove unused import. 2017-11-25 13:04:02 +01:00
Søren Lind Kristiansen
6aa241bcec Add day of month tokenizer exceptions for Danish. 2017-11-24 15:03:24 +01:00
Søren Lind Kristiansen
0c276ed020 Add weekday abbreviations and remove abiguous month abbreviations for Danish. 2017-11-24 14:43:29 +01:00
Søren Lind Kristiansen
056547e989 Add multiple tokenizer exceptions for Danish. 2017-11-24 11:51:26 +01:00
Søren Lind Kristiansen
8dc265ac0c Add test for tokenization of 'i.' for Danish. 2017-11-24 11:29:37 +01:00
Søren Lind Kristiansen
ac8116510d Fix tokenization of 'i.' for Danish. 2017-11-24 11:16:53 +01:00
Matthew Honnibal
79f11d4f85 Pickle vectors with vocab 2017-11-23 17:19:50 +01:00
Matthew Honnibal
f29c3925ee Fix more efficient nonproj 2017-11-23 12:48:00 +00:00
Matthew Honnibal
e10e9ad2c5 Improve efficiency of Doc.to_array 2017-11-23 12:33:27 +00:00
Matthew Honnibal
2acc907d55 Improve profiling 2017-11-23 12:33:03 +00:00
Matthew Honnibal
fa62427300 Remove lookup-based lemmatization 2017-11-23 12:32:22 +00:00
Matthew Honnibal
fb26b2cb12 Use lookup lemmatizer if lemma unset 2017-11-23 12:31:58 +00:00
Matthew Honnibal
db5c714ad2 Improve efficiency of deprojectivization 2017-11-23 12:31:34 +00:00
Matthew Honnibal
8fec7268eb Move string cleanup under a setting flag 2017-11-23 12:19:18 +00:00
Matthew Honnibal
5949777b12 Fix misleading multi-threading docstring 2017-11-23 12:18:59 +00:00
Matthew Honnibal
542e6fd4ea Don't remove entries from specials 2017-11-23 12:17:42 +00:00
Matthew Honnibal
30ba81f881
Merge pull request #1576 from ligser/master
Actually reset caches in pipe [wip]
2017-11-23 12:54:48 +01:00
ines
c90fe92e15 Fix displaCy test 2017-11-22 05:04:39 +01:00
ines
a6f33ac27d Fix displaCy test 2017-11-22 04:19:28 +01:00
ines
93b0be611a Merge branch 'master' of https://github.com/explosion/spaCy 2017-11-22 00:28:55 +01:00
ines
60b4915569 Use .pos_ instead of .tags_ in displaCy by default (see #1006) 2017-11-22 00:28:52 +01:00
Vadim Mazaev
81314f8659 Fixed tokenizer: added char classes; added first lemmatizer and
tokenizer tests
2017-11-21 22:23:59 +03:00
Vadim Mazaev
52ee1f9bf9 Updated Russian Language, added lemmatizer, norm exceptions and lex
attrs
2017-11-21 11:44:46 +03:00
Burton DeWilde
a5c6869b2d Fix bug where span.orth_ != span.text (see #1612) 2017-11-20 12:05:43 -06:00
Burton DeWilde
635792997c Add regression test for #1612 2017-11-20 12:05:35 -06:00
ines
9a63e32f21 Add noqa to Python 2 compat variables of built-ins (see #1617) 2017-11-20 14:03:42 +01:00
ines
d70a64d78b Fix syntax error and formatting in test (see #1617) 2017-11-20 14:01:25 +01:00
ines
17849dee4b Fix French test (see #1617) 2017-11-20 13:59:59 +01:00
Felix Sonntag
33b0f86de3 Changed tokenizer to add infix when infix_start is offset 2017-11-19 16:32:10 +01:00