Matthew Honnibal
465a6f6452
Add missing Span.vocab property. Closes #1633
2018-01-14 15:06:30 +01:00
Matthew Honnibal
0cb090e526
Fix infinite recursion in token.sent_start. Closes #1640
2018-01-14 15:02:15 +01:00
Matthew Honnibal
5cbe913b6f
Don't raise deprecation warning in property. Closes #1813, #1712
2018-01-14 14:55:58 +01:00
Matthew Honnibal
1a1cca6052
Fix vectors.resize() on Py3. Closes #1539
2018-01-14 14:48:51 +01:00
Matthew Honnibal
0153220304
Make set_vector add word to vocab. Fixes #1807
2018-01-14 13:57:57 +01:00
Ines Montani
55754f0cee
Merge pull request #1836 from fucking-signup/master
...
Add tests for issue #1769
2018-01-13 00:23:35 +00:00
Kit
4ee97f20a0
Mark like_num tests as slow
2018-01-13 00:44:15 +01:00
Kit
855531537e
Rewrite tests for issue #1769
2018-01-12 23:49:51 +01:00
Kit
5b541cb5ec
Simplify tests for issue #1769
2018-01-12 23:34:27 +01:00
Kit
7a2adc4633
Remove some tests to see build status changes
2018-01-12 22:49:16 +01:00
Kit
0e62809a43
Rewrite tests for issue #1769
2018-01-12 22:26:06 +01:00
Ines Montani
36f426fe0a
Merge pull request #1808 from fucking-signup/master
...
Fix issue #1769
2018-01-12 21:12:02 +00:00
Kit
76f4eeca44
Remove tests to see build changes on Windows (Python 2.7)
2018-01-12 20:30:51 +01:00
Matthew Honnibal
7ca49c2061
Merge branch 'master' into feature-improve-model-download
2018-01-10 18:21:55 +01:00
Kit
7ec0956e8d
Add regression test (issue #1769)
2018-01-08 03:42:04 +01:00
Kit
701e7cc6aa
Rename variable to keep code consistent
2018-01-08 03:38:44 +01:00
Kit
ed0db95183
Find lowercased forms of ordinal words, where possible
2018-01-08 03:28:50 +01:00
Kit
9bc524982e
Find lowercased forms of numeric words
2018-01-08 03:25:08 +01:00
Søren Lind Kristiansen
62de5da1ff
Remove unused dummy variable
2018-01-05 09:57:24 +01:00
Søren Lind Kristiansen
10dab8eef8
Remove dummy variable from function calls
2018-01-05 09:37:05 +01:00
Søren Lind Kristiansen
7f0ab145e9
Don't pass CLI command name as dummy argument
2018-01-04 21:33:47 +01:00
Ines Montani
6a008233b5
Merge pull request #1795 from textioHQ/issue1758 (resolves #1758)
...
english tokenizer: handle "would've"
2018-01-04 02:43:39 +00:00
Kevin Humphreys
597df5bf83
add test
2018-01-03 13:00:05 -08:00
Kevin Humphreys
7918fa4ef9
handle would've
2018-01-03 12:25:48 -08:00
ines
2c656f90fb
Exit with 1 if incompatible models found (see #1714)
2018-01-03 21:20:35 +01:00
ines
dacfaa2ca4
Ensure that download command exits properly (resolves #1714)
2018-01-03 21:03:36 +01:00
Søren Lind Kristiansen
a9ff6eadc9
Prefix dummy argument names with underscore
2018-01-03 20:48:12 +01:00
ines
1081e08efb
Fix formatting
2018-01-03 20:14:50 +01:00
ines
d8109964d6
Use --no-deps on model install
...
In general, it's nice for models to specify spaCy as a dependency. However, this tends to cause problems in conda environments, as pip will re-install spaCy and its dependencies (especially Thinc)
2018-01-03 17:40:37 +01:00
ines
319d754309
Fix overwriting of existing symlinks
...
Check for is_symlink() to also overwrite invalid and outdated symlinks. Also show better error message if link path exists but is not symlink (i.e. file or directory).
2018-01-03 17:39:36 +01:00
ines
8ba0dfd017
Make message on failed linking more clear
2018-01-03 17:38:09 +01:00
Søren Lind Kristiansen
d6327e8495
Fix handling case when vectors not specified
2018-01-03 12:20:49 +01:00
Søren Lind Kristiansen
bcc51d7d8b
Fix shifted positional arguments
2018-01-03 12:19:47 +01:00
zqhZY
f27859fa99
add ChineseDefaults class for pickling
2017-12-28 17:13:58 +08:00
Ines Montani
ff9fc945ab
Merge pull request #1749 from sorenlind/da_ud_tokenization
...
Tune Danish tokenizer to more closely match Universal Dependencies
2017-12-22 16:00:49 +00:00
ines
26f313dabc
Fix missing import
2017-12-22 16:21:44 +01:00
ines
8dc1c27841
Merge branch 'master' of https://github.com/explosion/spaCy
2017-12-22 16:01:00 +01:00
ines
b10ba848b8
xfail test that causes MemoryError on Python 2 on Windows
...
Need to investigate this further!
2017-12-22 16:00:58 +01:00
Søren Lind Kristiansen
bef735aef7
Fix Danish abbreviation 'm.h.t.'
2017-12-21 09:24:31 +01:00
Ines Montani
a3dd167d7f
Merge branch 'master' into da_ud_tokenization
2017-12-20 21:05:34 +00:00
Ines Montani
97f100f69f
Merge pull request #1742 from kimfalk/master
...
Two corrections in the da lan.
2017-12-20 21:02:00 +00:00
Ines Montani
d682a8803e
Merge pull request #1672 from cbilgili/master
...
Adds Turkish Lemmatization
2017-12-20 21:01:00 +00:00
Benjamin Peterson
9452134cd1
remove no-break spaces from Hindi example (fixes #1750)
2017-12-20 11:35:30 -08:00
Søren Lind Kristiansen
7a2f2f6f94
Fix formatting.
2017-12-20 18:37:37 +01:00
Søren Lind Kristiansen
15d13efafd
Tune Danish tokenizer to more closely match tokenization in Universal Dependencies.
2017-12-20 17:36:52 +01:00
Kim FalkJørgensen
648dc60755
Remove the incorrect exception 'm.h.t'
2017-12-20 10:02:39 +01:00
Kim FalkJørgensen
9c9f4ef84a
Fixing a translation error in examples.py
...
Adding an exception in the tokenizer_exceptions.py
2017-12-19 15:26:50 +01:00
ines
22dc744b48
Fix check for '@' in like_url (see #1715)
2017-12-16 13:48:43 +01:00
Ines Montani
9c1ee65268
Add regression test for #1698
2017-12-12 10:36:11 +01:00
Ines Montani
6455b574fc
Check for email address first
2017-12-12 10:25:13 +01:00
Bri-Will
d77361d76c
Update lex_attrs.py. Fix like_url from matching on e-mail
2017-12-11 14:13:28 -08:00
Søren Lind Kristiansen
5a9d377580
Remove abbreviation for positional plac argument
2017-12-11 11:08:29 +01:00
Isaac Sijaranamual
38021fbb00
Switch from python 3 only TemporaryDirectory to pytest's tmpdir
2017-12-11 00:16:04 +01:00
Isaac Sijaranamual
20ae0c459a
Fixes "Error saving model" #1622
2017-12-10 23:07:13 +01:00
Isaac Sijaranamual
568130ce7c
Adds regression test_issue1622
2017-12-10 23:00:48 +01:00
Isaac Sijaranamual
e188b61960
Make cli/train.py not eat exception
2017-12-10 22:53:08 +01:00
ines
020a7e5d52
Allow 'fine_grained' option in displaCy (see #1703)
...
Shows token.tag_ instead of token.pos_. Disabled by default, to not cause rendering issues for models with long fine-grained tags (e.g. merged morphological features).
2017-12-09 15:11:12 +01:00
Matthew Honnibal
3b17eb7c49
Merge branch 'master' of https://github.com/explosion/spaCy
2017-12-07 10:39:32 +01:00
Matthew Honnibal
a6b43729c6
Set version to v2.0.5
2017-12-07 10:39:14 +01:00
ines
5eaa61c2b8
Fix formatting
2017-12-07 10:23:09 +01:00
ines
24e80c51b8
Document init-model command
2017-12-07 10:14:37 +01:00
Matthew Honnibal
c91f451b0f
Fix imports and CLI in init-model
2017-12-07 10:03:07 +01:00
ines
82e80ff928
Rename model command to init_model and fix formatting
2017-12-07 09:59:23 +01:00
Ines Montani
2feeb428d6
Merge pull request #1646 from GreenRiverRUS/master
...
Added model command to create models from raw data
2017-12-07 08:54:26 +00:00
Matthew Honnibal
6373d2580d
Increment version to v2.0.5.dev0
2017-12-07 09:53:59 +01:00
Matthew Honnibal
36b47e3fa6
Fix (and test) vector pickling
2017-12-07 09:53:30 +01:00
Matthew Honnibal
05f41ff587
Set version to 2.0.4
2017-12-06 13:24:02 +01:00
Matthew Honnibal
04c38f7e87
Merge branch 'master' of https://github.com/explosion/spaCy
2017-12-06 12:15:52 +01:00
Matthew Honnibal
361944e512
If no rules are set, lemmatize by lookup
2017-12-06 12:12:11 +01:00
Matthew Honnibal
2ab0f2d186
Merge pull request #1664 from jimregan/italian-lemmatizer
...
BOM in Italian lemmatiser
2017-12-06 11:09:04 +01:00
Matthew Honnibal
3f247119d3
Merge pull request #1668 from sorenlind/da_morph
...
Add more Danish morph rules and clean up existing ones
2017-12-06 11:08:09 +01:00
Matthew Honnibal
b712de774e
Fix vectors pickling
2017-12-05 12:45:24 +01:00
Matthew Honnibal
04650e38c7
Set version to 2.0.4.dev0
2017-12-05 10:52:31 +01:00
Matthew Honnibal
07acb43a85
Merge branch 'master' of https://github.com/explosion/spaCy
2017-12-04 14:42:52 +01:00
Thomas Werkmeister
94eac75b7c
fix setup.py spacy req string for packaging
...
Requirement should be `spacy>=2.0.2` instead of `spacy2.0.2`
2017-12-03 04:16:28 -06:00
ines
f2ea6d4713
Add Dutch example sentences (see #1107)
2017-12-01 23:36:05 +01:00
Canbey Bilgili
abe098b255
Adds Turkish Lemmatization
2017-12-01 17:04:32 +03:00
Søren Lind Kristiansen
d86b537a38
Enable morph rules for Danish
2017-11-30 15:58:02 +01:00
Søren Lind Kristiansen
13a988adc3
Remove 'Number[psor]'
2017-11-30 15:55:04 +01:00
Søren Lind Kristiansen
dd6fde18a9
Add more Danish morph rules and clean up existing ones
2017-11-30 11:17:19 +01:00
Vadim Mazaev
495eacf470
Merge branch 'model_command'
2017-11-30 12:30:26 +03:00
Vadim Mazaev
4ba7ddf651
Bugfixes
2017-11-30 12:29:38 +03:00
Jim O'Regan
a4ecdeadd4
aha
2017-11-29 23:43:25 +00:00
Jim O'Regan
2c7a9215d7
Merge branch 'master' into animacy
2017-11-29 23:31:12 +00:00
Jim O'Regan
c3e6cee17a
use inan in polimorf tagset conversion
2017-11-29 23:15:47 +00:00
Jim O'Regan
b32575e78c
imports
2017-11-29 23:03:41 +00:00
Jim O'Regan
3696ce6a7b
add UD mapping
2017-11-29 22:59:19 +00:00
Jim O'Regan
f8e7082fe4
typo in "inan", add "nhum"
2017-11-29 22:40:47 +00:00
Matthew Honnibal
6bc0f4d29f
Merge pull request #1611 from fsonntag/master
...
Solving #1494
2017-11-29 23:11:23 +01:00
Matthew Honnibal
f9ed9ea529
Merge pull request #1624 from GreenRiverRUS/russian
...
Add support for Russian
2017-11-29 23:10:01 +01:00
Jim O'Regan
076a6fc60a
symbols
2017-11-29 20:11:20 +00:00
Jim O'Regan
834ba3c69a
(semi generated) Polimorf mapping
2017-11-29 20:08:24 +00:00
Jim O'Regan
ba6a23fd11
BOM in Italian lemmatiser
2017-11-29 17:40:07 +00:00
ines
a31506e060
Fix off-by-one error in nlp.add_pipe(after=name) (fixes #1654)
2017-11-28 20:37:55 +01:00
ines
b62739fbfe
Add regression test for #1654
2017-11-28 20:27:54 +01:00
ines
2e50dbb9d7
Simplify test
2017-11-28 20:27:27 +01:00
Felix Sonntag
724ae7dc55
Fixed issue of infix capturing prefixes
2017-11-28 17:17:12 +01:00
Ines Montani
9052643e2c
Merge pull request #1653 from sorenlind/da_example_typo
...
Fix typo
2017-11-27 14:47:42 +00:00
Søren Lind Kristiansen
5fe58b885b
Fix typo
2017-11-27 15:36:18 +01:00
Ines Montani
d52b1ab245
Add unicode_literals (hopefully fixes test failure on Python 2)
2017-11-27 15:16:54 +01:00
Søren Lind Kristiansen
0ffd27b0f6
Add several Danish alternative spellings
2017-11-27 13:35:41 +01:00
Ines Montani
6362024cf8
Merge pull request #1645 from GreenRiverRUS/fix_default_meta
...
Fixed spaCy version string in default meta
2017-11-27 11:58:02 +00:00
Vadim Mazaev
c332ffdde1
Added model command to create model from raw data:
...
words counts, brown clusters and vectors
2017-11-27 01:21:47 +03:00
Vadim Mazaev
59f03ab1d7
Fixed spacy version string in default meta
2017-11-26 23:02:07 +03:00
Vadim Mazaev
53e7c38637
Fixed tests that depend on pymorphy2
2017-11-26 21:04:44 +03:00
Vadim Mazaev
cacd859dcd
Added tag map, fixed test failures, added more exceptions
2017-11-26 20:54:48 +03:00
Ines Montani
a7bb8f1b42
Merge pull request #1637 from sorenlind/da_tokenization
...
Improve Danish tokenization
2017-11-26 15:41:38 +00:00
ines
c699aec089
Add offsets_from_biluo_tags helper and tests (see #1626)
2017-11-26 16:38:01 +01:00
Søren Lind Kristiansen
ef03e9ea53
Remove unused import.
2017-11-25 13:04:02 +01:00
Søren Lind Kristiansen
6aa241bcec
Add day of month tokenizer exceptions for Danish.
2017-11-24 15:03:24 +01:00
Søren Lind Kristiansen
0c276ed020
Add weekday abbreviations and remove ambiguous month abbreviations for Danish.
2017-11-24 14:43:29 +01:00
Søren Lind Kristiansen
056547e989
Add multiple tokenizer exceptions for Danish.
2017-11-24 11:51:26 +01:00
Søren Lind Kristiansen
8dc265ac0c
Add test for tokenization of 'i.' for Danish.
2017-11-24 11:29:37 +01:00
Søren Lind Kristiansen
ac8116510d
Fix tokenization of 'i.' for Danish.
2017-11-24 11:16:53 +01:00
Matthew Honnibal
79f11d4f85
Pickle vectors with vocab
2017-11-23 17:19:50 +01:00
Matthew Honnibal
f29c3925ee
Fix more efficient nonproj
2017-11-23 12:48:00 +00:00
Matthew Honnibal
e10e9ad2c5
Improve efficiency of Doc.to_array
2017-11-23 12:33:27 +00:00
Matthew Honnibal
2acc907d55
Improve profiling
2017-11-23 12:33:03 +00:00
Matthew Honnibal
fa62427300
Remove lookup-based lemmatization
2017-11-23 12:32:22 +00:00
Matthew Honnibal
fb26b2cb12
Use lookup lemmatizer if lemma unset
2017-11-23 12:31:58 +00:00
Matthew Honnibal
db5c714ad2
Improve efficiency of deprojectivization
2017-11-23 12:31:34 +00:00
Matthew Honnibal
8fec7268eb
Move string cleanup under a setting flag
2017-11-23 12:19:18 +00:00
Matthew Honnibal
5949777b12
Fix misleading multi-threading docstring
2017-11-23 12:18:59 +00:00
Matthew Honnibal
542e6fd4ea
Don't remove entries from specials
2017-11-23 12:17:42 +00:00
Matthew Honnibal
30ba81f881
Merge pull request #1576 from ligser/master
...
Actually reset caches in pipe [wip]
2017-11-23 12:54:48 +01:00
ines
c90fe92e15
Fix displaCy test
2017-11-22 05:04:39 +01:00
ines
a6f33ac27d
Fix displaCy test
2017-11-22 04:19:28 +01:00
ines
93b0be611a
Merge branch 'master' of https://github.com/explosion/spaCy
2017-11-22 00:28:55 +01:00
ines
60b4915569
Use .pos_ instead of .tags_ in displaCy by default (see #1006)
2017-11-22 00:28:52 +01:00
Vadim Mazaev
81314f8659
Fixed tokenizer: added char classes; added first lemmatizer and
...
tokenizer tests
2017-11-21 22:23:59 +03:00
Vadim Mazaev
52ee1f9bf9
Updated Russian Language, added lemmatizer, norm exceptions and lex
...
attrs
2017-11-21 11:44:46 +03:00
Burton DeWilde
a5c6869b2d
Fix bug where span.orth_ != span.text (see #1612)
2017-11-20 12:05:43 -06:00
Burton DeWilde
635792997c
Add regression test for #1612
2017-11-20 12:05:35 -06:00
ines
9a63e32f21
Add noqa to Python 2 compat variables of built-ins (see #1617)
2017-11-20 14:03:42 +01:00
ines
d70a64d78b
Fix syntax error and formatting in test (see #1617)
2017-11-20 14:01:25 +01:00
ines
17849dee4b
Fix French test (see #1617)
2017-11-20 13:59:59 +01:00
Felix Sonntag
33b0f86de3
Changed tokenizer to add infix when infix_start is offset
2017-11-19 16:32:10 +01:00
Felix Sonntag
8be3392302
Added regression test for 1494
2017-11-19 16:30:35 +01:00
Motoki Wu
a52e195a0a
Fixes Issue #1207 where `noun_chunks` of `Span` gives an error.
...
Make sure to reference `self.doc` when getting the noun chunks.
Same fix as 9750a0128c
2017-11-17 17:16:20 -08:00
Motoki Wu
b818afaa0e
Added failing test for Issue #1207.
...
The noun chunk iterator should work for `Doc` but not for `Span`.
2017-11-17 17:04:27 -08:00
Vadim Mazaev
a0739a06d4
Returned russian support from v1.10 branch
2017-11-17 17:06:15 +03:00
yuukos
7401152289
updated Russian tokenizer
...
moved the trying to import pymorph into __init__
2017-11-17 17:04:50 +03:00
yuukos
3aad66cf00
added russian language support
2017-11-17 17:04:22 +03:00
ines
a3d4dd1a5d
Test adding of lots of pipeline components (see #1585)
...
Just to make sure that there's no error now or in the future with adding a large number of pipeline components.
2017-11-15 17:28:06 +01:00
Roman Domrachev
61d28d03e4
Try again to do selective remove cache
2017-11-15 19:11:12 +03:00
Roman Domrachev
b3311100c7
Merge branch 'master' of github.com:explosion/spaCy
2017-11-15 18:30:04 +03:00
Matthew Honnibal
b60d92aca8
Increment version
2017-11-15 16:14:46 +01:00
Roman Domrachev
505c6a2f2f
Completely cleanup tokenizer cache
...
Tokenizer cache can have keys other than strings.
This modification can slow down the tokenizer and needs to be measured.
2017-11-15 17:55:48 +03:00
Matthew Honnibal
cf0be62096
Increment version
2017-11-15 15:00:18 +01:00
ines
97a4f9362b
Merge branch 'master' of https://github.com/explosion/spaCy
2017-11-15 14:24:00 +01:00
ines
8e65247886
Fix lex.id if vectors is None
2017-11-15 14:23:58 +01:00
Matthew Honnibal
437ad1a852
Merge pull request #1570 from explosion/feature/fix-beam-leak
...
Fix memory leak in beam parser
2017-11-15 14:15:05 +01:00
Matthew Honnibal
2f169fdb0a
Set lex ID correctly for new tokens in Vocab
2017-11-15 13:58:03 +01:00
Matthew Honnibal
fe3c42a06b
Fix caching in tokenizer
2017-11-15 13:55:46 +01:00
Matthew Honnibal
8d692771f6
Improve profiling
2017-11-15 13:51:25 +01:00
Matthew Honnibal
b797dca977
Merge branch 'master' of https://github.com/explosion/spaCy
2017-11-15 13:11:43 +01:00
ines
c9d72de0fb
Add dummy serialization methods for Japanese and missing lang getter (resolves #1557)
2017-11-15 12:44:02 +01:00
Matthew Honnibal
d274d3a3b9
Let beam forward use minibatches
2017-11-15 00:51:42 +01:00
Matthew Honnibal
855872f872
Remove state hashing
2017-11-14 23:36:46 +01:00
Roman Domrachev
3e21680814
Use safer method to get string without hit
2017-11-14 22:58:46 +03:00
Roman Domrachev
a33d5a068d
Try to hold original data instead of restoring it
2017-11-14 22:40:03 +03:00
Roman Domrachev
91e2fa6561
Clean all caches
2017-11-14 21:15:04 +03:00
Roman Domrachev
4e378dc4a4
Remove all obsolete code and test only initial problem
2017-11-14 20:45:04 +03:00
Roman
47ce2347b0
Create test that fails when actual cleanup caused
2017-11-14 20:28:13 +03:00
Roman
caae77f72d
Update strings.pyx
2017-11-14 19:44:40 +03:00
Roman Domrachev
3d247d2bb8
Get back previous testcase
2017-11-14 18:01:37 +03:00
Roman Domrachev
870defa815
Swap keys in proper place
...
Remove unnecessary clear of the hits
2017-11-14 17:56:30 +03:00
Roman Domrachev
86ca434c93
Merge github.com:explosion/spaCy
2017-11-14 17:46:22 +03:00
Roman Domrachev
a2745b0e84
StringStore now actually cleaned
...
Do not lose docs in ref tracking
2017-11-14 17:45:50 +03:00
Matthew Honnibal
2512ea9eeb
Fix memory leak in beam parser
2017-11-14 02:11:40 +01:00
Matthew Honnibal
86ddf692a1
Fix bug in limit calculation on dev data
2017-11-14 01:37:10 +01:00
Ines Montani
ea6c85c67a
Merge pull request #1566 from MathiasDesch/master (resolves #1248)
...
Add exceptions to tokenizer and norm
2017-11-13 19:05:22 +01:00
Matthew Honnibal
1b348389bb
Merge branch 'master' of https://github.com/explosion/spaCy
2017-11-13 18:18:48 +01:00
Matthew Honnibal
ca73d0d8fe
Cleanup states after beam parsing, explicitly
2017-11-13 18:18:26 +01:00
Matthew Honnibal
63ef9a2e73
Remove __dealloc__ from ParserBeam
2017-11-13 18:18:08 +01:00
Mathias Deschamps
c0691b2ab4
Add tokenizer exceptions for ing verbs
...
Extend list of tokenizing exceptions introduced in 123810b
2017-11-13 17:46:05 +01:00
Mathias Deschamps
288298ead9
Add norm exception for ing verbs
...
Some ing verbs are sometimes written in or in'. Make the NORM form correct
2017-11-13 17:46:05 +01:00
Abhinav Sharma
59f5740ede
improved upon the list of included stop_words
2017-11-13 17:13:49 +05:30
Matthew Honnibal
6e641f46d4
Create a preprocess function that gets bigrams
2017-11-12 00:43:41 +01:00
Matthew Honnibal
c9251d79e3
Edit comment
2017-11-11 18:38:32 +01:00
Matthew Honnibal
dd1678eab3
Edit comment
2017-11-11 18:37:08 +01:00
Roman Domrachev
ee60a52ee7
Fix test imports and last batch cleanup
2017-11-11 11:32:16 +03:00
Roman Domrachev
4a6b094e09
Remove unused import
2017-11-11 03:13:05 +03:00
Roman Domrachev
3c600adf23
Try to fix StringStore clean up (see #1506)
2017-11-11 03:11:27 +03:00
ines
ee97fd3cb4
Add regression test for #1547
2017-11-11 00:14:03 +01:00
ines
2df27db671
Add unicode declaration
2017-11-11 00:13:56 +01:00
ines
35653bef3a
Add missing import (fixes #1546)
2017-11-10 19:05:18 +01:00
ines
4c5d2c80d5
Re-add python -m to commands, too brittle :( (see #1536)
2017-11-10 02:30:55 +01:00
ines
123810b6de
Add "lovin'" to tokenizer exceptions (see #1248)
2017-11-09 17:09:30 +01:00
ines
1c218397f6
Ensure path in Doc.to_disk/from_disk (resolves #1521)
...
Also add Doc serialization tests with both Path and string path options
2017-11-09 02:29:03 +01:00
Matthew Honnibal
49fd5a646f
Set version for 2.0.2 release
2017-11-08 22:39:39 +01:00
Matthew Honnibal
fba2dbddf7
Increment version
2017-11-08 22:19:08 +01:00
Matthew Honnibal
a5ea0fdf5a
Fix #1518: vocab.vectors.resize() didn't work
2017-11-08 22:18:37 +01:00
Matthew Honnibal
de45702bbe
Strip dev suffixes from version for compatibility check
2017-11-08 18:40:21 +01:00
Matthew Honnibal
51639214a1
Merge branch 'master' of https://github.com/explosion/spaCy
2017-11-08 18:04:33 +01:00
Matthew Honnibal
a2f980de4e
Exclude .devN versioning from compatibility check
2017-11-08 18:03:52 +01:00
Daniel Hershcovich
d7ae54ff44
Fix typo in message
2017-11-08 16:06:28 +02:00
Matthew Honnibal
4194bc5744
Xfail flakey serialization test
2017-11-08 13:55:13 +01:00
Matthew Honnibal
d5537e5516
Work on Windows test failure
2017-11-08 13:25:18 +01:00
Matthew Honnibal
c27c82d5f9
Fix serialization
2017-11-08 13:08:48 +01:00
Matthew Honnibal
1d5599cd28
Fix dtype
2017-11-08 12:18:32 +01:00
Matthew Honnibal
fa7fdd0d9b
Merge branch 'master' of https://github.com/explosion/spaCy
2017-11-08 12:11:31 +01:00
Matthew Honnibal
072ff38a01
Try to fix python3.5 serialization
2017-11-08 12:10:49 +01:00
Ines Montani
3a0f34d567
Merge pull request #1509 from abhi18av/patch-1
...
Create examples.py for Hindi language
2017-11-08 11:37:19 +01:00
Ines Montani
42b241ccd0
Update language code in usage example in comment
2017-11-08 11:36:38 +01:00
Matthew Honnibal
e262e8d942
Increment version to v2.0.2.dev0
2017-11-08 11:25:47 +01:00
Matthew Honnibal
a8b592783b
Make a dtype more specific, to fix a windows build
2017-11-08 11:24:35 +01:00
Abhinav Sharma
84edade82d
Create examples.py
...
Populated the file with the translations of English example sentences
2017-11-08 13:23:08 +05:30
Matthew Honnibal
d725aee4e2
Increment version to 2.0.1
2017-11-08 02:14:47 +01:00
Matthew Honnibal
8d6f68f1df
Increment version
2017-11-08 01:12:34 +01:00
ines
bcf42b8846
Fix typo
2017-11-08 01:06:37 +01:00
Matthew Honnibal
bbd2a3dee1
Fix title in about.py
2017-11-07 14:02:58 +01:00
Matthew Honnibal
4efaf9306c
Set version to spacy-nightly rc2
2017-11-07 13:27:26 +01:00
Matthew Honnibal
bf1ec2965f
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-07 13:20:29 +01:00
Matthew Honnibal
726f689da4
Fix missing import
2017-11-07 13:20:12 +01:00
ines
834f9c1aab
Update about.py
2017-11-07 13:11:33 +01:00
ines
a4662a31a9
Move model package templates to cli.package and update docs
2017-11-07 12:15:35 +01:00
ines
a09c096d3c
Get docs ready for v2.0.0
2017-11-07 12:00:43 +01:00
Matthew Honnibal
9a88e66103
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-07 02:00:06 +01:00
Matthew Honnibal
174abe4677
Increment to 2.0.0rc1
2017-11-07 01:59:46 +01:00
ines
42a0fbf291
Fix textcat simple train example
2017-11-07 01:25:54 +01:00
ines
8fb48b9b91
Update and document new util functions
2017-11-07 00:22:43 +01:00
Matthew Honnibal
1cab703bba
Move minibatch function to util
2017-11-06 23:45:36 +01:00
ines
5f43953536
Move test
2017-11-06 23:14:10 +01:00
Matthew Honnibal
dd90fe09f5
Remove extraneous label from textcat class
2017-11-06 22:09:02 +01:00
Matthew Honnibal
45e0617e61
Allow Language.update to take unicode text and dict objects
2017-11-06 22:07:38 +01:00
Matthew Honnibal
1831dbd065
Add test of simple textcat workflow
2017-11-06 22:04:29 +01:00
Matthew Honnibal
ffb9101f3f
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-06 19:20:41 +01:00
Matthew Honnibal
8fea512ac8
Don't set tensor in textcat
2017-11-06 19:20:14 +01:00
ines
acb9bdb852
Fix PRON_LEMMA imports
2017-11-06 17:41:53 +01:00
Matthew Honnibal
7d46793dd7
Add PRON_LEMMA to spacy.symbols
2017-11-06 17:38:25 +01:00
Matthew Honnibal
2f7e9f390d
Make test less flakey
2017-11-06 17:34:50 +01:00
Matthew Honnibal
407b08017e
Make test less flakey
2017-11-06 17:31:40 +01:00
Matthew Honnibal
102f797933
Fix lemma ordering in test
2017-11-06 17:02:17 +01:00
Matthew Honnibal
75e1618ec3
Fix lemma clobbering
2017-11-06 16:56:19 +01:00
Matthew Honnibal
6fdffd7246
Merge pull request #1497 from explosion/feature/improve-optimizer-handling
...
💫 Improve optimizer handling
2017-11-06 16:41:15 +01:00
Matthew Honnibal
8e6795437b
Set release=True
2017-11-06 16:39:32 +01:00
Matthew Honnibal
5c85bf3791
Fix missing import
2017-11-06 15:06:27 +01:00
Matthew Honnibal
25859dbb48
Return optimizer from begin_training, creating if necessary
2017-11-06 14:26:49 +01:00
Matthew Honnibal
465adfee94
Remove unused resume_training method, and pass optimizer through
2017-11-06 14:26:00 +01:00
Matthew Honnibal
13336a6197
Fix Adam import
2017-11-06 14:25:37 +01:00
Matthew Honnibal
2eb11d60f2
Add function create_default_optimizer to spacy._ml
2017-11-06 14:11:59 +01:00
Matthew Honnibal
31babe3c3f
Fix non-clobbering lemmatization
2017-11-06 12:36:05 +01:00
Matthew Honnibal
63c6ae4191
Fix lemmatizer test
2017-11-06 11:57:06 +01:00
Matthew Honnibal
a86a0181b5
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-05 22:19:10 +01:00
Matthew Honnibal
134d3b8143
Fix morphology
2017-11-05 22:18:22 +01:00
ines
08d1cf850a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-05 21:41:58 +01:00
ines
baa231745c
Fix Dutch tag map
2017-11-05 21:41:50 +01:00
Matthew Honnibal
46e62ad747
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-05 19:40:00 +01:00
Matthew Honnibal
bb25cb0f76
Avoid clobbering preset lemmas
2017-11-05 19:39:38 +01:00
ines
507ecb67af
Fix Spanish tag map
2017-11-05 19:23:34 +01:00
Matthew Honnibal
320008352b
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-05 18:46:15 +01:00
Matthew Honnibal
38109a0e4a
Register SentenceSegmenter in Language.factories
2017-11-05 18:45:57 +01:00
ines
975e1042ff
Fix Italian tag map
2017-11-05 18:34:09 +01:00
ines
6b2d6e4937
Fix Portuguese tag map
2017-11-05 18:31:00 +01:00
ines
fa2687fded
Fix Dutch tag map
2017-11-05 17:57:59 +01:00
ines
fb8990d916
Fix Spanish tag map
2017-11-05 17:48:46 +01:00
ines
9d13288f73
Fix French tag map
2017-11-05 17:47:59 +01:00
ines
54579805c5
Fix French tag map
2017-11-05 17:44:05 +01:00
Matthew Honnibal
2b35bb76ad
Fix tensorizer on GPU
2017-11-05 15:34:40 +01:00
Matthew Honnibal
6e5181bbaa
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-05 15:33:56 +01:00
Matthew Honnibal
6f438b17c1
Increment version to v2.0.0a19
2017-11-05 14:43:36 +01:00
Matthew Honnibal
225cc249c9
Pass string path to numpy, to fix #1479
2017-11-05 14:42:46 +01:00
Matthew Honnibal
00435d8f0c
Add extra beam parsing test
2017-11-05 14:39:57 +01:00
Matthew Honnibal
e777ea25bb
Merge pull request #1492 from uwol/develop
...
TextCategorizer return parameter fix
2017-11-05 14:13:04 +01:00
Matthew Honnibal
0d4bd6414e
Fix Italian tag map
2017-11-05 14:11:03 +01:00
ines
ef597622a6
Add Portuguese tag map
2017-11-05 13:58:34 +01:00
ines
793c62dfda
Add Dutch tag map
2017-11-05 13:48:07 +01:00
ines
f7485a09c8
Fix Italian tag map
2017-11-05 13:12:58 +01:00
uwol
a2162b8908
tensorizer return parameter fix
2017-11-05 12:25:10 +01:00
ines
0a27afbf86
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-04 23:32:52 +01:00
ines
3cef901834
Add tag map for French and Italian
2017-11-04 23:32:51 +01:00
Matthew Honnibal
cfb83c231c
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-04 23:08:19 +01:00
Matthew Honnibal
d185927998
Undo harmful pickling hacks on Language class
2017-11-04 23:07:03 +01:00
ines
6c15aafebd
Fix formatting
2017-11-04 23:07:02 +01:00
Matthew Honnibal
3ca16ddbd4
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-04 00:25:02 +01:00
Matthew Honnibal
e4ec4be948
Fix parser test
2017-11-04 00:23:45 +01:00
Matthew Honnibal
98c29b7912
Add padding vector in parser, to make gradient more correct
2017-11-04 00:23:23 +01:00
ines
5e7d98f72a
Remove test for #1491
2017-11-03 22:10:57 +01:00
ines
718f1c50fb
Add regression test for #1491
2017-11-03 21:11:20 +01:00
Matthew Honnibal
144a93c2a5
Back-off to tensor for similarity if no vectors
2017-11-03 20:56:33 +01:00
Matthew Honnibal
1e9634691a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-03 20:21:15 +01:00
Matthew Honnibal
13c8881d2f
Expose parser's tok2vec model component
2017-11-03 20:20:59 +01:00
Matthew Honnibal
17c63906f9
Update tensorizer component
2017-11-03 20:20:26 +01:00
Matthew Honnibal
2bf21cbe29
Update model after optimising it instead of waiting
2017-11-03 20:20:01 +01:00
Matthew Honnibal
d6e831bf89
Fix lemmatizer tests
2017-11-03 19:46:34 +01:00
ines
eef930c73e
Assert instead of print
2017-11-03 18:50:57 +01:00
ines
f0986df94b
Add test for #1488 (passes on v2.0.0a18?)
2017-11-03 14:44:36 +01:00
Matthew Honnibal
711278b667
Make test less flakey
2017-11-03 14:36:08 +01:00
Matthew Honnibal
7fea845374
Remove print statement
2017-11-03 14:04:51 +01:00
Matthew Honnibal
0a534ae96a
Fix test for backprop d_pad
2017-11-03 14:04:16 +01:00
Matthew Honnibal
33bd2428db
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-03 13:29:56 +01:00
Matthew Honnibal
6681058abd
Fix tensor extending in tagger
2017-11-03 13:29:36 +01:00
Matthew Honnibal
bd2cbdfa85
Make Morphology not fail on unknown tags
2017-11-03 13:29:09 +01:00
Matthew Honnibal
c9b118a7e9
Set softmax attr in tagger model
2017-11-03 11:22:01 +01:00
Matthew Honnibal
a5b05f85f0
Set Doc.tensor attribute in parser
2017-11-03 11:21:00 +01:00
Matthew Honnibal
62ed58935a
Add Doc.extend_tensor() method
2017-11-03 11:20:31 +01:00
Matthew Honnibal
d6fc39c8a6
Set Doc.tensor from Tagger
2017-11-03 11:20:05 +01:00
Matthew Honnibal
b3264aa5f0
Expose the softmax layer in the tagger model, to allow setting tensors
2017-11-03 11:19:51 +01:00
Matthew Honnibal
c2bbf076a4
Add document length cap for training
2017-11-03 01:54:54 +01:00
Matthew Honnibal
6771780d3f
Fix backprop of padding variable
2017-11-03 01:54:34 +01:00
Matthew Honnibal
54a716f2ec
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-03 00:55:20 +01:00
Matthew Honnibal
260e6ee3fb
Improve efficiency of backprop of padding variable
2017-11-03 00:49:11 +01:00
Matthew Honnibal
a22f96c3f1
Add test for backpropagating padding
2017-11-03 00:48:54 +01:00
ines
9baab241b4
Add skeleton language data for Turkish
2017-11-02 16:32:24 +01:00
ines
c6fea3e5f6
Add Romanian and Croatian skeletons (experimental)
...
Add language data templates to make it easier for others to contribute to the language support
2017-11-01 23:04:28 +01:00
ines
18c859500b
Add missing imports
2017-11-01 23:02:51 +01:00
ines
819e30a26e
Tidy up tokenizer exceptions
2017-11-01 23:02:45 +01:00
ines
3af281a334
Update test model name
2017-11-01 23:02:00 +01:00
Matthew Honnibal
b30dd36179
Allow Tagger.add_label() before training
2017-11-01 21:49:24 +01:00
Matthew Honnibal
eca41f0cf6
Fix filename conversion for conllu
2017-11-01 21:26:49 +01:00
Matthew Honnibal
e237472cdc
Fix tag and filename conversion for conllu
2017-11-01 21:25:33 +01:00
Matthew Honnibal
b84d99b281
Revert tagger.add_label() changes, to fix model
2017-11-01 21:10:45 +01:00
Matthew Honnibal
f5855e539b
Fix tagger model loading
2017-11-01 20:42:36 +01:00
Matthew Honnibal
624644adfe
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 20:26:41 +01:00
ines
5f661a1b3a
Remove tensorizer from pre-set pipe_names
2017-11-01 19:48:33 +01:00
Matthew Honnibal
190522efd3
Fix tagger when some tags aren't in Morphology
2017-11-01 19:27:49 +01:00
Matthew Honnibal
e85e31cfbd
Fix backprop of d_pad
2017-11-01 19:27:26 +01:00
Matthew Honnibal
759cc79185
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 19:00:19 +01:00
Matthew Honnibal
1ae40b50b4
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 17:07:02 +01:00
Matthew Honnibal
7ae1aacdb8
Fix add_label methods
2017-11-01 17:06:43 +01:00
ines
8c2260e18c
Move span tests to /doc
2017-11-01 16:56:35 +01:00
Matthew Honnibal
2ef7b59eb0
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 16:51:41 +01:00
ines
1d1f91a041
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 16:49:44 +01:00
ines
9659391944
Update deprecated methods and add warnings
2017-11-01 16:49:42 +01:00
ines
260cb37224
Catch deprecation warning
2017-11-01 16:49:18 +01:00
ines
5914faafbb
Fix .merge tests to not use deprecated API
2017-11-01 16:49:11 +01:00
ines
705a4e3e4a
Fix formatting
2017-11-01 16:44:08 +01:00
Matthew Honnibal
d17a12c71d
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 16:38:26 +01:00
Matthew Honnibal
9f9439667b
Don't create low-data text classifier if no vectors
2017-11-01 16:34:09 +01:00
Matthew Honnibal
e7a9174877
Add add_label methods to Tagger and TextCategorizer
2017-11-01 16:32:44 +01:00
ines
39e0586192
Add deprecated helper
...
Uses warning to show DeprecationWarning and custom stack trace
2017-11-01 16:32:36 +01:00
Matthew Honnibal
a7bf38bf31
Remove misleading comment on util.get_cuda_stream()
2017-11-01 13:57:25 +01:00
Matthew Honnibal
273e96b63f
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 13:27:35 +01:00
Matthew Honnibal
9e0ebee81c
Add Token.is_sent_start property, so can deprecate Token.sent_start
2017-11-01 13:27:14 +01:00
Matthew Honnibal
7e7116cdf7
Fix Doc.to_array when only one string attr provided
2017-11-01 13:26:43 +01:00
Matthew Honnibal
301fb2bb60
Implement Span.n_lefts and Span.n_rights
2017-11-01 13:25:12 +01:00
Matthew Honnibal
c047498f87
Fix vectors test
2017-11-01 13:24:47 +01:00
ines
9a5e7c6fe2
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 13:14:45 +01:00
ines
bfe17b7df1
Fix begin_training if get_gold_tuples is None
2017-11-01 13:14:31 +01:00
ines
affd3404ab
Remove old model command (now "vocab")
2017-11-01 13:14:03 +01:00
Matthew Honnibal
fdb4b8e456
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 02:07:17 +01:00
Matthew Honnibal
c48dd0e1d3
Fix vector pruning
2017-11-01 02:06:58 +01:00
ines
37e62ab0e2
Update vector meta in meta.json
2017-11-01 01:25:09 +01:00
ines
96b4aef0bf
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 01:10:53 +01:00
Matthew Honnibal
86eba61fae
Fix token.vector when vectors are missing
2017-11-01 00:47:35 +01:00
ines
5683fd65ed
Update docstrings
2017-11-01 00:42:39 +01:00
Matthew Honnibal
44bce8e53f
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 00:35:16 +01:00
Matthew Honnibal
c16310d156
Update vectors with find method
2017-11-01 00:34:55 +01:00
Ines Montani
d11659463b
Merge pull request #1152 from jimregan/develop-irish
...
[WIP] attempt a port from #1147
2017-11-01 00:23:43 +01:00