Søren Lind Kristiansen
bcc51d7d8b
Fix shifted positional arguments
2018-01-03 12:19:47 +01:00
zqhZY
f27859fa99
add ChineseDefaults class for pickling
2017-12-28 17:13:58 +08:00
Ines Montani
ff9fc945ab
Merge pull request #1749 from sorenlind/da_ud_tokenization
...
Tune Danish tokenizer to more closely match Universal Dependencies
2017-12-22 16:00:49 +00:00
ines
26f313dabc
Fix missing import
2017-12-22 16:21:44 +01:00
ines
8dc1c27841
Merge branch 'master' of https://github.com/explosion/spaCy
2017-12-22 16:01:00 +01:00
ines
b10ba848b8
xfail test that causes MemoryError on Python 2 on Windows
...
Need to investigate this further!
2017-12-22 16:00:58 +01:00
Søren Lind Kristiansen
bef735aef7
Fix Danish abbreviation 'm.h.t.'
2017-12-21 09:24:31 +01:00
Ines Montani
a3dd167d7f
Merge branch 'master' into da_ud_tokenization
2017-12-20 21:05:34 +00:00
Ines Montani
97f100f69f
Merge pull request #1742 from kimfalk/master
...
Two corrections in the da lan.
2017-12-20 21:02:00 +00:00
Ines Montani
d682a8803e
Merge pull request #1672 from cbilgili/master
...
Adds Turkish Lemmatization
2017-12-20 21:01:00 +00:00
Benjamin Peterson
9452134cd1
remove no-break spaces from Hindi example ( fixes #1750 )
2017-12-20 11:35:30 -08:00
Søren Lind Kristiansen
7a2f2f6f94
Fix formatting.
2017-12-20 18:37:37 +01:00
Søren Lind Kristiansen
15d13efafd
Tune Danish tokenizer to more closely match tokenization in Universal Dependencies.
2017-12-20 17:36:52 +01:00
Kim FalkJørgensen
648dc60755
Remove the incorrect exception 'm.h.t'
2017-12-20 10:02:39 +01:00
Kim FalkJørgensen
9c9f4ef84a
Fixing a translation error in examples.py
...
Adding an exception in the tokenizer_exceptions.py
2017-12-19 15:26:50 +01:00
ines
22dc744b48
Fix check for '@' in like_url (see #1715 )
2017-12-16 13:48:43 +01:00
Ines Montani
9c1ee65268
Add regression test for #1698
2017-12-12 10:36:11 +01:00
Ines Montani
6455b574fc
Check for email address first
2017-12-12 10:25:13 +01:00
Bri-Will
d77361d76c
Update lex_attrs.py. Stop like_url from matching on e-mail addresses
2017-12-11 14:13:28 -08:00
Søren Lind Kristiansen
5a9d377580
Remove abbreviation for positional plac argument
2017-12-11 11:08:29 +01:00
Isaac Sijaranamual
38021fbb00
Switch from python 3 only TemporaryDirectory to pytest's tmpdir
2017-12-11 00:16:04 +01:00
Isaac Sijaranamual
20ae0c459a
Fixes "Error saving model" #1622
2017-12-10 23:07:13 +01:00
Isaac Sijaranamual
568130ce7c
Adds regression test_issue1622
2017-12-10 23:00:48 +01:00
Isaac Sijaranamual
e188b61960
Make cli/train.py not eat exception
2017-12-10 22:53:08 +01:00
ines
020a7e5d52
Allow 'fine_grained' option in displaCy (see #1703 )
...
Shows token.tag_ instead of token.pos_. Disabled by default, to not cause rendering issues for models with long fine-grained tags (e.g. merged morphological features).
2017-12-09 15:11:12 +01:00
Matthew Honnibal
3b17eb7c49
Merge branch 'master' of https://github.com/explosion/spaCy
2017-12-07 10:39:32 +01:00
Matthew Honnibal
a6b43729c6
Set version to v2.0.5
2017-12-07 10:39:14 +01:00
ines
5eaa61c2b8
Fix formatting
2017-12-07 10:23:09 +01:00
ines
24e80c51b8
Document init-model command
2017-12-07 10:14:37 +01:00
Matthew Honnibal
c91f451b0f
Fix imports and CLI in init-model
2017-12-07 10:03:07 +01:00
ines
82e80ff928
Rename model command to init_model and fix formatting
2017-12-07 09:59:23 +01:00
Ines Montani
2feeb428d6
Merge pull request #1646 from GreenRiverRUS/master
...
Added model command to create models from raw data
2017-12-07 08:54:26 +00:00
Matthew Honnibal
6373d2580d
Increment version to v2.0.5.dev0
2017-12-07 09:53:59 +01:00
Matthew Honnibal
36b47e3fa6
Fix (and test) vector pickling
2017-12-07 09:53:30 +01:00
Matthew Honnibal
05f41ff587
Set version to 2.0.4
2017-12-06 13:24:02 +01:00
Matthew Honnibal
04c38f7e87
Merge branch 'master' of https://github.com/explosion/spaCy
2017-12-06 12:15:52 +01:00
Matthew Honnibal
361944e512
If no rules are set, lemmatize by lookup
2017-12-06 12:12:11 +01:00
Matthew Honnibal
2ab0f2d186
Merge pull request #1664 from jimregan/italian-lemmatizer
...
BOM in Italian lemmatiser
2017-12-06 11:09:04 +01:00
Matthew Honnibal
3f247119d3
Merge pull request #1668 from sorenlind/da_morph
...
Add more Danish morph rules and clean up existing ones
2017-12-06 11:08:09 +01:00
Matthew Honnibal
b712de774e
Fix vectors pickling
2017-12-05 12:45:24 +01:00
Matthew Honnibal
04650e38c7
Set version to 2.0.4.dev0
2017-12-05 10:52:31 +01:00
Matthew Honnibal
07acb43a85
Merge branch 'master' of https://github.com/explosion/spaCy
2017-12-04 14:42:52 +01:00
Thomas Werkmeister
94eac75b7c
fix setup.py spacy req string for packaging
...
Requirement should be `spacy>=2.0.2` instead of `spacy2.0.2`
2017-12-03 04:16:28 -06:00
ines
f2ea6d4713
Add Dutch example sentences (see #1107 )
2017-12-01 23:36:05 +01:00
Canbey Bilgili
abe098b255
Adds Turkish Lemmatization
2017-12-01 17:04:32 +03:00
Søren Lind Kristiansen
d86b537a38
Enable morph rules for Danish
2017-11-30 15:58:02 +01:00
Søren Lind Kristiansen
13a988adc3
Remove 'Number[psor]'
2017-11-30 15:55:04 +01:00
Søren Lind Kristiansen
dd6fde18a9
Add more Danish morph rules and clean up existing ones
2017-11-30 11:17:19 +01:00
Vadim Mazaev
495eacf470
Merge branch 'model_command'
2017-11-30 12:30:26 +03:00
Vadim Mazaev
4ba7ddf651
Bugfixes
2017-11-30 12:29:38 +03:00
Jim O'Regan
a4ecdeadd4
aha
2017-11-29 23:43:25 +00:00
Jim O'Regan
2c7a9215d7
Merge branch 'master' into animacy
2017-11-29 23:31:12 +00:00
Jim O'Regan
c3e6cee17a
use inan in polimorf tagset conversion
2017-11-29 23:15:47 +00:00
Jim O'Regan
b32575e78c
imports
2017-11-29 23:03:41 +00:00
Jim O'Regan
3696ce6a7b
add UD mapping
2017-11-29 22:59:19 +00:00
Jim O'Regan
f8e7082fe4
typo in "inan", add "nhum"
2017-11-29 22:40:47 +00:00
Matthew Honnibal
6bc0f4d29f
Merge pull request #1611 from fsonntag/master
...
Solving #1494
2017-11-29 23:11:23 +01:00
Matthew Honnibal
f9ed9ea529
Merge pull request #1624 from GreenRiverRUS/russian
...
Add support for Russian
2017-11-29 23:10:01 +01:00
Jim O'Regan
076a6fc60a
symbols
2017-11-29 20:11:20 +00:00
Jim O'Regan
834ba3c69a
(semi generated) Polimorf mapping
2017-11-29 20:08:24 +00:00
Jim O'Regan
ba6a23fd11
BOM in Italian lemmatiser
2017-11-29 17:40:07 +00:00
ines
a31506e060
Fix off-by-one error in nlp.add_pipe(after=name) ( fixes #1654 )
2017-11-28 20:37:55 +01:00
ines
b62739fbfe
Add regression test for #1654
2017-11-28 20:27:54 +01:00
ines
2e50dbb9d7
Simplify test
2017-11-28 20:27:27 +01:00
Felix Sonntag
724ae7dc55
Fixed issue of infix capturing prefixes
2017-11-28 17:17:12 +01:00
Ines Montani
9052643e2c
Merge pull request #1653 from sorenlind/da_example_typo
...
Fix typo
2017-11-27 14:47:42 +00:00
Søren Lind Kristiansen
5fe58b885b
Fix typo
2017-11-27 15:36:18 +01:00
Ines Montani
d52b1ab245
Add unicode_literals (hopefully fixes test failure on Python 2)
2017-11-27 15:16:54 +01:00
Søren Lind Kristiansen
0ffd27b0f6
Add several Danish alternative spellings
2017-11-27 13:35:41 +01:00
Ines Montani
6362024cf8
Merge pull request #1645 from GreenRiverRUS/fix_default_meta
...
Fixed spaCy version string in default meta
2017-11-27 11:58:02 +00:00
Vadim Mazaev
c332ffdde1
Added model command to create model from raw data: word counts, Brown clusters and vectors
2017-11-27 01:21:47 +03:00
Vadim Mazaev
59f03ab1d7
Fixed spacy version string in default meta
2017-11-26 23:02:07 +03:00
Vadim Mazaev
53e7c38637
Fixed tests that depend on pymorphy2
2017-11-26 21:04:44 +03:00
Vadim Mazaev
cacd859dcd
Added tag map, fixed test failures, added more exceptions
2017-11-26 20:54:48 +03:00
Ines Montani
a7bb8f1b42
Merge pull request #1637 from sorenlind/da_tokenization
...
Improve Danish tokenization
2017-11-26 15:41:38 +00:00
ines
c699aec089
Add offsets_from_biluo_tags helper and tests (see #1626 )
2017-11-26 16:38:01 +01:00
Søren Lind Kristiansen
ef03e9ea53
Remove unused import.
2017-11-25 13:04:02 +01:00
Søren Lind Kristiansen
6aa241bcec
Add day of month tokenizer exceptions for Danish.
2017-11-24 15:03:24 +01:00
Søren Lind Kristiansen
0c276ed020
Add weekday abbreviations and remove ambiguous month abbreviations for Danish.
2017-11-24 14:43:29 +01:00
Søren Lind Kristiansen
056547e989
Add multiple tokenizer exceptions for Danish.
2017-11-24 11:51:26 +01:00
Søren Lind Kristiansen
8dc265ac0c
Add test for tokenization of 'i.' for Danish.
2017-11-24 11:29:37 +01:00
Søren Lind Kristiansen
ac8116510d
Fix tokenization of 'i.' for Danish.
2017-11-24 11:16:53 +01:00
Matthew Honnibal
79f11d4f85
Pickle vectors with vocab
2017-11-23 17:19:50 +01:00
Matthew Honnibal
f29c3925ee
Fix more efficient nonproj
2017-11-23 12:48:00 +00:00
Matthew Honnibal
e10e9ad2c5
Improve efficiency of Doc.to_array
2017-11-23 12:33:27 +00:00
Matthew Honnibal
2acc907d55
Improve profiling
2017-11-23 12:33:03 +00:00
Matthew Honnibal
fa62427300
Remove lookup-based lemmatization
2017-11-23 12:32:22 +00:00
Matthew Honnibal
fb26b2cb12
Use lookup lemmatizer if lemma unset
2017-11-23 12:31:58 +00:00
Matthew Honnibal
db5c714ad2
Improve efficiency of deprojectivization
2017-11-23 12:31:34 +00:00
Matthew Honnibal
8fec7268eb
Move string cleanup under a setting flag
2017-11-23 12:19:18 +00:00
Matthew Honnibal
5949777b12
Fix misleading multi-threading docstring
2017-11-23 12:18:59 +00:00
Matthew Honnibal
542e6fd4ea
Don't remove entries from specials
2017-11-23 12:17:42 +00:00
Matthew Honnibal
30ba81f881
Merge pull request #1576 from ligser/master
...
Actually reset caches in pipe [wip]
2017-11-23 12:54:48 +01:00
ines
c90fe92e15
Fix displaCy test
2017-11-22 05:04:39 +01:00
ines
a6f33ac27d
Fix displaCy test
2017-11-22 04:19:28 +01:00
ines
93b0be611a
Merge branch 'master' of https://github.com/explosion/spaCy
2017-11-22 00:28:55 +01:00
ines
60b4915569
Use .pos_ instead of .tag_ in displaCy by default (see #1006 )
2017-11-22 00:28:52 +01:00
Vadim Mazaev
81314f8659
Fixed tokenizer: added char classes; added first lemmatizer and tokenizer tests
2017-11-21 22:23:59 +03:00
Vadim Mazaev
52ee1f9bf9
Updated Russian language, added lemmatizer, norm exceptions and lex attrs
2017-11-21 11:44:46 +03:00
Burton DeWilde
a5c6869b2d
Fix bug where span.orth_ != span.text (see #1612 )
2017-11-20 12:05:43 -06:00
Burton DeWilde
635792997c
Add regression test for #1612
2017-11-20 12:05:35 -06:00
ines
9a63e32f21
Add noqa to Python 2 compat variables of built-ins (see #1617 )
2017-11-20 14:03:42 +01:00
ines
d70a64d78b
Fix syntax error and formatting in test (see #1617 )
2017-11-20 14:01:25 +01:00
ines
17849dee4b
Fix French test (see #1617 )
2017-11-20 13:59:59 +01:00
Felix Sonntag
33b0f86de3
Changed tokenizer to add infix when infix_start is offset
2017-11-19 16:32:10 +01:00
Felix Sonntag
8be3392302
Added regression test for #1494
2017-11-19 16:30:35 +01:00
Motoki Wu
a52e195a0a
Fixes Issue #1207 where noun_chunks of Span gives an error.
...
Make sure to reference `self.doc` when getting the noun chunks.
Same fix as 9750a0128c
2017-11-17 17:16:20 -08:00
Motoki Wu
b818afaa0e
Added failing test for Issue #1207.
...
The noun chunk iterator should work for `Doc` but not for `Span`.
2017-11-17 17:04:27 -08:00
Vadim Mazaev
a0739a06d4
Restored Russian support from v1.10 branch
2017-11-17 17:06:15 +03:00
yuukos
7401152289
updated Russian tokenizer
...
moved the attempt to import pymorph into __init__
2017-11-17 17:04:50 +03:00
yuukos
3aad66cf00
added Russian language support
2017-11-17 17:04:22 +03:00
ines
a3d4dd1a5d
Test adding of lots of pipeline components (see #1585 )
...
Just to make sure that there's no error now or in the future with adding a large number of pipeline components.
2017-11-15 17:28:06 +01:00
Roman Domrachev
61d28d03e4
Try again to do selective remove cache
2017-11-15 19:11:12 +03:00
Roman Domrachev
b3311100c7
Merge branch 'master' of github.com:explosion/spaCy
2017-11-15 18:30:04 +03:00
Matthew Honnibal
b60d92aca8
Increment version
2017-11-15 16:14:46 +01:00
Roman Domrachev
505c6a2f2f
Completely cleanup tokenizer cache
...
The tokenizer cache can have keys other than strings.
This modification may slow down the tokenizer and needs to be measured.
2017-11-15 17:55:48 +03:00
Matthew Honnibal
cf0be62096
Increment version
2017-11-15 15:00:18 +01:00
ines
97a4f9362b
Merge branch 'master' of https://github.com/explosion/spaCy
2017-11-15 14:24:00 +01:00
ines
8e65247886
Fix lex.id if vectors is None
2017-11-15 14:23:58 +01:00
Matthew Honnibal
437ad1a852
Merge pull request #1570 from explosion/feature/fix-beam-leak
...
Fix memory leak in beam parser
2017-11-15 14:15:05 +01:00
Matthew Honnibal
2f169fdb0a
Set lex ID correctly for new tokens in Vocab
2017-11-15 13:58:03 +01:00
Matthew Honnibal
fe3c42a06b
Fix caching in tokenizer
2017-11-15 13:55:46 +01:00
Matthew Honnibal
8d692771f6
Improve profiling
2017-11-15 13:51:25 +01:00
Matthew Honnibal
b797dca977
Merge branch 'master' of https://github.com/explosion/spaCy
2017-11-15 13:11:43 +01:00
ines
c9d72de0fb
Add dummy serialization methods for Japanese and missing lang getter ( resolves #1557 )
2017-11-15 12:44:02 +01:00
Matthew Honnibal
d274d3a3b9
Let beam forward use minibatches
2017-11-15 00:51:42 +01:00
Matthew Honnibal
855872f872
Remove state hashing
2017-11-14 23:36:46 +01:00
Roman Domrachev
3e21680814
Use safer method to get string without hit
2017-11-14 22:58:46 +03:00
Roman Domrachev
a33d5a068d
Try to hold original data instead of restoring it
2017-11-14 22:40:03 +03:00
Roman Domrachev
91e2fa6561
Clean all caches
2017-11-14 21:15:04 +03:00
Roman Domrachev
4e378dc4a4
Remove all obsolete code and test only initial problem
2017-11-14 20:45:04 +03:00
Roman
47ce2347b0
Create test that fails when actual cleanup is triggered
2017-11-14 20:28:13 +03:00
Roman
caae77f72d
Update strings.pyx
2017-11-14 19:44:40 +03:00
Roman Domrachev
3d247d2bb8
Get back previous testcase
2017-11-14 18:01:37 +03:00
Roman Domrachev
870defa815
Swap keys in proper place
...
Remove unnecessary clear of the hits
2017-11-14 17:56:30 +03:00
Roman Domrachev
86ca434c93
Merge github.com:explosion/spaCy
2017-11-14 17:46:22 +03:00
Roman Domrachev
a2745b0e84
StringStore now actually cleaned
...
Do not lose docs in ref tracking
2017-11-14 17:45:50 +03:00
Matthew Honnibal
2512ea9eeb
Fix memory leak in beam parser
2017-11-14 02:11:40 +01:00
Matthew Honnibal
86ddf692a1
Fix bug in limit calculation on dev data
2017-11-14 01:37:10 +01:00
Ines Montani
ea6c85c67a
Merge pull request #1566 from MathiasDesch/master ( resolves #1248 )
...
Add exceptions to tokenizer and norm
2017-11-13 19:05:22 +01:00
Matthew Honnibal
1b348389bb
Merge branch 'master' of https://github.com/explosion/spaCy
2017-11-13 18:18:48 +01:00
Matthew Honnibal
ca73d0d8fe
Cleanup states after beam parsing, explicitly
2017-11-13 18:18:26 +01:00
Matthew Honnibal
63ef9a2e73
Remove __dealloc__ from ParserBeam
2017-11-13 18:18:08 +01:00
Mathias Deschamps
c0691b2ab4
Add tokenizer exceptions for ing verbs
...
Extend list of tokenizing exceptions introduced in 123810b
2017-11-13 17:46:05 +01:00
Mathias Deschamps
288298ead9
Add norm exception for ing verbs
...
Some ing verbs are sometimes written in or in'. Make the NORM form correct
2017-11-13 17:46:05 +01:00
Abhinav Sharma
59f5740ede
improved upon the list of included stop_words
2017-11-13 17:13:49 +05:30
Matthew Honnibal
6e641f46d4
Create a preprocess function that gets bigrams
2017-11-12 00:43:41 +01:00
Matthew Honnibal
c9251d79e3
Edit comment
2017-11-11 18:38:32 +01:00
Matthew Honnibal
dd1678eab3
Edit comment
2017-11-11 18:37:08 +01:00
Roman Domrachev
ee60a52ee7
Fix test imports and last batch cleanup
2017-11-11 11:32:16 +03:00
Roman Domrachev
4a6b094e09
Remove unused import
2017-11-11 03:13:05 +03:00
Roman Domrachev
3c600adf23
Try to fix StringStore clean up (see #1506 )
2017-11-11 03:11:27 +03:00
ines
ee97fd3cb4
Add regression test for #1547
2017-11-11 00:14:03 +01:00
ines
2df27db671
Add unicode declaration
2017-11-11 00:13:56 +01:00
ines
35653bef3a
Add missing import ( fixes #1546 )
2017-11-10 19:05:18 +01:00
ines
4c5d2c80d5
Re-add python -m to commands, too brittle :( (see #1536 )
2017-11-10 02:30:55 +01:00
ines
123810b6de
Add "lovin'" to tokenizer exceptions (see #1248 )
2017-11-09 17:09:30 +01:00
ines
1c218397f6
Ensure path in Doc.to_disk/from_disk (resolves #1521)
...
Also add Doc serialization tests with both Path and string path options
2017-11-09 02:29:03 +01:00
Matthew Honnibal
49fd5a646f
Set version for 2.0.2 release
2017-11-08 22:39:39 +01:00
Matthew Honnibal
fba2dbddf7
Increment version
2017-11-08 22:19:08 +01:00
Matthew Honnibal
a5ea0fdf5a
Fix #1518 : vocab.vectors.resize() didn't work
2017-11-08 22:18:37 +01:00
Matthew Honnibal
de45702bbe
Strip dev suffixes from version for compatibility check
2017-11-08 18:40:21 +01:00
Matthew Honnibal
51639214a1
Merge branch 'master' of https://github.com/explosion/spaCy
2017-11-08 18:04:33 +01:00
Matthew Honnibal
a2f980de4e
Exclude .devN versioning from compatibility check
2017-11-08 18:03:52 +01:00
Daniel Hershcovich
d7ae54ff44
Fix typo in message
2017-11-08 16:06:28 +02:00
Matthew Honnibal
4194bc5744
Xfail flakey serialization test
2017-11-08 13:55:13 +01:00
Matthew Honnibal
d5537e5516
Work on Windows test failure
2017-11-08 13:25:18 +01:00
Matthew Honnibal
c27c82d5f9
Fix serialization
2017-11-08 13:08:48 +01:00
Matthew Honnibal
1d5599cd28
Fix dtype
2017-11-08 12:18:32 +01:00
Matthew Honnibal
fa7fdd0d9b
Merge branch 'master' of https://github.com/explosion/spaCy
2017-11-08 12:11:31 +01:00
Matthew Honnibal
072ff38a01
Try to fix python3.5 serialization
2017-11-08 12:10:49 +01:00
Ines Montani
3a0f34d567
Merge pull request #1509 from abhi18av/patch-1
...
Create examples.py for Hindi language
2017-11-08 11:37:19 +01:00
Ines Montani
42b241ccd0
Update language code in usage example in comment
2017-11-08 11:36:38 +01:00
Matthew Honnibal
e262e8d942
Increment version to v2.0.2.dev0
2017-11-08 11:25:47 +01:00
Matthew Honnibal
a8b592783b
Make a dtype more specific, to fix a windows build
2017-11-08 11:24:35 +01:00
Abhinav Sharma
84edade82d
Create examples.py
...
Populated the file with the translations of English example sentences
2017-11-08 13:23:08 +05:30
Matthew Honnibal
d725aee4e2
Increment version to 2.0.1
2017-11-08 02:14:47 +01:00
Matthew Honnibal
8d6f68f1df
Increment version
2017-11-08 01:12:34 +01:00
ines
bcf42b8846
Fix typo
2017-11-08 01:06:37 +01:00
Matthew Honnibal
bbd2a3dee1
Fix title in about.py
2017-11-07 14:02:58 +01:00
Matthew Honnibal
4efaf9306c
Set version to spacy-nightly rc2
2017-11-07 13:27:26 +01:00
Matthew Honnibal
bf1ec2965f
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-07 13:20:29 +01:00
Matthew Honnibal
726f689da4
Fix missing import
2017-11-07 13:20:12 +01:00
ines
834f9c1aab
Update about.py
2017-11-07 13:11:33 +01:00
ines
a4662a31a9
Move model package templates to cli.package and update docs
2017-11-07 12:15:35 +01:00
ines
a09c096d3c
Get docs ready for v2.0.0
2017-11-07 12:00:43 +01:00
Matthew Honnibal
9a88e66103
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-07 02:00:06 +01:00
Matthew Honnibal
174abe4677
Increment to 2.0.0rc1
2017-11-07 01:59:46 +01:00
ines
42a0fbf291
Fix textcat simple train example
2017-11-07 01:25:54 +01:00
ines
8fb48b9b91
Update and document new util functions
2017-11-07 00:22:43 +01:00
Matthew Honnibal
1cab703bba
Move minibatch function to util
2017-11-06 23:45:36 +01:00
ines
5f43953536
Move test
2017-11-06 23:14:10 +01:00
Matthew Honnibal
dd90fe09f5
Remove extraneous label from textcat class
2017-11-06 22:09:02 +01:00
Matthew Honnibal
45e0617e61
Allow Language.update to take unicode text and dict objects
2017-11-06 22:07:38 +01:00
Matthew Honnibal
1831dbd065
Add test of simple textcat workflow
2017-11-06 22:04:29 +01:00
Matthew Honnibal
ffb9101f3f
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-06 19:20:41 +01:00
Matthew Honnibal
8fea512ac8
Don't set tensor in textcat
2017-11-06 19:20:14 +01:00
ines
acb9bdb852
Fix PRON_LEMMA imports
2017-11-06 17:41:53 +01:00
Matthew Honnibal
7d46793dd7
Add PRON_LEMMA to spacy.symbols
2017-11-06 17:38:25 +01:00
Matthew Honnibal
2f7e9f390d
Make test less flakey
2017-11-06 17:34:50 +01:00
Matthew Honnibal
407b08017e
Make test less flakey
2017-11-06 17:31:40 +01:00
Matthew Honnibal
102f797933
Fix lemma ordering in test
2017-11-06 17:02:17 +01:00
Matthew Honnibal
75e1618ec3
Fix lemma clobbering
2017-11-06 16:56:19 +01:00
Matthew Honnibal
6fdffd7246
Merge pull request #1497 from explosion/feature/improve-optimizer-handling
...
💫 Improve optimizer handling
2017-11-06 16:41:15 +01:00
Matthew Honnibal
8e6795437b
Set release=True
2017-11-06 16:39:32 +01:00
Matthew Honnibal
5c85bf3791
Fix missing import
2017-11-06 15:06:27 +01:00
Matthew Honnibal
25859dbb48
Return optimizer from begin_training, creating if necessary
2017-11-06 14:26:49 +01:00
Matthew Honnibal
465adfee94
Remove unused resume_training method, and pass optimizer through
2017-11-06 14:26:00 +01:00
Matthew Honnibal
13336a6197
Fix Adam import
2017-11-06 14:25:37 +01:00
Matthew Honnibal
2eb11d60f2
Add function create_default_optimizer to spacy._ml
2017-11-06 14:11:59 +01:00
Matthew Honnibal
31babe3c3f
Fix non-clobbering lemmatization
2017-11-06 12:36:05 +01:00
Matthew Honnibal
63c6ae4191
Fix lemmatizer test
2017-11-06 11:57:06 +01:00
Matthew Honnibal
a86a0181b5
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-05 22:19:10 +01:00
Matthew Honnibal
134d3b8143
Fix morphology
2017-11-05 22:18:22 +01:00
ines
08d1cf850a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-05 21:41:58 +01:00
ines
baa231745c
Fix Dutch tag map
2017-11-05 21:41:50 +01:00
Matthew Honnibal
46e62ad747
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-05 19:40:00 +01:00
Matthew Honnibal
bb25cb0f76
Avoid clobbering preset lemmas
2017-11-05 19:39:38 +01:00
ines
507ecb67af
Fix Spanish tag map
2017-11-05 19:23:34 +01:00
Matthew Honnibal
320008352b
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-05 18:46:15 +01:00
Matthew Honnibal
38109a0e4a
Register SentenceSegmenter in Language.factories
2017-11-05 18:45:57 +01:00
ines
975e1042ff
Fix Italian tag map
2017-11-05 18:34:09 +01:00
ines
6b2d6e4937
Fix Portuguese tag map
2017-11-05 18:31:00 +01:00
ines
fa2687fded
Fix Dutch tag map
2017-11-05 17:57:59 +01:00
ines
fb8990d916
Fix Spanish tag map
2017-11-05 17:48:46 +01:00
ines
9d13288f73
Fix French tag map
2017-11-05 17:47:59 +01:00
ines
54579805c5
Fix French tag map
2017-11-05 17:44:05 +01:00
Matthew Honnibal
2b35bb76ad
Fix tensorizer on GPU
2017-11-05 15:34:40 +01:00
Matthew Honnibal
6e5181bbaa
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-05 15:33:56 +01:00
Matthew Honnibal
6f438b17c1
Increment version to v2.0.0a19
2017-11-05 14:43:36 +01:00
Matthew Honnibal
225cc249c9
Pass string path to numpy, to fix #1479
2017-11-05 14:42:46 +01:00
Matthew Honnibal
00435d8f0c
Add extra beam parsing test
2017-11-05 14:39:57 +01:00
Matthew Honnibal
e777ea25bb
Merge pull request #1492 from uwol/develop
...
TextCategorizer return parameter fix
2017-11-05 14:13:04 +01:00
Matthew Honnibal
0d4bd6414e
Fix Italian tag map
2017-11-05 14:11:03 +01:00
ines
ef597622a6
Add Portuguese tag map
2017-11-05 13:58:34 +01:00
ines
793c62dfda
Add Dutch tag map
2017-11-05 13:48:07 +01:00
ines
f7485a09c8
Fix Italian tag map
2017-11-05 13:12:58 +01:00
uwol
a2162b8908
tensorizer return parameter fix
2017-11-05 12:25:10 +01:00
ines
0a27afbf86
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-04 23:32:52 +01:00
ines
3cef901834
Add tag map for French and Italian
2017-11-04 23:32:51 +01:00
Matthew Honnibal
cfb83c231c
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-04 23:08:19 +01:00
Matthew Honnibal
d185927998
Undo harmful pickling hacks on Language class
2017-11-04 23:07:03 +01:00
ines
6c15aafebd
Fix formatting
2017-11-04 23:07:02 +01:00
Matthew Honnibal
3ca16ddbd4
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-04 00:25:02 +01:00
Matthew Honnibal
e4ec4be948
Fix parser test
2017-11-04 00:23:45 +01:00
Matthew Honnibal
98c29b7912
Add padding vector in parser, to make gradient more correct
2017-11-04 00:23:23 +01:00
ines
5e7d98f72a
Remove test for #1491
2017-11-03 22:10:57 +01:00
ines
718f1c50fb
Add regression test for #1491
2017-11-03 21:11:20 +01:00
Matthew Honnibal
144a93c2a5
Back-off to tensor for similarity if no vectors
2017-11-03 20:56:33 +01:00
Matthew Honnibal
1e9634691a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-03 20:21:15 +01:00
Matthew Honnibal
13c8881d2f
Expose parser's tok2vec model component
2017-11-03 20:20:59 +01:00
Matthew Honnibal
17c63906f9
Update tensorizer component
2017-11-03 20:20:26 +01:00
Matthew Honnibal
2bf21cbe29
Update model after optimising it instead of waiting
2017-11-03 20:20:01 +01:00
Matthew Honnibal
d6e831bf89
Fix lemmatizer tests
2017-11-03 19:46:34 +01:00
ines
eef930c73e
Assert instead of print
2017-11-03 18:50:57 +01:00
ines
f0986df94b
Add test for #1488 (passes on v2.0.0a18?)
2017-11-03 14:44:36 +01:00
Matthew Honnibal
711278b667
Make test less flakey
2017-11-03 14:36:08 +01:00
Matthew Honnibal
7fea845374
Remove print statement
2017-11-03 14:04:51 +01:00
Matthew Honnibal
0a534ae96a
Fix test for backprop d_pad
2017-11-03 14:04:16 +01:00
Matthew Honnibal
33bd2428db
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-03 13:29:56 +01:00
Matthew Honnibal
6681058abd
Fix tensor extending in tagger
2017-11-03 13:29:36 +01:00
Matthew Honnibal
bd2cbdfa85
Make Morphology not fail on unknown tags
2017-11-03 13:29:09 +01:00
Matthew Honnibal
c9b118a7e9
Set softmax attr in tagger model
2017-11-03 11:22:01 +01:00
Matthew Honnibal
a5b05f85f0
Set Doc.tensor attribute in parser
2017-11-03 11:21:00 +01:00
Matthew Honnibal
62ed58935a
Add Doc.extend_tensor() method
2017-11-03 11:20:31 +01:00
Matthew Honnibal
d6fc39c8a6
Set Doc.tensor from Tagger
2017-11-03 11:20:05 +01:00
Matthew Honnibal
b3264aa5f0
Expose the softmax layer in the tagger model, to allow setting tensors
2017-11-03 11:19:51 +01:00
Matthew Honnibal
c2bbf076a4
Add document length cap for training
2017-11-03 01:54:54 +01:00
Matthew Honnibal
6771780d3f
Fix backprop of padding variable
2017-11-03 01:54:34 +01:00
Matthew Honnibal
54a716f2ec
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-03 00:55:20 +01:00
Matthew Honnibal
260e6ee3fb
Improve efficiency of backprop of padding variable
2017-11-03 00:49:11 +01:00
Matthew Honnibal
a22f96c3f1
Add test for backpropagating padding
2017-11-03 00:48:54 +01:00
ines
9baab241b4
Add skeleton language data for Turkish
2017-11-02 16:32:24 +01:00
ines
c6fea3e5f6
Add Romanian and Croatian skeletons (experimental)
...
Add language data templates to make it easier for others to contribute to the language support
2017-11-01 23:04:28 +01:00
ines
18c859500b
Add missing imports
2017-11-01 23:02:51 +01:00
ines
819e30a26e
Tidy up tokenizer exceptions
2017-11-01 23:02:45 +01:00
ines
3af281a334
Update test model name
2017-11-01 23:02:00 +01:00
Matthew Honnibal
b30dd36179
Allow Tagger.add_label() before training
2017-11-01 21:49:24 +01:00
Matthew Honnibal
eca41f0cf6
Fix filename conversion for conllu
2017-11-01 21:26:49 +01:00
Matthew Honnibal
e237472cdc
Fix tag and filename conversion for conllu
2017-11-01 21:25:33 +01:00
Matthew Honnibal
b84d99b281
Revert tagger.add_label() changes, to fix model
2017-11-01 21:10:45 +01:00
Matthew Honnibal
f5855e539b
Fix tagger model loading
2017-11-01 20:42:36 +01:00
Matthew Honnibal
624644adfe
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 20:26:41 +01:00
ines
5f661a1b3a
Remove tensorizer from pre-set pipe_names
2017-11-01 19:48:33 +01:00
Matthew Honnibal
190522efd3
Fix tagger when some tags aren't in Morphology
2017-11-01 19:27:49 +01:00
Matthew Honnibal
e85e31cfbd
Fix backprop of d_pad
2017-11-01 19:27:26 +01:00
Matthew Honnibal
759cc79185
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 19:00:19 +01:00
Matthew Honnibal
1ae40b50b4
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 17:07:02 +01:00
Matthew Honnibal
7ae1aacdb8
Fix add_label methods
2017-11-01 17:06:43 +01:00
ines
8c2260e18c
Move span tests to /doc
2017-11-01 16:56:35 +01:00
Matthew Honnibal
2ef7b59eb0
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 16:51:41 +01:00
ines
1d1f91a041
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 16:49:44 +01:00
ines
9659391944
Update deprecated methods and add warnings
2017-11-01 16:49:42 +01:00
ines
260cb37224
Catch deprecation warning
2017-11-01 16:49:18 +01:00
ines
5914faafbb
Fix .merge tests to not use deprecated API
2017-11-01 16:49:11 +01:00
ines
705a4e3e4a
Fix formatting
2017-11-01 16:44:08 +01:00
Matthew Honnibal
d17a12c71d
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 16:38:26 +01:00
Matthew Honnibal
9f9439667b
Don't create low-data text classifier if no vectors
2017-11-01 16:34:09 +01:00
Matthew Honnibal
e7a9174877
Add add_label methods to Tagger and TextCategorizer
2017-11-01 16:32:44 +01:00
ines
39e0586192
Add deprecated helper
...
Uses warning to show DeprecationWarning and custom stack trace
2017-11-01 16:32:36 +01:00
Matthew Honnibal
a7bf38bf31
Remove misleading comment on util.get_cuda_stream()
2017-11-01 13:57:25 +01:00
Matthew Honnibal
273e96b63f
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 13:27:35 +01:00
Matthew Honnibal
9e0ebee81c
Add Token.is_sent_start property, so can deprecate Token.sent_start
2017-11-01 13:27:14 +01:00
Matthew Honnibal
7e7116cdf7
Fix Doc.to_array when only one string attr provided
2017-11-01 13:26:43 +01:00
Matthew Honnibal
301fb2bb60
Implement Span.n_lefts and Span.n_rights
2017-11-01 13:25:12 +01:00
Matthew Honnibal
c047498f87
Fix vectors test
2017-11-01 13:24:47 +01:00
ines
9a5e7c6fe2
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 13:14:45 +01:00
ines
bfe17b7df1
Fix begin_training if get_gold_tuples is None
2017-11-01 13:14:31 +01:00
ines
affd3404ab
Remove old model command (now "vocab")
2017-11-01 13:14:03 +01:00
Matthew Honnibal
fdb4b8e456
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 02:07:17 +01:00
Matthew Honnibal
c48dd0e1d3
Fix vector pruning
2017-11-01 02:06:58 +01:00
ines
37e62ab0e2
Update vector meta in meta.json
2017-11-01 01:25:09 +01:00
ines
96b4aef0bf
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 01:10:53 +01:00
Matthew Honnibal
86eba61fae
Fix token.vector when vectors are missing
2017-11-01 00:47:35 +01:00
ines
5683fd65ed
Update docstrings
2017-11-01 00:42:39 +01:00
Matthew Honnibal
44bce8e53f
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 00:35:16 +01:00
Matthew Honnibal
c16310d156
Update vectors with find method
2017-11-01 00:34:55 +01:00
Ines Montani
d11659463b
Merge pull request #1152 from jimregan/develop-irish
...
[WIP] attempt a port from #1147
2017-11-01 00:23:43 +01:00
ines
2ad2f09d12
Update docstrings and simplify most_similar
2017-11-01 00:18:08 +01:00
Jim O'Regan
08b0bfd153
merge
2017-10-31 22:55:59 +00:00
Jim O'Regan
00ecfa5417
Ó, not O
2017-10-31 22:54:42 +00:00
ines
ba2e6c8c6f
Update docstrings and formatting
2017-10-31 23:23:34 +01:00
Matthew Honnibal
0de8d213a3
Merge pull request #1475 from explosion/feature/sm-vectors
...
Improve and simplify Vectors class
2017-10-31 22:59:50 +01:00
Ines Montani
25b1d6cd91
Fix syntax error
2017-10-31 22:36:03 +01:00
Matthew Honnibal
92dc127569
Fix test for Python 3
2017-10-31 22:21:55 +01:00
Jim O'Regan
fe4b10346a
replace example sentence until I get around to adding a punctuation.py
2017-10-31 20:24:53 +00:00
Matthew Honnibal
c5799ecc7b
Remove print statement
2017-10-31 21:12:33 +01:00
ines
7e424a1804
Don't copy exception dicts if not necessary and tidy up
2017-10-31 21:05:29 +01:00
Matthew Honnibal
c390f2d745
Make it easier to pass explicit no-pruning to vocab
2017-10-31 20:14:47 +01:00
Ines Montani
06c25a8882
Remove comma that caused list to wrap in tuple!
...
Also removed extra dict wrappings for performance (we used to have them in there, but they should only really exist if copying the dict is absolutely necessary)
2017-10-31 20:13:16 +01:00
Matthew Honnibal
d90a22afe6
Fix loading previous vectors models
2017-10-31 19:58:35 +01:00
Ines Montani
147448b65b
Add missing symbols
2017-10-31 19:34:45 +01:00
Matthew Honnibal
997a61557a
Add vectors.n_keys property
2017-10-31 19:30:52 +01:00
Matthew Honnibal
8075726838
Restore vector usage in models
2017-10-31 19:21:17 +01:00
Matthew Honnibal
3659a807b0
Remove vector pruning arg from train CLI
2017-10-31 19:21:05 +01:00
Ines Montani
9b0de9fb43
Fix import of symbols (now nested one level lower)
2017-10-31 19:17:58 +01:00
Matthew Honnibal
59203a2e8a
Move vector pruning command into spacy vocab cli tool
2017-10-31 19:10:01 +01:00
Matthew Honnibal
77d8f5de9a
Revise and simplify Vectors class
2017-10-31 18:25:08 +01:00
Jim O'Regan
d4a8160c36
change quotes
2017-10-31 15:15:44 +00:00
Jim O'Regan
34ca59691b
no idea what is wrong here
2017-10-31 14:50:13 +00:00
Jim O'Regan
41dd29e48e
merge
2017-10-31 14:07:45 +00:00
Matthew Honnibal
cb5217012f
Fix vector remapping
2017-10-31 11:40:46 +01:00
Matthew Honnibal
9c11ee4a1c
WIP on vectors fixes
2017-10-31 11:22:56 +01:00
Matthew Honnibal
ce876c551e
Fix GPU usage
2017-10-31 02:33:34 +01:00
Matthew Honnibal
7698903617
Fix GPU usage
2017-10-31 02:33:16 +01:00
Matthew Honnibal
368fdb389a
WIP on refactoring and fixing vectors
2017-10-31 02:00:26 +01:00
Matthew Honnibal
4e3006cec7
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-30 19:44:58 +01:00
Matthew Honnibal
4112a991ec
Fix vector pruning
2017-10-30 19:44:40 +01:00
ines
ec657c1ddc
Update vocab docs and document Vocab.prune_vectors
2017-10-30 19:35:41 +01:00
ines
803e41bc66
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-30 18:39:51 +01:00