Matthew Honnibal
0c7fab4443
Set version to 2.0.11
2018-04-04 11:19:11 +02:00
Matthew Honnibal
a350be0601
Fix vector-name loading fix
2018-04-04 01:31:25 +02:00
Matthew Honnibal
21047bde52
Fix syntax error in italian lemmatizer
2018-04-03 23:13:22 +02:00
Matthew Honnibal
81f4005f3d
Fix loading models with pretrained vectors
2018-04-03 23:11:48 +02:00
ines
3463ded7cf
Check if spaCy has compiled correctly and show error message
2018-04-03 22:18:47 +02:00
Matthew Honnibal
96b612873b
Add hyper-parameter to control whether parser makes a beam update
2018-04-03 22:02:56 +02:00
ines
e5f47cd82d
Update errors
2018-04-03 21:40:29 +02:00
Matthew Honnibal
f7e6313b43
Increment version to v2.0.11.dev0
2018-04-03 20:58:47 +02:00
ines
10462816bc
Fix tests for Python 2
2018-04-03 18:51:31 +02:00
ines
62b4b527d7
Don't raise error if set_extension has getter and setter (closes #2177)
...
Improve error messages, raise error if setter is specified without a getter and compare against _unset to allow default=None. Also add more tests.
2018-04-03 18:30:17 +02:00
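The validation rules in the commit above can be sketched as follows. This assumes they amount to two checks: exactly one of `default`, `method` or `getter` must be given, and a `setter` requires a `getter`. The function name `validate_extension_args` is hypothetical. The key detail is the module-level `_unset` sentinel, which lets an explicit `default=None` count as a real default.

```python
# Sentinel object: distinguishes "no default passed" from an explicit default=None.
_unset = object()


def validate_extension_args(default=_unset, method=None, getter=None, setter=None):
    """Hypothetical check mirroring the rules described in the commit message."""
    if setter is not None and getter is None:
        raise ValueError("A setter was specified without a getter.")
    # Compare against the sentinel, not None, so that default=None is allowed.
    n_defined = sum([default is not _unset, method is not None, getter is not None])
    if n_defined != 1:
        raise ValueError(
            "Specify exactly one of 'default', 'method' or 'getter' "
            "(a 'setter' may accompany a 'getter')."
        )
    return True
```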
ines
ee3082ad29
Fix whitespace
2018-04-03 18:29:53 +02:00
Ines Montani
3141e04822
💫 New system for error messages and warnings (#2163)
...
* Add spacy.errors module
* Update deprecation and user warnings
* Replace errors and asserts with new error message system
* Remove redundant asserts
* Fix whitespace
* Add messages for print/util.prints statements
* Fix typo
* Fix typos
* Move CLI messages to spacy.cli._messages
* Add decorator to display error code with message
An implementation like this is nice because it only modifies the string when it's retrieved from the containing class – so we don't have to worry about manipulating tracebacks etc.
* Remove unused link in spacy.about
* Update errors for invalid pipeline components
* Improve error for unknown factories
* Add displaCy warnings
* Update formatting consistency
* Move error message to spacy.errors
* Update errors and check if doc returned by component is None
2018-04-03 15:50:31 +02:00
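The error-code decorator described above can be sketched as a wrapper around the class of message strings. The wrapper builds the `[code] message` string only when an attribute is looked up, so exceptions and tracebacks are never touched. The codes and message texts below are hypothetical placeholders, not the actual entries in `spacy.errors`.

```python
def add_codes(err_cls):
    """Wrap a class of message strings so each lookup is prefixed with its code."""

    class ErrorsWithCodes(object):
        def __getattribute__(self, code):
            # Fetch the raw message from the original class only when accessed.
            msg = getattr(err_cls, code)
            return "[{code}] {msg}".format(code=code, msg=msg)

    return ErrorsWithCodes()


@add_codes
class Errors(object):
    # Hypothetical messages for illustration only.
    E001 = "No component '{name}' found in pipeline."
    E002 = "Can't find factory for '{name}'."


print(Errors.E001.format(name="tagger"))
# prints: [E001] No component 'tagger' found in pipeline.
```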
Matthew Honnibal
abf8b16d71
Add doc.retokenize() context manager (#2172)
...
This patch takes a step towards #1487 by introducing the
doc.retokenize() context manager, to handle merging spans, and soon
splitting tokens.
The idea is to do merging and splitting like this:
    with doc.retokenize() as retokenizer:
        for start, end, label in matches:
            retokenizer.merge(doc[start : end], attrs={'ent_type': label})
The retokenizer accumulates the merge requests, and applies them
together at the end of the block. This will allow retokenization to be
more efficient, and much less error prone.
A retokenizer.split() function will then be added, to handle splitting a
single token into multiple tokens. These methods take `Span` and `Token`
objects; if the user wants to go directly from offsets, they can append
to the .merges and .splits lists on the retokenizer.
The doc.merge() method's behaviour remains unchanged, so this patch
should be 100% backwards compatible (modulo bugs). Internally,
doc.merge() fixes up the arguments (to handle the various deprecated styles),
opens the retokenizer, and makes the single merge.
We can later start issuing deprecation warnings on direct calls to doc.merge(),
to migrate users to the retokenize context manager.
2018-04-03 14:10:35 +02:00
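The accumulate-then-apply idea behind `doc.retokenize()` can be illustrated with a toy, self-contained sketch. `ToyDoc` and `ToyRetokenizer` are hypothetical stand-ins. They work on plain token offsets (similar to appending to `.merges` directly) rather than spaCy's `Span` objects and `attrs`, and they assume the requested merges don't overlap.

```python
from contextlib import contextmanager


class ToyRetokenizer(object):
    """Hypothetical retokenizer that records merges and applies them later."""

    def __init__(self):
        self.merges = []  # (start, end) offsets into the original token list

    def merge(self, start, end):
        # Only record the request; nothing changes until the block exits.
        self.merges.append((start, end))

    def apply(self, doc):
        # Apply right-to-left so earlier offsets remain valid after each merge.
        # Assumes the requested spans don't overlap.
        for start, end in sorted(self.merges, reverse=True):
            doc.words[start:end] = [" ".join(doc.words[start:end])]


class ToyDoc(object):
    """Hypothetical stand-in for a Doc: just a list of token strings."""

    def __init__(self, words):
        self.words = list(words)

    @contextmanager
    def retokenize(self):
        retokenizer = ToyRetokenizer()
        yield retokenizer
        # Reached only if the with-block finished without an exception.
        retokenizer.apply(self)


doc = ToyDoc(["I", "live", "in", "New", "York", "City", "and", "San", "Francisco"])
with doc.retokenize() as retokenizer:
    for start, end in [(3, 6), (7, 9)]:
        retokenizer.merge(start, end)
print(doc.words)
# ['I', 'live', 'in', 'New York City', 'and', 'San Francisco']
```

Because all offsets refer to the original tokens, the caller never has to adjust them between merges. If the `with` block raises, nothing is applied and the document is left untouched.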
Suraj Rajan
1cdbb7c97c
[2032] - Changed Python set to C++ STL set (#2170)
...
Changed Python set to C++ STL set #2032
## Description
Changed the Python set to a C++ STL set. A `std::set` keeps its elements sorted, so its minimum is available in constant time via `begin()`, whereas finding the minimum of a Python set takes linear time. Operations such as `find`, `count`, `insert` and `erase` run in logarithmic time, which makes the C++ set a better option for managing vectors.
Reference: http://www.cplusplus.com/reference/set/set/
### Types of change
Enhancement for `Vectors` for faster initialisation of word vectors (fastText)
2018-03-31 13:28:25 +02:00
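Python's standard library has no ordered set, so this sketch uses a min-heap (`heapq`) as an analogue for repeatedly taking the smallest element. It assumes the set is used to track unused vector rows, with the smallest free row claimed when a vector is added. `FreeRows` is a hypothetical class, not spaCy's `Vectors` code. Unlike `std::set`, a heap cannot erase an arbitrary element in logarithmic time.

```python
import heapq


class FreeRows(object):
    """Hypothetical tracker of unused rows, always handing out the smallest one.

    Peeking at the smallest row is O(1) and claiming or releasing a row is
    O(log n), instead of the O(n) scan that min() performs on a Python set.
    """

    def __init__(self, n_rows):
        self._heap = list(range(n_rows))  # a sorted list is already a valid heap
        self._free = set(self._heap)      # O(1) membership checks

    def smallest(self):
        return self._heap[0]

    def claim(self):
        row = heapq.heappop(self._heap)
        self._free.discard(row)
        return row

    def release(self, row):
        # Ignore rows that are already free, so the heap holds no duplicates.
        if row not in self._free:
            heapq.heappush(self._heap, row)
            self._free.add(row)
```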
Matthew Honnibal
f3b7c5e537
Fix syntax error
2018-03-29 21:50:32 +02:00
Matthew Honnibal
23afa6429f
Add input length error, to address #1826
2018-03-29 21:45:26 +02:00
Ines Montani
a609a1ca29
Merge pull request #2152 from explosion/feature/tidy-up-dependencies
...
💫 Tidy up dependencies
2018-03-29 14:35:09 +02:00
Viet Trung Tran
ea2af94cd9
Add support for Vietnamese in spaCy by leveraging Pyvi, an external Vietnamese tokenizer ( #2155 )
...
* support for Vietnamese
* Contributor Agreement for adding Vietnamese support to spaCy
2018-03-29 12:19:51 +02:00
ines
e6979bdbbd
Merge branch 'feature/tidy-up-dependencies' of https://github.com/explosion/spaCy into feature/tidy-up-dependencies
2018-03-29 00:19:37 +02:00
ines
83146458a2
Fix urllib for Python 3
2018-03-29 00:19:33 +02:00
Matthew Honnibal
8308bbc617
Get msgpack and msgpack_numpy via Thinc, to avoid potential version conflicts
2018-03-29 00:14:55 +02:00
Matthew Honnibal
b5098079d8
Fix error on urllib
2018-03-29 00:08:16 +02:00
Ines Montani
0de599b16b
Merge pull request #2159 from explosion/feature/fix-merged-entity-iob (resolves #1554, resolves #1752)
...
💫 Fix token.ent_iob after doc.merge(), and ensure consistency in doc.ents
2018-03-28 23:10:00 +02:00
Ines Montani
98e9cda677
Merge pull request #2158 from explosion/feature/fix-multiple-vectors (resolves #1660)
...
💫 Fix loading of multiple vector models
2018-03-28 23:08:24 +02:00
Matthew Honnibal
a7c5ae2beb
Avoid forcing a name on empty vectors, and remove print statement
2018-03-28 21:08:58 +02:00
ines
3eb67bbe4b
Allow entity types with dashes (resolves #1967)
2018-03-28 20:51:26 +02:00
Matthew Honnibal
cf5fcf0546
Update serialization test
2018-03-28 20:12:53 +02:00
Matthew Honnibal
4555e3e251
Don't assume pretrained_vectors cfg set in build_tagger
2018-03-28 20:12:45 +02:00
Matthew Honnibal
0b375d50c8
Fix ent_iob tags in doc.merge to avoid inconsistent sequences
2018-03-28 18:39:03 +02:00
Matthew Honnibal
95fa89c4b8
Update doc.ents test
2018-03-28 18:39:03 +02:00
Matthew Honnibal
e807f88410
Resolve merge when cherry-picking ent iob patches from develop
2018-03-28 18:38:13 +02:00
Matthew Honnibal
99fbc7db33
Improve error message when entity sequence is inconsistent
2018-03-28 18:36:53 +02:00
Matthew Honnibal
cbd2794be0
Add test for ent_iob during span merge
2018-03-28 18:36:53 +02:00
Matthew Honnibal
f8dd905a24
Warn and fall back if vectors have no name
2018-03-28 18:24:53 +02:00
Matthew Honnibal
fd9e259414
Add test for #1660
2018-03-28 18:22:51 +02:00
Matthew Honnibal
bc4afa9881
Remove print statement
2018-03-28 17:48:37 +02:00
Matthew Honnibal
79dc241caa
Set pretrained_vectors in parser cfg
2018-03-28 17:35:07 +02:00
Matthew Honnibal
17c3e7efa2
Add message noting vectors
2018-03-28 16:33:43 +02:00
Matthew Honnibal
9bf6e93b3e
Set pretrained_vectors in begin_training
2018-03-28 16:32:41 +02:00
Matthew Honnibal
95a9615221
Fix loading of multiple pre-trained vectors
...
This patch addresses #1660, which was caused by keying all pre-trained
vectors with the same ID when telling Thinc how to refer to them. This
meant that if multiple models were loaded that had pre-trained vectors,
errors or incorrect behaviour resulted.
The vectors class now includes a .name attribute, which defaults to:
{nlp.meta['lang']}_{nlp.meta['name']}.vectors
The vectors name is set in the cfg of the pipeline components under the
key pretrained_vectors. This replaces the previous cfg key
pretrained_dims.
In order to make existing models compatible with this change, we check
for the pretrained_dims key when loading models in from_disk and
from_bytes, and add the cfg key pretrained_vectors if we find it.
2018-03-28 16:02:59 +02:00
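This is a minimal sketch of the naming default and the backwards-compatibility check described above. The helper names `default_vectors_name` and `upgrade_cfg` are hypothetical, and the example model name `core_web_md` is illustrative. The sketch assumes the old `pretrained_dims` key is simply left in place.

```python
def default_vectors_name(meta):
    """Build the default vectors name from a model's meta, e.g. 'en_core_web_md.vectors'."""
    return "{}_{}.vectors".format(meta["lang"], meta["name"])


def upgrade_cfg(cfg, vectors_name):
    """Return a copy of a component cfg, adding 'pretrained_vectors' for older models."""
    cfg = dict(cfg)  # don't mutate the caller's dict
    if "pretrained_dims" in cfg and "pretrained_vectors" not in cfg:
        # Older models only stored the vector width; point them at the named vectors.
        cfg["pretrained_vectors"] = vectors_name
    return cfg
```

Because each model's vectors now get a distinct name, two loaded models no longer refer to their pre-trained vectors through the same key.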
ines
7fbc9e5874
Replace requests with urllib
2018-03-28 12:46:07 +02:00
ines
da1f200362
Add compat helpers for urllib
2018-03-28 12:45:53 +02:00
ines
ac88c72c9a
Fix ftfy workaround and remove old import
2018-03-28 12:14:28 +02:00
ines
ce6071ca89
Remove ftfy dependency and update docs
2018-03-28 12:09:42 +02:00
Matthew Honnibal
070b6c6495
Remove dependency on ftfy
2018-03-28 12:07:02 +02:00
ines
6d2c85f428
Drop six and related hacks as a dependency
2018-03-28 10:45:25 +02:00
ines
9e83513004
Add position of invalid token to error message
2018-03-27 23:56:59 +02:00
ines
11c4735ccf
Fix issue in Italian lemmatizer data (resolves #2050)
2018-03-27 23:55:22 +02:00
ines
693971dd8f
Improve error message if token text is empty string (see #2101)
2018-03-27 22:25:40 +02:00
ines
0c829e6605
Fix whitespace
2018-03-27 22:20:59 +02:00
Matthew Honnibal
d4680e4d83
Merge branch 'master' of https://github.com/explosion/spaCy
2018-03-27 13:36:37 +02:00
Matthew Honnibal
63a267b34d
Fix #2073: Token.set_extension not working
2018-03-27 13:36:20 +02:00
Ines Montani
68226109f4
Merge pull request #2142 from jimregan/polish-more-tokens
...
more exceptions
2018-03-24 19:06:44 +01:00
Matthew Honnibal
d566e673bf
Set version to v2.0.10
2018-03-24 18:09:03 +01:00
Matthew Honnibal
0d3bf0d4eb
Merge branch 'master' of https://github.com/explosion/spaCy
2018-03-24 17:31:49 +01:00
dejanmarich
ccd1c04c63
Update stop_words.py
...
Added more words
2018-03-24 17:31:24 +01:00
ines
f1446b0257
Port over Turkish changes
2018-03-24 17:31:07 +01:00
DuyguA
cd604878a4
quick typo fix
2018-03-24 17:26:35 +01:00
Matthew Honnibal
406548b976
Support .gz and .tar.gz files in spacy init-model
2018-03-24 17:18:32 +01:00
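The standard library can read plain, `.gz` and `.tar.gz` inputs through a single interface, as in this sketch. `iter_lines` is a hypothetical helper, not the code behind `spacy init-model`. It assumes a `.tar.gz` archive holds the data as its first regular file.

```python
import gzip
import io
import tarfile
from pathlib import Path


def iter_lines(path):
    """Yield text lines from a plain, .gz or .tar.gz file (hypothetical helper)."""
    path = Path(path)
    if path.name.endswith(".tar.gz"):
        with tarfile.open(str(path), "r:gz") as tar:
            # Assumption: the first regular file in the archive holds the data.
            member = next((m for m in tar.getmembers() if m.isfile()), None)
            if member is None:
                raise ValueError("No file found in archive: {}".format(path))
            with io.TextIOWrapper(tar.extractfile(member), encoding="utf8") as file_:
                for line in file_:
                    yield line
    elif path.suffix == ".gz":
        with gzip.open(str(path), "rt", encoding="utf8") as file_:
            for line in file_:
                yield line
    else:
        with path.open("r", encoding="utf8") as file_:
            for line in file_:
                yield line
```

Yielding lines instead of reading the whole file keeps memory use flat for large vector files. Each `with` block closes its file once iteration finishes.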
Jim O'Regan
efe037e8be
more exceptions
2018-03-24 00:05:27 +00:00
Matthew Honnibal
e3be3d65b3
Version as v2.0.10.dev0
2018-03-15 17:31:22 +01:00
ines
f3f8bfc367
Add built-in factories for merge_entities and merge_noun_chunks
...
Allows adding those components to the pipeline out-of-the-box if they're defined in a model's meta.json. Also allows usage as nlp.add_pipe(nlp.create_pipe('merge_entities')).
2018-03-15 17:16:54 +01:00
alldefector
f4e5904fc2
Fix Spanish noun_chunks failure caused by typo
2018-03-14 17:03:17 +01:00
Thomas Opsomer
fbf48b3f9f
lemma property to return hash instead of unicode
2018-03-14 17:03:00 +01:00
Matthew Honnibal
8cefc58abc
Fix Vectors pickling
2018-03-14 16:59:37 +01:00
Matthew Honnibal
307aefe131
Increment version to v2.0.9
2018-02-22 17:07:53 +01:00
Ines Montani
14e7e0f12a
Merge pull request #2000 from jimregan/polish-tag-map
...
Polish tag map
2018-02-18 19:05:58 +01:00
Jim O'Regan
664407de5d
missing PrepCase attribute
2018-02-18 14:46:12 +00:00
Jim O'Regan
95f0673fbc
fix typo/missing here too
2018-02-18 14:38:27 +00:00
Matthew Honnibal
cf0e320f2b
Add doc.is_sentenced attribute, re #1959
2018-02-18 14:16:55 +01:00
Matthew Honnibal
1e5aeb4eec
Merge pull request #1987 from thomasopsomer/span-sent
...
Make span.sent work when only manual / custom sbd
2018-02-18 14:05:37 +01:00
Matthew Honnibal
1cf774bdc1
Add output options return_matches and as_tuples to Matcher
2018-02-18 14:00:45 +01:00
Matthew Honnibal
dd9b0945af
Fix inconsistencies in the symbols table
2018-02-18 13:51:31 +01:00
Matthew Honnibal
66496ac8e1
Set version to v2.1.0.dev0
2018-02-18 13:48:39 +01:00
Matthew Honnibal
eb3040ce46
Merge pull request #1891 from fucking-signup/master
...
Fix issue #1889
2018-02-18 13:47:47 +01:00
ines
6bba1db4cc
Drop six and related hacks as a dependency
2018-02-18 13:29:56 +01:00
Matthew Honnibal
b30b09192a
Merge pull request #1665 from jimregan/animacy
...
typo in "inan", add "nhum"
2018-02-18 13:26:53 +01:00
Matthew Honnibal
1b3c98e01b
Set version to v2.0.8
2018-02-18 12:16:31 +01:00
Matthew Honnibal
f9f46e5a07
Revert matcher fixes from GregDubbin
2018-02-18 10:59:28 +01:00
Matthew Honnibal
86405e4ad1
Fix CLI for multitask objectives
2018-02-18 10:59:11 +01:00
Matthew Honnibal
a34749b2bf
Add multitask objectives options to train CLI
2018-02-17 22:03:54 +01:00
Matthew Honnibal
8f06903e09
Fix multitask objectives
2018-02-17 18:41:36 +01:00
Matthew Honnibal
d1246c95fb
Fix model loading when using multitask objectives
2018-02-17 18:11:36 +01:00
Matthew Honnibal
262d0a3148
Fix overwriting of lexical attributes when loading vectors during training
2018-02-17 18:11:11 +01:00
Matthew Honnibal
c0caf7cf27
Fix LANG symbol
2018-02-17 18:10:50 +01:00
Matthew Honnibal
0bf2f6be29
Add missing symbol for LANG attr. Fixes inconsistent numeric ID
2018-02-17 17:37:02 +01:00
Matthew Honnibal
97a228a4ce
Increment to v2.0.8.dev0
2018-02-17 16:54:36 +01:00
Aaron Marquez
ea571e8325
Merge branch 'master' into issue-1959
2018-02-16 15:14:09 -08:00
Matthew Honnibal
7d5c720fc3
Fix multitask objective when no pipeline provided
2018-02-15 23:50:21 +01:00
Aaron Marquez
f0d3672e17
Changed loading EN model
2018-02-15 14:28:38 -08:00
Aaron Marquez
3765d84d57
Fix issue #1959
2018-02-15 12:51:49 -08:00
Aaron Marquez
7ba4111554
Add test for issue-1959
2018-02-15 12:46:22 -08:00
Matthew Honnibal
59b7cf9db8
Add get_beam_parse method in ArcEager, for Prodigy
2018-02-15 21:03:16 +01:00
Matthew Honnibal
3e541de440
Merge branch 'master' of https://github.com/explosion/spaCy
2018-02-15 21:02:55 +01:00
Thomas Opsomer
5d24a81c0b
add test for span.sent when doc not parsed
2018-02-15 16:59:16 +01:00
Thomas Opsomer
deab391cbf
correct check on sent_start & raise if no boundaries
2018-02-15 16:58:30 +01:00
Matthew Honnibal
4cb861e080
Merge pull request #1968 from DuyguA/is_currency
...
New lexical feature is_currency
2018-02-15 12:13:36 +01:00
Thomas Opsomer
b902731313
Find span sentence when only sentence boundaries (no parser)
2018-02-14 22:18:54 +01:00
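Without a parser, a span's sentence can still be located from token-level sentence-start flags, starting from the span's first token. In this hypothetical sketch, `sent_starts[i]` is `True` if token `i` starts a sentence, `False` if it doesn't, and `None` if unset. If no flag is set at all, the function raises, in line with the "raise if no boundaries" behaviour in the commit above.

```python
def sentence_bounds(sent_starts, i):
    """Return (start, end) token offsets of the sentence containing token i.

    Hypothetical helper: token 0 is always treated as a sentence start.
    """
    if all(flag is None for flag in sent_starts):
        raise ValueError("No sentence boundaries set: add a parser or set them manually.")
    # Walk left to the nearest sentence start.
    start = i
    while start > 0 and not sent_starts[start]:
        start -= 1
    # Walk right until the next sentence start (or the end of the doc).
    end = i + 1
    while end < len(sent_starts) and not sent_starts[end]:
        end += 1
    return start, end
```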
Claudiu-Vlad Ursache
e28de12cbd
Ensure files opened in from_disk
are closed
...
Fixes [issue 1706](https://github.com/explosion/spaCy/issues/1706 ).
2018-02-13 20:49:43 +01:00
Johannes Dollinger
012e874d09
Add contributor agreement for emulbreh
2018-02-13 13:40:33 +01:00
Johannes Dollinger
bf94c13382
Don't fix random seeds on import
2018-02-13 12:42:23 +01:00
Matthew Honnibal
d7c9b53120
Pass kwargs into pipeline components during begin_training
2018-02-12 10:18:39 +01:00
4altinok
ca8728035d
added new lex feat to token
2018-02-11 18:55:48 +01:00
4altinok
edd7202a06
added new symbol
2018-02-11 18:55:32 +01:00
4altinok
ed1ac2969e
added new lexical feat to lexeme
2018-02-11 18:51:48 +01:00
4altinok
94fb0b75e3
code for is_currency
2018-02-11 18:51:32 +01:00
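The commits above don't show the body of `is_currency`, so this is one plausible definition, stated as an assumption: a string counts as currency if every character is in the Unicode "Sc" (currency symbol) category.

```python
import unicodedata


def is_currency(text):
    """True if every character is a Unicode currency symbol (category 'Sc')."""
    # Assumption: the empty string is not treated as a currency symbol.
    if not text:
        return False
    return all(unicodedata.category(char) == "Sc" for char in text)
```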
4altinok
3deef1497a
removed 18 and replaced 18 with is_currency
2018-02-11 18:51:09 +01:00
4altinok
471d3c9e23
added lex test for is_currency
2018-02-11 18:50:50 +01:00
ines
c63e99da8a
Fix typo in glossary (resolves #1964)
...
Co-Authored-By: SThomasP <sthomasp@users.noreply.github.com>
2018-02-10 11:58:41 +01:00
Lyndon White
6ee5dff51c
Make module loading compatible with Python 3.4 (fix #1733)
2018-02-09 23:03:35 +08:00
Matthew Honnibal
e361b4f82b
Fix #1929: Incorrect NER with pre-set sentence boundaries.
2018-02-08 15:25:41 +01:00
Matthew Honnibal
fd9fd275c5
Make test for #1945 more precise
2018-02-07 02:06:11 +01:00
Matthew Honnibal
c087a14380
Merge branch 'master' of https://github.com/explosion/spaCy
2018-02-07 01:29:39 +01:00
Matthew Honnibal
76d89b2180
Add test for #1945: PhraseMatcher regression
2018-02-07 01:29:23 +01:00
Ines Montani
0954e15dda
Merge pull request #1913 from ohenrik/nb_syntax_iterator
...
Norwegian Language (nb) - Added French syntax iterator with explanation
2018-02-06 04:59:07 +01:00
Ole Henrik Skogstrøm
251a7805fe
Copied French syntax iterator to simplify future changes
2018-02-05 14:45:05 +01:00
Matthew Honnibal
2e7391e627
Merge pull request #1916 from tokestermw/bug/fix-not-passing-in-model-cfg-in-nlp
...
Bug/fix not passing in model cfg in nlp
2018-02-05 01:19:40 +01:00
Ali Zarezade
9df9da34a3
Fix init_model issue
...
Fixing issue #1928
2018-02-03 17:21:34 +03:30
Matthew Honnibal
ebe84e45e5
Increment version to 2.0.7
2018-02-02 03:39:16 +01:00
Matthew Honnibal
e4b1f57599
Increment version
2018-02-02 02:33:23 +01:00
Matthew Honnibal
069531c351
Merge branch 'master' of https://github.com/explosion/spaCy
2018-02-02 02:32:58 +01:00
Matthew Honnibal
f74a802d09
Test and fix #1919 : Error resuming training
2018-02-02 02:32:40 +01:00
ines
f1d3deffac
Add Russian example sentences (see #1107)
2018-02-01 20:09:40 +01:00
Matthew Honnibal
6b1126c312
Merge branch 'master' of https://github.com/explosion/spaCy
2018-02-01 02:57:52 +01:00
ines
3c1fb9d02d
Make validate command fail more gracefully if version not found
...
Mostly relevant during development when working with .dev versions
2018-01-31 22:06:28 +01:00
Motoki Wu
54062b7326
added tests for issue #1915
2018-01-30 18:30:19 -08:00
Motoki Wu
f4a7d1a423
make sure to pass **cfg to each component when training
2018-01-30 18:29:54 -08:00
ines
4046823699
Only check component in factories if string (see #1911)
2018-01-30 16:29:07 +01:00
ines
ce10d320c4
Fix component check in self.factories (see #1911)
2018-01-30 16:09:37 +01:00
Ole Henrik Skogstrøm
e40465487c
Added French syntax iterator with explanation
2018-01-30 15:44:29 +01:00
ines
8901814248
Improve error handling if pipeline component is not callable (resolves #1911)
...
Also add help message if user accidentally calls nlp.add_pipe() with a string of a built-in component name.
2018-01-30 15:43:03 +01:00
Matthew Honnibal
a437ba87a3
Set release=True
2018-01-29 21:26:04 +01:00
Adam Binford
9238749aaf
Removed test to avoid network requests
2018-01-29 14:48:20 -05:00
Adam Binford
1a2c2f7d7f
Fixed auto linking after download and added simple test to check
2018-01-29 14:25:21 -05:00
Matthew Honnibal
cb7110c22e
Merge pull request #1882 from ohenrik/nb_lemma_and_tag_map
...
Add Norwegian bokmål ('nb') lemmatizer and tag_map
2018-01-29 18:18:50 +01:00
Matthew Honnibal
0c1e7f0c86
Merge pull request #1893 from azarezade/master
...
Add Persian language
2018-01-29 18:18:33 +01:00
Matthew Honnibal
cbdab75b36
Increment version
2018-01-28 23:46:22 +01:00
Matthew Honnibal
512e6adb08
Merge pull request #1896 from thomasopsomer/fix-sent
...
Fix sentence boundaries serialization (issue #1834)
2018-01-28 21:18:51 +01:00
Matthew Honnibal
f5b1ad4100
Limit parser model size, to hopefully reduce memory during CI tests
2018-01-28 21:00:32 +01:00
Thomas Opsomer
515e25910e
fix sent_start in serialization
2018-01-28 19:50:42 +01:00
Thomas Opsomer
45d62561f7
add test for the issue
2018-01-28 19:49:56 +01:00
ines
6d978e5c35
Don't use deprecated Doc.merge call in displaCy
...
As reported here: https://stackoverflow.com/a/48464412/6400719
2018-01-27 11:25:05 +01:00
Ali Zarezade
bb6bd3d8ae
add Persian language
2018-01-27 13:27:26 +03:30
Ali Zarezade
d195675db5
add Persian language
2018-01-27 13:21:38 +03:30
Kit
4b42267ba3
Fix issue #1889
2018-01-25 23:17:22 +01:00
Kit
52ef51f36e
Add test for issue #1889
2018-01-25 22:56:48 +01:00
Ole Henrik Skogstrøm
8e2c9f2475
Cleaned up nb tag_map comments
2018-01-25 11:09:28 +01:00
Ole Henrik Skogstrøm
1107e89fcf
Updated doc string on nb tag_map module
2018-01-25 11:08:28 +01:00
Matthew Honnibal
6a8cb905aa
Merge pull request #1876 from GregDubbin/master
...
Pattern matcher fixes
2018-01-24 16:38:11 +01:00
Matthew Honnibal
38b260e0c3
Merge pull request #1879 from azarezade/master
...
Add Persian characters and symbols
2018-01-24 16:34:22 +01:00
Matthew Honnibal
edb71a280e
Add test for #1883: Unpickling Matcher
2018-01-24 15:42:33 +01:00
Matthew Honnibal
2ad050e668
Fix unpickling of Matcher. Also store correct data in matcher._patterns
2018-01-24 15:42:11 +01:00
Ole Henrik Skogstrøm
4058a7d579
Fix æøå characters in lemmatizer
2018-01-24 14:03:14 +01:00
Ole Henrik Skogstrøm
42248f423f
Updated tag map
2018-01-24 13:50:33 +01:00
Ole Henrik Skogstrøm
74b430b49a
Correct Lemmatizer
2018-01-24 13:26:33 +01:00
Ole Henrik Skogstrøm
b9b3a40c78
Add Norwegian lemmatizer and tag_map
2018-01-24 12:28:29 +01:00
Matthew Honnibal
42a18ef903
Add test for #1868: Vocab.__contains__ with ints
2018-01-23 23:27:05 +01:00
Matthew Honnibal
43f381ce36
Make Vocab.__contains__ work with ints. Fixes #1868
2018-01-23 23:26:47 +01:00
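This toy example shows `__contains__` accepting either a string or its integer ID. `ToyVocab` and `hash_string` are hypothetical. The SHA-1-based hash only stands in for spaCy's own string hashing.

```python
import hashlib


def hash_string(text):
    """Hypothetical 64-bit string hash (a stand-in for spaCy's own hash function)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf8")).digest()[:8], "little")


class ToyVocab(object):
    def __init__(self, words):
        # Store entries keyed by integer ID, as a string store might.
        self._entries = {hash_string(word): word for word in words}

    def __contains__(self, key):
        # Accept either the string itself or its integer ID.
        if isinstance(key, str):
            key = hash_string(key)
        return key in self._entries
```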
greg
85ab99e692
Correct test examples
2018-01-23 15:00:14 -05:00
greg
f50bb1aafc
Restructure StateC to eliminate dependency on unordered_map
2018-01-23 14:40:03 -05:00
Matthew Honnibal
f3753c2453
Further model deserialization fixes re #1727
2018-01-23 19:16:05 +01:00
Matthew Honnibal
91e916cb67
Add comment to new test
2018-01-23 19:11:53 +01:00
Matthew Honnibal
fd187d71ad
Add test for #1727
2018-01-23 19:11:01 +01:00
Matthew Honnibal
85c942a6e3
Don't overwrite pretrained_dims setting from cfg. Fixes #1727
2018-01-23 19:10:49 +01:00
Ali Zarezade
42349471bc
add ٪ as punctuation
2018-01-23 18:11:33 +03:30
Ali Zarezade
2bda582135
Add Persian characters and symbols
...
Add Persian characters and the following:
- ٪ used instead of %
- ؟ used instead of ?
- ﷼ used instead of $
- ، used instead of ,
- ؛ used instead of ;
2018-01-23 13:20:36 +03:30
Matthew Honnibal
7e6dc283db
Fix unicode import in test
2018-01-22 23:55:44 +01:00
greg
686735b94e
Fix matcher import
2018-01-22 16:53:05 -05:00
greg
3a491093ee
Import libcpp.map if libcpp.unordered_map doesn't exist
2018-01-22 16:46:25 -05:00
greg
d55992bdf0
Switch match dictionary to use final state pointer rather than ID
2018-01-22 15:36:47 -05:00
Matthew Honnibal
4ce7d24fd5
Add test for #1799: Set left and right edges (and thus sentences) in non-projective parses.
2018-01-22 20:18:38 +01:00
Matthew Honnibal
56164ab688
Set l_edge and r_edge correctly for non-projective parses. Fixes #1799
2018-01-22 20:18:04 +01:00
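In a non-projective parse, a token's subtree need not be contiguous. Its edges therefore have to be computed from all of its descendants rather than assumed from a contiguous span. One simple approach, sketched here with hypothetical names, walks from each token up to the root and widens the edges of every ancestor on the way. The root's edges then give the sentence boundaries.

```python
def subtree_edges(heads):
    """Compute (l_edge, r_edge) lists from head indices; the root is its own head.

    Assumes a well-formed tree (no cycles). Cost is O(n * depth).
    """
    n = len(heads)
    l_edge = list(range(n))
    r_edge = list(range(n))
    for i in range(n):
        # Walk from token i up to the root, widening each ancestor's edges.
        j = i
        while heads[j] != j:
            j = heads[j]
            l_edge[j] = min(l_edge[j], i)
            r_edge[j] = max(r_edge[j], i)
    return l_edge, r_edge
```

For example, with `heads = [2, 3, 2, 2]` the arc from token 3 to token 1 crosses the root (token 2), so the tree is non-projective. Token 3's subtree is {1, 3}, its edges are (1, 3), and the root's edges (0, 3) cover the whole sentence.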
Matthew Honnibal
964aa1b384
Merge branch 'master' of https://github.com/explosion/spaCy
2018-01-22 19:18:46 +01:00
Matthew Honnibal
29897ed1b3
Allow vector loading to work on 1d data files. Fixes #1831
2018-01-22 19:18:26 +01:00
greg
490bc82c27
Add comments clarifying matcher logic for '*'
2018-01-22 10:03:12 -05:00
Matthew Honnibal
fe4748fc38
Merge pull request #1870 from avadhpatel/master
...
Model Load Performance Improvement by more than 5x
2018-01-22 00:05:15 +01:00
Avadh Patel
a517df55c8
Small fix
...
Signed-off-by: Avadh Patel <avadh4all@gmail.com>
2018-01-21 15:20:45 -06:00
Avadh Patel
5b5029890d
Merge branch 'perfTuning' into perfTuningMaster
...
Signed-off-by: Avadh Patel <avadh4all@gmail.com>
2018-01-21 15:20:00 -06:00
Matthew Honnibal
203d2ea830
Allow multitask objectives to be added to the parser and NER more easily
2018-01-21 19:37:02 +01:00
Matthew Honnibal
4a7d524efb
Merge branch 'master' of https://github.com/explosion/spaCy
2018-01-21 19:22:03 +01:00
Matthew Honnibal
61a051f2c0
Fix MultitaskObjective
2018-01-21 19:21:34 +01:00
Avadh Patel
75903949da
Updated model building after suggestion from Matthew
...
Signed-off-by: Avadh Patel <avadh4all@gmail.com>
2018-01-18 06:51:57 -06:00
Avadh Patel
fe879da2a1
Do not train model if it's going to be loaded from disk
...
This saves significant time in loading a model from disk.
Signed-off-by: Avadh Patel <avadh4all@gmail.com>
2018-01-17 06:16:07 -06:00
Avadh Patel
2146faffee
Do not train model if it's going to be loaded from disk
...
This saves significant time in loading a model from disk.
Signed-off-by: Avadh Patel <avadh4all@gmail.com>
2018-01-17 06:04:22 -06:00
greg
7072b395c9
Add greedy matcher tests
2018-01-16 15:46:13 -05:00
greg
441f490c1c
Merge branch 'master' of github.com:GregDubbin/spaCy
2018-01-16 13:31:10 -05:00
greg
8bea62f26e
Correct bugs for greedy matching and introduce ADVANCE_PLUS action
2018-01-16 13:21:43 -05:00
Matthew Honnibal
ccb51a9f36
Make .similarity() return 1.0 if all orth attrs match
2018-01-15 16:29:48 +01:00
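This sketch shows the shortcut using averaged word vectors and cosine similarity. `similarity` and the `vectors` lookup table are hypothetical. The early return gives exactly `1.0` for identical texts, even where the floating-point cosine would differ slightly from 1.0, or where all the vectors are zero and the cosine would otherwise fall back to 0.0.

```python
import math


def _average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def similarity(words_a, words_b, vectors):
    """Cosine similarity of averaged word vectors (hypothetical sketch)."""
    # If every orth matches, return exactly 1.0 instead of a computed value.
    if list(words_a) == list(words_b):
        return 1.0
    avg_a = _average([vectors[w] for w in words_a])
    avg_b = _average([vectors[w] for w in words_b])
    dot = sum(x * y for x, y in zip(avg_a, avg_b))
    norm = math.sqrt(sum(x * x for x in avg_a)) * math.sqrt(sum(y * y for y in avg_b))
    return dot / norm if norm else 0.0
```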
Matthew Honnibal
82135d85b7
Fix test
2018-01-15 15:55:15 +01:00
Matthew Honnibal
4b09616b58
Add test for #1757: Comparison against None
2018-01-15 15:55:01 +01:00
Matthew Honnibal
b904d81e9a
Fix rich comparison against None objects. Closes #1757
2018-01-15 15:51:25 +01:00
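Rich comparison against `None` can be handled by returning `NotImplemented` for foreign types. Python then falls back to its default handling instead of raising: identity comparison for `==` and `!=`, and a `TypeError` for ordering. The `Token` class below is a hypothetical, minimal Python stand-in, not spaCy's Cython implementation.

```python
class Token(object):
    """Hypothetical minimal token identified by (doc_id, i)."""

    def __init__(self, doc_id, i):
        self.doc_id = doc_id
        self.i = i

    def __eq__(self, other):
        if not isinstance(other, Token):
            # `token == None` becomes False instead of failing on other.i.
            return NotImplemented
        return (self.doc_id, self.i) == (other.doc_id, other.i)

    def __lt__(self, other):
        if not isinstance(other, Token):
            return NotImplemented  # so `token < None` raises TypeError
        return (self.doc_id, self.i) < (other.doc_id, other.i)

    def __hash__(self):
        return hash((self.doc_id, self.i))
```

In Python 3, `__ne__` is derived from `__eq__` automatically, so `token != None` is `True` without further code.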
Matthew Honnibal
9e413449f6
Fix unicode error in new test
2018-01-15 15:39:00 +01:00
Matthew Honnibal
ab7c45b12d
Fix error message and handling of doc.sents
2018-01-15 15:21:11 +01:00
Matthew Honnibal
6b215d2dd3
Add test for Issue #1537
2018-01-15 15:20:56 +01:00
ines
5babb7d6f6
Merge branch 'master' of https://github.com/explosion/spaCy
2018-01-14 17:31:09 +01:00
ines
793890cb4d
Remove test for removed deprecation warning
2018-01-14 17:31:06 +01:00
Matthew Honnibal
465a6f6452
Add missing Span.vocab property. Closes #1633
2018-01-14 15:06:30 +01:00
Matthew Honnibal
0cb090e526
Fix infinite recursion in token.sent_start. Closes #1640
2018-01-14 15:02:15 +01:00
Matthew Honnibal
5cbe913b6f
Don't raise deprecation warning in property. Closes #1813, #1712
2018-01-14 14:55:58 +01:00
Matthew Honnibal
1a1cca6052
Fix vectors.resize() on Py3. Closes #1539
2018-01-14 14:48:51 +01:00
Matthew Honnibal
0153220304
Make set_vector add word to vocab. Fixes #1807
2018-01-14 13:57:57 +01:00
Ines Montani
55754f0cee
Merge pull request #1836 from fucking-signup/master
...
Add tests for issue #1769
2018-01-13 00:23:35 +00:00
Kit
4ee97f20a0
Mark like_num tests as slow
2018-01-13 00:44:15 +01:00
Kit
855531537e
Rewrite tests for issue #1769
2018-01-12 23:49:51 +01:00
Kit
5b541cb5ec
Simplify tests for issue #1769
2018-01-12 23:34:27 +01:00
Kit
7a2adc4633
Remove some tests to see build status changes
2018-01-12 22:49:16 +01:00
Kit
0e62809a43
Rewrite tests for issue #1769
2018-01-12 22:26:06 +01:00
Ines Montani
36f426fe0a
Merge pull request #1808 from fucking-signup/master
...
Fix issue #1769
2018-01-12 21:12:02 +00:00
Kit
76f4eeca44
Remove tests to see build changes on Windows (Python 2.7)
2018-01-12 20:30:51 +01:00
Matthew Honnibal
7ca49c2061
Merge branch 'master' into feature-improve-model-download
2018-01-10 18:21:55 +01:00
Kit
7ec0956e8d
Add regression test (issue #1769)
2018-01-08 03:42:04 +01:00
Kit
701e7cc6aa
Rename variable to keep code consistent
2018-01-08 03:38:44 +01:00
Kit
ed0db95183
Find lowercased forms of ordinal words, where possible
2018-01-08 03:28:50 +01:00
Kit
9bc524982e
Find lowercased forms of numeric words
2018-01-08 03:25:08 +01:00
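This is a hypothetical sketch of case-insensitive number-word detection. Lowercasing the text before the lookup means "Ten" and "FIRST" are recognised as well as "ten" and "first". The word lists are truncated illustrations, not any language's actual data.

```python
# Truncated, hypothetical word lists for illustration.
_NUM_WORDS = {"one", "two", "three", "ten", "hundred", "thousand"}
_ORDINAL_WORDS = {"first", "second", "third", "tenth", "hundredth"}


def like_num(text):
    """Return True for digits, simple fractions and (case-insensitive) number words."""
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    # Lowercase before the lookup so "Ten" or "FIRST" also match.
    lowered = text.lower()
    return lowered in _NUM_WORDS or lowered in _ORDINAL_WORDS
```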
Søren Lind Kristiansen
62de5da1ff
Remove unused dummy variable
2018-01-05 09:57:24 +01:00
Søren Lind Kristiansen
10dab8eef8
Remove dummy variable from function calls
2018-01-05 09:37:05 +01:00
Søren Lind Kristiansen
7f0ab145e9
Don't pass CLI command name as dummy argument
2018-01-04 21:33:47 +01:00
Ines Montani
6a008233b5
Merge pull request #1795 from textioHQ/issue1758 (resolves #1758)
...
English tokenizer: handle "would've"
2018-01-04 02:43:39 +00:00
Kevin Humphreys
597df5bf83
add test
2018-01-03 13:00:05 -08:00
Kevin Humphreys
7918fa4ef9
handle would've
2018-01-03 12:25:48 -08:00
ines
2c656f90fb
Exit with 1 if incompatible models found (see #1714)
2018-01-03 21:20:35 +01:00
ines
dacfaa2ca4
Ensure that download command exits properly (resolves #1714)
2018-01-03 21:03:36 +01:00
Søren Lind Kristiansen
a9ff6eadc9
Prefix dummy argument names with underscore
2018-01-03 20:48:12 +01:00
ines
1081e08efb
Fix formatting
2018-01-03 20:14:50 +01:00
ines
d8109964d6
Use --no-deps on model install
...
In general, it's nice for models to specify spaCy as a dependency. However, this tends to cause problems in conda environments, because pip will re-install spaCy and its dependencies (especially Thinc).
2018-01-03 17:40:37 +01:00
ines
319d754309
Fix overwriting of existing symlinks
...
Check is_symlink() so that invalid and outdated symlinks are also overwritten. Also show a better error message if the link path exists but is not a symlink (i.e. a file or directory).
2018-01-03 17:39:36 +01:00
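The symlink check described above can be sketched with `pathlib`. `Path.is_symlink()` returns `True` even for a broken link, whereas `Path.exists()` follows the link and returns `False` when its target is missing. Testing `is_symlink()` first therefore catches invalid and outdated links. `link_model` is a hypothetical helper, not spaCy's `link` command.

```python
from pathlib import Path


def link_model(model_path, link_path, force=False):
    """Create or replace a symlink to a model directory (hypothetical helper)."""
    model_path = Path(model_path)
    link_path = Path(link_path)
    if link_path.is_symlink():
        # Also true for broken links, so stale links can be replaced.
        if not force:
            raise FileExistsError("Link already exists: {}".format(link_path))
        link_path.unlink()
    elif link_path.exists():
        # A real file or directory is in the way: refuse rather than delete it.
        raise FileExistsError("Path exists but is not a symlink: {}".format(link_path))
    link_path.symlink_to(model_path)
    return link_path
```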
ines
8ba0dfd017
Make message on failed linking more clear
2018-01-03 17:38:09 +01:00
Søren Lind Kristiansen
d6327e8495
Fix handling case when vectors not specified
2018-01-03 12:20:49 +01:00
Søren Lind Kristiansen
bcc51d7d8b
Fix shifted positional arguments
2018-01-03 12:19:47 +01:00
zqhZY
f27859fa99
add ChineseDefaults class for pickling
2017-12-28 17:13:58 +08:00
Ines Montani
ff9fc945ab
Merge pull request #1749 from sorenlind/da_ud_tokenization
...
Tune Danish tokenizer to more closely match Universal Dependencies
2017-12-22 16:00:49 +00:00
ines
26f313dabc
Fix missing import
2017-12-22 16:21:44 +01:00
ines
8dc1c27841
Merge branch 'master' of https://github.com/explosion/spaCy
2017-12-22 16:01:00 +01:00
ines
b10ba848b8
xfail test that causes MemoryError on Python 2 on Windows
...
Need to investigate this further!
2017-12-22 16:00:58 +01:00
Søren Lind Kristiansen
bef735aef7
Fix Danish abbreviation 'm.h.t.'
2017-12-21 09:24:31 +01:00
Ines Montani
a3dd167d7f
Merge branch 'master' into da_ud_tokenization
2017-12-20 21:05:34 +00:00
Ines Montani
97f100f69f
Merge pull request #1742 from kimfalk/master
...
Two corrections in the da lan.
2017-12-20 21:02:00 +00:00
Ines Montani
d682a8803e
Merge pull request #1672 from cbilgili/master
...
Adds Turkish Lemmatization
2017-12-20 21:01:00 +00:00
Benjamin Peterson
9452134cd1
remove no-break spaces from Hindi example (fixes #1750)
2017-12-20 11:35:30 -08:00
Søren Lind Kristiansen
7a2f2f6f94
Fix formatting.
2017-12-20 18:37:37 +01:00
Søren Lind Kristiansen
15d13efafd
Tune Danish tokenizer to more closely match tokenization in Universal Dependencies.
2017-12-20 17:36:52 +01:00
Kim FalkJørgensen
648dc60755
Remove the incorrect exception 'm.h.t'
2017-12-20 10:02:39 +01:00
Kim FalkJørgensen
9c9f4ef84a
Fixing a translation error in examples.py
...
Adding an exception to tokenizer_exceptions.py
2017-12-19 15:26:50 +01:00
ines
22dc744b48
Fix check for '@' in like_url (see #1715)
2017-12-16 13:48:43 +01:00
Ines Montani
9c1ee65268
Add regression test for #1698
2017-12-12 10:36:11 +01:00
Ines Montani
6455b574fc
Check for email address first
2017-12-12 10:25:13 +01:00
Bri-Will
d77361d76c
Update lex_attrs.py. Prevent like_url from matching e-mail addresses
2017-12-11 14:13:28 -08:00
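This rough sketch shows the ordering fix: check for an e-mail address before applying the URL heuristic. This `like_url` is hypothetical and deliberately simplified. The two-to-six-letter top-level-domain rule is an assumption, not spaCy's actual logic.

```python
def like_url(text):
    """Simplified URL heuristic that rejects e-mail addresses (hypothetical)."""
    if text.startswith(("http://", "https://", "ftp://", "www.")):
        return True
    # Check for an e-mail address first: "user@example.com" has a
    # dot-separated domain but should not count as a URL.
    if "@" in text:
        return False
    if "." not in text or text.startswith(".") or text.endswith("."):
        return False
    # Assumption: the text after the last dot (ignoring any path) is a TLD.
    tld = text.rsplit(".", 1)[1].split("/", 1)[0]
    return tld.isalpha() and 2 <= len(tld) <= 6
```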
Søren Lind Kristiansen
5a9d377580
Remove abbreviation for positional plac argument
2017-12-11 11:08:29 +01:00
Isaac Sijaranamual
38021fbb00
Switch from Python 3-only TemporaryDirectory to pytest's tmpdir
2017-12-11 00:16:04 +01:00
Isaac Sijaranamual
20ae0c459a
Fixes "Error saving model" #1622
2017-12-10 23:07:13 +01:00