Suraj Rajan
1cdbb7c97c
[2032] - Changed python set to cpp stl set ( #2170 )
...
Changed python set to cpp stl set #2032
## Description
Changed the Python set to a C++ STL set. `std::set` is an ordered container, so the minimum element is available in constant time, whereas finding the minimum of a Python set requires a linear scan. Operations such as `find`, `count`, `insert` and `erase` run in logarithmic time, which makes the C++ set a better option for managing vectors.
Reference : http://www.cplusplus.com/reference/set/set/
### Types of change
Enhancement to `Vectors` for faster initialisation of word vectors (fastText)
2018-03-31 13:28:25 +02:00
Katrin Leinweber
6f84e32253
Formalise citation info ( #2167 )
...
* Create CITATION file
* Add Katrinleinweber contributor agreement
2018-03-30 10:34:14 +02:00
Matthew Honnibal
f3b7c5e537
Fix syntax error
2018-03-29 21:50:32 +02:00
Matthew Honnibal
23afa6429f
Add input length error, to address #1826
2018-03-29 21:45:26 +02:00
Matthew Honnibal
cca7e7ad11
Merge branch 'master' of https://github.com/explosion/spaCy
2018-03-29 20:27:06 +02:00
Matthew Honnibal
68ad366935
Improve train_new_entity_type example
2018-03-29 20:26:41 +02:00
Ines Montani
a609a1ca29
Merge pull request #2152 from explosion/feature/tidy-up-dependencies
...
💫 Tidy up dependencies
2018-03-29 14:35:09 +02:00
Viet Trung Tran
ea2af94cd9
Add support for Vietnamese in spaCy by leveraging Pyvi, an external Vietnamese tokenizer ( #2155 )
...
* support for Vietnamese
* Contributor Agreement for adding Vietnamese support on spaCy
2018-03-29 12:19:51 +02:00
ines
e6979bdbbd
Merge branch 'feature/tidy-up-dependencies' of https://github.com/explosion/spaCy into feature/tidy-up-dependencies
2018-03-29 00:19:37 +02:00
ines
83146458a2
Fix urllib for Python 3
2018-03-29 00:19:33 +02:00
Matthew Honnibal
8308bbc617
Get msgpack and msgpack_numpy via Thinc, to avoid potential version conflicts
2018-03-29 00:14:55 +02:00
Matthew Honnibal
b5098079d8
Fix error on urllib
2018-03-29 00:08:16 +02:00
Ines Montani
0de599b16b
Merge pull request #2159 from explosion/feature/fix-merged-entity-iob ( resolves #1554 , resolves #1752 )
...
💫 Fix token.ent_iob after doc.merge(), and ensure consistency in doc.ents
2018-03-28 23:10:00 +02:00
Ines Montani
98e9cda677
Merge pull request #2158 from explosion/feature/fix-multiple-vectors ( resolves #1660 )
...
💫 Fix loading of multiple vector models
2018-03-28 23:08:24 +02:00
Matthew Honnibal
a7c5ae2beb
Avoid forcing a name on empty vectors, and remove print statement
2018-03-28 21:08:58 +02:00
ines
3eb67bbe4b
Allow entity types with dashes ( resolves #1967 )
2018-03-28 20:51:26 +02:00
Matthew Honnibal
cf5fcf0546
Update serialization test
2018-03-28 20:12:53 +02:00
Matthew Honnibal
4555e3e251
Don't assume pretrained_vectors cfg is set in build_tagger
2018-03-28 20:12:45 +02:00
ines
9615ed5ed7
Update emoji/hashtag matcher example ( resolves #2156 ) [ci skip]
2018-03-28 18:41:28 +02:00
Matthew Honnibal
0b375d50c8
Fix ent_iob tags in doc.merge to avoid inconsistent sequences
2018-03-28 18:39:03 +02:00
Matthew Honnibal
95fa89c4b8
Update doc.ents test
2018-03-28 18:39:03 +02:00
Matthew Honnibal
e807f88410
Resolve merge when cherry-picking ent iob patches from develop
2018-03-28 18:38:13 +02:00
Matthew Honnibal
99fbc7db33
Improve error message when entity sequence is inconsistent
2018-03-28 18:36:53 +02:00
Matthew Honnibal
cbd2794be0
Add test for ent_iob during span merge
2018-03-28 18:36:53 +02:00
Matthew Honnibal
f8dd905a24
Warn and fall back if vectors have no name
2018-03-28 18:24:53 +02:00
Matthew Honnibal
fd9e259414
Add test for #1660
2018-03-28 18:22:51 +02:00
Matthew Honnibal
bc4afa9881
Remove print statement
2018-03-28 17:48:37 +02:00
Matthew Honnibal
79dc241caa
Set pretrained_vectors in parser cfg
2018-03-28 17:35:07 +02:00
Matthew Honnibal
17c3e7efa2
Add message noting vectors
2018-03-28 16:33:43 +02:00
Matthew Honnibal
9bf6e93b3e
Set pretrained_vectors in begin_training
2018-03-28 16:32:41 +02:00
Matthew Honnibal
95a9615221
Fix loading of multiple pre-trained vectors
...
This patch addresses #1660 , which was caused by keying all pre-trained
vectors with the same ID when telling Thinc how to refer to them. This
meant that if multiple models were loaded that had pre-trained vectors,
errors or incorrect behaviour resulted.
The vectors class now includes a .name attribute, which defaults to:
{nlp.meta['lang']}_{nlp.meta['name']}.vectors
The vectors name is set in the cfg of the pipeline components under the
key pretrained_vectors. This replaces the previous cfg key
pretrained_dims.
In order to make existing models compatible with this change, we check
for the pretrained_dims key when loading models in from_disk and
from_bytes, and add the cfg key pretrained_vectors if we find it.
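
The naming scheme and the backward-compatibility check described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from the patch: the helper names `default_vectors_name` and `upgrade_cfg` are hypothetical, and the exact condition used in from_disk and from_bytes is an assumption based on the description.

```python
def default_vectors_name(meta):
    """Build the default vectors name from a model's meta dict.

    Hypothetical helper: mirrors the '<lang>_<name>.vectors' pattern
    described in the commit message.
    """
    return "{}_{}.vectors".format(meta["lang"], meta["name"])


def upgrade_cfg(cfg, vectors_name):
    """Return a copy of cfg that has 'pretrained_vectors' set if only the
    legacy 'pretrained_dims' key is present.

    Assumption: a truthy 'pretrained_dims' is taken to mean the model was
    trained with pre-trained vectors.
    """
    cfg = dict(cfg)  # copy, so the caller's dict is left untouched
    if cfg.get("pretrained_dims") and not cfg.get("pretrained_vectors"):
        cfg["pretrained_vectors"] = vectors_name
    return cfg


meta = {"lang": "en", "name": "core_web_md"}  # illustrative values only
name = default_vectors_name(meta)
legacy_cfg = {"pretrained_dims": 300}
new_cfg = upgrade_cfg(legacy_cfg, name)
```

Because each model's vectors get a distinct name, two models loaded in the same process no longer share one key when Thinc looks the vectors up.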
2018-03-28 16:02:59 +02:00
ines
07b8c255a5
Update example with note to install requests
2018-03-28 12:46:27 +02:00
ines
366c98a94b
Remove requests dependency
2018-03-28 12:46:18 +02:00
ines
7fbc9e5874
Replace requests with urllib
2018-03-28 12:46:07 +02:00
ines
da1f200362
Add compat helpers for urllib
2018-03-28 12:45:53 +02:00
ines
ac88c72c9a
Fix ftfy workaround and remove old import
2018-03-28 12:14:28 +02:00
ines
ce6071ca89
Remove ftfy dependency and update docs
2018-03-28 12:09:42 +02:00
Matthew Honnibal
070b6c6495
Remove dependency on ftfy
2018-03-28 12:07:02 +02:00
ines
6d2c85f428
Drop six and related hacks as a dependency
2018-03-28 10:45:25 +02:00
ines
9e83513004
Add position of invalid token to error message
2018-03-27 23:56:59 +02:00
ines
11c4735ccf
Fix issue in Italian lemmatizer data ( resolves #2050 )
2018-03-27 23:55:22 +02:00
ines
693971dd8f
Improve error message if token text is empty string (see #2101 )
2018-03-27 22:25:40 +02:00
ines
0c829e6605
Fix whitespace
2018-03-27 22:20:59 +02:00
Ines Montani
e0ae390607
Update CONTRIBUTING.md
2018-03-27 13:47:00 +02:00
Matthew Honnibal
d4680e4d83
Merge branch 'master' of https://github.com/explosion/spaCy
2018-03-27 13:36:37 +02:00
Matthew Honnibal
63a267b34d
Fix #2073 : Token.set_extension not working
2018-03-27 13:36:20 +02:00
Ines Montani
284bbb1dd1
Merge pull request #2146 from justindujardin/tensorboard-standalone-example
...
Add example using TensorBoard standalone projector
2018-03-27 13:23:32 +02:00
Justin DuJardin
4eeb178856
Add example using TensorBoard standalone projector
...
- The TensorBoard standalone projector expects a different set of files than the plugin to TensorFlow.
2018-03-25 21:50:13 -07:00
Ines Montani
68226109f4
Merge pull request #2142 from jimregan/polish-more-tokens
...
more exceptions
2018-03-24 19:06:44 +01:00
Matthew Honnibal
d566e673bf
Set version to v2.0.10
2018-03-24 18:09:03 +01:00