179 Commits

Author SHA1 Message Date
ines
05fe6758a7 Set lexeme attributes for tokenizer special cases 2017-06-03 19:44:39 +02:00
ines
41a6adf1f6 Initialise Vocab length correctly 2017-06-02 10:57:25 +02:00
ines
53b82f972a Add strings to Vocab in init, instead of StringStore 2017-06-02 10:57:06 +02:00
ines
023f38bdd4 Fix return value of Vocab.from_bytes 2017-06-02 10:56:40 +02:00
Matthew Honnibal
307d615c5f Fix serialization for tagger when tag_map has changed 2017-06-01 12:18:36 -05:00
Matthew Honnibal
9805e0e369 Fix vocab pickling 2017-05-31 08:25:01 -05:00
Matthew Honnibal
a131981f3b Work on vectors 2017-05-30 23:34:50 +02:00
Matthew Honnibal
9bf22a94aa Fix tag set serialisation 2017-05-29 17:52:36 -05:00
Matthew Honnibal
920887f4e4 Specify order of vocab deserialization 2017-05-29 13:04:40 +02:00
Matthew Honnibal
6b019b0540 Update to/from bytes methods 2017-05-29 10:14:20 +02:00
Matthew Honnibal
6dad4117ad Work on serialization for models 2017-05-29 01:37:57 +02:00
Matthew Honnibal
2edd96ce47 Draft Vocab to/from disk/bytes 2017-05-28 23:34:12 +02:00
Matthew Honnibal
fe11564b8e Finish stringstore change. Also xfail vectors tests 2017-05-28 15:10:22 +02:00
Matthew Honnibal
fe4a746300 Accomodate symbols in new string scheme 2017-05-28 13:03:16 +02:00
Matthew Honnibal
a5606c3eda Work on changing StringStore to return hashes. 2017-05-28 12:36:27 +02:00
Matthew Honnibal
39293ab2ee Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-28 11:46:57 +02:00
Matthew Honnibal
15f6efc127 Remove vectors from vocab 2017-05-28 11:45:32 +02:00
ines
c8543c8237 Fix formatting and docstrings and remove deprecated function 2017-05-28 00:22:40 +02:00
ines
251346b59f Fix typos and formatting 2017-05-21 14:18:46 +02:00
ines
d82ae9a585 Change "function" to "callable" in docs 2017-05-21 13:17:40 +02:00
ines
f0cc642bb9 Update docstrings and API docs for Vocab 2017-05-20 14:00:41 +02:00
Matthew Honnibal
793430aa7a Get spaCy train command working with neural network
* Integrate models into pipeline
* Add basic serialization (maybe incorrect)
* Fix pickle on vocab
2017-05-17 12:04:50 +02:00
Matthew Honnibal
9e167b7bb6 Strip serializer from code 2017-05-09 17:28:50 +02:00
ines
e1efd589c3 Fix json imports and use ujson 2017-04-15 12:13:34 +02:00
ines
c05ec4b89a Add compat functions and remove old workarounds
Add ensure_path util function to handle checking instance of path
2017-04-15 12:11:16 +02:00
ines
d24589aa72 Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
ines
561f2a3eb4 Use consistent formatting for docstrings 2017-04-15 11:59:21 +02:00
Matthew Honnibal
d013aba7b5 Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-17 18:30:53 +01:00
Matthew Honnibal
854cfce7cf Make vocabs more compatible across versions
Previously, symbols were inserted into the string-store
before strings were loaded. This meant that adding a symbol
would invalidate saved models. We now make sure that strings
are loaded faithfully, so that compatibility is maintained.
2017-03-17 18:29:04 +01:00
Matthew Honnibal
1cc841e600 Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-17 08:18:11 -05:00
Matthew Honnibal
4bfc55b532 Auto-add words to vocab when loading vectors
When calling vocab.load_vectors_from_bin_loc, ensure that missing
entries are added to the vocab. Otherwise, loading vectors into an
empty vocab object resulted in no vectors being added.
2017-03-17 08:15:59 -05:00
Matthew Honnibal
4382f175b3 Squelch compiler warnings 2017-03-11 12:44:43 -06:00
Matthew Honnibal
d814892805 Hackish pickle support for Vocab. 2017-03-07 20:25:12 +01:00
ines
aa92d4e9b5 Fix unicode regex for Python 2 (see #834) 2017-02-16 23:49:54 +01:00
ines
85d249d451 Revert "Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)""
This reverts commit ea05f78660.
2017-02-16 23:26:25 +01:00
ines
ea05f78660 Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)"
This reverts commit 7d8c9eee7f, reversing
changes made to f6b69babcc.
2017-02-16 15:27:12 +01:00
Raphaël Bournhonesque
e17dc2db75 Remove useless import 2017-02-16 12:10:24 +01:00
Raphaël Bournhonesque
3fd2742649 load_vectors should accept arbitrary space characters as word tokens
Fix bug #834
2017-02-16 12:08:30 +01:00
Daniel Hershcovich
99eb494a82 Fix #737: support loading word vectors with " " as a word 2017-01-12 17:00:14 +02:00
Daniel Hershcovich
8e603cc917 Avoid "True if ... else False" 2017-01-11 11:18:22 +02:00
Matthew Honnibal
cade536d1e Merge branch 'master' of ssh://github.com/explosion/spaCy 2016-12-27 21:04:10 +01:00
Matthew Honnibal
ce4539dafd Allow the vocabulary to grow to 10,000, to prevent cold-start problem. 2016-12-27 21:03:45 +01:00
Ines Montani
8978806ea6 Allow Vocab to load without serializer_freqs 2016-12-21 18:05:23 +01:00
Ines Montani
be8ed811f6 Remove trailing whitespace 2016-12-21 18:04:41 +01:00
Matthew Honnibal
6ee1df93c5 Set tag_map to None if it's not seen in the data by vocab 2016-12-18 16:51:10 +01:00
Matthew Honnibal
1e0f566d95 Fix #656, #624: Support arbitrary token attributes when adding special-case rules. 2016-11-25 12:43:24 +01:00
Matthew Honnibal
f123f92e0c Fix #617: Vocab.load() required Path. Should work with string as well. 2016-11-10 22:48:48 +01:00
Matthew Honnibal
b86f8af0c1 Fix doc strings 2016-11-01 12:25:36 +01:00
Matthew Honnibal
6036ec7c77 Fix vector norm when loading lexemes. 2016-10-23 19:40:18 +02:00
Matthew Honnibal
3e688e6d4b Fix issue #514 -- serializer fails when new entity type has been added. The fix here is quite ugly. It's best to add the entities ASAP after loading the NLP pipeline, to mitigate the brittleness. 2016-10-23 17:45:44 +02:00