Matthew Honnibal
b1fe41b45d
* Extend infix test, commenting on limitation of tokenizer w.r.t. infixes at the moment.
2016-03-29 14:31:05 +11:00
Matthew Honnibal
9c73983bdd
* Add test for hyphenation problem in Issue #302
2016-03-29 14:27:13 +11:00
Matthew Honnibal
d249e2f7f3
* Improve error message in bin/parser/train.py
2016-03-29 13:04:33 +11:00
Matthew Honnibal
910a6c805f
* Add infix rule for double hyphens, re Issue #302
2016-03-29 13:03:44 +11:00
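An infix rule splits a token at a match found inside it, rather than at its start or end. The minimal sketch below assumes a double-hyphen pattern. It is a standalone illustration, not spaCy's tokenizer code. `INFIX_RE` and `split_infixes` are hypothetical names.

```python
import re
from typing import List

# Hypothetical infix pattern: two consecutive hyphens inside a token.
INFIX_RE = re.compile(r"--")


def split_infixes(token: str) -> List[str]:
    """Split a token around each infix match, keeping the infix as its own piece."""
    pieces: List[str] = []
    start = 0
    for match in INFIX_RE.finditer(token):
        # Keep any text that precedes the infix.
        if match.start() > start:
            pieces.append(token[start:match.start()])
        # The infix itself becomes a separate piece.
        pieces.append(match.group())
        start = match.end()
    # Keep whatever follows the last infix.
    if start < len(token):
        pieces.append(token[start:])
    return pieces
```

For example, `split_infixes("well--known")` yields `["well", "--", "known"]`. A token with no double hyphen comes back unchanged as a one-element list.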
Matthew Honnibal
ad119c074f
* Fix incorrect whitespacing in Doc.text. This change is potentially breaking, to anyone who was relying on the previous incorrect semantics.
2016-03-29 13:02:42 +11:00
Matthew Honnibal
8c7a1908ee
Merge pull request #307 from scoder/faster_string_store
remove internal redundancy and overhead from StringStore
2016-03-29 12:59:52 +11:00
Wolfgang Seeker
7195b6742d
add restrictions to L-arc and R-arc to prevent space heads
2016-03-28 10:40:52 +02:00
Matthew Honnibal
8c77a994c6
Merge pull request #305 from henningpeters/master
multiple langs in download script
2016-03-26 21:54:59 +11:00
Henning Peters
c90d4a6f17
relative imports in __init__.py
2016-03-26 11:44:53 +01:00
Henning Peters
db095a162c
fix
2016-03-25 18:59:47 +01:00
Henning Peters
b8f63071eb
add lang registration facility
2016-03-25 18:54:45 +01:00
Matthew Honnibal
9cd21ad5b5
Merge pull request #284 from olegzd/olegzd/example/inventoryCount
Added reloadable English() example for inventory counting
2016-03-25 09:48:47 +11:00
Matthew Honnibal
4a37fdcee1
Merge pull request #287 from wbwseeker/deproj_sentbnd_bug
add function to Token for setting head and dep (and dep_)
2016-03-25 09:47:45 +11:00
Stefan Behnel
f18805ee1c
make StringStore.__contains__() return True for the empty string (which is also contained in iteration)
2016-03-24 15:42:12 +01:00
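Commit f18805ee1c makes membership agree with iteration for the empty string. The toy interning table below illustrates that invariant under two assumptions: slot 0 is reserved for `""`, and iteration yields every stored string. `ToyStringStore` is hypothetical and is not spaCy's `StringStore`.

```python
from typing import Dict, Iterator, List


class ToyStringStore:
    """Hypothetical string-interning table (not spaCy's StringStore)."""

    def __init__(self) -> None:
        # Slot 0 is reserved for the empty string, so iteration yields "".
        self._strings: List[str] = [""]
        self._index: Dict[str, int] = {"": 0}

    def __getitem__(self, s: str) -> int:
        # Intern the string on first lookup and return its integer ID.
        if s not in self._index:
            self._index[s] = len(self._strings)
            self._strings.append(s)
        return self._index[s]

    def __contains__(self, s: str) -> bool:
        # Membership is checked against the same table used for iteration,
        # so "" is reported as present, consistent with __iter__.
        return s in self._index

    def __iter__(self) -> Iterator[str]:
        return iter(self._strings)

    def __len__(self) -> int:
        return len(self._strings)
```

With this design, `all(s in store for s in store)` holds for any store, including a freshly created one.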
Stefan Behnel
f2cfbfc412
remove internal redundancy and overhead from StringStore
2016-03-24 15:25:27 +01:00
Wolfgang Seeker
d65ef41d08
make error messages language independent
2016-03-24 11:47:09 +01:00
Henning Peters
963570aa49
Merge branch 'master' of github.com:spacy-io/spaCy
2016-03-24 11:19:47 +01:00
Henning Peters
a7d7ea3afa
first idea for supporting multiple langs in download script
2016-03-24 11:19:43 +01:00
Wolfgang Seeker
5080077097
revert init_model.py back to pre-german state (because it makes more sense)
simplify token.n_rights and token.n_lefts
2016-03-21 16:10:25 +01:00
Matthew Honnibal
a862edc0e6
Merge pull request #296 from elyase/patch-2
make use of log_smooth_count
2016-03-19 06:50:30 +11:00
Yaser Martinez Palenzuela
3c210f45fa
make use of log_smooth_count
2016-03-17 12:19:52 +01:00
Wolfgang Seeker
5e2e8e951a
add baseclass DocIterator for iterators over documents
add classes for English and German noun chunks
the respective iterators are set for the document when created by the parser
as they depend on the annotation scheme of the parsing model
2016-03-16 15:53:35 +01:00
Matthew Honnibal
80134eb12d
Merge branch 'master' of https://github.com/spacy-io/spaCy
2016-03-15 19:14:50 +00:00
Matthew Honnibal
eaccbcda0f
Fix bug in pos_tag.py script
2016-03-16 06:04:14 +11:00
Wolfgang Seeker
2ae253ef5b
changed head.__set__ to make it simpler
2016-03-14 13:43:48 +01:00
Henning Peters
8f870854c4
move bootstrap script to gist
2016-03-14 11:32:20 +01:00
Henning Peters
c12d3dd200
add __init__.py to empty package dirs
2016-03-14 11:28:03 +01:00
Henning Peters
54f3447b5f
cleanup
2016-03-14 01:46:33 +01:00
Henning Peters
8ef5b6e126
cleanup
2016-03-13 19:52:13 +01:00
Henning Peters
1fe29c6919
cleanup
2016-03-13 18:12:32 +01:00
Henning Peters
9f628688ce
cleanup
2016-03-12 14:31:39 +01:00
Henning Peters
49f499ca1c
cleanup
2016-03-12 14:30:24 +01:00
Henning Peters
5701686272
cleanup
2016-03-12 13:47:10 +01:00
Wolfgang Seeker
46e3f979f1
add function for setting head and label to token
change PseudoProjectivity.deprojectivize to use these functions
2016-03-11 17:31:06 +01:00
Matthew Honnibal
b37571063a
Merge pull request #286 from gushecht/patch-2
added batch_size as keyword argument
2016-03-11 09:46:36 +11:00
Gus Hecht
feefe64ab2
added batch_size as keyword argument
There's probably a better default value....
2016-03-10 14:16:34 -08:00
Wolfgang Seeker
03fb498dbe
introduce lang field for LexemeC to hold language id
put noun_chunk logic into iterators.py for each language separately
2016-03-10 13:01:34 +01:00
Oleg Zdornyy
a774131671
Added reloadable English() example for inv. count
2016-03-09 19:35:55 -08:00
Wolfgang Seeker
bc9c62e279
replace Language functions with corresponding orth functions
implement punctuation functions in orth
2016-03-09 18:07:37 +01:00
Wolfgang Seeker
d9312bc9ea
add new files npchunks.{pyx,pxd} to hold noun phrase chunk generators
2016-03-09 16:18:48 +01:00
Matthew Honnibal
1508528c8c
* Increment version
2016-03-08 15:58:45 +00:00
Matthew Honnibal
963fe5258e
* Add missing __contains__ method to vocab
2016-03-08 15:49:10 +00:00
Matthew Honnibal
478aa21cb0
* Remove broken __reduce__ method on vocab
2016-03-08 15:48:21 +00:00
Matthew Honnibal
20235bde00
Merge pull request #282 from henningpeters/switch_vectors
initial proposal for ability to switch vectors
2016-03-09 01:39:41 +11:00
Henning Peters
5b3b3ebc8e
upgrade to latest sputnik
2016-03-08 15:30:17 +01:00
Henning Peters
eb7ae61b1c
cleanup api
2016-03-08 12:59:18 +01:00
Henning Peters
b740f20191
hash_string() should not depend on python's internal unicode representation, also fixes https://github.com/spacy-io/sense2vec/issues/5 for py2
2016-03-06 09:19:27 +01:00
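The point of b740f20191 is that a string's hash should depend only on its text, not on how a given Python build stores `str`/`unicode` objects in memory. The minimal sketch below assumes the fix amounts to hashing an explicit UTF-8 encoding. `hash_string` here is a hypothetical stand-in. `hashlib.blake2b` with an 8-byte digest substitutes for whatever 64-bit hash function spaCy actually uses.

```python
import hashlib


def hash_string(text: str) -> int:
    """Hypothetical stand-in: a 64-bit hash of the text's UTF-8 encoding."""
    # Encode explicitly so the hashed bytes do not depend on the
    # interpreter's internal representation of the string.
    data = text.encode("utf-8")
    # blake2b with an 8-byte digest stands in for spaCy's real 64-bit hash.
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "little")
```

Hashing `text.encode("utf-16-le")` or another build-dependent layout instead would give different values for the same text. Fixing the encoding removes that source of variation.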
Henning Peters
aa4d964c14
cleanup api
2016-03-05 17:51:32 +01:00
Henning Peters
931c07a609
initial proposal for separate vector package
2016-03-04 11:09:06 +01:00
Wolfgang Seeker
7adbd7a785
replace Counter with normal dict
2016-03-03 21:36:27 +01:00