Wolfgang Seeker
03fb498dbe
introduce lang field for LexemeC to hold language id
...
put noun_chunk logic into iterators.py for each language separately
2016-03-10 13:01:34 +01:00
Oleg Zdornyy
a774131671
Added reloadable English() example for inv. count
2016-03-09 19:35:55 -08:00
Wolfgang Seeker
bc9c62e279
replace Language functions with corresponding orth functions
...
implement punctuation functions in orth
2016-03-09 18:07:37 +01:00
Wolfgang Seeker
d9312bc9ea
add new files npchunks.{pyx,pxd} to hold noun phrase chunk generators
2016-03-09 16:18:48 +01:00
Matthew Honnibal
1508528c8c
* Increment version
2016-03-08 15:58:45 +00:00
Matthew Honnibal
963fe5258e
* Add missing __contains__ method to vocab
2016-03-08 15:49:10 +00:00
Matthew Honnibal
478aa21cb0
* Remove broken __reduce__ method on vocab
2016-03-08 15:48:21 +00:00
Matthew Honnibal
20235bde00
Merge pull request #282 from henningpeters/switch_vectors
...
initial proposal for ability to switch vectors
2016-03-09 01:39:41 +11:00
Henning Peters
5b3b3ebc8e
upgrade to latest sputnik
2016-03-08 15:30:17 +01:00
Henning Peters
eb7ae61b1c
cleanup api
2016-03-08 12:59:18 +01:00
Henning Peters
b740f20191
hash_string() should not depend on Python's internal unicode representation; also fixes https://github.com/spacy-io/sense2vec/issues/5 for py2
2016-03-06 09:19:27 +01:00
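The hash_string() commit above is about making string hashes independent of how the interpreter stores unicode in memory. A minimal sketch of that idea, assuming the fix amounts to hashing the UTF-8 encoded bytes: `stable_hash` is a hypothetical name, and SHA-1 from the standard library is only a stand-in for whatever hash function the library actually uses.

```python
import hashlib
import struct


def stable_hash(text):
    """Return a 64-bit id for a unicode string (hypothetical helper).

    The string is encoded to UTF-8 first, so the result depends only on the
    code points, not on the interpreter's in-memory representation.
    """
    data = text.encode("utf-8")
    # Stand-in hash: keep the first 8 bytes of the SHA-1 digest.
    digest = hashlib.sha1(data).digest()
    return struct.unpack("<Q", digest[:8])[0]
```

Two strings with the same code points then hash identically, whether they were typed as literals or decoded from bytes.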
Henning Peters
aa4d964c14
cleanup api
2016-03-05 17:51:32 +01:00
Henning Peters
931c07a609
initial proposal for separate vector package
2016-03-04 11:09:06 +01:00
Wolfgang Seeker
7adbd7a785
replace Counter with normal dict
2016-03-03 21:36:27 +01:00
Wolfgang Seeker
1ae487a4f6
add backwards compatibility with python 2.6
2016-03-03 21:18:12 +01:00
Wolfgang Seeker
9d1e6de4a0
make a proper list from zip iterator
2016-03-03 19:51:01 +01:00
Wolfgang Seeker
49f9d1c085
change test_nonproj.py to not use zip inside numpy.asarray
2016-03-03 19:42:09 +01:00
Wolfgang Seeker
72b8df0684
turned PseudoProjectivity into a normal python class
2016-03-03 19:05:08 +01:00
Matthew Honnibal
fcaa0ad7ce
Merge pull request #280 from wbwseeker/german_parser
...
German parser
2016-03-04 03:27:42 +11:00
Wolfgang Seeker
690c5acabf
adjust train.py to train both english and german models
2016-03-03 15:21:00 +01:00
Matthew Honnibal
9d51e4d13c
Delete gather_freqs.py
...
This script was in a broken state, and should be unnecessary. The functionality is subsumed by `get_freqs.py`
2016-03-02 00:42:55 +11:00
Matthew Honnibal
ae2b479312
Merge pull request #278 from elyase/patch-1
...
replace codecs.open with io.open
2016-03-02 00:41:23 +11:00
Yaser Martinez Palenzuela
1a93d7f725
replace codecs.open with io.open
2016-03-01 14:10:11 +01:00
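For context on the codecs.open change above: io.open accepts an `encoding` argument and behaves the same on Python 2 and 3 (on Python 3 it is the built-in open). A small sketch of the pattern; the helper names `write_text`, `read_text` and `roundtrip` are illustrative and do not come from the repository.

```python
import io
import os
import tempfile


def write_text(path, text):
    # io.open handles unicode text and encoding on both Python 2 and 3.
    with io.open(path, "w", encoding="utf8") as file_:
        file_.write(text)


def read_text(path):
    with io.open(path, "r", encoding="utf8") as file_:
        return file_.read()


def roundtrip(text):
    """Write text to a temporary file and read it back."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        write_text(path, text)
        return read_text(path)
    finally:
        os.remove(path)
```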
Wolfgang Seeker
3448cb40a4
integrated pseudo-projective parsing into parser
...
- nonproj.pyx holds a class PseudoProjectivity which currently holds
all functionality to implement Nivre & Nilsson 2005's pseudo-projective
parsing using the HEAD decoration scheme
- changed lefts/rights in Token to account for possible non-projective
structures
2016-03-01 10:09:08 +01:00
Henning Peters
ee4c4f6a77
add run section to buildbot.json
2016-02-26 23:04:33 +01:00
Henning Peters
d3a65ef261
fix wheel build/test
2016-02-26 20:47:35 +01:00
Wolfgang Seeker
56b7210e82
moved nonproj.py to syntax/nonproj.pyx
2016-02-25 15:08:49 +01:00
Henning Peters
304e27624f
run tests for wheels
2016-02-24 20:21:40 +01:00
Henning Peters
4d375afb91
run tests for wheels
2016-02-24 19:59:08 +01:00
Henning Peters
f3df736e0a
remove unidecode-related test
2016-02-24 18:22:22 +01:00
Matthew Honnibal
1ba31f6229
Merge pull request #275 from henningpeters/unidecode
...
remove text-unidecode dependency
2016-02-25 04:10:45 +11:00
Wolfgang Seeker
4b2297d5d4
add class PseudoProjective for pseudo-projective parsing
...
PseudoProjective() implements the algorithm from Nivre & Nilsson 2005
using their HEAD decoration scheme.
2016-02-24 11:26:25 +01:00
Henning Peters
12d58a7099
remove text-unidecode dependency
2016-02-24 08:01:59 +01:00
Henning Peters
63deae47fe
Update buildbot.json
2016-02-23 13:36:04 +01:00
Wolfgang Seeker
8d531c958b
replace tests for non-projectivity
...
- add functions to find non-projective edges
- add test file for non-projectivity functions
2016-02-22 14:40:40 +01:00
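As an illustration of what finding non-projective edges involves, here is a rough sketch and not the code from this commit. It assumes a parse is given as a list `heads` where `heads[i]` is the index of token i's head and a root token points to itself; all function names are hypothetical. An arc counts as non-projective when some token strictly between the head and the dependent is not a descendant of the head.

```python
def ancestors(heads, token):
    """Yield the chain of heads above `token`, ending at the root."""
    seen = set()
    while heads[token] != token and token not in seen:
        seen.add(token)  # guard against malformed, cyclic input
        token = heads[token]
        yield token


def is_nonproj_arc(heads, dep):
    """True if the arc from heads[dep] to dep crosses another arc."""
    head = heads[dep]
    if head == dep:  # root token: no incoming arc
        return False
    start, end = sorted((head, dep))
    for between in range(start + 1, end):
        # Every token under the arc must be dominated by the arc's head.
        if head not in ancestors(heads, between):
            return True
    return False


def nonproj_arcs(heads):
    """Return all non-projective arcs as (head, dependent) pairs."""
    return [(heads[d], d) for d in range(len(heads)) if is_nonproj_arc(heads, d)]
```

For `heads = [0, 3, 0, 0]`, the arc from token 3 to token 1 spans token 2, which hangs off token 0 and not off token 3, so it is the only arc reported.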
Henning Peters
dfd1a1d3a2
Update buildbot.json
2016-02-22 06:13:09 +01:00
Matthew Honnibal
141639ea3a
* Fix bug in tokenizer that caused new tokens to be added for affixes
2016-02-21 23:17:47 +00:00
Henning Peters
1501ef58e0
Update README.md
2016-02-19 19:36:47 +01:00
Henning Peters
85f94fd314
get rid of pip-clear.py
2016-02-19 18:48:02 +01:00
Henning Peters
37a7020904
move displacy to its own subdomain
2016-02-19 14:03:52 +01:00
Henning Peters
59339d45e5
remove displacy
2016-02-19 13:30:49 +01:00
Henning Peters
0bb05ec7e1
Merge branch 'master' of github.com:spacy-io/spaCy
2016-02-19 13:30:14 +01:00
Henning Peters
d86a2a7a78
Update _installation.jade
...
with ```pip install -e .``` we don't need to set the PYTHONPATH anymore
also sync build instructions with travis script
2016-02-18 22:54:20 +01:00
Wolfgang Seeker
eae35e9b27
add tokenizer files for German, add/change code to train German pos tagger
...
- add files to specify rules for German tokenization
- change generate_specials.py to generate from an external file (abbrev.de.tab)
- copy gazetteer.json from lang_data/en/
- init_model.py
  - change doc freq threshold to 0
- add train_german_tagger.py
  - expects conll09-formatted input
2016-02-18 13:24:20 +01:00
Henning Peters
04e1054bfa
Merge branch 'master' of github.com:henningpeters/spaCy
2016-02-15 01:34:06 +01:00
Henning Peters
9cc4f8d5b3
avoid shadowing __name__
2016-02-15 01:33:39 +01:00
Henning Peters
135746947a
Update package.json
2016-02-14 20:19:26 +01:00
Henning Peters
4c9e3c7911
upgrade sputnik, enforce data api via model version constraints
2016-02-14 16:03:17 +01:00
Henning Peters
9d8966a2c0
Update test_tokenizer.py
2016-02-10 19:24:37 +01:00
Henning Peters
82c57f21a4
Update requirements.txt
2016-02-10 18:56:21 +01:00