Commit Graph

15336 Commits

Author SHA1 Message Date
Wolfgang Seeker
7adbd7a785 replace Counter with normal dict 2016-03-03 21:36:27 +01:00
Wolfgang Seeker
1ae487a4f6 add backwards compatibility with python 2.6 2016-03-03 21:18:12 +01:00
Wolfgang Seeker
9d1e6de4a0 make a proper list from zip iterator 2016-03-03 19:51:01 +01:00
Wolfgang Seeker
49f9d1c085 change test_nonproj.py to not use zip inside numpy.asarray 2016-03-03 19:42:09 +01:00
Wolfgang Seeker
72b8df0684 turned PseudoProjectivity into a normal python class 2016-03-03 19:05:08 +01:00
Matthew Honnibal
fcaa0ad7ce Merge pull request #280 from wbwseeker/german_parser
German parser
2016-03-04 03:27:42 +11:00
Wolfgang Seeker
690c5acabf adjust train.py to train both english and german models 2016-03-03 15:21:00 +01:00
Matthew Honnibal
9d51e4d13c Delete gather_freqs.py
This script was in a broken state, and should be unnecessary. The functionality is subsumed by `get_freqs.py`
2016-03-02 00:42:55 +11:00
Matthew Honnibal
ae2b479312 Merge pull request #278 from elyase/patch-1
replace codecs.open with io.open
2016-03-02 00:41:23 +11:00
Yaser Martinez Palenzuela
1a93d7f725 replace codecs.open with io.open 2016-03-01 14:10:11 +01:00
Wolfgang Seeker
3448cb40a4 integrated pseudo-projective parsing into parser
- nonproj.pyx holds a class PseudoProjectivity which currently holds
  all functionality to implement Nivre & Nilsson 2005's pseudo-projective
  parsing using the HEAD decoration scheme
- changed lefts/rights in Token to account for possible non-projective
  structures
2016-03-01 10:09:08 +01:00
Henning Peters
ee4c4f6a77 add run section to buildbot.json 2016-02-26 23:04:33 +01:00
Henning Peters
d3a65ef261 fix wheel build/test 2016-02-26 20:47:35 +01:00
Matthew Honnibal
fca5367aac * Switch beam_sgd to best_first_sgd 2016-02-26 02:29:23 +00:00
Wolfgang Seeker
56b7210e82 moved nonproj.py to syntax/nonproj.pyx 2016-02-25 15:08:49 +01:00
Matthew Honnibal
afed99b3c8 * Fix command line arguments for tagger 2016-02-25 03:00:54 +01:00
Matthew Honnibal
a76316ae7e * Try beam search for SGD 2016-02-25 03:00:35 +01:00
Henning Peters
304e27624f run tests for wheels 2016-02-24 20:21:40 +01:00
Henning Peters
4d375afb91 run tests for wheels 2016-02-24 19:59:08 +01:00
Henning Peters
f3df736e0a remove unidecode-related test 2016-02-24 18:22:22 +01:00
Matthew Honnibal
db87db87ea * Update tagger.pxd for CharacterTagger model 2016-02-24 18:20:47 +01:00
Matthew Honnibal
77f2b218f9 * Update conll_train script 2016-02-24 18:19:38 +01:00
Matthew Honnibal
fab538672e * Refactor CharacterTagger 2016-02-24 18:17:16 +01:00
Matthew Honnibal
1ba31f6229 Merge pull request #275 from henningpeters/unidecode
remove text-unidecode dependency
2016-02-25 04:10:45 +11:00
Wolfgang Seeker
4b2297d5d4 add class PseudoProjective for pseudo-projective parsing
PseudoProjective() implements the algorithm from Nivre & Nilsson 2005
using their HEAD decoration scheme.
2016-02-24 11:26:25 +01:00
Henning Peters
12d58a7099 remove text-unidecode dependency 2016-02-24 08:01:59 +01:00
Henning Peters
63deae47fe Update buildbot.json 2016-02-23 13:36:04 +01:00
Matthew Honnibal
92e9134603 * Try new CoNLL tagger method 2016-02-22 22:57:06 +01:00
Matthew Honnibal
3f12fb4191 * Change defaults for character tagger 2016-02-22 22:56:33 +01:00
Matthew Honnibal
422b33838e * Add note explaining parse features 2016-02-22 22:55:52 +01:00
Wolfgang Seeker
8d531c958b replace tests for non-projectivity
- add functions to find non-projective edges
- add test file for non-projectivity functions
2016-02-22 14:40:40 +01:00
Henning Peters
dfd1a1d3a2 Update buildbot.json 2016-02-22 06:13:09 +01:00
Matthew Honnibal
141639ea3a * Fix bug in tokenizer that caused new tokens to be added for affixes 2016-02-21 23:17:47 +00:00
Matthew Honnibal
5f53ef1a43 * Update conll_train for tagger, to use neural network tagger 2016-02-22 00:16:40 +01:00
Matthew Honnibal
c3f334cef1 * Work on character tagger 2016-02-22 00:15:25 +01:00
Matthew Honnibal
7a519ea5af * Add cautionary note to vocab about encoding 2016-02-22 00:13:20 +01:00
Henning Peters
1501ef58e0 Update README.md 2016-02-19 19:36:47 +01:00
Henning Peters
85f94fd314 get rid of pip-clear.py 2016-02-19 18:48:02 +01:00
Henning Peters
37a7020904 move displacy to its own subdomain 2016-02-19 14:03:52 +01:00
Henning Peters
59339d45e5 remove displacy 2016-02-19 13:30:49 +01:00
Henning Peters
0bb05ec7e1 Merge branch 'master' of github.com:spacy-io/spaCy 2016-02-19 13:30:14 +01:00
Henning Peters
d86a2a7a78 Update _installation.jade
with `pip install -e .` we don't need to set the PYTHONPATH anymore
also sync build instructions with travis script
2016-02-18 22:54:20 +01:00
Wolfgang Seeker
eae35e9b27 add tokenizer files for German, add/change code to train German pos tagger
- add files to specify rules for German tokenization
- change generate_specials.py to generate from an external file (abbrev.de.tab)
- copy gazetteer.json from lang_data/en/

- init_model.py
	- change doc freq threshold to 0
- add train_german_tagger.py
	- expects conll09-formatted input
2016-02-18 13:24:20 +01:00
Matthew Honnibal
92f62bcb84 * Work on character tagger 2016-02-16 23:55:47 +01:00
Matthew Honnibal
7ae048cf76 * Delete old draft of sense2vec post 2016-02-15 14:50:01 +01:00
Matthew Honnibal
05ec31a134 * Tmp 2016-02-15 14:40:28 +01:00
Matthew Honnibal
2326c5298f * Rename post Sense2Vec with SpaCy 2016-02-15 09:16:58 +01:00
Matthew Honnibal
ceb87e6b14 * Add meta.jade for sense2vec post 2016-02-15 09:16:12 +01:00
Henning Peters
04e1054bfa Merge branch 'master' of github.com:henningpeters/spaCy 2016-02-15 01:34:06 +01:00
Henning Peters
9cc4f8d5b3 avoid shadowing __name__ 2016-02-15 01:33:39 +01:00
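
Note on 8d531c958b ("add functions to find non-projective edges"): the commit message does not show the code, so the following is only a minimal sketch of what such a check can look like, not the repository's implementation. It assumes a sentence is given as a list `heads`, where `heads[i]` is the index of token `i`'s head and the root points to itself. That convention and the function names `ancestors`, `is_nonproj_arc` and `find_nonproj_arcs` are assumptions made for illustration. An arc is treated as non-projective when some token strictly between the head and the dependent is not dominated by the head.

```python
def ancestors(tokenid, heads):
    """Yield every head above `tokenid`, walking up until the root.

    Assumption: the root is marked by heads[root] == root.
    The step counter guards against malformed input containing cycles.
    """
    current = tokenid
    steps = 0
    while heads[current] != current and steps < len(heads):
        current = heads[current]
        steps += 1
        yield current


def is_nonproj_arc(tokenid, heads):
    """Return True if the arc heads[tokenid] -> tokenid is non-projective."""
    head = heads[tokenid]
    if head == tokenid:
        return False  # the root has no incoming arc
    # Tokens strictly between the head and the dependent.
    if head < tokenid:
        start, end = head + 1, tokenid
    else:
        start, end = tokenid + 1, head
    for k in range(start, end):
        # Every token inside the span must be dominated by the head.
        if head not in ancestors(k, heads):
            return True
    return False


def find_nonproj_arcs(heads):
    """Return the dependents whose incoming arc is non-projective."""
    return [t for t in range(len(heads)) if is_nonproj_arc(t, heads)]


# Token 2 is the root; the arc 3 -> 1 spans the root, which 3 does not dominate.
print(find_nonproj_arcs([2, 3, 2, 2, 2]))  # [1]
# A fully projective tree yields no arcs.
print(find_nonproj_arcs([1, 1, 1, 2]))     # []
```

The check costs roughly O(n * depth) per arc, which is enough for a sketch or a unit test. A real parser component would likely precompute the ancestor sets instead.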