Matthew Honnibal
6ea1601e93
* Add script to train models off the UD treebanks. Note that the UD data is restricted to research purposes only, and should only be used to train models for academic experiments.
2015-10-08 12:01:08 +11:00
Matthew Honnibal
c503654ec1
* Update bin/parser/train for printing output.
2015-10-06 10:35:22 +11:00
Matthew Honnibal
1ae55cb63a
* Copy tag_map.json in init_model
2015-09-12 05:54:02 +02:00
Matthew Honnibal
b2e82e55f6
* Create POS model dir in training script
2015-09-08 15:36:23 +02:00
Matthew Honnibal
5ad4527c42
* Rename Deutsch to German
2015-09-06 20:18:58 +02:00
Matthew Honnibal
d1eea2d865
* Update train.py for language-generic spaCy
2015-09-06 17:51:48 +02:00
Matthew Honnibal
950ce36660
* Update init model
2015-09-06 17:51:30 +02:00
Matthew Honnibal
b6b1e1aa12
* Add link for Finnish model
2015-08-27 10:26:02 +02:00
Matthew Honnibal
320ced276a
* Add tagger training script
2015-08-27 09:15:41 +02:00
Matthew Honnibal
dc13edd7cb
* Refactor init_model to accommodate other languages
2015-08-26 19:14:05 +02:00
Matthew Honnibal
bbf07ac253
* Cut down init_model to work on more languages
2015-08-24 01:05:20 +02:00
Matthew Honnibal
3ecacb9635
* Copy gazetteer file in init_model
2015-08-06 16:07:23 +02:00
Matthew Honnibal
ddc1a5cfe5
* Fix training under python3
2015-07-28 14:09:30 +02:00
Matthew Honnibal
174ed1ad20
* Tighten the frequency filter in init_model
2015-07-27 21:44:51 +02:00
Matthew Honnibal
6047f2aa35
* Fix path to freqs.txt
2015-07-27 02:22:35 +02:00
Matthew Honnibal
0368889d6c
* Support gzipped frequencies in init_model
2015-07-26 22:39:22 +02:00
Matthew Honnibal
c4f20847da
* Fix init_model for travis tests
2015-07-26 14:03:30 +02:00
Matthew Honnibal
09312b9353
* Fix init_model for travis tests
2015-07-26 13:55:47 +02:00
Matthew Honnibal
90ad717dc4
* Update default freq thresholds in init_model
2015-07-26 01:41:17 +02:00
Matthew Honnibal
6a5e035a48
* Ensure data files are copied for tokenizer in init_model
2015-07-26 01:36:19 +02:00
Matthew Honnibal
ab93898ac6
* Make heuristics more explicit in init_model
2015-07-26 00:22:19 +02:00
Matthew Honnibal
5c04dcd7c1
* Fix init_model
2015-07-25 23:33:02 +02:00
Matthew Honnibal
fd525f0675
* Pass OOV probability around
2015-07-25 23:29:51 +02:00
Matthew Honnibal
5b6bf4d4a6
* Remove probability cap on lexicon
2015-07-25 23:05:51 +02:00
Matthew Honnibal
c62eb110c0
* Fix merge conflict in init_model
2015-07-25 23:04:30 +02:00
Matthew Honnibal
0301472d15
* Fix init_model
2015-07-25 22:56:35 +02:00
Matthew Honnibal
8e800adfbc
* Fix init_model
2015-07-25 22:54:08 +02:00
Matthew Honnibal
5f183098e4
Merge branch 'master' of ssh://github.com/honnibal/spaCy
2015-07-25 22:37:04 +02:00
Matthew Honnibal
6076213c16
* Fix init_model script
2015-07-25 22:35:52 +02:00
Matthew Honnibal
1a99eb69da
Merge branch 'master' of https://github.com/honnibal/spaCy
2015-07-25 22:19:48 +02:00
Matthew Honnibal
ef448649b3
* Add read_freqs function in init_model
2015-07-25 22:16:36 +02:00
Matthew Honnibal
2e6a60eaec
Merge branch 'master' of https://github.com/honnibal/spaCy
2015-07-25 21:14:07 +02:00
Matthew Honnibal
105305b4aa
* Update get_freqs script
2015-07-25 21:13:41 +02:00
Matthew Honnibal
616445e027
* Add simple script to collate frequencies from sorted file
2015-07-25 21:12:45 +02:00
Matthew Honnibal
c52179f5fa
* Use print function in train.py, for py 2/3 compatibility
2015-07-24 04:52:35 +02:00
Matthew Honnibal
6be3ee311c
Py3 compatibility tweak
2015-07-23 13:13:15 +02:00
Matthew Honnibal
d4407d8e2f
Py3 compatibility tweak
2015-07-23 09:45:15 +02:00
Matthew Honnibal
da4821fc14
* Add cluster words to probs in init_model
2015-07-23 09:27:07 +02:00
Matthew Honnibal
4af2595d99
* Fix structure of wordnet directory for init_model
2015-07-23 06:35:38 +02:00
Matthew Honnibal
83c0f0da22
* Remove lemmatizer from init_model
2015-07-23 02:32:34 +02:00
Matthew Honnibal
4729200dfc
* Whitespace
2015-07-23 01:19:26 +02:00
Matthew Honnibal
2b7bd46508
* Update get_freqs script
2015-07-22 15:43:06 +02:00
Matthew Honnibal
386246db5b
* Update init_model, making language resources optional
2015-07-22 00:25:14 +02:00
Matthew Honnibal
317cbbc015
* Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time.
2015-07-19 15:18:17 +02:00
Matthew Honnibal
a6ff7e6ca4
* Fix redundant options in train.py
2015-07-17 22:38:05 +02:00
Matthew Honnibal
6cfa83157e
Merge branch 'refactor' of ssh://github.com/honnibal/spaCy into refactor
2015-07-17 21:38:04 +02:00
Matthew Honnibal
38ca0c33f5
Merge branch 'neuralnet' into refactor
Mostly refactors parser, to use new thinc3.2 Example class.
Aim is to remove use of shared memory, so that we can parallelize
over documents easily.
Conflicts:
setup.py
spacy/syntax/parser.pxd
spacy/syntax/parser.pyx
spacy/syntax/stateclass.pyx
2015-07-14 14:13:47 +02:00
Matthew Honnibal
af54d05d60
* Remove sense stuff from init_model
2015-07-14 10:56:17 +02:00
Matthew Honnibal
3de1b3ef1d
* Change get_freqs to take a list of files
2015-07-14 10:55:56 +02:00
Matthew Honnibal
39c93116eb
* Add get_freqs script
2015-07-14 02:31:32 +02:00