Matthew Honnibal
|
92f750cf8b
|
* Use a gzipped frequencies file in init_model
|
2015-10-11 06:59:44 +02:00 |
|
Matthew Honnibal
|
064bd69ad0
|
* Refactor symbols, so that frequency rank can be derived from the orth id of a word.
|
2015-10-10 16:03:48 +11:00 |
|
Matthew Honnibal
|
83dccf0fd7
|
* Use io module instead of deprecated codecs module
|
2015-10-10 14:13:01 +11:00 |
|
Matthew Honnibal
|
f35632e2e5
|
* Remove SBD print statement in train, after SBD evaluation was removed from Scorer
|
2015-10-09 11:08:58 +02:00 |
|
Matthew Honnibal
|
6ea1601e93
|
* Add script to train models off the UD treebanks. Note that the UD data is restricted to research purposes only, and should only be used to train models for academic experiments.
|
2015-10-08 12:01:08 +11:00 |
|
Matthew Honnibal
|
c503654ec1
|
* Update bin/parser/train for printing output.
|
2015-10-06 10:35:22 +11:00 |
|
alvations
|
8caedba42a
|
caught more codecs.open -> io.open
|
2015-09-30 20:20:09 +02:00 |
|
alvations
|
764bdc62e7
|
caught another codecs.open
|
2015-09-30 20:16:52 +02:00 |
|
Matthew Honnibal
|
1ae55cb63a
|
* Copy tag_map.json in init_model
|
2015-09-12 05:54:02 +02:00 |
|
Matthew Honnibal
|
b2e82e55f6
|
* Create POS model dir in training script
|
2015-09-08 15:36:23 +02:00 |
|
Matthew Honnibal
|
5ad4527c42
|
* Rename Deutsch to German
|
2015-09-06 20:18:58 +02:00 |
|
Matthew Honnibal
|
d1eea2d865
|
* Update train.py for language-generic spaCy
|
2015-09-06 17:51:48 +02:00 |
|
Matthew Honnibal
|
950ce36660
|
* Update init model
|
2015-09-06 17:51:30 +02:00 |
|
Matthew Honnibal
|
b6b1e1aa12
|
* Add link for Finnish model
|
2015-08-27 10:26:02 +02:00 |
|
Matthew Honnibal
|
320ced276a
|
* Add tagger training script
|
2015-08-27 09:15:41 +02:00 |
|
Matthew Honnibal
|
dc13edd7cb
|
* Refactor init_model to accommodate other languages
|
2015-08-26 19:14:05 +02:00 |
|
Matthew Honnibal
|
bbf07ac253
|
* Cut down init_model to work on more languages
|
2015-08-24 01:05:20 +02:00 |
|
Matthew Honnibal
|
3ecacb9635
|
* Copy gazetteer file in init_model
|
2015-08-06 16:07:23 +02:00 |
|
Matthew Honnibal
|
ddc1a5cfe5
|
* Fix training under python3
|
2015-07-28 14:09:30 +02:00 |
|
Matthew Honnibal
|
174ed1ad20
|
* Tighten the frequency filter in init_model
|
2015-07-27 21:44:51 +02:00 |
|
Matthew Honnibal
|
6047f2aa35
|
* Fix path to freqs.txt
|
2015-07-27 02:22:35 +02:00 |
|
Matthew Honnibal
|
0368889d6c
|
* Support gzipped frequencies in init_model
|
2015-07-26 22:39:22 +02:00 |
|
Matthew Honnibal
|
c4f20847da
|
* Fix init_model for travis tests
|
2015-07-26 14:03:30 +02:00 |
|
Matthew Honnibal
|
09312b9353
|
* Fix init_model for travis tests
|
2015-07-26 13:55:47 +02:00 |
|
Matthew Honnibal
|
90ad717dc4
|
* Update default freq thresholds in init_model
|
2015-07-26 01:41:17 +02:00 |
|
Matthew Honnibal
|
6a5e035a48
|
* Ensure data files are copied for tokenizer in init_model
|
2015-07-26 01:36:19 +02:00 |
|
Matthew Honnibal
|
ab93898ac6
|
* Make heuristics more explicit in init_model
|
2015-07-26 00:22:19 +02:00 |
|
Matthew Honnibal
|
5c04dcd7c1
|
* Fix init_model
|
2015-07-25 23:33:02 +02:00 |
|
Matthew Honnibal
|
fd525f0675
|
* Pass OOV probability around
|
2015-07-25 23:29:51 +02:00 |
|
Matthew Honnibal
|
5b6bf4d4a6
|
* Remove probability cap on lexicon
|
2015-07-25 23:05:51 +02:00 |
|
Matthew Honnibal
|
c62eb110c0
|
* Fix merge conflict in init_model
|
2015-07-25 23:04:30 +02:00 |
|
Matthew Honnibal
|
0301472d15
|
* Fix init_model
|
2015-07-25 22:56:35 +02:00 |
|
Matthew Honnibal
|
8e800adfbc
|
* Fix init_model
|
2015-07-25 22:54:08 +02:00 |
|
Matthew Honnibal
|
5f183098e4
|
Merge branch 'master' of ssh://github.com/honnibal/spaCy
|
2015-07-25 22:37:04 +02:00 |
|
Matthew Honnibal
|
6076213c16
|
* Fix init_model script
|
2015-07-25 22:35:52 +02:00 |
|
Matthew Honnibal
|
1a99eb69da
|
Merge branch 'master' of https://github.com/honnibal/spaCy
|
2015-07-25 22:19:48 +02:00 |
|
Matthew Honnibal
|
ef448649b3
|
* Add read_freqs function in init_model
|
2015-07-25 22:16:36 +02:00 |
|
Matthew Honnibal
|
2e6a60eaec
|
Merge branch 'master' of https://github.com/honnibal/spaCy
|
2015-07-25 21:14:07 +02:00 |
|
Matthew Honnibal
|
105305b4aa
|
* Update get_freqs script
|
2015-07-25 21:13:41 +02:00 |
|
Matthew Honnibal
|
616445e027
|
* Add simple script to collate frequencies from sorted file
|
2015-07-25 21:12:45 +02:00 |
|
Matthew Honnibal
|
c52179f5fa
|
* Use print function in train.py, for py 2/3 compatibility
|
2015-07-24 04:52:35 +02:00 |
|
Matthew Honnibal
|
6be3ee311c
|
Py3 compatibility tweak
|
2015-07-23 13:13:15 +02:00 |
|
Matthew Honnibal
|
d4407d8e2f
|
Py3 compatibility tweak
|
2015-07-23 09:45:15 +02:00 |
|
Matthew Honnibal
|
da4821fc14
|
* Add cluster words to probs in init_model
|
2015-07-23 09:27:07 +02:00 |
|
Matthew Honnibal
|
4af2595d99
|
* Fix structure of wordnet directory for init_model
|
2015-07-23 06:35:38 +02:00 |
|
Matthew Honnibal
|
83c0f0da22
|
* Remove lemmatizer from init_model
|
2015-07-23 02:32:34 +02:00 |
|
Matthew Honnibal
|
4729200dfc
|
* Whitespace
|
2015-07-23 01:19:26 +02:00 |
|
Matthew Honnibal
|
2b7bd46508
|
* Update get_freqs script
|
2015-07-22 15:43:06 +02:00 |
|
Matthew Honnibal
|
386246db5b
|
* Update init_model, making language resources optional
|
2015-07-22 00:25:14 +02:00 |
|
Matthew Honnibal
|
317cbbc015
|
* Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time.
|
2015-07-19 15:18:17 +02:00 |
|