Author | Commit | Message | Date
Matthew Honnibal | 6e68b344c1 | * Train after parsing, not before. | 2015-11-12 04:43:52 +11:00
Matthew Honnibal | 4fb038a9eb | * Update conll_train.py script for spaCy v0.97 | 2015-10-31 00:53:51 +11:00
Matthew Honnibal | cfaa4bde5d | * Add train and parse scripts that use CoNLL formatted data | 2015-10-30 12:54:49 +11:00
Matthew Honnibal | 2348a08481 | * Load/dump strings with a json file, instead of the hacky strings file we were using. | 2015-10-22 21:13:03 +11:00
Matthew Honnibal | 0ce12e4548 | * Import io in get_freqs | 2015-10-19 12:56:18 +11:00
Matthew Honnibal | 17fffb4c57 | * Update get_freqs.py script | 2015-10-16 04:33:49 +11:00
Matthew Honnibal | 5ff4454177 | * Update get_freqs.py script | 2015-10-16 04:31:15 +11:00
Matthew Honnibal | a748146dd3 | * Update get_freqs.py script | 2015-10-16 04:24:50 +11:00
Matthew Honnibal | a29fd79fbc | * Update get_freqs.py script | 2015-10-16 04:24:08 +11:00
Matthew Honnibal | e08a4b46a2 | * Update get_freqs.py script | 2015-10-16 04:20:35 +11:00
Matthew Honnibal | 92f750cf8b | * Use a gzipped frequencies file in init_model | 2015-10-11 06:59:44 +02:00
Matthew Honnibal | 064bd69ad0 | * Refactor symbols, so that frequency rank can be derived from the orth id of a word. | 2015-10-10 16:03:48 +11:00
Matthew Honnibal | 83dccf0fd7 | * Use io module instead of deprecated codecs module | 2015-10-10 14:13:01 +11:00
Matthew Honnibal | f35632e2e5 | * Remove SBD print statement in train, after SBD evaluation was removed from Scorer | 2015-10-09 11:08:58 +02:00
Matthew Honnibal | 6ea1601e93 | * Add script to train models off the UD treebanks. Note that the UD data is restricted to research purposes only, and should only be used to train models for academic experiments. | 2015-10-08 12:01:08 +11:00
Matthew Honnibal | c503654ec1 | * Update bin/parser/train for printing output. | 2015-10-06 10:35:22 +11:00
alvations | 8caedba42a | caught more codecs.open -> io.open | 2015-09-30 20:20:09 +02:00
alvations | 764bdc62e7 | caught another codecs.open | 2015-09-30 20:16:52 +02:00
Matthew Honnibal | 1ae55cb63a | * Copy tag_map.json in init_model | 2015-09-12 05:54:02 +02:00
Matthew Honnibal | b2e82e55f6 | * Create POS model dir in training script | 2015-09-08 15:36:23 +02:00
Matthew Honnibal | 5ad4527c42 | * Rename Deutsch to German | 2015-09-06 20:18:58 +02:00
Matthew Honnibal | d1eea2d865 | * Update train.py for language-generic spaCy | 2015-09-06 17:51:48 +02:00
Matthew Honnibal | 950ce36660 | * Update init model | 2015-09-06 17:51:30 +02:00
Matthew Honnibal | b6b1e1aa12 | * Add link for Finnish model | 2015-08-27 10:26:02 +02:00
Matthew Honnibal | 320ced276a | * Add tagger training script | 2015-08-27 09:15:41 +02:00
Matthew Honnibal | dc13edd7cb | * Refactor init_model to accommodate other languages | 2015-08-26 19:14:05 +02:00
Matthew Honnibal | bbf07ac253 | * Cut down init_model to work on more languages | 2015-08-24 01:05:20 +02:00
Matthew Honnibal | 3ecacb9635 | * Copy gazetteer file in init_model | 2015-08-06 16:07:23 +02:00
Matthew Honnibal | ddc1a5cfe5 | * Fix training under python3 | 2015-07-28 14:09:30 +02:00
Matthew Honnibal | 174ed1ad20 | * Tighten the frequency filter in init_model | 2015-07-27 21:44:51 +02:00
Matthew Honnibal | 6047f2aa35 | * Fix path to freqs.txt | 2015-07-27 02:22:35 +02:00
Matthew Honnibal | 0368889d6c | * Support gzipped frequencies in init_model | 2015-07-26 22:39:22 +02:00
Matthew Honnibal | c4f20847da | * Fix init_model for travis tests | 2015-07-26 14:03:30 +02:00
Matthew Honnibal | 09312b9353 | * Fix init_model for travis tests | 2015-07-26 13:55:47 +02:00
Matthew Honnibal | 90ad717dc4 | * Update default freq thresholds in init_model | 2015-07-26 01:41:17 +02:00
Matthew Honnibal | 6a5e035a48 | * Ensure data files are copied for tokenizer in init_model | 2015-07-26 01:36:19 +02:00
Matthew Honnibal | ab93898ac6 | * Make heuristics more explicit in init_model | 2015-07-26 00:22:19 +02:00
Matthew Honnibal | 5c04dcd7c1 | * Fix init_model | 2015-07-25 23:33:02 +02:00
Matthew Honnibal | fd525f0675 | * Pass OOV probability around | 2015-07-25 23:29:51 +02:00
Matthew Honnibal | 5b6bf4d4a6 | * Remove probability cap on lexicon | 2015-07-25 23:05:51 +02:00
Matthew Honnibal | c62eb110c0 | * Fix merge conflict in init_model | 2015-07-25 23:04:30 +02:00
Matthew Honnibal | 0301472d15 | * Fix init_model | 2015-07-25 22:56:35 +02:00
Matthew Honnibal | 8e800adfbc | * Fix init_model | 2015-07-25 22:54:08 +02:00
Matthew Honnibal | 5f183098e4 | Merge branch 'master' of ssh://github.com/honnibal/spaCy | 2015-07-25 22:37:04 +02:00
Matthew Honnibal | 6076213c16 | * Fix init_model script | 2015-07-25 22:35:52 +02:00
Matthew Honnibal | 1a99eb69da | Merge branch 'master' of https://github.com/honnibal/spaCy | 2015-07-25 22:19:48 +02:00
Matthew Honnibal | ef448649b3 | * Add read_freqs function in init_model | 2015-07-25 22:16:36 +02:00
Matthew Honnibal | 2e6a60eaec | Merge branch 'master' of https://github.com/honnibal/spaCy | 2015-07-25 21:14:07 +02:00
Matthew Honnibal | 105305b4aa | * Upd get_freqs script | 2015-07-25 21:13:41 +02:00
Matthew Honnibal | 616445e027 | * Add simple script to collate frequencies from sorted file | 2015-07-25 21:12:45 +02:00