Matthew Honnibal | ea23b64cc8 | Refactor training, with new spacy.train module. Defaults still a little awkward. | 2016-10-09 12:24:24 +02:00
Matthew Honnibal | 7db956133e | Move tokenizer data for German into spacy.de.language_data | 2016-09-25 15:37:33 +02:00
Matthew Honnibal | 95aaea0d3f | Refactor so that the tokenizer data is read from Python data, rather than from disk | 2016-09-25 14:49:53 +02:00
Matthew Honnibal | fd65cf6cbb | Finish refactoring data loading | 2016-09-24 20:26:17 +02:00
Henning Peters | 470cdf5bf9 | remove deprecated LOCAL_DATA_DIR | 2016-04-05 11:25:54 +02:00
Matthew Honnibal | 445164d5b4 | * Restore the LOCAL_DATA_DIR global in spacy/en/__init__.py, although this is now deprecated | 2016-01-19 02:54:56 +01:00
Matthew Honnibal | 187960606f | * Fix pickle problems | 2015-12-28 16:54:03 +01:00
Henning Peters | 8359bd4d93 | strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible | 2015-12-18 09:52:55 +01:00
Henning Peters | 9027cef3bc | access model via sputnik | 2015-12-07 06:01:28 +01:00
Matthew Honnibal | e13e47e9e5 | * Add English stop words | 2015-09-14 17:48:51 +10:00
Matthew Honnibal | c4d8754385 | * Specify LOCAL_DATA_DIR global in spacy.en.__init__.py | 2015-08-26 19:15:07 +02:00
Matthew Honnibal | 8083a07c3e | * Use language base class | 2015-08-25 15:37:30 +02:00
Matthew Honnibal | 6f1743692a | * Work on language-independent refactoring | 2015-08-23 20:49:18 +02:00
Matthew Honnibal | cad0cca4e3 | * Tmp | 2015-08-22 22:04:34 +02:00
Matthew Honnibal | 5737115e1e | * Work on gazetteer matching | 2015-08-06 14:33:21 +02:00
Matthew Honnibal | ddc1a5cfe5 | * Fix training under python3 | 2015-07-28 14:09:30 +02:00
Matthew Honnibal | 8e4c69ee8c | * Add is_oov property, and fix up handling of attributes | 2015-07-27 01:50:06 +02:00
Matthew Honnibal | eeaea25f0c | * Check oov_prob file is present | 2015-07-26 16:36:38 +02:00
Matthew Honnibal | 1b5d1da2a7 | * Allow an OOV probability to be specified in get_lex_props | 2015-07-26 00:03:43 +02:00
Matthew Honnibal | cd6e25132b | * Allow an OOV probability to be specified in get_lex_props | 2015-07-26 00:01:46 +02:00
Matthew Honnibal | 5b41744270 | * Check for directory presence before loading annotators | 2015-07-23 09:27:37 +02:00
Matthew Honnibal | 680bb47b55 | * Write serializer freqs to single file, vocab/serializer.json | 2015-07-23 01:15:25 +02:00
Matthew Honnibal | c86dbe4944 | * Update English.save_models for new Packer save/load stuff | 2015-07-22 13:40:23 +02:00
Matthew Honnibal | 317cbbc015 | * Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time. | 2015-07-19 15:18:17 +02:00
Matthew Honnibal | db9dfd2e23 | * Major refactor of serialization. Nearly complete now. | 2015-07-17 01:27:54 +02:00
Matthew Honnibal | 897de2d438 | * Add 'bitter' property for serializer in English class | 2015-07-16 17:47:53 +02:00
Matthew Honnibal | ff9ff6f3fa | * Ensure unseen words are given low log probability | 2015-07-12 01:31:09 +02:00
Matthew Honnibal | 6ddb2f5e45 | * Restore merge_mwe in English class | 2015-07-08 19:35:30 +02:00
Matthew Honnibal | 6859f6adac | * Restore merge_mwe in English class | 2015-07-08 19:34:55 +02:00
Matthew Honnibal | e3c53f5ecd | * Fix mention of Tokens in docstring | 2015-07-08 18:56:27 +02:00
Matthew Honnibal | bb522496dd | * Rename Tokens to Doc | 2015-07-08 18:53:00 +02:00
Matthew Honnibal | 4e4fac452b | * Refactor __init__ for simplicity. Allow parse=True, tag=True etc flags to be passed at top-level. Do not lazy-load parser. | 2015-07-08 12:35:29 +02:00
Matthew Honnibal | 1d2deb4616 | * Work on refactoring default arguments to English.__init__ | 2015-07-07 15:53:25 +02:00
Matthew Honnibal | 6788c86b2f | * Begin refactor | 2015-07-07 14:00:07 +02:00
Matthew Honnibal | 58d5ac0944 | * Add beam search capabilities to Parser. Rename GreedyParser to Parser. | 2015-06-02 00:28:02 +02:00
Matthew Honnibal | eba7b34f66 | * Add flag to disable loading of word vectors | 2015-05-25 01:02:42 +02:00
Jordan Suchow | 3a8d9b37a6 | Remove trailing whitespace | 2015-04-19 13:01:38 -07:00
Matthew Honnibal | 42617548af | * Disable merge_mwes by default | 2015-04-16 04:20:31 +02:00
Matthew Honnibal | b8d34531c4 | * Add support for units to English.__init__, by loading and applying regular expressions | 2015-04-07 04:02:32 +02:00
Matthew Honnibal | 801bf14f4f | * Clean up handling of dep_strings and ent_strings, using StringStore to encode the label names. | 2015-03-26 16:44:45 +01:00
Matthew Honnibal | f21ab2d7fb | * Fix bug in ugly ent_strings hack on English class | 2015-03-26 16:44:45 +01:00
Matthew Honnibal | 8057a95f20 | * NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring. | 2015-03-26 16:44:44 +01:00
Matthew Honnibal | 220ce8bfed | * Prepare English class for NER | 2015-03-26 16:44:44 +01:00
Matthew Honnibal | 179b7eb0a7 | * Specify parser transition system in language | 2015-03-26 16:44:43 +01:00
Matthew Honnibal | 64645a1c2f | * Improve docstring on English | 2015-02-11 15:13:20 -05:00
Matthew Honnibal | a1ed574b7b | * Fix default model path for English | 2015-01-31 16:38:27 +11:00
Matthew Honnibal | c38c62d4a3 | * Add docstring to English class | 2015-01-27 02:45:21 +11:00
Matthew Honnibal | 951d06c824 | * Silently don't parse if data is not present | 2015-01-25 14:47:38 +11:00
Matthew Honnibal | dd56e298e2 | * Ensure tagging is applied if parse=True | 2015-01-25 02:19:44 +11:00
Matthew Honnibal | 94750819cd | * Set parse=True by default --- i.e. parse unless told not to. | 2015-01-25 01:28:28 +11:00