Jim Geovedi
f6f15678fb
added lex_attrs
2017-07-23 22:55:22 +07:00
Jim Geovedi
bed8162d00
added tokenizer_exceptions
2017-07-23 22:55:05 +07:00
Jim Geovedi
b80c35bc9a
added norm_exceptions
2017-07-23 22:54:49 +07:00
Jim Geovedi
b5de329ea3
added norm_exceptions
2017-07-23 22:54:19 +07:00
Jim Geovedi
082e9ade46
fixed typo
2017-07-23 21:30:34 +07:00
Jim Geovedi
e2efeb186e
added stopwords
2017-07-23 20:52:37 +07:00
Jim Geovedi
da98676839
use template
2017-07-23 20:51:31 +07:00
Jim Geovedi
c2b4dd7809
start working on Indonesian language
2017-07-23 20:50:56 +07:00
Matthew Honnibal
5771bd1ff8
Increment version
2017-07-23 14:18:38 +02:00
Matthew Honnibal
c4a81a47a4
Fix deserialization
2017-07-23 14:11:07 +02:00
Matthew Honnibal
2df563ad24
Remove optimization for textcat that caused loading problem
2017-07-23 14:10:51 +02:00
Matthew Honnibal
4fe77bced2
Add cfg attr to pipeline components
2017-07-23 00:52:47 +02:00
Matthew Honnibal
d8aa721664
Compute Language.meta with a property
2017-07-23 00:50:18 +02:00
Matthew Honnibal
54a539a113
Finish text classifier example
2017-07-23 00:34:12 +02:00
Matthew Honnibal
a88a7deffe
Fix save/load of textcat config
2017-07-23 00:33:43 +02:00
Matthew Honnibal
c27fdaef6f
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-07-22 20:15:55 +02:00
Matthew Honnibal
2bc7d87c70
Add example for training text classifier
2017-07-22 20:15:32 +02:00
Matthew Honnibal
9bae0ddc50
Fix minibatching
2017-07-22 20:14:49 +02:00
Matthew Honnibal
ded0df5e2f
Expose hyper-param as keyword arg
2017-07-22 20:14:37 +02:00
Matthew Honnibal
f5de8deeec
Increment version
2017-07-22 20:04:53 +02:00
Matthew Honnibal
b55714d5d1
Make gold_tuples arg optional in begin_training
2017-07-22 20:04:43 +02:00
Matthew Honnibal
ed6c85fa3c
Fix loading of text categories in GoldParse
2017-07-22 20:04:03 +02:00
Matthew Honnibal
6ffec9dfea
Update _ml, for textcat model
2017-07-22 20:03:40 +02:00
ines
864cefd3b2
Update README.rst
2017-07-22 18:29:55 +02:00
ines
e349271506
Increment version
2017-07-22 18:29:30 +02:00
ines
ab8ffbaab7
Add text classification to v2 overview
2017-07-22 17:56:51 +02:00
ines
f085b88f9d
Add TextCategorizer API docs stub
2017-07-22 17:56:33 +02:00
ines
ab1a4e8b3c
Add Tensorizer API docs stub
2017-07-22 17:56:25 +02:00
ines
0fb89dd204
Add text classification usage guide template
2017-07-22 17:56:07 +02:00
ines
d05ab1b3a0
Add text classification to 101 overview and change order
2017-07-22 17:55:53 +02:00
ines
d2a7e5b8e5
Add GoldParse.cats attribute
2017-07-22 17:55:35 +02:00
ines
23d976ed00
Add Doc.cats attribute and missing v2 tag
2017-07-22 17:55:14 +02:00
Ines Montani
570964e67f
Update README.rst
2017-07-22 16:20:19 +02:00
Matthew Honnibal
5494605689
Fiddle with regex pin
2017-07-22 16:09:50 +02:00
Matthew Honnibal
78fcf56dd5
Update version pin for regex library
2017-07-22 15:57:58 +02:00
Matthew Honnibal
d51d55bba6
Increment version
2017-07-22 15:43:16 +02:00
Matthew Honnibal
8ccf154413
Merge branch 'master' of https://github.com/explosion/spaCy
2017-07-22 15:42:44 +02:00
Matthew Honnibal
796b2f4c1b
Remove print statements in tests
2017-07-22 15:42:38 +02:00
ines
7c4bf9994d
Add note on requirements and preventing model re-downloads (closes #1143)
2017-07-22 15:40:12 +02:00
ines
de25bad036
Use lower min version for requests dependency (fixes #1137)
Ensure compatibility with docker-compose and other packages
2017-07-22 15:29:10 +02:00
ines
d7560047c5
Fix version
2017-07-22 15:24:33 +02:00
Matthew Honnibal
af945ea8e2
Merge branch 'master' of https://github.com/explosion/spaCy
2017-07-22 15:09:59 +02:00
Matthew Honnibal
4b2e5e59ed
Add flush_cache method to tokenizer, to fix #1061
The tokenizer caches output for common chunks, for efficiency. This
cache must be invalidated when the tokenizer rules change, e.g. when a new
special-case rule is introduced. That's what was causing #1061.
When the cache is flushed, we free the intermediate token chunks.
I *think* this is safe --- but if we start getting segfaults, this patch
is to blame. The resolution would be to simply not free those bits of
memory. They'll be freed when the tokenizer exits anyway.
2017-07-22 15:06:50 +02:00
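The caching problem described in the commit above can be illustrated with a small sketch. This is not spaCy's tokenizer code: the class CachingTokenizer and its methods are hypothetical names for a toy whitespace tokenizer, assumed here only to show why cached chunk output goes stale when a special-case rule is added, and how flushing the cache on a rule change avoids that.

```python
class CachingTokenizer:
    """Toy whitespace tokenizer that caches per-chunk output.

    Hypothetical illustration only; it does not mirror spaCy's internals.
    """

    def __init__(self):
        self._special_cases = {}  # chunk -> list of tokens
        self._cache = {}          # chunk -> previously computed tokens

    def add_special_case(self, chunk, tokens):
        self._special_cases[chunk] = list(tokens)
        # The rules changed, so cached output may now be stale.
        self.flush_cache()

    def flush_cache(self):
        # Drop every cached chunk; later calls recompute them.
        self._cache.clear()

    def _split_chunk(self, chunk):
        # Apply a special-case rule if one exists, else keep the chunk whole.
        if chunk in self._special_cases:
            return list(self._special_cases[chunk])
        return [chunk]

    def __call__(self, text):
        tokens = []
        for chunk in text.split():
            if chunk not in self._cache:
                self._cache[chunk] = self._split_chunk(chunk)
            tokens.extend(self._cache[chunk])
        return tokens


tokenizer = CachingTokenizer()
before = tokenizer("I dont know")                 # "dont" is now cached as one token
tokenizer.add_special_case("dont", ["do", "nt"])  # rule change flushes the cache
after = tokenizer("I dont know")                  # "dont" is recomputed with the new rule
```

Without the flush_cache() call inside add_special_case, the second call would return the cached single token for "dont" and silently ignore the new rule.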
Ines Montani
96df9c7154
Update CONTRIBUTORS.md
2017-07-22 15:05:46 +02:00
ines
b22b18a019
Add notes on spacy.explain() to annotation docs
2017-07-22 15:02:15 +02:00
Ines Montani
1ddbeddca2
Fix typo
2017-07-22 15:00:58 +02:00
ines
e3f23f9d91
Use latest available version in examples
2017-07-22 14:57:51 +02:00
Matthew Honnibal
23a55b40ca
Default to English noun chunks iterator if no lang set
2017-07-22 14:15:25 +02:00
Matthew Honnibal
9750a0128c
Fix Span.noun_chunks. Closes #1207
2017-07-22 14:14:57 +02:00
Matthew Honnibal
d9b85675d7
Rename regression test
2017-07-22 14:14:35 +02:00