Matthew Honnibal
4b2e5e59ed
Add flush_cache method to tokenizer, to fix #1061
...
The tokenizer caches output for common chunks, for efficiency. This
cache must be invalidated when the tokenizer rules change, e.g. when a new
special-case rule is introduced. That's what was causing #1061.
When the cache is flushed, we free the intermediate token chunks.
I *think* this is safe --- but if we start getting segfaults, this patch
is to blame. The resolution would be to simply not free those bits of
memory. They'll be freed when the tokenizer exits anyway.
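
The caching problem described above can be illustrated with a minimal sketch. This is a toy whitespace tokenizer, not spaCy's Cython implementation: the class `CachingTokenizer`, its `add_special_case` method, and its internals are hypothetical names chosen for illustration, and only the name `flush_cache` comes from the commit title. It shows why a per-chunk cache has to be cleared whenever a rule is added.

```python
class CachingTokenizer:
    """Toy tokenizer with a per-chunk cache (hypothetical, not spaCy's code)."""

    def __init__(self):
        self._special_cases = {}  # chunk -> list of tokens
        self._cache = {}          # chunk -> previously computed tokens

    def add_special_case(self, chunk, tokens):
        # Changing the rules makes cached output potentially stale,
        # so the cache is flushed right away.
        self._special_cases[chunk] = list(tokens)
        self.flush_cache()

    def flush_cache(self):
        # Drop all cached chunk analyses.
        self._cache.clear()

    def _tokenize_chunk(self, chunk):
        # Special cases win; otherwise the chunk is a single token.
        if chunk in self._special_cases:
            return list(self._special_cases[chunk])
        return [chunk]

    def __call__(self, text):
        tokens = []
        for chunk in text.split():
            if chunk not in self._cache:
                self._cache[chunk] = self._tokenize_chunk(chunk)
            tokens.extend(self._cache[chunk])
        return tokens


tokenizer = CachingTokenizer()
before = tokenizer("I dont know")                 # caches "dont" as one token
tokenizer.add_special_case("dont", ["do", "nt"])  # rule change flushes the cache
after = tokenizer("I dont know")                  # new rule is applied
```

Without the `flush_cache()` call in `add_special_case`, the second call would keep returning the stale cached entry for "dont", which is the kind of behavior the commit message attributes to #1061.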
2017-07-22 15:06:50 +02:00
Ines Montani
96df9c7154
Update CONTRIBUTORS.md
2017-07-22 15:05:46 +02:00
ines
b22b18a019
Add notes on spacy.explain() to annotation docs
2017-07-22 15:02:15 +02:00
Ines Montani
1ddbeddca2
Fix typo
2017-07-22 15:00:58 +02:00
ines
e3f23f9d91
Use latest available version in examples
2017-07-22 14:57:51 +02:00
Matthew Honnibal
23a55b40ca
Default to English noun chunks iterator if no lang set
2017-07-22 14:15:25 +02:00
Matthew Honnibal
9750a0128c
Fix Span.noun_chunks. Closes #1207
2017-07-22 14:14:57 +02:00
Matthew Honnibal
d9b85675d7
Rename regression test
2017-07-22 14:14:35 +02:00
Matthew Honnibal
dfbc7e49de
Add test for Issue #1207
2017-07-22 14:14:01 +02:00
Matthew Honnibal
0ae3807d7d
Fix gaps in Lexeme API. Closes #1031
2017-07-22 13:53:48 +02:00
Matthew Honnibal
83e1b5f1e3
Merge branch 'master' of https://github.com/explosion/spaCy
2017-07-22 13:45:35 +02:00
Matthew Honnibal
45f6961ae0
Add __version__ symbol in __init__.py
2017-07-22 13:45:21 +02:00
Matthew Honnibal
8b9c4c5e1c
Add missing SP symbol to tag map, re #1052
2017-07-22 13:44:17 +02:00
Ines Montani
69396dcfd3
Update CONTRIBUTORS.md
2017-07-22 13:43:15 +02:00
Ines Montani
9af04ea11f
Merge pull request #1161 from AlexisEidelman/patch-1
...
French NUM_WORDS and ORDINAL_WORDS
2017-07-22 13:40:46 +02:00
Matthew Honnibal
8b581fdac5
Remove unused example
2017-07-22 13:36:54 +02:00
Matthew Honnibal
44dd247e73
Merge branch 'master' of https://github.com/explosion/spaCy
2017-07-22 13:35:30 +02:00
Matthew Honnibal
94267ec50f
Fix merge conflict in printer
2017-07-22 13:35:15 +02:00
Ines Montani
c7708dc736
Merge pull request #1177 from swierh/master
...
Dutch NUM_WORDS and ORDINAL_WORDS
2017-07-22 13:35:08 +02:00
Matthew Honnibal
5916d46ba8
Avoid use of deepcopy in printer
2017-07-22 13:34:01 +02:00
Matthew Honnibal
a405660068
Add commit to tagger example
2017-07-22 13:32:48 +02:00
Matthew Honnibal
3fef5f642b
Rename tagger training example
2017-07-22 13:29:15 +02:00
Matthew Honnibal
8bb443be4f
Add standalone tagger training example
2017-07-22 13:28:51 +02:00
Matthew Honnibal
d6a5c2c85a
Add test for NER
2017-07-22 01:48:58 +02:00
Matthew Honnibal
28244df4da
Add test for beam parsing
2017-07-22 01:48:35 +02:00
Matthew Honnibal
c86445bdfd
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-07-22 01:14:28 +02:00
Matthew Honnibal
b3a749610e
Fix name of TextCategorizer
2017-07-22 01:14:07 +02:00
Matthew Honnibal
2424493970
Remove unnecessary import of Mock
2017-07-22 01:13:54 +02:00
Matthew Honnibal
baa3d81c35
Add text categorizer to Language
2017-07-22 01:13:36 +02:00
Matthew Honnibal
a6a2159969
Add slot for text categories to Doc
2017-07-22 00:34:15 +02:00
Matthew Honnibal
374ab3ecfb
Increment alpha version
2017-07-22 00:32:49 +02:00
Ines Montani
7c66691790
Merge pull request #1197 from jsparedes/patch-1
...
Fix broken URL
2017-07-21 14:05:26 +02:00
Matthew Honnibal
289f23df51
Test beam parsing
2017-07-20 15:03:10 +02:00
Matthew Honnibal
3da1063b36
Add beam decoding to parser, to allow NER uncertainties
2017-07-20 15:02:55 +02:00
Matthew Honnibal
0ca5832427
Improve negative example handling in NER oracle
2017-07-20 00:18:49 +02:00
Matthew Honnibal
a231b56d40
Add text-classification hook to pipeline
2017-07-20 00:18:15 +02:00
Matthew Honnibal
7ea50182a5
Add support for text-classification labels to GoldParse
2017-07-20 00:17:47 +02:00
Matthew Honnibal
727481377e
Add text-classifier thinc models
2017-07-20 00:17:17 +02:00
Matthew Honnibal
f014138c11
Fix parser tests
2017-07-20 00:16:52 +02:00
Jorge Paredes
fadacd0d47
Fix broken URL
...
The URL for **custom named entities** was broken
2017-07-16 10:06:32 -05:00
Ines Montani
2d22b63e09
Merge pull request #1186 from lgenerknol/master
...
.../cli/#foo is 404
2017-07-13 17:33:55 +02:00
lgenerknol
2b219caf0d
.../cli/#foo is 404
...
https://spacy.io/docs/usage/cli/#package is a 404.
Changed to https://spacy.io/docs/usage/cli#package
A larger fix to deal with trailing slashes is definitely possible.
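
As a rough sketch of what a more general fix for this class of link might look like, the snippet below strips a trailing slash from the path when a fragment follows it. The helper name `strip_slash_before_fragment` is hypothetical and is not part of the spaCy website code; it only reproduces the manual change described in this commit, using Python's standard `urllib.parse`.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_slash_before_fragment(url):
    """Turn '.../cli/#package' into '.../cli#package' (hypothetical helper)."""
    parts = urlsplit(url)
    # Only touch URLs that have a fragment and a non-root path ending in "/".
    if parts.fragment and parts.path.endswith("/") and parts.path != "/":
        parts = parts._replace(path=parts.path.rstrip("/"))
    return urlunsplit(parts)


fixed = strip_slash_before_fragment("https://spacy.io/docs/usage/cli/#package")
untouched = strip_slash_before_fragment("https://spacy.io/docs/usage/cli/")
```

URLs without a fragment are left unchanged, since the 404 reported here only concerns the slash directly before the `#`.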
2017-07-12 13:12:24 -04:00
Ines Montani
d79fa8743a
Merge pull request #1185 from lgenerknol/master
...
Missing markup char
2017-07-12 17:27:42 +02:00
lgenerknol
6cf2690943
Missing markup char
...
Frontend displayed:
```
If start_idx and do not mark[...]
```
Note the missing "end_idx" after 'and'.
2017-07-12 11:06:16 -04:00
Ines Montani
9eca6503c1
Merge pull request #1157 from polm/master
...
Add basic Japanese Tokenizer Test
2017-07-10 13:07:11 +02:00
Paul O'Leary McCann
bc87b815cc
Add comment clarifying what LANGUAGES does
2017-07-09 16:28:55 +09:00
Paul O'Leary McCann
04e6a65188
Remove Japanese from LANGUAGES
...
LANGUAGES is a list of languages whose tokenizers get run through a
variety of generic tests. Since the generic tests don't check the JA
fixture, it blows up when it can't find janome. -POLM
2017-07-09 16:23:26 +09:00
Ines Montani
2b9411bb54
Merge pull request #1181 from val314159/patch-1
...
make this work in python2.7
2017-07-08 00:15:47 +02:00
val314159
19d4706f69
make this work in python2.7
2017-07-07 13:18:17 -07:00
Swier
29720150f9
fix import of stop words in language data
2017-07-05 14:08:04 +02:00