Paul O'Leary McCann
53e17296e9
Fix pronoun handling
Missed this case earlier.
連体詞 (adnominals) fall into three classes for UD purposes:
- その -> DET
- それ -> PRON
- 同じ -> ADJ
-POLM
2017-08-22 00:01:49 +09:00
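The three-way split described in 53e17296e9 only needs the literal token, so it can be sketched as a plain lookup. This is a minimal illustration, not the code from the commit: the table name `ADNOMINAL_TO_UD` and the helper `resolve_adnominal` are hypothetical, and only the three forms named in the message are listed.

```python
# Hypothetical lookup table: surface form -> UD part-of-speech tag.
# Only the three examples from the commit message are included.
ADNOMINAL_TO_UD = {
    "その": "DET",
    "それ": "PRON",
    "同じ": "ADJ",
}


def resolve_adnominal(surface, default=None):
    """Return the UD tag for a listed surface form, or `default` if unlisted."""
    return ADNOMINAL_TO_UD.get(surface, default)


for word in ("その", "それ", "同じ"):
    print(word, resolve_adnominal(word))
```

Returning `default` for unlisted forms leaves the fallback policy to the caller, since the commit message does not say what unlisted forms should map to.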
Paul O'Leary McCann
c435f748d7
Put Mecab import in utility function
2017-08-22 00:01:28 +09:00
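Moving an optional import into a helper means the dependency is only needed when the feature is actually used. A rough sketch of that pattern, assuming a hypothetical generic helper `import_optional` (the actual function in c435f748d7 is not shown here and may look different):

```python
import importlib


def import_optional(module_name, install_hint):
    """Import a module on demand, with a clearer error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            "{} is required for this feature: {}".format(module_name, install_hint)
        ) from err


# Demonstrated with a standard-library module so the sketch runs anywhere.
json_module = import_optional("json", "part of the standard library")
print(json_module.dumps({"ok": True}))
```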
ines
dcff10abe9
Add regression test for #1281
2017-08-21 16:11:47 +02:00
ines
edc596d9a7
Add missing tokenizer exceptions (resolves #1281)
2017-08-21 16:11:36 +02:00
ines
c5c3f4c7d9
Use more generous .env ignore rule
2017-08-21 16:08:40 +02:00
Paul O'Leary McCann
234a8a7591
Change default tag for 動詞,非自立可能
An example of this is いる in these sentences:
彼はそこにいる。 # "He is there." Should be VERB
彼は底に立っている。 # "He is standing at the bottom." Should be AUX
Unclear which case is more numerous - need to check a large corpus - but
in keeping with the other ambiguous tags, this is mapped to the
"dominant" or first part of the tag. -POLM
2017-08-21 00:21:45 +09:00
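The "first part of the tag" default from 234a8a7591 can be illustrated by splitting the raw tag on its comma and mapping only the leading field. This is a sketch under assumptions: `FIRST_FIELD_TO_UD` and `default_ud_tag` are hypothetical names, and the table holds just a few illustrative entries.

```python
# Hypothetical mapping from the first field of a raw tag to a UD tag.
FIRST_FIELD_TO_UD = {
    "動詞": "VERB",
    "助動詞": "AUX",
    "名詞": "NOUN",
}


def default_ud_tag(raw_tag):
    """Map a comma-separated raw tag to a UD tag using only its first field."""
    first_field = raw_tag.split(",")[0]
    return FIRST_FIELD_TO_UD.get(first_field)


print(default_ud_tag("動詞,非自立可能"))  # VERB
```

With this default, いる is tagged VERB in both example sentences, so the AUX reading still needs context beyond the tag itself.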
Ines Montani
dca026124f
Merge pull request #1262 from kevinmarsh/patch-1
Fix broken tutorial link on website
2017-08-16 09:58:07 +02:00
Kevin Marsh
e3738aba0d
Fix broken tutorial link on website
2017-08-15 21:50:09 +01:00
Ines Montani
a9465271a7
Merge pull request #1245 from delirious-lettuce/fix_typos
Fix typos
2017-08-07 23:11:20 +02:00
Paul O'Leary McCann
6e9e686568
Sample implementation of Japanese Tagger (ref #1214)
This is far from complete but it should be enough to check some things.
1. Mecab transition. Janome doesn't support Unidic, only IPAdic, but UD
tag mappings are based on Unidic. This replaces Janome with Mecab to
get around that.
2. Raw tag extension. A simple tag map can't meet the specifications for
UD tag mappings, so this adds an extra field to ambiguous cases. For
this demo it just deals with the simplest case, which only needs to look
at the literal token. (In reality it may be necessary to look at the
whole sentence, but that's another issue.)
3. General code structure. Seems nobody else has implemented a custom
Tagger yet, so still not sure this is the correct way to pass the
vocabulary around, for example.
Any feedback would be greatly appreciated. -POLM
2017-08-08 01:27:15 +09:00
Delirious Lettuce
d3b03f0544
Fix typos:
* `auxillary` -> `auxiliary`
* `consistute` -> `constitute`
* `earlist` -> `earliest`
* `prefered` -> `preferred`
* `direcory` -> `directory`
* `reuseable` -> `reusable`
* `idiosyncracies` -> `idiosyncrasies`
* `enviroment` -> `environment`
* `unecessary` -> `unnecessary`
* `yesteday` -> `yesterday`
* `resouces` -> `resources`
2017-08-06 21:31:39 -06:00
Matthew Honnibal
b7b121103f
Merge pull request #1244 from gideonite/patch-1
improve pipe, tee, izip explanation
2017-08-06 14:34:07 +02:00
Gideon Dresdner
7e98a3613c
improve pipe, tee, izip explanation
Use an example from an old issue: https://github.com/explosion/spaCy/issues/172#issuecomment-183963403
2017-08-06 13:21:45 +02:00
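The tee/izip idea that 7e98a3613c documents can be sketched as follows: duplicate one stream of records with `itertools.tee`, feed the texts through a streaming pipeline, and zip the results back together with their ids. The `pipe` function here is a stand-in for a streaming call such as `nlp.pipe`, and `stream_records` is a hypothetical data source. On Python 3 the built-in `zip` is already lazy, so it plays the role of `izip`.

```python
import itertools


def pipe(texts):
    """Stand-in for a streaming pipeline: yields one result per input text."""
    for text in texts:
        yield text.upper()


def stream_records():
    """Hypothetical source of (id, text) records."""
    yield (1, "first text")
    yield (2, "second text")


# tee gives two independent iterators over the same underlying stream.
records_a, records_b = itertools.tee(stream_records())
ids = (record_id for record_id, _ in records_a)
texts = (text for _, text in records_b)

# zip pairs each id with its processed text, in order.
results = list(zip(ids, pipe(texts)))
print(results)
```

Because `tee` buffers items that one iterator has consumed and the other has not, this works best when both branches advance at roughly the same pace, as they do under `zip`.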
ines
864cefd3b2
Update README.rst
2017-07-22 18:29:55 +02:00
ines
e349271506
Increment version
2017-07-22 18:29:30 +02:00
Ines Montani
570964e67f
Update README.rst
2017-07-22 16:20:19 +02:00
Matthew Honnibal
5494605689
Fiddle with regex pin
2017-07-22 16:09:50 +02:00
Matthew Honnibal
78fcf56dd5
Update version pin for regex library
2017-07-22 15:57:58 +02:00
Matthew Honnibal
d51d55bba6
Increment version
2017-07-22 15:43:16 +02:00
Matthew Honnibal
8ccf154413
Merge branch 'master' of https://github.com/explosion/spaCy
2017-07-22 15:42:44 +02:00
Matthew Honnibal
796b2f4c1b
Remove print statements in tests
2017-07-22 15:42:38 +02:00
ines
7c4bf9994d
Add note on requirements and preventing model re-downloads (closes #1143)
2017-07-22 15:40:12 +02:00
ines
de25bad036
Use lower min version for requests dependency (fixes #1137)
Ensure compatibility with docker-compose and other packages
2017-07-22 15:29:10 +02:00
ines
d7560047c5
Fix version
2017-07-22 15:24:33 +02:00
Matthew Honnibal
af945ea8e2
Merge branch 'master' of https://github.com/explosion/spaCy
2017-07-22 15:09:59 +02:00
Matthew Honnibal
4b2e5e59ed
Add flush_cache method to tokenizer, to fix #1061
The tokenizer caches output for common chunks, for efficiency. This
cache must be invalidated when the tokenizer rules change, e.g. when a new
special-case rule is introduced. That's what was causing #1061.
When the cache is flushed, we free the intermediate token chunks.
I *think* this is safe --- but if we start getting segfaults, this patch
is to blame. The resolution would be to simply not free those bits of
memory. They'll be freed when the tokenizer exits anyway.
2017-07-22 15:06:50 +02:00
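The cache-invalidation problem behind 4b2e5e59ed can be shown with a toy tokenizer in pure Python. This is only an illustration of the idea: the class `CachingTokenizer` and its methods are hypothetical and far simpler than the real tokenizer, and the memory-freeing concern from the commit message does not arise in Python.

```python
class CachingTokenizer:
    """Toy whitespace tokenizer that caches the output for each chunk."""

    def __init__(self):
        self.special_cases = {}
        self._cache = {}

    def add_special_case(self, chunk, tokens):
        """Register a rule, then drop cached output that may now be stale."""
        self.special_cases[chunk] = list(tokens)
        self.flush_cache()

    def flush_cache(self):
        """Forget all cached chunk tokenizations."""
        self._cache.clear()

    def _tokenize_chunk(self, chunk):
        if chunk in self.special_cases:
            return list(self.special_cases[chunk])
        return [chunk]

    def __call__(self, text):
        tokens = []
        for chunk in text.split():
            if chunk not in self._cache:
                self._cache[chunk] = self._tokenize_chunk(chunk)
            tokens.extend(self._cache[chunk])
        return tokens


tokenizer = CachingTokenizer()
before = tokenizer("I dont know")
tokenizer.add_special_case("dont", ["do", "nt"])
after = tokenizer("I dont know")
print(before, after)
```

Without the `flush_cache()` call in `add_special_case`, the second call would return the stale cached result for "dont" and ignore the new rule.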
Ines Montani
96df9c7154
Update CONTRIBUTORS.md
2017-07-22 15:05:46 +02:00
ines
b22b18a019
Add notes on spacy.explain() to annotation docs
2017-07-22 15:02:15 +02:00
ines
e3f23f9d91
Use latest available version in examples
2017-07-22 14:57:51 +02:00
Matthew Honnibal
23a55b40ca
Default to English noun chunks iterator if no lang set
2017-07-22 14:15:25 +02:00
Matthew Honnibal
9750a0128c
Fix Span.noun_chunks. Closes #1207
2017-07-22 14:14:57 +02:00
Matthew Honnibal
d9b85675d7
Rename regression test
2017-07-22 14:14:35 +02:00
Matthew Honnibal
dfbc7e49de
Add test for Issue #1207
2017-07-22 14:14:01 +02:00
Matthew Honnibal
0ae3807d7d
Fix gaps in Lexeme API. Closes #1031
2017-07-22 13:53:48 +02:00
Matthew Honnibal
83e1b5f1e3
Merge branch 'master' of https://github.com/explosion/spaCy
2017-07-22 13:45:35 +02:00
Matthew Honnibal
45f6961ae0
Add __version__ symbol in __init__.py
2017-07-22 13:45:21 +02:00
Matthew Honnibal
8b9c4c5e1c
Add missing SP symbol to tag map, re #1052
2017-07-22 13:44:17 +02:00
Ines Montani
69396dcfd3
Update CONTRIBUTORS.md
2017-07-22 13:43:15 +02:00
Ines Montani
9af04ea11f
Merge pull request #1161 from AlexisEidelman/patch-1
French NUM_WORDS and ORDINAL_WORDS
2017-07-22 13:40:46 +02:00
Matthew Honnibal
8b581fdac5
Remove unused example
2017-07-22 13:36:54 +02:00
Matthew Honnibal
44dd247e73
Merge branch 'master' of https://github.com/explosion/spaCy
2017-07-22 13:35:30 +02:00
Matthew Honnibal
94267ec50f
Fix merge conflict in printer
2017-07-22 13:35:15 +02:00
Ines Montani
c7708dc736
Merge pull request #1177 from swierh/master
Dutch NUM_WORDS and ORDINAL_WORDS
2017-07-22 13:35:08 +02:00
Matthew Honnibal
5916d46ba8
Avoid use of deepcopy in printer
2017-07-22 13:34:01 +02:00
Matthew Honnibal
a405660068
Add commit to tagger example
2017-07-22 13:32:48 +02:00
Matthew Honnibal
3fef5f642b
Rename tagger training example
2017-07-22 13:29:15 +02:00
Matthew Honnibal
8bb443be4f
Add standalone tagger training example
2017-07-22 13:28:51 +02:00
Ines Montani
7c66691790
Merge pull request #1197 from jsparedes/patch-1
Fix broken URL
2017-07-21 14:05:26 +02:00
Jorge Paredes
fadacd0d47
Fix broken URL
The related URL for **custom named entities** was broken.
2017-07-16 10:06:32 -05:00
Ines Montani
2d22b63e09
Merge pull request #1186 from lgenerknol/master
.../cli/#foo is 404
2017-07-13 17:33:55 +02:00