Matthew Honnibal
0e1adacaff
Merge pull request #1390 from mdcclv/contributor-mdcclv
...
Contributor agreement for Orion Montoya @mdcclv
2017-10-06 02:39:08 +02:00
Orion Montoya
e04e11070f
Contributor agreement for Orion Montoya @mdcclv
2017-10-05 17:45:45 -04:00
Ines Montani
e77d8886f7
Update CONTRIBUTORS.md
2017-10-05 22:22:04 +02:00
Matthew Honnibal
dea81f113d
Merge pull request #1389 from mdcclv/lemmatizer_obey_exceptions
...
Lemmatizer obey exceptions
2017-10-05 22:11:21 +02:00
Orion Montoya
b0d271809d
Unit test for lemmatizer exceptions -- copied from regression test for #1387
2017-10-05 10:49:28 -04:00
Orion Montoya
ffb50d21a0
Lemmatizer honors exceptions: Fix #1387
2017-10-05 10:49:02 -04:00
Orion Montoya
e81a608173
Regression test for lemmatizer exceptions -- demonstrate issue #1387
2017-10-05 10:47:48 -04:00
Ines Montani
678651ca98
Merge pull request #1386 from kokes/patch-1
...
Fixing links to SyntaxNet
2017-10-04 13:35:01 +02:00
Ondrej Kokes
a9362f1c73
Fixing links to SyntaxNet
2017-10-04 12:55:07 +02:00
Matthew Honnibal
eb72eae258
Merge pull request #1364 from Destygo/master
...
Fixed NER model loading bug
2017-09-29 12:29:43 +02:00
Ines Montani
58bfe30a12
Merge pull request #1362 from IamJeffG/docs/custom-tokenizer
...
Document Tokenizer(token_match) and clarify tokenizer_pseudo_code
2017-09-26 15:51:15 +02:00
Vincent Genty
259ed027af
Fixed NER model loading bug
2017-09-26 15:46:04 +02:00
Ines Montani
361211fe26
Merge pull request #1342 from wannaphongcom/master
...
Add Thai language
2017-09-26 15:40:55 +02:00
Jeffrey Gerard
b6ebedd09c
Document Tokenizer(token_match) and clarify tokenizer_pseudo_code
...
Closes #835
In the `tokenizer_pseudo_code` I put the `special_cases` kwarg
before `find_prefix` because this now matches the order the args
are used in the pseudocode, and it also matches spacy's actual code.
2017-09-25 13:13:25 -07:00
Matthew Honnibal
2f8d535f65
Merge pull request #1351 from hscspring/patch-4
...
Update punctuation.py
2017-09-24 12:16:39 +02:00
Matthew Honnibal
9177313063
Merge pull request #1352 from hscspring/patch-5
...
Update customizing-tokenizer.jade
2017-09-22 16:11:49 +02:00
Matthew Honnibal
1dbc2285b8
Merge pull request #1350 from hscspring/patch-3
...
Update word-vectors-similarities.jade
2017-09-22 16:11:05 +02:00
Yam
54855f0eee
Update customizing-tokenizer.jade
2017-09-22 12:15:48 +08:00
Yam
6f450306c3
Update customizing-tokenizer.jade
...
Update some code:
- `me` -> `-PRON`
- `TAG` -> `POS`
- `create_tokenizer` function
2017-09-22 10:53:22 +08:00
Yam
923c4c2fb2
Update punctuation.py
...
add `……`
2017-09-22 09:50:46 +08:00
Yam
425c09488d
Update word-vectors-similarities.jade
...
add
```
import spacy
nlp = spacy.load('en')
```
2017-09-22 08:56:34 +08:00
Wannaphong Phatthiyaphaibun
1abf472068
add th test
2017-09-21 12:56:58 +07:00
Matthew Honnibal
ea2732469b
Merge pull request #1340 from hscspring/patch-1
...
Update punctuation.py
2017-09-20 23:57:00 +02:00
Wannaphong Phatthiyaphaibun
39bb5690f0
update th
2017-09-21 00:36:02 +07:00
Wannaphong Phatthiyaphaibun
44291f6697
add thai
2017-09-20 23:26:34 +07:00
Yam
978b24ccd4
Update punctuation.py
...
In Chinese, `~` and `——` are hyphens, and
`·` is a separator symbol (interpunct).
2017-09-20 23:02:22 +08:00
Matthew Honnibal
aa728b33ca
Merge pull request #1333 from galaxyh/master
...
Add Chinese punctuation
2017-09-19 15:09:30 +02:00
Yu-chun Huang
188b439b25
Add Chinese punctuation
...
Add Chinese punctuation.
2017-09-19 16:58:42 +08:00
Yu-chun Huang
1f1f35dcd0
Add Chinese punctuation
...
Add Chinese punctuation.
2017-09-19 16:57:24 +08:00
Ines Montani
4bee26188d
Merge pull request #1323 from galaxyh/master
...
Set the "cut_all" parameter in jieba.cut() to False, or jieba will return ALL POSSIBLE word segmentations.
2017-09-14 15:23:41 +02:00
Yu-chun Huang
7692b8c071
Update __init__.py
...
Set the "cut_all" parameter to False, or jieba will return ALL POSSIBLE word segmentations.
2017-09-12 16:23:47 +08:00
Matthew Honnibal
ddaff6ca56
Merge pull request #1287 from IamJeffG/feature/1226-more-complete-noun-chunks
...
Capture more noun chunks
2017-09-08 07:59:10 +02:00
Matthew Honnibal
45029a550e
Fix customized-tokenizer tests
2017-09-04 20:13:13 +02:00
Matthew Honnibal
34c585396a
Merge pull request #1294 from Vimos/master
...
Fix issue #1292 and add test case for the Assertion Error
2017-09-04 19:20:40 +02:00
Matthew Honnibal
c68f188eb0
Fix error on test
2017-09-04 18:59:36 +02:00
Matthew Honnibal
33313c01ad
Merge pull request #1298 from ericzhao28/master
...
Lowest common ancestor matrix for spans and docs
2017-09-04 18:57:54 +02:00
Matthew Honnibal
e8a26ebfab
Add efficiency note to new get_lca_matrix() method
2017-09-04 15:43:52 +02:00
Eric Zhao
d61c117081
Lowest common ancestor matrix for spans and docs
...
Added functionality for spans and docs to get lowest common ancestor
matrix by simply calling: doc.get_lca_matrix() or
doc[:3].get_lca_matrix().
Corresponding unit tests were also added under spacy/tests/doc and
spacy/tests/spans.
Designed to address: https://github.com/explosion/spaCy/issues/969 .
2017-09-03 12:22:19 -07:00
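To illustrate what the lowest common ancestor matrix from the commit above contains, here is a minimal pure-Python sketch. It is not spaCy's implementation: the function name `lca_matrix`, the `heads` list (where `heads[i]` is the index of token i's head and a root points to itself), and the use of -1 for "no common ancestor" are assumptions made for this example.

```python
def lca_matrix(heads):
    """Return an n x n matrix whose [i][j] entry is the index of the lowest
    common ancestor of tokens i and j, or -1 if they share none.

    Assumes `heads` describes valid trees: heads[i] is the index of token i's
    head, and a root token points to itself.
    """
    def ancestors(i):
        # Path from token i up to its root, including i itself.
        path = [i]
        while heads[path[-1]] != path[-1]:
            path.append(heads[path[-1]])
        return path

    n = len(heads)
    paths = [ancestors(i) for i in range(n)]
    matrix = [[-1] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            on_path_j = set(paths[j])
            # First node on i's upward path that also lies on j's upward path.
            matrix[i][j] = next((a for a in paths[i] if a in on_path_j), -1)
    return matrix


# "I like cats": both "I" (0) and "cats" (2) attach to "like" (1), the root.
print(lca_matrix([1, 1, 1]))
```

With this input, every off-diagonal entry is 1 (the root "like"), and each diagonal entry is the token's own index. This quadratic sketch favors readability over speed.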
Matthew Honnibal
9bffcaa73d
Update test to make it slightly more direct
...
The `nlp` container should be unnecessary here. If so, we can test the tokenizer class just a little more directly.
2017-09-01 21:16:56 +02:00
Vimos Tan
a6d9fb5bb6
fix issue #1292
2017-08-30 14:49:14 +08:00
Jeffrey Gerard
884ba168a8
Capture more noun chunks
2017-08-23 21:18:53 -07:00
ines
dcff10abe9
Add regression test for #1281
2017-08-21 16:11:47 +02:00
ines
edc596d9a7
Add missing tokenizer exceptions (resolves #1281)
2017-08-21 16:11:36 +02:00
ines
c5c3f4c7d9
Use more generous .env ignore rule
2017-08-21 16:08:40 +02:00
Ines Montani
dca026124f
Merge pull request #1262 from kevinmarsh/patch-1
...
Fix broken tutorial link on website
2017-08-16 09:58:07 +02:00
Kevin Marsh
e3738aba0d
Fix broken tutorial link on website
2017-08-15 21:50:09 +01:00
Ines Montani
a9465271a7
Merge pull request #1245 from delirious-lettuce/fix_typos
...
Fix typos
2017-08-07 23:11:20 +02:00
Delirious Lettuce
d3b03f0544
Fix typos:
...
* `auxillary` -> `auxiliary`
* `consistute` -> `constitute`
* `earlist` -> `earliest`
* `prefered` -> `preferred`
* `direcory` -> `directory`
* `reuseable` -> `reusable`
* `idiosyncracies` -> `idiosyncrasies`
* `enviroment` -> `environment`
* `unecessary` -> `unnecessary`
* `yesteday` -> `yesterday`
* `resouces` -> `resources`
2017-08-06 21:31:39 -06:00
Matthew Honnibal
b7b121103f
Merge pull request #1244 from gideonite/patch-1
...
improve pipe, tee, izip explanation
2017-08-06 14:34:07 +02:00
Gideon Dresdner
7e98a3613c
improve pipe, tee, izip explanation
...
Use an example from an old issue https://github.com/explosion/spaCy/issues/172#issuecomment-183963403 .
2017-08-06 13:21:45 +02:00
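The pipe, tee, izip pattern named in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the text of the documentation change: `fake_pipe` is a hypothetical stand-in for a streaming call such as `nlp.pipe`, `process_with_ids` is a name invented here, and Python 3's built-in `zip` takes the place of Python 2's `itertools.izip`.

```python
import itertools


def fake_pipe(texts):
    # Hypothetical stand-in for a streaming processor such as `nlp.pipe`:
    # lazily yields one result per input text, in input order.
    for text in texts:
        yield text.upper()


def process_with_ids(records):
    # Split one stream of (id, text) pairs into two independent iterators.
    gen1, gen2 = itertools.tee(records)
    ids = (id_ for id_, _ in gen1)
    texts = (text for _, text in gen2)
    # Results come back in input order, so zipping re-attaches each id
    # to the result produced from its text.
    return list(zip(ids, fake_pipe(texts)))


records = [(1, "hello"), (2, "world")]
print(process_with_ids(records))
```

The point of `tee` is that the processor only accepts texts, so the ids have to travel on a second iterator. `tee` buffers items that one iterator has consumed and the other has not, so the two should be advanced roughly in step, as `zip` does here.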