Commit Graph

5203 Commits

Author SHA1 Message Date
Ines Montani
58bfe30a12 Merge pull request #1362 from IamJeffG/docs/custom-tokenizer
Document Tokenizer(token_match) and clarify tokenizer_pseudo_code
2017-09-26 15:51:15 +02:00
Ines Montani
361211fe26 Merge pull request #1342 from wannaphongcom/master
Add Thai language
2017-09-26 15:40:55 +02:00
Jeffrey Gerard
b6ebedd09c Document Tokenizer(token_match) and clarify tokenizer_pseudo_code
Closes #835

In the `tokenizer_pseudo_code` I put the `special_cases` kwarg
before `find_prefix` because this now matches the order the args
are used in the pseudocode, and it also matches spaCy's actual code.
2017-09-25 13:13:25 -07:00
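The argument order described in the commit above can be illustrated with a simplified sketch. It is not the documented `tokenizer_pseudo_code` itself: infix handling is left out, the regular expressions and the sample `special_cases` entry are hypothetical, and checking `token_match` right after the special cases is an assumption made for illustration.

```python
import re

# Hypothetical patterns, chosen only for this illustration.
PREFIX_RE = re.compile(r'^[\("\']')
SUFFIX_RE = re.compile(r'[\)"\'.,!?]$')
ORDINAL_RE = re.compile(r'^\d+\.$')  # e.g. "3." should stay one token


def find_prefix(substring):
    """Return the length of a leading punctuation prefix, or None."""
    match = PREFIX_RE.search(substring)
    return match.end() if match else None


def find_suffix(substring):
    """Return the length of a trailing punctuation suffix, or None."""
    match = SUFFIX_RE.search(substring)
    return len(match.group()) if match else None


def tokenizer_sketch(text, special_cases, find_prefix, find_suffix, token_match):
    tokens = []
    for substring in text.split(" "):
        suffixes = []
        while substring:
            if substring in special_cases:
                # Special cases win over every other rule.
                tokens.extend(special_cases[substring])
                substring = ""
            elif token_match(substring):
                # Assumed position: matched strings are kept whole.
                tokens.append(substring)
                substring = ""
            elif find_prefix(substring) is not None:
                split = find_prefix(substring)
                tokens.append(substring[:split])
                substring = substring[split:]
            elif find_suffix(substring) is not None:
                split = find_suffix(substring)
                suffixes.append(substring[-split:])
                substring = substring[:-split]
            else:
                tokens.append(substring)
                substring = ""
        # Suffixes were collected outside-in, so emit them in reverse.
        tokens.extend(reversed(suffixes))
    return tokens


special_cases = {"don't": ["do", "n't"]}
with_match = tokenizer_sketch(
    "(don't skip step 3.)", special_cases, find_prefix, find_suffix, ORDINAL_RE.match
)
without_match = tokenizer_sketch(
    "(don't skip step 3.)", special_cases, find_prefix, find_suffix, lambda s: None
)
print(with_match)     # ['(', 'do', "n't", 'skip', 'step', '3.', ')']
print(without_match)  # ['(', 'do', "n't", 'skip', 'step', '3', '.', ')']
```

With the matcher in place, `3.` survives as a single token; without it, the suffix rule splits off the period.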
Matthew Honnibal
2f8d535f65 Merge pull request #1351 from hscspring/patch-4
Update punctuation.py
2017-09-24 12:16:39 +02:00
Matthew Honnibal
9177313063 Merge pull request #1352 from hscspring/patch-5
Update customizing-tokenizer.jade
2017-09-22 16:11:49 +02:00
Matthew Honnibal
1dbc2285b8 Merge pull request #1350 from hscspring/patch-3
Update word-vectors-similarities.jade
2017-09-22 16:11:05 +02:00
Yam
54855f0eee Update customizing-tokenizer.jade 2017-09-22 12:15:48 +08:00
Yam
6f450306c3 Update customizing-tokenizer.jade
Update some code:
- `me` -> `-PRON`
- `TAG` -> `POS`
- `create_tokenizer` function
2017-09-22 10:53:22 +08:00
Yam
923c4c2fb2 Update punctuation.py
add `……`
2017-09-22 09:50:46 +08:00
Yam
425c09488d Update word-vectors-similarities.jade
Add:
```
import spacy
nlp = spacy.load('en')
```
2017-09-22 08:56:34 +08:00
Wannaphong Phatthiyaphaibun
1abf472068 add th test 2017-09-21 12:56:58 +07:00
Matthew Honnibal
ea2732469b Merge pull request #1340 from hscspring/patch-1
Update punctuation.py
2017-09-20 23:57:00 +02:00
Wannaphong Phatthiyaphaibun
39bb5690f0 update th 2017-09-21 00:36:02 +07:00
Wannaphong Phatthiyaphaibun
44291f6697 add thai 2017-09-20 23:26:34 +07:00
Yam
978b24ccd4 Update punctuation.py
In Chinese, `~` and `——` are hyphens, and
`·` is a separator mark (interpunct).
2017-09-20 23:02:22 +08:00
Matthew Honnibal
aa728b33ca Merge pull request #1333 from galaxyh/master
Add Chinese punctuation
2017-09-19 15:09:30 +02:00
Yu-chun Huang
188b439b25 Add Chinese punctuation
Add Chinese punctuation.
2017-09-19 16:58:42 +08:00
Yu-chun Huang
1f1f35dcd0 Add Chinese punctuation
Add Chinese punctuation.
2017-09-19 16:57:24 +08:00
Ines Montani
4bee26188d Merge pull request #1323 from galaxyh/master
Set the "cut_all" parameter in jieba.cut() to False, or jieba will return ALL POSSIBLE word segmentations.
2017-09-14 15:23:41 +02:00
Yu-chun Huang
7692b8c071 Update __init__.py
Set the "cut_all" parameter to False, or jieba will return ALL POSSIBLE word segmentations.
2017-09-12 16:23:47 +08:00
Matthew Honnibal
ddaff6ca56 Merge pull request #1287 from IamJeffG/feature/1226-more-complete-noun-chunks
Capture more noun chunks
2017-09-08 07:59:10 +02:00
Matthew Honnibal
45029a550e Fix customized-tokenizer tests 2017-09-04 20:13:13 +02:00
Matthew Honnibal
34c585396a Merge pull request #1294 from Vimos/master
Fix issue #1292 and add test case for the Assertion Error
2017-09-04 19:20:40 +02:00
Matthew Honnibal
c68f188eb0 Fix error on test 2017-09-04 18:59:36 +02:00
Matthew Honnibal
33313c01ad Merge pull request #1298 from ericzhao28/master
Lowest common ancestor matrix for spans and docs
2017-09-04 18:57:54 +02:00
Matthew Honnibal
e8a26ebfab Add efficiency note to new get_lca_matrix() method 2017-09-04 15:43:52 +02:00
Eric Zhao
d61c117081 Lowest common ancestor matrix for spans and docs
Added functionality for spans and docs to get lowest common ancestor
matrix by simply calling: doc.get_lca_matrix() or
doc[:3].get_lca_matrix().
Corresponding unit tests were also added under spacy/tests/doc and
spacy/tests/spans.
Designed to address: https://github.com/explosion/spaCy/issues/969.
2017-09-03 12:22:19 -07:00
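To make the idea of a lowest common ancestor matrix concrete, here is a minimal pure-Python sketch. It does not use spaCy and is not the implementation added in the commit above: the `heads` list (each token's head index, with a root pointing at itself) and both function names are assumptions for illustration, and the input is assumed to be a valid tree or forest. Pairs with no common ancestor get `-1`.

```python
def ancestors(heads, i):
    """Return the path from token i up to its root, inclusive."""
    path = [i]
    while heads[path[-1]] != path[-1]:
        path.append(heads[path[-1]])
    return path


def lca_matrix(heads):
    """Return a matrix where entry [i][j] is the index of the lowest
    common ancestor of tokens i and j, or -1 if they share none."""
    n = len(heads)
    paths = [ancestors(heads, i) for i in range(n)]
    matrix = [[-1] * n for _ in range(n)]
    for i in range(n):
        on_path = set(paths[i])
        for j in range(n):
            # The first node on j's upward path that also lies on i's
            # upward path is the lowest common ancestor.
            for node in paths[j]:
                if node in on_path:
                    matrix[i][j] = node
                    break
    return matrix


# "She likes green apples": She -> likes, likes is the root,
# green -> apples, apples -> likes.
heads = [1, 1, 3, 1]
for row in lca_matrix(heads):
    print(row)
# [0, 1, 1, 1]
# [1, 1, 1, 1]
# [1, 1, 2, 3]
# [1, 1, 3, 3]
```

This naive version walks the ancestor path for every pair of tokens, so it is meant for understanding the output shape rather than for speed.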
Matthew Honnibal
9bffcaa73d Update test to make it slightly more direct
The `nlp` container should be unnecessary here. If so, we can test the tokenizer class just a little more directly.
2017-09-01 21:16:56 +02:00
Vimos Tan
a6d9fb5bb6 fix issue #1292 2017-08-30 14:49:14 +08:00
Jeffrey Gerard
884ba168a8 Capture more noun chunks 2017-08-23 21:18:53 -07:00
ines
dcff10abe9 Add regression test for #1281 2017-08-21 16:11:47 +02:00
ines
edc596d9a7 Add missing tokenizer exceptions (resolves #1281) 2017-08-21 16:11:36 +02:00
ines
c5c3f4c7d9 Use more generous .env ignore rule 2017-08-21 16:08:40 +02:00
Ines Montani
dca026124f Merge pull request #1262 from kevinmarsh/patch-1
Fix broken tutorial link on website
2017-08-16 09:58:07 +02:00
Kevin Marsh
e3738aba0d Fix broken tutorial link on website 2017-08-15 21:50:09 +01:00
Ines Montani
a9465271a7 Merge pull request #1245 from delirious-lettuce/fix_typos
Fix typos
2017-08-07 23:11:20 +02:00
Delirious Lettuce
d3b03f0544 Fix typos:
  * `auxillary` -> `auxiliary`
  * `consistute` -> `constitute`
  * `earlist` -> `earliest`
  * `prefered` -> `preferred`
  * `direcory` -> `directory`
  * `reuseable` -> `reusable`
  * `idiosyncracies` -> `idiosyncrasies`
  * `enviroment` -> `environment`
  * `unecessary` -> `unnecessary`
  * `yesteday` -> `yesterday`
  * `resouces` -> `resources`
2017-08-06 21:31:39 -06:00
Matthew Honnibal
b7b121103f Merge pull request #1244 from gideonite/patch-1
improve pipe, tee, izip explanation
2017-08-06 14:34:07 +02:00
Gideon Dresdner
7e98a3613c improve pipe, tee, izip explanation
Use an example from an old issue https://github.com/explosion/spaCy/issues/172#issuecomment-183963403.
2017-08-06 13:21:45 +02:00
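The pipe, tee and izip pattern mentioned in the commit above can be sketched as follows. This is an illustration under stated assumptions rather than the text of the updated docs: `fake_pipe` is a hypothetical stand-in for a streaming call such as `nlp.pipe()`, `process` is an invented helper name, and Python 3's built-in `zip` takes the place of Python 2's `itertools.izip`.

```python
import itertools


def fake_pipe(texts):
    """Hypothetical stand-in for a streaming pipeline such as nlp.pipe():
    consumes texts lazily and yields one processed result per text."""
    for text in texts:
        yield text.split()


def process(records):
    """Pair each record ID with its processed text without loading
    the whole stream into memory."""
    # tee() splits one iterator of (id, text) pairs into two independent
    # iterators, so IDs and texts can be consumed separately.
    gen1, gen2 = itertools.tee(records)
    ids = (id_ for id_, _ in gen1)
    texts = (text for _, text in gen2)
    docs = fake_pipe(texts)
    # zip() is lazy in Python 3, so results are re-paired one at a time.
    for id_, doc in zip(ids, docs):
        yield id_, doc


records = iter([(1, "hello world"), (2, "a b c")])
result = list(process(records))
print(result)  # [(1, ['hello', 'world']), (2, ['a', 'b', 'c'])]
```

Note that `tee` buffers items that one iterator has consumed and the other has not, so a pipeline that reads ahead in batches will hold roughly one batch of records in memory.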
ines
864cefd3b2 Update README.rst 2017-07-22 18:29:55 +02:00
ines
e349271506 Increment version 2017-07-22 18:29:30 +02:00
Ines Montani
570964e67f Update README.rst 2017-07-22 16:20:19 +02:00
Matthew Honnibal
5494605689 Fiddle with regex pin 2017-07-22 16:09:50 +02:00
Matthew Honnibal
78fcf56dd5 Update version pin for regex library 2017-07-22 15:57:58 +02:00
Matthew Honnibal
d51d55bba6 Increment version 2017-07-22 15:43:16 +02:00
Matthew Honnibal
8ccf154413 Merge branch 'master' of https://github.com/explosion/spaCy 2017-07-22 15:42:44 +02:00
Matthew Honnibal
796b2f4c1b Remove print statements in tests 2017-07-22 15:42:38 +02:00
ines
7c4bf9994d Add note on requirements and preventing model re-downloads (closes #1143) 2017-07-22 15:40:12 +02:00
ines
de25bad036 Use lower min version for requests dependency (fixes #1137)
Ensure compatibility with docker-compose and other packages
2017-07-22 15:29:10 +02:00
ines
d7560047c5 Fix version 2017-07-22 15:24:33 +02:00