Commit Graph

5183 Commits

Author SHA1 Message Date
Matthew Honnibal
ddaff6ca56 Merge pull request #1287 from IamJeffG/feature/1226-more-complete-noun-chunks
Capture more noun chunks
2017-09-08 07:59:10 +02:00
Matthew Honnibal
45029a550e Fix customized-tokenizer tests 2017-09-04 20:13:13 +02:00
Matthew Honnibal
34c585396a Merge pull request #1294 from Vimos/master
Fix issue #1292 and add test case for the Assertion Error
2017-09-04 19:20:40 +02:00
Matthew Honnibal
c68f188eb0 Fix error on test 2017-09-04 18:59:36 +02:00
Matthew Honnibal
33313c01ad Merge pull request #1298 from ericzhao28/master
Lowest common ancestor matrix for spans and docs
2017-09-04 18:57:54 +02:00
Matthew Honnibal
e8a26ebfab Add efficiency note to new get_lca_matrix() method 2017-09-04 15:43:52 +02:00
Eric Zhao
d61c117081 Lowest common ancestor matrix for spans and docs
Added functionality for spans and docs to get lowest common ancestor
matrix by simply calling: doc.get_lca_matrix() or
doc[:3].get_lca_matrix().
Corresponding unit tests were also added under spacy/tests/doc and
spacy/tests/spans.
Designed to address: https://github.com/explosion/spaCy/issues/969.
2017-09-03 12:22:19 -07:00
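For readers unfamiliar with the idea behind d61c117081, here is a minimal pure-Python sketch of a lowest common ancestor matrix over a dependency tree. It is not the code from the commit. The `heads` list encoding (each token stores the index of its head, and a root points to itself), the helper names `ancestor_path` and `lca_matrix`, and the use of `-1` for "no common ancestor" are all assumptions made for illustration.

```python
def ancestor_path(heads, i):
    """Return [i, head(i), head(head(i)), ...] up to and including the root.

    Assumes `heads` describes a valid forest: a root is a token whose
    head index is its own index, and there are no cycles.
    """
    path = [i]
    while heads[path[-1]] != path[-1]:
        path.append(heads[path[-1]])
    return path


def lca_matrix(heads):
    """Build an n x n matrix where entry [i][j] is the index of the
    lowest common ancestor of tokens i and j, or -1 if they share none
    (for example, tokens that belong to different trees)."""
    n = len(heads)
    paths = [ancestor_path(heads, i) for i in range(n)]
    matrix = [[-1] * n for _ in range(n)]
    for i in range(n):
        on_path_i = set(paths[i])
        for j in range(n):
            # Walk upward from j; the first node that also lies on
            # i's path to the root is the lowest common ancestor.
            for node in paths[j]:
                if node in on_path_i:
                    matrix[i][j] = node
                    break
    return matrix


# Hypothetical parse of "The quick fox jumps":
# The -> fox, quick -> fox, fox -> jumps, jumps is the root.
heads = [2, 2, 3, 3]
matrix = lca_matrix(heads)
```

This sketch recomputes ancestor paths for every pair, so it is quadratic in the number of tokens at best, which is in line with the efficiency note added in e8a26ebfab.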
Matthew Honnibal
9bffcaa73d Update test to make it slightly more direct
The `nlp` container should be unnecessary here. If so, we can test the tokenizer class just a little more directly.
2017-09-01 21:16:56 +02:00
Vimos Tan
a6d9fb5bb6 fix issue #1292 2017-08-30 14:49:14 +08:00
Jeffrey Gerard
884ba168a8 Capture more noun chunks 2017-08-23 21:18:53 -07:00
ines
dcff10abe9 Add regression test for #1281 2017-08-21 16:11:47 +02:00
ines
edc596d9a7 Add missing tokenizer exceptions (resolves #1281) 2017-08-21 16:11:36 +02:00
ines
c5c3f4c7d9 Use more generous .env ignore rule 2017-08-21 16:08:40 +02:00
Ines Montani
dca026124f Merge pull request #1262 from kevinmarsh/patch-1
Fix broken tutorial link on website
2017-08-16 09:58:07 +02:00
Kevin Marsh
e3738aba0d Fix broken tutorial link on website 2017-08-15 21:50:09 +01:00
Ines Montani
a9465271a7 Merge pull request #1245 from delirious-lettuce/fix_typos
Fix typos
2017-08-07 23:11:20 +02:00
Delirious Lettuce
d3b03f0544 Fix typos:
  * `auxillary` -> `auxiliary`
  * `consistute` -> `constitute`
  * `earlist` -> `earliest`
  * `prefered` -> `preferred`
  * `direcory` -> `directory`
  * `reuseable` -> `reusable`
  * `idiosyncracies` -> `idiosyncrasies`
  * `enviroment` -> `environment`
  * `unecessary` -> `unnecessary`
  * `yesteday` -> `yesterday`
  * `resouces` -> `resources`
2017-08-06 21:31:39 -06:00
Matthew Honnibal
b7b121103f Merge pull request #1244 from gideonite/patch-1
improve pipe, tee, izip explanation
2017-08-06 14:34:07 +02:00
Gideon Dresdner
7e98a3613c improve pipe, tee, izip explanation
Use an example from an old issue https://github.com/explosion/spaCy/issues/172#issuecomment-183963403.
2017-08-06 13:21:45 +02:00
ines
864cefd3b2 Update README.rst 2017-07-22 18:29:55 +02:00
ines
e349271506 Increment version 2017-07-22 18:29:30 +02:00
Ines Montani
570964e67f Update README.rst 2017-07-22 16:20:19 +02:00
Matthew Honnibal
5494605689 Fiddle with regex pin 2017-07-22 16:09:50 +02:00
Matthew Honnibal
78fcf56dd5 Update version pin for regex library 2017-07-22 15:57:58 +02:00
Matthew Honnibal
d51d55bba6 Increment version 2017-07-22 15:43:16 +02:00
Matthew Honnibal
8ccf154413 Merge branch 'master' of https://github.com/explosion/spaCy 2017-07-22 15:42:44 +02:00
Matthew Honnibal
796b2f4c1b Remove print statements in tests 2017-07-22 15:42:38 +02:00
ines
7c4bf9994d Add note on requirements and preventing model re-downloads (closes #1143) 2017-07-22 15:40:12 +02:00
ines
de25bad036 Use lower min version for requests dependency (fixes #1137)
Ensure compatibility with docker-compose and other packages
2017-07-22 15:29:10 +02:00
ines
d7560047c5 Fix version 2017-07-22 15:24:33 +02:00
Matthew Honnibal
af945ea8e2 Merge branch 'master' of https://github.com/explosion/spaCy 2017-07-22 15:09:59 +02:00
Matthew Honnibal
4b2e5e59ed Add flush_cache method to tokenizer, to fix #1061
The tokenizer caches output for common chunks, for efficiency. This
cache must be invalidated when the tokenizer rules change, e.g. when a new
special-case rule is introduced. That's what was causing #1061.

When the cache is flushed, we free the intermediate token chunks.
I *think* this is safe --- but if we start getting segfaults, this patch
is to blame. The resolution would be to simply not free those bits of
memory. They'll be freed when the tokenizer exits anyway.
2017-07-22 15:06:50 +02:00
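The caching problem described in 4b2e5e59ed can be illustrated with a small toy tokenizer. This is only a sketch of the general pattern (cache per-chunk output, flush the cache whenever the rules change) and is not spaCy's implementation. The class name `CachingTokenizer`, its whitespace splitting, and its trailing-punctuation rule are hypothetical; only the method names `flush_cache` and the notion of a special-case rule come from the commit message.

```python
class CachingTokenizer:
    """Toy whitespace tokenizer that caches the tokens for each chunk."""

    def __init__(self):
        self._special_cases = {}  # chunk -> list of tokens
        self._cache = {}          # chunk -> previously computed tokens

    def add_special_case(self, chunk, tokens):
        """Register a special-case rule and invalidate stale cache entries."""
        self._special_cases[chunk] = list(tokens)
        # Without this flush, a chunk tokenized before the rule was
        # added would keep returning its old, cached output.
        self.flush_cache()

    def flush_cache(self):
        """Drop all cached chunk tokenizations."""
        self._cache.clear()

    def _tokenize_chunk(self, chunk):
        if chunk in self._special_cases:
            return list(self._special_cases[chunk])
        # Hypothetical fallback rule: split off one trailing punctuation mark.
        if len(chunk) > 1 and chunk[-1] in ".,!?":
            return [chunk[:-1], chunk[-1]]
        return [chunk]

    def __call__(self, text):
        tokens = []
        for chunk in text.split():
            if chunk not in self._cache:
                self._cache[chunk] = self._tokenize_chunk(chunk)
            tokens.extend(self._cache[chunk])
        return tokens


tokenizer = CachingTokenizer()
before = tokenizer("don't go.")                    # caches "don't" as one token
tokenizer.add_special_case("don't", ["do", "n't"])
after = tokenizer("don't go.")                     # new rule applies after the flush
```

Removing the `flush_cache()` call from `add_special_case` reproduces the class of bug the commit describes: the second call would still return the cached single-token output for `don't`.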
Ines Montani
96df9c7154 Update CONTRIBUTORS.md 2017-07-22 15:05:46 +02:00
ines
b22b18a019 Add notes on spacy.explain() to annotation docs 2017-07-22 15:02:15 +02:00
ines
e3f23f9d91 Use latest available version in examples 2017-07-22 14:57:51 +02:00
Matthew Honnibal
23a55b40ca Default to English noun chunks iterator if no lang set 2017-07-22 14:15:25 +02:00
Matthew Honnibal
9750a0128c Fix Span.noun_chunks. Closes #1207 2017-07-22 14:14:57 +02:00
Matthew Honnibal
d9b85675d7 Rename regression test 2017-07-22 14:14:35 +02:00
Matthew Honnibal
dfbc7e49de Add test for Issue #1207 2017-07-22 14:14:01 +02:00
Matthew Honnibal
0ae3807d7d Fix gaps in Lexeme API. Closes #1031 2017-07-22 13:53:48 +02:00
Matthew Honnibal
83e1b5f1e3 Merge branch 'master' of https://github.com/explosion/spaCy 2017-07-22 13:45:35 +02:00
Matthew Honnibal
45f6961ae0 Add __version__ symbol in __init__.py 2017-07-22 13:45:21 +02:00
Matthew Honnibal
8b9c4c5e1c Add missing SP symbol to tag map, re #1052 2017-07-22 13:44:17 +02:00
Ines Montani
69396dcfd3 Update CONTRIBUTORS.md 2017-07-22 13:43:15 +02:00
Ines Montani
9af04ea11f Merge pull request #1161 from AlexisEidelman/patch-1
French NUM_WORDS and ORDINAL_WORDS
2017-07-22 13:40:46 +02:00
Matthew Honnibal
8b581fdac5 Remove unused example 2017-07-22 13:36:54 +02:00
Matthew Honnibal
44dd247e73 Merge branch 'master' of https://github.com/explosion/spaCy 2017-07-22 13:35:30 +02:00
Matthew Honnibal
94267ec50f Fix merge conflict in printer 2017-07-22 13:35:15 +02:00
Ines Montani
c7708dc736 Merge pull request #1177 from swierh/master
Dutch NUM_WORDS and ORDINAL_WORDS
2017-07-22 13:35:08 +02:00
Matthew Honnibal
5916d46ba8 Avoid use of deepcopy in printer 2017-07-22 13:34:01 +02:00