Commit Graph

463 Commits

Author SHA1 Message Date
Adriane Boyd
f0fd77648f Change example title to Dr.
Change example title to Dr. so the current model does exclude the title
in the initial example.
2020-06-16 20:36:21 +02:00
Adriane Boyd
a6abdfbc3c Fix numpy.zeros() dtype for Doc.from_array 2020-06-16 20:35:45 +02:00
Adriane Boyd
9aff317ca7 Update POS in tagging example 2020-06-16 20:26:57 +02:00
Adriane Boyd
457babfa0c Update alignment example for new gold.align 2020-06-16 20:22:03 +02:00
Jannis
aa53ce6996
Documentation Typo Fix (#5492)
* Fix typo

Change 'realize' to 'realise'

* Add contributor agreement
2020-05-22 19:50:26 +02:00
Adriane Boyd
e4a1b5dab1 Rename to url_match
Rename to `url_match` and update docs.
2020-05-22 12:41:03 +02:00
Adriane Boyd
730fa493a4 Merge remote-tracking branch 'upstream/master' into bugfix/revert-token-match 2020-05-22 12:18:00 +02:00
Ines Montani
f333c2a011
Merge pull request #5386 from svlandeg/fix/nel-docs 2020-05-10 12:00:09 +02:00
adrianeboyd
4a15b559ba
Clarify Token.pos as UPOS (#5419) 2020-05-08 10:36:25 +02:00
Adriane Boyd
792c8af8cf Merge remote-tracking branch 'upstream/master' into bugfix/revert-token-match 2020-05-05 09:25:57 +02:00
svlandeg
ebaed7dcfa Few more updates to the EL documentation 2020-04-30 10:17:06 +02:00
Sofie Van Landeghem
cfdaf99b80
Fix passing of component configuration (#5374)
* add kwargs to to_disk methods in docs - otherwise crashes on 'exclude' argument

* add fix and test for Issue 5137
2020-04-29 12:56:17 +02:00
Sofie Van Landeghem
f67343295d
Update NEL examples and documentation (#5370)
* simplify creation of KB by skipping dim reduction

* small fixes to train EL example script

* add KB creation and NEL training example scripts to example section

* update descriptions of example scripts in the documentation

* moving wiki_entity_linking folder from bin to projects

* remove test for wiki NEL functionality that is being moved
2020-04-29 12:53:53 +02:00
adrianeboyd
90ce34db42
Add cuda101 and cuda102 options to setup (#5377)
* Add cuda101 and cuda102 options to setup

* Update cudaNNN options in docs
2020-04-29 12:51:12 +02:00
Mike
481574cbc8
[minor doc change] embedding vis. link is broken in website/docs/usage/examples.md (#5325)
* The embedding vis. link is broken

The first link seems to be reasonable for now unless someone has an updated embedding vis they want to share?

* contributor agreement

* Update Mlawrence95.md

* Update website/docs/usage/examples.md

Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2020-04-21 20:35:12 +02:00
Sofie Van Landeghem
1137420840
Small doc fixes (#5250)
* fix link

* torchtext instead of tochtext
2020-04-03 13:01:43 +02:00
Tiljander
e53232533b
Describing priority rules for overlapping matches (#5197)
* Describing priority rules for overlapping matches

* Create Tiljander.md

* Describing priority rules for overlapping matches

* Update website/docs/api/entityruler.md

Co-Authored-By: Ines Montani <ines@ines.io>

Co-authored-by: Ines Montani <ines@ines.io>
2020-03-26 13:13:22 +01:00
adrianeboyd
d88a377bed
Remove Vectors.from_glove (#5209) 2020-03-26 10:45:47 +01:00
Ines Montani
17bd9ed84f
Merge pull request #5153 from pinealan/fix/website-docs
Fix website typos and weird sentences
2020-03-16 15:03:01 +01:00
Alan Chan
36e3532475 Remove unfinished sentence 2020-03-15 03:45:17 +08:00
Mark Abraham
a0ffa346c0 Fix broken link in docs 2020-03-13 14:07:26 +01:00
Renaud Richardet
eccf6b1686
small typo in code sample 2020-03-09 14:49:11 +01:00
Adriane Boyd
0c31f03ec5 Update docs [ci skip] 2020-03-09 13:41:17 +01:00
Adriane Boyd
1139247532 Revert changes to token_match priority from #4374
* Revert changes to priority of `token_match` so that it has priority
over all other tokenizer patterns

* Add lookahead and potentially slow lookbehind back to the default URL
pattern

* Expand character classes in URL pattern to improve matching around
lookaheads and lookbehinds related to #4882

* Revert changes to Hungarian tokenizer

* Revert (xfail) several URL tests to their status before #4374

* Update `tokenizer.explain()` and docs accordingly
2020-03-09 12:09:41 +01:00
Kabir Khan
f6ed07b85c
Use nlp.pipe in EntityRuler for phrase patterns in add_patterns (#4931)
* Fix ent_ids and labels properties when id attribute used in patterns

* use set for labels

* sort end_ids for comparison in entity_ruler tests

* fixing entity_ruler ent_ids test

* add to set

* Run make_doc optimistically if using phrase matcher patterns.

* remove unused coveragerc I was testing with

* format

* Refactor EntityRuler.add_patterns to use nlp.pipe for phrase patterns. Improves speed substantially.

* Removing old add_patterns function

* Fixing spacing

* Make sure token_patterns loaded as well, before generator was being emptied in from_disk
2020-02-16 18:17:47 +01:00
Julin S
479e81bafc
fix link (#4977) 2020-02-10 20:31:26 -05:00
Ines Montani
9c08d9baa3 Remove old sections [ci skip] (closes #4961) 2020-02-03 13:10:46 +01:00
Preston Badeer
b216ff43c9 Update vectors-similarity.md (#4889)
These links are broken on the website, due to quotes around the URLs.
2020-01-08 16:49:40 +01:00
Geoffrey Gordon Ashbrook
53929138d7 remove extra word typo (#4875)
"let you find you"
2020-01-06 12:37:42 +01:00
Ines Montani
400257a802 Update index.md [ci skip] 2020-01-04 01:52:18 +01:00
Ines Montani
1b838d1313 Divide models into core and starters [ci skip] 2019-12-21 14:10:22 +01:00
Nicolai Bjerre Pedersen
de5453cdcb Fix link to user hooks in docs (#4778)
* Fix link to user hooks in docs

* Update mr_bjerre.md

Mistake in contributor agreement

* Apparently hard to get it right (wrong name of sca)
2019-12-06 19:17:12 +01:00
Ines Montani
cbacb0f1a4 Update shape docs and examples (resolves #4615) [ci skip] 2019-11-23 17:16:55 +01:00
Ines Montani
235fe6fe3b Auto-format [ci skip] 2019-11-20 13:14:58 +01:00
adrianeboyd
2c876eb672 Add tokenizer explain() debugging method (#4596)
* Expose tokenizer rules as a property

Expose the tokenizer rules property in the same way as the other core
properties. (The cache resetting is overkill, but consistent with
`from_bytes` for now.)

Add tests and update Tokenizer API docs.

* Update Hungarian punctuation to remove empty string

Update Hungarian punctuation definitions so that `_units` does not match
an empty string.

* Use _load_special_tokenization consistently

Use `_load_special_tokenization()` and have it handle `None` checks.

* Fix precedence of `token_match` vs. special cases

Remove `token_match` check from `_split_affixes()` so that special cases
have precedence over `token_match`. `token_match` is checked only before
infixes are split.

* Add `make_debug_doc()` to the Tokenizer

Add `make_debug_doc()` to the Tokenizer as a working implementation of
the pseudo-code in the docs.

Add a test (marked as slow) that checks that `nlp.tokenizer()` and
`nlp.tokenizer.make_debug_doc()` return the same non-whitespace tokens
for all languages that have `examples.sentences` that can be imported.

* Update tokenization usage docs

Update pseudo-code and algorithm description to correspond to
`nlp.tokenizer.make_debug_doc()` with example debugging usage.

Add more examples for customizing tokenizers while preserving the
existing defaults.

Minor edits / clarifications.

* Revert "Update Hungarian punctuation to remove empty string"

This reverts commit f0a577f7a5.

* Rework `make_debug_doc()` as `explain()`

Rework `make_debug_doc()` as `explain()`, which returns a list of
`(pattern_string, token_string)` tuples rather than a non-standard
`Doc`. Update docs and tests accordingly, leaving the visualization for
future work.

* Handle cases with bad tokenizer patterns

Detect when tokenizer patterns match empty prefixes and suffixes so that
`explain()` does not hang on bad patterns.

* Remove unused displacy image

* Add tokenizer.explain() to usage docs
2019-11-20 13:07:25 +01:00
Ines Montani
e8b9cee6fd Make example consistent with model (closes #4587) [ci skip] 2019-11-18 12:41:48 +01:00
Ines Montani
e01a1a237f Auto-format [ci skip] 2019-11-18 12:41:31 +01:00
adrianeboyd
62e00fd9da Update tokenization usage docs (#4666)
Update pseudo-code and algorithm description to correspond to current
tokenizer behavior.

Add more examples for customizing tokenizers while preserving the
existing defaults.

Minor edits / clarifications.
2019-11-18 12:35:13 +01:00
Ines Montani
5adcb352e9 Adjust order of docs sections [ci skip] 2019-11-17 16:08:56 +01:00
Ines Montani
e30d08410a
Add CI for Python 3.8 (#4479)
* Add 3.8 classifier

* Update azure-pipelines.yml

* Remove 3.8 warning from docs [ci skip]
2019-11-15 01:13:48 +01:00
Ines Montani
9d5ff177c4 Work around Markdown rendering issue surfaced in #4600 [ci skip] 2019-11-11 17:12:08 +01:00
walterhenry
5563c42ef5 Fixed typo: Added space between "recognize" and "various" (#4600) 2019-11-06 23:06:36 +01:00
Ines Montani
828ef27a32 Add warnings about 3.8 (resolves #4593) [ci skip] 2019-11-05 18:30:11 +01:00
Ines Montani
4e1de85e43 Update syntax iterators [ci skip] 2019-10-30 14:31:40 +01:00
Ines Montani
493be8e9db Update new version identifier [ci skip] 2019-10-25 11:42:49 +02:00
Ines Montani
f31876154d Adjust formatting [ci skip] 2019-10-25 11:19:46 +02:00
Kabir Khan
93640373c7 Make entity_ruler ent_id resolution 2x faster and add docs for… (#4513)
* Update entityruler.py

* Making ent_id resolution 2x faster and adding docs

* Fixing newlines in docstrings

* Fixing newlines in docstrings
2019-10-25 11:16:42 +02:00
adrianeboyd
7fc39f124c Fix logic in rules+model entity example [ci skip] (#4510) 2019-10-23 14:41:21 +02:00
adrianeboyd
3195a8f170 Add Entity Linking to menu (#4489) 2019-10-21 12:17:30 +02:00
Ines Montani
573e543e4a Alphanumeric -> alphabetic [ci skip]
see ines/spacy-course#38
2019-10-06 13:30:01 +02:00