mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-10 09:16:31 +03:00
Merge branch 'master' into spacy.io
commit
1572490d57
106
.github/contributors/ujwal-narayan.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI GmbH](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made) will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statements below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field | Entry |
|
||||
|------------------------------- | -------------------- |
|
||||
| Name | Ujwal Narayan |
|
||||
| Company name (if applicable) | |
|
||||
| Title or role (if applicable) | |
|
||||
| Date | 17/05/2019 |
|
||||
| GitHub username | ujwal-narayan |
|
||||
| Website (optional) | |
|
|
@ -4,67 +4,87 @@ from __future__ import unicode_literals
|
|||
|
||||
STOP_WORDS = set(
|
||||
"""
|
||||
ಈ
|
||||
ಮತ್ತು
|
||||
ಹಾಗೂ
|
||||
ಅವರು
|
||||
ಅವರ
|
||||
ಬಗ್ಗೆ
|
||||
ಎಂಬ
|
||||
ಆದರೆ
|
||||
ಅವರನ್ನು
|
||||
ಆದರೆ
|
||||
ತಮ್ಮ
|
||||
ಒಂದು
|
||||
ಎಂದರು
|
||||
ಮೇಲೆ
|
||||
ಹೇಳಿದರು
|
||||
ಸೇರಿದಂತೆ
|
||||
ಬಳಿಕ
|
||||
ಆ
|
||||
ಯಾವುದೇ
|
||||
ಅವರಿಗೆ
|
||||
ನಡೆದ
|
||||
ಕುರಿತು
|
||||
ಇದು
|
||||
ಅವರು
|
||||
ಕಳೆದ
|
||||
ಇದೇ
|
||||
ತಿಳಿಸಿದರು
|
||||
ಹೀಗಾಗಿ
|
||||
ಕೂಡ
|
||||
ತನ್ನ
|
||||
ತಿಳಿಸಿದ್ದಾರೆ
|
||||
ನಾನು
|
||||
ಹೇಳಿದ್ದಾರೆ
|
||||
ಈಗ
|
||||
ಎಲ್ಲ
|
||||
ನನ್ನ
|
||||
ನಮ್ಮ
|
||||
ಈಗಾಗಲೇ
|
||||
ಇದಕ್ಕೆ
|
||||
ಹಲವು
|
||||
ಇದೆ
|
||||
ಮತ್ತೆ
|
||||
ಮಾಡುವ
|
||||
ನೀಡಿದರು
|
||||
ನಾವು
|
||||
ನೀಡಿದ
|
||||
ಇದರಿಂದ
|
||||
ಮೂಲಕ
|
||||
ಹಾಗೂ
|
||||
ಅದು
|
||||
ಇದನ್ನು
|
||||
ನೀಡಿದ್ದಾರೆ
|
||||
ಯಾವ
|
||||
ಎಂದರು
|
||||
ಅವರು
|
||||
ಈಗ
|
||||
ಎಂಬ
|
||||
ಹಾಗಾಗಿ
|
||||
ಅಷ್ಟೇ
|
||||
ನಾವು
|
||||
ಇದೇ
|
||||
ಹೇಳಿ
|
||||
ತಮ್ಮ
|
||||
ಹೀಗೆ
|
||||
ನಮ್ಮ
|
||||
ಬೇರೆ
|
||||
ನೀಡಿದರು
|
||||
ಮತ್ತೆ
|
||||
ಇದು
|
||||
ಈ
|
||||
ನೀವು
|
||||
ನಾನು
|
||||
ಇತ್ತು
|
||||
ಎಲ್ಲಾ
|
||||
ಯಾವುದೇ
|
||||
ನಡೆದ
|
||||
ಅದನ್ನು
|
||||
ಇಲ್ಲಿ
|
||||
ಆಗ
|
||||
ಬಂದಿದೆ
|
||||
ಅದೇ
|
||||
ಇರುವ
|
||||
ಅಲ್ಲದೆ
|
||||
ಕೆಲವು
|
||||
ಎಂದರೆ
|
||||
ನೀಡಿದೆ
|
||||
ಹೀಗಾಗಿ
|
||||
ಜೊತೆಗೆ
|
||||
ಇದರಿಂದ
|
||||
ನನಗೆ
|
||||
ಅಲ್ಲದೆ
|
||||
ಎಷ್ಟು
|
||||
ಇದರ
|
||||
ಇಲ್ಲ
|
||||
ಕಳೆದ
|
||||
ತುಂಬಾ
|
||||
ಈಗಾಗಲೇ
|
||||
ಮಾಡಿ
|
||||
ಅದಕ್ಕೆ
|
||||
ಬಗ್ಗೆ
|
||||
ಅವರ
|
||||
ಇದನ್ನು
|
||||
ಆ
|
||||
ಇದೆ
|
||||
ಹೆಚ್ಚು
|
||||
ಇನ್ನು
|
||||
ಎಲ್ಲ
|
||||
ಇರುವ
|
||||
ಅವರಿಗೆ
|
||||
ನಿಮ್ಮ
|
||||
ಏನು
|
||||
ಕೂಡ
|
||||
ಇಲ್ಲಿ
|
||||
ನನ್ನನ್ನು
|
||||
ಕೆಲವು
|
||||
ಮಾತ್ರ
|
||||
ಬಳಿಕ
|
||||
ಅಂತ
|
||||
ತನ್ನ
|
||||
ಆಗ
|
||||
ಅಥವಾ
|
||||
ಅಲ್ಲ
|
||||
ಕೇವಲ
|
||||
ಆದರೆ
|
||||
ಮತ್ತು
|
||||
ಇನ್ನೂ
|
||||
ಅದೇ
|
||||
ಆಗಿ
|
||||
ಅವರನ್ನು
|
||||
ಹೇಳಿದ್ದಾರೆ
|
||||
ನಡೆದಿದೆ
|
||||
ಇದಕ್ಕೆ
|
||||
ಎಂಬುದು
|
||||
ಎಂದು
|
||||
ನನ್ನ
|
||||
ಮೇಲೆ
|
||||
""".split()
|
||||
)
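The pattern above, splitting a triple-quoted string on whitespace and wrapping the result in `set()`, has two side effects worth keeping in mind: repeated entries collapse into one, and any punctuation attached to a word becomes part of the entry. A minimal sketch, assuming a short hypothetical English word list rather than the Kannada data:

```python
# Hypothetical word list for illustration only; not the list from the diff.
RAW = """
the
and
the
done.
"""

# Same construction as STOP_WORDS: split on whitespace, then deduplicate.
STOP_WORDS = set(RAW.split())

# Repeated entries are stored once, so the set is smaller than the raw list.
duplicates = len(RAW.split()) - len(STOP_WORDS)

# An entry with trailing punctuation only matches text that includes it.
has_plain = "done" in STOP_WORDS
has_dotted = "done." in STOP_WORDS
```

Because membership is an exact string match, entries in a stop list like this are usually kept free of punctuation.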
|
||||
|
|
|
@ -417,7 +417,9 @@ class Language(object):
|
|||
golds (iterable): A batch of `GoldParse` objects.
|
||||
drop (float): The dropout rate.
|
||||
sgd (callable): An optimizer.
|
||||
RETURNS (dict): Results from the update.
|
||||
losses (dict): Dictionary to update with the loss, keyed by component.
|
||||
component_cfg (dict): Config parameters for specific pipeline
|
||||
components, keyed by component name.
|
||||
|
||||
DOCS: https://spacy.io/api/language#update
|
||||
"""
|
||||
|
@ -598,6 +600,19 @@ class Language(object):
|
|||
def evaluate(
|
||||
self, docs_golds, verbose=False, batch_size=256, scorer=None, component_cfg=None
|
||||
):
|
||||
"""Evaluate a model's pipeline components.
|
||||
|
||||
docs_golds (iterable): Tuples of `Doc` and `GoldParse` objects.
|
||||
verbose (bool): Print debugging information.
|
||||
batch_size (int): Batch size to use.
|
||||
scorer (Scorer): Optional `Scorer` to use. If not passed in, a new one
|
||||
will be created.
|
||||
component_cfg (dict): An optional dictionary with extra keyword
|
||||
arguments for specific components.
|
||||
RETURNS (Scorer): The scorer containing the evaluation results.
|
||||
|
||||
DOCS: https://spacy.io/api/language#evaluate
|
||||
"""
|
||||
if scorer is None:
|
||||
scorer = Scorer()
|
||||
if component_cfg is None:
|
||||
|
|
|
@ -35,7 +35,17 @@ class PRFScore(object):
|
|||
|
||||
|
||||
class Scorer(object):
|
||||
"""Compute evaluation scores."""
|
||||
|
||||
def __init__(self, eval_punct=False):
|
||||
"""Initialize the Scorer.
|
||||
|
||||
eval_punct (bool): Evaluate the dependency attachments to and from
|
||||
punctuation.
|
||||
RETURNS (Scorer): The newly created object.
|
||||
|
||||
DOCS: https://spacy.io/api/scorer#init
|
||||
"""
|
||||
self.tokens = PRFScore()
|
||||
self.sbd = PRFScore()
|
||||
self.unlabelled = PRFScore()
|
||||
|
@ -46,34 +56,46 @@ class Scorer(object):
|
|||
|
||||
@property
|
||||
def tags_acc(self):
|
||||
"""RETURNS (float): Part-of-speech tag accuracy (fine grained tags,
|
||||
i.e. `Token.tag`).
|
||||
"""
|
||||
return self.tags.fscore * 100
|
||||
|
||||
@property
|
||||
def token_acc(self):
|
||||
"""RETURNS (float): Tokenization accuracy."""
|
||||
return self.tokens.precision * 100
|
||||
|
||||
@property
|
||||
def uas(self):
|
||||
"""RETURNS (float): Unlabelled dependency score."""
|
||||
return self.unlabelled.fscore * 100
|
||||
|
||||
@property
|
||||
def las(self):
|
||||
"""RETURNS (float): Labelled dependency score."""
|
||||
return self.labelled.fscore * 100
|
||||
|
||||
@property
|
||||
def ents_p(self):
|
||||
"""RETURNS (float): Named entity accuracy (precision)."""
|
||||
return self.ner.precision * 100
|
||||
|
||||
@property
|
||||
def ents_r(self):
|
||||
"""RETURNS (float): Named entity accuracy (recall)."""
|
||||
return self.ner.recall * 100
|
||||
|
||||
@property
|
||||
def ents_f(self):
|
||||
"""RETURNS (float): Named entity accuracy (F-score)."""
|
||||
return self.ner.fscore * 100
|
||||
|
||||
@property
|
||||
def scores(self):
|
||||
"""RETURNS (dict): All scores with keys `uas`, `las`, `ents_p`,
|
||||
`ents_r`, `ents_f`, `tags_acc` and `token_acc`.
|
||||
"""
|
||||
return {
|
||||
"uas": self.uas,
|
||||
"las": self.las,
|
||||
|
@ -84,9 +106,20 @@ class Scorer(object):
|
|||
"token_acc": self.token_acc,
|
||||
}
|
||||
|
||||
-def score(self, tokens, gold, verbose=False, punct_labels=("p", "punct")):
-if len(tokens) != len(gold):
-gold = GoldParse.from_annot_tuples(tokens, zip(*gold.orig_annot))
+def score(self, doc, gold, verbose=False, punct_labels=("p", "punct")):
+"""Update the evaluation scores from a single Doc / GoldParse pair.
+
+doc (Doc): The predicted annotations.
+gold (GoldParse): The correct annotations.
+verbose (bool): Print debugging information.
+punct_labels (tuple): Dependency labels for punctuation. Used to
+evaluate dependency attachments to punctuation if `eval_punct` is
+`True`.
+
+DOCS: https://spacy.io/api/scorer#score
+"""
+if len(doc) != len(gold):
+gold = GoldParse.from_annot_tuples(doc, zip(*gold.orig_annot))
|
||||
gold_deps = set()
|
||||
gold_tags = set()
|
||||
gold_ents = set(tags_to_entities([annot[-1] for annot in gold.orig_annot]))
|
||||
|
@ -96,7 +129,7 @@ class Scorer(object):
|
|||
gold_deps.add((id_, head, dep.lower()))
|
||||
cand_deps = set()
|
||||
cand_tags = set()
|
||||
-for token in tokens:
+for token in doc:
|
||||
if token.orth_.isspace():
|
||||
continue
|
||||
gold_i = gold.cand_to_gold[token.i]
|
||||
|
@ -116,7 +149,7 @@ class Scorer(object):
|
|||
cand_deps.add((gold_i, gold_head, token.dep_.lower()))
|
||||
if "-" not in [token[-1] for token in gold.orig_annot]:
|
||||
cand_ents = set()
|
||||
-for ent in tokens.ents:
+for ent in doc.ents:
|
||||
first = gold.cand_to_gold[ent.start]
|
||||
last = gold.cand_to_gold[ent.end - 1]
|
||||
if first is None or last is None:
|
||||
|
|
|
@ -119,8 +119,28 @@ Update the models in the pipeline.
|
|||
| `golds` | iterable | A batch of `GoldParse` objects or dictionaries. Dictionaries will be used to create [`GoldParse`](/api/goldparse) objects. For the available keys and their usage, see [`GoldParse.__init__`](/api/goldparse#init). |
|
||||
| `drop` | float | The dropout rate. |
|
||||
| `sgd` | callable | An optimizer. |
|
||||
| `losses` | dict | Dictionary to update with the loss, keyed by pipeline component. |
|
||||
| `component_cfg` <Tag variant="new">2.1</Tag> | dict | Config parameters for specific pipeline components, keyed by component name. |
|
||||
|
||||
## Language.evaluate {#evaluate tag="method"}
|
||||
|
||||
Evaluate a model's pipeline components.
|
||||
|
||||
> #### Example
|
||||
>
|
||||
> ```python
|
||||
> scorer = nlp.evaluate(docs_golds, verbose=True)
|
||||
> print(scorer.scores)
|
||||
> ```
|
||||
|
||||
| Name | Type | Description |
|
||||
| -------------------------------------------- | -------- | ------------------------------------------------------------------------------------- |
|
||||
| `docs_golds` | iterable | Tuples of `Doc` and `GoldParse` objects. |
|
||||
| `verbose` | bool | Print debugging information. |
|
||||
| `batch_size` | int | The batch size to use. |
|
||||
| `scorer` | `Scorer` | Optional [`Scorer`](/api/scorer) to use. If not passed in, a new one will be created. |
|
||||
| `component_cfg` <Tag variant="new">2.1</Tag> | dict | Config parameters for specific pipeline components, keyed by component name. |
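The `scorer` parameter follows a common default pattern: a new object is created only when the caller passes nothing, so the same scorer can be reused to accumulate results across calls. The sketch below illustrates that pattern together with batching. It is not spaCy's implementation; `CountingScorer`, `minibatch` and this `evaluate` function are hypothetical stand-ins.

```python
from itertools import islice


class CountingScorer:
    """Hypothetical stand-in for a scorer; it only counts scored pairs."""

    def __init__(self):
        self.n_scored = 0

    def score(self, doc, gold):
        self.n_scored += 1


def minibatch(items, size):
    """Yield lists of up to `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def evaluate(docs_golds, batch_size=256, scorer=None):
    # Create a fresh scorer only when the caller did not supply one.
    if scorer is None:
        scorer = CountingScorer()
    for batch in minibatch(docs_golds, batch_size):
        for doc, gold in batch:
            scorer.score(doc, gold)
    return scorer


# Placeholder (doc, gold) pairs; real code would pass Doc / GoldParse objects.
pairs = [("doc%d" % i, "gold%d" % i) for i in range(5)]

result = evaluate(pairs, batch_size=2)

# Passing the same scorer twice accumulates counts across both calls.
shared = CountingScorer()
evaluate(pairs, scorer=shared)
evaluate(pairs, scorer=shared)
```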
|
||||
|
||||
## Language.begin_training {#begin_training tag="method"}
|
||||
|
||||
Allocate models, pre-process training data and acquire an optimizer.
|
||||
|
|
58
website/docs/api/scorer.md
Normal file
|
@ -0,0 +1,58 @@
|
|||
---
|
||||
title: Scorer
|
||||
teaser: Compute evaluation scores
|
||||
tag: class
|
||||
source: spacy/scorer.py
|
||||
---
|
||||
|
||||
The `Scorer` computes and stores evaluation scores. It's typically created by
|
||||
[`Language.evaluate`](/api/language#evaluate).
|
||||
|
||||
## Scorer.\_\_init\_\_ {#init tag="method"}
|
||||
|
||||
Create a new `Scorer`.
|
||||
|
||||
> #### Example
|
||||
>
|
||||
> ```python
|
||||
> from spacy.scorer import Scorer
|
||||
>
|
||||
> scorer = Scorer()
|
||||
> ```
|
||||
|
||||
| Name | Type | Description |
|
||||
| ------------ | -------- | ------------------------------------------------------------ |
|
||||
| `eval_punct` | bool | Evaluate the dependency attachments to and from punctuation. |
|
||||
| **RETURNS** | `Scorer` | The newly created object. |
|
||||
|
||||
## Scorer.score {#score tag="method"}
|
||||
|
||||
Update the evaluation scores from a single [`Doc`](/api/doc) /
|
||||
[`GoldParse`](/api/goldparse) pair.
|
||||
|
||||
> #### Example
|
||||
>
|
||||
> ```python
|
||||
> scorer = Scorer()
|
||||
> scorer.score(doc, gold)
|
||||
> ```
|
||||
|
||||
| Name | Type | Description |
|
||||
| -------------- | ----------- | -------------------------------------------------------------------------------------------------------------------- |
|
||||
| `doc` | `Doc` | The predicted annotations. |
|
||||
| `gold` | `GoldParse` | The correct annotations. |
|
||||
| `verbose` | bool | Print debugging information. |
|
||||
| `punct_labels` | tuple | Dependency labels for punctuation. Used to evaluate dependency attachments to punctuation if `eval_punct` is `True`. |
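To illustrate how `punct_labels` and `eval_punct` might interact, the sketch below drops dependency triples whose label is a punctuation label unless punctuation is being evaluated. This is an assumption-laden illustration, not the library's code: `filter_deps` and the `(token_id, head_id, label)` triples are hypothetical.

```python
def filter_deps(deps, eval_punct=False, punct_labels=("p", "punct")):
    """Keep (token_id, head_id, label) triples, dropping punctuation
    attachments unless eval_punct is True. Hypothetical helper."""
    if eval_punct:
        return set(deps)
    return {dep for dep in deps if dep[2].lower() not in punct_labels}


# Gold and predicted attachments differ only on the punctuation token.
gold = {(0, 1, "nsubj"), (1, 1, "root"), (2, 1, "punct")}
cand = {(0, 1, "nsubj"), (1, 1, "root"), (2, 0, "punct")}

# Without punctuation, both remaining predictions are correct.
correct_without_punct = filter_deps(gold) & filter_deps(cand)

# With punctuation, the wrong attachment counts against the prediction.
correct_with_punct = filter_deps(gold, True) & filter_deps(cand, True)
n_cand_with_punct = len(filter_deps(cand, True))
```

In this toy case the prediction is fully correct when punctuation is ignored, and two out of three attachments are correct when it is included.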
|
||||
|
||||
## Properties
|
||||
|
||||
| Name | Type | Description |
|
||||
| ----------- | ----- | -------------------------------------------------------------------------------------------- |
|
||||
| `token_acc` | float | Tokenization accuracy. |
|
||||
| `tags_acc` | float | Part-of-speech tag accuracy (fine grained tags, i.e. `Token.tag`). |
|
||||
| `uas` | float | Unlabelled dependency score. |
|
||||
| `las` | float | Labelled dependency score. |
|
||||
| `ents_p` | float | Named entity accuracy (precision). |
|
||||
| `ents_r` | float | Named entity accuracy (recall). |
|
||||
| `ents_f` | float | Named entity accuracy (F-score). |
|
||||
| `scores` | dict | All scores with keys `uas`, `las`, `ents_p`, `ents_r`, `ents_f`, `tags_acc` and `token_acc`. |
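The entity properties above are precision, recall and F-score values scaled to percentages. As a minimal sketch of how such values can be derived from sets of predicted and gold annotations, assuming a hypothetical `PRFCounter` class and made-up entity spans (this is not the `PRFScore` class from `spacy/scorer.py`):

```python
class PRFCounter:
    """Hypothetical counter for precision, recall and F-score."""

    def __init__(self):
        self.tp = 0  # true positives: predicted and in gold
        self.fp = 0  # false positives: predicted but not in gold
        self.fn = 0  # false negatives: in gold but not predicted

    def score_set(self, cand, gold):
        self.tp += len(cand & gold)
        self.fp += len(cand - gold)
        self.fn += len(gold - cand)

    @property
    def precision(self):
        total = self.tp + self.fp
        return self.tp / total if total else 0.0

    @property
    def recall(self):
        total = self.tp + self.fn
        return self.tp / total if total else 0.0

    @property
    def fscore(self):
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


# Made-up (label, start, end) entity spans for illustration.
gold_ents = {("ORG", 0, 1), ("GPE", 3, 4), ("DATE", 6, 8)}
cand_ents = {("ORG", 0, 1), ("GPE", 3, 5)}

ner = PRFCounter()
ner.score_set(cand_ents, gold_ents)

# Scale to percentages, mirroring the `* 100` in the properties.
scores = {
    "ents_p": ner.precision * 100,
    "ents_r": ner.recall * 100,
    "ents_f": ner.fscore * 100,
}
```

Here one of two predictions matches one of three gold entities, so precision is higher than recall and the F-score falls between the two.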
|
|
@ -90,7 +90,8 @@
|
|||
{ "text": "StringStore", "url": "/api/stringstore" },
|
||||
{ "text": "Vectors", "url": "/api/vectors" },
|
||||
{ "text": "GoldParse", "url": "/api/goldparse" },
|
||||
-{ "text": "GoldCorpus", "url": "/api/goldcorpus" }
+{ "text": "GoldCorpus", "url": "/api/goldcorpus" },
+{ "text": "Scorer", "url": "/api/scorer" }
|
||||
]
|
||||
},
|
||||
{
|
||||
|
|