spaCy/examples/README.md
2021-03-12 08:07:20 +01:00

131 lines
5.4 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>
# spaCy examples
For spaCy v3 we've converted many of the [v2 example
scripts](https://github.com/explosion/spaCy/tree/v2.3.x/examples/) into
end-to-end [spacy projects](https://spacy.io/usage/projects) workflows. The
workflows include all the steps to go from data to packaged spaCy models.
## 🪐 Pipeline component demos
The simplest demos for training a single pipeline component are in the
[`pipelines`](https://github.com/explosion/projects/blob/v3/pipelines) category
including:
- [`pipelines/ner_demo`](https://github.com/explosion/projects/blob/v3/pipelines/ner_demo):
Train a named entity recognizer
- [`pipelines/textcat_demo`](https://github.com/explosion/projects/blob/v3/pipelines/textcat_demo):
Train a text classifier
- [`pipelines/parser_intent_demo`](https://github.com/explosion/projects/blob/v3/pipelines/parser_intent_demo):
Train a dependency parser for custom semantics
## 🪐 Tutorials
The [`tutorials`](https://github.com/explosion/projects/blob/v3/tutorials)
category includes examples that work through specific NLP use cases end-to-end:
- [`tutorials/textcat_goemotions`](https://github.com/explosion/projects/blob/v3/tutorials/textcat_goemotions):
Train a text classifier to categorize emotions in Reddit posts
- [`tutorials/nel_emerson`](https://github.com/explosion/projects/blob/v3/tutorials/nel_emerson):
Use an entity linker to disambiguate mentions of the same name
Check out the [projects documentation](https://spacy.io/usage/projects) and
browse through the [available
projects](https://github.com/explosion/projects/)!
## 🚀 Get started with a demo project
The
[`pipelines/ner_demo`](https://github.com/explosion/projects/blob/v3/pipelines/ner_demo)
project converts the spaCy v2
[`train_ner.py`](https://github.com/explosion/spaCy/blob/v2.3.x/examples/training/train_ner.py)
demo script into a spaCy v3 project.
1. Clone the project:
```bash
python -m spacy project clone pipelines/ner_demo
```
2. Install requirements and download any data assets:
```bash
cd ner_demo
python -m pip install -r requirements.txt
python -m spacy project assets
```
3. Run the default workflow to convert, train and evaluate:
```bash
python -m spacy project run all
```
Sample output:
```none
Running workflow 'all'
================================== convert ==================================
Running command: /home/user/venv/bin/python scripts/convert.py en assets/train.json corpus/train.spacy
Running command: /home/user/venv/bin/python scripts/convert.py en assets/dev.json corpus/dev.spacy
=============================== create-config ===============================
Running command: /home/user/venv/bin/python -m spacy init config --lang en --pipeline ner configs/config.cfg --force
Generated config template specific for your use case
- Language: en
- Pipeline: ner
- Optimize for: efficiency
- Hardware: CPU
- Transformer: None
✔ Auto-filled config with all values
✔ Saved config
configs/config.cfg
You can now add your data and train your pipeline:
python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy
=================================== train ===================================
Running command: /home/user/venv/bin/python -m spacy train configs/config.cfg --output training/ --paths.train corpus/train.spacy --paths.dev corpus/dev.spacy --training.eval_frequency 10 --training.max_steps 100 --gpu-id -1
Using CPU
=========================== Initializing pipeline ===========================
[2021-03-11 19:34:59,101] [INFO] Set up nlp object from config
[2021-03-11 19:34:59,109] [INFO] Pipeline: ['tok2vec', 'ner']
[2021-03-11 19:34:59,113] [INFO] Created vocabulary
[2021-03-11 19:34:59,113] [INFO] Finished initializing nlp object
[2021-03-11 19:34:59,265] [INFO] Initialized pipeline components: ['tok2vec', 'ner']
✔ Initialized pipeline
============================= Training pipeline =============================
Pipeline: ['tok2vec', 'ner']
Initial learn rate: 0.001
E # LOSS TOK2VEC LOSS NER ENTS_F ENTS_P ENTS_R SCORE
--- ------ ------------ -------- ------ ------ ------ ------
0 0 0.00 7.90 0.00 0.00 0.00 0.00
10 10 0.11 71.07 0.00 0.00 0.00 0.00
20 20 0.65 22.44 50.00 50.00 50.00 0.50
30 30 0.22 6.38 80.00 66.67 100.00 0.80
40 40 0.00 0.00 80.00 66.67 100.00 0.80
50 50 0.00 0.00 80.00 66.67 100.00 0.80
60 60 0.00 0.00 100.00 100.00 100.00 1.00
70 70 0.00 0.00 100.00 100.00 100.00 1.00
80 80 0.00 0.00 100.00 100.00 100.00 1.00
90 90 0.00 0.00 100.00 100.00 100.00 1.00
100 100 0.00 0.00 100.00 100.00 100.00 1.00
✔ Saved pipeline to output directory
training/model-last
```
4. Package the model:
```bash
python -m spacy project run package
```
5. Visualize the model's output with [Streamlit](https://streamlit.io):
```bash
python -m spacy project run visualize-model
```