From 81efde0ce401f3004e23fc1ed794d445b8cafa51 Mon Sep 17 00:00:00 2001 From: Adriane Boyd Date: Thu, 11 Mar 2021 19:49:46 +0100 Subject: [PATCH] Add examples README --- examples/README.md | 130 ++++++++++++++++++++++++++++++++++++ examples/training/README.md | 5 ++ 2 files changed, 135 insertions(+) create mode 100644 examples/README.md create mode 100644 examples/training/README.md diff --git a/examples/README.md b/examples/README.md new file mode 100644 index 000000000..23ff59acd --- /dev/null +++ b/examples/README.md @@ -0,0 +1,130 @@ + + +# spaCy examples + +For spaCy v3 we've converted many of the [v2 example +scripts](https://github.com/explosion/spaCy/tree/v2.3.x/examples/) into +end-to-end [spacy projects](https://spacy.io/usage/projects) workflows. The +workflows include all the steps to go from data to packaged spaCy models. + +## 🪐 Pipeline component demos + +The simplest demos for training a single pipeline component are in the +[`pipelines`](https://github.com/explosion/projects/blob/v3/pipelines) category +including: + +- [`pipelines/ner_demo`](https://github.com/explosion/projects/blob/v3/pipelines/ner_demo): + Train a named entity recognizer +- [`pipelines/textcat_demo`](https://github.com/explosion/projects/blob/v3/pipelines/textcat_demo): + Train a text classifier +- [`pipelines/parser_intent_demo`](https://github.com/explosion/projects/blob/v3/pipelines/parser_intent_demo): + Train a dependency parser for custom semantics + +## 🪐 Tutorials + +The [`tutorials`](https://github.com/explosion/projects/blob/v3/tutorials) +category includes examples that work through specific NLP use cases end-to-end: + +- [`tutorials/textcat_goemotions`](https://github.com/explosion/projects/blob/v3/tutorials/textcat_goemotions): + Train a text classifier to categorize emotions in Reddit posts +- [`tutorials/nel_emerson`](https://github.com/explosion/projects/blob/v3/tutorials/nel_emerson): + Use an entity linker to disambiguate mentions of the same name + +Check out the [projects documentation](https://spacy.io/usage/projects) and +browse through the [available +projects](https://github.com/explosion/projects/)! + +## 🚀 Get started with a demo project + +The +[`pipelines/ner_demo`](https://github.com/explosion/projects/blob/v3/pipelines/ner_demo) +project converts the spaCy v2 +[`train_ner.py`](https://github.com/explosion/spaCy/blob/v2.3.x/examples/training/train_ner.py) +demo script into a spaCy v3 project. + +1. Clone the project: + + ```bash + python -m spacy project clone pipelines/ner_demo + ``` + +2. Install requirements and download any data assets: + + ```bash + cd ner_demo + python -m pip install -r requirements.txt + python -m spacy project assets + ``` + +3. Run the default workflow to convert, train and evaluate: + + ```bash + python -m spacy project run all + ``` + + Sample output: + + ```none + ℹ Running workflow 'all' + + ================================== convert ================================== + Running command: /home/user/venv/bin/python scripts/convert.py en assets/train.json corpus/train.spacy + Running command: /home/user/venv/bin/python scripts/convert.py en assets/dev.json corpus/dev.spacy + + =============================== create-config =============================== + Running command: /home/user/venv/bin/python -m spacy init config --lang en --pipeline ner configs/config.cfg --force + ℹ Generated config template specific for your use case + - Language: en + - Pipeline: ner + - Optimize for: efficiency + - Hardware: CPU + - Transformer: None + ✔ Auto-filled config with all values + ✔ Saved config + configs/config.cfg + You can now add your data and train your pipeline: + python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy + + =================================== train =================================== + Running command: /home/user/venv/bin/python -m spacy train configs/config.cfg --output training/ --paths.train corpus/train.spacy --paths.dev corpus/dev.spacy --training.eval_frequency 10 --training.max_steps 100 --gpu-id -1 + ℹ Using CPU + + =========================== Initializing pipeline =========================== + [2021-03-11 19:34:59,101] [INFO] Set up nlp object from config + [2021-03-11 19:34:59,109] [INFO] Pipeline: ['tok2vec', 'ner'] + [2021-03-11 19:34:59,113] [INFO] Created vocabulary + [2021-03-11 19:34:59,113] [INFO] Finished initializing nlp object + [2021-03-11 19:34:59,265] [INFO] Initialized pipeline components: ['tok2vec', 'ner'] + ✔ Initialized pipeline + + ============================= Training pipeline ============================= + ℹ Pipeline: ['tok2vec', 'ner'] + ℹ Initial learn rate: 0.001 + E # LOSS TOK2VEC LOSS NER ENTS_F ENTS_P ENTS_R SCORE + --- ------ ------------ -------- ------ ------ ------ ------ + 0 0 0.00 7.90 0.00 0.00 0.00 0.00 + 10 10 0.11 71.07 0.00 0.00 0.00 0.00 + 20 20 0.65 22.44 50.00 50.00 50.00 0.50 + 30 30 0.22 6.38 80.00 66.67 100.00 0.80 + 40 40 0.00 0.00 80.00 66.67 100.00 0.80 + 50 50 0.00 0.00 80.00 66.67 100.00 0.80 + 60 60 0.00 0.00 100.00 100.00 100.00 1.00 + 70 70 0.00 0.00 100.00 100.00 100.00 1.00 + 80 80 0.00 0.00 100.00 100.00 100.00 1.00 + 90 90 0.00 0.00 100.00 100.00 100.00 1.00 + 100 100 0.00 0.00 100.00 100.00 100.00 1.00 + ✔ Saved pipeline to output directory + training/model-last + ``` + +4. Package the model: + + ```bash + python -m spacy project run package + ``` + +5. Visualize the model's output with [Streamlit](https://streamlit.io): + + ```bash + python -m spacy project run visualize-model + ``` diff --git a/examples/training/README.md b/examples/training/README.md new file mode 100644 index 000000000..34689ceb6 --- /dev/null +++ b/examples/training/README.md @@ -0,0 +1,5 @@ + + +# spaCy examples + +See [examples/README.md](../README.md)