Commit Graph

15672 Commits

Author SHA1 Message Date
David Berenstein
ed2ac34a8a
added Concise Concepts to spaCy universe (#10499)
* Update universe.json

added classy-classification to Spacy universe

* Update universe.json

added classy-classification to the spacy universe resources

* Update universe.json

corrected a small typo in json

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update universe.json

processed merge feedback

* Update universe.json

* updated information for Classy Classification

Made a clearer, easier-to-follow description for Classy Classification based on feedback from Philip Vollet, to prepare for sharing.

* added note about examples

* corrected for wrong formatting changes

* Update website/meta/universe.json with small typo correction

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* resolved another typo

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* added Concise Concepts package to spaCy universe.

* updated example code Concise Concepts

* updated description for Concise Concepts

* updated PR with more visually appealing examples

Shout-out to koaning for the suggestions.

* corrected small JSON typos in Concise Concepts

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-03-24 18:00:12 +01:00
Kádár Ákos
83ac0477c8 remove useless extra prefix and device from spanpredictor 2022-03-24 16:44:50 +01:00
Kádár Ákos
1c5dabcb47 merge SpanPredictor attributes 2022-03-24 16:23:12 +01:00
Kádár Ákos
a872c69ffb merge 2022-03-24 16:10:04 +01:00
Kádár Ákos
706b2e6f25 gearing up SpanPredictor for gold-heads 2022-03-24 16:06:20 +01:00
Adriane Boyd
3711af74e5
Add tokenizer option to allow Matcher handling for all rules (#10452)
* Add tokenizer option to allow Matcher handling for all rules

Add tokenizer option `with_faster_rules_heuristics` that determines
whether the special cases applied by the internal `Matcher` are filtered
by whether they contain affixes or space. If `True` (default), the rules
are filtered to prioritize speed over rare edge cases. If `False`, all
rules are included in the final `Matcher`-based pass over the doc.

* Reset all caches when reloading special cases

* Revert "Reset all caches when reloading special cases"

This reverts commit 4ef6bd171d.

* Initialize max_length properly

* Add new tag to API docs

* Rename to faster heuristics
2022-03-24 13:21:32 +01:00
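A minimal, hypothetical model of the heuristic described in #10452, assuming the option is exposed as a boolean `faster_heuristics` (following the commit's final rename). When it is on, only special cases that contain an affix or a space are kept for the final `Matcher`-based pass; when it is off, every rule is kept. The function name and the affix regexes are illustrative assumptions, not spaCy's internals.

```python
import re
from typing import Dict, List


def rules_for_matcher_pass(
    special_cases: Dict[str, List[str]],
    prefix_re: re.Pattern,
    suffix_re: re.Pattern,
    infix_re: re.Pattern,
    faster_heuristics: bool = True,
) -> List[str]:
    """Return the special-case strings kept for the final Matcher-based pass.

    Hypothetical, simplified model of the heuristic described in #10452.
    """
    if not faster_heuristics:
        # Complete but slower: every rule goes into the final pass.
        return sorted(special_cases)
    kept = []
    for text in special_cases:
        has_affix = bool(
            prefix_re.search(text) or suffix_re.search(text) or infix_re.search(text)
        )
        # Fast path: only rules containing affixes or a space are kept.
        if has_affix or " " in text:
            kept.append(text)
    return sorted(kept)


# Hypothetical affix patterns and rules, for illustration only.
PREFIX = re.compile(r"^[\(\"']")
SUFFIX = re.compile(r"[\)\.\"']$")
INFIX = re.compile(r"-")
CASES = {"gonna": ["gon", "na"], ":)": [":)"], "New York": ["New York"]}
```

With the fast path, `"gonna"` is dropped because it has neither an affix nor a space; rare edge cases like it are what the `False` setting trades speed to cover.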
Adriane Boyd
31a5d99efa
Maintain support for empty DocBin span groups (#10538) 2022-03-24 11:51:07 +01:00
Daniël de Kok
2ff197603e
matcher: remove an undefined behavior (#10537)
Indexing into a zero-length std::vector is undefined behavior.
2022-03-24 11:48:22 +01:00
Adriane Boyd
d85117f88c
Stream large assets on download (#10521)
Stream large assets on download rather than reading the whole file at
once and potentially running into `urllib3` limits on single read sizes.
2022-03-24 11:47:05 +01:00
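The idea behind #10521 can be sketched with the standard library: read the source in bounded chunks instead of one large `read()`. This is an illustrative sketch, not the patch itself; `stream_copy` is a hypothetical helper, and the stdlib `shutil.copyfileobj(src, dst, length)` performs the same loop.

```python
import io
from typing import BinaryIO


def stream_copy(src: BinaryIO, dst: BinaryIO, chunk_size: int = 1024 * 1024) -> int:
    """Copy src to dst in bounded chunks; return the number of bytes copied."""
    total = 0
    while True:
        # Each read() is capped at chunk_size, so no single read has to hold
        # the whole asset in memory or hit a per-read size limit.
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total


payload = b"x" * 2500
out = io.BytesIO()
copied = stream_copy(io.BytesIO(payload), out, chunk_size=1000)
```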
Adriane Boyd
e908a67829
Handle unknown tags in KoreanTokenizer tag map (#10536) 2022-03-24 11:25:36 +01:00
Kádár Ákos
150e7c46d7 conflict 2022-03-23 11:27:02 +01:00
Kádár Ákos
1eaf8fb0cf span predictor debug start 2022-03-23 11:24:27 +01:00
Paul O'Leary McCann
eec00ce60d Fix various sizes in SpanPredictor FFNN 2022-03-23 16:20:31 +09:00
Adriane Boyd
c17980e535
Save vectors as little endian, load with Ops.asarray (#10201)
* Save vectors as little endian, load with Ops.asarray

* Always save vector data as little endian
* Always run `Vectors.to_ops` when vector data is loaded so that
  `Ops.asarray` can be used to load the data correctly for the current
  ops.

* Update spacy/vectors.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/vectors.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-03-21 14:24:46 +01:00
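Why a fixed byte order helps: if the on-disk format is always little endian, the loader can interpret the bytes the same way on any host. The sketch below uses `struct` purely for illustration (spaCy itself works with array data and thinc's `Ops.asarray`); `save_vector_data` and `load_vector_data` are hypothetical names.

```python
import struct
from typing import List, Sequence


def save_vector_data(values: Sequence[float]) -> bytes:
    # "<" forces little-endian byte order and standard 4-byte float32 size,
    # independent of the host machine's native byte order.
    return struct.pack(f"<{len(values)}f", *values)


def load_vector_data(data: bytes) -> List[float]:
    # The on-disk layout is fixed, so the loader always knows how to read it.
    count = len(data) // 4
    return list(struct.unpack(f"<{count}f", data))
```

For example, `save_vector_data([1.0])` yields `b"\x00\x00\x80\x3f"` (float32 `0x3F800000` with its least significant byte first) on both little- and big-endian machines.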
Basile Dura
107bab56b5
docs: add EDS-NLP to spaCy universe (#10489)
* docs: add EDS-NLP to spaCy universe

* fix: remove "standalone" tag for EDS-NLP

Co-authored-by: Basile Dura <basile.dura-ext@aphp.fr>
2022-03-21 11:03:39 +01:00
github-actions[bot]
bf1cf77a5b
Auto-format code with black (#10518)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-03-21 09:21:24 +01:00
Paul O'Leary McCann
2190cbc0e6 Add progress on SpanPredictor component
This isn't working. There is a CUDA error in the torch code during
initialization and it's not clear why.
2022-03-19 19:39:49 +09:00
Kádár Ákos
db422abf01 remove unnecessary .device 2022-03-18 16:24:26 +01:00
Adriane Boyd
04f3f414d1
Update pytest to forbid ==7.1.0, allow >=7.1.1 (#10519) 2022-03-18 13:43:54 +01:00
Paul O'Leary McCann
a098849112 Add fake batching
The way fake batching works is that the pipeline component calls the
model repeatedly in a loop internally. It feels like this should break
something, but it worked in testing.

Another issue is that this changes the signature of some of the pipeline
functions, though I don't think that's an issue.

Tested with a batch size of 2, so more testing is needed, but this is a
start.
2022-03-18 19:46:58 +09:00
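The "fake batching" pattern described above, where a component accepts a batch but calls a single-document model once per doc in an internal loop, can be sketched as follows. `fake_batched_predict` and `toy_model` are hypothetical stand-ins, not the coref component's code.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def fake_batched_predict(model: Callable[[T], R], docs: Sequence[T]) -> List[R]:
    """Run a model that only accepts one doc at a time over a whole batch."""
    # Loop internally, calling the model once per doc, and collect the
    # per-doc outputs in input order.
    return [model(doc) for doc in docs]


calls = []


def toy_model(doc: str) -> int:
    # Hypothetical single-doc model: records each call and returns the length.
    calls.append(doc)
    return len(doc)


outputs = fake_batched_predict(toy_model, ["one doc", "another"])
```

The caller sees a batched interface, but the model still runs sequentially, so this avoids changing the model's shape assumptions at the cost of per-doc overhead.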
Lj Miranda
0b02dc4c57
Fix mixed-up parameters for spacy-conll (#10516) 2022-03-18 08:56:21 +01:00
Grey Murav
3ff5a6a5c0
Extend list of _num_words (#10468) 2022-03-16 18:25:42 +01:00
Lj Miranda
a79cd3542b
Add displacy support for overlapping Spans (#10332)
* Fix docstring for EntityRenderer

* Add warning in displacy if doc.spans are empty

* Implement parse_spans converter

One notable change here is that the default spans_key is sc, and
it can be overridden by the user through the options.

* Implement SpanRenderer

Here, I implemented a SpanRenderer that looks similar to the
EntityRenderer except for some templates. The spans_key, by default, is
set to sc, but can be configured in the options (see parse_spans). The
way I rendered these spans is per-token, i.e., I first check whether each
token (1) belongs to a given span type and (2) is the starting token of a
given span type. Once I have this information, I render them into the
markup.

* Fix mypy issues on typing

* Add tests for displacy spans support

* Update colors from RGB to hex

Co-authored-by: Ines Montani <ines@ines.io>

* Remove unnecessary CSS properties

* Add documentation for website

* Remove unnecessary scripts

* Update wording on the documentation

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Put typing dependency on top of file

* Put back z-index so that spans overlap properly

* Make warning more explicit for spans_key

Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-03-16 18:14:34 +01:00
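A simplified sketch of the two steps described in #10332: pick the span group from `options["spans_key"]` (defaulting to `"sc"`) with a warning when it is empty, then compute per-token information about which spans cover each token and whether the token starts them. Spans are modeled as hypothetical `(start, end, label)` token offsets; `select_spans` and `per_token_markup` are illustrative names, not displaCy's API.

```python
import warnings
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified span: (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def select_spans(span_groups: Dict[str, List[Span]], options: Optional[dict] = None) -> List[Span]:
    # Default key "sc", overridable through the options dict.
    key = (options or {}).get("spans_key", "sc")
    spans = span_groups.get(key, [])
    if not spans:
        warnings.warn(f"No spans found under key {key!r}; nothing to render.")
    return spans


def per_token_markup(tokens: List[str], spans: List[Span]) -> List[dict]:
    """For each token, record which spans cover it and whether it starts them."""
    rows = []
    for i, text in enumerate(tokens):
        covering = [
            {"label": label, "is_start": i == start}
            for start, end, label in spans
            if start <= i < end
        ]
        rows.append({"text": text, "spans": covering})
    return rows


tokens = ["Welcome", "to", "the", "Bank", "of", "China"]
spans = select_spans({"sc": [(3, 6, "ORG"), (5, 6, "GPE")]})
rows = per_token_markup(tokens, spans)
```

Because every token carries the full list of covering spans, overlapping spans (here `ORG` and `GPE` on "China") can be stacked in the markup rather than one replacing the other.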
Paul O'Leary McCann
1a79d18796 Formatting 2022-03-16 20:10:47 +09:00
Paul O'Leary McCann
6855df0e66 Skeleton for span predictor component
This should be moved into its own file, but for now just stubbing out
the methods.
2022-03-16 20:09:33 +09:00
Paul O'Leary McCann
0275ae29de Remove stale comment 2022-03-16 20:09:12 +09:00
Paul O'Leary McCann
6974f55daa Hack for transformer listener size 2022-03-16 15:15:53 +09:00
Paul O'Leary McCann
7811a1194b Change architecture 2022-03-16 14:57:15 +09:00
Paul O'Leary McCann
5650853c0f Remove unused functions 2022-03-16 14:38:11 +09:00
David Berenstein
e021dc6279
Updated explanation for classy classification (#10484)
* Update universe.json

added classy-classification to Spacy universe

* Update universe.json

added classy-classification to the spacy universe resources

* Update universe.json

corrected a small typo in json

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update universe.json

processed merge feedback

* Update universe.json

* updated information for Classy Classification

Made a clearer, easier-to-follow description for Classy Classification based on feedback from Philip Vollet, to prepare for sharing.

* added note about examples

* corrected for wrong formatting changes

* Update website/meta/universe.json with small typo correction

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* resolved another typo

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-03-15 16:42:33 +01:00
Daniël de Kok
e5debc68e4
Tagger: use unnormalized probabilities for inference (#10197)
* Tagger: use unnormalized probabilities for inference

Using unnormalized softmax avoids use of the relatively expensive exp function,
which can significantly speed up non-transformer models (e.g. I got a speedup
of 27% on a German tagging + parsing pipeline).

* Add spacy.Tagger.v2 with configurable normalization

Normalization of probabilities is disabled by default to improve
performance.

* Update documentation, models, and tests to spacy.Tagger.v2

* Move Tagger.v1 to spacy-legacy

* docs/architectures: run prettier

* Unnormalized softmax is now a Softmax_v2 option

* Require thinc 8.0.14 and spacy-legacy 3.0.9
2022-03-15 14:15:31 +01:00
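Why skipping normalization is safe for inference: softmax preserves the ordering of the scores, so the highest-scoring tag is the same with or without it, and leaving it out avoids the `exp` calls. The sketch below is illustrative pure Python; the `normalize` flag and function names are our own and only mirror the idea of the configurable normalization in `spacy.Tagger.v2`.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability; exp() is the costly part.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


def predict_tag(logits: Sequence[float], normalize: bool = False) -> int:
    # Softmax never changes which index is largest, so skipping it leaves
    # the predicted tag unchanged while avoiding exp().
    scores = softmax(logits) if normalize else list(logits)
    return argmax(scores)


logits = [1.2, -0.3, 3.4, 0.0]
```

Normalized probabilities are still needed when callers consume the scores themselves (e.g. for confidence thresholds), which is why normalization stays available as an option.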
Paul O'Leary McCann
d0ae2590db Delete all the coref-hoi code 2022-03-15 20:05:24 +09:00
Paul O'Leary McCann
abdc7d87af Clean up util code
Moved everything into coref_util.py, deleted wl-specific file.
2022-03-15 19:59:44 +09:00
Paul O'Leary McCann
55039a66ad Remove old default config 2022-03-15 19:53:09 +09:00
Paul O'Leary McCann
17d017a177 Remove span2head
This doesn't work as a component because it needs to modify gold data,
so instead it's a conversion script (in another repo).
2022-03-15 19:52:20 +09:00
Paul O'Leary McCann
0522a43116 Make span2head component 2022-03-15 19:19:15 +09:00
Adriane Boyd
e8357923ec
Various install docs updates (#10487)
* Simplify quickstart source install to use only editable pip install

* Update pytorch install instructions to more recent versions
2022-03-15 11:12:50 +01:00
vincent d warmerdam
610001e8c7
Update universe.json (#10490)
The project moved away from Rasa and into my personal GitHub account.
2022-03-15 11:12:04 +01:00
Adriane Boyd
0dc454ba95
Update docs for Vocab.get_vector (#10486)
* Update docs for Vocab.get_vector

* Clarify description of 0-vector dimensions
2022-03-15 09:10:47 +01:00
Edward
2eef47dd26
Save span candidates produced by spancat suggesters (#10413)
* Add save_candidates attribute

* Change spancat api

* Add unit test

* reimplement method to produce a list of docs

* Add method to docs

* Add new version tag

* Add intended use to docstring

* prettier formatting
2022-03-14 16:46:58 +01:00
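A hypothetical sketch of what saving suggester output involves: enumerate candidate spans (here with an n-gram suggester, one common spancat strategy) and store them under a separate key so they can be inspected after the fact. The names `ngram_candidates`, `save_candidates`, and the `"candidates"` key are illustrative assumptions, not the API added in #10413.

```python
from typing import Dict, List, Sequence, Tuple

Offsets = Tuple[int, int]  # (start_token, end_token_exclusive)


def ngram_candidates(n_tokens: int, sizes: Sequence[int]) -> List[Offsets]:
    """Enumerate (start, end) token offsets for every n-gram of the given sizes."""
    out = []
    for size in sizes:
        for start in range(0, n_tokens - size + 1):
            out.append((start, start + size))
    return out


def save_candidates(
    span_groups: Dict[str, List[Offsets]],
    candidates: Sequence[Offsets],
    key: str = "candidates",
) -> None:
    # Keep suggestions under their own key, separate from predicted spans.
    span_groups[key] = list(candidates)


groups: Dict[str, List[Offsets]] = {}
save_candidates(groups, ngram_candidates(3, [1, 2]))
```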
Edward
b68bf43f5b
Add spans to doc.to_json (#10073)
* Add spans to to_json

* adjustments to_json

* Change docstring

* change doc key naming

* Update spacy/tokens/doc.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-03-14 15:47:57 +01:00
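A minimal sketch of serializing span groups alongside the text, assuming each span is recorded by character offsets and label. The field names (`"spans"`, `"start"`, `"end"`, `"label"`) and `spans_to_json` are assumptions for illustration; check the `Doc.to_json` reference for the exact schema added in #10073.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical span record: (start_char, end_char, label)
CharSpan = Tuple[int, int, str]


def spans_to_json(text: str, span_groups: Dict[str, List[CharSpan]]) -> dict:
    data: dict = {"text": text, "spans": {}}
    for key, spans in span_groups.items():
        # One list of JSON-friendly dicts per span group key.
        data["spans"][key] = [
            {"start": start, "end": end, "label": label} for start, end, label in spans
        ]
    return data


out = spans_to_json("Bank of China", {"sc": [(0, 13, "ORG"), (8, 13, "GPE")]})
encoded = json.dumps(out)
```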
Sofie Van Landeghem
23bc93d3d2
limit pytest to <7.1 (#10488)
* limit pytest to <7.1

* 7.1.0
2022-03-14 15:17:22 +01:00
Paul O'Leary McCann
e6917d8dc4 Add util functions for wl-coref 2022-03-14 19:27:55 +09:00
Paul O'Leary McCann
dfec6993d6 Training works now 2022-03-14 19:27:23 +09:00
Paul O'Leary McCann
8eadf3781b Training runs now
Evaluation needs fixing, and code still needs cleanup.
2022-03-14 19:02:17 +09:00
Lj Miranda
6af6c2e86c
Add a note to the dev docs on mypy (#10485) 2022-03-14 09:41:31 +01:00
Paul O'Leary McCann
d22a002641 Forward/backward pass works
Evaluate does not work - predict hasn't been updated
2022-03-14 17:26:27 +09:00
github-actions[bot]
1bbf232074
Auto-format code with black (#10479)
* Auto-format code with black

* Update spacy/lang/hsb/lex_attrs.py

Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-03-11 12:20:23 +01:00
Adriane Boyd
297dd82c86
Fix initial special cases for Tokenizer.explain (#10460)
Add the missing initial check for special cases to `Tokenizer.explain`
to align with `Tokenizer._tokenize_affixes`.
2022-03-11 10:50:47 +01:00
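Why the initial special-case check matters: if a whole substring is a special case, it must be matched before any affix splitting, otherwise an explanation can disagree with the real tokenization. This toy `explain` handles only suffixes and uses simplified labels; it is a hypothetical model, not `Tokenizer.explain` itself.

```python
import re
from typing import Dict, List, Tuple


def explain(text: str, special_cases: Dict[str, List[str]], suffix_re: re.Pattern) -> List[Tuple[str, str]]:
    tokens: List[Tuple[str, str]] = []
    for substring in text.split():
        # Initial check: a whole-substring special case wins before any splitting.
        if substring in special_cases:
            tokens.extend(("SPECIAL", piece) for piece in special_cases[substring])
            continue
        suffixes: List[Tuple[str, str]] = []
        while True:
            m = suffix_re.search(substring)
            # Stop if nothing matches or the match would consume the whole token.
            if not m or m.start() == 0:
                break
            suffixes.insert(0, ("SUFFIX", substring[m.start():]))
            substring = substring[: m.start()]
        tokens.append(("TOKEN", substring))
        tokens.extend(suffixes)
    return tokens


SUFFIX_RE = re.compile(r"[.!]$")
```

Without the initial check, `"a.m."` in `explain("at 9 a.m.", {"a.m.": ["a.m."]}, SUFFIX_RE)` would be explained as `TOKEN "a.m"` plus `SUFFIX "."` instead of a single special-case token.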
Paul O'Leary McCann
c4f9c24738 The coref model is able to be loaded
The span predictor component is initialized but not used at all now.
Plan is to work on it after the word level clustering part is trainable
end-to-end.
2022-03-09 19:31:11 +09:00