spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-11 20:28:20 +03:00

Author	SHA1	Message	Date
Ines Montani	4a1029a9b6	Add infobox [ci skip]	2021-01-19 19:18:39 +11:00
Matthew Honnibal	f277bfdf0f	Add SpanGroup and Graph container types to represent arbitrary annotations (#6696) * Draft out initial Spans data structure * Initial span group commit * Basic span group support on Doc * Basic test for span group * Compile span_group.pyx * Draft addition of SpanGroup to DocBin * Add deserialization for SpanGroup * Add tests for serializing SpanGroup * Fix serialization of SpanGroup * Add EdgeC and GraphC structs * Add draft Graph data structure * Compile graph * More work on Graph * Update GraphC * Upd graph * Fix walk functions * Let Graph take nodes and edges on construction * Fix walking and getting * Add graph tests * Fix import * Add module with the SpanGroups dict thingy * Update test * Rename 'span_groups' attribute * Try to fix c++11 compilation * Fix test * Update DocBin * Try to fix compilation * Try to fix graph * Improve SpanGroup docstrings * Add doc.spans to documentation * Fix serialization * Tidy up and add docs * Update docs [ci skip] * Add SpanGroup.has_overlap * WIP updated Graph API * Start testing new Graph API * Update Graph tests * Update Graph * Add docstring Co-authored-by: Ines Montani <ines@ines.io>	2021-01-14 17:30:41 +11:00
Adriane Boyd	a45d89f09a	Add initialize.before_init and after_init callbacks Add `initialize.before_init` and `initialize.after_init` callbacks to the config. The `initialize.before_init` callback is a place to implement one-time tokenizer customizations that are then saved with the model.	2021-01-12 13:07:44 +01:00
Adriane Boyd	1442d2f213	Improve simple training example in v3 migration (#6438) * Create the examples once * Use the examples in the initialization * Provide the batch size * Fix `begin_training` migration example	2020-11-30 09:39:45 +08:00
Ines Montani	019a1dd5e8	Fix v3 overview [ci skip]	2020-11-03 18:10:06 +01:00
Ines Montani	20f80587d6	Merge pull request #6257 from walterhenry/develop-proof A few tiny typo fixes to push through with release of nightly	2020-10-15 18:17:30 +02:00
walterhenry	75b7f86383	Three small typos Some little typos since v3.0 is out.	2020-10-15 18:06:37 +02:00
Ines Montani	7f05ccc170	Update docs [ci skip]	2020-10-15 12:35:30 +02:00
Ines Montani	e50dc2c1c9	Update docs [ci skip]	2020-10-09 12:04:52 +02:00
Sofie Van Landeghem	d093d6343b	TrainablePipe (#6213) * rename Pipe to TrainablePipe * split functionality between Pipe and TrainablePipe * remove unnecessary methods from certain components * cleanup * hasattr(component, "pipe") should be sufficient again * remove serialization and vocab/cfg from Pipe * unify _ensure_examples and validate_examples * small fixes * hasattr checks for self.cfg and self.vocab * make is_resizable and is_trainable properties * serialize strings.json instead of vocab * fix KB IO + tests * fix typos * more typos * _added_strings as a set * few more tests specifically for _added_strings field * bump to 3.0.0a36	2020-10-08 21:33:49 +02:00
Ines Montani	064575d79d	Merge pull request #6216 from svlandeg/feature/nel-initialize	2020-10-08 11:14:12 +02:00
Ines Montani	43e59bb22a	Update docs and install extras [ci skip]	2020-10-08 10:58:50 +02:00
svlandeg	bcaad28eda	fix typos	2020-10-07 13:05:37 +02:00
Ines Montani	ce14520789	Update docs [ci skip]	2020-10-06 14:35:17 +02:00
Ines Montani	11347f34da	Tidy up, tests and docs	2020-10-04 13:54:05 +02:00
Ines Montani	b6b73a3ca8	Update docs [ci skip]	2020-10-01 17:45:29 +02:00
Ines Montani	0a8a124a6e	Update docs [ci skip]	2020-10-01 12:15:53 +02:00
Ines Montani	115481aca7	Update docs [ci skip]	2020-09-30 15:16:00 +02:00
Ines Montani	ff9a63bfbd	begin_training -> initialize	2020-09-28 21:35:09 +02:00
Ines Montani	e06ff8b71d	Update docs [ci skip]	2020-09-26 13:18:08 +02:00
Ines Montani	6ca06cb62c	Update docs and formatting [ci skip]	2020-09-23 10:14:27 +02:00
Ines Montani	60a317520a	Merge pull request #6109 from svlandeg/feature/2rename	2020-09-23 09:47:12 +02:00
Ines Montani	930b116f00	Update docs [ci skip]	2020-09-23 09:35:21 +02:00
svlandeg	b556a10808	rename converts in_to_out	2020-09-22 11:50:19 +02:00
Ines Montani	67fbcb3da5	Tidy up tests and docs	2020-09-21 20:43:54 +02:00
Ines Montani	012b3a7096	Update docs [ci skip]	2020-09-20 17:44:58 +02:00
Ines Montani	c8fa2247e3	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-09-17 12:34:15 +02:00
Ines Montani	6761028c6f	Update docs [ci skip]	2020-09-17 12:34:11 +02:00
Adriane Boyd	7e4cd7575c	Refactor Docs.is_ flags (#6044) * Refactor Docs.is_ flags * Add derived `Doc.has_annotation` method * `Doc.has_annotation(attr)` returns `True` for partial annotation * `Doc.has_annotation(attr, require_complete=True)` returns `True` for complete annotation * Add deprecation warnings to `is_tagged`, `is_parsed`, `is_sentenced` and `is_nered` * Add `Doc._get_array_attrs()`, which returns a full list of `Doc` attrs for use with `Doc.to_array`, `Doc.to_bytes` and `Doc.from_docs`. The list is the `DocBin` attributes list plus `SPACY` and `LENGTH`. Notes on `Doc.has_annotation`: * `HEAD` is converted to `DEP` because heads don't have an unset state * Accept `IS_SENT_START` as a synonym of `SENT_START` Additional changes: * Add `NORM`, `ENT_ID` and `SENT_START` to default attributes for `DocBin` * In `Doc.from_array()` the presence of `DEP` causes `HEAD` to override `SENT_START` * In `Doc.from_array()` using `attrs` other than `Doc._get_array_attrs()` (i.e., a user's custom list rather than our default internal list) with both `HEAD` and `SENT_START` shows a warning that `HEAD` will override `SENT_START` * `set_children_from_heads` does not require dependency labels to set sentence boundaries and sets `sent_start` for all non-sentence starts to `-1` * Fix call to set_children_form_heads Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-09-17 00:14:01 +02:00
Ines Montani	b7faa38960	Update docs [ci skip]	2020-09-15 12:44:03 +02:00
Ines Montani	154752f9c2	Update docs and consistency [ci skip]	2020-09-15 00:32:49 +02:00
Ines Montani	5ebb2a2ac8	Update docs [ci skip]	2020-09-13 22:36:20 +02:00
Ines Montani	47acb45850	Update docs [ci skip]	2020-09-13 22:30:33 +02:00
Ines Montani	8b0dabe987	Update docs [ci skip]	2020-09-12 17:05:10 +02:00
Ines Montani	c443c82722	Update docs [ci skip]	2020-09-05 13:41:10 +02:00
Ines Montani	b3e338d65e	Update docs [ci skip]	2020-09-04 20:58:36 +02:00
Ines Montani	157caf4dfa	WIP: update docs [ci skip]	2020-09-04 16:30:31 +02:00
Adriane Boyd	b927893309	Merge branch 'develop' into feature/dependency-matcher-v3	2020-09-04 13:03:30 +02:00
Ines Montani	121809dd1e	Fix anchor [ci skip]	2020-09-03 16:49:56 +02:00
Ines Montani	b5a0657fd6	"model" terminology consistency in docs	2020-09-03 13:13:03 +02:00
Adriane Boyd	960d9cfadc	Officially support DependencyMatcher Add official support for the `DependencyMatcher`. Redesign the pattern specification. Fix and extend operator implementations. Update API docs and add usage docs. Patterns -------- Refactor pattern structure to: ``` { "LEFT_ID": str, "REL_OP": str, "RIGHT_ID": str, "RIGHT_ATTRS": dict, } ``` The first node contains only `RIGHT_ID` and `RIGHT_ATTRS` and all subsequent nodes contain all four keys. New operators ------------- Because of the way patterns are constructed from left to right, it's helpful to have `follows` operators along with `precedes` operators. Add operators for simple precedes / follows alongside immediate precedes / follows. * `.*`: precedes * `;`: immediately follows * `;*`: follows Operator fixes -------------- * `<` and `<<` do not include the node itself * Fix reversed order for all operators involving linear precedence (`.`, all sibling operators) * Linear precedence operators do not match nodes outside the same parse Additional fixes ---------------- * Use v3 Matcher API * Support `get` and `remove` * Support pickling	2020-09-02 17:45:29 +02:00
Ines Montani	add9de5487	Deprecate (Phrase)Matcher.pipe	2020-08-31 17:01:24 +02:00
Sofie Van Landeghem	ec14744ee4	Rename Transformer listener (#6001) * rename to spacy-transformers.TransformerListener * add some more tok2vec tests * use select_pipes * fix docs - annotation setter was not changed in the end	2020-08-31 12:41:39 +02:00
Adriane Boyd	216efaf5f5	Restrict tokenizer exceptions to ORTH and NORM	2020-08-31 09:55:01 +02:00
Ines Montani	9b86312bab	Update docs [ci skip]	2020-08-29 18:43:19 +02:00
Adriane Boyd	870774f475	Merge branch 'develop' into docs/morph-usage-v3	2020-08-29 16:00:50 +02:00
Adriane Boyd	f9ed31a757	Update usage docs for lemmatization and morphology	2020-08-29 15:56:50 +02:00
Ines Montani	66d76f5126	Update docs	2020-08-29 12:36:05 +02:00
Ines Montani	8ac5ef1284	Update docs	2020-08-25 11:54:37 +02:00
Matthew Honnibal	e559867605	Allow spacy project to push and pull to/from remote storage (#5949) * Add utils for working with remote storage * WIP add remote_cache for project * WIP add push and pull commands * Use pathy in remote_cache * Updarte util * Update remote_cache * Update util * Update project assets * Update pull script * Update push script * Fix type annotation in util * Work on remote storage * Remove site and env hash * Fix imports * Fix type annotation * Require pathy * Require pathy * Fix import * Add a util to handle project variable substitution * Import push and pull commands * Fix pull command * Fix push command * Fix tarfile in remote_storage * Improve printing * Fiddle with status messages * Set version to v3.0.0a9 * Draft docs for spacy project remote storages * Update docs [ci skip] * Use Thinc config to simplify and unify template variables * Auto-format * Don't import Pathy globally for now Causes slow and annoying Google Cloud warning * Tidy up test * Tidy up and update tests * Update to latest Thinc * Update docs * variables -> vars * Update docs [ci skip] * Update docs [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2020-08-23 18:32:09 +02:00
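
The pattern structure described in commit 960d9cfadc (the first node carries only `RIGHT_ID` and `RIGHT_ATTRS`; every later node carries all four keys) can be illustrated with a small shape check. This is a minimal sketch in plain Python, not spaCy's own validation code: `check_pattern` is a hypothetical helper, the example attribute values are illustrative, and the rule that `LEFT_ID` must name an earlier node is an assumption drawn from the commit's remark that patterns are built from left to right.

```python
from typing import Any, Dict, List

# Key names come from the commit message; everything else here is hypothetical.
FIRST_NODE_KEYS = {"RIGHT_ID", "RIGHT_ATTRS"}
LATER_NODE_KEYS = {"LEFT_ID", "REL_OP", "RIGHT_ID", "RIGHT_ATTRS"}


def check_pattern(pattern: List[Dict[str, Any]]) -> List[str]:
    """Return a list of shape problems; an empty list means none were found."""
    problems: List[str] = []
    seen_ids = set()
    for i, node in enumerate(pattern):
        expected = FIRST_NODE_KEYS if i == 0 else LATER_NODE_KEYS
        if set(node) != expected:
            problems.append(
                f"node {i}: expected keys {sorted(expected)}, got {sorted(node)}"
            )
        # Assumption: LEFT_ID has to refer to a node introduced earlier.
        if i > 0 and "LEFT_ID" in node and node["LEFT_ID"] not in seen_ids:
            problems.append(f"node {i}: LEFT_ID {node['LEFT_ID']!r} not defined yet")
        if "RIGHT_ID" in node:
            seen_ids.add(node["RIGHT_ID"])
    return problems


# Illustrative pattern: an anchor node followed by one related node.
example_pattern = [
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"POS": "VERB"}},
    {
        "LEFT_ID": "verb",
        "REL_OP": ">",
        "RIGHT_ID": "subject",
        "RIGHT_ATTRS": {"DEP": "nsubj"},
    },
]

print(check_pattern(example_pattern))
```

A pattern whose first node also carries `LEFT_ID` or `REL_OP`, or whose later nodes omit one of the four keys, would be reported by this sketch; the real `DependencyMatcher` may apply different or additional checks.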

74 Commits