Commit Graph

9 Commits

Author SHA1 Message Date
Ines Montani
f50502dad7 Update docs [ci skip] 2021-01-19 00:22:47 +11:00
Adriane Boyd
bf0cdae8d4
Add token_splitter component (#6726)
* Add long_token_splitter component

Add a `long_token_splitter` component for use with transformer
pipelines. This component splits up long tokens like URLs into smaller
tokens. This is particularly relevant for pretrained pipelines with
`strided_spans`, since the user can't change the length of the span
`window` and may not wish to preprocess the input texts.

The `long_token_splitter` splits tokens that are at least
`long_token_length` tokens long into smaller tokens of `split_length`
size.

Notes:

* Since this is intended for use as the first component in a pipeline,
the token splitter does not try to preserve any token annotation.
* API docs to come when the API is stable.

* Adjust API, add test

* Fix name in factory
2021-01-17 19:54:41 +08:00
Ines Montani
3ae5e02f4f Update docs, types and API consistency 2020-08-17 16:45:24 +02:00
Ines Montani
7adbaf9a5b Update docs [ci skip] 2020-07-27 00:29:45 +02:00
Ines Montani
262d306eaa unicode -> str consistency 2020-05-24 17:23:00 +02:00
Ines Montani
82c16b7943 Remove u-strings and fix formatting [ci skip] 2019-09-12 16:11:15 +02:00
Christos Aridas
61f5c007a0 DOC Fix pipeline functions examples (#4189) 2019-08-23 19:15:32 +02:00
Ines Montani
296446a1c8
Tidy up and improve docs and docstrings (#3370)
<!--- Provide a general summary of your changes in the title. -->

## Description
* tidy up and adjust Cython code to code style
* improve docstrings and make calling `help()` nicer
* add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects
* fix various typos and inconsistencies in docs

### Types of change
enhancement, docs

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-08 11:42:26 +01:00
Ines Montani
e597110d31
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 19:31:19 +01:00