Commit Graph

43 Commits

Author SHA1 Message Date
Shubham Kumar
935be9dd6e Fix some MD parsing of inline URLs (#3920) 2022-09-09 21:46:06 +02:00
Lonami Exo
3d32e16235 Fix within surrogate detection 2020-02-20 10:53:28 +01:00
Lonami Exo
d196c89825 Fix unparsing malformed entities, bump v1.10.10 2019-12-30 10:19:29 +01:00
Lonami Exo
f3111f93b2 Fix unparsing text with malformed message entities 2019-12-19 15:48:59 +01:00
Lonami Exo
05b770a93f Fix directly nested markdown entities 2019-07-06 12:55:44 +02:00
Lonami Exo
2d0fc8356f Fix markdown parsing for pre blocks and entity after entity 2019-07-05 20:31:43 +02:00
Lonami Exo
4e80e21ba1 Update markdown parser to support nested entities 2019-06-24 13:48:29 +02:00
Lonami Exo
962949008f Add new message entities to markdown/html parsers 2019-06-23 21:35:33 +02:00
Lonami Exo
229969192a Fix UnicodeDecodeError with malformed input on unparse text 2019-01-01 20:31:39 +01:00
Lonami Exo
aaee092a46 Locally strip outgoing message text respecting entities 2018-11-19 10:15:56 +01:00
painor
340f5614b5 Add name mention formatting to HTML and Markdown (#1019) 2018-10-04 15:56:32 +02:00
Lonami Exo
d64eb7ea2b Avoid cyclic imports on older Python versions 2018-06-29 11:04:42 +02:00
Lonami Exo
2fb5215f5f Fix parsers misbehaving with None text 2018-06-03 13:48:43 +02:00
Lonami Exo
8d7c7a19c0 Add some setters for custom.Message 2018-06-03 11:53:18 +02:00
Lonami Exo
eabaa3854a Replace offset with match.start() to allow custom regex 2018-04-03 13:47:40 +02:00
Lonami Exo
33e908de42 Fix markdown regex not supporting [] inside URLs 2018-03-22 19:02:08 +01:00
Lonami Exo
229cd78df0 Fix markdown's URL regex not accepting newlines 2018-02-27 14:10:02 +01:00
Lonami Exo
83d9d1d78e Fix markdown parser not inverting delimiters dict 2018-02-16 20:30:19 +01:00
Lonami Exo
75d99fbb53 Fix HTML entity parsing failing when needing surrogates 2018-02-15 11:52:46 +01:00
Lonami Exo
59a1a6aef2 Stop working with bytes on the markdown parser 2018-01-07 16:19:41 +01:00
Lonami Exo
ec4ca5dbfc More consistent with asyncio branch (style/small fixes)
Like passing an extra (invalid) dt parameter when serializing
a datetime, and handling more errors in the TcpClient class.
2018-01-05 18:31:48 +01:00
Lonami Exo
605c103f29 Add unparse markdown method 2017-11-26 17:16:59 +01:00
Lonami Exo
57a70d0d47 Document the extensions/ module 2017-11-26 17:14:28 +01:00
Lonami Exo
9767774147 Fix import in markdown parser not being relative 2017-11-17 15:57:48 +01:00
Lonami Exo
346c5bb303 Add method to md parser to extract text surrounded by entities 2017-11-16 19:13:13 +01:00
Lonami Exo
e5deaf5db8 Fix c4e07cf, md parsing adding unfinished entity at wrong offset 2017-11-16 19:07:53 +01:00
Lonami Exo
c4e07cff57 Fix unfinished markdown delimiters being stripped away 2017-11-10 11:44:27 +01:00
Lonami Exo
cb3f20db65 Clean up markdown parsing since tuples aren't used anymore 2017-11-10 11:41:49 +01:00
Lonami Exo
7d75eebdab Make markdown parser use only Telegram's MessageEntity's 2017-11-10 11:07:36 +01:00
Lonami Exo
83af705cc8 Add more comments to the markdown parser 2017-11-06 11:32:40 +01:00
Lonami Exo
3a2c3a9497 Fix URL regex for markdown was greedy (fix-up) 2017-11-06 11:22:58 +01:00
Lonami Exo
07ece83aba Fix overlapping markdown entities being skipped 2017-11-06 10:37:22 +01:00
Lonami Exo
4f80429215 Work on byte level when parsing markdown
Reasoning: instead of encoding every character one by one as we
encounter them to use half their length as the correct offset,
we can simply encode the whole string at once as utf-16le and
work with that directly.
2017-11-06 10:29:32 +01:00
Viktor Oreshkin
49eb281251 Proper offset calculation for markdown (#407)
Dan suca
If Dan shared it with Traitor I'll not have to spend my time on this
Not a, sorry for not letting you sleep
k thx bye
Will this stay in history?
2017-11-06 00:17:22 +01:00
Lonami Exo
82cac4836c Fix markdown URL parsing using character index instead offset 2017-10-30 11:15:53 +01:00
Lonami Exo
0a14aa1bc6 Remove additional check when calculating emojis length
This special check treated some emojis as 3 characters long but
this shouldn't have actually been done, likely due to the old
regex matching more things as emoji than it should (which would
have counted as 2 too, making up for 1+3 from the new is_emoji()).
2017-10-30 10:56:39 +01:00
Lonami Exo
2609bd9bd1 Use constants and allow empty URL regex when parsing markdown 2017-10-29 18:21:21 +01:00
Lonami Exo
d47a9f83d0 Fix some special cases which are not treated as emojis (offset 1) 2017-10-29 17:07:37 +01:00
Lonami Exo
bcaa8007a3 Fix inline URL matching swallowing all parse entities 2017-10-29 16:43:30 +01:00
Lonami Exo
f5fafc6a27 Enhance emoji detection 2017-10-29 16:41:30 +01:00
Lonami Exo
368269cb11 Add ability to parse inline URLs 2017-10-29 16:33:10 +01:00
Lonami Exo
9600a9ea0b Fix markdown parsing failing if delimiter was last character 2017-10-28 19:17:18 +02:00
Lonami Exo
5adec2e1ab Initial attempt at parsing Markdown-like syntax 2017-10-28 19:06:41 +02:00
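Several commits above (4f80429215, 49eb281251, 82cac4836c, 75d99fbb53) deal with the same underlying issue. Telegram measures a MessageEntity's offset and length in UTF-16 code units, but a Python `str` index counts code points, so the two diverge as soon as the text contains a character outside the Basic Multilingual Plane, such as most emoji. The following is a minimal sketch of the approach described in 4f80429215: measure lengths on the `utf-16-le` encoding of the text. The helper names (`utf16_len`, `to_utf16_offset`, `parse_bold`) and the `**bold**` delimiter are assumptions made for illustration only. They are not Telethon's actual API or parser.

```python
import re


def utf16_len(s: str) -> int:
    """Number of UTF-16 code units in s (the unit Telegram uses for offsets)."""
    # Each UTF-16 code unit is 2 bytes; non-BMP characters take 2 units.
    return len(s.encode('utf-16-le')) // 2


def to_utf16_offset(text: str, index: int) -> int:
    """Convert a Python str index into a UTF-16 code-unit offset."""
    return utf16_len(text[:index])


def parse_bold(message: str):
    """Hypothetical toy parser: strip **...** delimiters and return
    (plain_text, [(offset, length), ...]) in UTF-16 code units."""
    entities = []
    out = []
    pos = 0
    for m in re.finditer(r'\*\*(.+?)\*\*', message, flags=re.DOTALL):
        # Text before the delimiter is kept as-is.
        out.append(message[pos:m.start()])
        inner = m.group(1)
        # Offset is measured on the already-stripped output, not the input.
        entities.append((utf16_len(''.join(out)), utf16_len(inner)))
        out.append(inner)
        pos = m.end()
    out.append(message[pos:])
    return ''.join(out), entities
```

For example, `parse_bold("👍 **hi** there")` returns `("👍 hi there", [(3, 2)])`. The word "hi" starts at Python index 2, but its UTF-16 offset is 3 because the emoji occupies two code units. Measuring on the stripped output rather than the raw input mirrors the kind of offset bugs fixed in e5deaf5db8 and 82cac4836c.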