I would like to learn whether there is a clear consensus among APIs regarding input validation and output encoding for text input from users. Specifically, I'm asking about cleansing HTML input (removing <script> or other potentially dangerous tags that can lead to Cross-Site Scripting and other risks) and about output encoding.
The GitHub API doc does not mention whether the input is cleansed or whether the output escapes any HTML that may appear in the comment body.
<style>script {display: block;}</style>
&
<script type="text/html" contenteditable>
and you don't have to escape everything... or anything.
Calling the Twitter API (using their API Console, authenticating with OAuth), you can see that the script tags etc. are not escaped in the JSON response below.
...
"text": "<style>script {display: block;}</style>\n&\n<script type="text/html" contenteditable> \nand you don't have to escape everything... or anything.",
...
(Just normal JSON encoding of newlines as \n.) Instead, Twitter clients properly escape this -- the API does not do any output encoding.
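To make that division of labor concrete, here is a minimal Python sketch, assuming a client that renders into HTML. The API payload carries the raw text, and the client escapes only at the point of output. The `payload` string and the `render_comment` helper are hypothetical and exist only for illustration; `json.loads` and `html.escape` come from the standard library.

```python
import html
import json

# Hypothetical API payload: the text is returned raw, with only
# JSON-level escaping (\n for newlines, \" for quotes).
payload = '{"text": "<script type=\\"text/html\\">\\n& more"}'


def render_comment(raw_json: str) -> str:
    """Decode the API response and HTML-escape the text for display."""
    text = json.loads(raw_json)["text"]
    # Escaping happens here, in the HTML client, not in the API.
    return "<p>" + html.escape(text) + "</p>"


print(render_comment(payload))
```

A non-HTML client (a terminal tool, say) would use the decoded text as-is and skip the `html.escape` step entirely.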
I'd like to confirm that this is the basic expectation of web APIs. I somewhat expect so, as APIs are designed for any potential client, not just HTML clients in browsers. Confusion can certainly arise if the client escapes any content that the API already escapes. No one wants to see <script> turned into &amp;lt;script&amp;gt; (I've certainly seen web apps do that!)
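The double-escaping problem is easy to reproduce. This is only an illustrative sketch, assuming both the API and the client apply HTML escaping to the same text; the variable names are made up for the example.

```python
import html

raw = "<script>"

# Escaped once, at output time: a browser displays the literal text <script>.
once = html.escape(raw)

# Escaped twice (once by the API, again by the client): a browser displays
# the entity text &lt;script&gt; instead of <script>.
twice = html.escape(once)

print(once)   # &lt;script&gt;
print(twice)  # &amp;lt;script&amp;gt;
```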
Thanks,
djb