[sqlite-dev] End-of-line characters in the output of ".dump"

131 views
Skip to first unread message

Richard Hipp

unread,
Mar 11, 2017, 8:50:54 AM3/11/17
to sqlite-dev
Suppose you have a table like this:

CREATE TABLE t1(x TEXT);
INSERT INTO t1(x) VALUES('This field
contains a new-line character');

The text field contains a newline character (or maybe a newline and a
carriage-return if you typed the above on a Windows machine). If you
then using the ".dump" command of the CLI to reconstruct the database,
like this:

sqlite3 t1.db .dump | sqlite3 t1-copy.db

The newline and/or carriage-return characters might not make it
through the translation process, depending on your system and how it
is set up to handle end-of-line character translations. The SQLite
CLI could be improved to be cleaner about this, but it depends on
external utilities such as readline() or linenoise() whose handling of
end-of-line characters might not be consistent across all platforms,
so there are limits to what can be done there.

One possible work-around is to encode end-of-line characters specially
to ensure they survive a ".dump" and restore. This is done in a
branch. (https://www.sqlite.org/src/timeline?r=string-quoting-dump).
The output of the above database looks like this:

PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE t1(x);
INSERT INTO t1 VALUES('This field'||char(10)||'contains a new-line character');
COMMIT;

The newline character gets translated into a char(10) function.

Question: Is this a desirable feature? Should it be merged to trunk?

Advantages: The .dump output is become immune to the irregularities
in end-of-line handling across systems.

Disadvantages: (1) The .dump output is no longer portable to EBCDIC
systems. (2) The output is more complex. (3) Not all SQL database
engines support the char() function or the || concatenation operator,
which makes the .dump output less portable. (4) The dump becomes
harder to read.

Mitigating Factors: This only comes up for databases that have text
fields with embedded end-of-line characters. My sense is that such
databases are uncommon.

Your feedback is appreciated.
--
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-dev mailing list
sqlit...@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-dev

Scott Robison

unread,
Mar 11, 2017, 8:59:43 AM3/11/17
to sqlit...@mailinglists.sqlite.org
I think it is reasonable to merge. Even if it is not 100% portable to other SQL engines, I think the greater utility is in being able to round trip a dump file back into SQLite reliably. Also, converting a control code into a "visible" character sequence could make further transformations on the dump file easier, even for EBCDIC systems.

Could it be done in the same way with X'##' sequences?

Scott Robison

unread,
Mar 11, 2017, 9:29:39 AM3/11/17
to sqlit...@mailinglists.sqlite.org
On Sat, Mar 11, 2017 at 6:59 AM, Scott Robison <sc...@casaderobison.com> wrote:
I think it is reasonable to merge. Even if it is not 100% portable to other SQL engines, I think the greater utility is in being able to round trip a dump file back into SQLite reliably. Also, converting a control code into a "visible" character sequence could make further transformations on the dump file easier, even for EBCDIC systems.

Could it be done in the same way with X'##' sequences?

Yes, the same could be done with X'##' sequences. At least this:

select typeof('asdf'||X'0A'||'asdf')

yields 'text' as the result.

Would it make sense to use this same technique for all control codes? Perhaps for UTF sequences too? I can imagine arguments both pro and con.

I haven't used an EBCDIC based system for almost 30 years, and never with SQLite. Does the binary form of the SQLite database change on an EBCDIC platform?

Alternatively (because you probably don't have enough your plate </sarcasm>) maybe there would be room for .dump and .dumpesc dot commands? .dump continues to work as it always has, .dumpesc escapes the data, and both can be fed back into the shell modulo EOL processing.

Just thoughts.

-- 
Scott Robison

Roger Binns

unread,
Mar 11, 2017, 10:53:51 AM3/11/17
to sqlit...@mailinglists.sqlite.org
On 11/03/17 05:50, Richard Hipp wrote:
> The newline and/or carriage-return characters might not make it
> through the translation process, ...

Note that embedded null characters don't make it through either:

create table foo(x);
insert into foo values('a' || char(0) || 'b');
.dump
INSERT INTO "foo" VALUES('a');

With APSW's dump implementation you get:

INSERT INTO foo VALUES('a'||X'00'||'b');

As Scott mentioned, it is probably a good idea to do something like this
for all control characters. Maybe also for invalid UTF-8 byte sequences?

I don't know if char() or X'' are better approaches. Would probably
have to see what postgres does and copy that.

Roger

signature.asc

James K. Lowden

unread,
Mar 11, 2017, 11:47:50 AM3/11/17
to sqlit...@mailinglists.sqlite.org
On Sat, 11 Mar 2017 08:50:49 -0500
Richard Hipp <d...@sqlite.org> wrote:

> The SQLite CLI could be improved to be cleaner about this, but it
> depends on external utilities such as readline() or linenoise()

Could you expand on that a bit, please? While I'm sure you've tested
it, the assertion indicts every application that uses readline. Doesn't
readline work at the terminal interface, and pass input transparently
if standard input is a file?

If you decide you don't want to pass newlines verbatim in a dump, I
suggest using vis(3) from BSD. Ubuntu packages it as libbsd-dev. The
output is reversible. There is a simple utility to restore the
newlines for systems that require them, or for visual inspection. (I
haven't been able to find it packaged, but it's easy enough to build.
Cf. http://cvsweb.netbsd.org/bsdweb.cgi/src/usr.bin/vis/.)

Using libvis:

sqlite3 t1.db .dump | sqlite3 t1-copy.db

Just Works, as on your branch.

sqlite3 t1.db .dump | unvis | less

shows the output with newlines, as it appears today.

sqlite3 t1.db .dump | unvis | some-db-utility

can read the output regardless of its support for char() or ||.

--jkl

Richard Hipp

unread,
Mar 11, 2017, 11:56:46 AM3/11/17
to sqlite...@mailinglists.sqlite.org, sqlit...@mailinglists.sqlite.org
On 3/11/17, James K. Lowden <jklo...@schemamania.org> wrote:
> On Sat, 11 Mar 2017 08:50:49 -0500
> Richard Hipp <d...@sqlite.org> wrote:
>
>> The SQLite CLI could be improved to be cleaner about this, but it
>> depends on external utilities such as readline() or linenoise()
>
> Could you expand on that a bit, please?

Readline() strips \n and \r from the end of each input line, does it
not? How is SQLite suppose to know if it was just a \n that was
stripped or both a \n and a \r?


--
D. Richard Hipp
d...@sqlite.org

Richard Hipp

unread,
Mar 11, 2017, 12:01:01 PM3/11/17
to sqlit...@mailinglists.sqlite.org
On 3/11/17, Scott Robison <sc...@casaderobison.com> wrote:
>
> Yes, the same could be done with X'##' sequences. At least this:
>
> select typeof('asdf'||X'0A'||'asdf')
>
> yields 'text' as the result.

That approach only works for UTF8. If you have a UTF16 database, it
breaks down. The use of char(), on the other hand, works for both.

Scott Robison

unread,
Mar 11, 2017, 2:35:34 PM3/11/17
to sqlit...@mailinglists.sqlite.org
On Mar 11, 2017 10:01 AM, "Richard Hipp" <d...@sqlite.org> wrote:
On 3/11/17, Scott Robison <sc...@casaderobison.com> wrote:
>
> Yes, the same could be done with X'##' sequences. At least this:
>
> select typeof('asdf'||X'0A'||'asdf')
>
> yields 'text' as the result.

That approach only works for UTF8.  If you have a UTF16 database, it
breaks down.  The use of char(), on the other hand, works for both.

Ah, good point. This is why you're the doctor.

James K. Lowden

unread,
Mar 11, 2017, 5:18:08 PM3/11/17
to sqlit...@mailinglists.sqlite.org
On Sat, 11 Mar 2017 11:56:41 -0500
Richard Hipp <d...@sqlite.org> wrote:

> >> The SQLite CLI could be improved to be cleaner about this, but it
> >> depends on external utilities such as readline() or linenoise()
> >
> > Could you expand on that a bit, please?
>
> Readline() strips \n and \r from the end of each input line, does it
> not? How is SQLite suppose to know if it was just a \n that was
> stripped or both a \n and a \r?

A quick test suggests that's not exactly what's happening. If readline
really is interfering, though,I think a better answer is go around
it when there's no tty.

My sqlite3 is not linked to readline, and strips out \r. Here are bash
and sqlite3:

$ printf '1\r\n2\r\n3\r\n' | bash -c 'while read f; do echo $f; done' |
cat -v
1^M
2^M
3^M

Bash uses readline, and reproduces \r faithfully.

$ printf "select '1\r\n2\r\n3\r\n';" | sqlite3 | cat -v
1
2
3

Where'd they go? (I'm using Ubuntu.)

Just to verify, no readline in my sqlite3:

$ ldd $(which sqlite3) | sed -E 's/>.+$/>/'
linux-vdso.so.1 =>
libdl.so.2 =>
libpthread.so.0 =>
libc.so.6 =>
/lib64/ld-linux-x86-64.so.2 (0x00007fd59113a000)

It looks to me that carriage returns are being lost without any help
from readline.

But, if readline is interfering on some platforms, sqlite3 isn't
obliged to use it to read from standard input when there's no keyboard
attached to it. Isn't it simpler to test with isatty(3), and use fgets
if it returns 0?

--jkl

Reply all
Reply to author
Forward
0 new messages