Is diffkit able to generate a CSV file with all diff lines from the rsh file which are not in the lsh file?


joachi...@googlemail.com

May 16, 2013, 11:56:58 AM
to diffki...@googlegroups.com
Hi all,

I would like to compare two CSV files, lsh and rsh, which have the same structure and a header line.

Is it possible for diffkit to generate a CSV file of the same structure containing the lines from the rsh file which are not in the lsh file?

Is there an example of how to specify a plan file for that job?

Or can I use diffkit for that together with sed or any other tool that processes the diffkit output?

Many thanks!

Example

lsh file:
COLUMN1,COLUMN2,COLUMN3
1111,1111,1
1111,1111,2
4444,4444,1
4444,4444,2
6666,6666,1
6666,6666,2


rsh file:
COLUMN1,COLUMN2,COLUMN3
1111,1111,1
1111,1111,2
2222,4444,1
4444,4444,2
6666,6666,1
6666,6666,2
7777,7777,1

Output file:
COLUMN1,COLUMN2,COLUMN3
2222,4444,1
7777,7777,1

Sri

Oct 6, 2014, 8:21:21 AM
to diffki...@googlegroups.com, joachi...@googlemail.com
+1 for this feature. If it already exists, please let us know.

Aaron Schumacher

Feb 4, 2015, 9:31:08 AM
to diffki...@googlegroups.com, joachi...@googlemail.com
For the example you give, the Unix `comm` utility can be used. Assuming you know that the structures (and headers) of the two files are the same, and the input files are named `left` and `right`, this should do it:

```
head -1 left > output # to maintain the header row
comm -13 left right >> output
```

It is important that the input files be sorted, as they are in your example, because `comm` expects sorted input.
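
If the files are not already sorted, or you would rather keep the header row out of the comparison, a minimal sketch along the same lines might look like this. The file names `lsh.csv`, `rsh.csv`, `output.csv` and the two `.sorted` scratch files are assumptions of mine, and the first part only recreates the example data from the question:

```shell
# Recreate the example files from the question (file names are assumptions).
cat > lsh.csv <<'EOF'
COLUMN1,COLUMN2,COLUMN3
1111,1111,1
1111,1111,2
4444,4444,1
4444,4444,2
6666,6666,1
6666,6666,2
EOF

cat > rsh.csv <<'EOF'
COLUMN1,COLUMN2,COLUMN3
1111,1111,1
1111,1111,2
2222,4444,1
4444,4444,2
6666,6666,1
6666,6666,2
7777,7777,1
EOF

# Use one collation order for both sort and comm so they agree on "sorted".
export LC_ALL=C

# Keep the header row, then compare only the data rows.
head -n 1 rsh.csv > output.csv
tail -n +2 lsh.csv | sort > lsh.sorted
tail -n +2 rsh.csv | sort > rsh.sorted

# -1 drops lines only in lsh, -3 drops lines common to both,
# leaving the lines that appear only in rsh.
comm -13 lsh.sorted rsh.sorted >> output.csv

cat output.csv
# COLUMN1,COLUMN2,COLUMN3
# 2222,4444,1
# 7777,7777,1
```

Note that the data rows come out in sorted order, which is not necessarily the order they had in the rsh file.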

- Aaron

Paul Fitzpatrick

Feb 4, 2015, 9:56:22 AM
to diffki...@googlegroups.com

You can also get close with daff [1]:
$ daff --context 0 --act insert --id COLUMN1 --id COLUMN2 --id COLUMN3 lsh.csv rsh.csv
@@,COLUMN1,COLUMN2,COLUMN3
+++,2222,4444,1
+++,7777,7777,1

There's an extra column at the beginning though, which you could strip with cut:
$ daff --context 0 --act insert --id COLUMN1 --id COLUMN2 --id COLUMN3 lsh.csv rsh.csv | cut -d, -f 2-
COLUMN1,COLUMN2,COLUMN3
2222,4444,1
7777,7777,1

The daff flags used are as follows:
* --context 0: removes some context rows that daff puts in there by default, like in a regular diff.
* --act insert: filters for insertions only, ignoring deletions and modifications.
* --id COLUMN: adds COLUMN to the primary key used for comparison.
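
If daff is not available, the idea of matching rows on a key and keeping only the unmatched rows from the right-hand file can be roughly approximated with awk. This is only a sketch of the idea, not how daff works internally: it splits on plain commas (so quoted fields containing commas would break it), it treats all three columns as the key, and the file names, including `inserted.csv`, are assumptions of mine:

```shell
# Recreate the example files from the question (file names are assumptions).
cat > lsh.csv <<'EOF'
COLUMN1,COLUMN2,COLUMN3
1111,1111,1
1111,1111,2
4444,4444,1
4444,4444,2
6666,6666,1
6666,6666,2
EOF

cat > rsh.csv <<'EOF'
COLUMN1,COLUMN2,COLUMN3
1111,1111,1
1111,1111,2
2222,4444,1
4444,4444,2
6666,6666,1
6666,6666,2
7777,7777,1
EOF

# First pass (lsh.csv): remember every key built from columns 1-3.
# Second pass (rsh.csv): print the header, plus each row whose key
# was never seen in lsh.csv.
awk -F, '
  NR == FNR { left[$1 FS $2 FS $3]; next }
  FNR == 1 || !(($1 FS $2 FS $3) in left)
' lsh.csv rsh.csv > inserted.csv

cat inserted.csv
# COLUMN1,COLUMN2,COLUMN3
# 2222,4444,1
# 7777,7777,1
```

Unlike the `comm` approach, this does not need sorted input and keeps the rows in the order they appear in the rsh file. To match on fewer columns, shorten the key expression in both places, for example to `$1 FS $2`.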

Cheers,
Paul
[1] https://github.com/paulfitz/daff

