HELP! How to find and remove duplicates? (Anything like SQL GROUP BY?)

63 views
Skip to first unread message

MV-dev1

unread,
Sep 21, 2015, 9:42:41 AM9/21/15
to orient-...@googlegroups.com
How can I write a query to find duplicate entries that have 2 values (x, y) that are the same?

Because of another bug #4880 I have been able to break my database so that I can't put back the UNIQUE index.

A SQL self-join or SQL GROUP BY would come in handy right now.

select x,y,count(*)
  from myTable
  group by x,y
  having count(*) > 1

PS  My table looks like this...

CREATE CLASS LL EXTENDS V;
    CREATE PROPERTY LL
.y DOUBLE;
    ALTER PROPERTY LL
.y NOTNULL true;
    ALTER PROPERTY LL
.y MANDATORY true;
    CREATE PROPERTY LL
.x DOUBLE;
    ALTER PROPERTY LL
.x NOTNULL true;
    ALTER PROPERTY LL
.x MANDATORY true;



MV-dev1

unread,
Sep 21, 2015, 9:53:42 AM9/21/15
to OrientDB
Sorry for the post but it might help others...   I didn't realize there was a GROUP BY....

Now to figure out how to solve my problems with it.

alessand...@gmail.com

unread,
Sep 21, 2015, 9:59:46 AM9/21/15
to OrientDB
Hi,
try this SELECT FROM (
  select x,y,count(*) as conteggio
  from myTable
  group by x,y
) where conteggio > 1


Kind regards,
Alessandro

MV-dev1

unread,
Sep 21, 2015, 10:21:40 AM9/21/15
to OrientDB
THANKS, yes, so now that I can get the list of duplicate y,x.

What is the most efficient way to remove records that I don't want, maybe all that have only 1 out link?

Maybe it happened at a certain time so I can take out everything after a certain @rid?

Any ideas on that?

Also, I don't know how to write a query to use these Y,X values because I don't know how to use an IN for a set.  Maybe that's possible?  Do I have to use LET?

SELECT y,x,max(@rid) FROM L0 WHERE COUNT(*) > 1 GROUP BY y,x     <<* doesn't work but it would return the last one

I think that I will write a program.

alessand...@gmail.com

unread,
Sep 21, 2015, 10:33:07 AM9/21/15
to OrientDB
Hi,
if you use "SELECT y,x,@rid FROM L0 WHERE COUNT(*) > 1 GROUP BY y,x" it returns the last one

Alessandro

MV-dev1

unread,
Sep 21, 2015, 10:55:09 AM9/21/15
to orient-...@googlegroups.com
Yes, I am trying to make it much too complicated...

But I always find issues... ( I need to be hired as a tester for OrientDb)....

SELECT y,x,COUNT(*) as cnt FROM L0 WHERE COUNT(*) > 1 GROUP BY y,x    << RETURNS EVERYTHING, cnt = 1 is all down my list

SELECT y,x,COUNT(*) as cnt FROM L0 GROUP BY y,x ORDER BY cnt DESC  << These are the ones I need to focus on (this query works)

SELECT FROM (SELECT y,x,COUNT(*) as cnt FROM L0 GROUP BY y,x) WHERE cnt > 1 << Works.

SELECT FROM (SELECT y,x,@rid,COUNT(*) as cnt FROM L0 GROUP BY y,x) WHERE cnt > 1 LIMIT 1000  << This seems to get the last @rid like you suggested

SELECT FROM (SELECT y,x,MIN(@rid),COUNT(*) as cnt FROM L0 GROUP BY y,x) WHERE cnt > 1 LIMIT 1000 << This also seems to work, returning the first one.

^^^ I could use this to DELETE everything not with this first id but having the same y,x but I can't get that query to work.

x SELECT $x,$y FROM (SELECT COUNT(*) as cnt FROM LET $y=y,$x=x,$r=MIN(@rid) L0 GROUP BY y,x) WHERE cnt > 1 LIMIT 1000

I don't know how to write this.  But I'd like to run DELETE VERTEX L0 WHERE x=$x and y=$y and @rid <> r

But I can run the first query you suggested, get the results and then write a Java program to run queries for the 500k things that need to be cleaned.

The Java program could execute statements like...   "DELETE VERTEX L0 WHERE y = 49 and x = -123 and @rid <> #12:0"

alessand...@gmail.com

unread,
Sep 21, 2015, 11:54:34 AM9/21/15
to OrientDB
Hi,
try this code

OrientGraph g=new OrientGraph(currentPath);
Iterable<Vertex> result=g.command(new OSQLSynchQuery<Vertex>("SELECT FROM (SELECT y,x,MIN(@rid),COUNT(*) as cnt FROM l0 GROUP BY y,x) WHERE cnt > 1 LIMIT 1000 ")).execute();
for(Vertex v:result){
String x=v.getProperty("x");
String y=v.getProperty("y");
OrientVertex min=v.getProperty("MIN");
String id=min.getId().toString();
String query="Delete vertex L0 where ( @rid <> " + id + " and x='"+ x + "' and y='" + y +"')";
g.command(new OCommandSQL(query)).execute();
}
g.shutdown();

Alessandro

MV-dev1

unread,
Sep 21, 2015, 12:34:13 PM9/21/15
to orient-...@googlegroups.com
THANKS!

So, now I can create the hash indexes with no errors (in Studio, Console or from code) but get runtime errors when trying to continue my inserts...

Exception: Cannot index record L0{y:43,x:-79}: found duplicated key 'OCompositeKey{keys=[-79, 43]}' in index 'L0.i' previously assigned to the record #12:13
Exception: Cannot index record L0{y:43,x:-79}: found duplicated key 'OCompositeKey{keys=[-79, 43]}' in index 'L0.i' previously assigned to the record #12:13
Exception: Cannot index record L0{y:49,x:-122}: found duplicated key 'OCompositeKey{keys=[-122, 49]}' in index 'L0.i' previously assigned to the record #12:34
Exception: Cannot index record L0{y:40,x:-83}: found duplicated key 'OCompositeKey{keys=[-83, 40]}' in index 'L0.i' previously assigned to the record #12:184
Exception: Cannot index record L0{y:43,x:-79}: found duplicated key 'OCompositeKey{keys=[-79, 43]}' in index 'L0.i' previously assigned to the record #12:13
Exception: Cannot index record L0{y:43,x:-79}: found duplicated key 'OCompositeKey{keys=[-79, 43]}' in index 'L0.i' previously assigned to the record #12:13
Exception: Cannot index record L0{y:43,x:-79}: found duplicated key 'OCompositeKey{keys=[-79, 43]}' in index 'L0.i' previously assigned to the record #12:13

And yes, I did drop and rebuild them a few times.


EDIT:  but dropping, renaming and recreating the index a third time seemed to do the trick.

Here's my code to keep from adding a duplicate index...
    public static Vertex GetCreateLLnVertex(OrientGraphNoTx graph, String strClass, Double d1, Double d2) {
       
Vertex v;
       
String[] keys = {"y","x"};
       
Object[] values = {d1, d2};
       
Iterator<Vertex> iter = graph.getVertices(strClass, keys, values).iterator();

       
if (!iter.hasNext()) {
            v
= graph.addVertex("class:" + strClass);
            v
.setProperty("y", d1);
            v
.setProperty("x", d2);
       
} else
            v
= iter.next();
       
return v;
   
}




Reply all
Reply to author
Forward
0 new messages