Need help with simple query to sort & filter duplicates

10 views
Skip to first unread message

meg...@rediffmail.com

unread,
Apr 3, 2018, 7:55:21 AM4/3/18
to Neo4j
I am new to Neo4j Cypher queries. I am getting HPALM data in Neo4j & then rendering it on grafana. HPALM creates a new row with every test run under the same "test-id". Due to this the summary test status shows wrong count due to duplicate entries with one row showing as status=Passed (latest run) & another as status=Failed (older run). But I need only the latest row data, the one with status=Passed in this example. below is the query which shows sample data having duplicate enrties/ rows but with different test 'status' & different 'last-modified' time. I need query to extract the unique / distinct latest entries & filter out older duplicate entries/ rows.
you can see first 2 rows with same id=44079, test-id=72534 but different status Failed & Passed & different 'last-modified' date time.

MATCH (a:HP) WHERE a.project='CHP' and a.type='test-instance' and a.`cycle-id` ='11743' RETURN distinct a.id, a.`test-id`, a.`exec-date`, a.`last-modified`, a.status, a.name, a.owner order by a.id

╒═════╤═══════════╤═════════════╤═══════════════════╤═════════════╤══════╤═══════╕
│a.id │a.`test-id`│a.`exec-date`│a.`last-modified`  │a.status     │a.name│a.owner│
╞═════╪═══════════╪═════════════╪═══════════════════╪═════════════╪══════╪═══════╡
│44079│72534      │2018-03-09   │2018-03-09 07:42:30│Failed       │(null)│g942477│
├─────┼───────────┼─────────────┼───────────────────┼─────────────┼──────┼───────┤
│44079│72534      │2018-03-20   │2018-03-20 13:01:31│Passed       │(null)│g942477│
├─────┼───────────┼─────────────┼───────────────────┼─────────────┼──────┼───────┤
│44080│72533      │2018-03-07   │2018-03-07 11:12:54│Failed       │(null)│g942477│
├─────┼───────────┼─────────────┼───────────────────┼─────────────┼──────┼───────┤
│44080│72533      │2018-03-21   │2018-03-21 10:42:11│Passed       │(null)│g942477│
├─────┼───────────┼─────────────┼───────────────────┼─────────────┼──────┼───────┤
│44081│72532      │2018-03-05   │2018-03-05 11:51:28│Passed       │(null)│g942477│
├─────┼───────────┼─────────────┼───────────────────┼─────────────┼──────┼───────┤
│44082│72526      │2018-03-05   │2018-03-05 11:56:54│Failed       │(null)│g942477│
├─────┼───────────┼─────────────┼───────────────────┼─────────────┼──────┼───────┤
│44082│72526      │2018-03-21   │2018-03-21 10:57:40│Not Completed│(null)│g942477

Additionally if this is resolved I need to inject the same logic in below summary query to filter out the duplicates:
Currently the real count of Failed, No Run & Not Completed shows higher than expected due adding up of duplicates.
MATCH (a:HP)
WHERE  a.project='CHP'  and a.type='test-instance' and a.`cycle-id`='11743'
RETURN  a.status , count(*) as cnt, a.`cycle-id` order by a.status
╒═════════════╤═══╤════════════╕
│a.status     │cnt│a.`cycle-id`│
╞═════════════╪═══╪════════════╡
│Failed       │4  │11743       │
├─────────────┼───┼────────────┤
│No Run       │2  │11743       │
├─────────────┼───┼────────────┤
│Not Completed│5  │11743       │
├─────────────┼───┼────────────┤
│Passed       │31 │11743       │
└─────────────┴───┴────────────┘
Any help to resolve this is most appreciated. I need help forming query with "Group by" & "having" clause here with example. I also need help creating query for sorting by date time. 
Cannot find correct functions for date time addition / subtraction & sorting as the field type is "string". Let me know if you need more information. 
I am using version 3.0.6 of Neo4j
  • Version:
    3.0.6
  • Name:
    graph.db

Michael Hunger

unread,
Apr 3, 2018, 8:34:30 AM4/3/18
to ne...@googlegroups.com
Hi,


You could do two things:

1. Mark the last run with  a dedicated label, that you update if you insert new data.
And use that label for querying for the last one.

e.g.

OPTIONAL MATCH (n:HP:Last) WHERE n.id = $id  REMOVE n:Last
CREATE (n:HP:Last {id:$id, ....})


MATCH (a:HP:Last) WHERE a.project='CHP' and a.type='test-instance' and a.`cycle-id` ='11743' RETURN distinct a.id, a.`test-id`, a.`exec-date`, a.`last-modified`, a.status, a.name, a.owner order by a.id


2. Alternatively you can sort them by id and date


MATCH (a:HP) WHERE a.project='CHP' and a.type='test-instance' and a.`cycle-id` ='11743' 
WITH a ORDER BY a.id ASC, a.`exec-date` DESC
WITH a.id, head(collect(a)) as last
RETURN distinct last.id, last.`test-id`, last.`exec-date`, last.`last-modified`, last.status, last.name, last.owner order by last.id

HTH Michael

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply all
Reply to author
Forward
0 new messages