0 | 238 | 2017-03-19 | 27.5 | 2.44 | $34.43 | 346 | 2017-03-19 10:36:00 | 27.5 | 2.44 | 29.94
1 | 239 | 2017-03-19 | 34.5 | 3.06 | $41.32 | 255 | 2017-03-19 12:05:00 | 34.5 | 3.05 | 37.55
2 | 239 | 2017-03-19 | 34.5 | 3.06 | $41.32 | 409 | 2017-03-19 08:46:00 | 34.5 | 3.05 | 37.55
3 | 269 | 2017-03-19 | 34.5 | 3.06 | $41.31 | 255 | 2017-03-19 12:05:00 | 34.5 | 3.05 | 37.55
4 | 269 | 2017-03-19 | 34.5 | 3.06 | $41.31 | 409 | 2017-03-19 08:46:00 | 34.5 | 3.05 | 37.55
--
But, if you say that you get duplicated entries in the merged result, that means that you have duplicate values in the merge key.
In that case, it is, I think, the responsibility of the user to deal with those (e.g. keep the first, keep the last, ...).
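As a minimal sketch of that "keep the first" approach (toy data and column names, not the actual files from this thread):

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [238, 239], "Total": [34.43, 41.32]})
df2 = pd.DataFrame({"Time": ["10:36", "12:05", "08:46"],
                    "Total": [34.43, 41.32, 41.32]})

# A plain merge repeats the left row for every duplicated key on the right:
# 41.32 appears twice in df2, so ID 239 comes back twice.
merged = df1.merge(df2, on="Total")

# Deciding up front which duplicate to keep removes the repetition.
deduped = df1.merge(df2.drop_duplicates(subset="Total", keep="first"), on="Total")
print(merged)
print(deduped)
```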
This isn't a problem in Excel's PowerQuery plugin, so I assume this is just a case of finding the right idiom in Pandas to recreate that matching functionality.
If a match is found, then both lines should be excluded from further matching.
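If that is the rule, one possible Pandas idiom (a hedged sketch on toy data, not necessarily what the real files look like) is to number repeated key values with groupby(...).cumcount() and include that counter in the merge key, so each row can pair up at most once:

```python
import pandas as pd

df1 = pd.DataFrame({"Date": ["2017-03-19"] * 3, "Total": [41.32, 41.32, 34.43]})
df2 = pd.DataFrame({"Date": ["2017-03-19"] * 3, "Total": [41.32, 41.32, 34.43],
                    "Time": ["12:05", "08:46", "10:36"]})

# Number each occurrence of a (Date, Total) pair: the first 41.32 on the left
# can then only match the first 41.32 on the right, the second the second, etc.
df1["occ"] = df1.groupby(["Date", "Total"]).cumcount()
df2["occ"] = df2.groupby(["Date", "Total"]).cumcount()

paired = df1.merge(df2, on=["Date", "Total", "occ"])
print(paired.drop(columns="occ"))
```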
--
Can you try to explain how merge_asof is not solving your problem?
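For reference, a minimal merge_asof sketch (toy numbers, assuming the match is on an approximately equal Total within a small tolerance). Note that merge_asof matches each left row independently against the nearest right key, so by itself it does not stop a right row from being reused:

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [238, 239], "Total": [34.43, 41.32]}).sort_values("Total")
df2 = pd.DataFrame({"Time": ["10:36", "12:05"], "Total": [34.45, 41.30]}).sort_values("Total")

# Both frames must be sorted on the key; rows with no right key within the
# tolerance come back with NaN in the right-hand columns.
out = pd.merge_asof(df1, df2, on="Total", direction="nearest", tolerance=0.05)
print(out)
```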
2 | 257 | O-2138298788 | 2017-03-19 | 13.0 | $1.15 | $16.27 | 347 | 2017-03-19 10:35:00 | 13.0 | 1.16 | 14.16
3 | 251 | O-1940600619 | 2017-03-19 | 13.5 | $1.20 | $14.70 | 342 | 2017-03-19 10:44:00 | 13.5 | 1.19 | 14.69
4 | 254 | O-1344085405 | 2017-03-19 | 13.5 | $1.20 | $17.64 | 342 | 2017-03-19 10:44:00 | 13.5 | 1.19 | 14.69
5 | 253 | O-2040237287 | 2017-03-19 | 14.0 | $1.24 | $17.61 | 343 | 2017-03-19 10:41:00 | 14.0 | 1.24 | 15.24
6 | 245 | O-1220583480 | 2017-03-19 | 14.5 | $1.29 | $18.16 | 311 | 2017-03-19 11:14:00 | 14.5 | 1.29 | 15.79
7 | 247 | O-2139159005 | 2017-03-19 | 14.5 | $1.29 | $18.16 | 311 | 2017-03-19 11:14:00 | 14.5 | 1.29 | 15.79
8 | 258 | O-1372725154 | 2017-03-19 | 14.5 | $1.29 | $18.16 | 311 | 2017-03-19 11:14:00 | 14.5 | 1.29 | 15.79
But you have still not explained which idiom you want.
--
("DF1", ['ID', 'Date', 'Subtotal', 'Tax', 'Total'])
("DF2", ['Time', 'Net Total', 'Tax', 'Total Due'])
What's more, if you're the type of person who thinks better in code than at the pseudo level, I totally dig it. Attached here are two CSV files--have at them! (And since perfect data masks other problems, I added two errors I manually coded in: df2 on line 10 and df1 on line 20, and I dropped a row; not to mention, df1 doesn't have a timestamp, only a date.)

But you have still not explained which idiom you want.

That's why I'm reaching out to the community. (^_^)

--
Why can't you then first drop all duplicates except for the first row, and then do the merge?
I think we're going in circles. The reason you can't drop duplicates, as I answered in my second email, is that the duplicates in either DF (not in the merged result) are duplicate sales transactions, i.e. real, separate sales that happen to look identical.

Tell me, what about the problem domain of sales transactions is still unclear to you? Let's talk about that, and then see what idioms in Pandas can be used to address this domain of problems.
You can do that, for example, by giving small example data (the CSV files you provided at least make it reproducible, but it would be even clearer if you slimmed it down to the smallest possible reproducible example, e.g. leave out columns that do not matter, use fewer rows, ...), and also by providing your expected result (up to now you have only shown the output you didn't want).
  | df1 | X | df2 | Y
0 | 1.00 | 10.00 | 2.00 | 10.00
1 | 3.00 | 15.00 | 4.00 | 15.00
2 | 5.00 | 15.00 | 6.00 | 15.00

  | df1 | X | df2 | Y
0 | 7.00 | 20.00 | NaN | NaN
1 | NaN | NaN | 8.00 | 15.75
1 | 3.00 | 15.00 | 4.00 | 15.00
2 | 5.00 | 15.00 | 4.00 | 15.00
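FWIW, here is a hedged sketch that reproduces something like the tables above (using the toy df1/X/df2/Y values, and assuming X and Y are the amounts being matched): an outer one-to-one merge where repeated amounts are numbered so each row pairs at most once, and the leftovers fall out as NaN rows:

```python
import pandas as pd

left = pd.DataFrame({"df1": [1.0, 3.0, 5.0, 7.0], "X": [10.0, 15.0, 15.0, 20.0]})
right = pd.DataFrame({"df2": [2.0, 4.0, 6.0, 8.0], "Y": [10.0, 15.0, 15.0, 15.75]})

# Number repeated amounts so the two 15.00 rows on each side pair one-to-one
# instead of forming a 2x2 cross product.
left["n"] = left.groupby("X").cumcount()
right["n"] = right.groupby("Y").cumcount()

out = left.merge(right, left_on=["X", "n"], right_on=["Y", "n"], how="outer")
print(out[["df1", "X", "df2", "Y"]])
# 10.00 and the two 15.00 amounts pair up once each; 20.00 (df1 side) and
# 15.75 (df2 side) have no partner and come through with NaN on the other side.
```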