recordTable and removing duplicates

Margarita Mulero

Jun 28, 2022, 12:51:05 PM
to camtrapR
Dear Juergen,
I am writing to you because I am quite confused about the recordTable function in camtrapR. I have carefully read the instructions (attached PDF), consulted https://rdrr.io/cran/camtrapR/man/recordTable.html, read the Google group and run several tests, but I am still confused. I hope you can help me with this. The rdrr.io page describes the function as follows:

"Generate a species record table from camera trap images and videos. Description: Generates a record table from camera trap images or videos. Images/videos must be sorted into station directories at least."
I tagged the camera trap images using digiKam and stored them in folders as indicated. I have just 1 camera per station, thus 1 folder per camera. When tagging in digiKam, I decided (using "human knowledge") what is a new animal and what is not. I created a tag "number of individuals" and assigned value = 1 to any new animal and value = 0 (zero) to the other photos where the same animal appears again. I did this because I want to keep the photos correctly tagged (species field) for future use in artificial intelligence training. I plan to analyze the camera trap data by filtering the final dataset for "number of individuals" = 1 (discarding the lines where it is 0), but since this is for studying animal path use, I prefer to keep the CSV as "virgin" as possible.

So I want to use recordTable to generate a CSV with ALL the data from my photos (some will have "number of individuals" = 0, some = 1, or 2...10 if several individuals are present in the picture). That is, I want a CSV with one line per photo (as many lines as photos). I do NOT want the program to remove whatever it considers a duplicate, or to decide artificially what is a duplicate. I am confused about the values I should give the different recordTable arguments to accomplish this.
According to camtrapR.pdf:

-removeDuplicateRecords
logical. If there are several records of the same species at the same station (also
same camera if cameraID is defined) at exactly the same time, show only one?
I understand that if I had 2 photos of deer taken on Aug 12, 2020 at 13:00:01 at C1,
with TRUE I should get 1 line for the 2 photos, because TRUE means I am asking R to remove duplicates,
and with FALSE I should get 2 lines for the 2 photos, because FALSE means I am asking R NOT to remove duplicates.

But to my surprise, the PDF instructions say: "removeDuplicateRecords determines whether duplicate records (identical station, species, date/time, (and camera if applicable)) are all returned (TRUE) or collapsed into a single unique record (FALSE)." This is exactly the opposite. Is it a typo in the instructions, or am I misunderstanding?
To clarify this, I ran a test on a total of 37 pictures from 2 cameras with both options, and the results are even more confusing, as I don't know how to interpret the column "nimages" that appears in the generated CSV.
When I run the code with removeDuplicateRecords = TRUE, I get a table with 26 lines, and the nimages column in the CSV is always 1.
When I run the code with removeDuplicateRecords = FALSE, I get a table with 37 lines, and the nimages column in the CSV is sometimes 1, sometimes 2 or 3.

So it seems to me that the PDF instructions are incorrect, aren't they?

And I would like to know:
-whether I should write TRUE or FALSE to get 1 line per photo
-how to interpret the column "nimages" that appears in the generated CSV. Where does this value come from?

So I'd like to know which code version is correct, complete, and sufficient to get my CSV the way I want it:
metadata <- recordTable(inDir                  = images,
                        IDfrom                 = "directory",
                        removeDuplicateRecords = FALSE)
head(metadata)
write.csv(metadata, file = "camtrapCOMPLET.csv" )

I also tried other alternatives such as:

metadata <- recordTable(inDir                  = images,
                        IDfrom                 = "directory",
                        minDeltaTime = 0,
                        deltaTimeComparedTo = "lastRecord",
                        removeDuplicateRecords = FALSE)
head(metadata)
write.csv(metadata, file = "camtrapCOMPLET.csv" )

But I am not sure how to set the other (possibly mandatory?) function arguments, as I have these doubts:


-camerasIndependent
logical. If TRUE, species records are considered to be independent between cameras
at a station.
Does this one affect how the CSV is created, or is it for further analysis (occupancy models etc.)? Can I leave it out of my code entirely, or, if it is mandatory, what value should I write?

-minDeltaTime integer. Time difference between records of the same species at the same station
to be considered independent (in minutes)
I am confused about the use of "independent" here. Does this affect how photo metadata are exported (and thus the CSV I get), or is it for further analysis? Should I write = 0 to get all my records?

-deltaTimeComparedTo
character. For two records to be considered independent, must the second one
be at least minDeltaTime minutes after the last independent record of the same
species ("lastIndependentRecord"), or minDeltaTime minutes after the last
record ("lastRecord")?
Isn't this the same as minDeltaTime? Or does deltaTimeComparedTo cover the same idea but for any species (and minDeltaTime just for individuals of the very same species)? Again, should I write = 0 or FALSE to get all my records? Or should I write = "lastRecord", which is what I found online but don't understand? And why does your explanation of this argument in the PDF end with a "?"?


I would highly appreciate your response,
Best regards

Juergen Niedballa

Jul 1, 2022, 4:51:06 AM
to camt...@googlegroups.com

Hi,

please see replies below with #

Best regards,

Jürgen

On 28.06.2022 at 18:51, Margarita Mulero wrote:
According to camtrapR.pdf:

-removeDuplicateRecords
logical. If there are several records of the same species at the same station (also
same camera if cameraID is defined) at exactly the same time, show only one?
I understand that if I had 2 photos of deer taken on Aug 12, 2020 at 13:00:01 at C1,
with TRUE I should get 1 line for the 2 photos, because TRUE means I am asking R to remove duplicates,
and with FALSE I should get 2 lines for the 2 photos, because FALSE means I am asking R NOT to remove duplicates.
# Correct
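The behaviour confirmed here can be illustrated with a small base R sketch. It is not camtrapR's code: the data frame, its column names and the helper `collapse_duplicates()` are hypothetical, and the sketch only mimics the rule described in the documentation (rows sharing station, species and date/time are collapsed into one; a camera column would be added to the keys if cameraID is used).

```r
# Hypothetical record table: two deer images at station C1 taken in the same
# second, plus one later deer image. Column names are illustrative only.
records <- data.frame(
  Station  = c("C1", "C1", "C1"),
  Species  = c("deer", "deer", "deer"),
  DateTime = as.POSIXct(c("2020-08-12 13:00:01",
                          "2020-08-12 13:00:01",
                          "2020-08-12 13:05:00"), tz = "UTC"),
  FileName = c("IMG_0001.JPG", "IMG_0002.JPG", "IMG_0003.JPG"),
  stringsAsFactors = FALSE
)

# Keep the first row of every Station/Species/DateTime combination.
# This mimics the documented effect of removeDuplicateRecords = TRUE.
collapse_duplicates <- function(df, keys = c("Station", "Species", "DateTime")) {
  df[!duplicated(df[keys]), , drop = FALSE]
}

nrow(records)                       # 3: one row per image, as with FALSE
nrow(collapse_duplicates(records))  # 2: same-second pair collapsed, as with TRUE
```

With FALSE all three images stay in the table; with TRUE the two images from 13:00:01 become a single row.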


But to my surprise, the PDF instructions say: "removeDuplicateRecords determines whether duplicate records (identical station, species, date/time, (and camera if applicable)) are all returned (TRUE) or collapsed into a single unique record (FALSE)." This is exactly the opposite. Is it a typo in the instructions, or am I misunderstanding?
# This was a typo in the documentation, sorry. It is fixed on GitHub.
To clarify this, I ran a test on a total of 37 pictures from 2 cameras with both options, and the results are even more confusing, as I don't know how to interpret the column "nimages" that appears in the generated CSV.
When I run the code with removeDuplicateRecords = TRUE, I get a table with 26 lines, and the nimages column in the CSV is always 1.
When I run the code with removeDuplicateRecords = FALSE, I get a table with 37 lines, and the nimages column in the CSV is sometimes 1, sometimes 2 or 3.
# removeDuplicateRecords = TRUE should return fewer records than removeDuplicateRecords = FALSE. I need to check nimages (in 2 weeks or so), but from what you describe it seems wrong (nimages is meant to show you how many images were collapsed into a single line of the record table).


So it seems to me that the PDF instructions are incorrect, aren't they?
# Yes, they are currently wrong in Version 2.2.0 on CRAN. It is fixed on GitHub. Thank you for pointing this out.


And I would like to know:
-whether I should write TRUE or FALSE to get 1 line per photo
# removeDuplicateRecords = FALSE
-how to interpret the column "nimages" that appears in the generated CSV. Where does this value come from?
# I need to check (will do so by mid July, currently I don't have time). I think it is supposed to show how many images a row in the table represents, but need to confirm. I see the issue you mention when running example(recordTable) - when comparing rec_table3a and rec_table3b


So I'd like to know which code version is correct, complete, and sufficient to get my CSV the way I want it:
metadata <- recordTable(inDir                  = images,
                        IDfrom                 = "directory",
                        removeDuplicateRecords = FALSE)
head(metadata)
write.csv(metadata, file = "camtrapCOMPLET.csv" )

I also tried other alternatives such as:

metadata <- recordTable(inDir                  = images,
                        IDfrom                 = "directory",
                        minDeltaTime = 0,
                        deltaTimeComparedTo = "lastRecord",
                        removeDuplicateRecords = FALSE)
head(metadata)
write.csv(metadata, file = "camtrapCOMPLET.csv" )
# if you tagged images in digiKam, use IDfrom = "metadata". minDeltaTime is 0 by default, so there is no need to state it explicitly. If minDeltaTime = 0, deltaTimeComparedTo does not change anything.


But I am not sure how to set the other (possibly mandatory?) function arguments, as I have these doubts:


-camerasIndependent
logical. If TRUE, species records are considered to be independent between cameras
at a station.
Does this one affect how the CSV is created, or is it for further analysis (occupancy models etc.)? Can I leave it out of my code entirely, or, if it is mandatory, what value should I write?
# Not necessary since you only have 1 camera per station.


-minDeltaTime integer. Time difference between records of the same species at the same station
to be considered independent (in minutes)
I am confused about the use of "independent" here. Does this affect how photo metadata are exported (and thus the CSV I get), or is it for further analysis? Should I write = 0 to get all my records?
# Yes, write = 0 to get all records (0 is the default). It does not affect how metadata are exported, but it does affect whether records are removed.


-deltaTimeComparedTo
character. For two records to be considered independent, must the second one
be at least minDeltaTime minutes after the last independent record of the same
species ("lastIndependentRecord"), or minDeltaTime minutes after the last
record ("lastRecord")?
Isn't this the same as minDeltaTime? Or does deltaTimeComparedTo cover the same idea but for any species (and minDeltaTime just for individuals of the very same species)? Again, should I write = 0 or FALSE to get all my records? Or should I write = "lastRecord", which is what I found online but don't understand? And why does your explanation of this argument in the PDF end with a "?"?

# deltaTimeComparedTo is only relevant if minDeltaTime is not 0. It only applies to records of the same species at the same station (= location).
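A base R sketch of how the two deltaTimeComparedTo options can give different results for one species at one station. The function `flag_independent()` is hypothetical and written only for illustration, not taken from camtrapR; in particular, it assumes that a gap exactly equal to minDeltaTime counts as independent, which may not match the package at that boundary.

```r
# Hypothetical helper (not camtrapR code): flag records of ONE species at ONE
# station as independent (TRUE) or not (FALSE).
# times must be sorted ascending; min_delta is in minutes.
flag_independent <- function(times, min_delta,
                             compared_to = c("lastIndependentRecord", "lastRecord")) {
  compared_to <- match.arg(compared_to)
  n <- length(times)
  independent <- logical(n)
  if (n == 0) return(independent)
  independent[1] <- TRUE   # the first record is always kept
  reference <- times[1]    # time that the next record is compared against
  for (i in seq_len(n)[-1]) {
    gap <- as.numeric(difftime(times[i], reference, units = "mins"))
    independent[i] <- gap >= min_delta   # assumption: a gap equal to min_delta counts
    # "lastRecord": compare with the previous record, whatever its status.
    # "lastIndependentRecord": compare with the last record that was kept.
    if (compared_to == "lastRecord" || independent[i]) {
      reference <- times[i]
    }
  }
  independent
}

# Four images 40 minutes apart, minDeltaTime = 60
times <- as.POSIXct("2020-08-12 13:00:00", tz = "UTC") + c(0, 40, 80, 120) * 60

flag_independent(times, 60, "lastIndependentRecord")  # TRUE FALSE TRUE FALSE
flag_independent(times, 60, "lastRecord")             # TRUE FALSE FALSE FALSE
flag_independent(times, 0,  "lastRecord")             # all TRUE
```

With "lastIndependentRecord", the 14:20 image is 80 minutes after the last kept image (13:00) and starts a new record. With "lastRecord", every image is only 40 minutes after the previous one, so the whole sequence stays one record. With minDeltaTime = 0 every gap qualifies and both options keep everything.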

See this thread for an explanation of the different options with a few examples.




Margarita Mulero

Jul 6, 2022, 7:25:50 AM
to camtrapR
Hi Juergen
Is it possible that the column "nimages" in the generated CSV is simply an identifier that numbers each image taken at a given moment (date/time h:min:s)?
I see that when I run the code WITHOUT removing duplicates, the CSV has 1 line per image, and if, let's say, 4 images have exactly the same date/time, each gets a different consecutive number in "nimages" (1, 2, 3, 4), so I interpret that 4 images were taken in the same second.
That seems to make some sense to me, and it implies that it is NOT "...to show how many images a row in the table represents...". In other words, nimages does not COUNT but simply identifies?
thanks!

Juergen Niedballa

Jul 18, 2022, 11:00:26 AM
to camtrapR
 Hi, 
the "n_images" column should have nothing to do with the removeDuplicateRecords argument. Instead, it tells you how many non-independent records (as defined by minDeltaTime) are represented by the record (a single row in the record table). 
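Under the intended meaning described here, n_images can be pictured as the size of each group of non-independent images. The sketch below is a hypothetical base R illustration, not camtrapR's implementation: it assumes the independence of each image (one species, one station, in time order) has already been decided, and the vector names are made up.

```r
# Hypothetical independence flags for one species at one station, in time order
# (TRUE = starts a new record, FALSE = lumped into the preceding record).
is_independent <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)

# cumsum() gives each image the number of the record it belongs to: 1 1 1 2 3 3
record_id <- cumsum(is_independent)

# n_images for each retained row = number of images lumped into that record
n_images <- tabulate(record_id)
n_images                        # 3 1 2

# When every image is independent (e.g. minDeltaTime = 0, duplicates removed),
# each row represents exactly one image
tabulate(cumsum(rep(TRUE, 4)))  # 1 1 1 1
```

The last line corresponds to the case described below, where minDeltaTime = 0 should always give n_images = 1.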

I just noticed, however, that with minDeltaTime = 0 it only works as intended when removeDuplicateRecords = TRUE (which is the default). If minDeltaTime = 0 and removeDuplicateRecords = FALSE, the values are wrong and it indeed looks as if it is counting. That is an artefact and not intended.

For now please use removeDuplicateRecords = T if you want to use the n_images column. 
If minDeltaTime = 0, the current implementation will always return n_images = 1. Only with a higher minDeltaTime do you get values > 1 in n_images. See the example below.

# all records (without duplicates) - n_images is correct
rec_table4a <- recordTable(inDir                  = wd_images_ID_species,
                           IDfrom                 = "directory",
                           minDeltaTime           = 0,
                           exclude                = "UNID",
                           timeZone               = "Asia/Kuala_Lumpur",
                           removeDuplicateRecords = T
)

# all records (with duplicates) - n_images is wrong
rec_table4b <- recordTable(inDir                  = wd_images_ID_species,
                           IDfrom                 = "directory",
                           minDeltaTime           = 0,
                           exclude                = "UNID",
                           timeZone               = "Asia/Kuala_Lumpur",
                           removeDuplicateRecords = F
)

# 60 minute independence (without duplicates) - n_images is correct
rec_table4c <- recordTable(inDir                  = wd_images_ID_species,
                           IDfrom                 = "directory",
                           minDeltaTime           = 60,
                           exclude                = "UNID",
                           timeZone               = "Asia/Kuala_Lumpur",
                           deltaTimeComparedTo    = "lastRecord",
                           removeDuplicateRecords = T
)

# 60 minute independence (with duplicates) - not allowed; removeDuplicateRecords is set to TRUE automatically
rec_table4d <- recordTable(inDir                  = wd_images_ID_species,
                           IDfrom                 = "directory",
                           minDeltaTime           = 60,
                           exclude                = "UNID",
                           timeZone               = "Asia/Kuala_Lumpur",
                           deltaTimeComparedTo    = "lastRecord",
                           removeDuplicateRecords = F        # not allowed
)

# rec_table4c and rec_table4d are identical
all.equal(rec_table4c, rec_table4d)
[1] TRUE


table(rec_table4a$n_images)

 1
55

table(rec_table4b$n_images)    # wrong

 1  2  3
55 10  2

table(rec_table4c$n_images)

 1  2  3
24 14  1

table(rec_table4d$n_images)

 1  2  3
24 14  1 

Margarita Mulero

Jul 21, 2022, 6:52:27 AM
to camtrapR
Dear Juergen
I am afraid either I am not understanding you or this is not working as intended.
When working with 2 folders containing a total of 37 images I tried 2 codes and:

Code 1: removeDuplicateRecords = FALSE

metadata <- recordTable(inDir                  = tag_images,
                        IDfrom                 = "metadata",
                        metadataSpeciesTag     = "2Species",
                        minDeltaTime           = 0,
                        stationCol             = "Station",
                        removeDuplicateRecords = FALSE)

I get 37 lines, one per image, so I am getting ALL records; nimages seems to count the number of images that exist for the same time (h:min:s).


Code 2: removeDuplicateRecords = TRUE

metadata <- recordTable(inDir                  = tag_images,
                        IDfrom                 = "metadata",
                        metadataSpeciesTag     = "2Species",
                        minDeltaTime           = 0,
                        stationCol             = "Station",
                        removeDuplicateRecords = TRUE)

I get 26 lines, so I am missing data: it IS removing duplicates; nimages is always 1.

 You say that: "For now please use removeDuplicateRecords = T if you want to use the n_images column. 
If minDeltaTime = 0, the current implementation will always return n_images = 1."

As I want to get a CSV with ALL my records, I believe I have to use Code 1, which corresponds to your rec_table4b, i.e. removeDuplicateRecords = FALSE (and not TRUE, as you seem to suggest)?

I have 2 questions: 1) Why would I use removeDuplicateRecords = TRUE if it shows fewer lines than images? That means I am losing information. 2) You say "if you want to use the n_images column"; why would I, or anyone, want to use that column? I still don't understand what it is useful for.

Just to clarify, as explained, when tagging in digiKam I added a "number of individuals" tag, to which I gave value = 0 for repetitions of an animal, value = 1 for a single appearance of an animal, and value = 2, 3... if several individuals are present. My final analysis will therefore be done after deleting all the lines where number of individuals = 0. So I can't afford to have R arbitrarily "not show" lines, as this means I may miss the line where my animal appears with number of individuals = 1.
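Dropping the repeat rows (tag value 0) after exporting the complete table could look like the base R sketch below. It assumes the export was made with removeDuplicateRecords = FALSE; the column name `n_individuals` and the rows are hypothetical placeholders, since the actual column name depends on how the custom digiKam tag is read in.

```r
# Hypothetical excerpt of a record table exported with removeDuplicateRecords = FALSE.
# "n_individuals" is a placeholder for the column holding the custom digiKam tag.
records <- data.frame(
  Station       = c("C1", "C1", "C1", "C2"),
  Species       = c("deer", "deer", "deer", "boar"),
  FileName      = c("IMG_0001.JPG", "IMG_0002.JPG", "IMG_0003.JPG", "IMG_0101.JPG"),
  n_individuals = c("1", "0", "0", "2"),
  stringsAsFactors = FALSE
)

# Tag values may come in as text, so convert them before comparing
records$n_individuals <- as.integer(records$n_individuals)

# Keep rows tagged as a new animal (value >= 1) and drop repeats (value 0).
# which() also drops rows where the tag is missing (NA).
new_animals <- records[which(records$n_individuals >= 1), ]
new_animals$FileName   # "IMG_0001.JPG" "IMG_0101.JPG"
```

Filtering after export leaves the full table untouched, so it can still be kept as the "virgin" CSV.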
Thanks, best regards

Juergen Niedballa

Jul 31, 2022, 1:45:54 PM
to camt...@googlegroups.com

Hi Margarita,

if you want all records, use code 1 (removeDuplicateRecords = FALSE). The n_images column in that case is wrong, that is a bug. I'll fix it when I have the time.

As you noted, Code 2 removes duplicates and thus returns fewer records. This is intended, because two images of the same species at the same location in the same second usually do not provide any additional information. In your case it is different because of the custom n_individuals tags.

The n_images column is (currently) not designed to take duplicate images into account. It is for cases where minDeltaTime is greater than 0 (for example, 60 minutes). Then this column indicates how many non-independent images a given record represents. It is not intended for any analyses, only to give the user an idea of how many records are lumped together when setting minDeltaTime.

On that note, the function has two arguments for summarizing events (eventSummaryColumn, eventSummaryFunction). They are also intended for cases where minDeltaTime > 0. Summarizing is done after removing duplicates and thus would not help you. I may consider changing this, but I would have to rethink a few things.
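Conceptually, summarizing a column per event resembles a grouped aggregation. The base R sketch below is only an analogy and not how camtrapR implements eventSummaryColumn / eventSummaryFunction (their exact input format is not shown in this thread); the event IDs and the `n_individuals` column are hypothetical. Applied to a complete table exported with removeDuplicateRecords = FALSE, such a summary would include the duplicate images.

```r
# Hypothetical images already assigned to events (e.g. by a 60-minute rule).
images <- data.frame(
  event_id      = c(1, 1, 1, 2, 2),
  n_individuals = c(1, 0, 2, 1, 1)
)

# One row per event; max() is used here as the summary function,
# but any function taking a vector and returning one value would work.
event_summary <- aggregate(n_individuals ~ event_id, data = images, FUN = max)
event_summary   # event 1 -> 2 individuals, event 2 -> 1 individual
```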

Best regards,

Jürgen
