> Question #1: Do you know of any research that demonstrates how many numbers returned from B_E CCN analyses are mathematically valid but still not real CCNs?
No, but the answer would be on the order of 10^14.
> After studying B_E, I could not tell whether a context-sensitive stop list was being used (I am examining version 1.2).
They are in 1.3; I don't think that they work properly in 1.2.
> Question #2: Do you know if there is already a context sensitive stop list for B_E CCN analysis?
No, I do not know.
> Question #3: Do you think that a python post processor is a good approach?
No, I do not.
> Question #4: Would the fiwalk application perhaps be the best approach?
Best approach for what?
> Question #5: Do you think that manipulating / increasing the analysis within the internal B_E application code relating to scan_accts classes and functions is the best approach?
What do you mean by best approach?
> Currently, B_E validates CCNs by running the following tests on the carved-out numbers (after parsing with Flex): (1) extract_digits_and_test, (2) prefix_test, (3) ccv1_test, (4) pattern_test, (5) histogram_test, (6) before_window test, and (7) after_window test.
> If any of the above fails, B_E reports the failure and suppresses the CCN. If all pass B_E reports a validation of the number being a CCN.
> Obviously, B_E is returning validation of many numbers that are ending up NOT being CCNs even after meeting the above 7 criteria.
Why obviously?
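For reference, the "every test must pass, otherwise suppress" logic described in the quoted text can be sketched roughly as below. This is a minimal Python illustration, not bulk_extractor's actual C++ code: the function names, the 13 to 19 digit length bound, and the two-test chain are assumptions made for the example. It also shows why arithmetic checks alone cannot settle the matter, since about one in ten arbitrary digit strings of the right length satisfies the Luhn checksum.

```python
def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk right to left, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def length_ok(digits: str) -> bool:
    """Hypothetical length test (assumed bound: 13 to 19 digits)."""
    return 13 <= len(digits) <= 19


# Hypothetical ordered test chain; the names do not match bulk_extractor's.
TESTS = [("length", length_ok), ("luhn", luhn_valid)]


def first_failure(digits: str):
    """Return the name of the first failing test, or None if all pass."""
    for name, test in TESTS:
        if not test(digits):
            return name
    return None
```

A candidate is reported only when `first_failure` returns `None`; otherwise the returned name says which test suppressed it.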
> Question #6: Can I assume that the code written for all of the tests is written correctly, including correct logic?
No, you should not assume that. I believe that it is formally undecidable whether code is written correctly, including correct logic.
> Question #7: Could numbers be getting validated that are not actually passing all tests?
Yes.
> Question #8: Is there more contextual data surrounding the carved numbers that could bring greater accuracy (currently only 4 characters before and after are assessed for context)?
Probably.
> Question #9: Could the order of the testing logic be improved - is it occurring in the OPTIMAL order?
It is usually impossible to prove that something is optimal.
> Question #10: Do we know if the CCN false positive problem is largely a Windows problem (lots of Windows PEs and DLLs have GUIDs that are passing all Luhn, length, and BIN tests)?
I don't know what problem you are referring to.
> Question #11: What about the name of the file that the number is found in - would this assist with accurate identification (i.e., help decide whether the number is a true CCN or not)?
Huh? bulk_extractor doesn't know about files.
> Question #12: Could it be that all CCNs covered by PCI are simply not being addressed / captured by B_E code?
This question makes no sense.
> Question #13: Would it be useful to use the Mars Banks Base and / or the ISO/IEC 7812-1:2006 to refine the B_E search for CCNs by comparing the obtained list of CCNs from analysis to either or both of these CCN number databases post B_E processing - sort of like a plugin or post processor for B_E?
Sure. Is Mars Bank Base freely available?
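If you do get a usable issuer list, the comparison could be sketched as a small filter over the CCN feature file. This is only an illustration under stated assumptions: it assumes tab-separated feature lines of the form offset, feature, context, with comment lines starting with "#"; the function names are hypothetical; and the prefix set must come from whatever database you actually obtain (the one in the usage note is a placeholder, not real issuer data).

```python
def digits_only(text: str) -> str:
    """Strip spaces, dashes, and any other non-digit characters."""
    return "".join(ch for ch in text if ch.isdigit())


def filter_features(lines, known_prefixes):
    """Keep feature lines whose number starts with a known issuer prefix.

    lines: iterable of feature-file lines (assumed: offset<TAB>feature<TAB>context).
    known_prefixes: iterable of digit strings taken from an issuer database.
    """
    prefixes = tuple(known_prefixes)
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip comment lines and blank lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a well-formed feature line
        number = digits_only(fields[1])
        if number.startswith(prefixes):
            kept.append(line)
    return kept
```

Usage would look like `filter_features(open("ccn.txt"), {"4111"})`, where `{"4111"}` stands in for the real prefix list and `ccn.txt` is assumed to be the feature file produced by the run.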