You might try modifying this custom script:
https://gist.github.com/walterst/2c592044b3b9e44a4290
so that it only writes out reads where the target primers are found. Right now, if it doesn't find the primer(s), it writes out the read unchanged instead of trimming it at the primer site.
Since you're only searching for the forward primer, and you don't want to strip off parts of the reads as the code currently does, you could make a modification towards the end of the code. (You may also want to modify this to slice out the adjacent barcode region, depending upon what sort of randomness you have in the nucleotide sequences before the barcodes.) Change this part:
    f_count = 0
    r_count = 0
    no_seq_left = 0
    for label,seq,qual in parse_fastq(seqs):
        start_slice = 0
        end_slice = -1
        for curr_primer in forward_primers:
            if curr_primer.search(seq):
                start_slice = int(curr_primer.search(seq).span()[1])
                f_count += 1
        for curr_primer in reverse_primers:
            if curr_primer.search(seq):
                end_slice = int(curr_primer.search(seq).span()[0])
                r_count += 1
        curr_seq = seq[start_slice:end_slice]
        curr_qual = qual[start_slice:end_slice]
        if len(curr_seq) < 1:
            no_seq_left += 1
            continue
        formatted_fastq_line = format_fastq_record(label, curr_seq, curr_qual)
        out_seqs.write("%s" % (formatted_fastq_line))
    log_out.write("Forward primer hits: %d\n" % f_count)
    log_out.write("Reverse primer hits: %d\n" % r_count)
    log_out.write("No seq left after truncation: %d" % no_seq_left)
to something like this:
    no_primer_hit = 0
    for label,seq,qual in parse_fastq(seqs):
        found_primer = False
        for curr_primer in forward_primers:
            if curr_primer.search(seq):
                found_primer = True
                break
        # Skip reads that have no forward primer hit
        if not found_primer:
            no_primer_hit += 1
            continue
        # Write the full, untrimmed read and its quality scores
        formatted_fastq_line = format_fastq_record(label, seq, qual)
        out_seqs.write("%s" % (formatted_fastq_line))
    log_out.write("Reads without forward primer hit: %d" % no_primer_hit)
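If you want to check the filtering logic on its own before editing the script, here is a rough standalone sketch of the same idea. It does not use the gist's parse_fastq or format_fastq_record. The helper names (compile_primer, filter_reads), the primer sequence, and the two reads are all made up for illustration, and how the gist actually builds its primer regexes is an assumption on my part.

```python
import re

# IUPAC degenerate nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[GC]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def compile_primer(primer):
    """Hypothetical helper: turn a primer with IUPAC codes into a regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def filter_reads(records, forward_primers):
    """Keep only (label, seq, qual) records that contain a forward primer.

    Reads are returned untrimmed; reads with no hit are only counted.
    """
    kept = []
    no_primer_hit = 0
    for label, seq, qual in records:
        if any(primer.search(seq) for primer in forward_primers):
            kept.append((label, seq, qual))
        else:
            no_primer_hit += 1
    return kept, no_primer_hit


# Illustrative data only: one read containing the primer, one without it.
primers = [compile_primer("GTGYCAGCMGCCGCGGTAA")]
seq_with_primer = "ACGT" + "GTGCCAGCAGCCGCGGTAA" + "TACG"
seq_without_primer = "ACGTACGTACGTACGTACGT"
reads = [
    ("read1", seq_with_primer, "I" * len(seq_with_primer)),
    ("read2", seq_without_primer, "I" * len(seq_without_primer)),
]

kept, missed = filter_reads(reads, primers)
for label, seq, qual in kept:
    print(label, seq)
print("Reads without forward primer hit: %d" % missed)
```

Here any() stops at the first matching primer, which does the same job as setting a found_primer flag and breaking out of the loop.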