> gsub('H3D2_([^,]+).*', '\\1', "H3D2_clay no geo, obs 1/ObsNod.out")
[1] "clay no geo"
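So the regular expression is the fixed string "H3D2_", followed by a capture group (everything in the parentheses), followed by .* to match everything from the comma onward. The text matched inside the parentheses is the first capture group, which we refer to as "\\1" in the replacement string. Because the pattern matches the whole string and replaces it with that group, only the captured part is left.
The important part is the pattern ([^,]+), which captures one or more characters up to, but not including, the first comma. Inside square brackets, a leading ^ negates the class, so [^,] means "any character that is not a comma". Strictly speaking the + is still greedy, but since the class cannot match a comma, the match can never run past the first one. Restricting what a group is allowed to match like this is a more conservative (and generally safer) way to construct regular expressions than relying on .* or on lazy quantifiers.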
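A minimal sketch of the same call applied to several paths at once; the second file name is made up for illustration. Because .* consumes the rest of each string there is only one match per element, so sub() (first match only) gives the same result as gsub() here.

```r
# Hypothetical file names in the same format as the one in the question
paths <- c("H3D2_clay no geo, obs 1/ObsNod.out",
           "H3D2_sand geo, obs 2/ObsNod.out")

# sub() and gsub() are vectorised over their 'x' argument
labels_sub  <- sub("H3D2_([^,]+).*", "\\1", paths)
labels_gsub <- gsub("H3D2_([^,]+).*", "\\1", paths)

print(labels_sub)
# [1] "clay no geo" "sand geo"
```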
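To see the difference, here is a sketch using a hypothetical name with two commas. The negated class and a lazy .*? both stop at the first comma, while a plain greedy .* stops at the last one. The lazy version uses perl = TRUE so it runs on the PCRE engine.

```r
# Hypothetical input with two commas, to show where each pattern stops
y <- "H3D2_clay, no geo, obs 1/ObsNod.out"

# Negated class: cannot cross a comma, so it stops at the first one -> "clay"
neg    <- sub("H3D2_([^,]+).*", "\\1", y)
# Greedy .*: grabs as much as it can, so it stops at the LAST comma -> "clay, no geo"
greedy <- sub("H3D2_(.*),.*", "\\1", y)
# Lazy .*?: grabs as little as it can, so it stops at the first comma -> "clay"
lazy   <- sub("H3D2_(.*?),.*", "\\1", y, perl = TRUE)

print(c(neg = neg, greedy = greedy, lazy = lazy))
```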
Capture groups are super useful, and a single pattern can contain several of them. In genomics, we often need to extract chromosome/start/end positions formatted as "chrom:start-end". As an example, this could be done using gsub() plus strsplit():
> gsub('(chr\\w+):(\\d+)-(\\d+)', '\\1;;;\\2;;;\\3', 'chr13:123-12313')
[1] "chr13;;;123;;;12313"
then:
> strsplit(gsub('(chr\\w+):(\\d+)-(\\d+)', '\\1;;;\\2;;;\\3', 'chr13:123-12313'), ';;;')
[[1]]
[1] "chr13" "123" "12313"
which can then be coerced into other types (e.g. start and end to integers). Or, the tidy way:
> library(tidyverse)
> tibble(pos='chr13:123-12313') %>% extract(pos, into=c('chrom', 'start', 'end'), '(chr\\w+):(\\d+)-(\\d+)', convert=TRUE)
# A tibble: 1 × 3
  chrom start   end
* <chr> <int> <int>
1 chr13   123 12313
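For comparison, a base R sketch (no tidyverse) that pulls the capture groups out directly with regexec()/regmatches() instead of the ;;; delimiter trick, then coerces start/end to integers. The second position string is made up, and the sketch assumes every element matches the pattern: a non-matching element comes back as character(0) and would be silently dropped by rbind().

```r
# Hypothetical positions; assumes every element matches the pattern
pos <- c("chr13:123-12313", "chrX:5000-6000")

# regexec() records where the full match and each capture group are;
# regmatches() turns that into a list of c(full_match, group1, group2, group3)
m <- regmatches(pos, regexec("(chr\\w+):(\\d+)-(\\d+)", pos))

# Stack into a character matrix: one row per input, column 1 is the full match
mat <- do.call(rbind, m)

regions <- data.frame(chrom = mat[, 2],
                      start = as.integer(mat[, 3]),
                      end   = as.integer(mat[, 4]),
                      stringsAsFactors = FALSE)
print(regions)
```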
HTH,
Vince