cockroachdb digest since Thu Jan 2 02:30:00 UTC 2020

1 view
Skip to first unread message

Marvin

unread,
Jan 3, 2020, 12:05:59 AM1/3/20
to cockroach...@googlegroups.com, e...@cockroachlabs.com
Opened Pull Requests
ui: Upgrade to Webpack@^4 and babel@^7 versions
Opened by Koorosh at Thu Jan 2 09:05:47 with 2,083 additions, 1,454 deletions, 1 comments
●●●●● SIZE     pkg/ui: 3,537

Release note: None


Commits:
  • ui: Upgrade to Webpack@^4 and babel@^7 versions
    
    Release note: None
  • ui: Add missed dependency on babel-loader
    
    Release note: None
 
[wip] event system take 3: save mapping in DB from event names to job kinds
Opened by vilterp at Thu Jan 2 20:33:15 with 739 additions, 166 deletions, 0 comments
●●●● SIZE     pkg/jobs: 203 ,  pkg/eventsystem: 187 ,  console/mailers: 116 ,  console/events: 69 ,  console/migrations: 53 ,  intrusion/server: 40

Background

Goals

  • use the job system to execute reactions to events
  • persist events themselves in the DB
  • do something simple and reliable, ideally avoiding adding a big dep like kafka by just using the DB
  • provide a simple API which could be migrated to a different impl in the future
  • get properties of a pub/sub system
    • keep all knowledge of event reactions out of publishing process
    • allow an arbitrary number of reactions to an event, executed in an arbitrary number of processes

Problem

The central point of a pub/sub system is to decouple publishers from subscribers: a publisher publishes an event without knowing or caring how it will be reacted to; an arbitrary number of subscribers pick it up and do stuff with it.

We have infrastructure for the "doing stuff" part: the job system. The remaining question is: where does the mapping from events to jobs live? My previous attempts #1293 and #1299, kept it in the memory of the subscribing process. The awkward part of this was always "claiming" events to react to them — how could we ensure that all necessary reactions to an event were executed exactly once?

Realization

If we store the mapping from events to jobs in the DB, the publishing process can just look up this mapping and create jobs, and they'll be picked up by the job system of the subscribing process using the existing polling mechanism for running unclaimed jobs. The mapping would be stored in a table where each row represents a job that should be created in response to an event, e.g.:

  • row 1: "when there is a ClusterCreationFinished event, create a SendEmail job"
  • row 2: "when there is a ClusterCreationFinished event, create a NotifySupport job"

This event-to-jobs lookup could be done in a couple ways:

  • a separate server. I.e. you RPC to this server to publish an event, and it saves the events and the jobs, and perhaps executes the job by RPCing to the subscribing.
    • I was very inspired by Segment, which is essentially an event server like this. You post an event to them; they look up what destinations it should be delivered to, and webhook to them (using a job-retrying system called Centrifuge, which is strikingly similar to our job system, down to its DB schema)
  • directly in the publishing process itself, in the event system's Publish method) (went with this)

Design

https://github.com/cockroachlabs/managed-service/blob/10edc3d2dd7409d6d3d9f0400f4ea7efc3617271/pkg/eventsystem/events.go#L11-L24

https://github.com/cockroachlabs/managed-service/blob/10edc3d2dd7409d6d3d9f0400f4ea7efc3617271/console/migrations/20200102065719_add_event_job_mappings.up.sql#L1-L8

To subscribe to an event:

  • eventSystem.Subscribe(eventName, jobKind, handlerFunc) (normally at process startup)
    • it upserts a record in the event_job_mappings table, indicating that when an event of type eventName is created, a job of type jobKind should be created (passing the event payload as the job request)
    • it registers the given handler with the job system, which will poll for and run jobs of that type

To publish an event:

  • you call eventSystem.Publish(eventName, payload)
    • it saves the event to the events table
    • it looks up the mappings for this event name, and creates a job for each
    • (these jobs are picked up and executed by whatever process has registered handlers for them in its job system)

Status

sending an email on cluster creation finishing works. polling interval is 30s, which is pretty ok.

  • Intrusion publishes a ClusterCreationFinished event (and writes a SendClusterCreationFinishedEmail job)
  • Console picks it up and sends the email

TODOs / issues:

  • job system currently assumes that it will have all handlers registered in the current process
  • event payload is persistent 1+n times, where n is the number of mappings for that event name — once in events table, n times in jobs table. Not a big deal really...
  • ...?
  • tests should be more comprehensive

Commits:
  • events: get basic types in place
  • events: get basic methods in place
  • basic test
  • events: test passes!
    
    wow
  • persist events themselves
  • try testing proto payload
    
    gonna need to use an actual proto now
  • test payload with actual proto
  • upsert event mapping
    
    TODO: test
  • persist payload bytes
  • initialize event system in console; set up emails
  • separate out event system protos; job_name => job_kind
    
    email sent for first time!
  • set heartbeat on event-system-published jobs to way in the past
    
    so they get claimed immediately
  • foreign key jobs to events
    
    and fix tests
  • uncomment createKubeCluster
 
colexec: begin refactor of colexec type system
Opened by yuzefovich at Thu Jan 2 21:28:00 with 1,518 additions, 1,487 deletions, 1 comments
●●●● SIZE     pkg/col/colserde: 204 ,  pkg/sql/colexec: 178 ,  pkg/col/coldata: 168

pkg/col: rename coltypes package to phystypes

This commit renames col/coltypes to col/phystypes. We have "logical" type system (sql/types) and "physical" type system (previously col/coltypes). Later we will introduce "vectorized execution" type system (maybe something like colexec/colexectypes) which will combine two others. And the goal of this change is to reduce the amount of confusion.

Release note: None

colexec/typeconv: rename typeconv to colexectypes

This commit renames colexec/typeconv to colexec/colexectypes which will soon become the "vectorized execution type system" package.

Release note: None

colexectypes: introduce new colexectypes.T struct

This commit introduces a new "columnar execution" type that is a wrapper around phystypes.T that also stores the original "logical" SQL type.

Addresses: #43559.

Release note: None


Commits:
  • pkg/col: rename coltypes package to phystypes
    
    This commit renames col/coltypes to col/phystypes. We have "logical"
    type system (sql/types) and "physical" type system (previously
    col/coltypes). Later we will introduce "vectorized execution" type
    system (maybe something like colexec/colexectypes) which will combine
    two others. And the goal of this change is to reduce the amount of
    confusion.
    
    Release note: None
  • colexec/typeconv: rename typeconv to colexectypes
    
    This commit renames colexec/typeconv to colexec/colexectypes which will
    soon become the "vectorized execution type system" package.
    
    Release note: None
  • colexectypes: introduce new colexectypes.T struct
    
    This commit introduces a new "columnar execution" type that is a wrapper
    around phystypes.T that also stores the original "logical" SQL type.
    
    Release note: None
 
storage/batcheval/result: perform various cleanup on LocalResult struct
Opened by nvanbenschoten at Thu Jan 2 21:38:22 with 204 additions, 220 deletions, 1 comments
●●● SIZE     pkg/storage/batcheval/result: 164 ,  pkg/storage: 141 ,  pkg/storage/intentresolver: 105

This PR contains a series of cleanups to the result.LocalResult struct. It leaves the structure in a good position to be extended to include information about newly written, updated, and removed intents, which are hooked into the concurrency package in future commits the same way that UpdatedTxns is currently hooked into the TxnWaitQueue.

The changes are broken into a series of incremental steps to make them easier to review in isolation.


Commits:
  • storage/batcheval: clean up ordering of fields in LocalResult
    
    Release note: None
  • storage/batcheval: remove IntentsWithArg
    
    The Arg part of this was never used.
    
    Release note: None
  • storage/batcheval: rename LocalResult.Intents to EncounteredIntents
    
    Strictly a rename.
    
    Release note: None
  • storage/batcheval: store pointer to Txn in EndTxnIntents
    
    Small refactor.
    
    Release note: None
  • storage/batcheval: clean up read-only path LocalResult handling
    
    Small refactor.
    
    Release note: None
  • storage/batcheval: stop storing pointers to slices in LocalResult
    
    This commit removes the pattern of storing pointers to slices in
    LocalResult so that it remains "comaprable". This isn't worth the
    complexity it creates, so this commit removes the pattern and stores
    slices directly in the struct.
    
    Release note: None
 
sqlsmith: add support for interleaved tables
Opened by mjibson at Thu Jan 2 20:59:32 with 114 additions, 47 deletions, 2 comments
●●● SIZE     pkg/sql/sqlbase: 140

This commit adds interleaved table support to sqlsmith. When running with the rand-tables configuration, there's a 50% chance of all tables but the first one to get interleaved into a random other table.

Release note: None


Commits:
  • sqlsmith: add support for interleaved tables
    
    This commit adds interleaved table support to sqlsmith. When running
    with the rand-tables configuration, there's a 50% chance of all tables
    but the first one to get interleaved into a random other table.
    
    Release note: None
 
opt: add a rule to push filters into project-set
Opened by rohany at Thu Jan 2 18:54:19 with 81 additions, 0 deletions, 1 comments
●● SIZE     pkg/sql/opt/norm/rules: 28 ,  pkg/sql/opt/exec/execbuilder/testdata: 27 ,  pkg/sql/opt/norm/testdata/rules: 26

Fixes #42202.

This PR adds an optimizer rule to push filters down into project-set operations.

Release note: None


Commits:
  • opt: add a rule to push filters into project-set
    
    Fixes #42202.
    
    This PR adds an optimizer rule to push filters down into
    project-set operations.
    
    Release note: None
 
roachprod/vm/aws: distribute nodes over zones rather than regions for geo
Opened by ajwerner at Thu Jan 2 19:07:59 with 4 additions, 21 deletions, 2 comments
●● SIZE     pkg/cmd/roachprod/vm/aws: 25

Before this commit, for --geo AWS cluster creation we'd evenly distribute nodes over the regions represented by the zones in the --aws-zones flag. Within each region we'd then round-robin over the available zones in that region. My hunch is that this was done so that multiple AZs per region can be provided as the defaults for the --aws-zones flag and that nodes would still be spread out over all of the regions. This is an unnecessary departure from the logic for the gcloud vm provider.

This PR brings the --aws-zones behavior into conformance with that of the --gce-zones flag. In doing so it also chooses specific AZs in the default regions. This should only end up becoming problematic if we start to observe that there are an insufficient number of instances in any of those regions.

Release note: None


Commits:
  • roachprod/vm/aws: distribute nodes over zones rather than regions for geo
    
    Before this commit, for `--geo` AWS cluster creation we'd evenly distribute
    nodes over the regions represented by the zones in the `--aws-zones` flag.
    Within each region we'd then round-robin over the available zones in that
    region. My hunch is that this was done so that multiple AZs per region can
    be provided as the defaults for the `--aws-zones` flag and that nodes would
    still be spread out over all of the regions. This is an unnecessary departure
    from the logic for the gcloud vm provider.
    
    This PR brings the `--aws-zones` behavior into conformance with that of the
    `--gce-zones` flag. In doing so it also chooses specific AZs in the default
    regions. This should only end up becoming problematic if we start to observe
    that there are an insufficient number of instances in any of those regions.
    
    Fixes #43499
    
    Release note: None
 
sql: fix GetAllDatabaseDescriptorIDs panic
Opened by solongordon at Thu Jan 2 18:20:41 with 6 additions, 0 deletions, 1 comments
 SIZE     pkg/sql: 6

The GetAllDatabaseDescriptorIDs function was panicking when it encountered namespace rows which had been migrated from the old namespace table. Since the migration uses the SQL layer rather than KV puts, it writes sentinel KVs which are not normally written to this table. GetAllDatabaseDescriptorIDs was choking when it encountered these KVs since the value was an empty TUPLE rather than the INT it was expecting. I added a bit of logic to skip these sentinel KVs.

Fixes #43616

Release note: None


Commits:
  • sql: fix GetAllDatabaseDescriptorIDs panic
    
    The GetAllDatabaseDescriptorIDs function was panicking when it
    encountered namespace rows which had been migrated from the old
    namespace table. Since the migration uses the SQL layer rather than KV
    puts, it writes sentinel KVs which are not normally written to this
    table. GetAllDatabaseDescriptorIDs was choking when it encountered these
    KVs since the value was an empty TUPLE rather than the INT it was
    expecting. I added a bit of logic to skip these sentinel KVs.
    
    Fixes #43616
    
    Release note: None
 
sql: ignore SCRUB for SYSTEM in RSG tests
Opened by mjibson at Thu Jan 2 17:40:52 with 3 additions, 0 deletions, 1 comments
 SIZE     pkg/sql/tests: 3

See #43693

Release note: None


Commits:
  • sql: ignore SCRUB for SYSTEM in RSG tests
    
    See #43693
    
    Release note: None
 
Closed Pull Requests
opt: fix optsteps crash caused by constraint expressions
Closed by craig[bot] at Thu Jan 2 17:56:15 with 1,266 additions, 86 deletions, 5 comments
●●●●● SIZE     pkg/sql/opt/optbuilder/testdata: 688 ,  pkg/sql/opt/testutils/opttester/testdata: 434

opt: make FormatExpr easier to use

During debugging, it's sometimes useful to add a statement to print an expression. This is fairly hard and can crash easily because the memo or the catalog isn't set in the formatting context. In particular, it is impossible to print a scalar expression that contains a relational expression because FormatExpr only sets the memo when we print relational expressions.

This change improves the situation by allowing the caller of FormatExpr to pass the *Memo and by adding a convenience wrapper method on the Optimizer. We also automatically unset the "fully qualify" flag if there is no catalog (which would otherwise crash).

Release note: None

opt: fix optsteps crash caused by TableMeta expressions

The optbuilder creates scalar constraint and computed column expressions and hangs them off the TableMeta. If a normalization rule fires for one of these expressions, optsteps panics because it can't find a path to the expression in the memo.

This change fixes the problem by adding special code in opt_steps which effectively treats these expressions as children of Scan expressions.

Release note: None

opt: show TableMeta expressions when formatting expressions

We build check constraint and computed column expressions and attach them to TableMeta, to be used later by exploration rules. These are currently invisible. This change adds them under "canonical" scans (the normalized scan expression before any exploration rules).

Release note: None


Commits:
  • opt: make FormatExpr easier to use
    
    During debugging, it's sometimes useful to add a statement to print an
    expression. This is fairly hard and can crash easily because the memo
    or the catalog isn't set in the formatting context. In particular, it
    is impossible to print a scalar expression that contains a relational
    expression because `FormatExpr` only sets the memo when we print
    relational expressions.
    
    This change improves the situation by allowing the caller of
    `FormatExpr` to pass the `*Memo` and by adding a convenience wrapper
    method on the `Optimizer`. We also automatically unset the "fully
    qualify" flag if there is no catalog (which would otherwise crash).
    
    Release note: None
  • opt: fix optsteps crash caused by TableMeta expressions
    
    The optbuilder creates scalar constraint and computed column
    expressions and hangs them off the `TableMeta`. If a normalization
    rule fires for one of these expressions, `optsteps` panics because it
    can't find a path to the expression in the memo.
    
    This change fixes the problem by adding special code in opt_steps
    which effectively treats these expressions as children of Scan
    expressions.
    
    Release note: None
  • opt: show TableMeta expressions when formatting expressions
    
    We build check constraint and computed column expressions and attach
    them to `TableMeta`, to be used later by exploration rules. These are
    currently invisible. This change adds them under "canonical" scans
    (the normalized scan expression before any exploration rules).
    
    Release note: None
 
storage: write to AbortSpan less, clean up AbortSpan more
Closed by craig[bot] at Thu Jan 2 21:20:18 with 971 additions, 205 deletions, 4 comments
●●●● SIZE     pkg/storage/batcheval: 797

This PR introduces two improvements to our handling of the AbortSpan. It also introduces the first set of comprehensive testing around AbortSpan entry state transitions, which was sorely missing.

This comes after a few different customer issues that at least at first glance appeared to be AbortSpan leaks. There's still more to do to resolve those, mostly by improving GC, but anything we can do here on the frontend to reduce the number of AbortSpan entries that need to be GCed in the first place helps.

Clean up span on non-poisoning, aborting EndTransaction request

Fixes #29128.

Before this change, an EndTransaction request sent to rollback a transaction record would not remove any AbortSpan entries, even if its own Poison flag was set to false. This allowed AbortSpan entries to leak. This commit fixes this behavior by removing the AbortSpan entry in this case.

There were concerns raised in #29128 about this being safe. It turns out that we already do this on every Range except the transaction record's Range during normal intent resolution, so this isn't introducing any new concerns.

Only write AbortSpan entries if intents are removed

This reduces the frequency of AbortSpan entries that can be abandoned even without a transaction coordinator failure. Specifically, it protects against the case where intent resolution races with a transaction coordinator cleaning up its own transaction record and intents. This can happen for both aborted and committed transactions.

In the first case, a pusher might find a transaction's intent and then find its record to be aborted after that transaction had cleanly rolled back its own intents. Even though the transaction's coordinator had already cleaned up and potentially "unpoisoned" AbortSpans, the pusher would happily re-introduce AbortSpan records when it goes to resolve the intents that were already cleaned up. These AbortSpan entries would be fully abandoned and have to wait out the GC.

Similarly, in the second case, the transaction might have committed. Here, the pushee might hit an intent and the txn coordinator might clean up and auto-GC its txn record before the pushee arrives at the txn record. Once the pushee gets there, it would mistake the txn for aborted (by design) and proceed to write an AbortSpan record where the intent it had once observed had been (not by design).

We can tell both of these cases by simply recognizing whether intent resolution actually succeeds. If intent resolution doesn't find an intent, then we might be in either case. That's fine, because we only need to ever poison the abort span if we actually remove an intent that could confuse a zombie transaction.


Commits:
  • storage/batcheval: rename truncate_log_test.go to cmd_truncate_log_test.go
    
    This mirrors `cmd_truncate_log.go`.
  • storage/batcheval: test requests that modify the AbortSpan
    
    Up until this point, we had no testing around this. For instance, not a single
    test would catch it if we removed the deletion path from SetAbortSpan.
  • storage/batcheval: clean up span on non-poisoning, aborting EndTransaction request
    
    Fixes #29128.
    
    Before this change, an EndTransaction request sent to rollback a transaction record
    would not remove any AbortSpan entries, even if its own Poison flag was set to false.
    This allowed AbortSpan entries to leak. This commit fixes this behavior by removing
    the AbortSpan entry in this case.
    
    There were concerns raised in #29128 about this being safe. It turns out that we already
    do this on every Range except the transaction record's Range, so this isn't introducing
    any new concerns.
    
    Release note (bug fix): AbortSpan records are now cleaned up more
    aggresively when doing so is known to be safe.
  • storage: only write AbortSpan entries if intents are removed
    
    This reduces the frequency of AbortSpan entries that can be abandoned even
    without a transaction coordinator failure. Specifically, it protects against
    the case where intent resolution races with a transaction coordinator cleaning
    up its own transaction record and intents. This can happen for both aborted and
    committed transactions.
    
    In the first case, a pusher might find a transaction's intent and then find its
    record to be aborted after that transaction had cleanly rolled back its own intents.
    Even though the transaction's coordinator had already cleaned up and potentially
    "unpoisoned" AbortSpans, the pusher would happily re-introduce AbortSpan records when
    it goes to resolve the intents that were already cleaned up. These AbortSpan entries
    would be fully abandoned and have to wait out the GC.
    
    Similarly, in the second case, the transaction might have committed. Here, the
    pushee might hit an intent and the txn coordinator might clean up and auto-GC its
    txn record before the pushee arrives at the txn record. Once the pushee gets there,
    it would mistake the txn for aborted (by design) and proceed to write an AbortSpan
    record where the intent it had once observed had been (not by design).
    
    We can tell both of these cases by simply recognizing whether intent resolution actually
    succeeds. If intent resolution doesn't find an intent, then we might be in either
    case. That's fine, because we only need to ever poison the abort span if we actually
    remove an intent that could confuse a zombie transaction.
    
    Release note: None
  • storage: assert that Poison=false for EndTxn(Commit=true) reqs
    
    Return an error if not.
    
    Release note: None
  • storage/batcheval: rename SetAbortSpan to UpdateAbortSpan
    
    Purely a rename.
    
    Release note: None
  • storage/batcheval: remove impossible cases in TestUpdateAbortSpan
    
    Release note: None
 
protectedts/ptcache: implement the Cache to watch protectedts state
Closed by craig[bot] at Thu Jan 2 22:14:18 with 865 additions, 1 deletions, 2 comments
●●●● SIZE     pkg/storage/protectedts/ptcache: 763

The Cache periodically polls the protectedts state. The Cache polls the meta row from each node at a period defined by a cluster setting. Clients can toggle a manual refresh.

Release note: None.


Commits:
  • protectedts/ptcache: implement the Cache to watch protectedts state
    
    The Cache periodically polls the protectedts state. The Cache polls
    the meta row from each node at a period defined by a cluster setting.
    Clients can toggle a manual refresh.
    
    Release note: None.
 
rfc: temporary tables support
Closed by craig[bot] at Thu Jan 2 22:14:49 with 528 additions, 0 deletions, 31 comments
●●●● SIZE     docs/RFCS: 528
 
rfcs: resurrect RFC for altering primary key
Closed by craig[bot] at Thu Jan 2 22:14:46 with 253 additions, 175 deletions, 5 comments
●●● SIZE     docs/RFCS: 428

A year-and-a-half ago, David Taylor wrote an RFC for how to support primary key changes, but that work was deprioritized while the RFC was still a draft. Now we are ready to perform that work, so Rohan and I are bringing that RFC back from the dead. The core approach remains the same, but we've filled in a few TODOs and added more implementation details, syntax examples, and follow-up work ideas.

The original RFC got a few comments before it was shelved (see #25208), but we intend to go through a more thorough review process this time around.

Release note: None


Commits:
  • rfcs: resurrect RFC for altering primary key
    
    A year-and-a-half ago, David Taylor wrote an RFC for how to support
    primary key changes, but that work was deprioritized while the RFC was
    still a draft. Now we are ready to perform that work, so Rohan and I are
    bringing that RFC back from the dead.
    
    Our approach has changed significantly from the original proposal. The
    original idea was that users could first build the new primary index as
    a secondary index and then swap it in as the primary one. This also
    necessitated ensuring that all secondary indexes were compatible with
    both the old and new index.
    
    We ran into two major roadblocks when investigating how to implement
    that approach. The first was that it was not necessarily possible for
    non-unique secondary indexes to be compatible with both the old and new
    primary indexes, because they implicitly store primary index columns in
    their keys. This necessitated rewriting some secondary indexes when the
    primary index was swapped.
    
    The second problem we encountered was that there are many subtle
    encoding differences between primary and secondary indexes. This greatly
    complicated the proposed index swap.
    
    Due to these issues, we propose a new approach where ALTER PRIMARY KEY
    is implemented as a long-running job which rewrites indexes as
    necessary.
    
    Release note: None
 
lint: rename roachlint to roachvet, fold std vet into it, run roachvet on every PR
Closed by craig[bot] at Fri Jan 3 00:04:08 with 210 additions, 143 deletions, 15 comments
●●● SIZE     pkg/testutils/lint: 184 ,  pkg/cmd/roachvet: 84 ,  .: 37

In #42595 we disabled the roachlint lint test from running on every commit. This was done due to its outsized memory usage. This PR identifies that the root cause of this memory usage was the decision to use multichecker rather than unitchecker in the roachlint binary. Multichecker analyses all compilation units in a single phase whereas unitchecker analyzes each compilation unit separately. When run using the unitchecker frontend, roachlint does not seem to use more RAM than normal compilation. After making this observation I noticed that go vet is defined as a set of analysis passes. This change then rolls all of the go vet passes into our one vet binary.

The last second commit fixes a vet problem which was uncovered. The third commit enables vet for test files and fixes issues found in test files.

Release note: None


Commits:
  • lint: rename roachlint to roachvet, add go vet passes
    
    This commit seeks to improve the efficiency of our `lint` build phase.
    It does this by recognizing that `go vet` runs analysis stages. The `roachlint`
    binary already ran such passes. Firstly it realizes that `go vet` uses the
    `unitchecker` rather than the `multichecker`. The `multichecker` simultaneously
    checks all compilation units at the same time whereas the `unitchecker` checks
    a single compilation unit at a time.
    
    This change renames `roachlint` to `roachvet`. In doing so it also adds all of
    the standard analysis performed by `go vet` to `roachvet`. Lastly it replaces
    the `Vet` and `RoachLint` steps of our `LintTest` with a `RoachVet` step which
    performs all of the checking performed by the previous setup.
    
    Release note: None
  • util/timeutil: remove unnecessary conversion
    
    Before this commit we had an unnecessary conversion from string to string.
    
    Release note: None
  • lint,*: fix unnecessary conversions in tests and enable vet for test files
    
    Before this PR we filtered vet errors in tests. This commit fixes the
    remaining errors and re-enables vet for tests.
    
    Release note: None
 
release-19.2: importccl: resilient import over http
Closed by craig[bot] at Thu Jan 2 21:53:00 with 295 additions, 11 deletions, 3 comments
●●● SIZE     pkg/ccl/storageccl: 294

Backport:

  • 1/1 commits from "importccl: Resume interrupted http download during import." (#43374)
  • 1/1 commits from "importccl: improve handling of transient errors when importing over http" (#43558)

Please see individual PRs for details.

/cc @cockroachdb/release


Commits:
  • importccl: Resume interrupted http download during import.
    
    Make importing data into cockroach from external http servers
    resilient to connection interruption.
    
    If the download is interrupted, and if the external server
    supports "Accept-Ranges: bytes" (and most servers do support that),
    we attempt to request the next "Content-Range" from the server.
    
    Release note (enterprise change): more resilient http import
  • importccl: improve handling of transient errors when importing over http.
    
    Make importing data into cockroach from external http servers
    resilient to connection interruption due to connection reset by peer errors.
    Add backoff when retrying.
    
    Release note (enterprise change): more resilient http import
 
storage: stop modifying requests and returning responses from TxnWaitQueue
Closed by craig[bot] at Thu Jan 2 21:20:06 with 174 additions, 106 deletions, 6 comments
●●● SIZE     pkg/storage: 153 ,  pkg/storage/txnwait: 120

This PR contains three pieces of cleanup that pave the way for pulling the TxnWaitQueue underneath the upcoming pkg/storage/concurrency package. It removes all cases where the queue mandates an update to the PushTxnRequest that entered it. It also removes all cases where the queue could field requests directly and return results for them. This restricts the TxnWaitQueue's interface and allows it to be pushed into the concurrency manager abstraction without bloating this new abstraction.


Commits:
  • storage: don't use header timestamp to determine txn expiration
    
    This commit removes the use of a PushRequest's batch header timestamp
    to determine whether a pushee has expired or not. This is the first
    step in a refactor to remove cases where we mutate requests on the
    server before command evaluation.
    
    There is no migration needed for this change because it's impact
    is local to a single node - the leaseholder evaluating the request.
    
    Release note: None
  • storage: send force abort PushTxn req directly from TxnWaitQueue
    
    This commit removes the interaction between the TxnWaitQueue and the
    Replica-level retry loop when transaction deadlocks are detected. Instead
    of instructing a waiting PushTxnRequest to change to a force push request,
    the queue now launches push abort directly. This is cleaner and avoids
    the need to mutate the input batch in Replica.maybeWaitForPushee.
    
    Release note: None
 
release-19.1: storage/intentresolver: don't capture loop iteration vars in async task
Closed by nvanbenschoten at Thu Jan 2 19:22:21 with 200 additions, 12 deletions, 1 comments
●●● SIZE     pkg/storage/intentresolver: 208

Backport 1/1 commits from #43563.

/cc @cockroachdb/release


It's unclear if we've ever seen issues from this, but I intend to backport the fix to v19.2, v19.1, and v2.1. I believe the worst thing that could have happened is that a batch that observed multiple intents or pushed multiple txns would only end up cleaning up a single one of these. It would then run into some of these intents again when it tried to re-evaluate, forcing it to push again. This subverts the parallelism that we were trying to achieve here, but would never cause a stall.

Release note (bug fix): Ensure that all intents or transactions that a batch observes are asynchronously cleaned up.


Commits:
  • storage/intentresolver: don't capture loop iteration vars in async task
    
    It's unclear if we've ever seen issues from this, but I intend to backport the
    fix to v19.2, v19.1, and v2.1. I believe the worst thing that could have
    happened is that a batch that observed multiple intents or pushed multiple txns
    would only end up cleaning up a single one of these. It would then run into some
    of these intents again when it tried to re-evaluate, forcing it to push again.
    This subverts the parallelism that we were trying to achieve here, but would
    never cause a stall.
    
    Release note (performance improvement): A transaction running into multiple
    intents from an abandoned conflicting transaction cleans them up more efficiently.
 
release-19.2: storage/intentresolver: don't capture loop iteration vars in async task
Closed by nvanbenschoten at Thu Jan 2 19:22:13 with 200 additions, 12 deletions, 3 comments
●●● SIZE     pkg/storage/intentresolver: 208

Backport 1/1 commits from #43563.

/cc @cockroachdb/release


It's unclear if we've ever seen issues from this, but I intend to backport the fix to v19.2, v19.1, and v2.1. I believe the worst thing that could have happened is that a batch that observed multiple intents or pushed multiple txns would only end up cleaning up a single one of these. It would then run into some of these intents again when it tried to re-evaluate, forcing it to push again. This subverts the parallelism that we were trying to achieve here, but would never cause a stall.

Release note (bug fix): Ensure that all intents or transactions that a batch observes are asynchronously cleaned up.


Commits:
  • storage/intentresolver: don't capture loop iteration vars in async task
    
    It's unclear if we've ever seen issues from this, but I intend to backport the
    fix to v19.2, v19.1, and v2.1. I believe the worst thing that could have
    happened is that a batch that observed multiple intents or pushed multiple txns
    would only end up cleaning up a single one of these. It would then run into some
    of these intents again when it tried to re-evaluate, forcing it to push again.
    This subverts the parallelism that we were trying to achieve here, but would
    never cause a stall.
    
    Release note (performance improvement): A transaction running into multiple
    intents from an abandoned conflicting transaction cleans them up more efficiently.
 
Misc. IMPORT updates
Closed by lnhsingh at Thu Jan 2 18:21:33 with 124 additions, 64 deletions, 8 comments
●●● SIZE     v19.2: 94 ,  v20.1: 94

Edits:

  • Add example of DELIMITED with escaping. Closes #5895.
  • Add notes about DEFAULT values and how to set multiple options. Closes #5494, #5929.
  • Add strict_quotes option. Closes #5851

Commits:
  • IMPORT updates
    
    Add example of DELIMITED with escaping. Closes #5895.
    
    Add note about DEFAULT values and using multiple options.
    
    Closes #5494. Closes #5929.
    
    Add strict_quotes option. Closes #5851
    
    Edits based on feedback
    
    Edits based on feedback
 
storage: improve the error for aborted txns after lease transfer
Closed by craig[bot] at Thu Jan 2 17:32:24 with 309 additions, 241 deletions, 7 comments
●● SIZE     pkg/storage: 51 ,  pkg/roachpb: 27

Lease transfers wipe the timestamp cache, and so a txn that straddles a lease transfer will not be allowed to create its txn record. This commit introduces a specific reason for the TransactionAbortedError highlighting that there's a new lease.

Release note: None


Commits:
  • storage: improve the error for aborted txns after lease transfer
    
    Lease transfers wipe the timestamp cache, and so a txn that straddles a
    lease transfer will not be allowed to create its txn record. This commit
    introduces a specific reason for the TransactionAbortedError
    highlighting that there's a new lease.
    
    Release note: None
 
sql: minor error handling improvements
Closed by craig[bot] at Thu Jan 2 18:26:49 with 18 additions, 40 deletions, 3 comments
●● SIZE     pkg/sql/parser: 38 ,  pkg/sql: 11

See individual commits.


Commits:
  • sql: simplify a callback
    
    Remove an unused error.
    
    Release note: None
  • sql: add a note about some special statements
    
    The behavior of statements like SHOW TRANSACTION STATE is special with
    regards to transactions and errors. This patch notes that a surprising
    lack of error handling is intentional.
    
    Release note: None
 
roachtest: add a tpcc+alter for sqlsmith
Closed by craig[bot] at Thu Jan 2 23:25:34 with 27 additions, 9 deletions, 3 comments
●● SIZE     pkg/internal/sqlsmith: 34

This will test a large set of existing tables with schema changes.

Also rejigger the alter statement frequency. Drops are 1, renames 5, and creates 10. This should prevent the problem of drops happening too often.

Release note: None


Commits:
  • roachtest: add a tpcc+alter for sqlsmith
    
    This will test a large set of existing tables with schema changes.
    
    Also rejigger the alter statement frequency. Drops are 1, renames 5, and
    creates 10. This should prevent the problem of drops happening too often.
    
    Release note: None
 
sql: move Arul's TODOs to Solon
Closed by craig[bot] at Thu Jan 2 17:00:51 with 15 additions, 15 deletions, 2 comments
●● SIZE     pkg/sql: 16 ,  pkg/sql/sqlbase: 12

There are a bunch of TODO(whomever) left over. As per the Google style guide, a TODO is addressed to the person others can ask about the TODO. Asking whomever is not a thing. Solon, I guess you're the closest thing.

Instead or after this patch, please do me a favor if you don't mind and improve these TODOs. A bunch of them seem to be about code gated on a cluster version being ok to remove in the next version. That doesn't need a TODO. Others seem to be about not using a DeprecatedDatabaseKey thing. I think they can all be centralized on the struct definition, which struct also cries for an explanation.

Release note: None


Commits:
  • sql: move Arul's TODOs to Solon
    
    There are a bunch of TODO(whomever) left over. As per the Google style
    guide, a TODO is addressed to the person others can ask about the TODO.
    Asking whomever is not a thing. Solon, I guess you're the closest thing.
    
    Instead or after this patch, please do me a favor if you don't mind and
    improve these TODOs. A bunch of them seem to be about code gated on a
    cluster version being ok to remove in the next version. That doesn't
    need a TODO.  Others seem to be about not using a DeprecatedDatabaseKey
    thing. I think they can all be centralized on the struct definition,
    which struct also cries for an explanation.
    
    Release note: None
 
build(deps): bump handlebars from 4.1.2 to 4.5.3 in /console
Closed by chrisseto at Thu Jan 2 15:41:00 with 13 additions, 8 deletions, 0 comments
●● SIZE     console: 21

Bumps handlebars from 4.1.2 to 4.5.3.

Changelog

Sourced from handlebars's changelog.

v4.5.3 - November 18th, 2019

Bugfixes:

  • fix: add "no-prototype-builtins" eslint-rule and fix all occurences - f7f05d7
  • fix: add more properties required to be enumerable - 1988878

Chores / Build:

  • fix: use !== 0 instead of != 0 - c02b05f
  • add chai and dirty-chai and sinon, for cleaner test-assertions and spies, deprecate old assertion-methods - 93e284e, 886ba86, 0817dad, 93516a0

Security:

  • The properties __proto__, __defineGetter__, __defineSetter__ and __lookupGetter__ have been added to the list of "properties that must be enumerable". If a property by that name is found and not enumerable on its parent, it will silently evaluate to undefined. This is done in both the compiled template and the "lookup"-helper. This will prevent new Remote-Code-Execution exploits that have been published recently.

Compatibility notes:

  • Due to the security-fixes. The semantics of the templates using __proto__, __defineGetter__, __defineSetter__ and __lookupGetter__ in the respect that those expression now return undefined rather than their actual value from the proto.
  • The semantics have not changed in cases where the properties are enumerable, as in:
{
  __proto__: 'some string'
}
  • The change may be breaking in that respect, but we still only increase the patch-version, because the incompatible use-cases are not intended, undocumented and far less important than fixing Remote-Code-Execution exploits on existing systems.

Commits

v4.5.2 - November 13th, 2019

Bugfixes

  • fix: use String(field) in lookup when checking for "constructor" - d541378
  • test: add fluent API for testing Handlebars - c2ac79c

Compatibility notes:

  • no incompatibility are to be expected
... (truncated) Commits
  • c819c8b v4.5.3
  • 827c9d0 Update release notes
  • f7f05d7 fix: add "no-prototype-builtins" eslint-rule and fix all occurences
  • 1988878 fix: add more properties required to be enumerable
  • 886ba86 test/chore: add chai/expect and sinon to "runtime"-environment
  • 0817dad test: add sinon as global variable to eslint in the specs
  • 93516a0 test: add sinon.js for spies, deprecate current assertions
  • 93e284e chore: add chai and dirty-chai for better test assertions
  • c02b05f fix: use !== 0 instead of != 0
  • 8de121d v4.5.2
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot ignore this [patch|minor|major] version will close this PR and stop Dependabot creating any more for this minor/major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
  • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
  • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
  • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.


Commits:
 
reducesql: also attempt to remove columns from PKs
Closed by craig[bot] at Thu Jan 2 17:56:13 with 11 additions, 6 deletions, 3 comments
 SIZE     pkg/testutils/reduce/reducesql: 17

Release note: None


Commits:
  • reducesql: also attempt to remove columns from PKs
    
    Release note: None
 
colexec/typeconv: remove orphaned Int1 case
Closed by craig[bot] at Fri Jan 3 00:04:01 with 0 additions, 8 deletions, 4 comments
 SIZE     pkg/sql/colexec/typeconv: 8

At some point we had support for Int1 (8 bit integer) in the vectorized engine, but it has been removed. This commit removes one of the leftover pieces.

Release note: None


Commits:
  • colexec/typeconv: remove orphaned Int1 case
    
    At some point we had support for Int1 (8 bit integer) in the vectorized
    engine, but it has been removed. This commit removes one of the leftover
    pieces.
    
    Release note: None
 
console: make billing email accept long TLDs
Closed by DuskEagle at Thu Jan 2 15:52:50 with 6 additions, 2 deletions, 2 comments
 SIZE     console/apipb: 6

Previously, long TLDs such as .marketing or .cloud were being rejected as invalid.


Commits:
  • console: make billing email accept long TLDs
    
    Previously, long TLDs such as `.marketing` or `.cloud` were being rejected
    as invalid.
 
kv: don't close TCS interceptors on errors
Closed by craig[bot] at Thu Jan 2 17:32:23 with 2 additions, 3 deletions, 3 comments
 SIZE     pkg/kv: 5

Before this patch, the TxnCoordSender was closing all the interceptors when on non-retriable errors. This was a useless optimization serving to stop the heartbeat loop early; the client was required to send a rollback to clean up the intents. This patch gets rid of the optimization in anticipation of the savepoints API, which will serve for error recovery.

Release note: None


Commits:
  • kv: don't close TCS interceptors on errors
    
    Before this patch, the TxnCoordSender was closing all the interceptors
    when on non-retriable errors. This was a useless optimization serving to
    stop the heartbeat loop early; the client was required to send a
    rollback to clean up the intents.
    This patch gets rid of the optimization in anticipation of the
    savepoints API, which will serve for error recovery.
    
    Release note: None
 
release-19.2: sql/flowinfra: stop swallowing error while flushing
Closed by ajwerner at Thu Jan 2 20:36:42 with 1 additions, 1 deletions, 1 comments
 SIZE     pkg/sql/flowinfra: 2

@solongordon I omitted a release note because I can't think of what the user-visible impact of not returning this error might be. If you can think of one let me know and I'll update the commit message.

Release note: None


Commits:
  • sql/flowinfra: stop swallowing error while flushing
    
    @solongordon I omitted a release note because I can't think of what the
    user-visible impact of not returning this error might be. If you can
    think of one let me know and I'll update the commit message.
    
    Release note: None
 
Enable saving bad csv rows to a side file
Closed by lnhsingh at Fri Jan 3 04:55:25 with 2 additions, 0 deletions, 5 comments
 SIZE     v19.2: 1 ,  v20.1: 1

Closes #5906.


Commits:
  • Enable saving bad csv rows to a side file
    
    Closes #5906.
    
    Edits based on feedback
 
Reply all
Reply to author
Forward
0 new messages