cockroachdb digest since Thu Jan 2 02:30:00 UTC 2020

1 view

Skip to first unread message

Marvin

unread,

Jan 3, 2020, 5:05:59 AM1/3/20

to cockroach...@googlegroups.com, e...@cockroachlabs.com

Opened Pull Requests

ui: Upgrade to Webpack@^4 and babel@^7 versions Opened by Koorosh at Thu Jan 2 09:05:47 with 2,083 additions, 1,454 deletions, 1 comments ●●●●● SIZE pkg/ui: 3,537
Release note: None Commits: ui: Upgrade to Webpack@^4 and babel@^7 versions Release note: None ui: Add missed dependency on babel-loader Release note: None

[wip] event system take 3: save mapping in DB from event names to job kinds

Opened by vilterp at Thu Jan 2 20:33:15 with 739 additions, 166 deletions, 0 comments

●●●● SIZE pkg/jobs: 203 , pkg/eventsystem: 187 , console/mailers: 116 , console/events: 69 , console/migrations: 53 , intrusion/server: 40

Background

Goals

use the job system to execute reactions to events
persist events themselves in the DB
do something simple and reliable, ideally avoiding adding a big dep like kafka by just using the DB
provide a simple API which could be migrated to a different impl in the future
get properties of a pub/sub system
- keep all knowledge of event reactions out of publishing process
- allow an arbitrary number of reactions to an event, executed in an arbitrary number of processes

Problem

The central point of a pub/sub system is to decouple publishers from subscribers: a publisher publishes an event without knowing or caring how it will be reacted to; an arbitrary number of subscribers pick it up and do stuff with it.

We have infrastructure for the "doing stuff" part: the job system. The remaining question is: where does the mapping from events to jobs live? My previous attempts #1293 and #1299, kept it in the memory of the subscribing process. The awkward part of this was always "claiming" events to react to them — how could we ensure that all necessary reactions to an event were executed exactly once?

Realization

If we store the mapping from events to jobs in the DB, the publishing process can just look up this mapping and create jobs, and they'll be picked up by the job system of the subscribing process using the existing polling mechanism for running unclaimed jobs. The mapping would be stored in a table where each row represents a job that should be created in response to an event, e.g.:

row 1: "when there is a ClusterCreationFinished event, create a SendEmail job"
row 2: "when there is a ClusterCreationFinished event, create a NotifySupport job"

This event-to-jobs lookup could be done in a couple ways:

a separate server. I.e. you RPC to this server to publish an event, and it saves the events and the jobs, and perhaps executes the job by RPCing to the subscribing.
- I was very inspired by Segment, which is essentially an event server like this. You post an event to them; they look up what destinations it should be delivered to, and webhook to them (using a job-retrying system called Centrifuge, which is strikingly similar to our job system, down to its DB schema)
directly in the publishing process itself, in the event system's Publish method) (went with this)

Design

https://github.com/cockroachlabs/managed-service/blob/10edc3d2dd7409d6d3d9f0400f4ea7efc3617271/pkg/eventsystem/events.go#L11-L24

https://github.com/cockroachlabs/managed-service/blob/10edc3d2dd7409d6d3d9f0400f4ea7efc3617271/console/migrations/20200102065719_add_event_job_mappings.up.sql#L1-L8

To subscribe to an event:

eventSystem.Subscribe(eventName, jobKind, handlerFunc) (normally at process startup)
- it upserts a record in the event_job_mappings table, indicating that when an event of type eventName is created, a job of type jobKind should be created (passing the event payload as the job request)
- it registers the given handler with the job system, which will poll for and run jobs of that type

To publish an event:

you call eventSystem.Publish(eventName, payload)
- it saves the event to the events table
- it looks up the mappings for this event name, and creates a job for each
- (these jobs are picked up and executed by whatever process has registered handlers for them in its job system)

Status

sending an email on cluster creation finishing works. polling interval is 30s, which is pretty ok.

Intrusion publishes a ClusterCreationFinished event (and writes a SendClusterCreationFinishedEmail job)
Console picks it up and sends the email

TODOs / issues:

job system currently assumes that it will have all handlers registered in the current process
event payload is persistent 1+n times, where n is the number of mappings for that event name — once in events table, n times in jobs table. Not a big deal really...
...?
tests should be more comprehensive

Commits:

```
events: get basic types in place
```
```
events: get basic methods in place
```
```
basic test
```
```
events: test passes!

wow
```
```
persist events themselves
```

try testing proto payload

gonna need to use an actual proto now

```
test payload with actual proto
```
```
upsert event mapping

TODO: test
```
```
persist payload bytes
```

initialize event system in console; set up emails

separate out event system protos; job_name => job_kind

email sent for first time!

set heartbeat on event-system-published jobs to way in the past

so they get claimed immediately

foreign key jobs to events

and fix tests

```
uncomment createKubeCluster
```

colexec: begin refactor of colexec type system

Opened by yuzefovich at Thu Jan 2 21:28:00 with 1,518 additions, 1,487 deletions, 1 comments

●●●● SIZE pkg/col/colserde: 204 , pkg/sql/colexec: 178 , pkg/col/coldata: 168

pkg/col: rename coltypes package to phystypes

This commit renames col/coltypes to col/phystypes. We have "logical" type system (sql/types) and "physical" type system (previously col/coltypes). Later we will introduce "vectorized execution" type system (maybe something like colexec/colexectypes) which will combine two others. And the goal of this change is to reduce the amount of confusion.

Release note: None

colexec/typeconv: rename typeconv to colexectypes

This commit renames colexec/typeconv to colexec/colexectypes which will soon become the "vectorized execution type system" package.

Release note: None

colexectypes: introduce new colexectypes.T struct

This commit introduces a new "columnar execution" type that is a wrapper around phystypes.T that also stores the original "logical" SQL type.

Addresses: #43559.

Release note: None

Commits:

pkg/col: rename coltypes package to phystypes

This commit renames col/coltypes to col/phystypes. We have "logical"
type system (sql/types) and "physical" type system (previously
col/coltypes). Later we will introduce "vectorized execution" type
system (maybe something like colexec/colexectypes) which will combine
two others. And the goal of this change is to reduce the amount of
confusion.

Release note: None

colexec/typeconv: rename typeconv to colexectypes

This commit renames colexec/typeconv to colexec/colexectypes which will
soon become the "vectorized execution type system" package.

Release note: None

colexectypes: introduce new colexectypes.T struct

This commit introduces a new "columnar execution" type that is a wrapper
around phystypes.T that also stores the original "logical" SQL type.

Release note: None

storage/batcheval/result: perform various cleanup on LocalResult struct

Opened by nvanbenschoten at Thu Jan 2 21:38:22 with 204 additions, 220 deletions, 1 comments

●●● SIZE pkg/storage/batcheval/result: 164 , pkg/storage: 141 , pkg/storage/intentresolver: 105

This PR contains a series of cleanups to the result.LocalResult struct. It leaves the structure in a good position to be extended to include information about newly written, updated, and removed intents, which are hooked into the concurrency package in future commits the same way that UpdatedTxns is currently hooked into the TxnWaitQueue.

The changes are broken into a series of incremental steps to make them easier to review in isolation.

Commits:

storage/batcheval: clean up ordering of fields in LocalResult

Release note: None

storage/batcheval: remove IntentsWithArg

The Arg part of this was never used.

Release note: None

storage/batcheval: rename LocalResult.Intents to EncounteredIntents

Strictly a rename.

Release note: None

storage/batcheval: store pointer to Txn in EndTxnIntents

Small refactor.

Release note: None

storage/batcheval: clean up read-only path LocalResult handling

Small refactor.

Release note: None

storage/batcheval: stop storing pointers to slices in LocalResult

This commit removes the pattern of storing pointers to slices in
LocalResult so that it remains "comaprable". This isn't worth the
complexity it creates, so this commit removes the pattern and stores
slices directly in the struct.

Release note: None

sqlsmith: add support for interleaved tables

Opened by mjibson at Thu Jan 2 20:59:32 with 114 additions, 47 deletions, 2 comments

●●● SIZE pkg/sql/sqlbase: 140

This commit adds interleaved table support to sqlsmith. When running with the rand-tables configuration, there's a 50% chance of all tables but the first one to get interleaved into a random other table.

Release note: None

Commits:

sqlsmith: add support for interleaved tables

This commit adds interleaved table support to sqlsmith. When running
with the rand-tables configuration, there's a 50% chance of all tables
but the first one to get interleaved into a random other table.

Release note: None

opt: add a rule to push filters into project-set Opened by rohany at Thu Jan 2 18:54:19 with 81 additions, 0 deletions, 1 comments ●● SIZE pkg/sql/opt/norm/rules: 28 , pkg/sql/opt/exec/execbuilder/testdata: 27 , pkg/sql/opt/norm/testdata/rules: 26
Fixes #42202. This PR adds an optimizer rule to push filters down into project-set operations. Release note: None Commits: opt: add a rule to push filters into project-set Fixes #42202. This PR adds an optimizer rule to push filters down into project-set operations. Release note: None

roachprod/vm/aws: distribute nodes over zones rather than regions for geo

Opened by ajwerner at Thu Jan 2 19:07:59 with 4 additions, 21 deletions, 2 comments

●● SIZE pkg/cmd/roachprod/vm/aws: 25

Before this commit, for --geo AWS cluster creation we'd evenly distribute nodes over the regions represented by the zones in the --aws-zones flag. Within each region we'd then round-robin over the available zones in that region. My hunch is that this was done so that multiple AZs per region can be provided as the defaults for the --aws-zones flag and that nodes would still be spread out over all of the regions. This is an unnecessary departure from the logic for the gcloud vm provider.

This PR brings the --aws-zones behavior into conformance with that of the --gce-zones flag. In doing so it also chooses specific AZs in the default regions. This should only end up becoming problematic if we start to observe that there are an insufficient number of instances in any of those regions.

Release note: None

Commits:

roachprod/vm/aws: distribute nodes over zones rather than regions for geo

Before this commit, for `--geo` AWS cluster creation we'd evenly distribute
nodes over the regions represented by the zones in the `--aws-zones` flag.
Within each region we'd then round-robin over the available zones in that
region. My hunch is that this was done so that multiple AZs per region can
be provided as the defaults for the `--aws-zones` flag and that nodes would
still be spread out over all of the regions. This is an unnecessary departure
from the logic for the gcloud vm provider.

This PR brings the `--aws-zones` behavior into conformance with that of the
`--gce-zones` flag. In doing so it also chooses specific AZs in the default
regions. This should only end up becoming problematic if we start to observe
that there are an insufficient number of instances in any of those regions.

Fixes #43499

Release note: None

sql: fix GetAllDatabaseDescriptorIDs panic

Opened by solongordon at Thu Jan 2 18:20:41 with 6 additions, 0 deletions, 1 comments

● SIZE pkg/sql: 6

The GetAllDatabaseDescriptorIDs function was panicking when it encountered namespace rows which had been migrated from the old namespace table. Since the migration uses the SQL layer rather than KV puts, it writes sentinel KVs which are not normally written to this table. GetAllDatabaseDescriptorIDs was choking when it encountered these KVs since the value was an empty TUPLE rather than the INT it was expecting. I added a bit of logic to skip these sentinel KVs.

Fixes #43616

Release note: None

Commits:

sql: fix GetAllDatabaseDescriptorIDs panic

The GetAllDatabaseDescriptorIDs function was panicking when it
encountered namespace rows which had been migrated from the old
namespace table. Since the migration uses the SQL layer rather than KV
puts, it writes sentinel KVs which are not normally written to this
table. GetAllDatabaseDescriptorIDs was choking when it encountered these
KVs since the value was an empty TUPLE rather than the INT it was
expecting. I added a bit of logic to skip these sentinel KVs.

Fixes #43616

Release note: None

sql: ignore SCRUB for SYSTEM in RSG tests Opened by mjibson at Thu Jan 2 17:40:52 with 3 additions, 0 deletions, 1 comments ● SIZE pkg/sql/tests: 3
See #43693 Release note: None Commits: sql: ignore SCRUB for SYSTEM in RSG tests See #43693 Release note: None

Closed Pull Requests

opt: fix optsteps crash caused by constraint expressions

Closed by craig[bot] at Thu Jan 2 17:56:15 with 1,266 additions, 86 deletions, 5 comments

●●●●● SIZE pkg/sql/opt/optbuilder/testdata: 688 , pkg/sql/opt/testutils/opttester/testdata: 434

opt: make FormatExpr easier to use

During debugging, it's sometimes useful to add a statement to print an expression. This is fairly hard and can crash easily because the memo or the catalog isn't set in the formatting context. In particular, it is impossible to print a scalar expression that contains a relational expression because FormatExpr only sets the memo when we print relational expressions.

This change improves the situation by allowing the caller of FormatExpr to pass the *Memo and by adding a convenience wrapper method on the Optimizer. We also automatically unset the "fully qualify" flag if there is no catalog (which would otherwise crash).

Release note: None

opt: fix optsteps crash caused by TableMeta expressions

The optbuilder creates scalar constraint and computed column expressions and hangs them off the TableMeta. If a normalization rule fires for one of these expressions, optsteps panics because it can't find a path to the expression in the memo.

This change fixes the problem by adding special code in opt_steps which effectively treats these expressions as children of Scan expressions.

Release note: None

opt: show TableMeta expressions when formatting expressions

We build check constraint and computed column expressions and attach them to TableMeta, to be used later by exploration rules. These are currently invisible. This change adds them under "canonical" scans (the normalized scan expression before any exploration rules).

Release note: None

Commits:

opt: make FormatExpr easier to use

During debugging, it's sometimes useful to add a statement to print an
expression. This is fairly hard and can crash easily because the memo
or the catalog isn't set in the formatting context. In particular, it
is impossible to print a scalar expression that contains a relational
expression because `FormatExpr` only sets the memo when we print
relational expressions.

This change improves the situation by allowing the caller of
`FormatExpr` to pass the `*Memo` and by adding a convenience wrapper
method on the `Optimizer`. We also automatically unset the "fully
qualify" flag if there is no catalog (which would otherwise crash).

Release note: None

opt: fix optsteps crash caused by TableMeta expressions

The optbuilder creates scalar constraint and computed column
expressions and hangs them off the `TableMeta`. If a normalization
rule fires for one of these expressions, `optsteps` panics because it
can't find a path to the expression in the memo.

This change fixes the problem by adding special code in opt_steps
which effectively treats these expressions as children of Scan
expressions.

Release note: None

opt: show TableMeta expressions when formatting expressions

We build check constraint and computed column expressions and attach
them to `TableMeta`, to be used later by exploration rules. These are
currently invisible. This change adds them under "canonical" scans
(the normalized scan expression before any exploration rules).

Release note: None

storage: write to AbortSpan less, clean up AbortSpan more

Closed by craig[bot] at Thu Jan 2 21:20:18 with 971 additions, 205 deletions, 4 comments

●●●● SIZE pkg/storage/batcheval: 797

This PR introduces two improvements to our handling of the AbortSpan. It also introduces the first set of comprehensive testing around AbortSpan entry state transitions, which was sorely missing.

This comes after a few different customer issues that at least at first glance appeared to be AbortSpan leaks. There's still more to do to resolve those, mostly by improving GC, but anything we can do here on the frontend to reduce the number of AbortSpan entries that need to be GCed in the first place helps.

Clean up span on non-poisoning, aborting EndTransaction request

Fixes #29128.

Before this change, an EndTransaction request sent to rollback a transaction record would not remove any AbortSpan entries, even if its own Poison flag was set to false. This allowed AbortSpan entries to leak. This commit fixes this behavior by removing the AbortSpan entry in this case.

There were concerns raised in #29128 about this being safe. It turns out that we already do this on every Range except the transaction record's Range during normal intent resolution, so this isn't introducing any new concerns.

Only write AbortSpan entries if intents are removed

This reduces the frequency of AbortSpan entries that can be abandoned even without a transaction coordinator failure. Specifically, it protects against the case where intent resolution races with a transaction coordinator cleaning up its own transaction record and intents. This can happen for both aborted and committed transactions.

In the first case, a pusher might find a transaction's intent and then find its record to be aborted after that transaction had cleanly rolled back its own intents. Even though the transaction's coordinator had already cleaned up and potentially "unpoisoned" AbortSpans, the pusher would happily re-introduce AbortSpan records when it goes to resolve the intents that were already cleaned up. These AbortSpan entries would be fully abandoned and have to wait out the GC.

Similarly, in the second case, the transaction might have committed. Here, the pushee might hit an intent and the txn coordinator might clean up and auto-GC its txn record before the pushee arrives at the txn record. Once the pushee gets there, it would mistake the txn for aborted (by design) and proceed to write an AbortSpan record where the intent it had once observed had been (not by design).

We can tell both of these cases by simply recognizing whether intent resolution actually succeeds. If intent resolution doesn't find an intent, then we might be in either case. That's fine, because we only need to ever poison the abort span if we actually remove an intent that could confuse a zombie transaction.

Commits:

storage/batcheval: rename truncate_log_test.go to cmd_truncate_log_test.go

This mirrors `cmd_truncate_log.go`.

storage/batcheval: test requests that modify the AbortSpan

Up until this point, we had no testing around this. For instance, not a single
test would catch it if we removed the deletion path from SetAbortSpan.

storage/batcheval: clean up span on non-poisoning, aborting EndTransaction request

Fixes #29128.

Before this change, an EndTransaction request sent to rollback a transaction record
would not remove any AbortSpan entries, even if its own Poison flag was set to false.
This allowed AbortSpan entries to leak. This commit fixes this behavior by removing
the AbortSpan entry in this case.

There were concerns raised in #29128 about this being safe. It turns out that we already
do this on every Range except the transaction record's Range, so this isn't introducing
any new concerns.

Release note (bug fix): AbortSpan records are now cleaned up more
aggresively when doing so is known to be safe.

storage: only write AbortSpan entries if intents are removed

This reduces the frequency of AbortSpan entries that can be abandoned even
without a transaction coordinator failure. Specifically, it protects against
the case where intent resolution races with a transaction coordinator cleaning
up its own transaction record and intents. This can happen for both aborted and
committed transactions.

In the first case, a pusher might find a transaction's intent and then find its
record to be aborted after that transaction had cleanly rolled back its own intents.
Even though the transaction's coordinator had already cleaned up and potentially
"unpoisoned" AbortSpans, the pusher would happily re-introduce AbortSpan records when
it goes to resolve the intents that were already cleaned up. These AbortSpan entries
would be fully abandoned and have to wait out the GC.

Similarly, in the second case, the transaction might have committed. Here, the
pushee might hit an intent and the txn coordinator might clean up and auto-GC its
txn record before the pushee arrives at the txn record. Once the pushee gets there,
it would mistake the txn for aborted (by design) and proceed to write an AbortSpan
record where the intent it had once observed had been (not by design).

We can tell both of these cases by simply recognizing whether intent resolution actually
succeeds. If intent resolution doesn't find an intent, then we might be in either
case. That's fine, because we only need to ever poison the abort span if we actually
remove an intent that could confuse a zombie transaction.

Release note: None

storage: assert that Poison=false for EndTxn(Commit=true) reqs

Return an error if not.

Release note: None

storage/batcheval: rename SetAbortSpan to UpdateAbortSpan

Purely a rename.

Release note: None

storage/batcheval: remove impossible cases in TestUpdateAbortSpan

Release note: None

protectedts/ptcache: implement the Cache to watch protectedts state

Closed by craig[bot] at Thu Jan 2 22:14:18 with 865 additions, 1 deletions, 2 comments

●●●● SIZE pkg/storage/protectedts/ptcache: 763

The Cache periodically polls the protectedts state. The Cache polls the meta row from each node at a period defined by a cluster setting. Clients can toggle a manual refresh.

Release note: None.

Commits:

protectedts/ptcache: implement the Cache to watch protectedts state

The Cache periodically polls the protectedts state. The Cache polls
the meta row from each node at a period defined by a cluster setting.
Clients can toggle a manual refresh.

Release note: None.

rfc: temporary tables support Closed by craig[bot] at Thu Jan 2 22:14:49 with 528 additions, 0 deletions, 31 comments ●●●● SIZE docs/RFCS: 528
This RFC proposes to add support for temporary tables. RFC text: https://github.com/arulajmani/cockroach/blob/temp-tables-rfc/docs/RFCS/20191009_temp_tables.md Release note: None Commits: rfc: temporary tables support This RFC proposes to add support for temporary tables. Release note: None

rfcs: resurrect RFC for altering primary key

Closed by craig[bot] at Thu Jan 2 22:14:46 with 253 additions, 175 deletions, 5 comments

●●● SIZE docs/RFCS: 428

A year-and-a-half ago, David Taylor wrote an RFC for how to support primary key changes, but that work was deprioritized while the RFC was still a draft. Now we are ready to perform that work, so Rohan and I are bringing that RFC back from the dead. The core approach remains the same, but we've filled in a few TODOs and added more implementation details, syntax examples, and follow-up work ideas.

The original RFC got a few comments before it was shelved (see #25208), but we intend to go through a more thorough review process this time around.

Release note: None

Commits:

rfcs: resurrect RFC for altering primary key

A year-and-a-half ago, David Taylor wrote an RFC for how to support
primary key changes, but that work was deprioritized while the RFC was
still a draft. Now we are ready to perform that work, so Rohan and I are
bringing that RFC back from the dead.

Our approach has changed significantly from the original proposal. The
original idea was that users could first build the new primary index as
a secondary index and then swap it in as the primary one. This also
necessitated ensuring that all secondary indexes were compatible with
both the old and new index.

We ran into two major roadblocks when investigating how to implement
that approach. The first was that it was not necessarily possible for
non-unique secondary indexes to be compatible with both the old and new
primary indexes, because they implicitly store primary index columns in
their keys. This necessitated rewriting some secondary indexes when the
primary index was swapped.

The second problem we encountered was that there are many subtle
encoding differences between primary and secondary indexes. This greatly
complicated the proposed index swap.

Due to these issues, we propose a new approach where ALTER PRIMARY KEY
is implemented as a long-running job which rewrites indexes as
necessary.

Release note: None

lint: rename roachlint to roachvet, fold std vet into it, run roachvet on every PR

Closed by craig[bot] at Fri Jan 3 00:04:08 with 210 additions, 143 deletions, 15 comments

●●● SIZE pkg/testutils/lint: 184 , pkg/cmd/roachvet: 84 , .: 37

In #42595 we disabled the roachlint lint test from running on every commit. This was done due to its outsized memory usage. This PR identifies that the root cause of this memory usage was the decision to use multichecker rather than unitchecker in the roachlint binary. Multichecker analyses all compilation units in a single phase whereas unitchecker analyzes each compilation unit separately. When run using the unitchecker frontend, roachlint does not seem to use more RAM than normal compilation. After making this observation I noticed that go vet is defined as a set of analysis passes. This change then rolls all of the go vet passes into our one vet binary.

The last second commit fixes a vet problem which was uncovered. The third commit enables vet for test files and fixes issues found in test files.

Release note: None

Commits:

lint: rename roachlint to roachvet, add go vet passes

This commit seeks to improve the efficiency of our `lint` build phase.
It does this by recognizing that `go vet` runs analysis stages. The `roachlint`
binary already ran such passes. Firstly it realizes that `go vet` uses the
`unitchecker` rather than the `multichecker`. The `multichecker` simultaneously
checks all compilation units at the same time whereas the `unitchecker` checks
a single compilation unit at a time.

This change renames `roachlint` to `roachvet`. In doing so it also adds all of
the standard analysis performed by `go vet` to `roachvet`. Lastly it replaces
the `Vet` and `RoachLint` steps of our `LintTest` with a `RoachVet` step which
performs all of the checking performed by the previous setup.

Release note: None

util/timeutil: remove unnecessary conversion

Before this commit we had an unnecessary conversion from string to string.

Release note: None

lint,*: fix unnecessary conversions in tests and enable vet for test files

Before this PR we filtered vet errors in tests. This commit fixes the
remaining errors and re-enables vet for tests.

Release note: None

release-19.2: importccl: resilient import over http

Closed by craig[bot] at Thu Jan 2 21:53:00 with 295 additions, 11 deletions, 3 comments

●●● SIZE pkg/ccl/storageccl: 294

Backport:

1/1 commits from "importccl: Resume interrupted http download during import." (#43374)
1/1 commits from "importccl: improve handling of transient errors when importing over http" (#43558)

Please see individual PRs for details.

/cc @cockroachdb/release

Commits:

importccl: Resume interrupted http download during import.

Make importing data into cockroach from external http servers
resilient to connection interruption.

If the download is interrupted, and if the external server
supports "Accept-Ranges: bytes" (and most servers do support that),
we attempt to request the next "Content-Range" from the server.

Release note (enterprise change): more resilient http import

importccl: improve handling of transient errors when importing over http.

Make importing data into cockroach from external http servers
resilient to connection interruption due to connection reset by peer errors.
Add backoff when retrying.

Release note (enterprise change): more resilient http import

storage: stop modifying requests and returning responses from TxnWaitQueue

Closed by craig[bot] at Thu Jan 2 21:20:06 with 174 additions, 106 deletions, 6 comments

●●● SIZE pkg/storage: 153 , pkg/storage/txnwait: 120

This PR contains three pieces of cleanup that pave the way for pulling the TxnWaitQueue underneath the upcoming pkg/storage/concurrency package. It removes all cases where the queue mandates an update to the PushTxnRequest that entered it. It also removes all cases where the queue could field requests directly and return results for them. This restricts the TxnWaitQueue's interface and allows it to be pushed into the concurrency manager abstraction without bloating this new abstraction.

Commits:

storage: don't use header timestamp to determine txn expiration

This commit removes the use of a PushRequest's batch header timestamp
to determine whether a pushee has expired or not. This is the first
step in a refactor to remove cases where we mutate requests on the
server before command evaluation.

There is no migration needed for this change because it's impact
is local to a single node - the leaseholder evaluating the request.

Release note: None

storage: send force abort PushTxn req directly from TxnWaitQueue

This commit removes the interaction between the TxnWaitQueue and the
Replica-level retry loop when transaction deadlocks are detected. Instead
of instructing a waiting PushTxnRequest to change to a force push request,
the queue now launches push abort directly. This is cleaner and avoids
the need to mutate the input batch in Replica.maybeWaitForPushee.

Release note: None

release-19.1: storage/intentresolver: don't capture loop iteration vars in async task

Closed by nvanbenschoten at Thu Jan 2 19:22:21 with 200 additions, 12 deletions, 1 comments

●●● SIZE pkg/storage/intentresolver: 208

Backport 1/1 commits from #43563.

/cc @cockroachdb/release

It's unclear if we've ever seen issues from this, but I intend to backport the fix to v19.2, v19.1, and v2.1. I believe the worst thing that could have happened is that a batch that observed multiple intents or pushed multiple txns would only end up cleaning up a single one of these. It would then run into some of these intents again when it tried to re-evaluate, forcing it to push again. This subverts the parallelism that we were trying to achieve here, but would never cause a stall.

Release note (bug fix): Ensure that all intents or transactions that a batch observes are asynchronously cleaned up.

Commits:

storage/intentresolver: don't capture loop iteration vars in async task

It's unclear if we've ever seen issues from this, but I intend to backport the
fix to v19.2, v19.1, and v2.1. I believe the worst thing that could have
happened is that a batch that observed multiple intents or pushed multiple txns
would only end up cleaning up a single one of these. It would then run into some
of these intents again when it tried to re-evaluate, forcing it to push again.
This subverts the parallelism that we were trying to achieve here, but would
never cause a stall.

Release note (performance improvement): A transaction running into multiple
intents from an abandoned conflicting transaction cleans them up more efficiently.

release-19.2: storage/intentresolver: don't capture loop iteration vars in async task

Closed by nvanbenschoten at Thu Jan 2 19:22:13 with 200 additions, 12 deletions, 3 comments

●●● SIZE pkg/storage/intentresolver: 208

Backport 1/1 commits from #43563.

/cc @cockroachdb/release

Release note (bug fix): Ensure that all intents or transactions that a batch observes are asynchronously cleaned up.

Commits:

storage/intentresolver: don't capture loop iteration vars in async task

It's unclear if we've ever seen issues from this, but I intend to backport the
fix to v19.2, v19.1, and v2.1. I believe the worst thing that could have
happened is that a batch that observed multiple intents or pushed multiple txns
would only end up cleaning up a single one of these. It would then run into some
of these intents again when it tried to re-evaluate, forcing it to push again.
This subverts the parallelism that we were trying to achieve here, but would
never cause a stall.

Release note (performance improvement): A transaction running into multiple
intents from an abandoned conflicting transaction cleans them up more efficiently.

Misc. IMPORT updates

Closed by lnhsingh at Thu Jan 2 18:21:33 with 124 additions, 64 deletions, 8 comments

●●● SIZE v19.2: 94 , v20.1: 94

Edits:

Add example of DELIMITED with escaping. Closes #5895.
Add notes about DEFAULT values and how to set multiple options. Closes #5494, #5929.
Add strict_quotes option. Closes #5851

Commits:

IMPORT updates

Add example of DELIMITED with escaping. Closes #5895.

Add note about DEFAULT values and using multiple options.

Closes #5494. Closes #5929.

Add strict_quotes option. Closes #5851

Edits based on feedback

Edits based on feedback

storage: improve the error for aborted txns after lease transfer

Closed by craig[bot] at Thu Jan 2 17:32:24 with 309 additions, 241 deletions, 7 comments

●● SIZE pkg/storage: 51 , pkg/roachpb: 27

Lease transfers wipe the timestamp cache, and so a txn that straddles a lease transfer will not be allowed to create its txn record. This commit introduces a specific reason for the TransactionAbortedError highlighting that there's a new lease.

Release note: None

Commits:

storage: improve the error for aborted txns after lease transfer

Lease transfers wipe the timestamp cache, and so a txn that straddles a
lease transfer will not be allowed to create its txn record. This commit
introduces a specific reason for the TransactionAbortedError
highlighting that there's a new lease.

Release note: None

sql: minor error handling improvements

Closed by craig[bot] at Thu Jan 2 18:26:49 with 18 additions, 40 deletions, 3 comments

●● SIZE pkg/sql/parser: 38 , pkg/sql: 11

See individual commits.

Commits:

sql: simplify a callback

Remove an unused error.

Release note: None

sql: add a note about some special statements

The behavior of statements like SHOW TRANSACTION STATE is special with
regards to transactions and errors. This patch notes that a surprising
lack of error handling is intentional.

Release note: None

roachtest: add a tpcc+alter for sqlsmith

Closed by craig[bot] at Thu Jan 2 23:25:34 with 27 additions, 9 deletions, 3 comments

●● SIZE pkg/internal/sqlsmith: 34

This will test a large set of existing tables with schema changes.

Also rejigger the alter statement frequency. Drops are 1, renames 5, and creates 10. This should prevent the problem of drops happening too often.

Release note: None

Commits:

roachtest: add a tpcc+alter for sqlsmith

This will test a large set of existing tables with schema changes.

Also rejigger the alter statement frequency. Drops are 1, renames 5, and
creates 10. This should prevent the problem of drops happening too often.

Release note: None

sql: move Arul's TODOs to Solon

Closed by craig[bot] at Thu Jan 2 17:00:51 with 15 additions, 15 deletions, 2 comments

●● SIZE pkg/sql: 16 , pkg/sql/sqlbase: 12

There are a bunch of TODO(whomever) left over. As per the Google style guide, a TODO is addressed to the person others can ask about the TODO. Asking whomever is not a thing. Solon, I guess you're the closest thing.

Instead or after this patch, please do me a favor if you don't mind and improve these TODOs. A bunch of them seem to be about code gated on a cluster version being ok to remove in the next version. That doesn't need a TODO. Others seem to be about not using a DeprecatedDatabaseKey thing. I think they can all be centralized on the struct definition, which struct also cries for an explanation.

Release note: None

Commits:

sql: move Arul's TODOs to Solon

There are a bunch of TODO(whomever) left over. As per the Google style
guide, a TODO is addressed to the person others can ask about the TODO.
Asking whomever is not a thing. Solon, I guess you're the closest thing.

Instead or after this patch, please do me a favor if you don't mind and
improve these TODOs. A bunch of them seem to be about code gated on a
cluster version being ok to remove in the next version. That doesn't
need a TODO.  Others seem to be about not using a DeprecatedDatabaseKey
thing. I think they can all be centralized on the struct definition,
which struct also cries for an explanation.

Release note: None

build(deps): bump handlebars from 4.1.2 to 4.5.3 in /console

Closed by chrisseto at Thu Jan 2 15:41:00 with 13 additions, 8 deletions, 0 comments

●● SIZE console: 21

Bumps handlebars from 4.1.2 to 4.5.3.

Changelog

Sourced from handlebars's changelog.

v4.5.3 - November 18th, 2019

Bugfixes:

fix: add "no-prototype-builtins" eslint-rule and fix all occurences - f7f05d7

fix: add more properties required to be enumerable - 1988878

Chores / Build:

fix: use !== 0 instead of != 0 - c02b05f

add chai and dirty-chai and sinon, for cleaner test-assertions and spies, deprecate old assertion-methods - 93e284e, 886ba86, 0817dad, 93516a0

Security:

The properties __proto__, __defineGetter__, __defineSetter__ and __lookupGetter__ have been added to the list of "properties that must be enumerable". If a property by that name is found and not enumerable on its parent, it will silently evaluate to undefined. This is done in both the compiled template and the "lookup"-helper. This will prevent new Remote-Code-Execution exploits that have been published recently.

Compatibility notes:

Due to the security-fixes. The semantics of the templates using __proto__, __defineGetter__, __defineSetter__ and __lookupGetter__ in the respect that those expression now return undefined rather than their actual value from the proto.

The semantics have not changed in cases where the properties are enumerable, as in:
{
  __proto__: 'some string'
}
The change may be breaking in that respect, but we still only increase the patch-version, because the incompatible use-cases are not intended, undocumented and far less important than fixing Remote-Code-Execution exploits on existing systems.

Commits

v4.5.2 - November 13th, 2019

Bugfixes

fix: use String(field) in lookup when checking for "constructor" - d541378

test: add fluent API for testing Handlebars - c2ac79c

Compatibility notes:

no incompatibility are to be expected

... (truncated) Commits

c819c8b v4.5.3
827c9d0 Update release notes
f7f05d7 fix: add "no-prototype-builtins" eslint-rule and fix all occurences
1988878 fix: add more properties required to be enumerable
886ba86 test/chore: add chai/expect and sinon to "runtime"-environment
0817dad test: add sinon as global variable to eslint in the specs
93516a0 test: add sinon.js for spies, deprecate current assertions
93e284e chore: add chai and dirty-chai for better test assertions
c02b05f fix: use !== 0 instead of != 0
8de121d v4.5.2
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot merge will merge this PR after your CI passes on it
@dependabot squash and merge will squash and merge this PR after your CI passes on it
@dependabot cancel merge will cancel a previously requested merge and block automerging
@dependabot reopen will reopen this PR if it is closed
@dependabot ignore this [patch|minor|major] version will close this PR and stop Dependabot creating any more for this minor/major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
@dependabot use these labels will set the current labels as the default for future PRs for this repo and language
@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

Commits:

build(deps): bump handlebars from 4.1.2 to 4.5.3 in /console

Bumps [handlebars](https://github.com/wycats/handlebars.js) from 4.1.2 to 4.5.3.
- [Release notes](https://github.com/wycats/handlebars.js/releases)
- [Changelog](https://github.com/wycats/handlebars.js/blob/master/release-notes.md)
- [Commits](https://github.com/wycats/handlebars.js/compare/v4.1.2...v4.5.3)

Signed-off-by: dependabot[bot]

reducesql: also attempt to remove columns from PKs Closed by craig[bot] at Thu Jan 2 17:56:13 with 11 additions, 6 deletions, 3 comments ● SIZE pkg/testutils/reduce/reducesql: 17
Release note: None Commits: reducesql: also attempt to remove columns from PKs Release note: None

colexec/typeconv: remove orphaned Int1 case Closed by craig[bot] at Fri Jan 3 00:04:01 with 0 additions, 8 deletions, 4 comments ● SIZE pkg/sql/colexec/typeconv: 8
At some point we had support for Int1 (8 bit integer) in the vectorized engine, but it has been removed. This commit removes one of the leftover pieces. Release note: None Commits: colexec/typeconv: remove orphaned Int1 case At some point we had support for Int1 (8 bit integer) in the vectorized engine, but it has been removed. This commit removes one of the leftover pieces. Release note: None

console: make billing email accept long TLDs Closed by DuskEagle at Thu Jan 2 15:52:50 with 6 additions, 2 deletions, 2 comments ● SIZE console/apipb: 6
Previously, long TLDs such as `.marketing` or `.cloud` were being rejected as invalid. Commits: console: make billing email accept long TLDs Previously, long TLDs such as `.marketing` or `.cloud` were being rejected as invalid.

kv: don't close TCS interceptors on errors

Closed by craig[bot] at Thu Jan 2 17:32:23 with 2 additions, 3 deletions, 3 comments

● SIZE pkg/kv: 5

Before this patch, the TxnCoordSender was closing all the interceptors when on non-retriable errors. This was a useless optimization serving to stop the heartbeat loop early; the client was required to send a rollback to clean up the intents. This patch gets rid of the optimization in anticipation of the savepoints API, which will serve for error recovery.

Release note: None

Commits:

kv: don't close TCS interceptors on errors

Before this patch, the TxnCoordSender was closing all the interceptors
when on non-retriable errors. This was a useless optimization serving to
stop the heartbeat loop early; the client was required to send a
rollback to clean up the intents.
This patch gets rid of the optimization in anticipation of the
savepoints API, which will serve for error recovery.

Release note: None

release-19.2: sql/flowinfra: stop swallowing error while flushing

Closed by ajwerner at Thu Jan 2 20:36:42 with 1 additions, 1 deletions, 1 comments

● SIZE pkg/sql/flowinfra: 2

@solongordon I omitted a release note because I can't think of what the user-visible impact of not returning this error might be. If you can think of one let me know and I'll update the commit message.

Release note: None

Commits:

sql/flowinfra: stop swallowing error while flushing

@solongordon I omitted a release note because I can't think of what the
user-visible impact of not returning this error might be. If you can
think of one let me know and I'll update the commit message.

Release note: None

Enable saving bad csv rows to a side file Closed by lnhsingh at Fri Jan 3 04:55:25 with 2 additions, 0 deletions, 5 comments ● SIZE v19.2: 1 , v20.1: 1
Closes #5906. Commits: Enable saving bad csv rows to a side file Closes #5906. Edits based on feedback

Reply all

Reply to author

Forward

0 new messages