Feature Request: add skaffold trace information (design + plumbing) #5756

Closed
aaron-prindle opened this issue Apr 29, 2021 · 2 comments
Comments

aaron-prindle (Contributor) commented Apr 29, 2021

To get more information about skaffold's performance and to ensure that skaffold does not degrade over time, skaffold should add trace information to skaffold commands. Currently skaffold metrics can track the length of a skaffold dev session, but there is no "child" trace information showing how long specific actions took - yaml parsing, builds, deploys, etc.

The steps that I have outlined for this work include:

  • update opentelemetry (otel) libs (v0.13.0 -> v0.20.0) for skaffold (to use updated APIs & trace implementation) - #5757
  • fully plumb trace exporters, etc. (similar to metric exporter) & plumb tracing through skaffold (usable by env var) - #5854
  • add functionality/example where users can generate & view their own trace information w/ jaeger - #5854
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Apr 29, 2021
What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Updating our libs to the latest otel version adds additional useful functionality for tracing.

Why is this the best approach?
This approach uses go mod (updated via "go get <pkg>") and minor API changes to our otel API usage for the update.

What other approaches did you consider?
N/A

What side effects will this approach have?
There shouldn't be any side effects w/ this approach; the changes to otel's API were renaming/moving things (packages, functions, etc.). The only option removed was stdout Quantile aggregation (stdout.WithQuantiles), but I do not think this will have side effects. See open-telemetry/[email protected]49f699d#diff-2b283a7fb9f9b66e31a2b51a9ae9cad3599650a633f02fea9a956c4f6a714c6c

What future work remains to be done?
N/A
MarlonGamez (Contributor) commented Apr 30, 2021

trying to make triage-party happy :)

aaron-prindle added a commit that referenced this issue May 2, 2021
@aaron-prindle aaron-prindle added this to the v1.24.0 milestone May 3, 2021
@aaron-prindle aaron-prindle self-assigned this May 4, 2021
@aaron-prindle aaron-prindle changed the title Feature Request: Skaffold Performance Observability - skaffold trace information Feature Request: skaffold trace information (design + plumbing) May 5, 2021
@aaron-prindle aaron-prindle changed the title Feature Request: skaffold trace information (design + plumbing) Feature Request: add skaffold trace information (design + plumbing) May 5, 2021
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 17, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance-critical skaffold functions (identified in go/cloud-trace-skaffold). Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger. This PR uses env-var-based enabling/disabling of tracing, for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice, as we already use open telemetry libs for metrics and it is becoming the metrics/tracing standard. Using an env var in this PR and integrating the flag setup later was considered optimal, as skaffold tracing will currently be used for benchmarking/bottleneck-identifying in select use cases while the user-facing UX w/ jaeger, etc. is still being worked out. Additionally, there was the possibility of building tracing directly into skaffold events, but given the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations (eg: how long to hash a file, etc.), it makes sense not to integrate them.

What other approaches did you consider?
N/A

What side effects will this approach have?
There shouldn't be any side effects w/ this approach, as the default "off" for tracing and the minimal user visibility for now should mean that it is used only for select use cases experimentally. I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.)
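Selecting among the four exporters named in this PR description could look roughly like the sketch below. The exporter names (gcp-skaffold, gcp-adc, stdout, jaeger) come from the text above, but the `exporterFor` function and its error message are hypothetical, not skaffold's actual code:

```go
package main

import "fmt"

// exporterFor maps a SKAFFOLD_TRACE value to one of the four exporters
// named in the PR description. An empty value means tracing stays
// disabled (the default); any other unrecognized value is an error.
// Illustrative sketch only.
func exporterFor(v string) (string, error) {
	switch v {
	case "":
		return "", nil // tracing disabled (the default)
	case "gcp-skaffold", "gcp-adc", "stdout", "jaeger":
		return v, nil
	default:
		return "", fmt.Errorf("unsupported SKAFFOLD_TRACE value %q", v)
	}
}

func main() {
	for _, v := range []string{"", "jaeger", "bogus"} {
		exp, err := exporterFor(v)
		fmt.Printf("%q -> exporter=%q err=%v\n", v, exp, err)
	}
}
```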
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 24, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance-critical skaffold functions (identified in go/cloud-trace-skaffold). Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger. This PR uses env-var-based enabling/disabling of tracing, for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events, but given the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations (eg: how long to hash a file, etc.), it makes sense not to integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach, as the default "off" for tracing and the minimal user visibility for now should mean that it is used only for select use cases experimentally. I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working out how skaffold might do distributed tracing w/ other tools (minikube, buildpacks, etc.). Additionally, the ability to do more sparse sampling (vs AlwaysSample) should be added. Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
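The OTEL_TRACES_EXPORTER future-work item could look roughly like the sketch below. The precedence shown (the skaffold-specific variable winning over the standard OTel one) is an assumption about how this might be resolved, not decided skaffold behavior, and `exporterFromEnv` is a hypothetical name:

```go
package main

import "fmt"

// exporterFromEnv sketches honoring the standard OTEL_TRACES_EXPORTER
// variable alongside SKAFFOLD_TRACE. The env lookup is injected so the
// behavior is easy to test; giving SKAFFOLD_TRACE precedence is an
// assumption, not skaffold's actual code.
func exporterFromEnv(getenv func(string) string) string {
	if v := getenv("SKAFFOLD_TRACE"); v != "" {
		return v
	}
	return getenv("OTEL_TRACES_EXPORTER")
}

func main() {
	env := map[string]string{"OTEL_TRACES_EXPORTER": "jaeger"}
	get := func(k string) string { return env[k] }
	fmt.Println(exporterFromEnv(get)) // falls back to the standard OTel variable

	env["SKAFFOLD_TRACE"] = "stdout"
	fmt.Println(exporterFromEnv(get)) // skaffold-specific variable wins
}
```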
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 24, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring a --trace flag through dev, build, deploy, etc. and investigating how skaffold might do distributed tracing with other tools (minikube, buildpacks, etc.). Additionally, support for less aggressive sampling (vs. AlwaysSample) should be added. Other future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 25, 2021
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 25, 2021
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 25, 2021
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 25, 2021
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 25, 2021
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 25, 2021
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 25, 2021
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 25, 2021
@tejal29 tejal29 modified the milestones: v1.25.0, v1.26.0 May 26, 2021
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 26, 2021
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 26, 2021
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 26, 2021
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 26, 2021
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 26, 2021
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 26, 2021
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 27, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring a --trace flag through dev, build, deploy, etc. and working out how skaffold might do distributed tracing with other tools (minikube, buildpacks, etc.). Additionally, the ability to sample more sporadically (vs. AlwaysSample) should be added. Future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs. SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Jun 1, 2021
…rters
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Jun 1, 2021
…rters
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Jun 1, 2021
…rters
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Jun 1, 2021
…rters
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Jun 1, 2021
…rters
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Jun 2, 2021
…rters
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Jun 2, 2021
…rters
aaron-prindle added a commit that referenced this issue Jun 2, 2021
…rters (#5854)

@aaron-prindle
Contributor Author

@aaron-prindle aaron-prindle commented Jun 2, 2021

Fixed with #5854
