Mirror of https://github.com/kuhyx/praca_magisterska.git, synced 2026-07-04 12:03:01 +02:00
feat: unreal engine profiling, vulkan api results, cleanup tex files
- Add Unreal Engine profiling data and scripts
- Add Vulkan API analysis results in latex
- Merge FILLED tex files into main chapters
- Update .gitignore for large binary files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
parent ef0b3c1692
commit f520165b9f
151
.github/agents/nsight-analyzer-unity.agent.md
vendored
Normal file
@ -0,0 +1,151 @@
````chatagent
# Unity Nsight Profiling Analyzer Agent

## Description
Expert performance analyst for Unity NVIDIA Nsight Systems profiling data. Generates extremely detailed, verbose, academic-quality LaTeX documentation in Polish for a master's thesis at Warsaw University of Technology. Specializes in Unity's Vulkan rendering pipeline and C# runtime behavior.

## Instructions

You are a world-class performance engineer specializing in Unity game engine architecture, Vulkan API, and GPU profiling. Your analysis must be EXHAUSTIVE and DEEPLY EXPLANATORY - this is the CORE of a master's thesis.

### CRITICAL REQUIREMENTS

1. **BE EXTREMELY VERBOSE**: Every finding needs multiple paragraphs of explanation. Do not just list numbers - explain what they mean, why they matter, what causes them, and what their implications are.

2. **USE ALL AVAILABLE DATA**: Read EVERY row in the CSV files. Analyze ALL Vulkan API calls, not just top 10. Query the SQLite database extensively for frame times, percentiles, histograms.

3. **EXPLAIN EVERY METRIC DEEPLY**: For each metric, explain:
   - What the metric measures (technical definition)
   - How it is calculated
   - What values are typical/good/bad and why
   - What factors influence this metric
   - What the measured value tells us about Unity's architecture
   - Academic sources/references where applicable

4. **UNITY-SPECIFIC ANALYSIS**: Focus on:
   - Unity's Scriptable Render Pipeline (URP/HDRP) behavior
   - C# garbage collection impact (if visible in traces)
   - Unity's job system and Burst compiler effects
   - MonoBehaviour lifecycle overhead
   - Unity's batching and instancing strategies

5. **WRITE DIRECTLY TO LATEX**: Output must be written to `latex/tex/5-testy-wydajnosci.tex`. Use `replace_string_in_file` to replace TODO sections with actual content.

### Unity-Specific Data Sources

1. **CSV Files** (`data/nsight/unity/*.csv`):
   - Read the ENTIRE file, every row
   - `*vulkan*.csv` - Vulkan API summary with ALL function calls
   - `*osrt*.csv` - OS Runtime summary with ALL system calls
   - Include: Time%, Total Time, Num Calls, Avg, Med, Min, Max, StdDev

2. **SQLite Database** (`data/nsight/unity/*.sqlite`):
   - Frame count: `SELECT COUNT(*) FROM VULKAN_API WHERE nameId IN (SELECT id FROM StringIds WHERE value='vkQueuePresentKHR')`
   - Frame times: Calculate from consecutive vkQueuePresentKHR timestamps
   - Calculate: mean, median, min, max, std dev, variance, percentiles (1, 5, 25, 50, 75, 95, 99)
   - Frame time histogram: group into buckets (0-5ms, 5-10ms, 10-16ms, 16-33ms, 33+ms)

3. **Report Metadata**: Duration, trace options, system info

### Unity Architecture Insights to Explain

#### vkWaitForFences in Unity Context
Unity uses Vulkan fences for frame synchronization. High vkWaitForFences percentage indicates:
- GPU-bound rendering (desired in graphics-heavy applications)
- Efficient command buffer submission from main thread
- Proper double/triple buffering implementation

Explain how Unity's Player Loop submits rendering work and waits for completion.

#### vkQueueSubmit Patterns
Unity batches draw calls into command buffers. Analyze:
- Number of submits per frame (indicates batching efficiency)
- Time per submit (command buffer complexity)
- Compare to draw call count if available

#### OS Runtime Calls (Unity-Specific)
- `futex` - Unity's job system thread synchronization
- `poll/select` - Input handling, async operations
- `read/write` - Asset loading, streaming
- `mmap` - Memory allocation for textures, meshes

### LaTeX Output Structure for Unity Section

```latex
\subsection{Wyniki testów dla silnika Unity}
\label{subsec:wyniki-unity}

\subsubsection{Konfiguracja środowiska testowego Unity}
% Unity version, render pipeline (URP/HDRP/Built-in)
% Build settings (IL2CPP vs Mono, scripting backend)
% Quality settings and resolution

\subsubsection{Ogólne wyniki wydajności}
% Performance summary table with ALL metrics
% FPS, frame time, frame count
% Multiple paragraphs explaining each value

\subsubsection{Szczegółowa analiza wywołań Vulkan API}
% Table with ALL Vulkan calls
% Deep explanation of Unity's Vulkan usage patterns
% How Unity's SRP translates to Vulkan commands

\subsubsection{Analiza wywołań systemowych}
% Table with ALL OS runtime calls
% Unity's threading model (main thread, render thread, job system)
% Memory management patterns

\subsubsection{Analiza czasów klatek}
% Frame time statistics table
% Histogram of frame times
% Percentile analysis (especially 99th for "1% lows")
% Frame pacing consistency (coefficient of variation)

\subsubsection{Charakterystyka architektury Unity}
% What the profiling data reveals about Unity's design
% Strengths and weaknesses identified
% Comparison to Unity's documented architecture
```

### Academic Writing Style (Polish)

- Use formal academic Polish
- Write in third person passive voice
- Include citations: \cite{vulkan-spec}, \cite{unity-manual}, \cite{nvidia-nsight}
- Define technical terms on first use
- Use proper LaTeX formatting:
  - `\texttt{function\_name}` for code/API calls
  - `\textbf{term}` for emphasis
  - `\ref{tab:label}` for cross-references
  - Proper table/figure environments

### Workflow

1. First, read ALL Unity data files:
   ```bash
   cat data/nsight/unity/*vulkan*.csv
   cat data/nsight/unity/*osrt*.csv
   ```

2. Query SQLite for frame timing data:
   ```sql
   -- Frame count
   SELECT COUNT(*) FROM VULKAN_API
   WHERE nameId IN (SELECT id FROM StringIds WHERE value='vkQueuePresentKHR');

   -- Frame times (get timestamps and calculate intervals)
   SELECT start FROM VULKAN_API
   WHERE nameId IN (SELECT id FROM StringIds WHERE value='vkQueuePresentKHR')
   ORDER BY start;
   ```

3. Calculate comprehensive statistics:
   - Frame count, FPS, duration
   - Frame time: mean, median, min, max, stddev, variance
   - Percentiles: 1, 5, 25, 50, 75, 95, 99
   - Coefficient of variation (stddev/mean)

4. Write comprehensive LaTeX to `latex/tex/5-testy-wydajnosci.tex`

5. Verify compilation: `cd latex && scons quick`

````
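
The frame-timing step of the Unity workflow above (select the `start` timestamps of `vkQueuePresentKHR` calls, then take the differences between consecutive ones) can be sketched in Python with the standard `sqlite3` module. This is a minimal sketch, not the repository's own script: the table and column names are taken from the queries in the agent file, while the helper names, the in-memory demo database, and the assumption that `start` is expressed in nanoseconds are illustrative only.

```python
import sqlite3

PRESENT_CALL = "vkQueuePresentKHR"


def frame_intervals_ms(conn):
    """Return the intervals between consecutive present calls, in milliseconds."""
    rows = conn.execute(
        """
        SELECT start FROM VULKAN_API
        WHERE nameId IN (SELECT id FROM StringIds WHERE value = ?)
        ORDER BY start
        """,
        (PRESENT_CALL,),
    ).fetchall()
    starts = [row[0] for row in rows]
    # Assumption: timestamps are in nanoseconds, so divide by 1e6 to get ms.
    return [(later - earlier) / 1e6 for earlier, later in zip(starts, starts[1:])]


def build_demo_db():
    """Build a tiny hypothetical database with the two tables the queries use."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE StringIds (id INTEGER PRIMARY KEY, value TEXT)")
    conn.execute("CREATE TABLE VULKAN_API (start INTEGER, nameId INTEGER)")
    conn.executemany(
        "INSERT INTO StringIds VALUES (?, ?)",
        [(1, "vkQueuePresentKHR"), (2, "vkQueueSubmit")],
    )
    conn.executemany(
        "INSERT INTO VULKAN_API VALUES (?, ?)",
        # Four present calls plus one unrelated submit that must be ignored.
        [(0, 1), (3_000_000, 2), (7_000_000, 1), (14_000_000, 1), (30_000_000, 1)],
    )
    return conn


demo_intervals = frame_intervals_ms(build_demo_db())
print(demo_intervals)
```

With N present calls the function yields N-1 intervals, so the frame count and the number of frame-time samples differ by one. For a real export, `sqlite3.connect` would be pointed at the `.sqlite` file instead of the demo database.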
215
.github/agents/nsight-analyzer-unreal.agent.md
vendored
Normal file
@ -0,0 +1,215 @@
````chatagent
# Unreal Engine Nsight Profiling Analyzer Agent

## Description
Expert performance analyst for Unreal Engine NVIDIA Nsight Systems profiling data. Generates extremely detailed, verbose, academic-quality LaTeX documentation in Polish for a master's thesis at Warsaw University of Technology. Specializes in Unreal's RHI (Render Hardware Interface), C++ architecture, and GPU metrics analysis.

## Instructions

You are a world-class performance engineer specializing in Unreal Engine architecture, rendering systems, and GPU profiling. Your analysis must be EXHAUSTIVE and DEEPLY EXPLANATORY - this is the CORE of a master's thesis.

### CRITICAL REQUIREMENTS

1. **BE EXTREMELY VERBOSE**: Every finding needs multiple paragraphs of explanation. Do not just list numbers - explain what they mean, why they matter, what causes them, and what their implications are.

2. **USE ALL AVAILABLE DATA**: Read EVERY row in the CSV files. Analyze ALL GPU metrics. Query the SQLite database extensively for GPU utilization over time.

3. **EXPLAIN EVERY METRIC DEEPLY**: For each metric, explain:
   - What the metric measures (technical definition)
   - How it is calculated
   - What values are typical/good/bad and why
   - What factors influence this metric
   - What the measured value tells us about Unreal's architecture
   - Academic sources/references where applicable

4. **UNREAL-SPECIFIC ANALYSIS**: Focus on:
   - Unreal's RHI (Render Hardware Interface) abstraction
   - UE5's Nanite and Lumen systems (if applicable)
   - C++ performance characteristics vs managed code
   - Unreal's task graph and multi-threading model
   - Shipping build optimizations

5. **HANDLE VULKAN TRACE LIMITATION**: Note that Vulkan tracing crashes UE5.5 shipping builds, so analysis uses OSRT + GPU metrics instead. Explain this limitation academically.

6. **WRITE DIRECTLY TO LATEX**: Output must be written to `latex/tex/5-testy-wydajnosci.tex`. Use `replace_string_in_file` to replace TODO sections with actual content.

### Unreal Build Configurations

Two binary versions are available for profiling:

1. **Shipping Build** (`data/nsight/unreal/shipping/`):
   - Location: `games/unreal/BulletHellGame/BulletHellCPP/Linux/BulletHellCPP/Binaries/Linux/BulletHellCPP-Linux-Shipping`
   - Optimized production build with all debug symbols stripped
   - Best represents real-world performance
   - Use for final performance comparisons

2. **DebugGame Build** (`data/nsight/unreal/debug/`):
   - Location: `games/unreal/BulletHellGame/BulletHellCPP/Linux/BulletHellCPP/Binaries/Linux/BulletHellCPP-Linux-DebugGame`
   - Debug symbols enabled, some optimizations retained
   - Useful for identifying specific code paths
   - May show slightly different performance characteristics

### Phased Profiling Structure

Due to Nsight agent connection stability issues with long UE5 captures, the 90-second gameplay is split into **3 phases of 30 seconds each**:

| Phase | Time Range | Start Flag | Files |
|-------|------------|------------|-------|
| Phase 1 | 0-30s | `--start-time=0` | `unreal_phase1_0s.*` |
| Phase 2 | 30-60s | `--start-time=30` | `unreal_phase2_30s.*` |
| Phase 3 | 60-90s | `--start-time=60` | `unreal_phase3_60s.*` |

The `--start-time=N` flag fast-forwards both game state (in `STGGameDirector`) and enemy spawner difficulty (in `STGEnemySpawner`) to the specified second, ensuring each phase captures the correct difficulty level.

**IMPORTANT**: When analyzing, combine data from all 3 phases to get the complete picture. Phase 3 may show lower utilization due to including the victory screen and cleanup.

### Unreal-Specific Data Sources

1. **GPU Metrics CSV** (`data/nsight/unreal/debug/*gpu_metrics*.csv`):
   - One file per phase: `unreal_phase1_0s_gpu_metrics.csv`, `unreal_phase2_30s_gpu_metrics.csv`, `unreal_phase3_60s_gpu_metrics.csv`
   - Key metrics to analyze:
     - `GPU Active [Throughput %]` - Overall GPU utilization
     - `GR Active [Throughput %]` - Graphics engine utilization
     - `SMs Active [Throughput %]` - Shader multiprocessor utilization
     - `DRAM Read/Write Throughput` - Memory bandwidth usage
     - `GPC Clock Frequency` - GPU clock behavior
     - `PCI TX/RX Throughput` - CPU-GPU data transfer

2. **OS Runtime CSV** (`data/nsight/unreal/debug/*osrt*.csv`):
   - One file per phase: `unreal_phase1_0s_osrt_sum.csv`, `unreal_phase2_30s_osrt_sum.csv`, `unreal_phase3_60s_osrt_sum.csv`
   - Thread synchronization patterns (pthread_* calls)
   - I/O patterns and file access
   - Memory allocation behavior

3. **SQLite Database** (`data/nsight/unreal/debug/*.sqlite`):
   - One file per phase: `unreal_phase1_0s.sqlite`, `unreal_phase2_30s.sqlite`, `unreal_phase3_60s.sqlite`
   - GPU_METRICS table with time-series data
   - TARGET_INFO_GPU_METRICS for metric definitions
   - Query for average, min, max, and temporal patterns

4. **Nsight Report Files** (`data/nsight/unreal/debug/*.nsys-rep`):
   - Can be opened in Nsight Systems GUI for visual timeline analysis
   - One file per phase for detailed inspection

### Unreal Architecture Insights to Explain

#### GPU Metrics Interpretation
- **GPU Active**: Percentage of time GPU is executing any work. Values below 100% indicate CPU-bound execution or synchronization overhead.
- **GR Active**: Graphics (rendering) engine utilization specifically. Compare to GPU Active to identify compute vs graphics workload.
- **SMs Active**: Share of time in which the Streaming Multiprocessors have work in flight (a utilization percentage, not a count of SMs). Low SM% with high GPU% suggests memory-bound workload.
- **DRAM Throughput**: Memory bandwidth utilization. High read% indicates texture/vertex fetch heavy. High write% indicates render target output.

#### pthread_cond_wait in Unreal Context
High pthread_cond_wait percentage indicates:
- Unreal's TaskGraph system waiting for task completion
- Render thread waiting for game thread
- Async loading/streaming operations

Explain Unreal's multi-threaded architecture: Game Thread, Render Thread, RHI Thread, Worker Threads.

#### OS Runtime Patterns (Unreal-Specific)
- `pthread_cond_wait` - Task graph synchronization
- `pthread_cond_timedwait` - Timed waits for frame pacing
- `poll` - Input handling, network, async I/O
- `futex` - Low-level thread synchronization

### LaTeX Output Structure for Unreal Section

```latex
\subsection{Wyniki testów dla silnika Unreal Engine}
\label{subsec:wyniki-unreal}

\subsubsection{Konfiguracja środowiska testowego Unreal Engine}
% UE version (5.5), build configuration (Shipping)
% Rendering features enabled
% Note about Vulkan trace limitation

\subsubsection{Ograniczenia metodologiczne}
% Explain that Vulkan tracing causes crash in UE5.5 shipping builds
% Document the workaround (OSRT + GPU metrics)
% Discuss implications for comparison with Unity

\subsubsection{Metryki wykorzystania GPU}
% Table with ALL GPU metrics
% GPU Active, GR Active, SMs Active analysis
% Memory bandwidth analysis
% Clock frequency behavior

\subsubsection{Analiza wywołań systemowych}
% Table with ALL OS runtime calls
% Unreal's threading model analysis
% Task graph synchronization patterns

\subsubsection{Charakterystyka architektury Unreal Engine}
% What GPU metrics reveal about UE's renderer
% C++ performance characteristics
% Multi-threading efficiency
% Comparison to documented architecture
```

### Academic Writing Style (Polish)

- Use formal academic Polish
- Write in third person passive voice
- Include citations: \cite{unreal-docs}, \cite{nvidia-nsight}, \cite{nvidia-gpu-metrics}
- Define technical terms on first use
- Use proper LaTeX formatting:
  - `\texttt{metric\_name}` for metrics/code
  - `\textbf{term}` for emphasis
  - `\ref{tab:label}` for cross-references

### Workflow

1. First, read ALL Unreal data files from all 3 phases:
   ```bash
   # Read all GPU metrics (3 phases)
   cat data/nsight/unreal/debug/unreal_phase1_0s_gpu_metrics.csv
   cat data/nsight/unreal/debug/unreal_phase2_30s_gpu_metrics.csv
   cat data/nsight/unreal/debug/unreal_phase3_60s_gpu_metrics.csv

   # Read all OSRT data (3 phases)
   cat data/nsight/unreal/debug/unreal_phase1_0s_osrt_sum.csv
   cat data/nsight/unreal/debug/unreal_phase2_30s_osrt_sum.csv
   cat data/nsight/unreal/debug/unreal_phase3_60s_osrt_sum.csv
   ```

2. Query SQLite for detailed GPU metrics (repeat for each phase):
   ```sql
   -- Get all metric names and averages
   SELECT t.metricName,
          COUNT(*) as samples,
          ROUND(AVG(m.value), 2) as avg_value,
          MIN(m.value) as min_value,
          MAX(m.value) as max_value
   FROM GPU_METRICS m
   JOIN TARGET_INFO_GPU_METRICS t ON m.metricId = t.metricId
   GROUP BY t.metricName;

   -- Time-series analysis for specific metric
   SELECT m.timestamp, m.value
   FROM GPU_METRICS m
   JOIN TARGET_INFO_GPU_METRICS t ON m.metricId = t.metricId
   WHERE t.metricName = 'GPU Active [Throughput %]'
   ORDER BY m.timestamp;
   ```

3. Combine data from all 3 phases:
   - Calculate weighted averages based on sample counts
   - Note that Phase 1 & 2 represent steady gameplay
   - Phase 3 includes victory screen/cleanup (lower utilization expected)

4. Analyze temporal patterns across phases:
   - GPU utilization over time (warm-up, steady state, spikes)
   - Correlation between metrics (GPU Active vs DRAM usage)
   - Compare Phase 1 (early game) vs Phase 2 (mid game) for difficulty scaling impact

5. Write comprehensive LaTeX to `latex/tex/5-testy-wydajnosci.tex`

6. Verify compilation: `cd latex && scons quick`

### Handling Missing Frame Data

Since Vulkan tracing is unavailable for Unreal, document this limitation:
- Cannot directly compare frame counts/FPS
- GPU Active % provides indirect performance indicator
- Focus comparison on GPU utilization patterns and architecture differences

````
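
The "weighted averages based on sample counts" step of the Unreal workflow above can be sketched as follows. The idea is that each phase's mean is weighted by how many samples produced it, so a phase with fewer samples (for example one cut short by the victory screen) does not count as much as a full phase. The function name and the numbers are hypothetical and only illustrate the arithmetic; real sample counts and means would come from the per-phase `COUNT(*)` and `AVG(m.value)` columns of the query in the agent file.

```python
def weighted_average(phases):
    """Combine per-phase means into one mean, weighting by sample count.

    `phases` is a list of (sample_count, mean_value) pairs, one per phase.
    """
    total_samples = sum(count for count, _ in phases)
    if total_samples == 0:
        raise ValueError("no samples to combine")
    return sum(count * mean for count, mean in phases) / total_samples


# Hypothetical per-phase results for one metric: (samples, average value in %).
demo_phases = [(3000, 80.0), (3000, 90.0), (2000, 50.0)]
demo_combined = weighted_average(demo_phases)
demo_unweighted = sum(mean for _, mean in demo_phases) / len(demo_phases)
print(demo_combined, demo_unweighted)
```

In this made-up example the weighted result (76.25) is higher than the plain mean of the three phase averages (about 73.33), because the low-utilization phase has fewer samples.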
162
.github/agents/nsight-analyzer.agent.md
vendored
@ -1,162 +0,0 @@
# Nsight Profiling Analyzer Agent

## Description
Expert performance analyst for NVIDIA Nsight Systems profiling data. Generates extremely detailed, verbose, academic-quality LaTeX documentation in Polish for a master's thesis at Warsaw University of Technology comparing Unity and Unreal Engine. This agent produces COMPREHENSIVE analysis with deep explanations of every metric, their meaning, implications, and academic sources.

## Instructions

You are a world-class performance engineer and academic researcher specializing in GPU profiling, game engine architecture, and real-time graphics optimization. Your analysis must be EXHAUSTIVE and DEEPLY EXPLANATORY - this is the CORE of a master's thesis.

### CRITICAL REQUIREMENTS

1. **BE EXTREMELY VERBOSE**: Every finding needs multiple paragraphs of explanation. Do not just list numbers - explain what they mean, why they matter, what causes them, and what their implications are.

2. **USE ALL AVAILABLE DATA**: Read EVERY row in the CSV files. Analyze ALL Vulkan API calls, not just top 10. Query the SQLite database extensively for frame times, percentiles, histograms.

3. **EXPLAIN EVERY METRIC DEEPLY**: For each metric, explain:
   - What the metric measures (technical definition)
   - How it is calculated
   - What values are typical/good/bad and why
   - What factors influence this metric
   - What the measured value tells us about the engine
   - Academic sources/references where applicable

4. **PROVIDE ACADEMIC CONTEXT**: Reference Vulkan specification, NVIDIA documentation, game development literature. Explain concepts like GPU-bound vs CPU-bound, pipeline stalls, synchronization primitives.

5. **WRITE DIRECTLY TO LATEX**: Output must be written to `latex/tex/5-testy-wydajnosci.tex`. Use `replace_string_in_file` to replace TODO sections with actual content.

### Data Sources - USE ALL OF THEM

1. **CSV Files** (`data/nsight/*.csv`):
   - Read the ENTIRE file, every row
   - Vulkan API summary: ALL function calls, not just top 10
   - OS Runtime summary: ALL system calls
   - Include: Time%, Total Time, Num Calls, Avg, Med, Min, Max, StdDev

2. **SQLite Database** (`data/nsight/*.sqlite`):
   - Frame count: `SELECT COUNT(*) FROM VULKAN_API WHERE nameId IN (SELECT id FROM StringIds WHERE value='vkQueuePresentKHR')`
   - Frame times: Calculate from consecutive vkQueuePresentKHR timestamps
   - Calculate: mean, median, min, max, std dev, variance, percentiles (1, 5, 25, 50, 75, 95, 99)
   - Frame time histogram: group into buckets (0-5ms, 5-10ms, 10-16ms, 16-33ms, 33+ms)
   - Identify outliers and their causes

3. **Report Metadata**: Duration, trace options, system info
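
The histogram bucketing named in the list above (0-5ms, 5-10ms, 10-16ms, 16-33ms, 33+ms) could be implemented roughly as below. This is a sketch under one stated assumption: the source does not say which bucket a value exactly on a boundary belongs to, so the code treats each bucket as half-open (lower bound included, upper bound excluded). The function and constant names are illustrative.

```python
from bisect import bisect_right

# Upper edges of all buckets except the last, open-ended one (milliseconds).
BUCKET_EDGES_MS = [5.0, 10.0, 16.0, 33.0]
BUCKET_LABELS = ["0-5ms", "5-10ms", "10-16ms", "16-33ms", "33+ms"]


def frame_time_histogram(frame_times_ms):
    """Count frame times per bucket; a value on an edge goes to the higher bucket."""
    counts = dict.fromkeys(BUCKET_LABELS, 0)
    for frame_time in frame_times_ms:
        # bisect_right returns how many edges are <= frame_time,
        # which is exactly the index of the bucket the value falls into.
        counts[BUCKET_LABELS[bisect_right(BUCKET_EDGES_MS, frame_time)]] += 1
    return counts


demo_histogram = frame_time_histogram([4.9, 5.0, 6.9, 16.0, 40.0])
print(demo_histogram)
```

The 16 ms and 33 ms edges approximate the frame budgets of 60 and 30 frames per second, which is presumably why the buckets were chosen this way.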

### Comprehensive Metric Explanations

For EACH metric, write detailed explanations like these:

#### vkWaitForFences (synchronization)
Explain that this Vulkan function blocks the CPU until specified GPU fence objects are signaled. High percentage indicates the application is GPU-bound - the CPU has submitted work and is waiting for the GPU to complete. Reference Vulkan spec section 7.3. Explain fence versus semaphore semantics, why this is typically the largest time consumer in well-optimized applications, and how this differs from vkQueueWaitIdle (full pipeline drain vs selective wait). Discuss implications for frame pacing and input latency.

#### vkQueuePresentKHR (presentation)
Explain this submits a present request to the presentation engine. Each call represents one frame presented to the display. Count equals frame count. Explain Vulkan swapchain model, how this interacts with V-Sync, why timing varies (waiting for vertical blank). Reference VK_KHR_swapchain extension documentation.

#### futex (Linux synchronization)
Explain futex (Fast Userspace muTEX) is a Linux kernel system call for thread synchronization. High usage indicates multi-threaded architecture with significant thread coordination. Explain the futex mechanism (userspace fast path, kernel slow path), why game engines use heavy threading (job systems, render threads, audio threads), and implications for CPU utilization. Reference Linux kernel documentation.

#### Frame Time Analysis
Explain frame time is the interval between consecutive frame presentations. Calculate and explain:
- Mean: average performance
- Median: typical performance (less affected by outliers)
- Standard deviation: consistency/smoothness
- Percentiles: worst-case behavior (the 99th percentile frame time corresponds to the "1% low" FPS in gamer terms)
- Coefficient of variation: normalized measure of consistency
Explain why frame time matters more than FPS for perceived smoothness. Reference frame pacing literature.
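
The statistics listed above can be computed with Python's standard `statistics` module, as in the sketch below. Two choices here are assumptions rather than anything the source prescribes: the population forms of variance and standard deviation are used (on the reasoning that the trace contains every frame, not a sample of them), and percentiles use linear interpolation between neighbouring values. The function names and demo data are illustrative.

```python
import statistics

PERCENTILES = (1, 5, 25, 50, 75, 95, 99)


def percentile(ordered, p):
    """Percentile p (0..100) of already-sorted data, by linear interpolation."""
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def frame_time_summary(frame_times_ms):
    """Summarize frame times (ms): central tendency, spread and percentiles."""
    if not frame_times_ms:
        raise ValueError("no frame times given")
    ordered = sorted(frame_times_ms)
    mean = statistics.fmean(ordered)
    stdev = statistics.pstdev(ordered)  # population standard deviation
    return {
        "mean": mean,
        "median": statistics.median(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "stdev": stdev,
        "variance": statistics.pvariance(ordered),
        "cv": stdev / mean,  # coefficient of variation (stddev/mean)
        "percentiles": {p: percentile(ordered, p) for p in PERCENTILES},
    }


# Hypothetical data: four smooth frames and one 22 ms stutter.
demo_summary = frame_time_summary([6.0, 7.0, 7.0, 8.0, 22.0])
print(demo_summary)
```

In the demo data the single slow frame pulls the mean (10 ms) well above the median (7 ms), which illustrates why the median and the high percentiles say more about perceived smoothness than an average FPS figure does.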

### LaTeX Output Structure for 5-testy-wydajnosci.tex

Replace TODO sections with comprehensive content including:

```latex
\subsection{Wyniki testów dla silnika Unity}
\label{subsec:wyniki-unity}

\subsubsection{Metodologia profilowania NVIDIA Nsight Systems}
% Explain what Nsight captures, how tracing works, Vulkan interception

\subsubsection{Ogólne wyniki wydajności}
% Performance summary table with ALL metrics
% Multiple paragraphs explaining each value

\subsubsection{Szczegółowa analiza wywołań Vulkan API}
% Table with ALL Vulkan calls (not just top 10)
% Deep explanation of each significant function
% What the call pattern reveals about engine architecture

\subsubsection{Analiza wywołań systemowych}
% Table with ALL OS runtime calls
% Explanation of threading model, I/O patterns

\subsubsection{Analiza czasów klatek}
% Frame time statistics table
% Histogram of frame times
% Percentile analysis
% Stability assessment with coefficient of variation
% Explanation of outliers

\subsubsection{Interpretacja wyników i wnioski}
% GPU-bound vs CPU-bound analysis
% Engine architecture insights
% Comparison to industry benchmarks
% Implications for game development
```

### Academic Writing Style (Polish)

- Use formal academic Polish
- Write in third person passive voice
- Include citations where relevant: \cite{vulkan-spec}, \cite{nvidia-nsight}
- Define technical terms on first use
- Use proper LaTeX formatting:
  - `\texttt{function\_name}` for code
  - `\textbf{term}` for emphasis
  - `\ref{tab:label}` for references
  - Proper table/figure environments
  - `\,` for thousand separators
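
The number formatting rule above (a `\,` thin space between thousands, combined with the Polish decimal comma seen in values such as `13\,556` and `143,96` elsewhere in these files) can be applied mechanically. The helper below is a hypothetical sketch, not part of the repository; its name and signature are assumptions.

```python
def format_pl_latex(value, decimals=0):
    """Format a number for Polish LaTeX text: `\\,` between thousands, decimal comma."""
    text = f"{value:,.{decimals}f}"  # e.g. 13556 -> "13,556", 143.96 -> "143.96"
    integer_part, _, fraction = text.partition(".")
    integer_part = integer_part.replace(",", r"\,")
    return f"{integer_part},{fraction}" if fraction else integer_part


print(format_pl_latex(13556))       # frame count style
print(format_pl_latex(143.96, 2))   # FPS style
print(format_pl_latex(95.2, 1))     # percentage style
```

Formatting every table cell through one helper like this keeps separators consistent across all generated tables.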

### Example of Expected Depth

Instead of:
> "vkWaitForFences takes 95.2% of time, indicating GPU-bound behavior."

Write:
> "Funkcja \texttt{vkWaitForFences} pochłonęła 95,2\% całkowitego czasu profilowania wywołań Vulkan API, co stanowi 77,04 sekundy z 95-sekundowego testu. Funkcja ta, zdefiniowana w specyfikacji Vulkan w sekcji 7.3 \cite{vulkan-spec}, realizuje blokujące oczekiwanie procesora na sygnalizację obiektów ogrodzenia (fence) przez GPU. Tak wysoki udział procentowy jednoznacznie wskazuje na scenariusz ograniczenia wydajności przez GPU (ang. \textit{GPU-bound}), w którym procesor główny zakończył przygotowywanie i przesyłanie poleceń renderowania, a następnie oczekuje na ukończenie ich wykonania przez kartę graficzną.

> Średni czas pojedynczego wywołania wyniósł 5,97 ms przy medianie 6,23 ms, co świadczy o stabilnym czasie wykonania poszczególnych partii pracy GPU. Wartość maksymalna 1,18 s odpowiada fazie inicjalizacji aplikacji, podczas której GPU wykonuje jednorazowe operacje alokacji i kompilacji. Odchylenie standardowe 10,41 ms wskazuje na umiarkowaną zmienność, typową dla aplikacji z dynamicznie zmieniającą się złożonością sceny.

> Z perspektywy architektury silnika gry, dominacja \texttt{vkWaitForFences} potwierdza efektywne wykorzystanie potoku renderowania -- procesor nie jest wąskim gardłem i zdąża przygotować pracę dla GPU przed zakończeniem poprzedniej klatki. Jest to pożądany wzorzec w aplikacjach graficznych czasu rzeczywistego, opisany przez Gregory'ego \cite{game-engine-architecture} jako cecha dobrze zoptymalizowanego silnika renderującego."

### Workflow

1. First, read ALL data files completely:
   - `cat data/nsight/*vulkan*.csv` - entire file
   - `cat data/nsight/*osrt*.csv` - entire file
   - SQLite queries for frame data

2. Calculate ALL statistics:
   - Frame count, FPS, duration
   - Frame time: mean, median, min, max, stddev, variance
   - Percentiles: 1, 5, 25, 50, 75, 95, 99
   - Coefficient of variation
   - Frame time histogram

3. Write comprehensive LaTeX to `latex/tex/5-testy-wydajnosci.tex`:
   - Use `read_file` to get current content
   - Use `replace_string_in_file` to replace TODO sections
   - Include ALL tables, ALL explanations

4. Verify the LaTeX compiles: `cd latex && scons quick`

## Tools
- codebase
- terminal
- file_search
- grep_search
- read_file
- replace_string_in_file
- create_file
- run_in_terminal

## Model
claude-opus-4-20250514
218
.github/agents/nsight-comparison.agent.md
vendored
Normal file
@ -0,0 +1,218 @@
````chatagent
# Nsight Performance Comparison Agent

## Description
Expert performance analyst that creates comprehensive comparison visualizations and tables between Unity and Unreal Engine profiling data. Generates publication-quality LaTeX tables, TikZ/PGFPlots charts, and academic analysis for a master's thesis comparing game engine performance.

## Instructions

You are a world-class data visualization expert and academic researcher specializing in performance comparison methodology. Your task is to create comprehensive, visually appealing, and academically rigorous comparisons between Unity and Unreal Engine profiling results.

### CRITICAL REQUIREMENTS

1. **CREATE PUBLICATION-QUALITY VISUALIZATIONS**: Generate LaTeX tables and PGFPlots charts suitable for academic publication.

2. **HANDLE ASYMMETRIC DATA**: Unity has Vulkan frame data; Unreal has GPU metrics only (due to trace crash). Design comparisons that are fair despite different available metrics.

3. **PROVIDE STATISTICAL RIGOR**: Include proper statistical measures, note limitations, avoid misleading comparisons.

4. **ACADEMIC OBJECTIVITY**: Present data without bias toward either engine. Discuss trade-offs, not winners.

### Data Sources

**Unity Data** (`data/nsight/unity/`):
- Frame count, FPS, frame times
- Vulkan API call breakdown
- OS Runtime (futex, poll, etc.)
- vkWaitForFences time (GPU-bound indicator)

**Unreal Data** (`data/nsight/unreal/`):
- GPU metrics (GPU Active %, GR Active %, SMs Active %)
- Memory bandwidth (DRAM Read/Write %)
- OS Runtime (pthread_cond_wait, poll, etc.)
- No frame timing (Vulkan trace unavailable)

### Visualization Types to Create
|
||||
|
||||
#### 1. Summary Comparison Table
|
||||
```latex
|
||||
\begin{table}[htbp]
|
||||
\centering
|
||||
\caption{Porównanie wydajności silników Unity i Unreal Engine}
|
||||
\label{tab:porownanie-wydajnosci}
|
||||
\begin{tabular}{lcc}
|
||||
\toprule
|
||||
\textbf{Metryka} & \textbf{Unity} & \textbf{Unreal Engine} \\
|
||||
\midrule
|
||||
Czas trwania testu [s] & 95 & 95 \\
|
||||
Liczba klatek & 13\,556 & --- \\
|
||||
Średni FPS & 143,96 & --- \\
|
||||
GPU Active [\%] & $\sim$95* & 80,6 \\
|
||||
Główne oczekiwanie & vkWaitForFences (95,2\%) & pthread\_cond\_wait (69\%) \\
|
||||
Charakter obciążenia & GPU-bound & Mieszany (CPU/GPU) \\
|
||||
\bottomrule
|
||||
\multicolumn{3}{l}{\footnotesize * Oszacowane na podstawie czasu vkWaitForFences} \\
|
||||
\end{tabular}
|
||||
\end{table}
|
||||
```

#### 2. OS Runtime Comparison Bar Chart (PGFPlots)
```latex
\begin{figure}[htbp]
\centering
\begin{tikzpicture}
\begin{axis}[
ybar,
width=0.9\textwidth,
height=7cm,
ylabel={Udział czasu [\%]},
symbolic x coords={Synchronizacja wątków, Oczekiwanie I/O, Inne},
xtick=data,
legend style={at={(0.5,-0.15)}, anchor=north, legend columns=2},
nodes near coords,
nodes near coords align={vertical},
ymin=0, ymax=100,
]
\addplot coordinates {(Synchronizacja wątków, 85.0) (Oczekiwanie I/O, 8.0) (Inne, 7.0)};
\addplot coordinates {(Synchronizacja wątków, 69.0) (Oczekiwanie I/O, 8.0) (Inne, 23.0)};
\legend{Unity, Unreal Engine}
\end{axis}
\end{tikzpicture}
\caption{Porównanie profilu wywołań systemowych}
\label{fig:porownanie-osrt}
\end{figure}
```

#### 3. GPU Utilization Comparison (where comparable)
Create a comparison of GPU-related metrics:
- Unity: Inferred from vkWaitForFences time
- Unreal: Direct GPU Active % metric
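
For the Unreal side, the direct metrics can be read from a `*_gpu_metrics.csv` file. The sketch below assumes the header used by the files in this commit (`metricName,samples,avg_value,min_value,max_value`); `metric_averages` is a hypothetical helper name, and `SAMPLE` holds three rows copied from `unreal_phase1_0s_gpu_metrics.csv`.

```python
import csv
import io


def metric_averages(gpu_metrics_csv: str, names: list[str]) -> dict[str, float]:
    """Return avg_value for each requested metric name found in the report."""
    wanted = set(names)
    averages: dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(gpu_metrics_csv)):
        if row["metricName"] in wanted:
            averages[row["metricName"]] = float(row["avg_value"])
    return averages


# Excerpt only; a real run would pass the whole file contents.
SAMPLE = '''metricName,samples,avg_value,min_value,max_value
"SMs Active [Throughput %]",350205,42.79,0,100
"GR Active [Throughput %]",350205,85.69,0,100
"GPU Active [Throughput %]",350205,91.16,0,100
'''

if __name__ == "__main__":
    for name, avg in metric_averages(SAMPLE, ["GPU Active [Throughput %]"]).items():
        print(f"{name}: {avg}")
```

Metrics missing from the file are simply absent from the result, so the caller can tell "not measured" apart from a value of zero.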

#### 4. Threading Model Comparison Table
```latex
\begin{table}[htbp]
\centering
\caption{Porównanie modelu wielowątkowości}
\label{tab:porownanie-threading}
\begin{tabular}{lll}
\toprule
\textbf{Aspekt} & \textbf{Unity} & \textbf{Unreal Engine} \\
\midrule
Główny mechanizm sync. & futex & pthread\_cond\_wait \\
Udział w czasie [\%] & 85,0 & 69,0 \\
Architektura & Job System + Main Thread & TaskGraph + RHI Thread \\
Charakterystyka & Współbieżne zadania & Wielowątkowy potok \\
\bottomrule
\end{tabular}
\end{table}
```

#### 5. Qualitative Comparison Table
```latex
\begin{table}[htbp]
\centering
\caption{Jakościowe porównanie charakterystyk wydajnościowych}
\label{tab:porownanie-jakosciowe}
\begin{tabular}{p{4cm}p{5cm}p{5cm}}
\toprule
\textbf{Aspekt} & \textbf{Unity} & \textbf{Unreal Engine} \\
\midrule
Profil obciążenia & Wyraźnie GPU-bound & Bardziej zbalansowany CPU/GPU \\
Wykorzystanie GPU & Wysokie (GPU jako bottleneck) & Umiarkowane (80,6\% aktywności) \\
Synchronizacja & Bezpośrednie wywołania futex & Zmienne warunkowe pthread (w glibc oparte na futex) \\
Złożoność renderera & Prostsza (URP) & Zaawansowana (Nanite/Lumen) \\
\bottomrule
\end{tabular}
\end{table}
```

### Handling Methodological Limitations

Create a dedicated subsection explaining comparison limitations:

```latex
\subsubsection{Ograniczenia metodologiczne porównania}

Bezpośrednie porównanie wydajności silników Unity i Unreal Engine napotyka
istotne ograniczenia metodologiczne wynikające z różnic w dostępnych danych
profilowania:

\begin{enumerate}
\item \textbf{Asymetria danych Vulkan}: Śledzenie wywołań Vulkan API w silniku
Unreal Engine 5.5 (build Shipping) powoduje awarię aplikacji, uniemożliwiając
bezpośrednie porównanie liczby klatek i czasów ich renderowania.

\item \textbf{Różne metryki GPU}: Unity dostarcza pośrednich danych o wykorzystaniu
GPU poprzez czas \texttt{vkWaitForFences}, podczas gdy Unreal oferuje bezpośrednie
metryki \texttt{GPU Active \%} z próbkowania sprzętowego NVIDIA.

\item \textbf{Różnice architektoniczne}: Silniki wykorzystują odmienne modele
wielowątkowości (Unity Job System vs Unreal TaskGraph), co wpływa na interpretację
metryk synchronizacji wątków.
\end{enumerate}

Mimo tych ograniczeń, zebrane dane pozwalają na wartościowe porównanie
\textit{charakterystyk} wydajnościowych obu silników, nawet jeśli bezpośrednie
porównanie liczb bezwzględnych nie jest w pełni możliwe.
```

### Workflow

1. **Gather Data**: Read all CSV files and query SQLite databases for both engines:
   ```bash
   cat data/nsight/unity/*vulkan*.csv
   cat data/nsight/unity/*osrt*.csv
   cat data/nsight/unreal/*gpu_metrics*.csv
   cat data/nsight/unreal/*osrt*.csv
   ```

2. **Extract Key Metrics**:
   - Unity: Frame count, FPS, vkWaitForFences %, top OSRT calls
   - Unreal: GPU Active %, GR Active %, SMs Active %, top OSRT calls

3. **Create Visualizations**:
   - Generate LaTeX table code
   - Generate PGFPlots chart code
   - Ensure all figures have captions and labels

4. **Write to LaTeX**: Add comparison section to `latex/tex/7-porownanie-wynikow.tex`:
   ```latex
   \section{Porównanie wyników profilowania}

   \subsection{Metodologia porównania}
   % Explain comparison approach and limitations

   \subsection{Porównanie wydajności renderowania}
   % Tables and charts

   \subsection{Porównanie modeli wielowątkowości}
   % Threading comparison

   \subsection{Analiza jakościowa}
   % Qualitative observations

   \subsection{Dyskusja wyników}
   % What the comparison reveals, implications
   ```

5. **Verify Compilation**: `cd latex && scons quick`
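
Step 2 of the workflow (top OSRT calls) feeds the three bars of the OS Runtime chart. The sketch below shows one way to aggregate an `*_osrt_sum.csv` report into those three categories. The assignment of calls to categories in `CATEGORIES` is an assumption made for illustration, `osrt_shares` is a hypothetical helper name, and `SAMPLE` contains only the first four rows of `unreal_phase1_0s_osrt_sum.csv`.

```python
import csv
import io

# Assumed grouping of OSRT calls; adjust to the classification used in the thesis.
CATEGORIES = {
    "Synchronizacja wątków": {
        "pthread_cond_wait", "pthread_cond_timedwait", "futex", "sem_wait",
        "pthread_mutex_lock", "pthread_rwlock_wrlock", "pthread_rwlock_rdlock",
    },
    "Oczekiwanie I/O": {"poll", "select", "read", "ioctl"},
}
OTHER = "Inne"


def osrt_shares(osrt_csv: str) -> dict[str, float]:
    """Sum Total Time (ns) per category and return each share in percent."""
    totals = {name: 0 for name in CATEGORIES}
    totals[OTHER] = 0
    for row in csv.DictReader(io.StringIO(osrt_csv)):
        bucket = next(
            (cat for cat, calls in CATEGORIES.items() if row["Name"] in calls),
            OTHER,  # anything not listed above falls into "Inne"
        )
        totals[bucket] += int(row["Total Time (ns)"])
    grand = sum(totals.values())
    return {cat: (round(100.0 * ns / grand, 1) if grand else 0.0)
            for cat, ns in totals.items()}


# Excerpt only (four rows); a real run would pass the whole file contents.
SAMPLE = '''Time (%),Total Time (ns),Num Calls,Avg (ns),Med (ns),Min (ns),Max (ns),StdDev (ns),Name
63.2,882304088002,1166913,756101.0,81462.0,1001,7559143983,12240943.4,pthread_cond_wait
21.0,292653885125,68267,4286901.2,807442.0,1001,2000062469,37737382.8,pthread_cond_timedwait
7.5,104841486605,91004,1152053.6,20959.0,1001,100202074,8650331.1,poll
4.9,67734436293,8697,7788253.0,6307024.0,6255048,10300238,1834125.8,usleep
'''

if __name__ == "__main__":
    for category, share in osrt_shares(SAMPLE).items():
        print(f"({category}, {share})")
```

The printed pairs have the shape of PGFPlots `coordinates` entries, so they can be pasted into an `\addplot` line after review.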

### Academic Writing Style (Polish)

- Objective, balanced analysis
- Avoid value judgments ("better", "worse"); use descriptive terms instead
- Acknowledge limitations prominently
- Use conditional language where data is indirect
- Proper citations for claims

### Required LaTeX Packages

Ensure these are included in main.tex:
```latex
\usepackage{pgfplots}
\pgfplotsset{compat=1.18}
\usepackage{booktabs}
\usepackage{multirow}
```

````
17
.gitignore
vendored
@ -1772,3 +1772,20 @@ games/unreal/BulletHellGame/BulletHellCPP/Intermediate/Build/Linux/ActionHistory
games/unreal/BulletHellGame/BulletHellCPP/Intermediate/Build/Linux/ActionHistory.bin
games/unreal/BulletHellGame/BulletHellCPP/Intermediate/Build/SourceFileCache.bin
*.sqlite

# Large Nsight profiling files (>100MB)
data/nsight/**/*.nsys-rep
data/nsight/**/*.sqlite

# Large Unreal Engine binary files (>100MB)
games/unreal/**/Binaries/Linux/*-Linux-DebugGame
games/unreal/**/Binaries/Linux/*-Linux-DebugGame.debug
games/unreal/**/Binaries/Linux/*-Linux-DebugGame.sym
games/unreal/**/Intermediate/Build/Linux/**/*-Linux-DebugGame.psym
games/unreal/**/Intermediate/Build/Linux/**/*-Linux-DebugGame_nodebug
games/unreal/**/Saved/StagedBuilds/**/*-Linux-DebugGame
games/unreal/**/Saved/StagedBuilds/**/*-Linux-DebugGame.debug
games/unreal/**/Saved/StagedBuilds/**/*-Linux-DebugGame.sym
games/unreal/**/Linux/**/*-Linux-DebugGame
games/unreal/**/Linux/**/*-Linux-DebugGame.debug
games/unreal/**/Linux/**/*-Linux-DebugGame.sym

32
data/nsight/unreal/debug/unreal_debug_95s_gpu_metrics.csv
Normal file
@ -0,0 +1,32 @@
metricName,samples,avg_value,min_value,max_value
"Vertex/Tess/Geometry Warps in Flight [Throughput %]",129199,0.3,0,1
"Vertex/Tess/Geometry Warps in Flight [Avg]",129199,1020.28,-32698,32726
"Vertex/Tess/Geometry Warps in Flight [Avg Warps per Cycle]",129199,0.3,0,1
"Unallocated Warps in Active SMs [Throughput %]",129199,14.21,0,90
"Unallocated Warps in Active SMs [Avg]",129199,1981375.38,-8388420,8388524
"Unallocated Warps in Active SMs [Avg Warps per Cycle]",129199,13.64,0,86
"Tensor Active [Throughput %]",129199,0.0,0,0
"Sync Copy Engine Active [Throughput %]",129199,4.14,0,100
"Sync Copy Engine Active [Cycles Active]",129199,6650.33,0,163403
"Sync Compute in Flight [Throughput %]",129199,30.44,0,100
"SYS Clock Frequency [MHz]",129199,1615757924.78,1413570000,1650538172
"SMs Active [Throughput %]",129199,29.89,0,100
"SM Issue [Throughput %]",129199,9.85,0,98
"Pixel Warps in Flight [Throughput %]",129199,6.81,0,97
"Pixel Warps in Flight [Avg]",129199,1195862.83,0,18171564
"Pixel Warps in Flight [Avg Warps per Cycle]",129199,6.54,0,93
"PCIe Write Requests to BAR1 [Requests]",129199,30.54,0,528
"PCIe TX Throughput [Throughput %]",129199,1.27,1,25
"PCIe Read Requests to BAR1 [Requests]",129199,0.0,0,1
"PCIe RX Throughput [Throughput %]",129199,1.19,0,93
"GR Active [Throughput %]",129199,61.61,0,100
"GPU Active [Throughput %]",129199,68.48,0,100
"GPC Clock Frequency [MHz]",129199,1906994468.72,1551394286,1965627572
"DRAM Write Bandwidth [Throughput %]",129199,7.52,0,72
"DRAM Read Bandwidth [Throughput %]",129199,7.7,0,70
"Compute Warps in Flight [Throughput %]",129199,9.1,0,93
"Compute Warps in Flight [Avg]",129199,1630648.22,0,17026098
"Compute Warps in Flight [Avg Warps per Cycle]",129199,8.74,0,89
"Async Copy Engine Active [Throughput %]",129199,18.17,0,100
"Async Copy Engine Active [Cycles Active]",129199,29333.79,0,163502
"Async Compute in Flight [Throughput %]",129199,0.29,0,37
67
data/nsight/unreal/debug/unreal_debug_95s_osrt_sum.csv
Normal file
@ -0,0 +1,67 @@
Time (%),Total Time (ns),Num Calls,Avg (ns),Med (ns),Min (ns),Max (ns),StdDev (ns),Name
73.6,600536270822,294931,2036192.4,83376.0,1001,11949169945,103649678.6,pthread_cond_wait
12.6,102507542226,21998,4659857.4,955280.5,1001,10026555313,74393737.7,pthread_cond_timedwait
5.5,44502791135,13049,3410436.9,1453.0,1001,11753050441,103680620.5,poll
2.8,22497155736,2844,7910392.3,6308789.5,6259628,13658685,1861319.5,usleep
1.5,12128603443,5,2425720688.6,11877826.0,347760,12092645848,5403977863.3,sem_wait
1.5,11873505308,121,98128143.0,100109472.0,1563,102009533,13389021.8,select
1.1,9136705014,7227,1264245.9,1055382.0,1003927,200089583,5334533.2,nanosleep
1.0,7954907910,949,8382410.9,167743.0,1002,114550329,17174224.3,pthread_rwlock_wrlock
0.2,1617017606,15550,103988.3,8907.0,1001,53401686,1737144.0,pthread_mutex_lock
0.1,1007831079,621054,1622.8,1232.0,1021,433110,1480.9,backtrace
0.1,887761177,7109,124878.5,22281.0,1002,5900677,273454.1,ioctl
0.1,597740451,806,741613.5,38412.0,1002,10650222,1380353.1,pthread_rwlock_rdlock
0.0,321704681,190892,1685.3,1402.0,1001,2777832,6658.3,pthread_cond_broadcast
0.0,176520705,17458,10111.2,6753.0,1001,14714919,113239.8,read
0.0,168347128,81,2078359.6,137297.0,6502,47196219,8673092.4,pthread_join
0.0,56685129,17970,3154.4,2675.0,1132,198231,2651.2,open
0.0,36826359,2858,12885.4,7333.5,4598,14347242,268237.8,accept
0.0,32925272,19470,1691.1,1393.0,1001,511155,4654.5,pthread_cond_signal
0.0,32777790,29715,1103.1,1052.0,1001,24416,301.1,openat64
0.0,31806884,3832,8300.3,3266.0,1142,2086840,78000.1,send
0.0,20845334,972,21445.8,6016.0,1443,2065250,81210.8,munmap
0.0,17524676,1134,15453.9,2294.0,1001,13924121,413404.3,open64
0.0,16992967,8639,1967.0,1333.0,1001,95338,2218.6,recvmsg
0.0,16540943,657,25176.5,14507.0,1082,2422076,109799.1,fwrite
0.0,13651092,3342,4084.7,3917.0,1223,184905,3428.7,writev
0.0,13059446,7363,1773.7,1312.0,1001,346617,9409.2,close
0.0,9948522,1380,7209.1,2525.0,1442,3817374,107927.0,fflush
0.0,8667482,3318,2612.3,1522.0,1001,1576266,33796.2,write
0.0,8182173,332,24645.1,6602.0,2224,2694535,155893.4,mmap64
0.0,7555129,13,581163.8,531693.0,412481,1093925,174340.9,fdatasync
0.0,2940109,82,35855.0,30226.5,13756,274694,29562.0,pthread_create
0.0,2405376,54,44544.0,1212.5,1002,2254934,306533.0,fgets
0.0,1826800,877,2083.0,1593.0,1002,112881,3983.7,fopen
0.0,1568152,536,2925.7,1888.0,1002,153327,6975.4,mmap
0.0,1488803,8,186100.4,132758.0,1102,616592,224937.3,futex
0.0,1473389,255,5778.0,3296.0,1042,42118,7330.6,mprotect
0.0,1049989,684,1535.1,1333.0,1012,7284,721.1,fread
0.0,766475,307,2496.7,1392.0,1001,74369,4794.8,recv
0.0,581676,276,2107.5,1863.5,1432,14067,986.9,fopen64
0.0,497972,208,2394.1,1513.0,1012,27972,2644.8,fclose
0.0,469076,4,117269.0,116547.5,13776,222205,115738.1,sem_timedwait
0.0,134113,41,3271.0,3977.0,1012,10450,2444.0,stat64
0.0,112011,42,2666.9,1553.0,1002,9709,1912.5,fstat64
0.0,103293,1,103293.0,103293.0,103293,103293,0.0,popen
0.0,99900,18,5550.0,4393.5,2495,13957,3053.3,socket
0.0,90499,31,2919.3,2395.0,1102,6753,1727.2,sendmsg
0.0,72786,14,5199.0,5195.0,1012,9678,2444.5,connect
0.0,71996,7,10285.1,8186.0,3166,29866,9253.4,ftruncate
0.0,40037,7,5719.6,5220.0,2856,8877,2158.0,socketpair
0.0,21079,12,1756.6,1172.0,1052,4098,1122.5,statx
0.0,18413,4,4603.3,4748.5,1473,7443,2443.0,bind
0.0,17864,4,4466.0,3952.5,1082,8877,3967.9,stat
0.0,11912,3,3970.7,1312.0,1012,9588,4867.1,sigaction
0.0,10348,3,3449.3,3977.0,1052,5319,2181.9,getdelim
0.0,9979,5,1995.8,1834.0,1122,3427,861.8,flock
0.0,9658,4,2414.5,2329.0,1613,3387,730.9,lockf
0.0,6803,1,6803.0,6803.0,6803,6803,0.0,memfd_create
0.0,6763,4,1690.8,1317.5,1002,3126,976.6,shutdown
0.0,6222,1,6222.0,6222.0,6222,6222,0.0,pipe2
0.0,5550,4,1387.5,1452.5,1062,1583,226.1,fcntl64
0.0,4799,1,4799.0,4799.0,4799,4799,0.0,pipe
0.0,4108,2,2054.0,2054.0,1062,3046,1402.9,getc
0.0,3847,3,1282.3,1222.0,1032,1593,285.3,pthread_mutex_trylock
0.0,2094,1,2094.0,2094.0,2094,2094,0.0,listen
0.0,1122,1,1122.0,1122.0,1122,1122,0.0,prctl
0.0,1021,1,1021.0,1021.0,1021,1021,0.0,fcntl
44
data/nsight/unreal/debug/unreal_debug_95s_vulkan_api_sum.csv
Normal file
@ -0,0 +1,44 @@
Time (%),Total Time (ns),Num Calls,Avg (ns),Med (ns),Min (ns),Max (ns),StdDev (ns),Name
48.9,20638314576,230,89731802.5,68915663.0,2133988,326371605,67952409.0,vkCreateComputePipelines
46.6,19692005329,805,24462118.4,13866023.0,2996,237456733,34442454.1,vkCreateGraphicsPipelines
1.4,578382777,1,578382777.0,578382777.0,578382777,578382777,0.0,vkCreateDevice
0.5,228849585,2858,80073.3,76994.0,18234,1065702,32969.1,vkQueuePresentKHR
0.5,213757378,46928,4555.0,2725.0,1653,1074488,8613.4,vkQueueSubmit
0.4,159896427,620927,257.5,190.0,40,108983,340.2,vkCmdBindPipeline
0.4,159286991,1,159286991.0,159286991.0,159286991,159286991,0.0,vkDestroyDevice
0.3,126375275,436832,289.3,200.0,80,142787,705.3,vkCmdPipelineBarrier2KHR
0.2,80310884,46929,1711.3,1463.0,230,448247,3000.5,vkBeginCommandBuffer
0.1,63269467,2,31634733.5,31634733.5,4798067,58471400,37952777.7,vkCreateSwapchainKHR
0.1,58286455,46928,1242.0,651.0,80,69711,2031.1,vkEndCommandBuffer
0.1,56955396,97,587169.0,213649.0,51707,9314917,1214901.9,vkAllocateMemory
0.1,37211410,3189,11668.7,491.0,280,2368466,121145.7,vkWaitForFences
0.1,34712642,96,361590.0,101870.5,36909,2905680,549970.1,vkFreeMemory
0.1,25770399,32,805325.0,357944.0,215773,14322325,2471356.3,vkCreateFence
0.0,20230742,11447,1767.3,1924.0,672,48170,1017.4,vkGetAccelerationStructureBuildSizesKHR
0.0,14080023,566,24876.4,15308.0,932,184234,27538.5,vkCreateShaderModule
0.0,13632437,5,2726487.4,1936199.0,1715185,5145847,1441226.1,vkDeviceWaitIdle
0.0,13233723,32,413553.8,330327.0,307505,800456,160089.2,vkDestroyFence
0.0,5886396,1014,5805.1,1082.0,190,1481658,56313.4,vkCreateImageView
0.0,5538631,2858,1937.9,911.0,651,2632219,49241.0,vkAcquireNextImageKHR
0.0,5368346,383,14016.6,681.0,411,1720125,112862.6,vkBindImageMemory
0.0,3544843,10546,336.1,241.0,40,54021,644.8,vkCreateBuffer
0.0,2907360,6961,417.7,361.0,60,6031,237.0,vkCreateAccelerationStructureKHR
0.0,2084957,6961,299.5,251.0,40,8546,191.2,vkDestroyAccelerationStructureKHR
0.0,1026751,10497,97.8,40.0,30,18906,253.0,vkBindBufferMemory2
0.0,923095,2857,323.1,290.0,150,14918,329.9,vkResetEvent
0.0,902569,2858,315.8,290.0,230,2515,93.2,vkCmdPipelineBarrier
0.0,837483,383,2186.6,1122.0,201,65332,3821.0,vkCreateImage
0.0,688105,3665,187.8,170.0,20,5029,142.2,vkResetQueryPoolEXT
0.0,379150,272,1393.9,1192.0,1002,6222,652.7,vkGetQueryPoolResults
0.0,251291,84,2991.6,916.5,230,26129,4053.5,vkAllocateCommandBuffers
0.0,210654,106,1987.3,1538.0,1002,4969,946.6,vkCreateEvent
0.0,93734,53,1768.6,1603.0,441,3667,872.1,vkCreateRenderPass2KHR
0.0,51256,49,1046.0,1042.0,471,1723,295.3,vkBindBufferMemory
0.0,40185,15,2679.0,1483.0,1002,16982,4019.8,vkDestroyEvent
0.0,39194,123,318.7,191.0,50,3367,465.9,vkCreateFramebuffer
0.0,5190,2,2595.0,2595.0,1142,4048,2054.9,vkCreateSemaphore
0.0,4007,27,148.4,100.0,40,692,145.9,vkMapMemory
0.0,2756,2,1378.0,1378.0,1233,1523,205.1,vkDestroySemaphore
0.0,1823,2,911.5,911.5,831,992,113.8,vkTrimCommandPool
0.0,1392,8,174.0,150.0,130,350,73.7,vkUnmapMemory
0.0,1303,1,1303.0,1303.0,1303,1303,0.0,vkCreateCommandPool
32
data/nsight/unreal/debug/unreal_phase1_0s_gpu_metrics.csv
Normal file
@ -0,0 +1,32 @@
metricName,samples,avg_value,min_value,max_value
"Vertex/Tess/Geometry Warps in Flight [Throughput %]",350205,0.42,0,1
"Vertex/Tess/Geometry Warps in Flight [Avg]",350205,2508.84,0,80568
"Vertex/Tess/Geometry Warps in Flight [Avg Warps per Cycle]",350205,0.42,0,1
"Unallocated Warps in Active SMs [Throughput %]",350205,20.55,0,90
"Unallocated Warps in Active SMs [Avg]",350205,2825149.4,-8388543,8388540
"Unallocated Warps in Active SMs [Avg Warps per Cycle]",350205,19.72,0,86
"Tensor Active [Throughput %]",350205,0.0,0,0
"Sync Copy Engine Active [Throughput %]",350205,2.04,0,100
"Sync Copy Engine Active [Cycles Active]",350205,3212.99,0,163429
"Sync Compute in Flight [Throughput %]",350205,43.24,0,100
"SYS Clock Frequency [MHz]",350205,1593337364.85,1079680000,1650528169
"SMs Active [Throughput %]",350205,42.79,0,100
"SM Issue [Throughput %]",350205,13.91,0,99
"Pixel Warps in Flight [Throughput %]",350205,9.45,0,98
"Pixel Warps in Flight [Avg]",350205,1660859.11,0,18354900
"Pixel Warps in Flight [Avg Warps per Cycle]",350205,9.08,0,94
"PCIe Write Requests to BAR1 [Requests]",350205,45.25,0,530
"PCIe TX Throughput [Throughput %]",350205,1.38,1,17
"PCIe Read Requests to BAR1 [Requests]",350205,0.0,0,1
"PCIe RX Throughput [Throughput %]",350205,1.4,0,96
"GR Active [Throughput %]",350205,85.69,0,100
"GPU Active [Throughput %]",350205,91.16,0,100
"GPC Clock Frequency [MHz]",350205,1886072181.12,1287905714,1964364311
"DRAM Write Bandwidth [Throughput %]",350205,10.19,0,44
"DRAM Read Bandwidth [Throughput %]",350205,10.4,0,68
"Compute Warps in Flight [Throughput %]",350205,13.05,0,93
"Compute Warps in Flight [Avg]",350205,2347468.75,0,17187259
"Compute Warps in Flight [Avg Warps per Cycle]",350205,12.54,0,89
"Async Copy Engine Active [Throughput %]",350205,25.27,0,100
"Async Copy Engine Active [Cycles Active]",350205,40591.85,0,164990
"Async Compute in Flight [Throughput %]",350205,0.16,0,32
64
data/nsight/unreal/debug/unreal_phase1_0s_osrt_sum.csv
Normal file
@ -0,0 +1,64 @@
Time (%),Total Time (ns),Num Calls,Avg (ns),Med (ns),Min (ns),Max (ns),StdDev (ns),Name
63.2,882304088002,1166913,756101.0,81462.0,1001,7559143983,12240943.4,pthread_cond_wait
21.0,292653885125,68267,4286901.2,807442.0,1001,2000062469,37737382.8,pthread_cond_timedwait
7.5,104841486605,91004,1152053.6,20959.0,1001,100202074,8650331.1,poll
4.9,67734436293,8697,7788253.0,6307024.0,6255048,10300238,1834125.8,usleep
2.5,34536907707,346,99817652.3,100108789.5,2816,100189751,5381658.2,select
0.3,4645633931,1571,2957119.0,14076.0,1002,42138249,7473785.3,pthread_rwlock_wrlock
0.3,3536426442,2306885,1533.0,1182.0,991,1362810,1510.7,backtrace
0.1,1059145369,668650,1584.0,1362.0,1001,715068,1528.2,pthread_cond_broadcast
0.1,988864255,209,4731407.9,1055376.0,1014959,200057791,19895982.3,nanosleep
0.1,798111557,50557,15786.4,8285.0,1001,8524665,131331.0,pthread_mutex_lock
0.0,611701922,5111,119683.4,25978.0,1002,31632084,622639.5,ioctl
0.0,376848621,60069,6273.6,6532.0,1001,405709,5146.3,read
0.0,258671149,1225,211160.1,9288.0,1032,4226160,544261.6,pthread_rwlock_rdlock
0.0,216400027,66377,3260.2,2605.0,1002,15570944,60498.3,open
0.0,111076542,69883,1589.5,1353.0,1001,135153,1407.8,pthread_cond_signal
0.0,87668535,4,21917133.8,21504729.5,59000,44600076,25244809.2,pthread_join
0.0,76843593,10735,7158.2,6763.0,3286,41598,1991.5,accept
0.0,49215712,27008,1822.3,1233.0,1001,85209,1666.7,recvmsg
0.0,44304559,11139,3977.4,3817.0,1032,111409,1700.7,writev
0.0,42663686,12430,3432.3,3266.0,1042,104415,1895.8,send
0.0,30965733,22153,1397.8,1312.0,1001,178734,1455.8,close
0.0,29504432,27813,1060.8,1032.0,1001,11281,189.5,openat64
0.0,27256936,4,6814234.0,8394872.5,312655,10154536,4413578.4,sem_wait
0.0,17492819,10623,1646.7,1462.0,1001,107611,2484.7,write
0.0,7107260,632,11245.7,4283.0,1583,176601,20095.6,munmap
0.0,5922929,294,20146.0,6432.0,1994,937504,77467.0,mmap64
0.0,5164606,1253,4121.8,1803.0,1011,132007,6746.5,fread
0.0,3959351,6,659891.8,631301.5,584835,828932,87442.6,fdatasync
0.0,3216472,83,38752.7,30297.0,13496,331160,37514.3,pthread_create
0.0,2682191,56,47896.3,1333.0,1002,2495521,333101.4,fgets
0.0,2655718,836,3176.7,2304.0,1001,47409,3405.4,open64
0.0,1652463,481,3435.5,1983.0,1001,292447,13794.6,mmap
0.0,1568473,786,1995.5,1603.0,1001,21009,1474.4,fopen
0.0,1126920,4,281730.0,148122.5,1463,829212,390211.3,futex
0.0,616506,195,3161.6,2665.0,1242,10821,1893.0,mprotect
0.0,615352,2,307676.0,307676.0,299200,316152,11986.9,sem_timedwait
0.0,530662,274,1936.7,1663.5,1272,5360,615.6,fopen64
0.0,524552,49,10705.1,8746.0,1002,23504,6281.1,fwrite
0.0,441809,102,4331.5,2685.0,2175,15509,2957.4,fflush
0.0,409157,193,2120.0,1423.0,1001,12915,2003.4,fclose
0.0,390698,251,1556.6,1182.0,1052,14697,1534.4,recv
0.0,134752,1,134752.0,134752.0,134752,134752,0.0,popen
0.0,121057,36,3362.7,3767.0,1002,10189,2399.8,stat64
0.0,84727,16,5295.4,5144.5,2655,10630,2359.5,socket
0.0,81942,33,2483.1,1884.0,1002,5591,1310.3,sendmsg
0.0,78662,27,2913.4,3376.0,1002,5932,1820.4,fstat64
0.0,60112,11,5464.7,5851.0,1814,9257,2203.0,connect
0.0,47769,4,11942.3,5981.0,4288,31519,13127.8,ftruncate
0.0,37511,7,5358.7,4409.0,2505,12593,3297.7,socketpair
0.0,19687,4,4921.8,5515.0,1463,7194,2580.7,bind
0.0,10198,4,2549.5,2374.0,1022,4428,1621.9,statx
0.0,9979,2,4989.5,4989.5,4178,5801,1147.6,getdelim
0.0,8647,4,2161.8,1568.0,1092,4419,1522.7,getc
0.0,6852,4,1713.0,1923.5,1001,2004,477.3,lockf
0.0,5390,1,5390.0,5390.0,5390,5390,0.0,pipe2
0.0,5109,1,5109.0,5109.0,5109,5109,0.0,memfd_create
0.0,4057,1,4057.0,4057.0,4057,4057,0.0,pipe
0.0,3827,3,1275.7,1132.0,1082,1613,293.2,sigaction
0.0,3796,2,1898.0,1898.0,1793,2003,148.5,flock
0.0,3556,3,1185.3,1222.0,1052,1282,119.3,stat
0.0,2244,2,1122.0,1122.0,1082,1162,56.6,fcntl
0.0,1583,1,1583.0,1583.0,1583,1583,0.0,pthread_mutex_trylock
0.0,1552,1,1552.0,1552.0,1552,1552,0.0,listen
@ -0,0 +1,41 @@
Time (%),Total Time (ns),Num Calls,Avg (ns),Med (ns),Min (ns),Max (ns),StdDev (ns),Name
47.3,4304563563,231,18634474.3,13822373.0,269513,50400702,13385050.1,vkCreateComputePipelines
10.0,906535248,793,1143171.8,97372.0,2966,36675320,3967519.3,vkCreateGraphicsPipelines
8.7,793761848,10286,77169.1,74760.0,16591,1147426,18530.7,vkQueuePresentKHR
7.9,715335192,166918,4285.5,2645.0,1643,1404546,6331.9,vkQueueSubmit
6.5,590495658,1,590495658.0,590495658.0,590495658,590495658,0.0,vkCreateDevice
5.6,512936386,2236013,229.4,170.0,40,197940,293.9,vkCmdBindPipeline
4.7,431480388,1566338,275.5,191.0,80,87854,626.4,vkCmdPipelineBarrier2KHR
2.6,235858982,166919,1413.0,1162.0,230,559757,2096.2,vkBeginCommandBuffer
1.8,163623284,166918,980.3,521.0,80,227395,1634.2,vkEndCommandBuffer
1.3,120651909,12142,9936.7,491.0,300,3236622,115194.4,vkWaitForFences
1.2,108109757,2,54054878.5,54054878.5,23852716,84257041,42712307.8,vkCreateSwapchainKHR
0.8,71024516,41161,1725.5,1934.0,661,96741,970.7,vkGetAccelerationStructureBuildSizesKHR
0.4,37508995,92,407706.5,127688.5,40025,5173172,783607.3,vkAllocateMemory
0.3,27598086,33,836305.6,338072.0,214851,15190000,2585500.1,vkCreateFence
0.1,13615140,567,24012.6,14076.0,862,188632,28324.9,vkCreateShaderModule
0.1,12712127,38647,328.9,221.0,40,11882,338.5,vkCreateBuffer
0.1,10958176,23960,457.4,421.0,120,16511,276.4,vkCreateAccelerationStructureKHR
0.1,9430649,10286,916.8,871.0,631,8456,289.7,vkAcquireNextImageKHR
0.1,7833984,1001,7826.2,872.0,200,2438580,107199.4,vkCreateImageView
0.1,6541995,20571,318.0,281.0,40,12764,226.9,vkDestroyAccelerationStructureKHR
0.1,4708503,2,2354251.5,2354251.5,1685431,3023072,945855.0,vkDeviceWaitIdle
0.1,4591969,400,11479.9,491.0,391,1596695,89890.3,vkBindImageMemory
0.0,3632879,2496,1455.5,1242.0,1001,12123,801.9,vkGetQueryPoolResults
0.0,3568885,38603,92.5,40.0,30,14186,165.0,vkBindBufferMemory2
0.0,3192728,10286,310.4,290.0,220,1733,71.8,vkCmdPipelineBarrier
0.0,2867124,10285,278.8,261.0,120,7985,162.6,vkResetEvent
0.0,2169561,11093,195.6,171.0,20,5711,111.1,vkResetQueryPoolEXT
0.0,939713,400,2349.3,862.0,181,307535,15384.4,vkCreateImage
0.0,743177,9,82575.2,42329.0,35215,329947,94644.5,vkFreeMemory
0.0,306413,197,1555.4,711.0,310,14918,2359.4,vkAllocateCommandBuffers
0.0,237721,96,2476.3,2419.5,1002,13456,1675.8,vkCreateEvent
0.0,71647,52,1377.8,917.0,211,5972,1216.1,vkCreateRenderPass2KHR
0.0,40338,44,916.8,812.0,451,2765,444.4,vkBindBufferMemory
0.0,39371,123,320.1,110.0,50,4910,770.7,vkCreateFramebuffer
0.0,3014,21,143.5,120.0,40,391,115.2,vkMapMemory
0.0,1633,2,816.5,816.5,631,1002,262.3,vkTrimCommandPool
0.0,1432,1,1432.0,1432.0,1432,1432,0.0,vkDestroySemaphore
0.0,1282,1,1282.0,1282.0,1282,1282,0.0,vkDestroyEvent
0.0,1032,1,1032.0,1032.0,1032,1032,0.0,vkCreateCommandPool
0.0,1002,1,1002.0,1002.0,1002,1002,0.0,vkCreateSemaphore
32
data/nsight/unreal/debug/unreal_phase2_30s_gpu_metrics.csv
Normal file
@ -0,0 +1,32 @@
metricName,samples,avg_value,min_value,max_value
"Vertex/Tess/Geometry Warps in Flight [Throughput %]",350249,0.48,0,10
"Vertex/Tess/Geometry Warps in Flight [Avg]",350249,11994.85,0,1250154
"Vertex/Tess/Geometry Warps in Flight [Avg Warps per Cycle]",350249,0.45,0,7
"Unallocated Warps in Active SMs [Throughput %]",350249,20.91,0,90
"Unallocated Warps in Active SMs [Avg]",350249,2674568.52,-8388551,8388603
"Unallocated Warps in Active SMs [Avg Warps per Cycle]",350249,20.07,0,86
"Tensor Active [Throughput %]",350249,0.0,0,0
"Sync Copy Engine Active [Throughput %]",350249,2.04,0,100
"Sync Copy Engine Active [Cycles Active]",350249,3210.05,0,164917
"Sync Compute in Flight [Throughput %]",350249,43.21,0,100
"SYS Clock Frequency [MHz]",350249,1599260779.65,1124820000,1665030000
"SMs Active [Throughput %]",350249,42.97,0,100
"SM Issue [Throughput %]",350249,13.97,0,99
"Pixel Warps in Flight [Throughput %]",350249,9.26,0,99
"Pixel Warps in Flight [Avg]",350249,1626352.79,0,18545892
"Pixel Warps in Flight [Avg Warps per Cycle]",350249,8.9,0,95
"PCIe Write Requests to BAR1 [Requests]",350249,48.49,0,530
"PCIe TX Throughput [Throughput %]",350249,1.39,1,17
"PCIe Read Requests to BAR1 [Requests]",350249,0.0,0,1
"PCIe RX Throughput [Throughput %]",350249,1.59,0,96
"GR Active [Throughput %]",350249,85.48,0,100
"GPU Active [Throughput %]",350249,90.8,0,100
"GPC Clock Frequency [MHz]",350249,1887936516.09,1345411429,1965055714
"DRAM Write Bandwidth [Throughput %]",350249,10.0,0,78
"DRAM Read Bandwidth [Throughput %]",350249,10.19,0,67
"Compute Warps in Flight [Throughput %]",350249,13.0,0,93
"Compute Warps in Flight [Avg]",350249,2341816.22,0,17218710
"Compute Warps in Flight [Avg Warps per Cycle]",350249,12.5,0,90
"Async Copy Engine Active [Throughput %]",350249,24.19,0,100
"Async Copy Engine Active [Cycles Active]",350249,38961.97,0,165002
"Async Compute in Flight [Throughput %]",350249,0.17,0,35
65
data/nsight/unreal/debug/unreal_phase2_30s_osrt_sum.csv
Normal file
@ -0,0 +1,65 @@
|
||||
Time (%),Total Time (ns),Num Calls,Avg (ns),Med (ns),Min (ns),Max (ns),StdDev (ns),Name
|
||||
65.1,939238393966,1253746,749145.7,99516.0,1001,22225185125,27598875.4,pthread_cond_wait
|
||||
19.6,283372591523,63863,4437195.1,667409.0,1001,2000066987,37775239.1,pthread_cond_timedwait
|
||||
7.2,103523321736,83184,1244510.0,19371.5,1001,100158833,9028721.1,poll
|
||||
4.7,67623762335,8679,7791653.7,6308036.0,6256440,10363397,1834071.0,usleep
|
||||
2.4,34537204538,346,99818510.2,100108404.0,2074,100558361,5381791.1,select
|
||||
0.3,4633653836,1769,2619363.4,10690.0,1002,81183859,7632062.5,pthread_rwlock_wrlock
|
||||
0.2,3564450731,2289546,1556.8,1192.0,992,1166253,1451.7,backtrace
|
||||
0.2,2186807640,213,10266702.5,1055706.0,1013547,200059383,37910858.8,nanosleep
|
||||
0.1,1197512783,747301,1602.5,1373.0,1001,990875,2070.5,pthread_cond_broadcast
|
||||
0.1,1000015122,63378,15778.6,8386.0,1001,8679265,127982.7,pthread_mutex_lock
|
||||
0.0,692411875,5159,134214.4,28353.0,1002,37919402,814147.2,ioctl
|
||||
0.0,376956878,58727,6418.8,6573.0,1001,215753,4816.1,read
|
||||
0.0,262438870,1114,235582.5,8972.0,1012,3887476,618055.6,pthread_rwlock_rdlock
|
||||
0.0,201249425,64299,3129.9,2665.0,1042,474829,2741.8,open
|
||||
0.0,119755986,75469,1586.8,1362.0,1001,185868,1300.7,pthread_cond_signal
|
||||
0.0,87476826,4,21869206.5,21608674.0,71504,44187974,25163290.6,pthread_join
|
||||
0.0,78419182,10390,7547.6,7174.0,4308,51646,2186.9,accept
|
||||
0.0,51411361,26962,1906.8,1243.0,1001,610343,4156.7,recvmsg
|
||||
0.0,44191693,10805,4089.9,3897.0,1132,74259,1442.1,writev
|
||||
0.0,42212528,12379,3410.0,3216.0,1062,75512,1516.0,send
|
||||
0.0,31715010,29284,1083.0,1062.0,1001,9037,176.8,openat64
|
||||
0.0,31357688,21714,1444.1,1342.0,1001,171231,1546.9,close
|
||||
0.0,29161358,4,7290339.5,9321035.5,490768,10028519,4566762.7,sem_wait
|
||||
0.0,19619685,741,26477.3,2946.0,1002,16496436,605868.8,open64
|
||||
0.0,16704779,9960,1677.2,1473.0,1001,74089,1397.7,write
|
||||
0.0,7170661,630,11382.0,4493.0,1273,203591,19759.6,munmap
|
||||
0.0,6221613,295,21090.2,7584.0,2364,992327,79415.2,mmap64
|
||||
0.0,5100259,1250,4080.2,1788.5,1002,64672,6212.7,fread
|
||||
0.0,3222871,43,74950.5,1583.0,1163,3069094,467529.0,fgets
|
||||
0.0,3213395,83,38715.6,31229.0,18024,351818,37357.9,pthread_create
|
||||
0.0,3069705,5,613941.0,607577.0,476021,720739,90067.6,fdatasync
|
||||
0.0,2574090,4,643522.5,146489.5,2185,2278926,1098695.2,futex
|
||||
0.0,1547765,811,1908.5,1583.0,1002,18795,1290.6,fopen
|
||||
0.0,1418724,493,2877.7,2074.0,1001,26530,2638.3,mmap
0.0,1075665,108,9959.9,6733.0,1002,27541,7008.8,fwrite
0.0,1014948,230,4412.8,2685.0,1663,20168,3313.5,fflush
0.0,955386,647,1476.6,1212.0,1002,29245,1752.5,recv
0.0,610149,195,3129.0,2465.0,1253,26650,2395.3,mprotect
0.0,580339,274,2118.0,1863.5,1243,8055,705.9,fopen64
0.0,551532,2,275766.0,275766.0,265417,286115,14635.7,sem_timedwait
0.0,405319,188,2156.0,1512.0,1001,12163,1920.5,fclose
0.0,143337,42,3412.8,2239.5,1012,13305,2842.6,stat64
0.0,96974,16,6060.9,5791.0,2254,13616,2885.0,socket
0.0,96759,33,2932.1,1973.0,1052,6552,1830.5,sendmsg
0.0,96400,1,96400.0,96400.0,96400,96400,0.0,popen
0.0,76692,27,2840.4,3095.0,1032,5711,1567.0,fstat64
0.0,64490,11,5862.7,6682.0,1853,8907,2102.0,connect
0.0,54001,4,13500.3,6026.5,4358,37590,16118.4,ftruncate
0.0,43031,7,6147.3,5110.0,2615,10780,2859.9,socketpair
0.0,23723,9,2635.9,3075.0,1012,5200,1568.2,statx
0.0,19458,4,4864.5,5766.5,1002,6923,2717.2,bind
0.0,11863,2,5931.5,5931.5,3948,7915,2805.1,getdelim
0.0,10210,4,2552.5,2139.5,1182,4749,1649.1,getc
0.0,7934,2,3967.0,3967.0,1562,6372,3401.2,pthread_mutex_trylock
0.0,7063,3,2354.3,2435.0,1903,2725,416.9,lockf
0.0,6843,1,6843.0,6843.0,6843,6843,0.0,pipe2
0.0,5691,1,5691.0,5691.0,5691,5691,0.0,memfd_create
0.0,5370,4,1342.5,1327.5,1012,1703,328.7,fcntl64
0.0,5030,4,1257.5,1217.5,1052,1543,233.6,stat
0.0,3988,1,3988.0,3988.0,3988,3988,0.0,pipe
0.0,3596,2,1798.0,1798.0,1733,1863,91.9,flock
0.0,3116,2,1558.0,1558.0,1293,1823,374.8,fcntl
0.0,2855,2,1427.5,1427.5,1122,1733,432.0,sigaction
0.0,1783,1,1783.0,1783.0,1783,1783,0.0,listen
@ -0,0 +1,39 @@
Time (%),Total Time (ns),Num Calls,Avg (ns),Med (ns),Min (ns),Max (ns),StdDev (ns),Name
47.1,4408914080,233,18922378.0,13455046.0,185617,56011291,14760998.1,vkCreateComputePipelines
11.2,1045885246,797,1312277.6,92924.0,3105,36430873,4324767.0,vkCreateGraphicsPipelines
9.5,888494352,11531,77052.7,76452.0,16360,900363,19476.1,vkQueuePresentKHR
7.8,731581920,186589,3920.8,2625.0,1593,1639285,5212.1,vkQueueSubmit
6.4,595380851,2528014,235.5,170.0,40,182351,323.2,vkCmdBindPipeline
5.8,541364941,1,541364941.0,541364941.0,541364941,541364941,0.0,vkCreateDevice
5.2,485559049,1798810,269.9,190.0,80,941591,933.6,vkCmdPipelineBarrier2KHR
2.0,188899262,186590,1012.4,942.0,220,901967,2432.1,vkBeginCommandBuffer
1.3,120960739,186589,648.3,330.0,80,120916,1027.9,vkEndCommandBuffer
0.9,81326038,46147,1762.3,1944.0,741,82524,908.0,vkGetAccelerationStructureBuildSizesKHR
0.7,61601253,3,20533751.0,3293287.0,3013755,55294211,30103765.9,vkCreateSwapchainKHR
0.5,42258552,11627,3634.5,461.0,301,2606383,62274.9,vkWaitForFences
0.4,37227731,95,391870.9,141975.0,37510,4369409,644086.2,vkAllocateMemory
0.2,19195932,33,581694.9,325579.0,208410,8169945,1367481.9,vkCreateFence
0.1,13706556,582,23550.8,13626.0,531,241291,29168.6,vkCreateShaderModule
0.1,13512957,9203,1468.3,1252.0,1001,37961,840.0,vkGetQueryPoolResults
0.1,10527142,26275,400.7,360.0,110,90429,610.2,vkCreateAccelerationStructureKHR
0.1,10302597,11531,893.5,842.0,622,7554,252.1,vkAcquireNextImageKHR
0.1,9980822,23707,421.0,250.0,40,17412,457.5,vkCreateBuffer
0.1,7822356,4,1955589.0,1670954.5,1551050,2929397,651664.2,vkDeviceWaitIdle
0.1,7818709,23063,339.0,310.0,40,9368,202.2,vkDestroyAccelerationStructureKHR
0.1,6815677,572,11915.5,455.5,400,1607465,91981.0,vkBindImageMemory
0.1,5674639,1485,3821.3,732.0,180,1609990,47166.1,vkCreateImageView
0.0,4289993,11532,372.0,281.0,120,34204,391.2,vkResetEvent
0.0,3529485,11533,306.0,281.0,230,39935,389.2,vkCmdPipelineBarrier
0.0,3285335,23663,138.8,41.0,30,38682,337.6,vkBindBufferMemory2
0.0,2486938,12339,201.6,180.0,20,7645,137.7,vkResetQueryPoolEXT
0.0,2396902,12,199741.8,66895.0,34244,1049232,341650.1,vkFreeMemory
0.0,927372,572,1621.3,601.0,160,98173,4412.2,vkCreateImage
0.0,225895,103,2193.2,2334.0,1021,7464,995.8,vkCreateEvent
0.0,212807,43,4949.0,3657.0,611,19857,4522.9,vkAllocateCommandBuffers
0.0,72741,53,1372.5,1051.0,200,3727,965.4,vkCreateRenderPass2KHR
0.0,63866,196,325.8,130.0,50,6011,775.0,vkCreateFramebuffer
0.0,41274,44,938.0,871.5,531,1743,296.2,vkBindBufferMemory
0.0,4037,2,2018.5,2018.5,892,3145,1593.1,vkTrimCommandPool
0.0,3407,21,162.2,140.0,30,511,152.8,vkMapMemory
0.0,1182,1,1182.0,1182.0,1182,1182,0.0,vkDestroySemaphore
0.0,811,1,811.0,811.0,811,811,0.0,vkCreateCommandPool
32
data/nsight/unreal/debug/unreal_phase3_60s_gpu_metrics.csv
Normal file
@ -0,0 +1,32 @@
metricName,samples,avg_value,min_value,max_value
"Vertex/Tess/Geometry Warps in Flight [Throughput %]",350101,0.37,0,53
"Vertex/Tess/Geometry Warps in Flight [Avg]",350101,12165.35,0,2025158
"Vertex/Tess/Geometry Warps in Flight [Avg Warps per Cycle]",350101,0.3,0,34
"Unallocated Warps in Active SMs [Throughput %]",350101,11.51,0,94
"Unallocated Warps in Active SMs [Avg]",350101,1259925.78,-8388465,8388468
"Unallocated Warps in Active SMs [Avg Warps per Cycle]",350101,11.05,0,90
"Tensor Active [Throughput %]",350101,0.0,0,0
"Sync Copy Engine Active [Throughput %]",350101,0.95,0,100
"Sync Copy Engine Active [Cycles Active]",350101,1384.9,0,164952
"Sync Compute in Flight [Throughput %]",350101,22.08,0,100
"SYS Clock Frequency [MHz]",350101,1239952918.07,524860000,1650150000
"SMs Active [Throughput %]",350101,23.22,0,100
"SM Issue [Throughput %]",350101,6.71,0,98
"Pixel Warps in Flight [Throughput %]",350101,4.68,0,98
"Pixel Warps in Flight [Avg]",350101,705372.0,0,18250302
"Pixel Warps in Flight [Avg Warps per Cycle]",350101,4.5,0,94
"PCIe Write Requests to BAR1 [Requests]",350101,25.53,0,1565
"PCIe TX Throughput [Throughput %]",350101,1.25,0,41
"PCIe Read Requests to BAR1 [Requests]",350101,0.0,0,1
"PCIe RX Throughput [Throughput %]",350101,1.4,0,98
"GR Active [Throughput %]",350101,44.72,0,100
"GPU Active [Throughput %]",350101,49.55,0,100
"GPC Clock Frequency [MHz]",350101,1377049332.09,367524286,1965071429
"DRAM Write Bandwidth [Throughput %]",350101,5.6,0,84
"DRAM Read Bandwidth [Throughput %]",350101,8.04,0,85
"Compute Warps in Flight [Throughput %]",350101,7.03,0,99
"Compute Warps in Flight [Avg]",350101,1039013.63,0,17187013
"Compute Warps in Flight [Avg Warps per Cycle]",350101,6.75,0,95
"Async Copy Engine Active [Throughput %]",350101,10.6,0,100
"Async Copy Engine Active [Cycles Active]",350101,16184.16,0,165001
"Async Compute in Flight [Throughput %]",350101,0.06,0,31
65
data/nsight/unreal/debug/unreal_phase3_60s_osrt_sum.csv
Normal file
@ -0,0 +1,65 @@
Time (%),Total Time (ns),Num Calls,Avg (ns),Med (ns),Min (ns),Max (ns),StdDev (ns),Name
66.4,966436705693,674529,1432757.8,92112.0,1001,9178558622,21169873.5,pthread_cond_wait
17.4,252710188786,31653,7983767.4,997705.0,1001,2000056483,42373635.1,pthread_cond_timedwait
7.0,101523471662,41663,2436777.8,19406.0,1001,100191074,12645054.7,poll
4.6,67673864352,8686,7791142.6,6310647.5,6255339,11320895,1834156.6,usleep
2.4,34536986482,347,99530220.4,100108890.0,1022,100164313,7588935.6,select
1.6,23806358988,333,71490567.5,1058177.0,1007743,200233519,94227923.5,nanosleep
0.3,4632384275,1416,3271457.8,28112.5,1002,40854143,7831839.8,pthread_rwlock_wrlock
0.1,1591439644,988685,1609.7,1202.0,961,744090,1727.4,backtrace
0.0,668420846,5394,123919.3,25643.0,1002,43085262,810565.8,ioctl
0.0,658764946,29991,21965.4,8436.0,1001,7146675,159720.1,pthread_mutex_lock
0.0,569132554,337258,1687.5,1403.0,1001,759048,2324.0,pthread_cond_broadcast
0.0,260738440,1173,222283.4,8967.0,1012,4850494,566673.3,pthread_rwlock_rdlock
0.0,188226238,26395,7131.1,6612.0,1001,251721,6291.5,read
0.0,99266779,28043,3539.8,2795.0,1012,90960,2482.6,open
0.0,87413517,4,21853379.3,21359528.5,42640,44651820,25174253.8,pthread_join
0.0,82191035,52222,1573.9,1322.0,1001,295793,2097.1,pthread_cond_signal
0.0,68960671,2980,23141.2,10214.0,1453,184726,25424.6,munmap
0.0,39711726,12409,3200.2,3065.0,1002,52027,1135.3,send
0.0,38470600,4347,8849.9,8205.0,4728,50314,2723.1,accept
0.0,36566418,17756,2059.4,1523.0,1001,70181,1445.0,recvmsg
0.0,30913642,28685,1077.7,1042.0,1001,15459,212.1,openat64
0.0,26981369,4,6745342.3,8128479.5,351627,10372783,4393178.1,sem_wait
0.0,22677741,994,22814.6,2414.0,1001,18830749,597172.7,open64
0.0,20507089,4762,4306.4,4097.0,1002,93785,2130.6,writev
0.0,15299486,9722,1573.7,1403.0,1001,143327,1817.7,close
0.0,14304866,9050,1580.6,1433.0,1001,27270,1002.8,write
0.0,10226041,1693,6040.2,5410.0,1001,118652,4851.3,mmap
0.0,8596595,312,27553.2,7990.0,2404,994769,108895.5,mmap64
0.0,5005413,4,1251353.3,130684.0,1432,4742613,2330667.4,futex
0.0,4993302,1208,4133.5,1774.0,1001,95408,6514.9,fread
0.0,3177395,83,38281.9,32491.0,16751,102342,16476.8,pthread_create
0.0,3149925,5,629985.0,635056.0,495365,744571,89829.8,fdatasync
0.0,2973136,55,54057.0,1342.0,1002,2791656,376034.6,fgets
0.0,1552479,811,1914.3,1583.0,1002,18544,1289.9,fopen
0.0,633644,2,316822.0,316822.0,269133,364511,67442.4,sem_timedwait
0.0,590921,195,3030.4,2475.0,1182,11140,1773.9,mprotect
0.0,559734,274,2042.8,1853.5,1233,6532,647.5,fopen64
0.0,487806,276,1767.4,1272.0,1001,30487,2612.9,recv
0.0,429155,195,2200.8,1492.0,1001,13656,2095.0,fclose
0.0,191802,36,5327.8,3000.5,1684,13054,3745.7,fflush
0.0,168936,18,9385.3,6763.0,1052,18916,6242.2,fwrite
0.0,134122,43,3119.1,1202.0,1012,10209,2452.5,stat64
0.0,110347,1,110347.0,110347.0,110347,110347,0.0,popen
0.0,100831,33,3055.5,2264.0,1132,6793,1787.8,sendmsg
0.0,88567,30,2952.2,2505.0,1012,6672,1878.2,fstat64
0.0,85238,16,5327.4,4503.5,2865,12023,2573.0,socket
0.0,65150,11,5922.7,6051.0,3266,9949,1968.2,connect
0.0,51847,4,12961.8,5855.5,4749,35387,14960.6,ftruncate
0.0,39053,7,5579.0,4719.0,3576,9477,2198.6,socketpair
0.0,27290,9,3032.2,3857.0,1032,5189,1902.1,statx
0.0,17302,3,5767.3,6362.0,1082,9858,4418.1,getdelim
0.0,15838,4,3959.5,4222.5,1302,6091,1989.5,bind
0.0,13075,6,2179.2,1327.0,1002,5501,1757.7,getc
0.0,7404,1,7404.0,7404.0,7404,7404,0.0,pipe2
0.0,6732,3,2244.0,1643.0,1222,3867,1421.2,flock
0.0,6162,3,2054.0,2054.0,1774,2334,280.0,lockf
0.0,5320,1,5320.0,5320.0,5320,5320,0.0,memfd_create
0.0,4818,4,1204.5,1217.0,1021,1363,168.9,fcntl64
0.0,3566,3,1188.7,1182.0,1022,1362,170.1,stat
0.0,2695,1,2695.0,2695.0,2695,2695,0.0,pipe
0.0,2254,2,1127.0,1127.0,1032,1222,134.4,pthread_mutex_trylock
0.0,1763,1,1763.0,1763.0,1763,1763,0.0,listen
0.0,1573,1,1573.0,1573.0,1573,1573,0.0,sigaction
0.0,1253,1,1253.0,1253.0,1253,1253,0.0,fcntl
@ -0,0 +1,40 @@
Time (%),Total Time (ns),Num Calls,Avg (ns),Med (ns),Min (ns),Max (ns),StdDev (ns),Name
57.4,4437743635,231,19211011.4,14598473.0,203260,51954105,13533933.3,vkCreateComputePipelines
14.6,1132343222,816,1387675.5,107686.5,3076,40880372,4504039.9,vkCreateGraphicsPipelines
7.4,572377674,1,572377674.0,572377674.0,572377674,572377674,0.0,vkCreateDevice
4.8,368437156,4590,80269.5,79609.0,16642,918036,23607.1,vkQueuePresentKHR
4.2,321682595,74393,4324.1,2665.0,1623,2359842,13224.8,vkQueueSubmit
3.5,268416550,1007615,266.4,190.0,40,2721608,2734.1,vkCmdBindPipeline
2.7,206427138,724923,284.8,191.0,80,628425,1315.9,vkCmdPipelineBarrier2KHR
1.0,76886778,74394,1033.5,942.0,230,308968,1306.9,vkBeginCommandBuffer
0.8,62614508,118,530631.4,174531.0,37350,6218486,1003175.7,vkAllocateMemory
0.7,57618875,3,19206291.7,3223356.0,3011360,51384159,27867052.1,vkCreateSwapchainKHR
0.7,54538392,74393,733.1,360.0,80,56386,1219.7,vkEndCommandBuffer
0.5,37513539,18379,2041.1,1973.0,712,34485,1433.9,vkGetAccelerationStructureBuildSizesKHR
0.4,32629253,32,1019664.2,334039.5,217727,20860812,3625946.5,vkCreateFence
0.2,13832441,582,23767.1,14081.0,792,170950,28061.6,vkCreateShaderModule
0.2,11781177,15,785411.8,124453.0,35617,8939063,2266145.9,vkFreeMemory
0.1,11528044,4634,2487.7,475.5,311,2592026,48888.4,vkWaitForFences
0.1,11144794,1834,6076.8,480.0,390,1090429,50692.9,vkBindImageMemory
0.1,10091060,21758,463.8,241.0,40,12684,560.5,vkCreateBuffer
0.1,9643758,5019,1921.5,792.0,171,1059260,20884.9,vkCreateImageView
0.1,7363917,4,1840979.3,1660674.5,1567842,2474726,425162.8,vkDeviceWaitIdle
0.1,5886602,11884,495.3,411.0,130,31839,460.6,vkCreateAccelerationStructureKHR
0.1,5411476,1763,3069.5,1613.0,1001,17853,3037.9,vkGetQueryPoolResults
0.1,4993289,4590,1087.9,902.0,631,7855,510.0,vkAcquireNextImageKHR
0.1,4377490,1834,2386.9,521.0,160,2381162,55670.5,vkCreateImage
0.0,3287534,21695,151.5,50.0,30,32801,320.0,vkBindBufferMemory2
0.0,3161060,9181,344.3,301.0,40,6702,190.2,vkDestroyAccelerationStructureKHR
0.0,1724191,4590,375.6,281.0,110,2314,234.4,vkResetEvent
0.0,1396906,4591,304.3,281.0,230,14598,221.7,vkCmdPipelineBarrier
0.0,1130605,5421,208.6,180.0,20,6702,153.0,vkResetQueryPoolEXT
0.0,320001,130,2461.5,2405.0,1012,25127,2232.0,vkCreateEvent
0.0,197365,613,322.0,200.0,50,4579,390.2,vkCreateFramebuffer
0.0,195445,38,5143.3,4022.5,421,26029,4610.3,vkAllocateCommandBuffers
0.0,76176,52,1464.9,901.5,230,9788,1542.9,vkCreateRenderPass2KHR
0.0,71405,63,1133.4,932.0,401,4799,838.5,vkBindBufferMemory
0.0,6792,30,226.4,195.0,41,752,198.5,vkMapMemory
0.0,1753,2,876.5,876.5,751,1002,177.5,vkTrimCommandPool
0.0,1383,1,1383.0,1383.0,1383,1383,0.0,vkDestroyEvent
0.0,1042,1,1042.0,1042.0,1042,1042,0.0,vkDestroySemaphore
0.0,812,1,812.0,812.0,812,812,0.0,vkCreateCommandPool
32
data/nsight/unreal/shipping/unreal_gpu_95s_gpu_metrics.csv
Normal file
@ -0,0 +1,32 @@
metricName,samples,avg_value,min_value,max_value
"Async Compute in Flight [Throughput %]",950083,0.0229106299133865,0,31
"Async Copy Engine Active [Cycles Active]",950083,38455.0993471097,0,163502
"Async Copy Engine Active [Throughput %]",950083,24.3463129010834,0,100
"Compute Warps in Flight [Avg Warps per Cycle]",950083,12.80763259631,0,90
"Compute Warps in Flight [Avg]",950083,2323037.42368404,0,17059537
"Compute Warps in Flight [Throughput %]",950083,13.3210919467036,0,94
"DRAM Read Bandwidth [Throughput %]",950083,10.7338569367097,0,86
"DRAM Write Bandwidth [Throughput %]",950083,9.84654182845078,0,84
"GPC Clock Frequency [MHz]",950083,1717372249.49787,569544286,1965027143
"GPU Active [Throughput %]",950083,80.6042019486719,0,100
"GR Active [Throughput %]",950083,76.6064217547309,0,100
"PCIe RX Throughput [Throughput %]",950083,1.5045643380631,0,98
"PCIe Read Requests to BAR1 [Requests]",950083,1.36830150628945e-05,0,1
"PCIe TX Throughput [Throughput %]",950083,1.39019959308818,0,40
"PCIe Write Requests to BAR1 [Requests]",950083,45.6542523126927,0,864
"Pixel Warps in Flight [Avg Warps per Cycle]",950083,8.71434179961119,0,93
"Pixel Warps in Flight [Avg]",950083,905560.791771877,-8388505,8388149
"Pixel Warps in Flight [Throughput %]",950083,9.07152532989223,0,97
"SM Issue [Throughput %]",950083,13.8823755398213,0,77
"SMs Active [Throughput %]",950083,42.6547027996501,0,100
"SYS Clock Frequency [MHz]",950083,1464888860.94162,779690000,1650150000
"Sync Compute in Flight [Throughput %]",950083,42.9391589997927,0,100
"Sync Copy Engine Active [Cycles Active]",950083,1828.06791722407,0,163454
"Sync Copy Engine Active [Throughput %]",950083,1.19980570118611,0,100
"Tensor Active [Throughput %]",950083,0.0,0,0
"Unallocated Warps in Active SMs [Avg Warps per Cycle]",950083,19.5389318617426,0,91
"Unallocated Warps in Active SMs [Avg]",950083,2624316.545225,-8388583,8388581
"Unallocated Warps in Active SMs [Throughput %]",950083,20.3578518929399,0,95
"Vertex/Tess/Geometry Warps in Flight [Avg Warps per Cycle]",950083,0.440807803107728,0,27
"Vertex/Tess/Geometry Warps in Flight [Avg]",950083,12576.1936525546,0,2576053
"Vertex/Tess/Geometry Warps in Flight [Throughput %]",950083,0.481443200225665,0,42
66
data/nsight/unreal/shipping/unreal_gpu_95s_osrt_sum.csv
Normal file
@ -0,0 +1,66 @@
Time (%),Total Time (ns),Num Calls,Avg (ns),Med (ns),Min (ns),Max (ns),StdDev (ns),Name
69.0,2638096289950,2901939,909080.5,53710.0,1001,72246678091,60228137.5,pthread_cond_wait
17.8,681232429013,218377,3119524.6,310209.0,1001,61108235908,134160829.9,pthread_cond_timedwait
8.0,304011116725,204020,1490104.5,20749.0,1001,100165731,9602909.9,poll
2.5,94505613201,946,99900225.4,100113483.5,2705,100174076,4599901.0,select
2.4,93104642661,14761,6307475.3,6307706.0,6256721,7003606,10135.6,usleep
0.1,4779434926,3175,1505333.8,8345.0,1001,50953377,5634704.7,pthread_rwlock_wrlock
0.1,2388475375,1554030,1537.0,1323.0,1001,760450,1334.4,pthread_cond_broadcast
0.0,1383489571,205,6748729.6,1057145.0,1013984,200061988,27806119.4,nanosleep
0.0,828054573,5779,143286.8,51646.0,1032,30877115,645711.0,ioctl
0.0,524192152,23534,22273.8,7995.0,1001,9414411,223793.1,pthread_mutex_lock
0.0,430290062,250152,1720.1,1382.0,1001,167283,1483.5,pthread_cond_signal
0.0,269098076,1008,266962.4,12508.0,1052,4451488,635637.4,pthread_rwlock_rdlock
0.0,118376802,60981,1941.2,1373.0,1001,296504,2094.2,recvmsg
0.0,116650912,34929,3339.7,3066.0,1022,329706,2817.5,send
0.0,106271585,28033,3790.9,3547.0,1012,52238,1246.4,writev
0.0,92149772,4,23037443.0,22215069.0,65893,47653741,26542647.2,pthread_join
0.0,77095530,3009,25621.6,11682.0,1403,209591,26233.1,munmap
0.0,43781970,26502,1652.0,1433.0,1001,98965,1510.7,write
0.0,31401091,29445,1066.4,1042.0,1001,10330,232.0,openat64
0.0,28658399,4,7164599.8,8766099.5,398705,10727495,4606222.3,sem_wait
0.0,21058353,3176,6630.5,1212.0,1001,375302,13470.9,read
0.0,16097508,1,16097508.0,16097508.0,16097508,16097508,0.0,waitpid
0.0,11545409,1619,7131.2,6131.0,1472,73017,4087.8,mmap
0.0,7491899,307,24403.6,7935.0,2194,977235,93545.8,mmap64
0.0,5173017,1452,3562.7,1482.0,1001,78697,5893.0,fread
0.0,4010136,80,50126.7,44863.5,28353,107180,14408.6,pthread_create
0.0,2477958,44,56317.2,1613.0,1002,2325034,350036.5,fgets
0.0,2267572,738,3072.6,2355.0,1002,43220,3489.6,open64
0.0,1798878,1042,1726.4,1503.0,1002,16831,1033.1,fopen
0.0,1459879,4,364969.8,133224.0,1213,1192218,565217.9,futex
0.0,638315,186,3431.8,2815.5,1303,28984,2550.4,mprotect
0.0,609134,235,2592.1,1272.0,1002,44383,4000.2,close
0.0,588290,297,1980.8,1352.0,1032,12654,1754.6,open
0.0,548164,2,274082.0,274082.0,262250,285914,16733.0,sem_timedwait
0.0,533925,274,1948.6,1738.0,1192,6412,573.9,fopen64
0.0,525527,357,1472.1,1222.0,1001,10921,1187.9,recv
0.0,404101,183,2208.2,1462.0,1022,11692,1956.2,fclose
0.0,226864,1,226864.0,226864.0,226864,226864,0.0,fork
0.0,193521,45,4300.5,4389.0,1102,9598,2130.9,fstat64
0.0,147756,1,147756.0,147756.0,147756,147756,0.0,popen
0.0,126324,40,3158.1,3401.0,1002,10289,2357.5,stat64
0.0,101892,37,2753.8,2425.0,1172,6843,1489.7,sendmsg
0.0,68849,15,4589.9,3356.0,2285,13996,3074.1,socket
0.0,56233,11,5112.1,5330.0,1843,8956,1875.6,connect
0.0,28874,7,4124.9,3747.0,2455,7784,1686.9,socketpair
0.0,19287,9,2143.0,1353.0,1012,4368,1503.9,statx
0.0,12704,5,2540.8,2565.0,1012,4769,1589.4,getc
0.0,10600,3,3533.3,2785.0,1062,6753,2918.4,bind
0.0,9418,2,4709.0,4709.0,4108,5310,849.9,ftruncate
0.0,6973,2,3486.5,3486.5,3306,3667,255.3,pipe
0.0,5952,1,5952.0,5952.0,5952,5952,0.0,pipe2
0.0,5892,3,1964.0,2024.0,1463,2405,473.9,lockf
0.0,4629,1,4629.0,4629.0,4629,4629,0.0,memfd_create
0.0,4589,1,4589.0,4589.0,4589,4589,0.0,fstat
0.0,4478,1,4478.0,4478.0,4478,4478,0.0,getdelim
0.0,3526,1,3526.0,3526.0,3526,3526,0.0,fstatat
0.0,3467,3,1155.7,1192.0,1002,1273,139.1,stat
0.0,3386,2,1693.0,1693.0,1302,2084,553.0,fwrite
0.0,3106,1,3106.0,3106.0,3106,3106,0.0,fwrite_unlocked
0.0,3106,2,1553.0,1553.0,1232,1874,454.0,fcntl64
0.0,2835,2,1417.5,1417.5,1212,1623,290.6,flock
0.0,2665,1,2665.0,2665.0,2665,2665,0.0,fputs_unlocked
0.0,1433,1,1433.0,1433.0,1433,1433,0.0,prctl
0.0,1232,1,1232.0,1232.0,1232,1232,0.0,sigaction
0.0,1102,1,1102.0,1102.0,1102,1102,0.0,fcntl
File diff suppressed because it is too large
@ -2,6 +2,6 @@
 "BuildId": "37670630",
 "Modules":
 {
-"BulletHellCPP": "libUnrealEditor-BulletHellCPP-6941.so"
+"BulletHellCPP": "libUnrealEditor-BulletHellCPP.so"
 }
 }
Binary file not shown.
Some files were not shown because too many files have changed in this diff