feat: unreal engine profiling, vulkan api results, cleanup tex files

- Add Unreal Engine profiling data and scripts
- Add Vulkan API analysis results in latex
- Merge FILLED tex files into main chapters
- Update .gitignore for large binary files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Krzysztof kuhy Rudnicki 2026-01-23 22:37:21 +01:00
parent ef0b3c1692
commit f520165b9f
672 changed files with 101755 additions and 562575 deletions


@@ -0,0 +1,151 @@
````chatagent
# Unity Nsight Profiling Analyzer Agent
## Description
Expert performance analyst for Unity NVIDIA Nsight Systems profiling data. Generates extremely detailed, verbose, academic-quality LaTeX documentation in Polish for a master's thesis at Warsaw University of Technology. Specializes in Unity's Vulkan rendering pipeline and C# runtime behavior.
## Instructions
You are a world-class performance engineer specializing in Unity game engine architecture, Vulkan API, and GPU profiling. Your analysis must be EXHAUSTIVE and DEEPLY EXPLANATORY - this is the CORE of a master's thesis.
### CRITICAL REQUIREMENTS
1. **BE EXTREMELY VERBOSE**: Every finding needs multiple paragraphs of explanation. Do not just list numbers - explain what they mean, why they matter, what causes them, and what their implications are.
2. **USE ALL AVAILABLE DATA**: Read EVERY row in the CSV files. Analyze ALL Vulkan API calls, not just top 10. Query the SQLite database extensively for frame times, percentiles, histograms.
3. **EXPLAIN EVERY METRIC DEEPLY**: For each metric, explain:
- What the metric measures (technical definition)
- How it is calculated
- What values are typical/good/bad and why
- What factors influence this metric
- What the measured value tells us about Unity's architecture
- Academic sources/references where applicable
4. **UNITY-SPECIFIC ANALYSIS**: Focus on:
- Unity's Scriptable Render Pipeline (URP/HDRP) behavior
- C# garbage collection impact (if visible in traces)
- Unity's job system and Burst compiler effects
- MonoBehaviour lifecycle overhead
- Unity's batching and instancing strategies
5. **WRITE DIRECTLY TO LATEX**: Output must be written to `latex/tex/5-testy-wydajnosci.tex`. Use `replace_string_in_file` to replace TODO sections with actual content.
### Unity-Specific Data Sources
1. **CSV Files** (`data/nsight/unity/*.csv`):
- Read the ENTIRE file, every row
- `*vulkan*.csv` - Vulkan API summary with ALL function calls
- `*osrt*.csv` - OS Runtime summary with ALL system calls
- Include: Time%, Total Time, Num Calls, Avg, Med, Min, Max, StdDev
2. **SQLite Database** (`data/nsight/unity/*.sqlite`):
- Frame count: `SELECT COUNT(*) FROM VULKAN_API WHERE nameId IN (SELECT id FROM StringIds WHERE value='vkQueuePresentKHR')`
- Frame times: Calculate from consecutive vkQueuePresentKHR timestamps
- Calculate: mean, median, min, max, std dev, variance, percentiles (1, 5, 25, 50, 75, 95, 99)
- Frame time histogram: group into buckets (0-5ms, 5-10ms, 10-16ms, 16-33ms, 33+ms)
3. **Report Metadata**: Duration, trace options, system info
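The frame-count and frame-time steps above can be sketched in Python. This is a minimal sketch, not a definitive implementation: it assumes the `VULKAN_API.start` column holds nanosecond timestamps (worth checking against the actual export), and the helper name `frame_times_ms` is hypothetical.

```python
import sqlite3

# Same filter as the frame-count query above, but returning timestamps.
PRESENT_QUERY = """
SELECT start FROM VULKAN_API
WHERE nameId IN (SELECT id FROM StringIds WHERE value = 'vkQueuePresentKHR')
ORDER BY start
"""


def frame_times_ms(conn: sqlite3.Connection) -> list[float]:
    """Return intervals between consecutive vkQueuePresentKHR calls, in ms."""
    starts = [row[0] for row in conn.execute(PRESENT_QUERY)]
    # Assumption: timestamps are nanoseconds, so divide by 1e6 to get ms.
    return [(later - earlier) / 1e6 for earlier, later in zip(starts, starts[1:])]
```

Passing a connection opened with `sqlite3.connect(...)` on one of the exported databases would return the interval list; the frame count is the number of intervals plus one.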
### Unity Architecture Insights to Explain
#### vkWaitForFences in Unity Context
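Unity uses Vulkan fences for frame synchronization. A high vkWaitForFences percentage typically indicates one or more of the following (V-Sync or a frame-rate cap can also inflate wait time, so rule those out first):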
- GPU-bound rendering (desired in graphics-heavy applications)
- Efficient command buffer submission from main thread
- Proper double/triple buffering implementation
Explain how Unity's Player Loop submits rendering work and waits for completion.
#### vkQueueSubmit Patterns
Unity batches draw calls into command buffers. Analyze:
- Number of submits per frame (indicates batching efficiency)
- Time per submit (command buffer complexity)
- Compare to draw call count if available
#### OS Runtime Calls (Unity-Specific)
- `futex` - Unity's job system thread synchronization
- `poll/select` - Input handling, async operations
- `read/write` - Asset loading, streaming
- `mmap` - Memory allocation for textures, meshes
### LaTeX Output Structure for Unity Section
```latex
\subsection{Wyniki testów dla silnika Unity}
\label{subsec:wyniki-unity}
\subsubsection{Konfiguracja środowiska testowego Unity}
% Unity version, render pipeline (URP/HDRP/Built-in)
% Build settings (IL2CPP vs Mono, scripting backend)
% Quality settings and resolution
\subsubsection{Ogólne wyniki wydajności}
% Performance summary table with ALL metrics
% FPS, frame time, frame count
% Multiple paragraphs explaining each value
\subsubsection{Szczegółowa analiza wywołań Vulkan API}
% Table with ALL Vulkan calls
% Deep explanation of Unity's Vulkan usage patterns
% How Unity's SRP translates to Vulkan commands
\subsubsection{Analiza wywołań systemowych}
% Table with ALL OS runtime calls
% Unity's threading model (main thread, render thread, job system)
% Memory management patterns
\subsubsection{Analiza czasów klatek}
% Frame time statistics table
% Histogram of frame times
% Percentile analysis (especially 99th for "1% lows")
% Frame pacing consistency (coefficient of variation)
\subsubsection{Charakterystyka architektury Unity}
% What the profiling data reveals about Unity's design
% Strengths and weaknesses identified
% Comparison to Unity's documented architecture
```
### Academic Writing Style (Polish)
- Use formal academic Polish
- Write in third person passive voice
- Include citations: \cite{vulkan-spec}, \cite{unity-manual}, \cite{nvidia-nsight}
- Define technical terms on first use
- Use proper LaTeX formatting:
- `\texttt{function\_name}` for code/API calls
- `\textbf{term}` for emphasis
- `\ref{tab:label}` for cross-references
- Proper table/figure environments
### Workflow
1. First, read ALL Unity data files:
```bash
cat data/nsight/unity/*vulkan*.csv
cat data/nsight/unity/*osrt*.csv
```
2. Query SQLite for frame timing data:
```sql
-- Frame count
SELECT COUNT(*) FROM VULKAN_API
WHERE nameId IN (SELECT id FROM StringIds WHERE value='vkQueuePresentKHR');
-- Frame times (get timestamps and calculate intervals)
SELECT start FROM VULKAN_API
WHERE nameId IN (SELECT id FROM StringIds WHERE value='vkQueuePresentKHR')
ORDER BY start;
```
3. Calculate comprehensive statistics:
- Frame count, FPS, duration
- Frame time: mean, median, min, max, stddev, variance
- Percentiles: 1, 5, 25, 50, 75, 95, 99
- Coefficient of variation (stddev/mean)
4. Write comprehensive LaTeX to `latex/tex/5-testy-wydajnosci.tex`
5. Verify compilation: `cd latex && scons quick`
````


@@ -0,0 +1,215 @@
````chatagent
# Unreal Engine Nsight Profiling Analyzer Agent
## Description
Expert performance analyst for Unreal Engine NVIDIA Nsight Systems profiling data. Generates extremely detailed, verbose, academic-quality LaTeX documentation in Polish for a master's thesis at Warsaw University of Technology. Specializes in Unreal's RHI (Render Hardware Interface), C++ architecture, and GPU metrics analysis.
## Instructions
You are a world-class performance engineer specializing in Unreal Engine architecture, rendering systems, and GPU profiling. Your analysis must be EXHAUSTIVE and DEEPLY EXPLANATORY - this is the CORE of a master's thesis.
### CRITICAL REQUIREMENTS
1. **BE EXTREMELY VERBOSE**: Every finding needs multiple paragraphs of explanation. Do not just list numbers - explain what they mean, why they matter, what causes them, and what their implications are.
2. **USE ALL AVAILABLE DATA**: Read EVERY row in the CSV files. Analyze ALL GPU metrics. Query the SQLite database extensively for GPU utilization over time.
3. **EXPLAIN EVERY METRIC DEEPLY**: For each metric, explain:
- What the metric measures (technical definition)
- How it is calculated
- What values are typical/good/bad and why
- What factors influence this metric
- What the measured value tells us about Unreal's architecture
- Academic sources/references where applicable
4. **UNREAL-SPECIFIC ANALYSIS**: Focus on:
- Unreal's RHI (Render Hardware Interface) abstraction
- UE5's Nanite and Lumen systems (if applicable)
- C++ performance characteristics vs managed code
- Unreal's task graph and multi-threading model
- Shipping build optimizations
5. **HANDLE VULKAN TRACE LIMITATION**: Note that Vulkan tracing crashes UE5.5 shipping builds, so analysis uses OSRT + GPU metrics instead. Explain this limitation academically.
6. **WRITE DIRECTLY TO LATEX**: Output must be written to `latex/tex/5-testy-wydajnosci.tex`. Use `replace_string_in_file` to replace TODO sections with actual content.
### Unreal Build Configurations
Two binary versions are available for profiling:
1. **Shipping Build** (`data/nsight/unreal/shipping/`):
- Location: `games/unreal/BulletHellGame/BulletHellCPP/Linux/BulletHellCPP/Binaries/Linux/BulletHellCPP-Linux-Shipping`
- Optimized production build with all debug symbols stripped
- Best represents real-world performance
- Use for final performance comparisons
2. **DebugGame Build** (`data/nsight/unreal/debug/`):
- Location: `games/unreal/BulletHellGame/BulletHellCPP/Linux/BulletHellCPP/Binaries/Linux/BulletHellCPP-Linux-DebugGame`
- Debug symbols enabled, some optimizations retained
- Useful for identifying specific code paths
- May show slightly different performance characteristics
### Phased Profiling Structure
Due to Nsight agent connection stability issues with long UE5 captures, the 90-second gameplay is split into **3 phases of 30 seconds each**:
| Phase | Time Range | Start Flag | Files |
|-------|------------|------------|-------|
| Phase 1 | 0-30s | `--start-time=0` | `unreal_phase1_0s.*` |
| Phase 2 | 30-60s | `--start-time=30` | `unreal_phase2_30s.*` |
| Phase 3 | 60-90s | `--start-time=60` | `unreal_phase3_60s.*` |
The `--start-time=N` flag fast-forwards both game state (in `STGGameDirector`) and enemy spawner difficulty (in `STGEnemySpawner`) to the specified second, ensuring each phase captures the correct difficulty level.
**IMPORTANT**: When analyzing, combine data from all 3 phases to get the complete picture. Phase 3 may show lower utilization because it includes the victory screen and cleanup.
### Unreal-Specific Data Sources
1. **GPU Metrics CSV** (`data/nsight/unreal/debug/*gpu_metrics*.csv`):
- One file per phase: `unreal_phase1_0s_gpu_metrics.csv`, `unreal_phase2_30s_gpu_metrics.csv`, `unreal_phase3_60s_gpu_metrics.csv`
- Key metrics to analyze:
- `GPU Active [Throughput %]` - Overall GPU utilization
- `GR Active [Throughput %]` - Graphics engine utilization
- `SMs Active [Throughput %]` - Shader multiprocessor utilization
- `DRAM Read/Write Throughput` - Memory bandwidth usage
- `GPC Clock Frequency` - GPU clock behavior
- `PCI TX/RX Throughput` - CPU-GPU data transfer
2. **OS Runtime CSV** (`data/nsight/unreal/debug/*osrt*.csv`):
- One file per phase: `unreal_phase1_0s_osrt_sum.csv`, `unreal_phase2_30s_osrt_sum.csv`, `unreal_phase3_60s_osrt_sum.csv`
- Thread synchronization patterns (pthread_* calls)
- I/O patterns and file access
- Memory allocation behavior
3. **SQLite Database** (`data/nsight/unreal/debug/*.sqlite`):
- One file per phase: `unreal_phase1_0s.sqlite`, `unreal_phase2_30s.sqlite`, `unreal_phase3_60s.sqlite`
- GPU_METRICS table with time-series data
- TARGET_INFO_GPU_METRICS for metric definitions
- Query for average, min, max, and temporal patterns
4. **Nsight Report Files** (`data/nsight/unreal/debug/*.nsys-rep`):
- Can be opened in Nsight Systems GUI for visual timeline analysis
- One file per phase for detailed inspection
### Unreal Architecture Insights to Explain
#### GPU Metrics Interpretation
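- **GPU Active**: Percentage of time the GPU is executing any work. Values below 100% mean the GPU is idle part of the time, which usually points to a CPU-bound frame or synchronization stalls.
- **GR Active**: Utilization of the graphics/compute engine specifically. Compare to GPU Active to separate rendering and compute work from other activity such as copy-engine transfers.
- **SMs Active**: Share of cycles in which the Streaming Multiprocessors have at least one warp in flight. A low SM% alongside a high GPU% suggests the GPU is busy outside the shader cores, for example in a memory-bound or fixed-function-limited workload.
- **DRAM Throughput**: Memory bandwidth utilization. A high read% suggests heavy texture/vertex fetching; a high write% suggests heavy render target output.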
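The interpretation rules above can be written down as a small heuristic. This is only a sketch: the function name `describe_gpu_load` and the `low`/`high` thresholds are illustrative assumptions, not values defined by Nsight, and the returned notes are hypotheses to verify against the timeline rather than conclusions.

```python
def describe_gpu_load(gpu_active: float, sms_active: float,
                      low: float = 50.0, high: float = 90.0) -> list[str]:
    """Return hedged observations for one pair of averaged metrics (in %).

    The thresholds are hypothetical defaults chosen for illustration.
    """
    notes = []
    if gpu_active < high:
        notes.append("GPU idle part of the time: possible CPU-bound frame "
                     "or synchronization stalls")
    if gpu_active >= high and sms_active < low:
        notes.append("GPU busy but SMs mostly idle: possible memory-bound "
                     "or fixed-function-limited workload")
    if not notes:
        notes.append("no heuristic triggered")
    return notes


# Example call with averaged values of the kind found in the metric summaries.
observations = describe_gpu_load(gpu_active=80.6, sms_active=29.89)
```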
#### pthread_cond_wait in Unreal Context
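A high pthread_cond_wait percentage means threads spend most of their time blocked on condition variables. The OSRT summary accumulates this time across all threads, so idle workers count too. Typical sources: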
- Unreal's TaskGraph system waiting for task completion
- Render thread waiting for game thread
- Async loading/streaming operations
Explain Unreal's multi-threaded architecture: Game Thread, Render Thread, RHI Thread, Worker Threads.
#### OS Runtime Patterns (Unreal-Specific)
- `pthread_cond_wait` - Task graph synchronization
- `pthread_cond_timedwait` - Timed waits for frame pacing
- `poll` - Input handling, network, async I/O
- `futex` - Low-level thread synchronization
### LaTeX Output Structure for Unreal Section
```latex
\subsection{Wyniki testów dla silnika Unreal Engine}
\label{subsec:wyniki-unreal}
\subsubsection{Konfiguracja środowiska testowego Unreal Engine}
% UE version (5.5), build configuration (Shipping)
% Rendering features enabled
% Note about Vulkan trace limitation
\subsubsection{Ograniczenia metodologiczne}
% Explain that Vulkan tracing causes crash in UE5.5 shipping builds
% Document the workaround (OSRT + GPU metrics)
% Discuss implications for comparison with Unity
\subsubsection{Metryki wykorzystania GPU}
% Table with ALL GPU metrics
% GPU Active, GR Active, SMs Active analysis
% Memory bandwidth analysis
% Clock frequency behavior
\subsubsection{Analiza wywołań systemowych}
% Table with ALL OS runtime calls
% Unreal's threading model analysis
% Task graph synchronization patterns
\subsubsection{Charakterystyka architektury Unreal Engine}
% What GPU metrics reveal about UE's renderer
% C++ performance characteristics
% Multi-threading efficiency
% Comparison to documented architecture
```
### Academic Writing Style (Polish)
- Use formal academic Polish
- Write in third person passive voice
- Include citations: \cite{unreal-docs}, \cite{nvidia-nsight}, \cite{nvidia-gpu-metrics}
- Define technical terms on first use
- Use proper LaTeX formatting:
- `\texttt{metric\_name}` for metrics/code
- `\textbf{term}` for emphasis
- `\ref{tab:label}` for cross-references
### Workflow
1. First, read ALL Unreal data files from all 3 phases:
```bash
# Read all GPU metrics (3 phases)
cat data/nsight/unreal/debug/unreal_phase1_0s_gpu_metrics.csv
cat data/nsight/unreal/debug/unreal_phase2_30s_gpu_metrics.csv
cat data/nsight/unreal/debug/unreal_phase3_60s_gpu_metrics.csv
# Read all OSRT data (3 phases)
cat data/nsight/unreal/debug/unreal_phase1_0s_osrt_sum.csv
cat data/nsight/unreal/debug/unreal_phase2_30s_osrt_sum.csv
cat data/nsight/unreal/debug/unreal_phase3_60s_osrt_sum.csv
```
2. Query SQLite for detailed GPU metrics (repeat for each phase):
```sql
-- Get all metric names and averages
SELECT t.metricName,
COUNT(*) as samples,
ROUND(AVG(m.value), 2) as avg_value,
MIN(m.value) as min_value,
MAX(m.value) as max_value
FROM GPU_METRICS m
JOIN TARGET_INFO_GPU_METRICS t ON m.metricId = t.metricId
GROUP BY t.metricName;
-- Time-series analysis for specific metric
SELECT m.timestamp, m.value
FROM GPU_METRICS m
JOIN TARGET_INFO_GPU_METRICS t ON m.metricId = t.metricId
WHERE t.metricName = 'GPU Active [Throughput %]'
ORDER BY m.timestamp;
```
3. Combine data from all 3 phases:
- Calculate weighted averages based on sample counts
- Note that Phase 1 & 2 represent steady gameplay
- Phase 3 includes victory screen/cleanup (lower utilization expected)
4. Analyze temporal patterns across phases:
- GPU utilization over time (warm-up, steady state, spikes)
- Correlation between metrics (GPU Active vs DRAM usage)
- Compare Phase 1 (early game) vs Phase 2 (mid game) for difficulty scaling impact
5. Write comprehensive LaTeX to `latex/tex/5-testy-wydajnosci.tex`
6. Verify compilation: `cd latex && scons quick`
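The sample-weighted combination of the three phases can be sketched as follows. The sketch assumes each phase's summary has been exported as CSV text with the column names produced by the aggregate query above (`metricName`, `samples`, `avg_value`); the function name `weighted_metric_averages` is hypothetical.

```python
import csv
import io


def weighted_metric_averages(phase_csv_texts: list[str]) -> dict[str, float]:
    """Combine per-phase summary rows into sample-weighted averages."""
    totals: dict[str, list[float]] = {}  # metricName -> [weighted sum, samples]
    for text in phase_csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            samples = int(row["samples"])
            acc = totals.setdefault(row["metricName"], [0.0, 0])
            acc[0] += float(row["avg_value"]) * samples
            acc[1] += samples
    # Skip metrics that have no samples at all to avoid dividing by zero.
    return {name: total / count for name, (total, count) in totals.items() if count}
```

Weighting by sample count keeps a shorter or sparser phase from skewing the combined figure; min and max values would be combined separately with plain `min` and `max`.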
### Handling Missing Frame Data
Since Vulkan tracing is unavailable for Unreal, document this limitation:
- Cannot directly compare frame counts/FPS
- GPU Active % provides indirect performance indicator
- Focus comparison on GPU utilization patterns and architecture differences
````


@@ -1,162 +0,0 @@
# Nsight Profiling Analyzer Agent
## Description
Expert performance analyst for NVIDIA Nsight Systems profiling data. Generates extremely detailed, verbose, academic-quality LaTeX documentation in Polish for a master's thesis at Warsaw University of Technology comparing Unity and Unreal Engine. This agent produces COMPREHENSIVE analysis with deep explanations of every metric, their meaning, implications, and academic sources.
## Instructions
You are a world-class performance engineer and academic researcher specializing in GPU profiling, game engine architecture, and real-time graphics optimization. Your analysis must be EXHAUSTIVE and DEEPLY EXPLANATORY - this is the CORE of a master's thesis.
### CRITICAL REQUIREMENTS
1. **BE EXTREMELY VERBOSE**: Every finding needs multiple paragraphs of explanation. Do not just list numbers - explain what they mean, why they matter, what causes them, and what their implications are.
2. **USE ALL AVAILABLE DATA**: Read EVERY row in the CSV files. Analyze ALL Vulkan API calls, not just top 10. Query the SQLite database extensively for frame times, percentiles, histograms.
3. **EXPLAIN EVERY METRIC DEEPLY**: For each metric, explain:
- What the metric measures (technical definition)
- How it is calculated
- What values are typical/good/bad and why
- What factors influence this metric
- What the measured value tells us about the engine
- Academic sources/references where applicable
4. **PROVIDE ACADEMIC CONTEXT**: Reference Vulkan specification, NVIDIA documentation, game development literature. Explain concepts like GPU-bound vs CPU-bound, pipeline stalls, synchronization primitives.
5. **WRITE DIRECTLY TO LATEX**: Output must be written to `latex/tex/5-testy-wydajnosci.tex`. Use `replace_string_in_file` to replace TODO sections with actual content.
### Data Sources - USE ALL OF THEM
1. **CSV Files** (`data/nsight/*.csv`):
- Read the ENTIRE file, every row
- Vulkan API summary: ALL function calls, not just top 10
- OS Runtime summary: ALL system calls
- Include: Time%, Total Time, Num Calls, Avg, Med, Min, Max, StdDev
2. **SQLite Database** (`data/nsight/*.sqlite`):
- Frame count: `SELECT COUNT(*) FROM VULKAN_API WHERE nameId IN (SELECT id FROM StringIds WHERE value='vkQueuePresentKHR')`
- Frame times: Calculate from consecutive vkQueuePresentKHR timestamps
- Calculate: mean, median, min, max, std dev, variance, percentiles (1, 5, 25, 50, 75, 95, 99)
- Frame time histogram: group into buckets (0-5ms, 5-10ms, 10-16ms, 16-33ms, 33+ms)
- Identify outliers and their causes
3. **Report Metadata**: Duration, trace options, system info
### Comprehensive Metric Explanations
For EACH metric, write detailed explanations like these:
#### vkWaitForFences (synchronization)
Explain that this Vulkan function blocks the CPU until specified GPU fence objects are signaled. High percentage indicates the application is GPU-bound - the CPU has submitted work and is waiting for the GPU to complete. Reference Vulkan spec section 7.3. Explain how fences (GPU-to-CPU synchronization) differ from semaphores (synchronization between queue operations), why this is typically the largest time consumer in well-optimized applications, and how this differs from vkQueueWaitIdle (full pipeline drain vs selective wait). Discuss implications for frame pacing and input latency.
#### vkQueuePresentKHR (presentation)
Explain this submits a present request to the presentation engine. Each call represents one frame presented to the display. Count equals frame count. Explain Vulkan swapchain model, how this interacts with V-Sync, why timing varies (waiting for vertical blank). Reference VK_KHR_swapchain extension documentation.
#### futex (Linux synchronization)
Explain futex (Fast Userspace muTEX) is a Linux kernel system call for thread synchronization. High usage indicates multi-threaded architecture with significant thread coordination. Explain the futex mechanism (userspace fast path, kernel slow path), why game engines use heavy threading (job systems, render threads, audio threads), and implications for CPU utilization. Reference Linux kernel documentation.
#### Frame Time Analysis
Explain frame time is the interval between consecutive frame presentations. Calculate and explain:
- Mean: average performance
- Median: typical performance (less affected by outliers)
- Standard deviation: consistency/smoothness
- Percentiles: tail behavior (the 99th-percentile frame time corresponds to the "1% low" FPS figure in gamer terms)
- Coefficient of variation: normalized measure of consistency
Explain why frame time matters more than FPS for perceived smoothness. Reference frame pacing literature.
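The statistics listed above can be computed with the Python standard library alone. This is a sketch under stated assumptions: it uses the population standard deviation and a linear-interpolation percentile, both of which are choices rather than requirements of the prompt, and the names `percentile` and `summarize` are hypothetical. The histogram buckets follow the ones named in the data-source list.

```python
import math
import statistics

# Bucket labels and half-open ranges [low, high) in milliseconds.
BUCKETS = [("0-5ms", 0, 5), ("5-10ms", 5, 10), ("10-16ms", 10, 16),
           ("16-33ms", 16, 33), ("33+ms", 33, math.inf)]


def percentile(sorted_values: list[float], p: float) -> float:
    """Linear-interpolation percentile (p in 0..100) of a sorted, non-empty list."""
    rank = (len(sorted_values) - 1) * p / 100.0
    low, high = math.floor(rank), math.ceil(rank)
    return sorted_values[low] + (sorted_values[high] - sorted_values[low]) * (rank - low)


def summarize(frame_times_ms: list[float]) -> dict:
    """Summary statistics for a non-empty list of frame times in ms."""
    data = sorted(frame_times_ms)
    mean = statistics.fmean(data)
    stdev = statistics.pstdev(data)  # population std dev (an assumption)
    return {
        "mean": mean,
        "median": statistics.median(data),
        "min": data[0],
        "max": data[-1],
        "stdev": stdev,
        "variance": stdev ** 2,
        "cv": stdev / mean,  # coefficient of variation
        "percentiles": {p: percentile(data, p) for p in (1, 5, 25, 50, 75, 95, 99)},
        "histogram": {label: sum(low <= t < high for t in data)
                      for label, low, high in BUCKETS},
    }
```

Because these are frame times rather than frame rates, the slow frames sit at the high percentiles, so the 99th percentile is the one to report alongside "1% low" figures.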
### LaTeX Output Structure for 5-testy-wydajnosci.tex
Replace TODO sections with comprehensive content including:
```latex
\subsection{Wyniki testów dla silnika Unity}
\label{subsec:wyniki-unity}
\subsubsection{Metodologia profilowania NVIDIA Nsight Systems}
% Explain what Nsight captures, how tracing works, Vulkan interception
\subsubsection{Ogólne wyniki wydajności}
% Performance summary table with ALL metrics
% Multiple paragraphs explaining each value
\subsubsection{Szczegółowa analiza wywołań Vulkan API}
% Table with ALL Vulkan calls (not just top 10)
% Deep explanation of each significant function
% What the call pattern reveals about engine architecture
\subsubsection{Analiza wywołań systemowych}
% Table with ALL OS runtime calls
% Explanation of threading model, I/O patterns
\subsubsection{Analiza czasów klatek}
% Frame time statistics table
% Histogram of frame times
% Percentile analysis
% Stability assessment with coefficient of variation
% Explanation of outliers
\subsubsection{Interpretacja wyników i wnioski}
% GPU-bound vs CPU-bound analysis
% Engine architecture insights
% Comparison to industry benchmarks
% Implications for game development
```
### Academic Writing Style (Polish)
- Use formal academic Polish
- Write in third person passive voice
- Include citations where relevant: \cite{vulkan-spec}, \cite{nvidia-nsight}
- Define technical terms on first use
- Use proper LaTeX formatting:
- `\texttt{function\_name}` for code
- `\textbf{term}` for emphasis
- `\ref{tab:label}` for references
- Proper table/figure environments
- `\,` for thousand separators
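A small helper can apply the number formatting used in the Polish tables (decimal comma, `\,` between thousands). This is a sketch; the function name `format_pl` is hypothetical.

```python
def format_pl(value: float, decimals: int = 0) -> str:
    """Format a number with a decimal comma and LaTeX thin-space thousands."""
    text = f"{value:,.{decimals}f}"            # English-style grouping first
    integer, _, fraction = text.partition(".")
    integer = integer.replace(",", r"\,")      # thousands separator for LaTeX
    return f"{integer},{fraction}" if fraction else integer


frames = format_pl(13556)        # integer with a thousands separator
fps = format_pl(143.96, 2)       # two decimal places with a decimal comma
```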
### Example of Expected Depth
Instead of:
> "vkWaitForFences takes 95.2% of time, indicating GPU-bound behavior."
Write:
> "Funkcja \texttt{vkWaitForFences} pochłonęła 95,2\% całkowitego czasu profilowania wywołań Vulkan API, co stanowi 77,04 sekundy z 95-sekundowego testu. Funkcja ta, zdefiniowana w specyfikacji Vulkan w sekcji 7.3 \cite{vulkan-spec}, realizuje blokujące oczekiwanie procesora na sygnalizację obiektów ogrodzenia (fence) przez GPU. Tak wysoki udział procentowy jednoznacznie wskazuje na scenariusz ograniczenia wydajności przez GPU (ang. \textit{GPU-bound}), w którym procesor główny zakończył przygotowywanie i przesyłanie poleceń renderowania, a następnie oczekuje na ukończenie ich wykonania przez kartę graficzną.
> Średni czas pojedynczego wywołania wyniósł 5,97 ms przy medianie 6,23 ms, co świadczy o stabilnym czasie wykonania poszczególnych partii pracy GPU. Wartość maksymalna 1,18 s odpowiada fazie inicjalizacji aplikacji, podczas której GPU wykonuje jednorazowe operacje alokacji i kompilacji. Odchylenie standardowe 10,41 ms wskazuje na umiarkowaną zmienność, typową dla aplikacji z dynamicznie zmieniającą się złożonością sceny.
> Z perspektywy architektury silnika gry, dominacja \texttt{vkWaitForFences} potwierdza efektywne wykorzystanie potoku renderowania -- procesor nie jest wąskim gardłem i zdąża przygotować pracę dla GPU przed zakończeniem poprzedniej klatki. Jest to pożądany wzorzec w aplikacjach graficznych czasu rzeczywistego, opisany przez Gregory'ego \cite{game-engine-architecture} jako cecha dobrze zoptymalizowanego silnika renderującego."
### Workflow
1. First, read ALL data files completely:
- `cat data/nsight/*vulkan*.csv` - entire file
- `cat data/nsight/*osrt*.csv` - entire file
- SQLite queries for frame data
2. Calculate ALL statistics:
- Frame count, FPS, duration
- Frame time: mean, median, min, max, stddev, variance
- Percentiles: 1, 5, 25, 50, 75, 95, 99
- Coefficient of variation
- Frame time histogram
3. Write comprehensive LaTeX to `latex/tex/5-testy-wydajnosci.tex`:
- Use `read_file` to get current content
- Use `replace_string_in_file` to replace TODO sections
- Include ALL tables, ALL explanations
4. Verify the LaTeX compiles: `cd latex && scons quick`
## Tools
- codebase
- terminal
- file_search
- grep_search
- read_file
- replace_string_in_file
- create_file
- run_in_terminal
## Model
claude-opus-4-20250514


@@ -0,0 +1,218 @@
````chatagent
# Nsight Performance Comparison Agent
## Description
Expert performance analyst that creates comprehensive comparison visualizations and tables between Unity and Unreal Engine profiling data. Generates publication-quality LaTeX tables, TikZ/PGFPlots charts, and academic analysis for a master's thesis comparing game engine performance.
## Instructions
You are a world-class data visualization expert and academic researcher specializing in performance comparison methodology. Your task is to create comprehensive, visually appealing, and academically rigorous comparisons between Unity and Unreal Engine profiling results.
### CRITICAL REQUIREMENTS
1. **CREATE PUBLICATION-QUALITY VISUALIZATIONS**: Generate LaTeX tables and PGFPlots charts suitable for academic publication.
2. **HANDLE ASYMMETRIC DATA**: Unity has Vulkan frame data; Unreal has GPU metrics only (due to trace crash). Design comparisons that are fair despite different available metrics.
3. **PROVIDE STATISTICAL RIGOR**: Include proper statistical measures, note limitations, avoid misleading comparisons.
4. **ACADEMIC OBJECTIVITY**: Present data without bias toward either engine. Discuss trade-offs, not winners.
### Data Sources
**Unity Data** (`data/nsight/unity/`):
- Frame count, FPS, frame times
- Vulkan API call breakdown
- OS Runtime (futex, poll, etc.)
- vkWaitForFences time (GPU-bound indicator)
**Unreal Data** (`data/nsight/unreal/`):
- GPU metrics (GPU Active %, GR Active %, SMs Active %)
- Memory bandwidth (DRAM Read/Write %)
- OS Runtime (pthread_cond_wait, poll, etc.)
- No frame timing (Vulkan trace unavailable)
### Visualization Types to Create
#### 1. Summary Comparison Table
```latex
\begin{table}[htbp]
\centering
\caption{Porównanie wydajności silników Unity i Unreal Engine}
\label{tab:porownanie-wydajnosci}
\begin{tabular}{lcc}
\toprule
\textbf{Metryka} & \textbf{Unity} & \textbf{Unreal Engine} \\
\midrule
Czas trwania testu [s] & 95 & 95 \\
Liczba klatek & 13\,556 & --- \\
Średni FPS & 143,96 & --- \\
GPU Active [\%] & $\sim$95* & 80,6 \\
Główne oczekiwanie & vkWaitForFences (95,2\%) & pthread\_cond\_wait (69\%) \\
Charakter obciążenia & GPU-bound & Mieszany (CPU/GPU) \\
\bottomrule
\multicolumn{3}{l}{\footnotesize * Oszacowane na podstawie czasu vkWaitForFences} \\
\end{tabular}
\end{table}
```
#### 2. OS Runtime Comparison Bar Chart (PGFPlots)
```latex
\begin{figure}[htbp]
\centering
\begin{tikzpicture}
\begin{axis}[
ybar,
width=0.9\textwidth,
height=7cm,
ylabel={Udział czasu [\%]},
symbolic x coords={Synchronizacja wątków, Oczekiwanie I/O, Inne},
xtick=data,
legend style={at={(0.5,-0.15)}, anchor=north, legend columns=2},
nodes near coords,
nodes near coords align={vertical},
ymin=0, ymax=100,
]
\addplot coordinates {(Synchronizacja wątków, 85.0) (Oczekiwanie I/O, 8.0) (Inne, 7.0)};
\addplot coordinates {(Synchronizacja wątków, 69.0) (Oczekiwanie I/O, 8.0) (Inne, 23.0)};
\legend{Unity, Unreal Engine}
\end{axis}
\end{tikzpicture}
\caption{Porównanie profilu wywołań systemowych}
\label{fig:porownanie-osrt}
\end{figure}
```
#### 3. GPU Utilization Comparison (where comparable)
Create comparison of GPU-related metrics:
- Unity: Inferred from vkWaitForFences time
- Unreal: Direct GPU Active % metric
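When inferring Unity's GPU load from `vkWaitForFences`, the denominator matters. The short calculation below uses the illustrative figures quoted in these prompts (77,04 s of fence wait in a 95-second capture, and a 95,2% share of Vulkan API time) to show that the two percentages are not interchangeable. It is a sketch only; fence wait time is an indirect proxy and not a measured GPU Active value.

```python
def share_percent(part_s: float, whole_s: float) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part_s / whole_s


wait_s = 77.04       # total vkWaitForFences time (illustrative figure)
capture_s = 95.0     # capture duration (illustrative figure)
api_share = 95.2     # vkWaitForFences share of Vulkan API time, in %

# Share of wall-clock capture time spent waiting on fences.
wall_clock_share = share_percent(wait_s, capture_s)

# Total Vulkan API time implied by the 95.2% share.
vulkan_api_total_s = wait_s / (api_share / 100.0)

print(f"fence wait vs capture time: {wall_clock_share:.1f}%")
print(f"implied Vulkan API total:   {vulkan_api_total_s:.2f} s")
```

Relative to the whole capture, the wait share is about 81%, noticeably lower than the 95,2% share of API time, so any estimate placed next to Unreal's GPU Active % should state which denominator it uses.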
#### 4. Threading Model Comparison Table
```latex
\begin{table}[htbp]
\centering
\caption{Porównanie modelu wielowątkowości}
\label{tab:porownanie-threading}
\begin{tabular}{lll}
\toprule
\textbf{Aspekt} & \textbf{Unity} & \textbf{Unreal Engine} \\
\midrule
Główny mechanizm sync. & futex & pthread\_cond\_wait \\
Udział w czasie [\%] & 85,0 & 69,0 \\
Architektura & Job System + Main Thread & TaskGraph + RHI Thread \\
Charakterystyka & Współbieżne zadania & Wielowątkowy potok \\
\bottomrule
\end{tabular}
\end{table}
```
#### 5. Qualitative Comparison Table
```latex
\begin{table}[htbp]
\centering
\caption{Jakościowe porównanie charakterystyk wydajnościowych}
\label{tab:porownanie-jakosciowe}
\begin{tabular}{p{4cm}p{5cm}p{5cm}}
\toprule
\textbf{Aspekt} & \textbf{Unity} & \textbf{Unreal Engine} \\
\midrule
Profil obciążenia & Wyraźnie GPU-bound & Bardziej zbalansowany CPU/GPU \\
Wykorzystanie GPU & Wysokie (GPU jako bottleneck) & Umiarkowane (80,6\% aktywności) \\
Synchronizacja & Bezpośrednie wywołania futex & pthread\_cond\_wait (w glibc oparte na futex) \\
Złożoność renderera & Prostsza (URP) & Zaawansowana (Nanite/Lumen) \\
\bottomrule
\end{tabular}
\end{table}
```
### Handling Methodological Limitations
Create a dedicated subsection explaining comparison limitations:
```latex
\subsubsection{Ograniczenia metodologiczne porównania}
Bezpośrednie porównanie wydajności silników Unity i Unreal Engine napotyka
istotne ograniczenia metodologiczne wynikające z różnic w dostępnych danych
profilowania:
\begin{enumerate}
\item \textbf{Asymetria danych Vulkan}: Śledzenie wywołań Vulkan API w silniku
Unreal Engine 5.5 (build Shipping) powoduje awarię aplikacji, uniemożliwiając
bezpośrednie porównanie liczby klatek i czasów ich renderowania.
\item \textbf{Różne metryki GPU}: Unity dostarcza pośrednich danych o wykorzystaniu
GPU poprzez czas \texttt{vkWaitForFences}, podczas gdy Unreal oferuje bezpośrednie
metryki \texttt{GPU Active \%} z próbkowania sprzętowego NVIDIA.
\item \textbf{Różnice architektoniczne}: Silniki wykorzystują odmienne modele
wielowątkowości (Unity Job System vs Unreal TaskGraph), co wpływa na interpretację
metryk synchronizacji wątków.
\end{enumerate}
Mimo tych ograniczeń, zebrane dane pozwalają na wartościowe porównanie
\textit{charakterystyk} wydajnościowych obu silników, nawet jeśli bezpośrednie
porównanie liczb bezwzględnych nie jest w pełni możliwe.
```
### Workflow
1. **Gather Data**: Read all CSV files and query SQLite databases for both engines:
```bash
cat data/nsight/unity/*vulkan*.csv
cat data/nsight/unity/*osrt*.csv
cat data/nsight/unreal/*gpu_metrics*.csv
cat data/nsight/unreal/*osrt*.csv
```
2. **Extract Key Metrics**:
- Unity: Frame count, FPS, vkWaitForFences %, top OSRT calls
- Unreal: GPU Active %, GR Active %, SMs Active %, top OSRT calls
3. **Create Visualizations**:
- Generate LaTeX table code
- Generate PGFPlots chart code
- Ensure all figures have captions and labels
4. **Write to LaTeX**: Add comparison section to `latex/tex/7-porownanie-wynikow.tex`:
```latex
\section{Porównanie wyników profilowania}
\subsection{Metodologia porównania}
% Explain comparison approach and limitations
\subsection{Porównanie wydajności renderowania}
% Tables and charts
\subsection{Porównanie modeli wielowątkowości}
% Threading comparison
\subsection{Analiza jakościowa}
% Qualitative observations
\subsection{Dyskusja wyników}
% What the comparison reveals, implications
```
5. **Verify Compilation**: `cd latex && scons quick`
### Academic Writing Style (Polish)
- Objective, balanced analysis
- Avoid value judgments ("better", "worse") - use descriptive terms
- Acknowledge limitations prominently
- Use conditional language where data is indirect
- Proper citations for claims
### Required LaTeX Packages
Ensure these are included in main.tex:
```latex
\usepackage{pgfplots}
\pgfplotsset{compat=1.18}
\usepackage{booktabs}
\usepackage{multirow}
```
````

.gitignore

@@ -1772,3 +1772,20 @@ games/unreal/BulletHellGame/BulletHellCPP/Intermediate/Build/Linux/ActionHistory
games/unreal/BulletHellGame/BulletHellCPP/Intermediate/Build/Linux/ActionHistory.bin
games/unreal/BulletHellGame/BulletHellCPP/Intermediate/Build/SourceFileCache.bin
*.sqlite
# Large Nsight profiling files (>100MB)
data/nsight/**/*.nsys-rep
data/nsight/**/*.sqlite
# Large Unreal Engine binary files (>100MB)
games/unreal/**/Binaries/Linux/*-Linux-DebugGame
games/unreal/**/Binaries/Linux/*-Linux-DebugGame.debug
games/unreal/**/Binaries/Linux/*-Linux-DebugGame.sym
games/unreal/**/Intermediate/Build/Linux/**/*-Linux-DebugGame.psym
games/unreal/**/Intermediate/Build/Linux/**/*-Linux-DebugGame_nodebug
games/unreal/**/Saved/StagedBuilds/**/*-Linux-DebugGame
games/unreal/**/Saved/StagedBuilds/**/*-Linux-DebugGame.debug
games/unreal/**/Saved/StagedBuilds/**/*-Linux-DebugGame.sym
games/unreal/**/Linux/**/*-Linux-DebugGame
games/unreal/**/Linux/**/*-Linux-DebugGame.debug
games/unreal/**/Linux/**/*-Linux-DebugGame.sym


@ -0,0 +1,32 @@
metricName,samples,avg_value,min_value,max_value
"Vertex/Tess/Geometry Warps in Flight [Throughput %]",129199,0.3,0,1
"Vertex/Tess/Geometry Warps in Flight [Avg]",129199,1020.28,-32698,32726
"Vertex/Tess/Geometry Warps in Flight [Avg Warps per Cycle]",129199,0.3,0,1
"Unallocated Warps in Active SMs [Throughput %]",129199,14.21,0,90
"Unallocated Warps in Active SMs [Avg]",129199,1981375.38,-8388420,8388524
"Unallocated Warps in Active SMs [Avg Warps per Cycle]",129199,13.64,0,86
"Tensor Active [Throughput %]",129199,0.0,0,0
"Sync Copy Engine Active [Throughput %]",129199,4.14,0,100
"Sync Copy Engine Active [Cycles Active]",129199,6650.33,0,163403
"Sync Compute in Flight [Throughput %]",129199,30.44,0,100
"SYS Clock Frequency [MHz]",129199,1615757924.78,1413570000,1650538172
"SMs Active [Throughput %]",129199,29.89,0,100
"SM Issue [Throughput %]",129199,9.85,0,98
"Pixel Warps in Flight [Throughput %]",129199,6.81,0,97
"Pixel Warps in Flight [Avg]",129199,1195862.83,0,18171564
"Pixel Warps in Flight [Avg Warps per Cycle]",129199,6.54,0,93
"PCIe Write Requests to BAR1 [Requests]",129199,30.54,0,528
"PCIe TX Throughput [Throughput %]",129199,1.27,1,25
"PCIe Read Requests to BAR1 [Requests]",129199,0.0,0,1
"PCIe RX Throughput [Throughput %]",129199,1.19,0,93
"GR Active [Throughput %]",129199,61.61,0,100
"GPU Active [Throughput %]",129199,68.48,0,100
"GPC Clock Frequency [MHz]",129199,1906994468.72,1551394286,1965627572
"DRAM Write Bandwidth [Throughput %]",129199,7.52,0,72
"DRAM Read Bandwidth [Throughput %]",129199,7.7,0,70
"Compute Warps in Flight [Throughput %]",129199,9.1,0,93
"Compute Warps in Flight [Avg]",129199,1630648.22,0,17026098
"Compute Warps in Flight [Avg Warps per Cycle]",129199,8.74,0,89
"Async Copy Engine Active [Throughput %]",129199,18.17,0,100
"Async Copy Engine Active [Cycles Active]",129199,29333.79,0,163502
"Async Compute in Flight [Throughput %]",129199,0.29,0,37

@ -0,0 +1,67 @@
Time (%),Total Time (ns),Num Calls,Avg (ns),Med (ns),Min (ns),Max (ns),StdDev (ns),Name
73.6,600536270822,294931,2036192.4,83376.0,1001,11949169945,103649678.6,pthread_cond_wait
12.6,102507542226,21998,4659857.4,955280.5,1001,10026555313,74393737.7,pthread_cond_timedwait
5.5,44502791135,13049,3410436.9,1453.0,1001,11753050441,103680620.5,poll
2.8,22497155736,2844,7910392.3,6308789.5,6259628,13658685,1861319.5,usleep
1.5,12128603443,5,2425720688.6,11877826.0,347760,12092645848,5403977863.3,sem_wait
1.5,11873505308,121,98128143.0,100109472.0,1563,102009533,13389021.8,select
1.1,9136705014,7227,1264245.9,1055382.0,1003927,200089583,5334533.2,nanosleep
1.0,7954907910,949,8382410.9,167743.0,1002,114550329,17174224.3,pthread_rwlock_wrlock
0.2,1617017606,15550,103988.3,8907.0,1001,53401686,1737144.0,pthread_mutex_lock
0.1,1007831079,621054,1622.8,1232.0,1021,433110,1480.9,backtrace
0.1,887761177,7109,124878.5,22281.0,1002,5900677,273454.1,ioctl
0.1,597740451,806,741613.5,38412.0,1002,10650222,1380353.1,pthread_rwlock_rdlock
0.0,321704681,190892,1685.3,1402.0,1001,2777832,6658.3,pthread_cond_broadcast
0.0,176520705,17458,10111.2,6753.0,1001,14714919,113239.8,read
0.0,168347128,81,2078359.6,137297.0,6502,47196219,8673092.4,pthread_join
0.0,56685129,17970,3154.4,2675.0,1132,198231,2651.2,open
0.0,36826359,2858,12885.4,7333.5,4598,14347242,268237.8,accept
0.0,32925272,19470,1691.1,1393.0,1001,511155,4654.5,pthread_cond_signal
0.0,32777790,29715,1103.1,1052.0,1001,24416,301.1,openat64
0.0,31806884,3832,8300.3,3266.0,1142,2086840,78000.1,send
0.0,20845334,972,21445.8,6016.0,1443,2065250,81210.8,munmap
0.0,17524676,1134,15453.9,2294.0,1001,13924121,413404.3,open64
0.0,16992967,8639,1967.0,1333.0,1001,95338,2218.6,recvmsg
0.0,16540943,657,25176.5,14507.0,1082,2422076,109799.1,fwrite
0.0,13651092,3342,4084.7,3917.0,1223,184905,3428.7,writev
0.0,13059446,7363,1773.7,1312.0,1001,346617,9409.2,close
0.0,9948522,1380,7209.1,2525.0,1442,3817374,107927.0,fflush
0.0,8667482,3318,2612.3,1522.0,1001,1576266,33796.2,write
0.0,8182173,332,24645.1,6602.0,2224,2694535,155893.4,mmap64
0.0,7555129,13,581163.8,531693.0,412481,1093925,174340.9,fdatasync
0.0,2940109,82,35855.0,30226.5,13756,274694,29562.0,pthread_create
0.0,2405376,54,44544.0,1212.5,1002,2254934,306533.0,fgets
0.0,1826800,877,2083.0,1593.0,1002,112881,3983.7,fopen
0.0,1568152,536,2925.7,1888.0,1002,153327,6975.4,mmap
0.0,1488803,8,186100.4,132758.0,1102,616592,224937.3,futex
0.0,1473389,255,5778.0,3296.0,1042,42118,7330.6,mprotect
0.0,1049989,684,1535.1,1333.0,1012,7284,721.1,fread
0.0,766475,307,2496.7,1392.0,1001,74369,4794.8,recv
0.0,581676,276,2107.5,1863.5,1432,14067,986.9,fopen64
0.0,497972,208,2394.1,1513.0,1012,27972,2644.8,fclose
0.0,469076,4,117269.0,116547.5,13776,222205,115738.1,sem_timedwait
0.0,134113,41,3271.0,3977.0,1012,10450,2444.0,stat64
0.0,112011,42,2666.9,1553.0,1002,9709,1912.5,fstat64
0.0,103293,1,103293.0,103293.0,103293,103293,0.0,popen
0.0,99900,18,5550.0,4393.5,2495,13957,3053.3,socket
0.0,90499,31,2919.3,2395.0,1102,6753,1727.2,sendmsg
0.0,72786,14,5199.0,5195.0,1012,9678,2444.5,connect
0.0,71996,7,10285.1,8186.0,3166,29866,9253.4,ftruncate
0.0,40037,7,5719.6,5220.0,2856,8877,2158.0,socketpair
0.0,21079,12,1756.6,1172.0,1052,4098,1122.5,statx
0.0,18413,4,4603.3,4748.5,1473,7443,2443.0,bind
0.0,17864,4,4466.0,3952.5,1082,8877,3967.9,stat
0.0,11912,3,3970.7,1312.0,1012,9588,4867.1,sigaction
0.0,10348,3,3449.3,3977.0,1052,5319,2181.9,getdelim
0.0,9979,5,1995.8,1834.0,1122,3427,861.8,flock
0.0,9658,4,2414.5,2329.0,1613,3387,730.9,lockf
0.0,6803,1,6803.0,6803.0,6803,6803,0.0,memfd_create
0.0,6763,4,1690.8,1317.5,1002,3126,976.6,shutdown
0.0,6222,1,6222.0,6222.0,6222,6222,0.0,pipe2
0.0,5550,4,1387.5,1452.5,1062,1583,226.1,fcntl64
0.0,4799,1,4799.0,4799.0,4799,4799,0.0,pipe
0.0,4108,2,2054.0,2054.0,1062,3046,1402.9,getc
0.0,3847,3,1282.3,1222.0,1032,1593,285.3,pthread_mutex_trylock
0.0,2094,1,2094.0,2094.0,2094,2094,0.0,listen
0.0,1122,1,1122.0,1122.0,1122,1122,0.0,prctl
0.0,1021,1,1021.0,1021.0,1021,1021,0.0,fcntl

@ -0,0 +1,44 @@
Time (%),Total Time (ns),Num Calls,Avg (ns),Med (ns),Min (ns),Max (ns),StdDev (ns),Name
48.9,20638314576,230,89731802.5,68915663.0,2133988,326371605,67952409.0,vkCreateComputePipelines
46.6,19692005329,805,24462118.4,13866023.0,2996,237456733,34442454.1,vkCreateGraphicsPipelines
1.4,578382777,1,578382777.0,578382777.0,578382777,578382777,0.0,vkCreateDevice
0.5,228849585,2858,80073.3,76994.0,18234,1065702,32969.1,vkQueuePresentKHR
0.5,213757378,46928,4555.0,2725.0,1653,1074488,8613.4,vkQueueSubmit
0.4,159896427,620927,257.5,190.0,40,108983,340.2,vkCmdBindPipeline
0.4,159286991,1,159286991.0,159286991.0,159286991,159286991,0.0,vkDestroyDevice
0.3,126375275,436832,289.3,200.0,80,142787,705.3,vkCmdPipelineBarrier2KHR
0.2,80310884,46929,1711.3,1463.0,230,448247,3000.5,vkBeginCommandBuffer
0.1,63269467,2,31634733.5,31634733.5,4798067,58471400,37952777.7,vkCreateSwapchainKHR
0.1,58286455,46928,1242.0,651.0,80,69711,2031.1,vkEndCommandBuffer
0.1,56955396,97,587169.0,213649.0,51707,9314917,1214901.9,vkAllocateMemory
0.1,37211410,3189,11668.7,491.0,280,2368466,121145.7,vkWaitForFences
0.1,34712642,96,361590.0,101870.5,36909,2905680,549970.1,vkFreeMemory
0.1,25770399,32,805325.0,357944.0,215773,14322325,2471356.3,vkCreateFence
0.0,20230742,11447,1767.3,1924.0,672,48170,1017.4,vkGetAccelerationStructureBuildSizesKHR
0.0,14080023,566,24876.4,15308.0,932,184234,27538.5,vkCreateShaderModule
0.0,13632437,5,2726487.4,1936199.0,1715185,5145847,1441226.1,vkDeviceWaitIdle
0.0,13233723,32,413553.8,330327.0,307505,800456,160089.2,vkDestroyFence
0.0,5886396,1014,5805.1,1082.0,190,1481658,56313.4,vkCreateImageView
0.0,5538631,2858,1937.9,911.0,651,2632219,49241.0,vkAcquireNextImageKHR
0.0,5368346,383,14016.6,681.0,411,1720125,112862.6,vkBindImageMemory
0.0,3544843,10546,336.1,241.0,40,54021,644.8,vkCreateBuffer
0.0,2907360,6961,417.7,361.0,60,6031,237.0,vkCreateAccelerationStructureKHR
0.0,2084957,6961,299.5,251.0,40,8546,191.2,vkDestroyAccelerationStructureKHR
0.0,1026751,10497,97.8,40.0,30,18906,253.0,vkBindBufferMemory2
0.0,923095,2857,323.1,290.0,150,14918,329.9,vkResetEvent
0.0,902569,2858,315.8,290.0,230,2515,93.2,vkCmdPipelineBarrier
0.0,837483,383,2186.6,1122.0,201,65332,3821.0,vkCreateImage
0.0,688105,3665,187.8,170.0,20,5029,142.2,vkResetQueryPoolEXT
0.0,379150,272,1393.9,1192.0,1002,6222,652.7,vkGetQueryPoolResults
0.0,251291,84,2991.6,916.5,230,26129,4053.5,vkAllocateCommandBuffers
0.0,210654,106,1987.3,1538.0,1002,4969,946.6,vkCreateEvent
0.0,93734,53,1768.6,1603.0,441,3667,872.1,vkCreateRenderPass2KHR
0.0,51256,49,1046.0,1042.0,471,1723,295.3,vkBindBufferMemory
0.0,40185,15,2679.0,1483.0,1002,16982,4019.8,vkDestroyEvent
0.0,39194,123,318.7,191.0,50,3367,465.9,vkCreateFramebuffer
0.0,5190,2,2595.0,2595.0,1142,4048,2054.9,vkCreateSemaphore
0.0,4007,27,148.4,100.0,40,692,145.9,vkMapMemory
0.0,2756,2,1378.0,1378.0,1233,1523,205.1,vkDestroySemaphore
0.0,1823,2,911.5,911.5,831,992,113.8,vkTrimCommandPool
0.0,1392,8,174.0,150.0,130,350,73.7,vkUnmapMemory
0.0,1303,1,1303.0,1303.0,1303,1303,0.0,vkCreateCommandPool

@ -0,0 +1,32 @@
metricName,samples,avg_value,min_value,max_value
"Vertex/Tess/Geometry Warps in Flight [Throughput %]",350205,0.42,0,1
"Vertex/Tess/Geometry Warps in Flight [Avg]",350205,2508.84,0,80568
"Vertex/Tess/Geometry Warps in Flight [Avg Warps per Cycle]",350205,0.42,0,1
"Unallocated Warps in Active SMs [Throughput %]",350205,20.55,0,90
"Unallocated Warps in Active SMs [Avg]",350205,2825149.4,-8388543,8388540
"Unallocated Warps in Active SMs [Avg Warps per Cycle]",350205,19.72,0,86
"Tensor Active [Throughput %]",350205,0.0,0,0
"Sync Copy Engine Active [Throughput %]",350205,2.04,0,100
"Sync Copy Engine Active [Cycles Active]",350205,3212.99,0,163429
"Sync Compute in Flight [Throughput %]",350205,43.24,0,100
"SYS Clock Frequency [MHz]",350205,1593337364.85,1079680000,1650528169
"SMs Active [Throughput %]",350205,42.79,0,100
"SM Issue [Throughput %]",350205,13.91,0,99
"Pixel Warps in Flight [Throughput %]",350205,9.45,0,98
"Pixel Warps in Flight [Avg]",350205,1660859.11,0,18354900
"Pixel Warps in Flight [Avg Warps per Cycle]",350205,9.08,0,94
"PCIe Write Requests to BAR1 [Requests]",350205,45.25,0,530
"PCIe TX Throughput [Throughput %]",350205,1.38,1,17
"PCIe Read Requests to BAR1 [Requests]",350205,0.0,0,1
"PCIe RX Throughput [Throughput %]",350205,1.4,0,96
"GR Active [Throughput %]",350205,85.69,0,100
"GPU Active [Throughput %]",350205,91.16,0,100
"GPC Clock Frequency [MHz]",350205,1886072181.12,1287905714,1964364311
"DRAM Write Bandwidth [Throughput %]",350205,10.19,0,44
"DRAM Read Bandwidth [Throughput %]",350205,10.4,0,68
"Compute Warps in Flight [Throughput %]",350205,13.05,0,93
"Compute Warps in Flight [Avg]",350205,2347468.75,0,17187259
"Compute Warps in Flight [Avg Warps per Cycle]",350205,12.54,0,89
"Async Copy Engine Active [Throughput %]",350205,25.27,0,100
"Async Copy Engine Active [Cycles Active]",350205,40591.85,0,164990
"Async Compute in Flight [Throughput %]",350205,0.16,0,32

@ -0,0 +1,64 @@
Time (%),Total Time (ns),Num Calls,Avg (ns),Med (ns),Min (ns),Max (ns),StdDev (ns),Name
63.2,882304088002,1166913,756101.0,81462.0,1001,7559143983,12240943.4,pthread_cond_wait
21.0,292653885125,68267,4286901.2,807442.0,1001,2000062469,37737382.8,pthread_cond_timedwait
7.5,104841486605,91004,1152053.6,20959.0,1001,100202074,8650331.1,poll
4.9,67734436293,8697,7788253.0,6307024.0,6255048,10300238,1834125.8,usleep
2.5,34536907707,346,99817652.3,100108789.5,2816,100189751,5381658.2,select
0.3,4645633931,1571,2957119.0,14076.0,1002,42138249,7473785.3,pthread_rwlock_wrlock
0.3,3536426442,2306885,1533.0,1182.0,991,1362810,1510.7,backtrace
0.1,1059145369,668650,1584.0,1362.0,1001,715068,1528.2,pthread_cond_broadcast
0.1,988864255,209,4731407.9,1055376.0,1014959,200057791,19895982.3,nanosleep
0.1,798111557,50557,15786.4,8285.0,1001,8524665,131331.0,pthread_mutex_lock
0.0,611701922,5111,119683.4,25978.0,1002,31632084,622639.5,ioctl
0.0,376848621,60069,6273.6,6532.0,1001,405709,5146.3,read
0.0,258671149,1225,211160.1,9288.0,1032,4226160,544261.6,pthread_rwlock_rdlock
0.0,216400027,66377,3260.2,2605.0,1002,15570944,60498.3,open
0.0,111076542,69883,1589.5,1353.0,1001,135153,1407.8,pthread_cond_signal
0.0,87668535,4,21917133.8,21504729.5,59000,44600076,25244809.2,pthread_join
0.0,76843593,10735,7158.2,6763.0,3286,41598,1991.5,accept
0.0,49215712,27008,1822.3,1233.0,1001,85209,1666.7,recvmsg
0.0,44304559,11139,3977.4,3817.0,1032,111409,1700.7,writev
0.0,42663686,12430,3432.3,3266.0,1042,104415,1895.8,send
0.0,30965733,22153,1397.8,1312.0,1001,178734,1455.8,close
0.0,29504432,27813,1060.8,1032.0,1001,11281,189.5,openat64
0.0,27256936,4,6814234.0,8394872.5,312655,10154536,4413578.4,sem_wait
0.0,17492819,10623,1646.7,1462.0,1001,107611,2484.7,write
0.0,7107260,632,11245.7,4283.0,1583,176601,20095.6,munmap
0.0,5922929,294,20146.0,6432.0,1994,937504,77467.0,mmap64
0.0,5164606,1253,4121.8,1803.0,1011,132007,6746.5,fread
0.0,3959351,6,659891.8,631301.5,584835,828932,87442.6,fdatasync
0.0,3216472,83,38752.7,30297.0,13496,331160,37514.3,pthread_create
0.0,2682191,56,47896.3,1333.0,1002,2495521,333101.4,fgets
0.0,2655718,836,3176.7,2304.0,1001,47409,3405.4,open64
0.0,1652463,481,3435.5,1983.0,1001,292447,13794.6,mmap
0.0,1568473,786,1995.5,1603.0,1001,21009,1474.4,fopen
0.0,1126920,4,281730.0,148122.5,1463,829212,390211.3,futex
0.0,616506,195,3161.6,2665.0,1242,10821,1893.0,mprotect
0.0,615352,2,307676.0,307676.0,299200,316152,11986.9,sem_timedwait
0.0,530662,274,1936.7,1663.5,1272,5360,615.6,fopen64
0.0,524552,49,10705.1,8746.0,1002,23504,6281.1,fwrite
0.0,441809,102,4331.5,2685.0,2175,15509,2957.4,fflush
0.0,409157,193,2120.0,1423.0,1001,12915,2003.4,fclose
0.0,390698,251,1556.6,1182.0,1052,14697,1534.4,recv
0.0,134752,1,134752.0,134752.0,134752,134752,0.0,popen
0.0,121057,36,3362.7,3767.0,1002,10189,2399.8,stat64
0.0,84727,16,5295.4,5144.5,2655,10630,2359.5,socket
0.0,81942,33,2483.1,1884.0,1002,5591,1310.3,sendmsg
0.0,78662,27,2913.4,3376.0,1002,5932,1820.4,fstat64
0.0,60112,11,5464.7,5851.0,1814,9257,2203.0,connect
0.0,47769,4,11942.3,5981.0,4288,31519,13127.8,ftruncate
0.0,37511,7,5358.7,4409.0,2505,12593,3297.7,socketpair
0.0,19687,4,4921.8,5515.0,1463,7194,2580.7,bind
0.0,10198,4,2549.5,2374.0,1022,4428,1621.9,statx
0.0,9979,2,4989.5,4989.5,4178,5801,1147.6,getdelim
0.0,8647,4,2161.8,1568.0,1092,4419,1522.7,getc
0.0,6852,4,1713.0,1923.5,1001,2004,477.3,lockf
0.0,5390,1,5390.0,5390.0,5390,5390,0.0,pipe2
0.0,5109,1,5109.0,5109.0,5109,5109,0.0,memfd_create
0.0,4057,1,4057.0,4057.0,4057,4057,0.0,pipe
0.0,3827,3,1275.7,1132.0,1082,1613,293.2,sigaction
0.0,3796,2,1898.0,1898.0,1793,2003,148.5,flock
0.0,3556,3,1185.3,1222.0,1052,1282,119.3,stat
0.0,2244,2,1122.0,1122.0,1082,1162,56.6,fcntl
0.0,1583,1,1583.0,1583.0,1583,1583,0.0,pthread_mutex_trylock
0.0,1552,1,1552.0,1552.0,1552,1552,0.0,listen

@ -0,0 +1,41 @@
Time (%),Total Time (ns),Num Calls,Avg (ns),Med (ns),Min (ns),Max (ns),StdDev (ns),Name
47.3,4304563563,231,18634474.3,13822373.0,269513,50400702,13385050.1,vkCreateComputePipelines
10.0,906535248,793,1143171.8,97372.0,2966,36675320,3967519.3,vkCreateGraphicsPipelines
8.7,793761848,10286,77169.1,74760.0,16591,1147426,18530.7,vkQueuePresentKHR
7.9,715335192,166918,4285.5,2645.0,1643,1404546,6331.9,vkQueueSubmit
6.5,590495658,1,590495658.0,590495658.0,590495658,590495658,0.0,vkCreateDevice
5.6,512936386,2236013,229.4,170.0,40,197940,293.9,vkCmdBindPipeline
4.7,431480388,1566338,275.5,191.0,80,87854,626.4,vkCmdPipelineBarrier2KHR
2.6,235858982,166919,1413.0,1162.0,230,559757,2096.2,vkBeginCommandBuffer
1.8,163623284,166918,980.3,521.0,80,227395,1634.2,vkEndCommandBuffer
1.3,120651909,12142,9936.7,491.0,300,3236622,115194.4,vkWaitForFences
1.2,108109757,2,54054878.5,54054878.5,23852716,84257041,42712307.8,vkCreateSwapchainKHR
0.8,71024516,41161,1725.5,1934.0,661,96741,970.7,vkGetAccelerationStructureBuildSizesKHR
0.4,37508995,92,407706.5,127688.5,40025,5173172,783607.3,vkAllocateMemory
0.3,27598086,33,836305.6,338072.0,214851,15190000,2585500.1,vkCreateFence
0.1,13615140,567,24012.6,14076.0,862,188632,28324.9,vkCreateShaderModule
0.1,12712127,38647,328.9,221.0,40,11882,338.5,vkCreateBuffer
0.1,10958176,23960,457.4,421.0,120,16511,276.4,vkCreateAccelerationStructureKHR
0.1,9430649,10286,916.8,871.0,631,8456,289.7,vkAcquireNextImageKHR
0.1,7833984,1001,7826.2,872.0,200,2438580,107199.4,vkCreateImageView
0.1,6541995,20571,318.0,281.0,40,12764,226.9,vkDestroyAccelerationStructureKHR
0.1,4708503,2,2354251.5,2354251.5,1685431,3023072,945855.0,vkDeviceWaitIdle
0.1,4591969,400,11479.9,491.0,391,1596695,89890.3,vkBindImageMemory
0.0,3632879,2496,1455.5,1242.0,1001,12123,801.9,vkGetQueryPoolResults
0.0,3568885,38603,92.5,40.0,30,14186,165.0,vkBindBufferMemory2
0.0,3192728,10286,310.4,290.0,220,1733,71.8,vkCmdPipelineBarrier
0.0,2867124,10285,278.8,261.0,120,7985,162.6,vkResetEvent
0.0,2169561,11093,195.6,171.0,20,5711,111.1,vkResetQueryPoolEXT
0.0,939713,400,2349.3,862.0,181,307535,15384.4,vkCreateImage
0.0,743177,9,82575.2,42329.0,35215,329947,94644.5,vkFreeMemory
0.0,306413,197,1555.4,711.0,310,14918,2359.4,vkAllocateCommandBuffers
0.0,237721,96,2476.3,2419.5,1002,13456,1675.8,vkCreateEvent
0.0,71647,52,1377.8,917.0,211,5972,1216.1,vkCreateRenderPass2KHR
0.0,40338,44,916.8,812.0,451,2765,444.4,vkBindBufferMemory
0.0,39371,123,320.1,110.0,50,4910,770.7,vkCreateFramebuffer
0.0,3014,21,143.5,120.0,40,391,115.2,vkMapMemory
0.0,1633,2,816.5,816.5,631,1002,262.3,vkTrimCommandPool
0.0,1432,1,1432.0,1432.0,1432,1432,0.0,vkDestroySemaphore
0.0,1282,1,1282.0,1282.0,1282,1282,0.0,vkDestroyEvent
0.0,1032,1,1032.0,1032.0,1032,1032,0.0,vkCreateCommandPool
0.0,1002,1,1002.0,1002.0,1002,1002,0.0,vkCreateSemaphore


@ -0,0 +1,32 @@
metricName,samples,avg_value,min_value,max_value
"Vertex/Tess/Geometry Warps in Flight [Throughput %]",350249,0.48,0,10
"Vertex/Tess/Geometry Warps in Flight [Avg]",350249,11994.85,0,1250154
"Vertex/Tess/Geometry Warps in Flight [Avg Warps per Cycle]",350249,0.45,0,7
"Unallocated Warps in Active SMs [Throughput %]",350249,20.91,0,90
"Unallocated Warps in Active SMs [Avg]",350249,2674568.52,-8388551,8388603
"Unallocated Warps in Active SMs [Avg Warps per Cycle]",350249,20.07,0,86
"Tensor Active [Throughput %]",350249,0.0,0,0
"Sync Copy Engine Active [Throughput %]",350249,2.04,0,100
"Sync Copy Engine Active [Cycles Active]",350249,3210.05,0,164917
"Sync Compute in Flight [Throughput %]",350249,43.21,0,100
"SYS Clock Frequency [MHz]",350249,1599260779.65,1124820000,1665030000
"SMs Active [Throughput %]",350249,42.97,0,100
"SM Issue [Throughput %]",350249,13.97,0,99
"Pixel Warps in Flight [Throughput %]",350249,9.26,0,99
"Pixel Warps in Flight [Avg]",350249,1626352.79,0,18545892
"Pixel Warps in Flight [Avg Warps per Cycle]",350249,8.9,0,95
"PCIe Write Requests to BAR1 [Requests]",350249,48.49,0,530
"PCIe TX Throughput [Throughput %]",350249,1.39,1,17
"PCIe Read Requests to BAR1 [Requests]",350249,0.0,0,1
"PCIe RX Throughput [Throughput %]",350249,1.59,0,96
"GR Active [Throughput %]",350249,85.48,0,100
"GPU Active [Throughput %]",350249,90.8,0,100
"GPC Clock Frequency [MHz]",350249,1887936516.09,1345411429,1965055714
"DRAM Write Bandwidth [Throughput %]",350249,10.0,0,78
"DRAM Read Bandwidth [Throughput %]",350249,10.19,0,67
"Compute Warps in Flight [Throughput %]",350249,13.0,0,93
"Compute Warps in Flight [Avg]",350249,2341816.22,0,17218710
"Compute Warps in Flight [Avg Warps per Cycle]",350249,12.5,0,90
"Async Copy Engine Active [Throughput %]",350249,24.19,0,100
"Async Copy Engine Active [Cycles Active]",350249,38961.97,0,165002
"Async Compute in Flight [Throughput %]",350249,0.17,0,35


@ -0,0 +1,65 @@
Time (%),Total Time (ns),Num Calls,Avg (ns),Med (ns),Min (ns),Max (ns),StdDev (ns),Name
65.1,939238393966,1253746,749145.7,99516.0,1001,22225185125,27598875.4,pthread_cond_wait
19.6,283372591523,63863,4437195.1,667409.0,1001,2000066987,37775239.1,pthread_cond_timedwait
7.2,103523321736,83184,1244510.0,19371.5,1001,100158833,9028721.1,poll
4.7,67623762335,8679,7791653.7,6308036.0,6256440,10363397,1834071.0,usleep
2.4,34537204538,346,99818510.2,100108404.0,2074,100558361,5381791.1,select
0.3,4633653836,1769,2619363.4,10690.0,1002,81183859,7632062.5,pthread_rwlock_wrlock
0.2,3564450731,2289546,1556.8,1192.0,992,1166253,1451.7,backtrace
0.2,2186807640,213,10266702.5,1055706.0,1013547,200059383,37910858.8,nanosleep
0.1,1197512783,747301,1602.5,1373.0,1001,990875,2070.5,pthread_cond_broadcast
0.1,1000015122,63378,15778.6,8386.0,1001,8679265,127982.7,pthread_mutex_lock
0.0,692411875,5159,134214.4,28353.0,1002,37919402,814147.2,ioctl
0.0,376956878,58727,6418.8,6573.0,1001,215753,4816.1,read
0.0,262438870,1114,235582.5,8972.0,1012,3887476,618055.6,pthread_rwlock_rdlock
0.0,201249425,64299,3129.9,2665.0,1042,474829,2741.8,open
0.0,119755986,75469,1586.8,1362.0,1001,185868,1300.7,pthread_cond_signal
0.0,87476826,4,21869206.5,21608674.0,71504,44187974,25163290.6,pthread_join
0.0,78419182,10390,7547.6,7174.0,4308,51646,2186.9,accept
0.0,51411361,26962,1906.8,1243.0,1001,610343,4156.7,recvmsg
0.0,44191693,10805,4089.9,3897.0,1132,74259,1442.1,writev
0.0,42212528,12379,3410.0,3216.0,1062,75512,1516.0,send
0.0,31715010,29284,1083.0,1062.0,1001,9037,176.8,openat64
0.0,31357688,21714,1444.1,1342.0,1001,171231,1546.9,close
0.0,29161358,4,7290339.5,9321035.5,490768,10028519,4566762.7,sem_wait
0.0,19619685,741,26477.3,2946.0,1002,16496436,605868.8,open64
0.0,16704779,9960,1677.2,1473.0,1001,74089,1397.7,write
0.0,7170661,630,11382.0,4493.0,1273,203591,19759.6,munmap
0.0,6221613,295,21090.2,7584.0,2364,992327,79415.2,mmap64
0.0,5100259,1250,4080.2,1788.5,1002,64672,6212.7,fread
0.0,3222871,43,74950.5,1583.0,1163,3069094,467529.0,fgets
0.0,3213395,83,38715.6,31229.0,18024,351818,37357.9,pthread_create
0.0,3069705,5,613941.0,607577.0,476021,720739,90067.6,fdatasync
0.0,2574090,4,643522.5,146489.5,2185,2278926,1098695.2,futex
0.0,1547765,811,1908.5,1583.0,1002,18795,1290.6,fopen
0.0,1418724,493,2877.7,2074.0,1001,26530,2638.3,mmap
0.0,1075665,108,9959.9,6733.0,1002,27541,7008.8,fwrite
0.0,1014948,230,4412.8,2685.0,1663,20168,3313.5,fflush
0.0,955386,647,1476.6,1212.0,1002,29245,1752.5,recv
0.0,610149,195,3129.0,2465.0,1253,26650,2395.3,mprotect
0.0,580339,274,2118.0,1863.5,1243,8055,705.9,fopen64
0.0,551532,2,275766.0,275766.0,265417,286115,14635.7,sem_timedwait
0.0,405319,188,2156.0,1512.0,1001,12163,1920.5,fclose
0.0,143337,42,3412.8,2239.5,1012,13305,2842.6,stat64
0.0,96974,16,6060.9,5791.0,2254,13616,2885.0,socket
0.0,96759,33,2932.1,1973.0,1052,6552,1830.5,sendmsg
0.0,96400,1,96400.0,96400.0,96400,96400,0.0,popen
0.0,76692,27,2840.4,3095.0,1032,5711,1567.0,fstat64
0.0,64490,11,5862.7,6682.0,1853,8907,2102.0,connect
0.0,54001,4,13500.3,6026.5,4358,37590,16118.4,ftruncate
0.0,43031,7,6147.3,5110.0,2615,10780,2859.9,socketpair
0.0,23723,9,2635.9,3075.0,1012,5200,1568.2,statx
0.0,19458,4,4864.5,5766.5,1002,6923,2717.2,bind
0.0,11863,2,5931.5,5931.5,3948,7915,2805.1,getdelim
0.0,10210,4,2552.5,2139.5,1182,4749,1649.1,getc
0.0,7934,2,3967.0,3967.0,1562,6372,3401.2,pthread_mutex_trylock
0.0,7063,3,2354.3,2435.0,1903,2725,416.9,lockf
0.0,6843,1,6843.0,6843.0,6843,6843,0.0,pipe2
0.0,5691,1,5691.0,5691.0,5691,5691,0.0,memfd_create
0.0,5370,4,1342.5,1327.5,1012,1703,328.7,fcntl64
0.0,5030,4,1257.5,1217.5,1052,1543,233.6,stat
0.0,3988,1,3988.0,3988.0,3988,3988,0.0,pipe
0.0,3596,2,1798.0,1798.0,1733,1863,91.9,flock
0.0,3116,2,1558.0,1558.0,1293,1823,374.8,fcntl
0.0,2855,2,1427.5,1427.5,1122,1733,432.0,sigaction
0.0,1783,1,1783.0,1783.0,1783,1783,0.0,listen


@ -0,0 +1,39 @@
Time (%),Total Time (ns),Num Calls,Avg (ns),Med (ns),Min (ns),Max (ns),StdDev (ns),Name
47.1,4408914080,233,18922378.0,13455046.0,185617,56011291,14760998.1,vkCreateComputePipelines
11.2,1045885246,797,1312277.6,92924.0,3105,36430873,4324767.0,vkCreateGraphicsPipelines
9.5,888494352,11531,77052.7,76452.0,16360,900363,19476.1,vkQueuePresentKHR
7.8,731581920,186589,3920.8,2625.0,1593,1639285,5212.1,vkQueueSubmit
6.4,595380851,2528014,235.5,170.0,40,182351,323.2,vkCmdBindPipeline
5.8,541364941,1,541364941.0,541364941.0,541364941,541364941,0.0,vkCreateDevice
5.2,485559049,1798810,269.9,190.0,80,941591,933.6,vkCmdPipelineBarrier2KHR
2.0,188899262,186590,1012.4,942.0,220,901967,2432.1,vkBeginCommandBuffer
1.3,120960739,186589,648.3,330.0,80,120916,1027.9,vkEndCommandBuffer
0.9,81326038,46147,1762.3,1944.0,741,82524,908.0,vkGetAccelerationStructureBuildSizesKHR
0.7,61601253,3,20533751.0,3293287.0,3013755,55294211,30103765.9,vkCreateSwapchainKHR
0.5,42258552,11627,3634.5,461.0,301,2606383,62274.9,vkWaitForFences
0.4,37227731,95,391870.9,141975.0,37510,4369409,644086.2,vkAllocateMemory
0.2,19195932,33,581694.9,325579.0,208410,8169945,1367481.9,vkCreateFence
0.1,13706556,582,23550.8,13626.0,531,241291,29168.6,vkCreateShaderModule
0.1,13512957,9203,1468.3,1252.0,1001,37961,840.0,vkGetQueryPoolResults
0.1,10527142,26275,400.7,360.0,110,90429,610.2,vkCreateAccelerationStructureKHR
0.1,10302597,11531,893.5,842.0,622,7554,252.1,vkAcquireNextImageKHR
0.1,9980822,23707,421.0,250.0,40,17412,457.5,vkCreateBuffer
0.1,7822356,4,1955589.0,1670954.5,1551050,2929397,651664.2,vkDeviceWaitIdle
0.1,7818709,23063,339.0,310.0,40,9368,202.2,vkDestroyAccelerationStructureKHR
0.1,6815677,572,11915.5,455.5,400,1607465,91981.0,vkBindImageMemory
0.1,5674639,1485,3821.3,732.0,180,1609990,47166.1,vkCreateImageView
0.0,4289993,11532,372.0,281.0,120,34204,391.2,vkResetEvent
0.0,3529485,11533,306.0,281.0,230,39935,389.2,vkCmdPipelineBarrier
0.0,3285335,23663,138.8,41.0,30,38682,337.6,vkBindBufferMemory2
0.0,2486938,12339,201.6,180.0,20,7645,137.7,vkResetQueryPoolEXT
0.0,2396902,12,199741.8,66895.0,34244,1049232,341650.1,vkFreeMemory
0.0,927372,572,1621.3,601.0,160,98173,4412.2,vkCreateImage
0.0,225895,103,2193.2,2334.0,1021,7464,995.8,vkCreateEvent
0.0,212807,43,4949.0,3657.0,611,19857,4522.9,vkAllocateCommandBuffers
0.0,72741,53,1372.5,1051.0,200,3727,965.4,vkCreateRenderPass2KHR
0.0,63866,196,325.8,130.0,50,6011,775.0,vkCreateFramebuffer
0.0,41274,44,938.0,871.5,531,1743,296.2,vkBindBufferMemory
0.0,4037,2,2018.5,2018.5,892,3145,1593.1,vkTrimCommandPool
0.0,3407,21,162.2,140.0,30,511,152.8,vkMapMemory
0.0,1182,1,1182.0,1182.0,1182,1182,0.0,vkDestroySemaphore
0.0,811,1,811.0,811.0,811,811,0.0,vkCreateCommandPool


@ -0,0 +1,32 @@
metricName,samples,avg_value,min_value,max_value
"Vertex/Tess/Geometry Warps in Flight [Throughput %]",350101,0.37,0,53
"Vertex/Tess/Geometry Warps in Flight [Avg]",350101,12165.35,0,2025158
"Vertex/Tess/Geometry Warps in Flight [Avg Warps per Cycle]",350101,0.3,0,34
"Unallocated Warps in Active SMs [Throughput %]",350101,11.51,0,94
"Unallocated Warps in Active SMs [Avg]",350101,1259925.78,-8388465,8388468
"Unallocated Warps in Active SMs [Avg Warps per Cycle]",350101,11.05,0,90
"Tensor Active [Throughput %]",350101,0.0,0,0
"Sync Copy Engine Active [Throughput %]",350101,0.95,0,100
"Sync Copy Engine Active [Cycles Active]",350101,1384.9,0,164952
"Sync Compute in Flight [Throughput %]",350101,22.08,0,100
"SYS Clock Frequency [MHz]",350101,1239952918.07,524860000,1650150000
"SMs Active [Throughput %]",350101,23.22,0,100
"SM Issue [Throughput %]",350101,6.71,0,98
"Pixel Warps in Flight [Throughput %]",350101,4.68,0,98
"Pixel Warps in Flight [Avg]",350101,705372.0,0,18250302
"Pixel Warps in Flight [Avg Warps per Cycle]",350101,4.5,0,94
"PCIe Write Requests to BAR1 [Requests]",350101,25.53,0,1565
"PCIe TX Throughput [Throughput %]",350101,1.25,0,41
"PCIe Read Requests to BAR1 [Requests]",350101,0.0,0,1
"PCIe RX Throughput [Throughput %]",350101,1.4,0,98
"GR Active [Throughput %]",350101,44.72,0,100
"GPU Active [Throughput %]",350101,49.55,0,100
"GPC Clock Frequency [MHz]",350101,1377049332.09,367524286,1965071429
"DRAM Write Bandwidth [Throughput %]",350101,5.6,0,84
"DRAM Read Bandwidth [Throughput %]",350101,8.04,0,85
"Compute Warps in Flight [Throughput %]",350101,7.03,0,99
"Compute Warps in Flight [Avg]",350101,1039013.63,0,17187013
"Compute Warps in Flight [Avg Warps per Cycle]",350101,6.75,0,95
"Async Copy Engine Active [Throughput %]",350101,10.6,0,100
"Async Copy Engine Active [Cycles Active]",350101,16184.16,0,165001
"Async Compute in Flight [Throughput %]",350101,0.06,0,31


@ -0,0 +1,65 @@
Time (%),Total Time (ns),Num Calls,Avg (ns),Med (ns),Min (ns),Max (ns),StdDev (ns),Name
66.4,966436705693,674529,1432757.8,92112.0,1001,9178558622,21169873.5,pthread_cond_wait
17.4,252710188786,31653,7983767.4,997705.0,1001,2000056483,42373635.1,pthread_cond_timedwait
7.0,101523471662,41663,2436777.8,19406.0,1001,100191074,12645054.7,poll
4.6,67673864352,8686,7791142.6,6310647.5,6255339,11320895,1834156.6,usleep
2.4,34536986482,347,99530220.4,100108890.0,1022,100164313,7588935.6,select
1.6,23806358988,333,71490567.5,1058177.0,1007743,200233519,94227923.5,nanosleep
0.3,4632384275,1416,3271457.8,28112.5,1002,40854143,7831839.8,pthread_rwlock_wrlock
0.1,1591439644,988685,1609.7,1202.0,961,744090,1727.4,backtrace
0.0,668420846,5394,123919.3,25643.0,1002,43085262,810565.8,ioctl
0.0,658764946,29991,21965.4,8436.0,1001,7146675,159720.1,pthread_mutex_lock
0.0,569132554,337258,1687.5,1403.0,1001,759048,2324.0,pthread_cond_broadcast
0.0,260738440,1173,222283.4,8967.0,1012,4850494,566673.3,pthread_rwlock_rdlock
0.0,188226238,26395,7131.1,6612.0,1001,251721,6291.5,read
0.0,99266779,28043,3539.8,2795.0,1012,90960,2482.6,open
0.0,87413517,4,21853379.3,21359528.5,42640,44651820,25174253.8,pthread_join
0.0,82191035,52222,1573.9,1322.0,1001,295793,2097.1,pthread_cond_signal
0.0,68960671,2980,23141.2,10214.0,1453,184726,25424.6,munmap
0.0,39711726,12409,3200.2,3065.0,1002,52027,1135.3,send
0.0,38470600,4347,8849.9,8205.0,4728,50314,2723.1,accept
0.0,36566418,17756,2059.4,1523.0,1001,70181,1445.0,recvmsg
0.0,30913642,28685,1077.7,1042.0,1001,15459,212.1,openat64
0.0,26981369,4,6745342.3,8128479.5,351627,10372783,4393178.1,sem_wait
0.0,22677741,994,22814.6,2414.0,1001,18830749,597172.7,open64
0.0,20507089,4762,4306.4,4097.0,1002,93785,2130.6,writev
0.0,15299486,9722,1573.7,1403.0,1001,143327,1817.7,close
0.0,14304866,9050,1580.6,1433.0,1001,27270,1002.8,write
0.0,10226041,1693,6040.2,5410.0,1001,118652,4851.3,mmap
0.0,8596595,312,27553.2,7990.0,2404,994769,108895.5,mmap64
0.0,5005413,4,1251353.3,130684.0,1432,4742613,2330667.4,futex
0.0,4993302,1208,4133.5,1774.0,1001,95408,6514.9,fread
0.0,3177395,83,38281.9,32491.0,16751,102342,16476.8,pthread_create
0.0,3149925,5,629985.0,635056.0,495365,744571,89829.8,fdatasync
0.0,2973136,55,54057.0,1342.0,1002,2791656,376034.6,fgets
0.0,1552479,811,1914.3,1583.0,1002,18544,1289.9,fopen
0.0,633644,2,316822.0,316822.0,269133,364511,67442.4,sem_timedwait
0.0,590921,195,3030.4,2475.0,1182,11140,1773.9,mprotect
0.0,559734,274,2042.8,1853.5,1233,6532,647.5,fopen64
0.0,487806,276,1767.4,1272.0,1001,30487,2612.9,recv
0.0,429155,195,2200.8,1492.0,1001,13656,2095.0,fclose
0.0,191802,36,5327.8,3000.5,1684,13054,3745.7,fflush
0.0,168936,18,9385.3,6763.0,1052,18916,6242.2,fwrite
0.0,134122,43,3119.1,1202.0,1012,10209,2452.5,stat64
0.0,110347,1,110347.0,110347.0,110347,110347,0.0,popen
0.0,100831,33,3055.5,2264.0,1132,6793,1787.8,sendmsg
0.0,88567,30,2952.2,2505.0,1012,6672,1878.2,fstat64
0.0,85238,16,5327.4,4503.5,2865,12023,2573.0,socket
0.0,65150,11,5922.7,6051.0,3266,9949,1968.2,connect
0.0,51847,4,12961.8,5855.5,4749,35387,14960.6,ftruncate
0.0,39053,7,5579.0,4719.0,3576,9477,2198.6,socketpair
0.0,27290,9,3032.2,3857.0,1032,5189,1902.1,statx
0.0,17302,3,5767.3,6362.0,1082,9858,4418.1,getdelim
0.0,15838,4,3959.5,4222.5,1302,6091,1989.5,bind
0.0,13075,6,2179.2,1327.0,1002,5501,1757.7,getc
0.0,7404,1,7404.0,7404.0,7404,7404,0.0,pipe2
0.0,6732,3,2244.0,1643.0,1222,3867,1421.2,flock
0.0,6162,3,2054.0,2054.0,1774,2334,280.0,lockf
0.0,5320,1,5320.0,5320.0,5320,5320,0.0,memfd_create
0.0,4818,4,1204.5,1217.0,1021,1363,168.9,fcntl64
0.0,3566,3,1188.7,1182.0,1022,1362,170.1,stat
0.0,2695,1,2695.0,2695.0,2695,2695,0.0,pipe
0.0,2254,2,1127.0,1127.0,1032,1222,134.4,pthread_mutex_trylock
0.0,1763,1,1763.0,1763.0,1763,1763,0.0,listen
0.0,1573,1,1573.0,1573.0,1573,1573,0.0,sigaction
0.0,1253,1,1253.0,1253.0,1253,1253,0.0,fcntl
65 0.0 1253 1 1253.0 1253.0 1253 1253 0.0 fcntl


@ -0,0 +1,40 @@
Time (%),Total Time (ns),Num Calls,Avg (ns),Med (ns),Min (ns),Max (ns),StdDev (ns),Name
57.4,4437743635,231,19211011.4,14598473.0,203260,51954105,13533933.3,vkCreateComputePipelines
14.6,1132343222,816,1387675.5,107686.5,3076,40880372,4504039.9,vkCreateGraphicsPipelines
7.4,572377674,1,572377674.0,572377674.0,572377674,572377674,0.0,vkCreateDevice
4.8,368437156,4590,80269.5,79609.0,16642,918036,23607.1,vkQueuePresentKHR
4.2,321682595,74393,4324.1,2665.0,1623,2359842,13224.8,vkQueueSubmit
3.5,268416550,1007615,266.4,190.0,40,2721608,2734.1,vkCmdBindPipeline
2.7,206427138,724923,284.8,191.0,80,628425,1315.9,vkCmdPipelineBarrier2KHR
1.0,76886778,74394,1033.5,942.0,230,308968,1306.9,vkBeginCommandBuffer
0.8,62614508,118,530631.4,174531.0,37350,6218486,1003175.7,vkAllocateMemory
0.7,57618875,3,19206291.7,3223356.0,3011360,51384159,27867052.1,vkCreateSwapchainKHR
0.7,54538392,74393,733.1,360.0,80,56386,1219.7,vkEndCommandBuffer
0.5,37513539,18379,2041.1,1973.0,712,34485,1433.9,vkGetAccelerationStructureBuildSizesKHR
0.4,32629253,32,1019664.2,334039.5,217727,20860812,3625946.5,vkCreateFence
0.2,13832441,582,23767.1,14081.0,792,170950,28061.6,vkCreateShaderModule
0.2,11781177,15,785411.8,124453.0,35617,8939063,2266145.9,vkFreeMemory
0.1,11528044,4634,2487.7,475.5,311,2592026,48888.4,vkWaitForFences
0.1,11144794,1834,6076.8,480.0,390,1090429,50692.9,vkBindImageMemory
0.1,10091060,21758,463.8,241.0,40,12684,560.5,vkCreateBuffer
0.1,9643758,5019,1921.5,792.0,171,1059260,20884.9,vkCreateImageView
0.1,7363917,4,1840979.3,1660674.5,1567842,2474726,425162.8,vkDeviceWaitIdle
0.1,5886602,11884,495.3,411.0,130,31839,460.6,vkCreateAccelerationStructureKHR
0.1,5411476,1763,3069.5,1613.0,1001,17853,3037.9,vkGetQueryPoolResults
0.1,4993289,4590,1087.9,902.0,631,7855,510.0,vkAcquireNextImageKHR
0.1,4377490,1834,2386.9,521.0,160,2381162,55670.5,vkCreateImage
0.0,3287534,21695,151.5,50.0,30,32801,320.0,vkBindBufferMemory2
0.0,3161060,9181,344.3,301.0,40,6702,190.2,vkDestroyAccelerationStructureKHR
0.0,1724191,4590,375.6,281.0,110,2314,234.4,vkResetEvent
0.0,1396906,4591,304.3,281.0,230,14598,221.7,vkCmdPipelineBarrier
0.0,1130605,5421,208.6,180.0,20,6702,153.0,vkResetQueryPoolEXT
0.0,320001,130,2461.5,2405.0,1012,25127,2232.0,vkCreateEvent
0.0,197365,613,322.0,200.0,50,4579,390.2,vkCreateFramebuffer
0.0,195445,38,5143.3,4022.5,421,26029,4610.3,vkAllocateCommandBuffers
0.0,76176,52,1464.9,901.5,230,9788,1542.9,vkCreateRenderPass2KHR
0.0,71405,63,1133.4,932.0,401,4799,838.5,vkBindBufferMemory
0.0,6792,30,226.4,195.0,41,752,198.5,vkMapMemory
0.0,1753,2,876.5,876.5,751,1002,177.5,vkTrimCommandPool
0.0,1383,1,1383.0,1383.0,1383,1383,0.0,vkDestroyEvent
0.0,1042,1,1042.0,1042.0,1042,1042,0.0,vkDestroySemaphore
0.0,812,1,812.0,812.0,812,812,0.0,vkCreateCommandPool

@ -0,0 +1,32 @@
metricName,samples,avg_value,min_value,max_value
"Async Compute in Flight [Throughput %]",950083,0.0229106299133865,0,31
"Async Copy Engine Active [Cycles Active]",950083,38455.0993471097,0,163502
"Async Copy Engine Active [Throughput %]",950083,24.3463129010834,0,100
"Compute Warps in Flight [Avg Warps per Cycle]",950083,12.80763259631,0,90
"Compute Warps in Flight [Avg]",950083,2323037.42368404,0,17059537
"Compute Warps in Flight [Throughput %]",950083,13.3210919467036,0,94
"DRAM Read Bandwidth [Throughput %]",950083,10.7338569367097,0,86
"DRAM Write Bandwidth [Throughput %]",950083,9.84654182845078,0,84
"GPC Clock Frequency [MHz]",950083,1717372249.49787,569544286,1965027143
"GPU Active [Throughput %]",950083,80.6042019486719,0,100
"GR Active [Throughput %]",950083,76.6064217547309,0,100
"PCIe RX Throughput [Throughput %]",950083,1.5045643380631,0,98
"PCIe Read Requests to BAR1 [Requests]",950083,1.36830150628945e-05,0,1
"PCIe TX Throughput [Throughput %]",950083,1.39019959308818,0,40
"PCIe Write Requests to BAR1 [Requests]",950083,45.6542523126927,0,864
"Pixel Warps in Flight [Avg Warps per Cycle]",950083,8.71434179961119,0,93
"Pixel Warps in Flight [Avg]",950083,905560.791771877,-8388505,8388149
"Pixel Warps in Flight [Throughput %]",950083,9.07152532989223,0,97
"SM Issue [Throughput %]",950083,13.8823755398213,0,77
"SMs Active [Throughput %]",950083,42.6547027996501,0,100
"SYS Clock Frequency [MHz]",950083,1464888860.94162,779690000,1650150000
"Sync Compute in Flight [Throughput %]",950083,42.9391589997927,0,100
"Sync Copy Engine Active [Cycles Active]",950083,1828.06791722407,0,163454
"Sync Copy Engine Active [Throughput %]",950083,1.19980570118611,0,100
"Tensor Active [Throughput %]",950083,0.0,0,0
"Unallocated Warps in Active SMs [Avg Warps per Cycle]",950083,19.5389318617426,0,91
"Unallocated Warps in Active SMs [Avg]",950083,2624316.545225,-8388583,8388581
"Unallocated Warps in Active SMs [Throughput %]",950083,20.3578518929399,0,95
"Vertex/Tess/Geometry Warps in Flight [Avg Warps per Cycle]",950083,0.440807803107728,0,27
"Vertex/Tess/Geometry Warps in Flight [Avg]",950083,12576.1936525546,0,2576053
"Vertex/Tess/Geometry Warps in Flight [Throughput %]",950083,0.481443200225665,0,42

@ -0,0 +1,66 @@
Time (%),Total Time (ns),Num Calls,Avg (ns),Med (ns),Min (ns),Max (ns),StdDev (ns),Name
69.0,2638096289950,2901939,909080.5,53710.0,1001,72246678091,60228137.5,pthread_cond_wait
17.8,681232429013,218377,3119524.6,310209.0,1001,61108235908,134160829.9,pthread_cond_timedwait
8.0,304011116725,204020,1490104.5,20749.0,1001,100165731,9602909.9,poll
2.5,94505613201,946,99900225.4,100113483.5,2705,100174076,4599901.0,select
2.4,93104642661,14761,6307475.3,6307706.0,6256721,7003606,10135.6,usleep
0.1,4779434926,3175,1505333.8,8345.0,1001,50953377,5634704.7,pthread_rwlock_wrlock
0.1,2388475375,1554030,1537.0,1323.0,1001,760450,1334.4,pthread_cond_broadcast
0.0,1383489571,205,6748729.6,1057145.0,1013984,200061988,27806119.4,nanosleep
0.0,828054573,5779,143286.8,51646.0,1032,30877115,645711.0,ioctl
0.0,524192152,23534,22273.8,7995.0,1001,9414411,223793.1,pthread_mutex_lock
0.0,430290062,250152,1720.1,1382.0,1001,167283,1483.5,pthread_cond_signal
0.0,269098076,1008,266962.4,12508.0,1052,4451488,635637.4,pthread_rwlock_rdlock
0.0,118376802,60981,1941.2,1373.0,1001,296504,2094.2,recvmsg
0.0,116650912,34929,3339.7,3066.0,1022,329706,2817.5,send
0.0,106271585,28033,3790.9,3547.0,1012,52238,1246.4,writev
0.0,92149772,4,23037443.0,22215069.0,65893,47653741,26542647.2,pthread_join
0.0,77095530,3009,25621.6,11682.0,1403,209591,26233.1,munmap
0.0,43781970,26502,1652.0,1433.0,1001,98965,1510.7,write
0.0,31401091,29445,1066.4,1042.0,1001,10330,232.0,openat64
0.0,28658399,4,7164599.8,8766099.5,398705,10727495,4606222.3,sem_wait
0.0,21058353,3176,6630.5,1212.0,1001,375302,13470.9,read
0.0,16097508,1,16097508.0,16097508.0,16097508,16097508,0.0,waitpid
0.0,11545409,1619,7131.2,6131.0,1472,73017,4087.8,mmap
0.0,7491899,307,24403.6,7935.0,2194,977235,93545.8,mmap64
0.0,5173017,1452,3562.7,1482.0,1001,78697,5893.0,fread
0.0,4010136,80,50126.7,44863.5,28353,107180,14408.6,pthread_create
0.0,2477958,44,56317.2,1613.0,1002,2325034,350036.5,fgets
0.0,2267572,738,3072.6,2355.0,1002,43220,3489.6,open64
0.0,1798878,1042,1726.4,1503.0,1002,16831,1033.1,fopen
0.0,1459879,4,364969.8,133224.0,1213,1192218,565217.9,futex
0.0,638315,186,3431.8,2815.5,1303,28984,2550.4,mprotect
0.0,609134,235,2592.1,1272.0,1002,44383,4000.2,close
0.0,588290,297,1980.8,1352.0,1032,12654,1754.6,open
0.0,548164,2,274082.0,274082.0,262250,285914,16733.0,sem_timedwait
0.0,533925,274,1948.6,1738.0,1192,6412,573.9,fopen64
0.0,525527,357,1472.1,1222.0,1001,10921,1187.9,recv
0.0,404101,183,2208.2,1462.0,1022,11692,1956.2,fclose
0.0,226864,1,226864.0,226864.0,226864,226864,0.0,fork
0.0,193521,45,4300.5,4389.0,1102,9598,2130.9,fstat64
0.0,147756,1,147756.0,147756.0,147756,147756,0.0,popen
0.0,126324,40,3158.1,3401.0,1002,10289,2357.5,stat64
0.0,101892,37,2753.8,2425.0,1172,6843,1489.7,sendmsg
0.0,68849,15,4589.9,3356.0,2285,13996,3074.1,socket
0.0,56233,11,5112.1,5330.0,1843,8956,1875.6,connect
0.0,28874,7,4124.9,3747.0,2455,7784,1686.9,socketpair
0.0,19287,9,2143.0,1353.0,1012,4368,1503.9,statx
0.0,12704,5,2540.8,2565.0,1012,4769,1589.4,getc
0.0,10600,3,3533.3,2785.0,1062,6753,2918.4,bind
0.0,9418,2,4709.0,4709.0,4108,5310,849.9,ftruncate
0.0,6973,2,3486.5,3486.5,3306,3667,255.3,pipe
0.0,5952,1,5952.0,5952.0,5952,5952,0.0,pipe2
0.0,5892,3,1964.0,2024.0,1463,2405,473.9,lockf
0.0,4629,1,4629.0,4629.0,4629,4629,0.0,memfd_create
0.0,4589,1,4589.0,4589.0,4589,4589,0.0,fstat
0.0,4478,1,4478.0,4478.0,4478,4478,0.0,getdelim
0.0,3526,1,3526.0,3526.0,3526,3526,0.0,fstatat
0.0,3467,3,1155.7,1192.0,1002,1273,139.1,stat
0.0,3386,2,1693.0,1693.0,1302,2084,553.0,fwrite
0.0,3106,1,3106.0,3106.0,3106,3106,0.0,fwrite_unlocked
0.0,3106,2,1553.0,1553.0,1232,1874,454.0,fcntl64
0.0,2835,2,1417.5,1417.5,1212,1623,290.6,flock
0.0,2665,1,2665.0,2665.0,2665,2665,0.0,fputs_unlocked
0.0,1433,1,1433.0,1433.0,1433,1433,0.0,prctl
0.0,1232,1,1232.0,1232.0,1232,1232,0.0,sigaction
0.0,1102,1,1102.0,1102.0,1102,1102,0.0,fcntl

@ -2,6 +2,6 @@
 "BuildId": "37670630",
 "Modules":
 {
-"BulletHellCPP": "libUnrealEditor-BulletHellCPP-6941.so"
+"BulletHellCPP": "libUnrealEditor-BulletHellCPP.so"
 }
 }

Some files were not shown because too many files have changed in this diff.