# pytorch-gpu-case

**Repository Path**: zeasa/pytorch-gpu-case

## Basic Information

- **Project Name**: pytorch-gpu-case
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-01-26
- **Last Updated**: 2026-01-26

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

### ResNet-50 BF16 benchmark

Example invocation:

```shell
python resnet50-bf16.py -w 10 -b 4 -i 100 -t 1
```

Supported command-line arguments:

```python
parser.add_argument('-w', '--warmup', type=int, default=50, help='Number of warmup iterations')
parser.add_argument('-b', '--batch', type=int, default=16, help='Batch size')
parser.add_argument('-i', '--iter', type=int, default=1000, help='Number of test iterations')
parser.add_argument('-t', '--thread', type=int, help='Number of threads')
```

### Llama-2-7B BF16 benchmark

Example invocation:

```shell
python llama2-7b-bf16.py -b 4 -i 1 -s 1024 -o 16 -d -t 1
```

Supported command-line arguments:

```python
parser.add_argument('-t', '--thread', type=int, help='Number of threads for torch')
parser.add_argument('-b', '--batch', type=int, default=1, help='Batch size for inference')
parser.add_argument('-i', '--iter', type=int, default=5, help='Number of iterations for each benchmark')
parser.add_argument('-s', '--sequence', type=int, default=128, help='Sequence length for throughput test (single int), default=128')
parser.add_argument('-o', '--outputlen', type=int, default=128, help='Generation length for generation test (single int), default=128')
parser.add_argument('-p', '--inference', action='store_true', help='Run inference benchmark')
parser.add_argument('-d', '--generation', action='store_true', help='Run generation benchmark')
```

### Profiling with Nsight Systems

```shell
nsys profile --trace cuda,osrt,nvtx --gpu-metrics-devices=all --cuda-memory-usage true --force-overwrite true --python-backtrace=cuda --output profile_resnet_f32_bs1 python resnet50-gpu.py -b 1
```
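The llama2 CLI above exposes two `store_true` mode flags: `-p` selects the inference (prefill throughput) benchmark and `-d` the generation benchmark. The following is a runnable sketch of just the parser wiring, assembled from the `add_argument` lines listed in the README; the `build_parser` function name and the sample argv are illustrative, not taken from the repository.

```python
import argparse

def build_parser():
    # Parser lines reproduced from the README; help strings trimmed.
    parser = argparse.ArgumentParser(description='llama2-7b-bf16 benchmark (sketch)')
    parser.add_argument('-t', '--thread', type=int, help='Number of threads for torch')
    parser.add_argument('-b', '--batch', type=int, default=1, help='Batch size for inference')
    parser.add_argument('-i', '--iter', type=int, default=5, help='Iterations per benchmark')
    parser.add_argument('-s', '--sequence', type=int, default=128, help='Sequence length')
    parser.add_argument('-o', '--outputlen', type=int, default=128, help='Generation length')
    parser.add_argument('-p', '--inference', action='store_true', help='Run inference benchmark')
    parser.add_argument('-d', '--generation', action='store_true', help='Run generation benchmark')
    return parser

if __name__ == "__main__":
    # Same flags as the README's example invocation.
    args = build_parser().parse_args(['-b', '4', '-i', '1', '-s', '1024', '-o', '16', '-d', '-t', '1'])
    # -d was passed, -p was not, so only the generation benchmark would run.
    print(args.batch, args.sequence, args.outputlen, args.generation, args.inference)
    # → 4 1024 16 True False
```

Because both flags default to `False`, passing neither `-p` nor `-d` would run no benchmark at all, which is why the example invocation always includes at least one of them.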
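The resnet50 flags (`--warmup`, `--iter`, `--batch`) suggest the usual warmup-then-timed-loop throughput harness. Below is a minimal, self-contained sketch of that pattern, assuming the script reports samples/second; `run_benchmark` and the stand-in workload are hypothetical names, not code from the repository, and the real script would run a ResNet-50 forward pass instead.

```python
import time

def run_benchmark(workload, warmup, iters, batch):
    # Warmup iterations are excluded from timing so one-time costs
    # (allocator warm-up, kernel autotuning) do not skew the result.
    for _ in range(warmup):
        workload()
    start = time.perf_counter()
    for _ in range(iters):
        workload()
    elapsed = time.perf_counter() - start
    # Throughput in samples/second: total samples processed over wall time.
    return iters * batch / elapsed

if __name__ == "__main__":
    # Stand-in CPU workload; flag values mirror the README's example
    # invocation (-w 10 -b 4 -i 100).
    throughput = run_benchmark(lambda: sum(range(10_000)), warmup=10, iters=100, batch=4)
    print(f"{throughput:.1f} samples/s")
```

On a GPU the timed loop would additionally need a device synchronization (e.g. `torch.cuda.synchronize()`) before reading the clock, since CUDA kernel launches are asynchronous.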