Fast CRC Computation for iSCSI Polynomial Using CRC32 Instruction

White Paper
Vinodh Gopal, Jim Guilford, Erdinc Ozturk, Gil Wolrich, Wajdi Feghali, Martin Dixon
IA Architects, Intel Corporation
Deniz Karakoyunlu
PhD, Worcester Polytechnic Institute
April 2011
323405
Executive Summary
Cyclic Redundancy Check (CRC) codes are widely used for integrity checking of data in fields such as storage and networking. There is an ever-increasing need for very high-speed CRC computation on processors for end-to-end integrity checks. We present fast and efficient methods of computing CRC on Intel processors for the fixed (degree-32) iSCSI polynomial, using the CRC32 instruction present in the Intel® Core™ i5 processor 650. Instead of computing the CRC of the entire message with a traditional linear method, we split an arbitrary-length buffer into a number of smaller fixed-size segments, compute the CRCs of these segments in parallel, and then recombine the partial CRCs of the segments into the effective CRC of the whole buffer. Computing the segment CRCs in parallel maximizes the throughput of the CRC32 instruction. We show an efficient method for data buffers of arbitrary length. The final recombination of CRCs adds overhead; on the Nehalem microarchitecture it can be implemented with lookup tables, and we show how to do this with as few tables as possible while giving excellent overall performance across the range of sizes. The PCLMULQDQ instruction in the Westmere microarchitecture allows efficient recombination of CRCs without lookup tables. The various methods are thoroughly explained in this paper with real code examples.
These functions work across an arbitrary range of buffer sizes and deliver excellent performance across that range, achieving nearly 3X the performance of a linear implementation using CRC32. For instance, a single core of an Intel® Core™ i5 processor 650 can compute the CRC of a 1024-byte buffer at the rate of 0.145 cycles/byte with a single thread!¹
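
The split-and-recombine idea described in this summary can be illustrated with a minimal, unoptimized Python sketch. It is not the paper's implementation: the function names (`crc_update`, `shift_through_zeros`, `crc32c_segmented`), the choice of three segments per block, and the `seg_len` parameter are all assumptions made for illustration. The bitwise `crc_update` stands in for the CRC32 instruction, and the recombination step simply feeds zero bytes through the CRC, where an optimized version would use the lookup tables or PCLMULQDQ mentioned above. The sketch relies only on the fact that the raw CRC update is linear over GF(2).

```python
POLY = 0x82F63B78  # CRC-32C (iSCSI) polynomial in bit-reflected form


def crc_update(state, data):
    """Raw CRC-32C update with no initial or final inversion.

    Stand-in for a byte-wise CRC32 instruction loop (an assumption
    of this sketch; real code would use the hardware instruction).
    """
    for byte in data:
        state ^= byte
        for _ in range(8):
            state = (state >> 1) ^ (POLY if state & 1 else 0)
    return state


def shift_through_zeros(state, nbytes):
    """Advance a partial CRC past nbytes of data it did not see.

    Equivalent to multiplying by x^(8*nbytes) modulo the polynomial.
    Done here by brute force; this is the step that tables or a
    carry-less multiply would accelerate.
    """
    return crc_update(state, bytes(nbytes))


def crc32c_linear(buf):
    """Reference: conventional byte-by-byte CRC-32C."""
    return crc_update(0xFFFFFFFF, buf) ^ 0xFFFFFFFF


def crc32c_segmented(buf, seg_len):
    """CRC-32C via three independent fixed-size segments per block.

    The three partial CRCs have no data dependency on one another,
    so hardware could compute them in parallel. Here they are
    computed sequentially, and only the recombination is shown.
    """
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    state = 0xFFFFFFFF
    pos = 0
    while len(buf) - pos >= 3 * seg_len:
        s0 = buf[pos:pos + seg_len]
        s1 = buf[pos + seg_len:pos + 2 * seg_len]
        s2 = buf[pos + 2 * seg_len:pos + 3 * seg_len]
        a = crc_update(state, s0)  # carries the running CRC
        b = crc_update(0, s1)      # independent partial CRC
        c = crc_update(0, s2)      # independent partial CRC
        # Recombine: shift each partial CRC past the data after it.
        state = (shift_through_zeros(a, 2 * seg_len)
                 ^ shift_through_zeros(b, seg_len)
                 ^ c)
        pos += 3 * seg_len
    # Any leftover bytes are handled with the linear method.
    state = crc_update(state, buf[pos:])
    return state ^ 0xFFFFFFFF
```

For any buffer and any positive `seg_len`, `crc32c_segmented(buf, seg_len)` should agree with `crc32c_linear(buf)`; for example, both give `0xE3069283` for the standard check string `b"123456789"`.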
