    White Paper
    Vinodh Gopal, Jim Guilford, Erdinc Ozturk, Gil Wolrich, Wajdi Feghali, Martin Dixon (IA Architects, Intel Corporation); Deniz Karakoyunlu (PhD, Worcester Polytechnic Institute)
    Fast CRC Computation for iSCSI Polynomial Using CRC32 Instruction
    April 2011
    323405
    Executive Summary
    Cyclic Redundancy Check (CRC) codes are widely used for data integrity checking in fields such as storage and networking. There is an ever-increasing need for very high-speed CRC computation on processors for end-to-end integrity checks. We present fast and efficient methods of computing the CRC on Intel processors for the fixed (degree-32) iSCSI polynomial, using the CRC32 instruction present in the Intel® Core™ i5 processor 650. Instead of computing the CRC of the entire message with a traditional linear method, we use a faster method: split an arbitrary-length buffer into a number of smaller fixed-size segments, compute the CRCs of these segments in parallel, and then recombine the partial CRCs of the segments into the effective CRC of the whole buffer. Computing the segment CRCs in parallel maximizes the throughput of the CRC32 instruction. We show an efficient method for data buffers of arbitrary length. The final recombination of CRCs adds overhead. On the Nehalem microarchitecture it can be implemented with lookup tables, and we show how to do this with as few tables as possible while still giving excellent overall performance across the range of sizes. On the Westmere microarchitecture, the PCLMULQDQ instruction allows efficient recombination of CRCs without lookup tables. The various methods are explained thoroughly in this paper with real code examples.
    These functions work on buffers of arbitrary size and deliver excellent performance across the whole range of sizes, achieving nearly 3X the performance of a linear implementation of CRC32. For instance, a single core of an Intel® Core™ i5 processor 650 can compute the CRC of a 1024-byte buffer at the rate of 0.145 cycles/byte with a single thread! 1
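
    The segment-and-recombine idea described in the summary can be illustrated with a minimal, portable C sketch. It is not the paper's implementation: all function names are hypothetical, the choice of three equal segments is an assumption made for illustration, and a bit-at-a-time software loop stands in for the CRC32 instruction. The recombination step here simply advances a partial CRC through zero bytes, which is correct but slow; it marks the place where a lookup table or PCLMULQDQ would be used in a fast implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-reflected form of the iSCSI (CRC-32C) polynomial. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Advance the raw CRC register by eight bit steps (one byte position). */
static uint32_t crc32c_step8(uint32_t crc)
{
    for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
    return crc;
}

/* Raw register update with no initial or final inversion.
 * Software stand-in (assumption) for a chain of CRC32 instructions. */
static uint32_t crc32c_raw_update(uint32_t crc, const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        crc = crc32c_step8(crc ^ buf[i]);
    return crc;
}

/* Advance a raw CRC as if nbytes zero bytes followed it. This is the
 * slow, illustrative form of the recombination shift. */
static uint32_t crc32c_shift_zeros(uint32_t crc, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        crc = crc32c_step8(crc);
    return crc;
}

/* Split the buffer into three equal segments plus a short tail.
 * The three segment CRCs are independent of one another, so they
 * could be computed in parallel; here they are computed in sequence. */
static uint32_t crc32c_raw_segmented(uint32_t crc, const uint8_t *buf, size_t len)
{
    size_t seg = len / 3;
    uint32_t c0 = crc32c_raw_update(crc, buf, seg);        /* carries the initial value */
    uint32_t c1 = crc32c_raw_update(0, buf + seg, seg);     /* starts from zero */
    uint32_t c2 = crc32c_raw_update(0, buf + 2 * seg, seg); /* starts from zero */

    /* The raw update is linear over GF(2), so shifting an earlier
     * partial CRC past the later segment and XORing gives the CRC
     * of the concatenation. */
    uint32_t merged = crc32c_shift_zeros(c0, seg) ^ c1;
    merged = crc32c_shift_zeros(merged, seg) ^ c2;

    /* Process the remaining len - 3*seg bytes linearly. */
    return crc32c_raw_update(merged, buf + 3 * seg, len - 3 * seg);
}

/* Conventional CRC-32C wrappers: initial value and final XOR of all ones. */
uint32_t crc32c_linear(const uint8_t *buf, size_t len)
{
    return crc32c_raw_update(0xFFFFFFFFu, buf, len) ^ 0xFFFFFFFFu;
}

uint32_t crc32c_segmented(const uint8_t *buf, size_t len)
{
    return crc32c_raw_segmented(0xFFFFFFFFu, buf, len) ^ 0xFFFFFFFFu;
}
```

    Under these assumptions, crc32c_linear and crc32c_segmented return the same value for any buffer; for the nine ASCII bytes "123456789" both give the standard CRC-32C check value 0xE3069283. Only the first segment carries the initial register value, while the later segments start from zero, which is what makes the XOR-based recombination valid.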
