Maximise the performance of your applications with Intel's advanced tuning libraries for software and hardware, including Media, Communications and Entertainment applications. Easily test and run your code on a variety of network fabrics, get outstanding performance on Intel processors, and integrate with the most popular development tools and environments.

The 64-bit Tipping Point

This paper explores the increasing importance of 64-bit capable platforms, and offers guidelines for planning a cost-effective transition.

Intel Integrated Performance Primitives 7.1

Deliver rich multimedia experiences to your customers with Intel Integrated Performance Primitives (Intel IPP) - a library of thousands of highly optimised software functions for multimedia and data processing applications.

Multi-Core Power for Multimedia and Data Processing.

Highlights

  • New Intel Core 2 Processor Optimisations.

  • New version for Mac OS now available!

  • Free code samples to jump-start your development.

  • Free, unlimited redistribution Intel IPP runtime libraries.

Key Features

Support for Multi-Core Processors - Intel IPP’s support for the Intel Core microarchitecture processor family is better than ever!

Extensive Range of Optimised Functions - Cut through performance bottlenecks wherever they occur in your application with a huge selection of performance-optimised functions.

Codec Development Framework - Jumpstart advanced codec development for media applications with the Intel IPP Unified Media Classes (UMC) and Unified Speech Classes (USC) code samples.

Every purchase of an Intel Software Development Product includes a year of support services, which provides access to Intel Premier Support and all product releases during that time. Intel Premier Support gives you online access to Intel's expert engineering support staff, technical notes, application notes, and documentation.

Function Domains

Intel IPP extends the unrivalled breadth and diversity of highly optimised functions to let application developers cut through performance bottlenecks top-to-bottom in applications.

Intel IPP library functions are organised into the following function domains, each with its own associated code samples:

  • Video Coding

  • Audio Coding

  • Image Coding

  • Speech Coding

  • Computer Vision

  • Image Processing

  • Speech Recognition

  • Signal Processing

  • Cryptography

  • Data Compression

  • Color Conversion

  • String

  • Vector Math

Performance

Advanced tuning techniques for software and hardware make it easy to get maximum performance out of your applications.

  • Media applications get the performance headroom to add features and deliver more high-quality video and audio streams to more users.

  • Communications applications get the performance headroom to provide more differentiated services to clients.

  • Entertainment applications deliver more vivid, interactive user experiences.

These techniques include:

  • Microarchitecture-based tuning:

      • Pre-fetching and cache blocking

      • Avoiding data and trace cache misses

      • Avoiding branch mispredictions

  • Exploiting the instruction set architecture:

      • Intel MMX technology

      • Streaming SIMD Extensions (SSE), SSE2, SSE3

      • Intel Extended Memory 64 Technology (Intel EM64T)

  • Using Multi-Core and Hyper-Threading Technologies
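
As an illustration of the cache blocking technique named in the list above, here is a minimal C++ sketch. It is not Intel IPP code and does not use the Intel IPP API: the function names and the default tile size of 32 are assumptions made for this example. It transposes a row-major matrix tile by tile, so that the reads and writes inside each tile stay within a small region of memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reference version: walks the source row by row, so writes to the
// destination jump by `rows` elements each time and reuse the cache poorly.
std::vector<float> transpose_naive(const std::vector<float>& src,
                                   std::size_t rows, std::size_t cols) {
    assert(src.size() == rows * cols);
    std::vector<float> dst(rows * cols);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            dst[c * rows + r] = src[r * cols + c];
    return dst;
}

// Cache-blocked version (hypothetical helper): processes the matrix in
// block x block tiles. The tile size of 32 is an assumed starting point,
// not a tuned value; a suitable size depends on the cache being targeted.
std::vector<float> transpose_blocked(const std::vector<float>& src,
                                     std::size_t rows, std::size_t cols,
                                     std::size_t block = 32) {
    assert(src.size() == rows * cols);
    assert(block > 0);
    std::vector<float> dst(rows * cols);
    for (std::size_t r0 = 0; r0 < rows; r0 += block) {
        const std::size_t r1 = std::min(rows, r0 + block);  // clip edge tiles
        for (std::size_t c0 = 0; c0 < cols; c0 += block) {
            const std::size_t c1 = std::min(cols, c0 + block);
            for (std::size_t r = r0; r < r1; ++r)
                for (std::size_t c = c0; c < c1; ++c)
                    dst[c * rows + r] = src[r * cols + c];
        }
    }
    return dst;
}
```

Both functions produce the same result; only the order in which memory is visited differs. Whether the blocked version is actually faster depends on matrix size and hardware, so it should be measured rather than assumed.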

For more information on Intel IPP performance indicators, see the Intel IPP performance page.

Compatibility

Support for Mac OS is now here! With Intel Integrated Performance Primitives 5.1 for Mac OS, you can easily port your applications to Mac OS by using the same library API and functions available for Windows and Linux.

Intel IPP is easily used and integrated with popular development tools and environments, such as Microsoft Visual Studio, Xcode, Eclipse, GCC, and the Intel C++ Compiler.

Processors

  • Multi-core processors, including Intel Core 2 Duo, Intel Core Duo, Intel Xeon and Intel Pentium D processors.

  • Intel EM64T-based systems, including Intel Core 2 processors, Intel Xeon processors and Intel Pentium D processors.

  • Intel Core Solo, Intel Pentium 4 and Intel Pentium M processors.

  • Intel XScale Microarchitecture-based processors, including Intel IXP4xx processors and Intel PXA27x processors with Intel Wireless MMX Technology.

  • Intel Itanium 2 processors.

System Requirements

Please refer to the Intel Software Development Products Web site for details on system requirements for Intel Integrated Performance Primitives.


Resources

Boosting Application Performance Using Intel Performance Libraries

This paper focuses on the benefits of using Intel Performance Libraries for boosting application performance.

Intel Integrated Performance Primitives Webinar Archive

Download and view the audio visual recording of the Intel IPP Webinar delivered by Intel's Stewart Taylor and hosted by Hearne Scientific Software on the 25th of August 2004.

Intel IPP Cognitec case study

Cognitec utilized Intel® IPP to greatly increase the performance of their biometric facial recognition application. By integrating Intel IPP with their FaceVACS application, Cognitec achieved up to a 10x increase in the number of facial comparisons delivered by a system, relative to the pre-Intel IPP baseline.

The 64-bit Tipping Point

This paper explores the increasing importance of 64-bit capable platforms, and offers guidelines for planning a cost-effective transition.

Intel Math Kernel Library 11.0

Intel Math Kernel Library (Intel MKL) offers highly optimised, thread-safe math routines for science, engineering, and financial applications that require maximum performance.

Intel MKL is available as a standalone product, as well as with the Intel Cluster Toolkit and/or the Professional Editions of the Intel compilers.

Choose Intel Math Kernel Library for its features, functionality and compatibility with Linux, Windows and Mac OS.

Compatibility

Operating Systems - Support for Mac OS is now here! With Intel MKL for Mac OS, you can easily port your applications to Mac OS by using the same library API and functions available for Windows and Linux.

Development Environments - Intel MKL is easily used and integrated with popular development tools and environments.

Processors - Underneath a single consistent API, Intel MKL functions are highly optimised for a broad range of 32-bit and 64-bit microprocessors:

  • Multi-core processors - Intel Core Duo processor and Intel Pentium D processor

  • Intel Core Solo processor

  • Intel Xeon processor

  • Processors with Intel EM64T, including 64-bit Intel Xeon processor, Pentium D processors, and Pentium processor Extreme Edition

  • Pentium 4 and Pentium M processors

  • Processors based on Intel XScale technology, including Intel IXP4xx processors and Intel PXA27x application processors with Intel Wireless MMX technology support.

  • Itanium 2 processor

Highlights

Outstanding performance on Intel processors

Achieve outstanding performance with the math library that is highly optimised for Intel Itanium 2, Intel Xeon, and Intel Pentium 4 processor-based systems. Intel MKL performance is competitive with that of other math software packages on AMD processors.

Multi-core ready

  • Excellent scaling on multiprocessor systems - Use the built-in parallelism of Intel MKL to automatically obtain excellent scaling on multiprocessors. Intel MKL Level-3 BLAS and Fast Fourier Transforms are heavily threaded using OpenMP.

  • Thread-Safe - All Intel MKL functions are thread-safe.

Automatic runtime processor detection

A runtime check is performed so that processor-specific optimised code is executed, ensuring that your application achieves optimal performance, whatever system it is executing on.
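
The general pattern behind runtime processor detection can be sketched in C++ as follows. This is not how Intel MKL is implemented and it uses no Intel MKL API: the function names are hypothetical, the "optimised" kernel is only a stand-in, and the feature check relies on the GCC/Clang built-in `__builtin_cpu_supports`, falling back to the generic path on other compilers or architectures.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel signature shared by all code paths.
using SumFn = float (*)(const float*, std::size_t);

// Baseline path that runs on any processor.
float sum_generic(const float* x, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

// Stand-in for a processor-specific path: four independent accumulators,
// a shape that compilers can map onto SIMD registers.
float sum_unrolled(const float* x, std::size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; ++i) s0 += x[i];  // leftover elements
    return (s0 + s1) + (s2 + s3);
}

// Runtime check. Assumption: GCC or Clang targeting x86; anything else
// reports "not supported" and therefore uses the generic path.
bool cpu_has_sse2() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}

// Chooses a code path based on the detected processor features.
SumFn select_sum() {
    return cpu_has_sse2() ? sum_unrolled : sum_generic;
}

// Public entry point: the check runs once, on first use, and every later
// call goes straight to the selected kernel.
float sum(const float* x, std::size_t n) {
    static const SumFn impl = select_sum();
    return impl(x, n);
}
```

Callers only ever see `sum`, so the same binary can run on different processors without the application needing to know which kernel was chosen.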

Support for C and Fortran interfaces

Unlike some alternative math libraries that require you to purchase multiple products to get C and Fortran interfaces, Intel MKL includes both.

Support for multiple Intel processors in one package

Alternative math libraries require you to purchase multiple products for support of Itanium 2, Intel Xeon, and Pentium 4 processors. Intel MKL includes support for ALL of these processors in a single, inexpensive package.

Royalty-free distribution rights

Redistribute unlimited copies of the runtime libraries with your software.

User forum

Share experiences with others at the Intel engineer moderated Intel MKL Discussion Forum.

Intel Premier Support

Receive one year of world-class technical support with every purchase of Intel MKL. During this period, you can download product upgrades free of charge, including major version releases.

Functionality

Linear Algebra – BLAS and LAPACK

Deploy BLAS and LAPACK routines that have been highly optimised for Intel processors and provide significant performance improvements over alternative implementations.

Linear Algebra – Sparse Solvers

Intel MKL provides both direct and indirect/iterative sparse solvers. Solve large, sparse, symmetric, and asymmetric linear systems of equations on shared-memory multiprocessors with the PARDISO Direct Sparse Solver - an easy-to-use, thread-safe, high-performance, and memory-efficient software library licensed from the University of Basel. Solve symmetric, positive-definite systems of linear equations using the new Conjugate Gradient iterative solver with a flexible reverse communication interface.
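
To show what a reverse communication interface means in practice, here is a small C++ sketch of a Conjugate Gradient solver written in that style. It is not the Intel MKL solver and does not reproduce its API: the class `ReverseCg`, the enum `CgRequest` and the driver `solve_dense` are all hypothetical names. The key idea is that the solver never sees the matrix. Instead, each call to `step()` returns a request, and the caller performs the matrix-vector product in whatever storage format it likes before calling `step()` again.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// What the solver asks the caller to do next.
enum class CgRequest { MultiplyByA, Converged, IterationLimit };

// Hypothetical reverse-communication CG for symmetric positive-definite
// systems A x = b, starting from x = 0.
class ReverseCg {
public:
    std::vector<double> x;  // current solution estimate
    std::vector<double> p;  // search direction: caller must compute A * p
    std::vector<double> q;  // caller stores A * p here before the next step()

    ReverseCg(const std::vector<double>& b, double tol, std::size_t max_iter)
        : x(b.size(), 0.0), p(b), q(b.size(), 0.0), r_(b),
          rs_(dot(b, b)), tol_(tol), max_iter_(max_iter) {}

    CgRequest step() {
        if (started_) {
            // The caller has filled q = A * p; finish the pending iteration.
            const double alpha = rs_ / dot(p, q);
            for (std::size_t i = 0; i < x.size(); ++i) {
                x[i] += alpha * p[i];
                r_[i] -= alpha * q[i];
            }
            const double rs_new = dot(r_, r_);
            const double beta = rs_new / rs_;
            for (std::size_t i = 0; i < p.size(); ++i) p[i] = r_[i] + beta * p[i];
            rs_ = rs_new;
            ++iter_;
        }
        started_ = true;
        if (std::sqrt(rs_) <= tol_) return CgRequest::Converged;
        if (iter_ >= max_iter_) return CgRequest::IterationLimit;
        return CgRequest::MultiplyByA;  // hand control back to the caller
    }

private:
    static double dot(const std::vector<double>& a, const std::vector<double>& b) {
        double s = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
        return s;
    }

    std::vector<double> r_;  // residual b - A x
    double rs_;              // squared residual norm
    double tol_;
    std::size_t max_iter_;
    std::size_t iter_ = 0;
    bool started_ = false;
};

// Example driver. The matrix is dense here purely for illustration; the
// tolerance and iteration limit are assumed values.
std::vector<double> solve_dense(const std::vector<std::vector<double>>& A,
                                const std::vector<double>& b) {
    assert(A.size() == b.size());
    ReverseCg cg(b, 1e-10, 10 * b.size());
    while (cg.step() == CgRequest::MultiplyByA) {
        for (std::size_t i = 0; i < b.size(); ++i) {
            double s = 0.0;
            for (std::size_t j = 0; j < b.size(); ++j) s += A[i][j] * cg.p[j];
            cg.q[i] = s;
        }
    }
    return cg.x;
}
```

The flexibility comes from the loop in `solve_dense`: replacing the dense product with a sparse, matrix-free or preconditioned one requires no change to the solver itself.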

Fast Fourier Transforms (FFT)

Employ multi-dimensional FFT routines (1D up to 7D) with mixed radix support and a modern, easy-to-use C/Fortran interface. Intel MKL also provides a set of C routines ("wrappers") that mimic the FFTW 2.x and 3.0 interfaces, making it easy for current FFTW users to compare performance with Intel MKL.

Vector Math Library (VML)

Increase application speeds with vectorised implementations of computationally intensive core mathematical functions (power, trigonometric, exponential, hyperbolic, logarithmic, etc.)

Vector Random-Number Generators

Speed up your simulations using our vector random number generators, which can provide substantial performance improvements over scalar random number generator alternatives.

System Requirements

The Intel Math Kernel Library (Intel MKL) runs on Intel architecture-based workstations, servers, and personal computers running Linux or Microsoft Windows operating systems.

Refer to the Intel Web site for further details.


Resources

Boosting Application Performance Using Intel Performance Libraries

This paper focuses on the benefits of using Intel Performance Libraries for boosting application performance.

Intel Math Kernel Library Webinar Archive

Download and view the audio visual recording of the Intel MKL Webinar delivered by Intel's Todd Rosenquist and hosted by Hearne Scientific Software on the 19th of August 2004.

Intel MKL Abaqus case study

Abaqus Inc., a leading maker of mechanical simulation software, needs to simulate the physical response of structures and solid bodies to load, temperature, contact, impact, and other environmental conditions. Intel® Math Kernel Library (Intel® MKL) and Intel® VTune™ Performance Analyzer offered Abaqus engineers a path to a compelling solution on Intel architecture for these simulation requirements.

Making the Move to Quad-Core and Beyond

Article from: Technology@Intel Magazine, December 2006. Why multi-core processors represent the future of computing.

The 64-bit Tipping Point

This paper explores the increasing importance of 64-bit capable platforms, and offers guidelines for planning a cost-effective transition.

Intel MPI Library Development Kit

Intel MPI Library delivers a flexible, multi-fabric enabled, message-passing interface for developers and users of cluster applications. The library enables you to easily test and run your code on a variety of network fabrics.

Industries such as manufacturing, finance, transportation, and energy rely on clusters to manage their extensive data processing and storage needs. High-performance applications running on clusters utilise a Message-Passing Interface (MPI) to facilitate communication between processes.

The library supports Transmission Control Protocol (TCP) based messaging for Ethernet fabrics, Shared Memory-based messaging for Shared Memory fabrics, and Remote Direct Memory Access (RDMA) based messaging for the InfiniBand architecture fabric.

Deliver Flexible, Efficient Cluster Messaging

Implementing the high-performance MPI-2 specification on multiple fabrics, Intel MPI Library 3.1 focuses on making applications perform better on IA-based clusters. The Intel MPI Library enables you to quickly deliver maximum end-user performance even if you change or upgrade to new interconnects, without requiring major changes to the software or to the operating environment.

Features

New universal multi-fabric device

  • Smart fabrics selection – Simplified usage through automatic choice of fastest transport protocol between MPI processes without additional environment settings. Fully configurable device and fallback selection by environment variables.

  • Enhanced dynamic connection establishment – Reduced memory footprint by introducing lazy mode connection, which establishes connections only when needed.

  • Two-phase communication buffer enlargement – Reduced memory consumption by allocating only the necessary communication buffer memory space.

Increased application performance

  • DAPL intra-node communication mode – Bandwidth advantage for large messages by optional use of DAPL inside a multi-core or SMP node.

  • Further optimised collective operations – Significantly optimised versions of MPI_Reduce, MPI_Allreduce, MPI_Alltoall, MPI_Alltoallv, MPI_Bcast, and others.

  • Enhanced process pinning – Maximised performance on Intel multi-core and SMP nodes. Provides for flexible process pinning.

  • Scalable job startup protocol – Application startup time significantly improved with this release.

  • Static version of libraries built without -fpic flag – Offers better performance for statically linked applications.

New installer capabilities

  • Supports a distributed install option that installs Intel MPI Library 3.0 on the head node and compute nodes of a cluster in one operation.

Increased interoperability

  • Additional thread safe libraries at level MPI_THREAD_MULTIPLE – An MPI application process can be multi-threaded and multiple threads may make MPI calls without restrictions.

  • Backward binary compatibility with Intel MPI Library 2.0 – Applications and objects compiled with Intel MPI Library 2.0 will work with the Intel MPI Library 3.0 run-time library.

  • Enhanced handling of multi-homed environment – Simplifies job and communication management and increases performance when working with multiple network interfaces per node.

Extended compiler support

  • Intel C++ Compiler 9.1 for Linux

  • Intel Fortran Compiler 9.1 for Linux

  • GNU Fortran 95 compiler, 4.0 and higher

Extended operating system support

  • Support for SLES 10

Enhanced Intel tool support

  • Intel Trace Analyzer and Collector 7.0 – Enhanced analysis capability for system network activity when combined with Intel MPI Library 3.0, adding new features such as trace file comparison and performance counters.

  • Intel Math Kernel Library Cluster Edition (Intel MKL Cluster Edition) 9.0 – Get optimal performance when you combine Intel MKL Cluster Edition with Intel MPI Library.

  • Intel MPI Benchmarks 3.0

Enhanced debugger support

  • Support for Intel Debugger 8.1-23, 9.1-23 – Advancing productivity by running the Intel command line debugger on a parallel application using Intel MPI Library.

  • Etnus TotalView 6.8 and higher

  • Allinea DDT* 1.9.2 and higher

Integration with the job schedulers

  • Parallelnavi NQS for Linux V2.0L10 and higher

  • Parallelnavi for Linux Advanced Edition V1.0L10A and higher

More Features

  • Product setup is streamlined with the ability to install under root or through an ordinary user ID. In addition, you can implement scripts for easy path setting.

  • Create dynamic applications with the ability to map to processes as they are spawned.

  • Write algorithms more efficiently using passive-target one-sided communication.

  • Take advantage of simplified process management, including automated MPD startup and cleanup; flexible system-, user-, and session-specific configuration files; and transparent support for alternative IP interfaces (which gives the end-user a better experience).

  • Use environment variables for runtime control over such things as memory registration cache, device-specific and collective protocol thresholds, and platform-specific fine-grain timers.

  • MPI-2 compatibility includes new features such as generalised requests and preliminary thread support.

  • Increase interoperability with your Fortran tools with new features.

  • Take advantage of all your disks with a tightly integrated ROMIO component.

  • Work with DAPL 1.1 and 1.2 compliant DAPL providers.

  • Use message queue browsing with the Etnus TotalView debugger.

  • Use internal MPI library state tracing with Intel Trace Analyzer and Collector 6.0.

  • Intel MPI Library now offers enhanced support of new operating systems and compilers, including Red Hat Enterprise Linux 4.0, SUSE Linux Enterprise Server 9, and Intel Compilers for Linux, version 9.0.

  • Get up and running faster with new documentation for the Development Kit.

Highlights

Performance

Multiple hardware fabrics

  • Use high-performance interconnects, including InfiniBand*, Myrinet*, and QsNet*, as well as TCP, shared memory, and others.

  • Efficiently work through the Direct Access Programming Library (DAPL), making it easy for you to test and run applications on a variety of network fabrics.

Streamlined product setup

  • Get users up and running faster with the ability to install under root or through an ordinary user ID.

  • Implement mpivars.sh and mpivars.csh scripts for easy environment setup.

Simplified process management

  • Reduce hand-coding work by using the mpirun script, which automates multiprocessing daemon (MPD) startup and cleanup.

  • Take advantage of flexible system-, user-, and session-specific configuration files.

  • Give the end user a reliable runtime with transparent support for fallback Internet Protocol (IP) interfaces.

Environment variables for runtime control

  • Increase performance with the ability to use device-specific and collective-protocol thresholds.

  • Boost performance with memory registration cache.

  • Get more accurate measurements with platform-specific fine-grain timers.


Intel MPI Library 3.1 Interoperability

Intel MPI Library 3.1 is based on Argonne National Laboratory's MPICH-2 implementation and is targeted toward industry-wide standardization of the MPI-2 ABI with maximum performance. All MPI-1 features are supported, plus many MPI-2 features including the following:

  • Active target one-sided communication

  • Passive target one-sided communication

  • Generalized requests

  • Full thread support

  • File I/O


Simplified Integration with leading Linux Job Schedulers

Intel MPI Library 3.1 can be easily integrated with:

  • Platform LSF 6.1 and higher

  • Altair PBS Pro* 7.1 and higher

  • OpenPBS* 2.3

  • Torque* 1.2.0 and higher

  • Parallelnavi* NQS* for Linux V2.0L10 and higher

  • Parallelnavi for Linux Advanced Edition V1.0L10A and higher

  • NetBatch* 6.x and higher


Support for Process Managers

Intel MPI Library automatically recognizes PMI extension support and provides backward compatibility with older process managers.

Works with leading Linux Parallel Debuggers

Intel MPI Library can be integrated at job startup or as a process attachment. It also provides message queue browsing support and is interoperable with:

  • Intel Debugger 10.1

  • Allinea* Distributed Debugging Tool (DDT) 1.9.2 and higher

  • Etnus TotalView* debugger 6.8 and higher

  • GNU* debuggers

  • Valgrind* 3.2.3 (including suppression rules)


Integrated Programming Environments

  • Eclipse PTP* 1.0 GUI process launcher for Linux

  • Microsoft Visual Studio .NET*

Tested interoperability with Intel compilers and other Intel Cluster Toolkit applications

  • Intel C++ or Fortran Compiler 9.1 and higher (Windows)

  • Intel C++ or Fortran Compiler 8.0 and higher for Linux (8.1 and higher for Intel 64 architecture)

  • GNU Compilers 3.0 and higher

  • Build and Runtime Linkage with Intel Trace Analyzer and Collector 7.1 (Linux and Windows CCS)

  • Intel Math Kernel Library 9.1 and higher (Linux and Windows CCS)


Compatibility

Deliver high-performance applications to market sooner by using Intel MPI Library, which provides a high degree of interoperability with Intel tools and architecture:

  • Based on Argonne National Laboratory’s MPICH-2 implementation

  • Simplified Integration with leading Linux Job Schedulers

  • MPI-2 standard compliance and portability

  • Support for ROMIO* (a high-performance, portable MPI-IO implementation)

  • Support for leading Linux* Parallel Debuggers

  • Support for GNU compilers (version 3.0 or higher)


System Requirements

Refer to the Intel Software Development Products Web site for details on system requirements for Intel MPI Library.


Resources

Intel MPI Library 2.0 Case Study

The Australian Centre for Advanced Computing and Communications (ac3) built an Intel architecture–based supercomputing cluster with the goal of obtaining the best possible performance from the hardware.

Release Notes

Release notes for Intel MPI Library 2.0

Intel Threading Building Blocks 4.1

Thread like an expert, without being one. Intel Threading Building Blocks is a C++ runtime library that simplifies threading for performance. It provides parallel algorithms and concurrent data structures that eliminate tedious threading implementation work. It's a tested and performance-tuned parallel substrate for your application.

Introduce threading that unleashes the performance of multi-core platforms. Write applications once and deploy on multiple OSs. Intel Threading Building Blocks enables your application performance to scale as the number of cores grows.

The Intel Threading Building Blocks (Intel TBB) are cross-platform compatible (Windows, Linux, and Mac OS), support 32-bit/64-bit applications, and work with Intel, Microsoft and GNU compilers.

This library is specifically designed to work in concert with other threading technologies, such as Win32, POSIX, and OpenMP threads, providing a high degree of design and development flexibility. The templates implemented in Intel Threading Building Blocks rely on generic programming in order to provide high-speed and flexible algorithms with very few implementation constraints.

Intel Threading Building Blocks adds to the functionality of Intel Thread Checker, Intel Thread Profiler, and the Intel Compilers, to enable the rapid implementation of high-performance threads in applications.

New Features

Enhancements in Intel Threading Building Blocks 3.0 include:

  • Extended compatibility and interoperability support for Microsoft Visual Studio 2010 Parallel Patterns Library* (PPL) and Concurrency Runtime* (ConcRT).

  • Added Microsoft Windows* 7, and Apple Mac OS* Snow Leopard support

  • Enhanced Task Scheduler provides starvation-proof scheduling of tasks for queue-like work; Master Thread Isolation improves task scheduling predictability and responsiveness.

  • Enhanced Memory Allocator includes performance optimization for large-block allocations

  • Expanded lambda support; a C++0x-style condition variable; parallel_pipeline, a strongly typed, lambda-friendly interface for building and running pipelines; and the new concurrent_unordered_map container

Highlights

Ready to use parallel algorithms

Select from a library of highly-efficient parallel algorithm templates, and rapidly obtain the advantages of multi-core Intel processors.

  • Quickly employ commonly needed algorithms designed for parallel performance and scalability.

  • Generic templates let you easily tailor these algorithms to your needs.

  • Supports easy plug-in deployment into applications to deliver scalable software speed-up, optimising for both available cores and cache locality.

  • Reduce the work required to produce threaded software in many cases, by means of pre-built parallel constructs.

Cross platform support

Write applications once and deploy on multiple OSs.

  • Provides a single solution for Windows, Linux, and Mac OS on 32-bit and 64-bit platforms using Intel, Microsoft, and GNU compilers.

  • Supports industry-leading compilers from Intel, Microsoft and GNU.

  • Speeds deployment of applications on multiple multi-core platforms.

Task based parallelism

Specify threading functionality in terms of logical tasks instead of physical threads.

  • Lets developers focus on higher-level, scalable task patterns instead of low-level thread mechanics.

  • Uses proven data-decomposition abstractions that efficiently use multiple cores.

  • Enables automatic load balancing.

  • Efficiently supports nested parallelism, allowing parallel components to be built from other parallel components.
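
The idea of describing work as logical tasks rather than physical threads can be sketched in standard C++. The example below deliberately does not use the Intel Threading Building Blocks API: `parallel_sum` is a hypothetical helper built on `std::async`, and the grain size and depth limit are assumed values. It splits a range recursively into two sub-tasks until the pieces are small enough to process serially, which also shows nested parallelism, since each task may itself spawn further tasks.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <system_error>
#include <vector>

// Hypothetical helper: sums data[begin, end) by recursive range splitting.
// `grain` is the range size below which work is done serially; `depth`
// caps how many levels of splitting may start new asynchronous tasks.
long long parallel_sum(const std::vector<int>& data,
                       std::size_t begin, std::size_t end,
                       std::size_t grain, unsigned depth = 3) {
    assert(begin <= end && end <= data.size() && grain > 0);
    if (end - begin <= grain || depth == 0) {
        // Leaf task: small enough to run serially.
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
        const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
        return std::accumulate(first, last, 0LL);
    }
    const std::size_t mid = begin + (end - begin) / 2;

    // Left half becomes a separate logical task; the right half runs here.
    std::future<long long> left;
    try {
        left = std::async(std::launch::async, [&data, begin, mid, grain, depth] {
            return parallel_sum(data, begin, mid, grain, depth - 1);
        });
    } catch (const std::system_error&) {
        // No thread could be started: the task is still valid, so run it inline.
        return parallel_sum(data, begin, mid, grain, depth - 1) +
               parallel_sum(data, mid, end, grain, depth - 1);
    }
    const long long right = parallel_sum(data, mid, end, grain, depth - 1);
    return left.get() + right;
}
```

Calling `parallel_sum(values, 0, values.size(), 1000)` sums a vector without the caller creating or joining any thread. Note the limits of the sketch: `std::async` may start one thread per task, whereas a dedicated task scheduler maps many tasks onto a fixed set of worker threads and balances load between them, which this example does not attempt.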

Library based solution

Get highly optimised parallel functionality now with minimal effort.

  • Your C++ application simply calls the Intel Threading Building Blocks library.

  • Standard C++ - no need to rewrite code in a new language.

  • Compatible with other threading packages.

  • Allows unlimited distribution of the runtime libraries with your software.

  • Seamlessly integrates into existing development environments.

Highly concurrent containers

Optimise the processor's ability to perform simultaneous tasks.

  • Simplify multithreaded application development with interfaces designed for thread-safety and high concurrency.

  • Improve application quality by employing pre-tested data structures.

  • Improve application performance by enabling multiple execution cores or processors to work together more efficiently.

System Requirements

Refer to the Intel Software Development Products Web site for details on system requirements for Intel Threading Building Blocks.


Resources

Making the Move to Quad-Core and Beyond

Article from: Technology@Intel Magazine, December 2006. Why multi-core processors represent the future of computing.

Multi-Core Processing

Software Sea Change: Multi-Core Processing Opens Innovative Business Possibilities. Article from Intel Software Insight magazine, September 2006.
