Optimizing Your BI Semantic Model for Performance and Scale
Tuesday, December 11, 2012
by jsalvo
Here are my notes from the ‘Optimizing Your BI Semantic Model for Performance and Scale (BIA-402-M)’ session at PASS Summit, presented by Akshai Mirchandani and Allan Folting of Microsoft.
Tabular Design Principles
- Speed and performance
- Optimize query performance over processing performance; focus on the end-user experience.
- Accommodate changes without forcing reload, if possible
Tabular Model Overview
Vertipaq is a column-based data store.
A column-based data store is good for analytical queries because it can skip columns that aren’t needed for the analysis.
Columnar storage also compresses better, and better compression = faster performance.
Each column has multiple data structures associated with it (calculated columns are treated like regular columns).
Data Structures
- Dictionary: Stores unique/distinct values found in a column.
- Segments: Store the integer data IDs that correspond to the dictionary values. A segment is the unit of data that Vertipaq reads and operates on.
- Hierarchies: Have separate data storage for efficiency, provide quicker access for querying.
- Relationships: Accelerate lookups across tables.
- Partitions: Contain a group of segments. Can be created on any table, no distinction between dimension and fact tables. Partitioning in tabular is for data management purposes (ability to process partial data, expire old data, etc.).
- Storage on Disk
- Each column has an associated data file (.idf) that stores data IDs and values.
- Segments are stored in .idf file.
- Encoding
- Map column values to data IDs; the advantage of data IDs is compression.
- Two types of encoding (a small worked example follows this section):
- Value
- Arithmetic operation to calculate data ID from column value.
- Great for dense value distribution.
- Allows computation on data IDs.
- Stored in metadata.
- Hash
- Hash table to map values to data IDs.
- Great for sparse value distribution.
- Requires hash lookup for computation.
- Persisted on disk.
- Unique identifier columns don’t get much benefit from encoding. The maximum number of distinct values allowed in a column is 2 billion.
- System chooses the encoding type for each column, this is not configurable.
- Encoding selection (selecting between value or hash)
- Sample rows and look at values (Sparse?, Dense?, String?)
- Re-encoding may occur during processing: the engine may switch a column to hash encoding if it determines that value encoding won’t work. This can be VERY expensive, since all previously processed segments must be re-encoded.
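A rough illustration of the difference (my own example values, not from the session): a column of order quantities ranging from 1 to 1,000 is a natural fit for value encoding; the engine can pick a base such as 1 and compute data ID = value - base, so the value 250 is stored as data ID 249 and operations like SUM can run directly on the data IDs. A column of sparse, scattered values (customer e-mail addresses, say) would instead get hash encoding: each distinct value is added to a hash dictionary and the segments store only the integer data IDs handed back by the dictionary, at the cost of a hash lookup whenever the real values are needed.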
System Views specific to Tabular (documented on MSDN)
- Connect to tabular instance (in PowerPivot for example)
- Example query
- SELECT DATABASE_NAME, CUBE_NAME, MEASURE_GROUP_NAME, PARTITION_NAME, DIMENSION_NAME, TABLE_ID, COLUMN_ID, SEGMENT_NUMBER, TABLE_PARTITION_NUMBER, RECORDS_COUNT, ALLOCATED_SIZE, USED_SIZE, COMPRESSION_TYPE, BITS_COUNT, BOOKMARK_BITS_COUNT, VERTIPAQ_STATE FROM $system.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS
- COLUMN_ENCODING column: 1 = Hash and 2 = Value
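Note that COLUMN_ENCODING isn’t part of the segments view queried above; as far as I recall it is exposed by the companion $system.DISCOVER_STORAGE_TABLE_COLUMNS view (also documented on MSDN), along with dictionary sizes. Something along these lines should show which columns ended up value vs. hash encoded (column list from memory, so verify against the documentation):
- SELECT DIMENSION_NAME, COLUMN_ID, COLUMN_ENCODING, DICTIONARY_SIZE FROM $system.DISCOVER_STORAGE_TABLE_COLUMNS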
- Configuration Settings
- DefaultSegmentRowCount: 8M for Analysis Services, 1M for PowerPivot
- Value must be a power of 2 and should be at least 1M.
- Larger = generally better compression and faster queries (see the quick example after this list).
- Smaller = smaller working set during processing.
- ProcessingTimeboxSecPerMRow
- -1 (default): 10 sec.
- Smaller = less time for the greedy compression algorithm, which gives most of its gains at the beginning anyway.
- Larger = almost always better compression and higher query performance.
- Increase for a large number of columns (>200).
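To make the segment math concrete (my own back-of-the-envelope numbers, not from the session): at the Analysis Services default of roughly 8M rows per segment, an 80-million-row partition ends up as roughly ten segments, and since a Vertipaq scan uses one core per segment (see Querying below), about ten cores can work on that partition in parallel. Dropping DefaultSegmentRowCount to 1M would shrink the working set during processing but produce ~80 smaller segments with generally worse compression.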
Processing
Phases
- Read and encode data for segment N. Segment data is loaded sequentially in tabular.
- Compress segment N while reading and encoding data for segment N + 1 (these happen in parallel). Compression uses a greedy algorithm.
- Continue for all remaining segments…
- Other structures built at end (calculated columns, hierarchies, etc.).
- Options
- Data: Loads data into partition or table.
- Recalc: For all tables in the database, recalculates calculated columns, rebuilds relationships, rebuilds user hierarchies, and rebuilds other internal engine structures. Table dictionaries are not affected. (http://blogs.msdn.com/b/cathyk/archive/2011/09/26/processing-tabular-models-101.aspx)
- Full: Data + Recalc
- Default: Data + Recalc, as needed
- Clear: Removes data, invalidates calculated data structures.
- Add: Adds new rows to a partition (creates new segments and merges them in). Any affected calculated columns, relationships, user hierarchies, or internal engine structures (except table dictionaries) are recalculated. (http://blogs.msdn.com/b/cathyk/archive/2011/09/26/processing-tabular-models-101.aspx)
- Defrag: Defragments dictionaries for a table. Scans the data in the table and rebuilds the dictionaries. Optimizes the table dictionary (an internal engine structure) for a given table or for all tables in the database. This operation removes all dictionary entries that no longer exist in the data set and rebuilds the partition data based on the new dictionaries. (http://blogs.msdn.com/b/cathyk/archive/2011/09/26/processing-tabular-models-101.aspx)
- Incremental Processing
- Typical
- Create new partitions for new data
- ProcessData on new partition or to reload into existing tables
- ProcessRecalc to rebuild calculated columns, hierarchies, relationships.
- Avoid
- Multiple ProcessFull operations, since each one causes a Recalc (unless they are contained in a single transactional batch). Better to do one Recalc at the end.
- Advanced Processing
Parallel processing
- Use single transaction batch for processing tables in parallel
- No parallel processing of a table’s partitions in SQL Server 2012
- Use ProcessDefrag periodically
- Deleting partitions may leave outdated values in dictionaries
- Remove unused columns
- Make sure you modify the source query as well
- Merge partitions to reduce segment fragmentation
- The last segment of each partition will typically be incomplete, leaving partial segments
- Merging compacts the partial segments
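One way to spot that fragmentation is to reuse the segments DMV from earlier and look for many segments whose RECORDS_COUNT sits well below DefaultSegmentRowCount, e.g. (a sketch):
- SELECT TABLE_ID, PARTITION_NAME, SEGMENT_NUMBER, RECORDS_COUNT FROM $system.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS ORDER BY TABLE_ID, SEGMENT_NUMBER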
Error handling
- RI (referential integrity) violations are assigned a blank value and do not generate an error during processing
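If I remember right, the $system.DISCOVER_STORAGE_TABLES DMV includes an RIVIOLATION_COUNT column per table, which makes it easy to check whether any of those silent blank-value rows actually showed up (column names from memory):
- SELECT DIMENSION_NAME, TABLE_ID, ROWS_COUNT, RIVIOLATION_COUNT FROM $system.DISCOVER_STORAGE_TABLES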
Querying
- Two formula engines: MDX and DAX
- Formula engine (FE) calls into Vertipaq to retrieve data
- Query logic pushed to Vertipaq where possible
- Vertipaq query executes in parallel, one core per segment
- Optimized for execution on compressed format
- VertiPaq level caching for chatty FE/Vertipaq communication
- DAX/MDX vs. Vertipaq Query
- DAX/MDX
- Rich
- Single-threaded per query
- Designed for expressivity
- Vertipaq
- Simple
- One core per segment
- Heavily optimized for speed
- Simpler queries pushed to Vertipaq from formula engine.
- Will become more powerful over time
- Inspecting Query Performance
- Profiler
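The events worth capturing are the VertiPaq SE Query Begin/End pairs (time spent in the storage engine) versus the overall Query End event (total duration; the difference is roughly formula engine time). For a quick look at what’s running right now without setting up a trace, a DMV query like this also works (column names from memory, so double-check):
- SELECT SESSION_SPID, COMMAND_START_TIME, COMMAND_ELAPSED_TIME_MS, COMMAND_TEXT FROM $system.DISCOVER_COMMANDS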
Capacity Planning
Memory
- Ballpark: about 3/10ths of the source data size
- Fully reprocessing a model will require double that (the old version, which queries are still working against, plus the new version being built during processing)
- Need to ensure enough memory is available for queries; this depends on query patterns (granularity, filters)
- High-level estimate: 1/2 of the source data size, to be safe. Paging is also possible, but not recommended
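A quick worked example with my own numbers: a model built from 200 GB of source data would land around 60 GB in memory using the 3/10ths ballpark, and fully reprocessing it would briefly need roughly 120 GB while the old and new versions coexist, before allowing any headroom for queries.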
CPU
- Lower-end hardware is often better; the newest, fastest cores tend to take longer to show up in high-end servers
- Look for faster cores and better cache subsystems