Search This Blog

Tuesday, March 31, 2009

Nelder-Mead method

- Also called downhill simplex method or amoeba method
- Commonly used nonlinear optimization algorithm

What is Simplex?

A simplex is a convex hull of a set of (n+1) points in some Euclidean space of dimension n or higher.

Audi Shark


What is CCI?

CCI stands for Collect Call Interview.

Monday, March 30, 2009

LG LGenius Innovations Keynote Highlights

Just kidding!

Friday, March 27, 2009

Disk Scheduling In Linux

In a disk, an access time is too big, since it has a mechanical part - disk head which is moving on the disk. It makes overwhole performance of a disk too poor.

Formal problem statement:

Input: A set of requests
Input: The current disk head location
Input: Algorithm state (direction)

Output: the next disk block to get

The goals:

1. Maximize throughput
2. Maximize fairness
3. Avoid starvation
4. Real time concerns

Some scheduling algorithms:

1. A pessimal algorithm
2. An optimal algorithm - shortest service time first
3. First come first serve
4. Elevator
5. Cyclic elevator
6. Deadline scheduler
7. Anticipatory scheduling
8. Completely fair queuing


Andrew Morton said, "the anticipatory shceduler is wiping the others off the map!"

Wednesday, March 25, 2009

MB_ICONSTOP makes a sound that associated with this icon

MB_ICONSTOP makes a sound that associated with this icon.

In order not to make a sound, it is probably necessary to mute master volume. Otherwise, other message box could be desired. Probably, MB_ICONSTOP wants user to be alarmed in a certain situation, so it might be acceptable by users.

DFTL: A Flash Translation Layer Employing Demand-based Selective Caching of Page-level Address Mappings

DFTL: A Flash Translation Layer Employing Demand-based Selective Caching of Page-level Address Mappings
Aayush Gupta Youngjae Kim Bhuvan Urgaonkar
Department of Computer Science and Engineering
The Pennsylvania State University, University Park, PA 16802, USA
{axg354, youkim, bhuvan}

This paper argues that page level FTL has better performance than block level or hybrid type FTL, and propose cache algorithm to get over the disadvantage of page level FTL. Since page level FTL requires much of memory to store mapping table, the paper proposed cache algorithm to reduce required memory.

uFLIP: Understanding Flash IO Patterns

uFLIP: Understanding Flash IO Patterns
Luc Bouganim from INRIA
Bjorn Tor Jonsson from Reykjavik University
and Philippe Bonnet from the University of Copenhagen

The paper introduces a simple yet complete and powerful benchmarking methodology to quantify the performance of flash I/O patterns. Finally, it gives some design hints:
1. Flash devices do incur latency.
2. Block size should be 32KB.
3. Blocks should be aligned to flash pages.
4. Random writes should be limited to a focused area.
5. Sequential writes should be limited to a few partitions.
6. Combining a limited number of patterns is acceptable.
7. Neither concurrent nor delayed IOs improve the performance.

Non Volatile Cache Command Proposal for ATA8-ACS

Non Volatile Cache Command Proposal for ATA8-ACS
Nathan Obr

In order for the host to be able to provide guidance to the device on what sectors the NV Cache should contain, there needs to be new ATA commands that are capable of:
Device identification – NV cache size and power modes supported
Features negotiation – enable or disable use of the NV Cache
Entering the NV Cache power mode
Leaving the NV Cache power mode
Pinning (add and removes) LBAs in NV Cache
Synchronizing the NV Cache with magnetic media
Query LBAs in NV Cache
Query cache misses for access profiling
Afterwards, some commands for NV Cache have been added into ATA8-ACS.

NAND Flash Memory and Its Role in Storage Architectures

NAND Flash Memory and Its Role in Storage Architectures
Marco A. A. Sanvido, Frank R. Chu, Anand Kulkarni, and Robert Selinger

NAND flash technology
Reliability and failure mechanism – Wearout, retention, write disturb, and read disturb
Environmental factors – high density, relatively low entry costs, low power consumption, and robust characteristics with regards to response to shock and vibrations and wide range of ambient temperatures
Disk Caching architectures – Hybrid HDD, External Caching (Turbo Memory), and Non Volatile Memory Hardware Controller Interface (NVMHCI)
Solid State Drives (SSD)

Most server environments contain terabytes of capacity, so significant usage of flash is not affordable in the foreseeable future, NAND flash may be used in place of some of the DRAM caches used in storage subsystems, since it has a lower cost per bit and is nonvolatile without relying on batteries. However, most server data does not have the same locality as PC programs and user data, so caching approaches (e.g. hybrid HDDs) may not be as useful in server workloads. And if the caching is successful, then the higher I/O rates may create greater problems of wearout for flash.

Advances in Magnetic Data Storage Technologies

Advances in Magnetic Data Storage Technologies
Bandic, Z. Victora, R. H.

This special issue covers advances that have spurred real density growth of magnetic recording technologies, and future technologies that are expected including new architectures of storage systems.

Scaling of areal bit density of magnetic media requires scaling of the grain size. However, the grain size cannot be arbitrarily reduced because the superparamagnetic limit is reached at the point when a grain is so small that thermal energy alone is sufficient to cause change in its magnetic orientation. The current consensus in the HDD industry is that perpendicular magnetic recording may continue to scale up to 500-1000Gb/in2, while in the future, patterned media and HAMR will be required to extend bit areal density beyond 1 Tb/in2.

For Silicon Flash-based memories, as the gate length is reduced, the oxide thickness has to be proportionally reduced, currently reaching the point where the oxide film is so think that charge retention becomes degraded trough trap assisted tunneling. This leads to a reduced number of erase cycles available for Si flash, and in turn reduces the number of times Si flash can be read or written.

For the storage system, power efficiency, data security, and new storage architectures for Si flash-based memories become one of the important required features.

Currently, perpendicular magnetic recording is used, but MRAM, HAMR, and Si-based NAND flash memory lead new storage architectures.

Data Center Power Management

Data Center Power Management
Nagapramod Mandagere and David Du
Digital Technology Center Intelligent Storage Consortium (DISC)

Server power management
Dynamic Voltage and Frequency Scaling (DVFS)
Request Batching – a process that tradeoffs performance/responsiveness for power savings
Heterogeneity Aware Provisioning – a model based approach which tries to minimize the ratio of overall power consumption and throughput of the entire heterogeneous server cluster

Storage power management

Research domains for magnetic disks
1. Adaptive spin down or sleep transitioning
2. Dynamic modulation of rotation speed – dynamic rotation control
3. Caching aid – optimizing data layout (PDC and MAID) and Read/Write Caching (PALRU and PBLRU)
4. New media type – NAND-based flash memory

Future outstanding research issues
1. Research in understanding the tradeoffs between reliability, performance, and power consumption
2. Substituting other types of storage devices in place of power hungry SCSI and FC drives or mixing several types of storage devices in storage hierarchy
3. Research in integrating SSD and hybrid drives into enterprise storage systems

HVAC power management – Heating Ventilation and Air Conditioning

Recent Advancements and Future Challenges of Storage Systems

Recent Advancements and Future Challenges of Storage Systems
In the Internet environment, hard drives are becoming global storage devices that can access many types of data, manage and locate desired data, and securely preserve it wherever it resides.
By David H. C. Du, Fellow IEEE

Traditional goals of storage systems – performance, scalability, availability, and reliability

Other challenges – manageability, searching for the desired information, energy efficiency, long-term data preservation, and data security

The research on magnetic disks focuses on how to increase the aerial density and how to improve the reliability of the drive.

Another feature that has been included in today’s disk drives is a failure warning provided to the host systems when error rates exceed a threshold value. In 1994, a failure warning standard specification called self monitoring and reporting technology (SMART) was adopted by the disk drive industry.

On some disks like fiber channel (FC) drives, the processor can be programmed to execute XOR computations. By computing XOR on disks, it helps to reduce the need for such resources as CPU and redundant arrays of independent disk (RAID) controllers in parity computations.

The data available over the internet today is highly unstructured and heterogeneous. Therefore, the traditional way of accessing data via file name or url may not be adequate. As we are moving into the direction of accessing data based on their semantics, how to store semantic information with the data and how to allow data retrieval based on semantic become extremely challenging.

Intelligent storage devices – how we take advantage of the onboard computing and memory capability of a disk

DAS – direct attached storage
1. SCSI – small computer systems interface
2. ATA – advanced technology attachment (also known as IDE)
When multiple servers are required in an organization, DAS may seem to be low in cost from eeach server’s point of view. When the number of servers becomes larger, the DAS model becomes inefficient and expensive in the total cost of the ownership

NAS – Network attached storage
1. NFS – Network file system
2. CIFS – common internet file system

SAN – storage area network
Fiber channel and InfiniBand(IB) are two popular standards that provide physical connections and communication protocols in SAN. To a lesser extent, serial storage architecture (SSA) and gigabit (or 10 Gbit) Ethernet are also used for this purpose.
Supported topology – point to point, loop topology (FC-AL), and fabric
Supported topology – SSA string, SSA loop, and switched
Strong point – spatial reuse (multiple nonoverlapping simultaneous transfers on loop)
Limitations of SAN – distance and cost -> IP-based storage solutions
3 major IP storage standards – iSCSI, fiber channel over IP (FCIP), and iFCP
OSD – object-oriented storage device, intelligent storage device
Fault-tolerance storage – RAID

Additional challenges for current and future storage systems
Efficient power energy consumption – Massive array of idle disks (MAID), popular data concentration (PDC), Hibernator, dynamic revolutions per minute (DRPM)
Safe data protection – backup and efficient retrieval of keys, key recovery, long-term management, usability, scalability
Long-term data preservation

Blurring the Line Between Oses and Storage Devices

In today’s systems, the storage interface consists mainly of simple read and write commands; as a result, OSes operate with little understanding of device-specific characteristics and devices operate with little understanding of system priorities. More expressive interfaces, together with extended versions of today’s OS and firmware specializations, would allow the two to cooperate to achieve performance and functionality that neither can achieve alone. This technical report shows that greater cooperation will enable device-side specialization and OS-side specializations. For instance, freeblock scheduling results in up to a 300% speedup and track aligned extents shows performance improvement.

G. R Ganger. Blurring the Line Between Oses and Storage Devices. Technical Report CMU-CS-01166, Carnegie Mellon University, December 2001.


To maximize random write performance, Extreme FFS operates on a page-based algorithm, which means there is no fixed coupling between physical and logical location. The result is an improvement in random write performance – by up to 100 times – as well as in overall endurance. In addition, it incorporates a fully non-blocking architecture in which all of the NAND channels can behave independently, with some reading while other are writing and garbage collecting. Another element is usage-based content localization, which allows the advanced flash management system to “learn” user patterns and over time localize data to maximize the product’s performance and endurance.

Semantically-Smart Disk Systems

As opposed to a traditional “smart” disk and SDS has detailed knowledge of how the file system above is using the disk system, including information about the on-disk data structures of the file system. An SDS exploits this knowledge to transparently improve performance or enhance functionality beneath a standard block read/write interface. EOF can discover file-system structure for certain types of file systems, and then show how an SDS can exploit this knowledge on-line to understand file-system behavior. Furthermore, it is studied how the issues surrounding SDS construction by designing and implementing a number of prototypes as case studies can be handled. It shows that SDS is helpful to implement track aligned extent, secure deletion and journaling.

Muthian Sivathanu , Vijayan Prabhakaran , Florentina I. Popovici , Timothy E. Denehy , Andrea C. Arpaci-Dusseau , Remzi H. Arpaci-Dusseau, Semantically-Smart Disk Systems, Proceedings of the 2nd USENIX Conference on File and Storage Technologies, March 31-31, 2003, San Francisco, CA

μ-tree: an ordered index structure for NAND flash memory

All the nodes along the path from the root to the leaf are put together into a single flash memory page in order to minimize the number of flash write operations when a leaf node is updated.

Dongwon Kang , Dawoon Jung , Jeong-Uk Kang , Jin-Soo Kim, μ-tree: an ordered index structure for NAND flash memory, Proceedings of the 7th ACM & IEEE international conference on Embedded software, September 30-October 03, 2007, Salzburg, Austria

A Reconfigurable FTL (Flash Translation Layer) Architecture for NAND Flash-Based Applications


Chanik Park, Wonmoon Cheon, Jeonguk Kang, Kangho Roh, Wonhee Cho, and Jin-Soo Kim, "A Reconfigurable FTL (Flash Translation Layer) Architecture for NAND Flash-Based Applications," ACM Transactions on Embedded Computing Systems, Vol. 7, No. 4, pp.1-23, July 2008.

Design Tradeoffs for SSD Performance

SLC NAND-flash (Samsung's K9XXG08UXM series)

Page Read to Register - 25us
Page Program (Write) from Register - 200us
Block Erase - 1.5ms
Serial Access to Register (Data bus) - 100us

Die Size - 2GB
Block Size - 256 KB
Page Size - 4KB
Data Register - 4KB
Planes per die - 4
Dies per package (2GB/4GB/8GB) - 1, 2, or 4
Program/Erase Cycles - 100K

Nitin Agrawal, Vijayan Prabhakaran, Ted Wobber, John D. Davis, Mark Manasse, Rina Panigrahy. Design Tradeoffs for SSD Performance, Usenix Annual Technical Conference (USENIX '08), June 2008, Boston, MA.

A log buffer based flash translation layer using fully associative sector translation


LEE, S.-W., PARK, D.-J., CHUNG, T.-S., LEE, D.-H., PARK, S., AND SONG H.-J. 2006. A log buffer based flash translation layer using fully associative sector translation. ACM Trans. Embed. Comput. Syst.

A superblock-based flash translation layer for NAND flash memory


KANG, J. U., JO, H., KIM, J. S., AND LEE, J. 2006. A superblock-based flash translation layer for NAND flash memory. In Proceedings of the 6th ACM/IEEE Conference on Embedded Software (EMSOFT’06). Seoul, S. Korea.

Large Block CLOCK (LB-CLOCK) Algorithm: An Improved Write-back Caching Scheme for Solid State Disks

TBD Biplob Debnath Large Block CLOCK (LB-CLOCK) Algorithm: An Improved Write-back Caching Scheme for Solid State Disks draft

A Log-based Flash Translation Layer for. LargeNAND Flash Memory


Soo-Young Kim and Sung-In Jung, “A Log-based Flash Translation Layer for. LargeNAND Flash Memory,” ICACT, Feb 2006.

LGeDBMS: A Small DBMS for Embedded System with Flash Memory


G.-J. Kim, S.-C. Baek, H.-S. Lee, et al. LGeDBMS: A Small DBMS for Embedded System with Flash Memory. VLDB 2006: 1255-1258

Intel Corporation. Understanding the Flash Translation Layer (FTL) Specification


Endurance Enhancement of Flash-Memory Storage, Systems: An Efficient Static Wear Leveling Design


Yuan-Hao Chang Jen-Wei Hsieh Tei-Wei Kuo. Endurance Enhancement of Flash-Memory Storage, Systems: An Efficient Static Wear Leveling Design

M-Systems. Technical Note: TrueFFS Wear-Leveling Mechanism (TN-DOC-017) March 20, 2002

This document provides an overview of the DiskOnChip TrueFFS wear-leveling mechanism and explains why it is required for NAND flash technology.

Dynamic wear leveling – When there are no free physical erase units left in the pool, a process called folding – garbage collection occurs.

FAT filter – When it detects that a cluster has changed status from “allocated” to “free”, the FAT fitler notifies the other layers of the TrueFFS driver that the sectors associated with these clusters should be marked as erased. Utilizing the FAT filter enables more areas of the flash memory to be reclained by TrueFFS, which improves the efficiency of the wear-leveling algorithm.

Static Data Wear-Leveling – The static wear-leveling process performs virtual-unit swapping one time per 256 virtual sector write operations.

M-Systems. Quick Reference Guide: TrueFFS 6.x Software Development Kit (SDK)

TrueFFS (True Flash File System) is M-Systems’ patented flash management software. It is a software layer that resides between the OS file system and DiskOnChip. TrueFFS provides the OS with full block-device functionality, appearing to the OS a sa regular hard drive, while at the same time transparently providing flash media management.

A Transactional Flash File System for Microcontrollers


E. Gal and S. Toledo. A Transactional Flash File System for Microcontrollers. In Proceedings of the USENIX Annual Technical Conference, pages 89–104, 2005.

Tuesday, March 24, 2009

Eviction Based Cache Placement for Storage Caches

Most buffer management scheme are using access-based eviction criterion. In access-based eviction criterion, the time at which a data is accessed is stamped in the cache. However, the time at which a data is evicted from upper layer is stamped in the cache in eviction-based criterion. In order to do, the cache manages client content tracking table records. At every read/write operation, the table is consulted to find out which disk block was previously put in the given client memory address. If the old disk block is different from the currently access disk block, the old block must have been evicted from the client. While consulting, it needs the information about client memory address.

In this case of SSD, the secondary storage doesn't know the information about clinet memory. Thus, it is NOT applicable to SSD or any secondary storage itself.

Belady's MIN in cache

In a page replacement scheme, Belady's MIN serves as an theoretical optimum, which kicks the page which will not be used for the longest time in the future.

For more information:

Exercise 16.3.5 - Introduction to Algorithms

Suppose we have an optimal prefix code on a set C = {0, 1, ..., n - 1} of characters and we wish to transmit this code using as few bits as possible. Show how to represent any optimal prefix code on C using only 2n - 1 + n ⌈lg n⌉ bits. (Hint: Use 2n - 1 bits to specify the structure of the tree, as discovered by a walk of the tree.)


Blog Archive