Categories: Azure
Posted by mheydt on 11/18/2009 10:19 PM
Supposedly $6M will get one of these dropped off on your premises and connected to the Azure cloud. The amount of airflow out of this thing was amazing; it was like the MythBusters episode a few weeks ago with the wind-blower contraption trying to blow a house over.







SQL Azure Database
- Highly scaled-out relational database as a service
- Accessible via TDS
- SOAP/REST/ADO.NET/EDM/HTTP/HTTPS
- On-premises access via SSRS
Customer value propositions:
- Self-provisioning and capacity on demand
- Symmetry with on-premises databases
- Automatic high availability and fault tolerance
- Automated DB maintenance
Provisioning:
- Account has zero or more servers and is the billing instrument
- Each server contains one or more databases
  - Metadata
  - Unit of security
  - Unit of geo-location
  - Logical grouping of DBs
- Each database is the unit of consistency and has the standard objects: users, tables, views, indices, etc.
Free period ends 2/1/2010
Future:
- Improve tools
- Improve operational model
- Improve application programming model
SaaS:
- Mention of provisioning APIs outside of the portal
- Template databases
- Metadata tracking
- Additional billing scenarios
- Single unit for deployment and upgrades
- Data sync
- Upgrade and downgrade options between SKUs
- Read-only databases
Scale-out support:
- Today
  - Workload partitioning
  - But cluster management is difficult
- Scale-out addressed today:
  - High availability
  - Zero admin
  - No downtime
  - Elastic resources
  - Pay as you grow
  - No-friction provisioning
  - Never run out of hardware
- Future
  - More scale-out
  - Dynamic database splits
  - Ability to merge databases
  - Improved schema management across groups of databases
  - Additional database size options
  - Multiple database connection management
  - Support for fan-out queries
- Recent requests from customers
  - Profilers, DMVs
  - Spatial data types
  - Full-text support
  - Change tracking
  - CLR
  - BI
  - Encryption
  - WIF
Evolution of Azure
- 2008:
  - App hosting, two roles, queue-based communication, partial-trust ASP.NET
  - Storage: blobs, tables, queues
  - Desktop SDK: cloud simulation
  - Service management portal: VIP swap upgrades, automatic OS servicing
- Coming soon:
  - Paid usage, Feb 2010
  - New API exposes and updates IP/port values
  - Direct inter-role communication
  - RoleEnvironment.Changed
  - RoleEnvironment.Changing
  - Secure certificate store in the cloud
  - Logging and diagnostics
  - Random writes to blobs
  - Disk drives (February)
  - Geo-replication
  - Secondary indexes on tables
  - Service Management API
  - In-place rolling upgrade
Categories: Azure
Posted by mheydt on 11/17/2009 3:26 AM
Azure pricing is simpler than Amazon's.
Bandwidth
- 10c/GB in and 15c/GB out, regardless of service
- Data within a data center is unmetered
- Traffic between data centers is charged at the full rate
Compute
- 12c/hour per deployed instance (ready state+)
- Same charge regardless of whether:
  - the instance is actually running
  - it is in staging or production
Storage
- 15c/GB/month
- 1c per 10,000 REST calls
- CDN charges (TBD)
- Storage is prorated by average daily amount
- Uploaded blocks cost money even if never committed (don't orphan blocks)
- Overhead can be expensive in tables
  - Entities are stored as name/value pairs, so property names also take space
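To see how the unit rates above combine, here is a minimal Python sketch of a monthly estimate. It uses only the rates quoted in these notes (10c/GB in, 15c/GB out, 12c per instance-hour, 15c/GB/month, 1c per 10,000 transactions); the function name and the sample usage figures are hypothetical, and CDN charges (TBD) are left out.

```python
from decimal import Decimal

# Rates as quoted in these notes (November 2009); not current Azure pricing.
BANDWIDTH_IN_PER_GB = Decimal("0.10")
BANDWIDTH_OUT_PER_GB = Decimal("0.15")
COMPUTE_PER_INSTANCE_HOUR = Decimal("0.12")
STORAGE_PER_GB_MONTH = Decimal("0.15")
PER_10K_TRANSACTIONS = Decimal("0.01")


def estimate_monthly_cost(instances, hours, storage_gb, transactions, gb_in, gb_out):
    """Hypothetical helper: rough monthly bill in dollars from the quoted unit rates."""
    compute = instances * hours * COMPUTE_PER_INSTANCE_HOUR
    storage = Decimal(str(storage_gb)) * STORAGE_PER_GB_MONTH
    # Transactions are billed in units of 10,000 REST calls.
    txn = Decimal(transactions) / 10_000 * PER_10K_TRANSACTIONS
    bandwidth = (Decimal(str(gb_in)) * BANDWIDTH_IN_PER_GB
                 + Decimal(str(gb_out)) * BANDWIDTH_OUT_PER_GB)
    total = compute + storage + txn + bandwidth
    return total.quantize(Decimal("0.01"))


# 2 instances for a 30-day month (720 h), 50 GB stored, 3M transactions, 20 GB in, 40 GB out.
print(estimate_monthly_cost(2, 720, 50, 3_000_000, 20, 40))  # 191.30
```

Compute dominates this example, and since an instance bills the same in staging as in production, a staging deployment left running would double that line.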
Storage transactions
- 1c per 10,000 REST calls
- Appears inconsequential, but can add up
- Using a queue costs at least 3 transactions per message (enqueue, dequeue, delete)
- Batch transactions to reduce cost
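The "at least 3 transactions" figure for queues is easy to quantify. The sketch below counts transactions per message and shows the effect of batching dequeues; it assumes a single dequeue call can fetch up to 32 messages (the queue service's Get Messages limit) while each delete stays a separate call. The function names and message volume are illustrative.

```python
import math

COST_PER_10K = 0.01  # 1c per 10,000 storage transactions (rate quoted above)


def queue_transactions(messages, dequeue_batch=1):
    """Hypothetical helper: storage transactions to push `messages` through a queue."""
    enqueues = messages                              # one PUT per message
    dequeues = math.ceil(messages / dequeue_batch)   # one GET per batch of messages
    deletes = messages                               # one DELETE per message
    return enqueues + dequeues + deletes


def cost_dollars(transactions):
    return transactions / 10_000 * COST_PER_10K


one_million = 1_000_000
print(queue_transactions(one_million))                    # 3000000
print(queue_transactions(one_million, dequeue_batch=32))  # 2031250
```

Empty polls against an idle queue are also transactions, so the polling strategy of worker roles shows up on the bill as well.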
SQL Azure
- $9.99/month for 1 GB
- $99.99/month for 10 GB
SQL Azure - Instance Based
- Prorated by time (midnight UTC)
- CPU like other roles
- No transaction charge, but beware of throttling
SQL Azure - Accounting
- Included in cac: schema/objects
- ... missed the rest, too fast
SQL Azure - Determining Usage
Session State
- Cheaper in SQL Azure than in tables, due to per-transaction charges on tables
Pricing Tips
- Bandwidth: compress responses, minify/optimize (aptimize.com)
- Reduce resource size: compress data in blob storage
- Use affinity groups to geo-locate services together
Compute
- Add or remove instances as required
- Start the same number of instances in staging
- Warm up instances
- Swap deployments
- More (went too fast)
Storage
- Batch requests
- Model usage
- Compress blob data
- Set Content-Encoding when putting a blob
- Use caching to reduce request transaction count
- Use CSS sprites and data URIs to reduce transaction count
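Compressing blob data and setting Content-Encoding go together: if a blob is stored gzipped and served with Content-Encoding: gzip, browsers that fetch it directly can decompress it transparently. A minimal standard-library sketch of preparing such an upload; the actual PUT is left out, and the header dictionary is an assumption about how you would hand blob properties to whatever storage client you use.

```python
import gzip


def prepare_compressed_blob(body: bytes, content_type: str):
    """Gzip a payload and return it with the headers to set when putting the blob."""
    compressed = gzip.compress(body)
    headers = {
        "Content-Type": content_type,
        # Tells HTTP clients the stored bytes are gzip-compressed.
        "Content-Encoding": "gzip",
    }
    return compressed, headers


css = b".nav { color: #333; }\n" * 500
payload, headers = prepare_compressed_blob(css, "text/css")
print(len(css), len(payload), headers["Content-Encoding"])
```

Smaller blobs cut both the per-GB storage charge and the outbound bandwidth charge; the transaction count stays the same.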
Where should static resources live?
- In the web role? Requests are free but consume role resources
- In blob storage? Storage cost and transaction cost
Use CSS sprites
- Can massively reduce round-trips and transaction counts in Azure
SQL Azure
- Vertically partition out large data columns
- Support dynamic partitioning if possible
- Consider just-in-time partitioning
- Pull archive data out of the cloud to cheaper on-premises storage or to Azure storage
- Manipulate the DB from the cloud using System.Data.SqlClient.SqlBulkCopy
Categories: Azure
Posted by mheydt on 11/17/2009 2:20 AM
Why Partition?
- Classic: data volume, workload
- Cloud: cost, elasticity
Horizontal Partitioning (Sharding)
- Spread data across similar nodes
- Achieve massive scaleout
- Intra-partition queries easy
- Cross-partition queries hard
Vertical Partitioning
- Spread data across dissimilar nodes
- Frequently accessed data in expensive, indexed storage
- Large data in cheap storage
- Retrieving all data requires more than one query
Hybrid
- Combination of horizontal and vertical
Table Storage Key Points
- Partitions are auto-balanced
- Partition key + row key = primary key
- Distributed queries are priced per transaction, not per CPU, so they are less costly than on things like EC2
- Continuation tokens
  - Queries without partition keys need these
  - They help with cross-partition results
  - Each call with the token is a transaction
- Key columns can be up to 1 KB, but 260 is the practical limit due to URIs
- Row key = partition key => just one partition (and no continuation tokens issued)
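Because every call that passes a continuation token is another billed transaction, it helps to make the paging loop explicit. In the sketch below, `fetch_page` is a hypothetical stand-in for a table query that returns one page of results plus a token (or None when finished); the in-memory fake table and the page size of 1000 exist only so the example runs.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Page = Tuple[List[dict], Optional[str]]


def query_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Follow continuation tokens until the service stops returning one."""
    token: Optional[str] = None
    while True:
        rows, token = fetch_page(token)  # each call is one billed transaction
        yield from rows
        if token is None:
            break


def make_fake_table(rows: List[dict], page_size: int):
    """Hypothetical in-memory table; the 'token' is just the next offset."""
    calls = []

    def fetch_page(token: Optional[str]) -> Page:
        calls.append(token)
        start = int(token) if token else 0
        end = start + page_size
        next_token = str(end) if end < len(rows) else None
        return rows[start:end], next_token

    return fetch_page, calls


data = [{"PartitionKey": str(i % 3), "RowKey": str(i)} for i in range(2500)]
fetch, calls = make_fake_table(data, page_size=1000)
results = list(query_all(fetch))
print(len(results), len(calls))  # 2500 3
```

Reading 2,500 rows here costs three transactions rather than one, which is the cost side of cross-partition queries.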
Horizontal partitioning - SQL Azure
- For example, the first character of the last name as the partitioning heuristic
- Partition for:
  - Data volume > 10 GB
  - Transaction throttling (non-deterministic): always code for retry
- All partitioning is up to the developer
- Partitions are not auto-balanced
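Since throttling is non-deterministic, "always code for retry" usually means wrapping each database call in a retry loop with backoff. A generic sketch: `ThrottledError` and `run_query` are hypothetical placeholders for whatever exception your data-access layer raises when SQL Azure throttles a connection and for the operation being retried; the delays are illustrative, not recommended values.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical placeholder for the error raised when the service throttles us."""


def with_retry(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run `operation`, retrying on ThrottledError with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            # 0.5 s, 1 s, 2 s, ... plus up to 100 ms of random jitter
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.1)
            sleep(delay)


# Simulated operation that is throttled twice before succeeding.
attempts = {"n": 0}


def run_query():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ThrottledError("throttled")
    return "rows"


print(with_retry(run_query, sleep=lambda s: None), attempts["n"])  # rows 3
```

Injecting `sleep` keeps the loop testable; in production the default `time.sleep` applies the backoff.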
Choosing a partition key
- Natural keys (last name, SSN, ...)
- Modulo
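The two key styles can be sketched as routing functions that map a record to one of N databases. The alphabetical ranges and the database count below are made-up examples; in SQL Azure this routing lives in your own code, since partitioning is entirely up to the developer.

```python
import bisect

# Hypothetical natural-key scheme: last names split into alphabetical ranges.
RANGE_UPPER_BOUNDS = ["F", "L", "R"]  # A-F -> 0, G-L -> 1, M-R -> 2, S-Z -> 3


def partition_by_last_name(last_name: str) -> int:
    """Route on the first character of the last name."""
    first = last_name[:1].upper()
    return bisect.bisect_left(RANGE_UPPER_BOUNDS, first)


def partition_by_modulo(customer_id: int, partitions: int) -> int:
    """Route an integer key by modulo; spreads sequential IDs evenly."""
    return customer_id % partitions


print(partition_by_last_name("Adams"), partition_by_last_name("Jones"),
      partition_by_last_name("Smith"))  # 0 1 3
print([partition_by_modulo(i, 4) for i in range(6)])  # [0, 1, 2, 3, 0, 1]
```

Natural keys keep related rows together but can skew (many names start with S); modulo spreads load evenly but makes range queries cross partitions.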
Hashes
- Project one distribution into another
- Use a function that produces a random (uniform) distribution
- Do not use a crypto hash (overkill on CPU)
- Plenty of examples: tinyurl.com/part-hash
- Be careful using object.GetHashCode() (boxing might give different hashes for the same value when hashed more than once)
- Lots of hash stuff on CodePlex
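As an example of a fast, non-cryptographic hash that you compute yourself (so it does not depend on runtime-specific GetHashCode() behavior), here is 32-bit FNV-1a used to spread keys over a fixed number of partitions. FNV-1a is picked only as an illustration, not as what the talk or the tinyurl link recommends; the partition count is arbitrary.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR in each byte, then multiply by the FNV prime (mod 2**32)."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h


def partition_for(key: str, partitions: int) -> int:
    """Map a string key to a partition index; the same key gives the same
    index in every process."""
    h = fnv1a_32(key.encode("utf-8"))
    # The low bits of FNV depend only on the low bits of the input bytes,
    # so fold the high half in before taking the modulus.
    h ^= h >> 16
    return h % partitions


print(hex(fnv1a_32(b"a")))  # 0xe40c292c
print(partition_for("customer-42", 8) == partition_for("customer-42", 8))  # True
```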
Partition stability over time
- May need to change partition scheme
- Two options: repartition all data, or version the partition scheme
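The versioning option can be sketched as keeping every scheme ever used and recording which scheme version placed each record (for example in a small directory table), so new data uses the latest scheme while old data is still found where it was written. The scheme functions and the `scheme_version` field are hypothetical.

```python
from typing import Callable, Dict

# Each scheme version maps a key to a partition name; old versions are never edited.
SCHEMES: Dict[int, Callable[[str], str]] = {
    1: lambda key: f"db{ord(key[0].upper()) % 2}",  # original: 2 databases
    2: lambda key: f"db{ord(key[0].upper()) % 4}",  # grown to 4 databases
}
CURRENT_VERSION = max(SCHEMES)


def place_new(key: str) -> dict:
    """Write path: new records use the current scheme and remember its version."""
    return {"key": key, "scheme_version": CURRENT_VERSION,
            "partition": SCHEMES[CURRENT_VERSION](key)}


def locate(key: str, scheme_version: int) -> str:
    """Read path: look up a record under the scheme that originally placed it."""
    return SCHEMES[scheme_version](key)


old = {"key": "Baker", "scheme_version": 1}
new = place_new("Baker")
print(locate(old["key"], old["scheme_version"]), new["partition"])  # db0 db2
```

Versioning avoids a big-bang migration, at the price of keeping the old routing logic around until its data is moved or expires.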
Vertical partitioning
- Balance performance vs cost
- SQL Azure
- Fully indexable
- No query transaction charges
- $9.99/GB
- Azure storage
- ... missed this - slides went too fast
- Duplicated data can lower transaction costs on data
Azure tables != RDBMS
- Storage is cheap
- Cross-partition queries are resource intensive
Modeling Azure Tables
- Currently no secondary indexes
  - Build indexes yourself
- If associated data is small enough
  - Save additional queries by duplicating the data with each index
- Use lots of worker roles to massage data into indexes
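Building an index yourself typically means writing a second entity keyed by the value you want to look up, pointing back at the primary entity and, if the data is small, carrying a copy of it so the lookup needs no second query. The sketch models tables as in-memory dicts keyed by (PartitionKey, RowKey); the table and property names are hypothetical, and in a real app the index write is a separate storage call (often done by a worker role), so it can fail independently of the primary write.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (PartitionKey, RowKey)

customers: Dict[Key, dict] = {}           # primary table, partitioned by region
customers_by_email: Dict[Key, dict] = {}  # hand-built index table


def insert_customer(region: str, customer_id: str, email: str, name: str) -> None:
    customers[(region, customer_id)] = {"Email": email, "Name": name}
    # Index entity: keyed by the lookup value, pointing back at the primary keys
    # and duplicating the small "Name" field to save a second query.
    customers_by_email[(email.lower(), "")] = {
        "Region": region, "CustomerId": customer_id, "Name": name,
    }


def find_name_by_email(email: str) -> Optional[str]:
    """Single point lookup on the index; no cross-partition scan of `customers`."""
    entry = customers_by_email.get((email.lower(), ""))
    return entry["Name"] if entry else None


insert_customer("east", "c-001", "Ann@example.com", "Ann")
insert_customer("west", "c-002", "bob@example.com", "Bob")
print(find_name_by_email("ann@example.com"))  # Ann
```

Storage is cheap and cross-partition queries are not, so trading duplicated data for point lookups usually pays off.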
Summary
- Partitioning data is key to scaling cloud apps
- Horizontal partitioning for scale-out
- Vertical partitioning for cost/performance
- Choose appropriate keys
Categories: Azure
Posted by mheydt on 11/17/2009 1:12 AM
- All requests go through the load balancer
- Idempotency provided through compensating messages to other queues, to provide for replay
- Generally try to build for idempotency
  - CRUD is generally not idempotent, but using integrity keys can make it so
  - The issue is with data changing underneath
- Azure queues do not participate in distributed transactions (DTC)
- Poison message handling / zombie messages: write message IDs to a persistent store
  - Put the poison test at the top of your processing, before any other code
  - Therefore, either use another worker role that does this before passing messages to other roles, or
  - put it in a base class of your worker role
- Dynamic work type in the message to route to specific workers
- Key points for dynamic workers
  - Smart polling model (each poll costs $$)
  - Use app domains to separate loaded types
- MapReduce pattern
  - Reduce a large problem to small pieces, process, aggregate results
  - Very parallelizable
  - Map -> group -> reduce
  - Generally processor intensive / RAM light
- Summary
  - Use async
  - Use queues, but ensure idempotency and compensation
  - Watch for poison messages
  - Dynamic workers provide scalability
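Several of the points above (idempotency, writing message IDs to a persistent store, putting the poison test first, and routing on a work type carried in the message) fit in one worker loop. The sketch uses Python's queue.Queue and dicts as stand-ins for an Azure queue and a persistent store, and invents the message shape, the MAX_ATTEMPTS threshold, and the handler names; a real Azure queue would make a failed message visible again after its visibility timeout instead of the explicit re-put shown here.

```python
import queue
from typing import Callable, Dict

MAX_ATTEMPTS = 3  # hypothetical poison threshold

attempts: Dict[str, int] = {}  # stand-in for a persistent store of message IDs
completed: set = set()         # IDs already processed (idempotency check)
poison = queue.Queue()         # where poison messages are parked
balances: Dict[str, int] = {"acct-1": 0}


def credit(msg: dict) -> None:
    balances[msg["account"]] += msg["amount"]


def explode(msg: dict) -> None:
    raise ValueError("bad payload")


HANDLERS: Dict[str, Callable[[dict], None]] = {"credit": credit, "explode": explode}


def process(msg: dict, work: "queue.Queue") -> None:
    msg_id = msg["id"]
    # 1. Poison test first, before any other code touches the message.
    attempts[msg_id] = attempts.get(msg_id, 0) + 1
    if attempts[msg_id] > MAX_ATTEMPTS:
        poison.put(msg)
        return
    # 2. Idempotency: a redelivered message that already succeeded is a no-op.
    if msg_id in completed:
        return
    try:
        HANDLERS[msg["type"]](msg)  # 3. route on the work type in the message
        completed.add(msg_id)
    except Exception:
        work.put(msg)  # simulate the message reappearing after a failure


work = queue.Queue()
for m in [{"id": "m1", "type": "credit", "account": "acct-1", "amount": 5},
          {"id": "m1", "type": "credit", "account": "acct-1", "amount": 5},  # duplicate
          {"id": "m2", "type": "explode"}]:
    work.put(m)
while not work.empty():
    process(work.get(), work)
print(balances["acct-1"], poison.qsize())  # 5 1
```

The duplicate delivery of m1 credits the account only once, and m2 is parked after three failed attempts instead of cycling forever.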
Categories: EF
Posted by mheydt on 11/11/2009 11:42 PM
I spent an airplane flight last week catching up on some MSDN articles. Three of them related to Entity Framework, and in particular EF 4.0. I thought they were well put together, actually discussing the issues involved with various data access patterns and how they have evolved or been addressed by EF / EF4.
Building N-Tier Apps with EF4 is the third of the three articles and, I think, the most relevant (if not the most important). One of the things I really liked was this graph showing the trade-offs of the four primary data access models:

The articles go over these patterns thoroughly, so I won't discuss them here, but it's nice to see the trade-offs quantified for once and reiterated.
The big part of EF 4 will be self-tracking entities, which look like a real panacea for many of the problems exhibited by the other three models. Again, the article goes into great detail on this, so I won't repeat it here.
The other two articles in the series, both also worth a read, are:
- Anti-patterns to Avoid in N-Tier Apps
- N-Tier Application Patterns
Kudos to Daniel Simmons for writing these.
Today I spent a few minutes checking to see if the LinkedIn API is available yet. Many months back I looked into the possibility of integrating LinkedIn with a Twitter client I was (and still am) building. So I applied for access to the LinkedIn API and got no response at all from LinkedIn. Zip. Zero. And this is not something unique to me either.
So today I took another look, since LinkedIn had always said it would be releasing a "public API" for everyone to use, and came across this post on the LinkedIn blog, posted on 11/9 (yesterday as of this writing).
I may be overly sensitive, but I cannot tell you how much this upsets me. LinkedIn took it upon themselves to keep their API private so that they could roll out integration with Twitter on their own instead of promoting open innovation in the community. Sure, it is their right to do this, but it is just not in the spirit of how things work today. I know of many people who wanted to provide these capabilities and were simply blocked by LinkedIn from doing so.
Particularly grievous to me is that they went and did this integration with Twitter (instead of some other service, say Facebook) whose model has always been "Here's our API - you go and innovate." In that sense I guess I am also not very happy with Twitter going ahead and doing this as it just seems unfair to me that they would promote integration with another system that has a closed API.
LinkedIn, get your act together and play nicely with the other kids.