Bucky Woody
Book Review (Book 8) - The Elements of Persuasion
This is a continuation of the books I challenged myself to read to help my career - one a month, for a year. You can read my first book review here. The book I chose for January 2012 was: The Elements of Persuasion by Richard Maxwell and Robert Dickman.
Why I chose this Book:
As I mentioned in a previous review, I think good storytelling is an essential part of any career. Communication is basic in not only our professional but personal lives, and everyone I’ve met responds well to stories, from children to executive audiences. Not only that, learning to tell a story helps you formulate concepts about the topic, which is yet another way of learning.
I heard about this book from a couple of folks, and it turned up in a search for “storytelling” and “business”. Whenever I just search for “Storytelling” I either end up with lists of stories (which is fine) or lists of children’s books on storytelling (which is also fine) but neither of these is quite what I’m looking for.
A quick search on Amazon and I located the book, and then a quick check of my various e-library offerings and I downloaded it to my laptop for reading.
What I learned:
This is a “selling” book, but not like you might think. It’s not about the quick sale you’d make at a car lot or in a similar high-pressure environment. It’s more about executive-level and longer-term sales - those involve stories as well.
Sadly, this is another “business book” - the kind I normally don’t like much. There are typical case-study layouts with lots of examples, but in my mind not enough didactic information to actually help you develop a good story-telling mantra.
Even so, I learned some interesting things about the process these authors use. Some of the case studies are interesting, and I did pull out that a story should work towards a single, defining sentence. This isn’t unique to this book, but it is a reinforcement of what I’ve learned elsewhere. Although nothing to do with storytelling, I did like the reference to Lockheed’s “14 Rules”, which I hadn’t read before. They also break down the storytelling process into five elements, which is actually covered better (in my mind) in a book called “20 Master Plots”, which may actually be the storytelling book I’ve been searching for.
Or perhaps I should just write the one I’m looking for.
At any rate, not sure I would recommend this book to others - perhaps as a check-out, but not a purchase, at least if this is for the same reason I looked it up.
Raw Notes:
As I read, I take notes - it’s called “reading with a pencil”. These are the notes I made to myself, in no particular order and with no context other than the book itself:
- Stories are interesting to us all.
- Describes five elements in a story, but in fact this is for only one type of story. Other books describe more story types.
- Very standard business book, but there are good tips in some of the chapters.
- Explained how to connect with the audience, good points.
- Spends a lot of time referring to other books.
- The Book of Five Rings.
- Work towards a single, memorable sentence.
- Changes partway through into stories about stories. This is better.
- A mix of storytelling and sales, although this was touted for sales, feels much more like selling than storytelling, advertisements.
- Interesting story about memory championships, where contestants memorize cards. They use unusual stories.
- Look up Lockheed and the 14 rules
Team Foundation Server (TFS) in the Cloud - My Experience So Far
I recently joined a software development project that involves not only myself and other internal Microsoft employees, but a partner and a customer as well. We are building a hybrid solution that uses assets on premises as well as Windows Azure for processing. When we put the team together we picked a methodology (Agile) for the project (we use multiple methodologies at Microsoft - whatever the project needs) and then we started talking about Source Control.
We’re all comfortable with various tools for check-in-check-out, branching, and so on. We have all used Git, SVN, and TFS. Some of us have even used Source Safe in the past, but that’s another post. Each company has a full set of Source Control systems in place. But using each other’s systems requires logins, firewalls and the like - so we decided to use the TFS Service Preview to run the entire project from “the cloud”. Here are my experiences with that.
The process was really simple. In fact, we talked about using the cloud TFS in the first SCRUM, and the team was working from the Work Items list that afternoon. The original account login provides a web interface to allow people to join the team. Each of us happened to have a Live.Com address, so we just invited those addresses to join and they got a link, like this:
projectname.tfspreview.com
I’m using Visual Studio, and it’s a requirement for TFS preview to have SP1 installed, and this patch: KB2581206
From there, I opened Visual Studio and navigated from the main menu to Team and then Connect to Team Foundation Server. I’m given this menu:
Selecting port 443 and HTTPS (for security) and then ensuring the lower link has the “tfs” appended as the location, I opened the project.
(This VSTS screenshot is of a project I did in the University of Washington class I teach - I never show client code or names in a blog post)
From there it’s a normal set of operations. Right now the preview doesn’t have some things I’d really like, such as an automated build or some of the testing tools, but you can read this blog entry to learn more about the entire sign-up process, and what the team has planned.
Each day I log in to the project, and I’m given this new sign-in option:
I click the option, and I open the environment, hit My Work Items query, and get to work. All in all, a seamless - although basic - experience. The speed at which we could set up and work on a project was really sweet. It’s remarkable how un-remarkable this is - I just do my work each day, everything is running and backed up in the cloud. I think that’s the point.
Bug-Out Bags and Cloud Architecture Considerations
I served in the U.S. Military for a while, and as part of my training we had to maintain a “Bug-Out Bag”, which was a large duffle-bag full of certain items that we could live on/fight with in an emergency. I’ve carried the spirit of that idea forward with me into civilian life, in Florida and especially here in the Pacific Northwest.
In Florida we dealt with the threat of hurricanes - I went through four of those in one year that hit my area. You’re without power, it floods quickly, and it gets wicked hot. Your roof might be gone, whatever. Here in the Pacific Northwest, I live near one of the largest volcanoes in the world, we have flooding, and recently we were hit with an ice-storm. Now I’ve lived all over the world, from Alaska to North Dakota and even near the Kamchatka Peninsula in Russia, and I can handle the snow. But ice - that’s a toughie no matter where you live. We had so much that it split my little pine tree in front of the house in half.
We lost power, but we weren’t worried - and I think the folks at Puget Sound Energy did an amazing job getting us back up in less than 24 hours. That bug-out bag mentality carried forward to a “second pantry” we keep in the garage.
We have a large plastic box (that will fit in the back of the Subaru) with dried goods like pasta, and canned goods and even a little cook stove. We have 25 gallons of clean water in Jerry-Cans. We have batteries, candles and matches. And we have flashlights around every door. We use supplies from the “pantry” to fill our house pantry, and then refill the emergency one from the grocery store. That way everything is fresh, rotated, and we can “bug-out” here at home or on the road.
So what does this have to do with Distributed Computing Architectures?
It’s the thought process. In both the military and civilian life, I’ve done a few things:
- Sat down and thought carefully about exactly what I need. Did I include a can-opener? A small shovel to dig out of whatever I got stuck in? Then I weed out what I *really* don’t need.
- Put those things into a small, manageable container.
- Tried them - even when (especially when) I didn’t have an emergency
- Tweaked the process to see what I could do better.
Have you done this when you moved an app to the “cloud”? Each of these has a computing parallel - do you know what you would do if you couldn’t access the Distributed Computing Environment?
I’ve found these thoughts are actually a great place to start - keeps the process simplified from the start, and gives you a sense of assurance when you’re asked if you can recover from an emergency.
Cloud Computing In Action: How I work with Live Mesh, SkyDrive, and Office Live Workspace
Recently I had a tweetversation with a couple of friends on some confusion around two of our products: SkyDrive and Live Mesh. Like most of our software, there’s no single way to do things. That can be a strength or it can cause that confusion. They asked if I would blog how I work with these two products, and what advantages there are to this way of working.
Before I start - this is specific to these two Microsoft products. If you’re a fanboi of another product, that’s great. Awesome. Go for it. You don’t have to use these. There’s no law about it or anything. It’s all good. I use the products you see below because I evaluated lots of them, and these work the best for me - not because I work at Microsoft. But do what makes you happy.
Let’s start with what each of these products does. Live Mesh synchronizes files to various locations. You can create a file on one PC, save it, and then when you fire up another PC that file will be copied from the original location. It’s a mirror of the file, and it exists in both places. You can change the file on the second location, and it will be copied back to the other system, stepping right on top of it.
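To make the “mirror” idea concrete, here is a toy sketch of that behavior. It is only an illustration under my own assumptions - the `FileVersion` class and `mirror` function are hypothetical names, the “newest modification time wins” rule is a simplification, and none of this is how Live Mesh is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileVersion:
    modified: float  # modification time (hypothetical units: seconds)
    content: bytes


def mirror(a: dict[str, FileVersion], b: dict[str, FileVersion]) -> None:
    """Make both folders hold the newest version of every file."""
    for name in set(a) | set(b):
        va, vb = a.get(name), b.get(name)
        # Pick whichever side has the file, or the more recent copy.
        if va is None or (vb is not None and vb.modified > va.modified):
            newest = vb
        else:
            newest = va
        a[name] = newest
        b[name] = newest


home = {"notes.txt": FileVersion(100.0, b"first draft")}
work: dict[str, FileVersion] = {}

mirror(home, work)  # the file is copied to the second PC
work["notes.txt"] = FileVersion(200.0, b"second draft")
mirror(home, work)  # the newer copy steps on top of the original

print(home["notes.txt"].content)  # b'second draft'
```

Note what the sketch does not do: it keeps no history. That is why the older copy only survives on a machine that has not synchronized yet.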
SkyDrive is a storage system. You can store lots of data in there - larger than most of the other free offerings.
Office Live Workspaces allows you to integrate SkyDrive into your local copy of Microsoft Office, so that you can create, save and edit a document and it will be stored in SkyDrive, and not only that, it will keep a local, synced copy so that you can work offline. But it also has a web-based subset of Microsoft Office. You can create, edit and work with Microsoft Office documents with no software installed at all. From Linux, Mac, a cell phone, whatever has a browser. In fact, we’ve released one of my favorite products, OneNote, in iPhone and iPad flavors, which also buffer down the file as if you had a PC and Microsoft Office.
I rely on each of these products every day. Here’s how I use them.
I use Live Mesh to copy my entire “Data” directory - files, music, everything - from my home “server” to my work and other systems. Since SkyDrive has a limit, I only send certain files to SkyDrive using Mesh. Just the ones I need access to from non Microsoft-OS devices. Of course, this means I have to leave my home server turned on - which I do anyway since it’s my media server, web server, TV, etc. But everything else I sync to about four computers running Windows.
For my OneNote files - quickly becoming the center of my universe - and anything else I want to access from anywhere, all the time, I use SkyDrive and Office Live. Here’s how that works.
If it’s an MP3, Visual Studio Code, a training video or whatever my customer needs, I save it in SkyDrive, mark it public, and send them the link. Done. Any device that can render these can access the file over the web. Since I play in a group on Sunday, I even put my music there (I use MuseScore) and then I can pop the music up on my netbook right at the pulpit and leave the paper at home.
For OneNote or other Microsoft Office documents, I create the document first in Office Live. Once the file is open, and before I even type in it, I click the button marked “Open in OneNote” (or Word, or Excel, or whatever) and from then on I have that file linked in the local system, and a shadow copy for working offline. I can also work with that document from the web using my Linux or Apple OS’s if needed. I recently attended a very Microsoft-hostile environment, so everything from the presentation to the code review for Windows Azure I did from Office Live and my SkyDrive, all from my Linux Laptop.
As I’ve always said - use what works. This arrangement gives me the ultimate flexibility. I have my data from Live Mesh synchronized on multiple systems. More than once I’ve deleted something I needed, or changed something. I simply boot up the other device without being connected to the web, copy the old version off, and then let it connect and sync. I also back up my home server once a week to a set of local drives, so I have offsite and onsite backups. I can work from anywhere I have a browser, or someone that will let me borrow a device. I have all my presentations ready to present from any system, even if mine breaks.
Hopefully this helps - and hopefully it inspires you to write a blog entry on how you use your favorite cloud products. There are always multiple ways to do things, and I love to learn.
Stand-Up Cloud Computing
When I was very young, I asked my uncle for career advice. He went silent, thinking for a bit, and then said: “People who work sitting down make more than people who work standing up.” I’m not certain how true that really is, but my career as a technology professional has led me to work in a seated position for most of my life.
Turns out, that’s a bad thing. Although I consider myself pretty fit, eating right, sleeping well and working out several times a week in addition to a morning walk with the family each day, I always look for low-barrier ways to stay healthy. When I first moved to the Pacific Northwest to work for Microsoft, I noticed several folks working at tall desks with no chairs. Some even had treadmills. I chalked it up to the ethos here, and certainly not something I would do.
But this year that changed. I noticed that my back was a little stiffer when I got done with my 12-13 hour days of work. For the last couple of years, I’ve worked from home, so I don’t attend meetings (at least in person) as often or have to walk very far to do almost any part of my job. I start work around 6 in the morning, and sometimes get so focused that I don’t move for many hours. I read an article on how bad sitting really is, and after further investigation thought I might give one of those stand-up desks a try.
The research led me to believe that you don’t actually have to use a stand-up desk per se, you can also use an alternate chair or just get up every so often. But I wanted to try this out, and figured that I would be more likely to take a break and sit every hour than I would to remember to stand every hour.
The Before
My office desk is fairly typical, but I do have a decent office chair. That’s after going through probably six or seven chairs in the last few years. I have good lighting, a speakerphone, a web cam and two monitors. I also have the typical flotsam and jetsam of desk clutter, although I’m neater than some.
This arrangement has suited me well since I’ve been working at home. I had something similar in an office environment, although I didn’t always have the option of a decent chair. I didn’t go through the trouble of bringing one of my own in; I just put up with whatever I got, or could “appropriate” from an empty office or conference room.
The Build
My criteria were fairly simple: the experiment had to cost less than $100.00, and be at the proper height and size to hold my keyboard, trackball, phone and monitors so that I could type with my elbows staying at a 90 degree angle.
After researching standing desks, I found that $100.00 was going to be impossible, even for a used one. I visited several thrift shops in the area (I do that a lot anyway to donate and to buy) and didn’t find anything that worked. Of course, when you’re faced with finding cheap furniture, you naturally turn to the most amazing store on the planet.
Before I left, I measured the top area and height of my desk, and wrote down acceptable measurements based on how high I stood, the stuff I needed the top to hold, and the distance I needed for my typing to be done at the right height. Measurements in hand, I headed to the store.
I found a coffee table - a really cheap one ($19.00) called (oddly) LACK and brought it home to begin the surreal process of assembling something bought at Ikea.
Happily, this was REALLY simple. Four lag bolts hold the legs on, and eight screws punched into the wood (or at least wood-like) to attach the shelf. My original thought was that I would move the shelf up higher than the Ikea instructions, and then use the pull-out tray from my desk to put the keyboard and trackball on. However, that was not to be. On investigation I found that the tray was not hung underneath the desk, but attached at the sides. That meant I had to either buy another tray, or place the keyboard on top, necessitating standing two inches higher.
The After
Researching trays, I found they were terribly expensive. These things used to be everywhere, so I was surprised that they aren’t as easy to get as they once were. Off to the thrift store to see what they had. I found an older tray, but it looked flimsy. I then found a child’s plastic picnic table. The plastic was strong, and 1.5 inches thick. I figured I needed some padding to stand on anyway, so I bought the table, pulled off the legs, and wrapped some padding in an older rug. This brought the total cost of the build to $25.00 and $2.00 for an espresso and a cinnamon bun at Ikea (I would be burning these calories off with my new desk, after all).
I re-routed all my cables, and everything fit correctly.
My Early Conclusions
The first day was easy. I thought - well I should have done this a LONG time ago! My back wasn’t that sore, and I didn’t feel that tired.
Then I woke up the next morning. My feet were sore, although not terribly. The second day, I had to sit down each hour. Not just wanted to sit down - needed to. I play in a group at Church on Sunday, so I put my guitar in the office and spend 5 minutes each hour (roughly - sometimes I have calls that are longer than that) practicing a little sitting down. That helped a lot.
I’ve now been at the desk for four days, and I don’t need the breaks as often. I also find I need to shift around a lot, which of course burns even more calories and is better for me. I honestly think the treadmill desk might be easier than a standing one. We’ll see if I go that far someday.
The verdict so far? Glad I’ve done this. If it doesn’t work out, I’ll just re-purpose the coffee table and go back to sitting - although I’m pretty stubborn and will probably stick with this for a while. I’ll let you know if I change back, and why.
My executive assistant hasn’t changed her office arrangement at all. She still keeps her (Ikea) chair just like she’s had it since she started working with me, and dutifully stays at her workstation for the entire 12-13 hours each day. We do, however, take our lunchtime walk still. She burns her calories that way, and thinks it’s better than just standing around all the time.
Valentine’s Day and Your Career
The new year has begun. It’s traditional to make “Resolutions” at this time, but as I’ve mentioned before, I don’t do that. I make goals instead. I like things to be measurable, and I hold myself accountable to those goals - some of which deal with my professional life.
But you might not buy into all that. Perhaps you’re the kind of person who doesn’t buy Valentine’s day cards, or take your significant other out for dinner on Valentine’s day. After all, it’s a manufactured, made up holiday from the greeting-card companies, right? Somebody just decided to come up with a day to make you do something you don’t normally do.
Here’s a tip: do it anyway. Buy the flowers. Jump into the hype. Yes, it’s a made-up holiday. Yes, they’re making money off of you. But take that person out for the nicest dinner you can find. Treating someone you love in a special way on a periodic date is shown to increase the bond in a relationship, simply because it’s a ritual date that others keep. The ritual is the magic.
What does this have to do with New Year’s, or your career? Everything. Not to burst a bubble here, but the universe is not aware of human timekeeping mechanisms. The New Year is just as artificial as Valentine’s day. In fact, many other cultures don’t even count the 1st of January as the New Year. But it’s OK - just like Valentine’s day, you can use the “start of the new year” as a time to focus on something you need to do.
It’s pretty simple to do this - but of course simple != easy. Goals need to be realistic - so sit down sometime this week, and follow this process:
- Write down where you want to be in a year in your career. Make it specific. An award, a position, a company, a raise. Write it down.
- Write down a few books you want to read that will help you get there. Blog about these books.
- Write down the people you need to talk to, inside your company and out. Send an invite out to these people to chat. Do that this week.
- Write down the things you need to accomplish for that goal in your job.
- Tell others you are doing these things, and what you expect.
- Implement your plan.
- Review your plan and adjust as needed each month.
Yes, the “new year” is artificial, like Valentine’s day. So what. Use it to get where you need to go.
Happy New Career.
Book Review (Book 7) - Think Stats
This is a continuation of the books I challenged myself to read to help my career - one a month, for a year. You can read my first book review here. The book I chose for December 2011 was: Think Stats, by Allen B. Downey.
Why I chose this Book:
I originally chose another book for this month, but changed to this one after a difference in focus (sort of) in my technical career. That brings up a couple of interesting points right away. The first is that it’s OK to change a list - remember that the purpose of reading these books is to gain information that gets you closer to your professional goals. When you develop your list, you have a certain amount of knowledge, and as you read more, experience more, and are exposed to more, you get different information. When that happens, adapt.
The second point is that your goal itself may change. I am focusing on “Big Data” this year and with the changes we’ve made in Windows and SQL Azure at Microsoft, this fits neatly with my professional goals personally and the company I work for. Actually, my goals in technology haven’t changed in the 27+ years I’ve worked in IT, in roles from electronics, programming, consulting, management, architect and in my current technical role here at Microsoft. I think that it has always been about data - everything in IT is an interface to data. And I have always wanted to be at the center of that. Data Science involves not just the sourcing, administration and movement of data, but also applying scientific (with an emphasis on mathematical) disciplines to get at the meaning the situation needs.
So that brings me to this choice. My friend Jeremiah Peschka found this resource for a role I am VERY interested in - the “Data Scientist”. It’s a combination of high-end mathematics, Data Analysis and Big Data. The resource is a series of books from O’Reilly for that very title. You can find that here.
Personally, I find the grouping of books a little cobbled together. They are all fine books, but I’m not certain how they lead you through the series of knowledge required for the topic, but that’s a post for another day. Within that series of books is the one I’m reviewing today. I started (since there is no implied order in the books) with the “Data Analysis” book, but it seemed to start in the middle of some topics I needed to research, so I switched to reading this one, and chose it as my December book.
Another note here - December is a tough month. Since so many people take vacation time during this month, most of my clients try to get as much work in before the Holidays as possible. Since they are all doing that at once, it makes for a lot of overtime. Also, I travel to see family, which of course puts me out of pocket for a while myself. So staying on track with the books - especially one that makes heavy use of computing and math and demands focus - is hard. It’s tough to maintain your goals all of the time - but keeping in mind why you do this is the important thing. It will keep you on track.
What I learned:
This book focuses more on what the title says - it’s more about being mindful of the way you use statistics than the statistics themselves. It’s assumed you know not only the basics of statistics (I used these free lessons as a refresher, along with some of my old stats books) but how they are used. The author doesn’t stop to explain a lot of stats he uses, but periodically he does show why a given formula works the way it does. This is very useful, and helps with understanding the point of using one method over another. He also does a great job of using statistics to verify other statistics.
Although it should be obvious, the meaning of the data is essential. We think about this when we deal with the result of data processing, but not necessarily when we work with the sources. For instance - as the author explains central tendency, smoothing and so on using statistical methods, he introduces some numbers and asks you to guess the central number from the set. Dutifully you work out the answer, but in time he reveals that it’s a series of numbers on a die - which of course can only be whole numbers. The point is that you’re so focused on getting the right answer, you don’t define what the real problem is first.
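Here is a tiny sketch of that trap. The numbers are my own hypothetical stand-in (one roll of each face), not the data from the book, but they show how a perfectly correct “central number” can be a value the die can never produce.

```python
from statistics import mean, median

# Hypothetical data: one roll of each face of a six-sided die.
rolls = [1, 2, 3, 4, 5, 6]

center = mean(rolls)
print(center)           # 3.5
print(center in rolls)  # False - no die will ever show 3.5
print(median(rolls))    # 3.5
```

The arithmetic is fine. The answer just doesn’t mean anything until you know where the numbers came from.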
Another great tool - and a fascinating study that I need to look into further - is the fact that you can often make at least educated inferences into data you might not imagine. For instance, he talks about the example of a series of train cars, numbered sequentially. You see a train car numbered “60” - can you guess with any certainty how many train cars the company has? Fascinating stuff.
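One common way to reason about the train car question is a simple Bayesian estimate, sketched below. The assumptions are mine and not necessarily how the book works it: every fleet size from 1 to an upper bound of 1000 is equally likely before you look, and any car in the fleet is equally likely to be the one you see. The `posterior` function name is hypothetical.

```python
def posterior(observed: int, upper: int) -> dict[int, float]:
    """Probability of each fleet size after seeing one car number.

    Assumes a uniform prior over 1..upper. With n cars, the chance of
    seeing any particular number is 1/n, and fleets smaller than the
    observed number are impossible.
    """
    weights = {n: 1.0 / n for n in range(observed, upper + 1)}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


post = posterior(observed=60, upper=1000)

most_likely = max(post, key=post.get)
expected = sum(n * p for n, p in post.items())

print(most_likely)      # 60
print(round(expected))  # 333
```

The single most likely answer is 60 itself, but the average over all the possibilities is much higher. That average moves a lot if you change the upper bound, which is a good reminder that the assumptions matter as much as the math.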
He includes a glossary at the end of each chapter. I found this a great approach for summarizing the information in one place, and really helpful in making sure I understood everything before moving on. I didn’t always, so I had to re-read parts of the book and freshen up my stats knowledge along the way as well.
He uses Python as the language of choice - which I found a bit unusual. Most of the stats profession uses something more like the R language, which I’ve also started learning, and one of the other books in this series includes R as a primary subject. Because the author uses Python, he includes references to a series of libraries you add into it to work through the examples. Python certainly is a Data Scientist’s tool, just normally not for statistics. The author uses great examples and assignments, but doesn’t really follow up on those. I guess I’d rather see those introduced earlier in the chapter and explained better. He tends to jump around a bit, and his references are to Wikipedia, which isn’t always as reliable or thorough as it could be. But these are small quibbles. It’s a good book, and I learned a lot reading it. In fact, I have lots of concepts to unpack based on what I read.
Windows Azure Storage (WAS) Internals - Achieving Consistency
Windows Azure Storage has three primary components - a Queue, a Binary Large Object (BLOB) store (two types of these), and Table Storage.
Storage of data on-premises is fairly well understood - but there are components of it that you may not consider. When you move to a distributed architecture, certain factors should be taken into account, such as consistency. Consistency means that when you store a datum it should be available in the same bit format across the calling mechanism. In other words, if you store a picture with a certain name, whenever you call that name that particular picture should show up. That might sound obvious - but when you begin to scale horizontally, it’s a big consideration. Systems are spread out over multiple physical racks, which are further separated into separate “fault domains” each with its own power, networking and so on, and in Windows Azure, the storage is replicated to ensure high-availability.
Some “cloud” systems relax the consistency target to allow for the highest speed throughput. This might allow inconsistent reads, meaning that a datum recorded in the naming system might not be available yet, or that an older version of the datum might be read. In Windows Azure, we took the position that consistency is of the highest importance. We achieved this through constructs such as the Location Service (LS), Stream, Partition and Front-End layers, and separate replication engines. Of key importance in a system that provides high consistency are the naming and object access protocols - in fact, these turn out to be some of the most pivotal.
Windows Azure Storage has a complex arrangement to ensure this high consistency. You can read some very deep internals here. And a video of the talk held at an ACM conference is here.
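To see the difference between a relaxed and a strict consistency target, here is a toy model. It is purely illustrative and says nothing about how Windows Azure Storage is built internally - the `ToyStore` class and its methods are hypothetical, and real systems handle failures, ordering and acknowledgements that this sketch ignores.

```python
class ToyStore:
    """Three in-memory 'replicas' of a key/value store."""

    def __init__(self, replicas: int = 3) -> None:
        self.replicas = [dict() for _ in range(replicas)]
        self.pending = []  # writes not yet copied to the secondaries

    def write_strong(self, key, value) -> None:
        # Return only after every replica holds the new value.
        for replica in self.replicas:
            replica[key] = value

    def write_relaxed(self, key, value) -> None:
        # Return as soon as the first replica has it; copy later.
        self.replicas[0][key] = value
        self.pending.append((key, value))

    def replicate(self) -> None:
        # Background copy that brings the secondaries up to date.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.pending.clear()

    def read(self, key, replica: int):
        return self.replicas[replica].get(key)


store = ToyStore()
store.write_strong("photo.jpg", "v1")

store.write_relaxed("photo.jpg", "v2")
stale = store.read("photo.jpg", replica=2)  # still the older version
store.replicate()
fresh = store.read("photo.jpg", replica=2)  # caught up now

store.write_strong("photo.jpg", "v3")
after_strong = [store.read("photo.jpg", replica=i) for i in range(3)]

print(stale, fresh, after_strong)  # v1 v2 ['v3', 'v3', 'v3']
```

The relaxed write returns sooner, but for a while the same name gives you a different picture depending on which replica answers. The strict write costs more up front, and every read agrees afterwards.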
How Microsoft helps you NOT break your Windows Azure Application: Storage Services Versioning
One of the advantages of using Windows Azure to run your code is that you don’t have to constantly manage upgrades on your platform. While that’s a big advantage indeed, it immediately brings up the question - how do the upgrades happen? Microsoft upgrades the Azure platform in periodic increments, and the components that are affected are documented.
This brings up another question - upgrades mean change, and change can sometimes alter the way you might implement a feature. What if you have taken a dependency on some feature in your code that has been altered by an upgrade? Windows Azure does have an Application Lifecycle Management (ALM) Process, which I’ll reference at the end of this post. But beyond that, there are some features we’ve put into place that will help you manage many of these changes. One of those is being able to set the version of storage features you would like your code to use.
Windows Azure is made up of three main component areas: Computing, Storage and a group of features called the Application Fabric. You can use these components together or separately, depending on what you would like your application to do. In this post I’ll deal with the version control in the storage subsystem - in other posts I’ll explain how to track and in some cases control the versions of the other components you work with.
When you send a request to a Windows Azure resource, you’re actually using a REST call. That’s a three-part call to the system that has a request line (a verb and a URI), a header, and a body holding the data you want to send. So a typical call might look like this example, which sets the service properties (logging and metrics) for the Table service:
URI:
PUT http://myaccount.table.core.windows.net/?restype=service&comp=properties HTTP/1.1
Header:
x-ms-version: 2011-08-18
x-ms-date: Tue, 30 Aug 2011 04:28:19 GMT
Authorization: SharedKey myaccount:Z1lTLDwtq5o1UYQluucdsXk6/iB7YxEu0m6VofAEkUE=
Host: myaccount.table.core.windows.net
Body:
<?xml version="1.0" encoding="utf-8"?>
<StorageServiceProperties>
  <Logging>
    <Version>1.0</Version>
    <Delete>true</Delete>
    <Read>false</Read>
    <Write>true</Write>
    <RetentionPolicy>
      <Enabled>true</Enabled>
      <Days>7</Days>
    </RetentionPolicy>
  </Logging>
  <Metrics>
    <Version>1.0</Version>
    <Enabled>true</Enabled>
    <IncludeAPIs>false</IncludeAPIs>
    <RetentionPolicy>
      <Enabled>true</Enabled>
      <Days>7</Days>
    </RetentionPolicy>
  </Metrics>
</StorageServiceProperties>
(Source of this code)
The x-ms-version line in the header block is where you set the version of the Storage Services you would like to use. You can find a list of the features introduced in each version in the references at the end of this post. Adding that element to the header isn’t a requirement, but it’s a best practice to do so.
You don’t have to use REST calls directly, however. It’s more common to use the API in the Software Development Kit to just change the property in your IDE environment - the setting you’re looking for there is the Set Storage Service Properties call.
Interestingly, rather than a breaking change you might run into an unexpected behavior if you are not aware of these parameters. In some code I recently reviewed, a newer feature from the storage system failed when it was called. On inspection I found that the developer had used an older code block from a previous version of the storage system - he was not aware you can set the version of storage in the call. We changed the header to the latest version, and everything worked as expected.
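To make the version pinning concrete, here is a minimal Python sketch that builds the versioned part of the header block from the example request. The function name build_headers and the constant STORAGE_VERSION are my own for illustration, not part of any SDK, and the Authorization header is left out on purpose because computing the SharedKey signature is a separate topic.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Storage service version to pin; taken from the example request in this post.
STORAGE_VERSION = "2011-08-18"


def build_headers(account, when, version=STORAGE_VERSION):
    """Return the versioned headers for a Table service request.

    Hypothetical helper: the Authorization header is deliberately omitted,
    since the SharedKey signature is outside the scope of this sketch.
    """
    return {
        "x-ms-version": version,  # pins the storage feature set the code expects
        "x-ms-date": format_datetime(when, usegmt=True),  # RFC 1123 date in GMT
        "Host": f"{account}.table.core.windows.net",
    }


headers = build_headers(
    "myaccount", datetime(2011, 8, 30, 4, 28, 19, tzinfo=timezone.utc)
)
for name, value in headers.items():
    print(f"{name}: {value}")
```

Keeping the version in one named constant means an upgrade is a single, visible change rather than a value buried in an old code block.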
References:
The Storage Services Versioning and the changes for each version:
http://msdn.microsoft.com/en-us/library/windowsazure/dd894041.aspx
Windows Azure Application Lifecycle Management:
http://msdn.microsoft.com/en-us/library/ff803362.aspx
http://channel9.msdn.com/posts/Windows-Azure-Jump-Start-03-Windows-Azure-Lifecycle-Part-1
http://channel9.msdn.com/Events/TechEd/Australia/Tech-Ed-Australia-2011/COS201
Windows Azure Best Practices: Affinity Groups
When you create a Windows Azure application, you’ll pick a subscription to put it under. This is a billing container - underneath that, you’ll deploy a Hosted Service. That holds the Web and Worker Roles that you’ll deploy for your applications. Alongside that, you use the Storage Account to create storage for the application. (In some cases, you might choose to use only storage or Roles - the info here applies anyway.)
As you are setting up your environment, you’re asked to pick a “region” where your application will run.
If you choose a Region, you’ll be asked where to put the Roles. You’re given choices like Asia, North America and so on. This is where the hardware that physically runs your code lives. We have lots of fault domains, power considerations and so on to keep that set of datacenters running, but keep in mind that this is where the application lives.
You also get this selection for Storage Accounts. When you make new storage, it’s a best practice to put it where your computing is. This makes the shortest path from the code to the data, and then back out to the user.
One of the selections for the location is “Anywhere U.S.”. This selection might be interpreted to mean that we will bias towards keeping the data and the code together, but that may not be the case. There is a specific abstraction we created for just that purpose: Affinity Groups.
An Affinity Group is simply a name you can use to tie together resources. You can do this in two places - when you’re creating the Hosted Service (shown above) and on its own tree item on the left, called “Affinity Groups”. When you select either of those actions, you’re presented with a dialog box that allows you to specify a name, and then the Region that name ties the resources to. Choose a specific region - not one of the "Anywhere" choices.
Now you can select that Affinity Group just as if it were a Region, and your code and data will stay together. That helps with keeping the performance high.
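The idea can be modeled in a few lines of Python. This is only a sketch of the concept, not the management API: the class names, the resource names and the region string are placeholders I made up, and the "Anywhere" check is a simple string test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffinityGroup:
    name: str
    region: str  # should be a specific region, not an "Anywhere" choice


@dataclass(frozen=True)
class Resource:
    name: str
    kind: str  # "hosted service" or "storage account"
    affinity_group: AffinityGroup


def co_located(resources):
    """True when every resource is tied to the same Affinity Group."""
    return len({r.affinity_group for r in resources}) <= 1


def is_specific_region(group):
    """True unless the group points at one of the "Anywhere" choices."""
    return not group.region.lower().startswith("anywhere")


# Placeholder names and region, for illustration only.
group = AffinityGroup("my-app-group", "North Central US")
app = [
    Resource("my-service", "hosted service", group),
    Resource("mystorage", "storage account", group),
]
print(co_located(app), is_specific_region(group))
```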
Official Documentation: http://msdn.microsoft.com/en-us/library/windowsazure/hh531560.aspx
Book review (Book 6) - Wikinomics
This is a continuation of the books I challenged myself to read to help my career - one a month, for a year. You can read my first book review here. The book I chose for November 2011 was: Wikinomics: How Mass Collaboration Changes Everything, by Don Tapscott
Why I chose this Book:
I’ve heard a lot about this book - it was one of the “must read” kind of business books (many of which are very “fluffy”) and supposedly deals with collaborating using technology - so I wanted to see what it says about collaborative efforts and how I can leverage them.
What I learned:
I really disliked this book. I’ve never been a fan of the latest “business book”, and sadly that’s what this felt like to me. A “business book” is what I call a work that has a fairly simple concept to get across, and then proceeds to use various made-up terms, analogies and other mechanisms to fill hundreds of pages doing it.
This perception is my own – the book is pretty old, and these things go stale quickly. The author’s general point (at least what I took away from it) was: Open Source is good, proprietary is bad. Collaboration is the hallmark of successful companies. In my mind, you can save yourself the trouble of reading this work if you get these two concepts down.
Don’t get me wrong – open source is awesome, and collaboration is a good thing, especially in places where it fits. But it’s not a panacea as the author seems to indicate. For instance, he continuously uses the example of MySpace to show a “2.0” company, which I think means that you can enter text as well as read it on a web page. All well and good. But we all know what happened to MySpace, and of course he missed the point entirely about this new web environment: low barriers to entry often mean low barriers to exit.
And the open, collaborative company being the best model – well, I think we all know a certain computer company famous for phones and music that is arguably quite successful, and is probably one of the most closed, non-collaborative (at least with its customers) companies on the planet. So that sort of takes away that argument.
The reality of business is far more complicated. Collaboration is an amazing tool, and should be leveraged heavily. However, at the end of the day, after you do your research you need to pick a strategy and stick with it. Asking thousands of people to assist you in building your product probably will not work well.
Open Source is great – but some proprietary products are quite functional as well, have a long track record, are well supported, and will probably be upgraded.
Everything has its place, so use what works where it is needed. There is no single answer, sadly.
So did I waste my time reading the book? Did I make a bad choice? Not at all! Reading the opinions and thoughts of others is almost always useful, and it’s important to consider opinions other than your own. If nothing else, thinking through the process either convinces you that you are wrong, or helps you understand better why you are right.
The Data Scientist
A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I think it means, and why I’m excited about it.
In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets.
The term Data Scientist (as far as I can make out this early in its use) refers to someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly.
I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas.
Persistence
The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all?
This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization.
Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to.
Processing
Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on.
A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one.
Presentation
This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that is the most important component of truly effective displays.
This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understand why the statistical formula works the way it does.
And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist.
Why I’m excited
I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail.
Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data.
So watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way.
Developing a Cost Model for Cloud Applications
Note - please pay attention to the date of this post. As much as I attempt to make the information below accurate, the nature of distributed computing means that components, units and pricing will change over time. The definitive costs for Microsoft Windows Azure and SQL Azure are located here, and are more accurate than anything you will see in this post: http://www.microsoft.com/windowsazure/offers/
When writing software that is run on a Platform-as-a-Service (PaaS) offering like Windows Azure / SQL Azure, one of the questions you must answer is how much the system will cost. I will not discuss the comparisons between on-premise costs (which are nigh impossible to calculate accurately) versus cloud costs, but instead focus on creating a general model for estimating costs for a given application.
You should be aware that there are (at this writing) two billing mechanisms for Windows and SQL Azure: “Pay-as-you-go” or consumption, and “Subscription” or commitment. Conceptually, you can consider the former a pay-as-you-go cell phone plan, where you pay by the unit used (at a slightly higher rate) and the latter as a standard cell phone plan where you commit to a contract and thus pay lower rates. In this post I’ll stick with the pay-as-you-go mechanism for simplicity, which should be the maximum cost you would pay. From there you may be able to get a lower cost if you use the other mechanism. In any case, the model you create should hold.
Developing a good cost model is essential. As a developer or architect, you’ll most certainly be asked how much something will cost, and you need to have a reliable way to estimate that. Businesses and Organizations have been used to paying for servers, software licenses, and other infrastructure as an up-front cost, and power, people to run the systems and so on as an ongoing (and sometimes not factored) cost. When presented with a new paradigm like distributed computing, they may not understand the true cost/value proposition, and that’s where the architect and developer can guide the conversation to make a choice based on features of the application versus the true costs.
The two big buckets of use-types for these applications are customer-based and steady-state. In the customer-based use type, each successful use of the program results in a sale or income for your organization. Perhaps you’ve written an application that provides the spot-price of foo, and your customer pays for the use of that application. In that case, once you’ve estimated your cost for a successful traversal of the application, you can build that into the price you charge the user. It’s a standard restaurant model, where the price of the meal is determined by the cost of making it, plus any profit you can make.
In the second use-type, the application will be used by a more-or-less constant number of processes or users and no direct revenue is attached to the system. A typical example is a customer-tracking system used by the employees within your company. In this case, the cost model is often created “in reverse” - meaning that you pilot the application, monitor the use (and costs) and that cost is held steady. This is where the comparison with an on-premise system becomes necessary, even though it is more difficult to estimate those on-premise true costs. For instance, do you know exactly how much of the air conditioning cost exists because you have a team of system administrators? This may sound trivial, but that, along with the insurance for the building, the wiring, and every other part of the system is in fact a cost to the business.
There are three primary methods that I’ve been successful with in estimating the cost. None are perfect, all are demand-driven. The general process is to lay out a matrix of:
- components
- units
- cost per unit
and then multiply that times the usage of the system, based on which components you use in the program. That sounds a bit simplistic, but using those metrics in a calculation becomes more detailed. In all of the methods that follow, you need to know your application. The components for a PaaS include computing instances, storage, transactions, bandwidth and in the case of SQL Azure, database size. In most cases, architects start with the first model and progress through the other methods to gain accuracy.
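The matrix-times-usage calculation can be sketched in a few lines of Python. Every rate and usage figure below is a placeholder I invented for illustration, not a real Azure price, so check the official pricing page before using numbers like these.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    component: str        # e.g. computing instance, storage, bandwidth
    unit: str             # unit of measure the component is billed in
    cost_per_unit: float  # PLACEHOLDER rate, not a real Azure price
    estimated_use: float  # units you expect to consume per month

    @property
    def total(self):
        """Cost per unit multiplied by the estimated usage."""
        return self.cost_per_unit * self.estimated_use


# All rates and usage figures below are invented for illustration only.
matrix = [
    LineItem("Compute instance", "instance-hour", 0.10, 1500),
    LineItem("Storage", "GB-month", 0.20, 50),
    LineItem("Storage transactions", "10,000 transactions", 0.01, 300),
    LineItem("Bandwidth (outbound)", "GB", 0.15, 40),
    LineItem("SQL Azure database", "database-month", 10.00, 1),
]

cumulative = 0.0
for item in matrix:
    cumulative += item.total
    # Columns: component, unit, total for the component, running total.
    print(f"{item.component:<24}{item.unit:<22}{item.total:>10.2f}{cumulative:>10.2f}")
```

The three methods that follow differ mainly in where the estimated_use column comes from: a guess, a measurement on the development system, or the actual bill.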
Simple Estimation
The simplest way to calculate costs is to architect the application (even UML or on-paper, no coding involved) and then estimate which of the components you’ll use, and how much of each will be used. Microsoft provides two tools to do this - one is a simple slider-application located here: http://www.microsoft.com/windowsazure/pricing-calculator/
The other is a tool you download to create an “Return on Investment” (ROI) spreadsheet, which has the advantage of leading you through various questions to estimate what you plan to use, located here: https://roianalyst.alinean.com/msft/AutoLogin.do?d=176318219048082115
You can also just create a spreadsheet yourself with a structure like this:
Program Element | Azure Component | Unit of Measure | Cost Per Unit | Estimated Use of Component | Total Cost Per Component | Cumulative Cost
Of course, the consideration with this model is that it is difficult to predict a system that is not running or hasn’t even been developed. Which brings us to the next model type.
Measure and Project
A more accurate model is to actually write the code for the application, using the Software Development Kit (SDK) which can run entirely disconnected from Azure. The code should be instrumented to estimate the use of the application components, logging to a local file on the development system. A series of unit and integration tests should be run, which will create load on the test system.
You can use standard development concepts to track this usage, and even use Windows Performance Monitor counters. The best place to start with this method is to use the Windows Azure Diagnostics subsystem in your code, which you can read more about here: http://blogs.msdn.com/b/sumitm/archive/2009/11/18/introducing-windows-azure-diagnostics.aspx This set of API’s greatly simplifies tracking the application, and in fact you can use this information for more than just a cost model.
After you have the tracking logs, you can plug the numbers into any of the tools above, which should give a representative cost or in some cases a unit cost.
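As a rough illustration of that kind of instrumentation, here is a small Python sketch that tallies usage per component and appends each event to a local log file. It is not the Windows Azure Diagnostics API. The UsageMeter class, the component names and the counts are all made up for the example.

```python
import json
import os
import tempfile
from collections import Counter


class UsageMeter:
    """Tallies billable units per component and appends each event to a local log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.totals = Counter()

    def record(self, component, units):
        """Add units to the running total and write one JSON line to the log."""
        self.totals[component] += units
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"component": component, "units": units}) + "\n")


# Simulate a small test run; component names and counts are made up.
log_file = os.path.join(tempfile.mkdtemp(), "usage.log")
meter = UsageMeter(log_file)
for _ in range(3):
    meter.record("storage_transactions", 1)
    meter.record("bandwidth_out_bytes", 2048)

print(dict(meter.totals))
```

The totals from a run of unit and integration tests are the "Estimated Use of Component" figures you would carry into the spreadsheet or calculator.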
The consideration with this model is that the SDK fabric is not a one-to-one comparison with performance on the actual Windows Azure fabric. Those differences are usually small, but they do need to be considered. Also, you may not be able to accurately predict the load on the system, which might lead to an architectural change, which changes the model. This leads us to the next, most accurate method for a cost model.
Sample and Estimate
Once the application is deployed, you will get a bill each month from Microsoft for your Azure usage. The bill is quite detailed, and you can export the data from it and use standard statistical and other predictive math, such as regression, to project what the costs will be. I normally advise that the architect also extrapolate a unit cost from those metrics. This is the information that should be reported back to the executives that pay the bills: the past cost, future projected costs, and unit cost “per click” or “per transaction”, as your case warrants.
The challenge here is in the model itself - statistical methods are not foolproof, and a larger sample is key (in this case I recommend the entire population, not a smaller sample).
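Here is a minimal Python sketch of that projection, using a plain least-squares line fitted to the monthly bills. The bill amounts and transaction counts are hypothetical numbers, not real data, and a straight line is only one of many models you might choose.

```python
def fit_line(ys):
    """Ordinary least squares for y = intercept + slope * x, with x = 0, 1, 2, ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - slope * mean_x, slope


# HYPOTHETICAL monthly bills (in dollars) and transaction counts, not real data.
monthly_cost = [100.0, 110.0, 120.0, 130.0]
monthly_transactions = [50_000, 55_000, 60_000, 65_000]

intercept, slope = fit_line(monthly_cost)
next_month = intercept + slope * len(monthly_cost)  # project one month ahead
unit_cost = sum(monthly_cost) / sum(monthly_transactions)  # cost per transaction

print(f"projected cost next month: {next_month:.2f}")
print(f"unit cost per transaction: {unit_cost:.4f}")
```

Those three outputs line up with what goes back to the executives: past cost, projected cost and unit cost.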
References and Tools
Articles:
http://technet.microsoft.com/en-us/magazine/gg213848.aspx
http://blogs.msdn.com/b/johnalioto/archive/2010/08/25/10054193.aspx
Other Tools:
Windows Azure Use Case: Supplementing Infrastructure
I’ve explained before that Windows Azure is a Platform as a Service - at its simplest, that means that you write software and Azure runs it for you. But what if you are a shop that normally buys “off the shelf” software, and the only software you write is an internal utility here and there - can you still use Windows Azure?
Absolutely. Windows Azure is made up of several components, such as computing, storage and other objects you can call in code. And as such, some companies have extended software and hardware you can use that is backed by Windows Azure. In some cases that’s simply software you can use like you would any other - making Azure a “Software as a Service”, which I’ll write about later.
But in other cases the software a vendor provides is a utility you can use for your infrastructure. For instance - “BlobShare”, a free CodePlex offering, allows you to create a place where you can upload and download files securely. It allows you to hook in your local Windows accounts (without sending names and passwords over the Internet) or even Google, LiveID or Yahoo logins to access the files. You can think of it as a “corporate DropBox”. Download it and read more about it here: http://blobshare.codeplex.com/ and learn how to hook in Active Directory here: http://blogs.msdn.com/b/vbertocci/archive/2011/10/31/blobshare-sample-acs-protected-file-sharing.aspx
Another option along these lines is using a hardware appliance to store data in Azure. This has a particular attraction, since some of these appliances (such as the offerings from StorSimple) do more than just act as a storage target. The device plugs into the network as storage, moves the data to Windows Azure, de-duplicates (thus saving storage costs), encrypts, acts as a backup device, works with SharePoint and more. No coding is required to use their solution, and it can even act as a Disaster Recovery site.
Storage isn’t the only way to use Windows Azure in place of infrastructure. Using the Application Fabric Service Bus, you can perform some functions of data access and transfer between companies that used to take a VPN setup - which is no longer required. The caching function in the Service Bus can even relieve the need to upgrade a server for performance. And of course SQL Azure allows you to access a SQL Server database without any server at all.
So if you’re thinking that Windows and SQL Azure aren’t for you, think again. Start with the problem you have, and see what options you have for solving it.
Book Review (Book 5) - The Cloud of Unknowing
This is a continuation of the books I challenged myself to read to help my career - one a month, for a year. You can read my first book review here. The book I chose for October 2011 was: The Cloud of Unknowing, Anonymous: The role of faith in life. Once again, this is out of order, but the book came in from a hold at the library so I’ll do this one now.
Why I chose this Book:
This book is probably the most far afield for many of my readers – some folks don’t hold a faith, others have faiths that are different from mine. That’s fine – I think this is still an intriguing read.
However this is a religious work – if you’re not into that sort of thing, it’s completely OK. As I’ve mentioned early-on, a book list for your career can include many kinds of books. Faith is such a part of my life that I find it impossible to separate from my day-to-day efforts. To that end, a little about this book is in order.
The book was written in the 14th century by an unknown, anonymous author, probably a Carthusian monk in Europe. It’s a work involving deep thoughts around the intersection of intellection and contemplation. Contemplation is what ancient Catholics called meditation, or focus. It’s a really deep work involving philosophical history, specifically in the Christian tradition. Even as old and philosophical as it is, many contemporary writers, singers (like Leonard Cohen) and others have referenced it, used its words, or developed entire works around it.
What I learned:
So what did I learn? Actually quite a bit, specifically on the role of contemplation in life. As a technologist, I tend to stay incredibly busy and the temptation is always to be very scattered. But lack of focus is often the enemy of getting things done correctly. Focusing on what really matters and being in the moment is a powerful tool in professional life. It has taught me that now, more than ever, I need to decide what I will not give a lot
As I mentioned, this is a religious work – it has far more value to me there than in the pure business sense, although I find those two linked irrevocably. The way I treat others in both personal and business relationships is my character – which I strive to improve every day.
I highly recommend you read something that is internal like this in your career development. Money, success and fame are not all that there is – and those things are not separated from who you are as a person.
Bonus Rant: Conference Speakers and the Golden Rule
Fair warning: This isn’t a technical post. This is my opinion, and not Microsoft’s.
I’ve been a public speaker at events since 1980. Interestingly, even though I’ve done this, it’s never been my paid job. I’ve spoken on technical topics at groups as small as 5 people to over 10,000, in local and global events. I’ve been on TV, radio, on stage and on webcasts. I teach at college as well.
In all of that time, I’ve come to realize that speaking is the easy part. To be sure, I put an extraordinary amount of work into my presentations. I learn the topic, research speaking styles and communication methods, fret over my demos and practice relentlessly. But I reiterate that this is the easy part.
The hard part is putting the events together.
Most folks have no idea how hard it is to create, advertise, pay for, staff, run and tear down a technical event. Even small ones. And the folks that do it get none of the glory, none of the praise, and usually all of the criticism. These folks deserve way more respect than they get.
And the worst offenders in not offering this respect are the speakers.
Let me say that again – some speakers are rude, arrogant and disrespectful of the event planning groups that run their sessions. They have unrealistic expectations, and I’ve seen some actively fight the requirements these teams impose. From not delivering slides on time, to not following the rules the event planners lay out, to not even calling when they don’t show up. Showing up and saying "I'm still working on my slides" with a smirk doesn't impress me as a listener at all - it just makes me think you're dismissing my time as unimportant as a listener. Your content should be nailed way before the event, not the day of. Practicing on the day of presentation is expected - changing it indicates unprofessionalism. Sorry, but there it is.
I don’t bring up problems without suggesting some solutions – that’s my military training coming through. :) Here’s what can fix this issue:
- All speakers should have to demonstrate they have assisted in the logistics of an event somewhere. Or you don’t speak.
- All speakers should follow the rules. Don’t get your slides in, don’t follow the rules? You don’t present. There are LOTS of people wanting to speak.
- All speakers should follow the golden rule. Sure, you’re special, just like everyone else. Understand that the event planners are there – with you – to make the event a success. Treat them like you want to be treated.
No, I’m not pointing anyone out. No, no one asked me to write this post. I’ve just been on both sides of the fence, and you really have no idea how difficult the logistics of an event are until you put one on. Want to find out if I’m right? Volunteer at a local event like SQL Saturday, ask to help out at PASS, TechEd, whatever. The point is, walk a mile in the shoes of the event planner. It will make you a better presenter.
Big Data and the Cloud - More Hype or a Real Workload?
Last week Microsoft announced several new offerings for “Big Data” - and since I’m a stickler for definitions, I wanted to make sure I understood what that really means. What is “Big Data”? What size hard drive is that? After all, my laptop has 1TB of storage - is my laptop “Big Data”?
There are actually a few definitions for this term, most notably those involving the “Four V’s”: Volume, Velocity, Variety and Variability. Others disagree with this definition. I tend to try and get things into their simplest form, so I’m using this definition for myself:
Big data is defined as a large set of computationally expensive data that is worked on simultaneously.
Let me flesh that out a little. To be sure, “Big Data” has a larger size than say a few megabytes. The reason this is important is that it takes special hardware to be able to move large sets of data around, store it, process it and so on. (large set)
If you store a LOT of data, but only use a small portion of it at a time, that really isn’t super-hard to do. It’s mainly a storage issue at that point. But, if you do need to work with a large portion of the data at one time, then the memory, CPU and transfer components of the system have to adapt to be responsive - new ways to work with that data (game theory, knot-algorithms, map-reduce, etc.) need to be brought into play. (computationally expensive)
Once that data is loaded into the processing area (memory or whatever other mechanism is used) it must be worked on in parallel to come back in a reasonable time. You have two options here - you can scale the system up with more internal hardware (CPU’s, memory and so on) or you can scale it out to have multiple systems work on it at the same time using paradigms such as map/reduce and so on. Actually, when you lay this out in an architecture diagram, scale up or out doesn’t actually change the logical structure of the process - in scale out the network becomes the bus, and the nodes become more RAM and computing power. Of course, there are changes in code for how you stitch the workload back together. (worked on simultaneously)
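A toy Python sketch shows the map/reduce shape described here. Threads on one machine stand in for the separate nodes, the word-count task and the sample text are my own, and a real cluster would add data movement and fault handling that this leaves out.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count words in one chunk, independently of the others."""
    return Counter(word for line in lines for word in line.lower().split())


def merge(left, right):
    """Reduce step: stitch two partial results back together."""
    return left + right


# A tiny stand-in data set, split into chunks as if spread across nodes.
chunks = [
    ["big data is big", "data moves"],
    ["big systems scale out"],
    ["data data data"],
]

# Each worker thread plays the part of one node in a scale-out system.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(map_chunk, chunks))

totals = reduce(merge, partials, Counter())
print(totals.most_common(2))
```

Notice that the logical structure would not change if the chunks were processed one after another on a bigger machine. Only the merge step is the extra code that scale out requires.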
So back to the original question. Is Big Data, as I have defined it here, a workload for Windows and SQL Azure? Absolutely! In fact, it’s probably one of the main workloads, and I believe it represents the latest, and perhaps also the earliest frontier of computing. Jim Gray, a former researcher here at Microsoft and a hero of mine, was working on this very topic. I believe as he did - all computing is simply an interface over data.
Microsoft has multiple offerings on the topic of Big Data. In posts that follow from myself and my co-workers, we’ll explore when and where you use each one. Whether you are a data professional or a developer, this is the new frontier - don’t wait to educate yourself on how to leverage Big Data for your organization.
Hadoop on Windows Azure and SQL Server - Microsoft’s partnership to include Hadoop workloads on Windows Azure and SQL Server/Parallel Data Warehouse (PDW)
LINQ to HPC - Microsoft’s High-Performance Computing SKU of HPC is now in Azure
Windows Azure Table Storage - A key/value pair type storage with full partitioning that is immediately consistent, able to handle huge loads of data and works with any REST-compatible language
Other offerings - Including the new Data Explorer, Project Daytona (with a Big Data Toolkit for Scientists and researchers), Power View and more.
The era of Big Data is here. And you can use Windows and SQL Azure to bring it to your organization.
