
Common misconceptions about DAOS


More than a few articles and posts have appeared around the Yellowsphere regarding DAOS, and there's always some L.H. Putgrass responding on these threads claiming that DAOS is somehow the spawn of Satan.  Of course, such claims are based almost entirely on totally incorrect ideas about how DAOS operates, so I thought I'd clear up a few misconceptions.

Myth 1: DAOS requires transaction logging and transaction logging is bad.  Well, it's true that you need transaction logs for DAOS, but it's not true that transaction logging is bad.  Poor implementations of transaction logs are bad, just like poor implementations of anything.  So if you put your translog on a SAN volume that also houses, say, your Notes data directory, then you're not going to get a good result.  The translog requires a high-speed, localized dedicated drive (or RAID), so don't even think of implementing it any other way.  But maintaining a transaction log is trivial, and at this point, you shouldn't be running production environments any other way.

Myth 2: DAOS is SCOS.  I specifically asked the Domino Server Chief Architect about this almost a year ago and he laughed, "DAOS does not share a single line of code with SCOS."  The architectures are fundamentally different.  The only thing they have in common is that they are intended to achieve the same outcome: reduced cost of ownership for high-attachment volume.  SCOS didn't deliver that.  DAOS does.

Myth 3: DAOS is insecure.  No, it's not.  The Domino 8.5 beta 1 did not include encryption on .NLO files (it included LZ1 compression, not encryption), but the DAOS team has been quite clear that the individual files written to disk in the gold version will be encrypted using the server's Notes credentials.  Of course, even if they weren't encrypted, does that make them less secure than storing in NSFs?  Do you locally encrypt your mail NSFs on the server?  No?  Then the degree of risk is exactly the same: access to the OS file system trumps all Domino security, period.

Myth 4: DAOS ruins your backups.  Actually DAOS helps your backups tremendously.  Everyone knows the problems with attachment duplication in mail NSFs -- if you send a 5MB attachment to 100 users, it's consuming 500MB of drive space.  But if your backup approach is to make copies of your mail NSFs (and all but the most sophisticated backup solutions do exactly that) then you're backing up that 500MB every night.  That means in 100 days of backups, that 5MB attachment is now devouring 50GB of archival space!  With DAOS, if you write the .NLO file to your backup system, then you only have to do it ONCE, since the file mod time doesn't change.
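To make the arithmetic above concrete, here is a rough sketch in Python. It assumes full nightly copies of every mail NSF, decimal units (1 GB = 1000 MB), and it ignores the small reference that stays behind in each NSF; the function names are made up for illustration.

```python
# Back-of-the-envelope model of the backup arithmetic above.
# Assumptions (illustrative, not measured): every nightly backup copies every
# recipient's NSF in full, and the .NLO file is written to backup exactly once.

def nsf_backup_mb(attachment_mb, recipients, nights):
    """Archive space consumed when each night copies each recipient's copy."""
    return attachment_mb * recipients * nights

def daos_backup_mb(attachment_mb, recipients, nights):
    """Archive space consumed when the single .NLO file is backed up once.

    recipients and nights are accepted only to mirror nsf_backup_mb; they do
    not change the result, which is the point of the comparison.
    """
    return attachment_mb

without_daos = nsf_backup_mb(5, 100, 100)
with_daos = daos_backup_mb(5, 100, 100)
print(f"Without DAOS: {without_daos / 1000:.0f} GB")  # Without DAOS: 50 GB
print(f"With DAOS:    {with_daos} MB")                # With DAOS:    5 MB
```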

Myth 5: DAOS messes up your restores.  That depends entirely on how you want to maintain your attachment store.  In SCOS, when the last instance of a pointer was removed from an NSF, the object in the central store was deleted.  In DAOS, you can control your retention period.  The setting is "Defer object deletion for X days," and you can set it from 0 to 9999.  So if you're truly worried about your restore of attachments being valid, you can retain your attachments in the file structure for over 27 years.  It's not expensive to do so, if you're not repeatedly making duplicates of the file in your backup system.  (See Myth 4.)
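The idea of deferred deletion can be sketched as a toy reference-counted store. This is not how DAOS is implemented internally; the class and method names are hypothetical, and it only illustrates the behavior described above: an object with no remaining references is kept until the deferral period has passed.

```python
from datetime import date, timedelta

class DeferredStore:
    """Toy reference-counted object store with deferred deletion (hypothetical)."""

    def __init__(self, defer_days):
        self.defer_days = defer_days
        self.refs = {}      # object key -> number of documents pointing at it
        self.orphaned = {}  # object key -> date the last reference went away

    def add_ref(self, key):
        self.refs[key] = self.refs.get(key, 0) + 1
        self.orphaned.pop(key, None)  # a new reference cancels a pending delete

    def drop_ref(self, key, today):
        self.refs[key] -= 1
        if self.refs[key] == 0:
            self.orphaned[key] = today  # start the clock, but keep the object

    def prune(self, today):
        """Delete objects that have had zero references for defer_days or more."""
        cutoff = today - timedelta(days=self.defer_days)
        expired = [k for k, since in self.orphaned.items() if since <= cutoff]
        for k in expired:
            del self.orphaned[k]
            del self.refs[k]
        return expired

store = DeferredStore(defer_days=30)
store.add_ref("report.pdf")
store.drop_ref("report.pdf", date(2008, 8, 1))
print(store.prune(date(2008, 8, 15)))  # [] (still inside the deferral window)
print(store.prune(date(2008, 8, 31)))  # ['report.pdf']
```

With `defer_days=0` the sketch behaves like the SCOS case described above, where the object goes as soon as the last pointer does.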

Myth 6: DAOS doesn't get you much, because disk space is cheap.  GBs are cheap, but managing them is not.  DAOS reduces your backup and restore cycle times and storage needs.  It aids in NSF reliability.  It simplifies mail quota rules (do you even need them?).

Myth 7: DAOS breaks replication.  Clients, and even other servers, have no idea that an attachment is stored in DAOS.  The link between the .NLO and the Notes document is maintained at a low enough level in the API that anything reading documents has no idea that DAOS is even in use.  Like transaction logging, it's invisible to any outside process.

The authoritative source for DAOS details can be found at the Domino Blog.  If you're not sure that you want to implement DAOS on the first day you install a Domino 8.5 server, go read the articles there.  And check out the overview on developerWorks.  I guarantee that if you take the time to learn how it works, you'll be absolutely salivating for it.

Comments

1 - Great mythbuster article Nathan. Though I haven't seen anyone stating a myth otherwise, I would note that this can significantly improve mail routing performance. Consider that 500 MB of attachments. Instead of Domino now having to write out 500 MB to disk at routing time (thus also slowing down mail delivery in general), the router will create the NLO file and just have to write a few KB to each NSF!

2 - I heard a myth that DAOS screws with quotas, but it actually works seamlessly in the background.

3 - @2 - Chris, I kinda wish you could tell DAOS not to count centrally stored files towards a mail quota.

But honestly, given that an 8.5 server has DAOS, summary compression, policy-based server archiving and a new backup API, I don't understand why people think quotas are a good idea. I think it really highlights the arrogance of a lot of IT departments that they think it's okay to put the burden of space management on business users, instead of putting it on automated processes. :-/

4 - I heard a myth that DAOS was hanging out with Paris Hilton and it made Lindsay Lohan jealous, but it turns out it was SCOS not DAOS, so she was fine with that.

Sorry ... couldn't resist.

5 - Great info Nathan, thanks for taking time to write this up.

6 - Thanks for the great write-up, Nathan! Glad to see that you are embracing DAOS - it's a feature the developers are very happy about and even happier that it's getting great press! Thanks too for the plug about dominoblog! As an aside - anyone looking at the articles there, I also encourage you to read the comments and responses - GREAT dialog and even more questions/response!

7 - @3 - Man, Nathan...you will never move up the ladder in any company with all of these crazy, makes-total-sense kind of statements! I mean, come on...business users are supposed to manage their own storage, aren't they? How else can IT make life difficult for them?

Great write up, as usual. 8.5 has so many things going for it, I can't wait to help all of our customers upgrade as soon as possible.


8 - The only problem I can see is that my file system will have billions of files... I'm using Windows. I know DAOS spreads them across several internal subfolders, but I'm not sure whether Windows will support so many files or whether it will cause performance issues.

Also, does encryption cause CPU performance issues?

What about antivirus? Is it compatible?

You also talk about "defer" and that deletion should be deferred for maybe 30 days... But there may be plenty of useless atts (e.g. spam) that will sit there unused for more than a month. This will cause more storage loss due to atts that should not be stored because they were deleted.

Once I implement DAOS, Is there any way to really see the savings?

9 - And still I would like it even more if transaction logging were optional. Still a great thing.

10 - "I know DAOS spreads them across several internal subfolders, but I'm not sure whether Windows will support so many files or whether it will cause performance issues."

DAOS is smart enough to divide the files into a folder structure that keeps the operating system happy.

"Also, does encryption cause CPU performance issues?"

Not any more than, say, port encryption. The truth is, CPU is *VERY* unlikely to be a bottleneck for you on mail servers. It's far more likely that you have an I/O bottleneck, which is where DAOS will help you more.

The workload for local encryption on these files is very light, just as the workload is for medium encryption on local NSFs, and network ports.

"What about antivirus? Is it compatible?"

You'd have to ask your AV vendor for an authoritative answer, but since most of them work by either intercepting the message while it's in the mail.box (in which case it's not in DAOS yet), or by creating a temporary file by using the API to detach a copy (in which case it's invisible that the file was even IN DAOS), I would expect AV to work just fine.

Probably more importantly, I know for a fact that Lotus is actively working with both backup and AV vendors to ensure compatibility.

"You also talk about "defer" and that deletion should be deferred for maybe 30 days... But there may be plenty of useless atts (e.g. spam) that will sit there unused for more than a month. This will cause more storage loss due to atts that should not be stored because they were deleted."

First off, you can adjust the threshold for minimum file size before you bother with DAOS. So you can say "don't bother to store the 3K GIF attachments in DAOS."

Second, if you're getting spam with file attachments in a volume where this is a concern for you, may I suggest that, for the sake of your users, you address THAT problem? (Not you specifically, but anyone that reads here.) The best place to start is { Link }

Third, even if you're storing a large number of files in DAOS in that situation, at least you're doing a hash-check to make sure you're only storing a piece of spam once, instead of duplicating across all your NSFs. How would centralizing that one attachment make you worse off than duplicating across all your users?
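The points above (a minimum size threshold, a hash check so identical content is stored once, and a folder structure that keeps the operating system happy) can be sketched as a small content-addressed store. This is a hedged illustration only: the post does not say which hash DAOS uses or how it names and lays out files, so SHA-256, the two-character subfolder, the 4096-byte threshold and the function name are all assumptions.

```python
import hashlib
import tempfile
from pathlib import Path

MIN_SIZE = 4096  # hypothetical threshold: smaller attachments stay in the NSF

def store_attachment(root: Path, data: bytes):
    """Store data once per unique content; return its path, or None if too small."""
    if len(data) < MIN_SIZE:
        return None
    digest = hashlib.sha256(data).hexdigest()
    # Shard on the first two hex characters so no single folder grows too large.
    path = root / digest[:2] / f"{digest}.nlo"
    if not path.exists():  # identical content is already on disk: write nothing
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    spam = b"x" * 10_000
    first = store_attachment(root, spam)
    second = store_attachment(root, spam)    # same attachment, another recipient
    tiny = store_attachment(root, b"GIF89a")  # below the threshold
    stored = list(root.rglob("*.nlo"))
    print(first == second, tiny, len(stored))  # True None 1
```

However many recipients get the same piece of spam, the sketch writes one file; that is the comparison being made against duplicating it in every NSF.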

If your server is using FAT (shudder) then the maximum number of files in a directory is 64K. On an NTFS volume, it's over 4 billion files. Please reference the discussion on UNID collisions for some discussions on what the number 4 billion is really like.

"Once I implement DAOS, Is there any way to really see the savings?"

Yes. There are several ways to get at the information, but I think the easiest is the Files tab on the administrator client.

11 - Regarding AV and anti-spam, you say that DAOS is not enabled in the mail.box, but IBM encourages enabling it in the mail.box (I think that way the att is stored in DAOS at this earlier moment, so I/O should improve when the router later distributes the email).

Also, AV and anti-spam products work in the following way: the SMTP task writes the email to the mail.box, and the anti-spam product puts the email in dead state (by using an extension manager) and adds it to an internal queue. Another anti-spam server task then reads the email to analyze it.
By then the attachment is already stored in DAOS. That's why, even if the anti-spam product removes the attachment or the email, DAOS will already have stored it.

Also, you talk about 4 billion (4.000.000.000) files in NTFS... it makes me think of a typical situation:

1 user receives 50 emails per day and from these 50 emails there are 40 atts per day. In five years this user will have 73000 atts.

If I have 50.000 users in one server then I will have: 3.650.000.000 attachments.

So it's quite near the limit.
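The figures in this scenario can be checked with a few lines of Python. The sketch assumes, as the scenario does, that every attachment is unique and nothing is ever deleted; the NTFS figure used is the documented per-volume maximum of 2^32 - 1 files, and the function name is made up.

```python
NTFS_MAX_FILES = 2**32 - 1  # documented NTFS limit on files per volume

def attachments_stored(users, per_user_per_day, years):
    """Files on disk if every attachment is unique and none is ever removed."""
    return users * per_user_per_day * 365 * years

per_user = attachments_stored(1, 40, 5)
total = attachments_stored(50_000, 40, 5)
print(f"{per_user:,} attachments per user")  # 73,000 attachments per user
print(f"{total:,} files, {total / NTFS_MAX_FILES:.0%} of the NTFS limit")
# 3,650,000,000 files, 85% of the NTFS limit
```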

I know you will say that I should archive but as you know it depends on each organization....

At the same time, DAOS reduces fixup time... but what if the file system crashes and, after the reboot, it needs to analyze 3.650.000.000 entries?

So I think DAOS can work in a small/medium organization but I'm afraid that in large organizations it will not scale...

12 - @11 - 50 emails and 40 attachments per day??? 80% of your emails have attachments? Sounds like you need some business process engineering done, and use lotus notes for what it's REALLY good at .... I know a few business partners that can help (me for instance, nathan for another) ...

13 - @11 - Again, I would suggest talking to your AV vendor rather than engaging in the kind of irresponsible speculation that seems to be popular on this topic, but if the AV persists in the process of holding the message in the mail.box, and then analyzing it, then it's going to follow the practice of using the API to extract the file to a temporary space on the hard drive (or purely in memory) to run the analysis. Essentially, it will do an EmbeddedObject.extractFile to a stream (probably in the C API, I should note -- I don't have time to go look up the C API calls) and perform the AV check on that stream.

In this process, DAOS IS INVISIBLE. When you make the API request, the server automatically knows to extract the file from the NLO object instead of from the $File item in the note (which is itself just a pointer to a special BLOB appended to the NSF.)

So let's take your example of limitations on the operating system....

Let's take all of your assumptions as a given. Your scenario is that every single user receives 40 unique attachments, sent exclusively to that user alone, for 365 days a year, 5 years straight, with no filtering whatsoever, and that you retain 100% of all messages. And you put 50,000 such users on a single Windows server.

To say this stretches the boundaries of imagination is putting it lightly.

But I'll accept your hypothesis. Let's say the average size of these attachments is a paltry 100KB. That means that each user has a whopping 7.3GB in attachments. With 50,000 users, that's 365 TERAbytes of attachments.

On a single Windows server.

365 TB of attachment data, without a single duplication, and accessed by 50K users. On an NTFS volume.
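For anyone who wants to check those numbers, here is the same calculation as a short script. It takes the scenario's assumptions as given (40 unique attachments per user per day, 5 years, 100KB average, 50,000 users) and uses decimal units; the function name is made up for illustration.

```python
def attachment_volume(users, atts_per_day, years, avg_kb):
    """Return (GB per user, TB in total) for fully unique, never-deleted attachments."""
    per_user_kb = atts_per_day * 365 * years * avg_kb
    total_kb = per_user_kb * users
    return per_user_kb / 1_000_000, total_kb / 1_000_000_000

per_user_gb, total_tb = attachment_volume(50_000, 40, 5, 100)
print(f"{per_user_gb:.1f} GB per user, {total_tb:.0f} TB in total")
# 7.3 GB per user, 365 TB in total
```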

This is your definition of how to architect for an enterprise?

Chris, it's not DAOS that doesn't scale here. It's your proposed solution. You're saying that you'd be BETTER OFF locally storing 365TB of attachments into NSFs, where they slow down indexing, new messages, routing, backups, replication, compacting and pretty much every other imaginable operation. But that's BETTER than running a checksum on these attachments at post time, compressing & encrypting them, and storing them as manageable, atomic files that can be individually handled. All because you're afraid that NTFS can't handle enough files in a single folder.

Truly, sir, you have a dizzying intellect.

14 - @12: First, one email can have 3 atts. Second, as you may not know, when you receive emails in MIME format, if the HTML portion of the email is more than 40 KB, then this HTML portion will "internally" arrive as an attachment. DAOS will store it too (I already tested it), so it's not a strange situation to receive 40 atts per day.

@13: You didn't catch what I was trying to say. Let me give an example: user Peter receives lots of spam. When a spam email arrives, the nSMTP.exe task writes it to the mail.box. At this point the email can have an attachment, either because it's a "real" attachment or because the HTML portion of the email is bigger than 50 KB and is stored internally as an attachment.

The antivirus or anti-spam product will (via an extension manager) catch this UPDATE event and put the email in dead state (it cannot be analyzed at this point, since otherwise it would slow down the nserver or nsmtp process). The email is then analyzed later by the antivirus.

What I mean here is that even if the AntiSPAM sees that this email is a SPAM, these ATTS will be already in the DAOS folder!!!

This is what I try to tell you: Peter will probably not receive the SPAM email... but DAOS will be affected by SPAMs.

So, if 50% of emails are spam and 20% of those emails contain atts, then (nearly) 10% of the attachments in the DAOS folder will be spam.

Try to read it several times and you will discover that what I'm saying is not stupid.

Also, my AV or anti-spam vendor makes a good product (e.g. McAfee), since this is the usual way viruses and spam are checked, and it is OK that it is done this way.

I know what I'm talking about; I'm also a developer, I've been working with Domino since R4, and I know the architecture.

7 GB of atts is not a strange number... I have several customers with DBs bigger than 10 GB, even ones that are not email DBs.



15 - @14 - I have lots of databases larger than 10GB. I'm not marveling at a 7GB mail database. I'm marveling at the idea that you'd expect to support 50,000 of them on a single box.

But, fine.... let's continue on with your concerns.

1) If you have 50,000 users, why is your SMTP inbound gateway the same server you're supporting your users on? Best practice for the enterprise is to have inbound SMTP go through a dedicated server -- preferably one that doesn't bounce back NDRs.

2) If you ARE using an inbound gateway, then don't turn on DAOS on that machine. Or turn it on and set your purge interval to 2 days. Or turn it on and set the purge interval to 365 days. You still aren't pushing the edge of the capabilities of your server.

DAOS is a good piece of technology. It's not magic. If you deliberately set out to have a bad implementation of it, then you can certainly achieve that goal. And everything you've said here is "if I do this in a worst practice fashion, and don't pay attention to this, and I do it all on the lowest end OS supported by Domino, but with staggeringly high numbers on a single server, then I might break something in five years."

Well, that's pretty much deliberately trying to make it fail as far as I'm concerned.

If you want to implement DAOS because you want to realize the incredible storage, maintenance and performance benefits of it, but you're genuinely concerned about inbound spam retention, then move your inbound SMTP to a gateway machine in the DMZ, and put your AV/Spam control software there. And don't run DAOS on it.

That's the enterprise solution.

16 - For those fearful of DAOS (and that haven't read the Domino Server Team Blog): You can also enable/disable DAOS on a per-db level. So if you're really that concerned about a particular db, just turn it off there.

17 - Suppose I start using DAOS for a given DB... and then I want to stop using it and convert my DB back to a standard NSF with the atts inside, that is, to put the attachments stored in DAOS back into the NSF as before.
Is there any way to do it? Maybe a load compact?

18 - @17 - Don't know about a compact with params, but you could definitely do it by creating a local replica, deleting on the server, then recreating on the server.

19 - @17 - IIRC the Server Team blog talks about this. The answer was either a compact or some function from within the DAOS tracking db.

20 - First of all, many thanks, Nathan, for this great article demystifying DAOS.

How are these attachments handled on Web applications? Is it also transparent?

I have some Web applications where I get the attachment names in a PostSave event and display custom icons - PDFs, Word, Excel files and so on.

I wonder if I can reference these attachments which now are not in the NSF but in the file system the same way.

21 - @20 - Yes, it's totally transparent to your web app. @Attachments and .embeddedObjects both work in EXACTLY the same way as before.

22 - @Chris/Nathan - I've only ever worked with Notes in an SMB setting. Do larger enterprises really run 50,000 users on a single Domino server, and do they really use this production mail server to handle antispam and antivirus filtering? Even in an SMB I offloaded these functions to a Barracuda filtering appliance. I'm having a hard time believing that someone with 50K users wouldn't do the same. I know Domino is capable of it and there are add-on products to make it easier, but in my own experience I quickly hit the wall where it didn't make sense from a TCO perspective.

23 - @22 - "Do larger enterprises really run 50,000 users on a single Domino server, and do they really use this production mail server to handle antispam and antivirus filtering? "

No, of course not. If you could get 50,000 active users on a Windows box, that alone would be reason to fall over dead in disbelief.


11 Aug 


Disclaimer 

Welcome to Escape Velocity!

Opinions expressed here by Nathan T. Freeman are not necessarily those of his employer. However, there's a decent chance they are, so check with them if you really want to know.

But really... do you need that kind of validation? Are the opinions expressed here in doubt?
