Common misconceptions about DAOS
More than a few articles and posts have appeared around the Yellowsphere regarding DAOS, and there's always some L.H. Putgrass responding on these threads claiming that DAOS is somehow the spawn of Satan. Of course, such claims are based almost entirely on totally incorrect ideas about how DAOS operates, so I thought I'd clear up a few misconceptions.
Myth 1: DAOS requires transaction logging and transaction logging is bad. Well, it's true that you need transaction logs for DAOS, but it's not true that transaction logging is bad. Poor implementations of transaction logs are bad, just like poor implementations of anything. So if you put your translog on a SAN volume that also houses, say, your Notes data directory, then you're not going to get a good result. The translog requires a high-speed, localized dedicated drive (or RAID), so don't even think of implementing it any other way. But maintaining a transaction log is trivial, and at this point, you shouldn't be running production environments any other way.
Myth 2: DAOS is SCOS. I specifically asked the Domino Server Chief Architect about this almost a year ago and he laughed, "DAOS does not share a single line of code with SCOS." The architectures are fundamentally different. The only thing they have in common is that they are intended to achieve the same outcome: reduced cost of ownership for high-attachment volume. SCOS didn't deliver that. DAOS does.
Myth 3: DAOS is insecure. No, it's not. The Domino 8.5 beta 1 did not include encryption on .NLO files (it includes LZ1 compression, not encryption), but the DAOS team has been quite clear that the individual files written to disk in the gold version will be encrypted using the server's Notes credentials. Of course, even if they weren't encrypted, does that make them less secure than storing in NSFs? Do you locally encrypt your mail NSFs on the server? No? Then the degree of risk is exactly the same: access to the OS file system trumps all Domino security, period.
Myth 4: DAOS ruins your backups. Actually DAOS helps your backups tremendously. Everyone knows the problems with attachment duplication in mail NSFs -- if you send a 5MB attachment to 100 users, it's consuming 500MB of drive space. But if your backup approach is to make copies of your mail NSFs (and all but the most sophisticated backup solutions do exactly that) then you're backing up that 500MB every night. That means in 100 days of backups, that 5MB attachment is now devouring 50GB of archival space! With DAOS, if you write the .NLO file to your backup system, then you only have to do it ONCE, since the file mod time doesn't change.
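To put numbers on that, here's a rough back-of-the-envelope sketch of the arithmetic. The function names are made up for illustration, it assumes a full copy of every mail NSF each night, and it uses 1 GB = 1000 MB to match the figures above.

```python
# Back-of-the-envelope backup math for one attachment sent to many users.
# Function names are hypothetical; sizes are in MB, with 1 GB = 1000 MB.

def nsf_backup_total_mb(attachment_mb, recipients, nights):
    """Each mail NSF holds its own copy, and every nightly full copy
    of the NSFs backs up all of those copies again."""
    return attachment_mb * recipients * nights


def daos_backup_total_mb(attachment_mb, nights):
    """One .NLO file is written to the backup system once, because its
    modification time does not change afterwards."""
    return attachment_mb if nights > 0 else 0


if __name__ == "__main__":
    without_daos = nsf_backup_total_mb(5, 100, 100)
    with_daos = daos_backup_total_mb(5, 100)
    print(f"NSF copies: {without_daos / 1000:.0f} GB of archive space")
    print(f"DAOS:       {with_daos} MB of archive space")
```

The DAOS figure ignores the NSFs themselves, which still get backed up nightly but no longer carry the attachment.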
Myth 5: DAOS messes up your restores. That depends entirely on how you want to maintain your attachment store. In SCOS, when the last instance of a pointer was removed from an NSF, the central object store was deleted. In DAOS, you can control your retention period. The setting is "Defer object deletion for X days," and you can set it from 0 to 9999. So if you're truly worried about your restore of attachments being valid, you can retain your attachments in the file structure for over 27 years. It's not expensive to do so, if you're not repeatedly making duplicates of the file in your backup system. (See Myth 4.)
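If you want to see what a given deferral setting means in calendar terms, a minimal sketch like this will do. The helper names are invented for illustration, and this is not how the server itself implements the setting; it only models "keep the object for X days after the last reference is gone."

```python
from datetime import date, timedelta

# The 0 to 9999 range comes from the "Defer object deletion for X days"
# setting described above. Everything else here is an illustrative assumption.
MAX_DEFER_DAYS = 9999


def defer_years(days):
    """Convert a deferral period in days to approximate years."""
    return days / 365.25


def purge_eligible_on(last_reference_removed, defer_days):
    """Date on which an unreferenced object could first be deleted."""
    if not 0 <= defer_days <= MAX_DEFER_DAYS:
        raise ValueError("defer_days must be between 0 and 9999")
    return last_reference_removed + timedelta(days=defer_days)


if __name__ == "__main__":
    print(f"{defer_years(MAX_DEFER_DAYS):.1f} years at the maximum setting")
    print(purge_eligible_on(date(2008, 8, 21), 30))
```

The practical point is that the deferral period should be at least as long as the oldest NSF backup you expect to restore, so the attachments it points at are still on disk.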
Myth 6: DAOS doesn't get you much, because disk space is cheap. GBs are cheap, but managing them is not. DAOS reduces your backup and restore cycle times and storage needs. It aids in NSF reliability. It simplifies mail quota rules (do you even need them?)
Myth 7: DAOS breaks replication. Clients, and even other servers, have no idea that an attachment is stored in DAOS. The link between the .NLO and the Notes document is maintained at a low enough level in the API that anything reading documents has no idea that DAOS is even in use. Like transaction logging, it's invisible to any outside process.
The authoritative source for DAOS details can be found at the Domino Blog. If you're not sure that you want to implement DAOS on the first day you install a Domino 8.5 server, go read the articles there. And check out the overview on Developer Works. I guarantee that if you take the time to learn how it works, you'll be absolutely salivating for it.




Comments
Posted by Chris Whisonant At 12:53:41 PM On 08/21/2008 | - Website - |
Posted by Chris Miller At 01:00:10 PM On 08/21/2008 | - Website - |
But honestly, given that an 8.5 server has DAOS, summary compression, policy-based server archiving and a new backup API, I don't understand why people think quotas are a good idea. I think it really highlights the arrogance of a lot of IT departments that they think it's okay to put the burden of space management on business users, instead of putting it on automated processes. :-/
Posted by Nathan T. Freeman At 01:07:24 PM On 08/21/2008 | - Website - |
Sorry ... couldn't resist.
Posted by Jeremy Hodge At 02:09:47 PM On 08/21/2008 | - Website - |
Posted by Brian Green At 02:13:45 PM On 08/21/2008 | - Website - |
Posted by Andrea Waugh-Metzger At 02:20:49 PM On 08/21/2008 | - Website - |
Great write up, as usual. 8.5 has so many things going for it, I can't wait to help all of our customers upgrade as soon as possible.
Posted by Chris Blatnick At 02:44:11 PM On 08/21/2008 | - Website - |
Also, does encryption cause CPU performance issues?
What about antivirus? Is it compatible?
You also talk about "defer" and that deletion should be deferred for maybe 30 days... But there may be plenty of useless atts (e.g. spam) that will sit around unused for more than a month. This will cause more storage loss from atts that should not be stored because they were deleted.
Once I implement DAOS, Is there any way to really see the savings?
Posted by Chris At 02:57:56 PM On 08/21/2008 | - Website - |
Posted by Henning Heinz At 03:35:43 PM On 08/21/2008 | - Website - |
DAOS is smart enough to divide the files into a folder structure that keeps the operating system happy.
"Also, does encryption cause CPU performance issues?"
No more than, say, port encryption. The truth is, CPU is *VERY* unlikely to be a bottleneck for you on mail servers. It's far more likely that you have an I/O bottleneck, which is where DAOS will help you more.
The workload for local encryption on these files is very light, just as the workload is for medium encryption on local NSFs, and network ports.
"What about antivirus? Is it compatible?"
You'd have to ask your AV vendor for an authoritative answer, but since most of them work by either intercepting the message while it's in the mail.box (in which case it's not in DAOS yet), or by creating a temporary file by using the API to detach a copy (in which case it's invisible that the file was even IN DAOS), I would expect AV to work just fine.
Probably more importantly, I know for a fact that Lotus is actively working with both backup and AV vendors to ensure compatibility.
"You also talk about "defer" and that deletion should be deferred for maybe 30 days... But there may be plenty of useless atts (e.g. spam) that will sit around unused for more than a month. This will cause more storage loss from atts that should not be stored because they were deleted."
First off, you can adjust the threshold for minimum file size before you bother with DAOS. So you can say "don't bother to store the 3K GIF attachments in DAOS."
Second, if you're getting spam with file attachments in a volume where this is a concern for you, may I suggest that, for the sake of your users, you address THAT problem? (Not you specifically, but anyone that reads here.) The best place to start is { Link }
Third, even if you're storing a large number of files in DAOS in that situation, at least you're doing a hash-check to make sure you're only storing a piece of spam once, instead of duplicating across all your NSFs. How would centralizing that one attachment make you worse off than duplicating across all your users?
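To make the first and third points concrete, here is a toy sketch of a size threshold plus hash-based deduplication. It is NOT the DAOS implementation: the class name, the SHA-256 choice, the 4096-byte threshold and the reference counting are all assumptions made for illustration.

```python
import hashlib


class AttachmentStore:
    """Toy content-addressed store: identical attachments are kept once.

    Hypothetical illustration only; names and hash choice are assumptions.
    """

    def __init__(self, min_size_bytes=4096):
        self.min_size_bytes = min_size_bytes
        self.objects = {}    # digest -> attachment bytes (stored once)
        self.refcount = {}   # digest -> number of documents pointing at it

    def store(self, data):
        """Return the digest if stored centrally, or None when the
        attachment is below the threshold and stays in the mail file."""
        if len(data) < self.min_size_bytes:
            return None
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.objects:
            self.objects[digest] = data
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest


if __name__ == "__main__":
    store = AttachmentStore()
    spam = b"x" * 10_000
    tickets = [store.store(spam) for _ in range(100)]  # same spam, 100 users
    print(len(store.objects), "object stored for", len(tickets), "deliveries")
    print(store.store(b"tiny gif"))  # below threshold, not stored centrally
```

One piece of spam delivered to 100 mailboxes costs one stored object here, where the NSF-only approach would hold 100 copies.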
If your server is using FAT (shudder) then the maximum number of files in a directory is 64K. On an NTFS volume, it's over 4 billion files. Please see the discussion on UNID collisions for a sense of what the number 4 billion is really like.
"Once I implement DAOS, Is there any way to really see the savings?"
Yes. There are several ways to get at the information, but I think the easiest is the Files tab on the administrator client.
Posted by Nathan T. Freeman At 03:49:30 PM On 08/21/2008 | - Website - |
Also AV and AntiSPAM products work in the following way: The SMTP task writes the email to the mail.box, and the AntiSPAM puts the email in dead state (by using an extension manager) and adds it to an internal queue. Then another AntiSPAM server task reads this email to analyze it.
Here the attachment was already cached in DAOS... that's why even if the AntiSPAM removes the attachment or the email, DAOS will already have cached it.
Also you talk about 4 billion (4.000.000.000) files in NTFS... it makes me think of a typical situation:
1 user receives 50 emails per day and from these 50 emails there are 40 atts per day. In five years this user will have 73000 atts.
If I have 50.000 users in one server then I will have: 3.650.000.000 attachments.
So it's quite near the limit.
I know you will say that I should archive but as you know it depends on each organization....
At the same time DAOS reduces fixup time... but what if the file system crashes and after reboot it needs to analyze 3.650.000.000 entries?
So I think DAOS can work in a small/medium organization but I'm afraid that in large organizations it will not scale...
Posted by Chris At 07:53:08 AM On 08/22/2008 | - Website - |
Posted by Jeremy Hodge At 09:34:50 AM On 08/22/2008 | - Website - |
In this process, DAOS IS INVISIBLE. When you make the API request, the server automatically knows to extract the file from the NLO object instead of from the $File item in the note (which is itself just a pointer to a special BLOB appended to the NSF.)
So let's take your example of limitations on the operating system....
Let's take all of your assumptions as a given. Your scenario is that every single user receives 40 unique attachments, sent exclusively to that user alone, for 365 days a year, 5 years straight, with no filtering whatsoever, and that you retain 100% of all messages. And you put 50,000 such users on a single Windows server.
To say this stretches the boundaries of imagination is putting it lightly.
But I'll accept your hypothesis. Let's say the average size of these attachments is a paltry 100KB. That means that each user has a whopping 7.3GB in attachments. With 50,000 users, that's 365 TERAbytes of attachments.
On a single Windows server.
365 TB of attachment data, without a single duplication, and accessed by 50K users. On an NTFS volume.
This is your definition of how to architect for an enterprise?
Chris, it's not DAOS that doesn't scale here. It's your proposed solution. You're saying that you'd be BETTER OFF locally storing 365TB of attachments into NSFs, where they slow down indexing, new messages, routing, backups, replication, compacting and pretty much every other imaginable operation. But that's BETTER than running a checksum on these attachments at post time, compressing & encrypting them, and storing them as manageable, atomic files that can be individually handled. All because you're afraid that NTFS can't handle enough files in a single folder.
Truly, sir, you have a dizzying intellect.
Posted by Nathan T. Freeman At 10:09:10 AM On 08/22/2008 | - Website - |
@13: You didn't catch what I tried to say. Let's put an example: User Peter receives lots of SPAMS. So... when one SPAM email comes the nSMTP.exe task will write this email in the mail.box. At this time this email can have an attachment: Either because it's a "real" attachment or because the HTML portion of the email is bigger than 50 KB and it is internally stored as an attachment.
The Antivirus or AntiSPAM will (via an extension manager) catch this UPDATE event and put the email in dead state (it cannot be analyzed at this time, since otherwise it would slow down the nserver or nsmtp process). This email is then analyzed later by the antivirus.
What I mean here is that even if the AntiSPAM sees that this email is a SPAM, these ATTS will be already in the DAOS folder!!!
This is what I try to tell you: Peter will probably not receive the SPAM email... but DAOS will be affected by SPAMs.
So... if 50% of emails are SPAM and 20% of those emails contain atts, then (nearly) 10% of the attachments in the DAOS folder will be SPAM.
Try to read it several times and you will discover that what I'm saying is not stupid.
Also, my AV or AntiSPAM vendor makes a good product (e.g. McAfee), since this is the usual way viruses and spam are checked, and it is OK that it is done this way.
I know what I'm talking about. I'm also a developer, I've been working with Domino since R4, and I know the architecture.
7 GB of atts is not a strange number... I have several customers with DBs bigger than 10 GB... even though they are not email DBs.
Posted by Chris At 11:32:11 AM On 08/22/2008 | - Website - |
But, fine.... let's continue on with your concerns.
1) If you have 50,000 users, why is your SMTP inbound gateway the same server you're supporting your users on? Best practice for the enterprise is to have inbound SMTP go through a dedicated server -- preferably one that doesn't bounce back NDRs.
2) If you ARE using an inbound gateway, then don't turn on DAOS on that machine. Or turn it on and set your purge interval to 2 days. Or turn it on and set the purge interval to 365 days. You still aren't pushing the edge of the capabilities of your server.
DAOS is a good piece of technology. It's not magic. If you deliberately set out to have a bad implementation of it, then you can certainly achieve that goal. And everything you've said here is "if I do this in a worst practice fashion, and don't pay attention to this, and I do it all on the lowest end OS supported by Domino, but with staggeringly high numbers on a single server, then I might break something in five years."
Well, that's pretty much deliberately trying to make it fail as far as I'm concerned.
If you want to implement DAOS because you want to realize the incredible storage, maintenance and performance benefits of it, but you're genuinely concerned about inbound spam retention, then move your inbound SMTP to a gateway machine in the DMZ, and put your AV/Spam control software there. And don't run DAOS on it.
That's the enterprise solution.
Posted by Nathan T. Freeman At 02:10:32 PM On 08/22/2008 | - Website - |
Posted by Erik Brooks At 04:30:18 PM On 08/22/2008 | - Website - |
Is there any way to do it? Maybe a load compact?
Posted by peter At 10:51:56 PM On 08/22/2008 | - Website - |
Posted by Nathan T. Freeman At 12:12:53 AM On 08/23/2008 | - Website - |
Posted by Erik Brooks At 10:58:38 AM On 08/23/2008 | - Website - |
How are these attachments handled on Web applications? Is it also transparent?
I have some Web applications where I get the attachment names in a PostSave event and display custom icons - PDFs, Word, Excel files and so on.
I wonder if I can reference these attachments which now are not in the NSF but in the file system the same way.
Posted by Mael At 01:49:11 PM On 08/25/2008 | - Website - |
Posted by Nathan T. Freeman At 01:58:45 PM On 08/25/2008 | - Website - |
Posted by Charles Robinson At 09:31:12 AM On 08/26/2008 | - Website - |
No, of course not. If you could get 50,000 active users on a Windows box, that alone would be reason to fall over dead in disbelief.
Posted by Nathan T. Freeman At 01:09:59 PM On 08/28/2008 | - Website - |