Monday, March 26, 2012

More connect feedback required please

[Microsoft follow-up]

I submitted a posting to connect here: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=287213

Here is what I wrote:

A number of tasks output data to variables or text files. e.g. WMI Data Reader Task, Execute SQL Task, Web Service Task.

Consuming that output in a data-flow isn't particularly easy. You either need a Flat File source adapter (which requires an othrewise superfluous connection manager) or write come code in a script component to parse the recordset. There is simply no easy way to push data from these sources into the dataflow.

Thw built-in mechanism for passing data between different tasks is raw files. Currently they can only be used by the dataflow but I see no reason why they couldn't be used by other tasks as well. It makes complete sense to me for a WMI Datareader Task to push some data into a raw file and then we chew that data up in a dataflow.

The following response came back

Our current architecture actually doesn't have the buffer system as in Data Flow, when you are in the Control Flow. What you are asking would require us to build a similar buffer system in the Control Flow, which is a fundemantal architectural change. We'll not be able to take this, sorry.

I'm afraid I don't understand that response. Obviously I know that buffers are only in the data-flow - but I don't see why that's relevant. Raw files are just files on the file system, same as any other. OK, their format is very proprietary but its you guys that built the format. Essentially all I'm asking you to do is output the data in raw file format as opposed to flat file format. There's no notion of buffers in a raw file because its just a lump of data. Or is there? If not, I'm afraid I don't understand your answer.

Please could you clarify the answer?

-Jamie

Hi,

Are the [Microsoft follow-up] tags not working anymore? Smile

-Jamie

|||

They do work. Sometimes, is not easy to find a brave soul to catch a hot potato. Smile

I was waiting for somebody from the group that triaged this item to respond, but let me present my view:

The row files are basically data flow buffers streamed into files. More or less it is collection of "raw" data flow buffers copied byte by byte. It would be possible to mimic this format from other tasks but it would not be a trivial task. The data other tasks are dealing with is differently structured and it will probably need to go through chains of adaptations/conversions to make it conform to the metadata parameters understandable by the data flow.

I understand your motivation behind this request and it is more than sensible. I just wish we had a standardized metadata system across runtime and pipeline that would allow us to easily implement this.

Thanks,

Bob

|||

Bob Bojanic - MSFT wrote:

They do work. Sometimes, is not easy to find a brave soul to catch a hot potato.

Fair enough. Credit to you then Bob for picking it up!

Bob Bojanic - MSFT wrote:

I was waiting for somebody from the group that triaged this item to respond, but let me present my view:

The row files are basically data flow buffers streamed into files. More or less it is collection of "raw" data flow buffers copied byte by byte. It would be possible to mimic this format from other tasks but it would not be a trivial task. The data other tasks are dealing with is differently structured and it will probably need to go through chains of adaptations/conversions to make it conform to the metadata parameters understandable by the data flow.

I understand your motivation behind this request and it is more than sensible. I just wish we had a standardized metadata system across runtime and pipeline that would allow us to easily implement this.

Thanks,

Bob

OK, that kinda makes sense. Here's another even hotter potato for you then. Why DON'T you have "a standardized metadata system across runtime and pipeline ". It has always seemed strange to me that (e.g.) variable data types are different to pipeline data types.

-Jamie

|||

Jamie Thomson wrote:


OK, that kinda makes sense. Here's another even hotter potato for you then. Why DON'T you have "a standardized metadata system across runtime and pipeline ". It has always seemed strange to me that (e.g.) variable data types are different to pipeline data types.

-Jamie

HAHAHA! You know, that's so true. If, in your example, we had consistent data types between variables, pipelines, parameter mappings, etc... consumer adoption of SSIS would be much greater and our jobs on this forum would be made SO much easier.

|||

Jamie Thomson wrote:

Here's another even hotter potato for you then. Why DON'T you have "a standardized metadata system across runtime and pipeline ". It has always seemed strange to me that (e.g.) variable data types are different to pipeline data types.

Hmm. I can try to answer it but have to watch not to burn my fingers too much. Smile

IMHO, and I can be off since these designs/implementations date before my time with this group, we focused too much on breaking DTS into two functional modules (runtime and pipeline) that we failed to realize some obvious synergies they have. We started building two separate modules from the start and divided our resources around them. That resulted in separate designs, implementations and overall look and feel. We even had two hamsters, at some point, named "Runtime" and "Pipeline". Smile

Again, this is only my view and other people on the team might have different insights. I wish we did a better job on this, and I believe we should have a serious thinking around making it better. It is not likely we will be able to improve much of it in Katmai, though.

Also, to be fair, coming up with standardized data type system that could be used for many purposes is quite a complex task given the ambiguity of implementations in existing data sources and apparent lack of standardization. Even seemingly simple tool like the Import/Export wizard, which is supposed to move data between all sorts of heterogeneous data sources, faces loads of problems in its quest to make the transfers "simply work". In a perfect world, we should have had standardized implementations of data type systems, provided by a platform (OS), all other layers (DBs, CLR, tools and services) can plug into, and we would not need to worry about these things.

Anyway, I hope this adds some light…

Thanks,

Bob

|||

Wow. That's a very candid response Bob. Thank you very much, I really appreciate the honesty.

As always, if we in the community can help in any way to shape what happens here in the future then come and knock on our door.


-Jamie

|||

Thank you, Jamie, for keeping us honest.

The community is here to keep us both; inspired and humble, at the same time. I like it for that.

|||

Now here's something I keep thinking about

SSIS trying to be all things to all people. Why? (the telephone toaster spring to mind at this point)

Oracle and SQL are both now excellent products. Chances are your typical Oracle system will appear to be better because the design of it will be of higher quality. This is only because the barriers to entry are so high, unlike MS where any muppet can come along and design some crappy Access database, migrate using a wizard, and then think they are a dba. This is known and historic and as time goes by the overall design quality of SQL db will meet that of Oracle.

This is also true because most SQL installations originally came about because people didn't or couldn't or wouldn't spend the money on hiring Oracle and all their costly consultants to come and set them up a db. But also they wouldn't spend the money on training people how to use SQL properly either.

This means that any company using Oracle or SQL is likely to stay doing so, and any new companies are likely to use SQL. It's "cheaper". MS cleverly understand that the extra costs of getting MS stuff going is not measurable on the balance sheet Smile

So assuming this logic, I would bet money that the number of people using SSIS in a non SQL environment is on a par with the number of people using obscure (non IE/FF) browsers.

If you had a website, you wouldn't even bother spending the money trying to make it work with the obscure browser. It is not worth the pain.

So assume that no CIO in their right mind is going to sanction the switching of a (probably working very nicely thankyou) Oracle ETL tool, to SSIS. Why do MS persist with this strategy of trying to make SSIS all things to all people?

Just make all of the SSIS datatypes the same as SQL! That's where the data is going to largely end up anyway.

I would bet that 99% of SQL connection managers are the following:

sql connection; flat file/csv connection; excel connection.

For the 1% (and please someone from MS correct me on these figures - you do after all capture 'usage' information) of other datasources, why not have dataconverters, to just convert them all to sql datatypes?

|||Personally, I have done a fair percentage of my work in SSIS against non-SQL Server databases, so I really like their strategy. If SSIS didn't work with other databases, it would be fairly useless as an enterprise ETL tool.|||

jwelch wrote:

Personally, I have done a fair percentage of my work in SSIS against non-SQL Server databases, so I really like their strategy. If SSIS didn't work with other databases, it would be fairly useless as an enterprise ETL tool.

Couldn't agree more. Many of the SSIS I have built don't go near SQL Server except for logging purposes.

-Jamie

|||

I'm just basing it on my experience of the FTSE listed companies I have worked/consulted at since 1995.

Out of 10, only 1 was Oracle, the rest were SQL and only SQL.

I also don't remember attending any interviews at places where they were also using Oracle etc.

For every multinational giant, there are hundreds of SMEs all running SQL.

--

So if you have another DB and SQL. You are still having to convert to SSIS types along the way.

At least if they used SQL types, you would only have convert once.

|||

We do not live inside SQL Server and simply reusing the SQL types is not going to help us much under the hoods. There is still additional layer between us and SQL engine and conversions are not easy to avoid. It would possibly make the exposed semantics simpler, but only if we were able to consolidate the data type system in the entire product.

The devil lies in details -- making these conversions transparent and seamless is quite delicate business, and its complexity jumps exponentially as we increase the number of considered data sources.

Thanks,

Bob

|||

Product suggestion:

Allow the package to have an optional mode.

"CommonDataTypeMode"

Which you can set to SQL Server, Oracle etc

By setting this is will automatically default all of your datatypes to the chosen mode. e.g. any datetime will default to DT_DBTIMESTAMP

I am wasting so much time having to go in and 'correct'* data type information.

What is gained by having it:

Less time spent typing into miniscule textboxes.

Fewer errors due to always having the correct type.

*I have also raised a bug on the connect site regarding SSIS changing your datatypes when you change an expression in the derived column editor. I haven't heard anything back and don't expect to.

sql

No comments:

Post a Comment