Data Science Project Ideas

I know we have a lot of guys with great ideas that need feedback, e.g. how to start the project, how to get financing, how to structure a system, what kind of data to include, and which algorithms work well for which types of problems.
I'd love to hear your ideas and if you need help or a sounding board, this is a good place to start.

What I have seen with most startups/initiatives is not that their ideas get stolen; they just die on the vine because they don't have a feedback system or the motivation/accountability.
You are better off sharing than keeping your ideas in the dark.

Let's hear your ideas! Most likely someone is already working on something similar and can help you out, or you can team up.

We will also be featuring a thread on the Facebook DMMLAI forum to send traffic to this thread and try to get a stream of thought going.

I'm also copying good ideas from that forum into this one for easy searchability and follow up.

Hi, I have an idea for image recognition of plants and mushrooms. To collect the data and train the model, I'd write an app that anyone could use to take photos of plants/mushrooms. A day or so later they would receive all the information about it plus a Wikipedia link, and if the information seems correct, the user can just click a like button.

The photos will be uploaded to an online portal where users can earn activity points for correctly identifying a plant/mushroom in an image (via the like button); the activity points will feed a payment system. People in poorer countries who know about plants/mushrooms could easily use this to earn some money. They only have to provide a Wikipedia link in their own language; it doesn't have to be English.

I think this would help a lot of people.

 

What do you think about this? How could I get financing for it?
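
One way the like-button verification and the activity points could fit together is shown in this minimal sketch. It assumes a photo's label is accepted only once at least three users agree and they make up at least 60% of the votes, and that every user who backed the accepted label earns one point. The thresholds, the point value, and the names `resolve_photo` and `award_points` are all hypothetical.

```python
from collections import Counter, defaultdict

def resolve_photo(votes, min_votes=3, min_agreement=0.6):
    """Return the consensus species for one photo, or None if there is no consensus.

    votes: list of (user_id, species) pairs, one per user confirmation.
    Thresholds are illustrative assumptions, not tuned values.
    """
    if not votes:
        return None
    counts = Counter(species for _, species in votes)
    species, n = counts.most_common(1)[0]
    if n >= min_votes and n / len(votes) >= min_agreement:
        return species
    return None

def award_points(photos, points_per_match=1):
    """Label each photo and credit users whose identification matched the consensus."""
    labels, points = {}, defaultdict(int)
    for photo_id, votes in photos.items():
        latest = list(dict(votes).items())  # keep only each user's last vote
        label = resolve_photo(latest)
        labels[photo_id] = label
        if label is None:
            continue  # no consensus yet, so nobody is paid for this photo
        for user, species in latest:
            if species == label:
                points[user] += points_per_match
    return labels, dict(points)

photos = {
    "img1": [("ana", "Amanita muscaria"), ("raj", "Amanita muscaria"),
             ("li", "Amanita muscaria"), ("tom", "Boletus edulis")],
    "img2": [("ana", "Boletus edulis"), ("raj", "Amanita muscaria")],
}
labels, points = award_points(photos)
print(labels)  # {'img1': 'Amanita muscaria', 'img2': None}
print(points)  # {'ana': 1, 'raj': 1, 'li': 1}
```

Requiring several independent confirmations before paying out makes it harder for one user to farm points by liking everything, though a real system would also need per-user reputation and fraud checks.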

Well Eugen, there are a lot of applications out there that already do photo identification for plants and animals. Now that there are libraries that do basically all of the hard work for you, and you just need to train the nets on what you are looking for, it is more about the databases you have available to generate the models with.

As a matter of experience, I've found that getting user participation is a difficult task on any of these types of projects. You have to have some sort of incentive for people to participate initially, before you hit a critical mass and people just do it of their own accord.

So with that in mind, if you have a dataset to start out with, then you bring something to the table; if not, it's an uphill battle.
https://play.google.com/store/apps/details?id=org.plantnet&hl=en is already out there and looks pretty cool.

It's a good idea, but you will need to compete with the incumbent systems out there.

For funding there are a lot of ways to go: grants from grants.org, government contracts, finding an angel investor (basically just a rich guy who likes your idea), or partnering with an existing company that does something similar, isn't in direct competition, and would get some benefit from it. The arrangements between you and whoever gives you money vary widely and depend on the expectations of all parties.

AI-Based Portfolio Management
http://www.unicorninvesting.us/  

A forecast of human suffering by region is something I think would be a cool project...

i.e. something that aggregates weather, geopolitical, population, resource availability, and climate change models, etc., and combines them into one index to predict the suffering trend for a city/district.

Haven't started it, but I think it would be cool.
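
As a rough illustration of combining everything into one index, here is a minimal sketch that assumes each input has already been reduced to one number per district. It min-max scales each indicator across districts, flips the ones where a higher value is better, and takes a weighted average, so 0 means the least and 1 the most suffering within the set. The indicator names, weights, and values are invented for illustration; choosing and validating them is the hard part.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant indicator maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def suffering_index(regions, weights, higher_is_worse):
    """Weighted average of min-max scaled indicators, oriented so 1 = worst.

    regions: {region: {indicator: raw value}}
    weights: {indicator: weight}
    higher_is_worse: {indicator: bool}
    """
    names = list(regions)
    total_w = sum(weights.values())
    index = {r: 0.0 for r in names}
    for ind, w in weights.items():
        scaled = min_max([regions[r][ind] for r in names])
        for r, s in zip(names, scaled):
            if not higher_is_worse[ind]:
                s = 1.0 - s  # e.g. more water access means less suffering
            index[r] += w * s / total_w
    return index

# Hypothetical districts and indicators, for illustration only.
regions = {
    "district_a": {"food_insecurity": 0.30, "conflict_events": 12, "water_access": 0.60},
    "district_b": {"food_insecurity": 0.05, "conflict_events": 0, "water_access": 0.95},
    "district_c": {"food_insecurity": 0.10, "conflict_events": 3, "water_access": 0.80},
}
weights = {"food_insecurity": 2.0, "conflict_events": 1.0, "water_access": 1.0}
higher_is_worse = {"food_insecurity": True, "conflict_events": True, "water_access": False}
scores = suffering_index(regions, weights, higher_is_worse)
print(scores)
```

To get a trend rather than a snapshot, you could compute the index for each time period and look at how it changes. Min-max scaling is relative to the districts in the set, so the scores are comparisons, not absolute measures.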

It's a good idea. I wonder what your thoughts are on rating human suffering based on your selected KPIs.

Regards,

Nadeem

Phil Teare So I should give full credit where it's due. My daughter told me this morning that she had a dream that she was in ICT class and they gave her the challenge of writing some code that accurately predicted movie plots, e.g. who did it in a whodunnit.

My extension to the challenge (maybe not for a 6th former in high school) would be to then make an adversarial network that could generate unforeseeable plot twists which still seem like sound plot progression. Encoding formalised plot synopses seems like a non-trivial sub-challenge in itself.

Stuart Gray I have a side project I'm just starting work on - a 'self driving business'.

Haven't fully committed to the core business idea; I have a few possibilities in mind, most likely a tried-and-tested print-on-demand online store. The business itself is secondary for now; the main idea is to make it 100% automated post-launch, or as close as I can possibly get, including things like sourcing and creating new designs, marketing, customer acquisition, and support, using automation, AI, and machine learning. I have most of the theory worked out and am just starting to experiment with making it happen.

The reality isn't quite as grand as that makes it sound. While I'm hoping it will be viable and give me a little extra income, self-funding would be good enough for now, as it's largely intended as a proof of concept for a larger vision I can't discuss for a few weeks or so (pending a grant application).

(August 11 at 12:08pm, edited)

Phil Teare evolutionary mechanisms and RL seem like strong contenders in there

Stuart Gray A tool to augment debates & debaters by acting as an independent referee to call out various logical fallacies based on prior exposure to sufficient training data

(August 12 at 7:08am)

Phil Teare Nice. I like this a lot. Might ponder in a little while once I've done my chores here... what are your model concepts so far?

(August 12 at 7:10am)

Stuart Gray Phil Teare None so far. It was a random idea sparked by a recent discussion about how few people know how to genuinely debate a topic, and by seeing this:

http://perspectiveapi.com/

Most people take things personally, confuse argument with debate, think the idea is to 'win', etc. Things often get heated and polarised, whereas a bot could probably help in a lot of situations without knowing the topic, given a few basic rules and principles to hold participants to.

It wouldn't solve every debating issue (presenting low-quality evidence, for example), but it might address a lot of the more common ones.
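
To make the "few basic rules" idea concrete, here is a deliberately crude rule-based sketch that flags a few well-known fallacies from surface wording alone. The regular expressions and rule names are hypothetical examples, not a real fallacy taxonomy. A usable referee would almost certainly need a trained classifier, as the original idea suggests, since patterns like these miss paraphrases and misfire on quoted text.

```python
import re

# Hypothetical, deliberately crude rules: each maps a fallacy name to a surface pattern.
RULES = {
    "ad hominem": re.compile(r"\byou(?:['’]re| are) (?:an? )?(?:idiot|moron|liar|clown)\b", re.I),
    "bandwagon": re.compile(r"\b(?:everyone|everybody) (?:knows|agrees)\b", re.I),
    "appeal to tradition": re.compile(r"\bwe(?:['’]ve| have) always done it this way\b", re.I),
}

def flag_fallacies(statement):
    """Return the names of all rules whose pattern appears in the statement."""
    return [name for name, pattern in RULES.items() if pattern.search(statement)]

print(flag_fallacies("You're an idiot, and everyone knows it."))  # ['ad hominem', 'bandwagon']
print(flag_fallacies("The 2016 figures contradict that claim."))  # []
```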

(August 12 at 7:16am)

Stuart Gray Might also be interesting as a real-time checker watching political debates on TV.

(August 12 at 7:18am)

Phil Teare Interesting. It's not there yet, is it? I tried "The president is ignorant of science" and "The president is the saviour of science".

The first, strictly speaking, should not be offensive (we are all ignorant of many things), yet it is predicted 'toxic'. Still, the word does cause offence, so it is less pedantic than me (possibly a good thing).

The second is just as dangerous (and so 'toxic') IMO. Positive hyperbole may not seem toxic to the truth, but it very much is IMO. Thoughts?

(August 12 at 7:26am)

Stuart Gray Phil Teare I've seen a few similar examples from others. I think in part it probably depends on context - the tool is primarily intended to rate comments, where there's usually more text to work with.

Most sentiment classifiers screw up with a short piece of text, so I'm not too surprised by your results.

It also depends a lot on the objective: preventing offensive comments, contributing meaningfully, factual accuracy, or some other metric. It's fairly easy to lie and spew false information without being offensive, for example.

(August 12 at 7:34am, edited)

Phil Teare Right. Personally I'd go for factual accuracy, rather than avoiding offence (despite being a 'lefty libtard'). But no reason not to make multiple filters for each metric. Could be very interesting to see where and how they diverge from correlation.

(August 12 at 7:39am, edited)

Phil Teare I've always thought the Slashdot comment rating data is a goldmine for this kind of thing. Are you familiar with it?

(August 12 at 7:38am)

Stuart Gray Phil Teare Can't say I am.

Also worth considering, if the metric for that service is offence, it might be more meaningful when looked at over time, as a pattern. Both for individuals on a topic and in general, and the range of comments a given post or article attracts, rather than individual comments per se.
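
Looking at offence over time, as a pattern, could start as simply as a per-author rolling mean. This minimal sketch assumes you already have an offence score in [0, 1] for each comment from some classifier (the scores below are made up) and uses a window of the last three comments; `rolling_offence` is a hypothetical helper name. The same idea works per topic or per article by grouping on that key instead of the author.

```python
from collections import defaultdict, deque

def rolling_offence(comments, window=3):
    """Per-author rolling mean of offence scores.

    comments: iterable of (author, score) pairs in time order, score in [0, 1].
    Returns {author: [rolling mean after each of that author's comments]}.
    """
    recent = defaultdict(lambda: deque(maxlen=window))  # last `window` scores per author
    trends = defaultdict(list)
    for author, score in comments:
        buf = recent[author]
        buf.append(score)
        trends[author].append(sum(buf) / len(buf))
    return dict(trends)

# Made-up scores: author "a" starts heated and cools down.
thread = [("a", 0.9), ("b", 0.1), ("a", 0.3), ("a", 0.3), ("a", 0.0)]
print(rolling_offence(thread))
```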

(August 12 at 7:41am)

Phil Teare Absolutely. Trending dynamics could be very interesting across a topic space.

But drivers of one on the other could be enlightening right? E.g. does offence drive counter factuality or vice versa?

Factuality is obviously getting a lot of consideration ATM. Check https://www.trustservista.com/

I'd like to see more weakly supervised mechanisms that generalise well enough to expand reliably into other topic spaces.

Slashdot ('news for nerds') has gone through phases of being an interesting online forum and a shouting match between angry teenage pedants, as you can imagine. But it's almost always been bearable to read the comments, since people can vote each other's comments up or down as 'insightful', 'interesting', 'informative', 'troll', 'underrated', 'overrated', 'funny', 'flamebait', or 'offtopic'. This gives users a lot of control over their experience while preserving freedom of expression, and it raises the credibility of the visible content. There must be a ton of data by now (it's nearly as old as the web). I'd love to train something big on that plus Wikipedia, Wikinews, and a few journals and papers, see how well I could get it to generalise to other topics after training on one, and then train on all. They tag each original post with a topic ('Law', 'Technology', etc.).
https://slashdot.org/faq/metamod.shtml
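
One way to turn Slashdot-style moderations into weak labels for this kind of weakly supervised training is shown in this minimal sketch. The sign given to each descriptor, treating 'funny' as neutral, and the margin of 2 are all assumptions for illustration, not how Slashdot itself scores comments. Comments without a clear net signal are left unlabelled (`None`) rather than forced into a class.

```python
from collections import Counter

# Assumed mapping from moderation descriptors to a credibility signal.
# 'funny' is treated as neutral here; that choice is arbitrary.
DESCRIPTOR_SIGN = {
    "insightful": 1, "interesting": 1, "informative": 1, "underrated": 1,
    "troll": -1, "flamebait": -1, "offtopic": -1, "overrated": -1,
    "funny": 0,
}

def weak_label(moderations, margin=2):
    """Map the moderations applied to one comment to a weak label.

    Returns 'credible', 'not_credible', or None (abstain) when the net signal is weak.
    """
    counts = Counter(m.lower() for m in moderations)
    net = sum(DESCRIPTOR_SIGN.get(d, 0) * n for d, n in counts.items())
    if net >= margin:
        return "credible"
    if net <= -margin:
        return "not_credible"
    return None

print(weak_label(["Insightful", "informative", "funny"]))  # credible
print(weak_label(["troll", "flamebait", "Interesting"]))   # None
```

Because each post also carries a topic tag, the same labels could be split by topic to test how well a model trained on one topic transfers to another.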

(August 12 at 8:00am, edited)

Stuart Gray I definitely think the idea of multiple classifiers for different metrics has a lot of merit.

All this talk has reminded me of a crude rule-based classifier I wrote many years ago at a former workplace before I had any real ML knowledge. It read through my work inbox and scored each sender according to the 'type of communicator' they were; auditory, visual, or kinaesthetic; based on their use of keywords and metaphors.

No idea how sound the science is behind that, just that it was something I was taught to be aware of on a soft skills course at the time. 

The idea was that I would use it to learn someone's preferred style, and adapt my email phrasing to them to suit their preferences in the hope of being better received/understood. 

It seemed to have some results, although I never took a scientific approach to it. I also didn't use it for very long; I found it ended up making me more aware in general of how people tended to communicate, and I learned to adapt my tone and style to suit automatically.

It also made me ponder at the time how this would work out if everyone did it: no one would register a distinct personal style, because we would all be trying to adapt to one another.
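
A crude rule-based classifier like the one described above might look roughly like this minimal sketch. The keyword lists are hypothetical (the original lists aren't given), and each sender is simply assigned whichever style gets the most keyword hits across their emails, or `None` if there are no hits at all.

```python
import re
from collections import Counter, defaultdict

# Hypothetical keyword lists per communication style.
STYLE_KEYWORDS = {
    "visual": {"see", "look", "picture", "clear", "view", "focus", "show"},
    "auditory": {"hear", "sound", "listen", "tell", "talk", "rings", "tune"},
    "kinaesthetic": {"feel", "touch", "grasp", "handle", "solid", "push", "rough"},
}

def style_scores(text):
    """Count keyword hits per communication style in one message."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter({style: sum(w in kws for w in words)
                    for style, kws in STYLE_KEYWORDS.items()})

def profile_senders(emails):
    """emails: iterable of (sender, body). Returns {sender: dominant style or None}."""
    totals = defaultdict(Counter)
    for sender, body in emails:
        totals[sender].update(style_scores(body))
    profile = {}
    for sender, counts in totals.items():
        style, hits = counts.most_common(1)[0]
        profile[sender] = style if hits > 0 else None
    return profile

inbox = [
    ("pat", "I see your point, let me look at the picture."),
    ("sam", "That sounds right, I hear you."),
    ("pat", "Clear view."),
    ("joe", "Meeting at 3pm."),
]
print(profile_senders(inbox))  # {'pat': 'visual', 'sam': 'auditory', 'joe': None}
```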

Eric Heitzman I have an idea for synthetic synesthesia.
I would train 2 deep autoencoders:
one for human movement and body positions, and another for music.
Both would have the same number of outputs at the final interior (bottleneck) layer.
Then I would encode the music and reconstruct it as body movements for interpretive dance.
But it could be used for anything,
like decoding images as sound, or horses as people, or jazz as rock, or whatever.
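
Here is a minimal sketch of the shared-size idea, with two substitutions stated up front. Linear autoencoders fitted in closed form with PCA stand in for deep ones (under squared error, the optimal linear autoencoder spans the top principal components), and random arrays stand in for real music and motion-capture features. The feature sizes and `latent_dim = 8` are arbitrary assumptions. Matching sizes only makes the shapes line up: without paired examples or some alignment term during training, latent dimension 3 in one model has no reason to mean the same thing as dimension 3 in the other.

```python
import numpy as np

class LinearAutoencoder:
    """Linear autoencoder fitted via PCA (SVD of the centred data)."""

    def __init__(self, latent_dim):
        self.latent_dim = latent_dim

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        # Rows of Vt are principal directions; keep the top latent_dim as the "encoder".
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = Vt[: self.latent_dim]
        return self

    def encode(self, X):
        return (X - self.mean_) @ self.components_.T

    def decode(self, Z):
        return Z @ self.components_ + self.mean_

rng = np.random.default_rng(0)
music = rng.normal(size=(200, 64))     # stand-in for per-frame audio features
movement = rng.normal(size=(200, 51))  # stand-in for 17 joints x 3 coordinates
latent_dim = 8                         # same bottleneck size in both models

music_ae = LinearAutoencoder(latent_dim).fit(music)
dance_ae = LinearAutoencoder(latent_dim).fit(movement)

# Encode music with one model, then decode the codes with the other model's decoder.
dance_from_music = dance_ae.decode(music_ae.encode(music[:10]))
print(dance_from_music.shape)  # (10, 51)
```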

Phil Teare I have another: an attention-based network for transferring vocal intonation, emotion, and timing to make film dubbing better at low cost. Translation of the script is often affordable, but effectively re-directing the voice actors can be too expensive. ...

Link: Performance RNN: Generating Music with Expressive Timing and Dynamics (magenta.tensorflow.org)

(August 13 at 3:04pm)

Phil Teare There's also a ton of training data out there to train a discriminator, in the form of films and their dubbed counterparts. Adversarial is the way to go I think. Generator then tries to fool discriminator.

(August 13 at 3:10pm)

A mobile phone application that runs at all times and listens to the environment via the microphone. It incorporates location data, crime statistics, weather, etc. to determine a danger ratio; if that hits a certain threshold, it notifies you that you may be in danger and should be more aware of your surroundings.

Could be useful for deaf users, or just people who don't pay attention to their surroundings.

Maybe it could even listen in on your conversations for cues about obfuscation or malicious intent?
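
The danger ratio could be prototyped as a logistic score over the signals mentioned, with an alert when it crosses a threshold. This minimal sketch assumes each signal has already been reduced to a number between 0 and 1; the feature names, hand-set weights, bias, and 0.5 threshold are all hypothetical, and in practice the weights would be fitted to labelled incident data.

```python
import math

# Hypothetical features and hand-set weights (would be learned in practice).
WEIGHTS = {
    "loud_noise": 1.5,   # share of the last minute above some loudness level, 0..1
    "crime_rate": 2.0,   # local crime statistics, normalised to 0..1
    "night": 0.8,        # 1 if after dark, else 0
    "bad_weather": 0.3,  # 1 if a severe weather warning is active, else 0
}
BIAS = -3.0  # keeps the ratio low when no signals are present

def danger_ratio(features):
    """Logistic combination of the signals, giving a value in (0, 1)."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def should_alert(features, threshold=0.5):
    return danger_ratio(features) >= threshold

print(should_alert({}))                                                  # False
print(should_alert({"loud_noise": 0.9, "crime_rate": 0.8, "night": 1}))  # True
```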

Emmanuel Atobrah: Thanks
